Override/disable DNSSEC system?

I realize this is not the intent of the new DNS system; however, it has created havoc for us on our internet connection. We rely on a 3G mobile internet service here in Cameroon, Africa. For some reason, the new DNSSEC system often reports “Broken” and blocks traffic, even though when I connect directly to the 3G modem, the internet is working fine.

I’ve used IPCop and IPFire since 2005, with one of the biggest benefits being Update Accelerator on our metered connections. So, I’d prefer to stay with IPFire since that’s what I know, but the DNSSEC system is beginning to force me to consider abandoning it for something else, and I have no idea what I would use instead.

Thanks

Hi,

For some reason, the new DNSSEC system often reports “Broken” and blocks traffic, even though when I connect directly to the 3g modem, the internet is working fine.

Some users reported similar problems if the IPS was enabled. It seems like Suricata sometimes messes with DNS connections, especially if your machine is behind a very slow uplink connection.

Could you please disable the IPS and let us know if the problem persists?

Anyway: Which DNS resolvers have you configured? Are you using UDP, TCP or TLS?

I am surprised to hear that Update Accelerator is still saving a considerable amount of traffic, as nearly all updates seem to be downloaded via HTTPS these days, where we cannot apply local caching.

Thanks, and best regards,
Peter Müller

I do not have IPS enabled. I think you are right that it has to do with the slow uplink.

Here’s a screenshot of the DNS services I have. I have tried all three protocols in an effort to see which is least problematic/most reliable. I’d say I don’t see a big difference between them in my circumstances, other than that some of the servers don’t support TLS.

What I find is that it takes the system quite a number of minutes, at times as many as 5-10, to start working again after a disruption in service.

As for Update Accelerator, there are still system updates that work, but it is not as effective as it used to be which is unfortunate. Apple updates work for the most part, as do many Microsoft. Linux stuff caches as well.

Thanks,
Chris Jackson

Good Morning,

can you quantify the speed/latency of that internet connection you have?

Is unbound logging anything useful to /var/log/messages?


can you quantify the speed/latency of that internet connection you have?

On a good day, these are the SpeedTest results for one of the two connections we have available (sure wish multiwan capability had become a reality on IPFire!):

Ping: 206 ms; Download: 4.83 Mbps; Upload: 1.63 Mbps

I wanted to attach a log file to this reply, but I don’t have authorization to do so. The Unbound log shows SERVFAIL messages during the timeframe when the DNS System is broken. What exactly would be of help to you from the log? What kind of info do you need?

This is not exactly a fibre connection, but not bad either. It should not affect DNSSEC at all.

There could be a chance that your ISP is corrupting packets and therefore you cannot receive the whole DNSSEC root key - common problem sometimes. Using TCP should fix that. TCP should always work.


@ms

Ok. I will use TCP and see how it goes. I did confirm yesterday that the primary ISP we rely on here was having some major routing/DNS problems themselves which just so happened to be the channel the traffic from this side of the country uses. It seems to have been resolved for the most part and we are seeing better results. So, that is certainly a big part of this problem.

That sounds good. If you still have problems with that, check your MTU on the mobile link.


Some more feedback: what I’ve discovered is that there are times I cannot get the DNS to start working. It continues to report “broken” even though a direct connection to the MiFi router we use is working. Repeated attempts to wake up the DNS connection, either by refreshing the connection to the WAN router or by clicking “Check DNS Servers”, do not help. Then, sometimes, if I change the protocol from UDP to TCP, or back to UDP, it suddenly wakes up and starts working.

Hi,

please provide related (i. e. containing unbound) log entries in /var/log/messages.

Thanks, and best regards,
Peter Müller

Here is the unbound log from today during which I had a number of times with long delays with DNS in a “broken” state, though traffic was passing on the two different services during that timeframe.

unbound.txt.zip (17.4 KB)

As you can see you are losing responses:

18:26:36 unbound: [11122:0]  info: validation failure <keyvalueservice.fe.apple-dns.net. A IN>: key for validation . is marked as invalid
18:26:35 unbound: [11122:0]  info: validation failure <gsa.apple.com.akadns.net. A IN>: key for validation . is marked as invalid
18:26:34 unbound: [11122:0]  info: validation failure <15-courier.push.apple.com. A IN>: key for validation . is marked as invalid
18:26:34 unbound: [11122:0]  info: validation failure <35-courier.push.apple.com. A IN>: key for validation . is marked as invalid
18:26:33 unbound: [11122:0]  info: validation failure <tweetdeck.twitter.com. A IN>: key for validation . is marked as invalid

But those are not DNSKEY records. They are the records that you wanted. The A records.

So disabling DNSSEC won’t help you at all.

You either have an extreme amount of packet loss, or your upstream name servers do not respond.

You already have quite a lengthy list there which is good. I would recommend removing OpenDNS, because they do not understand what a neutral Internet is.

Could you trace with tcpdump how communication looks with your upstream name servers or could you check your latency graph if you are actually online?
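To see at a glance which kind of failure dominates in a longer log, the unbound lines can be tallied with a small script. This is only a rough sketch: the two regular expressions are written against the line formats quoted in this thread, and the function name classify is made up for the example.

```python
import re
from collections import Counter

# Patterns written against the log line formats quoted in this thread.
SERVFAIL_RE = re.compile(r"error: SERVFAIL <(\S+) (\S+) IN>")
VALFAIL_RE = re.compile(r"info: validation failure <(\S+) (\S+) IN>")


def classify(lines):
    """Count forwarder failures and validation failures per record type."""
    counts = Counter()
    for line in lines:
        match = SERVFAIL_RE.search(line)
        if match:
            counts[("servfail", match.group(2))] += 1
            continue
        match = VALFAIL_RE.search(line)
        if match:
            counts[("validation", match.group(2))] += 1
    return counts


sample = [
    "18:26:36 unbound: [11122:0]  info: validation failure "
    "<keyvalueservice.fe.apple-dns.net. A IN>: key for validation . is marked as invalid",
    "11:30:03 unbound: [11122:0] error: SERVFAIL "
    "<trouter-neu-a.cloudapp.net. AAAA IN>: all the configured stub or forward servers failed, at zone .",
]
for key, count in sorted(classify(sample).items()):
    print(key, count)
```

To run it over a real log, pass the lines of /var/log/messages (filtered for unbound) instead of the two sample lines.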


Gateway graph from one day. Peak latency roughly corresponds to times during which DNS was “broken”.

Hi,

thank you for providing additional data.

Those latency spikes are extremely high indeed (between 200 ms and 1 second), and I suspect them of being the root cause of your DNS disruptions. Perhaps a wired connection would be more reliable, but according to your initial description, this seems to be out of the question.

However, you might wish to change the configured DNS servers, if the average latency to them is greater than ~ 200 ms. I’d recommend mtr for such measurements, as it is more versatile than traceroute.
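If mtr is not at hand, a very rough comparison of resolvers can also be scripted. The sketch below is not a replacement for mtr: it only times the TCP handshake to port 53, which approximates one round trip, and the function names tcp_connect_ms, summarize and probe are invented for this example.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=53, timeout=5.0):
    """Return the TCP handshake time to host:port in ms, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


def summarize(samples):
    """Summarize timings in ms; None entries are counted as lost probes."""
    ok = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "lost": len(samples) - len(ok),
        "avg_ms": statistics.mean(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


def probe(host, count=10):
    """Probe one resolver `count` times and return the summary."""
    return summarize([tcp_connect_ms(host) for _ in range(count)])
```

Calling probe() with the address of each configured DNS server, for example print(probe("9.9.9.9")) where 9.9.9.9 is only an example address, gives numbers to compare against the ~200 ms figure above.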

Thanks, and best regards,
Peter Müller

@pmueller

Thank you for your input. I’ve copied recent unbound log content below to give you an idea of what is going on. I have an active connection to the internet via one of the two 3G connections we use here. A wired connection is an impossibility as we are in a small village in Cameroon. 3G is the only option. I realize, of course, that my situation is unique relative to the bulk of IPFire users. A 180-200 ms latency is the best we ever see here.

This morning everything was working well, then the connection started going on and off; that is, it started to have small interruptions, though it continued working in spite of the instability.

This results in not being able to do anything through IPFire and can go on for a lengthy period of time. Yesterday, it took IPFire 10 minutes to get its DNS working again even while the other 3G connection we use was working, albeit slowly and with a latency of ~300 ms. Slow, yes, but working nonetheless.

The point of frustration I have is that before implementation of Unbound and the overall DNS security improvements, even with this type of instability, IPFire was not the barrier to internet access. Before, I could have assigned a fixed IP address for the DNS servers which, I think, is what made IPFire unaffected by these types of connection instabilities.

I’m sure it is not feasible in the current system using Unbound, etc., but I think it is important to consider how to accommodate people in these types of situations and how the new DNS system is actually making it almost impossible to rely on IPFire for all of the other great features it has (I have firewall rules, controls on who can and cannot access internet, etc., etc., as well as using Update Accelerator to save money and time on system updates).

This problem is a relatively new issue for me. I believe it is a combination of now being in rainy season which affects our connection speed/quality, and the changes in IPFire. Previously, IPFire has not been the problem at all, just the internet services.

What to do? Aside from abandoning IPFire, I really don’t know, nor do I know which other firewall tool there is that could serve our need in this context.

One example of the ping statistics this morning during the instability:
40 packets transmitted, 29 packets received, 27.5% packet loss
round-trip min/avg/max/stddev = 158.975/204.081/367.166/48.097 ms

UNBOUND Log segment during the instability:

11:30:03 unbound: [11122:0] error: SERVFAIL <trouter-neu-a.cloudapp.net. A IN>: all the configured stub or forward servers failed, at zone .
11:30:03 unbound: [11122:0] error: SERVFAIL <trouter-neu-a.cloudapp.net. AAAA IN>: all the configured stub or forward servers failed, at zone .
11:30:01 unbound: [11122:0] error: SERVFAIL <gateway.fe.apple-dns.net. A IN>: all the configured stub or forward servers failed, at zone .
11:29:59 unbound: [11122:0] error: SERVFAIL <11-courier.push.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:58 unbound: [11122:0] error: SERVFAIL <45-courier.push.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:38 unbound: [11122:0] info: generate keytag query _ta-4a5c-4f66. NULL IN
11:29:36 unbound: [11122:0] info: start of service (unbound 1.10.1).
11:29:36 unbound: [11122:0] notice: init module 1: iterator
11:29:36 unbound: [11122:0] notice: init module 0: validator
11:29:36 unbound: [11122:0] notice: Restart of unbound 1.10.1.
11:29:36 unbound: [11122:0] info: 1.000000 2.000000 4
11:29:36 unbound: [11122:0] info: 0.524288 1.000000 7
11:29:36 unbound: [11122:0] info: 0.262144 0.524288 14
11:29:36 unbound: [11122:0] info: 0.131072 0.262144 3
11:29:36 unbound: [11122:0] info: 0.065536 0.131072 1
11:29:36 unbound: [11122:0] info: 0.000000 0.000001 5
11:29:36 unbound: [11122:0] info: lower(secs) upper(secs) recursions
11:29:36 unbound: [11122:0] info: [25%]=0.240299 median[50%]=0.411941 [75%]=0.694185
11:29:36 unbound: [11122:0] info: histogram of recursion processing times
11:29:36 unbound: [11122:0] info: average recursion processing time 0.528054 sec
11:29:36 unbound: [11122:0] info: server stats for thread 0: requestlist max 12 avg 3.5 exceeded 0 jostled 0
11:29:36 unbound: [11122:0] info: server stats for thread 0: 35 queries, 1 answers from cache, 34 recursions, 0 prefetch, 0 rejected by ip ratelimiting
11:29:36 unbound: [11122:0] info: service stopped (unbound 1.10.1).
11:29:30 unbound: [11122:0] error: SERVFAIL <init-p01st.push.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:26 unbound: [11122:0] error: SERVFAIL <browser.pipe.aria.microsoft.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:23 unbound: [11122:0] error: SERVFAIL <mmx-ds.cdn.whatsapp.net. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] error: SERVFAIL <pagead2.googlesyndication.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] error: SERVFAIL <ads.yieldmo.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] info: validation failure <tpc.googlesyndication.com. A IN>: SERVFAIL no DS for DS googlesyndication.com. while building chain of trust
11:29:21 unbound: [11122:0] error: SERVFAIL <googleads.g.doubleclick.net. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] info: validation failure <play.google.com. A IN>: SERVFAIL no DS for DS google.com. while building chain of trust
11:29:21 unbound: [11122:0] info: validation failure <www.google.com. A IN>: SERVFAIL no DS for DS google.com. while building chain of trust
11:29:21 unbound: [11122:0] error: SERVFAIL <hbopenbid.pubmatic.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] info: validation failure <dmx.districtm.io. A IN>: SERVFAIL no DS for DS districtm.io. while building chain of trust
11:29:21 unbound: [11122:0] error: SERVFAIL <fastlane.rubiconproject.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:21 unbound: [11122:0] error: SERVFAIL <bidder.criteo.com. A IN>: all the configured stub or forward servers failed, at zone .
11:29:20 unbound: [11122:0] error: SERVFAIL <securepubads.g.doubleclick.net. A IN>: all the configured stub or forward servers failed, at zone .
11:29:03 unbound: [11122:0] info: generate keytag query _ta-4a5c-4f66. NULL IN
11:29:02 unbound: [11122:0] info: start of service (unbound 1.10.1).
11:29:02 unbound: [11122:0] notice: init module 1: iterator
11:29:02 unbound: [11122:0] notice: init module 0: validator
11:29:02 unbound: [11122:0] notice: Restart of unbound 1.10.1.
11:29:02 unbound: [11122:0] info: 4.000000 8.000000 1
11:29:02 unbound: [11122:0] info: 2.000000 4.000000 1
11:29:02 unbound: [11122:0] info: 1.000000 2.000000 2
11:29:02 unbound: [11122:0] info: 0.524288 1.000000 3
11:29:02 unbound: [11122:0] info: 0.262144 0.524288 4
11:29:02 unbound: [11122:0] info: 0.000000 0.000001 9
11:29:02 unbound: [11122:0] info: lower(secs) upper(secs) recursions
11:29:02 unbound: [11122:0] info: [25%]=5.55556e-07 median[50%]=0.32768 [75%]=0.841429
11:29:02 unbound: [11122:0] info: histogram of recursion processing times
11:29:02 unbound: [11122:0] info: average recursion processing time 0.770515 sec
11:29:02 unbound: [11122:0] info: server stats for thread 0: requestlist max 5 avg 0.8 exceeded 0 jostled 0
11:29:02 unbound: [11122:0] info: server stats for thread 0: 21 queries, 1 answers from cache, 20 recursions, 0 prefetch, 0 rejected by ip ratelimiting
11:29:02 unbound: [11122:0] info: service stopped (unbound 1.10.1).
11:28:58 unbound: [11122:0] error: SERVFAIL <init-p01st.push.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:28:58 unbound: [11122:0] error: SERVFAIL <mobile.events.data.trafficmanager.net. A IN>: all the configured stub or forward servers failed, at zone .
11:28:58 unbound: [11122:0] error: SERVFAIL <mobile.events.data.trafficmanager.net. AAAA IN>: all the configured stub or forward servers failed, at zone .
11:28:56 unbound: [11122:0] error: SERVFAIL <trouter-neu-a.cloudapp.net. A IN>: all the configured stub or forward servers failed, at zone .
11:28:56 unbound: [11122:0] error: SERVFAIL <trouter-neu-a.cloudapp.net. AAAA IN>: all the configured stub or forward servers failed, at zone .
11:28:54 unbound: [11122:0] error: SERVFAIL <ocsp.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:28:53 unbound: [11122:0] error: SERVFAIL <time-ios.apple.com. AAAA IN>: all the configured stub or forward servers failed, at zone .
11:28:53 unbound: [11122:0] error: SERVFAIL <time-ios.apple.com. A IN>: all the configured stub or forward servers failed, at zone .
11:28:53 unbound: [11122:0] error: SERVFAIL <community.learningequality.org. A IN>: all the configured stub or forward servers failed, at zone .

This is actually really bad.

I understand that internet connectivity is difficult where you are, but nevertheless I would like to support your use case, too.

However, with losing one in four packets, this is difficult.

Only TCP will help you here because it will retransmit any DNS queries and responses which should drastically improve your results.
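To get a feeling for why retransmission matters so much at this loss rate, here is a back-of-the-envelope model. It assumes that losses are independent and that one lookup needs several query/response exchanges in a row; the figure of three exchanges is purely illustrative, not something measured on this system.

```python
def all_attempts_lost(round_trip_loss, attempts):
    """Chance that every one of `attempts` independent tries is lost."""
    return round_trip_loss ** attempts


def lookup_success_probability(round_trip_loss, attempts, exchanges):
    """Chance that `exchanges` sequential query/response pairs all succeed."""
    per_exchange = 1.0 - all_attempts_lost(round_trip_loss, attempts)
    return per_exchange ** exchanges


# Round-trip loss from the ping statistics quoted earlier: 29 of 40 received.
loss = 1 - 29 / 40
for attempts in (1, 2, 3):
    # exchanges=3 is an illustrative assumption, not a measured value.
    p = lookup_success_probability(loss, attempts, exchanges=3)
    print(f"{attempts} attempt(s) per exchange: success probability {p:.2f}")
```

Under these assumptions, each additional attempt per exchange raises the chance that the whole lookup completes, which is the effect TCP retransmission provides automatically.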

How saturated is your link on top of this? What is the maximum bandwidth you have and how much do you use?

Did you try using QoS to prioritise DNS?

Thank you for your feedback. Yes, I agree that this particular case with 25% packet drop is bad. However, on the other connection, with an avg latency of 300ms or thereabouts, even with next-to-no packet loss, it can sometimes take upwards of 10 minutes to get the DNS working again. So, it seems that with a latency of 300ms+ on average, Unbound just does not work efficiently for whatever reason.

I have not used QoS to prioritize DNS. I could do that. Any hints on how best to do so? I’ve never done that type of setup.

As for saturation, we are only two users on our system with a maximum of 4 devices. The main work we do is in a web browser and an email client, along with WhatsApp. Our link, when working correctly, supports streaming video, though really only one stream at a time, and we keep it down at the 360 or 240 resolution setting so it is smoother.

I wish we had better internet, but it is way better than nothing! Haha.

Thanks for your input. I’ll look to see how best to set up the QoS.

I would simply set up two classes for the outbound direction and would put port 53 UDP/TCP in the ACK class. Consider limiting the default class to about 80% of your average bandwidth.

In the inbound direction just keep a single default class.

IPFire is always using fq_codel which prefers a sparse stream over the video stream. So things like DNS should be working fine.

So if the packet loss is happening because the link is too saturated and packets are queueing for many seconds, then we can fix this with the QoS. If your link is simply broken and reception is bad, then there isn’t too much we can do about this.
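As a worked example of the 80% rule for the outbound direction, this takes the upload figure from the SpeedTest result quoted earlier as the average bandwidth. Whether a single good-day measurement is a fair average is an assumption, and the function name is made up for the example.

```python
def default_class_limit_kbit(avg_mbps, fraction=0.8):
    """Return a rate limit in kbit/s as a fraction of the average bandwidth."""
    return round(avg_mbps * 1000 * fraction)


# 1.63 Mbps is the upload figure from the SpeedTest result quoted earlier.
print(default_class_limit_kbit(1.63))  # prints 1304
```

On a link that varies as much as this one, it may be worth measuring at a few different times of day and using a lower figure.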

I’m coming back to this topic because it continues to be a real problem for my system usability. I did some reading on unbound configuration and tried out one of the ways to turn off DNSSEC, but that didn’t help.

Basically, if there is any variation in our internet connection quality, which there is many times per day, DNS security reports “Broken” and all the rDNS entries on the page show that they failed to resolve. When this happens, it can take 10+ minutes for the DNS to start working again. If I connect directly to our internet connection device, traffic passes. So, it is the DNS security that is getting in the way of IPFire working on our setup now, which was not the case prior to the change to Unbound, I believe.

I looked at this page: https://nlnetlabs.nl/documentation/unbound/howto-turnoff-dnssec/

And I decided to try out:

server:
    val-permissive-mode: yes

That didn’t change anything.

Is there any way to turn off this security feature so that Unbound just routes traffic without whatever is going on to keep it from working on my system? Again, I realize this is not ideal for security purposes, but a largely unusable/unreliable router is not ideal either.

Finally, doing some of my own sleuthing, I found the error shown below when starting unbound from the command line. Could this be part of the problem?

[root@ipfire unbound]# unbound -v
[1603450576] unbound[4758:0] notice: Start of unbound 1.11.0.
Oct 23 11:56:16 unbound[4758:0] error: can't bind socket: Address already in use for 127.0.0.1 port 8953
Oct 23 11:56:16 unbound[4758:0] error: cannot open control interface 127.0.0.1 8953
Oct 23 11:56:16 unbound[4758:0] fatal error: could not open ports
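
A note on this error: port 8953 is the default port of Unbound's remote-control interface, so "Address already in use" most likely just means that the Unbound instance started by the system was already running and holding that port when the second copy was started by hand. If that is the case, it is a side effect of the test rather than a cause of the DNS problem. A minimal sketch to check whether something is already bound to the port (the function name port_in_use is made up for this example):

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP bind to host:port fails with 'address already in use'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is not what we are testing
    return False


# 8953 is the control port named in the error message above.
print(port_in_use(8953))
```

If this prints True while the system's Unbound service is running, the error above is expected.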

Short and quick answer: Use DNSSEC supporting servers.
