Domain Name System Status:Broken | 141

Hi @troll-op,

I just wanted to see if anything was going wrong in the creation of the forward file.
I disabled all my dns servers and added the two google ones and set the protocol to TCP and my forward.conf is precisely the same as yours. My unbound.conf is also precisely the same as yours.

It seems that the problem with making a connection to the DNS servers is further along the path than unbound.
At this point I am very much out of my depth and can’t help further with suggestions for further debugging but there will be other people on the forum who will be able to give better suggestions of what to investigate next.
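
Since the conf files match, one check that might help narrow this down is whether the firewall can reach the upstream DNS-over-TLS port at all. This is only a minimal sketch, assuming `openssl` and `timeout` are installed on the box and that the upstream is Google (8.8.8.8, TLS hostname dns.google, port 853). The function name `check_dot` and the `OPENSSL` override variable are made up for illustration.

```shell
# Hypothetical helper: try a TLS handshake with a DNS-over-TLS upstream.
# Assumptions: openssl and coreutils timeout are installed; 853 is the
# standard DNS-over-TLS port (RFC 7858).
OPENSSL="${OPENSSL:-openssl}"

check_dot() {
    host="$1"        # upstream IP address, e.g. 8.8.8.8
    tls_name="$2"    # name expected in the certificate, e.g. dns.google
    # </dev/null makes s_client exit after the handshake instead of waiting
    # for input; timeout covers the case where packets are silently dropped.
    timeout 10 "$OPENSSL" s_client -connect "${host}:853" \
        -servername "$tls_name" </dev/null
}
```

Called as `check_dot 8.8.8.8 dns.google`, a printed certificate chain would suggest the path to port 853 is open, while a timeout would point at something between unbound and the upstream.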

Sorry I wasn’t able to help you enough on this topic.

No worries, all help is appreciated, even when there are no positive results. Besides we now know it’s not the conf files :grin:

If you use the IPS (Suricata), try disabling it. I still have the problem on one of my systems that Suricata blocks unbound without any log entry.

If you do not use Suricata, check whether there is another firewall or router in front of the IPFire that interferes with the DNS queries.

Have you already tried to resolve some addresses with dig via 8.8.8.8?
E.g. “dig ipfire.org @8.8.8.8” and “dig +dnssec ipfire.org @8.8.8.8”.
Maybe your ISP redirects all DNS traffic to servers that do not support DNSSEC.

You had me excited there for a second… but alas the machine does the same with IPS/IDS (suricata) and Guardian disabled. Was also thinking that this could have something to do with it, but failed to see how, as nothing really changes except the protocol.
dig slashdot.org @8.8.8.8

; <<>> DiG 9.8.3-P1 <<>> slashdot.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13040
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;slashdot.org.			IN	A

;; Query time: 13 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct  7 16:56:49 2020
;; MSG SIZE  rcvd: 30

Same result for ipfire.org; that ^ just happened to be the last one I tried.
Without TLS I have…
dig ipfire.org @8.8.8.8

; <<>> DiG 9.8.3-P1 <<>> ipfire.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5297
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ipfire.org.			IN	A

;; ANSWER SECTION:
ipfire.org.		3354	IN	A	81.3.27.38

;; Query time: 9 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct  7 17:08:38 2020
;; MSG SIZE  rcvd: 44

No clue… Not the end of the world.

8.8.8.8 should answer to ipfire.org with the ad flag.

[root@ipfire ~]# dig ipfire.org @8.8.8.8

; <<>> DiG 9.11.21 <<>> ipfire.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25276
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;ipfire.org.			IN	A

;; ANSWER SECTION:
ipfire.org.		3599	IN	A	81.3.27.38

;; Query time: 90 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct 07 19:33:03 CEST 2020
;; MSG SIZE  rcvd: 55

And more importantly, the +dnssec test. This must return the signatures (RRSIG records) needed to verify…

[root@ipfire ~]# dig +dnssec ipfire.org @8.8.8.8

; <<>> DiG 9.11.21 <<>> +dnssec ipfire.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40426
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;ipfire.org.			IN	A

;; ANSWER SECTION:
ipfire.org.		3294	IN	RRSIG	A 8 2 3600 20201015000000 20200924000000 54142 ipfire.org. Xu6EOxvrGtuMp4/oY0Hc5lzTCqrUUoDVrJDn4qo2sv+FVwvrlhLlG6nj 7HPGLD/TKLg/EWpy9WlpauBLpFTuszcsP/b992qjKfqSwpOTC+xjfyCV XlA5Zy6GgYCvF+29j/Av/ZeoGxjt69NgrMZ9rg3PRza7zMlw9Je4Ic4U USziwYkQvN4vp2FETH7017CH/b7HfSRU43r9bybm8jsKRy9y0P/mCY2z KpEPV4Qp1e39Fq4/W0M/4bRcaKb1K0EAQRzdRxeDofdrr+AbTDMPId0O 7QgS1w9asGj7RqEfX9QKhe2BHtTtCC/QkeTwT7B6RRNXOTReWaz8vwZl FVGJbA==
ipfire.org.		3294	IN	RRSIG	A 10 2 3600 20201015000000 20200924000000 6536 ipfire.org. iturglnGDaYGlLFppUl7UHN6DbXcs3UEHvmBbns5J9bsqSiuiTEP6ykl NPpg+f0Qm5cLRCzco3CrPd9q5Fo7fVzPK6oFqJuwEkERiMk/UppiZRZ4 w79BMHCRrnrB2SUtGjDfRPEdQ5s0jciDGb5NvdoCFFXCW1TVLvEkeOLh aYXYWQS/h3Pi2rxrA4sKuzBJBxYVzH5t177Dpm0B74hEVIYDNirQXOAO 0PDQZ66Zy1RF5TZsfbVdhRDHRxaaSCV4CMCFHejAGgQaJdEFKaByQ2jM x5skDlm8JdNIupBe6GXcAlCWm/0rQ4V1VivC4V9DaU6StRW75N+vrYr9 o8WnUw==
ipfire.org.		3294	IN	A	81.3.27.38

;; Query time: 14 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct 07 19:33:10 CEST 2020
;; MSG SIZE  rcvd: 651

You should run this on the IPFire itself because you need a DNSSEC-enabled dig version.
IPFire currently ships dig 9.11.21.
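
To avoid reading the full dig output by eye, the two things that matter here (the ad flag in the header and the RRSIG records in the answer) can be pulled out with a small filter. This is just a sketch; the function name `dnssec_summary` is hypothetical and it only looks at the text dig prints.

```shell
# Hypothetical helper: summarise DNSSEC evidence in dig output read from stdin.
# Prints e.g. "ad=yes rrsig=2".
dnssec_summary() {
    awk '
        BEGIN { ad = "no"; rrsig = 0 }
        # Header line, e.g. ";; flags: qr rd ra ad; QUERY: 1, ..."
        /^;; flags:/ {
            flags = $0
            sub(/^;; flags:/, "", flags)   # drop the label
            sub(/;.*/, "", flags)          # drop the counters after the ";"
            n = split(flags, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == "ad") ad = "yes"
        }
        # Resource records are the lines that do not start with ";".
        !/^;/ && $4 == "RRSIG" { rrsig++ }
        END { printf "ad=%s rrsig=%d\n", ad, rrsig }
    '
}
```

Usage would be `dig +dnssec ipfire.org @8.8.8.8 | dnssec_summary`. For the ipfire.org answer shown above it would print `ad=yes rrsig=2`.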

Here are the two…

dig ipfire.org @8.8.8.8

; <<>> DiG 9.11.21 <<>> ipfire.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36036
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;ipfire.org.			IN	A

;; ANSWER SECTION:
ipfire.org.		3599	IN	A	81.3.27.38

;; Query time: 232 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct 07 20:40:43 SAST 2020
;; MSG SIZE  rcvd: 55

and the dnssec…

dig +dnssec ipfire.org @8.8.8.8

; <<>> DiG 9.11.21 <<>> +dnssec ipfire.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12237
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;ipfire.org.			IN	A

;; ANSWER SECTION:
ipfire.org.		3599	IN	RRSIG	A 8 2 3600 20201015000000 20200924000000 54142 ipfire.org. Xu6EOxvrGtuMp4/oY0Hc5lzTCqrUUoDVrJDn4qo2sv+FVwvrlhLlG6nj 7HPGLD/TKLg/EWpy9WlpauBLpFTuszcsP/b992qjKfqSwpOTC+xjfyCV XlA5Zy6GgYCvF+29j/Av/ZeoGxjt69NgrMZ9rg3PRza7zMlw9Je4Ic4U USziwYkQvN4vp2FETH7017CH/b7HfSRU43r9bybm8jsKRy9y0P/mCY2z KpEPV4Qp1e39Fq4/W0M/4bRcaKb1K0EAQRzdRxeDofdrr+AbTDMPId0O 7QgS1w9asGj7RqEfX9QKhe2BHtTtCC/QkeTwT7B6RRNXOTReWaz8vwZl FVGJbA==
ipfire.org.		3599	IN	A	81.3.27.38
ipfire.org.		3599	IN	RRSIG	A 10 2 3600 20201015000000 20200924000000 6536 ipfire.org. iturglnGDaYGlLFppUl7UHN6DbXcs3UEHvmBbns5J9bsqSiuiTEP6ykl NPpg+f0Qm5cLRCzco3CrPd9q5Fo7fVzPK6oFqJuwEkERiMk/UppiZRZ4 w79BMHCRrnrB2SUtGjDfRPEdQ5s0jciDGb5NvdoCFFXCW1TVLvEkeOLh aYXYWQS/h3Pi2rxrA4sKuzBJBxYVzH5t177Dpm0B74hEVIYDNirQXOAO 0PDQZ66Zy1RF5TZsfbVdhRDHRxaaSCV4CMCFHejAGgQaJdEFKaByQ2jM x5skDlm8JdNIupBe6GXcAlCWm/0rQ4V1VivC4V9DaU6StRW75N+vrYr9 o8WnUw==

;; Query time: 214 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct 07 20:39:38 SAST 2020
;; MSG SIZE  rcvd: 651

This is when using TCP as protocol. Swapping over to TLS, the GUI shows status broken. Doing the above digs on slashdot, just to get different results.

and the dnssec…

dig +dnssec slashdot.org @8.8.8.8

; <<>> DiG 9.11.21 <<>> +dnssec slashdot.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35598
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;slashdot.org.			IN	A

;; ANSWER SECTION:
slashdot.org.		545	IN	A	216.105.38.15

;; Query time: 188 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Oct 07 20:46:34 SAST 2020
;; MSG SIZE  rcvd: 57

The keys are missing.
And whilst DNS partly resolves, all web browsers on the network fail to resolve any pages. nslookup/dig on a workstation also fail.

Now test the same with @127.0.0.1. If this fails, unbound has a problem. (Normally both should give the same result as @8.8.8.8. If the configured server is 8.8.8.8 (UDP), it should automatically fall back to TCP for large packets.)
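
The comparison can be scripted so that the local unbound and the upstream are asked the same question, once over UDP and once forced over TCP. This is a rough sketch only; `dig_status` and `compare_resolvers` are hypothetical names, and 8.8.8.8 is assumed to be the configured upstream.

```shell
# Hypothetical helper: pull the status (NOERROR, SERVFAIL, ...) out of the
# header line of dig output read from stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Ask the same question of the local unbound and of the upstream directly,
# once over UDP and once forced over TCP (+tcp is a standard dig option).
compare_resolvers() {
    name="$1"
    for server in 127.0.0.1 8.8.8.8; do
        for proto in +notcp +tcp; do
            status=$(dig "$proto" "$name" "@$server" | dig_status)
            # An empty status means dig printed no header, e.g. on a timeout.
            echo "$server $proto ${status:-no-reply}"
        done
    done
}
```

Running `compare_resolvers ipfire.org` prints four lines; if the 8.8.8.8 lines say NOERROR while the 127.0.0.1 lines do not, that would point at unbound rather than at the path to the upstream.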

I am new here, Where do you configure the upstream DNS choices?

@trish from the WUI, Network>Domain Name System

Thank you.

Changed to TLS and disabled external DNS servers.

dig slashdot.org @127.0.0.1

; <<>> DiG 9.11.21 <<>> slashdot.org @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30288
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;slashdot.org.			IN	A

;; ANSWER SECTION:
slashdot.org.		884	IN	A	216.105.38.15

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Oct 09 00:52:14 SAST 2020
;; MSG SIZE  rcvd: 57

And dnssec failed to list any keys.

dig +dnssec slashdot.org @127.0.0.1

; <<>> DiG 9.11.21 <<>> +dnssec slashdot.org @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17901
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;slashdot.org.			IN	A

;; ANSWER SECTION:
slashdot.org.		900	IN	A	216.105.38.15

;; Query time: 197 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Oct 09 00:51:58 SAST 2020
;; MSG SIZE  rcvd: 57

UPDATE: I did the core 150 update over the weekend (17.10.2020). Tested DNS over TLS. All is working. And on a side note, after restarts it no longer takes forever to start. Well done to the devops. :kissing_heart:

Hi Guys.

I have applied the update from 148 to 150 on 24 machines, and on one of them I have had problems with DNS.

With 148 everything worked correctly; after applying 150 there were problems with IPsec. In the end, I saw that DNS was in “Broken” status, which is why the IPsec tunnels did not come up. The machine also did not synchronize its time via NTP. Terrible.

In the end, the only way to make it work was to put the machine into “Working (Recursor Mode)” by checking the “Use ISP-assigned DNS servers” box.

Bye.

UPDATE: Correction: out of 24 machines, 19 have DNS problems.

UPDATE II: Mea culpa, I was asleep. I have a GeoIP group with a rule denying outgoing traffic to it, and I had put “A3 Worldwide Anycast Instance” in that group; that is why it did not work. It's SOLVED for me.

Have a nice coffee, @roberto :slight_smile:

This is my system level of IPFIRE:

|IPFire version|IPFire 2.25 (x86_64) - core151|
|Pakfire version|2.25-x86_64|
|Kernel version|Linux ipfire.xx.yy.com 4.14.198-ipfire #1 SMP Mon Oct 5 21:52:30 GMT 2020 x86_64 Intel® Pentium® Dual CPU E2200 @ 2.20GHz GenuineIntel GNU/Linux|

The DNS Broken status has been happening on/off from 141 to 151. DNS comes and goes for some random devices and/or destination URLs.

Andreas Otto (troll-op) has done a good job of asking questions. My DNS resolves most of the time; now and then it will quit resolving for some of the devices and/or destination URLs. A reboot tends to bring IPFire as a whole back to life, i.e. DNS resolves what it was not resolving before the reboot. The failures appear to be totally random across devices and/or destination URLs. I keep peeking at the log files, but there is nothing new to report that has not been reported already. Hopefully something will click and the cause will be found and fixed.

I am on 152 and have been experiencing similar wobbly DNS for months now. In the following I identify an exact time frame when these things started on my machine, as well as what looks like a parsing error.

What caught my eye was the statement in post #27 about spaces. Years (sigh, decades) ago I ran into a problem with a file parser that died when one used spaces instead of tabs in /etc/hosts. So I decided to check.

I logged in on my IPFire box and typed grep " " /etc/hosts which returned:

127.0.0.1	localhost.localdomain localhost

Replacing the space between localhost.localdomain and localhost with a tab made the name resolution start working again.

FYI: Name resolution stopped working after my router did a soft restart, so that might be a hint to another problem.

So, how bad is it? See for yourself:

[root@ipfire ~]# zgrep -c "all the configured stub or forward servers failed, at zone ." /var/log/messages* \
| awk -F: '{printf $2 "+"}END{print "0"}' \
| bc
121997

That is from 25 Mar to 13 Dec, or 264 days (don’t fall for the fence post error), giving an average of just over 462 log entries per day.
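
The day count and the average can be double-checked on the command line. This is a small sketch assuming GNU date (for the `-d` option); the year 2020 is an assumption, since only day and month are given above, and `days_inclusive` is a made-up helper name.

```shell
# Inclusive day count between two dates (both endpoints counted, which is
# the fence post issue mentioned above).
# Assumes GNU date; -u avoids daylight saving time shifts in the arithmetic.
days_inclusive() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $(( (end - start) / 86400 + 1 ))
}

days=$(days_inclusive 2020-03-25 2020-12-13)   # year is an assumption
total=121997                                   # count from the zgrep above
# Integer division, so the average is rounded down.
echo "$days days, about $(( total / days )) entries per day"
```

With these inputs the script reports 264 days and an average of 462 per day (rounded down), which agrees with the figures above.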

More importantly, it was in the same log file that I started seeing these errors:

DDNSResolveError: Could not resolve DNS entry

but they started on 22 Mar, three days earlier.

My logs on this machine go back to 27 Jan, so I think the above should help.

I am about to go on leave (in the mountains), so won’t have access to the firewall. In fact, it will be switched off while I am gone, so I cannot help any more on this right now.

I also got rid of the space in my hosts file and replaced it with a tab. It does not look like it matters.

[root@ipfire2 log]# zgrep -c "all the configured stub or forward servers failed, at zone ." /var/log/messages* \
| awk -F: '{printf $2 "+"}END{print "0"}' \
| bc
568340

This is my total for a year and a week. Only 568,340 on Jan 2, 2021
Update: new total 568,388 on Jan 4, 2021
Update: new total 568,410 on Jan 5, 2021
Update: new total 568,429 on Jan 6, 2021
Update: new total 568,446 on Jan 8, 2021 (after this I went from 4 DNS servers to 7 DNS servers listed.)
Update: new total 568,447 on Jan 9, 2021
Update: new total 568,449 on Jan 10, 2021
Update: new total 568,457 on Jan 12, 2021
Update: new total 568,466 on Jan 19, 2021 (It appears that the count is not increasing as fast. 7 DNS servers appears to be better than 4 DNS servers in this situation.)
Update: new total 568,474 on Jan 20, 2021 (Is it realistic to see this many “servers failed” messages?)
Update: new total 568,520 on Jan 25, 2021
Update: new total 568,542 on Jan 31, 2021
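
To put a number on whether the count is growing more slowly, the increase between two readings can be divided by the days between them. This is a minimal sketch using the totals listed above; `rate_per_day` is a hypothetical helper name.

```shell
# Average new "servers failed" entries per day between two readings.
# Arguments: earlier total, later total, days between the readings.
rate_per_day() {
    awk -v a="$1" -v b="$2" -v d="$3" 'BEGIN { printf "%.1f\n", (b - a) / d }'
}

rate_per_day 568340 568446 6    # Jan 2 to Jan 8, with 4 DNS servers
rate_per_day 568446 568466 11   # Jan 8 to Jan 19, with 7 DNS servers
```

This prints 17.7 for the first interval and 1.8 for the second, which fits the impression that the count rose more slowly after the change.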

My IPFire home setup is still failing the biggest test of all: the wife knows there is a repeating problem, because she keeps losing internet connectivity.
Other than that, nothing much to add to what has already been said.

Yes, there is a problem if the internet connection has some packet loss. Unbound measures the response times of the upstream servers, and if a server takes much longer to answer, or does not answer after some retries, it is put on hold for 15 minutes. Even if no upstream servers are left, unbound does not try to use it again. We hope to fix this in one of the next core updates.
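
If the 15-minute hold described above is what unbound keeps in its infrastructure cache (an assumption; unbound's `infra-host-ttl` option defaults to 900 seconds, which would match), then inspecting and clearing that cache may be worth a try when all upstreams are on hold. This is a minimal sketch, assuming `unbound-control` is set up on the box. The function name and the `UNBOUND_CONTROL` override variable are hypothetical.

```shell
# Hypothetical helper: inspect and clear unbound's infrastructure cache.
# Assumption: unbound-control is installed and remote control is enabled.
UNBOUND_CONTROL="${UNBOUND_CONTROL:-unbound-control}"

flush_upstream_state() {
    # Show what unbound currently records per upstream server.
    "$UNBOUND_CONTROL" dump_infra
    # Empty the infra cache so every upstream is tried again.
    "$UNBOUND_CONTROL" flush_infra all
}
```

Whether this actually lifts the hold on IPFire is untested here; it only uses the documented `dump_infra` and `flush_infra` subcommands.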

Tabs and spaces are equivalent in the /etc/hosts syntax. If they are not treated the same, it is a parser bug:
https://man7.org/linux/man-pages/man5/hosts.5.html
I have never had problems resolving names in /etc/hosts …
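
As an illustration of what the man page says (fields are separated by any number of blanks and/or tab characters), here is a toy parser that splits the same way. It is only a sketch, not the parser that unbound or the C library uses, and `hosts_pairs` is a made-up name.

```shell
# Hypothetical helper: list "address name" pairs from hosts-style lines on
# stdin. awk splits fields on any run of spaces and/or tabs, which is the
# separator rule hosts(5) describes. Lines starting with "#" are skipped;
# trailing comments are not handled in this sketch.
hosts_pairs() {
    awk '!/^#/ && NF >= 2 { for (i = 2; i <= NF; i++) print $1, $i }'
}
```

Feeding it the line from /etc/hosts with a space or with a tab between the two names gives the same two pairs, e.g. `hosts_pairs < /etc/hosts`. On the box itself, `getent hosts localhost` is another way to see whether the entry resolves.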

1 Like

Interesting bit of info on unbound.

From Arne.F: Yes, there is a problem if the internet connection has some packet loss. Unbound measures the response times of the upstream servers, and if a server takes much longer to answer, or does not answer after some retries, it is put on hold for 15 minutes. Even if no upstream servers are left, unbound does not try to use it again. We hope to fix this in one of the next core updates.

These may be questions that come from my lack of knowledge of unbound.
So for unbound, is it better to have more DNS servers configured rather than fewer, so that when a 15-minute timeout occurs there are more choices? Does “all the configured stub or forward servers failed, at zone .” happen when unbound has timed out all DNS servers?

This may be the last update of the “servers failed”
Update: new total 568,542 on Jan 31, 2021
Update: new total 574,672 on Feb 7, 2021
Yea, a big jump of “servers failed”

Quick update on what I have done lately.
- When websites/URLs quit working, some are still reachable at first, but the ratio changes until more and more of the sites quit. Ping to an IPv4 address still works.
- Before, I would reboot/restart IPFire. This would take minutes and cut off anybody who was still connected to the internet. The wife would be happy again, but she knew something had happened. Remote working has its ups and downs.
- Now I have found this command, and it only takes about 40+ seconds to execute. Those whose internet connection is still working do not even notice most of the time.
“/etc/init.d/network restart”
People are happier. Something gets reset in the network and things connect to the internet afterwards.
- Is this command meant to be used in this fashion, or is there another way? Just throwing it out there.

Hello @agibson!

This command line is a little different from yours (above). It reports each log file separately, by date. I don’t care about the message logs from a year ago. How about including only those since 1 Dec 2020 (or the first date in December)?

for logf in $(ls /var/log/messages* | sort -rV | tail -12) ; do ls -al "$logf" ; zgrep -ic "SERVFAIL" "$logf" ; done

Also what Core Update are you currently using? I saw Core Update 151 above. Is that still current?

-and-

Please add a screenshot of your DNS server page at https://IPFIRE:444/cgi-bin/dns.cgi or menu Network > Domain Name System.

EDIT: Update above command line