DNS issue causing loss of sanity and hair

OK, I have a DNS issue causing loss of sanity (and hair). I have one single entry, in the Hosts configuration and in the DHCP server, that does not want to work right. DNS lookup is the specific issue. Every other entry in my DNS configuration seems to work fine, but for this single entry nslookup fails from any device. The device is on the network/wire, and I can ping its IP address fine, but neither forward nor reverse lookup works. This device has two network interfaces: the first is wired Ethernet, the second is wireless. The wireless entry in DNS works only for reverse lookup, not forward lookup, just as the wired Ethernet interface fails forward lookup but can be pinged.

For example

  1. The name pi4modelb0.dd.org fails nslookup (forward), but should return 192.168.1.100 (ethernet) and 192.168.1.101 (wireless).

  2. The ip address of 192.168.1.100 fails to resolve via nslookup (reverse) from Windows 10

  3. The ip address of 192.168.1.101 successfully resolves via nslookup (reverse) from Windows 10, to pi4modelb0.dd.org

This is really odd, at least to me. Suggestions?

Hi @dachshund-digital - Welcome to the IPFire Community!

I’m assuming pi4modelb0.dd.org is a local address. Can you try something simple? Maybe just try pi4modelb0 or maybe testpi.

Maybe dd.org is messing things up.

I tried that, no joy, so I finally just removed the DNS entry and recreated it, and that seemed to fix the issue. Really odd that ONLY the single given entry would not work as expected. I have built and supported DNS on Linux, and have even written BIND configuration files from scratch when needed, so I am familiar with the technology. I am not explicitly familiar with unbound, so if removing and adding the entry via the UI had failed, I would have started looking at the configuration files. My guess is that something just got out of whack with the given entry. Maybe an obscure bug corrupted the forward lookup zone file? But if so, why just the one entry and not many, if not all, entries in the forward lookup zone file? Odd, very odd.

This continues to happen, ever since I upgraded to 146 or 147. Old local addresses that were already added work, but any new addresses added do not. Clearly something is corrupting or screwing up the configuration files via the web UI or such. This happened again today. I added one new entry. The reverse lookup works, the forward lookup does not. Older entries work forward and reverse. It is always the newest entry that fails forward lookup, until I go into the configuration files and manually add it to the file. Then magically things start working as expected. I struggle to believe that no one else is having the same issue. I am running on older, 32 bit hardware. Maybe this is something that does not happen on 64 bit hardware? That would be odd, but sometimes such things happen.

From my hosts.conf file…

local-data: "cable.dd.org 60 IN A 192.168.100.1"
local-data: "1.100.168.192.in-addr.arpa 60 IN PTR cable.dd.org"
local-data: "resource.dd.org 60 IN A 192.168.1.253"
local-data: "253.1.168.192.in-addr.arpa 60 IN PTR resource.dd.org"

For the cable entry, nslookup cable.dd.org (forward lookup) does not work, but nslookup 192.168.100.1 (reverse lookup) works.

The resource entry, a very old one, works fine for forward and reverse lookup. This makes no sense at all. Surely someone has seen this, had this issue? This is really getting to the point where I am going to have to stop using ipfire and find something that actually works consistently. I used BIND for years on Linux and never had this type of issue that keeps recurring.
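
As a rough cross-check of an excerpt like the one above, the A/PTR pairing can be verified mechanically. The sketch below is only an illustration: the function names are hypothetical, and it assumes every record uses the exact local-data: "name TTL IN TYPE value" layout shown in the excerpt. It checks the file for internal consistency only and says nothing about what unbound actually serves.

```python
import ipaddress
import re

# Matches lines such as: local-data: "cable.dd.org 60 IN A 192.168.100.1"
LOCAL_DATA = re.compile(
    r'^\s*local-data:\s*"(\S+)\s+(\d+)\s+IN\s+(A|PTR)\s+(\S+)"\s*$'
)

SAMPLE = '''local-data: "cable.dd.org 60 IN A 192.168.100.1"
local-data: "1.100.168.192.in-addr.arpa 60 IN PTR cable.dd.org"
local-data: "resource.dd.org 60 IN A 192.168.1.253"
local-data: "253.1.168.192.in-addr.arpa 60 IN PTR resource.dd.org"
'''


def parse_local_data(text):
    """Return (a_records, ptr_records) parsed from hosts.conf-style text."""
    a_records = {}    # hostname -> IPv4 address
    ptr_records = {}  # reverse name (x.x.x.x.in-addr.arpa) -> hostname
    for line in text.splitlines():
        match = LOCAL_DATA.match(line)
        if not match:
            continue  # comments, local-zone lines, and blanks are skipped
        owner, _ttl, rtype, value = match.groups()
        if rtype == "A":
            a_records[owner] = value
        else:
            ptr_records[owner] = value
    return a_records, ptr_records


def unpaired_a_records(text):
    """List hostnames whose A record has no PTR record pointing back at them."""
    a_records, ptr_records = parse_local_data(text)
    missing = []
    for host, ip in a_records.items():
        reverse_name = ipaddress.ip_address(ip).reverse_pointer
        if ptr_records.get(reverse_name) != host:
            missing.append(host)
    return missing


if __name__ == "__main__":
    print(unpaired_a_records(SAMPLE))
```

For the four lines quoted above, both A records have a matching PTR, so the list comes back empty. That is consistent with the file itself looking clean while the lookups still fail.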

I usually stop/start DHCP and restart unbound (from console) when I have local resolution issues.

When a new system is introduced into the LAN (laptop, desktop, VM), it takes time for other clients to resolve it by name. When a system leaves the LAN, it takes time for that hostname/ip to be ‘forgotten’.

Right, the TTL scenario… I would say that is part of it, but I have validated that the issue remains after more than 24 hours. Moreover, I have no local DNS sources other than ipfire. Months ago I had a dedicated DNS server, but retired it because it did not make sense, at the time, for a home lab setup to have multiple local DNS servers. I have built BIND-based DNS servers and used them for years, so you can understand that this issue with ipfire, which I do not recall ever having before the 147 release, is now becoming a real problem. I have zeroed out the configuration twice and recreated each DNS entry (both A records and PTR records) explicitly now, and I really don’t want to do it again. The scenario has been the same both times: the original entries work fine until I add a new entry, which is rare, and then something sometimes goes sideways. I have added several entries with no issues, then had one that just refused to work at all. I have reviewed hosts.conf, and it looks clean and legit, so I can only think that the issue is somewhere else. Given that non-local DNS resolution and forwarding are working fine, and this affects only local scope, that would suggest again that it is core to something in the ipfire DNS implementation. When the issue happens, even nslookup on the ipfire console fails for forward lookup but works for reverse lookup… you can’t tell me that makes sense. If neither forward nor reverse worked, then I would agree there is a cache issue, maybe. But if I reboot ipfire after a change and validate at the console, it should work consistently and immediately.

Here is an example of exactly what I am talking about. I created a new ipfire server with the latest build/release (148). As I added test entries to the DNS configuration, each one worked until I added one more, and the newest one fails to resolve as a forward lookup. But reverse lookup does work.

# cat /etc/unbound/hosts.conf
# This file is automatically generated and any changes
# will be overwritten. DO NOT EDIT!

local-data: "wall.dd.org 60 IN A 192.168.1.91"
local-data: "91.1.168.192.in-addr.arpa 60 IN PTR wall.dd.org"
local-zone: dd.org typetransparent
local-zone: local typetransparent
local-data: "cable.dd.org 60 IN A 192.168.100.1"
local-data: "1.100.168.192.in-addr.arpa 60 IN PTR cable.dd.org"
local-data: "green.dd.org 60 IN A 192.168.1.91"
local-data: "red.dd.org 60 IN A 192.168.0.91"
local-data: "91.0.168.192.in-addr.arpa 60 IN PTR red.dd.org"
local-data: "zzde61263.local 60 IN A 192.168.1.166"
local-data: "166.1.168.192.in-addr.arpa 60 IN PTR zzde61263.local"

[root@wall ~]# nslookup 192.168.1.166
166.1.168.192.in-addr.arpa      name = zzde61263.local.

[root@wall ~]# nslookup zzde61263.local
Server:         127.0.0.1
Address:        127.0.0.1#53
Name:   zzde61263.local
Address: 192.168.1.166
** server can't find zzde61263.local: NXDOMAIN
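
The transcript above contains both an answer and an NXDOMAIN line for the same name. One possible reading, offered as an assumption rather than a diagnosis, is that nslookup sent more than one query for the name (for example an A query followed by an AAAA query) and only one of them failed. The hypothetical helper below does nothing more than split such output into the addresses returned and the errors reported, which makes mixed results easy to spot when testing many names.

```python
import re

SAMPLE_OUTPUT = """Server:         127.0.0.1
Address:        127.0.0.1#53
Name:   zzde61263.local
Address: 192.168.1.166
** server can't find zzde61263.local: NXDOMAIN
"""


def summarize_nslookup(output):
    """Return (addresses, errors) extracted from nslookup text output."""
    addresses = []
    errors = []
    seen_name = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            seen_name = True
        elif line.startswith("Address:") and seen_name:
            # Address lines before the first "Name:" describe the server itself.
            addresses.append(line.split(":", 1)[1].strip())
        elif line.startswith("**"):
            # Error lines end with a status word such as NXDOMAIN or SERVFAIL.
            status = re.search(r":\s*(\w+)\s*$", line)
            errors.append(status.group(1) if status else line)
    return addresses, errors


def answered_but_also_failed(output):
    """True when the output has at least one address and at least one error."""
    addresses, errors = summarize_nslookup(output)
    return bool(addresses) and bool(errors)


if __name__ == "__main__":
    print(summarize_nslookup(SAMPLE_OUTPUT))
    print(answered_but_also_failed(SAMPLE_OUTPUT))
```

If the mixed result does come from separate queries, running nslookup with an explicit query type (its -type=A and -type=AAAA options) should show which one returns NXDOMAIN.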

Once this error appears, the environment goes downhill fast: forward references that once worked stop working, until you end up with a mismatched scenario where some forward lookups work and some don’t. Stopping and restarting the DNS service does not seem to make a difference every time, based on the testing I have been doing. In this latest test, stopping and restarting the service did improve the situation, but that is not a solution; at best it is a workaround.

Unfortunately, this issue in effect makes ipFire completely useless to us for any DNS resolution functions. This is the 3rd time I have been able to consistently recreate the issue. Sometimes it takes 100 entries, sometimes fewer; in this last case, about 5 A/PTR pairs.

Before the errors, cable.dd.org forward and reverse lookup worked fine; now, after the error, forward lookup fails.

[root@wall ~]# nslookup cable.dd.org
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:   cable.dd.org
Address: 192.168.100.1
** server can't find cable.dd.org: NXDOMAIN

[root@wall ~]# nslookup 192.168.100.1
1.100.168.192.in-addr.arpa      name = cable.dd.org.

Are we completely missing something here? As I noted before, I have never seen this behavior on a BIND DNS server instance that was configured as it should be, in an appropriate manner. I guess this issue has not gained the exposure it maybe should have, given that serious (scaled) environments likely have dedicated DNS resources that are not integrated into a firewall? Just a guess.

Hi,

I get exactly the same results as you: I see you also get the right IP address for an A request followed by the NXDOMAIN line.

I am not an expert, but I have a suggestion: check that /etc/unbound/unbound.conf contains these lines:

# Import any local configurations
include: "/etc/unbound/local.d/*.conf"

If the above is present, please check whether the folder /etc/unbound/local.d/ contains any block list. Mine contains a HUGE block list because I installed the DNSBlockList script (found in one of the pages of this forum). Every entry in that list returns NXDOMAIN as its answer…

Next, check that /etc/unbound/unbound.conf contains these lines:

	# Include hosts
	include: "/etc/unbound/hosts.conf"

Also, this might help:

	# Listen on all interfaces
	interface-automatic: yes
	interface: 0.0.0.0

And, in my case, all IP addresses are also registered in DHCP Leases, so this line helps to get A records from the DHCP lease database. In my case, the PTR records returned by unbound come from this file!!

	# Include DHCP leases
	include: "/etc/unbound/dhcp-leases.conf"

With all these present in the server configuration, the answers should come from the local files you edited…

If no answers are found in the above files, I think the next check will be based on this server config line:

	# Include any forward zones
	include: "/etc/unbound/forward.conf"

You can check the answer from the DNS server you have in /etc/unbound/forward.conf.
In my case, the server listed in /etc/unbound/forward.conf is the one that responds with NXDOMAIN to my requests. This seems to be why, after I get the right IP from hosts.conf, I also see the line with NXDOMAIN afterwards.
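
The include checks suggested above can be scripted. The following is a minimal sketch, assuming the four include paths quoted in this post and a simple one-directive-per-line file; the function names are hypothetical and the check is purely textual (it does not ask unbound what it actually loaded).

```python
EXPECTED_INCLUDES = [
    "/etc/unbound/local.d/*.conf",
    "/etc/unbound/hosts.conf",
    "/etc/unbound/dhcp-leases.conf",
    "/etc/unbound/forward.conf",
]


def active_includes(conf_text):
    """Return the include paths found in unbound.conf text, ignoring comments."""
    paths = []
    for raw in conf_text.splitlines():
        # Drop anything after '#'; assumes no path contains a '#' character.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("include:"):
            paths.append(line.split(":", 1)[1].strip().strip('"'))
    return paths


def missing_includes(conf_text, expected=EXPECTED_INCLUDES):
    """Return the expected include paths that the text does not contain."""
    found = set(active_includes(conf_text))
    return [path for path in expected if path not in found]


if __name__ == "__main__":
    with open("/etc/unbound/unbound.conf") as handle:
        print(missing_includes(handle.read()))
```

An empty list means all four include lines are present and not commented out. Any path that is printed is worth checking by hand.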

Again, I am not an expert - please take my advice with caution!

hope it helps,
H&M

Thanks for the suggestions, will work through them, testing accordingly, of course.

As an update… we are now having to stop and restart the unbound service on ipfire each time we change any DNS entries, and at least once every 24 hours due to DNS resolution failures. The unbound service stops resolving at least once or twice a day now, even if we make no changes in a given 24 hour period. Right after a restart, everything works fine. But at some point over the course of several hours, unbound just stops resolving queries. Again, this has been happening on my 32 bit firewall since release 147. On 146, I would maybe have to restart unbound once every few months. There is no logical reason why queries would fail like this. Again, only local queries are failing, and only forward queries, never reverse queries. I would think this points to some very specific code in the unbound service that would be suspect.

I’m in the same situation as previous posters. From the ipfire console (10.0.0.1),

[root@ipfire ~]# nslookup 10.0.0.18
18.0.0.10.in-addr.arpa name = mint.lan.

[root@ipfire ~]# nslookup mint.lan 10.0.0.1
Server: 10.0.0.1
Address: 10.0.0.1#53

Name: mint.lan
Address: 10.0.0.18
** server can’t find mint.lan: NXDOMAIN

The solution seems to be to restart unbound, to sync up the DHCP entries with the DNS cache.

I finally set up a small script and action trigger: our internal monitoring system monitors DNS lookup queries, and if a failure threshold is exceeded, we bounce the unbound service. This is an ugly, nasty hack IMHO. We are seriously considering setting up standalone local DNS servers based on BIND, like we once had.

I also stumbled across this problem today, on two systems already running this version live.
A quick fix for everyone would be much appreciated.

I’m having problems with VPN N2N connections because my U148 systems are failing to resolve the external domains for the VPN server.

I get errors like this now on various systems with U148 installed.

error: SERVFAIL <myclient.mydomain.de.localdomain. AAAA IN>: all the configured stub or forward servers failed, at zone .  

Any solution for this in sight?

@dachshund-digital
Could you please provide this script?

I can’t provide the script, as it is integrated into our monitoring solution. But I can tell you what it does: it just runs a simple DNS query by name and by IP for a few devices, internal and external to our local DNS on ipFire. If any query fails, the unbound service is stopped and restarted by our monitoring solution, and a notification is sent via typical enterprise monitoring methods, SNMP, etc.
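
For anyone who wants something similar, here is a minimal sketch of the idea described above. It is not the actual script. The host list, the threshold, and the restart command are all assumptions for illustration (in particular, /etc/init.d/unbound restart is assumed to be the way to restart the service). It also uses Python's socket functions, which go through the system resolver and so may consult /etc/hosts as well as DNS.

```python
import socket
import subprocess

# Hypothetical values: adjust the probes, threshold, and restart command.
CHECKS = [("cable.dd.org", "192.168.100.1")]
FAILURE_THRESHOLD = 1
RESTART_COMMAND = ["/etc/init.d/unbound", "restart"]  # assumed service script


def count_failures(checks, forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Try each (name, ip) pair by name and by IP; return the failure count."""
    failures = 0
    for name, ip in checks:
        try:
            forward(name)   # forward lookup: name -> address
        except OSError:     # socket.gaierror and socket.herror are OSErrors
            failures += 1
        try:
            reverse(ip)     # reverse lookup: address -> name
        except OSError:
            failures += 1
    return failures


def default_restart():
    """Restart unbound using the assumed service script."""
    subprocess.run(RESTART_COMMAND, check=False)


def check_and_restart(checks, threshold=FAILURE_THRESHOLD, restart=None,
                      **resolvers):
    """Restart the service when failures reach the threshold; report if so."""
    failures = count_failures(checks, **resolvers)
    if failures >= threshold:
        (restart or default_restart)()
        return True
    return False


if __name__ == "__main__":
    if check_and_restart(CHECKS):
        print("lookup failures detected, unbound restarted")
```

Run from cron or from a monitoring agent, this restarts the service and leaves notification (SNMP, mail, and so on) to whatever system calls it. The resolver functions are passed in as parameters so the logic can be tried without touching a live DNS server.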