Core 180: Unbound or DHCP issue?!?

roberto · 24 October 2023 17:05

Continuing the discussion from Unbound starts and stops a lot for unknown reason:

Hi.

I have an IPFire that has 180 and it is giving me these problems and the result is that it is impossible to work.

In the Unbound LOG, the following appears:

|19:01:39|unbound: [4388:0]|info: generate keytag query _ta-4a5c-4f66. NULL IN|
|---|---|---|
|19:00:49|unbound: [4388:0]|info: start of service (unbound 1.18.0).|
|19:00:49|unbound: [4388:0]|notice: init module 1: iterator|
|19:00:49|unbound: [4388:0]|notice: init module 0: validator|
|19:00:49|unbound: [4388:0]|notice: Restart of unbound 1.18.0.|
|19:00:49|unbound: [4388:0]|info: 0.131072 0.262144 1|
|19:00:49|unbound: [4388:0]|info: 0.065536 0.131072 3|
|19:00:49|unbound: [4388:0]|info: 0.032768 0.065536 7|
|19:00:49|unbound: [4388:0]|info: 0.016384 0.032768 10|
|19:00:49|unbound: [4388:0]|info: lower(secs) upper(secs) recursions|
|19:00:49|unbound: [4388:0]|info: [25%]=0.0249856 median[50%]=0.0351086 [75%]=0.0596846|
|19:00:49|unbound: [4388:0]|info: histogram of recursion processing times|
|19:00:49|unbound: [4388:0]|info: average recursion processing time 0.043774 sec|
|19:00:49|unbound: [4388:0]|info: server stats for thread 0: requestlist max 2 avg 0.136364 exceeded 0 jost led 0|
|19:00:49|unbound: [4388:0]|info: server stats for thread 0: 1608 queries, 1587 answers from cache, 21 recu rsions, 1 prefetch, 0 rejected by ip ratelimiting|
|19:00:49|unbound: [4388:0]|info: service stopped (unbound 1.18.0).|
|18:57:10|unbound: [4388:0]|info: generate keytag query _ta-4a5c-4f66. NULL IN|
|18:56:56|unbound: [4388:0]|info: start of service (unbound 1.18.0).|
|18:56:56|unbound: [4388:0]|notice: init module 1: iterator|
|18:56:56|unbound: [4388:0]|notice: init module 0: validator|
|18:56:56|unbound: [4388:0]|notice: Restart of unbound 1.18.0.|
|18:56:56|unbound: [4388:0]|info: 0.131072 0.262144 2|
|18:56:56|unbound: [4388:0]|info: 0.065536 0.131072 2|
|18:56:56|unbound: [4388:0]|info: 0.032768 0.065536 10|
|18:56:56|unbound: [4388:0]|info: 0.016384 0.032768 5|
|18:56:56|unbound: [4388:0]|info: lower(secs) upper(secs) recursions|
|18:56:56|unbound: [4388:0]|info: [25%]=0.0319488 median[50%]=0.0475136 [75%]=0.0630784|
|18:56:56|unbound: [4388:0]|info: histogram of recursion processing times|
|18:56:56|unbound: [4388:0]|info: average recursion processing time 0.064367 sec|
|18:56:56|unbound: [4388:0]|info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0|
|18:56:56|unbound: [4388:0]|info: server stats for thread 0: 36 queries, 17 answers from cache, 19 recursio ns, 0 prefetch, 0 rejected by ip ratelimiting|
|18:56:56|unbound: [4388:0]|info: service stopped (unbound 1.18.0).|
|18:54:10|unbound: [4388:0]|info: generate keytag query _ta-4a5c-4f66. NULL IN|
|18:54:09|unbound: [4388:0]|info: start of service (unbound 1.18.0).|
|18:54:08|unbound: [4388:0]|notice: init module 1: iterator|
|18:54:08|unbound: [4388:0]|notice: init module 0: validator|
|18:54:08|unbound: [4388:0]|notice: Restart of unbound 1.18.0.|

Is there any solution in sight?.

Thanks.

jon · 24 October 2023 18:07

Can you add details around this? I have the same stop/start issue but unbound seems to work OK for me.

EDIT: Also, please run this command and post the results:

find /var/log -iname "messages*" | sort -Vr | tail -15 |	\
xargs -I % bash -c 'echo ; printf "% \t" % ;				\
grep --color -c "start of service" <(zcat -f %) ;			\
grep --color "CORE UPGR:" <(zcat -f %)'

it prints this:

[root@ipfire ~] # find /var/log -iname "messages*" | sort -Vr | tail -15 |      \
xargs -I % bash -c 'echo ; printf "% \t" % ;                            \
grep --color -c "start of service" <(zcat -f %) ;                       \
grep --color "CORE UPGR:" <(zcat -f %)'

/var/log/messages.7.gz 	612

/var/log/messages.6.gz 	453

/var/log/messages.5.gz 	11

/var/log/messages.4.gz 	17
Sep 28 13:57:11 ipfire pakfire: CORE UPGR: Upgrading from release 178 to 179

/var/log/messages.3.gz 	5

/var/log/messages.2.gz 	623

/var/log/messages.1.gz 	527
Oct 16 19:28:34 ipfire pakfire: CORE UPGR: Upgrading from release 179 to 180

/var/log/messages 	442
[root@ipfire ~] #

roberto · 24 October 2023 18:54

Hi @jon

First of all, thank you for responding so quickly.

Apparently, because of this, they suffer constant and intermittent internet outages.

I have disabled the DHCP Server for both Green and Blue and it seems that the problem is over, but the problem is that this is not a solution, since the Blue Interface is needed.

Now I do not have access to the IPFire in question, but tomorrow, I will try to do what you told me about.

Greetings and thanks.

hvacguy · 24 October 2023 21:09

I’m not really any help in this matter.
But I thought unbound was for DNS.?
How is this related to DHCP.?
Hopefully that helps someone in the right direction for a fix.

roberto · 25 October 2023 05:29

Hi.

Here it is:


Last login: Sun Jan 22 09:08:58 2023 from 10.235.8.6
[root@bs ~]# find /var/log -iname "messages*" | sort -Vr | tail -15 |   \
xargs -I % bash -c 'echo ; printf "% \t" % ;                            \
grep --color -c "start of service" <(zcat -f %) ;                       \
grep --color "CORE UPGR:" <(zcat -f %)'

/var/log/messages.14.gz         2

/var/log/messages.13.gz         2

/var/log/messages.12.gz         4

/var/log/messages.11.gz         2

/var/log/messages.10.gz         2

/var/log/messages.9.gz  2

/var/log/messages.8.gz  2

/var/log/messages.7.gz  2

/var/log/messages.6.gz  2

/var/log/messages.5.gz  247
Sep 21 18:41:46 bs pakfire: CORE UPGR: Upgrading from release 175 to 178

/var/log/messages.4.gz  1149

/var/log/messages.3.gz  1036

/var/log/messages.2.gz  1004
Oct 11 08:44:24 bs pakfire: CORE UPGR: Upgrading from release 178 to 179

/var/log/messages.1.gz  1283
Oct 19 08:17:10 bs pakfire: CORE UPGR: Upgrading from release 179 to 180

/var/log/messages       565
[root@bs ~]#

When disabling the DHCP Server, no errors have occurred since yesterday when I disabled it.

If you need anything else, let me know.

Greetings.

cfusco · 25 October 2023 07:11

DHCP and Unbound are closely integrated in IPFire. I’ve outlined the two primary levels of integration below, based on my research.

DNS Resolver Information: When a client connects to the network, DHCP assigns it an IP address and sets Unbound as the DNS resolver, centralizing DNS queries through Unbound.
Local Resolution: DHCP updates the /etc/unbound/dhcp-leases.conf file, which Unbound reads to resolve local hostnames based on current DHCP leases.

I haven’t found additional levels of integration, but these are the key points.

hvacguy · 25 October 2023 09:42

Then perhaps the DHCP mechanism is messing up unbound. Hopefully this helps someone resolve this problem.
Looks like there have bin problems in the past
With the DHCP lease bridge used with unbound.

cfusco · 25 October 2023 11:02

I agree. It was a nice discovery by @roberto . I forgot to mention that there is a third layer of communication between Unbound and DHCP service.

RFC 2136 is commonly referred to as Dynamic DNS Updates, and it provides a mechanism for automatically updating DNS records dynamically as soon as changes occur. In IPFire, when this is enabled in the DHCP settings, the DHCP server will automatically update the DNS records within Unbound whenever a DHCP lease is granted or renewed. This ensures that the hostname and IP address mappings are always up-to-date without requiring manual intervention.

cfusco · 25 October 2023 12:21

Do you have the DNS update option enabled in your DHCP configuration? I use it and have never encountered issues with Unbound. If it’s not activated on your end, perhaps enabling the automatic DNS update might address the issue?

roberto · 25 October 2023 12:39

Hi @cfusco.

Thanks for reply.

I don’t have anything activated other than the DHCP Server for both Green and Blue, nothing else.

I’ll try to activate what you tell me.

Many thanks for everything.

Greetings.

jon · 25 October 2023 15:26

Based on the info, I do not think this has anything to do with unbound restarts.

I moved this to a new topic.

cfusco · 25 October 2023 17:15

I am not so sure. I think it might be due to DHCP DNS registration through the use unbound-dhcp-leases-bridge. While at this point I believe that RFC2136 implementation in DHCPD is not doing anything with Unbound, it could be that when you use unbound-dhcp-leases-bridge this behavior is somehow triggered. It could be worth to test this hypothesis.

arslanone · 25 October 2023 18:09

Hello,

I turned on the DNS update option about 3.5 hours ago and ever since my unbound has not stopped/started.

Before that it was starting/stopping every 5-10-15 minutes.

roberto · 25 October 2023 18:11

Hi guys.

I’ve tried what @cfusco said and it seems the problem has been fixed as those constant “stops” and “restarts” don’t appear.

|19:15:15|unbound: [4388:0]|error: SERVFAIL <cdn.insurads.com. HTTPS IN>: all the configured stub or forwar d servers failed, at zone . from 8.8.4.4 nodata answer|
|---|---|---|
|16:18:45|unbound: [4388:0]|error: SERVFAIL <mvconf.cloud.360safe.com. AAAA IN>: all the configured stub or forward servers failed, at zone . from 8.8.8.8 nodata answer|
|16:15:13|unbound: [4388:0]|info: generate keytag query _ta-4a5c-4f66. NULL IN|
|16:15:12|unbound: [4388:0]|info: start of service (unbound 1.18.0).|
|16:15:11|unbound: [4388:0]|notice: init module 1: iterator|
|16:15:11|unbound: [4388:0]|notice: init module 0: validator|
|16:15:11|unbound: [4388:0]|notice: Restart of unbound 1.18.0.|
|16:15:11|unbound: [4388:0]|info: 0.262144 0.524288 2|
|16:15:11|unbound: [4388:0]|info: 0.065536 0.131072 3|
|16:15:11|unbound: [4388:0]|info: 0.032768 0.065536 5|
|16:15:11|unbound: [4388:0]|info: 0.016384 0.032768 3|
|16:15:11|unbound: [4388:0]|info: lower(secs) upper(secs) recursions|
|16:15:11|unbound: [4388:0]|info: [25%]=0.0344064 median[50%]=0.0557056 [75%]=0.103765|
|16:15:11|unbound: [4388:0]|info: histogram of recursion processing times|
|16:15:11|unbound: [4388:0]|info: average recursion processing time 0.105848 sec|
|16:15:11|unbound: [4388:0]|info: server stats for thread 0: requestlist max 1 avg 0.0769231 exceeded 0 jos tled 0|
|16:15:11|unbound: [4388:0]|info: server stats for thread 0: 200 queries, 187 answers from cache, 13 recurs ions, 0 prefetch, 0 rejected by ip ratelimiting|
|16:15:11|unbound: [4388:0]|info: service stopped (unbound 1.18.0).|
|16:12:20|unbound: [4388:0]|info: generate keytag query _ta-4a5c-4f66. NULL IN|
|16:12:18|unbound: [4388:0]|info: start of service (unbound 1.18.0).|
|16:12:18|unbound: [4388:0]|notice: init module 1: iterator|
|16:12:18|unbound: [4388:0]|notice: init module 0: validator|
|16:12:18|unbound: [4388:0]|notice: Restart of unbound 1.18.0.|
|16:12:18|unbound: [4388:0]|info: 1.000000 2.000000 1|
|16:12:18|unbound: [4388:0]|info: 0.524288 1.000000 1|
|16:12:18|unbound: [4388:0]|info: 0.262144 0.524288 8|
|16:12:18|unbound: [4388:0]|info: 0.131072 0.262144 13|

What seems strange to me is that I have more IPFire boxes with the same configuration and only this one happens.

I hope it gets fixed, since even though what @cfusco said is good praxis, everything should work correctly without implementing this feature.

Greetings.

jon · 25 October 2023 18:42

@Roberto & @arslanone -

I am curious - did you have DNS Update enabled in the past?

or was it added due to this post above

arslanone · 25 October 2023 18:45

Hello Jon,

No, ever since I started using IPFire (maybe… ~2 years) I never enabled that DNS update setting. I looked at it, read the wiki and enabled it about 4+ hours ago after reading cfusco’s post.

bonnietwin · 25 October 2023 19:05

There has to be other interactions occurring here as I have not used the DNA Update in the dhcpd page and have not had this problem at all.

Here is the reuslt from my system using @jon command

find /var/log -iname "messages*" | sort -Vr | tail -15 |      \
xargs -I % bash -c 'echo ; printf "% \t" % ;                            \
grep --color -c "start of service" <(zcat -f %) ;                       \
grep --color "CORE UPGR:" <(zcat -f %)'

/var/log/messages.14.gz 	0

/var/log/messages.13.gz 	12

/var/log/messages.12.gz 	9
Aug  5 22:37:50 iapetus pakfire: CORE UPGR: Upgrading from release 176 to 177

/var/log/messages.11.gz 	0

/var/log/messages.10.gz 	31
Aug 18 12:30:42 iapetus pakfire: CORE UPGR: Upgrading from release 177 to 178

/var/log/messages.9.gz 	0

/var/log/messages.8.gz 	1

/var/log/messages.7.gz 	0

/var/log/messages.6.gz 	6

/var/log/messages.5.gz 	0

/var/log/messages.4.gz 	24
Sep 26 12:36:42 iapetus pakfire: CORE UPGR: Upgrading from release 178 to 179

/var/log/messages.3.gz 	3

/var/log/messages.2.gz 	5
Oct 12 13:29:15 iapetus pakfire: CORE UPGR: Upgrading from release 179 to 180

/var/log/messages.1.gz 	0

/var/log/messages 	0

I have never seen the multiple restarts of unbound.

arslanone · 25 October 2023 19:11

I’ll share my output as well, in case it helps:

find /var/log -iname "messages*" | sort -Vr | tail -15 |   xargs -I % bash -c 'echo ; printf "% \t" % ;                            \
grep --color -c "start of service" <(zcat -f %) ;                       \
grep --color "CORE UPGR:" <(zcat -f %)'

/var/log/messages.13.gz 	0

/var/log/messages.12.gz 	0

/var/log/messages.11.gz 	2071
Aug  6 15:26:41 firefly pakfire: CORE UPGR: Upgrading from release 176 to 177

/var/log/messages.10.gz 	2377

/var/log/messages.9.gz 	2453
Aug 21 07:55:22 firefly pakfire: CORE UPGR: Upgrading from release 177 to 178

/var/log/messages.8.gz 	2534

/var/log/messages.7.gz 	2060

/var/log/messages.6.gz 	2486

/var/log/messages.5.gz 	2009

/var/log/messages.4.gz 	2284
Sep 26 14:49:49 firefly pakfire: CORE UPGR: Upgrading from release 178 to 179

/var/log/messages.3.gz 	2723

/var/log/messages.2.gz 	1001
Oct 12 17:12:42 firefly pakfire: CORE UPGR: Upgrading from release 179 to 180

/var/log/messages.1.gz 	753

/var/log/messages.tar.gz 	234

/var/log/messages 	488

jon · 25 October 2023 19:23

The above test (and command line) was to look for unbound restarts and when the restarts first started appearing in the message log.

For me it seems to have started after there was an update to this script at CU 178:
https://git.ipfire.org/?p=ipfire-2.x.git;a=blobdiff;f=config/unbound/unbound-dhcp-leases-bridge;h=e9f022affa98a2483ca1f41d267fe443b0f048cf;hp=e89e0446b48e90e92a5a812db36c19226fc042dd;hb=f20ca78eff6e8baeb86361f55adf52819d1bae1f;hpb=1b6b4118b2234efa9b28553bc8f9b2c6b74bb5fb

For me this is a potential bug since I do not know much about unbound or the unbound-dhcp-leases-bridge script.

And as discussed before this may be more related to how you use DHCP vs how I use DHCP. I use it for most every device on my network.

When looking thru the unbound-dhcp-leases-bridge script, it seems like its purpose is only to pass DHCP lease info, DHCP fixed lease info ~~(and maybe one other thing that I cannot remember at the moment!)~~ to the DNS server so DNS can point to the local devices.

EDIT: and Static hosts!

EDIT2: One other point…
The change that was made does reload unbound, see code below, and so the restarts seen may be perfectly fine. I just don’t know for sure.

+               log.debug("Reloading Unbound...")
+               # Reload the configuration without dropping the cache
+               self._control("reload_keep_cache")

bonnietwin · 25 October 2023 19:38

Then we need to get that info from each of the users having the problem to see if they are using fully dynamic leases or fixed leases.

So my dhcpd setup uses fixed leases for all my systems on my green network.
My blue network uses dhcp dynamic leases but this network is only used for family and friends when they visit so is not used on an ongoing basis, just occasionally when they visit.

The other people who are having this problem can they confirm that they use dynamic leases on the green and/or blue networks.

Can other forum members who use only, or predominantly, fixed leases confirm if they do or don’t see the problem using the command from @jon

Similarly, can people who use only, or predominantly, dynamic leases confirm if they do or don’t see the problem using the command from @jon

The problem, if present and linked to the unbound dhcp leases bridge change when unbound was able to reload its configuration, should show from Core Update 177 onwards.
https://blog.ipfire.org/post/ipfire-2-27-core-update-177-released#misc