Red0 unable to re-establish connections, long-term dhcpcd problem

Failure - op (180) is not BOOTREPLY?! Something worth searching for!

Mar 22 21:42:41 gateway dhcpcd[13881]: dhcpcd-7.2.3 starting
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: executing `/var/ipfire/dhcpc/dhcpcd-run-hooks' PREINIT
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: executing `/var/ipfire/dhcpc/dhcpcd-run-hooks' CARRIER
Mar 22 21:42:41 gateway dhcpcd[13881]: DUID 00:04:03:00:02:00:04:00:05:00:00:06:00:07:00:08:00:09
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: IAID c0:08:aa:bb
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: adding address fe80::c2c1:c0ff:fe08:aabb
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: pltime infinity, vltime infinity
Mar 22 21:42:41 gateway dhcpcd[13881]: ipv6_addaddr1: Permission denied
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: delaying IPv6 router solicitation for 0.5 seconds
Mar 22 21:42:41 gateway dhcpcd[13881]: red0: delaying IPv4 for 0.9 seconds
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: soliciting an IPv6 router
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: delaying Router Solicitation for LL address
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: using hwaddr c0:c1:c0:08:aa:bb
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: soliciting a DHCP lease
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: sending DISCOVER (xid 0xbad3999d), next in 3.6 seconds
Mar 22 21:42:42 gateway dhcpcd[13881]: red0: op (180) is not BOOTREPLY
Mar 22 21:42:46 gateway dhcpcd[13881]: red0: sending DISCOVER (xid 0xbad3999d), next in 7.1 seconds
Mar 22 21:42:53 gateway dhcpcd[13881]: red0: sending DISCOVER (xid 0xbad3999d), next in 16.5 seconds
Mar 22 21:43:09 gateway dhcpcd[13881]: red0: sending DISCOVER (xid 0xbad3999d), next in 33.0 seconds
Mar 22 21:43:11 gateway dhcpcd[13881]: timed out
Mar 22 21:43:11 gateway dhcpcd[13881]: dhcpcd exited

Success - even though the ā€œignoring offerā€ message is displayed

Mar 22 21:45:23 gateway dhcpcd[16156]: dhcpcd-7.2.3 starting
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: executing `/var/ipfire/dhcpc/dhcpcd-run-hooks' PREINIT
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: executing `/var/ipfire/dhcpc/dhcpcd-run-hooks' CARRIER
Mar 22 21:45:23 gateway dhcpcd[16156]: DUID 00:04:03:00:02:00:04:00:05:00:00:06:00:07:00:08:00:09
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: IAID c0:08:aa:bb
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: adding address fe80::c2c1:c0ff:fe08:aabb
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: pltime infinity, vltime infinity
Mar 22 21:45:23 gateway dhcpcd[16156]: ipv6_addaddr1: Permission denied
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: delaying IPv6 router solicitation for 0.8 seconds
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: delaying IPv4 for 0.3 seconds
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: using hwaddr c0:c1:c0:08:aa:bb
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: soliciting a DHCP lease
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: sending DISCOVER (xid 0x5e4794a3), next in 4.0 seconds
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: offered x.x.x.101 from x.x.x.1
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: sending REQUEST (xid 0x5e4794a3), next in 3.9 seconds
Mar 22 21:45:23 gateway dhcpcd[16156]: red0: ignoring offer of x.x.x.101 from x.x.x.1
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: soliciting an IPv6 router
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: delaying Router Solicitation for LL address
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: acknowledged x.x.x.101 from x.x.x.1
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: probing address x.x.x.101/24
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: probing for x.x.x.101
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: ARP probing x.x.x.101 (1 of 3), next in 1.5 seconds
Mar 22 21:45:24 gateway dhcpcd[16156]: red0: probing, ignoring x.x.x.101 from x.x.x.1
Mar 22 21:45:26 gateway dhcpcd[16156]: red0: ARP probing x.x.x.101 (2 of 3), next in 1.4 seconds
Mar 22 21:45:27 gateway dhcpcd[16156]: red0: ARP probing x.x.x.101 (3 of 3), next in 2.0 seconds
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: DAD completed for x.x.x.101
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: leased x.x.x.101 for 600 seconds
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: renew in 300 seconds, rebind in 525 seconds
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: writing lease `/var/ipfire/dhcpc/red0.lease'
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: adding IP address x.x.x.101/24 broadcast x.x.x.255
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: adding route to x.x.x.0/24
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: adding default route via x.x.x.1
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: ARP announcing x.x.x.101 (1 of 2), next in 2.0 seconds
Mar 22 21:45:29 gateway dhcpcd[16156]: red0: executing `/var/ipfire/dhcpc/dhcpcd-run-hooks' BOUND
Mar 22 21:45:29 gateway dhcpcd.exe[16244]: red0 has been (re)configured with IP=x.x.x.101

Sadly, this hasn’t lead to anything helpful, other than me finding some source code for dhcpcd mentioning checking for BOOTREPLY but the code is unclear to me.

I’m sure my ISP uses Cisco equipment, but either way this is looking like something I’ll have to ask them about.

Unless anyone has any good ideas I’m going to have to go through the RFCs for DHCP! :disappointed:
Might need to get a PCAP from red0 during the problem.

I’ve since seen
Mar 23 19:57:36 gateway dhcpcd[13592]: red0: op (141) is not BOOTREPLY
in a few attempts. Usually the respones which are being marked as ā€œnot BOOTREPLYā€ are within the first 4 seconds, making me wonder if a recent change to dhcpcd could have caused this to be a problem for me.

tcpdump for red0 (with no filters) was unable to capture whatever packets caused that error. When I eventually was able to record a successful connection I captured the DHCP offer and DHCP ACK without problems.

https://git.ipfire.org/?p=ipfire-2.x.git;a=history;f=lfs/dhcpcd;h=83252e8976d153b033923ffd757687f1598206e5;hb=70af65df4198c58f99a333748faa39b39ad1c3c4

The problem is that we have updated it with core136 and reverted this back because user reported such problems with 8.02 to 8.10.
And now users have the same problems also with the old 7.2.3 version. Im not sure what happens here. Maybee some changes at the ISP’s are the real reason… But im not sure…

Thank you.
Do you have any theories as to what may have changed?

I’ve not been deliberately breaking my connection to test and, without doing that, dhcpcd has been reliably renewing every 5 minutes (my ISP lease time is only 10 minutes).

The problem I described above, happens if for any reason the modem/router is restarted, my link to the ISP is down or the ISP themselves reset their router. While these things happen rarely, when they happen IPFire remains disconnected forever, needing intervention.

Deliberately responding to a very old post here

I’m hoping that the dhcp rapid_commit option may have been causing this issue. I have had to run the script I mentioned here every minute for the past 5 years to ensure my IPFire system reconnects to the internet if it is ever disconnected.

I updated the wiki page and will test the option with my ISP now: www.ipfire.org - dhcpcd

2 Likes

From core 190 blog post / update.

Misc.

  • The RED interface can now be configured to no longer require the RFC4039 Rapid Commit option. This is a default option in almost all DHCP clients for over 20 years, but we have recently observed ISPs running broken DHCP servers which no longer work if this option is enabled. It can now be enabled or disabled using the setup command.
2 Likes

Hey @hvacguy thank you for posting but I’m confused.
What did you mean to add by linking to what I have just mentioned?

I was aware of the change in Core Update 190 and I’ve posted to this old thread now explaining that hope it will fix my issue.

I had to update the wiki as until yesterday it said that the rapid_commit option had to be changed in an /etc/ configuration file which no longer exists.

2 Likes

My bad.
Good job on wiki update. :smiley:
hopefully this fixes your WAN connection problems once and for all.

Unfortunately the problem (which might be specific to my hardware) still occurs. Changing the rapid_commit option makes no difference.

This says that your IPFire solicited a dhcp lease and was offered a lease and then a very short time later was offerred the same lease again which was ignored by dhcpcd.

The same lease being offered twice in short succession is not what should happen but from what was said by Roy Marples (originator and developer of dhcpcd) in an issue report, it would not cause dhcpcd any issues because it will just ignore the repeated version.

The only other advice that was given by Roy Marples was to comment out some options in dhcpcd.conf, one at a time, and see if things improve as your ISP’s dhcp server maybe is not RFC compliant and doesn’t like one of them.

You have already excluded the rapid_commit option but maybe there is another option that your ISP’s dhcp server has been designed to not work with even though it should.

Of course the ISP’s dhcp server should work with all the options listed in the dhcpcd.conf file as they have been around for a long time and are RFC defined but, as with the rapid_commit, some ISP’s are just having incorrectly working dhcp servers.

My previous ISP had a dhcp server that worked fine with the rapid_commit option but they were then bought out by another ISP and their dhcp server won’t work properly with that option.
No response back from them about the issue.

2 Likes

problem is its going to be hard to duplicate for Roy.

The ISP has the lease too short. 1000 seconds should be the shortest lease. Looking at what others did (OpenWRT,OpenSEnse,pfsense) they install a script that runs in chron to see if the internet is up and if not, manually bring down the interface,bring it back up and initiate dhcp. They added this as a ā€œkeep aliveā€ option for these Fiber DSL systems. Because this issue is with the way they configured the system and not the router OS or the modules like dchpd.

Thanks again @bonnietwin . This has been in my ā€œtoo hard basketā€ for a while but I should invest the time to experiment with DHCP options and try to get to the root cause.

1 Like

Unfortunately I have two problems. From my post 5 years ago:

The first issue is arguably more important.

I have been ignoring this for 5 years because I have a script which checks for connectivity running from /etc/fcron.cyclic which I shared previously in this post for someone who had a similar issue.

Does anyone have any ideas how to diagnose this plesae?

I’ve taken some packet captures of red0 before pulling the VDSL cable, waiting for the modem to disconnect, then reconnecting the cable. I can’t anything of value in them.
Unfortunately the packet captures contain a lot of personal metadata, so I’d prefer not to share them generally with this forum (but could PM an expert).

On the second issue, one retry resulted in this log in IPFire:

Mar 30 20:26:45 gateway dhcpcd[13171]: timed out
Mar 30 20:26:45 gateway dhcpcd[13171]: main: control_stop: No such file or directory
Mar 30 20:26:45 gateway dhcpcd[13171]: dhcpcd exited

but at least I can experiement with removing DHCP options as suggested.

So was this used equipment that windows10 was ran on?
Because there was an unconfirmed notice from Arch Linux to Main that Windows 10 flashed something in the firmware on an update in network cards that later was used in Arch Linux and caused DHCP renew issues. But no one responded back from Arch when Main asked what hardware was getting altered like this. So it might be something to look at and try a new card or usb interface that never has seen windows. Because that is what was reported the fix for them. Even though they should communicate better on this subject to Linux Main.

Do you know more about this? Or have you only read some posts about this anywhere?

I remember this in Main Dev email about two years ago, but I imagine it was discussed in Arch linux. But since you’ve mentioned this, I am going to search and see if its been brought up. I just wanted to mention this because of the possibility. But further investigation might be needed to rule out if the driver itself is malfunctioning too. The issue is being able to duplicate it in a dev lab and what interfaces are involved. I doubt my realtek driver I help integrate is causing issues because of all the testing that was done and didn’t have much issues with other modules, but I did had to add a flush buffers before dhcp renew, but I would think Linus would not overlook that because he’s pretty thorough with things and he tasked himself for the Intel driver and that may or may not be needed. Grimm (Linus) is very intimidating so some of us will sneak the patch in instead of getting scald for bring it up because its everybody’s responsibility is to double check everyone’s work. I was too busy to double check the Intel driver, but there are 128 more people that was tasked to do so. I haven’t looked to see if dchpd devs are having an issue with things, but rarely they have something new pop up that they can’t resolve quickly.

But I did notice I have a big delay with my server handing out dhcp clients since 6.6.56 when they boot up, and that was another intel driver that was edited. I should investigate it but I haven’t got around to that.

Looking at what was installed in IPFire:

Internet Systems Consortium DHCP Server 4.4.3-P1
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.

This is old and one would have to check if RTA_EXPIRES and RTA_MULTIPATH is configured in the Kernel which was one of the things that was going to be removed last year. Also I think there was a path change too in a Kernal call and that can explain why the file not found error occurred.

Have you checked? IPFire is open source! :wink:

1 Like

You have confused the dhcpd server with the dhcpcd client.

The dhcpd server is used to provide the dynamic and fixed IP’s to clients on the green and blue networks in IPFire.
The version in IPFire is the latest and last version from the ISC DHCP server.

ISC have changed from the DHCP package for the server to the Kea package.

The migration from DHCP to Kea is on the list but it is not easy as the syntax and config file construction are radically different. We need to figure out how to migrate all the users systems, irrespective of the options etc that they have used.

The dhcpcd client is the package from Roy Marples and that is the one being used as a client on the red interface.

We currently have dhcpcd-10.2.2 in Core Update 193 Testing., which is the latest version, from Feb this year.

Core Update 192 has version 10.2.1

1 Like