I realized that I hadn’t actually reproduced the failure scenario in my systematic testing, so I created a virtual network where I could remove the DHCP server at will.
This time, booting with all (virtual) network cables plugged in, but simply no DHCP server around to respond (just like what happend to me this afternoon):
Jan 8 22:41:06 ipfire dhcpcd[1584]: dhcpcd-9.4.1 starting
Jan 8 22:41:06 ipfire dhcpcd[1587]: DUID 00:04:24:70:f1:41:01:82:d0:4b:b5:09:ae:87:60:d0:15:54
...
Jan 8 22:41:07 ipfire dhcpcd[1587]: red0: IAID 27:40:11:d7
Jan 8 22:41:07 ipfire dhcpcd[1587]: red0: adding address fe80::a00:27ff:fe40:11d7
Jan 8 22:41:07 ipfire dhcpcd[1587]: ipv6_addaddr1: Permission denied
Jan 8 22:41:07 ipfire dhcpcd[1587]: red0: soliciting an IPv6 router
...
Jan 8 22:41:07 ipfire dhcpcd[1587]: red0: soliciting a DHCP lease
And we see dhcpcd
retrying properly in the background:
22:42:09.575612 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 303
22:43:13.114342 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 303
22:44:17.252694 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 303
All this time it shows “Connecting…”, correctly.
I left it this way for a few minutes, then made the DHCP server available again. On the next retry, everything works:
Jan 8 22:45:21 ipfire dhcpcd[1587]: red0: offered 10.0.3.98 from 10.0.3.250
Jan 8 22:45:21 ipfire dhcpcd[1587]: red0: probing address 10.0.3.98/24
Jan 8 22:45:25 ipfire dhcpcd[1587]: red0: leased 10.0.3.98 for 3600 seconds
Jan 8 22:45:25 ipfire dhcpcd[1587]: red0: adding route to 10.0.3.0/24
Jan 8 22:45:25 ipfire dhcpcd[1587]: red0: adding default route via 10.0.3.250
Jan 8 22:45:25 ipfire dhcpcd.exe[2217]: red0 has been (re)configured with IP=10.0.3.98
Jan 8 22:45:25 ipfire unbound: [1454:0] info: service stopped (unbound 1.13.2).
...
And the tcpdump comfirms it, although I don’t know why it asked twice.
22:45:21.258739 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 303
22:45:21.258739 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 345: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 303
22:45:21.259428 REDACTED > 08:00:27:40:11:d7, ethertype IPv4 (0x0800), length 342: 10.0.3.250.67 > 10.0.3.98.68: BOOTP/DHCP, Reply, length 300
22:45:21.259553 08:00:27:40:11:d7 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 355: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 08:00:27:40:11:d7, length 313
22:45:21.259989 REDACTED > 08:00:27:40:11:d7, ethertype IPv4 (0x0800), length 342: 10.0.3.250.67 > 10.0.3.98.68: BOOTP/DHCP, Reply, length 300
All the usual reaction. Even a pakfire update ran.
This confirms that dhcpcd
does retry forever when there is no initial lease at boot, and that ipfire reacts correctly when it does get a lease.
So, with my changes to start dhcpcd
in the background, it seems that booting in an environment where there is no DHCP server immediately available on red0
will produce the correct recovery once the DHCP server arrives even if no other triggers occur (such as link up/down).