will also drop the internet connection but it claims it got the right IP.
Bringing up the red0 interface…
Starting dhcpcd on the red0 interface… [ OK ]
DHCP Assigned Settings for red0:
IP Address: 173.180.240.144
Subnet Mask: 255.255.252.0
Default Gateway: 173.180.240.1
DNS Server: 75.153.171.67 75.153.171.116
My power and internet are very stable so I don’t need to do this very often but I’d like to fix it for the time I am not at home and need VPN.
I also tried with a new install and is the same so it seems that is somehow my hardware, I guess some incompatibility with my modem, but before I had zeroshell and didn’t have this issue.
Any ideas?
My Hw is an APU 2 with bios up2date so no usb on red0
Modem is in bridge mode but it does include a switch.
I did more digging and on
[root@ipfire ~]# /etc/rc.d/init.d/networking/red stop
Stopping dhcpcd on the red0 interface… [ OK ]
Bringing down the red0 interface… [ OK ]
[root@ipfire ~]# /etc/rc.d/init.d/networking/red start
Bringing up the red0 interface…
Starting dhcpcd on the red0 interface… [ OK ]
DHCP Assigned Settings for red0:
IP Address: 173.180.243.195
Subnet Mask: 255.255.252.0
Default Gateway: 173.180.240.1
DNS Server: 75.153.171.67 75.153.171.116
On parallel ssh instance we have: tail -f /var/log/messages | grep dhc
will bring
Oct 7 09:27:41 d173-180-240-140 dhcpcd[18116]: sending signal ALRM to pid 11618
Oct 7 09:27:41 d173-180-240-140 dhcpcd[18116]: waiting for pid 11618 to exit
Oct 7 09:27:41 d173-180-240-140 dhcpcd[11619]: received SIGALRM, releasing
Oct 7 09:27:41 d173-180-240-140 dhcpcd[11619]: red0: removing interface
Oct 7 09:27:41 d173-180-240-140 dhcpcd[11619]: red0: releasing lease of 173.180.242.226
Oct 7 09:27:41 d173-180-240-140 dhcpcd[11619]: red0: deleting route to 173.180.240.0/22
Oct 7 09:27:41 d173-180-240-140 dhcpcd[11619]: red0: deleting default route via 173.180.240.1
Oct 7 09:27:41 d173-180-240-140 dhcpcd.exe[18117]: red0 has been brought down (STOP)
Oct 7 09:27:43 d173-180-240-140 dhcpcd[11619]: main: control_stop: No such file or directory
Oct 7 09:27:43 d173-180-240-140 dhcpcd[11619]: dhcpcd exited
Oct 7 09:28:47 d173-180-240-140 dhcpcd[18445]: dhcpcd-10.0.1 starting
Oct 7 09:28:47 d173-180-240-140 dhcpcd[18448]: DUID 00:01:00:01:24:4e:69:c1:00:0d:b9:4a:50:18
Oct 7 09:28:47 d173-180-240-140 dhcpcd[18448]: red0: waiting for carrier
Oct 7 09:28:50 d173-180-240-140 dhcpcd[18448]: red0: carrier acquired
Oct 7 09:28:50 d173-180-240-140 dhcpcd[18448]: red0: IAID b9:4a:50:18
Oct 7 09:28:50 d173-180-240-140 dhcpcd[18448]: red0: soliciting a DHCP lease
Oct 7 09:28:50 d173-180-240-140 dhcpcd[18448]: red0: probing address 173.180.243.195/22
Oct 7 09:28:55 d173-180-240-140 dhcpcd[18448]: red0: leased 173.180.243.195 for 14400 seconds
Oct 7 09:28:55 d173-180-240-140 dhcpcd[18448]: red0: adding route to 173.180.240.0/22
Oct 7 09:28:55 d173-180-240-140 dhcpcd[18448]: red0: adding default route via 173.180.240.1
Oct 7 09:28:56 d173-180-240-140 dhcpcd.exe[18505]: red0 has been (re)configured with IP=173.180.243.195
Oct 7 09:30:23 d173-180-240-140 dhcpd: DHCPREQUEST for 192.168.7.143 from 00:15:00:c6:c5:f0 (X250) via green0
Oct 7 09:30:23 d173-180-240-140 dhcpd: DHCPACK on 192.168.7.143 to 00:15:00:c6:c5:f0 (X250) via green0
Now physically unplug the cable:
Oct 7 09:45:25 d173-180-240-140 dhcpcd[21072]: red0: carrier lost
Oct 7 09:45:25 d173-180-240-140 dhcpcd[21072]: red0: deleting route to 173.180.240.0/22
Oct 7 09:45:25 d173-180-240-140 dhcpcd[21072]: red0: deleting default route via 173.180.240.1
Oct 7 09:45:25 d173-180-240-140 dhcpcd.exe[21945]: red0 has been brought down (EXPIRE)
Oct 7 09:45:29 d173-180-240-140 dhcpcd[21072]: red0: carrier acquired
Oct 7 09:45:29 d173-180-240-140 dhcpcd[21072]: red0: IAID b9:4a:50:18
Oct 7 09:45:31 d173-180-240-140 dhcpcd[21072]: red0: rebinding lease of 173.180.242.184
Oct 7 09:45:31 d173-180-240-140 dhcpcd[21072]: red0: probing address 173.180.242.184/22
Oct 7 09:45:37 d173-180-240-140 dhcpcd[21072]: red0: leased 173.180.242.184 for 14400 seconds
Oct 7 09:45:37 d173-180-240-140 dhcpcd[21072]: red0: adding route to 173.180.240.0/22
Oct 7 09:45:37 d173-180-240-140 dhcpcd[21072]: red0: adding default route via 173.180.240.1
Oct 7 09:45:37 d173-180-240-140 dhcpcd.exe[22167]: red0 has been (re)configured with IP=173.180.242.184
Route ( good state ):
root@ipfire ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 1002 0 0 red0
10.153.89.0 10.153.89.2 255.255.255.0 UG 0 0 0 tun0
10.153.89.2 0.0.0.0 255.255.255.255 UH 0 0 0 tun0
173.180.240.0 0.0.0.0 255.255.252.0 U 1002 0 0 red0
192.168.7.0 0.0.0.0 255.255.255.0 U 0 0 0 green0
Route on bad state
[root@ipfire ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 1002 0 0 red0
10.153.89.0 10.153.89.2 255.255.255.0 UG 0 0 0 tun0
10.153.89.2 0.0.0.0 255.255.255.255 UH 0 0 0 tun0
173.180.240.0 0.0.0.0 255.255.252.0 U 1002 0 0 red0
192.168.7.0 0.0.0.0 255.255.255.0 U 0 0 0 green0
As far as I can tell everything is pretty much the same. I also did some log filtering on red0, everything is messier but there one more line that is present on bad state Oct 7 09:48:29 d173-180-240-140 kernel: 8021q: adding VLAN 0 to HW filter on device red0
Not sure if is important or not.
I have found that the FIB table in a switch, at least for un-managed switches, can get out of kilter. It has also occurred with my modem, although that is in routing mode. Your re-plugging of the network cable might be temporarily fixing it.
I find that a switch has to be powered down for several minutes to correct this, although that will probably vary with switches. In your case, try powering down the modem for several minutes.
Good point, my modem is alive for more than 1 year so lets kill it.
Unfortunately didn’t fix the issue.
I also tried to power cycle the modem and the apu in the same time. It seems that the apu come alive first, cannot get an IP then remain in this state forever.
This is a different bug but will affect me as well because if power goes down this is the case. Here is what I could get after 10 minutes
root@ipfire ~]# /etc/rc.d/init.d/networking/red stop
Stopping dhcpcd on the red0 interface… Not running. [ WARN ]
[root@ipfire ~]# /etc/rc.d/init.d/networking/red start
Bringing up the red0 interface…
Starting dhcpcd on the red0 interface… [ OK ]
DHCP Assigned Settings for red0:
IP Address: 173.180.241.178
Subnet Mask: 255.255.252.0
Default Gateway: 173.180.240.1
DNS Server: 75.153.171.67 75.153.171.116
[root@ipfire ~]# ping 173.180.240.1
PING 173.180.240.1 (173.180.240.1) 56(84) bytes of data.
From 173.180.241.178 icmp_seq=1 Destination Host Unreachable
As you can see pinging the gatway doesn’t work. After plug in out the cable then
[root@ipfire ~]# ping 173.180.240.1
PING 173.180.240.1 (173.180.240.1) 56(84) bytes of data.
64 bytes from 173.180.240.1: icmp_seq=1 ttl=64 time=4.68 ms
64 bytes from 173.180.240.1: icmp_seq=2 ttl=64 time=4.56 ms
I also tried the power cycle ( no reboot or soft reset ) the apu and surprise, is coming alive ok which is good for me, except… in real world will be in same time with modem.
So I have 2 bugs.:
networking is dead on red after doing /etc/rc.d/init.d/networking/red start
dhcp client does not retry on failure to acquire an IP after hard reboot.
I recall that the issue of delaying the timing of red0 being brought up has been discussed previously in this forum.
It is also fairly simple to edit the /boot/grub/grub.conf file to impose a much longer delay on the default boot, although that was deemed, at the time, to not be the optimum solution.
I am aware of the “delay” option, but being a devops and doing automation I can tell you that is a bad solution. Usually the right way is to have a number of retries. That sounds easy right… we just change the /etc/init.d/networking/red where we have
if [ -z "${pid}" ]; then
# No wpa_supplicant is running. So it's save to start dhcpcd.
dhcpcd_start "${DEVICE}"
fi
I tried to apply this and was hit by the poorly written library. Instead of using return the function use exit and that is not even consistent ( bug number 3 ) . So in order to check a return you need to run the function in a subshell.
This is also really tough to debug because no messages are logged to syslog to be able to see / inspect after reboot.
I also discovered that if red0 network cable is not present at boot time then dhcpcd fails and if cable is connected later on will stay in that status which brings ( bug number 4 ). This is because if dhcpcd is already started it will handle nicely the red0 up and down.
Here is the diff that fix bug #2
[root@ipfire networking]# diff red red.orig
85d84
< logger "Bringing up the ${DEVICE} interface..."
102d100
< logger "Interface ${DEVICE} doesn't exists. ${FAILURE}"
175,185c173
< for i in {1..20}
< do
< if ( dhcpcd_start "${DEVICE}" );
< then
< logger "dhcpcd successfully started"
< break
< fi
< boot_mesg "dhcpcd does not start... retry #: $i"
< logger "dhcpcd does not start.... retry #: $i"
< sleep 20
< done
---
> dhcpcd_start "${DEVICE}"
OOOPs … turn out I didn’t bother to save functions.network … LOL so here is the pertinent part
dhcpcd_start() {
# This function will start a dhcpcd on a speciefied device.
local device="$1"
shift
local dhcp_start=()
boot_mesg -n "Starting dhcpcd on the ${device} interface..."
logger "Starting dhcpcd on the ${device} interface... _start"
# Check if a dhcpcd is already running.
local pid="$(dhcpcd_get_pid "${device}")"
if dhcpcd_is_running "${pid}"; then
boot_mesg "dhcpcd already running!" ${WARNING}
logger "dhcpcd already running"
echo_warning
exit 2
fi
# Check if a DHCP hostname has been set.
if [ -n "${RED_DHCP_HOSTNAME}" ]; then
dhcp_start+=( "-h" "${RED_DHCP_HOSTNAME}" )
fi
# Tell dhcpcd to use the configured MTU
if [ -n "${RED_DHCP_FORCE_MTU}" ]; then
dhcp_start+=( "--static" "mtu=${RED_DHCP_FORCE_MTU}" )
fi
# Append any further command line options
dhcp_start+=( $@ )
# Start dhcpcd.
echo "Invoking dhcpcd"
/sbin/dhcpcd "${dhcp_start[@]}" ${device} >/dev/null 2>&1
ret="$?"
echo "dhcpcd return code: $ret"
if [ "${ret}" -eq 0 ]; then
. /var/ipfire/dhcpc/dhcpcd-"${device}".info
if [ $ip_address ]; then
echo ""
echo_ok
boot_mesg " DHCP Assigned Settings for ${device}:"
boot_mesg_flush
boot_mesg " IP Address: $ip_address"
boot_mesg_flush
if [ -n "${RED_DHCP_HOSTNAME}" ]; then
boot_mesg " Hostname: $RED_DHCP_HOSTNAME"
boot_mesg_flush
fi
boot_mesg " Subnet Mask: $subnet_mask"
boot_mesg_flush
boot_mesg " Default Gateway: $routers"
boot_mesg_flush
boot_mesg " DNS Server: $domain_name_servers"
boot_mesg_flush
else
echo ""
echo_ok
boot_mesg "DHCP for ${device} still running..."
boot_mesg_flush
fi
else
echo "Can't start dhcpcd ${ret}"
logger "Can't start dhcpcd ${ret}"
$(exit "${ret}")
evaluate_retval
exit "$ret"
fi
}
Basically I added exit "$ret" to fix a rookie mistake and some logger. I also reduced the delay to 20 seconds but… it doesn’t fix my original bug. However it fix the delay of my modem start, makes ipfire to boot faster and is way more flexible.
If you have a github I can make a PR.
I also noticed that ifconfig red0 down then ifconfig red0 up worked once so I will try more tomorrow on this direction.
I’ll will also try to install pfsense to see if this issue is there, maybe is my HW combination.
Then I run git format-patch which creates the patch in the correct format for submitting. I find that this is much simpler than trying to manually format the patch into the correct style.
Then I run git send-email with the formatted patch as the argument and the development@lists.ipfire.org as the email destination (again defined in the git config together with the mail server details that you use for sending your mails)
Any patch submissions will then be automatically picked up by the IPFire mail server and added into the IPFire Patchwork system.
I will clean it, take a look on the boot_mesg function cause instead of logger "bla bla I should have something like logger $1 so more correct it should be like 2 individual commits / patches.
LOL, did some reading and it turns out I was misunderstanding how dhcpcd works.
According to docs dhcpcd - ArchWiki
The dhcpcd does the request then wait so the timeout is how long it will wait for a response and not a delay before requesting. Technically it can wait indefinitely ( with 0 )
The problem is that in current implementation the boot on the router will stop for the wait, there is way around that described in docs but not implemented in ipfire.
This kind of makes my retry patch useless. However the exit is a good fix but not necessary either. Did some tries and 5 minutes works in same way as 10 seconds.
Back to my issues, I made some progress. ifconfig red0 down then ifconfig red0 up fix my issue so I can easy make some script to automate this.
However on bad state:
The only difference is Message Type and rapid_commit
Based on other post I said to try to remove the rapid_commit option and give it a try.
It turns out that this fixed my issue. Really weird.
The good:
no patches or scripts are necessary,
The I’d like to:
fix boot_log to log to /var/log/messages
fix library to exit consistent on error and red script to proper log errors.
Few info from analyzing your logs and chatting with GPT4.
In the good state log, dhcp_message_type value is 5, which stands for DHCPACK, an acknowledgment message. In the bad state, the value is 2, which stands for DHCPOFFER, an offer message from the provider.
The rapid_commit option is part of the DHCP protocol and is supported by dhcpcd. It’s used to speed up the initial DHCP handshake. Normally, a DHCP handshake is a 4-step process: DISCOVER, OFFER, REQUEST, and ACK. With rapid_commit, the process is shortened to just 2 messages: SOLICIT (which is a combination of DISCOVER and REQUEST) and a combined OFFER/ACK. This makes the acquisition of an IP address faster.
If rapid_commit is present in the dhcpcd information file, it indicates that the DHCP server and client have agreed to use the Rapid Commit option for faster IP address assignment.
To me this means that shortened handshake fails with your provider but it is attempted nonetheless.
The rapid_commit option in dhcpcd should be used if the ISP has enabled rapid_commit on their server. If not enabled on the ISP server then having it enabled on the client should just be ignored. That is the status if the ISP is following the dhcp rfc’s.
Unfortunately not all do and other posters have had to comment out the rapid_commit otherwise the ISP’s dhcp server does not complete the dhcp communication flow properly.
The ISP has obviously configured their dhcp server not in line with the rfc’s.
For most people it is fine but there have been around 4 or 5 posters that have had problems that were solved by commenting out that option.
Another poster solved their problem by commenting out another standard option.
What should happen is that the client asks if the 2 message option can be used. The server ignores it and starts with the normal 4 message communication.
Some ISP’s have configured their dhcp servers so that they stop or do something very non-standard if they get the 2 message request
It doesn’t indicate that anything has been agreed between the server and the client. It just means that the client will ask if the 2 message option can be used. The server then either agrees and uses the 2 message option or ignores the request and starts with the full 4 message approach.
For most people with their ISP’s that is working fine but not with some ISP’s.
the message was present in the bad state, therefore from the point of view of dhcpcd the server agreed to the rapid handshake. As you said, the server is improperly configured.
EDIT: after reading again, now I see what you meant. Yes, your language is more precise in describing what happened.