Red0 unable to re-establish connections, long-term dhcpcd problem

dnl · 19 March 2020 10:51

Hello,

I have a VDSL2 connection. IPFire connects through an ISP-supplied modem using DHCP. Over the past few months if my VDSL router is rebooted or drops out IPFire always fails to reconnect (dhcpcd fails). Running /etc/init.d/network restart red0 from a shell only works about 25% of the time. Mostly something fails and the red0 interface stays down. The only way I can reliably get a connection is to cold boot my IPFire system.

I’ve ignored the problem as it’s rare that my modem drops out. However the problem has started happening after IPfire reboots (for an update).

Worse, after the latest core update (today), my connection drops out after 15 minutes of usage. red0 refuses to come back up and dhcpd is marked as failed, even though it appears I am assigned an IP by my ISP. The PC running IPFire shows link and activity lights for the red0 interface as does the VDSL modem/router where it is connected. The modem/router shows an “internet” light indicating a connection is established.

Gracefully shutting down IPFire then cold booting the system is the only way to get the connection to establish!

This is a really frustrating problem, does anyone know how to troubleshoot it?

Here’s log output from a recent attempt to restart the red0 interface from a secure shell:

Mar 19 21:18:17 gateway dhcpcd[18046]: DUID 00:02:03:00:02:00:02:00:05:00:00:06:00:09:00:08:00:01
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: IAID 31:08:a4:d8
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: adding address fe80::4262:31ff:fe08:a4d8
Mar 19 21:18:17 gateway dhcpcd[18046]: ipv6_addaddr1: Permission denied
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: soliciting a DHCP lease
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: soliciting an IPv6 router
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: offered x.x.x.137 from x.x.x.1
Mar 19 21:18:17 gateway dhcpcd[18046]: red0: ignoring offer of x.x.x.137 from x.x.x.1
Mar 19 21:18:18 gateway dhcpcd[18046]: red0: probing address x.x.x.137/24
Mar 19 21:18:23 gateway dhcpcd[18046]: red0: leased x.x.x.137 for 600 seconds
Mar 19 21:18:23 gateway dhcpcd[18046]: red0: adding route to x.x.x.0/24
Mar 19 21:18:23 gateway dhcpcd[18046]: red0: adding default route via x.x.x.1
Mar 19 21:18:23 gateway dhcpcd.exe[18126]: red0 has been (re)configured with IP=x.x.x.137
Mar 19 21:18:53 gateway dhcpcd[18046]: forked to background, child pid 19606
Mar 19 21:19:37 gateway dhcpcd[20116]: sending signal ALRM to pid 19606
Mar 19 21:19:37 gateway dhcpcd[20116]: waiting for pid 19606 to exit
Mar 19 21:19:37 gateway dhcpcd[19606]: received SIGALRM, releasing
Mar 19 21:19:37 gateway dhcpcd[19606]: red0: removing interface
Mar 19 21:19:37 gateway dhcpcd[19606]: red0: releasing lease of x.x.x.137
Mar 19 21:19:37 gateway dhcpcd[19606]: red0: deleting route to x.x.x.0/24
Mar 19 21:19:37 gateway dhcpcd[19606]: red0: deleting default route via x.x.x.1
Mar 19 21:19:37 gateway dhcpcd.exe[20117]: red0 has been brought down (STOP)

I notice that despite it’s name dhcpcd.exe seems to be a custom IPFire shell script.

Thank you very much in advance,
Frustrated.

dnl · 20 March 2020 09:10

After rebooting to apply the latest kernel update, the problem happened twice after two 15 minute periods. Since then it has thankfully stayed up, but I’m sure it is not resolved. The connection won’t be re-established if it drops out for any reason.

No interface errors:

# ethtool -S red0 | grep error
 rx_crc_errors: 0
 rx_missed_errors: 0
 tx_aborted_errors: 0
 tx_carrier_errors: 0
 tx_window_errors: 0
 rx_long_length_errors: 0
 rx_short_length_errors: 0
 rx_align_errors: 0
 rx_errors: 0
 tx_errors: 0
 rx_length_errors: 0
 rx_over_errors: 0
 rx_frame_errors: 0
 rx_fifo_errors: 0
 tx_fifo_errors: 0
 tx_heartbeat_errors: 0

dnl · 20 March 2020 09:27

Working through the connection establishment scripts…

When I run this:

/etc/init.d/network restart red

for my DHCP connection it deletes a bunch of transient config files under /var/ipfire/red
It then runs:

/etc/rc.d/init.d/networking/red

which checks that an interface is assigned for “red”.
then checks ‘ip link show red0’ succeeds
if a custom MAC is used, sets it using ‘ip link set dev…’
then adds firewall rules to allow communications with the DHCP server
writes the interface name to /var/ipfire/red/iface
checks if the wlan-client is used on red (but it is not for me)
then runs:

dhcpcd_start red0

but I cannot find how this starts the shell script named “dhcpcd.exe” nor any other functions called dhcpcd_start.

dnl · 20 March 2020 09:53

Still not sure what dhcpcd_start does, but it ends up calling dhcpcd.exe as I can see output from the latter script in syslog.

dhcpcd.exe’s dhcpcd_up function:
does some work with variables
checks this is the ‘red’ device
if DHCP is set it:
checks to see if /var/ipfire/red/active trigger file exists
then does some checks to see if services need to be restarted at update
gets DNS server info from a files in which the ISP’s DNS servers are set by DHCP
gets the new IP address from the DHCP offer and writes it to /var/ipfire/red/local-ipaddress
gets new default gateway details and writes that in to /etc/hosts
if update is required it:
checks that /var/ipfire/red/active triggger file exists (again) or touches that file if it does not
writes the ISPs router IP to /var/ipfire/red/remote-ipaddress

*when I get a chance I’ll come back and edit this to note exactly where messages are written to syslog - so I’ll be able to pinpoint where the first failure happens.

arne_f · 20 March 2020 21:25

dhcpcd.exe is a script that configure the red interface to the settings delivered from dhcp client. For an unknown reason the dhcp client get “SIGALRM” and stop the connection. Maybee there is a link loss or an other problem. dhcpcd.exe is not the reason. It was called with the parameter to stop and does this correctly.

Mar 19 21:19:37 gateway dhcpcd.exe[20117]: red0 has been brought down (STOP)

rodneyp · 20 March 2020 21:32

Some ISP supplied modem/router are very slow at establishing DHCP connections.

You can overcome this by making red0 a static address, within the router’s range. It is then mandatory to configure Setup -> Network Setup -> DNS & Gateway, from a console.

Nevertheless, IPFire red0 does not seem to be capable of estabishing a re-connection and reboot is generally required if red0 comes down.

The router’s LAN side & IPFire red0 should keep working, even if the WAN side drops out. A faulty router ?

dnl · 21 March 2020 02:21

@rodneyp I think you might be right. It has seemed to me before that the script gives up even though it is succeeding.

@arne_f I see you’re the author of dhcpcd.exe. Is it possible to adjust how connections are established so that IPFire can tolerate longer connection time? I gather this will need to be done in one of the many scripts in /etc/rc.d/init.d/networking/red.up (see below):

From the log output in my first post, this is the block where the problem occurs:

	if [ $update ]; then
		[ -e "/var/ipfire/red/active" ] || touch /var/ipfire/red/active
		echo -n "$new_routers" > /var/ipfire/red/remote-ipaddress
		logger -p local0.info -t dhcpcd.exe[$$] "$interface has been (re)configured with IP=$new_ip_address"
		run_subdir ${rc_base}/init.d/networking/red.up/
		touch /var/ipfire/red/active
	fi

It seems to be happening with run_subdir /etc/rc.d/init.d/networking/red.up.
I see the run_subdir function is included /etc/init.d/functions. It seems to just run all executables from the directory supplied. In my case that’s /etc/rc.d/init.d/networking/red.up. That directory contains:

drwxr-xr-x 2 root root 4096 Mar 21 13:33 .
drwxr-xr-x 4 root root  166 Mar 21 13:32 ..
-rwxr-xr-- 1 root root  727 Jun 28  2019 01-conntrack-cleanup
-rwxr-xr-- 1 root root   96 Jun 28  2019 10-miniupnpd
-rwxr-xr-- 1 root root  189 Jun 28  2019 10-multicast
-rwxr-xr-- 1 root root   83 Jun 28  2019 10-static-routes
-rwxr-xr-- 1 root root   47 Jun 28  2019 20-firewall
-rwxr-xr-- 1 root root 1029 Dec 14 19:23 23-suricata
lrwxrwxrwx 1 root root   36 Jun 28  2019 24-RS-qos -> ../../../../../usr/local/bin/qosctrl
-rwxr-xr-- 1 root root   80 Feb  4 18:02 25-update-dns-forwarders
lrwxrwxrwx 1 root root   11 Jun 28  2019 27-RS-squid -> ../../squid
-rwxr-xr-- 1 root root   43 Jun 28  2019 30-ddns
-rwxr-xr-- 1 root root   72 Mar 12 22:52 35-guardian
-rwxr-xr-- 1 root root   45 Jun 28  2019 50-ipsec
-rwxr-xr-- 1 root root  118 Jun 28  2019 50-ovpn
-rwxr-xr-- 1 root root   54 Jun 28  2019 98-leds
-rwxr-xr-- 1 root root   71 Dec 14 19:23 99-beep
-rwxr-xr-- 1 root root  209 Jun 28  2019 99-fireinfo
-rwxr-xr-- 1 root root  335 Oct 23 03:52 99-geoip-database
-rwxr-xr-- 1 root root   76 Jun 28  2019 99-pakfire-update

No idea which of these is failing as there’s no output in the logs to help.

dnl · 21 March 2020 02:24

rodneyp wrote:

You can overcome this by making red0 a static address, within the router’s range. It is then mandatory to configure Setup → Network Setup → DNS & Gateway, from a console.

The issue is that my ISP has multiple small IPv4 subnets - I can’t possibly know which they will advertise. Is your connection a static IP or how do you achieve this?

EDIT: For clarification, I’m using DHCP for my connection to my ISP - the modem/router is acting as just a translator. The actual DHCP lease is requested by IPFire through the modem/router and responded to by the ISP’s DHCP server. I cannot see how I could set a static IP in this situation.

dnl · 21 March 2020 03:07

@arne_f I cannot reproduce the problem right now, so it is possible that the time-out problem which @rodneyp suggests may not be happening. I will try to gather evidence when this happens next.

This post contains full output from a successful reconnection attempt. Aside from being slow (mainly due to squid and suricata restarting, the only thing which concerns me is a message I see all the time:

Mar 21 13:55:05 gateway dhcpcd[31695]: red0: ignoring offer of x.x.x.187 from x.x.x.1

Why would you ignore a valid offer, then take it later?!

Note that I have hacked the functions script to log every sub-task it runs as dhcpcd.exe in syslog, which is why you’ll see lots of unusual mentions of the scripts from /etc/rc.d/init.d/networking/red.up here

See my # comments inline:

Mar 21 13:54:44 gateway dhcpcd[31241]: sending signal ALRM to pid 30869
Mar 21 13:54:44 gateway dhcpcd[31241]: waiting for pid 30869 to exit
Mar 21 13:54:44 gateway dhcpcd[30869]: received SIGALRM, releasing
Mar 21 13:54:44 gateway dhcpcd[30869]: red0: removing interface
Mar 21 13:54:44 gateway dhcpcd[30869]: red0: releasing lease of x.x.x.44
Mar 21 13:54:44 gateway dhcpcd[30869]: red0: deleting route to x.x.x.0/24
Mar 21 13:54:44 gateway dhcpcd[30869]: red0: deleting default route via x.x.x.1
Mar 21 13:54:44 gateway dhcpcd.exe[31242]: red0 has been brought down (STOP)
Mar 21 13:54:44 gateway dhcpcd.exe: run_subdir on /etc/rc.d/init.d/networking/red.down/
Mar 21 13:54:44 gateway dhcpcd.exe: ipsec
Mar 21 13:54:44 gateway dhcpcd.exe: miniupnpd
Mar 21 13:54:44 gateway dhcpcd.exe: ovpn
Mar 21 13:54:44 gateway dhcpcd.exe: static
Mar 21 13:54:44 gateway dhcpcd.exe: firewall

Mar 21 13:54:45 gateway unbound: [1756:0] error: SERVFAIL <push.services.mozilla.com. A IN>: all the configured stub or forward servers failed, at zone .
Mar 21 13:54:45 gateway unbound: [1756:0] error: SERVFAIL <push.services.mozilla.com.private.lan. A IN>: all the configured stub or forward servers failed, at zone .

Mar 21 13:54:46 gateway dhcpcd.exe: beep
Mar 21 13:54:46 gateway dhcpcd[30869]: dhcpcd exited

Mar 21 13:54:48 gateway kernel: 8021q: adding VLAN 0 to HW filter on device red0
Mar 21 13:54:48 gateway dhcpcd[31695]: red0: waiting for carrier
Mar 21 13:54:51 gateway dhcpcd[31695]: red0: carrier acquired
# Note that a carrier was aquired in 3 seconds - seems OK to me.

Mar 21 13:54:51 gateway kernel: igb 0000:01:00.0 red0: igb: red0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Mar 21 13:54:51 gateway dhcpcd[31695]: DUID 00:04:03:00:02:00:04:00:05:00:00:06:00:01:00:03:00:04
Mar 21 13:54:51 gateway dhcpcd[31695]: red0: IAID c0:08:a4:d8
Mar 21 13:54:51 gateway dhcpcd[31695]: red0: adding address fe80::c2c1:c0ff:fe08:a4d8
Mar 21 13:54:51 gateway dhcpcd[31695]: ipv6_addaddr1: Permission denied
Mar 21 13:54:51 gateway dhcpcd[31695]: red0: soliciting a DHCP lease
Mar 21 13:54:51 gateway dhcpcd[31695]: red0: soliciting an IPv6 router

# I assume this section from unbound relates to some queries which were already in progess while the internet was down
Mar 21 13:54:55 gateway unbound: [1756:0] error: SERVFAIL <push.services.mozilla.com. A IN>: all the configured stub or forward servers failed, at zone .
Mar 21 13:54:55 gateway unbound: [1756:0] error: SERVFAIL <push.services.mozilla.com.private.lan. A IN>: all the configured stub or forward servers failed, at zone .
Mar 21 13:54:58 gateway unbound: [1756:0] error: SERVFAIL <edge-chat.facebook.com. A IN>: all the configured stub or forward servers failed, at zone .
Mar 21 13:55:05 gateway unbound: [1756:0] error: SERVFAIL <edge-chat.facebook.com. A IN>: all the configured stub or forward servers failed, at zone .

# Note that the lease solicitation to response took 6 seconds - that seems OK to me.
Mar 21 13:55:05 gateway dhcpcd[31695]: red0: offered x.x.x.187 from x.x.x.1

# Ignoring offer from the ISP's router IP?  I've seen this often before.  This seems wrong!
Mar 21 13:55:05 gateway dhcpcd[31695]: red0: ignoring offer of x.x.x.187 from x.x.x.1

Mar 21 13:55:05 gateway dhcpcd[31695]: red0: probing address x.x.x.187/24

Mar 21 13:55:09 gateway unbound: [1756:0] error: SERVFAIL <www.google.com. A IN>: all the configured stub or forward servers failed, at zone .
Mar 21 13:55:09 gateway unbound: [1756:0] error: SERVFAIL <www.gstatic.com. A IN>: all the configured stub or forward servers failed, at zone .

Mar 21 13:55:10 gateway dhcpcd[31695]: red0: leased x.x.x.187 for 600 seconds
Mar 21 13:55:10 gateway dhcpcd[31695]: red0: adding route to x.x.x.0/24
Mar 21 13:55:10 gateway dhcpcd[31695]: red0: adding default route via x.x.x.1
Mar 21 13:55:10 gateway dhcpcd.exe[32090]: red0 has been (re)configured with IP=x.x.x.187

Mar 21 13:55:10 gateway dhcpcd.exe: run_subdir on /etc/rc.d/init.d/networking/red.up/
Mar 21 13:55:10 gateway dhcpcd.exe: conntrack
Mar 21 13:55:10 gateway dhcpcd.exe: miniupnpd
Mar 21 13:55:10 gateway dhcpcd.exe: multicast
Mar 21 13:55:11 gateway dhcpcd.exe: static
Mar 21 13:55:11 gateway dhcpcd.exe: firewall
Mar 21 13:55:12 gateway dhcpcd.exe: suricata

Mar 21 13:55:18 gateway suricata: rule reload complete
Mar 21 13:55:18 gateway suricata: Signature(s) loaded, Detect thread(s) activated.
Mar 21 13:55:18 gateway suricata: Signal Received.  Stopping engine.
Mar 21 13:55:18 gateway suricata: (W-NFQ#0) Treated: Pkts 2300, Bytes 498594, Errors 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#0) Verdict: Accepted 2254, Dropped 46, Replaced 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#1) Treated: Pkts 509, Bytes 85103, Errors 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#1) Verdict: Accepted 499, Dropped 10, Replaced 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#2) Treated: Pkts 46, Bytes 12985, Errors 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#2) Verdict: Accepted 46, Dropped 0, Replaced 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#3) Treated: Pkts 141, Bytes 49597, Errors 0
Mar 21 13:55:18 gateway suricata: (W-NFQ#3) Verdict: Accepted 132, Dropped 9, Replaced 0
Mar 21 13:55:19 gateway suricata: This is Suricata version 5.0.2 RELEASE running in SYSTEM mode
Mar 21 13:55:19 gateway suricata: [ERRCODE: SC_WARN_NO_STATS_LOGGERS(261)] - stats are enabled but no loggers are active
Mar 21 13:55:19 gateway suricata: all 4 packet processing threads, 2 management threads initialized, engine started.
Mar 21 13:55:19 gateway suricata: rule reload starting

Mar 21 13:55:19 gateway dhcpcd.exe: RS
Mar 21 13:55:20 gateway kernel: HTB: quantum of class 10001 is big. Consider r2q change.
Mar 21 13:55:20 gateway kernel: HTB: quantum of class 20001 is big. Consider r2q change.
Mar 21 13:55:20 gateway kernel: HTB: quantum of class 20203 is big. Consider r2q change.
Mar 21 13:55:20 gateway kernel: HTB: quantum of class 20204 is big. Consider r2q change.
Mar 21 13:55:20 gateway root: Could not find a bridged zone for imq0
Mar 21 13:55:20 gateway codel: Codel AQM could not be enabled on 'imq0'. Error code: 2
Mar 21 13:55:21 gateway dhcpcd.exe: update

Mar 21 13:55:21 gateway unbound: [1756:0] info: service stopped (unbound 1.9.6).
Mar 21 13:55:21 gateway unbound: [1756:0] info: server stats for thread 0: 149 queries, 130 answers from cache, 19 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Mar 21 13:55:21 gateway unbound: [1756:0] info: server stats for thread 0: requestlist max 2 avg 0.263158 exceeded 0 jostled 0
Mar 21 13:55:21 gateway unbound: [1756:0] info: average recursion processing time 0.316794 sec
Mar 21 13:55:21 gateway unbound: [1756:0] info: histogram of recursion processing times
Mar 21 13:55:21 gateway unbound: [1756:0] info: [25%]=5.9375e-07 median[50%]=0.16384 [75%]=0.432538
Mar 21 13:55:21 gateway unbound: [1756:0] info: lower(secs) upper(secs) recursions
Mar 21 13:55:21 gateway unbound: [1756:0] info:    0.000000    0.000001 8
Mar 21 13:55:21 gateway unbound: [1756:0] info:    0.065536    0.131072 1
Mar 21 13:55:21 gateway unbound: [1756:0] info:    0.131072    0.262144 2
Mar 21 13:55:21 gateway unbound: [1756:0] info:    0.262144    0.524288 5
Mar 21 13:55:21 gateway unbound: [1756:0] info:    0.524288    1.000000 1
Mar 21 13:55:21 gateway unbound: [1756:0] info:    1.000000    2.000000 2
Mar 21 13:55:21 gateway unbound: [1756:0] notice: Restart of unbound 1.9.6.
Mar 21 13:55:21 gateway unbound: [1756:0] notice: init module 0: validator
Mar 21 13:55:21 gateway unbound: [1756:0] notice: init module 1: iterator
Mar 21 13:55:21 gateway unbound: [1756:0] info: start of service (unbound 1.9.6).
Mar 21 13:55:21 gateway unbound: [1756:0] info: generate keytag query _ta-4a5c-4f66. NULL IN

Mar 21 13:55:25 gateway dhcpcd.exe: RS

# Suricata restarting takes a while...
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'is_proto_irc' is checked but not set. Checked in 2002029 and 7 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.http.javaclient.vulnerable' is checked but not set. Checked in 2013036 and 2 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.http.javaclient' is checked but not set. Checked in 2017181 and 15 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.ELFDownload' is checked but not set. Checked in 2019896 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.DocVBAProject' is checked but not set. Checked in 2020170 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.MSSQL' is checked but not set. Checked in 2020569 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ETPRO.RTF' is checked but not set. Checked in 2020700 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.MCOFF' is checked but not set. Checked in 2022303 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.autoit.ua' is checked but not set. Checked in 2019165 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.OSX.Mokes' is checked but not set. Checked in 2023183 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.IE7.NoRef.NoCookie' is checked but not set. Checked in 2023671 and 9 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'min.gethttp' is checked but not set. Checked in 2023711 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.http.binary' is checked but not set. Checked in 2018103 and 6 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.armwget' is checked but not set. Checked in 2024241 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.smb.binary' is checked but not set. Checked in 2027402 and 4 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.Socks5.OnionReq' is checked but not set. Checked in 2027704 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.JavaArchiveOrClass' is checked but not set. Checked in 2017772 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.pdf.in.http' is checked but not set. Checked in 2017150 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.JS.Obfus.Func' is checked but not set. Checked in 2017246 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.http.PK' is checked but not set. Checked in 2017670 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'HTTP.UncompressedFlash' is checked but not set. Checked in 2018428 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.WinHttpRequest' is checked but not set. Checked in 2019822 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.wininet.UA' is checked but not set. Checked in 2021312 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.MS.XMLHTTP.ip.request' is checked but not set. Checked in 2022050 and 1 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.MS.XMLHTTP.no.exe.request' is checked but not set. Checked in 2022053 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'et.MS.WinHttpRequest.no.exe.request' is checked but not set. Checked in 2022653 and 0 other sigs
Mar 21 13:55:27 gateway suricata: [ERRCODE: SC_WARN_FLOWBIT(306)] - flowbit 'ET.eduphish' is checked but not set. Checked in 2025114 and 0 other sigs

# Squid restarting takes a long time...
Mar 21 13:55:31 gateway squid[30737]: Squid Parent: squid-1 process 30740 exited with status 0
Mar 21 13:55:31 gateway squid: squid shutdown time: 6 seconds
Mar 21 13:55:36 gateway squid[1021]: Squid Parent: will start 1 kids
Mar 21 13:55:36 gateway squid[1021]: Squid Parent: (squid-1) process 1024 started
Mar 21 13:55:36 gateway squid[1021]: Squid Parent: squid-1 process 1024 exited with status 0
Mar 21 13:55:37 gateway squid[1043]: Squid Parent: will start 1 kids
Mar 21 13:55:37 gateway squid[1043]: Squid Parent: (squid-1) process 1046 started

# Other things are quick...
Mar 21 13:55:37 gateway dhcpcd.exe: ddns
Mar 21 13:55:38 gateway dhcpcd.exe: guardian
Mar 21 13:55:38 gateway dhcpcd.exe: ipsec
Mar 21 13:55:38 gateway guardian[3533]: <info> Reloading ignore list... 
Mar 21 13:55:38 gateway dhcpcd.exe: ovpn
Mar 21 13:55:38 gateway guardian[3533]: <err> /var/ipfire/red/dns1 will not be included. File does not exist! 
Mar 21 13:55:38 gateway guardian[3533]: <err> /var/ipfire/red/dns2 will not be included. File does not exist! 
Mar 21 13:55:38 gateway dhcpcd.exe: leds
Mar 21 13:55:38 gateway dhcpcd.exe: beep
Mar 21 13:55:39 gateway dhcpcd.exe: fireinfo
Mar 21 13:55:39 gateway dhcpcd.exe: geoip
Mar 21 13:55:39 gateway dhcpcd.exe: pakfire
# Yay connection fully established this time!
Mar 21 13:55:39 gateway dhcpcd[31695]: forked to background, child pid 1170
# On with business as usual...
Mar 21 13:55:40 gateway pakfire: PAKFIRE INFO: IPFire Pakfire 2.25-x86_64 started!
...

rodneyp · 21 March 2020 03:08

My modem/router, as supplied, was not capable of bridging mode. It has a fixed, 10.1.1.1 LAN side address and works with devices having a static address on that network. The WAN side is fairly irrelevant, although it appears to be a peristent address.

It does look as though you can’t set a static address, without changing the modem to one that routes. Meanwhile, you look like being stuck with rebooting IPFire every time there is a dropout. It wouild be worthwhile having an IPFire hardware that has hardware RNG or at least fast software RNG. It’s surprising how much faster a Nano Pi is cf Banana Pi.

dnl · 21 March 2020 03:11

Thanks, yeah I’m using bridging mode - I forgot to say that!

My system is a mini PC which does support Intel(R) RDRAND although I’m not sure how that is relevant to this problem. Can you please explain?

EDIT: Even with a Intel(R) Celeron(R) CPU J1900 @ 1.99GHz, SSD storage and 4GiB RAM, IPFire takes a considerable time to boot and establish an internet connection. Admittedly I do have most of the bells+whistles enabled like Suricata (with a well maintained ruleset - not everything!).

rodneyp · 21 March 2020 04:36

The software RNG can take a relatively long part of the boot time and be a nuisance if you are having to reboot frequently.

If you follow the boot process on a terminal, you will get a feel for the times, which the bootlog does not record.

I don’t find any disadvantages in a routing modem, although it might be a constraint on a fast Internet link. Are you able to use one, or is your ISP restricting to a specific MAC address ?

rodneyp · 21 March 2020 04:41

The other suggestion that I intended to add, is to put your fastest NIC on the red0 side. It might result in more reliable connection to your ISP

dnl · 21 March 2020 08:09

Thank you, but this is going off-topic. I’m trying to find the root cause of this in the hope it can be solved - not find ways to rebooting my IPFire box faster every time I have a problem! (Also all 4 NICs are the same intel NICs)

Also I’m 99% sure my IPFire hardware isn’t at fault in this. The problem has happened since I switched to VDSL2, but that doesn’t mean it’s the modem/router alone - it could be a related thing in IPFire (like a short timeout).

dnl · 21 March 2020 08:16

I don’t find any disadvantages in a routing modem, although it might be a constraint on a fast Internet link. Are you able to use one, or is your ISP restricting to a specific MAC address ?

There are multiple reasons to use bridging wherever possible:

Security (I trust IPFire a lot more than I trust a vendor selling a modem/router at a low price and with many features which are often vulnerable). When using Bridging the modem/router’s attack surface is minuscule - See network - Is it possible to attack bridged DSL modem? - Information Security Stack Exchange
To avoid double NAT
EDIT:
So all Internet firewall denial logs are in the one place - IPFire

rodneyp · 21 March 2020 21:54

That article requires much interpretation:

“modem logically doesn’t act like a router” confirms an explanation that I have previously seen that DSL modem are designed as routers and bridged mode deploys additional processing to work with the same IP address for both sides. In early DSL days I used bridged mode, in the mistaken belief that it imposed less processing on the puny modems of the day and would improve speed. Turns out the reverse is true.

The article does point out that LAN side devices, IPFire in your case, are visible from the Internet. My modem does run an SPI firewall, that would reject a lot of incoming packets. Further back, my ISP also filters out up to 7 protocols for home users, including, if so set, inbound http & Windows file sharing.

My modem, TG brand, is not available retail and any firmware updates need to be done by the ISP, which I believe they will do only in routing mode. They generally do updates during “silent hours” The updates cover a plethora of issues including DNS, VoIP, dropouts etc.

The comment I have, from several network engineers, regarding double-NAT, is that it is now standard practice in telco networks & large organisations, freeing up masses of IP4 public addresses and avoiding the need for IP6 indefinitely (as well as saving the cost of public addresses). The challenge has become using double-NAT reliably. The only change that I need is to inform IPFire that its gateway is the modem.

dnl · 22 March 2020 01:58

Thank you for your opinion. I don’t agree for a few technical reasons. Could you please start another thread to discuss the differences between bridging and using a separate router?

In this thread I’ve asked for help in diagnosing a problem. While some assumptions have been made we still don’t actually know the root cause.

@Arne.F can you think of any way I can find more information about the problem? By default, syslog does not seem to have any detail.

As I mentioned before, I’ve added some logger entries to the functions file, so we can hopefully see where the problem may be. I’m concerned that this still won’t be enough to diagnose it though.

Thank you

dnl · 22 March 2020 09:38

I can reproduce the problem and I believe I have evidence at least one issue needs to be solved in IPFire.
1. IPFire isn’t detecting when the line sync (ppp session) has been lost. As a result it never attempts to reconnect.
2. dhcpcd isn’t accepting all valid offers and may need an extended timeout for DHCP (or the timeout is too short)

Looks like /etc/rc.d/init.d/networking/functions.network is where this is run from. I wonder if I can tweak it to increase the timeout, or if the issue is related to how it is ignoring valid DHCP leases - without explanation.

Here’s a recap from my testing now:

Pull VDSL cable between ISP and Modem
Wait 10 seconds, observe modem/router switch off “internet” LED
Reconnect VDSL cable
IPFire “Main page” still shows a connection established.
From IPFire (via SSH) ping 1.1.1.1 - no response. IPFire logs fill with DNS request failures.
Wait at least 15 seconds. IPFire still hasn’t connected internet is unavailable and tried to re-establish a session.
After a further 15 seconds, manually run /etc/init.d/network restart red, but it fails with only the unhelpful message Starting dhcpcd on the red0 interface... [ FAIL ] and with dhcpcd exited in syslog.
Run it again, this also fails with the same message.
Run it a third time - it succeeds.

First failure - Times out after 26 seconds

Mar 22 20:11:11 gateway dhcpcd[11387]: red0: soliciting a DHCP lease
Mar 22 20:11:11 gateway dhcpcd[11387]: red0: soliciting an IPv6 router
Mar 22 20:11:37 gateway dhcpcd[11387]: timed out
Mar 22 20:11:37 gateway dhcpcd[11387]: dhcpcd exited

Second failure - ignores a valid offer after 28 seconds - times out after 29 seconds

Mar 22 20:12:37 gateway dhcpcd[12489]: red0: soliciting a DHCP lease
Mar 22 20:12:37 gateway dhcpcd[12489]: red0: soliciting an IPv6 router
Mar 22 20:13:05 gateway dhcpcd[12489]: red0: offered x.x.x.x.191 from x.x.x.x.1
Mar 22 20:13:05 gateway dhcpcd[12489]: red0: ignoring offer of x.x.x.191 from x.x.x.1
#WHY?!!
Mar 22 20:13:06 gateway dhcpcd[12489]: red0: probing address x.x.x.191/24
Mar 22 20:13:06 gateway dhcpcd[12489]: timed out
Mar 22 20:13:06 gateway dhcpcd[12489]: dhcpcd exited

3rd attempt and success - ignores a valid offer after 4 seconds - but accepts it after probing by the time 8 seconds have past

Mar 22 20:13:40 gateway dhcpcd[13322]: red0: soliciting a DHCP lease
Mar 22 20:13:41 gateway dhcpcd[13322]: red0: soliciting an IPv6 router
Mar 22 20:13:44 gateway dhcpcd[13322]: red0: offered x.x.x.163 from x.x.x.1
Mar 22 20:13:44 gateway dhcpcd[13322]: red0: ignoring offer of x.x.x.163 from x.x.x.1
Mar 22 20:13:44 gateway dhcpcd[13322]: red0: probing address x.x.x.163/24
Mar 22 20:13:49 gateway dhcpcd[13322]: red0: leased x.x.x.163 for 600 seconds
Mar 22 20:13:49 gateway dhcpcd[13322]: red0: adding route to x.x.x.0/24
Mar 22 20:13:49 gateway dhcpcd[13322]: red0: adding default route via x.x.x.1
Mar 22 20:13:49 gateway dhcpcd.exe[13430]: red0 has been (re)configured with IP=x.x.x.163

Why reject an invalid offer after 28 seconds, if previously it timed out at 26?!
If it is still waiting for an offer at 28 seconds and it gets one, why ignore it?

dnl · 22 March 2020 10:35

Troubleshooting dhcpcd

Changed line 78 in /etc/rc.d/init.d/networking/functions.network from
/sbin/dhcpcd ${dhcp_start} ${device} >/dev/null 2>&1
to
/sbin/dhcpcd ${dhcp_start} ${device} --debug

Now syslog has a lot more detail!

dnl · 22 March 2020 10:36

Another tip: You can forcibly restart a bridged network from the IPFire UI by:

Open the Network -> Assign MAC-Address page
Pick a valid but random MAC
Click Save. A “Reconnect” button is displayed.

If you want to reconnect again, just click “Save” again - you do not have to change the custom MAC. The “Reconnect” button will be displayed again and you can use it to forcibly reconnect.