IPFire went down last night, can't find cause

Do you use nrpe?

If you use nrpe then you can try uninstalling and then re-installing.

You’ll see the symlink in the “Create symlinks” section of the wiki.

https://wiki.ipfire.org/addons/nagios-nrpe#create-symlinks


Reading through the messages file, there is a normal kernel message about a firewall DROP_INPUT for a packet on the red interface, followed by a long run of # characters. After that come the messages showing IPFire restarting, with the checks for the network interfaces, the CPU, etc., after the filesystem has been recovered as recorded in the bootlog.

The fact that there is no message at all in the log before IPFire starts again suggests that something caused a hard restart, the equivalent of pulling the plug on the PC.
A normal restart would log messages about IPFire stopping the various services in order, and none of these are present.
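The check described above (a boot with no shutdown messages before it) can be roughly automated. The following is a minimal sketch, not an IPFire tool: the function name is hypothetical, and both marker patterns are assumptions that should be adjusted to what your own /var/log/messages actually contains.

```python
import re

# ASSUMPTION: the kernel banner marks the start of a boot in the log.
BOOT_MARKER = re.compile(r"kernel: .*Linux version")
# ASSUMPTION: a clean shutdown leaves at least one line matching this pattern.
SHUTDOWN_MARKER = re.compile(r"shutdown|Stopping|exiting on signal", re.IGNORECASE)


def find_unclean_boots(lines, lookback=50):
    """Return indexes of boot lines with no shutdown message shortly before them.

    Only the lines since the previous boot (at most `lookback` of them) are
    inspected, so an older clean shutdown cannot hide a later hard restart.
    """
    unclean = []
    previous_boot = -1
    for i, line in enumerate(lines):
        if not BOOT_MARKER.search(line):
            continue
        start = max(previous_boot + 1, i - lookback)
        window = lines[start:i]
        # An empty window (log starts with a boot) tells us nothing, so skip it.
        if window and not any(SHUTDOWN_MARKER.search(prev) for prev in window):
            unclean.append(i)
        previous_boot = i
    return unclean
```

To use it, read the log with `open(path, errors="replace").read().splitlines()` and pass the resulting list in; each returned index points at a boot that was probably preceded by a hard restart.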

Reading through the bootlog for the messages after the system came back up, I found the filesystem recovery messages for sda4 (the root filesystem). The filesystem had to be checked because files were not closed properly, which indicates that the whole system was effectively just switched off.
The bootlog finishes with the remounting of the drives in read write mode after the boot process has finished.

I searched for ways to find the cause of random reboots that leave no log messages, and unfortunately I couldn’t find anything to help track it down any further.


I found some threads discussing problems with certain combinations of firmware and Linux kernel, like this one, but there are many to be found. There is also a Linux bug report here.

Unfortunately the situation happened again. This time, as I came into the office at 7:55 AM, I was told the network was down and that it had gone down in the last few minutes.

My work PC was not able to reach any IP address, and I could not ping the router. I went into the server room and the router was still powered on. I could log into the router at the console, and I could ping Google from the console.

I tried restarting dhcpd, but there was still no internet on the other machines. I tried restarting unbound, still nothing. I rebooted the router and then we were back online. No changes were made to any other devices.
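When the router can reach the internet from its console but nothing on the LAN can reach the router, one thing worth recording before rebooting is the link state of each interface. Below is a minimal sketch that reads the standard Linux sysfs attributes `operstate` and `carrier`. The function name is hypothetical; `green0` and `red0` appear in the firewall log of this thread, while `blue0` is an assumed name for the blue interface.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the operstate and carrier values of a network interface.

    A value of None means the attribute could not be read, for example
    because the interface does not exist or is administratively down.
    """
    base = Path(sysfs_root) / iface
    state = {}
    for attr in ("operstate", "carrier"):
        try:
            state[attr] = (base / attr).read_text().strip()
        except OSError:
            state[attr] = None
    return state


def report(ifaces=("green0", "blue0", "red0")):
    """Print one line per interface; blue0 is an assumed interface name."""
    for iface in ifaces:
        print(iface, link_state(iface))
```

Running `report()` on the console during the outage and saving the output would show whether green and blue had lost their link (`carrier` of `0`) or whether the link was up and the problem lay elsewhere.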

Attached are the sanitized bootlog and the portion of /var/log/messages from today, with names, IPs and MACs omitted for security. Please help.

bootlog.gz (14.2 KB)
messags.10.27.22.txt.gz (906.2 KB)

please add these graphs:

Here they are

Out of curiosity: do you have IPsec enabled also? Do you use it?

https://ipfire.localdomain:444/cgi-bin/vpnmain.cgi

Yes, it is enabled and we use it for a point to point vpn tunnel to a second office.

Things were running when ROOT logged in and did a restart near 8:01.

It looks like the green network stopped talking to the IPFire device near 07:40. This is the last communication:

Oct 27 07:40:40 ipfire kernel: FORWARDFW IN=green0 OUT=red0 MAC=<green-mac>:7c:cb:e2:e4:d0:b8:08:00 SRC=<green-network.>26 DST=40.126.28.14 LEN=52 TOS=0x00 PREC=0x00 TTL=127 ID=2972 DF PROTO=TCP SPT=64922 DPT=443 WINDOW=8192 RES=0x00 SYN URGP=0

(blue stopped a minute before)
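Finding the last log entry per interface, as done by hand above, can be sketched in a few lines. This assumes the syslog layout of the FORWARDFW line quoted above (timestamp, host, `kernel:`, then `IN=<interface>`); the function name is hypothetical.

```python
import re

# Matches e.g. "Oct 27 07:40:40 ipfire kernel: FORWARDFW IN=green0 OUT=red0 ..."
LOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:.*?\bIN=(?P<iface>\S+)"
)


def last_seen_per_interface(lines):
    """Map each inbound interface to the timestamp of its last firewall log line.

    Lines are assumed to be in chronological order, so later matches simply
    overwrite earlier ones. Lines with an empty IN= field are ignored.
    """
    last = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            last[match.group("iface")] = match.group("ts")
    return last
```

Note that this only reflects logged firewall events: an interface with no logged packets is quiet in the log, which is a hint that traffic stopped but not proof of it.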

But I don’t see the “why”…

Can you post these graphs:
https://ipfire.localdomain:444/cgi-bin/netinternal.cgi

The uptime of the system was 2.5 days. It had been up since I replaced the power supply on the unit and moved it to a different battery backup. I thought that would ultimately be the fix, but it went down again today…

To me, this looks very different from the other issues (again, I don’t know why).

What are the GREEN and BLUE networks plugged into? Is it two different switches (or hubs)?

It is odd that BLUE & GREEN both stopped within a minute. And RED kept on running.

Blue is plugged into a standalone wireless access point that we use for guest internet access for customers who come into the building.

Green is plugged into our core switch.

Red is plugged into the fiber modem provided by our ISP.

I’m strongly considering refreshing the hardware of our IPFire, but having reviewed the pre-built Lightning Wire Labs boxes, they don’t have some of the features that I want. These are the specs I’m looking for:

Rackmount Unit
Single Xeon CPU
8+ GB ECC RAM
RAID 1 w/ 2 SSDs and controller card
Redundant power supplies
4 Intel NIC Controller Cards all 1Gbps, or possibly some 10Gbps cards for green to go between the IPFire and the Core Switch.
Onboard Encryption hardware chip

Thoughts? What vendors do you guys use?

Lots! Selfmade.

What brands of components do you like?

Never had problems with Advantech or Supermicro motherboards. Case → Chenbro. I don’t use hardware RAID anymore → old school. For reg. ECC memory I use Samsung. SSD for enterprise usage → WD Red.

Is the core switch also plugged into a UPS?

Yes it is.

Any other thoughts on troubleshooting why it went down? Should I just replace the hardware?

Chris