IPFire performance drops, then crashes

Hello everyone,

All my machines (including IPFire) are connected via Ethernet in a standard Red-Green network.

Since CU187 (I don’t know if it’s related), I have noticed a significant performance degradation on one of my IPFire machines… Access to the local network becomes more and more difficult and Internet access becomes erratic. The symptoms:

From a machine on the local network, ping response times from IPFire oscillate between 15 ms and 2500 ms, sometimes more, which explains the loss of Internet and local network access (IPFire is also the DHCP server!).
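To correlate the latency spikes with the storage graphs, it can help to log only the slow replies. The following is a minimal sketch, assuming a Linux client with iputils-style ping output (lines containing `time=<n> ms`); the function name `slow_pings` and the 100 ms default threshold are illustrative choices, not anything from IPFire.

```shell
#!/bin/sh
# slow_pings (hypothetical helper): read ping output on stdin and print only
# the reply lines whose round-trip time exceeds a threshold in milliseconds.
# Assumes replies contain "time=<number>", as iputils ping prints them.
slow_pings() {
    threshold_ms="${1:-100}"
    awk -v limit="$threshold_ms" '
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" to keep only the number.
            rtt = substr($0, RSTART + 5, RLENGTH - 5)
            if (rtt + 0 > limit + 0) print
        }'
}

# Example: ping -D 192.168.1.100 | slow_pings 100
```

With `ping -D`, each reply carries a timestamp, so the slow replies can be lined up against the time axis of the IPFire graphs.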

Storage appears to be heavily used until IPFire freezes completely (see screenshots: IPFire crashed between 4:30 and 13:00).



When this happens, I have to force a restart of IPFire; after the reboot, the network cards work again for 2 to 3 days… until the next freeze.

I then switched IPFire from CU187 to CU189, but that didn’t change anything.

I suspected a connection problem, so I checked all the cables but found no anomalies.

I also thought it was a hardware problem for IPFire, so I saved and restored the IPFire configuration on another machine - same symptoms on the new machine.

I now suspect a problem in the IPFire configuration… If any of you have an idea, I’m interested! Thanks!

All the best,
Stephane

Mine did that until I manually updated the kernel.

Write these commands down, then run them in order as root on the IPFire machine to download and install this kernel.

wget https://people.ipfire.org/~arne_f/highly-experimental/kernel/kernel-6.6.56-ipfire-x86_64.tar.xz
tar xvf kernel-6.6.56-ipfire-x86_64.tar.xz -C /
rm kernel-6.6.56-ipfire-x86_64.tar.xz
dracut --regenerate-all
grub-mkconfig -o /boot/grub/grub.cfg

Hi @dr_techno,

Thanks for your quick and precise answer.

Is the action you propose recommended by the IPFire developers? It seems irreversible and rather risky…

I am not sure.

In the last update, firmware has been updated, and you might have some hardware that is using something that is now not working as well as before… Or the hardware is literally failing.

Are there any processes that are locking up? The graph itself doesn’t show a lot of detail.

I had problems and asked for a kernel upgrade, since it was a known bug in the kernel they used that shows up with certain hardware combinations. The file comes from an IPFire developer. There are some new 64 Intel network drivers that you have to set up manually, but this issue is being addressed with Intel.

Kernel bugs in long-term service kernels are a rarity. This is the second time I know of this happening in the Linux ecosystem. I asked a friend of mine who is a developer at Red Hat what was wrong with my system, and after he analysed what was going on, he told me about the issues in the kernel they deployed. Not long before, he and his team had to patch their kernel at Red Hat for the same reasons.

Ok @ms, @dr_techno, thanks for your answers,

What makes me doubt this is that, following the problem on the original machine, I reinstalled IPFire CU189 on another (identical) machine and the same problems appeared after restoring the IPFire configuration (+ addons).

The same symptoms: the IPFire system is installed on an SSD, which is used more and more heavily, to the point where the GUI can no longer be navigated and the IPFire Ethernet network cards stop working (hence the local network / Internet problem for the machines connected to it).

Admittedly, I only shared a few graphs; I don’t know where to look to find out where the disk load comes from…
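One place to start looking for the source of disk load is the per-process I/O accounting under `/proc`. The sketch below assumes the running kernel exposes `/proc/<pid>/io` (task I/O accounting enabled) and that it is run as root; the function name `top_writers` is made up for this example. Note that `write_bytes` is cumulative since each process started, so it shows who has written the most overall, not the current rate.

```shell
#!/bin/sh
# top_writers (hypothetical helper): list the processes with the largest
# cumulative write_bytes counter, as "<bytes> <pid> <command name>".
#   $1 = proc root to scan (default /proc)
#   $2 = number of entries to show (default 10)
top_writers() {
    proc_root="${1:-/proc}"
    for io in "$proc_root"/[0-9]*/io; do
        [ -r "$io" ] || continue              # process gone or not readable
        pid_dir=${io%/io}
        bytes=$(awk '/^write_bytes:/ {print $2}' "$io" 2>/dev/null)
        name=$(cat "$pid_dir/comm" 2>/dev/null)
        [ -n "$bytes" ] && printf '%s %s %s\n' "$bytes" "${pid_dir##*/}" "${name:-?}"
    done | sort -rn | head -n "${2:-10}"
}

# Example (as root): top_writers /proc 10
```

Running it twice a few minutes apart and comparing the byte counts gives a rough idea of which process is writing while the load is building up.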

There has not been a kernel update in the last update. Just the firmware, but that would not change anything about the SSD as it brings its own firmware.

Do you have a RAID controller or something like that?

No RAID, only 1 SSD in IPFire :unamused:

Then give the new kernel a try… It is just the latest version of the current LTS kernel…

Alternatively you can install Core Update 190 from unstable which has the same kernel.

Ok @ms,

As I have a machine identical to the one in production (~20 users connected to it), I will update it to CU190 unstable, run new tests and compare… I will come back here to share my results.

FYI…

A few seconds after powering on my spare machine (CU189 stable), on the same LAN as my work PC in Gigabit Ethernet - the ping results speak for themselves:

If I understand correctly 192.168.1.100 is a machine on your green network being pinged from IPFire itself. If the ping time is reaching 0.5 to 2 seconds then maybe there is some hardware problem with the network interface on your system.

What do you find if you run the command

ip -s link show green0

Replace green0 with whichever interface you are actually using.

This will show if there are any dropped packets during transmission or reception.
The error and dropped counters should all be zero, as in:

ip -s link show green0
3: green0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:b9:5e:aa:dd brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast           
     201902824 1496502      0       0       0       0 
    TX:  bytes packets errors dropped carrier collsns           
    8398359424 7320842      0       0       0       0 
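To watch whether these counters grow over time, the relevant columns can be pulled out of the output above. This is a small sketch that assumes the iproute2 layout shown in this post (an `RX:` or `TX:` header line followed by a line of values, with packets, errors and dropped in columns 2 to 4); `link_drops` is a made-up name.

```shell
#!/bin/sh
# link_drops (hypothetical helper): read `ip -s link show <iface>` output on
# stdin and print the packets, errors and dropped counters for RX and TX.
link_drops() {
    awk '
        /^ *RX:/ { want = "rx"; next }   # the values are on the next line
        /^ *TX:/ { want = "tx"; next }
        want != "" {
            printf "%s_packets=%s %s_errors=%s %s_dropped=%s\n", want, $2, want, $3, want, $4
            want = ""
        }'
}

# Example: ip -s link show green0 | link_drops
```

Running it a few times while the problem is occurring shows whether the dropped count is still increasing or was a one-off.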

Hello @bonnietwin,

Actually, localhost is my work PC and 192.168.1.100 is IPFire, both on the same Gigabit Ethernet switch (green network). My results:

RX - dropped : 995
RX - mcast : 23

The other results and TX are equal to 0

You can use the Linux perf tool (see "Linux perf Examples") to profile the system. perf is not built into IPFire, but it can easily be added to the IPFire toolset.

Okay.

Are you able to connect your work PC directly to the IPFire green interface and see if you still have the problem?

If no, then the dropped packets are occurring because of the Ethernet switch.
If yes, then the dropped packets are occurring because of the Ethernet interface on IPFire.

You could also try doing a fresh install of CU186, restoring your backup and seeing if the problem persists. You indicated that the problem started after CU187 was installed, so if it is related to that, going back to CU186 should make the problem disappear. If there is no problem, then update to CU189 to see if the problem comes back. You will then have shown that you can turn the problem on and off by CU version.

If the problem is still present with CU186 then a hardware problem somewhere is more likely.

kernel.perf_event_paranoid is set to 3 in sysctl.conf

There is a kernel hardening patch related to kernel perf events in IPFire that has the following description

When kernel.perf_event_paranoid is set to 3 (or greater), disallow
all access to performance events by users without CAP_SYS_ADMIN.

This new level of restriction is intended to reduce the attack
surface of the kernel. Perf is a valuable tool for developers but
is generally unnecessary and unused on production systems. Perf may
open up an attack vector to vulnerable device-specific drivers as
recently demonstrated in CVE-2016-0805, CVE-2016-0819,
CVE-2016-0843, CVE-2016-3768, and CVE-2016-3843. This new level of
restriction allows for a safe default to be set on production systems
while leaving a simple means for developers to grant access [1].

This feature is derived from CONFIG_GRKERNSEC_PERF_HARDEN by Brad
Spengler. It is based on a patch by Ben Hutchings [2]. Ben’s patches
have been modified and split up to address on-list feedback.

kernel.perf_event_paranoid=3 is the default on both Debian [2] and
Android [3].

https://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=src/patches/linux/linux-5.15.17-security-perf-allow-further-restriction-of-perf_event_open.patch;hb=78f1bb1de5972bcb4f8db873d0271352d1dbb506#l7

This suggests that perf is not meant to be used on a production system.

@ms,

I upgraded my spare IPFire machine to an unstable CU190 and then rebooted; same problem within seconds…

@bonnietwin,

Thanks for these ideas that I will try right now…

A first look at the page cited shows that these are mainly profiling tools. Tools of this kind are used during development.
IPFire is a production system, which is also why it lacks compilers, linkers, etc.