Since installing CU198, IPS/Suricata is bogging down the FW

Hi,

Since installing V198 I have noticed the system is being bogged down by IPS/Suricata. Key indicators:

  • I run a periodic speedtest to my ISP (the lscr.io speedtest-tracker Docker container on one of my servers). Its graph shows that since V198 my 1 Gbps download speed has dropped from an average of 920 Mbps to about 620 Mbps.
  • I started investigating, initially thinking it was an ISP/NBN issue, but when I checked the FW I found that the load average was higher than usual and that Suricata was by far the biggest consumer of resources, averaging north of 20% of CPU time. This has never been the case before, and I have been running IPS for a very long time.
  • My FW green network has a 10 Gbps Ethernet connection, as most of my servers on green are connected at 10 Gbps. Running iperf3 between the FW and one of my servers (in both directions) yielded a bitrate of ~9.4 Gbps with IPS switched off and ~900 Mbps with IPS switched on.
  • Further to this, the IPS throughput graph in the FW GUI shows that prior to V198 most of the traffic was “offloaded” and a small amount was “scanned”. It is now almost exactly the opposite, i.e. the majority of traffic is “scanned” and a small amount is “offloaded”.
  • Finally, I ran several speedtests from the container mentioned above to be sure IPS was causing the issue, and sure enough I got a ~300 Mbps drop with IPS on versus off.
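The size of the on/off step in the measurements above can be put into numbers with a small helper. This is only a sketch: pct_drop is a hypothetical name, not an IPFire tool, and the values passed to it are simply the figures quoted in this post.

```shell
#!/bin/sh
# pct_drop BEFORE AFTER: print the relative drop from BEFORE to AFTER in percent.
# Hypothetical helper; both values must use the same unit.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Figures quoted in the post, all in Mbps.
pct_drop 920 620     # ISP speedtest, before vs. since V198
pct_drop 9400 900    # iperf3 on green, IPS off vs. IPS on
```

Expressed this way, the ISP speedtest lost roughly a third of its rate, while the 10 Gbps iperf3 test lost about nine tenths.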

As far as I can tell, all of the degradation started when I went to V198 at the end of last month.

Has anyone else seen this performance hit due to IPS? Any ideas?

Thanks

Hi @jimz!

I am seeing the same behavior here with my mini appliance. When I turn off monitoring of the IPS system for RED, I get the original transfer rates (200 Mbps). When turned on, the download speed is halved.

Greetings Wayne

Not really.

The following is what I have on my system.

             offloaded              scanned
CU197          50kbps                55kbps
CU198          32kbps                90kbps

So with the latest Suricata version, 8.0.1, which is able to cover more protocols than previously, more of the traffic can be analysed.
That seems to me to be a good thing, but the total has gone from 105kbps to 122kbps, which is not a big difference, and I have certainly not seen any impact on my speed checks.
The vectorscan library that suricata uses was updated in CU198, which might well have enabled suricata to handle analyses more quickly, so that might be related to the increase in total traffic through suricata.
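The shift between offloaded and scanned traffic in the table above can be worked out as a share of the total. A minimal sketch, using only the four kbps values from the table; scanned_share is a hypothetical helper name.

```shell
#!/bin/sh
# scanned_share OFFLOADED SCANNED: print the total rate and the scanned share of it.
# Hypothetical helper; both arguments are rates in the same unit (kbps here).
scanned_share() {
    awk -v off="$1" -v scan="$2" \
        'BEGIN { total = off + scan
                 printf "total=%d scanned=%.1f%%\n", total, scan / total * 100 }'
}

scanned_share 50 55   # CU197
scanned_share 32 90   # CU198
```

On these numbers the scanned share rises from about half of the traffic to about three quarters, while the total changes only from 105 to 122 kbps.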

I have a script that collects the speed data every 4 hours. I have not noticed any change in that performance.

My cpu% has not changed. It is still at around 0.25%

There has been no change in the Load Average graph over the change from CU197 to CU198. Also no change to the memory, Connections or Firewall Hits levels.

So no, I am not seeing any performance hit due to IPS with CU198.

I am only running Emerging Threats ruleset on my system.

I have a problem with my upgraded internet access (Vodafone Cable Internet) at the moment.
I get only 100 MBit/s, while 1000 MBit/s is provisioned. One aspect may be the connection at higher frequencies. The provider is investigating this.

After following this thread I disabled IPS, and now I get 400 MBit/s. The remaining shortfall could be explained by the physical problems.

:thinking: Does this result show up in speedtest-cli?

Yes.
But my concern isn’t the absolute value (cable problem, reliability of the measured speed), but the relative step with suricata on/off.

You are going to have to look in your kernel and suricata logs to see if you can see what is triggering that, as I don’t experience that at all, so I have no idea what could be causing it.

Bernhard, I’m seeing that step change as well from an on/off perspective. As I said, this only started after CU198 was installed; nothing else has changed. Just to recap, the IPS throughput graph has changed from roughly 10:1 offloaded vs scanned to almost exactly the other way around. I also did some simple testing with iperf3 during the night (everyone in bed and nothing really happening on the network). The connection to the ISP was disconnected to remove internet traffic from the equation:

  • IPS off on every interface: iperf3 on green (over 10 Gbps) consistently generates a bitrate north of 9.4 Gbps (this is normal)
  • IPS activated only on the green interface: iperf3 consistently generates a ~900 Mbps bitrate (note that 900 Mbps is normal for a 1 Gbps link)
  • ethtool confirms that the connections on both IPFire and the destination server are at 10 Gbps
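The link-speed check in the last bullet can be scripted so that it is easy to repeat on both ends. A minimal sketch: link_speed_mbps is a hypothetical helper, the sample line only imitates the "Speed:" line that ethtool prints, and the interface name green0 in the comment is an assumption.

```shell
#!/bin/sh
# link_speed_mbps: read `ethtool IFACE` output on stdin and print the
# negotiated speed in Mb/s (hypothetical helper).
link_speed_mbps() {
    awk '/Speed:/ { sub(/Mb\/s/, "", $2); print $2 }'
}

# Sample line in the format ethtool uses for a 10 Gbps link.
printf '\tSpeed: 10000Mb/s\n' | link_speed_mbps

# On the firewall or the server itself (interface name is an assumption):
#   ethtool green0 | link_speed_mbps
# The throughput runs themselves would be, for both directions:
#   iperf3 -c SERVER        # firewall sends
#   iperf3 -c SERVER -R     # reverse mode, server sends
```

If both ends report 10000 here while iperf3 still tops out near 900 Mbps with IPS on, the negotiated link speed can be ruled out as the limit.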

Like I said above, 900 Mbps is what I see as normal throughput between two 1 Gbps connections. It’s as if suricata tells the 10 Gbps NIC to push only 1 Gbps max instead of 10 Gbps when it is activated on that network. Very weird. I’m also trying to understand whether the following is relevant: “11.5. High Performance Configuration” in the Suricata 9.0.0-dev documentation.

Also, maybe suricata doesn’t handle iperf well.

In the end, something changed in CU198 that’s causing this.

The article cited focuses only on NICs and their drivers.

I can see high CPU usage during the speedtest, which is based on x parallel iperf3 tasks.
Therefore I checked with htop.
Each iperf3 task shows high CPU usage (~12%).
If I enable IPS, a couple of suricata tasks show up with a CPU usage of 90% - 135%.
This shows that suricata isn’t able to manage the amount of packets.
I have enabled no rules from Talos.
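The htop observation above can also be captured as text for comparing runs. A rough sketch, assuming ps from procps and pidof are available on the firewall; sum_cpu is a hypothetical helper and the two sample values are illustrative only. Note that ps reports an average over each thread's lifetime, not the instantaneous figure htop shows, so the numbers will not match htop exactly.

```shell
#!/bin/sh
# sum_cpu: add up one %CPU value per line from stdin (hypothetical helper).
sum_cpu() {
    awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Illustrative per-thread values, one per line.
printf '90.0\n135.0\n' | sum_cpu

# On the firewall, one %CPU value per suricata thread:
#   ps -L -o pcpu= -p "$(pidof suricata)" | sum_cpu
```

Running the commented command once with IPS rules enabled and once without gives two totals that can be compared directly.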

I’m seeing the CPU hit as well, but the drop in NIC throughput looks too coincidental. I’m going to build a much faster test machine and see if it can handle it better. For the record, the current machine is an i5-6500 (4 cores, 3.2/3.4 GHz) with 8 GB RAM, PCI3.0, an Intel I350 Gigabit card for Orange and Blue, an Intel X540 10 Gigabit card for Green, and the onboard Intel I219 for Red, which is far above the listed minimum requirements for IPFire. IPS bottoms it all out when switched on, so IPFire may at the very least need to change its listed minimum requirements. However, I still think someone needs to look at what changed in CU198 with IPS that is causing an obvious drag that wasn’t there in CU197 and below.

I don’t think it is the NIC throughput. That would limit the bandwidth without IPS as well.
Switching IPS on (without activated rules!) produces the high load of suricata tasks. These aren’t able to transfer the network packets at the speed of the NIC.
Maybe a much more powerful CPU could handle this, but that cannot be the goal.
Another idea, which brings the dependency on the NIC up again, is that suricata must hand a lot of packet handling to the NIC to be efficient. That would be another case of possible deficiencies in suricata.

I don’t know what changed with the step to suricata 8.xx.
Because the IPFire CU coincided with the upgrade of my internet connection, I don’t know what the situation was before the CU.

I cannot reproduce this at all. I have tried with two separate systems.

I disabled IPS, then unchecked all rules for the emerging threats provider I am using, and then enabled IPS again. The suricata graph traffic went down and the memory consumption went down. This is not surprising, as running suricata without any rules enabled means very little is being scanned.

I could not see an increased high load of suricata tasks.

Do a fresh install of CU197, test that out with and without IPS and then upgrade to CU198 and re-do the tests. That will give you both sets of data with your new internet connection.

I did another test with the same results.
Lacking a spare system, I cannot easily downgrade my production system, but I will try at night over the next few days.

Fact is, even with no rules activated suricata scans all packets. This is documented in the graph.
But with that high a load suricata can’t fulfill the realtime conditions. suricata is part of the network stack between NIC and application. Realtime processes should not consume nearly all CPU power.
-> suricata 8.0.1 in IPFire CU198 is not usable (with arbitrary HW fulfilling the system requirements).

That is incorrect. The traffic goes through suricata but nothing is scanned if there are no rules to be used for the scanning.

Certainly with my current LWL Mini Appliance this is not happening.

I have taken various screen shots after I ran for a while with all rules disabled and then turned the rules back on again.

This is the memory graph and you can see that it went down when all the rules were disabled and when they were enabled again it went back up.

This is the cpu usage graph. The peak circled in red is the cpu usage from restarting of suricata after disabling all the rules. The peak circled in black is the cpu usage from restarting suricata after enabling all my defined rules again.
In between is the cpu usage from when traffic was going through suricata but not being scanned as there are no rules for the traffic to be scanned against. After the black circled peak and before the red circled peak is the cpu usage when the defined rules were enabled and the traffic being scanned against it. It is all very low.

This is the screenshot of the suricata traffic graph. The red circled portion shows the traffic during the time all the rules were disabled. The peak in the section to the right, after the rules were enabled again and suricata restarted, is from when I was watching a YouTube video. That maxed out at 80 Kbps. While the rules were disabled it was between 10 and 20 Kbps.

This screenshot shows when the change was made from CU197 to CU198. The total traffic rate stayed the same. What changed was the proportion that suricata-8 is able to scan, whereas suricata-7 had to ignore it and offload it as it couldn’t do anything with it.
This is an improvement in suricata-8, which is now able to scan more protocols than it could in the past. So to me the fact that more can be scanned and doesn’t need to be offloaded is a positive situation. The bulk of the total traffic, with both CUs, is around 90 to 100 Kbps.

I just had a short peak of traffic which got up to 11 Mbps and the cpu usage just about got up to 10%.

I will try to find some time over the next day or so to test out CU198 with my old LWL Mini Appliance that was using the APU4 board.

I can see the issues only with an iperf3 speedtest to the WAN.
My speed is 1000 MBit/s.

If I whitelist the iperf server, I can measure ~950 MBit/s.
The peaks in the IPS graph change from red to green.
The CPU% in htop remains at a moderate level.

If no rule categories or rules are selected, the only active rules are the 13 ‘local rules’. Could the problem be there?

Hi, I had a similar problem recently. After upgrading my internet modem, my internet connection was slowly but surely losing speed. On top of that I was losing connection with all DNS providers, and DHCP would totally die every 2 to 4 days.

I solved the problem by changing the mtu on red.

I found my red mtu with this command : tracepath -4 -b xxx.xxx.xxx.xxx (x=internet ip)

I modified the dhcp mtu setting in /var/ipfire/ethernet/settings with the result of tracepath (65535)

To apply the modified setting, /etc/init.d/network restart

Now it’s been 3 weeks since the modification and there have been no more slowdowns, DNS problems, or DHCP server deaths.

Hope this helps somehow. Good luck
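Reading the path MTU out of the tracepath output, as described in the steps above, can be sketched as follows. last_pmtu is a hypothetical helper, and the sample trace is made up for illustration (documentation addresses, invented timings); only the "pmtu N" fields follow the format tracepath prints.

```shell
#!/bin/sh
# last_pmtu: read tracepath output on stdin and print the last pmtu value
# seen, which is the one on the final "Resume:" line (hypothetical helper).
last_pmtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "pmtu") v = $(i + 1) }
         END { if (v != "") print v }'
}

# Made-up sample trace; real use would be: tracepath -4 -b HOST | last_pmtu
last_pmtu <<'EOF'
 1?: [LOCALHOST]                      pmtu 1500
 1:  192.0.2.1                                             0.412ms
 2:  192.0.2.1                                             0.398ms pmtu 1492
     Resume: pmtu 1492 hops 2 back 2
EOF
```

Taking the last value rather than the first matters because the first line only shows the local interface MTU, while later hops may lower it.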

What do you mean by internet ip?
And where is dhcp mtu?

Just for the record: like Bernhard, if I leave IPS on but disable the rules (snort) I get a slight improvement in the statistics, i.e.

- speedtest to the internet is now ~730 Mbps, where with rules activated it was ~630 Mbps and prior to CU198 it was ~930 Mbps

- iperf on a 10 Gbps link is now ~1.2 Gbps, where with the rules activated it was ~850 Mbps and prior to CU198 it was ~9.4 Gbps

Suricata 8.0.1 has issues on my env.

Bernhard, I think Loup means just run the command against an external address (for ease I used a Microsoft web address), i.e. tracepath -4 -b www.micro…….com. This gives a “pmtu” figure at the top of the trace, in my case 1492. You can check your red MTU by running “ip link show” and looking for the red0: line.

You can set the red0 MTU at the command line with “ip link set mtu 1492 dev red0”.

Firstly, this did nothing noticeable for me (my red MTU before was 1500).

Now, as far as making this change stick: in my /var/ipfire/ethernet/settings there are no MTU references, and I cannot set the forced DHCP MTU via setup, as it won’t let me touch the field because I have set the interface to PPP DIALUP. So I guess I have to use the above ip link set command in rc.local?
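The check-then-set sequence above can be sketched as two small helpers. Both names (current_mtu, mtu_fix_cmd) are hypothetical, the sample line only imitates what "ip link show red0" prints, and the script prints the command rather than running it. Whether rc.local is the right place for it on a PPP dial-up setup is not something this sketch answers.

```shell
#!/bin/sh
# current_mtu: read `ip link show IFACE` output on stdin and print its MTU
# (hypothetical helper).
current_mtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# mtu_fix_cmd IFACE CURRENT WANTED: print the ip command that would change
# the MTU, or nothing if CURRENT already equals WANTED (hypothetical helper).
mtu_fix_cmd() {
    [ "$2" = "$3" ] || echo "ip link set mtu $3 dev $1"
}

# Sample line imitating `ip link show red0`; real use: ip link show red0 | current_mtu
sample='2: red0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000'
cur=$(printf '%s\n' "$sample" | current_mtu)
mtu_fix_cmd red0 "$cur" 1492
```

With the values from this post (current 1500, path MTU 1492) it prints the same ip link set line quoted above; if the two already match, it prints nothing.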