Throughput green<->red "slow" (one core?)

Hi everybody,
I’ve been using IPFire in my home lab for a few years now, and so far I’ve been quite happy with the stability, performance and overall experience.

Nevertheless, as technology evolves, so does infrastructure, and I finally got fiber installed. While IPFire easily handled the old 1/0.5 GBit/s down/up WAN, it now struggles to get anywhere near the new speeds when routing between network interfaces.

I did run several iperf3 tests:

  • IPFire to a server on GREEN:

[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  27.4 GBytes  23.5 Gbits/sec                  receiver
  • Machine on GREEN to IPFire:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  27.3 GBytes  23.5 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  27.3 GBytes  23.5 Gbits/sec                  receiver
  • IPFire to WAN:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  17.0 GBytes  14.6 Gbits/sec  43175             sender
[  5]   0.00-10.00  sec  17.0 GBytes  14.6 Gbits/sec                  receiver

So, everything runs smoothly when not routing between interfaces. Checking top on IPFire shows that at least three cores of the Ryzen 5 5600G are in use while the above iperf3 is running.

Now, if I run the speed test from a machine on GREEN to RED through IPFire (so, routing between interfaces), I get about 1/10 of the WAN speed. AND: the IPFire host maxes out one CPU core:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.04 GBytes  1.75 Gbits/sec   36             sender
[  5]   0.00-10.00  sec  2.04 GBytes  1.75 Gbits/sec                  receiver
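
For reference, this is roughly how to watch where the load lands (a sketch; mpstat comes from the sysstat package, which may need to be installed separately, and top's per-CPU view works just as well):

# Per-CPU utilization once per second; watch the %soft (softirq) column.
# One core pegged in %soft while the rest idle means receive processing
# is stuck on a single queue/CPU.
mpstat -P ALL 1

# Alternative without extra packages: run top and press "1" for the
# per-CPU view, then watch the "si" column while iperf3 is running.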

Is it an IRQ issue? Hardware / board not fitting the SFP28 network? All interfaces are multi-queue capable, and ethtool confirms that:

[root@ipfire ~]# ethtool -l red0    # green0 reports the same
Channel parameters for red0:
Pre-set maximums:
RX:		74
TX:		74
Other:		n/a
Combined:	120
Current hardware settings:
RX:		0
TX:		0
Other:		n/a
Combined:	3
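
Since only 3 combined channels are active out of a possible 120, one thing that looks worth trying is raising the channel count so the queues can spread across more cores, along these lines (a sketch; the value is just an example and must stay below the pre-set maximum):

# Raise the number of combined RX/TX queues (example value):
ethtool -L red0 combined 8
ethtool -L green0 combined 8

# Verify the new settings:
ethtool -l red0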

Any hints on how to investigate further / what to look for?

(And is there a roadmap for IPv6? I now get a full range of IPv6 addresses with my fiber…)

Welcome to the community!

What speed should you be getting? I don’t see that you mentioned the ISP advertised speed.

Hi there, thanks!

Well, the advertised speed is 25 GBit/s. The RED/GREEN network adapter is a Broadcom SFP28 dual-port card. As shown above, IPFire itself achieves around 13-20 GBit/s in iperf3 to a speedtest server on the WAN. As this is already quite fast ;), I would be happy with that, but dropping to not even 2 GBit/s of routed throughput is too sad… That said, I would also be happy with a real 10 GBit/s of throughput, as my network is a mix of 2.5/10/25 GBit devices.

So the issue shows up specifically when IPFire has to route between interfaces: from GREEN to RED or RED to GREEN, speed drops to approx. 10%…

What services are running? IPS? QoS? Proxy? VPN? If you haven’t already you might try shutting off all of these things and retesting.

I already disabled all services, except:

  • OpenVPN server, currently idling with no active clients
  • standard things like dhcpd, etc.
  • a few firewall rules, but none affecting GREEN-RED

… same numbers :pensive:

It seems as if this may have something to do with it? Do you have any spare NICs you could test, to separate RED and GREEN?

Well, there is another 10 GbE SFP+ card (Aquantia) in the machine, which shows the same “problem”: max 1.7 GBit/s throughput and 100% on one CPU core. I cannot really test another 25 GbE card, as the board runs the second PCIe slot at PCIe 3.0 x4…

What are you referring to? Are there known issues with dual-port adapters? I searched this forum before opening this thread and found nothing mentioning this…

Ah, the slot runs at PCIe 3.0 x16, and the Broadcom card is PCIe 3.0 x8 (BCM957414A4142CC). Are there any PCIe x4 25 GBit cards on the market?

No, it was just a troubleshooting step. Eliminating variables.

Took this input and did:

  • a complete reinstall of IPFire with reformatting
  • assigned GREEN to the 10 GbE adapter in the other PCIe slot, using a MikroTik 10 GbE SFP+ switch

Bad news: the problem still exists, on both slots and interfaces.

  • GREEN to IPFire over SFP+:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.42 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec                  receiver

As expected, this maxes out at the SFP+ / switch / interface limit.

But, GREEN to WAN:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.32 GBytes  1.99 Gbits/sec                  receiver

iperf3 from IPFire to the WAN still gives me between 14 and 20 GBit/s.

So, in the end, as soon as IPFire has to route between interfaces, it slows down to approx. 2 GBit/s on this hardware. Note this is a bare, fresh installation with only dhcpd set up and running.

There are only two things I can think of that might cause this: rate limits set in the iptables rules as they are executed, and the kernel network stack.

It being limited to one CPU is not the root issue; I think it has more to do with kernel buffers and their thresholds in the network stack settings. Rate limiting would have to be purposely invoked in iptables, and I doubt they are doing that, but it would be nice to find the file IPFire sets up iptables with, since they are not using the default kernel-defined paths to set up networking.

I think it's more a matter of kernel tuning, especially with interfaces at 10 Gb and above.

For instance, the softirq budget being too low causes the rate of incoming data to exceed the kernel's ability to drain the buffers fast enough. As a result, the NIC buffers overflow and traffic is lost.

The relevant parameter is netdev_budget, which defaults to 300 on all kernels. It should be at least 600 for the kind of application we are using.
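
A minimal sketch of how to bump it, and how to check whether the softirq budget is actually being exhausted (the third column of /proc/net/softnet_stat is the per-CPU "time_squeeze" counter, in hex; it increments whenever the softirq handler runs out of budget or time):

# Check for budget exhaustion: one row per CPU, values in hex;
# column 1 = packets processed, column 2 = drops, column 3 = time_squeeze.
cat /proc/net/softnet_stat

# Raise the per-softirq packet budget at runtime:
sysctl -w net.core.netdev_budget=600

# Persist the change across reboots:
echo "net.core.netdev_budget = 600" >> /etc/sysctl.conf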

The problem with setting parameters generically in /etc/sysctl.conf is that interfaces of one speed class will perform optimally while others perform poorly: if you tune for 10Gb-100Gb interfaces, anything below 10Gb can suffer bufferbloat, because the buffers are sized for the faster interfaces.

But this is just one of several parameters that get changed for fast commercial interfaces. There are others I could mention, but it would take some testing, because I imagine the right values will differ from the settings I know, which are specific to setting up a server for 1000 web sites.

I think the network adapter is the culprit. To be direct: the SFP adapter overheats. Check to see if it's heat related. My SFP+ modules get crazy hot. I know they are made that way, but less heat is always better.
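
If the optics support digital diagnostics (DDM/DOM), the module temperature can be read directly with ethtool, something like:

# Dump SFP module diagnostics (temperature, voltage, TX/RX power),
# if the module exposes them:
ethtool -m red0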


Well - did some more testing based on your inputs, thanks guys!

Checked the temps of the adapters: yes, they get hot, but the temperature under load is between 60 and 65 °C. I will probably give them an additional fan, but this is not the problem.

Did a lot of trial and error with several network parameters: increased netdev_budget and, among others, net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, and activated MTU probing.
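
For reference, the values were along these lines (illustrative numbers taken from common 10/25 GBit tuning guides, not an exact record of what I ended up with):

# /etc/sysctl.conf additions (example values; buffer sizes in bytes)
net.core.rmem_max = 67108864              # max socket receive buffer
net.core.wmem_max = 67108864              # max socket send buffer
net.ipv4.tcp_rmem = 4096 87380 67108864   # min/default/max TCP receive buffer
net.ipv4.tcp_wmem = 4096 65536 67108864   # min/default/max TCP send buffer
net.ipv4.tcp_mtu_probing = 1              # enable MTU probing
net.core.netdev_budget = 600              # packets per softirq cycle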

I also configured IRQ affinity as specified / recommended by Broadcom.
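
(The manual pinning boils down to something like this sketch; the IRQ number below is hypothetical and has to be read from /proc/interrupts on the actual box:)

# Find the IRQs belonging to the NIC queues:
grep red0 /proc/interrupts

# Pin one queue IRQ per core, e.g. hypothetical IRQ 45 to CPU 2:
echo 2 > /proc/irq/45/smp_affinity_list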

Throughput went up by around 1 GBit/s, so I was able to reach around 2.8 GBit/s GREEN->RED.

To pin the issue down to something in IPFire I don't understand (application developer here, not a network specialist), I tested the following:

  • Installed a plain Debian netinstall bare metal on the machine
  • Configured RED & GREEN in /etc/network/interfaces
  • Installed dnsmasq & iptables-persistent
  • Enabled IPv4 forwarding, added basic iptables rules (see the sketch after this list), configured DHCP on GREEN
  • Ran iperf3 from a server behind the MikroTik 20 GBit switch connected via SFP28, as above
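
The forwarding/NAT part was essentially this (a sketch with assumed interface names, enp1s0f0 = red and enp1s0f1 = green, not the literal config):

# Enable IPv4 forwarding persistently
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p

# Basic NAT + forwarding rules (interface names are assumptions;
# check "ip link" on the actual box)
iptables -t nat -A POSTROUTING -o enp1s0f0 -j MASQUERADE
iptables -A FORWARD -i enp1s0f1 -o enp1s0f0 -j ACCEPT
iptables -A FORWARD -i enp1s0f0 -o enp1s0f1 -m state --state RELATED,ESTABLISHED -j ACCEPT

# Save so iptables-persistent restores the rules at boot
netfilter-persistent save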

This router is now pure Debian on the same hardware, without any tweaking or services except SSH, and the problem seems gone / massively improved when running iperf3 from the server on GREEN:

 iperf3 -c speedtest.init7.net
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.99 GBytes  4.29 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  4.99 GBytes  4.29 Gbits/sec                  receiver
iperf3 -P4 -c speedtest.init7.net
[SUM]   0.00-10.00  sec  15.9 GBytes  13.7 Gbits/sec  1248             sender
[SUM]   0.00-10.00  sec  15.9 GBytes  13.7 Gbits/sec                  receiver

 iperf3 -P8 -c speedtest.init7.net
[SUM]   0.00-10.00  sec  15.9 GBytes  13.6 Gbits/sec  597             sender
[SUM]   0.00-10.00  sec  15.8 GBytes  13.6 Gbits/sec                  receiver

So I assume this has to do with something hidden within IPFire.

seems so … ?

Yes, that was expected. I'd have to look at one of the web servers at the data center to see what I set on them 22 years ago, but this is one of the parameters that does get changed. The value I gave came from Red Hat's proposed change to the mainline default for 2.5Gb interfaces, which was rejected because these settings are up to each distribution to settle on.

So I would crank this up to something like 2400 and see how a 2400-packet budget behaves.

There might be some concurrency tuning that systemd on Debian applies and IPFire doesn't (because it doesn't use systemd), but compare the /etc/sysctl.conf files, or even swap them, and see how they perform differently.
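
A quick way to compare the effective settings on both boxes, rather than only the config files (a sketch; the file names are just examples):

# On the Debian box: dump the live network-related kernel settings
sysctl -a 2>/dev/null | grep -E '^net\.(core|ipv4)' | sort > debian-net.txt

# Do the same on the IPFire box into ipfire-net.txt, copy one file over,
# then compare:
diff debian-net.txt ipfire-net.txt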

Also, I want to add that kernel versions might come into play when comparing IPFire with Debian.

Maybe; I'll look into it. It could be optimizations done after 6.6.59 that make the difference, but copy the network stack configuration file (/etc/sysctl.conf) over to the IPFire machine and see how the memory thresholds of the network stack behave in that configuration.

Debian is not a bad OS to use for internet purposes; however, it's common practice to remove dbus, cups and avahi on hosted machines, as they add attack surface and those modules are not used there. For the functions that were done via dbus, you would want password control instead of dbus being granted root for those operations.

Removing dbus is really a bad recommendation.

It leads to weird behavior on the system.

I tested your suggestion a few minutes ago on a Debian 12.9 VM.

Even a reboot leads to error messages!


I would have to see everything they do to prepare that, but they basically strip out all the plug-and-play stuff and the print server, and run init instead of systemd.