Celeron J3160 capabilities with QoS

ag · 17 April 2024 18:35

Do you happen to use PPPoE on your IPFire system?

bloater99 · 17 April 2024 18:37

Hard to say for sure, because I’m estimating based on watching the utilization bouncing around “live” for the time of the speed test. I’d estimate with QoS on, roughly 50-75% and with QoS off, between 25-50%.

bloater99 · 17 April 2024 18:37

No, I do not use PPPoE.

bloater99 · 18 April 2024 11:07

Screenshots.

QoS OFF:

QoS ON:

Keep in mind this is just a moment that was captured. The values fluctuate quite a bit throughout the speed test.

I have seen ksoftirqd/1 get up to 13.8% with QoS OFF. Still significantly lower than the cpu usage of ksoftirqd/1 and ksoftirqd/2 with QoS ON.

bloater99 · 20 April 2024 12:36

Okay, I’ve just finished some pretty thorough testing of the QoS on my J3160-based IPFire. There is definitely a correlation with ksoftirqd and bandwidth. I artificially dropped bandwidths for Classes 203 (Web), and 210 (Default), then raised them in 25Mbps increments in between tests. I watched the top line in htop and picked out the highest cpu usage for ksoftirqd that showed, and recorded the average download speed reported by Waveform’s BufferBloat Test at:

Results:
QoS set to 25Mbps: download speed of 25Mbps measured at ~1% cpu usage on ksoftirqd
QoS set to 50Mbps: download speed of 48Mbps measured at ~5% cpu usage on ksoftirqd
QoS set to 75Mbps: download speed of 72Mbps measured at ~12% cpu usage on ksoftirqd
QoS set to 100Mbps: download speed of 95Mbps measured at 25% cpu usage on ksoftirqd
QoS set to 125Mbps: download speed of 116Mbps measured at 31% cpu usage on ksoftirqd
QoS set to 150Mbps: download speed of 133Mbps measured at 38% cpu usage on ksoftirqd
QoS set to 175Mbps: download speed of 150Mbps measured at 47% cpu usage on ksoftirqd
QoS set to 200Mbps: download speed of 158Mbps measured at 58% cpu usage on ksoftirqd
QoS set to 225Mbps: download speed of 161Mbps measured at 64% cpu usage on ksoftirqd
QoS set to 250Mbps: download speed of 166Mbps measured at 60% cpu usage on ksoftirqd
QoS set to 275Mbps: download speed of 177Mbps measured at 63% cpu usage on ksoftirqd
QoS set to 300Mbps: download speed of 191Mbps measured at 79% cpu usage on ksoftirqd.

Edit:

%bandwidth retained:

Bandwidth	%bandwidth retained
25	97%
50	95%
75	96%
100	95%
125	93%
150	88%
175	85%
200	79%
225	72%
250	66%
275	64%
300	64%
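
For anyone who wants to redo the arithmetic, here is a minimal sketch in Python that recomputes the retained percentage as measured speed divided by the configured QoS limit. The function names are my own, and the input is the rounded download speeds listed in the results, so a few rows come out a point or so different from the table (presumably the table was worked out from unrounded readings).

```python
# (QoS limit in Mbps, measured download speed in Mbps), copied from the results above
RESULTS = [
    (25, 25), (50, 48), (75, 72), (100, 95),
    (125, 116), (150, 133), (175, 150), (200, 158),
    (225, 161), (250, 166), (275, 177), (300, 191),
]


def retained_percent(limit_mbps, measured_mbps):
    """Share of the configured QoS limit that the speed test actually reached."""
    return 100.0 * measured_mbps / limit_mbps


def retention_table(results):
    """Return (limit, rounded retained percent) for every test run."""
    return [(limit, round(retained_percent(limit, measured)))
            for limit, measured in results]


if __name__ == "__main__":
    print("Bandwidth\t%bandwidth retained")
    for limit, pct in retention_table(RESULTS):
        print(f"{limit}\t{pct}%")
```

The trend is the same either way: retention stays above 90% up to 125Mbps and falls to roughly two thirds from 250Mbps upward.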

pike_it · 20 April 2024 16:35

Long story short: on this hardware, with this distro, using QoS management might expose a lack of computational power (or cache availability) for a smooth experience on an internet connection of around 350 Mbps.
According to the official site
https://kb.protectli.com/kb/bios-versions-for-the-vault/?intsrc=eu
the latest BIOS is BSW4L011.
I’m not aware of, or confident in, any BIOS update that could improve performance, given the age of the microarchitecture and the transient execution vulnerability gang (starting with Spectre), whose mitigations might account for part of the computational power loss.

According to Intel
https://www.intel.com/content/www/us/en/products/sku/91533/intel-celeron-processor-j3160-2m-cache-up-to-2-24-ghz/specifications.html
that Braswell processor has been out of support since mid 2022 and was launched at the beginning of 2016.
It’s nice silicon anyway… 4 physical cores. But the underlying Silvermont architecture leads to a “not top notch” IPC, which for Linux might still be an interesting plus.

bloater99 · 20 April 2024 19:47

Before I put the FW4B in service, I checked the BIOS and the version included was actually either the latest or perhaps newer than the latest version on their site. But like you said, it probably would not make much difference.

I am okay with just discontinuing use of QoS on this device. Not ideal, but not the end of the world either.

Thanks.

bloater99 · 21 April 2024 10:39

@ms
I wonder if over the years there have been some regressions in the IMQ driver? This blog post from 2019/2020 indicates that some updates to the IMQ driver allowed a lowly Atom processor (D525) to run QoS at 400Mbps. My much faster J3160 is falling short at half those speeds.

arne_f · 22 April 2024 12:20

IPFire hasn’t used IMQ for years. We have switched to IFB (we only kept the device name IMQ for compatibility reasons).

https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=0dfb3984d0c56164dc25dd1a9d1e2213da0aa142

bloater99 · 22 April 2024 22:38

So it sounds like IFB is supposed to be more efficient than IMQ. If an Atom D525 could handle 400Mbps under IMQ, shouldn’t a J3160 be able to handle 300Mbps under IFB?

ms · 24 April 2024 09:01

As Arne said, we do not use IMQ any more, but for some reason the device is still called imq0.

IPFire (if the NICs support it) handles packets on all processor cores. Usually the NICs have multiple queues and put certain flows into the same queue which is then connected to one processor core which handles those packets. IFB should also support this.

However, QoS itself has some limitations when it comes to calculating bandwidth and so on. For example, in order to determine what the current throughput of a class is, it needs to count all packets. That will have to be synchronised between multiple processor cores, because there is a chance that each processor only sees a fraction of the packets.

On CPUs that have large caches (L1 and L2) or those that support atomic operations, this should be very fast. On processors that are not connecting their CPU cores very well (one reason to do this deliberately is to save energy consumption), this will have a rather large performance impact. There is nothing that we can do about it, because we have to synchronise packet counters for the QoS to work.

The only option is to upgrade to a faster CPU - and faster doesn’t mean 64 core Intel Xeon Gold. Simply one that has sufficient bus bandwidth.

As Arne has mentioned, we have seen a huge drop in throughput (with or without QoS) on some Intel Atom processors. If I remember correctly anything after the N2600 was affected. And to be fair, these are designed for mobile applications consuming only 3W. The D525 is consuming closer to 15W. That saving has to come from somewhere.

bloater99 · 24 April 2024 10:21

Thank you for the reply, Michael. I am actually using a Celeron J3160, which is many times faster than the Atom D525. At the time you wrote that blog post I quoted above, you were able to run QoS at 390Mbps on an Atom D525. Hence my posts here: If a D525 can handle 390Mbps, why can’t a J3160 handle 300Mbps?

D525: 45nm, 13W, 1.1MB total cache, 398 Passmark score, launched in 2010
J3160: 14nm, 6W, 2.2MB total cache, 1265 Passmark score, launched in 2016

Thank you

ms · 24 April 2024 12:50

I am really not the biggest expert when it comes to CPU design. I just have a couple of assumptions that I use in my day to day job.

The D525 is a CPU of the Intel Atom series; the J3160 is an Intel Celeron series CPU, and those are deliberately made slower. Hence the cheaper price and hence the lower energy consumption. Although each single CPU core is faster, it might simply be sitting and waiting for the data to arrive from memory. Since the Celeron does not have HyperThreading, which allows the CPU to do something else while waiting for data, I think that could be the reason.

pike_it · 24 April 2024 15:35

The bold part might account for a huge share of the consumption drop. Shrinking the production process can lead to an astonishing drop in heat and power consumption.
The 250nm Intel Katmai was a sluggish, freezing (process) and overheating CPU. The 180nm Intel Coppermine was way faster, cooler and more stable (integrating the cache into the die played a big part in the performance improvement, it has to be said). That is just to name two “close” products that improved a lot after a production node change.

May I bet my chips on… SMT being axed on IPFire?
My personal guess is that the D525 (which has fewer cores) can be exploited better (pun intended) with SMT enabled, extracting slightly more computational power.
The J3160 does not support Hyper-Threading, but has 4 physical cores. However… since the D525 days IPFire has “grown”, also in… load.

ms · 24 April 2024 15:40

I have no idea what you are going on about, but that QoS benchmark back in 2019 was done with HyperThreading enabled.

bloater99 · 24 April 2024 17:09

Hyperthreading is less efficient than a physical core. My understanding is that a 4 core processor should outperform a 2 core/4 thread processor significantly, all other things being equal. So I don’t believe that the D525 using hyperthreading gives it any advantage over 4 physical cores, especially when the J3160 has the advantage in all other categories. I believe many articles have been published over the years, especially when the technologies were fresh.
Here is one such forum thread that covers this: 2-core/4-thread VS. 4-core/4-thread: do they perform nearly the same? | Tom's Hardware Forum

And some performance benchmarks of 1 core/0 thread vs 1 core/1 thread, vs 2 core/0 thread: iXBT Labs - How CPU Features Affect CPU Performance, Part 4 - Page 2: Tests, part 2, conclusions

I think this may be what is going on. Since 2019, IPFire has grown in scope, with more complex processes and kernels, and thus is a “heavier” OS than the IPFire of 2019. The D525 running the latest Core 185 would no doubt be even slower than my J3160 in QoS.

pike_it · 25 April 2024 10:33

TDP is a way to determine how much heat needs to be dissipated and roughly how much current will be needed to feed a CPU.
Without knowledge of the process node size or the underlying architecture, it’s simply not meaningful for evaluating computational power.
I mean…
https://www.cpubenchmark.net/compare/1024vs5294/Intel-Core2-Extreme-Q6850-vs-Intel-i5-1335U
9x more energy. 14% of the performance (or 86% less)
Yeah, I “made a point” by cheating quite a bit.
Let’s try something a bit closer.
https://www.cpubenchmark.net/compare/2332vs2792/Intel-i7-5960X-vs-Intel-i7-6950X
Same power consumption, same base frequency, same underlying microarchitecture (Haswell), but with a die shrink from the 22nm to the 14nm production node, which led to a bigger cache, more cores and a higher turbo speed. +27% in this benchmark.
Yes, that is close to the 25% you get from 2 extra cores on top of 8. However, I bet that at the same computational load the power consumption of the latter will be lower.
Last but not least: until the 8th gen Core, Intel was slacking quite a bit without any kind of competition from AMD. Since the Ryzen bell chimed in, things have changed. Quite a lot.

bloater99 · 25 April 2024 10:56

A more relevant cpu comparison:
https://www.cpubenchmark.net/compare/2772vs611/Intel-Celeron-J3160-vs-Intel-Atom-D525

The D525 has a single core score of 305 and a multi-core score of 398. In other words, using both cores and all 4 threads only gives it 30% more performance than using 1 core and no threads.

The J3160 single core score is 601 and multi-core score is 1265. 2.1 times the performance gained using all 4 cores.

Comparing the two cpus, the J3160 is twice as fast with only 1 core, and 3.2 times faster using all available cores/threads. Simply no way the D525 could ever outperform it in any apples-to-apples comparison.
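
The ratios above can be checked with a few lines of Python. This is only a sketch using the four Passmark scores quoted in this post; the dictionary layout and function names are made up for illustration.

```python
# Passmark scores quoted above (single-thread and multi-thread)
SCORES = {
    "D525": {"single": 305, "multi": 398},
    "J3160": {"single": 601, "multi": 1265},
}


def multi_scaling(cpu):
    """How much the multi-thread score gains over the single-thread score."""
    return SCORES[cpu]["multi"] / SCORES[cpu]["single"]


def j3160_speedup(metric):
    """J3160 score divided by D525 score for 'single' or 'multi'."""
    return SCORES["J3160"][metric] / SCORES["D525"][metric]


if __name__ == "__main__":
    for cpu in SCORES:
        print(f"{cpu}: multi/single = {multi_scaling(cpu):.2f}x")
    for metric in ("single", "multi"):
        print(f"J3160 vs D525 ({metric}): {j3160_speedup(metric):.2f}x")
```

Keep in mind these are synthetic benchmark scores, so they say nothing direct about packet handling under QoS.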

nickh · 25 April 2024 11:00

TDP figures are very hard to use. Have a look at these three processors - Intel Core i5-11400 @ 2.60GHz vs Intel Core i5-11500 @ 2.70GHz vs Intel Core i5-11600 @ 2.80GHz [cpubenchmark.net] by PassMark Software. They are all from the same family, just different speeds. Are you seriously going to believe that they all consume the same power? Manufacturers just bracket their CPUs into power bands so meaningful power comparisons are virtually impossible.

pike_it · 25 April 2024 12:13

Yes… and no. I mean.
speculation mode on
All x86_64 cpus are x86_64 cpus; however, not all are built/designed/architected in the same way.
So benchmarks are a “rough” way to assign a score that allows comparison; however, not all tasks are the same.

Calculating data is a task, transferring data is a task, and calculating a value after data has been delivered from the network and then sending it to storage is a task. All of them are roughly “work”, however they are not all the same.
99.9% of the time the D525 should eat lorries of dust from the J3160. However, a task or piece of software that is poorly optimized to use the full capabilities of the J3160 could turn out faster on the D525 purely because of… its higher clock, which is not that much higher (200 MHz) but still 12% faster on the D525.
Even though the D525 has half the single-thread rating of the J3160.

As a rule of thumb you’re indeed correct, @bloater99. But be prepared for the unexpected.
