IIRC this is actually a hardware bug in some revisions of e1000e compatible chipsets (i218/i219). I researched this when one of my customers had daily hangs because of this.
The solution? Swap out your NICs. Even cheapo Realtek ones are better than hangs. This problem has been present for a loooong time (not only in IPfire of course) and it must be fixed upstream.
Unfortunately, the bug is still present in Core 160.
About three days after a restart following the installation of version 160, the green network card hung again.
The usual command gets the network card running again without problems until the next restart; after that I have to enter the command again so that it keeps running until the following restart.
Maybe one day we could report the bug.
e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:48 kernel: TDH <c8>
00:03:48 kernel: TDT <e9>
00:03:48 kernel: next_to_use <e9>
00:03:48 kernel: next_to_clean <c8>
00:03:48 kernel: buffer_info[next_to_clean]:
00:03:48 kernel: time_stamp <1017eeb1c>
00:03:48 kernel: next_to_watch <c8>
00:03:48 kernel: jiffies <1017eebc0>
00:03:48 kernel: next_to_watch.status <0>
00:03:48 kernel: MAC Status <40080083>
00:03:48 kernel: PHY Status <796d>
00:03:48 kernel: PHY 1000BASE-T Status <3800>
00:03:48 kernel: PHY Extended Status <3000>
00:03:48 kernel: PCI Status <10>
00:03:50 kernel: e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:50 kernel: TDH <c8>
00:03:50 kernel: TDT <e9>
00:03:50 kernel: next_to_use <e9>
00:03:50 kernel: next_to_clean <c8>
00:03:50 kernel: buffer_info[next_to_clean]:
00:03:50 kernel: time_stamp <1017eeb1c>
00:03:50 kernel: next_to_watch <c8>
00:03:50 kernel: jiffies <1017eec88>
00:03:50 kernel: next_to_watch.status <0>
00:03:50 kernel: MAC Status <40080083>
00:03:50 kernel: PHY Status <796d>
00:03:50 kernel: PHY 1000BASE-T Status <3800>
00:03:50 kernel: PHY Extended Status <3000>
00:03:50 kernel: PCI Status <10>
00:03:52 kernel: e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
00:03:56 kernel: e1000e 0000:00:19.0 green0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
00:03:58 kernel: e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:58 kernel: TDH <0>
00:03:58 kernel: TDT <9>
00:03:58 kernel: next_to_use <9>
00:03:58 kernel: next_to_clean <0>
00:03:58 kernel: buffer_info[next_to_clean]:
00:03:58 kernel: time_stamp <1017eeee9>
00:03:58 kernel: next_to_watch <0>
00:03:58 kernel: jiffies <1017eefa8>
00:03:58 kernel: next_to_watch.status <0>
00:03:58 kernel: MAC Status <40080083>
00:03:58 kernel: PHY Status <796d>
00:03:58 kernel: PHY 1000BASE-T Status <3800>
00:03:58 kernel: PHY Extended Status <3000>
00:03:58 kernel: PCI Status <10>
00:04:00 kernel: e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
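The “usual command” mentioned above is not spelled out. A workaround frequently cited for e1000e unit hangs (an assumption here, not necessarily the exact command used in this case) is to disable segmentation offloading on the affected interface with ethtool, roughly:

# disable TCP/generic segmentation offload and generic receive offload on green0
ethtool -K green0 tso off gso off gro off
# some reports also disable scatter-gather and TX checksumming
ethtool -K green0 sg off tx off

These settings are not persistent across reboots, which would match the observation that the command has to be repeated after every restart.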
I don't think that this is a bug. I run systems from the old Intel 82576 up to the X740 and don't have unit hangs at all. Basic info is missing here: What NIC are we talking about? Is that NIC onboard or an add-on card? Is the system's firmware up to date?
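To answer those questions on the affected system, a quick sketch of standard commands (green0 is taken from the log above; adapt as needed):

# show the NIC model and PCI IDs
lspci -nn | grep -i ethernet
# show the driver, driver version and NIC firmware version bound to green0
ethtool -i green0
# check the kernel log for e1000e messages
dmesg | grep e1000e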
If you google “e1000e hangs”, you will see that this has been an extremely common problem in the Linux world for many years. It must be fixed upstream. Discussion here won't help. If you have the problem, swap out your NICs.
The hardware bug I was referring to was likely this:
Look for “82573(V/L/E) TX Unit Hang Messages” (the driver is old, the error messages have changed).
That fix didn't help in my case, though. Ethtool commands didn't help, either. It's either a hardware or a driver bug (or a combination). Swap out your NICs!
The cards you have mentioned don't use the e1000e driver, they use the igb driver. This is a driver/hardware bug, and OP has indicated in the title that it's about the e1000e driver.
The e1000e, or at least some hardware revisions using that driver, has been buggy under Linux for, like, 10 years. It won't be fixed in ipfire, it can't be fixed in ipfire, it must be fixed upstream. Swap out your network cards.
(btw if this reappeared in 158 it's likely because of the linux-firmware 20210511 that was in that update, but it doesn't matter, as e1000e has been more or less broken for ten years, and hunting this bug downstream is a fool's errand. It must be fixed upstream. It hasn't reliably been fixed upstream in ten years. Do the math)
Atm I run multiple 82572EI cards in 5 different systems with no NIC hangs at all, 24/7.
Privately, I had problems in the past with cheap Chinese clone cards that died or crashed because of thermal issues, especially with X540/550 cards. When I found that out, I improved the cooling. Since then, no problems any more.