E1000e green0 - Detected Hardware Unit Hang

After 1 month and 6 days of uptime, the error reappeared.

Even after the last update, the network card error is still there:

17:19:27	kernel: 	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
17:19:27	kernel: 	TDH <33>
17:19:27	kernel: 	TDT <56>
17:19:27	kernel: 	next_to_use <56>
17:19:27	kernel: 	next_to_clean <33>
17:19:27	kernel: 	buffer_info[next_to_clean]:
17:19:27	kernel: 	time_stamp <112f01cb3>
17:19:27	kernel: 	next_to_watch <37>
17:19:27	kernel: 	jiffies <112f01d30>
17:19:27	kernel: 	next_to_watch.status <0>
17:19:27	kernel: 	MAC Status <40080083>
17:19:27	kernel: 	PHY Status <796d>
17:19:27	kernel: 	PHY 1000BASE-T Status <3800>
17:19:27	kernel: 	PHY Extended Status <3000>
17:19:27	kernel: 	PCI Status <10>
17:19:29	kernel: 	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
17:19:29	kernel: 	TDH <33>
17:19:29	kernel: 	TDT <56>
17:19:29	kernel: 	next_to_use <56>
17:19:29	kernel: 	next_to_clean <33>
17:19:29	kernel: 	buffer_info[next_to_clean]:
17:19:29	kernel: 	time_stamp <112f01cb3>
17:19:29	kernel: 	next_to_watch <37>
17:19:29	kernel: 	jiffies <112f01df8>
17:19:29	kernel: 	next_to_watch.status <0>
17:19:29	kernel: 	MAC Status <40080083>
17:19:29	kernel: 	PHY Status <796d>
17:19:29	kernel: 	PHY 1000BASE-T Status <3800>
17:19:29	kernel: 	PHY Extended Status <3000>
17:19:29	kernel: 	PCI Status <10>
17:19:31	kernel: 	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
17:19:31	kernel: 	TDH <33>
17:19:31	kernel: 	TDT <56>
17:19:31	kernel: 	next_to_use <56>
17:19:31	kernel: 	next_to_clean <33>
17:19:31	kernel: 	buffer_info[next_to_clean]:
17:19:31	kernel: 	time_stamp <112f01cb3>
17:19:31	kernel: 	next_to_watch <37>
17:19:31	kernel: 	jiffies <112f01ec0>
17:19:31	kernel: 	next_to_watch.status <0>
17:19:31	kernel: 	MAC Status <40080083>
17:19:31	kernel: 	PHY Status <796d>
17:19:31	kernel: 	PHY 1000BASE-T Status <3800>
17:19:31	kernel: 	PHY Extended Status <3000>
17:19:31	kernel: 	PCI Status <10>
17:19:33	kernel: 	e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
17:19:36	kernel: 	e1000e 0000:00:19.0 green0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
17:22:09	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=65 TOS=0x00 PREC=0x00 TTL=63 ID=47278 DF PROTO=UDP SPT=6843 DPT=53 LEN=45
17:22:09	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=65 TOS=0x00 PREC=0x00 TTL=63 ID=47283 DF PROTO=UDP SPT=47422 DPT=53 LEN=45
17:22:10	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=67 TOS=0x00 PREC=0x00 TTL=63 ID=47295 DF PROTO=UDP SPT=45043 DPT=53 LEN=47
17:22:13	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=65 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=57595 DPT=53 LEN=45
17:22:13	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=65 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=45694 DPT=53 LEN=45
17:22:24	kernel: 	FORWARDFW IN=blue0 OUT=green0 MAC=00:13:3b:2f:25:eb:40:0e:85:58:a2:e5:08:00 SRC=192.168.0.52 DST=192.168.1.16 LEN=69 TOS=0x00 PREC=0x00 TTL=63 ID=48690 DF PROTO=UDP SPT=21372 DPT=53 LEN=49
17:26:09	kernel: 	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
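
For anyone reading the dump above: the values in angle brackets are hexadecimal. Below is a rough sketch of the arithmetic, assuming TDH and TDT are the head and tail indices of the transmit descriptor ring and that the ring has 256 entries. The ring size is an assumption, since it is not shown in the log (ethtool -g should report the real value), and the function name hang_summary is made up for illustration.

```shell
#!/bin/sh
# hang_summary TDH TDT TIME_STAMP JIFFIES [RING_SIZE]
# The first four arguments are hex strings as printed in the kernel log,
# without the angle brackets. RING_SIZE defaults to 256 (an assumption).
hang_summary() (
    tdh=$((0x$1))
    tdt=$((0x$2))
    stamp=$((0x$3))
    now=$((0x$4))
    ring=${5:-256}
    # Descriptors queued by the driver but not yet completed by the NIC;
    # the modulo handles the case where the tail index has wrapped around.
    pending=$(( (tdt - tdh + ring) % ring ))
    # How long the oldest pending buffer has been waiting, in jiffies.
    stalled=$(( now - stamp ))
    echo "pending_descriptors=$pending stalled_jiffies=$stalled"
)

# Values from the first dump at 17:19:27
hang_summary 33 56 112f01cb3 112f01d30
```

For the first dump this prints pending_descriptors=35 stalled_jiffies=125. In the following dumps time_stamp stays the same while jiffies grows by 0xc8 (200) every 2 seconds, which would suggest about 100 jiffies per second on this kernel and that the same buffer stays stuck until the adapter is reset.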

What IPFire version are you running?

At the bottom of the WebUI page https://ipfire.localdomain:444/cgi-bin/index.cgi I see:

IPFire 2.27 (x86_64) - Core Update 159

It’s running Core Update 159.

Sorry - I moved this to a new topic since the old one was “solved”

Did you update recently? About when?

The 159 update came out 1 month and 6 days ago.

It ran fine until now, and then the green network card hung again.

I have now entered the command ethtool -K eth0 gso off gro off tso off again.
So far that has always kept it running until the next restart without the network card hanging.
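
A minimal sketch of that workaround as a reusable function, for anyone who wants to try it. Note that the kernel log in this thread names the hanging interface green0, so the interface argument should match whatever name the system actually uses. The function name disable_offloads and the DRY_RUN switch are made up for illustration; only the ethtool -K call itself comes from the post above.

```shell
#!/bin/sh
# disable_offloads IFACE
# Turns off GSO, GRO and TSO on one interface using ethtool -K.
# With DRY_RUN=1 the command is only printed, not executed
# (DRY_RUN is a hypothetical switch added for this sketch).
disable_offloads() (
    iface=${1:?usage: disable_offloads IFACE}
    set -- ethtool -K "$iface" gso off gro off tso off
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
)

# Print the command for green0 without running it
DRY_RUN=1 disable_offloads green0
```

Running ethtool -k green0 (lowercase k) afterwards lists the current offload settings, which is a quick way to confirm the change took effect. The setting does not survive a reboot, so it would have to be reapplied from some boot-time script; where that belongs on IPFire is not covered in this thread.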

It is clearly a driver bug that is still there.

It certainly sounds like one! Please add a bug report. (feel free to ask questions if you have trouble entering the report)

Information to add a bug report in IPFire Bugzilla:

The bug should already be known, just not fixed yet.

I searched bugzilla for e1000e (which I think is your device) and didn’t notice a report.

It would probably be good to have a new report since this is a new kernel (and hopefully a new driver).

IIRC this is actually a hardware bug in some revisions of e1000e compatible chipsets (i218/i219). I researched this when one of my customers had daily hangs because of this.

The solution? Swap out your NICs. Even cheapo Realtek ones are better than hangs. This problem has been present for a loooong time (not only in IPFire, of course) and it must be fixed upstream.

I have been planning to switch to 10 Gbit network cards for some time, but not yet.

I don’t believe it is a hardware bug, because an IPFire update caused this error.

If I enter that command, it runs without problems until the next restart, so it is a software bug … driver etc.

That’s my upgrade option:

Chipset
Intel XL710-BM1

Unfortunately, the bug is still present in Core Update 160.

About 3 days after the restart that followed installing Core Update 160, the green network card hung again.

The usual command lets the network card run without problems until the next restart; then I have to enter it again so that it runs until the restart after that.

Maybe one day we could report the bug.


e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:48	kernel:	TDH <c8>
00:03:48	kernel:	TDT <e9>
00:03:48	kernel:	next_to_use <e9>
00:03:48	kernel:	next_to_clean <c8>
00:03:48	kernel:	buffer_info[next_to_clean]:
00:03:48	kernel:	time_stamp <1017eeb1c>
00:03:48	kernel:	next_to_watch <c8>
00:03:48	kernel:	jiffies <1017eebc0>
00:03:48	kernel:	next_to_watch.status <0>
00:03:48	kernel:	MAC Status <40080083>
00:03:48	kernel:	PHY Status <796d>
00:03:48	kernel:	PHY 1000BASE-T Status <3800>
00:03:48	kernel:	PHY Extended Status <3000>
00:03:48	kernel:	PCI Status <10>
00:03:50	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:50	kernel:	TDH <c8>
00:03:50	kernel:	TDT <e9>
00:03:50	kernel:	next_to_use <e9>
00:03:50	kernel:	next_to_clean <c8>
00:03:50	kernel:	buffer_info[next_to_clean]:
00:03:50	kernel:	time_stamp <1017eeb1c>
00:03:50	kernel:	next_to_watch <c8>
00:03:50	kernel:	jiffies <1017eec88>
00:03:50	kernel:	next_to_watch.status <0>
00:03:50	kernel:	MAC Status <40080083>
00:03:50	kernel:	PHY Status <796d>
00:03:50	kernel:	PHY 1000BASE-T Status <3800>
00:03:50	kernel:	PHY Extended Status <3000>
00:03:50	kernel:	PCI Status <10>
00:03:52	kernel:	e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
00:03:56	kernel:	e1000e 0000:00:19.0 green0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
00:03:58	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:58	kernel:	TDH <0>
00:03:58	kernel:	TDT <9>
00:03:58	kernel:	next_to_use <9>
00:03:58	kernel:	next_to_clean <0>
00:03:58	kernel:	buffer_info[next_to_clean]:
00:03:58	kernel:	time_stamp <1017eeee9>
00:03:58	kernel:	next_to_watch <0>
00:03:58	kernel:	jiffies <1017eefa8>
00:03:58	kernel:	next_to_watch.status <0>
00:03:58	kernel:	MAC Status <40080083>
00:03:58	kernel:	PHY Status <796d>
00:03:58	kernel:	PHY 1000BASE-T Status <3800>
00:03:58	kernel:	PHY Extended Status <3000>
00:03:58	kernel:	PCI Status <10>
00:04:00	kernel:	e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:

Yes! That would definitely help. Please add a bug report.

Feel free to ask questions if you have trouble entering the report.
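
When filing such a report, it may help to state how often the hang occurs. The following is a small sketch that counts hang reports and adapter resets in kernel log text read from standard input. The function name count_hangs is made up for illustration, and the two search strings are simply the messages seen in the logs posted in this thread.

```shell
#!/bin/sh
# count_hangs: reads kernel log text on stdin and prints how many
# "Detected Hardware Unit Hang" and "Reset adapter unexpectedly"
# lines it contains.
count_hangs() (
    log=$(cat)
    hangs=$(printf '%s\n' "$log" | grep -c 'Detected Hardware Unit Hang')
    resets=$(printf '%s\n' "$log" | grep -c 'Reset adapter unexpectedly')
    echo "hangs=$hangs resets=$resets"
)

# Example with three lines taken from the log in this thread
count_hangs <<'EOF'
00:03:50 kernel: e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
00:03:52 kernel: e1000e 0000:00:19.0 green0: Reset adapter unexpectedly
00:03:58 kernel: e1000e 0000:00:19.0 green0: Detected Hardware Unit Hang:
EOF
```

The example prints hangs=2 resets=1. On a live system the input would be the kernel log file; its path depends on the system and is not given in this thread.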

Information to add a bug report in IPFire Bugzilla:

Okay, I have reported the bug.

I don’t think that this is a bug. I run systems with everything from the old Intel 82576 to the X740 and don’t have unit hangs at all. Basic information is missing here. What NIC are we talking about? Is that NIC onboard or an add-on card? Is the system’s firmware up to date?

If you all google “e1000e hangs”, you will see that this is an extremely common problem in the Linux world for many years. It must be fixed upstream. Discussion here won’t help. If you have the problem, swap out your NICs.

The hardware bug I was referring to was likely this:

Look for “82573(V/L/E) TX Unit Hang Messages” (the driver is old, the error messages have changed)
That fix didn’t help in my case, though. Ethtool commands didn’t help, either. It’s either a hardware or a driver bug (or a combination). Swap out your NICs!

The cards you have mentioned don’t use the e1000e driver, they use the igb driver. This is a driver/hardware bug, and OP has indicated in the title that it’s about the e1000e driver.
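
To settle which driver a given interface actually uses, something like the sketch below can help. It assumes the usual Linux sysfs layout, where /sys/class/net/IFACE/device/driver is a symlink to the bound driver. The function name nic_driver and the optional second argument (an alternative sysfs root, useful only for testing) are made up for illustration.

```shell
#!/bin/sh
# nic_driver IFACE [SYSFS_ROOT]
# Prints the name of the kernel driver bound to a network interface,
# e.g. e1000e or igb, by resolving the sysfs "driver" symlink.
# SYSFS_ROOT defaults to /sys.
nic_driver() (
    iface=${1:?usage: nic_driver IFACE [SYSFS_ROOT]}
    root=${2:-/sys}
    link="$root/class/net/$iface/device/driver"
    if [ ! -e "$link" ]; then
        echo "no driver link found for $iface" >&2
        exit 1
    fi
    basename "$(readlink -f "$link")"
)

# Example (on the affected system): nic_driver green0
```

The same information is also reported by ethtool -i IFACE in its driver line.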

You all should go hang out in the kernel.org bugzilla instead.
https://bugzilla.kernel.org/show_bug.cgi?id=47331
Do note that this was reported in 2012.

The e1000e, or at least some hardware revisions using that driver, has been buggy under Linux for, like, 10 years. It won’t be fixed in IPFire, it can’t be fixed in IPFire, it must be fixed upstream. Swap out your network cards.

(btw if this reappeared in 158 it’s likely because of the linux-firmware 20210511 that was in that update, but it doesn’t matter, as e1000e has been more-or-less broken for ten years, and hunting this bug downstream is a fool’s errand. It must be fixed upstream. It hasn’t reliably been fixed upstream for ten years. Do the math)

At the moment I run multiple 82572EI cards in 5 different systems 24/7 and have no NIC hangs at all.

Privately, I had problems in the past with cheap Chinese clone cards that died or crashed because of thermal issues, especially with X540/550 cards. When I found out, I improved the cooling. Since then, no problems any more.

That’s my system:

https://fireinfo.ipfire.org/profile/17d94e763f9ebbd470a3e57866faffd5b7caf12f

Network

Intel Corporation ‐ 82574L Gigabit Network Connection (e1000e)

Network

Intel Corporation ‐ 82579LM Gigabit Network Connection (e1000e)

https://www.supermicro.com/products/motherboard/Xeon/C202_C204/X9SCL.cfm

On-Board Devices
Chipset: Intel® C202 PCH chipset
SATA: 6x SATA2 (3Gb/s) w/ RAID 0, 1, 5, 10
Network Controllers: Intel® 82579LM and 82574L, 2x Gigabit LAN ports

  • Virtual Machine Device Queues High efficient I/O overhead
  • Supports 10BASE-T, 100BASE-TX, and 1000BASE-T, RJ45 output

Is the BIOS up to date? There is an update from 2021 that fixes: “1. Fixed problem of system hanging when entering setup with system date year 2021.”

There is nothing else you can do about it except use other NICs. But first I would swap the NICs to see whether this is really hardware related.