Instability on I225-V

Hello,

I’ve been having an intermittent issue for several months with my NIC dropping, seemingly at random. I have not yet found any metric that correlates with the issue. Load, temps and memory utilization on the system are all low.

I was wondering if there was anything else that I could try to improve the adapter’s stability.

Logs:

19:36:47 kernel:  igc 0000:01:00.0 green0: NETDEV WATCHDOG: CPU: 1: transmit queue 2 timed out 5498 ms
19:36:47 kernel:  igc 0000:01:00.0 green0: Register Dump
19:36:47 kernel:  igc 0000:01:00.0 green0: Register Name   Value
19:36:47 kernel:  igc 0000:01:00.0 green0: CTRL            181c0641
19:36:47 kernel:  igc 0000:01:00.0 green0: STATUS          40680683
19:36:47 kernel:  igc 0000:01:00.0 green0: CTRL_EXT        100000c0
19:36:47 kernel:  igc 0000:01:00.0 green0: MDIC            1805dde1
19:36:47 kernel:  igc 0000:01:00.0 green0: ICR             00000081
19:36:47 kernel:  igc 0000:01:00.0 green0: RCTL            04408022
19:36:47 kernel:  igc 0000:01:00.0 green0: RDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:47 kernel:  igc 0000:01:00.0 green0: RDH[0-3]        0000002c 00000024 000000fe 0000008b
19:36:47 kernel:  igc 0000:01:00.0 green0: RDT[0-3]        0000002b 00000023 000000fd 0000008a
19:36:47 kernel:  igc 0000:01:00.0 green0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
19:36:47 kernel:  igc 0000:01:00.0 green0: RDBAL[0-3]      ffffb000 ffffa000 ffff9000 ffff8000
19:36:47 kernel:  igc 0000:01:00.0 green0: RDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:47 kernel:  igc 0000:01:00.0 green0: TCTL            a503f0fa
19:36:47 kernel:  igc 0000:01:00.0 green0: TDBAL[0-3]      fffff000 ffffe000 ffffd000 ffffc000
19:36:47 kernel:  igc 0000:01:00.0 green0: TDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:47 kernel:  igc 0000:01:00.0 green0: TDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:47 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        00000097 0000007f 000000db 00000058
19:36:47 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        00000097 0000007f 000000db 00000058
19:36:47 kernel:  igc 0000:01:00.0 green0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
19:36:47 kernel:  igc 0000:01:00.0 green0: Reset adapter
19:36:47 kernel:  igc 0000:01:00.0 green0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
19:36:47 kernel:  igc 0000:01:00.0 green0: NIC Link is Down
19:36:48 kernel:  igc 0000:01:00.0 green0: Register Dump
19:36:48 kernel:  igc 0000:01:00.0 green0: Register Name   Value
19:36:48 kernel:  igc 0000:01:00.0 green0: CTRL            081c0641
19:36:48 kernel:  igc 0000:01:00.0 green0: STATUS          40680681
19:36:48 kernel:  igc 0000:01:00.0 green0: CTRL_EXT        100000c0
19:36:48 kernel:  igc 0000:01:00.0 green0: MDIC            18017949
19:36:48 kernel:  igc 0000:01:00.0 green0: ICR             00000001
19:36:48 kernel:  igc 0000:01:00.0 green0: RCTL            04408022
19:36:48 kernel:  igc 0000:01:00.0 green0: RDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:48 kernel:  igc 0000:01:00.0 green0: RDH[0-3]        00000000 00000000 00000000 00000000
19:36:48 kernel:  igc 0000:01:00.0 green0: RDT[0-3]        000000ff 000000ff 000000ff 000000ff
19:36:48 kernel:  igc 0000:01:00.0 green0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
19:36:48 kernel:  igc 0000:01:00.0 green0: RDBAL[0-3]      ffffb000 ffffa000 ffff9000 ffff8000
19:36:48 kernel:  igc 0000:01:00.0 green0: RDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:48 kernel:  igc 0000:01:00.0 green0: TCTL            a50400fa
19:36:48 kernel:  igc 0000:01:00.0 green0: TDBAL[0-3]      fffff000 ffffe000 ffffd000 ffffc000
19:36:48 kernel:  igc 0000:01:00.0 green0: TDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:48 kernel:  igc 0000:01:00.0 green0: TDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:48 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        00000000 00000000 00000009 00000000
19:36:48 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        0000000d 00000000 00000096 00000004
19:36:48 kernel:  igc 0000:01:00.0 green0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
19:36:48 kernel:  igc 0000:01:00.0 green0: Reset adapter
19:36:50 kernel:  igc 0000:01:00.0 green0: Register Dump
19:36:50 kernel:  igc 0000:01:00.0 green0: Register Name   Value
19:36:50 kernel:  igc 0000:01:00.0 green0: CTRL            081c0641
19:36:50 kernel:  igc 0000:01:00.0 green0: STATUS          40680681
19:36:50 kernel:  igc 0000:01:00.0 green0: CTRL_EXT        100000c0
19:36:50 kernel:  igc 0000:01:00.0 green0: MDIC            18017949
19:36:50 kernel:  igc 0000:01:00.0 green0: ICR             00000001
19:36:50 kernel:  igc 0000:01:00.0 green0: RCTL            04408022
19:36:50 kernel:  igc 0000:01:00.0 green0: RDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:50 kernel:  igc 0000:01:00.0 green0: RDH[0-3]        00000000 00000000 00000000 00000000
19:36:50 kernel:  igc 0000:01:00.0 green0: RDT[0-3]        000000ff 000000ff 000000ff 000000ff
19:36:50 kernel:  igc 0000:01:00.0 green0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
19:36:50 kernel:  igc 0000:01:00.0 green0: RDBAL[0-3]      ffffb000 ffffa000 ffff9000 ffff8000
19:36:50 kernel:  igc 0000:01:00.0 green0: RDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:50 kernel:  igc 0000:01:00.0 green0: TCTL            a50400fa
19:36:50 kernel:  igc 0000:01:00.0 green0: TDBAL[0-3]      fffff000 ffffe000 ffffd000 ffffc000
19:36:50 kernel:  igc 0000:01:00.0 green0: TDBAH[0-3]      00000000 00000000 00000000 00000000
19:36:50 kernel:  igc 0000:01:00.0 green0: TDLEN[0-3]      00001000 00001000 00001000 00001000
19:36:50 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        0000000f 00000006 00000018 0000001e
19:36:50 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        0000001a 00000006 00000018 0000003f
19:36:50 kernel:  igc 0000:01:00.0 green0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
19:36:50 kernel:  igc 0000:01:00.0 green0: Reset adapter
19:36:53 kernel:  igc 0000:01:00.0 green0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
19:37:13 kernel:  igc 0000:01:00.0 green0: NETDEV WATCHDOG: CPU: 1: transmit queue 2 timed out 5493 ms
19:37:13 kernel:  igc 0000:01:00.0 green0: Register Dump
19:37:13 kernel:  igc 0000:01:00.0 green0: Register Name   Value
19:37:13 kernel:  igc 0000:01:00.0 green0: CTRL            181c0641
19:37:13 kernel:  igc 0000:01:00.0 green0: STATUS          40680683
19:37:13 kernel:  igc 0000:01:00.0 green0: CTRL_EXT        100000c0
19:37:13 kernel:  igc 0000:01:00.0 green0: MDIC            1805dde1
19:37:13 kernel:  igc 0000:01:00.0 green0: ICR             00000081
19:37:13 kernel:  igc 0000:01:00.0 green0: RCTL            04408022
19:37:13 kernel:  igc 0000:01:00.0 green0: RDLEN[0-3]      00001000 00001000 00001000 00001000
19:37:13 kernel:  igc 0000:01:00.0 green0: RDH[0-3]        00000052 00000075 0000009e 00000078
19:37:13 kernel:  igc 0000:01:00.0 green0: RDT[0-3]        00000051 00000074 0000009d 00000077
19:37:13 kernel:  igc 0000:01:00.0 green0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
19:37:13 kernel:  igc 0000:01:00.0 green0: RDBAL[0-3]      ffffb000 ffffa000 ffff9000 ffff8000
19:37:13 kernel:  igc 0000:01:00.0 green0: RDBAH[0-3]      00000000 00000000 00000000 00000000
19:37:13 kernel:  igc 0000:01:00.0 green0: TCTL            a503f0fa
19:37:13 kernel:  igc 0000:01:00.0 green0: TDBAL[0-3]      fffff000 ffffe000 ffffd000 ffffc000
19:37:13 kernel:  igc 0000:01:00.0 green0: TDBAH[0-3]      00000000 00000000 00000000 00000000
19:37:13 kernel:  igc 0000:01:00.0 green0: TDLEN[0-3]      00001000 00001000 00001000 00001000
19:37:13 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        000000e1 000000a9 00000021 00000003
19:37:13 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        000000e1 000000a9 00000021 00000003
19:37:13 kernel:  igc 0000:01:00.0 green0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
19:37:13 kernel:  igc 0000:01:00.0 green0: Reset adapter
19:37:17 kernel:  igc 0000:01:00.0 green0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
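
To make dumps like the ones above easier to compare, here is a minimal Python sketch that pulls the watchdog events and the per-queue TDH/TDT values out of the log lines. The function name `parse` and the regular expressions are my own illustration, not part of any IPFire or igc tooling, and the sample is a few lines copied from the log above.

```python
import re

# Matches the per-queue TX head/tail lines, e.g.
# "... green0: TDH[0-3]        00000097 0000007f 000000db 00000058"
REG_RE = re.compile(r"\b(TDH|TDT)\[0-3\]\s+((?:[0-9a-fA-F]{8}\s*){4})$")
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: CPU: (\d+): transmit queue (\d+) timed out (\d+) ms"
)


def parse(lines):
    """Return (events, dumps).

    events: list of (cpu, queue, timeout_ms) tuples, one per watchdog line.
    dumps:  list of {"TDH": [...], "TDT": [...]} dicts, one per register dump.
    """
    events, dumps, current = [], [], {}
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append(tuple(int(g) for g in m.groups()))
            continue
        m = REG_RE.search(line.rstrip())
        if m:
            current[m.group(1)] = [int(v, 16) for v in m.group(2).split()]
            # TDH is printed before TDT, so a dump is complete once both exist.
            if "TDH" in current and "TDT" in current:
                dumps.append(current)
                current = {}
    return events, dumps


sample = """\
19:36:47 kernel:  igc 0000:01:00.0 green0: NETDEV WATCHDOG: CPU: 1: transmit queue 2 timed out 5498 ms
19:36:47 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        00000097 0000007f 000000db 00000058
19:36:47 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        00000097 0000007f 000000db 00000058
19:36:48 kernel:  igc 0000:01:00.0 green0: TDH[0-3]        00000000 00000000 00000009 00000000
19:36:48 kernel:  igc 0000:01:00.0 green0: TDT[0-3]        0000000d 00000000 00000096 00000004
""".splitlines()

events, dumps = parse(sample)
print(events)
for d in dumps:
    # True means head == tail for that queue.
    print([h == t for h, t in zip(d["TDH"], d["TDT"])])
```

In both dumps that directly follow a watchdog message (19:36:47 and 19:37:13), TDH equals TDT on all four queues. With the usual head/tail ring semantics that would mean the hardware had no descriptors left to fetch when the timeout fired, but that reading is my assumption and not something confirmed in this thread.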

From what I could find online and on this board, the I225-V NIC is known to have stability issues. However, a lot of the information out there is either out of date or incomplete.

Here is the most recent post that I could find: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 5600 ms

I’d appreciate any suggestions, and hearing about other people’s experiences with the I225-V.

Thanks!

Is this NIC embedded on the board, or is it a replaceable PCIe card?

Which version of IPFire are you running?

It’s an embedded chip, and I’m running IPFire 2.29 (x86_64) - core197.

This sounds like the firmware is stalling. It could be a firmware bug, or something like not enough power for the chip to draw. Does it go away if you manually set it to only do 1 Gbps?
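
For reference, one way to try the 1 Gbps test is with `ethtool`, limiting autonegotiation to 1000 Mb/s full duplex. The sketch below only builds and prints the command by default; the helper names are hypothetical, `green0` is taken from the log above, and I am assuming `ethtool` is available on the system. A setting made this way normally does not survive a reboot.

```python
import shlex
import subprocess


def limit_to_1g_cmd(iface="green0"):
    """Build the ethtool call that restricts the link to 1000 Mb/s full duplex."""
    return ["ethtool", "-s", iface, "speed", "1000", "duplex", "full", "autoneg", "on"]


def apply(cmd, dry_run=True):
    """Print the command; run it (root required) only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run: shows the command without touching the link.
apply(limit_to_1g_cmd())
```

After running it for real with `apply(limit_to_1g_cmd(), dry_run=False)`, the link renegotiates, and the kernel log should report the new speed in its "NIC Link is Up" line.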

I’ll try dropping it to 1Gb/s to see if that helps, thanks.

Do you have any suggestions on ways that I could try to reliably replicate the issue? That’s been a large part of my difficulty in troubleshooting this.

Since it’s embedded, there should be power-saving options in the BIOS/UEFI settings. Check them and disable them.

Thanks for the suggestion. I’ve now disabled all powersaver options.

I’ve also played around with my QoS settings and so far it’s been about a day since the last occurrence. Since I can’t replicate this on demand I’ll wait another full day before marking this as solved.

Disable power saving only for the NICs, not for the CPU etc.
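
As a cross-check from the running system, the kernel exposes its PCIe ASPM (link power saving) policy in sysfs, with the active entry shown in square brackets. This is a small sketch under the assumption that ASPM is one of the power-saving features in question, which the thread does not confirm; `active_policy` is a hypothetical helper name.

```python
from pathlib import Path

# Standard location of the kernel's PCIe ASPM policy, when ASPM support is built in.
ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text):
    """Return the bracketed (active) entry from the policy file contents."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if ASPM_POLICY.exists():
    print("PCIe ASPM policy:", active_policy(ASPM_POLICY.read_text()))
else:
    print("pcie_aspm policy file not present on this kernel")
```

This only reports the kernel-wide policy; it does not show what the firmware negotiated for the NIC's own PCIe link.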