Bad update to 161/162? Or bad hardware?

Greetings,

I have a LightningWireLabs eco from 2015 that I adore and that has served me well for 6 years. After the upgrade to 161 on Saturday, it started hard locking up on me. After the update to 162, I can’t get it to boot at all.

I think that the hardware just gave up on me, but I want to report what went down and see if anyone has any thoughts/suggestions.

With a monitor plugged in, all I see is:
NMI: IOCK error (debug interrupt?) for reason 70 on CPU 0.
NMI: IOCK error (debug interrupt?) for reason 60 on CPU 0.

The keyboard doesn’t respond either. A hard reset “works”, sometimes for a few hours, sometimes for only a few minutes. The IOCK errors don’t appear in any log files from before the upgrade, and now they start appearing very quickly after IPFire boots. In dmesg I see things like this:


Dec  6 16:31:22 ipfire kernel: NMI: IOCK error (debug interrupt?) for reason 60 on CPU 0.
Dec  6 16:31:22 ipfire kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.10.76-ipfire #1
Dec  6 16:31:22 ipfire kernel: Hardware name: ASUS P9A-I/2550/4L/P9A-I/2550/4L, BIOS 0206 05/14/2014
Dec  6 16:31:22 ipfire kernel: RIP: 0010:mwait_idle_with_hints.constprop.0+0x52/0xa0
Dec  6 16:31:22 ipfire kernel: Code: 65 48 8b 04 25 c0 7b 01 00 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d 95 84 9b 00 b9 01 00 00 00 48 89 f8 0f 01 c9 <65> 48 8b 04 25 c0 7b 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
Dec  6 16:31:22 ipfire kernel: RSP: 0018:ffffffffb0c03e38 EFLAGS: 00000046
Dec  6 16:31:22 ipfire kernel: RAX: 0000000000000051 RBX: ffff8cc337c33f20 RCX: 0000000000000001
Dec  6 16:31:22 ipfire kernel: RDX: 0000000000000000 RSI: ffffffffb0daa3a0 RDI: 0000000000000051
Dec  6 16:31:22 ipfire kernel: RBP: 0000000000000002 R08: 000000290b3459e6 R09: 00000029218d2bef
Dec  6 16:31:22 ipfire kernel: R10: 0000000000007938 R11: 000000000000f5d1 R12: 0000000000000002
Dec  6 16:31:22 ipfire kernel: R13: ffffffffb0daa488 R14: 0000000000000002 R15: 0000000000000000
Dec  6 16:31:22 ipfire kernel: FS:  0000000000000000(0000) GS:ffff8cc337c00000(0000) knlGS:0000000000000000
Dec  6 16:31:22 ipfire kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  6 16:31:22 ipfire kernel: CR2: 0000000001d445f8 CR3: 0000000188c0c000 CR4: 00000000001006f0
Dec  6 16:31:22 ipfire kernel: Call Trace:
Dec  6 16:31:22 ipfire kernel:  intel_idle+0x1f/0x30
Dec  6 16:31:22 ipfire kernel:  cpuidle_enter_state+0x89/0x370
Dec  6 16:31:22 ipfire kernel:  cpuidle_enter+0x29/0x40
Dec  6 16:31:22 ipfire kernel:  do_idle+0x1cc/0x230
Dec  6 16:31:22 ipfire kernel:  cpu_startup_entry+0x19/0x20
Dec  6 16:31:22 ipfire kernel:  start_kernel+0x83e/0x879
Dec  6 16:31:22 ipfire kernel:  secondary_startup_64_no_verify+0xb0/0xbb
Dec  6 16:31:24 ipfire kernel: DROP_INPUT IN=red0 OUT= MAC=78:24:af:82:9e:f2:00:00:5e:00:01:8c:08:00 SRC=185.195.232.162 DST=136.35.226.111 LEN=132 TOS=0x00 PREC=0x00 TTL=53 ID=51558 PROTO=UDP SPT=51820 DPT=48925 LEN=112 MARK=0x80000000
Dec  6 16:31:27 ipfire kernel: NMI: IOCK error (debug interrupt?) for reason 70 on CPU 0.
Dec  6 16:31:27 ipfire kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.10.76-ipfire #1
Dec  6 16:31:27 ipfire kernel: Hardware name: ASUS P9A-I/2550/4L/P9A-I/2550/4L, BIOS 0206 05/14/2014
Dec  6 16:31:27 ipfire kernel: RIP: 0010:mwait_idle_with_hints.constprop.0+0x52/0xa0
Dec  6 16:31:27 ipfire kernel: Code: 65 48 8b 04 25 c0 7b 01 00 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d 95 84 9b 00 b9 01 00 00 00 48 89 f8 0f 01 c9 <65> 48 8b 04 25 c0 7b 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
Dec  6 16:31:27 ipfire kernel: RSP: 0018:ffffffffb0c03e38 EFLAGS: 00000046
Dec  6 16:31:27 ipfire kernel: RAX: 0000000000000051 RBX: ffff8cc337c33f20 RCX: 0000000000000001
Message from syslogd@ipfire at Mon Dec  6 16:32:56 2021 ...
ipfire kernel: NMI: IOCK error (debug interrupt?) for reason 70 on CPU 0.
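For context on the “reason 60/70” values: as far as I understand the x86 Linux NMI handler, the “reason” is a hex byte read from the legacy NMI status/control port 0x61. The kernel prints this IOCK message when the I/O channel check bit (0x40) is set and the SERR bit (0x80) is not. The small Python sketch below decodes that byte. The bit names follow the classic PC/AT and Intel chipset layout of port 0x61, which is an assumption for this particular Atom board. The helper names are hypothetical.

```python
# Decode the "reason" byte from "NMI: IOCK error ... for reason XX".
# ASSUMPTION: bit layout follows the legacy PC/AT / Intel chipset port 0x61
# (NMI status and control); not verified against this specific board.

NMI_REASON_SERR = 0x80   # SERR# NMI source
NMI_REASON_IOCHK = 0x40  # IOCHK# (I/O channel check) NMI source

PORT61_BITS = {
    7: "SERR# NMI source (system/PCI error)",
    6: "IOCHK# NMI source (I/O channel check)",
    5: "timer 2 output status",
    4: "refresh cycle toggle",
    3: "IOCHK# NMI disable (0 = enabled)",
    2: "SERR# NMI disable (0 = enabled)",
    1: "speaker data enable",
    0: "timer 2 gate",
}


def decode_nmi_reason(text: str) -> list[str]:
    """Return names of the bits set in a hex reason string such as '70', high bit first."""
    value = int(text, 16)
    return [PORT61_BITS[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def is_iochk(text: str) -> bool:
    """True if the I/O channel check bit (0x40) is set in the reason byte."""
    return bool(int(text, 16) & NMI_REASON_IOCHK)


if __name__ == "__main__":
    for reason in ("60", "70"):
        print(reason, is_iochk(reason), decode_nmi_reason(reason))
```

Under that assumption, both 60 and 70 decode to the IOCHK bit plus timer 2 and refresh status bits. Those two bits change during normal operation and are not errors in themselves.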

Hoping it was just a bad kernel update, I updated to 162 to see if that would fix it. I rebooted and… nothing… Once IPFire starts to boot, I get a blinking cursor. Well… dang… Hard power off and try again… and… blinking cursor.
Fine. Yank the power, let it sit 10 seconds, plug it back in, and hit the power button… and it won’t even POST to the BIOS anymore…
Damn.

I grabbed an old desktop sitting under my desk and a bunch of USB network dongles from an old RPi project, yanked the SSD out of the IPFire system, crammed it into the desktop, and… it works!! Redid the setup, fought hard with my ISP to give a proper DHCP lease to a new device it wasn’t expecting… and my firewall lives!!!

But my eco does not…

I got 6 hard years of always-on from that thing. Maybe it was time and this was all a horrible coincidence. In fact, I’m 99% confident this was just the hardware failing at a bad time. However, the timing looks suspect to me: I did an update Saturday night, these errors appeared Sunday, and Sunday night and all day Monday I was fighting hard lockups. Not to mention that the device stopped booting immediately after the 162 update. So, for that <1% chance that something went horribly wrong in the 161/162 update process, I’m asking for other ideas.

Does anyone else have reason to suspect something in 161/162 that might not have gone well on an old Eco?

Thanks!

Core Update 162 (testing) on armv6l is filling /boot, and that results in a lockup on reboot. It appears to be caused by the superseded dtb and uInit files for kernel 5.10.55 not being removed during the upgrade to kernel 5.10.76.

I need to investigate further whether cleaning out /boot will suffice or a re-install is required.
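Before cleaning anything out, it may help to see how full /boot is and which entries belong to the old kernel. The Python sketch below does that under one assumption: the superseded dtb and uInit files carry the old kernel version string (e.g. 5.10.55) in their names. That naming scheme and the helper names are hypothetical. The sketch only reports candidates and sizes; it deletes nothing.

```python
import os
import shutil
import sys


def entry_size(path: str) -> int:
    """Size in bytes of a file, or the total size of the files under a directory.
    Uses lstat so symlinks are not followed and a dangling link cannot break the walk."""
    if os.path.isdir(path) and not os.path.islink(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.lstat(os.path.join(root, name)).st_size
        return total
    return os.lstat(path).st_size


def stale_boot_entries(boot_dir: str, old_version: str) -> list[tuple[str, int]]:
    """Top-level entries of boot_dir whose names contain old_version (ASSUMED naming).
    Nothing is removed; this only lists candidates for manual review."""
    return [
        (os.path.join(boot_dir, name), entry_size(os.path.join(boot_dir, name)))
        for name in sorted(os.listdir(boot_dir))
        if old_version in name
    ]


if __name__ == "__main__":
    # Hypothetical usage: python3 stale_boot.py /boot 5.10.55
    boot_dir, old_version = sys.argv[1], sys.argv[2]
    usage = shutil.disk_usage(boot_dir)
    print(f"{boot_dir}: {usage.used} of {usage.total} bytes used, {usage.free} free")
    for path, size in stale_boot_entries(boot_dir, old_version):
        print(f"{size:>12}  {path}")
```

Keeping it read-only is deliberate. Whether removing those files is safe, or whether a re-install is needed, is exactly the open question above.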

Thanks for the feedback. I’m not sure that was the cause, because a) the hardware won’t POST into the BIOS anymore, and b) I put the SSD from the eco into the desktop and it booted right up.

I’m trying to find the specs of the eco I had, but it is an older Intel Atom, and my worry is that it was something specific to that architecture.

Thanks!

Some Atoms are 64-bit capable. Lightningwirelabs no longer lists the specs for the eco. Have you tried booting x86_64 on it?

All appliances that we have ever sold are 64-bit capable. Some came with the 32-bit version installed because the 64-bit version wasn’t available yet.
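On a box that still runs Linux (e.g. from a live USB), a common way to check 64-bit capability is the “lm” (long mode) flag in /proc/cpuinfo. That flag reflects the CPU’s capability, not the installed kernel. The Python sketch below parses that text. The function name is hypothetical.

```python
def cpu_supports_x86_64(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo-style text lists 'lm' (long mode).
    Flags are matched as whole words, so e.g. 'lahf_lm' does not count."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "lm" in value.split():
            return True
    return False


if __name__ == "__main__":
    with open("/proc/cpuinfo") as fh:
        print("x86_64 capable:", cpu_supports_x86_64(fh.read()))
```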

It was indeed 64-bit. I poked at it a bit more last night, and the device is as dead as the Monty Python parrot.

I’m still leaning heavily toward “this was a bad coincidence”, but, wow, the timing sucks. If no one suspects anything in the 161/162 updates, I’m happy to close this topic as “bad timing”.

Thanks!

Hello @stack

Googling something I found this:

Maybe you can revive it by tweaking something in the BIOS, or maybe you can fix it by updating the BIOS. Who knows…

Regards.
