APU4D4 - kernel: mce: [Hardware Error]: Machine check events logged

My test IPFire APU4D4 suddenly rebooted. It had been sitting idle for the past few days when I heard the familiar reboot “beep” at Aug 3 14:09:57.

FYI - There is nothing in the log at the time of reboot:
message.log.zip (4.0 KB)

But I did find this from a few days earlier. This message has appeared about 10 times since the beginning of July.

Jul 30 10:18:12 ipfireAPU2 The database has been updated recently
Jul 30 11:51:40 ipfireAPU2 kernel: mce: [Hardware Error]: Machine check events logged
Jul 30 11:51:41 ipfireAPU2 kernel: [Hardware Error]: Corrected error, no action required.
Jul 30 11:51:41 ipfireAPU2 kernel: [Hardware Error]: CPU:0 (16:30:1) MC1_STATUS[-|CE|-|AddrV|-|-|-]: 0x9400000000000151
Jul 30 11:51:41 ipfireAPU2 kernel: [Hardware Error]: Error Addr: 0x0000ffffbd41b770
Jul 30 11:51:41 ipfireAPU2 kernel: [Hardware Error]: MC1 Error: Data/tag array parity error for a tag hit.
Jul 30 11:51:41 ipfireAPU2 kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Jul 30 11:55:48 ipfireAPU2 The database has been updated recently
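
The status word in the log above can also be decoded by hand, which is useful when mcelog is not an option. The following is a minimal Python sketch, assuming the architectural x86 MCi_STATUS bit layout and the AMD "memory error" code format (0000 0001 RRRR TTLL). The function name `decode_status` and the lookup tables are illustrative and not part of any tool; the field names are copied from the kernel messages shown here.

```python
# Bit positions assume the architectural x86 MCi_STATUS layout; the short
# names mirror the bracketed flags printed in the kernel log line.
STATUS_FLAGS = {
    63: "Val",    # register holds a valid error
    62: "Over",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (clear means corrected, "CE")
    60: "En",     # error reporting was enabled
    59: "MiscV",  # MISC register is valid
    58: "AddrV",  # the "Error Addr" value is valid
    57: "PCC",    # processor context corrupt
}

# Assumed field encodings for the memory error format 0000 0001 RRRR TTLL.
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]
TT = ["INSN", "DATA", "GEN", "RESV"]
LL = ["RESV", "L1", "L2", "L3/GEN"]


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits carry the error code
    result = {"flags": flags, "corrected": "UC" not in flags, "error_code": code}
    if (code & 0xFF00) == 0x0100:  # memory/cache error format
        r = (code >> 4) & 0xF
        result["mem_tx"] = RRRR[r] if r < len(RRRR) else "RESV"
        result["tx"] = TT[(code >> 2) & 0x3]
        result["cache_level"] = LL[code & 0x3]
    return result


info = decode_status(0x9400000000000151)
print(info)
```

For the value logged here, the sketch reports the error as corrected with a valid address, at cache level L1, transaction type INSN, and memory transaction IRD, which agrees with the kernel's own decode: a parity error in the L1 instruction cache during an instruction fetch.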

I loaded mcelog to try to decode the errors, but it does not support the APU's CPU.

Any thoughts on what this might indicate?

I saw my firmware version was not up to date and I just did the update with www.ipfire.org - firmware-update.

Hopefully that will help!

I am still seeing these same errors, and occasional “reboots” for no known reason (other than the errors below).

Any ideas what this might mean?

Sep 19 03:20:24 ipfireAPU2 kernel: mce: [Hardware Error]: Machine check events logged
Sep 19 03:20:24 ipfireAPU2 kernel: [Hardware Error]: Corrected error, no action required.
Sep 19 03:20:24 ipfireAPU2 kernel: [Hardware Error]: CPU:0 (16:30:1) MC1_STATUS[-|CE|-|AddrV|-|-|-]: 0x9400000000000151
Sep 19 03:20:24 ipfireAPU2 kernel: [Hardware Error]: Error Addr: 0x0000ffffa2145270
Sep 19 03:20:24 ipfireAPU2 kernel: [Hardware Error]: MC1 Error: Data/tag array parity error for a tag hit.
Sep 19 03:20:24 ipfireAPU2 kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
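
To see whether the events are getting more frequent, it can help to tally them per day instead of eyeballing the log. This is a rough Python sketch, assuming syslog-style lines like the ones above. The helper names `count_mce_by_day` and `count_mce_in_file` are made up for illustration, and the log path in the usage note is an assumption that should be adjusted to the system in question.

```python
from collections import Counter

# Marker text copied from the kernel message shown in the log excerpts.
MCE_MARKER = "mce: [Hardware Error]: Machine check events logged"


def count_mce_by_day(lines):
    """Count MCE notifications per day from an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        if MCE_MARKER in line:
            # Syslog lines start with "Mon DD HH:MM:SS"; keep "Mon DD".
            month, day = line.split()[:2]
            counts[f"{month} {day}"] += 1
    return counts


def count_mce_in_file(path):
    """Convenience wrapper (hypothetical helper) that reads a log file."""
    with open(path, errors="replace") as fh:
        return count_mce_by_day(fh)
```

Calling `count_mce_in_file("/var/log/messages")` would return a `Counter` keyed by day, assuming that path is where the system log lives.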

Also, mcelog did not help me:

[root@ipfireAPU2 ~]# mcelog
mcelog: ERROR: AMD Processor family 22: mcelog does not support this processor.  Please use the edac_mce_amd module instead.
CPU is unsupported

Found this in the kernel bugzilla.

https://bugzilla.kernel.org/show_bug.cgi?id=207907

While it is not exactly the same error, it appears that messages of this type are logged when a physical memory error has occurred and could be corrected. As long as it can be corrected, you have to live with it.

Thanks!

I am not sure it is really corrected since I also experience this…

I am running a memtest now since the error mentions a “parity error”.

Adolf,

I thought there was a memtest within the IPFire build. I see an LFS file

but I cannot locate it within pakfire or on any of my ipfire devices. (Maybe I need more coffee)

Am I blind?!?

I believe it goes into the CD-ROM boot menu, not into the installed system itself.

So you could write the IPFire ISO to a USB stick, boot from it, and select memtest from the menu. After it has run, just exit the installer so that nothing changes on the installed system.

BIOS memtest has been running OK (no errors). So I am thinking it may be this:

I am going to disable CPB (core performance boost) in the BIOS and see what happens…

So far so good! No mce: Hardware errors and no spontaneous (unplanned!) reboots since Sep 20. Yay.

Steps:

  • go to BIOS via F10 on boot
  • click setup
  • disable Core Performance Boost
  • save it!
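
After a change like the one in the steps above, the boost state can sometimes be checked from the running system as well. The sketch below is only a guess at how that might look: it assumes the cpufreq driver in use exposes a `boost` file under `/sys/devices/system/cpu/cpufreq/`, which is not guaranteed on every kernel or board, and the function name `read_boost_state` is made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location; it may be missing depending on the cpufreq driver
# or when boost is not offered by the firmware at all.
BOOST_PATH = Path("/sys/devices/system/cpu/cpufreq/boost")


def read_boost_state(path=BOOST_PATH):
    """Return True/False for boost on/off, or None if the file is unreadable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None  # no such file, or no permission to read it
    return value == "1"


print(read_boost_state())
```

A result of `None` only means the file could not be read, so it says nothing either way about the BIOS setting.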

I do not know if there are any performance issues related to this change.
