Log.dat won't load

Restored, but no good; the log is still full of what look like exceptions in the kernel.

Should I send my log to someone?

It looks like it is spamming this message over and over and over again indefinitely:

May 9 03:09:24 NF-WKIT-01 kernel: red0: hw csum failure
May 9 03:09:24 NF-WKIT-01 kernel: CPU: 0 PID: 3070 Comm: W-NFQ#0 Tainted: G W O 4.14.212-ipfire-multi #1
May 9 03:09:24 NF-WKIT-01 kernel: Hardware name: Generic DT based system
May 9 03:09:24 NF-WKIT-01 kernel: [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
May 9 03:09:24 NF-WKIT-01 kernel: [] (show_stack) from [] (dump_stack+0xc4/0xe0)
May 9 03:09:24 NF-WKIT-01 kernel: [] (dump_stack) from [] (netdev_rx_csum_fault+0x3c/0x48)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netdev_rx_csum_fault) from [] (__skb_checksum_complete+0xac/0xb4)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__skb_checksum_complete) from [] (nf_ip_checksum+0x60/0x120)
May 9 03:09:24 NF-WKIT-01 kernel: [] (nf_ip_checksum) from [] (udp_error+0x198/0x1f4)
May 9 03:09:24 NF-WKIT-01 kernel: [] (udp_error) from [] (nf_conntrack_in+0xf8/0x6a0)
May 9 03:09:24 NF-WKIT-01 kernel: [] (nf_conntrack_in) from [] (ipv4_conntrack_in+0x2c/0x30)
May 9 03:09:24 NF-WKIT-01 kernel: [] (ipv4_conntrack_in) from [] (nf_hook_slow+0x4c/0xd4)
May 9 03:09:24 NF-WKIT-01 kernel: [] (nf_hook_slow) from [] (ip_rcv+0x4e4/0x654)
May 9 03:09:24 NF-WKIT-01 kernel: [] (ip_rcv) from [] (__netif_receive_skb_core+0x740/0xce4)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__netif_receive_skb_core) from [] (__netif_receive_skb+0x20/0x70)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__netif_receive_skb) from [] (netif_receive_skb_internal+0x4c/0x100)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netif_receive_skb_internal) from [] (netif_receive_skb+0x24/0x98)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netif_receive_skb) from [] (ifb_ri_tasklet+0x1a4/0x2c8 [ifb])
May 9 03:09:24 NF-WKIT-01 kernel: [] (ifb_ri_tasklet [ifb]) from [] (tasklet_action+0xc0/0x1f8)
May 9 03:09:24 NF-WKIT-01 kernel: [] (tasklet_action) from [] (__do_softirq+0xec/0x390)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__do_softirq) from [] (do_softirq+0x94/0xbc)
May 9 03:09:24 NF-WKIT-01 kernel: [] (do_softirq) from [] (__local_bh_enable_ip+0xc0/0xe0)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__local_bh_enable_ip) from [] (nf_reinject+0x17c/0x1f0)
May 9 03:09:24 NF-WKIT-01 kernel: [] (nf_reinject) from [] (nfqnl_recv_verdict+0x284/0x494 [nfnetlink_queue])
May 9 03:09:24 NF-WKIT-01 kernel: [] (nfqnl_recv_verdict [nfnetlink_queue]) from [] (nfnetlink_rcv_msg+0x15c/0x27c [nfnetlink])
May 9 03:09:24 NF-WKIT-01 kernel: [] (nfnetlink_rcv_msg [nfnetlink]) from [] (netlink_rcv_skb+0xcc/0x12c)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netlink_rcv_skb) from [] (nfnetlink_rcv+0x98/0x770 [nfnetlink])
May 9 03:09:24 NF-WKIT-01 kernel: [] (nfnetlink_rcv [nfnetlink]) from [] (netlink_unicast+0x1a8/0x248)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netlink_unicast) from [] (netlink_sendmsg+0x208/0x3bc)
May 9 03:09:24 NF-WKIT-01 kernel: [] (netlink_sendmsg) from [] (sock_sendmsg+0x44/0x54)
May 9 03:09:24 NF-WKIT-01 kernel: [] (sock_sendmsg) from [] (___sys_sendmsg+0x2b0/0x2c4)
May 9 03:09:24 NF-WKIT-01 kernel: [] (___sys_sendmsg) from [] (__sys_sendmsg+0x78/0xbc)
May 9 03:09:24 NF-WKIT-01 kernel: [] (__sys_sendmsg) from [] (SyS_sendmsg+0x18/0x1c)
May 9 03:09:24 NF-WKIT-01 kernel: [] (SyS_sendmsg) from [] (ret_fast_syscall+0x0/0x54)

I'm not sure who you would send the log to.

Having looked at the small section you provided earlier, it wouldn't help me to see more.

The only thing I am wondering about is if your hardware is dying.

You had Core 155 working on the system, so I would suggest installing Core Update 155 on it. If that also has the same problem, it would probably suggest a hardware problem. If it works with Core 155, then you could try updating to Core 156 using Pakfire and see how that goes. If the problem reappears, you have the path back to Core 155 and you should raise a bug on the Core Update 156 installation.

Edit:
Maybe someone like @arne_f would be able to comment on the kernel messages. He is the IPFire kernel guru.

OK, sounds like a plan. Yeah, it seems to be giving the hw csum error for every process. Here is another example:

May 9 03:08:59 NF-WKIT-01 kernel: red0: hw csum failure
May 9 03:08:59 NF-WKIT-01 kernel: CPU: 0 PID: 2229 Comm: klogd Tainted: G W O 4.14.212-ipfire-multi #1
May 9 03:08:59 NF-WKIT-01 kernel: Hardware name: Generic DT based system
May 9 03:08:59 NF-WKIT-01 kernel: [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
May 9 03:08:59 NF-WKIT-01 kernel: [] (show_stack) from [] (dump_stack+0xc4/0xe0)
May 9 03:08:59 NF-WKIT-01 kernel: [] (dump_stack) from [] (netdev_rx_csum_fault+0x3c/0x48)
May 9 03:08:59 NF-WKIT-01 kernel: [] (netdev_rx_csum_fault) from [] (__skb_checksum_complete+0xac/0xb4)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__skb_checksum_complete) from [] (nf_ip_checksum+0x60/0x120)
May 9 03:08:59 NF-WKIT-01 kernel: [] (nf_ip_checksum) from [] (udp_error+0x198/0x1f4)
May 9 03:08:59 NF-WKIT-01 kernel: [] (udp_error) from [] (nf_conntrack_in+0xf8/0x6a0)
May 9 03:08:59 NF-WKIT-01 kernel: [] (nf_conntrack_in) from [] (ipv4_conntrack_in+0x2c/0x30)
May 9 03:08:59 NF-WKIT-01 kernel: [] (ipv4_conntrack_in) from [] (nf_hook_slow+0x4c/0xd4)
May 9 03:08:59 NF-WKIT-01 kernel: [] (nf_hook_slow) from [] (ip_rcv+0x4e4/0x654)
May 9 03:08:59 NF-WKIT-01 kernel: [] (ip_rcv) from [] (__netif_receive_skb_core+0x740/0xce4)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__netif_receive_skb_core) from [] (__netif_receive_skb+0x20/0x70)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__netif_receive_skb) from [] (netif_receive_skb_internal+0x4c/0x100)
May 9 03:08:59 NF-WKIT-01 kernel: [] (netif_receive_skb_internal) from [] (netif_receive_skb+0x24/0x98)
May 9 03:08:59 NF-WKIT-01 kernel: [] (netif_receive_skb) from [] (ifb_ri_tasklet+0x1a4/0x2c8 [ifb])
May 9 03:08:59 NF-WKIT-01 kernel: [] (ifb_ri_tasklet [ifb]) from [] (tasklet_action+0xc0/0x1f8)
May 9 03:08:59 NF-WKIT-01 kernel: [] (tasklet_action) from [] (__do_softirq+0xec/0x390)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__do_softirq) from [] (irq_exit+0xdc/0x188)
May 9 03:08:59 NF-WKIT-01 kernel: [] (irq_exit) from [] (__handle_domain_irq+0x8c/0xfc)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__handle_domain_irq) from [] (bcm2836_arm_irqchip_handle_irq+0xb4/0xbc)
May 9 03:08:59 NF-WKIT-01 kernel: [] (bcm2836_arm_irqchip_handle_irq) from [] (__irq_svc+0x6c/0x90)
May 9 03:08:59 NF-WKIT-01 kernel: Exception stack(0xc2265dd8 to 0xc2265e20)
May 9 03:08:59 NF-WKIT-01 kernel: 5dc0: 00000093 2e6a0000
May 9 03:08:59 NF-WKIT-01 kernel: 5de0: 00000000 c1042420 000003ab c1230918 00000c54 00025953 00000052 00000000
May 9 03:08:59 NF-WKIT-01 kernel: 5e00: c1230e18 c2265e7c c2265e28 c2265e28 c01ab674 c01ab678 00000013 ffffffff
May 9 03:08:59 NF-WKIT-01 kernel: [] (__irq_svc) from [] (do_syslog+0x2b8/0x5d8)
May 9 03:08:59 NF-WKIT-01 kernel: [] (do_syslog) from [] (kmsg_read+0x58/0x64)
May 9 03:08:59 NF-WKIT-01 kernel: [] (kmsg_read) from [] (proc_reg_read+0x70/0xcc)
May 9 03:08:59 NF-WKIT-01 kernel: [] (proc_reg_read) from [] (__vfs_read+0x4c/0x180)
May 9 03:08:59 NF-WKIT-01 kernel: [] (__vfs_read) from [] (vfs_read+0xa0/0x174)
May 9 03:08:59 NF-WKIT-01 kernel: [] (vfs_read) from [] (SyS_read+0x6c/0xf4)
May 9 03:08:59 NF-WKIT-01 kernel: [] (SyS_read) from [] (ret_fast_syscall+0x0/0x54)
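
To check how widespread the failures are rather than eyeballing the log, the excerpts above could be summarized with something like the following. This is only a rough sketch: the `summarize` helper and its regular expressions are my own assumptions based on the line format shown in these two excerpts, not an IPFire tool.

```python
import re
from collections import Counter

# Patterns assumed from the log excerpts above.
CSUM = re.compile(r"kernel: (\S+): hw csum failure")
COMM = re.compile(r"kernel: CPU: \d+ PID: \d+ Comm: (\S+)")


def summarize(lines):
    """Count 'hw csum failure' events per interface and per interrupted process.

    The Comm field names whichever task was running on the CPU when the
    fault was logged; it is not necessarily the owner of the bad packet.
    """
    per_iface = Counter()
    per_comm = Counter()
    awaiting_comm = False
    for line in lines:
        m = CSUM.search(line)
        if m:
            per_iface[m.group(1)] += 1
            awaiting_comm = True
            continue
        m = COMM.search(line)
        if m and awaiting_comm:
            # Attribute only the first "CPU: ... Comm:" line after each failure.
            per_comm[m.group(1)] += 1
            awaiting_comm = False
    return per_iface, per_comm


sample = [
    "May 9 03:09:24 NF-WKIT-01 kernel: red0: hw csum failure",
    "May 9 03:09:24 NF-WKIT-01 kernel: CPU: 0 PID: 3070 Comm: W-NFQ#0 Tainted: G W O 4.14.212-ipfire-multi #1",
    "May 9 03:08:59 NF-WKIT-01 kernel: red0: hw csum failure",
    "May 9 03:08:59 NF-WKIT-01 kernel: CPU: 0 PID: 2229 Comm: klogd Tainted: G W O 4.14.212-ipfire-multi #1",
]

ifaces, comms = summarize(sample)
print(dict(ifaces))
print(dict(comms))
```

To run it against the real log, pass the lines of the log file (for example an open file handle) to `summarize` instead of `sample`. If every event names `red0` but the Comm values are spread across unrelated processes, that would fit the fault being raised in the receive path and merely interrupting whatever task happened to be running.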

I'm going to test and report based on 155 on the Pi, but I'm also going to upgrade to a Celeron J3160 + 8GB 1600 DDR3 based platform, given that the Pi is limiting my bandwidth with no AES-NI and crummy RAM, and was only ever intended as a test anyway. I'll probably swap over to the Celeron, run it on stable, and keep the Pi for trying out test images.

In any case, I'll check with 155 on the Pi and report the bug if necessary.

Unfortunately it is doing it with v155 as well.

I suspect in that case it must have happened when I enabled roadwarrior IPSec recently.

So are you saying that a fresh Core 155 install worked until you restored your backup?

If so, do you have a backup from before you installed the IPsec road warrior that you could use?

That's correct. I can see in my log that everything is normal, mostly just drop events from the FW etc.

I can see where I installed hostapd, and still everything is normal. Then I can see in the log where I rebooted the IPFire device after restoring the backup, and from there the kernel errors begin.

I possibly do have an old backup, yes. I will look.

My new x86_amd64 hardware restored from the very same backup does not exhibit these symptoms at all.

That means one of two things:

a) My Pi has a hardware failure
b) There is an issue with the arm kernel

When I did some googling, I found this issue reported against older versions of the ARM kernel. However, I checked the kernel version in v155, and that error was supposed to be patched in that version, so I dunno.

What to do next?

Sorry, I have run out of ideas.

If it is to do with the kernel then that is beyond my capabilities to debug.

No problem, thanks anyway. I guess I've outgrown the Pi as a firewall; I'm using an x86_amd64 device now, which doesn't have the problem.