I am curious - Have you tried installing a new hard drive?
Jon, as I write this the device has been online for 24 hours without issue. It went down at 2:00 PM yesterday (10/17/22), when it suddenly rebooted. Right now I see the CPU spiking on one core at nearly 100%.
Here’s the S.M.A.R.T info:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.59-ipfire] (IPFire 2.27)
Copyright (C) 2002-22 Bruce Allen Christian Franke www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KingDian M280 120GB
Serial Number:    A4720783073000208341
LU WWN Device Id: 0 000000 000000000
Firmware Version: SBFM61.2
User Capacity:    120 034 123 776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      mSATA
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5319
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.2 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Oct 18 13:51:58 2022 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 100   100   050    Pre-fail Always  -           0
  9 Power_On_Hours          0x0012 100   100   000    Old_age  Always  -           34376
 12 Power_Cycle_Count       0x0012 100   100   000    Old_age  Always  -           374
168 Unknown_Attribute       0x0012 100   100   000    Old_age  Always  -           0
170 Unknown_Attribute       0x0003 095   095   000    Pre-fail Always  -           52
173 Unknown_Attribute       0x0012 100   100   000    Old_age  Always  -           5636205
192 Power-Off_Retract_Count 0x0012 100   100   000    Old_age  Always  -           321
194 Temperature_Celsius     0x0023 067   067   000    Pre-fail Always  -           33 (Min/Max 33/33)
218 Unknown_Attribute       0x000b 100   100   050    Pre-fail Always  -           0
231 Unknown_SSD_Attribute   0x0013 100   100   000    Pre-fail Always  -           97
241 Total_LBAs_Written      0x0012 100   100   000    Old_age  Always  -           10979
I swapped out the whole system, not just a drive, because I didn’t know if it was specifically a drive issue.
I have the old system on my desk here and I ran some tests on it. Both the unit in production and the unit on my desk have identical hardware: Model 101S-6 Supermicro SYS-E200-9B, 4-core Intel Pentium N3700 1.6 GHz, 8 GB RAM, 120 GB M.2 SATA SSD, 4x Intel Ethernet Controller I210 Gigabit Network Connection (rev 03).
On the production machine:
[root@ipfire log]# hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads: 3092 MB in 1.99 seconds = 1552.27 MB/sec
 Timing buffered disk reads: 1174 MB in 3.00 seconds = 391.22 MB/sec
[root@ipfire log]#
On the machine on my desk with no users connected:
[root@ipfire log]# hdparm -tT /dev/sda
/dev/sda:
 Timing cached reads: 2236 MB in 2.00 seconds = 1117.89 MB/sec
 Timing buffered disk reads: 230 MB in 3.01 seconds = 76.5 MB/sec
[root@ipfire log]#
That is quite a performance difference.
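To put a number on that difference, here is a rough sketch that pulls the MB/sec figure out of the two "buffered disk reads" lines above and divides them. The helper name `mb_per_sec` is just something made up for this example, and a single hdparm run per machine is only a rough indication, not a proper benchmark.

```python
import re

# Lines copied from the two hdparm runs quoted above.
production = "Timing buffered disk reads: 1174 MB in 3.00 seconds = 391.22 MB/sec"
desk_unit = "Timing buffered disk reads: 230 MB in 3.01 seconds = 76.5 MB/sec"


def mb_per_sec(line: str) -> float:
    """Return the MB/sec figure at the end of an hdparm timing line.

    Hypothetical helper for this example, not part of hdparm itself.
    """
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", line)
    if match is None:
        raise ValueError(f"no throughput found in: {line!r}")
    return float(match.group(1))


ratio = mb_per_sec(production) / mb_per_sec(desk_unit)
print(f"production buffered reads are about {ratio:.1f}x faster")
```

With these two readings the production unit comes out roughly five times faster on buffered reads, even though both drives are supposed to be the same model.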
EDIT: added block code formatting - moderator
keep going… What is in the message log a little before that time?
I private messaged you the logs for the last hour 1:00 - 2:00 PM yesterday.
A post was split to a new topic: openVPN - Out of memory
Since the new certificate has been in place have you had any more memory spikes? Also, did you have to re-set up every OpenVPN client install again?
The system rebooted on its own yesterday and did it again today. Today it happened so fast that I didn’t even notice. I don’t know what is causing this. Attached is the bootlog.
bootlog.gz (14.3 KB)
Here is the memory graph:
You can see the tiny white line between 3:30 and 4:00 PM today. The uptime on IPFire is 1 hour 3 minutes as of 4:51 PM today, so 3:48 PM is when the system went down or restarted for some unknown reason. This time it was not because of memory load (the graph doesn’t show any), but something is still causing our system to go down, now daily. Here’s the /var/log/messages for the 3:00 PM hour with security-related info removed.
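For anyone wanting to double-check the restart time, this is a small sketch of the arithmetic: subtract the reported uptime from the moment it was read. The date of 10/19/22 is an assumption taken from the attachment name below.

```python
from datetime import datetime, timedelta

# Uptime of 1 hour 3 minutes, read at 4:51 PM.
# The date (10/19/22) is assumed from the attached log's file name.
observed_at = datetime(2022, 10, 19, 16, 51)
uptime = timedelta(hours=1, minutes=3)

boot_time = observed_at - uptime
print(boot_time.strftime("%I:%M %p"))
```

That lands on 3:48 PM, inside the gap visible on the graph.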
10-19-22-1500.txt.gz (159.1 KB)
If anyone could help I would appreciate it.
The 05:30 crash is not in the message log. That might be the important one since memory usage starts climbing near 05:00.
I can find the one that happened at 15:46:43 (up at 15:48:05):
Oct 19 15:46:43 ipfire kernel: DROP_INPUT IN=red0 OUT= MAC=<red-mac-address>:84:bb:69:d2:b8:b0:08:00 SRC=188.8.131.52 DST=<External-IP> LEN=80 TOS=0x00 PREC=0x00 TTL=111 ID=57233 PROTO=UDP SPT=55342 DPT=64645 LEN=60 MARK=0x80000000
Oct 19 15:48:05 ipfire syslogd 1.5.1: restart (remote reception).
Oct 19 15:48:05 ipfire kernel: usb 1-184.108.40.206: New USB device found, idVendor=0557, idProduct=2419, bcdDevice= 1.00
I don’t see anything interesting in the message log. To me it acts like a power issue (lost power for whatever odd reason) or maybe a hardware problem. Maybe someone else can look and add their comments.
I have the same problem: the openvpn-authent process fills memory until it is full. This happens every night or after a new certificate is created.
Other processes are then terminated.
I have now deleted the certificates and deactivated OpenVPN. Since then the problem has not occurred again.
Same thing occurred again this morning:
Oct 17 05:00:34 ipfire openvpnserver: MANAGEMENT: Client connected from /var/run/openvpn.sock
Oct 17 06:25:26 ipfire kernel: openvpn-authent invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Oct 17 06:25:26 ipfire kernel: CPU: 0 PID: 16199 Comm: openvpn-authent Not tainted 5.15.59-ipfire #1
Oct 17 06:25:26 ipfire kernel: [ 16199] 0 16199 2154150 1858074 16859136 236139 0 openvpn-authent
Oct 17 06:25:26 ipfire kernel: [ 31339] 99 31339 1910 32 49152 145 0 openvpn
Oct 17 06:25:26 ipfire kernel: [ 31389] 0 31389 4059 0 73728 1802 0 openvpn-authent
Oct 17 06:25:26 ipfire kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=openvpn-authent,pid=16199,uid=0
Oct 17 06:25:26 ipfire kernel: Out of memory: Killed process 16199 (openvpn-authent) total-vm:8616600kB, anon-rss:7432056kB, file-rss:240kB, shmem-rss:0kB, UID:0 pgtables:16464kB oom_score_adj:0
grep: messages: binary file matches
[root@ipfire log]# uptime
 09:09:18 up 4:16, 1 user, load average: 1.43, 1.37, 1.24
[root@ipfire log]#
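To show how much memory that one process was holding when it was killed, here is a rough sketch that reads the kB fields from the "Killed process" line above and converts the resident part to GiB. The function name `oom_sizes_kb` is made up for this example, and the regular expression only assumes the `name:<number>kB` layout seen in this particular line.

```python
import re

# The "Killed process" line from the log excerpt above.
line = (
    "Oct 17 06:25:26 ipfire kernel: Out of memory: Killed process 16199 "
    "(openvpn-authent) total-vm:8616600kB, anon-rss:7432056kB, "
    "file-rss:240kB, shmem-rss:0kB, UID:0 pgtables:16464kB oom_score_adj:0"
)


def oom_sizes_kb(text: str) -> dict[str, int]:
    """Collect every 'name:<number>kB' field from an OOM-killer log line.

    Hypothetical helper for this example; field layout assumed from the line above.
    """
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", text)}


sizes = oom_sizes_kb(line)
# Resident memory = anonymous pages + file-backed pages, converted from kB to GiB.
resident_gib = (sizes["anon-rss"] + sizes["file-rss"]) / 1024**2
print(f"openvpn-authent resident memory at kill time: {resident_gib:.2f} GiB")
```

That works out to a little over 7 GiB resident, which is most of the 8 GB of RAM in this box.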
Is there any way to downgrade to 168 where these problems did not occur, or do you have to delete and rebuild a whole system?
Sorry to say you have to rebuild.
Please open a bug report about the openvpn-authent issue. This will help make sure the Development team reviews this information.
Log in using your IPFire email address and the IPFire password.
Is there an archive ftp server that holds the older build versions?
Chris - FYI
I already downloaded that version, but I can’t reboot until after hours, probably this weekend.
Hi Chris, stop OpenVPN, delete the certificate, upgrade to core 171, and reboot. Then create a new certificate and start OpenVPN.
Seems to help me. It’s worth a try and less work than reinstalling everything.
If it is a bug in OpenVPN < 2.5.7, then it is not a universal problem; it must come from a combination of an earlier version and something else.
I have been using OpenVPN with IPFire for several years and have not seen any Out Of Memory events in that time (I have grepped through the IPFire logs for oom and Out of memory and found nothing), and my memory graph has not shown any large spikes. The used memory has varied between 7% min and 27% max.
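For anyone who wants to run the same kind of search, here is a minimal sketch of one way to do it in Python. It reads the log as bytes so that stray binary content (the kind that made grep report "binary file matches" earlier in the thread) does not get in the way. The function name `find_oom_lines` and the search pattern are assumptions for this example; the sample lines are taken from the log posted earlier, the last one is shortened, and the stray-bytes line is invented to show the handling.

```python
import re
from typing import Iterable, List

# Assumed search terms; adjust to whatever your kernel actually logs.
OOM_PATTERN = re.compile(rb"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(lines: Iterable[bytes]) -> List[str]:
    """Return decoded log lines that mention the OOM killer (hypothetical helper)."""
    hits = []
    for raw in lines:
        if OOM_PATTERN.search(raw):
            # Replace undecodable bytes instead of raising on them.
            hits.append(raw.decode("utf-8", errors="replace").rstrip())
    return hits


# In-memory sample so the sketch runs anywhere.
sample = [
    b"Oct 17 05:00:34 ipfire openvpnserver: MANAGEMENT: Client connected from /var/run/openvpn.sock\n",
    b"Oct 17 06:25:26 ipfire kernel: openvpn-authent invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0\n",
    b"\xff\xfe stray binary bytes (invented for illustration)\n",
    b"Oct 17 06:25:26 ipfire kernel: Out of memory: Killed process 16199 (openvpn-authent)\n",
]

matches = find_oom_lines(sample)
for hit in matches:
    print(hit)

# On a real system one might pass an open file instead:
# with open("/var/log/messages", "rb") as log:
#     for hit in find_oom_lines(log):
#         print(hit)
```

An empty result on your own log would match what I see here: no OOM events at all.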
It might have some relation to interactions with other packages. I have seen some references to qemu also being impacted but I don’t use qemu on my IPFire system.
I’ll try that and post the results later today.