It is not a heavy load. The yellow colour is the CPU waiting for I/O, so it is spending a lot of time waiting for data it has requested from the disk, presumably because reads have to be retried many times before a successful response.
It means that things started to get very bad with that drive around 4:00am this morning, as that is when the CPU waiting for I/O increased to 40%. Even before that you had around 2% to 10% occurring, which is not good for a healthy drive.
My IPFire system has an average of 0.01% for CPU waiting for I/O.
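If you want to cross-check what the graph is showing from the console, here is a minimal Python sketch, assuming a standard Linux /proc layout (as on IPFire), that samples /proc/stat twice and prints the iowait percentage over the interval:

```python
#!/usr/bin/env python3
# Minimal sketch: estimate the "CPU waiting for I/O" percentage by sampling
# /proc/stat twice. Assumes a standard Linux /proc layout (as on IPFire).
import time


def cpu_times():
    with open("/proc/stat") as f:
        # Aggregate line: cpu user nice system idle iowait irq softirq steal ...
        fields = f.readline().split()
    return [int(v) for v in fields[1:]]


before = cpu_times()
time.sleep(5)
after = cpu_times()

delta = [b - a for a, b in zip(before, after)]
total = sum(delta) or 1   # guard against a zero-length interval
iowait = delta[4]         # 5th counter after the "cpu" label is iowait

print(f"iowait over the last 5 seconds: {100.0 * iowait / total:.2f}%")
```

On a healthy, lightly loaded firewall this should stay well below 1%.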
@bonnietwin Thank you! Maybe in future versions it will be possible to display a notification on the IPFire home page when the hard disk starts to degrade?
That is not so easy to do, because CPU waiting for I/O is not a sufficient indicator of hard disk failure: it can also be high for valid reasons, such as a file server on IPFire handling large files. You could also have a hard disk failing without it showing up as higher CPU waiting for I/O.
What do you see on the WUI Status - Media page regarding the disk accesses per day? Is there anything unusual in the values there?
What does SMART show if you press the SMART Information button on the same page?
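If you prefer the console, the same information should be available via smartctl from smartmontools. A small Python sketch, assuming smartmontools is installed and the system disk is /dev/sda (adjust the device path to your setup):

```python
#!/usr/bin/env python3
# Sketch: console equivalent of the WUI "SMART Information" button.
# Assumes smartmontools is installed and the system disk is /dev/sda.
import subprocess

# -H prints the overall health self-assessment, -A the vendor attribute table.
for flags in (["-H"], ["-A"]):
    subprocess.run(["smartctl", *flags, "/dev/sda"], check=False)
```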
And that shows the problem with interpreting SMART: it shows no failing parameters at all, yet the logs indicate big problems.
SMART sometimes shows no messages and then the drive just dies.
Sometimes it shows messages and nothing happens to the drive. It just keeps working.
Sometimes it shows potential failures and those are an indication of real problems to come.
Evaluating hard disk degradation really requires reviewing multiple sources of information together, plus built-up experience, to be able to make a decision.
I had a hard disk in my desktop that showed some errors, classified as the disk controller electronics starting to fail to read from the disk.
I found that the SATA connector was not seated well. I replaced it with a new SATA cable that has a clip to hold it in place, and the drive works fine, but because that parameter is considered critical for drive health, it always flags up a failure with every SMART scan I run at boot. The parameter has never increased again.
In your case you could check the seating of the hard drive connector and see if the problem disappears completely, but I would also purchase a new disk to have available as a precaution.
I completely agree, and this is also my experience. Actually, I have never seen a SMART table indicating a failure, but plenty of disks have died with a clean bill of health.
These are the three concerning parameters from the table:
Current_Pending_Sector: The value is 66. These are sectors that are currently unstable and might become reallocated sectors.
Load_Cycle_Count: This is at 2,676,178, which is a high count. Load cycles refer to the number of times the drive head parks itself. A high count indicates a lot of wear and tear, potentially leading to drive failure in the future.
Power_On_Hours: This drive has been powered on for 33,170 hours, which equates to nearly 3.8 years of continuous operation.
Considering this is a mechanical hard disk, it is probably simply wearing out and approaching the end of its life.
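If anyone wants to keep an eye on exactly these three attributes without reading the whole table each time, here is a small Python sketch. It assumes smartmontools is installed, the disk is /dev/sda, and the usual ATA attribute table layout where the raw value is the last column (this can vary by drive model), and it also converts Power_On_Hours into years:

```python
#!/usr/bin/env python3
# Sketch: pull the three attributes discussed above out of the smartctl
# attribute table and convert Power_On_Hours into years of runtime.
# Assumes smartmontools is installed and the disk is /dev/sda; the raw value
# is taken from the last column, which is the usual ATA layout but can vary.
import subprocess

WATCHED = ("Current_Pending_Sector", "Load_Cycle_Count", "Power_On_Hours")

output = subprocess.run(
    ["smartctl", "-A", "/dev/sda"], capture_output=True, text=True
).stdout

for line in output.splitlines():
    parts = line.split()
    if len(parts) >= 10 and parts[1] in WATCHED:
        name, raw = parts[1], parts[-1]
        print(f"{name}: {raw}")
        if name == "Power_On_Hours" and raw.isdigit():
            # e.g. 33,170 h / 8,766 h per year (365.25 days) is roughly 3.8 years
            print(f"  ~{int(raw) / 8766:.1f} years of continuous operation")
```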
Thank you for your help; there is definitely nothing wrong with the hard drive connector… Over the weekend we will prepare new backup network equipment, since the current hardware runs in a production environment.
Personal opinion/experience: Samsung was not the greatest hard drive brand while it was on the market. SSDs are far better products, both for reliability and usually for performance.
If this is for corporate use, or even SMB, please consider asking your IT infrastructure team if they’re willing to donate to the IPFire project, if they do not already.
IPFire is community-supported, and a donation can go a long way in ensuring its continual development and improvement, especially since this thread has helped in potentially preventing an outage.
This kind of equipment was purchased a long time ago, and at that time there were other priorities. Time does not stand still, and as equipment becomes obsolete the requirements also grow. Therefore, over time we will replace the old equipment completely, since it has been working 24/7 for a fairly long period in not very good conditions.
I’ve taken the liberty of checking the warranty of the drive with Seagate (model: ST320LM000), and the warranty expired on 28 April 2012. So if the warranty was 2 years, it was produced in 2010; that’s a long time ago.
In fairness, if it has been operating correctly since 2010, 12 years of almost 24/7 operation puts this sample among the better ones in terms of reliability.
In my experience, though, this seems more like an exception than the average. The reliability of this hard drive might also have been helped by a good PSU and a UPS filtering out power issues.
The 2000-2010 period was a tough time for PSUs; the market was flooded with cheap but crappy products.
The electronics industry went from leaded to lead-free solder in the early to mid 2000s, so there were lots of learning-curve issues related to poorly soldered components.
And to add to the misery!
There were LOTS of bad electrolytic capacitors manufactured from the late 1990s to around 2007. And PSUs need good (really good) electrolytic capacitors!
I hope this will be the shortest OT ever… So:
AFAIK RoHS kicked in in 2012, at least the “tougher” RoHS 2, which hurt the European automotive industry a lot, including the Ilmor-designed engines branded Mercedes (bye bye beryllium). This lead (pun intended) to several difficulties in making metals and electronics work… no worse than they did with the hazardous materials. Capacitor issues have existed for the whole life of electronics: a YouTuber I follow who restores a lot of electronic equipment checks all capacitors as “standard procedure” to evaluate their condition, and replaces them if they are damaged, out of spec or… simply crappy (due to the original materials). The devices are from the '60s, mostly the '70s, and the '80s.
Rolling back to the issue: a lot of mass production changed gear around 2017/2018, most of the time changing processes and tinkering enough with solder mixes.
In any case, whenever possible, we try to replace the equipment every 5 years, since it works 24/7. Yes, even in new equipment you may come across not-so-fresh components… But I am glad that IPFire is not very demanding on hardware, and if we come across, for example, not very fast hard drives of an older generation, we don't pay attention to that; the main thing is that it keeps working for a long time)