Core update 170 SATA hard drives not recognized on the SATA controller

Hi,

After updating to core update 170, hard drives are no longer recognized on the SATA controller.

05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
	Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller
	Capabilities: [e0] SATA HBA v0.0

05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])
	Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller [1458:b000]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 33
	IOMMU group: 19
	Region 0: I/O ports at b050 [size=8]
	Region 1: I/O ports at b040 [size=4]
	Region 2: I/O ports at b030 [size=8]
	Region 3: I/O ports at b020 [size=4]
	Region 4: I/O ports at b000 [size=32]
	Region 5: Memory at f0810000 (32-bit, non-prefetchable) [size=2K]
	Expansion ROM at f0800000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00358  Data: 0000
	Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x2
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] SATA HBA v0.0 BAR4 Offset=00000004
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Kernel driver in use: ahci

Would be very grateful for help.

Is there a driver missing?

Hi,

could be. Which SATA drivers were loaded on your system before you installed Core Update 170?

Thanks, and best regards,
Peter MĂĽller

Hi, thanks for your answer!

How do I find out which driver was loaded before the update?

05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10)
	Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller
	Kernel driver in use: ahci

I think it was “ahci” before and now “ahci” too

Or am I wrong?

Hi,

such information is usually logged at /var/log/bootlog, so if you try to compare your current bootlog to an older one from before the upgrade, you should see a difference.

I. e.:

$ diff -Naur /var/log/bootlog.old /var/log/bootlog

(Replace /var/log/bootlog.old with the filename, such as /var/log/bootlog.2.gz, if you updated the IPFire machine two days ago. Also, please keep in mind that these include randomly allocated memory addresses, so a part of the diff is just the normal randomness, and not part of the issue. :slight_smile: )

Thanks, and best regards,
Peter MĂĽller

1 Like

bootlog-diff.txt.tar.gz (18.0 KB)
Apparently a lot has changed.
How can I get the system running again?

Hi Fritz - I’m stepping in for a moment just to help get some additional info.

I think one of the files grabbed was incorrect. There should be lots of + in the file you returned (and there are) and lots of - in the file. But there are no - .

can you post the results of:

ls -lt /var/log/bootlog*

Here are my results:

[root@ipfire ~] # ls -lt /var/log/bootlog*
-rw-rw-r-- 1 root syslogd     0 Sep 18 00:01 /var/log/bootlog
-rw-r--r-- 1 root root    13361 Sep 17 17:48 /var/log/bootlog.1.gz
-rw-rw-r-- 1 root syslogd     0 Sep 11 00:01 /var/log/bootlog.old
-rw-rw-r-- 1 root syslogd    20 Sep  4 00:01 /var/log/bootlog.2.gz
-rw-rw-r-- 1 root syslogd    20 Aug 28 00:01 /var/log/bootlog.3.gz
-rw-rw-r-- 1 root syslogd    20 Aug 21 00:01 /var/log/bootlog.4.gz
-rw-rw-r-- 1 root syslogd    20 Aug 14 00:01 /var/log/bootlog.5.gz
-rw-rw-r-- 1 root syslogd    20 Aug  7 00:01 /var/log/bootlog.6.gz
-rw-r--r-- 1 root root    13334 Aug  6 17:06 /var/log/bootlog.7.gz
-rw-rw-r-- 1 root syslogd    20 Jul 24 00:01 /var/log/bootlog.8.gz
-rw-rw-r-- 1 root syslogd    20 Jul 17 00:01 /var/log/bootlog.9.gz
-rw-r--r-- 1 root root    13351 Jul 12 22:38 /var/log/bootlog.10.gz

So my diff would be the two larger files.

-rw-r--r-- 1 root root    13361 Sep 17 17:48 /var/log/bootlog.1.gz

and

-rw-r--r-- 1 root root    13334 Aug  6 17:06 /var/log/bootlog.7.gz


EDIT: My diff ended up looking like this:

diff -Naur <(zcat /var/log/bootlog.1.gz) <(zcat /var/log/bootlog.7.gz)
2 Likes

Did I do diff wrong? The boot log from the Core 170 is very large for me.

ls -lt /var/log/bootlog*
-rw-r--r-- 1 root root    70308 Sep 20 11:53 /var/log/bootlog
-rw-r--r-- 1 root root    70308 Sep 19 13:32 /var/log/bootlog.old
-rw-rw-r-- 1 root syslogd    20 Sep 11 00:01 /var/log/bootlog.1.gz
-rw-rw-r-- 1 root syslogd    20 Sep  4 00:01 /var/log/bootlog.2.gz
-rw-rw-r-- 1 root syslogd    20 Aug 28 00:01 /var/log/bootlog.3.gz
-rw-rw-r-- 1 root syslogd    20 Aug 21 00:01 /var/log/bootlog.4.gz
-rw-rw-r-- 1 root syslogd    20 Aug 14 00:01 /var/log/bootlog.5.gz
-rw-rw-r-- 1 root syslogd    20 Aug  7 00:01 /var/log/bootlog.6.gz
-rw-rw-r-- 1 root syslogd    20 Jul 31 00:01 /var/log/bootlog.7.gz
-rw-rw-r-- 1 root syslogd    20 Jul 24 00:01 /var/log/bootlog.8.gz
-rw-r--r-- 1 root root    17854 Jul 19 05:00 /var/log/bootlog.9.gz
-rw-rw-r-- 1 root syslogd    20 Jul 10 00:01 /var/log/bootlog.10.gz
-rw-rw-r-- 1 root syslogd    20 Jul  3 00:01 /var/log/bootlog.11.gz
-rw-rw-r-- 1 root syslogd    20 Jun 26 00:01 /var/log/bootlog.12.gz
-rw-rw-r-- 1 root syslogd    20 Jun 19 00:01 /var/log/bootlog.13.gz
-rw-r--r-- 1 root root    17801 Jun 15 22:43 /var/log/bootlog.14.gz
-rw-rw-r-- 1 root syslogd    20 Jun  5 00:01 /var/log/bootlog.15.gz
1 Like

Try this:

diff -Naur <(zcat /var/log/bootlog.9.gz) /var/log/bootlog

Thanks for the help, I unpacked “bootlog.9.gz” the first time, now “-” is displayed.

diff-new.tar (136 KB)

0.652713] DMAR: [DMA Read NO_PASID] Request device [05:00.0] fault addr 0xac311000 [fault reason 0x06] PTE Read access is not set

How can I change that in the kernel?

Many greetings

Fritz,

Peter @pmueller will need to answer your questions about the Kernel.

This might make the output a little easier to read. It deletes the timestamp from every line.

diff -Naur <(cut -b15- <(zcat /var/log/bootlog.9.gz) ) <(cut -b15- /var/log/bootlog) 

bootlog.new.tar (38.5 KB)

Hello Jon,
Is the output okay?

Greetings Fritz

As far as looks - it looks good to me! Peter @pmueller will need to review for debugging
feedback.

Hi Jon,

Is this problem more common on the Core 170?

Can I help you?

Hi Fritz - I am guessing Peter got busy with work stuff since he has to been around in the past 5-6 days.

Is there anything new?

Hi,

apologies for being unresponsive over the course of last week. ${dayjob} keeps my days quite packed…

I will work through your replies and the supplied logs in due course and get back to you afterwards.

EDIT: These log lines indicate the problem:

ata7: link is slow to respond, please be patient (ready=0)
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: link is slow to respond, please be patient (ready=0)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: limiting SATA link speed to 1.5 Gbps
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: link is slow to respond, please be patient (ready=0)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: limiting SATA link speed to 3.0 Gbps
ata7: COMRESET failed (errno=-16)
ata7: reset failed, giving up

The root cause to this is unknown to me at the time of writing, will investigate and report back.

In unrelated findings, could you check whether there is a BIOS update available for your machine? The following log message as well as the CPU microcode no longer loaded indicate that there is a problem - however, it is not a severe issue, and certainly not related to the SATA controller problem.

[Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)

Thanks, and best regards,
Peter MĂĽller

2 Likes

Hi @dyle,

looks like the problem was already reported to the kernel developers in 2018. However, no fix is available for it.

https://bugzilla.kernel.org/show_bug.cgi?id=199733

The workaround would be to disable IOMMU in the BIOS of your machine. This should not have any functional consequences, except if you have other VMs running on top of IPFire (discouraged for security reasons anyway), particular if PCI device pass-through is used.

Sorry to disappoint, and best regards,
Peter MĂĽller

2 Likes

Hi Peter,

Thank you for your help!!!

Yes that was the mistake! Turned off virtualization in the BIOS and the Sata Controller works again.

However, rsync via ssh hasn’t worked since 170, I always get the following error:

rsync -az --delete 192.168.0.2::Relase /mnt/backup/backup/
ERROR: rejecting unrequested file-list name: lib
rsync error: protocol incompatibility (code 2) at flist.c(998) [Receiver=3.2.4]

Should I open a new thread for this?

2 Likes