dyle
(Fritz Lange)
19 September 2022 18:36
1
Hi,
After updating to core update 170, hard drives are no longer recognized on the SATA controller.
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller
Capabilities: [e0] SATA HBA v0.0
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller [1b4b:9230] (rev 10) (prog-if 01 [AHCI 1.0])
Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller [1458:b000]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 33
IOMMU group: 19
Region 0: I/O ports at b050 [size=8]
Region 1: I/O ports at b040 [size=4]
Region 2: I/O ports at b030 [size=8]
Region 3: I/O ports at b020 [size=4]
Region 4: I/O ports at b000 [size=32]
Region 5: Memory at f0810000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at f0800000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00358 Data: 0000
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x2
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] SATA HBA v0.0 BAR4 Offset=00000004
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Kernel driver in use: ahci
Would be very grateful for help.
Is there a driver missing?
pmueller
(Peter MĂĽller)
20 September 2022 11:34
2
Hi,
could be. Which SATA drivers were loaded on your system before you installed Core Update 170?
Thanks, and best regards,
Peter MĂĽller
dyle
(Fritz Lange)
20 September 2022 13:47
3
Hi, thanks for your answer!
How do I find out which driver was loaded before the update?
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10)
Subsystem: Gigabyte Technology Co., Ltd 88SE9230 PCIe SATA 6Gb/s Controller
Kernel driver in use: ahci
I think it was “ahci” before and now “ahci” too
Or am I wrong?
pmueller
(Peter MĂĽller)
20 September 2022 13:51
4
Hi,
such information is usually logged at /var/log/bootlog
, so if you try to compare your current bootlog
to an older one from before the upgrade, you should see a difference.
I. e.:
$ diff -Naur /var/log/bootlog.old /var/log/bootlog
(Replace /var/log/bootlog.old
with the filename, such as /var/log/bootlog.2.gz
, if you updated the IPFire machine two days ago. Also, please keep in mind that these include randomly allocated memory addresses, so a part of the diff is just the normal randomness, and not part of the issue. )
Thanks, and best regards,
Peter MĂĽller
1 Like
dyle
(Fritz Lange)
20 September 2022 15:31
5
bootlog-diff.txt.tar.gz (18.0 KB)
Apparently a lot has changed.
How can I get the system running again?
jon
(Jon)
20 September 2022 18:40
6
Hi Fritz - I’m stepping in for a moment just to help get some additional info.
I think one of the files grabbed was incorrect. There should be lots of +
in the file you returned (and there are) and lots of -
in the file. But there are no -
.
can you post the results of:
ls -lt /var/log/bootlog*
Here are my results:
[root@ipfire ~] # ls -lt /var/log/bootlog*
-rw-rw-r-- 1 root syslogd 0 Sep 18 00:01 /var/log/bootlog
-rw-r--r-- 1 root root 13361 Sep 17 17:48 /var/log/bootlog.1.gz
-rw-rw-r-- 1 root syslogd 0 Sep 11 00:01 /var/log/bootlog.old
-rw-rw-r-- 1 root syslogd 20 Sep 4 00:01 /var/log/bootlog.2.gz
-rw-rw-r-- 1 root syslogd 20 Aug 28 00:01 /var/log/bootlog.3.gz
-rw-rw-r-- 1 root syslogd 20 Aug 21 00:01 /var/log/bootlog.4.gz
-rw-rw-r-- 1 root syslogd 20 Aug 14 00:01 /var/log/bootlog.5.gz
-rw-rw-r-- 1 root syslogd 20 Aug 7 00:01 /var/log/bootlog.6.gz
-rw-r--r-- 1 root root 13334 Aug 6 17:06 /var/log/bootlog.7.gz
-rw-rw-r-- 1 root syslogd 20 Jul 24 00:01 /var/log/bootlog.8.gz
-rw-rw-r-- 1 root syslogd 20 Jul 17 00:01 /var/log/bootlog.9.gz
-rw-r--r-- 1 root root 13351 Jul 12 22:38 /var/log/bootlog.10.gz
So my diff
would be the two larger files.
-rw-r--r-- 1 root root 13361 Sep 17 17:48 /var/log/bootlog.1.gz
and
-rw-r--r-- 1 root root 13334 Aug 6 17:06 /var/log/bootlog.7.gz
EDIT : My diff
ended up looking like this:
diff -Naur <(zcat /var/log/bootlog.1.gz) <(zcat /var/log/bootlog.7.gz)
2 Likes
dyle
(Fritz Lange)
20 September 2022 20:17
7
Did I do diff wrong? The boot log from the Core 170 is very large for me.
ls -lt /var/log/bootlog*
-rw-r--r-- 1 root root 70308 Sep 20 11:53 /var/log/bootlog
-rw-r--r-- 1 root root 70308 Sep 19 13:32 /var/log/bootlog.old
-rw-rw-r-- 1 root syslogd 20 Sep 11 00:01 /var/log/bootlog.1.gz
-rw-rw-r-- 1 root syslogd 20 Sep 4 00:01 /var/log/bootlog.2.gz
-rw-rw-r-- 1 root syslogd 20 Aug 28 00:01 /var/log/bootlog.3.gz
-rw-rw-r-- 1 root syslogd 20 Aug 21 00:01 /var/log/bootlog.4.gz
-rw-rw-r-- 1 root syslogd 20 Aug 14 00:01 /var/log/bootlog.5.gz
-rw-rw-r-- 1 root syslogd 20 Aug 7 00:01 /var/log/bootlog.6.gz
-rw-rw-r-- 1 root syslogd 20 Jul 31 00:01 /var/log/bootlog.7.gz
-rw-rw-r-- 1 root syslogd 20 Jul 24 00:01 /var/log/bootlog.8.gz
-rw-r--r-- 1 root root 17854 Jul 19 05:00 /var/log/bootlog.9.gz
-rw-rw-r-- 1 root syslogd 20 Jul 10 00:01 /var/log/bootlog.10.gz
-rw-rw-r-- 1 root syslogd 20 Jul 3 00:01 /var/log/bootlog.11.gz
-rw-rw-r-- 1 root syslogd 20 Jun 26 00:01 /var/log/bootlog.12.gz
-rw-rw-r-- 1 root syslogd 20 Jun 19 00:01 /var/log/bootlog.13.gz
-rw-r--r-- 1 root root 17801 Jun 15 22:43 /var/log/bootlog.14.gz
-rw-rw-r-- 1 root syslogd 20 Jun 5 00:01 /var/log/bootlog.15.gz
1 Like
jon
(Jon)
20 September 2022 20:53
8
Try this:
diff -Naur <(zcat /var/log/bootlog.9.gz) /var/log/bootlog
dyle
(Fritz Lange)
21 September 2022 03:55
9
Thanks for the help, I unpacked “bootlog.9.gz” the first time, now “-” is displayed.
diff-new.tar (136 KB)
0.652713] DMAR: [DMA Read NO_PASID] Request device [05:00.0] fault addr 0xac311000 [fault reason 0x06] PTE Read access is not set
How can I change that in the kernel?
Many greetings
jon
(Jon)
21 September 2022 19:33
10
Fritz,
Peter @pmueller will need to answer your questions about the Kernel.
This might make the output a little easier to read. It deletes the timestamp from every line.
diff -Naur <(cut -b15- <(zcat /var/log/bootlog.9.gz) ) <(cut -b15- /var/log/bootlog)
dyle
(Fritz Lange)
21 September 2022 20:12
11
dyle
(Fritz Lange)
23 September 2022 03:54
12
Jon:
output
Hello Jon,
Is the output okay?
Greetings Fritz
jon
(Jon)
23 September 2022 17:47
13
Fritz Lange:
Is the output okay?
As far as looks - it looks good to me! Peter @pmueller will need to review for debugging
feedback.
dyle
(Fritz Lange)
23 September 2022 18:54
14
Hi Jon,
Is this problem more common on the Core 170?
jon
(Jon)
27 September 2022 23:44
16
Hi Fritz - I am guessing Peter got busy with work stuff since he has to been around in the past 5-6 days.
pmueller
(Peter MĂĽller)
3 October 2022 10:58
18
Hi,
apologies for being unresponsive over the course of last week. ${dayjob} keeps my days quite packed…
I will work through your replies and the supplied logs in due course and get back to you afterwards.
EDIT: These log lines indicate the problem:
ata7: link is slow to respond, please be patient (ready=0)
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: link is slow to respond, please be patient (ready=0)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: limiting SATA link speed to 1.5 Gbps
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: link is slow to respond, please be patient (ready=0)
ata9.00: qc timeout (cmd 0xec)
ata9.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata14.00: qc timeout (cmd 0xa1)
ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata7: COMRESET failed (errno=-16)
ata7: limiting SATA link speed to 3.0 Gbps
ata7: COMRESET failed (errno=-16)
ata7: reset failed, giving up
The root cause to this is unknown to me at the time of writing, will investigate and report back.
In unrelated findings, could you check whether there is a BIOS update available for your machine? The following log message as well as the CPU microcode no longer loaded indicate that there is a problem - however, it is not a severe issue, and certainly not related to the SATA controller problem.
[Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Thanks, and best regards,
Peter MĂĽller
2 Likes
pmueller
(Peter MĂĽller)
3 October 2022 11:17
19
Hi @dyle ,
looks like the problem was already reported to the kernel developers in 2018. However, no fix is available for it.
https://bugzilla.kernel.org/show_bug.cgi?id=199733
The workaround would be to disable IOMMU in the BIOS of your machine. This should not have any functional consequences, except if you have other VMs running on top of IPFire (discouraged for security reasons anyway), particular if PCI device pass-through is used.
Sorry to disappoint, and best regards,
Peter MĂĽller
2 Likes
dyle
(Fritz Lange)
3 October 2022 13:31
20
Hi Peter,
Thank you for your help!!!
Yes that was the mistake! Turned off virtualization in the BIOS and the Sata Controller works again.
However, rsync via ssh hasn’t worked since 170, I always get the following error:
rsync -az --delete 192.168.0.2::Relase /mnt/backup/backup/
ERROR: rejecting unrequested file-list name: lib
rsync error: protocol incompatibility (code 2) at flist.c(998) [Receiver=3.2.4]
Should I open a new thread for this?
2 Likes