* [gentoo-user] Kernel2.6.33: ATA failed command: READ FPDMA QUEUED, hard resetting link
From: Paul Hartman @ 2010-03-26 20:17 UTC
To: gentoo-user
Hi,
Setting up and testing my new system (after wasting nearly 1 month
with bad RAM modules), I got this error today:
[48055.741389] ata3.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[48055.741393] ata3.00: failed command: READ FPDMA QUEUED
[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 tag 1 ncq 147456 in
[48055.741400] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[48055.741402] ata3.00: status: { DRDY }
[48055.741405] ata3: hard resetting link
[48056.198746] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[48056.210514] ata3.00: configured for UDMA/133
[48056.210518] ata3.00: device reported invalid CHS sector 0
[48056.210523] ata3: EH complete
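For anyone trying to read the "cmd" line above, here is a minimal sketch of how its bytes can be decoded. It assumes libata prints the taskfile in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, that NCQ commands carry the sector count in the feature registers and the tag in the upper bits of nsect, and that logical sectors are 512 bytes. The function name is made up for illustration.

```python
# Sketch: decode the "cmd" taskfile bytes of an NCQ command from a libata
# error report. The field order below is an assumption about the log format:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

def decode_ncq_taskfile(cmd: str) -> dict:
    """Decode a string such as '60/20:08:38:15:03/01:00:18:00:00/40'."""
    head, low, high, device = cmd.split("/")
    command = int(head, 16)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )

    # For NCQ reads/writes the sector count lives in the feature registers.
    sectors = (hob_feature << 8) | feature
    # The NCQ tag is stored in bits 7:3 of the sector-count register.
    tag = nsect >> 3
    # 48-bit LBA, low three bytes first, then the three "hob" bytes.
    lba = (
        lbal
        | (lbam << 8)
        | (lbah << 16)
        | (hob_lbal << 24)
        | (hob_lbam << 32)
        | (hob_lbah << 40)
    )
    return {
        "command": command,
        "tag": tag,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
        "lba": lba,
        "device": int(device, 16),
    }
```

Applied to the line in this report, the decoded tag and byte count agree with the "tag 1" and "ncq 147456 in" that the kernel printed, which suggests the assumed field order is right. The decoded LBA only says which request timed out; it is not proof of a bad sector.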
I really don't understand what it means, but the "timeout", "hard
resetting link" and "invalid CHS sector 0" look scary to me...
Initial bootup messages for this device were:
Mar 25 22:02:32 [kernel] [ 4.496102] ata3: SATA max UDMA/133 abar m2048@0xfbffc000 port 0xfbffc200 irq 34
Mar 25 22:02:32 [kernel] [ 8.519169] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 25 22:02:32 [kernel] [ 8.536681] ata3.00: ATA-8: SAMSUNG HD203WI, 1AN10002, max UDMA/133
Mar 25 22:02:32 [kernel] [ 8.548388] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 25 22:02:32 [kernel] [ 8.566100] ata3.00: configured for UDMA/133
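The "SStatus 123" value and the sector count in these messages can be unpacked with a few lines of arithmetic. The sketch below assumes the value is printed in hex, that the SATA SStatus register is laid out as DET in bits 3:0, SPD in bits 7:4 and IPM in bits 11:8, and that the drive uses 512-byte logical sectors. The function names are hypothetical.

```python
# Sketch: unpack the SStatus value and the sector count from the boot log.
# Register layout and the 512-byte sector size are assumptions, see above.

DET = {0: "no device", 1: "device present, no PHY link",
       3: "device present, PHY link established", 4: "PHY offline"}
SPD = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}

def decode_sstatus(value: int) -> dict:
    """Split an SStatus register value into its three fields."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": DET.get(det, "reserved"),
        "spd": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
    }

def capacity_bytes(sectors: int, sector_size: int = 512) -> int:
    """Capacity implied by the reported sector count."""
    return sectors * sector_size

status = decode_sstatus(0x123)        # the kernel prints the value in hex
size = capacity_bytes(3907029168)     # sector count from the log above
```

Read this way, the value matches the "SATA link up 3.0 Gbps" text printed next to it, both at boot and after the hard reset.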
That disk is part of an md RAID5 array, but I was at work when the error
happened, so I didn't see whether the RAID repaired itself or what else
happens in this case (I don't have mdadm monitoring configured
yet). Right now all RAID disks are up and healthy.
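Until mdadm monitoring is set up, the array state can be checked by hand from /proc/mdstat. Below is a minimal sketch that flags degraded arrays; it assumes the usual mdstat layout, in which an "mdN : ..." line is followed by a status line containing something like "[4/4] [UUUU]". The function name is made up, and this is not a substitute for mdadm's own monitor mode.

```python
# Sketch: find degraded md arrays in the text of /proc/mdstat.
# Assumes each array starts with a line like "md0 : active raid5 ..." and is
# followed by a status line containing "[expected/active] [UU_U]".
import re

ARRAY_RE = re.compile(r"^(md\S*)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

def degraded_arrays(mdstat_text: str) -> list:
    """Return (name, expected, active, member_map) for each degraded array."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            member_map = status.group(3)
            # "_" marks a missing or failed member in the map.
            if active < expected or "_" in member_map:
                degraded.append((current, expected, active, member_map))
    return degraded
```

Typical use would be `degraded_arrays(open("/proc/mdstat").read())`; an empty list means every array reports all of its members as up.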
I googled it, but most of the results are pastebin snippets. I'm using
kernel 2.6.33 and the ahci driver for the SATA controllers.
The libata documentation, in its section about timeout errors, says:
"Most often this is due to an unrelated interrupt subsystem bug (try
booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to
deliver an interrupt when we were expecting one from the hardware."
I really don't know the potential implications of disabling MSI or
APIC, but in /proc/interrupts I do see ahci listed on both MSI and
APIC rows. So at least I know both are active right now.
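The same check of /proc/interrupts can be scripted, which is handy for comparing before and after a boot with 'pci=nomsi'. This is only a sketch: it assumes each row begins with "IRQ:" and that the interrupt type column contains either "MSI" or "IO-APIC", which may vary between kernel versions. The function name is hypothetical.

```python
# Sketch: report which interrupt mode each ahci IRQ uses, given the text of
# /proc/interrupts. Column layout is an assumption, see the note above.

def ahci_interrupt_modes(interrupts_text: str) -> dict:
    """Map IRQ number (as a string) to 'MSI', 'IO-APIC' or 'other'."""
    modes = {}
    for line in interrupts_text.splitlines():
        if "ahci" not in line:
            continue
        irq = line.split(":", 1)[0].strip()
        if "MSI" in line:
            mode = "MSI"
        elif "IO-APIC" in line:
            mode = "IO-APIC"
        else:
            mode = "other"
        modes[irq] = mode
    return modes
```

Called as `ahci_interrupt_modes(open("/proc/interrupts").read())`, it should show the ahci rows moving off MSI if the 'pci=nomsi' option took effect.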
Temperatures in my system are good; hddtemp says the drive in question
is at 21 °C right now.
Another possibility is that I need to increase a voltage on the
motherboard, since it is running six HDDs and one DVD-ROM. I'll have to
research which voltage is related to this. (It's an X58 motherboard.)
Thanks in advance if anyone has any knowledge about this; otherwise I'll
go into trial-and-hopefully-no-error mode. :)
Paul