From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
	by finch.gentoo.org with esmtp (Exim 4.60)
	(envelope-from <gentoo-user+bounces-109332-garchives=archives.gentoo.org@lists.gentoo.org>)
	id 1NvFyd-00056I-2q
	for garchives@archives.gentoo.org; Fri, 26 Mar 2010 20:17:55 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 94DDDE0901;
	Fri, 26 Mar 2010 20:17:20 +0000 (UTC)
Received: from mail-pw0-f53.google.com (mail-pw0-f53.google.com [209.85.160.53])
	by pigeon.gentoo.org (Postfix) with ESMTP id 65ED7E0968
	for <gentoo-user@lists.gentoo.org>; Fri, 26 Mar 2010 20:17:20 +0000 (UTC)
Received: by pwj10 with SMTP id 10so6293694pwj.40
        for <gentoo-user@lists.gentoo.org>; Fri, 26 Mar 2010 13:17:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:mime-version:sender:received:date
         :x-google-sender-auth:received:message-id:subject:from:to
         :content-type;
        bh=TInYQHkVyCR+z8lCXBhIeuZrikfTFu9kW+ESeJ1sF1M=;
        b=QPTz/0ozhVukGIkNL4Pj2gdznR7Zf0pklKIqqV3/CgXYKPiEBAV3HTXiLglm2lEd9C
         Og89ZhjCHKooVTN1iT/EEdtVx0H7XPXhj4TzGLkhbdUnB0ENnYghSN7CvxI09oShbt/f
         VRgq9H8HMN4dk/PrYOqyCduJPR1TmffRAMVLE=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:sender:date:x-google-sender-auth:message-id:subject
         :from:to:content-type;
        b=BCOmtHGyjWlpWzFjOEAcYvx0czdTq/qiQRNV6yyu5i55VRdYR0OLf94axXPlfm8V2m
         hcpiTj8cwplUEEFWUFMFhrWrDlvnVmxb/dEZVHHURwmMY75OJwBdgOqmUrBUw/yzsDPI
         jCi3mMFWLE7Gnq8UrCiZAtHZTsoxxo91Bgisk=
Precedence: bulk
List-Post: <mailto:gentoo-user@lists.gentoo.org>
List-Help: <mailto:gentoo-user+help@lists.gentoo.org>
List-Unsubscribe: <mailto:gentoo-user+unsubscribe@lists.gentoo.org>
List-Subscribe: <mailto:gentoo-user+subscribe@lists.gentoo.org>
List-Id: Gentoo Linux mail <gentoo-user.gentoo.org>
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
Sender: paul.hartman@gmail.com
Received: by 10.150.229.15 with HTTP; Fri, 26 Mar 2010 13:17:19 -0700 (PDT)
Date: Fri, 26 Mar 2010 15:17:19 -0500
X-Google-Sender-Auth: 9245b9b3cf0a111d
Received: by 10.141.12.7 with SMTP id p7mr1495538rvi.235.1269634639883; Fri, 
	26 Mar 2010 13:17:19 -0700 (PDT)
Message-ID: <58965d8a1003261317j24856b5cied7c5bf4b83ebf50@mail.gmail.com>
Subject: [gentoo-user] Kernel2.6.33: ATA failed command: READ FPDMA QUEUED, hard resetting 
	link
From: Paul Hartman <paul.hartman+gentoo@gmail.com>
To: gentoo-user@lists.gentoo.org
Content-Type: text/plain; charset=ISO-8859-1
X-Archives-Salt: 72ef7525-6b26-4281-b021-1f5bd40e90a0
X-Archives-Hash: 24a1c4eb0409c237010283d008289905

Hi,

While setting up and testing my new system (after wasting nearly a
month on bad RAM modules), I got this error today:

[48055.741389] ata3.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[48055.741393] ata3.00: failed command: READ FPDMA QUEUED
[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 tag 1 ncq 147456 in
[48055.741400]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[48055.741402] ata3.00: status: { DRDY }
[48055.741405] ata3: hard resetting link
[48056.198746] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[48056.210514] ata3.00: configured for UDMA/133
[48056.210518] ata3.00: device reported invalid CHS sector 0
[48056.210523] ata3: EH complete

I really don't understand what it means, but the "timeout", "hard
resetting link" and "invalid CHS sector 0" look scary to me...
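
For reference, the "cmd" line can at least be decoded by hand. This is
only a sketch: it assumes the byte order that libata's error handler
prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:
hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the
sector count sits in the feature bytes and the tag in the upper bits of
nsect. decode_ncq_cmd is my own helper name, not a kernel interface.

```python
# Sketch: decode the taskfile bytes from a libata "cmd" line for an NCQ command.
# Assumed field order (as printed by libata's error handler):
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

def decode_ncq_cmd(taskfile, sector_size=512):
    """Return command, tag, sector count, byte count and LBA for an NCQ taskfile string."""
    groups = [[int(byte, 16) for byte in group.split(":")]
              for group in taskfile.split("/")]
    command = groups[0][0]
    feature, nsect, lbal, lbam, lbah = groups[1]
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = groups[2]
    device = groups[3][0]

    # NCQ commands carry the sector count in the feature registers;
    # a count of 0 means 65536 sectors.
    count = ((hob_feature << 8) | feature) or 65536
    # The NCQ tag lives in bits 7:3 of the sector count register.
    tag = nsect >> 3
    # 48-bit LBA, high-order bytes first.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": command,
        "device": device,
        "tag": tag,
        "sectors": count,
        "bytes": count * sector_size,
        "lba": lba,
    }


info = decode_ncq_cmd("60/20:08:38:15:03/01:00:18:00:00/40")
print(info)
```

With the line from my log this gives tag 1 and 288 sectors, i.e. 147456
bytes, which matches the "tag 1 ncq 147456 in" part of the message. The
LBA comes out as 402855224, well inside the disk's 3907029168 sectors,
so the command that timed out looks like an ordinary read.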

Initial bootup messages for this device were:
Mar 25 22:02:32 [kernel] [    4.496102] ata3: SATA max UDMA/133 abar m2048@0xfbffc000 port 0xfbffc200 irq 34
Mar 25 22:02:32 [kernel] [    8.519169] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 25 22:02:32 [kernel] [    8.536681] ata3.00: ATA-8: SAMSUNG HD203WI, 1AN10002, max UDMA/133
Mar 25 22:02:32 [kernel] [    8.548388] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 25 22:02:32 [kernel] [    8.566100] ata3.00: configured for UDMA/133
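
The "SStatus 123" value is the same before and after the reset. As a
rough sketch of what it encodes, assuming the kernel prints it in hex
and that the fields follow the standard SATA SStatus layout (DET in
bits 3:0, SPD in bits 7:4, IPM in bits 11:8); decode_sstatus is a
made-up helper name:

```python
# Sketch: split a SATA SStatus register value into its three 4-bit fields.
# The value is assumed to be hexadecimal, as printed in the kernel log.

DET = {
    0x0: "no device detected, phy communication not established",
    0x1: "device detected, phy communication not established",
    0x3: "device detected, phy communication established",
    0x4: "phy offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "1.5 Gbps",
    0x2: "3.0 Gbps",
    0x3: "6.0 Gbps",
}
IPM = {
    0x0: "device not present or communication not established",
    0x1: "active",
    0x2: "partial power management state",
    0x6: "slumber power management state",
}


def decode_sstatus(value):
    """Return human-readable DET, SPD and IPM fields of an SStatus value."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "detection": DET.get(det, "reserved (0x%x)" % det),
        "speed": SPD.get(spd, "reserved (0x%x)" % spd),
        "power": IPM.get(ipm, "reserved (0x%x)" % ipm),
    }


print(decode_sstatus(0x123))
```

For 0x123 this reports a detected device with phy communication
established, a 3.0 Gbps link and an active interface, which agrees
with the "SATA link up 3.0 Gbps" text on the same line.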

That disk is part of an md RAID5, but I was at work when this error
happened, so I didn't see whether the RAID repaired itself or whatever
would happen in this case (I don't have mdadm monitoring configured
yet). Right now all RAID disks are up and healthy.
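
Until mdadm monitoring is set up, something like the following could
flag a degraded array. It is a minimal sketch that assumes the usual
/proc/mdstat layout, where each array's status line ends in a member
map such as [UUUU] and a "_" marks a missing member; the function name
is hypothetical.

```python
# Sketch: report md arrays whose member map in /proc/mdstat contains "_".
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status lines look like "... [4/4] [UUUU]" (expected/active, then map).
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            result = degraded_arrays(f.read())
        print(result if result else "every array reports all members up")
    except OSError as exc:
        print("could not read /proc/mdstat:", exc)
```

This only shows the state at the moment it runs. It would not tell me
whether a disk dropped out and was re-added while I was away.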

I googled it, but most of the results are pastebin snippets. I'm using
kernel 2.6.33 and the ahci driver for the SATA controllers.

From the libata documentation, in the section about timeout errors:
"Most often this is due to an unrelated interrupt subsystem bug (try
booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to
deliver an interrupt when we were expecting one from the hardware."

I really don't know the potential implications of disabling MSI or
APIC, but in /proc/interrupts I do see AHCI listed on both MSI and
APIC rows, so at least I know both are in use right now.
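
To make that check repeatable after trying one of the boot options, a
small sketch like this could list which interrupt type each ahci line
uses. It assumes that rows in /proc/interrupts start with "IRQ:" and
that the controller type column contains "MSI" or "IO-APIC"; the
helper name is made up.

```python
# Sketch: list the IRQ number and interrupt type of every ahci row
# in /proc/interrupts-style text.

def ahci_interrupt_modes(interrupts_text):
    """Return a list of (irq, kind) tuples for rows that mention ahci."""
    modes = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything that is not "IRQ: ..." shaped.
        if not fields or not fields[0].endswith(":"):
            continue
        if not any("ahci" in field for field in fields[1:]):
            continue
        if "MSI" in line:
            kind = "MSI"
        elif "IO-APIC" in line:
            kind = "IO-APIC"
        else:
            kind = "other"
        modes.append((fields[0].rstrip(":"), kind))
    return modes


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            for irq, kind in ahci_interrupt_modes(f.read()):
                print("irq %s: %s" % (irq, kind))
    except OSError as exc:
        print("could not read /proc/interrupts:", exc)
```

If booting with pci=nomsi works as described, I would expect the MSI
rows to disappear from this list, but that is my assumption and not
something I have tested yet.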

Temperatures in my system are good; hddtemp says the drive in question
is at 21 degrees C right now.

Another possibility is that I need to increase a voltage on the
motherboard, since it is running 6 HDDs and 1 DVD-ROM drive. I'll have
to research which voltage setting is relevant (it's an X58 motherboard).

Thanks in advance if anyone has any knowledge about this; otherwise
I'll go into trial-and-hopefully-no-error mode. :)

Paul