From: Paul Hartman <paul.hartman+gentoo@gmail.com>
To: gentoo-user@lists.gentoo.org
Date: Fri, 26 Mar 2010 15:17:19 -0500
Subject: [gentoo-user] Kernel2.6.33: ATA failed command: READ FPDMA QUEUED, hard resetting link
Message-ID: <58965d8a1003261317j24856b5cied7c5bf4b83ebf50@mail.gmail.com>

Hi,

Setting up and testing my new system (after wasting nearly 1 month with bad RAM modules), I got this error today:

[48055.741389] ata3.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[48055.741393] ata3.00: failed command: READ FPDMA QUEUED
[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 tag 1 ncq 147456 in
[48055.741400]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[48055.741402] ata3.00: status: { DRDY }
[48055.741405] ata3: hard resetting link
[48056.198746] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[48056.210514] ata3.00: configured for UDMA/133
[48056.210518] ata3.00: device reported invalid CHS sector 0
[48056.210523] ata3: EH complete

I really don't understand what it means, but the "timeout", "hard resetting link" and "invalid CHS sector 0" look scary to me...
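For anyone trying to read the `cmd` line in the log above, here is a minimal sketch of how its fields can be decoded. It assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, that an NCQ read carries its sector count in the two feature bytes and its tag in the upper bits of nsect, and that sectors are 512 bytes. The function name `decode_ncq_cmd` is made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def decode_ncq_cmd(line):
    """Decode the 'cmd' taskfile of an NCQ command from a libata log line.

    Assumed field order (not confirmed by the original post):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:
    hob_lbal:hob_lbam:hob_lbah/device
    """
    m = re.search(r"cmd ([0-9a-f/:]+)", line)
    if m is None:
        raise ValueError("no 'cmd' taskfile found in line")
    parts = m.group(1).split("/")
    if len(parts) != 4:
        raise ValueError("unexpected taskfile layout")
    command = int(parts[0], 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in parts[1].split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in parts[2].split(":")
    )
    # For NCQ commands the sector count lives in the feature bytes.
    sectors = (hob_feature << 8) | feature
    # The NCQ tag sits in bits 7:3 of the sector-count register.
    tag = nsect >> 3
    # 48-bit LBA, most significant byte first.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": command,
        "tag": tag,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
        "lba": lba,
    }


if __name__ == "__main__":
    line = ("[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 "
            "tag 1 ncq 147456 in")
    print(decode_ncq_cmd(line))
```

Under those assumptions the decoded values agree with what the kernel printed on the same line: tag 1 and a 147456-byte transfer (288 sectors). That suggests the command itself was an ordinary queued read that simply never completed, which matches the "(timeout)" in the `res` line.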
Initial bootup messages for this device were:

Mar 25 22:02:32 [kernel] [ 4.496102] ata3: SATA max UDMA/133 abar m2048@0xfbffc000 port 0xfbffc200 irq 34
Mar 25 22:02:32 [kernel] [ 8.519169] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 25 22:02:32 [kernel] [ 8.536681] ata3.00: ATA-8: SAMSUNG HD203WI, 1AN10002, max UDMA/133
Mar 25 22:02:32 [kernel] [ 8.548388] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 25 22:02:32 [kernel] [ 8.566100] ata3.00: configured for UDMA/133

That disk is part of an md RAID5, but I was at work when this error happened, so I didn't see whether the RAID repaired itself or whatever would happen in this case (I don't have mdadm monitoring configured yet). Right now all RAID disks are up and healthy.

I googled it, but most of the results are pastebin snippets. I'm using kernel 2.6.33 and the ahci driver for the SATA controllers.

From the libata documentation, the section about timeout errors says:

"Most often this is due to an unrelated interrupt subsystem bug (try booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to deliver an interrupt when we were expecting one from the hardware."

I really don't know the potential implications of disabling MSI or APIC, but in /proc/interrupts I do see AHCI entries on both MSI and APIC rows, so at least I know they are active right now.

Temperatures in my system are good; hddtemp reports 21 degrees C for the drive in question right now.

Another possibility is that I need to increase a voltage on the motherboard, since it is running 6 HDDs and 1 DVD-ROM. I'll have to research which voltage is related to this. (X58 motherboard)

Thanks in advance if anyone has any knowledge about this; otherwise I'll go into trial-and-hopefully-no-error mode. :)

Paul
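Until mdadm monitoring is configured, one stopgap for the "did the RAID notice?" question is to check /proc/mdstat by script. The following is only a sketch: it assumes the usual mdstat layout in which each array's status line carries a "[total/active]" device count such as "[3/3]", and both the function name `degraded_arrays` and the sample text are made up for illustration (they are not output from the system described above).

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not real output
# from the machine in this thread).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      1953524992 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid5 sdf1[1] sde1[0]
      1953524992 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active devices than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid5 ..."
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            total, active = int(counts.group(1)), int(counts.group(2))
            if active < total:
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real system the text would come from reading /proc/mdstat instead of `SAMPLE`, for example from a cron job that sends mail when the returned list is not empty. This is not a replacement for mdadm's own monitor mode.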