From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org)
 by finch.gentoo.org with esmtp (Exim 4.60)
 (envelope-from )
 id 1NsNX3-0007g2-IA
 for garchives@archives.gentoo.org; Thu, 18 Mar 2010 21:45:33 +0000
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
 by pigeon.gentoo.org (Postfix) with SMTP id 58D54E0BBE;
 Thu, 18 Mar 2010 21:45:06 +0000 (UTC)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
 by pigeon.gentoo.org (Postfix) with SMTP id DCE87E0BBE
 for ; Thu, 18 Mar 2010 21:45:05 +0000 (UTC)
Received: (qmail invoked by alias); 18 Mar 2010 21:45:04 -0000
Received: from vol75-13-88-166-24-64.fbx.proxad.net (EHLO [192.168.7.10]) [88.166.24.64]
 by mail.gmx.net (mp014) with SMTP; 18 Mar 2010 22:45:04 +0100
X-Authenticated: #5388774
X-Provags-ID: V01U2FsdGVkX18KiWseP/wKIXhAWO5g5D01+Gmz/ivmjTcObiTU0L
 xGprMKBPGQTHj+
Message-ID: <4BA29EDF.3010001@gmx.net>
Date: Thu, 18 Mar 2010 22:45:03 +0100
From: Carlos Hendson
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.8) Gecko/20100313 Thunderbird/3.0.3
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-user@lists.gentoo.org
Reply-to: gentoo-user@lists.gentoo.org
MIME-Version: 1.0
To: gentoo-user@lists.gentoo.org
Subject: [gentoo-user] [HELP] Intermittent software RAID failures
Content-Type: multipart/mixed; boundary="------------030307070605060005060306"
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.67000000000000004
X-Archives-Salt: 0d24c1cd-de06-4ddd-b539-7c138cd80582
X-Archives-Hash: 93727f3feaab3059204e8e4f525c5897

This is a multi-part message in MIME format.
--------------030307070605060005060306
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hello,

I've got a Dell Inspiron 1720 laptop with dual 2.5" hard drives set up using software RAID1. I've had this computer for about a year and a half, and all's been working well.
However, I've been experiencing intermittent software RAID errors like those in the attached "softraid-fail.txt". Initially I suspected a kernel bug, because the errors started around the time I upgraded the kernel (around the 2.6.30 upgrade), but subsequent kernel upgrades haven't improved the situation.

I've run smartctl --all and badblocks on both disks, but neither reports any faults.

I don't understand what is causing RAID to report these faults and would like some ideas on how to diagnose the problem further.

Thanks in advance,
Carlos

--------------030307070605060005060306
Content-Type: text/plain; name="softraid-fail.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="softraid-fail.txt"

Feb 28 15:14:16 pheonix kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Feb 28 15:14:16 pheonix kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Feb 28 15:14:16 pheonix kernel: ata3: SError: { PHYRdyChg }
Feb 28 15:14:16 pheonix kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Feb 28 15:14:16 pheonix kernel: res 40/00:0c:97:74:25/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Feb 28 15:14:16 pheonix kernel: ata3.00: status: { DRDY }
Feb 28 15:14:16 pheonix kernel: ata3: hard resetting link
Feb 28 15:14:19 pheonix kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 28 15:14:19 pheonix kernel: ata3.00: configured for UDMA/133
Feb 28 15:14:19 pheonix kernel: ata3: EH complete
Feb 28 15:14:19 pheonix kernel: end_request: I/O error, dev sdb, sector 178062452
Feb 28 15:14:19 pheonix kernel: raid1: Disk failure on sdb1, disabling device.
Feb 28 15:14:19 pheonix kernel: raid1: Operation continuing on 1 devices.
Feb 28 15:14:19 pheonix kernel: md: recovery of RAID array md0
Feb 28 15:14:19 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 28 15:14:19 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 28 15:14:19 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Feb 28 15:14:19 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Feb 28 15:14:19 pheonix kernel: md: md0: recovery done.
Feb 28 15:14:19 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:19 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:19 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:19 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:19 pheonix kernel: md: recovery of RAID array md0
Feb 28 15:14:19 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 28 15:14:19 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 28 15:14:19 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Feb 28 15:14:19 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Feb 28 15:14:19 pheonix kernel: md: md0: recovery done.
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:20 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:20 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Mar 12 19:38:06 pheonix kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Mar 12 19:38:06 pheonix kernel: ata1.00: irq_stat 0x00400000, PHY RDY changed
Mar 12 19:38:06 pheonix kernel: ata1: SError: { PHYRdyChg }
Mar 12 19:38:06 pheonix kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 12 19:38:06 pheonix kernel: res 40/00:24:b6:fa:df/00:00:17:00:00/40 Emask 0x10 (ATA bus error)
Mar 12 19:38:06 pheonix kernel: ata1.00: status: { DRDY }
Mar 12 19:38:06 pheonix kernel: ata1: hard resetting link
Mar 12 19:38:09 pheonix kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 12 19:38:09 pheonix kernel: ata1.00: configured for UDMA/133
Mar 12 19:38:09 pheonix kernel: ata1: EH complete
Mar 12 19:38:09 pheonix kernel: end_request: I/O error, dev sda, sector 305244964
Mar 12 19:38:09 pheonix kernel: raid1: Disk failure on sda8, disabling device.
Mar 12 19:38:09 pheonix kernel: raid1: Operation continuing on 1 devices.
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:09 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:09 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:09 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:09 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:09 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:09 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:09 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:09 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:10 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:10 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:10 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:10 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:10 pheonix kernel: md: recovery of RAID array md0
Mar 18 21:57:33 pheonix kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Mar 18 21:57:33 pheonix kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Mar 18 21:57:33 pheonix kernel: ata3: SError: { PHYRdyChg }
Mar 18 21:57:33 pheonix kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 18 21:57:33 pheonix kernel: res 40/00:24:bf:1c:1f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Mar 18 21:57:33 pheonix kernel: ata3.00: status: { DRDY }
Mar 18 21:57:33 pheonix kernel: ata3: hard resetting link
Mar 18 21:57:37 pheonix kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 18 21:57:37 pheonix kernel: ata3.00: configured for UDMA/133
Mar 18 21:57:37 pheonix kernel: ata3: EH complete
Mar 18 21:57:37 pheonix kernel: end_request: I/O error, dev sdb, sector 178116972

--------------030307070605060005060306--