From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <201002261046.24994.wonko@wonkology.org>
References: <5bdc1c8b1002251933s6a250b99v607c97e09f41d4fe@mail.gmail.com> <201002261046.24994.wonko@wonkology.org>
Date: Fri, 26 Feb 2010 07:17:12 -0800
Message-ID: <5bdc1c8b1002260717qd783a59k6f78a57ed384c0a9@mail.gmail.com>
Subject: Re: [gentoo-user] recovery from /var corruption?
From: Mark Knecht
To: gentoo-user@lists.gentoo.org
Content-Type: text/plain; charset=UTF-8

On Fri, Feb 26, 2010 at 1:46 AM, Alex Schuster wrote:
> Mark Knecht writes:
>
>> Do I just watch the logs looking for problems? I have no way of
>> knowing right now whether this was a disk problem that's going to come
>> back, a one-time deal due to power, or something else entirely.
>>
>> As these are cheap machines that don't use RAID, what's the right way
>> to go? emerge -e @world and then wait for the next event? Do nothing
>> and wait?
>
> Emerge smartmontools, then:
>
> smartctl -H /dev/sda  # get overview of what the drive thinks about itself
>
> smartctl -t short /dev/sda     # start short self test
> Wait
> smartctl -l selftest /dev/sda  # see results
>
> smartctl -t long /dev/sda      # start long self test
> Wait a lot longer
> smartctl -l selftest /dev/sda  # see results
>
> You can continue working in the meanwhile, there will be no performance
> impact.
> You will see something like this in the log:
>
> === START OF READ SMART DATA SECTION ===
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%      2275         -
> # 2  Extended offline    Completed without error       00%      2270         -
> # 3  Extended offline    Completed without error       00%      1799         -
> # 4  Extended offline    Completed without error       00%       197         -
> # 5  Extended offline    Completed without error       00%        26         -
>
> If you have a '-' in the right column, the disk has found no errors. If
> there is a number, then it's the LBA of the first error.
>
> There's also badblocks, which will check every block and output the bad
> ones: badblocks -sv /dev/sda
>
> badblocks -svn /dev/sda will do a read-write test. In case of a bad block,
> the drive should exchange it with a spare one. Maybe this happens already
> in read-only mode, I am not sure.
>
> Also watch for errors in syslog or via dmesg, there should be some when
> bad blocks are being accessed.
>
>        Wonko
>

Hi Wonko,
   Yes, I do use smartctl on some other machines although I'm not very
good about it and your write-up is helpful so thanks for that.

My wife's machine is older and I don't think SMART is supported on her drive.
Note the lack of a * on the SMART line in hdparm -I:

dragonfly ~ # hdparm -I /dev/hda

/dev/hda:

ATA device, with non-removable media
        Model Number:       WDC WD1600BB-00FTA0
        Serial Number:      WD-WMAES2091586
        Firmware Revision:  15.05R15
Standards:
        Supported: 6 5 4
        Likely used: 6
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  312581808
        Logical/Physical Sector size:           512 bytes
        device size with M = 1024*1024:      152627 MBytes
        device size with M = 1000*1000:      160041 MBytes (160 GB)
        cache/buffer size  = 2048 KBytes (type=DualPortCache)
Capabilities:
        LBA, IORDY(can be disabled)
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
                SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
                Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
Security:
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
        not     supported: enhanced erase
HW reset results:
        CBLID- above Vih
        Device num = 0 determined by CSEL
Checksum: correct
dragonfly ~ #
dragonfly ~ # smartctl -H /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Disabled.
Use option -s with argument 'on' to enable it.
dragonfly ~ # smartctl -s on /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
Error SMART Enable failed: Input/output error
Smartctl: SMART Enable Failed.

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
dragonfly ~ #

I've not tried the -T permissive options.

I've never used badblocks as it seems I should only do that off-line.
This might be a good time to boot with a CD and try it out.

Maybe I should just get a new drive that supports SMART?

- Mark
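
For anyone scripting the self-test check quoted above, here is a minimal sketch of the rule Alex describes: a '-' in the last column of the `smartctl -l selftest` log means no error was found, and anything else is the LBA of the first error. The helper name failed_selftests is hypothetical (it is not part of smartmontools), and the parsing assumes the log rows look like the ones quoted in this thread.

```shell
#!/bin/sh
# Sketch only: failed_selftests is a hypothetical helper, not a smartmontools tool.
# Reads `smartctl -l selftest` output on stdin and prints "<test number> <LBA>"
# for every log row whose last column (LBA_of_first_error) is not "-".
failed_selftests() {
    awk '
        /^#[ ]*[0-9]+/ {
            lba = $NF
            if (lba != "-") {
                num = $0
                sub(/^#[ ]*/, "", num)      # drop the leading "#" and its padding
                sub(/[^0-9].*$/, "", num)   # keep only the test number
                print num, lba
            }
        }
    '
}

# Usage (needs root and a drive on which SMART can be enabled):
#   smartctl -l selftest /dev/sda | failed_selftests
```

No output means every logged self-test completed without a recorded error LBA. It does not replace watching syslog or dmesg, and it is of no help on a drive where SMART cannot be enabled at all.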