From: Alex Schuster
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] recovery from /var corruption?
Date: Fri, 26 Feb 2010 10:46:23 +0100
Message-Id: <201002261046.24994.wonko@wonkology.org>

Mark Knecht writes:

> Do I just watch the logs looking for problems?
> I have no way of knowing right now whether this was a disk problem
> that's going to come back, a 1 time deal due to power, or something
> else entirely.
>
> As these cheap machines that don't use RAID what's the right way to
> go? emerge -e @world and then wait for the next event? Do nothing and
> wait?

Emerge smartmontools, then:

smartctl -H /dev/sda            # overall health, i.e. what the drive thinks about itself
smartctl -t short /dev/sda      # start short self-test

Wait until it finishes (smartctl prints the estimated duration when it
starts the test).

smartctl -l selftest /dev/sda   # see results
smartctl -t long /dev/sda       # start long self-test

Wait a lot longer.

smartctl -l selftest /dev/sda   # see results

You can continue working in the meantime; the tests run in the
background and should have little performance impact. You will see
something like this in the log:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2275         -
# 2  Extended offline    Completed without error       00%      2270         -
# 3  Extended offline    Completed without error       00%      1799         -
# 4  Extended offline    Completed without error       00%       197         -
# 5  Extended offline    Completed without error       00%        26         -

If you have a '-' in the rightmost column, the drive has found no
errors. If there is a number, then it's the position of the first error.

There's also badblocks, which checks every block and outputs the bad
ones:

badblocks -sv /dev/sda

badblocks -svn /dev/sda does a non-destructive read-write test instead
(the device must not be mounted for that). In case of a bad block, the
drive should exchange it with a spare one. Maybe this already happens in
read-only mode, I am not sure.

Also watch for errors in syslog or via dmesg; there should be some when
bad blocks are being accessed.

	Wonko
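
P.S. If you do not want to eyeball the self-test log each time, the check of the last column can be scripted. This is only a rough sketch: selftest_errors is a made-up helper name, not something shipped with smartmontools, and it assumes the log entries look like the sample above (lines starting with "#" and a number, with LBA_of_first_error as the last field). It looks at that column only, so a test that failed without reporting an LBA would not be caught.

```shell
#!/bin/sh
# Hypothetical helper, not part of smartmontools.
# Reads the output of "smartctl -l selftest" on stdin and prints every
# self-test entry whose last column (LBA_of_first_error) is not "-".
# Exit status: 0 if no entry reports an error LBA, 1 otherwise.
selftest_errors() {
    awk '
        # Log entries start with "#", optional spaces, and the entry number.
        /^# *[0-9]+ / {
            if ($NF != "-") { print; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

It would be used along the lines of "smartctl -l selftest /dev/sda | selftest_errors || echo 'self-test reported a bad LBA'", for example from a cron job.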