* [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
@ 2010-03-03 12:24 Stroller
  2010-03-03 12:42 ` Willie Wong
  2010-03-03 14:00 ` Mark Knecht
  0 siblings, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 12:24 UTC (permalink / raw)
To: gentoo-user

There seem to have been a few people posting with filesystem corruption in the last week or two. It seems to be my turn, so I hope it isn't contagious. The cause here is quite clear - whilst rummaging in the server cupboard yesterday, power to the machine was accidentally disconnected.

I have booted with a live CD & run `reiserfsck --fix-fixable` on the filesystem, but nevertheless when I attempt to boot the system I get a "failed to open the device... no such file or directory" message, followed by another error as per subject line.

However, you will see from this screenshot (taken with an IP KVM) that the filesystem does indeed seem to have been mounted successfully, if read-only:

http://linux.stroller.uk.eu.org/fs-corruption.png

All I did here was log in with the root password.

When I boot with a live CD I can mount, read & write the filesystem:

root@sysresccd /root % mount -v -L root /mnt/gentoo
mount: you didn't specify a filesystem type for /dev/sda3
       I will try type reiserfs
/dev/sda3 on /mnt/gentoo type reiserfs (rw)
root@sysresccd /root % ls /mnt/gentoo
bin  boot  dev  etc  home  lib  mnt  opt  proc  root  sbin  sys  tmp  usr  var
root@sysresccd /root % touch /mnt/gentoo/foo
root@sysresccd /root % echo foobar >> /mnt/gentoo/foo
root@sysresccd /root % ls -lh !!:$
ls -lh /mnt/gentoo/foo
-rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo
root@sysresccd /root % cat !!:$
cat /mnt/gentoo/foo
foobar
root@sysresccd /root % rm !!:$
rm /mnt/gentoo/foo
rm: remove regular file `/mnt/gentoo/foo'? y
root@sysresccd /root %

All the important system stuff on this PC is on a single partition.
I have two other drives attached at /mnt/space & /mnt/morespace - they are XFS and I have run xfs_repair on both of them, which completes quickly indicating no problems.

I'm not really sure how to proceed next. I feel the problem is indeed on this reiserfs filesystem, the root filesystem with the label "root". I can't help thinking that the problem is not that the system "failed to open the device", but instead maybe that there's an important system file missing that means the init script (or whatever is responsible for mounting the filesystem) is not properly returning 0. Does this seem possible? Maybe the reiserfs handler for mount is somehow broken (performing the mount, but not returning 0, or perhaps broken in such a way that it is able to mount read-only but not read-write).

I am tempted to chroot into the system and re-emerge system & baselayout. If I'm correct in the above guess then re-emerging the correct file will fix the problem. Right?

`reiserfsck --help` shows some other options besides the simple --fix-fixable - I assume the "expert option" of --scan-whole-partition is unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I safely run these? Am I advised to run these?

Stroller.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller @ 2010-03-03 12:42 ` Willie Wong 2010-03-03 13:28 ` Stroller 2010-03-03 14:00 ` Mark Knecht 1 sibling, 1 reply; 19+ messages in thread From: Willie Wong @ 2010-03-03 12:42 UTC (permalink / raw To: gentoo-user On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote: > There seem to have been a few people posting with filesystem > corruption in the last week or two. It seems to be my turn, so I hope > it isn't contagious. The cause here is quite clear - whilst rummaging > in the server cupboard yesterday, power to the machine was > accidentally disconnected. > > I have booted with a live CD & run `reiserfsck --fix-fixable` on the > filesystem, but nevertheless when I attempt to boot the system I get a > "failed to open the device... no such file or directory" message, > followed by another error as per subject line. from the output it looks like you are mounting by label? What if you edit fstab to point to the device name /dev/hd?? instead of LABEL=root? Check the filesystem label to make sure it is ok? Cheers, W -- Willie W. Wong wwong@math.princeton.edu Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire et vice versa ~~~ I. Newton ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-03 12:42 ` Willie Wong
@ 2010-03-03 13:28 ` Stroller
  2010-03-03 14:01 ` Mick
  2010-03-03 15:18 ` Willie Wong
  0 siblings, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 13:28 UTC (permalink / raw)
To: gentoo-user

On 3 Mar 2010, at 12:42, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
>> There seem to have been a few people posting with filesystem
>> corruption in the last week or two. It seems to be my turn, so I hope
>> it isn't contagious. The cause here is quite clear - whilst rummaging
>> in the server cupboard yesterday, power to the machine was
>> accidentally disconnected.
>>
>> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
>> filesystem, but nevertheless when I attempt to boot the system I get a
>> "failed to open the device... no such file or directory" message,
>> followed by another error as per subject line.
>
> from the output it looks like you are mounting by label? What if you
> edit fstab to point to the device name /dev/hd?? instead of
> LABEL=root? Check the filesystem label to make sure it is ok?

Many thanks for this suggestion; however, following it makes no difference, except for the trivial detail that it now says "failed to open the device '/dev/hda3': No such file or directory" (instead of "LABEL=...").

I also tried editing grub to point to /dev/sda3 (although admittedly with the LABEL= entry in /etc/fstab) but that makes no difference. I have never tried (intentionally) reconfiguring this kernel to use /dev/sdX instead of /dev/hdX and I'm pretty sure it's booted using the current kernel & configuration in the past.

Stroller.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 13:28 ` Stroller @ 2010-03-03 14:01 ` Mick 2010-03-03 16:19 ` Stroller 2010-03-03 15:18 ` Willie Wong 1 sibling, 1 reply; 19+ messages in thread From: Mick @ 2010-03-03 14:01 UTC (permalink / raw To: gentoo-user On 3 March 2010 13:28, Stroller <stroller@stellar.eclipse.co.uk> wrote: > > On 3 Mar 2010, at 12:42, Willie Wong wrote: > >> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote: >>> >>> There seem to have been a few people posting with filesystem >>> corruption in the last week or two. It seems to be my turn, so I hope >>> it isn't contagious. The cause here is quite clear - whilst rummaging >>> in the server cupboard yesterday, power to the machine was >>> accidentally disconnected. >>> >>> I have booted with a live CD & run `reiserfsck --fix-fixable` on the >>> filesystem, but nevertheless when I attempt to boot the system I get a >>> "failed to open the device... no such file or directory" message, >>> followed by another error as per subject line. >> >> from the output it looks like you are mounting by label? What if you >> edit fstab to point to the device name /dev/hd?? instead of >> LABEL=root? Check the filesystem label to make sure it is ok? > > Many thanks for this suggestion, however following it makes no difference, > except in the trivia that it says "failed to open the device '/dev/hda3': No > such file or directory" (instead of "LABEL=..."). > > I also tried editing grub to point to /dev/sda3 (although admittedly with > the LABEL= entry in /etc/fstab) but that makes no difference. I have never > tried (intentionally) reconfiguring this kernel to use /dev/sdX instead of > /dev/hdX and I'm pretty sure it's booted using the current kernel & > configuration in the past. In my experience reiserfs is a very stable fs. I had a dodgy memory module once which I put up with for more than 9 months. 
The machine would lock up hard on a daily basis and the only way to get it going again would be to pull the plug. That would happen at random: in the middle of emerge --sync, package updates, updatedb, etc. It survived hundreds of crashes, recovering via fsck at the next boot.

Once or twice things went hairy and I would get a message similar to yours. On these rare occasions I booted with a LiveCD and, with the partitions unmounted, I ran --check, then --fix-fixable and finally --rebuild-tree.

You may want to use an external drive with dd to image the current / partition and do all your recovery work on that. If you don't care too much about the risk of catastrophic failure then just run --rebuild-tree with a LiveCD and see what you get.

Good luck.
--
Regards,
Mick

^ permalink raw reply [flat|nested] 19+ messages in thread
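The sequence Mick describes (image the partition first, then escalate from --check to --fix-fixable to --rebuild-tree) might be sketched roughly as below. This is only a dry run that prints the commands rather than executing them. The helper name recovery_plan, the image path and the /dev/loop0 loop device are hypothetical placeholders, and attaching the image with losetup is an assumption of this sketch, not something stated in the thread.

```shell
#!/bin/sh
# Dry-run helper: prints a recovery plan instead of running it.
# recovery_plan, the image path and /dev/loop0 are illustrative assumptions.
recovery_plan() {
    device=$1   # partition to rescue, e.g. /dev/sda3
    image=$2    # image file on an external drive
    cat <<EOF
dd if=$device of=$image bs=1M conv=noerror,sync
losetup --find --show $image
reiserfsck --check /dev/loop0
reiserfsck --fix-fixable /dev/loop0
reiserfsck --rebuild-tree /dev/loop0
EOF
}

# Print the plan for the root partition mentioned in the thread.
recovery_plan /dev/sda3 /mnt/external/root.img
```

losetup prints the loop device it actually attached, which may not be /dev/loop0, so the later lines would need adjusting to match. Each reiserfsck step should only follow if the previous one reports that it is needed.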
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 14:01 ` Mick @ 2010-03-03 16:19 ` Stroller 0 siblings, 0 replies; 19+ messages in thread From: Stroller @ 2010-03-03 16:19 UTC (permalink / raw To: gentoo-user On 3 Mar 2010, at 14:01, Mick wrote: > ... Once or twice > things went hairy and I would get a message similar to yours. On > these rare occasions I booted with a LiveCD and with the partitions > unmounted I ran --check, then --fix-fixable and finally > --rebuild-tree. You may want to use an external drive with dd to > image the current / partition and do all your recovery work on that. > If you don't care too much about the risk of catastrophic failure then > just run --rebuild-tree with a LiveCD and see what you get. That's a great idea. I'm (now) religious about backing up my customers' computers, often using dd like this, but for some reason it hadn't yet occurred to me today. Stroller. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 13:28 ` Stroller 2010-03-03 14:01 ` Mick @ 2010-03-03 15:18 ` Willie Wong 2010-03-03 16:16 ` Stroller 1 sibling, 1 reply; 19+ messages in thread From: Willie Wong @ 2010-03-03 15:18 UTC (permalink / raw To: gentoo-user On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote: > >from the output it looks like you are mounting by label? What if you > >edit fstab to point to the device name /dev/hd?? instead of > >LABEL=root? Check the filesystem label to make sure it is ok? > > Many thanks for this suggestion, however following it makes no > difference, except in the trivia that it says "failed to open the > device '/dev/hda3': No such file or directory" (instead of "LABEL=..."). If you try to boot, after the failure to check rootfs, it should dump you to a recovery console, what happens if you issue ls /dev ? Also check dmesg? Cheers, W -- Willie W. Wong wwong@math.princeton.edu Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire et vice versa ~~~ I. Newton ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 15:18 ` Willie Wong @ 2010-03-03 16:16 ` Stroller 2010-03-03 17:14 ` Willie Wong 2010-03-03 17:53 ` Neil Bothwick 0 siblings, 2 replies; 19+ messages in thread From: Stroller @ 2010-03-03 16:16 UTC (permalink / raw To: gentoo-user Many thanks for your help, Willie! On 3 Mar 2010, at 15:18, Willie Wong wrote: > On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote: >>> from the output it looks like you are mounting by label? What if you >>> edit fstab to point to the device name /dev/hd?? instead of >>> LABEL=root? Check the filesystem label to make sure it is ok? >> >> Many thanks for this suggestion, however following it makes no >> difference, except in the trivia that it says "failed to open the >> device '/dev/hda3': No such file or directory" (instead of >> "LABEL=..."). > > If you try to boot, after the failure to check rootfs, it should dump > you to a recovery console, what happens if you issue ls /dev ? About 13 items. Is this unlucky? http://linux.stroller.uk.eu.org/fs-corruption-dev.png > Also check dmesg? I don't think this gives any clues: http://linux.stroller.uk.eu.org/fs-corruption-dmesg.png Stroller. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 16:16 ` Stroller @ 2010-03-03 17:14 ` Willie Wong 2010-03-04 9:55 ` Stroller 2010-03-03 17:53 ` Neil Bothwick 1 sibling, 1 reply; 19+ messages in thread From: Willie Wong @ 2010-03-03 17:14 UTC (permalink / raw To: gentoo-user On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote: > > Many thanks for your help, Willie! > > About 13 items. Is this unlucky? > > http://linux.stroller.uk.eu.org/fs-corruption-dev.png > Okay, something is screwed up with udev. Is udev started? Is it upgraded recently? Any config files in /etc that needs updating? Is udev directory in /etc okay? At this point I don't think your problem is necessarily with the harddrive itself: I think we now know why fsck cannot open file or device. Check /var/log/emerge.log or the portage elogs. Did you upgrade baselayout recently? Good luck, W -- Willie W. Wong wwong@math.princeton.edu Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire et vice versa ~~~ I. Newton ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-03 17:14 ` Willie Wong
@ 2010-03-04 9:55 ` Stroller
  2010-03-04 10:23 ` Willie Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Stroller @ 2010-03-04 9:55 UTC (permalink / raw)
To: gentoo-user

On 3 Mar 2010, at 17:14, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote:
>>
>> Many thanks for your help, Willie!
>>
>> About 13 items. Is this unlucky?
>>
>> http://linux.stroller.uk.eu.org/fs-corruption-dev.png
>>
>
> Okay, something is screwed up with udev. Is udev started?

Ah!

Ok... this is shown as an error just before the "failed to open the device" message.

http://linux.stroller.uk.eu.org/fs-corruption.png

Starting udevd... error getting signalfd

> Is it upgraded recently?

chroot # genlop udev | tail -n 4
     Tue May 26 17:26:46 2009 >>> sys-fs/udev-124-r2
     Thu Jun 25 06:05:37 2009 >>> sys-fs/udev-141
     Mon Dec 21 04:57:20 2009 >>> sys-fs/udev-146-r1
chroot #

The system last booted ok in October and, until yesterday, has been up since then. So this December update may well be to blame.

> Any config files in /etc that needs updating?

Nope.

chroot # etc-update
Scanning Configuration files...
Exiting: Nothing left to do; exiting. :)
chroot #

> Is udev directory in /etc okay?

All looks ok to me:

chroot # ls -l /etc/udev/*
-rw-r--r-- 1 root root  277 2009-12-21 04:56 /etc/udev/udev.conf

/etc/udev/rules.d:
total 24
-rw-r--r-- 1 root root   55 2009-10-11 09:38 30-svgalib.rules
-rw-r--r-- 1 root root 4311 2009-09-11 23:02 70-nut-usbups.rules
-rw-r--r-- 1 root root 1606 2008-12-22 12:33 70-persistent-cd.rules
-rw-r--r-- 1 root root  440 2009-12-21 04:57 70-persistent-net.rules
-rw-r--r-- 1 root root   28 2009-07-22 03:04 99-fuse.rules
chroot #

> At this point I don't think your problem is necessarily with the
> harddrive itself: I think we now know why fsck cannot open file or
> device.
>
> Check /var/log/emerge.log or the portage elogs.
> Did you upgrade baselayout recently?

Yes, upgraded to baselayout-1.12.13 at the same time udev was upgraded:

chroot # genlop baselayout | tail -n 3
     Thu Feb 12 09:36:25 2009 >>> sys-apps/baselayout-1.12.11.1
     Mon Dec 21 01:25:32 2009 >>> sys-apps/baselayout-1.12.13
chroot #

What I'm inclined to do now, having taken an image of the drive (I used ddrescue, which showed no errors), is try `reiserfsck --rebuild-tree` on the partition, "just to be sure". I get the impression this is probably quite safe, but I have a backup anyway.

I, too, suspect that won't fix the problem, so after that I guess I'll try remerging udev & baselayout. I suspect remerging --empty may be necessary, but any other suggestions welcome.

Stroller.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-04 9:55 ` Stroller @ 2010-03-04 10:23 ` Willie Wong 2010-03-04 11:42 ` Stroller 2010-03-09 10:57 ` Stroller 0 siblings, 2 replies; 19+ messages in thread From: Willie Wong @ 2010-03-04 10:23 UTC (permalink / raw To: gentoo-user On Thu, Mar 04, 2010 at 09:55:44AM +0000, Stroller wrote: > > On 3 Mar 2010, at 17:14, Willie Wong wrote: > > >On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote: > >> > >>Many thanks for your help, Willie! > >> > >>About 13 items. Is this unlucky? > >> > >>http://linux.stroller.uk.eu.org/fs-corruption-dev.png > >> > > > >Okay, something is screwed up with udev. Is udev started? > > Ah! > > Ok... this is shown as an error when before the "failed to open the > device" message. > > http://linux.stroller.uk.eu.org/fs-corruption.png > > Startin udevd... error getting signalfd What kernel are you running? You need at least 2.6.22 to have signalfd support, and at least 2.6.27 to have reliable signalfd support. If this is not satisfied, please either downgrade udev or upgrade your kernel. Cheers, W -- Willie W. Wong wwong@math.princeton.edu Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire et vice versa ~~~ I. Newton ^ permalink raw reply [flat|nested] 19+ messages in thread
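The kernel requirement Willie mentions can be checked with a short script. The following is a minimal sketch, assuming GNU `sort -V` is available for version ordering; the helper name version_ge is made up for illustration, and the 2.6.27 threshold is simply the figure quoted above for reliable signalfd support.

```shell
#!/bin/sh
# version_ge A B: succeeds if dotted version A is greater than or equal to B.
# Hypothetical helper; relies on GNU coreutils `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Drop any local suffix such as "-gentoo-r1" from the running kernel release.
running=$(uname -r | sed 's/-.*//')
required=2.6.27   # threshold quoted in the thread for reliable signalfd

if version_ge "$running" "$required"; then
    echo "kernel $running: new enough for signalfd"
else
    echo "kernel $running: older than $required; upgrade the kernel or downgrade udev"
fi
```

With the versions from this thread, `version_ge 2.6.25 2.6.27` fails and `version_ge 2.6.31 2.6.27` succeeds.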
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-04 10:23 ` Willie Wong @ 2010-03-04 11:42 ` Stroller 2010-03-09 10:57 ` Stroller 1 sibling, 0 replies; 19+ messages in thread From: Stroller @ 2010-03-04 11:42 UTC (permalink / raw To: gentoo-user On 4 Mar 2010, at 10:23, Willie Wong wrote: > On Thu, Mar 04, 2010 at 09:55:44AM +0000, Stroller wrote: >> >> On 3 Mar 2010, at 17:14, Willie Wong wrote: >> >>> On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote: >>>> >>>> Many thanks for your help, Willie! >>>> >>>> About 13 items. Is this unlucky? >>>> >>>> http://linux.stroller.uk.eu.org/fs-corruption-dev.png >>>> >>> >>> Okay, something is screwed up with udev. Is udev started? >> >> Ah! >> >> Ok... this is shown as an error when before the "failed to open the >> device" message. >> >> http://linux.stroller.uk.eu.org/fs-corruption.png >> >> Startin udevd... error getting signalfd > > What kernel are you running? > > You need at least 2.6.22 to have signalfd support, and at least 2.6.27 > to have reliable signalfd support. If this is not satisfied, please > either downgrade udev or upgrade your kernel. I'm surprised to find this system is still running 2.6.25. I'll upgrade now. Many thanks for your patient help, Stroller. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-04 10:23 ` Willie Wong
  2010-03-04 11:42 ` Stroller
@ 2010-03-09 10:57 ` Stroller
  1 sibling, 0 replies; 19+ messages in thread
From: Stroller @ 2010-03-09 10:57 UTC (permalink / raw)
To: gentoo-user

On 4 Mar 2010, at 10:23, Willie Wong wrote:
>>> ...
>>> Okay, something is screwed up with udev. Is udev started?
>>
>> Ah!
>>
>> Ok... this is shown as an error just before the "failed to open the
>> device" message.
>>
>> http://linux.stroller.uk.eu.org/fs-corruption.png
>>
>> Starting udevd... error getting signalfd
>
> What kernel are you running?
>
> You need at least 2.6.22 to have signalfd support, and at least 2.6.27
> to have reliable signalfd support. If this is not satisfied, please
> either downgrade udev or upgrade your kernel.

Many thanks.

Upgrading to 2.6.31 resolved the problem, although it did take a couple of extra kernel compiles & reboots to allow for the /dev/hdX -> /dev/sdX changes & to compile drivers for the EIDE controller.

Many thanks to everyone who posted.

Stroller.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 16:16 ` Stroller 2010-03-03 17:14 ` Willie Wong @ 2010-03-03 17:53 ` Neil Bothwick 1 sibling, 0 replies; 19+ messages in thread From: Neil Bothwick @ 2010-03-03 17:53 UTC (permalink / raw To: gentoo-user [-- Attachment #1: Type: text/plain, Size: 548 bytes --] On Wed, 3 Mar 2010 16:16:50 +0000, Stroller wrote: > > If you try to boot, after the failure to check rootfs, it should dump > > you to a recovery console, what happens if you issue ls /dev ? > > About 13 items. Is this unlucky? > > http://linux.stroller.uk.eu.org/fs-corruption-dev.png Is that the same as you see in the dev directory of your root filesystem when you boot from a live CD? It looks like udev may not be running for some reason. -- Neil Bothwick Everyone has a photographic memory. Some don't have film. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller 2010-03-03 12:42 ` Willie Wong @ 2010-03-03 14:00 ` Mark Knecht 2010-03-03 14:26 ` Stroller 2010-03-03 15:29 ` [gentoo-user] " Harry Putnam 1 sibling, 2 replies; 19+ messages in thread From: Mark Knecht @ 2010-03-03 14:00 UTC (permalink / raw To: gentoo-user On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote: > There seem to have been a few people posting with filesystem corruption in > the last week or two. It seems to be my turn, so I hope it isn't contagious. > The cause here is quite clear - whilst rummaging in the server cupboard > yesterday, power to the machine was accidentally disconnected. > > I have booted with a live CD & run `reiserfsck --fix-fixable` on the > filesystem, but nevertheless when I attempt to boot the system I get a > "failed to open the device... no such file or directory" message, followed > by another error as per subject line. > > However, you will see from this screenshot (taken with an IP KVM) that the > filesystem does indeed seem to have been mounted successfully, if read-only: > > http://linux.stroller.uk.eu.org/fs-corruption.png > > All I did here was log in with the root password. 
> > > When I boot with a live CD I can mount, read & write the filesystem: > > root@sysresccd /root % mount -v -L root /mnt/gentoo > mount: you didn't specify a filesystem type for /dev/sda3 > I will try type reiserfs > /dev/sda3 on /mnt/gentoo type reiserfs (rw) > root@sysresccd /root % ls /mnt/gentoo > bin boot dev etc home lib mnt opt proc root sbin sys tmp usr > var > root@sysresccd /root % touch /mnt/gentoo/foo > root@sysresccd /root % echo foobar >> /mnt/gentoo/foo > root@sysresccd /root % ls -lh !!:$ > ls -lh /mnt/gentoo/foo > -rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo > root@sysresccd /root % cat !!:$ > cat /mnt/gentoo/foo > foobar > root@sysresccd /root % rm !!:$ > rm /mnt/gentoo/foo > rm: remove regular file `/mnt/gentoo/foo'? y > root@sysresccd /root % > > All the important system stuff on this PC is on a single partition. I have > two other drives attached at /mnt/space & /mnt/morespace - they are XFS and > I have run xfs_repair on both of them, which completes quickly indicating no > problems. > > I'm not really sure how to proceed next. I feel the problem is indeed on > this reiserfs filesystem, the root filesystem with the label "root". I can't > help thinking that the problem is not that the system "failed to open the > device", but instead maybe that there's an important system file missing > that means the init script (or whatever responsible for mounting the > fiesystem) is not properly returning 0. Does this seem possible? Maybe the > reiserfs handler for mount is somehow broken (performing the mount, but not > returning 0, or perhaps broken in such as was it is able to mount read-only > but not read-write). > > I am tempted to chroot into the system and re-emerge system & baselayout. If > I'm correct in this above guess then re-emerging the correct file will fix > the problem. Right? 
> > `reiserfsck --help` shows some other options besides the simple > --fix-fixable - I assume the "expert option" of --scan-whole-partition is > unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I safely run > these? Am I advised to run these? > > Stroller. Hi Stroller, Sorry for your problems. I've had a rash of machine problems over the last 6 weeks. No fun. I feel for you. In my most recent case what looked like a simple disk corruption problem was really a prelude to the drive just plain going bad. Have you tried smartctl to see what it says about the drive at this point? It would be even more frustrating to chroot in, do all the work, think you had it fixed and then the underlying foundation of your house crumbles beneath you 3 weeks from now. Good luck, Mark ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 14:00 ` Mark Knecht @ 2010-03-03 14:26 ` Stroller 2010-03-03 15:26 ` Mark Knecht 2010-03-03 15:28 ` Willie Wong 2010-03-03 15:29 ` [gentoo-user] " Harry Putnam 1 sibling, 2 replies; 19+ messages in thread From: Stroller @ 2010-03-03 14:26 UTC (permalink / raw To: gentoo-user On 3 Mar 2010, at 14:00, Mark Knecht wrote: > On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk > > wrote: >> There seem to have been a few people posting with filesystem >> corruption in >> the last week or two. It seems to be my turn, so I hope it isn't >> contagious. >> The cause here is quite clear - whilst rummaging in the server >> cupboard >> yesterday, power to the machine was accidentally disconnected. > ... > Sorry for your problems. I've had a rash of machine problems over > the last 6 weeks. No fun. I feel for you. > > In my most recent case what looked like a simple disk corruption > problem was really a prelude to the drive just plain going bad. Have > you tried smartctl to see what it says about the drive at this point? > > It would be even more frustrating to chroot in, do all the work, > think you had it fixed and then the underlying foundation of your > house crumbles beneath you 3 weeks from now. I don't think this is a problem. 
I would love to know what others think of the `smartctl` output:

root@sysresccd /root % smartctl -H /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME     FLAG    VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED  RAW_VALUE
  9 Power_On_Seconds   0x0012  001   001   020    Old_age  Always   FAILING_NOW  44803h+12m+16s

root@sysresccd /root % smartctl -i /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MPA..MPG series
Device Model:     FUJITSU MPF3204AT
Serial Number:    05030567
Firmware Version: 0028
User Capacity:    20,496,236,544 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Wed Mar 3 14:14:31 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sysresccd /root %

This looks to me like smartctl is going "OMG! What an ancient drive!" - it's a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365), it's seen 5 years of active use - and that's the "marginal attribute" referred to.

Like I said, the power plug was accidentally pulled on this drive, so I'm inclined to attribute the corruption only to that, not to the drive actually failing.

The drive is in a computer that has rarely been turned off in the last couple of years, and is also in a warm environment, conditions which are ideal. I appreciate the latter seems unintuitive, but in fact studies have shown that drives in somewhat warm environments last longer than those that are cooled.

That it passes the "SMART overall-health self-assessment test" suggests to me that it is chugging away quite happily.
I would have dismissed your concerns were it not for the capitalised "FAILING_NOW" in the output. Like I say, I think this is just smartctl declaring "OMG! this drive is old!", but I open this matter to the list for discussion (should you wish).

I think I'm actually nearly ready to migrate off this system. The power was actually pulled as I installed 3 new (to me) rackmount machines in the server cupboard - the plan is to have identical machines running RAID, so that in the case of ANY problems I have spares available. I take nightly backups of the important data on this machine; however, I'd prefer it to run just a couple or a few weeks longer to allow me to migrate at my own leisure.

Stroller.

^ permalink raw reply [flat|nested] 19+ messages in thread
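The pocket-calculator step (44803/24/365) is easy to reproduce. This is a small sketch using awk for the floating-point division; the function name hours_to_years is hypothetical, and a 365-day year is assumed, as in the message above.

```shell
#!/bin/sh
# hours_to_years HOURS: convert power-on hours to years (365-day year),
# printed with one decimal place. Hypothetical helper name.
hours_to_years() {
    LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 24 / 365 }'
}

# Raw value reported by smartctl for this drive: 44803h+12m+16s
hours_to_years 44803
```

For 44803 hours this prints 5.1, which matches the "5 years of active use" estimate.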
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" 2010-03-03 14:26 ` Stroller @ 2010-03-03 15:26 ` Mark Knecht 2010-03-03 15:28 ` Willie Wong 1 sibling, 0 replies; 19+ messages in thread From: Mark Knecht @ 2010-03-03 15:26 UTC (permalink / raw To: gentoo-user On Wed, Mar 3, 2010 at 6:26 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote: > > On 3 Mar 2010, at 14:00, Mark Knecht wrote: >> >> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> >> wrote: >>> >>> There seem to have been a few people posting with filesystem corruption >>> in >>> the last week or two. It seems to be my turn, so I hope it isn't >>> contagious. >>> The cause here is quite clear - whilst rummaging in the server cupboard >>> yesterday, power to the machine was accidentally disconnected. >> >> ... >> Sorry for your problems. I've had a rash of machine problems over >> the last 6 weeks. No fun. I feel for you. >> >> In my most recent case what looked like a simple disk corruption >> problem was really a prelude to the drive just plain going bad. Have >> you tried smartctl to see what it says about the drive at this point? >> >> It would be even more frustrating to chroot in, do all the work, >> think you had it fixed and then the underlying foundation of your >> house crumbles beneath you 3 weeks from now. > > I don't think this is a problem. 
I would love to know what others think of > the `smartctl` output: > > > root@sysresccd /root % smartctl -H /dev/sda > smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen > Home page is http://smartmontools.sourceforge.net/ > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > Please note the following marginal Attributes: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED > WHEN_FAILED RAW_VALUE > 9 Power_On_Seconds 0x0012 001 001 020 Old_age Always > FAILING_NOW 44803h+12m+16s > > root@sysresccd /root % smartctl -i /dev/sda > smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen > Home page is http://smartmontools.sourceforge.net/ > > === START OF INFORMATION SECTION === > Model Family: Fujitsu MPA..MPG series > Device Model: FUJITSU MPF3204AT > Serial Number: 05030567 > Firmware Version: 0028 > User Capacity: 20,496,236,544 bytes > Device is: In smartctl database [for details use: -P show] > ATA Version is: 5 > ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1 > Local Time is: Wed Mar 3 14:14:31 2010 UTC > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > root@sysresccd /root % > > > This looks to me like smartctl is going "OMG! What an ancient drive!" - it's > a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365), > it's seen 5 years of active use - and that's the "marginal attribute" > referred to. > > Like I said, the power plug was accidentally pulled on this drive, so I'm > inclined to attribute the corruption only to that, not to the drive actually > failing. > > The drive is in a computer that has rarely been turned off in the last > couple of years, and is also in a warm environment, conditions which are > ideal. I appreciate the latter seems unintuitive, but in fact studies have > showed that drives in somewhat warm environments last longer than those that > are cooled. 
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I have take
> nightly backups of the important data on this machine, however I'd prefer it
> to run just a couple or a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.

I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old.

I have no experience with SMART; I'm just learning about it. However, the
data is generated by the microcontroller in the hard drive, as per the
view of the drive manufacturer, so if the drive is telling you it's
failing then...

My 4.5-year-old drive actually stopped producing SMART output somewhere
along the way before it failed. With the 6-year-old drive I wasn't using
SMART at the time, so I had no data from it, but it was in an environment
where the UPS went through a lot of abuse.

It sounds like you have good backups, so just make sure they are good and
do what you want.

- Mark

^ permalink raw reply [flat|nested] 19+ messages in thread
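As a quick check of the estimate quoted above (44803/24/365), here is a minimal Python sketch that turns the RAW_VALUE string from the smartctl output into years of powered-on time. The function name `power_on_years` is made up for illustration and is not part of smartmontools, and the parsing assumes the raw value always has the `NNNh+NNm+NNs` shape seen in this thread; other drives may report the attribute differently.

```python
import re


def power_on_years(raw_value: str) -> float:
    """Convert a raw value such as '44803h+12m+16s' into years of powered-on time.

    Hypothetical helper: assumes the 'h+m+s' format shown in the quoted output.
    """
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw_value)
    if match is None:
        raise ValueError(f"unrecognised raw value: {raw_value!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    total_hours = hours + minutes / 60 + seconds / 3600
    # Same arithmetic as the pocket-calculator estimate: hours / 24 / 365.
    return total_hours / 24 / 365


print(round(power_on_years("44803h+12m+16s"), 2))  # 5.11
```

That agrees with the "5 years of active use" figure, assuming the drive really counts every powered-on hour.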
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-03 14:26 ` Stroller
  2010-03-03 15:26 ` Mark Knecht
@ 2010-03-03 15:28 ` Willie Wong
  1 sibling, 0 replies; 19+ messages in thread
From: Willie Wong @ 2010-03-03 15:28 UTC (permalink / raw)
To: gentoo-user

On Wed, Mar 03, 2010 at 02:26:46PM +0000, Stroller wrote:
> I don't think this is a problem. I would love to know what others
> think of the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME    FLAG   VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds  0x0012 001   001   020    Old_age Always  FAILING_NOW 44803h+12m+16s

You can always run the SMART long self-test to double check. The
FAILING_NOW just indicates that the normalised value has fallen below the
threshold. For Power_On_Seconds, this usually just indicates that you are
way past the warranty.

If you really care about your data, swap it out now or make frequent
backups. Otherwise I don't see the harm in keeping it until it actually
dies.

Cheers,

W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton

^ permalink raw reply [flat|nested] 19+ messages in thread
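The threshold comparison described in the reply above can be sketched in a few lines of Python. This is an illustration only, not smartmontools code: the `SmartAttribute` class and both function names are hypothetical, and the "at or below THRESH" comparison is an assumption about how the WHEN_FAILED column is derived from the VALUE, WORST and THRESH columns.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of the attribute table (hypothetical container, not a smartmontools type)."""
    name: str
    value: int      # normalised VALUE column
    worst: int      # normalised WORST column
    thresh: int     # THRESH column
    attr_type: str  # TYPE column: "Pre-fail" or "Old_age"


def when_failed(attr: SmartAttribute) -> str:
    """Assumed rule: flag the attribute once a normalised value is at or below THRESH."""
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "In_the_past"
    return "-"


def meaning(attr: SmartAttribute) -> str:
    """Give a rough reading of the flag based on the attribute TYPE."""
    status = when_failed(attr)
    if status == "-":
        return "within threshold"
    if attr.attr_type == "Pre-fail":
        return status + ": the drive is predicting failure"
    return status + ": past design age, not a failure prediction"


# The row from the quoted smartctl output: VALUE 001, WORST 001, THRESH 020.
row = SmartAttribute("Power_On_Seconds", value=1, worst=1, thresh=20, attr_type="Old_age")
print(when_failed(row))
print(meaning(row))
```

Under these assumptions the Power_On_Seconds row is flagged simply because 1 is below 20, and its Old_age type is why the overall health result can still read PASSED.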
* [gentoo-user] Re: Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-03 14:00 ` Mark Knecht
  2010-03-03 14:26 ` Stroller
@ 2010-03-03 15:29 ` Harry Putnam
  2010-03-03 15:31 ` Willie Wong
  1 sibling, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2010-03-03 15:29 UTC (permalink / raw)
To: gentoo-user

Mark Knecht <markknecht@gmail.com> writes:

> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?

Sorry to butt in here... is that tool, smartctl, in some pkg in portage?

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Re: Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
  2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
@ 2010-03-03 15:31 ` Willie Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Willie Wong @ 2010-03-03 15:31 UTC (permalink / raw)
To: gentoo-user

On Wed, Mar 03, 2010 at 09:29:43AM -0600, Harry Putnam wrote:
> Mark Knecht <markknecht@gmail.com> writes:
>
> > In my most recent case what looked like a simple disk corruption
> > problem was really a prelude to the drive just plain going bad. Have
> > you tried smartctl to see what it says about the drive at this point?
>
> Sorry to butt in here... is that tool, smartctl in some pkg on portage?
>

sys-apps/smartmontools

W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton

^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2010-03-09 10:58 UTC | newest]

Thread overview: 19+ messages
2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller
2010-03-03 12:42 ` Willie Wong
2010-03-03 13:28 ` Stroller
2010-03-03 14:01 ` Mick
2010-03-03 16:19 ` Stroller
2010-03-03 15:18 ` Willie Wong
2010-03-03 16:16 ` Stroller
2010-03-03 17:14 ` Willie Wong
2010-03-04 9:55 ` Stroller
2010-03-04 10:23 ` Willie Wong
2010-03-04 11:42 ` Stroller
2010-03-09 10:57 ` Stroller
2010-03-03 17:53 ` Neil Bothwick
2010-03-03 14:00 ` Mark Knecht
2010-03-03 14:26 ` Stroller
2010-03-03 15:26 ` Mark Knecht
2010-03-03 15:28 ` Willie Wong
2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
2010-03-03 15:31 ` Willie Wong