* [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
@ 2010-03-03 12:24 Stroller
2010-03-03 12:42 ` Willie Wong
2010-03-03 14:00 ` Mark Knecht
0 siblings, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 12:24 UTC (permalink / raw
To: gentoo-user
There seem to have been a few people posting with filesystem
corruption in the last week or two. It seems to be my turn, so I hope
it isn't contagious. The cause here is quite clear - whilst rummaging
in the server cupboard yesterday, power to the machine was
accidentally disconnected.
I have booted with a live CD & run `reiserfsck --fix-fixable` on the
filesystem, but nevertheless when I attempt to boot the system I get a
"failed to open the device... no such file or directory" message,
followed by another error as per subject line.
However, you will see from this screenshot (taken with an IP KVM) that
the filesystem does indeed seem to have been mounted successfully, if
read-only:
http://linux.stroller.uk.eu.org/fs-corruption.png
All I did here was log in with the root password.
When I boot with a live CD I can mount, read & write the filesystem:
root@sysresccd /root % mount -v -L root /mnt/gentoo
mount: you didn't specify a filesystem type for /dev/sda3
I will try type reiserfs
/dev/sda3 on /mnt/gentoo type reiserfs (rw)
root@sysresccd /root % ls /mnt/gentoo
bin boot dev etc home lib mnt opt proc root sbin sys tmp usr var
root@sysresccd /root % touch /mnt/gentoo/foo
root@sysresccd /root % echo foobar >> /mnt/gentoo/foo
root@sysresccd /root % ls -lh !!:$
ls -lh /mnt/gentoo/foo
-rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo
root@sysresccd /root % cat !!:$
cat /mnt/gentoo/foo
foobar
root@sysresccd /root % rm !!:$
rm /mnt/gentoo/foo
rm: remove regular file `/mnt/gentoo/foo'? y
root@sysresccd /root %
All the important system stuff on this PC is on a single partition. I
have two other drives attached at /mnt/space & /mnt/morespace - they
are XFS and I have run xfs_repair on both of them, which completes
quickly indicating no problems.
I'm not really sure how to proceed next. I feel the problem is indeed
on this reiserfs filesystem, the root filesystem with the label
"root". I can't help thinking that the problem is not that the system
"failed to open the device", but instead maybe that there's an
important system file missing that means the init script (or whatever
is responsible for mounting the filesystem) is not properly returning 0.
Does this seem possible? Maybe the reiserfs handler for mount is
somehow broken (performing the mount, but not returning 0, or perhaps
broken in such a way that it is able to mount read-only but not read-write).
I am tempted to chroot into the system and re-emerge system &
baselayout. If I'm correct in this above guess then re-emerging the
correct file will fix the problem. Right?
`reiserfsck --help` shows some other options besides the simple
--fix-fixable - I assume the "expert option" of --scan-whole-partition is
unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I
safely run these? Am I advised to run these?
Stroller.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller
@ 2010-03-03 12:42 ` Willie Wong
2010-03-03 13:28 ` Stroller
2010-03-03 14:00 ` Mark Knecht
1 sibling, 1 reply; 19+ messages in thread
From: Willie Wong @ 2010-03-03 12:42 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
> There seem to have been a few people posting with filesystem
> corruption in the last week or two. It seems to be my turn, so I hope
> it isn't contagious. The cause here is quite clear - whilst rummaging
> in the server cupboard yesterday, power to the machine was
> accidentally disconnected.
>
> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
> filesystem, but nevertheless when I attempt to boot the system I get a
> "failed to open the device... no such file or directory" message,
> followed by another error as per subject line.
from the output it looks like you are mounting by label? What if you
edit fstab to point to the device name /dev/hd?? instead of
LABEL=root? Check the filesystem label to make sure it is ok?
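For what it's worth, the change suggested above is a one-field edit. A minimal sketch, assuming the root entry's first field is literally LABEL=root and that /dev/hda3 is the right device node for this kernel (both are assumptions, and the helper name is made up for illustration):

```shell
# Hypothetical helper: read fstab text on stdin and print it with the
# first field of any LABEL=root entry replaced by the given device name.
# Nothing is written back; redirect the output yourself once it looks right.
# Note that awk rebuilds an edited line with single spaces between fields.
use_device_name() {
    awk -v dev="$1" '$1 == "LABEL=root" { $1 = dev } { print }'
}

# Preview the result without touching the real file:
#   use_device_name /dev/hda3 < /mnt/gentoo/etc/fstab
```

Printing to stdout rather than editing in place keeps the original fstab intact until the new version has been looked over.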
Cheers,
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 12:42 ` Willie Wong
@ 2010-03-03 13:28 ` Stroller
2010-03-03 14:01 ` Mick
2010-03-03 15:18 ` Willie Wong
0 siblings, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 13:28 UTC (permalink / raw
To: gentoo-user
On 3 Mar 2010, at 12:42, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
>> There seem to have been a few people posting with filesystem
>> corruption in the last week or two. It seems to be my turn, so I hope
>> it isn't contagious. The cause here is quite clear - whilst rummaging
>> in the server cupboard yesterday, power to the machine was
>> accidentally disconnected.
>>
>> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
>> filesystem, but nevertheless when I attempt to boot the system I get a
>> "failed to open the device... no such file or directory" message,
>> followed by another error as per subject line.
>
> from the output it looks like you are mounting by label? What if you
> edit fstab to point to the device name /dev/hd?? instead of
> LABEL=root? Check the filesystem label to make sure it is ok?
Many thanks for this suggestion, however following it makes no
difference, except for the trivial detail that it now says "failed to open the
device '/dev/hda3': No such file or directory" (instead of "LABEL=...").
I also tried editing grub to point to /dev/sda3 (although admittedly
with the LABEL= entry in /etc/fstab) but that makes no difference. I
have never tried (intentionally) reconfiguring this kernel to use
/dev/sdX instead of /dev/hdX and I'm pretty sure it's booted using the
current kernel & configuration in the past.
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller
2010-03-03 12:42 ` Willie Wong
@ 2010-03-03 14:00 ` Mark Knecht
2010-03-03 14:26 ` Stroller
2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
1 sibling, 2 replies; 19+ messages in thread
From: Mark Knecht @ 2010-03-03 14:00 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> There seem to have been a few people posting with filesystem corruption in
> the last week or two. It seems to be my turn, so I hope it isn't contagious.
> The cause here is quite clear - whilst rummaging in the server cupboard
> yesterday, power to the machine was accidentally disconnected.
>
> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
> filesystem, but nevertheless when I attempt to boot the system I get a
> "failed to open the device... no such file or directory" message, followed
> by another error as per subject line.
>
> However, you will see from this screenshot (taken with an IP KVM) that the
> filesystem does indeed seem to have been mounted successfully, if read-only:
>
> http://linux.stroller.uk.eu.org/fs-corruption.png
>
> All I did here was log in with the root password.
>
>
> When I boot with a live CD I can mount, read & write the filesystem:
>
> root@sysresccd /root % mount -v -L root /mnt/gentoo
> mount: you didn't specify a filesystem type for /dev/sda3
> I will try type reiserfs
> /dev/sda3 on /mnt/gentoo type reiserfs (rw)
> root@sysresccd /root % ls /mnt/gentoo
> bin boot dev etc home lib mnt opt proc root sbin sys tmp usr var
> root@sysresccd /root % touch /mnt/gentoo/foo
> root@sysresccd /root % echo foobar >> /mnt/gentoo/foo
> root@sysresccd /root % ls -lh !!:$
> ls -lh /mnt/gentoo/foo
> -rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo
> root@sysresccd /root % cat !!:$
> cat /mnt/gentoo/foo
> foobar
> root@sysresccd /root % rm !!:$
> rm /mnt/gentoo/foo
> rm: remove regular file `/mnt/gentoo/foo'? y
> root@sysresccd /root %
>
> All the important system stuff on this PC is on a single partition. I have
> two other drives attached at /mnt/space & /mnt/morespace - they are XFS and
> I have run xfs_repair on both of them, which completes quickly indicating no
> problems.
>
> I'm not really sure how to proceed next. I feel the problem is indeed on
> this reiserfs filesystem, the root filesystem with the label "root". I can't
> help thinking that the problem is not that the system "failed to open the
> device", but instead maybe that there's an important system file missing
> that means the init script (or whatever is responsible for mounting the
> filesystem) is not properly returning 0. Does this seem possible? Maybe the
> reiserfs handler for mount is somehow broken (performing the mount, but not
> returning 0, or perhaps broken in such a way that it is able to mount read-only
> but not read-write).
>
> I am tempted to chroot into the system and re-emerge system & baselayout. If
> I'm correct in this above guess then re-emerging the correct file will fix
> the problem. Right?
>
> `reiserfsck --help` shows some other options besides the simple
> --fix-fixable - I assume the "expert option" of --scan-whole-partition is
> unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I safely run
> these? Am I advised to run these?
>
> Stroller.
Hi Stroller,
Sorry for your problems. I've had a rash of machine problems over
the last 6 weeks. No fun. I feel for you.
In my most recent case what looked like a simple disk corruption
problem was really a prelude to the drive just plain going bad. Have
you tried smartctl to see what it says about the drive at this point?
It would be even more frustrating to chroot in, do all the work,
think you had it fixed and then the underlying foundation of your
house crumbles beneath you 3 weeks from now.
Good luck,
Mark
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 13:28 ` Stroller
@ 2010-03-03 14:01 ` Mick
2010-03-03 16:19 ` Stroller
2010-03-03 15:18 ` Willie Wong
1 sibling, 1 reply; 19+ messages in thread
From: Mick @ 2010-03-03 14:01 UTC (permalink / raw
To: gentoo-user
On 3 March 2010 13:28, Stroller <stroller@stellar.eclipse.co.uk> wrote:
>
> On 3 Mar 2010, at 12:42, Willie Wong wrote:
>
>> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
>>>
>>> There seem to have been a few people posting with filesystem
>>> corruption in the last week or two. It seems to be my turn, so I hope
>>> it isn't contagious. The cause here is quite clear - whilst rummaging
>>> in the server cupboard yesterday, power to the machine was
>>> accidentally disconnected.
>>>
>>> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
>>> filesystem, but nevertheless when I attempt to boot the system I get a
>>> "failed to open the device... no such file or directory" message,
>>> followed by another error as per subject line.
>>
>> from the output it looks like you are mounting by label? What if you
>> edit fstab to point to the device name /dev/hd?? instead of
>> LABEL=root? Check the filesystem label to make sure it is ok?
>
> Many thanks for this suggestion, however following it makes no difference,
> except in the trivia that it says "failed to open the device '/dev/hda3': No
> such file or directory" (instead of "LABEL=...").
>
> I also tried editing grub to point to /dev/sda3 (although admittedly with
> the LABEL= entry in /etc/fstab) but that makes no difference. I have never
> tried (intentionally) reconfiguring this kernel to use /dev/sdX instead of
> /dev/hdX and I'm pretty sure it's booted using the current kernel &
> configuration in the past.
In my experience reiserfs is a very stable fs. I had a dodgy memory
module once which I put up with for more than 9 months. The machine
would lock up hard on a daily basis and the only way to get it going
again would be to pull the plug. That would happen at random,
midstream emerge --sync, package updates, updatedb, etc. It survived
hundreds of crashes, with fsck repairing it at the next boot. Once or twice
things went hairy and I would get a message similar to yours. On
these rare occasions I booted with a LiveCD and with the partitions
unmounted I ran --check, then --fix-fixable and finally
--rebuild-tree. You may want to use an external drive with dd to
image the current / partition and do all your recovery work on that.
If you don't care too much about the risk of catastrophic failure then
just run --rebuild-tree with a LiveCD and see what you get.
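The order described above (take an image first, then escalate from a read-only check to a rebuild) could be sketched roughly as follows. This only prints the commands rather than running them, since --rebuild-tree is not something to fire off by accident. /dev/sda3 and the image path are placeholders, and the function name is invented for illustration.

```shell
# Hypothetical dry-run helper: print, in order, the commands for imaging a
# partition and then running the reiserfsck passes described above.
# It executes nothing; review the list and run each step by hand from a
# live CD with the partition unmounted.
recovery_plan() {
    dev=$1    # partition to recover, e.g. /dev/sda3 (placeholder)
    img=$2    # image file on an external drive (placeholder path)
    printf '%s\n' \
        "dd if=$dev of=$img bs=1M conv=noerror,sync" \
        "reiserfsck --check $dev" \
        "reiserfsck --fix-fixable $dev" \
        "reiserfsck --rebuild-tree $dev"
}

recovery_plan /dev/sda3 /mnt/backup/root.img
```

To do the recovery work on the copy rather than the original, swap the device argument of the reiserfsck steps for wherever the copy lives.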
Good luck.
--
Regards,
Mick
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 14:00 ` Mark Knecht
@ 2010-03-03 14:26 ` Stroller
2010-03-03 15:26 ` Mark Knecht
2010-03-03 15:28 ` Willie Wong
2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
1 sibling, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 14:26 UTC (permalink / raw
To: gentoo-user
On 3 Mar 2010, at 14:00, Mark Knecht wrote:
> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
>> There seem to have been a few people posting with filesystem corruption in
>> the last week or two. It seems to be my turn, so I hope it isn't contagious.
>> The cause here is quite clear - whilst rummaging in the server cupboard
>> yesterday, power to the machine was accidentally disconnected.
> ...
> Sorry for your problems. I've had a rash of machine problems over
> the last 6 weeks. No fun. I feel for you.
>
> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?
>
> It would be even more frustrating to chroot in, do all the work,
> think you had it fixed and then the underlying foundation of your
> house crumbles beneath you 3 weeks from now.
I don't think this is a problem. I would love to know what others
think of the `smartctl` output:
root@sysresccd /root % smartctl -H /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s
root@sysresccd /root % smartctl -i /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Fujitsu MPA..MPG series
Device Model: FUJITSU MPF3204AT
Serial Number: 05030567
Firmware Version: 0028
User Capacity: 20,496,236,544 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
Local Time is: Wed Mar 3 14:14:31 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
root@sysresccd /root %
This looks to me like smartctl is going "OMG! What an ancient drive!"
- it's a 20gig EIDE drive and if my pocket calculator is correct
(44803/24/365), it's seen 5 years of active use - and that's the
"marginal attribute" referred to.
Like I said, the power plug was accidentally pulled on this drive, so
I'm inclined to attribute the corruption only to that, not to the
drive actually failing.
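For anyone wanting to check the 44803/24/365 figure above, a minimal sketch (assuming 365-day years and taking the raw value as 44803 whole hours; the function name is just for illustration):

```shell
# Convert a power-on time in hours to years, to two decimal places.
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.2f\n", h / 24 / 365 }'
}

hours_to_years 44803    # prints 5.11
```

So "5 years" is a fair rounding of the raw value.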
The drive is in a computer that has rarely been turned off in the last
couple of years, and is also in a warm environment, conditions which
are ideal. I appreciate the latter seems unintuitive, but in fact
studies have shown that drives in somewhat warm environments last
longer than those that are cooled.
That it passes the "SMART overall-health self-assessment test"
suggests to me that it is chugging away quite happily.
I would have dismissed your concerns were it not for the capitalised
"FAILING_NOW" in the output. Like I say, I think this is just smartctl
declaring "OMG! this drive is old!", but I open this matter to the
list for discussion (should you wish).
I think I'm actually nearly ready to migrate off this system. The
power was actually pulled as I installed 3 new (to me) rackmount
machines in the server cupboard - the plan is to have identical
machines running RAID, so that in the case of ANY problems I have
spares available. I take nightly backups of the important data on
this machine, however I'd prefer it to run just a couple or a few
weeks longer to allow me to migrate at my own leisure.
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 13:28 ` Stroller
2010-03-03 14:01 ` Mick
@ 2010-03-03 15:18 ` Willie Wong
2010-03-03 16:16 ` Stroller
1 sibling, 1 reply; 19+ messages in thread
From: Willie Wong @ 2010-03-03 15:18 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote:
> >from the output it looks like you are mounting by label? What if you
> >edit fstab to point to the device name /dev/hd?? instead of
> >LABEL=root? Check the filesystem label to make sure it is ok?
>
> Many thanks for this suggestion, however following it makes no
> difference, except in the trivia that it says "failed to open the
> device '/dev/hda3': No such file or directory" (instead of "LABEL=...").
If you try to boot, after the failure to check rootfs, it should dump
you to a recovery console, what happens if you issue ls /dev ?
Also check dmesg?
Cheers,
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 14:26 ` Stroller
@ 2010-03-03 15:26 ` Mark Knecht
2010-03-03 15:28 ` Willie Wong
1 sibling, 0 replies; 19+ messages in thread
From: Mark Knecht @ 2010-03-03 15:26 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 3, 2010 at 6:26 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
>
> On 3 Mar 2010, at 14:00, Mark Knecht wrote:
>>
>> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk>
>> wrote:
>>>
>>> There seem to have been a few people posting with filesystem corruption in
>>> the last week or two. It seems to be my turn, so I hope it isn't contagious.
>>> The cause here is quite clear - whilst rummaging in the server cupboard
>>> yesterday, power to the machine was accidentally disconnected.
>>
>> ...
>> Sorry for your problems. I've had a rash of machine problems over
>> the last 6 weeks. No fun. I feel for you.
>>
>> In my most recent case what looked like a simple disk corruption
>> problem was really a prelude to the drive just plain going bad. Have
>> you tried smartctl to see what it says about the drive at this point?
>>
>> It would be even more frustrating to chroot in, do all the work,
>> think you had it fixed and then the underlying foundation of your
>> house crumbles beneath you 3 weeks from now.
>
> I don't think this is a problem. I would love to know what others think of
> the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s
>
> root@sysresccd /root % smartctl -i /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family: Fujitsu MPA..MPG series
> Device Model: FUJITSU MPF3204AT
> Serial Number: 05030567
> Firmware Version: 0028
> User Capacity: 20,496,236,544 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 5
> ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
> Local Time is: Wed Mar 3 14:14:31 2010 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> root@sysresccd /root %
>
>
> This looks to me like smartctl is going "OMG! What an ancient drive!" - it's
> a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365),
> it's seen 5 years of active use - and that's the "marginal attribute"
> referred to.
>
> Like I said, the power plug was accidentally pulled on this drive, so I'm
> inclined to attribute the corruption only to that, not to the drive actually
> failing.
>
> The drive is in a computer that has rarely been turned off in the last
> couple of years, and is also in a warm environment, conditions which are
> ideal. I appreciate the latter seems unintuitive, but in fact studies have
> shown that drives in somewhat warm environments last longer than those that
> are cooled.
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I take
> nightly backups of the important data on this machine, however I'd prefer it
> to run just a couple or a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.
I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old. I have no
experience with smart. I'm just learning about it. However it is
generated by the microcontroller in the hard drive as per the view of
the drive manufacturer so if the drive is telling you it's failing
then...
My 4.5 year old drive actually stopped producing smart output somewhere
along the way before it failed. I wasn't using smart on the 6 year old
drive at the time, so I had no data from it, but it was in an environment
where the UPS went through a lot of abuse.
It sounds like you have good backups so just make sure they are good
and do what you want.
- Mark
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 14:26 ` Stroller
2010-03-03 15:26 ` Mark Knecht
@ 2010-03-03 15:28 ` Willie Wong
1 sibling, 0 replies; 19+ messages in thread
From: Willie Wong @ 2010-03-03 15:28 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 03, 2010 at 02:26:46PM +0000, Stroller wrote:
> I don't think this is a problem. I would love to know what others
> think of the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s
You can always run the smart long-test to double check. The
FAILING_NOW just indicates that the normalised value falls below the
threshold. For Power_On_Seconds, this usually just indicates that you
are way past the warranty. If you really care about your data, swap it
out now or make frequent backups. Otherwise I don't see the harm of
keeping it until it actually dies.
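To make the comparison concrete: as far as I know, smartctl flags an attribute as failing when its normalised VALUE is at or below THRESH. A rough sketch that applies that rule to attribute lines, assuming the column layout shown in the quoted output (the function name is made up):

```shell
# Hypothetical helper: read smartctl attribute lines on stdin and report
# whether the normalised VALUE (column 4) is at or below THRESH (column 6).
# Adding 0 forces a numeric comparison, so 001 and 020 compare as 1 and 20.
attr_status() {
    awk '$1 ~ /^[0-9]+$/ {
        v = $4 + 0; t = $6 + 0
        state = (v <= t) ? "at or below threshold" : "above threshold"
        printf "%s: value %d, thresh %d, %s\n", $2, v, t, state
    }'
}

echo "9 Power_On_Seconds 0x0012 001 001 020 Old_age Always FAILING_NOW 44803h+12m+16s" | attr_status
```

For the long test itself, `smartctl -t long /dev/sda` starts it and `smartctl -l selftest /dev/sda` shows the result once it has finished.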
Cheers,
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* [gentoo-user] Re: Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 14:00 ` Mark Knecht
2010-03-03 14:26 ` Stroller
@ 2010-03-03 15:29 ` Harry Putnam
2010-03-03 15:31 ` Willie Wong
1 sibling, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2010-03-03 15:29 UTC (permalink / raw
To: gentoo-user
Mark Knecht <markknecht@gmail.com> writes:
> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?
Sorry to butt in here... is that tool, smartctl in some pkg on portage?
* Re: [gentoo-user] Re: Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
@ 2010-03-03 15:31 ` Willie Wong
0 siblings, 0 replies; 19+ messages in thread
From: Willie Wong @ 2010-03-03 15:31 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 03, 2010 at 09:29:43AM -0600, Harry Putnam wrote:
> Mark Knecht <markknecht@gmail.com> writes:
>
> > In my most recent case what looked like a simple disk corruption
> > problem was really a prelude to the drive just plain going bad. Have
> > you tried smartctl to see what it says about the drive at this point?
>
> Sorry to butt in here... is that tool, smartctl in some pkg on portage?
>
sys-apps/smartmontools
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 15:18 ` Willie Wong
@ 2010-03-03 16:16 ` Stroller
2010-03-03 17:14 ` Willie Wong
2010-03-03 17:53 ` Neil Bothwick
0 siblings, 2 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 16:16 UTC (permalink / raw
To: gentoo-user
Many thanks for your help, Willie!
On 3 Mar 2010, at 15:18, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote:
>>> from the output it looks like you are mounting by label? What if you
>>> edit fstab to point to the device name /dev/hd?? instead of
>>> LABEL=root? Check the filesystem label to make sure it is ok?
>>
>> Many thanks for this suggestion, however following it makes no
>> difference, except in the trivia that it says "failed to open the
>> device '/dev/hda3': No such file or directory" (instead of
>> "LABEL=...").
>
> If you try to boot, after the failure to check rootfs, it should dump
> you to a recovery console, what happens if you issue ls /dev ?
About 13 items. Is this unlucky?
http://linux.stroller.uk.eu.org/fs-corruption-dev.png
> Also check dmesg?
I don't think this gives any clues:
http://linux.stroller.uk.eu.org/fs-corruption-dmesg.png
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 14:01 ` Mick
@ 2010-03-03 16:19 ` Stroller
0 siblings, 0 replies; 19+ messages in thread
From: Stroller @ 2010-03-03 16:19 UTC (permalink / raw
To: gentoo-user
On 3 Mar 2010, at 14:01, Mick wrote:
> ... Once or twice
> things went hairy and I would get a message similar to yours. On
> these rare occasions I booted with a LiveCD and with the partitions
> unmounted I ran --check, then --fix-fixable and finally
> --rebuild-tree. You may want to use an external drive with dd to
> image the current / partition and do all your recovery work on that.
> If you don't care too much about the risk of catastrophic failure then
> just run --rebuild-tree with a LiveCD and see what you get.
That's a great idea. I'm (now) religious about backing up my
customers' computers, often using dd like this, but for some reason it
hadn't yet occurred to me today.
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 16:16 ` Stroller
@ 2010-03-03 17:14 ` Willie Wong
2010-03-04 9:55 ` Stroller
2010-03-03 17:53 ` Neil Bothwick
1 sibling, 1 reply; 19+ messages in thread
From: Willie Wong @ 2010-03-03 17:14 UTC (permalink / raw
To: gentoo-user
On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote:
>
> Many thanks for your help, Willie!
>
> About 13 items. Is this unlucky?
>
> http://linux.stroller.uk.eu.org/fs-corruption-dev.png
>
Okay, something is screwed up with udev. Is udev started? Was it
upgraded recently? Do any config files in /etc need updating? Is the
udev directory in /etc okay?
At this point I don't think your problem is necessarily with the
harddrive itself: I think we now know why fsck cannot open file or
device.
Check /var/log/emerge.log or the portage elogs. Did you upgrade
baselayout recently?
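If genlop isn't installed, the same information can be pulled out of emerge.log by hand. A rough sketch, assuming the usual log format in which each finished merge is recorded on a line containing "::: completed emerge" followed by the package atom (the function name is invented for illustration):

```shell
# Hypothetical helper: read emerge.log text on stdin and print the lines
# recording a completed merge of the given category/package.
completed_merges() {
    grep -F '::: completed emerge' | grep -F " $1-"
}

# Usage, e.g. from a chroot into the broken system:
#   completed_merges sys-fs/udev < /var/log/emerge.log
#   completed_merges sys-apps/baselayout < /var/log/emerge.log
```

The leading number on each matching line is a Unix timestamp, which `date -d @NUMBER` will turn into something readable.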
Good luck,
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 16:16 ` Stroller
2010-03-03 17:14 ` Willie Wong
@ 2010-03-03 17:53 ` Neil Bothwick
1 sibling, 0 replies; 19+ messages in thread
From: Neil Bothwick @ 2010-03-03 17:53 UTC (permalink / raw
To: gentoo-user
On Wed, 3 Mar 2010 16:16:50 +0000, Stroller wrote:
> > If you try to boot, after the failure to check rootfs, it should dump
> > you to a recovery console, what happens if you issue ls /dev ?
>
> About 13 items. Is this unlucky?
>
> http://linux.stroller.uk.eu.org/fs-corruption-dev.png
Is that the same as you see in the dev directory of your root filesystem
when you boot from a live CD? It looks like udev may not be running for
some reason.
--
Neil Bothwick
Everyone has a photographic memory. Some don't have film.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-03 17:14 ` Willie Wong
@ 2010-03-04 9:55 ` Stroller
2010-03-04 10:23 ` Willie Wong
0 siblings, 1 reply; 19+ messages in thread
From: Stroller @ 2010-03-04 9:55 UTC (permalink / raw
To: gentoo-user
On 3 Mar 2010, at 17:14, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote:
>>
>> Many thanks for your help, Willie!
>>
>> About 13 items. Is this unlucky?
>>
>> http://linux.stroller.uk.eu.org/fs-corruption-dev.png
>>
>
> Okay, something is screwed up with udev. Is udev started?
Ah!
Ok... this is shown as an error just before the "failed to open the
device" message.
http://linux.stroller.uk.eu.org/fs-corruption.png
Starting udevd... error getting signalfd
> Is it upgraded recently?
chroot # genlop udev | tail -n 4
Tue May 26 17:26:46 2009 >>> sys-fs/udev-124-r2
Thu Jun 25 06:05:37 2009 >>> sys-fs/udev-141
Mon Dec 21 04:57:20 2009 >>> sys-fs/udev-146-r1
chroot #
The system last booted ok in October and, until yesterday, has been up
since then. So this December update may well be to blame.
> Any config files in /etc that needs updating?
Nope.
chroot # etc-update
Scanning Configuration files...
Exiting: Nothing left to do; exiting. :)
chroot #
> Is udev directory in /etc okay?
All looks ok to me:
chroot # ls -l /etc/udev/*
-rw-r--r-- 1 root root 277 2009-12-21 04:56 /etc/udev/udev.conf
/etc/udev/rules.d:
total 24
-rw-r--r-- 1 root root 55 2009-10-11 09:38 30-svgalib.rules
-rw-r--r-- 1 root root 4311 2009-09-11 23:02 70-nut-usbups.rules
-rw-r--r-- 1 root root 1606 2008-12-22 12:33 70-persistent-cd.rules
-rw-r--r-- 1 root root 440 2009-12-21 04:57 70-persistent-net.rules
-rw-r--r-- 1 root root 28 2009-07-22 03:04 99-fuse.rules
chroot #
> At this point I don't think your problem is necessarily with the
> harddrive itself: I think we now know why fsck cannot open file or
> device.
>
> Check /var/log/emerge.log or the portage elogs. Did you upgrade
> baselayout recently?
Yes, upgraded to baselayout-1.12.13 at the same time udev was
upgraded:
chroot # genlop baselayout | tail -n 3
Thu Feb 12 09:36:25 2009 >>> sys-apps/baselayout-1.12.11.1
Mon Dec 21 01:25:32 2009 >>> sys-apps/baselayout-1.12.13
chroot #
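Where genlop is not installed, much the same history can be pulled straight from /var/log/emerge.log, as suggested earlier in the thread. The sketch below assumes the usual "::: completed emerge (N of M) category/package-version to /" log lines; merge_history is a hypothetical helper name, and the match is a plain prefix match, so a package whose name merely starts with the same string would also be listed.

```shell
#!/bin/sh
# Hypothetical helper: list completed merges of one package from an emerge log.
# $1 = category/name (e.g. sys-fs/udev)
# $2 = log file (defaults to /var/log/emerge.log)
merge_history() {
    grep "::: completed emerge" "${2:-/var/log/emerge.log}" | grep -F " $1-"
}

# Example use inside the chroot:
#   merge_history sys-fs/udev | tail -n 3
#   merge_history sys-apps/baselayout | tail -n 2
```

Each line starts with the merge time as seconds since the epoch, so the udev and baselayout upgrades can be lined up against the last known good boot.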
What I'm inclined to do now, having taken an image of the drive (I
used ddrescue, which showed no errors), is try
`reiserfsck --rebuild-tree` on the partition, "just to be sure". I get
the impression this is probably quite safe, but I have a backup anyway.
I, too, suspect that won't fix the problem, so after that I guess I'll
try remerging udev & baselayout. I suspect remerging --empty may be
necessary, but any other suggestions welcome.
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-04 9:55 ` Stroller
@ 2010-03-04 10:23 ` Willie Wong
2010-03-04 11:42 ` Stroller
2010-03-09 10:57 ` Stroller
0 siblings, 2 replies; 19+ messages in thread
From: Willie Wong @ 2010-03-04 10:23 UTC (permalink / raw
To: gentoo-user
On Thu, Mar 04, 2010 at 09:55:44AM +0000, Stroller wrote:
>
> On 3 Mar 2010, at 17:14, Willie Wong wrote:
>
> >On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote:
> >>
> >>Many thanks for your help, Willie!
> >>
> >>About 13 items. Is this unlucky?
> >>
> >>http://linux.stroller.uk.eu.org/fs-corruption-dev.png
> >>
> >
> >Okay, something is screwed up with udev. Is udev started?
>
> Ah!
>
> Ok... this is shown as an error when before the "failed to open the
> device" message.
>
> http://linux.stroller.uk.eu.org/fs-corruption.png
>
> Startin udevd... error getting signalfd
What kernel are you running?
You need at least 2.6.22 to have signalfd support, and at least 2.6.27
to have reliable signalfd support. If this is not satisfied, please
either downgrade udev or upgrade your kernel.
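The version check can be scripted roughly as follows. This is only a sketch: kernel_at_least is a hypothetical helper, it compares just the first three numeric fields of the release string, and it assumes a POSIX shell with cut available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# Only the first three dot-separated numeric fields are compared.
kernel_at_least() {
    have=${1%%-*}                 # drop suffixes such as "-gentoo-r6"
    want=$2
    for i in 1 2 3; do
        h=$(printf '%s\n' "$have" | cut -d. -f"$i")
        w=$(printf '%s\n' "$want" | cut -d. -f"$i")
        h=${h%%[!0-9]*}; w=${w%%[!0-9]*}   # keep leading digits only
        h=${h:-0}; w=${w:-0}               # a missing field counts as 0
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
    done
    return 0                               # all three fields equal
}

if kernel_at_least "$(uname -r)" 2.6.27; then
    echo "kernel $(uname -r): at least 2.6.27"
else
    echo "kernel $(uname -r): older than 2.6.27"
fi
```

From a live CD, uname -r reports the live CD's kernel, so the installed kernel's version has to be passed in by hand, e.g. kernel_at_least 2.6.25 2.6.27.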
Cheers,
W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-04 10:23 ` Willie Wong
@ 2010-03-04 11:42 ` Stroller
2010-03-09 10:57 ` Stroller
1 sibling, 0 replies; 19+ messages in thread
From: Stroller @ 2010-03-04 11:42 UTC (permalink / raw
To: gentoo-user
On 4 Mar 2010, at 10:23, Willie Wong wrote:
> On Thu, Mar 04, 2010 at 09:55:44AM +0000, Stroller wrote:
>>
>> On 3 Mar 2010, at 17:14, Willie Wong wrote:
>>
>>> On Wed, Mar 03, 2010 at 04:16:50PM +0000, Stroller wrote:
>>>>
>>>> Many thanks for your help, Willie!
>>>>
>>>> About 13 items. Is this unlucky?
>>>>
>>>> http://linux.stroller.uk.eu.org/fs-corruption-dev.png
>>>>
>>>
>>> Okay, something is screwed up with udev. Is udev started?
>>
>> Ah!
>>
>> Ok... this is shown as an error when before the "failed to open the
>> device" message.
>>
>> http://linux.stroller.uk.eu.org/fs-corruption.png
>>
>> Startin udevd... error getting signalfd
>
> What kernel are you running?
>
> You need at least 2.6.22 to have signalfd support, and at least 2.6.27
> to have reliable signalfd support. If this is not satisfied, please
> either downgrade udev or upgrade your kernel.
I'm surprised to find this system is still running 2.6.25. I'll
upgrade now.
Many thanks for your patient help,
Stroller.
* Re: [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("
2010-03-04 10:23 ` Willie Wong
2010-03-04 11:42 ` Stroller
@ 2010-03-09 10:57 ` Stroller
1 sibling, 0 replies; 19+ messages in thread
From: Stroller @ 2010-03-09 10:57 UTC (permalink / raw
To: gentoo-user
On 4 Mar 2010, at 10:23, Willie Wong wrote:
>>> ...
>>> Okay, something is screwed up with udev. Is udev started?
>>
>> Ah!
>>
>> Ok... this is shown as an error when before the "failed to open the
>> device" message.
>>
>> http://linux.stroller.uk.eu.org/fs-corruption.png
>>
>> Startin udevd... error getting signalfd
>
> What kernel are you running?
>
> You need at least 2.6.22 to have signalfd support, and at least 2.6.27
> to have reliable signalfd support. If this is not satisfied, please
> either downgrade udev or upgrade your kernel.
Many thanks. Upgrading to 2.6.31 resolved the problem, although it did
take a couple of extra kernel compiles & reboots, allowing for the
/dev/hdX -> /dev/sdX changes & compiling drivers for the EIDE controller.
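For anyone hitting the same rename, a quick scan for leftover /dev/hdX references might look like the sketch below. find_hd_refs is a hypothetical helper, and the two file paths are assumptions about a typical layout of that era (GRUB legacy's grub.conf in particular); commented-out lines are matched too.

```shell
#!/bin/sh
# Hypothetical helper: print numbered lines of a file that mention
# old-style IDE device names such as /dev/hda3.
# $1 = file to scan (e.g. /etc/fstab)
find_hd_refs() {
    grep -n '/dev/hd[a-z]' "$1"
}

# Assumed locations; adjust to the files that exist on the system.
for f in /etc/fstab /boot/grub/grub.conf; do
    if [ -r "$f" ]; then
        echo "== $f"
        find_hd_refs "$f" || echo "no /dev/hdX references"
    fi
done
```

Mounting by label, as with `mount -L root` earlier in the thread, avoids depending on the device name at all.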
Many thanks to everyone who posted.
Stroller.
Thread overview: 19+ messages
2010-03-03 12:24 [gentoo-user] Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :(" Stroller
2010-03-03 12:42 ` Willie Wong
2010-03-03 13:28 ` Stroller
2010-03-03 14:01 ` Mick
2010-03-03 16:19 ` Stroller
2010-03-03 15:18 ` Willie Wong
2010-03-03 16:16 ` Stroller
2010-03-03 17:14 ` Willie Wong
2010-03-04 9:55 ` Stroller
2010-03-04 10:23 ` Willie Wong
2010-03-04 11:42 ` Stroller
2010-03-09 10:57 ` Stroller
2010-03-03 17:53 ` Neil Bothwick
2010-03-03 14:00 ` Mark Knecht
2010-03-03 14:26 ` Stroller
2010-03-03 15:26 ` Mark Knecht
2010-03-03 15:28 ` Willie Wong
2010-03-03 15:29 ` [gentoo-user] " Harry Putnam
2010-03-03 15:31 ` Willie Wong