* [gentoo-amd64] not amd64 specific - disk failure
@ 2007-11-19 10:05 Raffaele BELARDI
2007-11-19 11:52 ` Beso
0 siblings, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-19 10:05 UTC
To: gentoo-amd64
Yesterday evening one of my 250 GB SATA disks, a Maxtor MaXLine Plus II, failed.
The drive is formatted as ext3, single partition (sdc1), no RAID, and is used
as an archive of DivX movies; it is completely full. The motherboard is an
ASUS M2NPV-VM (Nvidia nForce 430 chipset); I can easily move the disk to an
ASUS K8V SE (VIA K8T800 chipset) if that helps.
At boot the syslog shows (more or less):
I/O buffer read error: logical block 0
I/O buffer read error: logical block 1
Any attempt to mount /dev/sdc1 produces dozens of the above messages
(plus other details I don't remember right now) and finally fails.
fdisk -l shows the partition table as it should be.
It was late at night, so I gave up. Is there any chance of recovering my
data, e.g. by specifying a different superblock (whatever that is)? Any
links that could help me?
thanks,
raffaele
PS I bought the drive in 2005 and have used it only to archive movies,
so very little. It's the last Maxtor I'll buy (OK, also because Maxtor is
Seagate now...)
--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Beso @ 2007-11-19 11:52 UTC
To: gentoo-amd64
Before doing anything to the disk, first read everything and then decide
which options may help you. On my reiserfs filesystem, the included utils
were enough to let me recover about 98% of the data after a full index
rebuild. If your filesystem is journaled, then I think you should be able
to recover it. Also, it may only be a superblock problem, and in that case
the second link might help you more. Anyway, get a disk that can hold all
the data that was on the failed one, since you don't want to work on the
failed disk itself and risk losing data.
Try reading this (the ext2/3 part):
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/
And this:
http://forums.gentoo.org/viewtopic-t-569462-highlight-ext3+recover.html
Or you might try this utility:
http://www.cgsecurity.org/wiki/PhotoRec
--
dott. ing. beso
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-19 13:27 UTC
To: gentoo-amd64
Beso,
thanks for the links, I've already started reading. I've also got a new
drive to copy the recovered data to (if any, fingers crossed...)
Most of the resources I've read up to now assume that e.g. /dev/sdc1 is
detected and a 'bad superblock' message is displayed when attempting to
mount.
In my case the kernel is unable to detect /dev/sdc1: after the long list
of read errors I reported, it ends up with only /dev/sdc.
Does this look like a superblock issue, or something worse?
thanks,
raffaele
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Beso @ 2007-11-19 13:56 UTC
To: gentoo-amd64
2007/11/19, Raffaele BELARDI <raffaele.belardi@st.com>:
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
>
> Does this look like superblock issue, or something worse?
Try a df /dev/sdc and see if it recognizes something. If it gives you
something, then you might really have a superblock issue that could be
corrected according to this link:
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/
If it doesn't give you valid filesystem output, then maybe it's not a
superblock issue, and recovering the partition is a bit more complicated.
* [gentoo-amd64] Re: not amd64 specific - disk failure
From: Duncan @ 2007-11-19 17:14 UTC
To: gentoo-amd64
Raffaele BELARDI <raffaele.belardi@st.com> posted 47418F30.6050607@st.com,
excerpted below, on Mon, 19 Nov 2007 14:27:12 +0100:
> Most of the resources I've read up to now imply that e.g. /dev/sdc1 is
> detected and a 'bad superblock' message displayed when attempting to
> mount.
>
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
>
> Does this look like superblock issue, or something worse?
If you have a spare drive of the same size or larger, you can try dd, or,
probably better yet, emerge dd-rescue and try it. They copy a file, or
part of one (in this case an entire block device), from one location to
another "raw". What you want to do is copy the entire bad device, /dev/sdc
above, to the new device. Then you have a copy to play around with,
without worrying about making the bad device worse before you get
whatever you were trying to get off of it, off.
dd-rescue differs from dd in that, if there are bad blocks, it will run
until it starts hitting them, then work backwards from the other end
until it hits them there, and then try the blocks in the middle. Thus,
if you have good blocks, bad blocks, good blocks, bad blocks, good
blocks, dd-rescue recovers more of the disk in a reasonable amount of
time than plain dd, which only reads straight through.
(The problem is that once you start hitting bad blocks, everything slows
down, because the system tries and retries each bad block multiple times
before giving up, taking minutes to read a block or finally decide it
can't, before moving on, where it would read a good block in a fraction
of a second. Thus, working through even a few hundred bad blocks can take
DAYS. Giving up after a few and starting from the other end, then trying
in the middle, increases the number of blocks recovered without sitting
there for days waiting for it to work through all the bad ones and get to
the rest of the good ones again. That's what dd-rescue does for you: it
automates the give-up, try-from-the-other-end, then-try-the-middle stuff.)
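A toy Python model of the read order described above may make the benefit concrete. It is a sketch under stated assumptions, not dd-rescue's actual algorithm: the "disk" is a list with None marking an unreadable block, and bad_budget (a hypothetical parameter) stands for how many slow bad-block reads you are willing to sit through before giving up.

```python
from typing import List, Optional


def rescue_order(disk: List[Optional[bytes]]) -> List[int]:
    """Block indices in the order a forward/backward/middle strategy tries them.

    Toy model: one entry per block, None for an unreadable block.
    """
    n = len(disk)
    order: List[int] = []
    # Pass 1: read forward from the start until the first bad block.
    front = 0
    while front < n:
        order.append(front)
        if disk[front] is None:
            break
        front += 1
    # Pass 2: read backward from the end until a bad block (or the front).
    back = n - 1
    while back > front:
        order.append(back)
        if disk[back] is None:
            break
        back -= 1
    # Pass 3: retry whatever is left untried in the middle.
    order.extend(range(front + 1, back))
    return order


def good_blocks_read(disk: List[Optional[bytes]], order: List[int], bad_budget: int) -> int:
    """Good blocks read before giving up at the bad_budget-th slow bad-block read."""
    good = bad_hits = 0
    for i in order:
        if disk[i] is None:
            bad_hits += 1
            if bad_hits >= bad_budget:
                break
        else:
            good += 1
    return good


if __name__ == "__main__":
    # good, good, good, bad, bad, good, good, good
    disk = [b"g"] * 3 + [None] * 2 + [b"g"] * 3
    straight = list(range(len(disk)))  # plain dd: front to back only
    print(good_blocks_read(disk, straight, 2))
    print(good_blocks_read(disk, rescue_order(disk), 2))
```

With a budget of two bad-block reads, straight-through reading recovers 3 of the 6 good blocks, while the forward/backward order recovers all 6, because it reaches the good tail before spending its budget.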
The thing to remember when working with either program is that you want
to be **VERY** **SURE** you get the right devices specified, particularly
the output device. If you tell it to write to the device your main system
is on instead of the new, empty device, it WILL overwrite your main
system device, boot record, partition table, and all. Thus, make TRIPLE
SURE you have the right output device specified before you hit that
Enter key.
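One partial safety net, sketched in Python under the assumption of a Linux /proc/mounts-style mount table: refuse an output device if it, or a partition on it, is currently mounted. The helper names are hypothetical, and this does not catch everything (swap, RAID members, or mounts listed by label, UUID or a symlink), so it supplements checking by hand rather than replacing it.

```python
def mounted_devices(mounts_text: str) -> set:
    """Device paths in /proc/mounts-style text (first column starting with /dev/)."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            devices.add(fields[0])
    return devices


def is_safe_target(target: str, mounts_text: str) -> bool:
    """False if target (e.g. /dev/sdd) or a partition on it (/dev/sdd1, ...) is mounted."""
    for dev in mounted_devices(mounts_text):
        suffix = dev[len(target):]
        if dev == target or (dev.startswith(target) and suffix.isdigit()):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical mount table; on a real box use open("/proc/mounts").read().
    sample = "/dev/sda3 / ext3 rw 0 0\n/dev/sda1 /boot ext2 ro 0 0\nproc /proc proc rw 0 0\n"
    print(is_safe_target("/dev/sdd", sample))
    print(is_safe_target("/dev/sda", sample))
```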
The reason I'm suggesting dd/dd-rescue is that they'll grab the raw
data (what they can of it) directly off the device you point them at
(/dev/sdc above, but as I said, be absolutely sure the devices aren't
reordered after you attach the new one). Since you can't mount the
partition, you need something that can grab the info off the drive
itself, without going through the partition. Then, once you get a copy
to work with, you can check what fdisk says about it and go from there.
In fact, if the data is worth it and you have the money, you may want to
get TWO replacement drives, one to make a "safe" copy to, and a second to
make a working copy (from the safe copy) to. Then you play with the
working copy, and if you screw up, you can simply recopy from the safe
copy, without having to go back to the damaged drive -- because it's
possible you'll only get the one chance to get stuff off of the damaged
one.
If the damage is severe enough that dd-rescue can't pull anything off,
consider wrapping the drive in paper (for padding and moisture absorption),
then plastic (preferably double- or triple-wrapped), then putting it in
your freezer overnight. There are quite a number of tales of folks with
dead drives who were able to revive them long enough, by freezing them,
to get the stuff, or at least part of the stuff, they needed off them.
Keep in mind that as soon as you take it out and plug it in, it'll start
warming up, and you may only have a few minutes if it's bad enough. In
that case, if you have just a few smaller files you really want, you can
hope the filesystem is usable again and that you have enough time to
retrieve them. If that's not the case and you need to go for the bulk,
then you can write down how far you get, freeze it again, and tell
dd-rescue to start where it stopped the second time. Of course, even
freezing may only work a limited number of times, and you likely won't
recover the entire thing this way, but if some is better than none...
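Resuming works because each piece can be written to the same offset in the image that it came from on the disk, so several partial runs fill in one image. A minimal Python sketch of that idea; the function name and chunk size are assumptions, and real tools such as dd-rescue do much more (retries, logging, skipping past bad areas instead of stopping).

```python
import os


def copy_range(src_path: str, dst_path: str, start: int, end: int, chunk: int = 64 * 1024) -> int:
    """Copy bytes [start, end) of src to the same offsets in dst.

    Returns the offset reached. On a read error it stops and returns that
    offset, so the next run (e.g. after re-freezing) can resume there.
    """
    mode = "r+b" if os.path.exists(dst_path) else "w+b"  # keep earlier runs' data
    pos = start
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        while pos < end:
            src.seek(pos)
            try:
                data = src.read(min(chunk, end - pos))
            except OSError:  # unreadable area: stop and report where
                break
            if not data:  # ran past the end of the source
                break
            dst.seek(pos)  # same offset in the image keeps it aligned
            dst.write(data)
            pos += len(data)
    return pos
```

The returned offset is the "write down how far you get" number: pass it back in as start on the next run.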
There are special forensics LiveCD distributions out there. Try STD and
INSERT (google them; that's how I found them), both based on KNOPPIX, I
believe. INSERT is small enough to fit on the small credit/business-card
sized CD, 180 MB or some such. STD is a full-sized CD image, basically a
normal KNOPPIX with just enough stuff removed to load the extra
forensics/recovery/etc tools. It even still has some games (Frozen
Bubble, etc) on it. They may help you recover something workable off the
image you copied over with dd-rescue. Since they include a bunch of
programs, including AV and other MS Windows recovery stuff (there's a
nice MS eXPrivacy password blanker utility on there; my boss ran into
trouble, hadn't created a password reset disk, and I had to use it on his
box, and yes, it was his), plus network troubleshooting stuff as well,
you'll probably want to grab them and play around with them a bit to see
what they are like before actually starting to work on recovering your
data.
Hope that's useful.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 7:47 UTC
To: gentoo-amd64
Duncan wrote:
> If you have a spare drive of the same size or larger, you can try dd, or
> probably better yet, merge dd-rescue and try it. They copy a file or
> part of one, in this case an entire block device, from one location to
> another, "raw". What you want to do is copy the entire bad device,
> /dev/sdc above, to the new device. Then you have a copy to play around with
> without worrying about making the bad device worse before you get
> whatever you were trying to get off of it, off.
>
Duncan,
Thanks for the ddrescue explanation; I will surely give it a try.
Yesterday evening I got a new drive twice the size of the damaged one,
created a 250 GB partition on it and tried:
# dd if=/dev/hdb of=/mnt/disk_500/sdb.img
It stopped after a few KB due to read errors, so I changed it to:
# dd conv=noerror if=/dev/hdb of=/mnt/disk_500/sdb.img
and after some time I got a 250 GB sdb.img on the new drive.
Then the fun started (it was already past midnight). When I created the
new partition I had noted down the superblock backup locations.
Unfortunately, every
# e2fsck -b xxx -B 4096 /mnt/disk_500/sdb.img
returned 'bad superblock'. After googling for a utility to scan a disk
for superblocks, I ended up with testdisk (it's ~amd64). As I understand
it, it works on real hardware only, so I had to reconnect the damaged HD
and let it do its job. testdisk found the superblocks, but according to
it they were at exactly the locations I had already noted, so no help. I
also let it search for partitions, because I read it has an option to
list the directory contents. It worked: it let me see the list of lost
files, but that's all; it has no option to recover them. But at least it
told me there is a good superblock somewhere.
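The backup locations that mke2fs prints can also be computed. A Python sketch, assuming the defaults of the time: sparse_super enabled and 8 × block-size blocks per group, under which backups are kept only in group 1 and in groups that are powers of 3, 5 and 7. The function names are hypothetical; mke2fs -n on a device of the same size should print the same list without writing anything.

```python
def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0 (so 1 counts). Requires n >= 1."""
    while n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(total_blocks: int, block_size: int = 4096) -> list:
    """Block numbers (as given to e2fsck -b) of ext2/ext3 backup superblocks.

    Assumes sparse_super and the default 8 * block_size blocks per group.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    groups = -(-(total_blocks - first_data_block) // blocks_per_group)  # ceiling
    result = []
    for g in range(1, groups):
        if _is_power_of(g, 3) or _is_power_of(g, 5) or _is_power_of(g, 7):
            result.append(g * blocks_per_group + first_data_block)
    return result


if __name__ == "__main__":
    # Roughly 250 GB with 4 KiB blocks (a rounded figure, for illustration)
    print(backup_superblocks(61_035_156)[:8])
```

Each value goes to e2fsck's -b together with -B 4096, as in the command above.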
Finally I went back to sdb.img and used "od | less" to look at what
was present at the superblock location. What I saw was, I believe, part
of a superblock (an almost regular pattern of increasing numbers, which
could be a list of blocks? I need to study ext2), but the point is that
this pattern began well before the 'theoretical address' of the
superblock.
So my hypothesis is that the bad blocks or sectors at the beginning of
the partition were not copied, or only partly copied, by dd, and because
of this the superblocks are all shifted down. Although I don't like
accessing the hardware again, maybe I should try:
# dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
to get an aligned image. The problem is I don't know what bs= should be.
The block size, so 4k?
Any other options I might have?
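The hypothesis can be checked with a toy Python model of dd (an illustration, not dd's real code): the disk is a list of 512-byte sectors with None for unreadable ones, and a read of bs_sectors fails as a whole if any sector in it is bad. Without sync, a failed block is simply missing from the output, so everything after it slides toward the start of the image; with conv=noerror,sync it is replaced by zeros of the same size and offsets are preserved.

```python
from typing import List, Optional

SECTOR = 512


def dd_copy(sectors: List[Optional[bytes]], bs_sectors: int, sync: bool) -> bytes:
    """Toy dd conv=noerror[,sync]: read bs_sectors at a time; None = unreadable."""
    out = bytearray()
    for i in range(0, len(sectors), bs_sectors):
        block = sectors[i:i + bs_sectors]
        if any(s is None for s in block):
            # The whole read fails. noerror: keep going. sync: pad with zeros.
            if sync:
                out += bytes(SECTOR * len(block))
            continue
        for s in block:
            out += s
    return bytes(out)


if __name__ == "__main__":
    disk = [bytes([n]) * SECTOR for n in range(8)]  # sector n filled with byte n
    disk[2] = None                                  # one bad sector near the start
    shifted = dd_copy(disk, 1, sync=False)
    aligned = dd_copy(disk, 1, sync=True)
    print(len(shifted), shifted[5 * SECTOR])  # position 5 now holds sector 6's data
    print(len(aligned), aligned[5 * SECTOR])  # position 5 still holds sector 5's data
```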
thanks,
raffaele
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Beso @ 2007-11-20 9:01 UTC
To: gentoo-amd64
2007/11/20, Raffaele BELARDI <raffaele.belardi@st.com>:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
This should tell you what the block size is:
df /dev/sdc
* [gentoo-amd64] Re: not amd64 specific - disk failure
From: Duncan @ 2007-11-20 9:41 UTC
To: gentoo-amd64
Raffaele BELARDI <raffaele.belardi@st.com> posted 47429114.5030102@st.com,
excerpted below, on Tue, 20 Nov 2007 08:47:32 +0100:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
>
> Any other option I might have?
This sounds reasonable. I run reiserfs here and don't know a whole lot
about ext2/3/4, so I won't even attempt an opinion at that level of
detail. (That's why I left the actual recovery procedure after creating
the copy to work with so vague... I wasn't going to try to go there.)
However, I can say this. Based on my experience with recovery on
reiserfs (and, in fact, the reiserfs and dd-rescue recovery notes, so
it's not just me), the block size doesn't necessarily have to match. It
copies "raw", so the data it gets, it gets, and the data it doesn't,
well... It also keeps everything in the same order serially, so that's
not an issue. What the block size DOES affect is how much data is
operated on at once: when it reaches bad blocks, that's the unit that
determines the amount of missing data.
Working on a good disk, a relatively large block size (as long as it can
be buffered in memory) is often more efficient, that is, faster, because
the big blocks mean lower processing overhead. On a partially bad disk,
larger blocks still let it cover the good areas faster (though that's
trivial time anyway compared to the time spent trying to access the bad
blocks), AND because the block size is larger, it SHOULD mean fewer bad
blocks to try and try and try before giving up in the bad areas, so it's
faster there as well.
The flip side to the faster access over the bad areas is that as I said,
that's the chunk size that's declared bad, so the larger the block size
you choose, the more potentially recoverable data gets declared bad when
the entire block is declared bad.
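The trade-off above, as arithmetic: a Python sketch assuming that any bs-sized read containing a bad sector fails as a whole and is zero-filled (conv=noerror,sync). The function name and the bad sector numbers in the usage note are made up for illustration.

```python
def dd_loss(bad_sectors, bs, sector_size=512):
    """(failed reads, bytes zero-filled) for dd conv=noerror,sync with block size bs.

    Toy model: any bs-sized block containing a bad sector fails as a whole.
    Assumes bs is a multiple of sector_size and ignores a short final block.
    """
    sectors_per_block = bs // sector_size
    failed_blocks = {s // sectors_per_block for s in bad_sectors}
    return len(failed_blocks), len(failed_blocks) * bs


if __name__ == "__main__":
    bad = [10, 11, 12, 5000]  # hypothetical bad sector numbers
    for bs in (512, 4096, 65536):
        print(bs, dd_loss(bad, bs))
```

With these numbers, bs=512 gives 4 failed reads and 2048 zeroed bytes, bs=4096 gives 2 and 8192, and bs=65536 gives 2 and 131072: fewer slow retries, but more data declared bad per failure.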
As for working off the bad disk vs working off an image of it, as long as
you can continue to recover data off the bad disk, you can keep trying to
use it. The problem, of course, is that every access might be your last,
and it's also possible that each time thru may lose a few more blocks of
data at the margin.
So it's up to you. The aligned image will certainly be easier to work
with, but you might not be able to get the same amount of valid data off.
... You never mentioned exactly what happened to the disk. Mine was
overheating. I live in Phoenix, AZ, and my AC went out in the middle of
the summer, with me gone and the computer left running. With outside
temps often reaching close to 50 C (122 F), the temps inside with the AC
off could have easily reached 60 C (140 F). Ambient case air temps could
therefore have reached 70 C, and with the drive spinning in that... one
can only guess what temps it reached!
Well, rather obviously, the platters expanded and the heads crashed,
grooving out a circle in the platter at whatever location they were at
the time, plus wherever the still-operating system told the heads to seek
to. However, once I came home and realized what had happened, I shut
down and let everything cool down. After replacing the AC, with
everything running at normal temps again, I was able to boot back up.
I ended up with two separate heavily damaged areas in which I could
recover little if anything, but fortunately, the partition table and
superblocks were intact. I had also been keeping backup partition copies
of most of my valuable stuff, by partition, and was able to recover most
of it from those (barring the new stuff since my last backup, which was
longer ago than it should have been), since they had been unmounted at
the time and therefore didn't have the heads seeking into them, only
across them a few times.
Actually, perhaps surprisingly, I was able to run those disks for some
time without any known additional damage. I did switch disks as soon as
possible, because I was leery of continuing to depend on the partially
bad ones, but in the meantime, I just marked the affected partitions as
dead and continued to use the others without issue. In fact, I still
have the disk, and might still be using it for extra storage, except
that it was the second disk I had lost in two years (looking back, the
one I'd lost the previous year was probably heat related as well, as it
had the same failure pattern, and the AC wasn't doing so well even then),
and I decided to switch to RAID and to slower but longer-warranty (5 yr)
Seagate drives. Those are now going into their third year without issue
(and with a new AC with cooling capacity to spare, so hopefully it'll be
several years before I need to worry about /that/ issue again), but at
least now I have RAID backing me up, with most of the system on
kernel/md RAID-6, so I can lose up to two of the four drives and maintain
data integrity. I am, however, already thinking about how I'll do it
better next time, now that I've a bit of RAID experience under my belt.
=8^)
So anyway, if it was heat related, chances are pretty decent it'll remain
relatively stable, no additional data loss, as long as you keep pretty
strict watch on the temps and don't let it overheat again. That was my
experience this last time, when I know it was heat related, and the time
before, which had the same failure pattern, so I'm guessing it was heat
related. Of course, you never can tell, but that has been my experience
with heat related disk failures, anyway.
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 10:06 UTC
To: gentoo-amd64
Beso wrote:
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
>
>
> this should tell you what the block size is:
> df /dev/sdc
>
Beso,
the filesystem block size is 4k; I know this because I formatted a
second HD with a partition of the same size as the failed one.
What I don't know is how much data I should tell dd to skip (and
zero-fill) in case of a read error. From the dd output I only know there
is a read error and that only 32 KB were transferred, but I have no idea
how big the block of data dd was trying to access was.
But probably, as Duncan suggests, this is not important. dd will try to
access a block of the size specified on the command line, and if an
error occurs it will zero-fill it and jump to the next block of data,
but this size is totally unrelated to the _filesystem_ block size.
raffaele
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 10:25 UTC
To: gentoo-amd64
Duncan wrote:
> Raffaele BELARDI <raffaele.belardi@st.com> posted 47429114.5030102@st.com,
> excerpted below, on Tue, 20 Nov 2007 08:47:32 +0100:
>
> ... You never mentioned exactly what happened to the disk. Mine was
> overheating. I live in Phoenix, AZ, and my AC went out in the middle of
> the summer, with me gone and the computer left running. With outside
> temps often reaching close to 50 C (122 F), the temps inside with the AC
> off could have easily reached 60 C (140 F). Ambient case air temps could
> therefore have reached 70 C, and with the drive spinning in that... one
> can only guess what temps it reached!
Duncan,
I never get temperatures that high here; the highest I've seen at home
is ~30 C (86 F), with no AC. I don't think the failure is temperature
related: I always mount a 12 cm fan in front of the HD enclosure, and
the HDs aren't even warm.
Previously the same HD was mounted in a different box, still an ASUS
mobo but with a VIA chipset instead of Nvidia, and a different BIOS. In
that box the disk was not automounted from fstab, and if not mounted
manually it was normally completely cold even without the 12 cm fan
running, so I suspect it was not even spinning (low power mode from the
BIOS?). I mounted it only to archive movies after viewing them. In the
Nvidia box the behavior is different: the disk warms up even if not
mounted, but with the 12 cm fan always running the temperature stays low.
I have no explanation for this failure. The disk was used really little.
After (if) I recover the data I'll try to read the HD's SMART database; if
I remember correctly there are some useful counters there.
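For reading those counters, smartmontools' smartctl -A prints an attribute table. A Python sketch that pulls out the raw values of three counters commonly watched for media problems (reallocated, pending and offline-uncorrectable sectors). The sample table and its values are made up for illustration, the helper names are hypothetical, and the column layout may differ between smartctl versions.

```python
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Made-up sample in the layout of `smartctl -A` output (values are illustrative).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def parse_smart_attributes(text: str) -> dict:
    """Map attribute name -> raw value (int when purely numeric)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            attrs[fields[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def nonzero_watch_counters(text: str) -> dict:
    """The watched counters whose raw value is a positive integer."""
    attrs = parse_smart_attributes(text)
    return {name: attrs[name] for name in WATCH
            if isinstance(attrs.get(name), int) and attrs[name] > 0}


if __name__ == "__main__":
    print(nonzero_watch_counters(SAMPLE))
```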
The HD is less than 3 years old and should still be under warranty; I'll
try to get a replacement from Seagate. But my movies are more important
now!
I'm sure it's not related, but the irony is that just a few hours before
the failure I had run e2fsck on that partition, and it reported no errors.
raffaele
PS Thanks for the hint on block size.
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-21 9:24 UTC
To: gentoo-amd64
Raffaele Belardi wrote:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
>
So, I re-created an image of the disk with the above. The first 512
bytes contain an MBR (I recognize the aa55 signature), then lots of
nulls until what seems to be part of the first superblock, obviously
unusable. I manually searched for another superblock by looking for the
magic number and found one at 0x8007e00. According to mkfs the first sb
backup should be at block 32768, i.e. byte 0x8000000. Thus the image is
shifted up by 0x7e00 bytes (probably the sum of MBR + GRUB stage 1.5,
although the numbers do not match another drive I used to check).
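A search like the manual one described above can be automated. The Python sketch below assumes the standard ext2/ext3 superblock layout: the 16-bit magic 0xEF53 at byte 56 (s_magic), s_log_block_size at byte 24 and s_blocks_per_group at byte 32, all little-endian. Requiring a plausible block size and the default 8 × block-size blocks per group filters out most chance matches of the magic in file data; a filesystem made with a non-default group size would be missed. The function name is hypothetical.

```python
import struct

MAGIC_OFFSET = 56   # s_magic inside the ext2/ext3 superblock
EXT2_MAGIC = 0xEF53


def find_superblocks(path, step=512, chunk=1 << 20):
    """Yield byte offsets in path that look like the start of an ext2/3 superblock.

    Checks every `step` bytes; `chunk` must be a multiple of `step`.
    """
    with open(path, "rb") as f:
        base = 0
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            for off in range(0, len(buf) - MAGIC_OFFSET - 1, step):
                (magic,) = struct.unpack_from("<H", buf, off + MAGIC_OFFSET)
                if magic != EXT2_MAGIC:
                    continue
                (log_bs,) = struct.unpack_from("<I", buf, off + 24)
                (per_group,) = struct.unpack_from("<I", buf, off + 32)
                # Block size is 1024 << log_bs (1-4 KiB); default group size is 8 * block size.
                if log_bs <= 2 and per_group == 8 * (1024 << log_bs):
                    yield base + off
            base += len(buf)
```

A backup at block 32768 of a 4 KiB-block filesystem should sit 32768 × 4096 = 0x8000000 bytes into the partition, so a hit at 0x8007e00 in the image means the partition starts 0x8007e00 - 0x8000000 = 0x7e00 bytes into the image.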
Now the problem is: how do I tell e2fsck to use the sb at 0x8007e00? That
offset is not divisible by the block size (0x1000). I tried to specify a
different block size with -B 512, but it complains that it's not a legal
size. Should I trim the first 0x7e00 bytes of the image and use e2fsck
normally (with -b 32768)? If so, how can I remove the first 0x7e00 bytes
without re-reading the whole image from the damaged disk?
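Removing the first 0x7e00 bytes does not require touching the failed drive, because the image already on the healthy disk can be copied with `dd skip=`. Below is a bash sketch that demonstrates the idea on a small throwaway file. The file names are hypothetical stand-ins, and on the real 250 GB image the copy needs roughly that much free space again.

```bash
#!/usr/bin/env bash
set -eu
# Demo on a throwaway file. On real data, src would be the existing image
# on the healthy disk (e.g. /mnt/disk_500/sdb.img), never the failed drive.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
src=$tmp/sdb.img          # hypothetical stand-in for the recovered image
dst=$tmp/sdb1.img         # hypothetical name for the trimmed copy
offset=$((0x7e00))        # 32256 bytes = 63 sectors of 512 bytes

# Build a fake image: 32256 zero bytes, then a recognisable payload
# playing the role of the filesystem.
head -c "$offset" /dev/zero > "$src"
printf 'FILESYSTEM-STARTS-HERE' >> "$src"

# Copy everything after the offset. skip= counts in units of bs, so with
# bs=512 the offset must be a whole number of sectors (here it is 63).
dd if="$src" of="$dst" bs=512 skip=$((offset / 512)) 2>/dev/null

first=$(head -c 22 "$dst")
echo "trimmed copy starts with: $first"
# On a real image, a read-only check could follow, for example:
#   e2fsck -n -b 32768 -B 4096 "$dst"
```

Using `bs=32256 skip=1` skips the same bytes with larger transfers, which matters on a 250 GB image. The losetup `-o` route that eventually worked later in this thread avoids the copy altogether.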
Other options I may have? Is it possible to fake the kernel into using
the sdb.img as a disk image, so mount it as a disk (not as a partition)
so maybe it automatically skips the first 0x7e00 bytes and gives me an
aligned first partition?
thanks,
raffaele
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
2007-11-21 9:24 ` Raffaele BELARDI
@ 2007-11-21 10:20 ` Raffaele BELARDI
2007-11-22 9:36 ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
2007-11-21 14:45 ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
1 sibling, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 10:20 UTC (permalink / raw
To: gentoo-amd64
Raffaele Belardi wrote:
> Other options I may have? Is it possible to fake the kernel into using
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?
losetup
http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux
Google before you ask...
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
2007-11-21 9:24 ` Raffaele BELARDI
2007-11-21 10:20 ` Raffaele BELARDI
@ 2007-11-21 14:45 ` Billy Holmes
2007-11-21 15:24 ` Raffaele BELARDI
1 sibling, 1 reply; 16+ messages in thread
From: Billy Holmes @ 2007-11-21 14:45 UTC (permalink / raw
To: gentoo-amd64
Quoting Raffaele BELARDI <raffaele.belardi@st.com>:
> Raffaele Belardi wrote:
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?
have you tried playing with losetup ?
this link might help
http://lists.samba.org/archive/linux/2004-December/012627.html
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
2007-11-21 14:45 ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
@ 2007-11-21 15:24 ` Raffaele BELARDI
0 siblings, 0 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 15:24 UTC (permalink / raw
To: gentoo-amd64
Billy Holmes wrote:
> Quoting Raffaele BELARDI <raffaele.belardi@st.com>:
>
>> Raffaele Belardi wrote:
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
>
> have you tried playing with losetup ?
>
> this link might help
>
> http://lists.samba.org/archive/linux/2004-December/012627.html
>
Thanks, I had never heard of this command until this morning; there's
always something to learn. I'll give it a try tonight.
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS
2007-11-21 10:20 ` Raffaele BELARDI
@ 2007-11-22 9:36 ` Raffaele BELARDI
2007-11-23 10:07 ` Duncan
0 siblings, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-22 9:36 UTC (permalink / raw
To: gentoo-amd64
Raffaele Belardi wrote:
> Raffaele Belardi wrote:
>> Other options I may have? Is it possible to fake the kernel into using
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
>
> losetup
>
> http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux
>
> Google before you ask...
>
So, for the record, finally using the hints from the above link I
managed to recover most of the data. The successful commands were:
# losetup -o32356 /dev/loop0 /mnt/disk_500/sdb.img
# mount -text2 /dev/loop /mnt/sdb_img -o sb=131072,ro
Thanks to all for the precious help.
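Before handing an offset to losetup, it can be worth checking that it really lands on a superblock. The bash sketch below assumes the standard ext2/ext3 layout (magic number 0xEF53, stored little-endian 0x38 bytes into the superblock) and the mount(8) convention that the ext2 `sb=` option is given in 1 KiB units. The function names are hypothetical, and the demo runs on a sparse stand-in file rather than the real sdb.img, comparing loop offset 0 (the unshifted image) with 32256 (0x7e00).

```bash
#!/usr/bin/env bash
set -eu

# has_ext2_magic FILE POS: succeed if an ext2/ext3 superblock seems to start
# at byte POS of FILE. The magic 0xEF53 lives 0x38 bytes into the superblock
# and is stored little-endian, so the raw bytes are 53 ef.
has_ext2_magic() {
  local bytes
  bytes=$(od -An -tx1 -j $(( $2 + 0x38 )) -N 2 "$1" | tr -d ' \n')
  [ "$bytes" = "53ef" ]
}

# sb_position LOOP_OFFSET N: byte position in the image of the superblock
# that mount's sb=N (1 KiB units) would use on a loop device set up with -o.
sb_position() { echo $(( $1 + $2 * 1024 )); }

# Sparse throwaway file (hypothetical stand-in for sdb.img): only the two
# magic bytes are written, where the thread says the backup superblock lies.
img=$(mktemp)
trap 'rm -f "$img"' EXIT
printf '\x53\xef' | dd of="$img" bs=1 seek=$(( 0x8007e00 + 0x38 )) conv=notrunc 2>/dev/null

for off in 0 32256; do
  pos=$(sb_position "$off" 131072)
  if has_ext2_magic "$img" "$pos"; then
    echo "losetup -o$off, sb=131072: ext2 magic found at byte $pos"
  else
    echo "losetup -o$off, sb=131072: no ext2 magic at byte $pos"
  fi
done
```

On the real image, a check like this reads only two bytes from the healthy disk, which is much cheaper than a failed mount attempt.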
raffaele
* [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS
2007-11-22 9:36 ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
@ 2007-11-23 10:07 ` Duncan
0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2007-11-23 10:07 UTC (permalink / raw
To: gentoo-amd64
Raffaele BELARDI <raffaele.belardi@st.com> posted 47454D91.5030109@st.com,
excerpted below, on Thu, 22 Nov 2007 10:36:17 +0100:
> So, for the record, finally using the hints from the above link I
> managed to recover most of the data. The successful commands were:
>
> # losetup -o32356 /dev/loop0 /mnt/disk_500/sdb.img
> # mount -text2 /dev/loop /mnt/sdb_img -o sb=131072,ro
>
> Thanks to all for the precious help.
Thanks for the success story followup! =8^)
Besides being gratifying to know something that we helped with worked,
knowing exactly /what/ worked can be most helpful next time someone sees
the problem, either experiencing it oneself or helping someone else
with it, as here.
Besides, with the list available on the web via gmane and the like,
there's always the chance that someone with the same issue will find this
discussion months or years later via Google or similar, and the followup
report of what exactly /did/ work can then be helpful to an entirely
/new/ generation of users! Never underestimate the possible audience of
something like this! It could end up being useful to way more people than
any of the original participants ever considered likely, or in some cases
even possible. =8^)
So certainly, thanks, even if I /am/ hoping that with luck, I'll never
actually have to use this again myself. =8^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Thread overview: 16+ messages
2007-11-19 10:05 [gentoo-amd64] not amd64 specific - disk failure Raffaele BELARDI
2007-11-19 11:52 ` Beso
2007-11-19 13:27 ` Raffaele BELARDI
2007-11-19 13:56 ` Beso
2007-11-19 17:14 ` [gentoo-amd64] " Duncan
2007-11-20 7:47 ` Raffaele BELARDI
2007-11-20 9:01 ` Beso
2007-11-20 10:06 ` Raffaele BELARDI
2007-11-20 9:41 ` Duncan
2007-11-20 10:25 ` Raffaele BELARDI
2007-11-21 9:24 ` Raffaele BELARDI
2007-11-21 10:20 ` Raffaele BELARDI
2007-11-22 9:36 ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
2007-11-23 10:07 ` Duncan
2007-11-21 14:45 ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
2007-11-21 15:24 ` Raffaele BELARDI