public inbox for gentoo-amd64@lists.gentoo.org
* [gentoo-amd64] not amd64 specific - disk failure
@ 2007-11-19 10:05 Raffaele BELARDI
  2007-11-19 11:52 ` Beso
  0 siblings, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-19 10:05 UTC
  To: gentoo-amd64

Yesterday evening I had one 250Gb SATA disk Maxtor MaXLine Plus II fail.
The drive is formatted as ext3, single partition (sdc1), no RAID, used
as an archive of divx movies, completely full with data. Motherboard is
ASUS M2NPV-VM (Nvidia Nforce 430 chipset), I can easily mount it on an
ASUS K8V SE (Via K8T800 chipset) if it helps.

At boot the syslog shows (more or less):

I/O buffer read error: logical block 0
I/O buffer read error: logical block 1

Any attempt to mount /dev/sdc1 results in dozens of the above messages
(plus other details I don't remember right now) and finally fails.

fdisk -l shows the partition table as it should be.

It was late night so I gave up. Are there any chances to recover my data
by e.g. specifying a different superblock (whatever that is)? Any links
to help me?

thanks,

raffaele

PS I bought the drive in 2005 and I've used it only to archive movies,
so very little. It's the last Maxtor I buy (ok, also because it's Seagate
now..)
-- 
gentoo-amd64@gentoo.org mailing list




* Re: [gentoo-amd64] not amd64 specific - disk failure
  2007-11-19 10:05 [gentoo-amd64] not amd64 specific - disk failure Raffaele BELARDI
@ 2007-11-19 11:52 ` Beso
  2007-11-19 13:27   ` Raffaele BELARDI
  0 siblings, 1 reply; 16+ messages in thread
From: Beso @ 2007-11-19 11:52 UTC
  To: gentoo-amd64


before doing anything to the disk, read everything first and then decide
which option may help you. on my reiserfs filesystem, the included utils
were enough to let me recover about 98% of the data after a full index
rebuild. if your filesystem is journaled, then i think you should
be able to recover it. it may also be only a superblock problem, and in
that case the second link might help you more. in any case, get a disk large
enough to hold all the data that was on the failed one, since you won't
want to work on the failed disk itself, to avoid further loss of data.
try reading this (the ext2/3 part):
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/
and this:
http://forums.gentoo.org/viewtopic-t-569462-highlight-ext3+recover.html
or you might try this utility:
http://www.cgsecurity.org/wiki/PhotoRec



-- 
dott. ing. beso


* Re: [gentoo-amd64] not amd64 specific - disk failure
  2007-11-19 11:52 ` Beso
@ 2007-11-19 13:27   ` Raffaele BELARDI
  2007-11-19 13:56     ` Beso
  2007-11-19 17:14     ` [gentoo-amd64] " Duncan
  0 siblings, 2 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-19 13:27 UTC
  To: gentoo-amd64

Beso,

thanks for the links, I've already started reading. I've also got a new
drive to copy the recovered data to (if any, fingers crossed...)

Most of the resources I've read up to now imply that e.g. /dev/sdc1 is
detected and a 'bad superblock' message displayed when attempting to mount.

In my case the kernel is unable to detect /dev/sdc1; after a long list
of read errors like those in my first message it ends up with only /dev/sdc.

Does this look like superblock issue, or something worse?

thanks,

raffaele


* Re: [gentoo-amd64] not amd64 specific - disk failure
  2007-11-19 13:27   ` Raffaele BELARDI
@ 2007-11-19 13:56     ` Beso
  2007-11-19 17:14     ` [gentoo-amd64] " Duncan
  1 sibling, 0 replies; 16+ messages in thread
From: Beso @ 2007-11-19 13:56 UTC
  To: gentoo-amd64


2007/11/19, Raffaele BELARDI <raffaele.belardi@st.com>:
>
> Beso,
>
> thanks for the links, I've already started reading. I've also got a new
> drive to copy the recovered data (if any, cross fingers...)
>
> Most of the resources I've read up to now imply that e.g. /dev/sdc1 is
> detected and a 'bad superblock' message displayed when attempting to
> mount.
>
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
>
> Does this look like superblock issue, or something worse?
>
> thanks,
>
> raffaele


try a df /dev/sdc and see if it recognizes anything. if it does,
then you might really have a superblock issue that could be
corrected according to this link:
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/
if it doesn't give you valid filesystem output, then maybe it's not a
superblock issue and recovering from the partition will be a bit more
complicated.


* [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-19 13:27   ` Raffaele BELARDI
  2007-11-19 13:56     ` Beso
@ 2007-11-19 17:14     ` Duncan
  2007-11-20  7:47       ` Raffaele BELARDI
  1 sibling, 1 reply; 16+ messages in thread
From: Duncan @ 2007-11-19 17:14 UTC
  To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47418F30.6050607@st.com,
excerpted below, on  Mon, 19 Nov 2007 14:27:12 +0100:

> Most of the resources I've read up to now imply that e.g. /dev/sdc1 is
> detected and a 'bad superblock' message displayed when attempting to
> mount.
> 
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
> 
> Does this look like superblock issue, or something worse?

If you have a spare drive of the same size or larger, you can try dd, or
probably better yet, emerge dd-rescue and try it.  They copy a file or
part of one, in this case an entire block device, from one location to
another, "raw".  What you want to do is copy the entire bad device,
/dev/sdc above, to the new device.  Then you have a copy to play around with
without worrying about making the bad device worse before you get
whatever you were trying to get off of it, off.

dd-rescue is different than dd in that if there are bad blocks, it will 
run until it starts hitting them, then it will work backwards from the 
other end until it hits them there, then it'll try blocks in the middle.  
Thus, if you have good blocks, bad blocks, good blocks, bad blocks, good 
blocks, dd-rescue recovers more of the disk in a reasonable amount of 
time, as compared to straight dd, which will try straight thru only.

(The problem is that once you start hitting bad blocks, everything slows 
down, because the system tries and retries the bad block multiple times 
before giving up, taking minutes to read a block or finally decide it 
can't, before moving on, where it'd read a good block in seconds.  Thus, 
to work thru even a few hundred bad blocks can take DAYS.  Giving up 
after a few and starting from the other end, then trying in the middle, 
increases the number of blocks recovered, without sitting there waiting 
for days for it to work thru all the bad ones and get to the rest of the 
good ones again, and that's what dd-rescue does for you, automates the 
give up and try from the other end and then in the middle stuff.)

The thing to remember when working with either program is that you want 
to be **VERY** **SURE** you get the right devices specified, particularly 
for the output device.  If you tell it to write to the device that your 
main system is on instead of the new empty device, it WILL overwrite your 
main system device, boot record, partition table, and all.  Thus, make 
TRIPLE SURE you have the right output device specified before you hit 
that enter key.
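
As an illustration of the "check the devices first" advice, here is a minimal sketch of a dry-run helper that only prints the rescue commands, so they can be read over before anything is executed. It assumes GNU ddrescue, which may not be the same tool as the "dd-rescue" package mentioned above and may take different options; the function name and the paths are placeholders. The destination here is an image file; writing to a whole device instead would additionally need ddrescue's force option.

```shell
#!/bin/sh
# Dry-run helper: prints the rescue commands instead of running them,
# so the source and destination can be checked by eye first.
# Assumes GNU ddrescue; rescue_plan and all paths are hypothetical.
rescue_plan() {
    src=$1
    dst=$2
    map=$3
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and destination are the same" >&2
        return 1
    fi
    # Pass 1: copy the easily readable areas, skipping the slow scraping phase.
    echo "ddrescue -n $src $dst $map"
    # Pass 2: return to the bad areas with direct access and up to 3 retry passes.
    echo "ddrescue -d -r3 $src $dst $map"
}

rescue_plan /dev/sdc /mnt/disk_500/sdc.img /mnt/disk_500/sdc.map
```

The map file is what lets a later run pick up where an earlier one stopped, instead of starting over on a drive that may not survive many passes.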

The reason I'm suggesting dd/dd-rescue is because they'll grab the raw
data (what they can of it) directly off of the device you point them at
(/dev/sdc above, but as I said, be absolutely sure the devices aren't
reordered after you attach the new one).  Since you can't mount the
partition, you need something that can grab the info off of the 
unpartitioned drive itself.  Then, once you get a copy to work with, you 
can check what fdisk says about it and go from there.

In fact, if the data is worth it and you have the money, you may want to 
get TWO replacement drives, one to make a "safe" copy to, and a second to 
make a working copy (from the safe copy) to.  Then you play with the 
working copy, and if you screw up, you can simply recopy from the safe 
copy, without having to go back to the damaged drive -- because it's 
possible you'll only get the one chance to get stuff off of the damaged 
one.

If the damage is severe enough that dd-rescue can't pull anything off,
consider wrapping the drive in paper (for padding and moisture absorption)
then plastic (preferably double- or triple-wrapped), then put it in your
freezer overnight.  There are quite a number of tales of folks that had 
dead drives that they were able to revive long enough by freezing them, 
to get the stuff or at least part of the stuff off that they needed to.  
Keep in mind that as soon as you take it out and plug it in, it'll start 
warming up, and you may only have a few minutes, if it's bad enough.  In 
that case, if you have just a few smaller files you really want, you can 
hope the filesystem is usable again, and you may have time enough to 
retrieve them.  If that's not the case, and you need to go for the bulk, 
then you can write down how far you get, then try freezing it again, and 
tell dd-rescue to start where it stopped the second time.  Of course, you 
may have a limited number of times even freezing the drive will work, and 
you likely won't recover the entire thing this way, but if some is better 
than none...

There are special forensics LiveCD distributions out there.  Try STD and 
INSERT (google them, that's how I found them), both based off KNOPPIX, I 
believe.  INSERT is small enough to fit on the small credit/business-card 
sized CD, 180 MB or some such.  STD is a full-sized CD-image, basically a 
normal KNOPPIX only with enough stuff removed to load the extra forensics/
recovery/etc tools.  It even still has some games (Frozen Bubble, etc) on 
it.  They may help you recover something workable off the image you 
copied over with dd-rescue.  Of course, since they have a bunch of 
programs, including AV and other MS Windows recovery stuff (there's a 
nice MS eXPrivacy password blanker utility on there, my boss ran into 
trouble, hadn't created a password reset disk, and I had to use it on his 
box, yes, it was his), and network troubleshooting stuff as well, you'll 
probably want to grab them and play around with them a bit to see what 
they are like, before actually starting to work on recovering your data.

Hope that's useful.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-19 17:14     ` [gentoo-amd64] " Duncan
@ 2007-11-20  7:47       ` Raffaele BELARDI
  2007-11-20  9:01         ` Beso
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-20  7:47 UTC
  To: gentoo-amd64

Duncan wrote:
> If you have a spare drive of the same size or larger, you can try dd, or 
> probably better yet, merge dd-rescue and try it.  They copy a file or 
> part of one, in this case an entire block device, from one location to 
> another, "raw".  What you want to do is copy the entire bad device, /dev/
> sdc above, to the new device.  Then you have a copy to play around with 
> without worrying about making the bad device worse before you get 
> whatever you were trying to get off of it, off.
> 
Duncan,

thanks for the ddrescue explanation, I will surely give it a try.

Yesterday evening I got a new drive double the size of the damaged one,
created a 250Gb partition on it and tried:
# dd if=/dev/hdb of=/mnt/disk_500/sdb.img

It stopped after a few kB due to read errors. So I changed it to:
# dd conv=noerror if=/dev/hdb of=/mnt/disk_500/sdb.img

and after some time I got a 250Gb sdb.img on the new drive.
Then started the fun (it was already past midnight). When I created the
new partition I noted down the superblock backup locations.
Unfortunately, every:
# e2fsck -b xxx -B 4096 /mnt/disk_500/sdb.img

returned 'bad superblock'. After googling for some utility to scan disc
for superblocks, I ended up with testdisk (it's ~amd64). To my
understanding this works on real HW only, so I had to reconnect the
damaged HD and let it do its job. testdisk found the superblocks, but
according to it they were at the exact locations I had already noted, so
no help. I also tried to let it search for partitions because I read it
has an option to parse the directory. It worked, it let me see the list
of lost files, but that's all, it has no option to recover. But at least
it told me there is some good superblock somewhere.

Finally I went back to the sdb.img and used "od | less" to look at what
was present at the superblock location. What I saw was, I believe, a
part of the superblock (an almost regular pattern of numbers, increasing,
which could be a list of blocks? I need to study ext2) but the point is
that this pattern began well before the 'theoretical address' of the
superblock.
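
Rather than paging through od output, a candidate offset can be checked for the ext2/ext3 magic number directly. The sketch below relies on the on-disk layout (magic 0xEF53 stored little-endian, 56 bytes into the superblock); the function name is hypothetical, and the demo runs on a scratch file rather than on the real image.

```shell
#!/bin/sh
# Print the two bytes where an ext2/ext3 superblock keeps its magic number.
# $1 = image file, $2 = byte offset of the candidate superblock.
# The magic 0xEF53 is stored little-endian 56 bytes into the superblock,
# so a genuine superblock shows "53ef" here.
sb_magic() {
    od -An -tx1 -j "$(( $2 + 56 ))" -N 2 "$1" | tr -d ' \n'
}

# Demo on a scratch file: a fake magic at byte 1024 + 56; byte 1024 is where
# the primary superblock starts inside a partition.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=4 2>/dev/null
printf '\123\357' | dd of="$img" bs=1 seek=$((1024 + 56)) conv=notrunc 2>/dev/null
sb_magic "$img" 1024
echo
```

On the real image the second argument would be the byte offset being tested, for example a backup superblock location multiplied by the block size.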

So my hypothesis is that the bad blocks or sectors at the beginning of
the partition were not copied, or only partly copied, by dd, and due to
this the superblocks are all shifted down. Although I don't like to
access the hardware again, maybe I should try:
# dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img

to get an aligned image. Problem is I don't know what bs= should be.
Block size, so 4k?

Any other option I might have?

thanks,

raffaele

* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
  2007-11-20  7:47       ` Raffaele BELARDI
@ 2007-11-20  9:01         ` Beso
  2007-11-20 10:06           ` Raffaele BELARDI
  2007-11-20  9:41         ` Duncan
  2007-11-21  9:24         ` Raffaele BELARDI
  2 siblings, 1 reply; 16+ messages in thread
From: Beso @ 2007-11-20  9:01 UTC
  To: gentoo-amd64


2007/11/20, Raffaele BELARDI <raffaele.belardi@st.com>:
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?


this should tell you what the block size is:
df /dev/sdc
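
Since df reports on mounted filesystems, it may not show much for a partition that refuses to mount. The usual tools for this are dumpe2fs -h or tune2fs -l, which print a "Block size" line when they can read a superblock. As a rough sketch, the same field can also be read straight from a superblock in an image file. The function name is hypothetical, and the demo uses a scratch file with a hand-made field rather than real data.

```shell
#!/bin/sh
# Read the block size from an ext2/ext3 superblock inside an image file.
# $1 = image file, $2 = byte offset of the superblock (1024 for the primary
# one when the image begins at the start of the partition).
# s_log_block_size is a little-endian 32-bit field 24 bytes into the
# superblock; real values are tiny, so reading its low byte is enough here.
sb_block_size() {
    log=$(od -An -tu1 -j "$(( $2 + 24 ))" -N 1 "$1" | tr -d ' \n')
    echo "$(( 1024 << log ))"
}

# Demo on a scratch file with a hand-made field value of 2 (1024 << 2).
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=4 2>/dev/null
printf '\002' | dd of="$img" bs=1 seek=$((1024 + 24)) conv=notrunc 2>/dev/null
sb_block_size "$img" 1024
```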




-- 
dott. ing. beso


* [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-20  7:47       ` Raffaele BELARDI
  2007-11-20  9:01         ` Beso
@ 2007-11-20  9:41         ` Duncan
  2007-11-20 10:25           ` Raffaele BELARDI
  2007-11-21  9:24         ` Raffaele BELARDI
  2 siblings, 1 reply; 16+ messages in thread
From: Duncan @ 2007-11-20  9:41 UTC
  To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47429114.5030102@st.com,
excerpted below, on  Tue, 20 Nov 2007 08:47:32 +0100:

> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try: # dd conv=noerror,sync bs=4096
> if=/dev/hdb of=/mnt/disk_500/sdb.img
> 
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
> 
> Any other option I might have?

This sounds reasonable.  I run reiserfs here and don't know a whole lot 
about ext2/3/4, so won't even attempt an opinion at that level of 
detail.  (That's why I left the actual recovery procedure after creating 
the copy to work with so vague... I wasn't going to try to go there.)

However, I can say this.  Based on my experience with recovery on 
reiserfs (and in fact reiserfs and dd-rescue recovery notes, so it's not 
just me), the block-size doesn't necessarily have to match, as it does 
copy over "raw", so the data it gets it gets, and the data it doesn't, 
well...  It keeps it in the same order serially, as well, so that's not 
an issue.  What the block-size DOES affect is how much data is operated 
on at once -- when it reaches bad blocks, that's the unit that's going to 
determine the amount of missing data.

Working on a good disk, a relatively large block size (as long as it can 
be buffered in memory) is often more efficient, that is, faster, because 
the big blocks mean lower processing overhead.  On a partially bad disk, 
larger blocks will still allow it to cover the good area faster (but 
that's trivial time anyway, compared to the time trying to access the bad 
blocks), AND because the block size is larger, it SHOULD mean fewer bad
blocks to try and try and try before giving up in the bad areas too, so 
faster there as well.

The flip side to the faster access over the bad areas is that as I said, 
that's the chunk size that's declared bad, so the larger the block size 
you choose, the more potentially recoverable data gets declared bad when 
the entire block is declared bad.

As for working off the bad disk vs working off an image of it, as long as 
you can continue to recover data off the bad disk, you can keep trying to 
use it.  The problem, of course, is that every access might be your last, 
and it's also possible that each time thru may lose a few more blocks of 
data at the margin.

So it's up to you.  The aligned image will certainly be easier to work 
with, but you might not be able to get the same amount of valid data off.

... You never mentioned exactly what happened to the disk.  Mine was 
overheating.  I live in Phoenix, AZ, and my AC went out in the middle of 
the summer, with me gone and the computer left running.  With outside 
temps often reaching close to 50 C (122 F), the temps inside with the AC 
off could have easily reached 60 C (140 F).  Ambient case air temps could 
therefore have reached 70 C, and with the drive spinning in that... one 
can only guess what temps it reached!

Well, rather obviously, the platters expanded and the heads crashed, 
grooving out a circle in the platter at whatever location they were at at 
the time, plus wherever the still operating system told the heads to seek 
to.  However, once I came home and realized what had happened, I shut 
down and let everything cool down.  After replacing the AC, with 
everything running normal temps again, I was able to boot back up.

I ended up with two separate heavily damaged areas in which I could 
recover little if anything, but fortunately, the partition table and 
superblocks were intact.  I also had been running backup partition copies 
of most of my valuable stuff, by partition, and was able to recover most 
of it from that (barring the new stuff since my last backup, which was 
longer ago than it should have been), since they had been unmounted at
the time and therefore didn't have the heads seeking into them, only 
across them a few times.

Actually, perhaps surprisingly, I was able to run those disks for some 
time without any known additional damage.  I did switch disks as soon as 
possible, because I was leery of continuing to depend on the partially 
bad ones, but in the mean time, I just checked off the affected 
partitions as dead, and continued to use the others without issue.  In 
fact, I still have the disk, and might still be using it for extra 
storage, except that was the second disk I had lost in two years (looking 
back, the one I'd lost the previous year was probably heat related as 
well, as it had the same failure pattern, and the AC wasn't doing so well 
even then), and I decided to switch to RAID and go slower speed but 
longer warranty (5 yr) Seagate drives.  Those are now going into their
third year, without issue (and with a new AC with cooling capacity to 
spare, so hopefully it'll be several years before I need to worry about 
/that/ issue again), but at least now I have the RAID backing me up, with 
most of the system on kernel/md RAID-6, so I can lose up to two of the 
four drives and maintain data integrity.  I am, however, already thinking 
about how I'll do it better next time, now that I've a bit of RAID 
experience under my belt. =8^)

So anyway, if it was heat related, chances are pretty decent it'll remain 
relatively stable, no additional data loss, as long as you keep pretty 
strict watch on the temps and don't let it overheat again.  That was my 
experience this last time, when I know it was heat related, and the time 
before, which had the same failure pattern, so I'm guessing it was heat 
related.  Of course, you never can tell, but that has been my experience 
with heat related disk failures, anyway.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
  2007-11-20  9:01         ` Beso
@ 2007-11-20 10:06           ` Raffaele BELARDI
  0 siblings, 0 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-20 10:06 UTC
  To: gentoo-amd64

Beso wrote:
> 
>     to get an aligned image. Problem is I don't know what bs= should be.
>     Block size, so 4k?
> 
> 
> this should tell you what the block size is:
> df /dev/sdc 
> 

Beso,

the block size in the filesystem is 4k, this I know having formatted a
second HD with same sized partition as failed HD.

I don't know how much data I should tell dd to skip (and NUL-fill) in
case of a read error. From the dd output I only know there was a read error
and only 32 kB were transferred, but I have no idea how big the
block of data dd was trying to access was.

But probably, as Duncan suggests, this is not important. dd will try to
access a block of the size specified on the command line, and if an error
occurs it will zero-fill and jump to the next block of data, but this size
is totally unrelated to the _filesystem_ block size.
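
The padding side of this can be seen without a failing disk. A small sketch, assuming a dd that supports conv=sync (GNU dd does): conv=sync pads every short input block with NUL bytes up to bs, which is what keeps later data at its original offset when conv=noerror lets dd continue past a read error. The function name is hypothetical.

```shell
#!/bin/sh
# conv=sync pads each short input block with NUL bytes up to bs.
# Combined with conv=noerror, a block that cannot be read is replaced by
# bs bytes of padding, so everything after it keeps its original offset.
padded_size() {
    # $1 = payload, $2 = block size; prints the size of dd's output in bytes
    printf '%s' "$1" | dd bs="$2" conv=sync 2>/dev/null | wc -c | tr -d ' '
}

padded_size abc 8       # 3 payload bytes plus 5 NUL bytes of padding
padded_size abc 4096    # the same payload padded to a full 4096-byte block
```

This also shows the trade-off: whatever bs is chosen is the amount of image that gets replaced by padding for each read that fails.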

raffaele

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-20  9:41         ` Duncan
@ 2007-11-20 10:25           ` Raffaele BELARDI
  0 siblings, 0 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-20 10:25 UTC
  To: gentoo-amd64

Duncan wrote:
> Raffaele BELARDI <raffaele.belardi@st.com> posted 47429114.5030102@st.com,
> excerpted below, on  Tue, 20 Nov 2007 08:47:32 +0100:
> 
> ... You never mentioned exactly what happened to the disk.  Mine was 
> overheating.  I live in Phoenix, AZ, and my AC went out in the middle of 
> the summer, with me gone and the computer left running.  With outside 
> temps often reaching close to 50 C (122 F), the temps inside with the AC 
> off could have easily reached 60 C (140 F).  Ambient case air temps could 
> therefore have reached 70 C, and with the drive spinning in that... one 
> can only guess what temps it reached!

Duncan,

I never get those high temperatures here, the highest I've seen at home
is ~30 C (86 F), no AC. I don't think the failure is temperature
related, I always mount a 12cm fan in front of the HD enclosure, and
the HDs aren't even warm.

Previously the same HD was mounted in a different box, still ASUS mobo
but VIA chipset instead of NVidia, and different BIOS. In that box the
disk was not automounted from fstab and if not mounted manually it was
normally completely cold even without the 12cm fan running, so I suspect
it was not even spinning (low power mode from BIOS?). I mounted it only
to archive movies after viewing them. In the Nvidia box behavior is
different, the disk warms up even if not mounted, but with the 12cm fan
always running the temp is low.

I have no explanation for this failure. The disk was used really little.
After (if) I recover data I'll try to read the HD SMART database, if I
remember well there are some useful counters there.

The HD is less than 3 years old and should still be under warranty, I'll
try to get a replacement from Seagate. But my movies are more important
now!

I'm sure it's not related, but the irony is that just a few hours before
the failure I had run e2fsck on that partition, and it reported no
errors.

raffaele

PS Thanks for the hint on block size.

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-20  7:47       ` Raffaele BELARDI
  2007-11-20  9:01         ` Beso
  2007-11-20  9:41         ` Duncan
@ 2007-11-21  9:24         ` Raffaele BELARDI
  2007-11-21 10:20           ` Raffaele BELARDI
  2007-11-21 14:45           ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
  2 siblings, 2 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21  9:24 UTC
  To: gentoo-amd64

Raffaele Belardi wrote:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
> 
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
> 

So, I re-created an image of the disk with the above. The first 512
bytes contain an MBR (I recognize the aa55 signature), then lots of
nulls until what seems to be part of the first superblock, obviously
unusable. I manually searched for another superblock by looking for the
magic number and found one at 0x8007e00. According to mkfs the first sb
backup should be at block 32768, so byte 0x8000000. Thus the image is
shifted up by 0x7e00 bytes (probably the sum of MBR+Grub stage 1.5,
although the numbers do not correspond with another drive I used to
check).
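
The shift can be double-checked with plain shell arithmetic. This is only
a sketch, assuming the 4096-byte block size discussed in this thread and
512-byte sectors; the variable names are illustrative.

```shell
#!/bin/sh
# Byte position in the whole-disk image where the backup superblock was found.
sb_found=$((0x8007e00))
# Expected position relative to the partition start: block 32768 with an
# assumed 4096-byte filesystem block size.
block_size=4096
backup_block=32768
sb_expected=$((backup_block * block_size))
# The difference is how far into the image the partition begins.
shift_bytes=$((sb_found - sb_expected))
shift_sectors=$((shift_bytes / 512))
printf 'shift: %d bytes (0x%x), %d sectors\n' \
    "$shift_bytes" "$shift_bytes" "$shift_sectors"
```

A result of 63 sectors would match the traditional start of the first
partition on an MBR-partitioned disk, so the gap may simply be the space
in front of the partition rather than anything Grub-related.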

Now the problem is how to tell e2fsck to use the sb at 0x8007e00. This
is not divisible by the block size (0x1000). I tried to specify a
different block size with -B 512, but it complains that it's not a legal
size. Should I trim the first 0x7e00 bytes of the image and use e2fsck
normally (with -b 32768)? If so, how can I remove the first 0x7e00 bytes
without re-reading the whole image from the damaged disk?
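
One possible way to drop the leading bytes is to copy from the image
file itself, so the damaged disk is never read again. A minimal sketch,
assuming 512-byte sectors and enough free space for a second copy; the
trim_image helper and the sdb1.img name are hypothetical.

```shell
#!/bin/sh
# trim_image SRC DST SECTORS
# Copy SRC to DST, skipping the first SECTORS 512-byte sectors.
# Only the image file is read; the original disk is not involved.
trim_image() {
    dd if="$1" of="$2" bs=512 skip="$3"
}

# Hypothetical usage for this thread (0x7e00 bytes = 63 sectors),
# followed by a read-only check against the first backup superblock:
#   trim_image /mnt/disk_500/sdb.img /mnt/disk_500/sdb1.img 63
#   e2fsck -n -b 32768 -B 4096 /mnt/disk_500/sdb1.img
```

A 512-byte block size keeps the skip count exact but is slow on a 250 GB
image, so this is a trade of speed for simplicity.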

Do I have any other options? Is it possible to fake the kernel into
using sdb.img as a disk image, i.e. mount it as a disk (not as a
partition), so that it automatically skips the first 0x7e00 bytes and
gives me an aligned first partition?

thanks,

raffaele

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-21  9:24         ` Raffaele BELARDI
@ 2007-11-21 10:20           ` Raffaele BELARDI
  2007-11-22  9:36             ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
  2007-11-21 14:45           ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
  1 sibling, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 10:20 UTC (permalink / raw
  To: gentoo-amd64

Raffaele Belardi wrote:
> Other options I may have? Is it possible to fake the kernel into using
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?

losetup

http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux

Google before you ask...

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-21  9:24         ` Raffaele BELARDI
  2007-11-21 10:20           ` Raffaele BELARDI
@ 2007-11-21 14:45           ` Billy Holmes
  2007-11-21 15:24             ` Raffaele BELARDI
  1 sibling, 1 reply; 16+ messages in thread
From: Billy Holmes @ 2007-11-21 14:45 UTC (permalink / raw
  To: gentoo-amd64

Quoting Raffaele BELARDI <raffaele.belardi@st.com>:

> Raffaele Belardi wrote:
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?

have you tried playing with losetup ?

this link might help

http://lists.samba.org/archive/linux/2004-December/012627.html

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure
  2007-11-21 14:45           ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
@ 2007-11-21 15:24             ` Raffaele BELARDI
  0 siblings, 0 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 15:24 UTC (permalink / raw
  To: gentoo-amd64

Billy Holmes wrote:
> Quoting Raffaele BELARDI <raffaele.belardi@st.com>:
> 
>> Raffaele Belardi wrote:
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
> 
> have you tried playing with losetup ?
> 
> this link might help
> 
> http://lists.samba.org/archive/linux/2004-December/012627.html
> 

Thanks. I had never heard of this command until this morning; there's
always something to learn. I'll give it a try tonight.

* Re: [gentoo-amd64]  Re: not amd64 specific - disk failure - SUCCESS
  2007-11-21 10:20           ` Raffaele BELARDI
@ 2007-11-22  9:36             ` Raffaele BELARDI
  2007-11-23 10:07               ` Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-22  9:36 UTC (permalink / raw
  To: gentoo-amd64

Raffaele Belardi wrote:
> Raffaele Belardi wrote:
>> Other options I may have? Is it possible to fake the kernel into using
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
> 
> losetup
> 
> http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux
> 
> Google before you ask...
> 

So, for the record, finally using the hints from the above link I
managed to recover most of the data. The successful commands were:

# losetup -o 32256 /dev/loop0 /mnt/disk_500/sdb.img
# mount -t ext2 /dev/loop0 /mnt/sdb_img -o sb=131072,ro
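
For anyone reusing this, the two numbers can be derived rather than
copied. The sketch below assumes the 0x7e00-byte (63-sector) shift and
the 4096-byte block size discussed earlier in the thread. It only prints
the commands for review, since running them needs root and a free loop
device.

```shell
#!/bin/sh
# Byte offset of the first partition inside the whole-disk image:
# 63 sectors of 512 bytes, i.e. 0x7e00.
offset=$((63 * 512))
# mount's sb= option counts in 1 KiB units, so filesystem block 32768
# with an assumed 4096-byte block size becomes 32768 * 4.
sb=$((32768 * 4096 / 1024))
# Print the commands instead of executing them.
echo "losetup -o $offset /dev/loop0 /mnt/disk_500/sdb.img"
echo "mount -t ext2 -o ro,sb=$sb /dev/loop0 /mnt/sdb_img"
```

Mounting read-only keeps the kernel from writing anything to the image
while data is being copied off.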

Thanks to all for the precious help.

raffaele

* [gentoo-amd64]  Re: not amd64 specific - disk failure - SUCCESS
  2007-11-22  9:36             ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
@ 2007-11-23 10:07               ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2007-11-23 10:07 UTC (permalink / raw
  To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47454D91.5030109@st.com,
excerpted below, on  Thu, 22 Nov 2007 10:36:17 +0100:

> So, for the record, finally using the hints from the above link I
> managed to recover most of the data. The successful commands were:
> 
> # losetup -o 32256 /dev/loop0 /mnt/disk_500/sdb.img
> # mount -t ext2 /dev/loop0 /mnt/sdb_img -o sb=131072,ro
> 
> Thanks to all for the precious help.

Thanks for the success story followup! =8^)

Besides being gratifying to know something that we helped with worked, 
knowing exactly /what/ worked can be most helpful next time someone sees 
the problem, either experiencing it one's self, or helping someone else 
with it, as here.  

Besides, with the list available on the web via gmane and the like, 
there's always the chance that someone with the same issue will find this 
discussion months or years later via Google or similar, and the followup 
report of what exactly /did/ work can then be helpful to an entirely /
new/ generation of users!  Never underestimate the possible audience of 
something like this! It could end up being useful to way more people than 
any of the original participants ever considered likely or in some cases, 
even possible.  =8^)

So certainly, thanks, even if I /am/ hoping that with luck, I'll never 
actually have to use this again myself. =8^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
