* [gentoo-amd64] not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-19 10:05 UTC
To: gentoo-amd64

Yesterday evening one of my disks failed: a 250 GB SATA Maxtor MaXLine Plus II. The drive is formatted as ext3, single partition (sdc1), no RAID, used as an archive of DivX movies, and completely full. The motherboard is an ASUS M2NPV-VM (Nvidia nForce 430 chipset); I can easily move the drive to an ASUS K8V SE (VIA K8T800 chipset) if that helps.

At boot the syslog shows (more or less):

I/O buffer read error: logical block 0
I/O buffer read error: logical block 1

Any attempt to mount /dev/sdc1 produces dozens of the above messages (plus other details I don't remember right now) and finally fails.

fdisk -l shows the partition table as it should be.

It was late at night so I gave up. Is there any chance of recovering my data by, e.g., specifying a different superblock (whatever that is)? Any links to help me?

thanks,

raffaele

PS I bought the drive in 2005 and I've used it only to archive movies, so very little. It's the last Maxtor I buy (ok, also because it's Seagate now..)

--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Beso @ 2007-11-19 11:52 UTC
To: gentoo-amd64

Before doing anything to the disk, read everything first and then decide which options may help you. On my reiserfs filesystem, the included utils were enough to let me recover about 98% of the data after a full index rebuild. If your filesystem is journaled, I think you should be able to recover it. It may also be only a superblock problem, and in that case the second link might help you more. In any case, get a disk that can hold all the data that was on the failed one, since you don't want to work on the failed disk itself and risk losing more data.

Try reading this (the ext2/3 part):
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/
and this:
http://forums.gentoo.org/viewtopic-t-569462-highlight-ext3+recover.html
or you might try this utility:
http://www.cgsecurity.org/wiki/PhotoRec

2007/11/19, Raffaele BELARDI <raffaele.belardi@st.com>:
> Any attempt to mount /dev/sdc1 results in tens of the above message
> (plus other details I don't remember right now) and finally fails.
> [...]
> Are there any chances to recover my data by e.g. specifying a different
> superblock (whatever that is)? Any links to help me?

--
dott. ing. beso
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-19 13:27 UTC
To: gentoo-amd64

Beso,

thanks for the links, I've already started reading. I've also got a new drive to copy the recovered data to (if any, fingers crossed...)

Most of the resources I've read up to now assume that e.g. /dev/sdc1 is detected and a 'bad superblock' message is displayed when attempting to mount.

In my case the kernel is unable to detect /dev/sdc1: after the long list of read errors quoted below it ends up with only /dev/sdc.

Does this look like a superblock issue, or something worse?

thanks,

raffaele

Beso wrote:
> if you've journaled your filesystem, then i think that you should be
> able to recover it. also, it may only be a problem of superblock and in
> that case the second link might help you more.
>
> 2007/11/19, Raffaele BELARDI <raffaele.belardi@st.com>:
>> At boot the syslog shows (more or less):
>>
>> I/O buffer read error: logical block 0
>> I/O buffer read error: logical block 1
* Re: [gentoo-amd64] not amd64 specific - disk failure
From: Beso @ 2007-11-19 13:56 UTC
To: gentoo-amd64

2007/11/19, Raffaele BELARDI <raffaele.belardi@st.com>:
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
>
> Does this look like superblock issue, or something worse?

Try a

df /dev/sdc

and see if it recognizes something. If it gives you something, then you might really have a superblock issue that could be corrected according to this link:
http://edseek.com/archives/2004/02/25/ext3-filesystem-bad-superblock-recovery/

If it doesn't give you valid filesystem output, then maybe it's not a superblock issue, and recovering from the partition is a bit more complicated.
* [gentoo-amd64] Re: not amd64 specific - disk failure
From: Duncan @ 2007-11-19 17:14 UTC
To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47418F30.6050607@st.com, excerpted below, on Mon, 19 Nov 2007 14:27:12 +0100:

> Most of the resources I've read up to now imply that e.g. /dev/sdc1 is
> detected and a 'bad superblock' message displayed when attempting to
> mount.
>
> In my case the kernel is unable to detect /dev/sdc1, after the long list
> of read errors below it ends up with only /dev/sdc.
>
> Does this look like superblock issue, or something worse?

If you have a spare drive of the same size or larger, you can try dd, or probably better yet, merge dd-rescue and try it. Both copy a file or part of one, in this case an entire block device, from one location to another, "raw". What you want to do is copy the entire bad device, /dev/sdc above, to the new device. Then you have a copy to play around with, without worrying about making the bad device worse before you get whatever you were trying to get off of it, off.

dd-rescue differs from dd in that if there are bad blocks, it will run until it starts hitting them, then work backwards from the other end until it hits them there, then try blocks in the middle. Thus, if you have good blocks, bad blocks, good blocks, bad blocks, good blocks, dd-rescue recovers more of the disk in a reasonable amount of time than straight dd, which will only try straight thru. (The problem is that once you start hitting bad blocks, everything slows down, because the system tries and retries each bad block multiple times before giving up, taking minutes to read a block or finally decide it can't before moving on, where it'd read a good block in seconds.
Thus, to work thru even a few hundred bad blocks can take DAYS. Giving up after a few and starting from the other end, then trying in the middle, increases the number of blocks recovered, without sitting there waiting for days for it to work thru all the bad ones and get to the rest of the good ones again. That's what dd-rescue does for you: it automates the give-up-and-try-from-the-other-end-and-then-in-the-middle stuff.)

The thing to remember when working with either program is that you want to be **VERY** **SURE** you get the right devices specified, particularly for the output device. If you tell it to write to the device that your main system is on instead of the new empty device, it WILL overwrite your main system device, boot record, partition table, and all. Thus, make TRIPLE SURE you have the right output device specified before you hit that enter key.

The reason I'm suggesting dd/dd-rescue is because they'll grab the raw data (what they can of it) directly off of the device you point them at (/dev/sdc above, but as I said, be absolutely sure the devices aren't reordered after you attach the new one). Since you can't mount the partition, you need something that can grab the info off of the unpartitioned drive itself. Then, once you get a copy to work with, you can check what fdisk says about it and go from there.

In fact, if the data is worth it and you have the money, you may want to get TWO replacement drives, one to make a "safe" copy to, and a second to make a working copy (from the safe copy) to. Then you play with the working copy, and if you screw up, you can simply recopy from the safe copy, without having to go back to the damaged drive -- because it's possible you'll only get the one chance to get stuff off of the damaged one.
If the damage is severe enough that dd-rescue can't pull anything off, consider wrapping the drive in paper (for padding and moisture absorption), then plastic (preferably double- or triple-wrapped), then putting it in your freezer overnight. There are quite a number of tales of folks that had dead drives that they were able to revive long enough by freezing them to get the stuff, or at least part of the stuff, off that they needed. Keep in mind that as soon as you take it out and plug it in, it'll start warming up, and you may only have a few minutes, if it's bad enough. In that case, if you have just a few smaller files you really want, you can hope the filesystem is usable again, and you may have time enough to retrieve them. If that's not the case, and you need to go for the bulk, then you can write down how far you get, then try freezing it again, and tell dd-rescue to start the second time from where it stopped. Of course, there may be a limit to how many times even freezing the drive will work, and you likely won't recover the entire thing this way, but if some is better than none...

There are special forensics LiveCD distributions out there. Try STD and INSERT (google them, that's how I found them), both based off KNOPPIX, I believe. INSERT is small enough to fit on the small credit/business-card sized CD, 180 MB or some such. STD is a full-sized CD image, basically a normal KNOPPIX only with enough stuff removed to load the extra forensics/recovery/etc tools. It even still has some games (Frozen Bubble, etc) on it. They may help you recover something workable off the image you copied over with dd-rescue.
Of course, since they have a bunch of programs, including AV and other MS Windows recovery stuff (there's a nice MS eXPrivacy password blanker utility on there; my boss ran into trouble, hadn't created a password reset disk, and I had to use it on his box, and yes, it was his), and network troubleshooting stuff as well, you'll probably want to grab them and play around with them a bit to see what they are like before actually starting to work on recovering your data.

Hope that's useful.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
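The warning above about choosing the right output device can be backed by a small pre-flight check. The following is only a sketch: the helper name is_mounted and the placeholder /dev/sdX are illustrative and not from the thread, and it assumes a Linux-style mounts table at /proc/mounts. It catches just one kind of mistake (writing to a disk that is in use) and does not replace checking the device by hand.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if the given device,
# or any partition whose name starts with it, is listed as a mount source.
# $1 = device path, $2 = mounts table (defaults to /proc/mounts on Linux).
is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 }
                     END { exit (found ? 0 : 1) }' "${2:-/proc/mounts}"
}

# Refuse to go on if the intended dd output device is in use.
# /dev/sdX is a placeholder, not a real device name.
target=/dev/sdX
if is_mounted "$target"; then
    echo "refusing: $target (or one of its partitions) is mounted" >&2
else
    echo "$target is not mounted; still confirm it with fdisk -l before writing"
fi
```

A mounted target is almost certainly the wrong one; an unmounted target may still be the wrong disk, so the size and partition table reported by fdisk -l remain the real confirmation.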
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 7:47 UTC
To: gentoo-amd64

Duncan wrote:
> If you have a spare drive of the same size or larger, you can try dd, or
> probably better yet, merge dd-rescue and try it. They copy a file or
> part of one, in this case an entire block device, from one location to
> another, "raw". What you want to do is copy the entire bad device,
> /dev/sdc above, to the new device. Then you have a copy to play around
> with without worrying about making the bad device worse before you get
> whatever you were trying to get off of it, off.

Duncan,

thanks for the ddrescue explanation, I will surely give it a try.

Yesterday evening I got a new drive double the size of the damaged one, created a 250 GB partition on it and tried:

# dd if=/dev/hdb of=/mnt/disk_500/sdb.img

It stopped after a few kB due to read errors. So I changed it to:

# dd conv=noerror if=/dev/hdb of=/mnt/disk_500/sdb.img

and after some time I got a 250 GB sdb.img on the new drive.

Then the fun started (it was already past midnight). When I created the new partition I noted down the superblock backup locations. Unfortunately, every:

# e2fsck -b xxx -B 4096 /mnt/disk_500/sdb.img

returned 'bad superblock'. After googling for some utility to scan the disk for superblocks, I ended up with testdisk (it's ~amd64). To my understanding this works on real HW only, so I had to reconnect the damaged HD and let it do its job. testdisk found the superblocks, but according to it they were at the exact locations I had already noted, so no help. I also let it search for partitions, because I read it has an option to parse the directory. It worked, it let me see the list of lost files, but that's all, it has no option to recover.
But at least it told me there is some good superblock somewhere.

Finally I went back to sdb.img and used "od | less" to look at what was present at the superblock location. What I saw was, I believe, a part of the superblock (an almost regular pattern of increasing numbers, which could be a list of blocks? I need to study ext2), but the point is that this pattern began well before the 'theoretical address' of the superblock.

So my hypothesis is that the bad blocks or sectors at the beginning of the partition were not copied, or only partly copied, by dd, and because of this the superblocks are all shifted down. Although I don't like accessing the hw again, maybe I should try:

# dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img

to get an aligned image. The problem is I don't know what bs= should be. Block size, so 4k?

Any other option I might have?

thanks,

raffaele
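The difference between conv=noerror and conv=noerror,sync can be seen on an ordinary file, without touching the damaged disk. This is a sketch under one assumption: a short read stands in for a failed read, since read errors cannot be reproduced on a healthy file. With sync, dd pads every short input block with NULs up to bs bytes, which is what keeps later data at its original offset when a block cannot be read.

```shell
#!/bin/sh
# Show dd's conv=sync padding on a scratch file (no real disk involved).
work=$(mktemp -d)

# 5000 bytes of input: one full 4096-byte block plus a 904-byte short read.
dd if=/dev/zero of="$work/in.bin" bs=1 count=5000 2>/dev/null

# Without sync the short block is written as read, so the copy is 5000 bytes.
dd if="$work/in.bin" of="$work/plain.img" bs=4096 conv=noerror 2>/dev/null

# With sync the short block is NUL-padded to a full 4096 bytes.
dd if="$work/in.bin" of="$work/synced.img" bs=4096 conv=noerror,sync 2>/dev/null

plain_size=$(wc -c < "$work/plain.img" | tr -d ' ')
synced_size=$(wc -c < "$work/synced.img" | tr -d ' ')
echo "plain:  $plain_size bytes"
echo "synced: $synced_size bytes"
rm -rf "$work"
```

The padded copy is a whole number of blocks long, so every block that was read keeps the offset it had on the source.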
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Beso @ 2007-11-20 9:01 UTC
To: gentoo-amd64

2007/11/20, Raffaele BELARDI <raffaele.belardi@st.com>:
> Although I don't like to access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?

This should tell you what the block size is:

df /dev/sdc

--
dott. ing. beso
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 10:06 UTC
To: gentoo-amd64

Beso wrote:
>> to get an aligned image. Problem is I don't know what bs= should be.
>> Block size, so 4k?
>
> this should tell you what the block size is:
> df /dev/sdc

Beso,

the block size of the filesystem is 4k; this I know from having formatted a second HD with a partition of the same size as the one on the failed HD.

What I don't know is how much data I should tell dd to skip (and NUL-fill) in case of a read error. From the dd output I only know there was a read error and only 32 kB were transferred, but I have no idea how big the block of data was that dd was trying to access.

But probably, as Duncan suggests, this is not important. dd will try to read blocks of the size specified on the command line, and if an error occurs it will zero-fill and jump to the next block of data, but this size is totally unrelated to the _filesystem_ block size.

raffaele
* [gentoo-amd64] Re: not amd64 specific - disk failure
From: Duncan @ 2007-11-20 9:41 UTC
To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47429114.5030102@st.com, excerpted below, on Tue, 20 Nov 2007 08:47:32 +0100:

> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
>
> Any other option I might have?

This sounds reasonable. I run reiserfs here and don't know a whole lot about ext2/3/4, so won't even attempt an opinion at that level of detail. (That's why I left the actual recovery procedure after creating the copy to work with so vague... I wasn't going to try to go there.)

However, I can say this. Based on my experience with recovery on reiserfs (and in fact on reiserfs and dd-rescue recovery notes, so it's not just me), the block size doesn't necessarily have to match, as it does copy over "raw", so the data it gets it gets, and the data it doesn't, well... It keeps it in the same order serially, as well, so that's not an issue.

What the block size DOES affect is how much data is operated on at once -- when it reaches bad blocks, that's the unit that's going to determine the amount of missing data. Working on a good disk, a relatively large block size (as long as it can be buffered in memory) is often more efficient, that is, faster, because the big blocks mean lower processing overhead.
On a partially bad disk, larger blocks will still allow it to cover the good area faster (but that's trivial time anyway, compared to the time spent trying to access the bad blocks), AND because the block size is larger, it SHOULD mean fewer bad blocks to try and try and try before giving up in the bad areas too, so faster there as well. The flip side to the faster access over the bad areas is that, as I said, that's the chunk size that's declared bad, so the larger the block size you choose, the more potentially recoverable data gets declared bad when the entire block is declared bad.

As for working off the bad disk vs working off an image of it, as long as you can continue to recover data off the bad disk, you can keep trying to use it. The problem, of course, is that every access might be your last, and it's also possible that each time thru may lose a few more blocks of data at the margin. So it's up to you. The aligned image will certainly be easier to work with, but you might not be able to get the same amount of valid data off.

... You never mentioned exactly what happened to the disk. Mine was overheating. I live in Phoenix, AZ, and my AC went out in the middle of the summer, with me gone and the computer left running. With outside temps often reaching close to 50 C (122 F), the temps inside with the AC off could have easily reached 60 C (140 F). Ambient case air temps could therefore have reached 70 C, and with the drive spinning in that... one can only guess what temps it reached!

Well, rather obviously, the platters expanded and the heads crashed, grooving out a circle in the platter at whatever location they were at at the time, plus wherever the still operating system told the heads to seek to. However, once I came home and realized what had happened, I shut down and let everything cool down. After replacing the AC, with everything running normal temps again, I was able to boot back up.
I ended up with two separate heavily damaged areas in which I could recover little if anything, but fortunately, the partition table and superblocks were intact. I also had been keeping backup partition copies of most of my valuable stuff, by partition, and was able to recover most of it from those (barring the new stuff since my last backup, which was longer ago than it should have been), since they had been unmounted at the time and therefore didn't have the heads seeking into them, only across them a few times.

Actually, perhaps surprisingly, I was able to run those disks for some time without any known additional damage. I did switch disks as soon as possible, because I was leery of continuing to depend on the partially bad ones, but in the meantime, I just checked off the affected partitions as dead, and continued to use the others without issue.

In fact, I still have the disk, and might still be using it for extra storage, except that it was the second disk I had lost in two years (looking back, the one I'd lost the previous year was probably heat related as well, as it had the same failure pattern, and the AC wasn't doing so well even then), and I decided to switch to RAID and go with slower speed but longer warranty (5 yr) Seagate drives. Those are now going into their third year, without issue (and with a new AC with cooling capacity to spare, so hopefully it'll be several years before I need to worry about /that/ issue again), but at least now I have the RAID backing me up, with most of the system on kernel/md RAID-6, so I can lose up to two of the four drives and maintain data integrity. I am, however, already thinking about how I'll do it better next time, now that I've a bit of RAID experience under my belt. =8^)

So anyway, if it was heat related, chances are pretty decent it'll remain relatively stable, with no additional data loss, as long as you keep a pretty strict watch on the temps and don't let it overheat again.
That was my experience this last time, when I know it was heat related, and the time before, which had the same failure pattern, so I'm guessing it was heat related. Of course, you never can tell, but that has been my experience with heat related disk failures, anyway.

--
Duncan
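The block-size trade-off described in this message comes down to simple arithmetic. The sketch below assumes that each failed read costs exactly one block of bs bytes, which is a simplification; the helper name lost_bytes and the figure of 100 failed reads are purely illustrative.

```shell
#!/bin/sh
# Bytes declared bad (zero-filled) in the image for a number of failed reads,
# assuming every failed read costs exactly one dd block of bs bytes.
lost_bytes() {
    # $1 = dd block size in bytes, $2 = number of failed reads
    echo $(( $1 * $2 ))
}

for bs in 512 4096 65536 1048576; do
    echo "bs=$bs: 100 failed reads -> $(lost_bytes "$bs" 100) bytes zero-filled"
done
```

A small bs gives up less good data around each bad spot, at the cost of many more slow retries across a damaged area.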
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-20 10:25 UTC
To: gentoo-amd64

Duncan wrote:
> ... You never mentioned exactly what happened to the disk. Mine was
> overheating. I live in Phoenix, AZ, and my AC went out in the middle of
> the summer, with me gone and the computer left running. With outside
> temps often reaching close to 50 C (122 F), the temps inside with the AC
> off could have easily reached 60 C (140 F). Ambient case air temps could
> therefore have reached 70 C, and with the drive spinning in that... one
> can only guess what temps it reached!

Duncan,

I never get temperatures that high here; the highest I've seen at home is ~30 C (86 F), no AC. I don't think the failure is temperature related: I always mount a 12cm fan in front of the HD enclosure, and the HDs aren't even warm.

Previously the same HD was mounted in a different box, still an ASUS mobo but with a VIA chipset instead of Nvidia, and a different BIOS. In that box the disk was not automounted from fstab, and if not mounted manually it was normally completely cold even without the 12cm fan running, so I suspect it was not even spinning (low power mode from BIOS?). I mounted it only to archive movies after viewing them. In the Nvidia box the behavior is different: the disk warms up even if not mounted, but with the 12cm fan always running the temp is low.

I have no explanation for this failure. The disk was used really little. After (if) I recover the data I'll try to read the HD SMART database; if I remember well there are some useful counters there.

The HD is less than 3 years old and should still be under warranty, so I'll try to get a replacement from Seagate. But my movies are more important now!
I'm sure it's not related, but the irony is that just a few hours before the failure I had run an e2fsck on that partition, and it did not report a single error.

raffaele

PS Thanks for the hint on block size.
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
From: Raffaele BELARDI @ 2007-11-21 9:24 UTC
To: gentoo-amd64

Raffaele Belardi wrote:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access again the hw, maybe I should try:
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?

So, I re-created an image of the disk with the above. The first 512 bytes contain an MBR (I recognize the aa55 signature), then lots of nulls until what seems to be a part of the first superblock, obviously unusable. I manually searched for another superblock by looking for the magic number and found one at 0x8007e00. According to mkfs the first superblock backup should be at block 32768, so at byte 0x8000000. Thus the image is shifted up by 0x7e00 bytes (probably the sum of MBR + Grub stage 1.5, although the numbers do not correspond with another drive I used to check).

Now the problem is how to tell e2fsck to use the superblock at 0x8007e00. This is not divisible by the block size (0x1000); I tried to specify a different block size with -B 512, but it complains that this is not a legal size.

Should I trim the first 0x7e00 bytes of the image and use e2fsck normally (with -b 32768)? If so, how can I remove the first 0x7e00 bytes without re-reading the whole image from the damaged disk?

Other options I may have?
Is it possible to fake the kernel into using the sdb.img as a disk image, so mount it as a disk (not as a partition) so maybe it automatically skips the first 0x7e00 bytes and gives me an aligned first partition? thanks, raffaele -- gentoo-amd64@gentoo.org mailing list ^ permalink raw reply [flat|nested] 16+ messages in thread
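The offsets in this message can be checked with plain shell arithmetic. The sketch below is only an illustration: the variable names are made up here, the values are the ones quoted above, and the reading of 0x7e00 as 63 sectors (the customary start of the first partition on DOS-partitioned disks of that era) is an inference that the `fdisk -l` output would need to confirm.

```shell
#!/bin/sh
# Values reported in the message above; nothing here touches a disk or image.
found_sb=$((0x8007e00))   # byte offset where a superblock magic was found in the image
block_size=4096           # filesystem block size (0x1000)
backup_block=32768        # first backup superblock, in filesystem blocks, per mkfs

# Where that backup would sit if the image started at the partition itself.
expected_sb=$((backup_block * block_size))
# Bytes that precede the filesystem in the whole-disk image.
shift_bytes=$((found_sb - expected_sb))
# The same shift expressed in 512-byte sectors.
shift_sectors=$((shift_bytes / 512))

printf 'expected backup superblock at byte 0x%x\n' "$expected_sb"
printf 'shift: %d bytes (0x%x) = %d sectors of 512 bytes\n' \
    "$shift_bytes" "$shift_bytes" "$shift_sectors"
```

If the first partition does start at sector 63, the filesystem begins at byte 0x7e00 of the whole-disk image, which would explain why every superblock appears 0x7e00 bytes later than mkfs predicts.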
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
  2007-11-21 9:24 ` Raffaele BELARDI
@ 2007-11-21 10:20 ` Raffaele BELARDI
  2007-11-22 9:36 ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
  2007-11-21 14:45 ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
  1 sibling, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 10:20 UTC (permalink / raw)
To: gentoo-amd64

Raffaele Belardi wrote:
> Other options I may have? Is it possible to fake the kernel into using
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?

losetup

http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux

Google before you ask...

--
gentoo-amd64@gentoo.org mailing list
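Besides attaching the image at an offset with losetup, the leading bytes can also be dropped by copying the image with dd and `skip=`, which reads only the image file and never the damaged disk. The sketch below demonstrates the idea on a tiny scratch file rather than on sdb.img; the file names, the marker string, and the 63-sector figure (0x7e00 / 512) are assumptions for illustration.

```shell
#!/bin/sh
set -eu
# Scratch demonstration on a stand-in file; no real disk or image is touched.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Fake whole-disk image: 63 zeroed sectors followed by a short marker
# standing in for the first bytes of the partition.
dd if=/dev/zero of="$work/disk.img" bs=512 count=63 2>/dev/null
printf 'PARTITION-START' >> "$work/disk.img"

# Copy the image while skipping the first 63 sectors (63 * 512 = 0x7e00 bytes).
dd if="$work/disk.img" of="$work/part.img" bs=512 skip=63 2>/dev/null

first_bytes=$(cat "$work/part.img")
echo "partition image now begins with: $first_bytes"
```

On the real image this costs a second full-size copy, so an offset loop device is usually the cheaper choice when disk space is tight.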
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS
  2007-11-21 10:20 ` Raffaele BELARDI
@ 2007-11-22 9:36 ` Raffaele BELARDI
  2007-11-23 10:07 ` Duncan
  0 siblings, 1 reply; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-22 9:36 UTC (permalink / raw)
To: gentoo-amd64

Raffaele Belardi wrote:
> Raffaele Belardi wrote:
>> Other options I may have? Is it possible to fake the kernel into using
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
>
> losetup
>
> http://www.osdev.org/osfaq2/index.php/Disk%20Images%20Under%20Linux
>
> Google before you ask...
>

So, for the record: using the hints from the above link, I finally managed to recover most of the data. The successful commands were:

# losetup -o 32256 /dev/loop0 /mnt/disk_500/sdb.img
# mount -t ext2 /dev/loop0 /mnt/sdb_img -o sb=131072,ro

Thanks to all for the precious help.

raffaele

--
gentoo-amd64@gentoo.org mailing list
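A minimal sketch of where the numbers for losetup and mount come from, assuming the 0x7e00 shift, the 4 KiB block size, and the backup superblock at block 32768 reported earlier in the thread. The script only prints the commands and does not run them; the variable names are illustrative.

```shell
#!/bin/sh
# Derive the losetup offset and the mount sb= value from the thread's figures.
part_offset=$((0x7e00))   # bytes that precede the partition in the whole-disk image
fs_block_size=4096        # ext2/ext3 block size of the filesystem
backup_block=32768        # first backup superblock, counted in filesystem blocks

# mount's sb= option counts in 1 KiB units, whatever the filesystem block size is.
sb_1k=$((backup_block * fs_block_size / 1024))

echo "losetup -o $part_offset /dev/loop0 /mnt/disk_500/sdb.img"
echo "mount -t ext2 -o ro,sb=$sb_1k /dev/loop0 /mnt/sdb_img"
```

Mounting read-only as ext2 leaves the image unmodified and avoids replaying the ext3 journal, which seems a sensible precaution when the primary superblock is already damaged.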
* [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS
  2007-11-22 9:36 ` [gentoo-amd64] Re: not amd64 specific - disk failure - SUCCESS Raffaele BELARDI
@ 2007-11-23 10:07 ` Duncan
  0 siblings, 0 replies; 16+ messages in thread
From: Duncan @ 2007-11-23 10:07 UTC (permalink / raw)
To: gentoo-amd64

Raffaele BELARDI <raffaele.belardi@st.com> posted 47454D91.5030109@st.com, excerpted below, on Thu, 22 Nov 2007 10:36:17 +0100:

> So, for the record, finally using the hints from the above link I
> managed to recover most of the data. The successful commands were:
>
> # losetup -o 32256 /dev/loop0 /mnt/disk_500/sdb.img
> # mount -t ext2 /dev/loop0 /mnt/sdb_img -o sb=131072,ro
>
> Thanks to all for the precious help.

Thanks for the success story followup! =8^)

Besides it being gratifying to know that something we helped with worked, knowing exactly /what/ worked can be most helpful the next time someone sees the problem, whether experiencing it oneself or helping someone else with it, as here.

Besides, with the list available on the web via gmane and the like, there's always the chance that someone with the same issue will find this discussion months or years later via Google or similar, and the followup report of what exactly /did/ work can then be helpful to an entirely /new/ generation of users! Never underestimate the possible audience of something like this! It could end up being useful to way more people than any of the original participants ever considered likely or, in some cases, even possible. =8^)

So certainly, thanks, even if I /am/ hoping that, with luck, I'll never actually have to use this again myself. =8^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
  2007-11-21 9:24 ` Raffaele BELARDI
  2007-11-21 10:20 ` Raffaele BELARDI
@ 2007-11-21 14:45 ` Billy Holmes
  2007-11-21 15:24 ` Raffaele BELARDI
  1 sibling, 1 reply; 16+ messages in thread
From: Billy Holmes @ 2007-11-21 14:45 UTC (permalink / raw)
To: gentoo-amd64

Quoting Raffaele BELARDI <raffaele.belardi@st.com>:

> Raffaele Belardi wrote:
> the sdb.img as a disk image, so mount it as a disk (not as a partition)
> so maybe it automatically skips the first 0x7e00 bytes and gives me an
> aligned first partition?

Have you tried playing with losetup?

This link might help:

http://lists.samba.org/archive/linux/2004-December/012627.html

--
gentoo-amd64@gentoo.org mailing list
* Re: [gentoo-amd64] Re: not amd64 specific - disk failure
  2007-11-21 14:45 ` [gentoo-amd64] Re: not amd64 specific - disk failure Billy Holmes
@ 2007-11-21 15:24 ` Raffaele BELARDI
  0 siblings, 0 replies; 16+ messages in thread
From: Raffaele BELARDI @ 2007-11-21 15:24 UTC (permalink / raw)
To: gentoo-amd64

Billy Holmes wrote:
> Quoting Raffaele BELARDI <raffaele.belardi@st.com>:
>
>> Raffaele Belardi wrote:
>> the sdb.img as a disk image, so mount it as a disk (not as a partition)
>> so maybe it automatically skips the first 0x7e00 bytes and gives me an
>> aligned first partition?
>
> have you tried playing with losetup ?
>
> this link might help
>
> http://lists.samba.org/archive/linux/2004-December/012627.html
>

Thanks, I had never heard of this command until this morning; there's always something to learn. I'll give it a try tonight.

--
gentoo-amd64@gentoo.org mailing list