* [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: meino.cramer @ 2014-07-27 10:12 UTC
To: Gentoo

Hi,

after finding a bad sector on my hd (see previous thread) I read a lot
of stuff about SMART, smartctl and such to determine whether a bad
sector is a problem, how severe it is, and how to cope with it.

There is (at least ;) one thing I don't understand:

On the one hand, the surface test (extended offline and such) aborts
as soon as the first read failure happens.

On the other hand it is said: If the count of bad sectors increases
over time it is time to change the hd.

How can the second happen, if the first is true???

Best regards,
mcc

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Dale @ 2014-07-27 10:23 UTC
To: gentoo-user

meino.cramer@gmx.de wrote:
> On the one hand, the surface test (extended offline and such) aborts
> as soon as the first read failure happens.
>
> On the other hand it is said: If the count of bad sectors increases
> over time it is time to change the hd.
>
> How can the second happen, if the first is true???

I am by no means an expert on this but I'll try to provide some info,
which others may correct. ;-)

I had a situation similar to yours a few months ago. I had a bad spot
that popped up. After some effort, I got the drive to mark that as bad.
Now, from my understanding, the drive sort of has some extra space that
it can use if needed. However, at some point it will run out if you
have spots going bad one after another. Again, my understanding.

Let's say it has 200 extra spots. You then run the test, or the drive
detects it on its own, and a bad spot is found. The drive marks that
spot as bad and then uses one of the extra spots. So, if you have 201
spots go bad, you're fresh out of luck. Again, that's my understanding
of this.

Now can someone else that knows even more than me explain this better?
:-D

Dale

:-) :-)
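
Dale's picture of a limited pool of spare spots corresponds to counters that
the drive exposes in its SMART attribute table. The following is only a
minimal sketch of how one might watch those counters over time. It assumes the
common ten-column layout printed by `smartctl -A`; `smart_raw` is a
hypothetical helper name, and the sample rows are invented for illustration
(they are not from the drive discussed in this thread).

```shell
#!/bin/sh
# Print the raw value of one SMART attribute from `smartctl -A` output on stdin.
# Assumed column layout:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Invented sample rows, standing in for the output of: smartctl -A /dev/sda
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1'

printf '%s\n' "$sample" | smart_raw Reallocated_Sector_Ct
printf '%s\n' "$sample" | smart_raw Current_Pending_Sector
```

Logging these two raw values now and then is one way to see whether "the count
of bad sectors increases over time", independently of whether a self-test runs
to completion.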

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Neil Bothwick @ 2014-07-27 10:27 UTC
To: gentoo-user

On Sun, 27 Jul 2014 12:12:47 +0200, meino.cramer@gmx.de wrote:

> On the one hand, the surface test (extended offline and such) aborts
> as soon as the first read failure happens.
>
> On the other hand it is said: If the count of bad sectors increases
> over time it is time to change the hd.
>
> How can the second happen, if the first is true???

My understanding is that the test only aborts if the error is severe
enough to force it to do so. A simple bad block can be skipped and the
rest of the drive tested.

I've had a couple of drives get to the stage where SMART tests abort at
an error and in both cases the manufacturer replaced them without
question.

--
Neil Bothwick

If at first you do succeed, try to hide your astonishment.

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: meino.cramer @ 2014-07-27 10:41 UTC
To: gentoo-user

Neil Bothwick <neil@digimed.co.uk> [14-07-27 12:32]:
> My understanding is that the test only aborts if the error is severe
> enough to force it to do so. A simple bad block can be skipped and the
> rest of the drive tested.
>
> I've had a couple of drives get to the stage where SMART tests abort at
> an error and in both cases the manufacturer replaced them without
> question.

Hi Dale, hi Neil,

thanks for the info.

But it is slightly off the point I tried to explain (I am not a native
English speaker...sorry...:)

Suppose - as in my case - I have not yet managed to urge the hd to
map the bad sector off...

Now...all tests abort after scanning 10% of the disk. Disk health
status is reported as "PASSED"...because only one bad sector has been
found.

But 90% of the space of the disk has never been scanned.

Is this an implementation fault?
And if YES...is it the implementation of the firmware?
And: Is it my firmware or the one of the drive?
;)

Best regards,
mcc

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Dale @ 2014-07-27 11:11 UTC
To: gentoo-user

meino.cramer@gmx.de wrote:
> Suppose - as in my case - I have not yet managed to urge the hd to
> map the bad sector off...
>
> Now...all tests abort after scanning 10% of the disk. Disk health
> status is reported as "PASSED"...because only one bad sector has been
> found.
>
> But 90% of the space of the disk has never been scanned.
>
> Is this an implementation fault?
> And if YES...is it the implementation of the firmware?
> And: Is it my firmware or the one of the drive?
> ;)

Interesting. I was able to get mine to do a full test and give me a
clean result. If yours doesn't, well, I'd be diggin me out a box and
sending that puppy back to mommy. It seems to need some help. To me,
errors are one thing; errors that can't be corrected are a whole new
problem. It should fix it and pass the test.

Even with my drive passing the test, I don't trust it yet. If it was
still showing the error even after I did what I had done, I certainly
wouldn't trust it. If yours can't finish the long self test, it may
need repairs that are above our pay grade.

Maybe Neil or someone will have more ideas. I hope.

Dale

:-) :-)

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: meino.cramer @ 2014-07-27 11:29 UTC
To: gentoo-user

Dale <rdalek1967@gmail.com> [14-07-27 13:12]:
> Even with my drive passing the test, I don't trust it yet. If it was
> still showing the error even after I did what I had done, I certainly
> wouldn't trust it. If yours can't finish the long self test, it may
> need repairs that are above our pay grade.
>
> Maybe Neil or someone will have more ideas. I hope.

Back to the initial problem:

How can I offline test the rest of the disk if
the first bad sector (at 10% of the surface) breaks
the test with an error?

Best regards,
mcc

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Dale @ 2014-07-27 12:33 UTC
To: gentoo-user

meino.cramer@gmx.de wrote:
> Back to the initial problem: How can I offline test the rest of the
> disk if the first bad sector (at 10% of the surface) breaks the test
> with an error?

I never got mine to go past the first failure until I used dd to erase
the drive. As mentioned before, I may could have done that without
moving my data but that was too complicated and risky for me at the
time. From my understanding tho, until that data is moved off the bad
spot so that the drive knows it can do what it needs to, that spot is
still going to show up. I don't know of a way to make it test beyond
the bad spot either.

If you have a drive that you can move that data over to so that you can
play with the bad drive, that's what I would do. Once you get it moved,
then dd the whole drive, run the test and then see what results you
get. I looked at a howto that someone posted or I found, and doing it
with the data on there just made me nervous.

I'm running out of info here. Can anyone else provide more help than me?

Dale

:-) :-)

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: meino.cramer @ 2014-07-27 13:19 UTC
To: gentoo-user

Dale <rdalek1967@gmail.com> [14-07-27 14:36]:
> If you have a drive that you can move that data over to so that you can
> play with the bad drive, that's what I would do. Once you get it moved,
> then dd the whole drive, run the test and then see what results you
> get.

Hi Dale,

thanks for the info...

I already did this. PLEASE read my previous posting completely.

dd failed with an I/O error at that spot.

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Dale @ 2014-07-27 14:05 UTC
To: gentoo-user

meino.cramer@gmx.de wrote:
> I already did this. PLEASE read my previous posting completely.
>
> dd failed with an I/O error at that spot.

Hmmmm. I'd be getting my data off there or some sort of backup and then
try erasing the whole drive. If that fails as well, then it seems like
you need a box and some shipping to get a replacement if it is under
warranty. If the dd fails, that sounds like maybe it has an error it
can't correct for some reason. I think dd does its thing on a basic
level and I have never had it give me an error except for running out of
space when it is done. I'm sure if the command you used was wrong, Neil
would have picked up on it and said something. So, I don't think you
are doing anything wrong, I just think your drive may have even more
serious issues than mine had.

Unless someone else comes on with an idea on something else to try, I'd
be looking for somewhere to put my data and a different drive. If after
that you can get it working, well, you got a spare. If not, it was
broke anyway.

I hope someone else has more ideas.

Dale

:-) :-)

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Mick @ 2014-07-27 14:34 UTC
To: gentoo-user

On Sunday 27 Jul 2014 15:05:53 Dale wrote:
> meino.cramer@gmx.de wrote:
> > I already did this. PLEASE read my previous posting completely.
> >
> > dd failed with an I/O error at that spot.
>
> Hmmmm. I'd be getting my data off there or some sort of backup and then
> try erasing the whole drive. If that fails as well, then it seems like
> you need a box and some shipping to get a replacement if it is under
> warranty.
>
> I hope someone else has more ideas.

Does it still error out if you run the commands in this sequence?

    mkswap -L swap -f -c /dev/sda2
    dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc

Also, did you try the 'hdparm --write-sector' option that Volker mentioned?

--
Regards,
Mick

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: meino.cramer @ 2014-07-27 16:50 UTC
To: gentoo-user

Mick <michaelkintzios@gmail.com> [14-07-27 16:36]:
> Does it still error out if you run the commands in this sequence?
>
>     mkswap -L swap -f -c /dev/sda2
>     dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc
>
> Also, did you try the 'hdparm --write-sector' option that Volker mentioned?

Hi Mick,

thanks for your reply on the topic.

I executed the mkswap/dd combo several times today. Since I have
no logs I repeated it once more. Here are the results:

    solfire:/home/user>mkswap -L swap -f -c /dev/sda2
    1 bad page
    mkswap: /dev/sda2: warning: wiping old swap signature.
    Setting up swapspace version 1, size = 6291448 KiB
    LABEL=swap, UUID=e742c0a6-862c-41e9-be4b-698b33c5a236
    solfire:/home/user>dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc
    dd: error writing ‘/dev/sda2’: Input/output error
    1669369+0 records in
    1669368+0 records out
    854716416 bytes (855 MB) copied, 28.4799 s, 30.0 MB/s
    [1]    24047 exit 1     dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc
    solfire:/home/user>

I am a little anxious about the hdparm command...
For me it is unclear which sector is meant:

smartctl says:

    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Selective offline   Completed: read failure       90%     14500         4288352511

From a previous posting I learned that "LBA" in this case is the byte
counter.

The sector is therefore 4288352511/512=8375688

However, as a result of the dd command above I found this in the dmesg
log:

    [48588.471905] end_request: I/O error, dev sda, sector 1773816

Now...which sector count fits which sector count ... ?

I will not fire zeroes towards my hd this way before I know exactly
what I am shooting at... ;)

Any light in all this shadow is heartily appreciated...

Best regards,
mcc

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Helmut Jarausch @ 2014-07-28 8:22 UTC
To: gentoo-user

On 07/27/2014 06:50:49 PM, meino.cramer@gmx.de wrote:
> smartctl says:
>
>     Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>     # 1  Selective offline   Completed: read failure       90%     14500         4288352511
>
> From a previous posting I learned that "LBA" in this case is the byte
> counter.
>
> The sector is therefore 4288352511/512=8375688
>
> However, as a result of the dd command above I found this in the dmesg
> log:
>
>     [48588.471905] end_request: I/O error, dev sda, sector 1773816
>
> Now...which sector count fits which sector count ... ?
>
> I will not fire zeroes towards my hd this way before I know exactly
> what I am shooting at... ;)

Here are a few observations:

First, smartctl starts counting at the very first sector of the drive
while dd starts counting at the first sector of the partition.
So, find out where the partition starts by using fdisk and add the
partition offset to the number given by dd.

Second, if your file system is ext{2,3,4}, try using debugfs as
described in
file:///home/jarausch/GenToo/Hints/Smartmontools_badblockhowto.html

Third, as far as I understand, smartctl's '-t select' option lets you
test specific ranges of the disk. You could try to start the test after
the defective sector.

Helmut
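
Helmut's first point can be checked against the numbers already posted. This
is a minimal sketch only: it assumes 512-byte sectors and that sda2 starts at
sector 104448 (the start value used elsewhere in this thread; confirm it with
fdisk on the actual drive). `abs_sector` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a partition-relative sector (what dd counts) into a whole-drive
# sector (what the kernel log reports for "dev sda").
abs_sector() {
    # $1 = first sector of the partition, $2 = sector offset inside the partition
    echo $(( $1 + $2 ))
}

part_start=104448       # assumed start of /dev/sda2, in 512-byte sectors
dd_records_out=1669368  # complete 512-byte records dd wrote before the I/O error

abs_sector "$part_start" "$dd_records_out"   # prints 1773816
```

The result equals the sector in the dmesg line (`end_request: I/O error, dev
sda, sector 1773816`), so dd's count and the kernel's count describe the same
spot once the partition offset is added. For the third point, a selective test
of the area behind that sector would presumably look like
`smartctl -t select,1773817-max /dev/sda`, assuming the installed smartctl
accepts the `max` keyword for the end of the span.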

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Mick @ 2014-07-29 10:27 UTC
To: gentoo-user
Cc: meino.cramer

On Sunday 27 Jul 2014 17:50:49 meino.cramer@gmx.de wrote:

> I executed the mkswap/dd combo several times today. Since I have
> no logs I repeated it once more. Here are the results:
>
>     solfire:/home/user>mkswap -L swap -f -c /dev/sda2
>     1 bad page
>     mkswap: /dev/sda2: warning: wiping old swap signature.
>     Setting up swapspace version 1, size = 6291448 KiB
>     LABEL=swap, UUID=e742c0a6-862c-41e9-be4b-698b33c5a236
>     solfire:/home/user>dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc
>     dd: error writing ‘/dev/sda2’: Input/output error
>     1669369+0 records in
>     1669368+0 records out
>     854716416 bytes (855 MB) copied, 28.4799 s, 30.0 MB/s
>     [1]    24047 exit 1     dd if=/dev/zero of=/dev/sda2 bs=512 conv=notrunc
>     solfire:/home/user>

Ahh! This is a different result to what you showed previously, unless you
chopped off the output last time. It now shows that it was able to write:

    1669368 x 512 = 854,716,416 bytes

out of a total sda2 partition size of about 6 GiB (6291448 KiB).

> I am a little anxious about the hdparm command...
> For me it is unclear which sector is meant:
>
> smartctl says:
>
>     Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>     # 1  Selective offline   Completed: read failure       90%     14500         4288352511
>
> From a previous posting I learned that "LBA" in this case is the byte
> counter.
>
> The sector is therefore 4288352511/512=8375688

OK, let's not confuse ourselves: LBA counting starts from zero. Therefore, if
you subtract the start of the sda2 partition it becomes:

    4288352511 - 104448 = 4288248063
    4288248063 / 512    = 8,375,484.5 bytes, within the sda2 partition.

Unless my maths failed me above, I can't say I understand why dd bailed out
after writing 854,716,416 bytes, but smartctl failed earlier than that,
reading just 8,375,484.5 bytes. Perhaps dd can write further than smartctl
can read without error, because these are two different mechanisms - but I am
guessing wildly. :-)

Someone more knowledgeable should chime in here.

PS. I'm copying you in just in case this is lost in your Inbox - I had to
resend it because I used the wrong From address by mistake and I suspect it
never made it to the list.

--
Regards,
Mick
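
One way to untangle which number refers to what is to test each candidate
against the extent of sda2. The sketch below is an assumption-laden check, not
a diagnosis: it takes the sda2 start to be sector 104448, approximates the
partition length from mkswap's "size = 6291448 KiB" (two 512-byte sectors per
KiB), and uses a hypothetical helper called `in_partition`.

```shell
#!/bin/sh
# Report whether a whole-drive sector number falls inside a partition.
in_partition() {
    # $1 = sector to test, $2 = partition start sector, $3 = partition length in sectors
    if [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + $3 )) ]; then
        echo inside
    else
        echo outside
    fi
}

part_start=104448             # assumed start of sda2, in 512-byte sectors
part_len=$(( 6291448 * 2 ))   # approximate length, from mkswap's size in KiB

in_partition 1773816 "$part_start" "$part_len"                  # sector from the dmesg line
in_partition 4288352511 "$part_start" "$part_len"               # smartctl value read as a sector number
in_partition $(( 4288352511 / 512 )) "$part_start" "$part_len"  # smartctl value read as a byte offset
```

With these assumptions the dmesg sector lies inside sda2, the smartctl value
read as a sector number lies far outside it, and the same value read as a byte
offset lands inside sda2 but behind the point where dd stopped. In neither
reading do the two reports obviously describe the same location, which is
worth settling before writing to any sector by number.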

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Jc García @ 2014-07-28 4:44 UTC
To: gentoo-user

2014-07-27 5:29 GMT-06:00 <meino.cramer@gmx.de>:
>
> Back to the initial problem:
>
> How can I offline test the rest of the disk if
> the first bad sector (at 10% of the surface) breaks
> the test with an error?
>
> Best regards,
> mcc

I've only read this thread, not the previous one, and I can only give
some feedback on this part. If I understood correctly, dd reported
failing at 4288352511 bytes into your drive. Make a loopback device with
an offset past that sector so that dd can keep going, or run dd with
skip up to that sector:

    # losetup -o 428835252 /dev/loop1 /dev/your_hdd

That is just an example; calculate an appropriate byte count for one
sector past the bad one, and then use dd on /dev/loop1.
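
The byte count Jc García asks for can be derived from a sector number. This is
a rough sketch that only prints the resulting commands and never touches a
disk. It assumes 512-byte sectors and uses the whole-drive sector from the
dmesg line in this thread; `offset_after` is a hypothetical helper and
`/dev/your_hdd` is the placeholder from the message above.

```shell
#!/bin/sh
# Byte offset of the first sector after a known-bad one. `losetup -o` takes
# bytes, while dd's seek= counts output blocks of size bs.
offset_after() {
    # $1 = bad sector (whole-drive numbering), $2 = sector size in bytes
    echo $(( ($1 + 1) * $2 ))
}

bad=1773816   # whole-drive sector from the dmesg line
bs=512        # assumed logical sector size

# Print the commands for review only. The dd line would overwrite everything
# behind the bad sector if it were actually run.
echo "losetup -o $(offset_after "$bad" "$bs") /dev/loop1 /dev/your_hdd"
echo "dd if=/dev/zero of=/dev/your_hdd bs=$bs seek=$(( bad + 1 ))"
```

Note that for writing, dd's `seek=` is the option that moves the starting
point in the output device; `skip=` moves the starting point in the input.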

* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!?
From: Neil Bothwick @ 2014-07-27 18:57 UTC
To: gentoo-user

On Sun, 27 Jul 2014 12:41:15 +0200, meino.cramer@gmx.de wrote:

> > My understanding is that the test only aborts if the error is severe
> > enough to force it to do so. A simple bad block can be skipped and the
> > rest of the drive tested.

> But it is slightly off the point I tried to explain (I am not a native
> English speaker...sorry...:)
>
> Suppose - as in my case - I have not yet managed to urge the hd to
> map the bad sector off...
>
> Now...all tests abort after scanning 10% of the disk. Disk health
> status is reported as "PASSED"...because only one bad sector has been
> found.
>
> But 90% of the space of the disk has never been scanned.

Read the smartctl message again: it's not reporting a bad sector, it's
reporting a read failure. Bad sectors are detected and mapped out in the
background; you have something more serious, something that prevents the
drive scanning past this point. If it's less than two years old, send it
back. Most drive manufacturers have a form on their web site where you
can input the serial number and see the warranty status. If you can
return it, do so, ASAP.

--
Neil Bothwick

Press every key to continue.
* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!? 2014-07-27 18:57 ` Neil Bothwick @ 2014-07-27 19:20 ` Dale 2014-07-27 21:21 ` Neil Bothwick 0 siblings, 1 reply; 18+ messages in thread From: Dale @ 2014-07-27 19:20 UTC (permalink / raw To: gentoo-user

Neil Bothwick wrote:
> On Sun, 27 Jul 2014 12:41:15 +0200, meino.cramer@gmx.de wrote:
>
>>> My understanding is that the test only aborts if the error is severe
>>> enough to force it to do so. A simple bad block can be skipped and the
>>> rest of the drive tested.
>> But it is slightly off the point I tried to explain (I am no native
>> english speaker...sorry...:)
>>
>> Suppose - as in my case - I have not yet managed to urge the hd to
>> map the bad sector off...
>>
>> Now...all tests abort after scanning 10% of the disk. Disk health
>> status is reported as "PASSED"...cause only one bad sector has been
>> found.
>>
>> But 90% of the space of the disk has never been scanned.
> Read the smartctl message again: it's not reporting a bad sector, it's
> reporting a read failure. Bad sectors are detected and mapped out in the
> background. You have something more serious, something that prevents the
> drive scanning past this point. If it's less than two years old, send it
> back. Most drive manufacturers have a form on their web site where you
> can input the serial number and see the warranty status. If you can
> return it, do so ASAP.
>
>

Glad you noticed something I didn't. I just wish it was better news for the OP.

Question. Does that mean that the heads can't move past that point? If yes, does that mean the OP can't get any data that is further out than that point? I'm asking hoping I will learn something. I have taken drives apart so I know how the arm moves the heads across the platter. If I get what you are saying, it's like the heads get to a certain point, about 10%, and then stop.

Thanks.

Dale

:-) :-)
* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!? 2014-07-27 19:20 ` Dale @ 2014-07-27 21:21 ` Neil Bothwick 2014-07-28 3:11 ` Dale 0 siblings, 1 reply; 18+ messages in thread From: Neil Bothwick @ 2014-07-27 21:21 UTC (permalink / raw To: gentoo-user

On Sun, 27 Jul 2014 14:20:23 -0500, Dale wrote:

> Question. Does that mean that the heads can't move past that point? If
> yes, does that mean the OP can't get any data that is further out than
> that point? I'm asking hoping I will learn something. I have taken
> drives apart so I know how the arm moves the heads across the platter.
> If I get what you are saying, it's like the heads get to a certain
> point, about 10%, and then stop.

I don't think so, as I've seen this sort of thing on a drive but still been able to access ~all my data. It seems that the SMART tests are a little stupid in this respect and give up when they decide a drive is broken, as opposed to failing.

--
Neil Bothwick

Irritable? Who the bloody hell are you calling irritable?
* Re: [gentoo-user] Contradictionary behaviour of SMART on hds ?!? 2014-07-27 21:21 ` Neil Bothwick @ 2014-07-28 3:11 ` Dale 0 siblings, 0 replies; 18+ messages in thread From: Dale @ 2014-07-28 3:11 UTC (permalink / raw To: gentoo-user

Neil Bothwick wrote:
> On Sun, 27 Jul 2014 14:20:23 -0500, Dale wrote:
>
>> Question. Does that mean that the heads can't move past that point? If
>> yes, does that mean the OP can't get any data that is further out than
>> that point? I'm asking hoping I will learn something. I have taken
>> drives apart so I know how the arm moves the heads across the platter.
>> If I get what you are saying, it's like the heads get to a certain
>> point, about 10%, and then stop.
> I don't think so, as I've seen this sort of thing on a drive but still
> been able to access ~all my data. It seems that the SMART tests are a
> little stupid in this respect and give up when they decide a drive is
> broken, as opposed to failing.
>
>

So, it isn't likely a mechanical failure but *maybe* some sort of firmware, software, or other type of failure? Interesting. I was asking because it seems the OP is using the drive, even booting from it I think, which makes me think it is still able to access the data, yet the SMART test can't get to the same area. It was a bit confusing since it wasn't "logical". One type of access is working while another isn't. Odd.

I realize that short of some techy person taking the drive apart, we likely won't really know why it failed the test. I was just curious as to what the possible causes of the failure were.

Well, you run into something interesting every day. I hope the OP can get his data off there before this gets worse. I guess if nothing else, SMART showed that something isn't right, is likely failing and needs attention. If SMART is correct. ;-)

Dale

:-) :-)