* [gentoo-user] OT: Fighting bit rot
@ 2013-01-07 20:11 Florian Philipp
2013-01-07 21:07 ` Paul Hartman
` (3 more replies)
0 siblings, 4 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-07 20:11 UTC (permalink / raw
To: Gentoo User List
Hi list!
I have a use case where I am seriously concerned about bit rot [1] and I
thought it might be a good idea to start looking for it in my own
private stuff, too.
Solving the problem is easy enough:
- Record checksums and timestamps for each file
- Check and update records via cronjob
- If checksum changed but timestamp didn't, notify user
- Let user restore from backup
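These steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: SHA-256 as the checksum, a JSON file kept outside the scanned tree as the record store, and the hypothetical helper names scan, find_suspects and check_and_update. None of it comes from an existing tool in portage.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Return {relative path: {"mtime_ns", "sha256"}} for regular files."""
    records = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            records[os.path.relpath(path, root)] = {
                # Read the timestamp before hashing the content.
                "mtime_ns": os.stat(path).st_mtime_ns,
                "sha256": sha256_of(path),
            }
    return records


def find_suspects(old, new):
    """Files whose checksum changed although the timestamp did not."""
    return sorted(
        path
        for path, rec in new.items()
        if path in old
        and old[path]["mtime_ns"] == rec["mtime_ns"]
        and old[path]["sha256"] != rec["sha256"]
    )


def check_and_update(root, db_path):
    """Compare against the stored records, then store the fresh ones."""
    old = {}
    if os.path.exists(db_path):
        with open(db_path) as handle:
            old = json.load(handle)
    new = scan(root)
    suspects = find_suspects(old, new)
    # Keep the old record for suspects so they are reported again
    # on the next run until the user has restored them.
    for path in suspects:
        new[path] = old[path]
    with open(db_path, "w") as handle:
        json.dump(new, handle)
    return suspects
```

A cronjob could call check_and_update("/home/user", "/var/lib/rotcheck/db.json") (both paths hypothetical) and mail any returned paths to the user. A file that was edited legitimately gets a new timestamp and is simply re-recorded.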
However, I haven't found any application in portage for this task. Now,
the implementation is easy enough but I'm wondering why it hasn't been
done. Or am I just looking for the wrong thing? The only suitable thing
seems to be app-admin/tripwire but that application also looks like
overkill.
[1] http://en.wikipedia.org/wiki/Bit_rot
Regards,
Florian Philipp
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 20:11 [gentoo-user] OT: Fighting bit rot Florian Philipp
@ 2013-01-07 21:07 ` Paul Hartman
2013-01-07 22:05 ` Florian Philipp
2013-01-07 21:33 ` Michael Mol
` (2 subsequent siblings)
3 siblings, 1 reply; 39+ messages in thread
From: Paul Hartman @ 2013-01-07 21:07 UTC (permalink / raw
To: gentoo-user
On Mon, Jan 7, 2013 at 2:11 PM, Florian Philipp <lists@binarywings.net> wrote:
> Hi list!
>
> I have a use case where I am seriously concerned about bit rot [1] and I
> thought it might be a good idea to start looking for it in my own
> private stuff, too.
>
> Solving the problem is easy enough:
> - Record checksums and timestamps for each file
> - Check and update records via cronjob
> - If checksum changed but timestamp didn't, notify user
> - Let user restore from backup
>
> However, I haven't found any application in portage for this task. Now,
> the implementation is easy enough but I'm wondering why it hasn't been
> done. Or do I just look for the wrong thing? The only suitable thing
> seems to be app-admin/tripwire but that application also looks like
> overkill.
Not really what you are asking for, but I think btrfs and zfs have
checksumming built in to the filesystem. I'm not sure what the userspace
tools for monitoring this are like, or if it's just an fsck away.
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 20:11 [gentoo-user] OT: Fighting bit rot Florian Philipp
2013-01-07 21:07 ` Paul Hartman
@ 2013-01-07 21:33 ` Michael Mol
2013-01-07 22:10 ` Florian Philipp
2013-01-07 23:20 ` Alan McKinnon
2013-01-07 23:31 ` William Kenworthy
3 siblings, 1 reply; 39+ messages in thread
From: Michael Mol @ 2013-01-07 21:33 UTC (permalink / raw
To: gentoo-user
On Mon, Jan 7, 2013 at 3:11 PM, Florian Philipp <lists@binarywings.net> wrote:
> Hi list!
>
> I have a use case where I am seriously concerned about bit rot [1] and I
> thought it might be a good idea to start looking for it in my own
> private stuff, too.
>
> Solving the problem is easy enough:
> - Record checksums and timestamps for each file
> - Check and update records via cronjob
> - If checksum changed but timestamp didn't, notify user
> - Let user restore from backup
>
> However, I haven't found any application in portage for this task. Now,
> the implementation is easy enough but I'm wondering why it hasn't been
> done. Or do I just look for the wrong thing? The only suitable thing
> seems to be app-admin/tripwire but that application also looks like
> overkill.
>
> [1] http://en.wikipedia.org/wiki/Bit_rot
>
> Regards,
> Florian Philipp
>
Have you looked at Tripwire to see if it'll do what you need?
--
:wq
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 21:07 ` Paul Hartman
@ 2013-01-07 22:05 ` Florian Philipp
0 siblings, 0 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-07 22:05 UTC (permalink / raw
To: gentoo-user
On 07.01.2013 22:07, Paul Hartman wrote:
> On Mon, Jan 7, 2013 at 2:11 PM, Florian Philipp <lists@binarywings.net> wrote:
>> Hi list!
>>
>> I have a use case where I am seriously concerned about bit rot [1] and I
>> thought it might be a good idea to start looking for it in my own
>> private stuff, too.
>>
>> Solving the problem is easy enough:
>> - Record checksums and timestamps for each file
>> - Check and update records via cronjob
>> - If checksum changed but timestamp didn't, notify user
>> - Let user restore from backup
>>
>> However, I haven't found any application in portage for this task. Now,
>> the implementation is easy enough but I'm wondering why it hasn't been
>> done. Or do I just look for the wrong thing? The only suitable thing
>> seems to be app-admin/tripwire but that application also looks like
>> overkill.
>
> Not really what you are asking for, but I think btrfs and zfs have
> checksumming built-in to the filesystem. I'm not sure what userspace
> tools are like to monitor this, or if it's just an fsck away.
>
Yes, that's a start. `btrfs scrub start` might give something
meaningful. But I don't really trust btrfs with valuable data yet.
And CRC32 isn't much, either.
Thanks anyway,
Florian Philipp
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 21:33 ` Michael Mol
@ 2013-01-07 22:10 ` Florian Philipp
0 siblings, 0 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-07 22:10 UTC (permalink / raw
To: gentoo-user
On 07.01.2013 22:33, Michael Mol wrote:
> On Mon, Jan 7, 2013 at 3:11 PM, Florian Philipp <lists@binarywings.net> wrote:
>> Hi list!
>>
>> I have a use case where I am seriously concerned about bit rot [1] and I
>> thought it might be a good idea to start looking for it in my own
>> private stuff, too.
>>[...]
>> The only suitable thing
>> seems to be app-admin/tripwire but that application also looks like
>> overkill.
[...]
>
> Have you looked at Tripwire to see if it'll do what you need?
>
Not in detail. I guess you can get it to do that but one look at the
configuration and I told myself, "Screw this, I will have coded it in
perl before I understood tripwire."
Regards,
Florian Philipp
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 20:11 [gentoo-user] OT: Fighting bit rot Florian Philipp
2013-01-07 21:07 ` Paul Hartman
2013-01-07 21:33 ` Michael Mol
@ 2013-01-07 23:20 ` Alan McKinnon
2013-01-08 7:27 ` Florian Philipp
2013-01-07 23:31 ` William Kenworthy
3 siblings, 1 reply; 39+ messages in thread
From: Alan McKinnon @ 2013-01-07 23:20 UTC (permalink / raw
To: gentoo-user
On Mon, 07 Jan 2013 21:11:35 +0100
Florian Philipp <lists@binarywings.net> wrote:
> Hi list!
>
> I have a use case where I am seriously concerned about bit rot [1]
> and I thought it might be a good idea to start looking for it in my
> own private stuff, too.
>
> Solving the problem is easy enough:
> - Record checksums and timestamps for each file
> - Check and update records via cronjob
> - If checksum changed but timestamp didn't, notify user
> - Let user restore from backup
>
> However, I haven't found any application in portage for this task.
> Now, the implementation is easy enough but I'm wondering why it
> hasn't been done. Or do I just look for the wrong thing? The only
> suitable thing seems to be app-admin/tripwire but that application
> also looks like overkill.
>
> [1] http://en.wikipedia.org/wiki/Bit_rot
>
> Regards,
> Florian Philipp
>
You are using a very peculiar definition of bitrot.
"bits" do not "rot", they are not apples in a barrel. Bitrot usually
refers to code that goes unmaintained and no longer works in the system
it was installed. What definition are you using?
If you mean crummy code that goes unmaintained, then keep systems up to
date and report bugs.
If you mean disk file corruption, then doing it file by file is a
colossal waste of time IMNSHO. You likely have >1,000,000 files. Are
you really going to md5sum each one daily? Really?
This is a filesystem task, not a cronjob task. Use a filesystem that
does proper checksumming. ZFS does it, but that is of course somewhat
problematic on Linux. Check out the others, it will be something modern
you need, like ext4 maybe or btrfs.
--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 20:11 [gentoo-user] OT: Fighting bit rot Florian Philipp
` (2 preceding siblings ...)
2013-01-07 23:20 ` Alan McKinnon
@ 2013-01-07 23:31 ` William Kenworthy
3 siblings, 0 replies; 39+ messages in thread
From: William Kenworthy @ 2013-01-07 23:31 UTC (permalink / raw
To: gentoo-user
On 08/01/13 04:11, Florian Philipp wrote:
> Hi list!
>
> I have a use case where I am seriously concerned about bit rot [1] and I
> thought it might be a good idea to start looking for it in my own
> private stuff, too.
>
> Solving the problem is easy enough:
> - Record checksums and timestamps for each file
> - Check and update records via cronjob
> - If checksum changed but timestamp didn't, notify user
> - Let user restore from backup
>
> However, I haven't found any application in portage for this task. Now,
> the implementation is easy enough but I'm wondering why it hasn't been
> done. Or do I just look for the wrong thing? The only suitable thing
> seems to be app-admin/tripwire but that application also looks like
> overkill.
>
> [1] http://en.wikipedia.org/wiki/Bit_rot
>
> Regards,
> Florian Philipp
>
equery check pkg for the OS and tripwire for private data.
BillK
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-07 23:20 ` Alan McKinnon
@ 2013-01-08 7:27 ` Florian Philipp
2013-01-08 7:55 ` Alan McKinnon
` (3 more replies)
0 siblings, 4 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 7:27 UTC (permalink / raw
To: gentoo-user
On 08.01.2013 00:20, Alan McKinnon wrote:
> On Mon, 07 Jan 2013 21:11:35 +0100
> Florian Philipp <lists@binarywings.net> wrote:
>
>> Hi list!
>>
>> I have a use case where I am seriously concerned about bit rot [1]
>> and I thought it might be a good idea to start looking for it in my
>> own private stuff, too.
[...]
>> [1] http://en.wikipedia.org/wiki/Bit_rot
>>
>> Regards,
>> Florian Philipp
>>
>
> You are using a very peculiar definition of bitrot.
>
> "bits" do not "rot", they are not apples in a barrel. Bitrot usually
> refers to code that goes unmaintained and no longer works in the system
> it was installed. What definition are you using?
>
That's why I referred to wikipedia, not the jargon file ;-)
The definition that I thought about was decay of storage media,
especially hard disks. I'm not aware of another commonly used name for
that effect. Disk rot seems to apply only to optical media.
> If you mean crummy code that goes unmaintained, then keep systems up to
> date and report bugs.
>
> If you mean disk file corruption, then doing it file by file is a
> colossal waste of time IMNSHO. You likely have >1,000,000 files. Are
> you really going to md5sum each one daily? Really?
>
Well, not daily but often enough that I likely still have a valid copy
as a backup.
> This is a filesystem task, not a cronjob task. Use a filesystem that
> does proper checksumming. ZFS does it, but that is of course somewhat
> problematic on Linux. Check out the others, it will be something modern
> you need, like ext4 maybe or btrfs
>
AFAIK, ext4 only has checksums for its metadata. Even if the file system
supported appropriate checksums out of the box, I'd still need a
tool to regularly read files and report errors.
As I said above, the point is that I need to detect the error while
I still have a valid backup. Professional archive solutions do this on
their own but I'm looking for something suitable for desktop usage.
Regards,
Florian Philipp
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 7:27 ` Florian Philipp
@ 2013-01-08 7:55 ` Alan McKinnon
2013-01-08 16:16 ` Florian Philipp
2013-01-08 15:29 ` Grant Edwards
` (2 subsequent siblings)
3 siblings, 1 reply; 39+ messages in thread
From: Alan McKinnon @ 2013-01-08 7:55 UTC (permalink / raw
To: gentoo-user
On Tue, 08 Jan 2013 08:27:51 +0100
Florian Philipp <lists@binarywings.net> wrote:
> > This is a filesystem task, not a cronjob task. Use a filesystem that
> > does proper checksumming. ZFS does it, but that is of course
> > somewhat problematic on Linux. Check out the others, it will be
> > something modern you need, like ext4 maybe or btrfs
> >
>
> AFAIK, ext4 only has checksums for its metadata. Even if the file
> system would support appropriate checksums out-of-the-box, I'd still
> need a tool to regularly read files and report on errors.
>
> As I said above, the point is that I need to detect the error as long
> as I still have a valid backup. Professional archive solutions do
> this on their own but I'm looking for something suitable for desktop
> usage.
rsync might be able to give you something close to what you want
easily.
Use the -n switch for an rsync between your originals and the last
backup copy, and mail the output to yourself. Parse it looking for ">"
and "<" symbols and investigate why the file changed.
This strikes me as being a very easy solution that you could use
reliably with a suitable combination of options.
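The rsync approach can be sketched as below. Two points are worth stating: rsync's default quick check skips files whose size and modification time both match, so silent corruption is only seen with -c (--checksum), and the ">" and "<" markers only appear with -i (--itemize-changes). The helper names are hypothetical, and the reading of the flag string (third character "c" for a content difference, fifth character "t" for a timestamp difference) follows the rsync man page's description of --itemize-changes; it should be verified against the installed version.

```python
import subprocess


def rsync_dry_run(source, backup):
    """Run rsync in dry-run mode and return its itemized output lines.

    -n: dry run, -a: archive mode (includes -t), -c: compare checksums,
    -i: itemize changes. Nothing is copied.
    """
    result = subprocess.run(
        ["rsync", "-n", "-a", "-c", "-i", source, backup],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def changed_content(lines):
    """Split itemized lines into (suspicious, modified) regular files.

    suspicious: content differs but the timestamp is unchanged.
    modified:   content and timestamp both differ (probably a real edit).
    """
    suspicious, modified = [], []
    for line in lines:
        flags, _, path = line.partition(" ")
        # Only regular files that would be transferred are of interest.
        if len(flags) < 5 or flags[0] not in "<>" or flags[1] != "f":
            continue
        # New files show "+" here and are skipped; "c" means the checksum differs.
        if flags[2] != "c":
            continue
        (modified if flags[4] == "t" else suspicious).append(path)
    return suspicious, modified
```

For example, changed_content(rsync_dry_run("originals/", "backup/")) (directory names hypothetical) returns the files to investigate first, followed by the files that look like ordinary edits.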
--
Alan McKinnon
alan.mckinnon@gmail.com
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 7:27 ` Florian Philipp
2013-01-08 7:55 ` Alan McKinnon
@ 2013-01-08 15:29 ` Grant Edwards
2013-01-08 15:42 ` Michael Mol
2013-01-08 17:35 ` [gentoo-user] " Volker Armin Hemmann
2013-01-09 0:12 ` [gentoo-user] " Randy Barlow
3 siblings, 1 reply; 39+ messages in thread
From: Grant Edwards @ 2013-01-08 15:29 UTC (permalink / raw
To: gentoo-user
On 2013-01-08, Florian Philipp <lists@binarywings.net> wrote:
> On 08.01.2013 00:20, Alan McKinnon wrote:
>> On Mon, 07 Jan 2013 21:11:35 +0100
>> Florian Philipp <lists@binarywings.net> wrote:
>>
>>> Hi list!
>>>
>>> I have a use case where I am seriously concerned about bit rot [1]
>>> and I thought it might be a good idea to start looking for it in my
>>> own private stuff, too.
> [...]
>>> [1] http://en.wikipedia.org/wiki/Bit_rot
>>>
>>> Regards,
>>> Florian Philipp
>>>
>>
>> You are using a very peculiar definition of bitrot.
>>
>> "bits" do not "rot", they are not apples in a barrel. Bitrot usually
>> refers to code that goes unmaintained and no longer works in the
>> system it was installed. What definition are you using?
>
> That's why I referred to wikipedia, not the jargon file ;-)
The wikipedia page to which you refer has _two_ definitions. The
"uncommon" one you're using:
http://en.wikipedia.org/wiki/Bit_rot#Decay_of_storage_media
and the common one:
http://en.wikipedia.org/wiki/Bit_rot#Problems_with_software
I've heard the term "bit rot" for decades, but I've never heard the
"decay of storage media" usage. It's always referred to unmaintained
code that no longer works because of changes to tools or the
surrounding environment.
--
Grant Edwards               grant.b.edwards at gmail.com
Yow! Is something VIOLENT going to happen to a GARBAGE CAN?
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 15:29 ` Grant Edwards
@ 2013-01-08 15:42 ` Michael Mol
2013-01-08 16:28 ` Florian Philipp
0 siblings, 1 reply; 39+ messages in thread
From: Michael Mol @ 2013-01-08 15:42 UTC (permalink / raw
To: gentoo-user
On Tue, Jan 8, 2013 at 10:29 AM, Grant Edwards
<grant.b.edwards@gmail.com> wrote:
> On 2013-01-08, Florian Philipp <lists@binarywings.net> wrote:
>> On 08.01.2013 00:20, Alan McKinnon wrote:
>>> On Mon, 07 Jan 2013 21:11:35 +0100
>>> Florian Philipp <lists@binarywings.net> wrote:
>>>
>>>> Hi list!
>>>>
>>>> I have a use case where I am seriously concerned about bit rot [1]
>>>> and I thought it might be a good idea to start looking for it in my
>>>> own private stuff, too.
>> [...]
>>>> [1] http://en.wikipedia.org/wiki/Bit_rot
>>>>
>>>> Regards,
>>>> Florian Philipp
>>>>
>>>
>>> You are using a very peculiar definition of bitrot.
>>>
>>> "bits" do not "rot", they are not apples in a barrel. Bitrot usually
>>> refers to code that goes unmaintained and no longer works in the
>>> system it was installed. What definition are you using?
>>
>> That's why I referred to wikipedia, not the jargon file ;-)
>
> The wikipedia page to which you refer has _two_ definitions. The
> "uncommon" one you're using:
>
> http://en.wikipedia.org/wiki/Bit_rot#Decay_of_storage_media
>
> and the common one:
>
> http://en.wikipedia.org/wiki/Bit_rot#Problems_with_software
>
> I've heard the term "bit rot" for decades, but I've never heard the
> "decay of storage media" usage. It's always referred to unmaintained
> code that no longer works because of changes to tools or the
> surrounding environment.
Frankly, I'd heard of bitrot first as applying to decay of storage
media. But this was back when your average storage media decay
(floppies and early hard disks) was expected to happen within months,
if not weeks.
The term's applying to software utility being damaged by assumptions
about its platform is a far, far newer application of the term. I
still think of "crappy media and errors in transmission" before I
think of platform compatibility decay.
--
:wq
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 7:55 ` Alan McKinnon
@ 2013-01-08 16:16 ` Florian Philipp
2013-01-08 16:42 ` Alan McKinnon
2013-01-08 17:41 ` Pandu Poluan
0 siblings, 2 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 16:16 UTC (permalink / raw
To: gentoo-user
On 08.01.2013 08:55, Alan McKinnon wrote:
> On Tue, 08 Jan 2013 08:27:51 +0100
> Florian Philipp <lists@binarywings.net> wrote:
>
[...]
>>
>> As I said above, the point is that I need to detect the error as long
>> as I still have a valid backup. Professional archive solutions do
>> this on their own but I'm looking for something suitable for desktop
>> usage.
>
> rsync might be able to give you something close to what you want
> easily
>
> Use the -n switch for an rsync between your originals and the last
> backup copy, and mail the output to yourself. Parse it looking for ">"
> and "<" symbols and investigate why the file changed.
>
> This strikes me as being a very easy solution that you could use
> reliably with a suitable combination of options.
>
>
Hmm, good idea, albeit similar to the `md5sum -c`. Either tool leaves
you with the problem of distinguishing between legitimate changes (i.e.
a user wrote to the file) and decay.
When you have completely static content, md5sum, rsync and friends are
sufficient. But if you have content that changes from time to time, the
number of false-positives would be too high. In this case, I think you
could easily distinguish by comparing both file content and time stamps.
Now, that of course introduces the problem that decay could occur in the
same time frame as a legitimate change, thus masking the decay. To
reduce this risk, you have to reduce the checking interval.
Regards,
Florian Philipp
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 15:42 ` Michael Mol
@ 2013-01-08 16:28 ` Florian Philipp
0 siblings, 0 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 16:28 UTC (permalink / raw
To: gentoo-user
On 08.01.2013 16:42, Michael Mol wrote:
> On Tue, Jan 8, 2013 at 10:29 AM, Grant Edwards
> <grant.b.edwards@gmail.com> wrote:
>> On 2013-01-08, Florian Philipp <lists@binarywings.net> wrote:
>>> On 08.01.2013 00:20, Alan McKinnon wrote:
>>>> On Mon, 07 Jan 2013 21:11:35 +0100
>>>> Florian Philipp <lists@binarywings.net> wrote:
>>>>
>>>>> Hi list!
>>>>>
>>>>> I have a use case where I am seriously concerned about bit rot [1]
>>>>> and I thought it might be a good idea to start looking for it in my
>>>>> own private stuff, too.
>>> [...]
>>>>> [1] http://en.wikipedia.org/wiki/Bit_rot
>>>>>
>>>>> Regards,
>>>>> Florian Philipp
>>>>>
>>>>
>>>> You are using a very peculiar definition of bitrot.
>>>>
>>>> "bits" do not "rot", they are not apples in a barrel. Bitrot usually
>>>> refers to code that goes unmaintained and no longer works in the
>>>> system it was installed. What definition are you using?
>>>
>>> That's why I referred to wikipedia, not the jargon file ;-)
>>
>> The wikipedia page to which you refer has _two_ definitions. The
>> "uncommon" one you're using:
>>
>> http://en.wikipedia.org/wiki/Bit_rot#Decay_of_storage_media
>>
>> and the common one:
>>
>> http://en.wikipedia.org/wiki/Bit_rot#Problems_with_software
>>
>> I've heard the term "bit rot" for decades, but I've never heard the
>> "decay of storage media" usage. It's always referred to unmaintained
>> code that no longer works because of changes to tools or the
>> surrounding environment.
>
> Frankly, I'd heard of bitrot first as applying to decay of storage
> media. But this was back when your average storage media decay
> (floppies and early hard disks) was expected to happen within months,
> if not weeks.
>
> The term's applying to software utility being damaged by assumptions
> about its platform is a far, far newer application of the term. I
> still think of "crappy media and errors in transmission" before I
> think of platform compatibility decay.
>
> --
> :wq
>
Google Scholar and Google Search have both usages on the first page of
their search results for bit rot. So let's agree that both forms are
common depending on the context. Next time, when I write about "Fighting
bugs" I'll make it clear if I'm dealing with an infestation of critters.
Regards,
Florian Philipp
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 16:16 ` Florian Philipp
@ 2013-01-08 16:42 ` Alan McKinnon
2013-01-08 17:41 ` Pandu Poluan
1 sibling, 0 replies; 39+ messages in thread
From: Alan McKinnon @ 2013-01-08 16:42 UTC (permalink / raw
To: gentoo-user
On Tue, 08 Jan 2013 17:16:32 +0100
Florian Philipp <lists@binarywings.net> wrote:
> On 08.01.2013 08:55, Alan McKinnon wrote:
> > On Tue, 08 Jan 2013 08:27:51 +0100
> > Florian Philipp <lists@binarywings.net> wrote:
> >
> [...]
> >>
> >> As I said above, the point is that I need to detect the error as
> >> long as I still have a valid backup. Professional archive
> >> solutions do this on their own but I'm looking for something
> >> suitable for desktop usage.
> >
> > rsync might be able to give you something close to what you want
> > easily
> >
> > Use the -n switch for an rsync between your originals and the last
> > backup copy, and mail the output to yourself. Parse it looking for
> > ">" and "<" symbols and investigate why the file changed.
> >
> > This strikes me as being a very easy solution that you could use
> > reliably with a suitable combination of options.
> >
> >
>
> Hmm, good idea, albeit similar to the `md5sum -c`. Either tool leaves
> you with the problem of distinguishing between legitimate changes
> (i.e. a user wrote to the file) and decay.
>
> When you have completely static content, md5sum, rsync and friends are
> sufficient. But if you have content that changes from time to time,
> the number of false-positives would be too high. In this case, I
> think you could easily distinguish by comparing both file content and
> time stamps.
>
> Now, that of course introduces the problem that decay could occur in
> the same time frame as a legitimate change, thus masking the decay. To
> reduce this risk, you have to reduce the checking interval.
I think your basic problem is that you are trying to detect a rare
event (corruption) that looks exactly like a common event (edits you
intended to make).
I don't know how to tell these apart except by somehow recording which
files have been written to - inotify is useful for this - and removing
those from the list of things rsync says have changed.
All of which leads to a massively complex lump of code that is sure to
cause many more problems than it is designed to solve....
I'm afraid I don't have any real solution to offer.
--
Alan McKinnon
alan.mckinnon@gmail.com
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 7:27 ` Florian Philipp
2013-01-08 7:55 ` Alan McKinnon
2013-01-08 15:29 ` Grant Edwards
@ 2013-01-08 17:35 ` Volker Armin Hemmann
2013-01-08 19:06 ` Florian Philipp
2013-01-08 19:11 ` [gentoo-user] " James
2013-01-09 0:12 ` [gentoo-user] " Randy Barlow
3 siblings, 2 replies; 39+ messages in thread
From: Volker Armin Hemmann @ 2013-01-08 17:35 UTC (permalink / raw
To: gentoo-user; +Cc: Florian Philipp
On Tuesday, 8 January 2013, 08:27:51 Florian Philipp wrote:
> On 08.01.2013 00:20, Alan McKinnon wrote:
> > On Mon, 07 Jan 2013 21:11:35 +0100
> >
> > Florian Philipp <lists@binarywings.net> wrote:
> >> Hi list!
> >>
> >> I have a use case where I am seriously concerned about bit rot [1]
> >> and I thought it might be a good idea to start looking for it in my
> >> own private stuff, too.
>
> [...]
>
> >> [1] http://en.wikipedia.org/wiki/Bit_rot
> >>
> >> Regards,
> >> Florian Philipp
> >
> > You are using a very peculiar definition of bitrot.
> >
> > "bits" do not "rot", they are not apples in a barrel. Bitrot usually
> > refers to code that goes unmaintained and no longer works in the system
> > it was installed. What definition are you using?
>
> That's why I referred to wikipedia, not the jargon file ;-)
>
> The definition that I thought about was decay of storage media,
> especially hard disks. I'm not aware of another commonly used name for
> that effect. Disk rot seems to apply only to optical media.
>
> > If you mean crummy code that goes unmaintained, then keep systems up to
> > date and report bugs.
> >
> > If you mean disk file corruption, then doing it file by file is a
> > colossal waste of time IMNSHO. You likely have >1,000,000 files. Are
> > you really going to md5sum each one daily? Really?
>
> Well, not daily but often enough that I likely still have a valid copy
> as a backup.
and who guarantees that the backup is the correct file?
btw, the solution is zfs and weekly scrub runs.
--
#163933
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 16:16 ` Florian Philipp
2013-01-08 16:42 ` Alan McKinnon
@ 2013-01-08 17:41 ` Pandu Poluan
2013-01-08 19:02 ` Florian Philipp
2013-01-08 19:53 ` [gentoo-user] " Grant Edwards
1 sibling, 2 replies; 39+ messages in thread
From: Pandu Poluan @ 2013-01-08 17:41 UTC (permalink / raw
To: gentoo-user
On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net> wrote:
>
-- snip --
>
> Hmm, good idea, albeit similar to the `md5sum -c`. Either tool leaves
> you with the problem of distinguishing between legitimate changes (i.e.
> a user wrote to the file) and decay.
>
> When you have completely static content, md5sum, rsync and friends are
> sufficient. But if you have content that changes from time to time, the
> number of false-positives would be too high. In this case, I think you
> could easily distinguish by comparing both file content and time stamps.
>
> Now, that of course introduces the problem that decay could occur in the
> same time frame as a legitimate change, thus masking the decay. To
> reduce this risk, you have to reduce the checking interval.
>
> Regards,
> Florian Philipp
>
IMO, we're all barking up the wrong tree here...
Before a file's content can change without user involvement, bit rot must
first get through the checksum (CRC?) of the hard disk itself. There will
be no 'gradual degradation of data', just 'catastrophic data loss'.
I would rather focus my efforts on ensuring that my backups are always
restorable, at least until the most recent time of archival.
Rgds,
--
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 17:41 ` Pandu Poluan
@ 2013-01-08 19:02 ` Florian Philipp
2013-01-09 2:55 ` Pandu Poluan
2013-01-08 19:53 ` [gentoo-user] " Grant Edwards
1 sibling, 1 reply; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 19:02 UTC (permalink / raw
To: gentoo-user
On 08.01.2013 18:41, Pandu Poluan wrote:
>
> On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net
> <mailto:lists@binarywings.net>> wrote:
>>
>
> -- snip --
>
[...]
>>
>> When you have completely static content, md5sum, rsync and friends are
>> sufficient. But if you have content that changes from time to time, the
>> number of false-positives would be too high. In this case, I think you
>> could easily distinguish by comparing both file content and time stamps.
>>
[...]
>
> IMO, we're all barking up the wrong tree here...
>
> Before a file's content can change without user involvement, bit rot
> must first get through the checksum (CRC?) of the hard disk itself.
> There will be no 'gradual degradation of data', just 'catastrophic data
> loss'.
>
Unfortunately, that's only partly true. Latent disk errors are a well
researched topic [1-3]. CRCs are not perfectly reliable. The trick is to
detect and correct errors while you still have valid backups or other
types of redundancy.
The only way to do this is regular scrubbing. That's why professional
archival solutions offer some kind of self-healing which is usually just
the same as what I proposed (plus whatever on-access integrity checks
the platform supports) [4].
> I would rather focus my efforts on ensuring that my backups are always
> restorable, at least until the most recent time of archival.
>
That's the point:
a) You have to detect when you have to restore from backup.
b) You have to verify that the backup itself is still valid.
c) You have to avoid situations where undetected errors creep into the
backup.
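For a file that is not supposed to change, the three points above reduce to comparing both copies against a digest recorded while the file was known to be good. The following is a minimal sketch under that assumption; the function name triage and its return values are hypothetical, and SHA-256 is just one possible choice of checksum.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def triage(recorded_digest, live_path, backup_path):
    """Decide what to do with a static file and its backup copy."""
    live_ok = sha256_of(live_path) == recorded_digest
    backup_ok = sha256_of(backup_path) == recorded_digest
    if live_ok and backup_ok:
        return "ok"
    if backup_ok:
        # (a) the live copy decayed: restore it from the backup
        return "restore"
    if live_ok:
        # (b) the backup decayed: rewrite it from the live copy
        return "refresh-backup"
    # (c) the error reached both copies; an older backup is needed
    return "both-damaged"
```

Running this check before every backup run keeps a damaged live copy from silently overwriting a good backup.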
I'm not talking about a purely theoretical possibility. I have
experienced just that: Some data that I have kept lying around for years
was corrupted.
[1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
    http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
[2] Baker et al.: A fresh look at the reliability of long-term digital storage
    http://arxiv.org/pdf/cs/0508130
[3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk Drives
    http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
[4] http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
Regards,
Florian Philipp
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 17:35 ` [gentoo-user] " Volker Armin Hemmann
@ 2013-01-08 19:06 ` Florian Philipp
2013-01-08 20:57 ` Joshua Murphy
2013-01-08 21:49 ` Alan McKinnon
2013-01-08 19:11 ` [gentoo-user] " James
1 sibling, 2 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 19:06 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]
On 08.01.2013 18:35, Volker Armin Hemmann wrote:
> On Tuesday, 8 January 2013, 08:27:51, Florian Philipp wrote:
>> On 08.01.2013 00:20, Alan McKinnon wrote:
>>> On Mon, 07 Jan 2013 21:11:35 +0100
>>>
>>> Florian Philipp <lists@binarywings.net> wrote:
>>>> Hi list!
>>>>
>>>> I have a use case where I am seriously concerned about bit rot [1]
>>>> and I thought it might be a good idea to start looking for it in my
>>>> own private stuff, too.
>>
>> [...]
>>
>>>> [1] http://en.wikipedia.org/wiki/Bit_rot
[...]
>>> If you mean disk file corruption, then doing it file by file is a
>>> colossal waste of time IMNSHO. You likely have >1,000,000 files. Are
>>> you really going to md5sum each one daily? Really?
>>
>> Well, not daily but often enough that I likely still have a valid copy
>> as a backup.
>
> and who guarantees that the backup is the correct file?
>
That's why I wanted to store md5sum (or sha2sums).
> btw, the solution is zfs and weekly scrub runs.
>
Seems so.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 17:35 ` [gentoo-user] " Volker Armin Hemmann
2013-01-08 19:06 ` Florian Philipp
@ 2013-01-08 19:11 ` James
2013-01-09 4:40 ` Volker Armin Hemmann
1 sibling, 1 reply; 39+ messages in thread
From: James @ 2013-01-08 19:11 UTC (permalink / raw
To: gentoo-user
Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
> btw, the solution is zfs and weekly scrub runs.
I have similar concerns as Florian.
What's the consensus (opinions) on btrfs?
I'm building several new systems with large video file
archives. I was going to try BTRFS on the entire
workstation. However, several comments here seem to suggest
making a partition for video using BTRFS
and using ext4 for the rest of the system?
I intend to keep duplicates on another system, via rsync, until such time
as BTRFS is stable by consensus.
Bitrot (silent corruption?) is a concern for video that is to be mostly archived
and only accessed once every few years: similar to the poster's original concern.
Comments/guidance on ZFS vs BTRFS are welcome. I have never used ZFS; googling
suggests lots of disdain for ZFS? Maybe someone knows a good article
or wiki discussion where the various merits of the currently available file
systems are presented?
http://en.wikipedia.org/wiki/ZFS
James
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 17:41 ` Pandu Poluan
2013-01-08 19:02 ` Florian Philipp
@ 2013-01-08 19:53 ` Grant Edwards
2013-01-08 20:30 ` Florian Philipp
2013-01-08 21:45 ` Alan McKinnon
1 sibling, 2 replies; 39+ messages in thread
From: Grant Edwards @ 2013-01-08 19:53 UTC (permalink / raw
To: gentoo-user
On 2013-01-08, Pandu Poluan <pandu@poluan.info> wrote:
> On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net> wrote:
>>
>
> -- snip --
>
>>
>> Hmm, good idea, albeit similar to the `md5sum -c`. Either tool leaves
>> you with the problem of distinguishing between legitimate changes (i.e.
>> a user wrote to the file) and decay.
>>
>> When you have completely static content, md5sum, rsync and friends are
>> sufficient. But if you have content that changes from time to time, the
>> number of false-positives would be too high. In this case, I think you
>> could easily distinguish by comparing both file content and time stamps.
>>
>> Now, that of course introduces the problem that decay could occur in the
>> same time frame as a legitimate change, thus masking the decay. To
>> reduce this risk, you have to reduce the checking interval.
>>
>> Regards,
>> Florian Philipp
>
> IMO, we're all barking up the wrong tree here...
>
> Before a file's content can change without user involvement, bit rot must
> first get through the checksum (CRC?) of the hard disk itself. There will
> be no 'gradual degradation of data', just 'catastrophic data loss'.
When a hard drive starts to fail, you don't unknowingly get back
"rotten" data with some bits flipped. You get either a "seek error"
or "read error", and no data at all. IIRC, the same is true for
attempts to read a failing CD.
However, if you've got failing RAM that doesn't have hardware ECC,
that often appears as corrupted data in files. If a bit gets
erroneously flipped in a RAM page that's being used to cache file
data, and that page is marked as dirty, then the erroneous bits will
get written back to disk just like the rest of them.
--
Grant Edwards grant.b.edwards Yow! ... he dominates the
at DECADENT SUBWAY SCENE.
gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 19:53 ` [gentoo-user] " Grant Edwards
@ 2013-01-08 20:30 ` Florian Philipp
2013-01-08 21:45 ` Alan McKinnon
1 sibling, 0 replies; 39+ messages in thread
From: Florian Philipp @ 2013-01-08 20:30 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]
On 08.01.2013 20:53, Grant Edwards wrote:
> On 2013-01-08, Pandu Poluan <pandu@poluan.info> wrote:
>> On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net> wrote:
>>>
>>
>> -- snip --
>>
>>>
>>> Hmm, good idea, albeit similar to the `md5sum -c`. Either tool leaves
>>> you with the problem of distinguishing between legitimate changes (i.e.
>>> a user wrote to the file) and decay.
>>>
>>> When you have completely static content, md5sum, rsync and friends are
>>> sufficient. But if you have content that changes from time to time, the
>>> number of false-positives would be too high. In this case, I think you
>>> could easily distinguish by comparing both file content and time stamps.
>>>
>>> Now, that of course introduces the problem that decay could occur in the
>>> same time frame as a legitimate change, thus masking the decay. To
>>> reduce this risk, you have to reduce the checking interval.
>>>
>>> Regards,
>>> Florian Philipp
>>
>> IMO, we're all barking up the wrong tree here...
>>
>> Before a file's content can change without user involvement, bit rot must
>> first get through the checksum (CRC?) of the hard disk itself. There will
>> be no 'gradual degradation of data', just 'catastrophic data loss'.
>
> When a hard drive starts to fail, you don't unknowingly get back
> "rotten" data with some bits flipped. You get either a "seek error"
> or "read error", and no data at all. IIRC, the same is true for
> attempts to read a failing CD.
>
> However, if you've got failing RAM that doesn't have hardware ECC,
> that often appears as corrupted data in files. If a bit gets
> erroneously flipped in a RAM page that's being used to cache file
> data, and that page is marked as dirty, then the erroneous bits will
> get written back to disk just like the rest of them.
>
Related: The guys in [1] observed md5sums of data and noticed all kinds
of issues: bit rot, temporary controller issues and so on.
[1] Schwarz et al.: Disk Failure Investigations at the Internet Archive
http://www.hpl.hp.com/personal/Mary_Baker/publications/wip.pdf
Regards,
Florian Philipp
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 19:06 ` Florian Philipp
@ 2013-01-08 20:57 ` Joshua Murphy
2013-01-08 21:49 ` Alan McKinnon
1 sibling, 0 replies; 39+ messages in thread
From: Joshua Murphy @ 2013-01-08 20:57 UTC (permalink / raw
To: gentoo-user
On Tue, Jan 8, 2013 at 2:06 PM, Florian Philipp <lists@binarywings.net> wrote:
> On 08.01.2013 18:35, Volker Armin Hemmann wrote:
>> On Tuesday, 8 January 2013, 08:27:51, Florian Philipp wrote:
>>> On 08.01.2013 00:20, Alan McKinnon wrote:
>>>> On Mon, 07 Jan 2013 21:11:35 +0100
>>>>
>>>> Florian Philipp <lists@binarywings.net> wrote:
>>>>> Hi list!
>>>>>
>>>>> I have a use case where I am seriously concerned about bit rot [1]
>>>>> and I thought it might be a good idea to start looking for it in my
>>>>> own private stuff, too.
>>>
>>> [...]
>>>
>>>>> [1] http://en.wikipedia.org/wiki/Bit_rot
> [...]
>>>> If you mean disk file corruption, then doing it file by file is a
>>>> colossal waste of time IMNSHO. You likely have >1,000,000 files. Are
>>>> you really going to md5sum each one daily? Really?
>>>
>>> Well, not daily but often enough that I likely still have a valid copy
>>> as a backup.
>>
>> and who guarantees that the backup is the correct file?
>>
>
> That's why I wanted to store md5sum (or sha2sums).
>
>> btw, the solution is zfs and weekly scrub runs.
>>
>
> Seems so.
>
And, while it's not exceptionally likely, there's always a possibility
that the checksum table, rather than the file being checked itself, is
the location of the corruption, meaning you have to verify that as
well when discrepancies occur. The likelihood of the perfect few bits
flipping to match the corrupted data with a corrupted hash, within the
time between checks, however, I would think is low enough to gamble on
never seeing it in a reasonable lifetime.
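One way to notice a damaged checksum table before trusting it is to
seal the table with a digest of its own contents. The following is only
a sketch; `dump_manifest`, `load_manifest` and the JSON layout are
hypothetical, not part of any tool mentioned in this thread:

```python
import hashlib
import json


def dump_manifest(checksums):
    """Serialize a path -> digest table, prefixed by a digest of the table."""
    body = json.dumps(checksums, sort_keys=True)
    seal = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return seal + "\n" + body


def load_manifest(text):
    """Return the table, or raise ValueError if the table itself is damaged."""
    seal, _, body = text.partition("\n")
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != seal:
        raise ValueError("checksum table is corrupt; do not trust its entries")
    return json.loads(body)
```

If `load_manifest` raises, the discrepancy is in the table (or its seal)
rather than in the files it describes, so the table should be restored
from another copy before any file is declared rotten.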
--
Poison [BLX]
Joshua M. Murphy
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 19:53 ` [gentoo-user] " Grant Edwards
2013-01-08 20:30 ` Florian Philipp
@ 2013-01-08 21:45 ` Alan McKinnon
2013-01-08 22:15 ` Grant Edwards
1 sibling, 1 reply; 39+ messages in thread
From: Alan McKinnon @ 2013-01-08 21:45 UTC (permalink / raw
To: gentoo-user
On Tue, 8 Jan 2013 19:53:41 +0000 (UTC)
Grant Edwards <grant.b.edwards@gmail.com> wrote:
> On 2013-01-08, Pandu Poluan <pandu@poluan.info> wrote:
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net>
> > wrote:
> >>
> >
> > -- snip --
> >
> >>
> >> Hmm, good idea, albeit similar to the `md5sum -c`. Either tool
> >> leaves you with the problem of distinguishing between legitimate
> >> changes (i.e. a user wrote to the file) and decay.
> >>
> >> When you have completely static content, md5sum, rsync and friends
> >> are sufficient. But if you have content that changes from time to
> >> time, the number of false-positives would be too high. In this
> >> case, I think you could easily distinguish by comparing both file
> >> content and time stamps.
> >>
> >> Now, that of course introduces the problem that decay could occur
> >> in the same time frame as a legitimate change, thus masking the
> >> decay. To reduce this risk, you have to reduce the checking
> >> interval.
> >>
> >> Regards,
> >> Florian Philipp
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit
> > rot must first get through the checksum (CRC?) of the hard disk
> > itself. There will be no 'gradual degradation of data', just
> > 'catastrophic data loss'.
>
> When a hard drive starts to fail, you don't unknowingly get back
> "rotten" data with some bits flipped. You get either a "seek error"
> or "read error", and no data at all. IIRC, the same is true for
> attempts to read a failing CD.
I see what Florian is getting at here, and he's perfectly correct.
We techie types often like to think our storage is purely binary, the
cells are either on or off and they never change unless we
deliberately make them change. We think this way because we wrap our
storage in layers to make it look that way, in the style of an API.
The truth is that our storage is subject to decay. Harddrives are
magnetic at heart, and atoms have to align and stay aligned for the
drive to work. Floppies are infinitely worse at this, but drives are
not immune. Writeable CDs do not have physical pits and lands like
factory original discs have, they use chemicals to make reflective and
non-reflective spots. The list of points of corruption is long and
they all happen after the data has been committed to physical storage.
Worse, you only know about the corruption by reading it, there is no
other way to discover if the medium and the data are still OK. He wants
to read the medium occasionally and verify it while the backups are
still usable, and not wait for the point of no return - the "read error"
from a medium that long since failed.
Maybe Florian's data is valuable enough to warrant the effort. I
know mine isn't, but his might be.
--
Alan McKinnon
alan.mckinnon@gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 19:06 ` Florian Philipp
2013-01-08 20:57 ` Joshua Murphy
@ 2013-01-08 21:49 ` Alan McKinnon
1 sibling, 0 replies; 39+ messages in thread
From: Alan McKinnon @ 2013-01-08 21:49 UTC (permalink / raw
To: gentoo-user
On Tue, 08 Jan 2013 20:06:25 +0100
Florian Philipp <lists@binarywings.net> wrote:
> On 08.01.2013 18:35, Volker Armin Hemmann wrote:
> > On Tuesday, 8 January 2013, 08:27:51, Florian Philipp wrote:
> >> On 08.01.2013 00:20, Alan McKinnon wrote:
> >>> On Mon, 07 Jan 2013 21:11:35 +0100
> >>>
> >>> Florian Philipp <lists@binarywings.net> wrote:
> >>>> Hi list!
> >>>>
> >>>> I have a use case where I am seriously concerned about bit rot
> >>>> [1] and I thought it might be a good idea to start looking for
> >>>> it in my own private stuff, too.
> >>
> >> [...]
> >>
> >>>> [1] http://en.wikipedia.org/wiki/Bit_rot
> [...]
> >>> If you mean disk file corruption, then doing it file by file is a
> >>> colossal waste of time IMNSHO. You likely have >1,000,000 files.
> >>> Are you really going to md5sum each one daily? Really?
> >>
> >> Well, not daily but often enough that I likely still have a valid
> >> copy as a backup.
> >
> > and who guarantees that the backup is the correct file?
> >
>
> That's why I wanted to store md5sum (or sha2sums).
Watch out for circular problems - you will likely store the md5sum on
the same medium type you are trying to validate. Which means the md5sum
is just as unreliable as the data itself :-)
Interesting factoid: we long since passed the point where there is a
statistically good chance of cosmic rays flipping bits in a RAID that is
being rebuilt *before* the rebuild is complete. Usually we all just
pretend it's not like this and hope we'll get lucky. Usually this works out
fine because we are lucky.
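To put a rough number on that kind of risk, here is a back-of-the-envelope
sketch. The per-bit error rate and the disk size are assumed purely for
illustration (they are not taken from this thread or from any datasheet),
and `p_any_error` is a made-up helper name:

```python
import math


def p_any_error(bit_error_rate, bits_read):
    """Probability of at least one bit error: 1 - (1 - p)^n, computed stably."""
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))


# Assumed numbers for illustration only: one error per 1e14 bits read,
# and a rebuild that has to read a full 2 TB (2e12 bytes) disk.
risk = p_any_error(1e-14, 2e12 * 8)
print(f"chance of at least one error during rebuild: {risk:.1%}")
```

With these assumed inputs the result comes out at roughly 15%, and it
grows with every additional disk that has to be read in full.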
--
Alan McKinnon
alan.mckinnon@gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 21:45 ` Alan McKinnon
@ 2013-01-08 22:15 ` Grant Edwards
2013-01-08 23:37 ` Alan McKinnon
0 siblings, 1 reply; 39+ messages in thread
From: Grant Edwards @ 2013-01-08 22:15 UTC (permalink / raw
To: gentoo-user
On 2013-01-08, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>> When a hard drive starts to fail, you don't unknowingly get back
>> "rotten" data with some bits flipped. You get either a "seek error"
>> or "read error", and no data at all. IIRC, the same is true for
>> attempts to read a failing CD.
>
> I see what Florian is getting at here, and he's perfectly correct.
>
> We techie types often like to think our storage is purely binary, the
> cells are either on or off and they never change unless we
> deliberately make them change. We think this way because we wrap our
> storage in layers to make it look that way, in the style of an API.
>
> The truth is that our storage is subject to decay. Harddrives are
> magnetic at heart, and atoms have to align and stay aligned for the
> drive to work. Floppies are infinitely worse at this, but drives are
> not immune. Writeable CDs do not have physical pits and lands like
> factory original discs have, they use chemicals to make reflective and
> non-reflective spots. The list of points of corruption is long and
> they all happen after the data has been committed to physical storage.
True. But, in my experience, the chances of any of those failures
resulting in a successful read of incorrect data are vanishingly small.
> Worse, you only know about the corruption by reading it, there is no
> other way to discover if the medium and the data are still OK. He
> wants to read the medium occasionally
That may be a good idea, and will detect media failures.
> and verify it
That's the part I think is pointless in practice (if you're trying to
detect failing media).
> while the backups are still usable, and not wait for the point of no
> return - the "read error" from a medium that long since failed.
My point is that _comparing_data_to_a_backup_ just isn't a useful,
practical way to detect failing hard drives, optical drives, or CDs.
I've seen a lot of hard drives, optical drives, floppy drives,
floppies, and CDs fail. The failure mode in every case has been a "seek
error" or "read error" resulting in _no_data_ rather than a read
returning erroneous data.
It seems that in laboratory conditions, people have managed to see
erroneous data, but I'm not convinced worrying about it is worthwhile.
IMO, having backup data _is_ very valuable, but regularly reading
files and comparing them to backup copies isn't a useful way to detect
failing media.
You're much more likely to detect failing RAM (which is useful, but
there are better ways to do it).
--
Grant Edwards grant.b.edwards Yow! I think I am an
at overnight sensation right
gmail.com now!!
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 22:15 ` Grant Edwards
@ 2013-01-08 23:37 ` Alan McKinnon
2013-01-09 2:47 ` Grant Edwards
0 siblings, 1 reply; 39+ messages in thread
From: Alan McKinnon @ 2013-01-08 23:37 UTC (permalink / raw
To: gentoo-user
On Tue, 8 Jan 2013 22:15:15 +0000 (UTC)
Grant Edwards <grant.b.edwards@gmail.com> wrote:
> IMO, having backup data _is_ very valuable, but regularly reading
> files and comparing them to backup copies isn't a useful way to detect
> failing media.
He doesn't suggest you compare the live data to a backup. He suggests
you compare the current checksum to the last known (presumed or
verified as good) checksum, and if they are different then deal with it.
"deal with it" likely involves a restore after some kind of verify
process.
I agree that comparing current data with a backup is pretty pointless -
you don't know which is the bad one if they differ.
ZFS is designed to deal with this problem by checksumming fs blocks
continually; it does this at the filesystem level, not at the disk
firmware level. Pity about the license incompatibility, it's a great fs.
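The idea of checksumming per block rather than per file can be
illustrated in user space. This is a toy sketch only; it is not ZFS's
actual on-disk format, and the block size and function names are
assumptions made for the example:

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def damaged_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose digest no longer matches."""
    current = block_digests(data, block_size)
    return [i for i, (a, b) in enumerate(zip(current, expected)) if a != b]
```

The benefit over a single per-file checksum is that a mismatch points at
one block, so only that block has to be fetched from a redundant copy.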
--
Alan McKinnon
alan.mckinnon@gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 7:27 ` Florian Philipp
` (2 preceding siblings ...)
2013-01-08 17:35 ` [gentoo-user] " Volker Armin Hemmann
@ 2013-01-09 0:12 ` Randy Barlow
3 siblings, 0 replies; 39+ messages in thread
From: Randy Barlow @ 2013-01-09 0:12 UTC (permalink / raw
To: gentoo-user
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 01/08/2013 02:27 AM, Florian Philipp wrote:
> As I said above, the point is that I need to detect the error as
> long as I still have a valid backup. Professional archive solutions
> do this on their own but I'm looking for something suitable for
> desktop usage.
I wouldn't recommend using it for real data yet, but I do believe that
btrfs will be a good way to get what you want in the future if you use
their RAID-like feature. That filesystem is still experimental, but it
is very cool and I am excited about it.
- --
R
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAlDstgsACgkQw3vjPfF7QfWH7wCfcGyy2J1rP9cNQSiPwl66PJDG
bKkAn3j8DMiORrwZ3MrFhebw5en6GA0q
=n9/9
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 23:37 ` Alan McKinnon
@ 2013-01-09 2:47 ` Grant Edwards
2013-01-09 8:31 ` Alan McKinnon
0 siblings, 1 reply; 39+ messages in thread
From: Grant Edwards @ 2013-01-09 2:47 UTC (permalink / raw
To: gentoo-user
On 2013-01-08, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> On Tue, 8 Jan 2013 22:15:15 +0000 (UTC)
> Grant Edwards <grant.b.edwards@gmail.com> wrote:
>
>> IMO, having backup data _is_ very valuable, but regularly reading
>> files and comparing them to backup copies isn't a useful way to detect
>> failing media.
>
> He doesn't suggest you compare the live data to a backup. He suggests
> you compare the current checksum to the last known (presumed or
> verified as good) checksum,
My point is that comparing the read data with <whatever> is a waste of
time if you're worried about detecting media failure. In my
experience, you don't _get_ erroneous data from failing media. You
get seek/read failures.
> and if they are different then deal with it. "deal with it" likely
> involves a restore after some kind of verify process.
>
> I agree that comparing current data with a backup is pretty pointless -
> you don't know which is the bad one if they differ.
>
> ZFS is designed to deal with this problem by checksumming fs blocks
> continually; it does this at the filesystem level, not at the disk
> firmware level.
I don't understand. If you're worried about media failure, what good
does checksumming at the file level do when failing media produces
seek/read errors rather than erroneous data? When the media fails,
there is no data to checksum.
--
Grant
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] OT: Fighting bit rot
2013-01-08 19:02 ` Florian Philipp
@ 2013-01-09 2:55 ` Pandu Poluan
0 siblings, 0 replies; 39+ messages in thread
From: Pandu Poluan @ 2013-01-09 2:55 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 3095 bytes --]
On Jan 9, 2013 2:06 AM, "Florian Philipp" <lists@binarywings.net> wrote:
>
> On 08.01.2013 18:41, Pandu Poluan wrote:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net
> > <mailto:lists@binarywings.net>> wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends are
> >> sufficient. But if you have content that changes from time to time, the
> >> number of false-positives would be too high. In this case, I think you
> >> could easily distinguish by comparing both file content and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic data
> > loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a well
> researched topic [1-3]. CRCs are not perfectly reliable. The trick is to
> detect and correct errors while you still have valid backups or other
> types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing which is usually just
> the same as what I proposed (plus whatever on-access integrity checks
> the platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are always
> > restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
> backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: Some data that I have kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
> http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A fresh look at the reliability of long-term digital
> storage
> http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
> Drives
> http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4]
> http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>
Interesting reads... thanks for the link!
Hmm... if I were in your position, I think this is what I would do:
1. Make a set of MD5 'checksums', one per file for ease of update.
2. Compare the checksums with the actual files before opening a file. If
mismatch, notify.
3. When file handle is closed, recalculate.
Protect the set of MD5 periodically using par2.
Also protect your backups using par2, for that matter (that's what I always
do when I archive something to optical media).
Of course, you can outright use par2 to protect and ECC your data, but the
time needed to generate the .par files *every time* would be too much,
methinks...
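Steps 1 to 3 could be sketched roughly like this. The `.md5` sidecar
naming and the `checked_open` wrapper are hypothetical, invented for the
example; the par2 protection of the checksum files is left out:

```python
import contextlib
import hashlib
import os


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


@contextlib.contextmanager
def checked_open(path, mode="rb"):
    """Open a file, verifying its sidecar checksum first and refreshing it after."""
    sidecar = path + ".md5"  # hypothetical scheme: one checksum file per data file
    if os.path.exists(path) and os.path.exists(sidecar):
        with open(sidecar) as s:
            if s.read().strip() != md5_of(path):
                raise OSError("checksum mismatch for " + path)
    with open(path, mode) as f:
        yield f
    # The handle is closed at this point; record what is on disk now.
    with open(sidecar, "w") as s:
        s.write(md5_of(path) + "\n")
```

Used as `with checked_open("video.mkv") as f: ...`, this only protects
accesses that go through the wrapper; anything that writes to the file
directly would leave a stale checksum behind.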
Rgds,
--
[-- Attachment #2: Type: text/html, Size: 4325 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-08 19:11 ` [gentoo-user] " James
@ 2013-01-09 4:40 ` Volker Armin Hemmann
2013-01-09 15:17 ` walt
0 siblings, 1 reply; 39+ messages in thread
From: Volker Armin Hemmann @ 2013-01-09 4:40 UTC (permalink / raw
To: gentoo-user; +Cc: James
On Tuesday, 8 January 2013, 19:11:19, James wrote:
> Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
> Comments/guidance on ZFS vs BTRFS are welcome. I never used ZFS; googling
> suggests lots of disdain for ZFS ? Maybe someone knows a good article
> or wiki discussion where the various merits of the currently available file
> systems are presented?
does btrfs support raid levels other than 1?
zfs does. Is freaking easy to set up and to use. Can handle swap files and
supports dedup.
is not linux-only.
--
#163933
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 2:47 ` Grant Edwards
@ 2013-01-09 8:31 ` Alan McKinnon
2013-01-09 14:48 ` Grant Edwards
0 siblings, 1 reply; 39+ messages in thread
From: Alan McKinnon @ 2013-01-09 8:31 UTC (permalink / raw
To: gentoo-user
On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
Grant Edwards <grant.b.edwards@gmail.com> wrote:
> > ZFS is designed to deal with this problem by checksumming fs blocks
> > continually; it does this at the filesystem level, not at the disk
> > firmware level.
>
> I don't understand. If you're worried about media failure, what good
> does checksumming at the file level do when failing media produces
> seek/read errors rather than erroneous data? When the media fails,
> there is no data to checksum.
Not file level - it's filesystem level. It checksums filesystem blocks.
And we are not talking about failing media either, we are talking about
media corruption. You appear to have conflated them.
The data on a medium can corrupt, and it can corrupt silently for a
long time. At some point it may deteriorate to where it passes a cusp
and then you will get your first visible sign - read failure. You did
not see anything that happened prior as it was silent.
--
Alan McKinnon
alan.mckinnon@gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 8:31 ` Alan McKinnon
@ 2013-01-09 14:48 ` Grant Edwards
2013-01-09 15:36 ` Holger Hoffstaette
` (2 more replies)
0 siblings, 3 replies; 39+ messages in thread
From: Grant Edwards @ 2013-01-09 14:48 UTC (permalink / raw
To: gentoo-user
On 2013-01-09, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
> The data on a medium can corrupt, and it can corrupt silently for a
> long time.
And I'm saying I've never seen that happen.
So you're saying that the data on a medium can corrupt without being
detected by the block encodings and CRCs used by the disk controller?
> At some point it may deteriorate to where it passes a cusp
> and then you will get your first visible sign
No, the first visible sign in the scenario you're describing would be
a read returning erroneous data.
> - read failure. You did not see anything that happened prior as it
> was silent.
If a read successfully returns correct data, how is it "silent"?
--
Grant Edwards grant.b.edwards Yow! Someone in DAYTON,
at Ohio is selling USED
gmail.com CARPETS to a SERBO-CROATIAN
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 4:40 ` Volker Armin Hemmann
@ 2013-01-09 15:17 ` walt
2013-01-09 18:57 ` Volker Armin Hemmann
0 siblings, 1 reply; 39+ messages in thread
From: walt @ 2013-01-09 15:17 UTC (permalink / raw
To: gentoo-user
On 01/08/2013 08:40 PM, Volker Armin Hemmann wrote:
> On Tuesday, 8 January 2013, 19:11:19, James wrote:
>> Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
>
>> Comments/guidance on ZFS vs BTRFS are welcome. I never used ZFS; googling
>> suggests lots of disdain for ZFS ? Maybe someone knows a good article
>> or wiki discussion where the various merits of the currently available file
>> systems are presented?
>
> does btrfs support raid levels other than 1?
>
> zfs does. Is freaking easy to set up and to use. Can handle swap files and
> supports dedup.
> is not linux-only.
Are you using the gentoo zfs and zfs-kmod packages to get zfs support? Are
they ready for prime time?
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 14:48 ` Grant Edwards
@ 2013-01-09 15:36 ` Holger Hoffstaette
2013-01-09 16:32 ` Pandu Poluan
2013-01-09 16:42 ` Grant Edwards
2013-01-09 20:52 ` Alan McKinnon
2013-01-09 20:53 ` Alan McKinnon
2 siblings, 2 replies; 39+ messages in thread
From: Holger Hoffstaette @ 2013-01-09 15:36 UTC (permalink / raw
To: gentoo-user
On Wed, 09 Jan 2013 14:48:33 +0000, Grant Edwards wrote:
> On 2013-01-09, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>> On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
>
>> The data on a medium can corrupt, and it can corrupt silently for a long
>> time.
>
> And I'm saying I've never seen that happen.
Well, that's the point.
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
Things are much worse than you think.
-h
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 15:36 ` Holger Hoffstaette
@ 2013-01-09 16:32 ` Pandu Poluan
2013-01-09 16:42 ` Grant Edwards
1 sibling, 0 replies; 39+ messages in thread
From: Pandu Poluan @ 2013-01-09 16:32 UTC (permalink / raw
To: gentoo-user
[-- Attachment #1: Type: text/plain, Size: 693 bytes --]
On Jan 9, 2013 10:41 PM, "Holger Hoffstaette" <
holger.hoffstaette@googlemail.com> wrote:
>
> On Wed, 09 Jan 2013 14:48:33 +0000, Grant Edwards wrote:
>
> > On 2013-01-09, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> >> On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
> >
> >> The data on a medium can corrupt, and it can corrupt silently for a long
> >> time.
> >
> > And I'm saying I've never seen that happen.
>
> Well, that's the point.
>
> http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
>
> Things are much worse than you think.
>
> -h
>
But that link also shows a bright light: RAID arrays are in general more
reliable, with BER less than one-third their spec.
Rgds,
--
[-- Attachment #2: Type: text/html, Size: 1097 bytes --]
^ permalink raw reply [flat|nested] 39+ messages in thread
* [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 15:36 ` Holger Hoffstaette
2013-01-09 16:32 ` Pandu Poluan
@ 2013-01-09 16:42 ` Grant Edwards
1 sibling, 0 replies; 39+ messages in thread
From: Grant Edwards @ 2013-01-09 16:42 UTC (permalink / raw
To: gentoo-user
On 2013-01-09, Holger Hoffstaette <holger.hoffstaette@googlemail.com> wrote:
> On Wed, 09 Jan 2013 14:48:33 +0000, Grant Edwards wrote:
>
>> On 2013-01-09, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
>>> On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
>>
>>> The data on a medium can corrupt, and it can corrupt silently for a long
>>> time.
>>
>> And I'm saying I've never seen that happen.
>
> Well, that's the point.
>
> http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
>
> Things are much worse than you think.
Apparently so. It looks like anybody who's not using ECC RAM is
already in a pretty hopeless situation. :/
--
Grant Edwards grant.b.edwards Yow! I request a weekend in
at Havana with Phil Silvers!
gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 15:17 ` walt
@ 2013-01-09 18:57 ` Volker Armin Hemmann
0 siblings, 0 replies; 39+ messages in thread
From: Volker Armin Hemmann @ 2013-01-09 18:57 UTC (permalink / raw
To: gentoo-user; +Cc: walt
On Wednesday, 9 January 2013, 07:17:25, walt wrote:
> On 01/08/2013 08:40 PM, Volker Armin Hemmann wrote:
> > On Tuesday, 8 January 2013, 19:11:19, James wrote:
> >> Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
> >>
> >> Comments/guidance on ZFS vs Btrfs are welcome. I never used ZFS; googling
> >> suggests lots of disdain for ZFS? Maybe someone knows a good article
> >> or wiki discussion where the various merits of the currently available
> >> file systems are presented?
> >
> > does btrfs support raid levels other than 1?
> >
> > zfs does. It is freaking easy to set up and to use, can handle swap files,
> > and supports dedup.
> > It is not linux-only.
>
> Are you using the gentoo zfs and zfs-kmod packages to get zfs support?
yes.
> Are
> they ready for prime time?
they work for me. I don't use latest-and-greatest kernels and I use vanilla
kernel.org sources.
zpool status
  pool: zfstank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 3h23m with 9 errors on Sat Jan 5 05:47:34 2013
config:

        NAME                                            STATE   READ  WRITE  CKSUM
        zfstank                                         ONLINE     0      0      0
          raidz1-0                                      ONLINE     0      0      0
            ata-Hitachi_HDS5C3020ALA632_ML4230FA17X6EK  ONLINE     0      0      0
            ata-Hitachi_HDS5C3020ALA632_ML4230FA17X6HK  ONLINE     0      0      0
            ata-Hitachi_HDS5C3020ALA632_ML4230FA17X7YK  ONLINE     0      0      0

errors: 9 data errors, use '-v' for a list
Those errors were caused by a memory glitch (and they are video files, so I
don't even care about them; they are also still on two different backup
media). But zfs caught these errors. ext4? I really doubt it would have.
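
For anyone who wants a cron job to complain when a scrub reports damage, here is a minimal sketch of one way to pull the interesting fields out of saved `zpool status` text. It is not part of ZFS: the function name summarize_zpool_status and the SAMPLE string are made up for illustration, and the parsing assumes the field layout seen in the output above (including the wording "N data errors"), which other versions may not match.

```python
import re


def summarize_zpool_status(text):
    """Extract pool name, state and data-error count from `zpool status` text.

    Assumes the "pool:", "state:" and "errors:" field layout shown in the
    message above; anything else is ignored.
    """
    summary = {"pool": None, "state": None, "data_errors": 0}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("pool:"):
            summary["pool"] = line.split(":", 1)[1].strip()
        elif line.startswith("state:"):
            summary["state"] = line.split(":", 1)[1].strip()
        elif line.startswith("errors:"):
            # A healthy pool prints "No known data errors", so the count
            # simply stays at 0 when no number is found.
            match = re.search(r"(\d+) data errors", line)
            if match:
                summary["data_errors"] = int(match.group(1))
    return summary


# Hypothetical sample, trimmed from the output quoted in this message.
SAMPLE = """\
  pool: zfstank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
  scan: scrub repaired 0 in 3h23m with 9 errors on Sat Jan 5 05:47:34 2013
errors: 9 data errors, use '-v' for a list
"""

if __name__ == "__main__":
    print(summarize_zpool_status(SAMPLE))
```

A wrapper could feed it the captured output of the command and send mail whenever data_errors is non-zero; how the text is captured is left open here on purpose.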
--
#163933
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [gentoo-user] Re: OT: Fighting bit rot
2013-01-09 14:48 ` Grant Edwards
2013-01-09 15:36 ` Holger Hoffstaette
@ 2013-01-09 20:52 ` Alan McKinnon
2013-01-09 20:53 ` Alan McKinnon
2 siblings, 0 replies; 39+ messages in thread
From: Alan McKinnon @ 2013-01-09 20:52 UTC (permalink / raw
To: gentoo-user
On Wed, 9 Jan 2013 14:48:33 +0000 (UTC)
Grant Edwards <grant.b.edwards@gmail.com> wrote:
> On 2013-01-09, Alan McKinnon <alan.mckinnon@gmail.com> wrote:
> > On Wed, 9 Jan 2013 02:47:07 +0000 (UTC)
>
> > The data on a medium can corrupt, and it can corrupt silently for a
> > long time.
>
> And I'm saying I've never seen that happen.
>
> So you're saying that the data on a medium can corrupt without being
> detected by the block encodings and CRCs used by the disk controller?
No, I'm not saying that *at*all*
I've been saying all along that data which you never go near can
corrupt silently while you are not using it. When you do eventually get
around to reading it, the electronics will do what they are designed to
do and properly detect a problem that already happened.
>
> > At some point it may deteriorate to where it passes a cusp
> > and then you will get your first visible sign
>
> No, the first visible sign in the scenario you're describing would be
> a read returning erroneous data.
That's what I said. The first VISIBLE sign is an error. You want to
catch it before then.
Analogy time: A murderer plans to do Grant in. By observing Grant and only
observing Grant, the first visible sign of an issue is the death of
Grant. Obviously this is suboptimal and we should be looking at a few
more things than just Grant in order to preserve Grant.
>
> > - read failure. You did not see anything that happened prior as it
> > was silent.
>
> If a read successfully returns correct data, how is it "silent"?
I never used those words and never said "successfully returns correct
data". At best I said something equivalent to "If a read returns".
The point I'm trying hard to make is that all our fancy hardware merely
gives an *appearance* of reliable results that are totally right or
totally wrong. It looks that way because the IT industry spent much
time creating wrappers and APIs to give that effect. Under the covers
where the actual storage happens it is not like that, and errors can
happen. They are rare.
Lucky for us, these days we have precision machinery and clever
mathematics that reduce the problem vastly. I know in my own case the
electronics offer a reliability that far exceeds what I need so I can
afford to ignore rare problems. Other people have different needs.
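
For those with different needs, "catching it before then" boils down to what Florian proposed at the top of the thread: read everything on a schedule, keep a checksum and timestamp per file, and complain when the checksum changed but the timestamp did not. A minimal sketch, assuming SHA-256 as the checksum and a plain dict as the record store; the names file_sha256 and scan are hypothetical and this is not an existing tool:

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, records):
    """Walk root and compare every file against the previous records.

    records maps path -> {"mtime": int, "sha256": str} from the last run.
    Returns (suspects, updated_records). A file is a suspect when it cannot
    be read, or when its checksum changed although its mtime did not.
    """
    suspects = []
    updated = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            old = records.get(path)
            try:
                mtime = os.stat(path).st_mtime_ns
                checksum = file_sha256(path)
            except OSError:
                # The read itself failed: the "first visible sign".
                suspects.append(path)
                if old is not None:
                    updated[path] = old
                continue
            if old and old["mtime"] == mtime and old["sha256"] != checksum:
                # Same timestamp, different content: possible bit rot.
                # Keep the old record so it is reported again next run.
                suspects.append(path)
                updated[path] = old
            else:
                updated[path] = {"mtime": mtime, "sha256": checksum}
    return suspects, updated
```

Run from cron, it would load the records from the previous run, call scan, save the updated records, and notify the user about any suspects so they can restore from backup. Reading every file also forces the drive to re-check sectors that would otherwise sit untouched.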
--
Alan McKinnon
alan.mckinnon@gmail.com
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2013-01-09 20:55 UTC | newest]
Thread overview: 39+ messages
2013-01-07 20:11 [gentoo-user] OT: Fighting bit rot Florian Philipp
2013-01-07 21:07 ` Paul Hartman
2013-01-07 22:05 ` Florian Philipp
2013-01-07 21:33 ` Michael Mol
2013-01-07 22:10 ` Florian Philipp
2013-01-07 23:20 ` Alan McKinnon
2013-01-08 7:27 ` Florian Philipp
2013-01-08 7:55 ` Alan McKinnon
2013-01-08 16:16 ` Florian Philipp
2013-01-08 16:42 ` Alan McKinnon
2013-01-08 17:41 ` Pandu Poluan
2013-01-08 19:02 ` Florian Philipp
2013-01-09 2:55 ` Pandu Poluan
2013-01-08 19:53 ` [gentoo-user] " Grant Edwards
2013-01-08 20:30 ` Florian Philipp
2013-01-08 21:45 ` Alan McKinnon
2013-01-08 22:15 ` Grant Edwards
2013-01-08 23:37 ` Alan McKinnon
2013-01-09 2:47 ` Grant Edwards
2013-01-09 8:31 ` Alan McKinnon
2013-01-09 14:48 ` Grant Edwards
2013-01-09 15:36 ` Holger Hoffstaette
2013-01-09 16:32 ` Pandu Poluan
2013-01-09 16:42 ` Grant Edwards
2013-01-09 20:52 ` Alan McKinnon
2013-01-09 20:53 ` Alan McKinnon
2013-01-08 15:29 ` Grant Edwards
2013-01-08 15:42 ` Michael Mol
2013-01-08 16:28 ` Florian Philipp
2013-01-08 17:35 ` [gentoo-user] " Volker Armin Hemmann
2013-01-08 19:06 ` Florian Philipp
2013-01-08 20:57 ` Joshua Murphy
2013-01-08 21:49 ` Alan McKinnon
2013-01-08 19:11 ` [gentoo-user] " James
2013-01-09 4:40 ` Volker Armin Hemmann
2013-01-09 15:17 ` walt
2013-01-09 18:57 ` Volker Armin Hemmann
2013-01-09 0:12 ` [gentoo-user] " Randy Barlow
2013-01-07 23:31 ` William Kenworthy