On Jan 9, 2013 2:06 AM, "Florian Philipp" wrote:
>
> On 08.01.2013 18:41, Pandu Poluan wrote:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends
> >> are sufficient. But if you have content that changes from time to
> >> time, the number of false positives would be too high. In this case,
> >> I think you could easily distinguish by comparing both file content
> >> and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic
> > data loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a
> well-researched topic [1-3]. CRCs are not perfectly reliable. The trick
> is to detect and correct errors while you still have valid backups or
> other types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing, which is usually
> just the same as what I proposed (plus whatever on-access integrity
> checks the platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are
> > always restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
>    backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: some data that I had kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
>     http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A fresh look at the reliability of long-term digital
>     storage
>     http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
>     Drives
>     http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4] http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>

Interesting reads... thanks for the links!

Hmm... if I were in your position, I think this is what I'd do:

1. Make a set of MD5 checksums, one per file, for ease of updating.
2. Compare a file's stored checksum against its actual content before
   opening it. If they don't match, notify.
3. When the file handle is closed, recalculate and store the new
   checksum.

Protect the set of MD5 checksums periodically using par2 (rough
sketches of both ideas are in the P.S. below). Also protect your
backups with par2, for that matter (that's what I always do when I
archive something to optical media).

Of course, you could use par2 outright to protect and error-correct
(ECC) your data, but the time needed to regenerate the .par2 files
*every time* would be too much, methinks...

Rgds,
--
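
P.S. A rough sketch of what I mean with steps 1-3, as a shell wrapper
around md5sum. The <file>.md5 naming and the $EDITOR call are just
illustrations, not a fixed recipe:

  #!/bin/sh
  # Per-file MD5 bookkeeping: verify before use, refresh after changes.
  f="$1"          # the file we want to open
  sum="$f.md5"    # one checksum file per data file, for easy updates

  # (1) create the checksum if it doesn't exist yet
  [ -f "$sum" ] || md5sum "$f" > "$sum"

  # (2) verify before opening; notify on mismatch
  if ! md5sum -c --quiet "$sum"; then
      echo "WARNING: checksum mismatch on $f -- possible corruption" >&2
      exit 1
  fi

  # open/edit the file (stand-in for whatever actually uses it)
  "${EDITOR:-vi}" "$f"

  # (3) the content may have changed legitimately; recalculate on close
  md5sum "$f" > "$sum"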
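
And the par2 part would look roughly like this (assuming par2cmdline;
the 10% redundancy and the file names are just examples, adjust to
taste):

  # create recovery blocks for the whole checksum set (or your archive)
  par2 create -r10 checksums.par2 *.md5

  # later, during a periodic scrub: verify, and repair only if needed
  par2 verify checksums.par2 || par2 repair checksums.par2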