Subject: Re: [gentoo-user] OT: Fighting bit rot
From: Pandu Poluan
To: gentoo-user@lists.gentoo.org
Date: Wed, 9 Jan 2013 09:55:55 +0700
In-Reply-To: <50EC6D59.1090809@binarywings.net>

On Jan 9, 2013 2:06 AM, "Florian Philipp" <lists@binarywings.net> wrote:
>
> Am 08.01.2013 18:41, schrieb Pandu Poluan:
> >
> > On Jan 8, 2013 11:20 PM, "Florian Philipp" <lists@binarywings.net> wrote:
> >>
> >
> > -- snip --
> >
> [...]
> >>
> >> When you have completely static content, md5sum, rsync and friends are
> >> sufficient. But if you have content that changes from time to time, the
> >> number of false-positives would be too high. In this case, I think you
> >> could easily distinguish by comparing both file content and time stamps.
> >>
> [...]
> >
> > IMO, we're all barking up the wrong tree here...
> >
> > Before a file's content can change without user involvement, bit rot
> > must first get through the checksum (CRC?) of the hard disk itself.
> > There will be no 'gradual degradation of data', just 'catastrophic data
> > loss'.
> >
>
> Unfortunately, that's only partly true. Latent disk errors are a well
> researched topic [1-3]. CRCs are not perfectly reliable. The trick is to
> detect and correct errors while you still have valid backups or other
> types of redundancy.
>
> The only way to do this is regular scrubbing. That's why professional
> archival solutions offer some kind of self-healing which is usually just
> the same as what I proposed (plus whatever on-access integrity checks
> the platform supports) [4].
>
> > I would rather focus my efforts on ensuring that my backups are always
> > restorable, at least until the most recent time of archival.
> >
>
> That's the point:
> a) You have to detect when you have to restore from backup.
> b) You have to verify that the backup itself is still valid.
> c) You have to avoid situations where undetected errors creep into the
> backup.
>
> I'm not talking about a purely theoretical possibility. I have
> experienced just that: Some data that I have kept lying around for years
> was corrupted.
>
> [1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
> http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
>
> [2] Baker et al.: A fresh look at the reliability of long-term digital
> storage
> http://arxiv.org/pdf/cs/0508130
>
> [3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
> Drives
> http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
>
> [4]
> http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf
>
> Regards,
> Florian Philipp
>

Interesting reads... thanks for the links!

Hmm... if I were in your position, I think this is what I'd do:

1. Make a set of MD5 checksums, one per file, for ease of update.
2. Before opening a file, compare its checksum against the actual content. On mismatch, notify.
3. When the file handle is closed, recalculate the checksum.
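A minimal sketch of those three steps with plain coreutils (hedged: the file name `file.dat` and the one-checksum-file-per-data-file layout are just made-up examples for illustration):

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
echo "important data" > file.dat

# 1. One MD5 checksum file per data file, so an update touches only one entry.
md5sum file.dat > file.dat.md5

# 2. Before opening the file, verify content against the stored checksum;
#    notify on mismatch (--quiet prints only failures).
md5sum -c --quiet file.dat.md5 || echo "WARNING: file.dat does not match its checksum"

# 3. After the file handle is closed (here: after an append), recalculate.
echo "more data" >> file.dat
md5sum file.dat > file.dat.md5
md5sum -c --quiet file.dat.md5
```

The per-file layout is the point of step 1: when one file legitimately changes, you regenerate one small checksum file instead of re-hashing the whole tree.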

Periodically protect the set of MD5 checksums using par2.

Also protect your backups using par2, for that matter (that's what I always do when I archive something to optical media).
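The par2 cycle on an archive looks roughly like this (hedged sketch: `archive.tar` is a made-up name, the 10% redundancy level is an arbitrary example, and exact flags may vary between par2cmdline versions):

```shell
#!/bin/sh
set -e
# Skip gracefully on machines without par2cmdline installed.
command -v par2 >/dev/null 2>&1 || { echo "par2 not installed"; exit 0; }
cd "$(mktemp -d)"
echo "example archive contents" > archive.tar   # stand-in for a real backup

# Create recovery blocks with ~10% redundancy next to the archive.
par2 create -r10 archive.tar.par2 archive.tar

# Later, as part of a scrub: check whether the archive has rotted.
par2 verify archive.tar.par2

# Only if verify reports damage: reconstruct from the recovery blocks.
# par2 repair archive.tar.par2
```

The same `create`/`verify` pair works on the checksum set itself, which is what makes the scheme self-checking end to end.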

Of course, you could use par2 outright to protect and error-correct the data itself, but the time needed to regenerate the .par2 files *every time* a file changes would be too much, methinks...

Rgds,
--
