From: Florian Philipp
Date: Tue, 08 Jan 2013 20:02:49 +0100
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] OT: Fighting bit rot
Message-ID: <50EC6D59.1090809@binarywings.net>

On 08.01.2013 18:41, Pandu Poluan wrote:
>
> On Jan 8, 2013 11:20 PM, "Florian Philipp" wrote:
>>
>
> -- snip --
>
[...]
>>
>> When you have completely static content, md5sum, rsync and friends are
>> sufficient. But if you have content that changes from time to time, the
>> number of false-positives would be too high. In this case, I think you
>> could easily distinguish by comparing both file content and time stamps.
>>
[...]
>
> IMO, we're all barking up the wrong tree here...
>
> Before a file's content can change without user involvement, bit rot
> must first get through the checksum (CRC?) of the hard disk itself.
> There will be no 'gradual degradation of data', just 'catastrophic data
> loss'.
>

Unfortunately, that's only partly true. Latent disk errors are a
well-researched topic [1-3]. CRCs are not perfectly reliable.
The trick is to detect and correct errors while you still have valid
backups or other types of redundancy. The only way to do this is regular
scrubbing. That's why professional archival solutions offer some kind of
self-healing, which is usually just the same as what I proposed (plus
whatever on-access integrity checks the platform supports) [4].

> I would rather focus my efforts on ensuring that my backups are always
> restorable, at least until the most recent time of archival.
>

That's the point:
a) You have to detect when you have to restore from backup.
b) You have to verify that the backup itself is still valid.
c) You have to avoid situations where undetected errors creep into the
backup.

I'm not talking about a purely theoretical possibility. I have
experienced just that: Some data that I have kept lying around for years
was corrupted.

[1] Schwarz et al.: Disk Scrubbing in Large, Archival Storage Systems
http://www.cse.scu.edu/~tschwarz/Papers/mascots04.pdf
[2] Baker et al.: A fresh look at the reliability of long-term digital
storage
http://arxiv.org/pdf/cs/0508130
[3] Bairavasundaram et al.: An Analysis of Latent Sector Errors in Disk
Drives
http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/11.1.pdf
[4]
http://uk.emc.com/collateral/analyst-reports/kci-evaluation-of-emc-centera.pdf

Regards,
Florian Philipp
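
For illustration only, here is a minimal sketch of the scrubbing check
discussed in this thread: compare both file content and time stamps
against a record from the previous run. It is not the tool proposed
earlier in the thread. The names scrub and file_digest, the manifest
layout (relative path -> (digest, mtime_ns)) and the use of SHA-256
instead of md5sum are all assumptions made for the example. The
heuristic also assumes that legitimate edits update the modification
time, which is not guaranteed for every program.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scrub(root, manifest):
    """Compare files under root with the manifest of the previous run.

    manifest maps relative path -> (digest, mtime_ns); this layout is
    hypothetical. Returns (new_manifest, suspects, missing).
    """
    new_manifest = {}
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            mtime_ns = os.stat(full).st_mtime_ns
            digest = file_digest(full)
            old = manifest.get(rel)
            if old is not None and old[0] != digest and old[1] == mtime_ns:
                # Content changed but the time stamp did not:
                # treat as possible silent corruption.
                suspects.append(rel)
                # Keep the last known-good record so the file stays flagged.
                new_manifest[rel] = old
            else:
                # New file, unchanged file, or a change with a new time
                # stamp (assumed to be a legitimate edit): record it.
                new_manifest[rel] = (digest, mtime_ns)
    # Files recorded last time that no longer exist.
    missing = sorted(set(manifest) - set(new_manifest))
    return new_manifest, sorted(suspects), missing
```

A first run with an empty manifest only records the current state.
Later runs return the suspects that should be restored from backup
while the backup is still good. Running the same check against the
backup itself covers point b) above.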