Date: Tue, 8 Jan 2013 18:42:34 +0200
From: Alan McKinnon
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] OT: Fighting bit rot

On Tue, 08 Jan 2013 17:16:32 +0100
Florian Philipp wrote:

> On 08.01.2013 08:55, Alan McKinnon wrote:
> > On Tue, 08 Jan 2013 08:27:51 +0100
> > Florian Philipp wrote:
> >
> [...]
> >>
> >> As I said above, the point is that I need to detect the error as
> >> long as I still have a valid backup. Professional archive
> >> solutions do this on their own but I'm looking for something
> >> suitable for desktop usage.
> >
> > rsync might be able to give you something close to what you want
> > easily
> >
> > Use the -n switch for an rsync between your originals and the last
> > backup copy, and mail the output to yourself. Parse it looking for
> > ">" and "<" symbols and investigate why the file changed.
> >
> > This strikes me as being a very easy solution that you could use
> > reliably with a suitable combination of options.
> >
>
> Hmm, good idea, albeit similar to the `md5sum -c`.
> Either tool leaves you with the problem of distinguishing between
> legitimate changes (i.e. a user wrote to the file) and decay.
>
> When you have completely static content, md5sum, rsync and friends are
> sufficient. But if you have content that changes from time to time,
> the number of false-positives would be too high. In this case, I
> think you could easily distinguish by comparing both file content and
> time stamps.
>
> Now, that of course introduces the problem that decay could occur in
> the same time frame as a legitimate change, thus masking the decay. To
> reduce this risk, you have to reduce the checking interval.

I think your basic problem is that you are trying to detect a rare
event (corruption) that looks exactly like a common event (edits you
intended to make).

I don't know how to tell these apart except by somehow recording which
files have been written to (inotify is useful for this) and removing
those from the list of things rsync says have changed.

All of which leads to a massively complex lump of code that is sure to
cause many more problems than it is designed to solve...

I'm afraid I don't have any real solution to offer.

-- 
Alan McKinnon
alan.mckinnon@gmail.com
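
For reference, the "compare both file content and time stamps" idea quoted above could be sketched roughly as follows. This is only an illustration, not a tested tool: the function names (`file_digest`, `scan`, `compare`) are made up for this example, and it assumes that every legitimate write updates the file's mtime while silent decay on disk does not. As noted in the thread, decay that happens in the same interval as a real edit would still be masked.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Record mtime and content digest for every file under root.

    Keys are paths relative to root, so two scans can be compared.
    """
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = {
                "mtime_ns": st.st_mtime_ns,
                "sha256": file_digest(path),
            }
    return state


def compare(old, new):
    """Split changed files into (suspect, modified).

    suspect:  content differs but mtime is unchanged (possible decay).
    modified: content and mtime both differ (assumed to be a real edit).
    Files that are new, deleted, or unchanged are ignored here.
    """
    suspect, modified = [], []
    for rel, cur in new.items():
        prev = old.get(rel)
        if prev is None or prev["sha256"] == cur["sha256"]:
            continue
        if prev["mtime_ns"] == cur["mtime_ns"]:
            suspect.append(rel)
        else:
            modified.append(rel)
    return sorted(suspect), sorted(modified)
```

One would keep the result of `scan()` from the previous run (for example as a JSON file), run `scan()` again from cron, and mail out only the `suspect` list. The shorter the interval between runs, the smaller the window in which an edit can hide decay.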