* [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-24 11:06 UTC
To: gentoo-user
Hi there,
I'm in the habit of backing up customer data by booting from knoppix,
connecting a portable hard-drive and copying with `cp -rvf`.
When this has finished I connect the portable hard-drive to my
desktop machine, copy the directory of data from it to my homedir,
and make a zip file of the directory.
I've done this loads in the past, and never been aware of any file
corruption, but I guess I'm just paranoid today. Perhaps I shouldn't
use the -v flags during my copy - it's reassuring to see the files
being copied, but what if I overlooked a bunch of errors in the
middle of all those thousands of "copied successfully" confirmations?
What if something has gone wrong during one of the two copies?
So my question is:
Is there any way to check the integrity of copied directories, to be
sure that none of the files or sub-directories in them have become
damaged during transfer? I'm thinking of something like md5sum for
directories.
It occurred to me that one could run `find . -type f -exec md5sum \{} \; > file.txt`
on both machines and diff the outputs, but some of these directories contain
many thousands of files, and I'd imagine that md5summing all of these could
take some time.
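A minimal sketch of that idea, assuming GNU find, sort, xargs and md5sum (as on Knoppix or Gentoo); the helper name make_manifest is hypothetical. Two details matter when the lists come from different machines: the paths have to be relative, so that the original's location and the copy's location compare equal, and the list has to be sorted, because find reports entries in directory order, which can differ between the original filesystem and the copy.

```shell
#!/bin/sh
# make_manifest DIR: print "md5  ./relative/path" for every regular file
# under DIR, in a fixed order, so manifests from two machines can be diffed.
# (make_manifest is a hypothetical helper name, not an existing tool.)
make_manifest() {
    (
        cd "$1" || exit 1
        # -print0 / sort -z / xargs -0 keep names containing spaces intact;
        # LC_ALL=C makes the sort order independent of the user's locale;
        # -r stops xargs from running md5sum at all if there are no files.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r md5sum
    )
}
```

For example, run `make_manifest /path/to/original > original.md5` and `make_manifest /path/to/copy > copy.md5` (placeholder paths), then `diff original.md5 copy.md5`: no output means every file was found with the same checksum. This still reads every byte of both copies, so it does not remove the run-time concern.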
Does anyone have any suggestions, please?
Stroller.
--
gentoo-user@lists.gentoo.org mailing list
* Re: [gentoo-user] md5sum for directories?
From: Etaoin Shrdlu @ 2008-02-24 11:46 UTC
To: gentoo-user
On Sunday 24 February 2008, Stroller wrote:
> I've done this loads in the past, and never been aware of any file
> corruption, but I guess I'm just paranoid today. Perhaps I shouldn't
> use the -v flags during my copy - it's reassuring to see the files
> being copied, but what if I overlooked a bunch of errors in the
> middle of all those thousands of "copied successfully" confirmations?
> What if something has gone wrong during one of the two copies?
Well, in that case cp will have a nonzero exit status. Look:
$ ls -l
total 12
-rw-r--r-- 1 kermit users 4 2008-02-24 12:30 a
-rw-r--r-- 1 kermit users 12 2008-02-24 12:30 b
drwxr-xr-x 2 kermit users 4096 2008-02-24 12:30 destdir
$ ls -l destdir
total 0
$ chmod 000 b
$ ls -l
total 12
-rw-r--r-- 1 kermit users 4 2008-02-24 12:30 a
---------- 1 kermit users 12 2008-02-24 12:30 b
drwxr-xr-x 2 kermit users 4096 2008-02-24 12:30 destdir
$ cp a b destdir
cp: cannot open `b' for reading: Permission denied
$ echo $?
1
$ ls -l destdir
total 4
-rw-r--r-- 1 kermit users 4 2008-02-24 12:31 a
I think this should hold for the majority of cases/errors cp might
encounter during the copy.
Of course, this does not detect a successful, but somehow corrupted, copy
(which should be exceptionally rare, anyway).
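Acting on that, a minimal sketch (assuming GNU cp; copy_and_check and its log-file argument are hypothetical) that diverts error messages to a file and tests cp's exit status, rather than relying on spotting errors among thousands of -v lines:

```shell
#!/bin/sh
# copy_and_check SRC DEST LOG: copy SRC to DEST recursively, sending only
# error messages (stderr) to LOG, then act on cp's exit status.
# (copy_and_check is a hypothetical helper name.)
copy_and_check() {
    if cp -rf "$1" "$2" 2>"$3"; then
        echo "cp reported no errors"
        return 0
    fi
    echo "cp failed; the errors are in $3" >&2
    return 1
}
```

If -v is added back to the cp line, its progress messages go to standard output while the errors still land in the log. A zero exit status only means cp saw no errors; it does not prove the data arrived intact.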
> So my question is:
>
> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.
I'm not aware of any such tool (which might exist nonetheless, of
course). However, on the filesystem, the objects that we
call "directories" are just index files holding filenames and pointers
to inodes. Running a checksum on the directories themselves would not
guarantee against corruption of any of the contained files, since file
data is not contained in the directory. Thus, to be accurate, such a
tool would have to scan the directory, find each file, and perform a
checksum on it, which would result in something not much different from
the find command you suggested, in terms of resource usage.
* Re: [gentoo-user] md5sum for directories?
From: Neil Bothwick @ 2008-02-24 14:29 UTC
To: gentoo-user
On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:
> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.
Diff?
diff -r /source /dest
will return no output if the two copies are identical.
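A small sketch around that command, assuming GNU diff (trees_identical is a hypothetical name). With -q, diff only names the files that differ ("Files X and Y differ", "Only in ...") instead of printing the differences, which also makes it suitable for binary files, and its exit status gives a yes/no answer that a script can test:

```shell
#!/bin/sh
# trees_identical DIR1 DIR2: succeed only if both trees contain the same
# files with the same contents. (trees_identical is a hypothetical name.)
trees_identical() {
    # -r: recurse into subdirectories; -q: just report which files differ
    # or exist on one side only, without listing the differences themselves.
    diff -rq "$1" "$2"
    # diff's exit status is returned unchanged: 0 = no differences,
    # 1 = differences found, 2 = trouble (e.g. an unreadable file).
}
```

Both trees have to be reachable from the same machine, e.g. the portable drive mounted alongside the copy in the home directory.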
--
Neil Bothwick
Never underestimate the bandwidth of a station wagon full of tapes!
* Re: [gentoo-user] md5sum for directories?
From: cabbage @ 2008-02-24 16:39 UTC
To: gentoo-user
Can diff be used for binary files?
On Sun, Feb 24, 2008 at 10:29 PM, Neil Bothwick <neil@digimed.co.uk> wrote:
> On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:
>
> > Is there any way to check the integrity of copied directories, to be
> > sure that none of the files or sub-directories in them have become
> > damaged during transfer? I'm thinking of something like md5sum for
> > directories.
>
> Diff?
>
> diff -r /source /dest
> will return no output if the two copies are identical.
>
>
> --
> Neil Bothwick
>
> Never underestimate the bandwidth of a station wagon full of tapes!
>
* Re: [gentoo-user] md5sum for directories?
From: Dirk Heinrichs @ 2008-02-24 16:46 UTC
To: gentoo-user
On Sunday, 24 February 2008, cabbage wrote:
> Can diff be used for binary files?
If you just want to know "different or not", sure.
Bye...
Dirk
* Re: [gentoo-user] md5sum for directories?
From: Andrew Gaydenko @ 2008-02-24 16:49 UTC
To: gentoo-user
Hi!
======= On Sunday 24 February 2008, you wrote: =======
...
> > On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:
> > > Is there any way to check the integrity of copied directories, to
> > > be sure that none of the files or sub-directories in them have
> > > become damaged during transfer? I'm thinking of something like
> > > md5sum for directories.
I use this script to check how DVD-data were written:
nice -n 15 find "$1"/* -type f -print0 | sort -z | xargs -0 cat | md5sum -b
Don't ask me how it works - I forgot :-) But it works.
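For anyone wondering how it works, here is essentially the same pipeline as a commented function (tree_digest is a hypothetical name; GNU find, sort and xargs are assumed). It differs in two small ways: it searches "$1" itself rather than "$1"/*, so top-level dot-files are included, and it sorts in the C locale so the order does not depend on locale settings. The whole tree is reduced to one checksum of all file contents concatenated in name order; since names and file boundaries are not part of that checksum, it cannot tell apart, for instance, bytes moved from the end of one file to the start of the next.

```shell
#!/bin/sh
# tree_digest DIR: a single md5 over the contents of every regular file in DIR.
# (tree_digest is a hypothetical name for the pipeline quoted above.)
tree_digest() {
    # 1. find lists every regular file, NUL-terminated so odd names survive.
    # 2. sort -z puts the names in a fixed order (C locale for repeatability).
    # 3. xargs -0 cat streams all file contents one after another.
    # 4. md5sum -b checksums that single stream (shown as "-" for stdin).
    find "$1" -type f -print0 | LC_ALL=C sort -z | xargs -0 cat | md5sum -b
}
```

Running it on the original and on the copy and comparing the two hashes is then a one-line check.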
* Re: [gentoo-user] md5sum for directories?
From: Christopher Copeland @ 2008-02-24 19:46 UTC
To: gentoo-user
On 24 Feb 2008, at 06:06, Stroller wrote:
> So my question is:
>
> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.
I use rsync for this and would suggest you look into it. You can tell
it to compare files based on checksum (which is slower) and the real
beauty is that if there is a file that is corrupt or otherwise not the
same as the source it will copy just that single file to your backup
disk. Test it by deleting a random file somewhere in the backup tree,
then rerun your rsync command: the file is copied back.
man rsync
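A minimal sketch of using rsync purely as a checker (rsync_check is a hypothetical name). -c compares files by checksum instead of by size and modification time, -n turns the run into a dry run, and -i prints one line per file that would be transferred, so the output names the files that differ from, or are missing in, the copy:

```shell
#!/bin/sh
# rsync_check SRC DEST: list files whose contents differ between the two
# trees, or which are missing from DEST, without modifying anything.
# (rsync_check is a hypothetical helper name.)
rsync_check() {
    # -r  recurse
    # -c  decide by checksum rather than by size + modification time
    # -n  dry run: report only, transfer nothing
    # -i  itemize: print one line for each file that would be updated
    # The trailing slashes compare the *contents* of the two directories.
    rsync -rcni "$1"/ "$2"/
}
```

Files that exist only in the copy are not listed; adding --delete (while keeping -n) should make rsync report them as well. Dropping -n turns the check into the repair described above.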
--
Christopher
* [gentoo-user] Re: md5sum for directories?
From: »Q« @ 2008-02-24 21:15 UTC
To: gentoo-user
Stroller <stroller@stellar.eclipse.co.uk> wrote:
> I'm thinking of something like md5sum for directories.
I think you may have gotten better solutions for your situation, but
md5deep (in portage) is like md5sum but with directory recursion.
* Re: [gentoo-user] Re: md5sum for directories?
From: Mick @ 2008-02-26 19:59 UTC
To: gentoo-user
On Sunday 24 February 2008, »Q« wrote:
> Stroller <stroller@stellar.eclipse.co.uk> wrote:
> > I'm thinking of something like md5sum for directories.
>
> I think you may have gotten better solutions for your situation, but
> md5deep (in portage) is like md5sum but with directory recursion.
I'm probably not suggesting anything you don't already know, but just in case:
Although rsync is a superior tool made just for the job, I more
often use tar instead of either rsync or cp. This is because when I back up
a complete fs I use whichever LiveCD I have at hand (usually Knoppix), which
doesn't always have rsync on it. Anyway, the tar command has the option -d,
which diffs the contents of the archive against the original fs, if you want
to see what happened after the archive was written, or want to decide whether
it is time (or worth it) to make a fresher backup. Alternatively, and more
appropriately if you run this as part of a backup process, there is the -W
option. From the man page:
-W, --verify
attempt to verify the archive after writing it
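A sketch of both options, assuming GNU tar (backup_verified and compare_with_disk are hypothetical names). GNU tar will not verify a compressed archive, so -W is used without -z or -j here; and because -d compares member names relative to the current directory, it should be run from the directory the archive was created in:

```shell
#!/bin/sh
# Hypothetical helpers around GNU tar's verify (-W) and compare (-d) modes.

# backup_verified ARCHIVE DIR: write DIR into ARCHIVE, then read the
# archive back and check it against the files just written.
backup_verified() {
    tar -cWf "$1" "$2"
}

# compare_with_disk ARCHIVE: report members whose contents, size or
# timestamps no longer match the files on disk; exit status 1 if any differ.
compare_with_disk() {
    tar -df "$1"
}
```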
HTH.
--
Regards,
Mick
* Re: [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-27 0:38 UTC
To: gentoo-user
On 24 Feb 2008, at 11:46, Etaoin Shrdlu wrote:
> On Sunday 24 February 2008, Stroller wrote:
>
>> I've done this loads in the past, and never been aware of any file
>> corruption, but I guess I'm just paranoid today. Perhaps I shouldn't
>> use the -v flags during my copy - it's reassuring to see the files
>> being copied, but what if I overlooked a bunch of errors in the
>> middle of all those thousands of "copied successfully" confirmations?
>> What if something has gone wrong during one of the two copies?
>
> Well, in that case cp will have a nonzero exit status. Look:
>
> ...
> $ cp a b destdir
> cp: cannot open `b' for reading: Permission denied
> $ echo $?
> 1
> ...
> I think this should hold for the majority of cases/errors cp might
> encounter during the copy.
Good point. I should have checked this when I first made the copy
using cp, and will do so in the future.
> Of course, this does not detect a successful, but somehow corrupted,
> copy (which should be exceptionally rare, anyway).
Well perhaps I'm just being paranoid today.
But how do I know that a successful, but somehow corrupted, copy has
not occurred?
What makes you confident that these are rare? I don't ask this to be
antagonistic, just to increase my own confidence in the `cp` command.
>> Is there any way to check the integrity of copied directories, to be
>> sure that none of the files or sub-directories in them have become
>> damaged during transfer? I'm thinking of something like md5sum for
>> directories.
>
> I'm not aware of any such tool (which might exist nonetheless, of
> course). However, on the filesystem, the objects that we
> call "directories" are just index files holding filenames and pointers
> to inodes. Running a checksum on the directories themselves would not
> guarantee against corruption of any of the contained files, since file
> data is not contained in the directory.
Naturally.
Perhaps I should have phrased my question differently: "Is there any
way to recursively check the integrity of copied directories of
files?" However, I hoped the words "to be sure that none of the files or
sub-directories in them have become damaged during transfer" made that clear.
> Thus, to be accurate, such a
> tool would have to scan the directory, find each file, and perform a
> checksum on it, which would result in something not much different
> from
> the find command you suggested, in terms of resource usage.
I have to admit that I haven't run this command and I don't have any
idea what its actual resource usage would be. I guess I'd be happy
with a lower grade of checksumming, if it would reduce the runtime to
acceptable levels. With md5sum one can be - barring certain malicious
external attacks - quite certain that a copied file is identical to
the original. I would be happy with a "the file's there and it looks
ok" level of confidence.
Stroller.
* Re: [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-27 0:51 UTC
To: gentoo-user
On 24 Feb 2008, at 19:46, Christopher Copeland wrote:
> On 24 Feb 2008, at 06:06, Stroller wrote:
>
>> So my question is:
>>
>> Is there any way to check the integrity of copied directories, to
>> be sure that none of the files or sub-directories in them have
>> become damaged during transfer? I'm thinking of something like
>> md5sum for directories.
>
> I use rsync for this and would suggest you look into it. You can
> tell it to compare files based on checksum (which is slower) and
> the real beauty is that if there is a file that is corrupt or
> otherwise not the same as the source it will copy just that single
> file to your backup disk. Test it by deleting a random file
> somewhere in the backup tree.. rerun your rsync command and the
> file is copied back.
>
> man rsync
Thanks. I think this has been suggested before for my backups - IIRC
it has a useful --ignore-path or --exclude-path option, which can
ensure you get all the users' Documents & Settings, without the useless
temp & "Temporary Internet Files".
I've just tried `rsync -vrchi` on a pair of subdirectories ("My
Documents") of the backup I made last week, and on those it seems to run
in acceptable time. I got little output, however, so I have deleted a
couple of files from the destination (I should perhaps write some
random data to another) and am running it again in anticipation of
some "copying /a/b/c/file /x/y/z/file" output.
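On the idea of writing random data into a file: deleting a file only tests whether missing files are noticed. The sketch below (corrupt_for_testing is a hypothetical name; GNU dd and touch are assumed, and the file must be at least 16 bytes long) damages a file while keeping its size and modification time, which is exactly the case that rsync's default size-and-mtime quick check cannot see and that -c exists for. Only run it on a throwaway copy.

```shell
#!/bin/sh
# corrupt_for_testing FILE: overwrite the first 16 bytes of FILE with random
# data, keeping its size and modification time. Use on a throwaway copy only!
# (corrupt_for_testing is a hypothetical helper; FILE must be >= 16 bytes.)
corrupt_for_testing() {
    ref=$(mktemp) || return 1
    touch -r "$1" "$ref"             # save FILE's timestamps on $ref
    # conv=notrunc overwrites in place instead of truncating the file
    dd if=/dev/urandom of="$1" bs=16 count=1 conv=notrunc 2>/dev/null
    status=$?
    touch -r "$ref" "$1"             # restore the saved timestamps
    rm -f "$ref"
    return "$status"
}
```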
I appreciate your help,
Stroller.
* Re: [gentoo-user] md5sum for directories?
From: Christopher Copeland @ 2008-02-27 2:36 UTC
To: gentoo-user
On 26 Feb 2008, at 19:51, Stroller wrote:
> Thanks. I think this has been suggested before for my backups - IIRC
> it has a useful --ignore-path or --exclude-path option, which can
> ensure you get all the users' Documents & Settings, without the useless
> temp & "Temporary Internet Files".
>
rsync has excellent control over what is copied via the include and
exclude options.
> I've just tried `rsync -vrchi` on a pair of subdirectories ("My
> Documents") of the backup I made last week, and on those it seems to run
> in acceptable time. I got little output, however, so have deleted a
> couple of files from the destination (I should perhaps write some
> random data to another) and am running it again in anticipation of
> some "copying /a/b/c/file /x/y/z/file" output.
>
When I run rsync interactively I usually add --stats and --progress to
the command. Those will give you more feedback.
> I appreciate your help,
Least I could do, and if I hadn't mentioned it I am sure someone else
would have. This is a gentoo list ;-)
--
Christopher
* Re: [gentoo-user] md5sum for directories?
From: Etaoin Shrdlu @ 2008-02-27 9:40 UTC
To: gentoo-user
On Wednesday 27 February 2008, Stroller wrote:
> > Of course, this does not detect a successful, but somehow corrupted,
> > copy (which should be exceptionally rare, anyway).
>
> Well perhaps I'm just being paranoid today.
> But how do I know that a successful, but somehow corrupted, copy has
> not occurred?
>
> What makes you confident that these are rare? I don't ask this to be
> antagonistic, just to increase my own confidence in the `cp` command.
Ah well, I have no statistics here. But I can say that such a thing has
never occurred to me in the past (or at least, if it occurred, I did not
notice it). Not a definitive proof, I know; rather, just my
experience. You are of course free to not trust me and, if you're truly
paranoid, you probably should do so :-)
> I have to admit that I haven't run this command and I don't have any
> idea what its actual resource usage would be. I guess I'd be happy
> with a lower grade of checksumming, if it would reduce the runtime to
> acceptable levels. With md5sum one can be - barring certain malicious
> external attacks - quite certain that a copied file is identical to
> the original. I would be happy with a "the file's there and it looks
> ok" level of confidence.
Well, md5deep has already been suggested. If you are content with a
lower-grade checksumming, you could write your own script that compares
file lengths and calculates checksums only on the first n and last m
bytes of each file, for some reasonable values of n and m (bigger is
better, as you'd guess). This is what backuppc (an excellent backup
software) does when it has to decide whether a file has changed (and
thus has to be backed up) compared with the copy stored in the backup
pool.
Read this for more info:
http://backuppc.sourceforge.net/faq/BackupPC.html#some_design_issues
"The hashing function" paragraph. Do note that, of course, this method is
not 100% accurate: it might report false negatives if the corruption is
in the middle of the file and the file length did not change.
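A sketch of such a script, in the spirit of BackupPC's approach but not a copy of its hashing function: each file is summarised as its size plus the md5 of its first n and last m bytes. quick_sig and quick_manifest are hypothetical names, the 128 KiB defaults are an arbitrary choice, head -c (GNU or BSD) is assumed, and quick_manifest does not cope with file names containing newlines.

```shell
#!/bin/sh
# quick_sig FILE [N] [M]: print "size md5(first N bytes) md5(last M bytes)".
# (Hypothetical helper; the 128 KiB defaults are arbitrary.)
quick_sig() {
    n=${2:-131072}
    m=${3:-131072}
    size=$(( $(wc -c < "$1") ))   # $(( )) strips any padding spaces
    first=$(head -c "$n" "$1" | md5sum | cut -d' ' -f1)
    last=$(tail -c "$m" "$1" | md5sum | cut -d' ' -f1)
    echo "$size $first $last"
}

# quick_manifest DIR: one "signature ./path" line per regular file, sorted
# in the C locale so two manifests can be compared with diff.
quick_manifest() {
    (
        cd "$1" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r path; do
            printf '%s %s\n' "$(quick_sig "$path")" "$path"
        done
    )
}
```

Diffing the quick_manifest output of the original and the copy then reads at most n + m bytes per file instead of whole files, with the blind spot described above.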