* [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-24 11:06 UTC
To: gentoo-user

Hi there,

I'm in the habit of backing up customer data by booting from Knoppix, connecting a portable hard drive and copying with `cp -rvf`. When this has finished I connect the portable hard drive to my desktop machine, copy the directory of data from it to my homedir, and make a zip file of the directory.

I've done this loads in the past, and never been aware of any file corruption, but I guess I'm just paranoid today. Perhaps I shouldn't use the -v flag during my copy - it's reassuring to see the files being copied, but what if I overlooked a bunch of errors in the middle of all those thousands of "copied successfully" confirmations? What if something has gone wrong during one of the two copies?

So my question is:

Is there any way to check the integrity of copied directories, to be sure that none of the files or sub-directories in them have become damaged during transfer? I'm thinking of something like md5sum for directories.

It occurred to me that one could run `find . -type f -exec md5sum \{} \; > file.txt` on both machines and diff the outputs, but some of these directories contain many thousands of files, and I'd imagine that md5summing all of these could take some time.

Does anyone have any suggestions, please?

Stroller.
--
gentoo-user@lists.gentoo.org mailing list
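The find-and-diff idea from the message above can be sketched as a pair of shell functions. This is only an illustration, assuming GNU `md5sum` and `mktemp` are available; the function names `make_manifest` and `compare_trees` are made up for this example. Using `-exec ... {} +` hands many files to each `md5sum` invocation instead of starting one process per file, although the run time is still dominated by reading every byte of both trees.

```shell
#!/bin/sh
# make_manifest DIR (hypothetical helper): print "checksum  ./relative/path"
# for every regular file under DIR, sorted by path so that two manifests
# can be compared line by line.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# compare_trees SRC DEST (hypothetical helper): print any differing manifest
# lines and return 0 only if both trees hold the same files with the same
# checksums.
compare_trees() {
    src_list=$(mktemp) && dest_list=$(mktemp) || return 2
    make_manifest "$1" > "$src_list"
    make_manifest "$2" > "$dest_list"
    status=0
    diff "$src_list" "$dest_list" || status=$?
    rm -f "$src_list" "$dest_list"
    return "$status"
}
```

Because the manifests use paths relative to each tree, the two directories may live on different machines: run `make_manifest` on each, copy one manifest across, and `diff` the two files.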
* Re: [gentoo-user] md5sum for directories?
From: Etaoin Shrdlu @ 2008-02-24 11:46 UTC
To: gentoo-user

On Sunday 24 February 2008, Stroller wrote:

> I've done this loads in the past, and never been aware of any file
> corruption, but I guess I'm just paranoid today. Perhaps I shouldn't
> use the -v flags during my copy - it's reassuring to see the files
> being copied, but what if I overlooked a bunch of errors in the
> middle of all those thousands of "copied successfully" confirmations?
> What if something has gone wrong during one of the two copies?

Well, in that case cp will have a nonzero exit status. Look:

    $ ls -l
    total 12
    -rw-r--r-- 1 kermit users    4 2008-02-24 12:30 a
    -rw-r--r-- 1 kermit users   12 2008-02-24 12:30 b
    drwxr-xr-x 2 kermit users 4096 2008-02-24 12:30 destdir
    $ ls -l destdir
    total 0
    $ chmod 000 b
    $ ls -l
    total 12
    -rw-r--r-- 1 kermit users    4 2008-02-24 12:30 a
    ---------- 1 kermit users   12 2008-02-24 12:30 b
    drwxr-xr-x 2 kermit users 4096 2008-02-24 12:30 destdir
    $ cp a b destdir
    cp: cannot open `b' for reading: Permission denied
    $ echo $?
    1
    $ ls -l destdir
    total 4
    -rw-r--r-- 1 kermit users 4 2008-02-24 12:31 a

I think this should hold for the majority of cases/errors cp might encounter during the copy. Of course, this does not detect a successful, but somehow corrupted, copy (which should be exceptionally rare, anyway).

> So my question is:
>
> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.

I'm not aware of any such tool (which might exist nonetheless, of course). However, on the filesystem, the objects that we call "directories" are just index files holding filenames and pointers to inodes. Running a checksum on the directories themselves would not guarantee against corruption of any of the contained files, since file data is not contained in the directory. Thus, to be accurate, such a tool would have to scan the directory, find each file, and perform a checksum on it, which would result in something not much different from the find command you suggested, in terms of resource usage.
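The exit-status check shown in the message above can be wrapped so that it no longer depends on spotting an error among thousands of `-v` lines. A minimal sketch, assuming a POSIX shell; the function name `copy_checked` and the default log file name `cp-errors.log` are invented for this example.

```shell
#!/bin/sh
# copy_checked SRC DEST [ERRLOG] (hypothetical helper): copy recursively,
# keep cp's error messages in a log file, and report success or failure
# from cp's exit status instead of from its verbose output.
copy_checked() {
    errlog=${3:-cp-errors.log}
    if cp -r "$1" "$2" 2> "$errlog"; then
        echo "copy finished without errors"
    else
        status=$?
        echo "cp exited with status $status; see $errlog" >&2
        return "$status"
    fi
}
```

As the message notes, a zero exit status only says that cp met no error it could detect; it does not prove the copied bytes match the source.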
* Re: [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-27 0:38 UTC
To: gentoo-user

On 24 Feb 2008, at 11:46, Etaoin Shrdlu wrote:
> On Sunday 24 February 2008, Stroller wrote:
>
>> I've done this loads in the past, and never been aware of any file
>> corruption, but I guess I'm just paranoid today. Perhaps I shouldn't
>> use the -v flags during my copy - it's reassuring to see the files
>> being copied, but what if I overlooked a bunch of errors in the
>> middle of all those thousands of "copied successfully" confirmations?
>> What if something has gone wrong during one of the two copies?
>
> Well, in that case cp will have a nonzero exit status. Look:
>
> ...
> $ cp a b destdir
> cp: cannot open `b' for reading: Permission denied
> $ echo $?
> 1
> ...
> I think this should hold for the majority of cases/errors cp might
> encounter during the copy.

Good point. I should have checked this when I first made the copy using cp, and will do so in the future.

> Of course, this does not detect a successful, but somehow corrupted,
> copy (which should be exceptionally rare, anyway).

Well perhaps I'm just being paranoid today. But how do I know that a successful, but somehow corrupted, copy has not occurred?

What makes you confident that these are rare? I don't ask this to be antagonistic, just to increase my own confidence in the `cp` command.

>> Is there any way to check the integrity of copied directories, to be
>> sure that none of the files or sub-directories in them have become
>> damaged during transfer? I'm thinking of something like md5sum for
>> directories.
>
> I'm not aware of any such tool (which might exist nonetheless, of
> course). However, on the filesystem, the objects that we call
> "directories" are just index files holding filenames and pointers
> to inodes. Running a checksum on the directories themselves would not
> guarantee against corruption of any of the contained files, since file
> data is not contained in the directory.

Naturally. Perhaps I should have phrased my question differently: "Is there any way to recursively check the integrity of copied directories of files?" However the words "to be sure that none of the files or sub-directories in them have become damaged during transfer"

> Thus, to be accurate, such a tool would have to scan the directory,
> find each file, and perform a checksum on it, which would result in
> something not much different from the find command you suggested, in
> terms of resource usage.

I have to admit that I haven't run this command and I don't have any idea what its actual resource usage would be. I guess I'd be happy with a lower grade of checksumming, if it would reduce the runtime to acceptable levels. With md5sum one can be - barring certain malicious external attacks - quite certain that a copied file is identical to the original. I would be happy with a "the file's there and it looks ok" level of confidence.

Stroller.
* Re: [gentoo-user] md5sum for directories?
From: Etaoin Shrdlu @ 2008-02-27 9:40 UTC
To: gentoo-user

On Wednesday 27 February 2008, Stroller wrote:

> > Of course, this does not detect a successful, but somehow corrupted,
> > copy (which should be exceptionally rare, anyway).
>
> Well perhaps I'm just being paranoid today.
> But how do I know that a successful, but somehow corrupted, copy has
> not occurred?
>
> What makes you confident that these are rare? I don't ask this to be
> antagonistic, just to increase my own confidence in the `cp` command.

Ah well, I have no statistics here. But I can say that such a thing has never occurred to me in the past (or at least if it occurred, I did not notice it). Not a definitive proof, I know; rather, just my experience. You are of course free to not trust me and, if you're truly paranoid, you probably should do so :-)

> I have to admit that I haven't run this command and I don't have any
> idea what its actual resource usage would be. I guess I'd be happy
> with a lower-grade of checksumming, if it would reduce the runtime to
> acceptable levels. With md5sum one can be - barring certain malicious
> external attacks - quite certain that a copied file is identical to
> the original. I would be happy with a "the file's there and it looks
> ok" level of confidence.

Well, md5deep has already been suggested. If you are content with lower-grade checksumming, you could write your own script that compares file lengths and calculates checksums only on the first n and last m bytes of each file, for some reasonable values of n and m (bigger is better, as you guess). This is what backuppc (an excellent backup software) does when it has to decide whether a file has changed (and thus has to be backed up) compared with the copy stored in the backup pool. Read this for more info:

http://backuppc.sourceforge.net/faq/BackupPC.html#some_design_issues

"The hashing function" paragraph.

Do note that (of course) that method is not 100% accurate and might report false negatives if the corruption is in the middle of the file and the file length did not change.
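The "length plus first n and last m bytes" idea from the message above might look like the sketch below. It is written in the spirit of that suggestion and is not BackupPC's actual hashing function; the function name `quick_sig` and the 128 KiB defaults are arbitrary choices for illustration, and it assumes `head -c`, `tail -c` and `md5sum` as found in GNU coreutils.

```shell
#!/bin/sh
# quick_sig FILE [N] [M] (hypothetical helper): print the file size followed
# by an MD5 of the first N and last M bytes. Much cheaper than hashing the
# whole file, but blind to changes in the middle that keep the size the same.
quick_sig() {
    n=${2:-131072}
    m=${3:-131072}
    size=$(wc -c < "$1") || return 1
    sum=$( { head -c "$n" "$1"; tail -c "$m" "$1"; } | md5sum | cut -d ' ' -f 1 )
    # $((size)) strips any padding that some wc implementations emit.
    echo "$((size)) $sum"
}
```

Running `quick_sig` on the source file and on its copy and comparing the two output lines gives the "it's there and it looks ok" level of confidence asked for, with the blind spot described above.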
* Re: [gentoo-user] md5sum for directories?
From: Neil Bothwick @ 2008-02-24 14:29 UTC
To: gentoo-user

On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:

> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.

Diff?

    diff -r /source /dest

will return no output if the two copies are identical.

--
Neil Bothwick

Never underestimate the bandwidth of a station wagon full of tapes!
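For scripted use, the `diff -r` suggestion above can be checked through its exit status rather than by looking for empty output: diff exits with 0 when nothing differs, 1 when differences were found, and a higher value on trouble such as an unreadable file. A small sketch; the function name `trees_identical` is hypothetical, and `-q` is added so that only the names of differing files are printed.

```shell
#!/bin/sh
# trees_identical SRC DEST (hypothetical helper): compare two directory
# trees byte for byte and summarise diff's exit status.
trees_identical() {
    status=0
    diff -rq "$1" "$2" || status=$?
    case $status in
        0) echo "identical" ;;
        1) echo "differences found" ;;
        *) echo "diff reported trouble (status $status)" ;;
    esac
    return "$status"
}
```

Unlike a checksum manifest, this needs both trees reachable from the same machine, which fits the case where the portable drive is plugged into the desktop.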
* Re: [gentoo-user] md5sum for directories?
From: cabbage @ 2008-02-24 16:39 UTC
To: gentoo-user

Can diff be used for binary files?

On Sun, Feb 24, 2008 at 10:29 PM, Neil Bothwick <neil@digimed.co.uk> wrote:

> On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:
>
> > Is there any way to check the integrity of copied directories, to be
> > sure that none of the files or sub-directories in them have become
> > damaged during transfer? I'm thinking of something like md5sum for
> > directories.
>
> Diff?
>
> diff -r /source /dest
> will return no output if the two copies are identical.
>
> --
> Neil Bothwick
>
> Never underestimate the bandwidth of a station wagon full of tapes!
* Re: [gentoo-user] md5sum for directories?
From: Dirk Heinrichs @ 2008-02-24 16:46 UTC
To: gentoo-user

On Sunday, 24 February 2008, cabbage wrote:
> Can diff be used for binary files?

If you just want to know "different or not", sure.

Bye...

Dirk
* Re: [gentoo-user] md5sum for directories?
From: Andrew Gaydenko @ 2008-02-24 16:49 UTC
To: gentoo-user

Hi!

======= On Sunday 24 February 2008, you wrote: =======
...
> > On Sun, 24 Feb 2008 11:06:10 +0000, Stroller wrote:
> > > Is there any way to check the integrity of copied directories, to
> > > be sure that none of the files or sub-directories in them have
> > > become damaged during transfer? I'm thinking of something like
> > > md5sum for directories.

I use this script to check how DVD data were written:

    nice -n 15 find "$1"/* -type f -print0 | sort -z | xargs -0 cat | md5sum -b

Don't ask me how it works - I forgot :-) But it works.
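For readers wondering how the one-liner above works: `find ... -print0` lists every regular file with NUL-terminated names, `sort -z` puts those names in a stable order, `xargs -0 cat` concatenates the file contents in that order, and `md5sum` hashes the resulting stream, giving one checksum for the whole tree. A variant is sketched below; the function name `tree_sum` is hypothetical. It changes into the directory first so that the sort order does not depend on the path prefix, and it uses `find .` so that top-level dot files are included.

```shell
#!/bin/sh
# tree_sum DIR (hypothetical helper): print a single MD5 over the
# concatenated contents of all regular files under DIR, taken in sorted
# path order. File names and empty directories do not influence the result.
tree_sum() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat \
        | md5sum | cut -d ' ' -f 1 )
}
```

Comparing `tree_sum /source` with `tree_sum /dest` says whether the trees differ but not which file is responsible, so a per-file manifest is more useful once a mismatch shows up.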
* Re: [gentoo-user] md5sum for directories?
From: Christopher Copeland @ 2008-02-24 19:46 UTC
To: gentoo-user

On 24 Feb 2008, at 06:06, Stroller wrote:

> So my question is:
>
> Is there any way to check the integrity of copied directories, to be
> sure that none of the files or sub-directories in them have become
> damaged during transfer? I'm thinking of something like md5sum for
> directories.

I use rsync for this and would suggest you look into it. You can tell it to compare files based on checksum (which is slower) and the real beauty is that if there is a file that is corrupt or otherwise not the same as the source it will copy just that single file to your backup disk. Test it by deleting a random file somewhere in the backup tree, then rerun your rsync command and the file is copied back.

man rsync

--
Christopher
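The checksum comparison described above can also be run as a pure check that copies nothing. A minimal sketch, assuming rsync is installed; the function name `rsync_verify` is made up for this example. It combines `-r` (recurse), `-c` (compare by checksum), `-n` (dry run) and `--itemize-changes` (one line per file that would be transferred).

```shell
#!/bin/sh
# rsync_verify SRC DEST (hypothetical helper): list the files under SRC that
# are missing from DEST or whose checksum differs, without copying anything.
# No output means rsync found nothing it would transfer.
rsync_verify() {
    rsync -rcn --itemize-changes "$1"/ "$2"/
}
```

Dropping the `-n` turns the same command into the repair step from the message: only the files listed by the dry run are copied again.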
* Re: [gentoo-user] md5sum for directories?
From: Stroller @ 2008-02-27 0:51 UTC
To: gentoo-user

On 24 Feb 2008, at 19:46, Christopher Copeland wrote:
> On 24 Feb 2008, at 06:06, Stroller wrote:
>
>> So my question is:
>>
>> Is there any way to check the integrity of copied directories, to
>> be sure that none of the files or sub-directories in them have
>> become damaged during transfer? I'm thinking of something like
>> md5sum for directories.
>
> I use rsync for this and would suggest you look into it. You can
> tell it to compare files based on checksum (which is slower) and
> the real beauty is that if there is a file that is corrupt or
> otherwise not the same as the source it will copy just that single
> file to your backup disk. Test it by deleting a random file
> somewhere in the backup tree.. rerun your rsync command and the
> file is copied back.
>
> man rsync

Thanks. I think this has been suggested before for my backups - IIRC it has a useful --ignore-path or --exclude-path command which can ensure you get all the users' Documents & Settings, without the useless temp & "Temporary Internet Files".

I've just tried `rsync -vrchi` on a pair of subdirectories ("My Documents") of the backup I made last week and on those it seems to run in acceptable time. I got little output, however, so I have deleted a couple of files from the destination (I should perhaps write some random data to another) and am running it again in anticipation of some "copying /a/b/c/file /x/y/z/file" output.

I appreciate your help,

Stroller.
* Re: [gentoo-user] md5sum for directories?
From: Christopher Copeland @ 2008-02-27 2:36 UTC
To: gentoo-user

On 26 Feb 2008, at 19:51, Stroller wrote:

> Thanks. I think this has been suggested before for my backups - IIRC
> it has a useful --ignore-path or --exclude-path command which can
> insure you all the users' Documents & Settings, without the useless
> temp & "Temporary Internet Files".

rsync has excellent control over what is copied via the include and exclude options.

> I've just tried `rsync -vrchi` on a pair of subdirectories ("My
> Documents") of the backup I made last week and on those it seems run
> in acceptable time. I got little output, however, so have deleted a
> couple of files from the destination (I should perhaps write some
> random data to another) and am running it again in anticipation of
> some "copying /a/b/c/file /x/y/z/file" output.

When I run rsync interactively I usually add --stats and --progress to the command. Those will give you more feedback.

> I appreciate your help,

Least I could do, and if I hadn't mentioned it I am sure someone else would have. This is a gentoo list ;-)

--
Christopher
* [gentoo-user] Re: md5sum for directories?
From: »Q« @ 2008-02-24 21:15 UTC
To: gentoo-user

Stroller <stroller@stellar.eclipse.co.uk> wrote:

> I'm thinking of something like md5sum for directories.

I think you may have gotten better solutions for your situation, but md5deep (in portage) is like md5sum but with directory recursion.
* Re: [gentoo-user] Re: md5sum for directories?
From: Mick @ 2008-02-26 19:59 UTC
To: gentoo-user

On Sunday 24 February 2008, »Q« wrote:
> Stroller <stroller@stellar.eclipse.co.uk> wrote:
> > I'm thinking of something like md5sum for directories.
>
> I think you may have gotten better solutions for your situation, but
> md5deep (in portage) is like md5sum but with directory recursion.

I'm probably not suggesting anything you don't already know, but just in case:

Notwithstanding that rsync is a superior tool just made for the job, I more often use tar instead of either rsync or cp. This is because when I back up a complete fs I use whichever LiveCD I have at hand (usually Knoppix), which doesn't always have rsync on it. Anyway, the tar command has the option -d which diffs the contents of the archive and the original fs, if you want to see what happened after the archive was written, or want to decide if it is time/worth making a fresher backup.

Alternatively, and more appropriately if you run this as part of a backup process, there is the -W option. From the man page:

    -W, --verify
           attempt to verify the archive after writing it

HTH.
--
Regards,
Mick
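The tar approach from the message above could be sketched as follows, assuming GNU tar (the `-d` option is not available in every tar implementation). The function names `make_archive` and `check_archive` are hypothetical, and the archive is assumed to live outside the directory being archived.

```shell
#!/bin/sh
# make_archive SRC ARCHIVE (hypothetical helper): write an uncompressed tar
# archive of everything under SRC, stored with paths relative to SRC.
make_archive() {
    tar -cf "$2" -C "$1" .
}

# check_archive SRC ARCHIVE (hypothetical helper): compare the archive
# members against the files under SRC with tar -d. GNU tar prints each
# difference and exits with a nonzero status if any member differs.
check_archive() {
    tar -df "$2" -C "$1"
}
```

Running `make_archive` and then `check_archive` straight away answers "was the archive written correctly?", while running `check_archive` later shows which files have changed since the backup was taken.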