public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] RFC : fast copying of a whole directory tree
@ 2012-02-13 10:49 Helmut Jarausch
From: Helmut Jarausch @ 2012-02-13 10:49 UTC
  To: gentoo-user

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

Hi,

when copying a whole directory tree with standard tools, e.g.
tar cf - . | ( cd "$DEST" && tar xf - )
or cpio -p ...

the source disk keeps seeking back and forth. That's noisy and particularly slow.

I've written a small Python program that outputs the file names in
inode order. If this list is fed into tar or cpio, hardly any seeks
are required during copying.
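The idea can be sketched independently of the attached script. This minimal version is an illustration, not the author's program: it uses `os.walk` instead of a hand-written recursion, and the function name `inode_ordered_paths` is hypothetical. Like the script, it lists every entry (directories included) via `os.lstat`, so symlinks are not followed, skips anything on another file system, and sorts by inode number. Whether inode order really tracks on-disk order depends on the file system; that is the assumption the whole approach rests on.

```python
import os
import sys


def inode_ordered_paths(top):
    """Return top and every entry below it, sorted by inode number.

    Symlinks are listed but not followed; entries on other file systems,
    including the mount-point directories themselves, are skipped.
    """
    top_stat = os.lstat(top)
    root_dev = top_stat.st_dev
    entries = [(top_stat.st_ino, top)]

    def report(err):
        # os.walk silently ignores unreadable directories unless told otherwise
        print("skipping:", err, file=sys.stderr)

    for dirpath, dirnames, filenames in os.walk(top, onerror=report):
        same_dev_dirs = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == root_dev:
                entries.append((st.st_ino, path))
                same_dev_dirs.append(name)
        # pruning dirnames in place stops os.walk from entering the others
        dirnames[:] = same_dev_dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == root_dev:
                entries.append((st.st_ino, path))

    entries.sort()  # by inode; hard links (same inode) are then ordered by name
    return [path for _, path in entries]
```

For instance, `inode_ordered_paths('/home')` returns the entries below /home in ascending inode order, ready to be written out for tar or cpio.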

I've tested it by comparing the resulting copied tree to one created by 
tar | tar.
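Such a comparison can be automated. The sketch below is an assumption about what a useful check covers, not the author's test, and the names `tree_manifest` and `diff_trees` are hypothetical. For each relative path it records the file type, the permission bits, and either a SHA-256 of a regular file's contents or a symlink's target, then lists the paths whose records differ or exist on only one side. Timestamps, ownership and hard-link structure are left out; checking them would need extra fields such as `st_mtime_ns`, `st_uid`, `st_gid` and a grouping by `st_ino`.

```python
import hashlib
import os
import stat


def tree_manifest(top):
    """Map each path relative to top to (file type, permission bits, detail)."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # describe symlinks themselves, not their targets
            if stat.S_ISREG(st.st_mode):
                h = hashlib.sha256()
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        h.update(chunk)
                detail = h.hexdigest()
            elif stat.S_ISLNK(st.st_mode):
                detail = os.readlink(path)
            else:
                detail = None  # directories, devices, FIFOs: type and mode only
            manifest[os.path.relpath(path, top)] = (
                stat.S_IFMT(st.st_mode), stat.S_IMODE(st.st_mode), detail)
    return manifest


def diff_trees(a, b):
    """Return the sorted relative paths that are missing on one side or differ."""
    ma, mb = tree_manifest(a), tree_manifest(b)
    return sorted(p for p in ma.keys() | mb.keys() if ma.get(p) != mb.get(p))
```

An empty result from `diff_trees(copy1, copy2)` means the two copies agree on everything recorded above.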

But its correctness is critical when it is used for backups,
so I'd like to ask for comments.

Thanks for any comments,
Helmut.

[-- Attachment #2: TreeWalk_I_Sorted.py --]
[-- Type: text/x-python, Size: 1345 bytes --]

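#!/usr/bin/python3
import os, sys, stat


def walktree(top, root_dev, fn_list):
    '''recursively descend the directory tree rooted at top, appending
       (inode, pathname) for every entry that lives on device root_dev'''
    try:
        names = os.listdir(top)
    except OSError as err:
        # report unreadable directories instead of losing them silently
        print('skipping', top, ':', err, file=sys.stderr)
        return
    for f in names:
        pathname = os.path.join(top, f)
        try:
            Stat = os.lstat(pathname)   # lstat: symlinks are not followed
        except OSError as err:          # e.g. removed since listdir()
            print('skipping', pathname, ':', err, file=sys.stderr)
            continue
        if Stat.st_dev != root_dev:
            # another file system is mounted here; note that the
            # mount-point directory itself is skipped as well
            continue
        fn_list.append((Stat.st_ino, pathname))
        if stat.S_ISDIR(Stat.st_mode):
            # It's a directory, recurse into it
            walktree(pathname, root_dev, fn_list)


if len(sys.argv) != 2:
    print('''usage:
  TreeWalk_I_Sorted <TOPDIR>  # generates a NUL-separated list of files in inode order
  # example with tar :
  TreeWalk_I_Sorted <TOPDIR> | tar --null --no-recursion -c -j -T- -f XXX.tar.bz2
  # example with cpio
  TreeWalk_I_Sorted <TOPDIR> | cpio -0padmu <DESTDIR>
  ''', file=sys.stderr)
    sys.exit(1)

TOP = sys.argv[1]
Stat = os.lstat(TOP)
FN_List = [(Stat.st_ino, TOP)]

# import resource
# print("at Start in kB ",resource.getrusage(0).ru_maxrss)
# uses about 500 bytes per file

walktree(TOP, Stat.st_dev, FN_List)
FN_List.sort()   # by inode number; hard links (same inode) then by name

# Write raw, NUL-terminated names: a file name may contain a newline, which
# would corrupt a newline-separated list, or bytes that are invalid in the
# locale's encoding, which may make print() fail.
out = sys.stdout.buffer
for I, F in FN_List:
    out.write(os.fsencode(F) + b'\0')
out.flush()

# print("after loading",len(FN_List)," items : ",resource.getrusage(0).ru_maxrss)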
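File names given to tar's `-T` option, or to cpio on standard input, are by default read one per line. POSIX file names may contain a newline but never a NUL byte, so a NUL-separated list is unambiguous where a newline-separated one is not; GNU tar (`--null`) and GNU cpio (`-0`/`--null`) accept such lists. The small sketch below only demonstrates the difference and does not run tar or cpio; `encode_list` and `decode_list` are hypothetical helpers. It uses `os.fsencode`, which also turns names that Python 3 decoded with surrogate escapes back into their original bytes.

```python
import os


def encode_list(paths, sep):
    """Join file names into one byte string, each name followed by sep."""
    return b"".join(os.fsencode(p) + sep for p in paths)


def decode_list(data, sep):
    """Split a separator-terminated byte string back into file names."""
    return [os.fsdecode(item) for item in data.split(sep)[:-1]]


names = ["plain.txt", "two\nlines.txt"]

# Newline-separated: the second name falls apart into two bogus entries.
print(decode_list(encode_list(names, b"\n"), b"\n"))  # ['plain.txt', 'two', 'lines.txt']
# NUL-separated: both names survive unchanged.
print(decode_list(encode_list(names, b"\0"), b"\0"))  # ['plain.txt', 'two\nlines.txt']
```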



Thread overview: 15+ messages
2012-02-13 10:49 [gentoo-user] RFC : fast copying of a whole directory tree Helmut Jarausch
2012-02-13 15:17 ` Michael Orlitzky
2012-02-13 15:31   ` [gentoo-user] " Grant Edwards
2012-02-13 16:11     ` Joerg Schilling
2012-02-13 16:29       ` Pandu Poluan
2012-02-13 16:37         ` Nikos Chantziaras
2012-02-13 17:42           ` Pandu Poluan
2012-02-13 22:58             ` Neil Bothwick
2012-02-14  0:37               ` Pandu Poluan
2012-02-13 18:20           ` Joerg Schilling
2012-02-13 22:11             ` Dale
2012-02-14 12:50               ` Mick
2012-02-14  9:05     ` Florian Philipp
2012-02-14  9:57       ` Joerg Schilling
2012-02-14 17:45         ` Florian Philipp
