On 09/09/2010 07:24 PM, Matt Neimeyer wrote:
> My generic question is: When I'm using a pipe line series of commands
> do I use up more/less space than doing things in sequence?
> 
> For example, I have a development Gentoo VM that has a hard drive that
> is too small... I wanted to move a database off of that onto another
> machine but when I tried the following I filled my partition and 'evil
> things' happened...
> 
> mysqldump blah...
> gzip blah...
> 
> In this specific case I added another virtual drive, mounted that and
> went on with life but I'm curious if I could have gotten away with the
> pipe line instead. Will doing something like this still use "twice"
> the space?
> 
> mysqldump | gzip > file.sql.gz
> 
> OR going back to my generic question if I pipe line like "type | sort
> | unique > output" does that only use 1x or 3x the disk space?
> 
> Thanks in advance!
> 
> Matt
> 
> P.S. If the answer is "it depends" how do know what it depends on?
> 
The disk space question has already been answered, so I just want to
add this: a pipe also saves you a lot of I/O bandwidth, because only the
compressed data gets written to disk. The uncompressed dump is never
written out and read back in. Since I/O is often the bottleneck, it
usually pays to do as much as possible in a pipe. If you're lucky, it
also means that the programs run at the same time, which gives you
higher throughput. "Lucky" here means that the producer and the consumer
(left and right of the pipe) can work simultaneously because the pipe
buffer is big enough to keep both busy. You can see this every time you
(un)pack a tar.gz.
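
To make the I/O point concrete, here is a rough sketch you can run
without a database. Note the assumptions: produce() is a made-up
stand-in for mysqldump (it just calls seq), and the file names and the
temporary directory are only for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for mysqldump: anything that writes a lot of
# text to stdout will do for this comparison.
produce() { seq 1 200000; }

workdir=$(mktemp -d)

# In sequence: the uncompressed dump is written to disk first, then
# gzip reads it back and writes a second, compressed file.
produce > "$workdir/dump.sql"
gzip -c "$workdir/dump.sql" > "$workdir/seq.sql.gz"

# In a pipe: only the compressed stream is ever written to disk.
produce | gzip > "$workdir/pipe.sql.gz"

# Compare what each approach had to write.
raw_bytes=$(( $(wc -c < "$workdir/dump.sql") ))
pipe_bytes=$(( $(wc -c < "$workdir/pipe.sql.gz") ))
echo "uncompressed dump:    $raw_bytes bytes"
echo "piped and compressed: $pipe_bytes bytes"
```

The sequential variant writes both numbers' worth of data (and reads the
first one back), while the pipe only writes the second. Remove $workdir
when you are done with it.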

Bye,
Daniel


-- 
PGP key @ http://pgpkeys.pca.dfn.de/pks/lookup?search=0xBB9D4887&op=get
# gpg --recv-keys --keyserver hkp://subkeys.pgp.net 0xBB9D4887