public inbox for gentoo-user@lists.gentoo.org
From: Paul Hartman <paul.hartman+gentoo@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Pipe Lines - A really basic question
Date: Fri, 10 Sep 2010 10:10:07 -0500
Message-ID: <AANLkTik-J-gJe_H7Y9uogibmQLARZjAs1U1hx4_+TswO@mail.gmail.com>
In-Reply-To: <4C8947AD.6010906@admin-box.com>

On Thu, Sep 9, 2010 at 3:46 PM, Daniel Troeder <daniel@admin-box.com> wrote:
> On 09/09/2010 07:24 PM, Matt Neimeyer wrote:
>> My generic question is: When I'm using a pipeline of commands, do I
>> use more or less space than doing things in sequence?
>>
>> For example, I have a development Gentoo VM that has a hard drive that
>> is too small... I wanted to move a database off of that onto another
>> machine but when I tried the following I filled my partition and 'evil
>> things' happened...
>>
>> mysqldump blah...
>> gzip blah...
>>
>> In this specific case I added another virtual drive, mounted that and
>> went on with life, but I'm curious whether I could have gotten away with
>> the pipeline instead. Will doing something like this still use "twice"
>> the space?
>>
>> mysqldump | gzip > file.sql.gz
>>
>> OR, going back to my generic question: if I use a pipeline like "type | sort
>> | unique > output", does that use only 1x or 3x the disk space?
>>
>> Thanks in advance!
>>
>> Matt
>>
>> P.S. If the answer is "it depends", how do I know what it depends on?
>>
> Everyone already answered the disk space question. I want to add just
> this: It also saves you lots of I/O bandwidth: only the compressed data
> gets written to disk. As I/O is the most common bottleneck, it is often
> imperative to do as much as possible in a pipe. If you're lucky, it
> can also mean that multiple programs run at the same time, resulting in
> higher throughput. "Lucky" means the consumer and producer (right and left
> of the pipe) can work simultaneously because the buffer is big enough. You
> can see this every time you (un)pack a tar.gz.

And if you have a huge amount of data where compression makes the CPU
the bottleneck, you can use something like pbzip2, which uses all
CPUs/cores in parallel to speed up [de]compression. :)
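
As a rough sketch of what I mean (not a tested recipe for your setup): the
script below streams data through a compressor straight into a file, so no
uncompressed copy is ever written to disk. It assumes pbzip2 accepts -c (write
to stdout) and -d (decompress) and reads stdin when no file is given, as bzip2
does. The fallback to bzip2 or gzip, the function names, and the `seq` test
data are my own additions so the sketch runs on machines without pbzip2.

```shell
#!/bin/sh
# Pick the first available compressor. pbzip2 is the parallel one; bzip2 and
# gzip are fallbacks (an assumption so the sketch runs anywhere).
COMP=
for c in pbzip2 bzip2 gzip; do
    if command -v "$c" >/dev/null 2>&1; then
        COMP=$c
        break
    fi
done
[ -n "$COMP" ] || { echo "no compressor found" >&2; exit 1; }

# Read uncompressed data on stdin, write compressed data to stdout.
compress_stream() {
    "$COMP" -c
}

# Read compressed data on stdin, write uncompressed data to stdout.
decompress_stream() {
    "$COMP" -dc
}

# Stand-in for a large dump: 100000 lines of generated text.
# Only the compressed stream is written to the (temporary) output file.
out=$(mktemp)
seq 1 100000 | compress_stream > "$out"

# Read it back through a pipe as well, again without an uncompressed copy.
lines=$(decompress_stream < "$out" | wc -l | tr -d ' ')
echo "compressor: $COMP, lines recovered: $lines"
```

For the database case the same shape would be something like
`mysqldump ... | pbzip2 -c > file.sql.bz2`, with the mysqldump arguments left
as they were in the original command and the output name only an example.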



