From: Rich Freeman <rich0@gentoo.org>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Clusters on Gentoo ?
Date: Tue, 19 Aug 2014 06:33:29 -0400
Message-ID: <CAGfcS_nKhm6PsGdgJWdstBCOzO-OnEd36ZaThM3uiDiYBQDjFQ@mail.gmail.com>
In-Reply-To: <1598929.3JxaQGxvRs@andromeda>

On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <joost@antarean.org> wrote:
> On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
>> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
>> > Hadoop is a very specialized tool.  It does what it does very well,
>> > but if you want to use it for something other than map/reduce then
>> > consider carefully whether it is the right tool for the job.
>>
>> Agreed; unless you have decent hardware and can comfortably measure
>> your data in TB, it'll be quicker to use something else once you factor
>> in the administration time and learning curve.
>
> The benefit of clustering technologies is that you don't need high-end
> hardware to start with. You can use the old hardware you found collecting dust
> in the basement.
>
> The learning curve isn't as steep as it used to be. There are plenty of tools
> to make it easier to start using Hadoop.
>

As long as you're counting words and don't mind coding everything in Java.  :)

I found that if you want to avoid Java, the available documentation
plummets, and I'm fairly sure the version I was attempting to use was
buggy: I believe it was losing records in the sort/reduce phase.  Or
perhaps I was just using it incorrectly, but the exact same code
worked fine when I ran it on a single host with a smaller dataset
and simply piped map | sort | reduce without Hadoop.  The
documentation was sparse on how to get Hadoop to work via
stdin/stdout with non-Java code, so it is quite possible I wasn't
doing things right.  In the end my problem wasn't big enough to
need Hadoop, and I used GNU parallel instead.
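
For illustration, here is a minimal sketch of the map | sort | reduce
pattern run in a single process, using word counting as the job.  It
is not the code from the original experiment; the function names
(map_lines, reduce_sorted, run_local) and the tab-separated
"key<TAB>value" record layout are assumptions made for this example.

```python
import itertools
from typing import Iterable, Iterator, List


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit one 'word<TAB>1' record per word (hypothetical layout)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_sorted(records: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum the counts of adjacent records that share a key.

    This only works if the input is already sorted by key, which is
    the job of the `sort` stage in the pipeline.
    """
    parsed = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for key, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Single-process stand-in for `map | sort | reduce`."""
    return list(reduce_sorted(sorted(map_lines(lines))))


if __name__ == "__main__":
    import sys

    # Reads text on stdin and writes 'word<TAB>count' lines on stdout.
    for record in run_local(sys.stdin):
        print(record)
```

A cheap check for lost records in a job like this is that the counts
in the output should sum to the number of words in the input.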

--
Rich

