From: Rich Freeman
To: gentoo-user@lists.gentoo.org
Date: Tue, 19 Aug 2014 06:33:29 -0400
Subject: Re: [gentoo-user] Clusters on Gentoo ?

On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld wrote:
> On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
>> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
>> > Hadoop is a very specialized tool. It does what it does very well,
>> > but if you want to use it for something other than map/reduce then
>> > consider carefully whether it is the right tool for the job.
>>
>> Agreed; unless you have decent hardware and can comfortably measure
>> your data in TB, it'll be quicker to use something else once you factor
>> in the administration time and learning curve.
>
> The benefit of clustering technologies is that you don't need high-end
> hardware to start with. You can use the old hardware you found collecting dust
> in the basement.
>
> The learning curve isn't as steep as it used to be. There are plenty of tools
> to make it easier to start using Hadoop.
>

As long as you're counting words and don't mind coding everything in Java. :)

I found that if you want to avoid Java, the available documentation
plummets. I'm also fairly sure the version I was using was buggy: I
believe it was losing records in the sort/reduce phase. Or perhaps I was
just using it incorrectly, but the exact same code worked fine when I ran
it on a single host with a smaller dataset, simply piping
map | sort | reduce without Hadoop. The documentation on getting Hadoop
to work over stdin/stdout with non-Java code was pretty sparse, so it is
quite possible I wasn't doing things right.

In the end my problem wasn't big enough to need Hadoop, and I used GNU
parallel instead.

-- 
Rich
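A minimal sketch of the single-host map | sort | reduce pipeline mentioned above, written in Python. It assumes a word-count job (the actual task is not described in the post), and the script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical. Each stage reads lines and writes tab-separated key/value lines, which is the default record format Hadoop Streaming uses for non-Java mappers and reducers; the reducer relies on its input already being sorted so that equal keys arrive together.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one "word<TAB>1" record per word (tab-separated key/value)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key.

    Assumes the input is sorted by key, as after `sort` or Hadoop's
    shuffle phase, so all records for one word form a contiguous run.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """In-process equivalent of `map | sort | reduce` on one host."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical CLI: `python3 wordcount.py map` or `python3 wordcount.py reduce`,
# reading records from stdin and writing records to stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

Run locally, something like `python3 wordcount.py map < input.txt | LC_ALL=C sort | python3 wordcount.py reduce`. Because the mapper only looks at one line at a time, the map stage could be spread across cores with GNU parallel (`parallel --pipe python3 wordcount.py map < input.txt | ...`) while sort and reduce stay single-process. Under Hadoop, the same two commands would be passed as the `-mapper` and `-reducer` options of the Hadoop Streaming jar, whose path depends on the installation.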