Date: Sat, 2 Jun 2012 16:47:26 -0700
From: Brian Harring
To: Michał Górny
Cc: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Message-ID: <20120602234726.GB9296@localhost>
References: <201206011841.23302.vapier@gentoo.org> <201206021554.04552.vapier@gentoo.org> <20120602233132.24cc67ef@pomiocik.lan> <4FCA989E.3050307@gentoo.org>
In-Reply-To: <4FCA989E.3050307@gentoo.org>

On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
> On 06/02/2012 02:31 PM, Michał Górny wrote:
> > On Sat, 2 Jun 2012 15:54:03 -0400
> > Mike Frysinger wrote:
> >
> >> # @FUNCTION: redirect_alloc_fd
> >> # @USAGE: [redirection]
> >> # @DESCRIPTION:
> >
> > (...and a lot of code)
> >
> > I may be wrong but wouldn't it be simpler to just stick with a named
> > pipe here? Well, at first glance you wouldn't be able to read exactly
> > one result at a time but is it actually useful?
>
> I'm pretty sure that the pipe has to remain constantly open in read mode
> (which can only be done by assigning it a file descriptor). Otherwise,
> there's a race condition that can occur, where a write is lost because
> it's written just before the reader closes the pipe.

There isn't a race: on the write side, the writer blocks once it exceeds the
pipe buffer size, and on the read side, bash's read deliberately consumes the
fd byte by byte precisely so it never eats data it doesn't need.
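To make that concrete, here's a rough sketch of the fd-pinned fifo pattern
(the names and the fd number are made up for the example; this is not the
eclass's actual code):

    #!/usr/bin/env bash
    # Illustrative sketch only; not the eclass's actual implementation.
    fifo=$(mktemp -u) && mkfifo "${fifo}" || exit 1
    exec 9<>"${fifo}"   # open read+write: the read end never closes between reclaims
    rm -f "${fifo}"     # fd 9 keeps the pipe alive once the name is unlinked

    # A backgrounded job hands back its result by writing one line to the fifo.
    ( sleep 1; echo "job-1 done" >&9 ) &

    # Reclaim exactly one result; bash reads the fd byte by byte, so it never
    # consumes the next job's report by accident.
    read -r -u 9 result
    echo "reclaimed: ${result}"

Since fd 9 stays open for the whole run, there is never a window where the
read side is closed, and each reclaim pulls back exactly one job's report.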
That said, Mgorny's suggestion ignores that the code is already pointed at a
fifo. I presume he's suggesting "just open it every time you need to touch
it"... which, sure, except that it complicates the read side: either you have
to find a free fd, open the fifo on it, and then close it again, or you abuse
cat or $(<) to pull the results and make the reclaim code handle multiple
results in a single shot. Frankly, I don't see the point in doing that.

The code isn't that complex, and we *need* the overhead of this to be
minimal - the hand-off/reclaim is effectively the bottleneck for scaling. If
the jobs you've backgrounded take a second apiece, it matters less; if they're
quick little bursts of activity, the scaling *will* be limited by how fast we
can blast off/reclaim jobs. Keep in mind that the main process has to go find
more work to queue up between the reclaims, so this matters more than you'd
think.

Either way, that limit depends on the time required per job versus the number
of cores; run code like this on a 48-core box and you'll see it start to
become an actual bottleneck (which is why I came up with this hacky bash
semaphore).
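For what it's worth, the semaphore amounts to something like the following
sketch (do_work and the variable names are stand-ins for the example, not the
eclass's API): cap the number of in-flight jobs at the core count, and reclaim
one completion token from the fifo before launching the next job.

    #!/usr/bin/env bash
    # Illustrative sketch only; do_work stands in for the real per-job work.
    do_work() { sleep 0.1; }

    jobs_max=$(nproc 2>/dev/null || echo 4)
    fifo=$(mktemp -u) && mkfifo "${fifo}" || exit 1
    exec 9<>"${fifo}"
    rm -f "${fifo}"

    running=0
    for task in task-{01..32}; do
        if [ "${running}" -ge "${jobs_max}" ]; then
            read -r -u 9 _              # block until some job reports back
            running=$((running - 1))
        fi
        ( do_work "${task}"; echo "${task}" >&9 ) &
        running=$((running + 1))
    done

    # Drain whatever is still in flight.
    while [ "${running}" -gt 0 ]; do
        read -r -u 9 _
        running=$((running - 1))
    done
    wait

~harring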