Date: Sat, 2 Jun 2012 16:47:26 -0700
From: Brian Harring
To: Michał Górny
Cc: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Message-ID: <20120602234726.GB9296@localhost>
References: <201206011841.23302.vapier@gentoo.org> <201206021554.04552.vapier@gentoo.org> <20120602233132.24cc67ef@pomiocik.lan> <4FCA989E.3050307@gentoo.org>
In-Reply-To: <4FCA989E.3050307@gentoo.org>

On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
> On 06/02/2012 02:31 PM, Michał Górny wrote:
> > On Sat, 2 Jun 2012 15:54:03 -0400
> > Mike Frysinger wrote:
> >
> >> # @FUNCTION: redirect_alloc_fd
> >> # @USAGE: [redirection]
> >> # @DESCRIPTION:
> >
> > (...and a lot of code)
> >
> > I may be wrong but wouldn't it be simpler to just stick with a named
> > pipe here? Well, at first glance you wouldn't be able to read exactly
> > one result at a time but is it actually useful?
>
> I'm pretty sure that the pipe has to remain constantly open in read mode
> (which can only be done by assigning it a file descriptor). Otherwise,
> there's a race condition that can occur, where a write is lost because
> it's written just before the reader closes the pipe.

There isn't a race: on the write side, the writer blocks once it exceeds the
pipe buffer size, and on the read side, bash's read deliberately consumes the
fd byte by byte precisely so it never eats data it doesn't need.
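To make that concrete, here's a rough sketch of the fd-pinned fifo pattern
(the names and the fd number are made up for the example; this is not the
eclass's actual code):

    #!/usr/bin/env bash
    # Illustrative sketch only; not the eclass's actual implementation.
    fifo=$(mktemp -u) && mkfifo "${fifo}" || exit 1
    exec 9<>"${fifo}"   # open read+write: the read end never closes between reclaims
    rm -f "${fifo}"     # fd 9 keeps the pipe alive once the name is unlinked

    # A backgrounded job hands back its result by writing one line to the fifo.
    ( sleep 1; echo "job-1 done" >&9 ) &

    # Reclaim exactly one result; bash reads the fd byte by byte, so it never
    # consumes the next job's report by accident.
    read -r -u 9 result
    echo "reclaimed: ${result}"

Since fd 9 stays open for the whole run, there is never a window where the
read side is closed, and each reclaim pulls back exactly one job's report.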
That said, Mgorny's suggestion ignores that the code is already pointed at a
fifo. I presume he's suggesting "just open it every time you need to touch
it"... which, sure, except that it complicates the read side: either you have
to find a free fd, open the fifo on it, and then close it again, or you abuse
cat or $(<) to pull the results and make the reclaim code handle multiple
results in a single shot. Frankly, I don't see the point in doing that.

The code isn't that complex, and we *need* the overhead of this to be
minimal - the hand-off/reclaim is effectively the bottleneck for scaling. If
the jobs you've backgrounded take a second apiece, it matters less; if they're
quick little bursts of activity, the scaling *will* be limited by how fast we
can blast off/reclaim jobs. Keep in mind that the main process has to go find
more work to queue up between the reclaims, so this matters more than you'd
think.

Either way, that limit depends on the time required per job versus the number
of cores; run code like this on a 48-core box and you'll see it start to
become an actual bottleneck (which is why I came up with this hacky bash
semaphore).
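For what it's worth, the semaphore amounts to something like the following
sketch (do_work and the variable names are stand-ins for the example, not the
eclass's API): cap the number of in-flight jobs at the core count, and reclaim
one completion token from the fifo before launching the next job.

    #!/usr/bin/env bash
    # Illustrative sketch only; do_work stands in for the real per-job work.
    do_work() { sleep 0.1; }

    jobs_max=$(nproc 2>/dev/null || echo 4)
    fifo=$(mktemp -u) && mkfifo "${fifo}" || exit 1
    exec 9<>"${fifo}"
    rm -f "${fifo}"

    running=0
    for task in task-{01..32}; do
        if [ "${running}" -ge "${jobs_max}" ]; then
            read -r -u 9 _              # block until some job reports back
            running=$((running - 1))
        fi
        ( do_work "${task}"; echo "${task}" >&9 ) &
        running=$((running + 1))
    done

    # Drain whatever is still in flight.
    while [ "${running}" -gt 0 ]; do
        read -r -u 9 _
        running=$((running - 1))
    done
    wait

~harring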