From: Brian Harring <ferringb@gmail.com>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Date: Fri, 1 Jun 2012 21:11:19 -0700
Message-ID: <20120602041119.GA9296@localhost>
In-Reply-To: <201206011841.23302.vapier@gentoo.org>
On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> regenerating autotools in packages that have a lot of AC_CONFIG_SUBDIRS is
> really slow due to the serialization of all the dirs (which really isn't
> required). so i took some code that i merged into portage semi-recently
> (which is based on work by Brian, although i'm not sure he wants to admit it)
I've come up with worse things in the name of speed (see the
daemonized ebuild processor...) ;)
> and put it into a new multiprocessing.eclass. this way people can generically
> utilize this in their own eclasses/ebuilds.
>
> it doesn't currently support nesting. not sure if i should fix that.
>
> i'll follow up with an example of parallelizing of eautoreconf. for
> mail-filter/maildrop on my 4 core system, it cuts the time needed to run from
> ~2.5 min to ~1 min.
My main concern here is cleanup during uncontrolled shutdown. If the
backgrounded job has hung itself for some reason, the job *will* just
sit there. I'm not aware of any of the PMs doing process tree killing
or cgroups containment. In my copious free time I'm planning on adding
a 'cjobs' tool for others, and adding cgroups awareness into pkgcore;
that said, none of 'em do this *now*, thus my concern.
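To make the gap concrete, here is a minimal sketch of parent-side cleanup,
assuming the parent records each child pid itself. The names mj_pids,
mj_track and mj_kill_children are hypothetical and not part of the posted
eclass. It only signals direct children; it is not process tree killing or
cgroups containment, so anything a child has spawned itself can still be
left behind.

```bash
#!/bin/bash
# Hypothetical names throughout; none of this is in the posted eclass.
mj_pids=()

# Remember a backgrounded child so it can be signalled later.
mj_track() {
	mj_pids+=( "$1" )
}

# Send SIGTERM to every tracked child, then let bash reap them.
# Only direct children are covered; grandchildren are not.
mj_kill_children() {
	local pid
	for pid in "${mj_pids[@]}" ; do
		kill "${pid}" 2>/dev/null
	done
	wait
	mj_pids=()
}

# Run the cleanup on normal exit as well as on INT/TERM.
trap 'mj_kill_children' EXIT
trap 'exit 1' INT TERM

# Demo: a "hung" job that would otherwise sit around for five minutes.
sleep 300 &
child=$!
mj_track "${child}"

mj_kill_children
# kill -0 only probes whether the pid still exists.
if kill -0 "${child}" 2>/dev/null ; then
	status="still running"
else
	status="gone"
fi
echo "child ${child}: ${status}"
```

A SIGKILL delivered to the parent skips the trap entirely, which is why
containment on the PM side is still the real fix.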
> -mike
>
> # Copyright 1999-2012 Gentoo Foundation
> # Distributed under the terms of the GNU General Public License v2
> # $Header: $
>
> # @ECLASS: multiprocessing.eclass
> # @MAINTAINER:
> # base-system@gentoo.org
> # @AUTHORS:
> # Brian Harring <ferringb@gentoo.org>
> # Mike Frysinger <vapier@gentoo.org>
> # @BLURB: parallelization with bash (wtf?)
> # @DESCRIPTION:
> # The multiprocessing eclass contains a suite of functions that allow ebuilds
> # to quickly run things in parallel using shell code.
>
> if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
> ___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"
>
> # @FUNCTION: makeopts_jobs
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
> # specified therein. Useful for running non-make tools in parallel too.
> # i.e. if the user has MAKEOPTS=-j9, this will show "9".
> # We can't return the number as bash normalizes it to [0, 255]. If the flags
> # haven't specified a -j flag, then "1" is shown as that is the default `make`
> # uses. Since there's no way to represent infinity, we return 999 if the user
> # has -j without a number.
> makeopts_jobs() {
> 	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
> 	# This assumes the first .* will be more greedy than the second .*
> 	# since POSIX doesn't specify a non-greedy match (i.e. ".*?").
> 	local jobs=$(echo " $* " | sed -r -n \
> 		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
> 		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
> 	echo ${jobs:-1}
> }
This function belongs in eutils, or somewhere similar; I'm pretty sure
we've got variants of this in multiple spots. I'd prefer a single
point to change if/when we add a way to pass parallelism down into the
env via EAPI.
> # @FUNCTION: multijob_init
> # @USAGE: [${MAKEOPTS}]
> # @DESCRIPTION:
> # Setup the environment for executing things in parallel.
> # You must call this before any other multijob function.
> multijob_init() {
> 	# Setup a pipe for children to write their pids to when they finish.
> 	mj_control_pipe="${T}/multijob.pipe"
> 	mkfifo "${mj_control_pipe}"
> 	exec {mj_control_fd}<>${mj_control_pipe}
> 	rm -f "${mj_control_pipe}"
Nice; hadn't thought to wipe the pipe on the way out.
>
> 	# See how many children we can fork based on the user's settings.
> 	mj_max_jobs=$(makeopts_jobs "$@")
> 	mj_num_jobs=0
> }
>
> # @FUNCTION: multijob_child_init
> # @DESCRIPTION:
> # You must call this first in the forked child process.
> multijob_child_init() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> 	trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
> 	trap 'exit 1' INT TERM
> }
I kind of dislike this form, since it means consuming code has to be
aware of, and do, the ( ) & trick itself.
A helper function would hide that, something like:
multijob_child_job() {
	(
		multijob_child_init
		"$@"
	) &
	multijob_post_fork || die "game over man, game over"
}
Doing so would convert your eautoreconf from:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
	if [[ -d ${x} ]] ; then
		pushd "${x}" >/dev/null
		(
			multijob_child_init
			AT_NOELIBTOOLIZE="yes" eautoreconf
		) &
		multijob_post_fork || die
		popd >/dev/null
	fi
done
To:
for x in $(autotools_check_macro_val AC_CONFIG_SUBDIRS) ; do
	if [[ -d ${x} ]] ; then
		pushd "${x}" >/dev/null
		AT_NOELIBTOOLIZE="yes" multijob_child_job eautoreconf
		popd >/dev/null
	fi
done
Note, if we used an eval in multijob_child_job, the pushd/popd could
be folded in. Debatable.
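A rough sketch of folding the directory change in without eval, which is a
different route from the eval idea above: the helper takes the directory as
its first argument and does the cd inside the subshell, so the parent's
working directory never changes. multijob_child_job_in is a hypothetical
name, and the three stubs at the top are stand-ins so the snippet runs
outside portage.

```bash
#!/bin/bash
# Stand-ins so the sketch runs outside portage; the real versions are
# die from the package manager and the multijob_* functions quoted here.
die() { echo "die: $*" >&2 ; exit 1 ; }
multijob_child_init() { : ; }
multijob_post_fork() { : ; }

# Hypothetical variant of multijob_child_job: $1 is the directory to run
# in, the rest is the command. The cd happens inside the subshell, so
# the caller needs no pushd/popd.
multijob_child_job_in() {
	local dir=$1
	shift
	(
		multijob_child_init
		cd "${dir}" || exit 1
		"$@"
	) &
	multijob_post_fork || die "multijob_post_fork failed"
}

# Demo: run a command in a scratch directory and check the parent's cwd.
workdir=$(mktemp -d)
before=${PWD}
multijob_child_job_in "${workdir}" touch marker
wait
echo "marker created: $([[ -e ${workdir}/marker ]] && echo yes || echo no)"
echo "parent cwd unchanged: $([[ ${PWD} == "${before}" ]] && echo yes || echo no)"
```

With something like this, the loop body would shrink to a single line along
the lines of: AT_NOELIBTOOLIZE="yes" multijob_child_job_in "${x}" eautoreconf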
> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> 	: $(( ++mj_num_jobs ))
> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> 		multijob_finish_one
> 	fi
> 	return $?
> }
>
> # @FUNCTION: multijob_finish_one
> # @DESCRIPTION:
> # Wait for a single process to exit and return its exit code.
> multijob_finish_one() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> 	local pid ret
> 	read -r -u ${mj_control_fd} pid ret
Mildly concerned about the failure case here, specifically if the read
fails (fd was closed, take your pick).
> 	: $(( --mj_num_jobs ))
> 	return ${ret}
> }
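On the read failure in multijob_finish_one: if the read fails, ret stays
empty, so the unquoted "return ${ret}" collapses to a bare return and the
function reports the status of the preceding ":" command, i.e. success. A
minimal sketch of a stricter version, assuming bash 4.1+ for the {var}
redirections; multijob_finish_one_checked is a hypothetical name and die
here is a stand-in for the real one:

```bash
#!/bin/bash
# Stand-in for portage's die so the sketch runs on its own.
die() { echo "die: $*" >&2 ; exit 1 ; }

# Same setup as multijob_init: an anonymous fifo held open on a fd.
tmp=$(mktemp -d)
mkfifo "${tmp}/multijob.pipe"
exec {mj_control_fd}<>"${tmp}/multijob.pipe"
rm -rf "${tmp}"
mj_num_jobs=0

# Hypothetical stricter multijob_finish_one: refuse to continue when the
# read fails or the child reported something that is not "pid status".
multijob_finish_one_checked() {
	local pid ret
	if ! read -r -u "${mj_control_fd}" pid ret ; then
		die "${FUNCNAME}: reading from the control pipe failed"
	fi
	[[ ${ret} =~ ^[0-9]+$ ]] || die "${FUNCNAME}: bad status '${ret}'"
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Demo 1: a child reports "pid 3", so the function returns 3.
echo "1234 3" >&${mj_control_fd}
mj_num_jobs=1
multijob_finish_one_checked
first=$?

# Demo 2: with the fd closed the read fails; the subshell keeps the
# stand-in die from ending this script.
exec {mj_control_fd}>&-
( multijob_finish_one_checked ) 2>/dev/null
second=$?
echo "first=${first} second=${second}"
```

Whether a failed read should die or just return non-zero is a judgment
call; the point is only that it should not look like a clean exit.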
>
> # @FUNCTION: multijob_finish
> # @DESCRIPTION:
> # Wait for all pending processes to exit and return the bitwise or
> # of all their exit codes.
> multijob_finish() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
Tend to think this should do cleanup, then die if someone invoked the
api incorrectly; I'd rather see the children reaped before this blows
up.
> 	local ret=0
> 	while [[ ${mj_num_jobs} -gt 0 ]] ; do
> 		multijob_finish_one
> 		: $(( ret |= $? ))
> 	done
> 	# Let bash clean up its internal child tracking state.
> 	wait
> 	return ${ret}
> }
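The reap-first ordering suggested for multijob_finish could look roughly
like the sketch below: the argument check moves after the loop and the
wait, so the children are gone before anything blows up. The die here is a
non-exiting stand-in (it records the message in the hypothetical mj_died
variable) purely so the demo can run to completion; the real die aborts.
Assumes bash 4.1+.

```bash
#!/bin/bash
# Stand-in for portage's die: records the message instead of aborting.
die() { mj_died=$* ; return 1 ; }

# Setup as in multijob_init, minus the MAKEOPTS parsing.
tmp=$(mktemp -d)
mkfifo "${tmp}/multijob.pipe"
exec {mj_control_fd}<>"${tmp}/multijob.pipe"
rm -f "${tmp}/multijob.pipe"
mj_num_jobs=0

multijob_finish_one() {
	local pid ret
	read -r -u "${mj_control_fd}" pid ret
	: $(( --mj_num_jobs ))
	return ${ret}
}

# Reordered multijob_finish: reap everything, then complain about misuse.
multijob_finish() {
	local ret=0
	while [[ ${mj_num_jobs} -gt 0 ]] ; do
		multijob_finish_one
		: $(( ret |= $? ))
	done
	# Let bash clean up its internal child tracking state.
	wait
	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
	return ${ret}
}

# Demo: one child that reports its status on exit, then a bad API call.
(
	trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
	sleep 1
	touch "${tmp}/done"
) &
: $(( ++mj_num_jobs ))
multijob_finish bogus-argument
echo "child finished: $([[ -e ${tmp}/done ]] && echo yes || echo no)"
echo "die message: ${mj_died}"
```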
>
> fi
~harring