* [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
@ 2024-05-12 2:26 Michał Górny
2024-05-12 2:44 ` Sam James
2024-05-12 17:22 ` Florian Schmaus
0 siblings, 2 replies; 7+ messages in thread
From: Michał Górny @ 2024-05-12 2:26 UTC (permalink / raw
To: gentoo-dev; +Cc: Michał Górny
Unpack crates in parallel using xargs to utilize multicore systems
better. Perform checksumming via a single sha256sum invocation.
For dev-python/watchfiles, this speeds up unpacking on my machine
from 2.6 s to 0.75 s (warm cache).
Signed-off-by: Michał Górny <mgorny@gentoo.org>
---
eclass/cargo.eclass | 56 ++++++++++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 23 deletions(-)
diff --git a/eclass/cargo.eclass b/eclass/cargo.eclass
index 0f2da982f60c..5a16d3a30528 100644
--- a/eclass/cargo.eclass
+++ b/eclass/cargo.eclass
@@ -329,40 +329,50 @@ _cargo_gen_git_config() {
cargo_src_unpack() {
debug-print-function ${FUNCNAME} "$@"
- mkdir -p "${ECARGO_VENDOR}" || die
- mkdir -p "${S}" || die
+ mkdir -p "${ECARGO_VENDOR}" "${S}" || die
local archive shasum pkg
+ local crates=()
for archive in ${A}; do
case "${archive}" in
*.crate)
- # when called by pkgdiff-mg, do not unpack crates
- [[ ${PKGBUMPING} == ${PVR} ]] && continue
-
- ebegin "Loading ${archive} into Cargo registry"
- tar -xf "${DISTDIR}"/${archive} -C "${ECARGO_VENDOR}/" || die
- # generate sha256sum of the crate itself as cargo needs this
- shasum=$(sha256sum "${DISTDIR}"/${archive} | cut -d ' ' -f 1)
- pkg=$(basename ${archive} .crate)
- cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json
- {
- "package": "${shasum}",
- "files": {}
- }
- EOF
- # if this is our target package we need it in ${WORKDIR} too
- # to make ${S} (and handle any revisions too)
- if [[ ${P} == ${pkg}* ]]; then
- tar -xf "${DISTDIR}"/${archive} -C "${WORKDIR}" || die
- fi
- eend $?
+ crates+=( "${archive}" )
;;
*)
- unpack ${archive}
+ unpack "${archive}"
;;
esac
done
+ if [[ ${PKGBUMPING} != ${PVR} ]]; then
+ pushd "${DISTDIR}" >/dev/null || die
+
+ ebegin "Unpacking crates"
+ printf '%s\0' "${crates[@]}" |
+ xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
+ tar -x -C "${ECARGO_VENDOR}" -f
+ assert
+ eend $?
+
+ while read -d '' -r shasum archive; do
+ pkg=${archive%.crate}
+ cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json || die
+ {
+ "package": "${shasum}",
+ "files": {}
+ }
+ EOF
+
+ # if this is our target package we need it in ${WORKDIR} too
+ # to make ${S} (and handle any revisions too)
+ if [[ ${P} == ${pkg}* ]]; then
+ tar -xf "${archive}" -C "${WORKDIR}" || die
+ fi
+ done < <(sha256sum -z "${crates[@]}" || die)
+
+ popd >/dev/null || die
+ fi
+
cargo_gen_config
}
--
2.45.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
2024-05-12 2:26 [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking Michał Górny
@ 2024-05-12 2:44 ` Sam James
2024-05-12 17:22 ` Florian Schmaus
1 sibling, 0 replies; 7+ messages in thread
From: Sam James @ 2024-05-12 2:44 UTC (permalink / raw
To: Michał Górny; +Cc: gentoo-dev
Michał Górny <mgorny@gentoo.org> writes:
> Unpack crates in parallel using xargs to utilize multicore systems
> better. Perform checksumming via a single sha256sum invocation.
>
> For dev-python/watchfiles, this speeds up unpacking on my machine
> from 2.6 s to 0.75 s (warm cache).
>
> Signed-off-by: Michał Górny <mgorny@gentoo.org>
For completeness (acked on PR), lgtm. I assume tested by building
all cargo inheritees (nearly said 'heirs' but.. lol)?
Give a chance for others to look though.
Also, thank you!
> ---
> eclass/cargo.eclass | 56 ++++++++++++++++++++++++++-------------------
> 1 file changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/eclass/cargo.eclass b/eclass/cargo.eclass
> index 0f2da982f60c..5a16d3a30528 100644
> --- a/eclass/cargo.eclass
> +++ b/eclass/cargo.eclass
> @@ -329,40 +329,50 @@ _cargo_gen_git_config() {
> cargo_src_unpack() {
> debug-print-function ${FUNCNAME} "$@"
>
> - mkdir -p "${ECARGO_VENDOR}" || die
> - mkdir -p "${S}" || die
> + mkdir -p "${ECARGO_VENDOR}" "${S}" || die
>
> local archive shasum pkg
> + local crates=()
> for archive in ${A}; do
> case "${archive}" in
> *.crate)
> - # when called by pkgdiff-mg, do not unpack crates
> - [[ ${PKGBUMPING} == ${PVR} ]] && continue
> -
> - ebegin "Loading ${archive} into Cargo registry"
> - tar -xf "${DISTDIR}"/${archive} -C "${ECARGO_VENDOR}/" || die
> - # generate sha256sum of the crate itself as cargo needs this
> - shasum=$(sha256sum "${DISTDIR}"/${archive} | cut -d ' ' -f 1)
> - pkg=$(basename ${archive} .crate)
> - cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json
> - {
> - "package": "${shasum}",
> - "files": {}
> - }
> - EOF
> - # if this is our target package we need it in ${WORKDIR} too
> - # to make ${S} (and handle any revisions too)
> - if [[ ${P} == ${pkg}* ]]; then
> - tar -xf "${DISTDIR}"/${archive} -C "${WORKDIR}" || die
> - fi
> - eend $?
> + crates+=( "${archive}" )
> ;;
> *)
> - unpack ${archive}
> + unpack "${archive}"
> ;;
> esac
> done
>
> + if [[ ${PKGBUMPING} != ${PVR} ]]; then
> + pushd "${DISTDIR}" >/dev/null || die
> +
> + ebegin "Unpacking crates"
> + printf '%s\0' "${crates[@]}" |
> + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
> + tar -x -C "${ECARGO_VENDOR}" -f
> + assert
> + eend $?
> +
> + while read -d '' -r shasum archive; do
> + pkg=${archive%.crate}
> + cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json || die
> + {
> + "package": "${shasum}",
> + "files": {}
> + }
> + EOF
> +
> + # if this is our target package we need it in ${WORKDIR} too
> + # to make ${S} (and handle any revisions too)
> + if [[ ${P} == ${pkg}* ]]; then
> + tar -xf "${archive}" -C "${WORKDIR}" || die
> + fi
> + done < <(sha256sum -z "${crates[@]}" || die)
> +
> + popd >/dev/null || die
> + fi
> +
> cargo_gen_config
> }
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
2024-05-12 2:26 [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking Michał Górny
2024-05-12 2:44 ` Sam James
@ 2024-05-12 17:22 ` Florian Schmaus
2024-05-12 18:21 ` Michał Górny
1 sibling, 1 reply; 7+ messages in thread
From: Florian Schmaus @ 2024-05-12 17:22 UTC (permalink / raw
To: gentoo-dev, Michał Górny
[-- Attachment #1.1.1: Type: text/plain, Size: 3256 bytes --]
On 12/05/2024 04.26, Michał Górny wrote:
> Unpack crates in parallel using xargs to utilize multicore systems
> better. Perform checksumming via a single sha256sum invocation.
>
> For dev-python/watchfiles, this speeds up unpacking on my machine
> from 2.6 s to 0.75 s (warm cache).
>
> Signed-off-by: Michał Górny <mgorny@gentoo.org>
> ---
> eclass/cargo.eclass | 56 ++++++++++++++++++++++++++-------------------
> 1 file changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/eclass/cargo.eclass b/eclass/cargo.eclass
> index 0f2da982f60c..5a16d3a30528 100644
> --- a/eclass/cargo.eclass
> +++ b/eclass/cargo.eclass
> @@ -329,40 +329,50 @@ _cargo_gen_git_config() {
> cargo_src_unpack() {
> debug-print-function ${FUNCNAME} "$@"
>
> - mkdir -p "${ECARGO_VENDOR}" || die
> - mkdir -p "${S}" || die
> + mkdir -p "${ECARGO_VENDOR}" "${S}" || die
>
> local archive shasum pkg
> + local crates=()
> for archive in ${A}; do
> case "${archive}" in
> *.crate)
> - # when called by pkgdiff-mg, do not unpack crates
> - [[ ${PKGBUMPING} == ${PVR} ]] && continue
> -
> - ebegin "Loading ${archive} into Cargo registry"
> - tar -xf "${DISTDIR}"/${archive} -C "${ECARGO_VENDOR}/" || die
> - # generate sha256sum of the crate itself as cargo needs this
> - shasum=$(sha256sum "${DISTDIR}"/${archive} | cut -d ' ' -f 1)
> - pkg=$(basename ${archive} .crate)
> - cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json
> - {
> - "package": "${shasum}",
> - "files": {}
> - }
> - EOF
> - # if this is our target package we need it in ${WORKDIR} too
> - # to make ${S} (and handle any revisions too)
> - if [[ ${P} == ${pkg}* ]]; then
> - tar -xf "${DISTDIR}"/${archive} -C "${WORKDIR}" || die
> - fi
> - eend $?
> + crates+=( "${archive}" )
> ;;
> *)
> - unpack ${archive}
> + unpack "${archive}"
> ;;
> esac
> done
>
> + if [[ ${PKGBUMPING} != ${PVR} ]]; then
> + pushd "${DISTDIR}" >/dev/null || die
> +
> + ebegin "Unpacking crates"
> + printf '%s\0' "${crates[@]}" |
> + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
Consider using get_makeopts_jobs instead of makeopts_jobs, as it
searches more variables for --jobs.
N.B.: If this where asking for a load-average limit, then using
get_makeopts_loadavg would be the ideal way to pick up portage's default
wrt --load-average. Therefore we should IMHO encourage using the
get_makeopts_* functions over the (legacy?) non-get_ variants.
> + tar -x -C "${ECARGO_VENDOR}" -f
> + assert
> + eend $?
> +
> + while read -d '' -r shasum archive; do
> + pkg=${archive%.crate}
> + cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json || die
> + {
> + "package": "${shasum}",
> + "files": {}
> + }
> + EOF
> +
> + # if this is our target package we need it in ${WORKDIR} too
> + # to make ${S} (and handle any revisions too)
> + if [[ ${P} == ${pkg}* ]]; then
> + tar -xf "${archive}" -C "${WORKDIR}" || die
> + fi
> + done < <(sha256sum -z "${crates[@]}" || die)
> +
> + popd >/dev/null || die
> + fi
> +
> cargo_gen_config
> }
>
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 17797 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 618 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
2024-05-12 17:22 ` Florian Schmaus
@ 2024-05-12 18:21 ` Michał Górny
2024-05-13 14:00 ` Florian Schmaus
2024-05-16 6:55 ` [gentoo-dev] Obtaining values for --jobs and --load-average Florian Schmaus
0 siblings, 2 replies; 7+ messages in thread
From: Michał Górny @ 2024-05-12 18:21 UTC (permalink / raw
To: gentoo-dev
[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]
On Sun, 2024-05-12 at 19:22 +0200, Florian Schmaus wrote:
> On 12/05/2024 04.26, Michał Górny wrote:
> > Unpack crates in parallel using xargs to utilize multicore systems
> > better. Perform checksumming via a single sha256sum invocation.
> >
> > For dev-python/watchfiles, this speeds up unpacking on my machine
> > from 2.6 s to 0.75 s (warm cache).
> >
> > Signed-off-by: Michał Górny <mgorny@gentoo.org>
> > ---
> > eclass/cargo.eclass | 56 ++++++++++++++++++++++++++-------------------
> > 1 file changed, 33 insertions(+), 23 deletions(-)
> >
> > diff --git a/eclass/cargo.eclass b/eclass/cargo.eclass
> > index 0f2da982f60c..5a16d3a30528 100644
> > --- a/eclass/cargo.eclass
> > +++ b/eclass/cargo.eclass
> > @@ -329,40 +329,50 @@ _cargo_gen_git_config() {
> > cargo_src_unpack() {
> > debug-print-function ${FUNCNAME} "$@"
> >
> > - mkdir -p "${ECARGO_VENDOR}" || die
> > - mkdir -p "${S}" || die
> > + mkdir -p "${ECARGO_VENDOR}" "${S}" || die
> >
> > local archive shasum pkg
> > + local crates=()
> > for archive in ${A}; do
> > case "${archive}" in
> > *.crate)
> > - # when called by pkgdiff-mg, do not unpack crates
> > - [[ ${PKGBUMPING} == ${PVR} ]] && continue
> > -
> > - ebegin "Loading ${archive} into Cargo registry"
> > - tar -xf "${DISTDIR}"/${archive} -C "${ECARGO_VENDOR}/" || die
> > - # generate sha256sum of the crate itself as cargo needs this
> > - shasum=$(sha256sum "${DISTDIR}"/${archive} | cut -d ' ' -f 1)
> > - pkg=$(basename ${archive} .crate)
> > - cat <<- EOF > ${ECARGO_VENDOR}/${pkg}/.cargo-checksum.json
> > - {
> > - "package": "${shasum}",
> > - "files": {}
> > - }
> > - EOF
> > - # if this is our target package we need it in ${WORKDIR} too
> > - # to make ${S} (and handle any revisions too)
> > - if [[ ${P} == ${pkg}* ]]; then
> > - tar -xf "${DISTDIR}"/${archive} -C "${WORKDIR}" || die
> > - fi
> > - eend $?
> > + crates+=( "${archive}" )
> > ;;
> > *)
> > - unpack ${archive}
> > + unpack "${archive}"
> > ;;
> > esac
> > done
> >
> > + if [[ ${PKGBUMPING} != ${PVR} ]]; then
> > + pushd "${DISTDIR}" >/dev/null || die
> > +
> > + ebegin "Unpacking crates"
> > + printf '%s\0' "${crates[@]}" |
> > + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
>
> Consider using get_makeopts_jobs instead of makeopts_jobs, as it
> searches more variables for --jobs.
Whose bright idea was to add a second similarly named function that does
roughly the same thing but apparently differently? It can hardly get
more confusing.
--
Best regards,
Michał Górny
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
2024-05-12 18:21 ` Michał Górny
@ 2024-05-13 14:00 ` Florian Schmaus
2024-05-13 14:02 ` Michał Górny
2024-05-16 6:55 ` [gentoo-dev] Obtaining values for --jobs and --load-average Florian Schmaus
1 sibling, 1 reply; 7+ messages in thread
From: Florian Schmaus @ 2024-05-13 14:00 UTC (permalink / raw
To: gentoo-dev, Michał Górny
On 12/05/2024 20.21, Michał Górny wrote:
> On Sun, 2024-05-12 at 19:22 +0200, Florian Schmaus wrote:
>> On 12/05/2024 04.26, Michał Górny wrote:
>>> + if [[ ${PKGBUMPING} != ${PVR} ]]; then
>>> + pushd "${DISTDIR}" >/dev/null || die
>>> +
>>> + ebegin "Unpacking crates"
>>> + printf '%s\0' "${crates[@]}" |
>>> + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
>>
>> Consider using get_makeopts_jobs instead of makeopts_jobs, as it
>> searches more variables for --jobs.
>
> Whose bright idea was to add a second similarly named function that does
> roughly the same thing but apparently differently? It can hardly get
> more confusing.
You are absolutely right, it sucks that we have two very similar methods.
You are invited to suggest how the situation can be improved. However,
rambling without presenting alternatives is not helpful in any way.
Potentially, you will either discover that there is a reason why things
are the way they are, or find a better solution.
- Flow
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking
2024-05-13 14:00 ` Florian Schmaus
@ 2024-05-13 14:02 ` Michał Górny
0 siblings, 0 replies; 7+ messages in thread
From: Michał Górny @ 2024-05-13 14:02 UTC (permalink / raw
To: gentoo-dev
[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]
On Mon, 2024-05-13 at 16:00 +0200, Florian Schmaus wrote:
> On 12/05/2024 20.21, Michał Górny wrote:
> > On Sun, 2024-05-12 at 19:22 +0200, Florian Schmaus wrote:
> > > On 12/05/2024 04.26, Michał Górny wrote:
> > > > + if [[ ${PKGBUMPING} != ${PVR} ]]; then
> > > > + pushd "${DISTDIR}" >/dev/null || die
> > > > +
> > > > + ebegin "Unpacking crates"
> > > > + printf '%s\0' "${crates[@]}" |
> > > > + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
> > >
> > > Consider using get_makeopts_jobs instead of makeopts_jobs, as it
> > > searches more variables for --jobs.
> >
> > Whose bright idea was to add a second similarly named function that does
> > roughly the same thing but apparently differently? It can hardly get
> > more confusing.
>
> You are absolutely right, it sucks that we have two very similar methods.
>
> You are invited to suggest how the situation can be improved. However,
> rambling without presenting alternatives is not helpful in any way.
>
My suggestion would be for the person who introduced new methods
and implicitly claimed the old methods to be "legacy" to put an actual
effort to migrate consumers.
--
Best regards,
Michał Górny
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 512 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* [gentoo-dev] Obtaining values for --jobs and --load-average
2024-05-12 18:21 ` Michał Górny
2024-05-13 14:00 ` Florian Schmaus
@ 2024-05-16 6:55 ` Florian Schmaus
1 sibling, 0 replies; 7+ messages in thread
From: Florian Schmaus @ 2024-05-16 6:55 UTC (permalink / raw
To: gentoo-dev, Michał Górny
[-- Attachment #1.1.1: Type: text/plain, Size: 2626 bytes --]
On 12/05/2024 20.21, Michał Górny wrote:
> On Sun, 2024-05-12 at 19:22 +0200, Florian Schmaus wrote:
>> On 12/05/2024 04.26, Michał Górny wrote:
>>> + if [[ ${PKGBUMPING} != ${PVR} ]]; then
>>> + pushd "${DISTDIR}" >/dev/null || die
>>> +
>>> + ebegin "Unpacking crates"
>>> + printf '%s\0' "${crates[@]}" |
>>> + xargs -0 -P "$(makeopts_jobs)" -n 1 -- \
>>
>> Consider using get_makeopts_jobs instead of makeopts_jobs, as it
>> searches more variables for --jobs.
I'm sorry, I was wrong.
I thought I remembered that we only extended the new get_makeopts_*
functions so that they also search other relevant variables besides
MAKEOPTS for --jobs/--load-avgerage. This is wrong, the already existing
makeopts_* functions were also changed at that time so that they take
GNUMAKEFLAGS & Co. into account.
As a reminder, this change was due to newer Portage versions setting
reasonable default values for --jobs and --load-average. And
GNUMAKEFLAGS is used for --load-average.
The problem is that the order of the parameters of the makeopts_*
functions is unfavorable. First comes the (optional) opts/flags string,
then the (optional) default value. This means that as soon as you want
to specify the default value explicitly, you also have to specify the
opts/flags string. As a result, it can easily happen that only $MAKEOPTS
is specified, which can lead to the default value set by portage not
being taken into account, especially --load-average. This is, for
example, the case with app-i18n/mozc and dev-qt/qtwebengine.
The get_makeopts_* functions do not suffer from this issue.
> My suggestion would be for the person who introduced new methods
> and implicitly claimed the old methods to be "legacy" to put an actual
> effort to migrate consumers.
That is probably sensible and I believe also possible.
I could only find a single ebuild where makeopts_ was used with a custom
make opts/flags string: sci-libs/vtk which has some logic for NINJAOPTS.
However, I wonder if it is ever sensible to specify other make
opts/flags besides what _get_all_makeopts() from multiprocessing.eclass
considers.
If your make.conf contains
NINJAOPS="-j 4 -l 4 <some-ninja-specific-args>"
then you most likely always also want to set
MAKEOPTS="-j 4"
otherwise, only ninja builds are done with "-j 4".
Of course, we could also simply add $NINJAOPTS to _get_all_makeopts().
Consequentially this raises the question if other similar variables
should also be added (SAMUFLAGS, maybe?).
I'd love to hear some opinions on this.
- Flow
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 17797 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 618 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-05-16 6:55 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-12 2:26 [gentoo-dev] [PATCH] cargo.eclass: Optimize crate unpacking Michał Górny
2024-05-12 2:44 ` Sam James
2024-05-12 17:22 ` Florian Schmaus
2024-05-12 18:21 ` Michał Górny
2024-05-13 14:00 ` Florian Schmaus
2024-05-13 14:02 ` Michał Górny
2024-05-16 6:55 ` [gentoo-dev] Obtaining values for --jobs and --load-average Florian Schmaus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox