From: Alec Warner
Date: Fri, 27 Aug 2021 13:50:42 -0700
Subject: Re: [gentoo-portage-dev] Performance tuning and parallelisation
To: gentoo-portage-dev@lists.gentoo.org

On Thu, Aug 26, 2021 at 4:03 AM Ed W wrote:
>
> Hi All
>
> Consider this a tentative first email to test the water, but I have
> started to look at the performance of the install phase of the emerge
> utility in particular, and I could use some guidance on where to go
> next.

To clarify: the 'install' phase installs the package into ${D}. The
'qmerge' phase is the phase that merges to the livefs.

>
> Firstly, to define the "problem": I have found Gentoo to be a great
> base for building custom distributions, and I use it to build a small
> embedded distro which runs on a couple of different architectures
> (essentially just a "ROOT=/something emerge $some_packages"). However,
> I use some packaging around binpackages to avoid unnecessary rebuilds,
> and this highlights that "building" a complete install using only
> binary packages rarely gets the load over 1. Can we do better than
> this?
> Seems to be highly serialised on the install phase of copying the
> files to the disk?

In terms of parallelism, it's not safe to run multiple phase functions
simultaneously. This is a problem in theory and occasionally in practice
(recently discussed in #gentoo-dev). The phase functions run arbitrary
code that modifies the livefs (pre/post install and rm can touch $ROOT).
As an example we observed recently: font ebuilds generate font-related
metadata, and if two ebuilds try to generate the metadata at the same
time, they can race and cause unexpected results. Sometimes this is
caught in the ebuild (e.g. it wrote code like rebuild_indexes || die and
the indexer returned non-zero), but it can instead result in silent data
corruption, particularly if the races go undetected.

>
> (Note I use parallel build and parallel-install flags, plus --jobs=N.
> If there is code to compile then load will shoot up, but simply
> installing binpackages struggles to get the load over about 0.7-1.1,
> so presumably single-threaded in all parts?)
>
> Now, this is particularly noticeable where I cheated to build my arm
> install and just used qemu user-mode on an amd64 host (rather than
> cross-compiling). Here it's very noticeable that the install/merge
> phase of the build is consuming much/most of the install time.
>
> e.g., random example (under qemu user mode):

I think perhaps a simpler test is to use qmerge (from portage-utils). If
you can use emerge (e.g. in --pretend mode) to generate a list of
packages to merge, you can simply merge them with qmerge. I suspect
qmerge will both (a) be faster and (b) be less safe than emerge, as
emerge does a bunch of extra work you may or may not care about. You
could also consider running N qmerges (again, I'm less sure how safe
this is, as the writes by qmerge may be racy). Note again that this
speed may not come for free, and you may end up with a corrupt image
afterwards.
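A rough sketch of that approach (the `[binary ...]` line format of
emerge's pretend output and qmerge's -y flag are assumptions here;
verify against your emerge/qmerge versions before relying on this):

```shell
# Hypothetical sketch: have emerge compute the binary-package list,
# then hand the atoms to qmerge one at a time. PACKAGES, paths and
# flags are illustrative, not a tested recipe.
ROOT=/tmp/timetest emerge -pk --quiet --nodeps ${PACKAGES} \
    | sed -n 's/^\[binary[^]]*\] *\([^ ]*\).*/=\1/p' \
    > /tmp/pkglist

while read -r atom; do
    qmerge -y "${atom}"    # -y assumed: skip the confirmation prompt
done < /tmp/pkglist
```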
I'm not sure if folks are running qmerge in production like this (maybe
others on the list have experience).

>
> # time ROOT=/tmp/timetest emerge -1k --nodeps openssl
>
> >>> Emerging binary (1 of 1) dev-libs/openssl-1.1.1k-r1::gentoo for /tmp/timetest/
> ...
> real    0m30.145s
> user    0m29.066s
> sys     0m1.685s
>
> Running the same on the native host takes about 5-6 seconds (and I
> find this ratio fairly consistent for qemu user mode: about 5-6x
> slower than native).
>
> If I pick another package with fewer files, then this 5-6 seconds
> drops, suggesting (without offering proof) that the bulk of the time
> here is some "per file" processing.
>
> Note this machine is a 12-core AMD Ryzen 3900X with SSDs that bench at
> around 4GB/s+. So really, 5-6 seconds to install a few files is
> relatively "slow". As a random benchmark, this machine can back up
> 4.5GB of chroot with tar+zstd in about 4 seconds.
>
> So the question is: I assume that further parallelisation will be
> difficult, therefore the low-hanging fruit here seems to be the
> install/merge phase and the question of why there seems to be quite a
> bit of CPU "per file installed". Can anyone give me a leg up on how I
> could benchmark this further and look for the hotspot? Perhaps someone
> understands the architecture of this part more intimately and could
> point out whether there are opportunities to do some of the processing
> en masse, rather than per file?
>
> I'm not really a Python guru, but I'm interested to poke further to
> see where the time is going.
>
> Many thanks
>
> Ed W
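On finding the hotspot: since emerge is plain Python, one low-effort
starting point is to wrap the whole merge in CPython's built-in
cProfile and inspect the result with pstats. A minimal sketch (ROOT,
the output path and the package atom are illustrative):

```shell
# Profile a single binpkg merge; emerge is a Python script, so it can
# be run under cProfile directly. Paths and the atom are illustrative.
ROOT=/tmp/timetest python -m cProfile -o /tmp/emerge.prof \
    "$(command -v emerge)" -1k --nodeps openssl

# Show the 20 most expensive call sites by cumulative time.
python -c 'import pstats; pstats.Stats("/tmp/emerge.prof").sort_stats("cumulative").print_stats(20)'
```

If the per-file hypothesis is right, the top of that listing should be
dominated by whatever portage does once per installed file.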