From: "Robin H. Johnson"
To: gentoo-dev@lists.gentoo.org
Date: Mon, 29 Jan 2018 07:21:14 +0000
Subject: Re: [gentoo-dev] [News item review] Portage rsync tree verification (v4)

On Sun, Jan 28, 2018 at 09:30:31PM +0100, Andrew Barchuk wrote:
> Hi everyone,
>
> > three possible solutions for splitting distfiles were listed:
> There's another option to use character ranges for each directory
> computed in a way to have the files distributed evenly. One way to do
> that is to use filename prefix of dynamic length so that each range
> holds the same number of files. E.g. we would have Ab/, Ap/, Ar/ but
> texlive-module-te/, texlive-module-th/, texlive-module-ti/. A similar
> but simpler option is to use file names as range bounds (the same way
> dictionaries use words to demarcate page bounds): each directory will
> have a name of the first file located inside. This way files will be
> distributed evenly and it's still easy to pick a correct directory where
> a file will be located manually.

This was discussed early on, but thank you for the reminder, as it got
dropped from later discussions.
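For reference, the dictionary-style scheme quoted above can be sketched in Python roughly as follows. This is a stand-in for the snipped code, not Andrew's original: `build_dirs`, `pick_dir`, the chunk size and the sample filenames are all hypothetical, and plain code-point sort order is assumed (so uppercase sorts before lowercase). It also makes the quoted downside visible: `pick_dir` needs the full, sorted list of directory names.

```python
from bisect import bisect_right


def build_dirs(filenames, per_dir):
    """Sort the filenames, cut them into chunks of per_dir entries and name
    each directory after the first file in its chunk (like dictionary pages)."""
    names = sorted(filenames)
    return [names[i] for i in range(0, len(names), per_dir)]


def pick_dir(dirs, filename):
    """Return the directory whose range covers filename.

    dirs must be the sorted list of directory names; anything sorting before
    the first directory falls into the first one.
    """
    i = bisect_right(dirs, filename) - 1
    return dirs[max(i, 0)]


# Hypothetical sample distfiles, echoing the Ab/Ap/Ar and texlive examples.
files = ["Ab-1.0.tar.gz", "Ap-2.1.tar.gz", "Ar-0.3.tar.gz",
         "texlive-module-te.tar.xz", "texlive-module-th.tar.xz",
         "texlive-module-ti.tar.xz"]
dirs = build_dirs(files, per_dir=2)
print(dirs)  # ['Ab-1.0.tar.gz', 'Ar-0.3.tar.gz', 'texlive-module-th.tar.xz']
print(pick_dir(dirs, "Aq-1.0.tar.gz"))  # Ab-1.0.tar.gz
```

Lookup is a binary search, so picking a directory is cheap, but only once the directory list is known.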
> [snip code]
> Using the approach above the files will be distributed evenly among the
> directories, keeping the possibility to determine the directory for a
> specific file by hand. It's possible if necessary to keep the directory
> structure unchanged for a very long time and it will likely stay
> well-balanced. Picking a directory for a file is very cheap. The only
> obvious downside I see is that it's necessary to know the list of
> directories to pick the correct one (can be mitigated by caching the
> list of directories if important). If it's desirable to make directory
> names shorter or to look less like file names it's fairly easy to
> achieve by keeping only unique prefixes of directories. For example:

As for the downside you describe: one of the requirements in the
discussion is that given ONLY the file or filename, and NOTHING ELSE, it
should be possible to determine where in the hierarchy it goes. No prior
knowledge about the hierarchy is permitted. Some parties might answer
that you then just need an index file, but that index file would have to
be kept in sync frequently.

It's a superbly readable result (in the general class of perfect hashes
based on lots of well-known input). However, this class of solution
suffers from another problem in addition to the one you noted: if the
input changes sufficiently, then rebalancing is expensive/hard.

As a concrete example, say we add a new category with lots of common
prefixes in its distfiles; take dev-scratch/, where all distfiles start
with 'scratch-'. Unless we know up-front that we're going to add a
thousand distfiles there (not unreasonable: dev-python has ~1800
packages), they might start out in the 'sc' directory, but later we'd
want them in 'scratch', as otherwise the tree would be unbalanced.
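To put rough numbers on the rebalancing problem, here is a small Python sketch. It re-implements the dictionary-style bounds (a hypothetical stand-in for the snipped code), builds a synthetic baseline of 676 distfiles at 26 per directory, then adds 1000 hypothetical `scratch-NNNN.tar.gz` files. Every name here is illustrative; real distfile names would give different numbers.

```python
from bisect import bisect_right
from collections import Counter
from string import ascii_lowercase


def build_dirs(names, per_dir):
    # Dictionary-style bounds: each directory is named after its first file.
    names = sorted(names)
    return [names[i] for i in range(0, len(names), per_dir)]


def pick_dir(dirs, name):
    # Last directory whose name sorts <= name (first one for anything smaller).
    return dirs[max(bisect_right(dirs, name) - 1, 0)]


# Hypothetical baseline: 676 distfiles "aa-1.0.tar.gz" .. "zz-1.0.tar.gz".
old = [f"{a}{b}-1.0.tar.gz" for a in ascii_lowercase for b in ascii_lowercase]
# Hypothetical new category (dev-scratch/) whose distfiles share a prefix.
scratch = [f"scratch-{i:04d}.tar.gz" for i in range(1000)]

old_dirs = build_dirs(old, per_dir=26)

# Option 1: keep the old directories. Every scratch-* file lands in the
# "sa-1.0.tar.gz" directory, which becomes heavily overloaded.
load = Counter(pick_dir(old_dirs, n) for n in old + scratch)
largest = max(load.values())

# Option 2: rebalance. Existing files whose directory changes would have
# to be moved on every mirror.
new_dirs = build_dirs(old + scratch, per_dir=26)
moved = sum(pick_dir(old_dirs, n) != pick_dir(new_dirs, n) for n in old)

print(largest, moved)  # 1026 205
```

In this synthetic case, leaving the layout alone puts 1026 files in one directory, while rebuilding the bounds relocates every existing file that sorts after the new block (205 of 676). Either way the input shift has a cost.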
-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136