From: Michał Górny
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
Date: Sun, 05 Nov 2017 22:10:32 +0100
Message-ID: <1509916232.21193.19.camel@gentoo.org>
References: <1509048745.18656.6.camel@gentoo.org> <1509649919.21210.12.camel@gentoo.org>

On Thu, 02.11.2017 at 23:43 +0000, Robin H. Johnson wrote:
> On Thu, Nov 02, 2017 at 08:11:59PM +0100, Michał Górny wrote:
> > Next version.
> > Now without MISC/OPTIONAL, and with many clarifications.
>
> Huge improvements in this version, I found it much easier to understand.
>
> Nits:
> - please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
>   some monospace fonts.

Done. Also replaced '—' for consistency.

>
> Further items inline:
> > Directory tree coverage
> > -----------------------
> > ...
> > The file entries (except for ``IGNORE``) can be specified for regular
> > files only. Symbolic links are followed when opening files
> > and traversing directories. It is an error to specify an entry for
> > a different file type. If the tree contains files of other types
> > that are not otherwise ignored, they need to be covered by an explicit
> > ``IGNORE``.
> >
> > All the local (non-``DIST``) files covered by a Manifest tree must
> > reside on the same filesystem. It is an error to specify entries
> > applying to files on another filesystem. If subdirectories
> > that are not otherwise ignored reside on a different filesystem, they
> > must be explicitly excluded via ``IGNORE``.
>
> I would prefer this to say:
> 'If files that are not otherwise ignored reside on a different
> filesystem', as expanded from sub-directories.
> This implicitly forbids following a symlink that crosses a filesystem
> boundary, and then matches the similar part of 'Tree layout
> restrictions'.

I've gone for something even more explicit:

| If files or directories that are not otherwise ignored reside
| on a different filesystem, or symbolic links point to targets
| on a different filesystem, they must be explicitly excluded
| via ``IGNORE``.

>
> > Rationale
> > =========
> > ...
> > Tree layout restrictions
> > ------------------------
> >
> > The algorithm is meant to work primarily with ebuild repositories which
> > normally contain only files and directories. Directories provide
> > no useful metadata for verification, and specifying special entries
> > for additional file types is purposeless.
> > Therefore, the specification
> > is restricted to dealing with regular files.
> >
> > The Gentoo repository does not use symbolic links. Some Gentoo
> > repositories do, however. To provide a simple solution for dealing with
> > symlinks without having to take care to implement special handling for
> > them, the common behavior of implicitly resolving them is used.
> > Therefore, symbolic links to files are stored as if they were regular
> > files, and symbolic links to directories are followed as if they were
> > regular directories.
> >
> > Dotfiles are implicitly ignored as that is a common notion used
> > in software written for POSIX systems. All other common filenames
> > require explicit ``IGNORE`` lines.
>
> 'common' in the second sentence seems odd. What about uncommon
> filenames? Maybe just s/other common filenames/other filenames/.

Done. The idea was to say 'do not put IGNORE for corner cases which are
better handled via PM config' but I guess it's not necessary here.

>
> > An ability to inject additional ignore entries is provided to account
> > for site configuration affecting the repository tree — placing
> > additional files in it, skipping some of the categories from syncing.
>
> Mention that the package manager may provide wildcards or regex in the
> additional entries. Eg: 'IGNORE **/metadata.xml'

Done.

| This configuration can extend beyond the limits of this GLEP,
| e.g. by allowing wildcards or regular expressions.

>
> > Non-strict Manifest verification
> > --------------------------------
> > ...
> > The cases for stripping unnecessary files mostly focused around space
> > savings. For this purpose, stripping ``metadata.xml`` and similar files
> > has little value. It is much more common for users to strip whole
> > categories which can not be handled via the ``MISC`` type, and needs
> > a dedicated package manager mechanism. The same mechanism can also
> > handle files that used the ``MISC`` type.
>
> Exclusion by package does happen as well. A list of categories or
> packages can be used for both the rsync exclusion and the IGNORE.

Rewritten to:

| It is much more common for users to strip whole packages
| or categories. The ``MISC`` type is not suitable for that,
| and so a dedicated package manager mechanism needs to be developed
| instead; possibly combining it with an rsync exclusion list. The same
| mechanism can also handle files that historically used the ``MISC``
| type.

But it's merely a rationale, so I'd rather not spend another hour trying
to cover every corner case in it.

>
> > Splitting distfile checksums from file checksums
> > ------------------------------------------------
> >
> > Another problem with the current Manifest format is that the checksums
> > for fetched files are combined with checksums for local files
> > in a single file inside the package directory. It has been specifically
> > pointed out that:
> >
> > - since distfiles are sometimes reused across different packages,
> >   the repeating checksums are redundant,
>
> Comment: 8.4% of all DIST entries are duplicate, representing a 2MiB
> saving in tree size (25MiB of DIST entries altogether).

Included as footnote:

.. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
   at the time of writing are duplicate, representing 2 MiB out of
   25 MiB of DIST entries altogether.

>
> > - mirror admins were interested in the possibility of verifying all
> >   the distfiles with a single tool.
> >
> > This specification does not provide a clean solution to this problem.
> > It technically permits moving ``DIST`` entries to higher-level Manifests
> > but the usefulness of such a solution is doubtful.
>
> This solution would require the package manager to consider
> higher-level Manifests or all Manifests in the tree when searching for
> the DIST entry. The most useful implementation of this would be for the
> git->rsync process to move all DIST entries elsewhere (metadata/ maybe).
Technically speaking, the package manager needs to consider parent
Manifests anyway in order to verify the deeper Manifests, and I think we
can reasonably assume it will keep them cached.

>
> Either way, this would have many downsides, and make manual work on the
> Manifest DIST entries painful.

That's what 'doubtful usefulness' means ;-P.

-- 
Best regards,
Michał Górny
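
For illustration only, here is a rough Python sketch of how a tool might find
the entries that the reworded coverage rule above says need an explicit
``IGNORE``: files of other types, and anything that resolves to a different
filesystem once symlinks are followed. It is not taken from the GLEP or from
any package manager. The function name entries_needing_ignore is hypothetical,
and flagging dangling symlinks is my own assumption, since the quoted text does
not cover that case.

```python
import os
import stat


def entries_needing_ignore(root):
    """Return sorted relative paths under root that would need IGNORE.

    Hypothetical helper, illustration only. A path is flagged when, after
    following symbolic links, it lives on a different filesystem than root
    or is neither a regular file nor a directory.
    """
    root_st = os.stat(root)
    root_dev = root_st.st_dev
    # Remember visited directories so symlink loops cannot recurse forever.
    seen_dirs = {(root_st.st_dev, root_st.st_ino)}
    flagged = []

    def visit(directory, rel):
        for name in sorted(os.listdir(directory)):
            if name.startswith("."):
                continue  # dotfiles are implicitly ignored
            path = os.path.join(directory, name)
            rel_path = os.path.join(rel, name) if rel else name
            try:
                st = os.stat(path)  # os.stat() follows symbolic links
            except OSError:
                # Assumption: an unresolvable entry (e.g. a dangling
                # symlink) is treated as needing IGNORE.
                flagged.append(rel_path)
                continue
            if st.st_dev != root_dev:
                # Different filesystem, or a symlink whose target is on one.
                flagged.append(rel_path)
            elif stat.S_ISDIR(st.st_mode):
                key = (st.st_dev, st.st_ino)
                if key not in seen_dirs:
                    seen_dirs.add(key)
                    visit(path, rel_path)
            elif not stat.S_ISREG(st.st_mode):
                flagged.append(rel_path)  # FIFO, socket, device node, ...

    visit(root, "")
    return sorted(flagged)
```

A directory reachable through several symlinks is only walked once here, so an
offending file inside it is reported under a single path. A real
implementation would also have to honour the ``IGNORE`` entries already present
in the Manifests, which this sketch leaves out.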