public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
@ 2017-10-26 20:12 Michał Górny
  2017-10-26 21:58 ` Roy Bamford
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Michał Górny @ 2017-10-26 20:12 UTC
  To: gentoo-dev

Hi, everyone.

After a week of hard work, I'd like to request your comments
on the draft of GLEP 74. This GLEP aims to replace the old tree-signing
GLEPs 58 and 60 with a superior implementation and more complete
specification.

The original tree-signing GLEPs were accepted a few years back but they
have never been implemented. This specification, on the other hand,
comes with a working reference implementation for the verification
algorithm. I expect to finish the update/generation part in a few days,
then work on additional optimizations (threading, incremental
verification, incremental updates).

ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
impl: https://github.com/mgorny/gemato/

Full text following for inline comments.


---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@gentoo.org>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-26
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups — files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).


Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.

The Manifest files can also specify ``IGNORE`` entries to skip Manifest
verification of subdirectories and/or files. Files and directories
starting with a dot are always implicitly ignored. All files that
are not ignored must be covered by at least one of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.
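
The rule for duplicate entries can be sketched as a small compatibility
check. This is only an illustration: the ``Entry`` class, the
``SEMANTIC_CLASS`` table and the function name are hypothetical and not
part of this specification. It assumes the deprecated ``EBUILD`` and
``AUX`` tags share the semantics of ``DATA``, as described in the
deprecated tags section.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: deprecated tags documented as equivalent to DATA share its semantics.
SEMANTIC_CLASS = {"EBUILD": "DATA", "AUX": "DATA"}


@dataclass
class Entry:
    """Hypothetical in-memory form of one Manifest entry."""
    tag: str
    path: str  # assumed to be already resolved relative to the repository root
    size: int
    checksums: Dict[str, str] = field(default_factory=dict)


def entries_compatible(a: Entry, b: Entry) -> bool:
    """Return True if two entries matching the same file may coexist."""
    # Same semantics (after folding equivalent tags together).
    if SEMANTIC_CLASS.get(a.tag, a.tag) != SEMANTIC_CLASS.get(b.tag, b.tag):
        return False
    # Same file size.
    if a.size != b.size:
        return False
    # Only the checksum algorithms present in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

Comparing only the algorithms common to both entries lets a sub-Manifest
written with a smaller hash set coexist with a newer entry for the same
file.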

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files. It is
an error to specify an entry for a different file type.

All the files covered by a Manifest tree must reside on the same
filesystem. It is an error to specify entries applying to files
on another filesystem. If subdirectories of the Manifest tree reside
on a different filesystem, they must be explicitly excluded
via ``IGNORE``.


File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

- if a file listed in Manifest is not present, then the verification
  for the file fails,

- if a file listed in Manifest is present but has a different size
  or one of the checksums does not match, the verification fails,

- if a file is present but not listed in Manifest, the verification
  fails,

- otherwise, the verification succeeds.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
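
A minimal sketch of the per-file part of these rules, assuming that
Manifest hash names can be lowercased to obtain Python ``hashlib`` names
(which holds for ``SHA256`` and ``SHA512`` but not for every reserved
name). The ``verify_file`` function is hypothetical. The third rule
(a file that is present but not listed) needs knowledge of the whole
tree and is not covered here.

```python
import hashlib
import os
import tempfile


def verify_file(path, size, checksums):
    """Check one regular file against the size and checksums of its entry.

    `checksums` maps Manifest hash names (e.g. "SHA256") to hex digests.
    """
    if not os.path.isfile(path):       # missing, or not a regular file
        return False
    if os.path.getsize(path) != size:  # cheap check before hashing
        return False
    hashers = {name: hashlib.new(name.lower()) for name in checksums}
    with open(path, "rb") as f:
        # Read once, feeding every requested hash from the same chunks.
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[n].hexdigest() == v.lower() for n, v in checksums.items())


# Usage: verify a scratch file against a matching and a tampered entry.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo.eclass")
    with open(target, "wb") as f:
        f.write(b"inherit bar\n")
    good = {"SHA256": hashlib.sha256(b"inherit bar\n").hexdigest()}
    ok = verify_file(target, 12, good)
    wrong_size = verify_file(target, 13, good)
    wrong_sum = verify_file(target, 12, {"SHA256": "0" * 64})
    absent = verify_file(os.path.join(tmp, "missing"), 12, good)
```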


New Manifest tags
-----------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that would be subject to non-obligatory Manifest
  verification if it existed. The package manager may ignore a stray
  file matching this entry if operating in non-strict mode. Used for
  paths that would match ``MISC`` if they existed.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.
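
The entry layouts above can be illustrated with a small line parser.
This is a sketch under stated assumptions: checksums are written as
alternating name and value fields as in GLEP 44, paths contain no
whitespace, and the ``parse_entry`` function and its dictionary output
are hypothetical.

```python
CHECKSUM_TAGS = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
PATH_ONLY_TAGS = {"IGNORE", "OPTIONAL"}


def parse_entry(line):
    """Split one Manifest line into a dict; raise ValueError on bad input."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP":
        if len(args) != 1:
            raise ValueError("TIMESTAMP takes exactly one value")
        return {"tag": tag, "timestamp": args[0]}
    if tag not in CHECKSUM_TAGS | PATH_ONLY_TAGS:
        raise ValueError("unknown tag: " + tag)
    if not args:
        raise ValueError("missing path")
    path = args[0]
    # Paths must not reference the parent directory.
    if ".." in path.split("/"):
        raise ValueError("path must not reference '..': " + path)
    if tag in PATH_ONLY_TAGS:
        return {"tag": tag, "path": path}
    # <path> <size> followed by <name> <value> pairs: the count is even.
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("malformed checksum entry")
    rest = args[2:]
    return {
        "tag": tag,
        "path": path,
        "size": int(args[1]),
        "checksums": dict(zip(rest[0::2], rest[1::2])),
    }
```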


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.
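
As an illustration, a tool could fold the deprecated tags into ``DATA``
while loading a package-level Manifest. The ``normalize_deprecated``
helper below is hypothetical and only shows the path handling for
``AUX``.

```python
import posixpath


def normalize_deprecated(tag, filename):
    """Map a deprecated entry to the equivalent (tag, path) pair."""
    if tag == "EBUILD":
        return "DATA", filename
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", posixpath.join("files", filename)
    return tag, filename
```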


Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present. Remove
   the top-level Manifest from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to `file verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
   ``EBUILD`` and ``AUX`` entries into the *covered* set.

6. Verify all the files in the union of the *present* and *covered*
   sets, according to `file verification`_ section.
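
The set handling in steps 1 and 4 to 6 can be sketched as follows.
This is a simplified illustration, not the reference implementation:
OpenPGP verification and sub-Manifest recursion are left out, *covered*
is assumed to already hold repository-relative paths gathered from all
Manifests (including the sub-Manifest files themselves), ``OPTIONAL``
entries are not modelled, and all function names are hypothetical.

```python
import os


def collect_present(root):
    """Step 1: regular files under root as '/'-separated relative paths."""
    present = set()
    # Symbolic links to directories are followed, as the specification does.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        # Dotfiles and dot-directories are implicitly ignored.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                present.add(rel.replace(os.sep, "/"))
    return present


def is_ignored(path, ignores):
    """True if path equals an IGNORE path or lies below it."""
    return any(
        path == ign or path.startswith(ign.rstrip("/") + "/") for ign in ignores
    )


def classify(present, covered, ignores, top_manifest="Manifest"):
    """Return (to_check, missing, stray) for the union of both sets."""
    present = {
        p for p in present if p != top_manifest and not is_ignored(p, ignores)
    }
    to_check = present & covered  # compare size and checksums
    missing = covered - present   # listed but absent: verification fails
    stray = present - covered     # present but not listed: verification fails
    return to_check, missing, stray


# Usage with in-memory sets instead of a real checkout.
present = {"Manifest", "eclass/foo.eclass", "distfiles/x.tar.gz", "stray.txt"}
covered = {"eclass/foo.eclass", "profiles/repo_name"}
to_check, missing, stray = classify(present, covered, ignores=["distfiles"])
```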


Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
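
A minimal sketch of the upward search described above. It only looks
for an uncompressed file named ``Manifest``, and it delegates the
``IGNORE`` check of step 3a to a caller-supplied callback. Both
``find_top_level`` and the ``ignores_original`` callback are
hypothetical names used for illustration.

```python
import os
import tempfile


def find_top_level(start, ignores_original):
    """Return the directory holding the top-level Manifest, or None.

    ignores_original(manifest_dir, original) should return True if the
    Manifest in manifest_dir has an IGNORE entry covering `original`.
    """
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev                    # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:            # step 2
            break
        if os.path.isfile(os.path.join(current, "Manifest")):  # step 3
            if ignores_original(current, original):        # step 3a
                break
            last_found = current                           # step 3b
        parent = os.path.dirname(current)
        if parent == current:                              # step 4: reached /
            break
        current = parent                                   # step 5
    return last_found


# Usage: a scratch repository with a top-level and a package Manifest.
with tempfile.TemporaryDirectory() as tmp:
    repo = os.path.abspath(os.path.join(tmp, "repo"))
    pkg = os.path.join(repo, "dev-libs", "foo")
    os.makedirs(pkg)
    for directory in (repo, pkg):
        open(os.path.join(directory, "Manifest"), "w").close()
    found_repo = find_top_level(pkg, lambda d, o: False) == repo
    # If the repository Manifest ignored the package, the search stops early.
    found_pkg = find_top_level(pkg, lambda d, o: d == repo) == pkg
```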


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
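
To illustrate the naming recommendation, the sketch below maps Manifest
hash names to Python ``hashlib`` names by lowercasing them. The
``SPECIAL_NAMES`` table and both functions are hypothetical. The
``RMD160`` mapping is an assumption based on the RIPEMD-160 entry in
the list above, and which algorithms are actually available depends on
the Python build and its OpenSSL library.

```python
import hashlib

# Reserved names that do not lowercase to a hashlib name (assumed mapping).
SPECIAL_NAMES = {"RMD160": "ripemd160"}


def hashlib_name(manifest_name):
    """Translate a Manifest hash name into a candidate hashlib name."""
    return SPECIAL_NAMES.get(manifest_name, manifest_name.lower())


def new_hasher(manifest_name):
    """Return a hashlib object for the name, or raise ValueError."""
    name = hashlib_name(manifest_name)
    if name not in hashlib.algorithms_available:
        raise ValueError("hash not available in this build: " + manifest_name)
    return hashlib.new(name)
```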


Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to carry a suffix identifying
their compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes is outside the scope
of this specification.

Whenever this specification refers to top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose any
of the files using the same base name.
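
Transparent decompression by suffix could look like the sketch below.
The suffix table is an assumption made for illustration, since the
list of algorithms is outside the scope of this specification, and
``open_manifest`` is a hypothetical helper. Note that checksum
verification of a sub-Manifest still has to run on the compressed file
as stored on disk.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Assumed suffix-to-opener table; not defined by this specification.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a Manifest for reading text, decompressing by suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Usage: write a gzip-compressed Manifest and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Manifest.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("IGNORE distfiles\n")
    with open_manifest(path) as f:
        lines = f.read().splitlines()
```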


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically, or independent.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it keeps the number of OpenPGP operations, which are
comparatively costly, to a minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for top-level Manifest — we do not want
to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior we also need to ban them when operating down
the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used to skip VCS directories such as ``CVS``.


Non-obligatory Manifest verification
------------------------------------

While this specification recommends all tools to use strict verification
by default, it allows declaring some files as non-obligatory like
the original Manifest2 format did. This could be used on files that do
not affect the normal package manager operation.

It aims to account for two use cases:

1. Stripping down files that are not strictly required to install
   packages from repository checkouts.

2. Accounting for automatically generated files that might be updated
   by standard tooling.

The traditional ``MISC`` type is amended with a complementary
``OPTIONAL`` tag to account for files that are not provided
in the specific repository. It aims to ensure that the same path would
be non-fatal when provided by the repository but fatal when created
by the user tooling.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

The timestamp can be used to detect delay or replay attacks against
Gentoo mirrors.

Strictly speaking, this is already provided by the various
``metadata/timestamp.*`` files provided already by Gentoo which are also
covered by the Manifest. However, including the value in the Manifest
itself has little cost and provides the ability to perform
the verification stand-alone.
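
A sketch of how a tool might parse the ``TIMESTAMP`` value and flag an
outdated checkout. The format string is the one given in the
specification. The ``is_stale`` helper and its seven-day default
threshold are hypothetical, as the specification does not define
a maximum age.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_stale(value, now, max_age=timedelta(days=7)):
    """True if the Manifest is older than max_age (assumed threshold)."""
    return now - parse_timestamp(value) > max_age
```

Passing ``now`` explicitly keeps the check deterministic; a caller would
normally supply ``datetime.now(timezone.utc)``.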


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, two are reused and two are
marked deprecated.

The ``DIST`` and ``MISC`` tags are reused since they can be relatively
clearly mapped onto the new concept.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
the limiting property of an implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has brought
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we are not including those files to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause strict
verification failures of Manifests. To account for this, Infra could
provide either ``OPTIONAL`` entries in the Manifest files to allow them
in non-strict verification mode, or ``IGNORE`` entries to allow them
in the strict mode.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned on being used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have uncompressed variant
either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of warm filesystem cache.

To improve speed on I/O and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.
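
One possible shape of such an incremental check is sketched below.
It is an assumption rather than a description of the reference
implementation: the tool is imagined to keep a cache of the mtime,
size and Manifest entry recorded for each file after the last
successful verification, and to recheck only files whose state
differs. The ``snapshot`` and ``files_to_recheck`` names are
hypothetical. Since mtimes can be set by anyone able to write to the
checkout, this shortcut trades some assurance for speed.

```python
import os


def snapshot(path, entry_line):
    """Record the state that a later run compares against."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, entry_line)


def files_to_recheck(current, cache):
    """Return paths whose (mtime, size, entry) differs from the cache.

    Both arguments map a path to the tuple produced by snapshot();
    paths missing from the cache are always rechecked.
    """
    return {path for path, state in current.items() if cache.get(path) != state}


# Usage with hand-written states instead of real files.
cache = {
    "a.ebuild": (1, 10, "DATA a.ebuild 10 SHA256 aa"),
    "b.ebuild": (2, 20, "DATA b.ebuild 20 SHA256 bb"),
}
current = {
    "a.ebuild": (1, 10, "DATA a.ebuild 10 SHA256 aa"),  # unchanged
    "b.ebuild": (3, 20, "DATA b.ebuild 20 SHA256 bb"),  # newer mtime
    "c.ebuild": (4, 30, "DATA c.ebuild 30 SHA256 cc"),  # new file
}
changed = files_to_recheck(current, cache)
```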

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to be
ensured:

- all files within the package directory must be covered by ``Manifest``
  file inside that package directory,

- all distfiles used by the package must be covered by ``Manifest``
  file inside the package directory,

- all files inside the ``files/`` subdirectory of a package directory
  need to use the deprecated ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files inside the package directory need to use
  the deprecated ``EBUILD`` tag (rather than ``DATA``),

- the Manifest files inside the package directory can be signed
  to provide authenticity verification.

Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.


Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 — fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.


-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
@ 2017-10-26 21:58 ` Roy Bamford
  2017-10-27  6:22   ` Michał Górny
  2017-10-27 21:05 ` Robin H. Johnson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Roy Bamford @ 2017-10-26 21:58 UTC
  To: gentoo-dev


On 2017.10.26 21:12, Michał Górny wrote:
> Hi, everyone.
> 
> After a week of hard work, I'd like to request your comments
> on the draft of GLEP 74. This GLEP aims to replace the old
> tree-signing
> GLEPs 58 and 60 with a superior implementation and more complete
> specification.
> 
> The original tree-signing GLEPs were accepted a few years back but
> they
> have never been implemented. This specification, on the other hand,
> comes with a working reference implementation for the verification
> algorithm. I expect to finish the update/generation part in a few
> days,
> then work on additional optimizations (threading, incremental
> verification, incremental updates).
> 
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/
> 
> Full text following for inline comments.
> 
[snip lots of hard work]
> 
> -- 
> Best regards,
> Michał Górny
> 
> 
> 

Michał,

Thank you for the hard work.

This GLEP implies that users need to have the entire repository to validate
and authenticate, if I understand it correctly.

For example 
PORTAGE_RSYNC_EXTRA_OPTS="--exclude=<list_of_<package/categories>"
will still work but the resulting tree could not be authenticated, as
the top-level signature would fail.

The manifests would still work correctly because they only apply to
the directory containing them. Pruning the repository at 
rsync time will therefore remove the manifests and the files that they cover.

Is that understanding correct?  

-- 
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods



* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 21:58 ` Roy Bamford
@ 2017-10-27  6:22   ` Michał Górny
  2017-10-28  2:41     ` Dean Stephens
  0 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-10-27  6:22 UTC (permalink / raw
  To: gentoo-dev, Roy Bamford

On 26 October 2017 23:58:53 CEST, Roy Bamford <neddyseagoon@gentoo.org> wrote:
>On 2017.10.26 21:12, Michał Górny wrote:
>> Hi, everyone.
>> 
>> After a week of hard work, I'd like to request your comments
>> on the draft of GLEP 74. This GLEP aims to replace the old
>> tree-signing
>> GLEPs 58 and 60 with a superior implementation and more complete
>> specification.
>> 
>> The original tree-signing GLEPs were accepted a few years back but
>> they
>> have never been implemented. This specification, on the other hand,
>> comes with a working reference implementation for the verification
>> algorithm. I expect to finish the update/generation part in a few
>> days,
>> then work on additional optimizations (threading, incremental
>> verification, incremental updates).
>> 
>> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
>> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
>> impl: https://github.com/mgorny/gemato/
>> 
>> Full text following for inline comments.
>> 
>[snip lots of hard work]
>> 
>> -- 
>> Best regards,
>> Michał Górny
>> 
>> 
>> 
>
>Michał,
>
>Thank you for the hard work.
>
>This GLEP implies that users need to have the entire repository to
>validate
>and authenticate, if I understand it correctly.
>
>For example 
>PORTAGE_RSYNC_EXTRA_OPTS="--exclude=<list_of_<package/categories>"
>will still work, but the resulting tree could not be authenticated, as
>the top-level signature would fail.
>
>The manifests would still work correctly because they only apply to
>the directory containing them. Pruning the repository at 
>rsync time will therefore remove the Manifests and the files that they
>cover.
>
>Is that understanding correct?  

Yes. We can't technically distinguish intentional package removal by the
user from a malicious third party stripping packages out. This is
something that a package manager extension might handle, but it doesn't
belong in the spec.


-- 
Best regards,
Michał Górny (by phone)


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
  2017-10-26 21:58 ` Roy Bamford
@ 2017-10-27 21:05 ` Robin H. Johnson
  2017-10-28 11:50   ` Michał Górny
  2017-10-27 21:48 ` Hanno Böck
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Robin H. Johnson @ 2017-10-27 21:05 UTC (permalink / raw
  To: gentoo-dev

On Thu, Oct 26, 2017 at 10:12:25PM +0200, Michał Górny wrote:
> Hi, everyone.
> 
> After a week of hard work, I'd like to request your comments
> on the draft of GLEP 74. This GLEP aims to replace the old tree-signing
> GLEPs 58 and 60 with a superior implementation and more complete
> specification.
Edits inline, with trimming content.

Very strong proposal, I approve of this replacing my earlier work, as it learns
from the prototypes, failed implementation, and the intervening years of Gentoo
experience.

> 2. Alike the original Manifest2, the files should be split into two
>    groups — files whose authenticity is critical, and those whose
>    mismatch may be accepted in non-strict mode. The same classification
>    should apply both to files listed in Manifests, and to stray files
>    present only in the repository.
nit: s/Alike/Like/, or rewrite the sentence.

> Manifest file locations and nesting
> -----------------------------------
> The ``Manifest`` file located in the root directory of the repository
> is called top-level Manifest, and it is used to perform the full-tree
> verification. In order to verify the authenticity, it must be signed
> using OpenPGP, using the armored cleartext format.
Are detached signatures also permitted (for all levels of Manifest)?
> 
> The Manifest files can also specify ``IGNORE`` entries to skip Manifest
> verification of subdirectories and/or files. Files and directories
> starting with a dot are always implicitly ignored. All files that
> are not ignored must be covered by at least one of the Manifests.
Do we need to keep that implicit ignore rule? Rather, convert it to being
always explicit.

There is only one such file in the rsync checkout presently:
metadata/.checksum-test-marker (see bug #572168, it is used to detect
mis-configured mirrors).

A SVN or Git repo might also have dot-named directories.

> All the files covered by a Manifest tree must reside on the same
> filesystem. It is an error to specify entries applying to files
> on another filesystem. If subdirectories of the Manifest tree reside
> on a different filesystem, they must be explicitly excluded
> via ``IGNORE``.
Distfiles aren't required to be in the same filesystem.

> New Manifest tags
> -----------------
...
> ``IGNORE <path>``
>   Ignores a subdirectory or file from Manifest checks. If the specified
>   path is present, it and its contents are omitted from the Manifest
>   verification (always pass).
Should this be accepted even by strict-mode? Alternatively, should strict mode
require that other content is kept outside of the tree?

> Algorithm for full-tree verification
> ------------------------------------
...
> 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
>    Optionally verify the ``TIMESTAMP`` entry if present. Remove
>    the top-level Manifest from the *present* set.
This spec does not state how the timestamp should be verified. 
Borrow from the original GLEP?

> 4. Process all ``IGNORE`` entries. Remove any paths matching them
>    from the *present* set.
>
> 5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
>    ``EBUILD`` and ``AUX`` entries into the *covered* set.
Clarification request: point out again in this section, that IGNORE entries are
prohibited from also matching any other entry. It is mentioned further up, but
a reminder is good.

> Checksum algorithms
> -------------------
> This section is informational only. Specifying the exact set
> of supported algorithms is outside the scope of this specification.
...
> The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
> It is recommended that any new hashes are named after the Python
> ``hashlib`` module algorithm names, transformed into uppercase.
Would we ever consider algorithm parameters? Yes, outside of this spec, but checking anyway.

> Manifest compression
> --------------------
...
> The specification permits uncompressed Manifests to exist alongside
> their compressed counterparts, and multiple compressed formats
> to coexist. If that is the case, the files must have the same
> uncompressed content and the specification is free to choose either
> of the files using the same base name.
GLEP61, for the transition period, required compressed & uncompressed Manifests
in the same directory to have identical content. Include mention of that here.

Saying that either can be used is a potential issue.

> Tree design
> -----------
...

Add a minor header here, to say this is the end of the 'Tree design' section?
> In the independent model, each sub-Manifest file is independent
> of the parent Manifests. As a result, each of them needs to be signed
> and verified independently. However, the parent Manifests still need
> to list sub-Manifests (albeit without verification data) in order
> to detect removal or replacement of subdirectories. This has
> the following implications:
...

> File verification model
> -----------------------
> 
> The verification model aims to provide full coverage against different
> forms of attack. In particular, three different kinds of manipulation
> are considered:
> ...
Selective denial of syncing was also one of the attacks in the original GLEPs
that was considered. See details re timestamp below.

> Timestamp field
> ---------------
> 
> The top-level Manifests optionally allows using a ``TIMESTAMP`` tag
> to include a generation timestamp in the Manifest. A similar feature
> was originally proposed in GLEP 58 [#GLEP58]_.
> 
> The timestamp can be used to detect delay or replay attacks against
> Gentoo mirrors.
> 
> Strictly speaking, this is already provided by the various
> ``metadata/timestamp.*`` files provided already by Gentoo which are also
> covered by the Manifest. However, including the value in the Manifest
> itself has a little cost and provides the ability to perform
> the verification stand-alone.
There's a critical part of the GLEP58 spec that got missed here:
https://www.gentoo.org/glep/glep-0058.html#timestamps-additional-distribution-of-metamanifest
The timestamp needs to be usable to verify whether the mirror is up to date
vs known masters.

The attack being defended against is a local community mirror (or MITM)
deliberately handing users an unmodified but stale copy of the tree.

I do approve of changing the format of the tag, but it still needs to be
linkable to a more verifiable source of truth.

> Backwards Compatibility
> =======================
> 
> This GLEP provides optional means of preserving backwards compatibility.
> To preserve the backwards compatibility, the following needs to be
> ensured:
> 
> - all files within the package directory must be covered by ``Manifest``
>   file inside that package directory,
This implies that IGNORE entries are NOT permitted to cover any file in
a package directory during the transition period.
> 
> - all distfiles used by the package must be covered by ``Manifest``
>   file inside the package directory,
This implies that non-package-dir DIST entries may be a duplicate of a
package-level DIST during the transition.

> - all files inside the ``files/`` subdirectory of a package directory
>   need to be use the deprecated ``AUX`` tag (rather than ``DATA``),
> 
> - all ``.ebuild`` files inside the package directory need to use
>   the deprecated ``EBUILD`` tag (rather than ``DATA``),
Could we please note here, for the transitional period, that the
file equivalence rule is applicable? 
During the transitional period, the package Manifests may contain two entries for a
given file: (DATA, EBUILD) or (DATA, AUX). The MISC type remains the
same.

> - the Manifest files inside the package directory can be signed
>   to provide authenticity verification.
Why do we need to specify this in backwards compat? It's still permitted above.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
  2017-10-26 21:58 ` Roy Bamford
  2017-10-27 21:05 ` Robin H. Johnson
@ 2017-10-27 21:48 ` Hanno Böck
  2017-10-28  2:41   ` Dean Stephens
  2017-10-29 19:07 ` [gentoo-dev] [v1.0.1] " Michał Górny
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Hanno Böck @ 2017-10-27 21:48 UTC (permalink / raw
  To: gentoo-dev

Hi,

On Thu, 26 Oct 2017 22:12:25 +0200
Michał Górny <mgorny@gentoo.org> wrote:

> After a week of hard work, I'd like to request your comments
> on the draft of GLEP 74. This GLEP aims to replace the old
> tree-signing GLEPs 58 and 60 with a superior implementation and more
> complete specification.

Thanks for working on this; it's really one of the biggest security
issues Gentoo has these days that needs to be fixed.

I hope I'll find time to read it in detail, but by skimming through it
I noted that the downgrade attack prevention is kinda not very clear.
It says in the timestamp section "The package manager can use it  to
detect an outdated repository checkout." But it doesn't say how exactly.

Should a package manager reject a sync if it is too old? Or not install
packages if a sync hasn't happened for some time? What is considered
"outdated"? I think it should be clarified how exactly it's supposed
to work.

-- 
Hanno Böck
https://hboeck.de/

mail/jabber: hanno@hboeck.de
GPG: FE73757FA60E4E21B937579FA5880072BBB51E42


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-27  6:22   ` Michał Górny
@ 2017-10-28  2:41     ` Dean Stephens
  0 siblings, 0 replies; 32+ messages in thread
From: Dean Stephens @ 2017-10-28  2:41 UTC (permalink / raw
  To: gentoo-dev

On 10/27/17 02:22, Michał Górny wrote:
> Yes. We can't technically distinguish intentional package removal by the
> user from a malicious third party stripping packages out. This is
> something that a package manager extension might handle, but it doesn't
> belong in the spec.
> 
"Implementations may provide mechanisms for verifying partial
repositories or accepting repositories which could not be fully
verified, such mechanisms are outside the scope of this document."

Especially given: "The package manager may reject any package or even
the whole repository if it may refer to files for which the verification
failed."


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-27 21:48 ` Hanno Böck
@ 2017-10-28  2:41   ` Dean Stephens
  2017-10-28  3:27     ` M. J. Everitt
  0 siblings, 1 reply; 32+ messages in thread
From: Dean Stephens @ 2017-10-28  2:41 UTC (permalink / raw
  To: gentoo-dev

On 10/27/17 17:48, Hanno Böck wrote:
> Should a package manager reject a sync if it is too old? or not install
> packages if a sync hasn't happened for some time? What is considered
> "outdated"? I think that should be clarified how exactly it's supposed
> to work.
> 
If such a rejection is to be the default, an override option should be
required as part of the spec. There are use cases where using an "old"
repository would be necessary, even if only temporarily.


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28  2:41   ` Dean Stephens
@ 2017-10-28  3:27     ` M. J. Everitt
  2017-10-28  4:43       ` Allan Wegan
  0 siblings, 1 reply; 32+ messages in thread
From: M. J. Everitt @ 2017-10-28  3:27 UTC (permalink / raw
  To: gentoo-dev


On 28/10/17 03:41, Dean Stephens wrote:
> On 10/27/17 17:48, Hanno Böck wrote:
>> Should a package manager reject a sync if it is too old? or not install
>> packages if a sync hasn't happened for some time? What is considered
>> "outdated"? I think that should be clarified how exactly it's supposed
>> to work.
>>
> If such a rejection is to be the default, an override option should be
> required as part of the spec. There are use cases where using an "old"
> repository would be necessary, even if only temporarily.
>
I_KNOW_WHAT_I_AM_DOING=1

:]




* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28  3:27     ` M. J. Everitt
@ 2017-10-28  4:43       ` Allan Wegan
  0 siblings, 0 replies; 32+ messages in thread
From: Allan Wegan @ 2017-10-28  4:43 UTC (permalink / raw
  To: gentoo-dev


On 28.10.2017 05:27, M. J. Everitt wrote:
> On 28/10/17 03:41, Dean Stephens wrote:
>> On 10/27/17 17:48, Hanno Böck wrote:
>>> Should a package manager reject a sync if it is too old? or not install
>>> packages if a sync hasn't happened for some time? What is considered
>>> "outdated"? I think that should be clarified how exactly it's supposed
>>> to work.
>>>
>> If such a rejection is to be the default, an override option should be
>> required as part of the spec. There are use cases where using an "old"
>> repository would be necessary, even if only temporarily.
>>
> I_KNOW_WHAT_I_AM_DOING=1
>
> :]

That is already reserved for disabling the signature checks :P

I would suggest --max-repository-age-days=<value> with <value>
defaulting to as many days as the maximum update interval of the
repository, plus 1.

But then the repository actually has to be newly signed at least once
every <value> days, to keep users from hitting false-positive replay
attack detection errors that break their update process...



-- 
Allan Wegan
<http://www.allanwegan.de/>
Jabber: allanwegan@ffnord.net
 OTR-Fingerprint: E4DCAA40 4859428E B3912896 F2498604 8CAA126F
Jabber: allanwegan@jabber.ccc.de
 OTR-Fingerprint: A1AAA1B9 C067F988 4A424D33 98343469 29164587
ICQ: 209459114
 OTR-Fingerprint: 71DE5B5E 67D6D758 A93BF1CE 7DA06625 205AC6EC



* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-27 21:05 ` Robin H. Johnson
@ 2017-10-28 11:50   ` Michał Górny
  2017-10-28 12:49     ` Ulrich Mueller
  2017-10-28 18:44     ` Robin H. Johnson
  0 siblings, 2 replies; 32+ messages in thread
From: Michał Górny @ 2017-10-28 11:50 UTC (permalink / raw
  To: gentoo-dev

On Fri, 27.10.2017 at 21:05 +0000, Robin H. Johnson wrote:
> On Thu, Oct 26, 2017 at 10:12:25PM +0200, Michał Górny wrote:
> > 2. Alike the original Manifest2, the files should be split into two
> >    groups — files whose authenticity is critical, and those whose
> >    mismatch may be accepted in non-strict mode. The same classification
> >    should apply both to files listed in Manifests, and to stray files
> >    present only in the repository.
> 
> nit: s/Alike/Like/, or rewrite the sentence.

Done.

> > Manifest file locations and nesting
> > -----------------------------------
> > The ``Manifest`` file located in the root directory of the repository
> > is called top-level Manifest, and it is used to perform the full-tree
> > verification. In order to verify the authenticity, it must be signed
> > using OpenPGP, using the armored cleartext format.
> 
> Are detached signatures also permitted (for all levels of Manifest)?

I'd say no. Keeping it always contained in a single file is simpler.

> > The Manifest files can also specify ``IGNORE`` entries to skip Manifest
> > verification of subdirectories and/or files. Files and directories
> > starting with a dot are always implicitly ignored. All files that
> > are not ignored must be covered by at least one of the Manifests.
> 
> Do we need to keep that implicit ignore rule? Rather, convert it to being
> always explicit.
> 
> There is only one such file in the rsync checkout presently:
> metadata/.checksum-test-marker (see bug #572168, it is used to detect
> mis-configured mirrors).
> 
> A SVN or Git repo might also have dot-named directories.

I like the implicit idea better as it is more consistent with normal
tool behavior, like 'ls' not listing the files. Dotfiles can be created
by many random tools or even the filesystem (especially in some cases
of overlay filesystems).

That said, the case of 'lost+found' just occurred to me. I suppose this
one we will want to always IGNORE.
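
For illustration, the implicit rule could be sketched in Python roughly
as follows. This is only a sketch: the function name
`implicitly_ignored` is hypothetical, and it assumes the rule applies to
every path component (so anything below a dot-directory is skipped as
well), which the draft text does not spell out.

```python
from pathlib import PurePosixPath


def implicitly_ignored(path):
    """Return True if any component of a repository-relative path
    starts with a dot (assumed reading of the implicit ignore rule)."""
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


# The one dotfile currently present in the rsync checkout would be skipped,
# and so would anything inside a user's own .git directory.
marker_skipped = implicitly_ignored("metadata/.checksum-test-marker")
git_skipped = implicitly_ignored(".git/config")
```

A name like 'lost+found' does not match this rule, which is why it would
need an explicit ``IGNORE`` entry.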

> > All the files covered by a Manifest tree must reside on the same
> > filesystem. It is an error to specify entries applying to files
> > on another filesystem. If subdirectories of the Manifest tree reside
> > on a different filesystem, they must be explicitly excluded
> > via ``IGNORE``.
> 
> Distfiles aren't required to be in the same filesystem.

I've updated the sentence to clearly indicate it's about «local (non-
``DIST``) files».

> 
> > New Manifest tags
> > -----------------
> 
> ...
> > ``IGNORE <path>``
> >   Ignores a subdirectory or file from Manifest checks. If the specified
> >   path is present, it and its contents are omitted from the Manifest
> >   verification (always pass).
> 
> Should this be accepted even by strict-mode? Alternatively, should strict mode
> require that other content is kept outside of the tree?

Yes, it should. I'd really prefer if strict mode still worked out-of-
the-box for most of our users without requiring them to do major
reshuffling of their systems.

Plus, see 'lost+found' above.

> > Algorithm for full-tree verification
> > ------------------------------------
> 
> ...
> > 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
> >    Optionally verify the ``TIMESTAMP`` entry if present. Remove
> >    the top-level Manifest from the *present* set.
> 
> This spec does not state how the timestamp should be verified. 
> Borrow from the original GLEP?

Let's try:

| 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
|    Optionally verify the ``TIMESTAMP`` entry if present.
|    If the timestamp is significantly out of date compared to the local
|    clock or a trusted source, halt or require manual intervention
|    from the user. Remove the top-level Manifest from the *present* set.

Maybe it would look better if I split it into sub-points.
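
A minimal sketch of the local-clock part of that check, in Python. The
assumptions are loud ones: the ``TIMESTAMP`` value is taken to be an
ISO 8601 UTC string, the threshold of one day is purely hypothetical,
and `timestamp_is_fresh` is an illustrative name rather than anything
from gemato. Comparing against a trusted source is not covered here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the TIMESTAMP value is an ISO 8601 UTC string such as
# "2017-10-28T11:50:00Z". The thread does not fix the exact format.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def timestamp_is_fresh(entry, now, max_age):
    """Return True if a 'TIMESTAMP <value>' entry is at most max_age old."""
    tag, value = entry.split(None, 1)
    if tag != "TIMESTAMP":
        raise ValueError("not a TIMESTAMP entry: %r" % entry)
    stamp = datetime.strptime(value.strip(), TIMESTAMP_FORMAT)
    stamp = stamp.replace(tzinfo=timezone.utc)
    # A stamp from the future yields a negative age and is accepted here.
    return now - stamp <= max_age


# Example: a hypothetical one-day threshold, checked against a fixed clock.
now = datetime(2017, 10, 28, 12, 0, 0, tzinfo=timezone.utc)
fresh = timestamp_is_fresh("TIMESTAMP 2017-10-28T11:50:00Z", now, timedelta(days=1))
```

When the function returns False, the caller would halt or ask the user
to intervene; the threshold itself is a policy decision.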

> 
> > 4. Process all ``IGNORE`` entries. Remove any paths matching them
> >    from the *present* set.
> > 
> > 5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
> >    ``EBUILD`` and ``AUX`` entries into the *covered* set.
> 
> Clarification request: point out again in this section, that IGNORE entries are
> prohibited from also matching any other entry. It is mentioned further up, but
> a reminder is good.

I've added an extra step:

| 6. Verify the entries in *covered* set for incompatible duplicates
|    and collisions with ignored files as explained in `Manifest file
|    locations and nesting`_.
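
The new step could be sketched along these lines in Python. The data
layout and the name `find_conflicts` are hypothetical, and the
compatibility test is an assumption: two entries for the same path are
treated as compatible when their sizes match and every hash they have
in common agrees.

```python
def find_conflicts(entries, ignored_paths):
    """entries: iterable of (path, size, {hash_name: hex_digest}) tuples.
    ignored_paths: paths named by IGNORE entries.
    Returns a list of human-readable error strings."""
    errors = []
    first_seen = {}
    for path, size, checksums in entries:
        # Collision with IGNORE: the path is the ignored path itself
        # or lies underneath an ignored directory.
        for ignored in ignored_paths:
            prefix = ignored.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                errors.append("%s collides with IGNORE %s" % (path, ignored))
        if path in first_seen:
            old_size, old_sums = first_seen[path]
            shared = set(old_sums) & set(checksums)
            if size != old_size or any(old_sums[h] != checksums[h] for h in shared):
                errors.append("%s has incompatible duplicate entries" % path)
        else:
            first_seen[path] = (size, checksums)
    return errors
```

A duplicate that merely adds another hash for the same content passes,
while a duplicate with a different digest for a shared hash is reported.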

> 
> > Checksum algorithms
> > -------------------
> > This section is informational only. Specifying the exact set
> > of supported algorithms is outside the scope of this specification.
> 
> ...
> > The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
> > It is recommended that any new hashes are named after the Python
> > ``hashlib`` module algorithm names, transformed into uppercase.
> 
> Would we ever consider algorithm parameters? Yes, outside of this spec, but checking anyway.

I can't say for sure, but so far I've gone for 'no'. That's why gemato
does not support e.g. SHAKE* algorithms. If we ever decide to do that,
I suppose we can do it inside the hash name, e.g. FOO-<param1>-<param2>...
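
To make the naming convention concrete, here is a small Python sketch
that maps an uppercase Manifest hash name back to its ``hashlib``
counterpart. The helper name `manifest_checksum` is hypothetical and
this is not how gemato is structured; it only shows the name mapping
and why parameterised algorithms do not fit it directly.

```python
import hashlib


def manifest_checksum(manifest_name, data):
    """Hash data with the hashlib algorithm named by an uppercase
    Manifest hash name (e.g. SHA256 -> sha256)."""
    algo = manifest_name.lower()
    # SHAKE-style algorithms need an output length, i.e. a parameter,
    # so they are rejected in this parameter-free sketch.
    if algo not in hashlib.algorithms_guaranteed or algo.startswith("shake_"):
        raise ValueError("unsupported hash name: %s" % manifest_name)
    return hashlib.new(algo, data).hexdigest()


# SHA-256 of the empty string, a well-known test vector.
empty_sha256 = manifest_checksum("SHA256", b"")
```

Under this convention a name such as BLAKE2B would resolve to
``hashlib.blake2b`` without any further mapping table.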

> 
> > Manifest compression
> > --------------------
> 
> ...
> > The specification permits uncompressed Manifests to exist alongside
> > their compressed counterparts, and multiple compressed formats
> > to coexist. If that is the case, the files must have the same
> > uncompressed content and the specification is free to choose either
> > of the files using the same base name.
> 
> GLEP61, for the transition period, required compressed & uncompressed Manifests
> in the same directory to have identical content. Include mention of that here.

Can do. But I'll do it in 'Backwards compatibility' section:

| - if the Manifest files inside the package directory are compressed,
|   an uncompressed file of identical content must coexist.

> Saying that either can be used is a potential issue.

Why? It also says that they must be identical, so it's of no difference
to the implementation which one is used.
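
A tool that wants to enforce the "must be identical" part rather than
trust it could compare the two files directly. A rough Python sketch,
assuming gzip is the compression in use and that both files have
already been read into memory (the function name is hypothetical):

```python
import gzip


def manifests_agree(plain_bytes, gz_bytes):
    """Return True if a gzip-compressed Manifest decompresses to exactly
    the same bytes as its uncompressed counterpart."""
    return gzip.decompress(gz_bytes) == plain_bytes


# Example with in-memory data standing in for Manifest and Manifest.gz.
content = b"DATA foo.txt 3 SHA256 00\n"
same = manifests_agree(content, gzip.compress(content))
```

If the check passes, it indeed makes no difference which of the two
files the implementation goes on to parse.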

> > Tree design
> > -----------
> 
> ...
> 
> Add a minor header here, to say this is the end of the 'Tree design' section?

It's not the end; it's a description of the alternative. Both belong
in one section. I could add an additional section level but I'd rather
not do that for a single use.

> > In the independent model, each sub-Manifest file is independent
> > of the parent Manifests. As a result, each of them needs to be signed
> > and verified independently. However, the parent Manifests still need
> > to list sub-Manifests (albeit without verification data) in order
> > to detect removal or replacement of subdirectories. This has
> > the following implications:
> 
> ...
> 
> > File verification model
> > -----------------------
> > 
> > The verification model aims to provide full coverage against different
> > forms of attack. In particular, three different kinds of manipulation
> > are considered:
> > ...
> 
> Selective denial of syncing was also one of the attacks in the original GLEPs
> that was considered. See details re timestamp below.

But that's not covered by 'file verification model', is it? So I suppose
it's better to detail it below.

> 
> > Timestamp field
> > ---------------
> > 
> > The top-level Manifests optionally allows using a ``TIMESTAMP`` tag
> > to include a generation timestamp in the Manifest. A similar feature
> > was originally proposed in GLEP 58 [#GLEP58]_.
> > 
> > The timestamp can be used to detect delay or replay attacks against
> > Gentoo mirrors.
> > 
> > Strictly speaking, this is already provided by the various
> > ``metadata/timestamp.*`` files provided already by Gentoo which are also
> > covered by the Manifest. However, including the value in the Manifest
> > itself has a little cost and provides the ability to perform
> > the verification stand-alone.
> 
> There's a critical part of the GLEP58 spec that got missed here:
> https://www.gentoo.org/glep/glep-0058.html#timestamps-additional-distribution-of-metamanifest
> The timestamp needs to be usable to verify whether the mirror is up to date
> vs known masters.
> 
> The attack being defended against is a local community mirror (or MITM)
> deliberately handing users an unmodified but stale copy of the tree.
> 
> I do approve of changing the format of the tag, but it still needs to be
> linkable to a more verifiable source of truth.

I've tried to expand it a bit without getting too specific. New content
for paragraphs 2+:

| A malicious third-party may use the principles of exclusion and replay
| to deny an update to clients, while at the same time recording
| the identity of clients to attack. The timestamp field can be used
| to detect that.
|
| In order to provide a more complete protection, the Gentoo
| Infrastructure should provide an ability to obtain the timestamps
| of all Manifests from a recent timeframe over a secure channel
| for comparison.
|
| Strictly speaking, this is already provided by the various
| ``metadata/timestamp.*`` files provided already by Gentoo which are also
| covered by the Manifest. However, including the value in the Manifest
| itself has a little cost and provides the ability to perform
| the verification stand-alone.

> > Backwards Compatibility
> > =======================
> > 
> > This GLEP provides optional means of preserving backwards compatibility.
> > To preserve the backwards compatibility, the following needs to be
> > ensured:
> > 
> > - all files within the package directory must be covered by ``Manifest``
> >   file inside that package directory,
> 
> This implies that IGNORE entries are NOT permitted to cover any file in
> a package directory during the transition period.

Well, obviously you can't use new tags in those files and rely on them
working correctly.

> > 
> > - all distfiles used by the package must be covered by ``Manifest``
> >   file inside the package directory,
> 
> This implies that non-package-dir DIST entries may be a duplicate of a
> package-level DIST during the transition.

Yes, that's permitted if they're compatible.

> > - all files inside the ``files/`` subdirectory of a package directory
> >   need to be use the deprecated ``AUX`` tag (rather than ``DATA``),
> > 
> > - all ``.ebuild`` files inside the package directory need to use
> >   the deprecated ``EBUILD`` tag (rather than ``DATA``),
> 
> Could we please note here, for the transitional period, that the
> file equivalence rule is applicable? 
> During the transitional period, the package Manifests may contain two entries for a
> given file: (DATA, EBUILD) or (DATA, AUX). The MISC type remains the
> same.

Equivalence rule is applicable always. However, there's no point
in duplicating the entry for the same file as that's only going
to increase space use.

> > - the Manifest files inside the package directory can be signed
> >   to provide authenticity verification.
> 
> Why do we need to specify this in backwards compat? It's still permitted above.

But it makes no sense when the top-level Manifest is signed. This points
out that tools not supporting full-tree verification need signatures on
the package-level Manifests instead (leaving aside the fact that Portage
never implemented that).

Updated the two linked files.

-- 
Best regards,
Michał Górny



* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 11:50   ` Michał Górny
@ 2017-10-28 12:49     ` Ulrich Mueller
  2017-10-28 13:23       ` Michał Górny
  2017-10-28 18:44     ` Robin H. Johnson
  1 sibling, 1 reply; 32+ messages in thread
From: Ulrich Mueller @ 2017-10-28 12:49 UTC (permalink / raw
  To: gentoo-dev

>>>>> On Sat, 28 Oct 2017, Michał Górny wrote:

>> > The Manifest files can also specify ``IGNORE`` entries to skip
>> > Manifest verification of subdirectories and/or files. Files and
>> > directories starting with a dot are always implicitly ignored.
>> > All files that are not ignored must be covered by at least one
>> > of the Manifests.
>>
>> Do we need to keep that implicit ignore rule? Rather, convert it
>> to being always explicit.
>>
>> There is only one such file in the rsync checkout presently:
>> metadata/.checksum-test-marker (see bug #572168, it is used to
>> detect mis-configured mirrors).
>>
>> A SVN or Git repo might also have dot-named directories.

> I like the implicit idea better as it is more consistent with normal
> tool behavior, like 'ls' not listing the files. Dotfiles can be
> created by many random tools or even the filesystem (especially in
> some cases of overlay filesystems).

Other tools like "find" don't special-case dot-prefixed files though
(in fact, "ls" may well be the exception there).

Implicit ignores only create an unnecessary attack surface. Better
make them explicit, even if this will require adding some entries for
common cases (like .git in the top-level dir).

Ulrich


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 12:49     ` Ulrich Mueller
@ 2017-10-28 13:23       ` Michał Górny
  2017-10-28 13:46         ` Ulrich Mueller
  0 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-10-28 13:23 UTC (permalink / raw
  To: gentoo-dev

On Sat, 28.10.2017 at 14:49 +0200, Ulrich Mueller wrote:
> >>>>> On Sat, 28 Oct 2017, Michał Górny wrote:
> > > > The Manifest files can also specify ``IGNORE`` entries to skip
> > > > Manifest verification of subdirectories and/or files. Files and
> > > > directories starting with a dot are always implicitly ignored.
> > > > All files that are not ignored must be covered by at least one
> > > > of the Manifests.
> > > 
> > > Do we need to keep that implicit ignore rule? Rather, convert it
> > > to being always explicit.
> > > 
> > > There is only one such file in the rsync checkout presently:
> > > metadata/.checksum-test-marker (see bug #572168, it is used to
> > > detect mis-configured mirrors).
> > > 
> > > A SVN or Git repo might also have dot-named directories.
> > I like the implicit idea better as it is more consistent with normal
> > tool behavior, like 'ls' not listing the files. Dotfiles can be
> > created by many random tools or even the filesystem (especially in
> > some cases of overlay filesystems).
> 
> Other tools like "find" don't special-case dot-prefixed files though
> (in fact, "ls" may well be the exception there).
> 
> Implicit ignores only create an unnecessary attack surface. Better
> make them explicit, even if this will require adding some entries for
> common cases (like .git in the top-level dir).
> 

I dare say it's not an attack surface if tools are explicitly directed
not to use those files. The problem is, you can't predict all possible
dotfiles and even if you do, you're effectively blocking the user from
creating any files for his own use.

Say, if a user wanted to use git on top of rsync for his own purposes,
why would you prevent him from doing that?

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 13:23       ` Michał Górny
@ 2017-10-28 13:46         ` Ulrich Mueller
  2017-10-28 20:55           ` Michał Górny
  0 siblings, 1 reply; 32+ messages in thread
From: Ulrich Mueller @ 2017-10-28 13:46 UTC (permalink / raw
  To: gentoo-dev

>>>>> On Sat, 28 Oct 2017, Michał Górny wrote:

> On Sat, 28.10.2017 at 14:49 +0200, Ulrich Mueller wrote:
>> Other tools like "find" don't special-case dot-prefixed files
>> though (in fact, "ls" may well be the exception there).
>>
>> Implicit ignores only create an unnecessary attack surface. Better
>> make them explicit, even if this will require adding some entries
>> for common cases (like .git in the top-level dir).

> I dare say it's not an attack surface if tools are explicitly
> directed not to use those files.

For example, an ebuild can apply all patches from a given directory.
We certainly don't want any unaccounted dot-prefixed files being
injected there. (And yes, globbing shouldn't normally match such
files, but there's at least one eclass setting the dotglob option.)

> The problem is, you can't predict all possible dotfiles and even if
> you do, you're effectively blocking the user from creating any files
> for his own use.

Create files for their own use in random locations in the Gentoo
repository? Why would anyone want to do that?

> Say, if user wanted to use git on top of rsync for his own purposes,
> why would you prevent him from doing that?

As I said before, top-level .git should have an explicit IGNORE entry.

IMHO we should rather stay on the safe side there, unless someone will
speak up who has a concrete workflow where such dot-prefixed files
with unpredictable names are needed.

Ulrich


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 11:50   ` Michał Górny
  2017-10-28 12:49     ` Ulrich Mueller
@ 2017-10-28 18:44     ` Robin H. Johnson
  2017-10-29 18:47       ` Michał Górny
  1 sibling, 1 reply; 32+ messages in thread
From: Robin H. Johnson @ 2017-10-28 18:44 UTC (permalink / raw
  To: gentoo-dev

On Sat, Oct 28, 2017 at 01:50:46PM +0200, Michał Górny wrote:
> > A SVN or Git repo might also have dot-named directories.
> I like the implicit idea better as it is more consistent with normal
> tool behavior, like 'ls' not listing the files. Dotfiles can be created
> by many random tools or even the filesystem (especially in some cases
> of overlay filesystems).
> 
> That said, the case of 'lost+found' just occurred to me. I suppose this
> one we will want to always IGNORE.
Thought: make the package manager responsible for their own ignore list
in addition to the IGNORE values; and by default it can contain a
partial overlap with the IGNORE manifest entries.
**/.git
/lost+found # ignore at the top-level only
/distfiles # ignore at the top-level only
/packages # ignore at the top-level only
/local # ignore at the top-level only

If users need other values, it's a package-manager config knob.

> Let's try:
> 
> | 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
> |    Optionally verify the ``TIMESTAMP`` entry if present.
> |    If the timestamp is significantly out of date compared to the local
> |    clock or a trusted source, halt or require manual intervention
> |    from the user. Remove the top-level Manifest from the *present* set.
> 
> Maybe it would look better if I split it into sub-points.
Yes, put 'Verifying TIMESTAMP' into a new section as you added below,
including the out-of-date part there; don't detail how to verify it in
this section.

> > GLEP61, for the transition period, required compressed & uncompressed Manifests
> > in the same directory to have identical content. Include mention of that here.
> Can do. But I'll do it in 'Backwards compatibility' section:
> | - if the Manifest files inside the package directory are compressed,
> |   a uncompressed file of identical content must coexist.
> > Saying that either can be used is a potential issue.
> Why? It also says that they must be identical, so it's of no difference
> to the implementation which one is used.
It's safe if the identical requirement is there, and potentially unsafe otherwise.

> > > Tree design
> > > -----------
> > Add a minor header here, to say this is the end of the 'Tree design' section?
> It's not the end, it's description of the alternative. Both belong
> in one section. I could add additional section level but I'd rather
> not do that for a single use.
Hmm, just reads unclear if that should have been a different section.
Not sure if there is a nice way to fix it at all.

> > > Timestamp field
> | A malicious third-party may use the principles of exclusion and replay
> | to deny an update to clients, while at the same time recording
> | the identity of clients to attack. The timestamp field can be used
> | to detect that.
> |
> | In order to provide a more complete protection, the Gentoo
> | Infrastructure should provide an ability to obtain the timestamps
> | of all Manifests from a recent timeframe over a secure channel
> | for comparison.
> |
> | Strictly speaking, this is already provided by the various
> | ``metadata/timestamp.*`` files provided already by Gentoo which are also
> | covered by the Manifest. However, including the value in the Manifest
> | itself has a little cost and provides the ability to perform
> | the verification stand-alone.
Just add in the sentence re trusted source from before, otherwise good.
The rest of this thread devolved into specifics about implementing the
validation; which aren't relevant to this GLEP (yes, telling the package
manager that it's a known old tree, ignore the age only, is a valid use
case).


> > Could we please note here, for the transitional period, that the
> > file equivalence rule is applicable? 
> > During the transitional, the package Manifests may contain two entries for a
> > given file: (DATA, EBUILD) or (DATA, AUX). The MISC type remains the
> > same.
> Equivalence rule is applicable always. However, there's no point
> in duplicating the entry for the same file as that's only going
> to increase space use.
This means that new verification tools (beyond Gemato) need to handle
the legacy types for the moment, and can't just skip them (eg if both
entries existed).

> > > - the Manifest files inside the package directory can be signed
> > >   to provide authenticity verification.
> > Why do we need to specify this in backwards compat, it's still permitted above.
> But it makes no sense when top-level Manifest is signed. This points out
> that for tools not supporting full-tree verification smaller signatures
> need to be used (skipping the fact that Portage did not ever implement
> it).
The Manifests might not be signed by the same entity.
/metadata/glsa/Manifest might be signed by the security team, 
/sec-policy/Manifest might be signed by SELinux team, 
/Manifest should STILL be signed by Infra/tree-generation-process.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 13:46         ` Ulrich Mueller
@ 2017-10-28 20:55           ` Michał Górny
  0 siblings, 0 replies; 32+ messages in thread
From: Michał Górny @ 2017-10-28 20:55 UTC (permalink / raw
  To: gentoo-dev

On Sat, 28.10.2017 at 15:46 +0200, Ulrich Mueller wrote:
> > > > > > On Sat, 28 Oct 2017, Michał Górny wrote:
> > On Sat, 28.10.2017 at 14:49 +0200, Ulrich Mueller wrote:
> > > Other tools like "find" don't special-case dot-prefixed files
> > > though (in fact, "ls" may well be the exception there).
> > > 
> > > Implicit ignores only create an unnecessary attack surface. Better
> > > make them explicit, even if this will require adding some entries
> > > for common cases (like .git in the top-level dir).
> > I dare say it's not an attack surface if tools are explicitly
> > directed not to use those files.
> 
> For example, an ebuild can apply all patches from a given directory.
> We certainly don't want any unaccounted dot-prefixed files being
> injected there. (And yes, globbing shouldn't normally match such
> files, but there's at least one eclass setting the dotglob option.)

I think that's a really poor argument.

Firstly, the mentioned eclass does it for one command call, and it
doesn't go anywhere near the repository. So no, that doesn't count.

Secondly, someone being able to theoretically cut himself with a spoon
if he only sharpened its edge is no reason to forbid people from having
spoons without explicitly written permission.

> > The problem is, you can't predict all possible dotfiles and even if
> > you do, you're effectively blocking the user from creating any files
> > for his own use.
> 
> Create files for their own use in random locations in the Gentoo
> repository? Why would anyone want to do that?

.DS_Store? ;-)

> > Say, if user wanted to use git on top of rsync for his own purposes,
> > why would you prevent him from doing that?
> 
> As I said before, top-level .git should have an explicit IGNORE entry.

Are we going to supply explicit IGNORE entries for any VCS anyone might
choose to use? Or backup software and any other weird thing?

> IMHO we should rather stay on the safe side there, unless someone will
> speak up who has a concrete workflow where such dot-prefixed files
> with unpredictable names are needed.

I've already mentioned two. The first one was cheap union filesystems
based on FUSE, where I'm pretty sure I've seen random dotfiles.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-28 18:44     ` Robin H. Johnson
@ 2017-10-29 18:47       ` Michał Górny
  2017-10-29 20:54         ` Robin H. Johnson
  0 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-10-29 18:47 UTC (permalink / raw
  To: gentoo-dev

On Sat, 28.10.2017 at 18:44 +0000, Robin H. Johnson wrote:
> On Sat, Oct 28, 2017 at 01:50:46PM +0200, Michał Górny wrote:
> > > A SVN or Git repo might also have dot-named directories.
> > 
> > I like the implicit idea better as it is more consistent with normal
> > tool behavior, like 'ls' not listing the files. Dotfiles can be created
> > by many random tools or even the filesystem (especially in some cases
> > of overlay filesystems).
> > 
> > That said, the case of 'lost+found' just occurred to me. I suppose this
> > one we will want to always IGNORE.
> 
> Thought: make the package manager responsible for their own ignore list
> in addition to the IGNORE values; and by default it can contain a
> partial overlap with the IGNORE manifest entries.
> **/.git
> /lost+found # ignore at the top-level only
> /distfiles # ignore at the top-level only
> /packages # ignore at the top-level only
> /local # ignore at the top-level only
> 
> If users need other values, it's a package-manager config knob.

We don't want pre-EAPI times where things will fail out of the box
unless the user chooses the one tool that got the whole list right
and/or configures it to account for the default list.

I don't mind package manager providing the ability to ignore additional
entries but the spec should work out of the box too.

> 
> > Let's try:
> > 
> > > 2. Start at the top-level Manifest file. Verify its OpenPGP signature.
> > >    Optionally verify the ``TIMESTAMP`` entry if present.
> > >    If the timestamp is significantly out of date compared to the local
> > >    clock or a trusted source, halt or require manual intervention
> > >    from the user. Remove the top-level Manifest from the *present* set.
> > 
> > Maybe it would look better if I split it into sub-points.
> 
> Yes, put 'Verifying TIMESTAMP' into a new section as you added below,
> including the out-of-date part there; don't detail how to verify it in
> this section.

I will try to do this today.

> > > GLEP61, for the transition period, required compressed & uncompressed Manifests
> > > in the same directory to have identical content. Include mention of that here.
> > 
> > Can do. But I'll do it in 'Backwards compatibility' section:
> > > - if the Manifest files inside the package directory are compressed,
> > >   a uncompressed file of identical content must coexist.
> > > Saying that either can be used is a potential issue.
> > 
> > Why? It also says that they must be identical, so it's of no difference
> > to the implementation which one is used.
> 
> It's safe if the identical requirement is there, and potentially unsafe otherwise.

That's why they're both put in a *single sentence*?

> > > > Timestamp field
> > > 
> > > A malicious third-party may use the principles of exclusion and replay
> > > to deny an update to clients, while at the same time recording
> > > the identity of clients to attack. The timestamp field can be used
> > > to detect that.
> > > 
> > > In order to provide a more complete protection, the Gentoo
> > > Infrastructure should provide an ability to obtain the timestamps
> > > of all Manifests from a recent timeframe over a secure channel
> > > for comparison.
> > > 
> > > Strictly speaking, this is already provided by the various
> > > ``metadata/timestamp.*`` files provided already by Gentoo which are also
> > > covered by the Manifest. However, including the value in the Manifest
> > > itself has a little cost and provides the ability to perform
> > > the verification stand-alone.
> 
> Just add in the sentence re trusted source from before, otherwise good.
> The rest of this thread devolved into specifics about implementing the
> validation; which aren't relevant to this GLEP (yes, telling the package
> manager that it's a known old tree, ignore the age only, is a valid use
> case).

Ok.

> > > Could we please note here, for the transitional period, that the
> > > file equivalence rule is applicable? 
> > > During the transitional, the package Manifests may contain two entries for a
> > > given file: (DATA, EBUILD) or (DATA, AUX). The MISC type remains the
> > > same.
> > 
> > Equivalence rule is applicable always. However, there's no point
> > in duplicating the entry for the same file as that's only going
> > to increase space use.
> 
> This means that new verification tools (beyond Gemato) need to handle
> the legacy types for the moment, and can't just skip them (eg if both
> entries existed).

Which is the easier way forward. Otherwise, we end up having a lot of
duplication during the transition period (which would amount to 2 years
at the very least, and probably a lot more).

> 
> > > > - the Manifest files inside the package directory can be signed
> > > >   to provide authenticity verification.
> > > 
> > > Why do we need to specify this in backwards compat, it's still permitted above.
> > 
> > But it makes no sense when top-level Manifest is signed. This points out
> > that for tools not supporting full-tree verification smaller signatures
> > need to be used (skipping the fact that Portage did not ever implement
> > it).
> 
> The Manifests might not be signed by the same entity.
> /metadata/glsa/Manifest might be signed by the security team, 
> /sec-policy/Manifest might be signed by SELinux team, 
> /Manifest should STILL be signed by Infra/tree-generation-process.

I honestly doubt this will ever happen, and even if it does, it isn't
really relevant to the spec at hand. My point was: if someone signs
the whole repository, he normally will consider it pointless to sign
individual package Manifests. This explains why he might consider it.

-- 
Best regards,
Michał Górny




* [gentoo-dev] [v1.0.1] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
                   ` (2 preceding siblings ...)
  2017-10-27 21:48 ` Hanno Böck
@ 2017-10-29 19:07 ` Michał Górny
  2017-10-29 20:39   ` Robin H. Johnson
  2017-10-30 16:51 ` [gentoo-dev] [v1.0.2] " Michał Górny
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-10-29 19:07 UTC (permalink / raw
  To: gentoo-dev

On Thu, 26.10.2017 at 22:12 +0200, Michał Górny wrote:
> After a week of hard work, I'd like to request your comments
> on the draft of GLEP 74. This GLEP aims to replace the old tree-signing
> GLEPs 58 and 60 with a superior implementation and more complete
> specification.
> 
> The original tree-signing GLEPs were accepted a few years back but they
> have never been implemented. This specification, on the other hand,
> comes with a working reference implementation for the verification
> algorithm. I expect to finish the update/generation part in a few days,
> then work on additional optimizations (threading, incremental
> verification, incremental updates).
> 
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/
> 
> Full text following for inline comments.
> 

Here's an updated version based on the feedback so far. Gemato is also
ready for the first public testing, albeit it does not implement Gentoo-
specific rules yet.

---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@gentoo.org>,
        Robin Hugh Johnson <robbat2@gentoo.org>,
        Ulrich Müller <ulm@gentoo.org>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-29
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups — files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).


Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called the top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.

The Manifest files can also specify ``IGNORE`` entries to skip Manifest
verification of subdirectories and/or files. Files and directories
starting with a dot are always implicitly ignored. All files that
are not ignored must be covered by at least one of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.
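
As an illustration only (not part of the specification), the duplicate
entry rule above could be checked roughly as in the following Python
sketch. The ``Entry`` type, the ``SEMANTICS`` table and the function
name are hypothetical; paths are assumed to be already normalized
to the repository root, and ``EBUILD``/``AUX`` are assumed to share
the semantics of ``DATA`` as described under the deprecated tags.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    """Hypothetical in-memory form of one Manifest file entry."""
    tag: str                   # e.g. "DATA", "MISC", "EBUILD"
    path: str                  # normalized, repository-relative path
    size: int
    checksums: Dict[str, str]  # algorithm name -> hex digest


# Assumption: deprecated tags carry the same semantics as DATA.
SEMANTICS = {"EBUILD": "DATA", "AUX": "DATA"}


def entries_compatible(a: Entry, b: Entry) -> bool:
    """Return True if two entries matching the same file may coexist."""
    if SEMANTICS.get(a.tag, a.tag) != SEMANTICS.get(b.tag, b.tag):
        return False  # different semantics
    if a.size != b.size:
        return False  # different file size
    # Only the checksum algorithms present in both entries are compared.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[name] == b.checksums[name] for name in common)
```

Note that two entries with no checksum algorithm in common are treated
as compatible here as long as semantics and size agree; a stricter
implementation might choose to reject that case.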

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files. It is
an error to specify an entry for a different file type.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If subdirectories
of the Manifest tree reside on a different filesystem, they must
be explicitly excluded via ``IGNORE``.


File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

- if a file listed in Manifest is not present, then the verification
  for the file fails,

- if a file listed in Manifest is present but has a different size
  or one of the checksums does not match, the verification fails,

- if a file is present but not listed in Manifest, the verification
  fails,

- otherwise, the verification succeeds.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
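
The rules above can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not the reference
implementation: ``verify_file`` is a hypothetical helper, an entry is
passed as a ``(size, checksums)`` tuple (or ``None`` for an unlisted
file), and checksum names are assumed to be already mapped to Python
``hashlib`` algorithm names.

```python
import hashlib
import os


def verify_file(path, entry):
    """Return True if `path` passes verification against `entry`.

    `entry` is None when the file is not listed in any Manifest,
    otherwise a (size, checksums) tuple where `checksums` maps
    hashlib algorithm names to hex digests (an assumption).
    """
    exists = os.path.isfile(path)
    if entry is None:
        # Present but not listed: verification fails.
        return not exists
    if not exists:
        # Listed but not present: verification fails.
        return False
    size, checksums = entry
    if os.path.getsize(path) != size:
        return False
    hashers = {name: hashlib.new(name) for name in checksums}
    with open(path, "rb") as f:
        # Read once and feed every hash, so large files are not re-read.
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return all(hashers[name].hexdigest() == digest.lower()
               for name, digest in checksums.items())
```

Checking the size before hashing lets a mismatch be detected without
reading the file contents at all.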


New Manifest tags
-----------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout as described in `Timestamp
  verification`_.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that would be subject to non-obligatory Manifest
  verification if it existed. The package manager may ignore a stray file
  matching this entry if operating in non-strict mode. Used for paths
  that would match ``MISC`` if they existed.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.
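
For illustration, a single Manifest line using the tags above could be
split into fields roughly like this. It is a sketch only:
``parse_entry`` and the returned dictionary layout are hypothetical,
and the checksums are assumed to be alternating algorithm/value pairs
as in the GLEP 44 format. Paths containing whitespace are not handled.

```python
CHECKSUM_TAGS = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
PATH_ONLY_TAGS = {"IGNORE", "OPTIONAL"}


def parse_entry(line):
    """Parse one Manifest line into a dictionary (hypothetical layout)."""
    fields = line.split()
    tag = fields[0]
    if tag == "TIMESTAMP":
        return {"tag": tag, "timestamp": fields[1]}
    if tag in PATH_ONLY_TAGS:
        return {"tag": tag, "path": fields[1]}
    if tag in CHECKSUM_TAGS:
        rest = fields[3:]
        if len(rest) % 2 != 0:
            raise ValueError("checksum name without a value: %r" % line)
        return {
            "tag": tag,
            "path": fields[1],
            "size": int(fields[2]),
            # Assumed layout: NAME value NAME value ...
            "checksums": dict(zip(rest[0::2], rest[1::2])),
        }
    raise ValueError("unknown Manifest tag: %r" % tag)
```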


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.


Timestamp verification
----------------------

The Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from the user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.
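
A minimal Python sketch of the local-clock comparison follows.
The function name and the ``max_age`` threshold are assumptions made
for illustration; this specification does not define what counts as
"significantly outdated".

```python
from datetime import datetime, timedelta, timezone

# Format string given in the TIMESTAMP tag description.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def timestamp_is_fresh(value, now=None, max_age=timedelta(days=7)):
    """Return True if the TIMESTAMP value is not older than `max_age`.

    `max_age` is a hypothetical policy knob; 7 days is an arbitrary
    default chosen for this example only.
    """
    stamp = datetime.strptime(value, TIMESTAMP_FORMAT)
    stamp = stamp.replace(tzinfo=timezone.utc)  # timestamps are in UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return now - stamp <= max_age
```

A caller that knows it is deliberately working with an old tree could
pass a larger ``max_age`` rather than disabling verification entirely.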


Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into the *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to `file verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
   ``EBUILD`` and ``AUX`` entries into the *covered* set.

6. Verify the entries in the *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Manifest file
   locations and nesting`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to `file verification`_ section.
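
The set handling in steps 4 to 7 can be sketched in Python as below.
This is an illustration under simplifying assumptions, not
the reference implementation: signature checking and sub-Manifest
loading (steps 2 and 3) are assumed to have happened already, entries
are given as ``(tag, path)`` tuples with paths normalized
to the repository root, and all names are hypothetical.

```python
REQUIRED_TAGS = {"DATA", "MISC", "EBUILD", "AUX"}


def is_ignored(path, ignores):
    """True if `path` equals an IGNORE path or lies below one."""
    return any(path == ig or path.startswith(ig.rstrip("/") + "/")
               for ig in ignores)


def is_dotfile(path):
    """True if any path component starts with a dot (implicit ignore)."""
    return any(part.startswith(".") for part in path.split("/"))


def classify(present, entries):
    """Sort paths into the groups that file verification will act on."""
    ignores = [path for tag, path in entries if tag == "IGNORE"]
    # Step 4: drop ignored paths (explicit and implicit) from *present*.
    present = {p for p in present
               if not is_ignored(p, ignores) and not is_dotfile(p)}
    # Step 5: build the *covered* set, keeping OPTIONAL separate.
    listed = {path for tag, path in entries if tag in REQUIRED_TAGS}
    optional = {path for tag, path in entries if tag == "OPTIONAL"}
    # Step 6 (partially): entries colliding with IGNORE are errors.
    collisions = {p for p in listed | optional if is_ignored(p, ignores)}
    # Step 7: everything in the union of both sets gets a verdict.
    return {
        "verify": sorted(present & listed),
        "missing": sorted(listed - present - collisions),
        "stray": sorted(present - listed - optional),
        "optional_stray": sorted((present & optional) - listed),
        "collisions": sorted(collisions),
    }
```

Files in ``verify`` still need their size and checksums compared;
``missing`` and ``stray`` fail outright, while ``optional_stray`` may
be tolerated in non-strict mode.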


Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different than *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
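
The upward search can be sketched in Python as follows. It is a rough
illustration only: the function name is hypothetical, compressed
``Manifest`` variants are not considered, and the ``IGNORE`` check
of step 3a is delegated to a caller-supplied callback (an assumption
made to keep the example independent of Manifest parsing).

```python
import os


def find_top_level_manifest(start, is_ignored=lambda manifest_dir, original: False):
    """Return the directory holding the top-level Manifest, or None.

    `is_ignored(manifest_dir, original)` is a hypothetical callback
    that reports whether the Manifest in `manifest_dir` has an IGNORE
    entry covering `original` or one of its parent directories.
    """
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev          # step 1
    current = original
    last_found = None
    while True:
        if os.stat(current).st_dev != startdev:  # step 2
            break
        if os.path.isfile(os.path.join(current, "Manifest")):  # step 3
            if is_ignored(current, original):    # step 3a
                break
            last_found = current                 # step 3b
        parent = os.path.dirname(current)
        if parent == current:                    # step 4: reached "/"
            break
        current = parent                         # step 5
    return last_found
```

A ``None`` result corresponds to the case where a new top-level
Manifest should be created in the *original* directory.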


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
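
As a small illustration of that naming recommendation, a Manifest hash
name that follows it can be turned back into a ``hashlib`` object by
lowercasing it. The helper below is hypothetical, and it deliberately
does not cover older names such as ``RMD160`` that predate
the recommendation and do not follow it.

```python
import hashlib


def hasher_for(manifest_name):
    """Create a hashlib object for a Manifest hash name such as SHA3_256.

    Assumes the name is a hashlib algorithm name in uppercase.
    """
    return hashlib.new(manifest_name.lower())


def manifest_name_for(hashlib_name):
    """Derive the recommended Manifest name from a hashlib name."""
    return hashlib_name.upper()
```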


Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to be suffixed for their
compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes are outside the scope
of this specification.

Whenever this specification refers to top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose any
of the files using the same base name.
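Transparent decompression by suffix could look roughly like the sketch below. The suffix table is an assumption made for the example, since the actual list of algorithms is outside the scope of this specification, and ``read_manifest`` is a hypothetical helper name.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Illustrative suffix table; the real list is defined elsewhere.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def read_manifest(path):
    """Return the Manifest text, decompressing according to the filename suffix."""
    path = Path(path)
    opener = OPENERS.get(path.suffix, open)  # plain open() for uncompressed files
    with opener(path, "rt", encoding="utf-8") as f:
        return f.read()
```

Note that checksums in ``MANIFEST`` entries apply to the compressed file as stored, so they would be computed on the raw bytes rather than on the text returned here.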


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of comparatively costly OpenPGP
operations to a minimum.
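The chain of trust in the hierarchical model can be sketched as follows. The example assumes that a ``MANIFEST`` entry has the form ``MANIFEST <path> <size> <hash name> <hash value> ...``, that a ``SHA256`` checksum is present, and that the caller has already verified the OpenPGP signature of the top-level Manifest. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def manifest_entries(manifest):
    """Yield (path, size, checksums) for every MANIFEST entry in `manifest`."""
    for line in manifest.read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields[:1] != ["MANIFEST"]:
            continue
        _tag, path, size, *sums = fields
        # Checksums come as alternating name/value pairs.
        yield path, int(size), dict(zip(sums[0::2], sums[1::2]))


def trusted_manifests(top):
    """Yield `top` and every sub-Manifest reachable from it.

    A sub-Manifest is trusted only after its size and checksum match
    the entry stored in its parent; no further signature is checked.
    """
    pending = [Path(top)]
    while pending:
        manifest = pending.pop()
        yield manifest
        for path, size, sums in manifest_entries(manifest):
            sub = manifest.parent / path
            data = sub.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            if len(data) != size or digest != sums.get("SHA256"):
                raise ValueError(f"{sub} does not match its entry in {manifest}")
            pending.append(sub)
```

Only one signature check is needed per run, at the cost of every change propagating up to the top-level Manifest.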


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for top-level Manifest — we do not want
to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior, we also need to forbid crossing them when
descending the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.
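The layout rules can be illustrated with a small directory walker. This is a sketch under stated assumptions: ``covered_files`` is a hypothetical name, ``ignored`` stands in for paths collected from ``IGNORE`` entries, and only directories (not individual files) are checked for filesystem boundaries.

```python
import os


def covered_files(root, ignored=()):
    """Yield paths (relative to `root`) of files that Manifests must cover."""
    rootdev = os.stat(root).st_dev
    # followlinks=True: symlinks to directories are treated as directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        rel_dir = os.path.relpath(dirpath, root)
        keep = []
        for name in sorted(dirnames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if name.startswith(".") or rel in ignored:
                continue  # dotfiles implicitly, anything else via IGNORE
            if os.stat(os.path.join(dirpath, name)).st_dev != rootdev:
                # Not skipped silently: the boundary must be ignored explicitly.
                raise RuntimeError(f"{rel}: on another filesystem, needs IGNORE")
            keep.append(name)
        dirnames[:] = keep  # prune the walk in place
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if not name.startswith(".") and rel not in ignored:
                yield rel
```

Raising an error for an unexpected filesystem boundary, rather than skipping it, matches the reasoning above that silent skipping would confuse users.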


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used for VCS directories such as ``CVS``.
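The three kinds of manipulation map onto a simple comparison between the entries listed in Manifests and the files actually present. The sketch below assumes the expected state has already been collected into a dictionary of SHA256 checksums. ``verify_tree`` and its parameters are hypothetical names.

```python
import hashlib
from pathlib import Path


def verify_tree(root, expected, ignored=()):
    """Compare files under `root` with `expected` ({relative path: SHA256 hex}).

    Returns three sorted lists: altered, removed and added paths.
    """
    root = Path(root)
    actual = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        rel_posix = rel.as_posix()
        if any(part.startswith(".") for part in rel.parts):
            continue  # dotfiles are skipped implicitly
        if any(rel_posix == i or rel_posix.startswith(i + "/") for i in ignored):
            continue  # covered by an IGNORE entry
        if path.is_file():
            actual[rel_posix] = hashlib.sha256(path.read_bytes()).hexdigest()
    altered = sorted(p for p in expected if p in actual and actual[p] != expected[p])
    removed = sorted(p for p in expected if p not in actual)
    added = sorted(p for p in actual if p not in expected)
    return altered, removed, added
```

Any non-empty list would count as a verification failure in strict mode.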


Non-obligatory Manifest verification
------------------------------------

While this specification recommends all tools to use strict verification
by default, it allows declaring some files as non-obligatory like
the original Manifest2 format did. This could be used on files that do
not affect the normal package manager operation.

It aims to account for two use cases:

1. Stripping down files that are not strictly required to install
   packages from repository checkouts.

2. Accounting for automatically generated files that might be updated
   by standard tooling.

The traditional ``MISC`` type is amended with a complementary
``OPTIONAL`` tag to account for files that are not provided
in the specific repository. It aims to ensure that the same path would
be non-fatal when provided by the repository but fatal when created
by the user tooling.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion and replay
to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used
to detect that.

In order to provide a more complete protection, the Gentoo
Infrastructure should provide an ability to obtain the timestamps
of all Manifests from a recent timeframe over a secure channel
from a trusted source for comparison.

Strictly speaking, this is already provided by the various
``metadata/timestamp.*`` files provided already by Gentoo which are also
covered by the Manifest. However, including the value in the Manifest
itself has little cost and provides the ability to perform
the verification stand-alone.
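A stand-alone freshness check based on the timestamp could be sketched like this. The entry format ``TIMESTAMP YYYY-MM-DDThh:mm:ssZ`` (UTC) is an assumption, as are the function names and the default age limit.

```python
from datetime import datetime, timedelta, timezone


def manifest_timestamp(text):
    """Return the TIMESTAMP value of a Manifest as an aware datetime, or None."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "TIMESTAMP":
            parsed = datetime.strptime(fields[1], "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def looks_replayed(text, now, max_age=timedelta(days=1), last_seen=None):
    """True if the Manifest is older than `max_age` or than one seen before."""
    stamp = manifest_timestamp(text)
    if stamp is None:
        return False  # no TIMESTAMP entry, nothing to compare
    if last_seen is not None and stamp < last_seen:
        return True  # an older Manifest than last time suggests a replay
    return now - stamp > max_age
```

Comparing against ``last_seen`` needs only local state, while the ``max_age`` check relies on the local clock being reasonably accurate.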


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, two are reused and two are
marked deprecated.

The ``DIST`` and ``MISC`` tags are reused since they can be relatively
clearly mapped onto the new concept.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
the limiting property of implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has brought
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we are not including those files to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause strict
verification failures of Manifests. To account for this, Infra could
provide either ``OPTIONAL`` entries for the Manifest files to allow them
in non-strict verification mode, or ``IGNORE`` entries to allow them
in the strict mode.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned on being used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have uncompressed variant
either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of warm filesystem cache.

To improve speed on I/O- and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.
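The mtime half of such an incremental check might be sketched as below. It is an optimization sketch only: ``files_to_recheck`` is a hypothetical helper, ``last_verified`` is assumed to be a stored time of the previous successful verification, and the comparison of Manifests themselves is not shown.

```python
import os


def files_to_recheck(root, last_verified):
    """Return files under `root` modified after `last_verified` (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files rewritten by the sync carry an mtime newer than the last run.
            if os.stat(path).st_mtime > last_verified:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)
```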

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to be
ensured:

- all files within the package directory must be covered by ``Manifest``
  file inside that package directory,

- all distfiles used by the package must be covered by ``Manifest``
  file inside the package directory,

- all files inside the ``files/`` subdirectory of a package directory
  need to use the deprecated ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files inside the package directory need to use
  the deprecated ``EBUILD`` tag (rather than ``DATA``),

- the Manifest files inside the package directory can be signed
  to provide authenticity verification,

- if the Manifest files inside the package directory are compressed,
  an uncompressed file of identical content must coexist.

Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.
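The tag selection rules for compatible package Manifests can be sketched as a small function. ``compat_entry`` is a hypothetical name, and the fall-through to ``DATA`` for every other file is a simplification for the example; a real tool may choose a different tag such as ``MISC`` for some files.

```python
def compat_entry(path):
    """Return (tag, entry path) for a file path relative to the package directory."""
    if path.startswith("files/"):
        # AUX carries an implicit files/ prefix, so it is dropped from the entry.
        return "AUX", path[len("files/"):]
    if "/" not in path and path.endswith(".ebuild"):
        return "EBUILD", path
    return "DATA", path
```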


Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 — fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.1] GLEP 74: Full-tree verification using Manifest files
  2017-10-29 19:07 ` [gentoo-dev] [v1.0.1] " Michał Górny
@ 2017-10-29 20:39   ` Robin H. Johnson
  2017-10-30 16:11     ` Michał Górny
  0 siblings, 1 reply; 32+ messages in thread
From: Robin H. Johnson @ 2017-10-29 20:39 UTC (permalink / raw
  To: gentoo-dev


On Sun, Oct 29, 2017 at 08:07:56PM +0100, Michał Górny wrote:
> File verification model
> -----------------------
> The verification model aims to provide full coverage against different
> forms of attack. In particular, three different kinds of manipulation
> are considered:
s/three/four/
> 1. Alteration of the file content.
> 
> 2. Removal of a file.
> 
> 3. Addition of a new file.
Add:
4. Metadata replay attacks [C08].

> In order to prevent against all three, the system requires that all
> files in the repository are listed in Manifests and verified against
> them.
s/three/four/.

> Timestamp field
> ---------------
...
> A malicious third-party may use the principles of exclusion and replay 
Insert [C08] after 'replay'.

> Strictly speaking, this is already provided by the various
> ``metadata/timestamp.*`` files provided already by Gentoo which are also
> covered by the Manifest. However, including the value in the Manifest
> itself has a little cost and provides the ability to perform
> the verification stand-alone.
Implementation Note: with TIMESTAMP, some of the old timestamp files will be obsolete; they
will already need special handling in Manifest generation, because they are
added VERY late in distribution. Sadly not all of them, because of legacy
dependencies (they will get IGNORE entries instead, as they are populated much
later than manifest generation).

> References
> ==========
Additions:

.. [#C08]	Cappos, J et al. (2008). "Attacks on Package Managers" 
   (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-29 18:47       ` Michał Górny
@ 2017-10-29 20:54         ` Robin H. Johnson
  2017-10-30 16:01           ` Michał Górny
  0 siblings, 1 reply; 32+ messages in thread
From: Robin H. Johnson @ 2017-10-29 20:54 UTC (permalink / raw
  To: gentoo-dev


On Sun, Oct 29, 2017 at 07:47:41PM +0100, Michał Górny wrote:
...
> > If users need other values, it's a package-manager config knob.
> 
> We don't want pre-EAPI times where things will fail out of the box
> unless the user choose the one tool that got the whole list right
> and/or configure it to account for default list.
> 
> I don't mind package manager providing the ability to ignore additional
> entries but the spec should work out of the box too.
Ok, can we have a minor addition to the text then:
- The package manager may support additional user-specified IGNORE
  entries, for usage where a user's processes need to inject additional
  files that would not be ignored by existing rules (e.g. user commits
  the rsync tree to CVS with -kb).

Notes:
- distfiles/packages/local will be in IGNORE as distributed.
- package-manager might add lost+found if they have a filesystem just
  for the tree.

> > Yes, put 'Verifying TIMESTAMP' into a new section as you added below,
> > including the out-of-date part there; don't detail how to verify it in
> > this section.
> I will try to do this today.
Looks good.

> 
> > > > GLEP61, for the transition period, required compressed & uncompressed Manifests
> > > > in the same directory to have identical content. Include mention of that here.
> > > 
> > > Can do. But I'll do it in 'Backwards compatibility' section:
> > > > - if the Manifest files inside the package directory are compressed,
> > > >   a uncompressed file of identical content must coexist.
> > > > Saying that either can be used is a potential issue.
> > > 
> > > Why? It also says that they must be identical, so it's of no difference
> > > to the implementation which one is used.
> > 
> > It's safe if the identical requirement is there, and potentially unsafe otherwise.
> That's why they're both put in a *single sentence*?
'co-exist' in this context makes the English parse weirdly to me,
that's why I was worried at first.

Maybe a rewrite:
An uncompressed Manifest file inside a package directory MUST exist
during the transition period. A compressed Manifest of identical content
MAY be present.

> > > But it makes no sense when top-level Manifest is signed. This points out
> > > that for tools not supporting full-tree verification smaller signatures
> > > need to be used (skipping the fact that Portage did not ever implement
> > > it).
> > The Manifests might not be signed by the same entity.
> > /metadata/glsa/Manifest might be signed by the security team, 
> > /sec-policy/Manifest might be signed by SELinux team, 
> > /Manifest should STILL be signed by Infra/tree-generation-process.
> I honestly doubt this will ever happen, and even if it does, it isn't
> really relevant to the spec at hand. My point was: if someone signs
> the whole repository, he normally will consider it pointless to sign
> individual package Manifests. This explains why he might consider it.
My argument is that it makes sense to permit multiple levels of signature
even when the top-level is signed: glsa-check could get ahead of the
Portage curve by verifying /metadata/glsa/Manifest using Gemato :-).
It doesn't need to verify the whole tree, just that directory.

The package manager should decide about the GPG-verification of the
nested Manifests however, as they convey trust from different sources.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* Re: [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files
  2017-10-29 20:54         ` Robin H. Johnson
@ 2017-10-30 16:01           ` Michał Górny
  0 siblings, 0 replies; 32+ messages in thread
From: Michał Górny @ 2017-10-30 16:01 UTC (permalink / raw
  To: gentoo-dev

On Sun, 29.10.2017 at 20:54 +0000, Robin H. Johnson wrote:
> On Sun, Oct 29, 2017 at 07:47:41PM +0100, Michał Górny wrote:
> ...
> > > If users need other values, it's a package-manager config knob.
> > 
> > We don't want pre-EAPI times where things will fail out of the box
> > unless the user choose the one tool that got the whole list right
> > and/or configure it to account for default list.
> > 
> > I don't mind package manager providing the ability to ignore additional
> > entries but the spec should work out of the box too.
> 
> Ok, can we have a minor additions to the text then:
> - The package manager may support additional user-specified IGNORE
>   entries, for usage where a user's processes need to inject additional
>   files that would not be ignored by existing rules (e.g. user commits
>   the rsync tree to CVS with -kb).

Included.

> Notes:
> - distfiles/packages/local will be in IGNORE as distributed.
> - package-manager might add lost+found if they have a filesystem just
>   for the tree.

Not sure if lost+found isn't actually common enough to be included
in the standard set. But I'm fine either way.

> > > Yes, put 'Verifying TIMESTAMP' into a new section as you added below,
> > > including the out-of-date part there; don't detail how to verify it in
> > > this section.
> > 
> > I will try to do this today.
> 
> Looks good.
> 
> > 
> > > > > GLEP61, for the transition period, required compressed & uncompressed Manifests
> > > > > in the same directory to have identical content. Include mention of that here.
> > > > 
> > > > Can do. But I'll do it in 'Backwards compatibility' section:
> > > > > - if the Manifest files inside the package directory are compressed,
> > > > >   a uncompressed file of identical content must coexist.
> > > > > Saying that either can be used is a potential issue.
> > > > 
> > > > Why? It also says that they must be identical, so it's of no difference
> > > > to the implementation which one is used.
> > > 
> > > It's safe if the identical requirement is there, and potentially unsafe otherwise.
> > 
> > That's why they're both put in a *single sentence*?
> 
> 'co-exist' in this context makes it the English parse weirdly to me,
> that's why I was worried at first.
> 
> Maybe a rewrite:
> An uncompressed Manifest file inside a package directory MUST exist
> during the transition period. A compressed Manifest of identical content
> MAY be present.

Done.

> 
> > > > But it makes no sense when top-level Manifest is signed. This points out
> > > > that for tools not supporting full-tree verification smaller signatures
> > > > need to be used (skipping the fact that Portage did not ever implement
> > > > it).
> > > 
> > > The Manifests might not be signed by the same entity.
> > > /metadata/glsa/Manifest might be signed by the security team, 
> > > /sec-policy/Manifest might be signed by SELinux team, 
> > > /Manifest should STILL be signed by Infra/tree-generation-process.
> > 
> > I honestly doubt this will ever happen, and even if it does, it isn't
> > really relevant to the spec at hand. My point was: if someone signs
> > the whole repository, he normally will consider it pointless to sign
> > individual package Manifests. This explains why he might consider it.
> 
> My argument is that it make sense to permit multiple levels of signature
> even when the top-level is signed: glsa-check could get ahead of the
> Portage curve by verifying /metadata/glsa/Manifest using Gemato :-).
> It doesn't need to verify the whole tree, just that directory.
> 
> The package manager should decide about the GPG-verification of the
> nested Manifests however, as they convey trust from different sources.

Sure. Gemato currently verifies all signed files it finds. However, it
only requires the top-level Manifest to be signed to consider the tree
signed.

I will submit an update once I process the other mail and do some
clarifications wrt OPTIONAL.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.1] GLEP 74: Full-tree verification using Manifest files
  2017-10-29 20:39   ` Robin H. Johnson
@ 2017-10-30 16:11     ` Michał Górny
  0 siblings, 0 replies; 32+ messages in thread
From: Michał Górny @ 2017-10-30 16:11 UTC (permalink / raw
  To: gentoo-dev

On Sun, 29.10.2017 at 20:39 +0000, Robin H. Johnson wrote:
> On Sun, Oct 29, 2017 at 08:07:56PM +0100, Michał Górny wrote:
> > File verification model
> > -----------------------
> > The verification model aims to provide full coverage against different
> > forms of attack. In particular, three different kinds of manipulation
> > are considered:
> 
> s/three/four/
> > 1. Alteration of the file content.
> > 
> > 2. Removal of a file.
> > 
> > 3. Addition of a new file.
> 
> Add:
> 4. Metadata replay attacks [C08].

This isn't covered by the file verification model but merely
by the timestamp field which is described in a separate section.

> 
> > In order to prevent against all three, the system requires that all
> > files in the repository are listed in Manifests and verified against
> > them.
> 
> s/three/four/.
> 
> > Timestamp field
> > ---------------
> 
> ...
> > A malicious third-party may use the principles of exclusion and replay 
> 
> Insert [C08] after 'replay'.

Done.

> 
> > Strictly speaking, this is already provided by the various
> > ``metadata/timestamp.*`` files provided already by Gentoo which are also
> > covered by the Manifest. However, including the value in the Manifest
> > itself has a little cost and provides the ability to perform
> > the verification stand-alone.
> 
> Implementation Note: with TIMESTAMP, some of the old timestamp files will be obsolete; they
> will already need special handling in Manifest generation, because they are
> added VERY late in distribution. Sadly not all of them, because of legacy
> dependencies (they will get IGNORE entries instead, as they are populated much
> later than manifest generation).

Tried to word it somewhat without getting too detailed.

> 
> > References
> > ==========
> 
> Additions:
> 
> .. [#C08]	Cappos, J et al. (2008). "Attacks on Package Managers" 
>    (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)
> 

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
                   ` (3 preceding siblings ...)
  2017-10-29 19:07 ` [gentoo-dev] [v1.0.1] " Michał Górny
@ 2017-10-30 16:51 ` Michał Górny
  2017-10-30 19:56   ` Robin H. Johnson
  2017-11-02 19:11 ` [gentoo-dev] [v1.0.3] " Michał Górny
  2017-11-06 21:53 ` [gentoo-dev] [v1.0.4] " Michał Górny
  6 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-10-30 16:51 UTC (permalink / raw
  To: gentoo-dev

Here's another version with a few clarification-class changes:

62819e2 glep-0074: Clarify OPTIONAL desc
e953eaf glep-0074: Add two example files for reference
f98cabc glep-0074: Reorganize to have tag references after basic algos
56b06b0 glep-0074: Rewrite the file verificaton to cover OPTIONAL
bbabc4d glep-0074: Split 'Directory tree coverage' section out
fe62b50 glep-0074: Apply more suggestions from Robin

On Thu, 2017-10-26 at 22:12 +0200, Michał Górny wrote:
> 
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/
> 

---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@gentoo.org>,
        Robin Hugh Johnson <robbat2@gentoo.org>,
        Ulrich Müller <ulm@gentoo.org>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-30
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof
and efficient, and to provide a means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy
efforts are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson
from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. Like the original Manifest2, the files should be split into two
   groups — files whose authenticity is critical, and those whose
   mismatch may be accepted in non-strict mode. The same classification
   should apply both to files listed in Manifests, and to stray files
   present only in the repository.

3. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

4. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For this purpose, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).
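
The line format described above can be illustrated with a minimal Python
sketch of a line parser. The names ``parse_manifest_line`` and
``check_path`` and the dictionary layout are assumptions made for this
example only; they are not taken from the reference implementation.

```python
# Tags whose entries carry "<path> <size> <checksums>..." fields.
CHECKSUM_TAGS = {"MANIFEST", "DATA", "MISC", "DIST", "EBUILD", "AUX"}
# Tags whose entries carry only a path.
PATH_TAGS = {"IGNORE", "OPTIONAL"}


def check_path(path):
    """Reject paths that reference the parent directory."""
    if ".." in path.split("/"):
        raise ValueError("path must not reference '..': %r" % path)
    return path


def parse_manifest_line(line):
    """Split one Manifest line into a tag and its space-separated values.

    Hypothetical helper: the returned dict layout is an assumption.
    """
    fields = line.split()
    if not fields:
        return None  # blank line
    tag, values = fields[0], fields[1:]
    if tag in CHECKSUM_TAGS:
        # Path and size, followed by (algorithm, digest) pairs.
        if len(values) < 2 or len(values) % 2 != 0:
            raise ValueError("malformed entry: %r" % line)
        checksums = dict(zip(values[2::2], values[3::2]))
        return {"tag": tag, "path": check_path(values[0]),
                "size": int(values[1]), "checksums": checksums}
    if tag in PATH_TAGS:
        if len(values) != 1:
            raise ValueError("malformed entry: %r" % line)
        return {"tag": tag, "path": check_path(values[0])}
    # Any other tag (e.g. TIMESTAMP) keeps its raw values.
    return {"tag": tag, "values": values}
```

Unknown tags are passed through with their raw values, which keeps
the sketch tolerant of metadata tags it does not know about.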


Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.


Directory tree coverage
-----------------------

The Manifest files can also specify ``IGNORE`` entries to skip Manifest
verification of subdirectories and/or files. The package manager can
support injecting ignore paths to account for additional files created,
modified or removed by the user's processes that would not be ignored
by existing rules. Files and directories starting with a dot are always
implicitly ignored. All files that are not ignored must be covered
by at least one of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.
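
The compatibility rule for duplicate entries can be sketched in Python
as follows. The function name ``entries_compatible`` and the entry
layout (a dict with ``tag``, ``size`` and ``checksums`` keys) are
assumptions of this example. Treating ``EBUILD`` and ``AUX`` as having
the semantics of ``DATA`` is likewise an interpretation, based on this
specification describing them as equivalent to ``DATA``.

```python
# Assumption: deprecated tags share the semantics of DATA.
SEMANTICS = {"EBUILD": "DATA", "AUX": "DATA"}


def entries_compatible(a, b):
    """Return True if two entries matching the same file may coexist."""
    kind_a = SEMANTICS.get(a["tag"], a["tag"])
    kind_b = SEMANTICS.get(b["tag"], b["tag"])
    if kind_a != kind_b or a["size"] != b["size"]:
        return False
    # Only the checksums present in both entries have to agree.
    common = set(a["checksums"]) & set(b["checksums"])
    return all(a["checksums"][alg] == b["checksums"][alg] for alg in common)
```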

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files. It is
an error to specify an entry for a different file type.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If subdirectories
of the Manifest tree reside on a different filesystem, they must
be explicitly excluded via ``IGNORE``.


File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

1. If the file is covered directly or indirectly by an entry
   of the ``IGNORE`` type, the verification always succeeds.

2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
   ``MISC``, ``EBUILD`` or ``AUX`` type:

   a. if the file is not present, then the verification fails,

   b. if the file is present but has a different size or one
      of the checksums does not match, the verification fails,

   c. otherwise, the verification succeeds.

3. If the file is covered by an entry of the ``OPTIONAL`` type:

   a. if the file is present, then the verification fails,

   b. otherwise, the verification succeeds.

4. If the file is present but not listed in Manifest, the verification
   fails.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
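
The four rules above can be sketched in Python as follows. This is
an illustration under stated assumptions, not the reference
implementation: the name ``verify_file`` and the entry layout are
hypothetical, and the sketch only handles algorithm names that match
a ``hashlib`` name once lowercased (e.g. ``SHA256``).

```python
import hashlib
import os

CHECKSUM_TAGS = ("MANIFEST", "DATA", "MISC", "EBUILD", "AUX")


def verify_file(path, entry):
    """Apply the verification rules; `entry` is None for an unlisted file."""
    exists = os.path.exists(path)
    if entry is None:
        return not exists  # rule 4: a stray file fails
    if entry["tag"] == "IGNORE":
        return True  # rule 1
    if entry["tag"] == "OPTIONAL":
        return not exists  # rule 3
    if entry["tag"] in CHECKSUM_TAGS:  # rule 2
        if not exists or os.path.getsize(path) != entry["size"]:
            return False
        with open(path, "rb") as f:
            data = f.read()
        # Every listed checksum must match.
        return all(
            hashlib.new(alg.lower(), data).hexdigest() == digest
            for alg, digest in entry["checksums"].items()
        )
    raise ValueError("unsupported tag: %s" % entry["tag"])
```

The sketch reads the whole file into memory for brevity; a real tool
would more likely hash the file in chunks.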


Timestamp verification
----------------------

The Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from the user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.
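
A minimal Python sketch of the client-side comparison is given below.
The function name ``manifest_is_outdated`` and the one-day default
threshold are assumptions of this example; the specification does not
define what "significantly outdated" means. The format string is the
one given for the ``TIMESTAMP`` tag.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def manifest_is_outdated(timestamp, now, max_age=timedelta(days=1)):
    """Return True if the Manifest timestamp is older than max_age.

    `now` must be a timezone-aware datetime; max_age is an assumed policy.
    """
    generated = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    generated = generated.replace(tzinfo=timezone.utc)  # value is in UTC
    return now - generated > max_age
```

A caller would typically pass ``datetime.now(timezone.utc)`` as ``now``
right after the sync finishes.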


Modern Manifest tags
--------------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout as described in `Timestamp
  verification`_.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass).

``DATA <path> <size> <checksums>…``
  Specifies a file subject to obligatory Manifest verification.
  The file is required to pass verification. Used for all files directly
  affecting package manager operation (ebuilds, eclasses, profiles).

``MISC <path> <size> <checksums>…``
  Specifies a file subject to non-obligatory Manifest verification.
  The package manager may ignore a verification failure if operating
  in non-strict mode. Used for files that do not affect the installed
  packages (``metadata.xml``, ``use.desc``).

``OPTIONAL <path>``
  Specifies a file that does not exist in the distribution but if it
  did, it would be marked as ``MISC``. In the strict mode, the file
  must not exist for the verification to pass. The package manager
  may ignore a stray file matching this entry if operating in non-strict
  mode.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.


Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to `file verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``OPTIONAL``,
   ``EBUILD`` and ``AUX`` entries into the *covered* set.

6. Verify the entries in *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Directory tree
   coverage`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to `file verification`_ section.
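
Steps 4 to 7 of the algorithm above can be sketched with plain Python
sets. The sketch assumes that steps 1 to 3 are already done: ``present``
holds repository-relative paths, and ``entries`` is a flattened list of
``(tag, path)`` pairs merged from all sub-Manifests, with ``AUX`` paths
already prefixed with ``files/``. The name ``plan_verification`` is
hypothetical, and only the ignore-collision part of step 6 is shown.

```python
COVERING_TAGS = ("DATA", "MISC", "OPTIONAL", "EBUILD", "AUX")


def plan_verification(present, entries):
    """Return the set of paths that must go through file verification."""
    ignored = [path for tag, path in entries if tag == "IGNORE"]

    def is_ignored(path):
        # An IGNORE entry covers the path itself and everything below it.
        return any(path == i or path.startswith(i + "/") for i in ignored)

    # Step 4: drop ignored paths from the *present* set.
    remaining = {p for p in present if not is_ignored(p)}
    # Step 5: collect the *covered* set.
    covered = {path for tag, path in entries if tag in COVERING_TAGS}
    # Step 6 (partial): an entry must not collide with an ignored path.
    for path in covered:
        if is_ignored(path):
            raise ValueError("entry for ignored path: %s" % path)
    # Step 7: everything in the union has to be verified.
    return remaining | covered
```

Taking the union matters: files that are present but unlisted get
rejected as stray, and files that are listed but missing are noticed too.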


Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different from *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
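
The upward walk can be sketched in Python as follows. The function name
``find_top_level_manifest_dir`` and the ``ignores_original`` callback are
assumptions of this example; the callback stands in for parsing the
``IGNORE`` entries of the Manifest found in a directory (step 3a).
For brevity the sketch looks only for an uncompressed ``Manifest`` file.

```python
import os


def find_top_level_manifest_dir(start, ignores_original=lambda d, orig: False):
    """Walk up from `start`; return the directory of the top-level Manifest.

    Returns None when no candidate is found. `ignores_original(d, orig)` is
    a hypothetical hook reporting whether the Manifest in `d` ignores `orig`.
    """
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev  # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:
            break  # step 2: crossed a filesystem boundary
        if os.path.isfile(os.path.join(current, "Manifest")):
            if ignores_original(current, original):
                break  # step 3a
            last_found = current  # step 3b
        parent = os.path.dirname(current)
        if parent == current:
            break  # step 4: reached the root directory
        current = parent  # step 5
    return last_found
```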


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
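
The naming recommendation can be illustrated with a short Python
sketch. Both helper names are hypothetical. Note that the simple
lowercase mapping only works for names that follow the recommendation;
legacy names such as ``RMD160`` would need an explicit translation
table, which is not shown here.

```python
import hashlib


def manifest_hash_name(hashlib_name):
    """Derive a Manifest algorithm name from a hashlib algorithm name."""
    return hashlib_name.upper()


def checksum(manifest_name, data):
    """Compute a hex digest for a name that follows the uppercase rule."""
    return hashlib.new(manifest_name.lower(), data).hexdigest()
```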


Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to be suffixed for their
compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes are outside the scope
of this specification.

Whenever this specification refers to the top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose either
of the files using the same base name.
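
Transparent decompression by suffix could look like the following
Python sketch. The suffix table is an assumption made for illustration,
since the list of algorithms and suffixes is outside the scope of this
specification, and ``open_manifest`` is a hypothetical name.

```python
import bz2
import gzip
import lzma

# Assumed suffix table; the specification leaves the actual list open.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a Manifest for text reading, decompressing it by suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    # No known suffix: treat the file as an uncompressed Manifest.
    return open(path, "r", encoding="utf-8")
```

Checksum verification of a ``MANIFEST`` entry would still be done on the
compressed bytes on disk, not on the decompressed text returned here.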


An example Manifest file (informational)
----------------------------------------

An example top-level Manifest file for the Gentoo repository would have
the following content::

    TIMESTAMP 2017-10-30T10:11:12Z
    IGNORE distfiles
    IGNORE local
    IGNORE lost+found
    IGNORE packages
    MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
    ...
    MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
    ...

An example modern Manifest (disregarding backwards compatibility)
for a package directory would have the following content::

    DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
    DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
    DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
    DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
    DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
    DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
    MISC metadata.xml 664 SHA256 97c6.. SHA512 1175..


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically, or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of comparatively costly
OpenPGP operations to the minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for the top-level Manifest, since we do not
want to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior we also need to ban crossing them when descending
the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used to skip VCS directories such as ``CVS``.


Non-obligatory Manifest verification
------------------------------------

While this specification recommends that all tools use strict
verification by default, it allows declaring some files as
non-obligatory, as the original Manifest2 format did. This could be used
on files that do not affect the normal package manager operation.

It aims to account for two use cases:

1. Stripping down files that are not strictly required to install
   packages from repository checkouts.

2. Accounting for automatically generated files that might be updated
   by standard tooling.

The traditional ``MISC`` type is amended with a complementary
``OPTIONAL`` tag to account for files that are not provided
in the specific repository. It aims to ensure that the same path would
be non-fatal when provided by the repository but fatal when created
by the user tooling.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion or replay
[#C08]_ to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used to
detect that.

In order to provide a more complete protection, the Gentoo
Infrastructure should provide an ability to obtain the timestamps
of all Manifests from a recent timeframe over a secure channel
from a trusted source for comparison.

Strictly speaking, this information is already provided by the various
``metadata/timestamp*`` files that are already present. However,
including the value in the Manifest itself has little cost
and provides the ability to perform the verification stand-alone.

Furthermore, some of the timestamp files are added very late
in the distribution process, past the Manifest generation phase. Those
files will most likely receive ``IGNORE`` entries and therefore
will not be suitable for safe use.


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, two are reused and two are
marked deprecated.

The ``DIST`` and ``MISC`` tags are reused since they can be relatively
clearly mapped onto the new concept.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
the limiting property of implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has brought
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we are not including those files to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause strict
verification failures of Manifests. To account for this, Infra could
provide either ``OPTIONAL`` entries for the Manifest files to allow them
in non-strict verification mode, or ``IGNORE`` entries to allow them
in the strict mode.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned on being used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have uncompressed variant
either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of warm filesystem cache.

To improve speed on I/O and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to be
ensured:

- all files within the package directory must be covered by ``Manifest``
  file inside that package directory,

- all distfiles used by the package must be covered by ``Manifest``
  file inside the package directory,

- all files inside the ``files/`` subdirectory of a package directory
  need to use the deprecated ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files inside the package directory need to use
  the deprecated ``EBUILD`` tag (rather than ``DATA``),

- the Manifest files inside the package directory can be signed
  to provide authenticity verification,

- an uncompressed Manifest file must exist in the package directory,
  and a compressed Manifest of identical content may be present.

Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.


Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 — fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
   (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike
3.0 Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-10-30 16:51 ` [gentoo-dev] [v1.0.2] " Michał Górny
@ 2017-10-30 19:56   ` Robin H. Johnson
  2017-11-01  8:44     ` Michał Górny
  2017-11-02 19:10     ` Michał Górny
  0 siblings, 2 replies; 32+ messages in thread
From: Robin H. Johnson @ 2017-10-30 19:56 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 8625 bytes --]

On Mon, Oct 30, 2017 at 05:51:36PM +0100, Michał Górny wrote:
...
> Directory tree coverage
> -----------------------
This section should maybe cover OPTIONAL in more detail (see more
below).

> The Manifest files can also specify ``IGNORE`` entries to skip Manifest
> verification of subdirectories and/or files. The package manager can
> support injecting ignore paths to account for additional files created,
> modified or removed by user's processes that would not be ignored
> by existing rules. Files and directories starting with a dot are always
> implicitly ignored. All files that are not ignored must be covered
> by at least one of the Manifests.
(English) There are multiple points in this paragraph, and I missed them
on first reading. The package manager part is esp. lost.
|| All files that are not ignored must be covered by at least one of the
|| Manifests. Files may be ignored by several ways:
|| - Files and directories starting with a dot are always implicitly
||   ignored.
|| - The Manifest files can specify ``IGNORE`` entries to skip
||   verification of subdirectories and/or files.
|| - The package manager can support injecting ignore paths to account for
||   additional files created, modified or removed by user's processes that
||   would not be ignored by existing rules.

> File verification
> -----------------
> When verifying a file against the Manifest, the following rules are
> used:
...
> 3. If the file is covered by an entry of the ``OPTIONAL`` type:
>    a. if the file is present, then the verification fails,
>    b. otherwise, the verification succeeds.
Move the OPTIONAL type further up in the verification list maybe? See
the interpretation question below.


> Modern Manifest tags
> --------------------
...
> ``IGNORE <path>``
>   Ignores a subdirectory or file from Manifest checks. If the specified
>   path is present, it and its contents are omitted from the Manifest
>   verification (always pass).
Clarification Needed:
Should subdirectories have a trailing slash in the Manifest or not?
This affects matching of the type.
Case 1.1:
Manifest has 'IGNORE foo'; 'foo' is a file; => ignored.
Case 1.2:
Manifest has 'IGNORE foo'; 'foo' is a directory; => ignored.
Case 2.1:
Manifest has 'IGNORE foo/'; 'foo' is a file; => FAIL
Case 2.2:
Manifest has 'IGNORE foo/'; 'foo' is a directory; => ignored.
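
To make the four cases concrete, here is a minimal sketch of how a
matcher could behave if the trailing slash were significant. The function
name `ignore_result` and its return values are hypothetical, and the
trailing-slash rule is only the interpretation I am asking about, not
something the draft currently states.

```python
def ignore_result(entry_path: str, name: str, is_dir: bool) -> str:
    """Classify one IGNORE entry against one directory member.

    Returns 'ignored', 'fail' or 'no-match'. Hypothetical helper that
    models the four cases above; not taken from the GLEP or gemato.
    """
    wants_dir = entry_path.endswith("/")
    if entry_path.rstrip("/") != name:
        return "no-match"
    if wants_dir and not is_dir:
        # Case 2.1: 'IGNORE foo/' but 'foo' is a regular file.
        return "fail"
    # Cases 1.1, 1.2 and 2.2.
    return "ignored"
```

If the answer is that trailing slashes are not allowed at all, cases 2.1
and 2.2 simply become syntax errors and the `wants_dir` branch goes away.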


> ``OPTIONAL <path>``
>   Specifies a file that does not exist in the distribution but if it
>   did, it would be marked as ``MISC``. In the strict mode, the file
>   must not exist for the verification to pass. The package manager
>   may ignore a stray file matching this entry if operating in non-strict
>   mode.
This has gotten less clear.
Is the following correct interpretation?
if(package manager is strict) then
  Treat the OPTIONAL entry as NOT present in the Manifest.
  This will cause files to be in the present set but not the covered set.
else
  Treat the OPTIONAL entry as 'IGNORE <path>'
endif

> ``DIST <filename> <size> <checksums>…``
>   Specifies a distfile entry used to verify files fetched as part
>   of ``SRC_URI``. The filename must match the filename used to store
>   the fetched file as specified in the PMS [#PMS-FETCH]_. The package
>   manager must reject the fetched file if it fails verification.
>   ``DIST`` entries apply to all packages below the Manifest file
>   specifying them.
Nit: You have used a unicode ellipsis '…' in some places and plain ASCII
ellipsis '...' in others. Stick to ASCII?


> An example Manifest file (informational)
> ----------------------------------------
Can you add an example for OPTIONAL?

> Tree layout restrictions
> ------------------------
> The Gentoo repository does not use symbolic links. Some Gentoo
> repositories do, however. To provide a simple solution for dealing with
> symlinks without having to take care to implement special handling for
> them, the common behavior of implicitly resolving them is used.
> Therefore, symbolic links to files are stored as if they were regular
> files, and symbolic links to directories are followed as if they were
> regular directories.
Clarification: should cross-device symlinks be rejected? (perhaps
implicit, but wanted to check)
If so, need to add to 'Algorithm for full-tree verification' section.

> Dotfiles are implicitly ignored as that is a common notion used
> in software written for POSIX systems. All other filenames require
> explicit ``IGNORE`` lines.
This paragraph should re-iterate that the package manager may specify
additional files to be ignored per the user.

> The algorithm is restricted to work on a single filesystem. This is
> mostly relevant when scanning for top-level Manifest — we do not want
> to cross filesystem boundaries then. However, to ensure consistent
> bidirectional behavior we need to also ban them when operating downwards
> the tree.
> 
> The directories and files on different filesystems needs to be ignored
> explicitly as implicitly skipping them would cause confusion.
> In particular, tools might then claim that a file does not exist when
> it clearly does because it was skipped due to filesystem boundaries.
The downward path needs to check the device on files.
Otherwise:
cat/pn/Manifest
cat/pn/files/ <-- different filesystem here
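
A rough sketch of what I mean by checking the device on the downward
path, assuming the check is a plain `st_dev` comparison against the
directory holding the top-level Manifest. The name `find_cross_device` is
made up for illustration; a broken symlink would raise here and would
need separate handling.

```python
import os


def find_cross_device(top: str) -> list:
    """Return paths below `top` whose st_dev differs from that of `top`."""
    top_dev = os.stat(top).st_dev
    offenders = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.stat() follows symlinks, in line with the draft's rule
            # that symlinks are resolved implicitly.
            if os.stat(path).st_dev != top_dev:
                offenders.append(path)
    return offenders
```

Anything returned by this would have to be covered by an explicit
``IGNORE`` for the verification to pass.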

> Non-obligatory Manifest verification
> ------------------------------------
...
> The traditional ``MISC`` type is amended with a complementary
> ``OPTIONAL`` tag to account for files that are not provided
> in the specific repository. It aims to ensure that the same path would
> be non-fatal when provided by the repository but fatal when created
> by the user tooling.
Clarify the last sentence to be for strict mode only?

> Splitting distfile checksums from file checksums
> ------------------------------------------------
> 
> Another problem with the current Manifest format is that the checksums
> for fetched files are combined with checksums for local files
> in a single file inside the package directory. It has been specifically
> pointed out that:
> 
> - since distfiles are sometimes reused across different packages,
>   the repeating checksums are redundant,
> 
> - mirror admins were interested in the possibility of verifying all
>   the distfiles with a single tool.
> 
> This specification does not provide a clean solution to this problem.
> It technically permits moving ``DIST`` entries to higher-level Manifests
> but the usefulness of such a solution is doubtful.
Clarification of validity:
If cat/pn1 and cat/pn2 share 1000 DIST files, would it be valid to
have the following:
cat/pn1/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
cat/pn1/Manifest:DIST unique-pn1-dist.tgz 1234 ...
cat/pn2/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
cat/pn2/Manifest:DIST unique-pn2-dist.tgz 1234 ...

> Performance considerations
> --------------------------
...
> To improve speed on I/O and/or CPU-restrained systems even further,
> the algorithms can be easily extended to perform incremental
> verification. Given that rsync does not preserve mtimes by default,
> the tool can take advantage of mtime and Manifest comparisons to recheck
> only the parts of the repository that have changed.
Implementation note, not for GLEP addition:
If the package manager caches by filename,inode,mtime locally, it can
then avoid repeat-checking of the hashes (it only needs a stat),
provided that it is happy there was no local attacker who might perform
an in-place modification of a file (mtime&inode remain the same).
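
Roughly, as a sketch only: the cache is assumed to be a plain dict keyed
by filename, and the helper names `is_unchanged` and `remember` are
invented for this mail.

```python
import os


def is_unchanged(cache: dict, path: str) -> bool:
    """Return True if a stat() shows the same inode and mtime as cached."""
    st = os.stat(path)
    return cache.get(path) == (st.st_ino, st.st_mtime_ns)


def remember(cache: dict, path: str) -> None:
    """Record the current inode and mtime; call only after hashes verified."""
    st = os.stat(path)
    cache[path] = (st.st_ino, st.st_mtime_ns)
```

A file for which `is_unchanged()` returns False gets the full checksum
verification, followed by `remember()` on success.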

> 
> Furthermore, the package manager implementations can restrict checking
> only to the parts of the repository that are actually being used.
> 
> 
> Backwards Compatibility
> =======================
> 
> This GLEP provides optional means of preserving backwards compatibility.
> To preserve the backwards compatibility, the following needs to be
> ensured:
"package directory" is common to all of the items here, if you move that
to the list preamble, it's a lot cleaner to read.
|| To preserve the backwards compatibility, the following needs to be
|| ensured about package directories:
And cleanup the list:
s/in(side)? (that|the) package directory//
s/of a package directory//

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 1113 bytes --]


* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-10-30 19:56   ` Robin H. Johnson
@ 2017-11-01  8:44     ` Michał Górny
  2017-11-01  9:47       ` Walter Dnes
  2017-11-01 13:08       ` Andreas K. Huettel
  2017-11-02 19:10     ` Michał Górny
  1 sibling, 2 replies; 32+ messages in thread
From: Michał Górny @ 2017-11-01  8:44 UTC (permalink / raw
  To: gentoo-dev

Hi,

Ok, so before we get into this deeper, here's another option we've been
discussing. Let's drop the non-strict mode entirely, drop OPTIONAL
and keep MISC as deprecated-used-to-have-special-meaning alias to DATA.

This is going to make a lot of things simpler, and avoid having the very
long discussion on what should be MISC and what not. Especially given
that the specific definition of MISC makes little sense as-is.


Two reasons have been mentioned for having non-strict mode:

1. Stripping some of the not strictly necessary files to reduce
repository size. However:

1a. Stripping of files that we can mark MISC is not going to do much.
Most of the time, people would strip whole categories or other data we
can't really mark MISC, so they will need a different solution anyway.

1b. That's just an argument for allowing them to be missing. There's no
clear reason why they would have different content, and it doesn't make
much sense to allow it implicitly.

1c. Those files can still be a means of carrying out attacks --
starting with misinformation resulting in the user reducing the security
of his systems, ending with attacks e.g. exploiting XML parser
vulnerabilities.

2. Allowing different content for cache-class files that can be updated
on user's end (e.g. md5-cache, use.local.desc...). However:

2a. We can't really do this for md5-cache since it clearly can be
abused.

2b. Again, it makes little sense since we took special care that all
those tools have stable output.


All that said, if we really have a problem that needs solving here, I'm
not convinced MISC is the right solution for it. If people need to
explicitly exclude stuff, then I suppose the configuration-injected
ignore list is a much better solution for this.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-11-01  8:44     ` Michał Górny
@ 2017-11-01  9:47       ` Walter Dnes
  2017-11-01 13:08       ` Andreas K. Huettel
  1 sibling, 0 replies; 32+ messages in thread
From: Walter Dnes @ 2017-11-01  9:47 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

On Wed, Nov 01, 2017 at 09:44:09AM +0100, Michał Górny wrote

> All that said, if we really have a problem that needs solving here, I'm
> not convinced MISC is the right solution for it. If people need to
> explicitly exclude stuff, then I suppose the configuration-injected
> ignore list is much better solution for this.

  An example of stuff you'd run into; my make.conf has the line...
PORTAGE_RSYNC_EXTRA_OPTS="--exclude-from=/etc/portage/rsync_excludes"

  I've attached a script that I run whenever I add a directory to the
exclusion list.  It's specific to me, ie. program categories I don't
use.  It deletes the specified portage dirs, and updates the exclusion
file.  It really speeds up "emerge --sync" on my ancient Atom netbook
with 2 gigabytes of ram.  Will there still be an option to cycle through
all existing top-level subdirectories of /usr/portage and check against
the directory manifests?

-- 
Walter Dnes <waltdnes@waltdnes.org>
I don't run "desktop environments"; I run useful applications

[-- Attachment #2: cleanup.gz --]
[-- Type: application/octet-stream, Size: 427 bytes --]


* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-11-01  8:44     ` Michał Górny
  2017-11-01  9:47       ` Walter Dnes
@ 2017-11-01 13:08       ` Andreas K. Huettel
  1 sibling, 0 replies; 32+ messages in thread
From: Andreas K. Huettel @ 2017-11-01 13:08 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On Wednesday, 1 November 2017, 09:44:09 CET, Michał Górny wrote:
> Hi,
> 
> Ok, so before we get into this deeper, here's another option we've been
> discussing. Let's drop the non-strict mode entirely, drop OPTIONAL
> and keep MISC as deprecated-used-to-have-special-meaning alias to DATA.
> 
> This is going to make a lot of things simpler, and avoid having the very
> long discussion on what should be MISC and what not. Especially given
> that the specific definition of MISC makes little sense as-is.
> 

+1 

(unless someone comes up with an extremely good usecase that goes beyond user 
configuration)

-- 
Andreas K. Hüttel
dilfridge@gentoo.org
Gentoo Linux developer (council, perl, libreoffice)

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 981 bytes --]


* Re: [gentoo-dev] [v1.0.2] GLEP 74: Full-tree verification using Manifest files
  2017-10-30 19:56   ` Robin H. Johnson
  2017-11-01  8:44     ` Michał Górny
@ 2017-11-02 19:10     ` Michał Górny
  1 sibling, 0 replies; 32+ messages in thread
From: Michał Górny @ 2017-11-02 19:10 UTC (permalink / raw
  To: gentoo-dev

On Mon, 30.10.2017 at 19:56 +0000, Robin H. Johnson wrote:
> On Mon, Oct 30, 2017 at 05:51:36PM +0100, Michał Górny wrote:
> ...
> > Directory tree coverage
> > -----------------------
> 
> This section should maybe cover OPTIONAL in more detail (see more
> below).

So I've removed OPTIONAL from the new version, so I'm going to skip all
the comments regarding it.

> 
> > The Manifest files can also specify ``IGNORE`` entries to skip Manifest
> > verification of subdirectories and/or files. The package manager can
> > support injecting ignore paths to account for additional files created,
> > modified or removed by user's processes that would not be ignored
> > by existing rules. Files and directories starting with a dot are always
> > implicitly ignored. All files that are not ignored must be covered
> > by at least one of the Manifests.
> 
> (English) There are multiple points in this paragraph, and I missed on first
> reading. The package manager part is esp. lost.

Replaced this with an enumeration.

> > > All files that are not ignored must be covered by at least one of the
> > > Manifests. Files may be ignored by several ways:
> > > - Files and directories starting with a dot are always implicitly
> > >   ignored.
> > > - The Manifest files can specify ``IGNORE`` entries to skip
> > >   verification of subdirectories and/or files.
> > > - The package manager can support injecting ignore paths to account for
> > >   additional files created, modified or removed by user's processes that
> > >   would not be ignored by existing rules.
> > File verification
> > -----------------
> > When verifying a file against the Manifest, the following rules are
> > used:
> 
> ...
> > 3. If the file is covered by an entry of the ``OPTIONAL`` type:
> >    a. if the file is present, then the verification fails,
> >    b. otherwise, the verification succeeds.
> 
> Move the OPTIONAL type further up in the verification list maybe? See
> the interpretation question below.
> 
> 
> > Modern Manifest tags
> > --------------------
> 
> ...
> > ``IGNORE <path>``
> >   Ignores a subdirectory or file from Manifest checks. If the specified
> >   path is present, it and its contents are omitted from the Manifest
> >   verification (always pass).
> 
> Clarification Needed:
> Should subdirectories have a trailing slash in the Manifest or not?
> This affects matching of the type.
> Case 1.1:
> Manifest has 'IGNORE foo'; 'foo' is a file; => ignored.
> Case 1.2:
> Manifest has 'IGNORE foo'; 'foo' is a directory; => ignored.
> Case 2.1:
> Manifest has 'IGNORE foo/'; 'foo' is a file; => FAIL
> Case 2.2:
> Manifest has 'IGNORE foo/'; 'foo' is a directory; => ignored.

Added a syntax clarification, and noted that we don't support wildcards.

> 
> 
> > ``OPTIONAL <path>``
> >   Specifies a file that does not exist in the distribution but if it
> >   did, it would be marked as ``MISC``. In the strict mode, the file
> >   must not exist for the verification to pass. The package manager
> >   may ignore a stray file matching this entry if operating in non-strict
> >   mode.
> 
> This has gotten less clear.
> Is the following correct interpretation?
> if(package manager is strict) then
>   Treat the OPTIONAL entry as NOT present in the Manifest.
>   This will cause files to be in the present set but not the covered set.
> else
>   Treat the OPTIONAL entry as 'IGNORE <path>'
> endif
> 
> > ``DIST <filename> <size> <checksums>…``
> >   Specifies a distfile entry used to verify files fetched as part
> >   of ``SRC_URI``. The filename must match the filename used to store
> >   the fetched file as specified in the PMS [#PMS-FETCH]_. The package
> >   manager must reject the fetched file if it fails verification.
> >   ``DIST`` entries apply to all packages below the Manifest file
> >   specifying them.
> 
> Nit: You have used a unicode ellipsis '…' in some places and plain ASCII
> ellipsis '...' in others. Stick to ASCII?

Well, I've actually used ASCII only in the code samples because it felt
more right but I've switched to Unicode everywhere now, to match dashes.

> 
> 
> > An example Manifest file (informational)
> > ----------------------------------------
> 
> Can you add an example for OPTIONAL?
> 
> > Tree layout restrictions
> > ------------------------
> > The Gentoo repository does not use symbolic links. Some Gentoo
> > repositories do, however. To provide a simple solution for dealing with
> > symlinks without having to take care to implement special handling for
> > them, the common behavior of implicitly resolving them is used.
> > Therefore, symbolic links to files are stored as if they were regular
> > files, and symbolic links to directories are followed as if they were
> > regular directories.
> 
> Clarification: should cross-device symlinks be rejected? (perhaps
> implicit, but wanted to check)
> If so, need to add to 'Algorithm for full-tree verification' section.

Yes, it's implied by the rules in `Directory tree coverage`_. Not sure
about adding it there. We also don't explicitly tell people to verify
file type. And in any case, I think it belongs more in `File
verification`_ algorithm since that needs to be done separately for
every file.

> > Dotfiles are implicitly ignored as that is a common notion used
> > in software written for POSIX systems. All other filenames require
> > explicit ``IGNORE`` lines.
> 
> This paragraph should re-iterate that the package manager may specify
> additional files to be ignored per the user.

Added extra paragraph about it, with example uses.

> 
> > The algorithm is restricted to work on a single filesystem. This is
> > mostly relevant when scanning for top-level Manifest — we do not want
> > to cross filesystem boundaries then. However, to ensure consistent
> > bidirectional behavior we need to also ban them when operating downwards
> > the tree.
> > 
> > The directories and files on different filesystems needs to be ignored
> > explicitly as implicitly skipping them would cause confusion.
> > In particular, tools might then claim that a file does not exist when
> > it clearly does because it was skipped due to filesystem boundaries.
> 
> The downward path needs to check the device on files.
> Otherwise:
> cat/pn/Manifest
> cat/pn/files/ <-- different filesystem here

I don't understand how this comment is relevant here. It's required
earlier.

> 
> > Non-obligatory Manifest verification
> > ------------------------------------
> 
> ...
> > The traditional ``MISC`` type is amended with a complementary
> > ``OPTIONAL`` tag to account for files that are not provided
> > in the specific repository. It aims to ensure that the same path would
> > be non-fatal when provided by the repository but fatal when created
> > by the user tooling.
> 
> Clarify the last sentence to be for strict mode only?

This whole section has been rewritten.

> 
> > Splitting distfile checksums from file checksums
> > ------------------------------------------------
> > 
> > Another problem with the current Manifest format is that the checksums
> > for fetched files are combined with checksums for local files
> > in a single file inside the package directory. It has been specifically
> > pointed out that:
> > 
> > - since distfiles are sometimes reused across different packages,
> >   the repeating checksums are redundant,
> > 
> > - mirror admins were interested in the possibility of verifying all
> >   the distfiles with a single tool.
> > 
> > This specification does not provide a clean solution to this problem.
> > It technically permits moving ``DIST`` entries to higher-level Manifests
> > but the usefulness of such a solution is doubtful.
> 
> Clarification of validity:
> If cat/pn1 and cat/pn2 share 1000 DIST files; would it be valid to
> have: the following:
> cat/pn1/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
> cat/pn1/Manifest:DIST unique-pn1-dist.tgz 1234 ...
> cat/pn2/Manifest:MANIFEST ../Manifest.some-shared-name 1234 ...
> cat/pn2/Manifest:DIST unique-pn2-dist.tgz 1234 ...

Nope. Parent directory references are forbidden, as stated at the top of
the specification. Also, MANIFEST is not a dumb INCLUDE but a local path
split.

> > Performance considerations
> > --------------------------
> 
> ...
> > To improve speed on I/O and/or CPU-restrained systems even further,
> > the algorithms can be easily extended to perform incremental
> > verification. Given that rsync does not preserve mtimes by default,
> > the tool can take advantage of mtime and Manifest comparisons to recheck
> > only the parts of the repository that have changed.
> 
> Implementation note, not for GLEP addition:
> If the package manager caches by filename,inode,mtime locally, it can
> then avoid repeat-checking of the hashes (it only needs a stat),
> provided that it is happy there was no local attacker who might perform
> an in-place modification of a file (mtime&inode remain the same).

I've gone for mtime matching against a single value for all files, plus
size checks (since we need to read the inode anyway). Also, I don't see
the case for an inode change without an mtime change.

> > 
> > Furthermore, the package manager implementations can restrict checking
> > only to the parts of the repository that are actually being used.
> > 
> > 
> > Backwards Compatibility
> > =======================
> > 
> > This GLEP provides optional means of preserving backwards compatibility.
> > To preserve the backwards compatibility, the following needs to be
> > ensured:
> 
> "package directory" is common to all of the items here, if you move that
> to the list preamble, it's a lot cleaner to read.
> > > To preserve the backwards compatibility, the following needs to be
> > > ensured about package directories:
> 
> And cleanup the list:
> s/in(side)? (that|the) package directory//
> s/of a package directory//
> 

Done.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
                   ` (4 preceding siblings ...)
  2017-10-30 16:51 ` [gentoo-dev] [v1.0.2] " Michał Górny
@ 2017-11-02 19:11 ` Michał Górny
  2017-11-02 23:43   ` Robin H. Johnson
  2017-11-06 21:53 ` [gentoo-dev] [v1.0.4] " Michał Górny
  6 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-11-02 19:11 UTC (permalink / raw
  To: gentoo-dev

Next version. Now without MISC/OPTIONAL, and with many clarifications.

On Thu, 26.10.2017 at 22:12 +0200, Michał Górny wrote:
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/

---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@gentoo.org>,
        Robin Hugh Johnson <robbat2@gentoo.org>,
        Ulrich Müller <ulm@gentoo.org>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-10-30
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

3. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).
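
As an illustration, the parent-directory restriction could be checked as
in the sketch below. The function name `references_parent` is
hypothetical, and the sketch assumes paths use ``/`` as the separator.

```python
def references_parent(path: str) -> bool:
    """Return True if any component of a Manifest path is '..'."""
    # Only whole components count; a name such as 'a..b' is fine.
    return ".." in path.split("/")
```

A parser could reject any entry for which this returns true before
resolving the path against the Manifest's directory.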


Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.


Directory tree coverage
-----------------------

The specification provides three ways of skipping Manifest verification
of specific files and directories (recursively):

1. explicit ``IGNORE`` entries in Manifest files,

2. injected ignore paths via package manager configuration,

3. using names starting with a dot (``.``) which are always skipped.

All files that are not ignored must be covered by at least one
of the Manifests.
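
The three skipping rules can be sketched as follows. This is not the
reference implementation: the name `is_ignored` and its parameters are
assumptions, paths are taken to be normalized, ``/``-separated and
relative to the repository root, and ignore paths are assumed to be
given without a trailing slash.

```python
def is_ignored(relpath, manifest_ignores=(), injected_ignores=()):
    """Return True if `relpath` is skipped by one of the three rules."""
    # Rule 3: any component starting with a dot skips the path,
    # including everything below a dot-directory.
    if any(part.startswith(".") for part in relpath.split("/")):
        return True
    # Rules 1 and 2: IGNORE entries and injected paths apply to the
    # path itself and, recursively, to everything beneath it.
    for ignore in list(manifest_ignores) + list(injected_ignores):
        if relpath == ignore or relpath.startswith(ignore + "/"):
            return True
    return False
```

Every path for which this returns false has to be matched by at least
one Manifest entry.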

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files
and traversing directories. It is an error to specify an entry for
a different file type. If the tree contains files of other types
that are not otherwise ignored, they need to be covered by an explicit
``IGNORE``.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If subdirectories
that are not otherwise ignored reside on a different filesystem, they
must be explicitly excluded via ``IGNORE``.


File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

1. If the file is covered directly or indirectly by an entry
   of the ``IGNORE`` type, the verification always succeeds.

2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
   ``MISC``, ``EBUILD`` or ``AUX`` type:

   a. if the file is not present, then the verification fails,

   b. if the file is present but has a different size or one
      of the checksums does not match, the verification fails,

   c. otherwise, the verification succeeds.

3. If the file is present but not listed in Manifest, the verification
   fails.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
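
The three rules above could be sketched in Python roughly as follows.
The ``entry`` dictionary layout is hypothetical, and mapping a Manifest
hash name to ``hashlib`` by lowercasing it is an assumption that holds
for names such as ``SHA256`` and ``SHA512`` but not for every reserved
name:

```python
import hashlib
import os


def verify_file(path, entry):
    """Return True if `path` passes verification against `entry`.

    `entry` is None when no Manifest entry covers the path; otherwise it is
    a dict with a "tag" key and, for checksummed tags, "size" and
    "checksums" (hash name -> hex digest). This layout is hypothetical.
    """
    if entry is not None and entry["tag"] == "IGNORE":
        return True                       # rule 1: ignored paths always pass
    present = os.path.isfile(path)        # follows symbolic links
    if entry is None:
        # rule 3: present but unlisted fails; absent and unlisted is
        # treated here as "nothing to verify" (an assumption).
        return not present
    if not present:
        return False                      # rule 2a: listed file is missing
    if os.path.getsize(path) != entry["size"]:
        return False                      # rule 2b: size mismatch
    # Compute all requested digests in a single pass over the file.
    hashers = {name: hashlib.new(name.lower()) for name in entry["checksums"]}
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return all(hashers[name].hexdigest() == value.lower()
               for name, value in entry["checksums"].items())
```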


Timestamp verification
----------------------

The Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.
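
A minimal sketch of the client-side comparison, in Python. The format
string is the one given for the ``TIMESTAMP`` tag; the ``max_age``
threshold is purely an assumption, since this specification does not
define what counts as significantly outdated:

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a TIMESTAMP value into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(
        tzinfo=timezone.utc)


def is_outdated(manifest_timestamp, received_at, max_age=timedelta(days=7)):
    """True if the Manifest was already older than `max_age` when received.

    `max_age` is a hypothetical policy knob, not part of the draft.
    """
    return received_at - parse_timestamp(manifest_timestamp) > max_age
```

A client would call ``is_outdated()`` with the time of the sync taken
from a local clock or a trusted time source, and then either fail
or ask the user for confirmation.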


Modern Manifest tags
--------------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout as described in `Timestamp
  verification`_.

``MANIFEST <path> <size> <checksums>…``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass). *Path* must be a plain file or directory
  path without a trailing slash, and must not contain wildcards.

``DATA <path> <size> <checksums>…``
  Specifies a regular file subject to Manifest verification. The file
  is required to pass verification. Used for all files that do not match
  any other type.

``DIST <filename> <size> <checksums>…``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.
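
For illustration, a line in this format could be parsed with a few lines
of Python. This is only a sketch: it assumes that fields are separated
by whitespace and that paths contain no whitespace, and the returned
dictionary layout is hypothetical. The deprecated tags described
in the next section are accepted as well:

```python
CHECKSUMMED_TAGS = {"MANIFEST", "DATA", "DIST", "EBUILD", "MISC", "AUX"}


def parse_entry(line):
    """Split one Manifest line into a dict describing the entry."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    tag, args = fields[0], fields[1:]
    if tag == "TIMESTAMP" and len(args) == 1:
        return {"tag": tag, "timestamp": args[0]}
    if tag == "IGNORE" and len(args) == 1:
        return {"tag": tag, "path": args[0]}
    # <path> <size> followed by (hash name, hash value) pairs.
    if tag in CHECKSUMMED_TAGS and len(args) >= 2 and len(args) % 2 == 0:
        names, values = args[2::2], args[3::2]
        return {"tag": tag, "path": args[0], "size": int(args[1]),
                "checksums": dict(zip(names, values))}
    raise ValueError("malformed Manifest entry: %r" % line)
```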


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type.

``MISC <path> <size> <checksums>…``
  Equivalent to the ``DATA`` type. Historically indicated that
  the package manager may ignore a verification failure if operating
  in non-strict mode. However, that behavior is deprecated.

``AUX <filename> <size> <checksums>…``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.
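
A tool that only wants to deal with modern tags internally could map
the deprecated ones to ``DATA`` on load. A minimal Python sketch, with
a hypothetical function name:

```python
import posixpath


def normalize_deprecated(tag, path):
    """Map a deprecated tag and its path to the equivalent DATA form."""
    if tag == "AUX":
        # AUX filenames are relative to the files/ subdirectory.
        return "DATA", posixpath.join("files", path)
    if tag in ("EBUILD", "MISC"):
        return "DATA", path
    return tag, path
```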


Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into the *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to `file verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
   and ``AUX`` entries into the *covered* set.

6. Verify the entries in the *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Manifest file
   locations and nesting`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to `file verification`_ section.
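
The set handling in steps 1 and 4 to 7 can be sketched in Python as
shown below. The function names are hypothetical; the OpenPGP check,
the Manifest loading of steps 2 and 3 and the per-file verification
itself are left out, and the caller is assumed to have already removed
the Manifest files handled in those steps from the *present* set:

```python
import os


def collect_present(root):
    """Step 1: all non-dot files below `root`, as '/'-separated paths."""
    present = set()
    # followlinks=True mirrors "symbolic links are followed"; a real tool
    # would also need to guard against symlink loops.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                present.add(rel.replace(os.sep, "/"))
    return present


def is_ignored(path, ignores):
    """True if `path` is an ignored path or lies below one."""
    return any(path == ig or path.startswith(ig + "/") for ig in ignores)


def plan_verification(present, ignores, covered):
    """Steps 4 to 7: decide which paths must be verified."""
    present = {p for p in present if not is_ignored(p, ignores)}   # step 4
    collisions = {p for p in covered if is_ignored(p, ignores)}    # step 6
    return {
        "verify": present | covered,     # step 7: union of both sets
        "stray": present - covered,      # present but not listed
        "missing": covered - present,    # listed but not present
        "collisions": collisions,        # entries for ignored paths
    }
```

Verifying the union rather than just the *covered* set is what catches
files added to the tree: they end up in ``stray`` and fail.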


Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different than *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
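
The upward walk in steps 1 to 5 could look roughly like this in Python.
The ``ignored_by`` callback is hypothetical: it stands for parsing
the ``Manifest`` found in a directory and checking whether one of its
``IGNORE`` entries covers the *original* directory. Compressed variants
of the top-level Manifest are not considered in this sketch:

```python
import os


def find_top_level(start, ignored_by):
    """Return the directory holding the top-level Manifest, or None.

    `ignored_by(manifest_dir, original)` is a hypothetical callback that
    returns True if the Manifest in `manifest_dir` has an IGNORE entry
    covering `original` or one of its parent directories.
    """
    original = os.path.abspath(start)
    startdev = os.stat(original).st_dev              # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:      # step 2
            break
        if os.path.isfile(os.path.join(current, "Manifest")):   # step 3
            if ignored_by(current, original):
                break
            last_found = current
        parent = os.path.dirname(current)
        if parent == current:                        # step 4: reached the root
            break
        current = parent                             # step 5
    return last_found
```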


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` — RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` — SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` — BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` — SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` — Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
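
As an illustration of the naming recommendation, the mapping between
``hashlib`` names and Manifest names is a plain case conversion
for the newer hashes. The helper names below are hypothetical, and
legacy names such as ``RMD160`` do not follow the convention, so a real
tool would need an explicit table for them:

```python
import hashlib


def manifest_name_for(hashlib_name):
    """Recommended Manifest name for a hashlib algorithm name."""
    return hashlib_name.upper()


def hashlib_name_for(manifest_name):
    """hashlib name for a Manifest hash name, or None if unavailable."""
    candidate = manifest_name.lower()
    if candidate in hashlib.algorithms_available:
        return candidate
    return None
```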


Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to be suffixed for their
compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes is outside the scope
of this specification.

Whenever this specification refers to the top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose either
of the files using the same base name.
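
Transparent decompression by suffix could be sketched in Python as
follows. The suffix table is an assumption made for the example (only
``.gz`` appears in this text); the actual list of algorithms is
outside the scope of this specification:

```python
import bz2
import gzip
import lzma

# Hypothetical suffix table; the real list is defined elsewhere.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path):
    """Open a Manifest for reading text, decompressing it if needed."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Note that the checksums in a ``MANIFEST`` entry would still be computed
over the compressed file on disk, not over the text returned here.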


An example Manifest file (informational)
----------------------------------------

An example top-level Manifest file for the Gentoo repository would have
the following content::

    TIMESTAMP 2017-10-30T10:11:12Z
    IGNORE distfiles
    IGNORE local
    IGNORE lost+found
    IGNORE packages
    MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
    …
    MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
    …

An example modern Manifest (disregarding backwards compatibility)
for a package directory would have the following content::

    DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
    DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
    DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
    DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
    DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
    DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
    DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically, or independently.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations,
which are comparatively costly, to the minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other common filenames
require explicit ``IGNORE`` lines.

An ability to inject additional ignore entries is provided to account
for site configuration affecting the repository tree — placing
additional files in it, skipping some of the categories from syncing.

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for top-level Manifest — we do not want
to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior we also need to ban them when operating down
the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``. It
could be also used to ignore VCS directories such as ``CVS``.


Non-strict Manifest verification
--------------------------------

Originally the Manifest2 format provided a special ``MISC`` tag that
was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
indicated that the Manifest verification failures could be ignored for
those files unless the package manager was working in strict mode.

The first versions of this specification continued the use of this tag.
However, after a long debate it was decided to deprecate it along with
the non-strict behavior, and require all files to strictly match.

Two arguments were mentioned for the usefulness of a ``MISC`` type:

1. being able to reduce the checkout size by stripping unnecessary
   files out, and

2. being able to update automatically generated files locally
   without causing unnecessary verification failures.

However, the usefulness of ``MISC`` in both cases is doubtful.

The cases for stripping unnecessary files mostly focused around space
savings. For this purpose, stripping ``metadata.xml`` and similar files
has little value. It is much more common for users to strip whole
categories which can not be handled via the ``MISC`` type, and need
a dedicated package manager mechanism. The same mechanism can also
handle files that used the ``MISC`` type.

The cases for autogenerated files involve such cache files
as ``use.local.desc``. However, we can not include ``md5-cache`` there
due to security concerns which results in inconsistent cache handling.
Furthermore, the tools were historically modified to provide stable
output which means that their content can not change without
a non-``MISC`` content being changed first. This practically defeats
the purpose of using ``MISC``.

Finally, the non-strict mode could be used as a means of attack.
The allowance of missing or modified documentation files could be used
to spread misinformation, resulting in bad decisions made by the user.
A modified file could also be used e.g. to exploit vulnerabilities
of an XML parser.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion or replay
[#C08]_ to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used to
detect that.

In order to provide a more complete protection, the Gentoo
Infrastructure should provide an ability to obtain the timestamps
of all Manifests from a recent timeframe over a secure channel
from a trusted source for comparison.

Strictly speaking, this information is already provided by the various
``metadata/timestamp*`` files that are already present. However,
including the value in the Manifest itself has little cost
and provides the ability to perform the verification stand-alone.

Furthermore, some of the timestamp files are added very late
in the distribution process, past the Manifest generation phase. Those
files will most likely receive ``IGNORE`` entries and therefore
not be suitable for safe use.


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, only one is reused
and the remaining three are replaced by a single, universal ``DATA``
type.

The ``DIST`` tag is reused since the specification does not change
anything with regard to distfile handling.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``MISC`` tag and the relevant non-strict mode have been removed
as being of little value, as detailed in the `Non-strict Manifest
verification`_ section.

The ``AUX`` tag is deprecated as it is redundant to ``DATA``, and has
the limiting property of implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has brought
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we are not including those files to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause
verification failures of Manifests. To account for this, Infra could
provide ``IGNORE`` entries to allow them to exist.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant,

- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned on being used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only compressed file existed, and conflicting if both
uncompressed and compressed variants existed. Furthermore, it has been
pointed out that ``DIST`` entries do not have uncompressed variant
either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of warm filesystem cache.

To improve speed on I/O and/or CPU-restrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.
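
One possible shape of such an incremental check, as a Python sketch.
The cache layout and the function names are hypothetical, and relying
on mtime and size alone is an assumption that trades some assurance
for speed:

```python
import os


def file_state(path):
    """Cheap fingerprint of a file: modification time and size."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def needs_recheck(path, entry, cache):
    """True if the file or its Manifest entry changed since the last run.

    `cache` maps a path to the (file_state, entry) pair recorded after
    the last successful verification of that path.
    """
    return cache.get(path) != (file_state(path), entry)
```

After a file verifies successfully, the tool would store
``(file_state(path), entry)`` in the cache so that the next run can
skip it unless either side changed.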

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to hold
for the ``Manifest`` file in every package directory:

- all files must be covered by the single ``Manifest`` file,

- all distfiles used by the package must be included,

- all files inside the ``files/`` subdirectory need to use
  the ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files need to use the ``EBUILD`` tag,

- the ``metadata.xml`` and ``ChangeLog`` files need to use
  the ``MISC`` tag,

- the Manifest can be signed to provide authenticity verification,

- an uncompressed Manifest must always exist, and a compressed Manifest
  of identical content may be present.

Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.


Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 — fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
   (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
  2017-11-02 19:11 ` [gentoo-dev] [v1.0.3] " Michał Górny
@ 2017-11-02 23:43   ` Robin H. Johnson
  2017-11-05 21:10     ` Michał Górny
  0 siblings, 1 reply; 32+ messages in thread
From: Robin H. Johnson @ 2017-11-02 23:43 UTC (permalink / raw
  To: gentoo-dev

[-- Attachment #1: Type: text/plain, Size: 5057 bytes --]

On Thu, Nov 02, 2017 at 08:11:59PM +0100, Michał Górny wrote:
> Next version. Now without MISC/OPTIONAL, and with many clarifications.
Huge improvements in this version, I found it much easier to understand.

Nits: 
- please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
  some monospace fonts.

Further items inline:
> Directory tree coverage
> -----------------------
...
> The file entries (except for ``IGNORE``) can be specified for regular
> files only. Symbolic links are followed when opening files
> and traversing directories. It is an error to specify an entry for
> a different file type. If the tree contain files of other types
> that are not otherwise ignored, they need to be covered by an explicit
> ``IGNORE``.
> 
> All the local (non-``DIST``) files covered by a Manifest tree must
> reside on the same filesystem. It is an error to specify entries
> applying to files on another filesystem. If subdirectories
> that are not otherwise ignored reside on a different filesystem, they
> must be explicitly excluded via ``IGNORE``.
I would prefer this to say:
'If files that are not otherwise ignored reside on a different
filesystem', as expanded from sub-directories.  
This implicitly forbids following a symlink that crosses a filesystem
boundary, and then matches the similar part of 'Tree layout
restrictions'.

> Rationale
> =========
...
> Tree layout restrictions
> ------------------------
> 
> The algorithm is meant to work primarily with ebuild repositories which
> normally contain only files and directories. Directories provide
> no useful metadata for verification, and specifying special entries
> for additional file types is purposeless. Therefore, the specification
> is restricted to dealing with regular files.
> 
> The Gentoo repository does not use symbolic links. Some Gentoo
> repositories do, however. To provide a simple solution for dealing with
> symlinks without having to take care to implement special handling for
> them, the common behavior of implicitly resolving them is used.
> Therefore, symbolic links to files are stored as if they were regular
> files, and symbolic links to directories are followed as if they were
> regular directories.
> 
> Dotfiles are implicitly ignored as that is a common notion used
> in software written for POSIX systems. All other common filenames
> require explicit ``IGNORE`` lines.
'common' in the second sentence seems odd. What about uncommon
filenames? Maybe just s/other common filenames/other filenames/.

> An ability to inject additional ignore entries is provided to account
> for site configuration affecting the repository tree — placing
> additional files in it, skipping some of the categories from syncing.
Mention that the package manager may provide wildcards or regex in the
additional entries. Eg: 'IGNORE **/metadata.xml' 

> Non-strict Manifest verification
> --------------------------------
...
> The cases for stripping unnecessary files mostly focused around space
> savings. For this purpose, stripping ``metadata.xml`` and similar files
> has little value. It is much more common for users to strip whole
> categories which can not be handled via the ``MISC`` type, and needs
> a dedicated package manager mechanism. The same mechanism can also
> handle files that used the ``MISC`` type.
Exclusion by package does happen as well. A list of categories or
packages can be used for both the rsync exclusion and the IGNORE.

> Splitting distfile checksums from file checksums
> ------------------------------------------------
> 
> Another problem with the current Manifest format is that the checksums
> for fetched files are combined with checksums for local files
> in a single file inside the package directory. It has been specifically
> pointed out that:
> 
> - since distfiles are sometimes reused across different packages,
>   the repeating checksums are redundant,
Comment: 8.4% of all DIST entries are duplicate, representing a 2MiB
saving in tree size (25MiB of DIST entries altogether).

> - mirror admins were interested in the possibility of verifying all
>   the distfiles with a single tool.
> 
> This specification does not provide a clean solution to this problem.
> It technically permits moving ``DIST`` entries to higher-level Manifests
> but the usefulness of such a solution is doubtful.
This solution would require the package manager to consider
higher-level Manifests or all Manifests in the tree when searching for
the DIST entry. The most useful implementation of this would be for the
git->rsync process to move all DIST entries elsewhere (metadata/ maybe).

Either way, this would have many downsides, and make manual work on the
Manifest DIST entries painful.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


* Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
  2017-11-02 23:43   ` Robin H. Johnson
@ 2017-11-05 21:10     ` Michał Górny
  2017-11-06 20:42       ` Robin H. Johnson
  0 siblings, 1 reply; 32+ messages in thread
From: Michał Górny @ 2017-11-05 21:10 UTC (permalink / raw
  To: gentoo-dev

On Thu, 02.11.2017 at 23:43 +0000, Robin H. Johnson wrote:
> On Thu, Nov 02, 2017 at 08:11:59PM +0100, Michał Górny wrote:
> > Next version. Now without MISC/OPTIONAL, and with many clarifications.
> 
> Huge improvements in this version, I found it much easier to understand.
> 
> Nits: 
> - please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
>   some monospace fonts.

Done. Also replaced '—' for consistency.

> 
> Further items inline:
> > Directory tree coverage
> > -----------------------
> 
> ...
> > The file entries (except for ``IGNORE``) can be specified for regular
> > files only. Symbolic links are followed when opening files
> > and traversing directories. It is an error to specify an entry for
> > a different file type. If the tree contain files of other types
> > that are not otherwise ignored, they need to be covered by an explicit
> > ``IGNORE``.
> > 
> > All the local (non-``DIST``) files covered by a Manifest tree must
> > reside on the same filesystem. It is an error to specify entries
> > applying to files on another filesystem. If subdirectories
> > that are not otherwise ignored reside on a different filesystem, they
> > must be explicitly excluded via ``IGNORE``.
> 
> I would prefer this to say:
> 'If files that are not otherwise ignored reside on a different
> filesystem', as expanded from sub-directories.  
> This implicitly forbids following a symlink that crosses a filesystem
> boundary, and then matches the similar part of 'Tree layout
> restrictions'.

I went for something even more explicit:

| If files or directories that are not otherwise ignored reside
| on a different filesystem, or symbolic links point to targets
| on a different filesystem, they must be explicitly excluded
| via ``IGNORE``.


> 
> > Rationale
> > =========
> 
> ...
> > Tree layout restrictions
> > ------------------------
> > 
> > The algorithm is meant to work primarily with ebuild repositories which
> > normally contain only files and directories. Directories provide
> > no useful metadata for verification, and specifying special entries
> > for additional file types is purposeless. Therefore, the specification
> > is restricted to dealing with regular files.
> > 
> > The Gentoo repository does not use symbolic links. Some Gentoo
> > repositories do, however. To provide a simple solution for dealing with
> > symlinks without having to take care to implement special handling for
> > them, the common behavior of implicitly resolving them is used.
> > Therefore, symbolic links to files are stored as if they were regular
> > files, and symbolic links to directories are followed as if they were
> > regular directories.
> > 
> > Dotfiles are implicitly ignored as that is a common notion used
> > in software written for POSIX systems. All other common filenames
> > require explicit ``IGNORE`` lines.
> 
> 'common' in the second sentence seems odd. What about uncommon
> filenames? Maybe just s/other common filenames/other filenames/.

Done. The idea was to say 'do not put IGNORE for corner cases which are
better handled via PM config' but I guess it's not necessary here.

> 
> > An ability to inject additional ignore entries is provided to account
> > for site configuration affecting the repository tree — placing
> > additional files in it, skipping some of the categories from syncing.
> 
> Mention that the package manager may provide wildcards or regex in the
> additional entries. Eg: 'IGNORE **/metadata.xml' 

Done.

| This configuration can extend beyond the limits of this GLEP,
| e.g. by allowing wildcards or regular expressions.

> 
> > Non-strict Manifest verification
> > --------------------------------
> 
> ...
> > The cases for stripping unnecessary files mostly focused around space
> > savings. For this purpose, stripping ``metadata.xml`` and similar files
> > has little value. It is much more common for users to strip whole
> > categories which can not be handled via the ``MISC`` type, and needs
> > a dedicated package manager mechanism. The same mechanism can also
> > handle files that used the ``MISC`` type.
> 
> Exclusion by package does happen as well. A list of categories or
> packages can be used for both the rsync exclusion and the IGNORE.

Rewritten to:

| It is much more common for users to strip whole packages
| or categories. The ``MISC`` type is not suitable for that,
| and so a dedicated package manager mechanism needs to be developed
| instead; possibly combining it with rsync exclusion list. The same
| mechanism can also handle files that historically used the ``MISC``
| type.

But it's merely a rationale, so I'd rather not spend another hour trying
to cover every corner case in it.

> 
> > Splitting distfile checksums from file checksums
> > ------------------------------------------------
> > 
> > Another problem with the current Manifest format is that the checksums
> > for fetched files are combined with checksums for local files
> > in a single file inside the package directory. It has been specifically
> > pointed out that:
> > 
> > - since distfiles are sometimes reused across different packages,
> >   the repeating checksums are redundant,
> 
> Comment: 8.4% of all DIST entries are duplicate, representing a 2MiB
> saving in tree size (25MiB of DIST entries altogether).

Included as footnote:

.. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
   at the time of writing are duplicate, representing 2 MiB
   out of 25 MiB of DIST entries altogether.

> 
> > - mirror admins were interested in the possibility of verifying all
> >   the distfiles with a single tool.
> > 
> > This specification does not provide a clean solution to this problem.
> > It technically permits moving ``DIST`` entries to higher-level Manifests
> > but the usefulness of such a solution is doubtful.
> 
> This solution would require the package manager to consider
> higher-level Manifests or all Manifests in the tree when searching for
> the DIST entry. The most useful implementation of this would be for the
> git->rsync process to move all DIST entries elsewhere (metadata/ maybe).

Technically speaking, the package manager needs to consider parent
Manifests anyway in order to verify the deeper Manifests, and I think we
can reasonably assume it will keep them cached.

> 
> Either way, this would have many downsides, and make manual work on the
> Manifest DIST entries painful.

That's what 'doubtful usefulness' means ;-P.

-- 
Best regards,
Michał Górny




* Re: [gentoo-dev] [v1.0.3] GLEP 74: Full-tree verification using Manifest files
  2017-11-05 21:10     ` Michał Górny
@ 2017-11-06 20:42       ` Robin H. Johnson
  0 siblings, 0 replies; 32+ messages in thread
From: Robin H. Johnson @ 2017-11-06 20:42 UTC (permalink / raw
  To: gentoo-dev

On Sun, Nov 05, 2017 at 10:10:32PM +0100, Michał Górny wrote:
> > Nits: 
> > - please stick to ASCII ellipsis. The unicode ellipsis is unreadable in
> >   some monospace fonts.
> Done. Also replaced '—' for consistency.
I wasn't even aware you had used a different dash, it was rendered
identically here, definitely thanks for fixing that too.

> > Further items inline:
> > > Directory tree coverage
> > > -----------------------
> I've went for something even more explicit:
> | If files or directories that are not otherwise ignored reside
> | on a different filesystem, or symbolic links point to targets
> | on a different filesystem, they must be explicitly excluded
> | via ``IGNORE``.
+1, resolves the concern very well, nice and clear.

> > > Tree layout restrictions
> > > ------------------------
> > 'common' in the second sentence seems odd. What about uncommon
> > filenames? Maybe just s/other common filenames/other filenames/.
> Done. The idea was to say 'do not put IGNORE for corner cases which are
> better handled via PM config' but I guess it's not necessary here.
Yes. Generally, IGNORE entries in Manifest should be for files
distributed alongside the Manifest. We can say that, as common special
cases, local/distfiles/packages/lost+found are also known ignores,
since they have a previously-defined meaning in the repo (along with the
old timestamp files).

> > > Non-strict Manifest verification
> > > --------------------------------
> Rewritten to:
> | It is much more common for users to strip whole packages
> | or categories. The ``MISC`` type is not suitable for that,
> | and so a dedicated package manager mechanism needs to be developed
> | instead; possibly combining it with rsync exclusion list. The same
> | mechanism can also handle files that historically used the ``MISC``
> | type.
> But it's merely a rationale, so I'd rather not spend another hour trying
> to cover every corner case in it.
+1. Maybe cover it with a single sentence, "As an example, the package
manager may choose to generate both the rsync exclusion list and
Manifest IGNORE based on a source list"

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Asst. Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


* Re: [gentoo-dev] [v1.0.4] GLEP 74: Full-tree verification using Manifest files
  2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
                   ` (5 preceding siblings ...)
  2017-11-02 19:11 ` [gentoo-dev] [v1.0.3] " Michał Górny
@ 2017-11-06 21:53 ` Michał Górny
  6 siblings, 0 replies; 32+ messages in thread
From: Michał Górny @ 2017-11-06 21:53 UTC (permalink / raw
  To: gentoo-dev

Hopefully the last version, after getting all the suggestions
from Robin.

On Thu, 26.10.2017 at 22:12 +0200, Michał Górny wrote:
> 
> ReST: https://dev.gentoo.org/~mgorny/tmp/glep-0074.rst
> HTML: https://dev.gentoo.org/~mgorny/tmp/glep-0074.html
> impl: https://github.com/mgorny/gemato/
> 

---
GLEP: 74
Title: Full-tree verification using Manifest files
Author: Michał Górny <mgorny@gentoo.org>,
        Robin Hugh Johnson <robbat2@gentoo.org>,
        Ulrich Müller <ulm@gentoo.org>
Type: Standards Track
Status: Draft
Version: 1
Created: 2017-10-21
Last-Modified: 2017-11-06
Post-History: 2017-10-26
Content-Type: text/x-rst
Requires: 59, 61
Replaces: 44, 58, 60
---

Abstract
========

This GLEP extends the Manifest file format to cover full-tree file
integrity and authenticity checks. The format aims to be future-proof,
efficient and provide means of backwards compatibility.


Motivation
==========

The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
means of verifying the integrity of distfiles and package files
in Gentoo. Combined with OpenPGP signatures, they provide means to
ensure the authenticity of the covered files. However, as noted
in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
authenticity verification as they do not cover any files outside
the package directory. In particular, they provide multiple ways
for a third party to inject malicious code into the ebuild environment.

Historically, the topic of providing authenticity coverage for the whole
repository has been mentioned multiple times. The most noteworthy efforts
are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
They were accepted by the Council in 2010 but have never been
implemented. When potential implementation work started in 2017, a new
discussion about the specification arose. It prompted the creation
of a competing GLEP that would provide a redesigned alternative to
the old GLEPs.

This specification is designed with the following goals in mind:

1. It should provide means to ensure the authenticity of the complete
   repository, including preventing the injection of additional files.

2. The format should be universal enough to work both for the Gentoo
   repository and third-party repositories of different characteristics.

3. The Manifest files should be verifiable stand-alone, that is without
   knowing any details about the underlying repository format.


Specification
=============

Manifest file format
--------------------

This specification reuses and extends the Manifest file format defined
in GLEP 44 [#GLEP44]_. For the purpose of it, the *file type* field is
repurposed as a generic *tag* that could also indicate additional
(non-checksum) metadata. Appropriately, those tags can be followed by
other space-separated values.

Unless specified otherwise, the paths used in the Manifest files
are relative to the directory containing the Manifest file. The paths
must not reference the parent directory (``..``).


Manifest file locations and nesting
-----------------------------------

The ``Manifest`` file located in the root directory of the repository
is called top-level Manifest, and it is used to perform the full-tree
verification. In order to verify the authenticity, it must be signed
using OpenPGP, using the armored cleartext format.

The top-level Manifest may reference sub-Manifests contained
in subdirectories of the repository. The sub-Manifests are traditionally
named ``Manifest``; however, the implementation must support arbitrary
names, including the possibility of multiple (split) Manifests
for a single directory. The sub-Manifest can only cover the files inside
the directory tree where it resides.

The sub-Manifest can also be signed using OpenPGP armored cleartext
format. However, the signature verification can be omitted if it is
covered by a signed top-level Manifest.


Directory tree coverage
-----------------------

The specification provides three ways of skipping Manifest verification
of specific files and directories (recursively):

1. explicit ``IGNORE`` entries in Manifest files,

2. injected ignore paths via package manager configuration,

3. using names starting with a dot (``.``) which are always skipped.

All files that are not ignored must be covered by at least one
of the Manifests.

A single file may be matched by multiple identical or equivalent
Manifest entries, if and only if the entries have the same semantics,
specify the same size and the checksums common to both entries match.
It is an error for a single file to be matched by multiple entries
of different semantics, file size or checksum values. It is an error
to specify another entry for a file matching ``IGNORE``, or one of its
subdirectories.
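
As an illustration only (it is not part of the specification), the
duplicate rule could be checked roughly as follows. The ``Entry`` tuple
and the ``entries_compatible`` helper are hypothetical names, and
treating ``DATA``, ``MISC``, ``EBUILD`` and ``AUX`` as having the same
semantics is an assumption based on the deprecated tags being described
as equivalent to ``DATA``. Collisions with ``IGNORE`` are not handled
in this sketch.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    """Hypothetical in-memory form of one checksum-carrying entry."""
    tag: str                   # e.g. "DATA", "EBUILD", "MANIFEST"
    path: str                  # path relative to the repository root
    size: int
    checksums: Dict[str, str]  # algorithm name -> hex digest


# Assumption: these tags share the semantics of DATA.
DATA_LIKE = {"DATA", "MISC", "EBUILD", "AUX"}


def entries_compatible(a: Entry, b: Entry) -> bool:
    """Return True if two entries matching the same file may coexist."""
    same_semantics = a.tag == b.tag or (a.tag in DATA_LIKE and b.tag in DATA_LIKE)
    if not same_semantics or a.size != b.size:
        return False
    # Only the checksums common to both entries have to match.
    common = a.checksums.keys() & b.checksums.keys()
    return all(a.checksums[alg] == b.checksums[alg] for alg in common)
```

With this sketch, an entry carrying only ``SHA256`` is compatible with
one carrying ``SHA256`` and ``SHA512`` as long as the size and
the ``SHA256`` value agree.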

The file entries (except for ``IGNORE``) can be specified for regular
files only. Symbolic links are followed when opening files
and traversing directories. It is an error to specify an entry for
a different file type. If the tree contains files of other types
that are not otherwise ignored, they need to be covered by an explicit
``IGNORE``.

All the local (non-``DIST``) files covered by a Manifest tree must
reside on the same filesystem. It is an error to specify entries
applying to files on another filesystem. If files or directories that
are not otherwise ignored reside on a different filesystem, or symbolic
links point to targets on a different filesystem, they must
be explicitly excluded via ``IGNORE``.


File verification
-----------------

When verifying a file against the Manifest, the following rules are
used:

1. If the file is covered directly or indirectly by an entry
   of the ``IGNORE`` type, the verification always succeeds.

2. If the file is covered by an entry of the ``MANIFEST``, ``DATA``,
   ``MISC``, ``EBUILD`` or ``AUX`` type:

   a. if the file is not present, then the verification fails,

   b. if the file is present but has a different size or one
      of the checksums does not match, the verification fails,

   c. otherwise, the verification succeeds.

3. If the file is present but not listed in Manifest, the verification
   fails.

Unless specified otherwise, the package manager must not allow using
any files for which the verification failed. The package manager may
reject any package or even the whole repository if it may refer to files
for which the verification failed.
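
The three rules can be sketched for a single file as shown below. This
is a minimal illustration rather than the reference implementation:
``verify_file`` and its parameters are hypothetical, only two checksum
algorithms are mapped, and the whole file is read into memory for
brevity.

```python
import hashlib
import os
from typing import Dict, Optional

# Assumed mapping from Manifest algorithm names to hashlib names;
# only two algorithms are shown to keep the sketch short.
HASHLIB_NAMES = {"SHA256": "sha256", "SHA512": "sha512"}


def verify_file(path: str, ignored: bool,
                size: Optional[int] = None,
                checksums: Optional[Dict[str, str]] = None) -> bool:
    """Apply rules 1-3 to one file.

    `size` and `checksums` are None when no Manifest entry lists the file.
    """
    if ignored:                          # rule 1: IGNORE always passes
        return True
    exists = os.path.isfile(path)        # follows symbolic links
    if size is None:                     # not listed in any Manifest
        return not exists                # rule 3: a stray file fails
    if not exists:                       # rule 2a: listed but missing
        return False
    if os.path.getsize(path) != size:    # rule 2b: size mismatch
        return False
    with open(path, "rb") as f:
        data = f.read()
    for alg, expected in (checksums or {}).items():
        if hashlib.new(HASHLIB_NAMES[alg], data).hexdigest() != expected:
            return False                 # rule 2b: checksum mismatch
    return True                          # rule 2c
```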


Timestamp verification
----------------------

The Manifest file can contain a ``TIMESTAMP`` entry to account
for attacks against tree update distribution. If such an entry
is present, it should be updated every time at least one
of the Manifests changes. Every unique timestamp value must correspond
to a single tree state.

During the verification process, the client should compare the timestamp
against the update time obtained from a local clock or a trusted time
source. If the comparison result indicates that the Manifest at the time
of receiving was already significantly outdated, the client should
either fail the verification or require manual confirmation from user.

Furthermore, the Manifest provider may employ additional methods
of distributing the timestamps of recently generated Manifests
using a secure channel from a trusted source for exact comparison.
The exact details of such a solution are outside the scope of this
specification.
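
A client-side freshness check might look like the sketch below. It
assumes the ``strftime()`` format given for the ``TIMESTAMP`` tag;
the ``timestamp_is_fresh`` name and the one-day default threshold are
illustrative choices, since the specification does not define what
"significantly outdated" means.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format given for the TIMESTAMP tag


def timestamp_is_fresh(value: str, now: datetime,
                       max_age: timedelta = timedelta(days=1)) -> bool:
    """Return True if the Manifest timestamp is at most `max_age` old.

    `now` must be timezone-aware; `max_age` is an arbitrary threshold.
    """
    stamp = datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return now - stamp <= max_age
```

A caller would typically pass ``datetime.now(timezone.utc)`` or a value
from a trusted time source as ``now``.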


Modern Manifest tags
--------------------

The Manifest files can specify the following tags:

``TIMESTAMP <iso8601>``
  Specifies a timestamp of when the Manifest file was last updated.
  The timestamp must be a valid second-precision ISO8601 extended format
  combined date and time in UTC timezone, i.e. using the following
  ``strftime()`` format string: ``%Y-%m-%dT%H:%M:%SZ``. Optionally used
  in the top-level Manifest file. The package manager can use it
  to detect an outdated repository checkout as described in `Timestamp
  verification`_.

``MANIFEST <path> <size> <checksums>...``
  Specifies a sub-Manifest. The sub-Manifest must be verified like
  a regular file. If the verification succeeds, the entries from
  the sub-Manifest are included for verification as described
  in `Manifest file locations and nesting`_.

``IGNORE <path>``
  Ignores a subdirectory or file from Manifest checks. If the specified
  path is present, it and its contents are omitted from the Manifest
  verification (always pass). *Path* must be a plain file or directory
  path without a trailing slash, and must not contain wildcards.

``DATA <path> <size> <checksums>...``
  Specifies a regular file subject to Manifest verification. The file
  is required to pass verification. Used for all files that do not match
  any other type.

``DIST <filename> <size> <checksums>...``
  Specifies a distfile entry used to verify files fetched as part
  of ``SRC_URI``. The filename must match the filename used to store
  the fetched file as specified in the PMS [#PMS-FETCH]_. The package
  manager must reject the fetched file if it fails verification.
  ``DIST`` entries apply to all packages below the Manifest file
  specifying them.


Deprecated Manifest tags
------------------------

For backwards compatibility, the following tags are additionally
allowed at the package directory level:

``EBUILD <filename> <size> <checksums>...``
  Equivalent to the ``DATA`` type.

``MISC <path> <size> <checksums>...``
  Equivalent to the ``DATA`` type. Historically indicated that
  the package manager may ignore a verification failure if operating
  in non-strict mode. However, that behavior is deprecated.

``AUX <filename> <size> <checksums>...``
  Equivalent to the ``DATA`` type, except that the filename is relative
  to ``files/`` subdirectory.


Algorithm for full-tree verification
------------------------------------

In order to perform full-tree verification, the following algorithm
can be used:

1. Collect all files present in the repository into *present* set.

2. Start at the top-level Manifest file. Verify its OpenPGP signature.
   Optionally verify the ``TIMESTAMP`` entry if present as specified
   in `Timestamp verification`_. Remove the top-level Manifest
   from the *present* set.

3. Process all ``MANIFEST`` entries, recursively. Verify the Manifest
   files according to `file verification`_ section, and include their
   entries in the current Manifest entry list (using paths relative
   to directories containing the Manifests).

4. Process all ``IGNORE`` entries. Remove any paths matching them
   from the *present* set.

5. Collect all files covered by ``DATA``, ``MISC``, ``EBUILD``
   and ``AUX`` entries into the *covered* set.

6. Verify the entries in *covered* set for incompatible duplicates
   and collisions with ignored files as explained in `Directory tree
   coverage`_.

7. Verify all the files in the union of the *present* and *covered*
   sets, according to `file verification`_ section.
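
Steps 4 to 7 reduce to a few set operations once the Manifests have
been loaded. The sketch below assumes paths are already normalized
relative to the repository root; ``is_ignored`` and ``paths_to_verify``
are hypothetical helpers, and the implicit skipping of dotfiles as well
as the duplicate-entry check are left out.

```python
from typing import Iterable, Set


def is_ignored(path: str, ignores: Iterable[str]) -> bool:
    """True if `path` equals an IGNORE path or lies below it."""
    return any(path == ign or path.startswith(ign + "/") for ign in ignores)


def paths_to_verify(present: Set[str], covered: Set[str],
                    ignores: Set[str]) -> Set[str]:
    """Steps 4-7 reduced to set operations.

    Raises ValueError when a covered file collides with an IGNORE entry.
    """
    # Step 4: drop ignored paths from the *present* set.
    present = {p for p in present if not is_ignored(p, ignores)}
    # Step 6 (partially): covered files must not collide with IGNORE.
    clashes = {p for p in covered if is_ignored(p, ignores)}
    if clashes:
        raise ValueError(f"entries collide with IGNORE: {sorted(clashes)}")
    # Step 7: everything present or covered gets verified, so stray
    # files (present only) and missing files (covered only) both fail.
    return present | covered
```

Verifying the union rather than only the *covered* set is what catches
files injected into the tree without a Manifest entry.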


Algorithm for finding parent Manifests
--------------------------------------

In order to find the top-level Manifest from the current directory
the following algorithm can be used:

1. Store the current directory as *original* and the device ID
   of the containing filesystem (``st_dev``) as *startdev*.

2. If the device ID of the containing filesystem (``st_dev``)
   of the current directory is different than *startdev*, stop.

3. If the current directory contains a ``Manifest`` file:

   a. If an ``IGNORE`` entry in the ``Manifest`` file covers
      the *original* directory (or one of the parent directories), stop.

   b. Otherwise, store the current directory as *last_found*.

4. If the current directory is the root system directory (``/``), stop.

5. Otherwise, enter the parent directory and jump to step 2.

Once the algorithm stops, *last_found* will contain the relevant
top-level Manifest. If *last_found* is null, then the directory tree
does not contain any valid top-level Manifest candidates and one should
be created in the *original* directory.

Once the top-level Manifest is found, its ``MANIFEST`` entries should
be used to find any sub-Manifests below the top-level Manifest,
up to and including the *original* directory. Note that those
sub-Manifests can use different filenames than ``Manifest``.
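
The upward search can be sketched as follows. The function name and
the ``ignores_original`` callback are hypothetical; the callback stands
in for parsing the candidate Manifest and checking its ``IGNORE``
entries, and compressed variants such as ``Manifest.gz`` are not
considered here.

```python
import os
from typing import Callable, Optional


def find_top_level_manifest(
        original: str,
        ignores_original: Callable[[str, str], bool] = lambda manifest, orig: False,
) -> Optional[str]:
    """Walk upwards from `original`; return the top-level Manifest directory.

    `ignores_original(manifest_path, original)` is a hypothetical callback
    reporting whether an IGNORE entry in that Manifest covers `original`.
    Returns None when no candidate was found.
    """
    original = os.path.realpath(original)
    startdev = os.stat(original).st_dev            # step 1
    current, last_found = original, None
    while True:
        if os.stat(current).st_dev != startdev:    # step 2
            break
        manifest = os.path.join(current, "Manifest")
        if os.path.isfile(manifest):               # step 3
            if ignores_original(manifest, original):
                break                              # step 3a
            last_found = current                   # step 3b
        parent = os.path.dirname(current)
        if parent == current:                      # step 4: reached "/"
            break
        current = parent                           # step 5
    return last_found
```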


Checksum algorithms
-------------------

This section is informational only. Specifying the exact set
of supported algorithms is outside the scope of this specification.

The algorithm names reserved at the time of writing are:

- ``MD5`` [#MD5]_,
- ``RMD160`` -- RIPEMD-160 [#RIPEMD160]_,
- ``SHA1`` [#SHS]_,
- ``SHA256`` and ``SHA512`` -- SHA-2 family of hashes [#SHS]_,
- ``WHIRLPOOL`` [#WHIRLPOOL]_,
- ``BLAKE2B`` and ``BLAKE2S`` -- BLAKE2 family of hashes [#BLAKE2]_,
- ``SHA3_256`` and ``SHA3_512`` -- SHA-3 family of hashes [#SHA3]_,
- ``STREEBOG256`` and ``STREEBOG512`` -- Streebog family of hashes
  [#STREEBOG]_.

The method of introducing new hashes is defined by GLEP 59 [#GLEP59]_.
It is recommended that any new hashes are named after the Python
``hashlib`` module algorithm names, transformed into uppercase.
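
The naming recommendation amounts to a case conversion, as the sketch
below shows. Both helper names are hypothetical. Note that some of
the older reserved names (for example ``RMD160``) do not follow this
pattern and would need an explicit mapping table instead.

```python
import hashlib


def manifest_hash_name(hashlib_name: str) -> str:
    """Derive a Manifest algorithm name from a hashlib algorithm name."""
    return hashlib_name.upper()


def new_hash(manifest_name: str):
    """Create a hashlib object for a name that follows the recommendation."""
    return hashlib.new(manifest_name.lower())
```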


Manifest compression
--------------------

The topic of Manifest file compression is covered by GLEP 61 [#GLEP61]_.
This section merely addresses interoperability issues between Manifest
compression and this specification.

The compressed Manifest files are required to be suffixed for their
compression algorithm. This suffix should be used to recognize
the compression and decompress Manifests transparently. The exact list
of algorithms and their corresponding suffixes are outside the scope
of this specification.

Whenever this specification refers to top-level Manifest file,
the implementation should account for compressed variants of this file
with appropriate suffixes (e.g. ``Manifest.gz``).

Whenever this specification refers to sub-Manifests, they can use any
names but are also required to use a specific compression suffix.
The ``MANIFEST`` entries are required to specify the full name including
compression suffix, and the verification is performed on the compressed
file.

The specification permits uncompressed Manifests to exist alongside
their compressed counterparts, and multiple compressed formats
to coexist. If that is the case, the files must have the same
uncompressed content and the implementation is free to choose either
of the files using the same base name.
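
Transparent decompression by suffix could be sketched as below.
The suffix table is purely an assumption for illustration, since
the list of algorithms and suffixes is outside the scope of this
specification; ``open_manifest`` is a hypothetical helper.

```python
import bz2
import gzip
import lzma
from typing import IO

# Hypothetical suffix table; the GLEP leaves the actual list out of scope.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_manifest(path: str) -> IO[str]:
    """Open a Manifest as text, decompressing according to its suffix."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Checksums from a ``MANIFEST`` entry would still be computed over
the compressed bytes on disk, not over the text returned here.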


An example Manifest file (informational)
----------------------------------------

An example top-level Manifest file for the Gentoo repository would have
the following content::

    TIMESTAMP 2017-10-30T10:11:12Z
    IGNORE distfiles
    IGNORE local
    IGNORE lost+found
    IGNORE packages
    MANIFEST app-accessibility/Manifest 14821 SHA256 1b5f.. SHA512 f7eb..
    ...
    MANIFEST eclass/Manifest.gz 50812 SHA256 8c55.. SHA512 2915..
    ...

An example modern Manifest (disregarding backwards compatibility)
for a package directory would have the following content::

    DATA SphinxTrain-0.9.1-r1.ebuild 932 SHA256 3d3b.. SHA512 be4d..
    DATA SphinxTrain-1.0.8.ebuild 912 SHA256 f681.. SHA512 0749..
    DATA metadata.xml 664 SHA256 97c6.. SHA512 1175..
    DATA files/gcc.patch 816 SHA256 b56e.. SHA512 2468..
    DATA files/gcc34.patch 333 SHA256 c107.. SHA512 9919..
    DIST SphinxTrain-0.9.1-beta.tar.gz 469617 SHA256 c1a4.. SHA512 1b33..
    DIST sphinxtrain-1.0.8.tar.gz 8925803 SHA256 548e.. SHA512 465d..
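
Lines like the ones in these examples can be split into fields with
a few lines of code. The sketch below is illustrative only:
``parse_line`` and the dictionary layout are hypothetical, and it
assumes that paths contain no whitespace and that the line is not
an elision marker or blank.

```python
from typing import Any, Dict


def parse_line(line: str) -> Dict[str, Any]:
    """Split one Manifest line into its tag and tag-specific fields."""
    tag, *rest = line.split()
    if tag in ("TIMESTAMP", "IGNORE"):
        # Tags that carry a single value and no checksums.
        return {"tag": tag, "value": rest[0]}
    # MANIFEST, DATA, DIST and the deprecated tags share one layout:
    # <path> <size> followed by alternating algorithm names and digests.
    path, size, *sums = rest
    return {"tag": tag, "path": path, "size": int(size),
            "checksums": dict(zip(sums[0::2], sums[1::2]))}
```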


Rationale
=========

Stand-alone format
------------------

The first question that needed to be asked before proceeding with
the design was whether the Manifest file format was supposed to be
stand-alone, or tightly bound to the repository format.

The stand-alone format has been selected because of its three
advantages:

1. It is more future-proof. If an incompatible change to the repository
   format is introduced, only developers need to upgrade the tools
   they use to generate the Manifests. The tools used to verify
   the updated Manifests will continue to work.

2. It is more flexible and universal. With a dedicated tool,
   the Manifest files can be used to sign and verify arbitrary file
   sets.

3. It keeps the verification tool simpler. In particular, we can easily
   write an independent verification tool that could work on any
   distribution without needing to depend on a package manager
   implementation or rewrite parts of it.

Designing a stand-alone format requires that the Manifest carries enough
information to perform the verification following all the rules specific
to the Gentoo repository.


Tree design
-----------

The second important point of the design was determining whether
the Manifest files should be structured hierarchically, or independent.
Both options have their advantages.

In the hierarchical model, each sub-Manifest file is covered by a higher
level Manifest. As a result, only the top-level Manifest has to be
OpenPGP-signed, and subsequent Manifests need to be only verified by
checksum stored in the parent Manifest. This has the following
implications:

- Verifying any set of files in the repository requires using checksums
  from the most relevant Manifests and the parent Manifests.

- The OpenPGP signature of the top-level Manifest needs to be verified
  only once per process.

- Altering any set of files requires updating the relevant Manifests,
  and their parent Manifests up to the top-level Manifest, and signing
  the last one.

- As a result, the top-level Manifest changes on every commit,
  and various middle-level Manifests change (and need to be transferred)
  frequently.

In the independent model, each sub-Manifest file is independent
of the parent Manifests. As a result, each of them needs to be signed
and verified independently. However, the parent Manifests still need
to list sub-Manifests (albeit without verification data) in order
to detect removal or replacement of subdirectories. This has
the following implications:

- Verifying any set of files in the repository requires using checksums
  and verifying signatures of the most relevant Manifest files.

- Altering any set of files requires updating the relevant Manifests
  and signing them again.

- Parent Manifests are updated only when Manifests are added or removed
  from subdirectories. As a result, they change infrequently.

While both models have their advantages, the hierarchical model was
selected because it reduces the number of OpenPGP operations,
which are comparatively costly, to the minimum.


Tree layout restrictions
------------------------

The algorithm is meant to work primarily with ebuild repositories which
normally contain only files and directories. Directories provide
no useful metadata for verification, and specifying special entries
for additional file types is purposeless. Therefore, the specification
is restricted to dealing with regular files.

The Gentoo repository does not use symbolic links. Some Gentoo
repositories do, however. To provide a simple solution for dealing with
symlinks without having to take care to implement special handling for
them, the common behavior of implicitly resolving them is used.
Therefore, symbolic links to files are stored as if they were regular
files, and symbolic links to directories are followed as if they were
regular directories.

Dotfiles are implicitly ignored as that is a common notion used
in software written for POSIX systems. All other filenames require
explicit ``IGNORE`` lines.

An ability to inject additional ignore entries is provided to account
for site configuration affecting the repository tree -- placing
additional files in it, skipping some of the categories from syncing.
This configuration can extend beyond the limits of this GLEP,
e.g. by allowing wildcards or regular expressions.
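
As a sketch of what such an extension might look like, a package
manager could match injected ignore patterns with shell-style
wildcards. The ``injected_ignore`` helper and the pattern syntax are
assumptions; nothing here is mandated by this GLEP. In Python's
``fnmatch``, ``*`` also matches ``/``, so a pattern like
``**/metadata.xml`` matches at any depth below the root.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def injected_ignore(path: str, patterns: Iterable[str]) -> bool:
    """True if a repository-relative path matches any site-configured pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```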

The algorithm is restricted to work on a single filesystem. This is
mostly relevant when scanning for top-level Manifest -- we do not want
to cross filesystem boundaries then. However, to ensure consistent
bidirectional behavior we also need to ban crossing them when
operating down the tree.

The directories and files on different filesystems need to be ignored
explicitly as implicitly skipping them would cause confusion.
In particular, tools might then claim that a file does not exist when
it clearly does because it was skipped due to filesystem boundaries.


File verification model
-----------------------

The verification model aims to provide full coverage against different
forms of attack. In particular, three different kinds of manipulation
are considered:

1. Alteration of the file content.

2. Removal of a file.

3. Addition of a new file.

In order to protect against all three, the system requires that all
files in the repository are listed in Manifests and verified against
them.
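
The three cases map onto a simple comparison between the Manifest
data and the files found on disk. The following is only a sketch:
it assumes a single SHA512 checksum per file and a hypothetical
``verify_tree`` helper, and it omits the size check and the parsing
of Manifest files.

```python
import hashlib
import os


def sha512_of(path):
    """Hash a file in chunks to keep memory use constant."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tree(root, expected, found):
    """Compare Manifest data with the files present on disk.

    expected: dict mapping relative path -> SHA512 hex digest
    found: relative paths present on disk (after applying ignores)
    """
    found = set(found)
    problems = {"modified": [], "missing": [], "stray": []}
    for rel, digest in sorted(expected.items()):
        if rel not in found:
            problems["missing"].append(rel)       # case 2: removal
        elif sha512_of(os.path.join(root, rel)) != digest:
            problems["modified"].append(rel)      # case 1: alteration
    problems["stray"] = sorted(found - set(expected))  # case 3: addition
    return problems
```

Verification succeeds only if all three lists are empty.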

As a special case, ignores are allowed to account for directories
that are not part of the repository but were traditionally placed inside
it. Those directories were ``distfiles``, ``local`` and ``packages``.
Ignores could also be used for VCS directories such as ``CVS``.


Non-strict Manifest verification
--------------------------------

Originally the Manifest2 format provided a special ``MISC`` tag that
was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
indicated that the Manifest verification failures could be ignored for
those files unless the package manager was working in strict mode.

The first versions of this specification continued the use of this tag.
However, after a long debate it was decided to deprecate it along with
the non-strict behavior, and require all files to strictly match.

Two arguments were mentioned for the usefulness of a ``MISC`` type:

1. being able to reduce the checkout size by stripping unnecessary
   files out, and

2. being able to update automatically generated files locally
   without causing unnecessary verification failures.

However, the usefulness of ``MISC`` in both cases is doubtful.

The cases for stripping unnecessary files mostly focused around space
savings. For this purpose, stripping ``metadata.xml`` and similar files
has little value. It is much more common for users to strip whole
packages or categories. The ``MISC`` type is not suitable for that,
and so a dedicated package manager mechanism needs to be developed
instead. The same mechanism can also handle files that historically used
the ``MISC`` type. As an example, the package manager may choose
to generate both the rsync exclusion list and Manifest ignore list
using a single source list.
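
A sketch of that idea, assuming the source list holds top-level
directory names and that the ignore list takes the same
``IGNORE <path>`` form as Manifest entries. The function name and
the exact output forms are illustrative only:

```python
def exclusion_lists(skipped_dirs):
    """Derive rsync excludes and ignore entries from one list."""
    cleaned = sorted({d.strip("/") for d in skipped_dirs if d.strip("/")})
    # A leading "/" anchors the rsync pattern at the transfer root,
    # and the trailing "/" restricts it to directories.
    rsync_args = ["--exclude=/{}/".format(d) for d in cleaned]
    ignore_lines = ["IGNORE {}".format(d) for d in cleaned]
    return rsync_args, ignore_lines
```

Generating both lists from one source keeps them from drifting apart,
which would otherwise show up as verification failures for
directories that were never synced.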

The cases for autogenerated files involve such cache files
as ``use.local.desc``. However, we cannot include ``md5-cache`` there
due to security concerns, which results in inconsistent cache handling.
Furthermore, the tools were historically modified to provide stable
output, which means that their content cannot change without
non-``MISC`` content being changed first. This practically defeats
the purpose of using ``MISC``.

Finally, the non-strict mode could be used as a means of attack.
Allowing missing or modified documentation files could be used
to spread misinformation, resulting in bad decisions made by the user.
A modified file could also be used e.g. to exploit vulnerabilities
of an XML parser.


Timestamp field
---------------

The top-level Manifest optionally allows using a ``TIMESTAMP`` tag
to include a generation timestamp in the Manifest. A similar feature
was originally proposed in GLEP 58 [#GLEP58]_.

A malicious third party may use the principles of exclusion or replay
[#C08]_ to deny an update to clients, while at the same time recording
the identity of clients to attack. The timestamp field can be used to
detect that.
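
A client-side staleness check could be sketched as follows. It
assumes that the ``TIMESTAMP`` value uses the ISO 8601 form
``YYYY-MM-DDThh:mm:ssZ``; the 24-hour threshold and both function
names are arbitrary choices made for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp_line(line):
    """Parse a 'TIMESTAMP <value>' line into an aware UTC datetime."""
    tag, value = line.split()
    if tag != "TIMESTAMP":
        raise ValueError("not a TIMESTAMP entry: {!r}".format(line))
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(manifest_time, now, max_age=timedelta(hours=24)):
    """Return True if the Manifest is older than the accepted age."""
    return now - manifest_time > max_age
```

A Manifest that stays stale across several syncs suggests that
the mirror is withholding or replaying updates.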

In order to provide a more complete protection, the Gentoo
Infrastructure should provide an ability to obtain the timestamps
of all Manifests from a recent timeframe over a secure channel
from a trusted source for comparison.

Strictly speaking, this information is already provided by the various
``metadata/timestamp*`` files that are already present. However,
including the value in the Manifest itself has little cost
and provides the ability to perform the verification stand-alone.

Furthermore, some of the timestamp files are added very late
in the distribution process, past the Manifest generation phase. Those
files will most likely receive ``IGNORE`` entries and will therefore
not be suitable for safe use.


New vs deprecated tags
----------------------

Out of the four types defined by Manifest2, only one is reused
and the remaining three are replaced by a single, universal ``DATA``
type.

The ``DIST`` tag is reused since the specification does not change
anything with regard to distfile handling.

The ``EBUILD`` tag could potentially be reused for generic file
verification data. However, it would be confusing if all the different
data files were marked as ``EBUILD``. Therefore, an equivalent ``DATA``
type was introduced as a replacement.

The ``MISC`` tag and the related non-strict mode have been removed
as being of little value, as detailed in the `Non-strict Manifest
verification`_ section.

The ``AUX`` tag is deprecated as it is redundant with ``DATA``, and has
the limiting property of an implicit ``files/`` path prefix.


Finding top-level Manifest
--------------------------

The development of a reference implementation for this GLEP has brought
the following problem: how to find all the relevant Manifests when
the Manifest tool is run inside a subdirectory of the repository?

One of the options would be to provide a bi-directional linking
of Manifests via a ``PARENT`` tag. However, that would not solve
the problem when a new Manifest file is being created.

Instead, an algorithm for iterating over parent directories is proposed.
Since there is no obligatory explicit indicator for the top-level
Manifest, the algorithm assumes that the top-level Manifest
is the highest ``Manifest`` in the directory hierarchy that can cover
the current directory. This generally makes sense since the Manifest
files are required to provide coverage for all subdirectories, so all
Manifests starting from that one need to be updated.

If independent Manifest trees are nested in the directory structure,
then an ``IGNORE`` entry needs to be used to separate them.

Since sub-Manifests can use any filenames, the Manifest finding
algorithm must not short-cut the procedure by storing all ``Manifest``
files along the parent directories. Instead, it needs to retrace
the relevant sub-Manifest files along ``MANIFEST`` entries
in the top-level Manifest.
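
The upward search can be sketched as below. This is a simplified
illustration with a hypothetical function name: it stops at
filesystem boundaries but does not evaluate ``IGNORE`` entries, so
it cannot tell nested independent Manifest trees apart.

```python
import os


def find_top_level_manifest(start_dir):
    """Return the highest 'Manifest' at or above start_dir, or None.

    The search never leaves the filesystem that holds start_dir.
    """
    current = os.path.abspath(start_dir)
    start_dev = os.stat(current).st_dev
    top = None
    while True:
        if os.stat(current).st_dev != start_dev:
            break  # crossed a filesystem boundary
        candidate = os.path.join(current, "Manifest")
        if os.path.isfile(candidate):
            top = candidate  # keep going: a higher one may exist
        parent = os.path.dirname(current)
        if parent == current:
            break  # reached the filesystem root
        current = parent
    return top
```

Once the top-level Manifest is known, the tool would walk back down
by following ``MANIFEST`` entries rather than by guessing file names.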


Injecting ChangeLogs into the checkout
--------------------------------------

One of the problems considered in the new Manifest format was that
of injecting historical and autogenerated ChangeLogs into the repository.
Normally we do not include those files, to reduce the checkout size.
However, some users have shown interest in them and Infra is working
on providing them via an additional rsync module.

If such files were injected into the repository, they would cause
verification failures of Manifests. To account for this, Infra could
provide ``IGNORE`` entries to allow them to exist.


Splitting distfile checksums from file checksums
------------------------------------------------

Another problem with the current Manifest format is that the checksums
for fetched files are combined with checksums for local files
in a single file inside the package directory. It has been specifically
pointed out that:

- since distfiles are sometimes reused across different packages,
  the repeating checksums are redundant [#DIST]_.
  
- mirror admins were interested in the possibility of verifying all
  the distfiles with a single tool.

This specification does not provide a clean solution to this problem.
It technically permits moving ``DIST`` entries to higher-level Manifests
but the usefulness of such a solution is doubtful.

However, for the second problem we will probably deliver a dedicated
tool working with this Manifest format.


Hash algorithms
---------------

While maintaining a consistent supported hash set is important
for interoperability, it is not a good fit for the generic layout of this
GLEP. Furthermore, it would require updating the GLEP in the future
every time the used algorithms change.

Instead, the specification focuses on listing the currently used
algorithm names for interoperability, and sets a recommendation
for consistent naming of algorithms in the future. The Python
``hashlib`` module is used as a reference since it is used
as the provider of hash functions for most of the Python software,
including Portage and PkgCore.
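
To show how Manifest algorithm names can line up with ``hashlib``,
the sketch below computes several checksums in one pass over
the data. Only two algorithms are mapped, and both the mapping table
and the function name are assumptions made for the example.

```python
import hashlib

# Manifest algorithm name -> hashlib name (illustrative subset only).
HASHLIB_NAMES = {"SHA512": "sha512", "BLAKE2B": "blake2b"}


def manifest_hashes(data_chunks, algorithms=("BLAKE2B", "SHA512")):
    """Feed every chunk to all requested hashes and return hex digests."""
    hashers = {name: hashlib.new(HASHLIB_NAMES[name]) for name in algorithms}
    for chunk in data_chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Hashing all algorithms in one pass means each file is read only once,
which matters because verification is mostly I/O bound.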

The basic rules for changing hash algorithms are defined in GLEP 59
[#GLEP59]_. The implementations can focus only on those algorithms
that are actually used or planned to be used. It may be feasible
to devise a new GLEP that specifies the currently used hashes (or update
GLEP 59 accordingly).


Manifest compression
--------------------

The support for Manifest compression is introduced with minimal changes
to the file format. The ``MANIFEST`` entries are required to provide
the real (compressed) file path for compatibility with other file
entries and to avoid confusion.
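
In practice this means the checksum is taken over the compressed
bytes exactly as stored, and decompression happens only afterwards.
A minimal sketch, assuming gzip compression, a single SHA512
checksum and a hypothetical function name:

```python
import gzip
import hashlib


def load_compressed_manifest(raw_bytes, expected_sha512):
    """Verify the compressed file as stored, then return its text."""
    if hashlib.sha512(raw_bytes).hexdigest() != expected_sha512:
        raise ValueError("compressed Manifest does not match its entry")
    # Only data that passed verification is handed to the decompressor.
    return gzip.decompress(raw_bytes).decode("utf-8")
```

Checking before decompressing also avoids feeding unverified input
to the decompression code.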

The existence of additional entries for uncompressed Manifest checksums
was debated. However, plain entries for the uncompressed file would
be confusing if only the compressed file existed, and conflicting if
both uncompressed and compressed variants existed. Furthermore, it has
been pointed out that ``DIST`` entries do not have an uncompressed
variant either.


Performance considerations
--------------------------

Performing a full-tree verification on every sync raises some
performance concerns for end-user systems. The initial testing has shown
that a cold-cache verification on a btrfs file system can take around
4 minutes, with the process being mostly I/O bound. On the other hand,
it can be expected that the verification will be performed directly
after syncing, taking advantage of warm filesystem cache.

To improve speed on I/O and/or CPU-constrained systems even further,
the algorithms can be easily extended to perform incremental
verification. Given that rsync does not preserve mtimes by default,
the tool can take advantage of mtime and Manifest comparisons to recheck
only the parts of the repository that have changed.
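
One possible shape of the mtime part of such a check is sketched
below. It assumes the tool keeps a record of each file's mtime and
size from the last successful verification; the function names and
the record format are hypothetical, and the Manifest comparison is
not shown.

```python
import os


def snapshot(root, relpaths):
    """Record (mtime_ns, size) for every file after a successful run."""
    state = {}
    for rel in relpaths:
        st = os.stat(os.path.join(root, rel))
        state[rel] = (st.st_mtime_ns, st.st_size)
    return state


def files_needing_recheck(root, relpaths, last_verified):
    """Return the files whose mtime or size differs from the record."""
    changed = []
    for rel in relpaths:
        st = os.stat(os.path.join(root, rel))
        if last_verified.get(rel) != (st.st_mtime_ns, st.st_size):
            changed.append(rel)  # new or modified since the last run
    return changed
```

Only the returned files would then need their checksums recomputed.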

Furthermore, the package manager implementations can restrict checking
only to the parts of the repository that are actually being used.


Backwards Compatibility
=======================

This GLEP provides optional means of preserving backwards compatibility.
To preserve the backwards compatibility, the following needs to hold
for the ``Manifest`` file in every package directory:

- all files must be covered by the single ``Manifest`` file,

- all distfiles used by the package must be included,

- all files inside the ``files/`` subdirectory need to use
  the ``AUX`` tag (rather than ``DATA``),

- all ``.ebuild`` files need to use the ``EBUILD`` tag,

- the ``metadata.xml`` and ``ChangeLog`` files need to use
  the ``MISC`` tag,

- the Manifest can be signed to provide authenticity verification,

- an uncompressed Manifest must always exist, and a compressed Manifest
  of identical content may be present.

Once the backwards compatibility is no longer a concern, the above
no longer needs to hold and the deprecated tags can be removed.
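
The tag rules listed above can be expressed as a small selection
function. This is a sketch with a hypothetical name; falling back
to ``DATA`` for any other file is an assumption of the example.

```python
def compat_entry(relpath):
    """Return (tag, path) for a file in a package directory.

    relpath is relative to the package directory and uses "/".
    """
    if relpath.startswith("files/"):
        # AUX paths carry an implicit files/ prefix, so it is stripped.
        return "AUX", relpath[len("files/"):]
    if "/" not in relpath and relpath.endswith(".ebuild"):
        return "EBUILD", relpath
    if relpath in ("metadata.xml", "ChangeLog"):
        return "MISC", relpath
    return "DATA", relpath
```

For instance, ``files/fix.patch`` would be written as an ``AUX``
entry for ``fix.patch``, while ``foo-1.0.ebuild`` keeps its path
under the ``EBUILD`` tag.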


Reference Implementation
========================

The reference implementation for this GLEP is being developed
as the gemato project [#GEMATO]_.


Credits
=======

Thanks to all the people whose contributions were invaluable
to the creation of this GLEP. This includes but is not limited to:

- Robin Hugh Johnson,
- Ulrich Müller.

Additionally, thanks to Robin Hugh Johnson for the original
MetaManifest GLEP series which served both as inspiration and source
of many concepts used in this GLEP. Recursively, also thanks to all
the people who contributed to the original GLEPs.


References
==========

.. [#GLEP44] GLEP 44: Manifest2 format
   (https://www.gentoo.org/glep/glep-0044.html)

.. [#GLEP57] GLEP 57: Security of distribution of Gentoo software
   - Overview
   (https://www.gentoo.org/glep/glep-0057.html)

.. [#GLEP58] GLEP 58: Security of distribution of Gentoo software
   - Infrastructure to User distribution - MetaManifest
   (https://www.gentoo.org/glep/glep-0058.html)

.. [#GLEP59] GLEP 59: Manifest2 hash policies and security implications
   (https://www.gentoo.org/glep/glep-0059.html)

.. [#GLEP60] GLEP 60: Manifest2 filetypes
   (https://www.gentoo.org/glep/glep-0060.html)

.. [#GLEP61] GLEP 61: Manifest2 compression
   (https://www.gentoo.org/glep/glep-0061.html)

.. [#PMS-FETCH] Package Manager Specification: Dependency Specification
   Format - SRC_URI
   (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)

.. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
   (https://www.ietf.org/rfc/rfc1321.txt)

.. [#RIPEMD160] The hash function RIPEMD-160
   (https://homes.esat.kuleuven.be/~bosselae/ripemd160.html)

.. [#SHS] FIPS PUB 180-4: Secure Hash Standard (SHS)
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf)

.. [#WHIRLPOOL] The WHIRLPOOL Hash Function
   (http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html)

.. [#BLAKE2] BLAKE2 -- fast secure hashing
   (https://blake2.net/)

.. [#SHA3] FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash
   and Extendable-Output Functions
   (http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf)

.. [#STREEBOG] GOST R 34.11-2012: Streebog Hash Function
   (https://www.streebog.net/)

.. [#C08] Cappos, J et al. (2008). "Attacks on Package Managers"
   (https://www2.cs.arizona.edu/stork/packagemanagersecurity/attacks-on-package-managers.html)

.. [#DIST] According to Robin H. Johnson, 8.4% of all DIST entries
   at the time of writing are duplicates, representing 2 MiB
   out of 25 MiB of DIST entries altogether.

.. [#GEMATO] gemato: Gentoo Manifest Tool
   (https://github.com/mgorny/gemato/)

Copyright
=========
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.

-- 
Best regards,
Michał Górny




end of thread, other threads:[~2017-11-06 21:53 UTC | newest]

Thread overview: 32+ messages
2017-10-26 20:12 [gentoo-dev] [RFC] GLEP 74: Full-tree verification using Manifest files Michał Górny
2017-10-26 21:58 ` Roy Bamford
2017-10-27  6:22   ` Michał Górny
2017-10-28  2:41     ` Dean Stephens
2017-10-27 21:05 ` Robin H. Johnson
2017-10-28 11:50   ` Michał Górny
2017-10-28 12:49     ` Ulrich Mueller
2017-10-28 13:23       ` Michał Górny
2017-10-28 13:46         ` Ulrich Mueller
2017-10-28 20:55           ` Michał Górny
2017-10-28 18:44     ` Robin H. Johnson
2017-10-29 18:47       ` Michał Górny
2017-10-29 20:54         ` Robin H. Johnson
2017-10-30 16:01           ` Michał Górny
2017-10-27 21:48 ` Hanno Böck
2017-10-28  2:41   ` Dean Stephens
2017-10-28  3:27     ` M. J. Everitt
2017-10-28  4:43       ` Allan Wegan
2017-10-29 19:07 ` [gentoo-dev] [v1.0.1] " Michał Górny
2017-10-29 20:39   ` Robin H. Johnson
2017-10-30 16:11     ` Michał Górny
2017-10-30 16:51 ` [gentoo-dev] [v1.0.2] " Michał Górny
2017-10-30 19:56   ` Robin H. Johnson
2017-11-01  8:44     ` Michał Górny
2017-11-01  9:47       ` Walter Dnes
2017-11-01 13:08       ` Andreas K. Huettel
2017-11-02 19:10     ` Michał Górny
2017-11-02 19:11 ` [gentoo-dev] [v1.0.3] " Michał Górny
2017-11-02 23:43   ` Robin H. Johnson
2017-11-05 21:10     ` Michał Górny
2017-11-06 20:42       ` Robin H. Johnson
2017-11-06 21:53 ` [gentoo-dev] [v1.0.4] " Michał Górny
