Hi,

After my post to -core about how to move ahead with signing, I thought
the next best place to continue is in a discussion of how Portage
handles manifests and their signatures.

First, the blatantly obvious, for the benefit of some developers, even
though it's not relevant to signing. It is still a weak point and does
need to be addressed: multiple hashes!
Ok, so Wang et al. showed you can break MD5 in about an hour, on modern
machines, in a specific fashion. Their paper also discussed the effect
their new mathematical work had on SHA1, making it significantly easier
to break. However, the nature of the math is that it's still too
computationally intensive for anybody but NSA-type attackers to defeat
both SHA1 and MD5 at the same time.
Simple solution - just put both hashes into the Manifest/digest.
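
As a rough illustration of the multiple-hash idea, here is a minimal
Python sketch. The function names (file_digests, verify_all) are made
up for this example and are not Portage functions. A file is accepted
only if every recorded hash matches, so an attacker has to defeat MD5
and SHA1 simultaneously.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1"), chunk_size=65536):
    """Return {algorithm: hexdigest} for one file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Feed each chunk to every hasher so large distfiles are read once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify_all(path, expected):
    """True only if every recorded digest matches; one mismatch fails."""
    actual = file_digests(path, tuple(expected))
    return all(actual[name] == value for name, value in expected.items())
```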

Now onto the real stuff. The existing Manifest system has a very
specific problem in ensuring the trustworthiness of files.
It's quickest to explain via example.
1. cat-foo/bar has a complete, signed Manifest, and all of the content
is verified.
2. Dev A commits a change to the package, the Manifest is regenerated,
and Dev A doesn't sign it.
3. Dev B commits something to the package, maybe not even to the
same file. However, Dev B does sign the newly generated Manifest.
The change made by Dev A was not certified via Dev A's signature;
however, when Dev B came to change the package, it suddenly became
trusted/certified code. Now what if Dev A was an attacker with a stolen
SSH key? And worse, Dev B didn't actually check the ebuild, as he was
using ekeyword to stable a lot of packages? (Ignoring the fact that
many-eyes should have looked at the ebuild, but really, how long will
it take before the change is actually noticed, esp. if there is no
accompanying version bump?)

Everybody on board with the problem?

Now a moment's discussion about a solution.
One of my goals in examining signing is to make true end-to-end
verification via checksums+signing possible - to avoid any and all forms
of injection attacks going unnoticed.

Once the entire tree is signed, then the process of development will be
altered towards the following:
1) [dev] cvs up + verify all incoming changes are signed (if paranoid)
2) [dev] work on something, repoman ci with initial Manifest.
3) [dev] signed manifest committed.
4) [cvs server] queue newly signed manifest for verification
5) [cvs server - async] verify all newly signed manifests, if one
fails, block CVS -> rsync update to allow manual fix/revert. (Yes, this
is policing).
7) [user] emerge --sync
8) [user] emerge foobar - and everything is verified as signed

Solving the problem isn't hard. All we need is to consider each change
to a Manifest separately.

Ergo, instead of a Manifest being re-generated each time, it needs to
act like a FIFO queue.
Each queue element consists of:
- checksum/existing Manifest element of items changed in that action ONLY (however it should be possible to forcibly include files).
- Signature around the above checksum.

So now the new Manifest structure looks roughly like this (abbreviated):
-- PGP
MD5 ...
MD5 ...
-- SIG
-- SIG
-- PGP
MD5 ...
-- SIG
-- SIG
etc.
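
To make the layout above concrete, here is a minimal Python sketch that
splits such a Manifest into its signed blocks, oldest first. It assumes
the "-- PGP" / "-- SIG" markers in the outline stand for the usual
OpenPGP cleartext-signature armor lines; split_signed_blocks is a
hypothetical name. It only separates the blocks and does not verify any
signature.

```python
BEGIN_MESSAGE = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIGNATURE = "-----BEGIN PGP SIGNATURE-----"
END_SIGNATURE = "-----END PGP SIGNATURE-----"


def split_signed_blocks(manifest_text):
    """Return a list of blocks (oldest first); each block is the list of
    entry lines that sit inside one clear-signed section."""
    blocks = []
    current = []
    state = "outside"
    for line in manifest_text.splitlines():
        if state == "outside":
            if line == BEGIN_MESSAGE:
                state, current = "headers", []
        elif state == "headers":
            # Armor headers (e.g. "Hash: SHA1") end at the first blank line.
            if line == "":
                state = "body"
        elif state == "body":
            if line == BEGIN_SIGNATURE:
                blocks.append(current)
                state = "signature"
            else:
                current.append(line)
        elif state == "signature":
            if line == END_SIGNATURE:
                state = "outside"
    return blocks
```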

This has one important implication for backwards compatibility in
checking of Manifests.
In the case that a filename appears more than once in the file, only
the last instance of it should be used, as that is the one that relates
to the current version of the file. It's 4 lines of code in the current
portage that need to be removed for this to work (see my -core post for
where exactly).
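
The "last instance wins" rule can be sketched in a few lines of Python.
This is not the Portage code in question, only an illustration. It
assumes each entry line has the form "ALGO digest filename size", which
the abbreviated outline above does not spell out, and current_entries
is a hypothetical name.

```python
def current_entries(manifest_text, algorithms=("MD5", "SHA1")):
    """Map each filename to its most recent (algo, digest, size) entry.

    Lines are read top to bottom, so a later entry for the same filename
    simply overwrites the earlier one.
    """
    entries = {}
    for line in manifest_text.splitlines():
        fields = line.split()
        # Skip armor lines, signature data and anything else that is not
        # a four-field entry starting with a known algorithm name.
        if len(fields) != 4 or fields[0] not in algorithms:
            continue
        algo, digest, filename, size = fields
        entries[filename] = (algo, digest, size)
    return entries
```

Because a plain dictionary assignment replaces the older value, no
extra bookkeeping is needed to honour the rule.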

Generation of the above is reasonably simple: just collect the checksums
in a string, clear-sign it via gpg separately (tmpfile), and append the
result to the Manifest.
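
A minimal Python sketch of that generation step follows. The names
gpg_clearsign and append_signed_block are hypothetical, and the sketch
pipes the text through gpg's stdin rather than using a tmpfile. The
signer is passed in as a parameter so the append logic can be exercised
without a key; the default assumes a gpg binary on PATH with a usable
secret key.

```python
import subprocess


def gpg_clearsign(text, key_id=None):
    """Clear-sign text by piping it through the gpg command line tool."""
    cmd = ["gpg", "--batch", "--clearsign"]
    if key_id:
        cmd += ["--local-user", key_id]
    result = subprocess.run(
        cmd, input=text.encode("utf-8"), stdout=subprocess.PIPE, check=True
    )
    return result.stdout.decode("utf-8")


def append_signed_block(manifest_path, entry_lines, sign=gpg_clearsign):
    """Sign only the given entry lines and append the block to the Manifest.

    Existing blocks are never rewritten, so their signatures stay valid.
    """
    body = "".join(line.rstrip("\n") + "\n" for line in entry_lines)
    signed = sign(body)
    if not signed.endswith("\n"):
        signed += "\n"
    with open(manifest_path, "a", encoding="utf-8") as handle:
        handle.write(signed)
    return signed
```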

There is one last part that does need to be taken care of.
At what point is it safe to remove a checksum/signature from the
Manifest?
You cannot remove a single checksum from inside a signature block, as
that invalidates the signature.
So instead, we need to wait until more recent checksums/signatures
exist for every item in a block before the old block can be removed.
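
The removal rule can be sketched in Python as follows. Each signed
block is represented here just by the set of filenames it covers,
oldest block first; removable_blocks is a hypothetical name. A block
can go only when none of its files has its most recent entry in that
block.

```python
def removable_blocks(blocks):
    """Return indexes of blocks whose every file is re-signed later.

    blocks: list of iterables of filenames, oldest block first.
    """
    last_seen = {}
    for index, filenames in enumerate(blocks):
        for name in filenames:
            # Later blocks overwrite earlier ones, so this ends up holding
            # the index of the newest block that mentions each file.
            last_seen[name] = index
    still_needed = set(last_seen.values())
    return [i for i in range(len(blocks)) if i not in still_needed]
```

For example, with blocks covering {a, b}, {a} and {b, c} in that order,
only the first block is removable: its entries for a and b have both
been superseded, while the second block still holds the current entry
for a.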

This comes into play again because we need to be able to force
recreation of checksums/signatures, as the key that made the original
signature might no longer be valid (for various reasons). So repoman
needs to have some option to do:
repoman forcesign $FILES

Notes:
There were some concerns about the speed of Manifest checking.
I did some simple benchmarking, using all existing signed Manifests
(40% of the tree). Using a Pentium-D 3GHz, running on only one core, it
took 43 seconds to check all 4000 Manifests. 90% of the time was in the
setup/teardown for gpg.

Getting around the setup/teardown time problem basically means we need
something of our own to interface with the gpg API (gpgme). This is not
too hard; GPG ships with an example that is most of the way there
already - 'gpgv'.

I'm comfortable with the GPG/GPGME codebase, and writing crypto-related
code, so I'm going to tackle that, starting after I've eaten dinner
tonight.

Even still, re-checking every digest in the tree should not happen on
every CVS->rsync window. It's computationally pointless. Just check the
changes.

-- 
Robin Hugh Johnson
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85