On Tue, Jun 1, 2010 at 1:01 PM, Ned Ludd <solar@gentoo.org> wrote:
On Mon, 2010-05-31 at 22:16 -0700, Brian Harring wrote:
> On Mon, May 31, 2010 at 08:32:34PM -0700, Zac Medico wrote:
> > Hi,
> >
> > In order to support alternative compression types for binhost
> > packages, I was thinking about adding support for a header field in
> > the Packages index file. For example, a header line like
> > "PACKAGE_EXTENSION: txz" could be used to indicate that clients
> > should download files with txz extensions instead of tbz2
> > extensions. I'm planning to add support for both tgz [1] and txz
> > extensions.
> >
> > [1] http://bugs.gentoo.org/show_bug.cgi?id=142579
>
> 1) requires a version header bump

Agreed. But there were some other pending changes for "VERSION: 1".

Any planned changes to the format should be documented on
https://bugs.gentoo.org/show_bug.cgi?id=263994


> 2) a header alone isn't useful unless it's specifiable per cpv entry;
> thus it must be inheritable

Per-CPV entries are going to bloat the format and make me carry around
more data on a per-pkg basis than I'd want to. How about we run with
Zac's idea but use tools to convert a full repo over to $EXTENSION?
This should keep the portage code fast as well, since it checks for
invalid binpkgs all the time. Having to have portage process a ton of
ever-growing extensions is just going to be slow.

Note I said 'inheritable'; one of the main flaws w/ version 0 is that it requires quite a few entries per CPV, instead of setting a default in the preamble and then overriding as needed at the CPV level.

What I'm suggesting is a COMPRESSOR in the preamble, and individual cpv's override it if they're not that compressor.
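
To make the inheritance idea concrete, here is a minimal sketch of a preamble default with per-CPV overrides. It assumes a simplified index made of blank-line-separated stanzas of "KEY: value" lines; the COMPRESSOR field, its values, and the sample entries are hypothetical illustrations, not the actual format.

```python
def parse_index(text):
    """Split a Packages-style index into (preamble, entries).

    Assumes stanzas are separated by a single blank line and that the
    first stanza is the preamble.
    """
    stanzas = []
    for chunk in text.strip().split("\n\n"):
        fields = {}
        for line in chunk.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        stanzas.append(fields)
    return stanzas[0], stanzas[1:]


def resolve(preamble, entry, key):
    """Return the entry's own value for key, else the preamble default."""
    return entry.get(key, preamble.get(key))


# Hypothetical sample: only the second entry overrides the default.
INDEX = """\
VERSION: 1
COMPRESSOR: xz

CPV: app-shells/bash-4.1
MD5: 0123

CPV: sys-apps/sed-4.2
COMPRESSOR: bzip2
MD5: 4567
"""

preamble, entries = parse_index(INDEX)
compressors = {e["CPV"]: resolve(preamble, e, "COMPRESSOR") for e in entries}
```

Only entries that differ from the default carry the extra field, so the common case adds nothing per CPV.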

As for Zac's tool to try and generate new views of a repository via hardlinking/recreating the tree... frankly it's a bit of a hack.  Via DEFAULT_URI and relying on the hash, you can make a stable repository that can be updated in place without corrupting ongoing downloads. Simply put, new additions to the repo don't perturb current downloads, since the md5 of an existing package stays the same (the chance of a hash collision is low enough that I don't care about it here).
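
A rough sketch of the hash-keyed store idea, assuming packages are simply files named by their md5 inside one directory. The function name, the layout, and the write-then-rename step are assumptions made for illustration, not anything portage or pkgcore does today.

```python
import hashlib
import os
import tempfile


def store_by_hash(store_dir, data):
    """Store data under its md5 hex digest and return that digest.

    An existing file is never rewritten, so adding packages does not
    touch anything a client may currently be downloading.
    """
    digest = hashlib.md5(data).hexdigest()
    path = os.path.join(store_dir, digest)
    if not os.path.exists(path):
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        # Rename so the final name only ever refers to a complete file.
        os.rename(tmp, path)
    return digest


store = tempfile.mkdtemp()
first = store_by_hash(store, b"pkg-a contents")
again = store_by_hash(store, b"pkg-a contents")  # same bytes, same name
other = store_by_hash(store, b"pkg-b contents")
stored = sorted(os.listdir(store))
```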


> 3) PACKAGE_EXTENSION is overly verbose, and it's unclear that it's
> specifying the compressor too; its intention is for compression, so
> state it as such (I mention this in light of URI's existence, where
> PACKAGE_EXTENSION would only be a hint of the compressor)
>
> Re: #1, there is a decent set of optimizations I'm kicking around in
> pkgcore for the next version- a discussion should probably be started
> there.
>
> Offhand, having a compression specific header (a simple enumeration
> of known compressors) and a DEFAULT_URI that is python string

No go bro. The 'Packages' format should be independent of python.

> interpolation assembled (for example,
> DEFAULT_URI="%(host)s/%(category)s/%(pf)s.txz") seems wiser.  Via
> doing what I'm suggesting, it would be possible to do binpkg
> repository 'views' w/out having to map each binpkg into the url space
> for it.

Then come up w/ an alternative w/ the same power as DEFAULT_URI that isn't python specific. Think through the potential of it: I could very easily centralize the binpkgs for an arch, use the hash as their lookup value, then use the Packages cache as a 'view' into that binpkg repository.  Differing use flag combinations, differing license views, hell, differing ACCEPT_KEYWORDS, all of that can have the raw pkgs stored centrally while just providing differing views into it. DEFAULT_URI lays the groundwork for it.
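
For illustration, a minimal sketch of how a DEFAULT_URI template like the one quoted above could be expanded per entry using python's %-style mapping interpolation. The helper name, the host, the sample entry, and the hash-keyed "objects" layout are all hypothetical.

```python
# Template taken from the example earlier in this thread.
DEFAULT_URI = "%(host)s/%(category)s/%(pf)s.txz"


def binpkg_uri(template, host, entry):
    """Expand a URI template from an entry's fields plus the host."""
    values = dict(entry, host=host)
    # Keys the template does not reference are simply ignored.
    return template % values


# Hypothetical entry and host, for illustration only.
entry = {"category": "app-shells", "pf": "bash-4.1", "md5": "0123abcd"}
host = "http://binhost.example"

by_name = binpkg_uri(DEFAULT_URI, host, entry)
# A hash-keyed layout needs only a different template, not a new tree.
by_hash = binpkg_uri("%(host)s/objects/%(md5)s", host, entry)
```

Here by_name expands to http://binhost.example/app-shells/bash-4.1.txz and by_hash to http://binhost.example/objects/0123abcd, so switching between "views" is a one-line change in the preamble.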