From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from pigeon.gentoo.org ([208.92.234.80] helo=lists.gentoo.org) by finch.gentoo.org with esmtp (Exim 4.60) (envelope-from ) id 1OJYvi-0006Fq-0P for garchives@archives.gentoo.org; Tue, 01 Jun 2010 21:23:22 +0000 Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id E3DE4E10D7; Tue, 1 Jun 2010 21:22:44 +0000 (UTC) Received: from mail-gy0-f181.google.com (mail-gy0-f181.google.com [209.85.160.181]) by pigeon.gentoo.org (Postfix) with ESMTP id B33B0E10D7 for ; Tue, 1 Jun 2010 21:22:44 +0000 (UTC) Received: by gyg8 with SMTP id 8so3638263gyg.40 for ; Tue, 01 Jun 2010 14:22:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type; bh=vBoYKu4MdWJvWQPmoy77/uNQDaZtRvgeYrEnTBHRRBY=; b=UUHSiMAugbf7Oqg4h0Ez/VrC9tTiY8BlXRLJcJrItzWfhz7RUemsG5xMuHbl2lMJKH xzbBgb9nruZOk5WBo+2Ay8KprHsxAZ18l0+Skf9JtNY6GbavyQ+vrUeahVqdP77haRnb 46JljMGPann6qp9/VfX22N6EW33PlWsmYupTI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=qmslreiS2hK45UyJGB9zDMsCVHPEmmtSdpybFOajZNBLxhkkm41pzW335b4gg8GLyj qN/IcNgbMhWtIzeUlea1D56TM21W9zNiGzACIZ4RXImBcTxDgkqhwH2zwVz5iD79oAJy u3iRO1do7j+YYdMmbiSvnkYx4/ZX1vSjDTNUs= Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-portage-dev@lists.gentoo.org Reply-to: gentoo-portage-dev@lists.gentoo.org MIME-Version: 1.0 Received: by 10.231.178.135 with SMTP id bm7mr8465257ibb.73.1275427354075; Tue, 01 Jun 2010 14:22:34 -0700 (PDT) Received: by 10.231.194.147 with HTTP; Tue, 1 Jun 2010 14:22:33 -0700 (PDT) In-Reply-To: <1275422465.24611.9.camel@hangover> References: <4C047F52.30209@gentoo.org> <20100601051608.GD19306@hrair> <1275422465.24611.9.camel@hangover> Date: 
Tue, 1 Jun 2010 14:22:33 -0700
Message-ID:
Subject: Re: [gentoo-portage-dev] Package compression header for binhosts
From: Brian Harring
To: gentoo-portage-dev@lists.gentoo.org
Content-Type: multipart/alternative; boundary=001636b145238457250487fe92f8
X-Archives-Salt: 956c3d39-89e4-45e8-a925-75ba55665ab1
X-Archives-Hash: 0801e5a6a55bc3aba3f06735c427da93

--001636b145238457250487fe92f8
Content-Type: text/plain; charset=UTF-8

On Tue, Jun 1, 2010 at 1:01 PM, Ned Ludd <solar@gentoo.org> wrote:

> On Mon, 2010-05-31 at 22:16 -0700, Brian Harring wrote:
> > On Mon, May 31, 2010 at 08:32:34PM -0700, Zac Medico wrote:
> > > Hi,
> > >
> > > In order to support alternative compression types for binhost
> > > packages, I was thinking about adding support for a header field in
> > > the Packages index file. For example, a header line like
> > > "PACKAGE_EXTENSION: txz" could be used to indicate that clients
> > > should download files with txz extensions instead of tbz2
> > > extensions. I'm planning to add support for both tgz [1] and txz
> > > extensions.
> > >
> > > [1] http://bugs.gentoo.org/show_bug.cgi?id=142579
> >
> > 1) requires a version header bump
>
> Agreed. But there were some other pending changes for "VERSION: 1"
>
> Any planned changes to the format should be documented on
> https://bugs.gentoo.org/show_bug.cgi?id=263994
>
>
> > 2) a header alone isn't useful unless it's specifiable per cpv entry;
> > thus it must be inheritable
>
> Per CPV entries is going to bloat the format and make me carry around a
> more data on a per pkg basis then I'd want to. How about we run with
> zac's idea but use tools to convert a full repo over to $EXTENTION
> This should keep the portage code fast as well as it checks for invalid
> binpkgs all the time. Having to have portage process a ton of ever
> growing extentions is just going to be slow.
>

Note I said 'inheritable'; one of the main flaws w/ version 0 is that it
requires quite a few entries per CPV, instead of setting a default in the
preamble and then overriding as needed at the CPV level.

What I'm suggesting is a COMPRESSOR field in the preamble, with individual
CPVs overriding it if they don't use that compressor.

As for Zac's tool to generate new views of a repository via
hardlinking/recreating the tree... frankly it's a bit of a hack. With
DEFAULT_URI and relying on the hash, you can make a stable repository that
can be updated in place without corrupting ongoing downloads; simply put,
new additions to the repo don't perturb current downloads since an
existing binpkg's md5 stays the same (hash collision chance is low enough
that I don't care about it here).

> > 3) PACKAGE_EXTENSION is overly verbose and unclear it's specifying
> > the compressor too; it's intention is for compression, state it as
> > such (I mention this in light of URI's existance where
> > PACKAGE_EXTENSION would only be a hint of compressor)
> >
> > Re: #1, there is a decent set of optimizations I'm kicking around in
> > pkgcore for the next version- a discussion should probably be started
> > there.
> >
> > Offhand, having a compression specific header (a simple enumeration
> > of known compressors) and a DEFAULT_URI that is python string
>
> No go bro. The 'Packages' format should be independent of python.
>
> > interpolation assembled (for example,
> > DEFAULT_URI="%(host)s/%(category)s/%(pf)s.txz") seems wiser. Via
> > doing what I'm suggesting, it would be possible to do binpkg
> > repository 'views' w/out having to map each binpkg into the url space
> > for it.
>

Then come up w/ an alternative w/ the same power as DEFAULT_URI that isn't
python specific. Think through the potential of it: I could very easily
centralize the binpkgs for an arch, use the hash as their lookup value,
then use the Packages cache as a 'view' into that binpkg repository.
Differing use flag combinations, differing license views, hell, differing
ACCEPT_KEYWORDS, all of that can have the raw pkgs stored centrally while
just providing differing views into it; DEFAULT_URI lays the groundwork
for it.

--001636b145238457250487fe92f8
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001636b145238457250487fe92f8--
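
For illustration only, here is a minimal sketch of the inheritance scheme argued for in the message above: defaults set once in the preamble, overridden per CPV only where needed, with the download location assembled from DEFAULT_URI via Python string interpolation. Only COMPRESSOR, DEFAULT_URI and the %(host)s, %(category)s and %(pf)s keys come from the thread. The stanza layout, the CPV and MD5 field names, the %(compressor)s and %(md5)s keys, the sample packages and the host name are all assumptions made for this example, not the actual Packages format.

```python
# Hypothetical sketch: the stanza layout and every field name other than
# COMPRESSOR and DEFAULT_URI are assumptions, not the real Packages format.

SAMPLE_INDEX = """\
VERSION: 1
COMPRESSOR: txz
DEFAULT_URI: %(host)s/%(category)s/%(pf)s.%(compressor)s

CPV: app-shells/bash-4.1
MD5: 0123456789abcdef0123456789abcdef

CPV: sys-apps/sed-4.2
COMPRESSOR: tbz2
MD5: fedcba9876543210fedcba9876543210
"""


def parse_stanzas(text):
    """Split 'KEY: value' stanzas separated by blank lines into dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        stanzas.append(fields)
    return stanzas


def resolve_packages(text, host):
    """Merge preamble defaults into each package entry and expand its URI."""
    preamble, *entries = parse_stanzas(text)
    resolved = []
    for entry in entries:
        # Per-CPV fields override the preamble; everything else is inherited.
        merged = {**preamble, **entry}
        category, pf = merged["CPV"].split("/", 1)
        # printf-style mapping interpolation; unused keys are simply ignored.
        merged["URI"] = merged["DEFAULT_URI"] % {
            "host": host,
            "category": category,
            "pf": pf,
            "compressor": merged["COMPRESSOR"],
            "md5": merged["MD5"],
        }
        resolved.append(merged)
    return resolved


if __name__ == "__main__":
    for pkg in resolve_packages(SAMPLE_INDEX, "http://binhost.example.org"):
        print(pkg["CPV"], pkg["COMPRESSOR"], pkg["URI"])
```

In this sketch only the second entry carries its own COMPRESSOR line; the first inherits txz from the preamble. Changing the preamble to something like DEFAULT_URI: %(host)s/%(md5)s would address every binpkg by hash, which is roughly the centralized store with differing 'views' described in the message, though again the key name is an assumption.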