* Fwd: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
@ 2018-11-18 21:10 Roy Bamford
2018-11-18 21:55 ` Rich Freeman
0 siblings, 1 reply; 6+ messages in thread
From: Roy Bamford @ 2018-11-18 21:10 UTC
To: gentoo-dev
[-- Attachment #1.1: Type: text/plain, Size: 169 bytes --]
See attached.
Replying off list because I am not on the whitelist ...
--
Regards,
Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods
[-- Attachment #1.2: Type: message/rfc822, Size: 12975 bytes --]
[-- Attachment #1.2.1.1: Type: text/plain, Size: 4245 bytes --]
On Sun, Nov 18, 2018 at 5:04 AM Roy Bamford <neddyseagoon@gentoo.org> wrote:
> On 2018.11.18 09:38, Michał Górny wrote:
> > On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
>
> [snip]
>
> > > > 2. **The format relies on obscure compressor feature of ignoring
> > > > trailing garbage**. While this behavior is traditionally implemented
> > > > by many compressors, the original reasons for it have become long
> > > > irrelevant and it is not surprising that new compressors do not
> > > > support it. In particular, Portage already hit this problem twice:
> > > > once when users replaced bzip2 with parallel-capable pbzip2
> > > > implementation [#PBZIP2]_, and the second time when support for zstd
> > > > compressor was added [#ZSTD]_.
> > >
> > > I think this is actually the result of a rather opportunistic
> > > implementation. The fault is that we chose to use an extension that
> > > suggests the file is a regular compressed tarball.
> > > When one detects that a file is xpak padded, it is trivial to feed the
> > > decompressor just the relevant part of the datastream. The format
> > > itself isn't bad, and doesn't rely on obscure behaviour.
> >
> > Except if you don't have the proper tools installed. In which case
> > the 'opportunistic' behavior made it possible to extract the contents
> > without special tools... except when it actually happens not to work
> > anymore. Roy's reply indicates that there is actually interest in this
> > design feature.
> >
> [snip]
>
> Team,
>
> I used to post something like https://wiki.gentoo.org/wiki/Fix_My_Gentoo
> with a link to Patrick's binhost on the forums every three or four months.
> It made it worth writing that wiki page anyway.
>
> We still get users removing elements of their toolchain or glibc from time
> to time. The requirement that I didn't express very well is that it shall
> be possible to install binary packages without the use of any Gentoo
> specific tooling.
>
> The current tarball of tarballs proposal would satisfy that requirement.
>
> It's unlikely that a custom binary format would. Of course, this being
> Gentoo, someone would write a run-anywhere script that did the
> unpicking. We already have deb2targz and rpm2targz. We have the
> opportunity to design out binpkg2targz before it exists.
>
> --
> Regards,
>
> Roy Bamford
> (Neddyseagoon) a member of
> elections
> gentoo-ops
> forum-mods
>
Replying off list because I am not on the whitelist.
Please also consider my use case:
I have a cluster file system, cephfs, which all of my gentoo machines mount
for access to various shared file resources.
I want to have all of them mount a cephfs path to the folder which portage
is configured to look for binary packages.
This works great if all of the machines have identical portage
configurations, but breaks down as soon as one machine uses a different use
flag.
The reason for this is that the package file names do not encode anything
other than the package name and version number. So if a binpkg already
exists in my binpkg repository, and another machine builds with different
use flags, the binpkg gets overwritten, potentially while a third machine
is reading the binpkg file.
The filename also does not represent compile time dependencies, or any
number of other possible points of differentiation.
This issue could be (at least partially) solved at least 3 ways.
1) Append a UUID to each filename, generated when the binary package file is
created.
2) Encode the hostname of the machine that generated the file.
3) Encode the USE flags in the filename.
Perhaps a fuller solution is to respect an environment variable
"BINARY_PKG_FILENAME_FORMAT" that accepts a series of variable
substitutions to append after the package name and version number?
This variable would be used only when generating the binary package.
Portage would still use any binary package that it found that matched its
needs, regardless of suffix.
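To make the proposal concrete, here is a minimal sketch of how such a format variable might be expanded. It is not Portage code: BINARY_PKG_FILENAME_FORMAT is only the name proposed above, and the placeholder names (UUID, HOSTNAME, USE), the binpkg_filename helper and the .tbz2 suffix are assumptions made for illustration.

```python
import os
import socket
import uuid
from string import Template


def binpkg_filename(pf, use_flags, env=None):
    """Build a binary package filename from a hypothetical format string.

    pf is the package name and version, e.g. "foo-1.0".
    """
    env = os.environ if env is None else env
    # Proposed variable from the message above; it does not exist in Portage.
    fmt = env.get("BINARY_PKG_FILENAME_FORMAT", "")
    # Hypothetical placeholders covering the three options listed above.
    fields = {
        "UUID": uuid.uuid4().hex,
        "HOSTNAME": socket.gethostname(),
        "USE": "+".join(sorted(use_flags)),
    }
    # An unknown placeholder raises KeyError instead of being silently kept.
    suffix = Template(fmt).substitute(fields)
    return f"{pf}{suffix}.tbz2"
```

With the variable unset the name stays as it is today; with a value such as "-${HOSTNAME}-${UUID}" two machines building the same package no longer write to the same path.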
Thanks for your time.
* Re: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
2018-11-18 21:10 Fwd: Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com] Roy Bamford
@ 2018-11-18 21:55 ` Rich Freeman
2018-11-18 22:40 ` Zac Medico
0 siblings, 1 reply; 6+ messages in thread
From: Rich Freeman @ 2018-11-18 21:55 UTC
To: gentoo-dev
On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford <neddyseagoon@gentoo.org> wrote:
>
> Replying off list because I am not on the whitelist.
That seems odd.
> 1) append a uuid to each filename. Generated when the bin package file is generated.
> 2) encode the hostname of the machine that generated the file
> 3) encode the use flags in the filename.
So, I brought up this same issue in the earlier discussion and it was
considered out of scope, and I think this is fair. The GLEP does not
specify filename, and IMO the standard for what goes INSIDE the file
will work just fine with any future enhancements that address exactly
this use case.
Besides your case of building for a cluster, another use case is
having a central binary repo that portage could check and utilize when
a user's preferences happen to match what is pre-built.
I suggest we start a different thread for any additional discussion of
this use case. I was thinking about it, and it probably wouldn't be super-hard
to actually start building something like this. But, I don't want to
derail this GLEP as I don't see any reason designing something like
this needs to hold up the binary package format. Both the existing
and proposed binary package formats will encode any metadata needed by
the package manager inside the file, and the only extension we need is
to encode identifying info in the filename.
My idea is to basically have portage generate a tag with all the info
needed to identify the "right" package, take a hash of it, and then
stick that in the filename. Then when portage is looking for a binary
package to use at install time it generates the same tag using the
same algorithm and looks for a matching hash. If a hit is found then
it reads the complete metadata in the file and applies all the sanity
checks it already does. Generating of binary packages with the hash
could be made optional, and portage could also be configured to first
look for the matching hash, then fall back to the existing naming
convention, so that it would be compatible with existing generic
names. So, users would get a choice as to whether they want to build
up a library of these packages, or just have each build overwrite the
last.
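A rough sketch of that idea, under stated assumptions: the key list, the tag serialization, the truncated SHA-256 digest and the .tbz2 suffix are all illustrative choices, and config_tag, hashed_name and find_binpkg are hypothetical helpers, not Portage functions.

```python
import hashlib
import os


def config_tag(metadata, keys):
    """Serialize the selected metadata keys in a stable order."""
    parts = []
    for key in sorted(keys):
        value = metadata.get(key, "")
        if key == "USE":
            # Flag order must not change the hash.
            value = " ".join(sorted(value.split()))
        parts.append(f"{key}={value}")
    return "\n".join(parts)


def hashed_name(pf, metadata, keys=("CHOST", "USE")):
    """Return a filename carrying a short hash of the configuration tag."""
    tag = config_tag(metadata, keys)
    digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
    return f"{pf}-{digest[:16]}.tbz2"


def find_binpkg(pkgdir, pf, metadata, keys=("CHOST", "USE")):
    """Prefer the hashed name, then fall back to the generic name."""
    for name in (hashed_name(pf, metadata, keys), f"{pf}.tbz2"):
        path = os.path.join(pkgdir, name)
        if os.path.exists(path):
            # The caller would still read the full metadata and run the
            # usual sanity checks before installing.
            return path
    return None
```

The keys argument is where a user's policy would plug in: a longer key list gives stricter matching, a shorter one accepts more packages.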
Then the next step would be to allow these files to be fetched from a
binary repo optionally, and then finally we'd need tools to create the
repo. But, this step isn't needed for your use case. With the proper
optional switches you could utilize as much of this scheme as you
like.
Also, you could optionally choose how much you want portage to encode
in the tag and look for. Are you very fussy and only want a binary
package with matching CFLAGS/USE/whatever? Or is just matching
USE/arch/etc enough? Some of the existing portage options could
potentially be re-used here.
Please make any replies in a new thread.
--
Rich
* Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
2018-11-18 21:55 ` Rich Freeman
@ 2018-11-18 22:40 ` Zac Medico
2018-11-19 2:51 ` Rich Freeman
2018-11-19 10:45 ` M. J. Everitt
0 siblings, 2 replies; 6+ messages in thread
From: Zac Medico @ 2018-11-18 22:40 UTC
To: gentoo-dev, Rich Freeman
[-- Attachment #1.1: Type: text/plain, Size: 3097 bytes --]
On 11/18/18 1:55 PM, Rich Freeman wrote:
> On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford <neddyseagoon@gentoo.org> wrote:
>>
>> Replying off list because I am not on the whitelist.
>
> That seems odd.
>
>> 1) append a uuid to each filename. Generated when the bin package file is generated.
>> 2) encode the hostname of the machine that generated the file
>> 3) encode the use flags in the filename.
>
> So, I brought up this same issue in the earlier discussion and it was
> considered out of scope, and I think this is fair. The GLEP does not
> specify filename, and IMO the standard for what goes INSIDE the file
> will work just fine with any future enhancements that address exactly
> this use case.
>
> Besides your case of building for a cluster, another use case is
> having a central binary repo that portage could check and utilize when
> a user's preferences happen to match what is pre-built.
>
> I suggest we start a different thread for any additional discussion of
> this use case. I was thinking and it probably wouldn't be super-hard
> to actually start building something like this. But, I don't want to
> derail this GLEP as I don't see any reason designing something like
> this needs to hold up the binary package format. Both the existing
> and proposed binary package formats will encode any metadata needed by
> the package manager inside the file, and the only extension we need is
> to encode identifying info in the filename.
>
> My idea is to basically have portage generate a tag with all the info
> needed to identify the "right" package, take a hash of it, and then
> stick that in the filename. Then when portage is looking for a binary
> package to use at install time it generates the same tag using the
> same algorithm and looks for a matching hash. If a hit is found then
> it reads the complete metadata in the file and applies all the sanity
> checks it already does. Generating of binary packages with the hash
> could be made optional, and portage could also be configured to first
> look for the matching hash, then fall back to the existing naming
> convention, so that it would be compatible with existing generic
> names. So, users would get a choice as to whether they want to build
> up a library of these packages, or just have each build overwrite the
> last.
>
> Then the next step would be to allow these files to be fetched from a
> binary repo optionally, and then finally we'd need tools to create the
> repo. But, this step isn't needed for your use case. With the proper
> optional switches you could utilize as much of this scheme as you
> like.
>
> Also, you could optionally choose how much you want portage to encode
> in the tag and look for. Are you very fussy and only want a binary
> package with matching CFLAGS/USE/whatever? Or is just matching
> USE/arch/etc enough? Some of the existing portage options could
> potentially be re-used here.
We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
--
Thanks,
Zac
* Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
2018-11-18 22:40 ` Zac Medico
@ 2018-11-19 2:51 ` Rich Freeman
2018-11-19 18:45 ` Zac Medico
2018-11-19 10:45 ` M. J. Everitt
1 sibling, 1 reply; 6+ messages in thread
From: Rich Freeman @ 2018-11-19 2:51 UTC
To: Zac Medico; +Cc: gentoo-dev
On Sun, Nov 18, 2018 at 5:40 PM Zac Medico <zmedico@gentoo.org> wrote:
>
> On 11/18/18 1:55 PM, Rich Freeman wrote:
> >
> > My idea is to basically have portage generate a tag with all the info
> > needed to identify the "right" package, take a hash of it, and then
> > stick that in the filename. Then when portage is looking for a binary
> > package to use at install time it generates the same tag using the
> > same algorithm and looks for a matching hash.
>
> We've already had this handled for a couple years now, via
> FEATURES=binpkg-multi-instance.
According to the make.conf manpage this simply numbers builds. So, if
you build something twice with the same config you end up with two
duplicate files (wasteful). Presumably if you had a large collection
of these packages portage would have to read the metadata within each
one to figure out which one is appropriate to install. That would be
expensive if IO is slow, such as when fetching packages online
on-demand.
But, it obviously is somewhat of an improvement for Roy's use case.
IMO using a content-hash of certain metadata would eliminate
duplication, and based on filename alone it would be clear whether the
sought-after binary package exists or not. As with the build numbers
you couldn't tell from filename inspection what packages you have, but
if you know what you want you could immediately find it. IMO trying
to cram all that metadata into a filename to make them more
transparent isn't a good idea, and using hashes lets the user set
their own policy regarding flexibility. Heck, you could auto-gen
symlinks for subsets of metadata (ie, the same file could be linked
from a file that specifies its USE flags but not its CFLAGS, so it
would be found if either an exact hit on CFLAGS was sought or if
CFLAGS were considered unimportant).
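The symlink idea could look roughly like the sketch below. Everything here is an assumption for illustration: tag_hash and store_with_aliases are hypothetical helpers, the hash is a truncated SHA-256 over "KEY=value" lines, and the .tbz2 suffix is only an example.

```python
import hashlib
import os


def tag_hash(metadata, keys):
    """Hash the selected metadata keys in a stable order."""
    tag = "\n".join(f"{k}={metadata[k]}" for k in sorted(keys))
    return hashlib.sha256(tag.encode("utf-8")).hexdigest()[:16]


def store_with_aliases(pkgdir, pf, metadata, built_file, key_subsets):
    """Store a package under its full hash and symlink looser hashes to it."""
    primary = f"{pf}-{tag_hash(metadata, sorted(metadata))}.tbz2"
    primary_path = os.path.join(pkgdir, primary)
    if not os.path.exists(primary_path):
        # Rename into place; an identical configuration that was already
        # stored is left untouched, so no duplicate file is created.
        os.replace(built_file, primary_path)
    for keys in key_subsets:
        alias = os.path.join(pkgdir, f"{pf}-{tag_hash(metadata, keys)}.tbz2")
        if not os.path.lexists(alias):
            # Relative link target, so the directory can be moved or mounted
            # at a different path.
            os.symlink(primary, alias)
    return primary_path
```

A lookup that only cares about USE hashes just the USE key and follows the alias; a lookup that also cares about CFLAGS hashes both and hits the primary file directly.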
But, I'm certainly not suggesting that you're not allowed to go to bed
until you've built it. :)
--
Rich
* Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
2018-11-19 2:51 ` Rich Freeman
@ 2018-11-19 18:45 ` Zac Medico
0 siblings, 0 replies; 6+ messages in thread
From: Zac Medico @ 2018-11-19 18:45 UTC
To: Rich Freeman, Zac Medico; +Cc: gentoo-dev
[-- Attachment #1.1: Type: text/plain, Size: 2951 bytes --]
On 11/18/18 6:51 PM, Rich Freeman wrote:
> On Sun, Nov 18, 2018 at 5:40 PM Zac Medico <zmedico@gentoo.org> wrote:
>>
>> On 11/18/18 1:55 PM, Rich Freeman wrote:
>>>
>>> My idea is to basically have portage generate a tag with all the info
>>> needed to identify the "right" package, take a hash of it, and then
>>> stick that in the filename. Then when portage is looking for a binary
>>> package to use at install time it generates the same tag using the
>>> same algorithm and looks for a matching hash.
>>
>> We've already had this handled for a couple years now, via
>> FEATURES=binpkg-multi-instance.
>
> According to the make.conf manpage this simply numbers builds. So, if
> you build something twice with the same config you end up with two
> duplicate files (wasteful). Presumably if you had a large collection
> of these packages portage would have to read the metadata within each
> one to figure out which one is appropriate to install. That would be
> expensive if IO is slow, such as when fetching packages online
> on-demand.
>
> But, it obviously is somewhat of an improvement for Roy's use case.
>
> IMO using a content-hash of certain metadata would eliminate
> duplication, and based on filename alone it would be clear whether the
> sought-after binary package exists or not. As with the build numbers
> you couldn't tell from filename inspection what packages you have, but
> if you know what you want you could immediately find it. IMO trying
> to cram all that metadata into a filename to make them more
> transparent isn't a good idea, and using hashes lets the user set
> their own policy regarding flexibility. Heck, you could auto-gen
> symlinks for subsets of metadata (ie, the same file could be linked
> from a file that specifies its USE flags but not its CFLAGS, so it
> would be found if either an exact hit on CFLAGS was sought or if
> CFLAGS were considered unimportant).
>
> But, I'm certainly not suggesting that you're not allowed to go to bed
> until you've built it. :)
The existing ${PKGDIR}/Packages file optimizes metadata access for both
local and remote access, and performs well for reasonable numbers of
packages.
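As an illustration of why an index avoids opening every package file, here is a small sketch that selects candidates from index text alone. It assumes a simplified layout (a header stanza followed by blank-line-separated "KEY: value" stanzas, one per package); the real file may differ in detail, and parse_index and candidates are hypothetical helpers rather than Portage APIs.

```python
def parse_index(text):
    """Split index text into a header dict and a list of per-package dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                entry[key] = value
        if entry:
            stanzas.append(entry)
    if not stanzas:
        return {}, []
    return stanzas[0], stanzas[1:]


def candidates(packages, cpv, wanted_use):
    """Return the index entries for cpv whose USE set matches exactly."""
    wanted = set(wanted_use)
    return [
        p for p in packages
        if p.get("CPV") == cpv and set(p.get("USE", "").split()) == wanted
    ]
```

Only the index has to be fetched and scanned; the package file itself is opened once a matching entry has been found.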
If you insist on mixing binary packages in the same ${PKGDIR} for a
large number of alternative configurations, then it will not scale
unless you create a way to send your local configuration to the server
so that it can select the relevant package list for you.
However, bear in mind that mixing alternative configurations in the same
${PKGDIR} might lead to undesirable results if there is anything
relevant that is unaccounted for in the package metadata. Possible
unaccounted things may include:
1) glibc version the package was built against
2) symbols and/or sonames not accounted for by slot operator dependencies
3) soname dependencies (--usepkgonly + --ignore-soname-deps=n handles this)
--
Thanks,
Zac
* Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
2018-11-18 22:40 ` Zac Medico
2018-11-19 2:51 ` Rich Freeman
@ 2018-11-19 10:45 ` M. J. Everitt
1 sibling, 0 replies; 6+ messages in thread
From: M. J. Everitt @ 2018-11-19 10:45 UTC
To: gentoo-dev
[-- Attachment #1.1: Type: text/plain, Size: 3200 bytes --]
On 18/11/18 22:40, Zac Medico wrote:
> On 11/18/18 1:55 PM, Rich Freeman wrote:
>> On Sun, Nov 18, 2018 at 4:10 PM Roy Bamford <neddyseagoon@gentoo.org> wrote:
>>> Replying off list because I am not on the whitelist.
>> That seems odd.
>>
>>> 1) append a uuid to each filename. Generated when the bin package file is generated.
>>> 2) encode the hostname of the machine that generated the file
>>> 3) encode the use flags in the filename.
>> So, I brought up this same issue in the earlier discussion and it was
>> considered out of scope, and I think this is fair. The GLEP does not
>> specify filename, and IMO the standard for what goes INSIDE the file
>> will work just fine with any future enhancements that address exactly
>> this use case.
>>
>> Besides your case of building for a cluster, another use case is
>> having a central binary repo that portage could check and utilize when
>> a user's preferences happen to match what is pre-built.
>>
>> I suggest we start a different thread for any additional discussion of
>> this use case. I was thinking and it probably wouldn't be super-hard
>> to actually start building something like this. But, I don't want to
>> derail this GLEP as I don't see any reason designing something like
>> this needs to hold up the binary package format. Both the existing
>> and proposed binary package formats will encode any metadata needed by
>> the package manager inside the file, and the only extension we need is
>> to encode identifying info in the filename.
>>
>> My idea is to basically have portage generate a tag with all the info
>> needed to identify the "right" package, take a hash of it, and then
>> stick that in the filename. Then when portage is looking for a binary
>> package to use at install time it generates the same tag using the
>> same algorithm and looks for a matching hash. If a hit is found then
>> it reads the complete metadata in the file and applies all the sanity
>> checks it already does. Generating of binary packages with the hash
>> could be made optional, and portage could also be configured to first
>> look for the matching hash, then fall back to the existing naming
>> convention, so that it would be compatible with existing generic
>> names. So, users would get a choice as to whether they want to build
>> up a library of these packages, or just have each build overwrite the
>> last.
>>
>> Then the next step would be to allow these files to be fetched from a
>> binary repo optionally, and then finally we'd need tools to create the
>> repo. But, this step isn't needed for your use case. With the proper
>> optional switches you could utilize as much of this scheme as you
>> like.
>>
>> Also, you could optionally choose how much you want portage to encode
>> in the tag and look for. Are you very fussy and only want a binary
>> package with matching CFLAGS/USE/whatever? Or is just matching
>> USE/arch/etc enough? Some of the existing portage options could
>> potentially be re-used here.
> We've already had this handled for a couple years now, via
> FEATURES=binpkg-multi-instance.
Working fine for me for catalyst ARM runs ...