public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] GSoC - cache sync/self-contained ebuilds
@ 2011-03-23  9:39 Michael Seifert
  2011-03-23 10:12 ` Fabian Groffen
From: Michael Seifert @ 2011-03-23  9:39 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gentoo team,

the other SoC ideas of interest are projects #14 [1] and #25 [2].
The idea is to automatically create ebuild descriptors that contain
metadata only. This way, server load on emerge --sync will be reduced,
since the ebuilds will only be fetched if the package is about to be
installed.
In my opinion, this is the first step to take before trying to implement
the self-contained ebuilds. I think of them as Python eggs that contain
everything you need for installation (ebuild, patches, eclasses, sources).
Are these packaged ebuilds meant to be a replacement for the current
ebuilds in the long term? If so, the above mentioned reduction of server
load and network traffic would be diminished. Say you want to install 5
packages that use the eutils.eclass, you will have to download it 5
times (in a compressed archive of course).

A tool for creating the packaged ebuilds does not seem to cause much
trouble, either. What seems a bit more difficult to me, though, are the
changes to portage.

At first glance, the rough specifications and tasks seem pretty
straightforward:
1. Create a tool that extracts an ebuild descriptor from an existing
ebuild (containing arch, version, dependencies, ebuild location,...)
2. Make portage work with the ebuild descriptors first and fetch the
required files afterwards
3. Create a tool that assembles an ebuild with its patches, sources, and
eclasses
4. Make portage use the assembled archives

However, since I have merged TWO project ideas, I surely have overlooked
some traps :)
Have I perhaps underestimated points 2 and 4?
Please share your opinions.

[1] http://www.gentoo.org/proj/en/userrel/soc/ideas.xml#doc_chap2_sect14
[2] http://www.gentoo.org/proj/en/userrel/soc/ideas.xml#doc_chap2_sect25


Best regards and thanks in advance
Michael Seifert
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2Jv88ACgkQnzX+Jf4GTUyj3wCgxijF5HzPswow4gsqqABnBGuT
jsYAmwcj4wI1LznnwCnpWfGWXEKO0Ji9
=nW4G
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23  9:39 [gentoo-soc] GSoC - cache sync/self-contained ebuilds Michael Seifert
@ 2011-03-23 10:12 ` Fabian Groffen
  2011-03-23 17:44   ` Michael Seifert
From: Fabian Groffen @ 2011-03-23 10:12 UTC
  To: gentoo-soc

Hi Michael,

On 23-03-2011 10:39:27 +0100, Michael Seifert wrote:
> At first glance, the rough specifications and tasks seem pretty
> straightforward:
> 1. Create a tool that extracts an ebuild descriptor from an existing
> ebuild (containing arch, version, dependencies, ebuild location,...)
> 2. Make portage work with the ebuild descriptors at first, then fetching
> the required files
> 3. Create a tool that assembles an ebuild with its patches, sources, and
> eclasses
> 4. Make portage use the assembled archives

Can you quantify the gains here?  How much space do you win by removing
the build-recipe code?  Could you also do with the metadata directory
alone?  How much do you lose to fetch the ebuilds you need eventually?
Do you intend to cache "full" ebuilds?

In short:
What do you win, when and how?

:)

Regards,

-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23 10:12 ` Fabian Groffen
@ 2011-03-23 17:44   ` Michael Seifert
  2011-03-23 18:43     ` Rich Freeman
From: Michael Seifert @ 2011-03-23 17:44 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23.03.2011 11:12, Fabian Groffen wrote:
> Hi Michael,
> 
> On 23-03-2011 10:39:27 +0100, Michael Seifert wrote:
>> At first glance, the rough specifications and tasks seem pretty
>> straightforward:
>> 1. Create a tool that extracts an ebuild descriptor from an existing
>> ebuild (containing arch, version, dependencies, ebuild location,...)
>> 2. Make portage work with the ebuild descriptors at first, then fetching
>> the required files
>> 3. Create a tool that assembles an ebuild with its patches, sources, and
>> eclasses
>> 4. Make portage use the assembled archives
> 
> Can you quantify the gains here?  How much space do you win by removing
> the build-recipe code?  Could you also do with the metadata directory
> alone? 

The metadata should be sufficient for the ebuild descriptors, yes. But I
still have to check whether fetch restrictions or interactive
installations are contained in the metadata. Here are some very early
estimates based on my local portage tree (without excludes):

size of /usr/portage/ without distfiles = 276.4 MB
size of metadata + profiles + licenses + some other files = 34.0 MB
After "steps" 1 and 2, the tree synced by emerge --sync would shrink to
approx. 12.28 % of the current size. There is room for optimizations, of
course.
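
The arithmetic behind this estimate can be checked with a few lines of
Python. This is only a sketch: the two sizes are the rounded figures
quoted above, and the helper name remaining_share is made up for this
example. With the rounded inputs the ratio comes out at roughly 12.3 %,
so the 12.28 % presumably stems from unrounded sizes.

```python
# Sizes in MB, copied from the estimate above (rounded values).
TREE_WITHOUT_DISTFILES_MB = 276.4
METADATA_ONLY_MB = 34.0


def remaining_share(full_mb: float, reduced_mb: float) -> float:
    """Return the reduced size as a percentage of the full size."""
    return reduced_mb / full_mb * 100.0


share = remaining_share(TREE_WITHOUT_DISTFILES_MB, METADATA_ONLY_MB)
saved_mb = TREE_WITHOUT_DISTFILES_MB - METADATA_ONLY_MB
print(f"metadata-only tree: {share:.1f} % of the full tree, {saved_mb:.1f} MB saved")
```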


> How much do you lose to fetch the ebuilds you need eventually?

I cannot tell you at this stage, sorry. The real loss for steps 3/4 has
to be measured after the implementation. The problem with the estimates
here is that the assembled ebuilds also contain the sources and the
eclasses. Maybe I will do some number crunching on a few selected ebuilds.

> Do you intend to cache "full" ebuilds?

I am not sure whether I understood you correctly, but basically yes. Say
I want to install sys-kernel/gentoo-sources: "emerge -av gentoo-sources"
will look at the ebuild descriptor (like the respective metadata file).
With the descriptor's help, it identifies the required dependencies,
checks keywords/masking, and so on. Only after you hit Enter to install
does emerge fetch the real ebuild and proceed as usual.

Best regards,
Michael Seifert
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2KMYIACgkQnzX+Jf4GTUzAnQCfdzr+j6jVagYIS557vRERA6yJ
MREAoMaJ1jPo/zUi4Y3FFq2oOdgEHkGI
=yuVO
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23 17:44   ` Michael Seifert
@ 2011-03-23 18:43     ` Rich Freeman
  2011-03-23 19:47       ` Donnie Berkholz
From: Rich Freeman @ 2011-03-23 18:43 UTC
  To: gentoo-soc; +Cc: Michael Seifert

On Wed, Mar 23, 2011 at 1:44 PM, Michael Seifert
<michael.seifert@gmx.net> wrote:
> On 23.03.2011 11:12, Fabian Groffen wrote:
>> How much do you lose to fetch the ebuilds you need eventually?
>
> I cannot tell you at this stage, sorry. The real loss for steps 3/4 has
> to be measured after the implementation. The problem with the estimates
> here is that the assembled ebuilds also contain the sources and the
> eclasses. Maybe I will do some number crunching on a few selected ebuilds.
>

A more critical factor could be the dependencies - unless we otherwise
cache them.  To install a package you need to walk the dependency tree
(well, at least until you hit installed packages with the right USE
flags).  That requires one set of fetches for each level you traverse,
and that means at least one round trip per level.
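
To make that cost concrete, here is a minimal sketch (not Portage code)
that counts the fetch rounds for a made-up dependency graph. The package
names, the EXAMPLE_DEPS table and the helper count_fetch_rounds are all
hypothetical; the only assumption is that each level of the tree can be
fetched in one batch once the previous level is known.

```python
from typing import Dict, List, Set


def count_fetch_rounds(deps: Dict[str, List[str]], target: str,
                       installed: Set[str]) -> int:
    """Count the fetch rounds needed to resolve `target`.

    Each round fetches the descriptors of all packages discovered in the
    previous round, so the result is the number of dependency levels that
    contain at least one package that is not installed yet.
    """
    rounds = 0
    seen = set(installed)
    frontier = [target] if target not in seen else []
    while frontier:
        rounds += 1  # one round trip for this level
        seen.update(frontier)
        next_level: List[str] = []
        for pkg in frontier:
            for dep in deps.get(pkg, []):
                if dep not in seen and dep not in next_level:
                    next_level.append(dep)
        frontier = next_level
    return rounds


# Hypothetical dependency graph, for illustration only.
EXAMPLE_DEPS = {
    "app/editor": ["lib/gui", "lib/regex"],
    "lib/gui": ["lib/render"],
    "lib/render": ["lib/zlib"],
    "lib/regex": [],
    "lib/zlib": [],
}

print(count_fetch_rounds(EXAMPLE_DEPS, "app/editor", installed={"lib/zlib"}))
```

With everything cached locally the walk costs no network round trips at
all, which is the tradeoff in question.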

Is this a solution in search of a problem?  It seems like there are a
lot of tradeoffs with an approach like this.  If space, compression
CPU time, etc. is the real issue, would it make more sense to just gzip all
the ebuilds or something?

Rich




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23 18:43     ` Rich Freeman
@ 2011-03-23 19:47       ` Donnie Berkholz
  2011-03-23 21:01         ` Michael Seifert
From: Donnie Berkholz @ 2011-03-23 19:47 UTC
  To: gentoo-soc

On 14:43 Wed 23 Mar     , Rich Freeman wrote:
> On Wed, Mar 23, 2011 at 1:44 PM, Michael Seifert
> <michael.seifert@gmx.net> wrote:
> > I cannot tell you at this stage, sorry. The real loss for steps 3/4 
> > has to be measured after the implementation. The problem with the 
> > estimates here is that the assembled ebuilds also contain the 
> > sources and the eclasses. Maybe I will do some number crunching on a 
> > few selected ebuilds.
> 
> A more critical factor could be the dependencies - unless we otherwise 
> cache them.  To install a package you need to walk the dependency tree 
> (well, at least until you hit installed packages with the right USE 
> flags).  That requires one set of fetches for each level you traverse, 
> and that means at least one round trip per level.

Perhaps both of you need to take a quick stroll through the metadata 
cache (/usr/portage/metadata/cache/) and check out what those files look 
like. =)

Here's a description of the format from our Package Manager 
Specification (PMS):

http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-16000014
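
For readers who have not opened one of those files: each cache entry is a
plain list of lines where the position of a line determines its meaning.
The sketch below reads such an entry into a dictionary. The field order
in FIELD_ORDER is written down from memory and should be treated as an
assumption to verify against the PMS chapter linked above; the sample
entry and the helper parse_cache_entry are made up for illustration.

```python
from typing import Dict

# ASSUMPTION: line order of a cache entry, to be checked against the PMS.
FIELD_ORDER = [
    "DEPEND", "RDEPEND", "SLOT", "SRC_URI", "RESTRICT", "HOMEPAGE",
    "LICENSE", "DESCRIPTION", "KEYWORDS", "INHERITED", "IUSE",
    "REQUIRED_USE", "PDEPEND", "PROVIDE", "EAPI", "PROPERTIES",
    "DEFINED_PHASES",
]


def parse_cache_entry(text: str) -> Dict[str, str]:
    """Map each line of a cache entry to the field name at that position."""
    lines = text.split("\n")
    return {
        name: (lines[index] if index < len(lines) else "")
        for index, name in enumerate(FIELD_ORDER)
    }


# Hypothetical entry; real files live under metadata/cache/<category>/.
SAMPLE_ENTRY = "\n".join([
    ">=sys-libs/zlib-1.2", "sys-libs/zlib", "0",
    "http://example.org/foo-1.0.tar.gz", "", "http://example.org/",
    "GPL-2", "Example package", "amd64 x86", "eutils", "ssl",
    "", "", "", "3", "", "compile install",
])

entry = parse_cache_entry(SAMPLE_ENTRY)
print(entry["RDEPEND"], entry["KEYWORDS"], entry["EAPI"])
```

Dependencies, keywords and the EAPI are all available from this file
without ever sourcing the ebuild.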

-- 
Thanks,
Donnie

Donnie Berkholz
Admin, Summer of Code
Gentoo Linux
Blog: http://dberkholz.com


* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23 19:47       ` Donnie Berkholz
@ 2011-03-23 21:01         ` Michael Seifert
  2011-03-24 16:58           ` Zac Medico
From: Michael Seifert @ 2011-03-23 21:01 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23.03.2011 20:47, Donnie Berkholz wrote:
> On 14:43 Wed 23 Mar     , Rich Freeman wrote:
>> On Wed, Mar 23, 2011 at 1:44 PM, Michael Seifert
>> <michael.seifert@gmx.net> wrote:
>>> I cannot tell you at this stage, sorry. The real loss for steps 3/4 
>>> has to be measured after the implementation. The problem with the 
>>> estimates here is that the assembled ebuilds also contain the 
>>> sources and the eclasses. Maybe I will do some number crunching on a 
>>> few selected ebuilds.
>>
>> A more critical factor could be the dependencies - unless we otherwise 
>> cache them.  To install a package you need to walk the dependency tree 
>> (well, at least until you hit installed packages with the right USE 
>> flags).  That requires one set of fetches for each level you traverse, 
>> and that means at least one round trip per level.

You are right. I only considered the size improvements, not the changes
to speed.
Anyway, is the calculation of dependencies handled differently in
current portage versions?

> 
> Perhaps both of you need to take a quick stroll through the metadata 
> cache (/usr/portage/metadata/cache/) and check out what those files look 
> like. =)
> 
> Here's a description of the format from our Package Manager 
> Specification (PMS):
> 
> http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-16000014
> 

I already did, but the documentation helps a lot, thanks. This way one
doesn't have to match the digits in the metadata (e.g. EAPI version)
with those in the ebuild to know what they are good for :)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2KX64ACgkQnzX+Jf4GTUwQaQCgiR5xig03B6ELbDuOP+GkbUSA
KW0AnAtCKIaK73NGOqZ9KoyhDmEkW5f8
=gyhl
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-23 21:01         ` Michael Seifert
@ 2011-03-24 16:58           ` Zac Medico
  2011-03-27 14:28             ` Michael Seifert
From: Zac Medico @ 2011-03-24 16:58 UTC
  To: gentoo-soc

On 03/23/2011 02:01 PM, Michael Seifert wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 23.03.2011 20:47, Donnie Berkholz wrote:
>> On 14:43 Wed 23 Mar     , Rich Freeman wrote:
>>> On Wed, Mar 23, 2011 at 1:44 PM, Michael Seifert
>>> <michael.seifert@gmx.net> wrote:
>>>> I cannot tell you at this stage, sorry. The real loss for steps 3/4 
>>>> has to be measured after the implementation. The problem with the 
>>>> estimates here is that the assembled ebuilds also contain the 
>>>> sources and the eclasses. Maybe I will do some number crunching on a 
>>>> few selected ebuilds.
>>>
>>> A more critical factor could be the dependencies - unless we otherwise 
>>> cache them.  To install a package you need to walk the dependency tree 
>>> (well, at least until you hit installed packages with the right USE 
>>> flags).  That requires one set of fetches for each level you traverse, 
>>> and that means at least one round trip per level.
> 
> You are right. I only considered the size improvements, not the changes
> to speed.
> Anyway, is the calculation of dependencies handled differently in
> current portage versions?

Well, it would be inefficient to open separate TCP connections for
individual metadata files since there are so many of them and they are
so small. This is why package managers typically download the metadata
for all packages as a single bundle. For example, see the type of
metadata bundle that is used to implement PORTAGE_BINHOST support:

  http://tinderbox.dev.gentoo.org/default-linux/x86/Packages
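
As a rough illustration of why a single bundle is cheap to consume, the
sketch below parses an index of that general shape: a header block of
"KEY: value" lines followed by one blank-line-separated block per
package. The sample text and the helper parse_index are made up for this
example, and the exact set of keys in a real Packages file may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical index contents, for illustration only.
SAMPLE_INDEX = """\
ARCH: x86
PACKAGES: 2
TIMESTAMP: 1300900000

CPV: app-misc/foo-1.0
DEPEND: >=sys-libs/zlib-1.2
SIZE: 12345

CPV: sys-libs/zlib-1.2.5
SIZE: 67890
"""


def parse_index(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split an index into its header and a list of per-package entries."""
    parsed: List[Dict[str, str]] = []
    for block in text.strip().split("\n\n"):
        entry: Dict[str, str] = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not "KEY: value"
                entry[key.strip()] = value.strip()
        if entry:
            parsed.append(entry)
    header = parsed[0] if parsed else {}
    return header, parsed[1:]


header, packages = parse_index(SAMPLE_INDEX)
print(header["PACKAGES"], [pkg["CPV"] for pkg in packages])
```

One HTTP request yields the metadata for every package, instead of one
request per package.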

It's conceivable that you could simply use rsync to sync the
metadata/cache/ subdirectory from
rsync://rsync.gentoo.org/gentoo-portage/. However, since the rsync tree
constantly mutates and doesn't provide any kind of version control, it
would not be very practical to use it in this way. If you fetch the
metadata and the ebuilds separately, you need a way to guarantee that
you can fetch exactly the same revisions of ebuilds that the earlier
fetched metadata corresponds to.
-- 
Thanks,
Zac




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-24 16:58           ` Zac Medico
@ 2011-03-27 14:28             ` Michael Seifert
  2011-03-27 19:39               ` Zac Medico
From: Michael Seifert @ 2011-03-27 14:28 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24.03.2011 17:58, Zac Medico wrote:
> Well, it would be inefficient to open separate TCP connections for
> individual metadata files since there are so many of them and they are
> so small. This is why package managers typically download the metadata
> for all packages as a single bundle. For example, see the type of
> metadata bundle that is used to implement PORTAGE_BINHOST support:
> 
>   http://tinderbox.dev.gentoo.org/default-linux/x86/Packages
> 

Is there a specific reason why the PORTAGE_BINHOST metadata is different
from the metadata/cache format?
I like the BINHOST metadata better, even if it is split up into several
files, because it would already contain the ebuild version it was
generated for. Probably it would be a good idea to merge the information
of both metadata into a single unified format?

This would also solve the problem with the missing version control
(described below) as well as simplifying the way portage handles
metadata. On the other hand it would be an even more substantial change,
which is not necessarily a bad thing. Portage is supposed to work the
same way as before – just faster.

> It's conceivable that you could simply use rsync to sync the
> metadata/cache/ subdirectory from
> rsync://rsync.gentoo.org/gentoo-portage/. However, since the rsync tree
> constantly mutates and doesn't provide any kind of version control, it
> would not be very practical to use it in this way. If you fetch the
> metadata and the ebuilds separately, you need a way to guarantee that
> you can fetch exactly the same revisions of ebuilds that the earlier
> fetched metadata corresponds to.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2PSYAACgkQnzX+Jf4GTUyz7ACcCV44bXSEwoyCg/6uMz8E9/2g
c+EAn1m/BpF7rKkSSmpouousupVCbUHL
=G4GS
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-27 14:28             ` Michael Seifert
@ 2011-03-27 19:39               ` Zac Medico
  2011-03-29 14:31                 ` Michael Seifert
From: Zac Medico @ 2011-03-27 19:39 UTC
  To: gentoo-soc

On 03/27/2011 07:28 AM, Michael Seifert wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 24.03.2011 17:58, Zac Medico wrote:
>> Well, it would be inefficient to open separate TCP connections for
>> individual metadata files since there are so many of them and they are
>> so small. This is why package managers typically download the metadata
>> for all packages as a single bundle. For example, see the type of
>> metadata bundle that is used to implement PORTAGE_BINHOST support:
>>
>>   http://tinderbox.dev.gentoo.org/default-linux/x86/Packages
>>
> 
> Is there a specific reason why the PORTAGE_BINHOST metadata is different
> from the metadata/cache format?

They have minor differences because they were designed for slightly
different use cases:

The metadata/cache format was designed to be distributed together with
the ebuilds that it was generated from. Its main drawback is that it can
be slow to read many small files since it may require lots of disk
seeks. We could pack them all into a single file, similar to one that
PORTAGE_BINHOST uses. That would help for tools like eix since it's
faster to read one big file than many small files. However, if it was
fetched earlier and separate from the ebuilds, it wouldn't be very
practical for dependency calculations unless you provided a way to fetch
exactly the same revisions of ebuilds (and inherited eclasses which can
modify dependencies) that the earlier fetched metadata corresponds to.
For example, the cache could be made to refer to a UUID that would be used
to generate a URI in order to fetch the particular revision of the
ebuild/eclass bundle that exactly corresponds to the cache entry.
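
A minimal sketch of that UUID-to-URI translation could look like the
following. Nothing here is an existing protocol: the mirror host, the
path layout in BUNDLE_URI_TEMPLATE and the helper bundle_uri are
assumptions made up purely to show the idea.

```python
from urllib.parse import quote

# ASSUMPTION: hypothetical URI layout; no such layout has been defined.
BUNDLE_URI_TEMPLATE = (
    "http://mirror.example.org/bundles/{category}/{package}/{revision}.tar.gz"
)


def bundle_uri(cpv: str, revision_id: str) -> str:
    """Translate a cache entry's (cpv, revision id) pair into a fetch URI."""
    category, _, package = cpv.partition("/")
    if not category or not package:
        raise ValueError(f"expected 'category/package-version', got {cpv!r}")
    return BUNDLE_URI_TEMPLATE.format(
        category=quote(category, safe=""),
        package=quote(package, safe=""),
        revision=quote(revision_id, safe=""),
    )


print(bundle_uri("sys-apps/portage-2.1.9.44", "3f2a9c"))
```

Because the identifier is part of the URI, a bundle fetched later is
still exactly the revision the cache entry was generated from, no matter
how often the tree has changed in between.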

The PORTAGE_BINHOST cache format is better than the metadata/cache
format for the use case that it's designed for, however the current
design has a race condition which has been experienced by chromium-os
developers:

  http://code.google.com/p/chromium-os/issues/detail?id=3225

> I like the BINHOST metadata better, even if it is split up into several
> files, because it would already contain the ebuild version it was
> generated for. Probably it would be a good idea to merge the information
> of both metadata into a single unified format?
> This would also solve the problem with the missing version control
> (described below) as well as simplifying the way portage handles
> metadata. On the other hand it would be an even more substantial change,
> which is not necessarily a bad thing. Portage is supposed to work the
> same way as before – just faster.

Well, a new cache format is only part of the solution. In order to
provide revision control that's necessary for practical dependency
calculations when the cache is fetched earlier than the
ebuilds/eclasses, you're also going to need to create individually
fetchable revisioned ebuild/eclass bundles that the cache will refer to
(without any race conditions).

>> It's conceivable that you could simply use rsync to sync the
>> metadata/cache/ subdirectory from
>> rsync://rsync.gentoo.org/gentoo-portage/. However, since the rsync tree
>> constantly mutates and doesn't provide any kind of version control, it
>> would not be very practical to use it in this way. If you fetch the
>> metadata and the ebuilds separately, you need a way to guarantee that
>> you can fetch exactly the same revisions of ebuilds that the earlier
>> fetched metadata corresponds to.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAk2PSYAACgkQnzX+Jf4GTUyz7ACcCV44bXSEwoyCg/6uMz8E9/2g
> c+EAn1m/BpF7rKkSSmpouousupVCbUHL
> =G4GS
> -----END PGP SIGNATURE-----
> 


-- 
Thanks,
Zac




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-27 19:39               ` Zac Medico
@ 2011-03-29 14:31                 ` Michael Seifert
  2011-03-29 15:47                   ` Zac Medico
From: Michael Seifert @ 2011-03-29 14:31 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 27.03.2011 21:39, Zac Medico wrote:
> On 03/27/2011 07:28 AM, Michael Seifert wrote:
> I like the BINHOST metadata better, even if it is split up into several
> files, because it would already contain the ebuild version it was
> generated for. Probably it would be a good idea to merge the information
> of both metadata into a single unified format?
> This would also solve the problem with the missing version control
> (described below) as well as simplifying the way portage handles
> metadata. On the other hand it would be an even more substantial change,
> which is not necessarily a bad thing. Portage is supposed to work the
> same way as before – just faster.
> 
>> Well, a new cache format is only part of the solution. In order to
>> provide revision control that's necessary for practical dependency
>> calculations when the cache is fetched earlier than the
>> ebuilds/eclasses, you're also going to need to create individually
>> fetchable revisioned ebuild/eclass bundles that the cache will refer to
>> (without any race conditions).

If I understood your point correctly, there can be problems, if there
are changes to eclasses after a user ran emerge --sync and before the
ebuilds are fetched (race condition between server and client). In such
a case, the correct ebuild would be installed using a wrong environment
(i.e. the wrong eclasses), because the metadata is outdated and contains
wrong dependencies. Did this cover your outlined case?

I can think of three solutions for this:
1. Self-contained ebuilds
Your proposal of packaging the ebuild together with its eclasses and the
source code would make separate eclass versioning unnecessary, since
the ebuild is always packaged within the correct environment.
(Creates a new race condition, see your link :)

2. Using the VCS
Store the versions of eclasses in the metadata and fetch it from a
repository before installing the ebuilds. This would pull a dependency
on a VCS (CVS in this case), though.

3. Fetching the eclasses together with the metadata
This way you would have a local snapshot of the build environment, which
would eliminate the race condition and the need to identify eclass
versions from the metadata, because it stays consistent.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2R7V0ACgkQnzX+Jf4GTUyUWwCfXhrDkUotVqvaef81Mj27bWF1
WqkAoM4a3WcnkQVBIhtOvnGFjzr3OL5m
=cUbO
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-29 14:31                 ` Michael Seifert
@ 2011-03-29 15:47                   ` Zac Medico
  2011-03-30 17:33                     ` Michael Seifert
From: Zac Medico @ 2011-03-29 15:47 UTC
  To: gentoo-soc

On 03/29/2011 07:31 AM, Michael Seifert wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 27.03.2011 21:39, Zac Medico wrote:
>> On 03/27/2011 07:28 AM, Michael Seifert wrote:
>> I like the BINHOST metadata better, even if it is split up into several
>> files, because it would already contain the ebuild version it was
>> generated for. Probably it would be a good idea to merge the information
>> of both metadata into a single unified format?
>> This would also solve the problem with the missing version control
>> (described below) as well as simplifying the way portage handles
>> metadata. On the other hand it would be an even more substantial change,
>> which is not necessarily a bad thing. Portage is supposed to work the
>> same way as before – just faster.
>>
>>> Well, a new cache format is only part of the solution. In order to
>>> provide revision control that's necessary for practical dependency
>>> calculations when the cache is fetched earlier than the
>>> ebuilds/eclasses, you're also going to need to create individually
>>> fetchable revisioned ebuild/eclass bundles that the cache will refer to
>>> (without any race conditions).
> 
> If I understood your point correctly, there can be problems, if there
> are changes to eclasses after a user ran emerge --sync and before the
> ebuilds are fetched (race condition between server and client). In such
> a case, the correct ebuild would be installed using a wrong environment
> (i.e. the wrong eclasses), because the metadata is outdated and contains
> wrong dependencies. Did this cover your outlined case?

That's mostly correct, however, it's not just the eclasses that
introduce the race condition. Just like the eclasses, the ebuilds
themselves can be modified. In addition, so can the files that may
be included with the ebuilds in the "files" subdirectory ($FILESDIR).

In order to be exhaustively complete, you'd also have to include any
licenses referenced by the LICENSE variable of the ebuild (these are located
in the licenses/ subdirectory of the repository).

You'd probably also want to include any matching package.mask entries
from the global profiles/package.mask file, since this can be very
relevant to dependency calculations.

Finally, there's the user's architecture-specific profile. This can also
affect dependency calculations via things like package.mask,
package.unmask, use.mask, and use.force. If you want to be entirely
exhaustive, then you'll need your ebuild metadata to reference a
snapshot of this profile.

> I can think of three solutions for this:
> 1. Self-contained ebuilds
> Your proposal of packaging the ebuild together with its eclasses and the
> source code would make the need of eclass versioning unnecessary, since
> the ebuild is always packaged within the correct environment.
> (Creates a new race condition, see your link :)

Really, the eclass versioning would not be unnecessary, but it would be
tied directly to the ebuild versioning. If we're going to be completely
exhaustive, then it might make sense to have the ebuild metadata
reference separate eclass, license, package.mask, and profile bundles.
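
One way to picture such references is the sketch below. It is only an
illustration under assumptions: the CacheEntry fields, the bundle kinds
and the revision ids are made up, and no such format exists in Portage.
The point it shows is that entries sharing the same eclass, license,
package.mask and profile revisions only cause those bundles to be
fetched once.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

BundleRef = Tuple[str, str]  # (bundle kind, revision id)


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical cache entry that pins every input it was generated from."""
    cpv: str
    ebuild_rev: str
    eclass_rev: str
    license_rev: str
    package_mask_rev: str
    profile_rev: str

    def bundle_refs(self) -> Set[BundleRef]:
        """Return all bundles this entry depends on."""
        return {
            (f"ebuild:{self.cpv}", self.ebuild_rev),
            ("eclass", self.eclass_rev),
            ("license", self.license_rev),
            ("package.mask", self.package_mask_rev),
            ("profile", self.profile_rev),
        }


def bundles_to_fetch(entries: Iterable[CacheEntry],
                     have: Set[BundleRef]) -> List[BundleRef]:
    """List the bundles that are referenced but not available locally."""
    needed: Set[BundleRef] = set()
    for entry in entries:
        needed |= entry.bundle_refs()
    return sorted(needed - have)


foo = CacheEntry("app-misc/foo-1.0", "e1", "c7", "l2", "m5", "p3")
bar = CacheEntry("app-misc/bar-2.0", "e9", "c7", "l2", "m5", "p3")
print(bundles_to_fetch([foo, bar], have={("eclass", "c7")}))
```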

> 2. Using the VCS
> Store the versions of eclasses in the metadata and fetch it from a
> repository before installing the ebuilds. This would pull a dependency
> on a VCS (CVS in this case), though.

Not necessarily. If the ebuild metadata contains UUIDs that the client
can translate to URIs using a predefined protocol, then the client
simply needs to be aware of the protocol so that it can fetch the
appropriate URIs.

> 3. Fetching the eclasses together with the metadata
> This way you would have a local snapshot of the build environment, which
> would eliminate the race condition and the need to identify eclass
> versions from the metadata, because it stays consistent.

Right. But if you decide to split out license, package.mask, and profile
bundles as discussed above, you might also decide to do that for
eclasses as well.
-- 
Thanks,
Zac




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-29 15:47                   ` Zac Medico
@ 2011-03-30 17:33                     ` Michael Seifert
  2011-03-30 18:11                       ` Zac Medico
From: Michael Seifert @ 2011-03-30 17:33 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 29.03.2011 17:47, Zac Medico wrote:
>> 2. Using the VCS
>> Store the versions of eclasses in the metadata and fetch it from a
>> repository before installing the ebuilds. This would pull a dependency
>> on a VCS (CVS in this case), though.
> 
> Not necessarily. If the ebuild metadata contains UUIDs that the client
> can translate to URIs using a predefined protocol, then the client
> simply needs to be aware of the protocol so that it can fetch the
> appropriate URIs.
> 

Although I like the idea with the UUID, I don't know if it is really
worth the trouble. Due to the number of eclasses, let alone the files in
/usr/portage/profiles, there is a huge number of permutations and you
would need a very complex (in terms of size) UUID.

Is it enough for an SoC project to "just" make portage leave out the
ebuilds on synchronization?
This would be of similar scope as the original idea (Cache sync).

I also miss the documentation a bit, to be honest. Maybe improving the
portage documentation would be a nice addition?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2TaX8ACgkQnzX+Jf4GTUxsdgCfRXDz4MGSBkz7iVugjMjSwNr1
/loAn2i07Y0tVLgfvGWHipiFW+RVa5JP
=Frv4
-----END PGP SIGNATURE-----




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-30 17:33                     ` Michael Seifert
@ 2011-03-30 18:11                       ` Zac Medico
  2011-03-31 11:55                         ` Michael Seifert
From: Zac Medico @ 2011-03-30 18:11 UTC
  To: gentoo-soc

On 03/30/2011 10:33 AM, Michael Seifert wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 29.03.2011 17:47, Zac Medico wrote:
>>> 2. Using the VCS
>>> Store the versions of eclasses in the metadata and fetch it from a
>>> repository before installing the ebuilds. This would pull a dependency
>>> on a VCS (CVS in this case), though.
>>
>> Not necessarily. If the ebuild metadata contains UUIDs that the client
>> can translate to URIs using a predefined protocol, then the client
>> simply needs to be aware of the protocol so that it can fetch the
>> appropriate URIs.
>>
> 
> Although I like the idea with the UUID, I don't know if it is really
> worth the trouble. Due to the number of eclasses let alone the files in
> /usr/portage/profile, there is a huge number of permutations and you
> would need a very complex (in terms of size) UUID.

For our purposes, it's really not necessary to use full RFC 4122 128-bit
UUIDs. For example, if the repository is only refreshed once every 30
minutes (like the rsync tree currently is), then timestamps with 1
minute precision would be more than adequate to uniquely identify a
given revision of a particular bundle.
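
As a sketch of how small such an identifier can be, the snippet below
turns a refresh time into a minute-precision UTC string. The helper name
revision_id and the chosen format are assumptions for illustration; the
30 minute interval is the one mentioned above.

```python
from datetime import datetime, timezone

REFRESH_INTERVAL_MINUTES = 30  # refresh interval of the rsync tree, as above


def revision_id(moment: datetime) -> str:
    """Return a minute-precision UTC identifier, e.g. '201103301800'."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


# Two consecutive refreshes, 30 minutes apart (made-up times).
first = datetime(2011, 3, 30, 18, 0, 42, tzinfo=timezone.utc)
second = datetime(2011, 3, 30, 18, 30, 5, tzinfo=timezone.utc)
print(revision_id(first), revision_id(second))
```

Two refreshes would only collide if they happened within the same
minute, which a 30 minute refresh interval rules out.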

> Is it enough for an SoC project to "just" make portage leave out the
> ebuilds on synchronization?
> This would be of similar scope as the original idea (Cache sync).

The metadata/cache/ and profiles/ subdirectories would be enough
information to do correct dependency calculations, but in practice I
think the race conditions involved in trying to actually build anything
from those calculations would lead to overwhelming dissatisfaction and
complaints from users.

> I also miss the documentation a bit, to be honest. Maybe improving the
> portage documentation would be a nice addition?

Yes, we can always use more documentation.

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAk2TaX8ACgkQnzX+Jf4GTUxsdgCfRXDz4MGSBkz7iVugjMjSwNr1
> /loAn2i07Y0tVLgfvGWHipiFW+RVa5JP
> =Frv4
> -----END PGP SIGNATURE-----
> 


-- 
Thanks,
Zac




* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  2011-03-30 18:11                       ` Zac Medico
@ 2011-03-31 11:55                         ` Michael Seifert
From: Michael Seifert @ 2011-03-31 11:55 UTC
  To: gentoo-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 30.03.2011 20:11, Zac Medico wrote:
> On 03/30/2011 10:33 AM, Michael Seifert wrote:
> On 29.03.2011 17:47, Zac Medico wrote:
>>>>> 2. Using the VCS
>>>>> Store the versions of eclasses in the metadata and fetch it from a
>>>>> repository before installing the ebuilds. This would pull a dependency
>>>>> on a VCS (CVS in this case), though.
>>>>
>>>> Not necessarily. If the ebuild metadata contains UUIDs that the client
>>>> can translate to URIs using a predefined protocol, then the client
>>>> simply needs to be aware of the protocol so that it can fetch the
>>>> appropriate URIs.
>>>>
> 
> Although I like the idea with the UUID, I don't know if it is really
> worth the trouble. Due to the number of eclasses let alone the files in
> /usr/portage/profile, there is a huge number of permutations and you
> would need a very complex (in terms of size) UUID.
> 
>> For our purposes, it's really not necessary to use full RFC 4122 128-bit
>> UUIDs. For example, if the repository is only refreshed once every 30
>> minutes (like the rsync tree currently is), then timestamps with 1
>> minute precision would be more than adequate to uniquely identify a
>> given revision of a particular bundle.
> 
> Is it enough for an SoC project to "just" make portage leave out the
> ebuilds on synchronization?
> This would be of similar scope as the original idea (Cache sync).
> 
>> The metadata/cache/ and profiles/ subdirectories would be enough
>> information to do correct dependency calculations, but in practice I
>> think the race conditions involved in trying to actually build anything
>> from those calculations would lead to overwhelming dissatisfaction and
>> complaints from users.
> 

Now I got it! I was overcomplicating things with the UUIDs. Thank
you for the clarification!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2Ua6QACgkQnzX+Jf4GTUynmACgqmBA3ZLBI2hQhtGRRWkBt8tl
icAAoJerNgjOkLB2zRGullnOSEnTwjOY
=YmHm
-----END PGP SIGNATURE-----




Thread overview: 14 messages
2011-03-23  9:39 [gentoo-soc] GSoC - cache sync/self-contained ebuilds Michael Seifert
2011-03-23 10:12 ` Fabian Groffen
2011-03-23 17:44   ` Michael Seifert
2011-03-23 18:43     ` Rich Freeman
2011-03-23 19:47       ` Donnie Berkholz
2011-03-23 21:01         ` Michael Seifert
2011-03-24 16:58           ` Zac Medico
2011-03-27 14:28             ` Michael Seifert
2011-03-27 19:39               ` Zac Medico
2011-03-29 14:31                 ` Michael Seifert
2011-03-29 15:47                   ` Zac Medico
2011-03-30 17:33                     ` Michael Seifert
2011-03-30 18:11                       ` Zac Medico
2011-03-31 11:55                         ` Michael Seifert
