* [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-23  9:39 UTC
  To: gentoo-soc

Hello Gentoo team,

the other SoC ideas of interest are projects #14 [1] and #25 [2].

The idea is to automatically create ebuild descriptors that contain metadata only. This way, server load on emerge --sync will be reduced, since the ebuilds will only be fetched if the package is about to be installed. In my opinion, this is the first step to take before trying to implement the self-contained ebuilds. I think of them as Python eggs that contain everything you need for installation (ebuild, patches, eclasses, sources).

Are these packaged ebuilds meant to be a replacement for the current ebuilds in the long term? If so, the above-mentioned reduction of server load and network traffic would be diminished: if you install 5 packages that use eutils.eclass, you have to download it 5 times (in a compressed archive, of course).

A tool for creating the packaged ebuilds does not seem to cause much trouble, either. What seems a bit more difficult to me, though, are the changes to portage. At first glance, the rough specification breaks down into four straightforward tasks:

1. Create a tool that extracts an ebuild descriptor from an existing
   ebuild (containing arch, version, dependencies, ebuild location, ...)
2. Make portage work with the ebuild descriptors first, fetching the
   required files afterwards
3. Create a tool that assembles an ebuild with its patches, sources,
   and eclasses
4. Make portage use the assembled archives

However, since I have merged two project ideas, I have surely overlooked some traps :) Perhaps I underestimated points 2 and 4? Please share your opinions.

[1] http://www.gentoo.org/proj/en/userrel/soc/ideas.xml#doc_chap2_sect14
[2] http://www.gentoo.org/proj/en/userrel/soc/ideas.xml#doc_chap2_sect25

Best regards and thanks in advance,
Michael Seifert
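Step 1 of the list above could be prototyped roughly as follows. This is a deliberately naive sketch that only handles plain one-line `VAR="value"` assignments; real ebuilds are bash scripts, so a production tool would have to source them through portage's own machinery (e.g. egencache) instead. The variable list is illustrative, not normative.

```python
import re

# Keys an ebuild descriptor might carry; illustrative selection.
DESCRIPTOR_KEYS = ("EAPI", "DESCRIPTION", "HOMEPAGE", "SRC_URI",
                   "LICENSE", "SLOT", "KEYWORDS", "IUSE",
                   "DEPEND", "RDEPEND")

# Matches simple KEY="value" or KEY='value' assignments on one line.
_ASSIGN = re.compile(r'^(?P<key>[A-Z_]+)=(?P<q>["\'])(?P<val>.*?)(?P=q)\s*$')

def extract_descriptor(ebuild_text):
    """Pull simple KEY="value" assignments out of an ebuild.

    Anything computed in bash (eclass inheritance, conditionals)
    is invisible to this toy extractor.
    """
    descriptor = {}
    for line in ebuild_text.splitlines():
        m = _ASSIGN.match(line.strip())
        if m and m.group("key") in DESCRIPTOR_KEYS:
            descriptor[m.group("key")] = m.group("val")
    return descriptor

sample = '''
EAPI="4"
DESCRIPTION="Full sources for the Gentoo kernel"
SLOT="0"
KEYWORDS="~amd64 ~x86"
DEPEND="!build? ( sys-devel/make )"
'''
print(extract_descriptor(sample))
```

A real descriptor would of course be serialized to a file per package rather than printed.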
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Fabian Groffen @ 2011-03-23 10:12 UTC
  To: gentoo-soc

Hi Michael,

On 23-03-2011 10:39:27 +0100, Michael Seifert wrote:
> At first glance, the rough specification breaks down into four
> straightforward tasks:
> 1. Create a tool that extracts an ebuild descriptor from an existing
>    ebuild (containing arch, version, dependencies, ebuild location, ...)
> 2. Make portage work with the ebuild descriptors first, fetching the
>    required files afterwards
> 3. Create a tool that assembles an ebuild with its patches, sources,
>    and eclasses
> 4. Make portage use the assembled archives

Can you quantify the gains here? How much space do you win by removing the build-recipe code? Could you also do with the metadata directory alone? How much do you lose to fetch the ebuilds you need eventually? Do you intend to cache "full" ebuilds?

In short: what do you win, when, and how? :)

Regards,
--
Fabian Groffen
Gentoo on a different level
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-23 17:44 UTC
  To: gentoo-soc

Am 23.03.2011 11:12, schrieb Fabian Groffen:
> Can you quantify the gains here? How much space do you win by removing
> the build-recipe code? Could you also do with the metadata directory
> alone?

The metadata should be sufficient for the ebuild descriptions, yes. I still have to check whether fetch restrictions and interactive installations are represented in the metadata.

Here are some very early estimates based on my local portage tree (without excludes):

  size of /usr/portage/ without distfiles                = 276.4 MB
  size of metadata + profiles + licenses + other files   =  34.0 MB

After steps 1 and 2, the data transferred by emerge --sync would shrink to approx. 12.28 % of the current size. There is room for optimization, of course.

> How much do you lose to fetch the ebuilds you need eventually?

I cannot tell you at this stage, sorry. The real loss for steps 3 and 4 has to be measured after the implementation. The problem with estimating it now is that the assembled ebuilds also contain the sources and the eclasses. Maybe I will do some number crunching on a few selected ebuilds.

> Do you intend to cache "full" ebuilds?

I am not sure I understood you correctly, but basically yes. Say I want to install sys-kernel/gentoo-sources: "emerge -av gentoo-sources" looks at the ebuild descriptor (much like the respective metadata file today). With the descriptor's help, it identifies the required dependencies, checks keywords/masking, and so on. Only after you hit enter to confirm the installation does emerge fetch the real ebuild and behave as usual.

Best regards,
Michael Seifert
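As a sanity check on the figures above (pure arithmetic on the rounded megabyte counts; the small gap to the quoted 12.28 % presumably comes from unrounded byte counts):

```python
full_tree_mb = 276.4   # /usr/portage without distfiles, as measured above
metadata_mb = 34.0     # metadata + profiles + licenses + other files

ratio = metadata_mb / full_tree_mb * 100
print(f"sync payload shrinks to about {ratio:.2f}% of the current size")
```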
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Rich Freeman @ 2011-03-23 18:43 UTC
  To: gentoo-soc; +Cc: Michael Seifert

On Wed, Mar 23, 2011 at 1:44 PM, Michael Seifert <michael.seifert@gmx.net> wrote:
> Am 23.03.2011 11:12, schrieb Fabian Groffen:
>> How much do you lose to fetch the ebuilds you need eventually?
>
> I cannot tell you at this stage, sorry. The real loss for steps 3 and 4
> has to be measured after the implementation. The problem with estimating
> it now is that the assembled ebuilds also contain the sources and the
> eclasses. Maybe I will do some number crunching on a few selected
> ebuilds.

A more critical factor could be the dependencies, unless we cache them separately. To install a package you need to walk the dependency tree (at least until you hit installed packages with the right USE flags). That requires one set of fetches for each level you traverse, which means at least one network round trip per level.

Is this a solution in search of a problem? There seem to be a lot of tradeoffs with an approach like this. If space, compression CPU, etc. is the real issue, would it make more sense to just gzip all the ebuilds?

Rich
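Rich's point can be made concrete: if descriptors are fetched on demand, the resolver needs one network round trip per dependency level, because level N+1 is unknown until level N's descriptors have arrived. A toy model (package names and the in-memory graph are invented; in reality each lookup would be a fetch from the server):

```python
# Toy dependency graph; stands in for descriptors fetched over the network.
DEPS = {
    "app/foo": ["lib/bar", "lib/baz"],
    "lib/bar": ["lib/qux"],
    "lib/baz": [],
    "lib/qux": [],
}

def count_fetch_rounds(root, installed=frozenset()):
    """Breadth-first walk; each BFS level costs one batch of fetches."""
    rounds, seen, frontier = 0, {root}, [root]
    while frontier:
        rounds += 1                      # one round trip fetches this level
        nxt = []
        for pkg in frontier:
            for dep in DEPS[pkg]:        # would come from the fetched descriptor
                if dep not in seen and dep not in installed:
                    seen.add(dep)
                    nxt.append(dep)
        frontier = nxt
    return rounds

print(count_fetch_rounds("app/foo"))  # 3 dependency levels -> 3 round trips
```

With everything synced up front (as today), the whole walk is local and costs zero round trips, which is the tradeoff being discussed.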
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Donnie Berkholz @ 2011-03-23 19:47 UTC
  To: gentoo-soc

On 14:43 Wed 23 Mar, Rich Freeman wrote:
> A more critical factor could be the dependencies, unless we cache them
> separately. To install a package you need to walk the dependency tree
> (at least until you hit installed packages with the right USE flags).
> That requires one set of fetches for each level you traverse, which
> means at least one network round trip per level.

Perhaps both of you should take a quick stroll through the metadata cache (/usr/portage/metadata/cache/) and check out what those files look like. =)

Here's a description of the format from our Package Manager Specification (PMS):

http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-16000014

--
Thanks,
Donnie

Donnie Berkholz
Admin, Summer of Code
Gentoo Linux
Blog: http://dberkholz.com
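For readers following Donnie's pointer: in the cache format described there, each file is plain text and a key is identified purely by its line position. A minimal reader might look like this; the key order below is my reading of the PMS section linked above, so verify it against the spec before relying on it.

```python
# Line-position -> key mapping, per my reading of PMS (verify there!).
PMS_CACHE_KEYS = [
    "DEPEND", "RDEPEND", "SLOT", "SRC_URI", "RESTRICT", "HOMEPAGE",
    "LICENSE", "DESCRIPTION", "KEYWORDS", "INHERITED", "IUSE",
    None,  # historically unused slot
    "PDEPEND", "PROVIDE", "EAPI", "PROPERTIES", "DEFINED_PHASES",
]

def parse_flat_cache(text):
    """Map each non-empty line of a cache file to its positional key."""
    entry = {}
    for key, value in zip(PMS_CACHE_KEYS, text.splitlines()):
        if key and value:
            entry[key] = value
    return entry

# A hypothetical cache file for illustration (empty lines = empty values).
sample = "\n".join([
    "",                                     # DEPEND
    "",                                     # RDEPEND
    "0",                                    # SLOT
    "",                                     # SRC_URI
    "", "",                                 # RESTRICT, HOMEPAGE
    "GPL-2",                                # LICENSE
    "Full sources for the Gentoo kernel",   # DESCRIPTION
    "~amd64 ~x86",                          # KEYWORDS
])
print(parse_flat_cache(sample))
```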
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-23 21:01 UTC
  To: gentoo-soc

Am 23.03.2011 20:47, schrieb Donnie Berkholz:
> On 14:43 Wed 23 Mar, Rich Freeman wrote:
>> A more critical factor could be the dependencies, unless we cache them
>> separately. To install a package you need to walk the dependency tree
>> (at least until you hit installed packages with the right USE flags).
>> That requires one set of fetches for each level you traverse, which
>> means at least one network round trip per level.

You are right. I only considered the size improvements, not the changes in speed. Is the calculation of dependencies handled differently in current portage versions, by the way?

> Perhaps both of you should take a quick stroll through the metadata
> cache (/usr/portage/metadata/cache/) and check out what those files
> look like. =)
>
> Here's a description of the format from our Package Manager
> Specification (PMS):
>
> http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-16000014

I already did, but the documentation helps a lot, thanks. This way one doesn't have to match the values in the metadata (e.g. the EAPI version) against those in the ebuild to figure out what they are good for :)
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Zac Medico @ 2011-03-24 16:58 UTC
  To: gentoo-soc

On 03/23/2011 02:01 PM, Michael Seifert wrote:
> You are right. I only considered the size improvements, not the changes
> in speed. Is the calculation of dependencies handled differently in
> current portage versions, by the way?

Well, it would be inefficient to open separate TCP connections for individual metadata files, since there are so many of them and they are so small. This is why package managers typically download the metadata for all packages as a single bundle. For example, see the type of metadata bundle that is used to implement PORTAGE_BINHOST support:

http://tinderbox.dev.gentoo.org/default-linux/x86/Packages

It's conceivable that you could simply use rsync to sync the metadata/cache/ subdirectory from rsync://rsync.gentoo.org/gentoo-portage/. However, since the rsync tree constantly mutates and doesn't provide any kind of version control, it would not be very practical to use it this way. If you fetch the metadata and the ebuilds separately, you need a way to guarantee that you can fetch exactly the same revisions of the ebuilds that the earlier-fetched metadata corresponds to.

--
Thanks,
Zac
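The Packages bundle Zac links to is, in rough shape, a header stanza followed by blank-line-separated `KEY: value` stanzas, one per package. A simple parser for that general shape (the field names in the sample are examples, not a spec):

```python
def parse_packages_index(text):
    """Split a Packages-style index into stanzas of KEY: value pairs."""
    stanzas, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:            # blank line terminates a stanza
                stanzas.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas

# Hypothetical index content for illustration.
sample = """\
CPV: sys-kernel/gentoo-sources-2.6.38
SLOT: 0
KEYWORDS: ~amd64 ~x86

CPV: app-misc/foo-1.0
SLOT: 0
"""
pkgs = parse_packages_index(sample)
print(len(pkgs), pkgs[0]["CPV"])
```

One bundle, one connection: the whole point being made is that this shape amortizes the per-file fetch cost that many tiny cache files would incur.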
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-27 14:28 UTC
  To: gentoo-soc

Am 24.03.2011 17:58, schrieb Zac Medico:
> Well, it would be inefficient to open separate TCP connections for
> individual metadata files, since there are so many of them and they are
> so small. This is why package managers typically download the metadata
> for all packages as a single bundle. For example, see the type of
> metadata bundle that is used to implement PORTAGE_BINHOST support:
>
> http://tinderbox.dev.gentoo.org/default-linux/x86/Packages

Is there a specific reason why the PORTAGE_BINHOST metadata is different from the metadata/cache format? I like the BINHOST metadata better, even if it is split up into several files, because it would already contain the ebuild version it was generated for. Would it perhaps be a good idea to merge the information of both metadata formats into a single unified one? This would also solve the problem of the missing version control (described below), as well as simplifying the way portage handles metadata. On the other hand, it would be an even more substantial change, which is not necessarily a bad thing. Portage is supposed to work the same way as before, just faster.

> It's conceivable that you could simply use rsync to sync the
> metadata/cache/ subdirectory from
> rsync://rsync.gentoo.org/gentoo-portage/. However, since the rsync tree
> constantly mutates and doesn't provide any kind of version control, it
> would not be very practical to use it this way. If you fetch the
> metadata and the ebuilds separately, you need a way to guarantee that
> you can fetch exactly the same revisions of the ebuilds that the
> earlier-fetched metadata corresponds to.
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Zac Medico @ 2011-03-27 19:39 UTC
  To: gentoo-soc

On 03/27/2011 07:28 AM, Michael Seifert wrote:
> Is there a specific reason why the PORTAGE_BINHOST metadata is different
> from the metadata/cache format?

They have minor differences because they were designed for slightly different use cases.

The metadata/cache format was designed to be distributed together with the ebuilds it was generated from. Its main drawback is that reading many small files can be slow, since it may require lots of disk seeks. We could pack them all into a single file, similar to the one that PORTAGE_BINHOST uses. That would help tools like eix, since it's faster to read one big file than many small files. However, if the cache was fetched earlier than, and separately from, the ebuilds, it wouldn't be very practical for dependency calculations unless you provided a way to fetch exactly the same revisions of the ebuilds (and of the inherited eclasses, which can modify dependencies) that the earlier-fetched metadata corresponds to. For example, the cache could be made to refer to a UUID that would be used to generate a URI, in order to fetch the particular revision of an ebuild/eclass bundle that exactly corresponds to the cache entry.

The PORTAGE_BINHOST cache format is better than the metadata/cache format for the use case it's designed for; however, the current design has a race condition which has been experienced by chromium-os developers:

http://code.google.com/p/chromium-os/issues/detail?id=3225

> I like the BINHOST metadata better, even if it is split up into several
> files, because it would already contain the ebuild version it was
> generated for. Would it perhaps be a good idea to merge the information
> of both metadata formats into a single unified one?
> This would also solve the problem of the missing version control, as
> well as simplifying the way portage handles metadata. On the other
> hand, it would be an even more substantial change, which is not
> necessarily a bad thing. Portage is supposed to work the same way as
> before, just faster.

Well, a new cache format is only part of the solution. In order to provide the revision control that's necessary for practical dependency calculations when the cache is fetched earlier than the ebuilds/eclasses, you're also going to need to create individually fetchable, revisioned ebuild/eclass bundles that the cache will refer to (without any race conditions).

--
Thanks,
Zac
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-29 14:31 UTC
  To: gentoo-soc

Am 27.03.2011 21:39, schrieb Zac Medico:
> Well, a new cache format is only part of the solution. In order to
> provide the revision control that's necessary for practical dependency
> calculations when the cache is fetched earlier than the
> ebuilds/eclasses, you're also going to need to create individually
> fetchable, revisioned ebuild/eclass bundles that the cache will refer
> to (without any race conditions).

If I understood your point correctly, there can be problems if eclasses change after a user ran emerge --sync and before the ebuilds are fetched (a race condition between server and client). In such a case, the correct ebuild would be installed using the wrong environment (i.e. the wrong eclasses), because the metadata is outdated and contains wrong dependencies. Does this cover your outlined case?

I can think of three solutions for this:

1. Self-contained ebuilds
   Your proposal of packaging the ebuild together with its eclasses and
   the source code would make eclass versioning unnecessary, since the
   ebuild is always packaged within the correct environment. (It creates
   a new race condition of its own, though; see your link :)

2. Using the VCS
   Store the versions of the eclasses in the metadata and fetch them
   from a repository before installing the ebuilds. This would pull in a
   dependency on a VCS (CVS in this case), though.

3. Fetching the eclasses together with the metadata
   This way you would have a local snapshot of the build environment,
   which would eliminate the race condition and the need to identify
   eclass versions from the metadata, because it stays consistent.
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Zac Medico @ 2011-03-29 15:47 UTC
  To: gentoo-soc

On 03/29/2011 07:31 AM, Michael Seifert wrote:
> If I understood your point correctly, there can be problems if eclasses
> change after a user ran emerge --sync and before the ebuilds are
> fetched (a race condition between server and client). In such a case,
> the correct ebuild would be installed using the wrong environment (i.e.
> the wrong eclasses), because the metadata is outdated and contains
> wrong dependencies. Does this cover your outlined case?

That's mostly correct; however, it's not just the eclasses that introduce the race condition. Just like the eclasses, the ebuilds themselves can be modified. So can the files that may be included with the ebuilds in the "files" subdirectory ($FILESDIR). To be exhaustively complete, you'd also have to include any licenses referenced by the LICENSE variable of the ebuild (these are located in the licenses/ subdirectory of the repository). You'd probably also want to include any matching package.mask entries from the global profiles/package.mask file, since these can be very relevant to dependency calculations.

Finally, there's the user's architecture-specific profile. This can also affect dependency calculations via things like package.mask, package.unmask, use.mask, and use.force. If you want to be entirely exhaustive, then you'll need your ebuild metadata to reference a snapshot of this profile.

> 1. Self-contained ebuilds
>    Your proposal of packaging the ebuild together with its eclasses and
>    the source code would make eclass versioning unnecessary, since the
>    ebuild is always packaged within the correct environment. (It
>    creates a new race condition of its own, though; see your link :)

Really, the eclass versioning would not be unnecessary; rather, it would be tied directly to the ebuild versioning. If we're going to be completely exhaustive, then it might make sense to have the ebuild metadata reference separate eclass, license, package.mask, and profile bundles.

> 2. Using the VCS
>    Store the versions of the eclasses in the metadata and fetch them
>    from a repository before installing the ebuilds. This would pull in
>    a dependency on a VCS (CVS in this case), though.

Not necessarily. If the ebuild metadata contains UUIDs that the client can translate to URIs using a predefined protocol, then the client simply needs to be aware of the protocol so that it can fetch the appropriate URIs.

> 3. Fetching the eclasses together with the metadata
>    This way you would have a local snapshot of the build environment,
>    which would eliminate the race condition and the need to identify
>    eclass versions from the metadata, because it stays consistent.

Right. But if you decide to split out license, package.mask, and profile bundles as discussed above, you might also decide to do that for the eclasses as well.

--
Thanks,
Zac
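The "UUIDs translated to URIs by a predefined protocol" idea could look something like the sketch below. Every concrete name here (the mirror host, the path layout, the cache keys) is invented purely for illustration; the thread deliberately leaves the protocol unspecified.

```python
# Hypothetical layout: <mirror>/<bundle-type>/<id-prefix>/<id>.tar.xz
MIRROR = "http://distfiles.example.org/bundles"  # invented, not a real host

def bundle_uri(bundle_type, bundle_id):
    """Translate a revision identifier from the metadata into a fetch URI."""
    # Prefix sharding keeps any one directory on the mirror from growing huge.
    return f"{MIRROR}/{bundle_type}/{bundle_id[:2]}/{bundle_id}.tar.xz"

# A cache entry would pin the exact revisions it was generated against.
entry = {
    "ebuild": "4f2a9c31",   # hypothetical revision identifiers
    "eclass": "91d07be2",
}
for kind, bid in entry.items():
    print(kind, "->", bundle_uri(kind, bid))
```

The client only needs to know this translation rule; no VCS checkout is required on the user's machine, which is the point Zac makes about option 2.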
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-30 17:33 UTC
  To: gentoo-soc

Am 29.03.2011 17:47, schrieb Zac Medico:
>> 2. Using the VCS
>>    Store the versions of the eclasses in the metadata and fetch them
>>    from a repository before installing the ebuilds. This would pull in
>>    a dependency on a VCS (CVS in this case), though.
>
> Not necessarily. If the ebuild metadata contains UUIDs that the client
> can translate to URIs using a predefined protocol, then the client
> simply needs to be aware of the protocol so that it can fetch the
> appropriate URIs.

Although I like the idea with the UUIDs, I don't know if it is really worth the trouble. Given the number of eclasses, let alone the files in /usr/portage/profiles, there is a huge number of permutations, and you would need a very complex (in terms of size) UUID.

Is it enough for an SoC project to "just" make portage leave out the ebuilds on synchronization? That would be of similar scope to the original idea (cache sync). To be honest, I also find the documentation somewhat lacking; maybe improving the portage documentation would be a nice addition?
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Zac Medico @ 2011-03-30 18:11 UTC
  To: gentoo-soc

On 03/30/2011 10:33 AM, Michael Seifert wrote:
> Although I like the idea with the UUIDs, I don't know if it is really
> worth the trouble. Given the number of eclasses, let alone the files in
> /usr/portage/profiles, there is a huge number of permutations, and you
> would need a very complex (in terms of size) UUID.

For our purposes, it's really not necessary to use full RFC 4122 128-bit UUIDs. For example, if the repository is only refreshed once every 30 minutes (like the rsync tree currently is), then timestamps with one-minute precision would be more than adequate to uniquely identify a given revision of a particular bundle.

> Is it enough for an SoC project to "just" make portage leave out the
> ebuilds on synchronization? That would be of similar scope to the
> original idea (cache sync).

The metadata/cache/ and profiles/ subdirectories would be enough information to do correct dependency calculations, but in practice I think the race conditions involved in trying to actually build anything from those calculations would lead to overwhelming dissatisfaction and complaints from users.

> To be honest, I also find the documentation somewhat lacking; maybe
> improving the portage documentation would be a nice addition?

Yes, we can always use more documentation.

--
Thanks,
Zac
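Zac's point about not needing full RFC 4122 UUIDs can be illustrated directly: with a tree refreshed every 30 minutes, a minute-precision timestamp already names each refresh unambiguously, and it sorts chronologically for free. The identifier format below is an invented example, not an established convention.

```python
from datetime import datetime, timezone

def revision_id(ts):
    """Minute-precision timestamp as a bundle revision identifier."""
    return ts.strftime("%Y%m%d-%H%M")

# Two refreshes 30 minutes apart get distinct, lexically ordered identifiers.
a = revision_id(datetime(2011, 3, 30, 18, 0, tzinfo=timezone.utc))
b = revision_id(datetime(2011, 3, 30, 18, 30, tzinfo=timezone.utc))
print(a, b)  # 20110330-1800 20110330-1830
```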
* Re: [gentoo-soc] GSoC - cache sync/self-contained ebuilds
  From: Michael Seifert @ 2011-03-31 11:55 UTC
  To: gentoo-soc

Am 30.03.2011 20:11, schrieb Zac Medico:
>> Although I like the idea with the UUIDs, I don't know if it is really
>> worth the trouble. Given the number of eclasses, let alone the files
>> in /usr/portage/profiles, there is a huge number of permutations, and
>> you would need a very complex (in terms of size) UUID.
>
> For our purposes, it's really not necessary to use full RFC 4122
> 128-bit UUIDs. For example, if the repository is only refreshed once
> every 30 minutes (like the rsync tree currently is), then timestamps
> with one-minute precision would be more than adequate to uniquely
> identify a given revision of a particular bundle.
>
>> Is it enough for an SoC project to "just" make portage leave out the
>> ebuilds on synchronization? That would be of similar scope to the
>> original idea (cache sync).
>
> The metadata/cache/ and profiles/ subdirectories would be enough
> information to do correct dependency calculations, but in practice I
> think the race conditions involved in trying to actually build anything
> from those calculations would lead to overwhelming dissatisfaction and
> complaints from users.

Now I get it: I was overcomplicating things with the UUIDs. Thank you for the clarification!
End of thread (newest message: 2011-03-31 11:55 UTC).

Thread overview: 14 messages
  2011-03-23  9:39 [gentoo-soc] GSoC - cache sync/self-contained ebuilds  Michael Seifert
  2011-03-23 10:12 ` Fabian Groffen
  2011-03-23 17:44 ` Michael Seifert
  2011-03-23 18:43 ` Rich Freeman
  2011-03-23 19:47 ` Donnie Berkholz
  2011-03-23 21:01 ` Michael Seifert
  2011-03-24 16:58 ` Zac Medico
  2011-03-27 14:28 ` Michael Seifert
  2011-03-27 19:39 ` Zac Medico
  2011-03-29 14:31 ` Michael Seifert
  2011-03-29 15:47 ` Zac Medico
  2011-03-30 17:33 ` Michael Seifert
  2011-03-30 18:11 ` Zac Medico
  2011-03-31 11:55 ` Michael Seifert