public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
@ 2011-08-06 14:13 Fabian Groffen
  2011-08-06 15:36 ` Markos Chandras
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-06 14:13 UTC
  To: gentoo-dev

All,

When we migrate away from CVS for gentoo-x86 (gx86), as it looks now,
the same structure will be kept as we have in CVS now.  Policies to
reject merge commits and only allow rebases on e.g. the Git
infrastructure will even more closely match the central and server-based
way of working Gentoo is used to now.

In this email, I step away from the current model that Gentoo uses for
the gentoo-x86 repository.  Instead, I consider a repo-per-package
model, as in use by e.g. Fedora [1] and Debian [2].

In short, the repo-per-package model means that each package
(my-cat/package) is a separate repository in some VCS.
Instead of having a huge tree that will only grow forever (gx86),
packages are just in their own repository.

This approach can potentially be interesting for a number of reasons:
- history is per package
  + package is likely to be small enough not to have to consider any
    history removal/splitting -- everything can be retained
  + if a package is removed, its repository is simply no longer
    considered, hence its existence and history don't clobber a main
    repository
  + since the repository can move, its history can also easily move
    along with its location, be it a category or even a purpose
    (e.g. packages that started on sunrise, or in developer overlays)
- tree generation is dynamic
  + a full (rsync) tree has to be created by first getting all repositories,
    and/or getting them up-to-date -- avoid those packages you don't
    need
  + easy to make different "trees", e.g. a server tree (no GNOME, KDE),
    prefix tree (different versions of packages), etc.
  + easy to move packages around, their category is specified by the
    tree configuration, the repository the package lives in doesn't change,
    probably overlays, betagarden, graveyard, sunset, etc. can all go
  + no restriction to using only a single VCS, because packages are just
    refreshed before inclusion in the tree, their (source) origin
    doesn't matter
- per package branches
  + instead of developing in overlays, branches could simply be used,
    such that a single place is sufficient for each package
  + switching branches can implement atomic tree-wide changes for
    complex situations

No restriction to using only a single VCS.
While I don't think that allowing developers to use their VCS of choice
is very relevant when committing package changes, the ability to use
more than just *one* VCS when assembling the rsync tree is a huge
advantage if we want to migrate away from the current CVS tree slowly
during a migration period.  It could enable the use of git (the obvious
choice of many) now, alongside the current gx86 tree.

Because the rsync tree would be generated by assembling all packages
that need to be in the tree, the only thing necessary for that
generation is to understand which VCS commands to use to acquire/update
a package and what files/directories to skip when copying the package to
its final destination in the rsync tree.  This is easily scriptable,
given that only the old gx86 tree will be CVS, and the rest git.
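
To illustrate the two pieces named above (the per-VCS update command and
the files to skip when copying), here is a minimal Python sketch.  The
function names, the VCS_METADATA table and the choice of
`git pull --ff-only` and `cvs -q update -dP` are assumptions made for
illustration; they are not part of any existing Gentoo tooling.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

# VCS bookkeeping entries that must not be copied into the rsync tree.
VCS_METADATA: Dict[str, Tuple[str, ...]] = {
    "git": (".git",),
    "cvs": ("CVS",),
}


def update_command(vcs: str, checkout: Path) -> List[str]:
    """Return the command that brings an existing checkout up to date."""
    if vcs == "git":
        return ["git", "-C", str(checkout), "pull", "--ff-only"]
    if vcs == "cvs":
        # cvs works on the current directory, so the caller is expected
        # to run this command with cwd set to the checkout.
        return ["cvs", "-q", "update", "-dP"]
    raise ValueError(f"unsupported VCS: {vcs!r}")


def export_package(vcs: str, checkout: Path, tree_root: Path,
                   category: str, package: str) -> Path:
    """Copy a checkout to <tree_root>/<category>/<package>, minus VCS metadata."""
    dest = tree_root / category / package
    if dest.exists():
        shutil.rmtree(dest)  # start from a clean destination
    skip = shutil.ignore_patterns(*VCS_METADATA[vcs])
    shutil.copytree(checkout, dest, ignore=skip)
    return dest
```

The ignore patterns apply at every directory level, which matters for
CVS checkouts since they carry a CVS/ directory in each subdirectory.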

If rsync tree generation were based on a file listing the packages to
include, I can imagine a simple way to define where each package comes
from and where it should end up, e.g.:

  my-cat;package1;git://git.g.o/foo-package.git;optionalbranch
  my-cat;package2;cvs://cvs.g.o/gentoo-x86/my-cat/package2;

which defines the category, the package name, its source location, and
perhaps something like a branch or tag in case we ever want to e.g.
split development (what is now in overlays typically) from what's in the
tree.  Branch support would also be useful for e.g. Prefix modifications
to a package, when only checked out by Prefix rsync tree generation.  It
can as well be a solution for what is often referred to as the "slacker
arches" problem, when old versions of ebuilds that a maintainer wants to
drop, remain available for minority arches that need them, but only for
their rsync tree, without bothering the maintainer with it.
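
A file in this format is straightforward to parse.  The sketch below is
one hypothetical way to do it in Python; PackageSource and
parse_package_line are illustrative names, and deriving the VCS from the
URL scheme is an assumption based on the two example lines above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class PackageSource(NamedTuple):
    category: str
    package: str
    vcs: str               # taken from the URL scheme, e.g. "git" or "cvs"
    url: str
    branch: Optional[str]  # None when the fourth field is empty


def parse_package_line(line: str) -> PackageSource:
    """Parse one 'category;package;url;branch' line of the package list."""
    fields = line.strip().split(";")
    if len(fields) != 4:
        raise ValueError(f"expected 4 ';'-separated fields: {line!r}")
    category, package, url, branch = fields
    vcs = urlsplit(url).scheme
    if not vcs:
        raise ValueError(f"source location has no scheme: {url!r}")
    return PackageSource(category, package, vcs, url, branch or None)


entry = parse_package_line(
    "my-cat;package2;cvs://cvs.g.o/gentoo-x86/my-cat/package2;")
print(entry.vcs, entry.branch)  # prints "cvs None"
```

Keeping the branch optional means a line with an empty last field simply
follows whatever the default branch of that repository is.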

Obviously, in an ideal world all packages would be in the same VCS, git
in this case.  With a system like this, however, CVS packages can slowly
be moved to git, as their maintainers see fit.  Some developers aim to
benefit more from git than others.  They can move their packages
right away.  The remaining packages will eventually have to migrate
too, but in the meantime they should be less of an obstacle for
developers who want to use git for their packages.

There probably are drawbacks to this system as well.  I, however, only
see big advantages for the moment.
Comments, thoughts, ideas welcome.


[1] http://pkgs.fedoraproject.org/gitweb/
[2] http://packages.qa.debian.org/common/index.html

-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
@ 2011-08-06 15:36 ` Markos Chandras
  2011-08-07  8:12   ` Fabian Groffen
  2011-08-06 18:37 ` Nirbheek Chauhan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Markos Chandras @ 2011-08-06 15:36 UTC
  To: gentoo-dev

On 08/06/2011 03:13 PM, Fabian Groffen wrote:
> All,
> 
I think this post belongs to either -project or -scm MLs but anyway

> When we migrate away from CVS for gentoo-x86 (gx86), as it looks
> now,

I like your proposal but please clarify the following two questions

1) Each package requires a new repository. Who is responsible to create
that? Should developers be responsible to do that or they should ping infra?

2) Assuming the repository for a new package was created. Who is
responsible to include this in the rsync generation file?

I think your approach requires centralised administration to ensure
minimal incidents in the infrastructure mechanisms.

-- 
Regards,
Markos Chandras / Gentoo Linux Developer / Key ID: B4AFF2C2




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
  2011-08-06 15:36 ` Markos Chandras
@ 2011-08-06 18:37 ` Nirbheek Chauhan
  2011-08-07  8:24   ` Fabian Groffen
  2011-08-06 20:17 ` James Cloos
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Nirbheek Chauhan @ 2011-08-06 18:37 UTC
  To: gentoo-dev

Hey,

On Sat, Aug 6, 2011 at 7:43 PM, Fabian Groffen <grobian@gentoo.org> wrote:
> In this email, I step away from the current model that Gentoo uses for
> the gentoo-x86 repository.  Instead, I consider a repo-per-package
> model, as in use by e.g. Fedora [1] and Debian [2].
>
> In short, the repo-per-package model means that each package
> (my-cat/package) is a separate repository in some VCS.
> Instead of having a huge tree that will only grow forever (gx86),
> packages are just in their own repository.
>

I had mixed feelings while reading your email. The idea is certainly
very intriguing, but there's a few things that make it a no-go for me:

1. One of the big things I've been looking forward to with git is the
ability to do atomic commits across the tree. Addition of GNOME
releases, pkgmove changes across the tree, changing ebuild/eclass
behaviour, etc. without inconsistency or praying that my connection
doesn't get dropped in the middle of a hundred interrelated commits.

Without this feature, I think some arch teams and GNOME/KDE teams will
become sad.

2. The ability to do "feature" commits across the whole tree instead
of hundreds of tiny commits everywhere. This combined with the
ChangeLog generation will save a lot of time and space. This will
especially benefit arch teams, but I've felt the need for this
numerous times myself. Example: we moved to using .xz tarballs for
GNOME, and that touched a lot of ebuilds, and it was extremely
time-consuming to repeat echangelog && repoman commit per-package.

3. Adding packages from overlays via `cherry-pick` or `git am` will
become extremely tedious. If thin manifests are implemented, a series
of patches + a simple merge hook will be all you need to move
KDE/GNOME releases from the overlay to the tree. Without a single
tree, you need to go back to the current way of doing things.

4. We'll need to write extra tools to keep the user's cat/pkg list
up-to-date; adding and removing repositories as needed, etc. This is
added complexity for which we'll need volunteers (we've been facing a
manpower shortage already...)

5. The total size of the tree will increase a *lot* since all these
repositories will no longer share data. The current gentoo-x86 tree
stored in git without history takes only ~25MB because ebuilds are
extremely redundant. The space requirements will balloon once we need
to store 15,000 repositories. And arch teams will have to store *all*
of them, often on devices with very low space.

The per-package model looks very neat and tidy in some respects, but
the loss of a common git repository is too great, IMO.

-- 
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
  2011-08-06 15:36 ` Markos Chandras
  2011-08-06 18:37 ` Nirbheek Chauhan
@ 2011-08-06 20:17 ` James Cloos
  2011-08-07  8:31   ` Fabian Groffen
  2011-08-06 20:42 ` Krzysztof Pawlik
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: James Cloos @ 2011-08-06 20:17 UTC
  To: gentoo-dev

Your idea is a step in the right direction, but the ideal config would
have a top level portage.git with sub-modules for each category, as well
as for eclass, licenses, profiles and scripts.  Each category.git should
have sub-modules for each package therein.

Within the profiles.git it *might* be reasonable for each directory in
arch/ also to be a sub-module.  Or not.  That should be discussed.

And the bureaucracy should be minimal.  Adding, changing or removing a
submodule from its parent repo should only require a call for consensus
among the devs, and not be pushed through a small set of devs on some
given team.

It may also be useful for the process which generates metadata/ to push
out to a repo, too, just before syncing out to the rsync mirrors.

Having each package in its own repo is a great idea.  But a simple
recursive git pull to update the whole thing is highly desirable.
Git submodules fit the bill perfectly.
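
For illustration, the .gitmodules file of one such category repository
could be generated as sketched below.  The base URL and the helper name
are hypothetical; only the section layout (`[submodule "..."]` with
`path` and `url` keys) is Git's own.

```python
from typing import Iterable


def render_gitmodules(base_url: str, names: Iterable[str]) -> str:
    """Render .gitmodules content with one submodule section per name."""
    sections = []
    for name in sorted(names):
        sections.append(
            f'[submodule "{name}"]\n'
            f"\tpath = {name}\n"
            f"\turl = {base_url}/{name}.git\n"
        )
    return "".join(sections)


# Hypothetical category repository holding two packages.
print(render_gitmodules("git://example.invalid/app-editors", ["vim", "nano"]))
```

With such a hierarchy in place, `git submodule update --init --recursive`
in the top-level checkout would fetch the whole tree.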

This would require re-doing the cvs→git conversion, but it’d be worth it.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
                   ` (2 preceding siblings ...)
  2011-08-06 20:17 ` James Cloos
@ 2011-08-06 20:42 ` Krzysztof Pawlik
  2011-08-07  8:40   ` Fabian Groffen
  2011-08-06 20:55 ` Robin H. Johnson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Krzysztof Pawlik @ 2011-08-06 20:42 UTC
  To: gentoo-dev

On 06/08/11 16:13, Fabian Groffen wrote:
> There probably are drawbacks to this system as well.  I, however, only
> see big advantages for the moment.
> Comments, thoughts, ideas welcome.

To be honest I don't like that idea. I don't see any benefits from doing so:
 - history per package - huh? git log for specific path/file works, pulling all
the history for whole repository is one-time thing, does not happen often,
Nirbheek already pointed out some history-sharing issues

 - tree generation is dynamic - actually I think this is a disadvantage, it has
a nice potential to eat a lot of resources on master rsync server, also having
different "flavours" of the tree only brings in added complexity

 - per package branches - I like overlays, I couldn't care less about branches
for single packages :)

So:
 - having it all in single repository means that I need to care only about one
thing, not around 14956 of them
 - git was designed to be efficient with large repositories, use this ability
 - KISS

-- 
Krzysztof Pawlik  <nelchael at gentoo.org>  key id: 0xF6A80E46
desktop-misc, java, vim, kernel, python, apache...




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
                   ` (3 preceding siblings ...)
  2011-08-06 20:42 ` Krzysztof Pawlik
@ 2011-08-06 20:55 ` Robin H. Johnson
  2011-08-07  9:12   ` Fabian Groffen
  2011-08-09 13:24   ` Donnie Berkholz
  2011-08-06 21:57 ` Fabio Erculiani
  2011-08-08  4:15 ` [gentoo-dev] " Nathan Phillip Brink
  6 siblings, 2 replies; 21+ messages in thread
From: Robin H. Johnson @ 2011-08-06 20:55 UTC
  To: gentoo-dev

On Sat, Aug 06, 2011 at 04:13:52PM +0200, Fabian Groffen wrote:
> When we migrate away from CVS for gentoo-x86 (gx86), as it looks now,
> the same structure will be kept as we have in CVS now.  Policies to
> reject merge commits and only allow rebases on e.g. the Git
> infrastructure will even more closely match the central and
> server-based way of working Gentoo is used to now.
The discussion about rejecting merges was never completed IIRC. I think
there may be some very valid cases where we need merges still (esp the
big atomic commit cases from KDE/GNOME), but they should still be used
sparingly. Additionally, the rebase approach has the problem of requiring
everybody else to hard-reset their trees if they have pushed to multiple
places and then rebase to push to the main tree, so I don't know if that
will actually fly.

> In this email, I step away from the current model that Gentoo uses for
> the gentoo-x86 repository.  Instead, I consider a repo-per-package
> model, as in use by e.g. Fedora [1] and Debian [2].
Everything you have mentioned here was previously covered in the
discussions about Git conversion models. Please consult the history of
this list, as well as the -scm list. Additionally, a large discussion
about the pros and cons of all 3 models (package per repo, category per
repo, single repo) was had at the GSoC mentor summit last year, and a
number of the core Git developers were involved in the discussion.

Problems:
- atomic/well-ordered commits that span packages, eclasses and profiles/
  directories. (Esp. committing to eclasses and then packages
  afterwards).
- Massive space overhead: Every .git directory requires a minimum of 25
  inodes [1], covering at least 100KiB. We have 15k packages in the tree
  right now. Assuming there is no tail-packing in use, that's a minimum
  of 1.5GiB on .git overhead.
- Massive space overhead(2): Having a repo per package also removes ANY
  git compression advantage that would be gained where ebuilds between
  packages are substantially similar. The _complete_ history packfile
  for the tree right now is under 1GiB [2].
- Pain in branching/forking: instead of being able to just have your own
  local clone of the single git repo, a user wanting to work on multiple
  packages together would need to have repos for ALL of them. No
  pull/merge ability at all.
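
The overhead estimate in the second item can be reproduced with a few
lines of Python.  The inputs (25 inodes and 100KiB per .git directory,
15k packages) are the figures quoted above; the function name is
illustrative only.

```python
def git_overhead(packages: int, inodes_per_repo: int = 25,
                 kib_per_repo: int = 100):
    """Return (total inodes, total GiB) spent on .git directories alone."""
    total_bytes = packages * kib_per_repo * 1024
    return packages * inodes_per_repo, total_bytes / 1024**3


inodes, gib = git_overhead(15_000)
print(f"{inodes} inodes, {gib:.2f} GiB")  # prints "375000 inodes, 1.43 GiB"
```

That is about 1.43GiB (1.54GB in decimal units), in line with the rough
1.5GiB minimum given above, before a single ebuild is stored.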

[1] Git space usage testcase:
# One-commit repository: report .git size and entry count, hook samples excluded
mkdir foo && cd foo && git init \
&& touch bar && git add bar && git commit -m '.' bar \
&& git gc && du .git --exclude '*.sample' \
&& find .git ! -name '*.sample' | wc -l

[2] Packfile size:
The final proposal regarding packfile size was that we were going to
partition older history using grafts, similar to when Linus moved the
kernel into Git, and had a graft available of the old history. Initial
packfile size was under 50MiB.

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
                   ` (4 preceding siblings ...)
  2011-08-06 20:55 ` Robin H. Johnson
@ 2011-08-06 21:57 ` Fabio Erculiani
  2011-08-08 14:42   ` Andreas K. Huettel
  2011-08-08  4:15 ` [gentoo-dev] " Nathan Phillip Brink
  6 siblings, 1 reply; 21+ messages in thread
From: Fabio Erculiani @ 2011-08-06 21:57 UTC
  To: gentoo-dev

I really love the idea of being able to atomically push updates across
multiple CPVs.
This is also what KDE, GNOME, and many other teams are waiting for.
Having multiple repos means no atomicity and at this point, I would
rather prefer CVS (omg!).

-- 
Fabio Erculiani




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 15:36 ` Markos Chandras
@ 2011-08-07  8:12   ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  8:12 UTC
  To: gentoo-dev

On 06-08-2011 16:36:00 +0100, Markos Chandras wrote:
> I like your proposal but please clarify the following two questions
> 
> 1) Each package requires a new repository. Who is responsible to create
> that? Should developers be responsible to do that or they should ping infra?

I would prefer all ebuild devs to be able to create new packages
(repos), like they can right now.

> 2) Assuming the repository for a new package was created. Who is
> responsible to include this in the rsync generation file?

The dev in question that wants it to be added to the rsync tree.

> I think your approach requires centralised administration to ensure
> minimal incidents in the infrastructure mechanisms.

Absolutely.  Typically the rsync generation file is in its own repo, and
requires as much centralisation as the current CVS tree, or the proposed
git variant of that tree.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 18:37 ` Nirbheek Chauhan
@ 2011-08-07  8:24   ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  8:24 UTC
  To: gentoo-dev

On 07-08-2011 00:07:41 +0530, Nirbheek Chauhan wrote:
> On Sat, Aug 6, 2011 at 7:43 PM, Fabian Groffen <grobian@gentoo.org> wrote:
> > In short, the repo-per-package model means that each package
> > (my-cat/package) is a separate repository in some VCS.
> > Instead of having a huge tree that will only grow forever (gx86),
> > packages are just in their own repository.
> 
> I had mixed feelings while reading your email. The idea is certainly
> very intriguing, but there's a few things that make it a no-go for me:
> 
> 1. One of the big things I've been looking forward to with git is the
> ability to do atomic commits across the tree. Addition of GNOME
> releases, pkgmove changes across the tree, changing ebuild/eclass
> behaviour, etc. without inconsistency or praying that my connection
> doesn't get dropped in the middle of a hundred interrelated commits.
> Without this feature, I think some arch teams and GNOME/KDE teams will
> become sad.

I see this being possible by making a single commit to the rsync tree
generation script.

I also consider alternatives possible, as touched upon by James Cloos in
this thread where large projects like GNOME and KDE have a single
repository for all/most of their ebuilds, and perhaps even eclasses.
Repo-per-package may be too fine-grained for projects like these, and
being flexible here is not going to be any problem AFAICT.

> 2. The ability to do "feature" commits across the whole tree instead
> of hundreds of tiny commits everywhere. This combined with the
> ChangeLog generation will save a lot of time and space. This will
> especially benefit arch teams, but I've felt the need for this
> numerous times myself. Example: we moved to using .xz tarballs for
> GNOME, and that touched a lot of ebuilds, and it was extremely
> time-consuming to repeat echangelog && repoman commit per-package.

Consensus is that echangelog is eventually going to disappear, IIRC, and
repoman commit probably can be done on the entire tree/repo, with the
help of sub-repos, or when you have a repo for full GNOME.

Whether you script a loop, or make a single call to repoman, you always
have to pay for running repoman, since it's your QA tool, that you're
not supposed to skip/bypass.

> 3. Adding packages from overlays via `cherry-pick` or `git am` will
> become extremely tedious. If thin manifests are implemented, a series
> of patches + a simple merge hook will be all you need to move
> KDE/GNOME releases from the overlay to the tree. Without a single
> tree, you need to go back to the current way of doing things.

With my proposal you wouldn't do this.  You would simply add a line in
the rsync tree script for including that package.  Most probably the
package would already live on g.g.o or something Gentooish, so it
wouldn't move at all, it would just be included.

In case you would have a repo with multiple packages, you would just
tell the script to now also include the directory where your package
lives.

> 4. We'll need to write extra tools to keep the user's cat/pkg list
> up-to-date; adding and removing repositories as needed, etc. This is
> added complexity for which we'll need volunteers (we've been facing a
> manpower shortage already...)

I don't understand this.  Users don't see anything of this change.
Developers could use subtrees, forests, or just only what they care
about.

> 5. The total size of the tree will increase a *lot* since all these
> repositories will no longer share data. The current gentoo-x86 tree
> stored in git without history takes only ~25MB because ebuilds are
> extremely redundant. The space requirements will balloon once we need
> to store 15,000 repositories. And arch teams will have to store *all*
> of them, often on devices with very low space.

I'm not too concerned about disk space.  Cloning a repo as-needed should
be fairly fast, and even arch teams won't need all 15,000 repositories.
It's easy to throw away repos for packages no longer necessary too.
For the limited disk-space arches, the specialised rsync trees do come
in handy though.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 20:17 ` James Cloos
@ 2011-08-07  8:31   ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  8:31 UTC
  To: gentoo-dev

On 06-08-2011 16:17:32 -0400, James Cloos wrote:
> Your idea is a step in the right direction, but the ideal config would
> have a top level portage.git with sub-modules for each category, as well
> as for eclass, licenses, profiles and scripts.  Each category.git should
> have sub-modules for each package therein.

I believe the size of a repo (how much it contains) should depend on
what it is.  Some packages (like e.g. Mutt) live very well on their own,
I understand larger projects like GNOME and KDE prefer to have many
sub-components in one repo.

I don't necessarily think there should be a clear hierarchy, although
subtrees may require that.

> Within the profiles.git it *might* be reasonable for each directory in
> arch/ also to be a sub-module.  Or not.  That should be discussed.
> 
> And the bureaucracy should be minimal.  Adding, changing or removing a
> submodule from its parent repo should only require a call for consensus
> among the devs, and not be pushed through a small set of devs on some
> given team.

Currently, all devs can add and remove (with notice) packages, so I
don't see why that would require a consensus with this model, suddenly.

> It may also be useful for the process which generates metadata/ to push
> out to a repo, too, just before syncing out to the rsync mirrors.

I don't understand what you mean by this.  Can you elaborate?

> Having each package in its own repo is a great idea.  But a simple
> recursive git pull to update the whole thing is highly desirable.
> Git submodules fit the bill perfectly.

I assumed something like this would be possible, so that one can
easily get "all" of it.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 20:42 ` Krzysztof Pawlik
@ 2011-08-07  8:40   ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  8:40 UTC
  To: gentoo-dev

On 06-08-2011 22:42:33 +0200, Krzysztof Pawlik wrote:
> To be honest I don't like that idea. I don't see any benefits from doing so:
>  - tree generation is dynamic - actually I think this is a disadvantage, it has
> a nice potential to eat a lot of resources on master rsync server, also having
> different "flavours" of the tree only brings in added complexity

To be honest, I don't see any problem there.  The rsync master server is
a modern machine.  Generating multiple trees hardly takes more, since
all repos in use are shared, of course.
With the prefix rsync tree generation [1] in mind, I think the extra
cost timewise isn't too bad either.

> So:
>  - having it all in single repository means that I need to care only about one
> thing, not around 14956 of them

subtrees would help you here

>  - git was designed to be efficient with large repositories, use this ability

I'm not claiming git is inefficient.  I think our current model is not
very flexible.  An alternative like the one I proposed solves certain
problems that currently exist within Gentoo.


[1] http://stats.prefix.freens.org/timing-rsync0.png

-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 20:55 ` Robin H. Johnson
@ 2011-08-07  9:12   ` Fabian Groffen
  2011-08-07  9:21     ` Michał Górny
  2011-08-07 11:05     ` Rich Freeman
  2011-08-09 13:24   ` Donnie Berkholz
  1 sibling, 2 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  9:12 UTC
  To: gentoo-dev

On 06-08-2011 20:55:05 +0000, Robin H. Johnson wrote:
> On Sat, Aug 06, 2011 at 04:13:52PM +0200, Fabian Groffen wrote:
> > In this email, I step away from the current model that Gentoo uses for
> > the gentoo-x86 repository.  Instead, I consider a repo-per-package
> > model, as in use by e.g. Fedora [1] and Debian [2].
> Everything you have mentioned here was previously covered in the
> discussions about Git conversion models. Please consult the history of
> this list, as well as the -scm list. Additionally, a large discussion
> about the pros and cons of all 3 models (package per repo, category per
> repo, single repo) was had at the GSoC mentor summit last year, and a
> number of the core Git developers were involved in the discussion.

I see now my previous search wasn't complete.  Please correct me if I'm
wrong, but I have the impression the previous discussions looked at
repo-per-package just from a storage point of view, not from a
functional point of view.  The git overhead for repo-per-package is
admittedly quite undesirable.

> Problems:
> - atomic/well-ordered commits that span packages, eclasses and profiles/
>   directories. (Esp. committing to eclasses and then packages
>   afterwards).

This can be done with a single commit to the rsync tree script, and it
doesn't necessarily need git repos.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-07  9:12   ` Fabian Groffen
@ 2011-08-07  9:21     ` Michał Górny
  2011-08-07  9:29       ` Fabian Groffen
  2011-08-07 11:05     ` Rich Freeman
  1 sibling, 1 reply; 21+ messages in thread
From: Michał Górny @ 2011-08-07  9:21 UTC
  To: gentoo-dev; +Cc: grobian

On Sun, 7 Aug 2011 11:12:47 +0200
Fabian Groffen <grobian@gentoo.org> wrote:

> > Problems:
> > - atomic/well-ordered commits that span packages, eclasses and
> > profiles/ directories. (Esp. committing to eclasses and then
> > packages afterwards).
> 
> This can be done with a single commit to the rsync tree script, and it
> doesn't necessarily need git repos.

And have you considered the function PoV on this?

With clean git repo: few commits, git push

With your split-tree: a lot of commits to random packages, potentially
using random VCS-es, a lot of pushes, hacking some magical rsync stuff
and finally guessing what went wrong this time

-- 
Best regards,
Michał Górny



* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-07  9:21     ` Michał Górny
@ 2011-08-07  9:29       ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07  9:29 UTC
  To: gentoo-dev

On 07-08-2011 11:21:51 +0200, Michał Górny wrote:
> Fabian Groffen <grobian@gentoo.org> wrote:
> > This can be done with a single commit to the rsync tree script, and it
> > doesn't necessarily need git repos.
> 
> And have you considered the function PoV on this?
> 
> With clean git repo: few commits, git push
> 
> With your split-tree: a lot of commits to random packages, potentially
> using random VCS-es, a lot of pushes, hacking some magical rsync stuff
> and finally guessing what went wrong this time

Ideally, only one VCS would be in use.  For the current situation there
is both CVS and git, though.

With some experience from the Prefix rsync tree generation (CVS + SVN),
I can tell the magic is quite absent, and I've seen no "guessing what
went wrong this time".

I have considered it.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-07  9:12   ` Fabian Groffen
  2011-08-07  9:21     ` Michał Górny
@ 2011-08-07 11:05     ` Rich Freeman
  2011-08-07 11:39       ` Fabian Groffen
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Freeman @ 2011-08-07 11:05 UTC (permalink / raw
  To: gentoo-dev

On Sun, Aug 7, 2011 at 5:12 AM, Fabian Groffen <grobian@gentoo.org> wrote:
> On 06-08-2011 20:55:05 +0000, Robin H. Johnson wrote:
>> Problems:
>> - atomic/well-ordered commits that span packages, eclasses and profiles/
>>   directories. (Esp. committing to eclasses and then packages
>>   afterwards).
>
> This can be done with a single commit to the rsync tree script, and it
> doesn't necessarily need git repos.
>

What exactly are you thinking about here?  How about this use case:

I have a list of 150 packages/versions.  I want to make all of them go
from ~x86 to x86 at the same time.

If they're all in one git repo, then I can use a script or whatever to
go through every one at leisure and rekeyword them.  Then I can do a
repoman scan on the entire repository for an hour or two if I want.
When I'm happy I can commit everything atomically.
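
The rekeywording step could be scripted along these lines. This is a minimal sketch, not an existing tool: the function name `stabilize` and the assumption that each ebuild carries a single double-quoted `KEYWORDS="..."` line are illustrative only.

```python
import re

# Assumption: KEYWORDS sits on one line, in double quotes, at line start.
KEYWORDS_RE = re.compile(r'^(KEYWORDS=")([^"]*)(")', re.MULTILINE)


def stabilize(ebuild_text, arch="x86"):
    """Return ebuild_text with ~arch replaced by arch inside KEYWORDS.

    Hypothetical helper for illustration; other keywords (for example
    ~amd64 or ~x86-fbsd) are left untouched because only an exact
    match on "~" + arch is rewritten.
    """
    def fix(match):
        words = [arch if word == "~" + arch else word
                 for word in match.group(2).split()]
        return match.group(1) + " ".join(words) + match.group(3)

    return KEYWORDS_RE.sub(fix, ebuild_text)
```

In a single checkout, a loop over the listed ebuilds would apply this to each file, after which the whole working tree can be scanned and committed in one go.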

How do you envision doing this by just making a single commit to the
rsync tree script if the files are in multiple repos?  Right now that
rsync tree is pulling in all those files already - in the ~x86
version.  Do you propose cloning all the repos, fixing the arch flag
in the new repos, and then re-pointing the rsync tree atomically?
That would work, but any commits to the 150 packages by others in the
meantime would get lost, and it seems a bit painful to do it this way.
 I can see how you could atomically add or remove 150 packages
entirely, but not how you can tweak individual versions of packages
without a fair bit of pain.  Admittedly, you could have some clever
solution in mind that I'm just not grokking.

The other thing that was tossed out is having multiple repos, but
putting things like kde/gnome in their own bigger repos.  I'm not sure
this is going to work, since it only works for those particular
situations.  A package can only be in one repo, so you can't have one
repo for kde, and another repo for everything that uses qt, and
another for everything that uses pulseaudio, or whatever.  Atomic
changes to many packages could be required for any number of unforeseen
reasons.

I can see the elegance of allowing the portage tree to be a collage of
packages from different sources, but I'm not convinced we really need
this.  Users can already accomplish this on their end with overlays.
It seems like we're just making the portage tree an overlay of its
own.  I'm not sure what it really buys us.  Just using git in the
first place already simplifies distributed development.  If you took
this idea to an extreme you might not have the rsync server assemble
the tree at all, but just push out the official list as a
"recommended" list of overlays, and let the users put their own trees
together (with the ability to override parts of it).
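
For illustration, assembling a tree from per-package checkouts might look roughly like this. The mapping format (`category/package` to a local checkout path) and the name `assemble_tree` are hypothetical; nothing here reflects an actual Gentoo infra script.

```python
import shutil
from pathlib import Path


def assemble_tree(tree_config, dest):
    """Copy each package checkout into dest/<category>/<package>.

    tree_config is a hypothetical mapping such as
    {"app-misc/foo": "/path/to/foo-checkout"}.  VCS metadata
    directories are skipped so that only package files end up
    in the assembled tree.
    """
    dest = Path(dest)
    for atom, checkout in sorted(tree_config.items()):
        category, package = atom.split("/")
        target = dest / category / package
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(checkout, target,
                        ignore=shutil.ignore_patterns(".git", "CVS"))
    return dest
```

Because the category comes from the configuration rather than from the checkout, moving a package is a one-line change to the mapping. The sketch also shows the drawback raised in this thread: nothing ties a change in one checkout to a change in another.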

Rich




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-07 11:05     ` Rich Freeman
@ 2011-08-07 11:39       ` Fabian Groffen
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Groffen @ 2011-08-07 11:39 UTC (permalink / raw
  To: gentoo-dev

On 07-08-2011 07:05:03 -0400, Rich Freeman wrote:
> What exactly are you thinking about here?  How about this use case:
> 
> I have a list of 150 packages/versions.  I want to make all of them go
> from ~x86 to x86 at the same time.
> 
> If they're all in one git repo, then I can use a script or whatever to
> go through every one at leisure and rekeyword them.  Then I can do a
> repoman scan on the entire repository for an hour or two if I want.
> When I'm happy I can commit everything atomically.
> 
> How do you envision doing this by just making a single commit to the
> rsync tree script if the files are in multiple repos?  Right now that
> rsync tree is pulling in all those files already - in the ~x86
> version.  Do you propose cloning all the repos, fixing the arch flag
> in the new repos, and then re-pointing the rsync tree atomically?
> That would work, but any commits to the 150 packages by others in the
> meantime would get lost, and it seems a bit painful to do it this way.
>  I can see how you could atomically add or remove 150 packages
> entirely, but not how you can tweak individual versions of packages
> without a fair bit of pain.  Admittedly, you could have some clever
> solution in mind that I'm just not grokking.

Not sure.  You could branch I guess.  It takes more work, undoubtedly.

> The other thing that was tossed out is having multiple repos, but
> putting things like kde/gnome in their own bigger repos.  I'm not sure
> this is going to work, since it only works for those particular
> situations.  A package can only be in one repo, so you can't have one
> repo for kde, and another repo for everything that uses qt, and
> another for everything that uses pulseaudio, or whatever.  Atomic
> changes to many packages could be required for any number of unforeseen
> reasons.

This indeed makes it difficult.

> I can see the elegance of allowing the portage tree to be a collage of
> packages from different sources, but I'm not convinced we really need
> this.  Users can already accomplish this on their end with overlays.
> It seems like we're just making the portage tree an overlay of its
> own.  I'm not sure what it really buys us.  Just using git in the
> first place already simplifies distributed development.  If you took
> this idea to an extreme you might not have the rsync server assemble
> the tree at all, but just push out the official list as a
> "recommended" list of overlays, and let the users put their own trees
> together (with the ability to override parts of it).

I don't feel users should be playing with these things in general.  I
see the tree assembly more as a technical way to deal with some
given legacy and limitations.  Admittedly, it isn't perfect: many
people seem to intend to do things with a git-based tree that cannot be
done with CVS now, and an assembled tree wouldn't really support them
out of the box either.


-- 
Fabian Groffen
Gentoo on a different level




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
                   ` (5 preceding siblings ...)
  2011-08-06 21:57 ` Fabio Erculiani
@ 2011-08-08  4:15 ` Nathan Phillip Brink
  6 siblings, 0 replies; 21+ messages in thread
From: Nathan Phillip Brink @ 2011-08-08  4:15 UTC (permalink / raw
  To: gentoo-dev


On Sat, Aug 06, 2011 at 04:13:52PM +0200, Fabian Groffen wrote:
> - tree generation is dynamic
>   + easy to move packages around, their category is specified by the
>     tree configuration, the repository the package lives in doesn't change,
>     probably overlays, betagarden, graveyard, sunset, etc. can all go
> - per package branches
>   + instead of developing in overlays, simply branches could be used,
>     such that a single place is sufficient for each package

Recreating the overlay experience with many repos sounds
difficult. Many overlays include multi-component packages or changes
to interdependent packages. Using per-package branching instead of
overlays would complicate this, with a user (or layman) having to
search each package's repository for branches associated with a
particular overlay when trying to guess which overlay a package should
be pulled from.

The current behavior of PORTDIR_OVERLAY is quite well-defined and
easier to understand. It even allows overlays to gracefully fall
behind in keeping their packages up to date. For example, when a fix
in an overlay is committed to gentoo-x86 as a new ebuild revision, the
overlay maintainer can forget that he has a stale version of the
package without harming anyone because portage chooses the newest
package. It seems that the traditional overlay idea -- where overlays
overlay gentoo-x86 and each other -- can't quite exist with per-package
branches. To recreate this idea, you'd need to have one checkout per
package per repo (including overlays) and you'd still use
PORTDIR_OVERLAY.
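
The "newest package wins" behaviour can be sketched as follows. This is a simplification under stated assumptions: versions are plain integer tuples plus a revision number, whereas real ebuild version comparison also handles suffixes such as _rc and _p. `pick_newest` is a hypothetical name, and ties between identical versions in different repositories are not modelled.

```python
def pick_newest(candidates):
    """Return the candidate with the highest (version, revision).

    candidates is a list of (repo_name, version_tuple, revision)
    entries for one package, gathered from the main tree and from
    every configured overlay.
    """
    return max(candidates, key=lambda c: (c[1], c[2]))


# A stale overlay copy loses to the fixed revision in the main tree.
candidates = [
    ("my-overlay", (1, 2), 0),   # foo-1.2, forgotten in the overlay
    ("gentoo-x86", (1, 2), 1),   # foo-1.2-r1, the fix as committed
]
chosen = pick_newest(candidates)
```

Here `chosen[0]` is `"gentoo-x86"`, which is why a forgotten overlay ebuild does no harm once the main tree carries a newer revision.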

I sorta like how overlays work currently ;-).

-- 
binki

Look out for missing or extraneous apostrophes!


* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 21:57 ` Fabio Erculiani
@ 2011-08-08 14:42   ` Andreas K. Huettel
  2011-08-08 16:51     ` "Paweł Hajdan, Jr."
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas K. Huettel @ 2011-08-08 14:42 UTC (permalink / raw
  To: gentoo-dev

On Saturday 06 August 2011, 23:57:13, Fabio Erculiani wrote:
> I really love the idea of being able to atomically push updates across
> multiple CPVs.
> This is also what KDE, GNOME, and many other teams are waiting for.
> Having multiple repos means no atomicity and at this point, I would
> rather prefer CVS (omg!).

Exactly. This is why I would also vote for a single tree and single modern vcs.

In addition, I would like to propose that we keep the number of required "home-made addons and scripts" to a minimum. As long as we have straight cvs or straight git, every tool developed for these systems just works. As soon as we start assembling our tree with a huge self-made infrastructure, we're all confined to our own tools for every operation that steps over the newly created repository limits.


-- 
Andreas K. Huettel
Gentoo Linux developer - kde, sci, arm, tex
dilfridge@gentoo.org
http://www.akhuettel.de/




* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-08 14:42   ` Andreas K. Huettel
@ 2011-08-08 16:51     ` "Paweł Hajdan, Jr."
  2011-08-08 23:49       ` [gentoo-dev] " Duncan
  0 siblings, 1 reply; 21+ messages in thread
From: "Paweł Hajdan, Jr." @ 2011-08-08 16:51 UTC (permalink / raw
  To: gentoo-dev


On 8/8/11 7:42 AM, Andreas K. Huettel wrote:
> On Saturday 06 August 2011, 23:57:13, Fabio Erculiani wrote:
>> I really love the idea of being able to atomically push updates
>> across multiple CPVs. This is also what KDE, GNOME, and many other
>> teams are waiting for. Having multiple repos means no atomicity and
>> at this point, I would rather prefer CVS (omg!).
> 
> Exactly. This is why I would also vote for a single tree and single
> modern vcs.

+1 here. I'm curious what problems multiple repos would be solving, or
is it just "it's cool and Fedora/other distros do it"?

> In addition, I would like to propose that we keep the number of
> required "home-made addons and scripts" to a minimum. As long as we
> have straight cvs or straight git, every tool developed for these
> systems just works. As soon as we start assembling our tree with a
> huge self-made infrastructure, we're all confined to our own tools
> for every operation that steps over the newly created repository
> limits.

+1 here too. Vanilla git + repoman is cool. If we add a wrapper on top
of that to assemble the rsync tree, it becomes much more complex, even
more so than our current CVS setup, it seems.



* [gentoo-dev] Re: [RFC] gentoo-x86 migration to repo-per-package
  2011-08-08 16:51     ` "Paweł Hajdan, Jr."
@ 2011-08-08 23:49       ` Duncan
  0 siblings, 0 replies; 21+ messages in thread
From: Duncan @ 2011-08-08 23:49 UTC (permalink / raw
  To: gentoo-dev

Paweł Hajdan, Jr. posted on Mon, 08 Aug 2011 09:51:52 -0700 as excerpted:

> On 8/8/11 7:42 AM, Andreas K. Huettel wrote:
>> On Saturday 06 August 2011, 23:57:13, Fabio Erculiani wrote:
>>> I really love the idea of being able to atomically push updates across
>>> multiple CPVs. This is also what KDE, GNOME, and many other teams are
>>> waiting for. Having multiple repos means no atomicity and at this
>>> point, I would rather prefer CVS (omg!).
>> 
>> Exactly. This is why I would also vote for a single tree and single
>> modern vcs.
> 
> +1 here. I'm curious what problems multiple repos would be solving, or
> is it just "it's cool and Fedora/other distros do it"?

"Don't take the name of root in vain."?

Just as it's theoretically possible to run everything on a system as 
root, but arguably, nobody sane wants or encourages that, even for single-
human-user systems where the user is obviously capable and trusted enough 
to admin their own system...

... And just as it's possible to put the entire system on a single 
partition covering the entire disk...

... One reasonable argument here is that the multiple repos idea splits 
the damage potential and that it lends itself /naturally/ to security at 
a finer grain than "all or nothing", with said security not necessarily 
having anything to do with whether you trust the people making the 
commits, or not, just as multiple Unix user accounts don't necessarily 
have anything to do with whether you trust the human users or not.

There's certainly an appeal to that, but I too lean toward the "the 
hassle in this case isn't worth the trouble" argument, and thus favor a 
single all-tree repo.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman





* Re: [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package
  2011-08-06 20:55 ` Robin H. Johnson
  2011-08-07  9:12   ` Fabian Groffen
@ 2011-08-09 13:24   ` Donnie Berkholz
  1 sibling, 0 replies; 21+ messages in thread
From: Donnie Berkholz @ 2011-08-09 13:24 UTC (permalink / raw
  To: gentoo-dev


On 20:55 Sat 06 Aug, Robin H. Johnson wrote:
> Everything you have mentioned here was previously covered in the 
> discussions about Git conversion models. Please consult the history of 
> this list, as well as the -scm list. Additionally, a large discussion 
> about the pros and cons of all 3 models (package per repo, category 
> per repo, single repo) was had at the GSoC mentor summit last year, 
> and a number of the core Git developers were involved in the 
> discussion.

While noting the above [1 and its thread], I'd also like to point out 
that git submodules are conceptually a good fit but the implementation 
is lacking. Two examples:

- Creating new submodules requires administrative rights on the server. 
You can't just add one and push it up. This could conceivably be fixed 
by a hook that ran a specific privileged command to add a submodule, but 
I'm not really sure how or whether it's currently possible given the 
times available to run hooks.

- What we'd really want with submodules is to have the primary object 
storage shared in the master repo rather than in the submodule. That way 
we'd benefit from compression across packages, and furthermore, package 
moves wouldn't duplicate history.

If you're interested in fixing the above problems as well as the ones 
that exist regardless of repo format (linked on the main tracker bug 
[2]), then submodules could become a better option.

-- 
Thanks,
Donnie

Donnie Berkholz
Council Member / Sr. Developer
Gentoo Linux
Blog: http://dberkholz.com

1. http://archives.gentoo.org/gentoo-scm/msg_98932c55ec10fcc5445ab950e62b12dc.xml
2. https://bugs.gentoo.org/show_bug.cgi?id=333531


end of thread, other threads:[~2011-08-09 13:24 UTC | newest]

Thread overview: 21+ messages
2011-08-06 14:13 [gentoo-dev] [RFC] gentoo-x86 migration to repo-per-package Fabian Groffen
2011-08-06 15:36 ` Markos Chandras
2011-08-07  8:12   ` Fabian Groffen
2011-08-06 18:37 ` Nirbheek Chauhan
2011-08-07  8:24   ` Fabian Groffen
2011-08-06 20:17 ` James Cloos
2011-08-07  8:31   ` Fabian Groffen
2011-08-06 20:42 ` Krzysztof Pawlik
2011-08-07  8:40   ` Fabian Groffen
2011-08-06 20:55 ` Robin H. Johnson
2011-08-07  9:12   ` Fabian Groffen
2011-08-07  9:21     ` Michał Górny
2011-08-07  9:29       ` Fabian Groffen
2011-08-07 11:05     ` Rich Freeman
2011-08-07 11:39       ` Fabian Groffen
2011-08-09 13:24   ` Donnie Berkholz
2011-08-06 21:57 ` Fabio Erculiani
2011-08-08 14:42   ` Andreas K. Huettel
2011-08-08 16:51     ` "Paweł Hajdan, Jr."
2011-08-08 23:49       ` [gentoo-dev] " Duncan
2011-08-08  4:15 ` [gentoo-dev] " Nathan Phillip Brink
