public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [RFC git*.eclass] Do we need user-friendly egit-src/?
@ 2013-08-28 17:39 Michał Górny
  2013-08-28 19:07 ` Ian Stakenvicius
From: Michał Górny @ 2013-08-28 17:39 UTC
  To: gentoo-dev


Hello,

My previous mail didn't focus on the most important thing, so I'd like
to start another thread with a simple question: do we need to provide
a user-friendly ${DISTDIR}/egit-src/?

Currently the repository store consists of either bare or non-bare
clones of the remote repository. We do not support committing to those
local clones but people can easily clone them in order to obtain
a local development repository that can be used to work with the code
and push patches upstream.

However, supporting that increases the complexity of the eclass
and decreases space efficiency. For example, if we started to do
shallow clones people would no longer be able to clone the repo
directly. We also need to worry about clone location collisions
and reusing the same location when multiple packages use the same repo.
As you can guess, git hosting sites don't make this easy for us.

The question would be: do you feel like we should really provide
a verbatim clone of upstream's repository? Or should we focus on
the eclass' main goal, that is, fetching the remote sources in the most
bandwidth- and space-efficient manner?


If we decide to go for 'sane' clones, we need the eclass to be able to
provide sane paths for local copies. Those paths need to suit
the following points:

1. multiple remote repos (e.g. forks) may need to reuse the same local
   clone,

2. multiple packages may reuse the same repo and then they should
   create just one local clone,

3. a package may use multiple repos :),

4. submodules may reuse the same repo as another package, and then they
   should use the same local clone.

Honestly, I have no idea how to achieve that. The best idea that comes
to my mind is to use the whole 'path' part of the URI. That is, like:

  git://git.overlays.gentoo.org/proj/foo.git

would map to a path like:

  proj <something> foo.git

where <something> may be '/', '-', '_', '%2F', whatever.

This solves 2.-4. but won't help with 1. On top of that come the
bikeshed about which character should be used, the bikeshed about
people really wanting to override this, and probably one more bikeshed.
Oh, and some git hosting sites add a prefix like '/git', '/p' or
'/pub/scm/whatever' that would be part of the checkout directory as well.
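
To make the mapping above concrete, here is a minimal sketch. It is not
eclass code: the helper name local_clone_name and its sep parameter are
invented for illustration, sep stands in for the undecided <something>,
and Python is used only for brevity. It handles URL-style URIs only.

```python
from urllib.parse import urlsplit


def local_clone_name(uri: str, sep: str = "_") -> str:
    """Flatten the path part of a repository URI into one directory name.

    Hypothetical helper for illustration; `sep` stands in for the
    still-undecided <something> character.
    """
    path = urlsplit(uri).path
    # Drop empty components so that '/proj//foo.git/' and
    # '/proj/foo.git' give the same result.
    parts = [p for p in path.split("/") if p]
    return sep.join(parts)


print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git"))
print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git", sep="-"))
# A hosting prefix such as '/pub/scm' ends up in the name as well.
print(local_clone_name("https://example.org/pub/scm/foo.git"))
```

Two forks that live under different paths (say '/alice/foo.git' and
'/bob/foo.git') still get different names here, which is exactly why
point 1. stays unsolved.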

We could also, in theory, use some unique identifier such as the root
commit hash, but I doubt users will like having hashes in egit-src.


An alternative is to create a semi-obfuscated yet space-efficient store
for all the repositories. That is, fetch *all* git repositories into
a single location.

Since git uses hashes to identify everything, this will work better
than you'd think at first. Most importantly, we can avoid fetching
duplicates with no real effort since git simply reuses local objects
with the same ids.

This covers duplicates from repos used by multiple ebuilds, from forked
repos, and from identical files that are used by different projects.
I doubt you could make git more space efficient than that.
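
The deduplication follows from how git names objects. As a small
illustration (the function name git_blob_id and the sample contents are
made up for this sketch, and it assumes git's SHA-1 object format), the
id of a file's blob depends only on its bytes, not on which repository
it came from:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Illustrative contents standing in for the same file shipped by two projects.
file_in_project_a = b"Permission is hereby granted...\n"
file_in_project_b = b"Permission is hereby granted...\n"

# Same bytes, same id: a shared store keeps only one copy of the object.
print(git_blob_id(file_in_project_a) == git_blob_id(file_in_project_b))
```

Trees and commits are named the same way, so a fork that shares history
with its parent shares those objects as well.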

We no longer have to worry about EGIT_PROJECT, about submodules, about
bikesheds. However, the local store structure would no longer be
familiar to our users. We are basically switching from using git as a VCS
to using git as an efficient file-fetching tool.
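
One way such a single store could be fed, purely as an assumption for
illustration: every remote is fetched into the same bare repository, and
its branches are kept under a per-remote ref namespace so they do not
overwrite each other. The function name fetch_command, the 'refs/egit/'
prefix and the idea of deriving the namespace from a hash of the URL are
all invented here; nothing in this thread specifies them. The sketch
only builds the command line and does not run git.

```python
import hashlib
from typing import List


def fetch_command(store: str, url: str) -> List[str]:
    """Build a 'git fetch' command that pulls `url` into the shared store.

    Hypothetical layout: branches of each remote land under
    refs/egit/<short hash of URL>/heads/ inside one bare repository.
    """
    namespace = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    refspec = "+refs/heads/*:refs/egit/{}/heads/*".format(namespace)
    return ["git", "--git-dir=" + store, "fetch", url, refspec]


cmd = fetch_command("/var/cache/distfiles/egit-store",
                    "git://git.overlays.gentoo.org/proj/foo.git")
print(" ".join(cmd))
```

Objects already present in the store are not downloaded again, so only
the refs are per-remote; the object data is shared.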

There's also some increased risk wrt hash collisions but I doubt that
should be considered a problem at the moment.


What are your thoughts?

-- 
Best regards,
Michał Górny


* Re: [gentoo-dev] [RFC git*.eclass] Do we need user-friendly egit-src/?
  2013-08-28 17:39 [gentoo-dev] [RFC git*.eclass] Do we need user-friendly egit-src/? Michał Górny
@ 2013-08-28 19:07 ` Ian Stakenvicius
From: Ian Stakenvicius @ 2013-08-28 19:07 UTC
  To: gentoo-dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 28/08/13 01:39 PM, Michał Górny wrote:
> The question would be: do you feel like we should really provide a
> verbatim clone of upstream's repository? Or should we focus on the
> eclass' main goal, that is, fetching the remote sources in the most
> bandwidth- and space-efficient manner?


+1 on the second one -- eclasses are first and foremost to support
ebuilds, and unless ebuilds need access to multiple branches at the
same time or some other rather odd and non-straightforward git trick
that would require a full clone, all we should worry about doing is
providing a way to get that shallow clone of only that code snapshot
that the ebuild needs to build.


As a bit of a tangent, if an end-user has their own local clone of a
repository and uses ${PN}_LIVE_REPO to point to it (or if they have a
custom out-of-tree ebuild that only has a local URI), it might be nice
to detect that and skip the fetch so that a checkout can be done from
it directly into ${S} rather than doing the extra copy...


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)

iF4EAREIAAYFAlIeSl4ACgkQ2ugaI38ACPAzBAEAt5jDJA5uiB4AcS4wPWHjjZA0
LtqErcFZuF5kOYLzXSgA/24Oa7GxBguFrLQBWQJDt95IYz8Po76us4BVg4X6/wu8
=xrdH
-----END PGP SIGNATURE-----

