On Sun, 10 Jun 2018 22:34:45 +0200
Ulrich Mueller wrote:

> - Gentoo projects must release their work under GPL-compatible free
>   software licenses, preferably under GPL-2 or later. CC-BY-SA-3.0
>   (or 4.0) can be used for documentation.
>
> - All commits to Gentoo repositories must be accompanied by a
>   certificate of origin. Typically, this would be declared by adding
>   a "Signed-off-by:" line (or several of them) to every commit.
>   This model is very similar to the one used for development of the
>   Linux kernel.

Using the Linux kernel as a reference point, while amicable, is a
somewhat dangerous line of reasoning here, partly because our practice
differs significantly from that project's, and "we both use Git" is
insufficient.

- We don't have the hierarchy of marshals who process the
  contributions of those below them. The closest we have to this is
  Gentoo-Proxy-Maint, but besides that, we have a wild west
  free-for-all where commits are essentially unmonitored. There is no
  structural/procedural oversight where higher privileges vet the
  contributions of lower privileges.

- We cannot rely on copy/move detection in any reasonable fashion, as:

  1. File names are significant data which have source-time
     consequences.

  2. File contents are incredibly regular, having high similarity to
     huge volumes of the tree, making a statistical percentage
     similarity basis prone to misdetection.

  3. We routinely employ "copy the file to increment its version"
     logic, which is practically unseen outside Gentoo, partly because
     relying on filenames as a versioning system is, typically
     speaking, "you're too poor to have a real VCS", and we only
     really have it as a historical wart from how we evolved. Such an
     idea is typically nonsense if you can rely on a VCS to handle
     versioning for you.

These factors basically make our workflow, and the nature of the work
we do, substantially different from the Kernel project, and most other
projects.
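To make point 3 concrete, here is a minimal sketch in a throwaway
repository. Everything in it is invented for illustration (the authors
Alice and Bob, the package app-misc/demo, the ebuild contents); it is
not taken from the Gentoo tree. It only shows that stock "git blame"
credits the person who copied the file, not the person who wrote the
lines:

```shell
#!/bin/sh
# Hypothetical demo: Alice, Bob and app-misc/demo are invented names.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Stage everything and commit under the given (invented) identity.
commit_as() {  # usage: commit_as NAME MESSAGE
    git add -A
    git -c user.name="$1" -c user.email="$1@example.org" \
        commit -q -m "$2"
}

mkdir -p app-misc/demo
cat > app-misc/demo/demo-1.0.ebuild <<'EOF'
EAPI=6
DESCRIPTION="Demonstration package for attribution tests"
HOMEPAGE="https://example.org/demo"
SLOT="0"
KEYWORDS="~amd64 ~x86"
EOF
commit_as Alice "app-misc/demo: initial import"

# The usual version bump: copy the old ebuild to a new file name.
cp app-misc/demo/demo-1.0.ebuild app-misc/demo/demo-1.1.ebuild
commit_as Bob "app-misc/demo: bump to 1.1"

# Stock blame: every line of the new file is credited to the copier.
plain_authors=$(git blame --line-porcelain app-misc/demo/demo-1.1.ebuild \
    | sed -n 's/^author //p' | sort -u)
echo "plain blame: $plain_authors"

# Copy detection can be requested with -C (repeatable), but it works on
# content similarity, which is exactly what points 1 and 2 undermine.
echo "blame -C -C:"
git blame -C -C --line-porcelain app-misc/demo/demo-1.1.ebuild \
    | sed -n 's/^author //p' | sort -u
```

In this toy case there is only one possible source file, so copy
detection has an easy job. The real tree has thousands of near-identical
candidates, which is where it goes wrong.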
> - The copyright notice, especially for ebuilds, will contain the
>   name of the largest contributor, plus an "and others" clause when
>   necessary. (So the "Copyright Gentoo Foundation" lines will be
>   phased out.)

This idea to me is just asking for trouble, given the aforementioned
factors.

The need to update the copyright year is already a minor burden,
smoothed over slightly by the fact that repoman can reliably fix it on
its own.

Reliably handling contribution factors, however, seems difficult, given
that the stock output of "git blame" is routinely wrong due to how our
workflow operates.

( For example, attribution for every virtual/perl-*/*.ebuild should
trace back to 56bd759df1d0c750a065b8c845e93d5dfa6b549d, when robbat2
committed the first incarnation of those files, as there are many lines
that haven't changed since then. My Git Fu doesn't know of a way to do
this reliably without manually implementing new tools that are
portage-semantics aware and do log processing. I say that while
actively developing tools that scrape git fast-export to attempt
something remotely like this, but it's quickly approaching the limits
of what can be done in a week, let alone regularly. )

So given that, as it stands, automating this is either:

  a) hard
  b) impossible

And subsequently, doing it manually will tend towards those entries
quickly becoming wrong.

And to add insult to injury, changing these entries via either
mechanism produces a source of commit churn and conflicts ( which
themselves will alter the statistical bias of attribution, potentially
necessitating subsequent commits ).

And around about here you ask "what's the point?". A lot of work, for
negligible benefit.

If anything, I'd rather have a per-package attribution file, not a
per-ebuild attribution file.

1. It's out-of-band, so it doesn't lead to changes in the attributed
   content itself.

2. It's not as subject to accidents of copy/rename, because it anchors
   itself to the logical directory (one could argue, all files in
   ${cat}/${pn}/${pn}-${pv} are derivative versions of the same file,
   modulo ${pv}).

3. As a result of #1 and #2, said file can be generated server side
   while generating rsync trees, either by careful application of
   `git shortlog -s -n`, or some clever system that generates all such
   files by reading the output of `git fast-export --no-data`,
   avoiding commit churn.

On its surface, what I've proposed for #3 looks like it could be done
per-file, but the trick is that in order to do it per-file, one needs
to create a much smarter fast-export/shortlog filter that can determine
the "true origin" of a given ebuild across renames, and the data is
simply not there to answer that question. You can make assumptions like
"-r1 is the parent of -r2", but that often isn't the case.

With the "per-directory" approach, we can basically incorporate the
assumption that true ancestry is hard to determine in Gentoo, and not
lie to people about the accuracy of the data provided. ( It will also
by nature encompass a greater number of copyright holders having
influenced the file, even if it's not clear which holders contributed
which lines. )

cf:

  git --no-pager shortlog -n -s virtual/perl-ExtUtils-MakeMaker/
    19  Kent Fredric
    13  Agostino Sarubbo
     3  Tobias Klausmann
     2  Andreas K. Hüttel
     2  Jeroen Roovers
     2  Markus Meier
     2  Michał Górny
     2  Mike Frysinger
     2  Robin H. Johnson
     1  Aaron Bauman
     1  Fabian Groffen
     1  Justin Lecher
     1  Michael Haubenwallner
     1  Michael Weber
     1  Mike Gilbert
     1  Sergei Trofimovich
     1  Thomas Deutschmann
     1  Ulrich Müller

vs:

  git blame -M1% -C1% --line-porcelain \
    virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.100.200_rc-r4.ebuild \
    | grep -E '^(author |filename)' | sort -u
  author Kent Fredric
  filename virtual/perl-Archive-Tar/perl-Archive-Tar-2.40.100_rc.ebuild
  filename virtual/perl-Archive-Tar/perl-Archive-Tar-2.40.100_rc-r5.ebuild

  git blame -M1% -C1% --line-porcelain \
    virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.240.0.ebuild \
    | grep -E '^(author |filename)' | sort -u
  author Andreas K. Hüttel
  author Kent Fredric
  filename virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.240.0.ebuild

These 2 latter results are simply wrong, as evidenced by the unchanged
lines shown in:

  git --no-pager diff -M1 -C1 -D 56bd759df1d0c750a065b8c845e93d5dfa6b549d -- "virtual/perl-ExtUtils-MakeMaker/*.ebuild"

I attempted similar with --find-copies-harder, but lost patience when
it spent 5 minutes doing nothing.

IN SUMMARY:

The nature of the proposed changes seems strongly in conflict with the
technology we have to use, and will produce no benefit, at the expense
of real problems.
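For reference, the per-package generation suggested in #3 could look
roughly like the sketch below. It is an illustration under stated
assumptions, not a tested proposal: the helper name pkg_attribution,
the output file name AUTHORS, and the contributors Alice and Bob are
all invented, and it runs against a throwaway repository in place of
the real tree.

```shell
#!/bin/sh
# Hypothetical sketch: pkg_attribution, AUTHORS, Alice and Bob are
# invented names; the repository is a throwaway stand-in for the tree.
set -eu

# Commit counts per author for one package directory, highest first.
# HEAD is passed explicitly because shortlog reads a log from stdin
# when no revision is given and stdin is not a terminal.
pkg_attribution() {  # usage: pkg_attribution PKGDIR
    git --no-pager shortlog -s -n HEAD -- "$1"
}

repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q

# Stage everything and commit under the given (invented) identity.
commit_as() {  # usage: commit_as NAME MESSAGE
    git add -A
    git -c user.name="$1" -c user.email="$1@example.org" \
        commit -q -m "$2"
}

mkdir -p app-misc/demo
echo 'EAPI=6' > app-misc/demo/demo-1.0.ebuild
commit_as Alice "app-misc/demo: initial import"
cp app-misc/demo/demo-1.0.ebuild app-misc/demo/demo-1.1.ebuild
commit_as Bob "app-misc/demo: bump to 1.1"
echo 'SLOT="0"' >> app-misc/demo/demo-1.1.ebuild
commit_as Alice "app-misc/demo: set SLOT"

# One file per ${cat}/${pn}, written outside the work tree so that
# generating it never touches the attributed content.
for pkgdir in */*/; do
    pkgdir=${pkgdir%/}
    mkdir -p "$out/$pkgdir"
    pkg_attribution "$pkgdir" > "$out/$pkgdir/AUTHORS"
done

cat "$out/app-misc/demo/AUTHORS"
```

Because the counts are taken per directory, the copy in the second
commit needs no ancestry guess at all: both contributors show up
regardless of which file each of them touched.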