From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 8411A138334 for ; Sun, 17 Jun 2018 01:04:31 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id BCDDAE085D; Sun, 17 Jun 2018 01:04:29 +0000 (UTC) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id 693FAE083E for ; Sun, 17 Jun 2018 01:04:29 +0000 (UTC) Received: from katipo2.lan (unknown [203.86.205.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: kentnl) by smtp.gentoo.org (Postfix) with ESMTPSA id 6F98B335C8C for ; Sun, 17 Jun 2018 01:04:26 +0000 (UTC) Date: Sun, 17 Jun 2018 13:03:58 +1200 From: Kent Fredric To: gentoo-project@lists.gentoo.org Subject: Re: [gentoo-project] [RFC] GLEP 76: Copyright Policy Message-ID: <20180617123737.122ef070@katipo2.lan> In-Reply-To: <23325.35685.793702.267278@a1i15.kph.uni-mainz.de> References: <23325.35685.793702.267278@a1i15.kph.uni-mainz.de> Organization: Gentoo X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Project discussion list X-BeenThere: gentoo-project@lists.gentoo.org Reply-To: gentoo-project@lists.gentoo.org MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; boundary="Sig_/KKu9d8N1nX5WBYn4wYN=NwS"; protocol="application/pgp-signature" X-Archives-Salt: ba5861c8-f720-4d76-a668-6ec94a2e099a X-Archives-Hash: 049ff35d17dd5465a34934721b10d05a --Sig_/KKu9d8N1nX5WBYn4wYN=NwS Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: quoted-printable

On Sun, 10 Jun 2018 22:34:45 +0200
Ulrich Mueller wrote:

> - Gentoo projects must release their work under GPL-compatible free
>   software licenses, preferably under GPL-2 or later. CC-BY-SA-3.0
>   (or 4.0) can be used for documentation.
>
> - All commits to Gentoo repositories must be accompanied by a
>   certificate of origin. Typically, this would be declared by adding
>   a "Signed-off-by:" line (or several of them) to every commit.
>   This model is very similar to the one used for development of the
>   Linux kernel.

Using the Linux kernel as a reference point, while well-meaning, is a
somewhat dangerous line of reasoning here. Our practice differs from
that project's in several significant ways, and "we both use Git" is
insufficient.

- We don't have a hierarchy of marshals who process the contributions
  of those below them. The closest we have to this is Gentoo-Proxy-Maint,
  but besides that, we have a wild west free-for-all where commits are
  essentially unmonitored. There is no structural/procedural oversight
  where higher privileges vet the contributions of lower privileges.

- We cannot rely on copy/move detection in any reasonable fashion, as:

  1. File names are significant data which have source-time
     consequences.

  2. File contents are incredibly regular, having high similarity to
     huge volumes of the tree, which makes detection based on a
     percentage-similarity threshold prone to misdetection.

  3. We routinely employ "copy the file to increment its version"
     logic, which is practically unseen outside Gentoo. Relying on
     filenames as a versioning system typically says "you're too poor
     to have a real VCS", and we only really have it as a historical
     wart from how we evolved. Such an idea is typically nonsense if
     you can rely on a VCS to handle versioning for you.

These factors make our workflow, and the nature of the work we do,
substantially different from the Kernel project and most other projects.

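To illustrate point 2 above, here is a minimal sketch of how regular
ebuild contents defeat a percentage-similarity threshold. It is only a
stand-in: it uses Python's difflib rather than Git's own similarity
heuristic, and the two ebuild bodies are hypothetical examples written
for this illustration, not files taken from the tree.

```python
from difflib import SequenceMatcher

# Two hypothetical virtual ebuilds for unrelated packages. The contents
# are illustrative only and are not copied from the Gentoo tree.
EBUILD_A = """\
# Copyright 1999-2018 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

EAPI=6

DESCRIPTION="Virtual for perl-core/Archive-Tar"
SLOT="0"
KEYWORDS="~amd64 ~x86"

RDEPEND="dev-lang/perl"
"""

EBUILD_B = """\
# Copyright 1999-2018 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

EAPI=6

DESCRIPTION="Virtual for perl-core/ExtUtils-MakeMaker"
SLOT="0"
KEYWORDS="~amd64 ~x86"

RDEPEND="dev-lang/perl"
"""


def line_similarity(a: str, b: str) -> float:
    """Return the fraction of lines shared by two texts, from 0.0 to 1.0.

    This is difflib's ratio over lists of lines. It approximates the
    idea of a similarity score; it is not the algorithm Git uses.
    """
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


if __name__ == "__main__":
    # Unrelated packages, yet only the DESCRIPTION line differs.
    print(f"{line_similarity(EBUILD_A, EBUILD_B):.2f}")
```

Nine of the ten lines are identical, so the score is 0.9 even though
neither file was derived from the other. Any similarity threshold low
enough to follow our version-bump copies will also match files like
these.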
> - The copyright notice, especially for ebuilds, will contain the
>   name of the largest contributor, plus an "and others" clause when
>   necessary. (So the "Copyright Gentoo Foundation" lines will be
>   phased out.)

This idea to me is just asking for trouble, given the aforementioned
factors.

The need to update the copyright year is already a minor burden,
smoothed over slightly by the fact that repoman can reliably fix it on
its own.

Reliably handling contribution factors, however, seems difficult, given
that the stock output of "git blame" is routinely wrong due to how our
workflow operates.

( For example, attribution for every virtual/perl-*/*.ebuild should
trace back to 56bd759df1d0c750a065b8c845e93d5dfa6b549d, when robbat2
committed the first incarnation of those files, as there are many lines
that haven't changed since then. My Git Fu doesn't know of a way to
reliably do this without manually implementing new tools that are
portage-semantics aware and do log processing. I say that while
actively developing tools that scrape git fast-export to attempt
something remotely like this, but it's quickly approaching the limits of
what can be done in a week, let alone regularly. )

So given that, as it stands, automating this is either:

a) hard
b) impossible

And subsequently, doing it manually will tend towards those entries
quickly becoming wrong.

And to add insult to injury, changing these entries via either
mechanism produces a source of commit churn and conflicts ( which
themselves will alter the statistical bias of attribution, potentially
necessitating subsequent commits ).

And around about here you ask "what's the point?". A lot of work, for
negligible benefit.

If anything, I'd rather have a per-package attribution file, not a
per-ebuild attribution file.

1. It's out-of-band, so it doesn't lead to changes in the attributed
   content itself.

2. It's not subject as much to accidents of copy/rename, because it
   anchors itself to the logical directory (one could argue all files
   in ${cat}/${pn}/${pn}-${pv} are derivative versions of the same
   file, modulo ${pv}).

3. As a result of #1 and #2, said file can be generated server side
   while generating rsync trees, either by careful application of
   `git shortlog -s -n`, or by some clever system that generates all
   such files by reading the output of `git fast-export --no-data`,
   avoiding commit churn.

On its surface, what I've proposed for #3 looks like it could be done
per-file, but the trick is that in order to do it per-file, one needs
to create a much smarter fast-export/shortlog filter that can determine
the "true origin" of a given ebuild across renames, and the data is
simply not there to answer that question. You can make assumptions like
"-r1 is the parent of -r2", but that often isn't the case.

With the "per-directory" approach, we can basically incorporate the
assumption that true ancestry is hard to determine in Gentoo, and not
lie to people about the accuracy of the data provided. ( It will also
by nature encompass a greater number of copyright holders having
influenced the file, even if it's not clear which holders contributed
which lines. )

cf:

  git --no-pager shortlog -n -s virtual/perl-ExtUtils-MakeMaker/
    19  Kent Fredric
    13  Agostino Sarubbo
     3  Tobias Klausmann
     2  Andreas K. Hüttel
     2  Jeroen Roovers
     2  Markus Meier
     2  Michał Górny
     2  Mike Frysinger
     2  Robin H. Johnson
     1  Aaron Bauman
     1  Fabian Groffen
     1  Justin Lecher
     1  Michael Haubenwallner
     1  Michael Weber
     1  Mike Gilbert
     1  Sergei Trofimovich
     1  Thomas Deutschmann
     1  Ulrich Müller

vs:

  git blame -M1% -C1% --line-porcelain virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.100.200_rc-r4.ebuild | grep -E '^(author |filename)' | sort -u
  author Kent Fredric
  filename virtual/perl-Archive-Tar/perl-Archive-Tar-2.40.100_rc.ebuild
  filename virtual/perl-Archive-Tar/perl-Archive-Tar-2.40.100_rc-r5.ebuild

  git blame -M1% -C1% --line-porcelain virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.240.0.ebuild | grep -E '^(author |filename)' | sort -u
  author Andreas K. Hüttel
  author Kent Fredric
  filename virtual/perl-ExtUtils-MakeMaker/perl-ExtUtils-MakeMaker-7.240.0.ebuild

These 2 latter results are simply wrong, as evidenced by the unchanged
lines shown in:

  git --no-pager diff -M1 -C1 -D 56bd759df1d0c750a065b8c845e93d5dfa6b549d -- "virtual/perl-ExtUtils-MakeMaker/*.ebuild"

I attempted similar with --find-copies-harder, but lost patience when
it spent 5 minutes doing nothing.

IN SUMMARY:

The nature of the proposed changes seems strongly in conflict with the
technology we have to use, and will produce no benefit, at the expense
of real problems.

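For the per-package attribution file suggested in point 3 above, a
minimal sketch of the `git shortlog -s -n` route might look like the
following. The function names and the output file layout are
assumptions made up for this illustration; nothing here is an existing
Gentoo tool or an agreed format.

```python
import subprocess
from typing import List, Tuple


def run_shortlog(repo: str, package_dir: str) -> str:
    """Run `git shortlog -n -s` restricted to one package directory.

    HEAD is named explicitly because shortlog reads a log from stdin
    when no revision is given and stdin is not a terminal.
    """
    result = subprocess.run(
        ["git", "-C", repo, "--no-pager", "shortlog", "-n", "-s",
         "HEAD", "--", package_dir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_shortlog(text: str) -> List[Tuple[int, str]]:
    """Parse `git shortlog -s` output into (commit_count, author) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a count, whitespace, then the author name.
        count, name = line.split(None, 1)
        entries.append((int(count), name))
    return entries


def render_attribution(package: str, entries: List[Tuple[int, str]]) -> str:
    """Render a hypothetical per-package attribution file."""
    lines = [f"# Contributors to {package}, by commit count"]
    lines += [f"{name} ({count})" for count, name in entries]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumes the current directory is a checkout of the tree.
    pkg = "virtual/perl-ExtUtils-MakeMaker"
    print(render_attribution(pkg, parse_shortlog(run_shortlog(".", pkg))), end="")
```

Because the result is derived entirely from history, it could be
regenerated whenever rsync trees are built and never needs to be
committed, which is what avoids the churn. It makes no claim about
which author wrote which line.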
--Sig_/KKu9d8N1nX5WBYn4wYN=NwS Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEPZazbI/qrFT1o9rn6FQySxNmqCAFAlsls40ACgkQ6FQySxNm qCDo0g//clpXmxUlb6qkEAgSiO4j/D5inOPR5U32+58aRsSlgjMnn1MiRlvyMi9j XorLWyLdllMstjiDHOwoIzv5BKx76F7lzjBRwV7Su6dfOMlqNbIDO3/i4EeLox2z uSieeYHJtsbfhmgCoVVL7jPTQOc1nsKcLjeEAv0CAOXAbRAaTMvYtUwLl9sU0vXd B/CyA1Utcs8s5+W7/BzKSc9w+bsv9tNh8lqdjSHSu2IjsrTb20gYI0SZcU6yrTsv AO65BvNBRa/m6B7mrz6+eoVHehGzlt60GUGgNOKhww5VvWGEaZZldetIDbQd5vnk o5bESiKlh+5TIOMjOcCCEXpr/G+DTGctoTKBfgaF4XW3MgAT5iwMbB7WDQR6mZxf JLgNrIqIp4RW4AdiE8Ji7M3CDWwFKUpSUu9epG/2Oe/X2xU1L3yBACkUzvVb8qzy 8d+wulKnIEkoltT56SFxCM7amPa6wkr95sMb1hvUlAkp9mnli5fVJtSSgy2bx91k wDmaQpnVuYjWPlUArHo/DUeIhDfGrFaU2UYrAI0BJOHL7jbz/sRkJpKRU/Xj1voK 86nDMolxj4FykycIoQ5YCIkRgt2x1ZCIjztLRtUB9Mllk4xj86ggM8hbTCcwIWp9 GE2UrNoPdQFn6ewYtNd3OExJYiIzBhsqP2kxxuf5J3ans0xU080= =Jaqq -----END PGP SIGNATURE----- --Sig_/KKu9d8N1nX5WBYn4wYN=NwS--