public inbox for gentoo-dev@lists.gentoo.org
From: Matt Jolly <kangie@gentoo.org>
To: gentoo-dev@lists.gentoo.org
Subject: Re: [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions to Gentoo
Date: Wed, 28 Feb 2024 21:06:51 +1000
Message-ID: <cbb02a15-54b8-4804-b062-4731ed3d7f73@gentoo.org>
In-Reply-To: <u1q8wnbnq@gentoo.org>


> But where do we draw the line? Are translation tools like DeepL 
> allowed? I don't see much of a copyright issue for these.

I'd also like to jump in and play devil's advocate. There's a fair
chance that this is because I just got back from a
supercomputing/research conf where LLMs were the hot topic in every keynote.

As mentioned by Sam, this RFC is performative. Any users that are going
to abuse LLMs are going to do it _anyway_, regardless of the rules. We
already rely on common sense to filter these out; we're always going to
have BS/spam PRs and bugs - I don't think that LLM-generated content is
really any worse.

This doesn't mean that I think we should blanket allow poor quality LLM
contributions. It's especially important that we take into account the
potential for bias, factual errors, and outright plagiarism when these
tools are used incorrectly.  We already have methods for weeding out low
quality contributions and bad faith contributors - let's trust in these
and see what we can do to strengthen these tools and processes.

A bit closer to home for me, what about using an LLM as an assistive
technology / to reduce boilerplate? I'm recovering from RSI - I don't
know when (if...) I'll be able to type like I used to again. If a model
is able to infer some mostly salvageable boilerplate from its context
window I'm going to use it and spend the effort I would have spent
writing it on fixing something else; an outright ban on LLM use will
reduce my _ability_ to contribute to the project.

What about using an LLM for code documentation? Some models can do a
passable job of writing decent quality function documentation and, in
production, I _have_ caught real issues in my logic this way. Why should
I type that out (and write what I think the code does rather than what
it actually does) if an LLM can get 'close enough' and I only need to do
light editing?

In line with the above, if the concern is about code quality / potential
for plagiarised code, what about indirect use of LLMs? Imagine a
hypothetical situation where a contributor asks an LLM to summarise a
topic and uses that knowledge to implement a feature. Is this now
tainted / forbidden knowledge according to the Gentoo project?

As a final not-so-hypothetical, what about an LLM trained on Gentoo docs
and repos, or more likely trained on exclusively open-source
contributions and fine-tuned on Gentoo specifics? I'm in the process of
spinning up several models at work to get a handle on the tech / turn
more electricity into heat - this is a real possibility (if I can ever
find the time).

The cat is out of the bag when it comes to LLMs. In my real-world job I
talk to scientists and engineers using these things (for their
strengths) to quickly iterate on designs, to summarise experimental
results, and even to generate testable hypotheses. We're only going to
see increasing use of this technology going forward.

TL;DR: I think this is a bad idea. We already have effective mechanisms
for dealing with spam and bad faith contributions. Banning LLM use by
Gentoo contributors at this point is just throwing the baby out with the
bathwater.

As an alternative I'd be very happy to see some guidelines for the use of LLMs
and other assistive technologies like "Don't use LLM code snippets
unless you understand them", "Don't blindly copy and paste LLM output",
or, my personal favourite, "Don't be a jerk to our poor bug wranglers".

A blanket "No completely AI/LLM generated works" might be fine, too.

Let's see how the legal issues shake out before we start pre-emptively
banning useful tools. There's a lot of ongoing action in this space - at
the very least I'd like to see some thorough discussion of the legal
issues separately if we're making a case for banning an entire class of
technology.

A Gentoo LLM project formed of experts who could actually provide good
advice / some actual guidelines for LLM use within the project (and
engaging some real-world legal advice) might be a good starting point.
Are there any volunteers in the audience?

Thanks for listening to my TED talk,

Matt

