Hi,

It's a good thing that https://wiki.gentoo.org/wiki/Project:Council/AI_policy has been voted on, and that it mentions:

> This motion can be revisited, should a case been made over such a tool
> that does not pose copyright, ethical and quality concerns.

I wanted to provide some meat for discussing improvements to the specific phrasing "created with the assistance of Natural Language Processing artificial intelligence tools", which may not be optimal.

First, I think we should not limit this to LLMs / NLP tools: the policy should cover all algorithmically/automatically generated content, any of which could cause a flood of time-wasting, low-quality information.

Second, I think we should define what the acceptable use cases of algorithmically generated content would be. As a starting point, I'd suggest requiring all of the following:

- The algorithm generating such content is proper F/LOSS.
- In the case of a machine learning algorithm, the dataset used to train it is proper F/LOSS itself (with traceability of all of its bits).
- The algorithm generating such content is reproducible (training produces the exact same bits).
- The algorithm did not publish the content automatically: all of the content was reviewed and approved by a human, who bears responsibility for their contribution, and the content has been flagged as having been generated using $tool.

Third, I think a "developer certificate of origin" policy could be augmented with the "bot did not publish the content automatically" bits, and should also be mandated in the context of bug reporting, so as to have a "human gate" for issues discovered by automation / tinderboxes.

Best regards,

--
Jérôme
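
P.S. To make the "flagged as having been generated using $tool" and "human gate" ideas more concrete, here is a minimal sketch of a commit-msg hook that would reject a contribution declaring generated content unless a human has signed off on it. The "Generated-by" trailer name is just a placeholder I made up for illustration, not an existing Gentoo or git convention:

    #!/usr/bin/env python3
    # Hypothetical commit-msg hook sketch (not an existing policy tool):
    # content flagged as generated must carry a human Signed-off-by.
    import re
    import sys

    def human_gate(message: str) -> bool:
        """Return True if the commit message passes the 'human gate'."""
        # Collect trailer-style "Key: value" lines from the message.
        trailers = dict(
            line.split(":", 1)
            for line in message.splitlines()
            if re.match(r"^[A-Za-z-]+:", line)
        )
        generated = "Generated-by" in trailers
        signed = "Signed-off-by" in trailers
        # Generated content is acceptable only when a human takes
        # responsibility for it; purely human content passes as-is.
        return signed if generated else True

    if __name__ == "__main__":
        with open(sys.argv[1]) as fh:
            sys.exit(0 if human_gate(fh.read()) else 1)

The same kind of check could presumably run on the bug tracker side, so that reports produced by automation / tinderboxes are only filed once a human has reviewed and signed off on them.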