On Tue, Mar 05, 2024 at 06:12:06 +0000, Robin H. Johnson wrote:
> At the top, I noted that it will be possible in future for AI generation
> to be used in a good, safe way, and we should provide some signals to
> the researchers behind the AI industry on this matter.
>
> What should it have?
> - The output has correct license & copyright attributions for portions
>   that are copyrightable.
> - The output explicitly disclaims copyright for uncopyrightable portions
>   (yes, this is a higher bar than we set for humans today).
> - The output is provably correct (QA checks, actually running tests, etc.)
> - The output is free of non-functional/nonsense garbage.
> - The output is free of hallucinations (aka don't invent dependencies
>   that don't exist).
>
> Can you please contribute other requirements that you feel "good" AI
> output should have?

- The output is not overly clever, even if correct. It should resemble
  something a reasonable human might write: for example, a simple sed
  invocation rather than a contrived sequence of Bash parameter
  expansions.

- The output is succinct. This continues the "reasonable human" theme
  from above. For example, it should not first increment some value by 4,
  then 3, then 2, and finally 1 when incrementing by 10 right off the bat
  makes more sense.

- The output domain can be restricted in some form. Given a problem, some
  things are simply outside of the space of valid answers. For example,
  sudo rm -rf --no-preserve-root / should not be a line that can be
  generated in the context of ebuilds.

- Simply enumerating restrictions should be considered intractable. While
  it may be trivial to create a list of forbidden words in the context of
  a basic family-friendly environment, how can you effectively guard
  against forbidden constructs when you might not know them all
  beforehand? For example, how do you define what constitutes "malicious
  output"?

- Oskari
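
To make the "overly clever" point above concrete, here is a small
hypothetical sketch (the package string and variable names are invented
for illustration, not taken from any real ebuild). Both approaches are
correct; the difference is how much the reader has to decode:

```shell
#!/bin/sh
# Hypothetical example: extract the bare package name "vim" from a
# category/package-version string. Both approaches below produce the
# same result.
pkg='app-editors/vim-9.1.0000'

# "Clever": chained parameter expansions; correct, but the reader must
# mentally unwind each pattern to see what is being stripped.
clever="${pkg%-[0-9]*}"   # drop the trailing "-9.1.0000"
clever="${clever##*/}"    # drop the leading "app-editors/"

# "Reasonable human": one sed expression that reads as two plain steps,
# "drop everything up to the slash, then drop the version suffix".
plain=$(printf '%s\n' "$pkg" | sed 's|.*/||; s|-[0-9].*||')

echo "$clever"  # vim
echo "$plain"   # vim
```

A reviewer auditing generated ebuild code can verify the sed version at
a glance; the expansion chain demands the same scrutiny as a puzzle.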