On Tue, Mar 05, 2024 at 06:12:06 +0000, Robin H. Johnson wrote:
> At the top, I noted that it will be possible in future for AI generation
> to be used in a good, safe way, and we should provide some signals to
> the researchers behind the AI industry on this matter.
>
> What should it have?
> - The output has correct license & copyright attributions for portions
>   that are copyrightable.
> - The output explicitly disclaims copyright for uncopyrightable portions
>   (yes, this is a higher bar than we set for humans today).
> - The output is provably correct (QA checks, actually running tests, etc.)
> - The output is free of non-functional/nonsense garbage.
> - The output is free of hallucinations (aka don't invent dependencies
>   that don't exist).
>
> Can you please contribute other requirements that you feel "good" AI
> output should have?

- The output is not overly clever, even if correct. It should resemble
  something a reasonable human might write: for example, a simple sed
  invocation rather than a contrived sequence of Bash parameter
  expansions.

- The output is succinct. This continues the "reasonable human" theme
  from above. For example, it should not first increment some value by 4,
  then 3, then 2, and finally 1 when incrementing by 10 right off the bat
  makes more sense.

- The output domain can be restricted in some form. Given a problem, some
  things are simply outside of the space of valid answers. For example,
  sudo rm -rf --no-preserve-root / should not be a line that can be
  generated in the context of ebuilds.

- Simply enumerating restrictions should be considered intractable. While
  it may be trivial to create a list of forbidden words in the context of
  a basic family-friendly environment, how can you effectively guard
  against forbidden constructs when you might not know them all
  beforehand? For example, how do you define what constitutes "malicious
  output"?

- Oskari
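
To make the "overly clever" point above concrete, here is a small
hypothetical sketch (the package string and variable names are invented
for illustration, not taken from any real ebuild). Both approaches are
correct; the difference is how much the reader has to decode:

```shell
#!/bin/sh
# Hypothetical example: extract the bare package name "vim" from a
# category/package-version string. Both approaches below produce the
# same result.
pkg='app-editors/vim-9.1.0000'

# "Clever": chained parameter expansions; correct, but the reader must
# mentally unwind each pattern to see what is being stripped.
clever="${pkg%-[0-9]*}"   # drop the trailing "-9.1.0000"
clever="${clever##*/}"    # drop the leading "app-editors/"

# "Reasonable human": one sed expression that reads as two plain steps,
# "drop everything up to the slash, then drop the version suffix".
plain=$(printf '%s\n' "$pkg" | sed 's|.*/||; s|-[0-9].*||')

echo "$clever"  # vim
echo "$plain"   # vim
```

A reviewer auditing generated ebuild code can verify the sed version at
a glance; the expansion chain demands the same scrutiny as a puzzle.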