

I think it’s fine for this to be poorly defined; what I want is something aligned with reality beyond op-eds. Qualitative evidence isn’t bad, but it needs to be aggregated rather than left as anecdotes. Humans are really bad at judging how the kids are doing (complaints like the OP are older than liberal education, no?), and I don’t want to continue that pattern. A bunch of old people worrying too much about students not reading Shakespeare in class is how we got the cancel culture moral panic; I’d rather learn from that mistake.
A handful of thoughts: there are longitudinal studies that interview kids at intervals; are any of those showing really unusual swings? Some kids got access to AI earlier than others; are they much different from otherwise similar peers? Where are the broad interviews and story collections from the kids themselves? Are they worried? How would they describe their own and their peers’ use of AI?
As I understand it, there are many such models, especially ones built for academic use. Some common training corpora are listed here: https://www.tensorflow.org/datasets
Examples include Wikipedia edits and discussions, and openly licensed scientific articles.
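If you want to poke at one of those corpora yourself, here’s a minimal sketch using the tensorflow_datasets library; the dataset/config name ("wiki40b/en") is just one example from that catalog, and exact names can vary between library versions.

```python
# Minimal sketch: loading an openly licensed corpus from the TFDS catalog.
# Assumes `pip install tensorflow-datasets`; "wiki40b/en" is one example
# dataset from the catalog and its config name may differ by version.
import tensorflow_datasets as tfds

# Downloads and caches the English Wiki-40B training split locally
# (this is a large download on the first run).
ds = tfds.load("wiki40b/en", split="train", shuffle_files=True)

for example in ds.take(1):
    # Each record is a dict of tensors; "text" holds the cleaned article body.
    print(example["text"].numpy()[:200])
```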
Almost all research models are trained on material like this. Many of them have demos, open code, and local installation instructions; they generally just don’t have a marketing budget. Some of the models listed here certainly qualify: https://github.com/eugeneyan/open-llms?tab=readme-ov-file
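To give a sense of how low the friction is, here’s a rough sketch of running one of the smaller open checkpoints locally with Hugging Face transformers; the model id ("EleutherAI/pythia-160m") is just one example of a research model of the kind those lists collect, not an endorsement of any particular one.

```python
# Rough sketch: running a small open research checkpoint locally.
# Assumes `pip install transformers torch`; "EleutherAI/pythia-160m" is
# just one example of an openly released model, swap in any similar one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Open training corpora include"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding keeps the demo deterministic and quick on a laptop CPU.
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```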
Both of these are lists that aren’t especially hard to get onto, so I imagine some entries have problems with falsification or mislabeling, as you point out. But there’s little incentive for people to do that (beyond padding a paper’s results, I guess?).
Art generation seems to have had a harder time, but there are Stable Diffusion equivalents trained only on Creative Commons work. A few minutes of searching turned up Common Canvas, which claims to be competitive.
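For what it’s worth, these CC-trained checkpoints load like any other diffusion model. Below is a hedged sketch with the diffusers library; the checkpoint id ("common-canvas/CommonCanvas-S-C") is my best guess at the Common Canvas release on Hugging Face and may need checking against the actual model card.

```python
# Hedged sketch: generating an image with a CC-trained diffusion model.
# Assumes `pip install diffusers transformers torch`; the checkpoint id
# below is an assumed name for the Common Canvas release, verify it first.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "common-canvas/CommonCanvas-S-C",  # assumed id; check the model card
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # runs on CPU too, just much slower (drop float16)

image = pipe("a watercolor sketch of a library reading room").images[0]
image.save("commoncanvas_sample.png")
```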