Against the Quantification of Integrity
When the measure of language becomes its target, it ceases to be good language.
💡Nerd Rating: 1/5. I discuss the origins of certain linguistic tics in LLMs and what it means for writing, student assessment, and thinking.
"It's not x, it's y."
Large Language
The hysteria is a sad part of the whole situation. I think the core tension is just how serious an infraction plagiarism is in academia.
It’s like having a medical machine that first does harm in order to produce a more accurate diagnosis. You’d be compelled to find the oath-breaking physicians, and you’d have to do some weird statistical analysis like “are you getting too many accurate diagnoses?” Then you’d have arguments about correct diagnoses being the point. But your institution’s teaching since its inception has been to first do no harm, and you don’t have the institutional capacity, and likely not the will, to change it so that you can do a little bit of harm.
I feel a lot better about code than I do about art or writing or anything else. Before Bill Gates did all his patent fraud and paywall schemes, I believe open source was the default mode. Information wants to propagate, after all. Even before LLMs you were just copying and pasting stuff from Stack Overflow. Code should be iterative and available; who cares if it’s easy to come by? I think it takes us closer to how code was envisioned when your SaaS nonsense gets one-shot by Claude, which is the opposite of a plagiarism machine removing us from the envisioned principles of education.
So the origin of the hysteria is good and just. Obviously, if you send someone out into the world with your endorsement and they rely on a tool instead of their own capacity, that’s against the point of an education. On the flip side, I also absolutely get the desire to instrumentalize the education you receive. A diploma is a piece of paper with an ROI, and the context around that education is you being instrumentalized. Not to mention the way to avoid death by exposure is to be capable of being profited off of. So why would a student care about the principles of academia? You’d care about the principles of medicine because you don’t want to receive or give harm. You care about the principles of code because you don’t want to be paywalled out of the shared knowledge pool. But academia? You’re subject to punishment if you do or if you don’t (10% of the time per the article). Beyond that, you’re seldom hired or found useful by virtue of your capacity without a tool. I was waxing poetic earlier, but you seldom see anyone praised, acknowledged, or rewarded for unassisted general intelligence in culture and media.
So an educator faces this existential threat to the model of how the institution works, with tools that are insufficient to ward against it. You hollow out and undermine the people you’re looking after if you turn them into instruments of the LLM (e.g. people who are good at checking LLM outputs for accuracy). But, like the article mentioned, you begin to police thought and the most common ways of resolving ambiguity by using language as a conduit. The very act of policing incentivises engaging with the form of reasoning rather than with reason itself, and it puts people who don’t engage with any of it in the crossfire.
That sucks and that’s grim.
Yeah, the whole closed source thing with code only started when corps realized they could make money off software. People writing code and sharing it was the default mode before that. So, completely agree that LLMs can be helpful in making open source become the default, because it’s just not going to be worth hoarding the code going forward. I’ve also found they’re pretty capable at reverse engineering closed code. My Nikon camera uses a weird-ass proprietary format for its RAW files, and I was able to reverse engineer it by a combination of decompiling and instrumenting a proprietary library from the app I’ve been using to read the files. This is just something that wouldn’t have been practical for me to even attempt before LLMs.
I think for science, we need to see more of what arXiv started doing: banning people for a year if they have hallucinated references in their papers. That’s the kind of thing that makes sense to focus on instead of style. If a paper makes stuff up, then you know it’s bad quality regardless of how it was made, and you can deal with the offender. You can even have an automated process do the initial survey of papers to validate them before they get to human review. This stuff can be done fairly deterministically, since a citation either exists or it does not.
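To make the "a citation either exists or it does not" idea concrete, here is a rough sketch of what that automated first pass might look like. It assumes each reference carries a DOI and that the caller supplies some lookup that can say whether a DOI resolves. The names `extract_doi`, `flag_suspect_references`, and `doi_exists` are made up for illustration, the regex is only an approximation of real DOI syntax, and none of this is meant to describe what arXiv actually runs.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Loose approximation of DOI syntax; real DOIs are more varied than this.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(reference: str) -> Optional[str]:
    """Return the first DOI-looking string in a reference, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def flag_suspect_references(
    references: Iterable[str],
    doi_exists: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Return (reference, reason) pairs that a human should look at.

    `doi_exists` is a hypothetical lookup injected by the caller, e.g. a
    wrapper around whatever registry or database the reviewer trusts.
    """
    flagged = []
    for ref in references:
        doi = extract_doi(ref)
        if doi is None:
            flagged.append((ref, "no DOI found"))
        elif not doi_exists(doi):
            flagged.append((ref, "DOI does not resolve"))
    return flagged


if __name__ == "__main__":
    # Placeholder data: an in-memory set stands in for a real registry,
    # and these references are invented purely for the demo.
    known_dois = {"10.1000/real.2020.001"}
    refs = [
        "A. Author. A real paper. 2020. doi:10.1000/real.2020.001.",
        "B. Author. A made-up paper. 2023. doi:10.9999/fake.123",
        "C. Author. A book with no DOI. 1999.",
    ]
    for ref, reason in flag_suspect_references(refs, known_dois.__contains__):
        print(f"{reason}: {ref}")
```

Worth noting that a DOI resolving only proves something exists at that identifier. Checking that the title and authors match what the paper claims would be a second step, and references without a DOI still need a person to look at them.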
Ultimately, the focus should be on the substance, and as you note, the problem was already there long before LLMs. Now it’s just a lot easier for people to produce garbage.