[Opinion] AI finds errors in 90% of Wikipedia's best articles

King@blackneon.net · edit-2 2 months ago

[Opinion] AI finds errors in 90% of Wikipedia's best articles

dukemirage@lemmy.world · 2 months ago

legitimate use of a LLM

anamethatisnt@sopuli.xyz · 2 months ago

An extremely simplified test for whether a use of an LLM is good is whether its output is treated as a finished product or not. Here the human uses it to identify possible errors and then verifies the LLM output before acting, and the AI isn’t mentioned at all in the corrections.

The only danger I see is that errors the LLM didn’t find will continue to go undiscovered, but they would probably have stayed undiscovered without the LLM too.

porcoesphino@mander.xyz · edit-2 2 months ago

I think the first part you wrote is a bit hard to parse but I think this is related:

I think the problematic part of most genAI use cases is validation at the end. If you’re doing something that has a large amount of exploration but a small amount of validation, like this, then it’s useful.

A friend was using it to learn the Linux command line. That can be framed as having a single command at the end that you copy, paste and validate. It isn’t perfect, because the explanation could still be off and wouldn’t be validated, but I think it’s still a better use case than most.

If you’re asking for the grand unifying theory of gravity then:

validation isn’t built into the task (so you’re unlikely to do it with time).
validation could be as time intensive as the task (so there is no efficiency gain if you validate).
it’s beyond your ability to validate, so if it says nice things about you then a subset of people will decide the tool is amazing.

anamethatisnt@sopuli.xyz · 2 months ago

Yeah, my morning brain was trying to say that when it is used as a tool by someone that can validate the output and act upon it then it’s often good. When it is used by someone who can’t, or won’t, validate the output and simply uses it as the finished product then it usually isn’t any good.

Regarding your friend learning to use the terminal, I’d still recommend validating the output before using it. If it’s asking genAI about flags for ls then sure, no big deal, but if a genAI ends up switching around sda and sdb in your dd command, resulting in a wiped drive, you’ve only got yourself to blame for not checking the manual.
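For the dd case, a minimal sketch of what "validate before you run it" could look like in Python. The helper names (`dd_operands`, `dd_matches_intent`) and the device paths are made up for illustration, and this only compares the `if=` and `of=` operands against what you meant to do. It is no substitute for reading the manual.

```python
import shlex


def dd_operands(command: str) -> dict[str, str]:
    """Parse the key=value operands of a dd command line into a dict."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "dd":
        raise ValueError("not a dd command")
    # dd operands look like if=..., of=..., bs=...; split on the first '='.
    return dict(tok.split("=", 1) for tok in tokens[1:] if "=" in tok)


def dd_matches_intent(command: str, source: str, target: str) -> bool:
    """Return True only if the command reads from source and writes to target."""
    ops = dd_operands(command)
    return ops.get("if") == source and ops.get("of") == target


# Hypothetical suggestion in which sda and sdb have been swapped.
suggested = "dd if=/dev/sdb of=/dev/sda bs=4M"
print(dd_matches_intent(suggested, source="/dev/sda", target="/dev/sdb"))  # False
```

The point is only that the check is cheap compared to the cost of a wiped drive: the command is never executed here, just compared against the intended source and target.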

shiroininja@lemmy.world · edit-2 2 months ago

Or it falsely flags something as an error, and the human has so much faith in the system that they assume it must be correct, and either waste time finding the solution or bend reality to “correct” it in a human form of hallucinating BS. Especially dangerous if saying there is an error supports the individual’s personal beliefs.

Edit:

I’ll call it “AI-induced confirmation bias”, cousin to AI-induced psychosis.

ordnance_qf_17_pounder@reddthat.com · 2 months ago

“AI” summed up. 95% of the time it’s pointless bullshit being shoehorned into absolutely everything. 5% of the time it can be useful.

dukemirage@lemmy.world · 2 months ago

like Comic Sans

earthworm@sh.itjust.works · 2 months ago

Something weird about corporations spending billions on “the Comic Sans of technology”

Treczoks@lemmy.world · 2 months ago

Yep. Let it flag potential problems, and have humans react to it, e.g. by reviewing and correcting things manually. AI can do a lot of things quickly and efficiently, but it must be supervised like a toddler.

buffing_lecturer@leminal.space · 2 months ago

This is an interesting idea:

The “at least one” in the prompt is deliberately aggressive, and seems likely to force hallucinations in case an article is definitely error-free. So, while the sample here (running the prompt only once against a small set of articles) would still be too small for it, it might be interesting to investigate using this prompt to produce a kind of article quality metric: If it repeatedly results only in invalid error findings (i.e. what a human reviewer Disagrees with), that should indicate that the article is less likely to contain factual errors
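A rough sketch of how that metric could be computed, in Python. Everything here is an assumption for illustration: the function name, the data layout (one list of reviewer verdicts per prompt run, `True` meaning the human agreed with the finding), and the sample verdicts, which are invented and not taken from the Signpost piece.

```python
def share_of_all_invalid_runs(runs: list[list[bool]]) -> float:
    """Fraction of prompt runs in which every finding was rejected by the reviewer.

    Each inner list holds one run's findings: True if the human reviewer
    agreed the finding was a real error, False if they disagreed.
    """
    if not runs:
        raise ValueError("need at least one run")
    # A run counts only if it produced findings and none of them held up.
    all_invalid = sum(1 for findings in runs if findings and not any(findings))
    return all_invalid / len(runs)


# Invented verdicts for four runs of the prompt against one article.
runs = [[False], [False, False], [True, False], [False]]
print(share_of_all_invalid_runs(runs))  # 0.75
```

A value close to 1.0 over many runs would be the "repeatedly only invalid findings" case the quote describes. As the quote itself notes, a handful of runs is too small a sample to read much into.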

architect@thelemmy.club · 2 months ago

So… the same as most employees but cheaper.

People here are above average and overestimate the vast majority of humanity.

passepartout@feddit.org · 2 months ago

Yes and no. I have enjoyed reading through this approach, but it seems like a slippery slope from this to “vibe knowledge” where LLMs are used for actually trying to add / infer information.

LastYearsIrritant@sopuli.xyz · 2 months ago

Don’t discard a good technique because it can be implemented poorly.

architect@thelemmy.club · 2 months ago

The issue is that some people are lazy cheaters no matter what you do. Banning every tool because of those people isn’t helpful to the rest of humanity.

De Lancre@lemmy.world · 2 months ago

Wait, you mean using a Large Language Model that was created to parse walls of text, to parse walls of text, is a legit use?

Those kids at openai would’ve been very upset if they could read.

dukemirage@lemmy.world · 2 months ago

Chatbots aren’t the worst use case either, even though we are headed in the wrong direction.

lightnsfw@reddthat.com · 2 months ago

Even for that it’s mid at best. I often try using Copilot at work and it makes shit up constantly.

Wikipedia:Wikipedia Signpost/2025-12-01/Opinion - Wikipedia