That is just utter bullshit. Hallucinations are a by-product of how LLMs work under the hood, not an intentional design choice. An AI that doesn’t make mistakes would be orders of magnitude more profitable.
Mistakes are part of the human process… an automaton which produces only one solution for a problem is easily stuck, trapped, dead-ended. Building imperfect solution candidates and improving them until they are acceptable is how humans have designed things since forever. There are no perfect answers to the questions that matter.
The prevalence of hallucination in LLMs is a design choice. It is a result of raising the ‘temperature’, which is just fancy speak for randomization, so the model doesn’t spit out the same text for the same question over and over. That makes it look like it has nuance and whatever.
If it were consistent they would be able to reduce incorrect results, but they want it to look like a human response.
Can you provide sources for “they want it to look like a human response”?
I have not read about that before.
The whole idea of LLMs is to replicate human language, as in, have the LLM’s output read like language spoken by humans.
Here’s something about improving it: Enhancing Human-Like Responses in Large Language Models
Here’s a big thing about temperature: https://www.ibm.com/think/topics/llm-temperature
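For anyone who wants to see what the temperature knob actually does, here is a minimal sketch. It assumes the usual textbook formulation (divide the model's raw scores by the temperature, then apply softmax and sample), not any particular vendor's implementation. The three logits and the function names are made up for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate next tokens (illustration only).
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)  # seeded so repeated runs behave the same
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    picks = [sample_token(logits, t, rng) for _ in range(10)]
    print(f"T={t}: probs={[round(p, 3) for p in probs]} picks={picks}")
```

At low temperature nearly all the probability lands on the top-scoring token, so the output becomes repeatable. Note that this only makes the model consistent: if the top-scoring token is wrong, it will be consistently wrong.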
It’s not just “looking like a human response”, it’s also functioning like a human response. The randomness of results enables iterative solutions that make forward progress instead of getting stuck.
There is a vast set of problems which don’t have a single perfect “correct” answer where all others are wrong. There are just collections of “answers” which - when taken as a set - work together to form a working solution. You may have 100 questions to answer, and how you answer the first 10 will affect what does or does not work for the next 10, and the next, down the line. You may find when you get to the last set of 10 that you can’t get to the end solution unless you revise some of the answers that you previously gave - answers that looked reasonable until you built the next 80% of the product…
Life isn’t school - there aren’t 10-question quizzes with four multiple-choice answers each, where you can get a perfect score just by answering each question correctly one at a time. Real-life school is being in charge of class assignments for 1000 students, choosing which 25 students go in each room with each teacher. What classes do they get, what combinations of students should be kept together, kept apart, grouped with which teachers… they aren’t impossible problems, but they are impossible to optimize for all possible considerations. Tradeoffs have to be made.
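A toy version of the class-assignment problem shows the point about tradeoffs and randomness. This is only a sketch under made-up assumptions: eight hypothetical students, two rooms of four, and a handful of invented "keep apart" and "keep together" requests, two of which contradict each other on purpose. The search is a plain random-swap hill climb with random restarts, not a claim about how any real scheduler or LLM works.

```python
import random

def cost(rooms, keep_apart, keep_together):
    """Count violated requests: apart-pairs sharing a room, together-pairs split up."""
    room_of = {s: i for i, room in enumerate(rooms) for s in room}
    violations = sum(1 for a, b in keep_apart if room_of[a] == room_of[b])
    violations += sum(1 for a, b in keep_together if room_of[a] != room_of[b])
    return violations

def random_assignment(students, room_size, rng):
    """Shuffle the students and cut the list into rooms of room_size."""
    shuffled = students[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + room_size] for i in range(0, len(shuffled), room_size)]

def improve(rooms, keep_apart, keep_together, rng, steps=200):
    """Try random swaps between rooms; keep a swap unless it makes things worse."""
    rooms = [room[:] for room in rooms]
    best = cost(rooms, keep_apart, keep_together)
    for _ in range(steps):
        r1, r2 = rng.sample(range(len(rooms)), 2)
        i, j = rng.randrange(len(rooms[r1])), rng.randrange(len(rooms[r2]))
        rooms[r1][i], rooms[r2][j] = rooms[r2][j], rooms[r1][i]
        new = cost(rooms, keep_apart, keep_together)
        if new <= best:
            best = new
        else:
            rooms[r1][i], rooms[r2][j] = rooms[r2][j], rooms[r1][i]  # undo the swap
    return rooms, best

def assign(students, room_size, keep_apart, keep_together, restarts=5, seed=0):
    """Run the swap search from several random starting points and keep the best."""
    rng = random.Random(seed)
    best_rooms, best_cost = None, None
    for _ in range(restarts):
        start = random_assignment(students, room_size, rng)
        rooms, c = improve(start, keep_apart, keep_together, rng)
        if best_cost is None or c < best_cost:
            best_rooms, best_cost = rooms, c
    return best_rooms, best_cost

# Hypothetical data. ("A", "B") appears in both lists on purpose, so no
# assignment can satisfy every request.
students = list("ABCDEFGH")
keep_apart = [("A", "B"), ("C", "D"), ("A", "C")]
keep_together = [("A", "D"), ("B", "C"), ("E", "F"), ("A", "B")]
rooms, violations = assign(students, 4, keep_apart, keep_together)
print(rooms, "violated requests:", violations)
```

Because two of the requests conflict, the best possible result still violates at least one of them. The random swaps and restarts are what let the search move past an early arrangement that looked fine but blocks a better one later.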
Tradeoffs like not understanding time and always giving an answer even when there isn’t one.
They’re getting a little better about that as time goes on, but yeah, last year the time blindness was a major handicap at times.
On questions like geometrically constrained requirements, they’re pretty good at telling you when a problem is overconstrained such that there is no answer, but… in the fuzzier world of underspecified questions, they’ll stretch pretty far to make up an answer. In the world of computer programming, sometimes that’s a brilliant move - they “make up” some code, compile it, test it, and it works - it’s actually a functional solution.
The other day I challenged Gemini to find a person that I had a vague description of. Gemini went out and made up a name, a job title, and a vague description of their publication history. When I pressed for actual evidence, its answers were evasive, and when I finally cornered it with a demand for anything concrete proving this person actually exists, it came clean with “I hallucinated that.”