I found the article in a post on the fediverse, and I can’t find it anymore.
The researchers asked an LLM a simple mathematical question (like 7+4) and could then see how it worked internally: it got to the answer by finding similar paths, not by anything like mathematical reasoning, even though the final answer was correct.
Then they asked the LLM to explain how it found the result and what its internal reasoning was. The answer was detailed, step-by-step mathematical logic, like a human explaining how to perform an addition.
This showed 2 things:
- LLMs don’t “know” how they work
- the second answer was a rephrasing of original text used for training that explains how math works, so the LLM just used that as an explanation
I think it was a very interesting and meaningful analysis.
Can anyone help me find this?
EDIT: thanks to @theunknownmuncher @lemmy.world, it’s this one: https://www.anthropic.com/research/tracing-thoughts-language-model
EDIT2: I’m aware LLMs don’t “know” anything and don’t reason, and that’s exactly why I wanted to find the article. Some more details here: https://feddit.it/post/18191686/13815095
Why aren’t they tokens when you use them? Does your brain not also choose the most apt selection for the sequence to make maximal meaning in the context prompted? I assert that, after a sufficiently complex obfuscation of the underlying mathematical calculations, the concept of reasoning becomes an exercise in pedantic dissection of the mutual interpretation of meaning. Our own minds are objectively deterministic, but the obfuscation provided by the lack of direct observation provides the quantum cover fire needed to claim we are not just LLM-equivalent representations on biological circuit boards.