" …
What exactly is AMI building? The short answer is world models, a category of AI system that LeCun has been arguing for, and working on, for years. The longer answer requires understanding why he thinks the industry has taken a wrong turn.
Large language models learn by predicting which word comes next in a sequence. They have been trained on vast quantities of human-generated text, and the results have been remarkable: ChatGPT, Claude, and Gemini have demonstrated an ability to generate fluent, plausible language across an enormous range of subjects. But LeCun has spent years arguing, loudly and repeatedly, that this approach has fundamental limits.
His alternative is JEPA: the Joint Embedding Predictive Architecture, a framework he first proposed in 2022. Rather than predicting the future state of the world in pixel-perfect or word-by-word detail, the approach that makes generative AI both powerful and prone to hallucination, JEPA learns abstract representations of how the world works, ignoring unpredictable surface detail. The idea is to build systems that understand physical reality the way humans and animals do: not through language, but through embodied experience."



World models aren’t just for robotics (though they definitely WILL be used for that). They’re for reasoning under uncertainty in domains where you can’t see the outcome in advance. E.g.:
Medical diagnosis: you can’t physically “embody” whether a treatment will work. But a system that understands disease progression, drug interactions, and physiological constraints (not by pattern-matching text, but by learning causal structure) - well, that’s fundamentally different from an LLM hallucinating plausible-sounding symptoms.
Financial modeling, engineering simulations, climate prediction…all domains where the “embodied experience” is simulation, not physical interaction. You learn how the world actually works by understanding constraints and causality, not by predicting the next token in a Bloomberg article.
The point isn’t “robots will finally work.” The point is: understanding causality is cheaper in the long run and more reliable than memorizing correlations. Embodiment is just the training signal that forces you to learn causality instead of surface patterns.
My read is that LeCun’s betting that a system trained to predict abstract state transitions in any domain (be that medical, financial, physical) will generalize better and hallucinate less than one trained to predict text.
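To make that bet a bit more concrete, here’s a toy sketch (mine, not LeCun’s or AMI’s code) of why the prediction target matters. The "world" is a made-up 2-D state that rotates a little each step, and every observation also carries 8 dimensions of fresh random noise standing in for unpredictable surface detail. One predictor tries to forecast the whole next observation; the other forecasts only an embedding of it. The `encode` function is hand-picked and hypothetical. A real JEPA-style system would have to learn its encoder, and would need some way to stop it collapsing to a constant, which this sketch skips entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pairs(n, noise_dims=8):
    """Toy world: a 2-D state rotates by a fixed angle each step.
    Each observation appends fresh random noise (the 'surface detail')."""
    theta = 0.3
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    state_now = rng.normal(size=(n, 2))
    state_next = state_now @ rot.T
    obs_now = np.hstack([state_now, rng.normal(size=(n, noise_dims))])
    obs_next = np.hstack([state_next, rng.normal(size=(n, noise_dims))])
    return obs_now, obs_next

def encode(obs):
    # Hypothetical fixed encoder: keep the 2 predictable dims, drop the noise.
    # (Assumption for illustration; a real system learns this mapping.)
    return obs[:, :2]

def fit_linear(inputs, targets):
    # Least-squares linear predictor: targets ~= inputs @ weights
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

train_now, train_next = make_pairs(2000)
test_now, test_next = make_pairs(500)

# Generative-style objective: predict every dimension of the next observation.
w_obs = fit_linear(train_now, train_next)
obs_space_error = mse(test_now @ w_obs, test_next)

# JEPA-style objective: predict the embedding of the next observation
# from the embedding of the current one.
w_lat = fit_linear(encode(train_now), encode(train_next))
latent_space_error = mse(encode(test_now) @ w_lat, encode(test_next))

print(f"observation-space error: {obs_space_error:.3f}")
print(f"latent-space error:      {latent_space_error:.3f}")
```

The observation-space predictor is stuck with a large error because most of what it’s asked to predict is noise that can’t be predicted, while the latent-space one gets the dynamics essentially exactly. That proves nothing about whether the approach scales to medicine or finance; it only shows what "ignoring unpredictable surface detail" buys you when the right abstraction is available.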
Whether that’s true? Fucked if I know - that’s why it’s (literally) the billion-dollar question. If he cracks it…it’s big.
But “it won’t cook dinner” misses the point (and besides which, it might actually cook dinner and change lightbulbs, so…)