The models you can run on consumer hardware are still nowhere near the stuff that runs in corporate data centers. To stick with your metaphor, it's like running a little steam engine at home while the big guys get to operate nuclear reactors…
You can get pretty far with a stack of 5090s and llama.cpp with split mode graph (or so I’ve heard, I’ve never tried), or AMD’s unified memory CPU thing.
It’s not as good as data centre grade stuff, but it’s not nothing either.
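For anyone curious what that kind of multi-GPU llama.cpp setup looks like in practice, here is a rough sketch. The model path, GPU count, and split ratios are all made up for illustration; check the llama.cpp docs for the exact flags your build supports (I believe the documented split modes are none/layer/row):

```shell
# Hypothetical invocation: spread one GGUF model across four GPUs.
# --split-mode layer assigns whole layers to each GPU;
# --tensor-split sets the fraction of the model each GPU takes.
llama-server \
  -m ./models/model.gguf \
  --n-gpu-layers 99 \
  --split-mode layer \
  --tensor-split 1,1,1,1
```

With an even tensor split, four 32 GB cards give you roughly 128 GB of usable VRAM, which is what puts the larger open-weight models within reach at home.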
Same, pretty much. It is possible though, which makes LLMs a more democratic technology than, say, nuclear reactors.