Qwen3 8B, sorry, idiot spelling on my part. I use it to talk through problems when I have no internet or I've maxed out on Claude. I can rarely trust it with anything reasoning-related; it's faster and easier to do most things myself.
There are tons of places to get free access to bigger models. I'd suggest Jamba, Kimi, DeepSeek Chat, Google AI Studio, and the new GLM chat app: https://chat.z.ai/
And depending on your hardware, you can probably run better MoEs at the speed of an 8B. Qwen3 30B is so much smarter it's not even funny, and it's faster on CPU.
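A rough back-of-the-envelope sketch of why that holds: CPU decode is mostly memory-bandwidth-bound, so tokens/s scales with the weights *active* per token, not the total. The numbers below (Q4 at ~0.5 bytes/param, ~50 GB/s usable CPU bandwidth, ~3B active params for the 30B MoE) are illustrative assumptions, not measurements.

```python
def est_tokens_per_s(active_params_b, bytes_per_param=0.5, bandwidth_gb_s=50.0):
    """Upper-bound decode speed if every active weight is read once per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

dense_8b = est_tokens_per_s(8.0)     # dense model: all 8B params active per token
moe_30b_a3b = est_tokens_per_s(3.0)  # Qwen3 30B-A3B: ~3B params active per token

print(f"dense 8B:     ~{dense_8b:.1f} tok/s")
print(f"30B-A3B MoE:  ~{moe_30b_a3b:.1f} tok/s")
```

Despite ~4x the total parameters, the MoE touches fewer weights per token, so under these assumptions it decodes noticeably faster than the dense 8B (the trade-off being RAM: all 30B params still have to fit in memory).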
You can probably just use ollama and import the model.
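If you go that route, importing a local GGUF into ollama is just a Modelfile plus `ollama create`. The filename and model name below are placeholders; swap in whatever quant you actually downloaded.

```shell
# 1) Write a minimal Modelfile pointing at the local weights
#    (placeholder filename; use your actual GGUF).
cat > Modelfile <<'EOF'
FROM ./Qwen3-30B-A3B-Q4_K_M.gguf
EOF

# 2) Import it under a name of your choosing, then run it.
ollama create qwen3-30b-a3b -f Modelfile
ollama run qwen3-30b-a3b
```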
It’s going to be slow as molasses on ollama. It needs a better runtime, and GLM 4.5 probably isn’t supported at this moment anyway.
I’m running Qwen 3B and it is seldom useful
It’s too small.
IDK what your platform is, but have you tried Qwen3 A3B? Or SmallThinker 21B?
https://huggingface.co/PowerInfer/SmallThinker-21BA3B-Instruct
The speed should be somewhat similar.
Yeah, 7B models are just not quite there.