No, it’s not quite as strong, and especially the initial prefill can take a bit. I also sometimes run into infinite thinking loops where I have to stop it and re-run my last prompt.
It’s surprising how close Qwen 3.6 gets on the benchmarks to Claude models, though. Especially when running locally with 200k context, I’ve found it’s good enough to be a daily driver. Despite the faults, it’s better than paying Anthropic $200 a month so they can rate limit me and collect my data.
No, it’s not quite as strong, and especially the initial prefill can take a bit. I also sometimes run into infinite thinking loops where I have to stop it and re-run my last prompt.
It’s surprising how close Qwen 3.6 gets on the benchmarks to Claude models, though. Especially when running locally with 200k context, I’ve found it’s good enough to be a daily driver. Despite the faults, it’s better than paying Anthropic $200 a month so they can rate limit me and collect my data.
I prefer to run with cheap pay-per-prompt cloud model. You can find really good open models that cost $0.50 per million tokens.