• BlackLaZoR@lemmy.world
    13 hours ago

    TBH, local models aren’t as good as cloud ones. Even with 16GB of VRAM, you aren’t getting anywhere close to a >100GB cloud LLM.

    • melfie@lemmy.zip
      8 hours ago

      No, it’s not quite as strong, and the initial prefill in particular can take a while. I also sometimes run into infinite thinking loops where I have to stop it and re-run my last prompt.

      It’s surprising how close Qwen 3.6 gets on the benchmarks to Claude models, though. Especially when running locally with 200k context, I’ve found it’s good enough to be a daily driver. Despite the faults, it’s better than paying Anthropic $200 a month so they can rate limit me and collect my data.

      • BlackLaZoR@lemmy.world
        8 hours ago

        I prefer to run a cheap pay-per-prompt cloud model. You can find really good open models that cost $0.50 per million tokens.
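
For anyone weighing the two prices quoted in this thread, here is a rough sketch of the arithmetic. The $200/month and $0.50 per million tokens figures come from the comments above. The function names are hypothetical, and the single blended token price is an assumption: providers often price input and output tokens separately.

```python
# Figures quoted in the thread; everything else here is illustrative.
SUBSCRIPTION_USD_PER_MONTH = 200.0    # flat monthly plan mentioned above
PRICE_USD_PER_MILLION_TOKENS = 0.50   # pay-per-token price mentioned above


def breakeven_tokens(subscription_usd: float, usd_per_million: float) -> float:
    """Monthly token count at which pay-per-token spend equals the flat fee."""
    return subscription_usd / usd_per_million * 1_000_000


def monthly_cost(tokens: float, usd_per_million: float) -> float:
    """Pay-per-token cost for a given monthly token count (blended price assumed)."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    tokens = breakeven_tokens(SUBSCRIPTION_USD_PER_MONTH, PRICE_USD_PER_MILLION_TOKENS)
    print(f"Break-even: {tokens:,.0f} tokens per month")
    # Hypothetical usage level, purely for comparison.
    print(f"10M tokens/month costs ${monthly_cost(10_000_000, PRICE_USD_PER_MILLION_TOKENS):.2f}")
```

Under those assumptions, the flat plan only comes out cheaper past roughly 400 million tokens a month. Hardware and electricity for a local setup are not modeled here.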