Conducting deep web searches and gathering sources is one of the main things I’ve been using LLMs for. How far away are we from being able to self-host something like Claude’s web search capabilities? Or even just a service where I’d pay with my money instead of my data?

  • vapeloki@lemmy.world
    2 days ago

    It's there to avoid context switching on the GPU. Open WebUI, for example, uses it for memory and title generation.

    Those are background tasks that aren't performance-critical, so instead of slowing down Qwen, we just offload them to the NPU.

    Edit: see here for more details
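
A minimal sketch of the split described in the comment above: background tasks such as title generation and memory go to a small model on the NPU, and everything else stays on the GPU model. The backend names, URLs, model names, and task labels here are all hypothetical placeholders, not Open WebUI's actual configuration or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str  # hypothetical local endpoint, adjust to your setup
    model: str     # hypothetical model name


# Assumption: two separate local inference servers, one per device.
GPU_BACKEND = Backend("gpu", "http://localhost:8080/v1", "qwen-main")
NPU_BACKEND = Backend("npu", "http://localhost:8081/v1", "small-task-model")

# Assumption: these labels stand in for the background jobs mentioned above.
BACKGROUND_TASKS = {"title_generation", "memory"}


def pick_backend(task: str) -> Backend:
    """Send non-critical background tasks to the NPU, the rest to the GPU."""
    return NPU_BACKEND if task in BACKGROUND_TASKS else GPU_BACKEND


if __name__ == "__main__":
    for task in ("chat", "title_generation", "memory"):
        backend = pick_backend(task)
        print(f"{task} -> {backend.name} ({backend.model})")
```

The point of routing this way is that the main model never has to be interrupted for small housekeeping requests.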

    • Avid Amoeba@lemmy.ca
      2 days ago

      Oh, I see. Okay, this makes sense. I just throw Qwen 3.6 35B Q8 on 2 GPUs and use it for everything but the coding agent.