LLM with Web Search functionality

SubArcticTundra@lemmy.ml · 3 days ago


Avid Amoeba@lemmy.ca · 2 days ago

Curious why you swap between Qwen and E4B. On my hardware they run at similar tps: Qwen 3.6 35B spits out 80-100 tps on an AMD 9700, and E4B gives me about the same.

vapeloki@lemmy.world · 2 days ago

To avoid context switching on the GPU. Open WebUI, for example, uses E4B for memory and title generation.

Those are background tasks and not performance-critical, so instead of slowing down Qwen, we just offload them to the NPU.

Edit: see here for more details
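A minimal sketch of the split described above, assuming each model sits behind its own local endpoint. The backend names, URLs, task labels and the `pick_backend` helper are all hypothetical illustrations, not Open WebUI's actual configuration: latency-tolerant background tasks go to the small model on the NPU, and everything else stays on the main model on the GPU.

```python
from dataclasses import dataclass


# Hypothetical backends: names, devices and URLs are illustrative assumptions.
@dataclass(frozen=True)
class Backend:
    name: str
    device: str
    base_url: str


MAIN = Backend("qwen-3.6-35b", "gpu", "http://localhost:8080/v1")
AUX = Backend("e4b", "npu", "http://localhost:8081/v1")

# Background, non-performance-critical tasks (hypothetical labels) are routed
# to the small NPU model so they never compete with the main model for the GPU.
BACKGROUND_TASKS = frozenset({"title_generation", "memory"})


def pick_backend(task: str) -> Backend:
    """Return the backend that should serve a request of the given task type."""
    return AUX if task in BACKGROUND_TASKS else MAIN


if __name__ == "__main__":
    for task in ("chat", "title_generation", "memory"):
        backend = pick_backend(task)
        print(f"{task} -> {backend.name} on {backend.device}")
```

The point of the design is that the GPU only ever serves the main model, so a title or memory request never forces it to swap models or wait in line behind chat traffic.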

Avid Amoeba@lemmy.ca · 2 days ago

Oh, I see. Okay, this makes sense. I just throw Qwen 3.6 35B Q8 on 2 GPUs and use it for everything but the coding agent.