schizoidman@lemm.ee to Technology@lemmy.world · English · 8 days ago
DeepSeek's distilled new R1 AI model can run on a single GPU | TechCrunch (techcrunch.com)
22 comments · cross-posted to: technology@lemmy.zip
brucethemoose@lemmy.world · English · edited 7 days ago
Depends on the quantization. 7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 on a 3090 or 4090 (or even a 3060), if you know a little Docker.
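The Docker workflow the comment alludes to can be sketched roughly as follows, using vLLM's OpenAI-compatible server image. The model name, context length, and port are assumptions for illustration (the distilled model from the linked article), not details from the comment; adjust them to your checkpoint and VRAM budget.

```shell
# Sketch: serve an ~8B model with FP8 quantization on a single consumer GPU.
# Assumes the NVIDIA Container Toolkit is installed and the GPU supports
# the chosen quantization; flags are illustrative, not prescriptive.
docker run --gpus all --ipc=host -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
    --quantization fp8 \
    --max-model-len 8192
```

Once up, the server exposes an OpenAI-compatible API at `http://localhost:8000/v1`, so any OpenAI client library can query it by pointing its base URL there.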