mirror of https://github.com/mudler/LocalAI.git (synced 2026-06-24 16:49:06 -04:00)
First crash-resilient slab of the apples-to-apples NVFP4-vs-NVFP4 llama.cpp-vs-vLLM benchmark on GB10.

Covers MoE Qwen3.6-35B-A3B on paged llama.cpp (patch 0015): decode, prefill, TTFT, and VRAM at npl 8/32/64/128. The vLLM and dense tables will be appended as the sweeps land.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>