LocalAI

Mirror of https://github.com/mudler/LocalAI.git, synced 2026-06-24 16:49:06 -04:00

Files

Ettore Di Giacinto ee78ae4a11 docs(paged): Qwen3.6 NVFP4 h2h bench doc - MoE llama.cpp table

First crash-resilient slab of the apples-to-apples NVFP4-vs-NVFP4
llama.cpp-vs-vLLM benchmark on GB10. MoE Qwen3.6-35B-A3B paged
llama.cpp (patch 0015) decode/prefill/TTFT/VRAM at npl 8/32/64/128.
vLLM and dense tables append as the sweeps land.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-23 19:43:55 +00:00

paged

feat(paged): target-readiness for 2xH200 - correctness PASS, load-gen harness, projection

2026-06-21 23:16:28 +00:00

patches

docs(paged): Qwen3.6 NVFP4 h2h bench doc - MoE llama.cpp table

2026-06-23 19:43:55 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat(llama-cpp): per-model max_prefill_tokens option (chunked-prefill QoS budget)