LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto aaf7b4112e test(llama-cpp): NVFP4-dense FP4 quality+speed eval on GB10

NVFP4-dense is producible via --tensor-type attn=nvfp4 --tensor-type ffn=nvfp4
(GGML_TYPE_NVFP4 has a full quantize path; no top-level ftype needed). Clean-from-BF16
4B PPL: NVFP4 14.31 vs Q4_K 13.66 vs MXFP4 17.42 vs BF16 13.32 - Q4_K-class, not
MXFP4-class. Prefill routes onto the FP4 MMA kernel (~1.29x Q4_K on 4B, within 5% of
MXFP4). It is the quality-preserving FP4 win MXFP4 was not.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-21 18:44:57 +00:00

paged

test(llama-cpp): NVFP4-dense FP4 quality+speed eval on GB10

2026-06-21 18:44:57 +00:00

patches

bench(dense): root-cause the W4A4 NVFP4 hang; W4A16 vs Q4 is the headline

2026-06-20 06:59:50 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413 )

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359 )