Commit Graph

1 Commits

Author SHA1 Message Date
Ettore Di Giacinto
aaf7b4112e test(llama-cpp): NVFP4-dense FP4 quality+speed eval on GB10
NVFP4-dense is producible via --tensor-type attn=nvfp4 --tensor-type ffn=nvfp4
(GGML_TYPE_NVFP4 has a full quantize path; no top-level ftype needed). Clean-from-BF16
4B PPL: NVFP4 14.31 vs Q4_K 13.66 vs MXFP4 17.42 vs BF16 13.32 - Q4_K-class, not
MXFP4-class. Prefill routes onto the FP4 MMA kernel (~1.29x Q4_K on 4B, within 5% of
MXFP4). It is the quality-preserving FP4 win MXFP4 was not.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-21 18:44:57 +00:00