LocalAI

mirror/LocalAI

Fork 0

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 08:08:52 -04:00

Commit Graph

Author	SHA1	Message	Date
Ettore Di Giacinto	aaf7b4112e	test(llama-cpp): NVFP4-dense FP4 quality+speed eval on GB10 NVFP4-dense is producible via --tensor-type attn=nvfp4 --tensor-type ffn=nvfp4 (GGML_TYPE_NVFP4 has a full quantize path; no top-level ftype needed). Clean-from-BF16 4B PPL: NVFP4 14.31 vs Q4_K 13.66 vs MXFP4 17.42 vs BF16 13.32 - Q4_K-class, not MXFP4-class. Prefill routes onto the FP4 MMA kernel (~1.29x Q4_K on 4B, within 5% of MXFP4). It is the quality-preserving FP4 win MXFP4 was not. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-21 18:44:57 +00:00

Author

SHA1

Message

Date

Ettore Di Giacinto

aaf7b4112e

test(llama-cpp): NVFP4-dense FP4 quality+speed eval on GB10

NVFP4-dense is producible via --tensor-type attn=nvfp4 --tensor-type ffn=nvfp4
(GGML_TYPE_NVFP4 has a full quantize path; no top-level ftype needed). Clean-from-BF16
4B PPL: NVFP4 14.31 vs Q4_K 13.66 vs MXFP4 17.42 vs BF16 13.32 - Q4_K-class, not
MXFP4-class. Prefill routes onto the FP4 MMA kernel (~1.29x Q4_K on 4B, within 5% of
MXFP4). It is the quality-preserving FP4 win MXFP4 was not.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-21 18:44:57 +00:00

1 Commits