LocalAI

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 08:08:52 -04:00

Commit Graph

Author	SHA1	Message	Date

Ettore Di Giacinto

037ad82b7c

docs(paged): MXFP4-dense vs Q4_K quality gate on GB10 (do not recommend)

Fair clean-source perplexity check on DGX Spark (GB10): quantize Qwen3-4B
from one BF16 source to both Q4_K_M and MXFP4 (no imatrix, identical recipe).
Q4_K_M is +2.6% PPL vs BF16; MXFP4-dense is +30.8% (27.5% worse than Q4_K_M).
The existing 32B MXFP4 was confirmed to be double-quantized (Q4_K_M -> MXFP4
via --allow-requantize), but the clean 4B test shows the gap is intrinsic to
the format, not an artifact of the double quantization. Output stays coherent.
Verdict: the ~1.58x prefill / ~1.2x decode speedup does not justify
recommending MXFP4-dense on Blackwell; keep Q4_K_M as the dense default and
pursue NVFP4 instead.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-21 17:25:14 +00:00
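
The 27.5% figure in the message above follows from the two percentages measured against the same BF16 baseline: the increases compose as ratios, not as a difference. A minimal Python sketch of that arithmetic is below. The function names and the 5% gate threshold are hypothetical, chosen only for illustration; the commit does not state a numeric threshold.

```python
def relative_gap(candidate_pct: float, reference_pct: float) -> float:
    """Percent by which the candidate's PPL exceeds the reference's PPL,
    given each one's percent increase over the same baseline."""
    return ((1 + candidate_pct / 100) / (1 + reference_pct / 100) - 1) * 100


# Hypothetical threshold, NOT taken from the commit: maximum tolerated
# PPL increase over the BF16 baseline before a format is rejected.
MAX_REGRESSION_PCT = 5.0


def passes_gate(pct_over_baseline: float) -> bool:
    """True if the PPL regression stays within the assumed threshold."""
    return pct_over_baseline <= MAX_REGRESSION_PCT


# Percentages reported in the commit message.
q4_k_m_pct = 2.6
mxfp4_pct = 30.8

gap = relative_gap(mxfp4_pct, q4_k_m_pct)
print(f"MXFP4 vs Q4_K_M: +{gap:.1f}% PPL")
print(f"Q4_K_M passes gate: {passes_gate(q4_k_m_pct)}")
print(f"MXFP4 passes gate:  {passes_gate(mxfp4_pct)}")
```

Simply subtracting the percentages (30.8 - 2.6 = 28.2) would overstate the gap slightly; dividing the ratios gives the reported 27.5%.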

Ettore Di Giacinto

1887385b79

analysis: MXFP4-dense fails quality check (~27% worse PPL than Q4_K) - do not recommend

Clean, fair comparison (Qwen3-4B, all quantized from the same BF16 source,
wikitext PPL): BF16 13.32, Q4_K_M 13.66 (+2.6%, near-lossless), MXFP4 17.42
(+30.8%). MXFP4 is ~27% worse than Q4_K even when quantized cleanly from BF16
(32B double-quant cross-check: 7.39 vs 8.46, +14.6%, same direction).
MXFP4_MOE is built for MoE expert tensors; on dense attn/ffn tensors it is far
lossier than Q4_K's super-block structure with 6-bit scales. The ~1.58x
prefill speedup is not worth ~27% worse PPL: Q4_K stays the dense default;
use FP4 only where the model is trained for it (MoE). Verdict: do NOT ship a
Blackwell MXFP4-dense recommendation.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-21 17:24:24 +00:00
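
The percentages quoted in this message can be recomputed from the rounded perplexities it lists. The sketch below does so in Python; the helper name `pct_increase` is hypothetical, and reading the 32B pair as 7.39 for the lower-PPL quantization and 8.46 for the higher one is an assumption, since the message does not label the two values.

```python
def pct_increase(ppl: float, baseline: float) -> float:
    """Percent increase of `ppl` over `baseline` (lower perplexity is better)."""
    return (ppl / baseline - 1.0) * 100.0


# Wikitext perplexities for Qwen3-4B as reported in the commit message.
ppl = {"BF16": 13.32, "Q4_K_M": 13.66, "MXFP4": 17.42}

for name in ("Q4_K_M", "MXFP4"):
    delta = pct_increase(ppl[name], ppl["BF16"])
    print(f"{name:7s} vs BF16:   +{delta:.1f}%")

mxfp4_vs_q4k = pct_increase(ppl["MXFP4"], ppl["Q4_K_M"])
print(f"MXFP4   vs Q4_K_M: +{mxfp4_vs_q4k:.1f}%")

# 32B cross-check pair from the message (assumed order: lower PPL first).
cross_check = pct_increase(8.46, 7.39)
print(f"32B cross-check:   +{cross_check:.1f}%")
```

From the rounded inputs the cross-check comes out at about +14.5% rather than the quoted +14.6%; the small difference is consistent with the message having been computed from unrounded perplexities, though the source does not confirm that.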

2 Commits