mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-23 16:19:07 -04:00
Fair clean-source perplexity check on DGX Spark (GB10): quantize Qwen3-4B from one BF16 source to both Q4_K_M and MXFP4 (no imatrix, identical recipe). Q4_K_M is +2.6% PPL vs BF16; MXFP4-dense is +30.8% (+27.5% worse than Q4_K). The existing 32B MXFP4 was confirmed double-quant (Q4_K_M -> MXFP4 via --allow-requantize), but the clean 4B test shows the gap is intrinsic to the format, not the double-quant. Output stays coherent. Verdict: the ~1.58x prefill / ~1.2x decode win does not justify a Blackwell MXFP4-dense quality recommendation; keep Q4_K_M the dense default, pursue NVFP4 instead. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>