mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-23 08:08:52 -04:00
Clean fair comparison (Qwen3-4B, all from same BF16 source, wikitext PPL): BF16 13.32, Q4_K_M 13.66 (+2.6%, near-lossless), MXFP4 17.42 (+30.8%). MXFP4 is ~27% worse than Q4_K even clean from BF16 (32B double-quant cross-check: 7.39 vs 8.46, +14.6%, same direction). MXFP4_MOE is built for MoE expert tensors; on dense attn/ffn it is far lossier than Q4_K's 6-bit superblock structure. The ~1.58x prefill is not worth ~27% PPL - Q4_K stays the dense default; FP4 only where the model is trained for it (MoE). Verdict: do NOT ship a Blackwell MXFP4-dense rec. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>