Files
LocalAI/backend/cpp/llama-cpp
Ettore Di Giacinto 14e3da25b6 kernel: dense MXFP4 test = free 1.44x (765->1153) but FP4-MMA untuned (~17% of ceiling)
MXFP4 dense moves prefill off int8-MMQ onto the FP4-MMA path (existing kernel) for
a free 1.44x - shippable as a Blackwell dense-quant recommendation. But it's ~17%
of the FP4 roofline, so the FP4-MMA kernel is itself untuned: ~4-6x still in the
kernel. Sharpens the target to TUNING the FP4-MMA (serves dense+MoE, only path to
beat vLLM). Marlin-style W4A16 BF16 is the alt to match on the BF16 ceiling.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 07:48:29 +00:00
..
2026-04-12 08:51:30 +02:00