mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-23 16:19:07 -04:00
Benchmark confirms dense prefill 7.6-32x behind too, so the kernel track needs a non-grouped FP4 dense GEMM (simpler, land first) + the MoE grouped GEMM. Both share the e2m1 block-scaled collective; dense is grouped-with-one-group. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>