Mirror of https://github.com/mudler/LocalAI.git (synced 2026-06-26 09:26:55 -04:00)
Ranked pick-up points after the 95%-bit-exact plateau:

1. Hybrid-precision SSM state (per-head f32/bf16 split). The bf16 error is concentrated in long-memory heads, so a split could capture most of the +25-31% while still passing the f32 KL gate.
2. Dense CUDA-graph instability.
3. The rms_norm->fp4 fold (flat-risk).
4. Datacenter Blackwell sm_100 (no LPDDR5x floor).
5. Adaptive prefill budget.
6. MoE-specific recurrence tuning.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
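
Illustrative note on the hybrid-precision SSM state idea: the sketch below is a toy model, not the kernel in this repository. It assumes a scalar per-head recurrence h = decay * h + x, emulates bf16 by rounding f32 bit patterns, and picks precision per head with a hypothetical decay threshold (`pick_precision`, `threshold=0.99`); none of these names or values come from the commit itself. It only shows why rounding error would tend to pile up in heads whose decay is close to 1.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate bf16 storage: keep the top 16 bits of the f32 pattern,
    rounding to nearest even. Intended for finite, moderate values only."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pick_precision(decays, threshold=0.99):
    """Hypothetical policy: long-memory heads (decay >= threshold) keep an
    f32 state, all other heads store their state in bf16."""
    return ["f32" if a >= threshold else "bf16" for a in decays]


def run_head(decay, inputs, rounder):
    """Toy scalar recurrence; the state is rounded after every step,
    mimicking a state buffer stored in the given precision."""
    h = 0.0
    for x in inputs:
        h = rounder(decay * h + x)
    return h


def rel_err(decay, inputs):
    """Relative error of a bf16-stored state against an f32-stored one."""
    ref = run_head(decay, inputs, to_f32)
    low = run_head(decay, inputs, to_bf16)
    return abs(low - ref) / abs(ref)


inputs = [1.0] * 2000
short_err = rel_err(0.5, inputs)    # short-memory head
long_err = rel_err(0.999, inputs)   # long-memory head
plan = pick_precision([0.5, 0.9, 0.995, 0.999])

print("short-memory head relative error:", short_err)
print("long-memory head relative error:", long_err)
print("per-head precision plan:", plan)
```

In this toy setup the bf16 state of the long-memory head stalls once the per-step increment falls below half the bf16 spacing at the current magnitude, while the short-memory head stays close to its f32 reference. A real per-head split would still need to be validated against the f32 KL gate rather than a decay threshold alone.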