mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-26 09:26:55 -04:00
De-risk passed (test-backend-ops 52/52 bf16, f32 default byte-identical to 0023), and the throughput lever is real (recurrence -49%/call, dense ~490 t/s = 125% of vLLM clean). But bf16-vs-f32 KLD is 0.06-0.17 at >=1024 ctx (threshold 1e-3) with ~90% top-token agreement: intrinsic bf16 error over gated-DeltaNet long-memory heads, not a bug. That is exactly vLLM's own bf16 GDN precision. Shelved; ship the 95% bit-exact f32 plateau (0018-0023). bf16 work backed up on DGX (BF16_SSM_STATE.diff). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>