mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-26 01:16:58 -04:00
Synthesizes the bf16 SSM recurrent-state-cache plan into a build-agent brief: ordered file-by-file edit list (kernel/op dtype-generic first, then cparams default flip, gRPC/YAML, back-compat), the KL<1e-3 + PPL-delta + coherence + long-context-drift acceptance gate that REPLACES the bit-exact md5 gate (bf16 is intentionally non-bit-exact, equal precision to vLLM), bench targets (recurrence 3.98->2-3 ms/call, step 384->289-339 ms, 360-443 tok/s dense) + nsys check, the default-bf16/f32-opt-out semantics + state-file back-compat, the risk register, and the single biggest risk (silent corruption on the prefill/keep_rs_t/gather paths) with the de-risk-first test-backend-ops step. Conv state stays f32 in v1. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>