Synthesizes the bf16 SSM recurrent-state-cache plan into a build-agent brief:
ordered file-by-file edit list (kernel/op dtype-generic first, then cparams
default flip, gRPC/YAML, back-compat), the KL<1e-3 + PPL-delta + coherence +
long-context-drift acceptance gate that REPLACES the bit-exact md5 gate (bf16 is
intentionally non-bit-exact, equal precision to vLLM), bench targets (recurrence
3.98->2-3 ms/call, step 384->289-339 ms, 360-443 tok/s dense) + nsys check, the
default-bf16/f32-opt-out semantics + state-file back-compat, the risk register,
and the single biggest risk (silent corruption on the prefill/keep_rs_t/gather
paths) with the de-risk-first test-backend-ops step. Conv state stays f32 in v1.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>