mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 09:57:14 -04:00
Lever A patch + build/de-risk results. Splits the persisted gated-DeltaNet recurrent state per head: f32 on long-memory heads (where bf16 rounding does not contract and the KL error concentrates), bf16 on fast-decaying heads, classified at model load by tau_h = 1/(|ssm_a|*softplus(ssm_dt)). Default ssm_hybrid_tau_thresh = 0.0 keeps every head f32 (bit-exact opt-out). De-risk gates BOTH PASS: test-backend-ops GATED_DELTA_NET CUDA0 OK (incl 32 hybrid mixed CUDA-vs-CPU cases); default all-f32 greedy md5 == 0023 baseline both models (dense 5951a5b4d624ce891e22ab5fca9bc439, MoE 07db32c2bcb78d17a43ed18bc22705cd). Known open issue (opt-in hybrid only; default unaffected): hybrid-ON model decode (ids in-place path) is incoherent; classifier/cache/kernel-params verified correct, bug isolated to the ids in-place cross-step state path. See A_HYBRID_SSM_RESULTS.md. Not ready for the GateSweep until fixed. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code]