Files
LocalAI/backend/cpp/llama-cpp/patches/paged
Ettore Di Giacinto 621a20d2b5 feat(paged): backend-gate fused GDN/discriminated SSM_CONV emission (patch 0030)
Closes audit RISKY-1 (the one latent silent-miscompute hazard). The fused/in-place
Gated Delta Net op (0018/0019/0026) and the discriminated SSM_CONV decode op
(0021/0028, which REUSE GGML_OP_SSM_CONV / GGML_OP_GATED_DELTA_NET via a non-null
src[3]/src[4] discriminator) are CUDA+CPU-only but were emitted DEFAULT-ON
(cparams.fused_gdn_ar/ch=true, auto_fgdn=true) with no backend guard. A backend
that supports plain SSM_CONV but ignores the discriminator (Vulkan/SYCL/Metal)
would run the wrong plain conv => silent corruption.

Fix: in llama_context::sched_reserve(), before the auto_fgdn resolution, force
fused_gdn_ar = fused_gdn_ch = auto_fgdn = false when any non-CPU compute backend
is not CUDA-family (reg name not "CUDA"/"ROCm"/"MUSA"). Every emission site keys
off these flags, so the graph falls back to the upstream non-fused plain
ggml_ssm_conv + ggml_silu path that every backend handles. On CUDA the reg name is
"CUDA", the flags are left untouched, and the decode graph is byte-identical.

Mirror of DGX paged patch 0030; adds FUSED_OP_BACKEND_GATE_RESULTS.md.

Verified GPU-free: reconstructed pin 9d5d882d + paged 0001-0029 + 0030, CPU-only
build (GGML_CUDA=OFF) of libllama + test-backend-ops links with 0 errors; 0030
applies cleanly via git apply and patch -p1. test-backend-ops correctness for
SSM_CONV/SSM_CONV_UPDATE(_IDS)/GATED_DELTA_NET is CUDA0-vs-CPU (pending DGX,
tunnel offline this session); registered test cases will exercise it.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 07:32:49 +00:00
..