mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-28 18:37:43 -04:00
The DGX re-run showed that toggling LLAMA_KV_PAGED on/off on the patched binary does NOT reproduce stock: the dominant SSM decode fusions are compiled in, not runtime-gated, so the toggle measures only the paged-KV part (here ~neutral). True stock needs a separately built, unpatched binary at the same pin.

Correct the methodology skill's per-lever discipline and apples-to-apples rule accordingly.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>