Append the four-point synthesis to A2_CUDAGRAPH_DECODE.md:

- Measured CUDA-graph lever size: <1%, not the guessed 10-20%.
- Corrected 'eager' premise: the default paged decode path already
  captures CUDA graphs.
- Unchanged result: still 37-38% of vLLM at npl128.
- Honest verdict: A.2 closes none of the 2.6x gap, because paged
  attention touches only ~0.4% of decode on this hybrid-SSM model.

The residual lever is the qwen35 gated-DeltaNet SSM path (state D2D copy
+ get_rows gather), which is orthogonal to paged attention.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>