diff --git a/backend/cpp/llama-cpp-localai-paged/README.md b/backend/cpp/llama-cpp-localai-paged/README.md
index 087c20a34..1d29cc1f3 100644
--- a/backend/cpp/llama-cpp-localai-paged/README.md
+++ b/backend/cpp/llama-cpp-localai-paged/README.md
@@ -30,11 +30,11 @@ vendored patch series over upstream llama.cpp that adds
   gated-DeltaNet (SSM) models, where the recurrent-state plumbing - not the FP4
   GEMM - dominates the decode step.
 
-It is **pinned to llama.cpp `9d5d882d`** (kept == the stock `llama-cpp` backend's
+It is **pinned to llama.cpp `0ed235ea2c17a19fc8238668653946721ed136fd`** (kept == the stock `llama-cpp` backend's
 pin) and advanced only by a manual, bit-exact-gated pin-sync process (see
 section 7, "Pin + maintenance policy"), decoupled from the nightly auto-bumper. The pin must stay aligned with the stock pin because
 `grpc-server.cpp` is shared; an earlier bump to `c299a92c` was bit-exact but broke
-the grpc-server link and was reverted.
+the grpc-server link and was reverted to the then-current stock pin.
 
 The build gate is `LLAMA_PAGED` (default on in this tree); the paged engine is
 enabled per-model at runtime via the gallery `options:` knobs (`paged_kv:true`,
@@ -497,7 +497,7 @@ targeted is already recovered by the gather-fusion + block-table cache.
   per commit) from that branch, which is the pin commit plus the paged patch
   commits in order, so there is no more hand-export drift between the dev tree and
   the shipped series.
-- **Pinned to llama.cpp `9d5d882d`** (kept == the stock `llama-cpp` pin). The pin
+- **Pinned to llama.cpp `0ed235ea2c17a19fc8238668653946721ed136fd`** (kept == the stock `llama-cpp` pin). The pin
   is advanced **only** by the manual pin-sync process (this section):
   rebase the source-only patch series onto the new tip, rebuild on GPU, pass the
   bit-exact gate on every path (dense + MoE, paged + non-paged) plus
@@ -507,7 +507,7 @@ targeted is already recovered by the gather-fusion + block-table cache.
   server-API refactor breaks the grpc-server LINK even when the patches are
   bit-exact. A bump to `c299a92c` (23 commits ahead of stock) was greedy-md5
   bit-exact but failed to link (undefined `stream_*` server helpers introduced by
-  the refactor), and was reverted to `9d5d882d`. The bit-exact gate alone does not
+  the refactor), and was reverted to the then-current stock pin. The bit-exact gate alone does not
   catch this; only the full CI grpc-server build does.
 - **Decoupled from the nightly auto-bumper.** There is deliberately **no**
   `bump_deps.yaml` entry for this backend - a naive `LLAMA_VERSION` bump could
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
index 0084fb4f0..23ab4ce18 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
@@ -87,7 +87,7 @@ Because the dir now permanently contains an `owner` file, **release with `rm -rf
 
 A separate 0-byte `~/bench/gpu.lock` is legacy/unrelated - ignore.
 
-**Always gate on BOTH** `nvidia-smi --query-compute-apps=pid` count == 0 **and** `owner` FREE before benching. Concurrent jobs share this GPU: an offline-repack Marlin workflow, an `~/.cache/autoresearch-quant/` quant pipeline (this is the `llama-imatrix` class of job), and finetune trees. The canonical harnesses poll for GPU-idle up to 2h.
+**Always gate on ALL THREE** before benching or building on DGX: `nvidia-smi --query-compute-apps=pid` count == 0, `owner` FREE, and `docker ps` shows no running containers. In particular, do not start work while a `local-ai-worker` container is running. Concurrent jobs share this GPU: an offline-repack Marlin workflow, an `~/.cache/autoresearch-quant/` quant pipeline (this is the `llama-imatrix` class of job), finetune trees, and LocalAI worker containers. The canonical harnesses poll for GPU-idle up to 2h.
 
 ### 3.2 Build (long; run detached + poll)
 - **Mainline / canonical grpc-server + binaries: CUDA arch `121`** (`-DCMAKE_CUDA_ARCHITECTURES=121`). Runtime banner shows `ARCHS = 1210 | BLACKWELL_NATIVE_FP4 = 1`.
@@ -268,16 +268,16 @@ Only pursue if (a)+(b) are not options and someone explicitly wants the residual
 - Graph-node-traced high-N profiles: `~/highN_prof2/*.nsys-rep` (paged npl=256), `~/highN_vllm/*.nsys-rep` (vLLM), 2026-06-30.
 - A/B dirs: `~/bench/marlin_gate/`, `~/bench/gdn_p1_ab/`.
 
-### Unpushed doc commits (in this worktree, not on origin)
+### Recent context commits
 - `6edbb56b0` "docs(paged): definitive vLLM-parity final-state record (GB10, CLOSED)" - adds `VLLM_PARITY_FINAL.md`.
 - `baf102524` "docs(paged): correct decode-serving record to ~86% GPU-steady parity (graph-node-traced)" - the ~56% -> ~86% correction.
 - `bd100dd20` "fix(paged): repair the patch series, sync to the fork branch" - dropped dev-tree 0044/0045, added f32-only M5 as 0047.
 - `b028c81ed` "docs(paged): record padded/fixed-slot decode shape as tested-and-rejected".
 
 ### Discrepancies to flag / resolve (carried verbatim from the gather, including UNVERIFIED labels)
-1. **Pin mismatch.** Makefile line 52 `LLAMA_VERSION?=0ed235ea2c17a19fc8238668653946721ed136fd` (authoritative, what builds; recent `ea72a56e2` / `2c5980526` pin-synced to it) vs README section 7 prose `9d5d882d` and `VLLM_PARITY_FINAL.md` "backend pin 9d5d882d" (STALE). Hard rule: the paged pin must equal the stock `llama-cpp` pin (shared `grpc-server.cpp`); a bump to `c299a92c` once broke the grpc-server link despite being bit-exact and was reverted. Trust the Makefile; fix the prose.
+1. **Pin prose reconciled in this worktree.** Makefile line 52 `LLAMA_VERSION?=0ed235ea2c17a19fc8238668653946721ed136fd` is authoritative and matches the local fork merge-base. Hard rule: the paged pin must equal the stock `llama-cpp` pin (shared `grpc-server.cpp`); a bump to `c299a92c` once broke the grpc-server link despite being bit-exact and was reverted. Trust the Makefile when building.
 2. **Both DGX checkouts are dirty** (`gated_delta_net.cu` modified in each), and the fork HEAD (`51168c5ee`, patch 0044) differs from the dev-tree HEAD (`a7d439e`, M8 bf16) that actually produced the `COMBINED_DEFINITIVE` numbers.
-3. **Worktree patch 0044 is committed on the fork but untracked here** (`patches/paged/0044-*.patch` shows `??`).
+3. **Worktree patch 0044 is now tracked here.** LocalAI commit `2033086f6` added `patches/paged/0044-feat-paged-fused-gated-RMSNorm-SiLU-gate-mul.patch`; the only current untracked path in this worktree is `.claude/`.
 4. **`sm_121a` is not in the worktree build files** - it lives only in the DGX experimental build scripts (`gdn_cc.sh`, `gdn_bv_build.sh`, `paged-build.sh`); mainline uses arch `121`. **UNVERIFIED** whether the shipped CI Dockerfile build path injects `121a` for the FP4-MMA kernels (`Dockerfile.llama-cpp-localai-paged` does not hardcode a CUDA arch).
 5. **The `0921716...` paged-MoE md5 open item.** `COMBINED_DEFINITIVE.txt` records `PAGED_GATE_MD5=0921716cd0582b5d15af8c362b811d00` for MoE, but a full doc/patch/`git log -S` grep of the worktree found **no** occurrence of `0921716...` in any committed source; the committed canonical paged-MoE gate is `8cb0ce23`. Treat this as **unreconciled**: the documented, KL-validated paged-MoE gate remains `8cb0ce23`, and any paged-MoE divergence (including `0921716`) must be KL-validated against the f16 reference before being accepted as benign, never on assertion alone. The `0921716` value is **UNVERIFIED** as a sanctioned gate; do not adopt it as canonical without re-running the KL gate. The **dense** run is symmetric: `COMBINED_DEFINITIVE.txt` records `PAGED_GATE_MD5=ecfe924dee6c5622c149f419ff2a6481` for dense, which likewise differs from the canonical dense gate `5951a5b4`. Both CDEF `PAGED_GATE_MD5` values come from the `combined_definitive.sh` harness's own gate command, NOT the canonical bit-exact gate command in section 3.3, which is why they diverge from the committed `8cb0ce23` / `5951a5b4`; neither is a sanctioned gate and both must be KL-validated before being treated as benign.
 
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_FINAL.md b/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_FINAL.md
index 1f1342348..28ee15268 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_FINAL.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_FINAL.md
@@ -33,7 +33,9 @@ Source key (every number below cites one of these):
 Two models: the MoE **Qwen3.6-35B-A3B-NVFP4** (decision model, 256 experts top-8,
 30 GDN + 10 full-attn layers + a dense shared expert per layer) and the dense
 **Qwen3.6-27B-NVFP4** (48 GDN + 16 full-attn). All numbers GB10 / CUDA 13 /
-sm_121, backend pin `9d5d882d`.
+sm_121. The current backend pin is `0ed235ea2c17a19fc8238668653946721ed136fd`;
+the CDEF benchmark artifact itself records the dev-tree commit that produced
+the benchmarked binaries.
 
 ### 1a. Prefill (S_PP, prefill tokens/s)