diff --git a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
index ab2aad514..10ee7416d 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
@@ -2668,6 +2668,60 @@ Decision:
 - Treat this artifact as a harness failure investigation, not a benchmark.
 - Retry Phase47 only after the Phase48 readiness/cleanup hardening is present.
 
+## Phase 47 Dense Serving Snapshot Retry
+
+After Phase48 hardening, Phase47 was retried and completed successfully.
+
+Artifact:
+
+- `/home/mudler/bench/phase47_dense_serving_retry/20260701_100811`
+
+Run shape:
+
+- `MODEL=$HOME/bench/q36-27b-nvfp4.gguf`
+- `VLLM_MODEL=$HOME/bench/q36-27b-nvfp4-vllm`
+- `SERVED_MODEL_NAME=dense-q36`
+- `NPL="1 8 32 128"`, `PARALLEL=128`, `CTX=131072`, `PTOK=128`, `GEN=64`
+- `OPS=MUL_MAT,MUL_MAT_ID`, `VLLM_READY_ATTEMPTS=700`
+
+Pre/post gates:
+
+| phase | MoE md5 | dense md5 | `MUL_MAT` | `MUL_MAT_ID` |
+|-------|---------|-----------|-----------|--------------|
+| pre | `8cb0ce23777bf55f92f63d0292c756b0` | `5951a5b4d624ce891e22ab5fca9bc439` | `1146/1146` | `806/806` |
+| post | `8cb0ce23777bf55f92f63d0292c756b0` | `5951a5b4d624ce891e22ab5fca9bc439` | `1146/1146` | `806/806` |
+
+Results:
+
+| arm | n | agg t/s | decode agg t/s | decode per-seq t/s | prefill t/s | TTFT ms |
+|-----|---|---------|-----------------|---------------------|-------------|---------|
+| paged | 1 | `12.5` | `13.3` | `13.11` | `515.1` | `312.5` |
+| vLLM | 1 | `9.6` | `9.9` | `9.72` | `983.6` | `166.7` |
+| paged | 8 | `61.8` | `85.2` | `10.39` | `579.5` | `2201.4` |
+| vLLM | 8 | `67.6` | `73.7` | `9.04` | `2147.7` | `544.0` |
+| paged | 32 | `105.9` | `198.7` | `5.44` | `595.8` | `7442.7` |
+| vLLM | 32 | `171.7` | `219.9` | `6.49` | `2094.4` | `2041.9` |
+| paged | 128 | `139.6` | `360.8` | `1.86` | `608.1` | `21177.2` |
+| vLLM | 128 | `275.3` | `456.0` | `2.89` | `1889.6` | `6615.7` |
+
+Ratios:
+
+| n | paged decode / vLLM | paged per-seq / vLLM | paged agg / vLLM | paged TTFT / vLLM |
+|---|---------------------|----------------------|------------------|-------------------|
+| 1 | `1.3434` | `1.3488` | `1.3021` | `1.8746` |
+| 8 | `1.1560` | `1.1493` | `0.9142` | `4.0467` |
+| 32 | `0.9036` | `0.8382` | `0.6168` | `3.6450` |
+| 128 | `0.7912` | `0.6436` | `0.5071` | `3.2011` |
+
+Decision:
+
+- Dense decode beats vLLM at `n=1/8` (`1.3434x`, `1.1560x`) but trails at
+  `n=32/128` (`0.9036x`, `0.7912x`); this mirrors the broader conclusion that
+  low-N decode can be strong while prefill/TTFT and high-N serving remain gaps.
+- Dense paged TTFT is `1.87x` to `4.05x` vLLM's at every tested concurrency
+  point (lower is better), so dense serving does not change the GB10
+  conclusion or reopen closed shortcut work.
+
 ## Phase 48 Serving Harness Readiness Hardening
 
 Phase 48 fixes the harness behavior exposed by the failed dense snapshot
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
index 85d7e63dc..09ec2a4f6 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
@@ -598,6 +598,14 @@ with `curl --max-time 2`, and uses bounded server cleanup that escalates from
 `/home/mudler/bench/phase48_readiness_harness_dryrun/20260701_100533`, with
 `VLLM_READY_ATTEMPTS=700` printed and clean DGX preflight.
 
+Phase 47 retry completed after Phase48. Artifact:
+`/home/mudler/bench/phase47_dense_serving_retry/20260701_100811`. Pre/post
+gates were green: MoE `8cb0ce23777bf55f92f63d0292c756b0`, dense
+`5951a5b4d624ce891e22ab5fca9bc439`, `MUL_MAT` `1146/1146`, `MUL_MAT_ID`
+`806/806`. Dense paged decode beats vLLM at low concurrency (`1.3434x` at `n=1`,
+`1.1560x` at `n=8`) but falls behind at `n=32/128` (`0.9036x`, `0.7912x`), and
+paged TTFT remains `1.87x` to `4.05x` vLLM's (lower is better). This does not change the GB10 conclusion.
+
 ---
 
 ## 5. METHODOLOGY LESSONS (so you do not repeat the mistakes)
@@ -688,6 +696,7 @@ Only pursue if (a)+(b) are not options and someone explicitly wants the residual
 - `~/bench/phase47_dense_serving_dryrun/20260701_095141` - dense serving dry-run with `SERVED_MODEL_NAME=dense-q36`.
 - `~/bench/phase47_dense_serving/20260701_095151` - incomplete dense serving attempt; pre-gates and paged arm completed, vLLM did not produce result JSONs under the old readiness budget.
 - `~/bench/phase48_readiness_harness_dryrun/20260701_100533` - harness dry-run proving configurable readiness budgets and clean preflight before retrying dense serving.
+- `~/bench/phase47_dense_serving_retry/20260701_100811` - completed dense serving snapshot after Phase48; pre/post md5 and op gates green; paged decode ahead of vLLM at low N, aggregate throughput behind at high N, TTFT behind at every N.
 - Per-engine logs `~/bench/COMBINED_{paged,vllm}_{MOE,DENSE}_server.log`; `~/bench/BENCHMARK_PROGRESS.md`.
 - Graph-node-traced high-N profiles: `~/highN_prof2/*.nsys-rep` (paged npl=256), `~/highN_vllm/*.nsys-rep` (vLLM), 2026-06-30.
 - A/B dirs: `~/bench/marlin_gate/`, `~/bench/gdn_p1_ab/`.
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_LEVER_MAP.md b/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_LEVER_MAP.md
index 52c071f63..7b95ee648 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_LEVER_MAP.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/VLLM_PARITY_LEVER_MAP.md
@@ -1165,6 +1165,27 @@ DGX dry-run artifact:
 run printed `VLLM_READY_ATTEMPTS=700` with clean preflight. Retry dense serving
 snapshots with this hardening before interpreting dense paged-vs-vLLM ratios.
 
+### Phase 47 dense serving snapshot retry
+
+After Phase48, the dense snapshot completed at
+`/home/mudler/bench/phase47_dense_serving_retry/20260701_100811` with pre/post
+gates green: MoE `8cb0ce23777bf55f92f63d0292c756b0`, dense
+`5951a5b4d624ce891e22ab5fca9bc439`, `MUL_MAT` `1146/1146`, and `MUL_MAT_ID`
+`806/806`.
+
+Dense paged-vs-vLLM ratios:
+
+| n | paged decode / vLLM | paged per-seq / vLLM | paged agg / vLLM | paged TTFT / vLLM |
+|---|---------------------|----------------------|------------------|-------------------|
+| 1 | `1.3434` | `1.3488` | `1.3021` | `1.8746` |
+| 8 | `1.1560` | `1.1493` | `0.9142` | `4.0467` |
+| 32 | `0.9036` | `0.8382` | `0.6168` | `3.6450` |
+| 128 | `0.7912` | `0.6436` | `0.5071` | `3.2011` |
+
+Decision: dense low-N decode remains a real paged strength, but dense serving
+still does not close GB10 parity: TTFT is `1.87x` to `4.05x` vLLM's, and
+aggregate throughput falls to `0.6168x` at `n=32` and `0.5071x` at `n=128`.
+
 Relevant files (all absolute): `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/{DECODE_SERVING_SCOPE.md,PREFILL_GEMM_SCOPE.md,PREFILL_GEMM_RESULTS.md,TENSORCORE_GDN_SCOPE.md,final_benchmark.csv}`, `.../README.md`, `.../patches/paged/0034-feat-paged-native-NVFP4-W4A4-FP4-MMA-large-M-prefill.patch` (P1/P2), `.../patches/paged/0042-feat-paged-fused-residual-add-RMS-norm-weight-multip.patch` (P7), `.../patches/paged/0031` (P4), `0025` (D1), `0018/0022` (D4/D5), `0009/0010` (D3/D6/D7); graph source `/home/mudler/_git/LocalAI/backend/cpp/llama-cpp-paged-dev/src/{models/qwen35moe.cpp,models/delta-net-base.cpp,llama-graph.cpp}`.
 
 ### Phase 10 GDN C32 slab update
diff --git a/docs/superpowers/plans/2026-07-01-dense-serving-snapshot-phase47.md b/docs/superpowers/plans/2026-07-01-dense-serving-snapshot-phase47.md
index b1984a6c1..38175c7d4 100644
--- a/docs/superpowers/plans/2026-07-01-dense-serving-snapshot-phase47.md
+++ b/docs/superpowers/plans/2026-07-01-dense-serving-snapshot-phase47.md
@@ -28,7 +28,7 @@ Expected: exit `0`, docker/local-ai-worker/GPU compute all zero, dense model pat
 **Files:**
 - Test: `backend/cpp/llama-cpp-localai-paged/paged-current-serving-snapshot.sh`
 
-- [ ] **Step 1: Run full dense snapshot after Phase48 hardening**
+- [x] **Step 1: Run full dense snapshot after Phase48 hardening**
 
 ```bash
 ssh dgx.casa 'set -euo pipefail; ART=$HOME/bench/phase47_dense_serving/$(date +%Y%m%d_%H%M%S); SRC=$HOME/llama-phase6-source BUILD_DIR=$HOME/llama-phase6-source/build-phase36 BIN=$HOME/llama-phase6-source/build-phase36/bin MODEL=$HOME/bench/q36-27b-nvfp4.gguf VLLM_MODEL=$HOME/bench/q36-27b-nvfp4-vllm SERVED_MODEL_NAME=dense-q36 ART=$ART NPL="1 8 32 128" PARALLEL=128 CTX=131072 PTOK=128 GEN=64 OPS=MUL_MAT,MUL_MAT_ID bash -s' < backend/cpp/llama-cpp-localai-paged/paged-current-serving-snapshot.sh
@@ -41,6 +41,10 @@ First attempt status: incomplete at
 paged arm completed, but vLLM startup exceeded the old fixed readiness budget
 and produced no vLLM result JSONs. Retry only after Phase48 readiness hardening.
 
+Retry status: completed at
+`/home/mudler/bench/phase47_dense_serving_retry/20260701_100811` after Phase48
+with `VLLM_READY_ATTEMPTS=700`.
+
 ### Task 3: Record dense snapshot result
 
 **Files:**
@@ -49,11 +53,11 @@ and produced no vLLM result JSONs. Retry only after Phase48 readiness hardening.
 - Modify: `backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
 - Modify: `docs/superpowers/plans/2026-07-01-dense-serving-snapshot-phase47.md`
 
-- [ ] **Step 1: Summarize artifact outputs**
+- [x] **Step 1: Summarize artifact outputs**
 
 Record the dry-run artifact, full snapshot artifact, pre/post md5/op gate status, and the ratio rows from `summary.tsv`.
 
-- [ ] **Step 2: Mark completed plan items**
+- [x] **Step 2: Mark completed plan items**
 
 Mark this plan's checkboxes complete only after the corresponding command or docs update has happened.
 
@@ -62,7 +66,7 @@ Mark this plan's checkboxes complete only after the corresponding command or doc
 **Files:**
 - Commit Phase47 docs and plan changes.
 
-- [ ] **Step 1: Run final checks**
+- [x] **Step 1: Run final checks**
 
 ```bash
 git diff --check
@@ -71,7 +75,7 @@ git status --short
 
 Expected: no whitespace errors; only intended docs/plan changes plus the pre-existing untracked `.claude/`.
 
-- [ ] **Step 2: Commit**
+- [x] **Step 2: Commit**
 
 ```bash
 git add backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md \