From 337ebb8a37397cede34a329bd0b66b56170d0b7c Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Tue, 30 Jun 2026 20:35:43 +0000
Subject: [PATCH] docs(paged): record phase0 decode repro

Record comparable graph-node-traced paged and vLLM decode
difference-method artifacts for the GB10 parity reopen.

Assisted-by: Codex:gpt-5
---
 .../docs/GB10_PARITY_PHASE0_RESULTS.md     | 51 +++++++++++++++++++
 .../plans/2026-06-30-gb10-parity-reopen.md | 13 +++--
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
index 5af30bf88..637c37ef5 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
@@ -33,6 +33,57 @@ Dense paged prefill:
 | 512 | 4 | 32 | 16512 | 16.749 | 978.18 | 0.842 | 152.03 | 17.591 | 938.64 |
 | 2048 | 4 | 32 | 65664 | 63.791 | 1027.35 | 0.687 | 186.29 | 64.479 | 1018.38 |
 
+## Decode Difference-Method Reproduction
+
+Paged llama.cpp artifacts:
+
+- `~/bench/reopen_phase0/paged_decode_nsys/paged_moe_n256_ntg16.nsys-rep`
+- `~/bench/reopen_phase0/paged_decode_nsys/paged_moe_n256_ntg16.bench.log`
+- `~/bench/reopen_phase0/paged_decode_nsys/paged_moe_n256_ntg64.nsys-rep`
+- `~/bench/reopen_phase0/paged_decode_nsys/paged_moe_n256_ntg64.bench.log`
+
+Paged llama.cpp rows:
+
+| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
+|----|----|---|------|--------|----------|--------|----------|-----|-------|
+| 128 | 16 | 256 | 36864 | 14.933 | 2194.39 | 4.502 | 909.80 | 19.435 | 1896.81 |
+| 128 | 64 | 256 | 49152 | 14.949 | 2191.96 | 17.924 | 914.09 | 32.873 | 1495.21 |
+
+Paged difference-method decode:
+
+- Token delta: `256 * (64 - 16) = 12288`
+- Wall delta: `17.924 - 4.502 = 13.422 s`
+- Decode throughput: `915.51 t/s`
+
+vLLM artifacts:
+
+- `~/bench/reopen_phase0/vllm_decode_nsys/vllm_version.txt`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg16.nsys-rep`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg16.run.log`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg16.kern.csv`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg16.gpu_trace.csv`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg64.nsys-rep`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg64.run.log`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg64.kern.csv`
+- `~/bench/reopen_phase0/vllm_decode_nsys/dec_npl256_ntg64.gpu_trace.csv`
+
+vLLM version: `0.23.0`
+
+vLLM profiled rows:
+
+| NSEQ | GEN | Generated tokens | Wall s | Logged tok/s |
+|------|-----|------------------|--------|--------------|
+| 256 | 16 | 4096 | 6.195 | 661.2 |
+| 256 | 64 | 16384 | 17.607 | 930.5 |
+
+vLLM difference-method decode:
+
+- Token delta: `16384 - 4096 = 12288`
+- Wall delta: `17.607 - 6.195 = 11.412 s`
+- Decode throughput: `1076.76 t/s`
+
+Clean reproduced paged/vLLM decode ratio: `85.0%`.
+
 ## Clean Build
 
 First clean build attempt:
diff --git a/docs/superpowers/plans/2026-06-30-gb10-parity-reopen.md b/docs/superpowers/plans/2026-06-30-gb10-parity-reopen.md
index f6ce0278c..5dab00cf0 100644
--- a/docs/superpowers/plans/2026-06-30-gb10-parity-reopen.md
+++ b/docs/superpowers/plans/2026-06-30-gb10-parity-reopen.md
@@ -482,7 +482,12 @@ Commit succeeds.
 **Files:**
 - Modify: `backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
 
-- [ ] **Step 1: Dispatch a vLLM harness discovery subagent**
+- [x] **Step 1: Dispatch a vLLM harness discovery subagent**
+
+Result: read-only subagent found prior harnesses
+`/home/mudler/vllm_moe_nsys.sh` and `/home/mudler/vllm_moe_prof.py`, plus a
+concrete `~/highN_vllm_diff` `NSEQ`/`GEN` command sequence using
+`nsys profile --cuda-graph-trace=node`.
 
 Prompt:
 
@@ -496,7 +501,7 @@ Expected:
 Subagent returns a concrete vLLM command sequence or reports that no prior harness exists.
 ```
 
-- [ ] **Step 2: Run paged graph-node-traced decode difference-method**
+- [x] **Step 2: Run paged graph-node-traced decode difference-method**
 
 Run only after DGX preflight passes:
 
@@ -525,7 +530,7 @@ Expected:
 Two `.nsys-rep` files and two `.bench.log` files exist.
 ```
 
-- [ ] **Step 3: Run vLLM graph-node-traced decode difference-method**
+- [x] **Step 3: Run vLLM graph-node-traced decode difference-method**
 
 Use the exact command sequence from Step 1. Required properties:
 
@@ -543,7 +548,7 @@ Expected:
 Two vLLM graph-node-traced artifacts exist and can be reduced by the difference method.
 ```
 
-- [ ] **Step 4: Update Phase 0 results and commit**
+- [x] **Step 4: Update Phase 0 results and commit**
 
 Record paged and vLLM tokens/s using: