docs(paged): record GB10 parity artifact gaps

Add the read-only DGX artifact review for the Phase 0 parity reopen, including supported paged measurements and missing vLLM difference-method evidence.

Assisted-by: Codex:gpt-5
2026-07-03 04:46:54 -04:00 · 2026-06-30 15:55:16 +00:00
parent b3cfdfac4a
commit b1a1b721bd
1 changed file with 40 additions and 0 deletions
--- a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
@@ -29,6 +29,46 @@ No baseline runs have been started yet.
  HEAD.
 - Tree hash after patch application: `a73d759350277532a14e853e1fe78f08bbb74ce8`

+## Existing Artifact Gap Review
+
+Read-only DGX artifact inspection was performed after confirming the machine was
+idle: `docker ps` returned no running containers,
+`nvidia-smi --query-compute-apps` returned no compute-app rows, and
+`~/gpu_bench_lock/owner` read
+`FREE released-by-claude-fp4norm-profile 1782828229`.
+
+Existing paged llama.cpp decode and prefill numbers are supported by
+`/home/mudler/bench/COMBINED_DEFINITIVE.txt`: MoE paged prefill lines 13-18,
+MoE paged serving decode lines 23-26, dense paged prefill lines 43-48, and
+dense paged serving decode lines 53-56. Supporting comparison artifacts are
+`/home/mudler/bench/STOCK3WAY.txt`, `/home/mudler/bench/PREFILL_KNOB.txt`,
+`/home/mudler/bench/DEFINITIVE_S3ab.txt`, and the adjacent raw logs.
+
+No self-contained vLLM `1078 t/s` GPU-steady `ntg16`/`ntg64`
+difference-method artifact was found. The available vLLM evidence is
+serving-run output in `/home/mudler/bench/COMBINED_DEFINITIVE.txt` plus
+nsys/run artifacts under `/home/mudler/bench/profgap/` and
+`/home/mudler/bench/postssm_decomp/`; these do not form a packaged
+`ntg16`/`ntg64` difference-method report.
+
+W4A16/Marlin evidence exists in `/home/mudler/bench/vllm_prefix.log`,
+`/home/mudler/bench/profgap/vllm_moe_decode.run.log`, and
+`/home/mudler/bench/marlin_gate/kl_marlin.log`.
+`/home/mudler/llama-paged-dev/LEVER3_ACTQUANT_FUSION_RESULTS.md` records the
+parity conclusion: W4A16/Marlin is a precision-change lever, not a bit-exact
+llama.cpp parity lever.
+
+GDN M5/M8 evidence exists in `/home/mudler/bench/COMBINED_DEFINITIVE.txt`
+(`GDN CONFIG C (M8)` and production defaults noting GDN M5),
+`/home/mudler/llama-paged-dev/LEVER1_GATHER_RESULTS.md`, and
+`/home/mudler/llama-paged-dev/CONV_STATE_FUSION_RESULTS.md`.
+
+S3 evidence exists in `/home/mudler/bench/DEFINITIVE_S3ab.txt`; that A/B shows
+S3-on was worse unless paired with `LLAMA_PAGED_PREFILL_PERIOD=1`, matching
+`/home/mudler/bench/COMBINED_DEFINITIVE.txt` where S3 is recorded as off by
+default. No separate self-contained adaptive-scheduling proof artifact was
+found beyond the S3 and prefill-knob artifacts.
+
 ## Open Items

 - Reproduce paged prefill and decode baselines.