docs(paged): scope phase7 serving candidates

Mark the Phase 6 serving classifier complete, preserve the old parity final as historical, and scope Phase 7 source candidates with explicit md5 and op gates.

Assisted-by: Codex:gpt-5
Ettore Di Giacinto
2026-06-30 23:12:09 +00:00
parent b647460dee
commit 34c4b5ce8d
5 changed files with 190 additions and 2 deletions

@@ -560,3 +560,5 @@ Result:
- No current env-only lever clears the serving performance gate. Scope the next
source candidate against either structural MoE decode fusion or async serving
input/sampler uploads, with a workload that proves the target bucket matters.
- Phase 7 must keep the canonical MoE and dense md5 gates as the first
inference-safety check before any performance result is accepted.
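As a rough illustration of the gate ordering described above, the sketch below refuses to report a throughput number unless every canonical md5 gate matches first. The helper names (`md5_hex`, `passes_md5_gates`, `accept_perf_result`) and the gate labels are hypothetical and not taken from the repository; real reference digests would come from the canonical MoE and dense runs.

```python
import hashlib
from typing import Optional


def md5_hex(text: str) -> str:
    """Return the hex md5 digest of a generated output string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def passes_md5_gates(outputs: dict[str, str], canonical: dict[str, str]) -> bool:
    """True only if every canonical gate has an output with a matching digest.

    `canonical` maps a hypothetical gate name (e.g. "moe", "dense") to its
    reference md5; `outputs` maps the same names to the text just generated.
    """
    return all(
        name in outputs and md5_hex(outputs[name]) == digest
        for name, digest in canonical.items()
    )


def accept_perf_result(
    outputs: dict[str, str],
    canonical: dict[str, str],
    tokens_per_second: float,
) -> Optional[float]:
    """Return the measured throughput only when the md5 gates pass, else None."""
    if not passes_md5_gates(outputs, canonical):
        return None  # inference-safety check failed; discard the perf number
    return tokens_per_second
```

Returning `None` rather than the measured value keeps a failed gate from being mistaken for a slow but valid run.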

@@ -1,5 +1,11 @@
# PARITY_HANDOFF: how to pick up the GB10 vLLM-parity work

> 2026-06-30 update: this handoff is now historical procedure, not the active
> verdict. The GB10 investigation was reopened in `GB10_PARITY_REOPEN_SPEC.md`
> and `GB10_PARITY_PHASE0_RESULTS.md`, with Phase 6 serving-nsys evidence and
> the active follow-up plans under `docs/superpowers/plans/`. Use those files for
> the current state before relying on the older "closed" conclusion below.

Audience: an agent with **zero prior context** who has been told to "continue the GB10 vLLM-parity investigation" on the `llama-cpp-localai-paged` backend.

This file is the **operational how-to**. It is the companion to `VLLM_PARITY_FINAL.md`, which is the **why / authoritative record** ("never re-litigate"). If the two ever disagree on a *fact*, `VLLM_PARITY_FINAL.md` and the bench artifacts it cites win; this file wins on *procedure* (how to ssh, lock, build, bench, profile).

@@ -1,5 +1,10 @@
# vLLM Parity - Final State (Qwen3.6 NVFP4 on GB10)

> 2026-06-30 update: this document records the earlier final-state verdict. The
> investigation has since been reopened; see `GB10_PARITY_REOPEN_SPEC.md`,
> `GB10_PARITY_PHASE0_RESULTS.md`, and the active `docs/superpowers/plans/`
> Phase 6/Phase 7 files for the current measured state and follow-up scope.

> **Status: CLOSED.** This is the standing record of the exhaustive GB10 (DGX
> Spark, sm_121) parity investigation for `llama-cpp-localai-paged` against vLLM
> on the Qwen3.6 hybrid gated-DeltaNet NVFP4 models. It exists so the