mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-03 04:46:54 -04:00
docs(paged): scope phase7 serving candidates
Mark the Phase 6 serving classifier complete, preserve the old parity final as historical, and scope Phase 7 source candidates with explicit md5 and op gates.

Assisted-by: Codex:gpt-5
@@ -560,3 +560,5 @@ Result:
- No current env-only lever clears the serving performance gate. Scope the next
source candidate against either structural MoE decode fusion or async serving
input/sampler uploads, with a workload that proves the target bucket matters.
- Phase 7 must keep the canonical MoE and dense md5 gates as the first
inference-safety check before any performance result is accepted.

@@ -1,5 +1,11 @@
# PARITY_HANDOFF: how to pick up the GB10 vLLM-parity work

> 2026-06-30 update: this handoff is now historical procedure, not the active
> verdict. The GB10 investigation was reopened in `GB10_PARITY_REOPEN_SPEC.md`
> and `GB10_PARITY_PHASE0_RESULTS.md`, with Phase 6 serving-nsys evidence and
> the active follow-up plans under `docs/superpowers/plans/`. Use those files for
> the current state before relying on the older "closed" conclusion below.

Audience: an agent with **zero prior context** who has been told to "continue the GB10 vLLM-parity investigation" on the `llama-cpp-localai-paged` backend.

This file is the **operational how-to**. It is the companion to `VLLM_PARITY_FINAL.md`, which is the **why / authoritative record** ("never re-litigate"). If the two ever disagree on a *fact*, `VLLM_PARITY_FINAL.md` and the bench artifacts it cites win; this file wins on *procedure* (how to ssh, lock, build, bench, profile).

@@ -1,5 +1,10 @@
# vLLM Parity - Final State (Qwen3.6 NVFP4 on GB10)

> 2026-06-30 update: this document records the earlier final-state verdict. The
> investigation has since been reopened; see `GB10_PARITY_REOPEN_SPEC.md`,
> `GB10_PARITY_PHASE0_RESULTS.md`, and the active `docs/superpowers/plans/`
> Phase 6/Phase 7 files for the current measured state and follow-up scope.

> **Status: CLOSED.** This is the standing record of the exhaustive GB10 (DGX
> Spark, sm_121) parity investigation for `llama-cpp-localai-paged` against vLLM
> on the Qwen3.6 hybrid gated-DeltaNet NVFP4 models. It exists so the