Commit Graph

4 Commits

Author SHA1 Message Date
Ettore Di Giacinto
de34cd5954 docs(paged): refresh parity handoff state
Reconcile the paged backend pin prose with the current Makefile pin, mark the 0044 patch tracking note as resolved, and add DGX Docker worker idleness to the benchmark preflight.

Assisted-by: Codex:gpt-5
2026-06-30 15:27:44 +00:00
Ettore Di Giacinto
1b9176c2c8 docs(paged): codify fork-first patch workflow as mandatory policy
The fork mudler/llama.cpp branch localai-paged is the canonical source of
truth for all paged-backend kernel/patch work. Always update it FIRST: commit
the change on the fork branch and push it, then regenerate the LocalAI patch
series (backend/cpp/llama-cpp-localai-paged/patches/paged/) from the fork via
git format-patch so the series is a 1:1 drift-free mirror of the branch. Never
edit the LocalAI patch files directly, and never add a patch with no
corresponding fork-branch commit. The series is a derivative; the fork is the
source. The fork branch is also where the build and the per-path bit-exact md5
gate actually run, so it is the only place a change is truly validated.

Codified in two places:
- .agents/llama-cpp-localai-paged-backend.md: new "Fork-first workflow
  (MANDATORY)" section at the top of the patch/pin-sync material, plus the
  "Encapsulating your work" bullet now points at it.
- backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md: strengthened the
  hard-gate (section 2.5) into "Fork-first is MANDATORY", and corrected a stale
  numbering example (fork 51168c5ee "patch 0044" maps to worktree 0044, not the
  f32-only M5 which is worktree 0047).

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 15:12:36 +00:00
Ettore Di Giacinto
8bb47e5a8a docs(paged): correct PARITY_HANDOFF ahead/behind + note dense CDEF gate md5
Ground-check follow-up to 2431090ff. Two factual corrections:

- Section 7 worktree line had the ahead/behind counts swapped ("25 ahead,
  197 behind"); the branch is actually ~199 ahead / 25 behind origin/master.
- Discrepancy item 5 flagged only the MoE CDEF PAGED_GATE_MD5 (0921716...);
  the dense run is symmetric (COMBINED_DEFINITIVE.txt records ecfe924d... for
  dense, which likewise differs from the canonical dense gate 5951a5b4). Both
  CDEF values come from combined_definitive.sh's own gate command, not the
  canonical bit-exact gate in section 3.3, so neither is sanctioned and both
  must be KL-validated.

Everything else in the handoff verified accurate: fork branch localai-paged
HEAD 51168c5ee (patch 0044) on dgx:~/llama-paged-fork, dev-tree HEAD a7d439e,
all md5/KL numbers, the 86%/1078/924 decode record, bench env, and all
referenced file/artifact paths.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 14:49:06 +00:00
Ettore Di Giacinto
2431090ff3 docs(paged): future-agent vLLM-parity HANDOFF guide (GB10, how-to companion to FINAL)
Adds docs/PARITY_HANDOFF.md: the operational how-to for an agent with zero
context picking up the GB10 vLLM-parity work. Complements VLLM_PARITY_FINAL.md
(the why/record) with TL;DR state, the hard gates (per-path bit-exact md5,
KL-gate, no LLAMA_MAX_BATCH_TOKENS, fork-is-canonical), a copy-pasteable
operational quickstart (ssh/lock/build/bench + the --cuda-graph-trace=node
decode-profiling rule that caused 4 wrong analyses), the complete tested-and-
rejected lever map, methodology lessons, the three forward directions, and a
key file/artifact index with the open discrepancies to reconcile.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 14:42:44 +00:00