docs(paged): record patch mirror readiness phase

Assisted-by: Codex:gpt-5
Ettore Di Giacinto
2026-07-01 13:11:57 +00:00
parent 2b2b1f0b25
commit e573194799
4 changed files with 341 additions and 0 deletions

@@ -3767,3 +3767,61 @@ Decision:
- Next default-on consideration requires regenerating the LocalAI patch series
from the fork and rerunning the broader current serving snapshot gates. Do not
default it from Phase68 alone.
## Patch Series Mirror Readiness Phase69 Result
Phase69 is recorded in
`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
It did not change llama.cpp source and did not edit generated LocalAI patch
files. It verified that the current LocalAI series is still drift-free at the
Phase37 tip, then dry-ran the additive patches needed to mirror the current
local fork HEAD.

Current committed series:

| check | value |
|-------|-------|
| base | `0ed235ea2c17a19fc8238668653946721ed136fd` |
| patch count | `54` |
| applied tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
| Phase37 fork-tip tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
| current fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
| committed series matches Phase37 tip | `yes` |
| committed series matches current fork HEAD | `no` |

Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
candidate patches:

| projected patch | source commit |
|-----------------|---------------|
| `0064-feat-server-trace-serving-admission-batches.patch` | `c6cb8460e` |
| `0065-feat-server-add-admission-trace-histograms.patch` | `bd7b2e952` |
| `0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch` | `8a97629a4` |
| `0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch` | `3b6ab5fa8` |
| `0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch` | `8759213e3` |
| `0069-test-cuda-cover-W4A16-direct-activation-policy.patch` | `41be3da5b` |
| `0070-feat-cuda-route-W4A16-direct-activation-stub.patch` | `7967ad47f` |
| `0071-feat-cuda-trace-layout-tensor-names.patch` | `fa944bb5f` |
| `0072-feat-cuda-trace-activation-quant-routes.patch` | `afc2c7030` |
| `0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch` | `ea0875d14` |

Projected mirror check:

| check | value |
|-------|-------|
| current patches | `54` |
| missing patches | `10` |
| projected patches | `64` |
| applied plus missing tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
| fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
| projected series matches fork HEAD | `yes` |

Decision:
- The Phase68 BF16 F32 opt-in would become projected patch `0073` and has a
conflict-free path into the LocalAI series.
- Do not commit generated patches yet. The local fork branch is `26` commits
ahead of `fork/localai-paged`, and the repo workflow requires pushing the fork
before regenerating the LocalAI patch series. The push still requires explicit
approval.
- After push approval, regenerate `0064..0073`, repeat the tree hash check, and
only then run broader serving gates for any default-on BF16 policy decision.
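
The count arithmetic above (54 current plus 10 missing gives 64 projected) and the `0064..0073` numbering can be checked mechanically once the tail is regenerated. The following is a minimal sketch, assuming patch files follow the `NNNN-*.patch` naming shown in the table; `check_patch_tail` is a hypothetical helper, not part of the LocalAI tooling. Only the exported tail is assumed to be contiguous, since the committed series has 54 files with a tail of `0063`.

```bash
# Hypothetical helper: verify that the patches in DIR are numbered
# consecutively from START and that there are exactly EXPECTED of them.
# Usage: check_patch_tail DIR START EXPECTED
check_patch_tail() {
    local dir=$1 next=$2 expected=$3 count=0 f raw num
    for f in "$dir"/[0-9][0-9][0-9][0-9]-*.patch; do
        [ -e "$f" ] || break              # the glob matched nothing
        raw=${f##*/}                      # drop the directory part
        raw=${raw%%-*}                    # keep the NNNN prefix
        num=${raw#"${raw%%[!0]*}"}        # strip leading zeros (avoids octal parsing)
        num=${num:-0}
        if [ "$num" -ne "$next" ]; then
            echo "numbering_gap_at=$raw"
            return 1
        fi
        next=$((next + 1))
        count=$((count + 1))
    done
    echo "missing_patch_count=$count"
    [ "$count" -eq "$expected" ]
}
```

Under these assumptions, `check_patch_tail /tmp/phase69_missing_patches 64 10` would succeed only when the directory holds exactly ten patches numbered `0064` through `0073`.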

@@ -1029,3 +1029,53 @@ Decision: carry the shortcut as a default-off opt-in candidate. It is no longer
just a prefill-only win, but Phase68 is not enough to default it on. Any future
default-on proposal must mirror the fork commit into the LocalAI patch series
and rerun a broader current serving snapshot with pre/post md5 and op gates.
## 14. PHASE69 RESULT: PATCH SERIES MIRROR READINESS
Phase69 checked the patch-series state without pushing and without editing
generated patch files. Plan:
`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
Current committed LocalAI patches still match the Phase37 fork tip:
```text
base=0ed235ea2c17a19fc8238668653946721ed136fd
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_patch_tip=yes
match_fork_head=no
patch_count=54
```
Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
patches, projected as `0064..0073`. Applying the current `0001..0063` series
(54 patch files) plus the temporary `0064..0073` onto the pin reconstructed the
current fork HEAD tree exactly:
```text
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_fork_head=yes
current_patch_count=54
missing_patch_count=10
projected_patch_count=64
```
Projected patch tail:
- `0064` serving admission trace (`c6cb8460e`)
- `0065` admission histograms (`bd7b2e952`)
- `0066..0068` TTFT prefill-first scheduler knobs (`8a97629a4`,
`3b6ab5fa8`, `8759213e3`)
- `0069..0070` W4A16 direct-activation policy/stub (`41be3da5b`,
`7967ad47f`)
- `0071` layout trace (`fa944bb5f`)
- `0072` quant trace (`afc2c7030`)
- `0073` BF16 cuBLAS F32 output (`ea0875d14`)
Decision: mirror regeneration is technically ready but not executed. The local
fork is `26` commits ahead of `fork/localai-paged`, and the fork-first policy
requires pushing before regenerating the LocalAI series. Do not push without
explicit approval. After approval, push the fork, regenerate `0064..0073`, rerun
the same tree-hash check, and then run the broader serving gates before any
default-on BF16 policy change.

@@ -70,6 +70,61 @@ The check used a fresh worktree at `LLAMA_VERSION`, applied every
`git write-tree` to canonical fork branch `localai-paged` at
`2d590d770 feat(cuda): trace cublas tensor names`.
Phase69 re-verified that the committed LocalAI patch series still matches the
Phase37 fork tip, and then dry-ran the additive patch export needed for the
current local fork HEAD. No generated patch files were edited in Phase69 because
the repo policy requires pushing the fork branch before regenerating the LocalAI
series, and pushes still require explicit approval.
Committed-series check:
```text
base=0ed235ea2c17a19fc8238668653946721ed136fd
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_patch_tip=yes
match_fork_head=no
patch_count=54
```
Dry-run export from `2d590d770..ea0875d14` produced ten source-only candidate
patches:
```text
0064-feat-server-trace-serving-admission-batches.patch
0065-feat-server-add-admission-trace-histograms.patch
0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
0069-test-cuda-cover-W4A16-direct-activation-policy.patch
0070-feat-cuda-route-W4A16-direct-activation-stub.patch
0071-feat-cuda-trace-layout-tensor-names.patch
0072-feat-cuda-trace-activation-quant-routes.patch
0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
```
Projected-series check with current `0001..0063` plus temp `0064..0073`:
```text
base=0ed235ea2c17a19fc8238668653946721ed136fd
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_fork_head=yes
current_patch_count=54
missing_patch_count=10
projected_patch_count=64
```
Next mirror action after explicit push approval:
1. Push `/home/mudler/_git/llama.cpp` branch `localai-paged` to
`fork/localai-paged`.
2. Regenerate or copy the equivalent source-only `0064..0073` patches from the
pushed fork.
3. Repeat the projected-series tree hash check above against fork HEAD before
committing generated patches.
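
The ordering constraint in the steps above (approval first, then push, regenerate, and re-verify) can be encoded as a guard that prints the plan instead of executing it. This is a minimal sketch under stated assumptions: `PUSH_APPROVED` and `mirror_plan` are hypothetical names, the remote name `fork` is inferred from the tracking ref `fork/localai-paged`, and `<out-dir>` is a placeholder.

```bash
# Hypothetical guard: print the mirror plan only after explicit approval.
# PUSH_APPROVED is an assumed variable name, not an existing LocalAI setting.
mirror_plan() {
    if [ "${PUSH_APPROVED:-no}" != "yes" ]; then
        echo "blocked: pushing the fork requires explicit approval" >&2
        return 1
    fi
    # Commands are echoed, not executed, so the sketch has no side effects.
    echo "git -C /home/mudler/_git/llama.cpp push fork localai-paged"
    echo "git -C /home/mudler/_git/llama.cpp format-patch --zero-commit --no-signature --start-number 64 -o <out-dir> 2d590d770..localai-paged"
    echo "rerun the projected-series tree hash check before committing patches"
}
```

Printing the commands keeps the approval decision with a human: running `mirror_plan` without `PUSH_APPROVED=yes` returns non-zero and prints nothing to stdout.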
## Status
- **0001 vendor manager — DONE.** Applies clean to the pin; builds into `libllama`.

@@ -0,0 +1,178 @@
# Patch Series Mirror Readiness Phase69 Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Prove the LocalAI paged patch series can be extended from `0063` to the current local fork HEAD without conflicts, while respecting the no-push-without-approval rule.

**Architecture:** Do not edit generated patches yet. First verify the current on-disk series still matches the Phase37 fork tip, then export the missing commits into `/tmp`, apply current plus missing patches onto the pinned llama.cpp base, and compare that tree to the current local fork HEAD.

**Tech Stack:** Git worktrees, `git apply`, `git format-patch`, LocalAI paged patch stack, llama.cpp fork branch `localai-paged`.

---
## Guardrails
- Do not push `mudler/llama.cpp:localai-paged` without explicit user approval.
- Do not edit `backend/cpp/llama-cpp-localai-paged/patches/paged/*.patch` directly.
- Do not regenerate committed LocalAI patch files before the fork push step required by the repo policy.
- Use strict `git apply`, matching the LocalAI build path.
- Record drift as a first-class phase result.
## Files
- Create: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
---
### Task 1: Verify Current Mirror Baseline
- [x] **Step 1: Confirm current LocalAI state**
Result:
- LocalAI HEAD: `2b2b1f0b2 docs(paged): record BF16 F32 output dense serving phase`
- Untracked files: pre-existing `.claude/` scratch files only.
- Patch-series tail: `0063-feat-cuda-trace-cublas-tensor-names.patch`.
- [x] **Step 2: Compare current patch series against Phase37 fork tip**
Command shape:
```bash
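# Read the pinned llama.cpp commit from the Makefile (LLAMA_VERSION?=<sha>).
BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
CHECK=/tmp/llama-paged-series-applycheck-phase69
git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
# Apply the committed series in numeric order with strict git apply.
for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch; do
    git -C "$CHECK" apply --verbose "$p"
done
# Hash the applied tree, then print the Phase37 fork-tip tree to compare against.
git -C "$CHECK" add -A
git -C "$CHECK" write-tree
git -C /home/mudler/_git/llama.cpp rev-parse '2d590d770^{tree}'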
```
Result:
```text
base=0ed235ea2c17a19fc8238668653946721ed136fd
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_patch_tip=yes
match_fork_head=no
patch_count=54
```
Decision: the committed LocalAI series remains correct for Phase37, but it is
intentionally behind the local fork HEAD.
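
The `match_*` lines in the result are plain string comparisons between tree hashes. A minimal sketch of that comparison, using the hashes recorded above; `report_match` is a hypothetical helper name, and in a live check the values would come from `git write-tree` and `git rev-parse <rev>^{tree}` rather than literals.

```bash
# Hypothetical helper: print a key=value verdict for two tree hashes,
# mirroring the match_* lines in the recorded result.
# Usage: report_match LABEL GOT WANT
report_match() {
    local label=$1 got=$2 want=$3
    if [ "$got" = "$want" ]; then
        echo "match_${label}=yes"
    else
        echo "match_${label}=no"
    fi
}

# Hashes copied from the Phase69 result block.
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4

report_match patch_tip "$applied_tree" "$patch_tip_tree"   # prints match_patch_tip=yes
report_match fork_head "$applied_tree" "$fork_head_tree"   # prints match_fork_head=no
```

The helper always returns success because a `no` verdict is an expected outcome here: the committed series is intentionally behind the fork HEAD.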
### Task 2: Dry-Run Missing Patch Export
- [x] **Step 1: Inspect fork divergence**
Result:
```text
upstream=fork/localai-paged
ahead_of_upstream=26
ahead_of_patch_tip_2d590d770=10
fork_head=ea0875d14225a10d87a1d0e1b9b57b74c81d873e
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
```
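
The two `ahead_of_*` counts are commit counts over a revision range. A minimal sketch of how such a count can be obtained with `git rev-list --count`; `ahead_count` is a hypothetical helper name, and the exact commands used in Phase69 are not recorded here.

```bash
# Hypothetical helper: count commits reachable from TIP but not from BASE.
# Usage: ahead_count REPO BASE TIP
ahead_count() {
    git -C "$1" rev-list --count "$2..$3"
}

# Example invocations (not run here); per the recorded result these would
# correspond to ahead_of_upstream and ahead_of_patch_tip_2d590d770:
#   ahead_count /home/mudler/_git/llama.cpp fork/localai-paged HEAD
#   ahead_count /home/mudler/_git/llama.cpp 2d590d770 HEAD
```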
- [x] **Step 2: Export missing commits to `/tmp` only**
Run:
```bash
OUT=/tmp/phase69_missing_patches
rm -rf "$OUT"
mkdir -p "$OUT"
git -C /home/mudler/_git/llama.cpp format-patch \
    --zero-commit --no-signature --start-number 64 \
    -o "$OUT" 2d590d770..HEAD
```
Result:
```text
0064-feat-server-trace-serving-admission-batches.patch
0065-feat-server-add-admission-trace-histograms.patch
0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
0069-test-cuda-cover-W4A16-direct-activation-policy.patch
0070-feat-cuda-route-W4A16-direct-activation-stub.patch
0071-feat-cuda-trace-layout-tensor-names.patch
0072-feat-cuda-trace-activation-quant-routes.patch
0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
```
- [x] **Step 3: Confirm source-only candidate paths**
The temp patches touch only llama.cpp source, tests, CMake, and server files:
```text
ggml/src/ggml-cuda/*
tests/*
tools/server/*
```
No markdown or LocalAI files are included in the generated candidate patches.
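
The source-only claim can be re-checked after regeneration by listing every path a patch touches and flagging anything outside the three prefixes above. This is a minimal sketch, assuming the patches carry the usual `diff --git a/<path> b/<path>` headers that `git format-patch` emits; `non_source_paths` is a hypothetical helper name.

```bash
# Hypothetical helper: print touched paths that are markdown files or fall
# outside the expected source-only prefixes. Prints nothing when all paths pass.
# Usage: non_source_paths PATCH...
non_source_paths() {
    # Extract the b/ side of each "diff --git" header, one path per line.
    sed -n 's|^diff --git a/.* b/\(.*\)$|\1|p' "$@" |
        sort -u |
        awk '/\.md$/ || !/^(ggml\/src\/ggml-cuda|tests|tools\/server)\//'
}
```

An empty result from `non_source_paths /tmp/phase69_missing_patches/*.patch` would be consistent with the statement above; any printed path deserves a manual look before the patch is committed.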
### Task 3: Prove Full Projected Mirror
- [x] **Step 1: Apply current plus temp patches to the pinned base**
Command shape:
```bash
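BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
CHECK=/tmp/llama-paged-series-applycheck-phase69-full
git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
# Apply the committed series first, then the temp export, in numeric order.
for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch /tmp/phase69_missing_patches/*.patch; do
    git -C "$CHECK" apply --verbose "$p"
done
git -C "$CHECK" add -A
# Hash the projected tree and compare it with the local fork HEAD tree.
git -C "$CHECK" write-tree
git -C /home/mudler/_git/llama.cpp rev-parse 'HEAD^{tree}'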
```
Result:
```text
base=0ed235ea2c17a19fc8238668653946721ed136fd
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
match_fork_head=yes
current_patch_count=54
missing_patch_count=10
projected_patch_count=64
```
Decision: after push approval, the LocalAI patch-series regeneration path is
known: add temp-export-equivalent patches `0064..0073`, then verify the same tree
hash. The BF16 F32 opt-in is projected as patch `0073`.
### Task 4: Record and Commit Documentation
- [x] **Step 1: Record phase result**
Update:
- `backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
- `backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
- `backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
- [x] **Step 2: Commit LocalAI docs**
Run:
```bash
git add -f docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md
git add backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md \
    backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md \
    backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
git commit -m "docs(paged): record patch mirror readiness phase" \
    -m "Assisted-by: Codex:gpt-5"
```