diff --git a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
index 90d13d15a..473673ed5 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
@@ -3767,3 +3767,61 @@ Decision:
 - Next default-on consideration requires regenerating the LocalAI patch series
   from the fork and rerunning the broader current serving snapshot gates. Do not
   default it from Phase68 alone.
+
+## Patch Series Mirror Readiness Phase69 Result
+
+Phase69 is recorded in
+`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
+It did not change llama.cpp source and did not edit generated LocalAI patch
+files. It verified that the current LocalAI series is still drift-free at the
+Phase37 tip, then dry-ran the additive patches needed to mirror the current
+local fork HEAD.
+
+Current committed series:
+
+| check | value |
+|-------|-------|
+| base | `0ed235ea2c17a19fc8238668653946721ed136fd` |
+| patch count | `54` |
+| applied tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
+| Phase37 fork-tip tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
+| current fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
+| committed series matches Phase37 tip | `yes` |
+| committed series matches current fork HEAD | `no` |
+
+Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
+candidate patches:
+
+| projected patch | source commit |
+|-----------------|---------------|
+| `0064-feat-server-trace-serving-admission-batches.patch` | `c6cb8460e` |
+| `0065-feat-server-add-admission-trace-histograms.patch` | `bd7b2e952` |
+| `0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch` | `8a97629a4` |
+| `0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch` | `3b6ab5fa8` |
+| `0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch` | `8759213e3` |
+| `0069-test-cuda-cover-W4A16-direct-activation-policy.patch` | `41be3da5b` |
+| `0070-feat-cuda-route-W4A16-direct-activation-stub.patch` | `7967ad47f` |
+| `0071-feat-cuda-trace-layout-tensor-names.patch` | `fa944bb5f` |
+| `0072-feat-cuda-trace-activation-quant-routes.patch` | `afc2c7030` |
+| `0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch` | `ea0875d14` |
+
+Projected mirror check:
+
+| check | value |
+|-------|-------|
+| current patches | `54` |
+| missing patches | `10` |
+| projected patches | `64` |
+| applied plus missing tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
+| fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
+| projected series matches fork HEAD | `yes` |
+
+Decision:
+
+- The Phase68 BF16 F32 opt-in would become projected patch `0073` and has a
+  conflict-free path into the LocalAI series.
+- Do not commit generated patches yet. The fork branch is `26` commits ahead of
+  `fork/localai-paged`, and the repo workflow requires pushing the fork before
+  regenerating the LocalAI patch series. Push still requires explicit approval.
+- After push approval, regenerate `0064..0073`, repeat the tree hash check, and
+  only then run broader serving gates for any default-on BF16 policy decision.
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
index 0fc115a9c..538ea00a9 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
@@ -1029,3 +1029,53 @@ Decision: carry the shortcut as a default-off opt-in candidate. It is no longer
 just a prefill-only win, but Phase68 is not enough to default it on. Any future
 default-on proposal must mirror the fork commit into the LocalAI patch series
 and rerun a broader current serving snapshot with pre/post md5 and op gates.
+
+## 14. PHASE69 RESULT: PATCH SERIES MIRROR READINESS
+
+Phase69 checked the patch-series state without pushing and without editing
+generated patch files. Plan:
+`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
+
+Current committed LocalAI patches still match the Phase37 fork tip:
+
+```text
+base=0ed235ea2c17a19fc8238668653946721ed136fd
+applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_patch_tip=yes
+match_fork_head=no
+patch_count=54
+```
+
+Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
+patches, projected as `0064..0073`. Applying current `0001..0063` plus temp
+`0064..0073` onto the pin exactly reconstructed current fork HEAD:
+
+```text
+applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_fork_head=yes
+current_patch_count=54
+missing_patch_count=10
+projected_patch_count=64
+```
+
+Projected patch tail:
+
+- `0064` serving admission trace (`c6cb8460e`)
+- `0065` admission histograms (`bd7b2e952`)
+- `0066..0068` TTFT prefill-first scheduler knobs (`8a97629a4`,
+  `3b6ab5fa8`, `8759213e3`)
+- `0069..0070` W4A16 direct-activation policy/stub (`41be3da5b`,
+  `7967ad47f`)
+- `0071` layout trace (`fa944bb5f`)
+- `0072` quant trace (`afc2c7030`)
+- `0073` BF16 cuBLAS F32 output (`ea0875d14`)
+
+Decision: mirror regeneration is technically ready but not executed. The local
+fork is `26` commits ahead of `fork/localai-paged`, and the fork-first policy
+requires pushing before regenerating the LocalAI series. Do not push without
+explicit approval. After approval, push the fork, regenerate `0064..0073`, rerun
+the same tree-hash check, and then run the broader serving gates before any
+default-on BF16 policy change.
diff --git a/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md b/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md
index 5182f260f..9edd32c73 100644
--- a/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md
@@ -70,6 +70,61 @@ The check used a fresh worktree at `LLAMA_VERSION`, applied every
 `git write-tree` to canonical fork branch `localai-paged` at
 `2d590d770 feat(cuda): trace cublas tensor names`.
 
+Phase 69 re-verified that the committed LocalAI patch series still matches the
+Phase37 fork tip, and then dry-ran the additive patch export needed for the
+current local fork HEAD. No generated patch files were edited in Phase69 because
+the repo policy requires pushing the fork branch before regenerating the LocalAI
+series, and pushes still require explicit approval.
+
+Committed-series check:
+
+```text
+base=0ed235ea2c17a19fc8238668653946721ed136fd
+applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_patch_tip=yes
+match_fork_head=no
+patch_count=54
+```
+
+Dry-run export from `2d590d770..ea0875d14` produced ten source-only candidate
+patches:
+
+```text
+0064-feat-server-trace-serving-admission-batches.patch
+0065-feat-server-add-admission-trace-histograms.patch
+0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
+0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
+0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
+0069-test-cuda-cover-W4A16-direct-activation-policy.patch
+0070-feat-cuda-route-W4A16-direct-activation-stub.patch
+0071-feat-cuda-trace-layout-tensor-names.patch
+0072-feat-cuda-trace-activation-quant-routes.patch
+0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
+```
+
+Projected-series check with current `0001..0063` plus temp `0064..0073`:
+
+```text
+base=0ed235ea2c17a19fc8238668653946721ed136fd
+applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_fork_head=yes
+current_patch_count=54
+missing_patch_count=10
+projected_patch_count=64
+```
+
+Next mirror action after explicit push approval:
+
+1. Push `/home/mudler/_git/llama.cpp` branch `localai-paged` to
+   `fork/localai-paged`.
+2. Regenerate or copy the equivalent source-only `0064..0073` patches from the
+   pushed fork.
+3. Repeat the projected-series tree hash check above against fork HEAD before
+   committing generated patches.
+
 ## Status
 
 - **0001 vendor manager — DONE.** Applies clean to the pin; builds into `libllama`.
diff --git a/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md b/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md
new file mode 100644
index 000000000..874f9ee35
--- /dev/null
+++ b/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md
@@ -0,0 +1,178 @@
+# Patch Series Mirror Readiness Phase69 Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** prove the LocalAI paged patch series can be extended from `0063` to the current local fork HEAD without conflicts, while respecting the no-push-without-approval rule.
+
+**Architecture:** Do not edit generated patches yet. First verify the current on-disk series still matches the Phase37 fork tip, then export the missing commits into `/tmp`, apply current plus missing patches onto the pinned llama.cpp base, and compare that tree to the current local fork HEAD.
+
+**Tech Stack:** Git worktrees, `git apply`, `git format-patch`, LocalAI paged patch stack, llama.cpp fork branch `localai-paged`.
+
+---
+
+## Guardrails
+
+- Do not push `mudler/llama.cpp:localai-paged` without explicit user approval.
+- Do not edit `backend/cpp/llama-cpp-localai-paged/patches/paged/*.patch` directly.
+- Do not regenerate committed LocalAI patch files before the fork push step required by the repo policy.
+- Use strict `git apply`, matching the LocalAI build path.
+- Record drift as a first-class phase result.
+
+## Files
+
+- Create: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`
+- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
+- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
+- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
+
+---
+
+### Task 1: Verify Current Mirror Baseline
+
+- [x] **Step 1: Confirm current LocalAI state**
+
+Result:
+
+- LocalAI HEAD: `2b2b1f0b2 docs(paged): record BF16 F32 output dense serving phase`
+- Untracked files: pre-existing `.claude/` scratch files only.
+- Patch-series tail: `0063-feat-cuda-trace-cublas-tensor-names.patch`.
+
+- [x] **Step 2: Compare current patch series against Phase37 fork tip**
+
+Command shape:
+
+```bash
+BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
+CHECK=/tmp/llama-paged-series-applycheck-phase69
+git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
+for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch; do
+  git -C "$CHECK" apply --verbose "$p"
+done
+git -C "$CHECK" add -A
+git -C "$CHECK" write-tree
+git -C /home/mudler/_git/llama.cpp rev-parse 2d590d770^{tree}
+```
+
+Result:
+
+```text
+base=0ed235ea2c17a19fc8238668653946721ed136fd
+applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_patch_tip=yes
+match_fork_head=no
+patch_count=54
+```
+
+Decision: the committed LocalAI series remains correct for Phase37, but it is
+intentionally behind the local fork HEAD.
+
+### Task 2: Dry-Run Missing Patch Export
+
+- [x] **Step 1: Inspect fork divergence**
+
+Result:
+
+```text
+upstream=fork/localai-paged
+ahead_of_upstream=26
+ahead_of_patch_tip_2d590d770=10
+fork_head=ea0875d14225a10d87a1d0e1b9b57b74c81d873e
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+```
+
+- [x] **Step 2: Export missing commits to `/tmp` only**
+
+Run:
+
+```bash
+OUT=/tmp/phase69_missing_patches
+rm -rf "$OUT"
+mkdir -p "$OUT"
+git -C /home/mudler/_git/llama.cpp format-patch \
+  --zero-commit --no-signature --start-number 64 \
+  -o "$OUT" 2d590d770..HEAD
+```
+
+Result:
+
+```text
+0064-feat-server-trace-serving-admission-batches.patch
+0065-feat-server-add-admission-trace-histograms.patch
+0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
+0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
+0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
+0069-test-cuda-cover-W4A16-direct-activation-policy.patch
+0070-feat-cuda-route-W4A16-direct-activation-stub.patch
+0071-feat-cuda-trace-layout-tensor-names.patch
+0072-feat-cuda-trace-activation-quant-routes.patch
+0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
+```
+
+- [x] **Step 3: Confirm source-only candidate paths**
+
+The temp patches touch only llama.cpp source, tests, CMake, and server files:
+
+```text
+ggml/src/ggml-cuda/*
+tests/*
+tools/server/*
+```
+
+No markdown or LocalAI files are included in the generated candidate patches.
+
+### Task 3: Prove Full Projected Mirror
+
+- [x] **Step 1: Apply current plus temp patches to the pinned base**
+
+Command shape:
+
+```bash
+BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
+CHECK=/tmp/llama-paged-series-applycheck-phase69-full
+git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
+for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch /tmp/phase69_missing_patches/*.patch; do
+  git -C "$CHECK" apply --verbose "$p"
+done
+git -C "$CHECK" add -A
+```
+
+Result:
+
+```text
+base=0ed235ea2c17a19fc8238668653946721ed136fd
+applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
+match_fork_head=yes
+current_patch_count=54
+missing_patch_count=10
+projected_patch_count=64
+```
+
+Decision: after push approval, the LocalAI patch-series regeneration path is
+known: add temp-export-equivalent patches `0064..0073`, then verify the same tree
+hash. The BF16 F32 opt-in is projected as patch `0073`.
+
+### Task 4: Record and Commit Documentation
+
+- [x] **Step 1: Record phase result**
+
+Update:
+
+- `backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
+- `backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
+- `backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
+
+- [x] **Step 2: Commit LocalAI docs**
+
+Run:
+
+```bash
+git add -f docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md
+git add backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md \
+  backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md \
+  backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
+git commit -m "docs(paged): record patch mirror readiness phase" \
+  -m "Assisted-by: Codex:gpt-5"
+```
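
Reviewer sketch (not part of the diff): the Task 3 command shape stops at `git add -A`, while its result block reports a tree hash, a match verdict, and patch counts. The remaining steps might look roughly like the helpers below. Both function names are hypothetical and do not exist in the LocalAI repo; the `key=value` output only imitates the result blocks recorded in the plan.

```bash
# Hypothetical helpers for illustration only; neither exists in the repo.

# Count patch files directly inside a directory. The count is taken from the
# files themselves, so it need not equal the highest numeric prefix (the plan
# records 54 patches with a tail numbered 0063).
count_patches() {
  find "$1" -maxdepth 1 -type f -name '0*.patch' | wc -l | tr -d ' '
}

# Compare two tree hashes and print key=value lines in the style of the
# recorded results. Returns non-zero when the trees differ.
report_tree_match() {
  applied_tree=$1
  fork_tree=$2
  echo "applied_plus_missing_tree=$applied_tree"
  echo "fork_head_tree=$fork_tree"
  if [ "$applied_tree" = "$fork_tree" ]; then
    echo "match_fork_head=yes"
  else
    echo "match_fork_head=no"
    return 1
  fi
}

# Usage sketch, assuming $CHECK is the staged worktree from Task 3:
#   report_tree_match "$(git -C "$CHECK" write-tree)" \
#     "$(git -C /home/mudler/_git/llama.cpp rev-parse 'HEAD^{tree}')"
#   count_patches backend/cpp/llama-cpp-localai-paged/patches/paged
```

Returning a non-zero status on mismatch lets a wrapper script stop before any generated patch is committed, which fits the guardrail that drift is recorded as a first-class result.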