mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-02 20:37:03 -04:00
docs(paged): record patch mirror readiness phase
Assisted-by: Codex:gpt-5
This commit is contained in:
@@ -3767,3 +3767,61 @@ Decision:
|
||||
- Next default-on consideration requires regenerating the LocalAI patch series
|
||||
from the fork and rerunning the broader current serving snapshot gates. Do not
|
||||
default it from Phase68 alone.
|
||||
|
||||
## Patch Series Mirror Readiness Phase69 Result
|
||||
|
||||
Phase69 is recorded in
|
||||
`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
|
||||
It did not change llama.cpp source and did not edit generated LocalAI patch
|
||||
files. It verified that the current LocalAI series is still drift-free at the
|
||||
Phase37 tip, then dry-ran the additive patches needed to mirror the current
|
||||
local fork HEAD.
|
||||
|
||||
Current committed series:
|
||||
|
||||
| check | value |
|
||||
|-------|-------|
|
||||
| base | `0ed235ea2c17a19fc8238668653946721ed136fd` |
|
||||
| patch count | `54` |
|
||||
| applied tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
|
||||
| Phase37 fork-tip tree | `dedb1182910eafe9f6875588dc8285bfb544cce5` |
|
||||
| current fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
|
||||
| committed series matches Phase37 tip | `yes` |
|
||||
| committed series matches current fork HEAD | `no` |
|
||||
|
||||
Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
|
||||
candidate patches:
|
||||
|
||||
| projected patch | source commit |
|
||||
|-----------------|---------------|
|
||||
| `0064-feat-server-trace-serving-admission-batches.patch` | `c6cb8460e` |
|
||||
| `0065-feat-server-add-admission-trace-histograms.patch` | `bd7b2e952` |
|
||||
| `0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch` | `8a97629a4` |
|
||||
| `0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch` | `3b6ab5fa8` |
|
||||
| `0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch` | `8759213e3` |
|
||||
| `0069-test-cuda-cover-W4A16-direct-activation-policy.patch` | `41be3da5b` |
|
||||
| `0070-feat-cuda-route-W4A16-direct-activation-stub.patch` | `7967ad47f` |
|
||||
| `0071-feat-cuda-trace-layout-tensor-names.patch` | `fa944bb5f` |
|
||||
| `0072-feat-cuda-trace-activation-quant-routes.patch` | `afc2c7030` |
|
||||
| `0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch` | `ea0875d14` |
|
||||
|
||||
Projected mirror check:
|
||||
|
||||
| check | value |
|
||||
|-------|-------|
|
||||
| current patches | `54` |
|
||||
| missing patches | `10` |
|
||||
| projected patches | `64` |
|
||||
| applied plus missing tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
|
||||
| fork HEAD tree | `fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4` |
|
||||
| projected series matches fork HEAD | `yes` |
|
||||
|
||||
Decision:
|
||||
|
||||
- The Phase68 BF16 F32 opt-in would become projected patch `0073` and has a
|
||||
conflict-free path into the LocalAI series.
|
||||
- Do not commit generated patches yet. The fork branch is `26` commits ahead of
|
||||
`fork/localai-paged`, and the repo workflow requires pushing the fork before
|
||||
regenerating the LocalAI patch series. Push still requires explicit approval.
|
||||
- After push approval, regenerate `0064..0073`, repeat the tree hash check, and
|
||||
only then run broader serving gates for any default-on BF16 policy decision.
|
||||
|
||||
@@ -1029,3 +1029,53 @@ Decision: carry the shortcut as a default-off opt-in candidate. It is no longer
|
||||
just a prefill-only win, but Phase68 is not enough to default it on. Any future
|
||||
default-on proposal must mirror the fork commit into the LocalAI patch series
|
||||
and rerun a broader current serving snapshot with pre/post md5 and op gates.
|
||||
|
||||
## 14. PHASE69 RESULT: PATCH SERIES MIRROR READINESS
|
||||
|
||||
Phase69 checked the patch-series state without pushing and without editing
|
||||
generated patch files. Plan:
|
||||
`docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`.
|
||||
|
||||
Current committed LocalAI patches still match the Phase37 fork tip:
|
||||
|
||||
```text
|
||||
base=0ed235ea2c17a19fc8238668653946721ed136fd
|
||||
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_patch_tip=yes
|
||||
match_fork_head=no
|
||||
patch_count=54
|
||||
```
|
||||
|
||||
Dry-run export from `2d590d770..ea0875d14` produced ten additive source-only
|
||||
patches, projected as `0064..0073`. Applying current `0001..0063` plus temp
|
||||
`0064..0073` onto the pin exactly reconstructed current fork HEAD:
|
||||
|
||||
```text
|
||||
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_fork_head=yes
|
||||
current_patch_count=54
|
||||
missing_patch_count=10
|
||||
projected_patch_count=64
|
||||
```
|
||||
|
||||
Projected patch tail:
|
||||
|
||||
- `0064` serving admission trace (`c6cb8460e`)
|
||||
- `0065` admission histograms (`bd7b2e952`)
|
||||
- `0066..0068` TTFT prefill-first scheduler knobs (`8a97629a4`,
|
||||
`3b6ab5fa8`, `8759213e3`)
|
||||
- `0069..0070` W4A16 direct-activation policy/stub (`41be3da5b`,
|
||||
`7967ad47f`)
|
||||
- `0071` layout trace (`fa944bb5f`)
|
||||
- `0072` quant trace (`afc2c7030`)
|
||||
- `0073` BF16 cuBLAS F32 output (`ea0875d14`)
|
||||
|
||||
Decision: mirror regeneration is technically ready but not executed. The local
|
||||
fork is `26` commits ahead of `fork/localai-paged`, and the fork-first policy
|
||||
requires pushing before regenerating the LocalAI series. Do not push without
|
||||
explicit approval. After approval, push the fork, regenerate `0064..0073`, rerun
|
||||
the same tree-hash check, and then run the broader serving gates before any
|
||||
default-on BF16 policy change.
|
||||
|
||||
@@ -70,6 +70,61 @@ The check used a fresh worktree at `LLAMA_VERSION`, applied every
|
||||
`git write-tree` to canonical fork branch `localai-paged` at
|
||||
`2d590d770 feat(cuda): trace cublas tensor names`.
|
||||
|
||||
Phase 69 re-verified that the committed LocalAI patch series still matches the
|
||||
Phase37 fork tip, and then dry-ran the additive patch export needed for the
|
||||
current local fork HEAD. No generated patch files were edited in Phase69 because
|
||||
the repo policy requires pushing the fork branch before regenerating the LocalAI
|
||||
series, and pushes still require explicit approval.
|
||||
|
||||
Committed-series check:
|
||||
|
||||
```text
|
||||
base=0ed235ea2c17a19fc8238668653946721ed136fd
|
||||
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_patch_tip=yes
|
||||
match_fork_head=no
|
||||
patch_count=54
|
||||
```
|
||||
|
||||
Dry-run export from `2d590d770..ea0875d14` produced ten source-only candidate
|
||||
patches:
|
||||
|
||||
```text
|
||||
0064-feat-server-trace-serving-admission-batches.patch
|
||||
0065-feat-server-add-admission-trace-histograms.patch
|
||||
0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
|
||||
0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
|
||||
0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
|
||||
0069-test-cuda-cover-W4A16-direct-activation-policy.patch
|
||||
0070-feat-cuda-route-W4A16-direct-activation-stub.patch
|
||||
0071-feat-cuda-trace-layout-tensor-names.patch
|
||||
0072-feat-cuda-trace-activation-quant-routes.patch
|
||||
0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
|
||||
```
|
||||
|
||||
Projected-series check with current `0001..0063` plus temp `0064..0073`:
|
||||
|
||||
```text
|
||||
base=0ed235ea2c17a19fc8238668653946721ed136fd
|
||||
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_fork_head=yes
|
||||
current_patch_count=54
|
||||
missing_patch_count=10
|
||||
projected_patch_count=64
|
||||
```
|
||||
|
||||
Next mirror action after explicit push approval:
|
||||
|
||||
1. Push `/home/mudler/_git/llama.cpp` branch `localai-paged` to
|
||||
`fork/localai-paged`.
|
||||
2. Regenerate or copy the equivalent source-only `0064..0073` patches from the
|
||||
pushed fork.
|
||||
3. Repeat the projected-series tree hash check above against fork HEAD before
|
||||
committing generated patches.
|
||||
|
||||
## Status
|
||||
|
||||
- **0001 vendor manager — DONE.** Applies clean to the pin; builds into `libllama`.
|
||||
|
||||
@@ -0,0 +1,178 @@
|
||||
# Patch Series Mirror Readiness Phase69 Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** prove the LocalAI paged patch series can be extended from `0063` to the current local fork HEAD without conflicts, while respecting the no-push-without-approval rule.
|
||||
|
||||
**Architecture:** Do not edit generated patches yet. First verify the current on-disk series still matches the Phase37 fork tip, then export the missing commits into `/tmp`, apply current plus missing patches onto the pinned llama.cpp base, and compare that tree to the current local fork HEAD.
|
||||
|
||||
**Tech Stack:** Git worktrees, `git apply`, `git format-patch`, LocalAI paged patch stack, llama.cpp fork branch `localai-paged`.
|
||||
|
||||
---
|
||||
|
||||
## Guardrails
|
||||
|
||||
- Do not push `mudler/llama.cpp:localai-paged` without explicit user approval.
|
||||
- Do not edit `backend/cpp/llama-cpp-localai-paged/patches/paged/*.patch` directly.
|
||||
- Do not regenerate committed LocalAI patch files before the fork push step required by the repo policy.
|
||||
- Use strict `git apply`, matching the LocalAI build path.
|
||||
- Record drift as a first-class phase result.
|
||||
|
||||
## Files
|
||||
|
||||
- Create: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md`
|
||||
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
|
||||
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
|
||||
- Modify: `/home/mudler/_git/LocalAI/.claude/worktrees/feat+paged-attention/backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Verify Current Mirror Baseline
|
||||
|
||||
- [x] **Step 1: Confirm current LocalAI state**
|
||||
|
||||
Result:
|
||||
|
||||
- LocalAI HEAD: `2b2b1f0b2 docs(paged): record BF16 F32 output dense serving phase`
|
||||
- Untracked files: pre-existing `.claude/` scratch files only.
|
||||
- Patch-series tail: `0063-feat-cuda-trace-cublas-tensor-names.patch`.
|
||||
|
||||
- [x] **Step 2: Compare current patch series against Phase37 fork tip**
|
||||
|
||||
Command shape:
|
||||
|
||||
```bash
|
||||
BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
|
||||
CHECK=/tmp/llama-paged-series-applycheck-phase69
|
||||
git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
|
||||
for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch; do
|
||||
git -C "$CHECK" apply --verbose "$p"
|
||||
done
|
||||
git -C "$CHECK" add -A
|
||||
git -C "$CHECK" write-tree
|
||||
git -C /home/mudler/_git/llama.cpp rev-parse 2d590d770^{tree}
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
base=0ed235ea2c17a19fc8238668653946721ed136fd
|
||||
applied_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
patch_tip_tree=dedb1182910eafe9f6875588dc8285bfb544cce5
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_patch_tip=yes
|
||||
match_fork_head=no
|
||||
patch_count=54
|
||||
```
|
||||
|
||||
Decision: the committed LocalAI series remains correct for Phase37, but it is
|
||||
intentionally behind the local fork HEAD.
|
||||
|
||||
### Task 2: Dry-Run Missing Patch Export
|
||||
|
||||
- [x] **Step 1: Inspect fork divergence**
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
upstream=fork/localai-paged
|
||||
ahead_of_upstream=26
|
||||
ahead_of_patch_tip_2d590d770=10
|
||||
fork_head=ea0875d14225a10d87a1d0e1b9b57b74c81d873e
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
```
|
||||
|
||||
- [x] **Step 2: Export missing commits to `/tmp` only**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
OUT=/tmp/phase69_missing_patches
|
||||
rm -rf "$OUT"
|
||||
mkdir -p "$OUT"
|
||||
git -C /home/mudler/_git/llama.cpp format-patch \
|
||||
--zero-commit --no-signature --start-number 64 \
|
||||
-o "$OUT" 2d590d770..HEAD
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
0064-feat-server-trace-serving-admission-batches.patch
|
||||
0065-feat-server-add-admission-trace-histograms.patch
|
||||
0066-feat-server-add-TTFT-prefill-first-scheduler-mode.patch
|
||||
0067-feat-server-cap-TTFT-prefill-first-decode-deferral.patch
|
||||
0068-feat-server-gate-TTFT-defer-by-prompt-backlog.patch
|
||||
0069-test-cuda-cover-W4A16-direct-activation-policy.patch
|
||||
0070-feat-cuda-route-W4A16-direct-activation-stub.patch
|
||||
0071-feat-cuda-trace-layout-tensor-names.patch
|
||||
0072-feat-cuda-trace-activation-quant-routes.patch
|
||||
0073-feat-cuda-gate-BF16-cuBLAS-F32-output.patch
|
||||
```
|
||||
|
||||
- [x] **Step 3: Confirm source-only candidate paths**
|
||||
|
||||
The temp patches touch only llama.cpp source, tests, CMake, and server files:
|
||||
|
||||
```text
|
||||
ggml/src/ggml-cuda/*
|
||||
tests/*
|
||||
tools/server/*
|
||||
```
|
||||
|
||||
No markdown or LocalAI files are included in the generated candidate patches.
|
||||
|
||||
### Task 3: Prove Full Projected Mirror
|
||||
|
||||
- [x] **Step 1: Apply current plus temp patches to the pinned base**
|
||||
|
||||
Command shape:
|
||||
|
||||
```bash
|
||||
BASE=$(awk -F '?=' '/^LLAMA_VERSION/ {print $2}' backend/cpp/llama-cpp-localai-paged/Makefile)
|
||||
CHECK=/tmp/llama-paged-series-applycheck-phase69-full
|
||||
git -C /home/mudler/_git/llama.cpp worktree add --detach "$CHECK" "$BASE"
|
||||
for p in "$PWD"/backend/cpp/llama-cpp-localai-paged/patches/paged/0*.patch /tmp/phase69_missing_patches/*.patch; do
|
||||
git -C "$CHECK" apply --verbose "$p"
|
||||
done
|
||||
git -C "$CHECK" add -A
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```text
|
||||
base=0ed235ea2c17a19fc8238668653946721ed136fd
|
||||
applied_plus_missing_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
fork_head_tree=fcf5720b659c5e1e2b487ccf3c8f7289bb12b9c4
|
||||
match_fork_head=yes
|
||||
current_patch_count=54
|
||||
missing_patch_count=10
|
||||
projected_patch_count=64
|
||||
```
|
||||
|
||||
Decision: after push approval, the LocalAI patch-series regeneration path is
|
||||
known: add temp-export-equivalent patches `0064..0073`, then verify the same tree
|
||||
hash. The BF16 F32 opt-in is projected as patch `0073`.
|
||||
|
||||
### Task 4: Record and Commit Documentation
|
||||
|
||||
- [x] **Step 1: Record phase result**
|
||||
|
||||
Update:
|
||||
|
||||
- `backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md`
|
||||
- `backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md`
|
||||
- `backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md`
|
||||
|
||||
- [x] **Step 2: Commit LocalAI docs**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add -f docs/superpowers/plans/2026-07-01-patch-series-mirror-readiness-phase69.md
|
||||
git add backend/cpp/llama-cpp-localai-paged/docs/PATCH_MAINTENANCE.md \
|
||||
backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md \
|
||||
backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md
|
||||
git commit -m "docs(paged): record patch mirror readiness phase" \
|
||||
-m "Assisted-by: Codex:gpt-5"
|
||||
```
|
||||
Reference in New Issue
Block a user