mirror of https://github.com/mudler/LocalAI.git
synced 2026-06-27 09:57:14 -04:00
docs(paged): consolidate the dev-trail docs into one canonical README
The paged-attention patch directory had accumulated ~55 scattered dev docs (results, progress, scope, lever, and gap-analysis notes). Consolidate the durable content of all of them into one canonical backend/cpp/llama-cpp/patches/paged/README.md covering:

- what the patchset is
- the architecture (paged KV + block-table flash-attn, the gated-DeltaNet SSM decode path, NVFP4 FP4-MMA, the decode-first scheduler)
- the full 0001-0030 patch series table with bit-exact status
- the GB10 benchmarks (patched vs stock vs vLLM, plus the Apple M4 architectural note)
- the dev notes (bit-exact methodology, the per-path gate, the MoE-parity conclusion, the rejected/flat levers, the opt-in bf16-SSM mode)
- arch+quant generality
- the pin + canary maintenance policy
- the published NVFP4 gallery models

Delete the consolidated-away dev trail. Keep the three operational docs the README links to: PIN_SYNC_c299a92c.md (canary reference), PAGED_BITEXACT_NOTE.md (per-path gate reference) and LOCALAI_LLAMACPP_BACKEND_PLAN.md (the ship-as-own-backend design-of-record), plus the benchmark plots and CSV. The .patch files and the unit/bench .cpp are untouched.

Repoint every external reference to a deleted doc at the new README: grpc-server.cpp, docs/content/features/backends.md, gallery/index.yaml, the canary apply script (PIN_BUMP_APPLY_CHECK.md -> README), and the base patches/README.md (ADDITIVE_DESIGN.md -> README). The canary's PIN_SYNC reference still resolves; its inert SSM_DECODE_FIX_RESULTS.md glob (a patch-internal path matcher, not a repo-doc link) is left intact.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
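The repointing step above is easy to verify mechanically. A minimal sketch, assuming bash and a git checkout: `find_stale_doc_refs` is a hypothetical helper, not part of the LocalAI tree, and the only deleted doc names known from this commit are PIN_BUMP_APPLY_CHECK.md and ADDITIVE_DESIGN.md.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the LocalAI tree): print every line in
# tracked files that still names one of the given deleted docs.
# Returns 0 when clean, 1 when stale references exist, 2 on a git error.
find_stale_doc_refs() {
  local args=() name status=0
  for name in "$@"; do
    args+=(-e "$name")   # one fixed-string (-F) pattern per doc name
  done
  # git grep exits 0 on a match, 1 on no match, >1 on error.
  git grep -n -F "${args[@]}" || status=$?
  case "$status" in
    0) return 1 ;;  # matches were printed above: references remain
    1) return 0 ;;  # no match: every reference was repointed
    *) return 2 ;;  # git grep itself failed
  esac
}
```

Run from the repository root as `find_stale_doc_refs PIN_BUMP_APPLY_CHECK.md ADDITIVE_DESIGN.md`. Matches still need a human look: .patch files or an intentionally kept path matcher (like the SSM_DECODE_FIX_RESULTS.md glob) can legitimately name a deleted doc.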
.github/scripts/paged-canary-apply.sh
@@ -28,7 +28,7 @@
 # build on 0019's code, the rejection cascades to them too. This is a
 # PRE-EXISTING shipped-series defect, present identically on every pin, NOT an
 # upstream break (see backend/cpp/llama-cpp/patches/paged/PIN_SYNC_c299a92c.md
-# and PIN_BUMP_APPLY_CHECK.md). We exclude ONLY that dev-doc path and still
+# and README.md). We exclude ONLY that dev-doc path and still
 # apply 0019's real code hunks atomically, so a genuine code-hunk break in 0019
 # still fails the canary. prepare.sh tolerates the same hunk via
 # `patch ... || true`; this mirrors that tolerance precisely.
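The comment above describes excluding one dev-doc path while still applying the rest of patch 0019 atomically. A minimal sketch of that idea, assuming bash and `git apply --exclude`; this is not the canary script's actual code, and `apply_excluding_doc` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the LocalAI tree): apply one patch while
# skipping files whose path matches a single glob, e.g. a dev-doc path.
set -euo pipefail

apply_excluding_doc() {
  local patch_file="$1"  # the patch to apply
  local doc_glob="$2"    # path pattern to skip, e.g. '*_RESULTS.md'
  # Without --reject, git apply is all-or-nothing: if any remaining
  # (non-excluded) hunk fails, nothing is written and the exit status
  # is non-zero, so a genuine code-hunk break is still reported.
  git apply --exclude="$doc_glob" "$patch_file"
}
```

Called as `apply_excluding_doc <patch> '<glob>'`. Unlike `patch ... || true`, which swallows every failure, the exclusion skips only the named path, so any other rejected hunk still fails the run.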