ci(paged): add early-warning canary for vendored llama.cpp paged patches

The paged backend (backend/cpp/llama-cpp-localai-paged) pins its own
verified llama.cpp tip and is excluded from the nightly auto-bumper so a
naive bump can never silently break the shipped build. That exclusion
also removed the early warning of upstream drift. This restores the
signal without touching the pin.

Add .github/workflows/llama-cpp-paged-canary.yml (weekly +
workflow_dispatch):

- apply-check job (ubuntu-latest, toolchain-free): resolve the latest
  ggml-org/llama.cpp master tip, shallow-checkout it, and apply the full
  paged series 0001-0030 in order with the build's own git-apply method
  via the new shared helper .github/scripts/paged-canary-apply.sh. Red
  on any apply break.
- compile job (needs apply-check): on the exact tip it validated, build
  the paged backend (cublas) inside the same base-grpc-cuda-12 toolchain
  and the same `make grpc-server` target the shipped build uses, so a
  red means upstream drift, not toolchain noise. nvcc compiles the
  kernels with no GPU present. Red here = run a PIN_SYNC (rebase +
  bit-exact gate + re-export), then bump the paged Makefile pin.

The canary is signal-only: it opens no PR and never moves the pin, so
the shipped build and the dep-bump PRs stay green regardless. It is
fully separate from bump_deps.

The lone pre-existing quirk in the series (patch 0019 carries a stray
modify hunk against the dev-only doc SSM_DECODE_FIX_RESULTS.md, absent
from any clean upstream checkout; git apply is atomic so it rejects the
whole patch and cascades to 0021/0022/0026/0028) is handled path-scoped:
the helper excludes only that dev-doc and still applies 0019's real code
hunks atomically, mirroring prepare.sh's tolerance, so the quirk never
false-positives the canary but a genuine code break in 0019 still turns
it red.

Point the existing pin comments in
backend/cpp/llama-cpp-localai-paged/Makefile and
.github/workflows/bump_deps.yaml at this canary as the drift signal, and
document it in the PIN_SYNC doc: canary red -> do a pin-sync.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
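
The path-scoped handling of the 0019 quirk described above can be sketched with `git apply --exclude`. This is a minimal illustration, not the contents of .github/scripts/paged-canary-apply.sh: the function name `apply_series`, its arguments, and the default glob are assumptions made for the example, and the real helper may differ.

```shell
# Hypothetical sketch; NOT the real paged-canary-apply.sh.
#
# apply_series SRC_DIR PATCH_DIR [EXCLUDE_GLOB]
# Applies every *.patch in PATCH_DIR to the checkout at SRC_DIR in lexical
# order (which matches the numeric 00NN- prefix order). Stops and returns
# non-zero at the first patch that does not apply.
apply_series() {
    local src_dir=$1
    local patch_dir
    # Resolve to an absolute path because `git -C` changes directory.
    patch_dir=$(cd "$2" && pwd) || return 2
    # Assumed glob: skips only the stray dev-doc, wherever it lives.
    local exclude=${3:-'*SSM_DECODE_FIX_RESULTS.md'}
    local patch

    for patch in "$patch_dir"/*.patch; do
        [ -e "$patch" ] || continue
        # --exclude drops changes to matching paths only; every other file
        # in the patch is still applied atomically, so a real code break
        # in the same patch still fails the whole invocation.
        if ! git -C "$src_dir" apply --exclude="$exclude" "$patch"; then
            echo "canary: failed to apply $(basename "$patch")" >&2
            return 1
        fi
    done
}
```

Called as `apply_series ./llama.cpp ./patches/paged` (both paths illustrative), a non-zero exit is what would turn the apply-check job red. Scoping the exclusion to one path, rather than tolerating rejects in general, keeps the check strict for every real code hunk.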
2026-06-27 18:06:58 -04:00 · 2026-06-27 08:29:09 +00:00
parent e160041f05
commit 2bee7a5ab1
5 changed files with 294 additions and 0 deletions
--- a/backend/cpp/llama-cpp-localai-paged/Makefile
+++ b/backend/cpp/llama-cpp-localai-paged/Makefile
@@ -28,6 +28,13 @@
 # Advance ONLY via the PIN_SYNC process (rebase patches + bit-exact gate +
 # re-export), then update this value. See:
 #   backend/cpp/llama-cpp/patches/paged/PIN_SYNC_*.md
+#
+# This pin = the manual, verified sync. The signal telling you WHEN to do the
+# next sync is the early-warning canary
+# (.github/workflows/llama-cpp-paged-canary.yml): weekly it applies + compiles
+# this patch series against the latest upstream llama.cpp tip and goes red the
+# moment upstream drifts past the patches. Canary red -> run a PIN_SYNC, then
+# bump this value. The canary never touches this pin; it is signal-only.
 LLAMA_VERSION?=9d5d882d8cd0f0a9283d87ed5e6fe3ee0d925fb1

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/patches/paged/PIN_SYNC_9d5d882d.md
+++ b/backend/cpp/llama-cpp/patches/paged/PIN_SYNC_9d5d882d.md
@@ -7,6 +7,35 @@ re-exported from the rebased commits; **4 patch files changed** and are updated
 in this commit. A quick decode bench confirms the patchset performs the same on
 the new tip.

+## Early-warning canary: when to run the NEXT pin-sync
+
+The shipped pin (this file's tip, mirrored in
+`backend/cpp/llama-cpp-localai-paged/Makefile`) is advanced ONLY by this manual,
+GPU-verified PIN_SYNC. Because the paged backend is excluded from the nightly
+auto-bumper (`.github/workflows/bump_deps.yaml`), nothing nightly tells you when
+upstream has drifted past the patches. That signal comes from a dedicated
+scheduled canary:
+
+- **Workflow:** `.github/workflows/llama-cpp-paged-canary.yml` (weekly, plus
+  `workflow_dispatch`). It resolves the latest `ggml-org/llama.cpp` master tip,
+  then in two jobs (a) APPLIES the full series to that tip with the build's own
+  `git apply` method via `.github/scripts/paged-canary-apply.sh`, and (b)
+  COMPILES the paged backend (cublas) against it using the same base-grpc-cuda-12
+  toolchain + `make grpc-server` target the shipped build uses.
+- **Green** = the series still applies and compiles on upstream HEAD; nothing to
+  do.
+- **Red** = upstream moved out from under the patches. **Canary red -> run a
+  PIN_SYNC** (rebase the patches onto the new tip, pass the bit-exact gate on the
+  GPU, re-export the `.patch` files, then advance the pin). The canary is
+  signal-only: it opens no PR and never moves the pin, so the shipped build and
+  the dep-bump PRs stay green regardless.
+- **0019 handling:** the canary apply helper excludes ONLY the stray
+  `SSM_DECODE_FIX_RESULTS.md` dev-doc hunk (the pre-existing quirk documented in
+  the "Pre-existing finding" section below and in `PIN_BUMP_APPLY_CHECK.md`),
+  applying 0019's real code hunks atomically. So that benign quirk never
+  false-positives the canary, but a genuine code break in 0019 still turns it
+  red.
+
 ## Upstream jump

 - OLD LocalAI pin: `8be759e6`