mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-29 03:24:49 -04:00
ci(python-backends): add weekly DEPS_REFRESH cache-buster
The shared backend/Dockerfile.python ends in:
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.
DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.
Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.
Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
This commit is contained in:
87
.agents/ci-caching.md
Normal file
87
.agents/ci-caching.md
Normal file
@@ -0,0 +1,87 @@
|
||||
# CI Build Caching
|
||||
|
||||
Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.
|
||||
|
||||
## Cache layout
|
||||
|
||||
- **Cache registry**: `quay.io/go-skynet/ci-cache`
|
||||
- **One tag per matrix entry**, derived from the existing `tag-suffix`:
|
||||
- Backend builds (`backend_build.yml`): `cache<tag-suffix>`
|
||||
- e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
|
||||
- Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
|
||||
- e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
|
||||
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
|
||||
|
||||
## Read/write semantics
|
||||
|
||||
| Trigger | `cache-from` | `cache-to` |
|
||||
|---|---|---|
|
||||
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
|
||||
| `pull_request` | yes | **no** |
|
||||
|
||||
PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
|
||||
|
||||
`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
|
||||
|
||||
## Self-warming, no separate populator
|
||||
|
||||
There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).
|
||||
|
||||
Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.
|
||||
|
||||
## The `DEPS_REFRESH` cache-buster (Python backends)
|
||||
|
||||
Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
|
||||
|
||||
```dockerfile
|
||||
ARG DEPS_REFRESH=initial
|
||||
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
|
||||
```
|
||||
|
||||
Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
|
||||
|
||||
`DEPS_REFRESH` defends against that:
|
||||
|
||||
- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
|
||||
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
|
||||
- Within a week, builds stay warm.
|
||||
|
||||
This applies only to `Dockerfile.python` because:
|
||||
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
|
||||
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
|
||||
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.
|
||||
|
||||
### Adjusting the cadence
|
||||
|
||||
If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
|
||||
|
||||
## Manually evicting cache
|
||||
|
||||
To force a fully cold build for one backend or the whole image:
|
||||
|
||||
```bash
|
||||
# Delete a single tag (requires quay credentials with admin on the repo)
|
||||
curl -X DELETE \
|
||||
-H "Authorization: Bearer ${QUAY_TOKEN}" \
|
||||
https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm
|
||||
|
||||
# List all tags
|
||||
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
|
||||
"https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
|
||||
```
|
||||
|
||||
Eviction is rarely needed in normal operation — `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and `mode=max` keeps the cache scoped per matrix entry so a stale tag never bleeds into a different build.
|
||||
|
||||
## What the cache **does not** cover
|
||||
|
||||
- The "Free Disk Space" / "Release space from worker" steps run on every job — these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
|
||||
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere — PRs only build for verification.
|
||||
|
||||
## Touching the cache pipeline
|
||||
|
||||
When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
|
||||
|
||||
1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
|
||||
2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
|
||||
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
|
||||
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.
|
||||
11
.github/workflows/backend_build.yml
vendored
11
.github/workflows/backend_build.yml
vendored
@@ -208,6 +208,15 @@ jobs:
|
||||
username: ${{ secrets.quayUsername }}
|
||||
password: ${{ secrets.quayPassword }}
|
||||
|
||||
# Weekly cache-buster for the per-backend `make` step. Most Python
|
||||
# backends list unpinned deps (torch, transformers, vllm, ...), so a
|
||||
# warm cache freezes upstream versions indefinitely. Rolling this
|
||||
# weekly forces a re-resolve of the install layer at most once per
|
||||
# week, picking up newer wheels without a full cold rebuild.
|
||||
- name: Compute deps refresh key
|
||||
id: deps_refresh
|
||||
run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Build and push
|
||||
uses: docker/build-push-action@v7
|
||||
if: github.event_name != 'pull_request'
|
||||
@@ -222,6 +231,7 @@ jobs:
|
||||
BACKEND=${{ inputs.backend }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
|
||||
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
|
||||
context: ${{ inputs.context }}
|
||||
file: ${{ inputs.dockerfile }}
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
|
||||
@@ -245,6 +255,7 @@ jobs:
|
||||
BACKEND=${{ inputs.backend }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
|
||||
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
|
||||
context: ${{ inputs.context }}
|
||||
file: ${{ inputs.dockerfile }}
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
|
||||
|
||||
@@ -19,6 +19,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
|
||||
|------|-------------|
|
||||
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
|
||||
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
|
||||
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
|
||||
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
|
||||
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
|
||||
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
|
||||
|
||||
@@ -203,6 +203,13 @@ COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
|
||||
ARG FROM_SOURCE=""
|
||||
ENV FROM_SOURCE=${FROM_SOURCE}
|
||||
|
||||
# Cache-buster for the per-backend `make` step. Most Python backends list
|
||||
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
|
||||
# would otherwise freeze upstream versions indefinitely. CI passes a value
|
||||
# that rolls weekly so the install layer is rebuilt at most once per week
|
||||
# and picks up newer wheels from PyPI / nightly indexes.
|
||||
ARG DEPS_REFRESH=initial
|
||||
|
||||
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
|
||||
|
||||
# Package GPU libraries into the backend's lib directory
|
||||
|
||||
Reference in New Issue
Block a user