mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-18 05:33:09 -04:00
c2fe0a647500a52303c74c812cb095966d360695
25 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
86a7f6c9fa |
ci: close GC race + cascade-skip + darwin grpc gaps from v4.2.1 (#9781)
* ci: close the GC race + cascade-skip + darwin grpc gaps from v4.2.1
v4.2.1's backend.yml run (#25701862853) exposed three independent issues
on top of the singletons fix shipped in
|
||
|
|
9be5310394 |
chore(deps): bump actions/upload-artifact from 4 to 7 (#9770)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
ea00199554 |
ci: tag every backend digest, including singletons
backend_build.yml pushes by canonical digest only (push-by-digest=true, no tags applied at build time). User-facing tagging happens in backend_merge.yml's `imagetools create` step. Before this commit, scripts/changed-backends.js emitted a merge entry only for tag-suffixes with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python images, vLLM, sglang, transformers, diffusers, ...) pushed its digest untagged and stayed that way until quay's GC reaped it. Symptom: tag releases shipped multi-arch backends tagged correctly, but no v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared in the registry. Changes: - scripts/changed-backends.js drops the `group.length < 2` skip and emits two merge matrices, one per arch class, so each downstream merge job can `needs:` only its corresponding build matrix. - backend.yml splits backend-merge-jobs into multiarch and singlearch variants. The split preserves PR #9746's fix: slow singlearch CUDA builds (~6h) must not gate multiarch merges, or quay's GC reaps the multiarch per-arch digests before they're tagged. - backend_pr.yml mirrors the split. - backend_build.yml renames the digest artifact from `digests<suffix>-<platform-tag>` to `digests<suffix>--<platform-tag-or-"single">`. The `--` separator prevents the merge-side glob from over-matching sibling backends whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs -cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder keeps the name well-formed when platform-tag is empty. - backend_merge.yml updates the download pattern to match. Verified locally: a tag-push event now expands to 36 multiarch merge entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one per singleton, including -gpu-nvidia-cuda-12-vllm at index 24). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
593f3a8648 |
ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738)
* ci(backend_build): plumb builder-base-image and BUILDER_TARGET build-args Adds an optional builder-base-image input. When set, BUILDER_BASE_IMAGE is forwarded as a build-arg AND BUILDER_TARGET=builder-prebuilt is set to select the variant Dockerfile's prebuilt-base stage. When empty, BUILDER_TARGET=builder-fromsource (the default) keeps the existing from-source build path. This makes the prebuilt-base optimization opt-in per matrix entry without breaking local `make backends/<name>` invocations or backends whose Dockerfile doesn't have a prebuilt path. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(llama-cpp,ik-llama-cpp,turboquant): multi-target Dockerfiles for prebuilt + from-source Restructure the three llama.cpp-derived Dockerfiles so each supports two builder paths in a single file, selected via the BUILDER_TARGET build-arg: BUILDER_TARGET=builder-fromsource (default) - Standalone build: gRPC stage + apt installs + (conditionally) CUDA/ROCm/Vulkan + compile. - Used by `make backends/llama-cpp` locally and any caller that doesn't supply a prebuilt base. BUILDER_TARGET=builder-prebuilt - FROM \${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache: base-grpc-* shipped in PR #9737). - Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs. - Used by CI when the matrix entry sets builder-base-image. Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage (BuildKit doesn't support variable expansion directly in COPY --from), then COPY --from=builder pulls package output from the chosen path. BuildKit prunes the unreferenced builder, so each build only does the work for the chosen path. The compile RUN is identical between both builder stages, so it's factored into .docker/<name>-compile.sh and bind-mounted into both. ccache mount + cache-id stay per-arch / per-build-type. Local DX preserved: `make backends/llama-cpp` (no extra args) defaults to BUILDER_TARGET=builder-fromsource and works exactly as before. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend.yml,backend_pr.yml): forward builder-base-image from matrix Plumbs the new optional builder-base-image input from matrix into backend_build.yml. backend_build.yml derives BUILDER_TARGET from whether builder-base-image is set, so matrix entries that map to a prebuilt base get the prebuilt path; entries that don't (python/go/ rust backends) fall through to the default builder-fromsource (which their own Dockerfiles don't reference, so it's a no-op for them). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend-matrix): wire builder-base-image to llama-cpp variants For every entry whose Dockerfile is llama-cpp/ik-llama-cpp/turboquant, add a builder-base-image field pointing at the appropriate prebuilt quay.io/go-skynet/ci-cache:base-grpc-* tag. backend_build.yml derives BUILDER_TARGET from this field's presence: non-empty -> builder-prebuilt; empty -> builder-fromsource. So this commit alone activates the prebuilt-base path for these 23 backends in CI, while local `make backends/<name>` (no extra args) keeps the from-source path. Mapping by (build-type, arch): - '' / amd64 -> base-grpc-amd64 - '' / arm64 -> base-grpc-arm64 - cublas-12 / amd64 -> base-grpc-cuda-12-amd64 - cublas-13 / amd64 -> base-grpc-cuda-13-amd64 - cublas-13 / arm64 -> base-grpc-cuda-13-arm64 - hipblas / amd64 -> base-grpc-rocm-amd64 - vulkan / amd64 -> base-grpc-vulkan-amd64 - vulkan / arm64 -> base-grpc-vulkan-arm64 - sycl_* / amd64 -> base-grpc-intel-amd64 - cublas-12 + JetPack r36.4.0 / arm64 -> base-grpc-l4t-cuda-12-arm64 Cold-build savings expected: ~25-35 min per variant (skips the gRPC compile + toolchain install that's now in the base). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: add base-grpc-l4t-cuda-12-arm64 variant for legacy JetPack entries Two matrix entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64- turboquant) build against nvcr.io/nvidia/l4t-jetpack:r36.4.0 + CUDA 12 ARM64. They're distinct from -nvidia-l4t-cuda-13-arm64-* which use Ubuntu 24.04 + CUDA 13 sbsa. Add the missing JetPack-based variant to base-images.yml so those two entries' builder-base-image mapping in the previous commit resolves. Bootstrap order before merging this PR (re-run base-images.yml on this branch — 9 existing variants hit BuildKit cache, only the new l4t-cuda-12-arm64 builds cold): gh workflow run base-images.yml --ref ci/base-images-consumers Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: extract base-builder install logic into .docker/install-base-deps.sh Pre-extraction, the apt + protoc + cmake + conditional CUDA/ROCm/Vulkan + gRPC install logic was duplicated across four files: - backend/Dockerfile.base-grpc-builder (CI prebuilt-base source of truth) - backend/Dockerfile.llama-cpp (builder-fromsource stage) - backend/Dockerfile.ik-llama-cpp (builder-fromsource stage) - backend/Dockerfile.turboquant (builder-fromsource stage) A bump to e.g. CUDA toolkit packages had to be made in 4 places, and drift between the prebuilt base and the variant-Dockerfile from-source path was a real concern (ik-llama-cpp's hipblas branch was already missing the rocBLAS Kernels echo that llama-cpp / turboquant / base-grpc-builder all had). Factor the install logic into a single .docker/install-base-deps.sh that reads its inputs from env vars and runs conditionally on BUILD_TYPE / CUDA_*_VERSION / TARGETARCH. Each Dockerfile now bind- mounts the script alongside .docker/apt-mirror.sh and invokes it from a single RUN step. The variant Dockerfiles' grpc-source stage is removed entirely — the script handles gRPC compile + install at /opt/grpc, and the builder-fromsource stage mirrors builder-prebuilt by copying /opt/grpc/. to /usr/local/. Result: - install-base-deps.sh: 244 lines (one source of truth) - Dockerfile.base-grpc-builder: 268 -> 98 lines - Dockerfile.llama-cpp: 361 -> 157 lines - Dockerfile.ik-llama-cpp: 348 -> 151 lines - Dockerfile.turboquant: 355 -> 154 lines - Total Dockerfile bytes: 1332 -> 560 lines (58% reduction) Bit-equivalence between prebuilt and from-source paths is now enforced by construction: both invoke the same script with the same inputs. A side-effect is that ik-llama-cpp now also gets the rocBLAS Kernels echo + clblas block parity it was previously missing. Includes the BUILD_TYPE=clblas branch (libclblast-dev) for parity even though no current CI matrix entry uses it. After this commit's force-push, base-images.yml needs to be redispatched on this branch — the Dockerfile.base-grpc-builder content shifts so the existing cache won't apply for the install layer (gRPC layer also rebuilds since it's now in the same RUN step). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(base-images): skip-drivers on JetPack l4t variant cuda-nvcc-12-0 isn't installable via apt on the JetPack r36.4.0 base image — JetPack ships CUDA preinstalled at /usr/local/cuda and its apt feed doesn't carry the cuda-nvcc-* packages from the public repositories. The original matrix entry for -nvidia-l4t-arm64-llama-cpp on master sets skip-drivers: 'true' for exactly this reason; the new base-grpc-l4t-cuda-12-arm64 base needs to match. Also forwards SKIP_DRIVERS as a build-arg from matrix into the build (was missing entirely before this commit). Caught by run 25612030775 — l4t-cuda-12-arm64 failed at: E: Package 'cuda-nvcc-12-0' has no installation candidate Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
f0374aa0e8 |
ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) (#9730)
* ci: add per-arch + manifest-merge support for LocalAI server image Mirror the backend_build.yml + backend_merge.yml pattern shipped in PR #9726 for the LocalAI server image: - image_build.yml accepts optional platform-tag (default ''), scopes registry cache to cache-localai<suffix>-<platform-tag>, and pushes by canonical digest only on push events. Digests upload as artifacts named digests-localai<suffix>-<platform-tag>, with a "-core" placeholder when tag-suffix is empty so the merge job's download pattern doesn't over-match across multiple suffixes. - image_merge.yml is a new reusable workflow that downloads matching digest artifacts and assembles the final tagged manifest list via docker buildx imagetools create. Image names differ from backend_*.yml: the LocalAI server is published under quay.io/go-skynet/local-ai and localai/localai (not -backends). Not yet wired into image.yml / image-pr.yml — Commit C does that. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: fan out per-arch split to remaining 34 backends Convert all remaining linux/amd64,linux/arm64 entries in backend-matrix.yml to per-arch + manifest-merge form. Each was a single matrix entry running both arches on x86 under QEMU emulation; each becomes two entries — amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). Four backends that were on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have both legs moved to free tier as part of the same change. They are compile-only (no torch/CUDA install) and fit comfortably with the setup-build-disk /mnt relocation. Phase 4 (next commit) retires the remaining 5 single-arch bigger-runner entries. After this commit: - 271 total matrix entries (was 237) - 0 multi-arch entries left - 36 per-arch pairs (34 new + 2 pilots from PR #9727) - 5 bigger-runner entries remaining (single-arch, Phase 4 target) Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: split LocalAI image multi-arch entries per arch + merge Mirror the backend per-arch split for the main LocalAI image: - image.yml's core-image-build matrix: split the core ('') and -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). - New top-level core-image-merge and gpu-vulkan-image-merge jobs call image_merge.yml after core-image-build completes. - image-pr.yml's image-build matrix: split the -vulkan-core entry. No merge job added on the PR side — image_build.yml's digest-push is push-only-event-gated, so a PR-side merge would have nothing to download. After this commit, no workflow file references linux/amd64,linux/arm64 in a single matrix slot. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: retire bigger-runner from backend matrix (Phase 4) Migrate the remaining 5 single-arch bigger-runner entries to ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of working space — enough for ROCm dev image (~16 GB), CUDA toolkit (~5 GB), and the per-backend compile/install steps these entries do. Backends migrated: - -gpu-nvidia-cuda-12-llama-cpp - -gpu-nvidia-cuda-12-turboquant - -gpu-rocm-hipblas-faster-whisper - -gpu-rocm-hipblas-coqui - -cpu-ik-llama-cpp After this commit, .github/backend-matrix.yml has zero bigger-runner references. The bigger-runner used in tests-vibevoice-cpp-grpc- transcription (test-extra.yml) is a separate concern handled in a follow-up. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1) Intel oneAPI base image is ~6 GB; each backend's wheel install stays well within the ~100 GB working space provided by Phase 3's setup-build-disk /mnt relocation. Lowest-risk batch of the arc-runner-set retirement. Backends migrated: vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts, fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 15 ROCm Python backends to free tier (Phase 5.2) ROCm dev image (~16 GB) plus per-backend torch/wheels install fits on ubuntu-latest with the /mnt-relocated Docker root. These entries include the heavier vLLM/sglang/transformers/diffusers stack on ROCm; if any specific backend OOMs or runs out of disk, individual flips back to arc-runner-set are revertable per-entry. Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/ ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/ voxcpm/pocket-tts/neutts). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 6 CUDA Python backends to free tier (Phase 5.3) vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest backends in the matrix — flash-attn intermediate layers can spike disk usage during build. setup-build-disk's /mnt relocation gives ~100 GB working space which fits the documented peak. Highest-risk batch of the arc-runner-set retirement; if any backend fails to build on free tier, the per-entry runs-on flip is the unit of revert. Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}. After this commit, .github/backend-matrix.yml has zero references to arc-runner-set or bigger-runner. The migration is complete. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: disable provenance on multi-registry digest pushes Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7 pushes a single build to TWO registries simultaneously with push-by-digest=true, buildx generates a per-registry provenance attestation manifest (because mode=max — the default for push:true — includes the runner ID). That makes the resulting manifest-list digest diverge across registries: arm64 -cpu-faster-whisper build: image manifest: sha256:d3bdd34b... (identical, content-only) quay manifest list: sha256:66b4cfc8... (with quay attestation) dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation) steps.build.outputs.digest returns only one of the list digests (empirically the dockerhub one). The merge job then asks "quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that list has digest 66b4cfc8 there. Result: imagetools create fails with "not found" and the merge job fails (run 25581983094, job 75110021491). Setting provenance: false drops the per-registry attestation; the manifest-list digest becomes pure content, identical across both registries, and steps.build.outputs.digest works on either lookup. Applied to backend_build.yml and image_build.yml — both refactored to use the same multi-registry digest-push pattern in the prior PRs. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
1f313cfdb0 |
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
* ci: extract free-disk-space composite action Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from backend_build.yml, image_build.yml, and test.yml into a single composite action. The three callers had slightly different inline blocks; the composite uses the more aggressive backend_build/image_build variant for all three callers — test.yml jobs now also purge snapd, edge/firefox/ powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost + $AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners. In test.yml, actions/checkout now runs before the composite action call because the composite lives at ./.github/actions/free-disk-space and requires a checked-out repo. The original ordering relied on jlumbroso/free-disk-space@main being a remote action; this is the minimum-invasive change to support a local composite. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: path-filter backend.yml master push Run scripts/changed-backends.js on master pushes too (not just PRs) so unrelated commits don't rebuild all ~210 backend container images. Tag pushes still build the full matrix via FORCE_ALL. Push events use the GitHub Compare API to diff event.before..event.after. Edge cases (first push with zero base, API truncation beyond 300 files, missing fields, network failure) fall back to "run everything" — better safe than silently miss a backend. The matrix literal moves from .github/workflows/backend.yml into a new data-only file at .github/backend-matrix.yml (outside workflows/ so actionlint doesn't try to parse it as a workflow). Both backend.yml and backend_pr.yml now consume the dynamic matrix output uniformly via fromJson(needs.generate-matrix.outputs.matrix); the script reads the matrix from the new location. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: bound max-parallel on backend-jobs matrices Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free pool while migration is in flight. Lift after Phases 4-5 retire the self-hosted runners. Also drops a leftover commented-out max-parallel line that lived in backend.yml since the previous matrix shape. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: scope backend cache per arch, push by digest Prepare backend_build.yml for the multi-arch split. The reusable workflow now accepts a `platform-tag` input ("amd64" / "arm64") that scopes the registry cache to cache<suffix>-<platform-tag> and (on push events) pushes the resulting image by canonical digest only. Digests are uploaded as artifacts named digests<suffix>-<platform-tag> for the merge job (Task 2.2) to consume. `platform-tag` is optional with empty default during the migration — existing callers continue to work unchanged (their cache key just becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will update callers to pass an explicit "amd64" / "arm64" value. Phase 6 flips the input to required: true once every caller is wired. PR builds keep their existing tag-based push to ci-tests but pick up the per-arch cache key. Multi-arch PR builds remain emulated in this commit; they migrate when the matrix entries split (Tasks 2.3+). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: add backend_merge.yml reusable workflow Joins per-arch digest artifacts (uploaded by backend_build.yml when called with platform-tag) into a single tagged multi-arch manifest list via `docker buildx imagetools create`. Called once per backend by backend.yml after both per-arch build jobs succeed. The workflow generates final tags identically to the previous monolithic build job (same docker/metadata-action invocation), so consumers of quay.io/go-skynet/local-ai-backends and localai/localai-backends see no tag-shape change. Two imagetools calls (one per registry) reference the same per-arch digests under different image names. Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix entries to expand into per-arch + merge jobs that call this workflow. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: relocate Docker data-root to /mnt on hosted runners GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and restarting with data-root pointing there yields ~100 GB of working space (combined with the apt-clean from Task 1.1) — enough for ROCm dev image + vLLM torch install + flash-attn intermediate layers. This is the structural change that lets Phases 4 and 5 of the migration plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest. The composite action is no-op on self-hosted runners (where /mnt isn't expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted pool's /mnt shape separately before enabling). Wired into both backend_build.yml and image_build.yml between free-disk-space and the first Docker operation. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(setup-build-disk): chmod 1777 /mnt/docker-tmp buildx CLI runs as the unprivileged 'runner' user and creates config dirs under TMPDIR before binding them into the buildkit container. /mnt is root-owned by default, so the original mkdir produced a permission-denied when buildx tried to write there: ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied Mirror /tmp's permission mode (1777 — world-writable with sticky bit) on /mnt/docker-tmp so non-root processes can stage their config. Caught by the first PR run (image-build hipblas job) on PR #9726. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: weekly full-matrix rebuild via cron Path-filtering backend.yml master push (the previous commit's main optimization) skips backends whose source didn't change. That broke the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on %Y-W%V busts the install layer's cache on a new ISO week, but only when the build actually runs. Untouched Python backends (torch, transformers, vllm with no version pin) would otherwise ship stale wheels indefinitely. Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule events have no event.ref / event.before, so the script's changedFiles == null fallback (scripts/changed-backends.js) emits the full matrix automatically — no script change needed. C++/Go backends with pinned deps cache-hit and complete fast, so the weekly cost is dominated by Python re-resolves which is exactly what we want. workflow_dispatch added so a maintainer can trigger an ad-hoc full-matrix rebuild without faking a tag push. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
50580a84ae |
fix(ci): switch apt mirror per runner — azure on github-hosted, kernel.org on self-hosted
Self-hosted runners (arc-runner-set, bigger-runner) cannot reach
azure.archive.ubuntu.com — they live in different networks (e.g. our
arc-runner-set Kubernetes cluster) where Azure's mirror IP is not
routable. Symptom: "Connection failed [IP: 51.11.236.225 80]" with each
Ign:/Err: cycle taking 60s, hanging the build for ~16 minutes before
exit 100.
Pick the mirror based on `runner.environment`:
* github-hosted (ubuntu-latest, ubuntu-24.04-arm) → Azure
(http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
— same VPC as the runner.
* self-hosted (arc-runner-set, bigger-runner) → kernel.org
(https://mirrors.edge.kernel.org for both archive and ports)
— publicly reachable from any network.
The choice now lives in one place: the .github/actions/configure-apt-mirror
composite action exposes `effective-mirror` / `effective-ports-mirror`
outputs so the reusable workflows can forward the same value as Docker
build-args without duplicating the per-runner-environment branch.
The now-redundant `apt-mirror` / `apt-ports-mirror` workflow inputs on
image_build.yml and backend_build.yml are dropped — defaults live in the
composite action and are visible there.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
8edac61e57 |
feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650)
* feat(ci): allow routing apt traffic through an alternate Ubuntu mirror
Adds opt-in APT_MIRROR / APT_PORTS_MIRROR knobs to all Dockerfiles, the
Makefile, and CI workflows so we can fail over to a non-canonical Ubuntu
mirror when archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com
are degraded (recently observed: multi-day DDoS against the default pool).
Defaults are empty everywhere — behavior is unchanged unless a mirror is
configured. To enable in CI, set the repo-level GitHub Actions variables
APT_MIRROR (and APT_PORTS_MIRROR for arm64 builds). Locally:
make docker APT_MIRROR=http://azure.archive.ubuntu.com
A small POSIX-sh helper in .docker/apt-mirror.sh rewrites both DEB822
(/etc/apt/sources.list.d/ubuntu.sources, Ubuntu 24.04+) and the legacy
/etc/apt/sources.list before the first apt-get update. Dockerfile stages
load it via RUN --mount=type=bind, so there is no extra layer and no
cache invalidation when the script is unchanged. Reusable workflows also
rewrite the runner's own /etc/apt sources before any sudo apt-get call.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(apt-mirror): default to the Azure mirror, visible in the workflow source
Bakes Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
in as the default for both Docker builds and runner-side apt — rather than
hiding the URL behind a GitHub Actions repo variable that's not visible
from the source tree.
A new composite action at .github/actions/configure-apt-mirror is the
single source of truth for runner-side rewrites. Five standalone
workflows (build-test, release, tests-e2e, tests-ui-e2e, update_swagger)
just `uses: ./.github/actions/configure-apt-mirror`.
Three workflows (image_build, backend_build, checksum_checker) keep an
inline bash rewrite, because they install/upgrade git via apt *before*
the checkout step (so the local composite action isn't loadable yet).
The Azure URL is visible in those files too.
The `apt-mirror` / `apt-ports-mirror` inputs of the reusable workflows
keep their now-Azure defaults — they still feed the Docker build-args
block in addition to the inline runner-side rewrite. Callers (image.yml,
image-pr.yml, backend.yml, backend_pr.yml) drop the previous
`vars.APT_MIRROR` plumbing and rely on those defaults.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(apt-mirror): drop Force Install GIT, consolidate on the composite action
The PPA git upgrade ran add-apt-repository ppa:git-core/ppa, which talks
to api.launchpad.net — also part of Canonical's infrastructure and
currently returning HTTP 504. The Azure mirror only covers
archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com, not PPAs.
The system git that ubuntu-latest already ships is sufficient for
actions/checkout and the build pipeline, so just drop the upgrade. With
that gone, the apt-before-checkout constraint disappears too — all three
holdouts (image_build, backend_build, checksum_checker) can now switch
to ./.github/actions/configure-apt-mirror like the other five.
Net: 0 inline apt-mirror blocks, all 8 workflows route through the
composite action.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
18e039f305 |
fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds (#9626)
* fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds
|
||
|
|
f4036fa83f |
ci(python-backends): add weekly DEPS_REFRESH cache-buster
The shared backend/Dockerfile.python ends in:
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.
DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.
Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.
Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
|
||
|
|
bdfa5e934a |
ci: switch image/backend build cache to a dedicated registry image
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
from the unused gha cache to type=registry pointing at
quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
ignore-error=true. Master/tag builds populate their own
per-matrix-entry cache; PR builds read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
Dockerfile stage that was removed by
|
||
|
|
181ebb6df4 |
feat: voice recognition (#9500)
* feat(voice-recognition): add /v1/voice/{verify,analyze,embed} + speaker-recognition backend
Audio analog to face recognition. Adds three gRPC RPCs
(VoiceVerify / VoiceAnalyze / VoiceEmbed), their Go service and HTTP
layers, a new FLAG_SPEAKER_RECOGNITION capability flag, and a Python
backend scaffold under backend/python/speaker-recognition/ wrapping
SpeechBrain ECAPA-TDNN with a parallel OnnxDirectEngine for
WeSpeaker / 3D-Speaker ONNX exports.
The kokoros Rust backend gets matching unimplemented trait stubs —
tonic's async_trait has no defaults, so adding an RPC without Rust
stubs breaks the build (same regression fixed by
|
||
|
|
c66c41e8d7 |
fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445)
Commit
|
||
|
|
01bd3d8212 |
chore(deps): bump docker/login-action from 3 to 4 (#8918)
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
7f11f66b44 |
chore(deps): bump docker/build-push-action from 6 to 7 (#8919)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
2a351e1f0c |
chore(deps): bump docker/metadata-action from 5 to 6 (#8917)
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](https://github.com/docker/metadata-action/compare/v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
8dfeea2f55 |
fix: use ubuntu 24.04 for cuda13 l4t images (#7418)
* fix: use ubuntu 24.04 for cuda13 l4t images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop openblas from containers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
91248da09e |
chore(deps): bump actions/checkout from 5 to 6 (#7339)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v5...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
ebd1db2f09 |
chore(ci): Build modified backends on PR (#6086)
* chore(stablediffusion-ggml): rm redundant comment Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(ci): Build modified backends on PR Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com> |
||
|
|
0ca1765c17 |
chore(deps): bump actions/checkout from 4 to 5 (#6014)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
580687da46 |
feat: remove stablediffusion-ggml from main binary (#5861)
* feat: split stablediffusion-ggml from main binary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt ci tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to support nvidial4t Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Latest fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
294f7022f3 |
feat: do not bundle llama-cpp anymore (#5790)
* Build llama.cpp separately Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Start to try to attach some tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add git and small fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: correctly autoload external backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run AIO tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Slightly update the Makefile helps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt auto-bumper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run linux test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add llama-cpp into build pipelines Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add default capability (for cpu) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop llama-cpp specific logic from the backend loader Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * drop grpc install in ci for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Pass by backends path for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build protogen at start Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(tests): set backends path consistently Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Correctly configure the backends path Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to build for darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Compile for metal on arm64/darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run build off from cross-arch Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add to the backend index nvidia-l4t and cpu's llama-cpp backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build also darwin-x86 for llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable arm64 builds temporary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test backend build on PR Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixup build backend reusable workflow Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * pass by skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use crane Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * x86 darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add packaging step for llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix leftover from bark-cpp extraction Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to fix hipblas build Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
cfc9dfa3d5 |
fix(ci): better handling of latest images for backends (#5735)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
be3ff482d0 |
chore(ci): try to optimize disk space when tagging latest (#5695)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
1e1f0ee321 |
chore(backends): move bark-cpp to the backend gallery (#5682)
chore(bark-cpp): move outside from binary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |