LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-18 05:33:09 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	86a7f6c9fa	ci: close GC race + cascade-skip + darwin grpc gaps from v4.2.1 (#9781 ) * ci: close the GC race + cascade-skip + darwin grpc gaps from v4.2.1 v4.2.1's backend.yml run (#25701862853) exposed three independent issues on top of the singletons fix shipped in `ea001995`. Address all three plus two related cleanups: 1. quay GC race in backend-merge-jobs-multiarch (12/37 merges failed with "manifest not found"). Even after PR #9746 split multi/single-arch merges, the multiarch matrix itself takes ~2h to drain at max-parallel: 8, and the earliest per-arch digests (push-by-digest, no tag) get reaped by quay's GC before the merge runs. The split bounded the race for multiarch; it doesn't eliminate it. Anchor each per-arch digest immediately to a tag in the internal ci-cache image (`keepalive-<run_id><tag-suffix>-<platform-tag>`). Quay won't GC tagged manifests. backend_merge.yml deletes the keepalive tags via quay REST API after publishing the user-facing manifest list. Cleanup is best-effort: if the quay token is not OAuth-scoped the merge does NOT fail, the orphan tags just persist. 2. cascade-skip on backend-merge-jobs-singlearch. v4.2.1 had 2 failed and 2 cancelled singlearch builds (out of 199); GHA's default `needs:` semantics cascade-skipped the entire singlearch merge matrix, so zero singleton tags were applied even though 197 singletons built successfully. Wrap the merge `if:` in `!cancelled() && ...` for both multi and single arch in backend.yml and backend_pr.yml so partial build failures publish the successful tag-suffixes. 3. Darwin llama-cpp grpc-server build fails with `find_package(absl)` not found. Same shape as the ccache/blake3/fmt/hiredis/xxhash/zstd fix already in `Dependencies`: a brew cache hit restores `/opt/homebrew/Cellar/grpc` so `brew install grpc` no-ops, but abseil isn't in our Cellar cache list and never gets installed alongside, leaving grpc's CMake unable to resolve it. Mirror the `brew reinstall ccache` line with `brew reinstall grpc` to re-validate grpc's full transitive dep closure on every cache-hit run. 4. Move the four heaviest CUDA cpp builds back to bigger-runner. v4.2.1 wall-clock: -gpu-nvidia-cuda-12-llama-cpp 5h36m, -gpu-nvidia-cuda-12-turboquant 6h05m, -gpu-nvidia-cuda-13-llama-cpp 5h37m, -gpu-nvidia-cuda-13-turboquant 6h05m. The cuda-12 turboquant and cuda-13 turboquant entries are over GHA's 6h job timeout. Phase 5.3 of the free-tier migration (PR #9730) had explicitly flagged this batch as 'highest-risk' with a per-entry revert path. All other matrix entries (vulkan-llama-cpp ~47m, ROCm hipblas-llama-cpp ~2h, intel sycl-f32 ~1h49m) stay on free-tier ubuntu-latest. Verified locally: all six edited workflow YAMLs parse cleanly. Real verification has to come from the next tag release run. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: extract keepalive anchor + cleanup into .github/scripts/ The two inline shell blocks from the previous commit are long enough to hurt readability of the workflow YAML and benefit from their own files with self-contained docs. Move them to .github/scripts/: anchor-digest-in-cache.sh backend_build.yml's keepalive anchor cleanup-keepalive-tags.sh backend_merge.yml's best-effort cleanup Workflow steps reduce to a single `run:` invocation each, with all the parameter plumbing handled by env vars on the step. backend_merge.yml also gains a sparse `actions/checkout@v6` step (sparse to .github/scripts only) so the cleanup script is available on the runner — backend_build already checks out for the docker build. Net workflow diff: -36 lines across the two files. Script logic and behavior are byte-identical to the inline version. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-12 17:22:09 +02:00
dependabot[bot]	9be5310394	chore(deps): bump actions/upload-artifact from 4 to 7 (#9770 ) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-12 09:20:03 +02:00
Ettore Di Giacinto	ea00199554	ci: tag every backend digest, including singletons backend_build.yml pushes by canonical digest only (push-by-digest=true, no tags applied at build time). User-facing tagging happens in backend_merge.yml's `imagetools create` step. Before this commit, scripts/changed-backends.js emitted a merge entry only for tag-suffixes with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python images, vLLM, sglang, transformers, diffusers, ...) pushed its digest untagged and stayed that way until quay's GC reaped it. Symptom: tag releases shipped multi-arch backends tagged correctly, but no v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared in the registry. Changes: - scripts/changed-backends.js drops the `group.length < 2` skip and emits two merge matrices, one per arch class, so each downstream merge job can `needs:` only its corresponding build matrix. - backend.yml splits backend-merge-jobs into multiarch and singlearch variants. The split preserves PR #9746's fix: slow singlearch CUDA builds (~6h) must not gate multiarch merges, or quay's GC reaps the multiarch per-arch digests before they're tagged. - backend_pr.yml mirrors the split. - backend_build.yml renames the digest artifact from `digests<suffix>-<platform-tag>` to `digests<suffix>--<platform-tag-or-"single">`. The `--` separator prevents the merge-side glob from over-matching sibling backends whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs -cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder keeps the name well-formed when platform-tag is empty. - backend_merge.yml updates the download pattern to match. Verified locally: a tag-push event now expands to 36 multiarch merge entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one per singleton, including -gpu-nvidia-cuda-12-vllm at index 24). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-11 13:22:00 +00:00
LocalAI [bot]	593f3a8648	ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738 ) * ci(backend_build): plumb builder-base-image and BUILDER_TARGET build-args Adds an optional builder-base-image input. When set, BUILDER_BASE_IMAGE is forwarded as a build-arg AND BUILDER_TARGET=builder-prebuilt is set to select the variant Dockerfile's prebuilt-base stage. When empty, BUILDER_TARGET=builder-fromsource (the default) keeps the existing from-source build path. This makes the prebuilt-base optimization opt-in per matrix entry without breaking local `make backends/<name>` invocations or backends whose Dockerfile doesn't have a prebuilt path. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(llama-cpp,ik-llama-cpp,turboquant): multi-target Dockerfiles for prebuilt + from-source Restructure the three llama.cpp-derived Dockerfiles so each supports two builder paths in a single file, selected via the BUILDER_TARGET build-arg: BUILDER_TARGET=builder-fromsource (default) - Standalone build: gRPC stage + apt installs + (conditionally) CUDA/ROCm/Vulkan + compile. - Used by `make backends/llama-cpp` locally and any caller that doesn't supply a prebuilt base. BUILDER_TARGET=builder-prebuilt - FROM \${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache: base-grpc-* shipped in PR #9737). - Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs. - Used by CI when the matrix entry sets builder-base-image. Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage (BuildKit doesn't support variable expansion directly in COPY --from), then COPY --from=builder pulls package output from the chosen path. BuildKit prunes the unreferenced builder, so each build only does the work for the chosen path. The compile RUN is identical between both builder stages, so it's factored into .docker/<name>-compile.sh and bind-mounted into both. ccache mount + cache-id stay per-arch / per-build-type. Local DX preserved: `make backends/llama-cpp` (no extra args) defaults to BUILDER_TARGET=builder-fromsource and works exactly as before. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend.yml,backend_pr.yml): forward builder-base-image from matrix Plumbs the new optional builder-base-image input from matrix into backend_build.yml. backend_build.yml derives BUILDER_TARGET from whether builder-base-image is set, so matrix entries that map to a prebuilt base get the prebuilt path; entries that don't (python/go/ rust backends) fall through to the default builder-fromsource (which their own Dockerfiles don't reference, so it's a no-op for them). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend-matrix): wire builder-base-image to llama-cpp variants For every entry whose Dockerfile is llama-cpp/ik-llama-cpp/turboquant, add a builder-base-image field pointing at the appropriate prebuilt quay.io/go-skynet/ci-cache:base-grpc-* tag. backend_build.yml derives BUILDER_TARGET from this field's presence: non-empty -> builder-prebuilt; empty -> builder-fromsource. So this commit alone activates the prebuilt-base path for these 23 backends in CI, while local `make backends/<name>` (no extra args) keeps the from-source path. Mapping by (build-type, arch): - '' / amd64 -> base-grpc-amd64 - '' / arm64 -> base-grpc-arm64 - cublas-12 / amd64 -> base-grpc-cuda-12-amd64 - cublas-13 / amd64 -> base-grpc-cuda-13-amd64 - cublas-13 / arm64 -> base-grpc-cuda-13-arm64 - hipblas / amd64 -> base-grpc-rocm-amd64 - vulkan / amd64 -> base-grpc-vulkan-amd64 - vulkan / arm64 -> base-grpc-vulkan-arm64 - sycl_* / amd64 -> base-grpc-intel-amd64 - cublas-12 + JetPack r36.4.0 / arm64 -> base-grpc-l4t-cuda-12-arm64 Cold-build savings expected: ~25-35 min per variant (skips the gRPC compile + toolchain install that's now in the base). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: add base-grpc-l4t-cuda-12-arm64 variant for legacy JetPack entries Two matrix entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64- turboquant) build against nvcr.io/nvidia/l4t-jetpack:r36.4.0 + CUDA 12 ARM64. They're distinct from -nvidia-l4t-cuda-13-arm64-* which use Ubuntu 24.04 + CUDA 13 sbsa. Add the missing JetPack-based variant to base-images.yml so those two entries' builder-base-image mapping in the previous commit resolves. Bootstrap order before merging this PR (re-run base-images.yml on this branch — 9 existing variants hit BuildKit cache, only the new l4t-cuda-12-arm64 builds cold): gh workflow run base-images.yml --ref ci/base-images-consumers Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: extract base-builder install logic into .docker/install-base-deps.sh Pre-extraction, the apt + protoc + cmake + conditional CUDA/ROCm/Vulkan + gRPC install logic was duplicated across four files: - backend/Dockerfile.base-grpc-builder (CI prebuilt-base source of truth) - backend/Dockerfile.llama-cpp (builder-fromsource stage) - backend/Dockerfile.ik-llama-cpp (builder-fromsource stage) - backend/Dockerfile.turboquant (builder-fromsource stage) A bump to e.g. CUDA toolkit packages had to be made in 4 places, and drift between the prebuilt base and the variant-Dockerfile from-source path was a real concern (ik-llama-cpp's hipblas branch was already missing the rocBLAS Kernels echo that llama-cpp / turboquant / base-grpc-builder all had). Factor the install logic into a single .docker/install-base-deps.sh that reads its inputs from env vars and runs conditionally on BUILD_TYPE / CUDA__VERSION / TARGETARCH. Each Dockerfile now bind- mounts the script alongside .docker/apt-mirror.sh and invokes it from a single RUN step. The variant Dockerfiles' grpc-source stage is removed entirely — the script handles gRPC compile + install at /opt/grpc, and the builder-fromsource stage mirrors builder-prebuilt by copying /opt/grpc/. to /usr/local/. Result: - install-base-deps.sh: 244 lines (one source of truth) - Dockerfile.base-grpc-builder: 268 -> 98 lines - Dockerfile.llama-cpp: 361 -> 157 lines - Dockerfile.ik-llama-cpp: 348 -> 151 lines - Dockerfile.turboquant: 355 -> 154 lines - Total Dockerfile bytes: 1332 -> 560 lines (58% reduction) Bit-equivalence between prebuilt and from-source paths is now enforced by construction: both invoke the same script with the same inputs. A side-effect is that ik-llama-cpp now also gets the rocBLAS Kernels echo + clblas block parity it was previously missing. Includes the BUILD_TYPE=clblas branch (libclblast-dev) for parity even though no current CI matrix entry uses it. After this commit's force-push, base-images.yml needs to be redispatched on this branch — the Dockerfile.base-grpc-builder content shifts so the existing cache won't apply for the install layer (gRPC layer also rebuilds since it's now in the same RUN step). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> ci(base-images): skip-drivers on JetPack l4t variant cuda-nvcc-12-0 isn't installable via apt on the JetPack r36.4.0 base image — JetPack ships CUDA preinstalled at /usr/local/cuda and its apt feed doesn't carry the cuda-nvcc-* packages from the public repositories. The original matrix entry for -nvidia-l4t-arm64-llama-cpp on master sets skip-drivers: 'true' for exactly this reason; the new base-grpc-l4t-cuda-12-arm64 base needs to match. Also forwards SKIP_DRIVERS as a build-arg from matrix into the build (was missing entirely before this commit). Caught by run 25612030775 — l4t-cuda-12-arm64 failed at: E: Package 'cuda-nvcc-12-0' has no installation candidate Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-10 00:03:52 +02:00
LocalAI [bot]	f0374aa0e8	ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) (#9730 ) * ci: add per-arch + manifest-merge support for LocalAI server image Mirror the backend_build.yml + backend_merge.yml pattern shipped in PR #9726 for the LocalAI server image: - image_build.yml accepts optional platform-tag (default ''), scopes registry cache to cache-localai<suffix>-<platform-tag>, and pushes by canonical digest only on push events. Digests upload as artifacts named digests-localai<suffix>-<platform-tag>, with a "-core" placeholder when tag-suffix is empty so the merge job's download pattern doesn't over-match across multiple suffixes. - image_merge.yml is a new reusable workflow that downloads matching digest artifacts and assembles the final tagged manifest list via docker buildx imagetools create. Image names differ from backend_.yml: the LocalAI server is published under quay.io/go-skynet/local-ai and localai/localai (not -backends). Not yet wired into image.yml / image-pr.yml — Commit C does that. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> ci: fan out per-arch split to remaining 34 backends Convert all remaining linux/amd64,linux/arm64 entries in backend-matrix.yml to per-arch + manifest-merge form. Each was a single matrix entry running both arches on x86 under QEMU emulation; each becomes two entries — amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). Four backends that were on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have both legs moved to free tier as part of the same change. They are compile-only (no torch/CUDA install) and fit comfortably with the setup-build-disk /mnt relocation. Phase 4 (next commit) retires the remaining 5 single-arch bigger-runner entries. After this commit: - 271 total matrix entries (was 237) - 0 multi-arch entries left - 36 per-arch pairs (34 new + 2 pilots from PR #9727) - 5 bigger-runner entries remaining (single-arch, Phase 4 target) Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: split LocalAI image multi-arch entries per arch + merge Mirror the backend per-arch split for the main LocalAI image: - image.yml's core-image-build matrix: split the core ('') and -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native). - New top-level core-image-merge and gpu-vulkan-image-merge jobs call image_merge.yml after core-image-build completes. - image-pr.yml's image-build matrix: split the -vulkan-core entry. No merge job added on the PR side — image_build.yml's digest-push is push-only-event-gated, so a PR-side merge would have nothing to download. After this commit, no workflow file references linux/amd64,linux/arm64 in a single matrix slot. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: retire bigger-runner from backend matrix (Phase 4) Migrate the remaining 5 single-arch bigger-runner entries to ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of working space — enough for ROCm dev image (~16 GB), CUDA toolkit (~5 GB), and the per-backend compile/install steps these entries do. Backends migrated: - -gpu-nvidia-cuda-12-llama-cpp - -gpu-nvidia-cuda-12-turboquant - -gpu-rocm-hipblas-faster-whisper - -gpu-rocm-hipblas-coqui - -cpu-ik-llama-cpp After this commit, .github/backend-matrix.yml has zero bigger-runner references. The bigger-runner used in tests-vibevoice-cpp-grpc- transcription (test-extra.yml) is a separate concern handled in a follow-up. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1) Intel oneAPI base image is ~6 GB; each backend's wheel install stays well within the ~100 GB working space provided by Phase 3's setup-build-disk /mnt relocation. Lowest-risk batch of the arc-runner-set retirement. Backends migrated: vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts, fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 15 ROCm Python backends to free tier (Phase 5.2) ROCm dev image (~16 GB) plus per-backend torch/wheels install fits on ubuntu-latest with the /mnt-relocated Docker root. These entries include the heavier vLLM/sglang/transformers/diffusers stack on ROCm; if any specific backend OOMs or runs out of disk, individual flips back to arc-runner-set are revertable per-entry. Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/ ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/ voxcpm/pocket-tts/neutts). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: migrate 6 CUDA Python backends to free tier (Phase 5.3) vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest backends in the matrix — flash-attn intermediate layers can spike disk usage during build. setup-build-disk's /mnt relocation gives ~100 GB working space which fits the documented peak. Highest-risk batch of the arc-runner-set retirement; if any backend fails to build on free tier, the per-entry runs-on flip is the unit of revert. Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}. After this commit, .github/backend-matrix.yml has zero references to arc-runner-set or bigger-runner. The migration is complete. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: disable provenance on multi-registry digest pushes Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7 pushes a single build to TWO registries simultaneously with push-by-digest=true, buildx generates a per-registry provenance attestation manifest (because mode=max — the default for push:true — includes the runner ID). That makes the resulting manifest-list digest diverge across registries: arm64 -cpu-faster-whisper build: image manifest: sha256:d3bdd34b... (identical, content-only) quay manifest list: sha256:66b4cfc8... (with quay attestation) dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation) steps.build.outputs.digest returns only one of the list digests (empirically the dockerhub one). The merge job then asks "quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that list has digest 66b4cfc8 there. Result: imagetools create fails with "not found" and the merge job fails (run 25581983094, job 75110021491). Setting provenance: false drops the per-registry attestation; the manifest-list digest becomes pure content, identical across both registries, and steps.build.outputs.digest works on either lookup. Applied to backend_build.yml and image_build.yml — both refactored to use the same multi-registry digest-push pattern in the prior PRs. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-09 09:37:00 +02:00
LocalAI [bot]	1f313cfdb0	ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726 ) * ci: extract free-disk-space composite action Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from backend_build.yml, image_build.yml, and test.yml into a single composite action. The three callers had slightly different inline blocks; the composite uses the more aggressive backend_build/image_build variant for all three callers — test.yml jobs now also purge snapd, edge/firefox/ powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost + $AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners. In test.yml, actions/checkout now runs before the composite action call because the composite lives at ./.github/actions/free-disk-space and requires a checked-out repo. The original ordering relied on jlumbroso/free-disk-space@main being a remote action; this is the minimum-invasive change to support a local composite. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: path-filter backend.yml master push Run scripts/changed-backends.js on master pushes too (not just PRs) so unrelated commits don't rebuild all ~210 backend container images. Tag pushes still build the full matrix via FORCE_ALL. Push events use the GitHub Compare API to diff event.before..event.after. Edge cases (first push with zero base, API truncation beyond 300 files, missing fields, network failure) fall back to "run everything" — better safe than silently miss a backend. The matrix literal moves from .github/workflows/backend.yml into a new data-only file at .github/backend-matrix.yml (outside workflows/ so actionlint doesn't try to parse it as a workflow). Both backend.yml and backend_pr.yml now consume the dynamic matrix output uniformly via fromJson(needs.generate-matrix.outputs.matrix); the script reads the matrix from the new location. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: bound max-parallel on backend-jobs matrices Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free pool while migration is in flight. Lift after Phases 4-5 retire the self-hosted runners. Also drops a leftover commented-out max-parallel line that lived in backend.yml since the previous matrix shape. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: scope backend cache per arch, push by digest Prepare backend_build.yml for the multi-arch split. The reusable workflow now accepts a `platform-tag` input ("amd64" / "arm64") that scopes the registry cache to cache<suffix>-<platform-tag> and (on push events) pushes the resulting image by canonical digest only. Digests are uploaded as artifacts named digests<suffix>-<platform-tag> for the merge job (Task 2.2) to consume. `platform-tag` is optional with empty default during the migration — existing callers continue to work unchanged (their cache key just becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will update callers to pass an explicit "amd64" / "arm64" value. Phase 6 flips the input to required: true once every caller is wired. PR builds keep their existing tag-based push to ci-tests but pick up the per-arch cache key. Multi-arch PR builds remain emulated in this commit; they migrate when the matrix entries split (Tasks 2.3+). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: add backend_merge.yml reusable workflow Joins per-arch digest artifacts (uploaded by backend_build.yml when called with platform-tag) into a single tagged multi-arch manifest list via `docker buildx imagetools create`. Called once per backend by backend.yml after both per-arch build jobs succeed. The workflow generates final tags identically to the previous monolithic build job (same docker/metadata-action invocation), so consumers of quay.io/go-skynet/local-ai-backends and localai/localai-backends see no tag-shape change. Two imagetools calls (one per registry) reference the same per-arch digests under different image names. Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix entries to expand into per-arch + merge jobs that call this workflow. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: relocate Docker data-root to /mnt on hosted runners GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and restarting with data-root pointing there yields ~100 GB of working space (combined with the apt-clean from Task 1.1) — enough for ROCm dev image + vLLM torch install + flash-attn intermediate layers. This is the structural change that lets Phases 4 and 5 of the migration plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest. The composite action is no-op on self-hosted runners (where /mnt isn't expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted pool's /mnt shape separately before enabling). Wired into both backend_build.yml and image_build.yml between free-disk-space and the first Docker operation. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(setup-build-disk): chmod 1777 /mnt/docker-tmp buildx CLI runs as the unprivileged 'runner' user and creates config dirs under TMPDIR before binding them into the buildkit container. /mnt is root-owned by default, so the original mkdir produced a permission-denied when buildx tried to write there: ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied Mirror /tmp's permission mode (1777 — world-writable with sticky bit) on /mnt/docker-tmp so non-root processes can stage their config. Caught by the first PR run (image-build hipblas job) on PR #9726. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: weekly full-matrix rebuild via cron Path-filtering backend.yml master push (the previous commit's main optimization) skips backends whose source didn't change. That broke the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on %Y-W%V busts the install layer's cache on a new ISO week, but only when the build actually runs. Untouched Python backends (torch, transformers, vllm with no version pin) would otherwise ship stale wheels indefinitely. Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule events have no event.ref / event.before, so the script's changedFiles == null fallback (scripts/changed-backends.js) emits the full matrix automatically — no script change needed. C++/Go backends with pinned deps cache-hit and complete fast, so the weekly cost is dominated by Python re-resolves which is exactly what we want. workflow_dispatch added so a maintainer can trigger an ad-hoc full-matrix rebuild without faking a tag push. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-08 23:43:41 +02:00
Ettore Di Giacinto	50580a84ae	fix(ci): switch apt mirror per runner — azure on github-hosted, kernel.org on self-hosted Self-hosted runners (arc-runner-set, bigger-runner) cannot reach azure.archive.ubuntu.com — they live in different networks (e.g. our arc-runner-set Kubernetes cluster) where Azure's mirror IP is not routable. Symptom: "Connection failed [IP: 51.11.236.225 80]" with each Ign:/Err: cycle taking 60s, hanging the build for ~16 minutes before exit 100. Pick the mirror based on `runner.environment`: * github-hosted (ubuntu-latest, ubuntu-24.04-arm) → Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com) — same VPC as the runner. * self-hosted (arc-runner-set, bigger-runner) → kernel.org (https://mirrors.edge.kernel.org for both archive and ports) — publicly reachable from any network. The choice now lives in one place: the .github/actions/configure-apt-mirror composite action exposes `effective-mirror` / `effective-ports-mirror` outputs so the reusable workflows can forward the same value as Docker build-args without duplicating the per-runner-environment branch. The now-redundant `apt-mirror` / `apt-ports-mirror` workflow inputs on image_build.yml and backend_build.yml are dropped — defaults live in the composite action and are visible there. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-03 22:59:26 +00:00
Ettore Di Giacinto	8edac61e57	feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650 ) * feat(ci): allow routing apt traffic through an alternate Ubuntu mirror Adds opt-in APT_MIRROR / APT_PORTS_MIRROR knobs to all Dockerfiles, the Makefile, and CI workflows so we can fail over to a non-canonical Ubuntu mirror when archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com are degraded (recently observed: multi-day DDoS against the default pool). Defaults are empty everywhere — behavior is unchanged unless a mirror is configured. To enable in CI, set the repo-level GitHub Actions variables APT_MIRROR (and APT_PORTS_MIRROR for arm64 builds). Locally: make docker APT_MIRROR=http://azure.archive.ubuntu.com A small POSIX-sh helper in .docker/apt-mirror.sh rewrites both DEB822 (/etc/apt/sources.list.d/ubuntu.sources, Ubuntu 24.04+) and the legacy /etc/apt/sources.list before the first apt-get update. Dockerfile stages load it via RUN --mount=type=bind, so there is no extra layer and no cache invalidation when the script is unchanged. Reusable workflows also rewrite the runner's own /etc/apt sources before any sudo apt-get call. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(apt-mirror): default to the Azure mirror, visible in the workflow source Bakes Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com) in as the default for both Docker builds and runner-side apt — rather than hiding the URL behind a GitHub Actions repo variable that's not visible from the source tree. A new composite action at .github/actions/configure-apt-mirror is the single source of truth for runner-side rewrites. Five standalone workflows (build-test, release, tests-e2e, tests-ui-e2e, update_swagger) just `uses: ./.github/actions/configure-apt-mirror`. Three workflows (image_build, backend_build, checksum_checker) keep an inline bash rewrite, because they install/upgrade git via apt before the checkout step (so the local composite action isn't loadable yet). The Azure URL is visible in those files too. The `apt-mirror` / `apt-ports-mirror` inputs of the reusable workflows keep their now-Azure defaults — they still feed the Docker build-args block in addition to the inline runner-side rewrite. Callers (image.yml, image-pr.yml, backend.yml, backend_pr.yml) drop the previous `vars.APT_MIRROR` plumbing and rely on those defaults. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(apt-mirror): drop Force Install GIT, consolidate on the composite action The PPA git upgrade ran add-apt-repository ppa:git-core/ppa, which talks to api.launchpad.net — also part of Canonical's infrastructure and currently returning HTTP 504. The Azure mirror only covers archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com, not PPAs. The system git that ubuntu-latest already ships is sufficient for actions/checkout and the build pipeline, so just drop the upgrade. With that gone, the apt-before-checkout constraint disappears too — all three holdouts (image_build, backend_build, checksum_checker) can now switch to ./.github/actions/configure-apt-mirror like the other five. Net: 0 inline apt-mirror blocks, all 8 workflows route through the composite action. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-03 23:50:13 +02:00
Russell Sim	18e039f305	fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds (#9626 ) * fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds `399c1dec` wired amdgpu-targets through the backend_build workflow_call interface, intending the input's default value to cover matrix entries that don't specify targets. However, GitHub Actions only applies a workflow_call input default when the caller omits the input entirely. When backend.yml passes `amdgpu-targets: ${{ matrix.amdgpu-targets }}` and the matrix entry has no amdgpu-targets key, the expression evaluates to an empty string, which is treated as an explicit value — bypassing the default. The result is Docker receiving AMDGPU_TARGETS="" which in turn causes Make's ?= default to be skipped (since the variable is already set in the environment, even to empty), and cmake gets -DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an indeterminate target rather than the intended GPU list. Fix this at two levels: 1. backend.yml: use a \|\| fallback in the expression so that an undefined matrix.amdgpu-targets never reaches the reusable workflow as an empty string. The target list is the canonical default and lives here. 2. backend_build.yml: remove the now-misleading default value from the input declaration. The default never fired due to the above bug, so keeping it implied a guarantee that didn't exist. 3. backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard after the ?= assignment so that if AMDGPU_TARGETS is empty (whether from environment or any future CI wiring mistake) the build fails immediately with a clear message rather than silently producing a binary compiled for an unknown GPU target. Assisted-by: Claude Code:claude-sonnet-4-6 Signed-off-by: Russell Sim <rsl@simopolis.xyz> * fix(build): plumb AMDGPU_TARGETS through to Docker builds The docker-build-backend Makefile macro and Dockerfile.golang did not pass AMDGPU_TARGETS to the inner make invocation, so hipblas builds always used the backend Makefile's hardcoded default GPU targets regardless of what was specified via environment or CI inputs. Signed-off-by: Russell Sim <rsl@simopolis.xyz> --------- Signed-off-by: Russell Sim <rsl@simopolis.xyz>	2026-05-02 15:53:14 +02:00
Ettore Di Giacinto	f4036fa83f	ci(python-backends): add weekly DEPS_REFRESH cache-buster The shared backend/Dockerfile.python ends in: RUN cd /${BACKEND} && PORTABLE_PYTHON=true make which `pip install`s each backend's requirements*.txt. A scan of all 34 Python backends shows every single one ships at least some unpinned deps (torch, transformers, vllm, diffusers, ...). With the registry cache now enabled, that `make` layer's BuildKit hash depends only on Dockerfile instructions + COPYed source — not on what pip resolves at runtime — so a warm cache would freeze upstream versions indefinitely. DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as a build-arg, so the install layer invalidates at most once per week and re-resolves PyPI / nightly indexes. Within a week, builds stay warm. Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock) already lock their deps, and the C++ backends pull gRPC at a pinned tag and llama.cpp at a pinned commit. Add .agents/ci-caching.md documenting the cache layout (quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics (master writes, PRs read-only), DEPS_REFRESH semantics, and how to manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7-1m	2026-04-27 14:21:11 +00:00
Ettore Di Giacinto	bdfa5e934a	ci: switch image/backend build cache to a dedicated registry image - Switch cache-from/cache-to in backend_build.yml and image_build.yml from the unused gha cache to type=registry pointing at quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with ignore-error=true. Master/tag builds populate their own per-matrix-entry cache; PR builds read-only. - Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc` Dockerfile stage that was removed by `b1fc5acd` in July 2025, has been failing every night since, and never populated the gha cache. The new registry-cache scheme is self-warming, so no separate populator is needed. - Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in the root Dockerfile (the root Dockerfile no longer compiles gRPC; the source build now lives in backend/Dockerfile.{llama-cpp, ik-llama-cpp, turboquant} only and uses its own ARG defaults). - Drop the unused grpc-base-image input from image_build.yml plus the matrix passthroughs in image.yml / image-pr.yml. - Drop the unused GRPC_VERSION env in test.yml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7-1m	2026-04-27 13:13:04 +00:00
Ettore Di Giacinto	181ebb6df4	feat: voice recognition (#9500 ) * feat(voice-recognition): add /v1/voice/{verify,analyze,embed} + speaker-recognition backend Audio analog to face recognition. Adds three gRPC RPCs (VoiceVerify / VoiceAnalyze / VoiceEmbed), their Go service and HTTP layers, a new FLAG_SPEAKER_RECOGNITION capability flag, and a Python backend scaffold under backend/python/speaker-recognition/ wrapping SpeechBrain ECAPA-TDNN with a parallel OnnxDirectEngine for WeSpeaker / 3D-Speaker ONNX exports. The kokoros Rust backend gets matching unimplemented trait stubs — tonic's async_trait has no defaults, so adding an RPC without Rust stubs breaks the build (same regression fixed by `eb01c772` for face). Swagger, /api/instructions, and the auth RouteFeatureRegistry / APIFeatures list are updated so the endpoints surface everywhere a client or admin UI looks. Assisted-by: Claude:claude-opus-4-7 * feat(voice-recognition): add 1:N identify + register/forget endpoints Mirrors the face-recognition register/identify/forget surface. New package core/services/voicerecognition/ carries a Registry interface and a local-store-backed implementation (same in-memory vector-store plumbing facerecognition uses, separate instance so the embedding spaces stay isolated). Handlers under /v1/voice/{register,identify,forget} reuse backend.VoiceEmbed to compute the probe vector, then delegate the nearest-neighbour search to the registry. Default cosine-distance threshold is tuned for ECAPA-TDNN on VoxCeleb (0.25, EER ~1.9%). As with the face registry, the current backing is in-memory only — a pgvector implementation is a future constructor-level swap. Assisted-by: Claude:claude-opus-4-7 * feat(voice-recognition): gallery, docs, CI and e2e coverage - backend/index.yaml: speaker-recognition backend entry + CPU and CUDA-12 image variants (plus matching development variants). - gallery/index.yaml: speechbrain-ecapa-tdnn (default) and wespeaker-resnet34 model entries. The WeSpeaker SHA-256 is a deliberate placeholder — the HF URI must be curl'd and its hash filled in before the entry installs. - docs/content/features/voice-recognition.md: API reference + quickstart, mirrors the face-recognition docs. - React UI: CAP_SPEAKER_RECOGNITION flag export (consumers follow face's precedent — no dedicated tab yet). - tests/e2e-backends: voice_embed / voice_verify / voice_analyze specs. Helper resolveFaceFixture is reused as-is — the only thing face/voice share is "download a file into workDir", so no need for a new helper. - Makefile: docker-build-speaker-recognition + test-extra-backend- speaker-recognition-{ecapa,all} targets. Audio fixtures default to VCTK p225/p226 samples from HuggingFace. - CI: test-extra.yml grows a tests-speaker-recognition-grpc job mirroring insightface. backend.yml matrix gains CPU + CUDA-12 image build entries — scripts/changed-backends.js auto-picks these up. Assisted-by: Claude:claude-opus-4-7 * feat(voice-recognition): wire a working /v1/voice/analyze head Adds AnalysisHead: a lazy-loading age / gender / emotion inference wrapper that plugs into both SpeechBrainEngine and OnnxDirectEngine. Defaults to two open-licence HuggingFace checkpoints: - audeering/wav2vec2-large-robust-24-ft-age-gender (Apache 2.0) — age regression + 3-way gender (female / male / child). - superb/wav2vec2-base-superb-er (Apache 2.0) — 4-way emotion. Both are optional and degrade gracefully when transformers or the model can't be loaded — the engine raises NotImplementedError so the gRPC layer returns 501 instead of a generic 500. Emotion classes pass through from the model (neutral/happy/angry/sad on the default checkpoint); the e2e test now accepts any non-empty dominant gender so custom age_gender_model overrides don't fail it. Adds transformers to the backend's CPU and CUDA-12 requirements. Assisted-by: Claude:claude-opus-4-7 * fix(voice-recognition): pin real WeSpeaker ResNet34 ONNX SHA-256 Replaces the placeholder hash in gallery/index.yaml with the actual SHA-256 (7bb2f06e…) of the upstream Wespeaker/wespeaker-voxceleb-resnet34-LM ONNX at ~25MB. `local-ai models install wespeaker-resnet34` now succeeds. Assisted-by: Claude:claude-opus-4-7 * fix(voice-recognition): soundfile loader + honest analyze default Two issues surfaced on first end-to-end smoke with the actual backend image: 1. torchaudio.load in torchaudio 2.8+ requires the torchcodec package for audio decoding. Switch SpeechBrainEngine._load_waveform to the already-present soundfile (listed in requirements.txt) plus a numpy linear resample to 16kHz. Drops a heavy ffmpeg-linked dep and the codepath we never exercise (torchaudio's ffmpeg backend). 2. The AnalysisHead was defaulting to audeering/wav2vec2-large-robust- 24-ft-age-gender, but AutoModelForAudioClassification silently mangles that checkpoint — it reports the age head weights as UNEXPECTED and re-initialises the classifier head with random values, so the "gender" output is noise and there is no age output at all. Make age/gender opt-in instead (empty default; users wire a cleanly-loadable Wav2Vec2ForSequenceClassification checkpoint via age_gender_model: option). Emotion keeps its working Superb default. Also broaden _infer_age_gender's tensor-shape handling and catch runtime exceptions so a dodgy age/gender head never takes down the whole analyze call. Docs and README updated to match the new policy. Verified with the branch-scoped gallery on localhost: - voice/embed → 192-d ECAPA-TDNN vector - voice/verify → same-clip dist≈6e-08 verified=true; cross-speaker dist 0.76–0.99 verified=false (as expected) - voice/register/identify/forget → round-trip works, 404 on unknown id - voice/analyze → emotion populated, age/gender omitted (opt-in) Assisted-by: Claude:claude-opus-4-7 * fix(voice-recognition): real CI audio fixtures + fixture-agnostic verify spec Two issues surfaced after CI actually ran the speaker-recognition e2e target (I'd curl-tested against a running server but hadn't run the make target locally): 1. The default BACKEND_TEST_VOICE_AUDIO_* URLs pointed at huggingface.co/datasets/CSTR-Edinburgh/vctk paths that return 404 (the dataset is gated). Swap them for the speechbrain test samples served from github.com/speechbrain/speechbrain/raw/develop/ — public, no auth, correct 16kHz mono format. 2. The VoiceVerify spec required d(file1,file2) < 0.4, assuming file1/file2 were same-speaker. The speechbrain samples are three different speakers (example1/2/5), and there is no easy un-gated source of true same-speaker audio pairs (VoxCeleb/VCTK/LibriSpeech are all license- or size-gated for CI use). Replace the ceiling check with a relative-ordering assertion: d(pair) > d(same-clip) for both file2 and file3 — that's enough to prove the embeddings encode speaker info, and it works with any three non-identical clips. Actual speaker ordering d(1,2) vs d(1,3) is logged but not asserted. Local run: 4/4 voice specs pass (Health, LoadModel, VoiceEmbed, VoiceVerify) on the built backend image. 12 non-voice specs skipped as expected. Assisted-by: Claude:claude-opus-4-7 * fix(ci): checkout with submodules in the reusable backend_build workflow The kokoros Rust backend build fails with failed to read .../sources/Kokoros/kokoros/Cargo.toml: No such file because the reusable backend_build.yml workflow's actions/checkout step was missing `submodules: true`. Dockerfile.rust does `COPY . /LocalAI`, and without the submodule files the subsequent `cargo build` can't find the vendored Kokoros crate. The bug pre-dates this PR — scripts/changed-backends.js only triggers the kokoros image job when something under backend/rust/kokoros or the shared proto changes, so master had been coasting past it. The voice-recognition proto addition re-broke it. Other checkouts in backend.yml (llama-cpp-darwin) and test-extra.yml (insightface, kokoros, speaker-recognition) already pass `submodules: true`; this brings the shared backend image builder in line. Assisted-by: Claude:claude-opus-4-7	2026-04-23 12:07:14 +02:00
Russell Sim	c66c41e8d7	fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445 ) Commit `8839a71c` exposed AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so GPU targets could be overridden, but never wired the value through the CI workflow inputs. Without it, Docker receives AMDGPU_TARGETS="" which overrides the Makefile's ?= default, causing all hipblas builds to compile only for gfx906 regardless of the target list in the Makefile. Add amdgpu-targets as a workflow_call input with the same default list as the Makefile, and pass it as AMDGPU_TARGETS in the build-args of both the push and PR build steps. Assisted-by: Claude Code:claude-sonnet-4-6 Signed-off-by: Russell Sim <rsl@simopolis.xyz>	2026-04-20 23:41:19 +02:00
dependabot[bot]	01bd3d8212	chore(deps): bump docker/login-action from 3 to 4 (#8918 ) Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 22:30:11 +01:00
dependabot[bot]	7f11f66b44	chore(deps): bump docker/build-push-action from 6 to 7 (#8919 ) Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 22:29:51 +01:00
dependabot[bot]	2a351e1f0c	chore(deps): bump docker/metadata-action from 5 to 6 (#8917 ) Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](https://github.com/docker/metadata-action/compare/v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-09 22:27:02 +01:00
Ettore Di Giacinto	8dfeea2f55	fix: use ubuntu 24.04 for cuda13 l4t images (#7418 ) * fix: use ubuntu 24.04 for cuda13 l4t images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop openblas from containers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-03 09:47:03 +01:00
dependabot[bot]	91248da09e	chore(deps): bump actions/checkout from 5 to 6 (#7339 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v5...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-24 21:18:15 +01:00
Richard Palethorpe	ebd1db2f09	chore(ci): Build modified backends on PR (#6086 ) * chore(stablediffusion-ggml): rm redundant comment Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(ci): Build modified backends on PR Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2025-08-18 17:56:34 +02:00
dependabot[bot]	0ca1765c17	chore(deps): bump actions/checkout from 4 to 5 (#6014 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-08-12 18:54:39 +02:00
Ettore Di Giacinto	580687da46	feat: remove stablediffusion-ggml from main binary (#5861 ) * feat: split stablediffusion-ggml from main binary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt ci tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to support nvidial4t Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Latest fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-19 21:58:53 +02:00
Ettore Di Giacinto	294f7022f3	feat: do not bundle llama-cpp anymore (#5790 ) * Build llama.cpp separately Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Start to try to attach some tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add git and small fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: correctly autoload external backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run AIO tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Slightly update the Makefile helps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt auto-bumper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run linux test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add llama-cpp into build pipelines Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add default capability (for cpu) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop llama-cpp specific logic from the backend loader Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * drop grpc install in ci for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Pass by backends path for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build protogen at start Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(tests): set backends path consistently Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Correctly configure the backends path Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to build for darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Compile for metal on arm64/darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run build off from cross-arch Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add to the backend index nvidia-l4t and cpu's llama-cpp backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build also darwin-x86 for llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable arm64 builds temporary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test backend build on PR Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixup build backend reusable workflow Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * pass by skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use crane Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * x86 darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add packaging step for llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix leftover from bark-cpp extraction Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to fix hipblas build Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-18 13:24:12 +02:00
Ettore Di Giacinto	cfc9dfa3d5	fix(ci): better handling of latest images for backends (#5735 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-27 10:32:58 +02:00
Ettore Di Giacinto	be3ff482d0	chore(ci): try to optimize disk space when tagging latest (#5695 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-20 15:54:14 +02:00
Ettore Di Giacinto	1e1f0ee321	chore(backends): move `bark-cpp` to the backend gallery (#5682 ) chore(bark-cpp): move outside from binary Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-18 19:48:50 +02:00

25 Commits