* ci: add per-arch + manifest-merge support for LocalAI server image
Mirror the backend_build.yml + backend_merge.yml pattern shipped in
PR #9726 for the LocalAI server image:
- image_build.yml accepts optional platform-tag (default ''), scopes
registry cache to cache-localai<suffix>-<platform-tag>, and pushes
by canonical digest only on push events. Digests upload as artifacts
named digests-localai<suffix>-<platform-tag>, with a "-core"
placeholder when tag-suffix is empty so the merge job's download
pattern doesn't over-match across multiple suffixes.
- image_merge.yml is a new reusable workflow that downloads matching
digest artifacts and assembles the final tagged manifest list via
docker buildx imagetools create.
Image names differ from backend_*.yml: the LocalAI server is published
under quay.io/go-skynet/local-ai and localai/localai (not -backends).
Not yet wired into image.yml / image-pr.yml — Commit C does that.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: fan out per-arch split to remaining 34 backends
Convert all remaining linux/amd64,linux/arm64 entries in
backend-matrix.yml to per-arch + manifest-merge form. Each was a
single matrix entry running both arches on x86 under QEMU emulation;
each becomes two entries — amd64 on ubuntu-latest, arm64 on
ubuntu-24.04-arm (native).
Four backends that were on bigger-runner (-cpu-llama-cpp,
-cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have
both legs moved to free tier as part of the same change. They are
compile-only (no torch/CUDA install) and fit comfortably with the
setup-build-disk /mnt relocation. Phase 4 (next commit) retires the
remaining 5 single-arch bigger-runner entries.
After this commit:
- 271 total matrix entries (was 237)
- 0 multi-arch entries left
- 36 per-arch pairs (34 new + 2 pilots from PR #9727)
- 5 bigger-runner entries remaining (single-arch, Phase 4 target)
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: split LocalAI image multi-arch entries per arch + merge
Mirror the backend per-arch split for the main LocalAI image:
- image.yml's core-image-build matrix: split the core ('') and
-gpu-vulkan entries into amd64 + arm64 legs each. amd64 on
ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
- New top-level core-image-merge and gpu-vulkan-image-merge jobs
call image_merge.yml after core-image-build completes.
- image-pr.yml's image-build matrix: split the -vulkan-core entry.
No merge job added on the PR side — image_build.yml's digest-push
is push-only-event-gated, so a PR-side merge would have nothing
to download.
After this commit, no workflow file references
linux/amd64,linux/arm64 in a single matrix slot.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: retire bigger-runner from backend matrix (Phase 4)
Migrate the remaining 5 single-arch bigger-runner entries to
ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt
relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of
working space — enough for ROCm dev image (~16 GB), CUDA toolkit
(~5 GB), and the per-backend compile/install steps these entries do.
Backends migrated:
- -gpu-nvidia-cuda-12-llama-cpp
- -gpu-nvidia-cuda-12-turboquant
- -gpu-rocm-hipblas-faster-whisper
- -gpu-rocm-hipblas-coqui
- -cpu-ik-llama-cpp
After this commit, .github/backend-matrix.yml has zero bigger-runner
references. The bigger-runner used in tests-vibevoice-cpp-grpc-
transcription (test-extra.yml) is a separate concern handled in a
follow-up.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1)
Intel oneAPI base image is ~6 GB; each backend's wheel install
stays well within the ~100 GB working space provided by Phase 3's
setup-build-disk /mnt relocation. Lowest-risk batch of the
arc-runner-set retirement.
Backends migrated:
vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts,
fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: migrate 15 ROCm Python backends to free tier (Phase 5.2)
ROCm dev image (~16 GB) plus per-backend torch/wheels install fits
on ubuntu-latest with the /mnt-relocated Docker root. These entries
include the heavier vLLM/sglang/transformers/diffusers stack on
ROCm; if any specific backend OOMs or runs out of disk, individual
flips back to arc-runner-set are revertable per-entry.
Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on
arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/
ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/
voxcpm/pocket-tts/neutts).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: migrate 6 CUDA Python backends to free tier (Phase 5.3)
vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest
backends in the matrix — flash-attn intermediate layers can spike
disk usage during build. setup-build-disk's /mnt relocation gives
~100 GB working space which fits the documented peak.
Highest-risk batch of the arc-runner-set retirement; if any
backend fails to build on free tier, the per-entry runs-on flip
is the unit of revert.
Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.
After this commit, .github/backend-matrix.yml has zero references
to arc-runner-set or bigger-runner. The migration is complete.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: disable provenance on multi-registry digest pushes
Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7
pushes a single build to TWO registries simultaneously with
push-by-digest=true, buildx generates a per-registry provenance
attestation manifest (because mode=max — the default for push:true —
includes the runner ID). That makes the resulting manifest-list digest
diverge across registries:
arm64 -cpu-faster-whisper build:
image manifest: sha256:d3bdd34b... (identical, content-only)
quay manifest list: sha256:66b4cfc8... (with quay attestation)
dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)
steps.build.outputs.digest returns only one of the list digests
(empirically the dockerhub one). The merge job then asks
"quay.io/...@sha256:e0733c3b..." which doesn't exist on quay — that
list has digest 66b4cfc8 there. Result: imagetools create fails with
"not found" and the merge job fails (run 25581983094, job 75110021491).
Setting provenance: false drops the per-registry attestation; the
manifest-list digest becomes pure content, identical across both
registries, and steps.build.outputs.digest works on either lookup.
Applied to backend_build.yml and image_build.yml — both refactored
to use the same multi-registry digest-push pattern in the prior PRs.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* ci: extract free-disk-space composite action
Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from
backend_build.yml, image_build.yml, and test.yml into a single composite
action. The three callers had slightly different inline blocks; the
composite uses the more aggressive backend_build/image_build variant for
all three callers — test.yml jobs now also purge snapd, edge/firefox/
powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost +
$AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners.
In test.yml, actions/checkout now runs before the composite action call
because the composite lives at ./.github/actions/free-disk-space and
requires a checked-out repo. The original ordering relied on
jlumbroso/free-disk-space@main being a remote action; this is the
minimum-invasive change to support a local composite.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: path-filter backend.yml master push
Run scripts/changed-backends.js on master pushes too (not just PRs) so
unrelated commits don't rebuild all ~210 backend container images. Tag
pushes still build the full matrix via FORCE_ALL.
Push events use the GitHub Compare API to diff event.before..event.after.
Edge cases (first push with zero base, API truncation beyond 300 files,
missing fields, network failure) fall back to "run everything" — better
safe than silently miss a backend.
The matrix literal moves from .github/workflows/backend.yml into a new
data-only file at .github/backend-matrix.yml (outside workflows/ so
actionlint doesn't try to parse it as a workflow). Both backend.yml and
backend_pr.yml now consume the dynamic matrix output uniformly via
fromJson(needs.generate-matrix.outputs.matrix); the script reads the
matrix from the new location.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: bound max-parallel on backend-jobs matrices
Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free
pool while migration is in flight. Lift after Phases 4-5 retire the
self-hosted runners. Also drops a leftover commented-out max-parallel
line that lived in backend.yml since the previous matrix shape.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: scope backend cache per arch, push by digest
Prepare backend_build.yml for the multi-arch split. The reusable
workflow now accepts a `platform-tag` input ("amd64" / "arm64") that
scopes the registry cache to cache<suffix>-<platform-tag> and (on push
events) pushes the resulting image by canonical digest only. Digests
are uploaded as artifacts named digests<suffix>-<platform-tag> for the
merge job (Task 2.2) to consume.
`platform-tag` is optional with empty default during the migration —
existing callers continue to work unchanged (their cache key just
becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will
update callers to pass an explicit "amd64" / "arm64" value. Phase 6
flips the input to required: true once every caller is wired.
PR builds keep their existing tag-based push to ci-tests but pick up
the per-arch cache key. Multi-arch PR builds remain emulated in this
commit; they migrate when the matrix entries split (Tasks 2.3+).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: add backend_merge.yml reusable workflow
Joins per-arch digest artifacts (uploaded by backend_build.yml when
called with platform-tag) into a single tagged multi-arch manifest list
via `docker buildx imagetools create`. Called once per backend by
backend.yml after both per-arch build jobs succeed.
The workflow generates final tags identically to the previous monolithic
build job (same docker/metadata-action invocation), so consumers of
quay.io/go-skynet/local-ai-backends and localai/localai-backends see no
tag-shape change. Two imagetools calls (one per registry) reference the
same per-arch digests under different image names.
Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix
entries to expand into per-arch + merge jobs that call this workflow.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: relocate Docker data-root to /mnt on hosted runners
GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused
by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and
restarting with data-root pointing there yields ~100 GB of working
space (combined with the apt-clean from Task 1.1) — enough for ROCm
dev image + vLLM torch install + flash-attn intermediate layers.
This is the structural change that lets Phases 4 and 5 of the migration
plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest.
The composite action is no-op on self-hosted runners (where /mnt isn't
expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted
pool's /mnt shape separately before enabling). Wired into both
backend_build.yml and image_build.yml between free-disk-space and the
first Docker operation.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(setup-build-disk): chmod 1777 /mnt/docker-tmp
buildx CLI runs as the unprivileged 'runner' user and creates config
dirs under TMPDIR before binding them into the buildkit container.
/mnt is root-owned by default, so the original mkdir produced a
permission-denied when buildx tried to write there:
ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied
Mirror /tmp's permission mode (1777 — world-writable with sticky bit)
on /mnt/docker-tmp so non-root processes can stage their config.
Caught by the first PR run (image-build hipblas job) on PR #9726.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: weekly full-matrix rebuild via cron
Path-filtering backend.yml master push (the previous commit's main
optimization) skips backends whose source didn't change. That broke
the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on
%Y-W%V busts the install layer's cache on a new ISO week, but only
when the build actually runs. Untouched Python backends (torch,
transformers, vllm with no version pin) would otherwise ship stale
wheels indefinitely.
Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule
events have no event.ref / event.before, so the script's changedFiles
== null fallback (scripts/changed-backends.js) emits the full matrix
automatically — no script change needed.
C++/Go backends with pinned deps cache-hit and complete fast, so the
weekly cost is dominated by Python re-resolves which is exactly what
we want.
workflow_dispatch added so a maintainer can trigger an ad-hoc
full-matrix rebuild without faking a tag push.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Self-hosted runners (arc-runner-set, bigger-runner) cannot reach
azure.archive.ubuntu.com — they live in different networks (e.g. our
arc-runner-set Kubernetes cluster) where Azure's mirror IP is not
routable. Symptom: "Connection failed [IP: 51.11.236.225 80]" with each
Ign:/Err: cycle taking 60s, hanging the build for ~16 minutes before
exit 100.
Pick the mirror based on `runner.environment`:
* github-hosted (ubuntu-latest, ubuntu-24.04-arm) → Azure
(http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
— same VPC as the runner.
* self-hosted (arc-runner-set, bigger-runner) → kernel.org
(https://mirrors.edge.kernel.org for both archive and ports)
— publicly reachable from any network.
The choice now lives in one place: the .github/actions/configure-apt-mirror
composite action exposes `effective-mirror` / `effective-ports-mirror`
outputs so the reusable workflows can forward the same value as Docker
build-args without duplicating the per-runner-environment branch.
The now-redundant `apt-mirror` / `apt-ports-mirror` workflow inputs on
image_build.yml and backend_build.yml are dropped — defaults live in the
composite action and are visible there.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ci): allow routing apt traffic through an alternate Ubuntu mirror
Adds opt-in APT_MIRROR / APT_PORTS_MIRROR knobs to all Dockerfiles, the
Makefile, and CI workflows so we can fail over to a non-canonical Ubuntu
mirror when archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com
are degraded (recently observed: multi-day DDoS against the default pool).
Defaults are empty everywhere — behavior is unchanged unless a mirror is
configured. To enable in CI, set the repo-level GitHub Actions variables
APT_MIRROR (and APT_PORTS_MIRROR for arm64 builds). Locally:
make docker APT_MIRROR=http://azure.archive.ubuntu.com
A small POSIX-sh helper in .docker/apt-mirror.sh rewrites both DEB822
(/etc/apt/sources.list.d/ubuntu.sources, Ubuntu 24.04+) and the legacy
/etc/apt/sources.list before the first apt-get update. Dockerfile stages
load it via RUN --mount=type=bind, so there is no extra layer and no
cache invalidation when the script is unchanged. Reusable workflows also
rewrite the runner's own /etc/apt sources before any sudo apt-get call.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(apt-mirror): default to the Azure mirror, visible in the workflow source
Bakes Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
in as the default for both Docker builds and runner-side apt — rather than
hiding the URL behind a GitHub Actions repo variable that's not visible
from the source tree.
A new composite action at .github/actions/configure-apt-mirror is the
single source of truth for runner-side rewrites. Five standalone
workflows (build-test, release, tests-e2e, tests-ui-e2e, update_swagger)
just `uses: ./.github/actions/configure-apt-mirror`.
Three workflows (image_build, backend_build, checksum_checker) keep an
inline bash rewrite, because they install/upgrade git via apt *before*
the checkout step (so the local composite action isn't loadable yet).
The Azure URL is visible in those files too.
The `apt-mirror` / `apt-ports-mirror` inputs of the reusable workflows
keep their now-Azure defaults — they still feed the Docker build-args
block in addition to the inline runner-side rewrite. Callers (image.yml,
image-pr.yml, backend.yml, backend_pr.yml) drop the previous
`vars.APT_MIRROR` plumbing and rely on those defaults.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(apt-mirror): drop Force Install GIT, consolidate on the composite action
The PPA git upgrade ran add-apt-repository ppa:git-core/ppa, which talks
to api.launchpad.net — also part of Canonical's infrastructure and
currently returning HTTP 504. The Azure mirror only covers
archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com, not PPAs.
The system git that ubuntu-latest already ships is sufficient for
actions/checkout and the build pipeline, so just drop the upgrade. With
that gone, the apt-before-checkout constraint disappears too — all three
holdouts (image_build, backend_build, checksum_checker) can now switch
to ./.github/actions/configure-apt-mirror like the other five.
Net: 0 inline apt-mirror blocks, all 8 workflows route through the
composite action.
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
from the unused gha cache to type=registry pointing at
quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
ignore-error=true. Master/tag builds populate their own
per-matrix-entry cache; PR builds read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
Dockerfile stage that was removed by b1fc5acd in July 2025, has been
failing every night since, and never populated the gha cache. The new
registry-cache scheme is self-warming, so no separate populator is
needed.
- Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS
build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in
the root Dockerfile (the root Dockerfile no longer compiles gRPC; the
source build now lives in backend/Dockerfile.{llama-cpp,
ik-llama-cpp, turboquant} only and uses its own ARG defaults).
- Drop the unused grpc-base-image input from image_build.yml plus the
matrix passthroughs in image.yml / image-pr.yml.
- Drop the unused GRPC_VERSION env in test.yml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
AIO images are behind, and takes effort to maintain these. Wizard and
installation of models have been semplified massively, so AIO images
lost their purpose.
This allows us to be more laser focused on main images and reliefes
stress from CI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: use ubuntu 24.04 for cuda13 l4t images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openblas from containers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: split remaining backends and drop embedded backends
- Drop silero-vad, huggingface, and stores backend from embedded
binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): be specific with llama-cpp backend templates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(docs): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): minor fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop all ffmpeg references
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: run protogen-go
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Always enable p2p mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gorelease file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(stores): do not always load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Mac OS fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add python symlink, use absolute python env path when running backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ci): do not push images when building PRs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
fix: more places where we are installing grpc that need a version specified
fix: attempt to fix metal tests
fix: metal/brew is forcing an update, they don't have 1.58 available anymore
Signed-off-by: Chris Jowett <421501+cryptk@users.noreply.github.com>
* feat(build): adjust number of parallel make jobs
* fix: update make on MacOS from brew to support --output-sync argument
* fix: cache grpc with version as part of key to improve validity of cache hits
* fix: use gmake for tests-apple to use the updated GNU make version
* fix: actually use the new make version for tests-apple
* feat: parallelize tests-extra
* feat: attempt to cache grpc build for docker images
* fix: don't quote GRPC version
* fix: don't cache go modules, we have limited cache space, better used elsewhere
* fix: release with the same version of go that we test with
* fix: don't fail on exporting cache layers
* fix: remove deprecated BUILD_GRPC docker arg from Makefile
* fix: clean up Makefile dependencies to allow for parallel builds
* refactor: remove old unused backend from Makefile
* fix: finish removing legacy backend, update piper
* fix: I broke llama... I fixed llama
* feat: give the tests and builds a few threads
* fix: ensure libraries are replaced before build, add dropreplace target
* Fix image build workflows
* cleanup backends
* switch image to ubuntu 22.04
* adapt commands for ubuntu
* transformers cleanup
* no contrib on ubuntu
* Change test model to gguf
* ci: disable bark tests (too cpu-intensive)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* cleanup
* refinements
* use intel base image
* Makefile: Add docker targets
* Change test model
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>