17 Commits

Author SHA1 Message Date
LocalAI [bot]
593f3a8648 ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738)
* ci(backend_build): plumb builder-base-image and BUILDER_TARGET build-args

Adds an optional builder-base-image input. When set, BUILDER_BASE_IMAGE
is forwarded as a build-arg AND BUILDER_TARGET=builder-prebuilt is set
to select the variant Dockerfile's prebuilt-base stage. When empty,
BUILDER_TARGET=builder-fromsource (the default) keeps the existing
from-source build path.

This makes the prebuilt-base optimization opt-in per matrix entry
without breaking local `make backends/<name>` invocations or backends
whose Dockerfile doesn't have a prebuilt path.
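A sketch of what that wiring could look like in backend_build.yml — the input name comes from the commit message; the surrounding keys and step layout are assumptions:

```yaml
on:
  workflow_call:
    inputs:
      builder-base-image:
        description: 'Prebuilt base image; empty keeps the from-source path'
        type: string
        required: false
        default: ''

# later, inside the docker build step (exact step shape assumed):
#   build-args: |
#     BUILDER_BASE_IMAGE=${{ inputs.builder-base-image }}
#     BUILDER_TARGET=${{ inputs.builder-base-image != '' && 'builder-prebuilt' || 'builder-fromsource' }}
```

The `&& … ||` expression is the usual GitHub Actions idiom for a conditional value, so an empty input falls through to builder-fromsource without any extra plumbing.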

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(llama-cpp,ik-llama-cpp,turboquant): multi-target Dockerfiles for prebuilt + from-source

Restructure the three llama.cpp-derived Dockerfiles so each supports
two builder paths in a single file, selected via the BUILDER_TARGET
build-arg:

  BUILDER_TARGET=builder-fromsource (default)
    - Standalone build: gRPC stage + apt installs + (conditionally)
      CUDA/ROCm/Vulkan + compile.
    - Used by `make backends/llama-cpp` locally and any caller that
      doesn't supply a prebuilt base.

  BUILDER_TARGET=builder-prebuilt
    - FROM \${BUILDER_BASE_IMAGE} (one of quay.io/go-skynet/ci-cache:
      base-grpc-* shipped in PR #9737).
    - Skips ~25-35 min of gRPC compile + ~5-10 min of toolchain installs.
    - Used by CI when the matrix entry sets builder-base-image.

Final FROM scratch resolves BUILDER_TARGET via an aliasing FROM stage
(BuildKit doesn't support variable expansion directly in COPY --from),
then COPY --from=builder pulls package output from the chosen path.
BuildKit prunes the unreferenced builder, so each build only does the
work for the chosen path.
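The stage layout described above can be sketched like this — stage names are from the commit message, base image and package paths are assumptions:

```dockerfile
ARG BUILDER_BASE_IMAGE=scratch
ARG BUILDER_TARGET=builder-fromsource

FROM ubuntu:24.04 AS builder-fromsource
# gRPC stage + apt installs + (conditionally) CUDA/ROCm/Vulkan, then compile
# ...

FROM ${BUILDER_BASE_IMAGE} AS builder-prebuilt
# toolchain + gRPC already present in the base; compile only
# ...

# Aliasing stage: BuildKit can't expand variables in COPY --from,
# but it can in FROM, so the choice is resolved here.
FROM ${BUILDER_TARGET} AS builder

FROM scratch
COPY --from=builder /build/package/ /
```

ARGs declared before the first FROM are global in BuildKit, which is what lets `FROM ${BUILDER_TARGET}` select a stage; the unselected stage has no consumers, so BuildKit never builds it.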

The compile RUN is identical between both builder stages, so it's
factored into .docker/<name>-compile.sh and bind-mounted into both.
ccache mount + cache-id stay per-arch / per-build-type.

Local DX preserved: `make backends/llama-cpp` (no extra args) defaults
to BUILDER_TARGET=builder-fromsource and works exactly as before.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(backend.yml,backend_pr.yml): forward builder-base-image from matrix

Plumbs the new optional builder-base-image input from matrix into
backend_build.yml. backend_build.yml derives BUILDER_TARGET from
whether builder-base-image is set, so matrix entries that map to a
prebuilt base get the prebuilt path; entries that don't (python/go/
rust backends) fall through to the default builder-fromsource (which
their own Dockerfiles don't reference, so it's a no-op for them).
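On the caller side, the forwarding could look roughly like this (job and field names beyond builder-base-image are assumed):

```yaml
jobs:
  backend-build:
    uses: ./.github/workflows/backend_build.yml
    with:
      # Empty for python/go/rust entries -> builder-fromsource,
      # a no-op for Dockerfiles that never reference the target.
      builder-base-image: ${{ matrix.builder-base-image || '' }}
```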

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(backend-matrix): wire builder-base-image to llama-cpp variants

For every entry whose Dockerfile is llama-cpp/ik-llama-cpp/turboquant,
add a builder-base-image field pointing at the appropriate prebuilt
quay.io/go-skynet/ci-cache:base-grpc-* tag.

backend_build.yml derives BUILDER_TARGET from this field's presence:
non-empty -> builder-prebuilt; empty -> builder-fromsource. So this
commit alone activates the prebuilt-base path for these 23 backends
in CI, while local `make backends/<name>` (no extra args) keeps the
from-source path.

Mapping by (build-type, arch):
- '' / amd64        -> base-grpc-amd64
- '' / arm64        -> base-grpc-arm64
- cublas-12 / amd64 -> base-grpc-cuda-12-amd64
- cublas-13 / amd64 -> base-grpc-cuda-13-amd64
- cublas-13 / arm64 -> base-grpc-cuda-13-arm64
- hipblas / amd64   -> base-grpc-rocm-amd64
- vulkan / amd64    -> base-grpc-vulkan-amd64
- vulkan / arm64    -> base-grpc-vulkan-arm64
- sycl_* / amd64    -> base-grpc-intel-amd64
- cublas-12 + JetPack r36.4.0 / arm64 -> base-grpc-l4t-cuda-12-arm64
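A representative matrix entry following that mapping — field names other than builder-base-image are assumptions, the tag is from the table above:

```yaml
- backend: 'llama-cpp'
  build-type: 'cublas-12'
  platforms: 'linux/amd64'
  builder-base-image: 'quay.io/go-skynet/ci-cache:base-grpc-cuda-12-amd64'
```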

Cold-build savings expected: ~25-35 min per variant (skips the gRPC
compile + toolchain install that's now in the base).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: add base-grpc-l4t-cuda-12-arm64 variant for legacy JetPack entries

Two matrix entries (-nvidia-l4t-arm64-llama-cpp, -nvidia-l4t-arm64-
turboquant) build against nvcr.io/nvidia/l4t-jetpack:r36.4.0 + CUDA
12 ARM64. They're distinct from -nvidia-l4t-cuda-13-arm64-* which use
Ubuntu 24.04 + CUDA 13 sbsa. Add the missing JetPack-based variant
to base-images.yml so those two entries' builder-base-image mapping
in the previous commit resolves.

Bootstrap order before merging this PR (re-run base-images.yml on
this branch — 9 existing variants hit BuildKit cache, only the new
l4t-cuda-12-arm64 builds cold):

  gh workflow run base-images.yml --ref ci/base-images-consumers

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: extract base-builder install logic into .docker/install-base-deps.sh

Pre-extraction, the apt + protoc + cmake + conditional CUDA/ROCm/Vulkan
+ gRPC install logic was duplicated across four files:
  - backend/Dockerfile.base-grpc-builder (CI prebuilt-base source of truth)
  - backend/Dockerfile.llama-cpp (builder-fromsource stage)
  - backend/Dockerfile.ik-llama-cpp (builder-fromsource stage)
  - backend/Dockerfile.turboquant (builder-fromsource stage)

A bump to e.g. CUDA toolkit packages had to be made in 4 places, and
drift between the prebuilt base and the variant-Dockerfile from-source
path was a real concern (ik-llama-cpp's hipblas branch was already
missing the rocBLAS Kernels echo that llama-cpp / turboquant /
base-grpc-builder all had).

Factor the install logic into a single .docker/install-base-deps.sh
that reads its inputs from env vars and runs conditionally on
BUILD_TYPE / CUDA_*_VERSION / TARGETARCH. Each Dockerfile now bind-
mounts the script alongside .docker/apt-mirror.sh and invokes it from
a single RUN step.
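That single RUN step could look like this — the mount targets and the exact env vars forwarded are assumptions based on the inputs named above:

```dockerfile
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/apt-mirror.sh \
    --mount=type=bind,source=.docker/install-base-deps.sh,target=/install-base-deps.sh \
    sh /apt-mirror.sh && \
    BUILD_TYPE=${BUILD_TYPE} TARGETARCH=${TARGETARCH} \
    CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
    sh /install-base-deps.sh
```

Bind mounts add no layer of their own, so the script's content only invalidates cache when it actually changes.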

The variant Dockerfiles' grpc-source stage is removed entirely — the
script handles gRPC compile + install at /opt/grpc, and the
builder-fromsource stage mirrors builder-prebuilt by copying
/opt/grpc/. to /usr/local/.

Result:
  - install-base-deps.sh: 244 lines (one source of truth)
  - Dockerfile.base-grpc-builder: 268 -> 98 lines
  - Dockerfile.llama-cpp: 361 -> 157 lines
  - Dockerfile.ik-llama-cpp: 348 -> 151 lines
  - Dockerfile.turboquant: 355 -> 154 lines
  - Total Dockerfile size: 1332 -> 560 lines (58% reduction)

Bit-equivalence between prebuilt and from-source paths is now enforced
by construction: both invoke the same script with the same inputs.
A side-effect is that ik-llama-cpp now also gets the rocBLAS Kernels
echo + clblas block parity it was previously missing.

Includes the BUILD_TYPE=clblas branch (libclblast-dev) for parity even
though no current CI matrix entry uses it.

After this commit's force-push, base-images.yml needs to be redispatched
on this branch — the Dockerfile.base-grpc-builder content shifts so the
existing cache won't apply for the install layer (gRPC layer also
rebuilds since it's now in the same RUN step).

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(base-images): skip-drivers on JetPack l4t variant

cuda-nvcc-12-0 isn't installable via apt on the JetPack r36.4.0 base
image — JetPack ships CUDA preinstalled at /usr/local/cuda and its
apt feed doesn't carry the cuda-nvcc-* packages from the public
repositories. The original matrix entry for -nvidia-l4t-arm64-llama-cpp
on master sets skip-drivers: 'true' for exactly this reason; the
new base-grpc-l4t-cuda-12-arm64 base needs to match.

Also forwards SKIP_DRIVERS as a build-arg from matrix into the build
(was missing entirely before this commit).

Caught by run 25612030775 — l4t-cuda-12-arm64 failed at:
  E: Package 'cuda-nvcc-12-0' has no installation candidate

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-10 00:03:52 +02:00
Ettore Di Giacinto
9228e5b412 ci(llama-cpp): add BuildKit ccache mount to the compile step
The big RUN at line 268 of Dockerfile.llama-cpp re-runs from scratch on
every LLAMA_VERSION bump (or any LocalAI source change due to
COPY . /LocalAI just before). For CUDA-13 specifically that compile
recently hit the GHA 6h hard limit and failed:

  https://github.com/mudler/LocalAI/actions/runs/25598418931/job/75148244557

Add a BuildKit cache mount on /root/.ccache and thread ccache through
CMake (CMAKE_C/CXX/CUDA_COMPILER_LAUNCHER) so most translation units
hit cache when their preprocessed source is byte-identical to the
previous build.

The cache mount is exported to the registry as part of the existing
cache-to: type=registry,mode=max in backend_build.yml, so it persists
across runs. mount id is keyed on TARGETARCH + BUILD_TYPE so different
variants don't thrash the same cache slot; sharing=locked serializes
concurrent writes.
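In Dockerfile terms, the mount plus launcher wiring is roughly the following — the CMake invocation itself is abbreviated and assumed:

```dockerfile
RUN --mount=type=cache,target=/root/.ccache,id=ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
    cmake -B build \
        -DCMAKE_C_COMPILER_LAUNCHER=ccache \
        -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache \
    && cmake --build build -j
```

Keying the mount id on TARGETARCH + BUILD_TYPE gives each variant its own cache slot instead of one shared slot that the variants would evict from each other.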

Cold-build effect (first run after enable, or on LLAMA_VERSION bump
that touches every TU): unchanged. Hot-build effect (subsequent runs
with the same source, or LLAMA_VERSION bumps that touch a handful of
files): ~5-15 min for the llama.cpp compile vs the previous 1-3h cold.
For CUDA-13 specifically this should bring rebuilds well under the 6h
GHA limit.

Does NOT help the *first* post-bump build — that's still cold. For
that, follow-up work would be: (a) trim CUDA_DOCKER_ARCH to modern
GPUs only, (b) audit which CMake variants the published images
actually need, (c) pre-built CUDA+gRPC base image.

ccache package is already installed in the builder stage (line 90).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-09 16:16:46 +00:00
Ettore Di Giacinto
8edac61e57 feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650)
* feat(ci): allow routing apt traffic through an alternate Ubuntu mirror

Adds opt-in APT_MIRROR / APT_PORTS_MIRROR knobs to all Dockerfiles, the
Makefile, and CI workflows so we can fail over to a non-canonical Ubuntu
mirror when archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com
are degraded (recently observed: multi-day DDoS against the default pool).

Defaults are empty everywhere — behavior is unchanged unless a mirror is
configured. To enable in CI, set the repo-level GitHub Actions variables
APT_MIRROR (and APT_PORTS_MIRROR for arm64 builds). Locally:
    make docker APT_MIRROR=http://azure.archive.ubuntu.com

A small POSIX-sh helper in .docker/apt-mirror.sh rewrites both DEB822
(/etc/apt/sources.list.d/ubuntu.sources, Ubuntu 24.04+) and the legacy
/etc/apt/sources.list before the first apt-get update. Dockerfile stages
load it via RUN --mount=type=bind, so there is no extra layer and no
cache invalidation when the script is unchanged. Reusable workflows also
rewrite the runner's own /etc/apt sources before any sudo apt-get call.
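The rewrite idea can be sketched in POSIX sh — this is a minimal illustration, not the actual .docker/apt-mirror.sh, and the function name is made up:

```shell
#!/bin/sh
# Sketch only: point archive.ubuntu.com / security.ubuntu.com at
# APT_MIRROR when it is set; leave everything untouched otherwise.
rewrite_sources() {
    f="$1"
    [ -f "$f" ] || return 0
    sed -i \
        -e "s|http://archive.ubuntu.com/ubuntu|${APT_MIRROR}/ubuntu|g" \
        -e "s|http://security.ubuntu.com/ubuntu|${APT_MIRROR}/ubuntu|g" \
        "$f"
}

if [ -n "${APT_MIRROR:-}" ]; then
    # Legacy one-line format and DEB822 (Ubuntu 24.04+) live in
    # different files; both get the same substitution.
    rewrite_sources /etc/apt/sources.list
    rewrite_sources /etc/apt/sources.list.d/ubuntu.sources
fi
```

The real script additionally handles ports.ubuntu.com via APT_PORTS_MIRROR for arm64; the mechanism is the same substitution.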

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(apt-mirror): default to the Azure mirror, visible in the workflow source

Bakes Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
in as the default for both Docker builds and runner-side apt — rather than
hiding the URL behind a GitHub Actions repo variable that's not visible
from the source tree.

A new composite action at .github/actions/configure-apt-mirror is the
single source of truth for runner-side rewrites. Five standalone
workflows (build-test, release, tests-e2e, tests-ui-e2e, update_swagger)
just `uses: ./.github/actions/configure-apt-mirror`.

Three workflows (image_build, backend_build, checksum_checker) keep an
inline bash rewrite, because they install/upgrade git via apt *before*
the checkout step (so the local composite action isn't loadable yet).
The Azure URL is visible in those files too.

The `apt-mirror` / `apt-ports-mirror` inputs of the reusable workflows
keep their now-Azure defaults — they still feed the Docker build-args
block in addition to the inline runner-side rewrite. Callers (image.yml,
image-pr.yml, backend.yml, backend_pr.yml) drop the previous
`vars.APT_MIRROR` plumbing and rely on those defaults.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(apt-mirror): drop Force Install GIT, consolidate on the composite action

The PPA git upgrade ran add-apt-repository ppa:git-core/ppa, which talks
to api.launchpad.net — also part of Canonical's infrastructure and
currently returning HTTP 504. The Azure mirror only covers
archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com, not PPAs.

The system git that ubuntu-latest already ships is sufficient for
actions/checkout and the build pipeline, so just drop the upgrade. With
that gone, the apt-before-checkout constraint disappears too — all three
holdouts (image_build, backend_build, checksum_checker) can now switch
to ./.github/actions/configure-apt-mirror like the other five.

Net: 0 inline apt-mirror blocks, all 8 workflows route through the
composite action.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-03 23:50:13 +02:00
Andreas Egli
1d0de757c3 fix: add hipblaslt library (#9541)
Signed-off-by: Andreas Egli <github@kharan.ch>
2026-04-24 18:50:03 +02:00
Keith Mattix II
8839a71c87 fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg (#9410)
Add gfx1151 (AMD Strix Halo / Ryzen AI MAX) to the default AMDGPU_TARGETS
list in the llama-cpp backend Makefile. ROCm 7.2.1 ships with gfx1151
Tensile libraries, so this architecture should be included in default builds.

Also expose AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so that
users building for non-default GPU architectures can override the target
list via --build-arg AMDGPU_TARGETS=<arch>. Previously, passing
-DAMDGPU_TARGETS=<arch> through CMAKE_ARGS was silently overridden by
the Makefile's own append of the default target list.
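The exposure amounts to something like this in Dockerfile.llama-cpp — the default target list shown here is a placeholder for illustration, not the real one:

```dockerfile
# Real default list lives in the backend Makefile; abbreviated here
# only to show the override mechanism.
ARG AMDGPU_TARGETS="gfx1100;gfx1151"
ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
```

With the ARG/ENV pair in place, `--build-arg AMDGPU_TARGETS=gfx1151` reaches the build instead of being clobbered by the Makefile's appended default.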

Fixes #9374

Signed-off-by: Keith Mattix <keithmattix2@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-18 20:39:40 +02:00
Ettore Di Giacinto
151ad271f2 feat(rocm): bump to 7.x (#9323)
feat(rocm): bump to 7.2.1

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-12 08:51:30 +02:00
Ettore Di Giacinto
7891c33cb1 chore(vulkan): bump vulkan-sdk to 1.4.335.0 (#7981)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-12 07:51:26 +01:00
Ettore Di Giacinto
917c7aa9f3 chore(ci): roll back l4t-cuda12 configurations (#7935)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 23:04:33 +01:00
Copilot
b2ff1cea2a feat: enable Vulkan arm64 image builds (#7912)
* Initial plan

* Add arm64 support for Vulkan builds in Dockerfiles and workflows

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 21:49:50 +01:00
Richard Palethorpe
e6ba26c3e7 chore: Update to Ubuntu24.04 (cont #7423) (#7769)
* ci(workflows): bump GitHub Actions images to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04)

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): bump GitHub Actions CUDA support to 12.9

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): disable parallel backend builds to avoid race conditions

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): update backend Dockerfiles to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore: add local-ai-launcher to .gitignore

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix backends GitHub Actions workflows after rebase

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): use build-time UBUNTU_VERSION variable

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(docker): remove libquadmath0 from requirements-stage base image

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(docker): correct CUDA installation steps in backend Dockerfiles

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): update ROCm to 6.4 and align Python hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix failing GitHub Actions runners

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): rm all traces of CUDA 11

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): Add Ubuntu codename as an argument

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
2026-01-06 15:26:42 +01:00
coffeerunhobby
5add7b47f5 fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) (#7864)
* Fix BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge)

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>

* Address feedback from review

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-06 00:13:48 +00:00
Ettore Di Giacinto
dc6182bbb1 chore(ci): add wget to llama-cpp docker image builder
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 22:48:41 +01:00
Ettore Di Giacinto
1d1d52da59 chore(ci): small fixups to build arm64 images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 21:42:33 +01:00
Ettore Di Giacinto
ab4f2742a6 chore(ci): minor fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 16:26:33 +01:00
Ettore Di Giacinto
03f3bf2d94 chore(ci): only install runtime libs needed on arm64
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 15:13:21 +01:00
Chakib Benziane
c28e5b39d6 fix: llama dockerfile make package (#6694)
The `make package` rule does not currently always run, resulting in an
empty scratch image.

- added `make -B` flag to force the `make package` rule
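As a sketch of the affected step (the surrounding RUN is assumed):

```dockerfile
# -B (--always-make) treats every target as out of date, so `package`
# runs even when make's timestamp check would otherwise skip it.
RUN make -B package
```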

Signed-off-by: blob42 <contact@blob42.xyz>
2025-10-24 09:03:11 +02:00
Ettore Di Giacinto
294f7022f3 feat: do not bundle llama-cpp anymore (#5790)
* Build llama.cpp separately

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Start to try to attach some tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add git and small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: correctly autoload external backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run AIO tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Slightly update the Makefile help

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt auto-bumper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run linux test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add llama-cpp into build pipelines

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add default capability (for cpu)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop llama-cpp specific logic from the backend loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop grpc install in ci for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pass by backends path for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Build protogen at start

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests): set backends path consistently

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly configure the backends path

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to build for darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Compile for metal on arm64/darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run build off from cross-arch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the backend index nvidia-l4t and cpu's llama-cpp backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Build also darwin-x86 for llama-cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable arm64 builds temporarily

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test backend build on PR

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup build backend reusable workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* pass by skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use crane

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* x86 darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add packaging step for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix leftover from bark-cpp extraction

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to fix hipblas build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-18 13:24:12 +02:00