mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-30 03:25:42 -04:00
Previously, Python was the only language whose tier-1+2 base image
(apt + GPU SDK + language toolchain) was factored out. The remaining 82
matrix entries (62 golang + 9 llama-cpp + 9 turboquant +
1 ik-llama-cpp + 1 rust) still inlined the same bootstrap into their
per-backend cache tags.

Add .docker/bases/Dockerfile.{golang,cpp,rust}, mirroring
Dockerfile.python's GPU stack, with the language-specific tail at the
bottom (respectively: Go + protoc + gRPC tooling; protoc + cmake + gRPC;
rustup + audio dev libs). Slim the five consumer Dockerfiles down to
FROM ${BASE_IMAGE_PREBUILT} plus the per-backend COPY/make steps.

The C++ trio (llama-cpp, ik-llama-cpp, turboquant) differs only in its
make targets, so langOf() in scripts/changed-backends.js remaps all three
Dockerfile suffixes to the shared 'cpp' base. This collapses 17 would-be
distinct bases to 8. langTriggerSelector and baseTriggerFiles are
extended so that PRs touching the new recipes fan out canaries, and the
.docker/bases/ auto-detection picks up the new languages without further
script changes.

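Here is a minimal bash sketch of the remapping idea. It is not the real code: langOf() lives in scripts/changed-backends.js. The helpers lang_of and count_bases, the "<suffix> <build-type>" row format, and the toy matrix are all hypothetical. The sketch also assumes that a base is identified by language plus build type. The real key may include other matrix dimensions.

```bash
#!/usr/bin/env bash
# Hypothetical bash rendering of the langOf() remap; the real
# implementation is JavaScript in scripts/changed-backends.js.
set -euo pipefail

# Map a consumer Dockerfile suffix to the language of its shared base.
lang_of() {
  case "$1" in
    llama-cpp | ik-llama-cpp | turboquant) echo "cpp" ;; # C++ trio shares one base
    *) echo "$1" ;;                                      # golang, rust, python map to themselves
  esac
}

# Count distinct bases in a matrix given as "<suffix> <build-type>" lines on
# stdin. Assumption: a base is keyed by (language, build type) only.
count_bases() {
  local suffix build_type
  while read -r suffix build_type; do
    echo "$(lang_of "$suffix") ${build_type}"
  done | sort -u | wc -l | tr -d ' '
}

# Toy matrix (hypothetical rows, not the real 234-entry matrix).
matrix='llama-cpp cublas
ik-llama-cpp cublas
turboquant cublas
llama-cpp hipblas
golang cublas'

echo "distinct bases: $(count_bases <<<"$matrix")" # prints "distinct bases: 3"
```

With the remap, the three C++ rows that share a build type produce a single key. This is the same effect, at toy scale, as the 17-to-8 reduction described above.
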
Makefile: add docker-build-{python,golang,cpp,rust}-base targets and a
local-base-tag/local-base-target macro pair, so that each backend's
docker-build-X target chains through the right base. The previous
python-only prerequisite is now a generic per-language dispatch.

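Below is a dry-run bash sketch of how that per-language dispatch can be wired for a local build. First build the language base. Then build the backend with the base's tag passed as BASE_IMAGE_PREBUILT, the ARG the consumer Dockerfiles read before their FROM line. Only the docker-build-<lang>-base target names come from this commit. The following are all assumptions:

- local_base_target and local_base_tag are hypothetical stand-ins for the Makefile macros, whose real definitions are not shown.
- The tag format is invented for illustration.
- The backend/Dockerfile.<backend> path is assumed.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of running them.
set -euo pipefail

# Hypothetical stand-in for the local-base-target macro.
local_base_target() { echo "docker-build-$1-base"; }

# Hypothetical stand-in for the local-base-tag macro (tag format is made up).
local_base_tag() { echo "local-ai-base-$1:local"; }

# plan_backend_build <backend> <lang>: print the two-step build plan.
plan_backend_build() {
  local backend="$1" lang="$2"
  # Step 1: build the shared language base.
  echo "make $(local_base_target "$lang")"
  # Step 2: build the backend on top of it. The -f, -t and --build-arg flags
  # are standard docker build flags; the Dockerfile path and output tag are
  # assumptions for illustration.
  echo "docker build -f backend/Dockerfile.${backend}" \
    "--build-arg BASE_IMAGE_PREBUILT=$(local_base_tag "$lang")" \
    "-t local-ai-backend-${backend} ."
}

plan_backend_build llama-cpp cpp
```

Routing both steps through the same tag helper keeps the base that gets built and the base the backend consumes from drifting apart.
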
Total distinct bases for the full 234-entry matrix: 29 (previously 9,
with only python factored). The C++ base also absorbs the gRPC build
stage that each consumer previously carried, removing the dominant cost
from the llama-cpp / ik-llama-cpp / turboquant rebuild paths.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
68 lines
1.9 KiB
Docker
# Builds the llama-cpp backend on top of the shared
# .docker/bases/Dockerfile.cpp base. The base bakes in apt + GPU SDK +
# protoc + cmake + GRPC, so this stage only carries the COPY + `make`
# invocations and the final scratch-stage package.
#
# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) passes
# BASE_IMAGE_PREBUILT. See .agents/ci-caching.md.

ARG BASE_IMAGE_PREBUILT

FROM ${BASE_IMAGE_PREBUILT} AS builder

# We can target specific CUDA ARCHITECTURES like --build-arg CUDA_DOCKER_ARCH='75;86;89;120'
ARG CUDA_DOCKER_ARCH
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ARG CMAKE_ARGS
ENV CMAKE_ARGS=${CMAKE_ARGS}
ARG AMDGPU_TARGETS
ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
ARG BACKEND=llama-cpp
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ARG TARGETARCH
ARG TARGETVARIANT

COPY . /LocalAI

RUN <<'EOT' bash
set -euxo pipefail

if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
    CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
    export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
    echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
    rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
fi

if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
    cd /LocalAI/backend/cpp/llama-cpp
    make llama-cpp-fallback
    make llama-cpp-grpc
    make llama-cpp-rpc-server
else
    cd /LocalAI/backend/cpp/llama-cpp
    make llama-cpp-avx
    make llama-cpp-avx2
    make llama-cpp-avx512
    make llama-cpp-fallback
    make llama-cpp-grpc
    make llama-cpp-rpc-server
fi
EOT


# Copy libraries using a script to handle architecture differences
RUN make -BC /LocalAI/backend/cpp/llama-cpp package


FROM scratch


# Copy all available binaries (the build process only creates the appropriate ones for the target architecture)
COPY --from=builder /LocalAI/backend/cpp/llama-cpp/package/. ./
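
The builder stage rewrites each ';' in CUDA_DOCKER_ARCH as '\;' before appending -DCMAKE_CUDA_ARCHITECTURES to CMAKE_ARGS. One plausible reason, assumed here and not confirmed by this file: a Makefile recipe later splices CMAKE_ARGS unquoted into a shell command line, where a bare ';' would end the cmake invocation early. This bash sketch reuses the Dockerfile's expansion. It adds a hypothetical reparse helper (a plain sh -c) as a stand-in for that recipe.

```bash
#!/usr/bin/env bash
# Sketch of the CUDA_DOCKER_ARCH escaping from the builder stage.
# Assumption: CMAKE_ARGS is later spliced unquoted into a /bin/sh command line.
set -euo pipefail

# Same parameter expansion as the Dockerfile: replace every ';' with '\;'.
escape_cuda_arch() {
  local arch="$1"
  printf '%s\n' "${arch//;/\\;}"
}

# Hypothetical stand-in for a Makefile recipe: a fresh shell re-parses the
# text, and printf prints each resulting argument on its own line.
reparse() {
  sh -c "printf '%s\n' $1"
}

escaped="$(escape_cuda_arch '75;86;89;120')"
echo "escaped:   ${escaped}"

# The escaped form survives re-parsing as one argument, with plain ';' again.
echo "reparsed:  $(reparse "-DCMAKE_CUDA_ARCHITECTURES=${escaped}")"

# Unescaped, the re-parsing shell stops at the first ';' and then tries to run
# "86", "89" and "120" as commands, which fail (their errors are discarded).
echo "unescaped: $(reparse '-DCMAKE_CUDA_ARCHITECTURES=75;86;89;120' 2>/dev/null || true)"
```

If the real Makefile quotes CMAKE_ARGS differently, the escape may serve another purpose. The sketch only shows what the expansion produces and how a shell treats the two forms.
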