mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-11 10:19:33 -04:00
The 234-entry backend matrix runs the same apt-update + GPU SDK install + Python toolchain bootstrap into N independent registry-cache tags. Factor that shared work out into a tier-1+2 base image (lang × accel × ubuntu × cuda) built once per workflow run, then consumed by every backend that matches its tuple via BASE_IMAGE_PREBUILT.

The matrix data moves to .github/backend-matrix.yaml so backend.yml can switch to fromJSON without duplicating the matrix. scripts/changed-backends.js reads the data file, derives the deduplicated bases-matrix, annotates each Python entry with the right base-image-prebuilt ref, and runs a collision check that fails loudly if a future matrix change makes two consumers want incompatible bases under the same tag-stem.

PR builds tag with -pr<N> so end-to-end validation lives within one PR; master builds tag without the suffix. The base-images registry cache parallels the existing per-matrix-entry caches.

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding a new language tier is a Dockerfile.<lang> recipe + a slim of the consumer Dockerfile (script auto-detects via .docker/bases/).

10 distinct bases derive from the current 234 entries, replacing the inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
58 lines
2.2 KiB
Docker
# Builds a single Python backend on top of the shared
# .docker/bases/Dockerfile.python base. The base bakes in apt-update + GPU
# SDK install + Python toolchain (uv, pip, rustup, grpcio-tools), so this
# stage only carries the per-backend source COPY + `make`.
#
# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) builds
# the right base flavour automatically via scripts/derive-build-matrix.js
# and passes BASE_IMAGE_PREBUILT here. For local builds, run:
#     make backend-image-base BUILD_TYPE=<...>   # build the base
#     make backend-image BACKEND=<...> BUILD_TYPE=<...>
# See .agents/ci-caching.md.

ARG BASE_IMAGE_PREBUILT

FROM ${BASE_IMAGE_PREBUILT} AS builder

ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}

COPY backend/python/${BACKEND} /${BACKEND}
COPY backend/backend.proto /${BACKEND}/backend.proto
COPY backend/python/common/ /${BACKEND}/common
COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh

# Optional per-backend source build toggle (e.g. vllm on CPU can set
# FROM_SOURCE=true to compile against the build host SIMD instead of
# pulling a prebuilt wheel). Default is empty; most backends ignore it.
ARG FROM_SOURCE=""
ENV FROM_SOURCE=${FROM_SOURCE}

# Cache-buster for the per-backend `make` step. Most Python backends list
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
# would otherwise freeze upstream versions indefinitely. CI passes a value
# that rolls weekly so the install layer is rebuilt at most once per week
# and picks up newer wheels from PyPI / nightly indexes.
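#
# Illustrative only (an assumption, not taken from the CI workflows): for a
# local build, an ISO year-week string is one value that changes once per
# week, for example:
#     docker build --build-arg DEPS_REFRESH="$(date -u +%G-W%V)" ...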
ARG DEPS_REFRESH=initial

RUN cd /${BACKEND} && PORTABLE_PYTHON=true make

# Package GPU libraries into the backend's lib directory
RUN mkdir -p /${BACKEND}/lib && \
    TARGET_LIB_DIR="/${BACKEND}/lib" BUILD_TYPE="${BUILD_TYPE}" CUDA_MAJOR_VERSION="${CUDA_MAJOR_VERSION}" \
    bash /package-gpu-libs.sh "/${BACKEND}/lib"

# Run backend-specific packaging if a package.sh exists
RUN if [ -f "/${BACKEND}/package.sh" ]; then \
        cd /${BACKEND} && bash package.sh; \
    fi

FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /