ci: layered Python base images for cross-matrix dedup

The 234-entry backend matrix runs the same apt-update + GPU SDK install + Python toolchain bootstrap into N independent registry-cache tags. Factor that shared work out into a tier-1+2 base image (lang × accel × ubuntu × cuda) built once per workflow run, then consumed by every backend that matches its tuple via BASE_IMAGE_PREBUILT.

The matrix data moves to .github/backend-matrix.yaml so backend.yml can switch to fromJSON without duplicating the matrix. scripts/changed-backends.js reads the data file, derives the deduplicated bases-matrix, annotates each Python entry with the right base-image-prebuilt ref, and runs a collision check that fails loudly if a future matrix change makes two consumers want incompatible bases under the same tag-stem.

PR builds tag with -pr<N> so end-to-end validation lives within one PR; master builds tag without the suffix. The base-images registry cache parallels the existing per-matrix-entry caches.

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding a new language tier is a Dockerfile.<lang> recipe + a slim of the consumer Dockerfile (script auto-detects via .docker/bases/).

10 distinct bases derive from the current 234 entries, replacing the inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-19 14:17:21 -04:00 · 2026-05-05 10:34:26 +01:00
parent 4e154b59e5
commit a3b7c3a819
10 changed files with 4069 additions and 3461 deletions
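The commit message describes deriving a deduplicated bases-matrix and failing loudly on incompatible consumers. The sketch below is one way that derivation could look; it is not the code in scripts/changed-backends.js. The function name `deriveBasesSketch` and the `stemOf` callback (standing in for the real `tagStem()`) are hypothetical. The field names follow the inputs that backend_pr.yml passes to base_images.yml. Unlike the documented check, this version always compares `skip-drivers`, regardless of build-type.

```javascript
// Hypothetical sketch; not the code in scripts/changed-backends.js.
// `stemOf` stands in for the real tagStem() and is supplied by the caller.
function deriveBasesSketch(entries, stemOf) {
  const bases = new Map();
  for (const entry of entries) {
    const stem = stemOf(entry);
    const skipDrivers = String(entry['skip-drivers'] ?? 'false');
    const platforms = String(entry.platforms).split(',');
    const seen = bases.get(stem);
    if (!seen) {
      // First consumer of this stem defines the base.
      bases.set(stem, {
        'tag-stem': stem,
        'base-image': entry['base-image'],
        'skip-drivers': skipDrivers,
        platforms: new Set(platforms),
      });
      continue;
    }
    // Later consumers must agree on the inputs that shape the base image.
    if (seen['base-image'] !== entry['base-image'] ||
        seen['skip-drivers'] !== skipDrivers) {
      throw new Error(
        `collision on tag-stem "${stem}": consumers disagree on base-image or skip-drivers`);
    }
    // The base's platforms are the union of its consumers' platforms.
    for (const p of platforms) seen.platforms.add(p);
  }
  return [...bases.values()].map((base) => ({
    ...base,
    platforms: [...base.platforms].sort().join(','),
  }));
}
```

Throwing instead of keeping the first entry matches the stated intent: a silent winner would hand one consumer a base it was not built for.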
--- a/.agents/ci-caching.md
+++ b/.agents/ci-caching.md
@@ -101,6 +101,134 @@ For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=cca

 GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.

+## Layered base images (`localai-base`)
+
+The registry-backed BuildKit cache deduplicates **within** a matrix entry's
+cache tag, but each matrix entry has its own tag — so the same `apt-get`,
+GPU SDK install, and language toolchain bootstrap runs into N different
+cache tags across the backend matrix. The `localai-base` images factor that
+shared work out of the per-backend builds.
+
+### How it fits together
+
+```
+.github/backend-matrix.yaml          # raw matrix data (linux + darwin)
+   │
+   ▼
+backend.yml / backend_pr.yml
+  ├── derive-bases / generate-matrix
+  │     scripts/changed-backends.js
+  │       reads .github/backend-matrix.yaml
+  │       (PR mode also reads changed files)
+  │       emits:
+  │         - matrix         (annotated with base-image-prebuilt)
+  │         - matrix-darwin
+  │         - bases-matrix   (deduplicated by tag-stem)
+  │
+  ├── build-bases  (matrix: bases-matrix)
+  │     uses base_images.yml
+  │       FROM .docker/bases/Dockerfile.<lang>
+  │       pushes quay.io/go-skynet/localai-base:<stem>[-pr<N>]
+  │
+  └── backend-jobs  (matrix: matrix; needs build-bases)
+        uses backend_build.yml
+          FROM ${BASE_IMAGE_PREBUILT}
+            i.e. quay.io/go-skynet/localai-base:<stem>[-pr<N>]
+          only the backend source COPY + `make` remain.
+```
+
+The base image is **always** built before backends consume it, in the same
+workflow run. There is no cross-workflow dependency, no chicken-and-egg
+on first push, and no manual matrix to keep in sync — adding a backend
+matrix entry is just an edit to `.github/backend-matrix.yaml`.
+
+### Tag scheme
+
+`<stem>` is computed by `tagStem()` in `scripts/changed-backends.js` from
+the (lang, build-type, ubuntu, cuda, base-image) tuple. Arch is
+intentionally NOT in the stem — bases are built multi-arch when any
+consumer needs multi-arch, and single-arch otherwise (the `platforms`
+field on each base entry is the union of its consumers' platforms).
+
+| Build-type | Stem template |
+|---|---|
+| `''` (CPU) | `<lang>-cpu-<ubuntu>[-<base-image-slug>]` |
+| `cublas` / `l4t` | `<lang>-<build-type>-<ubuntu>-cuda<major>.<minor>[-<base-image-slug>]` |
+| anything else (vulkan, hipblas, intel, sycl_*) | `<lang>-<build-type>-<ubuntu>[-<base-image-slug>]` |
+
+The base-image slug is empty for the default `ubuntu:24.04` and a short
+parseable suffix otherwise (`jetpack-r36.4.0`, `rocm-7.2.1`,
+`oneapi-2025.3.2`, etc.).
+
+| Event | Pushed tag |
+|---|---|
+| `push` (master/tag) | `:<stem>` |
+| `pull_request` | `:<stem>-pr<PR_NUMBER>` |
+
+The cache for the base build itself lives at
+`quay.io/go-skynet/ci-cache:base-<stem>` (`mode=max,ignore-error=true`),
+parallel to the per-matrix-entry caches.
+
+The script also runs a collision check across consumers of each stem: if
+two consumers map to the same stem but disagree on `base-image` or
+`skip-drivers` (and skip-drivers is meaningful for that build-type), the
+script fails loudly. Resolve by encoding the differing input in
+`tagStem()` rather than letting the dedup silently pick a winner.
+
+### PR testability
+
+PRs run the same pipeline as master: derive bases → build bases (tagged
+`-pr<N>`) → run filtered backend matrix consuming those `-pr<N>` tags.
+End-to-end validation always lives within the PR.
+
+For PRs that only change `.docker/bases/Dockerfile.<lang>` (no backend
+source touched), `changed-backends.js` adds one canary backend matrix
+entry per (lang × build-type × arch × cuda × ubuntu) tuple to the filtered
+matrix so each base flavour gets exercised.
+
+### Adding a new (accel × arch × cuda × lang) flavour
+
+Just add the matrix entry to `.github/backend-matrix.yaml` for the new
+flavour. The bases matrix and the per-entry `base-image-prebuilt` are
+derived automatically by `scripts/changed-backends.js`. Nothing else to
+change.
+
+### Adding a new language tier (e.g. golang)
+
+1. Create `.docker/bases/Dockerfile.<lang>` mirroring `Dockerfile.python`
+   (apt + accel install + lang-specific toolchain).
+2. Slim `backend/Dockerfile.<lang>` to `FROM ${BASE_IMAGE_PREBUILT}` plus
+   the per-backend source COPY + build (no inline accel install).
+
+The `langsWithBase` set in `scripts/changed-backends.js` is auto-detected
+from the `.docker/bases/` directory at script startup, so step 1 alone is
+enough for the script to start emitting bases (and annotating matrix
+entries with `base-image-prebuilt`) for that lang.
+
+### Why not just rely on `mode=max` cache?
+
+`mode=max` deduplicates at the layer level, but each matrix entry has its
+own cache tag (`cache<tag-suffix>`). A change that invalidates the GPU SDK
+layer in one backend does not invalidate it in any other; each entry pays
+the full cost on its next rebuild. The shared base image is built once per
+(accel × arch × cuda × lang), then pulled by every backend that consumes
+it — that's the actual cross-matrix dedup.
+
+### Local builds
+
+`backend/Dockerfile.python` requires `BASE_IMAGE_PREBUILT` (no inline
+fallback). For local development:
+
+```bash
+# Build a base flavour locally
+make docker-build-python-base BUILD_TYPE=cublas CUDA_MAJOR_VERSION=12 CUDA_MINOR_VERSION=9
+
+# Build a backend on top of it (Python backends pull in the base target as a prerequisite)
+make docker-build-mlx-vlm BUILD_TYPE=cublas CUDA_MAJOR_VERSION=12 CUDA_MINOR_VERSION=9
+```
+
+Or pull a pre-built base from quay if it exists for your target tuple.
+
 ## Touching the cache pipeline

 When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
@@ -109,3 +237,4 @@ When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Doc
 2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
 3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
 4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.
+5. **`tagStem()` in `scripts/changed-backends.js` is the single source of truth for base image tags.** The matrix entries are annotated with `base-image-prebuilt` in the same script run; backend-jobs reads the value as-is. There's no parallel YAML expression to keep in sync. Adding a new dimension to the stem (e.g. a slug for a new base-image variant) is a script change only.
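The stem templates in the "Tag scheme" table above can be sketched as a small function. This is an illustration built only from the documented templates, not the real `tagStem()` in `scripts/changed-backends.js`. The name `tagStemSketch` is hypothetical. The base-image slug is taken as a precomputed argument because the document gives example slugs but not the rule that produces them.

```javascript
// Hypothetical sketch of the documented stem templates; not the real tagStem().
// `baseImageSlug` is passed in because the slug derivation is not documented here.
function tagStemSketch(entry, baseImageSlug = '') {
  const buildType = entry['build-type'] || '';
  const parts = [
    entry.lang,
    buildType === '' ? 'cpu' : buildType,   // empty build-type means CPU
    entry['ubuntu-version'] || '2404',
  ];
  // Only cublas and l4t stems carry the CUDA version.
  if (buildType === 'cublas' || buildType === 'l4t') {
    parts.push(`cuda${entry['cuda-major-version']}.${entry['cuda-minor-version']}`);
  }
  // The slug is empty for the default ubuntu:24.04 base image.
  if (baseImageSlug) parts.push(baseImageSlug);
  return parts.join('-');
}

console.log(tagStemSketch({
  lang: 'python',
  'build-type': 'cublas',
  'ubuntu-version': '2404',
  'cuda-major-version': '12',
  'cuda-minor-version': '9',
})); // prints "python-cublas-2404-cuda12.9"
```

Arch never enters the stem, consistent with the note that a base's `platforms` is the union of its consumers' platforms.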
--- a/.docker/bases/Dockerfile.python
+++ b/.docker/bases/Dockerfile.python
@@ -0,0 +1,212 @@
+# Shared Python + accelerator base image.
+#
+# Built once per (build-type, ubuntu-version, cuda-version, base-image)
+# tag-stem by .github/workflows/base_images.yml and pushed to
+# quay.io/go-skynet/localai-base:<tag-stem>[-pr<N>]. Consumed by
+# backend/Dockerfile.python via the BASE_IMAGE_PREBUILT build-arg.
+#
+# backend/Dockerfile.python no longer carries an inline fallback, so the
+# accelerator install steps live only in this file. See
+# .agents/ci-caching.md for the tagging scheme.
+
+ARG BASE_IMAGE=ubuntu:24.04
+ARG APT_MIRROR=""
+ARG APT_PORTS_MIRROR=""
+
+FROM ${BASE_IMAGE}
+
+ARG BUILD_TYPE
+ENV BUILD_TYPE=${BUILD_TYPE}
+ARG CUDA_MAJOR_VERSION
+ARG CUDA_MINOR_VERSION
+ARG SKIP_DRIVERS=false
+ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
+ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
+ENV DEBIAN_FRONTEND=noninteractive
+ARG TARGETARCH
+ARG TARGETVARIANT
+ARG UBUNTU_VERSION=2404
+ARG APT_MIRROR
+ARG APT_PORTS_MIRROR
+
+LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI"
+LABEL org.opencontainers.image.description="LocalAI Python+accelerator base image"
+LABEL org.localai.base.lang="python"
+
+RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
+    APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+        build-essential \
+        ccache \
+        ca-certificates \
+        espeak-ng \
+        curl \
+        libssl-dev \
+        git wget \
+        git-lfs \
+        unzip clang \
+        upx-ucl \
+        curl python3-pip \
+        python-is-python3 \
+        python3-dev llvm \
+        libnuma1 libgomp1 \
+        python3-venv make cmake && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN <<EOT bash
+    if [ "${UBUNTU_VERSION}" = "2404" ]; then
+        pip install --break-system-packages --user --upgrade pip
+    else
+        pip install --upgrade pip
+    fi
+EOT
+
+# Cuda
+ENV PATH=/usr/local/cuda/bin:${PATH}
+
+# HipBLAS requirements
+ENV PATH=/opt/rocm/bin:${PATH}
+
+# Vulkan requirements
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils wget gpg-agent && \
+        apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
+            libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
+            libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
+            git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
+            ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
+            clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
+        if [ "amd64" = "$TARGETARCH" ]; then
+            wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
+            tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            mkdir -p /opt/vulkan-sdk && \
+            mv 1.4.335.0 /opt/vulkan-sdk/ && \
+            cd /opt/vulkan-sdk/1.4.335.0 && \
+            ./vulkansdk --no-deps --maxjobs \
+                vulkan-loader \
+                vulkan-validationlayers \
+                vulkan-extensionlayer \
+                vulkan-tools \
+                shaderc && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
+            rm -rf /opt/vulkan-sdk
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            mkdir vulkan && cd vulkan && \
+            curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
+            tar -xvf vulkan-sdk.tar.xz && \
+            rm vulkan-sdk.tar.xz && \
+            cd 1.4.335.0 && \
+            cp -rfv aarch64/bin/* /usr/bin/ && \
+            cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
+            cp -rfv aarch64/include/* /usr/include/ && \
+            cp -rfv aarch64/share/* /usr/share/ && \
+            cd ../.. && \
+            rm -rf vulkan
+        fi
+        ldconfig && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+    fi
+EOT
+
+# CuBLAS requirements
+RUN <<EOT bash
+    if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils
+        if [ "amd64" = "$TARGETARCH" ]; then
+            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
+            else
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
+            fi
+        fi
+        dpkg -i cuda-keyring_1.1-1_all.deb && \
+        rm -f cuda-keyring_1.1-1_all.deb && \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
+            apt-get install -y --no-install-recommends \
+            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        fi
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+    fi
+EOT
+
+
+# https://github.com/NVIDIA/Isaac-GR00T/issues/343
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
+        wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
+        wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get install -y nvpl
+    fi
+EOT
+
+# If we are building with clblas support, we need the libraries for the builds
+RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            libclblast-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* \
+    ; fi
+
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            hipblas-dev \
+            hipblaslt-dev \
+            rocblas-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
+        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
+        ldconfig \
+    ; fi
+
+RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
+    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
+    ; fi
+
+# Install uv as a system package
+RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
+ENV PATH="/root/.cargo/bin:${PATH}"
+# Increase timeout for uv installs behind slow networks
+ENV UV_HTTP_TIMEOUT=180
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+
+# Install grpcio-tools (the version in 22.04 is too old)
+RUN <<EOT bash
+    if [ "${UBUNTU_VERSION}" = "2404" ]; then
+        pip install --break-system-packages --user grpcio-tools==1.71.0 grpcio==1.71.0
+    else
+        pip install grpcio-tools==1.71.0 grpcio==1.71.0
+    fi
+EOT
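.agents/ci-caching.md says the set of languages with a base is auto-detected from the `.docker/bases/` directory, which is why adding a file such as the one above is enough for the script to start emitting bases. A minimal sketch of that detection follows. It assumes the convention is simply "every file named `Dockerfile.<lang>` counts"; the helper name `langsFromFilenames` is hypothetical and the real script may filter differently.

```javascript
// Hypothetical helper: extract <lang> from every "Dockerfile.<lang>" filename.
// Anything that does not match the pattern (README, scripts, ...) is ignored.
function langsFromFilenames(names) {
  const langs = new Set();
  for (const name of names) {
    const match = /^Dockerfile\.([A-Za-z0-9_+-]+)$/.exec(name);
    if (match) langs.add(match[1]);
  }
  return langs;
}

// In a script this would typically be fed from the directory listing, e.g.
//   langsFromFilenames(require('node:fs').readdirSync('.docker/bases'))
console.log([...langsFromFilenames(['Dockerfile.python', 'README.md'])]);
// prints [ 'python' ]
```

Keeping the filename parsing separate from the directory read makes the rule easy to test without touching the filesystem.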
--- a/.github/backend-matrix.yaml
+++ b/.github/backend-matrix.yaml
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -63,6 +63,16 @@ on:
        required: false
        default: ''
        type: string
+      base-image-prebuilt:
+        description: |
+          Optional reference to a prebuilt accel/lang base image
+          (quay.io/go-skynet/localai-base:<tag>). When set, the backend
+          Dockerfile FROMs this image instead of running the inline
+          bootstrap. See .github/workflows/base_images.yml and
+          .agents/ci-caching.md.
+        required: false
+        default: ''
+        type: string
    secrets:
      dockerUsername:
        required: false
@@ -228,6 +238,7 @@ jobs:
            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
            DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+            BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
@@ -254,6 +265,7 @@ jobs:
            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
            DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+            BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
--- a/.github/workflows/backend_pr.yml
+++ b/.github/workflows/backend_pr.yml
@@ -13,8 +13,10 @@ jobs:
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
      matrix-darwin: ${{ steps.set-matrix.outputs.matrix-darwin }}
+      bases-matrix: ${{ steps.set-matrix.outputs.bases-matrix }}
      has-backends: ${{ steps.set-matrix.outputs.has-backends }}
      has-backends-darwin: ${{ steps.set-matrix.outputs.has-backends-darwin }}
+      has-bases: ${{ steps.set-matrix.outputs.has-bases }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
@@ -27,7 +29,8 @@ jobs:
          bun add js-yaml
          bun add @octokit/core

-      # filters the matrix in backend.yml
+      # Filters the matrix from backend.yml against this PR's changed files
+      # AND derives the deduplicated bases-matrix consumed by build-bases.
      - name: Filter matrix for changed backends
        id: set-matrix
        env:
@@ -35,10 +38,34 @@ jobs:
          GITHUB_EVENT_PATH: ${{ github.event_path }}
        run: bun run scripts/changed-backends.js

-  backend-jobs:
+  build-bases:
    needs: generate-matrix
+    if: needs.generate-matrix.outputs.has-bases == 'true'
+    strategy:
+      fail-fast: false
+      matrix: ${{ fromJSON(needs.generate-matrix.outputs.bases-matrix) }}
+    uses: ./.github/workflows/base_images.yml
+    with:
+      lang: ${{ matrix.lang }}
+      base-image: ${{ matrix.base-image }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      ubuntu-version: ${{ matrix.ubuntu-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      tag-stem: ${{ matrix.tag-stem }}
+      skip-drivers: ${{ matrix.skip-drivers }}
+    secrets:
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+
+  backend-jobs:
+    needs: [generate-matrix, build-bases]
    uses: ./.github/workflows/backend_build.yml
-    if: needs.generate-matrix.outputs.has-backends == 'true'
+    if: |
+      always() && needs.generate-matrix.outputs.has-backends == 'true' &&
+      (needs.build-bases.result == 'success' || needs.build-bases.result == 'skipped')
    with:
      tag-latest: ${{ matrix.tag-latest }}
      tag-suffix: ${{ matrix.tag-suffix }}
@@ -54,12 +81,17 @@ jobs:
      context: ${{ matrix.context }}
      ubuntu-version: ${{ matrix.ubuntu-version }}
      amdgpu-targets: ${{ matrix.amdgpu-targets || 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201' }}
+      # The script annotates each filtered Python entry with the prebuilt
+      # base ref it should consume; non-Python entries get '' and run their
+      # own inline bootstrap.
+      base-image-prebuilt: ${{ matrix.base-image-prebuilt || '' }}
    secrets:
      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
    strategy:
      fail-fast: true
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
+
  backend-jobs-darwin:
    needs: generate-matrix
    uses: ./.github/workflows/backend_build_darwin.yml
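The `if:` condition on `backend-jobs` above matters because GitHub Actions skips a job by default when any job in its `needs` list was skipped or failed; a status-check function such as `always()` is needed to override that. The predicate is restated below as plain JavaScript to make the outcomes easy to read. The function name is hypothetical and the result strings are the values GitHub reports in `needs.<job>.result`.

```javascript
// Hypothetical restatement of the backend-jobs `if:` expression.
// hasBackends is the string output of generate-matrix ('true' or 'false').
// buildBasesResult is needs.build-bases.result:
//   'success' | 'failure' | 'cancelled' | 'skipped'
function shouldRunBackendJobs(hasBackends, buildBasesResult) {
  return hasBackends === 'true' &&
    (buildBasesResult === 'success' || buildBasesResult === 'skipped');
}

console.log(shouldRunBackendJobs('true', 'skipped')); // prints true
console.log(shouldRunBackendJobs('true', 'failure')); // prints false
```

The `skipped` case covers runs where `has-bases` is not `'true'`, so no base is built; those runs should still build their backends.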
--- a/.github/workflows/base_images.yml
+++ b/.github/workflows/base_images.yml
@@ -0,0 +1,145 @@
+---
+name: 'build base image (reusable)'
+
+# Builds and pushes one (lang, accel, arch, ubuntu, cuda) base image flavour
+# to quay.io/go-skynet/localai-base. Consumed by backend builds via the
+# BASE_IMAGE_PREBUILT build-arg. PR builds tag with `-pr${PR_NUMBER}` so the
+# same PR's backend matrix can opt-in to the freshly-built base; master
+# builds overwrite the unsuffixed tag for downstream consumption. See
+# .agents/ci-caching.md for the full tagging scheme.
+
+on:
+  workflow_call:
+    inputs:
+      lang:
+        description: 'Language toolchain (matches .docker/bases/Dockerfile.<lang>)'
+        required: true
+        type: string
+      base-image:
+        description: 'Upstream base image (ubuntu:24.04, rocm/dev-ubuntu-24.04:..., etc.)'
+        required: true
+        type: string
+      build-type:
+        description: 'BUILD_TYPE: empty for CPU, cublas, hipblas, vulkan, l4t, ...'
+        default: ''
+        type: string
+      cuda-major-version:
+        description: 'CUDA major version (only meaningful for cublas/l4t)'
+        default: '12'
+        type: string
+      cuda-minor-version:
+        description: 'CUDA minor version'
+        default: '9'
+        type: string
+      ubuntu-version:
+        description: 'Ubuntu version code (2204, 2404)'
+        default: '2404'
+        type: string
+      platforms:
+        description: 'Target platforms, the union of all consumers (e.g. linux/amd64 or linux/amd64,linux/arm64)'
+        required: true
+        type: string
+      runs-on:
+        description: 'Runner label'
+        required: true
+        type: string
+      tag-stem:
+        description: 'Stable portion of the image tag (e.g. python-cpu-2404)'
+        required: true
+        type: string
+      skip-drivers:
+        description: 'Pass-through to the base Dockerfile'
+        default: 'false'
+        type: string
+    secrets:
+      quayUsername:
+        required: false
+      quayPassword:
+        required: false
+    outputs:
+      image-ref:
+        description: 'Full image reference of the built base'
+        value: ${{ jobs.base-build.outputs.image-ref }}
+
+jobs:
+  base-build:
+    runs-on: ${{ inputs.runs-on }}
+    env:
+      quay_username: ${{ secrets.quayUsername }}
+    outputs:
+      image-ref: ${{ steps.compute_ref.outputs.ref }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Configure apt mirror on runner
+        id: apt_mirror
+        uses: ./.github/actions/configure-apt-mirror
+
+      - name: Free Disk Space (Ubuntu)
+        if: inputs.runs-on == 'ubuntu-latest'
+        uses: jlumbroso/free-disk-space@main
+        with:
+          tool-cache: true
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          docker-images: true
+          swap-storage: true
+
+      - name: Compute image ref
+        id: compute_ref
+        run: |
+          stem='${{ inputs.tag-stem }}'
+          if [ "${{ github.event_name }}" = "pull_request" ]; then
+            tag="${stem}-pr${{ github.event.number }}"
+          else
+            tag="${stem}"
+          fi
+          echo "tag=${tag}" >> "$GITHUB_OUTPUT"
+          echo "ref=quay.io/go-skynet/localai-base:${tag}" >> "$GITHUB_OUTPUT"
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@master
+        with:
+          platforms: all
+
+      - name: Set up Docker Buildx
+        id: buildx
+        uses: docker/setup-buildx-action@master
+
+      - name: Login to Quay.io
+        if: ${{ env.quay_username != '' }}
+        uses: docker/login-action@v4
+        with:
+          registry: quay.io
+          username: ${{ secrets.quayUsername }}
+          password: ${{ secrets.quayPassword }}
+
+      - name: Build and push base image
+        uses: docker/build-push-action@v7
+        with:
+          builder: ${{ steps.buildx.outputs.name }}
+          context: .
+          file: ./.docker/bases/Dockerfile.${{ inputs.lang }}
+          build-args: |
+            BUILD_TYPE=${{ inputs.build-type }}
+            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
+            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
+            BASE_IMAGE=${{ inputs.base-image }}
+            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            SKIP_DRIVERS=${{ inputs.skip-drivers }}
+            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
+            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
+          platforms: ${{ inputs.platforms }}
+          # Push on PRs as well (if creds present) so the PR's backend matrix
+          # can opt-in to the freshly-built base via -pr${N} tag.
+          push: ${{ env.quay_username != '' }}
+          tags: ${{ steps.compute_ref.outputs.ref }}
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }}
+          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }},mode=max,ignore-error=true
+
+      - name: job summary
+        run: |
+          echo "Built base image: ${{ steps.compute_ref.outputs.ref }}" >> "$GITHUB_STEP_SUMMARY"
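The "Compute image ref" step and the cache settings in the workflow above reduce to two string templates. The sketch below mirrors them in JavaScript, for example to compute the same ref from a script. The function name is hypothetical and the PR number in the usage line is an arbitrary example; the registry paths are the ones used in the workflow.

```javascript
// Hypothetical mirror of the "Compute image ref" step and the cache refs
// in base_images.yml.
function baseImageRefsSketch(stem, eventName, prNumber) {
  // Only pull_request builds get the -pr<N> suffix on the pushed tag.
  const tag = eventName === 'pull_request' ? `${stem}-pr${prNumber}` : stem;
  return {
    image: `quay.io/go-skynet/localai-base:${tag}`,
    // The cache ref uses the bare stem, so PR and master builds share it.
    cache: `quay.io/go-skynet/ci-cache:base-${stem}`,
  };
}

console.log(baseImageRefsSketch('python-cpu-2404', 'pull_request', 1234).image);
// prints "quay.io/go-skynet/localai-base:python-cpu-2404-pr1234"
```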
--- a/Makefile
+++ b/Makefile
@@ -1094,6 +1094,32 @@ BACKEND_KOKOROS = kokoros|rust|.|false|true
 # C++ backends (Go wrapper with purego)
 BACKEND_SAM3_CPP = sam3-cpp|golang|.|false|true

+# Tag stem for the local prebuilt base image. Mirrors tagStem() in
+# scripts/changed-backends.js (minus the base-image slug), so a
+# `make docker-build-X` produces the same FROM ref shape that CI uses
+# for the default ubuntu:24.04 base.
+LOCAL_BASE_BUILD_TYPE := $(or $(BUILD_TYPE),cpu)
+LOCAL_BASE_UBUNTU_VERSION := $(or $(UBUNTU_VERSION),2404)
+LOCAL_BASE_CUDA_SUFFIX := $(if $(filter cublas l4t,$(BUILD_TYPE)),-cuda$(CUDA_MAJOR_VERSION).$(CUDA_MINOR_VERSION))
+LOCAL_BASE_PYTHON_TAG := localai-base:python-$(LOCAL_BASE_BUILD_TYPE)-$(LOCAL_BASE_UBUNTU_VERSION)$(LOCAL_BASE_CUDA_SUFFIX)
+
+# Build the Python+accelerator base image locally. Backend builds depend on
+# this; PHONY so docker handles its own layer caching.
+.PHONY: docker-build-python-base
+docker-build-python-base:
+	docker build \
+		--build-arg BUILD_TYPE=$(BUILD_TYPE) \
+		--build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \
+		--build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \
+		--build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \
+		--build-arg UBUNTU_VERSION=$(LOCAL_BASE_UBUNTU_VERSION) \
+		--build-arg APT_MIRROR=$(APT_MIRROR) \
+		--build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \
+		$(if $(SKIP_DRIVERS),--build-arg SKIP_DRIVERS=$(SKIP_DRIVERS)) \
+		-t $(LOCAL_BASE_PYTHON_TAG) \
+		-f .docker/bases/Dockerfile.python \
+		.
+
 # Helper function to build docker image for a backend
 # Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
 define docker-build-backend
@@ -1106,15 +1132,19 @@ define docker-build-backend
 		--build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \
 		--build-arg APT_MIRROR=$(APT_MIRROR) \
 		--build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \
+		$(if $(filter python,$(2)),--build-arg BASE_IMAGE_PREBUILT=$(LOCAL_BASE_PYTHON_TAG)) \
 		$(if $(FROM_SOURCE),--build-arg FROM_SOURCE=$(FROM_SOURCE)) \
 		$(if $(AMDGPU_TARGETS),--build-arg AMDGPU_TARGETS=$(AMDGPU_TARGETS)) \
 		$(if $(filter true,$(5)),--build-arg BACKEND=$(1)) \
 		-t local-ai-backend:$(1) -f backend/Dockerfile.$(2) $(3)
 endef

-# Generate docker-build targets from backend definitions
+# Generate docker-build targets from backend definitions. Python backends
+# get docker-build-python-base as a prerequisite so the layered base is
+# always present locally. Other dockerfile types still build their own
+# inline bootstrap from their respective Dockerfile.<lang>.
 define generate-docker-build-target
-docker-build-$(word 1,$(subst |, ,$(1))):
+docker-build-$(word 1,$(subst |, ,$(1))): $(if $(filter python,$(word 2,$(subst |, ,$(1)))),docker-build-python-base)
 	$$(call docker-build-backend,$(word 1,$(subst |, ,$(1))),$(word 2,$(subst |, ,$(1))),$(word 3,$(subst |, ,$(1))),$(word 4,$(subst |, ,$(1))),$(word 5,$(subst |, ,$(1))))
 endef

--- a/backend/Dockerfile.python
+++ b/backend/Dockerfile.python
@@ -1,202 +1,26 @@
-ARG BASE_IMAGE=ubuntu:24.04
-ARG APT_MIRROR=""
-ARG APT_PORTS_MIRROR=""
+# Builds a single Python backend on top of the shared
+# .docker/bases/Dockerfile.python base. The base bakes in apt-update + GPU
+# SDK install + python toolchain (uv, pip, rustup, grpcio-tools), so this
+# stage only carries the per-backend source COPY + `make`.
+#
+# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) builds
+# the right base flavour automatically via scripts/changed-backends.js
+# and passes BASE_IMAGE_PREBUILT here. For local builds, run:
+#   make docker-build-python-base BUILD_TYPE=<...>   # build the base
+#   make docker-build-<backend> BUILD_TYPE=<...>     # builds the base first
+# See .agents/ci-caching.md.
+
+ARG BASE_IMAGE_PREBUILT
+
+FROM ${BASE_IMAGE_PREBUILT} AS builder

-FROM ${BASE_IMAGE} AS builder
 ARG BACKEND=rerankers
 ARG BUILD_TYPE
 ENV BUILD_TYPE=${BUILD_TYPE}
 ARG CUDA_MAJOR_VERSION
 ARG CUDA_MINOR_VERSION
-ARG SKIP_DRIVERS=false
 ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
 ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
-ENV DEBIAN_FRONTEND=noninteractive
-ARG TARGETARCH
-ARG TARGETVARIANT
-ARG UBUNTU_VERSION=2404
-ARG APT_MIRROR
-ARG APT_PORTS_MIRROR
-
-RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
-    APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
-    apt-get update && \
-    apt-get install -y --no-install-recommends \
-        build-essential \
-        ccache \
-        ca-certificates \
-        espeak-ng \
-        curl \
-        libssl-dev \
-        git wget \
-        git-lfs \
-        unzip clang \
-        upx-ucl \
-        curl python3-pip \
-        python-is-python3 \
-        python3-dev llvm \
-        libnuma1 libgomp1 \
-        python3-venv make cmake && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-RUN <<EOT bash
-    if [ "${UBUNTU_VERSION}" = "2404" ]; then
-        pip install --break-system-packages --user --upgrade pip
-    else
-        pip install --upgrade pip
-    fi
-EOT
-
-
-# Cuda
-ENV PATH=/usr/local/cuda/bin:${PATH}
-
-# HipBLAS requirements
-ENV PATH=/opt/rocm/bin:${PATH}
-
-# Vulkan requirements
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils wget gpg-agent && \
-        apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
-            libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
-            libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
-            git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
-            ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
-            clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
-        if [ "amd64" = "$TARGETARCH" ]; then
-            wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
-            tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
-            rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
-            mkdir -p /opt/vulkan-sdk && \
-            mv 1.4.335.0 /opt/vulkan-sdk/ && \
-            cd /opt/vulkan-sdk/1.4.335.0 && \
-            ./vulkansdk --no-deps --maxjobs \
-                vulkan-loader \
-                vulkan-validationlayers \
-                vulkan-extensionlayer \
-                vulkan-tools \
-                shaderc && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
-            rm -rf /opt/vulkan-sdk
-        fi
-        if [ "arm64" = "$TARGETARCH" ]; then
-            mkdir vulkan && cd vulkan && \
-            curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
-            tar -xvf vulkan-sdk.tar.xz && \
-            rm vulkan-sdk.tar.xz && \
-            cd 1.4.335.0 && \
-            cp -rfv aarch64/bin/* /usr/bin/ && \
-            cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
-            cp -rfv aarch64/include/* /usr/include/ && \
-            cp -rfv aarch64/share/* /usr/share/ && \
-            cd ../.. && \
-            rm -rf vulkan
-        fi
-        ldconfig && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
-
-# CuBLAS requirements
-RUN <<EOT bash
-    if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils
-        if [ "amd64" = "$TARGETARCH" ]; then
-            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
-        fi
-        if [ "arm64" = "$TARGETARCH" ]; then
-            if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
-                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
-            else
-                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
-            fi
-        fi
-        dpkg -i cuda-keyring_1.1-1_all.deb && \
-        rm -f cuda-keyring_1.1-1_all.deb && \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
-        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
-            apt-get install -y --no-install-recommends \
-            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
-        fi
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
-
-
-# https://github.com/NVIDIA/Isaac-GR00T/issues/343
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
-        wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
-        dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
-        cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
-        apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
-        wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
-        dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
-        cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
-        apt-get update && apt-get install -y nvpl
-    fi
-EOT
-
-# If we are building with clblas support, we need the libraries for the builds
-RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            libclblast-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* \
-    ; fi
-
-RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            hipblas-dev \
-            hipblaslt-dev \
-            rocblas-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* && \
-        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
-        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
-        ldconfig \
-    ; fi
-
-RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
-    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
-    ; fi
-
-# Install uv as a system package
-RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
-ENV PATH="/root/.cargo/bin:${PATH}"
-# Increase timeout for uv installs behind slow networks
-ENV UV_HTTP_TIMEOUT=180
-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
-
-# Install grpcio-tools (the version in 22.04 is too old)
-RUN <<EOT bash
-    if [ "${UBUNTU_VERSION}" = "2404" ]; then
-        pip install --break-system-packages --user grpcio-tools==1.71.0 grpcio==1.71.0
-    else
-        pip install grpcio-tools==1.71.0 grpcio==1.71.0
-    fi
-EOT
-

 COPY backend/python/${BACKEND} /${BACKEND}
 COPY backend/backend.proto /${BACKEND}/backend.proto
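
The slimmed Dockerfile above depends entirely on the `BASE_IMAGE_PREBUILT` value it is given. As a rough illustration of what CI passes in, the sketch below mirrors the shape of `prebuiltRef()` from scripts/changed-backends.js. The function name `prebuiltRefFor`, the explicit `prNumber` argument, the example stem, and the PR number are assumptions made for illustration; the real script reads the PR number from the event payload.

```javascript
// Sketch only: same shape as prebuiltRef() in scripts/changed-backends.js,
// but the PR number is an explicit argument (an assumption made here so the
// function is pure and easy to exercise in isolation).
function prebuiltRefFor(stem, prNumber) {
  // Entries whose lang has no prebuilt base recipe carry no stem.
  if (!stem) return "";
  // PR builds get a -pr<N> suffix; master builds get none.
  const suffix = prNumber != null ? `-pr${prNumber}` : "";
  return `quay.io/go-skynet/localai-base:${stem}${suffix}`;
}

// Hypothetical stem and PR number, for illustration only.
const exampleStem = "python-cublas-2404-cuda12.8";
console.log(prebuiltRefFor(exampleStem, null)); // master-style ref
console.log(prebuiltRefFor(exampleStem, 1234)); // PR-style ref
```

Because `ARG BASE_IMAGE_PREBUILT` has no default, a build that omits the argument should fail at the `FROM` line rather than silently fall back to an unprepared image.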
--- a/scripts/changed-backends.js
+++ b/scripts/changed-backends.js
@@ -1,19 +1,73 @@
+// Compute the CI build pipeline from the matrix in .github/backend-matrix.yaml:
+//   - matrix:           filtered (PR mode) or full (master mode) backend
+//                       matrix entries, with base-image-prebuilt annotated
+//                       for langs that have a prebuilt base recipe under
+//                       .docker/bases/.
+//   - matrix-darwin:    same idea for the darwin matrix.
+//   - bases-matrix:     deduplicated set of base images needed by the
+//                       filtered matrix, in the shape consumed by
+//                       .github/workflows/base_images.yml.
+//   - has-{backends,backends-darwin,bases}: gating booleans.
+//   - <backend>=true/false:  per-backend booleans for test-extra.yml.
+//
+// On PR events the matrix is filtered to backends whose source dirs
+// changed; if .docker/bases/Dockerfile.<lang> (or its workflow scaffolding)
+// changed, one canary entry per tag-stem (lang × build-type × ubuntu ×
+// cuda × base-image) is added so the prebuilt-base path gets exercised
+// end-to-end before merge. See .agents/ci-caching.md.
+
 import fs from "fs";
 import yaml from "js-yaml";
 import { Octokit } from "@octokit/core";

-// Load backend.yml and parse matrix.include
-const backendYml = yaml.load(fs.readFileSync(".github/workflows/backend.yml", "utf8"));
-const jobs = backendYml.jobs;
-const backendJobs = jobs["backend-jobs"];
-const backendJobsDarwin = jobs["backend-jobs-darwin"];
-const includes = backendJobs.strategy.matrix.include;
-const includesDarwin = backendJobsDarwin.strategy.matrix.include;
+// Backend matrix lives in a sibling data file so the workflow can switch
+// to fromJSON without needing two copies of the same matrix. See
+// .github/backend-matrix.yaml.
+const matrixData = yaml.load(fs.readFileSync(".github/backend-matrix.yaml", "utf8"));
+const includes = matrixData.linux;
+const includesDarwin = matrixData.darwin;

 const eventPath = process.env.GITHUB_EVENT_PATH;
 const event = JSON.parse(fs.readFileSync(eventPath, "utf8"));
+const isPR = !!event.pull_request;
+const prNumber = isPR ? event.pull_request.number : null;
+
+// Langs with a prebuilt base recipe under .docker/bases/Dockerfile.<lang>.
+// Discovered at runtime so adding a new language tier (e.g. golang) only
+// requires creating that file + slimming the consumer Dockerfile; no
+// orchestration changes needed.
+const baseRecipeDir = ".docker/bases";
+const langsWithBase = new Set(
+  fs.existsSync(baseRecipeDir)
+    ? fs.readdirSync(baseRecipeDir)
+        .filter(f => f.startsWith("Dockerfile."))
+        .map(f => f.slice("Dockerfile.".length))
+    : []
+);
+
+// Files that, when changed in a PR, should fan out to canary backend
+// matrix entries for the affected lang. Keeps PR validation honest when a
+// PR only touches base scaffolding.
+const baseTriggerFiles = new Set([
+  ...[...langsWithBase].map(lang => `${baseRecipeDir}/Dockerfile.${lang}`),
+  ".docker/apt-mirror.sh",
+  ".github/workflows/base_images.yml",
+  ".github/actions/configure-apt-mirror/action.yml",
+  "scripts/changed-backends.js",
+]);
+const langTriggerSelector = {
+  python: (item) => item.dockerfile && item.dockerfile.endsWith("python"),
+};
+
+// ---------- helpers ----------
+
+function langOf(item) {
+  if (!item.dockerfile) return null;
+  // dockerfile is like "./backend/Dockerfile.python"
+  const m = item.dockerfile.match(/Dockerfile\.([\w-]+)$/);
+  return m ? m[1] : null;
+}

-// Infer backend path
 function inferBackendPath(item) {
  if (item.dockerfile.endsWith("python")) {
    return `backend/python/${item.backend}/`;
@@ -42,61 +96,196 @@ function inferBackendPathDarwin(item) {
  if (!item.lang) {
    return `backend/python/${item.backend}/`;
  }
-
  return `backend/${item.lang}/${item.backend}/`;
 }

-// Build a deduplicated map of backend name -> path prefix from all matrix entries
+function platformsOf(item) {
+  // matrix.platforms can be "linux/amd64", "linux/arm64", or
+  // "linux/amd64,linux/arm64". Always return a normalized array.
+  if (!item.platforms) return ["linux/amd64"];
+  return item.platforms.split(",").map(p => p.trim()).filter(Boolean);
+}
+
+// Slug a base image reference for inclusion in a tag-stem. Returns "" for
+// the default ubuntu:24.04 (which is the implicit BASE_IMAGE) so that case
+// keeps a clean stem. Other base images get a short, parseable suffix.
+function baseImageSlug(img) {
+  if (!img || img === "ubuntu:24.04") return "";
+  if (img.includes("l4t-jetpack")) {
+    const m = img.match(/r\d+(?:\.\d+)+/);
+    return `jetpack-${m ? m[0] : "x"}`;
+  }
+  if (img.includes("rocm/dev-ubuntu")) {
+    const m = img.match(/:([\d.]+)/);
+    return `rocm-${m ? m[1] : "x"}`;
+  }
+  if (img.includes("intel/oneapi-basekit")) {
+    const m = img.match(/:([\d.]+)/);
+    return `oneapi-${m ? m[1] : "x"}`;
+  }
+  return img.replace(/.*\//, "").replace(/:/g, "-").replace(/[^A-Za-z0-9.-]/g, "");
+}
+
+// Tag stem for the prebuilt base. Arch is intentionally NOT in the stem:
+// the base is built multi-arch when any consumer needs multi-arch, and
+// single-arch otherwise.
+function tagStem(item) {
+  const lang = langOf(item);
+  if (!lang || !langsWithBase.has(lang)) return null;
+  const ubuntu = item["ubuntu-version"] || "2404";
+  const buildType = item["build-type"] || "cpu";
+  let stem = `${lang}-${buildType}-${ubuntu}`;
+  if (buildType === "cublas" || buildType === "l4t") {
+    stem += `-cuda${item["cuda-major-version"]}.${item["cuda-minor-version"]}`;
+  }
+  const slug = baseImageSlug(item["base-image"]);
+  if (slug) stem += `-${slug}`;
+  return stem;
+}
+
+function prebuiltRef(stem) {
+  if (!stem) return "";
+  const suffix = isPR ? `-pr${prNumber}` : "";
+  return `quay.io/go-skynet/localai-base:${stem}${suffix}`;
+}
+
+// Build-types that actually exercise the SKIP_DRIVERS branch in the base
+// Dockerfile. For everything else (cpu, intel, sycl_*, mps, metal),
+// skip-drivers is a no-op and disagreeing values across consumers are
+// safe to merge.
+const driverBuildTypes = new Set(["vulkan", "cublas", "l4t", "clblas", "hipblas"]);
+
+function effectiveSkipDrivers(item) {
+  if (!driverBuildTypes.has(item["build-type"] || "")) return "false";
+  return String(item["skip-drivers"] ?? "false");
+}
+
+// Build a base entry consumed by base_images.yml. Platforms is the union
+// across all consumers of this stem (multi-arch when any consumer needs
+// it). runs-on is derived from the platforms: arm-native when arm64 is
+// the only arch, ubuntu-latest (with QEMU) otherwise.
+function baseEntryFor(stem, items) {
+  const first = items[0];
+  const platformSet = new Set();
+  for (const it of items) for (const p of platformsOf(it)) platformSet.add(p);
+  const platforms = [...platformSet].sort().join(",");
+  const armOnly = platforms === "linux/arm64";
+  return {
+    "tag-stem": stem,
+    lang: langOf(first),
+    "base-image": first["base-image"],
+    "build-type": first["build-type"] || "",
+    "cuda-major-version": String(first["cuda-major-version"] ?? ""),
+    "cuda-minor-version": String(first["cuda-minor-version"] ?? ""),
+    "ubuntu-version": String(first["ubuntu-version"] ?? "2404"),
+    platforms,
+    "runs-on": armOnly ? "ubuntu-24.04-arm" : "ubuntu-latest",
+    "skip-drivers": effectiveSkipDrivers(first),
+  };
+}
+
+function dedupBases(items) {
+  // Group consumers by tag-stem.
+  const groups = new Map();
+  for (const item of items) {
+    const stem = tagStem(item);
+    if (!stem) continue;
+    if (!groups.has(stem)) groups.set(stem, []);
+    groups.get(stem).push(item);
+  }
+  // Inputs that MUST agree across all consumers of a stem. baseEntryFor()
+  // takes them from the first consumer, so a mismatch would silently give
+  // the other consumers the wrong base. Fail loudly so the matrix is reconciled.
+  const collisionChecks = [
+    ["base-image", (it) => it["base-image"]],
+    ["skip-drivers", effectiveSkipDrivers],
+  ];
+  const out = [];
+  for (const [stem, consumers] of groups) {
+    for (const [name, getter] of collisionChecks) {
+      const v0 = getter(consumers[0]);
+      for (const c of consumers.slice(1)) {
+        const v = getter(c);
+        if (v !== v0) {
+          throw new Error(
+            `Tag-stem collision for ${stem}: ${name} differs ` +
+            `(${JSON.stringify(v0)} for ${consumers[0]["tag-suffix"]} vs ` +
+            `${JSON.stringify(v)} for ${c["tag-suffix"]}). ` +
+            `Disambiguate by encoding ${name} in tagStem(), or reconcile the matrix entries.`,
+          );
+        }
+      }
+    }
+    out.push(baseEntryFor(stem, consumers));
+  }
+  return out;
+}
+
+// Annotate a backend matrix entry with `base-image-prebuilt` for langs
+// with a prebuilt base recipe; leave others untouched (their Dockerfile
+// runs the inline bootstrap).
+function annotate(item) {
+  const stem = tagStem(item);
+  if (!stem) return item;
+  return { ...item, "base-image-prebuilt": prebuiltRef(stem) };
+}
+
+// Build the deduplicated list of backend names → path prefixes from all
+// matrix entries (linux + darwin). Used for per-backend boolean outputs
+// consumed by test-extra.yml.
 function getAllBackendPaths() {
  const paths = new Map();
  for (const item of includes) {
    const p = inferBackendPath(item);
-    if (p && !paths.has(item.backend)) {
-      paths.set(item.backend, p);
-    }
+    if (p && !paths.has(item.backend)) paths.set(item.backend, p);
  }
  for (const item of includesDarwin) {
    const p = inferBackendPathDarwin(item);
-    if (p && !paths.has(item.backend)) {
-      paths.set(item.backend, p);
-    }
+    if (p && !paths.has(item.backend)) paths.set(item.backend, p);
  }
  return paths;
 }

 const allBackendPaths = getAllBackendPaths();

-// Non-PR events: output run-all=true and all backends as true
-if (!event.pull_request) {
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=true\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends=true\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=true\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix=${JSON.stringify({ include: includes })}\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: includesDarwin })}\n`);
+function writeOutput(key, value) {
+  fs.appendFileSync(process.env.GITHUB_OUTPUT, `${key}=${value}\n`);
+}
+
+function emit(filtered, filteredDarwin, runAll) {
+  const annotated = filtered.map(annotate);
+  const bases = dedupBases(filtered);
+  writeOutput("run-all", runAll);
+  writeOutput("has-backends", annotated.length > 0 ? "true" : "false");
+  writeOutput("has-backends-darwin", filteredDarwin.length > 0 ? "true" : "false");
+  writeOutput("has-bases", bases.length > 0 ? "true" : "false");
+  writeOutput("matrix", JSON.stringify({ include: annotated }));
+  writeOutput("matrix-darwin", JSON.stringify({ include: filteredDarwin }));
+  writeOutput("bases-matrix", JSON.stringify({ include: bases }));
+}
+
+// ---------- master mode (push events) ----------
+
+if (!isPR) {
+  emit(includes, includesDarwin, "true");
  for (const backend of allBackendPaths.keys()) {
-    fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=true\n`);
+    writeOutput(backend, "true");
  }
  process.exit(0);
 }

-// PR context
-const prNumber = event.pull_request.number;
+// ---------- PR mode ----------
+
 const repo = event.repository.name;
 const owner = event.repository.owner.login;
-
-const token = process.env.GITHUB_TOKEN;
-const octokit = new Octokit({ auth: token });
+const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

 async function getChangedFiles() {
  let files = [];
  let page = 1;
  while (true) {
-    const res = await octokit.request('GET /repos/{owner}/{repo}/pulls/{pull_number}/files', {
-      owner,
-      repo,
-      pull_number: prNumber,
-      per_page: 100,
-      page
+    const res = await octokit.request("GET /repos/{owner}/{repo}/pulls/{pull_number}/files", {
+      owner, repo, pull_number: prNumber, per_page: 100, page,
    });
    files = files.concat(res.data.map(f => f.filename));
    if (res.data.length < 100) break;
@@ -107,35 +296,55 @@ async function getChangedFiles() {

 (async () => {
  const changedFiles = await getChangedFiles();
-
  console.log("Changed files:", changedFiles);

-  const filtered = includes.filter(item => {
-    const backendPath = inferBackendPath(item);
-    if (!backendPath) return false;
-    return changedFiles.some(file => file.startsWith(backendPath));
+  // Source-driven filter: backend dir touched.
+  const sourceTriggered = new Set();
+  for (const item of includes) {
+    const p = inferBackendPath(item);
+    if (p && changedFiles.some(f => f.startsWith(p))) {
+      sourceTriggered.add(item);
+    }
+  }
+
+  // Base-driven filter: any matrix entry whose lang has a prebuilt base
+  // recipe AND that recipe (or its scaffolding) was touched. We want one
+  // canary per tag-stem (the first matrix entry seen for each stem) so
+  // every base gets exercised without building all 234 entries.
+  const baseTriggered = new Set();
+  const baseTriggerHits = new Set(changedFiles.filter(f => baseTriggerFiles.has(f)));
+  if (baseTriggerHits.size > 0) {
+    const seenStems = new Set();
+    for (const item of includes) {
+      const stem = tagStem(item);
+      if (!stem) continue;
+      const select = langTriggerSelector[langOf(item)];
+      if (select && !select(item)) continue;
+      // Only canary entries for langs whose recipe/scaffolding actually changed.
+      const hits = [...baseTriggerHits];
+      const recipePath = `${baseRecipeDir}/Dockerfile.${langOf(item)}`;
+      const langTouched =
+        baseTriggerHits.has(recipePath) ||
+        // any non-recipe trigger (shared scaffolding) touches all langs
+        hits.some(h => !h.startsWith(`${baseRecipeDir}/Dockerfile.`));
+      if (!langTouched) continue;
+      if (seenStems.has(stem)) continue;
+      seenStems.add(stem);
+      baseTriggered.add(item);
+    }
+  }
+
+  const filtered = includes.filter(item => sourceTriggered.has(item) || baseTriggered.has(item));
+  const filteredDarwin = includesDarwin.filter(item => {
+    const p = inferBackendPathDarwin(item);
+    return changedFiles.some(f => f.startsWith(p));
  });

-  const filteredDarwin = includesDarwin.filter(item => {
-    const backendPath = inferBackendPathDarwin(item);
-    return changedFiles.some(file => file.startsWith(backendPath));
-  })
+  console.log("Filtered linux:", filtered.length, "(source:", sourceTriggered.size, "base canaries:", baseTriggered.size, ")");
+  console.log("Filtered darwin:", filteredDarwin.length);

-  console.log("Filtered files:", filtered);
-  console.log("Filtered files Darwin:", filteredDarwin);
+  emit(filtered, filteredDarwin, "false");

-  const hasBackends = filtered.length > 0 ? 'true' : 'false';
-  const hasBackendsDarwin = filteredDarwin.length > 0 ? 'true' : 'false';
-  console.log("Has backends?:", hasBackends);
-  console.log("Has Darwin backends?:", hasBackendsDarwin);
-
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=false\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends=${hasBackends}\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=${hasBackendsDarwin}\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix=${JSON.stringify({ include: filtered })}\n`);
-  fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: filteredDarwin })}\n`);
-
-  // Per-backend boolean outputs
  for (const [backend, pathPrefix] of allBackendPaths) {
    let changed = changedFiles.some(file => file.startsWith(pathPrefix));
    // turboquant reuses backend/cpp/llama-cpp sources via a thin wrapper;
@@ -143,6 +352,6 @@ async function getChangedFiles() {
    if (backend === "turboquant" && !changed) {
      changed = changedFiles.some(file => file.startsWith("backend/cpp/llama-cpp/"));
    }
-    fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=${changed ? 'true' : 'false'}\n`);
+    writeOutput(backend, changed ? "true" : "false");
  }
 })();
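
To make the dedup-and-collision behaviour of `dedupBases()` easier to follow, here is a reduced, standalone model of it. It is a sketch under stated assumptions, not the script itself: `stemOf()` is a simplified stand-in for `tagStem()` that hardcodes the `python` lang and ignores the base-image slug, `checkCollisions()` compares a single raw field, and the sample entries (backend names, CUDA versions, the `example/other-base:1.0` image) are hypothetical.

```javascript
// Simplified stand-in for tagStem(): python only, no base-image slug.
function stemOf(item) {
  const buildType = item["build-type"] || "cpu";
  const ubuntu = item["ubuntu-version"] || "2404";
  let stem = `python-${buildType}-${ubuntu}`;
  if (buildType === "cublas" || buildType === "l4t") {
    stem += `-cuda${item["cuda-major-version"]}.${item["cuda-minor-version"]}`;
  }
  return stem;
}

// Group consumers by stem; each group corresponds to one base image.
function groupByStem(items) {
  const groups = new Map();
  for (const item of items) {
    const stem = stemOf(item);
    if (!groups.has(stem)) groups.set(stem, []);
    groups.get(stem).push(item);
  }
  return groups;
}

// Throw if consumers of one stem disagree on a field the base depends on.
function checkCollisions(groups, field) {
  for (const [stem, consumers] of groups) {
    const expected = consumers[0][field];
    for (const c of consumers.slice(1)) {
      if (c[field] !== expected) {
        throw new Error(`Tag-stem collision for ${stem}: ${field} differs`);
      }
    }
  }
}

// Hypothetical matrix entries: two cublas consumers share one base.
const sampleEntries = [
  { backend: "alpha", "build-type": "cublas", "cuda-major-version": "12",
    "cuda-minor-version": "8", "base-image": "ubuntu:24.04" },
  { backend: "beta", "build-type": "cublas", "cuda-major-version": "12",
    "cuda-minor-version": "8", "base-image": "ubuntu:24.04" },
  { backend: "gamma", "build-type": "", "base-image": "ubuntu:24.04" },
];
const sampleGroups = groupByStem(sampleEntries);
checkCollisions(sampleGroups, "base-image"); // passes: consumers agree
console.log([...sampleGroups.keys()]);

// Same stem, different base-image: the check refuses to pick one.
const conflictingEntries = [
  { backend: "alpha", "build-type": "", "base-image": "ubuntu:24.04" },
  { backend: "delta", "build-type": "", "base-image": "example/other-base:1.0" },
];
let collisionMessage = null;
try {
  checkCollisions(groupByStem(conflictingEntries), "base-image");
} catch (err) {
  collisionMessage = err.message;
}
console.log(collisionMessage);
```

In the real script the base-image slug is part of the stem, so two different base images would normally land in different groups. The collision check is the safety net for inputs that are not encoded in the stem, or that slug to the same value.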