ci: add pre-built base-grpc-builder image infrastructure (PR 1/2) (#9737)

Introduces a parameterized Dockerfile.base-grpc-builder that produces a
fully-prepped builder base image (apt deps + protoc + cmake + gRPC at
/opt/grpc + conditional CUDA/ROCm/Vulkan toolchains) and a
base-images.yml workflow that builds + pushes 9 variants to
quay.io/go-skynet/ci-cache:base-grpc-*:

  base-grpc-amd64          (Ubuntu 24.04, CPU-only)
  base-grpc-arm64          (Ubuntu 24.04, CPU-only)
  base-grpc-cuda-12-amd64  (Ubuntu 24.04 + CUDA 12.8)
  base-grpc-cuda-13-amd64  (Ubuntu 22.04 + CUDA 13.0)
  base-grpc-cuda-13-arm64  (Ubuntu 24.04 + CUDA 13.0 sbsa)
  base-grpc-rocm-amd64     (rocm/dev-ubuntu-24.04:7.2.1 + hipblas)
  base-grpc-vulkan-amd64   (Ubuntu 24.04 + Vulkan SDK 1.4.335)
  base-grpc-vulkan-arm64   (Ubuntu 24.04 + Vulkan SDK ARM 1.4.335)
  base-grpc-intel-amd64    (intel/oneapi-basekit:2025.3.2)

The variant Dockerfiles (Dockerfile.llama-cpp, ik-llama-cpp, turboquant)
are NOT touched in this PR. PR 2 will refactor them to FROM these
prebuilt bases.

This PR is intentionally inert - landing it changes no existing CI
behavior. The base images don't exist on quay until someone manually
triggers the workflow.

Bootstrap after merge:

  gh workflow run base-images.yml --ref master

Wait ~30 min for all 9 variants to push, then merge PR 2 (the
consumer-side refactor that uses BUILDER_BASE_IMAGE build-arg to FROM
these tags).

Triggers afterwards:

- Saturdays 05:00 UTC (cron) - picks up upstream security updates, runs
  ~24h before the backend.yml Sunday cron so bases are fresh.
- workflow_dispatch - manual ad-hoc rebuild.
- master push touching Dockerfile.base-grpc-builder or this workflow.

Why split into two PRs: the variant Dockerfiles in PR 2 will FROM the
prebuilt bases and have no from-source fallback. Their CI builds fail if
the bases don't exist on quay yet. Landing infrastructure first + manual
bootstrap + then consumer refactor avoids a broken-master window.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-19 14:17:21 -04:00 · 2026-05-09 18:44:42 +02:00
parent 31aa0582a5
commit 28e29625a2
2 changed files with 406 additions and 0 deletions
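
The commit message asks for all 9 variants to be on quay before PR 2 is merged. A minimal sketch of how that precondition could be checked is shown here. The `missing_tags` helper is hypothetical and not part of this PR, and using `docker manifest inspect` as the probe is an assumption (any command that exits 0 for an existing image reference would work). The script as written only performs a dry run with `true` as the probe, so it touches no registry.

```shell
#!/usr/bin/env bash
# Sketch: list the nine base-grpc tags from the workflow matrix and report
# which ones a given probe command cannot find.
set -eu

REPO="quay.io/go-skynet/ci-cache"

# Tags copied from the matrix in .github/workflows/base-images.yml.
base_grpc_tags() {
  printf '%s\n' \
    base-grpc-amd64 \
    base-grpc-arm64 \
    base-grpc-cuda-12-amd64 \
    base-grpc-cuda-13-amd64 \
    base-grpc-cuda-13-arm64 \
    base-grpc-rocm-amd64 \
    base-grpc-vulkan-amd64 \
    base-grpc-vulkan-arm64 \
    base-grpc-intel-amd64
}

# missing_tags PROBE...: print every tag for which `PROBE <image-ref>` fails.
# PROBE is any command that exits 0 when the reference exists
# (assumption: e.g. `docker manifest inspect`).
missing_tags() {
  local tag
  for tag in $(base_grpc_tags); do
    "$@" "${REPO}:${tag}" >/dev/null 2>&1 || echo "${tag}"
  done
}

# Dry run: `true` always succeeds, so nothing is reported missing.
missing="$(missing_tags true)"
if [ -z "${missing}" ]; then
  echo "no base images reported missing"
else
  echo "missing: ${missing}"
fi
```

For a real check after the bootstrap run, the last block would call `missing_tags docker manifest inspect` instead of `missing_tags true`, and PR 2 would be merged only once the output is empty.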
--- a/.github/workflows/base-images.yml
+++ b/.github/workflows/base-images.yml
@@ -0,0 +1,138 @@
+---
+name: 'build base-grpc images'
+
+# Builds + pushes pre-compiled builder base images that downstream
+# llama-cpp / ik-llama-cpp / turboquant variant Dockerfiles will FROM
+# (PR 2). Each base contains apt deps + protoc + cmake + gRPC at
+# /opt/grpc + (conditionally) CUDA / ROCm / Vulkan toolchains.
+#
+# Triggers:
+#   - schedule (Saturdays 05:00 UTC) - picks up Ubuntu/CUDA/ROCm
+#     security updates and re-runs ahead of the backend.yml weekly
+#     cron (Sundays 06:00 UTC).
+#   - workflow_dispatch - manual one-off rebuild.
+#   - push to master that touches Dockerfile.base-grpc-builder or
+#     this workflow itself - keeps bases in sync with their inputs.
+#
+# Bootstrap (one-time after this PR merges):
+#   gh workflow run base-images.yml --ref master
+# Wait ~30 min for all 9 matrix variants to push to
+# quay.io/go-skynet/ci-cache:base-grpc-* before merging PR 2.
+
+on:
+  schedule:
+    - cron: '0 5 * * 6'
+  workflow_dispatch:
+  push:
+    branches: [master]
+    paths:
+      - 'backend/Dockerfile.base-grpc-builder'
+      - '.github/workflows/base-images.yml'
+
+concurrency:
+  group: ci-base-images-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+
+jobs:
+  build:
+    if: github.repository == 'mudler/LocalAI'
+    runs-on: ${{ matrix.runs-on }}
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - tag: 'base-grpc-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'ubuntu:24.04'
+            build-type: ''
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-arm64'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: 'ubuntu:24.04'
+            build-type: ''
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-cuda-12-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'ubuntu:24.04'
+            build-type: 'cublas'
+            cuda-major-version: '12'
+            cuda-minor-version: '8'
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-cuda-13-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'ubuntu:22.04'
+            build-type: 'cublas'
+            cuda-major-version: '13'
+            cuda-minor-version: '0'
+            ubuntu-version: '2204'
+          - tag: 'base-grpc-cuda-13-arm64'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: 'ubuntu:24.04'
+            build-type: 'cublas'
+            cuda-major-version: '13'
+            cuda-minor-version: '0'
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-rocm-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'rocm/dev-ubuntu-24.04:7.2.1'
+            build-type: 'hipblas'
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-vulkan-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'ubuntu:24.04'
+            build-type: 'vulkan'
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-vulkan-arm64'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: 'ubuntu:24.04'
+            build-type: 'vulkan'
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+          - tag: 'base-grpc-intel-amd64'
+            runs-on: 'ubuntu-latest'
+            base-image: 'intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04'
+            build-type: 'sycl'
+            cuda-major-version: ''
+            cuda-minor-version: ''
+            ubuntu-version: '2404'
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          submodules: false
+      - name: Free disk space
+        uses: ./.github/actions/free-disk-space
+      - name: Set up build disk
+        uses: ./.github/actions/setup-build-disk
+      - uses: docker/setup-qemu-action@master
+        with:
+          platforms: all
+      - uses: docker/setup-buildx-action@master
+      - uses: docker/login-action@v4
+        with:
+          registry: quay.io
+          username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+          password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+      - uses: docker/build-push-action@v7
+        with:
+          context: .
+          file: ./backend/Dockerfile.base-grpc-builder
+          build-args: |
+            BASE_IMAGE=${{ matrix.base-image }}
+            BUILD_TYPE=${{ matrix.build-type }}
+            CUDA_MAJOR_VERSION=${{ matrix.cuda-major-version }}
+            CUDA_MINOR_VERSION=${{ matrix.cuda-minor-version }}
+            UBUNTU_VERSION=${{ matrix.ubuntu-version }}
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-${{ matrix.tag }}
+          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-${{ matrix.tag }},mode=max,ignore-error=true
+          provenance: false
+          tags: quay.io/go-skynet/ci-cache:${{ matrix.tag }}
+          push: true
--- a/backend/Dockerfile.base-grpc-builder
+++ b/backend/Dockerfile.base-grpc-builder
@@ -0,0 +1,268 @@
+# syntax=docker/dockerfile:1.7
+#
+# Pre-built builder base image for LocalAI's C++ backends.
+#
+# This Dockerfile is the source of truth for the
+# `quay.io/go-skynet/ci-cache:base-grpc-*` images that
+# `.github/workflows/base-images.yml` builds and pushes. The output of a
+# build is a fully-prepped builder layer containing:
+#
+#   - apt build deps (build-essential, ccache, git, make, pkg-config,
+#     libcurl4-openssl-dev, libssl-dev, curl, unzip, wget, ca-certificates)
+#   - cmake (apt or, when CMAKE_FROM_SOURCE=true, compiled from
+#     ${CMAKE_VERSION})
+#   - protoc v27.1 at /usr/local/bin/protoc
+#   - gRPC ${GRPC_VERSION} compiled and installed at /opt/grpc
+#   - Conditional CUDA toolkit (BUILD_TYPE=cublas|l4t, SKIP_DRIVERS=false),
+#     plus extra CUDA 13 packages on arm64 and cuDSS/NVPL for cublas on arm64
+#   - Conditional ROCm/HIP build deps (BUILD_TYPE=hipblas)
+#   - Conditional Vulkan SDK 1.4.335.0 (BUILD_TYPE=vulkan)
+#
+# Variants built by the workflow (matrix in base-images.yml):
+#
+#   base-grpc-amd64                 ubuntu:24.04, CPU-only
+#   base-grpc-arm64                 ubuntu:24.04, CPU-only
+#   base-grpc-cuda-12-amd64         ubuntu:24.04 + CUDA 12.8
+#   base-grpc-cuda-13-amd64         ubuntu:22.04 + CUDA 13.0
+#   base-grpc-cuda-13-arm64         ubuntu:24.04 + CUDA 13.0 (sbsa)
+#   base-grpc-rocm-amd64            rocm/dev-ubuntu-24.04:7.2.1 + hipblas
+#   base-grpc-vulkan-amd64          ubuntu:24.04 + Vulkan SDK 1.4.335
+#   base-grpc-vulkan-arm64          ubuntu:24.04 + Vulkan SDK ARM 1.4.335
+#   base-grpc-intel-amd64           intel/oneapi-basekit:2025.3.2 (sycl)
+#
+# This is a SINGLE-stage Dockerfile by design: the final image IS the
+# builder base. The intermediate gRPC compile happens inside this same
+# stage so consumer Dockerfiles in PR 2 can simply
+# `FROM quay.io/go-skynet/ci-cache:base-grpc-<variant>` without needing a
+# COPY --from=grpc step. /opt/grpc is the canonical install prefix and
+# downstream builds will add it to CMAKE_PREFIX_PATH (or copy to
+# /usr/local) the same way Dockerfile.llama-cpp does today.
+#
+# Install logic is copied verbatim from backend/Dockerfile.llama-cpp on
+# master so the resulting image is bit-identical to what the variant
+# Dockerfile produces today. Do not paraphrase apt invocations — PR 2
+# depends on bit-equivalence.
+
+ARG BASE_IMAGE=ubuntu:24.04
+
+FROM ${BASE_IMAGE}
+
+ARG BASE_IMAGE=ubuntu:24.04
+ARG BUILD_TYPE=""
+ARG CUDA_MAJOR_VERSION=""
+ARG CUDA_MINOR_VERSION=""
+ARG CMAKE_FROM_SOURCE=false
+# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain
+# detection / arch table issues.
+ARG CMAKE_VERSION=3.31.10
+ARG GRPC_VERSION=v1.65.0
+ARG GRPC_MAKEFLAGS="-j4 -Otarget"
+ARG SKIP_DRIVERS=false
+ARG TARGETARCH
+ARG UBUNTU_VERSION=2404
+ARG APT_MIRROR=""
+ARG APT_PORTS_MIRROR=""
+ARG AMDGPU_TARGETS=""
+
+ENV BUILD_TYPE=${BUILD_TYPE}
+ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
+ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
+ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
+ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
+ENV DEBIAN_FRONTEND=noninteractive
+
+# CUDA on PATH (no-op when CUDA isn't installed)
+ENV PATH=/usr/local/cuda/bin:${PATH}
+# HipBLAS / ROCm on PATH (no-op when ROCm isn't installed)
+ENV PATH=/opt/rocm/bin:${PATH}
+
+WORKDIR /build
+
+# Base apt build deps. Mirrors backend/Dockerfile.llama-cpp lines 85-97
+# (the `builder` stage's apt block) — superset of the gRPC stage's deps
+# so the same image can compile gRPC and downstream backends.
+RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
+    APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+        build-essential \
+        ccache git \
+        ca-certificates \
+        make \
+        pkg-config libcurl4-openssl-dev \
+        curl unzip \
+        libssl-dev wget && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Vulkan SDK install. Mirrors backend/Dockerfile.llama-cpp lines 107-154.
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils wget gpg-agent && \
+        apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
+            libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
+            libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
+            git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
+            ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
+            clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
+        if [ "amd64" = "$TARGETARCH" ]; then
+            wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
+            tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            mkdir -p /opt/vulkan-sdk && \
+            mv 1.4.335.0 /opt/vulkan-sdk/ && \
+            cd /opt/vulkan-sdk/1.4.335.0 && \
+            ./vulkansdk --no-deps --maxjobs \
+                vulkan-loader \
+                vulkan-validationlayers \
+                vulkan-extensionlayer \
+                vulkan-tools \
+                shaderc && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
+            rm -rf /opt/vulkan-sdk
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            mkdir vulkan && cd vulkan && \
+            curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
+            tar -xvf vulkan-sdk.tar.xz && \
+            rm vulkan-sdk.tar.xz && \
+            cd 1.4.335.0 && \
+            cp -rfv aarch64/bin/* /usr/bin/ && \
+            cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
+            cp -rfv aarch64/include/* /usr/include/ && \
+            cp -rfv aarch64/share/* /usr/share/ && \
+            cd ../.. && \
+            rm -rf vulkan
+        fi
+        ldconfig && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+    fi
+EOT
+
+# CuBLAS (CUDA toolkit) install. Mirrors backend/Dockerfile.llama-cpp
+# lines 157-189.
+RUN <<EOT bash
+    if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils
+        if [ "amd64" = "$TARGETARCH" ]; then
+            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
+            else
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
+            fi
+        fi
+        dpkg -i cuda-keyring_1.1-1_all.deb && \
+        rm -f cuda-keyring_1.1-1_all.deb && \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
+            apt-get install -y --no-install-recommends \
+            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        fi
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+    fi
+EOT
+
+# cuDSS / NVPL on arm64 + cublas. Mirrors backend/Dockerfile.llama-cpp
+# lines 193-204. https://github.com/NVIDIA/Isaac-GR00T/issues/343
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
+        wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
+        wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get install -y nvpl
+    fi
+EOT
+
+# ROCm / HIP build deps. Mirrors backend/Dockerfile.llama-cpp lines 215-230.
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            hipblas-dev \
+            hipblaslt-dev \
+            rocblas-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
+        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
+        ldconfig && \
+        # Log which GPU architectures have rocBLAS kernel support
+        echo "rocBLAS library data architectures:" && \
+        (ls /opt/rocm*/lib/rocblas/library/Kernels* 2>/dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \
+        echo "WARNING: No rocBLAS kernel data found" \
+    ; fi
+
+RUN echo "TARGETARCH: $TARGETARCH"
+
+# protoc download. Mirrors backend/Dockerfile.llama-cpp lines 237-248.
+# We need protoc installed, and the version in 22.04 is too old. We will create one as part of installing the GRPC build below
+# but that will also bring in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
+# here so that we can generate the grpc code for the stablediffusion build.
+RUN <<EOT bash
+    if [ "amd64" = "$TARGETARCH" ]; then
+        curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
+        unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+        rm protoc.zip
+    fi
+    if [ "arm64" = "$TARGETARCH" ]; then
+        curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
+        unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+        rm protoc.zip
+    fi
+EOT
+
+# CMake install. Mirrors backend/Dockerfile.llama-cpp lines 250-261
+# (the `builder` stage's CMake block). The version in 22.04 is too old.
+RUN <<EOT bash
+    if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
+        curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
+    else
+        apt-get update && \
+        apt-get install -y \
+            cmake && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/*
+    fi
+EOT
+
+# gRPC compile + install at /opt/grpc. Mirrors backend/Dockerfile.llama-cpp
+# lines 50-57 (the `grpc` stage's clone+build+install block). Using the
+# same prefix and the same TESTONLY abseil patch so consumer Dockerfiles
+# in PR 2 can copy /opt/grpc -> /usr/local exactly like
+# `COPY --from=grpc /opt/grpc /usr/local` does today.
+#
+# We install gRPC to a separate prefix here so that only the build artifacts can be copied later.
+# This saves several hundred MB on the final Docker image size compared with copying in the entire
+# gRPC source tree and running make install in the target container.
+RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
+    mkdir -p /build/grpc/cmake/build && \
+    cd /build/grpc/cmake/build && \
+    sed -i "216i\  TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
+    cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
+    make && \
+    make install && \
+    rm -rf /build
+
+WORKDIR /
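
The tag names in the workflow matrix follow a regular pattern over BUILD_TYPE, architecture, and CUDA major version. The sketch below spells that pattern out and shows how a consumer build might pass the result as the BUILDER_BASE_IMAGE build-arg mentioned in the commit message. The `base_grpc_tag` helper is hypothetical; how PR 2 actually selects a tag is not shown in this PR, and the helper does not check that a combination is really built (the matrix has no ROCm arm64 variant, for example). The script only prints the `docker build` command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: derive a base-grpc tag from the same inputs the workflow matrix uses.
set -eu

REPO="quay.io/go-skynet/ci-cache"

# base_grpc_tag BUILD_TYPE ARCH [CUDA_MAJOR]
# Hypothetical helper following the naming pattern of the matrix entries.
base_grpc_tag() {
  local build_type="$1" arch="$2" cuda_major="${3:-}"
  case "${build_type}" in
    "")      echo "base-grpc-${arch}" ;;                     # CPU-only
    cublas)  echo "base-grpc-cuda-${cuda_major}-${arch}" ;;  # CUDA toolkit
    hipblas) echo "base-grpc-rocm-${arch}" ;;                # ROCm / HIP
    vulkan)  echo "base-grpc-vulkan-${arch}" ;;              # Vulkan SDK
    sycl)    echo "base-grpc-intel-${arch}" ;;               # Intel oneAPI
    *)       echo "unknown BUILD_TYPE: ${build_type}" >&2; return 1 ;;
  esac
}

# Print (do not run) a consumer build command for the CUDA 12 amd64 variant.
tag="$(base_grpc_tag cublas amd64 12)"
echo "docker build -f backend/Dockerfile.llama-cpp --build-arg BUILDER_BASE_IMAGE=${REPO}:${tag} ."
```

Keeping the mapping in one function makes it easy to see that a new matrix entry only needs a new BUILD_TYPE case or a new architecture, provided the naming pattern is kept.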