mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-16 20:52:08 -04:00
ci(llama-cpp): add BuildKit ccache mount to the compile step
The big RUN at line 268 of Dockerfile.llama-cpp re-runs from scratch on every LLAMA_VERSION bump (or on any LocalAI source change, because of the COPY . /LocalAI just before it). For CUDA-13 specifically, that compile recently hit the GHA 6h hard limit and failed: https://github.com/mudler/LocalAI/actions/runs/25598418931/job/75148244557

Add a BuildKit cache mount on /root/.ccache and thread ccache through CMake (CMAKE_C/CXX/CUDA_COMPILER_LAUNCHER) so most translation units hit the cache when their preprocessed source is byte-identical to the previous build. The cache mount is exported to the registry as part of the existing cache-to: type=registry,mode=max in backend_build.yml, so it persists across runs. The mount id is keyed on TARGETARCH + BUILD_TYPE so different variants don't thrash the same cache slot; sharing=locked serializes concurrent writes.

Cold-build effect (first run after enabling, or a LLAMA_VERSION bump that touches every TU): unchanged. Hot-build effect (subsequent runs with the same source, or LLAMA_VERSION bumps that touch only a handful of files): ~5-15 min for the llama.cpp compile vs the previous 1-3h cold. For CUDA-13 specifically this should bring rebuilds well under the 6h GHA limit.

This does NOT help the *first* post-bump build; that one is still cold. For that, follow-up work would be: (a) trim CUDA_DOCKER_ARCH to modern GPUs only, (b) audit which CMake variants the published images actually need, (c) a pre-built CUDA+gRPC base image.

The ccache package is already installed in the builder stage (line 90).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
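The launcher wiring described above can be exercised outside Docker. A minimal sketch, in plain bash, of assembling the same CMAKE_ARGS string the Dockerfile exports (pure string construction; nothing here invokes cmake or ccache, and the loop is just a compact way to emit the three flags from the commit):

```shell
# Sketch: build the CMAKE_ARGS value that makes CMake wrap gcc/g++/nvcc
# invocations in ccache. Variable names mirror the diff; no tools are called.
CMAKE_ARGS="${CMAKE_ARGS:-}"
for lang in C CXX CUDA; do
  CMAKE_ARGS="${CMAKE_ARGS} -DCMAKE_${lang}_COMPILER_LAUNCHER=ccache"
done
echo "${CMAKE_ARGS}"
```

With these flags set, CMake prefixes every compile command with ccache, which is what lets byte-identical preprocessed sources skip the real compile on hot builds.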
@@ -265,12 +265,26 @@ COPY --from=grpc /opt/grpc /usr/local
 
 COPY . /LocalAI
 
-RUN <<'EOT' bash
+# BuildKit cache mount for ccache. Persists compiler outputs across builds
+# via the registry cache (cache-to: type=registry,mode=max in CI). On a
+# LLAMA_VERSION bump most TUs are byte-identical to the previous version's
+# preprocessed source — ccache returns the previous .o file and skips the
+# real compile. Same for LocalAI source changes that don't touch llama.cpp.
+# CMAKE_*_COMPILER_LAUNCHER threads ccache through CMake to wrap gcc/g++/nvcc.
+# sharing=locked serializes concurrent writes if multiple matrix variants
+# share the same cache mount id.
+RUN --mount=type=cache,target=/root/.ccache,id=llama-cpp-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked <<'EOT' bash
 set -euxo pipefail
+
+export CCACHE_DIR=/root/.ccache
+ccache --max-size=5G || true
+ccache -z || true
+
+export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache"
+
 if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
   CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
-  export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
+  export CMAKE_ARGS="${CMAKE_ARGS} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
   echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
   rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
 fi
@@ -289,6 +303,8 @@ else
   make llama-cpp-grpc
   make llama-cpp-rpc-server
 fi
+
+ccache -s || true
 EOT
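The CUDA_ARCH_ESC substitution in the diff exists because CMake encodes list values as semicolon-separated strings ("80;86"), while an unquoted ${CMAKE_ARGS} expansion lets the shell treat an unescaped ';' as a command separator. A standalone sketch of just that substitution (the architecture values below are illustrative, not the ones CI uses):

```shell
# Sketch of the semicolon escaping from the diff: replace each ';' with '\;'
# so the CMake list survives one round of unquoted shell expansion intact.
CUDA_DOCKER_ARCH="80;86;89"          # illustrative arch list
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
echo "${CUDA_ARCH_ESC}"
```

The escaped value is what gets appended as -DCMAKE_CUDA_ARCHITECTURES=... so the full list reaches CMake as a single argument.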