mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 18:06:58 -04:00
feat(paged): Metal/darwin build availability for llama-cpp-localai-paged
Close the single build-targeting gap the cross-arch audit (ARCH_GENERALITY_AUDIT.md section 6, item 2) flagged: the paged backend had no Metal/darwin variant and no metal: capability key, so a Mac user selecting llama-cpp-localai-paged fell back to default=cpu (a Linux image) that does not run, with no fallthrough to stock llama-cpp.

Mirror exactly how stock llama-cpp does darwin:
- .github/backend-matrix.yml: add the includeDarwin row (-metal-darwin-arm64-llama-cpp-localai-paged, arch arm64, lang go) next to the stock llama-cpp darwin row.
- backend/index.yaml: add the metal: capability key to the llama-cpp-localai-paged meta-backend plus the metal-llama-cpp-localai-paged and -development variant entries (URIs match the matrix tag-suffix); add Metal to tags.
- scripts/build/llama-cpp-localai-paged-darwin.sh: new bespoke darwin build, a line-for-line mirror of llama-cpp-darwin.sh swapping the paged wrapper dir, binary names, ggml-shared-libs dir and output tar. Same CPU_ALL_VARIANTS + Metal path (GGML_METAL=ON via the reused llama-cpp Makefile when OS=Darwin; --target ggml pulls in ggml-metal via add_dependencies) with LLAMA_PAGED=on.
- Makefile: add backends/llama-cpp-localai-paged-darwin target (+ .NOTPARALLEL).
- .github/workflows/backend_build_darwin.yml: give the paged backend the same bespoke darwin build step as stock llama-cpp, share the llama ccache restore (save stays stock-only to avoid a same-run key collision), and exclude it from the generic build-darwin-go-backend step.
- scripts/changed-backends.js: comment-only - the paged darwin path mapping was already present (forward-looking); update the stale "if a metal row is ever added" note now that the row exists.

Metal delivers paged-KV only (NVFP4 FP4-MMA is CUDA/Blackwell-only); the GDN/conv fused ops have no Metal kernel, so a gated-DeltaNet (qwen35) model falls back to the CPU reference op at runtime - made SAFE by the fused-op backend gate (patch 0030).

This is config; the Metal build runs in CI on the next push and is runtime-tested on the M4 Mac.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
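For illustration only, the gap described in the commit message can be sketched in shell. This is not LocalAI's actual resolver: the function name `resolve_variant`, its second argument, and the CPU variant name `cpu-llama-cpp-localai-paged` are hypothetical. Only the `metal` and `default` keys and the `metal-llama-cpp-localai-paged` variant name come from the message above.

```shell
#!/bin/bash
# Hypothetical sketch of capability-key resolution for a meta-backend.
# $1: detected system capability (e.g. "metal" on an Apple Silicon Mac)
# $2: "yes" if the meta-backend declares a metal: key, anything else if not
resolve_variant() {
    local capability="$1" has_metal_key="$2"
    case "$capability" in
        metal)
            if [ "$has_metal_key" = "yes" ]; then
                # After this commit: the metal: key maps to a darwin image.
                echo "metal-llama-cpp-localai-paged"
            else
                # Before this commit: no metal: key, so default=cpu wins.
                # That variant is a Linux image and does not run on a Mac.
                echo "cpu-llama-cpp-localai-paged"   # hypothetical name
            fi
            ;;
        *)
            # Every other capability in this sketch uses the default entry.
            echo "cpu-llama-cpp-localai-paged"       # hypothetical name
            ;;
    esac
}
```

Under these assumptions, `resolve_variant metal no` prints the CPU variant (the pre-commit behaviour) and `resolve_variant metal yes` prints `metal-llama-cpp-localai-paged`.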
94
scripts/build/llama-cpp-localai-paged-darwin.sh
Executable file
@@ -0,0 +1,94 @@
#!/bin/bash

set -ex

# Darwin/Metal build for the llama-cpp-localai-paged backend. Mirrors
# scripts/build/llama-cpp-darwin.sh exactly, swapping the build dir, binary names,
# shared-lib dir and output tar for the paged wrapper. The paged wrapper Makefile
# (backend/cpp/llama-cpp-localai-paged) reuses backend/cpp/llama-cpp's CMakeLists
# /grpc-server with LLAMA_PAGED=on, so the Darwin/Metal path is identical: ggml
# CPU_ALL_VARIANTS + GGML_METAL=ON, and --target ggml pulls in ggml-metal via
# add_dependencies so the Metal GPU backend is produced as a loadable
# libggml-metal.dylib. The new paged GDN/conv ops have no Metal kernel, so a
# gated-DeltaNet (qwen35) model falls back to the CPU reference op at runtime
# (assert/fall-back is made SAFE by the fused-op backend gate, patch 0030); a
# non-qwen35 model gets the full paged-KV path on Metal.

IMAGE_NAME="${IMAGE_NAME:-localai/llama-cpp-localai-paged-darwin}"

pushd backend/cpp/llama-cpp-localai-paged

# Single build via ggml CPU_ALL_VARIANTS: one binary plus the per-microarch Apple/arm
# dylibs (apple_m1/m2_m3/m4, armv8.x) that ggml selects at runtime. GGML_METAL stays ON
# and --target ggml also builds ggml-metal (via add_dependencies), so the Metal GPU
# backend is still produced as a loadable libggml-metal.dylib.
make llama-cpp-localai-paged-cpu-all && \
make llama-cpp-localai-paged-grpc && \
make llama-cpp-localai-paged-rpc-server

popd

mkdir -p build/darwin
mkdir -p backend-images
mkdir -p build/darwin/lib

cp -rf backend/cpp/llama-cpp-localai-paged/llama-cpp-localai-paged-cpu-all build/darwin/
cp -rf backend/cpp/llama-cpp-localai-paged/llama-cpp-localai-paged-grpc build/darwin/
cp -rf backend/cpp/llama-cpp-localai-paged/llama-cpp-localai-paged-rpc-server build/darwin/

# Distribute the shared ggml/llama libraries from the CPU_ALL_VARIANTS build. Unlike the
# old fully-static fallback build, these have @rpath install names, so the otool loop below
# (which only copies deps that exist on disk) will not pick them up. The split is by suffix:
#   - ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a .so
#     suffix EVEN ON DARWIN. These go in the package ROOT next to the binary, because darwin
#     run.sh execs the binary directly (no bundled ld.so) so ggml's executable-directory
#     scan looks there.
#   - the core libraries (libggml-base/libggml/libllama/libllama-common/libmtmd) use the
#     platform .dylib suffix and are NEEDED deps; they go in lib/, resolved at load time via
#     the DYLD_LIBRARY_PATH=lib that run.sh exports. -a preserves the version symlinks.
SHLIBS=backend/cpp/llama-cpp-localai-paged/ggml-shared-libs
cp -a $SHLIBS/*.so build/darwin/
cp -a $SHLIBS/*.dylib build/darwin/lib/

# Set default additional libs only for Darwin on M chips (arm64)
if [[ "$(uname -s)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then
  ADDITIONAL_LIBS=${ADDITIONAL_LIBS:-$(ls /opt/homebrew/Cellar/protobuf/**/lib/libutf8_validity*.dylib 2>/dev/null)}
else
  ADDITIONAL_LIBS=${ADDITIONAL_LIBS:-""}
fi

for file in $ADDITIONAL_LIBS; do
  cp -rfv $file build/darwin/lib
done

for file in build/darwin/*; do
  LIBS="$(otool -L $file | awk 'NR > 1 { system("echo " $1) } ' | xargs echo)"
  for lib in $LIBS; do
    # only libraries ending in dylib
    if [[ "$lib" == *.dylib ]]; then
      if [ -e "$lib" ]; then
        cp -rvf "$lib" build/darwin/lib
      fi
    fi
  done
done

echo "--------------------------------"
echo "ADDITIONAL_LIBS: $ADDITIONAL_LIBS"
echo "--------------------------------"

echo "Bundled libraries:"
ls -la build/darwin/lib


cp -rf backend/cpp/llama-cpp-localai-paged/run.sh build/darwin/

PLATFORMARCH="${PLATFORMARCH:-darwin/arm64}"

./local-ai util create-oci-image \
  build/darwin/. \
  --output ./backend-images/llama-cpp-localai-paged.tar \
  --image-name $IMAGE_NAME \
  --platform $PLATFORMARCH

rm -rf build/darwin
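As a reading aid for the build script's comment about `@rpath` install names, here is a minimal sketch of the filter its `otool` loop applies. The helper name `filter_copyable_deps` is hypothetical and does not exist in the repository. It takes install names as arguments, where the script reads them from `otool -L`, so it can be run on any machine.

```shell
#!/bin/bash
# Hypothetical helper: print only the install names that the script's
# otool loop would copy into build/darwin/lib.
filter_copyable_deps() {
    local lib
    for lib in "$@"; do
        # Same two conditions as the script: the name ends in .dylib and
        # the path exists on disk. An "@rpath/..." install name is not a
        # real path, so it fails the -e test and is skipped. This is why
        # the script copies the ggml-shared-libs directory explicitly.
        if [[ "$lib" == *.dylib && -e "$lib" ]]; then
            echo "$lib"
        fi
    done
}
```

Passing a list such as `@rpath/libggml-base.dylib` plus an absolute path to an existing `.dylib` prints only the absolute path, which matches the script's claim that the `@rpath` libraries are not picked up by the loop.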
scripts/changed-backends.js
@@ -75,8 +75,10 @@ function inferBackendPathDarwin(item) {
   if (item.backend === "llama-cpp") {
     return `backend/cpp/llama-cpp/`;
   }
-  // llama-cpp-localai-paged on Darwin (if a metal row is ever added to
-  // includeDarwin) builds from the C++ sources under backend/cpp/llama-cpp-localai-paged.
+  // llama-cpp-localai-paged on Darwin (the -metal-darwin-arm64-llama-cpp-localai-paged
+  // includeDarwin row) builds from the C++ sources under
+  // backend/cpp/llama-cpp-localai-paged, like stock llama-cpp. The matrix entry
+  // carries lang=go for runner/toolchain selection, but the source is C++.
   if (item.backend === "llama-cpp-localai-paged") {
     return `backend/cpp/llama-cpp-localai-paged/`;
   }