fix(llama-cpp-darwin): distribute ggml backends by suffix (.so root, .dylib lib)

ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a .so suffix even on darwin, while the core libraries (ggml-base/ggml/llama/llama-common/mtmd) use .dylib. Split the distribution by suffix: .so DL backends go in the package root for ggml's executable-directory scan, .dylib core libs go in lib/ for DYLD_LIBRARY_PATH. The previous .dylib name-pattern matched none of the variants.

Verified on an M4: ggml loads the apple_m4 CPU variant (SME=1) and Metal, model loads and generates correct tokens.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-25 00:59:28 -04:00 · 2026-06-24 21:59:29 +00:00
parent 3b47122e54
commit 4e9bb4f879
1 changed files with 12 additions and 10 deletions
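The suffix split described in the commit message can be tried in isolation. The following is a minimal sketch, not code from this commit: `distribute_by_suffix` is a hypothetical helper, the file names are placeholders, and the temporary directories stand in for `backend/cpp/llama-cpp/ggml-shared-libs` and `build/darwin`. It assumes a `cp` that supports `-a` (both BSD and GNU `cp` do).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the commit): copy loadable backends (*.so)
# into the package root and core libraries (*.dylib) into lib/.
# The body runs in a subshell, via ( ... ), so its variables stay local.
distribute_by_suffix() (
    src="$1"
    dest="$2"
    mkdir -p "$dest/lib"
    for f in "$src"/*.so; do
        # Skip the literal pattern when no file matches the glob.
        [ -e "$f" ] || [ -L "$f" ] || continue
        cp -a "$f" "$dest/"
    done
    for f in "$src"/*.dylib; do
        [ -e "$f" ] || [ -L "$f" ] || continue
        # -a copies symlinks as symlinks, so version links survive.
        cp -a "$f" "$dest/lib/"
    done
)

# Demo with placeholder files; these names are assumptions for illustration.
demo_src="$(mktemp -d)"
demo_dest="$(mktemp -d)"
touch "$demo_src/libggml-cpu-apple_m4.so" "$demo_src/libggml-metal.so"
touch "$demo_src/libllama.0.dylib"
ln -s libllama.0.dylib "$demo_src/libllama.dylib"

distribute_by_suffix "$demo_src" "$demo_dest"

# List the resulting layout: *.so in the root, *.dylib under lib/.
ls "$demo_dest" "$demo_dest/lib"
```

Unlike the bare `cp -a $SHLIBS/*.so` form, the loop skips a pattern that matches nothing instead of failing, which may or may not be desirable: in a build script, a hard failure on a missing backend is arguably the safer behavior.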
--- a/scripts/build/llama-cpp-darwin.sh
+++ b/scripts/build/llama-cpp-darwin.sh
@@ -24,17 +24,19 @@ cp -rf backend/cpp/llama-cpp/llama-cpp-cpu-all build/darwin/
 cp -rf backend/cpp/llama-cpp/llama-cpp-grpc build/darwin/
 cp -rf backend/cpp/llama-cpp/llama-cpp-rpc-server build/darwin/

-# Distribute the shared ggml/llama dylibs from the CPU_ALL_VARIANTS build. Unlike the old
-# fully-static fallback build, these are real dylibs with @rpath install names, so the
-# otool loop below (which only copies deps that exist on disk) will not pick them up.
-#  - the per-microarch libggml-cpu-*.dylib go in the package ROOT, next to the binary,
-#    because on darwin run.sh execs the binary directly (no bundled ld.so) and ggml
-#    discovers CPU backends by scanning the executable's own directory.
-#  - everything else (libggml-base/libggml/libllama/libmtmd/libggml-metal/...) goes in
-#    lib/, resolved at load time via the DYLD_LIBRARY_PATH=lib that run.sh exports.
+# Distribute the shared ggml/llama libraries from the CPU_ALL_VARIANTS build. Unlike the
+# old fully-static fallback build, these have @rpath install names, so the otool loop below
+# (which only copies deps that exist on disk) will not pick them up. The split is by suffix:
+#  - ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a .so
+#    suffix EVEN ON DARWIN. These go in the package ROOT next to the binary, because darwin
+#    run.sh execs the binary directly (no bundled ld.so) so ggml's executable-directory
+#    scan looks there.
+#  - the core libraries (libggml-base/libggml/libllama/libllama-common/libmtmd) use the
+#    platform .dylib suffix and are NEEDED deps; they go in lib/, resolved at load time via
+#    the DYLD_LIBRARY_PATH=lib that run.sh exports. -a preserves the version symlinks.
 SHLIBS=backend/cpp/llama-cpp/ggml-shared-libs
-cp -rfv $SHLIBS/libggml-cpu-*.dylib build/darwin/
-find $SHLIBS -name '*.dylib' ! -name 'libggml-cpu-*.dylib' -exec cp -rfv {} build/darwin/lib/ \;
+cp -a $SHLIBS/*.so build/darwin/
+cp -a $SHLIBS/*.dylib build/darwin/lib/

 # Set default additional libs only for Darwin on M chips (arm64)
 if [[ "$(uname -s)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then