From 4e9bb4f8790a6da9778caf32efa63a1120955261 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto <mudler@localai.io>
Date: Wed, 24 Jun 2026 21:59:29 +0000
Subject: [PATCH] fix(llama-cpp-darwin): distribute ggml backends by suffix
 (.so root, .dylib lib)

ggml emits its dynamically loaded backends (per-microarch CPU variants, metal,
blas) with a .so suffix even on darwin, while the core libraries (ggml-base,
ggml, llama, llama-common, mtmd) use .dylib. Split the distribution by suffix:
the .so backends go in the package root, where ggml's executable-directory scan
finds them, and the .dylib core libraries go in lib/, where DYLD_LIBRARY_PATH
resolves them. The previous libggml-cpu-*.dylib pattern matched none of the CPU
variants, and the '*.dylib' find never copied the .so metal/blas backends.
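
A minimal dry-run sketch of the split, assuming bash and a flat directory of
built libraries. plan_distribution is a hypothetical helper, not part of
scripts/build/llama-cpp-darwin.sh. It prints the destination each file would
get under the suffix rule, which makes a library that neither rule catches easy
to spot:

```shell
#!/usr/bin/env bash
# Sketch only: a dry-run of the suffix-based split.
# plan_distribution is hypothetical and is not called by llama-cpp-darwin.sh.
set -euo pipefail

# Print "<destination> <file>" for every entry in directory $1:
#   *.so    -> root (loadable backends, found by ggml's executable-directory scan)
#   *.dylib -> lib  (core libraries, resolved via DYLD_LIBRARY_PATH=lib)
#   other   -> skip (not distributed by either rule)
plan_distribution() {
  local src="$1" f name
  for f in "$src"/*; do
    # Without nullglob an empty directory leaves the literal pattern; skip it.
    [ -e "$f" ] || [ -L "$f" ] || continue
    name=${f##*/}
    case "$name" in
      *.so)    echo "root $name" ;;
      *.dylib) echo "lib $name" ;;   # also matches versioned names like libllama.0.dylib
      *)       echo "skip $name" ;;
    esac
  done
}
```

Running it on backend/cpp/llama-cpp/ggml-shared-libs should list every CPU
variant under root; a variant reported as skip would mean the suffix assumption
no longer holds.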

Verified on an M4: ggml loads the apple_m4 CPU variant (SME=1) and the Metal
backend, and the model loads and generates correct tokens.
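
A layout check along these lines could guard against a regression of the same
kind. check_layout is hypothetical and only inspects file placement in the
built package (for example build/darwin), so it does not replace loading a
model on real hardware:

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical post-build check; not part of the build script.
set -euo pipefail

# check_layout PKG: succeed iff PKG/ holds at least one .so backend, no .dylib
# sits in the PKG/ root, and no .so sits in PKG/lib/. Violations go to stderr.
check_layout() {
  local pkg="$1" f bad=0 found_so=0
  for f in "$pkg"/*.so; do
    if [ -e "$f" ]; then found_so=1; fi
  done
  for f in "$pkg"/*.dylib; do
    [ -e "$f" ] || continue
    echo "core library in package root: ${f##*/}" >&2
    bad=1
  done
  for f in "$pkg"/lib/*.so; do
    [ -e "$f" ] || continue
    echo "backend in lib/, expected in package root: ${f##*/}" >&2
    bad=1
  done
  if [ "$found_so" -eq 0 ]; then
    echo "no .so backends in package root" >&2
    bad=1
  fi
  return "$bad"
}
```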

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---
 scripts/build/llama-cpp-darwin.sh | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/scripts/build/llama-cpp-darwin.sh b/scripts/build/llama-cpp-darwin.sh
index 3bbd963e6..adec88f04 100644
--- a/scripts/build/llama-cpp-darwin.sh
+++ b/scripts/build/llama-cpp-darwin.sh
@@ -24,17 +24,19 @@ cp -rf backend/cpp/llama-cpp/llama-cpp-cpu-all build/darwin/
 cp -rf backend/cpp/llama-cpp/llama-cpp-grpc build/darwin/
 cp -rf backend/cpp/llama-cpp/llama-cpp-rpc-server build/darwin/
 
-# Distribute the shared ggml/llama dylibs from the CPU_ALL_VARIANTS build. Unlike the old
-# fully-static fallback build, these are real dylibs with @rpath install names, so the
-# otool loop below (which only copies deps that exist on disk) will not pick them up.
-#  - the per-microarch libggml-cpu-*.dylib go in the package ROOT, next to the binary,
-#    because on darwin run.sh execs the binary directly (no bundled ld.so) and ggml
-#    discovers CPU backends by scanning the executable's own directory.
-#  - everything else (libggml-base/libggml/libllama/libmtmd/libggml-metal/...) goes in
-#    lib/, resolved at load time via the DYLD_LIBRARY_PATH=lib that run.sh exports.
+# Distribute the shared ggml/llama libraries from the CPU_ALL_VARIANTS build. Unlike the
+# old fully-static fallback build, these have @rpath install names, so the otool loop below
+# (which only copies deps that exist on disk) will not pick them up. The split is by suffix:
+#  - ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a .so
+#    suffix EVEN ON DARWIN. They go in the package ROOT next to the binary: on darwin
+#    run.sh execs the binary directly (no bundled ld.so), so the binary's own directory
+#    is where ggml's executable-directory scan looks.
+#  - the core libraries (libggml-base/libggml/libllama/libllama-common/libmtmd) use the
+#    platform .dylib suffix and are NEEDED deps; they go in lib/, resolved at load time via
+#    the DYLD_LIBRARY_PATH=lib that run.sh exports. -a preserves the version symlinks.
 SHLIBS=backend/cpp/llama-cpp/ggml-shared-libs
-cp -rfv $SHLIBS/libggml-cpu-*.dylib build/darwin/
-find $SHLIBS -name '*.dylib' ! -name 'libggml-cpu-*.dylib' -exec cp -rfv {} build/darwin/lib/ \;
+cp -a $SHLIBS/*.so build/darwin/
+cp -a $SHLIBS/*.dylib build/darwin/lib/
 
 # Set default additional libs only for Darwin on M chips (arm64)
 if [[ "$(uname -s)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then