mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-25 09:09:07 -04:00
Replace the per-microarch avx/avx2/avx512/fallback multi-binary build on
x86 with a single grpc-server plus the dlopen-able libggml-cpu-*.so set
that ggml's backend registry selects at runtime by probing host CPU
features. One build instead of four, broader microarch coverage (adds
alderlake AVX-VNNI, zen4 AVX512-BF16, sapphirerapids AMX), and the
shell-side /proc/cpuinfo probing in run.sh goes away.
Build/link notes:
- CPU_ALL_VARIANTS requires GGML_BACKEND_DL + BUILD_SHARED_LIBS=ON, so
ggml/llama become shared objects. SHARED_LIBS is now a make variable
(default OFF) so the override survives the recursive sub-make into the
VARIANT build dir instead of being re-clobbered by the base flags.
- The cpu-all target also builds "--target ggml": the per-microarch
backends are runtime-dlopened, not link deps, so they only compile via
ggml's add_dependencies().
- hw_grpc_proto is pinned STATIC. Under BUILD_SHARED_LIBS=ON it would
otherwise become a DSO referencing hidden-visibility symbols in the
static libprotobuf.a, which fails to link ("hidden symbol ... is
referenced by DSO"). Keeping it static links gRPC/protobuf into the
executable while only ggml/llama stay shared, so no PIC or base-image
change is required.
- package.sh bundles the libggml-*.so set into package/lib; ggml finds
them by scanning the directory of its own executable (/proc/self/exe),
which resolves to the bundled ld.so that run.sh launches through.
Scope: x86 only. arm64/darwin keep the single fallback build. The
ik-llama-cpp / turboquant forks and the other ggml C++ backends are
unchanged; the same recipe applies but is out of scope here.
Validated with a full docker build plus a live inference smoke test:
the model loads, ggml selects the AVX512_BF16 variant on a Zen-class
host, and tokens generate correctly.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
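To check which per-microarch backends actually ended up in a package, the bundled lib directory can be listed. The following is a minimal sketch, not part of LocalAI: list_cpu_variants is a hypothetical helper name, and it only assumes the libggml-cpu-*.so naming pattern mentioned in the notes above. It reports what was bundled, not which variant ggml will select at runtime.

```shell
#!/bin/bash
# Hypothetical helper (illustration only): print the CPU variant names found
# in a lib directory, one per line, derived from libggml-cpu-<variant>.so.
list_cpu_variants() {
    local libdir=$1 f name
    for f in "$libdir"/libggml-cpu-*.so; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/libggml-cpu-}        # strip directory and common prefix
        echo "${name%.so}"               # strip the .so suffix
    done
}

# Example: inspect the given directory, defaulting to ./lib.
list_cpu_variants "${1:-./lib}"
```

An empty result on an x86 package would suggest the "--target ggml" step did not build the dlopen-able backends, or that package.sh did not copy them.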
51 lines
1.5 KiB
Bash
Executable File
#!/bin/bash
set -ex

# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")

cd /

echo "CPU info:"
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1

BINARY=llama-cpp-fallback

# x86 ships a single llama-cpp-cpu-all built with ggml CPU_ALL_VARIANTS: ggml's backend
# registry dlopens the best libggml-cpu-*.so for this host, so no shell-side AVX probing.
# arm64/darwin builds ship only llama-cpp-fallback, so fall back to it when cpu-all is absent.
if [ -e "$CURDIR/llama-cpp-cpu-all" ]; then
    BINARY=llama-cpp-cpu-all
fi

if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
    if [ -e "$CURDIR/llama-cpp-grpc" ]; then
        BINARY=llama-cpp-grpc
    fi
fi

# Extend the library path with the lib/ dir next to this script
if [ "$(uname)" == "Darwin" ]; then
    export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
    #export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
else
    export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
    # Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
    if [ -d "$CURDIR/lib/rocblas/library" ]; then
        export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
    fi
fi

# If there is a lib/ld.so, use it
if [ -f "$CURDIR/lib/ld.so" ]; then
    echo "Using lib/ld.so"
    echo "Using binary: $BINARY"
    exec "$CURDIR/lib/ld.so" "$CURDIR/$BINARY" "$@"
fi

echo "Using binary: $BINARY"
exec "$CURDIR/$BINARY" "$@"

# We should never reach this point, however just in case we do, run fallback
exec "$CURDIR/llama-cpp-fallback" "$@"
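
The binary selection in run.sh can be exercised on its own, without launching a server. Below is a minimal sketch of the same precedence (fallback, then cpu-all, then grpc when LLAMACPP_GRPC_SERVERS is set); select_binary is a hypothetical function name introduced here for illustration and does not exist in the repository.

```shell
#!/bin/bash
# Sketch of the selection order used by run.sh. select_binary is a
# hypothetical name; it prints the binary run.sh would pick for a directory.
select_binary() {
    local dir=$1 binary=llama-cpp-fallback
    # Prefer the single CPU_ALL_VARIANTS build when it was packaged (x86).
    if [ -e "$dir/llama-cpp-cpu-all" ]; then
        binary=llama-cpp-cpu-all
    fi
    # llama-cpp-grpc wins only when LLAMACPP_GRPC_SERVERS is set and the
    # binary is present, matching the nested checks in run.sh.
    if [ -n "$LLAMACPP_GRPC_SERVERS" ] && [ -e "$dir/llama-cpp-grpc" ]; then
        binary=llama-cpp-grpc
    fi
    echo "$binary"
}
```

Calling select_binary on an arm64 or darwin package directory, where llama-cpp-cpu-all is absent, should print llama-cpp-fallback.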