Commit Graph

4 Commits

Author SHA1 Message Date
Ettore Di Giacinto
e47c58656f feat(llama-cpp): single x86 CPU build via ggml CPU_ALL_VARIANTS
Replace the per-microarch avx/avx2/avx512/fallback multi-binary build on
x86 with a single grpc-server plus the dlopen-able libggml-cpu-*.so set
that ggml's backend registry selects at runtime by probing host CPU
features. One build instead of four, broader microarch coverage (adds
alderlake AVX-VNNI, zen4 AVX512-BF16, sapphirerapids AMX), and the
shell-side /proc/cpuinfo probing in run.sh goes away.

Build/link notes:
- CPU_ALL_VARIANTS requires GGML_BACKEND_DL + BUILD_SHARED_LIBS=ON, so
  ggml/llama become shared objects. SHARED_LIBS is now a make variable
  (default OFF) so the override survives the recursive sub-make into the
  VARIANT build dir instead of being re-clobbered by the base flags.
- The cpu-all target also builds "--target ggml": the per-microarch
  backends are runtime-dlopened, not link deps, so they only compile via
  ggml's add_dependencies().
- hw_grpc_proto is pinned STATIC. Under BUILD_SHARED_LIBS=ON it would
  otherwise become a DSO referencing hidden-visibility symbols in the
  static libprotobuf.a, which fails to link ("hidden symbol ... is
  referenced by DSO"). Keeping it static links gRPC/protobuf into the
  executable while only ggml/llama stay shared, so no PIC or base-image
  change is required.
- package.sh bundles the libggml-*.so set into package/lib; ggml finds
  them by scanning the bundled ld.so directory (/proc/self/exe), which
  run.sh launches from.

Scope: x86 only. arm64/darwin keep the single fallback build. The
ik-llama-cpp / turboquant forks and the other ggml C++ backends are
unchanged; the same recipe applies but is out of scope here.

Validated with a full docker build plus a live inference smoke test:
the model loads, ggml selects the AVX512_BF16 variant on a Zen-class
host, and tokens generate correctly.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 21:21:03 +00:00
Ettore Di Giacinto
f891d60d26 fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)
chore(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:58:14 +01:00
Copilot
fd53978a7b feat: package GPU libraries inside backend containers for unified base image (#7891)
* Initial plan

* Add GPU library packaging for isolated backend environments

- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: fix variable consistency and quoting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review issues: improve glob handling and remove redundant variable

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify main Dockerfile and workflow to use unified base image

- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 15:48:51 +01:00
Ettore Di Giacinto
294f7022f3 feat: do not bundle llama-cpp anymore (#5790)
* Build llama.cpp separately

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Start to try to attach some tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add git and small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: correctly autoload external backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run AIO tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Slightly update the Makefile helps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt auto-bumper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run linux test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add llama-cpp into build pipelines

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add default capability (for cpu)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop llama-cpp specific logic from the backend loader

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop grpc install in ci for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pass by backends path for tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Build protogen at start

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests): set backends path consistently

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly configure the backends path

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to build for darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Compile for metal on arm64/darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run build off from cross-arch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the backend index nvidia-l4t and cpu's llama-cpp backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Build also darwin-x86 for llama-cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable arm64 builds temporary

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Test backend build on PR

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup build backend reusable workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* pass by skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use crane

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Skip drivers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* x86 darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add packaging step for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix leftover from bark-cpp extraction

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to fix hipblas build

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-07-18 13:24:12 +02:00