mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 09:57:14 -04:00
14b29ebf4e9d6359e1709c0ba0d6d9c12690ede9
20 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
14b29ebf4e |
fix(backends): derive darwin RUN_BINARY from the exec line only (#10541)
golang-darwin.sh's packaging check derived the launch binary by grepping every $CURDIR/... reference in run.sh and taking the last one. Backends that pick a runtime CPU variant assign it via unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so` lines, so the heuristic returned `libgo<x>-avx512.so` — a variant Darwin never builds (arm64 builds only fallback) — and the check then failed with "package/libgo<x>-avx512.so not found ... refusing to package (#10267)", breaking the darwin builds for whisper, sam3-cpp, vibevoice-cpp and friends.
Scan only the `exec` line(s) (the actual launch contract) and tolerate a quoted `exec "$CURDIR"/<binary>`. parakeet-cpp's parakeet-cpp-grpc and the quoted-only backends (sherpa/piper/opus) resolve correctly; no Linux change.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
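The heuristic described in #10541 can be illustrated with a small Python sketch. The real check is shell code in golang-darwin.sh; the function name `run_binary`, the regular expression and the sample run.sh text below are all assumptions made for illustration, not the repository's implementation.

```python
import re

# Matches $CURDIR/<path>, "$CURDIR"/<path>, "$CURDIR/<path>" and ${CURDIR}/<path>.
# (Illustrative pattern, not taken from the repository.)
CURDIR_REF = re.compile(r'"?\$\{?CURDIR\}?"?/([^\s"\']+)')


def run_binary(run_sh_text, backend):
    """Return the binary named on the exec line(s), or `backend` as a fallback."""
    candidates = []
    for line in run_sh_text.splitlines():
        stripped = line.strip()
        # Only exec lines count; variable assignments such as
        # LIBRARY=$CURDIR/... are ignored on purpose.
        if not stripped.startswith("exec "):
            continue
        candidates.extend(CURDIR_REF.findall(stripped))
    # The last reference on the last exec line is the launched program
    # (an earlier one may be a bundled loader such as lib/ld.so).
    return candidates[-1] if candidates else backend


sample = (
    "LIBRARY=$CURDIR/libgoexample-avx512.so\n"
    'exec "$CURDIR"/example "$@"\n'
)
print(run_binary(sample, "example"))  # prints: example
```

Scanning every `$CURDIR/...` reference in `sample` and taking the last one would also return `example` here, but only because the assignment happens to come first; restricting the scan to `exec` lines removes that ordering dependence.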
||
|
|
f98b0f1c1e |
fix(gpu-libs): bundle transitive deps of GPU runtime libs (#10537) (#10539)
fix(gpu-libs): bundle transitive deps of GPU runtime libs
The per-vendor packagers in package-gpu-libs.sh copy an explicit allowlist of top-level GPU runtime libraries (libamdhip64, libhipblas, librocblas, the CUDA/Intel equivalents, ...) but never resolved their transitive dependencies. Backends run through the bundled lib/ld.so with LD_LIBRARY_PATH=lib, so any transitive dep not in the allowlist is a fatal "cannot open shared object file" at load time.
On recent ROCm (base image rocm 7.2.1) the runtime libs link against librocprofiler-register.so.0, which is not in the allowlist, so the rocm llama-cpp backend (and every other GPU backend sharing this script) failed to load with:
librocprofiler-register.so.0: cannot open shared object file
The Vulkan path already solved this class of problem with copy_elf_deps (ldd-based transitive resolution), but that sweep was only wired into the Vulkan ICD path. This adds a generic sweep_transitive_deps that runs the same ldd resolution over everything the allowlist already bundled, and wires it into the ROCm, CUDA and Intel packagers. ldd returns the full recursive closure, so one pass suffices; core libc-family deps are skipped via is_core_lib so we never shadow the loader's own libc/libstdc++.
Adds a self-contained regression test (gcc + ldd) that fabricates a primary lib linking a transitive lib and asserts the sweep bundles the dependency.
Fixes #10537
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
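As a rough illustration of the ldd-based sweep described in #10539, the following Python sketch parses `ldd` output and selects the dependencies that still need bundling. The actual logic is shell (`sweep_transitive_deps`, `is_core_lib`); the Python names, the core-library prefix list and the sample `ldd` text are assumptions, since the commit message does not spell them out.

```python
import re

# Assumed set of "core" libraries that must never be bundled; the real
# is_core_lib list is not given in the commit message.
CORE_PREFIXES = (
    "libc.so", "libm.so", "libdl.so", "librt.so", "libpthread.so",
    "libstdc++.so", "libgcc_s.so", "ld-linux",
)


def is_core_lib(soname):
    return soname.startswith(CORE_PREFIXES)


def parse_ldd(output):
    """Map soname -> resolved path for every 'name => /path (addr)' line."""
    deps = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\S+)\s+=>\s+(/\S+)", line)
        if match:  # skips vdso, the loader line and 'not found' entries
            deps[match.group(1)] = match.group(2)
    return deps


def deps_to_bundle(ldd_output, already_bundled):
    """Paths of resolved, non-core deps whose soname is not bundled yet."""
    return sorted(
        path
        for name, path in parse_ldd(ldd_output).items()
        if not is_core_lib(name) and name not in already_bundled
    )


SAMPLE_LDD = """\
\tlinux-vdso.so.1 (0x00007ffd00000000)
\tlibrocprofiler-register.so.0 => /opt/rocm/lib/librocprofiler-register.so.0 (0x00007f0000000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000100000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f0000200000)
"""
print(deps_to_bundle(SAMPLE_LDD, {"libamdhip64.so.7"}))
```

Because `ldd` already reports the recursive closure, a single call per bundled library is enough in this sketch; no explicit graph traversal is needed.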
||
|
|
f58dcefed4 |
fix(backends): ship the package/ dir for darwin go backend images (#10522)
fix(backends): ship the package/ dir for darwin go backends
golang-darwin.sh packaged the whole backend source/build dir as the OCI image (backend/go/$BACKEND/.), so the runtime dylibs ended up under package/lib and backend-assets/lib while run.sh looks in $CURDIR/lib. As a result a backend like sherpa-onnx could not dlopen its libsherpa-shim.dylib at runtime and exited immediately (the model then 500s with "grpc service not ready"); it started fine only when run from inside package/.
Ship package/. instead — the self-contained run.sh + binary + lib/ bundle — matching the Linux Dockerfile.golang (`COPY .../package/. ./`). Backends that don't assemble a package/ fall back to the backend dir, and the binary-existence guard now checks the directory actually shipped.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
d388f874de |
feat(backends): darwin/Metal build for the privacy-filter backend (#10513)
* feat(backends): darwin/Metal build for the privacy-filter backend (timeboxed try)
The privacy-filter.cpp engine is already Metal-capable on Apple Silicon: it pulls ggml and never forces GGML_METAL=OFF, and ggml defaults Metal ON on Apple, so a plain Darwin build is Metal-enabled. grpc++/protobuf resolve from Homebrew via find_package(... CONFIG). It just had no darwin build path - the existing package.sh and run.sh are Linux-only and there was no make target / workflow step.
Adds the bespoke darwin path, modeled on the ds4 one:
- scripts/build/privacy-filter-darwin.sh: native make grpc-server, otool -L dylib bundling, create-oci-image (no Linux package.sh).
- Makefile: backends/privacy-filter-darwin target (+ .NOTPARALLEL).
- .github/workflows/backend_build_darwin.yml: gated build step for privacy-filter.
- scripts/changed-backends.js: inferBackendPathDarwin special-case -> backend/cpp.
- .github/backend-matrix.yml: includeDarwin entry (lang go, like ds4/llama-cpp).
- backend/index.yaml: metal: capability + metal-privacy-filter(-development) entries.
- backend/cpp/privacy-filter/run.sh: DYLD_LIBRARY_PATH branch on Darwin.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* fix(privacy-filter): macOS proto include + bundle ggml dylibs
Validated natively on an M4 (the build/package/load chain now works with Metal):
- CMakeLists.txt: hw_grpc_proto compiles the generated proto/grpc sources but only linked the binary dir, so on macOS it could not find protobuf's headers (runtime_version.h) - Homebrew puts them under /opt/homebrew, not /usr/include. Link protobuf::libprotobuf + gRPC::grpc++ so their include dirs propagate. No-op on Linux (apt headers are already on the default search path).
- privacy-filter-darwin.sh: bundle the ggml shared libs the binary @rpath-links (libggml{,-base,-cpu,-blas,-metal}); the otool -L walk only catches on-disk absolute deps and missed them. Resolved at runtime by run.sh's DYLD_LIBRARY_PATH.
M4 check: arm64 grpc-server links @rpath/libggml-metal.0.dylib; with the 15 ggml dylibs + grpc/protobuf bundled, it loads clean (no dyld errors) and prints usage.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
4ac67d255d |
feat: single-build ggml CPU_ALL_VARIANTS for llama-cpp + turboquant (x86/arm64/apple) (#10497)
* feat(llama-cpp): single x86 CPU build via ggml CPU_ALL_VARIANTS
Replace the per-microarch avx/avx2/avx512/fallback multi-binary build on
x86 with a single grpc-server plus the dlopen-able libggml-cpu-*.so set
that ggml's backend registry selects at runtime by probing host CPU
features. One build instead of four, broader microarch coverage (adds
alderlake AVX-VNNI, zen4 AVX512-BF16, sapphirerapids AMX), and the
shell-side /proc/cpuinfo probing in run.sh goes away.
Build/link notes:
- CPU_ALL_VARIANTS requires GGML_BACKEND_DL + BUILD_SHARED_LIBS=ON, so
ggml/llama become shared objects. SHARED_LIBS is now a make variable
(default OFF) so the override survives the recursive sub-make into the
VARIANT build dir instead of being re-clobbered by the base flags.
- The cpu-all target also builds "--target ggml": the per-microarch
backends are runtime-dlopened, not link deps, so they only compile via
ggml's add_dependencies().
- hw_grpc_proto is pinned STATIC. Under BUILD_SHARED_LIBS=ON it would
otherwise become a DSO referencing hidden-visibility symbols in the
static libprotobuf.a, which fails to link ("hidden symbol ... is
referenced by DSO"). Keeping it static links gRPC/protobuf into the
executable while only ggml/llama stay shared, so no PIC or base-image
change is required.
- package.sh bundles the libggml-*.so set into package/lib; ggml finds
them by scanning the bundled ld.so directory (/proc/self/exe), which
run.sh launches from.
Scope: x86 only. arm64/darwin keep the single fallback build. The
ik-llama-cpp / turboquant forks and the other ggml C++ backends are
unchanged; the same recipe applies but is out of scope here.
Validated with a full docker build plus a live inference smoke test:
the model loads, ggml selects the AVX512_BF16 variant on a Zen-class
host, and tokens generate correctly.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(llama-cpp,turboquant): extend CPU_ALL_VARIANTS to arm64 + turboquant
- llama-cpp: x86 AND arm64 now use the single llama-cpp-cpu-all build
(only hipblas keeps the fallback build). ggml's arm64 variant table
(armv8.x / armv9.x, plus apple_m* on darwin) is selected at runtime.
- turboquant: same recipe via a turboquant-cpu-all target. turboquant
copies backend/cpp/llama-cpp's CMakeLists.txt + Makefile per flavor, so
the hw_grpc_proto STATIC fix and the SHARED_LIBS / EXTRA_CMAKE_ARGS
make-vars are inherited; the target just passes SHARED_LIBS=ON, the DL
flags and --target ggml through, then collects the .so set. run.sh and
package.sh updated to ship/select turboquant-cpu-all.
- Makefile lib-collection find now also matches *.dylib (for the darwin
build, which emits dylibs rather than .so).
ik-llama-cpp is intentionally left unchanged: its pinned ggml has no
CPU_ALL_VARIANTS support and its IQK kernels require AVX2, so the
per-microarch dynamic backend set does not apply.
Scope still excludes the darwin packaging wiring (separate change).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(llama-cpp,turboquant): arm64 gcc-14 for SME variants + darwin cpu-all packaging
- arm64: ggml CPU_ALL_VARIANTS builds armv9.2 SME variants whose -march=...+sme
is rejected by the Ubuntu 24.04 default gcc-13. Build the arm64 variants with
gcc-14 (installed in the compile step). The host only selects a variant it
actually supports at runtime, but every variant must still compile.
- darwin: scripts/build/llama-cpp-darwin.sh builds llama-cpp-cpu-all instead of
the fallback binary, keeps Metal (GGML_METAL stays ON; --target ggml also builds
ggml-metal). The per-microarch libggml-cpu-*.dylib are placed in the package
root next to the binary (darwin has no bundled ld.so, so ggml's executable-dir
scan looks there), while the other shared dylibs go in lib/ for DYLD_LIBRARY_PATH.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(llama-cpp-darwin): distribute ggml backends by suffix (.so root, .dylib lib)
ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a
.so suffix even on darwin, while the core libraries (ggml-base/ggml/llama/
llama-common/mtmd) use .dylib. Split the distribution by suffix: .so DL backends
go in the package root for ggml's executable-directory scan, .dylib core libs go
in lib/ for DYLD_LIBRARY_PATH. The previous .dylib name-pattern matched none of the
variants.
Verified on an M4: ggml loads the apple_m4 CPU variant (SME=1) and Metal, model
loads and generates correct tokens.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(llama-cpp,turboquant): only CPU_ALL_VARIANTS for pure-CPU builds, GPU uses fallback
The previous gate sent every non-hipblas build through llama-cpp-cpu-all, so the
GPU image builds (cublas, sycl_f16/f32, vulkan, nvidia l4t) compiled the whole CPU
microarch variant matrix on top of their already-huge GPU backend - blowing the
build time (the sycl job was only 59% done after 2h11m) - and the arm64 l4t build
failed at `apt-get install gcc-14` (exit 100) on the Jetson base.
Gate on an empty BUILD_TYPE instead: only the pure CPU image (build-type: '' in
.github/backend-matrix.yml) builds the CPU_ALL_VARIANTS set; every GPU build gets a
single fallback CPU grpc-server, since the accelerator does the compute. This also
confines the arm64 gcc-14 step (needed for the armv9.2 SME variants) to the CPU
build, away from the GPU base images.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* docs(llama-cpp): correct run.sh comment for arm64/darwin cpu-all
arm64 and darwin CPU images now also ship llama-cpp-cpu-all (not fallback-only);
only GPU images ship fallback-only. Fix the stale comment to match.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
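The suffix-based split from the `fix(llama-cpp-darwin)` step of #10497 can be sketched in Python as follows. The packaging itself is done by scripts/build/llama-cpp-darwin.sh; the function name `package_destination` and the example file names are hypothetical and only illustrate the rule stated in the message (`.so` backends in the package root, `.dylib` core libraries in lib/).

```python
from pathlib import PurePosixPath


def package_destination(filename):
    """Return the package-relative directory for a built ggml artifact.

    '.'   -> package root, next to the binary (ggml's executable-directory scan)
    'lib' -> lib/, resolved through DYLD_LIBRARY_PATH
    None  -> not a library this rule covers
    """
    suffix = PurePosixPath(filename).suffix
    if suffix == ".so":
        return "."
    if suffix == ".dylib":
        return "lib"
    return None


# Hypothetical artifact names, chosen to resemble the ones in the message.
for name in ("libggml-cpu-apple_m4.so", "libggml-metal.so",
             "libggml-base.dylib", "libllama.dylib", "grpc-server"):
    print(name, "->", package_destination(name))
```

Keying on the suffix rather than on a name pattern is what the commit credits for the fix: the earlier `.dylib` name pattern matched none of the variants, because ggml emits them with a `.so` suffix even on darwin.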
||
|
|
606128e4e9 |
feat(vulkan): make Vulkan backends self-contained on the GPU (#10404)
Vulkan backends bundled their own loader and ICD manifests but neither the Mesa driver the manifests point at nor a way to make the loader find them, so on a runtime base image without Mesa the loader enumerated zero devices and the GPU silently fell back to CPU (only NVIDIA worked, since its ICD is injected by the container toolkit).
- scripts/build/package-gpu-libs.sh: for each installed ICD manifest, bundle the driver .so its library_path names — no hard-coded, platform-dependent soname list — plus that driver's ldd dependencies, skipping manifests whose driver isn't installed. Rewrite each library_path to a bare soname so the bundled driver resolves via the LD_LIBRARY_PATH run.sh already sets.
- .docker/install-base-deps.sh, backend/Dockerfile.golang, backend/Dockerfile.python: install mesa-vulkan-drivers in every Vulkan builder so the driver + manifests exist to be packaged (the LunarG SDK ships only the loader and shader tooling).
- pkg/model/process.go: when a backend ships vulkan/icd.d/, point the loader at it via VK_DRIVER_FILES/VK_ICD_FILENAMES at launch (no-op otherwise). Covered by pkg/model/process_vulkan_test.go.
- backend/go/parakeet-cpp/package.sh: complete the L0 stub (was missing the libc-family ldd walk + GPU-lib packaging) by mirroring whisper, so the vulkan-parakeet image actually bundles its GPU runtime.
Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Richard Palethorpe <io@richiejp.com>
|
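The `library_path` rewrite described in #10404 can be sketched in Python. The packager itself is a shell script; `rewrite_icd_manifest` and the sample manifest are assumptions for illustration, while the `ICD.library_path` key is part of the Vulkan loader's ICD manifest format.

```python
import json
import posixpath


def rewrite_icd_manifest(manifest_text):
    """Return (new_manifest_text, original_driver_path).

    The driver path is reduced to its bare soname so the loader resolves
    it through the library search path instead of an absolute location.
    """
    manifest = json.loads(manifest_text)
    icd = manifest.get("ICD", {})
    original = icd.get("library_path")
    if original:
        icd["library_path"] = posixpath.basename(original)
    return json.dumps(manifest, indent=2), original


SAMPLE_MANIFEST = json.dumps({
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so",
        "api_version": "1.3.0",
    },
})

rewritten, driver = rewrite_icd_manifest(SAMPLE_MANIFEST)
print(driver)     # the driver file a packager would need to copy
print(rewritten)  # manifest with library_path reduced to a bare soname
```

Returning the original path alongside the rewritten manifest mirrors the two jobs the commit describes: the path tells the packager which driver file to bundle, and the bare soname lets the bundled copy resolve via `LD_LIBRARY_PATH`.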
||
|
|
6b9f1bd4b3 |
chore: ⬆️ Update antirez/ds4 to e34a8086693ba7ca5cfabd2b9028ee52f0bfac2e (#10350)
* ⬆️ Update antirez/ds4
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(ds4): add Homebrew include/lib prefix for Darwin grpc-proto build
The darwin/metal ds4 backend job runs for the first time on this bump (it was skipped on prior ds4 PRs) and fails compiling backend.pb.cc with 'google/protobuf/runtime_version.h' file not found. hw_grpc_proto links neither protobuf::libprotobuf nor gRPC::grpc++, so the generated proto sources rely on default system include paths. That works on Linux (/usr/include) but not on macOS, where Homebrew installs under /opt/homebrew. Add the Homebrew prefix to include/link dirs on Darwin, mirroring the llama-cpp backend that already builds on Darwin CI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): install nlohmann-json on Darwin CI for ds4 backend
After the protobuf include-path fix the ds4 darwin build advances to compiling dsml_renderer.cpp, which includes <nlohmann/json.hpp> and #errors when absent. On Linux the header comes from apt nlohmann-json3-dev in the build image; the macOS runner had no equivalent. Add the header-only nlohmann-json formula to the shared Darwin backend brew install/link list and Homebrew cache, alongside the existing deps.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): build proper OCI image tar for Darwin backend
The darwin packaging referenced scripts/build/oci-pack.sh, which was never added to the tree, so it fell back to a plain 'tar' that omits manifest.json. 'local-ai backends install' then rejects the tarball with 'file manifest.json not found in tar'. Use './local-ai util create-oci-image' (already built by the 'build' prerequisite of the backends/ds4-darwin target), mirroring llama-cpp-darwin.sh, to emit a real OCI image the installer accepts.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
cf71e291b4 |
fix(darwin): fix vibevoice-cpp build linkage + fail-safe go backend packaging (#10276)
* fix(darwin): never package a go backend build tree as a working image
The darwin/arm64 vibevoice-cpp image shipped the source tree with a
half-built CMake directory (build-libgovibevoicecpp-fallback.so/) and no
backend binary, so the backend could never start: run.sh exec'd a
vibevoice-cpp binary that was not in the package and LocalAI timed out
waiting for the gRPC service.
Two durable, backend-agnostic defenses:
- backend/go/vibevoice-cpp/Makefile: mirror whisper's cleanup discipline so a
partial CMake tree cannot survive into packaging. Run `make purge` before
each variant build and `rm -rfv build*` after. The old recipe only removed
its build dir after a successful `mv`, so a failed build left the half-built
tree behind.
- scripts/build/golang-darwin.sh: before creating the OCI image, remove any
stray build-* directory and assert that the binary run.sh launches actually
exists. A build that produced no binary now fails the job loudly instead of
publishing a source tree as a working backend. The binary name is derived
from run.sh's `exec $CURDIR/<binary>` line (parakeet-cpp launches
parakeet-cpp-grpc, so it is not always ${BACKEND}) with a ${BACKEND}
fallback.
The underlying native build failure that left vibevoice-cpp half-built still
needs to be reproduced and fixed on Apple Silicon; this change ensures such a
failure can never again be published as a working image.
Refs #10267
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(vibevoice-cpp): build libvibevoice.a on darwin (link target, not path)
The darwin build failed with:
No rule to make target 'vibevoice/libvibevoice.a', needed by
'libgovibevoicecpp.so'. Stop.
The upstream vibevoice project is added with add_subdirectory(... EXCLUDE_FROM_ALL),
so its `vibevoice` static-library target is only built when something links it
as a target. The Apple branch linked only `$<TARGET_FILE:vibevoice>` - a bare
archive path with no target reference - so CMake never emitted a rule to build
libvibevoice.a, while the Linux branch worked because it passes the `vibevoice`
target name inside the --whole-archive flags.
Link the `vibevoice` target on Apple (establishing the build dependency) and
apply -force_load as a separate link option to keep whole-archive semantics so
purego can dlsym the vv_capi_* symbols.
Refs #10267
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
07f6c15a37 |
feat(ds4): layer-split distributed inference (#10098)
* feat(ds4): add standalone ds4-worker distributed worker binary
Add worker_main.c, a minimal standalone worker that owns a slice of the model's transformer layers and serves activations over ds4's own TCP transport via ds4_dist_run(). It links the same engine objects the backend already builds (including ds4_distributed.o) and has NO gRPC/protobuf dependency, so it builds even on hosts lacking protobuf/grpc dev headers. Launched by `local-ai worker ds4-distributed`.
Wire the ds4-worker CMake target (mirrors grpc-server's object/GPU/native handling) and have the Makefile copy + clean the binary alongside grpc-server. Ignore the built ds4-worker artifact.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(ds4): package ds4-worker alongside grpc-server
Copy the standalone ds4-worker binary into the backend package (Linux package.sh) and the Darwin OCI tar (ds4-darwin.sh: both the explicit copy and the otool dylib-bundling loop) so distributed workers ship with the backend.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): tighten ds4-worker integer arg validation to match upstream
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(ds4): wire grpc-server as distributed coordinator
Add distributed COORDINATOR support to the ds4 backend's gRPC server. Distributed inference is an engine backend: when LoadModel receives 'ds4_role:coordinator', the process populates ds4_engine_options.distributed (role, layer slice, listen host/port) before ds4_engine_open, then the normal ds4_session_* generation path runs transparently once the worker route covers all layers.
- New LoadModel options: ds4_role, ds4_layers (START:END or START:output), ds4_listen (host:port), ds4_route_timeout.
- parse_layers_spec() maps the layer spec onto ds4_distributed_layers.
- wait_route_ready() blocks generation until ds4_session_distributed_route_ready() reports full coverage (or timeout), gating both Predict and PredictStream; returns UNAVAILABLE on timeout/error.
- No ds4_role => g_distributed stays false and wait_route_ready is a no-op, so single-node behavior is unchanged.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): don't block Status during route wait; validate coordinator opts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(cli): add ds4-distributed worker exec helper
Add the ds4WorkerArgs helper plus findDS4Backend/DS4Distributed.Run that resolve the ds4 backend via the gallery and exec the packaged ds4-worker binary. Unlike worker_llamacpp.go, ds4 bundles its own dynamic loader (lib/ld.so) for glibc compatibility, so when present we exec ds4-worker through that loader with LD_LIBRARY_PATH=<backend>/lib, mirroring backend/cpp/ds4/run.sh; otherwise we exec it directly.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(cli): register the ds4-distributed worker subcommand
Wire DS4Distributed into the Worker kong command tree so `local-ai worker ds4-distributed` is available.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* docs(ds4): document layer-split distributed inference
Add a ds4 section to the distributed-mode feature docs (coordinator model YAML, manual worker command, layer-range semantics, the 'GGUF on every machine' requirement, coordinator-listens dial direction vs llama.cpp) and a terse Distributed mode section to the ds4 backend agent guide.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* test(ds4): opt-in hardware-gated distributed e2e spec
Add a self-contained, opt-in Ginkgo spec to the backend e2e suite that spins a ds4 coordinator (via the packaged run.sh, loaded with ds4_role/ds4_layers/ds4_listen options) plus a ds4-worker process for the upper layers, then uses Eventually to assert a short successful Predict once the layer route forms, before tearing the worker down. Gated by BACKEND_TEST_DS4_DISTRIBUTED=1 (plus the existing BACKEND_BINARY + BACKEND_TEST_MODEL_FILE and optional layer/listen/accel knobs); compiles and skips cleanly with no env, hardware, or model.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* test(ds4): pass coordinator ctx to worker; lowercase error string
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* docs(ds4): note distributed transport is plaintext/unauthenticated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* style(ds4): replace em dashes in distributed docs/agent/test per repo convention
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): link ds4-worker with the C++ driver for CUDA/Metal builds
The ds4-worker target is built from worker_main.c (C), so CMake linked it with the C driver. The nvcc-built ds4_cuda.o (and Obj-C++ ds4_metal.o) reference the C++ runtime, so the CUDA/Metal builds failed with undefined libstdc++ symbols (std::__throw_length_error). The CPU build passed because ds4_cpu.o is pure C. Force LINKER_LANGUAGE CXX so libstdc++ is linked.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
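The `ds4_layers` option in #10098 takes the form `START:END` or `START:output`. A minimal Python sketch of parsing such a spec is shown below. The backend's own `parse_layers_spec()` is C++ and maps onto `ds4_distributed_layers`; the Python function name, the tuple return shape, and the rejection of `END < START` are assumptions, and the commit message does not say whether `END` is inclusive.

```python
def parse_layers_spec_sketch(spec):
    """Parse 'START:END' or 'START:output' into (start, end).

    `end` is None when the slice runs through the output layer.
    Raises ValueError for anything that does not fit the two forms.
    """
    start_text, sep, end_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected START:END or START:output, got {spec!r}")
    start = int(start_text)  # int() raises ValueError on non-numeric input
    if start < 0:
        raise ValueError("START must be non-negative")
    if end_text == "output":
        return (start, None)
    end = int(end_text)
    if end < start:  # assumed validation, not confirmed by the source
        raise ValueError("END must not be smaller than START")
    return (start, end)


print(parse_layers_spec_sketch("0:20"))       # (0, 20)
print(parse_layers_spec_sketch("21:output"))  # (21, None)
```

In a two-machine setup of the kind the e2e spec describes, one side would take the lower range and the other a `START:output` range covering the upper layers; the exact split is model-dependent and is not specified here.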
||
|
|
d892e4af80 |
feat: add ds4 backend (DeepSeek V4 Flash) with tool calls, thinking, KV cache (#9758)
* test(e2e-backends): allow BACKEND_BINARY for native-built backends
Adds an escape hatch for hardware-gated backends (e.g. ds4) where the
model is too large for Docker build context. When BACKEND_BINARY points
at a run.sh produced by 'make -C backend/cpp/<name> package', the suite
skips docker image extraction and drives the binary directly.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(e2e-backends): validate BACKEND_BINARY basename + log actual source
Two follow-ups from the
|
||
|
|
1d0de757c3 |
fix: add hipblaslt library (#9541)
Signed-off-by: Andreas Egli <github@kharan.ch> |
||
|
|
151ad271f2 |
feat(rocm): bump to 7.x (#9323)
feat(rocm): bump to 7.2.1 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
271cc79709 |
chore(backends): do not bundle cuda target directory (#7982)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
06323df457 |
Optimize GPU library copying to preserve symlinks and avoid duplicates (#7931)
* Initial plan
* Optimize library copying to preserve symlinks and avoid duplicates
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Address code review feedback: extract get_inode helper, use file type detection for sorting
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Simplify implementation by removing inode tracking
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add clarifying comment about basename deduplication
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
|
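The two ideas named in #7931, preserving symlinks and deduplicating by basename, can be sketched in Python. The repository implements them in a shell script; the function `copy_libs` and its first-one-wins policy for duplicate basenames are assumptions for illustration.

```python
import os
import shutil


def copy_libs(sources, dest_dir):
    """Copy library files into dest_dir, keeping symlinks as symlinks.

    A file whose basename already exists in dest_dir is skipped, so the
    same library found in two search directories is bundled only once.
    Returns the basenames that were actually copied, in order.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for src in sources:
        name = os.path.basename(src)
        target = os.path.join(dest_dir, name)
        # lexists() is true for dangling symlinks too, unlike exists().
        if os.path.lexists(target):
            continue
        # follow_symlinks=False recreates a symlink instead of copying
        # the file it points to, which avoids duplicating the payload.
        shutil.copy2(src, target, follow_symlinks=False)
        copied.append(name)
    return copied
```

A typical shared-library layout has `libfoo.so` and `libfoo.so.1` as symlinks to one versioned file; copying with symlinks followed would store that payload three times, whereas the sketch stores it once plus two links.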
||
|
|
fd53978a7b |
feat: package GPU libraries inside backend containers for unified base image (#7891)
* Initial plan
* Add GPU library packaging for isolated backend environments
- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Address code review feedback: fix variable consistency and quoting
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix code review issues: improve glob handling and remove redundant variable
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Simplify main Dockerfile and workflow to use unified base image
- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
|
||
|
|
976c159fdb |
chore(ci): Build some Go based backends on Darwin (#6164)
* chore(ci): Build Go based backends on Darwin
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(stablediffusion-ggml): Fixes for building on Darwin
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(whisper): Build on Darwin
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
|
||
|
|
4993df81c3 |
fix(metal-llama.cpp): add all libutf8_validity
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
6971f71a6c |
Add mlx-vlm (#6119)
* Add mlx-vlm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to CI workflows
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add requirements-mps.txt
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
1ba66d00f5 |
feat: bundle python inside backends (#6123)
* feat(backends): bundle python Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test ci Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * vllm on self-hosted Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add clang Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to fix it for Mac * Relocate links only when is portable Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Make sure to call macosPortableEnv Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use self-hosted for vllm Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
1d830ce7dd |
feat(mlx): add mlx backend (#6049)
* chore: allow to install with pip Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Make the backend to build and actually work Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * List models from system only Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add script to build darwin python backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Run protogen in libbackend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Detect if mps is available across python backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * CI: try to build backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Debug CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Index mlx-vlm Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Remove mlx-vlm Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop CI test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |