mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-01 12:42:55 -04:00
* feat(ds4): add standalone ds4-worker distributed worker binary

Add worker_main.c, a minimal standalone worker that owns a slice of the model's transformer layers and serves activations over ds4's own TCP transport via ds4_dist_run(). It links the same engine objects the backend already builds (including ds4_distributed.o) and has NO gRPC/protobuf dependency, so it builds even on hosts lacking protobuf/grpc dev headers. Launched by `local-ai worker ds4-distributed`.

Wire the ds4-worker CMake target (mirrors grpc-server's object/GPU/native handling) and have the Makefile copy + clean the binary alongside grpc-server. Ignore the built ds4-worker artifact.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ds4): package ds4-worker alongside grpc-server

Copy the standalone ds4-worker binary into the backend package (Linux package.sh) and the Darwin OCI tar (ds4-darwin.sh: both the explicit copy and the otool dylib-bundling loop) so distributed workers ship with the backend.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): tighten ds4-worker integer arg validation to match upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ds4): wire grpc-server as distributed coordinator

Add distributed COORDINATOR support to the ds4 backend's gRPC server. Distributed inference is an engine backend: when LoadModel receives 'ds4_role:coordinator', the process populates ds4_engine_options.distributed (role, layer slice, listen host/port) before ds4_engine_open, then the normal ds4_session_* generation path runs transparently once the worker route covers all layers.

- New LoadModel options: ds4_role, ds4_layers (START:END or START:output), ds4_listen (host:port), ds4_route_timeout.
- parse_layers_spec() maps the layer spec onto ds4_distributed_layers.
- wait_route_ready() blocks generation until ds4_session_distributed_route_ready() reports full coverage (or timeout), gating both Predict and PredictStream; returns UNAVAILABLE on timeout/error.
- No ds4_role => g_distributed stays false and wait_route_ready is a no-op, so single-node behavior is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): don't block Status during route wait; validate coordinator opts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(cli): add ds4-distributed worker exec helper

Add the ds4WorkerArgs helper plus findDS4Backend/DS4Distributed.Run that resolve the ds4 backend via the gallery and exec the packaged ds4-worker binary. Unlike worker_llamacpp.go, ds4 bundles its own dynamic loader (lib/ld.so) for glibc compatibility, so when present we exec ds4-worker through that loader with LD_LIBRARY_PATH=<backend>/lib, mirroring backend/cpp/ds4/run.sh; otherwise we exec it directly.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(cli): register the ds4-distributed worker subcommand

Wire DS4Distributed into the Worker kong command tree so `local-ai worker ds4-distributed` is available.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(ds4): document layer-split distributed inference

Add a ds4 section to the distributed-mode feature docs (coordinator model YAML, manual worker command, layer-range semantics, the 'GGUF on every machine' requirement, coordinator-listens dial direction vs llama.cpp) and a terse Distributed mode section to the ds4 backend agent guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(ds4): opt-in hardware-gated distributed e2e spec

Add a self-contained, opt-in Ginkgo spec to the backend e2e suite that spins a ds4 coordinator (via the packaged run.sh, loaded with ds4_role/ds4_layers/ds4_listen options) plus a ds4-worker process for the upper layers, then uses Eventually to assert a short successful Predict once the layer route forms, before tearing the worker down. Gated by BACKEND_TEST_DS4_DISTRIBUTED=1 (plus the existing BACKEND_BINARY + BACKEND_TEST_MODEL_FILE and optional layer/listen/accel knobs); compiles and skips cleanly with no env, hardware, or model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(ds4): pass coordinator ctx to worker; lowercase error string

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(ds4): note distributed transport is plaintext/unauthenticated

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* style(ds4): replace em dashes in distributed docs/agent/test per repo convention

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): link ds4-worker with the C++ driver for CUDA/Metal builds

The ds4-worker target is built from worker_main.c (C), so CMake linked it with the C driver. The nvcc-built ds4_cuda.o (and Obj-C++ ds4_metal.o) reference the C++ runtime, so the CUDA/Metal builds failed with undefined libstdc++ symbols (std::__throw_length_error). The CPU build passed because ds4_cpu.o is pure C. Force LINKER_LANGUAGE CXX so libstdc++ is linked.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
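The coordinator options above take a layer spec (`START:END` or `START:output`) and a `host:port` listen address, and one sub-commit tightens integer argument validation. The C sketch below shows one way to do that kind of strict parsing; `layer_range`, `parse_uint_strict`, `parse_layers`, `parse_listen`, the `max_layer` bound, the non-empty-host rule, and the rule that END may not precede START are all hypothetical assumptions, not the actual `parse_layers_spec()` or ds4-worker code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; the real ds4_distributed_layers may differ. */
typedef struct {
    long start;
    long end;       /* -1 when to_output is true */
    bool to_output; /* true for the "START:output" form */
} layer_range;

/* Parse exactly len bytes of s as a non-negative decimal integer <= max.
 * Unlike strtol, rejects empty input, signs, whitespace, and trailing junk. */
bool parse_uint_strict(const char *s, size_t len, long max, long *out) {
    if (len == 0) return false;
    long v = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return false;
        long d = s[i] - '0';
        if (d > max || v > (max - d) / 10) return false; /* v*10+d would exceed max */
        v = v * 10 + d;
    }
    *out = v;
    return true;
}

/* Parse "START:END" or "START:output"; layer indices must be <= max_layer. */
bool parse_layers(const char *spec, long max_layer, layer_range *out) {
    const char *colon = strchr(spec, ':');
    if (colon == NULL) return false;
    if (!parse_uint_strict(spec, (size_t)(colon - spec), max_layer, &out->start))
        return false;
    const char *rest = colon + 1;
    if (strcmp(rest, "output") == 0) {
        out->end = -1;
        out->to_output = true;
        return true;
    }
    out->to_output = false;
    if (!parse_uint_strict(rest, strlen(rest), max_layer, &out->end))
        return false;
    return out->end >= out->start; /* assumption: END may not precede START */
}

/* Split "host:port" on the last ':' and require a non-empty host and a
 * port in 1..65535. host must point to at least host_cap bytes. */
bool parse_listen(const char *s, char *host, size_t host_cap, long *port) {
    const char *colon = strrchr(s, ':');
    if (colon == NULL || colon == s) return false;
    size_t hlen = (size_t)(colon - s);
    if (hlen >= host_cap) return false;
    if (!parse_uint_strict(colon + 1, strlen(colon + 1), 65535, port) || *port == 0)
        return false;
    memcpy(host, s, hlen);
    host[hlen] = '\0';
    return true;
}
```

For example, `parse_layers("31:output", 60, &r)` sets `r.start` to 31 and `r.to_output` to true, while `"+3:10"` is rejected: the sketch accepts only bare decimal digits, whereas `strtol` tolerates leading whitespace and a sign.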
83 lines
2.6 KiB
Makefile
# ds4 backend Makefile.
#
# Upstream pin lives below as DS4_VERSION?=e16ead1e29c81a67bbb64e5b001117679cf9ce6e
# so the bump script (.github/bump_deps.sh) can find and update it - matches the
# llama-cpp / ik-llama-cpp / turboquant convention.

DS4_VERSION?=e16ead1e29c81a67bbb64e5b001117679cf9ce6e
DS4_REPO?=https://github.com/antirez/ds4

CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
BUILD_DIR := build

BUILD_TYPE ?=
NATIVE ?= false
JOBS ?= $(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

UNAME_S := $(shell uname -s)

CMAKE_ARGS ?= -DCMAKE_BUILD_TYPE=Release

# ds4_distributed.o is a GPU-agnostic translation unit that ds4.o/ds4_cpu.o now
# reference (upstream split distributed inference into its own .c). The same
# object is shared by every GPU mode, so every branch below lists it.
ifeq ($(BUILD_TYPE),cublas)
	CMAKE_ARGS += -DDS4_GPU=cuda
	DS4_OBJ_TARGET := ds4.o ds4_cuda.o ds4_distributed.o
else ifeq ($(UNAME_S),Darwin)
	CMAKE_ARGS += -DDS4_GPU=metal
	DS4_OBJ_TARGET := ds4.o ds4_metal.o ds4_distributed.o
else
	# CPU reference path (Linux only - macOS CPU path is broken by VM bug per ds4 README).
	CMAKE_ARGS += -DDS4_GPU=cpu
	DS4_OBJ_TARGET := ds4_cpu.o ds4_distributed.o
endif

ifneq ($(NATIVE),true)
	CMAKE_ARGS += -DDS4_NATIVE=OFF
endif

.PHONY: grpc-server package clean purge test all
all: grpc-server

# Clone the upstream ds4 source at the pinned commit. Directory acts as the
# target so make only re-clones when missing. After a DS4_VERSION bump,
# run 'make purge && make' to refetch (or rely on CI's clean build).
ds4:
	mkdir -p ds4
	cd ds4 && \
		git init -q && \
		git remote add origin $(DS4_REPO) && \
		git fetch --depth 1 origin $(DS4_VERSION) && \
		git checkout FETCH_HEAD

# Build ds4's engine object files (DS4_OBJ_TARGET, selected above) via its own
# Makefile, which already encodes the right per-platform compile flags
# (Objective-C/Metal on Darwin, nvcc on Linux+CUDA).
ds4/ds4.o: ds4
	+$(MAKE) -C ds4 $(DS4_OBJ_TARGET)

grpc-server: ds4/ds4.o
	mkdir -p $(BUILD_DIR)
	cd $(BUILD_DIR) && cmake $(CMAKE_ARGS) $(CURRENT_MAKEFILE_DIR) && cmake --build . --config Release -j $(JOBS)
	cp $(BUILD_DIR)/grpc-server grpc-server
	cp $(BUILD_DIR)/ds4-worker ds4-worker

package: grpc-server
	bash package.sh

test:
	@echo "ds4 backend: e2e coverage at tests/e2e-backends/ (BACKEND_BINARY mode)"

clean:
	rm -rf $(BUILD_DIR) grpc-server ds4-worker package
	if [ -d ds4 ]; then $(MAKE) -C ds4 clean; fi

purge: clean
	rm -rf ds4
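The commit message at the top of this page says `wait_route_ready()` gates Predict and PredictStream until `ds4_session_distributed_route_ready()` reports full layer coverage or a timeout expires, and is a no-op when no `ds4_role` is set. The C sketch below expresses that polling loop under stated assumptions: the probe callback type, its return convention (positive ready, zero pending, negative error), the `route_status` enum, the poll interval, and the `countdown_probe`/`failing_probe` test doubles are all hypothetical, not ds4's actual API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical probe standing in for ds4_session_distributed_route_ready():
 * returns > 0 once every layer is covered, 0 while waiting, < 0 on error. */
typedef int (*route_probe_fn)(void *ctx);

typedef enum { ROUTE_READY, ROUTE_TIMEOUT, ROUTE_ERROR } route_status;

static double monotonic_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Block until the route is ready, the probe fails, or timeout_s elapses.
 * With distributed == false this returns immediately (single-node mode). */
route_status wait_route_ready_sketch(bool distributed, route_probe_fn probe, void *ctx,
                                     double timeout_s, long poll_ms) {
    if (!distributed) return ROUTE_READY;
    const double deadline = monotonic_seconds() + timeout_s;
    for (;;) {
        int r = probe(ctx);
        if (r > 0) return ROUTE_READY;
        if (r < 0) return ROUTE_ERROR;
        if (monotonic_seconds() >= deadline) return ROUTE_TIMEOUT;
        struct timespec interval = { .tv_sec = poll_ms / 1000,
                                     .tv_nsec = (poll_ms % 1000) * 1000000L };
        nanosleep(&interval, NULL);
    }
}

/* Test double: reports "pending" for *ctx polls, then "ready". */
int countdown_probe(void *ctx) {
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}

/* Test double: always reports an error. */
int failing_probe(void *ctx) {
    (void)ctx;
    return -1;
}
```

A caller such as Predict would translate `ROUTE_TIMEOUT` and `ROUTE_ERROR` into a gRPC UNAVAILABLE status, as the commit message describes. Reading `CLOCK_MONOTONIC` keeps the deadline unaffected by wall-clock adjustments.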