mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-01 20:53:15 -04:00
* feat(ds4): add standalone ds4-worker distributed worker binary

Add worker_main.c, a minimal standalone worker that owns a slice of the
model's transformer layers and serves activations over ds4's own TCP
transport via ds4_dist_run(). It links the same engine objects the backend
already builds (including ds4_distributed.o) and has NO gRPC/protobuf
dependency, so it builds even on hosts lacking protobuf/grpc dev headers.
Launched by `local-ai worker ds4-distributed`.

Wire the ds4-worker CMake target (mirrors grpc-server's object/GPU/native
handling) and have the Makefile copy + clean the binary alongside
grpc-server. Ignore the built ds4-worker artifact.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ds4): package ds4-worker alongside grpc-server

Copy the standalone ds4-worker binary into the backend package (Linux
package.sh) and the Darwin OCI tar (ds4-darwin.sh: both the explicit copy
and the otool dylib-bundling loop) so distributed workers ship with the
backend.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): tighten ds4-worker integer arg validation to match upstream

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ds4): wire grpc-server as distributed coordinator

Add distributed COORDINATOR support to the ds4 backend's gRPC server.
Distributed inference is an engine backend: when LoadModel receives
'ds4_role:coordinator', the process populates
ds4_engine_options.distributed (role, layer slice, listen host/port) before
ds4_engine_open, then the normal ds4_session_* generation path runs
transparently once the worker route covers all layers.

- New LoadModel options: ds4_role, ds4_layers (START:END or START:output),
  ds4_listen (host:port), ds4_route_timeout.
- parse_layers_spec() maps the layer spec onto ds4_distributed_layers.
- wait_route_ready() blocks generation until
  ds4_session_distributed_route_ready() reports full coverage (or timeout),
  gating both Predict and PredictStream; returns UNAVAILABLE on
  timeout/error.
- No ds4_role => g_distributed stays false and wait_route_ready is a no-op,
  so single-node behavior is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): don't block Status during route wait; validate coordinator opts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(cli): add ds4-distributed worker exec helper

Add the ds4WorkerArgs helper plus findDS4Backend/DS4Distributed.Run that
resolve the ds4 backend via the gallery and exec the packaged ds4-worker
binary. Unlike worker_llamacpp.go, ds4 bundles its own dynamic loader
(lib/ld.so) for glibc compatibility, so when present we exec ds4-worker
through that loader with LD_LIBRARY_PATH=<backend>/lib, mirroring
backend/cpp/ds4/run.sh; otherwise we exec it directly.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(cli): register the ds4-distributed worker subcommand

Wire DS4Distributed into the Worker kong command tree so
`local-ai worker ds4-distributed` is available.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(ds4): document layer-split distributed inference

Add a ds4 section to the distributed-mode feature docs (coordinator model
YAML, manual worker command, layer-range semantics, the 'GGUF on every
machine' requirement, coordinator-listens dial direction vs llama.cpp) and
a terse Distributed mode section to the ds4 backend agent guide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(ds4): opt-in hardware-gated distributed e2e spec

Add a self-contained, opt-in Ginkgo spec to the backend e2e suite that
spins a ds4 coordinator (via the packaged run.sh, loaded with
ds4_role/ds4_layers/ds4_listen options) plus a ds4-worker process for the
upper layers, then uses Eventually to assert a short successful Predict
once the layer route forms, before tearing the worker down. Gated by
BACKEND_TEST_DS4_DISTRIBUTED=1 (plus the existing BACKEND_BINARY +
BACKEND_TEST_MODEL_FILE and optional layer/listen/accel knobs); compiles
and skips cleanly with no env, hardware, or model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(ds4): pass coordinator ctx to worker; lowercase error string

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(ds4): note distributed transport is plaintext/unauthenticated

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* style(ds4): replace em dashes in distributed docs/agent/test per repo convention

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ds4): link ds4-worker with the C++ driver for CUDA/Metal builds

The ds4-worker target is built from worker_main.c (C), so CMake linked it
with the C driver. The nvcc-built ds4_cuda.o (and Obj-C++ ds4_metal.o)
reference the C++ runtime, so the CUDA/Metal builds failed with undefined
libstdc++ symbols (std::__throw_length_error). The CPU build passed because
ds4_cpu.o is pure C. Force LINKER_LANGUAGE CXX so libstdc++ is linked.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
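For readers unfamiliar with the ds4_layers option mentioned in the coordinator commit, the following is a minimal sketch of how a "START:END or START:output" spec could be parsed. It is not the backend's parse_layers_spec(): the LayerSpec struct, the parse_index helper, and the function name parse_layers_spec_sketch are hypothetical, the real ds4_distributed_layers type is not shown on this page, and the rule that END must not be smaller than START is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical result type for illustration only; the real
// ds4_distributed_layers struct is not reproduced here.
struct LayerSpec {
    int start = 0;
    int end = 0;             // meaningful only when to_output is false
    bool to_output = false;  // true for the "START:output" form
};

// Parse a non-negative decimal integer; reject empty input, non-digits,
// and inputs long enough to overflow an int.
static std::optional<int> parse_index(const std::string &s) {
    if (s.empty() || s.size() > 9) return std::nullopt;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + (c - '0');
    }
    return value;
}

// Accepts "START:END" or "START:output"; returns nullopt on any other input.
std::optional<LayerSpec> parse_layers_spec_sketch(const std::string &spec) {
    const auto colon = spec.find(':');
    if (colon == std::string::npos) return std::nullopt;

    const auto start = parse_index(spec.substr(0, colon));
    if (!start) return std::nullopt;

    LayerSpec out;
    out.start = *start;

    const std::string tail = spec.substr(colon + 1);
    if (tail == "output") {
        out.to_output = true;
        return out;
    }

    const auto end = parse_index(tail);
    // Assumption: a range whose END precedes START is treated as invalid.
    if (!end || *end < *start) return std::nullopt;
    out.end = *end;
    return out;
}
```

Under these assumptions, "0:21" yields start 0 and end 21, "22:output" yields start 22 with to_output set, and malformed inputs such as "22" or "5:3" are rejected instead of being silently defaulted.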
140 lines
5.2 KiB
CMake
cmake_minimum_required(VERSION 3.15)
project(ds4-grpc-server LANGUAGES CXX C)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(TARGET grpc-server)

option(DS4_NATIVE "Compile with -march=native / -mcpu=native" ON)
set(DS4_GPU "cpu" CACHE STRING "GPU backend: cpu, cuda, or metal")
set(DS4_DIR "${CMAKE_CURRENT_SOURCE_DIR}/ds4" CACHE PATH "Path to cloned ds4 source")

find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)
  find_package(Protobuf REQUIRED)
endif()
find_package(gRPC CONFIG QUIET)
if(NOT gRPC_FOUND)
  # Ubuntu's apt-installed grpc++ does not ship a CMake config - fall back.
  find_library(GRPCPP_LIB grpc++ REQUIRED)
  find_library(GRPCPP_REFLECTION_LIB grpc++_reflection REQUIRED)
  add_library(gRPC::grpc++ INTERFACE IMPORTED)
  set_target_properties(gRPC::grpc++ PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_LIB}")
  add_library(gRPC::grpc++_reflection INTERFACE IMPORTED)
  set_target_properties(gRPC::grpc++_reflection PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_REFLECTION_LIB}")
endif()

find_program(_PROTOC NAMES protoc REQUIRED)
find_program(_GRPC_CPP_PLUGIN NAMES grpc_cpp_plugin REQUIRED)

get_filename_component(HW_PROTO "${CMAKE_CURRENT_SOURCE_DIR}/../../backend.proto" ABSOLUTE)
get_filename_component(HW_PROTO_PATH "${HW_PROTO}" PATH)

set(HW_PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.cc")
set(HW_PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.h")
set(HW_GRPC_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.cc")
set(HW_GRPC_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.h")

add_custom_command(
  OUTPUT "${HW_PROTO_SRCS}" "${HW_PROTO_HDRS}" "${HW_GRPC_SRCS}" "${HW_GRPC_HDRS}"
  COMMAND ${_PROTOC}
  ARGS --grpc_out "${CMAKE_CURRENT_BINARY_DIR}"
       --cpp_out "${CMAKE_CURRENT_BINARY_DIR}"
       -I "${HW_PROTO_PATH}"
       --plugin=protoc-gen-grpc="${_GRPC_CPP_PLUGIN}"
       "${HW_PROTO}"
  DEPENDS "${HW_PROTO}")

add_library(hw_grpc_proto STATIC
  ${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
  ${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})

set(DS4_OBJS "${DS4_DIR}/ds4.o")
if(DS4_GPU STREQUAL "cuda")
  list(APPEND DS4_OBJS "${DS4_DIR}/ds4_cuda.o")
elseif(DS4_GPU STREQUAL "metal")
  list(APPEND DS4_OBJS "${DS4_DIR}/ds4_metal.o")
elseif(DS4_GPU STREQUAL "cpu")
  set(DS4_OBJS "${DS4_DIR}/ds4_cpu.o")
endif()

# ds4.c now references ds4_distributed.c (distributed inference was split into
# its own translation unit upstream). It is a single GPU-agnostic object shared
# by every GPU mode, so link it in regardless of DS4_GPU.
list(APPEND DS4_OBJS "${DS4_DIR}/ds4_distributed.o")

add_executable(${TARGET}
  grpc-server.cpp
  dsml_parser.cpp
  dsml_renderer.cpp
  kv_cache.cpp)

target_include_directories(${TARGET} PRIVATE ${DS4_DIR})

foreach(obj ${DS4_OBJS})
  target_sources(${TARGET} PRIVATE ${obj})
  set_source_files_properties(${obj} PROPERTIES EXTERNAL_OBJECT TRUE GENERATED TRUE)
endforeach()

target_link_libraries(${TARGET} PRIVATE
  hw_grpc_proto
  gRPC::grpc++
  gRPC::grpc++_reflection
  protobuf::libprotobuf
  Threads::Threads
  m)

if(DS4_GPU STREQUAL "cuda")
  find_package(CUDAToolkit REQUIRED)
  target_link_libraries(${TARGET} PRIVATE CUDA::cudart CUDA::cublas)
elseif(DS4_GPU STREQUAL "metal")
  find_library(FOUNDATION_LIB Foundation REQUIRED)
  find_library(METAL_LIB Metal REQUIRED)
  target_link_libraries(${TARGET} PRIVATE ${FOUNDATION_LIB} ${METAL_LIB})
elseif(DS4_GPU STREQUAL "cpu")
  target_compile_definitions(${TARGET} PRIVATE DS4_NO_GPU)
endif()

if(DS4_NATIVE)
  if(APPLE)
    target_compile_options(${TARGET} PRIVATE -mcpu=native)
  else()
    target_compile_options(${TARGET} PRIVATE -march=native)
  endif()
endif()

# ds4-worker: standalone distributed worker. Links the same ds4 engine objects
# (including ds4_distributed.o) but has NO gRPC/protobuf dependency - it speaks
# ds4's own TCP transport via ds4_dist_run(). Buildable wherever the engine
# objects build, even on hosts without protobuf/grpc dev headers.
add_executable(ds4-worker worker_main.c)
target_include_directories(ds4-worker PRIVATE ${DS4_DIR})
foreach(obj ${DS4_OBJS})
  target_sources(ds4-worker PRIVATE ${obj})
  set_source_files_properties(${obj} PROPERTIES EXTERNAL_OBJECT TRUE GENERATED TRUE)
endforeach()
# worker_main.c is C, but the engine objects built by nvcc (ds4_cuda.o) and the
# Metal path (ds4_metal.o, Obj-C++) reference the C++ runtime (libstdc++). Force
# the C++ linker driver so those symbols resolve; the C driver would not link
# libstdc++ and the CUDA/Metal builds fail with undefined std:: references.
set_target_properties(ds4-worker PROPERTIES LINKER_LANGUAGE CXX)
target_link_libraries(ds4-worker PRIVATE Threads::Threads m)

if(DS4_GPU STREQUAL "cuda")
  target_link_libraries(ds4-worker PRIVATE CUDA::cudart CUDA::cublas)
elseif(DS4_GPU STREQUAL "metal")
  target_link_libraries(ds4-worker PRIVATE ${FOUNDATION_LIB} ${METAL_LIB})
elseif(DS4_GPU STREQUAL "cpu")
  target_compile_definitions(ds4-worker PRIVATE DS4_NO_GPU)
endif()

if(DS4_NATIVE)
  if(APPLE)
    target_compile_options(ds4-worker PRIVATE -mcpu=native)
  else()
    target_compile_options(ds4-worker PRIVATE -march=native)
  endif()
endif()