mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-03 12:57:02 -04:00
* fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully
Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI
spawns) can be orphaned and linger — holding VRAM and its listen port — if the
LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown
grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.
Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the
SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown ->
ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when
LocalAI receives a catchable signal and survives long enough to run its
handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1,
whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be
signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or
option for a caller to inject/extend SysProcAttr. LocalAI fully delegates
spawning to that library (it never builds the exec.Cmd itself), so it cannot set
a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing
tells the backend to exit and it is reparented to init.
Fix: add a best-effort, backend-side safety net at the one shared choke point
every out-of-process Go backend routes through — grpc.StartServer / RunServer in
pkg/grpc. On startup it captures getppid() and polls; when the process is
reparented (getppid() changes or becomes 1, the standard POSIX indication that
the original parent died) it logs and self-terminates. getppid() reparent
detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG.
Toggle via
LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the
existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.
Scope/limitations: covers Go-based backends (everything using pkg/grpc). The
C++ backends (e.g. llama-cpp) and Python backends do not route through
pkg/grpc and are not covered by this mechanism — they would each need an
equivalent parent-death check (follow-up). The fully general fix is for
go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig
at spawn for every backend regardless of language (suggested upstream follow-up;
out of scope for this LocalAI-only PR).
Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild
process tree, lets the middle process exit to orphan the grandchild running the
real watchParentDeath, and asserts it detects the reparent and self-terminates.
Unix-only (build-tagged), runs in CI (Linux).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(process): extend parent-death backstop to C++ and Python backends
The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.
Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:
- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || == 1), portable across Linux/macOS, no-op on
  Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy
  false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL
  (default 2s; accepts Go-style durations like 500ms/2s/1m)
C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.
Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.
Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
114 lines
4.9 KiB
CMake
set(TARGET grpc-server)
set(CMAKE_CXX_STANDARD 17)
cmake_minimum_required(VERSION 3.15)
set(TARGET grpc-server)
set(_PROTOBUF_LIBPROTOBUF libprotobuf)
set(_REFLECTION grpc++_reflection)

if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
  # Set correct Homebrew install folder for Apple Silicon and Intel Macs
  if (CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "arm64")
    set(HOMEBREW_DEFAULT_PREFIX "/opt/homebrew")
  else()
    set(HOMEBREW_DEFAULT_PREFIX "/usr/local")
  endif()

  link_directories("${HOMEBREW_DEFAULT_PREFIX}/lib")
  include_directories("${HOMEBREW_DEFAULT_PREFIX}/include")
endif()

find_package(absl CONFIG REQUIRED)
find_package(Protobuf CONFIG REQUIRED)
find_package(gRPC CONFIG REQUIRED)

find_program(_PROTOBUF_PROTOC protoc)
set(_GRPC_GRPCPP grpc++)
find_program(_GRPC_CPP_PLUGIN_EXECUTABLE grpc_cpp_plugin)

include_directories(${CMAKE_CURRENT_BINARY_DIR})
include_directories(${Protobuf_INCLUDE_DIRS})

message(STATUS "Using protobuf version ${Protobuf_VERSION} | Protobuf_INCLUDE_DIRS: ${Protobuf_INCLUDE_DIRS} | CMAKE_CURRENT_BINARY_DIR: ${CMAKE_CURRENT_BINARY_DIR}")

# Proto file
get_filename_component(hw_proto "../../../../../../backend/backend.proto" ABSOLUTE)
get_filename_component(hw_proto_path "${hw_proto}" PATH)

# Generated sources
set(hw_proto_srcs "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.cc")
set(hw_proto_hdrs "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.h")
set(hw_grpc_srcs "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.cc")
set(hw_grpc_hdrs "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.h")

add_custom_command(
  OUTPUT "${hw_proto_srcs}" "${hw_proto_hdrs}" "${hw_grpc_srcs}" "${hw_grpc_hdrs}"
  COMMAND ${_PROTOBUF_PROTOC}
  ARGS --grpc_out "${CMAKE_CURRENT_BINARY_DIR}"
    --cpp_out "${CMAKE_CURRENT_BINARY_DIR}"
    -I "${hw_proto_path}"
    --plugin=protoc-gen-grpc="${_GRPC_CPP_PLUGIN_EXECUTABLE}"
    "${hw_proto}"
  DEPENDS "${hw_proto}")

# hw_grpc_proto: force STATIC. Under the CPU_ALL_VARIANTS build BUILD_SHARED_LIBS=ON
# (ggml/llama become shared), which would otherwise make this glue library a DSO. As a
# DSO it references the hidden-visibility symbols in the static libprotobuf.a, which the
# linker cannot satisfy ("hidden symbol ... in libprotobuf.a is referenced by DSO").
# Keeping it STATIC links protobuf/gRPC directly into the grpc-server executable while
# only ggml/llama stay shared. No effect on the static variants (already BUILD_SHARED_LIBS=OFF).
add_library(hw_grpc_proto STATIC
  ${hw_grpc_srcs}
  ${hw_grpc_hdrs}
  ${hw_proto_srcs}
  ${hw_proto_hdrs} )

add_executable(${TARGET} grpc-server.cpp json.hpp httplib.h)

target_include_directories(${TARGET} PRIVATE ../llava)
target_include_directories(${TARGET} PRIVATE ${CMAKE_SOURCE_DIR})

# Upstream llama.cpp renamed the `common` helpers library to `llama-common`.
# Forks that branched before the rename (e.g. llama-cpp-turboquant) still
# expose it as `common`. Detect which one is present so the same CMakeLists
# drives both builds — otherwise an unresolved name silently degrades to a
# plain `-l` flag and the PUBLIC include dir (where common.h lives) is lost.
if (TARGET llama-common)
  set(_LLAMA_COMMON_TARGET llama-common)
else()
  set(_LLAMA_COMMON_TARGET common)
endif()

target_link_libraries(${TARGET} PRIVATE ${_LLAMA_COMMON_TARGET} llama mtmd ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
  absl::flags_parse
  gRPC::${_REFLECTION}
  gRPC::${_GRPC_GRPCPP}
  protobuf::${_PROTOBUF_LIBPROTOBUF})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
  add_dependencies(${TARGET} BUILD_INFO)
endif()

# Unit test for the message-content normalization helper (message_content.h).
# Off by default so the normal backend build is untouched; enable with
# -DLLAMA_GRPC_BUILD_TESTS=ON and run via ctest. It reuses llama.cpp's vendored
# <nlohmann/json.hpp> (propagated by the common helpers library) so it has no
# extra dependency beyond what the backend already builds against.
option(LLAMA_GRPC_BUILD_TESTS "Build grpc-server unit tests" OFF)
if(LLAMA_GRPC_BUILD_TESTS)
  enable_testing()
  add_executable(message_content_test message_content_test.cpp message_content.h)
  target_include_directories(message_content_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
  target_link_libraries(message_content_test PRIVATE ${_LLAMA_COMMON_TARGET})
  target_compile_features(message_content_test PRIVATE cxx_std_17)
  add_test(NAME message_content_test COMMAND message_content_test)

  # Parent-death watcher test (parent_watch.h) — standard library only, but
  # needs a threading runtime for std::thread.
  find_package(Threads REQUIRED)
  add_executable(parent_watch_test parent_watch_test.cpp parent_watch.h)
  target_include_directories(parent_watch_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
  target_link_libraries(parent_watch_test PRIVATE Threads::Threads)
  target_compile_features(parent_watch_test PRIVATE cxx_std_17)
  add_test(NAME parent_watch_test COMMAND parent_watch_test)
endif()