mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-03 12:57:02 -04:00
* fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully
Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI
spawns) can be orphaned and linger — holding VRAM and its listen port — if the
LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown
grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.
Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the
SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown ->
ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when
LocalAI receives a catchable signal and survives long enough to run its
handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1,
whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be
signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or
option for a caller to inject/extend SysProcAttr. LocalAI fully delegates
spawning to that library (it never builds the exec.Cmd itself), so it cannot set
a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing
tells the backend to exit and it is reparented to init.
Fix: add a best-effort, backend-side safety net at the one shared choke point
every out-of-process Go backend routes through — grpc.StartServer / RunServer in
pkg/grpc. On startup it captures getppid() and polls; when the process is
reparented (getppid changes or becomes 1, the usual POSIX indication that the original
parent died) it logs and self-terminates. getppid() reparent detection is
portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via
LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the
existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.
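
The polling loop described above can be sketched as follows. This is a minimal Python illustration of the technique, not the Go code in pkg/grpc: the names watch_parent and on_orphan and the injectable getppid parameter are assumptions made for the example, and the default action (an immediate os._exit) is likewise only one plausible choice.

```python
import os
import threading
import time


def watch_parent(interval=2.0, on_orphan=None, getppid=os.getppid):
    """Start a daemon thread that calls on_orphan once the parent PID changes.

    Returns the thread, or None when the process is already orphaned
    (parent PID <= 1) and there is nothing to watch.
    """
    original = getppid()
    if original <= 1:
        return None

    if on_orphan is None:
        # Hypothetical default: exit at once, skipping cleanup handlers.
        on_orphan = lambda: os._exit(1)

    def poll():
        while True:
            time.sleep(interval)
            current = getppid()
            # Reparenting shows up as a changed parent PID (often 1, i.e. init).
            if current != original or current == 1:
                on_orphan()
                return

    thread = threading.Thread(target=poll, name="parent-watch", daemon=True)
    thread.start()
    return thread
```

Under these assumptions a worker would call watch_parent() once, before it starts serving. Because the thread is a daemon, it never keeps the process alive on its own.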
Scope/limitations: covers Go-based backends (everything using pkg/grpc). The
C++ backends (e.g. llama-cpp) and Python backends do not route through
pkg/grpc and are not covered by this mechanism — they would each need an
equivalent parent-death check (follow-up). The fully general fix is for
go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig
at spawn for every backend regardless of language (suggested upstream follow-up;
out of scope for this LocalAI-only PR).
Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild
process tree, lets the middle process exit to orphan the grandchild running the
real watchParentDeath, and asserts it detects the reparent and self-terminates.
Unix-only (build-tagged), runs in CI (Linux).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(process): extend parent-death backstop to C++ and Python backends
The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.
Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:
- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || == 1), portable across Linux/macOS, no-op on
  Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy
  false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL
  (default 2s; accepts Go-style durations like 500ms/2s/1m)
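
How the two environment variables might be interpreted, as a hedged Python sketch. The helper names watch_enabled and watch_interval are hypothetical, and only a subset of Go duration units (ms, s, m, h) is handled. Falling back to the default on an unparsable value is an assumption, not confirmed behaviour. The Windows default-off case is ignored here.

```python
import os
import re

_FALSY = {"false", "0", "no", "off"}
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def watch_enabled(env=os.environ):
    """Default on; only an explicit falsy value disables the watcher."""
    value = env.get("LOCALAI_BACKEND_PARENT_WATCH", "").strip().lower()
    return value not in _FALSY


def watch_interval(env=os.environ, default=2.0):
    """Parse a Go-style duration such as 500ms, 2s or 1m30s into seconds."""
    raw = env.get("LOCALAI_BACKEND_PARENT_WATCH_INTERVAL", "").strip()
    if not raw:
        return default
    total, pos = 0.0, 0
    for match in _DURATION.finditer(raw):
        if match.start() != pos:
            # Unrecognised text between components: treat as invalid.
            return default
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos != len(raw) or total <= 0:
        # Trailing garbage or a non-positive interval (assumed to be invalid).
        return default
    return total
```

Both helpers take the environment as a parameter so the parsing can be exercised with a plain dict instead of mutating os.environ.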
C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.
Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.
Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
72 lines
2.4 KiB
Bash
Executable File
#!/bin/bash
#
# Discovers and runs every standalone C++ unit test under backend/cpp/.
#
# A "standalone" unit test is a *_test.cpp that depends only on the C++ standard
# library and nlohmann/json (single header) - i.e. it exercises pure helpers and
# does not need the full llama.cpp + gRPC backend build. Tests that DO need the
# backend build use the CMake/ctest path (e.g. -DLLAMA_GRPC_BUILD_TESTS=ON)
# instead and are skipped here.
#
# This keeps CI generic: adding a new pure-C++ unit test file named *_test.cpp in
# an active backend source dir is picked up automatically, with no CI edits.
#
# Env:
#   NLOHMANN_INCLUDE  include dir that contains nlohmann/json.hpp. If unset, the
#                     nlohmann/json single header is fetched to a temp dir.
#   CXX               compiler (default: g++).
#   JSON_VERSION      nlohmann/json tag to fetch when NLOHMANN_INCLUDE is unset
#                     (default: v3.11.3).
set -uo pipefail

ROOT="$(cd "$(dirname "$0")" && pwd)"
CXX="${CXX:-g++}"
JSON_VERSION="${JSON_VERSION:-v3.11.3}"

JSON_INC="${NLOHMANN_INCLUDE:-}"
if [ -z "$JSON_INC" ]; then
    JSON_INC="$(mktemp -d)"
    mkdir -p "$JSON_INC/nlohmann"
    echo "Fetching nlohmann/json ${JSON_VERSION} single header..."
    if ! curl -L -sf \
        "https://raw.githubusercontent.com/nlohmann/json/${JSON_VERSION}/single_include/nlohmann/json.hpp" \
        -o "$JSON_INC/nlohmann/json.hpp"; then
        echo "ERROR: failed to fetch nlohmann/json header" >&2
        exit 1
    fi
fi

# Active source dirs only - exclude per-variant build copies, dev snapshots and
# the vendored upstream llama.cpp tree.
mapfile -t tests < <(find "$ROOT" -name '*_test.cpp' \
    -not -path '*/llama.cpp/*' \
    -not -path '*-build/*' \
    -not -path '*-dev/*' \
    -not -path '*fallback*' | sort)

if [ "${#tests[@]}" -eq 0 ]; then
    echo "No standalone C++ unit tests found under $ROOT"
    exit 0
fi

fail=0
for test_src in "${tests[@]}"; do
    name="$(basename "$test_src" .cpp)"
    bin="$(mktemp -d)/$name"
    echo "==> $test_src"
    if ! "$CXX" -std=c++17 -Wall -Wextra -pthread \
        -I"$JSON_INC" -I"$(dirname "$test_src")" \
        "$test_src" -o "$bin"; then
        echo "COMPILE FAILED: $test_src" >&2
        fail=1
        continue
    fi
    if ! "$bin"; then
        echo "TEST FAILED: $test_src" >&2
        fail=1
    fi
done

echo "Ran ${#tests[@]} standalone C++ unit test file(s)"
exit "$fail"