mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-03 04:46:54 -04:00
* fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully
Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI
spawns) can be orphaned and linger — holding VRAM and its listen port — if the
LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown
grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.
Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the
SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown ->
ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when
LocalAI receives a catchable signal and survives long enough to run its
handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1,
whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be
signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or
option for a caller to inject/extend SysProcAttr. LocalAI fully delegates
spawning to that library (it never builds the exec.Cmd itself), so it cannot set
a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing
tells the backend to exit and it is reparented to init.
Fix: add a best-effort, backend-side safety net at the one shared choke point
every out-of-process Go backend routes through — grpc.StartServer / RunServer in
pkg/grpc. On startup it captures getppid() and polls; when the process is
reparented (getppid() changes or becomes 1, the standard POSIX indication that
the original parent died) it logs and self-terminates. getppid() reparent detection is
portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via
LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the
existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.
Scope/limitations: covers Go-based backends (everything using pkg/grpc). The
C++ backends (e.g. llama-cpp) and Python backends do not route through
pkg/grpc and are not covered by this mechanism — they would each need an
equivalent parent-death check (follow-up). The fully general fix is for
go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig
at spawn for every backend regardless of language (suggested upstream follow-up;
out of scope for this LocalAI-only PR).
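For illustration only, the spawn-side approach described above could look roughly like the following if the caller built the exec.Cmd itself. This is a sketch under assumptions: newBackendCmd is a hypothetical helper, not part of LocalAI or go-processmanager, and the Pdeathsig field of syscall.SysProcAttr is only available on Linux. On Linux the parent-death signal is tied to the thread that created the child, so real code would likely also need to consider runtime.LockOSThread; that detail is left out here.

```go
//go:build linux

package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// newBackendCmd is a hypothetical helper (not in the source). It prepares a
// command that keeps its own process group, as the process manager does today,
// and additionally asks the kernel to SIGKILL the child when its parent dies.
func newBackendCmd(path string, args ...string) *exec.Cmd {
	cmd := exec.Command(path, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid:   true,            // own process group, so the group can be signalled
		Pdeathsig: syscall.SIGKILL, // Linux-only kernel parent-death signal
	}
	return cmd
}

func main() {
	// The command is only constructed here, never started.
	cmd := newBackendCmd("sleep", "60")
	fmt.Println("setpgid:", cmd.SysProcAttr.Setpgid)
	fmt.Println("pdeathsig:", int(cmd.SysProcAttr.Pdeathsig))
}
```

Because this relies on the kernel rather than on polling, it would apply to a backend in any language, which is why it is described above as the general fix.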
Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild
process tree, lets the middle process exit to orphan the grandchild running the
real watchParentDeath, and asserts it detects the reparent and self-terminates.
Unix-only (build-tagged), runs in CI (Linux).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(process): extend parent-death backstop to C++ and Python backends
The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.
Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:
- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || getppid() == 1), portable across Linux/macOS,
  no-op on Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; the falsy
  values false/0/no/off disable it) and
  LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style
  durations like 500ms/2s/1m)
C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.
Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.
Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
106 lines
4.2 KiB
Go
package grpc

import (
	"log"
	"os"
	"runtime"
	"strings"
	"time"
)

// Backend worker processes (the per-model gRPC servers LocalAI spawns) are
// deliberately placed in their own process group by the process manager so
// LocalAI's graceful shutdown can signal the whole group. That graceful path
// (SIGTERM -> grace -> SIGKILL, driven by pkg/signals + pkg/model) only runs
// when LocalAI itself receives a catchable signal and lives long enough to run
// its handlers. If LocalAI is SIGKILLed (e.g. a supervising process's
// graceful-shutdown grace period elapses first), that teardown never runs and
// this backend would be reparented to init and linger, holding VRAM and its
// listen port.
//
// The watcher below is a best-effort backstop for exactly that case: it does
// NOT replace the graceful teardown, it only covers the "parent vanished
// without cleaning up" path. It works by detecting reparenting: when the
// process that spawned this backend dies, the kernel reparents us to the
// nearest sub-reaper or to init (PID 1), so getppid() stops matching the value
// we captured at startup. This getppid() approach is portable across
// Linux/macOS (unlike Linux-only PR_SET_PDEATHSIG), which is why it's used
// here rather than a kernel parent-death signal.
const (
	// EnvBackendParentWatch toggles the parent-death watcher. It is enabled by
	// default; set it to a falsey value ("false", "0", "no", "off") to disable
	// (e.g. when running a backend standalone for debugging under a shell whose
	// lifetime shouldn't govern the backend).
	EnvBackendParentWatch = "LOCALAI_BACKEND_PARENT_WATCH"
	// EnvBackendParentWatchInterval overrides the poll interval as a Go
	// duration string (e.g. "500ms"). Defaults to defaultParentWatchInterval.
	EnvBackendParentWatchInterval = "LOCALAI_BACKEND_PARENT_WATCH_INTERVAL"

	defaultParentWatchInterval = 2 * time.Second
)

// parentWatchEnabled reports whether the watcher should run in this process.
func parentWatchEnabled() bool {
	switch strings.ToLower(strings.TrimSpace(os.Getenv(EnvBackendParentWatch))) {
	case "false", "0", "no", "off":
		return false
	}
	// Windows does not reparent orphans to a well-known init PID, so the
	// getppid() heuristic used here doesn't apply there.
	return runtime.GOOS != "windows"
}

// parentWatchInterval returns the configured poll interval, or the default.
func parentWatchInterval() time.Duration {
	if v := os.Getenv(EnvBackendParentWatchInterval); v != "" {
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			return d
		}
	}
	return defaultParentWatchInterval
}

// parentDied reports whether this process has been reparented away from the
// parent it had when the watcher started. Reparenting is the standard POSIX
// signal that the original parent (here, the LocalAI process that spawned this
// backend) has exited: the orphan is handed to the nearest sub-reaper or to
// init (PID 1), so getppid() no longer matches the value captured at startup.
func parentDied(origPPID int) bool {
	ppid := os.Getppid()
	return ppid != origPPID || ppid == 1
}

// watchParentDeath polls until parentDied reports the original parent is gone,
// then invokes onDeath. It blocks, so run it in its own goroutine.
func watchParentDeath(origPPID int, interval time.Duration, onDeath func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		if parentDied(origPPID) {
			onDeath()
			return
		}
	}
}

// startParentDeathWatcher installs the best-effort safety net described above
// on the calling backend process. It is a no-op when disabled or on platforms
// where the mechanism doesn't apply. This is a backstop alongside, never a
// replacement for, LocalAI's graceful SIGTERM->grace->SIGKILL teardown.
func startParentDeathWatcher() {
	if !parentWatchEnabled() {
		return
	}
	origPPID := os.Getppid()
	// A parent of 1 at startup means we were already orphaned (or launched
	// directly under init), so there's no original parent to watch for.
	if origPPID <= 1 {
		return
	}
	interval := parentWatchInterval()
	go watchParentDeath(origPPID, interval, func() {
		log.Printf("backend parent process (pid %d) exited without stopping this backend; self-terminating to avoid orphaning", origPPID)
		os.Exit(1)
	})
}
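The poll loop above can also be exercised without orphaning a real process if the parent-PID lookup is made injectable. The following is a minimal sketch of that idea, not the repository's test: watchReparent and simulateReparent are hypothetical names, the PID 4242 is arbitrary, and the fake lookup stands in for os.Getppid().

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// watchReparent is a hypothetical variant of watchParentDeath that takes the
// parent-PID lookup as a parameter so the loop can be driven by a fake.
func watchReparent(getppid func() int, origPPID int, interval time.Duration, onDeath func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		// Same condition as parentDied: the PID changed or became init.
		if ppid := getppid(); ppid != origPPID || ppid == 1 {
			onDeath()
			return
		}
	}
}

// simulateReparent reports whether the watcher stays quiet while the fake
// parent PID is unchanged and fires once the fake PID flips to 1.
func simulateReparent() bool {
	const orig = 4242 // arbitrary stand-in for the original parent PID
	var ppid atomic.Int64
	ppid.Store(orig)

	fired := make(chan struct{})
	go watchReparent(
		func() int { return int(ppid.Load()) },
		orig,
		5*time.Millisecond,
		func() { close(fired) },
	)

	// While the parent is "alive" the callback must not run.
	select {
	case <-fired:
		return false
	case <-time.After(30 * time.Millisecond):
	}

	// Simulate reparenting to init and wait for the callback.
	ppid.Store(1)
	select {
	case <-fired:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("watcher fired after simulated reparent:", simulateReparent())
}
```

A fake lookup only covers the loop's logic; it does not replace a process-tree test that checks real reparenting behaviour on the target OS.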