fix(process): give backend workers a parent-death safety net (#10639)

* fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully

Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI spawns) can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.

Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown -> ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when LocalAI receives a catchable signal and survives long enough to run its handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1, whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or option for a caller to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (it never builds the exec.Cmd itself), so it cannot set a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing tells the backend to exit and it is reparented to init.

Fix: add a best-effort, backend-side safety net at the one shared choke point every out-of-process Go backend routes through: grpc.StartServer / RunServer in pkg/grpc. On startup it captures getppid() and polls; when the process is reparented (getppid changes / becomes 1, the standard POSIX signal the original parent died) it logs and self-terminates. getppid() reparent detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.

Scope/limitations: covers Go-based backends (everything using pkg/grpc). The C++ backends (e.g. llama-cpp) and Python backends do not route through pkg/grpc and are not covered by this mechanism; they would each need an equivalent parent-death check (follow-up). The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language (suggested upstream follow-up; out of scope for this LocalAI-only PR).

Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild process tree, lets the middle process exit to orphan the grandchild running the real watchParentDeath, and asserts it detects the reparent and self-terminates. Unix-only (build-tagged), runs in CI (Linux).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(process): extend parent-death backstop to C++ and Python backends

The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5) only protects backends that route through pkg/grpc. C++ and Python backends don't, so the originally-reported case (the llama.cpp gRPC worker surviving a non-graceful LocalAI death) was still uncovered.

Extend the same best-effort backstop to both languages, reusing the exact mechanism and semantics:
- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting (getppid() != orig || == 1), portable across Linux/macOS, no-op on Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style durations like 500ms/2s/1m)

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++ backend) as a dependency-free header parent_watch.h, wired into grpc-server.cpp's main() and copied at build time via prepare.sh. C++ backends have no shared server scaffolding, so other C++ backends (ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed from common/grpc_auth.py's get_auth_interceptors(), the single helper every one of the 35 Python backends invokes while building its gRPC server, so all Python backends (and future ones) are covered with no per-backend edits and no duplicated implementation.

Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
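
The reparent check described in the commit message (skip when already orphaned at startup, then treat "getppid() != orig || == 1" as parent death) can be sketched in isolation. This is a minimal illustration, not the committed code: the names should_watch and parent_died and the injectable getppid parameter are assumptions introduced here so the predicate can be exercised without forking a real process tree.

```python
import os
from typing import Callable


def should_watch(startup_ppid: int) -> bool:
    """Hypothetical helper: a parent PID of 1 or less at startup means the
    process is already orphaned, so there is no original parent to watch."""
    return startup_ppid > 1


def parent_died(orig_ppid: int, getppid: Callable[[], int] = os.getppid) -> bool:
    """Hypothetical helper: report parent death once the current parent PID
    differs from the one captured at startup, or is init (PID 1)."""
    ppid = getppid()
    return ppid != orig_ppid or ppid == 1


if __name__ == "__main__":
    # Capture the parent once, as a watcher would at startup.
    startup_ppid = os.getppid()
    print("watch:", should_watch(startup_ppid))
    print("parent died:", parent_died(startup_ppid))
```

Passing a stub such as `lambda: 1` for getppid simulates reparenting to init, and a stub returning any other PID simulates adoption by a sub-reaper; both count as parent death under this rule.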
2026-07-03 12:57:02 -04:00 · 2026-07-02 19:16:48 +02:00
parent 6eea3ef2ac
commit a4e6e01e4d
12 changed files with 978 additions and 1 deletions
--- a/backend/python/common/grpc_auth.py
+++ b/backend/python/common/grpc_auth.py
@@ -11,6 +11,8 @@ import os

 import grpc

+from parent_watch import start_parent_death_watcher
+

 class _AbortHandler(grpc.RpcMethodHandler):
    """A method handler that immediately aborts with UNAUTHENTICATED."""
@@ -70,6 +72,13 @@ def get_auth_interceptors(*, aio: bool = False):

    Returns an empty list when LOCALAI_GRPC_AUTH_TOKEN is not set.
    """
+    # Arm the best-effort parent-death backstop here: this is the single helper
+    # every LocalAI Python backend invokes exactly once while building its gRPC
+    # server (mirroring how the Go watcher arms in pkg/grpc's shared serve path).
+    # start_parent_death_watcher() is idempotent and a no-op when disabled or on
+    # unsupported platforms — see parent_watch.py.
+    start_parent_death_watcher()
+
    token = os.environ.get("LOCALAI_GRPC_AUTH_TOKEN", "")
    if not token:
        return []
--- a/backend/python/common/parent_watch.py
+++ b/backend/python/common/parent_watch.py
@@ -0,0 +1,149 @@
+"""Parent-death watcher (best-effort backstop) for LocalAI Python backends.
+
+LocalAI spawns each backend as a child process and, on a clean shutdown, tears
+it down itself (SIGTERM -> grace -> SIGKILL). That graceful path only runs when
+LocalAI receives a catchable signal and lives long enough to run its handlers.
+If LocalAI is SIGKILLed (e.g. a supervising process's grace period elapses
+first), that teardown never runs and this backend would be reparented to init
+and linger, holding GPU/VRAM and its listen port.
+
+The watcher here is a best-effort backstop for exactly that case: it does NOT
+replace the graceful teardown, it only covers the "parent vanished without
+cleaning up" path. It detects reparenting: when the process that spawned this
+backend dies, the kernel reparents us to the nearest sub-reaper or to init
+(PID 1), so os.getppid() stops matching the value captured at startup. This
+getppid() approach is portable across Linux/macOS (unlike the Linux-only
+PR_SET_PDEATHSIG), which is why it is used here, mirroring the Go backends'
+pkg/grpc/parentwatch.go and the C++ backends' parent_watch.h. It is disabled on
+Windows, which has no equivalent orphan-reparenting semantics.
+
+Env vars (shared verbatim across the Go, C++ and Python backends):
+  LOCALAI_BACKEND_PARENT_WATCH           enabled by default; a falsey value
+                                         ("false"/"0"/"no"/"off", case-insensitive)
+                                         disables it.
+  LOCALAI_BACKEND_PARENT_WATCH_INTERVAL  poll interval as a Go-style duration
+                                         string ("500ms", "2s", "1m") or a bare
+                                         number of seconds. Defaults to 2s.
+"""
+
+import os
+import sys
+import threading
+
+ENV_PARENT_WATCH = "LOCALAI_BACKEND_PARENT_WATCH"
+ENV_PARENT_WATCH_INTERVAL = "LOCALAI_BACKEND_PARENT_WATCH_INTERVAL"
+
+_DEFAULT_INTERVAL_SECONDS = 2.0
+
+# Guard so repeated calls (e.g. get_auth_interceptors invoked more than once)
+# only ever arm a single watcher thread per process.
+_started = False
+_started_lock = threading.Lock()
+
+
+def _enabled():
+    """Report whether the watcher should run in this process."""
+    # Windows does not reparent orphans to a well-known init PID, so the
+    # getppid() heuristic used here doesn't apply there.
+    if os.name == "nt" or sys.platform.startswith("win"):
+        return False
+    val = os.environ.get(ENV_PARENT_WATCH, "").strip().lower()
+    if val in ("false", "0", "no", "off"):
+        return False
+    return True
+
+
+def _interval_seconds():
+    """Return the configured poll interval in seconds, or the default.
+
+    Accepts Go-style duration strings ("500ms", "2s", "1m") for cross-language
+    parity, or a bare number interpreted as seconds.
+    """
+    raw = os.environ.get(ENV_PARENT_WATCH_INTERVAL, "").strip()
+    if not raw:
+        return _DEFAULT_INTERVAL_SECONDS
+    # Split numeric prefix from unit suffix.
+    i = 0
+    while i < len(raw) and (raw[i].isdigit() or raw[i] == "." or (i == 0 and raw[i] in "+-")):
+        i += 1
+    if i == 0:
+        return _DEFAULT_INTERVAL_SECONDS
+    try:
+        num = float(raw[:i])
+    except ValueError:
+        return _DEFAULT_INTERVAL_SECONDS
+    unit = raw[i:].lower()
+    if unit == "ms":
+        seconds = num / 1000.0
+    elif unit in ("s", ""):
+        seconds = num
+    elif unit == "m":
+        seconds = num * 60.0
+    else:
+        return _DEFAULT_INTERVAL_SECONDS
+    return seconds if seconds > 0 else _DEFAULT_INTERVAL_SECONDS
+
+
+def _parent_died(orig_ppid):
+    """Report whether this process has been reparented away from orig_ppid.
+
+    Reparenting is the standard POSIX signal that the original parent (here, the
+    LocalAI process that spawned this backend) has exited: the orphan is handed
+    to the nearest sub-reaper or to init (PID 1), so os.getppid() no longer
+    matches the value captured at startup.
+    """
+    ppid = os.getppid()
+    return ppid != orig_ppid or ppid == 1
+
+
+def _watch(orig_ppid, interval, on_death):
+    """Poll until _parent_died reports the original parent is gone, then call
+    on_death. Blocks, so run it on its own (daemon) thread."""
+    import time
+
+    while True:
+        time.sleep(interval)
+        if _parent_died(orig_ppid):
+            on_death()
+            return
+
+
+def start_parent_death_watcher():
+    """Install the best-effort safety net described in this module's docstring.
+
+    No-op when disabled, on Windows, when already orphaned at startup
+    (os.getppid() <= 1), or if already started. This is a backstop alongside —
+    never a replacement for — LocalAI's graceful teardown.
+    """
+    global _started
+    if not _enabled():
+        return
+    with _started_lock:
+        if _started:
+            return
+        orig_ppid = os.getppid()
+        # A parent of 1 (or less) at startup means we were already orphaned (or
+        # launched directly under init) — there is no original parent to watch.
+        if orig_ppid <= 1:
+            return
+        interval = _interval_seconds()
+
+        def on_death():
+            print(
+                "backend parent process (pid {}) exited without stopping this "
+                "backend; self-terminating to avoid orphaning".format(orig_ppid),
+                file=sys.stderr,
+                flush=True,
+            )
+            # Immediate, non-cleanup exit: this is a shutdown safety net and the
+            # normal graceful path is already gone.
+            os._exit(1)
+
+        thread = threading.Thread(
+            target=_watch,
+            args=(orig_ppid, interval, on_death),
+            name="parent-death-watcher",
+            daemon=True,
+        )
+        thread.start()
+        _started = True
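
For reference, the interval rules that parent_watch.py documents (units "ms", "s", "m" or a bare number of seconds, with a fallback to 2s for anything unparseable or non-positive) can be restated as a compact, regex-based sketch. This is an alternative formulation for illustration only, not the code in the diff: parse_interval, _UNITS and _PATTERN are hypothetical names, and the regex is an assumption about how to express the same accepted inputs.

```python
import re

_DEFAULT_SECONDS = 2.0
# Multipliers converting each accepted unit suffix to seconds.
_UNITS = {"ms": 0.001, "s": 1.0, "": 1.0, "m": 60.0}
# Optional sign, a decimal number, then an optional alphabetic unit suffix.
_PATTERN = re.compile(r"([+-]?\d*\.?\d+|[+-]?\d+\.?)([a-z]*)")


def parse_interval(raw: str, default: float = _DEFAULT_SECONDS) -> float:
    """Parse a Go-style duration ("500ms", "2s", "1m") or bare seconds.

    Returns `default` for empty, malformed, unknown-unit, or non-positive input.
    """
    match = _PATTERN.fullmatch(raw.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        return default
    seconds = float(match.group(1)) * _UNITS[match.group(2)]
    return seconds if seconds > 0 else default


if __name__ == "__main__":
    for raw in ("500ms", "2s", "1m", "3", "0.5s", "garbage", "1h", "-1s", ""):
        print(repr(raw), "->", parse_interval(raw))
```

Like the documented behaviour, this sketch rejects units outside the listed set (for example "1h") and compound durations (for example "1m30s") by falling back to the default rather than raising.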
--- a/backend/python/common/parent_watch_test.py
+++ b/backend/python/common/parent_watch_test.py
@@ -0,0 +1,150 @@
+"""Unit tests for the parent-death watcher (parent_watch.py).
+
+Run standalone (Python standard library only, no backend venv needed):
+    python3 -m unittest parent_watch_test
+
+The core test (test_detects_reparent) builds a genuine two-level process tree
+(test -> middle -> grandchild) with os.fork, lets the middle process die, and
+asserts the grandchild's parent_watch._watch detects the reparenting and
+self-terminates — mirroring the Go test in pkg/grpc/parentwatch_test.go and the
+C++ test in backend/cpp/llama-cpp/parent_watch_test.cpp.
+"""
+
+import os
+import sys
+import tempfile
+import threading
+import time
+import unittest
+
+import parent_watch
+
+
+class TestParentWatchEnvParsing(unittest.TestCase):
+    def setUp(self):
+        self._saved = {
+            k: os.environ.get(k)
+            for k in (parent_watch.ENV_PARENT_WATCH, parent_watch.ENV_PARENT_WATCH_INTERVAL)
+        }
+        for k in self._saved:
+            os.environ.pop(k, None)
+
+    def tearDown(self):
+        for k, v in self._saved.items():
+            if v is None:
+                os.environ.pop(k, None)
+            else:
+                os.environ[k] = v
+
+    def test_interval_default(self):
+        self.assertEqual(parent_watch._interval_seconds(), 2.0)
+
+    def test_interval_units(self):
+        cases = {"500ms": 0.5, "2s": 2.0, "1m": 60.0, "3": 3.0, "0.5s": 0.5}
+        for raw, expected in cases.items():
+            os.environ[parent_watch.ENV_PARENT_WATCH_INTERVAL] = raw
+            self.assertAlmostEqual(parent_watch._interval_seconds(), expected, msg=raw)
+
+    def test_interval_garbage_falls_back(self):
+        os.environ[parent_watch.ENV_PARENT_WATCH_INTERVAL] = "garbage"
+        self.assertEqual(parent_watch._interval_seconds(), 2.0)
+
+    @unittest.skipIf(os.name == "nt" or sys.platform.startswith("win"), "POSIX only")
+    def test_enabled_default(self):
+        self.assertTrue(parent_watch._enabled())
+
+    @unittest.skipIf(os.name == "nt" or sys.platform.startswith("win"), "POSIX only")
+    def test_disabled_by_falsey(self):
+        for val in ("false", "0", "no", "off", "OFF", " False "):
+            os.environ[parent_watch.ENV_PARENT_WATCH] = val
+            self.assertFalse(parent_watch._enabled(), msg=val)
+
+    @unittest.skipIf(os.name == "nt" or sys.platform.startswith("win"), "POSIX only")
+    def test_enabled_by_truthy(self):
+        for val in ("true", "1", "yes", "on"):
+            os.environ[parent_watch.ENV_PARENT_WATCH] = val
+            self.assertTrue(parent_watch._enabled(), msg=val)
+
+
+@unittest.skipIf(os.name == "nt" or sys.platform.startswith("win"), "fork/reparent is POSIX only")
+class TestParentWatchReparent(unittest.TestCase):
+    def _wait_for_file(self, path, timeout=10.0):
+        deadline = time.time() + timeout
+        while time.time() < deadline:
+            if os.path.exists(path):
+                return True
+            time.sleep(0.02)
+        return False
+
+    def test_detects_reparent(self):
+        tmpdir = tempfile.mkdtemp(prefix="parentwatch_test_")
+        ready_file = os.path.join(tmpdir, "ready")
+        exited_file = os.path.join(tmpdir, "exited")
+
+        middle = os.fork()
+        if middle == 0:
+            # ---- middle process ----
+            grandchild = os.fork()
+            if grandchild == 0:
+                # ---- grandchild process: arm the REAL watcher against middle ----
+                orig_ppid = os.getppid()
+
+                def on_death():
+                    with open(exited_file, "w") as f:
+                        f.write("1")
+                    os._exit(7)
+
+                threading.Thread(
+                    target=parent_watch._watch,
+                    args=(orig_ppid, 0.05, on_death),
+                    daemon=True,
+                ).start()
+
+                # Safety valve: never linger if something goes wrong.
+                def bail():
+                    time.sleep(30)
+                    os._exit(2)
+
+                threading.Thread(target=bail, daemon=True).start()
+
+                # Signal readiness only after the watcher captured orig_ppid.
+                with open(ready_file, "w") as f:
+                    f.write(str(os.getpid()))
+                while True:
+                    time.sleep(1)
+            else:
+                # middle: wait until grandchild is ready, then exit to orphan it.
+                if not self._wait_for_file(ready_file):
+                    os._exit(5)
+                os._exit(0)
+
+        # ---- test (top) process ----
+        os.waitpid(middle, 0)  # reap middle only; grandchild is orphaned
+
+        self.assertTrue(os.path.exists(ready_file), "grandchild never signaled readiness")
+        self.assertTrue(
+            self._wait_for_file(exited_file),
+            "watcher did not detect parent death within timeout",
+        )
+
+        # Best-effort cleanup: kill the grandchild if it somehow survived.
+        try:
+            with open(ready_file) as f:
+                pid = int(f.read().strip())
+            if pid > 1:
+                os.kill(pid, 9)
+        except (OSError, ValueError):
+            pass
+        for p in (ready_file, exited_file):
+            try:
+                os.remove(p)
+            except OSError:
+                pass
+        try:
+            os.rmdir(tmpdir)
+        except OSError:
+            pass
+
+
+if __name__ == "__main__":
+    unittest.main()