LocalAI

mirror/LocalAI

Fork 0

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 04:46:54 -04:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
LocalAI [bot]	a4e6e01e4d	fix(process): give backend workers a parent-death safety net (#10639 ) * fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI spawns) can be orphaned and linger — holding VRAM and its listen port — if the LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before its own teardown runs. Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown -> ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when LocalAI receives a catchable signal and survives long enough to run its handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1, whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or option for a caller to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (it never builds the exec.Cmd itself), so it cannot set a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing tells the backend to exit and it is reparented to init. Fix: add a best-effort, backend-side safety net at the one shared choke point every out-of-process Go backend routes through — grpc.StartServer / RunServer in pkg/grpc. On startup it captures getppid() and polls; when the process is reparented (getppid changes / becomes 1 — the standard POSIX signal the original parent died) it logs and self-terminates. getppid() reparent detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged. Scope/limitations: covers Go-based backends (everything using pkg/grpc). The C++ backends (e.g. llama-cpp) and Python backends do not route through pkg/grpc and are not covered by this mechanism — they would each need an equivalent parent-death check (follow-up). The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language (suggested upstream follow-up; out of scope for this LocalAI-only PR). Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild process tree, lets the middle process exit to orphan the grandchild running the real watchParentDeath, and asserts it detects the reparent and self-terminates. Unix-only (build-tagged), runs in CI (Linux). Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(process): extend parent-death backstop to C++ and Python backends The Go parent-death watcher (pkg/grpc/parentwatch.go, commit `772b435d5`) only protects backends that route through pkg/grpc. C++ and Python backends don't, so the originally-reported case — the llama.cpp gRPC worker surviving a non-graceful LocalAI death — was still uncovered. Extend the same best-effort backstop to both languages, reusing the exact mechanism and semantics: - capture getppid() at startup, skip if already orphaned (<=1) - a background thread polls getppid() and self-exits on reparenting (getppid() != orig \|\| == 1), portable across Linux/macOS, no-op on Windows - same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style durations like 500ms/2s/1m) C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++ backend) as a dependency-free header parent_watch.h, wired into grpc-server.cpp's main() and copied at build time via prepare.sh. C++ backends have no shared server scaffolding, so other C++ backends (ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would each need the same one-line include+call as follow-ups. Python: implemented once in the shared common/parent_watch.py and armed from common/grpc_auth.py's get_auth_interceptors() — the single helper every one of the 35 Python backends invokes while building its gRPC server — so all Python backends (and future ones) are covered with no per-backend edits and no duplicated implementation. Tests (real process-tree reparent detection, mirroring the Go test): - backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh) - backend/python/common/parent_watch_test.py (python -m unittest) Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>	2026-07-02 19:16:48 +02:00
LocalAI [bot]	f0d0bff232	fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524 ) (#10538 ) The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the tokenizer-template path and blindly json::parse'd each message's content string. LocalAI's Go layer always flattens content to a plain string, so a user prompt that merely looks like JSON (e.g. mealie's ingredient array ["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and rejected by oaicompat_chat_params_parse with "unsupported content[].type". Normalize content per role instead: user/system/developer content is opaque text and is never JSON-sniffed; assistant/tool content still collapses a literal JSON null/object (tool-call bookkeeping) to a string, but a plain string is never turned into an array/scalar. The array defense is role-independent, so the role gate only governs the benign null/object case. While here, extract the duplicated per-message reconstruction and the pre-template content sanitization into shared, unit-tested helpers (message_content.h) so the streaming (PredictStream) and non-streaming (Predict) paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the dead tool-role parse branches, and the redundant Predict-only tool_calls branch, while preserving the prior #7324 (null content -> "") and #7528 (tool array content -> string) fixes. Tests: - backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for all three helpers (#10524, #7324, #7528, multimodal), discovered and run by `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON). - core/schema/message_test.go: Go regression pinning that ToProto flattens a JSON-array-looking text part to the verbatim string. - prepare.sh now copies message_content.h into the build tree. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-27 01:42:05 +02:00

LocalAI [bot]

a4e6e01e4d

fix(process): give backend workers a parent-death safety net (#10639 )

* fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully

Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI
spawns) can be orphaned and linger — holding VRAM and its listen port — if the
LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown
grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.

Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the
SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown ->
ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when
LocalAI receives a catchable signal and survives long enough to run its
handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1,
whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be
signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or
option for a caller to inject/extend SysProcAttr. LocalAI fully delegates
spawning to that library (it never builds the exec.Cmd itself), so it cannot set
a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing
tells the backend to exit and it is reparented to init.

Fix: add a best-effort, backend-side safety net at the one shared choke point
every out-of-process Go backend routes through — grpc.StartServer / RunServer in
pkg/grpc. On startup it captures getppid() and polls; when the process is
reparented (getppid changes / becomes 1 — the standard POSIX signal the original
parent died) it logs and self-terminates. getppid() reparent detection is
portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via
LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and
LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the
existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.

Scope/limitations: covers Go-based backends (everything using pkg/grpc). The
C++ backends (e.g. llama-cpp) and Python backends do not route through
pkg/grpc and are not covered by this mechanism — they would each need an
equivalent parent-death check (follow-up). The fully general fix is for
go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig
at spawn for every backend regardless of language (suggested upstream follow-up;
out of scope for this LocalAI-only PR).

Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild
process tree, lets the middle process exit to orphan the grandchild running the
real watchParentDeath, and asserts it detects the reparent and self-terminates.
Unix-only (build-tagged), runs in CI (Linux).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(process): extend parent-death backstop to C++ and Python backends

The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.

Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:

- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || == 1), portable across Linux/macOS, no-op on
  Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy
  false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL
  (default 2s; accepts Go-style durations like 500ms/2s/1m)

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.

Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>

2026-07-02 19:16:48 +02:00

LocalAI [bot]

f0d0bff232

fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524 ) (#10538 )

The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the
tokenizer-template path and blindly json::parse'd each message's content
string. LocalAI's Go layer always flattens content to a plain string, so a
user prompt that merely looks like JSON (e.g. mealie's ingredient array
["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and
rejected by oaicompat_chat_params_parse with "unsupported content[].type".

Normalize content per role instead: user/system/developer content is opaque
text and is never JSON-sniffed; assistant/tool content still collapses a literal
JSON null/object (tool-call bookkeeping) to a string, but a plain string is
never turned into an array/scalar. The array defense is role-independent, so the
role gate only governs the benign null/object case.

While here, extract the duplicated per-message reconstruction and the
pre-template content sanitization into shared, unit-tested helpers
(message_content.h) so the streaming (PredictStream) and non-streaming (Predict)
paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the
dead tool-role parse branches, and the redundant Predict-only tool_calls branch,
while preserving the prior #7324 (null content -> "") and #7528 (tool array
content -> string) fixes.

Tests:
- backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for
  all three helpers (#10524, #7324, #7528, multimodal), discovered and run by
  `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired
  as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON).
- core/schema/message_test.go: Go regression pinning that ToProto flattens a
  JSON-array-looking text part to the verbatim string.
- prepare.sh now copies message_content.h into the build tree.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-27 01:42:05 +02:00

2 Commits