* fix(distributed): cascade-clean stale node_models on drain and filter routing by healthy status
Stale node_models rows (state="loaded") were surviving past the healthy
state of their owning node, causing /embeddings (and other inference
paths) to dispatch to a backend whose process was gone or drained. The
downstream symptom in a live cluster was pgvector rejecting inserts
with "vector cannot have more than 16000 dimensions (SQLSTATE 54000)"
because the misbehaving backend silently returned a malformed
(oversized) tensor. The Models page also showed the model as "running"
with no associated node (a stale entry), even though the node was no
longer visible in the Nodes view.
Three changes here; the probe enabled by the third is hardened in a
follow-up commit:
- MarkDraining now cascade-deletes node_models rows for the affected
node, mirroring MarkOffline. Drains are explicit operator actions —
the box has been intentionally taken out of rotation — so clearing
the rows stops the Models UI from misreporting and prevents the
routing layer from picking those rows if scheduling logic is ever
relaxed. In-flight requests already hold their gRPC client through
Route() and finish normally; the only observable effect is a
non-fatal IncrementInFlight warning, acceptable for a drain.
MarkUnhealthy is deliberately left status-only: it fires from
managers_distributed / reconciler on a single nats.ErrNoResponders
with no retry, so a transient NATS hiccup must not nuke every loaded
model and force a full reload on recovery.
- FindAndLockNodeWithModel's inner JOIN now filters on
backend_nodes.status = healthy in addition to node_models.state =
loaded. The previous version relied on the second node-fetch step to
reject non-healthy nodes, but a concurrent reader could still pick
the same stale row in the same window. Belt-and-braces.
- DistributedConfig.PerModelHealthCheck renamed to
DisablePerModelHealthCheck and inverted at the call site so
per-model gRPC probing is on by default. The probe (now made
consecutive-miss aware in a follow-up commit) independently health-
checks each model's gRPC address and removes stale node_models rows
when the backend has crashed even though the worker's node-level
heartbeat is still arriving.
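As a rough illustration of the first two changes, the sketch below models the two tables in memory. The `registry` type, its methods, and the string status values are hypothetical stand-ins, not the real store API; only the behaviour (drain cascades, unhealthy is status-only, routing requires loaded AND healthy) is taken from the description above.

```go
package main

import "fmt"

// Hypothetical in-memory stand-ins for the backend_nodes and node_models
// tables. The real code goes through a database.
type nodeModel struct {
	NodeID string
	Model  string
	State  string
}

type registry struct {
	status map[string]string // node ID -> "healthy", "unhealthy", "draining"
	models []nodeModel
}

// markDraining flips the status and cascade-deletes the node's model rows.
func (r *registry) markDraining(nodeID string) {
	r.status[nodeID] = "draining"
	kept := make([]nodeModel, 0, len(r.models))
	for _, m := range r.models {
		if m.NodeID != nodeID {
			kept = append(kept, m)
		}
	}
	r.models = kept
}

// markUnhealthy is status-only, so rows survive a transient failure.
func (r *registry) markUnhealthy(nodeID string) {
	r.status[nodeID] = "unhealthy"
}

// findNodeWithModel applies both predicates at once, like the JOIN:
// node_models.state = loaded AND backend_nodes.status = healthy.
func (r *registry) findNodeWithModel(model string) (string, bool) {
	for _, m := range r.models {
		if m.Model == model && m.State == "loaded" && r.status[m.NodeID] == "healthy" {
			return m.NodeID, true
		}
	}
	return "", false
}

func main() {
	r := &registry{
		status: map[string]string{"a": "healthy", "b": "healthy"},
		models: []nodeModel{
			{NodeID: "a", Model: "embed", State: "loaded"},
			{NodeID: "b", Model: "embed", State: "loaded"},
		},
	}

	r.markUnhealthy("a") // row kept, but no longer routable
	id, ok := r.findNodeWithModel("embed")
	fmt.Printf("%q %v %d\n", id, ok, len(r.models)) // "b" true 2

	r.markDraining("b") // rows for b removed
	id, ok = r.findNodeWithModel("embed")
	fmt.Printf("%q %v %d\n", id, ok, len(r.models)) // "" false 1
}
```

The row for node "a" survives MarkUnhealthy, so a recovery does not force a reload, yet the healthy filter keeps it out of routing in the meantime.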
Migration: the field had no CLI flag, env var binding, or YAML key
in tree (only the bare struct field), so there is no user-facing
migration. Anything constructing DistributedConfig in code needs to
drop the assignment (default now does the right thing) or invert it.
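A minimal sketch of the inversion, assuming a struct reduced to the one renamed field. `perModelProbeEnabled` is a hypothetical helper used only to show the call-site negation; the real call site may look different.

```go
package main

import "fmt"

// DistributedConfig here carries only the renamed field; the real struct
// has more fields.
type DistributedConfig struct {
	DisablePerModelHealthCheck bool
}

// perModelProbeEnabled (hypothetical) shows the inverted check: the zero
// value of the config now means "probe on".
func perModelProbeEnabled(c DistributedConfig) bool {
	return !c.DisablePerModelHealthCheck
}

func main() {
	var cfg DistributedConfig              // no assignment needed any more
	fmt.Println(perModelProbeEnabled(cfg)) // true

	cfg.DisablePerModelHealthCheck = true  // explicit opt-out
	fmt.Println(perModelProbeEnabled(cfg)) // false
}
```

Code that previously set PerModelHealthCheck to true can drop the assignment; code that relied on the old default (off) would set DisablePerModelHealthCheck to true.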
Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed): require consecutive misses before per-model probe removes a row
The per-model gRPC probe used to remove a node_models row on a single
failed health check. With the per-model probe now on by default, that
made any 5-second gRPC blip (network jitter, a long-running request
hogging the worker's gRPC server thread, brief GC pause) trigger a
full reload of the affected model — too eager for production.
Require perModelMissThreshold (3) consecutive failed probes before
removal. At the default 15s tick a model must be unreachable for ~45s
before reap; a single successful probe in between resets the streak.
Per-(node, model, replica) state tracked under a mutex on the monitor.
If the removal call itself fails, the miss counter is left in place
so the next tick retries rather than starting the streak over.
Tests:
- removes stale model via per-model health check after consecutive
failures (replaces the single-shot expectation)
- preserves model row when an intermittent failure is followed by a
success (covers the reset-on-success path and verifies the counter
reset by failing twice more without crossing threshold)
- newTestHealthMonitor initializes the misses map so direct-construct
test helpers don't nil-map-panic in the probe path
Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute time boundaries correctly to avoid re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood with health checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>