LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-30 11:36:31 -04:00

Ettore Di Giacinto b85b7e29df feat(distributed): surface picked node ID via X-LocalAI-Node header

Plumb the SmartRouter's per-request node decision up to the OpenAI
inference handlers (chat, completions, embeddings) and attach it as the
X-LocalAI-Node response header when the operator enabled
--expose-node-header.

Wiring:

- pkg/model.Model gains a NodeID field plus mutex-guarded
  SetNodeID/NodeID accessors. The router stamps it on the *Model it
  returns from NewModelWithClient; the field stays empty for in-process
  loads.
- core/services/nodes/model_router.go calls SetNodeID after constructing
  the Model so the in-process store carries the most recent routing
  decision per modelID.
- core/http/endpoints/openai/node_header.go centralizes the policy in
  maybeSetNodeHeader (no-op when the flag is off, the model is not
  loaded, or no node ID is recorded). chat, completion and embeddings
  handlers call it before writing the response.

Best-effort caveat: the distributed LoadModel path overwrites the
per-modelID store entry on every routing decision, so under heavy
concurrency the header reflects "a recent decision" rather than "the
exact node that served this exact request". This is acceptable for
observability and matches what operators already see in the cluster
logs. Documented in the flag help text and in the distributed-mode
feature doc.
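
The caveat can be illustrated with a last-writer-wins map keyed by modelID. This is a minimal sketch, assuming a store shaped like the one described; the `store` type and its `record` and `lookup` methods are hypothetical names, not LocalAI APIs.

```go
package main

import "sync"

// store is a hypothetical stand-in for the in-process per-modelID store.
type store struct {
	mu   sync.Mutex
	byID map[string]string // modelID -> most recent node ID
}

func newStore() *store {
	return &store{byID: make(map[string]string)}
}

// record overwrites the entry for modelID on every routing decision,
// so only the latest decision survives.
func (s *store) record(modelID, nodeID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byID[modelID] = nodeID
}

// lookup returns the most recently recorded node for modelID, which may
// differ from the node that served an earlier in-flight request.
func (s *store) lookup(modelID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.byID[modelID]
}
```

If request A is routed to one node and request B for the same model is routed to another before A writes its response, A's lookup returns B's node. The mutex keeps the map consistent but does not tie an entry to a specific request.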

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-05-24 20:14:02 +00:00

advisorylock

feat(distributed): sync state with frontends, better backend management reporting (#9426)

2026-04-19 17:55:53 +02:00

agentpool

fix(agentpool): close truncate-then-read race in agent_jobs.json persistence (#9811)

2026-05-13 23:58:43 +02:00

agents

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

dbutil

feat: add distributed mode (#9124)