mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-30 11:36:31 -04:00
Plumb the SmartRouter's per-request node decision up to the OpenAI inference handlers (chat, completions, embeddings) and attach it as the X-LocalAI-Node response header when the operator enabled --expose-node-header. Wiring: - pkg/model.Model gains a NodeID field plus mutex-guarded SetNodeID/NodeID accessors. The router stamps it on the *Model it returns from NewModelWithClient; the field stays empty for in-process loads. - core/services/nodes/model_router.go SetNodeID after constructing the Model so the in-process store carries the most-recent routing decision per modelID. - core/http/endpoints/openai/node_header.go centralizes the policy in maybeSetNodeHeader (no-op when the flag is off, the model is not loaded, or no node ID is recorded). chat, completion and embeddings handlers call it before writing the response. Best-effort caveat: the distributed LoadModel path overwrites the per modelID store entry on every routing decision, so under heavy concurrency the header reflects "a recent decision" rather than "the exact node that served this exact request". This is acceptable for observability and matches what operators already see in the cluster logs. Documented in the flag help text and in the distributed-mode feature doc. Assisted-by: Claude:claude-opus-4-7[1m] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>