mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-30 11:36:31 -04:00
Wire middleware.ExposeNodeHeader onto the OpenAI inference routes (chat, completions, embeddings) plus the Anthropic /v1/messages shim and the Ollama chat/generate/embed shims.

The wrapper handles X-LocalAI-Node attribution from a single place, so the per-handler maybeSetNodeHeader calls and the per-request nodeIDCh rendezvous / applyNodeIDHeader plumbing in chat.go and completion.go are removed.

For SSE: the wrapper's lazy stamp on the first Write / WriteHeader / Flush picks up the post-ml.Load node ID from the loader, replacing the chan signal the worker used to publish. The role=assistant first chunk emission stays where it is (inside the first token callback) so all writes still happen AFTER ml.Load has stamped the per-modelID node ID.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
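
The lazy-stamp idea described above can be illustrated with a minimal sketch. This is not the actual middleware.ExposeNodeHeader implementation: it uses plain net/http rather than the project's router, and the names lazyNodeWriter, nodeLookup, exposeNodeHeader, and demo are hypothetical. Only the X-LocalAI-Node header name and the "stamp on first Write / WriteHeader / Flush" behaviour come from the commit message. The lookup function stands in for querying the loader after ml.Load.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync"
)

// nodeHeader is the response header named in the commit message.
const nodeHeader = "X-LocalAI-Node"

// nodeLookup is a hypothetical stand-in for asking the loader which
// node serves the model; it returns "" while the node is still unknown.
type nodeLookup func() string

// lazyNodeWriter defers the header stamp until the first
// Write, WriteHeader, or Flush, whichever happens first.
type lazyNodeWriter struct {
	http.ResponseWriter
	lookup nodeLookup
	once   sync.Once
}

// stamp sets the header at most once, just before headers go out.
func (w *lazyNodeWriter) stamp() {
	w.once.Do(func() {
		if id := w.lookup(); id != "" {
			w.Header().Set(nodeHeader, id)
		}
	})
}

func (w *lazyNodeWriter) WriteHeader(code int) {
	w.stamp()
	w.ResponseWriter.WriteHeader(code)
}

func (w *lazyNodeWriter) Write(b []byte) (int, error) {
	w.stamp()
	return w.ResponseWriter.Write(b)
}

// Flush stamps first, then forwards to the underlying writer if it
// supports flushing (needed for SSE streaming).
func (w *lazyNodeWriter) Flush() {
	w.stamp()
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// exposeNodeHeader wraps a handler so attribution happens in one place.
func exposeNodeHeader(lookup nodeLookup, next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(&lazyNodeWriter{ResponseWriter: rw, lookup: lookup}, r)
	})
}

// demo simulates a streaming handler whose node ID only becomes known
// after the (simulated) model load, and returns the stamped header.
func demo() string {
	nodeID := "" // unknown when the request arrives
	lookup := func() string { return nodeID }

	handler := exposeNodeHeader(lookup, http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			nodeID = "node-b" // simulated load picks the node
			// The first chunk is written only after the load, so the
			// lazy stamp sees the final node ID.
			w.Write([]byte("data: {\"role\":\"assistant\"}\n\n"))
			if f, ok := w.(http.Flusher); ok {
				f.Flush()
			}
		}))

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	handler.ServeHTTP(rec, req)
	return rec.Header().Get(nodeHeader)
}
```

Calling demo() from a main function or a test returns the node ID that was assigned after the wrapper was installed. That ordering is why the first chunk must be written after the load completes: a write before that point would send the headers without the node ID.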