Wire middleware.ExposeNodeHeader onto the OpenAI inference routes
(chat, completions, embeddings) plus the Anthropic /v1/messages shim
and the Ollama chat/generate/embed shims. The wrapper handles
X-LocalAI-Node attribution from a single place, so the per-handler
maybeSetNodeHeader calls and the per-request nodeIDCh rendezvous /
applyNodeIDHeader plumbing in chat.go and completion.go are removed.

For SSE, the wrapper stamps the header lazily on the first Write,
WriteHeader, or Flush. That lazy stamp picks up the post-ml.Load node
ID from the loader and replaces the channel signal the worker used to
publish. The role=assistant first chunk
emission stays where it is (inside the first token callback) so all
writes still happen AFTER ml.Load has stamped the per-modelID node ID.
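
The lazy-stamp idea can be sketched as below. This is an illustration
built on plain net/http, not the actual middleware.ExposeNodeHeader
code: lazyNodeWriter, exposeNodeHeader, the lookup callback, and the
"node-a" value are hypothetical names chosen for the example. Only the
X-LocalAI-Node header name and the Write / WriteHeader / Flush trigger
points come from the description above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// nodeHeader is the header name used in the commit message.
const nodeHeader = "X-LocalAI-Node"

// lazyNodeWriter is a hypothetical wrapper that defers the node lookup
// until the handler first touches the response.
type lazyNodeWriter struct {
	http.ResponseWriter
	lookup  func() string // hypothetical: returns the node ID, "" if unknown
	stamped bool
}

// stamp sets the header at most once, just before headers are committed.
func (w *lazyNodeWriter) stamp() {
	if w.stamped {
		return
	}
	w.stamped = true
	if id := w.lookup(); id != "" {
		w.Header().Set(nodeHeader, id)
	}
}

func (w *lazyNodeWriter) WriteHeader(code int) {
	w.stamp()
	w.ResponseWriter.WriteHeader(code)
}

func (w *lazyNodeWriter) Write(p []byte) (int, error) {
	w.stamp()
	return w.ResponseWriter.Write(p)
}

// Flush matters for SSE, where a handler may flush before any other call.
func (w *lazyNodeWriter) Flush() {
	w.stamp()
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// exposeNodeHeader wraps a handler so attribution lives in one place.
func exposeNodeHeader(lookup func() string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(&lazyNodeWriter{ResponseWriter: rw, lookup: lookup}, r)
	})
}

// demo runs one streaming-style request and returns the header value
// that was committed with the response.
func demo() string {
	nodeID := "" // unknown until the simulated model load finishes
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		nodeID = "node-a" // stands in for the load step picking a node
		fmt.Fprint(w, "data: hello\n\n")
		w.(http.Flusher).Flush()
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	exposeNodeHeader(func() string { return nodeID }, inner).ServeHTTP(rec, req)
	return rec.Result().Header.Get(nodeHeader)
}

func main() {
	fmt.Println(demo())
}
```

Because the lookup only runs on the first write, a handler that loads
the model before emitting its first chunk needs no per-request channel
to hand the node ID back to the code that sets the header.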

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]