Two related fixes in the X-LocalAI-Node middleware wrapper:
1. Replace ml.CheckIsLoaded(modelName).NodeID() with the new
ml.LookupNodeID helper in the lazy resolve closure. CheckIsLoaded
acquires ml.mu and, when the recently-healthy cache window has
expired, runs a gRPC HealthCheck with a 2-minute timeout. Running
that on the response writer right before the first byte hits the
client could stall buffered and streaming responses alike for up to
2 minutes on a stale-healthy model. LookupNodeID is a pure store
read with no I/O and no contention against active inference.
2. Return http.ErrNotSupported (wrapped via fmt.Errorf with %w) from
Hijack when the underlying writer does not implement
http.Hijacker, instead of a string-only errors.New. This matches the
standard library convention, so callers that test with errors.Is
(notably those going through http.ResponseController.Hijack) detect
the condition through the standard sentinel. Future-proof only: no
current routes go through this branch.
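A minimal sketch of the Hijack change in item 2, not the actual middleware code. The nodeHeaderWriter type and the hijackUnsupported helper are hypothetical names used only for illustration; the sketch assumes Go 1.21 or later, where http.ErrNotSupported can be matched with errors.Is.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// nodeHeaderWriter is a hypothetical stand-in for the middleware's wrapper.
type nodeHeaderWriter struct {
	http.ResponseWriter
}

// Hijack delegates to the underlying writer when it implements
// http.Hijacker, and otherwise returns an error wrapping
// http.ErrNotSupported so callers can match it with errors.Is.
func (w *nodeHeaderWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
	hj, ok := w.ResponseWriter.(http.Hijacker)
	if !ok {
		return nil, nil, fmt.Errorf("underlying writer is not an http.Hijacker: %w", http.ErrNotSupported)
	}
	return hj.Hijack()
}

// hijackUnsupported wraps a writer that cannot be hijacked
// (httptest.ResponseRecorder) and reports whether the returned error
// matches the standard sentinel.
func hijackUnsupported() bool {
	w := &nodeHeaderWriter{ResponseWriter: httptest.NewRecorder()}
	_, _, err := w.Hijack()
	return errors.Is(err, http.ErrNotSupported)
}

func main() {
	fmt.Println(hijackUnsupported())
}
```

With a string-only errors.New in place of the wrapped sentinel, the errors.Is check in hijackUnsupported would report false.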
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
Introduce a per-request Echo middleware that wraps the response writer and
lazily stamps X-LocalAI-Node on the first Write / WriteHeader / Flush.
This replaces the channel-based per-request rendezvous and per-handler
maybeSetNodeHeader calls with a single enforcement point.
The wrapper reads the picked node ID by looking up the request's model in
the ModelLoader at flush time (late binding), so the value reflects the
post-ml.Load state of the loader rather than any pre-route guess. Off by
default; gated by ApplicationConfig.ExposeNodeHeader.
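The lazy stamping and late binding described above can be sketched as follows. This is an illustration under stated assumptions, not the middleware itself: it uses plain net/http instead of Echo, and the lazyNodeWriter type, the resolve callback (standing in for the ModelLoader lookup), and the serveOnce helper are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// lazyNodeWriter stamps X-LocalAI-Node once, just before the first
// status line or body byte is written. resolve is a hypothetical
// callback standing in for the ModelLoader lookup.
type lazyNodeWriter struct {
	http.ResponseWriter
	resolve func() string
	stamped bool
}

// stamp resolves the node ID on first use only; an empty ID (for
// example an in-process model) leaves the header unset.
func (w *lazyNodeWriter) stamp() {
	if w.stamped {
		return
	}
	w.stamped = true
	if id := w.resolve(); id != "" {
		w.Header().Set("X-LocalAI-Node", id)
	}
}

func (w *lazyNodeWriter) WriteHeader(code int) {
	w.stamp()
	w.ResponseWriter.WriteHeader(code)
}

func (w *lazyNodeWriter) Write(b []byte) (int, error) {
	w.stamp()
	return w.ResponseWriter.Write(b)
}

func (w *lazyNodeWriter) Flush() {
	w.stamp()
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// serveOnce simulates a request where the node is picked after the
// writer has been wrapped but before the first write.
func serveOnce() string {
	nodeID := "" // not yet known when the middleware wraps the writer
	rec := httptest.NewRecorder()
	w := &lazyNodeWriter{ResponseWriter: rec, resolve: func() string { return nodeID }}

	nodeID = "node-a" // simulates the loader picking a node in the handler
	_, _ = w.Write([]byte("ok"))
	nodeID = "node-b" // later changes are ignored: the header is already stamped
	w.Flush()
	return rec.Header().Get("X-LocalAI-Node")
}

func main() {
	fmt.Println(serveOnce())
}
```

Because the lookup runs at the first Write rather than when the writer is wrapped, the header carries the value set inside the handler ("node-a" in this sketch).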
Ginkgo specs cover off/on, missing model, in-process model (no node ID),
absent stash, buffered + streaming flush ordering, error path, and late
binding under in-handler stamp.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]