Commit Graph

2 Commits

Author SHA1 Message Date
Ettore Di Giacinto
8b2697f39a fix(distributed): drop hot-path I/O from node-header wrapper
Two related fixes in the X-LocalAI-Node middleware wrapper:

  1. Replace ml.CheckIsLoaded(modelName).NodeID() with the new
     ml.LookupNodeID helper in the lazy resolve closure. CheckIsLoaded
     acquires ml.mu and, when the recently-healthy cache window has
     expired, runs a gRPC HealthCheck with a 2-minute timeout. Running
     that on the response writer right before the first byte hits the
     client could stall buffered and streaming responses alike for up to
     2 minutes on a stale-healthy model. LookupNodeID is a pure store
     read with no I/O and no contention against active inference.

  2. Return http.ErrNotSupported (wrapped via fmt.Errorf with %w) from
     Hijack when the underlying writer does not implement
     http.Hijacker, instead of a string-only errors.New. This matches
     the standard library convention, so callers that check with
     errors.Is, notably those going through
     http.ResponseController.Hijack, detect the condition through the
     standard sentinel. Future-proof only: no current routes go through
     this branch.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
2026-05-24 21:40:57 +00:00
Ettore Di Giacinto
799215cdc6 feat(distributed): add ExposeNodeHeader middleware + ResponseWriter wrapper
Introduce a per-request Echo middleware that wraps the response writer and
lazily stamps X-LocalAI-Node on the first Write / WriteHeader / Flush.
This replaces the channel-based per-request rendezvous and the per-handler
maybeSetNodeHeader calls with a single enforcement point.

The wrapper reads the picked node ID by looking up the request's model in
the ModelLoader at flush time (late binding), so the value reflects the
post-ml.Load state of the loader rather than any pre-route guess. Off by
default; gated by ApplicationConfig.ExposeNodeHeader.
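
A rough sketch of the lazy stamping and late binding described above, under stated assumptions: it uses plain net/http instead of Echo, and lazyNodeWriter, resolve, and serve are hypothetical names rather than identifiers from this commit. The resolve closure stands in for the ModelLoader lookup and runs only when the first byte is about to be written.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const nodeHeader = "X-LocalAI-Node"

// lazyNodeWriter stamps the node header once, right before the first
// WriteHeader, Write, or Flush reaches the underlying writer.
type lazyNodeWriter struct {
	http.ResponseWriter
	resolve func() string // hypothetical lookup; returns "" when no node applies
	stamped bool
}

// stamp resolves the node ID at most once and sets the header if non-empty.
func (w *lazyNodeWriter) stamp() {
	if w.stamped {
		return
	}
	w.stamped = true
	if id := w.resolve(); id != "" {
		w.Header().Set(nodeHeader, id)
	}
}

func (w *lazyNodeWriter) WriteHeader(code int) {
	w.stamp()
	w.ResponseWriter.WriteHeader(code)
}

func (w *lazyNodeWriter) Write(b []byte) (int, error) {
	w.stamp()
	return w.ResponseWriter.Write(b)
}

func (w *lazyNodeWriter) Flush() {
	w.stamp()
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}

// serve simulates a request where the node is picked only after the wrapper
// has been installed, as happens when the handler loads the model.
func serve(picked string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	nodeID := "" // nothing is known when the middleware wraps the writer
	w := &lazyNodeWriter{ResponseWriter: rec, resolve: func() string { return nodeID }}
	nodeID = picked // stands in for the loader state after the model is loaded
	_, _ = w.Write([]byte("ok"))
	return rec
}

func main() {
	fmt.Println(serve("node-a").Header().Get(nodeHeader)) // remote node: header set
	fmt.Println(serve("").Header().Get(nodeHeader) == "") // in-process model: no header
}
```

Because the closure is evaluated at the first write rather than when the middleware runs, the header reflects whatever node was picked during the handler, and an empty result leaves the response untouched.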

Ginkgo specs cover off/on, missing model, in-process model (no node ID),
absent stash, buffered + streaming flush ordering, error path, and late
binding under in-handler stamp.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]
2026-05-24 21:15:11 +00:00