LocalAI

mirror/LocalAI

Fork 0

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-31 20:21:26 -04:00

Commit Graph

Author	SHA1	Message	Date
Ettore Di Giacinto	8b2697f39a	fix(distributed): drop hot-path I/O from node-header wrapper Two related fixes in the X-LocalAI-Node middleware wrapper: 1. Replace ml.CheckIsLoaded(modelName).NodeID() with the new ml.LookupNodeID helper in the lazy resolve closure. CheckIsLoaded acquires ml.mu and, when the recently-healthy cache window has expired, runs a gRPC HealthCheck with a 2-minute timeout. Running that on the response writer right before the first byte hits the client could stall buffered and streaming responses alike for up to 2 minutes on a stale-healthy model. LookupNodeID is a pure store read with no I/O and no contention against active inference. 2. Return http.ErrNotSupported (wrapped via fmt.Errorf with %w) from Hijack when the underlying writer does not implement http.Hijacker, instead of a string-only errors.New. Matches the standard library convention so callers using errors.Is - notably http.NewResponseController.Hijack - detect the condition through the standard sentinel. Future-proof only: no current routes go through this branch. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7[1m]	2026-05-24 21:40:57 +00:00
Ettore Di Giacinto	799215cdc6	feat(distributed): add ExposeNodeHeader middleware + ResponseWriter wrapper Introduce a per-request Echo middleware that wraps the response writer and lazily stamps X-LocalAI-Node on the first Write / WriteHeader / Flush. This replaces the chan-based per-request rendezvous and per-handler maybeSetNodeHeader calls with a single enforcement point. The wrapper reads the picked node ID by looking up the request's model in the ModelLoader at flush time (late binding), so the value reflects the post-ml.Load state of the loader rather than any pre-route guess. Off by default; gated by ApplicationConfig.ExposeNodeHeader. Ginkgo specs cover off/on, missing model, in-process model (no node ID), absent stash, buffered + streaming flush ordering, error path, and late binding under in-handler stamp. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7[1m]	2026-05-24 21:15:11 +00:00

Author

SHA1

Message

Date

Ettore Di Giacinto

8b2697f39a

fix(distributed): drop hot-path I/O from node-header wrapper

Two related fixes in the X-LocalAI-Node middleware wrapper:

  1. Replace ml.CheckIsLoaded(modelName).NodeID() with the new
     ml.LookupNodeID helper in the lazy resolve closure. CheckIsLoaded
     acquires ml.mu and, when the recently-healthy cache window has
     expired, runs a gRPC HealthCheck with a 2-minute timeout. Running
     that on the response writer right before the first byte hits the
     client could stall buffered and streaming responses alike for up to
     2 minutes on a stale-healthy model. LookupNodeID is a pure store
     read with no I/O and no contention against active inference.

  2. Return http.ErrNotSupported (wrapped via fmt.Errorf with %w) from
     Hijack when the underlying writer does not implement
     http.Hijacker, instead of a string-only errors.New. Matches the
     standard library convention so callers using errors.Is - notably
     http.NewResponseController.Hijack - detect the condition through
     the standard sentinel. Future-proof only: no current routes go
     through this branch.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

2026-05-24 21:40:57 +00:00

Ettore Di Giacinto

799215cdc6

feat(distributed): add ExposeNodeHeader middleware + ResponseWriter wrapper

Introduce a per-request Echo middleware that wraps the response writer and
lazily stamps X-LocalAI-Node on the first Write / WriteHeader / Flush.
This replaces the chan-based per-request rendezvous and per-handler
maybeSetNodeHeader calls with a single enforcement point.

The wrapper reads the picked node ID by looking up the request's model in
the ModelLoader at flush time (late binding), so the value reflects the
post-ml.Load state of the loader rather than any pre-route guess. Off by
default; gated by ApplicationConfig.ExposeNodeHeader.

Ginkgo specs cover off/on, missing model, in-process model (no node ID),
absent stash, buffered + streaming flush ordering, error path, and late
binding under in-handler stamp.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

2026-05-24 21:15:11 +00:00

2 Commits