mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-29 19:19:19 -04:00
* feat(distributed): add per-request node ID context holder

Introduce pkg/distributedhdr, a leaf package carrying a per-request *atomic.Value holder for the picked worker node ID from the SmartRouter (core/services/nodes) up to the HTTP response writer wrapper (core/http/middleware). Avoids the import cycle that a shared key in either consumer would create.

Exposes NewHolder, WithHolder, Holder, Stamp, Load, Inherit. The holder is atomic.Value so cross-goroutine publish from the router to the response writer wrapper is race-clean.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): add ExposeNodeHeader middleware + response writer wrapper

New ApplicationConfig.ExposeNodeHeader bool + --expose-node-header CLI flag / LOCALAI_EXPOSE_NODE_HEADER env var (default off; the node ID reveals internal topology and is opt-in).

The middleware creates a per-request *atomic.Value holder, attaches it to c.Request().Context() via distributedhdr.WithHolder, and wraps c.Response().Writer with a custom http.ResponseWriter that sets the X-LocalAI-Node header on first Write / WriteHeader / Flush by reading the holder. Implements http.Flusher, http.Hijacker, Unwrap so it composes cleanly with Echo and http.NewResponseController.

request.go propagates the holder onto derived contexts via distributedhdr.Inherit so the holder survives the correlation-ID context replacement.

Unit + race-clean concurrency + integration specs.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): stamp node ID in router and wire middleware to inference routes

ModelRouterAdapter.Route stamps the picked node ID into the per-request holder via distributedhdr.Stamp(ctx, result.Node.ID) right after replica selection.

Wire ExposeNodeHeader middleware to:
- OpenAI chat/completion/embeddings + audio transcriptions/speech + image generations/inpainting
- Anthropic /v1/messages
- Ollama /api/chat, /api/generate, /api/embed, /api/embeddings
- Jina /v1/rerank
- LocalAI /v1/vad

The middleware's wrapper reads the holder on first byte and sets the X-LocalAI-Node response header before delegating to the underlying writer. Per-request scope means no race under concurrent multi-replica routing.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed): thread request context through backend Load + cover ctx propagation

Five non-OpenAI backend helpers were silently using app.Context instead of the request context for the gRPC backend call: transcription, TTS, image generation, rerank, VAD. Effect: distributedhdr.Stamp in the router callback was a silent no-op for these paths, AND client cancellation didn't propagate to in-flight inference.

Thread c.Request().Context() (or the equivalent input.Context after the request middleware has installed the correlation-ID derived context) through each helper and into ModelOptions via model.WithContext(ctx).

ImageGeneration's signature gains a leading ctx parameter; in-tree callers (openai image, openai inpainting, openai inpainting_test) are updated to match. ModelEmbedding gains a leading ctx parameter for the same reason; the openai and ollama embedding handlers pass the request context through.

chat_stream_workers.go defers the initial role=assistant chunk emission until the first token callback so the wrapper's lazy X-LocalAI-Node lookup against the loader runs AFTER ml.Load has stamped the per-modelID node ID; semantically identical for clients (role still arrives before any text).

Regression test core/backend/ctx_propagation_test.go pins ctx propagation for all five helpers.

Docs updated to enumerate the full endpoint coverage of the --expose-node-header flag.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
109 lines
3.7 KiB
Go
// Package distributedhdr carries a per-request "which worker node served
// me" record from the distributed router (core/services/nodes) up to the
// HTTP response writer wrapper (core/http/middleware). It is the bridge
// for the X-LocalAI-Node response header without coupling those two
// packages directly or going through any shared mutable state.
//
// Why its own package: both core/http/middleware and core/services/nodes
// already import pkg/model, and the natural homes for this key would
// create an import cycle if either side hosted the helper. Putting the
// key and the tiny helpers here keeps it neutral - producer and consumer
// import a leaf package, not each other.
package distributedhdr

import (
	"context"
	"sync/atomic"
)

// ctxKey is an unexported context-key type so external packages cannot
// collide with our key by accident.
type ctxKey struct{}

// holderKey is the singleton context key used by WithHolder / Holder /
// Stamp. Unexported on purpose: producers and consumers must go through
// the helpers below so the storage representation stays an
// implementation detail.
var holderKey = ctxKey{}

// NewHolder returns an empty holder ready to be attached to a request
// context via WithHolder. The middleware creates one per HTTP request;
// the router fills it; the response writer wrapper reads it on the
// first byte. The atomic.Value gives us race-clean publish across the
// goroutines that may write (router) and read (response writer
// wrapper) the slot.
func NewHolder() *atomic.Value {
	return &atomic.Value{}
}

// WithHolder returns a derived context that carries the provided holder.
// The middleware calls this on the per-request context BEFORE the
// downstream handler chain runs, so any goroutine that inherits this
// context (notably the SmartRouter / ModelRouterAdapter) can find the
// holder via Stamp.
func WithHolder(ctx context.Context, h *atomic.Value) context.Context {
	if h == nil {
		return ctx
	}
	return context.WithValue(ctx, holderKey, h)
}

// Holder returns the holder attached to ctx, or nil if none was set.
// Callers that just want to publish should prefer Stamp; Holder is
// useful for tests and for propagating the holder across derived
// contexts (see Inherit).
func Holder(ctx context.Context) *atomic.Value {
	if ctx == nil {
		return nil
	}
	h, _ := ctx.Value(holderKey).(*atomic.Value)
	return h
}

// Stamp records the picked worker node ID into the holder attached to
// ctx. No-op when:
//   - ctx is nil
//   - no holder is attached (e.g. the X-LocalAI-Node feature is off, so
//     the middleware never ran)
//   - nodeID is empty
//
// Stamp is safe to call from any goroutine. Subsequent reads via Load
// observe the most recent stamp.
func Stamp(ctx context.Context, nodeID string) {
	if nodeID == "" {
		return
	}
	h := Holder(ctx)
	if h == nil {
		return
	}
	h.Store(nodeID)
}

// Load returns the node ID most recently stamped into the holder, or
// "" if nothing has been stamped. Intended for the response writer
// wrapper's first-byte hook.
func Load(h *atomic.Value) string {
	if h == nil {
		return ""
	}
	v, _ := h.Load().(string)
	return v
}

// Inherit copies the holder reference from src into dst when present.
// Used at request-handling seams where the request context is replaced
// with a fresh context derived from the long-lived application context
// (so the cancel semantics of the original context are preserved while
// also allowing the load path to outlive the HTTP transport). The
// holder is a pointer, so both contexts point at the same slot; a
// router stamp via Stamp(dst, ...) is observed by the middleware that
// reads through src's holder.
func Inherit(dst, src context.Context) context.Context {
	h := Holder(src)
	if h == nil {
		return dst
	}
	return WithHolder(dst, h)
}