Files
LocalAI/pkg/distributedhdr/distributedhdr.go
LocalAI [bot] 06e777b75e feat(distributed): gated X-LocalAI-Node response header (middleware + wrapper) (#9976)
* feat(distributed): add per-request node ID context holder

Introduce pkg/distributedhdr, a leaf package carrying a per-request
*atomic.Value holder for the picked worker node ID from the
SmartRouter (core/services/nodes) up to the HTTP response writer
wrapper (core/http/middleware). Avoids the import cycle that a shared
key in either consumer would create.

Exposes NewHolder, WithHolder, Holder, Stamp, Load, Inherit. The
holder is atomic.Value so cross-goroutine publish from the router to
the response writer wrapper is race-clean.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): add ExposeNodeHeader middleware + response writer wrapper

New ApplicationConfig.ExposeNodeHeader bool + --expose-node-header CLI
flag / LOCALAI_EXPOSE_NODE_HEADER env var (default off; the node ID
reveals internal topology and is opt-in).

The middleware creates a per-request *atomic.Value holder, attaches it
to c.Request().Context() via distributedhdr.WithHolder, and wraps
c.Response().Writer with a custom http.ResponseWriter that sets the
X-LocalAI-Node header on first Write / WriteHeader / Flush by reading
the holder. Implements http.Flusher, http.Hijacker, Unwrap so it
composes cleanly with Echo and http.NewResponseController.

request.go propagates the holder onto derived contexts via
distributedhdr.Inherit so the holder survives the correlation-ID
context replacement.

Unit + race-clean concurrency + integration specs.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): stamp node ID in router and wire middleware to inference routes

ModelRouterAdapter.Route stamps the picked node ID into the
per-request holder via distributedhdr.Stamp(ctx, result.Node.ID) right
after replica selection.

Wire ExposeNodeHeader middleware to:
- OpenAI chat/completion/embeddings + audio transcriptions/speech + image generations/inpainting
- Anthropic /v1/messages
- Ollama /api/chat, /api/generate, /api/embed, /api/embeddings
- Jina /v1/rerank
- LocalAI /v1/vad

The middleware's wrapper reads the holder on first byte and sets the
X-LocalAI-Node response header before delegating to the underlying
writer. Per-request scope means no race under concurrent multi-replica
routing.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed): thread request context through backend Load + cover ctx propagation

Five non-OpenAI backend helpers were silently using app.Context instead
of the request context for the gRPC backend call: transcription, TTS,
image generation, rerank, VAD. Effect: distributedhdr.Stamp in the
router callback was a silent no-op for these paths, AND client
cancellation didn't propagate to in-flight inference.

Thread c.Request().Context() (or the equivalent input.Context after
the request middleware has installed the correlation-ID derived
context) through each helper and into ModelOptions via
model.WithContext(ctx). ImageGeneration's signature gains a leading
ctx parameter; in-tree callers (openai image, openai inpainting,
openai inpainting_test) are updated to match.

ModelEmbedding gains a leading ctx parameter for the same reason; the
openai and ollama embedding handlers pass the request context through.

chat_stream_workers.go defers the initial role=assistant chunk
emission until the first token callback so the wrapper's lazy
X-LocalAI-Node lookup against the loader runs AFTER ml.Load has
stamped the per-modelID node ID; semantically identical for clients
(role still arrives before any text).

Regression test core/backend/ctx_propagation_test.go pins ctx
propagation for all five helpers.

Docs updated to enumerate the full endpoint coverage of the
--expose-node-header flag.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-25 10:51:48 +02:00

109 lines
3.7 KiB
Go

// Package distributedhdr carries a per-request "which worker node served
// me" record from the distributed router (core/services/nodes) up to the
// HTTP response writer wrapper (core/http/middleware). It is the bridge
// for the X-LocalAI-Node response header without coupling those two
// packages directly or going through any shared mutable state.
//
// Why its own package: both core/http/middleware and core/services/nodes
// already import pkg/model, and the natural homes for this key would
// create an import cycle if either side hosted the helper. Putting the
// key and the tiny helpers here keeps it neutral - producer and consumer
// import a leaf package, not each other.
package distributedhdr
import (
"context"
"sync/atomic"
)
// ctxKey is an unexported context-key type so external packages cannot
// collide with our key by accident.
type ctxKey struct{}
// holderKey is the singleton context key used by WithHolder / Holder /
// Stamp. Unexported on purpose: producers and consumers must go through
// the helpers below so the storage representation stays an
// implementation detail.
var holderKey = ctxKey{}
// NewHolder returns an empty holder ready to be attached to a request
// context via WithHolder. The middleware creates one per HTTP request;
// the router fills it; the response writer wrapper reads it on the
// first byte. The atomic.Value gives us race-clean publish across the
// goroutines that may write (router) and read (response writer
// wrapper) the slot.
func NewHolder() *atomic.Value {
return &atomic.Value{}
}
// WithHolder returns a derived context that carries the provided holder.
// The middleware calls this on the per-request context BEFORE the
// downstream handler chain runs, so any goroutine that inherits this
// context (notably the SmartRouter / ModelRouterAdapter) can find the
// holder via Stamp.
func WithHolder(ctx context.Context, h *atomic.Value) context.Context {
if h == nil {
return ctx
}
return context.WithValue(ctx, holderKey, h)
}
// Holder returns the holder attached to ctx, or nil if none was set.
// Callers that just want to publish should prefer Stamp; Holder is
// useful for tests and for propagating the holder across derived
// contexts (see Inherit).
func Holder(ctx context.Context) *atomic.Value {
if ctx == nil {
return nil
}
h, _ := ctx.Value(holderKey).(*atomic.Value)
return h
}
// Stamp records the picked worker node ID into the holder attached to
// ctx. No-op when:
// - ctx is nil
// - no holder is attached (e.g. the X-LocalAI-Node feature is off, so
// the middleware never ran)
// - nodeID is empty
//
// Stamp is safe to call from any goroutine. Subsequent reads via Load
// observe the most recent stamp.
func Stamp(ctx context.Context, nodeID string) {
if nodeID == "" {
return
}
h := Holder(ctx)
if h == nil {
return
}
h.Store(nodeID)
}
// Load returns the node ID most recently stamped into the holder, or
// "" if nothing has been stamped. Intended for the response writer
// wrapper's first-byte hook.
func Load(h *atomic.Value) string {
if h == nil {
return ""
}
v, _ := h.Load().(string)
return v
}
// Inherit copies the holder reference from src into dst when present.
// Used at request-handling seams where the request context is replaced
// with a fresh context derived from the long-lived application context
// (so the cancel semantics of the original context are preserved while
// also allowing the load path to outlive the HTTP transport). The
// holder is a pointer, so both contexts point at the same slot; a
// router stamp via Stamp(dst, ...) is observed by the middleware that
// reads through src's holder.
func Inherit(dst, src context.Context) context.Context {
h := Holder(src)
if h == nil {
return dst
}
return WithHolder(dst, h)
}