* feat(distributed): add per-request node ID context holder
Introduce pkg/distributedhdr, a leaf package carrying a per-request
*atomic.Value holder for the picked worker node ID from the
SmartRouter (core/services/nodes) up to the HTTP response writer
wrapper (core/http/middleware). Avoids the import cycle that a shared
key in either consumer would create.
Exposes NewHolder, WithHolder, Holder, Stamp, Load, Inherit. The
holder is atomic.Value so cross-goroutine publish from the router to
the response writer wrapper is race-clean.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(distributed): add ExposeNodeHeader middleware + response writer wrapper
New ApplicationConfig.ExposeNodeHeader bool + --expose-node-header CLI
flag / LOCALAI_EXPOSE_NODE_HEADER env var (default off; the node ID
reveals internal topology and is opt-in).
The middleware creates a per-request *atomic.Value holder, attaches it
to c.Request().Context() via distributedhdr.WithHolder, and wraps
c.Response().Writer with a custom http.ResponseWriter that sets the
X-LocalAI-Node header on first Write / WriteHeader / Flush by reading
the holder. Implements http.Flusher, http.Hijacker, Unwrap so it
composes cleanly with Echo and http.NewResponseController.
request.go propagates the holder onto derived contexts via
distributedhdr.Inherit so the holder survives the correlation-ID
context replacement.
Unit + race-clean concurrency + integration specs.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(distributed): stamp node ID in router and wire middleware to inference routes
ModelRouterAdapter.Route stamps the picked node ID into the
per-request holder via distributedhdr.Stamp(ctx, result.Node.ID) right
after replica selection.
Wire ExposeNodeHeader middleware to:
- OpenAI chat/completion/embeddings + audio transcriptions/speech + image generations/inpainting
- Anthropic /v1/messages
- Ollama /api/chat, /api/generate, /api/embed, /api/embeddings
- Jina /v1/rerank
- LocalAI /v1/vad
The middleware's wrapper reads the holder on first byte and sets the
X-LocalAI-Node response header before delegating to the underlying
writer. Per-request scope means no race under concurrent multi-replica
routing.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed): thread request context through backend Load + cover ctx propagation
Five non-OpenAI backend helpers were silently using app.Context instead
of the request context for the gRPC backend call: transcription, TTS,
image generation, rerank, VAD. Effect: distributedhdr.Stamp in the
router callback was a silent no-op for these paths, AND client
cancellation didn't propagate to in-flight inference.
Thread c.Request().Context() (or the equivalent input.Context after
the request middleware has installed the correlation-ID derived
context) through each helper and into ModelOptions via
model.WithContext(ctx). ImageGeneration's signature gains a leading
ctx parameter; in-tree callers (openai image, openai inpainting,
openai inpainting_test) are updated to match.
ModelEmbedding gains a leading ctx parameter for the same reason; the
openai and ollama embedding handlers pass the request context through.
chat_stream_workers.go defers the initial role=assistant chunk
emission until the first token callback so the wrapper's lazy
X-LocalAI-Node lookup against the loader runs AFTER ml.Load has
stamped the per-modelID node ID; semantically identical for clients
(role still arrives before any text).
Regression test core/backend/ctx_propagation_test.go pins ctx
propagation for all five helpers.
Docs updated to enumerate the full endpoint coverage of the
--expose-node-header flag.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Add a routing middleware stack and a cloud-proxy backend.
* cloud-proxy: a Go gRPC backend that forwards OpenAI- and
Anthropic-shaped chat requests to upstream providers, with an
optional translate mode (OpenAI request -> Anthropic /v1/messages
-> OpenAI response) and full tool-calling support.
* routing: admission control, content-aware model routing
(embedding cache + classifier + rerank + Arch-Router score),
PII detection/redaction (regex + NER) with streaming filter and
OpenAI/Anthropic adapters, and a per-user/per-key billing recorder
backed by GORM or in-memory storage.
* middleware: UsageMiddleware records usage via the billing recorder,
plus admission, route-model, usage-stamp and trace middlewares.
* observability: BackendTrace ring buffer stores full request bodies
(capped), MITM proxy emits structured trace events, and router
classifier decisions surface at /api/router/decide.
* gallery: Arch-Router-1.5B (Q4_K_M and Q8_0).
* UI: cloud-proxy model-editor fields, classifier system-prompt and
score-normalization config, and a Traces page rendering request
bodies.
Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(middleware): parse OpenAI-spec tool_choice in /v1/chat/completions
Follows up on #9526 (the 3-site setter fix) by addressing the remaining
clause in #9508 — string mode and OpenAI-spec specific-function shape both
silently failed in the /v1/chat/completions parsing path.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(middleware): restore LF endings and cover tool_choice parsing with specs
The previous commit on this branch saved core/http/middleware/request.go
with CRLF line endings, ballooning the diff against master to 684 / 651
for what is in reality a ~50-line parsing change. Restore LF (matches
.editorconfig end_of_line = lf).
Add 11 Ginkgo specs under "SetModelAndConfig tool_choice parsing
(chat completions)" that parallel the existing MergeOpenResponsesConfig
specs from #9509. They drive the full middleware chain (SetModelAndConfig
+ SetOpenAIRequest) and assert:
* "required" -> ShouldUseFunctions=true, no specific name
* "none" -> ShouldUseFunctions=false (tools disabled per OpenAI spec)
* "auto" -> default, tools available, no specific name
* {type:function, function:{name:X}} (spec) -> X is forced
* {type:function, name:X} (legacy) -> X is forced
* nested wins when both forms are present
* malformed shapes (no type, wrong type, no name, empty name) are no-ops
Update the inline comment on the string case to describe the actual
mechanism: "none" reaches SetFunctionCallString("none") downstream and
is then honored by ShouldUseFunctions() returning false. Before this PR
json.Unmarshal([]byte("none"), &functions.Tool{}) failed silently, so
"none" was ignored - making "none" actually work is a real behavior fix
this PR brings.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
* fix(middleware): preserve pre-#9559 support for JSON-string-encoded tool_choice
Some non-spec clients send tool_choice as a JSON-encoded string of an
object form, e.g. "{\"type\":\"function\",\"function\":{\"name\":\"X\"}}".
The pre-#9559 code accepted this by accident: its case string: branch
ran json.Unmarshal([]byte(content), &functions.Tool{}), which succeeded
for that double-encoded shape even though it failed for the legitimate
plain string modes "auto" / "none" / "required".
The first version of this PR routed every string straight to
SetFunctionCallString as a mode, which fixed the plain-string cases but
silently regressed the double-encoded one (funcs.Select("{...}") returns
nothing). Restore the fallback: when a string looks like a JSON object,
try parsing it as a tool_choice map first; fall through to mode-string
handling only when no usable name comes out.
Factor the map-name extraction into a small helper
(extractToolChoiceFunctionName) so the string-fallback and the regular
map case go through identical code, and accept both the OpenAI-spec
nested shape and the legacy/Anthropic flat shape from either entry
point.
Add 3 Ginkgo specs covering the double-encoded case (nested form, legacy
form, and the fall-through when the JSON has no usable name).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
* test(middleware): silence errcheck on AfterEach os.RemoveAll
The new tool_choice parsing tests added a second AfterEach that calls
os.RemoveAll(modelDir) without checking the error; errcheck flagged it.
Suppress with the standard _ = idiom. The pre-existing AfterEach on the
earlier Describe still elides the check the same way it did before -
leaving that untouched to keep this commit minimal.
Assisted-by: Claude:opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Two bugs in MergeOpenResponsesConfig (/v1/responses + WebSocket, *not*
/v1/chat/completions — that has a separate, working path via Tool
unmarshal + SetFunctionCallNameString):
1. **Shape mismatch.** OpenAI's specific-function tool_choice nests the
name under "function":
{"type": "function", "function": {"name": "my_function"}}
The legacy flat shape was:
{"type": "function", "name": "my_function"}
Only the flat shape was handled. OpenAI-compliant clients that reach
/v1/responses (openai-python with the Responses API, Stainless-generated
SDKs, …) silently failed to force the function.
2. **Wrong setter.** The code called SetFunctionCallString(name), which
writes the mode field (functionCallString: "none"/"auto"/"required").
The specific-function name lives in a separate field
(functionCallNameString), read by ShouldCallSpecificFunction and
FunctionToCall. Net effect: a correctly-formed tool_choice never
engaged grammar-based forcing.
The fix preserves backward compatibility by accepting both shapes
(nested preferred, flat as fallback) and routes to the correct setter.
Note: The same "wrong setter" pattern appears at three other sites —
anthropic/messages.go:883, openai/realtime_model.go:171, and
openresponses/responses.go:776 — and /v1/chat/completions has its own
issue parsing tool_choice="required" as a string (json.Unmarshal on a
raw string fails silently). Those are filed as a tracking issue rather
than bundled here to keep this PR focused.
## Test plan
9 new Ginkgo specs under "MergeOpenResponsesConfig tool_choice parsing":
- string modes: "required" / "auto" / "none"
- OpenAI-spec nested shape: {type:function, function:{name}}
- Legacy Anthropic-compat flat shape: {type:function, name}
- Shape-preference: nested wins over flat when both present
- Malformed: missing type, wrong type, missing name, empty name, nil
$ go test ./core/http/middleware/ -count=1 -run TestMiddleware
Ran 28 of 28 Specs in 0.003 seconds -- PASS
## Repro (against /v1/responses)
curl -N http://localai/v1/responses \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3.6-35b-a3b-apex",
"input":"Weather in Berlin?",
"tools":[{"type":"function","name":"get_weather",
"parameters":{"type":"object",
"properties":{"city":{"type":"string"}},
"required":["city"]}}],
"tool_choice":{"type":"function",
"function":{"name":"get_weather"}}}'
Before: grammar-based forcing silently inactive; model free-texts.
After : grammar forces get_weather invocation; output contains
tool_calls with function:{name:"get_weather", arguments:{...}}.
Upstream llama.cpp (PR #21962) switched the server-side mtmd media
marker to a random per-server string and removed the legacy
"<__media__>" backward-compat replacement in mtmd_tokenizer. The
Go layer still emitted the hardcoded "<__media__>", so on the
non-tokenizer-template path the prompt arrived with a marker mtmd
did not recognize and tokenization failed with "number of bitmaps
(1) does not match number of markers (0)".
Report the active media marker via ModelMetadataResponse.media_marker
and substitute the sentinel "<__media__>" with it right before the
gRPC call, after the backend has been loaded and probed. Also skip
the Go-side multimodal templating entirely when UseTokenizerTemplate
is true — llama.cpp's oaicompat_chat_params_parse already injects its
own marker and StringContent is unused in that path. Backends that do
not expose the field keep the legacy "<__media__>" behavior.
* feat: add toggle mechanism to enable/disable models from loading on demand
Implements #9303 - Adds ability to disable models from being auto-loaded
while keeping them in the collection.
Backend changes:
- Add Disabled field to ModelConfig struct with IsDisabled() getter
- New ToggleModelEndpoint handler (PUT /models/toggle/:name/:action)
- Request middleware returns 403 when disabled model is requested
- Capabilities endpoint exposes disabled status
Frontend changes:
- Toggle switch in System > Models table Actions column
- Visual indicators: dimmed row, red Disabled badge, muted icons
- Tooltip describes toggle function on hover
- Loading state while API call is in progress
* fix: remove extra closing brace causing syntax error in request middleware
* refactor: reorder Actions column - Stop button before toggle switch
* refactor: migrate from toggle to toggle-state per PR review feedback
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: wire min_p
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: inferencing defaults
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(refactor): re-use iterative parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: generate automatically inference defaults from unsloth
Instead of trying to re-invent the wheel and maintain here the inference
defaults, prefer to consume unsloth ones, and contribute there as
necessary.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: apply defaults also to models installed via gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: be consistent and apply fallback to all endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(importer): support ollama and OCI, unify code
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support importing from local file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* support also yaml config files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle local files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract importing errors
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add importer tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add integration tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(UX): improve and specify supported URI formats
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fail if backend does not have a runfile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add cache for galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): remove handler duplicate
File input handlers are now handled by Alpine.js @change handlers in chat.html.
Removed duplicate listeners to prevent files from being processed twice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): be consistent in attachments in the chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fail if no importer matches
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: propagate ops correctly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): show stats in chat, improve style
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Markdown, small improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Display token/sec into stats
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Minor enhancement
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "Fixups"
This reverts commit ab1b3d6da9.
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: respect context
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* workaround fasthttp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): allow to abort call
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: improving error
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Respect context also with MCP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tie to both contexts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make detection more robust
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
- Add a system backend path
- Refactor and consolidate system information in system state
- Use system state in all the components to figure out the system paths
to used whenever needed
- Refactor BackendConfig -> ModelConfig. This was otherway misleading as
now we do have a backend configuration which is not the model config.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* migrate core/system to pkg/system - it has no dependencies FROM core, and IS USED in pkg
Signed-off-by: Dave Lee <dave@gray101.com>
* move pkg/templates up to core/templates -- nothing in pkg references it, but it does reference core.
Signed-off-by: Dave Lee <dave@gray101.com>
* remove extra check, len of nil is 0
Signed-off-by: Dave Lee <dave@gray101.com>
* move pkg/startup to core/startup -- it does have important and unfixable dependencies on core
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
* Support file inputs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support multiple files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show preview of files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>