LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-14 02:04:13 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	61bf34ea2f	fix(traces): cap captured body size to keep admin Traces UI responsive (#9946 ) The trace middleware buffered the full request and response bodies for every JSON exchange. With a chatty agent-pool RAG workload, /embeddings responses (large vector arrays) accumulated to tens of MB in the in-memory buffer; the admin Traces page would then download and parse 40+ MB on every load and on every 5s auto-refresh, locking the UI in a loading state. Add LOCALAI_TRACING_MAX_BODY_BYTES (default 64 KiB) that caps each captured body. The full payload still flows through to the real client; only the trace copy is bounded. Exchanges record body_truncated and original body_bytes so the dashboard can show that truncation happened. The cap is configurable via env, CLI, and runtime_settings.json. Also unblock recovery: the Traces page now keeps the Clear button enabled while loading, since "buffer too large to render" is exactly when the user needs to clear it. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-22 15:29:24 +02:00
LocalAI [bot]	0b2ae3c6ca	fix(openai): stream usage non-zero when tools are enabled (#9941 ) * chore: ignore local .worktrees directory Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(openai): stream usage non-zero when tools are enabled The streaming chat-completions worker for tool-bearing requests (processTools in core/http/endpoints/openai/chat.go) never forwarded the cumulative TokenUsage from ComputeChoices to the chunks it placed on the responses channel. The outer streaming loop's running usage tracker therefore stayed at the zero value, and the include_usage trailer reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever the request carried a `tools` array. Without tools, the alternative `process` path stamps Usage on every chunk, so that path was unaffected. Forward the final TokenUsage via a usage-only sentinel chunk (empty Choices, populated Usage) emitted right before close(responses). The outer loop's per-chunk Usage capture moves above the empty-Choices skip so the sentinel updates the tracker without ever reaching the wire, keeping the existing OpenAI spec contract (intermediate chunks carry no `usage` field, and the deferred-final-chunk helpers remain Usage-free per the regression test for issue #8546). Adds streamUsageFromTokenUsage, usageSentinelChunk, and applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level test that mirrors the outer-loop sequence. Fixes #9927 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4-7 [Claude Code] * refactor(openai): return final TokenUsage from stream workers Replace the usage-only sentinel SSE chunk introduced in the previous commit with a plain return value. The streaming workers process and processTools (now extracted as package-level processStream and processStreamWithTools) return (backend.TokenUsage, error); the outer ChatEndpoint loop reads the cumulative counts off the existing `ended` channel (now carrying streamWorkerResult{usage, err}) and builds the include_usage trailer from a normal Go value after the LOOP exits. This drops the empty-Choices "skip but capture Usage" rule from the outer loop and removes the usageSentinelChunk / applyChunkToUsage helpers entirely. The SSE responses channel is back to a single purpose: wire chunks only. processStream and processStreamWithTools move into chat_stream_workers.go so they can be exercised directly from tests. The chat_stream_usage_test.go suite now drives the workers with a mocked backend.ModelInferenceFunc and asserts on the returned TokenUsage. The regression coverage for issue #9927 is therefore behavioral: reverting the fix (discarding ComputeChoices' usage return) makes the assertions fail with concrete count mismatches. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-22 10:13:41 +02:00
LocalAI [bot]	f0cb02afb8	feat(usage): attribute Sources rows to user accounts in admin view (#9935 ) The merged feature (#9920) let admins see per-API-key and per-source totals but did not surface which user owned each key, and lumped every user's Web UI traffic into a single global Web UI row. This makes the admin Sources tab properly per-user attributable: - KeyTotal gains UserID + UserName, populated from the snapshot the usage middleware already records. The by_key roll-up now groups by (api_key_id, api_key_name, user_id, user_name). - New SourceTotals.ByUserSource roll-up groups (source, user_id, user_name) for sources without a key identity (web, legacy). Only populated on the admin path (includeLegacy=true); the non-admin endpoint stays unchanged for backwards compatibility. - SourcesTable accepts showUserColumn={isAdmin}; admin view renders a User column, makes the search match user name/id, and expands Web UI / legacy pseudo-rows from the global aggregate to one row per user using by_user_source. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-21 23:23:06 +02:00
LocalAI [bot]	a39e025d64	fix(nodes): make per-node backend install async via gallery job queue (#9928 ) * feat(galleryop): add TargetNodeID to ManagementOp for single-node installs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(galleryop): add NodeScopedKey helpers for per-node opcache rows Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(galleryop): use strings.Cut for NodeScopedKey parsing, reject empty nodeID Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(nodes): scope DistributedBackendManager.InstallBackend to single node via TargetNodeID Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(http): make /api/nodes/:id/backends/install async via gallery service job queue The handler previously called unloader.InstallBackend synchronously and blocked the browser for up to 3 minutes waiting on the NATS reply. It now enqueues a TargetNodeID-scoped ManagementOp on BackendGalleryChannel and returns HTTP 202 + jobID immediately, matching /api/backends/install/:id. The opcache key is built via NodeScopedKey(nodeID, backend) so concurrent installs of the same backend across different nodes do not stomp each other. galleryService/opcache/appConfig are threaded through RegisterNodeAdminRoutes for this. Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(http): log malformed backend_galleries override and stop test drain goroutine Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(api): expose nodeID for node-scoped backend ops in /api/operations Node-scoped backend installs land in opcache under "node:<nodeID>:<backend>" keys. Without splitting that prefix back out, the operations panel renders the full key as the display name and has no structured way to label which worker an install is targeting. Detect the prefix, surface nodeID as its own response field, and reduce the display name back to the bare backend slug. Bare (non-scoped) ops are left untouched so legacy installs do not gain a misleading empty nodeID. Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(react-ui): poll job status for node-targeted backend installs Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(react-ui): make NodeInstallPicker state updates pure and surface cancellations as errors Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(react-ui): clarify async semantics in handleInstallOnTarget Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(http): use statusUrl casing for node install response to match codebase precedent Assisted-by: Claude:opus-4-7 [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-21 22:25:53 +02:00
LocalAI [bot]	f15b9178ec	feat(usage): track and visualise usage per API key (#9920 ) * feat(usage): add Source, APIKeyID, APIKeyName columns to UsageRecord Adds three additive columns plus UsageSource* constants. The columns are auto-migrated by InitDB. APIKeyID is a nullable foreign reference to UserAPIKey.ID; APIKeyName is snapshotted on each row so revoked keys keep showing their name in history. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): backfill Source on pre-feature usage rows InitDB now classifies any pre-existing usage_record with an empty source: 'legacy-api-key' user -> legacy, everything else -> web. The backfill is idempotent (only touches NULL/empty rows). Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): add GetUserUsageBySource aggregator Groups by (bucket, source, api_key_id, api_key_name). Filters out legacy by default. Returns both per-bucket detail and roll-ups (by_source, by_key sorted desc and capped at 200, grand_total). The MAX(created_at) projection is iterated via Rows().Scan into a string column and parsed manually because the SQLite driver surfaces the aggregated timestamp as a string, which database/sql refuses to scan directly into time.Time. Postgres returns a real timestamp; the same string path handles its RFC3339 form too. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(usage): log Rows() errors and assert LastUsed in tests Adds rows.Err() and Rows() open-failure logging in computeSourceTotals so silent data drops surface in logs. Logs on parseLastUsedString format misses for the same reason. Strengthens the snapshot-survival test to assert LastUsed is a recent timestamp, locking the SQLite time-string parser behaviour. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): add admin GetAllUsageBySource with filters and truncation Optional user_id and api_key_id filters (composed with AND). Legacy bucket is included for admin callers. truncated=true when more than 200 distinct keys would be in the by_key roll-up. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(auth): plumb auth_source and auth_apikey through Echo context tryAuthenticate now sets auth_source on every successful branch (web for session/Bearer-session, apikey for Bearer-key/x-api-key/ token-cookie, legacy for legacy env key match). For named-key branches it also stores the resolved UserAPIKey under auth_apikey so downstream middlewares can snapshot id+name without re-validating. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(auth): expand tryAuthenticate godoc and cover Bearer-session branch Documents all three context-keys side effects (auth_source, auth_apikey, _auth_session) plus the split of responsibilities with the parent Middleware. Adds a test for the Bearer-as-session-token classification so future regressions there fail loudly. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): UsageMiddleware records source + snapshots key name Reads auth_source and auth_apikey from the Echo context (set by auth.Middleware in the previous task). Snapshots UserAPIKey.ID and Name onto each row so revoked keys remain readable in history. Falls back to source=web when no auth_source is set (auth disabled or unrecognised path). Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): add /api/auth/usage/sources and admin variant Self endpoint filters legacy server-side; admin endpoint includes legacy and accepts user_id + api_key_id filters. Response includes buckets, totals.{by_source, by_key, grand_total}, and a truncated flag set when the per-key roll-up was capped at 200. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(routes): mark test mirror handlers as keep-in-sync with production The newTestAuthApp helper duplicates production route handlers inline because it cannot use RegisterAuthRoutes (which requires a application.Application). Naming the source path on each mirror makes the drift contract explicit for future maintainers. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(ui): add usageApi.getMySources/getAdminSources + i18n strings Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): add Sources tab skeleton with data fetch Adds Usage page tab that fetches /api/auth/usage/sources (or the admin variant). Renders raw totals plus a placeholder key list; real visualisations land in subsequent commits. Restructures the existing tab button block so Models and Sources are visible to non-admins (Users remains admin-only). Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): source mix ribbon + searchable/sortable sources table Replaces the SourcesTab placeholder rendering with two reusable components: SourceMixRibbon (one segmented bar per source class) and SourcesTable (search + sort + revoked-key dim). Pulls the current API key list to detect revoked keys. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ui): skip revoked-key detection until the key list is known existingKeyIds defaulted to an empty Set, which made every live api_key row render as (revoked) during the brief window before apiKeysApi.list() resolved, and permanently after a fetch failure. Use null as the unknown state and suppress the revoked badge until the parent provides a real Set. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): top-N stacked time chart and drill-in chip for Sources tab Top 7 sources by total tokens get distinct colours; the rest roll up into 'Other'. Clicking a row in the SourcesTable dims everything except that series in the chart; the chip is the canonical clear. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(usage): document per-API-key Sources tab and endpoints Extends features/authentication.md Usage Tracking section with: - A 'Sources' tab description and source-class taxonomy - Endpoint documentation for /api/auth/usage/sources and the admin variant - Response shape example with by_source / by_key / grand_total - Migration note about pre-feature row backfill Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(usage): silence errcheck on deferred rows.Close CI errcheck flagged the bare 'defer rows.Close()' in computeSourceTotals. Wrap in a closure that discards the close error explicitly; an error here is non-actionable since we have already drained the rows and logged any iteration failure. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(usage): bound batcher intake and add Shutdown/FlushNow hooks The pre-existing usage batcher had no cap on its add() path; the usageMaxPending=5000 constant only guarded the re-queue path after a failed write, leaving memory growth unbounded if the DB fell behind. This commit: - Adds the cap to add() so saturation drops new records (rate-limited warn at 1/1024) instead of growing unbounded. - Raises usageMaxPending to 50000 to absorb realistic inference bursts. - Replaces the package-level batcher global with a mutex-guarded pair plus a currentBatcher() accessor so Init / Shutdown cycles are race-free. - Adds ShutdownUsageRecorder() for graceful drain on process exit (not yet wired into app shutdown, just published). - Adds FlushNow() for deterministic tests; the middleware suite no longer needs 6s sleeps per spec and now runs in ~50ms instead of 18s. - Re-queue on failed flush is now cap-aware: prepends as much of the failed batch as fits alongside concurrent arrivals, instead of dropping the whole batch when full. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(usage): drain usage batcher on graceful shutdown Registers ShutdownUsageRecorder with the existing signals.RegisterGracefulTerminationHandler so SIGINT/SIGTERM synchronously flushes any in-memory usage records before the process exits. Without this, up to one flush interval (5s) of recorded usage was lost when LocalAI restarted. Refs: #9862 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-21 16:34:02 +02:00
LocalAI [bot]	11d5bd0cc3	fix(react-ui/chat): stop wiping selection on every /api/operations poll (#9904 ) (#9917 ) useOperations() was calling setOperations() with a fresh array on every 1s poll, even when the payload was identical. In React 19 the DOM diff no longer short-circuits dangerouslySetInnerHTML on equal __html, so the forced Chat re-render re-assigned innerHTML on every assistant message once per second — wiping any text the user had selected. Skip the state update when the serialised operations payload is unchanged, and switch loading/error to functional setters so they also short-circuit at the source. Also fixes the chat copy button on plain HTTP: navigator.clipboard is undefined in non-secure contexts (a common LXC+Docker deployment), but the previous code called it unconditionally and showed a success toast regardless. Routed Chat, AgentChat and CanvasPanel through a new copyToClipboard() helper that uses navigator.clipboard when available and falls back to a hidden-textarea + execCommand('copy') trick that browsers still honour outside secure contexts. The fallback preserves the user's existing selection. Regression coverage in e2e/chat-polling-selection.spec.js: a MutationObserver counts mutations on the assistant content node across 3s of polling (must be 0); the copy test stubs out navigator.clipboard and asserts that execCommand('copy') is invoked. Assisted-by: claude-opus-4-7-1m Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-21 12:17:51 +02:00
Daniel Liljeberg	fc3980dadd	fix: inject text-file content into chat completions messages (#9896 ) Non-image/non-audio file attachments (txt, md, csv, json) were being stored in the 'files' metadata field but never added to the message content array sent to /v1/chat/completions. Images and audio correctly received content blocks; files did not. Fix: push a text content block into messageContent when textContent is present, matching the pattern used for image_url and audio_url. Also fixes Home.jsx addFiles which never called file.text() at all, meaning files attached on the home screen had empty textContent even before reaching useChat.js. Note: PDF files use file.text() which returns raw bytes rather than parsed text. Proper PDF support would require PDF.js or server-side extraction and is not part of this fix. Signed-off-by: Daniel Liljeberg <damien_@hotmail.com>	2026-05-20 01:00:32 +02:00
LocalAI [bot]	a39591f144	realtime: honor output_modalities to skip TTS in text-only mode (#9838 ) * realtime: honor output_modalities to skip TTS in text-only mode The emulated realtime pipeline previously ignored the OpenAI Realtime spec field output_modalities and always synthesized TTS. Add resolveOutputModalities + modalitiesContainAudio helpers and gate the TTS / ResponseOutputAudio* emission so a client requesting ["text"] gets only ResponseOutputText* events. This lets thin clients (e.g. thing5-poc) cache TTS on the client side while still using the realtime WS for VAD + STT + LLM + tool-call parsing. Assisted-by: Claude:claude-opus-4-7 * realtime: plumb response-level output_modalities and echo on session Follow-up to the previous commit: - Resolve response.create's output_modalities at the gate so a per-response override of an audio session is honored (the test asserted this contract but the production call site was passing nil). - Mirror OutputModalities in the RealtimeSession echo so session.update round-trips the client-supplied value, matching MaxOutputTokens's pattern. Assisted-by: Claude:claude-opus-4-7 * realtime: silence errcheck on deferred os.Remove of TTS file CI's errcheck flagged the pre-existing `defer os.Remove(audioFilePath)` inside the audio-emission block (now wrapped by the modality gate). Wrap the call in a closure that explicitly discards the error — the canonical Go pattern for "I want to defer a cleanup whose error I genuinely don't care about." Assisted-by: Claude:claude-opus-4-7 golangci-lint --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-15 12:39:47 +02:00
massy_o	745473cbe6	Validate video image URLs before download (#9819 ) Signed-off-by: massy-o <telitos000@gmail.com>	2026-05-14 15:07:17 +02:00
LocalAI [bot]	8af963bdd9	fix(streaming): comply with OpenAI usage / stream_options spec (#9815 ) * fix(streaming): comply with OpenAI usage / stream_options spec (#8546) LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed chunk because `OpenAIResponse.Usage` was a value type without `omitempty`. The official OpenAI Node SDK and its consumers (continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue) filter on a truthy `result.usage` to detect the trailing usage chunk; LocalAI's zero-but-non-null usage on every intermediate chunk made that filter swallow every content chunk and surface an empty chat response while the server log looked successful. Changes: - `core/schema/openai.go`: `Usage OpenAIUsage \`json:"usage,omitempty"\`` so intermediate chunks no longer carry a `usage` key. Add `OpenAIRequest.StreamOptions` with `include_usage` to mirror OpenAI's request field. - `core/http/endpoints/openai/chat.go` and `completion.go`: keep using the `Usage` struct field as an in-process channel for the running cumulative, but strip it before JSON marshalling. When the request set `stream_options.include_usage: true`, emit a dedicated trailing chunk with `"choices": []` and the populated usage (matching the OpenAI spec and llama.cpp's server behavior). - `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the `usage` parameter from `buildNoActionFinalChunks` since chunks no longer carry usage. - Update `image.go`, `inpainting.go`, `edit.go` to wrap their Usage values with `&` for the new pointer field. - UI: send `stream_options:{include_usage:true}` from the React (`useChat.js`) and legacy (`static/chat.js`) chat clients so the token-count badge keeps populating now that the server is spec-compliant. Tests: - New `chat_stream_usage_test.go` pins the spec invariants: intermediate chunks have no `usage` key, the trailer JSON has `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses `stream_options.include_usage`. - Update `chat_emit_test.go` to reflect that finals no longer embed usage. Verified against the live LocalAI instance: before the fix Continue's filter logic swallowed 16/16 token chunks; with the new shape it yields 4/5 and routes usage through the dedicated trailer chunk. Fixes #8546 Assisted-by: Claude:opus-4.7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(streaming): silence errcheck on usage trailer Fprintf The new spec-compliant `stream_options.include_usage` trailer writes were flagged by errcheck since they're new code (golangci-lint runs new-from-merge-base on master); the surrounding `fmt.Fprintf` data: writes are grandfathered. Drop the return values explicitly to match the linter's contract without adding a nolint shim. Assisted-by: Claude:opus-4.7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-14 08:53:46 +02:00
Tai An	67c34bbb96	fix(middleware): parse OpenAI-spec tool_choice in /v1/chat/completions (#9559 ) * fix(middleware): parse OpenAI-spec tool_choice in /v1/chat/completions Follows up on #9526 (the 3-site setter fix) by addressing the remaining clause in #9508 — string mode and OpenAI-spec specific-function shape both silently failed in the /v1/chat/completions parsing path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(middleware): restore LF endings and cover tool_choice parsing with specs The previous commit on this branch saved core/http/middleware/request.go with CRLF line endings, ballooning the diff against master to 684 / 651 for what is in reality a ~50-line parsing change. Restore LF (matches .editorconfig end_of_line = lf). Add 11 Ginkgo specs under "SetModelAndConfig tool_choice parsing (chat completions)" that parallel the existing MergeOpenResponsesConfig specs from #9509. They drive the full middleware chain (SetModelAndConfig + SetOpenAIRequest) and assert: * "required" -> ShouldUseFunctions=true, no specific name * "none" -> ShouldUseFunctions=false (tools disabled per OpenAI spec) * "auto" -> default, tools available, no specific name * {type:function, function:{name:X}} (spec) -> X is forced * {type:function, name:X} (legacy) -> X is forced * nested wins when both forms are present * malformed shapes (no type, wrong type, no name, empty name) are no-ops Update the inline comment on the string case to describe the actual mechanism: "none" reaches SetFunctionCallString("none") downstream and is then honored by ShouldUseFunctions() returning false. Before this PR json.Unmarshal([]byte("none"), &functions.Tool{}) failed silently, so "none" was ignored - making "none" actually work is a real behavior fix this PR brings. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4-7 [Claude Code] * fix(middleware): preserve pre-#9559 support for JSON-string-encoded tool_choice Some non-spec clients send tool_choice as a JSON-encoded string of an object form, e.g. "{\"type\":\"function\",\"function\":{\"name\":\"X\"}}". The pre-#9559 code accepted this by accident: its case string: branch ran json.Unmarshal([]byte(content), &functions.Tool{}), which succeeded for that double-encoded shape even though it failed for the legitimate plain string modes "auto" / "none" / "required". The first version of this PR routed every string straight to SetFunctionCallString as a mode, which fixed the plain-string cases but silently regressed the double-encoded one (funcs.Select("{...}") returns nothing). Restore the fallback: when a string looks like a JSON object, try parsing it as a tool_choice map first; fall through to mode-string handling only when no usable name comes out. Factor the map-name extraction into a small helper (extractToolChoiceFunctionName) so the string-fallback and the regular map case go through identical code, and accept both the OpenAI-spec nested shape and the legacy/Anthropic flat shape from either entry point. Add 3 Ginkgo specs covering the double-encoded case (nested form, legacy form, and the fall-through when the JSON has no usable name). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4-7 [Claude Code] * test(middleware): silence errcheck on AfterEach os.RemoveAll The new tool_choice parsing tests added a second AfterEach that calls os.RemoveAll(modelDir) without checking the error; errcheck flagged it. Suppress with the standard _ = idiom. The pre-existing AfterEach on the earlier Describe still elides the check the same way it did before - leaving that untouched to keep this commit minimal. Assisted-by: Claude:opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-14 00:14:38 +02:00
Adira	c2fe0a6475	fix(http): honor X-Forwarded-Prefix when proxy strips the prefix (#9614 ) * fix(http): honor X-Forwarded-Prefix when proxy strips the prefix Closes #9145. Two related issues kept the React UI from loading when a reverse proxy rewrites a sub-path with prefix-stripping (e.g. Caddy `handle_path`): 1. `BaseURL` only computed a prefix from the path StripPathPrefix had removed, so when the proxy strips the prefix before forwarding, the request arrives without it and the base URL was returned without a prefix. Extract a `BasePathPrefix` helper and add an `X-Forwarded-Prefix` header fallback so the prefix is recovered. 2. `<base href>` only changes how relative URLs resolve; the build emits path-absolute references like `/assets/...` and `/favicon.svg`, which still resolve against the origin and bypass the proxy prefix. Rewrite those references in the served `index.html` so the browser requests them through the proxy. Adds unit coverage for `BaseURL` with a pre-stripped path and an end-to-end test for the proxy-stripped scenario. Assisted-by: Claude:claude-opus-4-7 * fix(http): gate X-Forwarded-Prefix through SafeForwardedPrefix in BasePathPrefix BasePathPrefix consumed X-Forwarded-Prefix directly, so a value the codebase elsewhere rejects (e.g. "//evil.com") slipped through and was interpolated into the SPA index.html — both into the path-absolute asset URL rewrite in serveIndex (turning "/assets/..." into "//evil.com/assets/...", a protocol-relative URL that loads JS from a foreign origin) and into <base href>. Route the header through the existing SafeForwardedPrefix validator that StripPathPrefix and prefixRedirect already use, and HTML-escape the prefix before injecting it into the asset rewrite as defense in depth against attribute breakout. Tests cover //evil.com, backslashes, control chars, CR/LF and a missing leading slash; the integration test asserts an unsafe prefix can't poison asset URLs. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7-1m [Read] [Edit] [Bash] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-13 21:59:33 +02:00
Richard Palethorpe	0245b33eab	feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801 ) * feat(liquid-audio): add LFM2.5-Audio any-to-any backend + realtime_audio usecase Wires LiquidAI's LFM2.5-Audio-1.5B as a self-contained Realtime API model: single engine handles VAD, transcription, LLM, and TTS in one bidirectional stream — drop-in alternative to a VAD+STT+LLM+TTS pipeline. Backend - backend/python/liquid-audio/ — new Python gRPC backend wrapping the `liquid-audio` package. Modes: chat / asr / tts / s2s, voice presets, Load/Predict/PredictStream/AudioTranscription/TTS/VAD/AudioToAudioStream/ Free and StartFineTune/FineTuneProgress/StopFineTune. Runtime monkey-patch on `liquid_audio.utils.snapshot_download` so absolute local paths from LocalAI's gallery resolve without a HF round-trip. soundfile in place of torchaudio.load/save (torchcodec drags NVIDIA NPP we don't bundle). - backend/backend.proto + pkg/grpc/{backend,client,server,base,embed, interface}.go — new AudioToAudioStream RPC mirroring AudioTransformStream (config/frame/control oneof in; typed event+pcm+meta out). - core/services/nodes/{health_mock,inflight}_test.go — add stubs for the new RPC to the test fakes. Config + capabilities - core/config/backend_capabilities.go — UsecaseRealtimeAudio, MethodAudio ToAudioStream, UsecaseInfoMap entry, liquid-audio BackendCapability row. - core/config/model_config.go — FLAG_REALTIME_AUDIO bitmask, ModalityGroups membership in both speech-input and audio-output groups so a lone flag still reads as multimodal, GetAllModelConfigUsecases entry, GuessUsecases branch. Realtime endpoint - core/http/endpoints/openai/realtime.go — extract prepareRealtimeConfig() so the gate is unit-testable; accept realtime_audio models and self-fill empty pipeline slots with the model's own name (user-pinned slots win). - core/http/endpoints/openai/realtime_gate_test.go — six specs covering nil cfg, empty pipeline, legacy pipeline, self-contained realtime_audio, user-pinned VAD slot, and partial legacy pipeline. UI + endpoints - core/http/routes/ui.go — /api/pipeline-models accepts either a legacy VAD+STT+LLM+TTS pipeline or a realtime_audio model; surfaces a self_contained flag so the Talk page can collapse the four cards. - core/http/routes/ui_api.go — realtime_audio in usecaseFilters. - core/http/routes/ui_pipeline_models_test.go — covers both code paths. - core/http/react-ui/src/pages/Talk.jsx — self-contained badge instead of the four-slot grid; rename Edit Pipeline → Edit Model Config; less pipeline-specific wording. - core/http/react-ui/src/pages/Models.jsx + locales/en/models.json — new realtime_audio filter button + i18n. - core/http/react-ui/src/utils/capabilities.js — CAP_REALTIME_AUDIO. - core/http/react-ui/src/pages/FineTune.jsx — voice + validation-dataset fields, surfaced when backend === liquid-audio, plumbed via extra_options on submit/export/import. Gallery + importer - gallery/liquid-audio.yaml — config template with known_usecases: [realtime_audio, chat, tts, transcript, vad]. - gallery/index.yaml — four model entries (realtime/chat/asr/tts) keyed by mode option. Fixed pre-existing `transcribe` typo on the asr entry (loader silently dropped the unknown string → entry never surfaced as a transcript model). - gallery/lfm.yaml — function block for the LFM2 Pythonic tool-call format `<\|tool_call_start\|>[name(k="v")]<\|tool_call_end\|>` matching common_chat_params_init_lfm2 in vendored llama.cpp. - core/gallery/importers/{liquid-audio,liquid-audio_test}.go — detector matches LFM2-Audio HF repos (excludes -gguf mirrors); mode/voice preferences plumbed through to options. - core/gallery/importers/importers.go — register LiquidAudioImporter before LlamaCPPImporter. - pkg/functions/parse_lfm2_test.go — seven specs for the response/argument regex pair on the LFM2 pythonic format. Build matrix - .github/backend-matrix.yml — seven liquid-audio targets (cuda12, cuda13, l4t-cuda-13, hipblas, intel, cpu amd64, cpu arm64). Jetpack r36 cuda-12 is skipped (Ubuntu 22.04 / Python 3.10 incompatible with liquid-audio's 3.12 floor). - backend/index.yaml — anchor + 13 image entries. - Makefile — .NOTPARALLEL, prepare-test-extra, test-extra, docker-build-liquid-audio. Docs - .agents/plans/liquid-audio-integration.md — phased plan; PR-D (real any-to-any wiring via AudioToAudioStream), PR-E (mid-audio tool-call detector), PR-G (GGUF entries once upstream llama.cpp PR #18641 lands) remain. - .agents/api-endpoints-and-auth.md — expand the capability-surface checklist with every place a new FLAG_* needs to be registered. Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): function calling + history cap for any-to-any models Three pieces, all on the realtime_audio path that just landed: 1. liquid-audio backend (backend/python/liquid-audio/backend.py): - _build_chat_state grows a `tools_prelude` arg. - new _render_tools_prelude parses request.Tools (the OpenAI Chat Completions function array realtime.go already serialises) and emits an LFM2 `<\|tool_list_start\|>…<\|tool_list_end\|>` system turn ahead of the user history. Mirrors gallery/lfm.yaml's `function:` template so the model sees the same prompt shape whether served via llama-cpp or here. Without this the backend silently dropped tools — function calling was wired end-to-end on the Go side but the model never saw a tool list. 2. Realtime history cap (core/http/endpoints/openai/realtime.go): - Session grows MaxHistoryItems int; default picked by new defaultMaxHistoryItems(cfg) — 6 for realtime_audio models (LFM2.5 1.5B degrades quickly past a handful of turns), 0/unlimited for legacy pipelines composing larger LLMs. - triggerResponse runs conv.Items through trimRealtimeItems before building conversationHistory. Helper walks the cut left if it would orphan a function_call_output, so tool result + call pairs stay intact. - realtime_gate_test.go: specs for defaultMaxHistoryItems and trimRealtimeItems (zero cap, under cap, over cap, tool-call pair preservation). 3. Talk page (core/http/react-ui/src/pages/Talk.jsx): - Reuses the chat page's MCP plumbing — useMCPClient hook, ClientMCPDropdown component, same auto-connect/disconnect effect pattern. No bespoke tool registry, no new REST endpoints; tools come from whichever MCP servers the user toggles on, exactly as on the chat page. - sendSessionUpdate now passes session.tools=getToolsForLLM(); the update re-fires when the active server set changes mid-session. - New response.function_call_arguments.done handler executes via the hook's executeTool (which round-trips through the MCP client SDK), then replies with conversation.item.create {type:function_call_output} + response.create so the model completes its turn with the tool output. Mirrors chat's client-side agentic loop, translated to the realtime wire shape. UI changes require a LocalAI image rebuild (Dockerfile:308-313 bakes react-ui/dist into the runtime image). Backend.py changes can be swapped live in /backends/<id>/backend.py + /backend/shutdown. Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): LocalAI Assistant ("Manage Mode") for the Talk page Mirrors the chat-page metadata.localai_assistant flow so users can ask the realtime model what's loaded / installed / configured. Tools are run server-side via the same in-process MCP holder that powers the chat modality — no transport switch, no proxy, no new wire protocol. Wire: - core/http/endpoints/openai/realtime.go: - RealtimeSessionOptions{LocalAIAssistant,IsAdmin}; isCurrentUserAdmin helper mirrors chat.go's requireAssistantAccess (no-op when auth disabled, else requires auth.RoleAdmin). - Session grows AssistantExecutor mcpTools.ToolExecutor. - runRealtimeSession, when opts.LocalAIAssistant is set: gate on admin, fail closed if DisableLocalAIAssistant or the holder has no tools, DiscoverTools and inject into session.Tools, prepend holder.SystemPrompt() to instructions. - Tool-call dispatch loop: when AssistantExecutor.IsTool(name), run ExecuteTool inproc, append a FunctionCallOutput to conv.Items, skip the function_call_arguments client emit (the client can't execute these — it doesn't know about them). After the loop, if any assistant tool ran, trigger another response so the model speaks the result. Mirrors chat's agentic loop, driven server-side rather than via client round-trip. - core/http/endpoints/openai/realtime_webrtc.go: RealtimeCallRequest gains `localai_assistant` (JSON omitempty). Handshake calls isCurrentUserAdmin and builds RealtimeSessionOptions. - core/http/react-ui/src/pages/Talk.jsx: admin-only "Manage Mode" checkbox under the Tools dropdown; passes localai_assistant: true to realtimeApi.call's body, captured in the connect callback's deps. Mirroring chat's pattern means the in-process MCP tools surface "just works" for the Talk page without exposing a Streamable-HTTP MCP endpoint (which was the alternative). Clients with their own MCP servers can still use the existing ClientMCPDropdown path in parallel; the realtime handler distinguishes them by AssistantExecutor.IsTool() at dispatch time. Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): render Manage Mode tool calls in the Talk transcript Previously the realtime endpoint only emitted response.output_item.added for the FunctionCall item, and Talk.jsx's switch ignored the event — so server-side tool runs were invisible in the UI. The model would speak the result but the user had no way to see what tool was actually called. realtime.go: after executing an assistant tool inproc, emit a second output_item.added/.done pair for the FunctionCallOutput item. Mirrors the way the chat page displays tool_call + tool_result blocks. Talk.jsx: handle both response.output_item.added and .done. Render FunctionCall (with arguments) and FunctionCallOutput (pretty-printed JSON when possible) as two transcript entries — `tool_call` with the wrench icon, `tool_result` with the clipboard icon, both in mono-space secondary-colour. Resets streamingRef after the result so the next assistant text delta starts a fresh transcript entry instead of appending to the previous turn. Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * refactor(realtime): bound the Manage Mode tool-loop + preserve assistant tools Fallout from a review pass on the Manage Mode patches: - Bound the server-side agentic loop. triggerResponse used to recurse on executedAssistantTool with no cap — a model that kept calling tools would blow the goroutine stack. New maxAssistantToolTurns = 10 (mirrors useChat.js's maxToolTurns). Public triggerResponse is now a thin shim over triggerResponseAtTurn(toolTurn int); recursion increments the counter and stops at the cap with an xlog.Warn. - Preserve Manage Mode tools across client session.update. The handler used to blindly overwrite session.Tools, so toggling a client MCP server mid-session silently wiped the in-process admin tools. Session now caches the original AssistantTools slice at session creation and the session.update handler merges them back in (client names win on collision — the client is explicit). - strconv.ParseBool for the localai_assistant query param instead of hand-rolled "1" \|\| "true". Mirrors LocalAIAssistantFromMetadata. - Talk.jsx: render both tool_call and tool_result on response.output_item.done instead of splitting them across .added and .done. The server's event pairing (added → done) stays correct; the UI just doesn't need to inspect both phases of the same item. One switch case instead of two, no behavioural change. Out of scope (noted for follow-ups): extract a shared assistant-tools helper between chat.go and realtime.go (duplication is small enough that two parallel implementations stay readable for now), and an i18n key for the Manage Mode helper text (Talk.jsx doesn't use i18n anywhere else yet). Assisted-by: claude-code:claude-opus-4-7-1m [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * ci(test-extra): wire liquid-audio backend smoke test The backend ships test.py + a `make test` target and is listed in backend-matrix.yml, so scripts/changed-backends.js already writes a `liquid-audio=true\|false` output when files under backend/python/liquid-audio/ change. The workflow just wasn't reading it. - Expose the `liquid-audio` output on the detect-changes job - Add a tests-liquid-audio job that runs `make` + `make test` in backend/python/liquid-audio, gated on the per-backend detect flag The smoke covers Health() and LoadModel(mode:finetune); fine-tune mode short-circuits before any HuggingFace download (backend.py:192), so the job needs neither weights nor a GPU. The full-inference path remains gated on LIQUID_AUDIO_MODEL_ID, which CI doesn't set. The four new Go test files (core/gallery/importers/liquid-audio_test.go, core/http/endpoints/openai/realtime_gate_test.go, core/http/routes/ui_pipeline_models_test.go, pkg/functions/parse_lfm2_test.go) are already picked up by the existing test.yml workflow via `make test` → `ginkgo -r ./pkg/... ./core/...`; their packages all carry RunSpecs entries. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-13 21:57:27 +02:00
dependabot[bot]	a689100d61	chore(deps): bump the npm_and_yarn group across 1 directory with 3 updates (#9728 ) Bumps the npm_and_yarn group with 3 updates in the /core/http/react-ui directory: [fast-uri](https://github.com/fastify/fast-uri), [hono](https://github.com/honojs/hono) and [ip-address](https://github.com/beaugunderson/ip-address). Updates `fast-uri` from 3.1.0 to 3.1.2 - [Release notes](https://github.com/fastify/fast-uri/releases) - [Commits](https://github.com/fastify/fast-uri/compare/v3.1.0...v3.1.2) Updates `hono` from 4.12.14 to 4.12.18 - [Release notes](https://github.com/honojs/hono/releases) - [Commits](https://github.com/honojs/hono/compare/v4.12.14...v4.12.18) Updates `ip-address` from 10.1.0 to 10.2.0 - [Commits](https://github.com/beaugunderson/ip-address/commits) --- updated-dependencies: - dependency-name: fast-uri dependency-version: 3.1.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: hono dependency-version: 4.12.18 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: ip-address dependency-version: 10.2.0 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-12 09:54:38 +02:00
LocalAI [bot]	bc3fb16105	feat(ollama): report model capabilities + details on /api/tags and /api/show (#9766 ) Ollama-compatible clients (Open WebUI, Enchanted, ollama-grid-search, etc.) rely on the `capabilities` list and `details.{parameter_size, quantization_level,families}` fields returned by /api/tags and /api/show to decide which models are eligible for a given task -- for example to filter the "embedding model" picker. Upstream Ollama returns these; LocalAI's compat layer was leaving them empty, so embedding models were silently rejected by clients that only allow chat models for chat and only allow embedding models for embeddings. This wires up the existing config signals already present in ModelConfig: - modelCapabilities() derives the Ollama capability strings from the config: "embedding" (FLAG_EMBEDDINGS), "completion" (FLAG_CHAT / FLAG_COMPLETION), "vision" (explicit KnownUsecases bit or MMProj / multimodal template / backend media marker), "tools" (auto-detected ToolFormatMarkers, JSON/Response regex, XML format, grammar triggers), "thinking" (ReasoningConfig with reasoning not disabled) and "insert" (presence of a completion template). - modelDetailsFromModelConfig() now fills families, parameter_size and quantization_level. The latter two are parsed from the GGUF filename via regex -- conservative tokens only (Q/IQ/F16/F32/BF16 and \d+(\.\d+)?[BM] surrounded by separators) so we don't accidentally match "Qwen3" as "3B". - modelInfoFromModelConfig() exposes general.architecture and general.context_length in the new ShowResponse.model_info map. Note: HasUsecases(FLAG_VISION) cannot be used directly -- GuessUsecases has no FLAG_VISION case and returns true at the end for any chat model. hasVisionSupport() instead reads KnownUsecases explicitly plus MMProj / template / media-marker signals. Tests are written first (TDD) using Ginkgo/Gomega -- DescribeTable for the capability mapping (embedding-only, chat, vision, thinking, tools via markers, tools via JSON regex, no-capability rerank) plus integration tests against ShowModelEndpoint that round-trip JSON through a real ModelConfigLoader populated from a temp YAML file. Fixes #9760. Assisted-by: Claude Code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-12 00:16:19 +02:00
LocalAI [bot]	d892e4af80	feat: add ds4 backend (DeepSeek V4 Flash) with tool calls, thinking, KV cache (#9758 ) * test(e2e-backends): allow BACKEND_BINARY for native-built backends Adds an escape hatch for hardware-gated backends (e.g. ds4) where the model is too large for Docker build context. When BACKEND_BINARY points at a run.sh produced by 'make -C backend/cpp/<name> package', the suite skips docker image extraction and drives the binary directly. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(e2e-backends): validate BACKEND_BINARY basename + log actual source Two follow-ups from the `cbcf5148` code review: - BACKEND_BINARY now requires a path whose basename is `run.sh`. Without this check, `filepath.Dir(binary)` silently discarded the filename, so pointing the env var at an arbitrary binary failed later with a confusing assertion that named a path the user never typed. - The "Testing image=..." debug line printed an empty string when the binary path was used, hiding the actual source in CI logs. The line now reports whichever of BACKEND_IMAGE / BACKEND_BINARY is in effect as `src=...`. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): scaffold ds4 backend dir Adds prepare.sh, run.sh, and a .gitignore. CMakeLists, Makefile, and the implementation arrive in follow-up commits. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): add backend Makefile Drives ds4's upstream Makefile to produce engine .o files (CUDA on Linux when BUILD_TYPE=cublas, Metal on Darwin, otherwise CPU debug path), then invokes CMake on our wrapper. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): add CMakeLists for grpc-server Generates protoc stubs from backend.proto, links grpc-server.cpp + dsml_parser.cpp + dsml_renderer.cpp + kv_cache.cpp against pre-built ds4 engine .o files. DS4_GPU=cuda\|metal\|cpu selects the backend. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): grpc-server skeleton + module stubs The minimum that links: Backend service with Health + Free; other RPCs default to UNIMPLEMENTED. Stub headers/sources for dsml_parser, dsml_renderer, and kv_cache are in place so CMake links cleanly even before those modules ship. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): implement LoadModel Opens engine + creates session sized to ContextSize (default 32768). Backend is compile-time: CPU when DS4_NO_GPU, Metal on __APPLE__, else CUDA. MTP/speculative options are accepted via ModelOptions.Options[] (mtp_path, mtp_draft, mtp_margin). kv_cache_dir option is captured into g_kv_cache_dir for the cache module (Task 19 wires it in). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): implement TokenizeString Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): implement Predict (plain text) Tool calls + thinking-mode split arrive in Task 13 once dsml_parser is in. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): implement PredictStream (plain text) ChatDelta + reasoning/tool_calls split arrives in Task 14. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): implement Status RPC Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): add DSML streaming parser Classifies raw model-emitted token text into CONTENT / REASONING / TOOL_START / TOOL_ARGS / TOOL_END events. Markers it watches for are the literal DSML strings rendered by ds4_server.c's prompt template (<｜DSML｜tool_calls>, <｜DSML｜invoke name=...>, <think>, etc.) - these are plain text the model emits, not special tokens. Partial markers split across token chunks are buffered until a full marker or a definitively-not-a-marker '<' is observed. RandomToolId() generates the API-side tool call id (call_xxx) that exact-replay would key on. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): split hex escapes in DSML markers + add cstring/cstdio includes C++ \x hex escapes have no length cap. '\x9cD' was read as a single escape producing byte 0xCD, eating the 'D'. The markers were never actually matching the DSML text the model emits. Split each escape with adjacent string literal concatenation so the byte sequence is exactly EF BD 9C 44 (｜D) at runtime. Also adds <cstring> and <cstdio> includes (libstdc++ 13 does not transitively expose std::strlen / std::snprintf via <string>). The local plan file (uncommitted) was also updated with the same fixes so Task 16's dsml_renderer.cpp does not re-introduce the bug. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): wire DsmlParser into Predict (ChatDelta) Non-streaming Predict now emits one ChatDelta carrying content, reasoning_content, and tool_calls[] parsed from the model's DSML output. Reply.message still carries the raw model bytes for backends that prefer the regex fallback path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): wire DsmlParser into PredictStream Per-token ChatDelta writes: content/reasoning_content go incrementally, tool_calls emit TOOL_START as one delta (id + name) followed by TOOL_ARGS deltas with incremental JSON. The Go-side aggregator (pkg/functions/chat_deltas.go) reassembles them. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): chat template + reasoning_effort mapping UseTokenizerTemplate=true + Messages -> ds4_chat_begin / append / assistant_prefix. PredictOptions.Metadata['enable_thinking'] and ['reasoning_effort'] map to ds4_think_mode (DS4_THINK_HIGH default; 'max'/'xhigh' -> DS4_THINK_MAX; disabled -> DS4_THINK_NONE). Tool-call rendering for assistant turns with tool_calls JSON arrives in the next commit (dsml_renderer). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): render assistant tool_calls + tool results to DSML Closes the round-trip: when an OpenAI client sends a multi-turn chat where prior turns contain tool_calls or role=tool messages, build_prompt serializes them back to the DSML shape the model was trained on. Mirrors ds4_server.c's prompt renderer; uses nlohmann::json for parsing the OpenAI tool_calls payload. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): disk KV cache module Dir-based cache keyed by SHA1(rendered prompt prefix). File format: 'DS4G' magic + version + ctx_size + prefix_len + prefix + payload_bytes + ds4_session_save_payload output. NOT bit-compatible with ds4-server's KVC files - that interop is a follow-up plan. LoadLongestPrefix walks the dir picking the longest stored prefix that prefixes the incoming prompt. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): wire KvCache into Predict/PredictStream LoadModel reads 'kv_cache_dir' from ModelOptions.Options[], passes it to g_kv_cache.SetDir. Each Predict/PredictStream computes a render text for the request, tries LoadLongestPrefix to recover state, then Saves the new state after generation. ds4_session_sync handles the live-cache fast path internally, so the disk cache only matters for cold-starts and cross-session reuse. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): add package.sh Linux: bundles libc + ld + libstdc++ + libgomp + GPU runtime libs into package/lib so the FROM scratch image boots without a host libc. Darwin is handled by scripts/build/ds4-darwin.sh which uses otool -L. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): rename namespace ds4_backend -> ds4cpp ds4.h defines 'typedef enum {...} ds4_backend' which collides with our C++ 'namespace ds4_backend' anywhere a TU includes both. kv_cache.h includes ds4.h directly and surfaces the conflict immediately; other TUs would hit it once gRPC dev headers are available. Renames the C++ namespace to ds4cpp across all wrapper files and the plan, leaving the upstream ds4 typedef untouched. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend): add Dockerfile.ds4 Single-stage builder (CUDA devel image for cublas, ubuntu:24.04 for cpu) -> FROM scratch with packaged grpc-server + bundled runtime libs. nlohmann-json3-dev is required for dsml_renderer's JSON handling. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(make): wire backend/cpp/ds4 + ds4-darwin into root Makefile BACKEND_DS4 entry + generate-docker-build-target eval + docker-build-ds4 in docker-build-backends + .NOTPARALLEL guards. Also adds the backends/ds4-darwin target which delegates to scripts/build/ds4-darwin.sh (landed in Task 24). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: add backend-matrix entries for ds4 (cpu + cuda13, per-arch) Two entries per build (amd64 + arm64) so backend-merge-jobs assembles a multi-arch manifest. Skipping cuda12 - ds4 was validated against CUDA 13. Darwin Metal is handled outside this matrix by backend_build_darwin.yml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/index): add ds4 meta + image entries cpu + cuda13 x latest + master. Darwin Metal builds publish under ds4-darwin via the existing llama-cpp-darwin OCI pipeline. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(scripts/build): add ds4-darwin.sh Native macOS/Metal build for the ds4 backend. Mirrors llama-cpp-darwin.sh: make grpc-server -> otool -L for dylib bundling -> OCI tar that 'local-ai backends install' consumes via the backends/ds4-darwin Makefile target. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(darwin): build ds4-darwin in backend_build_darwin Adds a 'Build ds4 backend (Darwin Metal)' step that runs the backends/ds4-darwin Makefile target on the macOS runner. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(import): auto-detect ds4 weights via DS4Importer Adds core/gallery/importers/ds4.go which matches on the antirez/deepseek-v4-gguf repo URI and the DeepSeek-V4-Flash-.gguf filename pattern. Registered before LlamaCPPImporter so ds4 weights route to backend: ds4 instead of falling through to llama-cpp. Also lists ds4 in /backends/known so the /import-model UI surfaces it as a manual choice for users who want to force the backend on a non-canonical URI. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(gallery): add deepseek-v4-flash-q2 (ds4 backend) One-click install of the q2 weights with backend: ds4. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(.agents): add ds4-backend.md Documents the backend shape, DSML state machine, thinking-mode mapping, disk KV cache, build matrix (cpu/cuda13/Darwin), and the BACKEND_BINARY hardware-validation path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): pass UBUNTU_VERSION + arch env vars to install-base-deps The .docker/install-base-deps.sh script needs UBUNTU_VERSION (defaults to 2404), TARGETARCH, SKIP_DRIVERS, and APT_MIRROR/APT_PORTS_MIRROR exported into the environment so it can pick the right cuda-keyring / cudss / nvpl debs and apt mirrors. Dockerfile.ds4 was declaring some of the ARGs but not re-exporting them via ENV. Mirrors Dockerfile.llama-cpp's pattern. Without this fix 'make docker-build-ds4 BUILD_TYPE=cublas CUDA_MAJOR_VERSION=13' failed at: /usr/local/sbin/install-base-deps: line 120: UBUNTU_VERSION: unbound variable Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/index): add Metal image entries for ds4 Adds metal-ds4 + metal-ds4-development image entries pointing at quay.io/go-skynet/local-ai-backends:{latest,master}-metal-darwin-arm64-ds4 (built by scripts/build/ds4-darwin.sh on macOS arm64 runners), plus the 'metal' and 'metal-darwin-arm64' capability mappings on the ds4 meta and ds4-development variant. Closes a gap from the initial Task 23 landing - the Darwin Metal build script and CI workflow step were already wired (Tasks 24-25), but the gallery had no image entry for users to install the Metal variant. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ci): use ubuntu:24.04 base for ds4 cuda13 matrix entries The initial Task 22 matrix landing used base-image: 'nvidia/cuda:13.0.0-devel-ubuntu24.04' which clashes with install-base-deps.sh's cuda-keyring step: E: Conflicting values set for option Signed-By regarding source https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/sbsa/ The canonical pattern (llama-cpp, ik-llama-cpp, turboquant) uses plain 'ubuntu:24.04' + 'skip-drivers: false' so install-base-deps installs CUDA from scratch via its own keyring setup. Adopting that here. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): drop install-base-deps.sh dependency The .docker/install-base-deps.sh pipeline is built around the llama-cpp needs: NVIDIA keyring + cuda-toolkit apt + gRPC-from-source build at /opt/grpc. For ds4 we don't need any of that: - CUDA: nvidia/cuda:13.0.0-devel-ubuntu24.04 ships /usr/local/cuda ready to go; install-base-deps's keyring step then conflicts with the pre-installed Signed-By. - gRPC: ds4's grpc-server.cpp only links against grpc++; system libgrpc++-dev (apt) is sufficient, no source build needed. Replaced the install-base-deps invocation in Dockerfile.ds4 with a direct 'apt-get install libgrpc++-dev libprotobuf-dev protobuf-compiler-grpc nlohmann-json3-dev cmake build-essential pkg-config git'. Matrix entries back to nvidia/cuda base + skip-drivers=true so install-base-deps would no-op even if some downstream tooling calls it. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): correct proto accessors + alias grpc::Status as GStatus Two compile bugs caught by the docker build: 1. proto::Message uses snake_case accessors. The build_prompt loop called m.toolcalls() / m.toolcallid() - the protoc-generated names are m.tool_calls() / m.tool_call_id(). Plan-text bug propagated to the wrapper. 2. The Status RPC method shadowed the 'using grpc::Status' alias, so any later method declaration using Status as a return type failed to parse ('Status does not name a type' starting at LoadModel). Solution: alias grpc::Status as GStatus instead, with no 'using' clause that would conflict. All RPC method declarations and return-statement constructions now use GStatus. Pre-existing code reviewer flagged the Status-shadow concern as 'minor' in the original Task 10 commit; it turned out to be a real compile blocker under libstdc++ 13 once the surrounding methods were filled in. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): preserve TOOL_ARGS content in dsml_parser Flush When the model emitted a parameter value that arrived in the same buffer as the surrounding tool_call markers (e.g. the buffered tail after a literal '</think>' opened the model output), the parser deferred all buffered bytes to Flush() because looks_like_prefix() always returns true while buf starts with '<'. Flush() then drained the buffer as plain CONTENT/REASONING regardless of parser state, so the bytes between the parameter open and close markers were classified as CONTENT instead of TOOL_ARGS. Symptom: the model emitted <\|DSML\|parameter name="location" string="true">Paris, France</\|DSML\|parameter> and the assembled tool_call arguments came out as {"location":""} - the opener and closer were emitted into the args stream but the "Paris, France" content went to the assistant message instead. Fix: 1. Flush() now uses the same state-aware emit logic as DrainPlain: PARAM_VALUE bytes become TOOL_ARGS (json-escaped when string), THINK bytes become REASONING, TEXT bytes become CONTENT, and INVOKE / TOOL_CALLS structural whitespace is discarded. 2. looks_like_prefix() restricts its leading-'<' fallback to buffers that have not yet seen a '>'. Without that change, char-by-char feeds would discard the '<' of '<\|DSML\|invoke name="..."' once the marker prefix length was reached but the closing quote/'>' were still in flight. Verified with a standalone harness that runs the failing input three ways (single Feed, split-after-'>', and char-by-char) and aggregates TOOL_ARGS for tool index 0: all three now produce {"location":"Paris, France"}. Assisted-by: Claude:opus-4.7 [Read,Edit,Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backend/cpp/ds4): use ds4_session_sync + manual generation loop for KV persistence ds4_engine_generate_argmax() is a self-contained helper that doesn't take or update a ds4_session - it manages its own internal state. Our Predict and PredictStream methods created g_session via ds4_session_create() but then called ds4_engine_generate_argmax(), so g_session's KV state never advanced. ds4_session_payload_bytes(g_session) returned 0 and the disk KV cache save correctly rejected with 'session has no valid checkpoint to save'. Switch both RPCs to the proper session API: ds4_session_sync(g_session, &prompt, ...) loop: int token = ds4_session_argmax(g_session) if token == eos: break emit(token) ds4_session_eval(g_session, token, ...) After the loop the session has a real checkpoint and ds4_session_save_payload writes the KV state to disk. Verified end-to-end on a DGX Spark GB10: three .kv files (15-30 MB each) are written when BACKEND_TEST_OPTIONS sets kv_cache_dir, and the e2e tool-call assertion still passes. Also added stderr diagnostics to KvCache (enabled/disabled at SetDir; per-save path + payload_bytes + result) so future failures are visible instead of silent. The 'wrote ok' lines are low-volume - one per Predict/PredictStream when the cache is enabled - and skipped entirely when the option is unset. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): use ds4_session_eval_speculative_argmax when MTP loaded Wires MTP (Multi-Token Prediction) speculative decoding into the manual generation loop in both Predict and PredictStream. When the upstream MTP weights are loaded via 'mtp_path:' option AND we're on CUDA / Metal, ds4_engine_mtp_draft_tokens() returns >0 and we switch the inner loop to ds4_session_eval_speculative_argmax(), which can accept N>1 tokens per verifier step. When MTP is not loaded (no option, CPU backend, or weights absent), we fall through to the simple ds4_session_argmax + ds4_session_eval path with no behavior change. Validated on a DGX Spark GB10 with the optional MTP GGUF (DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf, ~3.6 GB). LoadModel logs 'ds4: MTP support model loaded ... (draft=2)' on stderr. Caveat per upstream README: 'currently provides at most a slight speedup, not a meaningful generation-speed win'. Wired now mainly to track the upstream API; bigger speedups arrive when ds4 improves the speculative path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend/cpp/ds4): honor PredictOptions sampling with DSML-aware override Mirrors ds4_server.c:7102-7115 sampling-policy semantics on the LocalAI gRPC side. The generation loop now consults compute_sample_params() per token to pick the effective (temperature, top_k, top_p, min_p), based on: 1. Request defaults: PredictOptions.temperature / .topk / .topp / .minp 2. Thinking-mode override: when enable_thinking != false, force T=1.0, top_k=0, top_p=1.0, min_p=0.0 (creativity for the reasoning pass and the trailing content) 3. DSML structural override: when DsmlParser::IsInDsmlStructural() returns true (we are between tool-call markers but NOT in a param value payload), force T=0.0 so protocol bytes parse cleanly When the effective temperature is 0, we keep using ds4_session_argmax + MTP speculative path (matches ds4-server's gate that only enables MTP for greedy positions). When > 0, we call ds4_session_sample(s, T, ...) with a per-thread RNG seeded from system_clock and fall back to single-token ds4_session_eval. New public method on DsmlParser: IsInDsmlStructural() encodes which states need protocol-byte determinism. PARAM_VALUE is excluded (payload uses user sampling); TEXT and THINK are excluded (no tool-call context to protect). Verified on the DGX Spark GB10: the e2e suite still passes with all 5 specs including tools, and the Predict output now varies between runs (creative sampling active) while the tool-call args remain a clean '{"location":"Paris, France"}' because the parser-state check forces greedy on the structural bytes. UX note: thinking mode is ON by default (matching ds4-server). Users who want deterministic output should set Metadata.enable_thinking = false. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add sha256 to deepseek-v4-flash-q2 entry Per HF LFS metadata for antirez/deepseek-v4-gguf: size: 86720111200 bytes (~80.76 GiB) sha256: 31598c67c8b8744d3bcebcd19aa62253c6dc43cef3b8adf9f593656c9e86fd8c LocalAI's downloader verifies sha256 when present, so users who install deepseek-v4-flash-q2 from the gallery get integrity-checked weights and the partial-download issue (an 81 GB file is easy to truncate) becomes recoverable instead of silently producing a broken backend. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-11 22:15:47 +02:00
Richard Palethorpe	670259ce43	chore: Security hardening (#9719 ) * fix(http): close 0.0.0.0/[::] SSRF bypass in /api/cors-proxy The CORS proxy carried its own private-network blocklist (RFC 1918 + a handful of IPv6 ranges) instead of using the same classification as pkg/utils/urlfetch.go. The hand-rolled list missed 0.0.0.0/8 and ::/128, both of which Linux routes to localhost — so any user with FeatureMCP (default-on for new users) could reach LocalAI's own listener and any other service bound to 0.0.0.0:port via: GET /api/cors-proxy?url=http://0.0.0.0:8080/... GET /api/cors-proxy?url=http://[::]:8080/... Replace the custom check with utils.IsPublicIP (Go stdlib IsLoopback / IsLinkLocalUnicast / IsPrivate / IsUnspecified, plus IPv4-mapped IPv6 unmasking) and add an upfront hostname rejection for localhost, .local, and the cloud metadata aliases so split-horizon DNS can't paper over the IP check. The IP-pinning DialContext is unchanged: the validated IP from the single resolution is reused for the connection, so DNS rebinding still cannot swap a public answer for a private one between validate and dial. Regression tests cover 0.0.0.0, 0.0.0.0:PORT, [::], ::ffff:127.0.0.1, ::ffff:10.0.0.1, file://, gopher://, ftp://, localhost, 127.0.0.1, 10.0.0.1, 169.254.169.254, metadata.google.internal. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> fix(downloader): verify SHA before promoting temp file to final path DownloadFileWithContext renamed the .partial file to its final name before checking the streamed SHA, so a hash mismatch returned an error but left the tampered file at filePath. Subsequent code that operated on filePath (a backend launcher, a YAML loader, a re-download that finds the file already present and skips) would consume the attacker-supplied bytes. Reorder: verify the streamed hash first, remove the .partial on mismatch, then rename. The streamed hash is computed during io.Copy so no second read is needed. While here, raise the empty-SHA case from a Debug log to a Warn so "this download had no integrity check" is visible at the default log level. Backend installs currently pass through with no digest; the warning makes that footprint observable without changing behaviour. Regression test asserts os.IsNotExist on the destination after a deliberate SHA mismatch. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(auth): require email_verified for OIDC admin promotion extractOIDCUserInfo read the ID token's "email" claim but never inspected "email_verified". With LOCALAI_ADMIN_EMAIL set, an attacker who could register on the configured OIDC IdP under that email (some IdPs accept self-supplied unverified emails) inherited admin role: - first login: AssignRole(tx, email, adminEmail) → RoleAdmin - re-login: MaybePromote(db, user, adminEmail) → flip to RoleAdmin Add EmailVerified to oauthUserInfo, parse email_verified from the OIDC claims (default false on absence so an IdP that omits the claim cannot short-circuit the gate), and substitute "" for the role-decision email when verified=false via emailForRoleDecision. The user record still stores the unverified email for display. GitHub's path defaults EmailVerified=true: GitHub only returns a public profile email after verification, and fetchGitHubPrimaryEmail explicitly filters to Verified=true. Regression tests cover both the helper contract and integration with AssignRole, including the bootstrap "first user" branch that would otherwise mask the gate. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(cli): refuse public bind when no auth backend is configured When neither an auth DB nor a static API key is set, the auth middleware passes every request through. That is fine for a developer laptop, a home LAN, or a Tailnet — the network itself is the trust boundary. It is not fine on a public IP, where every model install, settings change, and admin endpoint becomes reachable from the internet. Refuse to start in that exact configuration. Loopback, RFC 1918, RFC 4193 ULA, link-local, and RFC 6598 CGNAT (Tailscale's default range) all count as trusted; wildcard binds (`:port`, `0.0.0.0`, `[::]`) are accepted only when every host interface is in one of those ranges. Hostnames are resolved and treated as trusted only when every answer is. A new --allow-insecure-public-bind / LOCALAI_ALLOW_INSECURE_PUBLIC_BIND flag opts out for deployments that gate access externally (a reverse proxy enforcing auth, a mesh ACL, etc.). The error message lists this plus the three constructive alternatives (bind a private interface, enable --auth, set --api-keys). The interface enumeration goes through a package-level interfaceAddrsFn var so tests can simulate cloud-VM, home-LAN, Tailscale-only, and enumeration-failure topologies without poking at the real network stack. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(http): regression-test the localai_assistant admin gate ChatEndpoint already rejects metadata.localai_assistant=true from a non-admin caller, but the gate was open-coded inline with no direct test coverage. The chat route is FeatureChat-gated (default-on), and the assistant's in-process MCP server can install/delete models and edit configs — the wrong handler change would silently turn the LLM into a confused deputy. Extract the gate into requireAssistantAccess(c, authEnabled) and pin its behaviour: auth disabled is a no-op, unauthenticated is 403, RoleUser is 403, RoleAdmin and the synthetic legacy-key admin are admitted. No behaviour change in the production path. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(http): assert every API route is auth-classified The auth middleware classifies path prefixes (/api/, /v1/, /models/, etc.) as protected and treats anything else as a static-asset passthrough. A new endpoint shipped under a brand-new prefix — or a new path that simply isn't on the prefix allowlist — would be reachable anonymously. Walk every route registered by API() with auth enabled and a fresh in-memory database (no users, no keys), and assert each API-prefixed route returns 401 / 404 / 405 to an anonymous request. Public surfaces (/api/auth/, /api/branding, /api/node/ token-authenticated routes, /healthz, branding asset server, generated-content server, static assets) are explicit allowlist entries with comments justifying them. Build-tagged 'auth' so it runs against the SQLite-backed auth DB (matches the existing auth suite). Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(http): pin agent endpoint per-user isolation contract agents.go's getUserID / effectiveUserID / canImpersonateUser / wantsAllUsers helpers are the single trust boundary for cross-user access on agent, agent-jobs, collections, and skills routes. A regression there is the difference between "regular user reads their own data" and "regular user reads anyone's data via ?user_id=victim". Lock in the contract: - effectiveUserID ignores ?user_id= for unauthenticated and RoleUser - effectiveUserID honours it for RoleAdmin and ProviderAgentWorker - wantsAllUsers requires admin AND the literal "true" string - canImpersonateUser is admin OR agent-worker, never plain RoleUser No production change — this commit only adds tests. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(downloader): drop redundant stat in removePartialFile The stat-then-remove pattern is a TOCTOU window and a wasted syscall — os.Remove already returns ErrNotExist for the missing-file case, so trust that and treat it as a no-op. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(http): redact secrets from trace buffer and distribution-token logs The /api/traces buffer captured Authorization, Cookie, Set-Cookie, and API-key headers verbatim from every request when tracing was enabled. The endpoint is admin-only but the buffer is reachable via any heap-style introspection and the captured tokens otherwise outlive the request. Strip those header values at capture time. Body redaction is left to a follow-up — the prompts are usually the operator's own and JSON-walking is invasive. Distribution tokens were also logged in plaintext from core/explorer/discovery.go; logs forward to syslog/journald and outlive the token. Redact those to a short prefix/suffix instead. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(auth): rate-limit OAuth callbacks separately from password endpoints The shared 5/min/IP limit on auth endpoints is right for password-style flows but too tight for OAuth callbacks: corporate SSO funnels many real users through one outbound IP and would trip the limit. Add a separate 60/min/IP limiter for /api/auth/{github,oidc}/callback so callbacks are bounded against floods without breaking shared-IP deployments. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(gallery): verify backend tarball sha256 when set in gallery entry GalleryBackend gained an optional sha256 field; the install path now threads it through to the existing downloader hash-verify (which already streams, verifies, and rolls back on mismatch). Galleries without sha256 keep working; the empty-SHA path still emits the existing "downloading without integrity check" warning. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(http): pin CSRF coverage on multipart endpoints The CSRF middleware in app.go is global (e.Use) so it covers every multipart upload route — branding assets, fine-tune datasets, audio transforms, agent collections. Pin that contract: cross-site multipart POSTs are rejected; same-origin / same-site / API-key clients are not. Also pins the SameSite=Lax fallback path the skipper relies on when Sec-Fetch-Site is absent. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(http): XSS hardening — CSP headers, safe href, base-href escape, SVG sandbox Several closely related XSS-prevention changes spanning the SPA shell, the React UI, and the branding asset server: - New SecurityHeaders middleware sets CSP, X-Content-Type-Options, X-Frame-Options, and Referrer-Policy on every response. The CSP keeps script-src permissive because the Vite bundle relies on inline + eval'd scripts; tightening that requires moving to a nonce-based policy. - The <base href> injection in the SPA shell escaped attacker-controllable Host / X-Forwarded-Host headers — a single quote in the host header broke out of the attribute. Pass through SecureBaseHref (html.EscapeString). - Three React sinks rendering untrusted content via dangerouslySetInnerHTML switch to text-node rendering with whiteSpace: pre-wrap: user message bodies in Chat.jsx and AgentChat.jsx, and the agent activity log in AgentChat.jsx. The hand-rolled escape on the agent user-message variant is replaced by the same plain-text path. - New safeHref util collapses non-allowlisted URI schemes (most importantly javascript:) to '#'. Applied to gallery `<a href={url}>` links in Models / Backends / Manage and to canvas artifact links — these come from gallery JSON or assistant tool calls and must be treated as untrusted. - The branding asset server attaches a sandbox CSP plus same-origin CORP to .svg responses. The React UI loads logos via <img>, but the same URL is also reachable via direct navigation; this prevents script execution if a hostile SVG slipped past upload validation. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(http): bound HTTP server with read-header and idle timeouts A net/http server with no timeouts is trivially Slowloris-able and leaks idle keep-alive connections. Set ReadHeaderTimeout (30s) to plug the slow-headers attack and IdleTimeout (120s) to cap keep-alive sockets. ReadTimeout and WriteTimeout stay at 0 because request bodies can be multi-GB model uploads and SSE / chat completions stream for many minutes; operators who need tighter per-request bounds should terminate slow clients at a reverse proxy. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(auth): pin PUT /api/auth/profile field-tampering contract The handler uses an explicit local body struct (only name and avatar_url) plus a gorm Updates(map) with a column allowlist, so an attacker posting {"role":"admin","email":"...","password_hash":"..."} can't mass-assign those fields. Lock that down with a regression test so a future "let's just c.Bind(&user)" refactor breaks loudly. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(services): strip directory components from multipart upload filenames UploadDataset and UploadToCollectionForUser took the raw multipart file.Filename and joined it into a destination path. The fine-tune upload was incidentally safe because of a UUID prefix that fused any leading '..' to a literal segment, but the protection is fragile. UploadToCollectionForUser handed the filename to a vendored backend without sanitising at all. Strip to filepath.Base at both boundaries and reject the trivial unsafe values ("", ".", "..", "/"). Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(react-ui): validate persisted MCP server entries on load localStorage is shared across same-origin pages; an XSS that lands once can poison persisted MCP server config to attempt header injection or to feed a non-http URL into the fetch path on subsequent loads. Validate every entry: types must match, URL must parse with http(s) scheme, header keys/values must be control-char-free. Drop anything that doesn't fit. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(http): close X-Forwarded-Prefix open redirect The reverse-proxy support concatenated X-Forwarded-Prefix into the redirect target without validation, so a forged header value of "//evil.com" turned the SPA-shell redirect helper at /, /browse, and /browse/* into a 301 to //evil.com/app. The path-strip middleware had the same shape on its prefix-trailing-slash redirect. Add SafeForwardedPrefix at the middleware boundary: must start with a single '/', no protocol-relative '//' opener, no scheme, no backslash, no control characters. Apply at both consumers; misconfig trips the validator and the header is dropped. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(http): refuse wildcard CORS when LOCALAI_CORS=true with empty allowlist When LOCALAI_CORS=true but LOCALAI_CORS_ALLOW_ORIGINS was empty, Echo's CORSWithConfig saw an empty allow-list and fell back to its default AllowOrigins=[""]. An operator who flipped the strict-CORS feature flag without populating the list got the opposite of what they asked for. Echo never sets Allow-Credentials: true so this isn't directly exploitable (cookies aren't sent under wildcard CORS), but the misconfiguration trap is worth closing. Skip the registration and warn. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> feat(auth): zxcvbn password strength check with user-acknowledged override The previous policy was len < 8, which let through "Password1" and the rest of the credential-stuffing corpus. LocalAI has no second factor yet, so the bar needs to sit higher. Add ValidatePasswordStrength using github.com/timbutler/zxcvbn (an actively-maintained fork of the trustelem port; v1.0.4, April 2024): - min 12 chars, max 72 (bcrypt's truncation point) - reject NUL bytes (some bcrypt callers truncate at the first NUL) - require zxcvbn score >= 3 ("safely unguessable, ~10^8 guesses to break"); the hint list ["localai", "local-ai", "admin"] penalises passwords built from the app's own branding zxcvbn produces false positives sometimes (a strong-looking password that happens to match a dictionary word) and operators occasionally need to set a known-weak password (kiosk demos, CI rigs). Add an acknowledgement path: PasswordPolicy{AllowWeak: true} skips the entropy check while still enforcing the hard rules. The structured PasswordErrorResponse marks weak-password rejections as Overridable so the UI can surface a "use this anyway" checkbox. Wired through register, self-service password change, and admin password reset on both the server and the React UI. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(react-ui): drop HTML5 minLength on new-password inputs minLength={12} on the new-password input let the browser block the form submit silently before any JS or network call ran. The browser focused the field, showed a brief native tooltip, and that was that — no toast, no fetch, no clue. Reproducible by typing fewer than 12 chars on the second password change of a session. The JS-level length check in handleSubmit already shows a toast and the server rejects with a structured error, so the HTML5 attribute was redundant defence anyway. Drop it. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(react-ui): bundle Geist fonts locally instead of fetching from Google The new CSP correctly refused to apply styles from fonts.googleapis.com because style-src is locked to 'self' and 'unsafe-inline'. Loosening the CSP would defeat its purpose; the right fix is to stop reaching out to a third-party CDN for fonts on every page load. Add @fontsource-variable/geist and @fontsource-variable/geist-mono as npm deps and import them once at boot. Drop the <link rel="preconnect"> and external stylesheet from index.html. Side benefit: no third-party tracking via Referer / IP on every UI load, no failure mode when offline / behind a captive portal. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(react-ui): refresh i18n strings to reflect 12-char password minimum The translations still said "at least 8 characters" everywhere — the client-side toast on a too-short password change told the user the wrong floor. Update tooShort and newPasswordPlaceholder / newPasswordDescription across all five locales (en, es, it, de, zh-CN) to match the real ValidatePasswordStrength rule. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(auth): make password length-floor overridable like the entropy check The 12-char minimum was a policy choice, not a technical invariant — only "non-empty", "<= 72 bytes", and "no NUL bytes" are real bcrypt constraints. Treating length-12 as a hard rule was inconsistent with the entropy check (already overridable) and friction for use cases where the account is just a name on a session, not a security boundary (single-user kiosk, CI rig, lab demo). Restructure ValidatePasswordStrength: - Hard rules (always enforced): non-empty, <= MaxPasswordLength, no NUL byte - Policy rules (skipped when AllowWeak=true): length >= 12, zxcvbn score >= 3 PasswordError now marks password_too_short as Overridable too. The React forms generalised from `error_code === 'password_too_weak'` to `overridable === true`, and the JS-side preflight length checks were removed (server is source of truth, returns the same checkbox flow). Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-08 16:25:45 +02:00
LocalAI [bot]	e5d7b84216	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717 ) * feat(messaging): add backend.upgrade NATS subject + payload types Splits the slow force-reinstall path off backend.install so it can run on its own subscription goroutine, eliminating head-of-line blocking between routine model loads and full gallery upgrades. Wire-level Force flag on BackendInstallRequest is kept for one release as the rolling-update fallback target; doc note marks it deprecated. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed/worker): add per-backend mutex helper to backendSupervisor Different backend names lock independently; same backend serializes. This is the synchronization primitive used by the upcoming concurrent install handler — without it, wrapping the NATS callback in a goroutine would race the gallery directory when two requests target the same backend. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(distributed/worker): run backend.install handler in a goroutine NATS subscriptions deliver messages serially on a single per-subscription goroutine. With a synchronous install handler, a multi-minute gallery download would head-of-line-block every other install request to the same worker — manifesting upstream as a 5-minute "nats: timeout" on unrelated routine model loads. The body now runs in its own goroutine, with a per-backend mutex (lockBackend) protecting the gallery directory from concurrent operations on the same backend. Different backend names install in parallel. Backward-compat: req.Force=true is still honored here, so an older master that hasn't been updated to send on backend.upgrade keeps working. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed/worker): subscribe to backend.upgrade as a separate path Slow force-reinstall now lives on its own NATS subscription, so a multi-minute gallery pull cannot head-of-line-block the routine backend.install handler on the same worker. Same per-backend mutex guards both — concurrent install + upgrade for the same backend serialize at the gallery directory; different backends are independent. upgradeBackend stops every live process for the backend, force-installs from gallery, and re-registers. It does not start a new process — the next backend.install will spawn one with the freshly-pulled binary. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): add UpgradeBackend on NodeCommandSender; drop Force from InstallBackend Master now sends to backend.upgrade for force-reinstall, with a nats.ErrNoResponders fallback to the legacy backend.install Force=true path so a rolling update with a new master + an old worker still converges. The Force parameter leaves the public Go API surface entirely — only the internal fallback sets it on the wire. InstallBackend timeout drops 5min -> 3min (most replies are sub-second since the worker short-circuits on already-running or already-installed). UpgradeBackend timeout is 15min, sized for real-world Jetson-on-WiFi gallery pulls. Updates the admin install HTTP endpoint (core/http/endpoints/localai/nodes.go) to the new signature too. router_test.go's fakeUnloader does not yet implement the new interface shape; Task 3.2 will catch it up before the next package-level test run. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(distributed): update fakeUnloader for new NodeCommandSender shape InstallBackend lost its force bool param (Force is not part of the public Go API anymore — only the internal upgrade-fallback path sets it on the wire). UpgradeBackend gained a method. Fake records both call slices and provides an installHook concurrency seam for upcoming singleflight tests. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(distributed): cover UpgradeBackend's new subject + rolling-update fallback Task 3.1 changed the master to publish UpgradeBackend on the new backend.upgrade subject; the existing UpgradeBackend tests scripted the old install subject and so all 3 began failing as expected. Updates them to script SubjectNodeBackendUpgrade with BackendUpgradeReply. Adds two new specs for the rolling-update fallback: - ErrNoResponders on backend.upgrade triggers a backend.install Force=true retry on the same node. - Non-NoResponders errors propagate to the caller unchanged. scriptedMessagingClient gains scriptNoResponders (real nats sentinel) and scriptReplyMatching (predicate-matched canned reply, used to assert that the fallback path actually sets Force=true on the install retry). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(distributed): coalesce concurrent identical backend.install via singleflight Six simultaneous chat completions for the same not-yet-loaded model were observed firing six independent NATS install requests, each serializing through the worker's per-subscription goroutine and amplifying queue depth. SmartRouter now wraps the NATS round-trip in a singleflight.Group keyed by (nodeID, backend, modelID, replica): N concurrent identical loads share one round-trip and one reply. Distinct (modelID, replica) keys still fire independent calls, so multi-replica scaling and multi-model fan-out are unaffected. fakeUnloader gains a sync.Mutex around its recording slices to keep concurrent test goroutines race-clean. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(e2e/distributed): drop force arg from InstallBackend test calls Two e2e test call sites still passed the trailing force bool that was removed from RemoteUnloaderAdapter.InstallBackend in `9bde76d7`. Caught by golangci-lint typecheck on the upgrade-split branch (master CI was already green because these tests don't run in the standard test path). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(distributed): extract worker business logic to core/services/worker core/cli/worker.go grew to 1212 lines after the backend.upgrade split. The CLI package was carrying backendSupervisor, NATS lifecycle handlers, gallery install/upgrade orchestration, S3 file staging, and registration helpers — all distributed-worker business logic that doesn't belong in the cobra surface. Move it to a new core/services/worker package, mirroring the existing core/services/{nodes,messaging,galleryop} pattern. core/cli/worker.go shrinks to ~19 lines: a kong-tagged shim that embeds worker.Config and delegates Run. No behavior change. All symbols stay unexported except Config and Run. The three worker-specific tests (addr/replica/concurrency) move with the code via git mv so history follows them. Files split as: worker.go - Run entry point config.go - Config struct (kong tags retained, kong not imported) supervisor.go - backendProcess, backendSupervisor, process lifecycle install.go - installBackend, upgradeBackend, findBackend, lockBackend lifecycle.go - subscribeLifecycleEvents (verbatim, decomposition is a follow-up commit) file_staging.go - subscribeFileStaging, isPathAllowed registration.go - advertiseAddr, registrationBody, heartbeatBody, etc. reply.go - replyJSON process_helpers.go - readLastLinesFromFile Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(distributed/worker): decompose subscribeLifecycleEvents into per-event handlers The 226-line subscribeLifecycleEvents method packed eight NATS subscriptions inline. Each grew context-shaped doc comments mixed with subscription plumbing, making it hard to read any one handler without scrolling past the others. Extract each handler into its own method on *backendSupervisor; the subscriber becomes a thin 8-line dispatcher. No behavior change: each method body is byte-equivalent to its corresponding inline goroutine + handler. Doc comments that were attached to the inline SubscribeReply calls migrate to the new method godocs. Adding the next NATS subject is now a 2-line patch to the dispatcher plus one new method, instead of grafting onto a monolith. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-08 16:24:54 +02:00
LocalAI [bot]	2be07f61da	feat(whisper): honor client cancellation via ggml abort_callback (#9710 ) * refactor(transcription): propagate request ctx through ModelTranscription* Replaces context.Background() with the HTTP request ctx so client disconnects start cancelling the gRPC call. No backend-side abort wiring yet — that comes in a later commit. Pure plumbing. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cli): pass ctx to backend.ModelTranscription Follow-up to `e65d3e1f` which threaded ctx through ModelTranscription but missed the CLI caller. CLI commands have no request-scoped ctx, so context.Background() is correct here. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(audio): propagate request ctx into TTS, sound-gen, audio-transform Same ctx-plumbing pattern applied to the rest of the audio path. CLI callers use context.Background() since there is no request scope; HTTP callers use c.Request().Context(). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(backend): propagate request ctx into biometric, detection, rerank, diarization paths Replaces remaining context.Background() sites in core/backend with the caller's ctx. After this commit, every core/backend/.go entry point threads the request ctx end-to-end to the gRPC client. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> refactor(grpc): plumb ctx through AIModel.AudioTranscription{,Stream} Adds context.Context as first parameter to the AIModel interface methods that wrap whisper-style transcription. Server-side gRPC handler now forwards the per-RPC ctx (server-streaming uses stream.Context()). Whisper, Voxtral, vibevoice-cpp, and sherpa-onnx accept the parameter; none uses it yet — the actual cancellation primitive lands in the next commit so this is pure plumbing. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): add abort_callback hook in the C++ bridge Installs a std::atomic<int> flag, wires it into whisper_full_params.abort_callback, and exposes a set_abort(int) C symbol so Go can flip the flag from a goroutine watching the request context. transcribe() now distinguishes abort (return 2) from real whisper_full failure (return 1). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): register set_abort symbol in the purego loader Adds the Go-side binding for the new C export so the next commit can call CppSetAbort(1) from a watcher goroutine on ctx.Done(). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): honor ctx cancellation and return codes.Canceled A watcher goroutine watches ctx.Done() during AudioTranscription and calls CppSetAbort(1) on cancel. whisper_full sees abort_callback return true at the next compute graph step, returns non-zero, and the bridge returns 2 -> AudioTranscription maps that to codes.Canceled. Adds an opt-in test (gated on WHISPER_MODEL_PATH / WHISPER_AUDIO_PATH) that asserts cancellation latency under 5s and proves the abort flag resets cleanly so the next transcription succeeds. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(whisper): join the cancel watcher goroutine before returning Follow-up to `85edf9d2`. The previous commit used `defer close(done)` and called the watcher "joined synchronously" — but close() only signals, it does not block until the goroutine exits. That left a window where a late CppSetAbort(1) from a cancelled call could land on the next call, after its C-side g_abort reset but before whisper_full() began polling the abort callback, corrupting the second transcription. Switch to a sync.WaitGroup join so wg.Wait() blocks until the watcher has actually returned from its select. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(whisper): short-circuit pre-cancelled ctx in AudioTranscription If ctx is already Done() at entry, return codes.Canceled immediately instead of running the full transcription. The C-side g_abort reset happens at the start of transcribe() and would otherwise overwrite a watcher-set abort flag from an already-cancelled ctx, producing a spurious successful transcription on a request the client has already abandoned. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(tests/distributed): update testLLM mock for new AudioTranscription signature Phase B (`93c48e19`) added context.Context to AIModel.AudioTranscription but missed the testLLM mock in tests/e2e/distributed. CI golangci-lint caught it: testLLM did not implement grpc.AIModel because the method signature lacked the ctx parameter, which broke the distributed test suite compilation and cascaded through every backend-build job that runs `go build ./...`. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> test(whisper): port cancellation test to Ginkgo/Gomega Project policy (.agents/coding-style.md, enforced by golangci-lint forbidigo) is that all Go tests must use Ginkgo v2 + Gomega — no stdlib testing patterns (t.Skip, t.Fatalf, etc.). Convert the cancellation test to a Describe/It block with Skip(...) for env gating and Expect/HaveOccurred for assertions. Same coverage: cancel mid-flight returns codes.Canceled within 5s and a follow-up transcription succeeds, proving the C-side g_abort flag resets cleanly. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-08 01:44:47 +02:00
LocalAI [bot]	595b6fd22d	feat(api/transcription): include segments + duration + language on stream done event (#9709 ) streamTranscription previously emitted a done event with just `text`, matching the OpenAI streaming spec exactly. Streaming clients that need per-utterance timings or audio duration had to fall back to the non-streaming JSON path — and that path is exactly the one that trips on ResponseHeaderTimeout when whisper requests queue behind each other on a SingleThread backend. Extend the done event to additively carry `language`, `duration`, and a `segments` array (id, start, end, text — start/end as float seconds, matching TranscriptionSegmentSeconds). Empty / zero values are still omitted; spec-compliant clients ignore the new fields. This unblocks notary's streaming Transcribe (companion change in the notary repo) so it produces the same TranscriptionResult shape as the JSON path while sidestepping the queue-induced header timeouts. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-07 17:28:26 +02:00
LocalAI [bot]	447c186089	fix(distributed): make backend upgrade actually re-install on workers (#9708 ) * fix(distributed): make backend upgrade actually re-install on workers UpgradeBackend dispatched a vanilla backend.install NATS event to every node hosting the backend. The worker's installBackend short-circuits on "already running for this (model, replica) slot" and returns the existing address — so the gallery install path was skipped, no artifact was re-downloaded, no metadata was written. The frontend's drift detection then re-flagged the same backends every cycle (installedDigest stays empty → mismatch → "Backend upgrade available (new build)") while "Backend upgraded successfully" landed in the logs at the same time. The user-visible symptom: clicking "Upgrade All" silently does nothing and the same N backends sit on the upgrade list forever. Two coupled fixes, one PR: 1. Force flag on backend.install. Add `Force bool` to BackendInstallRequest and thread it through NodeCommandSender -> RemoteUnloaderAdapter. UpgradeBackend (and the reconciler's pending-op drain when retrying an upgrade) sets force=true; routine load events and admin install endpoints keep force=false. On the worker, force=true stops every live process that uses this backend (resolveProcessKeys for peer replicas, plus the exact request processKey), skips the findBackend short-circuit, and passes force=true into gallery.InstallBackendFromGallery so the on-disk artifact is overwritten. After the gallery install completes, startBackend brings up a fresh process at the same processKey on a new port. 2. Liveness check on the fast path. installBackend's "already running" branch read getAddr without verifying the process was alive, so a gRPC backend that died without the supervisor noticing left a stale (key, addr) entry. The reconciler then dialed that address, got ECONNREFUSED, marked the replica failed, retried install — and the supervisor said "already running addr=…" again. Loop forever, exactly what we observed on a node whose llama-cpp process had died but whose supervisor record persisted. Verify s.isRunning(processKey) before trusting getAddr; if the entry is stale, stopBackendExact cleans up and we fall through to a real install. Backwards-compatible: the new Force field is omitempty, older workers ignore it (their default behavior matches force=false). The signature change on NodeCommandSender.InstallBackend is internal-only. Verified: unit tests in core/services/nodes pass (108s suite). The pre-existing core/backend build break (proto regen pending for word-level timestamps) blocks core/cli and core/http/endpoints/localai package tests but is unrelated to this change. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * test(e2e/distributed): pass force=false to adapter.InstallBackend NodeCommandSender.InstallBackend gained a final force bool in the upgrade-force commit; the e2e distributed lifecycle tests still called the old 8-arg signature and broke compilation. These tests exercise the routine install path (single replica, default behavior), so force=false preserves their existing semantics. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-07 17:28:14 +02:00
LocalAI [bot]	cec5c4fdfc	fix(http): make handler-error status visible in access log + transcription errors (#9707 ) * fix(http): log accurate status code when handler returns error The custom xlog access-log middleware in API() reads res.Status before Echo's central HTTPErrorHandler runs, so when a handler returns an error without writing a response (e.g. TranscriptEndpoint's `return err` on backend failure) the status field stays at its default 200. The logged line then claims status=200 while the client receives 500 — silently hiding every 500/503/etc. that bubbles up through Echo's error handler. Mirror echo.DefaultHTTPErrorHandler's status derivation when err != nil and the response hasn't been committed: default to 500, upgrade to echo.HTTPError.Code if applicable. The logged status now matches what the client actually sees, so failed transcription requests stop appearing as 200 in the access log. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] fix(transcription): log underlying error before returning 500 to client ModelTranscriptionWithOptions surfaces real failures — gRPC errors from a remote node, model load problems, ffmpeg conversion crashes — but TranscriptEndpoint just did `return err`, so Echo turned it into a 500 with a generic body and the original error was lost. Operators chasing transcription failures across distributed mode were left with "upstream returned 500" on the client and zero context anywhere in the frontend's logs. Add an xlog.Error before returning, recording model name, the staged audio path, and the underlying error. Combined with the access-log status fix, a failing transcription now leaves an audit trail (real status code in the access line, real cause in an Error line) instead of vanishing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-07 17:27:45 +02:00
LocalAI [bot]	392fc9ce3d	fix(auth): cascade user deletion across all owned data on PostgreSQL (#9702 ) * fix(auth): cascade user deletion across all owned data on PostgreSQL Deleting a user from the admin UI in distributed mode (PostgreSQL auth DB) returned "user not found" even when the user clearly existed. The old handler ignored result.Error and only checked RowsAffected, so a foreign-key constraint violation surfaced as a misleading 404. Two issues drove this: 1. invite_codes.created_by / used_by reference users(id) but the InviteCode model declared the FKs without ON DELETE CASCADE. On PostgreSQL the engine therefore rejected the user delete with NO ACTION whenever the user had ever issued or consumed an invite. On SQLite (default in single-node mode) FKs are not enforced, so the bug never appeared there. 2. Several owned tables were never cleaned up regardless of dialect: user_permissions and quota_rules relied on CASCADE that does not fire under SQLite, and usage_records have no FK at all and were left orphaned in every dialect. Introduce auth.DeleteUserCascade which runs the full cleanup in a single transaction: drop invites authored by the user, NULL used_by on invites they consumed (preserves the audit trail), and explicitly wipe sessions, API keys, permissions, quota rules, and usage metrics before deleting the user. The in-memory quota cache is invalidated after commit so a recreated user with the same id never sees stale entries. The HTTP handler now maps the helper's errors to proper status codes — real failures surface as 500 with the cause instead of being swallowed as "not found". Add Ginkgo regression coverage in core/http/auth/users_test.go and core/http/routes/auth_test.go covering invite cleanup, used_by null-out, full data wipe, and the FK-enforced original failure mode (via PRAGMA foreign_keys=ON to mirror PostgreSQL behavior on SQLite). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * chore(deps): bump LocalAGI/LocalRecall — pull in go-fitz PDF extraction Pulls LocalAGI@main (facd888) and LocalRecall@v0.6.0. The latter swaps PDF text extraction from dslipak/pdf to gen2brain/go-fitz (libmupdf bindings) and wraps it in a 60s goroutine timeout — previously certain PDFs (broken xref tables, encrypted, image-only without OCR) would hang indefinitely inside r.GetPlainText() and poison the upload queue. Pure dep bump, no LocalAI source changes. Indirect graph picks up go-fitz + purego + ffi; drops dslipak/pdf. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 08:28:58 +02:00
Richard Palethorpe	969005b2a1	feat(gallery): Speed up load times and clean gallery entries (#9211 ) * feat: Rework VRAM estimation and use known_usecases in gallery Signed-off-by: Richard Palethorpe <io@richiejp.com> Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] * chore(gallery): regenerate gallery index and add known_usecases to model entries Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-06 14:51:38 +02:00
Andreas Egli	af83518532	feat: support word-level timestamps for faster-whisper (#9621 ) Signed-off-by: Andreas Egli <github@kharan.ch> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-05-06 00:32:52 +02:00
Ettore Di Giacinto	e86ade54a6	feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654 ) * feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp Closes #1648. OpenAI-style multipart endpoint that returns "who spoke when". Single endpoint instead of the issue's three-endpoint sketch (refactor /vad, /vad/embedding, /diarization) — the typical client wants one call, and embeddings can land later as a sibling without breaking this surface. Response shape borrows from Pyannote/Deepgram: segments carry a normalised SPEAKER_NN id (zero-padded, stable across the response) plus the raw backend label, optional per-segment text when the backend bundles ASR, and a speakers summary in verbose_json. response_format also accepts rttm so consumers can pipe straight into pyannote.metrics / dscore. Backends: * vibevoice-cpp — Diarize() reuses the existing vv_capi_asr pass. vibevoice's ASR prompt asks the model to emit [{Start,End,Speaker,Content}] natively, so diarization is a by-product of the same pass; include_text=true preserves the transcript per segment, otherwise we drop it. * sherpa-onnx — wraps the upstream SherpaOnnxOfflineSpeakerDiarization C API (pyannote segmentation + speaker-embedding extractor + fast clustering). libsherpa-shim grew config builders, a SetClustering wrapper for per-call num_clusters/threshold overrides, and a segment_at accessor (purego can't read field arrays out of SherpaOnnxOfflineSpeakerDiarizationSegment[] directly). Plumbing: new Diarize gRPC RPC + DiarizeRequest / DiarizeSegment / DiarizeResponse messages, threaded through interface.go, base, server, client, embed. Default Base impl returns unimplemented. Capability surfaces all updated: FLAG_DIARIZATION usecase, FeatureAudioDiarization permission (default-on), RouteFeatureRegistry entries for /v1/audio/diarization and /audio/diarization, audio instruction-def description widened, CAP_DIARIZATION JS symbol, swagger regenerated, /api/instructions discovery map updated. Tests: * core/backend: speaker-label normalisation (first-seen → SPEAKER_NN, per-speaker totals, nil-safety, fallback to backend NumSpeakers when no segments). * core/http/endpoints/openai: RTTM rendering (file-id basename, negative duration clamping, fallback id). * tests/e2e: mock-backend grew a deterministic Diarize that emits raw labels "5","2","5" so the e2e suite verifies SPEAKER_NN remapping, verbose_json speakers summary + transcript pass-through (gated by include_text), RTTM bytes content-type, and rejection of unknown response_format. mock-diarize model config registered with known_usecases=[FLAG_DIARIZATION] to bypass the backend-name guard. Docs: new features/audio-diarization.md (request/response, RTTM example, sherpa-onnx + vibevoice setup), cross-link from audio-to-text.md, entry in whats-new.md. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(diarization): correct sherpa-onnx symbol name + lint cleanup CI failures on #9654: * sherpa-onnx-grpc-{tts,transcription} and sherpa-onnx-realtime panicked at backend startup with `undefined symbol: SherpaOnnxDestroyOfflineSpeakerDiarizationResult`. Upstream's actual symbol is SherpaOnnxOfflineSpeakerDiarizationDestroyResult (Destroy in the middle, not the prefix); the rest of the diarization surface follows the same naming pattern. The mismatched name made purego.RegisterLibFunc fail at dlopen time and crashed the gRPC server before the BeforeAll could probe Health, taking down every sherpa-onnx test job — not just the diarization-related ones. * golangci-lint flagged 5 errcheck violations on new defer cleanups (os.RemoveAll / Close / conn.Close); wrap each in a `defer func() { _ = X() }()` closure (matches the pattern other LocalAI files use for new code, since pre-existing bare defers are grandfathered in via new-from-merge-base). * golangci-lint also flagged forbidigo violations: the new diarization_test.go files used testing.T-style `t.Errorf` / `t.Fatalf`, which are forbidden by the project's coding-style policy (.agents/coding-style.md). Convert both files to Ginkgo/Gomega Describe/It with Expect(...) — they get picked up by the existing TestBackend / TestOpenAI suites, no new suite plumbing needed. * modernize linter: tightened the diarization segment loop to `for i := range int(numSegments)` (Go 1.22+ idiom). Verified locally: golangci-lint with new-from-merge-base=origin/master reports 0 issues across all touched packages, and the four mocked diarization e2e specs in tests/e2e/mock_backend_test.go still pass. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(vibevoice-cpp): convert non-WAV input via ffmpeg + raise ASR token budget Confirmed end-to-end against a real LocalAI instance with vibevoice-asr-q4_k loaded and the multi-speaker MP3 sample at vibevoice.cpp/samples/2p_argument.mp3: both /v1/audio/transcriptions and /v1/audio/diarization now succeed and return correctly attributed speaker turns for the full clip. Two latent issues surfaced once the diarization endpoint actually exercised the backend with a non-trivial input: 1. vv_capi_asr only accepts WAV via load_wav_24k_mono. The previous code passed the uploaded path straight through, so anything that wasn't already a 24 kHz mono s16le WAV failed at the C side with rc=-8 and the very unhelpful "vv_capi_asr failed". prepareWavInput shells out to ffmpeg ("-ar 24000 -ac 1 -acodec pcm_s16le") in a per-call temp dir, matching the rate the model was trained on; both AudioTranscription and Diarize now route through it. This is the same shape sherpa-onnx uses (utils.AudioToWav), but vibevoice needs 24 kHz rather than 16 kHz so we don't reuse that helper. 2. The C ABI's max_new_tokens defaults to 256 when 0 is passed. That's fine for a five-second clip but not for anything past ~10 s — vibevoice stops mid-JSON, the parse fails, and the caller sees a hard error. Pass a much larger budget (16 384 ≈ ~9 minutes of speech at the model's ~30 tok/s rate); generation stops at EOS so this is a cap rather than a target. 3. As a defensive belt-and-braces, mirror AudioTranscription's existing "fall back to a single segment if the model emits non-JSON text" pattern in Diarize, so partial / unusual model output never produces a 500. This kept the endpoint usable while diagnosing (1) and (2), and is the right behaviour to keep. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(vibevoice-cpp): pass valid WAVs through directly so ffmpeg is not required at runtime Spotted by tests-e2e-backend (1.25.x): the previous fix forced every incoming audio file through `ffmpeg -ar 24000 ...`, which meant the backend container — which does not ship ffmpeg — failed even for the existing happy path where the caller already uploads a WAV. The container-side error was: rpc error: code = Unknown desc = vibevoice-cpp: ffmpeg convert to 24k mono wav: exec: "ffmpeg": executable file not found in $PATH Reading vibevoice.cpp's audio_io.cpp, `load_wav_24k_mono` uses drwav and already accepts any PCM/IEEE-float WAV at any sample rate, downmixes multi-channel input to mono, and resamples to 24 kHz internally. So the only inputs that genuinely need an external converter are non-WAV formats (MP3, OGG, FLAC, ...). Detect WAVs by RIFF/WAVE magic at bytes 0..3 / 8..11 and pass them straight through with a no-op cleanup; everything else still goes through ffmpeg with the same 24 kHz mono s16le target. The result: * Container builds without ffmpeg keep working for WAV uploads (the e2e-backends fixture is jfk.wav at 16 kHz mono s16le). * MP3 and other non-WAV inputs still get the new ffmpeg conversion path so the diarization endpoint stays useful. * If the caller uploads a non-WAV but ffmpeg isn't on PATH, the surfaced error is still descriptive enough to act on. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(ci): make gcc-14 install in Dockerfile.golang best-effort for jammy bases The LocalVQE PR (`bb033b16`) made `gcc-14 g++-14` an unconditional apt install in backend/Dockerfile.golang and pointed update-alternatives at them. That works on the default `BASE_IMAGE=ubuntu:24.04` (noble has gcc-14 in main), but every Go backend that builds on `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — jammy under the hood — now fails at the apt step: E: Unable to locate package gcc-14 This blocked unrelated jobs: backend-jobs(*-nvidia-l4t-arm64-{stablediffusion-ggml, sam3-cpp, whisper, acestep-cpp, qwen3-tts-cpp, vibevoice-cpp}). LocalVQE itself is only matrix-built on ubuntu:24.04 (CPU + Vulkan), so it doesn't actually need gcc-14 anywhere else. Make the gcc-14 install conditional on the package being available in the configured apt repos. On noble: identical behaviour to today (gcc-14 installed, update-alternatives points at it). On jammy: skip the gcc-14 stanza entirely and let build-essential's default gcc take over, which is what the other Go backends compile with anyway. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-05 15:10:13 +02:00
Richard Palethorpe	bb033b16a9	feat: add LocalVQE backend and audio transformations UI (#9640 ) feat(audio-transform): add LocalVQE backend, bidi gRPC RPC, Studio UI Introduce a generic "audio transform" capability for any audio-in / audio-out operation (echo cancellation, noise suppression, dereverberation, voice conversion, etc.) and ship LocalVQE as the first backend implementation. Backend protocol: - Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and bidirectional AudioTransformStream for low-latency frame-by-frame use. This is the first bidi stream in the proto; per-frame unary at LocalVQE's 16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server, embed,interface,base} with paired-channel ergonomics. LocalVQE backend (backend/go/localvqe/): - Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream shared lib + its libggml-cpu-.so runtime variants directly — no MODULE wrapper needed because LocalVQE handles CPU feature selection internally via GGML_BACKEND_DL. - Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it LocalVQE runs single-threaded at ~1× realtime instead of the documented ~9.6×. - Reference-length policy: zero-pad short refs, truncate long ones (the trailing portion can't have leaked into a mic that wasn't recording). - Ginkgo test suite (9 always-on specs + 2 model-gated). HTTP layer: - POST /audio/transformations (alias /audio/transform): multipart batch endpoint, accepts audio + optional reference + params[]=v form fields. Persists inputs alongside the output in GeneratedContentDir/audio so the React UI history can replay past (audio, reference, output) triples. - GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames (interleaved stereo mic+ref in, mono out). JSON session.update envelope for config; constants hoisted in core/schema/audio_transform.go. - ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing utils.AudioToWav (with passthrough fast-path), so the user can upload any format / rate without seeing the model's strict 16 kHz constraint. - BackendTraceAudioTransform integration so /api/backend-traces and the Traces UI light up with audio_snippet base64 and timing. - Routes registered under routes/localai.go (LocalAI extension; OpenAI has no /audio/transformations endpoint), traced via TraceMiddleware. Auth + capability + importer: - FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on, in APIFeatures), three RouteFeatureRegistry rows. - localvqe added to knownPrefOnlyBackends with modality "audio-transform". - Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on huggingface.co/LocalAI-io/LocalVQE). React UI: - New /app/transform page surfaced via a dedicated "Enhance" sidebar section (sibling of Tools / Biometrics) — the page is enhancement, not generation, so it lives outside Studio. Two AudioInput components (Upload + Record tabs, drag-drop, mic capture). - Echo-test button: records mic while playing the loaded reference through the speakers — the mic naturally picks up speaker bleed, giving a real (mic, ref) pair for AEC testing without leaving the UI. - Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls) and useAudioPeaks hook (shared module-scoped AudioContext to avoid hitting browser context limits with three players on one page); migrated TTS, Sound, Traces audio blocks to use it. - Past runs saved in localStorage via useMediaHistory('audio-transform') — the history entry stores all three URLs so clicking re-renders the full triple, not just the output. Build + e2e: - 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm, SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those two and let GPU-class hardware route through Vulkan in the gallery capabilities map. - tests-localvqe-grpc-transform job in test-extra.yml (gated on detect-changes.outputs.localvqe). - New audio_transform capability + 4 specs in tests/e2e-backends. - Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js (8 specs covering tabs, file upload, multipart shape, history, errors). Docs: - New docs/content/features/audio-transform.md covering the (audio, reference) mental model, batch + WebSocket wire formats, LocalVQE param keys, and a YAML config example. Cross-links from text-to-audio and audio-to-text feature pages. Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-04 22:07:11 +02:00
Ettore Di Giacinto	a271c72931	fix(react-ui/e2e): scope backendTrigger to <main> so it skips LanguageSwitcher The LanguageSwitcher added in the i18n PR (#9642) lives in the sidebar and also uses aria-haspopup="listbox" — same attribute the import-form SearchableSelect uses. The Batch D / E tests' helper resolved the trigger with `page.locator('button[aria-haspopup="listbox"]').first()`, which now returns the language switcher (rendered first in DOM order, in the sidebar) instead of the backend dropdown. After clicking the wrong button, getByRole('option', { name: 'llama-cpp' }) naturally never resolves — language options aren't backend names — and the test times out at 30s. Scope the locator to the <main className="main-content"> wrapper so only buttons inside the route's main content area match. The page layout has the Sidebar outside <main>, so this cleanly excludes it. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-04 08:58:25 +00:00
Ettore Di Giacinto	ade5fd4b97	fix(react-ui): reflect disabled state on SearchableSelect button The Backend dropdown is disabled while /backends/known is in flight (disabled={isSubmitting \|\| backendsLoading} in ImportModel.jsx). Until now the disabled prop only guarded the internal onClick handler — there was no `disabled` HTML attribute on the <button>, so the element remained "actionable" from the outside. That regressed the import-form-ux Batch D / E Playwright tests after the i18next-suspense PR (#9642): suspending on the importModel namespace defers the useEffect that fetches /backends/known, so when the test calls backendTrigger.click() the button is rendered but backendsLoading is still true. The click hits the no-op branch, the dropdown stays closed, and `getByRole('option', { name: 'llama-cpp' })` times out at 30s. Surfacing the disabled state on the actual <button> makes Playwright auto-wait until the dropdown is ready, fixes a11y (screen readers now announce "disabled"), and removes the button from the tab order while loading. Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-04 08:21:03 +00:00
Ettore Di Giacinto	87cf736068	feat(react-ui): add multilingual (i18n) support (#9642 ) Adds end-to-end internationalization to the React UI with five seed languages (English, Italian, Spanish, German, Simplified Chinese) and a sidebar-footer language switcher next to the existing theme toggle. Library: react-i18next + i18next + i18next-http-backend + i18next-browser-languagedetector. The detector caches the user's choice in localStorage (key `localai-language`, mirroring the existing `localai-theme` convention) and updates the `<html lang>` attribute on change. fallbackLng is `en`, so any missing translation in another locale falls back transparently. Translation files live under `public/locales/<lng>/<ns>.json`. They ride along with the existing `//go:embed react-ui/dist/` directive, but the previous SPA route in core/http/app.go only exposed `/assets/` from the embedded React build. This commit generalizes the asset handler into a `serveReactSubdir(subdir)` helper and adds a matching `/locales/*` route so i18next-http-backend can fetch the JSONs at runtime. The http-backend `loadPath` is built via the existing `apiUrl()` helper so instances served under a sub-path (e.g. `<base href="/ui/">`) resolve correctly. Namespaces (13): common, nav, errors, auth, home, models, importModel, chat, agents, skills, collections, media, admin. Translated UI surfaces include the sidebar/header/footer chrome, login + account flows, the Home dashboard (incl. the manage-by-chat assistant CTA), the model gallery + import flow, the chat experience (Chat.jsx + ChatsMenu), agents/skills/collections list pages, the studio media tabs (Image, Video, TTS), and the admin page-headers (Settings incl. its section nav, Manage, Backends, Traces, Nodes, P2P, Users, Usage). Shared components (ConfirmDialog, Toast) take their default labels from the common namespace so callers don't need to pass strings explicitly. Tooling for incremental adoption is included: - `i18next-parser.config.js` + `npm run i18n:extract` to sweep `t()` keys into the JSON skeletons. - `scripts/translate-locales.mjs` (one-off helper) to bootstrap non-English locales from English source via OpenAI or Anthropic APIs, with --copy mode as a placeholder fallback. Idempotent; preserves existing translations unless --overwrite is passed. Larger config-driven pages (ModelEditor, Settings deep field forms, AgentChat/AgentCreate, SkillEdit, CollectionDetails, Talk, Sound, biometrics, FineTune/Quantize, Users modals, Nodes/P2P install pickers, BackendLogs, Traces deep filters, Explorer) intentionally keep their inner content untranslated for now — they fall back to English via fallbackLng so functionality is unaffected, and the extracted-strings pattern + the bootstrap script make follow-up extraction straightforward. The initial Suspense fallback at the root in main.jsx covers the first JSON fetch on cold load. A simple `.app-boot-spinner` styled in App.css provides a non-empty paint while the first namespace loads. Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-02 22:42:08 +02:00
Ettore Di Giacinto	b1a99436c7	feat(branding): admin-configurable instance name, tagline, and assets (#9635 ) Adds a whitelabeling feature so an operator can replace the LocalAI instance name, tagline, square logo, horizontal logo, and favicon from the admin Settings page. Defaults fall back to the bundled assets so existing installs are unaffected. The public GET /api/branding endpoint is reachable pre-auth so the login screen can render the configured branding before sign-in. Mutating routes (POST/DELETE /api/branding/asset/:kind) remain admin-only. Text fields (instance_name, instance_tagline) ride the existing /api/settings flow; binary assets get a dedicated multipart upload route that persists files under DynamicConfigsDir/branding/. To prevent the Settings page's stale local state from clobbering an upload on save, UpdateSettingsEndpoint preserves whatever the on-disk asset filename fields are regardless of the body — /api/branding/asset/* are the sole writers for those fields. The MCP catalog gains get_branding and set_branding tools (text fields only; file upload stays UI-only) plus a configure_branding skill prompt. While wiring this up, the same restart-loss class of bug surfaced for several existing fields whose RuntimeSettings entries were never read by the startup loader. Fix loadRuntimeSettingsFromFile() to load: - branding (instance_name, instance_tagline, *_file basenames) - auto_upgrade_backends, prefer_development_backends - localai_assistant_enabled - open_responses_store_ttl - the 7 existing AgentPool fields (enabled, default/embedding model, chunking sizes, enable_logs, collection_db_path) Also exposes 3 new AgentPool runtime settings (vector_engine, database_url, agent_hub_url) via /api/settings + the Settings UI, with the same load-on-startup wiring. The file watcher's manual-edit path is intentionally not changed — the in-process API endpoints already update appConfig directly, so the watcher is redundant for supported flows and a separate refactor for everything else. 15 TDD specs cover the loader behaviour (1 branding + 11 adjacent + 3 new agent-pool); 2 specs cover the persistence helpers and the clobber-prevention contract. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-02 15:51:36 +02:00
Ettore Di Giacinto	091eda8d70	feat: react chat redesign (#9616 ) * feat(react-ui): redesign chat — popover history, focus on send, density pass Replace the persistent 260px conversation sidebar with a Cmd/Ctrl+K popover (ChatsMenu) so the conversation owns the page. Once a chat has at least one message we auto-collapse the global app rail and fade non-essential header chrome; Esc gives the user back the full chrome for the rest of the session. Move Canvas mode and the MCP dropdown into the input wrapper as mode chips — they describe what's armed for the next message and now live where the user composes. The chat header drops to Chats · title · ModelSelector · overflow · settings, and an overflow menu carries admin-only Manage mode along with Info / Edit / Export / Clear. Density pass: tighter header (40px), smaller avatars with the assistant left-border accent doing the work, 88% bubble width, modern field-sizing on the textarea, 32px send/stop buttons. Empty state now surfaces a Recent strip (top 4 non-empty chats) and a Cmd+K hint, replacing the discoverability the persistent sidebar used to provide. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * feat(react-ui): chat input chips, slimmer menu, focus mode polish Move Canvas mode and the MCP dropdown into the input wrapper as compact mode chips — they describe what's armed for the next message and now sit where the user composes. The MCP popover flips upward when anchored to the input row so it stays on-screen. Eliminate the chat header overflow ("…") menu entirely; relocate each item to its semantic home so users don't have to remember a miscellany drawer: - Manage mode toggle → top of the Settings drawer, alongside the other sticky chat knobs. The shield next to the title still signals state at a glance. - Model info / Edit config → small admin-only "ⓘ" button next to the ModelSelector; the existing model-info panel now hosts the Edit config link. - Export as Markdown → per-row hover action in ChatsMenu, so it works for any chat (not just the active one). - Clear chat history → destructive button at the bottom of the Settings drawer. Make the Sidebar listen to its own `sidebar-collapse` event so the chat's focus mode actually shrinks the rail (it previously only flipped the layout class, leaving the sidebar element at full width and overlapping the chat). Drop the focus-mode toast — the visual shift is enough; the toast was noise. Define `--color-text-tertiary` in both themes; without it metadata text (recent strip timestamps and a few other sites) was inheriting the platform default, which read as black on the dark surface. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * fix(model/log-store): close merged channel exactly once; clean up Remove Two latent races in BackendLogStore.Subscribe could panic under load (distributed e2e test triggered "send on closed channel" at backend_log_store.go:288): 1. The aggregated path closed the merged channel `ch` from two places — the fan-in waiter goroutine (after all source channels drained) and unsubscribe(). When unsubscribe ran while a fan-in goroutine was mid-flight on `ch <- line`, the close beat the send and the runtime panicked. Now `ch` is closed by exactly one goroutine: the waiter that observes all fan-in goroutines finish. unsubscribe() only closes the per-buffer source channels — the for-range in each fan-in goroutine then exits naturally and the waiter takes care of the merged close. 2. Remove() closed every subscriber channel but didn't delete the entries from the subscribers map, so a concurrent unsubscribe() would call close() again on the already-closed channel ("close of closed channel"). Clear the map entry while closing. Add a regression test that hammers AppendLine concurrently with Subscribe + unsubscribe + Remove; the race detector catches both classes of regression. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * test(model/log-store): port backend log store tests to ginkgo Bring backend_log_store_test.go in line with the rest of pkg/model (loader_test, watchdog_test, store_test): same external test package (`model_test`), same ginkgo + gomega imports, same Describe/It nesting around the public API. Behaviour is unchanged — the four existing scenarios plus the unsubscribe race regression all run as specs under the existing `TestModel` suite. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-29 22:33:26 +02:00
Ettore Di Giacinto	bcef72b9c1	feat: localai assistant chat modality (#9602 ) * fix(tests): inline model_test fixtures after tests/models_fixtures removal The previous reorg removed tests/models_fixtures/ but core/config/model_test.go still read CONFIG_FILE/MODELS_PATH env vars pointing into that directory, so `make test` failed with "open : no such file or directory" on the readConfigFile spec (the suite ran with --fail-fast and bailed before openresponses_test). Inline the YAMLs (config/embeddings/grpc/rwkv/whisper) directly into the test file, materialise them into a per-test tmpdir via BeforeEach, and drop the env-var lookups. The test no longer depends on Makefile plumbing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash] * refactor(modeladmin): extract model-admin helpers into a service package Lift the bodies of EditModelEndpoint, PatchConfigEndpoint, ToggleStateModelEndpoint, TogglePinnedModelEndpoint and VRAMEstimateEndpoint into core/services/modeladmin so the same logic can be called by non-HTTP clients (notably the in-process MCP server that backs the LocalAI Assistant chat modality, landing in a follow-up commit). The HTTP handlers shrink to thin shells that parse echo inputs, call the matching helper, map typed errors (ErrNotFound, ErrConflict, ErrPathNotTrusted, ErrBadAction, ...) to the existing HTTP status codes, and render the existing response shapes. No REST-surface behaviour change; the existing localai endpoint tests cover the regression net. Adds focused unit tests for each helper against tmp-dir-backed ModelConfigLoader fixtures (deep-merge patch, rename + conflict, path separator guard, toggle/pin enable/disable, sync callback). Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(assistant): LocalAI Assistant chat modality with in-memory MCP server Adds a chat modality, admin-only, that wires the chat session to an in-memory MCP server exposing LocalAI's own admin/management surface as tools. An admin can install models, manage backends, edit configs and check status by chatting; the LLM calls tools like gallery_search, install_model, import_model_uri, list_installed_models, edit_model_config and surfaces the results. Same Go package powers two modes: pkg/mcp/localaitools/ NewServer(client, opts) builds an MCP server that registers the 19-tool admin catalog. The LocalAIClient interface has two impls: - inproc.Client — calls services directly (no HTTP loopback, no synthetic admin API key). Used in-process by the chat handler. - httpapi.Client — calls the LocalAI REST API. Used by the new `local-ai mcp-server --target=…` subcommand to control a remote LocalAI from a stdio MCP host. Tools and their embedded skill prompts are agnostic to which client backs them. Skill prompts are markdown files under prompts/, embedded via go:embed and assembled into the system prompt at server init. Wiring: - core/http/endpoints/mcp/localai_assistant.go — process-wide holder that spins up the in-memory MCP server once at Application start using paired net.Pipe transports, then reuses LocalToolExecutor (no fork) for every chat request that opts in. - core/http/endpoints/openai/chat.go — small branch ahead of the existing MCP block: when metadata.localai_assistant=true, defense-in-depth admin check + executor swap + system-prompt injection. All downstream tool dispatch is unchanged. - core/http/auth/{permissions,features}.go — adds FeatureLocalAIAssistant; gating happens at the chat handler entry plus admin-only `/api/settings`. - core/cli/{run.go,cli.go,mcp_server.go} — LOCALAI_DISABLE_ASSISTANT flag (runtime-toggleable via Settings, no restart), plus `local-ai mcp-server` stdio subcommand. - core/config/runtime_settings.go — `localai_assistant_enabled` runtime setting; the chat handler reads `DisableLocalAIAssistant` live at request entry. UI: - Home.jsx — prominent self-explanatory CTA card on first run ("Manage LocalAI by chatting"); collapses to a compact "Manage by chat" button in the quick-links row once used, persisted via localStorage. - Chat.jsx — admin-only "Manage" toggle in the chat header, "Manage mode" badge, dedicated empty-state copy, starter chips. - Settings.jsx — "LocalAI Assistant" section with the runtime enable toggle. - useChat.js — `localaiAssistant` flag on the chat schema; injects `metadata.localai_assistant=true` on requests when active. Distributed mode: the in-memory MCP server lives only on the head node; inproc.Client wraps already-distributed-aware services so installs propagate to workers via the existing GalleryService machinery. Documentation: `.agents/localai-assistant-mcp.md` is the contributor contract — when adding an admin REST endpoint, also add a LocalAIClient method, an inproc + httpapi impl, a tool registration, and a skill prompt update; the AGENTS.md index links to it. Out of scope (follow-ups): per-tool RBAC granularity for non-admin read-only access; streaming mcp_tool_progress for long installs; React Vitest rig for the UI changes. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(assistant): extract tool/capability/MiB/server-name constants The MCP tool surface, capability tag set, server-name default, and the chat-handler metadata key were repeated as bare string literals across seven files. Renaming any one required hand-editing every call site and risked code/test/prompt drift. This pulls them into typed constants: - pkg/mcp/localaitools/tools.go — Tool* constants for the 19 MCP tools, plus DefaultServerName. - pkg/mcp/localaitools/capability.go — typed Capability + constants for the capability tag set the LLM passes to list_installed_models. The type rides through LocalAIClient.ListInstalledModels and replaces the triplet of "embed"/"embedding"/"embeddings" with the single CapabilityEmbeddings. - pkg/mcp/localaitools/inproc/client.go — bytesPerMiB constant for the VRAMEstimate byte→MB conversion. - core/http/endpoints/mcp/tools.go — MetadataKeyLocalAIAssistant for the "localai_assistant" request-metadata key consumed by the chat handler. Tool registrations, the test catalog, the dispatch table, the validation fixtures, and the fake/stub clients all reference the constants. The embedded skill prompts under prompts/ keep their bare strings (go:embed markdown can't import Go constants); the existing TestPromptsContain SafetyAnchors guards the alignment. No behaviour change. All tests pass with -race. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(modeladmin): typed Action for ToggleState/TogglePinned The toggle/pin verbs were bare strings everywhere — handler signatures, service implementations, MCP tool args, the fake/stub clients, the inproc and httpapi LocalAIClient impls, plus 4 test files. A typo in any caller silently fell through to the runtime "must be 'enable' or 'disable'" check. Introduce core/services/modeladmin.Action (string alias) with ActionEnable, ActionDisable, ActionPin, ActionUnpin and a small Valid helper. The compiler now catches mismatches at every boundary; renames ripple through one source of truth. LocalAIClient.ToggleModelState/Pinned signatures change to take modeladmin.Action. The package is brand-new and unreleased so this is a free public-API tightening. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(assistant): respect ctx cancellation on gallery channel sends InstallModel, DeleteModel, ImportModelURI, InstallBackend and UpgradeBackend all pushed onto galleryop channels with bare sends. If the worker was paused or the buffer full, the chat-handler goroutine blocked forever — the LLM kept polling and the request leaked. Wrap the five sends in a sendModelOp/sendBackendOp helper that selects on ctx.Done() so a cancelled chat completion surfaces context.Canceled back to the LLM instead of hanging. Adds inproc/client_test.go with a pre-cancelled-ctx regression test on InstallModel; the helpers are shared so the same guarantee covers the other four call sites. Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(assistant): graceful shutdown for in-memory holder and stdio CLI Two related leaks: - Application.start() built the LocalAIAssistantHolder but never wired Close() into the graceful-termination chain — the in-memory MCP transport pair stayed alive until process exit, and the goroutines behind net.Pipe() didn't drain. Hook into the existing signals.RegisterGracefulTerminationHandler chain (same pattern as core/http/endpoints/mcp/tools.go:770). - core/cli/mcp_server.go ran srv.Run with context.Background(); a Ctrl-C from the host (Claude Desktop, mcphost, npx inspector) or a SIGTERM from process supervision left the stdio loop reading from a closed pipe. Switch to signal.NotifyContext to surface the signal through ctx and let srv.Run drain. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(assistant): typed HTTPError + propagate prompt walk error The httpapi client detected "no such job" by substring-matching on the error string ("404", "could not find") — brittle to status-code formatting changes and to LocalAI fixing /models/jobs/:uuid to return a proper 404. Replace with a typed HTTPError whose Is() method honours errors.Is(err, ErrHTTPNotFound). The 500-with-"could not find" branch stays as a transitional fallback documented in Is(). Same change covers ListNodes' 404 fallback for the /api/nodes endpoint. Adds httptest tests for both 404 and the legacy 500 path, plus a direct errors.Is exposure test so external callers (the standalone stdio CLI host) can match without re-string-parsing. Also tightens prompts.SystemPrompt: panic when fs.WalkDir on the embedded FS fails. The only realistic cause is a build-time //go:embed misconfiguration; serving an empty system prompt to the LLM is much worse than crashing init. TestSystemPromptIncludesAllEmbeddedFiles catches regressions in CI. Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(modeladmin): atomic writes for model config files The five sites that wrote model YAML used os.WriteFile, which opens with O_TRUNC\|O_WRONLY\|O_CREATE. A crash mid-write left the destination truncated and the model unloadable until manual repair. Pre-existing behaviour inherited from the original endpoint handlers — fix once now that there's a single helper. Adds writeFileAtomic: writes to a sibling temp file, chmods, syncs via Close(), then os.Rename. Same-directory temp keeps the rename atomic on the same filesystem; cleanup runs on every error path so stray temps don't accumulate. No new dependency. Applied to: - ConfigService.PatchConfig - ConfigService.EditYAML (both rename and in-place branches) - mutateYAMLBoolFlag (drives ToggleState + TogglePinned) atomic_test.go covers the happy path plus a read-only-dir failure case that asserts the original file is preserved (skipped on Windows where the chmod trick is POSIX-specific). Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(assistant): prune dead code, mark stub, document conventions Three small cleanups landing together: - Drop the unused errNotImplemented sentinel from inproc/client.go. All five methods that used to return it are wired to modeladmin helpers since the Phase B commit; the package var is dead. - Annotate httpapi.Client.GetModelConfig as a known stub. LocalAI's /models/edit/:name returns rendered HTML, not JSON, so the standalone CLI's get_model_config tool surfaces a clear error to the LLM. A future JSON-only /api/models/config-yaml/:name endpoint is tracked in the agent contract; FIXME points at it. - Extend `.agents/localai-assistant-mcp.md` with a "Code conventions" section that documents the audit-driven rules: tool/Capability/Action constants, errors.Is over substring matching, ctx-aware channel sends, atomic writes, and graceful shutdown. Refresh the file map so it lists tools.go and capability.go and drops the removed tools_bootstrap.go. The tools_models.go diff is a comment-only change explaining why the ModelName empty-string check stays at the tool layer (consistency across LocalAIClient implementations, since the SDK schema validator only enforces presence, not non-empty). Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(assistant): convert test files to ginkgo + gomega The repo convention (per core/http/endpoints/localai/_test.go, core/gallery/, etc.) is Ginkgo v2 with Gomega assertions. The tests I introduced for the assistant feature used vanilla testing.T, which made them stand out and stripped the BDD structure the rest of the suite relies on. Convert every test file in the assistant scope to Ginkgo: pkg/mcp/localaitools/ dto_test.go — Describe("DTOs round-trip through JSON") prompts_test.go — Describe("SystemPrompt assembler") server_test.go — Describe("Server tool catalog"), Describe("Tool dispatch"), Describe("Tool error surfacing"), Describe("Argument validation"), Describe("Concurrent tool calls") parity_test.go — Describe("LocalAIClient parity"), hosts the suite's single RunSpecs (the file is package localaitools_test so it can import httpapi without an import cycle; Ginkgo aggregates Describes from both the internal and external test packages into one run). httpapi/client_test.go — Describe("httpapi.Client against the LocalAI admin REST surface"), Describe("ErrHTTPNotFound"), Describe("Bearer token") inproc/client_test.go — Describe("inproc.Client cancellation") core/services/modeladmin/ config_test.go — Describe("ConfigService") with sub-Describes for GetConfig, PatchConfig, EditYAML state_test.go — Describe("ConfigService.ToggleState") pinned_test.go — Describe("ConfigService.TogglePinned") atomic_test.go — Describe("writeFileAtomic") core/http/endpoints/mcp/ localai_assistant_test.go — Describe("LocalAIAssistantHolder") Each package gets a `_suite_test.go` with the standard `RegisterFailHandler(Fail) + RunSpecs(t, "...")` boilerplate. Helpers that previously took testing.T (newTestService, writeModelYAML, readMap, sortedStrings, sortGalleries, etc.) drop the T receiver and use Gomega Expectations directly. tmp dirs come from GinkgoT().TempDir(). No semantic change to test coverage — every original assertion has a direct Gomega counterpart. All suites pass with -race. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test+docs(assistant): drift detector for Tool ↔ REST route mapping Honest gap from the audit: the parity_test.go suite only checks four methods, and uses the same httpapi.Client for both sides — it asserts stability of the DTO shapes, not equivalence between in-process and HTTP. If a contributor adds an admin REST endpoint without an MCP tool, or a tool without a matching httpapi route, both surfaces silently diverge. Add a coverage test plus stronger docs: - pkg/mcp/localaitools/coverage_test.go introduces a hand-maintained toolToHTTPRoute map: every Tool* constant must list the REST endpoint the httpapi.Client hits (or "(none)" with a documented reason). Two Ginkgo specs assert the map and the published catalog stay in sync — one fails when a Tool is added without a route entry, the other fails when a route entry references a tool that no longer exists. Verified by removing the ToolDeleteModel entry locally; the test fired with a clear message pointing the contributor at the file. Deliberate non-test: we don't enumerate live admin REST routes from here. Walking the route registry requires booting Application; parsing core/http/routes/localai.go is brittle. The "new admin REST endpoint → MCP tool" direction stays a PR checklist item — see below. - AGENTS.md gets a new Quick Reference bullet that calls out the rule and points at the test by name. - .agents/api-endpoints-and-auth.md tightens the existing "Companion: MCP admin tool surface" subsection from "if useful, consider..." to "MUST be considered, with three concrete outcomes (tool added, deliberately skipped with documented reason, or forgot — which breaks the contract)". Adds a checklist item at the bottom of the file's authoritative checklist. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(assistant): drop duplicate DTOs, surface canonical types Audit feedback: localaitools/dto.go reinvented several types that already existed in the codebase. Replace the duplicates with the canonical types so the LLM-visible wire format stays aligned with the rest of LocalAI by construction (no parallel structs to keep in sync). Removed (and the canonical type now used by the LocalAIClient interface): localaitools.Gallery → config.Gallery localaitools.GalleryModelHit → gallery.Metadata localaitools.VRAMEstimate → vram.EstimateResult Tightened scope: localaitools.Backend → kept, but reduced to {Name, Installed}. ListKnownBackends now returns []schema.KnownBackend (the canonical type already used by REST /backends/known). Kept with documented rationale: localaitools.JobStatus — galleryop.OpStatus has Error error which marshals to "{}". JobStatus is the JSON-friendly mirror. localaitools.Node — nodes.BackendNode carries gorm internals + token hash; we expose only the LLM-relevant fields. ImportModelURIRequest/Response — schema.ImportModelRequest and GalleryResponse are wire-shaped, mine are LLM-shaped (BackendPreference flat, AmbiguousBackend exposed). Side wins: - Drop bytesPerMiB; vram.EstimateResult already carries human-readable display strings (size_display, vram_display) the LLM uses directly. - Drop the handler-private vramEstimateRequest in core/http/endpoints/localai/vram.go and bind directly into modeladmin.VRAMRequest (now JSON-tagged). Both clients pass through these types now where possible (e.g. ListGalleries in inproc.Client is a one-liner returning AppConfig.Galleries; httpapi.Client.GallerySearch decodes straight into []gallery.Metadata). All tests green with -race. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(assistant): extract REST route paths into named constants httpapi.Client had 18 bare-string path sites scattered across methods. Pull them into pkg/mcp/localaitools/httpapi/routes.go: static paths as package-private constants, dynamic paths as small builders that handle url.PathEscape on segment values. No behaviour change. Drops the now-unused net/url import from client.go since path escaping moved into routes.go alongside the path it applies to. Local-only by design: the server-side registrations in core/http/routes/localai.go remain bare strings. Sharing constants across the pkg/ ↔ core/ boundary would invert the layering today; the existing Tool↔REST drift-detector in coverage_test.go is the safety net for that direction. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * docs(assistant): align with shipped UI and dropped bootstrap env vars The LocalAI Assistant doc still described the older iteration: - The in-chat toggle was renamed from "Admin" to "Manage" (the badge is now "Manage mode" and the home page exposes a "Manage by chat" CTA). - LOCALAI_ASSISTANT_BOOTSTRAP_MODEL / --localai-assistant-bootstrap-model and the bootstrap_default_model tool were removed — admins pick a model from the existing selector instead, no env-var configuration required. - The shipped tool catalog includes import_model_uri but didn't appear in the doc; bootstrap_default_model appeared but no longer exists. - The Settings → LocalAI Assistant runtime toggle wasn't mentioned as the preferred way to disable without restart. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-28 19:29:27 +02:00
Ettore Di Giacinto	a0317d9926	refactor(tests): split app_test.go, move real-backend coverage to e2e-backends core/http/app_test.go had grown to 1495 lines exercising three concerns at once: HTTP-layer integration, real-backend inference (llama-gguf, tts, stablediffusion, transformers embeddings, whisper), and service logic that already has unit-level coverage. Each PR paid for 6 backend builds plus real-model downloads to satisfy a single suite. Reorg per layer: - app_test.go (1495 -> 1003 lines) drives the mock-backend binary only. Kept: auth, routing, gallery API, file:// import, /system, agent-jobs HTTP plumbing, config-file model loading. Deleted real-inference specs (llama-gguf chat, ggml completions/streaming, logprobs, logit_bias, transcription, embeddings, External-gRPC, Stores duplicate, Model gallery Context). Lifted Agent Jobs out of the deleted Stores Context. - tests/e2e-backends/backend_test.go gains logprobs, logit_bias, and no-first-token-dup specs (the latter folded into PredictStream). Two new caps gate them so non-LLM backends opt out. - tests/e2e-aio/e2e_test.go gains a streaming smoke under Context("text") to catch container-level streaming regressions. - tests/models_fixtures/ removed; all fixtures referenced testmodel.ggml. app_test.go now writes per-Context inline mock-model YAMLs. CI: - test.yml + tests-e2e.yml gain paths-ignore (docs/, examples/, *.md, backend/) so docs and backend-only PRs skip them. test.yml drops the 6-backend Build step plus TRANSFORMER_BACKEND/GO_TAGS=tts; tests-apple drops the llama-cpp-darwin build. - New tests-aio.yml runs the AIO container nightly + on workflow_dispatch + master/tags. The tests-e2e-container job moved out of test.yml so PRs no longer pay AIO cost. - New tests-llama-cpp-smoke job in test-extra.yml runs on every PR with no detect-changes gate; pulls quay.io/go-skynet/local-ai-backends: master-cpu-llama-cpp (no build on PR) and exercises predict/stream/ logprobs/logit_bias against Qwen3-0.6B. This is the PR-acceptance real-backend gate after AIO moved to nightly. The path-gated heavy test-extra-backend-llama-cpp wrapper appends the same caps so it exercises the moved specs when the backend actually changes. Makefile: - Deleted test-models/testmodel.ggml (the wget chain), test-llama-gguf, test-tts, test-stablediffusion, test-realtime-models. test target drops --label-filter, HUGGINGFACE_GRPC, TRANSFORMER_BACKEND, TEST_DIR, FIXTURES, CONFIG_FILE, MODELS_PATH, BACKENDS_PATH; depends on build-mock-backend. test-stores keeps a focused entry point and depends on backends/local-store. clean-tests also clears the mock-backend binary. Net per typical Go-side PR: ~25min (6 backend builds + tests + AIO) + ~8min e2e drops to ~5min mock-backend test + ~8min e2e + ~5-10min llama-cpp-smoke (image pulled). Docs and backend-only PRs skip the always-on workflows entirely. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]	2026-04-27 23:09:20 +00:00
Ettore Di Giacinto	ea1df8945b	fix(distributed): preserve UI-added node labels across worker re-register The register endpoint called SetNodeLabels(req.Labels) — replace-all semantics — so every worker re-register wiped every label not in the worker's body. The bug existed since labels were introduced in PR #9186 (Mar 31), but only triggered for workers that supplied labels via --node-labels. PR #9583 (the multi-replica refactor) added an auto-mirrored `node.replica-slots` label to every worker's registration body, which made `len(req.Labels) > 0` always true — turning a latent edge-case bug into a universal one. Operators reported "labels assigned to node do not persist": labels survived until the next worker restart, then disappeared. Fix: iterate req.Labels and call SetNodeLabel (upsert) for each instead of SetNodeLabels (delete-then-recreate). Worker-managed labels still refresh on re-register; UI-added labels survive. Trade-off: an operator who removes a label from --node-labels won't have it auto-removed from the DB on next register — they can clean it via the UI. Acceptable, since the alternative (current behavior) silently destroys operator state. Regression test added first (TDD): RegisterNodeEndpoint registers a node, the test simulates a UI add via SetNodeLabel, then re-registers with a different worker label set; assertion that the UI-added label survives. Test fails against the broken code, passes against the fix. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [Bash]	2026-04-27 21:24:50 +00:00
Ettore Di Giacinto	3280b9a287	fix(distributed): per-replica backend logs (store aggregation + UI) The multi-replica refactor (PR #9583) changed the worker's process key from `modelID` to `modelID#replicaIndex`, but the BackendLogStore kept the bare-modelID lookup. Result: every distributed deployment lost backend logs in the Nodes UI — single-replica too, since even the default capacity of 1 produces a `#0` suffix. Two changes wired together: * pkg/model: BackendLogStore.GetLines/Subscribe now treat a modelID without `#` as a model prefix and merge across all `modelID#N` replica buffers (timestamp-sorted for GetLines; fan-in for Subscribe). Calls with a full `modelID#N` key resolve exactly. ListModels strips replica suffixes and deduplicates so the listing surfaces one entry per loaded model. * react-ui: per-replica log streams as the default. Loaded Models table disambiguates each row with a `rep N` pill (only when the node hosts >1 replica of a model). Each row's "View logs" link routes to the per-replica process key so operators see only that replica's output. The logs page renders the replica context as a chip in the title and surfaces a segmented control — `Replica 0 / 1 / … / All merged` — when the model has multiple replicas; the merged segment uses the bare-modelID URL (delegating to the store's prefix aggregation) for the side-by-side comparison case. Single-replica deployments see no extra UI. Tests added first (TDD): the regression set in backend_log_store_test.go reproduces the bug at the exact failure point — GetLines/ListModels/Subscribe assertions all fail against the broken code, all pass against the fix. TestSubscribe_PerReplicaFilter pins the exact-key path so a future change can't silently break it. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:distill]	2026-04-27 20:55:24 +00:00
Ettore Di Giacinto	375bf1929d	fix(ui): hide meta-dev backends in System → Backends Development toggle The Manage view's flagsFor() short-circuited on b.IsMeta and returned dev=false for every meta backend, so meta-dev entries (e.g. llama-cpp-development, whisper-development, insightface-development) leaked through the Development toggle in distributed mode and stayed visible whether the toggle was on or off. The count chip even under-reported because those rows were excluded from it. Drop the IsMeta short-circuit and trust gallery enrichment for both flags. Production metas (llama-cpp) are tagged isAlias=false / isDevelopment=false in the gallery so they still pass both toggles; meta-dev entries carry isDevelopment=true and now correctly hide alongside concrete dev variants. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-27 20:38:20 +00:00
Ettore Di Giacinto	6b63b47f61	feat(distributed): support multiple replicas of one model on the same node (#9583 ) * feat(distributed): support multiple replicas of one model on the same node The distributed scheduler implicitly assumed `(node_id, model_name)` was unique, but the schema didn't enforce it and the worker keyed all gRPC processes by model name alone. With `MinReplicas=2` against a single worker, the reconciler "scaled up" every 30s but the registry never advanced past 1 row — the worker re-loaded the model in-place every tick until VRAM fragmented and the gRPC process died. This change introduces multi-replica-per-node as a first-class concept, with capacity-aware scheduling, a circuit breaker, and VRAM soft-reservation. Operators can declare per-node capacity via the worker flag `--max-replicas-per-model` (mirrored as auto-label `node.replica-slots=N`) or override per-node from the UI. * Schema: BackendNode gains MaxReplicasPerModel (default 1) and ReservedVRAM. NodeModel gains ReplicaIndex (composite with node_id + model_name). ModelSchedulingConfig gains UnsatisfiableUntil/Ticks for the reconciler circuit breaker. * Registry: replica_index threaded through SetNodeModel, RemoveNodeModel, IncrementInFlight, DecrementInFlight, TouchNodeModel, GetNodeModel, SetNodeModelLoadInfo and the InFlightTrackingClient. New helpers: CountReplicasOnNode, NextFreeReplicaIndex (with ErrNoFreeSlot), RemoveAllNodeModelReplicas, FindNodesWithFreeSlot, ClusterCapacityForModel, ReserveVRAM/ReleaseVRAM (atomic UPDATE with ErrInsufficientVRAM), and the unsatisfiable-flag CRUD. * Worker: processKey now `<modelID>#<replicaIndex>` so concurrent loads of the same model land on distinct ports. Adds CLI flag --max-replicas-per-model (env LOCALAI_MAX_REPLICAS_PER_MODEL, default 1) and emits the auto-label. * Router: scheduleNewModel filters candidates by free slot, allocates the replica index, and soft-reserves VRAM before installing the backend. evictLRUAndFreeNode now deletes the targeted row by ID instead of all replicas of the model on the node — fixes a latent bug where evicting one replica orphaned its siblings. * Reconciler: caps scale-up at ClusterCapacityForModel so a misconfig (MinReplicas > capacity) doesn't loop forever. After 3 consecutive ticks of capacity==0 it sets UnsatisfiableUntil for a 5m cooldown and emits a warning. ClearAllUnsatisfiable fires from Register, ApproveNode, SetNodeLabel(s), RemoveNodeLabel and UpdateMaxReplicasPerModel so a new node joining or label changes wake the reconciler immediately. scaleDownIdle removes highest-replica-index first to keep slots compact. * Heartbeat resets reserved_vram to 0 — worker is the source of truth for actual free VRAM; the reservation is only for the in-tick race window between two scheduling decisions. * Probe path (reconciler.probeLoadedModels and health.doCheckAll) now pass the row's replica_index to RemoveNodeModel so an unreachable replica doesn't orphan healthy siblings. * Admin override: PUT /api/nodes/:id/max-replicas-per-model sets a sticky override (preserved across worker re-registration). DELETE clears the override so the worker's flag applies again on next register. Required because Kong defaults the worker flag to 1, so every worker restart would have silently reverted the UI value. * React UI: always-visible slot badge on the node row (muted at default 1, accented when >1); inline editor in the expanded drawer with pencil-to-edit, Save/Cancel, Esc/Enter, "(override)" indicator when the value is admin-set, and a "Reset" button to hand control back to the worker. Soft confirm when shrinking the cap below the count of loaded replicas. Scheduling rules table gets an "Unsatisfiable until HH:MM" status badge surfacing the cooldown. * node.replica-slots filtered out of the labels strip on the row to avoid duplicating the slot badge. 23 new Ginkgo specs (registry, reconciler, inflight, health) cover: multi-replica row independence, RemoveNodeModel of one replica preserving siblings, NextFreeReplicaIndex slot allocation including ErrNoFreeSlot, capacity-gated scale-up with circuit breaker tripping and recovery on Register, scheduleDownIdle ordering, ClusterCapacity math, ReserveVRAM admission gating, Heartbeat reset, override survival across worker re-registration, and ResetMaxReplicasPerModel handing control back. Plus 8 stdlib tests for the worker processKey / CLI / auto-label. Closes the flap reproduced on Qwen3.6-35B against the nvidia-thor worker (single 128 GiB node, MinReplicas=2): the reconciler now caps the scale-up at the cluster's actual capacity instead of looping. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Read] [Edit] [Bash] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:golang-testing] * refactor(react-ui/nodes): tighten capacity editor copy + adopt ActionMenu for row actions * Capacity editor hint trimmed from operator-doc-style ("Sourced from the worker's `--max-replicas-per-model` flag. Changing it here makes it a sticky admin override that survives worker restarts." → "Saved values stick across worker restarts.") and the override-state copy similarly compressed. The full mechanic is no longer needed in the UI — the override pill carries the meaning and the docs cover the rest. * Node row actions migrated from an inline cluster of icon buttons (Drain / Resume / Trash) to the kebab ActionMenu used by /manage for per-row model actions, so dense Nodes tables stay clean. Approve stays as a prominent primary button — it's a stateful admission gate, not a routine action, and elevating it matches how /manage surfaces install-time decisions outside the menu. * The expanded drawer's Labels section now filters node.replica-slots out of the editable label list. The label is owned by the Capacity editor above; surfacing it again as an editable label invited confusion (the Capacity save would clobber any direct edit). Both backend and agent workers benefit — they share the row rendering path, so the action menu and label filter apply to both. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:critique] [Skill:audit] [Skill:polish] * fix(react-ui/nodes): suppress slot badge on agent workers Agent workers don't load models, so the per-node replica capacity is inapplicable to them. Showing "1× slots" on agent rows was a tiny inconsistency from the unified rendering path — gate the badge on node_type !== 'agent' so it only appears on backend workers. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] * refactor(react-ui/nodes): distill expanded drawer + restyle scheduling form The expanded node drawer used to stack five panels — slot badge, filled capacity box, Loaded Models h4+empty-state, Installed Backends h4+empty-state, Labels h4+chips+form — making routine inspections feel like a control panel. The scheduling rule form wrapped its mode toggle as two 50%-width filled buttons that competed visually with the actual primary action. * Drawer: collapse three rarely-touched config zones (Capacity, Backends, Labels) into one `<details>` "Manage" disclosure (closed by default) with small uppercase eyebrow labels for each zone instead of parallel h4 sub-headings. Loaded Models stays as the at-a-glance headline with a single-line empty hint instead of a boxed empty state. CapacityEditor renders flat (no filled background) — the Manage disclosure provides framing. * Scheduling form: replace the chunky 50%-width button-tabs with the project's existing `.segmented` control (icon + label, sized to content). Mode hint becomes a single tied line below. Fields stack vertically with helper text under inputs and a hairline divider above the right-aligned Save / Cancel. The empty drawer collapses from ~5 stacked sections (~280px tall) to two lines (~80px). The scheduling form now reads as a designed dialog instead of raw building blocks. Both surfaces now match the typographic density and weight of the rest of the admin pages. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:distill] [Skill:audit] [Skill:polish] * feat(react-ui/nodes): replace scheduling form's model picker with searchable combobox The native <select> made operators scroll through every gallery entry to find a model name. The project already has SearchableModelSelect (used in Studio/Talk/etc.) which combines free-text search with the gallery list and accepts typed model names that aren't installed yet — useful for pre-staging a scheduling rule before the node it'll run on has finished bootstrapping. Also drops the now-unused useModels import (the combobox manages the gallery hook internally). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] * refactor(react-ui/nodes): consolidate key/value chip editor + add replica preset chips The Nodes page was rendering the same key=value chip pattern in two places with subtly different markup: the Labels editor in the expanded drawer and (post-distill) the Node Selector input in the scheduling form. The form's input was also a comma-separated string that operators were getting wrong. * Extract <KeyValueChips> as a fully controlled chip-builder. Parent owns the map and decides what onAdd/onRemove does — form state for the scheduling form, API calls for the live drawer Labels editor. Same visuals everywhere; one component to change when polish needs apply. * Replace the comma-separated Node Selector text input with KeyValueChips. Operators were copying syntax from docs and missing commas; the chip vocabulary makes the key=value structure self-documenting. * Add <ReplicaInput>: numeric input + quick-pick preset chips for Min/Max replicas. Picked over a slider because replica counts are exact specs derived from VRAM math (operator decision, not a fuzzy estimate). The chips give one-click access to common values (1/2/3/4 for Min, 0=no-limit/2/4/8 for Max) without the slider's special-value problem (MaxReplicas=0 is categorical, not a position on a continuum). * Drop the now-unused labelInputs state in the Nodes page (the inline label editor's per-node draft state lived there and is now owned by KeyValueChips). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [Skill:distill] * test: fix CI fallout from multi-replica refactor (e2e/distributed + playwright) Two breakages caught by CI that didn't surface in the local run: * tests/e2e/distributed/.go — multiple files used the pre-PR2 registry signatures for SetNodeModel / IncrementInFlight / DecrementInFlight / RemoveNodeModel / TouchNodeModel / GetNodeModel / SetNodeModelLoadInfo and one stale adapter.InstallBackend call in node_lifecycle_test.go. All updated to pass replicaIndex=0 — these tests don't exercise multi-replica behavior, they just need to compile against the new signatures. The chip-builder tests in core/services/nodes/ already cover the multi-replica logic. core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js — the drawer's distill refactor moved Backends inside a "Manage" <details> disclosure that's collapsed by default. The test helper expanded the node row but never opened Manage, so the per-node backend table was never in the DOM. Helper now clicks `.node-manage > summary` after expanding the row. All 100 playwright tests pass locally; tests/e2e/distributed compiles clean. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [Bash] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-27 21:20:05 +02:00
Ettore Di Giacinto	60549a8a60	feat(react-ui): page-width archetype system + mobile/tablet nav polish Replace the universal max-width:1200px cap on .page with a four-tier archetype system (narrow 760, medium 1080, default 1600, wide unbounded) selected per page based on what its UX actually wants. Data/table pages fill ultrawide displays; forms cap at reading width; tabbed feature surfaces breathe. Mobile/tablet: - New 640/1024 breakpoint split. Tablets (640-1023) get a persistent 52px icon rail; below 640 keeps the slide-off drawer. - Drawer polish: body-scroll lock, Escape to close, focus moves into the drawer on open and back to the hamburger on close, aria-hidden + inert on main while open. - Mobile top bar carries hamburger + theme toggle + account avatar (44x44 touch targets) so theme/account aren't trapped in the drawer. - Page-level reflow on phones: page-header column-stacks, filter chips scroll horizontally, tables go edge-to-edge, OperationsBar overflows rather than wrapping. Honors prefers-reduced-motion. Manage > Models: drop the toggle column; Enable/Disable joins the per-row Actions menu alongside Stop/Pin/Edit/Logs/Delete for consistency with the other action verbs. Page-width tokens live in theme.css so future tuning is one line. Removes 7 inline maxWidth workarounds from page roots. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude Code:claude-opus-4-7 [Edit] [Bash]	2026-04-27 11:51:29 +00:00
Ettore Di Giacinto	54728e292f	feat(react-ui): split Manage backends toggle into Variants and Development Meta backends are now always shown — they're the entries operators configure against — and two independent toggles govern the noise around them. "Variants" hides platform-specific concrete builds that a meta backend aliases on the host (e.g. llama-cpp-cuda12-12.4). "Development" hides pre-release `-development` builds. Each toggle shows the count of items currently hidden in its category. The legacy `bm` URL flag is honored on read so existing deep-links resolve to the same view they used to. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-27 08:23:53 +00:00
Ettore Di Giacinto	2da1a4d230	feat(distributed): per-node backend installation from the gallery In distributed mode the Backends gallery used to fan every install out to every worker — fine for auto-resolving (meta) backends like llama-cpp where each node picks its own variant, but wrong for hardware-specific builds like cpu-llama-cpp that would silently land on every GPU node. Adds a node-targeted install path through the existing POST /api/nodes/:id/backends/install plumbing, with two entry points: - Backends gallery row gets a split-button in distributed mode. Auto- resolving keeps "Install on all nodes" as the primary; chevron menu opens the picker. Hardware-specific routes the primary directly to the picker — no fan-out path on the row. - Nodes-page drawer gets a "+ Add backend" button that navigates to /app/backends?target=<node-id>; the gallery scopes itself to that node (banner, single per-row install button, Reinstall/Remove for already- installed). One gallery, two scopes — no second UI to maintain. The picker (new NodeInstallPicker) shows a 3-state suitability column (Compatible / Override / Installed), an auto-expanding variant override disclosure that fires when selected nodes have no working GPU, parallel per-node installs with inline status and Retry-failed-nodes, and a mismatch confirm that names the consequence on the button itself. A 409 fan-out guard on /api/backends/apply protects CLI/Terraform/script users from the same footgun: hardware-specific installs in distributed mode now return code "concrete_backend_requires_target" with a human- readable error and a meta_alternative pointer. The gallery list payload now surfaces capabilities, metaBackendFor and per-row nodes (NodeBackendRef) so the picker and the new Nodes column have everything they need without re-walking the gallery client-side. GODEBUG=netdns=go is set on the compose services because the cgo DNS resolver follows the container's nsswitch.conf to host systemd-resolved (127.0.0.53), unreachable from inside the container; the pure-Go resolver reads /etc/resolv.conf directly and uses Docker's embedded DNS. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude Code:claude-opus-4-7[1m] [Edit] [Bash] [Read] [Write]	2026-04-26 22:05:18 +00:00
Ettore Di Giacinto	988430c850	test(react-ui): drive Manage page Backend logs link via the new kebab menu Manage page row actions moved into ActionMenu in `b336d9c6`, so the inline `<a title="Backend logs">` the e2e specs were asserting on no longer exists. Open the row's kebab and assert against the menuitem. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7	2026-04-26 20:51:01 +00:00
Ettore Di Giacinto	b336d9c626	feat(react-ui): polish Manage page with kebab menus and gallery rows Bring the System / Manage page up to the visual standard of the Install gallery so installed models and backends stop reading like a debug dump. - Unified ResourceRow anatomy (icon, name+description, badges, status, expandable detail) shared across both tabs. - Gallery enrichment cross-references installed names against the gallery list endpoints to surface icons, descriptions, license, tags, and links with a graceful "no description" fallback for custom imports. - Header summary with four StatCards (Models / Backends / Running / Updates) — clickable to switch tab + pre-set filter. - Backends meta + development entries hidden by default; "Show meta & development" paired toggle in the FilterBar with hidden-count hint. - Kebab (three-dot) ActionMenu replaces the inline button cluster on every row; restrained until hover, keyboard-navigable, danger items separated by a divider. - Backend "Version" cell now falls back to short digest, OCI tag, or ocifile basename when no semver is set, instead of showing "—" for every OCI install. Detail panel exposes full Source URI + Digest. - Drop redundant column headers ("Actions", "On") — kebabs and toggles carry their own affordance; screen readers still get a label. - Inline System / User / Meta / Dev badges next to the backend name so the dedicated Type column doesn't reserve space for "USER" repeated. - Tightened the spacing between the System Resources card and the StatCards so they no longer crowd the RAM bar. Extracted StatCard and GalleryLoader from Nodes.jsx and Models.jsx into shared components so the visual language is one source of truth. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude Code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]	2026-04-26 20:33:49 +00:00
Ettore Di Giacinto	e9d8e92988	fix(react-ui): don't yank chat scroll to bottom while user is reading The chat and agent-chat pages auto-scrolled to the bottom on every streamed token. If the user scrolled up to re-read part of a response, the next chunk pulled them back down — making long replies unreadable while streaming. Track a stickToBottomRef on each scroll event: if the user is within 80px of the bottom we keep auto-scrolling, otherwise we leave them where they are. On chat switch we snap back to the bottom and re-pin. Same fix applied to both Chat.jsx and AgentChat.jsx since they share the same streaming pattern. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code]	2026-04-26 19:35:39 +00:00
Ettore Di Giacinto	c8d63a1003	fix(react-ui): stop Manage page from blanking on auto-refresh; show real model use cases - useModels.refetch now runs silently — distributed-mode 10s auto-refresh no longer flips loading=true and replaces the table with a spinner card. - Manage Use Cases column derives badges from each model's actual capabilities (Chat / Image / TTS / Embeddings / etc.) instead of hardcoding a "Chat" link for every row. - FilterBar right slot is right-aligned via margin-left:auto so the Update button lives at the end of the row, not next to the chips. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code]	2026-04-26 19:35:39 +00:00
Ettore Di Giacinto	83b384de97	feat: surface distributed backend management errors (#9552 ) * fix(distributed): surface per-node backend op errors to OpStatus DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`. When workers replied Success=false (e.g. an OCI image with no arm64 variant on a Jetson host), the per-node Error string was recorded in result.Nodes[].Error but never reached the toplevel return value, so OpStatus.Error stayed empty and the UI reported the install as "completed" while the backend was nowhere on the cluster. Add BackendOpResult.Err() that aggregates per-node Status=="error" entries into a single error. Queued nodes (waiting for reconciler retry) are deliberately not treated as failures. Wire the three callers and DeleteBackendDetailed to call result.Err() so reply.Success=false finally reaches OpStatus.Error → /api/backends/job/:uid → the UI. The Delete closures had a related bug: they discarded the reply with `_` and only checked the NATS round-trip error, so reply.Success=false was a silent success even with the new aggregation. Check both. Standalone mode (LocalBackendManager) already surfaces gallery errors correctly through the same OpStatus.Error path; no change needed there. Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete. Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write] * feat(react-ui): per-node backend delete + clearer upgrade affordance The Nodes page exposed a per-node "reinstall" button (fa-sync-alt, tooltip "Reinstall backend") but no per-node delete, even though the Go side has had POST /api/nodes/:id/backends/delete → RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up for a while. Sync icons read as "refresh data" — the action is functionally an upgrade (re-pulls the gallery image), so the affordance was misleading. Per-node backend row now renders two icon buttons: - Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend on this node". Names both action and scope to differentiate from the cluster-wide upgrade on the Backends page. - Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend from this node". Matches the node-level destructive style at the row action column rather than the solid btn-danger of primary destructive pages, since this is a secondary action inside a busy row. Delete goes through the existing ConfirmDialog (danger=true) with copy that names the backend and the node explicitly — it's a non-recoverable op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which already existed in the API client. Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip), delete button presence, confirm dialog flow with POST body assertion, and cancel-doesn't-POST. Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]	2026-04-25 08:57:59 +02:00
Ettore Di Giacinto	487e3fd2a4	feat(react-ui): editorial refresh with Nord palette and polished primitives (#9550 ) * feat(react-ui): editorial refresh with Nord palette and polished primitives Replaces the cool gray-blue theme with a deep Nord-inspired palette: frost-cyan accent (#88c0d0) on deep blue-black surfaces (#13171f / #1a1f2a / #242a36), snow-storm text scale, aurora status colours. - Typography: Geist Variable + Geist Mono Variable (Google Fonts) with ss01/ss03/cv11 stylistic alternates; strengthened h1-h6 hierarchy; editorial negative tracking. - Primitives: buttons gain depth (inset highlight + hover lift + brightness filter); inputs become sunken wells with sage-swap-to-frost focus rings; cards hover-lift and gain an .card--accent left-rail variant; badges become mono caps rectangles with tabular-nums. - Chrome: sidebar active state is now an inset left rail + tint (no border-left); modals get popIn animation and proper shadow lift; toasts carry an inset accent bar + slide-in instead of tinted fills; operations bar breathes on active installs. - Empty states: editorial pattern (eyebrow rule, large mono title, 52ch lede) that inherits gracefully even without page JSX edits. - Chat: assistant bubbles drop the gray-nested-in-gray card for a transparent pull-quote with a left border; user bubbles soften from loud accent fill to a subtle frost tint. - Motion: custom spring easing cubic-bezier(0.22,1,0.36,1), 180ms standard; breathing/pulse/popIn keyframes; global prefers-reduced- motion honoring. - Radii tightened to 3/5/8/10px; warm-shadow tokens redone for cool depth; ::selection, :focus-visible, kbd globals added. - Migrated hardcoded 'JetBrains Mono' CSS literals to var(--font-mono) so the Geist Mono swap lands everywhere. Scope is intentionally tokens + primitives only. Page JSX and the ~1,800 inline style={{…}} instances are untouched and flagged as follow-ups. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] * feat(react-ui): complete-coverage pass — migrate inline styles to tokens Follows up the editorial/Nord token refresh with a mechanical sweep of page JSX and shared components so nothing bypasses the design system. - Font family: replaced 80+ 'JetBrains Mono' / 'Space Grotesk' inline literals (and the string-CSS variants in CollectionDetails and AgentStatus) with var(--font-mono) / var(--font-sans). SVG <text> nodes that used the attribute form were switched to style={{ }} so the CSS variable resolves. - Radii: every unquoted numeric borderRadius (2/3/4/10) is now a var(--radius-) token; 50% and 999px kept as computed shapes. - Spacing: clean-token gaps and margins (4/8/16px) moved to var(--spacing-xs/sm/md); padding: '4px 8px' and '8px 16px' lifted into token pairs. Micro-values (2/6/10/12px) left inline where no token maps cleanly. - Colors: Talk.jsx button/canvas-surface hardcodes moved to var(--color-); FineTune.jsx chart series colours now use the --color-data-* Nord palette (cyan/red/purple/orange instead of tailwind hex); AgentStatus tool-call icon and error tag hex swapped for var(--color-warning) / var(--color-text-inverse). - CodeMirror editor (utils/cmTheme.js): both themes rebased on Nord — polar-night surfaces and aurora syntax highlighting (dark), snow- storm surfaces with darkened aurora (light). Caret/selection/active line/search now frost-cyan tinted instead of legacy indigo/purple. Legitimately dynamic styles (computed widths, per-row colours, canvas 2D context fill/stroke for waveform and spectrogram drawing) remain inline — they can't be expressed as CSS tokens. 29 files, +237/-237 — identity preserved, semantics re-anchored to the token system. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]	2026-04-24 23:35:59 +02:00
dependabot[bot]	c4511be33a	chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9544 ) chore(deps): bump postcss Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [postcss](https://github.com/postcss/postcss). Updates `postcss` from 8.5.8 to 8.5.10 - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.10) --- updated-dependencies: - dependency-name: postcss dependency-version: 8.5.10 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-24 22:02:41 +02:00
Richard Palethorpe	3db60b57e6	fix(realtime): consume ChatDeltas when C++ autoparser clears Response (#9538 ) The llama.cpp C++-side chat autoparser clears Reply.Message and delivers parsed content/reasoning/tool-calls via Reply.chat_deltas. chat.go handles this (non-SSE path uses ToolCallsFromChatDeltas/ContentFromChatDeltas/ ReasoningFromChatDeltas), but realtime.go only read pred.Response, so any model routed through the autoparser (Qwen2.5/3 and friends) produced a silent reply: backend emitted N tokens, the session surface saw zero. Mirror the non-SSE chat path in realtime's triggerResponse: when deltas carry tool calls or content, use them directly; otherwise fall back to the existing raw-text parsing. Assisted-by: claude-opus-4-7-1M [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-24 14:41:38 +02:00
Tai An	5e062b4d1f	fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) (#9526 ) * fix(anthropic): use SetFunctionCallNameString for specific tool forcing * fix(openai/realtime): use SetFunctionCallNameString for specific tool forcing * fix(openresponses): use SetFunctionCallNameString for specific tool forcing	2026-04-24 09:06:42 +02:00

1 2 3 4 5 ...

541 Commits