LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-31 20:21:26 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	8d6548c0b9	fix(distributed): sync gallery OpCache + caches across frontend replicas (#9983 ) When the LocalAI frontend deployment is scaled past one replica, the UI's /api/operations poll round-robins between pods. Each pod kept the OpCache (galleryID->jobID), OpStatus map, and the post-install in-memory caches (ModelConfigLoader, UpgradeChecker) purely in-process. Reads never consulted PostgreSQL or NATS even though writes already published to PG. Symptoms: - A user installing a model on replica A saw the operation card flicker in and out as the load balancer alternated. - The Models page re-fetched the whole gallery on every flicker because useEffect([operations.length]) re-fires when the count changes. - A chat completion that landed on replica B after the install completed on replica A failed to find the new model — B's ModelConfigLoader was still the old one because nothing told it to reload from disk. - The UpgradeChecker 6-hour cache stayed stale on peer replicas after a backend upgrade, so /api/backends/upgrades kept surfacing an upgrade that had already shipped. Mirror the jobs Dispatcher pattern for gallery ops: - OpCache learns SetMessagingClient/SetGalleryStore + a Start(ctx) that hydrates from PostgreSQL and subscribes to gallery.opcache.{start,end}. Set/SetBackend now upsert cache_key + is_backend_op on the gallery_ operations row and broadcast OpCacheEvent so peers merge it in. The hydrate path uses a new GalleryStore.ListActive() (status in {pending, downloading, processing} and updated within 30 min). - GalleryService.SubscribeBroadcasts wires a SubjectGalleryProgress- Wildcard subscriber that calls a new lock-light mergeStatus into the local statuses map, plus a SubjectGalleryCancelWildcard subscriber that runs the locally-registered cancel func. Hydrate() restores active rows from PostgreSQL on startup so a freshly-started replica is not observably empty mid-install. CancelOperation tolerates the cancel func living on a different replica and publishes anyway. - modelHandler and backendHandler publish on the new SubjectCacheInvalidateModels / SubjectCacheInvalidateBackends after a successful install/delete/upgrade. SubscribeBroadcasts wires peers to refresh: OnModelsChanged (re-runs LoadModelConfigsFromPath) and OnBackendOpCompleted (re-triggers UpgradeChecker). The originating replica reloads inline so it never enters the broadcast handler. - OpStatus.Error (an error interface) flat-marshalled to "{}" over JSON, so a failed install replicated to a peer arrived with a nil error and the UI's failure banner never appeared. Add MarshalJSON/UnmarshalJSON via an opStatusWire shim that round-trips Error as a string. - UpdateStatus and CancelOperation now drop the mutex before publishing to NATS or persisting to PostgreSQL. The wildcard subscriber's mergeStatus loops back into the same service on the publishing replica and would deadlock otherwise; this also prevents future PG round-trips from stalling concurrent readers on every progress tick. Tests cover the OpStatus error round-trip, OpCache propagation through a shared in-memory bus, OpCache PostgreSQL hydration (active-only), GalleryService progress + cancel broadcast, Nodes preservation across a peer's bare progress tick, GalleryService hydration from PG, and the two cache-invalidation broadcasts (models + backends). 44 specs total in galleryop; routes/operations specs and jobs/agents suites still pass. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-25 17:28:14 +02:00
Richard Palethorpe	6a80e23733	feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802 ) Add a routing middleware stack and a cloud-proxy backend. * cloud-proxy: a Go gRPC backend that forwards OpenAI- and Anthropic-shaped chat requests to upstream providers, with an optional translate mode (OpenAI request -> Anthropic /v1/messages -> OpenAI response) and full tool-calling support. * routing: admission control, content-aware model routing (embedding cache + classifier + rerank + Arch-Router score), PII detection/redaction (regex + NER) with streaming filter and OpenAI/Anthropic adapters, and a per-user/per-key billing recorder backed by GORM or in-memory storage. * middleware: UsageMiddleware records usage via the billing recorder, plus admission, route-model, usage-stamp and trace middlewares. * observability: BackendTrace ring buffer stores full request bodies (capped), MITM proxy emits structured trace events, and router classifier decisions surface at /api/router/decide. * gallery: Arch-Router-1.5B (Q4_K_M and Q8_0). * UI: cloud-proxy model-editor fields, classifier system-prompt and score-normalization config, and a Traces page rendering request bodies. Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-25 09:28:27 +02:00
LocalAI [bot]	a0f3e26245	fix(distributed): make admin backend installs resilient and observable (#9958 ) * feat(distributed): add configurable NATS backend install/upgrade timeouts Adds BackendInstallTimeout and BackendUpgradeTimeout to DistributedConfig with 15m defaults, following the existing MCPToolTimeout / WorkerWaitTimeout pattern. These will replace the hardcoded literals in RemoteUnloaderAdapter so admin-driven backend installs across the cluster survive long OCI image pulls that previously timed out at 3m. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(distributed): gofmt alignment after timeout fields Re-aligns the Validate() negative-duration map and the Default* const block so the new BackendInstall/UpgradeTimeout entries do not leave the surrounding columns mis-padded. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(cli): surface LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT and _UPGRADE_TIMEOUT Parses the two new env vars on the run CLI and threads them through the existing AppOption builder so DistributedConfig picks them up. Invalid duration strings now fail loudly at startup rather than silently falling back to the default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): inject NATS install/upgrade timeouts into RemoteUnloaderAdapter Removes the hardcoded 3m / 15m literals from RemoteUnloaderAdapter and threads in DistributedConfig.BackendInstallTimeoutOrDefault() and BackendUpgradeTimeoutOrDefault() at construction. Install now defaults to 15m (was 3m); cold OCI image pulls on Jetson Wi-Fi routinely blew past the old ceiling. Scripted messaging client captures the timeout so tests can assert the configured value actually reaches the NATS request. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): introduce galleryop.ErrWorkerStillInstalling sentinel When the NATS request-reply for backend.install (or .upgrade) times out the worker is almost always still pulling the OCI image. Wrap the timeout in a typed sentinel so the manager above can distinguish "worker hung" from "worker still working" and leave the pending_backend_ops row in place for the reconciler to confirm via backend.list. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): treat NATS install timeout as in-progress, not failure When a worker times out replying to backend.install but the install is still running on the worker, enqueueAndDrainBackendOp now reports a running_on_worker status and pushes NextRetryAt out by the install timeout so the reconciler does not immediately re-fire another install while the worker is still pulling the image. The pending_backend_ops row stays in place for the next reconciler pass to confirm via backend.list. InstallBackend wraps the result in galleryop.ErrWorkerStillInstalling so callers can branch (galleryop renders yellow in-progress instead of red error). UpgradeBackend uses the same wrap. Adds RemoteUnloaderAdapter.InstallTimeout() so the manager can push NextRetryAt by the configured timeout without reaching into a private field, and NodeRegistry.RecordPendingBackendOpInFlight as the soft cousin of RecordPendingBackendOpFailure. Also includes incidental gofmt-driven struct-field alignment in registry.go on lines unrelated to the change (touched files are re-formatted to canonical form per project policy). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(distributed): don't increment Attempts on in-flight install timeout An in-flight timeout (worker still pulling the OCI image) is not a failed attempt, it's a delayed one. Incrementing Attempts let genuinely-progressing slow installs (e.g. 30 GB CUDA images on Wi-Fi) trip the reconciler's maxPendingBackendOpAttempts cap and dead-letter the queue row while the worker was still legitimately working. RecordPendingBackendOpInFlight now only updates LastError and NextRetryAt. Also documents "running_on_worker" in the NodeOpStatus.Status enum comment so Task 6 implementers see the full surface. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(galleryop): surface ErrWorkerStillInstalling as non-error OpStatus When the distributed backend manager returns an error that wraps ErrWorkerStillInstalling, backendHandler now completes the op with a "still installing in background" message rather than marking it as a red failure. Admin UI sees a yellow in-progress state; reconciler confirms completion on its next pass. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(distributed): end-to-end install-timeout-then-reconcile Wires Task 1-6 end-to-end so any seam mismatch surfaces in CI rather than during a real cluster install. NATS times out, the queue row stays alive with running_on_worker status, the worker eventually reports the backend installed via backend.list, the manager surfaces it via ListBackends. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(distributed): document LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT / _UPGRADE_TIMEOUT Add the two new operator-tunable env vars to the Frontend Configuration table in the distributed-mode docs. Explains the 15m default, when to raise it (slow links pulling multi-GB OCI images), and the new "still installing in background" admin-UI state when the round-trip times out but the worker is still working. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): clear pending install rows when backend.list confirms DistributedBackendManager.ListBackends now proactively clears pending_backend_ops install rows whose (nodeID, backend) is reported installed by backend.list. Operator UI updates immediately instead of waiting up to installTimeout (default 15m) for the next reconciler tick after NextRetryAt. Only install rows are cleared; upgrade and delete intents are not satisfied by presence in backend.list and continue to drain through their normal reconciler paths. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(messaging): add BackendInstallProgressEvent wire type and subject New NATS subject nodes.<nodeID>.backend.install.<opID>.progress lets the worker publish transient progress events (file, current/total bytes, percentage, phase) while a long-running install pulls its OCI image. BackendInstallRequest gains an optional OpID field so the worker knows which subject to publish on. Transient pub/sub (not JetStream): the install reply remains ground truth for success/failure; dropped progress events are tolerable. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(messaging): drop em-dash from BackendInstallProgress test comment Per project convention (no em-dashes anywhere). Comment substance is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): worker publishes debounced install progress over NATS When BackendInstallRequest.OpID is set, the worker's backend.install handler wires a debounced publisher (250ms window) into the gallery download callback. Each tick becomes a BackendInstallProgressEvent on nodes.<nodeID>.backend.install.<opID>.progress; the publisher always emits a final event on Flush so the UI sees the terminal percentage. Old masters that do not set OpID continue to run silent installs: no behavior change for them. Lock ordering: the publisher releases its mutex before calling messaging.Publish so a slow network never stalls the install loop. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): RemoteUnloaderAdapter subscribes to install progress InstallBackend gains opID + onProgress parameters. When both are set, the adapter subscribes to nodes.<nodeID>.backend.install.<opID>.progress BEFORE publishing the install request, decodes each message into the caller's onProgress callback in a goroutine (so a slow callback never stalls the NATS reader thread), and unsubscribes after RequestJSON returns. When onProgress is nil OR opID is empty (the reconciler retry path), subscription is skipped entirely - silent installs cost nothing extra. Subscribe failure is logged at Warn and the install proceeds without progress streaming; the NATS round-trip still owns terminal status. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): forward backend install progress into galleryop OpStatus DistributedBackendManager.InstallBackend now passes the gallery op ID and a progress bridge into the adapter call. Each BackendInstallProgressEvent from the worker becomes a galleryop.ProgressCallback tick - which the existing backendHandler already turns into OpStatus.UpdateStatus, so the admin UI/SSE polling sees per-byte progress for distributed installs without any UI-side change. UpgradeBackend is intentionally left silent for now: its wire request (BackendUpgradeRequest) does not carry OpID, and rolling-update fallback is the rarer path. Will be picked up in a follow-up if the worker upgrade path also gets a progress channel. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(distributed): InstallBackend tolerates silent (pre-Phase-2) workers A worker on pre-Phase-2 code never publishes progress events. The new master subscribes optimistically; this spec pins that a silent worker still produces a green install with no progressCb ticks. The install reply is the source of truth for terminal state; the progress stream is a best-effort UX enrichment. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(distributed): document install progress streaming Note the new nodes.<nodeID>.backend.install.<opID>.progress subject and the silent-worker compatibility behavior so operators know to expect real-time progress and what happens on a mixed-version cluster. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(distributed): note progress-event ordering trade-off in InstallBackend Document near the goroutine dispatch why ordering at the consumer is best-effort, why it rarely matters in practice (worker debounce >> goroutine jitter), and what a future hardening pass would look like (Seq field + stale-by-seq drop). Stops the next reader from accidentally "fixing" the goroutine pool away. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(galleryop): add NodeProgress + OpStatus.Nodes for per-node breakdown Adds the data model the UI needs to render an expandable per-node breakdown of a fanned-out backend install. NodeProgress carries node identity (ID + name), per-node status (queued / running_on_worker / success / error / downloading), the current file + bytes + percentage from the Phase 2 progress stream, and any per-node error. OpStatus.Nodes is the slice the /api/operations handler will surface in a follow-up. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(galleryop): UpdateNodeProgress merges per-node ticks by NodeID GalleryService.UpdateNodeProgress(opID, nodeID, np) merges a NodeProgress into OpStatus.Nodes (keyed by NodeID, no duplicates) and mirrors the latest tick into the aggregate Progress / FileName / DownloadedFileSize / TotalFileSize fields so the legacy single-bar OperationsBar view keeps working unchanged alongside the new per-node breakdown. Concurrent-safe via the existing g.Mutex. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(distributed): write per-node OpStatus entries during install fan-out DistributedBackendManager now accepts a nodeProgressSink and feeds it two streams: 1. enqueueAndDrainBackendOp emits a per-node terminal entry on each status it appends to BackendOpResult (queued, success, error, running_on_worker). The opID is threaded through the function so the sink gets the right gallery op identity. 2. The install apply closure fans each BackendInstallProgressEvent into the sink as a downloading entry, alongside the legacy progressCb path so the aggregate single-bar view stays correct. Production wiring passes the GalleryService (which implements UpdateNodeProgress via Task 2) as the sink. Single-node tests pass nil. DeleteBackend and UpgradeBackend pass an empty opID so the sink path no-ops for ops that aren't gallery-tracked the same way as Install. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(operations): expose per-node breakdown on /api/operations When an operation's OpStatus has Nodes entries (populated by the Phase 4 progress sink wiring), surface them as a "nodes" array on the /api/operations response, sorted by node_name for stable rendering. Backward compatible: legacy clients ignore the field; ops without any node entries (single-node mode, model installs) omit the array entirely thanks to the empty-slice guard. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): per-node breakdown in OperationsBar When an install op fans out to more than one worker, the operations bar now shows a "N nodes" chevron that expands into a per-node list. Each row carries the node's status (color-coded pill), the current file being downloaded, byte counts, percentage, and a thin per-node progress bar. Yellow "Worker busy" pill marks running_on_worker status with a tooltip explaining the NATS round-trip timed out but the worker is still installing in the background. Backward compatible: ops without a nodes field (legacy or single-node mode) render as before. State for expand/collapse is local to the component, keyed by jobID/id - reload starts collapsed. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(distributed): document per-node breakdown in the operations bar Adds a short subsection covering the expandable "N nodes" chevron in the OperationsBar admin UI, the meaning of each status pill, and how it relates to the /api/operations nodes array. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(galleryop): UpdateStatus preserves Nodes when caller sends none Real-world bug surfaced by the Phase 4 multi-worker smoke test: the nodes[] array in /api/operations flickered between a single node at a time on a 2-worker install. Root cause: the Phase 2 progress bridge also calls the legacy progressCb -> UpdateStatus(&OpStatus{...}) on every tick. UpdateStatus then overwrote the entire status pointer, wiping the Nodes slice that UpdateNodeProgress had just merged in. Fix: in UpdateStatus, if the incoming op has an empty Nodes slice, carry forward the previous status's Nodes before storing. Callers that explicitly populate Nodes still win (their slice replaces the prior one, no merge across the two code paths). Two regression specs added pinning both directions of the contract. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(distributed): strip implementation details from user-facing docs Trim the new install/upgrade timeout rows and the install-progress sections to focus on what the operator sees and tunes. Drops: - the NATS subject names and pub/sub mechanics - "round-trip" / reconciler / backend.list jargon - /api/operations polling cadence - "pre-2026-05-22" version references Reframes the breakdown text around the admin UI (Operations Bar, chevron, status pills, "Worker busy" tooltip). Implementation context lives in the agent notes and code comments. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(config): move DistributedConfig.Validate flag names to constants The negative-duration check map was a wall of literal kebab-case strings that had to stay in sync with the kong-derived CLI flag names manually. Move them to a Flag* const block alongside the existing Default* block so a rename of either the Go field or the CLI naming convention forces a compile error rather than silent drift. Sole consumer today is Validate; the constants are exported so future operator-facing surfaces (e.g. error messages on other validation paths) can reference them by name instead of repeating the literals. Tests pin both the literal values (so a future "let's just rename this" doesn't accidentally regress the CLI flag) and the negative- duration error message for the new BackendInstall / BackendUpgrade fields. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(distributed): extract NodeStatus and Phase enums to constants Sweep for the same literal-string-as-identifier pattern called out on the Validate flag names: the per-node install status enum ("queued" \| "downloading" \| "running_on_worker" \| "success" \| "error") appeared as raw literals across managers_distributed.go (10+ sites, including 3 separate `n.Status == "running_on_worker"` checks), operation.go, and the test suite. Same shape for the Phase enum ("resolving" \| "downloading" \| "extracting" \| "starting") in the worker-side progress publisher. Promote both to exported const blocks: - galleryop.NodeStatus{Queued,Downloading,RunningOnWorker,Success,Error} shared between galleryop.NodeProgress.Status (the wire field) and nodes.NodeOpStatus.Status (the in-process per-node summary) - messaging.Phase{Resolving,Downloading,Extracting,Starting} shared between the worker publisher and any future consumer that needs to switch on phase Tests pin both the literal values (so a future "let's just rename" doesn't silently change the JSON wire) and use the constants in setup (so the producer side stays drift-protected). Wire-format assertions on the /api/operations JSON output keep their literals deliberately, so the constant value can never silently diverge from what the UI receives. Out of scope for this PR (separate cleanup): the finetune and quantization job-status enums have the same anti-pattern with 14+ literal sites each, but predate this PR's work. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-23 12:35:44 +02:00
LocalAI [bot]	61bf34ea2f	fix(traces): cap captured body size to keep admin Traces UI responsive (#9946 ) The trace middleware buffered the full request and response bodies for every JSON exchange. With a chatty agent-pool RAG workload, /embeddings responses (large vector arrays) accumulated to tens of MB in the in-memory buffer; the admin Traces page would then download and parse 40+ MB on every load and on every 5s auto-refresh, locking the UI in a loading state. Add LOCALAI_TRACING_MAX_BODY_BYTES (default 64 KiB) that caps each captured body. The full payload still flows through to the real client; only the trace copy is bounded. Exchanges record body_truncated and original body_bytes so the dashboard can show that truncation happened. The cap is configurable via env, CLI, and runtime_settings.json. Also unblock recovery: the Traces page now keeps the Clear button enabled while loading, since "buffer too large to render" is exactly when the user needs to clear it. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-22 15:29:24 +02:00
Richard Palethorpe	5d0b549049	feat(gallery): verify backend OCI images with keyless cosign (#9823 ) * feat(gallery): verify backend OCI images with keyless cosign Close a trust gap where a registry compromise or MITM could silently replace a backend image: the gallery YAML tells LocalAI which image to pull, but until now nothing verified the bytes came from our CI. Consumer (pkg/oci/cosignverify): - New package using sigstore-go to verify keyless-cosign signatures. - OCI 1.1 referrers API + new bundle format (no legacy :tag.sig). - Policy fields: Issuer / IssuerRegex / Identity / IdentityRegex / NotBefore. NotBefore is the revocation lever — keyless Fulcio certs are ephemeral so revocation is policy-side; advancing not_before in the gallery YAML invalidates every signature predating the cutoff. - TUF trusted root cached process-wide so N backends from one gallery do 1 fetch, not N. Plumbing: - pkg/downloader: ImageVerifier interface + WithImageVerifier option threaded through DownloadFileWithContext. Verification runs between oci.GetImage and oci.ExtractOCIImage, with digest pinning via pinnedImageRef to close the TOCTOU window. Skips the verifier's HEAD when the ref is already digest-pinned. - core/config: Gallery.Verification YAML block. - core/gallery: backendDownloadOptions builds the verifier from the policy; applied on initial URI, mirrors, and tag fallbacks. - core/gallery/upgrade: the upgrade path now routes through the same options builder. A regression Ginkgo spec pins this contract — without it, UpgradeBackend silently bypassed verification. - core/cli: --require-backend-integrity (LOCALAI_REQUIRE_BACKEND_INTEGRITY) escalates missing policy / empty SHA256 from warn to hard-fail. Producer (.github/workflows/backend_merge.yml): - id-token: write at job scope (PR-fork-safe via existing event gate). - sigstore/cosign-installer@v3 pinned to v2.4.1. - After each docker buildx imagetools create, resolve the manifest list digest and run cosign sign --recursive --new-bundle-format --registry-referrers-mode=oci-1-1 against repo@digest. --recursive signs the index and every per-arch entry, matching how the consumer resolves a tag to a platform-specific manifest before verifying. Rollout: backend/index.yaml has no `verification:` block yet, so this PR is backward-compatible — installs proceed with a warning until the gallery is populated. Strict mode is opt-in. Assisted-by: claude-code:claude-opus-4-7 [Bash] [Edit] [Read] [Write] [WebSearch] [WebFetch] Signed-off-by: Richard Palethorpe <io@richiejp.com> * refactor(gallery): plumb RequireBackendIntegrity through config instead of env The previous implementation re-exported the --require-backend-integrity CLI flag into LOCALAI_REQUIRE_BACKEND_INTEGRITY via os.Setenv, then re-read it in core/gallery via os.Getenv. This leaked process state into the gallery package and made the flag impossible to override per-call or test without touching the env. Add RequireBackendIntegrity to ApplicationConfig (with a matching WithRequireBackendIntegrity AppOption) and thread the bool through every install/upgrade path: InstallBackend, InstallBackendFromGallery, UpgradeBackend, InstallModelFromGallery, InstallExternalBackend, ApplyGalleryFromString/File, startup.InstallModels. Worker subcommands gain the same env-bound flag on WorkerFlags so distributed-worker installs honor it consistently with the worker daemon path. Add a forbidigo lint rule against os.Getenv / os.LookupEnv / os.Environ to keep the env-leak pattern from creeping back. Existing offenders (p2p, config loaders, etc.) are baseline-grandfathered by the existing new-from-merge-base: origin/master setting; targeted path exclusions cover the legitimate cases — kong CLI entry points, backend subprocesses, system capability probes, gRPC AUTH_TOKEN inheritance, test gating env vars. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-18 08:02:20 +02:00
Richard Palethorpe	969005b2a1	feat(gallery): Speed up load times and clean gallery entries (#9211 ) * feat: Rework VRAM estimation and use known_usecases in gallery Signed-off-by: Richard Palethorpe <io@richiejp.com> Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code] * chore(gallery): regenerate gallery index and add known_usecases to model entries Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-06 14:51:38 +02:00
Ettore Di Giacinto	bbcaebc1ef	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662 ) * feat(concurrency-groups): per-model exclusive groups for backend loading Adds `concurrency_groups: [...]` to model YAML configs. Two models that share a group cannot be loaded concurrently on the same node — loading one evicts the others, reusing the existing pinned/busy/retry policy from LRU eviction. Layered design: - Watchdog (pkg/model): per-node correctness floor — on every Load(), evict any loaded model that shares a group with the requested one. Pinned skips surface NeedMore so the loader retries (and ultimately logs a clear warning), instead of silently allowing the rule to be violated. - Distributed scheduler (core/services/nodes): soft anti-affinity hint — scheduleNewModel prefers nodes that don't already host a same-group model, falling back to eviction only if every candidate has a conflict. Composes with NodeSelector at the same point in the candidate pipeline. Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy models running on different nodes is fine. The ConfigLoader is wired into SmartRouter via a small ConcurrencyConflictResolver interface so the nodes package keeps a narrow surface on core/config. Refactors the inner LRU eviction body into a shared collectEvictionsLocked helper and the loader retry loop into retryEnforce(fn, maxRetries, interval), so both LRU and group enforcement share busy/pinned/retry semantics. Closes #9659. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(watchdog): sync pinned + concurrency_groups at startup The startup-time watchdog setup lives in initializeWatchdog (startup.go), not in startWatchdog (watchdog.go). The latter is only invoked from the runtime-settings RestartWatchdog path. As a result, neither SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot, so `pinned: true` and `concurrency_groups: [...]` only became effective after a settings-driven watchdog restart. Fix by adding both sync calls to initializeWatchdog. Confirmed end-to-end: loading model A in group "heavy", then C with no group (coexists), then B in group "heavy" now correctly evicts A and leaves [B, C]. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec CI lint runs new-from-merge-base, so the existing pre-existing `defer os.Remove(tmp.Name())` lines are baseline-grandfathered but the one introduced by the concurrency_groups YAML round-trip test is held to errcheck. Wrap the remove in a closure that discards the error. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-05 08:42:50 +02:00
Ettore Di Giacinto	b1a99436c7	feat(branding): admin-configurable instance name, tagline, and assets (#9635 ) Adds a whitelabeling feature so an operator can replace the LocalAI instance name, tagline, square logo, horizontal logo, and favicon from the admin Settings page. Defaults fall back to the bundled assets so existing installs are unaffected. The public GET /api/branding endpoint is reachable pre-auth so the login screen can render the configured branding before sign-in. Mutating routes (POST/DELETE /api/branding/asset/:kind) remain admin-only. Text fields (instance_name, instance_tagline) ride the existing /api/settings flow; binary assets get a dedicated multipart upload route that persists files under DynamicConfigsDir/branding/. To prevent the Settings page's stale local state from clobbering an upload on save, UpdateSettingsEndpoint preserves whatever the on-disk asset filename fields are regardless of the body — /api/branding/asset/* are the sole writers for those fields. The MCP catalog gains get_branding and set_branding tools (text fields only; file upload stays UI-only) plus a configure_branding skill prompt. While wiring this up, the same restart-loss class of bug surfaced for several existing fields whose RuntimeSettings entries were never read by the startup loader. Fix loadRuntimeSettingsFromFile() to load: - branding (instance_name, instance_tagline, *_file basenames) - auto_upgrade_backends, prefer_development_backends - localai_assistant_enabled - open_responses_store_ttl - the 7 existing AgentPool fields (enabled, default/embedding model, chunking sizes, enable_logs, collection_db_path) Also exposes 3 new AgentPool runtime settings (vector_engine, database_url, agent_hub_url) via /api/settings + the Settings UI, with the same load-on-startup wiring. The file watcher's manual-edit path is intentionally not changed — the in-process API endpoints already update appConfig directly, so the watcher is redundant for supported flows and a separate refactor for everything else. 15 TDD specs cover the loader behaviour (1 branding + 11 adjacent + 3 new agent-pool); 2 specs cover the persistence helpers and the clobber-prevention contract. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-02 15:51:36 +02:00
Ettore Di Giacinto	4906cbad04	feat: add biometrics UI (#9524 ) * feat(react-ui): add Face & Voice Recognition pages Expose the face and voice biometrics endpoints (/v1/face/, /v1/voice/) through the React UI. Each page has four tabs driving the six endpoints per modality: Analyze (demographics with bounding boxes / waveform segments), Compare (verify with a match gauge and live threshold slider), Enrollment (register / identify / forget with a top-K matches view), Embedding (raw vector inspector with sparkline + copy). MediaInput supports file upload plus live capture: webcam snap-to-canvas for face, MediaRecorder -> AudioContext -> 16-bit PCM mono WAV transcode for voice (libsndfile on the backend only handles WAV/FLAC/OGG natively). Sidebar gets a new Biometrics section feature-gated on face_recognition / voice_recognition; routes are wrapped in <RequireFeature>. No new dependencies -- Font Awesome icons picked from the Free set. Assisted-by: Claude:Opus 4.7 * fix(localai): accept data URI prefixes with codec/charset params Browser MediaRecorder produces data URIs like data:audio/webm;codecs=opus;base64,... so the pre-';base64,' section can carry multiple parameter segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go and core/http/endpoints/localai/audio.go only matched exactly one segment, so recordings straight from the React UI's live-capture tab failed the strip and then tripped the base64 decoder on the leading 'data:' literal, surfacing as "invalid audio base64: illegal base64 data at input byte 4" Widened both regexes to `^data:[^,]+?;base64,` so any number of ';param=value' segments between the mime type and ';base64,' are tolerated. Added a regression test covering the MediaRecorder shape. Assisted-by: Claude:Opus 4.7 * fix(insightface): scope pack ONNX loading to known manifests LocalAI's gallery extracts buffalo_* zips flat into the models directory, which inevitably mixes with ONNX files from other backends (opencv face engine, MiniFASNet antispoof, WeSpeaker voice embedding) and older buffalo pack installs. Feeding those foreign files into insightface's model_zoo.get_model() blows up inside the router -- it assumes a 4-D NCHW input and indexes `input_shape[2]` on tensors that aren't shaped like a face model, raising IndexError mid-load and leaving the backend unusable. The router's dispatch isn't amenable to per-file try/except alone (first-file-wins picks det_10g.onnx from buffalo_l even when the user asked for buffalo_sc -- alphabetical order happens to favour the wrong pack). Instead, ship an explicit manifest of the upstream v0.7 pack contents and scope the glob to that when the requested pack is known. The manifest is small and stable; future packs can be added alongside or fall through to the tolerance loop, which also swallows any remaining IndexError / ValueError from foreign files with a clear `[insightface] skipped` stderr line for diagnostics. Assisted-by: Claude:Opus 4.7 * fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders Pre-exported speaker-encoder ONNX graphs come in two shapes: rank-2 [batch, samples] -- some 3D-Speaker exports, take raw waveform directly. rank-3 [batch, frames, n_mels] -- WeSpeaker and most Kaldi- lineage encoders, expect pre-computed Kaldi FBank. OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` -- correct for rank-2, IndexError-on-input_shape[3] on rank-3, which surfaced to the UI as "Invalid rank for input: feats Got: 2 Expected: 3" Detect the input rank at session init and run Kaldi FBank (80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before the forward pass when rank>=3. All knobs are configurable via backend options for encoders that deviate from defaults. torchaudio.compliance.kaldi is already in the backend's requirements (SpeechBrain pulls torchaudio in), so no new dependency. Assisted-by: Claude:Opus 4.7 * fix(biometrics): isolate face and voice vector stores Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker 256-D) biometric embeddings were colliding inside a single in-memory local-store instance. Enrolling one after the other failed with "Try to add key with length N when existing length is M" because local-store correctly refuses to mix dimensions in one keyspace. The registries were constructed with `storeName=""`, which in StoreBackend() is just a WithModel() call. But ModelLoader's cache is keyed on `modelID`, not `model` -- so both registries collapsed to the same `modelID=""` slot and reused the same backend process despite looking isolated on paper. Three complementary fixes: 1. application.go -- give each registry a distinct default namespace ("localai-face-biometrics" / "localai-voice-biometrics"). The comment claimed isolation, now it's actually enforced. 2. stores.go -- pass the storeName as both WithModelID and WithModel so the ModelLoader cache key separates namespaces and the loader spawns distinct processes. 3. local-store/store.go -- drop the Load() `opts.Model != ""` guard. It was there to prevent generic model-loading loops from picking up local-store by accident, but that auto-load path is being retired; the guard now just blocks legitimate namespace isolation. opts.Model is treated as a tag; the per-tuple process isolation upstream handles discrimination. Assisted-by: Claude:Opus 4.7 * fix(gallery): stale-file cleanup and upgrade-tmp directory safety Two related robustness fixes for backend install/upgrade: pkg/downloader/uri.go OCI downloads passed through if filepath.Ext(filePath) != "" ... filePath = filepath.Dir(filePath) which was intended to redirect file-shaped download targets into their parent directory for OCI extraction. The heuristic misfires on directory-shaped paths with a dot-suffix -- gallery.UpgradeBackend uses tmpPath = "<backendsPath>/<name>.upgrade-tmp" and Go's filepath.Ext treats ".upgrade-tmp" as an extension. The rewrite landed the extraction at "<backendsPath>/", which then overwrote the real install (backends/<name>/) with a flat-layout file and left a stray run.sh at the top level. The tmp dir itself stayed empty, so the validation step that checked "<tmpPath>/run.sh" predictably failed with "upgrade validation failed: run.sh not found in new backend" Every manual upgrade silently corrupted the backends tree this way. Guard the rewrite behind "target isn't already an existing directory" -- InstallBackend / UpgradeBackend both pre-create the target as a directory, so they get the correct behaviour; existing file-path callers with a genuine dot-extension still get the parent redirect. core/gallery/backends.go InstallBackend's MkdirAll returned ENOTDIR when something at the target path was already a file (legacy dev builds dropped golang backend binaries directly at `<backendsPath>/<name>` instead of nesting them under their own subdir). That permanently blocked reinstall and upgrade for anyone carrying that state, since every retry hit the same error. Detect a pre-existing non-directory, warn, and remove it before the MkdirAll so the fresh install can write the correct nested layout with metadata.json + run.sh. Assisted-by: Claude:Opus 4.7 * fix(galleryop): refresh upgrade cache after backend ops UpgradeChecker caches the last upgrade-check result and only refreshes on the 6-hour tick or after an auto-upgrade cycle. Manual upgrades (POST /api/backends/upgrade/:name) go through the async galleryop worker, which completes the upgrade correctly but never tells UpgradeChecker to re-check -- so /api/backends/upgrades continued to list a just-upgraded backend as upgradeable, indistinguishable from a failed upgrade, for up to six hours. Add an optional `OnBackendOpCompleted func()` hook on GalleryService that fires after every successful install / upgrade / delete on the backend channel (async, so a slow callback doesn't stall the queue). startup.go wires it to UpgradeChecker.TriggerCheck after both services exist. Result: the upgrade banner clears within milliseconds of the worker finishing. Assisted-by: Claude:Opus 4.7 * build: prepend GOPATH/bin to PATH for protogen-go install-go-tools runs `go install` for protoc-gen-go and protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin. That directory isn't on every dev's PATH, and protoc resolves its code-gen plugins via PATH, so the immediately-following protoc invocation fails with "protoc-gen-go: program not found" which in turn blocks `make build` and any `make backends/%` target that depends on build. Prepend `go env GOPATH`/bin to PATH for the protoc invocation so the freshly-installed plugins are found without requiring a shell-profile change. Assisted-by: Claude:Opus 4.7 * refactor(ui-api): non-blocking backend upgrade handler with opcache POST /api/backends/upgrade/:name used to send the ManagementOp directly onto the unbuffered BackendGalleryChannel, which blocked the HTTP request whenever the galleryop worker was busy with a prior operation. The op also didn't show up in /api/operations, so the Backends UI couldn't reflect upgrade progress on the affected row. Register the op in opcache immediately, wrap it in a cancellable context, store the cancellation function on the GalleryService, and push onto the channel from a goroutine so the handler returns right away. Response gains a `jobID` field and a `message` string so clients have a consistent handle regardless of whether the op is queued or running. Pairs with the OnBackendOpCompleted hook added in the galleryop commit — together the UI sees the upgrade start, watches progress via /api/operations, and drops the "upgradeable" flag the moment the worker finishes. Assisted-by: Claude:Opus 4.7	2026-04-24 08:50:34 +02:00
Ettore Di Giacinto	75a63f87d8	feat(distributed): sync state with frontends, better backend management reporting (#9426 ) * fix(distributed): detect backend upgrades across worker nodes Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see why it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed. * feat(ui): surface backend upgrades in the System page The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and a "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. - Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles. * feat(ui): polish the Nodes page so it reads like a product The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker. * feat(distributed): durable backend fan-out + state reconciliation Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions. The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are loaded but hasn't seen touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate — the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter. * feat(ui): show cluster distribution of models in the System page When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle. * feat(ui): NodeDistributionChip — shared per-node attribution component Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. Delete code removes ~40 lines of ad-hoc logic. * feat(ui): shared FilterBar across the System page tabs The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.	2026-04-19 17:55:53 +02:00
Ettore Di Giacinto	8ab0744458	feat: backend versioning, upgrade detection and auto-upgrade (#9315 ) * feat: add backend versioning data model foundation Add Version, URI, and Digest fields to BackendMetadata for tracking installed backend versions and enabling upgrade detection. Add Version field to GalleryBackend. Add UpgradeAvailable/AvailableVersion fields to SystemBackend. Implement GetImageDigest() for lightweight OCI digest lookups via remote.Head. Record version, URI, and digest at install time in InstallBackend() and propagate version through meta backends. * feat: add backend upgrade detection and execution logic Add CheckBackendUpgrades() to compare installed backend versions/digests against gallery entries, and UpgradeBackend() to perform atomic upgrades with backup-based rollback on failure. Includes Agent A's data model changes (Version/URI/Digest fields, GetImageDigest). * feat: add AutoUpgradeBackends config and runtime settings Add configuration and runtime settings for backend auto-upgrade: - RuntimeSettings field for dynamic config via API/JSON - ApplicationConfig field, option func, and roundtrip conversion - CLI flag with LOCALAI_AUTO_UPGRADE_BACKENDS env var - Config file watcher support for runtime_settings.json - Tests for ToRuntimeSettings, ApplyRuntimeSettings, and roundtrip * feat(ui): add backend version display and upgrade support - Add upgrade check/trigger API endpoints to config and api module - Backends page: version badge, upgrade indicator, upgrade button - Manage page: version in metadata, context-aware upgrade/reinstall button - Settings page: auto-upgrade backends toggle * feat: add upgrade checker service, API endpoints, and CLI command - UpgradeChecker background service: checks every 6h, auto-upgrades when enabled - API endpoints: GET /backends/upgrades, POST /backends/upgrades/check, POST /backends/upgrade/:name - CLI: `localai backends upgrade` command, version display in `backends list` - BackendManager interface: add UpgradeBackend and CheckUpgrades methods - Wire upgrade op through GalleryService backend handler - Distributed mode: fan-out upgrade to worker nodes via NATS * fix: use advisory lock for upgrade checker in distributed mode In distributed mode with multiple frontend instances, use PostgreSQL advisory lock (KeyBackendUpgradeCheck) so only one instance runs periodic upgrade checks and auto-upgrades. Prevents duplicate upgrade operations across replicas. Standalone mode is unchanged (simple ticker loop). * test: add e2e tests for backend upgrade API - Test GET /api/backends/upgrades returns 200 (even with no upgrade checker) - Test POST /api/backends/upgrade/:name accepts request and returns job ID - Test full upgrade flow: trigger upgrade via API, wait for job completion, verify run.sh updated to v2 and metadata.json has version 2.0.0 - Test POST /api/backends/upgrades/check returns 200 - Fix nil check for applicationInstance in upgrade API routes	2026-04-11 22:31:15 +02:00
Ettore Di Giacinto	8862e3ce60	feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186 ) * always enable parallel requests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: move tests to ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(smart router): order by available vram Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 08:28:56 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
LocalAI [bot]	2b12875302	fix: Add tracing settings loading from runtime_settings.json (#9081 ) Tracing settings (EnableTracing and TracingMaxItems) were not being loaded from runtime_settings.json on startup, causing tracing settings configured via WebUI to be lost after service restart. This fix adds proper loading of tracing settings in loadRuntimeSettingsFromFile function in core/application/startup.go. Fixes #9072 Co-authored-by: localai-bot <localai-bot@localai.io>	2026-03-20 00:58:52 +01:00
Ettore Di Giacinto	aea21951a2	feat: add users and authentication support (#9061 ) * feat(ui): add users and authentication support Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: allow the admin user to impersonificate users Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: ui improvements, disable 'Users' button in navbar when no auth is configured Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: add OIDC support Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: gate models Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: cache requests to optimize speed Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * small UI enhancements Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ui): style improvements Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: cover other paths by auth Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: separate local auth, refactor Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * security hardening, approval mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: fix tests and expectations Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: update localagi/localrecall Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-19 21:40:51 +01:00
Richard Palethorpe	35d509d8e7	feat(ui): Per model backend logs and various fixes (#9028 ) * feat(gallery): Switch to expandable box instead of pop-over and display model files Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(ui, backends): Add individual backend logging Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(ui): Set the context settings from the model config Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-18 08:31:26 +01:00
LocalAI [bot]	d200401e86	feat: Add --data-path CLI flag for persistent data separation (#8888 ) feat: add --data-path CLI flag for persistent data separation - Add LOCALAI_DATA_PATH environment variable and --data-path CLI flag - Default data path: /data (separate from configuration directory) - Automatic migration on startup: moves agent_tasks.json, agent_jobs.json, collections/, and assets/ from old config dir to new data path - Backward compatible: preserves old behavior if LOCALAI_DATA_PATH is not set - Agent state and job directories now use DataPath with proper fallback chain - Update documentation with new flag and docker-compose example This separates mutable persistent data (collectiondb, agents, assets, skills) from configuration files, enabling better volume mounting and data persistence in containerized deployments. Signed-off-by: localai-bot <localai-bot@noreply.github.com> Co-authored-by: localai-bot <localai-bot@noreply.github.com>	2026-03-09 14:11:15 +01:00
Ettore Di Giacinto	09ddaf94b2	feat(ui): move to React for frontend (#8772 ) * feat(ui): move to React Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add import model Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * syntax highlight Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Minor fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-05 21:47:12 +01:00
Ettore Di Giacinto	ecba23d44e	fix: improve watchdown logics (#8591 ) * fix: ensure proper watchdog shutdown and state passing between restarts Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: add missing watchdog settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: untrack model if we shut it down successfully Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-17 18:49:22 +01:00
Ettore Di Giacinto	c844b7ac58	feat: disable force eviction (#7725 ) * feat: allow to set forcing backends eviction while requests are in flight Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: try to make the request sit and retry if eviction couldn't be done Otherwise calls that in order to pass would need to shutdown other backends would just fail. In this way instead we make the request sit and retry eviction until it succeeds. The thresholds can be configured by the user. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * expose settings to CLI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-25 14:26:18 +01:00
Ettore Di Giacinto	c37785b78c	chore(refactor): move logging to common package based on slog (#7668 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-21 19:33:13 +01:00
Ettore Di Giacinto	f3c70a96ba	chore(memory-reclaimer): use saner defaults Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 16:25:09 +01:00
Ettore Di Giacinto	878c9d46d5	fix: improve ram estimation (#7603 ) * fix: default to 10seconds of watchdog if runtime setting is malformed Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: use gosigar for RAM estimation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 10:18:36 +01:00
Ettore Di Giacinto	50f9c9a058	feat(watchdog): add Memory resource reclaimer (#7583 ) * feat(watchdog): add GPU reclaimer Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Handle vram calculation for unified memory devices Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Support RAM eviction, set watchdog interval from runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 09:15:18 +01:00
Ettore Di Giacinto	e1874cdb54	feat(ui): add mask to install custom backends (#7559 ) * feat: allow to install backends from URL in the WebUI and API Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * trace backends installations Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-13 19:11:32 +01:00
Ettore Di Giacinto	fc5b9ebfcc	feat(loader): enhance single active backend to support LRU eviction (#7535 ) * feat(loader): refactor single active backend support to LRU This changeset introduces LRU management of loaded backends. Users can set now a maximum number of models to be loaded concurrently, and, when setting LocalAI in single active backend mode we set LRU to 1 for backward compatibility. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-12 12:28:38 +01:00
Ettore Di Giacinto	53e5b2d6be	feat: agent jobs panel (#7390 ) * feat(agent): agent jobs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Multiple webhooks, simplify Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Do not use cron with seconds Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Create separate pages for details Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Detect if no models have MCP configuration, show wizard Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Make services test to run Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-28 23:05:39 +01:00
Ettore Di Giacinto	2dd42292dc	feat(ui): runtime settings (#7320 ) * feat(ui): add watchdog settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Do not re-read env Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Some refactor, move other settings to runtime (p2p) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add API Keys handling Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Allow to disable runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Documentation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * show MCP toggle in index Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop context default Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-20 22:37:20 +01:00
Ettore Di Giacinto	735ca757fa	feat(ui): allow to cancel ops (#7264 ) * feat(ui): allow to cancel ops Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Improve progress text Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Cancel queued ops, don't show up message cancellation always Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: fixup displaying of total progress over multiple files Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-13 18:41:47 +01:00
Ettore Di Giacinto	3728552e94	feat: import models via URI (#7245 ) * feat: initial hook to install elements directly Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP: ui changes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Move HF api client to pkg Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add simple importer for gguf files Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add opcache Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wire importers to CLI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add omitempty to config fields Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add MLX importer Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small refactors to star to use HF for discovery Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Common preferences Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add support to bare HF repos Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(importer/llama.cpp): add support for mmproj files Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add mmproj quants to common preferences Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix vlm usage in tokenizer mode with llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-12 20:48:56 +01:00
Ettore Di Giacinto	79a41a5e07	fix: register backends to model-loader during installation (#6159 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-28 19:11:02 +02:00
Ettore Di Giacinto	9c7f92c81f	feat(p2p): automatically sync installed models between instances (#6108 ) * feat(p2p): sync models between federated nodes This change makes sure that between federated nodes all the models are synced with each other. Note: this works exclusively with models belonging to a gallery. It does not sync files between the nodes, but rather it synces the node setup. E.g. All the nodes needs to have configured the same galleries and install models without any local editing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Make nodes stable Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups on syncing Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ui: improve p2p view Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-19 19:37:46 +02:00
Ettore Di Giacinto	089efe05fd	feat(backends): add system backend, refactor (#6059 ) - Add a system backend path - Refactor and consolidate system information in system state - Use system state in all the components to figure out the system paths to used whenever needed - Refactor BackendConfig -> ModelConfig. This was otherway misleading as now we do have a backend configuration which is not the model config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-14 19:38:26 +02:00
Ettore Di Giacinto	05757e2738	feat(backends install): allow to specify name and alias during manual installation (#5971 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-10 10:05:53 +02:00
Dave	b3c2a3c257	fix: untangle pkg and core (#5896 ) * migrate core/system to pkg/system - it has no dependencies FROM core, and IS USED in pkg Signed-off-by: Dave Lee <dave@gray101.com> * move pkg/templates up to core/templates -- nothing in pkg references it, but it does reference core. Signed-off-by: Dave Lee <dave@gray101.com> * remove extra check, len of nil is 0 Signed-off-by: Dave Lee <dave@gray101.com> * move pkg/startup to core/startup -- it does have important and unfixable dependencies on core Signed-off-by: Dave Lee <dave@gray101.com> --------- Signed-off-by: Dave Lee <dave@gray101.com>	2025-07-24 15:03:41 +02:00
Ettore Di Giacinto	98e5291afc	feat: refactor build process, drop embedded backends (#5875 ) * feat: split remaining backends and drop embedded backends - Drop silero-vad, huggingface, and stores backend from embedded binaries - Refactor Makefile and Dockerfile to avoid building grpc backends - Drop golang code that was used to embed backends - Simplify building by using goreleaser Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(gallery): be specific with llama-cpp backend templates Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(docs): update Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ci): minor fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: drop all ffmpeg references Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: run protogen-go Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Always enable p2p mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update gorelease file Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(stores): do not always load Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix linting issues Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Simplify Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Mac OS fixup Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-22 16:31:04 +02:00
Ettore Di Giacinto	bb54f2da2b	feat(gallery): automatically install missing backends along models (#5736 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-27 18:25:44 +02:00
Ettore Di Giacinto	2d64269763	feat: Add backend gallery (#5607 ) * feat: Add backend gallery This PR add support to manage backends as similar to models. There is now available a backend gallery which can be used to install and remove extra backends. The backend gallery can be configured similarly as a model gallery, and API calls allows to install and remove new backends in runtime, and as well during the startup phase of LocalAI. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add backends docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wip: Backend Dockerfile for python backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: drop extras images, build python backends separately Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixup on all backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Tweaks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop old backends leftovers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixup CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Move dockerfile upper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix proto Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Feature dropped for consistency - we prefer model galleries Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add missing packages in the build image Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * exllama is ponly available on cublas Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * pin torch on chatterbox Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups to index Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Debug CI * Install accellerators deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add target arch * Add cuda minor version Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use self-hosted runners Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: use quay for test images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups for vllm and chatterbox Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small fixups on CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chatterbox is only available for nvidia Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Simplify CI builds Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt test, use qwen3 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(model gallery): add jina-reranker-v1-tiny-en-gguf Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use reranker from llama.cpp in AIO images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Limit concurrent jobs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2025-06-15 14:56:52 +02:00
Ettore Di Giacinto	2c9279a542	feat(video-gen): add endpoint for video generation (#5247 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-26 18:05:01 +02:00
Ettore Di Giacinto	2c425e9c69	feat(loader): enhance single active backend by treating as singleton (#5107 ) feat(loader): enhance single active backend by treating at singleton Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-01 20:58:11 +02:00
Dave	3cddf24747	feat: Centralized Request Processing middleware (#3847 ) * squash past, centralize request middleware PR Signed-off-by: Dave Lee <dave@gray101.com> * migrate bruno request files to examples repo Signed-off-by: Dave Lee <dave@gray101.com> * fix Signed-off-by: Dave Lee <dave@gray101.com> * Update tests/e2e-aio/e2e_test.go Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Dave Lee <dave@gray101.com> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2025-02-10 12:06:16 +01:00
Ettore Di Giacinto	72e52c4f6a	chore: drop embedded models (#4715 ) Since the remote gallery was introduced this is now completely superseded by it. In order to keep the code clean and remove redudant parts let's simplify the usage. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-01-30 00:03:01 +01:00
Ettore Di Giacinto	cea5a0ea42	feat(template): read jinja templates from gguf files (#4332 ) * Read jinja templates as fallback Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Move templating out of model loader Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test TemplateMessages Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Set role and content from transformers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Tests: be more flexible Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * More jinja Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small refactoring and adaptations Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-12-08 13:50:33 +01:00

43 Commits