Editing a model's YAML and changing the `name:` field previously wrote
the new body to the original `<oldName>.yaml`. On reload the config
loader indexed that file under the new name while the old key
lingered in memory, producing two entries in the system UI that
shared a single underlying file — deleting either removed both.
Detect the rename in EditModelEndpoint and rename the on-disk
`<name>.yaml` and `._gallery_<name>.yaml` to match, drop the stale
in-memory key before the reload, and redirect the editor URL in the
React UI so it tracks the new name. Reject conflicts (409) and names
containing path separators (400).
Fixes#9294
Bumps LocalAGI to pick up the LocalRecall postgres backend fix that
resizes the pgvector column when the configured embedding model
returns vectors of a different dimensionality than the existing
collection. Switching the agent pool's embedding model now triggers
a transparent re-embed at startup instead of failing every subsequent
upload with 'expected N dimensions, not M' (SQLSTATE 22000).
Also surfaces a 409 with an actionable message in
UploadToCollectionEndpoint as a safety net for the rare cases the
upstream migration path doesn't cover (e.g. a model swapped at
runtime), instead of the previous opaque 500.
* feat: add backend versioning data model foundation
Add Version, URI, and Digest fields to BackendMetadata for tracking
installed backend versions and enabling upgrade detection. Add Version
field to GalleryBackend. Add UpgradeAvailable/AvailableVersion fields
to SystemBackend. Implement GetImageDigest() for lightweight OCI digest
lookups via remote.Head. Record version, URI, and digest at install time
in InstallBackend() and propagate version through meta backends.
* feat: add backend upgrade detection and execution logic
Add CheckBackendUpgrades() to compare installed backend versions/digests
against gallery entries, and UpgradeBackend() to perform atomic upgrades
with backup-based rollback on failure. Includes Agent A's data model
changes (Version/URI/Digest fields, GetImageDigest).
* feat: add AutoUpgradeBackends config and runtime settings
Add configuration and runtime settings for backend auto-upgrade:
- RuntimeSettings field for dynamic config via API/JSON
- ApplicationConfig field, option func, and roundtrip conversion
- CLI flag with LOCALAI_AUTO_UPGRADE_BACKENDS env var
- Config file watcher support for runtime_settings.json
- Tests for ToRuntimeSettings, ApplyRuntimeSettings, and roundtrip
* feat(ui): add backend version display and upgrade support
- Add upgrade check/trigger API endpoints to config and api module
- Backends page: version badge, upgrade indicator, upgrade button
- Manage page: version in metadata, context-aware upgrade/reinstall button
- Settings page: auto-upgrade backends toggle
* feat: add upgrade checker service, API endpoints, and CLI command
- UpgradeChecker background service: checks every 6h, auto-upgrades when enabled
- API endpoints: GET /backends/upgrades, POST /backends/upgrades/check, POST /backends/upgrade/:name
- CLI: `localai backends upgrade` command, version display in `backends list`
- BackendManager interface: add UpgradeBackend and CheckUpgrades methods
- Wire upgrade op through GalleryService backend handler
- Distributed mode: fan-out upgrade to worker nodes via NATS
* fix: use advisory lock for upgrade checker in distributed mode
In distributed mode with multiple frontend instances, use PostgreSQL
advisory lock (KeyBackendUpgradeCheck) so only one instance runs
periodic upgrade checks and auto-upgrades. Prevents duplicate
upgrade operations across replicas.
Standalone mode is unchanged (simple ticker loop).
* test: add e2e tests for backend upgrade API
- Test GET /api/backends/upgrades returns 200 (even with no upgrade checker)
- Test POST /api/backends/upgrade/:name accepts request and returns job ID
- Test full upgrade flow: trigger upgrade via API, wait for job completion,
verify run.sh updated to v2 and metadata.json has version 2.0.0
- Test POST /api/backends/upgrades/check returns 200
- Fix nil check for applicationInstance in upgrade API routes
* feat: add toggle mechanism to enable/disable models from loading on demand
Implements #9303 - Adds ability to disable models from being auto-loaded
while keeping them in the collection.
Backend changes:
- Add Disabled field to ModelConfig struct with IsDisabled() getter
- New ToggleModelEndpoint handler (PUT /models/toggle/:name/:action)
- Request middleware returns 403 when disabled model is requested
- Capabilities endpoint exposes disabled status
Frontend changes:
- Toggle switch in System > Models table Actions column
- Visual indicators: dimmed row, red Disabled badge, muted icons
- Tooltip describes toggle function on hover
- Loading state while API call is in progress
* fix: remove extra closing brace causing syntax error in request middleware
* refactor: reorder Actions column - Stop button before toggle switch
* refactor: migrate from toggle to toggle-state per PR review feedback
* feat(ui): Add dynamic model editor with autocomplete
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(docs): Add link to longformat installation video
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* always enable parallel requests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: move tests to ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(smart router): order by available vram
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add fine-tuning endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(experimental): add fine-tuning endpoint and TRL support
This changeset defines new GRPC signatues for Fine tuning backends, and
add TRL backend as initial fine-tuning engine. This implementation also
supports exporting to GGUF and automatically importing it to LocalAI
after fine-tuning.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* commit TRL backend, stop by killing process
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* move fine-tune to generic features
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add evals, reorder menu
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add users and authentication support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow the admin user to impersonificate users
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: ui improvements, disable 'Users' button in navbar when no auth is configured
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add OIDC support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: gate models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: cache requests to optimize speed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* small UI enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ui): style improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: cover other paths by auth
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: separate local auth, refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* security hardening, approval mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: fix tests and expectations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: update localagi/localrecall
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): Switch to expandable box instead of pop-over and display model files
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ui, backends): Add individual backend logging
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(ui): Set the context settings from the model config
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(mlx-distributed): add new MLX-distributed backend
Add new MLX distributed backend with support for both TCP and RDMA for
model sharding.
This implementation ties in the discovery implementation already in
place, and re-uses the same P2P mechanism for the TCP MLX-distributed
inferencing.
The Auto-parallel implementation is inspired by Exo's
ones (who have been added to acknowledgement for the great work!)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose a CLI to facilitate backend starting
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: make manual rank0 configurable via model configs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing features from mlx backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: add standalone and agentic functionalities
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose agents via responses api
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
fix: reload model configuration after editing (issue #8647)
- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* feat(tts): add support for streaming mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Send first audio, make sure it's 16
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openai video endpoint (is not complete)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add download button
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: resolve duplicate MCP route registration causing 50% failure rate
Fixes#7772
The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.
Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
* /v1/mcp/chat/completions
* /mcp/v1/chat/completions
* /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go
The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.
This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* Address feedback from review
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow to set forcing backends eviction while requests are in flight
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: try to make the request sit and retry if eviction couldn't be done
Otherwise calls that in order to pass would need to shutdown other
backends would just fail.
In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose settings to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(loader): refactor single active backend support to LRU
This changeset introduces LRU management of loaded backends. Users can
set now a maximum number of models to be loaded concurrently, and, when
setting LocalAI in single active backend mode we set LRU to 1 for
backward compatibility.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(agent): agent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Multiple webhooks, simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not use cron with seconds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Create separate pages for details
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect if no models have MCP configuration, show wizard
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make services test to run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(importer): support ollama and OCI, unify code
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support importing from local file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* support also yaml config files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle local files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract importing errors
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add importer tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add integration tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(UX): improve and specify supported URI formats
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fail if backend does not have a runfile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add cache for galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): remove handler duplicate
File input handlers are now handled by Alpine.js @change handlers in chat.html.
Removed duplicate listeners to prevent files from being processed twice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): be consistent in attachments in the chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fail if no importer matches
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: propagate ops correctly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move management to separate section
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make index to redirect to chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use logo in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* work out the wizard in the front-page
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(mcp): add LocalAI endpoint to stream live results of the agent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refactoring
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* MCP UX integration
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Enhance UX
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support also non-SSE
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: initial hook to install elements directly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP: ui changes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move HF api client to pkg
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add simple importer for gguf files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add opcache
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wire importers to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add omitempty to config fields
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add MLX importer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small refactors to star to use HF for discovery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Common preferences
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add support to bare HF repos
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(importer/llama.cpp): add support for mmproj files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add mmproj quants to common preferences
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix vlm usage in tokenizer mode with llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>