* fix: Add timeout-based wait for model deletion completion
- Replace the simple polling loop with a context-based timeout (5 minutes)
- Use a select statement for cleaner timeout handling
- Add proper logging for the timeout case
- Addresses the code review comment asking for a context with timeout instead of the dangerous polling approach (see the sketch below)
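A minimal sketch of the resulting pattern, assuming a hypothetical modelStillExists check (the real deletion check lives in LocalAI's model loader):

    package main

    import (
        "context"
        "log"
        "time"
    )

    // waitForModelDeletion polls until the model is gone or the
    // 5-minute context deadline fires; modelStillExists is a
    // hypothetical stand-in for the real deletion check.
    func waitForModelDeletion(name string, modelStillExists func(string) bool) {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
        defer cancel()

        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()

        for {
            select {
            case <-ctx.Done():
                log.Printf("timed out waiting for model %q deletion", name)
                return
            case <-ticker.C:
                if !modelStillExists(name) {
                    return
                }
            }
        }
    }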
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix: replace goto statements with break in model deletion loop (fixes CI compilation error)
Signed-off-by: LocalAI [bot] <localai-bot@noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: LocalAI [bot] <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: LocalAI [bot] <localai-bot@noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat: Rename 'Whisper' model type to 'STT' in UI
- Updated models.html: Changed 'Whisper' filter button to 'STT'
- Updated talk.html: Changed 'Whisper Model' to 'STT Model'
- Updated backends.html: Changed 'Whisper' to 'STT'
- Updated talk.js: Renamed getWhisperModel() to getSTTModel(),
sendAudioToWhisper() to sendAudioToSTT(), and whisperModelSelect to sttModelSelect
This change makes the UI more consistent with the model category naming,
where all speech-to-text models (including Whisper, Parakeet, Moonshine,
WhisperX, etc.) are grouped under the 'STT' (Speech-to-Text) category.
Fixes #8776
Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
* Rename whisperModelSelect to sttModelSelect in talk.html
As requested by maintainer mudler in the PR review, replace all
whisperModelSelect occurrences with sttModelSelect, since the
model type was renamed from Whisper to STT.
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
---------
Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
fix: Add vllm-omni backend to video generation model detection
- Include vllm-omni in the list of backends that support FLAG_VIDEO
- This allows models like vllm-omni-wan2.2-t2v to appear in the video model selector UI
- Fixes issue #8659 where video generation models using vllm-omni backend were not showing in the dropdown
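The change boils down to extending a backend allow-list; a simplified sketch (identifiers are illustrative, not the actual LocalAI names):

    package main

    // videoBackends lists backends assumed to support FLAG_VIDEO;
    // the fix adds "vllm-omni" to the equivalent list in LocalAI.
    var videoBackends = []string{"diffusers", "vllm-omni"}

    func supportsVideo(backend string) bool {
        for _, b := range videoBackends {
            if backend == b {
                return true
            }
        }
        return false
    }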
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
fix: return full embedding dimensions instead of truncating trailing zeros
- Remove the logic that strips trailing zeros from embeddings
- Trailing zeros may be valid values in some embedding models
- This fixes the issue where embeddings like jina-v3 returned
only 1/4 of their native dimensions (256 instead of 1024)
- The truncation was causing vector database dimension mismatch errors
- Fixes issue #8721
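For context, the removed trimming behaved roughly like this sketch; on a 1024-dimensional embedding whose tail happens to be zero it silently returns a shorter vector:

    package main

    import "fmt"

    // trimTrailingZeros reproduces the kind of logic that was removed:
    // cutting trailing zeros silently shrinks the embedding dimension.
    func trimTrailingZeros(emb []float32) []float32 {
        i := len(emb)
        for i > 0 && emb[i-1] == 0 {
            i--
        }
        return emb[:i]
    }

    func main() {
        emb := make([]float32, 1024) // e.g. jina-v3 native dimension
        copy(emb, []float32{0.1, 0.2, 0.3})
        fmt.Println(len(emb), len(trimTrailingZeros(emb))) // 1024 vs 3
    }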
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
When a model is configured with 'known_usecases: [rerank]' in the YAML
config, the reranking endpoint was not being matched because:
1. The GuessUsecases function only checked for backend == 'rerankers'
2. The syncKnownUsecasesFromString() was not being called when loading
configs via yaml.Unmarshal in readModelConfigsFromFile
This fix:
1. Updates GuessUsecases to also check if Reranking is explicitly set to
true in the model config (in addition to checking backend type)
2. Adds syncKnownUsecasesFromString() calls after yaml.Unmarshal in
readModelConfigsFromFile to ensure known_usecases are properly parsed
Fixes #8658
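A simplified sketch of the updated check (struct and field names are illustrative, not LocalAI's actual config types):

    package main

    type ModelConfig struct {
        Backend   string
        Reranking bool
    }

    // supportsRerank now honours an explicit reranking flag in the
    // model config in addition to the backend name.
    func supportsRerank(c ModelConfig) bool {
        return c.Backend == "rerankers" || c.Reranking
    }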
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix: Implement responsive line wrapping for model names on home page
- Changed model name display from truncate to break-words
- Increased max-width from 100px to 200px to allow more text
- This fixes issue #8209 for responsive text wrapping on smaller screens
Fixes: #8209
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* retry instead of re-computing a response
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Add model storage size display and RAM warning in Models tab
- Backend (ui_api.go):
- Added getDirectorySize() helper function to calculate total size of model files
- Added storageSize, ramTotal, ramUsed, ramUsagePercent to /api/models endpoint response
- Uses xsysinfo.GetSystemRAMInfo() for RAM information
- Frontend (models.html):
- Added storageSize, ramTotal, ramUsed, ramUsagePercent to Alpine.js data object
- Added formatBytes() helper for human-readable byte formatting
- Display storage size in hero header with blue indicator
- Show warning banner when storage exceeds RAM (model too large for system)
Addresses: https://github.com/mudler/LocalAI/issues/6251
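A plausible shape for the new helper (sketch only; the real implementation is in ui_api.go):

    package main

    import (
        "io/fs"
        "path/filepath"
    )

    // getDirectorySize sums the sizes of all regular files under root;
    // error handling is simplified for this sketch.
    func getDirectorySize(root string) (int64, error) {
        var total int64
        err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
            if walkErr != nil {
                return walkErr
            }
            if d.Type().IsRegular() {
                info, err := d.Info()
                if err != nil {
                    return err
                }
                total += info.Size()
            }
            return nil
        })
        return total, err
    }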
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix(video): initialize model selection dropdown with current model value
The Alpine.js link variable was starting empty, causing the dropdown
selection to not reflect the currently selected model. This fix initializes
the link variable with the current model value from the template (e.g.,
video/{{.Model}}), following the same pattern used in image.html.
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
When a backend download fails (e.g., on macOS with port conflicts causing
connection issues), the backend directory is left with partial files.
This causes subsequent installation attempts to fail with 'run file not
found' because the sanity check runs on an empty/partial directory.
This fix cleans up the backend directory when the initial download fails
before attempting fallback URIs or mirrors. This ensures a clean state
for retry attempts.
Fixes: #8016
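The flow, roughly (download and tryFallbacks are hypothetical stand-ins for the actual gallery functions):

    package main

    import (
        "fmt"
        "os"
    )

    func installBackend(dir string, download, tryFallbacks func(string) error) error {
        if err := download(dir); err != nil {
            // Remove partial files so the sanity check on a retry does
            // not run against a half-written directory.
            if rmErr := os.RemoveAll(dir); rmErr != nil {
                return fmt.Errorf("download failed (%v), cleanup failed: %w", err, rmErr)
            }
            return tryFallbacks(dir)
        }
        return nil
    }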
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* fix(gallery): add fallback URI resolution for backend installation
When a backend installation fails (e.g., due to missing 'latest-' tag),
try fallback URIs in order:
1. Replace 'latest-' with 'master-' in the URI
2. If that fails, append '-development' to the backend name
This fixes the issue where backend index entries don't match the
repository tags. For example, installing 'ace-step' tries to download
'latest-gpu-nvidia-cuda-13-ace-step' but only 'master-gpu-nvidia-cuda-13-ace-step'
exists in the quay.io registry.
Fixes: #8437
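A sketch of the fallback derivation (the helper name is hypothetical; per the follow-up commit below, the actual patterns are configurable via env vars):

    package main

    import "strings"

    // fallbackURIs derives alternative image URIs to try when the
    // primary tag is missing from the registry.
    func fallbackURIs(uri, backendName string) []string {
        var out []string
        if strings.Contains(uri, "latest-") {
            out = append(out, strings.Replace(uri, "latest-", "master-", 1))
        }
        out = append(out, strings.Replace(uri, backendName, backendName+"-development", 1))
        return out
    }

For the ace-step example above, this yields the master- variant first, then the -development variant.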
Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>
* chore(gallery): make fallback URI patterns configurable via env vars
---------
Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>
Closes #8119
When installing models from the gallery, files are created with 0600
permissions (owner read/write only), making them unreadable by the
LocalAI server when running as a different user.
This fix changes the permissions to 0644 (owner read/write, group/others
read), allowing the server to read model files regardless of the user
it runs as.
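In Go terms the change is a one-constant fix when creating the downloaded files (sketch):

    package main

    import "os"

    // createModelFile opens gallery downloads with 0644 instead of
    // 0600 so other users (e.g. the server process) can read them.
    func createModelFile(path string) (*os.File, error) {
        return os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
    }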
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix: reload model configuration after editing (issue #8647)
- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint
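Sketched flow of the endpoint after the change (the interface is a stand-in for *model.ModelLoader; ShutdownModel is the method named above):

    package main

    // ModelLoader is a stand-in for LocalAI's *model.ModelLoader.
    type ModelLoader interface {
        ShutdownModel(name string) error
    }

    func editModel(ml ModelLoader, name string, saveConfig func() error) error {
        if err := saveConfig(); err != nil {
            return err
        }
        // Unload the running instance so the next inference request
        // reloads it with the new settings (e.g. context_size).
        return ml.ShutdownModel(name)
    }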
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* fix(realtime): Wrap functions in OpenAI chat completions format
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Set max tokens from session object
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Find thinking start tag for thinking extraction
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Don't send buffer cleared message when we automatically drop it
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix: ensure proper watchdog shutdown and state passing between restarts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add missing watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: untrack model if we shut it down successfully
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
feat(realtime): Allow sending text and image conversation items
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix(realtime): Use locked websocket for concurrent access
Signed-off-by: Richard Palethorpe <io@richiejp.com>
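Gorilla-style websocket connections tolerate only one concurrent writer, so the usual fix is a mutex-guarded wrapper; a sketch under that assumption (Conn is a stand-in interface, not the actual type used here):

    package main

    import "sync"

    type Conn interface {
        WriteJSON(v any) error
    }

    // lockedConn serializes writes so multiple goroutines can share
    // one websocket connection safely.
    type lockedConn struct {
        mu sync.Mutex
        c  Conn
    }

    func (l *lockedConn) WriteJSON(v any) error {
        l.mu.Lock()
        defer l.mu.Unlock()
        return l.c.WriteJSON(v)
    }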
* fix(realtime): Use sample rate set in session
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(config): Allow pipelines to have no model parameters
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Use the voice provided by the user or none at all
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.
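A condensed sketch of such a check (hypothetical helper; the link-local test is what covers metadata endpoints like 169.254.169.254):

    package main

    import (
        "fmt"
        "net"
        "net/url"
    )

    // validateURL resolves the host and rejects private, loopback and
    // link-local destinations before any fetch is attempted.
    func validateURL(raw string) error {
        u, err := url.Parse(raw)
        if err != nil {
            return err
        }
        ips, err := net.LookupIP(u.Hostname())
        if err != nil {
            return err
        }
        for _, ip := range ips {
            if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
                return fmt.Errorf("blocked address %s for %s", ip, raw)
            }
        }
        return nil
    }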
Co-authored-by: kolega.dev <faizan@kolega.ai>