* Add support for multiple voice clones in Qwen TTS
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add voice prompt caching and generation logs to see generation time
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
fix: return full embedding dimensions instead of truncating trailing zeros
- Remove the logic that strips trailing zeros from embeddings
- Trailing zeros may be valid values in some embedding models
- This fixes the issue where embeddings like jina-v3 returned
only 1/4 of their native dimensions (256 instead of 1024)
- The truncation was causing vector database dimension mismatch errors
- Fixes issue #8721
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* fix: Add VRAM cleanup when stopping models
- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
to ensure VRAM is properly released when models are unloaded
Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.
* feat: Add Free RPC to backend.proto for VRAM cleanup\n\n- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto\n- This RPC is required to properly expose the Free() method\n through the gRPC interface for VRAM resource cleanup\n\nRefs: PR #8739
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
When a model is configured with 'known_usecases: [rerank]' in the YAML
config, the reranking endpoint was not being matched because:
1. The GuessUsecases function only checked for backend == 'rerankers'
2. The syncKnownUsecasesFromString() was not being called when loading
configs via yaml.Unmarshal in readModelConfigsFromFile
This fix:
1. Updates GuessUsecases to also check if Reranking is explicitly set to
true in the model config (in addition to checking backend type)
2. Adds syncKnownUsecasesFromString() calls after yaml.Unmarshal in
readModelConfigsFromFile to ensure known_usecases are properly parsed
Fixes#8658
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
- Add trailing newline to all requirements*.txt files in qwen-tts backend
- This ensures proper file formatting and prevents potential issues with
package installation tools that expect newline-terminated files
fix: Implement responsive line wrapping for model names on home page
- Changed model name display from truncate to break-words
- Increased max-width from 100px to 200px to allow more text
- This fixes issue #8209 for responsive text wrapping on smaller screens
Fixes: #8209
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Adding debug logging to help investigate the pocket-tts custom voice
finding issue (Issue #8244). This is a first step to understand how
voices are being loaded and where the failure occurs.
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* retry instead of re-computing a response
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Add model storage size display and RAM warning in Models tab
- Backend (ui_api.go):
- Added getDirectorySize() helper function to calculate total size of model files
- Added storageSize, ramTotal, ramUsed, ramUsagePercent to /api/models endpoint response
- Uses xsysinfo.GetSystemRAMInfo() for RAM information
- Frontend (models.html):
- Added storageSize, ramTotal, ramUsed, ramUsagePercent to Alpine.js data object
- Added formatBytes() helper for human-readable byte formatting
- Display storage size in hero header with blue indicator
- Show warning banner when storage exceeds RAM (model too large for system)
Addresses: https://github.com/mudler/LocalAI/issues/6251
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix(video): initialize model selection dropdown with current model value
The Alpine.js link variable was starting empty, causing the dropdown
selection to not reflect the currently selected model. This fix initializes
the link variable with the current model value from the template (e.g.,
video/{{.Model}}), following the same pattern used in image.html.
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
When a backend download fails (e.g., on Mac OS with port conflicts causing
connection issues), the backend directory is left with partial files.
This causes subsequent installation attempts to fail with 'run file not
found' because the sanity check runs on an empty/partial directory.
This fix cleans up the backend directory when the initial download fails
before attempting fallback URIs or mirrors. This ensures a clean state
for retry attempts.
Fixes: #8016
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix: use absolute path for CUDA directory detection
The capability detection was using a relative path 'usr/local/cuda-13'
which doesn't work when LocalAI is run from a different working directory.
This caused whisper (and other backends) to fail on CUDA-13 containers
because the system incorrectly detected 'nvidia' capability instead of
'nvidia-cuda-13', leading to wrong backend selection (cuda12-whisper
instead of cuda13-whisper).
Fixes: https://github.com/mudler/LocalAI/issues/8033
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
This addresses issue #8108 where the legacy nvidia driver configuration
causes container startup failures with newer NVIDIA Container Toolkit versions.
Changes:
- Update docker-compose example to show both CDI (recommended) and legacy
nvidia driver options
- Add troubleshooting section for 'Auto-detected mode as legacy' error
- Document the fix for nvidia-container-cli 'invalid expression' errors
The root cause is a Docker/NVIDIA Container Toolkit configuration issue,
not a LocalAI code bug. The error occurs during the container runtime's
prestart hook before LocalAI starts.
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* fix(gallery): add fallback URI resolution for backend installation
When a backend installation fails (e.g., due to missing 'latest-' tag),
try fallback URIs in order:
1. Replace 'latest-' with 'master-' in the URI
2. If that fails, append '-development' to the backend name
This fixes the issue where backend index entries don't match the
repository tags. For example, installing 'ace-step' tries to download
'latest-gpu-nvidia-cuda-13-ace-step' but only 'master-gpu-nvidia-cuda-13-ace-step'
exists in the quay.io registry.
Fixes: #8437
Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>
* chore(gallery): make fallback URI patterns configurable via env vars
---------
Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>
* feat(backends): add faster-qwen3-tts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: this backend is CUDA only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add requirements-install.txt with setuptools for build isolation
The faster-qwen3-tts backend requires setuptools to build packages
like sox that have setuptools as a build dependency. This ensures
the build completes successfully in CI.
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
- Added named volumes (models, images) to docker-compose.yaml
- Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml
- Changed bind mounts to named volumes for Windows compatibility
Fixes#8455
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>