* feat(backends): add faster-qwen3-tts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: this backend is CUDA only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add requirements-install.txt with setuptools for build isolation
The faster-qwen3-tts backend requires setuptools to build packages
like sox that have setuptools as a build dependency. This ensures
the build completes successfully in CI.
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
- Added named volumes (models, images) to docker-compose.yaml
- Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml
- Changed bind mounts to named volumes for Windows compatibility
Fixes#8455
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Closes#8119
When installing models from the gallery, files are created with 0600
permissions (owner read/write only), making them unreadable by the
LocalAI server when running as a different user.
This fix changes the permissions to 0644 (owner read/write, group/others
read), allowing the server to read model files regardless of the user
it runs as.
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
fix: reload model configuration after editing (issue #8647)
- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* docs: add Podman installation documentation
- Add new podman.md with comprehensive installation and usage guide
- Cover installation on multiple platforms (Ubuntu, Fedora, Arch, macOS, Windows)
- Document GPU support (NVIDIA CUDA, AMD ROCm, Intel, Vulkan)
- Include rootless container configuration
- Document Docker Compose with podman-compose
- Add troubleshooting section for common issues
- Link to Podman documentation in installation index
- Update image references to use Docker Hub and link to docker docs
- Change YAML heredoc to EOF in compose.yaml example
- Add curly brackets to notice shortcode and fix link
Closes#8645
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
* docs: merge Docker and Podman docs into unified Containers guide
Following the review comment, we have merged the Docker and Podman
documentation into a single 'Containers' page that covers both container
engines. The Docker and Podman pages now redirect to this unified guide.
Changes:
- Added new docs/content/installation/containers.md with combined Docker/Podman guide
- Updated docs/content/installation/docker.md to redirect to containers
- Updated docs/content/installation/podman.md to redirect to containers
- Updated docs/content/installation/_index.en.md to link to containers
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
* docs: remove podman.md as docs are merged into containers.md
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
---------
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration
* chore: revert backend/python/diffusers/README.md to original content
---------
Co-authored-by: Your Name <you@example.com>
* fix(realtime): Wrap functions in OpenAI chat completions format
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Set max tokens from session object
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Find thinking start tag for thinking extraction
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Don't send buffer cleared message when we automatically drop it
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix: ensure proper watchdog shutdown and state passing between restarts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add missing watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: untrack model if we shut it down successfully
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Closes#8527.
This PR fixes the excessive logging issue in capability detection by applying the existing capabilityLogged guard to the forced capability run file case.
## Changes
- Apply capabilityLogged flag to forced capability detection logging
- Prevents repeated log messages during backend discovery and gallery operations
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
feat(realtime): Allow sending text and image conversation items
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix(realtime): Use locked websocket for concurrent access
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Use sample rate set in session
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(config): Allow pipelines to have no model parameters
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* add experimental support for sd_embed-style prompt embedding
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
* add doc equivalent to compel
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
* need to use flux1 embedding function for flux model
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
---------
Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
* fix(realtime): Use the voice provided by the user or none at all
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.
Co-authored-by: kolega.dev <faizan@kolega.ai>
Fixes#8212 - Updated the note about reporting broken models to
reference the main LocalAI repository instead of the outdated
separate gallery repository reference.
* feat(musicgen): add ace-step and UI interface
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle model dir
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop auto-download
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to models, fixup UIs icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* l4t13 is incompatbile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* avoid pinning version for cuda12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop l4t12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Filter GGUF and GGML files from model list
Skip .gguf/.ggml loose files when listing models and add a test
for .gguf exclusion.
Closes#1077
Signed-off-by: Yaroslav98214 <diakovichyaroslav30@gmail.com>
Even if suboptimal as we should poll to wait for the service to be available, this should at least alleviate tests for now
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(metal): try to extend support to remaining backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* neutts doesn't work
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* split outetts out of transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove torch pin to whisperx
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The realtime endpoint was not passing the noAction "answer" function to the
model in the prompt template, causing the model to always call user-provided
tools even when a direct response was appropriate.
Root cause:
- User tools were added to the funcs list
- TemplateMessages() was called to generate the prompt
- noAction function was only added AFTER templating
- This meant the prompt didn't include the "answer" function, even though
the grammar did
Fix:
- Move noAction function creation before TemplateMessages() call so it's
included in both the prompt and grammar
- Add proper tool_choice parameter handling to support "auto", "required",
"none", and specific function selection
- Match behavior of the standard chat endpoint
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(proto): add speaker field to TranscriptSegment for diarization
Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx backend for transcription with diarization
Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): register whisperx backend in Makefile
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx meta and image entries to index.yaml
Signed-off-by: eureka928 <meobius123@gmail.com>
* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): unpin torch versions and use CPU index for cpu requirements
Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch ROCm variant to fix CI build failure
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch CPU variant to fix uv resolution failure
Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure
uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure
PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution
uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.
Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
* revert: scope hipblas suffix fix to whisperx only
Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
---------
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Some datacenter setups might be stuck with the 5.x kernel which doesn't
play well with CUDA >=12.9. To incrase compatibility with the CUDA 12.x
branch, downgrade to 12.8. For newer systems, it is still suggested to
use CUDA 13.x wherever compatible.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(tts): add support for streaming mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Send first audio, make sure it's 16
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): Add audio conversations
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(realtime): Vendor the updated API and modify for server side
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Update to the GA realtime API
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Filter reasoning from spoken output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Send delta and done events for tool calls and audio transcripts
Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.
💘 Generated with Crush
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Improve tool call handling and error reporting
- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Qwen3 4b was using the wrong function format (i.e. using "function"
instead of "name") within the realtime API.
If we specify the function calling format explicitly then it stops it.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(vibevoice): add ASR support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): download voice files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to run on bigger runner
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI can't hold vibevoice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(vllm-omni: add new backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* default to py3.12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
exllama2 development has stalled and only old architectures are
supported. exllamav3 is still in development, meanwhile cleaning up
exllama2 from the gallery.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openai video endpoint (is not complete)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add download button
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(openresponses): support reasoning blocks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* allow to disable reasoning, refactor common logic
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add option to only strip reasoning
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add configurations for custom reasoning tokens
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: extract reasoning to its own package
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* make sure we detect thinking tokens from template
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to override via config, add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Builds exhausts CI currently, and there are better backends at this
point in time. We will probably deprecate it in the future.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
* feat(diffusers): add support to LTX-2
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to the gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: install only torch/torchvision from jetson index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: use pip for l4t-12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "fix: install only torch/torchvision from jetson index"
This reverts commit 2d2b020078
* chatterbox needs wheel
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This PR adds support to support the 'reasoning' API field of the OpenAI
spec.
LocalAI now will extract automatically thinking tags in both SSE and
non-SSE mode. The changes are adapted as well to the Chat UI now that
will use the reasoning field to extract the thinking process and display
it in the chat.
This fixes https://github.com/mudler/LocalAI/issues/7944
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
* Add tool/function calling schema support to Anthropic Messages API
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E tests for Anthropic tool calling
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make tool calling tests require model to use tools
- First test now expects hasToolUse to be true with clear error message
- Third test now expects toolUseID to be non-empty (removed conditional)
- Both tests will now fail if model doesn't call the expected tools
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E test for tool calling with streaming responses
- Tests that streaming events are properly emitted (content_block_start/delta/stop)
- Verifies tool_use blocks are accumulated correctly in streaming mode
- Ensures model calls tools and stop_reason is set to tool_use
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
This image is for HW prior Jetpack 7. Jetpack 7 broke compatibility with
older devices (which are still in use) such as AGX Orin or Jetsons.
While we do have l4t-cuda-13 images with sbsa support for new Nvidia
devices (Thor, DGX, etc). For older HW we are forced to keep old images
around as 24.04 does not seem to be supported.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backends): add moonshine backend for faster transcription
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backend to CI, update AGENTS.md from this exercise
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(dockerfile): drop driver-requirements section
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): drop other builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(function): Add XML Tool Call Parsing Support
Extend the function parsing system in LocalAI to support XML-style tool calls, similar to how JSON tool calls are currently parsed. This will allow models that return XML format (like <tool_call><function=name><parameter=key>value</parameter></function></tool_call>) to be properly parsed alongside text content.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* thinking before tool calls, more strict support for corner cases with no tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support streaming tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative JSON
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative parsing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Consume JSON marker
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix pending TODOs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Don't run other parsing with ParseRegex
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: resolve duplicate MCP route registration causing 50% failure rate
Fixes#7772
The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.
Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
* /v1/mcp/chat/completions
* /mcp/v1/chat/completions
* /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go
The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.
This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* Address feedback from review
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop mode from image generation(unused)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): improve image generation front-end
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): only ref images. files is to be deprecated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not override default steps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: add retry logic and fallback for checksums.txt download
- Add HTTP client with 30s timeout to ReleaseManager
- Implement downloadFileWithRetry with 3 attempts and exponential backoff
- Allow manual checksum placement at ~/.localai/checksums/checksums-<version>.txt
- Continue installation with warning if checksum download/verification fails
- Add test for HTTPClient initialization
- Fix linter error in systray_manager.go
Fixes#7385
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: add retry logic and improve checksums.txt download handling
This commit addresses issue #7385 by implementing:
- Retry logic (3 attempts) for checksum file downloads
- Fallback to manually placed checksum files
- Option to proceed with installation if checksums unavailable (with warnings)
- Fixed resource leaks in download retry loop
- Added configurable HTTP client with 30s timeout
The installation will now be more resilient to network issues while
maintaining security through checksum verification when available.
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: check for existing checksum file before downloading
This commit addresses the review feedback from mudler on PR #7788.
The code now checks if there's already a checksum file (either manually
placed or previously downloaded) and honors that, skipping download
entirely in such case.
Changes:
- Check for existing checksum file at ~/.localai/checksums/checksums-<version>.txt first
- Check for existing downloaded checksum file at binary path
- Only attempt to download if no existing checksum file is found
- This prevents unnecessary network requests and honors user-placed checksums
Signed-off-by: majiayu000 <1835304752@qq.com>
🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
fix: validate MCP configuration in model config
Fixes#7334
The Validate() function was not checking if MCP configuration
(mcp.stdio and mcp.remote) contains valid JSON. This caused
malformed JSON with missing commas to be silently accepted.
Changes:
- Add MCP configuration validation to ModelConfig.Validate()
- Properly report validation errors instead of discarding them
- Add test cases for valid and invalid MCP configurations
The fix ensures that malformed JSON in MCP config sections
will now be caught and reported during validation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: Add usage fields to image generation response for OpenAI API compatibility
Fixes#7354
Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.
Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: Correct usage field types for image generation API compatibility
Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.
Fixes#7354🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#7420
Added nil checks before calling mergo.Merge in InstallModelFromGallery and InstallModel
functions to prevent panic when req.Overrides or configOverrides are nil. The panic was
occurring at models.go:248 during Qwen-Image-Edit gallery model download.
Changes:
- Added nil check for req.Overrides before merging in InstallModelFromGallery (line 126)
- Added nil check for configOverrides before merging in InstallModel (line 248)
- Added test case to verify nil configOverrides are handled without panic
Signed-off-by: majiayu000 <1835304752@qq.com>
An example output of `rocm-smi --showproductname --showmeminfo vram --showuniqueid --csv`:
```
device,Unique ID,VRAM Total Memory (B),VRAM Total Used Memory (B),Card Series,Card Model,Card Vendor,Card SKU,Subsystem ID,Device Rev,Node ID,GUID,GFX Version
card0,0x9246____________,17163091968,692142080,Navi 21 [Radeon RX 6800/6800 XT / 6900 XT],0x73bf,Advanced Micro Devices Inc. [AMD/ATI],001,0x2406,0xc1,1,45534,gfx1030
card1,N/A,67108864,26079232,Raphael,0x164e,Advanced Micro Devices Inc. [AMD/ATI],RAPHAEL,0x364e,0xc6,2,52156,gfx1036
```
Total memory is actually showed before the total used memory as can be seen in https://github.com/LostRuins/koboldcpp/issues/1104#issuecomment-2321143507.
This PR fixes https://github.com/mudler/LocalAI/issues/7724
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow to set forcing backends eviction while requests are in flight
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: try to make the request sit and retry if eviction couldn't be done
Otherwise calls that in order to pass would need to shutdown other
backends would just fail.
In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose settings to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
stale-issue-message:'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
stale-pr-message:'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required it's usually best not to try building or testing the project unless the user requests it. If you must build the project then inspect the Makefile in the project root and the Makefiles of any backends that are effected by changes you are making. In addition the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker, if the user has not previously specified a preference then ask which they would like to use.
## Building a specified backend
Let's say the user wants to build a particular backend for a given platform. For example let's say they want to build coqui for ROCM/hipblas
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum we need to set the BUILD_TYPE, BASE_IMAGE build-args
- Use .github/workflows/backend.yml as a reference it lists the needed args in the `include` job strategy matrix
- l4t and cublas also requires the CUDA major and minor version
- You can pretty print a command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-coqui`
- Unless the user specifies that they want you to run the command, then just print it because not all agent frontends handle long running jobs well and the output may overflow your context
- The user may say they want to build AMD or ROCM instead of hipblas, or Intel instead of SYCL or NVIDIA insted of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
## Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
### 1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
### 2. Add Build Configurations to `.github/workflows/backend.yml`
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., `chatterbox`, `faster-whisper`) for reference.
**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:6.4.4"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
### 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
Add a YAML anchor definition in the `## metas` section (around line 2-300). Look for similar backends to use as a template such as `diffusers` or `chatterbox`
**Step 3b: Add Image Entries**
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
### 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
**Step 4a: Add to `.NOTPARALLEL`**
Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
```makefile
.NOTPARALLEL: ... backends/<backend-name>
```
**Step 4b: Add to `prepare-test-extra`**
Add the backend to the `prepare-test-extra` target (around line 312) to prepare it for testing:
```makefile
prepare-test-extra:protogen-python
...
$(MAKE) -C backend/python/<backend-name>
```
**Step 4c: Add to `test-extra`**
Add the backend to the `test-extra` target (around line 319) to run its tests:
```makefile
test-extra:prepare-test-extra
...
$(MAKE) -C backend/python/<backend-name> test
```
**Step 4d: Add Backend Definition**
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
**For Python backends with root context** (like `faster-whisper`, `coqui`):
-`llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
-`llama.cpp/common/chat.h` - Format enums and parameter structures
-`llama.cpp/tools/server/server-context.cpp` - Server configuration options
# Documentation
The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.
@@ -78,6 +78,20 @@ LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio make run-e2e-aio
We are welcome the contribution of the documents, please open new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
### Gallery YAML Schema
LocalAI provides a JSON Schema for gallery model YAML files at:
`core/schema/gallery-model.schema.json`
This schema mirrors the internal gallery model configuration and can be used by editors (such as VS Code) to enable autocomplete, validation, and inline documentation when creating or modifying gallery files.
To use it with the YAML language server, add the following comment at the top of a gallery YAML file:
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
## 📚🆕 Local Stack Family
## Local Stack Family
🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tools, you might also like:
- **[LocalAGI](https://github.com/mudler/LocalAGI)** - AI agent orchestration platform with OpenAI Responses API compatibility and advanced agentic capabilities
- **[LocalRecall](https://github.com/mudler/LocalRecall)** - MCP/REST API knowledge base system providing persistent memory and storage for AI agents
- 🆕 **[Cogito](https://github.com/mudler/cogito)** - Go library for building intelligent, co-operative agentic software and LLM-powered workflows, focusing on improving results for small, open source language models that scales to any LLM. Powers LocalAGI and LocalAI MCP/Agentic capabilities
- 🆕 **[Wiz](https://github.com/mudler/wiz)** - Terminal-based AI agent accessible via Ctrl+Space keybinding. Portable, local-LLM friendly shell assistant with TUI/CLI modes, tool execution with approval, MCP protocol support, and multi-shell compatibility (zsh, bash, fish)
- 🆕 **[SkillServer](https://github.com/mudler/skillserver)** - Simple, centralized skills database for AI agents via MCP. Manages skills as Markdown files with MCP server integration, web UI for editing, Git synchronization, and full-text search capabilities
<p>A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.</p>
<p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
</td>
</tr>
</table>
## Screenshots / Video
@@ -111,14 +93,7 @@
## 💻 Quickstart
Run the installer script:
```bash
# Basic installation
curl https://localai.io/install.sh | sh
```
For more installation options, see [Installer Options](https://localai.io/installation/).
### macOS Download:
@@ -128,7 +103,7 @@ For more installation options, see [Installer Options](https://localai.io/instal
> Note: the DMGs are not signed by Apple as quarantined. See https://github.com/mudler/LocalAI/issues/6268 for a workaround, fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
Or run with docker:
### Containers (Docker, podman, ...)
> **💡 Docker Run vs Docker Start**
>
@@ -137,55 +112,59 @@ Or run with docker:
>
> If you've already run LocalAI before and want to start it again, use: `docker start -i local-ai`
### CPU only image:
#### CPU only image:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
```
### NVIDIA GPU Images:
#### NVIDIA GPU Images:
```bash
# CUDA 13.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# CUDA 12.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 11.7
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# NVIDIA Jetson (L4T) ARM64
# CUDA 12 (for Nvidia AGX Orin and similar platforms)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
# CUDA 13 (for Nvidia DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
```
### Vulkan GPU Images:
#### Vulkan GPU Images:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
```
### AIO Images (pre-downloaded models):
#### AIO Images (pre-downloaded models):
```bash
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 13 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
@@ -215,7 +194,8 @@ local-ai run oci://localai/phi-2:latest
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
## 📰 Latest project news
- February 2026: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- January 2026: **LocalAI 3.10.0** - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: **Moonshine** (ultra-fast transcription), **Pocket-TTS** (lightweight TTS). Vulkan arm64 builds now available. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0).
- December 2025: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic fitting of models to multiple GPUS(llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Added Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
- November 2025: Major improvements to the UX. Among these: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245) and [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
- October 2025: 🔌 [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support added for agentic capabilities with external tools
@@ -248,9 +228,10 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
- 🧩 [Backend Gallery](https://localai.io/backends/): Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/)
| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |
### 🔗 Community and integrations
@@ -334,6 +318,10 @@ Agentic Libraries:
MCPs:
- https://github.com/mudler/MCPs
OS Assistant:
- https://github.com/mudler/Keygeist - Keygeist is an AI-powered keyboard operator that listens for key combinations and responds with AI-generated text typed directly into your Linux box.
Model galleries
- https://github.com/go-skynet/model-gallery
@@ -408,6 +396,10 @@ A huge thank you to our generous sponsors who support this project covering CI e
</a>
</p>
### Individual sponsors
A special thanks to individual sponsors that contributed to the project, a full list is in [Github](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler), a special shout out goes to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
## 🌟 Star history
[](https://star-history.com/#go-skynet/LocalAI&Date)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.