* feat(proto): add speaker field to TranscriptSegment for diarization
Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx backend for transcription with diarization
Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): register whisperx backend in Makefile
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx meta and image entries to index.yaml
Signed-off-by: eureka928 <meobius123@gmail.com>
* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): unpin torch versions and use CPU index for cpu requirements
Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch ROCm variant to fix CI build failure
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch CPU variant to fix uv resolution failure
Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure
uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure
PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution
uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.
Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
* revert: scope hipblas suffix fix to whisperx only
Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
---------
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(tts): add support for streaming mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Send first audio, make sure it's 16
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): Add audio conversations
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(realtime): Vendor the updated API and modify for server side
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Update to the GA realtime API
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Filter reasoning from spoken output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Send delta and done events for tool calls and audio transcripts
Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.
💘 Generated with Crush
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Improve tool call handling and error reporting
- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* Debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openai video endpoint (is not complete)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add download button
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(openresponses): support reasoning blocks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* allow to disable reasoning, refactor common logic
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add option to only strip reasoning
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add configurations for custom reasoning tokens
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: extract reasoning to its own package
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* make sure we detect thinking tokens from template
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to override via config, add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
* fix: reduce log verbosity for /api/operations polling
Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes#7989.
This PR adds support to support the 'reasoning' API field of the OpenAI
spec.
LocalAI now will extract automatically thinking tags in both SSE and
non-SSE mode. The changes are adapted as well to the Chat UI now that
will use the reasoning field to extract the thinking process and display
it in the chat.
This fixes https://github.com/mudler/LocalAI/issues/7944
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
* Add tool/function calling schema support to Anthropic Messages API
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E tests for Anthropic tool calling
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make tool calling tests require model to use tools
- First test now expects hasToolUse to be true with clear error message
- Third test now expects toolUseID to be non-empty (removed conditional)
- Both tests will now fail if model doesn't call the expected tools
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E test for tool calling with streaming responses
- Tests that streaming events are properly emitted (content_block_start/delta/stop)
- Verifies tool_use blocks are accumulated correctly in streaming mode
- Ensures model calls tools and stop_reason is set to tool_use
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* feat(function): Add XML Tool Call Parsing Support
Extend the function parsing system in LocalAI to support XML-style tool calls, similar to how JSON tool calls are currently parsed. This will allow models that return XML format (like <tool_call><function=name><parameter=key>value</parameter></function></tool_call>) to be properly parsed alongside text content.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* thinking before tool calls, more strict support for corner cases with no tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support streaming tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative JSON
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative parsing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Consume JSON marker
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix pending TODOs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Don't run other parsing with ParseRegex
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: resolve duplicate MCP route registration causing 50% failure rate
Fixes#7772
The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.
Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
* /v1/mcp/chat/completions
* /mcp/v1/chat/completions
* /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go
The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.
This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* Address feedback from review
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop mode from image generation(unused)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): improve image generation front-end
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): only ref images. files is to be deprecated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not override default steps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
fix: validate MCP configuration in model config
Fixes#7334
The Validate() function was not checking if MCP configuration
(mcp.stdio and mcp.remote) contains valid JSON. This caused
malformed JSON with missing commas to be silently accepted.
Changes:
- Add MCP configuration validation to ModelConfig.Validate()
- Properly report validation errors instead of discarding them
- Add test cases for valid and invalid MCP configurations
The fix ensures that malformed JSON in MCP config sections
will now be caught and reported during validation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: Add usage fields to image generation response for OpenAI API compatibility
Fixes#7354
Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.
Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: Correct usage field types for image generation API compatibility
Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.
Fixes#7354🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#7420
Added nil checks before calling mergo.Merge in InstallModelFromGallery and InstallModel
functions to prevent panic when req.Overrides or configOverrides are nil. The panic was
occurring at models.go:248 during Qwen-Image-Edit gallery model download.
Changes:
- Added nil check for req.Overrides before merging in InstallModelFromGallery (line 126)
- Added nil check for configOverrides before merging in InstallModel (line 248)
- Added test case to verify nil configOverrides are handled without panic
Signed-off-by: majiayu000 <1835304752@qq.com>
* feat: allow to set forcing backends eviction while requests are in flight
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: try to make the request sit and retry if eviction couldn't be done
Otherwise calls that in order to pass would need to shutdown other
backends would just fail.
In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose settings to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(makefile): Add buildargs for sd and cuda when building backend
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(whisper): Add prompt to condition transcription output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>