* feat(function): Add XML Tool Call Parsing Support
Extend the function parsing system in LocalAI to support XML-style tool calls, similar to how JSON tool calls are currently parsed. This will allow models that return XML format (like <tool_call><function=name><parameter=key>value</parameter></function></tool_call>) to be properly parsed alongside text content.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* thinking before tool calls, more strict support for corner cases with no tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support streaming tools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative JSON
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Iterative parsing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Consume JSON marker
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix pending TODOs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Don't run other parsing with ParseRegex
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: resolve duplicate MCP route registration causing 50% failure rate
Fixes#7772
The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.
Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
* /v1/mcp/chat/completions
* /mcp/v1/chat/completions
* /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go
The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.
This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* Address feedback from review
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop mode from image generation(unused)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): improve image generation front-end
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): only ref images. files is to be deprecated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not override default steps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: Add usage fields to image generation response for OpenAI API compatibility
Fixes#7354
Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.
Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: Correct usage field types for image generation API compatibility
Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.
Fixes#7354🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* chore(makefile): Add buildargs for sd and cuda when building backend
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(whisper): Add prompt to condition transcription output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(importer): support ollama and OCI, unify code
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support importing from local file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* support also yaml config files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle local files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract importing errors
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add importer tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add integration tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(UX): improve and specify supported URI formats
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fail if backend does not have a runfile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add cache for galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): remove handler duplicate
File input handlers are now handled by Alpine.js @change handlers in chat.html.
Removed duplicate listeners to prevent files from being processed twice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): be consistent in attachments in the chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fail if no importer matches
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: propagate ops correctly
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add support to logprobs in results
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add support to logitbias
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(mcp): add LocalAI endpoint to stream live results of the agent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refactoring
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* MCP UX integration
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Enhance UX
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Support also non-SSE
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
* Fix SSE streaming format to comply with specification
- Replace json.Encoder with json.Marshal for explicit formatting
- Use explicit \n\n for all SSE messages (instead of relying on implicit newlines)
- Change %v to %s format specifier for proper string formatting
- Fix error message streaming to include proper SSE format
- Ensure consistency between chat.go and completion.go endpoints
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add proper error handling for JSON marshal failures in streaming
- Handle json.Marshal errors explicitly in error response paths
- Add fallback simple error message if marshal fails
- Prevents sending 'data: <nil>' on marshal failures
- Addresses code review feedback
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix SSE streaming format to comply with specification
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix finish_reason field to use pointer for proper null handling
- Change FinishReason from string to *string in Choice schema
- Streaming chunks now omit finish_reason (null) instead of empty string
- Final chunks properly set finish_reason to "stop", "tool_calls", etc.
- Remove empty content from initial streaming chunks (only send role)
- Final streaming chunk sends empty delta with finish_reason
- Addresses OpenAI API compliance issues causing client failures
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Improve code consistency for string pointer creation
- Use consistent pattern: declare variable then take address
- Remove inline anonymous function for better readability
- Addresses code review feedback
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Move common finish reasons to constants
- Create constants.go with FinishReasonStop, FinishReasonToolCalls, FinishReasonFunctionCall
- Replace all string literals with constants in chat.go, completion.go, realtime.go
- Improves code maintainability and prevents typos
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make it build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix finish_reason to always be present with null or string value
- Remove omitempty from FinishReason field in Choice struct
- Explicitly set FinishReason to nil for all streaming chunks
- Ensures finish_reason appears as null in JSON for streaming chunks
- Final chunks still properly set finish_reason to "stop", "tool_calls", etc.
- Complies with OpenAI API specification example
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat: respect context
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* workaround fasthttp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): allow to abort call
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: improving error
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Respect context also with MCP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tie to both contexts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make detection more robust
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): expose env vars as options for consistency
This allows to configure everything in the YAML file of the model rather
than have global configurations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): respect usetokenizertemplate and use llama.cpp templating system to process messages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect template exists if use tokenizer template is enabled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Better recognization of chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixes to support tool calls while using templates from tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop template guessing, fix passing tools to tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract grammar and other options from chat template, add schema struct
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Automatically set use_jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanups, identify by default gguf models for chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP - add endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Rename
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Wire the Completion API
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to make it functional
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Almost functional
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Bump golang versions used in tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add description of the tool
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make it working
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small optimizations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanup/refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ci): Avoid matching wrong backend with the same prefix
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(whisper): Use Purego and enable VAD
This replaces the Whisper CGO bindings with our own Purego based module
to make compilation easier.
In addition this allows VAD models to be loaded by Whisper. There is not
much benefit now except that the same backend can be used for VAD and
transcription. Depending on upstream we may also be able to use GPU for
VAD in the future, but presently it is disabled.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(p2p): sync models between federated nodes
This change makes sure that between federated nodes all the models are
synced with each other.
Note: this works exclusively with models belonging to a gallery. It does
not sync files between the nodes, but rather it synces the node setup.
E.g. All the nodes needs to have configured the same galleries and
install models without any local editing.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make nodes stable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups on syncing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ui: improve p2p view
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
- Add a system backend path
- Refactor and consolidate system information in system state
- Use system state in all the components to figure out the system paths
to used whenever needed
- Refactor BackendConfig -> ModelConfig. This was otherway misleading as
now we do have a backend configuration which is not the model config.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(stablediffusion-ggml): add support to ref images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add it to the model gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* migrate core/system to pkg/system - it has no dependencies FROM core, and IS USED in pkg
Signed-off-by: Dave Lee <dave@gray101.com>
* move pkg/templates up to core/templates -- nothing in pkg references it, but it does reference core.
Signed-off-by: Dave Lee <dave@gray101.com>
* remove extra check, len of nil is 0
Signed-off-by: Dave Lee <dave@gray101.com>
* move pkg/startup to core/startup -- it does have important and unfixable dependencies on core
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
* feat(realtime): Initial Realtime API implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: go mod tidy
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Implement transcription only mode for realtime API
Reduce the scope of the real time API for the initial realease and make
transcription only mode functional.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(build): Build backends on a separate layer to speed up core only changes
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* chore(stablediffusion-ncn): drop in favor of ggml implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): drop stablediffusion build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): add
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): try to fixup current tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tests improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): use quality to specify step
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): switch to sd-1.5
also increase prep time for downloading models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>