* feat(tts): add support for streaming mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Send first audio, make sure it's 16
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): Add audio conversations
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(realtime): Vendor the updated API and modify for server side
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Update to the GA realtime API
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Filter reasoning from spoken output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Send delta and done events for tool calls and audio transcripts
Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.
💘 Generated with Crush
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Improve tool call handling and error reporting
- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
exllama2 development has stalled and only old architectures are
supported. exllamav3 is still in development, meanwhile cleaning up
exllama2 from the gallery.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(openresponses): support reasoning blocks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* allow to disable reasoning, refactor common logic
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add option to only strip reasoning
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add configurations for custom reasoning tokens
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow to set forcing backends eviction while requests are in flight
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: try to make the request sit and retry if eviction couldn't be done
Otherwise calls that in order to pass would need to shutdown other
backends would just fail.
In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose settings to CLI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(loader): refactor single active backend support to LRU
This changeset introduces LRU management of loaded backends. Users can
set now a maximum number of models to be loaded concurrently, and, when
setting LocalAI in single active backend mode we set LRU to 1 for
backward compatibility.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: add tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The internal echo command in sh does not support "-e" and "-E" options
and interprets backslash escape sequences by default. So we prefer the
external echo command when it is available.
Actually it is not necessary to remove particularly the local-ai data
directory before user deletion. It will be accomplished automatically by
the userdel command. But it is crucial to remove additional users from
the local-ai group to allow userdel command to delete the group itself.
* More appropriate place for data storing
The /usr/share subtree in Linux is used for data that generally are not
supposed to change. Conventional places for changeable data are usually
located under /var, so /var/lib seems to be a reasonable default here.
* Data paths consistency fix
* Directory name consistency fix
* feat(ui): add watchdog settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not re-read env
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Some refactor, move other settings to runtime (p2p)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add API Keys handling
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Allow to disable runtime settings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* show MCP toggle in index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop context default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
* Fix SSE streaming format to comply with specification
- Replace json.Encoder with json.Marshal for explicit formatting
- Use explicit \n\n for all SSE messages (instead of relying on implicit newlines)
- Change %v to %s format specifier for proper string formatting
- Fix error message streaming to include proper SSE format
- Ensure consistency between chat.go and completion.go endpoints
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add proper error handling for JSON marshal failures in streaming
- Handle json.Marshal errors explicitly in error response paths
- Add fallback simple error message if marshal fails
- Prevents sending 'data: <nil>' on marshal failures
- Addresses code review feedback
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix SSE streaming format to comply with specification
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix finish_reason field to use pointer for proper null handling
- Change FinishReason from string to *string in Choice schema
- Streaming chunks now omit finish_reason (null) instead of empty string
- Final chunks properly set finish_reason to "stop", "tool_calls", etc.
- Remove empty content from initial streaming chunks (only send role)
- Final streaming chunk sends empty delta with finish_reason
- Addresses OpenAI API compliance issues causing client failures
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Improve code consistency for string pointer creation
- Use consistent pattern: declare variable then take address
- Remove inline anonymous function for better readability
- Addresses code review feedback
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Move common finish reasons to constants
- Create constants.go with FinishReasonStop, FinishReasonToolCalls, FinishReasonFunctionCall
- Replace all string literals with constants in chat.go, completion.go, realtime.go
- Improves code maintainability and prevents typos
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make it build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix finish_reason to always be present with null or string value
- Remove omitempty from FinishReason field in Choice struct
- Explicitly set FinishReason to nil for all streaming chunks
- Ensures finish_reason appears as null in JSON for streaming chunks
- Final chunks still properly set finish_reason to "stop", "tool_calls", etc.
- Complies with OpenAI API specification example
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): expose env vars as options for consistency
This allows to configure everything in the YAML file of the model rather
than have global configurations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): respect usetokenizertemplate and use llama.cpp templating system to process messages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect template exists if use tokenizer template is enabled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Better recognization of chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixes to support tool calls while using templates from tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop template guessing, fix passing tools to tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract grammar and other options from chat template, add schema struct
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Automatically set use_jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanups, identify by default gguf models for chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Added download instructions for macOS DMG file and updated command for Linux and macOS.
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix: properly terminate kv_overrides array with empty key
The llama model loading function expects KV overrides to be terminated
with an empty key (key[0] == 0). Previously, the kv_overrides vector was
not being properly terminated, causing an assertion failure.
This commit ensures that after parsing all KV override strings, we add a
final terminating entry with an empty key to satisfy the C-style array
termination requirement. This fixes the assertion error and allows the
model to load correctly with custom KV overrides.
Fixes#6643
- Also included a reference to the usage of the `overrides` option in
the advanced-usage section.
Signed-off-by: blob42 <contact@blob42.xyz>
* doc: document the `overrides` option
---------
Signed-off-by: blob42 <contact@blob42.xyz>