The Go-side incremental JSON parser was emitting the same tool call on
every streaming token because it lacked the len > lastEmittedCount guard
that the XML parser had. On top of that, the post-streaming default:
case re-emitted all tool calls from index 0, duplicating everything.
This produced duplicate delta.tool_calls events causing clients to
accumulate arguments as "{args}{args}" — invalid JSON.
Fixes:
- JSON incremental parser: add len(jsonResults) > lastEmittedCount guard
and loop from lastEmittedCount (matching the XML parser pattern)
- Post-streaming default: case: skip i < lastEmittedCount entries that
were already emitted during streaming
- JSON parser: use blocking channel send (matching XML parser behavior)
When clients like Nextcloud or Home Assistant send requests with tools
to thinking models (e.g. Gemma 4 with <|channel>thought tags), the
response was empty despite the backend producing valid content.
Root cause: the C++ autoparser puts clean content in both the raw
Response and ChatDeltas. The Go-side PrependThinkingTokenIfNeeded
then prepends the thinking start token to the already-clean content,
causing ExtractReasoning to classify the entire response as unclosed
reasoning. This made cbRawResult empty, triggering a retry loop that
never succeeds.
Two fixes:
- inference.go: check ChatDeltas for content/tool_calls regardless of
whether Response is empty, so skipCallerRetry fires correctly
- chat.go: when ChatDeltas have content but no tool calls, use that
content directly instead of falling back to the empty cbRawResult
* fix(chat): do not retry if we had chatdeltas or tooldeltas from backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: use oai compat for llama.cpp
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: apply to non-streaming path too
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* map also other fields
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
First when sending errors over SSE we now clearly identify them as such
instead of just sending the error string as a chat completion message.
We use this in the UI to identify errors and link to them to the traces.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): WebRTC support
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(tracing): Show full LLM opts and deltas
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(musicgen): add ace-step and UI interface
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle model dir
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop auto-download
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to models, fixup UIs icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* l4t13 is incompatbile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* avoid pinning version for cuda12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop l4t12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Initial plan
* Add tool/function calling schema support to Anthropic Messages API
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E tests for Anthropic tool calling
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make tool calling tests require model to use tools
- First test now expects hasToolUse to be true with clear error message
- Third test now expects toolUseID to be non-empty (removed conditional)
- Both tests will now fail if model doesn't call the expected tools
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add E2E test for tool calling with streaming responses
- Tests that streaming events are properly emitted (content_block_start/delta/stop)
- Verifies tool_use blocks are accumulated correctly in streaming mode
- Ensures model calls tools and stop_reason is set to tool_use
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>