LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-17 19:53:38 -04:00

Ettore Di Giacinto 13a6ed709c fix: thinking models with tools returning empty content (reasoning-only retry loop) (#9290)

When clients like Nextcloud or Home Assistant send requests with tools
to thinking models (e.g. Gemma 4 with <|channel>thought tags), the
response was empty despite the backend producing valid content.

Root cause: the C++ autoparser puts clean content in both the raw
Response and ChatDeltas. The Go-side PrependThinkingTokenIfNeeded
then prepends the thinking start token to the already-clean content,
causing ExtractReasoning to classify the entire response as unclosed
reasoning. This made cbRawResult empty, triggering a retry loop that
never succeeds.

Two fixes:
- inference.go: check ChatDeltas for content/tool_calls regardless of
  whether Response is empty, so skipCallerRetry fires correctly
- chat.go: when ChatDeltas have content but no tool calls, use that
  content directly instead of falling back to the empty cbRawResult
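
The failure mode and both fixes can be sketched in a few lines of Go. This is a simplified illustration, not the code from inference.go or chat.go: the `<think>` tags, the `chatDelta` type, and the helper names `prependThinkingToken`, `extractReasoning`, `skipRetry`, and `finalContent` are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical thinking tags; the real tokens depend on the model template.
const (
	thinkStart = "<think>"
	thinkEnd   = "</think>"
)

// chatDelta is a simplified stand-in for the parsed deltas from the backend.
type chatDelta struct {
	Content   string
	ToolCalls []string
}

// prependThinkingToken adds the thinking start token unless already present.
func prependThinkingToken(s string) string {
	if strings.HasPrefix(s, thinkStart) {
		return s
	}
	return thinkStart + s
}

// extractReasoning splits text into reasoning and content. A thinking block
// that is never closed is treated entirely as reasoning.
func extractReasoning(s string) (reasoning, content string) {
	if !strings.HasPrefix(s, thinkStart) {
		return "", s
	}
	rest := strings.TrimPrefix(s, thinkStart)
	before, after, closed := strings.Cut(rest, thinkEnd)
	if !closed {
		return rest, ""
	}
	return before, after
}

// deltaOutput joins the delta content and reports whether any tool call exists.
func deltaOutput(deltas []chatDelta) (content string, hasToolCalls bool) {
	var b strings.Builder
	for _, d := range deltas {
		b.WriteString(d.Content)
		if len(d.ToolCalls) > 0 {
			hasToolCalls = true
		}
	}
	return b.String(), hasToolCalls
}

// skipRetry sketches the first fix: the deltas are inspected whether or not
// the raw response is empty, so usable output prevents a retry.
func skipRetry(deltas []chatDelta) bool {
	content, hasToolCalls := deltaOutput(deltas)
	return content != "" || hasToolCalls
}

// finalContent sketches the second fix: delta content without tool calls is
// used directly instead of the (possibly empty) raw result.
func finalContent(rawResult string, deltas []chatDelta) string {
	content, hasToolCalls := deltaOutput(deltas)
	if content != "" && !hasToolCalls {
		return content
	}
	return rawResult
}

func main() {
	clean := "The light is on." // already-clean content from the autoparser
	deltas := []chatDelta{{Content: "The light "}, {Content: "is on."}}

	// Prepending the start token to clean content leaves the block unclosed,
	// so the whole response is classified as reasoning.
	_, raw := extractReasoning(prependThinkingToken(clean))
	fmt.Printf("raw result: %q\n", raw)
	fmt.Printf("skip retry: %v\n", skipRetry(deltas))
	fmt.Printf("final content: %q\n", finalContent(raw, deltas))
}
```

In this sketch the raw result comes out empty while the deltas still carry the full answer, which is why checking the deltas first avoids both the empty reply and the retry loop.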

2026-04-09 18:30:31 +02:00

e2e

fix: thinking models with tools returning empty content (reasoning-only retry loop) (#9290)

2026-04-09 18:30:31 +02:00

e2e-aio

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

e2e-ui

feat(ui, gallery): Show model backends and add searchable model/backend selector (#9060)

2026-03-18 21:14:41 +01:00

fixtures

feat: Add backend gallery (#5607)

2025-06-15 14:56:52 +02:00

integration

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

models_fixtures

feat(transformers): merge sentencetransformers backend (#4624)

2025-01-18 18:30:30 +01:00