LocalAI/core/http at ebefa6dcca8b9dd009e0ad260dab70c3c0f315b5

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-30 11:26:32 -04:00

Adira 28d7397743 fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716) (#10448)

When a thinking model spends its entire max_tokens budget on the reasoning
block, the C++ autoparser clears the raw Response and delivers reasoning-only
ChatDeltas (no content, no tool calls). ComputeChoices' empty-response retry
then fires and regenerates from scratch up to maxRetries times, each
re-consuming the whole budget, instead of terminating with finish_reason
"length" (issue #9716).

Add a reachedTokenBudget helper and suppress both the built-in and
caller-driven retries when the completion count has reached the configured
max_tokens ceiling. Report finish_reason "length" instead of "stop" in the
streaming and non-streaming chat paths when the budget was exhausted.

Add a deterministic regression test that counts backend invocations
(previously 6, now 1), plus boundary tests for the helper.
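
A minimal sketch of the check described above, not the code from the commit.
Only the name reachedTokenBudget comes from the commit message; its signature,
the shouldRetryEmpty and finishReason helpers, and the treatment of a nil or
non-positive limit as "no ceiling" are assumptions made for illustration.

```go
package main

import "fmt"

// reachedTokenBudget reports whether the completion token count has reached
// the configured max_tokens ceiling. Hypothetical signature: a nil or
// non-positive limit is treated as "no ceiling configured".
func reachedTokenBudget(completionTokens int, maxTokens *int) bool {
	if maxTokens == nil || *maxTokens <= 0 {
		return false
	}
	return completionTokens >= *maxTokens
}

// shouldRetryEmpty (hypothetical) decides whether an empty response deserves
// another attempt. An empty response that exhausted the budget is not retried,
// because a retry would only spend the same budget again.
func shouldRetryEmpty(content string, completionTokens int, maxTokens *int, attempt, maxRetries int) bool {
	if content != "" || attempt >= maxRetries {
		return false
	}
	return !reachedTokenBudget(completionTokens, maxTokens)
}

// finishReason (hypothetical) picks "length" when the budget was exhausted
// and "stop" otherwise.
func finishReason(completionTokens int, maxTokens *int) string {
	if reachedTokenBudget(completionTokens, maxTokens) {
		return "length"
	}
	return "stop"
}

func main() {
	limit := 256

	// Reasoning consumed the whole budget and no content was produced.
	fmt.Println(shouldRetryEmpty("", 256, &limit, 0, 5)) // false
	fmt.Println(finishReason(256, &limit))               // length

	// Empty response well under the budget: a retry is still allowed.
	fmt.Println(shouldRetryEmpty("", 12, &limit, 0, 5)) // true
}
```

Under these assumptions, the first case makes a single backend call and ends
with finish_reason "length", which is the behavior the regression test counts.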

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Dennisadira <dennisadira@gmail.com>

2026-06-30 09:01:53 +02:00

fix(auth): make advisory locks dialect-aware and harden SQLite DSN (#10509)

2026-06-25 17:18:55 +02:00

fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716) (#10448)

2026-06-30 09:01:53 +02:00

fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482) (#10504)

2026-06-25 08:10:59 +02:00

feat(realtime): Semantic VAD EOU token (#10444)

2026-06-30 09:01:22 +02:00

fix(distributed): return empty backend list for agent nodes instead of failing backend.list (#10545) (#10565)

2026-06-28 01:22:48 +02:00

fix(streaming): comply with OpenAI usage / stream_options spec (#9815)

2026-05-14 08:53:46 +02:00

feat(realtime): WebRTC support (#8790)

2026-03-13 21:37:15 +01:00

app_test.go

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)

2026-06-16 12:16:34 +02:00

app.go

feat(distributed): SyncedMap component + migrate finetune/quant/agent-tasks to cross-replica state (#10542)

2026-06-27 23:23:51 +02:00

csrf_multipart_test.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

explorer.go

chore(refactor): move logging to common package based on slog (#7668)

2025-12-21 19:33:13 +01:00

http_suite_test.go

refactor(tests): split app_test.go, move real-backend coverage to e2e-backends

2026-04-27 23:09:20 +00:00

openresponses_test.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

render.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

route_coverage_test.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00