LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-01 03:46:41 -04:00

Files

Adira 28d7397743 fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716 ) (#10448 )

fix(openai): stop max_tokens streaming retry loop on reasoning models

When a thinking model spends its entire max_tokens budget on the reasoning
block, the C++ autoparser clears the raw Response and delivers reasoning-only
ChatDeltas (no content, no tool calls). ComputeChoices' empty-response retry
then fires and regenerates from scratch up to maxRetries times, each
re-consuming the whole budget, instead of terminating with finish_reason
"length" (issue #9716).

Add a reachedTokenBudget helper and suppress both the built-in and
caller-driven retries when the completion count has reached the configured
max_tokens ceiling. Report finish_reason "length" instead of "stop" in the
streaming and non-streaming chat paths when the budget was exhausted.

Add a deterministic regression test that counts backend invocations
(previously 6, now 1), plus boundary tests for the helper.
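
The invocation counts quoted above can be reproduced with a small stand-alone model of the retry loop. Everything here is illustrative: the result type, generate, countCalls, and the fake backend are hypothetical names, and maxRetries = 5 is an assumption chosen so that the unguarded path makes 6 calls (one initial attempt plus five retries). It does not mirror ComputeChoices itself.

```go
package main

import "fmt"

// result is a hypothetical stand-in for a backend response.
type result struct {
	content          string
	completionTokens int
}

// generate calls the backend once, then retries while the content is empty,
// up to maxRetries times. With budgetAware set, it stops retrying as soon as
// the completion count has reached maxTokens.
func generate(backend func() result, maxTokens, maxRetries int, budgetAware bool) result {
	res := backend()
	for attempt := 0; attempt < maxRetries && res.content == ""; attempt++ {
		if budgetAware && maxTokens > 0 && res.completionTokens >= maxTokens {
			break // budget exhausted: another attempt would burn it again
		}
		res = backend()
	}
	return res
}

// countCalls runs generate against a fake backend that always spends the
// whole budget on reasoning (empty content) and returns how often it ran.
func countCalls(budgetAware bool) int {
	calls := 0
	backend := func() result {
		calls++
		return result{content: "", completionTokens: 128}
	}
	generate(backend, 128, 5, budgetAware)
	return calls
}

func main() {
	fmt.Println("without budget check:", countCalls(false))
	fmt.Println("with budget check:", countCalls(true))
}
```

Counting calls on a fake backend keeps the check deterministic, since it does not depend on what a real model would generate.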

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Dennisadira <dennisadira@gmail.com>

2026-06-30 09:01:53 +02:00

compactcoord

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

conncoord

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

coordinator

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

respcoord

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

ttscoord

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

turncoord

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

types

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

chat_assistant_gate_test.go

chore: Security hardening (#9719 )

2026-05-08 16:25:45 +02:00

chat_assistant_gate.go

chore: Security hardening (#9719 )

2026-05-08 16:25:45 +02:00

chat_emit_test.go

fix(streaming): comply with OpenAI usage / stream_options spec (#9815 )

2026-05-14 08:53:46 +02:00

chat_emit.go

fix(openai): stream usage non-zero when tools are enabled (#9941 )

2026-05-22 10:13:41 +02:00

chat_stream_reasoning_test.go

fix(streaming/tools): don't leak prefill-misclassified content as trailing reasoning chunk (#10000 )

2026-05-26 08:34:26 +02:00

chat_stream_usage_test.go

fix(openai): stream usage non-zero when tools are enabled (#9941 )

2026-05-22 10:13:41 +02:00

chat_stream_workers_test.go

fix(streaming/tools): stop healing-marker stubs from gating off content (#9999 )

2026-05-25 23:55:35 +02:00

chat_stream_workers.go

fix(openai): stop streaming tool-call double-emission when autoparser is active (#10055 )

2026-05-29 11:39:09 +02:00

chat_test.go

fix(reasoning): stop prefilled <think> from swallowing tag-less answers (#10225 )

2026-06-09 09:02:04 +02:00

chat.go

fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716 ) (#10448 )

2026-06-30 09:01:53 +02:00

completion.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360 )

2026-06-18 11:45:22 +01:00

constants.go

fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716 ) (#10448 )

2026-06-30 09:01:53 +02:00

diarization_test.go

feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654 )

2026-05-05 15:10:13 +02:00

diarization.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710 )

2026-05-08 01:44:47 +02:00

edit.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802 )

2026-05-25 09:28:27 +02:00

embeddings.go

feat(distributed): gated X-LocalAI-Node response header (middleware + wrapper) (#9976 )

2026-05-25 10:51:48 +02:00

image_test.go

Fix image upload processing and img2img pipeline in diffusers backend (#8879 )

2026-03-11 08:05:50 +01:00

image.go

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087 )

2026-05-30 12:04:10 +02:00

inference_test.go

fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716 ) (#10448 )

2026-06-30 09:01:53 +02:00

inference.go

fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716 ) (#10448 )

2026-06-30 09:01:53 +02:00

inpainting_test.go

feat(distributed): gated X-LocalAI-Node response header (middleware + wrapper) (#9976 )

2026-05-25 10:51:48 +02:00

inpainting.go

feat(distributed): gated X-LocalAI-Node response header (middleware + wrapper) (#9976 )

2026-05-25 10:51:48 +02:00

list.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084 )

2026-04-04 15:14:35 +02:00

openai_suite_test.go

Fix image upload processing and img2img pipeline in diffusers backend (#8879 )

2026-03-11 08:05:50 +01:00

realtime_chunker_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_chunker.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_compactcoord.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_compaction_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_compaction.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_conncoord.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_doubles_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_gate_test.go

feat(realtime): configurable pipeline.max_history_items (#10331 )

2026-06-14 18:13:09 +02:00

realtime_modality_test.go

realtime: honor output_modalities to skip TTS in text-only mode (#9838 )

2026-05-15 12:39:47 +02:00

realtime_model_alias_test.go

fix(realtime): resolve model aliases for pipeline sub-models (#10484 )

2026-06-24 21:50:44 +02:00

realtime_model.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_reasoning_test.go

feat: forward reasoning_effort to the backend so jinja models honor it (#10184 )

2026-06-05 13:45:43 +00:00

realtime_reasoning.go

feat: forward reasoning_effort to the backend so jinja models honor it (#10184 )

2026-06-05 13:45:43 +00:00

realtime_respcoord.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_semantic_vad_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_semantic_vad.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_sound_detection_test.go

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

realtime_sound_detection.go

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

realtime_speaker_event_test.go

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424 )

2026-06-21 21:07:10 +02:00

realtime_speech_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_speech.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_stream_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_stream.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_thinking_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_thinking.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_transcription_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176 )

2026-06-11 08:43:12 +01:00

realtime_transcription.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_transport_webrtc.go

fix(realtime): raise WebRTC data-channel max-message-size + keep sendLoop alive (#10407 )

2026-06-19 21:36:25 +02:00

realtime_transport_ws.go

feat(realtime): WebRTC support (#8790 )

2026-03-13 21:37:15 +01:00

realtime_transport.go

feat(realtime): WebRTC support (#8790 )

2026-03-13 21:37:15 +01:00

realtime_tts_pipeline_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_tts_pipeline.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_turncoord.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_vad_buffer_test.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

realtime_voicegate_integration_test.go

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424 )

2026-06-21 21:07:10 +02:00

realtime_voicegate_test.go

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424 )

2026-06-21 21:07:10 +02:00

realtime_voicegate.go

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424 )

2026-06-21 21:07:10 +02:00

realtime_webrtc_ice_test.go

feat(realtime): make WebRTC ICE candidates configurable (#10231 )

2026-06-09 22:28:03 +02:00

realtime_webrtc_ice.go

feat(realtime): make WebRTC ICE candidates configurable (#10231 )

2026-06-09 22:28:03 +02:00

realtime_webrtc_sctp_test.go

fix(realtime): raise WebRTC data-channel max-message-size + keep sendLoop alive (#10407 )

2026-06-19 21:36:25 +02:00

realtime_webrtc_sctp.go

fix(realtime): raise WebRTC data-channel max-message-size + keep sendLoop alive (#10407 )

2026-06-19 21:36:25 +02:00

realtime_webrtc.go

fix(realtime): raise WebRTC data-channel max-message-size + keep sendLoop alive (#10407 )

2026-06-19 21:36:25 +02:00

realtime.go

feat(realtime): Semantic VAD EOU token (#10444 )

2026-06-30 09:01:22 +02:00

sound_classification.go

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

transcription.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710 )

2026-05-08 01:44:47 +02:00