LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-17 03:33:44 -04:00

Files

Ettore Di Giacinto 53deeb1107 fix(reasoning): suppress partial tag tokens during autoparser warm-up

The C++ PEG parser needs a few tokens to identify the reasoning format
(e.g. "<|channel>thought\n" for Gemma 4). During this warm-up, the gRPC
layer was sending raw partial tag tokens to Go, which leaked into the
reasoning field.
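
As a rough illustration of the warm-up problem described above, the sketch below holds back streamed text while it could still be the start of a reasoning tag. It is not the actual LocalAI or llama.cpp code: `isPartialTag`, `visibleText`, and the `reasoningTags` list are hypothetical names, and `<think>` is an assumed second tag added only for the example.

```go
package main

import "strings"

// reasoningTags is a hypothetical list of opening tags that mark reasoning
// output. "<|channel>thought\n" comes from the commit message; "<think>" is
// an assumption added for illustration.
var reasoningTags = []string{"<|channel>thought\n", "<think>"}

// isPartialTag reports whether s is a non-empty, strict prefix of one of the
// tags, i.e. the stream may still be in the middle of emitting a tag.
func isPartialTag(s string, tags []string) bool {
	if s == "" {
		return false
	}
	for _, t := range tags {
		if len(s) < len(t) && strings.HasPrefix(t, s) {
			return true
		}
	}
	return false
}

// visibleText returns the text that is safe to forward for the accumulated
// buffer: nothing while the buffer could still be an incomplete tag,
// otherwise the buffer unchanged.
func visibleText(acc string, tags []string) string {
	if isPartialTag(acc, tags) {
		return ""
	}
	return acc
}
```

With these definitions, `visibleText("<|chan", reasoningTags)` yields an empty string, while ordinary text such as `"Hello"` passes through unchanged.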

- Clear reply.message in gRPC when the autoparser is active but has no diffs
  yet, matching the llama.cpp server behavior of emitting only classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming
  paths, falling back to Go-side extraction for backends without autoparser
  (e.g. vLLM)
- Override the non-streaming, no-tools result with chat delta content when available
- Guard PrependThinkingTokenIfNeeded against partial tag prefixes during
  streaming accumulation
- Reorder default thinking tokens so <|channel>thought is checked before
  <|think|> (Gemma 4 templates contain both)
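
The token reordering in the last item matters whenever detection is first-match-wins. The following is a minimal sketch under that assumption, not the project's actual detection code: `detectThinkingToken` and `defaultThinkingTokens` are hypothetical names, and matching by substring is an assumption made for the example.

```go
package main

import "strings"

// defaultThinkingTokens is a hypothetical ordered list. The first token found
// in the template wins, so "<|channel>thought" is listed before "<|think|>".
var defaultThinkingTokens = []string{"<|channel>thought", "<|think|>"}

// detectThinkingToken returns the first token from tokens that occurs in the
// template, or an empty string if none occurs.
func detectThinkingToken(template string, tokens []string) string {
	for _, tok := range tokens {
		if strings.Contains(template, tok) {
			return tok
		}
	}
	return ""
}
```

For a template that contains both tokens, this ordering selects `<|channel>thought`; with the two entries swapped, the same template would select `<|think|>` instead.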

2026-04-04 20:45:57 +00:00

grpc

fix: speedup git submodule update with --single-branch (#2847)

2024-07-13 22:32:25 +02:00

llama-cpp

fix(reasoning): suppress partial tag tokens during autoparser warm-up

2026-04-04 20:45:57 +00:00