mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-29 19:19:19 -04:00
* fix(streaming/tools): stop healing-marker stubs from gating off content
When the C++ autoparser is in pure-content fallback mode (e.g. qwen3
without --jinja) and the model emits a tool call as JSON, the streaming
worker calls ParseJSONIterative on each new chunk. parseJSONWithStack
heals partial input like `{` into `{"<marker>":1}` where <marker> is a
random integer. removeHealingMarkerFromJSON only stripped the marker
from values, so the synthetic key survived and downstream callers saw
a stub object with a random-looking key.
chat_stream_workers.go's JSON tool-call detector then bumped
lastEmittedCount past the stub even though no real tool call was
emitted, gating off ALL subsequent content chunks. The qwen3 + tools +
streaming case ended up dribbling only the first `{"` to clients and
then nothing, even when the model went on to call the noAction
`answer({"message": "…"})` pseudo-tool.
Three changes, each with its own regression test:
* removeHealingMarkerFromJSON now strips the marker suffix from keys
too, dropping the entry when the truncated key is empty. Inputs like
`{` no longer leak `{"<marker>":1}` to callers; partial keys like
`{ "code` still preserve the model-typed prefix `code`.
* ParseJSONIterative skips empty-after-healing maps so a healed `{`
doesn't surface as a stub result.
* The streaming JSON detector now breaks (not continues) on entries
without a usable `name`, and only bumps lastEmittedCount past
successfully-emitted entries. Defense-in-depth against any future
partial-parse shape.
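
As a rough illustration of the first change, the key-stripping rule can
be sketched in Go as below. `stripHealingMarker` is a hypothetical
stand-in, not the real removeHealingMarkerFromJSON. The fixed marker
string and the choice to keep a healed key's value unchanged are
assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// stripHealingMarker is a hypothetical helper (not LocalAI's actual code).
// It truncates map keys and string values at the healing marker and drops
// map entries whose key is empty after truncation.
func stripHealingMarker(v any, marker string) any {
	if marker == "" {
		return v
	}
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			if i := strings.Index(k, marker); i >= 0 {
				k = k[:i]
				if k == "" {
					continue // key was entirely synthetic: drop the entry
				}
			}
			// Assumption: the value is kept, with markers stripped recursively.
			out[k] = stripHealingMarker(val, marker)
		}
		return out
	case []any:
		out := make([]any, 0, len(t))
		for _, item := range t {
			out = append(out, stripHealingMarker(item, marker))
		}
		return out
	case string:
		if i := strings.Index(t, marker); i >= 0 {
			return t[:i]
		}
		return t
	default:
		return v
	}
}

func main() {
	const marker = "73219" // illustrative value; the real marker is random

	// A healed `{` : the only key is the marker itself, so nothing survives.
	fmt.Println(stripHealingMarker(map[string]any{marker: 1}, marker)) // map[]

	// A healed `{ "code` : the model-typed prefix is preserved.
	fmt.Println(stripHealingMarker(map[string]any{"code" + marker: 1}, marker)) // map[code:1]
}
```

An empty result map is what the second change then skips, so a healed
`{` never reaches the streaming worker as a stub.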
The parser tests cover eight partial-JSON-prefix shapes and verify that
no marker characters leak into keys. They also cover the two earliest
shapes (`{`, `{"`), which should not surface a stub at all.
Fixes #9988
Assisted-by: Claude:opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(streaming/tools): cover the autoparser-correctly-working path
Extract the JSON tool-call streaming emit loop into emitJSONToolCallDeltas
and unit-test it against every shape that can hit the streaming worker:
* the bug case — a healing-marker stub at index 0 must NOT bump
lastEmittedCount, so subsequent content chunks keep flowing;
* the autoparser-correctly-working case — empty jsonResults (because
the C++ autoparser cleared the raw text and delivers tool calls via
TokenUsage.ChatDeltas) is a no-op, leaving the deferred end-of-stream
emitter to ship the autoparser's tool calls;
* a single complete tool call — emit one chunk, advance to 1;
* arguments arriving as a JSON-string vs as a nested object — both
serialize to the wire as JSON-string arguments;
* multiple parallel tool calls — one chunk each;
* a real tool call followed by a partial stub — emit the real one,
stop at the stub, resume on a later chunk once the stub completes.
This locks down the no-regression guarantee the user asked for: the
fix is scoped to the pure-content fallback path. When the autoparser
actually classifies tool calls (jinja-recognized chat format with tool
support), the helper is a no-op and nothing changes.
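
The emit-loop behavior described above can be sketched as follows. This
is a simplified, hypothetical model and not the real
emitJSONToolCallDeltas: the function name `emitToolCalls`, its
signature, the `emit` callback, and the `"{}"` default for missing
arguments are all assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// argsToString renders tool-call arguments as a JSON string, whether they
// arrived as a string already or as a nested object.
func argsToString(a any) (string, bool) {
	switch v := a.(type) {
	case nil:
		return "{}", true // assumption: missing arguments become an empty object
	case string:
		return v, true
	default:
		b, err := json.Marshal(v)
		return string(b), err == nil
	}
}

// emitToolCalls (hypothetical) emits every entry from lastEmitted onward
// that has a usable name, stops at the first entry that does not, and
// returns the new count of successfully emitted entries.
func emitToolCalls(results []map[string]any, lastEmitted int, emit func(name, args string)) int {
	for i := lastEmitted; i < len(results); i++ {
		name, _ := results[i]["name"].(string)
		if name == "" {
			break // partial or stub entry: stop, do not skip past it
		}
		args, ok := argsToString(results[i]["arguments"])
		if !ok {
			break
		}
		emit(name, args)
		lastEmitted = i + 1 // advance only past entries actually emitted
	}
	return lastEmitted
}

func main() {
	emit := func(name, args string) { fmt.Printf("emit %s %s\n", name, args) }

	// A stub at index 0 leaves the counter at 0.
	fmt.Println(emitToolCalls([]map[string]any{{}}, 0, emit))

	// A real call followed by a stub: emit one, stop at the stub.
	results := []map[string]any{
		{"name": "answer", "arguments": map[string]any{"message": "hi"}},
		{},
	}
	n := emitToolCalls(results, 0, emit)

	// A later chunk completes the stub, and emission resumes from n.
	results[1] = map[string]any{"name": "search", "arguments": `{"q":"x"}`}
	fmt.Println(emitToolCalls(results, n, emit))
}
```

Breaking instead of continuing keeps the counter aligned with what
clients have actually received, so a later chunk that completes the
stub picks up exactly where emission stopped.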
Assisted-by: Claude:opus-4-7 [Read] [Edit] [Bash] [Write]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>