LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 04:46:54 -04:00

Ettore Di Giacinto 4bf73a7e22 fix(python-backends): parse tool-call arguments for chat templates and split implicit reasoning blocks

Two bugs broke OpenAI-style tool calling on the MLX backend (and any
Python backend sharing backend/python/common), reproduced end-to-end on
LocalAI v4.5.5 with the metal-mlx backend and
mlx-community/Qwen3.5-2B-MLX-8bit.

messages_to_dicts left each tool call's function.arguments as the raw
OpenAI-wire JSON string. HuggingFace chat templates (e.g. Qwen3.5)
iterate arguments as a mapping (.items()), so any request whose history
contained a prior assistant tool_calls message failed with HTTP 500
"Generation failed: Can only get item pairs from a mapping." — breaking
every agent loop on its second turn. Decode the string back into a dict
so the template sees a mapping.
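
As an illustration of the fix described above, here is a minimal sketch of decoding the wire-format string into a mapping. It is not the code from the commit: the helper name decode_tool_call_arguments is hypothetical, and leaving malformed JSON untouched is an assumption about reasonable behaviour, not something the commit message states.

```python
import json
from typing import Any


def decode_tool_call_arguments(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `message` whose tool-call arguments are dicts.

    Hypothetical helper for illustration only; the real change lives in
    messages_to_dicts under backend/python/common.
    """
    out = dict(message)
    calls = []
    for call in message.get("tool_calls") or []:
        call = dict(call)  # shallow copies so the caller's data is not mutated
        fn = dict(call.get("function") or {})
        args = fn.get("arguments")
        if isinstance(args, str):
            try:
                parsed = json.loads(args)
            except json.JSONDecodeError:
                parsed = None  # assumption: keep malformed strings as they are
            if isinstance(parsed, dict):
                fn["arguments"] = parsed  # templates can now call .items()
        call["function"] = fn
        calls.append(call)
    if calls:
        out["tool_calls"] = calls
    return out
```

With this in place, a history entry whose arguments arrive as the string '{"city": "Rome"}' reaches the chat template as the dict {"city": "Rome"}, which is what a template iterating with .items() expects.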

split_reasoning returned ("", text) whenever the opening think tag was
absent. Models like Qwen3.5 open the assistant turn already inside
thinking, so the generated text carries only the closing </think>; the
whole chain-of-thought leaked into content. When the opener is missing
but the closer is present, treat everything before the closer as
reasoning.
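
The rule above can be sketched as follows. This is an illustrative reimplementation, not the function from the commit: the name split_reasoning_sketch, the tag parameters, the whitespace stripping, and the behaviour when no closer is present are all assumptions made for the example.

```python
def split_reasoning_sketch(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, content) for one generated assistant message."""
    close_at = text.find(close_tag)
    if close_at == -1:
        # Assumption: without a closing tag there is nothing to split off.
        return "", text

    open_at = text.find(open_tag)
    if open_at != -1 and open_at < close_at:
        # Explicit block: reasoning sits between the opener and the closer.
        # Any text before the opener is discarded in this sketch.
        reasoning = text[open_at + len(open_tag):close_at]
    else:
        # Implicit block: the turn started inside thinking, so everything
        # before the closer is reasoning.
        reasoning = text[:close_at]

    content = text[close_at + len(close_tag):]
    return reasoning.strip(), content.strip()
```

For example, split_reasoning_sketch("Check the weather tool.</think>Calling get_weather.") returns ("Check the weather tool.", "Calling get_weather."), so the chain-of-thought no longer ends up in content.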

Adds platform-independent unit tests under backend/python/common
(stdlib-only, no MLX/venv required, following parent_watch_test.py).

Assisted-by: Claude Code:claude-opus-4-8

2026-07-03 08:06:59 +00:00

template

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

grpc_auth.py

fix(process): give backend workers a parent-death safety net (#10639)

2026-07-02 19:16:48 +02:00

libbackend.sh

fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)

2026-05-06 00:28:01 +02:00

mlx_utils_test.py

fix(python-backends): parse tool-call arguments for chat templates and split implicit reasoning blocks