fix(python-backends): parse tool-call arguments for chat templates and split implicit reasoning blocks

Two bugs broke OpenAI-style tool calling on the MLX backend (and any Python backend sharing backend/python/common), reproduced end-to-end on LocalAI v4.5.5 with the metal-mlx backend and mlx-community/Qwen3.5-2B-MLX-8bit. messages_to_dicts left each tool call's function.arguments as the raw OpenAI-wire JSON string. HuggingFace chat templates (e.g. Qwen3.5) iterate arguments as a mapping (.items()), so any request whose history contained a prior assistant tool_calls message failed with HTTP 500 "Generation failed: Can only get item pairs from a mapping." — breaking every agent loop on its second turn. Decode the string back into a dict so the template sees a mapping. split_reasoning returned ("", text) whenever the opening think tag was absent. Models like Qwen3.5 open the assistant turn already inside thinking, so the generated text carries only the closing </think>; the whole chain-of-thought leaked into content. When the opener is missing but the closer is present, treat everything before the closer as reasoning. Adds platform-independent unit tests under backend/python/common (stdlib-only, no MLX/venv required, following parent_watch_test.py). Assisted-by: Claude Code:claude-opus-4-8
2026-07-04 05:16:42 -04:00 · 2026-07-03 08:06:59 +00:00
parent 715d4ed8e5
commit 4bf73a7e22
4 changed files with 218 additions and 2 deletions
--- a/backend/python/common/mlx_utils.py
+++ b/backend/python/common/mlx_utils.py
@@ -20,7 +20,15 @@ def split_reasoning(text, think_start, think_end):
    Returns ``(reasoning_content, remaining_text)``. When ``think_start`` is
    empty or not found, returns ``("", text)`` unchanged.
    """
-    if not think_start or not text or think_start not in text:
+    if not think_start or not text:
+        return "", text
+    if think_start not in text:
+        # Models like Qwen3.5 open assistant turns already INSIDE thinking, so
+        # the generated text carries only the closing tag. Everything before it
+        # is reasoning that would otherwise leak into the content.
+        if think_end and think_end in text:
+            head, _, tail = text.partition(think_end)
+            return head.strip(), tail.strip()
        return "", text
    pattern = re.compile(
        re.escape(think_start) + r"(.*?)" + re.escape(think_end or ""),