mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-14 11:49:33 -04:00
* fix(grammars): honor properties_order entry at index 0

The JSON-schema-to-GBNF property sort used `aOrder != 0 && bOrder != 0` as its "is this key ordered?" guard. That treats index 0 (the first key listed in properties_order) as unset, so `properties_order: name,arguments` fell back to alphabetical ordering and still emitted "arguments" before "name".

Use presence in the order map instead: listed keys sort by their index and ahead of unlisted keys, which keep a stable alphabetical order. This makes the documented `properties_order: name,arguments` actually produce name-first tool-call JSON.

Relates to #10052.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(functions): defer tool grammar to the backend when the tokenizer template owns templating (#10052)

When use_tokenizer_template delegates templating to the backend (llama.cpp), the backend also owns tool-call grammar generation and parsing. LocalAI was still generating its own GBNF grammar and sending it down. With a grammar present, llama.cpp does not hand the tools to its template, so its native peg/json tool parser never engages: it streams the grammar-constrained tool-call JSON back as plain content instead of emitting tool_calls. In streaming mode the JSON object leaked into the content field, and the Go-side incremental detector never gated content because the LocalAI-generated grammar emitted "arguments" before "name".

The GGUF auto-import path already couples use_tokenizer_template with grammar.disable, but that block is skipped when a template is already configured, so gallery and hand-written configs (e.g. qwen3) that set the tokenizer template directly never got the paired grammar.disable.

- SetDefaults now enforces the coupling for every config: when use_tokenizer_template is set, grammar generation is disabled and tools flow to the backend's native (name-first) pipeline. This also fixes already-installed models without editing each config.
- Set function.grammar.disable in the shared gallery/qwen3.yaml, which is the base config referenced by every qwen3 gallery entry.

Verified end to end against qwen3-4b with stream:true + tools: content no longer carries the tool-call JSON, reasoning is classified separately, and tool calls stream as proper name-first tool_calls deltas.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
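The index-0 bug described in the first commit can be reproduced in a few lines. The following is a minimal Python sketch of the logic only, not the LocalAI Go code: `order_map`, `buggy_sort`, and `fixed_sort` are hypothetical names, and the old comparator is reconstructed from the guard quoted in the commit message, assuming that a missing key looks up as 0.

```python
from functools import cmp_to_key


def order_map(properties_order: str) -> dict[str, int]:
    """Map each key listed in a comma-separated properties_order to its index."""
    keys = [k.strip() for k in properties_order.split(",") if k.strip()]
    return {key: index for index, key in enumerate(keys)}


def buggy_sort(keys: list[str], order: dict[str, int]) -> list[str]:
    """Old behaviour (assumed): index 0 is indistinguishable from 'not listed'."""
    def compare(a: str, b: str) -> int:
        a_order, b_order = order.get(a, 0), order.get(b, 0)
        if a_order != 0 and b_order != 0:
            return a_order - b_order
        # Falls through for the first listed key, so ordering is alphabetical.
        return (a > b) - (a < b)

    return sorted(keys, key=cmp_to_key(compare))


def fixed_sort(keys: list[str], order: dict[str, int]) -> list[str]:
    """Listed keys sort by index and come first; unlisted keys sort alphabetically."""
    def sort_key(key: str) -> tuple[int, int, str]:
        if key in order:  # presence in the map, not a non-zero index
            return (0, order[key], "")
        return (1, 0, key)

    return sorted(keys, key=sort_key)


order = order_map("name,arguments")
print(buggy_sort(["name", "arguments"], order))
print(fixed_sort(["name", "arguments"], order))
```

Testing for presence in the map, rather than for a non-zero index, is what lets the first listed key take part in the ordering.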
30 lines
1015 B
YAML
config_file: |
    backend: llama-cpp
    known_usecases:
    - chat
    parameters:
        context_size: 8192
        f16: true
        mmap: true
        stopwords:
        - <|im_end|>
        - <dummy32000>
        - </s>
        - <|endoftext|>
    # Delegate templating to llama.cpp's jinja runtime so the C++ autoparser
    # can classify <think>…</think> blocks into reasoning_content natively
    # (issue #9985). Without use_jinja the autoparser falls back to a
    # "pure content" PEG parser that leaks reasoning tags into content.
    options:
    - use_jinja:true
    # With use_tokenizer_template the backend (llama.cpp) owns tool-call
    # grammar generation and parsing too. Disabling LocalAI's own grammar lets
    # llama.cpp's native name-first tool pipeline run; otherwise the generated
    # grammar overrides it and the tool-call JSON leaks into content (#10052).
    function:
        grammar:
            disable: true
    template:
        use_tokenizer_template: true
name: qwen3
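The coupling between `template.use_tokenizer_template` and `function.grammar.disable` that this file sets by hand can also be expressed as a defaulting rule. The sketch below is a rough Python illustration on a plain dictionary, assuming the config has already been parsed; `apply_template_defaults` is a hypothetical helper and is not LocalAI's SetDefaults implementation.

```python
import copy


def apply_template_defaults(config: dict) -> dict:
    """Return a copy of config with grammar generation disabled whenever
    templating is delegated to the backend's tokenizer template."""
    cfg = copy.deepcopy(config)
    template = cfg.get("template") or {}
    if template.get("use_tokenizer_template"):
        # The backend owns tool-call grammar and parsing in this mode,
        # so the locally generated grammar must not be sent.
        function = cfg.get("function") or {}
        grammar = function.get("grammar") or {}
        grammar["disable"] = True
        function["grammar"] = grammar
        cfg["function"] = function
    return cfg


# A hand-written config that sets the tokenizer template but forgets the pairing.
handwritten = {"backend": "llama-cpp", "template": {"use_tokenizer_template": True}}
print(apply_template_defaults(handwritten)["function"])
```

Applying the rule at load time, instead of relying on each YAML file to carry both keys, is what lets already-installed configs pick up the fix without being edited.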