mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-02 12:26:49 -04:00

pos-ei-don 7910018249 fix(vllm): non-streaming tool-call regression after #10351 (#10638)

fix(vllm): non-streaming tool-call regression after #10351 (native_streaming is a capability flag, not a state flag)

#10351 introduced native streaming via `parser.extract_tool_calls_streaming`
and gated the post-loop `extract_tool_calls` block on `native_streaming and
not native_streaming_error`. That works for streaming requests, but for
non-streaming requests the same flag is still True (it only means "the
parser can stream", not "we actually streamed"), so the block was skipped
and the `elif` branch set `content = ""`. The tool call was silently lost.

Symptom: non-streaming chat.completions with `tools=[...]` returns
`finish_reason: "stop"` with `content: ""` and no `tool_calls`. Streaming
requests are unaffected.

Fix: gate both branches on `streaming` too, so the extract_tool_calls
block runs for non-streaming requests (and for streaming requests that
fell back to the buffered path).
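
The gating change can be sketched as a pair of predicates. This is a minimal illustration, not the code in backend.py: the function names are hypothetical, and only the three flag names (`streaming`, `native_streaming`, `native_streaming_error`) come from the description above.

```python
def skips_buffered_extraction_before(native_streaming: bool,
                                     native_streaming_error: bool) -> bool:
    """Gate as described for #10351: ignores whether the request streamed."""
    return native_streaming and not native_streaming_error


def skips_buffered_extraction_after(streaming: bool,
                                    native_streaming: bool,
                                    native_streaming_error: bool) -> bool:
    """Gate after the fix: skip only if the request really streamed natively."""
    return streaming and native_streaming and not native_streaming_error


if __name__ == "__main__":
    # Non-streaming request with a parser that is capable of streaming.
    # Old gate: the buffered extract_tool_calls block is skipped (prints True).
    print(skips_buffered_extraction_before(True, False))
    # New gate: the block runs because streaming is False (prints False).
    print(skips_buffered_extraction_after(False, True, False))
```

With the extra `streaming` term, a streaming request that hit a native streaming error also evaluates to False, which matches the "fell back to the buffered path" case.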

Reproduction (vLLM 0.24, Qwen3-Coder-Next-NVFP4, qwen3_coder parser):

    curl -s -X POST http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"coder","stream":false,
           "messages":[{"role":"user","content":"7*8 via calc"}],
           "tools":[{"type":"function","function":{"name":"calc",
             "parameters":{"type":"object",
               "properties":{"expression":{"type":"string"}}}}}]}'

Before: finish_reason: "stop", content: "", tool_calls: []
After:  finish_reason: "tool_calls", tool_calls[0].function.name: "calc"
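
A small check for the symptom, assuming the response follows the usual OpenAI-style chat.completions shape (`choices[0].finish_reason`, `choices[0].message`). The helper name and both sample dicts are hypothetical; the samples are hand-written from the Before/After lines above, not captured server output.

```python
def lost_tool_call(response: dict) -> bool:
    """Return True if a response shows the regression symptom:
    finish_reason "stop", empty content, and no tool calls."""
    choice = response["choices"][0]
    message = choice.get("message", {})
    return (
        choice.get("finish_reason") == "stop"
        and not message.get("content")
        and not message.get("tool_calls")
    )


# Hand-written samples mirroring the Before/After lines (assumed shapes).
before = {"choices": [{"finish_reason": "stop",
                       "message": {"content": "", "tool_calls": []}}]}
after = {"choices": [{"finish_reason": "tool_calls",
                      "message": {"content": "",
                                  "tool_calls": [{"function": {"name": "calc"}}]}}]}

print(lost_tool_call(before))  # True
print(lost_tool_call(after))   # False
```

Feeding the parsed JSON from the curl command above into `lost_tool_call` gives a quick pass/fail signal for the non-streaming path.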

Streaming path re-verified in the same setup: delta.tool_calls arrives
token-by-token, finish_reason: "tool_calls", no raw XML in content.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

2026-07-02 09:26:14 +02:00

backend.py

fix(vllm): non-streaming tool-call regression after #10351 (#10638)

2026-07-02 09:26:14 +02:00

install.sh

chore: ⬆️ Update vllm-metal (darwin) to v0.3.0.dev20260630095652 (#10616)

2026-07-01 21:56:59 +02:00

Makefile

feat(mlx): add mlx backend (#6049)

2025-08-22 08:42:29 +02:00

package.sh

feat(vllm, distributed): tensor parallel distributed workers (#9612)

2026-05-06 00:22:50 +02:00

README.md

refactor: move backends into the backends directory (#1279)

2023-11-13 22:40:16 +01:00

requirements-after.txt

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

requirements-cpu-after.txt

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

requirements-cpu.txt

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

requirements-cublas12-after.txt

fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)

2026-04-25 15:38:13 +00:00

requirements-cublas12.txt

fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)

2026-04-25 15:38:13 +00:00

requirements-cublas13-after.txt

chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.24.0 (#10618)

2026-07-01 08:53:03 +02:00

requirements-cublas13.txt

feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)

2026-04-25 12:26:29 +02:00

requirements-hipblas-after.txt

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

requirements-hipblas.txt

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

requirements-install.txt

fix(vllm): seed pybind11 for fastsafetensors build under --no-build-isolation

2026-04-28 20:08:26 +00:00

requirements-intel-after.txt

feat(vllm, distributed): tensor parallel distributed workers (#9612)

2026-05-06 00:22:50 +02:00

requirements-intel.txt

feat(vllm, distributed): tensor parallel distributed workers (#9612)

2026-05-06 00:22:50 +02:00

requirements-l4t13-after.txt

fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950)

2026-05-22 23:01:22 +02:00

requirements-l4t13.txt

fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950)

2026-05-22 23:01:22 +02:00

requirements.txt

chore(deps): bump grpcio from 1.81.0 to 1.81.1 in /backend/python/vllm (#10347)

2026-06-15 22:57:38 +02:00

run.sh

fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)

2026-05-06 00:28:01 +02:00

test.py

feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346) (#10351)

2026-06-21 17:07:15 +02:00

test.sh

feat: Add backend gallery (#5607)

2025-06-15 14:56:52 +02:00

README.md

Creating a separate environment for the vllm project

make vllm