fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages (#10257)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
fix(vllm): parse tool_call function arguments before applying the chat template (#10256)
2026-06-11 18:27:32 -04:00 · 2026-06-11 23:57:00 +02:00 · 2026-06-11 23:55:38 +02:00 · 2026-06-11 18:33:58 +02:00 · 2026-06-11 18:33:38 +02:00 · 2026-06-11 18:32:50 +02:00
6 changed files with 21 additions and 4 deletions
--- a/1
+++ b/1
@@ -108,6 +108,7 @@ RUN <<EOT bash
        apt-get update && \
        apt-get install -y --no-install-recommends \
            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            cuda-nvrtc-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
--- a/backend/Dockerfile.python
+++ b/backend/Dockerfile.python
@@ -126,6 +126,7 @@ RUN <<EOT bash
        apt-get update && \
        apt-get install -y --no-install-recommends \
            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            cuda-nvrtc-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=039e20a2db9e87b2477c76cc04905f3e1acad77f
+LLAMA_VERSION?=ac4cddeb0dbd778f650bf568f6f08344a06abe3a
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=c29f6653a516a3001d923944dad8892072cc7334
+CRISPASR_VERSION?=4b27392ffd0991a857594652cbb8b57e585bcd7b
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/python/vllm/backend.py
+++ b/backend/python/vllm/backend.py
@@ -150,9 +150,24 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                d["reasoning_content"] = msg.reasoning_content
            if msg.tool_calls:
                try:
-                    d["tool_calls"] = json.loads(msg.tool_calls)
+                    tool_calls = json.loads(msg.tool_calls)
                except json.JSONDecodeError:
                    pass
+                else:
+                    # OpenAI wire format carries function.arguments as a
+                    # JSON-encoded string, but chat templates (e.g. Qwen3)
+                    # iterate over it as a mapping. vLLM's own OpenAI server
+                    # parses arguments before applying the template, so do
+                    # the same here.
+                    if isinstance(tool_calls, list):
+                        for tc in tool_calls:
+                            func = tc.get("function") if isinstance(tc, dict) else None
+                            if isinstance(func, dict) and isinstance(func.get("arguments"), str):
+                                try:
+                                    func["arguments"] = json.loads(func["arguments"])
+                                except json.JSONDecodeError:
+                                    pass
+                    d["tool_calls"] = tool_calls
            result.append(d)
        return result

--- a/docs/content/advanced/model-configuration.md
+++ b/docs/content/advanced/model-configuration.md
@@ -429,7 +429,7 @@ name: my-model
 reasoning_effort: none   # none | minimal | low | medium | high
 ```

-For [realtime pipelines]({{%relref "docs/features/openai-realtime" %}}), set it on the pipeline so it applies to the pipeline's LLM without editing that model's own config:
+For [realtime pipelines]({{%relref "features/openai-realtime" %}}), set it on the pipeline so it applies to the pipeline's LLM without editing that model's own config:

 ```yaml
 name: gpt-realtime
Author	SHA1	Message	Date
pos-ei-don	58cdc050e9	fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages (#10257) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-11 23:57:00 +02:00
pos-ei-don	b962f4a192	fix(vllm): parse tool_call function arguments before applying the chat template (#10256) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-11 23:55:38 +02:00
LocalAI [bot]	b6fcb3e1db	chore: ⬆️ Update CrispStrobe/CrispASR to `4b27392ffd0991a857594652cbb8b57e585bcd7b` (#10241) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-11 18:33:58 +02:00
LocalAI [bot]	ff09683d84	chore: ⬆️ Update ggml-org/llama.cpp to `ac4cddeb0dbd778f650bf568f6f08344a06abe3a` (#10239) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-11 18:33:38 +02:00
LocalAI [bot]	f618636c71	docs: fix broken relref to realtime page (#10255) Hugo fails the gh-pages build with REF_NOT_FOUND because the relref in model-configuration.md uses the 'docs/' prefix; refs are resolved relative to content/, so the page lives at 'features/openai-realtime' (as the other ref in the same file already uses). Assisted-by: Claude Code:claude-fable-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-11 18:32:50 +02:00
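
Reviewer note on b962f4a192 (#10256): the behaviour added to backend/python/vllm/backend.py can be exercised in isolation with a sketch like the one below. It is not the backend code itself. The helper name `normalize_tool_calls` and the sample `get_weather` payload are hypothetical, chosen only for illustration. The logic mirrors the hunk: decode the outer `tool_calls` JSON, then decode each `function.arguments` string into a mapping, and leave anything malformed untouched.

```python
import json


def normalize_tool_calls(raw):
    """Decode a JSON-encoded tool_calls string and parse function.arguments.

    Hypothetical standalone helper mirroring the logic in the diff.
    Returns the decoded value, or None if the outer JSON is invalid.
    """
    try:
        tool_calls = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(tool_calls, list):
        for tc in tool_calls:
            func = tc.get("function") if isinstance(tc, dict) else None
            if isinstance(func, dict) and isinstance(func.get("arguments"), str):
                try:
                    # OpenAI wire format: arguments is a JSON string;
                    # chat templates expect a mapping they can iterate.
                    func["arguments"] = json.loads(func["arguments"])
                except json.JSONDecodeError:
                    pass  # keep the malformed arguments as the original string
    return tool_calls


# Illustrative payload (made up for this sketch).
raw = json.dumps([
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Rome"}'},
    }
])
calls = normalize_tool_calls(raw)
print(type(calls[0]["function"]["arguments"]).__name__)
```

As in the diff, a call whose `arguments` string is not valid JSON is passed through unchanged rather than dropped, so one bad call does not discard the rest of the message.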