From e1a782b70fcec6e79681e9b6078dd9eeb75cda17 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=B3=8A=E8=88=9F?= <wuj60463@gmail.com>
Date: Fri, 29 May 2026 17:39:09 +0800
Subject: [PATCH] fix(openai): stop streaming tool-call double-emission when
 autoparser is active (#10055)

Streaming /v1/chat/completions could emit the same logical tool call at
multiple `index` values. In processStreamWithTools the Go-side iterative
parser (ParseXMLIterative / ParseJSONIterative) runs on every token and
emits tool-call deltas, while the C++ chat-template autoparser delivers
its own tool calls via ChatDeltas that are flushed at end-of-stream by
ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths
active the same call is emitted twice at different indices, so OpenAI
clients that accumulate tool calls by `index` dispatch the tool once per
emitted index instead of once.

Skip the Go-side iterative parser once the autoparser is producing tool
calls (hasChatDeltaToolCalls). The deferred flush stays guarded by
lastEmittedCount, so a call the Go parser emitted before the flag flipped
is not emitted a second time. Backends without an autoparser
(e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.

Refs #9722

Signed-off-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>
Co-authored-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
---
 core/http/endpoints/openai/chat_stream_workers.go | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/core/http/endpoints/openai/chat_stream_workers.go b/core/http/endpoints/openai/chat_stream_workers.go
index 87f5ee21e..839f40676 100644
--- a/core/http/endpoints/openai/chat_stream_workers.go
+++ b/core/http/endpoints/openai/chat_stream_workers.go
@@ -341,6 +341,19 @@ func processStreamWithTools(
 			}
 		}
 
+		// Issue #9722: when the C++ autoparser is already producing tool
+		// calls (it delivers them via ChatDeltas, which are flushed at
+		// end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks),
+		// skip the Go-side iterative parser below. Running both parsers makes
+		// the same logical tool call surface at multiple `index` values.
+		// The deferred flush is guarded by lastEmittedCount, so the race where
+		// the Go parser already emitted before this flag flipped also stays
+		// single-emission. Backends without an autoparser (e.g. vLLM) keep
+		// hasChatDeltaToolCalls=false and are unaffected.
+		if hasChatDeltaToolCalls {
+			return true
+		}
+
 		// Try incremental XML parsing for streaming support using iterative parser
 		// This allows emitting partial tool calls as they're being generated
 		cleanedResult := functions.CleanupLLMResult(result, cfg.FunctionsConfig)