fix(openai): stop max_tokens streaming retry loop on reasoning models

When a thinking model spends its entire max_tokens budget on the reasoning
block, the C++ autoparser clears the raw Response and delivers reasoning-only
ChatDeltas (no content, no tool calls). ComputeChoices' empty-response retry
then fires and regenerates from scratch up to maxRetries times, each attempt
re-consuming the whole budget, instead of terminating with finish_reason
"length" (issue #9716).

Add a reachedTokenBudget helper and suppress both the built-in and
caller-driven retries once the completion token count reaches the configured
max_tokens ceiling. Report finish_reason "length" instead of "stop" in both
the streaming and non-streaming chat paths when the budget is exhausted.
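For reviewers, a minimal Go sketch of the budget check and the resulting
finish_reason choice. It assumes plain int token counts and treats
maxTokens <= 0 as "no ceiling configured"; finishReason is a hypothetical
helper for illustration only, and it ignores the "tool_calls" case that the
real chat paths may also emit.

```go
package main

import "fmt"

// reachedTokenBudget reports whether the completion has consumed the
// configured max_tokens ceiling. Assumption: maxTokens <= 0 means no
// limit was configured, so the budget can never be reached.
func reachedTokenBudget(completionTokens, maxTokens int) bool {
	return maxTokens > 0 && completionTokens >= maxTokens
}

// finishReason is a hypothetical helper that picks the OpenAI
// finish_reason for a plain text completion (tool calls not modelled).
func finishReason(completionTokens, maxTokens int) string {
	if reachedTokenBudget(completionTokens, maxTokens) {
		return "length"
	}
	return "stop"
}

func main() {
	fmt.Println(reachedTokenBudget(99, 100))  // false: one token short of the ceiling
	fmt.Println(reachedTokenBudget(100, 100)) // true: exactly at the ceiling
	fmt.Println(reachedTokenBudget(500, 0))   // false: no ceiling configured
	fmt.Println(finishReason(100, 100))       // length
}
```

Using >= rather than == keeps the check correct if a backend overshoots the
ceiling by a token or two.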

Add a deterministic regression test that counts backend invocations
(6 before this change, 1 after), plus boundary tests for the helper.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Dennisadira <dennisadira@gmail.com>