The newest cloud reasoning models reject two parameters that the
cloud-proxy backend currently sends:
- Anthropic (claude-opus-4-x) and OpenAI (gpt-5.x) return HTTP 400 when
temperature is present: "'temperature' is deprecated for this model".
The sampling values sent by OpenAI-compatible clients are typically
server-side defaults rather than user intent, so the translators now
forward neither temperature nor top_p and let the upstream apply its
own defaults.
- OpenAI gpt-5.x rejects max_tokens ("Unsupported parameter: 'max_tokens'
... Use 'max_completion_tokens' instead"). The OpenAI translator now
serializes the token limit as max_completion_tokens, which current
chat-completions models accept.
Verified live against claude-opus-4-8, gpt-5.5 and gemini-3.1-pro
(Gemini OpenAI-compat endpoint). Tests updated to the new contract.
Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: stefanwalcz <stefan.walcz@walcz.de>