mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-30 19:37:00 -04:00
fix(functions): avoid quadratic-time debug logging in CleanupLLMResult/ParseFunctionCall The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go) calls CleanupLLMResult / ParseFunctionCall once per delta chunk with the *full accumulated* LLM result so far. Both functions xlog.Debug the entire argument on entry and exit, so a single N-chunk stream emits roughly chunk_size * N^2 bytes of debug output. Under LOG_LEVEL=debug this was observed in a recent SGLang-via-LocalAI session on a DGX Spark host (about 50K tokens, long streaming generation) to drive container logs to ~96 GiB, which interacted with the streaming hot loop on the same filesystem and contributed to a host-wide hard hang once disk pressure built up. Workaround was setting LOG_LEVEL=info, but the quadratic shape remains a foot-gun for anyone intentionally enabling debug. Replace the four result-content debug arguments with len(...) plus a fixed-size head (200 bytes via a new truncForLog helper), bounding per- call output to a constant. The debug signal stays useful: the first 200 chars are enough to identify which generation is in flight, and the length lets you observe growth without paying for the payload itself. No API change. No behaviour change for LOG_LEVEL != debug. Signed-off-by: Poseidon <philipp.wacker@ibf-solutions.com> Co-authored-by: Poseidon <philipp.wacker@ibf-solutions.com>