Mirror of https://github.com/mudler/LocalAI.git (synced 2026-06-26 01:16:58 -04:00)
The final benchmark exposed TTFT as the weakest number: dense npl128 takes 903s vs 6-18s for vLLM, because the decode-first budget throttles burst prefill. It also exposed a concrete paged-pool burst-degradation bug: after a burst, low-npl prefill collapses from 507 to 65 t/s, while decode is unaffected. This is the highest-value serving fix; decode and memory are already strong.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>