LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-26 01:16:58 -04:00

Files

Ettore Di Giacinto 7dd3431040 docs(paged): promote TTFT/prefill + paged-pool burst-degradation bug (benchmark finding)

The final benchmark exposed TTFT as the weakest number (dense npl128 903s vs vLLM
6-18s, decode-first budget throttling burst-prefill) plus a concrete paged-pool
burst-degradation bug (post-burst low-npl prefill collapses 507->65 t/s; decode
unaffected). Highest-value serving fix; decode + memory already strong.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-26 03:54:27 +00:00

paged

feat(paged): target-readiness for 2xH200 - correctness PASS, load-gen harness, projection

2026-06-21 23:16:28 +00:00

patches

docs(paged): promote TTFT/prefill + paged-pool burst-degradation bug (benchmark finding)

2026-06-26 03:54:27 +00:00

CMakeLists.txt

feat: single-build ggml CPU_ALL_VARIANTS for llama-cpp + turboquant (x86/arm64/apple) (#10497)

2026-06-25 15:47:03 +02:00

grpc-server.cpp

Merge branch 'master' into worktree-feat+paged-attention