LocalAI

Mirror of https://github.com/mudler/LocalAI.git, synced 2026-06-24 16:49:06 -04:00

Files

Ettore Di Giacinto 362eea90ff docs(paged): fair re-run verdict - synthesize NVFP4 llama vs vLLM scorecard

Phase 3 synthesis of the max_prefill_tokens (patch 0013) fair re-run:
how much of the gap was prefill starvation, the genuine remaining gap
to vLLM, and where par-or-beat stands per concurrency/model.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-23 21:39:22 +00:00

paged

feat(paged): target-readiness for 2xH200 - correctness PASS, load-gen harness, projection

2026-06-21 23:16:28 +00:00

patches

docs(paged): fair re-run verdict - synthesize NVFP4 llama vs vLLM scorecard

2026-06-23 21:39:22 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat(llama-cpp): per-model max_prefill_tokens option (chunked-prefill QoS budget)