LocalAI/backend/cpp/llama-cpp
Ettore Di Giacinto 362eea90ff docs(paged): fair re-run verdict - synthesize NVFP4 llama vs vLLM scorecard
Phase 3 synthesis of the max_prefill_tokens (patch 0013) fair re-run:
how much of the gap was prefill starvation, the genuine remaining gap
to vLLM, and where par-or-beat stands per concurrency/model.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-23 21:39:22 +00:00