LocalAI/backend/cpp
Ettore Di Giacinto 7b129a51f1 docs(paged): finalize P4 CBv2 record with the measured A/B verdict
Replace the forced-report placeholders with the completed 60/60-raw A/B from
dgx:~/bench/p4_cbv2/perf_20260702_194359/RESULTS.md. NO-GO is confirmed by
measurement, and the verdict is stronger than flat: CBv2 fair-share chunked
prefill regresses TTFT under staggered load (N=32 p50 +33.6%, N=128 p50
+15.5%) and regresses aggregate/decode by 6.9%, beyond noise, at staggered
N=128. Analysis recorded:
- processor sharing delays the completion of near-uniform prompts by
  construction;
- the scheduler-shaped-TTFT premise is partially refuted for GB10 (patch 0016
  already captures the schedulable win);
- TTFT parity routes through P3/P5 prefill compute.
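
The processor-sharing point can be illustrated with a toy model. The sketch
below is an idealized single-server simulation, not the CBv2 scheduler: the
function names, the unit-work prompts, and the use of prefill completion time
as a stand-in for TTFT are all assumptions made for illustration, and the
numbers it produces are unrelated to the measured A/B results.

```python
from statistics import median


def fcfs_ttft(arrivals, work):
    """Run each prompt's prefill to completion in arrival order (hypothetical baseline)."""
    t, ttft = 0.0, []
    for a, w in zip(arrivals, work):
        t = max(t, a) + w          # start when both server and prompt are ready
        ttft.append(t - a)         # prefill completion time as a TTFT proxy
    return ttft


def fair_share_ttft(arrivals, work):
    """Idealized processor sharing: active prompts split prefill capacity equally."""
    n = len(arrivals)
    remaining = list(work)
    done = [None] * n
    t = 0.0
    while any(d is None for d in done):
        active = [i for i in range(n) if done[i] is None and arrivals[i] <= t]
        future = [arrivals[i] for i in range(n) if done[i] is None and arrivals[i] > t]
        if not active:             # server idle: jump to the next arrival
            t = min(future)
            continue
        rate = 1.0 / len(active)   # equal share of capacity per active prompt
        dt_finish = min(remaining[i] for i in active) / rate
        dt_arrive = (min(future) - t) if future else float("inf")
        dt = min(dt_finish, dt_arrive)
        for i in active:
            remaining[i] -= rate * dt
        t += dt
        for i in active:
            if remaining[i] <= 1e-12:
                done[i] = t
    return [d - a for d, a in zip(done, arrivals)]


if __name__ == "__main__":
    # Four equal-length prompts arriving 0.5 time units apart (assumed workload).
    arrivals, work = [0.0, 0.5, 1.0, 1.5], [1.0, 1.0, 1.0, 1.0]
    print("FCFS p50:", median(fcfs_ttft(arrivals, work)))
    print("fair-share p50:", median(fair_share_ttft(arrivals, work)))
```

In this toy setting, sharing capacity among equal-length prompts pushes most
of them toward the end of the batch, so the median completion time is later
than when prompts are served one at a time. Real chunked prefill has batching
effects this model ignores.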

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-02 18:09:55 +00:00