Mirror of https://github.com/mudler/LocalAI.git (synced 2026-07-03 04:46:54 -04:00)
The forced-report placeholders are replaced with the completed 60/60-raw A/B
results from dgx:~/bench/p4_cbv2/perf_20260702_194359/RESULTS.md. NO-GO is
confirmed by measurement, and the case is stronger than a flat result.

CBv2 fair-share chunked prefill regresses TTFT under staggered load
(N=32: p50 +33.6%; N=128: p50 +15.5%) and regresses aggregate/decode by 6.9%,
beyond noise, at staggered N=128.

Analysis recorded:
- Processor sharing delays the completion of near-uniform prompts by
  construction.
- The scheduler-shaped-TTFT premise is partially refuted for GB10 (patch 0016
  already captures the schedulable win).
- TTFT parity routes through P3/P5 prefill compute.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
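The processor-sharing point in the analysis can be illustrated with a toy model. The sketch below is not the CBv2 scheduler and does not reproduce the benchmark: the function names, the chunk size, and the abstract time units are all assumptions made for illustration. It compares running each prompt's prefill to completion in arrival order against a round-robin that gives every active prompt one chunk per turn, and reports the per-prompt prefill completion latency as a stand-in for TTFT.

```python
from statistics import median


def fcfs_latencies(arrivals, work):
    """Run each prompt's prefill to completion, in arrival order."""
    t = 0.0
    latencies = []
    for arrived, cost in zip(arrivals, work):
        t = max(t, arrived) + cost
        latencies.append(t - arrived)
    return latencies


def fair_share_latencies(arrivals, work, chunk=1):
    """Round-robin one chunk per turn across arrived, unfinished prompts.

    Hypothetical stand-in for fair-share chunked prefill.
    """
    remaining = list(work)
    latencies = [None] * len(work)
    t = 0.0
    while any(lat is None for lat in latencies):
        active = [i for i, arrived in enumerate(arrivals)
                  if arrived <= t and latencies[i] is None]
        if not active:
            # Idle until the next unfinished prompt arrives.
            t = min(arrived for i, arrived in enumerate(arrivals)
                    if latencies[i] is None)
            continue
        for i in active:
            step = min(chunk, remaining[i])
            t += step
            remaining[i] -= step
            if remaining[i] == 0:
                latencies[i] = t - arrivals[i]
    return latencies


# Four identical prompts arriving together, 4 units of prefill each.
arrivals = [0, 0, 0, 0]
work = [4, 4, 4, 4]
fcfs = fcfs_latencies(arrivals, work)
fair = fair_share_latencies(arrivals, work)
print(median(fcfs), max(fcfs))  # 10.0 16.0
print(median(fair), max(fair))  # 14.5 16.0
```

In this toy case the last prompt finishes at the same time under both policies, but fair sharing pushes every earlier completion toward the end, so the median rises from 10.0 to 14.5 units. With near-uniform prompt sizes there is no short job for sharing to rescue, which is the sense in which the delay arises by construction.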