LocalAI/backend/cpp/llama-cpp-localai-paged/docs at 7b129a51f1fa4ef04cc784d276b80eea514e6807

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 04:46:54 -04:00

Ettore Di Giacinto 7b129a51f1 docs(paged): finalize P4 CBv2 record with the measured A/B verdict

The forced-report placeholders are replaced with the completed 60/60-raw A/B
from dgx:~/bench/p4_cbv2/perf_20260702_194359/RESULTS.md: NO-GO confirmed by
measurement, and stronger than flat. CBv2 fair-share chunked prefill regresses
TTFT under staggered load (N=32 p50 +33.6%, N=128 p50 +15.5%) and regresses
aggregate/decode -6.9% beyond noise at staggered N=128. Analysis recorded:
processor-sharing delays near-uniform prompt completion by construction; the
scheduler-shaped-TTFT premise is partially refuted for GB10 (patch 0016 already
captures the schedulable win); TTFT parity routes through P3/P5 prefill compute.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-07-02 18:09:55 +00:00
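
The commit message's claim that processor sharing delays the completion of near-uniform prompts "by construction" can be illustrated with a toy model. The sketch below is not the CBv2 scheduler and uses none of the measured numbers. It assumes a single server that processes one prefill chunk per time step, prompts of identical length, and no decode work or batching effects. The function name `simulate` and the policy labels `"fcfs"` and `"fair"` are hypothetical, chosen only for this illustration.

```python
from statistics import median


def simulate(arrivals, chunks, policy):
    """Return per-prompt prefill latency (finish step minus arrival step).

    arrivals: arrival step of each prompt, sorted ascending
    chunks:   prefill chunks each prompt needs (uniform prompts, an assumption)
    policy:   "fcfs" runs the oldest unfinished prompt to completion;
              "fair" round-robins one chunk at a time over arrived prompts
    """
    n = len(arrivals)
    remaining = [chunks] * n
    finish = [None] * n
    t = 0
    last = -1  # index served most recently (used by the round-robin policy)
    while any(f is None for f in finish):
        ready = [i for i in range(n) if arrivals[i] <= t and remaining[i] > 0]
        if not ready:
            t += 1  # server idles until the next arrival
            continue
        if policy == "fcfs":
            i = ready[0]
        else:
            # next ready index after `last`, wrapping around to the first
            i = next((j for j in ready if j > last), ready[0])
        remaining[i] -= 1
        last = i
        t += 1
        if remaining[i] == 0:
            finish[i] = t
    return [finish[i] - arrivals[i] for i in range(n)]


if __name__ == "__main__":
    for label, arrivals in [("burst", [0, 0, 0, 0]), ("staggered", [0, 2, 4, 6])]:
        fcfs = simulate(arrivals, 4, "fcfs")
        fair = simulate(arrivals, 4, "fair")
        print(label, "fcfs p50:", median(fcfs), "fair p50:", median(fair))
```

In this toy model the last prompt finishes at the same step under both policies, since total work is conserved, but fair sharing pushes every earlier prompt's completion toward the end, so the median latency rises. Real serving adds batching efficiency and decode interleaving, which this sketch deliberately leaves out.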

ACCELERATOR_PORTING_SCOPE.md

docs(paged): scope porting the portable benefits to Metal/SYCL/Vulkan (+ROCm)

2026-06-28 08:34:32 +00:00

BENCHMARK.md

docs(paged): record phases 112-140 + series trim decision

2026-07-02 10:16:53 +00:00

DECODE_SERVING_SCOPE.md

docs(paged): record padded/fixed-slot decode shape as tested-and-rejected

2026-06-28 20:47:43 +00:00

EXECUTION_REARCH_SCOPE.md

docs(paged): finalize P4 CBv2 record with the measured A/B verdict

2026-07-02 18:09:55 +00:00

final_benchmark.csv

paged: drop bf16-tau (patch 0026), subsumed by decode fusions (tau=100000 flat, zero speed benefit)

2026-06-28 16:06:06 +00:00

GB10_PARITY_PHASE0_RESULTS.md

docs(paged): record BF16 F32 output broader serving phase

2026-07-01 13:26:50 +00:00

GB10_PARITY_REOPEN_SPEC.md

docs(paged): scope GB10 parity reopen plan

2026-06-30 15:44:11 +00:00

GDN_SHARED_AI_COST_MODEL.md

docs(paged): reject GDN global Ai32 prototype

2026-07-01 01:51:53 +00:00

LOCALAI_LLAMACPP_BACKEND_PLAN.md

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

2026-06-27 13:20:05 +00:00

PAGED_BITEXACT_NOTE.md

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

2026-06-27 13:20:05 +00:00

paged-burst-bench.cpp

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

2026-06-27 13:20:05 +00:00

paged-reclaim-unit.cpp

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

2026-06-27 13:20:05 +00:00

PARITY_HANDOFF.md

docs(paged): finalize P4 CBv2 record with the measured A/B verdict

2026-07-02 18:09:55 +00:00

PATCH_MAINTENANCE.md

docs(paged): record patch mirror readiness phase

2026-07-01 13:11:57 +00:00

PREFILL_GEMM_RESULTS.md

feat(paged): FP4 prefill large-M dequant->bf16 cuBLAS scaffold (patch 0033, default-off)

2026-06-28 17:42:15 +00:00

PREFILL_GEMM_SCOPE.md

docs(paged): scope the large-M NVFP4 prefill GEMM lever (design only)

2026-06-28 16:42:23 +00:00

qwen36_decode_overview.png

docs(paged): add the bf16-tau opt-in line to the decode plots

2026-06-27 22:25:02 +00:00

qwen36_dense_decode_vs_npl.png

docs(paged): add the bf16-tau opt-in line to the decode plots

2026-06-27 22:25:02 +00:00

qwen36_moe_decode_vs_npl.png

docs(paged): add the bf16-tau opt-in line to the decode plots

2026-06-27 22:25:02 +00:00

TENSORCORE_GDN_BUILD_PLAN.md

docs(paged): record GDN tensor-core revalidation phase

2026-07-01 14:05:20 +00:00

TENSORCORE_GDN_SCOPE.md

docs(paged): scope tensor-core (mma) chunked GDN prefill kernel

2026-06-28 17:23:51 +00:00

UPSTREAM_LAYER2_SCOPE.md

docs(paged): scope porting the portable benefits to Metal/SYCL/Vulkan (+ROCm)

2026-06-28 08:34:32 +00:00

VLLM_PARITY_FINAL.md

docs(paged): profile MTP graph reuse loss

2026-07-01 02:32:49 +00:00

VLLM_PARITY_LEVER_MAP.md

docs(paged): record datacenter Blackwell readiness phase

2026-07-01 14:28:41 +00:00