LocalAI/backend/cpp/llama-cpp-localai-paged/docs/final_benchmark.csv
Ettore Di Giacinto 08b754f910 chore(paged): keep patches/ patch-only; README to backend root, docs to docs/
The llama-cpp-localai-paged patches/ dir had accumulated docs, plots, a csv,
dev .cpp harnesses, and a dead FP4-MoE kernel scaffold after an earlier git-mv.
Restore the invariant that patches/ holds only the .patch series.

Moves:
- patches/paged/README.md -> README.md (canonical doc at the backend root)
- patches/paged/{PIN_SYNC_c299a92c,PAGED_BITEXACT_NOTE,LOCALAI_LLAMACPP_BACKEND_PLAN,UPSTREAM_LAYER2_SCOPE}.md,
  final_benchmark.csv, qwen36_*.png, paged-burst-bench.cpp, paged-reclaim-unit.cpp -> docs/
- patches/README.md -> docs/PATCH_MAINTENANCE.md (unique patch-regen recipe not in the canonical README)

Deletes:
- patches/BENCHMARKS.md (superseded by README section 4 + the dev-notes section)
- patches/kernel/ (dead FP4-MoE scaffold, never in the 0001-0030 apply glob, zero refs repo-wide)

Repoint every reference to the moved files: README internal links (docs/ + the
.github links drop from 5x ../ to 3x ../), .agents/llama-cpp-localai-paged-backend.md,
.github/scripts/paged-canary-apply.sh, .github/workflows/llama-cpp-paged-canary.yml,
the wrapper Makefile, backend/cpp/llama-cpp/grpc-server.cpp, backend/index.yaml,
docs/content/features/backends.md, gallery/index.yaml.

The build apply glob PAGED_PATCHES_DIR/0*.patch (PAGED_PATCHES_DIR := .../patches/paged)
is unchanged and still resolves to the 28 patches.
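A minimal sketch of how the glob claim above could be sanity-checked locally. The helper name `count_series_patches` is hypothetical and not part of the repository; the directory argument stands in for whatever PAGED_PATCHES_DIR expands to in the wrapper Makefile. It assumes the build glob is non-recursive, so only files directly inside the directory are counted.

```python
from pathlib import Path


def count_series_patches(patches_dir: str) -> int:
    """Count files matching 0*.patch directly inside patches_dir.

    Hypothetical helper: mirrors the PAGED_PATCHES_DIR/0*.patch glob from
    the commit message, assuming the glob does not descend into subdirs.
    """
    # Path.glob with a pattern that has no "**" only matches direct children.
    return len(list(Path(patches_dir).glob("0*.patch")))
```

Called on the real patches/paged directory, the result would be expected to equal the 28 stated above; any other count would suggest a stray or missing file.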

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 13:20:05 +00:00


model,engine,npl,decode_agg_tps,decode_perseq_tps,prefill_tps,ttft_mean_ms,peak_gb
q36-27b-nvfp4,llama,8,82.5,9.57,507.3,6038.1,53.51
q36-27b-nvfp4,llama,32,192.6,4.79,115.0,133551.7,69.63
q36-27b-nvfp4,llama,64,277.8,3.09,95.9,321618.8,83.96
q36-27b-nvfp4,llama,128,384.6,1.86,69.7,902762.7,93.82
q36-27b-nvfp4,vllm,8,70.4,8.76,2096.2,1861.1,110.92
q36-27b-nvfp4,vllm,32,211.8,6.28,2182.6,5353.2,110.87
q36-27b-nvfp4,vllm,64,309.1,4.38,2088.9,9512.4,110.88
q36-27b-nvfp4,vllm,128,418.8,2.79,1929.1,18449.5,110.95
q36-35b-a3b-nvfp4,llama,8,211.8,24.45,1236.4,2477.1,39.66
q36-35b-a3b-nvfp4,llama,32,393.0,10.02,1213.9,8225.2,47.11
q36-35b-a3b-nvfp4,llama,64,527.0,6.15,1152.3,15849.5,57.13
q36-35b-a3b-nvfp4,llama,128,726.4,3.73,276.8,213017.2,61.51
q36-35b-a3b-nvfp4,vllm,8,256.5,31.84,5186.5,768.8,109.62
q36-35b-a3b-nvfp4,vllm,32,500.8,14.90,6223.4,1830.4,109.63
q36-35b-a3b-nvfp4,vllm,64,686.1,9.83,5926.5,3224.4,109.63
q36-35b-a3b-nvfp4,vllm,128,882.2,6.05,5300.5,6487.7,109.64
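
The table above can be compared engine against engine with a short script. This is a sketch under stated assumptions: the function name `decode_ratio` is hypothetical, `npl` is presumably the number of parallel sequences, and `decode_agg_tps` is presumably aggregate decode tokens per second (the file itself does not define its columns). `SAMPLE` copies two rows from the table for illustration only.

```python
import csv
import io


def decode_ratio(csv_text: str) -> dict:
    """Map (model, npl) -> llama decode_agg_tps divided by vllm decode_agg_tps."""
    tps = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["model"], int(row["npl"]), row["engine"])
        tps[key] = float(row["decode_agg_tps"])

    ratios = {}
    for (model, npl, engine), value in tps.items():
        # Only emit a ratio when both engines have a row for this setting.
        if engine == "llama" and (model, npl, "vllm") in tps:
            ratios[(model, npl)] = value / tps[(model, npl, "vllm")]
    return ratios


# Two rows copied from the table above, for illustration only.
SAMPLE = (
    "model,engine,npl,decode_agg_tps,decode_perseq_tps,prefill_tps,ttft_mean_ms,peak_gb\n"
    "q36-27b-nvfp4,llama,8,82.5,9.57,507.3,6038.1,53.51\n"
    "q36-27b-nvfp4,vllm,8,70.4,8.76,2096.2,1861.1,110.92\n"
)

if __name__ == "__main__":
    for (model, npl), ratio in sorted(decode_ratio(SAMPLE).items()):
        print(f"{model} npl={npl}: llama/vllm decode ratio = {ratio:.3f}")
```

A ratio above 1 means the llama rows report higher aggregate decode throughput than the vllm rows at that setting; the same pattern applies to `prefill_tps` or `ttft_mean_ms` by changing the column name.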