chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

The llama-cpp-localai-paged patches/ dir had accumulated docs, plots, a csv,
dev .cpp harnesses, and a dead FP4-MoE kernel scaffold after an earlier git-mv.
Restore the invariant that patches/ holds only the .patch series.

Moves:
- patches/paged/README.md -> README.md (canonical doc at the backend root)
- patches/paged/{PIN_SYNC_c299a92c,PAGED_BITEXACT_NOTE,LOCALAI_LLAMACPP_BACKEND_PLAN,UPSTREAM_LAYER2_SCOPE}.md,
  final_benchmark.csv, qwen36_*.png, paged-burst-bench.cpp, paged-reclaim-unit.cpp -> docs/
- patches/README.md -> docs/PATCH_MAINTENANCE.md (unique patch-regen recipe not in the canonical README)

Deletes:
- patches/BENCHMARKS.md (superseded by README section 4 + the dev-notes section)
- patches/kernel/ (dead FP4-MoE scaffold, never in the 0001-0030 apply glob, zero refs repo-wide)

Repoint every reference to the moved files: README internal links (docs/ + the
.github links drop from 5x ../ to 3x ../), .agents/llama-cpp-localai-paged-backend.md,
.github/scripts/paged-canary-apply.sh, .github/workflows/llama-cpp-paged-canary.yml,
the wrapper Makefile, backend/cpp/llama-cpp/grpc-server.cpp, backend/index.yaml,
docs/content/features/backends.md, gallery/index.yaml.
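
The change in ../ depth follows from how far the README moved up the tree. A
minimal sketch of that arithmetic, assuming the backend lives at
backend/cpp/llama-cpp-localai-paged (the full path is not stated in this
message) and using a hypothetical helper link_from:

```python
import posixpath


def link_from(doc_dir, target):
    """Relative link from a document's directory to a repo-root-relative target."""
    return posixpath.relpath(target, start=doc_dir)


# Assumption: the backend directory sits three levels below the repo root.
BACKEND = "backend/cpp/llama-cpp-localai-paged"
TARGET = ".github/workflows/llama-cpp-paged-canary.yml"

# README at its old location (patches/paged/) and at the backend root.
old_link = link_from(BACKEND + "/patches/paged", TARGET)
new_link = link_from(BACKEND, TARGET)

print(old_link.count("../"), new_link.count("../"))
```

Under that assumed layout the old link climbs five directories and the new one
climbs three, matching the 5x to 3x change described above.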

The build apply glob PAGED_PATCHES_DIR/0*.patch (PAGED_PATCHES_DIR := .../patches/paged)
is unchanged and still resolves to the 28 patches.
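
The patch-only invariant and the apply glob can be checked mechanically. A
minimal sketch, not part of this commit; non_patch_files and apply_series are
hypothetical helper names, and the directory arguments are whatever the
checkout uses for patches/ and patches/paged:

```python
from pathlib import Path


def non_patch_files(patches_root):
    """Return files under patches_root that are not *.patch, as sorted relative paths."""
    root = Path(patches_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix != ".patch"
    )


def apply_series(paged_dir):
    """Return the file names matched by the 0*.patch apply glob, in sorted order."""
    return sorted(p.name for p in Path(paged_dir).glob("0*.patch"))
```

An empty result from non_patch_files means the invariant holds, and the length
of apply_series can be compared against the expected patch count.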

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto
2026-06-27 13:20:05 +00:00
parent db14006fcd
commit 08b754f910
21 changed files with 41 additions and 235 deletions


@@ -0,0 +1,17 @@
model,engine,npl,decode_agg_tps,decode_perseq_tps,prefill_tps,ttft_mean_ms,peak_gb
q36-27b-nvfp4,llama,8,82.5,9.57,507.3,6038.1,53.51
q36-27b-nvfp4,llama,32,192.6,4.79,115.0,133551.7,69.63
q36-27b-nvfp4,llama,64,277.8,3.09,95.9,321618.8,83.96
q36-27b-nvfp4,llama,128,384.6,1.86,69.7,902762.7,93.82
q36-27b-nvfp4,vllm,8,70.4,8.76,2096.2,1861.1,110.92
q36-27b-nvfp4,vllm,32,211.8,6.28,2182.6,5353.2,110.87
q36-27b-nvfp4,vllm,64,309.1,4.38,2088.9,9512.4,110.88
q36-27b-nvfp4,vllm,128,418.8,2.79,1929.1,18449.5,110.95
q36-35b-a3b-nvfp4,llama,8,211.8,24.45,1236.4,2477.1,39.66
q36-35b-a3b-nvfp4,llama,32,393.0,10.02,1213.9,8225.2,47.11
q36-35b-a3b-nvfp4,llama,64,527.0,6.15,1152.3,15849.5,57.13
q36-35b-a3b-nvfp4,llama,128,726.4,3.73,276.8,213017.2,61.51
q36-35b-a3b-nvfp4,vllm,8,256.5,31.84,5186.5,768.8,109.62
q36-35b-a3b-nvfp4,vllm,32,500.8,14.90,6223.4,1830.4,109.63
q36-35b-a3b-nvfp4,vllm,64,686.1,9.83,5926.5,3224.4,109.63
q36-35b-a3b-nvfp4,vllm,128,882.2,6.05,5300.5,6487.7,109.64