Files
LocalAI/backend/cpp
Ettore Di Giacinto b529cc5420 patches(paged): trim series to Phase135 routed-FFN line, sync to fork 1edddc8fe
The campaign patches 0048-0063 were added without matching fork commits.
After a keep/drop review, the series is trimmed and re-mirrored 1:1 onto the
fork branch mudler/llama.cpp:localai-paged (HEAD 1edddc8fe, on 51168c5ee).

Kept, renumbered from the fork (now carry Assisted-by + Signed-off-by):
- 0048 test(paged): cover MoE swiglu down chain      (was 0051, fd920cf8a)
- 0049 test(paged): cover MoE weighted combine chain (was 0052, a85c1e098)
- 0050 test(paged): cover ragged MoE dispatch        (was 0053, 2fed6aacf)
- 0051 fix(speculative): disable backend sampling for MTP drafts (was 0054, f1d976f06)
- 0052 feat(paged): whole-pattern MoE matcher + routed-FFN fused NVFP4-quant
       down MMQ (new, 1edddc8fe)

Dropped (no fork commits, removed from the series):
- 0048-0050 W4A16 grouped-tile pack/tune/pad: dead line, W4A16 ~1.5x slower
  than grouped-MMQ.
- 0055-0063 speculative/moe/mul-mat/cublas route traces + the rejected small-M
  tile-policy knob (0059).
- All other 110-140 campaign markers not needed by Phase135 (GPU-sort,
  W4A16-direct-A, boundary trace/timing, Phase133 sorted-F32, Phase134
  fused-SWIGLU, Phase138 finalize) carry no code in this tree.

Tree-hash proof (the mirror invariant): a fresh detached worktree at
LLAMA_VERSION 0ed235ea2c17a19fc8238668653946721ed136fd with every on-disk
patches/paged/0*.patch applied in numeric order (git apply) stages to tree
097c862c6834b7d8b90419b305b8402155ef8373, byte-identical to fork HEAD
1edddc8fe's tree. Series is 43 patches (0001-0047 unchanged + 0048-0052).

Gated on GB10 sm_121a: default md5 MoE 8cb0ce23 / dense 5951a5b4 unchanged;
opt-in md5-clean; MUL_MAT 1146/1146, MUL_MAT_ID 806/806, GATED_DELTA_NET 46/46,
MOE_SWIGLU_DOWN 7/7, MUL_MAT_ID_RAGGED_MOE 6/6; six mmq_moe_quantized_raw
markers with zero sorted launches on the opt-in sentinel.

Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-02 10:19:10 +00:00
..