test(paged): mirror MoE weighted combine gate

Assisted-by: Codex:gpt-5
2026-07-03 04:46:54 -04:00 · 2026-06-30 23:51:52 +00:00
parent 22a93ce1a3
commit 4b6fc0fa1c
3 changed files with 166 additions and 1 deletions
--- a/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
+++ b/backend/cpp/llama-cpp-localai-paged/docs/GB10_PARITY_PHASE0_RESULTS.md
@@ -620,3 +620,26 @@ Result:
  the fused quantizer used compact GLU-output strides to read split `gate`/`up`
  views. Split views stride over the merged gate/up tensor; using source-view
  strides fixed the op gate but not the end-to-end md5 drift.
+
+## Phase 7 Weighted-Combine Test Gate
+
+Fork commit `3ef7eb9e4d` added patch
+`0052-test-paged-cover-MoE-weighted-combine-chain.patch`. This is a test-only
+patch; it does not change the production inference path.
+
+The new `MOE_WEIGHTED_COMBINE` whole-graph gate covers:
+
+`down MUL_MAT_ID -> router-weight ggml_mul -> rank-ordered expert views/adds`.
+
+DGX artifact:
+
+- `/home/mudler/bench/phase7_source_scope/test_backend_ops_moe_weighted_combine_green.txt`
+
+DGX result:
+
+- `test-backend-ops test -b CUDA0 -o MOE_WEIGHTED_COMBINE -j 1`: `7/7` tests passed.
+
+This gate is the correctness target for the next candidate: a deterministic
+post-down MoE weighted-combine fusion that preserves the current f32 product
+and rank-ordered add semantics while avoiding the rejected
+SWIGLU/FP4-quantization shortcut.
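
Reader's note, not part of the patch: the "f32 product and rank-order add semantics" that the gate pins down can be illustrated with a minimal reference sketch. Everything here is an assumption made for illustration: `f32` and `weighted_combine` are hypothetical helpers, not ggml or `test-backend-ops` functions, and the layout (one token, one output row per expert rank) is simplified. The sketch rounds each router-weight product to f32, then accumulates the products strictly in rank order.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def weighted_combine(expert_out, weights):
    """Combine per-expert outputs for one token (hypothetical reference).

    expert_out[r][i] is element i of the rank-r expert's down-projection
    output; weights[r] is the router weight for that rank.
    """
    n = len(expert_out[0])
    # The rank-0 product seeds the accumulator.
    acc = [f32(f32(weights[0]) * f32(v)) for v in expert_out[0]]
    # Remaining ranks are added one at a time, in rank order,
    # rounding to f32 after every product and every add.
    for r in range(1, len(expert_out)):
        w = f32(weights[r])
        for i in range(n):
            prod = f32(w * f32(expert_out[r][i]))
            acc[i] = f32(acc[i] + prod)
    return acc


# Same three values, different rank order, different f32 result.
in_order = weighted_combine([[1e8], [-1e8], [1.0]], [1.0, 1.0, 1.0])
reordered = weighted_combine([[1e8], [1.0], [-1e8]], [1.0, 1.0, 1.0])
```

With these inputs `in_order` is `[1.0]` and `reordered` is `[0.0]`, because `1e8 + 1` is not representable in f32 and rounds back to `1e8`. That order sensitivity is why a fused kernel has to keep the same add order to stay bit-identical with the unfused chain.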