Commit Graph

35 Commits

Author SHA1 Message Date
Ettore Di Giacinto
cced07c7fe docs(paged): add MTP shape trace patch
Add the next incremental llama.cpp patch for default-off speculative batch-shape tracing and record the Phase 18 red/green and inference-gate results.

Assisted-by: Codex:gpt-5
2026-07-01 02:54:29 +00:00
Ettore Di Giacinto
6e35476340 docs(paged): scope MTP graph-shape follow-up
Record Phase 17 source inspection: MTP verification rows change hard graph dimensions, padding is not a safe shortcut, and any future work should start with shape instrumentation before an opt-in scheduler experiment.

Assisted-by: Codex:gpt-5
2026-07-01 02:37:21 +00:00
Ettore Di Giacinto
ae76d42a96 docs(paged): profile MTP graph reuse loss
Record Phase 16 nsys evidence that current MTP serving loses paged decode graph reuse and increases GPU work, explaining the Phase 15 serving regression.

Assisted-by: Codex:gpt-5
2026-07-01 02:32:49 +00:00
Ettore Di Giacinto
4d171e62bb docs(paged): reject MTP serving lever
Add the repeatable MTP serving A/B runner and record Phase 15 results showing current llama-server MTP regresses GB10 serving throughput despite passing inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 02:29:28 +00:00
Ettore Di Giacinto
70394364a3 docs(paged): gate MTP rollback safety
Record Phase 14 MTP rollback evidence, normalized greedy-prefix checks, and canonical inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 02:15:11 +00:00
Ettore Di Giacinto
ede23df333 docs(paged): close W4A16 pad checklist
Mark the rejected-branch disposition as not taken because Phase 4 was kept as patch 0050 with recorded md5, op, perf, and mirror gates.

Assisted-by: Codex:gpt-5
2026-07-01 01:58:22 +00:00
Ettore Di Giacinto
abc70c209e docs(paged): close ragged MoE dispatch shortcut
Record the Phase 8 safety rerun, canonical transcript md5 gates, full and ragged MUL_MAT_ID op gates, and the no-production-patch decision for metadata-only fused dispatch work.

Assisted-by: Codex:gpt-5
2026-07-01 01:57:45 +00:00
Ettore Di Giacinto
2074b4fb5b docs(paged): reject GDN global Ai32 prototype
Record the default-off Global-Ai32 implementation, exact md5 gates, GB10 A/B regression, rejected diff artifact, and the resulting stop decision for GDN kernel work on GB10.

Assisted-by: Codex:gpt-5
2026-07-01 01:51:53 +00:00
Ettore Di Giacinto
adabd11919 docs(paged): scope GDN global Ai32 prototype
Record the shared-A/Ai GB10 cost model, the GO decision for one default-off f32 Ai prototype, and the Phase 13 implementation plan.

Assisted-by: Codex:gpt-5
2026-07-01 01:38:51 +00:00
Ettore Di Giacinto
1b5ae227eb docs(paged): reject GDN M5 QS-early phase
Record the Phase 11 default-off QS-early GDN experiment, its canonical md5 gates, the same-session GB10 A/B regression, and the rejected diff artifact.

Assisted-by: Codex:gpt-5
2026-07-01 01:29:44 +00:00
Ettore Di Giacinto
24e778de47 docs(paged): scope GDN M5 state-boundary phase
Add the Phase 11 design and implementation plan for a default-off C16 M5 QS-early GDN experiment after rejecting C32 slabs.

Assisted-by: Codex:gpt-5
2026-07-01 01:21:05 +00:00
Ettore Di Giacinto
3da3b169fb docs(paged): reject GDN C32 slab phase
Record the default-off C32 slab experiment, its md5 gates, the dense tail-row fix, and the performance regression that rejects the source patch.

Assisted-by: Codex:gpt-5
2026-07-01 01:15:00 +00:00
Ettore Di Giacinto
ff3ad84191 docs(paged): record GDN C32 slab baseline
Record the Phase 10 current-M5 prefill baseline and the source inspection finding that C32 M5 needs a real U-staging implementation rather than a launcher-only shortcut.

Assisted-by: Codex:gpt-5
2026-07-01 00:58:54 +00:00
Ettore Di Giacinto
9bbe02c161 fix(paged): gate MTP backend sampling
Record the Phase 9 MTP smoke gate, mirror the fork patch that disables backend sampling for MTP drafts, and scope the follow-up C32 slab GDN prefill phase.

Assisted-by: Codex:gpt-5
2026-07-01 00:54:25 +00:00
Ettore Di Giacinto
b862e2c568 docs(paged): stop ragged dispatch source shortcut
Assisted-by: Codex:gpt-5
2026-07-01 00:42:36 +00:00
Ettore Di Giacinto
b009de0ee0 test(paged): mirror ragged MoE dispatch gate
Assisted-by: Codex:gpt-5
2026-07-01 00:41:21 +00:00
Ettore Di Giacinto
89ef3a4020 docs(paged): record ragged MoE profile gate
Assisted-by: Codex:gpt-5
2026-07-01 00:35:21 +00:00
Ettore Di Giacinto
ef14748f06 docs(paged): scope ragged MoE dispatch phase
Assisted-by: Codex:gpt-5
2026-07-01 00:26:01 +00:00
Ettore Di Giacinto
b6885aa446 docs(paged): reject weighted combine fusion candidate
Assisted-by: Codex:gpt-5
2026-07-01 00:20:53 +00:00
Ettore Di Giacinto
4b6fc0fa1c test(paged): mirror MoE weighted combine gate
Assisted-by: Codex:gpt-5
2026-06-30 23:51:52 +00:00
Ettore Di Giacinto
22a93ce1a3 docs(paged): select weighted combine candidate
Assisted-by: Codex:gpt-5
2026-06-30 23:47:34 +00:00
Ettore Di Giacinto
3cf7fa1715 docs(paged): reject swiglu down fusion candidate
Assisted-by: Codex:gpt-5
2026-06-30 23:41:38 +00:00
Ettore Di Giacinto
d0fa463eac test(paged): mirror MoE swiglu down gate
Mirror the llama.cpp Phase 7 test gate for the merged MoE gate_up/SWIGLU/down chain and record the DGX md5/op gate evidence.

Assisted-by: Codex:gpt-5
2026-06-30 23:20:52 +00:00
Ettore Di Giacinto
34c4b5ce8d docs(paged): scope phase7 serving candidates
Mark the Phase 6 serving classifier complete, preserve the old parity final as historical, and scope Phase 7 source candidates with explicit md5 and op gates.

Assisted-by: Codex:gpt-5
2026-06-30 23:12:09 +00:00
Ettore Di Giacinto
b647460dee docs(paged): record phase6 serving classifier
Record both-engine serving nsys buckets, rejected sampler short-circuit, and rejected GDN/MMQ env grids for the GB10 parity work.

Assisted-by: Codex:gpt-5
2026-06-30 23:04:15 +00:00
Ettore Di Giacinto
f9e015d8e2 docs(paged): record W4A16 Wq padding rejection
Record the Phase 5 Wq shared-memory padding experiment, its gates, sub-threshold benchmark gain, and the decision to ship no 0051 patch.

Assisted-by: Codex:gpt-5
2026-06-30 22:23:14 +00:00
Ettore Di Giacinto
85c88320ef patches(paged): pad W4A16 A shared tile stride
Mirror fork commit d9b9be0be as patch 0050 and record the Phase 4 W4A16 shared-memory padding gates, benchmarks, and mirror verification.

Assisted-by: Codex:gpt-5
2026-06-30 22:15:21 +00:00
Ettore Di Giacinto
8b413d1cbd docs(paged): record W4A16 scale broadcast rejection
Record the Phase 3 scale-broadcast experiment, its md5 and MUL_MAT_ID gates, the prefill regression, and the decision to ship no 0050 patch.

Assisted-by: Codex:gpt-5
2026-06-30 22:06:17 +00:00
Ettore Di Giacinto
c5f2545cdd patches(paged): tune W4A16 grouped tile shape
Mirror fork commit 7dfa0e175 as patch 0049 and record the Phase 2 GB10 W4A16 shape sweep, md5 gates, MUL_MAT_ID checks, and mirror verification.

Assisted-by: Codex:gpt-5
2026-06-30 21:57:42 +00:00
Ettore Di Giacinto
d8edc615e7 patches(paged): mirror W4A16 packed metadata
Mirror the fork-first W4A16 packed tile metadata commit into the LocalAI paged patch series, record the Phase 1 benchmark result, and keep the implementation plan checked off.

Assisted-by: Codex:gpt-5
2026-06-30 21:21:53 +00:00
Ettore Di Giacinto
1c0709b700 docs(paged): record W4A16 phase1 kill gate
Record the clean forced W4A16 baseline, default comparison, selected metadata target, and completed plan checkpoint for the GB10 parity reopen.

Assisted-by: Codex:gpt-5
2026-06-30 20:40:40 +00:00
Ettore Di Giacinto
337ebb8a37 docs(paged): record phase0 decode repro
Record comparable graph-node-traced paged and vLLM decode difference-method artifacts for the GB10 parity reopen.

Assisted-by: Codex:gpt-5
2026-06-30 20:35:43 +00:00
Ettore Di Giacinto
ef5d4af203 docs(paged): record phase0 prefill baseline
Record clean-source MoE and dense prefill baselines for the GB10 parity reopen and mark the plan checkpoint complete.

Assisted-by: Codex:gpt-5
2026-06-30 20:22:18 +00:00
Ettore Di Giacinto
a9a2efb296 docs(paged): record phase0 clean build gates
Record the clean DGX build retry, binary provenance, canonical greedy md5 gates, and completed plan steps for the GB10 parity reopen.

Assisted-by: Codex:gpt-5
2026-06-30 20:19:14 +00:00
Ettore Di Giacinto
d288a0300f docs(paged): add GB10 parity implementation plan
Add the Superpowers implementation plan for the GB10 parity reopen, including Phase 0 provenance, decode repro, W4A16 kill gates, and later kernel workstream entry criteria.

Assisted-by: Codex:gpt-5
2026-06-30 15:50:01 +00:00