Commit Graph

53 Commits

Author SHA1 Message Date
Ettore Di Giacinto
3dbf34e739 docs(paged): record admission histogram trace
Record Phase 54 trace-only histogram work, DGX md5/op gates, dense serving histogram evidence, and the next scheduler decision.

Assisted-by: Codex:gpt-5
2026-07-01 09:40:50 +00:00
Ettore Di Giacinto
347a5c05bd docs(paged): reject admission budget sweep
Assisted-by: Codex:gpt-5
2026-07-01 09:27:20 +00:00
Ettore Di Giacinto
2aa76702df docs(paged): record dense admission trace
Assisted-by: Codex:gpt-5
2026-07-01 09:18:43 +00:00
Ettore Di Giacinto
b5f65152e2 docs(paged): record serving admission trace
Assisted-by: Codex:gpt-5
2026-07-01 09:08:42 +00:00
Ettore Di Giacinto
c299dcd231 docs(paged): record dense true decode profile
Assisted-by: Codex:gpt-5
2026-07-01 08:55:23 +00:00
Ettore Di Giacinto
cd59e5d61f fix(paged): scrub harness vars for vllm serve
Assisted-by: Codex:gpt-5
2026-07-01 08:23:05 +00:00
Ettore Di Giacinto
96825a224e docs(paged): record dense serving snapshot
Assisted-by: Codex:gpt-5
2026-07-01 08:20:26 +00:00
Ettore Di Giacinto
440129c98e fix(paged): harden serving snapshot readiness
Assisted-by: Codex:gpt-5
2026-07-01 08:07:48 +00:00
Ettore Di Giacinto
e69ee0e867 feat(paged): parameterize served model name
Assisted-by: Codex:gpt-5
2026-07-01 07:50:19 +00:00
Ettore Di Giacinto
2a0fc0f4b9 docs(paged): record inference gate guard
Assisted-by: Codex:gpt-5
2026-07-01 07:45:52 +00:00
Ettore Di Giacinto
ae8284f5fb feat(paged): parameterize vllm serving snapshot
Assisted-by: Codex:gpt-5
2026-07-01 07:41:55 +00:00
Ettore Di Giacinto
ecaf406c0b docs(paged): reject persistent gate fusion shortcut
Assisted-by: Codex:gpt-5
2026-07-01 07:34:27 +00:00
Ettore Di Giacinto
b9eff5bca3 docs(paged): reconcile next parity target
Assisted-by: Codex:gpt-5
2026-07-01 07:31:26 +00:00
Ettore Di Giacinto
aa848d5afb docs(paged): record low-concurrency serving check
Assisted-by: Codex:gpt-5
2026-07-01 07:24:28 +00:00
Ettore Di Giacinto
d44e164c96 docs(paged): record max-concurrency parity check
Assisted-by: Codex:gpt-5
2026-07-01 07:13:48 +00:00
Ettore Di Giacinto
52c11b1ce5 docs(paged): reject graph-time gate fusion shortcut
Assisted-by: Codex:gpt-5
2026-07-01 06:56:01 +00:00
Ettore Di Giacinto
5354adcffb docs(paged): scope gate projection policy
Assisted-by: Codex:gpt-5
2026-07-01 06:50:19 +00:00
Ettore Di Giacinto
9f75da01f9 feat(paged): add cublas tensor-name trace patch
Add patch 0063 extending LLAMA_CUBLAS_ROUTE_TRACE with src0/src1/dst tensor names.

Record Phase 37 gates and the conclusion that SGEMM traces to MoE gate tensors.

Assisted-by: Codex:gpt-5
2026-07-01 06:41:00 +00:00
Ettore Di Giacinto
fbdc200886 feat(paged): add cublas route trace patch
Add patch 0062 with default-off LLAMA_CUBLAS_ROUTE_TRACE instrumentation for generic cuBLAS MUL_MAT subroutes.

Record Phase 36 DGX gates, serving trace results, and the next projection follow-up scope.

Assisted-by: Codex:gpt-5
2026-07-01 06:24:46 +00:00
Ettore Di Giacinto
49cce0b5a2 feat(paged): add mul mat route trace patch
Add LocalAI patch 0061 from the llama.cpp fork and record Phase 35 gates, serving route counts, and the updated patch mirror invariant.

Assisted-by: Codex:gpt-5
2026-07-01 05:52:09 +00:00
Ettore Di Giacinto
ba1979a689 feat(paged): add moe mmid route trace patch
Add LocalAI patch 0060 from the llama.cpp fork and record Phase 34 gates, serving route counts, and the updated patch mirror invariant.

Assisted-by: Codex:gpt-5
2026-07-01 05:37:53 +00:00
Ettore Di Giacinto
7665422bfa feat(paged): add moe small-m mmq tile policy gate
Assisted-by: Codex:gpt-5
2026-07-01 05:20:18 +00:00
Ettore Di Giacinto
70a4c31f36 feat(paged): add moe small-m mmq candidate trace
Assisted-by: Codex:gpt-5
2026-07-01 05:08:31 +00:00
Ettore Di Giacinto
e189e5a4ca feat(paged): add moe mmq launch trace patch
Assisted-by: Codex:gpt-5
2026-07-01 04:54:33 +00:00
Ettore Di Giacinto
b28b448c68 docs(paged): record mmq shape serving profile
Assisted-by: Codex:gpt-5
2026-07-01 04:36:04 +00:00
Ettore Di Giacinto
2148fa466b feat(paged): add moe mmq shape trace patch
Assisted-by: Codex:gpt-5
2026-07-01 04:32:12 +00:00
Ettore Di Giacinto
3b9ec3e1f1 docs(paged): record mmq occupancy rejection
Assisted-by: Codex:gpt-5
2026-07-01 04:18:12 +00:00
Ettore Di Giacinto
3c2cb9f4ab docs(paged): record graph-node serving profile
Record the Phase 27 current-stack llama.cpp n128 serving profile captured with CUDA graph node tracing and gated before and after the run.

Assisted-by: Codex:gpt-5
2026-07-01 04:00:14 +00:00
Ettore Di Giacinto
ace1ffab28 docs(paged): record audited current snapshot
Record the Phase 26 current-stack paged-vs-vLLM serving snapshot with hardware classification and compact pre/post inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 03:48:27 +00:00
Ettore Di Giacinto
a0194125f5 chore(paged): summarize snapshot inference gates
Emit a compact gate_summary.tsv from current serving snapshots so each artifact records the pre/post MoE md5, dense md5, and backend op checks. Add a summary-only mode for auditing existing artifacts and document the Phase 25 backfill on the Phase 20 snapshot.

Assisted-by: Codex:gpt-5
2026-07-01 03:35:54 +00:00
Ettore Di Giacinto
7108b68a70 chore(paged): record snapshot hardware class
Add a hardware report to the current serving snapshot harness so every paged-vs-vLLM artifact records the GPU identity and conservative hardware class before any server starts. Document the Phase 24 dry run and the GB10 classification for future parity comparisons.

Assisted-by: Codex:gpt-5
2026-07-01 03:31:11 +00:00
Ettore Di Giacinto
7aa15ce539 docs(paged): refresh parity handoff coordinates
Update the paged parity handoff to the current fork head, patch count, mirror invariant, current serving harness, and LocalAI AI-attribution policy after Phases 20-22.

Assisted-by: Codex:gpt-5
2026-07-01 03:25:14 +00:00
Ettore Di Giacinto
6c165747a9 docs(paged): verify patch-series mirror invariant
Record the Phase 22 strict git-apply mirror check proving the LocalAI paged patch series reconstructs the canonical llama.cpp fork tree after patch 0055.

Assisted-by: Codex:gpt-5
2026-07-01 03:22:43 +00:00
Ettore Di Giacinto
ff3f0620de chore(paged): add current serving snapshot harness
Add a reusable current-stack paged-vs-vLLM serving snapshot harness that targets the clean DGX mirror, enforces idle/lock preflight, runs pre/post inference gates, and records ratio summaries.

Assisted-by: Codex:gpt-5
2026-07-01 03:19:36 +00:00
Ettore Di Giacinto
c99678da42 docs(paged): refresh current serving snapshot
Record the Phase 20 same-session MoE paged-vs-vLLM serving snapshot on the current clean DGX mirror, including pre/post inference gates and the resulting GB10 parity decision.

Assisted-by: Codex:gpt-5
2026-07-01 03:15:30 +00:00
Ettore Di Giacinto
310eb3c866 docs(paged): reject MTP draft-shape scheduler
Record the Phase 19 trace-only serving entropy run and close the group/defer-by-draft follow-up based on measured shape distribution, throughput regression, and green inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 03:03:49 +00:00
Ettore Di Giacinto
cced07c7fe docs(paged): add MTP shape trace patch
Add the next incremental llama.cpp patch for default-off speculative batch-shape tracing and record the Phase 18 red/green and inference-gate results.

Assisted-by: Codex:gpt-5
2026-07-01 02:54:29 +00:00
Ettore Di Giacinto
6e35476340 docs(paged): scope MTP graph-shape follow-up
Record Phase 17 source inspection: MTP verification rows change hard graph dimensions, padding is not a safe shortcut, and any future work should start with shape instrumentation before an opt-in scheduler experiment.

Assisted-by: Codex:gpt-5
2026-07-01 02:37:21 +00:00
Ettore Di Giacinto
ae76d42a96 docs(paged): profile MTP graph reuse loss
Record Phase 16 nsys evidence that current MTP serving loses paged decode graph reuse and increases GPU work, explaining the Phase 15 serving regression.

Assisted-by: Codex:gpt-5
2026-07-01 02:32:49 +00:00
Ettore Di Giacinto
4d171e62bb docs(paged): reject MTP serving lever
Add the repeatable MTP serving A/B runner and record Phase 15 results showing current llama-server MTP regresses GB10 serving throughput despite passing inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 02:29:28 +00:00
Ettore Di Giacinto
70394364a3 docs(paged): gate MTP rollback safety
Record Phase 14 MTP rollback evidence, normalized greedy-prefix checks, and canonical inference gates.

Assisted-by: Codex:gpt-5
2026-07-01 02:15:11 +00:00
Ettore Di Giacinto
2074b4fb5b docs(paged): reject GDN global Ai32 prototype
Record the default-off Global-Ai32 implementation, exact md5 gates, GB10 A/B regression, rejected diff artifact, and the resulting stop decision for GDN kernel work on GB10.

Assisted-by: Codex:gpt-5
2026-07-01 01:51:53 +00:00
Ettore Di Giacinto
adabd11919 docs(paged): scope GDN global Ai32 prototype
Record the shared-A/Ai GB10 cost model, the GO decision for one default-off f32 Ai prototype, and the Phase 13 implementation plan.

Assisted-by: Codex:gpt-5
2026-07-01 01:38:51 +00:00
Ettore Di Giacinto
1b5ae227eb docs(paged): reject GDN M5 QS-early phase
Record the Phase 11 default-off QS-early GDN experiment, its canonical md5 gates, the same-session GB10 A/B regression, and the rejected diff artifact.

Assisted-by: Codex:gpt-5
2026-07-01 01:29:44 +00:00
Ettore Di Giacinto
3da3b169fb docs(paged): reject GDN C32 slab phase
Record the default-off C32 slab experiment, its md5 gates, the dense tail-row fix, and the performance regression that rejects the source patch.

Assisted-by: Codex:gpt-5
2026-07-01 01:15:00 +00:00
Ettore Di Giacinto
34c4b5ce8d docs(paged): scope phase7 serving candidates
Mark the Phase 6 serving classifier complete, preserve the old parity final as historical, and scope Phase 7 source candidates with explicit md5 and op gates.

Assisted-by: Codex:gpt-5
2026-06-30 23:12:09 +00:00
Ettore Di Giacinto
85c88320ef patches(paged): pad W4A16 A shared tile stride
Mirror fork commit d9b9be0be as patch 0050 and record the Phase 4 W4A16 shared-memory padding gates, benchmarks, and mirror verification.

Assisted-by: Codex:gpt-5
2026-06-30 22:15:21 +00:00
Ettore Di Giacinto
c5f2545cdd patches(paged): tune W4A16 grouped tile shape
Mirror fork commit 7dfa0e175 as patch 0049 and record the Phase 2 GB10 W4A16 shape sweep, md5 gates, MUL_MAT_ID checks, and mirror verification.

Assisted-by: Codex:gpt-5
2026-06-30 21:57:42 +00:00
Ettore Di Giacinto
d8edc615e7 patches(paged): mirror W4A16 packed metadata
Mirror the fork-first W4A16 packed tile metadata commit into the LocalAI paged patch series, record the Phase 1 benchmark result, and keep the implementation plan checked off.

Assisted-by: Codex:gpt-5
2026-06-30 21:21:53 +00:00
Ettore Di Giacinto
de34cd5954 docs(paged): refresh parity handoff state
Reconcile the paged backend pin prose with the current Makefile pin, mark the 0044 patch tracking note as resolved, and add DGX Docker worker idleness to the benchmark preflight.

Assisted-by: Codex:gpt-5
2026-06-30 15:27:44 +00:00