LocalAI

Mirror of https://github.com/mudler/LocalAI.git, last synced 2026-07-03 04:46:54 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	1b5ae227eb	docs(paged): reject GDN M5 QS-early phase Record the Phase 11 default-off QS-early GDN experiment, its canonical md5 gates, the same-session GB10 A/B regression, and the rejected diff artifact. Assisted-by: Codex:gpt-5	2026-07-01 01:29:44 +00:00
Ettore Di Giacinto	3da3b169fb	docs(paged): reject GDN C32 slab phase Record the default-off C32 slab experiment, its md5 gates, the dense tail-row fix, and the performance regression that rejects the source patch. Assisted-by: Codex:gpt-5	2026-07-01 01:15:00 +00:00
Ettore Di Giacinto	34c4b5ce8d	docs(paged): scope phase7 serving candidates Mark the Phase 6 serving classifier complete, preserve the old parity final as historical, and scope Phase 7 source candidates with explicit md5 and op gates. Assisted-by: Codex:gpt-5	2026-06-30 23:12:09 +00:00
Ettore Di Giacinto	85c88320ef	patches(paged): pad W4A16 A shared tile stride Mirror fork commit d9b9be0be as patch 0050 and record the Phase 4 W4A16 shared-memory padding gates, benchmarks, and mirror verification. Assisted-by: Codex:gpt-5	2026-06-30 22:15:21 +00:00
Ettore Di Giacinto	c5f2545cdd	patches(paged): tune W4A16 grouped tile shape Mirror fork commit 7dfa0e175 as patch 0049 and record the Phase 2 GB10 W4A16 shape sweep, md5 gates, MUL_MAT_ID checks, and mirror verification. Assisted-by: Codex:gpt-5	2026-06-30 21:57:42 +00:00
Ettore Di Giacinto	d8edc615e7	patches(paged): mirror W4A16 packed metadata Mirror the fork-first W4A16 packed tile metadata commit into the LocalAI paged patch series, record the Phase 1 benchmark result, and keep the implementation plan checked off. Assisted-by: Codex:gpt-5	2026-06-30 21:21:53 +00:00
Ettore Di Giacinto	de34cd5954	docs(paged): refresh parity handoff state Reconcile the paged backend pin prose with the current Makefile pin, mark the 0044 patch tracking note as resolved, and add DGX Docker worker idleness to the benchmark preflight. Assisted-by: Codex:gpt-5	2026-06-30 15:27:44 +00:00
Ettore Di Giacinto	1b9176c2c8	docs(paged): codify fork-first patch workflow as mandatory policy The fork mudler/llama.cpp branch localai-paged is the canonical source of truth for all paged-backend kernel/patch work. Always update it FIRST: commit the change on the fork branch and push it, then regenerate the LocalAI patch series (backend/cpp/llama-cpp-localai-paged/patches/paged/) from the fork via git format-patch so the series is a 1:1 drift-free mirror of the branch. Never edit the LocalAI patch files directly, and never add a patch with no corresponding fork-branch commit. The series is a derivative; the fork is the source. The fork branch is also where the build and the per-path bit-exact md5 gate actually run, so it is the only place a change is truly validated. Codified in two places: - .agents/llama-cpp-localai-paged-backend.md: new "Fork-first workflow (MANDATORY)" section at the top of the patch/pin-sync material, plus the "Encapsulating your work" bullet now points at it. - backend/cpp/llama-cpp-localai-paged/docs/PARITY_HANDOFF.md: strengthened the hard-gate (section 2.5) into "Fork-first is MANDATORY", and corrected a stale numbering example (fork 51168c5ee "patch 0044" maps to worktree 0044, not the f32-only M5 which is worktree 0047). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 15:12:36 +00:00
Ettore Di Giacinto	8bb47e5a8a	docs(paged): correct PARITY_HANDOFF ahead/behind + note dense CDEF gate md5 Ground-check follow-up to `2431090ff`. Two factual corrections: - Section 7 worktree line had the ahead/behind counts swapped ("25 ahead, 197 behind"); the branch is actually ~199 ahead / 25 behind origin/master. - Discrepancy item 5 flagged only the MoE CDEF PAGED_GATE_MD5 (0921716...); the dense run is symmetric (COMBINED_DEFINITIVE.txt records ecfe924d... for dense, which likewise differs from the canonical dense gate 5951a5b4). Both CDEF values come from combined_definitive.sh's own gate command, not the canonical bit-exact gate in section 3.3, so neither is sanctioned and both must be KL-validated. Everything else in the handoff verified accurate: fork branch localai-paged HEAD 51168c5ee (patch 0044) on dgx:~/llama-paged-fork, dev-tree HEAD a7d439e, all md5/KL numbers, the 86%/1078/924 decode record, bench env, and all referenced file/artifact paths. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 14:49:06 +00:00
Ettore Di Giacinto	2431090ff3	docs(paged): future-agent vLLM-parity HANDOFF guide (GB10, how-to companion to FINAL) Adds docs/PARITY_HANDOFF.md: the operational how-to for an agent with zero context picking up the GB10 vLLM-parity work. Complements VLLM_PARITY_FINAL.md (the why/record) with TL;DR state, the hard gates (per-path bit-exact md5, KL-gate, no LLAMA_MAX_BATCH_TOKENS, fork-is-canonical), a copy-pasteable operational quickstart (ssh/lock/build/bench + the --cuda-graph-trace=node decode-profiling rule that caused 4 wrong analyses), the complete tested-and-rejected lever map, methodology lessons, the three forward directions, and a key file/artifact index with the open discrepancies to reconcile. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 14:42:44 +00:00
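Commit 1b9176c2c8 requires the patch series under backend/cpp/llama-cpp-localai-paged/patches/paged/ to be a 1:1, drift-free git format-patch mirror of the fork branch localai-paged, with no patch lacking a matching fork commit. The Python sketch below shows one hypothetical way to check that. It reads the source commit SHA from the "From <sha> Mon Sep 17 00:00:00 2001" first line that git format-patch writes into each patch file, then compares that list, in order, with the fork's commits. The names patch_shas and check_mirror are illustrative and not part of LocalAI. The sketch also assumes the fork commit list is supplied oldest-first, for example from git rev-list --reverse over the range since the upstream base.

```python
import re
import tempfile
from pathlib import Path

# git format-patch starts every patch with an mbox separator line of the form
# "From <40-hex commit sha> Mon Sep 17 00:00:00 2001".
FROM_LINE = re.compile(r"^From ([0-9a-f]{40}) ")


def patch_shas(series_dir: Path) -> list[str]:
    """Return the source commit SHA of each *.patch file, in filename order."""
    shas = []
    for patch in sorted(series_dir.glob("*.patch")):
        lines = patch.read_text(encoding="utf-8").splitlines()
        match = FROM_LINE.match(lines[0]) if lines else None
        if match is None:
            raise ValueError(f"{patch.name}: not a git format-patch file")
        shas.append(match.group(1))
    return shas


def check_mirror(series_dir: Path, fork_commits: list[str]) -> list[str]:
    """Compare the series with the fork commits (oldest first); return problems found."""
    problems = []
    mirrored = patch_shas(series_dir)
    # Position by position, each patch must come from the matching fork commit.
    for i, (patch_sha, fork_sha) in enumerate(zip(mirrored, fork_commits), start=1):
        if patch_sha != fork_sha:
            problems.append(f"patch #{i}: {patch_sha[:9]} != fork {fork_sha[:9]}")
    # Extra patches have no fork commit; missing patches mean the series is stale.
    if len(mirrored) > len(fork_commits):
        problems.append(f"{len(mirrored) - len(fork_commits)} patch(es) with no fork commit")
    elif len(mirrored) < len(fork_commits):
        problems.append(f"{len(fork_commits) - len(mirrored)} fork commit(s) not mirrored")
    return problems


if __name__ == "__main__":
    # Self-contained demo with two fake patches. Real use would pass the paged
    # series directory and the fork branch's commit list instead.
    with tempfile.TemporaryDirectory() as tmp:
        series = Path(tmp)
        shas = ["1" * 40, "2" * 40]
        for n, sha in enumerate(shas, start=1):
            (series / f"{n:04d}-demo.patch").write_text(
                f"From {sha} Mon Sep 17 00:00:00 2001\nSubject: [PATCH] demo\n",
                encoding="utf-8",
            )
        print(check_mirror(series, shas))      # prints []
        print(check_mirror(series, shas[:1]))  # prints ['1 patch(es) with no fork commit']
```

An empty result means the series is an exact mirror. The check matches on SHAs rather than patch file numbers because the series numbering (0044, 0049, 0050, ...) is derived from the fork and is not itself the source of truth.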

10 Commits
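Commits 8bb47e5a8a and 2431090ff3 describe a per-path bit-exact md5 gate. Under that gate, an md5 that differs from the canonical value for its path (as with the dense and MoE CDEF values) is not sanctioned and must be KL-validated. The minimal Python sketch below shows only that decision rule. It assumes each path's output is captured as raw bytes from a deterministic run. The path names, helper names, and demo data are hypothetical. The real gate command and canonical md5 values are the ones documented in PARITY_HANDOFF.md (section 3.3), not reproduced here.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex md5 digest of one path's output, captured as raw bytes."""
    return hashlib.md5(data).hexdigest()


def gate(outputs: dict[str, bytes], canonical: dict[str, str]) -> dict[str, str]:
    """Per-path bit-exact gate.

    A path passes only if its output md5 equals that path's canonical md5.
    Any other result (a different md5, or no canonical value for the path)
    is treated as unsanctioned and flagged for KL validation.
    """
    verdicts = {}
    for path, data in outputs.items():
        expected = canonical.get(path)
        if expected is not None and md5_hex(data) == expected.lower():
            verdicts[path] = "pass"
        else:
            verdicts[path] = "needs KL validation"
    return verdicts


if __name__ == "__main__":
    # Hypothetical reference outputs stand in for the canonical gate runs.
    reference = {"dense": b"dense reference tokens", "moe": b"moe reference tokens"}
    canonical = {path: md5_hex(data) for path, data in reference.items()}
    candidate = {"dense": b"dense reference tokens", "moe": b"moe tokens, one changed"}
    print(gate(candidate, canonical))  # prints {'dense': 'pass', 'moe': 'needs KL validation'}
```

Keeping a separate canonical value per path means a dense-only match cannot mask a MoE divergence, and the reverse holds as well.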