LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 04:46:54 -04:00

Ettore Di Giacinto ac2b0211ff docs(paged): record P6 fp8-KV BLOCKED-ON-INFRA + the analytical decode ceiling

P6 (final program phase) could not run its kill-gate: the DGX/GB10 was
unreachable for the entire window (cloudflared access via prem-vm returned
HTTP 530 / websocket bad-handshake on every probe; re-confirmed with 5 fresh
probes). Stage 0a (measured nsys graph-node decode ceiling) and Stage 0b
(fp8-e4m3 kernel + kill-gate A/B) were physically impossible with no GPU.

Records the honest infra-block (NOT a measured NO-GO, NOT a NO-GO-by-ceiling)
plus the load-bearing artifact: the analytical fp8-KV decode ceiling table.
fp8 halves KV bytes -> theoretical-max decode saving = 0.5 x flash-attn share:
ctx256 0.65% (standard shape hard NO-GO), ctx1024 2.55%, ctx2048 4.98% (first
crosses +3%), ctx4096 9.49%, ctx8192 17.34%. The win, if realizable, lives
only at ctx>=2048; the hybrid-GDN structure (10/40 layers carry KV, 30 GDN
layers hold fixed-size recurrent state with no KV) caps what any KV-dtype
lever can save. The dominant null stands unrefuted: Q8_0 KV was a measured
+7.8% decode regression on GB10. Notes the capacity-play framing (fp8-KV as a
memory feature remains open even if throughput-flat).
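
The ceiling arithmetic above can be reproduced with a short sketch. It is
illustrative only, not code from the repository: the saving percentages are the
ones quoted in this message, the flash-attn shares are back-derived from them
by inverting the stated 0.5 x relation (they are not independently measured
here), and the function and constant names are hypothetical.

```python
# Sketch of the analytical fp8-KV decode ceiling, under the assumptions above.

# Theoretical-max decode saving (percent) per context length, as quoted.
CEILING_PCT = {256: 0.65, 1024: 2.55, 2048: 4.98, 4096: 9.49, 8192: 17.34}

KV_BYTE_RATIO = 0.5  # fp8 KV bytes relative to 16-bit KV bytes
GATE_PCT = 3.0       # the "+3%" threshold mentioned in the message


def ceiling_saving_pct(flash_attn_share_pct, ratio=KV_BYTE_RATIO):
    """Upper bound on decode saving: only the flash-attn share can shrink,
    and at best in proportion to the KV bytes removed."""
    return (1.0 - ratio) * flash_attn_share_pct


def implied_flash_attn_share_pct(saving_pct, ratio=KV_BYTE_RATIO):
    """Invert the ceiling relation to recover the flash-attn share it implies."""
    return saving_pct / (1.0 - ratio)


def first_ctx_over_gate(ceilings, gate_pct):
    """Smallest context length whose ceiling exceeds the gate, or None."""
    for ctx in sorted(ceilings):
        if ceilings[ctx] > gate_pct:
            return ctx
    return None


if __name__ == "__main__":
    for ctx in sorted(CEILING_PCT):
        share = implied_flash_attn_share_pct(CEILING_PCT[ctx])
        print(f"ctx{ctx}: ceiling {CEILING_PCT[ctx]:.2f}%, implied share {share:.2f}%")
    print("first ctx over gate:", first_ctx_over_gate(CEILING_PCT, GATE_PCT))
```

These are upper bounds: they assume the flash-attn portion of decode time
scales linearly with KV bytes and that the fp8 path adds no overhead of its
own, which is what the unrun kill-gate A/B would have had to confirm.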

Fork localai-paged untouched at 653bb2f3d; series stays at 46 patches
(0001-0055); P3's p3-w4a16-direct work undisturbed. Docs-only; no code, no
topic branch, no patches. Not pushed.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-07-02 21:43:36 +00:00

ACCELERATOR_PORTING_SCOPE.md

docs(paged): scope porting the portable benefits to Metal/SYCL/Vulkan (+ROCm)

2026-06-28 08:34:32 +00:00

BENCHMARK.md

docs(paged): record phases 112-140 + series trim decision

2026-07-02 10:16:53 +00:00

DECODE_SERVING_SCOPE.md

docs(paged): record padded/fixed-slot decode shape as tested-and-rejected

2026-06-28 20:47:43 +00:00

EXECUTION_REARCH_SCOPE.md

docs(paged): record P6 fp8-KV BLOCKED-ON-INFRA + the analytical decode ceiling