LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-28 10:27:30 -04:00

Files

Ettore Di Giacinto c51ff4cec9 docs(paged): scope porting the portable benefits to Metal/SYCL/Vulkan (+ROCm)

Add ACCELERATOR_PORTING_SCOPE.md, the umbrella scope for taking the paged
backend's accelerator-portable wins beyond the CUDA family. It builds on, and
does not duplicate, UPSTREAM_LAYER2_SCOPE.md, which remains the detailed scope
for GDN/SSM fusion (benefit #1), and adds:

- Benefit #2 (paged KV in-kernel block-table flash-attn read, 0009-0011):
  new per-backend feasibility from source analysis of the Metal/SYCL/Vulkan
  flash-attn kernels. SYCL is EASY (near line-for-line CUDA mirror), Metal is
  EASY-MEDIUM (decode already routes to the vec kernel), and Vulkan is MEDIUM
  (the fast coopmat2 NVIDIA decode path cannot do the indexed read, and
  push-constants are full). Universal constraint: only the vec/scalar decode
  kernel admits the per-cell indexed read, so route block-table ops onto vec
  (as CUDA's 0009-0010 dispatch guard already does) and leave the fast
  MM/coopmat2 path contiguous-only. This is the lever that moves paged KV
  from neutral-to-slightly-negative to non-negative outside the CUDA family.
- Benefit #3 (decode-first scheduler, 0013/0016): confirmed a free portable
  win - host-side update_slots() policy, zero kernel work, runs on any
  accelerator as-is.
- Benefit #4 (NVFP4 FP4-MMA, 0017/0023/0025): out of scope (Blackwell only);
  flags the backend-agnostic analogues of the act-quant dedup and the
  graph-coverage lever without over-claiming a port.
- A ROCm note: ROCm rides the CUDA/HIP path (validate, don't re-port);
  FP4-MMA stays Blackwell-only.
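
The routing constraint under Benefit #2 can be illustrated with a minimal
sketch. It is not the code from patches 0009-0011: `Kernel`, `AttnOp`,
`pick_kernel`, and `physical_cell` are hypothetical names, and the block-table
layout (one physical block id per logical block, fixed block size) is an
assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Kernel(Enum):
    VEC = "vec"          # vec/scalar decode kernel: admits per-cell indexed reads
    FAST_MM = "fast_mm"  # fast MM/coopmat2-style path: contiguous KV only


@dataclass
class AttnOp:
    # Hypothetical description of one flash-attn op.
    n_query_tokens: int
    block_table: Optional[Sequence[int]] = None  # None means contiguous KV


def pick_kernel(op: AttnOp, default: Kernel) -> Kernel:
    """Dispatch guard: ops carrying a block table are forced onto VEC.

    `default` stands in for whatever the backend's existing heuristic
    would have chosen for a contiguous-KV op.
    """
    if op.block_table is not None:
        return Kernel.VEC
    return default


def physical_cell(block_table: Sequence[int], block_size: int, cell: int) -> int:
    """Map a logical KV cell index to its physical cell via the block table.

    Assumed layout: block_table[i] is the physical block that holds
    logical block i, and every block has `block_size` cells.
    """
    block = block_table[cell // block_size]
    return block * block_size + cell % block_size


if __name__ == "__main__":
    paged = AttnOp(n_query_tokens=1, block_table=[3, 0])
    print(pick_kernel(paged, default=Kernel.FAST_MM))
    print(physical_cell(paged.block_table, block_size=4, cell=5))
```

The guard leaves the contiguous path untouched, so the fast kernel keeps its
existing behavior and only block-table ops pay for the indexed read.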

Benefits #1 and #2 share the same port shape and the same backend ranking
(Metal->SYCL->Vulkan), so they bundle into one PR per backend behind a shared
ops-first PR. A cross-link is added from UPSTREAM_LAYER2_SCOPE.md. All gates
are test-backend-ops runs on the target hardware, since no
Metal/SYCL/Vulkan/ROCm hardware is available here.
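
The decode-first policy named under Benefit #3 is host-side only, which is
why it needs no kernel work. The sketch below shows one generic way such a
policy could fill a batch. It is not the update_slots() logic from patches
0013/0016: `Slot`, `build_batch`, and the single `token_budget` are
hypothetical and assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slot:
    # Hypothetical per-request state.
    slot_id: int
    pending_prompt_tokens: int  # > 0 means the slot is still in prefill


def build_batch(slots: List[Slot], token_budget: int) -> List[Tuple[int, int]]:
    """Return (slot_id, n_tokens) pairs, serving decode slots before prefill."""
    batch: List[Tuple[int, int]] = []
    remaining = token_budget

    # Pass 1: every decoding slot gets its single next token first.
    for s in slots:
        if s.pending_prompt_tokens == 0 and remaining > 0:
            batch.append((s.slot_id, 1))
            remaining -= 1

    # Pass 2: prefill slots share whatever budget is left.
    for s in slots:
        if s.pending_prompt_tokens > 0 and remaining > 0:
            n = min(s.pending_prompt_tokens, remaining)
            batch.append((s.slot_id, n))
            remaining -= n

    return batch


if __name__ == "__main__":
    slots = [Slot(0, 10), Slot(1, 0), Slot(2, 0)]
    print(build_batch(slots, token_budget=6))
```

Because the ordering is decided before any op is submitted, the same policy
applies unchanged to whichever accelerator executes the batch.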

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-28 08:34:32 +00:00

ACCELERATOR_PORTING_SCOPE.md

docs(paged): scope porting the portable benefits to Metal/SYCL/Vulkan (+ROCm)

2026-06-28 08:34:32 +00:00

final_benchmark.csv

docs(paged): re-measure DGX benchmarks on one harness (stock/patched/bf16-tau)

2026-06-27 22:05:59 +00:00

LOCALAI_LLAMACPP_BACKEND_PLAN.md

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/

2026-06-27 13:20:05 +00:00

PAGED_BITEXACT_NOTE.md

chore(paged): keep patches/ patch-only; README to backend root, docs to docs/