Files
LocalAI/backend/cpp
Ettore Di Giacinto 4a9a1dd247 docs(paged): Mac stock-vs-patched bench + Vulkan note + cross-backend learnings
Section 4(c): real Apple M4/Metal numbers (Qwen3-8B Q4_K_M, stock vs patched) -
patchset is neutral-to-slightly-negative on Metal (the in-kernel block-table read
is CUDA-only; NVFP4/GDN-fusions inert), so prefer stock llama-cpp on Apple Silicon.
Vulkan: same picture, worse (no upstream GDN op). Section 6: cross-backend learnings
+ upstream candidates (the GDN decode-plumbing fusions are the portable, bit-exact,
CPU-mirrored win worth upstreaming).

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 11:05:37 +00:00
..