LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-27 18:06:58 -04:00

Files

Ettore Di Giacinto 984c8fcbea docs(paged): Layer-2 upstream scope for native fused-GDN kernels (Metal/Vulkan/SYCL)

Source-only analysis of what it would take to give the gated-DeltaNet decode
fusions (0018 in-place state write-back, 0019 fused recurrent-state gather,
0021 ssm_conv_update_inplace, 0028 conv-tap gather fusion) native kernels on
the non-CUDA compute backends, so the patch-series decode win extends past
CUDA-family hardware.

Key findings:
- The base GGML_OP_GATED_DELTA_NET and GGML_OP_SSM_CONV kernels ALREADY exist
  upstream on Metal, Vulkan AND SYCL (the README's no-Vulkan-kernel line is
  stale). The Qwen3.6 hybrids run on all three today via the non-fused path;
  Layer-2 is a decode SPEEDUP, not a prerequisite for running the model.
- Per backend the new work is only the FUSION plumbing: redirect the GDN state
  write (in-place), add the ids read, write one new conv-update kernel + its
  ids variant, two tiny gather kernels, plus supports_op + op-handler + (Vulkan)
  pipeline/push-constant/descriptor wiring. Builders, CPU refs, model graph and
  test-backend-ops cases are shared and already done.
- Bit-exactness is feasible per backend by construction (the fusions redirect
  addresses, not the f32 reduction order); test-backend-ops (backendX-vs-CPU)
  is the gate.
- The 0030 name allow-list should become capability-driven (make supports_op
  authoritative for the discriminated src slots).
- Recommended order: ops-first PR, then Metal (best value-to-effort ratio; the
  fixed simdgroup size makes bit-exactness simplest), then SYCL (near-verbatim
  CUDA mirror, cheapest to author), then Vulkan (widest hardware reach, but
  the shader-gen + variant matrix + subgroup variance make it the capstone).
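
The bit-exactness finding above can be illustrated with a toy sketch. This is
not the ggml API or any of the patches: update_row, unfused_step and fused_step
are hypothetical names, the update rule is a stand-in for the real GDN
recurrence, and Python floats are f64 rather than f32. It assumes the ids
select distinct state slots. The point is only that the fused path changes
where values are read and written, not the order of the reduction, so the
results match bit for bit.

```python
import struct

def bits(v):
    """Raw IEEE-754 bytes of a float, for bit-for-bit comparison."""
    return struct.pack("<d", v)

def same_bits(a, b):
    return [bits(v) for v in a] == [bits(v) for v in b]

def update_row(row, x, decay):
    """Toy recurrent update (hypothetical); the dot product runs left to right."""
    acc = 0.0
    for s, xi in zip(row, x):
        acc += s * xi                      # fixed reduction order
    new_row = [decay * s + acc * xi for s, xi in zip(row, x)]
    return new_row, acc

def unfused_step(state, ids, x, decay):
    """Gather rows into a temporary, update, then copy back (state untouched)."""
    gathered = [list(state[i]) for i in ids]   # separate gather pass
    new_state = [list(r) for r in state]
    outs = []
    for slot, row in zip(ids, gathered):
        new_row, y = update_row(row, x, decay)
        new_state[slot] = new_row              # separate write-back copy
        outs.append(y)
    return new_state, outs

def fused_step(state, ids, x, decay):
    """Read through ids and overwrite the state in place (mutates state)."""
    outs = []
    for slot in ids:
        new_row, y = update_row(state[slot], x, decay)
        state[slot][:] = new_row               # in-place write-back
        outs.append(y)
    return outs

state = [[0.1, 0.2, 0.3], [1.5, -2.25, 0.7], [3.0, 0.01, -0.4]]
ids = [2, 0]                                   # distinct slots (assumption)
x = [0.3, -1.1, 2.0]

ref_state, ref_out = unfused_step(state, ids, x, 0.9)
fused_state = [list(r) for r in state]
fused_out = fused_step(fused_state, ids, x, 0.9)

# Same arithmetic in the same order, so outputs and state agree bit for bit.
assert same_bits(ref_out, fused_out)
assert all(same_bits(r, f) for r, f in zip(ref_state, fused_state))
```

A per-backend test in the spirit of test-backend-ops would compare the backend
result against the CPU reference in the same way; the real gate is that
tool, not this sketch.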

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-27 12:11:24 +00:00

Name        Last commit                                                                              Date
patches     docs(paged): Layer-2 upstream scope for native fused-GDN kernels (Metal/Vulkan/SYCL)     2026-06-27 12:11:24 +00:00
Makefile    refactor(paged): stock llama-cpp is patch-free; paged backend owns its patch series      2026-06-27 11:01:22 +00:00
package.sh  feat(backend): llama-cpp-localai-paged variant + NVFP4 Qwen3.6 gallery                   2026-06-26 12:58:56 +00:00
run.sh      feat(backend): llama-cpp-localai-paged variant + NVFP4 Qwen3.6 gallery                   2026-06-26 12:58:56 +00:00