mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 09:57:14 -04:00
docs(paged): correct Vulkan/SYCL note (GDN op IS upstream) + CUDA-only rationale
The gated-DeltaNet and SSM_CONV ops have upstream Metal/Vulkan/SYCL kernels, so the Qwen3.6 hybrids run there (non-fused); the earlier 'no Vulkan kernel' note was wrong. The patchset's fusions are disabled on non-CUDA backends, so the backend ships CUDA-only; non-CUDA users use stock llama-cpp.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@@ -225,9 +225,14 @@ path falls back to a host-side gather, pure overhead over stock's contiguous rea
 Everything Blackwell-specific (NVFP4, GDN fusions via 0030, occupancy) is inert.
 So **on Apple Silicon, prefer the stock `llama-cpp` backend.**

-**Vulkan** (source analysis, no box to measure): same picture, worse - the
-CUDA-only levers are inert AND the gated-DeltaNet op has *no Vulkan kernel
-upstream*, so the Qwen3.6 hybrid models assert/fall back and don't run there.
+**Vulkan / SYCL** (source analysis): the gated-DeltaNet and SSM_CONV ops DO have
+upstream kernels on Vulkan and SYCL (as on Metal), so the Qwen3.6 hybrids RUN on
+all three via the non-fused path. The patchset's fusions are gated off there
+(0030), so the outcome is the same neutral-to-slightly-negative as Metal - not
+"won't run". This backend therefore ships **CUDA-only** (where the fusions are
+live + verified); non-CUDA users should use the stock `llama-cpp` backend. See
+[`UPSTREAM_LAYER2_SCOPE.md`](UPSTREAM_LAYER2_SCOPE.md) for what native non-CUDA
+fused kernels would take.

 ---

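The recommendation in the updated note reduces to a simple rule: use the patched backend on CUDA, and the stock `llama-cpp` backend everywhere else (Metal, Vulkan, SYCL). A minimal sketch of that rule follows. The function name, the accelerator strings, and the `PATCHED_BACKEND` placeholder are assumptions made for illustration, not identifiers from LocalAI; only `llama-cpp` is named in the note.

```python
# Sketch of the backend choice described in the note.
# Only "llama-cpp" comes from the source; everything else is hypothetical.
STOCK_BACKEND = "llama-cpp"
PATCHED_BACKEND = "patched-cuda-backend"  # placeholder name, not a real LocalAI identifier


def pick_backend(accelerator: str) -> str:
    """Return the backend the note recommends for the given accelerator.

    The patchset's fusions are only live on CUDA, so every other
    accelerator (Metal, Vulkan, SYCL, CPU, ...) maps to the stock backend.
    """
    if accelerator.strip().lower() == "cuda":
        return PATCHED_BACKEND
    return STOCK_BACKEND


if __name__ == "__main__":
    for acc in ("cuda", "metal", "vulkan", "sycl"):
        print(acc, "->", pick_backend(acc))
```

The non-CUDA branch is a default rather than an allow-list, which matches the note's framing: the models still run on those backends via the non-fused path, so stock is the safe choice for anything that is not CUDA.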