From 9115c2c52c100c2cd59664b4580cfa823e493de1 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto <mudler@localai.io>
Date: Sat, 27 Jun 2026 12:18:11 +0000
Subject: [PATCH] docs(paged): correct Vulkan/SYCL note (GDN op IS upstream) +
 CUDA-only rationale

The gated-DeltaNet and SSM_CONV ops have upstream Metal/Vulkan/SYCL kernels, so
the Qwen3.6 hybrids run there (non-fused); the earlier 'no Vulkan kernel' note
was wrong. The patchset's fusions are disabled on non-CUDA backends, so the
backend ships CUDA-only; non-CUDA users use stock llama-cpp.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---
 .../llama-cpp-localai-paged/patches/paged/README.md   | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/backend/cpp/llama-cpp-localai-paged/patches/paged/README.md b/backend/cpp/llama-cpp-localai-paged/patches/paged/README.md
index 365cf6123..aea38ffc0 100644
--- a/backend/cpp/llama-cpp-localai-paged/patches/paged/README.md
+++ b/backend/cpp/llama-cpp-localai-paged/patches/paged/README.md
@@ -225,9 +225,14 @@ path falls back to a host-side gather, pure overhead over stock's contiguous rea
 Everything Blackwell-specific (NVFP4, GDN fusions via 0030, occupancy) is inert.
 So **on Apple Silicon, prefer the stock `llama-cpp` backend.**
 
-**Vulkan** (source analysis, no box to measure): same picture, worse - the
-CUDA-only levers are inert AND the gated-DeltaNet op has *no Vulkan kernel
-upstream*, so the Qwen3.6 hybrid models assert/fall back and don't run there.
+**Vulkan / SYCL** (source analysis): the gated-DeltaNet and SSM_CONV ops DO have
+upstream kernels on Vulkan and SYCL (as on Metal), so the Qwen3.6 hybrids RUN on
+all three via the non-fused path. The patchset's fusions are gated off there
+(0030), so the outcome matches Metal: neutral to slightly negative, not
+"won't run". This backend therefore ships **CUDA-only** (where the fusions are
+live and verified); non-CUDA users should use the stock `llama-cpp` backend. See
+[`UPSTREAM_LAYER2_SCOPE.md`](UPSTREAM_LAYER2_SCOPE.md) for what native non-CUDA
+fused kernels would take.
 
 ---