feat(paged): restrict llama-cpp-localai-paged to CUDA-only build targets

The paged backend previously built for cublas/cuda, cpu, vulkan, sycl, hipblas and darwin/metal. On non-CUDA the patchset's wins are inert: the GDN fusions are gated off (patch 0030) and NVFP4 falls back to dequant, so the backend is neutral-to-negative there (README section 4c). The darwin grpc-server link also fails on undefined upstream server symbols, turning CI red. Both broken and pointless off-CUDA, so ship CUDA-only.

- backend-matrix.yml: drop the hipblas, sycl f32/f16, cpu amd64/arm64, vulkan amd64/arm64 and metal-darwin rows for this backend; keep the four cublas rows (cuda-12, cuda-13, nvidia-l4t cuda-12 and cuda-13).
- index.yaml: meta-backend (and -development) capabilities are now CUDA-only with default pointing at cuda12 (mirrors faster-qwen3-tts); removed the orphaned cpu/rocm/sycl/vulkan/metal variant entries.
- Removed the now-unused darwin build script and its Makefile target / .NOTPARALLEL entry / backend_build_darwin.yml step.
- Documented the CUDA-only build coverage in the patch README and plan.

Non-CUDA users should use the stock llama-cpp backend.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 18:06:58 -04:00 · 2026-06-27 12:29:15 +00:00
parent 9115c2c52c
commit a4e730979d
7 changed files with 25 additions and 299 deletions
--- a/.github/workflows/backend_build_darwin.yml
+++ b/.github/workflows/backend_build_darwin.yml
@@ -230,16 +230,6 @@ jobs:
          make protogen-go
          make backends/llama-cpp-darwin

-      # llama-cpp-localai-paged reuses the same bespoke llama-cpp darwin build path
-      # (CPU_ALL_VARIANTS + Metal + otool dylib bundling) via its own wrapper script,
-      # so it gets a dedicated step like stock llama-cpp rather than the generic
-      # build-darwin-go-backend mold.
-      - name: Build ${{ inputs.backend }}-darwin (llama-cpp-localai-paged)
-        if: inputs.backend == 'llama-cpp-localai-paged'
-        run: |
-          make protogen-go
-          make backends/llama-cpp-localai-paged-darwin
-
      - name: Build ds4 backend (Darwin Metal)
        if: inputs.backend == 'ds4'
        run: |
@@ -255,7 +245,7 @@ jobs:
          make backends/privacy-filter-darwin

      - name: Build ${{ inputs.backend }}-darwin
-        if: inputs.backend != 'llama-cpp' && inputs.backend != 'llama-cpp-localai-paged' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
+        if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
        run: |
          make protogen-go
          BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend