Files
LocalAI/.github
Ettore Di Giacinto a4e730979d feat(paged): restrict llama-cpp-localai-paged to CUDA-only build targets
The paged backend previously built for cublas/cuda, cpu, vulkan, sycl,
hipblas and darwin/metal. On non-CUDA the patchset's wins are inert: the
GDN fusions are gated off (patch 0030) and NVFP4 falls back to dequant,
so the backend is neutral-to-negative there (README section 4c). The
darwin grpc-server link also fails on undefined upstream server symbols,
turning CI red. Both broken and pointless off-CUDA, so ship CUDA-only.

- backend-matrix.yml: drop the hipblas, sycl f32/f16, cpu amd64/arm64,
  vulkan amd64/arm64 and metal-darwin rows for this backend; keep the
  four cublas rows (cuda-12, cuda-13, nvidia-l4t cuda-12 and cuda-13).
- index.yaml: meta-backend (and -development) capabilities are now
  CUDA-only with default pointing at cuda12 (mirrors faster-qwen3-tts);
  removed the orphaned cpu/rocm/sycl/vulkan/metal variant entries.
- Removed the now-unused darwin build script and its Makefile target /
  .NOTPARALLEL entry / backend_build_darwin.yml step.
- Documented the CUDA-only build coverage in the patch README and plan.

Non-CUDA users should use the stock llama-cpp backend.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 12:29:15 +00:00
..
2023-07-09 13:39:00 +02:00
2025-02-13 09:58:19 +01:00