mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 18:06:58 -04:00
feat(paged): restrict llama-cpp-localai-paged to CUDA-only build targets
The paged backend previously built for cublas/cuda, cpu, vulkan, sycl, hipblas and darwin/metal. On non-CUDA the patchset's wins are inert: the GDN fusions are gated off (patch 0030) and NVFP4 falls back to dequant, so the backend is neutral-to-negative there (README section 4c). The darwin grpc-server link also fails on undefined upstream server symbols, turning CI red. Both broken and pointless off-CUDA, so ship CUDA-only. - backend-matrix.yml: drop the hipblas, sycl f32/f16, cpu amd64/arm64, vulkan amd64/arm64 and metal-darwin rows for this backend; keep the four cublas rows (cuda-12, cuda-13, nvidia-l4t cuda-12 and cuda-13). - index.yaml: meta-backend (and -development) capabilities are now CUDA-only with default pointing at cuda12 (mirrors faster-qwen3-tts); removed the orphaned cpu/rocm/sycl/vulkan/metal variant entries. - Removed the now-unused darwin build script and its Makefile target / .NOTPARALLEL entry / backend_build_darwin.yml step. - Documented the CUDA-only build coverage in the patch README and plan. Non-CUDA users should use the stock llama-cpp backend. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
@@ -3,6 +3,13 @@
|
||||
Scoping deliverable only. NOTHING is changed by this document. It is grounded in the
|
||||
actual repo structure (read 2026-06-26 in worktree feat+paged-attention), not assumptions.
|
||||
|
||||
SHIPPED REALITY (update 2026-06-27): the backend ships CUDA-only. The matrix rows and
|
||||
the index.yaml meta-backend keep ONLY the CUDA/cublas variants (cuda-12, cuda-13, and
|
||||
the nvidia-l4t arm64 cuda-12/cuda-13 Jetson rows). The cpu / vulkan / sycl / hipblas /
|
||||
metal-darwin variants discussed below as optional/phase-2 were NOT shipped (and the
|
||||
darwin row was removed): off-CUDA the patchset's wins gate off, so it is neutral-to-
|
||||
negative there and non-CUDA users should use the stock llama-cpp backend (README 4c).
|
||||
|
||||
================================================================================
|
||||
0. GROUND TRUTH (what the repo actually does today)
|
||||
================================================================================
|
||||
|
||||
@@ -344,6 +344,14 @@ in a recommended/gallery config.
|
||||
|
||||
## 8. Models
|
||||
|
||||
> **Build coverage: CUDA-only.** This backend ships only the CUDA/cublas build
|
||||
> targets (cuda-12, cuda-13, and the nvidia-l4t arm64 cuda-12/cuda-13 Jetson
|
||||
> rows). There are no cpu / vulkan / sycl / hipblas / metal-darwin builds: the
|
||||
> patchset's wins are CUDA/Blackwell-specific (section 4c), so off-CUDA the
|
||||
> backend is neutral-to-negative and non-CUDA users should run the stock
|
||||
> `llama-cpp` backend instead. The `backend/index.yaml` meta-backend resolves
|
||||
> `default`/`nvidia` to a CUDA variant accordingly.
|
||||
|
||||
The benchmarked NVFP4 GGUFs are published and wired into the LocalAI gallery:
|
||||
|
||||
| Gallery entry | Weights (HuggingFace) | Notes |
|
||||
|
||||
Reference in New Issue
Block a user