Mirror of https://github.com/mudler/LocalAI.git (synced 2026-06-27 18:06:58 -04:00)
Classify the paged-attention optimizations as arch-GENERAL (ship everywhere), GB10-TUNED (needs a per-arch retune), or Blackwell-precision-specific. Add the expected per-arch story (sm_100/Hopper/Ada/Metal/CPU) and document the SAFETY gap: the fused GDN/conv ops are implemented for CUDA and CPU only, yet their emission is not gated on the backend.

Extends the prior build/gallery-targeting audit in the same file.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>