mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 18:06:58 -04:00
Two .agents guides (indexed in AGENTS.md):

- llama-cpp-localai-paged-backend.md: what the CUDA-only paged backend is, the patchset scope, the bit-exact gate, the manual pin-sync + weekly canary, the CUDA-only / stock-stays-pure invariants, and the Metal/SYCL/Vulkan follow-up scope.
- vllm-parity-methodology.md: the decode-parity playbook (bit-exact gating, profile-don't-assume, both-engine ground-truth, per-lever A/B, recording rejected levers, multi-agent GPU orchestration).

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>