mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-13 19:27:48 -04:00
ggml leaves GGML_CUDA_GRAPHS off by default. Passing -DGGML_CUDA_GRAPHS=ON for cublas builds lets the CUDA backend capture and replay the compute graph for a small free speedup (about 1% measured on a GB10, never negative). It is not gated by parakeet.cpp's CMake options, so it passes straight through to ggml.

Assisted-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>