LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-30 11:36:31 -04:00

Ettore Di Giacinto 7f2b7e4ace fix(buun-llama-cpp): shim atomicAdd(double*,double) for pre-sm_60 CUDA

Buun's Q² calibration path in ggml/src/ggml-cuda/fattn.cu calls
atomicAdd with a double* destination. Native double atomicAdd is only
available on CUDA compute capability 6.0 and later, but LocalAI's
CUDA 12 Docker image builds for the full published arch range (which
includes sm_50/sm_52), so nvcc fails with:

    fattn.cu:812: error: no instance of overloaded function "atomicAdd"
    matches the argument list, argument types are: (double *, double)

Add the canonical CAS-loop shim from the CUDA C Programming Guide
(B.15 Atomic Functions) guarded on __CUDA_ARCH__ < 600. On sm_60+ the
guard is false and nvcc picks up the native intrinsic as before.
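
For illustration, the CAS-loop pattern described above can be sketched
as follows. This is a minimal sketch, not the contents of the actual
patch: the kernel add_ones and the host helper sum_ones_on_device are
hypothetical names used only to exercise the shim, and the guard is
written with an explicit defined(__CUDA_ARCH__) check (an assumption)
so that the host compilation pass does not see a second definition.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Fallback only for device passes targeting compute capability < 6.0.
// On sm_60+ the guard is false and the native atomicAdd is used.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
static __device__ double atomicAdd(double *address, double val) {
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); retry if another thread won the race.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Integer comparison avoids an endless loop on NaN (NaN != NaN).
    } while (assumed != old);
    return __longlong_as_double(old);
}
#endif

// Hypothetical kernel: each of the first n threads adds 1.0 to *acc.
__global__ void add_ones(double *acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(acc, 1.0);
}

// Hypothetical host helper: returns the accumulated sum, or -1.0 on error.
double sum_ones_on_device(int n) {
    assert(n > 0);
    double *d_acc = nullptr;
    double h_acc = 0.0;
    if (cudaMalloc(&d_acc, sizeof(double)) != cudaSuccess) return -1.0;
    cudaMemcpy(d_acc, &h_acc, sizeof(double), cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_ones<<<blocks, threads>>>(d_acc, n);
    cudaError_t err = cudaMemcpy(&h_acc, d_acc, sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_acc);
    return err == cudaSuccess ? h_acc : -1.0;
}
```

Building the same file with, for example, -arch=sm_52 and -arch=sm_70
should exercise the shim and the native intrinsic respectively, with
identical results from sum_ones_on_device.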

The patch file lives under backend/cpp/buun-llama-cpp/patches/ and is
applied to the cloned fork tree by apply-patches.sh (the infrastructure
already put in place for exactly this class of backport).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-24 13:57:30 +00:00

patches

fix(buun-llama-cpp): shim atomicAdd(double*,double) for pre-sm_60 CUDA

2026-04-24 13:57:30 +00:00

apply-patches.sh

feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)

2026-04-24 12:52:53 +00:00

Makefile

feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)

2026-04-24 12:52:53 +00:00

package.sh

feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)