LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-29 11:07:18 -04:00

Files

Ettore Di Giacinto 42754d33b9 fix(buun-llama-cpp): pass WARP_SIZE to argmax __shfl_xor_sync calls

Two call sites in ggml/src/ggml-cuda/argmax.cu (the top-K intra-warp
merge added by buun) use the 3-arg CUDA form __shfl_xor_sync(mask, var,
laneMask), omitting the optional width parameter. The hipification shim
at ggml/src/ggml-cuda/vendors/hip.h:33 is a function-like macro that
requires all four arguments, so hipcc fails with:

    argmax.cu:265: too few arguments provided to function-like macro
      invocation
    note: macro '__shfl_xor_sync' defined here:
      #define __shfl_xor_sync(mask, var, laneMask, width) \
              __shfl_xor(var, laneMask, width)

Every other call in the same file already passes WARP_SIZE explicitly;
aligning these two with that convention fixes the hipblas build without
changing CUDA codegen (on CUDA the omitted width argument defaults to
warpSize).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
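
The macro-arity problem described in the commit message above can be illustrated on the host without a GPU toolchain. The sketch below is not the code from argmax.cu or hip.h: `WARP_SIZE_EMU`, `Candidate`, `LaneVar`, `emu_shfl_xor`, `EMU_SHFL_XOR_SYNC`, and `warp_argmax` are hypothetical names made up for this example. It assumes a 32-lane warp and emulates an xor-shuffle butterfly merge in plain C++, with a function-like macro that, like the shim quoted in the error output, takes four parameters and drops the mask.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for ggml's WARP_SIZE; assumed to be 32 here.
constexpr int WARP_SIZE_EMU = 32;

// A (value, index) pair as one lane might hold during an argmax reduction.
struct Candidate {
    float value;
    int   index;
};

using WarpRegs = std::array<Candidate, WARP_SIZE_EMU>;

// One lane's view of the emulated warp: all registers plus its own lane id.
struct LaneVar {
    const WarpRegs* regs;
    int             lane;
};

// Host emulation of an xor shuffle: the calling lane reads the register of
// lane (lane ^ lane_mask) inside its width-sized segment.
inline Candidate emu_shfl_xor(LaneVar var, int lane_mask, int width) {
    assert(width > 0 && (width & (width - 1)) == 0);  // power of two
    const int base = var.lane - (var.lane % width);   // first lane of segment
    const int src  = (var.lane % width) ^ lane_mask;
    if (src >= width) {
        return (*var.regs)[var.lane];                 // out of range: keep own
    }
    return (*var.regs)[base + src];
}

// Same shape as the shim in the error message: four parameters, mask dropped.
// Invoking this with only three arguments fails in the preprocessor, because
// a function-like macro cannot declare a default for a missing argument.
#define EMU_SHFL_XOR_SYNC(mask, var, laneMask, width) \
    emu_shfl_xor(var, laneMask, width)

// Butterfly merge: after log2(32) = 5 rounds every lane holds the maximum.
// Ties are broken toward the lower index so all lanes agree.
inline Candidate warp_argmax(WarpRegs regs) {
    for (int offset = WARP_SIZE_EMU / 2; offset > 0; offset >>= 1) {
        WarpRegs next = regs;
        for (int lane = 0; lane < WARP_SIZE_EMU; ++lane) {
            const LaneVar   var{&regs, lane};
            const Candidate other =
                EMU_SHFL_XOR_SYNC(0xffffffffu, var, offset, WARP_SIZE_EMU);
            const Candidate mine = regs[lane];
            if (other.value > mine.value ||
                (other.value == mine.value && other.index < mine.index)) {
                next[lane] = other;
            }
        }
        regs = next;
    }
    return regs[0];
}
```

Removing the final `WARP_SIZE_EMU` argument from the `EMU_SHFL_XOR_SYNC` call makes the translation unit fail to preprocess, which is the same class of error hipcc reported. The CUDA builtin, by contrast, is a real function whose `width` parameter has a default, so the three-argument form only compiles there.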

2026-04-24 16:29:29 +00:00

buun-llama-cpp

fix(buun-llama-cpp): pass WARP_SIZE to argmax __shfl_xor_sync calls

2026-04-24 16:29:29 +00:00

grpc

fix: speedup git submodule update with --single-branch (#2847)

2024-07-13 22:32:25 +02:00

ik-llama-cpp

fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature (#9531)

2026-04-24 13:07:26 +02:00

llama-cpp

chore: ⬆️ Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 (#9520)

2026-04-24 09:17:06 +02:00

turboquant

fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)

2026-04-19 21:05:21 +02:00