mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-29 11:07:18 -04:00
Two call sites in ggml/src/ggml-cuda/argmax.cu (the top-K intra-warp
merge added by buun) use the 3-arg CUDA form __shfl_xor_sync(mask, var,
laneMask), omitting the optional width parameter. The hipification shim
at ggml/src/ggml-cuda/vendors/hip.h:33 is a function-like macro that
requires all four arguments, so hipcc fails with:

    argmax.cu:265: too few arguments provided to function-like macro
    invocation
    note: macro '__shfl_xor_sync' defined here:
    #define __shfl_xor_sync(mask, var, laneMask, width) \
        __shfl_xor(var, laneMask, width)

Every other call in the same file already passes WARP_SIZE explicitly;
aligning these two with that convention fixes the hipblas build without
changing CUDA codegen, because the width argument that CUDA lets callers
omit already defaults to warpSize.
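
For illustration only, the macro-arity issue can be sketched on the host
in plain C++. Nothing below is taken from argmax.cu or hip.h: Warp,
shfl_xor_emulated, SHFL_XOR_SYNC and warp_max are hypothetical names, the
shuffle is emulated over an array, and only the full-width case
(width == WARP_SIZE) is modelled. The point is that a four-parameter
function-like macro needs all four arguments at every call site, and
that passing the full warp width leaves the butterfly reduction
unchanged.

```cpp
#include <algorithm>
#include <array>

// Assumed warp width for this host-side sketch.
constexpr int WARP_SIZE = 32;

// Hypothetical stand-in for a warp: one value per lane.
using Warp = std::array<int, WARP_SIZE>;

// Emulated XOR shuffle: lane i reads the value held by lane (i ^ lane_mask).
// Only the full-width case is modelled; a lane whose source would fall
// outside [0, width) simply keeps its own value.
inline Warp shfl_xor_emulated(const Warp & v, int lane_mask, int width) {
    Warp out{};
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        const int src = lane ^ lane_mask;
        out[lane] = (src >= 0 && src < width) ? v[src] : v[lane];
    }
    return out;
}

// Shim in the style described above: a function-like macro with four
// parameters that drops the mask. Invoking it with three arguments is a
// preprocessor error, not an implicit default.
#define SHFL_XOR_SYNC(mask, var, lane_mask, width) \
    shfl_xor_emulated(var, lane_mask, width)

// Butterfly max-reduction over the emulated warp. After the loop every
// lane holds the maximum, so lane 0 is returned.
inline int warp_max(Warp v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        // Width is passed explicitly, as the shim requires.
        const Warp other = SHFL_XOR_SYNC(0xffffffffu, v, offset, WARP_SIZE);
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            v[lane] = std::max(v[lane], other[lane]);
        }
    }
    return v[0];
}
```

Writing SHFL_XOR_SYNC(0xffffffffu, v, offset) in this sketch would fail
to preprocess in the same way as the hipcc error quoted above.
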
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>