Mirror of https://github.com/mudler/LocalAI.git (synced 2026-05-24 08:38:02 -04:00)
fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances

Fork commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under
ggml-cuda/template-instances/ (fattn-vec-instance-f16-turbo{2,3,4}_0.cu)
and wired matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO*)
dispatch cases into fattn.cu.

fattn.cu is shared with the HIP build via hipify, but the fork did not
add the new source files to ggml/src/ggml-hip/CMakeLists.txt.
CMake's ROCm branch carries a hand-curated template-instance list (used
when GGML_CUDA_FA_ALL_QUANTS is OFF, the default), so the HIP build
ended up with the extern template declarations but no matching
instantiations, and the -gpu-rocm-hipblas-turboquant job failed at link
time, partway through the 3h+ build.

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch inserts the three new f16-turbo instance files
into ggml-hip's source list in the same interleaved order used by
ggml-cuda's CMakeLists.txt. Drop this patch once the fork syncs the
ROCm list. If the anchor context goes stale, the patch stops applying
and the build fails fast, which is the signal to retire it.

CUDA builds were unaffected (ggml-cuda's CMakeLists.txt was updated in
the fork commit); the link failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
@@ -0,0 +1,47 @@
From: LocalAI turboquant backend maintainers <noreply@localai.io>
Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances

Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under ggml-cuda/:

- fattn-vec-instance-f16-turbo2_0.cu
- fattn-vec-instance-f16-turbo3_0.cu
- fattn-vec-instance-f16-turbo4_0.cu

and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
build path via hipify.

However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
is the default) and was never updated for the new F16-K + TURBO-V combos.
The HIP build therefore compiles the dispatch cases (which reference
ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
the matching template instantiations, causing a link-time failure in the
-gpu-rocm-hipblas-turboquant CI job.

Add the three new template instance files to ggml-hip's list so the HIP
build links cleanly. Drop this patch once the fork picks up the
corresponding upstream sync in ggml-hip/CMakeLists.txt.

--- a/ggml/src/ggml-hip/CMakeLists.txt
+++ b/ggml/src/ggml-hip/CMakeLists.txt
@@ -85,14 +85,17 @@ else()
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu