Compare commits


1 commit

Author SHA1 Message Date
Ettore Di Giacinto
798b5b2d84 chore(turboquant): bump fork to 4d24ad87 and patch ggml-hip for new f16-turbo fattn-vec instances
Bump TURBOQUANT_VERSION from 627ebbc6 to 4d24ad87, which pulls in
upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch
cases in fattn.cu"). That commit adds three new template instance files
under ggml-cuda/template-instances/:

  - fattn-vec-instance-f16-turbo2_0.cu
  - fattn-vec-instance-f16-turbo3_0.cu
  - fattn-vec-instance-f16-turbo4_0.cu

and wires matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
dispatch cases into fattn.cu. The dispatch cases are compiled into the
HIP build (fattn.cu is shared with ggml-hip via hipify), but the fork
forgot to mirror the new source files into ggml/src/ggml-hip/CMakeLists.txt.
That file carries a hand-curated template-instance list (used when
GGML_CUDA_FA_ALL_QUANTS is OFF, which is the default), so the HIP build
ends up with the extern template declarations but no matching
instantiations. As a result, the -gpu-rocm-hipblas-turboquant job failed
at link time, roughly 90 minutes into a build that takes over 3 hours.
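To make the failure mode concrete, here is a deliberately simplified Python model. It is not how CMake or the linker actually work: it treats "instance file is in the source list" as "template instantiation is defined". The helper names (`expand_braces`, `required_instance_files`, `unresolved`) are hypothetical, and the `fattn-vec-instance-<k>-<v>.cu` naming rule is inferred from the file names listed in this commit.

```python
import re

# Path prefix used by ggml-hip's CMakeLists.txt for the shared CUDA instance files.
INSTANCE_DIR = "../ggml-cuda/template-instances/"


def expand_braces(pattern):
    """Expand shell-style groups, e.g. "TURBO{2,3,4}_0" -> TURBO2_0, TURBO3_0, TURBO4_0."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    out = []
    for alt in m.group(1).split(","):
        out.extend(expand_braces(head + alt + tail))
    return out


def required_instance_files(type_k, type_v_pattern):
    """Instance files a FATTN_VEC_CASES_ALL_D(K, V) dispatch depends on (assumed naming)."""
    k = type_k.removeprefix("GGML_TYPE_").lower()
    files = []
    for v in expand_braces(type_v_pattern):
        v = v.removeprefix("GGML_TYPE_").lower()
        files.append(f"{INSTANCE_DIR}fattn-vec-instance-{k}-{v}.cu")
    return files


def unresolved(required, sources):
    """Toy link step: a dispatch case whose instance file is not compiled
    has no instantiation to link against."""
    compiled = set(sources)
    return [f for f in required if f not in compiled]
```

In this model, calling `unresolved(required_instance_files("GGML_TYPE_F16", "GGML_TYPE_TURBO{2,3,4}_0"), hip_sources)` against the pre-patch HIP list returns all three f16-turbo files, mirroring the three missing instantiations; once they are added to the list, it returns an empty list.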

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch appends the three new f16-turbo instance files
to ggml-hip's source list in the same interleaved order used by
ggml-cuda's CMakeLists.txt. Drop this patch once the fork syncs the
ROCm list (the build will fail fast if the anchor context goes stale,
which is the signal to retire it).
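The interleaved insertion and the fail-fast behavior can be approximated as below. This is an illustrative analogy, not the logic of apply-patches.sh or of `patch` itself: `insert_after_anchor`, `apply_f16_turbo_inserts`, and `StaleAnchorError` are hypothetical, and the anchors are simply the `q8_0-turbo*` entries that precede each new line in the hunk. An insert is rejected when its anchor is missing, or when the entry is already present, the latter standing in for the fork having synced its ROCm list.

```python
PREFIX = "../ggml-cuda/template-instances/fattn-vec-instance-"

# (anchor, new entry) pairs mirroring the three "+" lines in the patch hunk.
F16_TURBO_INSERTS = [
    (PREFIX + "q8_0-turbo3_0.cu", PREFIX + "f16-turbo3_0.cu"),
    (PREFIX + "q8_0-turbo2_0.cu", PREFIX + "f16-turbo2_0.cu"),
    (PREFIX + "q8_0-turbo4_0.cu", PREFIX + "f16-turbo4_0.cu"),
]


class StaleAnchorError(Exception):
    """Hypothetical error: the patch no longer matches the source list."""


def insert_after_anchor(lines, anchor, new_line):
    """Return a copy of `lines` with `new_line` placed right after `anchor`."""
    if new_line in lines:
        # Entry already upstream: the patch is obsolete and should be dropped.
        raise StaleAnchorError(f"already present: {new_line}")
    try:
        idx = lines.index(anchor)
    except ValueError:
        # Anchor changed or vanished: fail fast instead of guessing a position.
        raise StaleAnchorError(f"anchor not found: {anchor}") from None
    return lines[: idx + 1] + [new_line] + lines[idx + 1 :]


def apply_f16_turbo_inserts(sources):
    """Apply all three inserts, keeping ggml-cuda's interleaved order."""
    for anchor, new_line in F16_TURBO_INSERTS:
        sources = insert_after_anchor(sources, anchor, new_line)
    return sources
```

Returning a new list rather than mutating the input keeps a failed application from leaving the source list half-patched.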

CUDA builds were unaffected because ggml-cuda's CMakeLists.txt was updated
upstream; the failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-22 07:13:47 +00:00
2 changed files with 48 additions and 1 deletion


@@ -1,7 +1,7 @@
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
CMAKE_ARGS?=


@@ -0,0 +1,47 @@
From: LocalAI turboquant backend maintainers <noreply@localai.io>
Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances

Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under ggml-cuda/:

- fattn-vec-instance-f16-turbo2_0.cu
- fattn-vec-instance-f16-turbo3_0.cu
- fattn-vec-instance-f16-turbo4_0.cu

and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
build path via hipify.

However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
is the default) and was never updated for the new F16-K + TURBO-V combos.
The HIP build therefore compiles the dispatch cases (which reference
ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
the matching template instantiations, causing a link-time failure in the
-gpu-rocm-hipblas-turboquant CI job.

Add the three new template instance files to ggml-hip's list so the HIP
build links cleanly. Drop this patch once the fork picks up the
corresponding upstream sync in ggml-hip/CMakeLists.txt.

--- a/ggml/src/ggml-hip/CMakeLists.txt
+++ b/ggml/src/ggml-hip/CMakeLists.txt
@@ -85,14 +85,17 @@ else()
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu