From 798b5b2d84b45b0a98f5dbad6d7d8387b8466780 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 22 Apr 2026 07:13:47 +0000
Subject: [PATCH] chore(turboquant): bump fork to 4d24ad87 and patch ggml-hip
 for new f16-turbo fattn-vec instances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bump TURBOQUANT_VERSION from 627ebbc6 to 4d24ad87, which pulls in
upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch
cases in fattn.cu"). That commit adds three new template instance files
under ggml-cuda/template-instances/:

  - fattn-vec-instance-f16-turbo2_0.cu
  - fattn-vec-instance-f16-turbo3_0.cu
  - fattn-vec-instance-f16-turbo4_0.cu

and wires matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16,
GGML_TYPE_TURBO{2,3,4}_0) dispatch cases into fattn.cu.

The dispatch cases are compiled into the HIP build (fattn.cu is shared
with ggml-hip via hipify), but the fork did not mirror the new source
files into ggml/src/ggml-hip/CMakeLists.txt. CMake's ROCm branch carries
a hand-curated template-instance list (used when GGML_CUDA_FA_ALL_QUANTS
is OFF, which is the default), so the HIP build ends up with the extern
template declarations but no matching instantiations. As a result, the
-gpu-rocm-hipblas-turboquant job failed at link time (~90min into the
3h+ build).

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch inserts the three new f16-turbo instance files
into ggml-hip's source list in the same interleaved order used by
ggml-cuda's CMakeLists.txt. Drop this patch once the fork syncs the ROCm
list; the build will fail fast if the anchor context goes stale, which
is the signal to retire it.

CUDA builds were unaffected because ggml-cuda's CMakeLists.txt was
updated upstream; the failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
---
 backend/cpp/turboquant/Makefile               |  2 +-
 ...ggml-hip-add-f16-turbo-vec-instances.patch | 47 +++++++++++++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch

diff --git a/backend/cpp/turboquant/Makefile b/backend/cpp/turboquant/Makefile
index cd3d1ab80..776b82ef1 100644
--- a/backend/cpp/turboquant/Makefile
+++ b/backend/cpp/turboquant/Makefile
@@ -1,7 +1,7 @@
 # Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
 CMAKE_ARGS?=
diff --git a/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch b/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
new file mode 100644
index 000000000..5225377d0
--- /dev/null
+++ b/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
@@ -0,0 +1,47 @@
+From: LocalAI turboquant backend maintainers
+Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances
+
+Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
+in fattn.cu") added three new template instance files under ggml-cuda/:
+
+ - fattn-vec-instance-f16-turbo2_0.cu
+ - fattn-vec-instance-f16-turbo3_0.cu
+ - fattn-vec-instance-f16-turbo4_0.cu
+
+and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
+dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
+were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
+build path via hipify.
+
+However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
+template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
+is the default) and was never updated for the new F16-K + TURBO-V combos.
+The HIP build therefore compiles the dispatch cases (which reference
+ggml_cuda_flash_attn_ext_vec_case) without ever compiling
+the matching template instantiations, causing a link-time failure in the
+-gpu-rocm-hipblas-turboquant CI job.
+
+Add the three new template instance files to ggml-hip's list so the HIP
+build links cleanly. Drop this patch once the fork picks up the
+corresponding upstream sync in ggml-hip/CMakeLists.txt.
+
+--- a/ggml/src/ggml-hip/CMakeLists.txt
++++ b/ggml/src/ggml-hip/CMakeLists.txt
+@@ -85,14 +85,17 @@ else()
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu