From 9eb21e9a20d387a167c74f030bc15ecbe8e23a84 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Wed, 22 Apr 2026 07:17:33 +0000
Subject: [PATCH] fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fork commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under
ggml-cuda/template-instances/ (fattn-vec-instance-f16-turbo{2,3,4}_0.cu)
and wired matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16,
GGML_TYPE_TURBO*) dispatch cases into fattn.cu.

fattn.cu is shared with the HIP build via hipify, but the fork forgot to
mirror the new source files into ggml/src/ggml-hip/CMakeLists.txt.
CMake's ROCm branch carries a hand-curated template-instance list (used
when GGML_CUDA_FA_ALL_QUANTS is OFF, the default), so the HIP build
ended up with the extern template declarations but no matching
instantiations, and the -gpu-rocm-hipblas-turboquant job failed partway
through the 3h+ build.

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch appends the three new f16-turbo instance files to
ggml-hip's source list in the same interleaved order used by ggml-cuda's
CMakeLists.txt.

Drop this patch once the fork syncs the ROCm list. The build will fail
fast if the anchor context goes stale, which is the signal to retire it.

CUDA builds were unaffected (ggml-cuda's CMakeLists.txt was updated
upstream); the link failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
---
 ...ggml-hip-add-f16-turbo-vec-instances.patch | 47 +++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch

diff --git a/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch b/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
new file mode 100644
index 000000000..5225377d0
--- /dev/null
+++ b/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
@@ -0,0 +1,47 @@
+From: LocalAI turboquant backend maintainers
+Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances
+
+Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
+in fattn.cu") added three new template instance files under ggml-cuda/:
+
+ - fattn-vec-instance-f16-turbo2_0.cu
+ - fattn-vec-instance-f16-turbo3_0.cu
+ - fattn-vec-instance-f16-turbo4_0.cu
+
+and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
+dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
+were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
+build path via hipify.
+
+However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
+template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
+is the default) and was never updated for the new F16-K + TURBO-V combos.
+The HIP build therefore compiles the dispatch cases (which reference
+ggml_cuda_flash_attn_ext_vec_case) without ever compiling
+the matching template instantiations, causing a link-time failure in the
+-gpu-rocm-hipblas-turboquant CI job.
+
+Add the three new template instance files to ggml-hip's list so the HIP
+build links cleanly. Drop this patch once the fork picks up the
+corresponding upstream sync in ggml-hip/CMakeLists.txt.
+
+--- a/ggml/src/ggml-hip/CMakeLists.txt
++++ b/ggml/src/ggml-hip/CMakeLists.txt
+@@ -85,14 +85,17 @@ else()
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu