LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-03 05:51:53 -04:00

Files

Ettore Di Giacinto 9eb21e9a20 fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances

Fork commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under
ggml-cuda/template-instances/ (fattn-vec-instance-f16-turbo{2,3,4}_0.cu)
and wired matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO*)
dispatch cases into fattn.cu.

fattn.cu is shared with the HIP build via hipify, but the fork did not
mirror the new source files into ggml/src/ggml-hip/CMakeLists.txt.
The ROCm CMakeLists carries a hand-curated template-instance list (used
when GGML_CUDA_FA_ALL_QUANTS is OFF, the default), so the HIP build
ended up with the extern template declarations but no matching
instantiations, and the -gpu-rocm-hipblas-turboquant job failed at link
time, partway through the 3h+ build.
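
A mismatch of this kind can be checked mechanically. The sketch below is
illustrative only: missing_instances is a hypothetical helper, not part
of LocalAI or the fork, and it assumes the curated list names each
instance file literally somewhere in the CMakeLists.txt. It prints every
expected file name that the list does not mention.

```shell
#!/bin/sh
# Hypothetical helper (not part of LocalAI or the fork): report which
# expected template-instance files a CMakeLists.txt does not mention.
# Assumption: the list spells out each file name literally.
# Usage: missing_instances <CMakeLists.txt> <file.cu>...
missing_instances() {
    cmake_file=$1
    shift
    for name in "$@"; do
        # -F matches a fixed string, so dots in the name stay literal.
        if ! grep -qF -- "$name" "$cmake_file"; then
            printf '%s\n' "$name"
        fi
    done
}

# Example invocation against a checked-out fork tree:
#   missing_instances ggml/src/ggml-hip/CMakeLists.txt \
#       fattn-vec-instance-f16-turbo2_0.cu \
#       fattn-vec-instance-f16-turbo3_0.cu \
#       fattn-vec-instance-f16-turbo4_0.cu
```

Empty output means every named file is referenced; any printed name is
a file that would be declared but never compiled into the HIP build.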

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch appends the three new f16-turbo instance files
to ggml-hip's source list in the same interleaved order used by
ggml-cuda's CMakeLists.txt. Drop this patch once the fork syncs the
ROCm list: the patch's anchor context will then no longer match, so the
build fails fast at the patch step, which is the signal to retire it.
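
The fail-fast behaviour can be sketched as follows. This is not the
contents of apply-patches.sh, which are not shown here; it is a minimal
illustration that assumes GNU patch is available. The function name
apply_patch_series and the choice of -F0 (no fuzz, so stale context is
rejected rather than guessed at) are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; the real apply-patches.sh may differ.
# Assumes GNU patch. apply_patch_series is a hypothetical name.
# Usage: apply_patch_series <source-dir> <patch-dir>
# Applies every *.patch in lexical order and stops at the first one
# whose context no longer matches the sources.
apply_patch_series() {
    src_dir=$1
    patch_dir=$2
    for p in "$patch_dir"/*.patch; do
        [ -e "$p" ] || continue   # no patches: the glob stayed literal
        # Dry run with zero fuzz first, so a stale patch fails before
        # anything in the source tree is modified.
        if ! patch -d "$src_dir" -p1 -F0 -f -s --dry-run < "$p" >/dev/null 2>&1; then
            echo "stale patch, context no longer matches: $p" >&2
            return 1
        fi
        patch -d "$src_dir" -p1 -F0 -f -s < "$p" || return 1
    done
}
```

Running the dry run before the real application keeps the tree clean
when a patch is stale, so the failure shows up in seconds instead of
after a long compile.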

CUDA builds were unaffected (the fork had already updated ggml-cuda's
CMakeLists.txt), so the link failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

2026-04-22 07:17:33 +00:00

patches

fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances

2026-04-22 07:17:33 +00:00

apply-patches.sh

feat(backend): add turboquant llama.cpp-fork backend (#9355)

2026-04-15 01:25:04 +02:00

Makefile

⬆️ Update TheTom/llama-cpp-turboquant

2026-04-21 21:28:32 +00:00

package.sh

feat(backend): add turboquant llama.cpp-fork backend (#9355)