Mirror of https://github.com/mudler/LocalAI.git (synced 2026-05-24 00:26:34 -04:00)
Compare commits: master...issue-9478 (1 commit)
Commit: 798b5b2d84
@@ -1,7 +1,7 @@
 
 # Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
 
 CMAKE_ARGS?=
@@ -0,0 +1,47 @@
+From: LocalAI turboquant backend maintainers <noreply@localai.io>
+Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances
+
+Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
+in fattn.cu") added three new template instance files under ggml-cuda/:
+
+- fattn-vec-instance-f16-turbo2_0.cu
+- fattn-vec-instance-f16-turbo3_0.cu
+- fattn-vec-instance-f16-turbo4_0.cu
+
+and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
+dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
+were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
+build path via hipify.
+
+However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
+template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
+is the default) and was never updated for the new F16-K + TURBO-V combos.
+The HIP build therefore compiles the dispatch cases (which reference
+ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
+the matching template instantiations, causing a link-time failure in the
+-gpu-rocm-hipblas-turboquant CI job.
+
+Add the three new template instance files to ggml-hip's list so the HIP
+build links cleanly. Drop this patch once the fork picks up the
+corresponding upstream sync in ggml-hip/CMakeLists.txt.
+
+--- a/ggml/src/ggml-hip/CMakeLists.txt
++++ b/ggml/src/ggml-hip/CMakeLists.txt
+@@ -85,14 +85,17 @@ else()
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
++ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
+  ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu