fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances

Fork commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases in fattn.cu") added three new template instance files under ggml-cuda/template-instances/ (fattn-vec-instance-f16-turbo{2,3,4}_0.cu) and wired matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO*) dispatch cases into fattn.cu. fattn.cu is shared with the HIP build via hipify, but the fork forgot to mirror the new source files into ggml/src/ggml-hip/CMakeLists.txt. CMake's ROCm branch carries a hand-curated template-instance list (used when GGML_CUDA_FA_ALL_QUANTS is OFF, the default), so the HIP build ended up with the extern template declarations but no matching instantiations, and the -gpu-rocm-hipblas-turboquant job failed partway through the 3h+ build.

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the existing apply-patches.sh machinery applies to the cloned fork sources after fetch. The patch appends the three new f16-turbo instance files to ggml-hip's source list in the same interleaved order used by ggml-cuda's CMakeLists.txt.

Drop this patch once the fork syncs the ROCm list. The build will fail fast if the anchor context goes stale, which is the signal to retire it.

CUDA builds were unaffected (ggml-cuda's CMakeLists.txt was updated upstream); the link failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
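
The failure mode described above (dispatch cases present, instance files absent from a hand-curated source list) could be caught before a multi-hour build by a small consistency check. The following is only an illustrative sketch: the helper names, the `HIP_LIST` excerpt, and the `GGML_SOURCES_ROCM` variable shown in it are assumptions made for the example and are not taken from the fork or from apply-patches.sh.

```python
import re

def referenced_instances(cmake_text):
    """Return the fattn-vec instance file names mentioned in a CMakeLists.txt body."""
    return set(re.findall(r"fattn-vec-instance-[\w-]+\.cu", cmake_text))

def missing_instances(cmake_text, required):
    """Return the required instance files that the source list does not mention."""
    listed = referenced_instances(cmake_text)
    return [name for name in required if name not in listed]

# Hypothetical excerpt of a hand-curated list (not the fork's real file).
HIP_LIST = """
list(APPEND GGML_SOURCES_ROCM
    ../ggml-cuda/template-instances/fattn-vec-instance-f16-f16.cu
    ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu)
"""

# The three instance files named in the commit message.
REQUIRED = ["fattn-vec-instance-f16-turbo%d_0.cu" % n for n in (2, 3, 4)]

print(missing_instances(HIP_LIST, REQUIRED))
```

With this made-up excerpt, the check reports the turbo3_0 and turbo4_0 files as missing. An empty result would suggest the list has been synced, although it does not replace the patch's own stale-anchor failure as the signal to retire it.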
⬆️ Update TheTom/llama-cpp-turboquant
2026-05-24 16:51:44 -04:00 · 2026-04-22 07:17:33 +00:00 · 2026-04-21 21:28:32 +00:00
2 changed files with 1 addition and 56 deletions
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
+LLAMA_VERSION?=cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1,59 +1,4 @@
 ---
- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
-  description: |
-    # 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
-
-    A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.
-
-    The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.
-
-      - **Developed by:** @hesamation
-      - **Base model:** `Qwen/Qwen3.6-35B-A3B`
-      - **License:** apache-2.0
-
-    This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.
-
-    [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)
-
-    ## Benchmark Results
-
-    The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.
-
-    ...
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qwen
-    - reasoning
-  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    options:
-      - use_jinja:true
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
-      presence_penalty: 1.5
-      repeat_penalty: 1
-      temperature: 0.7
-      top_k: 20
-      top_p: 0.8
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
-      sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
-      uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
 - name: "qwen3.5-9b-glm5.1-distill-v1"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls: