fix(gallery): hide broken Gemma 4 QAT MTP entries (#10348)

The Gemma 4 QAT MTP assistant-head gallery entries currently fail to load in the stock llama.cpp backend with unknown architecture errors. Hide them until the assistant GGUFs are verified against the supported backend path.

Assisted-by: Codex:GPT-5 [gh] [git]
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
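
One way to check which architecture string a downloaded assistant GGUF declares, before re-enabling an entry, is to read the `general.architecture` metadata key directly. The sketch below is not part of this commit and is not LocalAI's verification procedure; it assumes the GGUF v2/v3 header layout (magic, version, tensor count, key/value count, then typed key/value pairs), and the function name `read_architecture` is hypothetical.

```python
import io
import struct

# Byte widths of the fixed-size GGUF metadata value types (type id -> size).
# Assumption: type ids follow the GGUF v2/v3 layout as I understand it.
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF data")
    return data.decode("utf-8")


def _skip_value(f, vtype):
    """Advance the stream past one metadata value of the given type."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], io.SEEK_CUR)
    elif vtype == _STRING:
        f.seek(_read(f, "<Q"), io.SEEK_CUR)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _FIXED:
            f.seek(_FIXED[elem_type] * count, io.SEEK_CUR)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def read_architecture(f):
    """Return the `general.architecture` string from a binary GGUF stream,
    or None if the key is absent. (Hypothetical helper, for illustration.)"""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses a different header layout")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        if key == "general.architecture" and vtype == _STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    return None
```

Opening a file in binary mode and passing it in, for example `read_architecture(open(path, "rb"))`, returns the declared architecture, which can then be compared against the spelling the backend accepts (`gemma4_assistant` versus `gemma4-assistant` in the comments added by this commit).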
2026-08-02 19:40:11 -04:00 · 2026-06-15 22:57:19 +02:00
parent 9ba8521e7e
commit edc61053aa
1 changed file with 166 additions and 160 deletions
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -540,166 +540,172 @@
    - filename: llama-cpp/mmproj/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B-it-mmproj.gguf
      sha256: 8e239c9c592541c9f537fff75677ea30d8af1e14ba63d27cf245423b7d0a688b
      uri: https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf/resolve/main/gemma-4-31B-it-mmproj.gguf
- name: "gemma-4-12b-it-qat-mtp"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf
-    - https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF
-  description: |
-    Gemma 4 12B IT QAT (Google DeepMind) paired with the official QAT assistant/drafter head for Multi-Token Prediction (MTP) speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from Janvitos, converted from Google's `gemma-4-12B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. With llama.cpp's `draft-mtp` speculative path enabled, this combination accelerates generation while keeping the target model's quality. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
-
-    License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), Janvitos (GGUF conversion)
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qat
-    - multimodal
-    - mtp
-  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    mmproj: llama-cpp/mmproj/gemma-4-12B-it-qat-q4_0-gguf/mmproj-gemma-4-12b-it-qat-q4_0.gguf
-    draft_model: llama-cpp/models/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
-    options:
-      - use_jinja:true
-      - spec_type:draft-mtp
-      - spec_n_max:6
-      - spec_p_min:0.75
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/gemma-4-12B-it-qat-q4_0-gguf/gemma-4-12b-it-qat-q4_0.gguf
-      repeat_penalty: 1
-      temperature: 1
-      top_k: 64
-      top_p: 0.95
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/gemma-4-12B-it-qat-q4_0-gguf/gemma-4-12b-it-qat-q4_0.gguf
-      sha256: faff1a63667fac17ac5e777f47114688fcefea96e220e211aaa8d62c2c4561f1
-      uri: https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf/resolve/main/gemma-4-12b-it-qat-q4_0.gguf
-    - filename: llama-cpp/mmproj/gemma-4-12B-it-qat-q4_0-gguf/mmproj-gemma-4-12b-it-qat-q4_0.gguf
-      sha256: e70b0e5cd80323d5d588b4ed06780356b7b1ba03995a4b8164c6ae9db0ff5989
-      uri: https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf/resolve/main/mmproj-gemma-4-12b-it-qat-q4_0.gguf
-    - filename: llama-cpp/models/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
-      sha256: 13331068b6af643c3dc75e619373b674c1f75a1958e7c82e2020d96a17c63809
-      uri: https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/resolve/main/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
- name: "gemma-4-26b-a4b-it-qat-mtp"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf
-    - https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads
-  description: |
-    Gemma 4 26B-A4B IT QAT (Google DeepMind), a multimodal Mixture-of-Experts model (26B total, ~4B active per token), paired with the QAT-matched MTP assistant/drafter head for Multi-Token Prediction speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from boxwrench, converted from Google's `gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. Using a QAT-matched head instead of a generic non-QAT head raised draft acceptance from ~57% to ~92% on this model. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
-
-    > [!Note]
-    > The assistant head uses the `gemma4_assistant` architecture. It loads on the Atomic TurboQuant llama.cpp fork and on stock llama.cpp once ggml-org/llama.cpp#23398 ("llama: add Gemma4 MTP") merges. Until the upstream `n_tokens` reshape fix lands, run with a single parallel slot.
-
-    License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), boxwrench (GGUF conversion)
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qat
-    - multimodal
-    - moe
-    - mtp
-  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    mmproj: llama-cpp/mmproj/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B-it-mmproj.gguf
-    draft_model: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
-    options:
-      - use_jinja:true
-      - spec_type:draft-mtp
-      - spec_n_max:6
-      - spec_p_min:0.75
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B_q4_0-it.gguf
-      repeat_penalty: 1
-      temperature: 1
-      top_k: 64
-      top_p: 0.95
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B_q4_0-it.gguf
-      sha256: 4c856523d61d77922dbc0b26753a6bf6208e5d69d80db0c04dcd776832d054c5
-      uri: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf/resolve/main/gemma-4-26B_q4_0-it.gguf
-    - filename: llama-cpp/mmproj/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B-it-mmproj.gguf
-      sha256: d8e2de16e17515d9061b23c9a002715f996f9e0c87b93a9354264611bfab9239
-      uri: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf/resolve/main/gemma-4-26B-it-mmproj.gguf
-    - filename: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
-      sha256: 86f156403d9148aeffa765411f1373d1a2f9c840d62f5e088701153a35ecff73
-      uri: https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads/resolve/main/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
- name: "gemma-4-31b-it-qat-mtp"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf
-    - https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads
-  description: |
-    Gemma 4 31B IT QAT (Google DeepMind), the largest dense multimodal model in the family, paired with the QAT-matched MTP assistant/drafter head for Multi-Token Prediction speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from boxwrench, converted from Google's `gemma-4-31B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. Using a QAT-matched head instead of a generic non-QAT head substantially raises draft acceptance and end-to-end throughput. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
-
-    > [!Note]
-    > The assistant head uses the `gemma4_assistant` architecture. It loads on the Atomic TurboQuant llama.cpp fork and on stock llama.cpp once ggml-org/llama.cpp#23398 ("llama: add Gemma4 MTP") merges. Until the upstream `n_tokens` reshape fix lands, run with a single parallel slot.
-
-    License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), boxwrench (GGUF conversion)
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qat
-    - multimodal
-    - mtp
-  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    mmproj: llama-cpp/mmproj/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B-it-mmproj.gguf
-    draft_model: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
-    options:
-      - use_jinja:true
-      - spec_type:draft-mtp
-      - spec_n_max:6
-      - spec_p_min:0.75
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B_q4_0-it.gguf
-      repeat_penalty: 1
-      temperature: 1
-      top_k: 64
-      top_p: 0.95
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B_q4_0-it.gguf
-      sha256: 0374ce7b0124db9ba96fc649e835c531223ee224a497ce88a374baaea10932ec
-      uri: https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf/resolve/main/gemma-4-31B_q4_0-it.gguf
-    - filename: llama-cpp/mmproj/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B-it-mmproj.gguf
-      sha256: 8e239c9c592541c9f537fff75677ea30d8af1e14ba63d27cf245423b7d0a688b
-      uri: https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf/resolve/main/gemma-4-31B-it-mmproj.gguf
-    - filename: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
-      sha256: 7a7cdd65a93536f3bf324e97ddf60cc8d482510eaa0837873aef0fd7e0b493a5
-      uri: https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads/resolve/main/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
+# Temporarily disabled: Gemma 4 QAT MTP assistant-head entries are hidden
+# until the assistant GGUFs are verified against the stock llama.cpp backend.
+# - name: "gemma-4-12b-it-qat-mtp"
+#   url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+#   urls:
+#     - https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf
+#     - https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF
+#   description: |
+#     Gemma 4 12B IT QAT (Google DeepMind) paired with the official QAT assistant/drafter head for Multi-Token Prediction (MTP) speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from Janvitos, converted from Google's `gemma-4-12B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. With llama.cpp's `draft-mtp` speculative path enabled, this combination accelerates generation while keeping the target model's quality. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
+#
+#     License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), Janvitos (GGUF conversion)
+#   license: "apache-2.0"
+#   tags:
+#     - llm
+#     - gguf
+#     - qat
+#     - multimodal
+#     - mtp
+#   icon: https://ai.google.dev/gemma/images/gemma4_banner.png
+#   overrides:
+#     backend: llama-cpp
+#     function:
+#       automatic_tool_parsing_fallback: true
+#       grammar:
+#         disable: true
+#     known_usecases:
+#       - chat
+#     mmproj: llama-cpp/mmproj/gemma-4-12B-it-qat-q4_0-gguf/mmproj-gemma-4-12b-it-qat-q4_0.gguf
+#     draft_model: llama-cpp/models/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
+#     options:
+#       - use_jinja:true
+#       - spec_type:draft-mtp
+#       - spec_n_max:6
+#       - spec_p_min:0.75
+#     parameters:
+#       min_p: 0
+#       model: llama-cpp/models/gemma-4-12B-it-qat-q4_0-gguf/gemma-4-12b-it-qat-q4_0.gguf
+#       repeat_penalty: 1
+#       temperature: 1
+#       top_k: 64
+#       top_p: 0.95
+#     template:
+#       use_tokenizer_template: true
+#   files:
+#     - filename: llama-cpp/models/gemma-4-12B-it-qat-q4_0-gguf/gemma-4-12b-it-qat-q4_0.gguf
+#       sha256: faff1a63667fac17ac5e777f47114688fcefea96e220e211aaa8d62c2c4561f1
+#       uri: https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf/resolve/main/gemma-4-12b-it-qat-q4_0.gguf
+#     - filename: llama-cpp/mmproj/gemma-4-12B-it-qat-q4_0-gguf/mmproj-gemma-4-12b-it-qat-q4_0.gguf
+#       sha256: e70b0e5cd80323d5d588b4ed06780356b7b1ba03995a4b8164c6ae9db0ff5989
+#       uri: https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf/resolve/main/mmproj-gemma-4-12b-it-qat-q4_0.gguf
+#     - filename: llama-cpp/models/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
+#       sha256: 13331068b6af643c3dc75e619373b674c1f75a1958e7c82e2020d96a17c63809
+#       uri: https://huggingface.co/Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF/resolve/main/gemma-4-12B-it-qat-assistant-MTP-Q8_0.gguf
+# Temporarily disabled: these Gemma 4 MTP assistant-head GGUFs currently fail
+# to load in stock llama.cpp with unknown architecture `gemma4_assistant`.
+# Re-enable after the published GGUFs use the upstream `gemma4-assistant`
+# architecture spelling or the backend carries a vetted compatibility fix.
+# - name: "gemma-4-26b-a4b-it-qat-mtp"
+#   url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+#   urls:
+#     - https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf
+#     - https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads
+#   description: |
+#     Gemma 4 26B-A4B IT QAT (Google DeepMind), a multimodal Mixture-of-Experts model (26B total, ~4B active per token), paired with the QAT-matched MTP assistant/drafter head for Multi-Token Prediction speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from boxwrench, converted from Google's `gemma-4-26B-A4B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. Using a QAT-matched head instead of a generic non-QAT head raised draft acceptance from ~57% to ~92% on this model. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
+#
+#     > [!Note]
+#     > The assistant head uses the early `gemma4_assistant` architecture spelling; LocalAI patches the llama.cpp backend to accept it as the upstream `gemma4-assistant` architecture. Until the upstream `n_tokens` reshape fix lands, run with a single parallel slot.
+#
+#     License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), boxwrench (GGUF conversion)
+#   license: "apache-2.0"
+#   tags:
+#     - llm
+#     - gguf
+#     - qat
+#     - multimodal
+#     - moe
+#     - mtp
+#   icon: https://ai.google.dev/gemma/images/gemma4_banner.png
+#   overrides:
+#     backend: llama-cpp
+#     function:
+#       automatic_tool_parsing_fallback: true
+#       grammar:
+#         disable: true
+#     known_usecases:
+#       - chat
+#     mmproj: llama-cpp/mmproj/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B-it-mmproj.gguf
+#     draft_model: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
+#     options:
+#       - use_jinja:true
+#       - spec_type:draft-mtp
+#       - spec_n_max:6
+#       - spec_p_min:0.75
+#     parameters:
+#       min_p: 0
+#       model: llama-cpp/models/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B_q4_0-it.gguf
+#       repeat_penalty: 1
+#       temperature: 1
+#       top_k: 64
+#       top_p: 0.95
+#     template:
+#       use_tokenizer_template: true
+#   files:
+#     - filename: llama-cpp/models/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B_q4_0-it.gguf
+#       sha256: 4c856523d61d77922dbc0b26753a6bf6208e5d69d80db0c04dcd776832d054c5
+#       uri: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf/resolve/main/gemma-4-26B_q4_0-it.gguf
+#     - filename: llama-cpp/mmproj/gemma-4-26B-A4B-it-qat-q4_0-gguf/gemma-4-26B-it-mmproj.gguf
+#       sha256: d8e2de16e17515d9061b23c9a002715f996f9e0c87b93a9354264611bfab9239
+#       uri: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf/resolve/main/gemma-4-26B-it-mmproj.gguf
+#     - filename: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
+#       sha256: 86f156403d9148aeffa765411f1373d1a2f9c840d62f5e088701153a35ecff73
+#       uri: https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads/resolve/main/gemma-4-26B-A4B-it-qat-assistant-MTP-Q8_0.gguf
+# - name: "gemma-4-31b-it-qat-mtp"
+#   url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+#   urls:
+#     - https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf
+#     - https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads
+#   description: |
+#     Gemma 4 31B IT QAT (Google DeepMind), the largest dense multimodal model in the family, paired with the QAT-matched MTP assistant/drafter head for Multi-Token Prediction speculative decoding. The Q4_0 target carries the full multimodal (text + image) model, while the Q8_0 assistant GGUF (from boxwrench, converted from Google's `gemma-4-31B-it-qat-q4_0-unquantized-assistant` checkpoint) acts as the draft model. Using a QAT-matched head instead of a generic non-QAT head substantially raises draft acceptance and end-to-end throughput. The assistant head is not a standalone chat model: it only runs paired with the target, which is why both are bundled here.
+#
+#     > [!Note]
+#     > The assistant head uses the early `gemma4_assistant` architecture spelling; LocalAI patches the llama.cpp backend to accept it as the upstream `gemma4-assistant` architecture. Until the upstream `n_tokens` reshape fix lands, run with a single parallel slot.
+#
+#     License: Apache 2.0 | Authors: Google DeepMind (target/assistant checkpoints), boxwrench (GGUF conversion)
+#   license: "apache-2.0"
+#   tags:
+#     - llm
+#     - gguf
+#     - qat
+#     - multimodal
+#     - mtp
+#   icon: https://ai.google.dev/gemma/images/gemma4_banner.png
+#   overrides:
+#     backend: llama-cpp
+#     function:
+#       automatic_tool_parsing_fallback: true
+#       grammar:
+#         disable: true
+#     known_usecases:
+#       - chat
+#     mmproj: llama-cpp/mmproj/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B-it-mmproj.gguf
+#     draft_model: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
+#     options:
+#       - use_jinja:true
+#       - spec_type:draft-mtp
+#       - spec_n_max:6
+#       - spec_p_min:0.75
+#     parameters:
+#       min_p: 0
+#       model: llama-cpp/models/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B_q4_0-it.gguf
+#       repeat_penalty: 1
+#       temperature: 1
+#       top_k: 64
+#       top_p: 0.95
+#     template:
+#       use_tokenizer_template: true
+#   files:
+#     - filename: llama-cpp/models/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B_q4_0-it.gguf
+#       sha256: 0374ce7b0124db9ba96fc649e835c531223ee224a497ce88a374baaea10932ec
+#       uri: https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf/resolve/main/gemma-4-31B_q4_0-it.gguf
+#     - filename: llama-cpp/mmproj/gemma-4-31B-it-qat-q4_0-gguf/gemma-4-31B-it-mmproj.gguf
+#       sha256: 8e239c9c592541c9f537fff75677ea30d8af1e14ba63d27cf245423b7d0a688b
+#       uri: https://huggingface.co/google/gemma-4-31B-it-qat-q4_0-gguf/resolve/main/gemma-4-31B-it-mmproj.gguf
+#     - filename: llama-cpp/models/gemma-4-qat-mtp-assistant-heads/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
+#       sha256: 7a7cdd65a93536f3bf324e97ddf60cc8d482510eaa0837873aef0fd7e0b493a5
+#       uri: https://huggingface.co/boxwrench/gemma-4-qat-mtp-assistant-heads/resolve/main/gemma-4-31B-it-qat-assistant-MTP-Q8_0.gguf
 - name: "step-3.7-flash"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls: