Add the remaining official Google Gemma 4 QAT Q4_0 GGUFs (E2B, E4B,
26B-A4B, 31B) next to the existing 12B entry, each shipping its
multimodal mmproj.

Also add three MTP (Multi-Token Prediction) speculative-decoding bundles,
each pairing one QAT target (12B, 26B-A4B, 31B) with a QAT-matched
assistant/drafter head:
- 12B <- Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF
- 26B-A4B <- boxwrench/gemma-4-qat-mtp-assistant-heads
- 31B <- boxwrench/gemma-4-qat-mtp-assistant-heads

The assistant heads use the gemma4_assistant architecture and are not
standalone chat models, so each entry bundles the target and the draft
and sets draft_model together with the draft-mtp spec options
(spec_type:draft-mtp, spec_n_max:6, spec_p_min:0.75), matching
MTPSpecOptions() in core/config/mtp.go. QAT-matched heads give a
substantially higher draft acceptance rate than generic non-QAT heads.
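
For illustration, the config of a bundled entry might look roughly like
the fragment below. This is a sketch, not the exact gallery entry: the
overall layout (parameters.model, an options list of key:value strings)
and both file names are assumptions. Only draft_model and the three spec
values come from this change.

```yaml
# Sketch only: file names are placeholders, not the real artifact names.
parameters:
  model: target-qat-q4_0.gguf          # hypothetical QAT target file
draft_model: assistant-mtp-head.gguf   # hypothetical gemma4_assistant head
options:
  - spec_type:draft-mtp   # select the MTP drafter for speculative decoding
  - spec_n_max:6          # value from this change
  - spec_p_min:0.75       # value from this change
```

The draft file is never loaded as a chat model on its own; it is only
referenced through draft_model from the target's entry.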

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>