mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-11 02:07:27 -04:00
Add the remaining official Google Gemma 4 QAT Q4_0 GGUFs (E2B, E4B, 26B-A4B, 31B) next to the existing 12B entry, each shipping its multimodal mmproj.

Also add three MTP (Multi-Token Prediction) speculative-decoding bundles that pair each QAT target with a QAT-matched assistant/drafter head:

- 12B <- Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF
- 26B-A4B <- boxwrench/gemma-4-qat-mtp-assistant-heads
- 31B <- boxwrench/gemma-4-qat-mtp-assistant-heads

The assistant heads use the gemma4_assistant architecture and are not standalone chat models, so each entry bundles the target + draft and sets draft_model together with the draft-mtp spec options (spec_type:draft-mtp / spec_n_max:6 / spec_p_min:0.75), matching MTPSpecOptions() in core/config/mtp.go. QAT-matched heads raise draft acceptance substantially over generic non-QAT heads.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>