From 6dd8a3d8953496970d2b92be3dc28f077787c225 Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto <mudler@localai.io>
Date: Fri, 26 Jun 2026 21:31:16 +0000
Subject: [PATCH] docs(gallery): NVFP4 GGUFs published to mudler/ - update
 header note

The dense and MoE base NVFP4 GGUFs are live (huggingface.co/mudler/Qwen3.6-27B-NVFP4-GGUF
and .../Qwen3.6-35B-A3B-NVFP4-GGUF). The sha256 values were verified against the Hub LFS
hashes, and the URIs resolve. This replaces the placeholder/not-yet-published TODO.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---
 gallery/index.yaml | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/gallery/index.yaml b/gallery/index.yaml
index c9a9421b7..ec637ebdc 100644
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -4,14 +4,15 @@
 # These reproduce the GB10 / DGX Spark benchmark serving config (see
 # backend/cpp/llama-cpp/patches/paged/LOCALAI_LLAMACPP_BACKEND_PLAN.md section 2).
 #
-# TODO(GGUF publish): the two HF repos below are PLACEHOLDERS under the `mudler`
-# org and are not yet published. Until then these entries will not resolve. After
-# uploading each .gguf, add its `sha256:` (sha256sum) to the matching `files:`
-# entry so LocalAI verifies it on download.
+# PUBLISHED: the dense + MoE base NVFP4 GGUFs are live at huggingface.co/mudler/
+# Qwen3.6-27B-NVFP4-GGUF and .../Qwen3.6-35B-A3B-NVFP4-GGUF (file_type MOSTLY_NVFP4);
+# the sha256 values below were verified against the Hub LFS hashes and the URIs resolve (HTTP 200).
+# Converted from the unsloth/nvidia NVFP4 sources via llama.cpp --outtype auto.
 #
-# TODO(NVFP4 read gating): NVFP4 GGUF tensor types require a llama.cpp new enough
-# to read them. Confirm the paged backend's pinned LLAMA_VERSION supports NVFP4
-# on a GPU box before relying on these (plan section 3.4 / 4 blocker #1).
+# NOTE(NVFP4 read): the paged backend (pinned llama.cpp 9d5d882d) reads NVFP4 GGUF
+# (the GB10 benchmark + the pin-sync md5 gate both ran NVFP4 GGUFs). These gallery
+# GGUFs were re-quantized with a newer converter (origin/master), preserving the same
+# MOSTLY_NVFP4 format; a load check on the paged backend GPU build is the final gate.
 #
 # NOTE(ssm_bf16_tau): Qwen3.5 gated-DeltaNet (hybrid SSM) models can opt into the
 # reduced-precision hybrid SSM-state fast mode by adding `ssm_bf16_tau:<tokens>`