Merge origin/master + pin-sync paged backend to 0ed235ea

master auto-bumped the stock llama-cpp pin 9d5d882d -> 0ed235ea and updated the shared grpc-server.cpp. The paged backend's pin must track the stock pin (grpc-server.cpp is shared between the two), so bump its LLAMA_VERSION to match.

All 28 paged patches apply cleanly on 0ed235ea (verified against a fresh upstream clone). The bf16-tau state-serialization fix (patch 0026) is included. Bit-exact gate and full grpc-server build verification on GPU/CI to follow.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
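The pin-sync rule above (the paged backend's LLAMA_VERSION must equal the stock pin) could be checked mechanically. The following is only a minimal sketch, not part of this commit: it assumes both pins are written as a Makefile-style `LLAMA_VERSION?=<sha>` assignment, and the function names `read_pin` and `pins_in_sync` are hypothetical.

```python
import re

# Assumption: the pin is a Makefile-style assignment such as
# "LLAMA_VERSION?=0ed235ea" (either "=" or "?=", abbreviated or full sha).
PIN_RE = re.compile(
    r"^LLAMA_VERSION\s*\??=\s*([0-9a-f]{7,40})\s*$", re.MULTILINE
)


def read_pin(makefile_text: str) -> str:
    """Return the commit sha assigned to LLAMA_VERSION in the given text."""
    match = PIN_RE.search(makefile_text)
    if match is None:
        raise ValueError("no LLAMA_VERSION assignment found")
    return match.group(1)


def pins_in_sync(stock_text: str, paged_text: str) -> bool:
    """True when both texts pin the same commit.

    The two shas are compared on their common prefix length so that an
    abbreviated sha on one side still matches a full sha on the other.
    """
    stock, paged = read_pin(stock_text), read_pin(paged_text)
    n = min(len(stock), len(paged))
    return stock[:n] == paged[:n]
```

A CI step could read the two Makefiles from disk, pass their contents to `pins_in_sync`, and fail the job when it returns `False`; the actual file locations are left out here because this commit does not name them.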
2026-07-01 11:56:57 -04:00 · 2026-06-28 07:56:47 +00:00
parent 1f3e5ba301 de2ec2f136
commit ea72a56e2c
95 changed files with 6339 additions and 487 deletions
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -801,6 +801,10 @@
  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
  overrides:
    backend: llama-cpp
+    # NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
+    # context size cannot be auto-derived; set it explicitly (the model is
+    # trained for a 262144-token context; 32768 is a safe default that operators can raise).
+    context_size: 32768
    function:
      automatic_tool_parsing_fallback: true
      grammar:
@@ -833,6 +837,9 @@
    - gguf
  overrides:
    backend: llama-cpp
+    # NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
+    # context size cannot be auto-derived; set it explicitly.
+    context_size: 32768
    function:
      automatic_tool_parsing_fallback: true
      grammar:
@@ -860,6 +867,9 @@
  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/sGQKmrMc6L6guMoaB5_Y2.png
  overrides:
    backend: llama-cpp
+    # NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
+    # context size cannot be auto-derived; set it explicitly.
+    context_size: 32768
    function:
      automatic_tool_parsing_fallback: true
      grammar:
@@ -985,6 +995,10 @@
  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
  overrides:
    backend: llama-cpp
+    # NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
+    # context size cannot be auto-derived; set it explicitly (the model is
+    # trained for a 262144-token context; 32768 is a safe default that operators can raise).
+    context_size: 32768
    function:
      automatic_tool_parsing_fallback: true
      grammar:
@@ -9343,6 +9357,248 @@
    - filename: MiniFASNetV1SE.onnx
      sha256: ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676
      uri: https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx
+- name: face-detect-buffalo-l
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF
+    detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and
+    shipped as a single GGUF for the `face-detect` backend. Highest
+    accuracy of the buffalo line.
+
+    No Python / onnxruntime / torch runtime: face-detect.cpp reads the
+    detector and embedder architecture (`facedetect.arch`) directly from
+    the GGUF metadata, so installing this entry is all that is needed to
+    select buffalo_l. Drives the Embedding / Detect / FaceVerify /
+    FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect}
+    REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof
+    ensemble, available via the FaceVerify `anti_spoof` request flag.
+    NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see
+    `face-detect-yunet-sface`.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-l.gguf
+  files:
+    - filename: face-detect-buffalo-l.gguf
+      sha256: 6ed070f6e569beeed542ddd5603bcbc9eb8ea57f728f7d8013d6a90b2b952116
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_l.gguf
+- name: face-detect-buffalo-m
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF
+    detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF
+    for the `face-detect` backend. Same recognition accuracy as
+    `buffalo_l` with a cheaper detector: a good balance on mid-range
+    hardware.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_m engine. This GGUF also
+    embeds the MiniFASNet anti-spoof ensemble, available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-m.gguf
+  files:
+    - filename: face-detect-buffalo-m.gguf
+      sha256: 0f7527eeb97b88719bf7e11e43ab8af6f05999357d767f8dde53db3c586c1c3f
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_m.gguf
+- name: face-detect-buffalo-s
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF
+    detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the
+    `face-detect` backend. Small and CPU-friendly: a good fit for
+    mid-range and edge deployments.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_s engine. This GGUF also
+    embeds the MiniFASNet anti-spoof ensemble, available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-s.gguf
+  files:
+    - filename: face-detect-buffalo-s.gguf
+      sha256: 7490b1efbc8746b188a5aef0adf5e3d1a2dc9607abd474018893f95571999969
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_s.gguf
+- name: face-detect-buffalo-sc
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_sc` pack (SCRFD-500MF
+    detector + a small ArcFace embedder), converted to a C++/ggml GGUF
+    for the `face-detect` backend. This is the smallest insightface
+    pack: the lightest option for low-resource and edge deployments.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_sc engine. If this GGUF
+    embeds the MiniFASNet anti-spoof ensemble, it is available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-sc.gguf
+  files:
+    - filename: face-detect-buffalo-sc.gguf
+      sha256: f754c0e32d5efbbc53d7efca13be2807676bf5db20a8594ef96b32afa2c482b1
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_sc.gguf
+- name: face-detect-antelopev2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `antelopev2` pack (SCRFD-10GF
+    detector + ArcFace glint360k R100, 512-d embedder), converted to a
+    C++/ggml GGUF for the `face-detect` backend. The higher-accuracy
+    insightface pack: heavier, but the best fit when recognition
+    quality matters more than speed.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the antelopev2 engine. If this GGUF
+    embeds the MiniFASNet anti-spoof ensemble, it is available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-antelopev2.gguf
+  files:
+    - filename: face-detect-antelopev2.gguf
+      sha256: 245e657e51754fbf075dd43d80a80a2d14a60c2fc42a3220f63eef17a315e96c
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/antelopev2.gguf
+- name: face-detect-yunet-sface
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/opencv/opencv_zoo
+  description: |
+    Face recognition with OpenCV Zoo weights: YuNet detector + SFace
+    128-d recognizer, converted to a C++/ggml GGUF for the `face-detect`
+    backend. APACHE 2.0: safe for commercial use. Lower accuracy than the
+    buffalo packs and no demographic head, but the commercial-friendly
+    alternative to the insightface buffalo line.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the YuNet + SFace engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - commercial-ok
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.363
+    parameters:
+      model: face-detect-yunet-sface.gguf
+  files:
+    - filename: face-detect-yunet-sface.gguf
+      sha256: 9ce78d4ba0ae9d5e8c91a0e145d511558d1d90f5d9c1f4131cca9bb4bce60902
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/yunet-sface.gguf
 - name: speechbrain-ecapa-tdnn
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
@@ -9412,6 +9668,217 @@
    - filename: wespeaker_voxceleb_resnet34.onnx
      sha256: 7bb2f06e9df17cdf1ef14ee8a15ab08ed28e8d0ef5054ee135741560df2ec068
      uri: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM/resolve/main/voxceleb_resnet34_LM.onnx
+- name: voice-detect-ecapa-tdnn
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
+  description: |
+    Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained
+    on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the
+    `voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal
+    Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe.
+
+    No Python / torch runtime: voice-detect.cpp reads the embedding
+    architecture (`voicedetect.arch`) directly from the GGUF metadata,
+    so installing this entry is all that is needed to select ECAPA-TDNN.
+    Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the
+    /v1/voice/{verify,embed,register,identify,forget} REST endpoints.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-ecapa-tdnn-voxceleb.gguf
+  files:
+    - filename: voice-detect-ecapa-tdnn-voxceleb.gguf
+      sha256: 68046a1fdfb7843f460962db4739fbd381cc5c3ab93d1505e75e2f4c0dc19b8f
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/ecapa-tdnn-voxceleb.gguf
+- name: voice-detect-wespeaker-resnet34
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://github.com/wenet-e2e/wespeaker
+  description: |
+    Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d
+    embeddings, CPU-friendly and runtime-free (no onnxruntime or torch).
+    CC-BY-4.0.
+
+    Use when you want WeSpeaker's ResNet34 topology instead of
+    ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read
+    from the GGUF metadata, so this entry alone selects the engine.
+  license: cc-by-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-wespeaker-resnet34.gguf
+  files:
+    - filename: voice-detect-wespeaker-resnet34.gguf
+      sha256: 72040372494eafec299836bc1977cfc13c603cb486674ed59b0f4c03758d29da
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/wespeaker-resnet34-voxceleb.gguf
+- name: voice-detect-eres2net
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/iic/speech_eres2net_sv_en_voxceleb_16k
+  description: |
+    Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend.
+    192-d embeddings with strong verification accuracy. APACHE 2.0.
+
+    The embedding architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the ERes2Net engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-eres2net.gguf
+  files:
+    - filename: voice-detect-eres2net.gguf
+      sha256: d39f53c7a4d39734740a86a07521b9a819ee8ea56c1a9436eba611ab733a3d06
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/eres2net-base-zh-cn.gguf
+- name: voice-detect-campplus
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/iic/speech_campplus_sv_en_voxceleb_16k
+  description: |
+    Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d
+    embeddings, a fast context-aware masking topology well-suited to
+    CPU and edge deployments. APACHE 2.0.
+
+    The embedding architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the CAM++ engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-campplus.gguf
+  files:
+    - filename: voice-detect-campplus.gguf
+      sha256: a6e34c6d230cff26e37b71a2df0907fde1de425654e28d9d5cacca32e02a13d3
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/campplus-zh-cn.gguf
+- name: voice-detect-emotion-wav2vec2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
+  description: |
+    Voice analysis (age / gender / emotion) with audEERING's wav2vec2
+    model, converted to a C++/ggml GGUF for the `voice-detect` backend.
+    Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST
+    endpoint, returning a continuous age estimate plus gender and
+    emotion class scores for a single utterance. CC-BY-NC-SA-4.0 -
+    research / non-commercial use only.
+
+    The analysis architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
+  license: cc-by-nc-sa-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - voice-analysis
+    - emotion-recognition
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    parameters:
+      model: voice-detect-emotion-wav2vec2.gguf
+  files:
+    - filename: voice-detect-emotion-wav2vec2.gguf
+      sha256: 9e9793e4f77a27f4ae068bcb29c2b6fe2f74881799e2cfea0f8e436ad3765e50
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/emotion-wav2vec2-superb-er.gguf
+- name: voice-detect-age-gender-wav2vec2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://huggingface.co/audeering/wav2vec2-large-robust-24-ft-age-gender
+    - https://github.com/mudler/voice-detect.cpp
+  description: |
+    wav2vec2-large-robust age + gender analysis head
+    (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a
+    C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze
+    gRPC rpc and the /v1/voice/analyze REST endpoint, returning a
+    continuous age estimate plus gender class scores for a single
+    utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only.
+
+    The analysis architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
+  license: cc-by-nc-sa-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - voice-analysis
+    - research-only
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    parameters:
+      model: voice-detect-age-gender-wav2vec2.gguf
+  files:
+    - filename: voice-detect-age-gender-wav2vec2.gguf
+      sha256: d92486b3f1ea7baf6a90f1026b7b8e9848b3a8332bccfb01cc8889eed7069064
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/age-gender-wav2vec2-audeering.gguf
 - name: rfdetr-base
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls: