Merge origin/master + pin-sync paged backend to 0ed235ea

master auto-bumped the stock llama-cpp pin 9d5d882d -> 0ed235ea and updated the
shared grpc-server.cpp. Because grpc-server.cpp is shared, the paged backend's
pin must track the stock pin, so bump its LLAMA_VERSION to match. All 28 paged
patches apply cleanly on 0ed235ea (verified against a fresh upstream clone). The
bf16-tau state-serialization fix (patch 0026) is included. The bit-exact gate and
a full grpc-server build verification on GPU/CI are to follow.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Ettore Di Giacinto
2026-06-28 07:56:47 +00:00
95 changed files with 6339 additions and 487 deletions

@@ -801,6 +801,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly (the model is trained
# for a 262144-token context; 32768 is a safe default that operators can raise).
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -833,6 +837,9 @@
- gguf
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -860,6 +867,9 @@
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/sGQKmrMc6L6guMoaB5_Y2.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -985,6 +995,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly (the model is trained
# for a 262144-token context; 32768 is a safe default that operators can raise).
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -9343,6 +9357,248 @@
- filename: MiniFASNetV1SE.onnx
sha256: ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676
uri: https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx
- name: face-detect-buffalo-l
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/deepinsight/insightface
description: |
Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF
detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and
shipped as a single GGUF for the `face-detect` backend. Highest
accuracy of the buffalo line.
No Python / onnxruntime / torch runtime: face-detect.cpp reads the
detector and embedder architecture (`facedetect.arch`) directly from
the GGUF metadata, so installing this entry is all that is needed to
select buffalo_l. Drives the Embedding / Detect / FaceVerify /
FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect}
REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof
ensemble, available via the FaceVerify `anti_spoof` request flag.
NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see
`face-detect-yunet-sface`.
license: insightface-non-commercial
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
tags:
- face-recognition
- face-verification
- face-embedding
- research-only
- gpu
- cpu
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.35
parameters:
model: face-detect-buffalo-l.gguf
files:
- filename: face-detect-buffalo-l.gguf
sha256: 6ed070f6e569beeed542ddd5603bcbc9eb8ea57f728f7d8013d6a90b2b952116
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_l.gguf
- name: face-detect-buffalo-m
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/deepinsight/insightface
description: |
Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF
detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF
for the `face-detect` backend. Same recognition accuracy as
`buffalo_l` with a cheaper detector: a good balance on mid-range
hardware.
The architecture (`facedetect.arch`) is read from the GGUF metadata,
so this entry alone selects the buffalo_m engine. This GGUF also
embeds the MiniFASNet anti-spoof ensemble, available via the
FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
ONLY.
license: insightface-non-commercial
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
tags:
- face-recognition
- face-verification
- face-embedding
- research-only
- gpu
- cpu
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.35
parameters:
model: face-detect-buffalo-m.gguf
files:
- filename: face-detect-buffalo-m.gguf
sha256: 0f7527eeb97b88719bf7e11e43ab8af6f05999357d767f8dde53db3c586c1c3f
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_m.gguf
- name: face-detect-buffalo-s
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/deepinsight/insightface
description: |
Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF
detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the
`face-detect` backend. Small and CPU-friendly: a good fit for
mid-range and edge deployments.
The architecture (`facedetect.arch`) is read from the GGUF metadata,
so this entry alone selects the buffalo_s engine. This GGUF also
embeds the MiniFASNet anti-spoof ensemble, available via the
FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
ONLY.
license: insightface-non-commercial
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
tags:
- face-recognition
- face-verification
- face-embedding
- research-only
- edge
- cpu
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.35
parameters:
model: face-detect-buffalo-s.gguf
files:
- filename: face-detect-buffalo-s.gguf
sha256: 7490b1efbc8746b188a5aef0adf5e3d1a2dc9607abd474018893f95571999969
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_s.gguf
- name: face-detect-buffalo-sc
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/deepinsight/insightface
description: |
Face recognition with insightface's `buffalo_sc` pack (SCRFD-500MF
detector + a small ArcFace embedder), converted to a C++/ggml GGUF
for the `face-detect` backend. This is the smallest insightface
pack: the lightest option for low-resource and edge deployments.
The architecture (`facedetect.arch`) is read from the GGUF metadata,
so this entry alone selects the buffalo_sc engine. If this GGUF
embeds the MiniFASNet anti-spoof ensemble, it is available via the
FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
ONLY.
license: insightface-non-commercial
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
tags:
- face-recognition
- face-verification
- face-embedding
- research-only
- edge
- cpu
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.35
parameters:
model: face-detect-buffalo-sc.gguf
files:
- filename: face-detect-buffalo-sc.gguf
sha256: f754c0e32d5efbbc53d7efca13be2807676bf5db20a8594ef96b32afa2c482b1
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_sc.gguf
- name: face-detect-antelopev2
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/deepinsight/insightface
description: |
Face recognition with insightface's `antelopev2` pack (SCRFD-10GF
detector + ArcFace glint360k R100, 512-d embedder), converted to a
C++/ggml GGUF for the `face-detect` backend. The higher-accuracy
insightface pack: heavier, but the best fit when recognition
quality matters more than speed.
The architecture (`facedetect.arch`) is read from the GGUF metadata,
so this entry alone selects the antelopev2 engine. If this GGUF
embeds the MiniFASNet anti-spoof ensemble, it is available via the
FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
ONLY.
license: insightface-non-commercial
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
tags:
- face-recognition
- face-verification
- face-embedding
- research-only
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.35
parameters:
model: face-detect-antelopev2.gguf
files:
- filename: face-detect-antelopev2.gguf
sha256: 245e657e51754fbf075dd43d80a80a2d14a60c2fc42a3220f63eef17a315e96c
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/antelopev2.gguf
- name: face-detect-yunet-sface
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/face-detect.cpp
- https://github.com/opencv/opencv_zoo
description: |
Face recognition with OpenCV Zoo weights: YuNet detector + SFace
128-d recognizer, converted to a C++/ggml GGUF for the `face-detect`
backend. APACHE 2.0: safe for commercial use. Lower accuracy than the
buffalo packs and no demographic head, but the commercial-friendly
alternative to the insightface buffalo line.
The architecture (`facedetect.arch`) is read from the GGUF metadata,
so this entry alone selects the YuNet + SFace engine.
license: apache-2.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- face-recognition
- face-verification
- face-embedding
- commercial-ok
- gpu
- cpu
last_checked: "2026-06-22"
overrides:
backend: face-detect
known_usecases:
- face_recognition
- detection
- embeddings
options:
- verify_threshold:0.363
parameters:
model: face-detect-yunet-sface.gguf
files:
- filename: face-detect-yunet-sface.gguf
sha256: 9ce78d4ba0ae9d5e8c91a0e145d511558d1d90f5d9c1f4131cca9bb4bce60902
uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/yunet-sface.gguf
- name: speechbrain-ecapa-tdnn
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
@@ -9412,6 +9668,217 @@
- filename: wespeaker_voxceleb_resnet34.onnx
sha256: 7bb2f06e9df17cdf1ef14ee8a15ab08ed28e8d0ef5054ee135741560df2ec068
uri: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM/resolve/main/voxceleb_resnet34_LM.onnx
- name: voice-detect-ecapa-tdnn
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/voice-detect.cpp
- https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
description: |
Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained
on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the
`voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal
Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe.
No Python / torch runtime: voice-detect.cpp reads the embedding
architecture (`voicedetect.arch`) directly from the GGUF metadata,
so installing this entry is all that is needed to select ECAPA-TDNN.
Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the
/v1/voice/{verify,embed,register,identify,forget} REST endpoints.
license: apache-2.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- speaker-verification
- speaker-embedding
- commercial-ok
- cpu
- gpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
options:
- verify_threshold:0.25
parameters:
model: voice-detect-ecapa-tdnn-voxceleb.gguf
files:
- filename: voice-detect-ecapa-tdnn-voxceleb.gguf
sha256: 68046a1fdfb7843f460962db4739fbd381cc5c3ab93d1505e75e2f4c0dc19b8f
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/ecapa-tdnn-voxceleb.gguf
- name: voice-detect-wespeaker-resnet34
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/voice-detect.cpp
- https://github.com/wenet-e2e/wespeaker
description: |
Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb,
converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d
embeddings, CPU-friendly and runtime-free (no onnxruntime or torch).
CC-BY-4.0.
Use when you want WeSpeaker's ResNet34 topology instead of
ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read
from the GGUF metadata, so this entry alone selects the engine.
license: cc-by-4.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- speaker-verification
- speaker-embedding
- commercial-ok
- edge
- cpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
options:
- verify_threshold:0.25
parameters:
model: voice-detect-wespeaker-resnet34.gguf
files:
- filename: voice-detect-wespeaker-resnet34.gguf
sha256: 72040372494eafec299836bc1977cfc13c603cb486674ed59b0f4c03758d29da
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/wespeaker-resnet34-voxceleb.gguf
- name: voice-detect-eres2net
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/voice-detect.cpp
- https://huggingface.co/iic/speech_eres2net_sv_en_voxceleb_16k
description: |
Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb,
converted to a C++/ggml GGUF for the `voice-detect` backend.
192-d embeddings with strong verification accuracy. APACHE 2.0.
The embedding architecture (`voicedetect.arch`) is read from the
GGUF metadata, so this entry alone selects the ERes2Net engine.
license: apache-2.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- speaker-verification
- speaker-embedding
- commercial-ok
- cpu
- gpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
options:
- verify_threshold:0.25
parameters:
model: voice-detect-eres2net.gguf
files:
- filename: voice-detect-eres2net.gguf
sha256: d39f53c7a4d39734740a86a07521b9a819ee8ea56c1a9436eba611ab733a3d06
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/eres2net-base-zh-cn.gguf
- name: voice-detect-campplus
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/voice-detect.cpp
- https://huggingface.co/iic/speech_campplus_sv_en_voxceleb_16k
description: |
Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb,
converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d
embeddings, a fast context-aware masking topology well-suited to
CPU and edge deployments. APACHE 2.0.
The embedding architecture (`voicedetect.arch`) is read from the
GGUF metadata, so this entry alone selects the CAM++ engine.
license: apache-2.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- speaker-verification
- speaker-embedding
- commercial-ok
- edge
- cpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
options:
- verify_threshold:0.25
parameters:
model: voice-detect-campplus.gguf
files:
- filename: voice-detect-campplus.gguf
sha256: a6e34c6d230cff26e37b71a2df0907fde1de425654e28d9d5cacca32e02a13d3
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/campplus-zh-cn.gguf
- name: voice-detect-emotion-wav2vec2
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://github.com/mudler/voice-detect.cpp
- https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
description: |
Voice analysis (age / gender / emotion) with audEERING's wav2vec2
model, converted to a C++/ggml GGUF for the `voice-detect` backend.
Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST
endpoint, returning a continuous age estimate plus gender and
emotion class scores for a single utterance. CC-BY-NC-SA-4.0 -
research / non-commercial use only.
The analysis architecture (`voicedetect.arch`) is read from the
GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
license: cc-by-nc-sa-4.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- voice-analysis
- emotion-recognition
- cpu
- gpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
parameters:
model: voice-detect-emotion-wav2vec2.gguf
files:
- filename: voice-detect-emotion-wav2vec2.gguf
sha256: 9e9793e4f77a27f4ae068bcb29c2b6fe2f74881799e2cfea0f8e436ad3765e50
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/emotion-wav2vec2-superb-er.gguf
- name: voice-detect-age-gender-wav2vec2
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://huggingface.co/audeering/wav2vec2-large-robust-24-ft-age-gender
- https://github.com/mudler/voice-detect.cpp
description: |
wav2vec2-large-robust age + gender analysis head
(audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a
C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze
gRPC rpc and the /v1/voice/analyze REST endpoint, returning a
continuous age estimate plus gender class scores for a single
utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only.
The analysis architecture (`voicedetect.arch`) is read from the
GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
license: cc-by-nc-sa-4.0
icon: https://avatars.githubusercontent.com/u/95302084
tags:
- voice-recognition
- voice-analysis
- research-only
- cpu
- gpu
last_checked: "2026-06-22"
overrides:
backend: voice-detect
known_usecases:
- speaker_recognition
parameters:
model: voice-detect-age-gender-wav2vec2.gguf
files:
- filename: voice-detect-age-gender-wav2vec2.gguf
sha256: d92486b3f1ea7baf6a90f1026b7b8e9848b3a8332bccfb01cc8889eed7069064
uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/age-gender-wav2vec2-audeering.gguf
- name: rfdetr-base
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls: