feat(backends): add voice-detect + face-detect ggml backends (replace Python insightface/speaker-recognition) (#10441)

* feat(voice-detect): add Go purego backend for voice-detect.cpp

Add backend/go/voice-detect implementing the Backend gRPC voice subset (VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego, mirroring the parakeet-cpp / omnivoice-cpp backends. The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and float-vector returns are owned by Go and released through the matching capi free functions, with the per-ctx last error surfaced into Go errors. Calls are serialized via base.SingleThread since the C context is not reentrant.

Proto field mapping:
- VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model.
- VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the verify_threshold option, default 0.25) -> verify_paths -> verified/distance/threshold/confidence/model/processing_time_ms.
- VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion document maps to a single VoiceAnalysis segment (start/end 0; gender "label" -> dominant_gender with the remaining float scores as the gender map; emotion label/scores -> dominant_emotion/emotion).

The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(voice-detect): wire backend into index, gallery and build

Register the voice-detect.cpp speaker-recognition + voice-analysis backend (added in Voice-INT-A) into LocalAI's distribution surfaces, mirroring the ced backend (the closest mudler C++/ggml audio analogue):
- backend/index.yaml: add the &voicedetect meta-backend (capabilities platform map, no top-level uri) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the -development variants). Referential integrity audited - every alias target resolves.
- gallery/index.yaml: add 5 model entries on backend voice-detect - ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the wav2vec2 age/gender/emotion analyze model. The engine architecture is read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are not yet published: each files: entry points at the intended mudler/voice-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes).
- .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring ced.
- .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via VOICEDETECT_VERSION (pin 47546430, = 4754643).
- core/config/backend_capabilities.go: register voice-detect in the backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze -> speaker_recognition), mirroring speaker-recognition.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): add purego Go backend for face-detect.cpp

Add the LocalAI Go backend that dlopens libfacedetect.so (the flat facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect backend. Implements the Face subset of the Backend gRPC service:
- Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path -> L2-normalized ArcFace embedding.
- Detect(DetectOptions): src -> detect_path_json -> Detection boxes (class_name "face", [x1,y1,x2,y2] -> x/y/w/h).
- FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof -> verify_paths; best-effort img areas via detect.
- FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face age + gender ("M"/"F" normalized to "Man"/"Woman").

The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib with ggml + vendored libjpeg-turbo static (PIC), so the .so is ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_ symbols). Gated Ginkgo e2e mirrors voice-detect.

Note for the gallery-wiring task: backend registration (index.yaml, gallery, core/config/backend_capabilities.go) is intentionally not touched here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(voice-detect): replace em dashes in net-new descriptions

Project style forbids em/en dashes. Replace the three U+2014 chars introduced by the voice-detect gallery/index wiring with `-`/`:`.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): wire backend into index, gallery and build

Register the face-detect.cpp face detection / embedding / verification / analysis backend (added in Face-INT-A) into LocalAI's distribution surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml recognition analogue):
- backend/index.yaml: add the &facedetect meta-backend (capabilities platform map, no top-level uri to avoid the meta-backend gotcha) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants), 22 entries. Referential integrity audited: every alias target resolves.
- gallery/index.yaml: add 4 model entries on backend face-detect - face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL) and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the commercial-friendly alternative). The detector/embedder architecture is read from GGUF metadata (facedetect.arch) at load; only the real verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF artifacts are not yet published: each files: entry points at the intended mudler/face-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes).
- core/config/backend_capabilities.go: register face-detect in the backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze -> face_recognition), mirroring insightface.
- .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring voice-detect.
- .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via FACEDETECT_VERSION (pin 636a1963).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(recon): voice-detect metal build branch + face-detect gallery usecases

Add the missing metal BUILD_TYPE branch to the voice-detect Makefile forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the darwin metal CI artifact is built with the Metal backend instead of CPU-only.

Expand the 4 face-detect gallery models' known_usecases to [face_recognition, detection, embeddings] to match the backend capabilities map and the mirrored insightface-buffalo entries, so auto-selection for /v1/detect and /embeddings works.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(recon): document voice-detect and face-detect ggml backends

Document the new standalone C++/ggml biometric backends as the recommended/default option for face and voice recognition, keeping the existing Python insightface / speaker-recognition backends framed as the legacy path.
- features/face-recognition.md: add a face-detect (ggml) backend section with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface Apache-2.0), licensing, and verify/detect/analyze quickstart.
- features/voice-recognition.md: add a voice-detect (ggml) backend section with the gallery entries (ecapa-tdnn, wespeaker-resnet34, eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial analyze head) and quickstart.
- reference/compatibility-table.md: add face-detect.cpp and voice-detect.cpp rows to the Vision, Detection & Recognition table.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(gallery): publish recon backend GGUF uris + sha256

Fill in the published HuggingFace GGUF uris and verified sha256 for the 9 recon gallery entries (voice-detect-* and face-detect-*), and remove the TODO publish markers. Correct the eres2net, campplus, and emotion-wav2vec2 uris to the actual published filenames.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model

Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now embedded and re-uploaded under the same filenames/uris) and note the FaceVerify anti_spoof request flag in each description. Add a new voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion model.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): add face-detect-buffalo-sc and antelopev2 packs

Add gallery entries for two newly-published insightface face packs on the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k R100, 512-d). Both are non-commercial research-only.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): honor LocalAI per-model threads in voice/face-detect backends

LocalAI spawns one backend process per model and serves requests concurrently, so the engines' own min(hardware_concurrency, 8) default can oversubscribe cores. Forward the per-model Threads value from the gRPC LoadModel options into the engine via VOICEDETECT_THREADS / FACEDETECT_THREADS (read at backend construction) before the capi load. A non-positive Threads is treated as unset, leaving the engine default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to CPU-optimized engine commits

voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads). Brings the CPU optimizations into the LocalAI backend builds. GGUF format and parity unchanged, so the published HF GGUFs remain valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-2 CPU-optimized engines

voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context, wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp -> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-3 Winograd engines

voice-detect.cpp -> 45122ec (Winograd F(2x2,3x3) for WeSpeaker/ERes2Net 3x3 convs, -22%/-20% @8t); face-detect.cpp -> cd5c962 (Winograd F(4x4,3x3) for SCRFD large maps, -22% @1t on top of F(2x2), more load-stable). Parity held (cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-4 Winograd engines (CPU opt complete)

voice-detect.cpp -> d2839ca (CAM++ FCM 2D convs through Winograd, -15.5%/-10.3%); face-detect.cpp -> c1db23d (AVX2-vectorized Winograd tile transforms, SCRFD detect -14%/-9.6%). Final CPU optimization round; the conv-kernel lever class is now exhausted (parity held cosine=1.0; GGUF/parity unchanged, HF GGUFs valid).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump face-detect pin to deep-kernel engine (7ae5c4d)

face-detect.cpp -> 7ae5c4d: register-blocked winograd-domain GEMM microkernel (2.8x isolated GFLOP/s), AVX-512 zmm evolution behind runtime CPUID dispatch (ship-safe, AVX2 fallback bit-identical), bias/relu fused into the winograd output transform, and SFace Conv+BN fold + bias/PReLU fusion. SCRFD detect ~1.4x faster end-to-end vs the round-4 baseline; parity bit-exact; portable single binary (function-multiversioned, no global -mavx512f).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ECAPA operand-order win (e9c56ae)

voice-detect.cpp -> e9c56ae: weight-as-src0 mul_mat order in ECAPA's F32 conv1d_same (routes through tinyBLAS sgemm); ECAPA embed 1.67x @1t / ~1.3x @8t, parity cosine=1.0.
Isolated to encoder.cpp (ECAPA-only); ERes2Net/CAM++/WeSpeaker do not call conv1d_same so are provably unaffected.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to FMA-throughput engines (voice f7b9f89, face 2d2d5f0)

face -> 2d2d5f0: route ArcFace 3x3 body convs through the AVX-512 winograd microkernel (kWinoMinSize 80->14); ArcFace 1.62x @1t, SCRFD detect to 0.966 of MLAS @1t, no regression.
voice -> f7b9f89: runtime-CPUID-dispatched AVX-512 winograd-GEMM microkernel (ship-safe, AVX2 fallback bit-identical); WeSpeaker 1.90x @1t.
Parity cosine=1.0 throughout; portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to MLAS-class direct-conv engines (voice 7ecfd07, face be22d67)

Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s, within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback, fused bias/relu.
voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%) + ERes2Net, CAM++ pinned to Winograd.
face be22d67: shape-gated to the ArcFace recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression).
Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice pin to Phase-A blocked backbone (f4e7eef)

WeSpeaker ResNet34 runs as one nChw16c blocked island (2 reorders/forward vs ~60) on AVX-512, default; per-conv directconv fallback on AVX2. +2.9% @1t / +17-19% @8t vs per-conv directconv, parity cosine=1.0. The conv microkernel is already FMA-bound near peak (~0.86-0.98x MLAS-implied); residual to MLAS is sub-peak edge + non-conv tail, documented in docs/cpu-optimization.md.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to breadth blocked-backbone (voice 7f66871, face d80092b)

voice 7f66871: AVX2-vectorized (ymm) blocked island - AVX2-only hosts now run the blocked backbone for WeSpeaker (2.3x over per-conv-AVX2, cosine=1.0); ERes2Net stays per-conv (blocked regresses, opt-in only); CAM++ Winograd-pinned.
face d80092b: ArcFace recognizer blocked island, AVX-512 default (-13% @8t, ~0.90x MLAS, the closest conv result), auto per-conv on AVX2; SCRFD untouched on Winograd (0 island invocations during detect).
Parity cosine=1.0 / detect <=1px throughout.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to small-spatial + stem conv kernels (voice 99b1804, face 47fdab6)

Measured-gap-driven conv kernels: small-spatial (fill the register tile when output width <= tile width) + small-IC stem + strided-1x1/downsample recovery. ArcFace recognizer 0.57 -> 0.70x MLAS @1t (the closest conv model), WeSpeaker 0.65 -> 0.79x @1t. Parity cosine=1.0 / detect <=1px. The OC-block-sharing lever was a measured dead-end (deep stride-1 is L3-weight-bandwidth bound, not read-port bound) and was NOT shipped. Kernel ceiling reached; further gap needs an algorithm-class change (cache-blocked weight-stationary GEMM, or q8 weights).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to GPU persistent-graph + multi-model-safe cache (voice 45d2e6b, face 0a4799a)

GPU wins (CUDA/ggml backend, no CPU-path change): persistent per-shape graph+context cache in Backend::compute() eliminates the per-call cudaGraph re-instantiation churn -> wav2vec2 emotion+age-gender now AT GPU parity with torch-cuDNN on GB10 (0.97-0.98x), CAM++ -5.7ms; bit-identical parity.
Cache hardened multi-model-safe (invalidate-on-free keyed by the ModelLoader weights buffer) so LocalAI multi-model hosting cannot stale-hit. Conv models still trail cuDNN (im2col-materialization-bound) - cuDNN implicit-GEMM lever next.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to cuDNN-conv-capable engines (voice b6e4356, face 6107a24)

Adds the opt-in cuDNN implicit-GEMM conv path (VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN, DEFAULT OFF -> zero build/runtime dep until enabled). On GPU it kills the im2col-materialization bottleneck and reaches torch-cuDNN parity on the spill-bound convs: SCRFD detect 14.8->6.4ms (2.3x, ~parity), WeSpeaker ~parity, ERes2Net beats torch (1.10x); ArcFace/CAM++ neutral (no spill). Parity exact (SCRFD <=1px, cosine=1.0). To USE it in LocalAI, the CUDA backend build must enable the flag AND bundle libcudnn - deferred until a cuDNN-bundled GPU image; flag stays OFF here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): enable cuDNN conv path on arm64+CUDA13 recon backends

The voice-detect.cpp / face-detect.cpp engines have an opt-in cuDNN implicit-GEMM conv path behind VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN (default OFF) that kills im2col on the GPU and reaches torch-cuDNN parity (SCRFD 2.3x, WeSpeaker/ERes2Net parity), measured on the GB10 (arm64, CUDA 13, sm_121a).

Enable it for the CUDA build, but only where cuDNN actually ships: the arm64 + CUDA 13 image (GB10/Jetson/L4T). x86 CUDA images carry no cuDNN, so flipping it on globally for BUILD_TYPE=cublas would be a link failure. The Makefiles gate on CUDA_MAJOR_VERSION=13 + arch (TARGETARCH from the matrix/Docker build, uname -m fallback for local builds).
backend/Dockerfile.golang already installs the runtime libcudnn9-cuda-13 in the arm64+CUDA13 apt block; add the matching libcudnn9-dev-cuda-13 so the build-time link resolves.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ERes2Net blocked-default (30beecd)

Defaults VD_ERES2NET_BLOCKED ON: routes the ERes2Net Res2Net body through the blocked nChw16c AVX-512 directconv island instead of the 1x1 mul_mat fast path (CONT-transpose + skinny low-K GEMM). On the shipped GGML_NATIVE=OFF build (ggml mul_mat is AVX2-only) this wins ~2x at every thread count (2.07x@1t, 2.2x@4t, 2.05x@8t); pure-AVX2 fallback still 1.3-1.62x. Parity exact (cosine=1.000000 vs golden), so registered voices + verify/identify thresholds are unaffected. The prior default-OFF rested on a stale comment whose 23pct regression only held on the non-shipping GGML_NATIVE=ON build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(readme): announce native voice-detect + face-detect backends in Latest News

Add a Latest News entry for the new from-scratch C++/ggml biometric backends (voice-detect.cpp + face-detect.cpp) that replace the Python insightface and speaker-recognition backends: no Python/onnxruntime at inference, self-contained GGUF, bit-exact parity, GPU cuDNN parity. Mirrors the parakeet.cpp / locate-anything.cpp native-backend news entries. Refs PR #10441.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): re-pin to the squashed engine release commits

The voice-detect.cpp and face-detect.cpp histories were squashed to a single release commit, which orphaned the previous pins (voice 30beecd, face 6107a24). Re-pin to the new single-commit SHAs (voice 3d51077, face 06914b0); the tree is identical, so the backend build is unchanged.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
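
Reviewer sketch (illustrative only, not part of the committed code): the message describes two small mappings in the face-detect backend, the [x1,y1,x2,y2] -> x/y/w/h box conversion in Detect and the "threshold <= 0 falls back to the configured verify_threshold" rule in FaceVerify. A minimal Go sketch of both is below. The `box` type and `resolveThreshold` helper are hypothetical names chosen for illustration; the backend's own helper bodies are not shown in this excerpt and may differ.

```go
package main

import "fmt"

// box is a hypothetical stand-in for one face entry of the engine's detect
// JSON, holding pixel corner coordinates.
type box struct{ X1, Y1, X2, Y2 float32 }

// xywh converts corner coordinates into a top-left origin plus width/height,
// the shape the Detection proto message carries.
func (b box) xywh() (x, y, w, h float32) {
	return b.X1, b.Y1, b.X2 - b.X1, b.Y2 - b.Y1
}

// resolveThreshold returns the per-request threshold when it is positive,
// otherwise the model-configured fallback (assumed to be already parsed from
// the verify_threshold option).
func resolveThreshold(req, fallback float32) float32 {
	if req > 0 {
		return req
	}
	return fallback
}

func main() {
	x, y, w, h := box{X1: 10, Y1: 20, X2: 110, Y2: 70}.xywh()
	fmt.Println(x, y, w, h)

	// A zero request threshold selects the configured default.
	fmt.Println(resolveThreshold(0, 0.35), resolveThreshold(0.5, 0.35))
}
```

Treating a non-positive request threshold as "unset" keeps the proto's zero value usable as a default, which is the same convention the message describes for the per-model Threads option.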
2026-07-01 11:56:57 -04:00 · 2026-06-28 09:29:08 +02:00
parent d3a26f961d
commit de2ec2f136
28 changed files with 3002 additions and 16 deletions
--- a/backend/Dockerfile.golang
+++ b/backend/Dockerfile.golang
@@ -137,7 +137,7 @@ RUN <<EOT bash
            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
            apt-get install -y --no-install-recommends \
-            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} libcudnn9-dev-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
        fi
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
--- a/backend/go/face-detect/.gitignore
+++ b/backend/go/face-detect/.gitignore
@@ -0,0 +1,18 @@
+# Fetched upstream sources
+sources/
+
+# CMake build directories
+build*/
+
+# build artifacts staged in-tree by the Makefile (cp from sources/) or
+# symlinked for local dev; the real sources live in face-detect.cpp upstream.
+*.so
+*.so.*
+facedetect_capi.h
+compile_commands.json
+
+# Compiled backend binary
+face-detect-grpc
+
+# Packaging output
+package/
--- a/backend/go/face-detect/Makefile
+++ b/backend/go/face-detect/Makefile
@@ -0,0 +1,110 @@
+# face-detect backend Makefile.
+#
+# Upstream pin lives below as FACEDETECT_VERSION?=06914b0... (.github/bump_deps.sh
+# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
+# convention).
+#
+# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
+# symlink the .so + header into this directory and skip the clone/cmake steps:
+#
+#   ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
+#   ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
+#   go build -o face-detect-grpc .
+#
+# The default target below does the proper clone-at-pin + cmake build so CI does
+# not need a side-checkout.
+
+FACEDETECT_VERSION?=06914b077d52f90d5421299138e7be6bdd06b5e8
+FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
+
+GOCMD?=go
+GO_TAGS?=
+JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
+
+BUILD_TYPE?=
+NATIVE?=false
+
+# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
+# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
+RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))
+
+# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
+# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
+# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
+# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
+# side, so only the facedetect_capi_* surface is exported.
+CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
+
+ifeq ($(NATIVE),false)
+	CMAKE_ARGS+=-DGGML_NATIVE=OFF
+endif
+
+# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
+# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
+# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
+# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
+ifeq ($(BUILD_TYPE),cublas)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
+	# Opt-in cuDNN implicit-GEMM conv path (kills im2col on GPU, SCRFD 2.3x
+	# vs torch-cuDNN parity). Only the arm64 + CUDA 13 image (GB10/Jetson/L4T)
+	# ships libcudnn9 + the -dev headers, so gate cuDNN to that variant.
+	# x86 CUDA images carry no cuDNN -> enabling it there is a link failure.
+	ifeq ($(CUDA_MAJOR_VERSION),13)
+	ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
+		CMAKE_ARGS+=-DFACEDETECT_GGML_CUDNN=ON
+	endif
+	endif
+else ifeq ($(BUILD_TYPE),openblas)
+	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+else ifeq ($(BUILD_TYPE),hipblas)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
+else ifeq ($(BUILD_TYPE),vulkan)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
+else ifeq ($(BUILD_TYPE),metal)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
+endif
+
+.PHONY: face-detect-grpc package build clean purge test all
+
+all: face-detect-grpc
+
+# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
+# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
+# bump, run 'make purge && make' to refetch.
+sources/face-detect.cpp:
+	mkdir -p sources/face-detect.cpp
+	cd sources/face-detect.cpp && \
+	git init -q && \
+	git remote add origin $(FACEDETECT_REPO) && \
+	git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
+	git checkout FETCH_HEAD && \
+	git submodule update --init --recursive --depth 1 --single-branch
+
+# Build the shared lib + header out-of-tree, then stage them next to the Go
+# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
+# them up.
+libfacedetect.so: sources/face-detect.cpp
+	cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
+	cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
+	cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
+	cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./
+
+face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
+	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .
+
+package: face-detect-grpc
+	bash package.sh
+
+build: package
+
+# Test target. The embed/detect/verify/analyze smoke specs are gated on
+# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
+# heavy specs auto-skip and only the pure-Go parsing specs run.
+test:
+	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
+
+clean: purge
+	rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc
+
+purge:
+	rm -rf sources/face-detect.cpp
--- a/backend/go/face-detect/gofacedetect.go
+++ b/backend/go/face-detect/gofacedetect.go
@@ -0,0 +1,431 @@
+package main
+
+import (
+	"encoding/base64"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"math"
+	"os"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"time"
+	"unsafe"
+
+	"github.com/mudler/LocalAI/pkg/grpc/base"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/xlog"
+)
+
+// purego-bound entry points from libfacedetect.so. Names match
+// facedetect_capi.h exactly so a `nm libfacedetect.so | grep facedetect_capi`
+// is enough to spot drift.
+//
+// The opaque ctx and the malloc'd char*/float* return values are declared as
+// uintptr so we get the raw pointer back and can release it via the matching
+// capi free function. purego's native string/[]float32 returns would copy and
+// forget the original pointer, leaking the C-owned buffer on every call.
+var (
+	CppAbiVersion  func() int32
+	CppLoad        func(ggufPath string) uintptr
+	CppFree        func(ctx uintptr)
+	CppLastError   func(ctx uintptr) string
+	CppFreeString  func(s uintptr)
+	CppFreeVec     func(v uintptr)
+	CppEmbedPath   func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
+	CppEmbedRGB    func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
+	CppDetectJSON  func(ctx uintptr, imagePath string) uintptr
+	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
+	CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
+)
+
+// FaceDetect implements the face-recognition (biometric) subset of the Backend
+// gRPC service over libfacedetect.so. The C side keeps a single loaded model
+// pack plus a per-ctx last-error buffer and is not reentrant, so
+// base.SingleThread serializes every call.
+type FaceDetect struct {
+	base.SingleThread
+	opts   loadOptions
+	ctxPtr uintptr
+}
+
+func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
+	model := opts.ModelFile
+	if model == "" {
+		model = opts.ModelPath
+	}
+	if !filepath.IsAbs(model) && opts.ModelPath != "" {
+		model = filepath.Join(opts.ModelPath, model)
+	}
+	if model == "" {
+		return errors.New("face-detect: ModelFile is required")
+	}
+
+	f.opts = parseOptions(opts.Options)
+	if f.opts.modelName == "" {
+		f.opts.modelName = filepath.Base(model)
+	}
+
+	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
+	// one backend process per model and serves requests concurrently, so the
+	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
+	// FACEDETECT_THREADS is read by the engine at backend construction, so it
+	// must be set before the capi load. A non-positive Threads means "unset":
+	// leave the env alone so the engine keeps its sane default.
+	threads := opts.Threads
+	if threads > 0 {
+		if err := os.Setenv("FACEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
+			return fmt.Errorf("face-detect: set FACEDETECT_THREADS: %w", err)
+		}
+		xlog.Info("face-detect: applying LocalAI thread budget", "threads", threads)
+	}
+
+	xlog.Info("face-detect: loading model", "model", model,
+		"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())
+
+	ctx := CppLoad(model)
+	if ctx == 0 {
+		// The last-error buffer lives on the ctx that was never returned, so
+		// surface the path the operator tried to load instead.
+		return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
+	}
+	f.ctxPtr = ctx
+	return nil
+}
+
+// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
+// the supplied image. Mirroring the Python face backend, the image is read from
+// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
+// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
+// server wraps the returned slice in an EmbeddingResult.
+func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
+	if f.ctxPtr == 0 {
+		return nil, errors.New("face-detect: model not loaded")
+	}
+	if len(req.Images) == 0 || req.Images[0] == "" {
+		return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
+	}
+
+	path, cleanup, err := materializeImage(req.Images[0])
+	if err != nil {
+		return nil, err
+	}
+	defer cleanup()
+
+	return f.embedPath(path)
+}
+
+func (f *FaceDetect) embedPath(path string) ([]float32, error) {
+	var vec uintptr
+	var dim int32
+	rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
+	if rc != 0 || vec == 0 || dim <= 0 {
+		return nil, f.lastErr("embed", path)
+	}
+	defer CppFreeVec(vec)
+	// Copy out of the C-owned malloc'd buffer before freeing it. The
+	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
+	// nor moves this buffer and we copy immediately.
+	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
+	out := make([]float32, int(dim))
+	copy(out, src)
+	return out, nil
+}
+
+// Detect runs SCRFD over the image and returns one Detection per face. The
+// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
+// width/height, so the corners are converted. The 5 facial landmarks the engine
+// also returns are dropped: the Detection message has no field for them.
+func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Src == "" {
+		return pb.DetectResponse{}, errors.New("face-detect: src image is required")
+	}
+
+	path, cleanup, err := materializeImage(req.Src)
+	if err != nil {
+		return pb.DetectResponse{}, err
+	}
+	defer cleanup()
+
+	faces, err := f.detectFaces(path)
+	if err != nil {
+		return pb.DetectResponse{}, err
+	}
+
+	dets := make([]*pb.Detection, 0, len(faces))
+	for _, fc := range faces {
+		if req.Threshold > 0 && fc.Score < req.Threshold {
+			continue
+		}
+		x, y, w, h := fc.xywh()
+		dets = append(dets, &pb.Detection{
+			X:          x,
+			Y:          y,
+			Width:      w,
+			Height:     h,
+			Confidence: fc.Score,
+			ClassName:  "face",
+		})
+	}
+	return pb.DetectResponse{Detections: dets}, nil
+}
+
+// FaceVerify embeds the primary face in each image and reports whether they are
+// the same identity by cosine distance against a threshold. A request threshold
+// <= 0 falls back to the model-configured default (verify_threshold option,
+// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
+// veto internally (verified forced false on a spoof); the per-image liveness
+// scores are not exposed by the verify entry point, so img*_is_real /
+// img*_antispoof_score stay at their zero values.
+func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Img1 == "" || req.Img2 == "" {
+		return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
+	}
+
+	path1, cleanup1, err := materializeImage(req.Img1)
+	if err != nil {
+		return pb.FaceVerifyResponse{}, err
+	}
+	defer cleanup1()
+	path2, cleanup2, err := materializeImage(req.Img2)
+	if err != nil {
+		return pb.FaceVerifyResponse{}, err
+	}
+	defer cleanup2()
+
+	threshold := req.Threshold
+	if threshold <= 0 {
+		threshold = f.opts.verifyThreshold
+	}
+
+	antiSpoof := int32(0)
+	if req.AntiSpoofing {
+		antiSpoof = 1
+	}
+
+	started := time.Now()
+	var distance float32
+	var verified int32
+	rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
+		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
+	if rc != 0 {
+		return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
+	}
+	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
+
+	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
+	// matching the Python face backend's reporting.
+	confidence := float32(0)
+	if threshold > 0 {
+		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
+	}
+
+	return pb.FaceVerifyResponse{
+		Verified:         verified != 0,
+		Distance:         distance,
+		Threshold:        threshold,
+		Confidence:       confidence,
+		Model:            f.opts.modelName,
+		Img1Area:         f.bestArea(path1),
+		Img2Area:         f.bestArea(path2),
+		ProcessingTimeMs: elapsedMs,
+	}, nil
+}
+
+// FaceAnalyze runs the genderage head on every detected face. The C-API returns
+// "M"/"F" gender labels and a rounded age; the labels are normalized to the
+// "Man"/"Woman" values the proto documents.
+func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Img == "" {
+		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
+	}
+
+	path, cleanup, err := materializeImage(req.Img)
+	if err != nil {
+		return pb.FaceAnalyzeResponse{}, err
+	}
+	defer cleanup()
+
+	ptr := CppAnalyzeJSON(f.ctxPtr, path)
+	if ptr == 0 {
+		return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
+	}
+	defer CppFreeString(ptr)
+
+	faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
+	if err != nil {
+		return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
+	}
+	return pb.FaceAnalyzeResponse{Faces: faces}, nil
+}
+
+// faceBox is one entry of the detect/analyze JSON documents the engine emits.
+type faceBox struct {
+	Score  float32   `json:"score"`
+	Box    []float32 `json:"box"`
+	Age    float32   `json:"age"`
+	Gender string    `json:"gender"`
+}
+
+// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
+// proto carries. A short or missing box yields zeros.
+func (b faceBox) xywh() (x, y, w, h float32) {
+	if len(b.Box) < 4 {
+		return 0, 0, 0, 0
+	}
+	return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
+}
+
+type facesJSON struct {
+	Faces []faceBox `json:"faces"`
+}
+
+func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
+	ptr := CppDetectJSON(f.ctxPtr, path)
+	if ptr == 0 {
+		return nil, f.lastErr("detect", path)
+	}
+	defer CppFreeString(ptr)
+
+	var doc facesJSON
+	if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
+		return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
+	}
+	return doc.Faces, nil
+}
+
+// bestArea returns the FacialArea of the highest-scoring face in an image, or an
+// empty area when detection fails or finds nothing. Best-effort: verify already
+// succeeded, so a missing region must not turn a valid match into an error.
+func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
+	faces, err := f.detectFaces(path)
+	if err != nil || len(faces) == 0 {
+		return &pb.FacialArea{}
+	}
+	best := faces[0]
+	for _, fc := range faces[1:] {
+		if fc.Score > best.Score {
+			best = fc
+		}
+	}
+	x, y, w, h := best.xywh()
+	return &pb.FacialArea{X: x, Y: y, W: w, H: h}
+}
+
+// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
+// The engine reports gender as "M"/"F"; both the dominant label and the score
+// map are filled with the "Man"/"Woman" form the proto documents.
+func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
+	var parsed facesJSON
+	if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
+		return nil, err
+	}
+
+	out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
+	for _, fc := range parsed.Faces {
+		x, y, w, h := fc.xywh()
+		fa := &pb.FaceAnalysis{
+			Region:         &pb.FacialArea{X: x, Y: y, W: w, H: h},
+			FaceConfidence: fc.Score,
+			Age:            fc.Age,
+		}
+		if label := normalizeGender(fc.Gender); label != "" {
+			fa.DominantGender = label
+			fa.Gender = map[string]float32{label: 1.0}
+		}
+		out = append(out, fa)
+	}
+	return out, nil
+}
+
+// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
+// proto documents. Unknown codes pass through unchanged.
+func normalizeGender(g string) string {
+	switch strings.ToUpper(strings.TrimSpace(g)) {
+	case "M":
+		return "Man"
+	case "F":
+		return "Woman"
+	case "":
+		return ""
+	default:
+		return g
+	}
+}
+
+// materializeImage decodes a base64 image payload into a temp file and returns
+// its path plus a cleanup func. As a convenience for callers that already pass a
+// filesystem path (e.g. a test fixture), an existing path is used as-is with a
+// no-op cleanup. data: URI prefixes are stripped before decoding.
+func materializeImage(src string) (path string, cleanup func(), err error) {
+	noop := func() {}
+	if src == "" {
+		return "", noop, errors.New("face-detect: empty image input")
+	}
+	if _, statErr := os.Stat(src); statErr == nil {
+		return src, noop, nil
+	}
+
+	payload := src
+	if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
+		payload = payload[i+1:]
+	}
+	data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
+	if decErr != nil || len(data) == 0 {
+		return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
+	}
+
+	tmp, createErr := os.CreateTemp("", "face-detect-*.img")
+	if createErr != nil {
+		return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
+	}
+	cleanup = func() { _ = os.Remove(tmp.Name()) }
+	if _, wErr := tmp.Write(data); wErr != nil {
+		_ = tmp.Close()
+		cleanup()
+		return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
+	}
+	if cErr := tmp.Close(); cErr != nil {
+		cleanup()
+		return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
+	}
+	return tmp.Name(), cleanup, nil
+}
+
+// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
+func (f *FaceDetect) lastErr(op, subject string) error {
+	msg := strings.TrimSpace(CppLastError(f.ctxPtr))
+	if msg == "" {
+		msg = "no error detail"
+	}
+	return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
+}
+
+// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
+// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
+//
+// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
+// moves the buffer and we dereference it immediately to copy the bytes out.
+func goStringFromCPtr(cptr uintptr) string {
+	if cptr == 0 {
+		return ""
+	}
+	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
+	n := 0
+	for *(*byte)(unsafe.Add(p, n)) != 0 {
+		n++
+	}
+	return string(unsafe.Slice((*byte)(p), n))
+}
--- a/backend/go/face-detect/gofacedetect_test.go
+++ b/backend/go/face-detect/gofacedetect_test.go
@@ -0,0 +1,230 @@
+package main
+
+import (
+	"encoding/base64"
+	"os"
+	"sync"
+	"testing"
+
+	"github.com/ebitengine/purego"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+func TestFaceDetect(t *testing.T) {
+	RegisterFailHandler(Fail)
+	RunSpecs(t, "face-detect Backend Suite")
+}
+
+var (
+	libLoadOnce sync.Once
+	libLoadErr  error
+)
+
+// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
+// bridge without spinning up the gRPC server. If libfacedetect.so cannot be
+// loaded (via LD_LIBRARY_PATH or a symlink in ./), the error is recorded and
+// returned so the smoke specs can skip themselves.
+func ensureLibLoaded() error {
+	libLoadOnce.Do(func() {
+		libName := os.Getenv("FACEDETECT_LIBRARY")
+		if libName == "" {
+			libName = "libfacedetect.so"
+		}
+		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+		if err != nil {
+			libLoadErr = err
+			return
+		}
+		purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
+		purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
+		purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
+		purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
+		purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
+		purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
+		purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
+		purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
+		purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
+		purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
+		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
+	})
+	return libLoadErr
+}
+
+var _ = Describe("parseOptions", func() {
+	It("defaults verify_threshold to 0.35", func() {
+		o := parseOptions(nil)
+		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
+		Expect(o.modelName).To(Equal(""))
+	})
+
+	It("parses verify_threshold, threshold alias and model_name", func() {
+		o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
+		Expect(o.modelName).To(Equal("buffalo_l"))
+
+		o2 := parseOptions([]string{"threshold:0.3"})
+		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
+	})
+
+	It("ignores non-positive thresholds and keeps the default", func() {
+		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
+	})
+})
+
+var _ = Describe("normalizeGender", func() {
+	It("maps M/F codes to Man/Woman", func() {
+		Expect(normalizeGender("M")).To(Equal("Man"))
+		Expect(normalizeGender("f")).To(Equal("Woman"))
+		Expect(normalizeGender(" m ")).To(Equal("Man"))
+	})
+
+	It("passes empty and unknown codes through", func() {
+		Expect(normalizeGender("")).To(Equal(""))
+		Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
+	})
+})
+
+var _ = Describe("faceBox.xywh", func() {
+	It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
+		b := faceBox{Box: []float32{10, 20, 50, 80}}
+		x, y, w, h := b.xywh()
+		Expect(x).To(Equal(float32(10)))
+		Expect(y).To(Equal(float32(20)))
+		Expect(w).To(Equal(float32(40)))
+		Expect(h).To(Equal(float32(60)))
+	})
+
+	It("returns zeros for a short box", func() {
+		x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
+		Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
+	})
+})
+
+var _ = Describe("parseAnalyzeJSON", func() {
+	It("maps region, age and gender for each face", func() {
+		doc := `{"faces":[
+			{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
+			{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
+		faces, err := parseAnalyzeJSON(doc)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(HaveLen(2))
+
+		Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
+		Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
+		Expect(faces[0].DominantGender).To(Equal("Man"))
+		Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
+		Expect(faces[0].Region.W).To(Equal(float32(40)))
+		Expect(faces[0].Region.H).To(Equal(float32(60)))
+
+		Expect(faces[1].DominantGender).To(Equal("Woman"))
+	})
+
+	It("tolerates a missing gender field", func() {
+		faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(HaveLen(1))
+		Expect(faces[0].DominantGender).To(Equal(""))
+		Expect(faces[0].Gender).To(BeEmpty())
+	})
+
+	It("returns no faces for an empty document", func() {
+		faces, err := parseAnalyzeJSON(`{"faces":[]}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(BeEmpty())
+	})
+
+	It("returns an error on malformed JSON", func() {
+		_, err := parseAnalyzeJSON(`{not-json`)
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+var _ = Describe("materializeImage", func() {
+	It("decodes a base64 payload to a temp file", func() {
+		payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
+		path, cleanup, err := materializeImage(payload)
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		data, rerr := os.ReadFile(path)
+		Expect(rerr).ToNot(HaveOccurred())
+		Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
+	})
+
+	It("strips a data: URI prefix before decoding", func() {
+		payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
+		path, cleanup, err := materializeImage(payload)
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		data, rerr := os.ReadFile(path)
+		Expect(rerr).ToNot(HaveOccurred())
+		Expect(data).To(Equal([]byte("hello")))
+	})
+
+	It("uses an existing path as-is", func() {
+		tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
+		Expect(err).ToNot(HaveOccurred())
+		defer func() { _ = os.Remove(tmp.Name()) }()
+		Expect(tmp.Close()).To(Succeed())
+
+		path, cleanup, err := materializeImage(tmp.Name())
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		Expect(path).To(Equal(tmp.Name()))
+	})
+
+	It("errors on input that is neither a path nor base64", func() {
+		_, _, err := materializeImage("not base64!!!")
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+// The specs below exercise the real C-API end to end. They run only when both a
+// model GGUF and a test image are provided, and skip cleanly otherwise so the
+// suite stays green without large assets.
+var _ = Describe("FaceDetect end-to-end", Ordered, func() {
+	var (
+		f         *FaceDetect
+		modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
+		imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
+	)
+
+	BeforeAll(func() {
+		if modelPath == "" || imagePath == "" {
+			Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
+		}
+		if err := ensureLibLoaded(); err != nil {
+			Skip("libfacedetect.so not loadable: " + err.Error())
+		}
+		f = &FaceDetect{}
+		Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
+	})
+
+	It("embeds the primary face in an image", func() {
+		emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(emb).ToNot(BeEmpty())
+	})
+
+	It("detects at least one face", func() {
+		resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Detections).ToNot(BeEmpty())
+		Expect(resp.Detections[0].ClassName).To(Equal("face"))
+	})
+
+	It("verifies an image against itself as the same identity", func() {
+		resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Verified).To(BeTrue())
+		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
+	})
+
+	It("analyzes age/gender for each face", func() {
+		resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Faces).ToNot(BeEmpty())
+	})
+})
--- a/backend/go/face-detect/main.go
+++ b/backend/go/face-detect/main.go
@@ -0,0 +1,65 @@
+package main
+
+// Started internally by LocalAI - one gRPC server per loaded model.
+//
+// Loads libfacedetect.so via purego and registers the flat C-API entry points
+// declared in facedetect_capi.h. The library name can be overridden with
+// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
+// convention in the sibling backends); the default looks for the .so next to
+// this binary (resolved via LD_LIBRARY_PATH by run.sh).
+import (
+	"flag"
+	"fmt"
+	"os"
+
+	"github.com/ebitengine/purego"
+	grpc "github.com/mudler/LocalAI/pkg/grpc"
+)
+
+var (
+	addr = flag.String("addr", "localhost:50051", "the address to connect to")
+)
+
+type LibFuncs struct {
+	FuncPtr any
+	Name    string
+}
+
+func main() {
+	libName := os.Getenv("FACEDETECT_LIBRARY")
+	if libName == "" {
+		libName = "libfacedetect.so"
+	}
+
+	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+	if err != nil {
+		panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
+	}
+
+	// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
+	// uintptr so the raw pointer can be freed via the matching capi free fn.
+	libFuncs := []LibFuncs{
+		{&CppAbiVersion, "facedetect_capi_abi_version"},
+		{&CppLoad, "facedetect_capi_load"},
+		{&CppFree, "facedetect_capi_free"},
+		{&CppLastError, "facedetect_capi_last_error"},
+		{&CppFreeString, "facedetect_capi_free_string"},
+		{&CppFreeVec, "facedetect_capi_free_vec"},
+		{&CppEmbedPath, "facedetect_capi_embed_path"},
+		{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
+		{&CppDetectJSON, "facedetect_capi_detect_path_json"},
+		{&CppVerifyPaths, "facedetect_capi_verify_paths"},
+		{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
+	}
+	for _, lf := range libFuncs {
+		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
+	}
+
+	fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())
+
+	flag.Parse()
+
+	if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
+		panic(err)
+	}
+}
--- a/backend/go/face-detect/options.go
+++ b/backend/go/face-detect/options.go
@@ -0,0 +1,47 @@
+package main
+
+import (
+	"strconv"
+	"strings"
+)
+
+// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
+// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
+// face backend ships with so the two implementations agree on verdicts out of
+// the box.
+const defaultVerifyThreshold float32 = 0.35
+
+// loadOptions holds the parsed model-level options for face-detect.
+type loadOptions struct {
+	verifyThreshold float32
+	modelName       string
+}
+
+func splitOption(o string) (key, value string, ok bool) {
+	i := strings.Index(o, ":")
+	if i < 0 {
+		return "", "", false
+	}
+	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
+}
+
+// parseOptions reads the backend "key:value" option slice. Unknown keys are
+// ignored. Defaults: verify_threshold 0.35, model_name empty (caller derives it from the file).
+func parseOptions(opts []string) loadOptions {
+	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
+	for _, oo := range opts {
+		key, value, ok := splitOption(oo)
+		if !ok {
+			continue
+		}
+		switch key {
+		case "verify_threshold", "threshold":
+			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
+				o.verifyThreshold = float32(f)
+			}
+		case "model_name":
+			o.modelName = value
+		}
+	}
+	return o
+}
--- a/backend/go/face-detect/package.sh
+++ b/backend/go/face-detect/package.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
+# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
+# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
+# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
+# is used instead of the host's.
+
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+REPO_ROOT="${CURDIR}/../../.."
+
+mkdir -p "$CURDIR/package/lib"
+
+cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
+cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
+
+# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
+# LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
+	exit 1
+}
+
+# Detect architecture and copy the core runtime libs libfacedetect.so links
+# against, plus the matching dynamic loader as lib/ld.so.
+if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
+    echo "Detected x86_64 architecture, copying x86_64 libraries..."
+    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
+    echo "Detected ARM64 architecture, copying ARM64 libraries..."
+    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ "$(uname -s)" = "Darwin" ]; then
+    echo "Detected Darwin"
+else
+    echo "Error: Could not detect architecture" >&2
+    exit 1
+fi
+
+# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
+# BUILD_TYPE so the backend can reach the GPU without the runtime base image
+# shipping those drivers.
+GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
+if [ -f "$GPU_LIB_SCRIPT" ]; then
+    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
+    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
+    package_gpu_libs
+fi
+
+echo "Packaging completed successfully"
+ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/face-detect/run.sh
+++ b/backend/go/face-detect/run.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+
+# If a self-contained ld.so was packaged, route through it so the packaged
+# libc / libstdc++ are used instead of the host's (matches the voice-detect /
+# whisper / parakeet backends' runtime layout).
+if [ -f "$CURDIR/lib/ld.so" ]; then
+	echo "Using lib/ld.so"
+	exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
+fi
+
+exec "$CURDIR/face-detect-grpc" "$@"
--- a/backend/go/face-detect/test.sh
+++ b/backend/go/face-detect/test.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+cd "$CURDIR"
+
+echo "Running face-detect backend tests..."
+
+# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
+# specs run only when a model + image are provided via
+# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
+# auto-skip.
+LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
+
+echo "face-detect tests completed."
--- a/backend/go/voice-detect/.gitignore
+++ b/backend/go/voice-detect/.gitignore
@@ -0,0 +1,18 @@
+# Fetched upstream sources
+sources/
+
+# CMake build directories
+build*/
+
+# Build artifacts staged in-tree by the Makefile (cp from sources/) or
+# symlinked for local dev; the real sources live in voice-detect.cpp upstream.
+*.so
+*.so.*
+voicedetect_capi.h
+compile_commands.json
+
+# Compiled backend binary
+voice-detect-grpc
+
+# Packaging output
+package/
--- a/backend/go/voice-detect/Makefile
+++ b/backend/go/voice-detect/Makefile
@@ -0,0 +1,107 @@
+# voice-detect backend Makefile.
+#
+# Upstream pin lives below as VOICEDETECT_VERSION?=3d51077... (.github/bump_deps.sh
+# can find and update it - matches the parakeet.cpp / whisper.cpp / ds4 convention).
+#
+# Local dev shortcut: if you already have an out-of-tree voice-detect.cpp build,
+# symlink the .so + header into this directory and skip the clone/cmake steps:
+#
+#   ln -sf /path/to/voice-detect.cpp/build-shared/libvoicedetect.so .
+#   ln -sf /path/to/voice-detect.cpp/include/voicedetect_capi.h .
+#   go build -o voice-detect-grpc .
+#
+# The default target below does the proper clone-at-pin + cmake build so CI does
+# not need a side-checkout.
+
+VOICEDETECT_VERSION?=3d510772357538c5182808ac7de2278b84824e24
+VOICEDETECT_REPO?=https://github.com/mudler/voice-detect.cpp
+
+GOCMD?=go
+GO_TAGS?=
+JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
+
+BUILD_TYPE?=
+NATIVE?=false
+
+# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
+# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
+RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))
+
+# Build ggml statically into libvoicedetect.so (PIC) so the shared lib is
+# self-contained: dlopen needs no libggml*.so alongside it, only system libs
+# (libstdc++/libgomp/libc) that the runtime image already provides.
+CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DVOICEDETECT_SHARED=ON -DVOICEDETECT_BUILD_CLI=OFF -DVOICEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
+
+ifeq ($(NATIVE),false)
+	CMAKE_ARGS+=-DGGML_NATIVE=OFF
+endif
+
+# voice-detect.cpp gates its GGML backends behind VOICEDETECT_GGML_* options and
+# does set(GGML_CUDA ${VOICEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
+# -DGGML_CUDA=ON is overwritten back to OFF. Forward the VOICEDETECT_GGML_*
+# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
+ifeq ($(BUILD_TYPE),cublas)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDA=ON
+	# Opt-in cuDNN implicit-GEMM conv path (kills im2col on GPU, reaches
+	# torch-cuDNN parity). Only the arm64 + CUDA 13 image (GB10/Jetson/L4T)
+	# ships libcudnn9 + the -dev headers, so gate cuDNN to that variant.
+	# x86 CUDA images carry no cuDNN -> enabling it there is a link failure.
+	ifeq ($(CUDA_MAJOR_VERSION),13)
+	ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
+		CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDNN=ON
+	endif
+	endif
+else ifeq ($(BUILD_TYPE),openblas)
+	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+else ifeq ($(BUILD_TYPE),hipblas)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_HIP=ON
+else ifeq ($(BUILD_TYPE),vulkan)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_VULKAN=ON
+else ifeq ($(BUILD_TYPE),metal)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_METAL=ON
+endif
+
+.PHONY: voice-detect-grpc package build clean purge test all
+
+all: voice-detect-grpc
+
+# Clone the upstream voice-detect.cpp source at the pinned commit. Directory acts
+# as the target so make only re-clones when missing. After a VOICEDETECT_VERSION
+# bump, run 'make purge && make' to refetch.
+sources/voice-detect.cpp:
+	mkdir -p sources/voice-detect.cpp
+	cd sources/voice-detect.cpp && \
+	git init -q && \
+	git remote add origin $(VOICEDETECT_REPO) && \
+	git fetch --depth 1 origin $(VOICEDETECT_VERSION) && \
+	git checkout FETCH_HEAD && \
+	git submodule update --init --recursive --depth 1 --single-branch
+
+# Build the shared lib + header out-of-tree, then stage them next to the Go
+# sources so purego.Dlopen("libvoicedetect.so") and the cgo-less build both pick
+# them up.
+libvoicedetect.so: sources/voice-detect.cpp
+	cmake -B sources/voice-detect.cpp/build-shared -S sources/voice-detect.cpp $(CMAKE_ARGS)
+	cmake --build sources/voice-detect.cpp/build-shared --config Release -j$(JOBS) --target voicedetect
+	cp -fv sources/voice-detect.cpp/build-shared/libvoicedetect.so* ./ 2>/dev/null || true
+	cp -fv sources/voice-detect.cpp/include/voicedetect_capi.h ./
+
+voice-detect-grpc: libvoicedetect.so main.go govoicedetect.go options.go
+	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voice-detect-grpc .
+
+package: voice-detect-grpc
+	bash package.sh
+
+build: package
+
+# Test target. The embed/verify/analyze smoke specs are gated on
+# VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV; without them the
+# heavy specs auto-skip and only the pure-Go parsing specs run.
+test:
+	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
+
+clean: purge
+	rm -rf libvoicedetect.so* voicedetect_capi.h package voice-detect-grpc
+
+purge:
+	rm -rf sources/voice-detect.cpp
--- a/backend/go/voice-detect/govoicedetect.go
+++ b/backend/go/voice-detect/govoicedetect.go
@@ -0,0 +1,273 @@
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+	"math"
+	"os"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"time"
+	"unsafe"
+
+	"github.com/mudler/LocalAI/pkg/grpc/base"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/xlog"
+)
+
+// purego-bound entry points from libvoicedetect.so. Names match
+// voicedetect_capi.h exactly so a `nm libvoicedetect.so | grep voicedetect_capi`
+// is enough to spot drift.
+//
+// The opaque ctx and the malloc'd char*/float* return values are declared as
+// uintptr so we get the raw pointer back and can release it via the matching
+// capi free function. A purego string return would copy the bytes and drop
+// the original pointer, leaking the C-owned buffer on every call.
+var (
+	CppAbiVersion  func() int32
+	CppLoad        func(ggufPath string) uintptr
+	CppFree        func(ctx uintptr)
+	CppLastError   func(ctx uintptr) string
+	CppFreeString  func(s uintptr)
+	CppFreeVec     func(v uintptr)
+	CppEmbedPath   func(ctx uintptr, wavPath string, outVec, outDim unsafe.Pointer) int32
+	CppEmbedPCM    func(ctx uintptr, pcm []float32, nSamples, sampleRate int32, outVec, outDim unsafe.Pointer) int32
+	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, outDistance, outVerified unsafe.Pointer) int32
+	CppAnalyzeJSON func(ctx uintptr, wavPath string) uintptr
+)
+
+// VoiceDetect implements the speaker-recognition voice subset of the Backend
+// gRPC service over libvoicedetect.so. The C side keeps a single loaded model
+// plus a per-ctx last-error buffer and is not reentrant, so base.SingleThread
+// serializes every call.
+type VoiceDetect struct {
+	base.SingleThread
+	opts   loadOptions
+	ctxPtr uintptr
+}
+
+func (v *VoiceDetect) Load(opts *pb.ModelOptions) error {
+	model := opts.ModelFile
+	if model == "" {
+		model = opts.ModelPath
+	}
+	if !filepath.IsAbs(model) && opts.ModelPath != "" {
+		model = filepath.Join(opts.ModelPath, model)
+	}
+	if model == "" {
+		return errors.New("voice-detect: ModelFile is required")
+	}
+
+	v.opts = parseOptions(opts.Options)
+	if v.opts.modelName == "" {
+		v.opts.modelName = filepath.Base(model)
+	}
+
+	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
+	// one backend process per model and serves requests concurrently, so the
+	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
+	// VOICEDETECT_THREADS is read by the engine at backend construction, so it
+	// must be set before the capi load. A non-positive Threads means "unset":
+	// leave the env alone so the engine keeps its sane default.
+	threads := opts.Threads
+	if threads > 0 {
+		if err := os.Setenv("VOICEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
+			return fmt.Errorf("voice-detect: set VOICEDETECT_THREADS: %w", err)
+		}
+		xlog.Info("voice-detect: applying LocalAI thread budget", "threads", threads)
+	}
+
+	xlog.Info("voice-detect: loading model", "model", model,
+		"verify_threshold", v.opts.verifyThreshold, "abi", CppAbiVersion())
+
+	ctx := CppLoad(model)
+	if ctx == 0 {
+		// The last-error buffer lives on the ctx that was never returned, so
+		// surface the path the operator tried to load instead.
+		return fmt.Errorf("voice-detect: voicedetect_capi_load failed for %q", model)
+	}
+	v.ctxPtr = ctx
+	return nil
+}
+
+// VoiceEmbed returns the L2-normalized speaker embedding for an audio clip.
+// The request carries a filesystem PATH; the HTTP layer materializes
+// base64/URL/data-URI inputs to a temp file before the gRPC call.
+func (v *VoiceDetect) VoiceEmbed(req *pb.VoiceEmbedRequest) (pb.VoiceEmbedResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio == "" {
+		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: audio path is required")
+	}
+	emb, err := v.embedPath(req.Audio)
+	if err != nil {
+		return pb.VoiceEmbedResponse{}, err
+	}
+	return pb.VoiceEmbedResponse{Embedding: emb, Model: v.opts.modelName}, nil
+}
+
+func (v *VoiceDetect) embedPath(path string) ([]float32, error) {
+	var vec uintptr
+	var dim int32
+	rc := CppEmbedPath(v.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
+	if rc != 0 || vec == 0 || dim <= 0 {
+		return nil, v.lastErr("embed", path)
+	}
+	defer CppFreeVec(vec)
+	// Copy out of the C-owned malloc'd buffer before freeing it. The
+	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
+	// nor moves this buffer and we copy immediately.
+	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
+	out := make([]float32, int(dim))
+	copy(out, src)
+	return out, nil
+}
+
+// VoiceVerify embeds two clips and reports whether they are the same speaker by
+// cosine distance against a threshold. A request threshold <= 0 falls back to
+// the model-configured default (verify_threshold option, 0.25 if unset).
+func (v *VoiceDetect) VoiceVerify(req *pb.VoiceVerifyRequest) (pb.VoiceVerifyResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio1 == "" || req.Audio2 == "" {
+		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: audio1 and audio2 are required")
+	}
+
+	threshold := req.Threshold
+	if threshold <= 0 {
+		threshold = v.opts.verifyThreshold
+	}
+
+	started := time.Now()
+	var distance float32
+	var verified int32
+	rc := CppVerifyPaths(v.ctxPtr, req.Audio1, req.Audio2, threshold,
+		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
+	if rc != 0 {
+		return pb.VoiceVerifyResponse{}, v.lastErr("verify", req.Audio1+","+req.Audio2)
+	}
+	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
+
+	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
+	// matching the Python speaker-recognition backend's reporting.
+	confidence := float32(0)
+	if threshold > 0 {
+		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
+	}
+
+	return pb.VoiceVerifyResponse{
+		Verified:         verified != 0,
+		Distance:         distance,
+		Threshold:        threshold,
+		Confidence:       confidence,
+		Model:            v.opts.modelName,
+		ProcessingTimeMs: elapsedMs,
+	}, nil
+}
+
+// VoiceAnalyze runs the age/gender/emotion heads on a single clip. The C-API
+// always evaluates every supported head, so the request's actions filter is
+// ignored and the full analysis is returned as a single segment (the engine
+// does not produce time-bounded segments).
+func (v *VoiceDetect) VoiceAnalyze(req *pb.VoiceAnalyzeRequest) (pb.VoiceAnalyzeResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio == "" {
+		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: audio path is required")
+	}
+
+	ptr := CppAnalyzeJSON(v.ctxPtr, req.Audio)
+	if ptr == 0 {
+		return pb.VoiceAnalyzeResponse{}, v.lastErr("analyze", req.Audio)
+	}
+	defer CppFreeString(ptr)
+
+	seg, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
+	if err != nil {
+		return pb.VoiceAnalyzeResponse{}, fmt.Errorf("voice-detect: analyze JSON for %q: %w", req.Audio, err)
+	}
+	return pb.VoiceAnalyzeResponse{Segments: []*pb.VoiceAnalysis{seg}}, nil
+}
+
+// analyzeJSON mirrors the document returned by voicedetect_capi_analyze_path_json:
+//
+//	{"age":42.0,
+//	 "gender":{"label":"female","female":0.88,"male":0.12},
+//	 "emotion":{"label":"neutral","scores":{"neutral":0.7, ...}}}
+//
+// gender is a mixed object (a "label" string plus per-class float scores), so
+// it is decoded into raw messages and split in parseAnalyzeJSON.
+type analyzeJSON struct {
+	Age     float32                    `json:"age"`
+	Gender  map[string]json.RawMessage `json:"gender"`
+	Emotion struct {
+		Label  string             `json:"label"`
+		Scores map[string]float32 `json:"scores"`
+	} `json:"emotion"`
+}
+
+// parseAnalyzeJSON maps the engine's analyze document onto a VoiceAnalysis.
+// start/end stay 0: the model emits a single whole-utterance result, not
+// time-bounded segments.
+func parseAnalyzeJSON(doc string) (*pb.VoiceAnalysis, error) {
+	var a analyzeJSON
+	if err := json.Unmarshal([]byte(doc), &a); err != nil {
+		return nil, err
+	}
+
+	seg := &pb.VoiceAnalysis{
+		Age:             a.Age,
+		DominantEmotion: a.Emotion.Label,
+		Emotion:         a.Emotion.Scores,
+	}
+
+	if len(a.Gender) > 0 {
+		gender := make(map[string]float32, len(a.Gender))
+		for k, raw := range a.Gender {
+			if k == "label" {
+				_ = json.Unmarshal(raw, &seg.DominantGender)
+				continue
+			}
+			var score float32
+			if err := json.Unmarshal(raw, &score); err == nil {
+				gender[k] = score
+			}
+		}
+		seg.Gender = gender
+	}
+
+	return seg, nil
+}
+
+// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
+func (v *VoiceDetect) lastErr(op, subject string) error {
+	msg := strings.TrimSpace(CppLastError(v.ctxPtr))
+	if msg == "" {
+		msg = "no error detail"
+	}
+	return fmt.Errorf("voice-detect: %s failed for %q: %s", op, subject, msg)
+}
+
+// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
+// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
+//
+// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
+// moves the buffer and we dereference it immediately to copy the bytes out.
+func goStringFromCPtr(cptr uintptr) string {
+	if cptr == 0 {
+		return ""
+	}
+	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
+	n := 0
+	for *(*byte)(unsafe.Add(p, n)) != 0 {
+		n++
+	}
+	return string(unsafe.Slice((*byte)(p), n))
+}
--- a/backend/go/voice-detect/govoicedetect_test.go
+++ b/backend/go/voice-detect/govoicedetect_test.go
@@ -0,0 +1,144 @@
+package main
+
+import (
+	"os"
+	"sync"
+	"testing"
+
+	"github.com/ebitengine/purego"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+func TestVoiceDetect(t *testing.T) {
+	RegisterFailHandler(Fail)
+	RunSpecs(t, "voice-detect Backend Suite")
+}
+
+var (
+	libLoadOnce sync.Once
+	libLoadErr  error
+)
+
+// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
+// bridge without spinning up the gRPC server. If libvoicedetect.so cannot be
+// loaded (via LD_LIBRARY_PATH or a symlink in ./), the error is recorded and
+// returned so the smoke specs can skip themselves.
+func ensureLibLoaded() error {
+	libLoadOnce.Do(func() {
+		libName := os.Getenv("VOICEDETECT_LIBRARY")
+		if libName == "" {
+			libName = "libvoicedetect.so"
+		}
+		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+		if err != nil {
+			libLoadErr = err
+			return
+		}
+		purego.RegisterLibFunc(&CppAbiVersion, lib, "voicedetect_capi_abi_version")
+		purego.RegisterLibFunc(&CppLoad, lib, "voicedetect_capi_load")
+		purego.RegisterLibFunc(&CppFree, lib, "voicedetect_capi_free")
+		purego.RegisterLibFunc(&CppLastError, lib, "voicedetect_capi_last_error")
+		purego.RegisterLibFunc(&CppFreeString, lib, "voicedetect_capi_free_string")
+		purego.RegisterLibFunc(&CppFreeVec, lib, "voicedetect_capi_free_vec")
+		purego.RegisterLibFunc(&CppEmbedPath, lib, "voicedetect_capi_embed_path")
+		purego.RegisterLibFunc(&CppEmbedPCM, lib, "voicedetect_capi_embed_pcm")
+		purego.RegisterLibFunc(&CppVerifyPaths, lib, "voicedetect_capi_verify_paths")
+		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "voicedetect_capi_analyze_path_json")
+	})
+	return libLoadErr
+}
+
+var _ = Describe("parseOptions", func() {
+	It("defaults verify_threshold to 0.25", func() {
+		o := parseOptions(nil)
+		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
+		Expect(o.modelName).To(Equal(""))
+	})
+
+	It("parses verify_threshold, threshold alias and model_name", func() {
+		o := parseOptions([]string{"verify_threshold:0.4", "model_name:ecapa", "unknown:x"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
+		Expect(o.modelName).To(Equal("ecapa"))
+
+		o2 := parseOptions([]string{"threshold:0.3"})
+		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
+	})
+
+	It("ignores non-positive thresholds and keeps the default", func() {
+		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
+	})
+})
+
+var _ = Describe("parseAnalyzeJSON", func() {
+	It("maps age, gender label+scores and emotion label+scores", func() {
+		doc := `{"age":42.0,
+			"gender":{"label":"female","female":0.88,"male":0.12},
+			"emotion":{"label":"neutral","scores":{"neutral":0.7,"happy":0.2,"sad":0.1}}}`
+		seg, err := parseAnalyzeJSON(doc)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(seg.Age).To(BeNumerically("~", 42.0, 1e-4))
+		Expect(seg.Start).To(Equal(float32(0)))
+		Expect(seg.End).To(Equal(float32(0)))
+
+		Expect(seg.DominantGender).To(Equal("female"))
+		Expect(seg.Gender).To(HaveKeyWithValue("female", BeNumerically("~", 0.88, 1e-4)))
+		Expect(seg.Gender).To(HaveKeyWithValue("male", BeNumerically("~", 0.12, 1e-4)))
+		// The "label" entry is consumed into DominantGender, not the score map.
+		Expect(seg.Gender).ToNot(HaveKey("label"))
+
+		Expect(seg.DominantEmotion).To(Equal("neutral"))
+		Expect(seg.Emotion).To(HaveKeyWithValue("neutral", BeNumerically("~", 0.7, 1e-4)))
+		Expect(seg.Emotion).To(HaveKeyWithValue("happy", BeNumerically("~", 0.2, 1e-4)))
+	})
+
+	It("tolerates a missing gender block", func() {
+		seg, err := parseAnalyzeJSON(`{"age":30.0,"emotion":{"label":"happy","scores":{"happy":1.0}}}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(seg.DominantGender).To(Equal(""))
+		Expect(seg.DominantEmotion).To(Equal("happy"))
+	})
+
+	It("returns an error on malformed JSON", func() {
+		_, err := parseAnalyzeJSON(`{not-json`)
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+// The specs below exercise the real C-API end to end. They run only when both a
+// model GGUF and a test WAV are provided, and skip cleanly otherwise so the
+// suite stays green without large assets.
+var _ = Describe("VoiceDetect end-to-end", Ordered, func() {
+	var (
+		v         *VoiceDetect
+		modelPath = os.Getenv("VOICEDETECT_BACKEND_TEST_MODEL")
+		wavPath   = os.Getenv("VOICEDETECT_BACKEND_TEST_WAV")
+	)
+
+	BeforeAll(func() {
+		if modelPath == "" || wavPath == "" {
+			Skip("set VOICEDETECT_BACKEND_TEST_MODEL and VOICEDETECT_BACKEND_TEST_WAV to run the e2e specs")
+		}
+		if err := ensureLibLoaded(); err != nil {
+			Skip("libvoicedetect.so not loadable: " + err.Error())
+		}
+		v = &VoiceDetect{}
+		Expect(v.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
+	})
+
+	It("embeds an audio clip", func() {
+		resp, err := v.VoiceEmbed(&pb.VoiceEmbedRequest{Audio: wavPath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Embedding).ToNot(BeEmpty())
+		Expect(resp.Model).ToNot(BeEmpty())
+	})
+
+	It("verifies a clip against itself as the same speaker", func() {
+		resp, err := v.VoiceVerify(&pb.VoiceVerifyRequest{Audio1: wavPath, Audio2: wavPath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Verified).To(BeTrue())
+		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
+	})
+})
--- a/backend/go/voice-detect/main.go
+++ b/backend/go/voice-detect/main.go
@@ -0,0 +1,64 @@
+package main
+
+// Started internally by LocalAI - one gRPC server per loaded model.
+//
+// Loads libvoicedetect.so via purego and registers the flat C-API entry points
+// declared in voicedetect_capi.h. The library name can be overridden with
+// VOICEDETECT_LIBRARY (mirrors the PARAKEET_LIBRARY / OMNIVOICE_LIBRARY
+// convention in the sibling backends); the default looks for the .so next to
+// this binary (resolved via LD_LIBRARY_PATH by run.sh).
+import (
+	"flag"
+	"fmt"
+	"os"
+
+	"github.com/ebitengine/purego"
+	grpc "github.com/mudler/LocalAI/pkg/grpc"
+)
+
+var (
+	addr = flag.String("addr", "localhost:50051", "the address to listen on")
+)
+
+type LibFuncs struct {
+	FuncPtr any
+	Name    string
+}
+
+func main() {
+	libName := os.Getenv("VOICEDETECT_LIBRARY")
+	if libName == "" {
+		libName = "libvoicedetect.so"
+	}
+
+	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+	if err != nil {
+		panic(fmt.Errorf("voice-detect: dlopen %q: %w", libName, err))
+	}
+
+	// Bound 1:1 to voicedetect_capi.h. char*/float* returns are registered as
+	// uintptr so the raw pointer can be freed via the matching capi free fn.
+	libFuncs := []LibFuncs{
+		{&CppAbiVersion, "voicedetect_capi_abi_version"},
+		{&CppLoad, "voicedetect_capi_load"},
+		{&CppFree, "voicedetect_capi_free"},
+		{&CppLastError, "voicedetect_capi_last_error"},
+		{&CppFreeString, "voicedetect_capi_free_string"},
+		{&CppFreeVec, "voicedetect_capi_free_vec"},
+		{&CppEmbedPath, "voicedetect_capi_embed_path"},
+		{&CppEmbedPCM, "voicedetect_capi_embed_pcm"},
+		{&CppVerifyPaths, "voicedetect_capi_verify_paths"},
+		{&CppAnalyzeJSON, "voicedetect_capi_analyze_path_json"},
+	}
+	for _, lf := range libFuncs {
+		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
+	}
+
+	fmt.Fprintf(os.Stderr, "[voice-detect] ABI=%d\n", CppAbiVersion())
+
+	flag.Parse()
+
+	if err := grpc.StartServer(*addr, &VoiceDetect{}); err != nil {
+		panic(err)
+	}
+}
--- a/backend/go/voice-detect/options.go
+++ b/backend/go/voice-detect/options.go
@@ -0,0 +1,46 @@
+package main
+
+import (
+	"strconv"
+	"strings"
+)
+
+// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
+// not set one. Matches the Python speaker-recognition backend's default so the
+// two implementations agree on verdicts out of the box.
+const defaultVerifyThreshold float32 = 0.25
+
+// loadOptions holds the parsed model-level options for voice-detect.
+type loadOptions struct {
+	verifyThreshold float32
+	modelName       string
+}
+
+func splitOption(o string) (key, value string, ok bool) {
+	i := strings.Index(o, ":")
+	if i < 0 {
+		return "", "", false
+	}
+	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
+}
+
+// parseOptions reads the backend "key:value" option slice. Unknown keys are
+// ignored. Defaults: verify_threshold 0.25, model_name "" (Load fills it in).
+func parseOptions(opts []string) loadOptions {
+	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
+	for _, oo := range opts {
+		key, value, ok := splitOption(oo)
+		if !ok {
+			continue
+		}
+		switch key {
+		case "verify_threshold", "threshold":
+			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
+				o.verifyThreshold = float32(f)
+			}
+		case "model_name":
+			o.modelName = value
+		}
+	}
+	return o
+}
--- a/backend/go/voice-detect/package.sh
+++ b/backend/go/voice-detect/package.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Bundle the voice-detect-grpc binary, libvoicedetect.so, the core runtime libs
+# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
+# so the package is self-contained. Mirrors backend/go/parakeet-cpp/package.sh;
+# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
+# is used instead of the host's.
+
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+REPO_ROOT="${CURDIR}/../../.."
+
+mkdir -p "$CURDIR/package/lib"
+
+cp -avf "$CURDIR/voice-detect-grpc" "$CURDIR/package/"
+cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
+
+# libvoicedetect.so + any soname symlinks. purego.Dlopen resolves it via
+# LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libvoicedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libvoicedetect.so not found in $CURDIR, run 'make' first" >&2
+	exit 1
+}
+
+# Detect architecture and copy the core runtime libs libvoicedetect.so links
+# against, plus the matching dynamic loader as lib/ld.so.
+if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
+    echo "Detected x86_64 architecture, copying x86_64 libraries..."
+    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
+    echo "Detected ARM64 architecture, copying ARM64 libraries..."
+    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ "$(uname -s)" = "Darwin" ]; then
+    echo "Detected Darwin"
+else
+    echo "Error: Could not detect architecture" >&2
+    exit 1
+fi
+
+# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
+# BUILD_TYPE so the backend can reach the GPU without the runtime base image
+# shipping those drivers.
+GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
+if [ -f "$GPU_LIB_SCRIPT" ]; then
+    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
+    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
+    package_gpu_libs
+fi
+
+echo "Packaging completed successfully"
+ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/voice-detect/run.sh
+++ b/backend/go/voice-detect/run.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+
+# If a self-contained ld.so was packaged, route through it so the packaged
+# libc / libstdc++ are used instead of the host's (matches the whisper /
+# parakeet backends' runtime layout).
+if [ -f "$CURDIR/lib/ld.so" ]; then
+	echo "Using lib/ld.so"
+	exec "$CURDIR/lib/ld.so" "$CURDIR/voice-detect-grpc" "$@"
+fi
+
+exec "$CURDIR/voice-detect-grpc" "$@"
--- a/backend/go/voice-detect/test.sh
+++ b/backend/go/voice-detect/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+cd "$CURDIR"
+
+echo "Running voice-detect backend tests..."
+
+# The pure-Go parsing specs always run. The embed/verify smoke specs run
+# only when a model + WAV are provided via VOICEDETECT_BACKEND_TEST_MODEL and
+# VOICEDETECT_BACKEND_TEST_WAV; otherwise they auto-skip.
+LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
+
+echo "voice-detect tests completed."
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -209,6 +209,78 @@
    nvidia-cuda-12: "cuda12-ced"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced"
+- &voicedetect
+  name: "voice-detect"
+  alias: "voice-detect"
+  license: mit
+  icon: https://avatars.githubusercontent.com/u/95302084
+  description: |
+    voice-detect speaker recognition and voice analysis.
+    voice-detect.cpp is a C++/ggml engine that produces L2-normalised
+    speaker embeddings (ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker
+    ERes2Net, CAM++) for voice verification and 1:N identification, plus
+    a wav2vec2 age / gender / emotion analysis head. It replaces the
+    Python speaker-recognition backend and is exposed through the Voice*
+    gRPC rpcs and the /v1/voice/* REST endpoints. It runs on CPU, NVIDIA
+    CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - CPU
+    - GPU
+    - CUDA
+    - HIP
+  capabilities:
+    default: "cpu-voice-detect"
+    nvidia: "cuda12-voice-detect"
+    intel: "intel-sycl-f16-voice-detect"
+    metal: "metal-voice-detect"
+    amd: "rocm-voice-detect"
+    vulkan: "vulkan-voice-detect"
+    nvidia-l4t: "nvidia-l4t-arm64-voice-detect"
+    nvidia-cuda-13: "cuda13-voice-detect"
+    nvidia-cuda-12: "cuda12-voice-detect"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect"
+- &facedetect
+  name: "face-detect"
+  alias: "face-detect"
+  license: mit
+  icon: https://avatars.githubusercontent.com/u/95302084
+  description: |
+    face-detect face detection, embedding, verification and analysis.
+    face-detect.cpp is a C++/ggml engine that runs SCRFD / YuNet face
+    detection and ArcFace / SFace 512-d (or 128-d) L2-normalised face
+    embeddings for verification and 1:N identification, plus a landmark /
+    age / gender analysis head. It replaces the Python insightface backend
+    and is exposed through the Embedding, Detect and Face* gRPC rpcs and
+    the /v1/face/* REST endpoints. It runs on CPU, NVIDIA CUDA, AMD
+    ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - CPU
+    - GPU
+    - CUDA
+    - HIP
+  capabilities:
+    default: "cpu-face-detect"
+    nvidia: "cuda12-face-detect"
+    intel: "intel-sycl-f16-face-detect"
+    metal: "metal-face-detect"
+    amd: "rocm-face-detect"
+    vulkan: "vulkan-face-detect"
+    nvidia-l4t: "nvidia-l4t-arm64-face-detect"
+    nvidia-cuda-13: "cuda13-face-detect"
+    nvidia-cuda-12: "cuda12-face-detect"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect"
 - &voxtral
  name: "voxtral"
  alias: "voxtral"
@@ -2827,6 +2899,236 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-ced
+## voice-detect
+- !!merge <<: *voicedetect
+  name: "voice-detect-development"
+  capabilities:
+    default: "cpu-voice-detect-development"
+    nvidia: "cuda12-voice-detect-development"
+    intel: "intel-sycl-f16-voice-detect-development"
+    metal: "metal-voice-detect-development"
+    amd: "rocm-voice-detect-development"
+    vulkan: "vulkan-voice-detect-development"
+    nvidia-l4t: "nvidia-l4t-arm64-voice-detect-development"
+    nvidia-cuda-13: "cuda13-voice-detect-development"
+    nvidia-cuda-12: "cuda12-voice-detect-development"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect-development"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect-development"
+- !!merge <<: *voicedetect
+  name: "nvidia-l4t-arm64-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "nvidia-l4t-arm64-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-nvidia-l4t-arm64-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-nvidia-l4t-arm64-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cpu-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-cpu-voice-detect
+- !!merge <<: *voicedetect
+  name: "cpu-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-cpu-voice-detect
+- !!merge <<: *voicedetect
+  name: "metal-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "metal-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda12-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda12-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-voice-detect
+- !!merge <<: *voicedetect
+  name: "rocm-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-rocm-hipblas-voice-detect
+- !!merge <<: *voicedetect
+  name: "rocm-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-rocm-hipblas-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f32-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f32-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f32-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f32-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f16-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f16-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f16-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f16-voice-detect
+- !!merge <<: *voicedetect
+  name: "vulkan-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-vulkan-voice-detect
+- !!merge <<: *voicedetect
+  name: "vulkan-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-vulkan-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-13-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-13-voice-detect
+## face-detect
+- !!merge <<: *facedetect
+  name: "face-detect-development"
+  capabilities:
+    default: "cpu-face-detect-development"
+    nvidia: "cuda12-face-detect-development"
+    intel: "intel-sycl-f16-face-detect-development"
+    metal: "metal-face-detect-development"
+    amd: "rocm-face-detect-development"
+    vulkan: "vulkan-face-detect-development"
+    nvidia-l4t: "nvidia-l4t-arm64-face-detect-development"
+    nvidia-cuda-13: "cuda13-face-detect-development"
+    nvidia-cuda-12: "cuda12-face-detect-development"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect-development"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect-development"
+- !!merge <<: *facedetect
+  name: "nvidia-l4t-arm64-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "nvidia-l4t-arm64-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-nvidia-l4t-arm64-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-nvidia-l4t-arm64-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cpu-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-cpu-face-detect
+- !!merge <<: *facedetect
+  name: "cpu-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-face-detect"
+  mirrors:
+    - localai/localai-backends:master-cpu-face-detect
+- !!merge <<: *facedetect
+  name: "metal-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "metal-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda12-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-face-detect
+- !!merge <<: *facedetect
+  name: "cuda12-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-face-detect
+- !!merge <<: *facedetect
+  name: "rocm-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-rocm-hipblas-face-detect
+- !!merge <<: *facedetect
+  name: "rocm-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-rocm-hipblas-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f32-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f32-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f32-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f32-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f16-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f16-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f16-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f16-face-detect
+- !!merge <<: *facedetect
+  name: "vulkan-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-vulkan-face-detect
+- !!merge <<: *facedetect
+  name: "vulkan-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-vulkan-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-13-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-13-face-detect
 ## stablediffusion-ggml
 - !!merge <<: *stablediffusionggml
  name: "cpu-stablediffusion-ggml"