mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-01 11:56:57 -04:00
feat(backends): add voice-detect + face-detect ggml backends (replace Python insightface/speaker-recognition) (#10441)
* feat(voice-detect): add Go purego backend for voice-detect.cpp

Add backend/go/voice-detect implementing the Backend gRPC voice subset (VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego, mirroring the parakeet-cpp / omnivoice-cpp backends. The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and float-vector returns are owned by Go and released through the matching capi free functions, with the per-ctx last error surfaced into Go errors. Calls are serialized via base.SingleThread since the C context is not reentrant.

Proto field mapping:
- VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model.
- VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the verify_threshold option, default 0.25) -> verify_paths -> verified/distance/threshold/confidence/model/processing_time_ms.
- VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion document maps to a single VoiceAnalysis segment (start/end 0; gender "label" -> dominant_gender with the remaining float scores as the gender map; emotion label/scores -> dominant_emotion/emotion).

The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV.
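The VoiceAnalyze mapping described above (gender "label" becomes dominant_gender, the remaining float scores become the gender map, and emotion is handled the same way) can be sketched in Go. This sketch assumes a hypothetical JSON schema: each gender/emotion object carries a string "label" plus one float score per class, and age is a plain number. The real analyze_path_json schema is not part of this commit text. The names `analysisDoc`, `segment`, `splitLabelScores` and `mapAnalysis` are illustrative and do not come from the backend.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// analysisDoc is a HYPOTHETICAL shape for the analyze JSON: age is a number,
// gender/emotion are objects holding a "label" string plus per-class scores.
type analysisDoc struct {
	Age     float64                    `json:"age"`
	Gender  map[string]json.RawMessage `json:"gender"`
	Emotion map[string]json.RawMessage `json:"emotion"`
}

// segment mirrors the fields of the single VoiceAnalysis segment the mapping
// fills in; Start/End stay 0 because the whole clip is analyzed at once.
type segment struct {
	Start, End      float64
	Age             float64
	DominantGender  string
	Gender          map[string]float64
	DominantEmotion string
	Emotion         map[string]float64
}

// splitLabelScores pulls the "label" string out of obj and returns every
// other entry as a float score.
func splitLabelScores(obj map[string]json.RawMessage) (string, map[string]float64, error) {
	label := ""
	scores := make(map[string]float64, len(obj))
	for k, raw := range obj {
		if k == "label" {
			if err := json.Unmarshal(raw, &label); err != nil {
				return "", nil, fmt.Errorf("label: %w", err)
			}
			continue
		}
		var v float64
		if err := json.Unmarshal(raw, &v); err != nil {
			return "", nil, fmt.Errorf("score %q: %w", k, err)
		}
		scores[k] = v
	}
	return label, scores, nil
}

// mapAnalysis converts one analyze JSON document into a single segment.
func mapAnalysis(data []byte) (segment, error) {
	var doc analysisDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return segment{}, err
	}
	seg := segment{Age: doc.Age}
	var err error
	if seg.DominantGender, seg.Gender, err = splitLabelScores(doc.Gender); err != nil {
		return segment{}, err
	}
	if seg.DominantEmotion, seg.Emotion, err = splitLabelScores(doc.Emotion); err != nil {
		return segment{}, err
	}
	return seg, nil
}

func main() {
	in := []byte(`{"age":31.5,"gender":{"label":"female","female":0.91,"male":0.09},` +
		`"emotion":{"label":"neutral","neutral":0.7,"happy":0.3}}`)
	seg, err := mapAnalysis(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(seg.DominantGender, seg.Gender["female"], seg.DominantEmotion)
}
```

Decoding each score through `json.RawMessage` keeps the "label" string apart from the numeric entries without assuming a fixed set of classes.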
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(voice-detect): wire backend into index, gallery and build

Register the voice-detect.cpp speaker-recognition + voice-analysis backend (added in Voice-INT-A) into LocalAI's distribution surfaces, mirroring the ced backend (the closest mudler C++/ggml audio analogue):
- backend/index.yaml: add the &voicedetect meta-backend (capabilities platform map, no top-level uri) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the -development variants). Referential integrity audited - every alias target resolves.
- gallery/index.yaml: add 5 model entries on backend voice-detect - ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the wav2vec2 age/gender/emotion analyze model. The engine architecture is read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are not yet published: each files: entry points at the intended mudler/voice-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes).
- .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring ced.
- .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via VOICEDETECT_VERSION (pin 47546430, = 4754643).
- core/config/backend_capabilities.go: register voice-detect in the backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze -> speaker_recognition), mirroring speaker-recognition.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): add purego Go backend for face-detect.cpp

Add the LocalAI Go backend that dlopens libfacedetect.so (the flat facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect backend. Implements the Face subset of the Backend gRPC service:
- Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path -> L2-normalized ArcFace embedding.
- Detect(DetectOptions): src -> detect_path_json -> Detection boxes (class_name "face", [x1,y1,x2,y2] -> x/y/w/h).
- FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof -> verify_paths; best-effort img areas via detect.
- FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face age + gender ("M"/"F" normalized to "Man"/"Woman").

The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib with ggml + vendored libjpeg-turbo static (PIC), so the .so is ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_ symbols). Gated Ginkgo e2e mirrors voice-detect. Note for the gallery-wiring task: backend registration (index.yaml, gallery, core/config/backend_capabilities.go) is intentionally not touched here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(voice-detect): replace em dashes in net-new descriptions

Project style forbids em/en dashes. Replace the three U+2014 chars introduced by the voice-detect gallery/index wiring with `-`/`:`.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): wire backend into index, gallery and build

Register the face-detect.cpp face detection / embedding / verification / analysis backend (added in Face-INT-A) into LocalAI's distribution surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml recognition analogue):
- backend/index.yaml: add the &facedetect meta-backend (capabilities platform map, no top-level uri to avoid the meta-backend gotcha) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants), 22 entries. Referential integrity audited: every alias target resolves.
- gallery/index.yaml: add 4 model entries on backend face-detect - face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL) and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the commercial-friendly alternative). The detector/embedder architecture is read from GGUF metadata (facedetect.arch) at load; only the real verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF artifacts are not yet published: each files: entry points at the intended mudler/face-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes).
- core/config/backend_capabilities.go: register face-detect in the backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze -> face_recognition), mirroring insightface.
- .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring voice-detect.
- .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via FACEDETECT_VERSION (pin 636a1963).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(recon): voice-detect metal build branch + face-detect gallery usecases

Add the missing metal BUILD_TYPE branch to the voice-detect Makefile forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the darwin metal CI artifact is built with the Metal backend instead of CPU-only. Expand the 4 face-detect gallery models' known_usecases to [face_recognition, detection, embeddings] to match the backend capabilities map and the mirrored insightface-buffalo entries, so auto-selection for /v1/detect and /embeddings works.
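The face-detect backend commit above lists two small result mappings. Detect converts [x1,y1,x2,y2] corner boxes into x/y/width/height, and FaceAnalyze normalizes "M"/"F" to "Man"/"Woman". Both can be sketched in a few lines of Go. The `box` type stands in for the proto Detection geometry and is hypothetical. The helper names are illustrative, and passing unknown gender codes through unchanged is an assumption.

```go
package main

import "fmt"

// box is a HYPOTHETICAL stand-in for the Detection geometry: top-left
// corner plus width and height, in pixels.
type box struct {
	X, Y, Width, Height float32
}

// cornersToBox converts an [x1, y1, x2, y2] corner box into x/y/w/h.
func cornersToBox(c [4]float32) box {
	return box{X: c[0], Y: c[1], Width: c[2] - c[0], Height: c[3] - c[1]}
}

// normalizeGender maps the engine's "M"/"F" codes to "Man"/"Woman";
// any other value is passed through unchanged (assumed behavior).
func normalizeGender(g string) string {
	switch g {
	case "M":
		return "Man"
	case "F":
		return "Woman"
	default:
		return g
	}
}

func main() {
	b := cornersToBox([4]float32{10, 20, 110, 170})
	fmt.Printf("%+v %s\n", b, normalizeGender("F"))
}
```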
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(recon): document voice-detect and face-detect ggml backends

Document the new standalone C++/ggml biometric backends as the recommended/default option for face and voice recognition, keeping the existing Python insightface / speaker-recognition backends framed as the legacy path.
- features/face-recognition.md: add a face-detect (ggml) backend section with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface Apache-2.0), licensing, and verify/detect/analyze quickstart.
- features/voice-recognition.md: add a voice-detect (ggml) backend section with the gallery entries (ecapa-tdnn, wespeaker-resnet34, eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial analyze head) and quickstart.
- reference/compatibility-table.md: add face-detect.cpp and voice-detect.cpp rows to the Vision, Detection & Recognition table.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(gallery): publish recon backend GGUF uris + sha256

Fill in the published HuggingFace GGUF uris and verified sha256 for the 9 recon gallery entries (voice-detect-* and face-detect-*), and remove the TODO publish markers. Correct the eres2net, campplus, and emotion-wav2vec2 uris to the actual published filenames.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model

Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now embedded and re-uploaded under the same filenames/uris) and note the FaceVerify anti_spoof request flag in each description. Add a new voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion model.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): add face-detect-buffalo-sc and antelopev2 packs

Add gallery entries for two newly-published insightface face packs on the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k R100, 512-d). Both are non-commercial research-only.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): honor LocalAI per-model threads in voice/face-detect backends

LocalAI spawns one backend process per model and serves requests concurrently, so the engines' own min(hardware_concurrency, 8) default can oversubscribe cores. Forward the per-model Threads value from the gRPC LoadModel options into the engine via VOICEDETECT_THREADS / FACEDETECT_THREADS (read at backend construction) before the capi load. A non-positive Threads is treated as unset, leaving the engine default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to CPU-optimized engine commits

voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads). Brings the CPU optimizations into the LocalAI backend builds. GGUF format and parity unchanged, so the published HF GGUFs remain valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-2 CPU-optimized engines

voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context, wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp -> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-3 Winograd engines

voice-detect.cpp -> 45122ec (Winograd F(2x2,3x3) for WeSpeaker/ERes2Net 3x3 convs, -22%/-20% @8t); face-detect.cpp -> cd5c962 (Winograd F(4x4,3x3) for SCRFD large maps, -22% @1t on top of F(2x2), more load-stable). Parity held (cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-4 Winograd engines (CPU opt complete)

voice-detect.cpp -> d2839ca (CAM++ FCM 2D convs through Winograd, -15.5%/-10.3%); face-detect.cpp -> c1db23d (AVX2-vectorized Winograd tile transforms, SCRFD detect -14%/-9.6%). Final CPU optimization round; the conv-kernel lever class is now exhausted (parity held cosine=1.0; GGUF/parity unchanged, HF GGUFs valid).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump face-detect pin to deep-kernel engine (7ae5c4d)

face-detect.cpp -> 7ae5c4d: register-blocked winograd-domain GEMM microkernel (2.8x isolated GFLOP/s), AVX-512 zmm evolution behind runtime CPUID dispatch (ship-safe, AVX2 fallback bit-identical), bias/relu fused into the winograd output transform, and SFace Conv+BN fold + bias/PReLU fusion. SCRFD detect ~1.4x faster end-to-end vs the round-4 baseline; parity bit-exact; portable single binary (function-multiversioned, no global -mavx512f).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ECAPA operand-order win (e9c56ae)

voice-detect.cpp -> e9c56ae: weight-as-src0 mul_mat order in ECAPA's F32 conv1d_same (routes through tinyBLAS sgemm); ECAPA embed 1.67x @1t / ~1.3x @8t, parity cosine=1.0.
Isolated to encoder.cpp (ECAPA-only); ERes2Net/CAM++/WeSpeaker do not call conv1d_same so are provably unaffected.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to FMA-throughput engines (voice f7b9f89, face 2d2d5f0)

face -> 2d2d5f0: route ArcFace 3x3 body convs through the AVX-512 winograd microkernel (kWinoMinSize 80->14); ArcFace 1.62x @1t, SCRFD detect to 0.966 of MLAS @1t, no regression. voice -> f7b9f89: runtime-CPUID-dispatched AVX-512 winograd-GEMM microkernel (ship-safe, AVX2 fallback bit-identical); WeSpeaker 1.90x @1t. Parity cosine=1.0 throughout; portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to MLAS-class direct-conv engines (voice 7ecfd07, face be22d67)

Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s, within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback, fused bias/relu. voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%) + ERes2Net, CAM++ pinned to Winograd. face be22d67: shape-gated to the ArcFace recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression). Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice pin to Phase-A blocked backbone (f4e7eef)

WeSpeaker ResNet34 runs as one nChw16c blocked island (2 reorders/forward vs ~60) on AVX-512, default; per-conv directconv fallback on AVX2. +2.9% @1t / +17-19% @8t vs per-conv directconv, parity cosine=1.0. The conv microkernel is already FMA-bound near peak (~0.86-0.98x MLAS-implied); residual to MLAS is sub-peak edge + non-conv tail, documented in docs/cpu-optimization.md.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to breadth blocked-backbone (voice 7f66871, face d80092b)

voice 7f66871: AVX2-vectorized (ymm) blocked island - AVX2-only hosts now run the blocked backbone for WeSpeaker (2.3x over per-conv-AVX2, cosine=1.0); ERes2Net stays per-conv (blocked regresses, opt-in only); CAM++ Winograd-pinned. face d80092b: ArcFace recognizer blocked island, AVX-512 default (-13% @8t, ~0.90x MLAS, the closest conv result), auto per-conv on AVX2; SCRFD untouched on Winograd (0 island invocations during detect). Parity cosine=1.0 / detect <=1px throughout.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to small-spatial + stem conv kernels (voice 99b1804, face 47fdab6)

Measured-gap-driven conv kernels: small-spatial (fill the register tile when output width <= tile width) + small-IC stem + strided-1x1/downsample recovery. ArcFace recognizer 0.57 -> 0.70x MLAS @1t (the closest conv model), WeSpeaker 0.65 -> 0.79x @1t. Parity cosine=1.0 / detect <=1px. The OC-block-sharing lever was a measured dead-end (deep stride-1 is L3-weight-bandwidth bound, not read-port bound) and was NOT shipped. Kernel ceiling reached; further gap needs an algorithm-class change (cache-blocked weight-stationary GEMM, or q8 weights).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to GPU persistent-graph + multi-model-safe cache (voice 45d2e6b, face 0a4799a)

GPU wins (CUDA/ggml backend, no CPU-path change): persistent per-shape graph+context cache in Backend::compute() eliminates the per-call cudaGraph re-instantiation churn -> wav2vec2 emotion+age-gender now AT GPU parity with torch-cuDNN on GB10 (0.97-0.98x), CAM++ -5.7ms; bit-identical parity.
Cache hardened multi-model-safe (invalidate-on-free keyed by the ModelLoader weights buffer) so LocalAI multi-model hosting cannot stale-hit. Conv models still trail cuDNN (im2col-materialization-bound) - cuDNN implicit-GEMM lever next.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to cuDNN-conv-capable engines (voice b6e4356, face 6107a24)

Adds the opt-in cuDNN implicit-GEMM conv path (VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN, DEFAULT OFF -> zero build/runtime dep until enabled). On GPU it kills the im2col-materialization bottleneck and reaches torch-cuDNN parity on the spill-bound convs: SCRFD detect 14.8->6.4ms (2.3x, ~parity), WeSpeaker ~parity, ERes2Net beats torch (1.10x); ArcFace/CAM++ neutral (no spill). Parity exact (SCRFD <=1px, cosine=1.0). To USE it in LocalAI, the CUDA backend build must enable the flag AND bundle libcudnn - deferred until a cuDNN-bundled GPU image; flag stays OFF here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): enable cuDNN conv path on arm64+CUDA13 recon backends

The voice-detect.cpp / face-detect.cpp engines have an opt-in cuDNN implicit-GEMM conv path behind VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN (default OFF) that kills im2col on the GPU and reaches torch-cuDNN parity (SCRFD 2.3x, WeSpeaker/ERes2Net parity), measured on the GB10 (arm64, CUDA 13, sm_121a). Enable it for the CUDA build, but only where cuDNN actually ships: the arm64 + CUDA 13 image (GB10/Jetson/L4T). x86 CUDA images carry no cuDNN, so flipping it on globally for BUILD_TYPE=cublas would be a link failure. The Makefiles gate on CUDA_MAJOR_VERSION=13 + arch (TARGETARCH from the matrix/Docker build, uname -m fallback for local builds).
backend/Dockerfile.golang already installs the runtime libcudnn9-cuda-13 in the arm64+CUDA13 apt block; add the matching libcudnn9-dev-cuda-13 so the build-time link resolves.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ERes2Net blocked-default (30beecd)

Defaults VD_ERES2NET_BLOCKED ON: routes the ERes2Net Res2Net body through the blocked nChw16c AVX-512 directconv island instead of the 1x1 mul_mat fast path (CONT-transpose + skinny low-K GEMM). On the shipped GGML_NATIVE=OFF build (ggml mul_mat is AVX2-only) this wins ~2x at every thread count (2.07x@1t, 2.2x@4t, 2.05x@8t); pure-AVX2 fallback still 1.3-1.62x. Parity exact (cosine=1.000000 vs golden), so registered voices + verify/identify thresholds are unaffected. The prior default-OFF rested on a stale comment whose 23pct regression only held on the non-shipping GGML_NATIVE=ON build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(readme): announce native voice-detect + face-detect backends in Latest News

Add a Latest News entry for the new from-scratch C++/ggml biometric backends (voice-detect.cpp + face-detect.cpp) that replace the Python insightface and speaker-recognition backends: no Python/onnxruntime at inference, self-contained GGUF, bit-exact parity, GPU cuDNN parity. Mirrors the parakeet.cpp / locate-anything.cpp native-backend news entries. Refs PR #10441.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): re-pin to the squashed engine release commits

The voice-detect.cpp and face-detect.cpp histories were squashed to a single release commit, which orphaned the previous pins (voice 30beecd, face 6107a24). Re-pin to the new single-commit SHAs (voice 3d51077, face 06914b0); the tree is identical, so the backend build is unchanged.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
@@ -137,7 +137,7 @@ RUN <<EOT bash
 libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
 if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
 apt-get install -y --no-install-recommends \
-libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} libcudnn9-dev-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
 fi
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*
18
backend/go/face-detect/.gitignore
vendored
Normal file
@@ -0,0 +1,18 @@
# Fetched upstream sources
sources/

# CMake build directories
build*/

# build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in face-detect.cpp upstream.
*.so
*.so.*
facedetect_capi.h
compile_commands.json

# Compiled backend binary
face-detect-grpc

# Packaging output
package/
110
backend/go/face-detect/Makefile
Normal file
@@ -0,0 +1,110 @@
# face-detect backend Makefile.
#
# Upstream pin lives below as FACEDETECT_VERSION?=06914b0... (.github/bump_deps.sh
# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
# convention).
#
# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
# symlink the .so + header into this directory and skip the clone/cmake steps:
#
# ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
# ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
# go build -o face-detect-grpc .
#
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.

FACEDETECT_VERSION?=06914b077d52f90d5421299138e7be6bdd06b5e8
FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp

GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

BUILD_TYPE?=
NATIVE?=false

# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))

# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
# side, so only the facedetect_capi_* surface is exported.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON

ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif

# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
# Opt-in cuDNN implicit-GEMM conv path (kills im2col on GPU, SCRFD 2.3x
# vs torch-cuDNN parity). Only the arm64 + CUDA 13 image (GB10/Jetson/L4T)
# ships libcudnn9 + the -dev headers, so gate cuDNN to that variant.
# x86 CUDA images carry no cuDNN -> enabling it there is a link failure.
ifeq ($(CUDA_MAJOR_VERSION),13)
ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
CMAKE_ARGS+=-DFACEDETECT_GGML_CUDNN=ON
endif
endif
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
endif

.PHONY: face-detect-grpc package build clean purge test all

all: face-detect-grpc

# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
# bump, run 'make purge && make' to refetch.
sources/face-detect.cpp:
	mkdir -p sources/face-detect.cpp
	cd sources/face-detect.cpp && \
		git init -q && \
		git remote add origin $(FACEDETECT_REPO) && \
		git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
		git checkout FETCH_HEAD && \
		git submodule update --init --recursive --depth 1 --single-branch

# Build the shared lib + header out-of-tree, then stage them next to the Go
# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
# them up.
libfacedetect.so: sources/face-detect.cpp
	cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
	cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
	cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
	cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./

face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .

package: face-detect-grpc
	bash package.sh

build: package

# Test target. The embed/detect/verify/analyze smoke specs are gated on
# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
# heavy specs auto-skip and only the pure-Go parsing specs run.
test:
	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1

clean: purge
	rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc

purge:
	rm -rf sources/face-detect.cpp
431
backend/go/face-detect/gofacedetect.go
Normal file
@@ -0,0 +1,431 @@
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"math"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
	"unsafe"

	"github.com/mudler/LocalAI/pkg/grpc/base"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/xlog"
)

// purego-bound entry points from libfacedetect.so. Names match
// facedetect_capi.h exactly so a `nm libfacedetect.so | grep facedetect_capi`
// is enough to spot drift.
//
// The opaque ctx and the malloc'd char*/float* return values are declared as
// uintptr so we get the raw pointer back and can release it via the matching
// capi free function. purego's native string/[]float32 returns would copy and
// forget the original pointer, leaking the C-owned buffer on every call.
var (
	CppAbiVersion  func() int32
	CppLoad        func(ggufPath string) uintptr
	CppFree        func(ctx uintptr)
	CppLastError   func(ctx uintptr) string
	CppFreeString  func(s uintptr)
	CppFreeVec     func(v uintptr)
	CppEmbedPath   func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
	CppEmbedRGB    func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
	CppDetectJSON  func(ctx uintptr, imagePath string) uintptr
	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
	CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
)

// FaceDetect implements the face-recognition (biometric) subset of the Backend
// gRPC service over libfacedetect.so. The C side keeps a single loaded model
// pack plus a per-ctx last-error buffer and is not reentrant, so
// base.SingleThread serializes every call.
type FaceDetect struct {
	base.SingleThread
	opts   loadOptions
	ctxPtr uintptr
}

func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
	model := opts.ModelFile
	if model == "" {
		model = opts.ModelPath
	}
	if !filepath.IsAbs(model) && opts.ModelPath != "" {
		model = filepath.Join(opts.ModelPath, model)
	}
	if model == "" {
		return errors.New("face-detect: ModelFile is required")
	}

	f.opts = parseOptions(opts.Options)
	if f.opts.modelName == "" {
		f.opts.modelName = filepath.Base(model)
	}

	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
	// one backend process per model and serves requests concurrently, so the
	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
	// FACEDETECT_THREADS is read by the engine at backend construction, so it
	// must be set before the capi load. A non-positive Threads means "unset":
	// leave the env alone so the engine keeps its sane default.
	threads := opts.Threads
	if threads > 0 {
		if err := os.Setenv("FACEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
			return fmt.Errorf("face-detect: set FACEDETECT_THREADS: %w", err)
		}
		xlog.Info("face-detect: applying LocalAI thread budget", "threads", threads)
	}

	xlog.Info("face-detect: loading model", "model", model,
		"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())

	ctx := CppLoad(model)
	if ctx == 0 {
		// The last-error buffer lives on the ctx that was never returned, so
		// surface the path the operator tried to load instead.
		return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
	}
	f.ctxPtr = ctx
	return nil
}

// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
// the supplied image. Mirroring the Python face backend, the image is read from
// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
// server wraps the returned slice in an EmbeddingResult.
func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
	if f.ctxPtr == 0 {
		return nil, errors.New("face-detect: model not loaded")
	}
	if len(req.Images) == 0 || req.Images[0] == "" {
		return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
	}

	path, cleanup, err := materializeImage(req.Images[0])
	if err != nil {
		return nil, err
	}
	defer cleanup()

	return f.embedPath(path)
}

func (f *FaceDetect) embedPath(path string) ([]float32, error) {
	var vec uintptr
	var dim int32
	rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
	if rc != 0 || vec == 0 || dim <= 0 {
		return nil, f.lastErr("embed", path)
	}
	defer CppFreeVec(vec)
	// Copy out of the C-owned malloc'd buffer before freeing it. The
	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
	// nor moves this buffer and we copy immediately.
	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
	out := make([]float32, int(dim))
	copy(out, src)
	return out, nil
}

// Detect runs SCRFD over the image and returns one Detection per face. The
// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
// width/height, so the corners are converted. The 5 facial landmarks the engine
// also returns are dropped: the Detection message has no field for them.
func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
	if f.ctxPtr == 0 {
		return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
	}
	if req.Src == "" {
		return pb.DetectResponse{}, errors.New("face-detect: src image is required")
|
||||
}
|
||||
|
||||
path, cleanup, err := materializeImage(req.Src)
|
||||
if err != nil {
|
||||
return pb.DetectResponse{}, err
|
||||
}
|
||||
defer cleanup()
|
||||
|
||||
faces, err := f.detectFaces(path)
|
||||
if err != nil {
|
||||
return pb.DetectResponse{}, err
|
||||
}
|
||||
|
||||
dets := make([]*pb.Detection, 0, len(faces))
|
||||
for _, fc := range faces {
|
||||
if req.Threshold > 0 && fc.Score < req.Threshold {
|
||||
continue
|
||||
}
|
||||
x, y, w, h := fc.xywh()
|
||||
dets = append(dets, &pb.Detection{
|
||||
X: x,
|
||||
Y: y,
|
||||
Width: w,
|
||||
Height: h,
|
||||
Confidence: fc.Score,
|
||||
ClassName: "face",
|
||||
})
|
||||
}
|
||||
return pb.DetectResponse{Detections: dets}, nil
|
||||
}
|
||||
|
||||
// FaceVerify embeds the primary face in each image and reports whether they are
|
||||
// the same identity by cosine distance against a threshold. A request threshold
|
||||
// <= 0 falls back to the model-configured default (verify_threshold option,
|
||||
// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
|
||||
// veto internally (verified forced false on a spoof); the per-image liveness
|
||||
// scores are not exposed by the verify entry point, so img*_is_real /
|
||||
// img*_antispoof_score stay at their zero values.
|
||||
func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
|
||||
if f.ctxPtr == 0 {
|
||||
return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
|
||||
}
|
||||
if req.Img1 == "" || req.Img2 == "" {
|
||||
return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
|
||||
}
|
||||
|
||||
path1, cleanup1, err := materializeImage(req.Img1)
|
||||
if err != nil {
|
||||
return pb.FaceVerifyResponse{}, err
|
||||
}
|
||||
defer cleanup1()
|
||||
path2, cleanup2, err := materializeImage(req.Img2)
|
||||
if err != nil {
|
||||
return pb.FaceVerifyResponse{}, err
|
||||
}
|
||||
defer cleanup2()
|
||||
|
||||
threshold := req.Threshold
|
||||
if threshold <= 0 {
|
||||
threshold = f.opts.verifyThreshold
|
||||
}
|
||||
|
||||
antiSpoof := int32(0)
|
||||
if req.AntiSpoofing {
|
||||
antiSpoof = 1
|
||||
}
|
||||
|
||||
started := time.Now()
|
||||
var distance float32
|
||||
var verified int32
|
||||
rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
|
||||
unsafe.Pointer(&distance), unsafe.Pointer(&verified))
|
||||
if rc != 0 {
|
||||
return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
|
||||
}
|
||||
elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
|
||||
|
||||
// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
|
||||
// matching the Python face backend's reporting.
|
||||
confidence := float32(0)
|
||||
if threshold > 0 {
|
||||
confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
|
||||
}
|
||||
|
||||
return pb.FaceVerifyResponse{
|
||||
Verified: verified != 0,
|
||||
Distance: distance,
|
||||
Threshold: threshold,
|
||||
Confidence: confidence,
|
||||
Model: f.opts.modelName,
|
||||
Img1Area: f.bestArea(path1),
|
||||
Img2Area: f.bestArea(path2),
|
||||
ProcessingTimeMs: elapsedMs,
|
||||
}, nil
|
||||
}
|
||||
|
||||
// FaceAnalyze runs the genderage head on every detected face. The C-API returns
|
||||
// "M"/"F" gender labels and a rounded age; the labels are normalized to the
|
||||
// "Man"/"Woman" values the proto documents.
|
||||
func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
|
||||
if f.ctxPtr == 0 {
|
||||
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
|
||||
}
|
||||
if req.Img == "" {
|
||||
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
|
||||
}
|
||||
|
||||
path, cleanup, err := materializeImage(req.Img)
|
||||
if err != nil {
|
||||
return pb.FaceAnalyzeResponse{}, err
|
||||
}
|
||||
defer cleanup()
|
||||
|
||||
ptr := CppAnalyzeJSON(f.ctxPtr, path)
|
||||
if ptr == 0 {
|
||||
return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
|
||||
}
|
||||
defer CppFreeString(ptr)
|
||||
|
||||
faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
|
||||
if err != nil {
|
||||
return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
|
||||
}
|
||||
return pb.FaceAnalyzeResponse{Faces: faces}, nil
|
||||
}
|
||||
|
||||
// faceBox is one entry of the detect/analyze JSON documents the engine emits.
|
||||
type faceBox struct {
|
||||
Score float32 `json:"score"`
|
||||
Box []float32 `json:"box"`
|
||||
Age float32 `json:"age"`
|
||||
Gender string `json:"gender"`
|
||||
}
|
||||
|
||||
// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
|
||||
// proto carries. A short or missing box yields zeros.
|
||||
func (b faceBox) xywh() (x, y, w, h float32) {
|
||||
if len(b.Box) < 4 {
|
||||
return 0, 0, 0, 0
|
||||
}
|
||||
return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
|
||||
}
|
||||
|
||||
type facesJSON struct {
|
||||
Faces []faceBox `json:"faces"`
|
||||
}
|
||||
|
||||
func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
|
||||
ptr := CppDetectJSON(f.ctxPtr, path)
|
||||
if ptr == 0 {
|
||||
return nil, f.lastErr("detect", path)
|
||||
}
|
||||
defer CppFreeString(ptr)
|
||||
|
||||
var doc facesJSON
|
||||
if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
|
||||
return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
|
||||
}
|
||||
return doc.Faces, nil
|
||||
}
|
||||
|
||||
// bestArea returns the FacialArea of the highest-scoring face in an image, or an
|
||||
// empty area when detection fails or finds nothing. Best-effort: verify already
|
||||
// succeeded, so a missing region must not turn a valid match into an error.
|
||||
func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
|
||||
faces, err := f.detectFaces(path)
|
||||
if err != nil || len(faces) == 0 {
|
||||
return &pb.FacialArea{}
|
||||
}
|
||||
best := faces[0]
|
||||
for _, fc := range faces[1:] {
|
||||
if fc.Score > best.Score {
|
||||
best = fc
|
||||
}
|
||||
}
|
||||
x, y, w, h := best.xywh()
|
||||
return &pb.FacialArea{X: x, Y: y, W: w, H: h}
|
||||
}
|
||||
|
||||
// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
|
||||
// The engine reports gender as "M"/"F"; both the dominant label and the score
|
||||
// map are filled with the "Man"/"Woman" form the proto documents.
|
||||
func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
|
||||
var parsed facesJSON
|
||||
if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
|
||||
for _, fc := range parsed.Faces {
|
||||
x, y, w, h := fc.xywh()
|
||||
fa := &pb.FaceAnalysis{
|
||||
Region: &pb.FacialArea{X: x, Y: y, W: w, H: h},
|
||||
FaceConfidence: fc.Score,
|
||||
Age: fc.Age,
|
||||
}
|
||||
if label := normalizeGender(fc.Gender); label != "" {
|
||||
fa.DominantGender = label
|
||||
fa.Gender = map[string]float32{label: 1.0}
|
||||
}
|
||||
out = append(out, fa)
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
|
||||
// proto documents. Unknown codes pass through unchanged.
|
||||
func normalizeGender(g string) string {
|
||||
switch strings.ToUpper(strings.TrimSpace(g)) {
|
||||
case "M":
|
||||
return "Man"
|
||||
case "F":
|
||||
return "Woman"
|
||||
case "":
|
||||
return ""
|
||||
default:
|
||||
return g
|
||||
}
|
||||
}
|
||||
|
||||
// materializeImage decodes a base64 image payload into a temp file and returns
|
||||
// its path plus a cleanup func. As a convenience for callers that already pass a
|
||||
// filesystem path (e.g. a test fixture), an existing path is used as-is with a
|
||||
// no-op cleanup. data: URI prefixes are stripped before decoding.
|
||||
func materializeImage(src string) (path string, cleanup func(), err error) {
|
||||
noop := func() {}
|
||||
if src == "" {
|
||||
return "", noop, errors.New("face-detect: empty image input")
|
||||
}
|
||||
if _, statErr := os.Stat(src); statErr == nil {
|
||||
return src, noop, nil
|
||||
}
|
||||
|
||||
payload := src
|
||||
if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
|
||||
payload = payload[i+1:]
|
||||
}
|
||||
data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
|
||||
if decErr != nil || len(data) == 0 {
|
||||
return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
|
||||
}
|
||||
|
||||
tmp, createErr := os.CreateTemp("", "face-detect-*.img")
|
||||
if createErr != nil {
|
||||
return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
|
||||
}
|
||||
cleanup = func() { _ = os.Remove(tmp.Name()) }
|
||||
if _, wErr := tmp.Write(data); wErr != nil {
|
||||
_ = tmp.Close()
|
||||
cleanup()
|
||||
return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
|
||||
}
|
||||
if cErr := tmp.Close(); cErr != nil {
|
||||
cleanup()
|
||||
return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
|
||||
}
|
||||
return tmp.Name(), cleanup, nil
|
||||
}
|
||||
|
||||
// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
|
||||
func (f *FaceDetect) lastErr(op, subject string) error {
|
||||
msg := strings.TrimSpace(CppLastError(f.ctxPtr))
|
||||
if msg == "" {
|
||||
msg = "no error detail"
|
||||
}
|
||||
return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
|
||||
}
|
||||
|
||||
// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
|
||||
// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
|
||||
//
|
||||
// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
|
||||
// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
|
||||
// moves the buffer and we dereference it immediately to copy the bytes out.
|
||||
func goStringFromCPtr(cptr uintptr) string {
|
||||
if cptr == 0 {
|
||||
return ""
|
||||
}
|
||||
p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
|
||||
n := 0
|
||||
for *(*byte)(unsafe.Add(p, n)) != 0 {
|
||||
n++
|
||||
}
|
||||
return string(unsafe.Slice((*byte)(p), n))
|
||||
}
|
||||
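FaceVerify reports confidence as a clamped linear ramp: 100 at distance 0, falling to 0 at the threshold. Detect and bestArea convert [x1,y1,x2,y2] corners into x/y/width/height and pick the highest-scoring face. The sketch below reproduces that arithmetic as standalone Go with no C-API or proto types. The names `box`, `verifyConfidence`, `xywhFromCorners` and `pickBest` are hypothetical helpers and are not part of the backend.

```go
package main

import "math"

// box is a hypothetical stand-in for faceBox: a detector score plus
// [x1, y1, x2, y2] corners in pixels.
type box struct {
	Score   float32
	Corners []float32
}

// verifyConfidence maps a cosine distance onto 0..100: 100 at distance 0,
// 0 at (or beyond) the threshold, linear in between. A non-positive
// threshold yields 0, mirroring the guard in FaceVerify.
func verifyConfidence(distance, threshold float32) float32 {
	if threshold <= 0 {
		return 0
	}
	c := (1.0 - float64(distance)/float64(threshold)) * 100.0
	return float32(math.Max(0, math.Min(100, c)))
}

// xywhFromCorners converts [x1, y1, x2, y2] into x, y, width, height.
// Fewer than four values yields zeros, like faceBox.xywh.
func xywhFromCorners(c []float32) (x, y, w, h float32) {
	if len(c) < 4 {
		return 0, 0, 0, 0
	}
	return c[0], c[1], c[2] - c[0], c[3] - c[1]
}

// pickBest returns the highest-scoring box, the same selection bestArea
// makes. ok is false for an empty slice.
func pickBest(boxes []box) (best box, ok bool) {
	if len(boxes) == 0 {
		return box{}, false
	}
	best = boxes[0]
	for _, b := range boxes[1:] {
		if b.Score > best.Score {
			best = b
		}
	}
	return best, true
}
```

With the default 0.35 threshold, a distance of about 0.175 yields a confidence of roughly 50. Any distance at or above 0.35 yields 0.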
230
backend/go/face-detect/gofacedetect_test.go
Normal file
@@ -0,0 +1,230 @@
package main

import (
	"encoding/base64"
	"os"
	"sync"
	"testing"

	"github.com/ebitengine/purego"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestFaceDetect(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "face-detect Backend Suite")
}

var (
	libLoadOnce sync.Once
	libLoadErr  error
)

// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
// bridge without spinning up the gRPC server. When libfacedetect.so cannot be
// loaded (via LD_LIBRARY_PATH or a symlink in ./), it records the error so the
// smoke specs can skip themselves.
func ensureLibLoaded() error {
	libLoadOnce.Do(func() {
		libName := os.Getenv("FACEDETECT_LIBRARY")
		if libName == "" {
			libName = "libfacedetect.so"
		}
		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
		if err != nil {
			libLoadErr = err
			return
		}
		purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
		purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
		purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
		purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
		purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
		purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
		purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
		purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
		purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
		purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
	})
	return libLoadErr
}

var _ = Describe("parseOptions", func() {
	It("defaults verify_threshold to 0.35", func() {
		o := parseOptions(nil)
		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
		Expect(o.modelName).To(Equal(""))
	})

	It("parses verify_threshold, threshold alias and model_name", func() {
		o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
		Expect(o.modelName).To(Equal("buffalo_l"))

		o2 := parseOptions([]string{"threshold:0.3"})
		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
	})

	It("ignores non-positive thresholds and keeps the default", func() {
		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
	})
})

var _ = Describe("normalizeGender", func() {
	It("maps M/F codes to Man/Woman", func() {
		Expect(normalizeGender("M")).To(Equal("Man"))
		Expect(normalizeGender("f")).To(Equal("Woman"))
		Expect(normalizeGender(" m ")).To(Equal("Man"))
	})

	It("passes empty and unknown codes through", func() {
		Expect(normalizeGender("")).To(Equal(""))
		Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
	})
})

var _ = Describe("faceBox.xywh", func() {
	It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
		b := faceBox{Box: []float32{10, 20, 50, 80}}
		x, y, w, h := b.xywh()
		Expect(x).To(Equal(float32(10)))
		Expect(y).To(Equal(float32(20)))
		Expect(w).To(Equal(float32(40)))
		Expect(h).To(Equal(float32(60)))
	})

	It("returns zeros for a short box", func() {
		x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
		Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
	})
})

var _ = Describe("parseAnalyzeJSON", func() {
	It("maps region, age and gender for each face", func() {
		doc := `{"faces":[
			{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
			{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
		faces, err := parseAnalyzeJSON(doc)
		Expect(err).ToNot(HaveOccurred())
		Expect(faces).To(HaveLen(2))

		Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
		Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
		Expect(faces[0].DominantGender).To(Equal("Man"))
		Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
		Expect(faces[0].Region.W).To(Equal(float32(40)))
		Expect(faces[0].Region.H).To(Equal(float32(60)))

		Expect(faces[1].DominantGender).To(Equal("Woman"))
	})

	It("tolerates a missing gender field", func() {
		faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
		Expect(err).ToNot(HaveOccurred())
		Expect(faces).To(HaveLen(1))
		Expect(faces[0].DominantGender).To(Equal(""))
		Expect(faces[0].Gender).To(BeEmpty())
	})

	It("returns no faces for an empty document", func() {
		faces, err := parseAnalyzeJSON(`{"faces":[]}`)
		Expect(err).ToNot(HaveOccurred())
		Expect(faces).To(BeEmpty())
	})

	It("returns an error on malformed JSON", func() {
		_, err := parseAnalyzeJSON(`{not-json`)
		Expect(err).To(HaveOccurred())
	})
})

var _ = Describe("materializeImage", func() {
	It("decodes a base64 payload to a temp file", func() {
		payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
		path, cleanup, err := materializeImage(payload)
		Expect(err).ToNot(HaveOccurred())
		defer cleanup()
		data, rerr := os.ReadFile(path)
		Expect(rerr).ToNot(HaveOccurred())
		Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
	})

	It("strips a data: URI prefix before decoding", func() {
		payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
		path, cleanup, err := materializeImage(payload)
		Expect(err).ToNot(HaveOccurred())
		defer cleanup()
		data, rerr := os.ReadFile(path)
		Expect(rerr).ToNot(HaveOccurred())
		Expect(data).To(Equal([]byte("hello")))
	})

	It("uses an existing path as-is", func() {
		tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
		Expect(err).ToNot(HaveOccurred())
		defer func() { _ = os.Remove(tmp.Name()) }()
		Expect(tmp.Close()).To(Succeed())

		path, cleanup, err := materializeImage(tmp.Name())
		Expect(err).ToNot(HaveOccurred())
		defer cleanup()
		Expect(path).To(Equal(tmp.Name()))
	})

	It("errors on input that is neither a path nor base64", func() {
		_, _, err := materializeImage("not base64!!!")
		Expect(err).To(HaveOccurred())
	})
})

// The specs below exercise the real C-API end to end. They run only when both a
// model GGUF and a test image are provided, and skip cleanly otherwise so the
// suite stays green without large assets.
var _ = Describe("FaceDetect end-to-end", Ordered, func() {
	var (
		f         *FaceDetect
		modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
		imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
	)

	BeforeAll(func() {
		if modelPath == "" || imagePath == "" {
			Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
		}
		if err := ensureLibLoaded(); err != nil {
			Skip("libfacedetect.so not loadable: " + err.Error())
		}
		f = &FaceDetect{}
		Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
	})

	It("embeds the primary face in an image", func() {
		emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
		Expect(err).ToNot(HaveOccurred())
		Expect(emb).ToNot(BeEmpty())
	})

	It("detects at least one face", func() {
		resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Detections).ToNot(BeEmpty())
		Expect(resp.Detections[0].ClassName).To(Equal("face"))
	})

	It("verifies an image against itself as the same identity", func() {
		resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Verified).To(BeTrue())
		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
	})

	It("analyzes age/gender for each face", func() {
		resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Faces).ToNot(BeEmpty())
	})
})
65
backend/go/face-detect/main.go
Normal file
@@ -0,0 +1,65 @@
package main

// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libfacedetect.so via purego and registers the flat C-API entry points
// declared in facedetect_capi.h. The library name can be overridden with
// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
// convention in the sibling backends); the default looks for the .so next to
// this binary (resolved via LD_LIBRARY_PATH by run.sh).
import (
	"flag"
	"fmt"
	"os"

	"github.com/ebitengine/purego"
	grpc "github.com/mudler/LocalAI/pkg/grpc"
)

var (
	addr = flag.String("addr", "localhost:50051", "the address to connect to")
)

type LibFuncs struct {
	FuncPtr any
	Name    string
}

func main() {
	libName := os.Getenv("FACEDETECT_LIBRARY")
	if libName == "" {
		libName = "libfacedetect.so"
	}

	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
	}

	// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
	// uintptr so the raw pointer can be freed via the matching capi free fn.
	libFuncs := []LibFuncs{
		{&CppAbiVersion, "facedetect_capi_abi_version"},
		{&CppLoad, "facedetect_capi_load"},
		{&CppFree, "facedetect_capi_free"},
		{&CppLastError, "facedetect_capi_last_error"},
		{&CppFreeString, "facedetect_capi_free_string"},
		{&CppFreeVec, "facedetect_capi_free_vec"},
		{&CppEmbedPath, "facedetect_capi_embed_path"},
		{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
		{&CppDetectJSON, "facedetect_capi_detect_path_json"},
		{&CppVerifyPaths, "facedetect_capi_verify_paths"},
		{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
	}
	for _, lf := range libFuncs {
		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
	}

	fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())

	flag.Parse()

	if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
		panic(err)
	}
}
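main.go registers the char* and float* returns as uintptr, so Go can pass the raw pointer back to the matching facedetect_capi free function after copying the data out. The ownership rule is to copy first and free exactly once. It can be illustrated without cgo or purego by simulating the C heap with a byte slice. `cBuffer`, `copyAndFree` and the free counter are hypothetical; a real caller works on uintptr values returned by the library instead.

```go
package main

import "bytes"

// cBuffer simulates a malloc'd, NUL-terminated C string. freed counts calls to
// the simulated free function so double frees are easy to spot.
type cBuffer struct {
	mem   []byte
	freed int
}

// free stands in for facedetect_capi_free_string.
func (b *cBuffer) free() { b.freed++ }

// copyAndFree copies the bytes up to the first NUL into Go-owned memory and
// then releases the buffer exactly once. This is the same order the backend
// follows: the goStringFromCPtr copy runs before the deferred CppFreeString.
func copyAndFree(b *cBuffer) string {
	defer b.free()
	n := bytes.IndexByte(b.mem, 0)
	if n < 0 {
		n = len(b.mem) // no terminator: treat the whole buffer as the string
	}
	return string(b.mem[:n]) // string conversion copies, so no aliasing
}
```

Because the returned string is a copy, mutating or freeing the simulated buffer afterwards cannot affect it. The real code relies on the same property.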
47
backend/go/face-detect/options.go
Normal file
@@ -0,0 +1,47 @@
package main

import (
	"strconv"
	"strings"
)

// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
// face backend ships with so the two implementations agree on verdicts out of
// the box.
const defaultVerifyThreshold float32 = 0.35

// loadOptions holds the parsed model-level options for face-detect.
type loadOptions struct {
	verifyThreshold float32
	modelName       string
}

func splitOption(o string) (key, value string, ok bool) {
	i := strings.Index(o, ":")
	if i < 0 {
		return "", "", false
	}
	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}

// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. verify_threshold (alias: threshold) defaults to 0.35, and
// non-positive values are ignored. model_name stays empty when unset; Load then
// falls back to the model file's base name.
func parseOptions(opts []string) loadOptions {
	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
	for _, oo := range opts {
		key, value, ok := splitOption(oo)
		if !ok {
			continue
		}
		switch key {
		case "verify_threshold", "threshold":
			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
				o.verifyThreshold = float32(f)
			}
		case "model_name":
			o.modelName = value
		}
	}
	return o
}
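Two layers decide the threshold a verify call uses. parseOptions turns a positive verify_threshold (or threshold) option into the model default, and FaceVerify uses the request's threshold only when it is positive. The sketch below combines both steps to show the precedence. `resolveThreshold` and `fallbackThreshold` are hypothetical names. `strings.Cut` stands in for splitOption's first-colon split.

```go
package main

import (
	"strconv"
	"strings"
)

// fallbackThreshold mirrors defaultVerifyThreshold.
const fallbackThreshold float32 = 0.35

// resolveThreshold returns the threshold a verify call would use: a positive
// request value wins; otherwise the last positive verify_threshold/threshold
// option; otherwise 0.35.
func resolveThreshold(request float32, opts []string) float32 {
	if request > 0 {
		return request
	}
	t := fallbackThreshold
	for _, o := range opts {
		key, value, found := strings.Cut(o, ":") // split on the first colon only
		if !found {
			continue
		}
		if k := strings.TrimSpace(key); k == "verify_threshold" || k == "threshold" {
			if f, err := strconv.ParseFloat(strings.TrimSpace(value), 32); err == nil && f > 0 {
				t = float32(f)
			}
		}
	}
	return t
}
```

Because each positive option overwrites the previous one, the last positive value wins when both keys are present, just as in parseOptions.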
68
backend/go/face-detect/package.sh
Normal file
@@ -0,0 +1,68 @@
#!/bin/bash
#
# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
# is used instead of the host's.

set -e

CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."

mkdir -p "$CURDIR/package/lib"

cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
# LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
    echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
    exit 1
}

# Detect the architecture and copy the core runtime libs that libfacedetect.so
# links against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    echo "Detected x86_64 architecture, copying x86_64 libraries..."
    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    echo "Detected ARM64 architecture, copying ARM64 libraries..."
    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
    echo "Detected Darwin"
else
    echo "Error: Could not detect architecture"
    exit 1
fi

# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
# BUILD_TYPE so the backend can reach the GPU without the runtime base image
# shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
    package_gpu_libs
fi

echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
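package.sh picks the dynamic loader and the multiarch library directory by probing for the x86_64 or aarch64 loader path. The same table can be expressed in Go, keyed on GOARCH-style names. `layout`, `coreLibs`, `runtimeLayout` and `sources` are hypothetical helpers, and this lookup replaces the script's file probe. The paths are the Debian/Ubuntu multiarch locations the script hard-codes, and other distributions may place these files elsewhere.

```go
package main

import "path"

// layout describes where package.sh copies the loader and core libs from.
type layout struct {
	Loader string // copied to package/lib/ld.so
	LibDir string // source directory for libc, libstdc++, libgomp, ...
}

// coreLibs mirrors the library list package.sh copies on Linux.
var coreLibs = []string{
	"libc.so.6", "libgcc_s.so.1", "libstdc++.so.6", "libm.so.6",
	"libgomp.so.1", "libdl.so.2", "librt.so.1", "libpthread.so.0",
}

// runtimeLayout maps a GOARCH-style name to the paths package.sh uses.
// ok is false for anything else; the script itself exits with an error when
// neither loader exists, except on Darwin, where it copies nothing.
func runtimeLayout(goarch string) (l layout, ok bool) {
	switch goarch {
	case "amd64":
		return layout{"/lib64/ld-linux-x86-64.so.2", "/lib/x86_64-linux-gnu"}, true
	case "arm64":
		return layout{"/lib/ld-linux-aarch64.so.1", "/lib/aarch64-linux-gnu"}, true
	}
	return layout{}, false
}

// sources lists the absolute paths to copy for a layout: the loader first,
// then every core library from the multiarch directory.
func sources(l layout) []string {
	out := []string{l.Loader}
	for _, lib := range coreLibs {
		out = append(out, path.Join(l.LibDir, lib))
	}
	return out
}
```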
16
backend/go/face-detect/run.sh
Normal file
@@ -0,0 +1,16 @@
#!/bin/bash
set -e

CURDIR=$(dirname "$(realpath "$0")")

export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"

# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the voice-detect /
# whisper / parakeet backends' runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
    echo "Using lib/ld.so"
    exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
fi

exec "$CURDIR/face-detect-grpc" "$@"
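run.sh prepends lib/ and the package directory to LD_LIBRARY_PATH. When a packaged lib/ld.so exists, it execs the binary through that loader so the bundled libc is used. The sketch below builds the resulting command line in Go. `libraryPath` and `launchArgs` are hypothetical names, and the loader check is passed in as a bool rather than done with a stat call, so the logic can be tested without a real package directory.

```go
package main

import "path/filepath"

// libraryPath reproduces "$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}".
// With an empty inherited value the result keeps a trailing ':' exactly as
// the shell expansion does.
func libraryPath(dir, inherited string) string {
	return filepath.Join(dir, "lib") + ":" + dir + ":" + inherited
}

// launchArgs returns the argv run.sh would exec: through lib/ld.so when one
// was packaged, otherwise the backend binary directly. Extra args (for
// example the gRPC --addr flag) are forwarded unchanged.
func launchArgs(dir string, haveLoader bool, args ...string) []string {
	bin := filepath.Join(dir, "face-detect-grpc")
	argv := []string{bin}
	if haveLoader {
		argv = []string{filepath.Join(dir, "lib", "ld.so"), bin}
	}
	return append(argv, args...)
}
```

Invoking the loader with the binary as its first argument is what lets the packaged libc replace the host's without patching the binary's interpreter path.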
15
backend/go/face-detect/test.sh
Normal file
@@ -0,0 +1,15 @@
#!/bin/bash
set -e

CURDIR=$(dirname "$(realpath "$0")")
cd "$CURDIR"

echo "Running face-detect backend tests..."

# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
# specs run only when a model + image are provided via
# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
# auto-skip.
LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .

echo "face-detect tests completed."
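test.sh always runs the pure-Go specs. The end-to-end specs skip themselves unless both FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE are set. That gate reduces to a small function. `e2eAssets` is a hypothetical helper that takes a lookup function (os.Getenv in practice), so it can be exercised without touching the real environment.

```go
package main

import "os"

// e2eAssets reports the model and image paths for the end-to-end specs and
// whether both were provided. Either one missing means "skip".
func e2eAssets(getenv func(string) string) (model, image string, ok bool) {
	model = getenv("FACEDETECT_BACKEND_TEST_MODEL")
	image = getenv("FACEDETECT_BACKEND_TEST_IMAGE")
	return model, image, model != "" && image != ""
}

// e2eAssetsFromEnv is the wiring against the real process environment.
func e2eAssetsFromEnv() (string, string, bool) { return e2eAssets(os.Getenv) }
```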
18
backend/go/voice-detect/.gitignore
vendored
Normal file
@@ -0,0 +1,18 @@
# Fetched upstream sources
sources/

# CMake build directories
build*/

# Build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in voice-detect.cpp upstream.
*.so
*.so.*
voicedetect_capi.h
compile_commands.json

# Compiled backend binary
voice-detect-grpc

# Packaging output
package/
107
backend/go/voice-detect/Makefile
Normal file
@@ -0,0 +1,107 @@
# voice-detect backend Makefile.
#
# The upstream pin lives below as VOICEDETECT_VERSION?=3d51077... so that
# .github/bump_deps.sh can find and update it (the same convention as
# parakeet.cpp / whisper.cpp / ds4).
#
# Local dev shortcut: if you already have an out-of-tree voice-detect.cpp build,
# symlink the .so + header into this directory and skip the clone/cmake steps:
#
#   ln -sf /path/to/voice-detect.cpp/build-shared/libvoicedetect.so .
#   ln -sf /path/to/voice-detect.cpp/include/voicedetect_capi.h .
#   go build -o voice-detect-grpc .
#
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.

VOICEDETECT_VERSION?=3d510772357538c5182808ac7de2278b84824e24
VOICEDETECT_REPO?=https://github.com/mudler/voice-detect.cpp

GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)

BUILD_TYPE?=
NATIVE?=false

# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))

# Build ggml statically into libvoicedetect.so (PIC) so the shared lib is
# self-contained: dlopen needs no libggml*.so alongside it, only system libs
# (libstdc++/libgomp/libc) that the runtime image already provides.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DVOICEDETECT_SHARED=ON -DVOICEDETECT_BUILD_CLI=OFF -DVOICEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON

ifeq ($(NATIVE),false)
  CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif

# voice-detect.cpp gates its GGML backends behind VOICEDETECT_GGML_* options and
# does set(GGML_CUDA ${VOICEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
# -DGGML_CUDA=ON is overwritten back to OFF. Forward the VOICEDETECT_GGML_*
# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
  CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDA=ON
  # Opt-in cuDNN implicit-GEMM conv path: it avoids im2col on the GPU and
  # reaches parity with PyTorch's cuDNN path. Only the arm64 + CUDA 13 image
  # (GB10/Jetson/L4T) ships libcudnn9 + the -dev headers, so gate cuDNN to that
  # variant; the x86 CUDA images ship no cuDNN, so enabling it there fails at
  # link time.
  ifeq ($(CUDA_MAJOR_VERSION),13)
    ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
      CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDNN=ON
    endif
  endif
else ifeq ($(BUILD_TYPE),openblas)
  CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
  CMAKE_ARGS+=-DVOICEDETECT_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
  CMAKE_ARGS+=-DVOICEDETECT_GGML_VULKAN=ON
else ifeq ($(BUILD_TYPE),metal)
  CMAKE_ARGS+=-DVOICEDETECT_GGML_METAL=ON
endif

.PHONY: voice-detect-grpc package build clean purge test all

all: voice-detect-grpc

# Clone the upstream voice-detect.cpp source at the pinned commit. The directory
# acts as the target, so make only re-clones when it is missing. After a
# VOICEDETECT_VERSION bump, run 'make purge && make' to refetch.
sources/voice-detect.cpp:
	mkdir -p sources/voice-detect.cpp
	cd sources/voice-detect.cpp && \
		git init -q && \
		git remote add origin $(VOICEDETECT_REPO) && \
		git fetch --depth 1 origin $(VOICEDETECT_VERSION) && \
		git checkout FETCH_HEAD && \
		git submodule update --init --recursive --depth 1 --single-branch

# Build the shared lib + header out-of-tree, then stage them next to the Go
# sources so purego.Dlopen("libvoicedetect.so") and the cgo-less build both pick
# them up.
libvoicedetect.so: sources/voice-detect.cpp
	cmake -B sources/voice-detect.cpp/build-shared -S sources/voice-detect.cpp $(CMAKE_ARGS)
	cmake --build sources/voice-detect.cpp/build-shared --config Release -j$(JOBS) --target voicedetect
	cp -fv sources/voice-detect.cpp/build-shared/libvoicedetect.so* ./ 2>/dev/null || true
	cp -fv sources/voice-detect.cpp/include/voicedetect_capi.h ./

voice-detect-grpc: libvoicedetect.so main.go govoicedetect.go options.go
	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voice-detect-grpc .

package: voice-detect-grpc
	bash package.sh

build: package

# Test target. The embed/verify smoke specs are gated on
# VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV; without them the
# heavy specs auto-skip and only the pure-Go parsing specs run.
test:
	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1

clean: purge
	rm -rf libvoicedetect.so* voicedetect_capi.h package voice-detect-grpc

purge:
	rm -rf sources/voice-detect.cpp
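The `BUILD_TYPE` branches in the Makefile above boil down to a small lookup. The sketch below restates that selection in Go so the flag set for a given build can be checked at a glance. `cmakeArgs` is a hypothetical helper, not part of the backend; it assumes the default `CMAKE_ARGS` (the Makefile's `?=` lets a caller replace the base flags entirely, which the sketch does not model), and the `native` bool stands in for `NATIVE=false` versus any other value.

```go
package main

import (
	"fmt"
	"strings"
)

// cmakeArgs is a hypothetical Go restatement of the Makefile's flag selection.
func cmakeArgs(buildType, cudaMajor, arch string, native bool) []string {
	// Default base flags: static, PIC ggml inside a shared libvoicedetect.
	args := []string{
		"-DCMAKE_BUILD_TYPE=Release",
		"-DVOICEDETECT_SHARED=ON",
		"-DVOICEDETECT_BUILD_CLI=OFF",
		"-DVOICEDETECT_BUILD_TESTS=OFF",
		"-DBUILD_SHARED_LIBS=OFF",
		"-DCMAKE_POSITION_INDEPENDENT_CODE=ON",
	}
	if !native {
		args = append(args, "-DGGML_NATIVE=OFF")
	}
	switch buildType {
	case "cublas":
		// Gated option: a bare -DGGML_CUDA=ON would be forced back to OFF.
		args = append(args, "-DVOICEDETECT_GGML_CUDA=ON")
		// cuDNN only for the arm64 + CUDA 13 image, which ships libcudnn9.
		if cudaMajor == "13" && (arch == "arm64" || arch == "aarch64") {
			args = append(args, "-DVOICEDETECT_GGML_CUDNN=ON")
		}
	case "openblas":
		// Not gated by voice-detect.cpp, so the plain GGML flags pass through.
		args = append(args, "-DGGML_BLAS=ON", "-DGGML_BLAS_VENDOR=OpenBLAS")
	case "hipblas":
		args = append(args, "-DVOICEDETECT_GGML_HIP=ON")
	case "vulkan":
		args = append(args, "-DVOICEDETECT_GGML_VULKAN=ON")
	case "metal":
		args = append(args, "-DVOICEDETECT_GGML_METAL=ON")
	}
	return args
}

func main() {
	fmt.Println(strings.Join(cmakeArgs("cublas", "13", "arm64", false), " "))
}
```

For example, `cmakeArgs("cublas", "13", "amd64", false)` gets the CUDA flag but no cuDNN flag, because only the arm64 CUDA 13 image ships cuDNN.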
273
backend/go/voice-detect/govoicedetect.go
Normal file
@@ -0,0 +1,273 @@
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
	"unsafe"

	"github.com/mudler/LocalAI/pkg/grpc/base"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/xlog"
)

// purego-bound entry points from libvoicedetect.so. Names match
// voicedetect_capi.h exactly so a `nm libvoicedetect.so | grep voicedetect_capi`
// is enough to spot drift.
//
// The opaque ctx and the malloc'd char*/float* return values are declared as
// uintptr so we get the raw pointer back and can release it via the matching
// capi free function. purego's native string/[]float32 returns would copy and
// forget the original pointer, leaking the C-owned buffer on every call.
var (
	CppAbiVersion  func() int32
	CppLoad        func(ggufPath string) uintptr
	CppFree        func(ctx uintptr)
	CppLastError   func(ctx uintptr) string
	CppFreeString  func(s uintptr)
	CppFreeVec     func(v uintptr)
	CppEmbedPath   func(ctx uintptr, wavPath string, outVec, outDim unsafe.Pointer) int32
	CppEmbedPCM    func(ctx uintptr, pcm []float32, nSamples, sampleRate int32, outVec, outDim unsafe.Pointer) int32
	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, outDistance, outVerified unsafe.Pointer) int32
	CppAnalyzeJSON func(ctx uintptr, wavPath string) uintptr
)

// VoiceDetect implements the speaker-recognition voice subset of the Backend
// gRPC service over libvoicedetect.so. The C side keeps a single loaded model
// plus a per-ctx last-error buffer and is not reentrant, so base.SingleThread
// serializes every call.
type VoiceDetect struct {
	base.SingleThread
	opts   loadOptions
	ctxPtr uintptr
}

func (v *VoiceDetect) Load(opts *pb.ModelOptions) error {
	// Prefer ModelFile, resolving a relative name against ModelPath; fall back
	// to ModelPath itself only when no file name was given.
	model := opts.ModelFile
	if model == "" {
		model = opts.ModelPath
	} else if !filepath.IsAbs(model) && opts.ModelPath != "" {
		model = filepath.Join(opts.ModelPath, model)
	}
	if model == "" {
		return errors.New("voice-detect: ModelFile is required")
	}

	v.opts = parseOptions(opts.Options)
	if v.opts.modelName == "" {
		v.opts.modelName = filepath.Base(model)
	}

	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
	// one backend process per model and serves requests concurrently, so the
	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
	// VOICEDETECT_THREADS is read by the engine at backend construction, so it
	// must be set before the capi load. A non-positive Threads means "unset":
	// leave the env alone so the engine keeps its own default.
	threads := opts.Threads
	if threads > 0 {
		if err := os.Setenv("VOICEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
			return fmt.Errorf("voice-detect: set VOICEDETECT_THREADS: %w", err)
		}
		xlog.Info("voice-detect: applying LocalAI thread budget", "threads", threads)
	}

	xlog.Info("voice-detect: loading model", "model", model,
		"verify_threshold", v.opts.verifyThreshold, "abi", CppAbiVersion())

	ctx := CppLoad(model)
	if ctx == 0 {
		// The last-error buffer lives on the ctx that was never returned, so
		// surface the path the operator tried to load instead.
		return fmt.Errorf("voice-detect: voicedetect_capi_load failed for %q", model)
	}
	v.ctxPtr = ctx
	return nil
}

// VoiceEmbed returns the L2-normalized speaker embedding for an audio clip.
// The request carries a filesystem PATH; the HTTP layer materializes
// base64/URL/data-URI inputs to a temp file before the gRPC call.
func (v *VoiceDetect) VoiceEmbed(req *pb.VoiceEmbedRequest) (pb.VoiceEmbedResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio == "" {
		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: audio path is required")
	}
	emb, err := v.embedPath(req.Audio)
	if err != nil {
		return pb.VoiceEmbedResponse{}, err
	}
	return pb.VoiceEmbedResponse{Embedding: emb, Model: v.opts.modelName}, nil
}

func (v *VoiceDetect) embedPath(path string) ([]float32, error) {
	var vec uintptr
	var dim int32
	rc := CppEmbedPath(v.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
	if rc != 0 || vec == 0 || dim <= 0 {
		return nil, v.lastErr("embed", path)
	}
	defer CppFreeVec(vec)
	// Copy out of the C-owned malloc'd buffer before freeing it. The
	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
	// nor moves this buffer and we copy immediately.
	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
	out := make([]float32, int(dim))
	copy(out, src)
	return out, nil
}

// VoiceVerify embeds two clips and reports whether they are the same speaker by
// cosine distance against a threshold. A request threshold <= 0 falls back to
// the model-configured default (verify_threshold option, 0.25 if unset).
func (v *VoiceDetect) VoiceVerify(req *pb.VoiceVerifyRequest) (pb.VoiceVerifyResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio1 == "" || req.Audio2 == "" {
		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: audio1 and audio2 are required")
	}

	threshold := req.Threshold
	if threshold <= 0 {
		threshold = v.opts.verifyThreshold
	}

	started := time.Now()
	var distance float32
	var verified int32
	rc := CppVerifyPaths(v.ctxPtr, req.Audio1, req.Audio2, threshold,
		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
	if rc != 0 {
		return pb.VoiceVerifyResponse{}, v.lastErr("verify", req.Audio1+","+req.Audio2)
	}
	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)

	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
	// matching the Python speaker-recognition backend's reporting.
	confidence := float32(0)
	if threshold > 0 {
		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
	}

	return pb.VoiceVerifyResponse{
		Verified:         verified != 0,
		Distance:         distance,
		Threshold:        threshold,
		Confidence:       confidence,
		Model:            v.opts.modelName,
		ProcessingTimeMs: elapsedMs,
	}, nil
}

// VoiceAnalyze runs the age/gender/emotion heads on a single clip. The C-API
// always evaluates every supported head, so the request's actions filter is
// ignored and the full analysis is returned as a single segment (the engine
// does not produce time-bounded segments).
func (v *VoiceDetect) VoiceAnalyze(req *pb.VoiceAnalyzeRequest) (pb.VoiceAnalyzeResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio == "" {
		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: audio path is required")
	}

	ptr := CppAnalyzeJSON(v.ctxPtr, req.Audio)
	if ptr == 0 {
		return pb.VoiceAnalyzeResponse{}, v.lastErr("analyze", req.Audio)
	}
	defer CppFreeString(ptr)

	seg, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
	if err != nil {
		return pb.VoiceAnalyzeResponse{}, fmt.Errorf("voice-detect: analyze JSON for %q: %w", req.Audio, err)
	}
	return pb.VoiceAnalyzeResponse{Segments: []*pb.VoiceAnalysis{seg}}, nil
}

// analyzeJSON mirrors the document returned by voicedetect_capi_analyze_path_json:
//
//	{"age":42.0,
//	 "gender":{"label":"female","female":0.88,"male":0.12},
//	 "emotion":{"label":"neutral","scores":{"neutral":0.7, ...}}}
//
// gender is a mixed object (a "label" string plus per-class float scores), so
// it is decoded into raw messages and split in parseAnalyzeJSON.
type analyzeJSON struct {
	Age     float32                    `json:"age"`
	Gender  map[string]json.RawMessage `json:"gender"`
	Emotion struct {
		Label  string             `json:"label"`
		Scores map[string]float32 `json:"scores"`
	} `json:"emotion"`
}

// parseAnalyzeJSON maps the engine's analyze document onto a VoiceAnalysis.
// start/end stay 0: the model emits a single whole-utterance result, not
// time-bounded segments.
func parseAnalyzeJSON(doc string) (*pb.VoiceAnalysis, error) {
	var a analyzeJSON
	if err := json.Unmarshal([]byte(doc), &a); err != nil {
		return nil, err
	}

	seg := &pb.VoiceAnalysis{
		Age:             a.Age,
		DominantEmotion: a.Emotion.Label,
		Emotion:         a.Emotion.Scores,
	}

	if len(a.Gender) > 0 {
		gender := make(map[string]float32, len(a.Gender))
		for k, raw := range a.Gender {
			if k == "label" {
				_ = json.Unmarshal(raw, &seg.DominantGender)
				continue
			}
			var score float32
			if err := json.Unmarshal(raw, &score); err == nil {
				gender[k] = score
			}
		}
		seg.Gender = gender
	}

	return seg, nil
}

// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
func (v *VoiceDetect) lastErr(op, subject string) error {
	msg := strings.TrimSpace(CppLastError(v.ctxPtr))
	if msg == "" {
		msg = "no error detail"
	}
	return fmt.Errorf("voice-detect: %s failed for %q: %s", op, subject, msg)
}

// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
//
// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
// moves the buffer and we dereference it immediately to copy the bytes out.
func goStringFromCPtr(cptr uintptr) string {
	if cptr == 0 {
		return ""
	}
	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
	n := 0
	for *(*byte)(unsafe.Add(p, n)) != 0 {
		n++
	}
	return string(unsafe.Slice((*byte)(p), n))
}
144
backend/go/voice-detect/govoicedetect_test.go
Normal file
@@ -0,0 +1,144 @@
package main

import (
	"os"
	"sync"
	"testing"

	"github.com/ebitengine/purego"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestVoiceDetect(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "voice-detect Backend Suite")
}

var (
	libLoadOnce sync.Once
	libLoadErr  error
)

// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
// bridge without spinning up the gRPC server. Records the error (the smoke
// specs skip themselves) when libvoicedetect.so is not loadable from cwd
// (LD_LIBRARY_PATH or a symlink in ./).
func ensureLibLoaded() error {
	libLoadOnce.Do(func() {
		libName := os.Getenv("VOICEDETECT_LIBRARY")
		if libName == "" {
			libName = "libvoicedetect.so"
		}
		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
		if err != nil {
			libLoadErr = err
			return
		}
		purego.RegisterLibFunc(&CppAbiVersion, lib, "voicedetect_capi_abi_version")
		purego.RegisterLibFunc(&CppLoad, lib, "voicedetect_capi_load")
		purego.RegisterLibFunc(&CppFree, lib, "voicedetect_capi_free")
		purego.RegisterLibFunc(&CppLastError, lib, "voicedetect_capi_last_error")
		purego.RegisterLibFunc(&CppFreeString, lib, "voicedetect_capi_free_string")
		purego.RegisterLibFunc(&CppFreeVec, lib, "voicedetect_capi_free_vec")
		purego.RegisterLibFunc(&CppEmbedPath, lib, "voicedetect_capi_embed_path")
		purego.RegisterLibFunc(&CppEmbedPCM, lib, "voicedetect_capi_embed_pcm")
		purego.RegisterLibFunc(&CppVerifyPaths, lib, "voicedetect_capi_verify_paths")
		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "voicedetect_capi_analyze_path_json")
	})
	return libLoadErr
}

var _ = Describe("parseOptions", func() {
	It("defaults verify_threshold to 0.25", func() {
		o := parseOptions(nil)
		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
		Expect(o.modelName).To(Equal(""))
	})

	It("parses verify_threshold, threshold alias and model_name", func() {
		o := parseOptions([]string{"verify_threshold:0.4", "model_name:ecapa", "unknown:x"})
		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
		Expect(o.modelName).To(Equal("ecapa"))

		o2 := parseOptions([]string{"threshold:0.3"})
		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
	})

	It("ignores non-positive thresholds and keeps the default", func() {
		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
	})
})

var _ = Describe("parseAnalyzeJSON", func() {
	It("maps age, gender label+scores and emotion label+scores", func() {
		doc := `{"age":42.0,
			"gender":{"label":"female","female":0.88,"male":0.12},
			"emotion":{"label":"neutral","scores":{"neutral":0.7,"happy":0.2,"sad":0.1}}}`
		seg, err := parseAnalyzeJSON(doc)
		Expect(err).ToNot(HaveOccurred())
		Expect(seg.Age).To(BeNumerically("~", 42.0, 1e-4))
		Expect(seg.Start).To(Equal(float32(0)))
		Expect(seg.End).To(Equal(float32(0)))

		Expect(seg.DominantGender).To(Equal("female"))
		Expect(seg.Gender).To(HaveKeyWithValue("female", BeNumerically("~", 0.88, 1e-4)))
		Expect(seg.Gender).To(HaveKeyWithValue("male", BeNumerically("~", 0.12, 1e-4)))
		// The "label" entry is consumed into DominantGender, not the score map.
		Expect(seg.Gender).ToNot(HaveKey("label"))

		Expect(seg.DominantEmotion).To(Equal("neutral"))
		Expect(seg.Emotion).To(HaveKeyWithValue("neutral", BeNumerically("~", 0.7, 1e-4)))
		Expect(seg.Emotion).To(HaveKeyWithValue("happy", BeNumerically("~", 0.2, 1e-4)))
	})

	It("tolerates a missing gender block", func() {
		seg, err := parseAnalyzeJSON(`{"age":30.0,"emotion":{"label":"happy","scores":{"happy":1.0}}}`)
		Expect(err).ToNot(HaveOccurred())
		Expect(seg.DominantGender).To(Equal(""))
		Expect(seg.DominantEmotion).To(Equal("happy"))
	})

	It("returns an error on malformed JSON", func() {
		_, err := parseAnalyzeJSON(`{not-json`)
		Expect(err).To(HaveOccurred())
	})
})

// The specs below exercise the real C-API end to end. They run only when both a
// model GGUF and a test WAV are provided, and skip cleanly otherwise so the
// suite stays green without large assets.
var _ = Describe("VoiceDetect end-to-end", Ordered, func() {
	var (
		v         *VoiceDetect
		modelPath = os.Getenv("VOICEDETECT_BACKEND_TEST_MODEL")
		wavPath   = os.Getenv("VOICEDETECT_BACKEND_TEST_WAV")
	)

	BeforeAll(func() {
		if modelPath == "" || wavPath == "" {
			Skip("set VOICEDETECT_BACKEND_TEST_MODEL and VOICEDETECT_BACKEND_TEST_WAV to run the e2e specs")
		}
		if err := ensureLibLoaded(); err != nil {
			Skip("libvoicedetect.so not loadable: " + err.Error())
		}
		v = &VoiceDetect{}
		Expect(v.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
	})

	It("embeds an audio clip", func() {
		resp, err := v.VoiceEmbed(&pb.VoiceEmbedRequest{Audio: wavPath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Embedding).ToNot(BeEmpty())
		Expect(resp.Model).ToNot(BeEmpty())
	})

	It("verifies a clip against itself as the same speaker", func() {
		resp, err := v.VoiceVerify(&pb.VoiceVerifyRequest{Audio1: wavPath, Audio2: wavPath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Verified).To(BeTrue())
		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
	})
})
64
backend/go/voice-detect/main.go
Normal file
@@ -0,0 +1,64 @@
package main

// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libvoicedetect.so via purego and registers the flat C-API entry points
// declared in voicedetect_capi.h. The library name can be overridden with
// VOICEDETECT_LIBRARY (mirroring the PARAKEET_LIBRARY / OMNIVOICE_LIBRARY
// convention in the sibling backends); by default the bare soname is resolved
// through LD_LIBRARY_PATH, which run.sh points at the package's lib/ directory
// and at this binary's own directory.
import (
	"flag"
	"fmt"
	"os"

	"github.com/ebitengine/purego"
	grpc "github.com/mudler/LocalAI/pkg/grpc"
)

var (
	addr = flag.String("addr", "localhost:50051", "the address to connect to")
)

type LibFuncs struct {
	FuncPtr any
	Name    string
}

func main() {
	libName := os.Getenv("VOICEDETECT_LIBRARY")
	if libName == "" {
		libName = "libvoicedetect.so"
	}

	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(fmt.Errorf("voice-detect: dlopen %q: %w", libName, err))
	}

	// Bound 1:1 to voicedetect_capi.h. char*/float* returns are registered as
	// uintptr so the raw pointer can be freed via the matching capi free fn.
	libFuncs := []LibFuncs{
		{&CppAbiVersion, "voicedetect_capi_abi_version"},
		{&CppLoad, "voicedetect_capi_load"},
		{&CppFree, "voicedetect_capi_free"},
		{&CppLastError, "voicedetect_capi_last_error"},
		{&CppFreeString, "voicedetect_capi_free_string"},
		{&CppFreeVec, "voicedetect_capi_free_vec"},
		{&CppEmbedPath, "voicedetect_capi_embed_path"},
		{&CppEmbedPCM, "voicedetect_capi_embed_pcm"},
		{&CppVerifyPaths, "voicedetect_capi_verify_paths"},
		{&CppAnalyzeJSON, "voicedetect_capi_analyze_path_json"},
	}
	for _, lf := range libFuncs {
		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
	}

	fmt.Fprintf(os.Stderr, "[voice-detect] ABI=%d\n", CppAbiVersion())

	flag.Parse()

	if err := grpc.StartServer(*addr, &VoiceDetect{}); err != nil {
		panic(err)
	}
}
46
backend/go/voice-detect/options.go
Normal file
@@ -0,0 +1,46 @@
package main

import (
	"strconv"
	"strings"
)

// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
// not set one. Matches the Python speaker-recognition backend's default so the
// two implementations agree on verdicts out of the box.
const defaultVerifyThreshold float32 = 0.25

// loadOptions holds the parsed model-level options for voice-detect.
type loadOptions struct {
	verifyThreshold float32
	modelName       string
}

func splitOption(o string) (key, value string, ok bool) {
	i := strings.Index(o, ":")
	if i < 0 {
		return "", "", false
	}
	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}

// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. verify_threshold defaults to 0.25; model_name is left empty here and
// Load falls back to the model file's base name.
func parseOptions(opts []string) loadOptions {
	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
	for _, oo := range opts {
		key, value, ok := splitOption(oo)
		if !ok {
			continue
		}
		switch key {
		case "verify_threshold", "threshold":
			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
				o.verifyThreshold = float32(f)
			}
		case "model_name":
			o.modelName = value
		}
	}
	return o
}
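splitOption splits on the first `:` via `strings.Index`. Since Go 1.18, `strings.Cut` expresses the same first-separator split directly. A sketch of an equivalent helper follows, assuming the module builds with Go 1.18 or newer; `splitOptionCut` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOptionCut is a hypothetical strings.Cut-based equivalent of
// splitOption: it splits on the first ':' only, so values may themselves
// contain colons, and trims surrounding whitespace from both halves.
func splitOptionCut(o string) (key, value string, ok bool) {
	k, v, found := strings.Cut(o, ":")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	for _, o := range []string{"verify_threshold: 0.4", "model_name:ecapa:v2", "no-separator"} {
		k, v, ok := splitOptionCut(o)
		fmt.Printf("%q %q %v\n", k, v, ok)
	}
}
```

Splitting on the first colon is what lets a value such as a model name carry colons of its own, which splitting on every `:` would break.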
68
backend/go/voice-detect/package.sh
Executable file
@@ -0,0 +1,68 @@
#!/bin/bash
#
# Bundle the voice-detect-grpc binary, libvoicedetect.so, the core runtime libs
# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
# so the package is self-contained. Mirrors backend/go/parakeet-cpp/package.sh;
# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
# is used instead of the host's.

set -e

CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."

mkdir -p "$CURDIR/package/lib"

cp -avf "$CURDIR/voice-detect-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

# libvoicedetect.so + any soname symlinks. purego.Dlopen resolves it via
# LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libvoicedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
    echo "ERROR: libvoicedetect.so not found in $CURDIR, run 'make' first" >&2
    exit 1
}

# Detect architecture and copy the core runtime libs libvoicedetect.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    echo "Detected x86_64 architecture, copying x86_64 libraries..."
    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    echo "Detected ARM64 architecture, copying ARM64 libraries..."
    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
    echo "Detected Darwin"
else
    echo "Error: Could not detect architecture"
    exit 1
fi

# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
# BUILD_TYPE so the backend can reach the GPU without the runtime base image
# shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
    package_gpu_libs
fi

echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
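package.sh chooses the loader by probing the build host's filesystem. The same table can be expressed in Go keyed on the target OS and architecture. The paths below are copied from the script, while `loaderFor` and the idea of keying on `runtime.GOOS`/`runtime.GOARCH` are illustrative assumptions, not how the backend is packaged.

```go
package main

import (
	"fmt"
	"runtime"
)

// loaders holds, per GOARCH, the dynamic loader package.sh copies to lib/ld.so
// and the multiarch directory it copies libc/libstdc++/libgomp from. The paths
// are taken from package.sh; the table itself is illustrative.
var loaders = map[string]struct{ loader, libDir string }{
	"amd64": {"/lib64/ld-linux-x86-64.so.2", "/lib/x86_64-linux-gnu"},
	"arm64": {"/lib/ld-linux-aarch64.so.1", "/lib/aarch64-linux-gnu"},
}

// loaderFor returns ok=false where package.sh bundles no loader: on Darwin
// (nothing to copy) and on any other OS/arch (the script exits with an error).
func loaderFor(goos, goarch string) (loader, libDir string, ok bool) {
	if goos != "linux" {
		return "", "", false
	}
	e, found := loaders[goarch]
	if !found {
		return "", "", false
	}
	return e.loader, e.libDir, true
}

func main() {
	if l, d, ok := loaderFor(runtime.GOOS, runtime.GOARCH); ok {
		fmt.Println("loader:", l, "libs from:", d)
	} else {
		fmt.Println("no packaged loader for", runtime.GOOS+"/"+runtime.GOARCH)
	}
}
```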
16
backend/go/voice-detect/run.sh
Executable file
@@ -0,0 +1,16 @@
#!/bin/bash
set -e

CURDIR=$(dirname "$(realpath "$0")")

export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"

# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the whisper /
# parakeet backends' runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
    echo "Using lib/ld.so"
    exec "$CURDIR/lib/ld.so" "$CURDIR/voice-detect-grpc" "$@"
fi

exec "$CURDIR/voice-detect-grpc" "$@"
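run.sh builds `LD_LIBRARY_PATH` so the packaged `lib/` directory wins, then the backend directory, then the caller's existing value. A Go sketch of that ordering follows, using a hypothetical helper `libraryPath`. One deliberate difference: when `LD_LIBRARY_PATH` is unset or empty, the shell expansion leaves a trailing `:`, and glibc's loader treats an empty entry as the current working directory, so the sketch only appends the existing value when it is non-empty.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// libraryPath is a hypothetical Go equivalent of run.sh's
//
//	export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
//
// Packaged libraries in lib/ come first, then the backend directory, then the
// caller's existing value. An empty existing value is not appended, so no
// trailing ':' (an empty entry) is produced.
func libraryPath(curdir, existing string) string {
	parts := []string{filepath.Join(curdir, "lib"), curdir}
	if existing != "" {
		parts = append(parts, existing)
	}
	return strings.Join(parts, ":")
}

func main() {
	fmt.Println(libraryPath("/backends/voice-detect", os.Getenv("LD_LIBRARY_PATH")))
}
```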
14
backend/go/voice-detect/test.sh
Executable file
@@ -0,0 +1,14 @@
#!/bin/bash
set -e

CURDIR=$(dirname "$(realpath "$0")")
cd "$CURDIR"

echo "Running voice-detect backend tests..."

# The pure-Go parsing specs always run. The embed/verify smoke specs run only
# when a model + WAV are provided via VOICEDETECT_BACKEND_TEST_MODEL and
# VOICEDETECT_BACKEND_TEST_WAV; otherwise they auto-skip.
LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .

echo "voice-detect tests completed."
@@ -209,6 +209,78 @@
|
||||
nvidia-cuda-12: "cuda12-ced"
|
||||
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced"
|
||||
- &voicedetect
  name: "voice-detect"
  alias: "voice-detect"
  license: mit
  icon: https://avatars.githubusercontent.com/u/95302084
  description: |
    voice-detect speaker recognition and voice analysis.
    voice-detect.cpp is a C++/ggml engine that produces L2-normalised
    speaker embeddings (ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker
    ERes2Net, CAM++) for voice verification and 1:N identification, plus
    a wav2vec2 age / gender / emotion analysis head. It replaces the
    Python speaker-recognition backend and is exposed through the Voice*
    gRPC RPCs and the /v1/voice/* REST endpoints. It runs on CPU, Apple
    Metal, NVIDIA CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson
    (L4T) targets.
  urls:
    - https://github.com/mudler/voice-detect.cpp
  tags:
    - voice-recognition
    - speaker-verification
    - speaker-embedding
    - CPU
    - GPU
    - CUDA
    - HIP
  capabilities:
    default: "cpu-voice-detect"
    nvidia: "cuda12-voice-detect"
    intel: "intel-sycl-f16-voice-detect"
    metal: "metal-voice-detect"
    amd: "rocm-voice-detect"
    vulkan: "vulkan-voice-detect"
    nvidia-l4t: "nvidia-l4t-arm64-voice-detect"
    nvidia-cuda-13: "cuda13-voice-detect"
    nvidia-cuda-12: "cuda12-voice-detect"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect"
- &facedetect
  name: "face-detect"
  alias: "face-detect"
  license: mit
  icon: https://avatars.githubusercontent.com/u/95302084
  description: |
    face-detect face detection, embedding, verification and analysis.
    face-detect.cpp is a C++/ggml engine that runs SCRFD / YuNet face
    detection and produces L2-normalised face embeddings (512-d ArcFace
    or 128-d SFace) for verification and 1:N identification, plus a
    landmark / age / gender analysis head. It replaces the Python
    insightface backend and is exposed through the Embedding, Detect and
    Face* gRPC RPCs and the /v1/face/* REST endpoints. It runs on CPU,
    Apple Metal, NVIDIA CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan and NVIDIA
    Jetson (L4T) targets.
  urls:
    - https://github.com/mudler/face-detect.cpp
  tags:
    - face-recognition
    - face-verification
    - face-embedding
    - CPU
    - GPU
    - CUDA
    - HIP
  capabilities:
    default: "cpu-face-detect"
    nvidia: "cuda12-face-detect"
    intel: "intel-sycl-f16-face-detect"
    metal: "metal-face-detect"
    amd: "rocm-face-detect"
    vulkan: "vulkan-face-detect"
    nvidia-l4t: "nvidia-l4t-arm64-face-detect"
    nvidia-cuda-13: "cuda13-face-detect"
    nvidia-cuda-12: "cuda12-face-detect"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect"
- &voxtral
  name: "voxtral"
  alias: "voxtral"
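Both meta entries describe L2-normalised embeddings used for verification and 1:N identification. For unit-length vectors a and b, cosine similarity is simply the dot product a·b, so a distance of 1 − a·b can be thresholded for 1:1 verification, and the smallest distance over an enrolled set gives 1:N identification. The shell/awk sketch below only illustrates that arithmetic: `cosine_distance`, `verify` and `identify` are hypothetical helpers, embeddings are passed as whitespace-separated floats, and reading this commit's voice-detect `verify_threshold` default of 0.25 as a cosine-distance cutoff (accept when distance ≤ threshold) is an assumption, not documented engine behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Embeddings are whitespace-separated floats and are
# assumed to be L2-normalised already, so cosine similarity is the dot product.

# cosine_distance "a1 a2 ..." "b1 b2 ..." -> prints 1 - a.b
cosine_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, " "); m = split(b, y, " ")
        if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
        dot = 0
        for (i = 1; i <= n; i++) dot += x[i] * y[i]
        printf "%.6f\n", 1 - dot
    }'
}

# verify EMB1 EMB2 [THRESHOLD]: succeeds when distance <= threshold.
# Treating 0.25 (the voice-detect verify_threshold default) as a
# cosine-distance cutoff is an assumption.
verify() {
    local d
    d=$(cosine_distance "$1" "$2") || return 2
    awk -v d="$d" -v t="${3:-0.25}" 'BEGIN { exit !((d + 0) <= (t + 0)) }'
}

# identify PROBE NAME1 EMB1 NAME2 EMB2 ...: print the enrolled name with the
# smallest distance to PROBE, followed by that distance (1:N identification).
identify() {
    local probe="$1" best="" best_d="" d
    shift
    while [ "$#" -ge 2 ]; do
        d=$(cosine_distance "$probe" "$2") || return 2
        if [ -z "$best_d" ] || awk -v d="$d" -v b="$best_d" 'BEGIN { exit !((d + 0) < (b + 0)) }'; then
            best="$1"
            best_d="$d"
        fi
        shift 2
    done
    printf '%s %s\n' "$best" "$best_d"
}
```

For example, `identify "1 0" alice "0 1" bob "1 0"` prints `bob 0.000000`. Normalising once at embedding time is what lets both comparisons skip the per-pair norm computation.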
@@ -2827,6 +2899,236 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-ced
## voice-detect
- !!merge <<: *voicedetect
  name: "voice-detect-development"
  capabilities:
    default: "cpu-voice-detect-development"
    nvidia: "cuda12-voice-detect-development"
    intel: "intel-sycl-f16-voice-detect-development"
    metal: "metal-voice-detect-development"
    amd: "rocm-voice-detect-development"
    vulkan: "vulkan-voice-detect-development"
    nvidia-l4t: "nvidia-l4t-arm64-voice-detect-development"
    nvidia-cuda-13: "cuda13-voice-detect-development"
    nvidia-cuda-12: "cuda12-voice-detect-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect-development"
- !!merge <<: *voicedetect
  name: "nvidia-l4t-arm64-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:latest-nvidia-l4t-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "nvidia-l4t-arm64-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:master-nvidia-l4t-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "cuda13-nvidia-l4t-arm64-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "cuda13-nvidia-l4t-arm64-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "cpu-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voice-detect"
  mirrors:
    - localai/localai-backends:latest-cpu-voice-detect
- !!merge <<: *voicedetect
  name: "cpu-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voice-detect"
  mirrors:
    - localai/localai-backends:master-cpu-voice-detect
- !!merge <<: *voicedetect
  name: "metal-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "metal-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voice-detect"
  mirrors:
    - localai/localai-backends:master-metal-darwin-arm64-voice-detect
- !!merge <<: *voicedetect
  name: "cuda12-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-nvidia-cuda-12-voice-detect
- !!merge <<: *voicedetect
  name: "cuda12-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-12-voice-detect
- !!merge <<: *voicedetect
  name: "rocm-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-rocm-hipblas-voice-detect
- !!merge <<: *voicedetect
  name: "rocm-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-rocm-hipblas-voice-detect
- !!merge <<: *voicedetect
  name: "intel-sycl-f32-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-intel-sycl-f32-voice-detect
- !!merge <<: *voicedetect
  name: "intel-sycl-f32-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-intel-sycl-f32-voice-detect
- !!merge <<: *voicedetect
  name: "intel-sycl-f16-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-intel-sycl-f16-voice-detect
- !!merge <<: *voicedetect
  name: "intel-sycl-f16-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-intel-sycl-f16-voice-detect
- !!merge <<: *voicedetect
  name: "vulkan-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-vulkan-voice-detect
- !!merge <<: *voicedetect
  name: "vulkan-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-vulkan-voice-detect
- !!merge <<: *voicedetect
  name: "cuda13-voice-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voice-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-nvidia-cuda-13-voice-detect
- !!merge <<: *voicedetect
  name: "cuda13-voice-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voice-detect"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-voice-detect
## face-detect
- !!merge <<: *facedetect
  name: "face-detect-development"
  capabilities:
    default: "cpu-face-detect-development"
    nvidia: "cuda12-face-detect-development"
    intel: "intel-sycl-f16-face-detect-development"
    metal: "metal-face-detect-development"
    amd: "rocm-face-detect-development"
    vulkan: "vulkan-face-detect-development"
    nvidia-l4t: "nvidia-l4t-arm64-face-detect-development"
    nvidia-cuda-13: "cuda13-face-detect-development"
    nvidia-cuda-12: "cuda12-face-detect-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect-development"
- !!merge <<: *facedetect
  name: "nvidia-l4t-arm64-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-face-detect"
  mirrors:
    - localai/localai-backends:latest-nvidia-l4t-arm64-face-detect
- !!merge <<: *facedetect
  name: "nvidia-l4t-arm64-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-face-detect"
  mirrors:
    - localai/localai-backends:master-nvidia-l4t-arm64-face-detect
- !!merge <<: *facedetect
  name: "cuda13-nvidia-l4t-arm64-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect"
  mirrors:
    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect
- !!merge <<: *facedetect
  name: "cuda13-nvidia-l4t-arm64-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect"
  mirrors:
    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect
- !!merge <<: *facedetect
  name: "cpu-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-face-detect"
  mirrors:
    - localai/localai-backends:latest-cpu-face-detect
- !!merge <<: *facedetect
  name: "cpu-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-face-detect"
  mirrors:
    - localai/localai-backends:master-cpu-face-detect
- !!merge <<: *facedetect
  name: "metal-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-face-detect"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-face-detect
- !!merge <<: *facedetect
  name: "metal-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-face-detect"
  mirrors:
    - localai/localai-backends:master-metal-darwin-arm64-face-detect
- !!merge <<: *facedetect
  name: "cuda12-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-nvidia-cuda-12-face-detect
- !!merge <<: *facedetect
  name: "cuda12-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-12-face-detect
- !!merge <<: *facedetect
  name: "rocm-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-rocm-hipblas-face-detect
- !!merge <<: *facedetect
  name: "rocm-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-rocm-hipblas-face-detect
- !!merge <<: *facedetect
  name: "intel-sycl-f32-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-intel-sycl-f32-face-detect
- !!merge <<: *facedetect
  name: "intel-sycl-f32-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-intel-sycl-f32-face-detect
- !!merge <<: *facedetect
  name: "intel-sycl-f16-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-intel-sycl-f16-face-detect
- !!merge <<: *facedetect
  name: "intel-sycl-f16-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-intel-sycl-f16-face-detect
- !!merge <<: *facedetect
  name: "vulkan-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-vulkan-face-detect
- !!merge <<: *facedetect
  name: "vulkan-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-vulkan-face-detect
- !!merge <<: *facedetect
  name: "cuda13-face-detect"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-face-detect"
  mirrors:
    - localai/localai-backends:latest-gpu-nvidia-cuda-13-face-detect
- !!merge <<: *facedetect
  name: "cuda13-face-detect-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-face-detect"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-face-detect
## stablediffusion-ggml
- !!merge <<: *stablediffusionggml
  name: "cpu-stablediffusion-ggml"
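The meta entries (`&voicedetect`, `&facedetect`) carry no `uri`: they map a platform key to a concrete entry, and each concrete entry's image tag follows a fixed pattern, with `latest-` for release entries, `master-` for `-development` entries, and a per-platform flavor. The sketch below reconstructs that mapping from the entries in this diff. `resolve_backend` and `image_uri` are hypothetical helpers, not LocalAI's actual resolver, and falling back to `default` for an unknown platform key is an assumption. It needs bash 4 or newer for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how the index entries fit together; not LocalAI code.

# Platform key -> concrete entry, copied from the &voicedetect capabilities map.
declare -A VOICEDETECT_CAPS=(
    [default]="cpu-voice-detect"
    [nvidia]="cuda12-voice-detect"
    [intel]="intel-sycl-f16-voice-detect"
    [metal]="metal-voice-detect"
    [amd]="rocm-voice-detect"
    [vulkan]="vulkan-voice-detect"
    [nvidia-l4t]="nvidia-l4t-arm64-voice-detect"
    [nvidia-cuda-13]="cuda13-voice-detect"
    [nvidia-cuda-12]="cuda12-voice-detect"
    [nvidia-l4t-cuda-12]="nvidia-l4t-arm64-voice-detect"
    [nvidia-l4t-cuda-13]="cuda13-nvidia-l4t-arm64-voice-detect"
)

# resolve_backend PLATFORM [dev]: unknown keys fall back to "default"
# (assumption); "dev" selects the matching -development entry.
resolve_backend() {
    local key="${1:-default}"
    local name="${VOICEDETECT_CAPS[$key]:-${VOICEDETECT_CAPS[default]}}"
    [ "${2:-}" = dev ] && name="$name-development"
    printf '%s\n' "$name"
}

# image_uri NAME: derive the quay.io image from a concrete entry name,
# following the tag pattern of the voice-detect / face-detect entries.
image_uri() {
    local name="$1" channel="latest" flavor rest
    if [[ "$name" == *-development ]]; then
        channel="master"
        name="${name%-development}"
    fi
    # Order matters: cuda13-nvidia-l4t-arm64-* must match before cuda13-*.
    case "$name" in
        cuda13-nvidia-l4t-arm64-*) flavor="nvidia-l4t-cuda-13-arm64"; rest="${name#cuda13-nvidia-l4t-arm64-}" ;;
        nvidia-l4t-arm64-*)        flavor="nvidia-l4t-arm64";         rest="${name#nvidia-l4t-arm64-}" ;;
        cpu-*)                     flavor="cpu";                      rest="${name#cpu-}" ;;
        metal-*)                   flavor="metal-darwin-arm64";       rest="${name#metal-}" ;;
        cuda12-*)                  flavor="gpu-nvidia-cuda-12";       rest="${name#cuda12-}" ;;
        cuda13-*)                  flavor="gpu-nvidia-cuda-13";       rest="${name#cuda13-}" ;;
        rocm-*)                    flavor="gpu-rocm-hipblas";         rest="${name#rocm-}" ;;
        intel-sycl-f16-*)          flavor="gpu-intel-sycl-f16";       rest="${name#intel-sycl-f16-}" ;;
        intel-sycl-f32-*)          flavor="gpu-intel-sycl-f32";       rest="${name#intel-sycl-f32-}" ;;
        vulkan-*)                  flavor="gpu-vulkan";               rest="${name#vulkan-}" ;;
        *) echo "unknown platform prefix: $name" >&2; return 1 ;;
    esac
    printf 'quay.io/go-skynet/local-ai-backends:%s-%s-%s\n' "$channel" "$flavor" "$rest"
}
```

For example, `image_uri "$(resolve_backend amd)"` prints `quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect`, which matches the `rocm-voice-detect` entry above. Keeping the platform choice in a single `capabilities` map on the meta entry means adding a new platform touches one map plus two concrete entries.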