chore(recon): bump backend pins to round-2 CPU-optimized engines

voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context, wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in)
face-detect.cpp -> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace BN-fold)

Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
chore(recon): bump backend pins to CPU-optimized engine commits
2026-06-22 15:49:12 -04:00 · 2026-06-22 18:27:59 +00:00 · 2026-06-22 15:16:21 +00:00 · 2026-06-22 13:32:02 +00:00 · 2026-06-22 10:48:00 +00:00 · 2026-06-22 09:42:01 +00:00
104 changed files with 4558 additions and 3122 deletions
--- a/.github/backend-matrix.yml
+++ b/.github/backend-matrix.yml
@@ -3723,6 +3723,302 @@ include:
    dockerfile: "./backend/Dockerfile.golang"
    context: "./"
    ubuntu-version: '2404'
+  # voice-detect
+  - build-type: 'cublas'
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-nvidia-cuda-12-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-nvidia-cuda-13-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: 'linux/arm64'
+    skip-drivers: 'false'
+    tag-latest: 'auto'
+    tag-suffix: '-nvidia-l4t-cuda-13-arm64-voice-detect'
+    base-image: "ubuntu:24.04"
+    ubuntu-version: '2404'
+    runs-on: 'ubuntu-24.04-arm'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+  - build-type: ''
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    platform-tag: 'amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-cpu-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: ''
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/arm64'
+    platform-tag: 'arm64'
+    tag-latest: 'auto'
+    tag-suffix: '-cpu-voice-detect'
+    runs-on: 'ubuntu-24.04-arm'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'sycl_f32'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-intel-sycl-f32-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'sycl_f16'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-intel-sycl-f16-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'vulkan'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    platform-tag: 'amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-vulkan-voice-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'vulkan'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/arm64'
+    platform-tag: 'arm64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-vulkan-voice-detect'
+    runs-on: 'ubuntu-24.04-arm'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: 'linux/arm64'
+    skip-drivers: 'false'
+    tag-latest: 'auto'
+    tag-suffix: '-nvidia-l4t-arm64-voice-detect'
+    base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+    runs-on: 'ubuntu-24.04-arm'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2204'
+  - build-type: 'hipblas'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-rocm-hipblas-voice-detect'
+    base-image: "rocm/dev-ubuntu-24.04:7.2.1"
+    runs-on: 'ubuntu-latest'
+    skip-drivers: 'false'
+    backend: "voice-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  # face-detect
+  - build-type: 'cublas'
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-nvidia-cuda-12-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-nvidia-cuda-13-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: 'linux/arm64'
+    skip-drivers: 'false'
+    tag-latest: 'auto'
+    tag-suffix: '-nvidia-l4t-cuda-13-arm64-face-detect'
+    base-image: "ubuntu:24.04"
+    ubuntu-version: '2404'
+    runs-on: 'ubuntu-24.04-arm'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+  - build-type: ''
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    platform-tag: 'amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-cpu-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: ''
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/arm64'
+    platform-tag: 'arm64'
+    tag-latest: 'auto'
+    tag-suffix: '-cpu-face-detect'
+    runs-on: 'ubuntu-24.04-arm'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'sycl_f32'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-intel-sycl-f32-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'sycl_f16'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-intel-sycl-f16-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'vulkan'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    platform-tag: 'amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-vulkan-face-detect'
+    runs-on: 'ubuntu-latest'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'vulkan'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/arm64'
+    platform-tag: 'arm64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-vulkan-face-detect'
+    runs-on: 'ubuntu-24.04-arm'
+    base-image: "ubuntu:24.04"
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
+  - build-type: 'cublas'
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: 'linux/arm64'
+    skip-drivers: 'false'
+    tag-latest: 'auto'
+    tag-suffix: '-nvidia-l4t-arm64-face-detect'
+    base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+    runs-on: 'ubuntu-24.04-arm'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2204'
+  - build-type: 'hipblas'
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: 'linux/amd64'
+    tag-latest: 'auto'
+    tag-suffix: '-gpu-rocm-hipblas-face-detect'
+    base-image: "rocm/dev-ubuntu-24.04:7.2.1"
+    runs-on: 'ubuntu-latest'
+    skip-drivers: 'false'
+    backend: "face-detect"
+    dockerfile: "./backend/Dockerfile.golang"
+    context: "./"
+    ubuntu-version: '2404'
  # acestep-cpp
  - build-type: ''
    cuda-major-version: ""
@@ -4906,6 +5202,14 @@ includeDarwin:
    tag-suffix: "-metal-darwin-arm64-ced"
    build-type: "metal"
    lang: "go"
+  - backend: "voice-detect"
+    tag-suffix: "-metal-darwin-arm64-voice-detect"
+    build-type: "metal"
+    lang: "go"
+  - backend: "face-detect"
+    tag-suffix: "-metal-darwin-arm64-face-detect"
+    build-type: "metal"
+    lang: "go"
  - backend: "acestep-cpp"
    tag-suffix: "-metal-darwin-arm64-acestep-cpp"
    build-type: "metal"
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -46,6 +46,14 @@ jobs:
            variable: "CED_VERSION"
            branch: "master"
            file: "backend/go/ced/Makefile"
+          - repository: "mudler/voice-detect.cpp"
+            variable: "VOICEDETECT_VERSION"
+            branch: "master"
+            file: "backend/go/voice-detect/Makefile"
+          - repository: "mudler/face-detect.cpp"
+            variable: "FACEDETECT_VERSION"
+            branch: "master"
+            file: "backend/go/face-detect/Makefile"
          - repository: "mudler/depth-anything.cpp"
            variable: "DEPTHANYTHING_VERSION"
            branch: "master"
--- a/.github/workflows/tests-pii-ner-e2e.yml
+++ b/.github/workflows/tests-pii-ner-e2e.yml
@@ -1,97 +0,0 @@
---
-name: 'PII NER tier E2E (live GGUF, CPU)'
-
-# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the
-# hermetic tests/e2e suite cannot cover (it only exercises the in-process
-# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB
-# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand.
-#
-# This drives the container-level harness (tests/e2e-backends) via
-# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image,
-# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned
-# TokenClassify spans. The complementary HTTP-path specs in tests/e2e
-# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired.
-
-on:
-  workflow_dispatch:
-  schedule:
-    - cron: '0 3 * * *'
-  push:
-    branches:
-      - master
-    paths:
-      - 'backend/cpp/privacy-filter/**'
-      - 'backend/Dockerfile.privacy-filter'
-      - 'core/services/routing/pii/**'
-      - 'core/services/routing/piidetector/**'
-      - 'core/backend/token_classify.go'
-      - 'core/http/endpoints/localai/pii.go'
-      - 'core/schema/pii.go'
-      - 'tests/e2e-backends/**'
-      - 'tests/e2e/e2e_pii_ner_test.go'
-      - 'tests/e2e/e2e_suite_test.go'
-      - '.github/workflows/tests-pii-ner-e2e.yml'
-  pull_request:
-    paths:
-      - 'backend/cpp/privacy-filter/**'
-      - 'backend/Dockerfile.privacy-filter'
-      - 'core/services/routing/pii/**'
-      - 'core/services/routing/piidetector/**'
-      - 'core/backend/token_classify.go'
-      - 'core/http/endpoints/localai/pii.go'
-      - 'core/schema/pii.go'
-      - 'tests/e2e-backends/**'
-      - 'tests/e2e/e2e_pii_ner_test.go'
-      - 'tests/e2e/e2e_suite_test.go'
-      - '.github/workflows/tests-pii-ner-e2e.yml'
-
-concurrency:
-  group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
-  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
-
-jobs:
-  tests-pii-ner-e2e:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        go-version: ['1.25.x']
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Free disk space
-        run: |
-          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
-          sudo docker image prune --all --force || true
-          df -h
-      - name: Configure apt mirror on runner
-        uses: ./.github/actions/configure-apt-mirror
-      - name: Setup Go ${{ matrix.go-version }}
-        uses: actions/setup-go@v5
-        with:
-          go-version: ${{ matrix.go-version }}
-          cache: false
-      - name: Proto Dependencies
-        run: |
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential
-      # Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on
-      # CPU and runs the token_classify capability spec (byte-offset contract).
-      - name: Run live PII NER backend E2E
-        run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
--- a/.gitignore
+++ b/.gitignore
@@ -91,6 +91,3 @@ core/http/react-ui/test-results/

 # Local worktrees
 .worktrees/
-
-# SDD / brainstorm scratch (agent-driven development)
-.superpowers/
--- a/Makefile
+++ b/Makefile
@@ -690,16 +690,6 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
 	BACKEND_TEST_CTX_SIZE=2048 \
 	$(MAKE) test-extra-backend

-## privacy-filter: the PII/NER token-classification backend. Exercises the
-## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets
-## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M
-## active params). This is the live-backend coverage for the PII NER tier.
-test-extra-backend-privacy-filter: docker-build-privacy-filter
-	BACKEND_IMAGE=local-ai-backend:privacy-filter \
-	BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \
-	BACKEND_TEST_CAPS=health,load,token_classify \
-	$(MAKE) test-extra-backend
-
 ## vllm is resolved from a HuggingFace model id (no file download) and
 ## exercises Predict + streaming + tool-call extraction via the hermes parser.
 ## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=7c082bc417bbe53210a83df4ba5b49e18ce6193c
+LLAMA_VERSION?=e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=7a8cb80907341c0204bd0488c1244764f4163883
+CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/go/face-detect/.gitignore
+++ b/backend/go/face-detect/.gitignore
@@ -0,0 +1,18 @@
+# Fetched upstream sources
+sources/
+
+# CMake build directories
+build*/
+
+# build artifacts staged in-tree by the Makefile (cp from sources/) or
+# symlinked for local dev; the real sources live in face-detect.cpp upstream.
+*.so
+*.so.*
+facedetect_capi.h
+compile_commands.json
+
+# Compiled backend binary
+face-detect-grpc
+
+# Packaging output
+package/
--- a/backend/go/face-detect/Makefile
+++ b/backend/go/face-detect/Makefile
@@ -0,0 +1,97 @@
+# face-detect backend Makefile.
+#
+# Upstream pin lives below as FACEDETECT_VERSION?=9c8adb7... (.github/bump_deps.sh
+# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
+# convention).
+#
+# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
+# symlink the .so + header into this directory and skip the clone/cmake steps:
+#
+#   ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
+#   ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
+#   go build -o face-detect-grpc .
+#
+# The default target below does the proper clone-at-pin + cmake build so CI does
+# not need a side-checkout.
+
+FACEDETECT_VERSION?=9c8adb748f1f02d7fc0430a883234aef4b343a34
+FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
+
+GOCMD?=go
+GO_TAGS?=
+JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
+
+BUILD_TYPE?=
+NATIVE?=false
+
+# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
+# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
+# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
+# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
+# side, so only the facedetect_capi_* surface is exported.
+CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
+
+ifeq ($(NATIVE),false)
+	CMAKE_ARGS+=-DGGML_NATIVE=OFF
+endif
+
+# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
+# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
+# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
+# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
+ifeq ($(BUILD_TYPE),cublas)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
+else ifeq ($(BUILD_TYPE),openblas)
+	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+else ifeq ($(BUILD_TYPE),hipblas)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
+else ifeq ($(BUILD_TYPE),vulkan)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
+else ifeq ($(BUILD_TYPE),metal)
+	CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
+endif
+
+.PHONY: face-detect-grpc package build clean purge test all
+
+all: face-detect-grpc
+
+# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
+# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
+# bump, run 'make purge && make' to refetch.
+sources/face-detect.cpp:
+	mkdir -p sources/face-detect.cpp
+	cd sources/face-detect.cpp && \
+	git init -q && \
+	git remote add origin $(FACEDETECT_REPO) && \
+	git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
+	git checkout FETCH_HEAD && \
+	git submodule update --init --recursive --depth 1 --single-branch
+
+# Build the shared lib + header out-of-tree, then stage them next to the Go
+# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
+# them up.
+libfacedetect.so: sources/face-detect.cpp
+	cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
+	cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
+	cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
+	cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./
+
+face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
+	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .
+
+package: face-detect-grpc
+	bash package.sh
+
+build: package
+
+# Test target. The embed/detect/verify/analyze smoke specs are gated on
+# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
+# heavy specs auto-skip and only the pure-Go parsing specs run.
+test:
+	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
+
+clean: purge
+	rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc
+
+purge:
+	rm -rf sources/face-detect.cpp
--- a/backend/go/face-detect/gofacedetect.go
+++ b/backend/go/face-detect/gofacedetect.go
@@ -0,0 +1,431 @@
+package main
+
+import (
+	"encoding/base64"
+	"encoding/json"
+	"errors"
+	"fmt"
+	"math"
+	"os"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"time"
+	"unsafe"
+
+	"github.com/mudler/LocalAI/pkg/grpc/base"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/xlog"
+)
+
+// purego-bound entry points from libfacedetect.so. Names match
+// facedetect_capi.h exactly so a `nm libfacedetect.so | grep facedetect_capi`
+// is enough to spot drift.
+//
+// The opaque ctx and the malloc'd char*/float* return values are declared as
+// uintptr so we get the raw pointer back and can release it via the matching
+// capi free function. purego's native string/[]float32 returns would copy and
+// forget the original pointer, leaking the C-owned buffer on every call.
+var (
+	CppAbiVersion  func() int32
+	CppLoad        func(ggufPath string) uintptr
+	CppFree        func(ctx uintptr)
+	CppLastError   func(ctx uintptr) string
+	CppFreeString  func(s uintptr)
+	CppFreeVec     func(v uintptr)
+	CppEmbedPath   func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
+	CppEmbedRGB    func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
+	CppDetectJSON  func(ctx uintptr, imagePath string) uintptr
+	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
+	CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
+)
+
+// FaceDetect implements the face-recognition (biometric) subset of the Backend
+// gRPC service over libfacedetect.so. The C side keeps a single loaded model
+// pack plus a per-ctx last-error buffer and is not reentrant, so
+// base.SingleThread serializes every call.
+type FaceDetect struct {
+	base.SingleThread
+	opts   loadOptions
+	ctxPtr uintptr
+}
+
+func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
+	model := opts.ModelFile
+	if model == "" {
+		model = opts.ModelPath
+	}
+	if !filepath.IsAbs(model) && opts.ModelPath != "" {
+		model = filepath.Join(opts.ModelPath, model)
+	}
+	if model == "" {
+		return errors.New("face-detect: ModelFile is required")
+	}
+
+	f.opts = parseOptions(opts.Options)
+	if f.opts.modelName == "" {
+		f.opts.modelName = filepath.Base(model)
+	}
+
+	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
+	// one backend process per model and serves requests concurrently, so the
+	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
+	// FACEDETECT_THREADS is read by the engine at backend construction, so it
+	// must be set before the capi load. A non-positive Threads means "unset":
+	// leave the env alone so the engine keeps its sane default.
+	threads := opts.Threads
+	if threads > 0 {
+		if err := os.Setenv("FACEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
+			return fmt.Errorf("face-detect: set FACEDETECT_THREADS: %w", err)
+		}
+		xlog.Info("face-detect: applying LocalAI thread budget", "threads", threads)
+	}
+
+	xlog.Info("face-detect: loading model", "model", model,
+		"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())
+
+	ctx := CppLoad(model)
+	if ctx == 0 {
+		// The last-error buffer lives on the ctx that was never returned, so
+		// surface the path the operator tried to load instead.
+		return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
+	}
+	f.ctxPtr = ctx
+	return nil
+}
+
+// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
+// the supplied image. Mirroring the Python face backend, the image is read from
+// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
+// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
+// server wraps the returned slice in an EmbeddingResult.
+func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
+	if f.ctxPtr == 0 {
+		return nil, errors.New("face-detect: model not loaded")
+	}
+	if len(req.Images) == 0 || req.Images[0] == "" {
+		return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
+	}
+
+	path, cleanup, err := materializeImage(req.Images[0])
+	if err != nil {
+		return nil, err
+	}
+	defer cleanup()
+
+	return f.embedPath(path)
+}
+
+func (f *FaceDetect) embedPath(path string) ([]float32, error) {
+	var vec uintptr
+	var dim int32
+	rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
+	if rc != 0 || vec == 0 || dim <= 0 {
+		return nil, f.lastErr("embed", path)
+	}
+	defer CppFreeVec(vec)
+	// Copy out of the C-owned malloc'd buffer before freeing it. The
+	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
+	// nor moves this buffer and we copy immediately.
+	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
+	out := make([]float32, int(dim))
+	copy(out, src)
+	return out, nil
+}
+
+// Detect runs SCRFD over the image and returns one Detection per face. The
+// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
+// width/height, so the corners are converted. The 5 facial landmarks the engine
+// also returns are dropped: the Detection message has no field for them.
+func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Src == "" {
+		return pb.DetectResponse{}, errors.New("face-detect: src image is required")
+	}
+
+	path, cleanup, err := materializeImage(req.Src)
+	if err != nil {
+		return pb.DetectResponse{}, err
+	}
+	defer cleanup()
+
+	faces, err := f.detectFaces(path)
+	if err != nil {
+		return pb.DetectResponse{}, err
+	}
+
+	dets := make([]*pb.Detection, 0, len(faces))
+	for _, fc := range faces {
+		if req.Threshold > 0 && fc.Score < req.Threshold {
+			continue
+		}
+		x, y, w, h := fc.xywh()
+		dets = append(dets, &pb.Detection{
+			X:          x,
+			Y:          y,
+			Width:      w,
+			Height:     h,
+			Confidence: fc.Score,
+			ClassName:  "face",
+		})
+	}
+	return pb.DetectResponse{Detections: dets}, nil
+}
+
+// FaceVerify embeds the primary face in each image and reports whether they are
+// the same identity by cosine distance against a threshold. A request threshold
+// <= 0 falls back to the model-configured default (verify_threshold option,
+// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
+// veto internally (verified forced false on a spoof); the per-image liveness
+// scores are not exposed by the verify entry point, so img*_is_real /
+// img*_antispoof_score stay at their zero values.
+func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Img1 == "" || req.Img2 == "" {
+		return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
+	}
+
+	path1, cleanup1, err := materializeImage(req.Img1)
+	if err != nil {
+		return pb.FaceVerifyResponse{}, err
+	}
+	defer cleanup1()
+	path2, cleanup2, err := materializeImage(req.Img2)
+	if err != nil {
+		return pb.FaceVerifyResponse{}, err
+	}
+	defer cleanup2()
+
+	threshold := req.Threshold
+	if threshold <= 0 {
+		threshold = f.opts.verifyThreshold
+	}
+
+	antiSpoof := int32(0)
+	if req.AntiSpoofing {
+		antiSpoof = 1
+	}
+
+	started := time.Now()
+	var distance float32
+	var verified int32
+	rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
+		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
+	if rc != 0 {
+		return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
+	}
+	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
+
+	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
+	// matching the Python face backend's reporting.
+	confidence := float32(0)
+	if threshold > 0 {
+		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
+	}
+
+	return pb.FaceVerifyResponse{
+		Verified:         verified != 0,
+		Distance:         distance,
+		Threshold:        threshold,
+		Confidence:       confidence,
+		Model:            f.opts.modelName,
+		Img1Area:         f.bestArea(path1),
+		Img2Area:         f.bestArea(path2),
+		ProcessingTimeMs: elapsedMs,
+	}, nil
+}
+
+// FaceAnalyze runs the genderage head on every detected face. The C-API returns
+// "M"/"F" gender labels and a rounded age; the labels are normalized to the
+// "Man"/"Woman" values the proto documents.
+func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
+	if f.ctxPtr == 0 {
+		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
+	}
+	if req.Img == "" {
+		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
+	}
+
+	path, cleanup, err := materializeImage(req.Img)
+	if err != nil {
+		return pb.FaceAnalyzeResponse{}, err
+	}
+	defer cleanup()
+
+	ptr := CppAnalyzeJSON(f.ctxPtr, path)
+	if ptr == 0 {
+		return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
+	}
+	defer CppFreeString(ptr)
+
+	faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
+	if err != nil {
+		return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
+	}
+	return pb.FaceAnalyzeResponse{Faces: faces}, nil
+}
+
+// faceBox is one entry of the detect/analyze JSON documents the engine emits.
+type faceBox struct {
+	Score  float32   `json:"score"`
+	Box    []float32 `json:"box"`
+	Age    float32   `json:"age"`
+	Gender string    `json:"gender"`
+}
+
+// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
+// proto carries. A short or missing box yields zeros.
+func (b faceBox) xywh() (x, y, w, h float32) {
+	if len(b.Box) < 4 {
+		return 0, 0, 0, 0
+	}
+	return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
+}
+
+type facesJSON struct {
+	Faces []faceBox `json:"faces"`
+}
+
+func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
+	ptr := CppDetectJSON(f.ctxPtr, path)
+	if ptr == 0 {
+		return nil, f.lastErr("detect", path)
+	}
+	defer CppFreeString(ptr)
+
+	var doc facesJSON
+	if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
+		return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
+	}
+	return doc.Faces, nil
+}
+
+// bestArea returns the FacialArea of the highest-scoring face in an image, or an
+// empty area when detection fails or finds nothing. Best-effort: verify already
+// succeeded, so a missing region must not turn a valid match into an error.
+func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
+	faces, err := f.detectFaces(path)
+	if err != nil || len(faces) == 0 {
+		return &pb.FacialArea{}
+	}
+	best := faces[0]
+	for _, fc := range faces[1:] {
+		if fc.Score > best.Score {
+			best = fc
+		}
+	}
+	x, y, w, h := best.xywh()
+	return &pb.FacialArea{X: x, Y: y, W: w, H: h}
+}
+
+// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
+// The engine reports gender as "M"/"F"; both the dominant label and the score
+// map are filled with the "Man"/"Woman" form the proto documents.
+func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
+	var parsed facesJSON
+	if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
+		return nil, err
+	}
+
+	out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
+	for _, fc := range parsed.Faces {
+		x, y, w, h := fc.xywh()
+		fa := &pb.FaceAnalysis{
+			Region:         &pb.FacialArea{X: x, Y: y, W: w, H: h},
+			FaceConfidence: fc.Score,
+			Age:            fc.Age,
+		}
+		if label := normalizeGender(fc.Gender); label != "" {
+			fa.DominantGender = label
+			fa.Gender = map[string]float32{label: 1.0}
+		}
+		out = append(out, fa)
+	}
+	return out, nil
+}
+
+// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
+// proto documents. Unknown codes pass through unchanged.
+func normalizeGender(g string) string {
+	switch strings.ToUpper(strings.TrimSpace(g)) {
+	case "M":
+		return "Man"
+	case "F":
+		return "Woman"
+	case "":
+		return ""
+	default:
+		return g
+	}
+}
+
+// materializeImage decodes a base64 image payload into a temp file and returns
+// its path plus a cleanup func. As a convenience for callers that already pass a
+// filesystem path (e.g. a test fixture), an existing path is used as-is with a
+// no-op cleanup. data: URI prefixes are stripped before decoding.
+func materializeImage(src string) (path string, cleanup func(), err error) {
+	noop := func() {}
+	if src == "" {
+		return "", noop, errors.New("face-detect: empty image input")
+	}
+	if _, statErr := os.Stat(src); statErr == nil {
+		return src, noop, nil
+	}
+
+	payload := src
+	if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
+		payload = payload[i+1:]
+	}
+	data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
+	if decErr != nil || len(data) == 0 {
+		return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
+	}
+
+	tmp, createErr := os.CreateTemp("", "face-detect-*.img")
+	if createErr != nil {
+		return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
+	}
+	cleanup = func() { _ = os.Remove(tmp.Name()) }
+	if _, wErr := tmp.Write(data); wErr != nil {
+		_ = tmp.Close()
+		cleanup()
+		return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
+	}
+	if cErr := tmp.Close(); cErr != nil {
+		cleanup()
+		return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
+	}
+	return tmp.Name(), cleanup, nil
+}
+
+// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
+func (f *FaceDetect) lastErr(op, subject string) error {
+	msg := strings.TrimSpace(CppLastError(f.ctxPtr))
+	if msg == "" {
+		msg = "no error detail"
+	}
+	return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
+}
+
+// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
+// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
+//
+// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
+// moves the buffer and we dereference it immediately to copy the bytes out.
+func goStringFromCPtr(cptr uintptr) string {
+	if cptr == 0 {
+		return ""
+	}
+	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
+	n := 0
+	for *(*byte)(unsafe.Add(p, n)) != 0 {
+		n++
+	}
+	return string(unsafe.Slice((*byte)(p), n))
+}
--- a/backend/go/face-detect/gofacedetect_test.go
+++ b/backend/go/face-detect/gofacedetect_test.go
@@ -0,0 +1,230 @@
+package main
+
+import (
+	"encoding/base64"
+	"os"
+	"sync"
+	"testing"
+
+	"github.com/ebitengine/purego"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+func TestFaceDetect(t *testing.T) {
+	RegisterFailHandler(Fail)
+	RunSpecs(t, "face-detect Backend Suite")
+}
+
+var (
+	libLoadOnce sync.Once
+	libLoadErr  error
+)
+
+// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
+// bridge without spinning up the gRPC server. If libfacedetect.so cannot be
+// loaded (via LD_LIBRARY_PATH or a symlink in ./), the error is recorded and
+// returned on every call so the end-to-end specs can skip themselves.
+func ensureLibLoaded() error {
+	libLoadOnce.Do(func() {
+		libName := os.Getenv("FACEDETECT_LIBRARY")
+		if libName == "" {
+			libName = "libfacedetect.so"
+		}
+		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+		if err != nil {
+			libLoadErr = err
+			return
+		}
+		purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
+		purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
+		purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
+		purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
+		purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
+		purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
+		purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
+		purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
+		purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
+		purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
+		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
+	})
+	return libLoadErr
+}
+
+var _ = Describe("parseOptions", func() {
+	It("defaults verify_threshold to 0.35", func() {
+		o := parseOptions(nil)
+		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
+		Expect(o.modelName).To(Equal(""))
+	})
+
+	It("parses verify_threshold, threshold alias and model_name", func() {
+		o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
+		Expect(o.modelName).To(Equal("buffalo_l"))
+
+		o2 := parseOptions([]string{"threshold:0.3"})
+		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
+	})
+
+	It("ignores non-positive thresholds and keeps the default", func() {
+		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
+	})
+})
+
+var _ = Describe("normalizeGender", func() {
+	It("maps M/F codes to Man/Woman", func() {
+		Expect(normalizeGender("M")).To(Equal("Man"))
+		Expect(normalizeGender("f")).To(Equal("Woman"))
+		Expect(normalizeGender(" m ")).To(Equal("Man"))
+	})
+
+	It("passes empty and unknown codes through", func() {
+		Expect(normalizeGender("")).To(Equal(""))
+		Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
+	})
+})
+
+var _ = Describe("faceBox.xywh", func() {
+	It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
+		b := faceBox{Box: []float32{10, 20, 50, 80}}
+		x, y, w, h := b.xywh()
+		Expect(x).To(Equal(float32(10)))
+		Expect(y).To(Equal(float32(20)))
+		Expect(w).To(Equal(float32(40)))
+		Expect(h).To(Equal(float32(60)))
+	})
+
+	It("returns zeros for a short box", func() {
+		x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
+		Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
+	})
+})
+
+var _ = Describe("parseAnalyzeJSON", func() {
+	It("maps region, age and gender for each face", func() {
+		doc := `{"faces":[
+			{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
+			{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
+		faces, err := parseAnalyzeJSON(doc)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(HaveLen(2))
+
+		Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
+		Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
+		Expect(faces[0].DominantGender).To(Equal("Man"))
+		Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
+		Expect(faces[0].Region.W).To(Equal(float32(40)))
+		Expect(faces[0].Region.H).To(Equal(float32(60)))
+
+		Expect(faces[1].DominantGender).To(Equal("Woman"))
+	})
+
+	It("tolerates a missing gender field", func() {
+		faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(HaveLen(1))
+		Expect(faces[0].DominantGender).To(Equal(""))
+		Expect(faces[0].Gender).To(BeEmpty())
+	})
+
+	It("returns no faces for an empty document", func() {
+		faces, err := parseAnalyzeJSON(`{"faces":[]}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(faces).To(BeEmpty())
+	})
+
+	It("returns an error on malformed JSON", func() {
+		_, err := parseAnalyzeJSON(`{not-json`)
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+var _ = Describe("materializeImage", func() {
+	It("decodes a base64 payload to a temp file", func() {
+		payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
+		path, cleanup, err := materializeImage(payload)
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		data, rerr := os.ReadFile(path)
+		Expect(rerr).ToNot(HaveOccurred())
+		Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
+	})
+
+	It("strips a data: URI prefix before decoding", func() {
+		payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
+		path, cleanup, err := materializeImage(payload)
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		data, rerr := os.ReadFile(path)
+		Expect(rerr).ToNot(HaveOccurred())
+		Expect(data).To(Equal([]byte("hello")))
+	})
+
+	It("uses an existing path as-is", func() {
+		tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
+		Expect(err).ToNot(HaveOccurred())
+		defer func() { _ = os.Remove(tmp.Name()) }()
+		Expect(tmp.Close()).To(Succeed())
+
+		path, cleanup, err := materializeImage(tmp.Name())
+		Expect(err).ToNot(HaveOccurred())
+		defer cleanup()
+		Expect(path).To(Equal(tmp.Name()))
+	})
+
+	It("errors on input that is neither a path nor base64", func() {
+		_, _, err := materializeImage("not base64!!!")
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+// The specs below exercise the real C-API end to end. They run only when both a
+// model GGUF and a test image are provided, and skip cleanly otherwise so the
+// suite stays green without large assets.
+var _ = Describe("FaceDetect end-to-end", Ordered, func() {
+	var (
+		f         *FaceDetect
+		modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
+		imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
+	)
+
+	BeforeAll(func() {
+		if modelPath == "" || imagePath == "" {
+			Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
+		}
+		if err := ensureLibLoaded(); err != nil {
+			Skip("libfacedetect.so not loadable: " + err.Error())
+		}
+		f = &FaceDetect{}
+		Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
+	})
+
+	It("embeds the primary face in an image", func() {
+		emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(emb).ToNot(BeEmpty())
+	})
+
+	It("detects at least one face", func() {
+		resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Detections).ToNot(BeEmpty())
+		Expect(resp.Detections[0].ClassName).To(Equal("face"))
+	})
+
+	It("verifies an image against itself as the same identity", func() {
+		resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Verified).To(BeTrue())
+		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
+	})
+
+	It("analyzes age/gender for each face", func() {
+		resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Faces).ToNot(BeEmpty())
+	})
+})
--- a/backend/go/face-detect/main.go
+++ b/backend/go/face-detect/main.go
@@ -0,0 +1,65 @@
+package main
+
+// Started internally by LocalAI - one gRPC server per loaded model.
+//
+// Loads libfacedetect.so via purego and registers the flat C-API entry points
+// declared in facedetect_capi.h. The library name can be overridden with
+// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
+// convention in the sibling backends); the default looks for the .so next to
+// this binary (resolved via LD_LIBRARY_PATH by run.sh).
+import (
+	"flag"
+	"fmt"
+	"os"
+
+	"github.com/ebitengine/purego"
+	grpc "github.com/mudler/LocalAI/pkg/grpc"
+)
+
+var (
+	addr = flag.String("addr", "localhost:50051", "the address to listen on")
+)
+
+type LibFuncs struct {
+	FuncPtr any
+	Name    string
+}
+
+func main() {
+	libName := os.Getenv("FACEDETECT_LIBRARY")
+	if libName == "" {
+		libName = "libfacedetect.so"
+	}
+
+	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+	if err != nil {
+		panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
+	}
+
+	// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
+	// uintptr so the raw pointer can be freed via the matching capi free fn.
+	libFuncs := []LibFuncs{
+		{&CppAbiVersion, "facedetect_capi_abi_version"},
+		{&CppLoad, "facedetect_capi_load"},
+		{&CppFree, "facedetect_capi_free"},
+		{&CppLastError, "facedetect_capi_last_error"},
+		{&CppFreeString, "facedetect_capi_free_string"},
+		{&CppFreeVec, "facedetect_capi_free_vec"},
+		{&CppEmbedPath, "facedetect_capi_embed_path"},
+		{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
+		{&CppDetectJSON, "facedetect_capi_detect_path_json"},
+		{&CppVerifyPaths, "facedetect_capi_verify_paths"},
+		{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
+	}
+	for _, lf := range libFuncs {
+		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
+	}
+
+	fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())
+
+	flag.Parse()
+
+	if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
+		panic(err)
+	}
+}
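Note on the library lookup in main.go above: FACEDETECT_LIBRARY overrides the default soname, and an empty or unset value falls back to libfacedetect.so. The following is a minimal sketch of that lookup, written with an injected getenv function so it can be checked without touching the process environment. The `libraryName` helper and `defaultLibrary` constant are illustrative names, not part of the backend.

```go
package main

import (
	"fmt"
	"os"
)

// defaultLibrary is the soname used when no override is configured.
const defaultLibrary = "libfacedetect.so"

// libraryName returns the FACEDETECT_LIBRARY override when it is set and
// non-empty, else the default soname. getenv is injected so callers can
// substitute a fake environment. (Hypothetical helper for illustration only.)
func libraryName(getenv func(string) string) string {
	if name := getenv("FACEDETECT_LIBRARY"); name != "" {
		return name
	}
	return defaultLibrary
}

func main() {
	// With the real environment this prints the override or the default.
	fmt.Println(libraryName(os.Getenv))
}
```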
--- a/backend/go/face-detect/options.go
+++ b/backend/go/face-detect/options.go
@@ -0,0 +1,47 @@
+package main
+
+import (
+	"strconv"
+	"strings"
+)
+
+// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
+// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
+// face backend ships with so the two implementations agree on verdicts out of
+// the box.
+const defaultVerifyThreshold float32 = 0.35
+
+// loadOptions holds the parsed model-level options for face-detect.
+type loadOptions struct {
+	verifyThreshold float32
+	modelName       string
+}
+
+func splitOption(o string) (key, value string, ok bool) {
+	i := strings.Index(o, ":")
+	if i < 0 {
+		return "", "", false
+	}
+	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
+}
+
+// parseOptions reads the backend "key:value" option slice. Unknown keys are
+// ignored. Defaults: verify_threshold 0.35, model_name empty (the caller derives it from the file).
+func parseOptions(opts []string) loadOptions {
+	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
+	for _, oo := range opts {
+		key, value, ok := splitOption(oo)
+		if !ok {
+			continue
+		}
+		switch key {
+		case "verify_threshold", "threshold":
+			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
+				o.verifyThreshold = float32(f)
+			}
+		case "model_name":
+			o.modelName = value
+		}
+	}
+	return o
+}
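Note on the option format parsed in options.go above: each entry is split at the first colon, surrounding whitespace is trimmed, and non-positive thresholds leave the default in place. The following is a standalone sketch of those rules. It uses strings.Cut instead of strings.Index, and the `threshold` helper is a hypothetical reduction of parseOptions to the threshold keys only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitOption cuts "key:value" at the first colon, so a value may itself
// contain colons. Entries without a colon report ok=false.
func splitOption(o string) (key, value string, ok bool) {
	k, v, found := strings.Cut(o, ":")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

// threshold returns the last positive verify_threshold/threshold value in
// opts, or def when none is present. (Hypothetical helper for illustration.)
func threshold(opts []string, def float32) float32 {
	t := def
	for _, o := range opts {
		key, value, ok := splitOption(o)
		if !ok {
			continue
		}
		if key == "verify_threshold" || key == "threshold" {
			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
				t = float32(f)
			}
		}
	}
	return t
}

func main() {
	fmt.Println(threshold([]string{"verify_threshold:0.4"}, 0.35))
	fmt.Println(threshold([]string{"threshold:-1", "junk"}, 0.35))
}
```

Splitting at the first colon only means a value such as a path or URL with embedded colons survives intact.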
--- a/backend/go/face-detect/package.sh
+++ b/backend/go/face-detect/package.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
+# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
+# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
+# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
+# is used instead of the host's.
+
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+REPO_ROOT="${CURDIR}/../../.."
+
+mkdir -p "$CURDIR/package/lib"
+
+cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
+cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
+
+# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
+# LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
+	exit 1
+}
+
+# Detect architecture and copy the core runtime libs libfacedetect.so links
+# against, plus the matching dynamic loader as lib/ld.so.
+if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
+    echo "Detected x86_64 architecture, copying x86_64 libraries..."
+    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
+    echo "Detected ARM64 architecture, copying ARM64 libraries..."
+    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ "$(uname -s)" = "Darwin" ]; then
+    echo "Detected Darwin"
+else
+    echo "Error: Could not detect architecture" >&2
+    exit 1
+fi
+
+# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
+# BUILD_TYPE so the backend can reach the GPU without the runtime base image
+# shipping those drivers.
+GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
+if [ -f "$GPU_LIB_SCRIPT" ]; then
+    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
+    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
+    package_gpu_libs
+fi
+
+echo "Packaging completed successfully"
+ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/face-detect/run.sh
+++ b/backend/go/face-detect/run.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+
+# If a self-contained ld.so was packaged, route through it so the packaged
+# libc / libstdc++ are used instead of the host's (matches the voice-detect /
+# whisper / parakeet backends' runtime layout).
+if [ -f "$CURDIR/lib/ld.so" ]; then
+	echo "Using lib/ld.so"
+	exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
+fi
+
+exec "$CURDIR/face-detect-grpc" "$@"
--- a/backend/go/face-detect/test.sh
+++ b/backend/go/face-detect/test.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+cd "$CURDIR"
+
+echo "Running face-detect backend tests..."
+
+# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
+# specs run only when a model + image are provided via
+# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
+# auto-skip.
+LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
+
+echo "face-detect tests completed."
--- a/backend/go/voice-detect/.gitignore
+++ b/backend/go/voice-detect/.gitignore
@@ -0,0 +1,18 @@
+# Fetched upstream sources
+sources/
+
+# CMake build directories
+build*/
+
+# build artifacts staged in-tree by the Makefile (cp from sources/) or
+# symlinked for local dev; the real sources live in voice-detect.cpp upstream.
+*.so
+*.so.*
+voicedetect_capi.h
+compile_commands.json
+
+# Compiled backend binary
+voice-detect-grpc
+
+# Packaging output
+package/
--- a/backend/go/voice-detect/Makefile
+++ b/backend/go/voice-detect/Makefile
@@ -0,0 +1,94 @@
+# voice-detect backend Makefile.
+#
+# Upstream pin lives below as VOICEDETECT_VERSION?=fe7e6a3... (.github/bump_deps.sh
+# can find and update it - matches the parakeet.cpp / whisper.cpp / ds4 convention).
+#
+# Local dev shortcut: if you already have an out-of-tree voice-detect.cpp build,
+# symlink the .so + header into this directory and skip the clone/cmake steps:
+#
+#   ln -sf /path/to/voice-detect.cpp/build-shared/libvoicedetect.so .
+#   ln -sf /path/to/voice-detect.cpp/include/voicedetect_capi.h .
+#   go build -o voice-detect-grpc .
+#
+# The default target below does the proper clone-at-pin + cmake build so CI does
+# not need a side-checkout.
+
+VOICEDETECT_VERSION?=fe7e6a3f0a0afc141566e18c8e97a8417ee0c3cd
+VOICEDETECT_REPO?=https://github.com/mudler/voice-detect.cpp
+
+GOCMD?=go
+GO_TAGS?=
+JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
+
+BUILD_TYPE?=
+NATIVE?=false
+
+# Build ggml statically into libvoicedetect.so (PIC) so the shared lib is
+# self-contained: dlopen needs no libggml*.so alongside it, only system libs
+# (libstdc++/libgomp/libc) that the runtime image already provides.
+CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DVOICEDETECT_SHARED=ON -DVOICEDETECT_BUILD_CLI=OFF -DVOICEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
+
+ifeq ($(NATIVE),false)
+	CMAKE_ARGS+=-DGGML_NATIVE=OFF
+endif
+
+# voice-detect.cpp gates its GGML backends behind VOICEDETECT_GGML_* options and
+# does set(GGML_CUDA ${VOICEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
+# -DGGML_CUDA=ON is overwritten back to OFF. Forward the VOICEDETECT_GGML_*
+# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
+ifeq ($(BUILD_TYPE),cublas)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDA=ON
+else ifeq ($(BUILD_TYPE),openblas)
+	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
+else ifeq ($(BUILD_TYPE),hipblas)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_HIP=ON
+else ifeq ($(BUILD_TYPE),vulkan)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_VULKAN=ON
+else ifeq ($(BUILD_TYPE),metal)
+	CMAKE_ARGS+=-DVOICEDETECT_GGML_METAL=ON
+endif
+
+.PHONY: voice-detect-grpc package build clean purge test all
+
+all: voice-detect-grpc
+
+# Clone the upstream voice-detect.cpp source at the pinned commit. Directory acts
+# as the target so make only re-clones when missing. After a VOICEDETECT_VERSION
+# bump, run 'make purge && make' to refetch.
+sources/voice-detect.cpp:
+	mkdir -p sources/voice-detect.cpp
+	cd sources/voice-detect.cpp && \
+	git init -q && \
+	git remote add origin $(VOICEDETECT_REPO) && \
+	git fetch --depth 1 origin $(VOICEDETECT_VERSION) && \
+	git checkout FETCH_HEAD && \
+	git submodule update --init --recursive --depth 1 --single-branch
+
+# Build the shared lib + header out-of-tree, then stage them next to the Go
+# sources so purego.Dlopen("libvoicedetect.so") and the cgo-less build both pick
+# them up.
+libvoicedetect.so: sources/voice-detect.cpp
+	cmake -B sources/voice-detect.cpp/build-shared -S sources/voice-detect.cpp $(CMAKE_ARGS)
+	cmake --build sources/voice-detect.cpp/build-shared --config Release -j$(JOBS) --target voicedetect
+	cp -fv sources/voice-detect.cpp/build-shared/libvoicedetect.so* ./ 2>/dev/null || true
+	cp -fv sources/voice-detect.cpp/include/voicedetect_capi.h ./
+
+voice-detect-grpc: libvoicedetect.so main.go govoicedetect.go options.go
+	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voice-detect-grpc .
+
+package: voice-detect-grpc
+	bash package.sh
+
+build: package
+
+# Test target. The embed/verify/analyze smoke specs are gated on
+# VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV; without them the
+# heavy specs auto-skip and only the pure-Go parsing specs run.
+test:
+	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
+
+clean: purge
+	rm -rf libvoicedetect.so* voicedetect_capi.h package voice-detect-grpc
+
+purge:
+	rm -rf sources/voice-detect.cpp
--- a/backend/go/voice-detect/govoicedetect.go
+++ b/backend/go/voice-detect/govoicedetect.go
@@ -0,0 +1,273 @@
+package main
+
+import (
+	"encoding/json"
+	"errors"
+	"fmt"
+	"math"
+	"os"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"time"
+	"unsafe"
+
+	"github.com/mudler/LocalAI/pkg/grpc/base"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/xlog"
+)
+
+// purego-bound entry points from libvoicedetect.so. Names match
+// voicedetect_capi.h exactly so a `nm libvoicedetect.so | grep voicedetect_capi`
+// is enough to spot drift.
+//
+// The opaque ctx and the malloc'd char*/float* return values are declared as
+// uintptr so we get the raw pointer back and can release it via the matching
+// capi free function. purego's native string/[]float32 returns would copy and
+// forget the original pointer, leaking the C-owned buffer on every call.
+var (
+	CppAbiVersion  func() int32
+	CppLoad        func(ggufPath string) uintptr
+	CppFree        func(ctx uintptr)
+	CppLastError   func(ctx uintptr) string
+	CppFreeString  func(s uintptr)
+	CppFreeVec     func(v uintptr)
+	CppEmbedPath   func(ctx uintptr, wavPath string, outVec, outDim unsafe.Pointer) int32
+	CppEmbedPCM    func(ctx uintptr, pcm []float32, nSamples, sampleRate int32, outVec, outDim unsafe.Pointer) int32
+	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, outDistance, outVerified unsafe.Pointer) int32
+	CppAnalyzeJSON func(ctx uintptr, wavPath string) uintptr
+)
+
+// VoiceDetect implements the speaker-recognition voice subset of the Backend
+// gRPC service over libvoicedetect.so. The C side keeps a single loaded model
+// plus a per-ctx last-error buffer and is not reentrant, so base.SingleThread
+// serializes every call.
+type VoiceDetect struct {
+	base.SingleThread
+	opts   loadOptions
+	ctxPtr uintptr
+}
+
+func (v *VoiceDetect) Load(opts *pb.ModelOptions) error {
+	model := opts.ModelFile
+	if model == "" {
+		model = opts.ModelPath
+	}
+	if !filepath.IsAbs(model) && opts.ModelPath != "" {
+		model = filepath.Join(opts.ModelPath, model)
+	}
+	if model == "" {
+		return errors.New("voice-detect: ModelFile is required")
+	}
+
+	v.opts = parseOptions(opts.Options)
+	if v.opts.modelName == "" {
+		v.opts.modelName = filepath.Base(model)
+	}
+
+	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
+	// one backend process per model and serves requests concurrently, so the
+	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
+	// VOICEDETECT_THREADS is read by the engine at backend construction, so it
+	// must be set before the capi load. A non-positive Threads means "unset":
+	// leave the env alone so the engine keeps its sane default.
+	threads := opts.Threads
+	if threads > 0 {
+		if err := os.Setenv("VOICEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
+			return fmt.Errorf("voice-detect: set VOICEDETECT_THREADS: %w", err)
+		}
+		xlog.Info("voice-detect: applying LocalAI thread budget", "threads", threads)
+	}
+
+	xlog.Info("voice-detect: loading model", "model", model,
+		"verify_threshold", v.opts.verifyThreshold, "abi", CppAbiVersion())
+
+	ctx := CppLoad(model)
+	if ctx == 0 {
+		// The last-error buffer lives on the ctx that was never returned, so
+		// surface the path the operator tried to load instead.
+		return fmt.Errorf("voice-detect: voicedetect_capi_load failed for %q", model)
+	}
+	v.ctxPtr = ctx
+	return nil
+}
+
+// VoiceEmbed returns the L2-normalized speaker embedding for an audio clip.
+// The request carries a filesystem PATH; the HTTP layer materializes
+// base64/URL/data-URI inputs to a temp file before the gRPC call.
+func (v *VoiceDetect) VoiceEmbed(req *pb.VoiceEmbedRequest) (pb.VoiceEmbedResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio == "" {
+		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: audio path is required")
+	}
+	emb, err := v.embedPath(req.Audio)
+	if err != nil {
+		return pb.VoiceEmbedResponse{}, err
+	}
+	return pb.VoiceEmbedResponse{Embedding: emb, Model: v.opts.modelName}, nil
+}
+
+func (v *VoiceDetect) embedPath(path string) ([]float32, error) {
+	var vec uintptr
+	var dim int32
+	rc := CppEmbedPath(v.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
+	if rc != 0 || vec == 0 || dim <= 0 {
+		return nil, v.lastErr("embed", path)
+	}
+	defer CppFreeVec(vec)
+	// Copy out of the C-owned malloc'd buffer before freeing it. The
+	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
+	// nor moves this buffer and we copy immediately.
+	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
+	out := make([]float32, int(dim))
+	copy(out, src)
+	return out, nil
+}
+
+// VoiceVerify embeds two clips and reports whether they are the same speaker by
+// cosine distance against a threshold. A request threshold <= 0 falls back to
+// the model-configured default (verify_threshold option, 0.25 if unset).
+func (v *VoiceDetect) VoiceVerify(req *pb.VoiceVerifyRequest) (pb.VoiceVerifyResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio1 == "" || req.Audio2 == "" {
+		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: audio1 and audio2 are required")
+	}
+
+	threshold := req.Threshold
+	if threshold <= 0 {
+		threshold = v.opts.verifyThreshold
+	}
+
+	started := time.Now()
+	var distance float32
+	var verified int32
+	rc := CppVerifyPaths(v.ctxPtr, req.Audio1, req.Audio2, threshold,
+		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
+	if rc != 0 {
+		return pb.VoiceVerifyResponse{}, v.lastErr("verify", req.Audio1+","+req.Audio2)
+	}
+	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
+
+	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
+	// matching the Python speaker-recognition backend's reporting.
+	confidence := float32(0)
+	if threshold > 0 {
+		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
+	}
+
+	return pb.VoiceVerifyResponse{
+		Verified:         verified != 0,
+		Distance:         distance,
+		Threshold:        threshold,
+		Confidence:       confidence,
+		Model:            v.opts.modelName,
+		ProcessingTimeMs: elapsedMs,
+	}, nil
+}
+
+// VoiceAnalyze runs the age/gender/emotion heads on a single clip. The C-API
+// always evaluates every supported head, so the request's actions filter is
+// advisory and the full analysis is returned as a single segment (the engine
+// does not produce time-bounded segments).
+func (v *VoiceDetect) VoiceAnalyze(req *pb.VoiceAnalyzeRequest) (pb.VoiceAnalyzeResponse, error) {
+	if v.ctxPtr == 0 {
+		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: model not loaded")
+	}
+	if req.Audio == "" {
+		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: audio path is required")
+	}
+
+	ptr := CppAnalyzeJSON(v.ctxPtr, req.Audio)
+	if ptr == 0 {
+		return pb.VoiceAnalyzeResponse{}, v.lastErr("analyze", req.Audio)
+	}
+	defer CppFreeString(ptr)
+
+	seg, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
+	if err != nil {
+		return pb.VoiceAnalyzeResponse{}, fmt.Errorf("voice-detect: analyze JSON for %q: %w", req.Audio, err)
+	}
+	return pb.VoiceAnalyzeResponse{Segments: []*pb.VoiceAnalysis{seg}}, nil
+}
+
+// analyzeJSON mirrors the document returned by voicedetect_capi_analyze_path_json:
+//
+//	{"age":42.0,
+//	 "gender":{"label":"female","female":0.88,"male":0.12},
+//	 "emotion":{"label":"neutral","scores":{"neutral":0.7, ...}}}
+//
+// gender is a mixed object (a "label" string plus per-class float scores), so
+// it is decoded into raw messages and split in parseAnalyzeJSON.
+type analyzeJSON struct {
+	Age     float32                    `json:"age"`
+	Gender  map[string]json.RawMessage `json:"gender"`
+	Emotion struct {
+		Label  string             `json:"label"`
+		Scores map[string]float32 `json:"scores"`
+	} `json:"emotion"`
+}
+
+// parseAnalyzeJSON maps the engine's analyze document onto a VoiceAnalysis.
+// start/end stay 0: the model emits a single whole-utterance result, not
+// time-bounded segments.
+func parseAnalyzeJSON(doc string) (*pb.VoiceAnalysis, error) {
+	var a analyzeJSON
+	if err := json.Unmarshal([]byte(doc), &a); err != nil {
+		return nil, err
+	}
+
+	seg := &pb.VoiceAnalysis{
+		Age:             a.Age,
+		DominantEmotion: a.Emotion.Label,
+		Emotion:         a.Emotion.Scores,
+	}
+
+	if len(a.Gender) > 0 {
+		gender := make(map[string]float32, len(a.Gender))
+		for k, raw := range a.Gender {
+			if k == "label" {
+				_ = json.Unmarshal(raw, &seg.DominantGender)
+				continue
+			}
+			var score float32
+			if err := json.Unmarshal(raw, &score); err == nil {
+				gender[k] = score
+			}
+		}
+		seg.Gender = gender
+	}
+
+	return seg, nil
+}
+
+// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
+func (v *VoiceDetect) lastErr(op, subject string) error {
+	msg := strings.TrimSpace(CppLastError(v.ctxPtr))
+	if msg == "" {
+		msg = "no error detail"
+	}
+	return fmt.Errorf("voice-detect: %s failed for %q: %s", op, subject, msg)
+}
+
+// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
+// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
+//
+// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
+// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
+// moves the buffer and we dereference it immediately to copy the bytes out.
+func goStringFromCPtr(cptr uintptr) string {
+	if cptr == 0 {
+		return ""
+	}
+	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
+	n := 0
+	for *(*byte)(unsafe.Add(p, n)) != 0 {
+		n++
+	}
+	return string(unsafe.Slice((*byte)(p), n))
+}
--- a/backend/go/voice-detect/govoicedetect_test.go
+++ b/backend/go/voice-detect/govoicedetect_test.go
@@ -0,0 +1,144 @@
+package main
+
+import (
+	"os"
+	"sync"
+	"testing"
+
+	"github.com/ebitengine/purego"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+func TestVoiceDetect(t *testing.T) {
+	RegisterFailHandler(Fail)
+	RunSpecs(t, "voice-detect Backend Suite")
+}
+
+var (
+	libLoadOnce sync.Once
+	libLoadErr  error
+)
+
+// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
+// bridge without spinning up the gRPC server. If libvoicedetect.so cannot be
+// loaded (via LD_LIBRARY_PATH or a symlink in ./), the error is recorded and
+// returned so the end-to-end specs can skip themselves.
+func ensureLibLoaded() error {
+	libLoadOnce.Do(func() {
+		libName := os.Getenv("VOICEDETECT_LIBRARY")
+		if libName == "" {
+			libName = "libvoicedetect.so"
+		}
+		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+		if err != nil {
+			libLoadErr = err
+			return
+		}
+		purego.RegisterLibFunc(&CppAbiVersion, lib, "voicedetect_capi_abi_version")
+		purego.RegisterLibFunc(&CppLoad, lib, "voicedetect_capi_load")
+		purego.RegisterLibFunc(&CppFree, lib, "voicedetect_capi_free")
+		purego.RegisterLibFunc(&CppLastError, lib, "voicedetect_capi_last_error")
+		purego.RegisterLibFunc(&CppFreeString, lib, "voicedetect_capi_free_string")
+		purego.RegisterLibFunc(&CppFreeVec, lib, "voicedetect_capi_free_vec")
+		purego.RegisterLibFunc(&CppEmbedPath, lib, "voicedetect_capi_embed_path")
+		purego.RegisterLibFunc(&CppEmbedPCM, lib, "voicedetect_capi_embed_pcm")
+		purego.RegisterLibFunc(&CppVerifyPaths, lib, "voicedetect_capi_verify_paths")
+		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "voicedetect_capi_analyze_path_json")
+	})
+	return libLoadErr
+}
+
+var _ = Describe("parseOptions", func() {
+	It("defaults verify_threshold to 0.25", func() {
+		o := parseOptions(nil)
+		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
+		Expect(o.modelName).To(Equal(""))
+	})
+
+	It("parses verify_threshold, threshold alias and model_name", func() {
+		o := parseOptions([]string{"verify_threshold:0.4", "model_name:ecapa", "unknown:x"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
+		Expect(o.modelName).To(Equal("ecapa"))
+
+		o2 := parseOptions([]string{"threshold:0.3"})
+		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
+	})
+
+	It("ignores non-positive thresholds and keeps the default", func() {
+		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
+		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
+	})
+})
+
+var _ = Describe("parseAnalyzeJSON", func() {
+	It("maps age, gender label+scores and emotion label+scores", func() {
+		doc := `{"age":42.0,
+			"gender":{"label":"female","female":0.88,"male":0.12},
+			"emotion":{"label":"neutral","scores":{"neutral":0.7,"happy":0.2,"sad":0.1}}}`
+		seg, err := parseAnalyzeJSON(doc)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(seg.Age).To(BeNumerically("~", 42.0, 1e-4))
+		Expect(seg.Start).To(Equal(float32(0)))
+		Expect(seg.End).To(Equal(float32(0)))
+
+		Expect(seg.DominantGender).To(Equal("female"))
+		Expect(seg.Gender).To(HaveKeyWithValue("female", BeNumerically("~", 0.88, 1e-4)))
+		Expect(seg.Gender).To(HaveKeyWithValue("male", BeNumerically("~", 0.12, 1e-4)))
+		// The "label" entry is consumed into DominantGender, not the score map.
+		Expect(seg.Gender).ToNot(HaveKey("label"))
+
+		Expect(seg.DominantEmotion).To(Equal("neutral"))
+		Expect(seg.Emotion).To(HaveKeyWithValue("neutral", BeNumerically("~", 0.7, 1e-4)))
+		Expect(seg.Emotion).To(HaveKeyWithValue("happy", BeNumerically("~", 0.2, 1e-4)))
+	})
+
+	It("tolerates a missing gender block", func() {
+		seg, err := parseAnalyzeJSON(`{"age":30.0,"emotion":{"label":"happy","scores":{"happy":1.0}}}`)
+		Expect(err).ToNot(HaveOccurred())
+		Expect(seg.DominantGender).To(Equal(""))
+		Expect(seg.DominantEmotion).To(Equal("happy"))
+	})
+
+	It("returns an error on malformed JSON", func() {
+		_, err := parseAnalyzeJSON(`{not-json`)
+		Expect(err).To(HaveOccurred())
+	})
+})
+
+// The specs below exercise the real C-API end to end. They run only when both a
+// model GGUF and a test WAV are provided, and skip cleanly otherwise so the
+// suite stays green without large assets.
+var _ = Describe("VoiceDetect end-to-end", Ordered, func() {
+	var (
+		v         *VoiceDetect
+		modelPath = os.Getenv("VOICEDETECT_BACKEND_TEST_MODEL")
+		wavPath   = os.Getenv("VOICEDETECT_BACKEND_TEST_WAV")
+	)
+
+	BeforeAll(func() {
+		if modelPath == "" || wavPath == "" {
+			Skip("set VOICEDETECT_BACKEND_TEST_MODEL and VOICEDETECT_BACKEND_TEST_WAV to run the e2e specs")
+		}
+		if err := ensureLibLoaded(); err != nil {
+			Skip("libvoicedetect.so not loadable: " + err.Error())
+		}
+		v = &VoiceDetect{}
+		Expect(v.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
+	})
+
+	It("embeds an audio clip", func() {
+		resp, err := v.VoiceEmbed(&pb.VoiceEmbedRequest{Audio: wavPath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Embedding).ToNot(BeEmpty())
+		Expect(resp.Model).ToNot(BeEmpty())
+	})
+
+	It("verifies a clip against itself as the same speaker", func() {
+		resp, err := v.VoiceVerify(&pb.VoiceVerifyRequest{Audio1: wavPath, Audio2: wavPath})
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resp.Verified).To(BeTrue())
+		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
+	})
+})
--- a/backend/go/voice-detect/main.go
+++ b/backend/go/voice-detect/main.go
@@ -0,0 +1,64 @@
+package main
+
+// Started internally by LocalAI - one gRPC server per loaded model.
+//
+// Loads libvoicedetect.so via purego and registers the flat C-API entry points
+// declared in voicedetect_capi.h. The library name can be overridden with
+// VOICEDETECT_LIBRARY (mirrors the PARAKEET_LIBRARY / OMNIVOICE_LIBRARY
+// convention in the sibling backends); the default looks for the .so next to
+// this binary (resolved via LD_LIBRARY_PATH by run.sh).
+import (
+	"flag"
+	"fmt"
+	"os"
+
+	"github.com/ebitengine/purego"
+	grpc "github.com/mudler/LocalAI/pkg/grpc"
+)
+
+var (
+	addr = flag.String("addr", "localhost:50051", "the address to listen on")
+)
+
+type LibFuncs struct {
+	FuncPtr any
+	Name    string
+}
+
+func main() {
+	libName := os.Getenv("VOICEDETECT_LIBRARY")
+	if libName == "" {
+		libName = "libvoicedetect.so"
+	}
+
+	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
+	if err != nil {
+		panic(fmt.Errorf("voice-detect: dlopen %q: %w", libName, err))
+	}
+
+	// Bound 1:1 to voicedetect_capi.h. Heap buffers the engine hands back
+	// (char*/float*) are carried as uintptr so they can be freed via the matching capi free fn.
+	libFuncs := []LibFuncs{
+		{&CppAbiVersion, "voicedetect_capi_abi_version"},
+		{&CppLoad, "voicedetect_capi_load"},
+		{&CppFree, "voicedetect_capi_free"},
+		{&CppLastError, "voicedetect_capi_last_error"},
+		{&CppFreeString, "voicedetect_capi_free_string"},
+		{&CppFreeVec, "voicedetect_capi_free_vec"},
+		{&CppEmbedPath, "voicedetect_capi_embed_path"},
+		{&CppEmbedPCM, "voicedetect_capi_embed_pcm"},
+		{&CppVerifyPaths, "voicedetect_capi_verify_paths"},
+		{&CppAnalyzeJSON, "voicedetect_capi_analyze_path_json"},
+	}
+	for _, lf := range libFuncs {
+		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
+	}
+
+	fmt.Fprintf(os.Stderr, "[voice-detect] ABI=%d\n", CppAbiVersion())
+
+	flag.Parse()
+
+	if err := grpc.StartServer(*addr, &VoiceDetect{}); err != nil {
+		panic(err)
+	}
+}
--- a/backend/go/voice-detect/options.go
+++ b/backend/go/voice-detect/options.go
@@ -0,0 +1,46 @@
+package main
+
+import (
+	"strconv"
+	"strings"
+)
+
+// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
+// not set one. Matches the Python speaker-recognition backend's default so the
+// two implementations agree on verdicts out of the box.
+const defaultVerifyThreshold float32 = 0.25
+
+// loadOptions holds the parsed model-level options for voice-detect.
+type loadOptions struct {
+	verifyThreshold float32
+	modelName       string
+}
+
+func splitOption(o string) (key, value string, ok bool) {
+	i := strings.Index(o, ":")
+	if i < 0 {
+		return "", "", false
+	}
+	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
+}
+
+// parseOptions reads the backend "key:value" option slice. Unknown keys are
+// ignored. Defaults: verify_threshold 0.25, model_name empty (left to the caller).
+func parseOptions(opts []string) loadOptions {
+	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
+	for _, oo := range opts {
+		key, value, ok := splitOption(oo)
+		if !ok {
+			continue
+		}
+		switch key {
+		case "verify_threshold", "threshold":
+			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
+				o.verifyThreshold = float32(f)
+			}
+		case "model_name":
+			o.modelName = value
+		}
+	}
+	return o
+}
--- a/backend/go/voice-detect/package.sh
+++ b/backend/go/voice-detect/package.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+#
+# Bundle the voice-detect-grpc binary, libvoicedetect.so, the core runtime libs
+# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
+# so the package is self-contained. Mirrors backend/go/parakeet-cpp/package.sh;
+# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
+# is used instead of the host's.
+
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+REPO_ROOT="${CURDIR}/../../.."
+
+mkdir -p "$CURDIR/package/lib"
+
+cp -avf "$CURDIR/voice-detect-grpc" "$CURDIR/package/"
+cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
+
+# libvoicedetect.so + any soname symlinks. purego.Dlopen resolves it via
+# LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libvoicedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libvoicedetect.so not found in $CURDIR, run 'make' first" >&2
+	exit 1
+}
+
+# Detect architecture and copy the core runtime libs libvoicedetect.so links
+# against, plus the matching dynamic loader as lib/ld.so.
+if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
+    echo "Detected x86_64 architecture, copying x86_64 libraries..."
+    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
+    echo "Detected ARM64 architecture, copying ARM64 libraries..."
+    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
+    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
+    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
+    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
+    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
+elif [ "$(uname -s)" = "Darwin" ]; then
+    echo "Detected Darwin"
+else
+    echo "Error: Could not detect architecture" >&2
+    exit 1
+fi
+
+# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
+# BUILD_TYPE so the backend can reach the GPU without the runtime base image
+# shipping those drivers.
+GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
+if [ -f "$GPU_LIB_SCRIPT" ]; then
+    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
+    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
+    package_gpu_libs
+fi
+
+echo "Packaging completed successfully"
+ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/voice-detect/run.sh
+++ b/backend/go/voice-detect/run.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+
+# If a self-contained ld.so was packaged, route through it so the packaged
+# libc / libstdc++ are used instead of the host's (matches the whisper /
+# parakeet backends' runtime layout).
+if [ -f "$CURDIR/lib/ld.so" ]; then
+	echo "Using lib/ld.so"
+	exec "$CURDIR/lib/ld.so" "$CURDIR/voice-detect-grpc" "$@"
+fi
+
+exec "$CURDIR/voice-detect-grpc" "$@"
--- a/backend/go/voice-detect/test.sh
+++ b/backend/go/voice-detect/test.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+set -e
+
+CURDIR=$(dirname "$(realpath "$0")")
+cd "$CURDIR"
+
+echo "Running voice-detect backend tests..."
+
+# The pure-Go parsing specs always run. The embed/verify end-to-end specs run
+# only when a model + WAV are provided via VOICEDETECT_BACKEND_TEST_MODEL and
+# VOICEDETECT_BACKEND_TEST_WAV; otherwise they auto-skip.
+LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
+
+echo "voice-detect tests completed."
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -209,6 +209,78 @@
    nvidia-cuda-12: "cuda12-ced"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced"
+- &voicedetect
+  name: "voice-detect"
+  alias: "voice-detect"
+  license: mit
+  icon: https://avatars.githubusercontent.com/u/95302084
+  description: |
+    voice-detect speaker recognition and voice analysis.
+    voice-detect.cpp is a C++/ggml engine that produces L2-normalised
+    speaker embeddings (ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker
+    ERes2Net, CAM++) for voice verification and 1:N identification, plus
+    a wav2vec2 age / gender / emotion analysis head. It replaces the
+    Python speaker-recognition backend and is exposed through the Voice*
+    gRPC rpcs and the /v1/voice/* REST endpoints. It runs on CPU, NVIDIA
+    CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - CPU
+    - GPU
+    - CUDA
+    - HIP
+  capabilities:
+    default: "cpu-voice-detect"
+    nvidia: "cuda12-voice-detect"
+    intel: "intel-sycl-f16-voice-detect"
+    metal: "metal-voice-detect"
+    amd: "rocm-voice-detect"
+    vulkan: "vulkan-voice-detect"
+    nvidia-l4t: "nvidia-l4t-arm64-voice-detect"
+    nvidia-cuda-13: "cuda13-voice-detect"
+    nvidia-cuda-12: "cuda12-voice-detect"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect"
+- &facedetect
+  name: "face-detect"
+  alias: "face-detect"
+  license: mit
+  icon: https://avatars.githubusercontent.com/u/95302084
+  description: |
+    face-detect face detection, embedding, verification and analysis.
+    face-detect.cpp is a C++/ggml engine that runs SCRFD / YuNet face
+    detection and ArcFace / SFace 512-d (or 128-d) L2-normalised face
+    embeddings for verification and 1:N identification, plus a landmark /
+    age / gender analysis head. It replaces the Python insightface backend
+    and is exposed through the Embedding, Detect and Face* gRPC rpcs and
+    the /v1/face/* REST endpoints. It runs on CPU, NVIDIA CUDA, AMD
+    ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - CPU
+    - GPU
+    - CUDA
+    - HIP
+  capabilities:
+    default: "cpu-face-detect"
+    nvidia: "cuda12-face-detect"
+    intel: "intel-sycl-f16-face-detect"
+    metal: "metal-face-detect"
+    amd: "rocm-face-detect"
+    vulkan: "vulkan-face-detect"
+    nvidia-l4t: "nvidia-l4t-arm64-face-detect"
+    nvidia-cuda-13: "cuda13-face-detect"
+    nvidia-cuda-12: "cuda12-face-detect"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect"
 - &voxtral
  name: "voxtral"
  alias: "voxtral"
@@ -2796,6 +2868,236 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-ced
+## voice-detect
+- !!merge <<: *voicedetect
+  name: "voice-detect-development"
+  capabilities:
+    default: "cpu-voice-detect-development"
+    nvidia: "cuda12-voice-detect-development"
+    intel: "intel-sycl-f16-voice-detect-development"
+    metal: "metal-voice-detect-development"
+    amd: "rocm-voice-detect-development"
+    vulkan: "vulkan-voice-detect-development"
+    nvidia-l4t: "nvidia-l4t-arm64-voice-detect-development"
+    nvidia-cuda-13: "cuda13-voice-detect-development"
+    nvidia-cuda-12: "cuda12-voice-detect-development"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect-development"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect-development"
+- !!merge <<: *voicedetect
+  name: "nvidia-l4t-arm64-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "nvidia-l4t-arm64-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-nvidia-l4t-arm64-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-nvidia-l4t-arm64-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cpu-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-cpu-voice-detect
+- !!merge <<: *voicedetect
+  name: "cpu-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-cpu-voice-detect
+- !!merge <<: *voicedetect
+  name: "metal-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "metal-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda12-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda12-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-voice-detect
+- !!merge <<: *voicedetect
+  name: "rocm-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-rocm-hipblas-voice-detect
+- !!merge <<: *voicedetect
+  name: "rocm-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-rocm-hipblas-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f32-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f32-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f32-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f32-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f16-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f16-voice-detect
+- !!merge <<: *voicedetect
+  name: "intel-sycl-f16-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f16-voice-detect
+- !!merge <<: *voicedetect
+  name: "vulkan-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-vulkan-voice-detect
+- !!merge <<: *voicedetect
+  name: "vulkan-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-vulkan-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-voice-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voice-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-13-voice-detect
+- !!merge <<: *voicedetect
+  name: "cuda13-voice-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voice-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-13-voice-detect
+## face-detect
+- !!merge <<: *facedetect
+  name: "face-detect-development"
+  capabilities:
+    default: "cpu-face-detect-development"
+    nvidia: "cuda12-face-detect-development"
+    intel: "intel-sycl-f16-face-detect-development"
+    metal: "metal-face-detect-development"
+    amd: "rocm-face-detect-development"
+    vulkan: "vulkan-face-detect-development"
+    nvidia-l4t: "nvidia-l4t-arm64-face-detect-development"
+    nvidia-cuda-13: "cuda13-face-detect-development"
+    nvidia-cuda-12: "cuda12-face-detect-development"
+    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect-development"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect-development"
+- !!merge <<: *facedetect
+  name: "nvidia-l4t-arm64-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "nvidia-l4t-arm64-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-nvidia-l4t-arm64-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-nvidia-l4t-arm64-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cpu-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-cpu-face-detect
+- !!merge <<: *facedetect
+  name: "cpu-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-face-detect"
+  mirrors:
+    - localai/localai-backends:master-cpu-face-detect
+- !!merge <<: *facedetect
+  name: "metal-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "metal-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-face-detect"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-face-detect
+- !!merge <<: *facedetect
+  name: "cuda12-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-face-detect
+- !!merge <<: *facedetect
+  name: "cuda12-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-face-detect
+- !!merge <<: *facedetect
+  name: "rocm-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-rocm-hipblas-face-detect
+- !!merge <<: *facedetect
+  name: "rocm-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-rocm-hipblas-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f32-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f32-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f32-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f32-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f16-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-intel-sycl-f16-face-detect
+- !!merge <<: *facedetect
+  name: "intel-sycl-f16-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-intel-sycl-f16-face-detect
+- !!merge <<: *facedetect
+  name: "vulkan-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-vulkan-face-detect
+- !!merge <<: *facedetect
+  name: "vulkan-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-vulkan-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-face-detect"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-face-detect"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-13-face-detect
+- !!merge <<: *facedetect
+  name: "cuda13-face-detect-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-face-detect"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-13-face-detect
 ## stablediffusion-ggml
 - !!merge <<: *stablediffusionggml
  name: "cpu-stablediffusion-ggml"
--- a/backend/python/diffusers/requirements-cpu.txt
+++ b/backend/python/diffusers/requirements-cpu.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 torchvision==0.22.1
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,15 +10,9 @@ sentencepiece
 torch==2.7.1
 optimum-quanto
 ftfy
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-cublas12.txt
+++ b/backend/python/diffusers/requirements-cublas12.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu121
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 torchvision
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,15 +10,9 @@ sentencepiece
 torch
 ftfy
 optimum-quanto
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-cublas13.txt
+++ b/backend/python/diffusers/requirements-cublas13.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 torchvision
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,15 +10,9 @@ sentencepiece
 torch
 ftfy
 optimum-quanto
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-hipblas.txt
+++ b/backend/python/diffusers/requirements-hipblas.txt
@@ -1,23 +1,17 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
 torch==2.10.0+rocm7.0
 torchvision==0.25.0+rocm7.0
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 accelerate
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-intel.txt
+++ b/backend/python/diffusers/requirements-intel.txt
@@ -3,24 +3,18 @@ torch
 torchvision
 optimum[openvino]
 setuptools
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 accelerate
 git+https://github.com/xhinker/sd_embed
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-l4t12.txt
+++ b/backend/python/diffusers/requirements-l4t12.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129/
 torch
-diffusers==0.38.0
-transformers==4.57.6
+git+https://github.com/huggingface/diffusers
+transformers
 accelerate
 peft
 optimum-quanto
@@ -9,15 +9,9 @@ numpy<2
 sentencepiece
 torchvision
 ftfy
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-l4t13.txt
+++ b/backend/python/diffusers/requirements-l4t13.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
 torch
-diffusers==0.38.0
-transformers==4.57.6
+git+https://github.com/huggingface/diffusers
+transformers
 accelerate
 peft
 optimum-quanto
@@ -10,15 +10,9 @@ sentencepiece
 torchvision
 ftfy
 chardet
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-mps.txt
+++ b/backend/python/diffusers/requirements-mps.txt
@@ -1,22 +1,16 @@
 torch==2.7.1
 torchvision==0.22.1
-diffusers==0.38.0
+git+https://github.com/huggingface/diffusers
 opencv-python
-transformers==4.57.6
+transformers
 accelerate
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# diffusers and transformers are pinned together on purpose. transformers v5
-# restructured CLIPTextModel and dropped the `.text_model` attribute, which
-# breaks single-file Stable Diffusion loading on every released diffusers
-# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
-# main via git froze whichever broken pair existed at image-build time. Pin the
-# last known-good released pair so builds are reproducible and can't drift into
-# the broken window. See https://github.com/mudler/LocalAI/issues/9979
-#
-# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
-# with this pin and previously forced pip into multi-hour resolver backtracking
-# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
-# the import succeeding, so dropping it here is safe.
+# TODO: re-add compel once it supports transformers >= 5.
+# Tracking: https://github.com/damian0815/compel/pull/129
+#           https://github.com/damian0815/compel/issues/128
+# compel currently pins transformers~=4.25, which forced pip into multi-hour
+# resolver backtracking storms in CI. backend.py imports it lazily and gates
+# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
--- a/backend/python/vllm/requirements-cpu.txt
+++ b/backend/python/vllm/requirements-cpu.txt
@@ -1,6 +1,6 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 accelerate
-torch==2.12.1+xpu
+torch==2.9.1+cpu
 torchvision
 torchaudio
 transformers
--- a/core/application/application.go
+++ b/core/application/application.go
@@ -341,9 +341,11 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	}
 	appCfg := a.ApplicationConfig()

-	// PIIIsEnabled already encodes "explicit pii.enabled wins, else backend
-	// default (cloud-proxy)" — the single source of that rule.
-	enabled = cfg.PIIIsEnabled()
+	if cfg.PII.Enabled != nil {
+		enabled = *cfg.PII.Enabled
+	} else {
+		enabled = cfg.PIIIsEnabled() // backend default (cloud-proxy)
+	}
 	if !enabled {
 		return false, nil
 	}
@@ -352,7 +354,7 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	if len(detectors) == 0 {
 		detectors = append([]string(nil), appCfg.PIIDefaultDetectors...)
 	}
-	return true, detectors // enabled is necessarily true past the !enabled guard
+	return enabled, detectors
 }

 // PIIPolicyResolver adapts ResolvePIIPolicy to pii.PolicyResolver for
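Note on the `ResolvePIIPolicy` hunk above: it restores an explicit nil check on `cfg.PII.Enabled` before falling back to `cfg.PIIIsEnabled()`. The precedence can be sketched in isolation as below. This is a minimal illustration, not the project's code: `piiSettings`, `modelConfig`, `backendDefault` and `resolveEnabled` are hypothetical names, and treating "cloud-proxy" as the only backend that is on by default is an assumption drawn from the inline comment.

```go
package main

import "fmt"

// piiSettings and modelConfig are hypothetical stand-ins for the real
// config types; only the optional Enabled flag matters here.
type piiSettings struct {
	Enabled *bool // nil means "not set in the model config"
}

type modelConfig struct {
	Backend string
	PII     piiSettings
}

// backendDefault is an assumption for illustration: the diff comment only
// says the fallback is the backend default (cloud-proxy).
func backendDefault(cfg modelConfig) bool {
	return cfg.Backend == "cloud-proxy"
}

// resolveEnabled applies the precedence shown in the hunk: an explicit
// value wins, otherwise fall back to the backend default.
func resolveEnabled(cfg modelConfig) bool {
	if cfg.PII.Enabled != nil {
		return *cfg.PII.Enabled
	}
	return backendDefault(cfg)
}

func main() {
	off := false
	// No explicit setting: the backend default decides.
	fmt.Println(resolveEnabled(modelConfig{Backend: "cloud-proxy"}))
	// Explicit false overrides the backend default.
	fmt.Println(resolveEnabled(modelConfig{Backend: "cloud-proxy", PII: piiSettings{Enabled: &off}}))
	// No explicit setting on a backend without a default.
	fmt.Println(resolveEnabled(modelConfig{Backend: "llama-cpp"}))
}
```

A pointer is used for the flag so that "unset" stays distinguishable from an explicit `false`, which a plain `bool` cannot express.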
--- a/core/application/distributed.go
+++ b/core/application/distributed.go
@@ -357,15 +357,6 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
 		Pressure:         pressure,
 	})

-	// Wire staging-progress broadcasting so file-staging shows up on every
-	// replica, not just the one performing the transfer. Without this, a
-	// /api/operations poll that round-robins onto a peer sees no staging row and
-	// the progress flickers. The origin publishes; peers mirror via the wildcard.
-	router.StagingTracker().SetPublisher(natsClient)
-	if _, err := router.StagingTracker().SubscribeBroadcasts(natsClient); err != nil {
-		xlog.Warn("Failed to subscribe to staging progress broadcasts", "error", err)
-	}
-
 	// Create ReplicaReconciler for auto-scaling model replicas. Adapter +
 	// RegistrationToken feed the state-reconciliation passes: pending op
 	// drain uses the adapter, and model health probes use the token to auth
--- a/core/config/backend_capabilities.go
+++ b/core/config/backend_capabilities.go
@@ -542,6 +542,19 @@ var BackendCapabilities = map[string]BackendCapability{
 		DefaultUsecases:  []string{UsecaseSpeakerRecognition},
 		Description:      "Speaker recognition — voice identity verification and analysis",
 	},
+	"voice-detect": {
+		GRPCMethods:      []GRPCMethod{MethodVoiceVerify, MethodVoiceEmbed, MethodVoiceAnalyze},
+		PossibleUsecases: []string{UsecaseSpeakerRecognition},
+		DefaultUsecases:  []string{UsecaseSpeakerRecognition},
+		Description:      "voice-detect.cpp: C++/ggml speaker embedding, verification and voice analysis (age/gender/emotion)",
+	},
+	"face-detect": {
+		GRPCMethods:      []GRPCMethod{MethodEmbedding, MethodDetect, MethodFaceVerify, MethodFaceAnalyze},
+		PossibleUsecases: []string{UsecaseEmbeddings, UsecaseDetection, UsecaseFaceRecognition},
+		DefaultUsecases:  []string{UsecaseFaceRecognition},
+		AcceptsImages:    true,
+		Description:      "face-detect.cpp: C++/ggml face detection, embedding, verification and attribute analysis",
+	},
 	"silero-vad": {
 		GRPCMethods:      []GRPCMethod{MethodVAD},
 		PossibleUsecases: []string{UsecaseVAD},
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -385,23 +385,6 @@ func GetNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	}
 }

-// ListAllNodeModelsEndpoint returns all loaded models across all healthy nodes.
-// @Summary List all loaded models cluster-wide
-// @Tags Nodes
-// @Success 200 {array} nodes.NodeModel
-// @Router /api/nodes/models [get]
-func ListAllNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
-	return func(c echo.Context) error {
-		ctx := c.Request().Context()
-		models, err := registry.ListAllLoadedModels(ctx)
-		if err != nil {
-			xlog.Error("Failed to list all node models", "error", err)
-			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to list node models"))
-		}
-		return c.JSON(http.StatusOK, models)
-	}
-}
-
 // DrainNodeEndpoint sets a node to draining status (no new requests).
 func DrainNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	return func(c echo.Context) error {
--- a/core/http/endpoints/localai/nodes_test.go
+++ b/core/http/endpoints/localai/nodes_test.go
@@ -407,44 +407,4 @@ var _ = Describe("Node HTTP handlers", func() {
 			Expect(names).To(ConsistOf("alpha", "beta"))
 		})
 	})
-
-	Describe("ListAllNodeModelsEndpoint", func() {
-		It("returns an empty list when no models are loaded", func() {
-			e := echo.New()
-			req := httptest.NewRequest(http.MethodGet, "/", nil)
-			rec := httptest.NewRecorder()
-			c := e.NewContext(req, rec)
-
-			handler := ListAllNodeModelsEndpoint(registry)
-			Expect(handler(c)).To(Succeed())
-			Expect(rec.Code).To(Equal(http.StatusOK))
-
-			var list []nodes.NodeModel
-			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
-			Expect(list).To(BeEmpty())
-		})
-
-		It("returns loaded models across healthy nodes", func() {
-			ctx := context.Background()
-			Expect(registry.Register(ctx, &nodes.BackendNode{
-				ID: "n1", Name: "alpha", Address: "10.0.0.1:50051", Status: nodes.StatusHealthy,
-			}, true)).To(Succeed())
-			Expect(registry.SetNodeModel(ctx, "n1", "llama-3.3", 0, "loaded", "10.0.0.1:50051", 0)).To(Succeed())
-
-			e := echo.New()
-			req := httptest.NewRequest(http.MethodGet, "/", nil)
-			rec := httptest.NewRecorder()
-			c := e.NewContext(req, rec)
-
-			handler := ListAllNodeModelsEndpoint(registry)
-			Expect(handler(c)).To(Succeed())
-			Expect(rec.Code).To(Equal(http.StatusOK))
-
-			var list []nodes.NodeModel
-			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
-			Expect(list).To(HaveLen(1))
-			Expect(list[0].ModelName).To(Equal("llama-3.3"))
-			Expect(list[0].NodeID).To(Equal("n1"))
-		})
-	})
 })
--- a/core/http/react-ui/e2e/model-config.spec.js
+++ b/core/http/react-ui/e2e/model-config.spec.js
@@ -288,21 +288,6 @@ test.describe('Model Editor - Interactive Tab', () => {
    await expect(page.locator('input[placeholder^="match,"]')).toBeVisible()
  })

-  test('pattern min_len clamps a directly-typed negative to 0', async ({ page }) => {
-    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
-    await searchInput.fill('Custom Secret Patterns')
-    const dropdown = searchInput.locator('..').locator('..')
-    await dropdown.locator('div', { hasText: 'Custom Secret Patterns' }).first().click()
-
-    await page.locator('button', { hasText: 'Add pattern' }).click()
-    // The number input's min={0} only limits the spinner arrows, not keyboard
-    // entry; the editor must sanitise a typed negative so a meaningless
-    // negative length floor never reaches the saved config.
-    const minLen = page.locator('input[aria-label="Minimum length"]')
-    await minLen.fill('-5')
-    await expect(minLen).toHaveValue('0')
-  })
-
  // Regression: a map-typed field (entity_actions) present in the loaded YAML
  // must render WITH its values. flattenConfig used to recurse into the map,
  // scattering it across pii_detection.entity_actions.<GROUP> paths that match
@@ -344,37 +329,4 @@ test.describe('Model Editor - Interactive Tab', () => {
    await expect(page.getByText(/block —/i).first()).toBeVisible()
  })

-  // A map cannot hold two values for one key, so renaming a row to an existing
-  // group must collapse to a single row (Object.fromEntries, last write wins)
-  // rather than rendering two conflicting rows that silently lose one on save.
-  test('entity_actions collapses a duplicate group to a single row', async ({ page }) => {
-    await page.route('**/api/models/edit/ner-model', (route) => {
-      route.fulfill({
-        contentType: 'application/json',
-        body: JSON.stringify({
-          name: 'ner-model',
-          config: [
-            'name: ner-model',
-            'backend: llama-cpp',
-            'pii_detection:',
-            '    entity_actions:',
-            '        SSN: block',
-            '        EMAIL: mask',
-            '',
-          ].join('\n'),
-        }),
-      })
-    })
-
-    await page.goto('/app/model-editor/ner-model')
-
-    const groupInputs = page.locator('input[aria-label="Entity group"]')
-    await expect(groupInputs).toHaveCount(2)
-
-    // Rename the EMAIL row to duplicate SSN; the editor collapses to one SSN row.
-    await groupInputs.nth(1).fill('SSN')
-    await expect(groupInputs).toHaveCount(1)
-    await expect(groupInputs.nth(0)).toHaveValue('SSN')
-  })
-
 })
--- a/core/http/react-ui/e2e/nodes-detail.spec.js
+++ b/core/http/react-ui/e2e/nodes-detail.spec.js
@@ -1,34 +0,0 @@
-import { test, expect } from './coverage-fixtures.js'
-
-const ID = 'n1'
-async function mockNode(page) {
-  await page.route(`**/api/nodes/${ID}`, r => r.fulfill({ status: 200, contentType: 'application/json',
-    body: JSON.stringify({ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy', total_vram: 24e9, available_vram: 12e9, max_replicas_per_model: 1, labels: { env: 'prod' } }) }))
-  await page.route(`**/api/nodes/${ID}/models`, r => r.fulfill({ status: 200, contentType: 'application/json',
-    body: JSON.stringify([{ node_id: ID, model_name: 'llama-3.3', state: 'loaded', in_flight: 0, replica_index: 0 }]) }))
-  await page.route(`**/api/nodes/${ID}/backends`, r => r.fulfill({ status: 200, contentType: 'application/json',
-    body: JSON.stringify([{ name: 'llama-cpp', is_system: true, installed_at: '2026-06-01T00:00:00Z' }]) }))
-}
-
-test.describe('Node detail page', () => {
-  test('renders sections for a node', async ({ page }) => {
-    await mockNode(page)
-    await page.goto(`/app/nodes/${ID}`)
-    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
-    await expect(page.getByText('alpha')).toBeVisible()
-    await expect(page.getByText('llama-3.3')).toBeVisible()
-    await expect(page.getByText('llama-cpp')).toBeVisible()
-    await expect(page.getByText('env=prod')).toBeVisible()
-  })
-
-  test('is reachable by clicking a roster panel', async ({ page }) => {
-    await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json',
-      body: JSON.stringify([{ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' }]) }))
-    await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
-    await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
-    await mockNode(page)
-    await page.goto('/app/nodes')
-    await page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('alpha').click()
-    await expect(page).toHaveURL(new RegExp(`/app/nodes/${ID}$`))
-  })
-})
--- a/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
+++ b/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
@@ -12,37 +12,28 @@ const NODE_NAME = 'worker-test'
 const BACKEND_NAME = 'cuda12-vllm-development'

 async function mockDistributedNodes(page, { onDelete } = {}) {
-  const nodeRecord = {
-    id: NODE_ID,
-    name: NODE_NAME,
-    node_type: 'backend',
-    address: '10.0.0.1:50051',
-    http_address: '10.0.0.1:8090',
-    status: 'healthy',
-    total_vram: 0,
-    available_vram: 0,
-    total_ram: 8_000_000_000,
-    available_ram: 4_000_000_000,
-    gpu_vendor: '',
-    last_heartbeat: new Date().toISOString(),
-    created_at: new Date().toISOString(),
-    updated_at: new Date().toISOString(),
-  }
-
  await page.route('**/api/nodes', (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
-      body: JSON.stringify([nodeRecord]),
-    })
-  })
-
-  // The detail page fetches the single node via nodesApi.get(id).
-  await page.route(`**/api/nodes/${NODE_ID}`, (route) => {
-    route.fulfill({
-      status: 200,
-      contentType: 'application/json',
-      body: JSON.stringify(nodeRecord),
+      body: JSON.stringify([
+        {
+          id: NODE_ID,
+          name: NODE_NAME,
+          node_type: 'backend',
+          address: '10.0.0.1:50051',
+          http_address: '10.0.0.1:8090',
+          status: 'healthy',
+          total_vram: 0,
+          available_vram: 0,
+          total_ram: 8_000_000_000,
+          available_ram: 4_000_000_000,
+          gpu_vendor: '',
+          last_heartbeat: new Date().toISOString(),
+          created_at: new Date().toISOString(),
+          updated_at: new Date().toISOString(),
+        },
+      ]),
    })
  })

@@ -89,18 +80,24 @@ async function mockDistributedNodes(page, { onDelete } = {}) {
  })
 }

-async function openNodeDetail(page) {
-  // The per-node backend table now lives on the deep-linkable detail page
-  // at /app/nodes/:id (the old expand-row + "Manage" disclosure was removed
-  // when the roster was restructured). Navigate straight there.
-  await page.goto(`/app/nodes/${NODE_ID}`)
+async function expandNodeAndWaitForBackends(page) {
+  await page.goto('/app/nodes')
+  // Click the row to expand it. The chevron toggle and the row both work,
+  // but clicking the name cell is the most user-like.
+  await page.getByText(NODE_NAME).first().click()
+  // Backends, Capacity and Labels live behind a "Manage" <details>
+  // disclosure (the drawer was distilled to keep at-a-glance content
+  // lean — see distill refactor in the multi-replica branch). Open it
+  // by clicking the summary inside the .node-manage scope so the
+  // per-node backend table is in the DOM before assertions run.
+  await page.locator('.node-manage > summary').first().click()
  await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
 }

 test.describe('Nodes page — per-node backend actions', () => {
  test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
    await mockDistributedNodes(page)
-    await openNodeDetail(page)
+    await expandNodeAndWaitForBackends(page)

    // Negative: the old, ambiguous wording must not be used.
    await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
@@ -117,7 +114,7 @@ test.describe('Nodes page — per-node backend actions', () => {

  test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
    await mockDistributedNodes(page)
-    await openNodeDetail(page)
+    await expandNodeAndWaitForBackends(page)

    const deleteBtn = page.locator('button[title="Delete backend from this node"]')
    await expect(deleteBtn).toBeVisible()
@@ -131,7 +128,7 @@ test.describe('Nodes page — per-node backend actions', () => {
        postedBody = route.request().postDataJSON()
      },
    })
-    await openNodeDetail(page)
+    await expandNodeAndWaitForBackends(page)

    await page.locator('button[title="Delete backend from this node"]').click()

@@ -153,7 +150,7 @@ test.describe('Nodes page — per-node backend actions', () => {
        deleteCalls += 1
      },
    })
-    await openNodeDetail(page)
+    await expandNodeAndWaitForBackends(page)

    await page.locator('button[title="Delete backend from this node"]').click()

--- a/core/http/react-ui/e2e/nodes-roster.spec.js
+++ b/core/http/react-ui/e2e/nodes-roster.spec.js
@@ -1,47 +0,0 @@
-import { test, expect } from './coverage-fixtures.js'
-
-async function mockCluster(page, nodes) {
-  await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify(nodes) }))
-  await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
-  await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
-}
-
-test.describe('Nodes roster header', () => {
-  test('shows a cluster pulse line and no stat-card grid', async ({ page }) => {
-    await mockCluster(page, [
-      { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' },
-      { id: 'n2', name: 'beta', node_type: 'backend', address: '10.0.0.2:50051', status: 'draining' },
-    ])
-    await page.goto('/app/nodes')
-    await expect(page.locator('.cluster-pulse')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.cluster-pulse')).toContainText('2 nodes')
-    await expect(page.locator('.stat-grid')).toHaveCount(0)
-  })
-
-  test('shows an approval callout for pending nodes', async ({ page }) => {
-    await mockCluster(page, [{ id: 'n3', name: 'gamma', node_type: 'backend', address: '10.0.0.3:50051', status: 'pending' }])
-    await page.goto('/app/nodes')
-    await expect(page.locator('.attention-callout')).toContainText('approval', { timeout: 15_000 })
-  })
-})
-
-test.describe('Nodes roster panels', () => {
-  test('shows model chips without clicking and filters by type', async ({ page }) => {
-    await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([
-      { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' },
-      { id: 'a1', name: 'agent-1', node_type: 'agent', address: '10.0.0.9:50051', status: 'healthy' },
-    ]) }))
-    await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([
-      { node_id: 'n1', model_name: 'llama-3.3', state: 'loaded', in_flight: 2, replica_index: 0 },
-    ]) }))
-    await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
-
-    await page.goto('/app/nodes')
-    // model chip visible without any expand click
-    await expect(page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('llama-3.3')).toBeVisible({ timeout: 15_000 })
-    // segmented filter: Agent shows the agent node, hides the backend node
-    await page.getByRole('radio', { name: /Agent/ }).click()
-    await expect(page.getByText('agent-1')).toBeVisible()
-    await expect(page.getByText('alpha')).toHaveCount(0)
-  })
-})
--- a/core/http/react-ui/e2e/page-render-smoke.spec.js
+++ b/core/http/react-ui/e2e/page-render-smoke.spec.js
@@ -21,7 +21,6 @@ const PAGES = [
  ['/app/backends', 'Backends'],
  ['/app/settings', 'Settings'],
  ['/app/nodes', 'Nodes'],
-  ['/app/scheduling', 'Scheduling'],
  ['/app/face', 'Face recognition'],
  ['/app/voice', 'Voice recognition'],
  ['/app/fine-tune', 'Fine-tuning'],
--- a/core/http/react-ui/e2e/scheduling.spec.js
+++ b/core/http/react-ui/e2e/scheduling.spec.js
@@ -1,16 +0,0 @@
-import { test, expect } from './coverage-fixtures.js'
-
-test.describe('Scheduling page', () => {
-  test('renders at /app/scheduling with rules from the API', async ({ page }) => {
-    await page.route('**/api/nodes/scheduling', (route) => {
-      route.fulfill({
-        status: 200, contentType: 'application/json',
-        body: JSON.stringify([{ model_name: 'llama-3.3', spread_all: true, min_replicas: 0, max_replicas: 0 }]),
-      })
-    })
-    await page.goto('/app/scheduling')
-    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
-    await expect(page).toHaveURL(/\/app\/scheduling$/)
-    await expect(page.getByText('llama-3.3')).toBeVisible()
-  })
-})
--- a/core/http/react-ui/public/locales/de/admin.json
+++ b/core/http/react-ui/public/locales/de/admin.json
@@ -43,10 +43,6 @@
    "title": "Verteilte Knoten",
    "subtitle": "Backend- und Agenten-Worker-Knoten verwalten"
  },
-  "scheduling": {
-    "title": "Planung",
-    "subtitle": "Modellplatzierung und Replikat-Regeln im gesamten Cluster"
-  },
  "p2p": {
    "title": "Verteilte KI-Berechnung",
    "subtitle": "Skalieren Sie Ihre KI-Workloads über mehrere Geräte mit Peer-to-Peer-Verteilung"
--- a/core/http/react-ui/public/locales/de/nav.json
+++ b/core/http/react-ui/public/locales/de/nav.json
@@ -50,7 +50,6 @@
    "backends": "Backends",
    "traces": "Traces",
    "nodes": "Knoten",
-    "scheduling": "Planung",
    "swarm": "Swarm",
    "system": "System",
    "settings": "Einstellungen",
--- a/core/http/react-ui/public/locales/en/admin.json
+++ b/core/http/react-ui/public/locales/en/admin.json
@@ -43,10 +43,6 @@
    "title": "Distributed Nodes",
    "subtitle": "Manage backend and agent worker nodes"
  },
-  "scheduling": {
-    "title": "Scheduling",
-    "subtitle": "Model placement and replica rules across the cluster"
-  },
  "p2p": {
    "title": "Distributed AI Computing",
    "subtitle": "Scale your AI workloads across multiple devices with peer-to-peer distribution"
--- a/core/http/react-ui/public/locales/en/nav.json
+++ b/core/http/react-ui/public/locales/en/nav.json
@@ -51,7 +51,6 @@
    "backends": "Backends",
    "traces": "Traces",
    "nodes": "Nodes",
-    "scheduling": "Scheduling",
    "swarm": "Swarm",
    "system": "System",
    "settings": "Settings",
--- a/core/http/react-ui/public/locales/es/admin.json
+++ b/core/http/react-ui/public/locales/es/admin.json
@@ -43,10 +43,6 @@
    "title": "Nodos distribuidos",
    "subtitle": "Administra nodos worker de backends y agentes"
  },
-  "scheduling": {
-    "title": "Planificación",
-    "subtitle": "Reglas de ubicación de modelos y réplicas en el clúster"
-  },
  "p2p": {
    "title": "Computación de IA distribuida",
    "subtitle": "Escala tus cargas de trabajo de IA en múltiples dispositivos con distribución peer-to-peer"
--- a/core/http/react-ui/public/locales/es/nav.json
+++ b/core/http/react-ui/public/locales/es/nav.json
@@ -50,7 +50,6 @@
    "backends": "Backends",
    "traces": "Trazas",
    "nodes": "Nodos",
-    "scheduling": "Planificación",
    "swarm": "Swarm",
    "system": "Sistema",
    "settings": "Configuración",
--- a/core/http/react-ui/public/locales/id/admin.json
+++ b/core/http/react-ui/public/locales/id/admin.json
@@ -43,10 +43,6 @@
    "title": "Node Terdistribusi",
    "subtitle": "Kelola node backend dan node worker"
  },
-  "scheduling": {
-    "title": "Penjadwalan",
-    "subtitle": "Aturan penempatan model dan replika di seluruh klaster"
-  },
  "p2p": {
    "title": "Komputasi AI Terdistribusi",
    "subtitle": "Skalakan beban kerja AI Anda ke beberapa perangkat dengan distribusi peer-to-peer"
--- a/core/http/react-ui/public/locales/id/nav.json
+++ b/core/http/react-ui/public/locales/id/nav.json
@@ -51,7 +51,6 @@
    "backends": "Backend",
    "traces": "Trace",
    "nodes": "Node",
-    "scheduling": "Penjadwalan",
    "swarm": "Swarm",
    "system": "Sistem",
    "settings": "Pengaturan",
--- a/core/http/react-ui/public/locales/it/admin.json
+++ b/core/http/react-ui/public/locales/it/admin.json
@@ -43,10 +43,6 @@
    "title": "Nodi distribuiti",
    "subtitle": "Gestisci i nodi worker dei backend e degli agenti"
  },
-  "scheduling": {
-    "title": "Pianificazione",
-    "subtitle": "Regole di posizionamento dei modelli e delle repliche nel cluster"
-  },
  "p2p": {
    "title": "Calcolo AI distribuito",
    "subtitle": "Scala i tuoi carichi di lavoro AI su più dispositivi con la distribuzione peer-to-peer"
--- a/core/http/react-ui/public/locales/it/nav.json
+++ b/core/http/react-ui/public/locales/it/nav.json
@@ -50,7 +50,6 @@
    "backends": "Backend",
    "traces": "Tracce",
    "nodes": "Nodi",
-    "scheduling": "Pianificazione",
    "swarm": "Swarm",
    "system": "Sistema",
    "settings": "Impostazioni",
--- a/core/http/react-ui/public/locales/ko/admin.json
+++ b/core/http/react-ui/public/locales/ko/admin.json
@@ -43,10 +43,6 @@
    "title": "분산 노드",
    "subtitle": "백엔드 및 에이전트 워커 노드를 관리합니다"
  },
-  "scheduling": {
-    "title": "스케줄링",
-    "subtitle": "클러스터 전반의 모델 배치 및 복제본 규칙"
-  },
  "p2p": {
    "title": "분산 AI 컴퓨팅",
    "subtitle": "피어 투 피어 분산으로 여러 기기에 걸쳐 AI 워크로드를 확장합니다"
--- a/core/http/react-ui/public/locales/ko/nav.json
+++ b/core/http/react-ui/public/locales/ko/nav.json
@@ -51,7 +51,6 @@
    "backends": "백엔드",
    "traces": "트레이스",
    "nodes": "노드",
-    "scheduling": "스케줄링",
    "swarm": "Swarm",
    "system": "시스템",
    "settings": "설정",
--- a/core/http/react-ui/public/locales/zh-CN/admin.json
+++ b/core/http/react-ui/public/locales/zh-CN/admin.json
@@ -43,10 +43,6 @@
    "title": "分布式节点",
    "subtitle": "管理后端和智能体工作节点"
  },
-  "scheduling": {
-    "title": "调度",
-    "subtitle": "集群中的模型放置和副本规则"
-  },
  "p2p": {
    "title": "分布式 AI 计算",
    "subtitle": "通过点对点分发将您的 AI 工作负载扩展到多个设备"
--- a/core/http/react-ui/public/locales/zh-CN/nav.json
+++ b/core/http/react-ui/public/locales/zh-CN/nav.json
@@ -50,7 +50,6 @@
    "backends": "后端",
    "traces": "追踪",
    "nodes": "节点",
-    "scheduling": "调度",
    "swarm": "Swarm",
    "system": "系统",
    "settings": "设置",
--- a/core/http/react-ui/src/App.css
+++ b/core/http/react-ui/src/App.css
@@ -8471,56 +8471,3 @@ select.input {
 .status-pill--error   .status-pill__dot { background: var(--color-error); }
 .status-pill--info    .status-pill__dot { background: var(--color-info); }
 .status-pill--muted   .status-pill__dot { background: var(--color-text-muted); }
-
-/* Nodes: cluster pulse + attention callout (replaces the stat-card strip) */
-.cluster-pulse {
-  font-size: var(--text-sm);
-  color: var(--color-text-muted);
-  margin: 0 0 var(--spacing-lg);
-}
-.cluster-pulse__strong { color: var(--color-text-primary); font-weight: 600; }
-
-.attention-callout {
-  display: flex;
-  align-items: center;
-  justify-content: space-between;
-  gap: var(--spacing-md);
-  padding: var(--spacing-sm) var(--spacing-md);
-  border-radius: var(--radius-md);
-  margin-bottom: var(--spacing-lg);
-  font-size: var(--text-sm);
-}
-.attention-callout--warn {
-  background: var(--color-warning-light);
-  border: 1px solid var(--color-warning-border);
-  color: var(--color-text-primary);
-}
-.attention-callout--error {
-  background: var(--color-error-light);
-  border: 1px solid var(--color-error-border);
-  color: var(--color-text-primary);
-}
-
-/* Node roster panels (Nodes page) */
-.node-roster { display: flex; flex-direction: column; gap: var(--spacing-sm); }
-.node-panel {
-  background: var(--color-bg-secondary);
-  border: 1px solid var(--color-border-subtle);
-  border-radius: var(--radius-lg);
-}
-.node-panel__main { padding: var(--spacing-md) var(--spacing-lg); cursor: pointer; }
-.node-panel:hover { border-color: var(--color-border); }
-.node-panel__head { display: flex; align-items: flex-start; justify-content: space-between; gap: var(--spacing-md); }
-.node-panel__id { display: flex; align-items: center; gap: var(--spacing-sm); flex-wrap: wrap; }
-.node-panel__name { font-weight: 600; }
-.node-panel__meta { display: flex; gap: var(--spacing-lg); margin-top: var(--spacing-sm); color: var(--color-text-muted); font-size: var(--text-xs); }
-.node-panel__models { display: flex; flex-wrap: wrap; gap: 6px; margin-top: var(--spacing-sm); }
-.model-chip {
-  display: inline-flex; align-items: center; gap: 5px;
-  font-family: var(--font-mono); font-size: 0.6875rem;
-  padding: 2px 8px; border-radius: var(--radius-sm); border: 1px solid;
-}
-.model-chip__dot { width: 6px; height: 6px; border-radius: 50%; }
-.model-chip__state { opacity: 0.85; font-style: normal; }
-.node-filter { margin-bottom: var(--spacing-lg); }
-.node-detail__metrics { display: flex; gap: var(--spacing-xl); margin: var(--spacing-md) 0 var(--spacing-lg); flex-wrap: wrap; }
--- a/core/http/react-ui/src/components/PatternListEditor.jsx
+++ b/core/http/react-ui/src/components/PatternListEditor.jsx
@@ -74,18 +74,7 @@ export default function PatternListEditor({ value, onChange }) {
            min={0}
            value={r.min_len || 0}
            title="Minimum match length (0 = no floor)"
-            // min={0} only constrains the spinner, not keyboard entry. Clamp a
-            // typed negative to 0 (a negative floor is meaningless and would
-            // disable the length filter). When we clamp, force the DOM value
-            // too: the resulting 0->0 state change is a no-op, so React's
-            // controlled input would otherwise keep displaying the rejected
-            // "-5" even though the saved value is 0.
-            onChange={e => {
-              const parsed = parseInt(e.target.value, 10)
-              const n = Math.max(0, parsed || 0)
-              if (parsed < 0) e.target.value = String(n)
-              update(i, { min_len: n })
-            }}
+            onChange={e => update(i, { min_len: parseInt(e.target.value, 10) || 0 })}
            style={{ width: 80, fontSize: '0.8125rem' }}
            aria-label="Minimum length"
          />
--- a/core/http/react-ui/src/components/console/consoleConfig.js
+++ b/core/http/react-ui/src/components/console/consoleConfig.js
@@ -59,7 +59,6 @@ export const operateConsole = {
      titleKey: 'operate.cluster',
      items: [
        { path: '/app/nodes', icon: 'fas fa-network-wired', labelKey: 'items.nodes', adminOnly: true, feature: 'distributed' },
-        { path: '/app/scheduling', icon: 'fas fa-calendar-alt', labelKey: 'items.scheduling', adminOnly: true, feature: 'distributed' },
        { path: '/app/p2p', icon: 'fas fa-circle-nodes', labelKey: 'items.swarm', adminOnly: true },
      ],
    },
--- a/core/http/react-ui/src/components/nodes/AttentionCallout.jsx
+++ b/core/http/react-ui/src/components/nodes/AttentionCallout.jsx
@@ -1,31 +0,0 @@
-export default function AttentionCallout({ nodes, onApprove }) {
-  const pending = nodes.filter(n => n.status === 'pending')
-  const unhealthy = nodes.filter(n => n.status === 'unhealthy' || n.status === 'offline')
-  if (pending.length === 0 && unhealthy.length === 0) return null
-
-  if (pending.length > 0) {
-    const first = pending[0]
-    const extra = pending.length - 1
-    return (
-      <div className="attention-callout attention-callout--warn">
-        <span>
-          <i className="fas fa-exclamation-circle" />{' '}
-          <strong>{pending.length} node{pending.length > 1 ? 's' : ''} awaiting approval</strong>
-          {' - '}{first.name}{extra > 0 ? ` +${extra} more` : ''}
-        </span>
-        <button className="btn btn-primary btn-sm" onClick={() => onApprove(first.id)}>
-          <i className="fas fa-check" /> Approve {first.name}
-        </button>
-      </div>
-    )
-  }
-  return (
-    <div className="attention-callout attention-callout--error">
-      <span>
-        <i className="fas fa-exclamation-triangle" />{' '}
-        <strong>{unhealthy.length} node{unhealthy.length > 1 ? 's' : ''} unhealthy</strong>
-        {' - '}{unhealthy.map(n => n.name).slice(0, 3).join(', ')}
-      </span>
-    </div>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/CapacityEditor.jsx
+++ b/core/http/react-ui/src/components/nodes/CapacityEditor.jsx
@@ -1,196 +0,0 @@
-import { useState, useEffect, useCallback } from 'react'
-import { nodesApi } from '../../utils/api'
-import LoadingSpinner from '../LoadingSpinner'
-
-/**
- * Inline editor for a node's per-model replica capacity.
- *
- * UX intent: discoverable affordance (pencil icon) that opens an inline
- * input - never a modal for a single field. Source-of-truth note is shown
- * inline so operators understand a worker re-registration will overwrite
- * their override; surfacing this in a tooltip would hide too important a
- * caveat.
- *
- * `confirmShrink` is a hook the parent provides so the page can render its
- * own confirm dialog (it has access to all nodes and can phrase the message
- * with full context).
- */
-export default function CapacityEditor({ node, loadedModelCounts, onUpdate, confirmShrink, addToast }) {
-  const current = node.max_replicas_per_model || 1
-  const isOverride = !!node.max_replicas_per_model_manually_set
-  const [editing, setEditing] = useState(false)
-  const [draft, setDraft] = useState(String(current))
-  const [saving, setSaving] = useState(false)
-  const [resetting, setResetting] = useState(false)
-
-  // Reset draft when current value changes (server response, etc.)
-  useEffect(() => {
-    if (!editing) setDraft(String(current))
-  }, [current, editing])
-
-  const cancel = useCallback(() => {
-    setEditing(false)
-    setDraft(String(current))
-  }, [current])
-
-  const save = useCallback(async () => {
-    const value = parseInt(draft, 10)
-    if (!Number.isFinite(value) || value < 1) {
-      addToast('Replica capacity must be 1 or higher', 'error')
-      return
-    }
-    if (value === current) {
-      setEditing(false)
-      return
-    }
-    // Reducing the cap below current loaded replicas: confirm so the operator
-    // sees the consequence (running replicas keep going until idle eviction).
-    const maxLoadedAcrossModels = Math.max(0, ...Object.values(loadedModelCounts || {}))
-    if (value < maxLoadedAcrossModels) {
-      const proceed = await confirmShrink({ node, newValue: value, currentLoaded: maxLoadedAcrossModels })
-      if (!proceed) return
-    }
-    setSaving(true)
-    try {
-      await nodesApi.updateMaxReplicasPerModel(node.id, value)
-      addToast(`Replica capacity set to ${value} on ${node.name}`, 'success')
-      setEditing(false)
-      onUpdate?.(value)
-    } catch (err) {
-      addToast(`Could not change replica capacity: ${err.message || err}`, 'error')
-    } finally {
-      setSaving(false)
-    }
-  }, [draft, current, node, loadedModelCounts, confirmShrink, onUpdate, addToast])
-
-  const onKeyDown = (e) => {
-    if (e.key === 'Enter') { e.preventDefault(); save() }
-    else if (e.key === 'Escape') { e.preventDefault(); cancel() }
-  }
-
-  const reset = useCallback(async () => {
-    setResetting(true)
-    try {
-      await nodesApi.resetMaxReplicasPerModel(node.id)
-      addToast(`Override cleared on ${node.name}; worker flag will apply on next re-registration`, 'success')
-      onUpdate?.(null)
-    } catch (err) {
-      addToast(`Could not reset override: ${err.message || err}`, 'error')
-    } finally {
-      setResetting(false)
-    }
-  }, [node, onUpdate, addToast])
-
-  return (
-    <div style={{
-      display: 'flex', alignItems: 'flex-start', gap: 'var(--spacing-md)',
-    }}>
-      <i className="fas fa-layer-group" style={{ color: 'var(--color-text-muted)', marginTop: 3 }} aria-hidden="true" />
-      <div style={{ flex: 1, minWidth: 0 }}>
-        <div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
-          <label
-            htmlFor={`capacity-${node.id}`}
-            style={{ fontSize: '0.8125rem', fontWeight: 600, color: 'var(--color-text-primary)' }}
-          >
-            Max replicas per model
-          </label>
-          {editing ? (
-            <>
-              <input
-                id={`capacity-${node.id}`}
-                type="number"
-                min={1}
-                value={draft}
-                disabled={saving}
-                onChange={(e) => setDraft(e.target.value)}
-                onKeyDown={onKeyDown}
-                autoFocus
-                aria-describedby={`capacity-hint-${node.id}`}
-                style={{
-                  width: 72, padding: '4px 8px', borderRadius: 'var(--radius-sm)',
-                  border: '1px solid var(--color-border)', background: 'var(--color-bg-primary)',
-                  fontFamily: 'var(--font-mono)', fontSize: '0.8125rem',
-                  color: 'var(--color-text-primary)',
-                }}
-              />
-              <button
-                className="btn btn-primary btn-sm"
-                onClick={save}
-                disabled={saving}
-                style={{ minHeight: 32 }}
-                aria-label="Save replica capacity"
-              >
-                {saving ? <LoadingSpinner size="xs" /> : <><i className="fas fa-check" /> Save</>}
-              </button>
-              <button
-                className="btn btn-secondary btn-sm"
-                onClick={cancel}
-                disabled={saving}
-                style={{ minHeight: 32 }}
-                aria-label="Cancel"
-              >
-                Cancel
-              </button>
-            </>
-          ) : (
-            <>
-              <span
-                className="cell-mono"
-                style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}
-              >
-                {current}
-              </span>
-              {isOverride && (
-                <span
-                  title="This value was set from the UI. It will persist across worker restarts until you click Reset."
-                  style={{
-                    display: 'inline-block', fontSize: '0.6875rem', padding: '1px 6px',
-                    borderRadius: 'var(--radius-sm)', fontWeight: 500,
-                    background: 'var(--color-bg-primary)',
-                    border: '1px solid var(--color-warning, #d97706)',
-                    color: 'var(--color-warning, #d97706)',
-                  }}
-                >
-                  override
-                </span>
-              )}
-              <button
-                onClick={() => setEditing(true)}
-                aria-label={`Edit replica capacity (currently ${current})`}
-                title="Change replica capacity for this node"
-                style={{
-                  display: 'inline-flex', alignItems: 'center', justifyContent: 'center',
-                  minWidth: 32, minHeight: 32, padding: 4, borderRadius: 'var(--radius-sm)',
-                  border: '1px solid var(--color-border-subtle)',
-                  background: 'transparent', color: 'var(--color-text-muted)', cursor: 'pointer',
-                }}
-              >
-                <i className="fas fa-pencil-alt" />
-              </button>
-              {isOverride && (
-                <button
-                  onClick={reset}
-                  disabled={resetting}
-                  aria-label="Clear admin override and let the worker flag apply"
-                  title="Clear override; the worker's --max-replicas-per-model flag will apply on the next re-registration"
-                  className="btn btn-secondary btn-sm"
-                  style={{ minHeight: 32 }}
-                >
-                  {resetting ? <LoadingSpinner size="xs" /> : <><i className="fas fa-undo" /> Reset</>}
-                </button>
-              )}
-            </>
-          )}
-        </div>
-        <div
-          id={`capacity-hint-${node.id}`}
-          style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 4, lineHeight: 1.4 }}
-        >
-          {isOverride
-            ? <>Set from here. <strong>Reset</strong> to use the worker's default.</>
-            : <>Saved values stick across worker restarts.</>}
-        </div>
-      </div>
-    </div>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/ClusterPulse.jsx
+++ b/core/http/react-ui/src/components/nodes/ClusterPulse.jsx
@@ -1,18 +0,0 @@
-import { formatVRAM } from './nodeStatus'
-
-export default function ClusterPulse({ nodes }) {
-  const total = nodes.length
-  const healthy = nodes.filter(n => n.status === 'healthy').length
-  const draining = nodes.filter(n => n.status === 'draining').length
-  const usedVRAM = nodes.reduce((s, n) =>
-    (n.total_vram && n.available_vram != null) ? s + (n.total_vram - n.available_vram) : s, 0)
-  const vramStr = formatVRAM(usedVRAM)
-  return (
-    <p className="cluster-pulse">
-      <span className="cluster-pulse__strong">{total} {total === 1 ? 'node' : 'nodes'}</span>
-      {' · '}<span style={{ color: 'var(--color-success)' }}>{healthy} healthy</span>
-      {draining > 0 && <>{' · '}<span style={{ color: 'var(--color-warning)' }}>{draining} draining</span></>}
-      {vramStr && <>{' · '}{vramStr} VRAM in use</>}
-    </p>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/KeyValueChips.jsx
+++ b/core/http/react-ui/src/components/nodes/KeyValueChips.jsx
@@ -1,98 +0,0 @@
-import { useState } from 'react'
-
-/**
- * Controlled chip-builder for { key: value } maps. Replaces the prior
- * comma-separated-string Node Selector input AND the bespoke Labels editor
- * in the node drawer - both were rendering the same chip pattern with
- * subtly different markup.
- *
- * Fully controlled: parent owns the map and decides what onAdd/onRemove
- * does (form state for the scheduling form; API calls for the live
- * labels editor). The component just renders chips and a key/value input
- * row.
- *
- * Props:
- *   pairs       - current map of key -> value
- *   onAdd(k,v)  - called when the user adds a pair (parent handles dedup
- *                 and persistence side effects)
- *   onRemove(k) - called when a chip's × is clicked
- *   placeholderKey, placeholderValue - input hints
- *   ariaLabel   - accessible name for the section
- */
-export default function KeyValueChips({ pairs, onAdd, onRemove, placeholderKey = 'key', placeholderValue = 'value', ariaLabel }) {
-  const [k, setK] = useState('')
-  const [v, setV] = useState('')
-
-  const add = () => {
-    const key = k.trim()
-    if (!key) return
-    onAdd(key, v.trim())
-    setK(''); setV('')
-  }
-  const onKeyDown = (e) => {
-    if (e.key === 'Enter') { e.preventDefault(); add() }
-  }
-
-  const entries = pairs ? Object.entries(pairs) : []
-  return (
-    <div aria-label={ariaLabel}>
-      {entries.length > 0 && (
-        <div style={{ display: 'flex', flexWrap: 'wrap', gap: 4, marginBottom: 'var(--spacing-xs)' }}>
-          {entries.map(([key, val]) => (
-            <span key={key} style={{
-              display: 'inline-flex', alignItems: 'center', gap: 4,
-              fontSize: '0.75rem', padding: '2px 8px',
-              borderRadius: 'var(--radius-sm)',
-              background: 'var(--color-bg-tertiary)',
-              border: '1px solid var(--color-border-subtle)',
-              fontFamily: 'var(--font-mono)',
-            }}>
-              {key}={val}
-              <button
-                type="button"
-                onClick={(e) => { e.stopPropagation(); onRemove(key) }}
-                aria-label={`Remove ${key}`}
-                title="Remove"
-                style={{
-                  background: 'none', border: 'none', cursor: 'pointer',
-                  color: 'var(--color-text-muted)', fontSize: '0.625rem', padding: 0,
-                }}
-              >
-                <i className="fas fa-times" />
-              </button>
-            </span>
-          ))}
-        </div>
-      )}
-      <div style={{ display: 'flex', gap: 'var(--spacing-xs)', alignItems: 'stretch' }}>
-        <input
-          className="input"
-          type="text"
-          placeholder={placeholderKey}
-          value={k}
-          onChange={e => setK(e.target.value)}
-          onKeyDown={onKeyDown}
-          style={{ flex: 1 }}
-        />
-        <input
-          className="input"
-          type="text"
-          placeholder={placeholderValue}
-          value={v}
-          onChange={e => setV(e.target.value)}
-          onKeyDown={onKeyDown}
-          style={{ flex: 1 }}
-        />
-        <button
-          type="button"
-          className="btn btn-secondary btn-sm"
-          onClick={add}
-          disabled={!k.trim()}
-          style={{ minHeight: 36 }}
-        >
-          <i className="fas fa-plus" /> Add
-        </button>
-      </div>
-    </div>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/ModelChip.jsx
+++ b/core/http/react-ui/src/components/nodes/ModelChip.jsx
@@ -1,12 +0,0 @@
-import { modelStateConfig } from './nodeStatus'
-
-export default function ModelChip({ model }) {
-  const cfg = modelStateConfig[model.state] || modelStateConfig.idle
-  return (
-    <span className="model-chip" style={{ background: cfg.bg, color: cfg.color, borderColor: cfg.border }}>
-      <span className="model-chip__dot" style={{ background: cfg.color }} />
-      {model.model_name}
-      {model.state !== 'loaded' && <span className="model-chip__state"> {model.state}</span>}
-    </span>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/NodePanel.jsx
+++ b/core/http/react-ui/src/components/nodes/NodePanel.jsx
@@ -1,60 +0,0 @@
-import { useNavigate } from 'react-router-dom'
-import StatusPill from './StatusPill'
-import ModelChip from './ModelChip'
-import ActionMenu from '../ActionMenu'
-import { formatVRAM } from './nodeStatus'
-
-export default function NodePanel({ node, models = [], onApprove, onDrain, onResume, onRemove }) {
-  const navigate = useNavigate()
-  const isAgent = node.node_type === 'agent'
-  const open = () => navigate(`/app/nodes/${node.id}`)
-  const usedVRAM = node.total_vram && node.available_vram != null ? node.total_vram - node.available_vram : null
-
-  return (
-    <div className="node-panel">
-      <div className="node-panel__main" onClick={open} role="button" tabIndex={0}
-        onKeyDown={(e) => { if (e.key === 'Enter') open() }}>
-        <div className="node-panel__head">
-          <div className="node-panel__id">
-            <StatusPill status={node.status} />
-            <span className="node-panel__name">{node.name}</span>
-            <span className="cell-mono cell-muted">{node.address}</span>
-          </div>
-          <div className="node-panel__actions" onClick={(e) => e.stopPropagation()}>
-            {node.status === 'pending' && (
-              <button className="btn btn-primary btn-sm" onClick={() => onApprove(node.id)}>
-                <i className="fas fa-check" /> Approve
-              </button>
-            )}
-            <ActionMenu
-              ariaLabel={`Actions for ${node.name}`}
-              triggerLabel={`Actions for ${node.name}`}
-              items={[
-                { key: 'resume', icon: 'fa-play', label: 'Resume', hidden: node.status !== 'draining', onClick: () => onResume(node.id) },
-                { key: 'drain', icon: 'fa-pause', label: 'Drain', hidden: node.status === 'draining' || node.status === 'pending', onClick: () => onDrain(node.id) },
-                { divider: true, hidden: node.status === 'pending' },
-                { key: 'remove', icon: 'fa-trash', label: 'Remove from cluster', danger: true, onClick: () => onRemove(node) },
-              ]}
-            />
-          </div>
-        </div>
-
-        {!isAgent && (
-          <>
-            <div className="node-panel__meta">
-              {node.total_vram > 0 && (
-                <span className="cell-mono">VRAM {formatVRAM(usedVRAM) || '0'} / {formatVRAM(node.total_vram)}</span>
-              )}
-              <span className="cell-mono">{node.in_flight_count || 0} in-flight</span>
-            </div>
-            <div className="node-panel__models">
-              {models.length === 0
-                ? <span className="cell-muted">No models loaded</span>
-                : models.map(m => <ModelChip key={`${m.model_name}-${m.replica_index ?? 0}`} model={m} />)}
-            </div>
-          </>
-        )}
-      </div>
-    </div>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/StatusPill.jsx
+++ b/core/http/react-ui/src/components/nodes/StatusPill.jsx
@@ -1,11 +0,0 @@
-import { statusConfig } from './nodeStatus'
-
-export default function StatusPill({ status }) {
-  const cfg = statusConfig[status] || statusConfig.unhealthy
-  return (
-    <span className="node-status" style={{ color: cfg.color }}>
-      <span className="node-status__dot" style={{ background: cfg.color }} />
-      {cfg.label}
-    </span>
-  )
-}
--- a/core/http/react-ui/src/components/nodes/nodeStatus.js
+++ b/core/http/react-ui/src/components/nodes/nodeStatus.js
@@ -1,34 +0,0 @@
-export const statusConfig = {
-  healthy: { color: 'var(--color-success)', label: 'Healthy' },
-  unhealthy: { color: 'var(--color-error)', label: 'Unhealthy' },
-  offline: { color: 'var(--color-error)', label: 'Offline' },
-  registering: { color: 'var(--color-primary)', label: 'Registering' },
-  draining: { color: 'var(--color-warning)', label: 'Draining' },
-  pending: { color: 'var(--color-warning)', label: 'Pending Approval' },
-}
-
-export const modelStateConfig = {
-  loaded: { bg: 'var(--color-success-light)', color: 'var(--color-success)', border: 'var(--color-success-border)' },
-  loading: { bg: 'var(--color-primary-light)', color: 'var(--color-primary)', border: 'var(--color-primary-border)' },
-  unloading: { bg: 'var(--color-warning-light)', color: 'var(--color-warning)', border: 'var(--color-warning-border)' },
-  idle: { bg: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)', border: 'var(--color-border-subtle)' },
-}
-
-export function formatVRAM(bytes) {
-  if (!bytes || bytes === 0) return null
-  const gb = bytes / (1024 * 1024 * 1024)
-  return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
-}
-
-export function timeAgo(dateString) {
-  if (!dateString) return 'never'
-  const seconds = Math.floor((Date.now() - new Date(dateString).getTime()) / 1000)
-  if (seconds < 0) return 'just now'
-  if (seconds < 60) return `${seconds}s ago`
-  const minutes = Math.floor(seconds / 60)
-  if (minutes < 60) return `${minutes}m ago`
-  const hours = Math.floor(minutes / 60)
-  if (hours < 24) return `${hours}h ago`
-  const days = Math.floor(hours / 24)
-  return `${days}d ago`
-}
--- a/core/http/react-ui/src/pages/NodeDetail.jsx
+++ b/core/http/react-ui/src/pages/NodeDetail.jsx
@@ -1,352 +0,0 @@
-import { useState, useEffect, useCallback } from 'react'
-import { useParams, useNavigate, useOutletContext } from 'react-router-dom'
-import { nodesApi } from '../utils/api'
-import PageHeader from '../components/PageHeader'
-import LoadingSpinner from '../components/LoadingSpinner'
-import ConfirmDialog from '../components/ConfirmDialog'
-import StatusPill from '../components/nodes/StatusPill'
-import CapacityEditor from '../components/nodes/CapacityEditor'
-import KeyValueChips from '../components/nodes/KeyValueChips'
-import { formatVRAM, modelStateConfig, timeAgo } from '../components/nodes/nodeStatus'
-
-// Deep-linkable node management home. Reached by clicking a roster panel on
-// /app/nodes. Surfaces what's running here plus the management affordances
-// (capacity, backends, labels, drain/resume/remove) that previously lived in
-// the expanded-row "Manage" drawer.
-export default function NodeDetail() {
-  const { id } = useParams()
-  const navigate = useNavigate()
-  const { addToast } = useOutletContext()
-  const [node, setNode] = useState(null)
-  const [models, setModels] = useState([])
-  const [backends, setBackends] = useState([])
-  const [loading, setLoading] = useState(true)
-  const [confirmRemove, setConfirmRemove] = useState(false)
-  const [confirmUnload, setConfirmUnload] = useState(null)
-  const [confirmDeleteBackend, setConfirmDeleteBackend] = useState(null)
-  // Promise-based shrink confirmation: CapacityEditor awaits this hook so the
-  // page owns the dialog (it can phrase the message with full node context).
-  const [confirmShrinkState, setConfirmShrinkState] = useState(null)
-
-  const refresh = useCallback(async () => {
-    try {
-      const n = await nodesApi.get(id)
-      setNode(n)
-      const [m, b] = await Promise.all([nodesApi.getModels(id), nodesApi.getBackends(id)])
-      setModels(Array.isArray(m) ? m : [])
-      setBackends(Array.isArray(b) ? b : [])
-    } catch (err) {
-      addToast(`Failed to load node: ${err.message}`, 'error')
-    } finally {
-      setLoading(false)
-    }
-  }, [id, addToast])
-
-  useEffect(() => { refresh() }, [refresh])
-
-  const confirmShrink = useCallback((ctx) => new Promise((resolve) => {
-    setConfirmShrinkState({ ...ctx, resolve })
-  }), [])
-
-  if (loading) return <div className="page page--wide" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
-  if (!node) return <div className="page page--wide"><PageHeader title="Node not found" /></div>
-
-  const drain = async () => { try { await nodesApi.drain(id); addToast('Node set to draining', 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const resume = async () => { try { await nodesApi.resume(id); addToast('Node resumed', 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const remove = async () => { try { await nodesApi.delete(id); addToast('Node removed', 'success'); navigate('/app/nodes') } catch (e) { addToast(e.message, 'error') } }
-  const unload = async (name) => { try { await nodesApi.unloadModel(id, name); addToast(`Model "${name}" unloaded`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const upgradeBackend = async (name) => { try { await nodesApi.installBackend(id, name); addToast(`Backend "${name}" upgraded`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const deleteBackend = async (name) => { try { await nodesApi.deleteBackend(id, name); addToast(`Backend "${name}" deleted`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const addLabel = async (k, v) => { try { await nodesApi.mergeLabels(id, { [k]: v }); refresh() } catch (e) { addToast(e.message, 'error') } }
-  const delLabel = async (k) => { try { await nodesApi.deleteLabel(id, k); refresh() } catch (e) { addToast(e.message, 'error') } }
-
-  const usedVRAM = node.total_vram && node.available_vram != null ? node.total_vram - node.available_vram : 0
-  // {modelName: replicaCount} of loaded models so the shrink confirm can warn
-  // if the new cap is below the actual count of any single model on this node.
-  const loadedModelCounts = (() => {
-    const counts = {}
-    models.forEach(m => { if (m.state === 'loaded') counts[m.model_name] = (counts[m.model_name] || 0) + 1 })
-    return counts
-  })()
-
-  return (
-    <div className="page page--wide">
-      <PageHeader
-        eyebrow={<a onClick={() => navigate('/app/nodes')} style={{ cursor: 'pointer', color: 'var(--color-primary)' }}><i className="fas fa-arrow-left" style={{ marginRight: 6 }} aria-hidden="true" />Cluster</a>}
-        title={<><StatusPill status={node.status} /> {node.name}</>}
-        supporting={node.address}
-        actions={
-          <>
-            {node.status === 'draining'
-              ? <button className="btn btn-secondary btn-sm" onClick={resume}><i className="fas fa-play" /> Resume</button>
-              : <button className="btn btn-secondary btn-sm" onClick={drain}><i className="fas fa-pause" /> Drain</button>}
-            <button className="btn btn-danger btn-sm" onClick={() => setConfirmRemove(true)}><i className="fas fa-trash" /> Remove</button>
-          </>
-        }
-      />
-
-      {/* Inline metrics row: VRAM / in-flight - no boxes, just labelled values. */}
-      <div className="node-detail__metrics">
-        {node.total_vram > 0 && (
-          <div>
-            <div className="drawer-eyebrow">VRAM</div>
-            <span className="cell-mono">{formatVRAM(usedVRAM) || '0'} / {formatVRAM(node.total_vram)}</span>
-          </div>
-        )}
-        <div>
-          <div className="drawer-eyebrow">In-flight</div>
-          <span className="cell-mono">{node.in_flight_count || 0}</span>
-        </div>
-        {node.node_type !== 'agent' && (
-          <div style={{ minWidth: 0 }}>
-            <div className="drawer-eyebrow">Capacity</div>
-            <CapacityEditor
-              node={node}
-              loadedModelCounts={loadedModelCounts}
-              confirmShrink={confirmShrink}
-              addToast={addToast}
-              onUpdate={() => refresh()}
-            />
-          </div>
-        )}
-      </div>
-
-      {/* Running models */}
-      <div style={{ marginTop: 'var(--spacing-lg)' }}>
-        <div className="drawer-eyebrow">Running models</div>
-        {models.length === 0 ? (
-          <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: '0 0 var(--spacing-md) 0' }}>
-            <i className="fas fa-cube" style={{ marginRight: 6, opacity: 0.6 }} aria-hidden="true" />
-            No models loaded yet - they'll appear here when scheduled to this node.
-          </p>
-        ) : (
-          <table className="table" style={{ margin: 0 }}>
-            <thead>
-              <tr>
-                <th>Model</th>
-                <th>State</th>
-                <th>In-Flight</th>
-                <th style={{ width: 40 }}>Logs</th>
-                <th style={{ textAlign: 'right' }}>Actions</th>
-              </tr>
-            </thead>
-            <tbody>
-              {(() => {
-                // Pre-compute per-model replica counts so the disambiguation
-                // pill only renders when this node actually hosts >1 replica
-                // of the same model. Single-replica deployments stay clean.
-                const replicaCounts = {}
-                models.forEach(m => { replicaCounts[m.model_name] = (replicaCounts[m.model_name] || 0) + 1 })
-                return models.map(m => {
-                  const stCfg = modelStateConfig[m.state] || modelStateConfig.idle
-                  const showReplica = (replicaCounts[m.model_name] || 0) > 1
-                  // Per-replica process key - what the worker stores logs under and what the
-                  // store's GetLines/Subscribe match on for replica-scoped filtering.
-                  const processKey = `${m.model_name}#${m.replica_index ?? 0}`
-                  return (
-                    <tr key={m.id || `${m.model_name}#${m.replica_index ?? 0}`}>
-                      <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
-                        {m.model_name}
-                        {showReplica && (
-                          <span
-                            className="cell-mono"
-                            aria-label={`replica ${m.replica_index ?? 0}`}
-                            title={`Replica ${m.replica_index ?? 0} on this node`}
-                            style={{
-                              marginLeft: 8, padding: '1px 6px', borderRadius: 'var(--radius-sm)',
-                              background: 'var(--color-bg-tertiary)',
-                              border: '1px solid var(--color-border-subtle)',
-                              fontSize: '0.6875rem', fontWeight: 500,
-                              color: 'var(--color-text-secondary)',
-                            }}
-                          >
-                            rep {m.replica_index ?? 0}
-                          </span>
-                        )}
-                      </td>
-                      <td>
-                        <span style={{
-                          display: 'inline-block', padding: '2px 8px', borderRadius: 'var(--radius-sm)',
-                          fontSize: '0.75rem', fontWeight: 500,
-                          background: stCfg.bg, color: stCfg.color, border: `1px solid ${stCfg.border}`,
-                        }}>
-                          {m.state}
-                        </span>
-                      </td>
-                      <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
-                        {m.in_flight ?? 0}
-                      </td>
-                      <td>
-                        <a
-                          href="#"
-                          onClick={(e) => {
-                            e.preventDefault()
-                            // Send the replica-scoped process key (modelName#replicaIndex).
-                            navigate(`/app/node-backend-logs/${id}/${encodeURIComponent(processKey)}`)
-                          }}
-                          style={{ fontSize: '0.75rem', color: 'var(--color-primary)' }}
-                          title={showReplica ? `View backend logs for replica ${m.replica_index ?? 0}` : 'View backend logs'}
-                        >
-                          <i className="fas fa-terminal" />
-                        </a>
-                      </td>
-                      <td style={{ textAlign: 'right' }}>
-                        <button
-                          className="btn btn-danger btn-sm"
-                          title={m.in_flight > 0 ? 'Unload model (has in-flight requests)' : 'Unload model'}
-                          onClick={() => setConfirmUnload({ modelName: m.model_name, inFlight: m.in_flight ?? 0 })}
-                        >
-                          <i className="fas fa-stop" />
-                        </button>
-                      </td>
-                    </tr>
-                  )
-                })
-              })()}
-            </tbody>
-          </table>
-        )}
-      </div>
-
-      {/* Installed backends */}
-      <div style={{ marginTop: 'var(--spacing-lg)' }}>
-        <div style={{
-          display: 'flex', alignItems: 'center', justifyContent: 'space-between',
-          marginBottom: 'var(--spacing-sm)',
-        }}>
-          <div className="drawer-eyebrow" style={{ margin: 0 }}>Installed backends</div>
-          <button
-            type="button"
-            className="btn btn-secondary btn-sm"
-            onClick={() => navigate(`/app/backends?target=${encodeURIComponent(id)}`)}
-            title={`Install a backend on ${node.name}`}
-          >
-            <i className="fas fa-plus" /> Add backend
-          </button>
-        </div>
-        {backends.length === 0 ? (
-          <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: 0 }}>
-            None installed. <a href="#" style={{ color: 'var(--color-primary)' }} onClick={(e) => { e.preventDefault(); navigate(`/app/backends?target=${encodeURIComponent(id)}`) }}>Install one from the gallery</a> to schedule models here.
-          </p>
-        ) : (
-          <table className="table" style={{ margin: 0 }}>
-            <thead>
-              <tr>
-                <th>Name</th>
-                <th>Type</th>
-                <th>Installed At</th>
-                <th style={{ textAlign: 'right' }}>Actions</th>
-              </tr>
-            </thead>
-            <tbody>
-              {backends.map(b => (
-                <tr key={b.name}>
-                  <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
-                    {b.name}
-                  </td>
-                  <td>
-                    <span style={{
-                      display: 'inline-block', padding: '2px 8px', borderRadius: 'var(--radius-sm)',
-                      fontSize: '0.75rem', fontWeight: 500,
-                      background: b.is_system ? 'var(--color-bg-tertiary)' : 'var(--color-primary-light)',
-                      color: b.is_system ? 'var(--color-text-muted)' : 'var(--color-primary)',
-                      border: `1px solid ${b.is_system ? 'var(--color-border-subtle)' : 'var(--color-primary-border)'}`,
-                    }}>
-                      {b.is_system ? 'system' : 'gallery'}
-                    </span>
-                  </td>
-                  <td style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>
-                    {b.installed_at ? timeAgo(b.installed_at) : '-'}
-                  </td>
-                  <td style={{ textAlign: 'right' }}>
-                    {!b.is_system && (
-                      <div style={{ display: 'inline-flex', gap: 'var(--spacing-xs)' }}>
-                        <button
-                          className="btn btn-secondary btn-sm"
-                          onClick={() => upgradeBackend(b.name)}
-                          title="Upgrade backend on this node"
-                        >
-                          <i className="fas fa-arrow-up" />
-                        </button>
-                        <button
-                          className="btn btn-danger-ghost btn-sm"
-                          onClick={() => setConfirmDeleteBackend({ backend: b.name })}
-                          title="Delete backend from this node"
-                        >
-                          <i className="fas fa-trash" />
-                        </button>
-                      </div>
-                    )}
-                  </td>
-                </tr>
-              ))}
-            </tbody>
-          </table>
-        )}
-      </div>
-
-      {/* Labels - node.replica-slots is filtered out so the Capacity editor
-          stays the single source of truth for that label. */}
-      <div style={{ marginTop: 'var(--spacing-lg)' }}>
-        <div className="drawer-eyebrow">Labels</div>
-        <KeyValueChips
-          pairs={Object.fromEntries(Object.entries(node.labels || {}).filter(([k]) => k !== 'node.replica-slots'))}
-          onAdd={addLabel}
-          onRemove={delLabel}
-          placeholderKey="key"
-          placeholderValue="value"
-          ariaLabel="Node labels"
-        />
-      </div>
-
-      <ConfirmDialog
-        open={confirmRemove}
-        title="Remove node"
-        message={`Remove "${node.name}" from the cluster? This will deregister it.`}
-        confirmLabel="Remove"
-        danger
-        onConfirm={() => { remove(); setConfirmRemove(false) }}
-        onCancel={() => setConfirmRemove(false)}
-      />
-
-      <ConfirmDialog
-        open={!!confirmUnload}
-        title="Unload Model"
-        message={
-          confirmUnload
-            ? confirmUnload.inFlight > 0
-              ? `"${confirmUnload.modelName}" currently has ${confirmUnload.inFlight} in-flight request(s). Unloading will interrupt them. Continue?`
-              : `Unload "${confirmUnload.modelName}" from ${node.name}?`
-            : ''
-        }
-        confirmLabel="Unload"
-        danger={confirmUnload?.inFlight > 0}
-        onConfirm={() => { if (confirmUnload) unload(confirmUnload.modelName); setConfirmUnload(null) }}
-        onCancel={() => setConfirmUnload(null)}
-      />
-
-      <ConfirmDialog
-        open={!!confirmDeleteBackend}
-        title="Delete Backend"
-        message={confirmDeleteBackend ? `Delete "${confirmDeleteBackend.backend}" from ${node.name}? This removes the backend files from this node only.` : ''}
-        confirmLabel="Delete"
-        danger
-        onConfirm={() => { if (confirmDeleteBackend) deleteBackend(confirmDeleteBackend.backend); setConfirmDeleteBackend(null) }}
-        onCancel={() => setConfirmDeleteBackend(null)}
-      />
-
-      <ConfirmDialog
-        open={!!confirmShrinkState}
-        title="Reduce replica capacity"
-        message={
-          confirmShrinkState
-            ? `${node.name} currently has ${confirmShrinkState.currentLoaded} replica(s) of at least one model loaded. Reducing the cap to ${confirmShrinkState.newValue} won't evict anything immediately - running replicas keep going, but the reconciler will trim down on the next idle window. Continue?`
-            : ''
-        }
-        confirmLabel="Reduce"
-        onConfirm={() => { confirmShrinkState?.resolve(true); setConfirmShrinkState(null) }}
-        onCancel={() => { confirmShrinkState?.resolve(false); setConfirmShrinkState(null) }}
-      />
-    </div>
-  )
-}
--- a/core/http/react-ui/src/pages/Nodes.jsx
+++ b/core/http/react-ui/src/pages/Nodes.jsx
--- a/core/http/react-ui/src/pages/Scheduling.jsx
+++ b/core/http/react-ui/src/pages/Scheduling.jsx
@@ -1,438 +0,0 @@
-import { useState, useEffect, useCallback } from 'react'
-import { useOutletContext } from 'react-router-dom'
-import { useTranslation } from 'react-i18next'
-import { nodesApi } from '../utils/api'
-import PageHeader from '../components/PageHeader'
-import ConfirmDialog from '../components/ConfirmDialog'
-import ResponsiveTable from '../components/ResponsiveTable'
-import SearchableModelSelect from '../components/SearchableModelSelect'
-import KeyValueChips from '../components/nodes/KeyValueChips'
-
-// Numeric input with quick-pick preset chips. Picked over a slider because
-// replica counts are exact specs (operator math), not fuzzy estimates. The
-// chips give one-click access to common values without the slider's
-// precision/special-value problems (e.g. MaxReplicas=0 = "no limit").
-function ReplicaInput({ id, label, value, onChange, presets }) {
-  return (
-    <div style={{ flex: 1 }}>
-      <label className="form-label" htmlFor={id}>{label}</label>
-      <input
-        id={id}
-        className="input"
-        type="number"
-        min={0}
-        value={value}
-        onChange={e => onChange(parseInt(e.target.value, 10) || 0)}
-      />
-      <div style={{ display: 'flex', gap: 4, flexWrap: 'wrap', marginTop: 6 }}>
-        {presets.map(({ v, l }) => {
-          const active = value === v
-          return (
-            <button
-              key={v}
-              type="button"
-              onClick={() => onChange(v)}
-              aria-pressed={active}
-              className="cell-mono"
-              style={{
-                padding: '2px 8px',
-                borderRadius: 'var(--radius-sm)',
-                fontSize: '0.6875rem',
-                fontWeight: 500,
-                cursor: 'pointer',
-                background: active ? 'var(--color-primary-light)' : 'transparent',
-                border: `1px solid ${active ? 'var(--color-primary-border)' : 'var(--color-border-subtle)'}`,
-                color: active ? 'var(--color-primary)' : 'var(--color-text-muted)',
-              }}
-            >{l || v}</button>
-          )
-        })}
-      </div>
-    </div>
-  )
-}
-
-function SchedulingForm({ onSave, onCancel }) {
-  const [mode, setMode] = useState('placement')
-  const [modelName, setModelName] = useState('')
-  // Selector is now a chip-builder map instead of a comma-separated string.
-  // Operators were copying syntax from docs and missing commas; the chip UI
-  // makes the key=value structure self-documenting.
-  const [selector, setSelector] = useState({})
-  const [minReplicas, setMinReplicas] = useState(1)
-  const [maxReplicas, setMaxReplicas] = useState(0)
-  // Prefix-cache routing controls. Empty routePolicy means "inherit the
-  // cluster default"; the three thresholds at 0 likewise inherit, so they
-  // act as per-model overrides only when explicitly set to a non-zero value.
-  const [routePolicy, setRoutePolicy] = useState('')
-  const [balanceAbsThreshold, setBalanceAbsThreshold] = useState(0)
-  const [balanceRelThreshold, setBalanceRelThreshold] = useState(0)
-  const [minPrefixMatch, setMinPrefixMatch] = useState(0)
-
-  const hasSelector = Object.keys(selector).length > 0
-
-  const isValid = () => {
-    if (!modelName) return false
-    if (mode === 'placement') return hasSelector
-    if (mode === 'spread') return true
-    return minReplicas > 0 || maxReplicas > 0
-  }
-
-  const handleSubmit = () => {
-    onSave({
-      model_name: modelName,
-      node_selector: hasSelector ? selector : undefined,
-      min_replicas: mode === 'autoscaling' ? minReplicas : 0,
-      max_replicas: mode === 'autoscaling' ? maxReplicas : 0,
-      spread_all: mode === 'spread',
-      route_policy: routePolicy,
-      balance_abs_threshold: balanceAbsThreshold,
-      balance_rel_threshold: balanceRelThreshold,
-      min_prefix_match: minPrefixMatch,
-    })
-  }
-
-  return (
-    <div className="card" style={{ padding: 'var(--spacing-lg)', marginBottom: 'var(--spacing-md)' }}>
-      {/* Mode selector — uses the project's segmented control instead of two
-          50%-width filled buttons that competed visually with the actual
-          primary action (Save). */}
-      <div role="radiogroup" aria-label="Scheduling mode" className="segmented" style={{ marginBottom: 'var(--spacing-xs)' }}>
-        <button
-          type="button" role="radio" aria-checked={mode === 'placement'}
-          className={`segmented__item${mode === 'placement' ? ' is-active' : ''}`}
-          onClick={() => setMode('placement')}
-        >
-          <i className="fas fa-thumbtack" aria-hidden="true" /> Pin to nodes
-        </button>
-        <button
-          type="button" role="radio" aria-checked={mode === 'autoscaling'}
-          className={`segmented__item${mode === 'autoscaling' ? ' is-active' : ''}`}
-          onClick={() => setMode('autoscaling')}
-        >
-          <i className="fas fa-arrows-up-down" aria-hidden="true" /> Auto-scale
-        </button>
-        <button
-          type="button" role="radio" aria-checked={mode === 'spread'}
-          className={`segmented__item${mode === 'spread' ? ' is-active' : ''}`}
-          onClick={() => setMode('spread')}
-        >
-          <i className="fas fa-network-wired" aria-hidden="true" /> Spread to all
-        </button>
-      </div>
-      <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: '0 0 var(--spacing-lg) 0' }}>
-        {mode === 'placement'
-          ? 'Restrict this model to specific nodes. Loaded on demand, evictable when idle.'
-          : mode === 'spread'
-          ? 'Run one replica on every node matching the selector (all healthy nodes when empty). Tracks nodes joining and leaving.'
-          : 'Maintain a target replica count across the cluster. Min ≥ 1 protects from eviction.'}
-      </p>
-
-      {/* Linear vertical flow — model picker is the visual focus, then the
-          mode-specific fields below. No 2-column grid (the mismatched widths
-          made the form look raw). */}
-      <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
-        <div>
-          <label className="form-label" htmlFor="sched-model">Model</label>
-          {/* Searchable combobox so a long gallery doesn't force the operator
-              to scroll through hundreds of entries. Free-text is allowed —
-              you can pre-create a rule for a model that hasn't been
-              installed yet, which is a real workflow when standing up a new
-              node and pre-staging its scheduling policy. */}
-          <SearchableModelSelect
-            value={modelName}
-            onChange={setModelName}
-            placeholder="Type to search models, or paste a name..."
-          />
-        </div>
-
-        <div>
-          <label className="form-label">
-            Node selector{mode === 'placement' ? '' : ' (optional)'}
-          </label>
-          <KeyValueChips
-            pairs={selector}
-            onAdd={(k, v) => setSelector(prev => ({ ...prev, [k]: v }))}
-            onRemove={(k) => setSelector(prev => { const n = { ...prev }; delete n[k]; return n })}
-            placeholderKey="key (e.g. gpu.vendor)"
-            placeholderValue="value (e.g. nvidia)"
-            ariaLabel="Node selector"
-          />
-          <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
-            {mode === 'placement'
-              ? 'Models will load only on nodes that match all listed labels.'
-              : (hasSelector ? 'Replicas land only on matching nodes.' : 'Empty = any healthy node.')}
-          </span>
-        </div>
-
-        {mode === 'autoscaling' && (
-          <div style={{ display: 'flex', gap: 'var(--spacing-md)' }}>
-            <ReplicaInput
-              id="sched-min"
-              label="Min replicas"
-              value={minReplicas}
-              onChange={setMinReplicas}
-              presets={[{ v: 1 }, { v: 2 }, { v: 3 }, { v: 4 }]}
-            />
-            <ReplicaInput
-              id="sched-max"
-              label="Max replicas"
-              value={maxReplicas}
-              onChange={setMaxReplicas}
-              presets={[{ v: 0, l: 'no limit' }, { v: 2 }, { v: 4 }, { v: 8 }]}
-            />
-          </div>
-        )}
-
-        {/* Per-model routing policy. Left empty/zero these inherit the
-            cluster-wide defaults; set them to override how requests for this
-            model are spread across replicas. */}
-        <div>
-          <label className="form-label" htmlFor="sched-route-policy">Routing policy</label>
-          <select
-            id="sched-route-policy"
-            className="input"
-            value={routePolicy}
-            onChange={e => setRoutePolicy(e.target.value)}
-          >
-            <option value="">Default (cluster setting)</option>
-            <option value="round_robin">Round Robin</option>
-            <option value="prefix_cache">Prefix Cache</option>
-          </select>
-          <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
-            Prefix Cache routes shared-prefix requests to the same replica to reuse its KV cache, falling back to round-robin when replicas are imbalanced.
-          </span>
-        </div>
-
-        {routePolicy === 'prefix_cache' && (
-          <div style={{ display: 'flex', gap: 'var(--spacing-md)' }}>
-            <div style={{ flex: 1 }}>
-              <label className="form-label" htmlFor="sched-min-prefix-match">Min prefix match</label>
-              <input
-                id="sched-min-prefix-match"
-                className="input"
-                type="number"
-                step="0.05"
-                min="0"
-                max="1"
-                value={minPrefixMatch}
-                onChange={e => setMinPrefixMatch(parseFloat(e.target.value) || 0)}
-              />
-              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
-                Fraction of the prompt (0..1) that must match a cached prefix before affinity kicks in. 0 inherits the default.
-              </span>
-            </div>
-            <div style={{ flex: 1 }}>
-              <label className="form-label" htmlFor="sched-balance-abs">Balance abs threshold</label>
-              <input
-                id="sched-balance-abs"
-                className="input"
-                type="number"
-                min="0"
-                value={balanceAbsThreshold}
-                onChange={e => setBalanceAbsThreshold(parseInt(e.target.value, 10) || 0)}
-              />
-              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
-                Max absolute in-flight gap allowed before falling back to round-robin. 0 inherits the default.
-              </span>
-            </div>
-            <div style={{ flex: 1 }}>
-              <label className="form-label" htmlFor="sched-balance-rel">Balance rel threshold</label>
-              <input
-                id="sched-balance-rel"
-                className="input"
-                type="number"
-                step="0.1"
-                min="0"
-                value={balanceRelThreshold}
-                onChange={e => setBalanceRelThreshold(parseFloat(e.target.value) || 0)}
-              />
-              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
-                Max relative in-flight ratio (&gt;= 1) allowed before falling back to round-robin. 0 inherits the default.
-              </span>
-            </div>
-          </div>
-        )}
-      </div>
-
-      {/* Hairline divider above the actions, matching the project's form pattern. */}
-      <div style={{
-        display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end',
-        marginTop: 'var(--spacing-lg)', paddingTop: 'var(--spacing-md)',
-        borderTop: '1px solid var(--color-border-subtle)',
-      }}>
-        <button className="btn btn-secondary btn-sm" onClick={onCancel}>Cancel</button>
-        <button className="btn btn-primary btn-sm" onClick={handleSubmit} disabled={!isValid()}>Save rule</button>
-      </div>
-    </div>
-  )
-}
-
-export default function Scheduling() {
-  const { addToast } = useOutletContext()
-  const { t } = useTranslation('admin')
-  const [schedulingConfigs, setSchedulingConfigs] = useState([])
-  const [showForm, setShowForm] = useState(false)
-  const [confirmDelete, setConfirmDelete] = useState(null)
-
-  const fetchScheduling = useCallback(async () => {
-    try {
-      const data = await nodesApi.listScheduling()
-      setSchedulingConfigs(Array.isArray(data) ? data : [])
-    } catch { setSchedulingConfigs([]) }
-  }, [])
-
-  useEffect(() => { fetchScheduling() }, [fetchScheduling])
-
-  const handleSave = async (config) => {
-    try {
-      await nodesApi.setScheduling(config)
-      addToast('Scheduling rule saved', 'success')
-      setShowForm(false)
-      fetchScheduling()
-    } catch (err) { addToast(`Failed to save rule: ${err.message}`, 'error') }
-  }
-
-  const handleDelete = async (model) => {
-    try {
-      await nodesApi.deleteScheduling(model)
-      addToast('Scheduling rule removed', 'success')
-      setConfirmDelete(null)
-      fetchScheduling()
-    } catch (err) { addToast(`Failed to remove rule: ${err.message}`, 'error') }
-  }
-
-  return (
-    <div className="page page--wide">
-      <PageHeader
-        title={<><i className="fas fa-calendar-alt" style={{ marginRight: 'var(--spacing-sm)' }} />{t('scheduling.title')}</>}
-        supporting={t('scheduling.subtitle')}
-      />
-      <div>
-        <button className="btn btn-primary btn-sm" style={{ marginBottom: 'var(--spacing-md)' }}
-          onClick={() => setShowForm(f => !f)}>
-          <i className="fas fa-plus" style={{ marginRight: 6 }} />
-          Add Scheduling Rule
-        </button>
-        {showForm && <SchedulingForm onSave={handleSave} onCancel={() => setShowForm(false)} />}
-        {schedulingConfigs.length === 0 && !showForm ? (
-          <p style={{ fontSize: '0.875rem', color: 'var(--color-text-muted)', textAlign: 'center', padding: 'var(--spacing-xl) 0' }}>
-            No scheduling rules configured. Add a rule to control how models are placed on nodes.
-          </p>
-        ) : schedulingConfigs.length > 0 && (
-          <ResponsiveTable>
-              <thead><tr>
-                <th>Model</th>
-                <th>Mode</th>
-                <th>Node Selector</th>
-                <th>Min Replicas</th>
-                <th>Max Replicas</th>
-                <th>Routing</th>
-                <th>Thresholds</th>
-                <th>Status</th>
-                <th style={{ textAlign: 'right' }}>Actions</th>
-              </tr></thead>
-              <tbody>
-                {schedulingConfigs.map(cfg => {
-                  const isSpread = !!cfg.spread_all
-                  const isAutoScaling = !isSpread && (cfg.min_replicas > 0 || cfg.max_replicas > 0)
-                  const hasSelector = !!cfg.node_selector
-                  const modeLabel = isSpread ? 'Spread' : isAutoScaling ? 'Auto-scaling' : hasSelector ? 'Placement' : 'Inactive'
-                  const modeColor = isSpread ? 'var(--color-warning)' : isAutoScaling ? 'var(--color-success)' : hasSelector ? 'var(--color-primary)' : 'var(--color-text-muted)'
-                  // Cooldown: reconciler tripped the circuit breaker because cluster
-                  // capacity is exhausted. Surface so the operator sees it instead
-                  // of the model silently failing to scale.
-                  const unsatisfiableUntil = cfg.unsatisfiable_until ? new Date(cfg.unsatisfiable_until) : null
-                  const isUnsatisfiable = unsatisfiableUntil && unsatisfiableUntil.getTime() > Date.now()
-                  return (
-                  <tr key={cfg.id || cfg.model_name}>
-                    <td style={{ fontWeight: 600, fontSize: '0.875rem' }}>{cfg.model_name}</td>
-                    <td>
-                      <span style={{
-                        display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px', borderRadius: "var(--radius-sm)",
-                        background: 'var(--color-bg-tertiary)', border: `1px solid ${modeColor}`,
-                        color: modeColor, fontWeight: 600,
-                      }}>{modeLabel}</span>
-                    </td>
-                    <td>
-                      {cfg.node_selector ? (() => {
-                        try {
-                          const sel = typeof cfg.node_selector === 'string' ? JSON.parse(cfg.node_selector) : cfg.node_selector
-                          return Object.entries(sel).map(([k,v]) => (
-                            <span key={k} style={{
-                              display: 'inline-block', fontSize: '0.75rem', padding: '2px 6px', borderRadius: "var(--radius-sm)",
-                              background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
-                              fontFamily: 'var(--font-mono)', marginRight: 4,
-                            }}>{k}={v}</span>
-                          ))
-                        } catch { return <span style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}>{cfg.node_selector}</span> }
-                      })() : <span style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}>Any node</span>}
-                    </td>
-                    <td style={{ fontFamily: 'var(--font-mono)' }}>
-                      {isSpread
-                        ? <span style={{
-                            display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px', borderRadius: "var(--radius-sm)",
-                            background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-warning)',
-                            color: 'var(--color-warning)', fontWeight: 600, fontFamily: 'var(--font-sans)',
-                          }}>Spread: all matching nodes</span>
-                        : isAutoScaling ? cfg.min_replicas : '-'}
-                    </td>
-                    <td style={{ fontFamily: 'var(--font-mono)' }}>
-                      {isSpread ? '-' : isAutoScaling ? (cfg.max_replicas || 'no limit') : '-'}
-                    </td>
-                    <td style={{ fontSize: '0.8125rem' }}>
-                      {cfg.route_policy || 'default'}
-                    </td>
-                    <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem', color: 'var(--color-text-muted)' }}>
-                      {cfg.route_policy === 'prefix_cache' ? (
-                        <>
-                          <div>match: {cfg.min_prefix_match ? cfg.min_prefix_match : 'inherit'}</div>
-                          <div>abs: {cfg.balance_abs_threshold ? cfg.balance_abs_threshold : 'inherit'}</div>
-                          <div>rel: {cfg.balance_rel_threshold ? cfg.balance_rel_threshold : 'inherit'}</div>
-                        </>
-                      ) : '-'}
-                    </td>
-                    <td>
-                      {isUnsatisfiable ? (
-                        <span
-                          title={`Reconciler couldn't satisfy this rule (capacity exhausted). Will retry by ${unsatisfiableUntil.toLocaleString()}, or sooner on a node lifecycle change.`}
-                          style={{
-                            display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px',
-                            borderRadius: 'var(--radius-sm)', fontWeight: 600,
-                            background: 'var(--color-bg-tertiary)',
-                            border: '1px solid var(--color-warning, #d97706)',
-                            color: 'var(--color-warning, #d97706)',
-                          }}
-                        >
-                          <i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />
-                          Unsatisfiable until {unsatisfiableUntil.toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })}
-                        </span>
-                      ) : (
-                        <span style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>OK</span>
-                      )}
-                    </td>
-                    <td style={{ textAlign: 'right' }}>
-                      <button className="btn btn-danger btn-sm" onClick={() => setConfirmDelete(cfg.model_name)}>
-                        <i className="fas fa-trash" />
-                      </button>
-                    </td>
-                  </tr>
-                  )
-                })}
-              </tbody>
-          </ResponsiveTable>
-        )}
-      </div>
-
-      <ConfirmDialog
-        open={!!confirmDelete}
-        title="Remove scheduling rule"
-        message={confirmDelete ? `Remove the scheduling rule for "${confirmDelete}"?` : ''}
-        confirmLabel="Remove"
-        danger
-        onConfirm={() => confirmDelete && handleDelete(confirmDelete)}
-        onCancel={() => setConfirmDelete(null)}
-      />
-    </div>
-  )
-}
--- a/core/http/react-ui/src/router.jsx
+++ b/core/http/react-ui/src/router.jsx
@@ -69,9 +69,7 @@ const Studio = page('studio', () => import('./pages/Studio'))
 const FaceRecognition = page('face', () => import('./pages/FaceRecognition'))
 const VoiceRecognition = page('voice', () => import('./pages/VoiceRecognition'))
 const Nodes = page('nodes', () => import('./pages/Nodes'))
-const Scheduling = page('scheduling', () => import('./pages/Scheduling'))
 const NodeBackendLogs = page(null, () => import('./pages/NodeBackendLogs'))
-const NodeDetail = page(null, () => import('./pages/NodeDetail'))
 const NotFound = page(null, () => import('./pages/NotFound'))
 const Usage = page('usage', () => import('./pages/Usage'))
 const Users = page('users', () => import('./pages/Users'))
@@ -154,8 +152,6 @@ const appChildren = [
      { path: 'backend-logs/:modelId', element: <Admin><BackendLogs /></Admin> },
      { path: 'p2p', element: <Admin><P2P /></Admin> },
      { path: 'nodes', element: <Admin><Nodes /></Admin> },
-      { path: 'nodes/:id', element: <Admin><NodeDetail /></Admin> },
-      { path: 'scheduling', element: <Admin><Scheduling /></Admin> },
      { path: 'node-backend-logs/:nodeId/:modelId', element: <Admin><NodeBackendLogs /></Admin> },
      { path: 'usage', element: <Usage /> },
      { path: 'users', element: <RequireAuthEnabled><Admin><Users /></Admin></RequireAuthEnabled> },
--- a/core/http/react-ui/src/utils/api.js
+++ b/core/http/react-ui/src/utils/api.js
@@ -568,7 +568,6 @@ export const nodesApi = {
    method: 'DELETE',
  }),
  listScheduling: () => fetchJSON(API_CONFIG.endpoints.nodesScheduling),
-  allModels: () => fetchJSON(API_CONFIG.endpoints.nodesModels),
  setScheduling: (config) => postJSON(API_CONFIG.endpoints.nodesScheduling, config),
  deleteScheduling: (model) => fetchJSON(API_CONFIG.endpoints.nodesSchedulingModel(model), { method: 'DELETE' }),
 }
--- a/core/http/react-ui/src/utils/config.js
+++ b/core/http/react-ui/src/utils/config.js
@@ -144,7 +144,6 @@ export const API_CONFIG = {
    nodeLabelKey: (id, key) => `/api/nodes/${id}/labels/${key}`,
    nodeMaxReplicasPerModel: (id) => `/api/nodes/${id}/max-replicas-per-model`,
    nodesScheduling: '/api/nodes/scheduling',
-    nodesModels: '/api/nodes/models',
    nodesSchedulingModel: (model) => `/api/nodes/scheduling/${encodeURIComponent(model)}`,
  },
 }
--- a/core/http/routes/nodes.go
+++ b/core/http/routes/nodes.go
@@ -71,9 +71,6 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade
 	admin := e.Group("/api/nodes", readyMw, adminMw)
 	admin.GET("", localai.ListNodesEndpoint(registry))

-	// Cluster-wide loaded models (registered before /:id to avoid route conflicts)
-	admin.GET("/models", localai.ListAllNodeModelsEndpoint(registry))
-
 	// Model scheduling (registered before /:id to avoid route conflicts)
 	admin.GET("/scheduling", localai.ListSchedulingEndpoint(registry))
 	admin.GET("/scheduling/:model", localai.GetSchedulingEndpoint(registry))
--- a/core/services/messaging/subjects.go
+++ b/core/services/messaging/subjects.go
@@ -64,22 +64,6 @@ func SubjectGalleryProgress(opID string) string {
 	return subjectGalleryPrefix + sanitizeSubjectToken(opID) + ".progress"
 }

-// SubjectStagingProgress returns the NATS subject a frontend replica publishes
-// file-staging progress on. Staging progress is otherwise per-process state
-// (the SmartRouter's in-memory StagingTracker), so without this broadcast a
-// /api/operations poll that round-robins onto a replica that did not originate
-// the staging op sees nothing - the progress row flickers in multi-replica
-// deployments. Peers subscribe to the wildcard and merge.
-func SubjectStagingProgress(modelID string) string {
-	return subjectStagingPrefix + sanitizeSubjectToken(modelID) + ".progress"
-}
-
-const subjectStagingPrefix = "staging."
-
-// SubjectStagingProgressWildcard matches every replica's staging-progress
-// broadcasts so a peer can mirror staging ops it did not originate.
-const SubjectStagingProgressWildcard = "staging.*.progress"
-
 // SubjectGalleryOpStart and SubjectGalleryOpEnd are broadcast subjects for the
 // in-memory OpCache lifecycle. Frontend replicas publish to these when an
 // admin admits a new install/delete (Start) and when an operation is
--- a/core/services/nodes/router.go
+++ b/core/services/nodes/router.go
@@ -359,21 +359,8 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 		}
 	}

-	// Step 2: Model not loaded — schedule loading with distributed lock to prevent duplicates.
-	//
-	// Detach the cold-load from the caller's context. Staging a model can
-	// transfer multiple GB to a worker, which takes far longer than any client
-	// keeps its HTTP request open — a browser refresh, an ingress/LB idle
-	// timeout, or a round-robined retry landing on another replica all cancel
-	// the request context. If staging were bound to it, the multi-GB upload
-	// aborts with "context canceled" mid-transfer and large models can never
-	// finish staging (the model-load outage). WithoutCancel keeps the request's
-	// values (prefix chain, etc.) but drops its cancellation/deadline. Each
-	// long step still has its own bound (the file stager's resume budget,
-	// LoadModel's 5m timeout), and the per-model advisory lock below de-dupes
-	// concurrent loaders across replicas.
-	loadCtx := context.WithoutCancel(ctx)
-	loadModel := func(ctx context.Context) (*RouteResult, error) {
+	// Step 2: Model not loaded — schedule loading with distributed lock to prevent duplicates
+	loadModel := func() (*RouteResult, error) {
 		// Re-check after acquiring lock — another request may have loaded it
 		node, nm, err := r.registry.FindAndLockNodeWithModel(ctx, trackingKey, candidateNodeIDs, pref)
 		if err == nil && node != nil {
@@ -446,9 +433,9 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 	if r.db != nil {
 		lockKey := advisorylock.KeyFromString("model-load:" + trackingKey)
 		var result *RouteResult
-		lockErr := advisorylock.WithLockCtx(loadCtx, r.db, lockKey, func() error {
+		lockErr := advisorylock.WithLockCtx(ctx, r.db, lockKey, func() error {
 			var err error
-			result, err = loadModel(loadCtx)
+			result, err = loadModel()
 			return err
 		})
 		if lockErr != nil {
@@ -457,7 +444,7 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 		return result, nil
 	}
 	// No DB (non-distributed) — proceed without lock
-	return loadModel(loadCtx)
+	return loadModel()
 }

 // parseSelectorJSON decodes a JSON node selector string into a map.
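
Review note on the router.go hunk above: the removed comment describes detaching the cold load from the request context with `context.WithoutCancel`, so that a client disconnect cannot abort a long staging transfer. The following is a minimal standalone sketch of that standard-library behavior, not the router's code. `ctxKey` and `detachedState` are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a hypothetical key type used only in this sketch.
type ctxKey string

// detachedState cancels a parent context and reports the error state of the
// parent, the error state of a context derived with context.WithoutCancel,
// and the value the detached context still carries.
func detachedState() (parentErr, detachedErr error, value any) {
	base := context.WithValue(context.Background(), ctxKey("prefix"), "chain")
	parent, cancel := context.WithCancel(base)

	// WithoutCancel (Go 1.21+) keeps the parent's values but drops its
	// cancellation and deadline.
	detached := context.WithoutCancel(parent)

	cancel() // simulate the client abandoning the request

	return parent.Err(), detached.Err(), detached.Value(ctxKey("prefix"))
}

func main() {
	p, d, v := detachedState()
	fmt.Println("parent err:", p)
	fmt.Println("detached err:", d)
	fmt.Println("value seen through detached ctx:", v)
}
```

With the hunk applied, `loadModel` closes over the request `ctx` again, so the whole load path is once more bound to the caller's cancellation.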
--- a/core/services/nodes/router_staging_context_test.go
+++ b/core/services/nodes/router_staging_context_test.go
@@ -1,80 +0,0 @@
-package nodes
-
-import (
-	"context"
-	"errors"
-	"os"
-	"path/filepath"
-
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-
-	"github.com/mudler/LocalAI/core/services/messaging"
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-)
-
-// cancelOnStageStager simulates the triggering HTTP request being abandoned
-// (client disconnect, ingress idle-timeout) the moment a multi-GB file starts
-// staging. It cancels the request context and records whether the context the
-// stager itself received was cancelled as a result.
-type cancelOnStageStager struct {
-	fakeFileStager
-	cancelRequest context.CancelFunc
-	staged        bool
-	ctxErrOnStage error
-}
-
-func (s *cancelOnStageStager) EnsureRemote(ctx context.Context, _, _, key string) (string, error) {
-	s.staged = true
-	// Mid-transfer: the client gives up on the (minutes-long) request.
-	if s.cancelRequest != nil {
-		s.cancelRequest()
-	}
-	// A multi-GB upload must survive this. If staging were bound to the
-	// request context, ctx is now cancelled and the real HTTP stager would
-	// abort with "context canceled" — exactly the production outage.
-	s.ctxErrOnStage = ctx.Err()
-	return "/remote/" + key, nil
-}
-
-var _ = Describe("Route cold-load staging context", func() {
-	It("detaches staging from the request context so a client disconnect cannot abort a multi-GB transfer", func() {
-		// A real model file so stageModelFiles actually calls the stager
-		// (non-existent paths are skipped).
-		tmp := GinkgoT().TempDir()
-		modelFile := filepath.Join(tmp, "big.gguf")
-		Expect(os.WriteFile(modelFile, []byte("weights"), 0o644)).To(Succeed())
-
-		reg := &fakeModelRouter{
-			findAndLockErr: errors.New("not loaded"),
-			findIdleNode:   &BackendNode{ID: "n1", Name: "worker-1", Address: "10.0.0.1:50051"},
-		}
-		backend := &stubBackend{loadResult: &pb.Result{Success: true}}
-		factory := &stubClientFactory{client: backend}
-		unloader := &fakeUnloader{installReply: &messaging.BackendInstallReply{
-			Success: true,
-			Address: "10.0.0.1:9001",
-		}}
-		stager := &cancelOnStageStager{}
-
-		router := NewSmartRouter(reg, SmartRouterOptions{
-			Unloader:      unloader,
-			ClientFactory: factory,
-			FileStager:    stager,
-			// DB nil: no advisory lock, exercises the same detached load ctx.
-		})
-
-		ctx, cancel := context.WithCancel(context.Background())
-		stager.cancelRequest = cancel
-		defer cancel()
-
-		result, err := router.Route(ctx, "big-model", filepath.Join("models", "big.gguf"), "llama-cpp",
-			&pb.ModelOptions{Model: "big.gguf", ModelFile: modelFile}, false)
-
-		Expect(err).ToNot(HaveOccurred())
-		Expect(result).ToNot(BeNil())
-		Expect(stager.staged).To(BeTrue(), "staging must have been attempted")
-		Expect(stager.ctxErrOnStage).ToNot(HaveOccurred(),
-			"staging context must survive cancellation of the triggering request")
-	})
-})
--- a/core/services/nodes/staging_progress.go
+++ b/core/services/nodes/staging_progress.go
@@ -5,138 +5,58 @@ import (
 	"fmt"
 	"sync"
 	"time"
-
-	"github.com/mudler/LocalAI/core/services/messaging"
 )

 // StagingStatus represents the current progress of a model staging operation.
 type StagingStatus struct {
-	ModelID    string    `json:"model_id"`
-	NodeName   string    `json:"node_name"`
-	FileName   string    `json:"file_name"`
-	BytesSent  int64     `json:"bytes_sent"`
-	TotalBytes int64     `json:"total_bytes"`
-	Progress   float64   `json:"progress"` // 0-100 overall progress
-	Speed      string    `json:"speed"`
-	FileIndex  int       `json:"file_index"`
-	TotalFiles int       `json:"total_files"`
-	Message    string    `json:"message"`
+	ModelID    string  `json:"model_id"`
+	NodeName   string  `json:"node_name"`
+	FileName   string  `json:"file_name"`
+	BytesSent  int64   `json:"bytes_sent"`
+	TotalBytes int64   `json:"total_bytes"`
+	Progress   float64 `json:"progress"` // 0-100 overall progress
+	Speed      string  `json:"speed"`
+	FileIndex  int     `json:"file_index"`
+	TotalFiles int     `json:"total_files"`
+	Message    string  `json:"message"`
 	StartedAt  time.Time `json:"started_at"`
 }

-const (
-	// stagingBroadcastInterval bounds how often byte-level UpdateFile ticks are
-	// re-broadcast to peers (leading-edge debounce). State transitions (Start,
-	// FileComplete, Complete) always publish so peers never miss them.
-	stagingBroadcastInterval = time.Second
-	// stagingRemoteTTL drops a mirrored (remote) op whose last update is older
-	// than this. NATS pub/sub is fire-and-forget, so a missed Done event would
-	// otherwise leave a phantom staging row on a peer forever; a live op
-	// refreshes its mirror at least every stagingBroadcastInterval.
-	stagingRemoteTTL = 60 * time.Second
-)
-
-// stagingEntry wraps a StagingStatus with the bookkeeping needed to keep peer
-// replicas consistent: whether this op is mirrored from a peer (remote) vs.
-// owned locally, when it was last updated (for remote-mirror expiry), and when
-// its byte progress was last broadcast (for debounce).
-type stagingEntry struct {
-	status    StagingStatus
-	remote    bool
-	updatedAt time.Time
-	lastPub   time.Time
-}
-
 // StagingTracker tracks active file staging operations in-memory.
 // Used by SmartRouter to publish progress and by /api/operations to surface it.
-//
-// In distributed mode each frontend replica runs its own tracker. The replica
-// performing a transfer owns the op locally and broadcasts progress over NATS
-// (SetPublisher); peers mirror it via ApplyRemote (SubscribeBroadcasts) so a
-// /api/operations poll that round-robins onto any replica surfaces the op.
 type StagingTracker struct {
-	mu        sync.RWMutex
-	active    map[string]*stagingEntry
-	publisher messaging.Publisher
-}
-
-// StagingProgressEvent is the wire payload a frontend replica broadcasts on
-// SubjectStagingProgress so peer replicas can mirror a staging op they did not
-// originate. Done signals the op finished (peers drop their mirrored copy).
-type StagingProgressEvent struct {
-	ModelID string         `json:"model_id"`
-	Status  *StagingStatus `json:"status,omitempty"`
-	Done    bool           `json:"done"`
+	mu     sync.RWMutex
+	active map[string]*StagingStatus
 }

 // NewStagingTracker creates a new tracker.
 func NewStagingTracker() *StagingTracker {
 	return &StagingTracker{
-		active: make(map[string]*stagingEntry),
+		active: make(map[string]*StagingStatus),
 	}
 }

-// SetPublisher wires the NATS publisher used to broadcast staging progress to
-// peer replicas. No-op publisher (nil) keeps the tracker standalone.
-func (t *StagingTracker) SetPublisher(p messaging.Publisher) {
-	t.mu.Lock()
-	defer t.mu.Unlock()
-	t.publisher = p
-}
-
-// SubscribeBroadcasts subscribes to peer replicas' staging-progress broadcasts
-// and mirrors them into this tracker, so /api/operations on any replica surfaces
-// staging ops it did not originate. Returns the subscription for cleanup.
-func (t *StagingTracker) SubscribeBroadcasts(nc messaging.MessagingClient) (messaging.Subscription, error) {
-	return messaging.SubscribeJSON(nc, messaging.SubjectStagingProgressWildcard, func(evt StagingProgressEvent) {
-		if evt.ModelID == "" {
-			return
-		}
-		t.ApplyRemote(evt)
-	})
-}
-
-// publishStaging emits an event to the per-model staging subject. The publisher
-// is captured by the caller under the lock and passed in, so publishing happens
-// outside the lock (a slow NATS link must not stall the staging copy loop).
-func publishStaging(p messaging.Publisher, evt StagingProgressEvent) {
-	if p == nil {
-		return
-	}
-	_ = p.Publish(messaging.SubjectStagingProgress(evt.ModelID), evt)
-}
-
 // Start registers a new staging operation for the given model.
 func (t *StagingTracker) Start(modelID, nodeName string, totalFiles int) {
 	t.mu.Lock()
-	e := &stagingEntry{
-		status: StagingStatus{
-			ModelID:    modelID,
-			NodeName:   nodeName,
-			TotalFiles: totalFiles,
-			StartedAt:  time.Now(),
-			Message:    "Preparing to stage model files",
-		},
-		updatedAt: time.Now(),
-		// lastPub stays zero so the first UpdateFile tick always broadcasts.
+	defer t.mu.Unlock()
+	t.active[modelID] = &StagingStatus{
+		ModelID:    modelID,
+		NodeName:   nodeName,
+		TotalFiles: totalFiles,
+		StartedAt:  time.Now(),
+		Message:    "Preparing to stage model files",
 	}
-	t.active[modelID] = e
-	pub := t.publisher
-	snap := e.status
-	t.mu.Unlock()
-
-	publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
 }

 // UpdateFile updates the tracker with current file transfer progress.
 func (t *StagingTracker) UpdateFile(modelID, fileName string, fileIndex int, bytesSent, totalBytes int64, speed string) {
 	t.mu.Lock()
-	e, ok := t.active[modelID]
+	defer t.mu.Unlock()
+	s, ok := t.active[modelID]
 	if !ok {
-		t.mu.Unlock()
 		return
 	}
-	s := &e.status
 	s.FileName = fileName
 	s.FileIndex = fileIndex
 	s.BytesSent = bytesSent
@@ -159,121 +79,52 @@ func (t *StagingTracker) UpdateFile(modelID, fileName string, fileIndex int, byt
 	} else {
 		s.Message = fmt.Sprintf("Staging %s", fileName)
 	}
-
-	e.updatedAt = time.Now()
-	// Leading-edge debounce: byte ticks fire many times per second; only
-	// re-broadcast at most once per stagingBroadcastInterval.
-	var pub messaging.Publisher
-	var snap StagingStatus
-	if time.Since(e.lastPub) >= stagingBroadcastInterval {
-		e.lastPub = time.Now()
-		pub = t.publisher
-		snap = e.status
-	}
-	t.mu.Unlock()
-
-	if pub != nil {
-		publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
-	}
 }

 // FileComplete marks a single file as done within a staging operation.
 func (t *StagingTracker) FileComplete(modelID string, fileIndex, totalFiles int) {
 	t.mu.Lock()
-	e, ok := t.active[modelID]
+	defer t.mu.Unlock()
+	s, ok := t.active[modelID]
 	if !ok {
-		t.mu.Unlock()
 		return
 	}
-	s := &e.status
 	if totalFiles > 0 {
 		s.Progress = float64(fileIndex) / float64(totalFiles) * 100
 	}
 	s.BytesSent = 0
 	s.TotalBytes = 0
 	s.Speed = ""
-	e.updatedAt = time.Now()
-	e.lastPub = time.Now()
-	pub := t.publisher
-	snap := e.status
-	t.mu.Unlock()
-
-	// Always broadcast a per-file completion so peers' progress bars advance.
-	publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
 }

 // Complete removes a staging operation (it's done).
 func (t *StagingTracker) Complete(modelID string) {
 	t.mu.Lock()
-	_, ok := t.active[modelID]
+	defer t.mu.Unlock()
 	delete(t.active, modelID)
-	pub := t.publisher
-	t.mu.Unlock()
-
-	if ok {
-		// Tell peers to drop their mirrored copy.
-		publishStaging(pub, StagingProgressEvent{ModelID: modelID, Done: true})
-	}
 }

-// ApplyRemote merges a peer replica's staging broadcast into this tracker. It
-// never re-broadcasts (no echo loop). A locally-owned op is authoritative: a
-// remote event for the same model is ignored, so the origin replica receiving
-// its own broadcast (and any stray peer event) cannot clobber or delete it.
-func (t *StagingTracker) ApplyRemote(evt StagingProgressEvent) {
-	t.mu.Lock()
-	defer t.mu.Unlock()
-
-	if existing, ok := t.active[evt.ModelID]; ok && !existing.remote {
-		// We own this op locally — ignore peer chatter about it.
-		return
-	}
-	if evt.Done {
-		delete(t.active, evt.ModelID)
-		return
-	}
-	if evt.Status == nil {
-		return
-	}
-	t.active[evt.ModelID] = &stagingEntry{
-		status:    *evt.Status,
-		remote:    true,
-		updatedAt: time.Now(),
-	}
-}
-
-// GetAll returns a snapshot of all active staging operations. Stale remote
-// mirrors (a peer op whose Done event was missed) are pruned here so they don't
-// linger in the UI.
+// GetAll returns a snapshot of all active staging operations.
 func (t *StagingTracker) GetAll() map[string]StagingStatus {
-	t.mu.Lock()
-	defer t.mu.Unlock()
-	now := time.Now()
+	t.mu.RLock()
+	defer t.mu.RUnlock()
 	result := make(map[string]StagingStatus, len(t.active))
-	for k, e := range t.active {
-		if e.remote && now.Sub(e.updatedAt) > stagingRemoteTTL {
-			delete(t.active, k)
-			continue
-		}
-		result[k] = e.status
+	for k, v := range t.active {
+		result[k] = *v
 	}
 	return result
 }

-// Get returns the status of a specific staging operation, or nil if not active
-// (or a stale remote mirror).
+// Get returns the status of a specific staging operation, or nil if not active.
 func (t *StagingTracker) Get(modelID string) *StagingStatus {
 	t.mu.RLock()
 	defer t.mu.RUnlock()
-	e, ok := t.active[modelID]
+	s, ok := t.active[modelID]
 	if !ok {
 		return nil
 	}
-	if e.remote && time.Since(e.updatedAt) > stagingRemoteTTL {
-		return nil
-	}
-	s := e.status
-	return &s
+	snapshot := *s // avoid shadowing the builtin copy
+	return &snapshot
 }

 // StagingProgressCallback is called by file stagers to report byte-level progress.
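
Review note on the staging_progress.go hunks above: the removed `UpdateFile` logic used a leading-edge debounce, where the first byte tick is broadcast immediately and later ticks are suppressed until the interval has elapsed. The sketch below illustrates that idea in isolation with explicit timestamps so it is deterministic. `debouncer`, `allow`, and `countForwarded` are hypothetical names, not part of the tracker.

```go
package main

import (
	"fmt"
	"time"
)

// debouncer is a hypothetical helper: it forwards the first event right away
// and suppresses later ones until interval has elapsed since the last
// forwarded event.
type debouncer struct {
	interval time.Duration
	last     time.Time // zero value means "nothing forwarded yet"
}

// allow reports whether an event observed at now should be forwarded.
func (d *debouncer) allow(now time.Time) bool {
	if !d.last.IsZero() && now.Sub(d.last) < d.interval {
		return false
	}
	d.last = now
	return true
}

// countForwarded feeds ticks spaced stepMs milliseconds apart through a
// debouncer with the given interval and counts how many are forwarded.
func countForwarded(ticks, stepMs, intervalMs int) int {
	d := &debouncer{interval: time.Duration(intervalMs) * time.Millisecond}
	now := time.Unix(0, 0)
	forwarded := 0
	for i := 0; i < ticks; i++ {
		if d.allow(now) {
			forwarded++
		}
		now = now.Add(time.Duration(stepMs) * time.Millisecond)
	}
	return forwarded
}

func main() {
	// Ten ticks, 250 ms apart, debounced to at most one per second.
	fmt.Println(countForwarded(10, 250, 1000))
}
```

In the removed code, state transitions (`Start`, `FileComplete`, `Complete`) bypassed this gate and always published; only byte-level ticks were throttled.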
--- a/core/services/nodes/staging_progress_broadcast_test.go
+++ b/core/services/nodes/staging_progress_broadcast_test.go
@@ -1,109 +0,0 @@
-package nodes
-
-import (
-	"encoding/json"
-
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-
-	"github.com/mudler/LocalAI/core/services/messaging"
-)
-
-// decodeStagingEvents extracts every StagingProgressEvent the fake messaging
-// client captured, in publish order.
-func decodeStagingEvents(mc *fakeMessagingClient) []StagingProgressEvent {
-	mc.mu.Lock()
-	defer mc.mu.Unlock()
-	var out []StagingProgressEvent
-	for _, p := range mc.published {
-		var evt StagingProgressEvent
-		if err := json.Unmarshal(p.Data, &evt); err != nil {
-			continue
-		}
-		if evt.ModelID == "" {
-			continue
-		}
-		out = append(out, evt)
-	}
-	return out
-}
-
-var _ = Describe("StagingTracker cross-replica broadcast", func() {
-	Context("when a publisher is wired (distributed mode)", func() {
-		It("broadcasts staging progress so a peer replica surfaces an op it did not originate", func() {
-			mc := &fakeMessagingClient{}
-			origin := NewStagingTracker()
-			origin.SetPublisher(mc)
-
-			origin.Start("model-x", "worker-1", 1)
-			origin.UpdateFile("model-x", "weights.gguf", 1, 5<<30, 10<<30, "100 MiB/s")
-
-			events := decodeStagingEvents(mc)
-			Expect(events).ToNot(BeEmpty(), "writes must be broadcast over NATS")
-			Expect(mc.published[0].Subject).To(Equal(messaging.SubjectStagingProgress("model-x")))
-
-			// A peer replica that never ran the op merges the broadcast.
-			peer := NewStagingTracker()
-			for _, evt := range events {
-				peer.ApplyRemote(evt)
-			}
-
-			all := peer.GetAll()
-			Expect(all).To(HaveKey("model-x"))
-			Expect(all["model-x"].NodeName).To(Equal("worker-1"))
-			Expect(all["model-x"].FileName).To(Equal("weights.gguf"))
-			Expect(all["model-x"].TotalBytes).To(Equal(int64(10 << 30)))
-		})
-
-		It("removes the op from the peer when the origin completes it", func() {
-			mc := &fakeMessagingClient{}
-			origin := NewStagingTracker()
-			origin.SetPublisher(mc)
-
-			origin.Start("model-x", "worker-1", 1)
-			origin.Complete("model-x")
-
-			peer := NewStagingTracker()
-			for _, evt := range decodeStagingEvents(mc) {
-				peer.ApplyRemote(evt)
-			}
-			Expect(peer.GetAll()).ToNot(HaveKey("model-x"))
-		})
-
-		It("does not let a peer broadcast clobber an op this replica is itself running", func() {
-			local := NewStagingTracker()
-			local.Start("model-x", "worker-local", 2)
-			local.UpdateFile("model-x", "weights.gguf", 1, 9<<30, 10<<30, "")
-
-			// A stray/older remote event for the SAME modelID must not overwrite
-			// the authoritative local state, nor delete it.
-			local.ApplyRemote(StagingProgressEvent{
-				ModelID: "model-x",
-				Status:  &StagingStatus{ModelID: "model-x", NodeName: "worker-other", FileName: "stale.gguf"},
-			})
-			local.ApplyRemote(StagingProgressEvent{ModelID: "model-x", Done: true})
-
-			all := local.GetAll()
-			Expect(all).To(HaveKey("model-x"))
-			Expect(all["model-x"].NodeName).To(Equal("worker-local"))
-			Expect(all["model-x"].FileName).To(Equal("weights.gguf"))
-		})
-	})
-
-	Context("when no publisher is wired (standalone mode)", func() {
-		It("does not broadcast", func() {
-			mc := &fakeMessagingClient{}
-			t := NewStagingTracker()
-			t.Start("model-x", "worker-1", 1)
-			t.UpdateFile("model-x", "weights.gguf", 1, 1<<30, 10<<30, "")
-			Expect(mc.published).To(BeEmpty())
-		})
-	})
-})
-
-var _ = Describe("SubjectStagingProgress", func() {
-	It("namespaces by model id and matches the wildcard prefix", func() {
-		Expect(messaging.SubjectStagingProgress("model-x")).To(Equal("staging.model-x.progress"))
-		Expect(messaging.SubjectStagingProgressWildcard).To(Equal("staging.*.progress"))
-	})
-})
--- a/core/services/routing/piiadapter/openai_completion.go
+++ b/core/services/routing/piiadapter/openai_completion.go
@@ -44,7 +44,7 @@ func applyAnyText(v any, elem int, text string) any {
 	if elem < 0 {
 		return text
 	}
-	if arr, ok := v.([]any); ok && elem < len(arr) {
+	if arr, ok := v.([]any); ok && elem >= 0 && elem < len(arr) {
 		arr[elem] = text
 	}
 	return v
--- a/core/services/routing/piidetector/pattern.go
+++ b/core/services/routing/piidetector/pattern.go
@@ -39,9 +39,8 @@ type patternDetector struct {
 // When tracing is enabled it records a pattern_pii BackendTrace so the matches
 // (group, byte range, text) show in the Traces UI alongside NER detections.
 func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntity, error) {
-	tracing := d.appConfig != nil && d.appConfig.EnableTracing
 	var start time.Time
-	if tracing {
+	if d.appConfig != nil && d.appConfig.EnableTracing {
 		trace.InitBackendTracingIfEnabled(d.appConfig.TracingMaxItems, d.appConfig.TracingMaxBodyBytes)
 		start = time.Now()
 	}
@@ -51,12 +50,12 @@ func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntit
 	var traceEnts []backend.TokenEntity
 	for _, mt := range matches {
 		out = append(out, pii.NEREntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
-		if tracing {
+		if d.appConfig != nil && d.appConfig.EnableTracing {
 			traceEnts = append(traceEnts, backend.TokenEntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
 		}
 	}

-	if tracing {
+	if d.appConfig != nil && d.appConfig.EnableTracing {
 		trace.RecordBackendTrace(patternPIITrace(d.modelName, text, traceEnts, start))
 	}
 	return out, nil
--- a/core/services/routing/piipattern/grammar.go
+++ b/core/services/routing/piipattern/grammar.go
@@ -28,16 +28,10 @@ const (
 	// credential shape, small enough that the compiled program stays tiny.
 	MaxPatternLen = 256
 	// MaxQuantifier caps an explicit {n,m} upper bound. RE2 expands a bounded
-	// repeat into that many copies, so a large bound inflates the compiled
-	// program. Go's regexp/syntax independently rejects any bound above 1000
-	// at Parse time, so this cap MUST stay strictly below 1000 to be a live
-	// guard rather than dead code shadowed by the parser: a bound in
-	// (MaxQuantifier, 1000] reaches walk and is rejected here with an
-	// actionable error, while >1000 is caught earlier by Parse. 512 is far
-	// larger than any real credential token yet keeps the guard meaningful and
-	// is defence in depth should the stdlib cap ever rise. Unbounded {n,} (no
-	// upper) is a loop, not an expansion, and is allowed.
-	MaxQuantifier = 512
+	// repeat into that many copies, so an uncapped {0,1000000} would blow up
+	// the compiled program's memory. Unbounded {n,} (no upper) is a loop, not
+	// an expansion, and is allowed.
+	MaxQuantifier = 4096
 	// MaxAlternation caps the arms of a single `a|b|c` alternation.
 	MaxAlternation = 64
 	// MaxAST bounds recursion depth so a pathologically nested pattern can't
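
Review note on the grammar.go hunk above: the removed comment states that Go's `regexp/syntax` rejects any repeat bound above 1000 at parse time. If that holds for the toolchain in use, a `MaxQuantifier` of 4096 is never the binding limit. The sketch below probes the parser directly. `compiles` is a hypothetical helper, and the character class is borrowed from the removed specs.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts a {n} repeat bound
// on a simple character class.
func compiles(bound int) bool {
	_, err := regexp.Compile(fmt.Sprintf(`[A-Za-z0-9]{%d}`, bound))
	return err == nil
}

func main() {
	for _, b := range []int{512, 1001, 4096} {
		fmt.Printf("{%d} compiles: %v\n", b, compiles(b))
	}
}
```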
--- a/core/services/routing/piipattern/piipattern_test.go
+++ b/core/services/routing/piipattern/piipattern_test.go
@@ -1,7 +1,6 @@
 package piipattern

 import (
-	"fmt"
 	"strings"
 	"testing"

@@ -37,45 +36,6 @@ var _ = Describe("ValidatePattern", func() {
 	)
 })

-var _ = Describe("MaxQuantifier guard (must stay live, not dead code)", func() {
-	// Go's regexp/syntax hard-caps repeat bounds at 1000 and rejects anything
-	// larger at Parse time, before walk() runs. So the walk() {n,m} guard only
-	// fires for bounds in (MaxQuantifier, 1000]; if MaxQuantifier ever creeps
-	// to >= 1000 the guard becomes unreachable dead code. These specs pin the
-	// relationship and prove the guard is the binding constraint in that band.
-	const stdlibRepeatCap = 1000
-
-	It("is strictly below the stdlib repeat cap so the guard is reachable", func() {
-		Expect(MaxQuantifier).To(BeNumerically("<", stdlibRepeatCap),
-			"MaxQuantifier must be < %d or walk()'s {n,m} guard is dead code (Parse rejects larger bounds first)", stdlibRepeatCap)
-	})
-
-	It("accepts a bound at exactly MaxQuantifier", func() {
-		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier))).To(Succeed())
-	})
-
-	It("rejects a bound just above MaxQuantifier with our actionable error (proves the guard runs)", func() {
-		// MaxQuantifier+1 is still parseable (<= stdlib cap), so it reaches
-		// walk(), where our guard — not the parser — rejects it.
-		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier+1))
-		Expect(err).To(HaveOccurred())
-		Expect(err.Error()).To(ContainSubstring("bound is too large"),
-			"a bound in (MaxQuantifier, stdlib cap] must be rejected by walk(), not the parser")
-	})
-
-	It("rejects an unbounded {n,} whose lower bound exceeds MaxQuantifier", func() {
-		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d,}`, MaxQuantifier+1))
-		Expect(err).To(HaveOccurred())
-		Expect(err.Error()).To(ContainSubstring("bound is too large"))
-	})
-
-	It("still fails closed above the stdlib cap (Parse rejects before walk)", func() {
-		// >1000: caught by syntax.Parse; the message is the parser's, but it
-		// still fails closed — defence in depth.
-		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, stdlibRepeatCap+1))).NotTo(Succeed())
-	})
-})
-
 var _ = Describe("Compile", func() {
 	It("compiles a valid pattern with leftmost-longest semantics", func() {
 		re, err := Compile(`sk-ant-[A-Za-z0-9_-]{4,}`)
--- a/docs/content/features/distributed-mode.md
+++ b/docs/content/features/distributed-mode.md
@@ -311,7 +311,7 @@ Used by the WebUI and admin API consumers. Requires admin authentication.
 | `POST` | `/api/nodes/:id/models/unload` | Unload a model from a worker |
 | `POST` | `/api/nodes/:id/models/delete` | Delete model files from a worker |

-The **Nodes** page in the React WebUI provides a visual overview of all registered workers, their statuses, and loaded models. The page opens with a one-line **cluster pulse** summarising node health and an **attention callout** that surfaces nodes needing action (for example pending approvals). Below that, a roster of **node panels** lists each worker with its inline model chips (no expand click needed), filtered by an **All / Backend / Agent** segmented control. Selecting a panel opens a dedicated **node detail page** at `/app/nodes/:id` with per-node metrics, models, and backend actions. Model scheduling lives on its own **Scheduling** page (separate nav item), not as a tab on the Nodes page.
+The **Nodes** page in the React WebUI provides a visual overview of all registered workers, their statuses, and loaded models.

 ## Node Approval

@@ -554,7 +554,7 @@ local-ai worker \

 ## Model Scheduling

-Model scheduling controls where models are placed and how many replicas are maintained. In the React WebUI it has its own **Scheduling** page (a top-level nav item, separate from the Nodes page). It combines two optional features:
+Model scheduling controls where models are placed and how many replicas are maintained. It combines two optional features:

 ### Node Selectors

--- a/docs/content/features/face-recognition.md
+++ b/docs/content/features/face-recognition.md
@@ -7,16 +7,93 @@ url = "/features/face-recognition/"

 ![Face recognition: 1:N match against a vector store, with an anti-spoofing liveness gate that can veto a verification](/images/diagrams/face-recognition-flow.png)

-LocalAI supports face recognition through the `insightface` backend:
-face verification (1:1), face identification (1:N) against a built-in
-vector store, face embedding, face detection, demographic analysis
-(age / gender), and antispoofing / liveness detection.
+LocalAI supports face recognition: face verification (1:1), face
+identification (1:N) against a built-in vector store, face embedding,
+face detection, demographic analysis (age / gender), and antispoofing /
+liveness detection.

-The backend ships **two interchangeable engines** under one image, each
-paired with a distinct gallery entry so users can pick by license and
-accuracy needs.
+The same `/v1/face/*` HTTP API is served by two backends:

-## Licensing — read this first
+- **`face-detect` (recommended, default).** A standalone C++/ggml
+  engine ([face-detect.cpp](https://github.com/mudler/face-detect.cpp)):
+  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
+  single self-describing GGUF. This is the recommended option for new
+  deployments.
+- **`insightface` (Python).** The original ONNX Runtime backend. Still
+  supported; see [the Python backend](#insightface-python-backend) below.
+
+Both backends expose the identical wire format, so the API examples on
+this page work with either - only the gallery entry name (the `model`
+field) changes.
+
+## face-detect (ggml) backend
+
+The `face-detect` backend reads the detector and recognizer architecture
+(`facedetect.arch`) directly from the GGUF metadata, so installing a
+gallery entry is all that is needed to select an engine. It drives the
+Embedding / Detect / FaceVerify / FaceAnalyze gRPC rpcs behind the
+`/v1/face/{embed,verify,analyze,detect,register,identify,forget}`
+endpoints.
+
+### Licensing - read this first
+
+| Gallery entry | Detector + recognizer | Embedding dim | License |
+|---|---|---|---|
+| `face-detect-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** (upstream insightface weights) |
+| `face-detect-buffalo-m` | SCRFD-2.5GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** |
+| `face-detect-buffalo-s` | SCRFD-500MF + MBF + GenderAge | 512 | **Non-commercial research only** |
+| `face-detect-yunet-sface` | YuNet + SFace (OpenCV Zoo) | 128 | **Apache 2.0 - commercial-safe** |
+
+The insightface buffalo packs (buffalo_l / buffalo_m / buffalo_s) are
+released by the upstream maintainers for **non-commercial research use
+only**. Pick the `face-detect-yunet-sface` entry for production /
+commercial deployments.
+
+### Quickstart
+
+Install the commercial-safe entry (recommended for copy-paste):
+
+```bash
+local-ai models install face-detect-yunet-sface
+```
+
+Verify that two images depict the same person:
+
+```bash
+curl -sX POST http://localhost:8080/v1/face/verify \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "face-detect-yunet-sface",
+    "img1": "https://example.com/alice_1.jpg",
+    "img2": "https://example.com/alice_2.jpg"
+  }'
+```
+
+Detect faces and analyze demographics (buffalo entries populate
+age / gender; YuNet + SFace returns regions only):
+
+```bash
+curl -sX POST http://localhost:8080/v1/face/detect \
+  -H "Content-Type: application/json" \
+  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/group.jpg"}'
+
+curl -sX POST http://localhost:8080/v1/face/analyze \
+  -H "Content-Type: application/json" \
+  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/alice.jpg"}'
+```
+
+The 1:N register / identify / forget workflow and the rest of the API
+are identical to the [API reference](#api-reference) below - just pass a
+`face-detect-*` model name. The per-engine verify thresholds are ~0.35
+for the buffalo ArcFace/MBF recognizers and ~0.363 for SFace.
+
+## insightface (Python) backend
+
+The `insightface` backend ships **two interchangeable engines** under
+one image, each paired with a distinct gallery entry so users can pick
+by license and accuracy needs.
+
+### Licensing - read this first

 | Gallery entry | Detector + recognizer | Size | License |
 |---|---|---|---|
--- a/docs/content/features/voice-recognition.md
+++ b/docs/content/features/voice-recognition.md
@@ -7,16 +7,92 @@ url = "/features/voice-recognition/"

 ![Voice recognition: register, identify, and forget voiceprints in a vector store, for 1:1 verify or 1:N identify](/images/diagrams/voice-recognition-flow.png)

-LocalAI supports voice (speaker) recognition through the
-`speaker-recognition` backend: speaker verification (1:1), speaker
-identification (1:N) against a built-in vector store, speaker
-embedding, and demographic analysis (age / gender / emotion from
-voice).
+LocalAI supports voice (speaker) recognition: speaker verification
+(1:1), speaker identification (1:N) against a built-in vector store,
+speaker embedding, and demographic analysis (age / gender / emotion
+from voice).

 The audio analog to [Face Recognition](/features/face-recognition/),
-following the same two-engine pattern under one image.
+it is served over one `/v1/voice/*` HTTP API by two backends:

-## Engines
+- **`voice-detect` (recommended, default).** A standalone C++/ggml
+  engine ([voice-detect.cpp](https://github.com/mudler/voice-detect.cpp)):
+  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
+  single self-describing GGUF. This is the recommended option for new
+  deployments.
+- **`speaker-recognition` (Python).** The original SpeechBrain / ONNX
+  backend. Still supported; see [the Python backend](#speaker-recognition-python-backend)
+  below.
+
+Both backends expose the identical wire format, so the API examples on
+this page work with either - only the gallery entry name (the `model`
+field) changes.
+
+## voice-detect (ggml) backend
+
+The `voice-detect` backend reads the embedding (or analysis)
+architecture (`voicedetect.arch`) directly from the GGUF metadata, so
+installing a gallery entry is all that is needed to select an engine. It
+drives the VoiceEmbed / VoiceVerify / VoiceAnalyze gRPC rpcs behind the
+`/v1/voice/{embed,verify,analyze,register,identify,forget}` endpoints.
+
+### Gallery entries
+
+| Gallery entry | Model | Embedding dim | License |
+|---|---|---|---|
+| `voice-detect-ecapa-tdnn` | SpeechBrain ECAPA-TDNN (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
+| `voice-detect-wespeaker-resnet34` | WeSpeaker ResNet34 (VoxCeleb) | 256 | CC-BY-4.0 |
+| `voice-detect-eres2net` | 3D-Speaker ERes2Net (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
+| `voice-detect-campplus` | 3D-Speaker CAM++ (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
+| `voice-detect-emotion-wav2vec2` | audEERING wav2vec2 (age / gender / emotion) | analyze head | **CC-BY-NC-SA-4.0 - non-commercial** |
+
+The four speaker-embedding entries drive verify / embed / identify.
+`voice-detect-emotion-wav2vec2` is the analysis head behind
+`/v1/voice/analyze` (continuous age estimate plus gender and emotion
+class scores) and is **non-commercial / research use only**.
+
+### Quickstart
+
+Install the default entry (recommended for copy-paste):
+
+```bash
+local-ai models install voice-detect-ecapa-tdnn
+```
+
+Verify that two audio clips were spoken by the same person:
+
+```bash
+curl -sX POST http://localhost:8080/v1/voice/verify \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "voice-detect-ecapa-tdnn",
+    "audio1": "https://example.com/alice_1.wav",
+    "audio2": "https://example.com/alice_2.wav"
+  }'
+```
+
+Analyze age / gender / emotion (install the analyze entry first):
+
+```bash
+local-ai models install voice-detect-emotion-wav2vec2
+
+curl -sX POST http://localhost:8080/v1/voice/analyze \
+  -H "Content-Type: application/json" \
+  -d '{"model": "voice-detect-emotion-wav2vec2", "audio": "https://example.com/alice.wav"}'
+```
+
+The 1:N register / identify / forget workflow and the rest of the API
+are identical to the [API reference](#api-reference) below - just pass a
+`voice-detect-*` model name. The default verify threshold is ~0.25 for
+the ECAPA-TDNN / ERes2Net / CAM++ recognizers and ~0.30 for WeSpeaker
+ResNet34.
+
+## speaker-recognition (Python) backend
+
+The `speaker-recognition` backend follows the same two-engine pattern
+under one image as the `insightface` face backend.
+
+### Engines

 | Gallery entry | Model | Size | License |
 |---|---|---|---|
--- a/docs/content/getting-started/models.md
+++ b/docs/content/getting-started/models.md
@@ -131,10 +131,6 @@ local-ai run ollama://gemma:2b
 local-ai run oci://localai/phi-2:latest
 ```

-{{% notice note %}}
-When pulling models from Ollama or OCI registries, LocalAI identifies itself with a `LocalAI/<version>` `User-Agent` header so registry operators can attribute usage to LocalAI.
-{{% /notice %}}
-
 ### Run Models via URI

 To run models via URI, specify a URI to a model file or a configuration file when starting LocalAI. Valid syntax includes:
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -97,6 +97,8 @@ All backends listed here can be installed on demand from the [Backend Gallery]({
 | [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
 | [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
 | [sam3.cpp](https://github.com/PABannier/sam3.cpp) | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
+| [face-detect.cpp](https://github.com/mudler/face-detect.cpp) | Native face detection, recognition, embedding, demographics and anti-spoofing (SCRFD/ArcFace, YuNet/SFace) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
+| [voice-detect.cpp](https://github.com/mudler/voice-detect.cpp) | Native speaker (voice) recognition and voice analysis (ECAPA-TDNN, WeSpeaker, ERes2Net, CAM++, wav2vec2) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
 | [insightface](https://github.com/deepinsight/insightface) | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
 | [speaker-recognition](https://speechbrain.github.io/) | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |

--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1,142 +1,4 @@
 ---
- name: "qwythos-9b-claude-mythos-5-1m"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
-  description: |
-    # Qwythos-9B
-
-    **Developed by Empero**
-
-    **Qwythos-9B** is a full-parameter reasoning model built on top of a **deeply uncensored Qwen3.5-9B base** and post-trained on **over 500 million tokens** of high-quality Claude Mythos and Claude Fable traces, with chain-of-thought generated in-house by Empero AI's internal tool **rethink**.
-
-    The result is a compact, fast, **dramatically more capable** 9B reasoning model. Headline capabilities:
-
-    ...
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - vision
-    - multimodal
-    - reasoning
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    mmproj: llama-cpp/mmproj/Qwythos-9B-Claude-Mythos-5-1M-GGUF/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
-    options:
-      - use_jinja:true
-      - spec_type:draft-mtp
-      - spec_n_max:6
-      - spec_p_min:0.75
-    parameters:
-      model: llama-cpp/models/Qwythos-9B-Claude-Mythos-5-1M-GGUF/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/Qwythos-9B-Claude-Mythos-5-1M-GGUF/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
-      sha256: 24ee22e0f5d9f0d3d615809607f365c728d9b0c3f3fb6eb19d8bd83a1c2933d8
-      uri: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF/resolve/main/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
-    - filename: llama-cpp/mmproj/Qwythos-9B-Claude-Mythos-5-1M-GGUF/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
-      sha256: f70dc3509053962b0d0d3ee8a7eacebf5d60aa560cad78254ae8698516ae029f
-      uri: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF/resolve/main/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
- name: "glm-5.2"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/unsloth/GLM-5.2-GGUF
-  description: |
-    # GLM-5.2
-
-    👋 Join our WeChat or Discord community.
-
-    📖 Check out the GLM-5.2 blog and GLM-5 Technical report.
-
-    📍 Use GLM-5.2 API services on Z.ai API Platform.
-
-    🔜 Try GLM-5.2 here.
-
-    [Paper]
-    [GitHub]
-
-    ## Introduction
-
-    We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a **solid 1M-token context**. GLM-5.2's new capabilities include:
-      - **Solid 1M Context:** A solid 1M-token context that stably sustains long-horizon work
-      - **Advanced Coding with Flexible Effort**: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
-      - **Improved Architecture**: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
-      - **Pure Open**: An MIT open-source license — no regional limits, technical access without borders
-
-    ## Benchmark
-
-    ## Serve GLM-5.2 Locally
-
-    ...
-  license: "mit"
-  tags:
-    - llm
-    - gguf
-  icon: https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/bench_52.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    options:
-      - use_jinja:true
-      - spec_type:draft-mtp
-      - spec_n_max:6
-      - spec_p_min:0.75
-    parameters:
-      min_p: 0.01
-      model: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
-      repeat_penalty: 1
-      temperature: 1
-      top_k: -1
-      top_p: 0.95
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
-      sha256: 3256ac8c290273f0965ff39e93a8bcd07dc99bcd23e923bd4b7306ef39061038
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00002-of-00011.gguf
-      sha256: 1020105e78d862988a6cabb3a78eafa75f29666ab8a5fd10de1b9b8c8a6bc5e8
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00002-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00003-of-00011.gguf
-      sha256: 0b36f406e120759290894ea4960d5086f9b362a8c8f9c7fcaad24b4471172efb
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00003-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00004-of-00011.gguf
-      sha256: 04b19199f52ba29e7f9966b15df3fbc2d1e5c56cd6343c405076be7174d49d32
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00004-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00005-of-00011.gguf
-      sha256: 5cb76d724ee16e80c1cb6aba29aacd76161e7a6f147079be3447501c06d95f2c
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00005-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00006-of-00011.gguf
-      sha256: ec2c65255c834b686f066e350bc5b8d8a7020cd1133f0ee9e819d2fb5d3afad0
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00006-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00007-of-00011.gguf
-      sha256: 53c8328852ca0b6791a9a9243bcc56157305adca8526a646054389845e7445a9
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00007-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00008-of-00011.gguf
-      sha256: 9a23bfb21c5f6fcc94b0329c108ec1ef3fdbd815c57eeb0bf105d26861d7271e
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00008-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00009-of-00011.gguf
-      sha256: 71088054fb1a09a4f38e2ee8a726526790660a4f77ead817f75cb7a484bdb0b8
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00009-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00010-of-00011.gguf
-      sha256: 848db99658faf24971df23638281305a15bdc187cbcaed968952ed9e9c835b50
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00010-of-00011.gguf
-    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00011-of-00011.gguf
-      sha256: 629e23bce250fb500d9a190de7249c2882af524aacc112ce507a871ed5bebf90
-      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00011-of-00011.gguf
 - name: "qwen3.6-35b-a3b-nvfp4-mtp"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
@@ -1252,98 +1114,6 @@
    - filename: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
      sha256: 01b76572f80b7d2ebee80a27cb9c3699c26b04cae1c402eee7664fc17a4b5ce6
      uri: https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf
- name: "privacy-filter-nemotron"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
-  urls:
-    - https://huggingface.co/OpenMed/privacy-filter-nemotron
-    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
-  description: |
-    A fine-grained English PII token-classification model: a fine-tune of
-    openai/privacy-filter by OpenMed on NVIDIA's Nemotron-PII dataset. It labels
-    every token with a BIOES tag over 55 PII categories (221 classes), trading
-    the multilingual sibling's language breadth for category depth - identity,
-    contact, address, dates, government IDs, financial, healthcare, enterprise,
-    vehicle and digital entities (including api_key, ipv4/ipv6 and mac_address).
-    For multilingual text prefer privacy-filter-multilingual instead.
-
-    In LocalAI this is a PII detector for the NER redactor tier: set
-    known_usecases to [token_classify] (as below), and any model opts into
-    redaction by listing this one under pii.detectors. The detection policy
-    (which categories to mask vs block, and the score threshold) lives on this
-    model's own pii_detection block - see the overrides below. It runs locally
-    with no Python, served by the standalone privacy-filter backend's
-    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
-    entity spans).
-
-    Architecture: gpt-oss-style sparse MoE (8 layers, d_model 640, 128 experts
-    top-4, ~1.5B total / ~50M active per token), bidirectional banded attention,
-    o200k tokenizer and a 221-way token-classification head; served via the
-    openai-privacy-filter architecture. F16, ~2.8 GB. (A smaller Q8_0 quant
-    exists on the GGUF repo for RAM-constrained use - validate it on your own
-    data, since for PII a single dropped span is a leak.)
-  license: apache-2.0
-  tags:
-    - token-classification
-    - ner
-    - pii
-    - privacy
-    - nemotron
-    - gguf
-  overrides:
-    backend: privacy-filter
-    embeddings: true
-    known_usecases:
-      - token_classify
-    parameters:
-      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
-    pii_detection:
-      min_score: 0.5
-      default_action: mask
-  files:
-    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
-      sha256: 70dfe91ff220ff04594168a83e296dcc2054449cde77f98d0e782edbb6a31f5a
-      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-f16.gguf
- name: "privacy-filter-nemotron-q8"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
-  urls:
-    - https://huggingface.co/OpenMed/privacy-filter-nemotron
-    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
-  description: |
-    Q8_0 quant of privacy-filter-nemotron (~1.64 GB, vs ~2.8 GB for F16) for
-    RAM-constrained / edge use (e.g. a 4 GB Raspberry Pi 5). The MoE expert
-    weights are stored 8-bit; attention, embeddings and the classifier head
-    stay F16. Same model, policy and runtime as the F16 entry - see
-    privacy-filter-nemotron for the full description.
-
-    Prefer the F16 entry when you can afford it: it is the reference artifact.
-    On a mixed-PII document the publisher measured q8 matching F16 on 99.93% of
-    token labels with an identical span set at threshold 0.5 - but one token
-    flipped, and for PII a single dropped span is a leak. Treat q8 as a
-    deliberate size/speed tradeoff and validate it on your own data.
-  license: apache-2.0
-  tags:
-    - token-classification
-    - ner
-    - pii
-    - privacy
-    - nemotron
-    - gguf
-  overrides:
-    backend: privacy-filter
-    embeddings: true
-    known_usecases:
-      - token_classify
-    parameters:
-      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
-    pii_detection:
-      min_score: 0.5
-      default_action: mask
-  files:
-    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
-      sha256: 2ec11c154e572a2686f4d77e861b7f74e6917e09638fe9bd27156d48bd99e21a
-      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-q8.gguf
 - name: "secret-filter"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  description: |
@@ -8679,6 +8449,248 @@
    - filename: MiniFASNetV1SE.onnx
      sha256: ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676
      uri: https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx
+- name: face-detect-buffalo-l
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF
+    detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and
+    shipped as a single GGUF for the `face-detect` backend. Highest
+    accuracy of the buffalo line.
+
+    No Python / onnxruntime / torch runtime: face-detect.cpp reads the
+    detector and embedder architecture (`facedetect.arch`) directly from
+    the GGUF metadata, so installing this entry is all that is needed to
+    select buffalo_l. Drives the Embedding / Detect / FaceVerify /
+    FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect}
+    REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof
+    ensemble, available via the FaceVerify `anti_spoof` request flag.
+    NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see
+    `face-detect-yunet-sface`.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-l.gguf
+  files:
+    - filename: face-detect-buffalo-l.gguf
+      sha256: 6ed070f6e569beeed542ddd5603bcbc9eb8ea57f728f7d8013d6a90b2b952116
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_l.gguf
+- name: face-detect-buffalo-m
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF
+    detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF
+    for the `face-detect` backend. Same recognition accuracy as
+    `buffalo_l` with a cheaper detector: a good balance on mid-range
+    hardware.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_m engine. This GGUF also
+    embeds the MiniFASNet anti-spoof ensemble, available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-m.gguf
+  files:
+    - filename: face-detect-buffalo-m.gguf
+      sha256: 0f7527eeb97b88719bf7e11e43ab8af6f05999357d767f8dde53db3c586c1c3f
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_m.gguf
+- name: face-detect-buffalo-s
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF
+    detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the
+    `face-detect` backend. Small and CPU-friendly: a good fit for
+    mid-range and edge deployments.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_s engine. This GGUF also
+    embeds the MiniFASNet anti-spoof ensemble, available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-s.gguf
+  files:
+    - filename: face-detect-buffalo-s.gguf
+      sha256: 7490b1efbc8746b188a5aef0adf5e3d1a2dc9607abd474018893f95571999969
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_s.gguf
+- name: face-detect-buffalo-sc
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `buffalo_sc` pack (SCRFD-500M
+    detector + a small ArcFace embedder), converted to a C++/ggml GGUF
+    for the `face-detect` backend. This is the smallest insightface
+    pack: the lightest option for low-resource and edge deployments.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the buffalo_sc engine. If this GGUF
+    embeds the MiniFASNet anti-spoof ensemble, it is available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-buffalo-sc.gguf
+  files:
+    - filename: face-detect-buffalo-sc.gguf
+      sha256: f754c0e32d5efbbc53d7efca13be2807676bf5db20a8594ef96b32afa2c482b1
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_sc.gguf
+- name: face-detect-antelopev2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/deepinsight/insightface
+  description: |
+    Face recognition with insightface's `antelopev2` pack (SCRFD-10G
+    detector + ArcFace glint360k R100, 512-d embedder), converted to a
+    C++/ggml GGUF for the `face-detect` backend. The higher-accuracy
+    insightface pack: heavier, but the best fit when recognition
+    quality matters more than speed.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the antelopev2 engine. If this GGUF
+    embeds the MiniFASNet anti-spoof ensemble, it is available via the
+    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
+    ONLY.
+  license: insightface-non-commercial
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - research-only
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.35
+    parameters:
+      model: face-detect-antelopev2.gguf
+  files:
+    - filename: face-detect-antelopev2.gguf
+      sha256: 245e657e51754fbf075dd43d80a80a2d14a60c2fc42a3220f63eef17a315e96c
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/antelopev2.gguf
+- name: face-detect-yunet-sface
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/face-detect.cpp
+    - https://github.com/opencv/opencv_zoo
+  description: |
+    Face recognition with OpenCV Zoo weights: YuNet detector + SFace
+    128-d recognizer, converted to a C++/ggml GGUF for the `face-detect`
+    backend. APACHE 2.0: safe for commercial use. Lower accuracy than the
+    buffalo packs and no demographic head, but the commercial-friendly
+    alternative to the insightface buffalo line.
+
+    The architecture (`facedetect.arch`) is read from the GGUF metadata,
+    so this entry alone selects the YuNet + SFace engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - commercial-ok
+    - gpu
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: face-detect
+    known_usecases:
+      - face_recognition
+      - detection
+      - embeddings
+    options:
+      - verify_threshold:0.363
+    parameters:
+      model: face-detect-yunet-sface.gguf
+  files:
+    - filename: face-detect-yunet-sface.gguf
+      sha256: 9ce78d4ba0ae9d5e8c91a0e145d511558d1d90f5d9c1f4131cca9bb4bce60902
+      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/yunet-sface.gguf
 - name: speechbrain-ecapa-tdnn
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
@@ -8748,6 +8760,217 @@
    - filename: wespeaker_voxceleb_resnet34.onnx
      sha256: 7bb2f06e9df17cdf1ef14ee8a15ab08ed28e8d0ef5054ee135741560df2ec068
      uri: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM/resolve/main/voxceleb_resnet34_LM.onnx
+- name: voice-detect-ecapa-tdnn
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
+  description: |
+    Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained
+    on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the
+    `voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal
+    Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe.
+
+    No Python / torch runtime: voice-detect.cpp reads the embedding
+    architecture (`voicedetect.arch`) directly from the GGUF metadata,
+    so installing this entry is all that is needed to select ECAPA-TDNN.
+    Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the
+    /v1/voice/{verify,embed,register,identify,forget} REST endpoints.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-ecapa-tdnn-voxceleb.gguf
+  files:
+    - filename: voice-detect-ecapa-tdnn-voxceleb.gguf
+      sha256: 68046a1fdfb7843f460962db4739fbd381cc5c3ab93d1505e75e2f4c0dc19b8f
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/ecapa-tdnn-voxceleb.gguf
+- name: voice-detect-wespeaker-resnet34
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://github.com/wenet-e2e/wespeaker
+  description: |
+    Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d
+    embeddings, CPU-friendly and runtime-free (no onnxruntime or torch).
+    CC-BY-4.0.
+
+    Use when you want WeSpeaker's ResNet34 topology instead of
+    ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read
+    from the GGUF metadata, so this entry alone selects the engine.
+  license: cc-by-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-wespeaker-resnet34.gguf
+  files:
+    - filename: voice-detect-wespeaker-resnet34.gguf
+      sha256: 72040372494eafec299836bc1977cfc13c603cb486674ed59b0f4c03758d29da
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/wespeaker-resnet34-voxceleb.gguf
+- name: voice-detect-eres2net
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/iic/speech_eres2net_sv_en_voxceleb_16k
+  description: |
+    Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend.
+    192-d embeddings with strong verification accuracy. APACHE 2.0.
+
+    The embedding architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the ERes2Net engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-eres2net.gguf
+  files:
+    - filename: voice-detect-eres2net.gguf
+      sha256: d39f53c7a4d39734740a86a07521b9a819ee8ea56c1a9436eba611ab733a3d06
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/eres2net-base-zh-cn.gguf
+- name: voice-detect-campplus
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/iic/speech_campplus_sv_en_voxceleb_16k
+  description: |
+    Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb,
+    converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d
+    embeddings, a fast context-aware masking topology well-suited to
+    CPU and edge deployments. APACHE 2.0.
+
+    The embedding architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the CAM++ engine.
+  license: apache-2.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - speaker-verification
+    - speaker-embedding
+    - commercial-ok
+    - edge
+    - cpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    options:
+      - verify_threshold:0.25
+    parameters:
+      model: voice-detect-campplus.gguf
+  files:
+    - filename: voice-detect-campplus.gguf
+      sha256: a6e34c6d230cff26e37b71a2df0907fde1de425654e28d9d5cacca32e02a13d3
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/campplus-zh-cn.gguf
+- name: voice-detect-emotion-wav2vec2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://github.com/mudler/voice-detect.cpp
+    - https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
+  description: |
+    Voice analysis (age / gender / emotion) with audEERING's wav2vec2
+    model, converted to a C++/ggml GGUF for the `voice-detect` backend.
+    Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST
+    endpoint, returning a continuous age estimate plus gender and
+    emotion class scores for a single utterance. CC-BY-NC-SA-4.0 -
+    research / non-commercial use only.
+
+    The analysis architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
+  license: cc-by-nc-sa-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - voice-analysis
+    - emotion-recognition
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    parameters:
+      model: voice-detect-emotion-wav2vec2.gguf
+  files:
+    - filename: voice-detect-emotion-wav2vec2.gguf
+      sha256: 9e9793e4f77a27f4ae068bcb29c2b6fe2f74881799e2cfea0f8e436ad3765e50
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/emotion-wav2vec2-superb-er.gguf
+- name: voice-detect-age-gender-wav2vec2
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://huggingface.co/audeering/wav2vec2-large-robust-24-ft-age-gender
+    - https://github.com/mudler/voice-detect.cpp
+  description: |
+    wav2vec2-large-robust age + gender analysis head
+    (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a
+    C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze
+    gRPC rpc and the /v1/voice/analyze REST endpoint, returning a
+    continuous age estimate plus gender class scores for a single
+    utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only.
+
+    The analysis architecture (`voicedetect.arch`) is read from the
+    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
+  license: cc-by-nc-sa-4.0
+  icon: https://avatars.githubusercontent.com/u/95302084
+  tags:
+    - voice-recognition
+    - voice-analysis
+    - research-only
+    - cpu
+    - gpu
+  last_checked: "2026-06-22"
+  overrides:
+    backend: voice-detect
+    known_usecases:
+      - speaker_recognition
+    parameters:
+      model: voice-detect-age-gender-wav2vec2.gguf
+  files:
+    - filename: voice-detect-age-gender-wav2vec2.gguf
+      sha256: d92486b3f1ea7baf6a90f1026b7b8e9848b3a8332bccfb01cc8889eed7069064
+      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/age-gender-wav2vec2-audeering.gguf
 - name: rfdetr-base
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
--- a/pkg/oci/blob.go
+++ b/pkg/oci/blob.go
@@ -11,8 +11,6 @@ import (

 	oras "oras.land/oras-go/v2"
 	"oras.land/oras-go/v2/registry/remote"
-	"oras.land/oras-go/v2/registry/remote/auth"
-	"oras.land/oras-go/v2/registry/remote/retry"
 )

 func FetchImageBlob(ctx context.Context, r, reference, dst string, statusReader func(ocispec.Descriptor) io.Writer) error {
@@ -30,16 +28,6 @@ func FetchImageBlob(ctx context.Context, r, reference, dst string, statusReader
 	}
 	repo.SkipReferrersGC = true

-	// Identify LocalAI to the registry. This mirrors oras' auth.DefaultClient
-	// (same retry policy) but advertises a LocalAI User-Agent instead of the
-	// library default.
-	client := &auth.Client{
-		Client: retry.DefaultClient,
-		Cache:  auth.NewCache(),
-	}
-	client.SetUserAgent(UserAgent())
-	repo.Client = client
-
 	// https://github.com/oras-project/oras/blob/main/cmd/oras/internal/option/remote.go#L364
 	// https://github.com/oras-project/oras/blob/main/cmd/oras/root/blob/fetch.go#L136
 	desc, reader, err := oras.Fetch(ctx, repo.Blobs(), reference, oras.DefaultFetchOptions)
--- a/pkg/oci/image.go
+++ b/pkg/oci/image.go
@@ -176,7 +176,6 @@ func GetImage(targetImage, targetPlatform string, auth *registrytypes.AuthConfig
 	opts := []remote.Option{
 		remote.WithTransport(tr),
 		remote.WithPlatform(*platform),
-		remote.WithUserAgent(UserAgent()),
 	}
 	if auth != nil {
 		opts = append(opts, remote.WithAuth(staticAuth{auth}))
@@ -224,7 +223,6 @@ func GetImageDigest(targetImage, targetPlatform string, auth *registrytypes.Auth
 	opts := []remote.Option{
 		remote.WithTransport(tr),
 		remote.WithPlatform(*platform),
-		remote.WithUserAgent(UserAgent()),
 	}
 	if auth != nil {
 		opts = append(opts, remote.WithAuth(staticAuth{auth}))
--- a/pkg/oci/ollama.go
+++ b/pkg/oci/ollama.go
@@ -47,7 +47,6 @@ func OllamaModelManifest(image string) (*Manifest, error) {
 		return nil, err
 	}
 	req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")
-	req.Header.Set("User-Agent", UserAgent())
 	client := httpclient.New(httpclient.WithFollowRedirects())
 	resp, err := client.Do(req)
 	if err != nil {
--- a/pkg/oci/useragent.go
+++ b/pkg/oci/useragent.go
@@ -1,19 +0,0 @@
-package oci
-
-import (
-	"fmt"
-
-	"github.com/mudler/LocalAI/internal"
-)
-
-// UserAgent returns the User-Agent string LocalAI sends on outbound registry
-// requests (OCI registries and Ollama). It identifies the client as LocalAI
-// and, when the binary was built with a version stamp, appends it so registries
-// can attribute client-side usage to LocalAI rather than to the generic
-// User-Agent of the underlying transport library.
-func UserAgent() string {
-	if internal.Version == "" {
-		return "LocalAI"
-	}
-	return fmt.Sprintf("LocalAI/%s", internal.Version)
-}
--- a/pkg/oci/useragent_test.go
+++ b/pkg/oci/useragent_test.go
@@ -1,32 +0,0 @@
-package oci_test
-
-import (
-	"github.com/mudler/LocalAI/internal"
-	. "github.com/mudler/LocalAI/pkg/oci"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-var _ = Describe("OCI", func() {
-	Context("UserAgent", func() {
-		var savedVersion string
-
-		BeforeEach(func() {
-			savedVersion = internal.Version
-		})
-
-		AfterEach(func() {
-			internal.Version = savedVersion
-		})
-
-		It("identifies as LocalAI when no version is stamped", func() {
-			internal.Version = ""
-			Expect(UserAgent()).To(Equal("LocalAI"))
-		})
-
-		It("appends the build version when one is stamped", func() {
-			internal.Version = "v3.2.1"
-			Expect(UserAgent()).To(Equal("LocalAI/v3.2.1"))
-		})
-	})
-})
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -1021,25 +1021,6 @@ const docTemplate = `{
                }
            }
        },
-        "/api/nodes/models": {
-            "get": {
-                "tags": [
-                    "Nodes"
-                ],
-                "summary": "List all loaded models cluster-wide",
-                "responses": {
-                    "200": {
-                        "description": "OK",
-                        "schema": {
-                            "type": "array",
-                            "items": {
-                                "$ref": "#/definitions/nodes.NodeModel"
-                            }
-                        }
-                    }
-                }
-            }
-        },
        "/api/nodes/{id}/max-replicas-per-model": {
            "put": {
                "tags": [
@@ -3773,52 +3754,6 @@ const docTemplate = `{
                }
            }
        },
-        "nodes.NodeModel": {
-            "type": "object",
-            "properties": {
-                "address": {
-                    "description": "gRPC address for this replica's backend process",
-                    "type": "string"
-                },
-                "backend_type": {
-                    "description": "e.g. \"llama-cpp\"; used by reconciler to replicate loads",
-                    "type": "string"
-                },
-                "created_at": {
-                    "type": "string"
-                },
-                "id": {
-                    "type": "string"
-                },
-                "in_flight": {
-                    "description": "number of active requests on this replica",
-                    "type": "integer"
-                },
-                "last_used": {
-                    "type": "string"
-                },
-                "loading_by": {
-                    "description": "frontend ID that triggered loading",
-                    "type": "string"
-                },
-                "model_name": {
-                    "type": "string"
-                },
-                "node_id": {
-                    "type": "string"
-                },
-                "replica_index": {
-                    "type": "integer"
-                },
-                "state": {
-                    "description": "loading, loaded, unloading, idle",
-                    "type": "string"
-                },
-                "updated_at": {
-                    "type": "string"
-                }
-            }
-        },
        "proto.MemoryUsageData": {
            "type": "object",
            "properties": {
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -1018,25 +1018,6 @@
                }
            }
        },
-        "/api/nodes/models": {
-            "get": {
-                "tags": [
-                    "Nodes"
-                ],
-                "summary": "List all loaded models cluster-wide",
-                "responses": {
-                    "200": {
-                        "description": "OK",
-                        "schema": {
-                            "type": "array",
-                            "items": {
-                                "$ref": "#/definitions/nodes.NodeModel"
-                            }
-                        }
-                    }
-                }
-            }
-        },
        "/api/nodes/{id}/max-replicas-per-model": {
            "put": {
                "tags": [
@@ -3770,52 +3751,6 @@
                }
            }
        },
-        "nodes.NodeModel": {
-            "type": "object",
-            "properties": {
-                "address": {
-                    "description": "gRPC address for this replica's backend process",
-                    "type": "string"
-                },
-                "backend_type": {
-                    "description": "e.g. \"llama-cpp\"; used by reconciler to replicate loads",
-                    "type": "string"
-                },
-                "created_at": {
-                    "type": "string"
-                },
-                "id": {
-                    "type": "string"
-                },
-                "in_flight": {
-                    "description": "number of active requests on this replica",
-                    "type": "integer"
-                },
-                "last_used": {
-                    "type": "string"
-                },
-                "loading_by": {
-                    "description": "frontend ID that triggered loading",
-                    "type": "string"
-                },
-                "model_name": {
-                    "type": "string"
-                },
-                "node_id": {
-                    "type": "string"
-                },
-                "replica_index": {
-                    "type": "integer"
-                },
-                "state": {
-                    "description": "loading, loaded, unloading, idle",
-                    "type": "string"
-                },
-                "updated_at": {
-                    "type": "string"
-                }
-            }
-        },
        "proto.MemoryUsageData": {
            "type": "object",
            "properties": {
Author	SHA1	Message	Date
Ettore Di Giacinto	c6170b875d	chore(recon): bump backend pins to round-2 CPU-optimized engines voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context, wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp -> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 18:27:59 +00:00
Ettore Di Giacinto	a9c7484986	chore(recon): bump backend pins to CPU-optimized engine commits voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads). Brings the CPU optimizations into the LocalAI backend builds. GGUF format and parity unchanged, so the published HF GGUFs remain valid. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 15:16:21 +00:00
Ettore Di Giacinto	e05dece93c	feat(recon): honor LocalAI per-model threads in voice/face-detect backends LocalAI spawns one backend process per model and serves requests concurrently, so the engines' own min(hardware_concurrency, 8) default can oversubscribe cores. Forward the per-model Threads value from the gRPC LoadModel options into the engine via VOICEDETECT_THREADS / FACEDETECT_THREADS (read at backend construction) before the capi load. A non-positive Threads is treated as unset, leaving the engine default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 13:32:02 +00:00
Ettore Di Giacinto	7c2a347e79	feat(gallery): add face-detect-buffalo-sc and antelopev2 packs Add gallery entries for two newly-published insightface face packs on the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k R100, 512-d). Both are non-commercial research-only. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 10:48:00 +00:00
Ettore Di Giacinto	6e0c491380	feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now embedded and re-uploaded under the same filenames/uris) and note the FaceVerify anti_spoof request flag in each description. Add a new voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion model. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 09:42:01 +00:00
Ettore Di Giacinto	2bcdfe2a68	chore(gallery): publish recon backend GGUF uris + sha256 Fill in the published HuggingFace GGUF uris and verified sha256 for the 9 recon gallery entries (voice-detect-* and face-detect-*), and remove the TODO publish markers. Correct the eres2net, campplus, and emotion-wav2vec2 uris to the actual published filenames. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 08:48:30 +00:00
Ettore Di Giacinto	b843f498ca	docs(recon): document voice-detect and face-detect ggml backends Document the new standalone C++/ggml biometric backends as the recommended/default option for face and voice recognition, keeping the existing Python insightface / speaker-recognition backends framed as the legacy path. - features/face-recognition.md: add a face-detect (ggml) backend section with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface Apache-2.0), licensing, and verify/detect/analyze quickstart. - features/voice-recognition.md: add a voice-detect (ggml) backend section with the gallery entries (ecapa-tdnn, wespeaker-resnet34, eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial analyze head) and quickstart. - reference/compatibility-table.md: add face-detect.cpp and voice-detect.cpp rows to the Vision, Detection & Recognition table. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 08:43:30 +00:00
Ettore Di Giacinto	46d7d59a82	fix(recon): voice-detect metal build branch + face-detect gallery usecases Add the missing metal BUILD_TYPE branch to the voice-detect Makefile forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the darwin metal CI artifact is built with the Metal backend instead of CPU-only. Expand the 4 face-detect gallery models' known_usecases to [face_recognition, detection, embeddings] to match the backend capabilities map and the mirrored insightface-buffalo entries, so auto-selection for /v1/detect and /embeddings works. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 07:24:54 +00:00
Ettore Di Giacinto	e3bca9a172	feat(face-detect): wire backend into index, gallery and build Register the face-detect.cpp face detection / embedding / verification / analysis backend (added in Face-INT-A) into LocalAI's distribution surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml recognition analogue): - backend/index.yaml: add the &facedetect meta-backend (capabilities platform map, no top-level uri to avoid the meta-backend gotcha) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/ metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants), 22 entries. Referential integrity audited: every alias target resolves. - gallery/index.yaml: add 4 model entries on backend face-detect - face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL) and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the commercial-friendly alternative). The detector/embedder architecture is read from GGUF metadata (facedetect.arch) at load; only the real verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF artifacts are not yet published: each files: entry points at the intended mudler/face-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes). - core/config/backend_capabilities.go: register face-detect in the backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze -> face_recognition), mirroring insightface. - .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring voice-detect. - .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via FACEDETECT_VERSION (pin 636a1963). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 07:13:45 +00:00
Ettore Di Giacinto	a19ab22186	fix(voice-detect): replace em dashes in net-new descriptions Project style forbids em/en dashes. Replace the three U+2014 chars introduced by the voice-detect gallery/index wiring with `-`/`:`. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 00:28:58 +00:00
Ettore Di Giacinto	91d08d88e6	feat(face-detect): add purego Go backend for face-detect.cpp Add the LocalAI Go backend that dlopens libfacedetect.so (the flat facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect backend. Implements the Face subset of the Backend gRPC service: - Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path -> L2-normalized ArcFace embedding. - Detect(DetectOptions): src -> detect_path_json -> Detection boxes (class_name "face", [x1,y1,x2,y2] -> x/y/w/h). - FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof -> verify_paths; best-effort img areas via detect. - FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face age + gender ("M"/"F" normalized to "Man"/"Woman"). The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib with ggml + vendored libjpeg-turbo static (PIC), so the .so is ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_ symbols). Gated Ginkgo e2e mirrors voice-detect. Note for the gallery-wiring task: backend registration (index.yaml, gallery, core/config/backend_capabilities.go) is intentionally not touched here. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 00:26:15 +00:00
Ettore Di Giacinto	2c5ed413cb	feat(voice-detect): wire backend into index, gallery and build Register the voice-detect.cpp speaker-recognition + voice-analysis backend (added in Voice-INT-A) into LocalAI's distribution surfaces, mirroring the ced backend (the closest mudler C++/ggml audio analogue): - backend/index.yaml: add the &voicedetect meta-backend (capabilities platform map, no top-level uri) plus the full set of concrete per-arch image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the -development variants). Referential integrity audited - every alias target resolves. - gallery/index.yaml: add 5 model entries on backend voice-detect - ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the wav2vec2 age/gender/emotion analyze model. The engine architecture is read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are not yet published: each files: entry points at the intended mudler/voice-detect-gguf location with a TODO to fill sha256 after upload (no fabricated hashes). - .github/backend-matrix.yml: add the linux build matrix block + the darwin metal entry mirroring ced. - .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via VOICEDETECT_VERSION (pin 47546430, = 4754643). - core/config/backend_capabilities.go: register voice-detect in the backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze -> speaker_recognition), mirroring speaker-recognition. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 00:15:00 +00:00
Ettore Di Giacinto	01e098a844	feat(voice-detect): add Go purego backend for voice-detect.cpp Add backend/go/voice-detect implementing the Backend gRPC voice subset (VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego, mirroring the parakeet-cpp / omnivoice-cpp backends. The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and float-vector returns are owned by Go and released through the matching capi free functions, with the per-ctx last error surfaced into Go errors. Calls are serialized via base.SingleThread since the C context is not reentrant. Proto field mapping: - VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model. - VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the verify_threshold option, default 0.25) -> verify_paths -> verified/distance/ threshold/confidence/model/processing_time_ms. - VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion document maps to a single VoiceAnalysis segment (start/end 0; gender "label" -> dominant_gender with the remaining float scores as the gender map; emotion label/scores -> dominant_emotion/emotion). The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-22 00:00:32 +00:00