chore(deps): bump torch in /backend/python/vllm

Bumps torch from 2.9.1+cpu to 2.12.1+xpu.

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.12.1+xpu
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401)
2026-06-22 15:49:12 -04:00 · 2026-06-22 18:33:32 +00:00 · 2026-06-22 18:26:19 +02:00 · 2026-06-22 18:24:29 +02:00 · 2026-06-22 16:09:16 +02:00 · 2026-06-22 12:38:06 +02:00
104 changed files with 3122 additions and 4558 deletions
--- a/.github/backend-matrix.yml
+++ b/.github/backend-matrix.yml
@@ -3723,302 +3723,6 @@ include:
    dockerfile: "./backend/Dockerfile.golang"
    context: "./"
    ubuntu-version: '2404'
-  # voice-detect
-  - build-type: 'cublas'
-    cuda-major-version: "12"
-    cuda-minor-version: "8"
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-nvidia-cuda-12-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "13"
-    cuda-minor-version: "0"
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-nvidia-cuda-13-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "13"
-    cuda-minor-version: "0"
-    platforms: 'linux/arm64'
-    skip-drivers: 'false'
-    tag-latest: 'auto'
-    tag-suffix: '-nvidia-l4t-cuda-13-arm64-voice-detect'
-    base-image: "ubuntu:24.04"
-    ubuntu-version: '2404'
-    runs-on: 'ubuntu-24.04-arm'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-  - build-type: ''
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    platform-tag: 'amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-cpu-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: ''
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/arm64'
-    platform-tag: 'arm64'
-    tag-latest: 'auto'
-    tag-suffix: '-cpu-voice-detect'
-    runs-on: 'ubuntu-24.04-arm'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'sycl_f32'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-intel-sycl-f32-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'sycl_f16'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-intel-sycl-f16-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'vulkan'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    platform-tag: 'amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-vulkan-voice-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'vulkan'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/arm64'
-    platform-tag: 'arm64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-vulkan-voice-detect'
-    runs-on: 'ubuntu-24.04-arm'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "12"
-    cuda-minor-version: "0"
-    platforms: 'linux/arm64'
-    skip-drivers: 'false'
-    tag-latest: 'auto'
-    tag-suffix: '-nvidia-l4t-arm64-voice-detect'
-    base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-    runs-on: 'ubuntu-24.04-arm'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2204'
-  - build-type: 'hipblas'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-rocm-hipblas-voice-detect'
-    base-image: "rocm/dev-ubuntu-24.04:7.2.1"
-    runs-on: 'ubuntu-latest'
-    skip-drivers: 'false'
-    backend: "voice-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  # face-detect
-  - build-type: 'cublas'
-    cuda-major-version: "12"
-    cuda-minor-version: "8"
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-nvidia-cuda-12-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "13"
-    cuda-minor-version: "0"
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-nvidia-cuda-13-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "13"
-    cuda-minor-version: "0"
-    platforms: 'linux/arm64'
-    skip-drivers: 'false'
-    tag-latest: 'auto'
-    tag-suffix: '-nvidia-l4t-cuda-13-arm64-face-detect'
-    base-image: "ubuntu:24.04"
-    ubuntu-version: '2404'
-    runs-on: 'ubuntu-24.04-arm'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-  - build-type: ''
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    platform-tag: 'amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-cpu-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: ''
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/arm64'
-    platform-tag: 'arm64'
-    tag-latest: 'auto'
-    tag-suffix: '-cpu-face-detect'
-    runs-on: 'ubuntu-24.04-arm'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'sycl_f32'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-intel-sycl-f32-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'sycl_f16'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-intel-sycl-f16-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'vulkan'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    platform-tag: 'amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-vulkan-face-detect'
-    runs-on: 'ubuntu-latest'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'vulkan'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/arm64'
-    platform-tag: 'arm64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-vulkan-face-detect'
-    runs-on: 'ubuntu-24.04-arm'
-    base-image: "ubuntu:24.04"
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
-  - build-type: 'cublas'
-    cuda-major-version: "12"
-    cuda-minor-version: "0"
-    platforms: 'linux/arm64'
-    skip-drivers: 'false'
-    tag-latest: 'auto'
-    tag-suffix: '-nvidia-l4t-arm64-face-detect'
-    base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-    runs-on: 'ubuntu-24.04-arm'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2204'
-  - build-type: 'hipblas'
-    cuda-major-version: ""
-    cuda-minor-version: ""
-    platforms: 'linux/amd64'
-    tag-latest: 'auto'
-    tag-suffix: '-gpu-rocm-hipblas-face-detect'
-    base-image: "rocm/dev-ubuntu-24.04:7.2.1"
-    runs-on: 'ubuntu-latest'
-    skip-drivers: 'false'
-    backend: "face-detect"
-    dockerfile: "./backend/Dockerfile.golang"
-    context: "./"
-    ubuntu-version: '2404'
  # acestep-cpp
  - build-type: ''
    cuda-major-version: ""
@@ -5202,14 +4906,6 @@ includeDarwin:
    tag-suffix: "-metal-darwin-arm64-ced"
    build-type: "metal"
    lang: "go"
-  - backend: "voice-detect"
-    tag-suffix: "-metal-darwin-arm64-voice-detect"
-    build-type: "metal"
-    lang: "go"
-  - backend: "face-detect"
-    tag-suffix: "-metal-darwin-arm64-face-detect"
-    build-type: "metal"
-    lang: "go"
  - backend: "acestep-cpp"
    tag-suffix: "-metal-darwin-arm64-acestep-cpp"
    build-type: "metal"
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -46,14 +46,6 @@ jobs:
            variable: "CED_VERSION"
            branch: "master"
            file: "backend/go/ced/Makefile"
-          - repository: "mudler/voice-detect.cpp"
-            variable: "VOICEDETECT_VERSION"
-            branch: "master"
-            file: "backend/go/voice-detect/Makefile"
-          - repository: "mudler/face-detect.cpp"
-            variable: "FACEDETECT_VERSION"
-            branch: "master"
-            file: "backend/go/face-detect/Makefile"
          - repository: "mudler/depth-anything.cpp"
            variable: "DEPTHANYTHING_VERSION"
            branch: "master"
--- a/.github/workflows/tests-pii-ner-e2e.yml
+++ b/.github/workflows/tests-pii-ner-e2e.yml
@@ -0,0 +1,97 @@
+---
+name: 'PII NER tier E2E (live GGUF, CPU)'
+
+# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the
+# hermetic tests/e2e suite cannot cover (it only exercises the in-process
+# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB
+# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand.
+#
+# This drives the container-level harness (tests/e2e-backends) via
+# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image,
+# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned
+# TokenClassify spans. The complementary HTTP-path specs in tests/e2e
+# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired.
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: '0 3 * * *'
+  push:
+    branches:
+      - master
+    paths:
+      - 'backend/cpp/privacy-filter/**'
+      - 'backend/Dockerfile.privacy-filter'
+      - 'core/services/routing/pii/**'
+      - 'core/services/routing/piidetector/**'
+      - 'core/backend/token_classify.go'
+      - 'core/http/endpoints/localai/pii.go'
+      - 'core/schema/pii.go'
+      - 'tests/e2e-backends/**'
+      - 'tests/e2e/e2e_pii_ner_test.go'
+      - 'tests/e2e/e2e_suite_test.go'
+      - '.github/workflows/tests-pii-ner-e2e.yml'
+  pull_request:
+    paths:
+      - 'backend/cpp/privacy-filter/**'
+      - 'backend/Dockerfile.privacy-filter'
+      - 'core/services/routing/pii/**'
+      - 'core/services/routing/piidetector/**'
+      - 'core/backend/token_classify.go'
+      - 'core/http/endpoints/localai/pii.go'
+      - 'core/schema/pii.go'
+      - 'tests/e2e-backends/**'
+      - 'tests/e2e/e2e_pii_ner_test.go'
+      - 'tests/e2e/e2e_suite_test.go'
+      - '.github/workflows/tests-pii-ner-e2e.yml'
+
+concurrency:
+  group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+
+jobs:
+  tests-pii-ner-e2e:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        go-version: ['1.25.x']
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
+          sudo docker image prune --all --force || true
+          df -h
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Setup Go ${{ matrix.go-version }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ matrix.go-version }}
+          cache: false
+      - name: Proto Dependencies
+        run: |
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential
+      # Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on
+      # CPU and runs the token_classify capability spec (byte-offset contract).
+      - name: Run live PII NER backend E2E
+        run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
+      - name: Setup tmate session if tests fail
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3.23
+        with:
+          detached: true
+          connect-timeout-seconds: 180
+          limit-access-to-actor: true
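
Note on the workflow above: it asserts "byte-correct, UTF-8-aligned TokenClassify spans". The following is a minimal sketch of what such a check could look like, not the harness code in tests/e2e-backends. The `Span` type and `checkSpan` helper are hypothetical names introduced for illustration, and the half-open `[Start, End)` byte-range convention is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Span is a hypothetical stand-in for one TokenClassify result: a half-open
// byte range [Start, End) into the input text plus an entity label.
type Span struct {
	Start, End int
	Label      string
}

// checkSpan returns an error when the span is empty or out of range, or when
// either offset points into the middle of a multi-byte UTF-8 sequence.
func checkSpan(text string, s Span) error {
	if s.Start < 0 || s.End > len(text) || s.Start >= s.End {
		return fmt.Errorf("span [%d,%d) out of range for %d bytes", s.Start, s.End, len(text))
	}
	// Start must point at the first byte of a rune.
	if !utf8.RuneStart(text[s.Start]) {
		return fmt.Errorf("start offset %d splits a UTF-8 sequence", s.Start)
	}
	// End is aligned when it equals len(text) or points at the first byte of a rune.
	if s.End < len(text) && !utf8.RuneStart(text[s.End]) {
		return fmt.Errorf("end offset %d splits a UTF-8 sequence", s.End)
	}
	return nil
}

func main() {
	text := "José lives in Zürich"
	spans := []Span{
		{Start: 0, End: 5, Label: "PERSON"},    // covers "José" ("é" is two bytes)
		{Start: 15, End: 22, Label: "LOCATION"}, // covers "Zürich"
		{Start: 0, End: 4, Label: "PERSON"},    // misaligned: ends inside "é"
	}
	for _, s := range spans {
		if err := checkSpan(text, s); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("%s %q\n", s.Label, text[s.Start:s.End])
	}
}
```

Checking alignment on byte offsets rather than rune indices matters for multilingual input, where one character can occupy several bytes and an off-by-one offset would otherwise slice a character in half.
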
--- a/.gitignore
+++ b/.gitignore
@@ -91,3 +91,6 @@ core/http/react-ui/test-results/

 # Local worktrees
 .worktrees/
+
+# SDD / brainstorm scratch (agent-driven development)
+.superpowers/
--- a/10
+++ b/10
@@ -690,6 +690,16 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
 	BACKEND_TEST_CTX_SIZE=2048 \
 	$(MAKE) test-extra-backend

+## privacy-filter: the PII/NER token-classification backend. Exercises the
+## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets
+## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M
+## active params). This is the live-backend coverage for the PII NER tier.
+test-extra-backend-privacy-filter: docker-build-privacy-filter
+	BACKEND_IMAGE=local-ai-backend:privacy-filter \
+	BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \
+	BACKEND_TEST_CAPS=health,load,token_classify \
+	$(MAKE) test-extra-backend
+
 ## vllm is resolved from a HuggingFace model id (no file download) and
 ## exercises Predict + streaming + tool-call extraction via the hermes parser.
 ## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU
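
Note on the `test-extra-backend-privacy-filter` target above: it selects which capability specs run through the comma-separated `BACKEND_TEST_CAPS` variable. The sketch below shows one way a test harness might turn that value into a lookup set. It is an assumption about the general approach, not the code in tests/e2e-backends, and `parseCaps` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseCaps splits a comma-separated capability list into a set, trimming
// whitespace and ignoring empty entries.
func parseCaps(raw string) map[string]bool {
	caps := make(map[string]bool)
	for _, c := range strings.Split(raw, ",") {
		if c = strings.TrimSpace(c); c != "" {
			caps[c] = true
		}
	}
	return caps
}

func main() {
	raw := os.Getenv("BACKEND_TEST_CAPS")
	if raw == "" {
		// Fallback for running the sketch standalone: the value the
		// Makefile target sets.
		raw = "health,load,token_classify"
	}
	caps := parseCaps(raw)
	// "predict" is included only to show a capability that is not enabled
	// by the default list above.
	for _, name := range []string{"health", "load", "token_classify", "predict"} {
		fmt.Printf("%-15s enabled=%v\n", name, caps[name])
	}
}
```
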
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62
+LLAMA_VERSION?=7c082bc417bbe53210a83df4ba5b49e18ce6193c
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221
+CRISPASR_VERSION?=7a8cb80907341c0204bd0488c1244764f4163883
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/go/face-detect/.gitignore
+++ b/backend/go/face-detect/.gitignore
@@ -1,18 +0,0 @@
-# Fetched upstream sources
-sources/
-
-# CMake build directories
-build*/
-
-# build artifacts staged in-tree by the Makefile (cp from sources/) or
-# symlinked for local dev; the real sources live in face-detect.cpp upstream.
-*.so
-*.so.*
-facedetect_capi.h
-compile_commands.json
-
-# Compiled backend binary
-face-detect-grpc
-
-# Packaging output
-package/
--- a/backend/go/face-detect/Makefile
+++ b/backend/go/face-detect/Makefile
@@ -1,97 +0,0 @@
-# face-detect backend Makefile.
-#
-# Upstream pin lives below as FACEDETECT_VERSION?=9c8adb7... (.github/bump_deps.sh
-# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
-# convention).
-#
-# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
-# symlink the .so + header into this directory and skip the clone/cmake steps:
-#
-#   ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
-#   ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
-#   go build -o face-detect-grpc .
-#
-# The default target below does the proper clone-at-pin + cmake build so CI does
-# not need a side-checkout.
-
-FACEDETECT_VERSION?=9c8adb748f1f02d7fc0430a883234aef4b343a34
-FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
-
-GOCMD?=go
-GO_TAGS?=
-JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
-
-BUILD_TYPE?=
-NATIVE?=false
-
-# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
-# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
-# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
-# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
-# side, so only the facedetect_capi_* surface is exported.
-CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
-
-ifeq ($(NATIVE),false)
-	CMAKE_ARGS+=-DGGML_NATIVE=OFF
-endif
-
-# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
-# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
-# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
-# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
-ifeq ($(BUILD_TYPE),cublas)
-	CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
-else ifeq ($(BUILD_TYPE),openblas)
-	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
-else ifeq ($(BUILD_TYPE),hipblas)
-	CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
-else ifeq ($(BUILD_TYPE),vulkan)
-	CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
-else ifeq ($(BUILD_TYPE),metal)
-	CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
-endif
-
-.PHONY: face-detect-grpc package build clean purge test all
-
-all: face-detect-grpc
-
-# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
-# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
-# bump, run 'make purge && make' to refetch.
-sources/face-detect.cpp:
-	mkdir -p sources/face-detect.cpp
-	cd sources/face-detect.cpp && \
-	git init -q && \
-	git remote add origin $(FACEDETECT_REPO) && \
-	git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
-	git checkout FETCH_HEAD && \
-	git submodule update --init --recursive --depth 1 --single-branch
-
-# Build the shared lib + header out-of-tree, then stage them next to the Go
-# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
-# them up.
-libfacedetect.so: sources/face-detect.cpp
-	cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
-	cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
-	cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
-	cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./
-
-face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
-	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .
-
-package: face-detect-grpc
-	bash package.sh
-
-build: package
-
-# Test target. The embed/detect/verify/analyze smoke specs are gated on
-# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
-# heavy specs auto-skip and only the pure-Go parsing specs run.
-test:
-	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
-
-clean: purge
-	rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc
-
-purge:
-	rm -rf sources/face-detect.cpp
--- a/backend/go/face-detect/gofacedetect.go
+++ b/backend/go/face-detect/gofacedetect.go
@@ -1,431 +0,0 @@
-package main
-
-import (
-	"encoding/base64"
-	"encoding/json"
-	"errors"
-	"fmt"
-	"math"
-	"os"
-	"path/filepath"
-	"strconv"
-	"strings"
-	"time"
-	"unsafe"
-
-	"github.com/mudler/LocalAI/pkg/grpc/base"
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-	"github.com/mudler/xlog"
-)
-
-// purego-bound entry points from libfacedetect.so. Names match
-// facedetect_capi.h exactly so a `nm libfacedetect.so | grep facedetect_capi`
-// is enough to spot drift.
-//
-// The opaque ctx and the malloc'd char*/float* return values are declared as
-// uintptr so we get the raw pointer back and can release it via the matching
-// capi free function. purego's native string/[]float32 returns would copy and
-// forget the original pointer, leaking the C-owned buffer on every call.
-var (
-	CppAbiVersion  func() int32
-	CppLoad        func(ggufPath string) uintptr
-	CppFree        func(ctx uintptr)
-	CppLastError   func(ctx uintptr) string
-	CppFreeString  func(s uintptr)
-	CppFreeVec     func(v uintptr)
-	CppEmbedPath   func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
-	CppEmbedRGB    func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
-	CppDetectJSON  func(ctx uintptr, imagePath string) uintptr
-	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
-	CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
-)
-
-// FaceDetect implements the face-recognition (biometric) subset of the Backend
-// gRPC service over libfacedetect.so. The C side keeps a single loaded model
-// pack plus a per-ctx last-error buffer and is not reentrant, so
-// base.SingleThread serializes every call.
-type FaceDetect struct {
-	base.SingleThread
-	opts   loadOptions
-	ctxPtr uintptr
-}
-
-func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
-	model := opts.ModelFile
-	if model == "" {
-		model = opts.ModelPath
-	}
-	if !filepath.IsAbs(model) && opts.ModelPath != "" {
-		model = filepath.Join(opts.ModelPath, model)
-	}
-	if model == "" {
-		return errors.New("face-detect: ModelFile is required")
-	}
-
-	f.opts = parseOptions(opts.Options)
-	if f.opts.modelName == "" {
-		f.opts.modelName = filepath.Base(model)
-	}
-
-	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
-	// one backend process per model and serves requests concurrently, so the
-	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
-	// FACEDETECT_THREADS is read by the engine at backend construction, so it
-	// must be set before the capi load. A non-positive Threads means "unset":
-	// leave the env alone so the engine keeps its sane default.
-	threads := opts.Threads
-	if threads > 0 {
-		if err := os.Setenv("FACEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
-			return fmt.Errorf("face-detect: set FACEDETECT_THREADS: %w", err)
-		}
-		xlog.Info("face-detect: applying LocalAI thread budget", "threads", threads)
-	}
-
-	xlog.Info("face-detect: loading model", "model", model,
-		"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())
-
-	ctx := CppLoad(model)
-	if ctx == 0 {
-		// The last-error buffer lives on the ctx that was never returned, so
-		// surface the path the operator tried to load instead.
-		return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
-	}
-	f.ctxPtr = ctx
-	return nil
-}
-
-// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
-// the supplied image. Mirroring the Python face backend, the image is read from
-// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
-// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
-// server wraps the returned slice in an EmbeddingResult.
-func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
-	if f.ctxPtr == 0 {
-		return nil, errors.New("face-detect: model not loaded")
-	}
-	if len(req.Images) == 0 || req.Images[0] == "" {
-		return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
-	}
-
-	path, cleanup, err := materializeImage(req.Images[0])
-	if err != nil {
-		return nil, err
-	}
-	defer cleanup()
-
-	return f.embedPath(path)
-}
-
-func (f *FaceDetect) embedPath(path string) ([]float32, error) {
-	var vec uintptr
-	var dim int32
-	rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
-	if rc != 0 || vec == 0 || dim <= 0 {
-		return nil, f.lastErr("embed", path)
-	}
-	defer CppFreeVec(vec)
-	// Copy out of the C-owned malloc'd buffer before freeing it. The
-	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
-	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
-	// nor moves this buffer and we copy immediately.
-	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
-	out := make([]float32, int(dim))
-	copy(out, src)
-	return out, nil
-}
-
-// Detect runs SCRFD over the image and returns one Detection per face. The
-// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
-// width/height, so the corners are converted. The 5 facial landmarks the engine
-// also returns are dropped: the Detection message has no field for them.
-func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
-	if f.ctxPtr == 0 {
-		return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
-	}
-	if req.Src == "" {
-		return pb.DetectResponse{}, errors.New("face-detect: src image is required")
-	}
-
-	path, cleanup, err := materializeImage(req.Src)
-	if err != nil {
-		return pb.DetectResponse{}, err
-	}
-	defer cleanup()
-
-	faces, err := f.detectFaces(path)
-	if err != nil {
-		return pb.DetectResponse{}, err
-	}
-
-	dets := make([]*pb.Detection, 0, len(faces))
-	for _, fc := range faces {
-		if req.Threshold > 0 && fc.Score < req.Threshold {
-			continue
-		}
-		x, y, w, h := fc.xywh()
-		dets = append(dets, &pb.Detection{
-			X:          x,
-			Y:          y,
-			Width:      w,
-			Height:     h,
-			Confidence: fc.Score,
-			ClassName:  "face",
-		})
-	}
-	return pb.DetectResponse{Detections: dets}, nil
-}
-
-// FaceVerify embeds the primary face in each image and reports whether they are
-// the same identity by cosine distance against a threshold. A request threshold
-// <= 0 falls back to the model-configured default (verify_threshold option,
-// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
-// veto internally (verified forced false on a spoof); the per-image liveness
-// scores are not exposed by the verify entry point, so img*_is_real /
-// img*_antispoof_score stay at their zero values.
-func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
-	if f.ctxPtr == 0 {
-		return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
-	}
-	if req.Img1 == "" || req.Img2 == "" {
-		return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
-	}
-
-	path1, cleanup1, err := materializeImage(req.Img1)
-	if err != nil {
-		return pb.FaceVerifyResponse{}, err
-	}
-	defer cleanup1()
-	path2, cleanup2, err := materializeImage(req.Img2)
-	if err != nil {
-		return pb.FaceVerifyResponse{}, err
-	}
-	defer cleanup2()
-
-	threshold := req.Threshold
-	if threshold <= 0 {
-		threshold = f.opts.verifyThreshold
-	}
-
-	antiSpoof := int32(0)
-	if req.AntiSpoofing {
-		antiSpoof = 1
-	}
-
-	started := time.Now()
-	var distance float32
-	var verified int32
-	rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
-		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
-	if rc != 0 {
-		return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
-	}
-	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
-
-	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
-	// matching the Python face backend's reporting.
-	confidence := float32(0)
-	if threshold > 0 {
-		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
-	}
-
-	return pb.FaceVerifyResponse{
-		Verified:         verified != 0,
-		Distance:         distance,
-		Threshold:        threshold,
-		Confidence:       confidence,
-		Model:            f.opts.modelName,
-		Img1Area:         f.bestArea(path1),
-		Img2Area:         f.bestArea(path2),
-		ProcessingTimeMs: elapsedMs,
-	}, nil
-}
-
-// FaceAnalyze runs the genderage head on every detected face. The C-API returns
-// "M"/"F" gender labels and a rounded age; the labels are normalized to the
-// "Man"/"Woman" values the proto documents.
-func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
-	if f.ctxPtr == 0 {
-		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
-	}
-	if req.Img == "" {
-		return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
-	}
-
-	path, cleanup, err := materializeImage(req.Img)
-	if err != nil {
-		return pb.FaceAnalyzeResponse{}, err
-	}
-	defer cleanup()
-
-	ptr := CppAnalyzeJSON(f.ctxPtr, path)
-	if ptr == 0 {
-		return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
-	}
-	defer CppFreeString(ptr)
-
-	faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
-	if err != nil {
-		return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
-	}
-	return pb.FaceAnalyzeResponse{Faces: faces}, nil
-}
-
-// faceBox is one entry of the detect/analyze JSON documents the engine emits.
-type faceBox struct {
-	Score  float32   `json:"score"`
-	Box    []float32 `json:"box"`
-	Age    float32   `json:"age"`
-	Gender string    `json:"gender"`
-}
-
-// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
-// proto carries. A short or missing box yields zeros.
-func (b faceBox) xywh() (x, y, w, h float32) {
-	if len(b.Box) < 4 {
-		return 0, 0, 0, 0
-	}
-	return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
-}
-
-type facesJSON struct {
-	Faces []faceBox `json:"faces"`
-}
-
-func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
-	ptr := CppDetectJSON(f.ctxPtr, path)
-	if ptr == 0 {
-		return nil, f.lastErr("detect", path)
-	}
-	defer CppFreeString(ptr)
-
-	var doc facesJSON
-	if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
-		return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
-	}
-	return doc.Faces, nil
-}
-
-// bestArea returns the FacialArea of the highest-scoring face in an image, or an
-// empty area when detection fails or finds nothing. Best-effort: verify already
-// succeeded, so a missing region must not turn a valid match into an error.
-func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
-	faces, err := f.detectFaces(path)
-	if err != nil || len(faces) == 0 {
-		return &pb.FacialArea{}
-	}
-	best := faces[0]
-	for _, fc := range faces[1:] {
-		if fc.Score > best.Score {
-			best = fc
-		}
-	}
-	x, y, w, h := best.xywh()
-	return &pb.FacialArea{X: x, Y: y, W: w, H: h}
-}
-
-// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
-// The engine reports gender as "M"/"F"; both the dominant label and the score
-// map are filled with the "Man"/"Woman" form the proto documents.
-func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
-	var parsed facesJSON
-	if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
-		return nil, err
-	}
-
-	out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
-	for _, fc := range parsed.Faces {
-		x, y, w, h := fc.xywh()
-		fa := &pb.FaceAnalysis{
-			Region:         &pb.FacialArea{X: x, Y: y, W: w, H: h},
-			FaceConfidence: fc.Score,
-			Age:            fc.Age,
-		}
-		if label := normalizeGender(fc.Gender); label != "" {
-			fa.DominantGender = label
-			fa.Gender = map[string]float32{label: 1.0}
-		}
-		out = append(out, fa)
-	}
-	return out, nil
-}
-
-// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
-// proto documents. Unknown codes pass through unchanged.
-func normalizeGender(g string) string {
-	switch strings.ToUpper(strings.TrimSpace(g)) {
-	case "M":
-		return "Man"
-	case "F":
-		return "Woman"
-	case "":
-		return ""
-	default:
-		return g
-	}
-}
-
-// materializeImage decodes a base64 image payload into a temp file and returns
-// its path plus a cleanup func. As a convenience for callers that already pass a
-// filesystem path (e.g. a test fixture), an existing path is used as-is with a
-// no-op cleanup. data: URI prefixes are stripped before decoding.
-func materializeImage(src string) (path string, cleanup func(), err error) {
-	noop := func() {}
-	if src == "" {
-		return "", noop, errors.New("face-detect: empty image input")
-	}
-	if _, statErr := os.Stat(src); statErr == nil {
-		return src, noop, nil
-	}
-
-	payload := src
-	if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
-		payload = payload[i+1:]
-	}
-	data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
-	if decErr != nil || len(data) == 0 {
-		return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
-	}
-
-	tmp, createErr := os.CreateTemp("", "face-detect-*.img")
-	if createErr != nil {
-		return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
-	}
-	cleanup = func() { _ = os.Remove(tmp.Name()) }
-	if _, wErr := tmp.Write(data); wErr != nil {
-		_ = tmp.Close()
-		cleanup()
-		return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
-	}
-	if cErr := tmp.Close(); cErr != nil {
-		cleanup()
-		return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
-	}
-	return tmp.Name(), cleanup, nil
-}
-
-// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
-func (f *FaceDetect) lastErr(op, subject string) error {
-	msg := strings.TrimSpace(CppLastError(f.ctxPtr))
-	if msg == "" {
-		msg = "no error detail"
-	}
-	return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
-}
-
-// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
-// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
-//
-// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
-// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
-// moves the buffer and we dereference it immediately to copy the bytes out.
-func goStringFromCPtr(cptr uintptr) string {
-	if cptr == 0 {
-		return ""
-	}
-	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
-	n := 0
-	for *(*byte)(unsafe.Add(p, n)) != 0 {
-		n++
-	}
-	return string(unsafe.Slice((*byte)(p), n))
-}
--- a/backend/go/face-detect/gofacedetect_test.go
+++ b/backend/go/face-detect/gofacedetect_test.go
@@ -1,230 +0,0 @@
-package main
-
-import (
-	"encoding/base64"
-	"os"
-	"sync"
-	"testing"
-
-	"github.com/ebitengine/purego"
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-func TestFaceDetect(t *testing.T) {
-	RegisterFailHandler(Fail)
-	RunSpecs(t, "face-detect Backend Suite")
-}
-
-var (
-	libLoadOnce sync.Once
-	libLoadErr  error
-)
-
-// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
-// bridge without spinning up the gRPC server. If libfacedetect.so cannot be
-// loaded (via LD_LIBRARY_PATH or a symlink in ./), the error is recorded and
-// the smoke specs skip themselves.
-func ensureLibLoaded() error {
-	libLoadOnce.Do(func() {
-		libName := os.Getenv("FACEDETECT_LIBRARY")
-		if libName == "" {
-			libName = "libfacedetect.so"
-		}
-		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
-		if err != nil {
-			libLoadErr = err
-			return
-		}
-		purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
-		purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
-		purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
-		purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
-		purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
-		purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
-		purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
-		purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
-		purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
-		purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
-		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
-	})
-	return libLoadErr
-}
-
-var _ = Describe("parseOptions", func() {
-	It("defaults verify_threshold to 0.35", func() {
-		o := parseOptions(nil)
-		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
-		Expect(o.modelName).To(Equal(""))
-	})
-
-	It("parses verify_threshold, threshold alias and model_name", func() {
-		o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
-		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
-		Expect(o.modelName).To(Equal("buffalo_l"))
-
-		o2 := parseOptions([]string{"threshold:0.3"})
-		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
-	})
-
-	It("ignores non-positive thresholds and keeps the default", func() {
-		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
-		Expect(o.verifyThreshold).To(Equal(float32(0.35)))
-	})
-})
-
-var _ = Describe("normalizeGender", func() {
-	It("maps M/F codes to Man/Woman", func() {
-		Expect(normalizeGender("M")).To(Equal("Man"))
-		Expect(normalizeGender("f")).To(Equal("Woman"))
-		Expect(normalizeGender(" m ")).To(Equal("Man"))
-	})
-
-	It("passes empty and unknown codes through", func() {
-		Expect(normalizeGender("")).To(Equal(""))
-		Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
-	})
-})
-
-var _ = Describe("faceBox.xywh", func() {
-	It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
-		b := faceBox{Box: []float32{10, 20, 50, 80}}
-		x, y, w, h := b.xywh()
-		Expect(x).To(Equal(float32(10)))
-		Expect(y).To(Equal(float32(20)))
-		Expect(w).To(Equal(float32(40)))
-		Expect(h).To(Equal(float32(60)))
-	})
-
-	It("returns zeros for a short box", func() {
-		x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
-		Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
-	})
-})
-
-var _ = Describe("parseAnalyzeJSON", func() {
-	It("maps region, age and gender for each face", func() {
-		doc := `{"faces":[
-			{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
-			{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
-		faces, err := parseAnalyzeJSON(doc)
-		Expect(err).ToNot(HaveOccurred())
-		Expect(faces).To(HaveLen(2))
-
-		Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
-		Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
-		Expect(faces[0].DominantGender).To(Equal("Man"))
-		Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
-		Expect(faces[0].Region.W).To(Equal(float32(40)))
-		Expect(faces[0].Region.H).To(Equal(float32(60)))
-
-		Expect(faces[1].DominantGender).To(Equal("Woman"))
-	})
-
-	It("tolerates a missing gender field", func() {
-		faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
-		Expect(err).ToNot(HaveOccurred())
-		Expect(faces).To(HaveLen(1))
-		Expect(faces[0].DominantGender).To(Equal(""))
-		Expect(faces[0].Gender).To(BeEmpty())
-	})
-
-	It("returns no faces for an empty document", func() {
-		faces, err := parseAnalyzeJSON(`{"faces":[]}`)
-		Expect(err).ToNot(HaveOccurred())
-		Expect(faces).To(BeEmpty())
-	})
-
-	It("returns an error on malformed JSON", func() {
-		_, err := parseAnalyzeJSON(`{not-json`)
-		Expect(err).To(HaveOccurred())
-	})
-})
-
-var _ = Describe("materializeImage", func() {
-	It("decodes a base64 payload to a temp file", func() {
-		payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
-		path, cleanup, err := materializeImage(payload)
-		Expect(err).ToNot(HaveOccurred())
-		defer cleanup()
-		data, rerr := os.ReadFile(path)
-		Expect(rerr).ToNot(HaveOccurred())
-		Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
-	})
-
-	It("strips a data: URI prefix before decoding", func() {
-		payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
-		path, cleanup, err := materializeImage(payload)
-		Expect(err).ToNot(HaveOccurred())
-		defer cleanup()
-		data, rerr := os.ReadFile(path)
-		Expect(rerr).ToNot(HaveOccurred())
-		Expect(data).To(Equal([]byte("hello")))
-	})
-
-	It("uses an existing path as-is", func() {
-		tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
-		Expect(err).ToNot(HaveOccurred())
-		defer func() { _ = os.Remove(tmp.Name()) }()
-		Expect(tmp.Close()).To(Succeed())
-
-		path, cleanup, err := materializeImage(tmp.Name())
-		Expect(err).ToNot(HaveOccurred())
-		defer cleanup()
-		Expect(path).To(Equal(tmp.Name()))
-	})
-
-	It("errors on input that is neither a path nor base64", func() {
-		_, _, err := materializeImage("not base64!!!")
-		Expect(err).To(HaveOccurred())
-	})
-})
-
-// The specs below exercise the real C-API end to end. They run only when both a
-// model GGUF and a test image are provided, and skip cleanly otherwise so the
-// suite stays green without large assets.
-var _ = Describe("FaceDetect end-to-end", Ordered, func() {
-	var (
-		f         *FaceDetect
-		modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
-		imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
-	)
-
-	BeforeAll(func() {
-		if modelPath == "" || imagePath == "" {
-			Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
-		}
-		if err := ensureLibLoaded(); err != nil {
-			Skip("libfacedetect.so not loadable: " + err.Error())
-		}
-		f = &FaceDetect{}
-		Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
-	})
-
-	It("embeds the primary face in an image", func() {
-		emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(emb).ToNot(BeEmpty())
-	})
-
-	It("detects at least one face", func() {
-		resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(resp.Detections).ToNot(BeEmpty())
-		Expect(resp.Detections[0].ClassName).To(Equal("face"))
-	})
-
-	It("verifies an image against itself as the same identity", func() {
-		resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(resp.Verified).To(BeTrue())
-		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
-	})
-
-	It("analyzes age/gender for each face", func() {
-		resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(resp.Faces).ToNot(BeEmpty())
-	})
-})
--- a/backend/go/face-detect/main.go
+++ b/backend/go/face-detect/main.go
@@ -1,65 +0,0 @@
-package main
-
-// Started internally by LocalAI - one gRPC server per loaded model.
-//
-// Loads libfacedetect.so via purego and registers the flat C-API entry points
-// declared in facedetect_capi.h. The library name can be overridden with
-// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
-// convention in the sibling backends); the default looks for the .so next to
-// this binary (resolved via LD_LIBRARY_PATH by run.sh).
-import (
-	"flag"
-	"fmt"
-	"os"
-
-	"github.com/ebitengine/purego"
-	grpc "github.com/mudler/LocalAI/pkg/grpc"
-)
-
-var (
-	addr = flag.String("addr", "localhost:50051", "the address to listen on")
-)
-
-type LibFuncs struct {
-	FuncPtr any
-	Name    string
-}
-
-func main() {
-	libName := os.Getenv("FACEDETECT_LIBRARY")
-	if libName == "" {
-		libName = "libfacedetect.so"
-	}
-
-	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
-	if err != nil {
-		panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
-	}
-
-	// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
-	// uintptr so the raw pointer can be freed via the matching capi free fn.
-	libFuncs := []LibFuncs{
-		{&CppAbiVersion, "facedetect_capi_abi_version"},
-		{&CppLoad, "facedetect_capi_load"},
-		{&CppFree, "facedetect_capi_free"},
-		{&CppLastError, "facedetect_capi_last_error"},
-		{&CppFreeString, "facedetect_capi_free_string"},
-		{&CppFreeVec, "facedetect_capi_free_vec"},
-		{&CppEmbedPath, "facedetect_capi_embed_path"},
-		{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
-		{&CppDetectJSON, "facedetect_capi_detect_path_json"},
-		{&CppVerifyPaths, "facedetect_capi_verify_paths"},
-		{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
-	}
-	for _, lf := range libFuncs {
-		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
-	}
-
-	fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())
-
-	flag.Parse()
-
-	if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
-		panic(err)
-	}
-}
--- a/backend/go/face-detect/options.go
+++ b/backend/go/face-detect/options.go
@@ -1,47 +0,0 @@
-package main
-
-import (
-	"strconv"
-	"strings"
-)
-
-// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
-// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
-// face backend ships with so the two implementations agree on verdicts out of
-// the box.
-const defaultVerifyThreshold float32 = 0.35
-
-// loadOptions holds the parsed model-level options for face-detect.
-type loadOptions struct {
-	verifyThreshold float32
-	modelName       string
-}
-
-func splitOption(o string) (key, value string, ok bool) {
-	i := strings.Index(o, ":")
-	if i < 0 {
-		return "", "", false
-	}
-	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
-}
-
-// parseOptions reads the backend "key:value" option slice. Unknown keys are
-// ignored. Defaults: verify_threshold 0.35, model_name derived from the file.
-func parseOptions(opts []string) loadOptions {
-	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
-	for _, oo := range opts {
-		key, value, ok := splitOption(oo)
-		if !ok {
-			continue
-		}
-		switch key {
-		case "verify_threshold", "threshold":
-			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
-				o.verifyThreshold = float32(f)
-			}
-		case "model_name":
-			o.modelName = value
-		}
-	}
-	return o
-}
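Review note on the removed options.go: options are split on the first colon only, so a value may itself contain colons, and both key and value are trimmed. The sketch below shows that behavior using `strings.Cut` (Go 1.18+) in place of the `strings.Index` call in the removed code. This is an equivalent formulation for illustration, not the original implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOption splits a "key:value" option on the first colon and trims
// surrounding whitespace. Sketch only: it uses strings.Cut, whereas the
// removed code used strings.Index.
func splitOption(o string) (key, value string, ok bool) {
	k, v, found := strings.Cut(o, ":")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	for _, o := range []string{"verify_threshold:0.4", "model_name:a:b", "novalue"} {
		k, v, ok := splitOption(o)
		fmt.Printf("%q -> key=%q value=%q ok=%v\n", o, k, v, ok)
	}
}
```

Entries without a colon report `ok == false`, which is why parseOptions can skip them silently.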
--- a/backend/go/face-detect/package.sh
+++ b/backend/go/face-detect/package.sh
@@ -1,68 +0,0 @@
-#!/bin/bash
-#
-# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
-# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
-# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
-# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
-# is used instead of the host's.
-
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-REPO_ROOT="${CURDIR}/../../.."
-
-mkdir -p "$CURDIR/package/lib"
-
-cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
-cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
-
-# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
-# LD_LIBRARY_PATH, which run.sh points at lib/.
-cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
-	echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
-	exit 1
-}
-
-# Detect architecture and copy the core runtime libs libfacedetect.so links
-# against, plus the matching dynamic loader as lib/ld.so.
-if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
-    echo "Detected x86_64 architecture, copying x86_64 libraries..."
-    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
-    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
-    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
-elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
-    echo "Detected ARM64 architecture, copying ARM64 libraries..."
-    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
-    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
-    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
-elif [ "$(uname -s)" = "Darwin" ]; then
-    echo "Detected Darwin"
-else
-    echo "Error: Could not detect architecture" >&2
-    exit 1
-fi
-
-# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
-# BUILD_TYPE so the backend can reach the GPU without the runtime base image
-# shipping those drivers.
-GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
-if [ -f "$GPU_LIB_SCRIPT" ]; then
-    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
-    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
-    package_gpu_libs
-fi
-
-echo "Packaging completed successfully"
-ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/face-detect/run.sh
+++ b/backend/go/face-detect/run.sh
@@ -1,16 +0,0 @@
-#!/bin/bash
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-
-export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
-
-# If a self-contained ld.so was packaged, route through it so the packaged
-# libc / libstdc++ are used instead of the host's (matches the voice-detect /
-# whisper / parakeet backends' runtime layout).
-if [ -f "$CURDIR/lib/ld.so" ]; then
-	echo "Using lib/ld.so"
-	exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
-fi
-
-exec "$CURDIR/face-detect-grpc" "$@"
--- a/backend/go/face-detect/test.sh
+++ b/backend/go/face-detect/test.sh
@@ -1,15 +0,0 @@
-#!/bin/bash
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-cd "$CURDIR"
-
-echo "Running face-detect backend tests..."
-
-# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
-# specs run only when a model + image are provided via
-# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
-# auto-skip.
-LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
-
-echo "face-detect tests completed."
--- a/backend/go/voice-detect/.gitignore
+++ b/backend/go/voice-detect/.gitignore
@@ -1,18 +0,0 @@
-# Fetched upstream sources
-sources/
-
-# CMake build directories
-build*/
-
-# build artifacts staged in-tree by the Makefile (cp from sources/) or
-# symlinked for local dev; the real sources live in voice-detect.cpp upstream.
-*.so
-*.so.*
-voicedetect_capi.h
-compile_commands.json
-
-# Compiled backend binary
-voice-detect-grpc
-
-# Packaging output
-package/
--- a/backend/go/voice-detect/Makefile
+++ b/backend/go/voice-detect/Makefile
@@ -1,94 +0,0 @@
-# voice-detect backend Makefile.
-#
-# Upstream pin lives below as VOICEDETECT_VERSION?=fe7e6a3... (.github/bump_deps.sh
-# can find and update it - matches the parakeet.cpp / whisper.cpp / ds4 convention).
-#
-# Local dev shortcut: if you already have an out-of-tree voice-detect.cpp build,
-# symlink the .so + header into this directory and skip the clone/cmake steps:
-#
-#   ln -sf /path/to/voice-detect.cpp/build-shared/libvoicedetect.so .
-#   ln -sf /path/to/voice-detect.cpp/include/voicedetect_capi.h .
-#   go build -o voice-detect-grpc .
-#
-# The default target below does the proper clone-at-pin + cmake build so CI does
-# not need a side-checkout.
-
-VOICEDETECT_VERSION?=fe7e6a3f0a0afc141566e18c8e97a8417ee0c3cd
-VOICEDETECT_REPO?=https://github.com/mudler/voice-detect.cpp
-
-GOCMD?=go
-GO_TAGS?=
-JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
-
-BUILD_TYPE?=
-NATIVE?=false
-
-# Build ggml statically into libvoicedetect.so (PIC) so the shared lib is
-# self-contained: dlopen needs no libggml*.so alongside it, only system libs
-# (libstdc++/libgomp/libc) that the runtime image already provides.
-CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DVOICEDETECT_SHARED=ON -DVOICEDETECT_BUILD_CLI=OFF -DVOICEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
-
-ifeq ($(NATIVE),false)
-	CMAKE_ARGS+=-DGGML_NATIVE=OFF
-endif
-
-# voice-detect.cpp gates its GGML backends behind VOICEDETECT_GGML_* options and
-# does set(GGML_CUDA ${VOICEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
-# -DGGML_CUDA=ON is overwritten back to OFF. Forward the VOICEDETECT_GGML_*
-# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
-ifeq ($(BUILD_TYPE),cublas)
-	CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDA=ON
-else ifeq ($(BUILD_TYPE),openblas)
-	CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
-else ifeq ($(BUILD_TYPE),hipblas)
-	CMAKE_ARGS+=-DVOICEDETECT_GGML_HIP=ON
-else ifeq ($(BUILD_TYPE),vulkan)
-	CMAKE_ARGS+=-DVOICEDETECT_GGML_VULKAN=ON
-else ifeq ($(BUILD_TYPE),metal)
-	CMAKE_ARGS+=-DVOICEDETECT_GGML_METAL=ON
-endif
-
-.PHONY: voice-detect-grpc package build clean purge test all
-
-all: voice-detect-grpc
-
-# Clone the upstream voice-detect.cpp source at the pinned commit. Directory acts
-# as the target so make only re-clones when missing. After a VOICEDETECT_VERSION
-# bump, run 'make purge && make' to refetch.
-sources/voice-detect.cpp:
-	mkdir -p sources/voice-detect.cpp
-	cd sources/voice-detect.cpp && \
-	git init -q && \
-	git remote add origin $(VOICEDETECT_REPO) && \
-	git fetch --depth 1 origin $(VOICEDETECT_VERSION) && \
-	git checkout FETCH_HEAD && \
-	git submodule update --init --recursive --depth 1 --single-branch
-
-# Build the shared lib + header out-of-tree, then stage them next to the Go
-# sources so purego.Dlopen("libvoicedetect.so") and the cgo-less build both pick
-# them up.
-libvoicedetect.so: sources/voice-detect.cpp
-	cmake -B sources/voice-detect.cpp/build-shared -S sources/voice-detect.cpp $(CMAKE_ARGS)
-	cmake --build sources/voice-detect.cpp/build-shared --config Release -j$(JOBS) --target voicedetect
-	cp -fv sources/voice-detect.cpp/build-shared/libvoicedetect.so* ./ 2>/dev/null || true
-	cp -fv sources/voice-detect.cpp/include/voicedetect_capi.h ./
-
-voice-detect-grpc: libvoicedetect.so main.go govoicedetect.go options.go
-	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voice-detect-grpc .
-
-package: voice-detect-grpc
-	bash package.sh
-
-build: package
-
-# Test target. The embed/verify/analyze smoke specs are gated on
-# VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV; without them the
-# heavy specs auto-skip and only the pure-Go parsing specs run.
-test:
-	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
-
-clean: purge
-	rm -rf libvoicedetect.so* voicedetect_capi.h package voice-detect-grpc
-
-purge:
-	rm -rf sources/voice-detect.cpp
--- a/backend/go/voice-detect/govoicedetect.go
+++ b/backend/go/voice-detect/govoicedetect.go
@@ -1,273 +0,0 @@
-package main
-
-import (
-	"encoding/json"
-	"errors"
-	"fmt"
-	"math"
-	"os"
-	"path/filepath"
-	"strconv"
-	"strings"
-	"time"
-	"unsafe"
-
-	"github.com/mudler/LocalAI/pkg/grpc/base"
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-	"github.com/mudler/xlog"
-)
-
-// purego-bound entry points from libvoicedetect.so. Names match
-// voicedetect_capi.h exactly so a `nm libvoicedetect.so | grep voicedetect_capi`
-// is enough to spot drift.
-//
-// The opaque ctx and the malloc'd char*/float* return values are declared as
-// uintptr so we get the raw pointer back and can release it via the matching
-// capi free function. purego's native string/[]float32 returns would copy and
-// forget the original pointer, leaking the C-owned buffer on every call.
-var (
-	CppAbiVersion  func() int32
-	CppLoad        func(ggufPath string) uintptr
-	CppFree        func(ctx uintptr)
-	CppLastError   func(ctx uintptr) string
-	CppFreeString  func(s uintptr)
-	CppFreeVec     func(v uintptr)
-	CppEmbedPath   func(ctx uintptr, wavPath string, outVec, outDim unsafe.Pointer) int32
-	CppEmbedPCM    func(ctx uintptr, pcm []float32, nSamples, sampleRate int32, outVec, outDim unsafe.Pointer) int32
-	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, outDistance, outVerified unsafe.Pointer) int32
-	CppAnalyzeJSON func(ctx uintptr, wavPath string) uintptr
-)
-
-// VoiceDetect implements the speaker-recognition voice subset of the Backend
-// gRPC service over libvoicedetect.so. The C side keeps a single loaded model
-// plus a per-ctx last-error buffer and is not reentrant, so base.SingleThread
-// serializes every call.
-type VoiceDetect struct {
-	base.SingleThread
-	opts   loadOptions
-	ctxPtr uintptr
-}
-
-func (v *VoiceDetect) Load(opts *pb.ModelOptions) error {
-	model := opts.ModelFile
-	if model == "" {
-		model = opts.ModelPath
-	}
-	if !filepath.IsAbs(model) && opts.ModelPath != "" {
-		model = filepath.Join(opts.ModelPath, model)
-	}
-	if model == "" {
-		return errors.New("voice-detect: ModelFile is required")
-	}
-
-	v.opts = parseOptions(opts.Options)
-	if v.opts.modelName == "" {
-		v.opts.modelName = filepath.Base(model)
-	}
-
-	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
-	// one backend process per model and serves requests concurrently, so the
-	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
-	// VOICEDETECT_THREADS is read by the engine at backend construction, so it
-	// must be set before the capi load. A non-positive Threads means "unset":
-	// leave the env alone so the engine keeps its sane default.
-	threads := opts.Threads
-	if threads > 0 {
-		if err := os.Setenv("VOICEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
-			return fmt.Errorf("voice-detect: set VOICEDETECT_THREADS: %w", err)
-		}
-		xlog.Info("voice-detect: applying LocalAI thread budget", "threads", threads)
-	}
-
-	xlog.Info("voice-detect: loading model", "model", model,
-		"verify_threshold", v.opts.verifyThreshold, "abi", CppAbiVersion())
-
-	ctx := CppLoad(model)
-	if ctx == 0 {
-		// The last-error buffer lives on the ctx that was never returned, so
-		// surface the path the operator tried to load instead.
-		return fmt.Errorf("voice-detect: voicedetect_capi_load failed for %q", model)
-	}
-	v.ctxPtr = ctx
-	return nil
-}
-
-// VoiceEmbed returns the L2-normalized speaker embedding for an audio clip.
-// The request carries a filesystem PATH; the HTTP layer materializes
-// base64/URL/data-URI inputs to a temp file before the gRPC call.
-func (v *VoiceDetect) VoiceEmbed(req *pb.VoiceEmbedRequest) (pb.VoiceEmbedResponse, error) {
-	if v.ctxPtr == 0 {
-		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: model not loaded")
-	}
-	if req.Audio == "" {
-		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: audio path is required")
-	}
-	emb, err := v.embedPath(req.Audio)
-	if err != nil {
-		return pb.VoiceEmbedResponse{}, err
-	}
-	return pb.VoiceEmbedResponse{Embedding: emb, Model: v.opts.modelName}, nil
-}
-
-func (v *VoiceDetect) embedPath(path string) ([]float32, error) {
-	var vec uintptr
-	var dim int32
-	rc := CppEmbedPath(v.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
-	if rc != 0 || vec == 0 || dim <= 0 {
-		return nil, v.lastErr("embed", path)
-	}
-	defer CppFreeVec(vec)
-	// Copy out of the C-owned malloc'd buffer before freeing it. The
-	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
-	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
-	// nor moves this buffer and we copy immediately.
-	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
-	out := make([]float32, int(dim))
-	copy(out, src)
-	return out, nil
-}
-
-// VoiceVerify embeds two clips and reports whether they are the same speaker by
-// cosine distance against a threshold. A request threshold <= 0 falls back to
-// the model-configured default (verify_threshold option, 0.25 if unset).
-func (v *VoiceDetect) VoiceVerify(req *pb.VoiceVerifyRequest) (pb.VoiceVerifyResponse, error) {
-	if v.ctxPtr == 0 {
-		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: model not loaded")
-	}
-	if req.Audio1 == "" || req.Audio2 == "" {
-		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: audio1 and audio2 are required")
-	}
-
-	threshold := req.Threshold
-	if threshold <= 0 {
-		threshold = v.opts.verifyThreshold
-	}
-
-	started := time.Now()
-	var distance float32
-	var verified int32
-	rc := CppVerifyPaths(v.ctxPtr, req.Audio1, req.Audio2, threshold,
-		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
-	if rc != 0 {
-		return pb.VoiceVerifyResponse{}, v.lastErr("verify", req.Audio1+","+req.Audio2)
-	}
-	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
-
-	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
-	// matching the Python speaker-recognition backend's reporting.
-	confidence := float32(0)
-	if threshold > 0 {
-		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
-	}
-
-	return pb.VoiceVerifyResponse{
-		Verified:         verified != 0,
-		Distance:         distance,
-		Threshold:        threshold,
-		Confidence:       confidence,
-		Model:            v.opts.modelName,
-		ProcessingTimeMs: elapsedMs,
-	}, nil
-}
-
-// VoiceAnalyze runs the age/gender/emotion heads on a single clip. The C-API
-// always evaluates every supported head, so the request's actions filter is
-// advisory and the full analysis is returned as a single segment (the engine
-// does not produce time-bounded segments).
-func (v *VoiceDetect) VoiceAnalyze(req *pb.VoiceAnalyzeRequest) (pb.VoiceAnalyzeResponse, error) {
-	if v.ctxPtr == 0 {
-		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: model not loaded")
-	}
-	if req.Audio == "" {
-		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: audio path is required")
-	}
-
-	ptr := CppAnalyzeJSON(v.ctxPtr, req.Audio)
-	if ptr == 0 {
-		return pb.VoiceAnalyzeResponse{}, v.lastErr("analyze", req.Audio)
-	}
-	defer CppFreeString(ptr)
-
-	seg, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
-	if err != nil {
-		return pb.VoiceAnalyzeResponse{}, fmt.Errorf("voice-detect: analyze JSON for %q: %w", req.Audio, err)
-	}
-	return pb.VoiceAnalyzeResponse{Segments: []*pb.VoiceAnalysis{seg}}, nil
-}
-
-// analyzeJSON mirrors the document returned by voicedetect_capi_analyze_path_json:
-//
-//	{"age":42.0,
-//	 "gender":{"label":"female","female":0.88,"male":0.12},
-//	 "emotion":{"label":"neutral","scores":{"neutral":0.7, ...}}}
-//
-// gender is a mixed object (a "label" string plus per-class float scores), so
-// it is decoded into raw messages and split in parseAnalyzeJSON.
-type analyzeJSON struct {
-	Age     float32                    `json:"age"`
-	Gender  map[string]json.RawMessage `json:"gender"`
-	Emotion struct {
-		Label  string             `json:"label"`
-		Scores map[string]float32 `json:"scores"`
-	} `json:"emotion"`
-}
-
-// parseAnalyzeJSON maps the engine's analyze document onto a VoiceAnalysis.
-// start/end stay 0: the model emits a single whole-utterance result, not
-// time-bounded segments.
-func parseAnalyzeJSON(doc string) (*pb.VoiceAnalysis, error) {
-	var a analyzeJSON
-	if err := json.Unmarshal([]byte(doc), &a); err != nil {
-		return nil, err
-	}
-
-	seg := &pb.VoiceAnalysis{
-		Age:             a.Age,
-		DominantEmotion: a.Emotion.Label,
-		Emotion:         a.Emotion.Scores,
-	}
-
-	if len(a.Gender) > 0 {
-		gender := make(map[string]float32, len(a.Gender))
-		for k, raw := range a.Gender {
-			if k == "label" {
-				_ = json.Unmarshal(raw, &seg.DominantGender)
-				continue
-			}
-			var score float32
-			if err := json.Unmarshal(raw, &score); err == nil {
-				gender[k] = score
-			}
-		}
-		seg.Gender = gender
-	}
-
-	return seg, nil
-}
-
-// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
-func (v *VoiceDetect) lastErr(op, subject string) error {
-	msg := strings.TrimSpace(CppLastError(v.ctxPtr))
-	if msg == "" {
-		msg = "no error detail"
-	}
-	return fmt.Errorf("voice-detect: %s failed for %q: %s", op, subject, msg)
-}
-
-// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
-// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
-//
-// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
-// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
-// moves the buffer and we dereference it immediately to copy the bytes out.
-func goStringFromCPtr(cptr uintptr) string {
-	if cptr == 0 {
-		return ""
-	}
-	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
-	n := 0
-	for *(*byte)(unsafe.Add(p, n)) != 0 {
-		n++
-	}
-	return string(unsafe.Slice((*byte)(p), n))
-}
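The confidence mapping in the removed VoiceVerify can be read in isolation as a clamped linear function of cosine distance. The sketch below is a minimal standalone restatement of that arithmetic, not the backend's actual helper: the function name `confidenceFromDistance` is hypothetical, and the sample distances are illustrative values rather than measurements.

```go
package main

import (
	"fmt"
	"math"
)

// confidenceFromDistance is a hypothetical helper name. It restates the linear
// mapping used by the removed VoiceVerify: 100 at distance 0, falling to 0 at
// the threshold, clamped to [0, 100], and 0 when no positive threshold exists.
func confidenceFromDistance(distance, threshold float32) float32 {
	if threshold <= 0 {
		return 0
	}
	pct := (1.0 - float64(distance)/float64(threshold)) * 100.0
	return float32(math.Max(0, math.Min(100, pct)))
}

func main() {
	// 0.25 is the default verify_threshold from the removed options.go.
	const threshold = 0.25
	// Illustrative distances only; real values come from the C-API.
	for _, d := range []float32{0, 0.125, 0.25, 0.5} {
		fmt.Printf("distance=%.3f confidence=%.1f\n", d, confidenceFromDistance(d, threshold))
	}
}
```

Because the value is clamped at 0, any pair rejected by the threshold reports a confidence of 0 regardless of how far past the threshold the distance lies.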
--- a/backend/go/voice-detect/govoicedetect_test.go
+++ b/backend/go/voice-detect/govoicedetect_test.go
@@ -1,144 +0,0 @@
-package main
-
-import (
-	"os"
-	"sync"
-	"testing"
-
-	"github.com/ebitengine/purego"
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-func TestVoiceDetect(t *testing.T) {
-	RegisterFailHandler(Fail)
-	RunSpecs(t, "voice-detect Backend Suite")
-}
-
-var (
-	libLoadOnce sync.Once
-	libLoadErr  error
-)
-
-// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
-// bridge without spinning up the gRPC server. Records the error (the smoke
-// specs skip themselves) when libvoicedetect.so is not loadable from cwd
-// (LD_LIBRARY_PATH or a symlink in ./).
-func ensureLibLoaded() error {
-	libLoadOnce.Do(func() {
-		libName := os.Getenv("VOICEDETECT_LIBRARY")
-		if libName == "" {
-			libName = "libvoicedetect.so"
-		}
-		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
-		if err != nil {
-			libLoadErr = err
-			return
-		}
-		purego.RegisterLibFunc(&CppAbiVersion, lib, "voicedetect_capi_abi_version")
-		purego.RegisterLibFunc(&CppLoad, lib, "voicedetect_capi_load")
-		purego.RegisterLibFunc(&CppFree, lib, "voicedetect_capi_free")
-		purego.RegisterLibFunc(&CppLastError, lib, "voicedetect_capi_last_error")
-		purego.RegisterLibFunc(&CppFreeString, lib, "voicedetect_capi_free_string")
-		purego.RegisterLibFunc(&CppFreeVec, lib, "voicedetect_capi_free_vec")
-		purego.RegisterLibFunc(&CppEmbedPath, lib, "voicedetect_capi_embed_path")
-		purego.RegisterLibFunc(&CppEmbedPCM, lib, "voicedetect_capi_embed_pcm")
-		purego.RegisterLibFunc(&CppVerifyPaths, lib, "voicedetect_capi_verify_paths")
-		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "voicedetect_capi_analyze_path_json")
-	})
-	return libLoadErr
-}
-
-var _ = Describe("parseOptions", func() {
-	It("defaults verify_threshold to 0.25", func() {
-		o := parseOptions(nil)
-		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
-		Expect(o.modelName).To(Equal(""))
-	})
-
-	It("parses verify_threshold, threshold alias and model_name", func() {
-		o := parseOptions([]string{"verify_threshold:0.4", "model_name:ecapa", "unknown:x"})
-		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
-		Expect(o.modelName).To(Equal("ecapa"))
-
-		o2 := parseOptions([]string{"threshold:0.3"})
-		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
-	})
-
-	It("ignores non-positive thresholds and keeps the default", func() {
-		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
-		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
-	})
-})
-
-var _ = Describe("parseAnalyzeJSON", func() {
-	It("maps age, gender label+scores and emotion label+scores", func() {
-		doc := `{"age":42.0,
-			"gender":{"label":"female","female":0.88,"male":0.12},
-			"emotion":{"label":"neutral","scores":{"neutral":0.7,"happy":0.2,"sad":0.1}}}`
-		seg, err := parseAnalyzeJSON(doc)
-		Expect(err).ToNot(HaveOccurred())
-		Expect(seg.Age).To(BeNumerically("~", 42.0, 1e-4))
-		Expect(seg.Start).To(Equal(float32(0)))
-		Expect(seg.End).To(Equal(float32(0)))
-
-		Expect(seg.DominantGender).To(Equal("female"))
-		Expect(seg.Gender).To(HaveKeyWithValue("female", BeNumerically("~", 0.88, 1e-4)))
-		Expect(seg.Gender).To(HaveKeyWithValue("male", BeNumerically("~", 0.12, 1e-4)))
-		// The "label" entry is consumed into DominantGender, not the score map.
-		Expect(seg.Gender).ToNot(HaveKey("label"))
-
-		Expect(seg.DominantEmotion).To(Equal("neutral"))
-		Expect(seg.Emotion).To(HaveKeyWithValue("neutral", BeNumerically("~", 0.7, 1e-4)))
-		Expect(seg.Emotion).To(HaveKeyWithValue("happy", BeNumerically("~", 0.2, 1e-4)))
-	})
-
-	It("tolerates a missing gender block", func() {
-		seg, err := parseAnalyzeJSON(`{"age":30.0,"emotion":{"label":"happy","scores":{"happy":1.0}}}`)
-		Expect(err).ToNot(HaveOccurred())
-		Expect(seg.DominantGender).To(Equal(""))
-		Expect(seg.DominantEmotion).To(Equal("happy"))
-	})
-
-	It("returns an error on malformed JSON", func() {
-		_, err := parseAnalyzeJSON(`{not-json`)
-		Expect(err).To(HaveOccurred())
-	})
-})
-
-// The specs below exercise the real C-API end to end. They run only when both a
-// model GGUF and a test WAV are provided, and skip cleanly otherwise so the
-// suite stays green without large assets.
-var _ = Describe("VoiceDetect end-to-end", Ordered, func() {
-	var (
-		v         *VoiceDetect
-		modelPath = os.Getenv("VOICEDETECT_BACKEND_TEST_MODEL")
-		wavPath   = os.Getenv("VOICEDETECT_BACKEND_TEST_WAV")
-	)
-
-	BeforeAll(func() {
-		if modelPath == "" || wavPath == "" {
-			Skip("set VOICEDETECT_BACKEND_TEST_MODEL and VOICEDETECT_BACKEND_TEST_WAV to run the e2e specs")
-		}
-		if err := ensureLibLoaded(); err != nil {
-			Skip("libvoicedetect.so not loadable: " + err.Error())
-		}
-		v = &VoiceDetect{}
-		Expect(v.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
-	})
-
-	It("embeds an audio clip", func() {
-		resp, err := v.VoiceEmbed(&pb.VoiceEmbedRequest{Audio: wavPath})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(resp.Embedding).ToNot(BeEmpty())
-		Expect(resp.Model).ToNot(BeEmpty())
-	})
-
-	It("verifies a clip against itself as the same speaker", func() {
-		resp, err := v.VoiceVerify(&pb.VoiceVerifyRequest{Audio1: wavPath, Audio2: wavPath})
-		Expect(err).ToNot(HaveOccurred())
-		Expect(resp.Verified).To(BeTrue())
-		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
-	})
-})
--- a/backend/go/voice-detect/main.go
+++ b/backend/go/voice-detect/main.go
@@ -1,64 +0,0 @@
-package main
-
-// Started internally by LocalAI - one gRPC server per loaded model.
-//
-// Loads libvoicedetect.so via purego and registers the flat C-API entry points
-// declared in voicedetect_capi.h. The library name can be overridden with
-// VOICEDETECT_LIBRARY (mirrors the PARAKEET_LIBRARY / OMNIVOICE_LIBRARY
-// convention in the sibling backends); the default looks for the .so next to
-// this binary (resolved via LD_LIBRARY_PATH by run.sh).
-import (
-	"flag"
-	"fmt"
-	"os"
-
-	"github.com/ebitengine/purego"
-	grpc "github.com/mudler/LocalAI/pkg/grpc"
-)
-
-var (
-	addr = flag.String("addr", "localhost:50051", "the address to connect to")
-)
-
-type LibFuncs struct {
-	FuncPtr any
-	Name    string
-}
-
-func main() {
-	libName := os.Getenv("VOICEDETECT_LIBRARY")
-	if libName == "" {
-		libName = "libvoicedetect.so"
-	}
-
-	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
-	if err != nil {
-		panic(fmt.Errorf("voice-detect: dlopen %q: %w", libName, err))
-	}
-
-	// Bound 1:1 to voicedetect_capi.h. char*/float* returns are registered as
-	// uintptr so the raw pointer can be freed via the matching capi free fn.
-	libFuncs := []LibFuncs{
-		{&CppAbiVersion, "voicedetect_capi_abi_version"},
-		{&CppLoad, "voicedetect_capi_load"},
-		{&CppFree, "voicedetect_capi_free"},
-		{&CppLastError, "voicedetect_capi_last_error"},
-		{&CppFreeString, "voicedetect_capi_free_string"},
-		{&CppFreeVec, "voicedetect_capi_free_vec"},
-		{&CppEmbedPath, "voicedetect_capi_embed_path"},
-		{&CppEmbedPCM, "voicedetect_capi_embed_pcm"},
-		{&CppVerifyPaths, "voicedetect_capi_verify_paths"},
-		{&CppAnalyzeJSON, "voicedetect_capi_analyze_path_json"},
-	}
-	for _, lf := range libFuncs {
-		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
-	}
-
-	fmt.Fprintf(os.Stderr, "[voice-detect] ABI=%d\n", CppAbiVersion())
-
-	flag.Parse()
-
-	if err := grpc.StartServer(*addr, &VoiceDetect{}); err != nil {
-		panic(err)
-	}
-}
--- a/backend/go/voice-detect/options.go
+++ b/backend/go/voice-detect/options.go
@@ -1,46 +0,0 @@
-package main
-
-import (
-	"strconv"
-	"strings"
-)
-
-// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
-// not set one. Matches the Python speaker-recognition backend's default so the
-// two implementations agree on verdicts out of the box.
-const defaultVerifyThreshold float32 = 0.25
-
-// loadOptions holds the parsed model-level options for voice-detect.
-type loadOptions struct {
-	verifyThreshold float32
-	modelName       string
-}
-
-func splitOption(o string) (key, value string, ok bool) {
-	i := strings.Index(o, ":")
-	if i < 0 {
-		return "", "", false
-	}
-	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
-}
-
-// parseOptions reads the backend "key:value" option slice. Unknown keys are
-// ignored. Defaults: verify_threshold 0.25, model_name derived from the file.
-func parseOptions(opts []string) loadOptions {
-	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
-	for _, oo := range opts {
-		key, value, ok := splitOption(oo)
-		if !ok {
-			continue
-		}
-		switch key {
-		case "verify_threshold", "threshold":
-			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
-				o.verifyThreshold = float32(f)
-			}
-		case "model_name":
-			o.modelName = value
-		}
-	}
-	return o
-}
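The removed splitOption splits each "key:value" option at the first colon only, so a value may itself contain colons. A minimal sketch of the same behaviour using `strings.Cut` (Go 1.18+) follows; `cutOption` is a hypothetical name and the sample option strings are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// cutOption is a hypothetical stand-in for the removed splitOption, written
// with strings.Cut. It splits at the first ':' only, so later colons remain
// part of the value; both halves are trimmed of surrounding whitespace.
func cutOption(o string) (key, value string, ok bool) {
	k, v, found := strings.Cut(o, ":")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	// Illustrative inputs, not taken from a real model config.
	for _, o := range []string{"verify_threshold: 0.4", "model_name:ecapa:v2", "no-separator"} {
		k, v, ok := cutOption(o)
		fmt.Printf("%q -> key=%q value=%q ok=%v\n", o, k, v, ok)
	}
}
```

Entries without a colon report ok=false, which matches how parseOptions skips them.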
--- a/backend/go/voice-detect/package.sh
+++ b/backend/go/voice-detect/package.sh
@@ -1,68 +0,0 @@
-#!/bin/bash
-#
-# Bundle the voice-detect-grpc binary, libvoicedetect.so, the core runtime libs
-# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
-# so the package is self-contained. Mirrors backend/go/parakeet-cpp/package.sh;
-# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
-# is used instead of the host's.
-
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-REPO_ROOT="${CURDIR}/../../.."
-
-mkdir -p "$CURDIR/package/lib"
-
-cp -avf "$CURDIR/voice-detect-grpc" "$CURDIR/package/"
-cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
-
-# libvoicedetect.so + any soname symlinks. purego.Dlopen resolves it via
-# LD_LIBRARY_PATH, which run.sh points at lib/.
-cp -avf "$CURDIR"/libvoicedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
-	echo "ERROR: libvoicedetect.so not found in $CURDIR, run 'make' first" >&2
-	exit 1
-}
-
-# Detect architecture and copy the core runtime libs libvoicedetect.so links
-# against, plus the matching dynamic loader as lib/ld.so.
-if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
-    echo "Detected x86_64 architecture, copying x86_64 libraries..."
-    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
-    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
-    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
-    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
-    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
-elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
-    echo "Detected ARM64 architecture, copying ARM64 libraries..."
-    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
-    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
-    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
-    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
-    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
-elif [ "$(uname -s)" = "Darwin" ]; then
-    echo "Detected Darwin"
-else
-    echo "Error: Could not detect architecture"
-    exit 1
-fi
-
-# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
-# BUILD_TYPE so the backend can reach the GPU without the runtime base image
-# shipping those drivers.
-GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
-if [ -f "$GPU_LIB_SCRIPT" ]; then
-    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
-    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
-    package_gpu_libs
-fi
-
-echo "Packaging completed successfully"
-ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"
--- a/backend/go/voice-detect/run.sh
+++ b/backend/go/voice-detect/run.sh
@@ -1,16 +0,0 @@
-#!/bin/bash
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-
-export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
-
-# If a self-contained ld.so was packaged, route through it so the packaged
-# libc / libstdc++ are used instead of the host's (matches the whisper /
-# parakeet backends' runtime layout).
-if [ -f "$CURDIR/lib/ld.so" ]; then
-	echo "Using lib/ld.so"
-	exec "$CURDIR/lib/ld.so" "$CURDIR/voice-detect-grpc" "$@"
-fi
-
-exec "$CURDIR/voice-detect-grpc" "$@"
--- a/backend/go/voice-detect/test.sh
+++ b/backend/go/voice-detect/test.sh
@@ -1,14 +0,0 @@
-#!/bin/bash
-set -e
-
-CURDIR=$(dirname "$(realpath "$0")")
-cd "$CURDIR"
-
-echo "Running voice-detect backend tests..."
-
-# The pure-Go parsing specs always run. The embed/verify/analyze smoke specs run
-# only when a model + WAV are provided via VOICEDETECT_BACKEND_TEST_MODEL and
-# VOICEDETECT_BACKEND_TEST_WAV; otherwise they auto-skip.
-LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
-
-echo "voice-detect tests completed."
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -209,78 +209,6 @@
     nvidia-cuda-12: "cuda12-ced"
     nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced"
     nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced"
-- &voicedetect
-  name: "voice-detect"
-  alias: "voice-detect"
-  license: mit
-  icon: https://avatars.githubusercontent.com/u/95302084
-  description: |
-    voice-detect speaker recognition and voice analysis.
-    voice-detect.cpp is a C++/ggml engine that produces L2-normalised
-    speaker embeddings (ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker
-    ERes2Net, CAM++) for voice verification and 1:N identification, plus
-    a wav2vec2 age / gender / emotion analysis head. It replaces the
-    Python speaker-recognition backend and is exposed through the Voice*
-    gRPC rpcs and the /v1/voice/* REST endpoints. It runs on CPU, NVIDIA
-    CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-  tags:
-    - voice-recognition
-    - speaker-verification
-    - speaker-embedding
-    - CPU
-    - GPU
-    - CUDA
-    - HIP
-  capabilities:
-    default: "cpu-voice-detect"
-    nvidia: "cuda12-voice-detect"
-    intel: "intel-sycl-f16-voice-detect"
-    metal: "metal-voice-detect"
-    amd: "rocm-voice-detect"
-    vulkan: "vulkan-voice-detect"
-    nvidia-l4t: "nvidia-l4t-arm64-voice-detect"
-    nvidia-cuda-13: "cuda13-voice-detect"
-    nvidia-cuda-12: "cuda12-voice-detect"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect"
-- &facedetect
-  name: "face-detect"
-  alias: "face-detect"
-  license: mit
-  icon: https://avatars.githubusercontent.com/u/95302084
-  description: |
-    face-detect face detection, embedding, verification and analysis.
-    face-detect.cpp is a C++/ggml engine that runs SCRFD / YuNet face
-    detection and ArcFace / SFace 512-d (or 128-d) L2-normalised face
-    embeddings for verification and 1:N identification, plus a landmark /
-    age / gender analysis head. It replaces the Python insightface backend
-    and is exposed through the Embedding, Detect and Face* gRPC rpcs and
-    the /v1/face/* REST endpoints. It runs on CPU, NVIDIA CUDA, AMD
-    ROCm/HIP, Intel SYCL, Vulkan and NVIDIA Jetson (L4T) targets.
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - CPU
-    - GPU
-    - CUDA
-    - HIP
-  capabilities:
-    default: "cpu-face-detect"
-    nvidia: "cuda12-face-detect"
-    intel: "intel-sycl-f16-face-detect"
-    metal: "metal-face-detect"
-    amd: "rocm-face-detect"
-    vulkan: "vulkan-face-detect"
-    nvidia-l4t: "nvidia-l4t-arm64-face-detect"
-    nvidia-cuda-13: "cuda13-face-detect"
-    nvidia-cuda-12: "cuda12-face-detect"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect"
 - &voxtral
   name: "voxtral"
   alias: "voxtral"
@@ -2868,236 +2796,6 @@
   uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced"
   mirrors:
     - localai/localai-backends:master-gpu-nvidia-cuda-13-ced
-## voice-detect
-- !!merge <<: *voicedetect
-  name: "voice-detect-development"
-  capabilities:
-    default: "cpu-voice-detect-development"
-    nvidia: "cuda12-voice-detect-development"
-    intel: "intel-sycl-f16-voice-detect-development"
-    metal: "metal-voice-detect-development"
-    amd: "rocm-voice-detect-development"
-    vulkan: "vulkan-voice-detect-development"
-    nvidia-l4t: "nvidia-l4t-arm64-voice-detect-development"
-    nvidia-cuda-13: "cuda13-voice-detect-development"
-    nvidia-cuda-12: "cuda12-voice-detect-development"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect-development"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect-development"
-- !!merge <<: *voicedetect
-  name: "nvidia-l4t-arm64-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-nvidia-l4t-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "nvidia-l4t-arm64-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-nvidia-l4t-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda13-nvidia-l4t-arm64-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda13-nvidia-l4t-arm64-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "cpu-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-cpu-voice-detect
-- !!merge <<: *voicedetect
-  name: "cpu-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-cpu-voice-detect
-- !!merge <<: *voicedetect
-  name: "metal-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "metal-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda12-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-nvidia-cuda-12-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda12-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-nvidia-cuda-12-voice-detect
-- !!merge <<: *voicedetect
-  name: "rocm-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-rocm-hipblas-voice-detect
-- !!merge <<: *voicedetect
-  name: "rocm-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-rocm-hipblas-voice-detect
-- !!merge <<: *voicedetect
-  name: "intel-sycl-f32-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-intel-sycl-f32-voice-detect
-- !!merge <<: *voicedetect
-  name: "intel-sycl-f32-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-intel-sycl-f32-voice-detect
-- !!merge <<: *voicedetect
-  name: "intel-sycl-f16-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-intel-sycl-f16-voice-detect
-- !!merge <<: *voicedetect
-  name: "intel-sycl-f16-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-intel-sycl-f16-voice-detect
-- !!merge <<: *voicedetect
-  name: "vulkan-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-vulkan-voice-detect
-- !!merge <<: *voicedetect
-  name: "vulkan-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-vulkan-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda13-voice-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voice-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-nvidia-cuda-13-voice-detect
-- !!merge <<: *voicedetect
-  name: "cuda13-voice-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voice-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-nvidia-cuda-13-voice-detect
-## face-detect
-- !!merge <<: *facedetect
-  name: "face-detect-development"
-  capabilities:
-    default: "cpu-face-detect-development"
-    nvidia: "cuda12-face-detect-development"
-    intel: "intel-sycl-f16-face-detect-development"
-    metal: "metal-face-detect-development"
-    amd: "rocm-face-detect-development"
-    vulkan: "vulkan-face-detect-development"
-    nvidia-l4t: "nvidia-l4t-arm64-face-detect-development"
-    nvidia-cuda-13: "cuda13-face-detect-development"
-    nvidia-cuda-12: "cuda12-face-detect-development"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect-development"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect-development"
-- !!merge <<: *facedetect
-  name: "nvidia-l4t-arm64-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-nvidia-l4t-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "nvidia-l4t-arm64-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:master-nvidia-l4t-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "cuda13-nvidia-l4t-arm64-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "cuda13-nvidia-l4t-arm64-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "cpu-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-cpu-face-detect
-- !!merge <<: *facedetect
-  name: "cpu-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-face-detect"
-  mirrors:
-    - localai/localai-backends:master-cpu-face-detect
-- !!merge <<: *facedetect
-  name: "metal-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "metal-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-face-detect"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-face-detect
-- !!merge <<: *facedetect
-  name: "cuda12-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-nvidia-cuda-12-face-detect
-- !!merge <<: *facedetect
-  name: "cuda12-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-nvidia-cuda-12-face-detect
-- !!merge <<: *facedetect
-  name: "rocm-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-rocm-hipblas-face-detect
-- !!merge <<: *facedetect
-  name: "rocm-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-rocm-hipblas-face-detect
-- !!merge <<: *facedetect
-  name: "intel-sycl-f32-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-intel-sycl-f32-face-detect
-- !!merge <<: *facedetect
-  name: "intel-sycl-f32-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-intel-sycl-f32-face-detect
-- !!merge <<: *facedetect
-  name: "intel-sycl-f16-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-intel-sycl-f16-face-detect
-- !!merge <<: *facedetect
-  name: "intel-sycl-f16-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-intel-sycl-f16-face-detect
-- !!merge <<: *facedetect
-  name: "vulkan-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-vulkan-face-detect
-- !!merge <<: *facedetect
-  name: "vulkan-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-vulkan-face-detect
-- !!merge <<: *facedetect
-  name: "cuda13-face-detect"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-face-detect"
-  mirrors:
-    - localai/localai-backends:latest-gpu-nvidia-cuda-13-face-detect
-- !!merge <<: *facedetect
-  name: "cuda13-face-detect-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-face-detect"
-  mirrors:
-    - localai/localai-backends:master-gpu-nvidia-cuda-13-face-detect
 ## stablediffusion-ggml
 - !!merge <<: *stablediffusionggml
   name: "cpu-stablediffusion-ggml"
--- a/backend/python/diffusers/requirements-cpu.txt
+++ b/backend/python/diffusers/requirements-cpu.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 torchvision==0.22.1
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,9 +10,15 @@ sentencepiece
 torch==2.7.1
 optimum-quanto
 ftfy
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
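The requirements comment above relies on backend.py importing compel lazily and honouring COMPEL=1 only when that import succeeds. A minimal sketch of that gating pattern follows. The helper name `optional_feature_enabled`, its signature, and the rule that only the literal value "1" enables the feature are assumptions for illustration, not code taken from backend.py.

```python
import importlib
import os


def optional_feature_enabled(env_var, module_name, environ=None):
    """Return (enabled, module) for an opt-in feature backed by an optional package.

    Hypothetical helper: the feature is on only when the env var equals "1"
    AND the backing package can actually be imported.
    """
    environ = os.environ if environ is None else environ
    if environ.get(env_var, "0") != "1":
        return False, None  # not requested, so the import is never attempted
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False, None  # requested but not installed: feature stays off
    return True, module
```

With this shape, leaving the package out of the requirements file only disables the feature; it does not break startup, which is why omitting compel is safe.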
--- a/backend/python/diffusers/requirements-cublas12.txt
+++ b/backend/python/diffusers/requirements-cublas12.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu121
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 torchvision
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,9 +10,15 @@ sentencepiece
 torch
 ftfy
 optimum-quanto
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-cublas13.txt
+++ b/backend/python/diffusers/requirements-cublas13.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 torchvision
 accelerate
 git+https://github.com/xhinker/sd_embed
@@ -10,9 +10,15 @@ sentencepiece
 torch
 ftfy
 optimum-quanto
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-hipblas.txt
+++ b/backend/python/diffusers/requirements-hipblas.txt
@@ -1,17 +1,23 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
 torch==2.10.0+rocm7.0
 torchvision==0.25.0+rocm7.0
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 accelerate
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-intel.txt
+++ b/backend/python/diffusers/requirements-intel.txt
@@ -3,18 +3,24 @@ torch
 torchvision
 optimum[openvino]
 setuptools
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 accelerate
 git+https://github.com/xhinker/sd_embed
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-l4t12.txt
+++ b/backend/python/diffusers/requirements-l4t12.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129/
 torch
-git+https://github.com/huggingface/diffusers
-transformers
+diffusers==0.38.0
+transformers==4.57.6
 accelerate
 peft
 optimum-quanto
@@ -9,9 +9,15 @@ numpy<2
 sentencepiece
 torchvision
 ftfy
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-l4t13.txt
+++ b/backend/python/diffusers/requirements-l4t13.txt
@@ -1,7 +1,7 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
 torch
-git+https://github.com/huggingface/diffusers
-transformers
+diffusers==0.38.0
+transformers==4.57.6
 accelerate
 peft
 optimum-quanto
@@ -10,9 +10,15 @@ sentencepiece
 torchvision
 ftfy
 chardet
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
--- a/backend/python/diffusers/requirements-mps.txt
+++ b/backend/python/diffusers/requirements-mps.txt
@@ -1,16 +1,22 @@
 torch==2.7.1
 torchvision==0.22.1
-git+https://github.com/huggingface/diffusers
+diffusers==0.38.0
 opencv-python
-transformers
+transformers==4.57.6
 accelerate
 peft
 sentencepiece
 optimum-quanto
 ftfy
-# TODO: re-add compel once it supports transformers >= 5.
-# Tracking: https://github.com/damian0815/compel/pull/129
-#           https://github.com/damian0815/compel/issues/128
-# compel currently pins transformers~=4.25, which forced pip into multi-hour
-# resolver backtracking storms in CI. backend.py imports it lazily and gates
-# the COMPEL=1 env var on the import succeeding, so dropping it here is safe.
+# diffusers and transformers are pinned together on purpose. transformers v5
+# restructured CLIPTextModel and dropped the `.text_model` attribute, which
+# breaks single-file Stable Diffusion loading on every released diffusers
+# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking
+# main via git froze whichever broken pair existed at image-build time. Pin the
+# last known-good released pair so builds are reproducible and can't drift into
+# the broken window. See https://github.com/mudler/LocalAI/issues/9979
+#
+# compel is intentionally omitted: it pins transformers~=4.25, which conflicts
+# with this pin and previously forced pip into multi-hour resolver backtracking
+# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on
+# the import succeeding, so dropping it here is safe.
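All eight diffusers requirements files now carry the same exact pin for diffusers and transformers. A sketch of a check that could guard against one file drifting back to an unpinned or git-tracked entry is shown below. The function name `unpinned_pair_entries` and the idea of running it in CI are assumptions; nothing in this change set adds such a check.

```python
import re

# Packages that the requirements comment says must be pinned together.
PAIRED = ("diffusers", "transformers")


def unpinned_pair_entries(requirements_text, packages=PAIRED):
    """Return the names in `packages` that lack an exact `==` pin in the text."""
    pinned = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"([A-Za-z0-9_.-]+)==([^\s;]+)", line)
        if match and match.group(1).lower() in packages:
            pinned.add(match.group(1).lower())
    # Bare names and git+ URLs never match the pattern, so they count as unpinned.
    return [name for name in packages if name not in pinned]
```

Applied to the pre-change content of any of these files, the function would report both packages; applied to the post-change content, it would return an empty list.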
--- a/backend/python/vllm/requirements-cpu.txt
+++ b/backend/python/vllm/requirements-cpu.txt
@@ -1,6 +1,6 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 accelerate
-torch==2.9.1+cpu
+torch==2.12.1+xpu
 torchvision
 torchaudio
 transformers
--- a/core/application/application.go
+++ b/core/application/application.go
@@ -341,11 +341,9 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	}
 	appCfg := a.ApplicationConfig()

-	if cfg.PII.Enabled != nil {
-		enabled = *cfg.PII.Enabled
-	} else {
-		enabled = cfg.PIIIsEnabled() // backend default (cloud-proxy)
-	}
+	// PIIIsEnabled already encodes "explicit pii.enabled wins, else backend
+	// default (cloud-proxy)" — the single source of that rule.
+	enabled = cfg.PIIIsEnabled()
 	if !enabled {
 		return false, nil
 	}
@@ -354,7 +352,7 @@ func (a *Application) ResolvePIIPolicy(cfg *config.ModelConfig) (enabled bool, d
 	if len(detectors) == 0 {
 		detectors = append([]string(nil), appCfg.PIIDefaultDetectors...)
 	}
-	return enabled, detectors
+	return true, detectors // enabled is necessarily true past the !enabled guard
 }

 // PIIPolicyResolver adapts ResolvePIIPolicy to pii.PolicyResolver for
--- a/core/application/distributed.go
+++ b/core/application/distributed.go
@@ -357,6 +357,15 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
 		Pressure:         pressure,
 	})

+	// Wire staging-progress broadcasting so file-staging shows up on every
+	// replica, not just the one performing the transfer. Without this, a
+	// /api/operations poll that round-robins onto a peer sees no staging row and
+	// the progress flickers. The origin publishes; peers mirror via the wildcard.
+	router.StagingTracker().SetPublisher(natsClient)
+	if _, err := router.StagingTracker().SubscribeBroadcasts(natsClient); err != nil {
+		xlog.Warn("Failed to subscribe to staging progress broadcasts", "error", err)
+	}
+
 	// Create ReplicaReconciler for auto-scaling model replicas. Adapter +
 	// RegistrationToken feed the state-reconciliation passes: pending op
 	// drain uses the adapter, and model health probes use the token to auth
--- a/core/config/backend_capabilities.go
+++ b/core/config/backend_capabilities.go
@@ -542,19 +542,6 @@ var BackendCapabilities = map[string]BackendCapability{
 		DefaultUsecases:  []string{UsecaseSpeakerRecognition},
 		Description:      "Speaker recognition — voice identity verification and analysis",
 	},
-	"voice-detect": {
-		GRPCMethods:      []GRPCMethod{MethodVoiceVerify, MethodVoiceEmbed, MethodVoiceAnalyze},
-		PossibleUsecases: []string{UsecaseSpeakerRecognition},
-		DefaultUsecases:  []string{UsecaseSpeakerRecognition},
-		Description:      "voice-detect.cpp: C++/ggml speaker embedding, verification and voice analysis (age/gender/emotion)",
-	},
-	"face-detect": {
-		GRPCMethods:      []GRPCMethod{MethodEmbedding, MethodDetect, MethodFaceVerify, MethodFaceAnalyze},
-		PossibleUsecases: []string{UsecaseEmbeddings, UsecaseDetection, UsecaseFaceRecognition},
-		DefaultUsecases:  []string{UsecaseFaceRecognition},
-		AcceptsImages:    true,
-		Description:      "face-detect.cpp: C++/ggml face detection, embedding, verification and attribute analysis",
-	},
 	"silero-vad": {
 		GRPCMethods:      []GRPCMethod{MethodVAD},
 		PossibleUsecases: []string{UsecaseVAD},
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -385,6 +385,23 @@ func GetNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	}
 }

+// ListAllNodeModelsEndpoint returns all loaded models across all healthy nodes.
+// @Summary List all loaded models cluster-wide
+// @Tags Nodes
+// @Success 200 {array} nodes.NodeModel
+// @Router /api/nodes/models [get]
+func ListAllNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		ctx := c.Request().Context()
+		models, err := registry.ListAllLoadedModels(ctx)
+		if err != nil {
+			xlog.Error("Failed to list all node models", "error", err)
+			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to list node models"))
+		}
+		return c.JSON(http.StatusOK, models)
+	}
+}
+
 // DrainNodeEndpoint sets a node to draining status (no new requests).
 func DrainNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	return func(c echo.Context) error {
--- a/core/http/endpoints/localai/nodes_test.go
+++ b/core/http/endpoints/localai/nodes_test.go
@@ -407,4 +407,44 @@ var _ = Describe("Node HTTP handlers", func() {
 			Expect(names).To(ConsistOf("alpha", "beta"))
 		})
 	})
+
+	Describe("ListAllNodeModelsEndpoint", func() {
+		It("returns an empty list when no models are loaded", func() {
+			e := echo.New()
+			req := httptest.NewRequest(http.MethodGet, "/", nil)
+			rec := httptest.NewRecorder()
+			c := e.NewContext(req, rec)
+
+			handler := ListAllNodeModelsEndpoint(registry)
+			Expect(handler(c)).To(Succeed())
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var list []nodes.NodeModel
+			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
+			Expect(list).To(BeEmpty())
+		})
+
+		It("returns loaded models across healthy nodes", func() {
+			ctx := context.Background()
+			Expect(registry.Register(ctx, &nodes.BackendNode{
+				ID: "n1", Name: "alpha", Address: "10.0.0.1:50051", Status: nodes.StatusHealthy,
+			}, true)).To(Succeed())
+			Expect(registry.SetNodeModel(ctx, "n1", "llama-3.3", 0, "loaded", "10.0.0.1:50051", 0)).To(Succeed())
+
+			e := echo.New()
+			req := httptest.NewRequest(http.MethodGet, "/", nil)
+			rec := httptest.NewRecorder()
+			c := e.NewContext(req, rec)
+
+			handler := ListAllNodeModelsEndpoint(registry)
+			Expect(handler(c)).To(Succeed())
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var list []nodes.NodeModel
+			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
+			Expect(list).To(HaveLen(1))
+			Expect(list[0].ModelName).To(Equal("llama-3.3"))
+			Expect(list[0].NodeID).To(Equal("n1"))
+		})
+	})
 })
--- a/core/http/react-ui/e2e/model-config.spec.js
+++ b/core/http/react-ui/e2e/model-config.spec.js
@@ -288,6 +288,21 @@ test.describe('Model Editor - Interactive Tab', () => {
    await expect(page.locator('input[placeholder^="match,"]')).toBeVisible()
  })

+  test('pattern min_len clamps a directly-typed negative to 0', async ({ page }) => {
+    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
+    await searchInput.fill('Custom Secret Patterns')
+    const dropdown = searchInput.locator('..').locator('..')
+    await dropdown.locator('div', { hasText: 'Custom Secret Patterns' }).first().click()
+
+    await page.locator('button', { hasText: 'Add pattern' }).click()
+    // The number input's min={0} only limits the spinner arrows, not keyboard
+    // entry; the editor must sanitise a typed negative so a meaningless
+    // negative length floor never reaches the saved config.
+    const minLen = page.locator('input[aria-label="Minimum length"]')
+    await minLen.fill('-5')
+    await expect(minLen).toHaveValue('0')
+  })
+
  // Regression: a map-typed field (entity_actions) present in the loaded YAML
  // must render WITH its values. flattenConfig used to recurse into the map,
  // scattering it across pii_detection.entity_actions.<GROUP> paths that match
@@ -329,4 +344,37 @@ test.describe('Model Editor - Interactive Tab', () => {
    await expect(page.getByText(/block —/i).first()).toBeVisible()
  })

+  // A map cannot hold two values for one key, so renaming a row to an existing
+  // group must collapse to a single row (Object.fromEntries, last write wins)
+  // rather than rendering two conflicting rows that silently lose one on save.
+  test('entity_actions collapses a duplicate group to a single row', async ({ page }) => {
+    await page.route('**/api/models/edit/ner-model', (route) => {
+      route.fulfill({
+        contentType: 'application/json',
+        body: JSON.stringify({
+          name: 'ner-model',
+          config: [
+            'name: ner-model',
+            'backend: llama-cpp',
+            'pii_detection:',
+            '    entity_actions:',
+            '        SSN: block',
+            '        EMAIL: mask',
+            '',
+          ].join('\n'),
+        }),
+      })
+    })
+
+    await page.goto('/app/model-editor/ner-model')
+
+    const groupInputs = page.locator('input[aria-label="Entity group"]')
+    await expect(groupInputs).toHaveCount(2)
+
+    // Rename the EMAIL row to duplicate SSN; the editor collapses to one SSN row.
+    await groupInputs.nth(1).fill('SSN')
+    await expect(groupInputs).toHaveCount(1)
+    await expect(groupInputs.nth(0)).toHaveValue('SSN')
+  })
+
 })
--- a/core/http/react-ui/e2e/nodes-detail.spec.js
+++ b/core/http/react-ui/e2e/nodes-detail.spec.js
@@ -0,0 +1,34 @@
+import { test, expect } from './coverage-fixtures.js'
+
+const ID = 'n1'
+async function mockNode(page) {
+  await page.route(`**/api/nodes/${ID}`, r => r.fulfill({ status: 200, contentType: 'application/json',
+    body: JSON.stringify({ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy', total_vram: 24e9, available_vram: 12e9, max_replicas_per_model: 1, labels: { env: 'prod' } }) }))
+  await page.route(`**/api/nodes/${ID}/models`, r => r.fulfill({ status: 200, contentType: 'application/json',
+    body: JSON.stringify([{ node_id: ID, model_name: 'llama-3.3', state: 'loaded', in_flight: 0, replica_index: 0 }]) }))
+  await page.route(`**/api/nodes/${ID}/backends`, r => r.fulfill({ status: 200, contentType: 'application/json',
+    body: JSON.stringify([{ name: 'llama-cpp', is_system: true, installed_at: '2026-06-01T00:00:00Z' }]) }))
+}
+
+test.describe('Node detail page', () => {
+  test('renders sections for a node', async ({ page }) => {
+    await mockNode(page)
+    await page.goto(`/app/nodes/${ID}`)
+    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
+    await expect(page.getByText('alpha')).toBeVisible()
+    await expect(page.getByText('llama-3.3')).toBeVisible()
+    await expect(page.getByText('llama-cpp')).toBeVisible()
+    await expect(page.getByText('env=prod')).toBeVisible()
+  })
+
+  test('is reachable by clicking a roster panel', async ({ page }) => {
+    await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json',
+      body: JSON.stringify([{ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' }]) }))
+    await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
+    await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
+    await mockNode(page)
+    await page.goto('/app/nodes')
+    await page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('alpha').click()
+    await expect(page).toHaveURL(new RegExp(`/app/nodes/${ID}$`))
+  })
+})
--- a/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
+++ b/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
@@ -12,28 +12,37 @@ const NODE_NAME = 'worker-test'
 const BACKEND_NAME = 'cuda12-vllm-development'

 async function mockDistributedNodes(page, { onDelete } = {}) {
+  const nodeRecord = {
+    id: NODE_ID,
+    name: NODE_NAME,
+    node_type: 'backend',
+    address: '10.0.0.1:50051',
+    http_address: '10.0.0.1:8090',
+    status: 'healthy',
+    total_vram: 0,
+    available_vram: 0,
+    total_ram: 8_000_000_000,
+    available_ram: 4_000_000_000,
+    gpu_vendor: '',
+    last_heartbeat: new Date().toISOString(),
+    created_at: new Date().toISOString(),
+    updated_at: new Date().toISOString(),
+  }
+
  await page.route('**/api/nodes', (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
-      body: JSON.stringify([
-        {
-          id: NODE_ID,
-          name: NODE_NAME,
-          node_type: 'backend',
-          address: '10.0.0.1:50051',
-          http_address: '10.0.0.1:8090',
-          status: 'healthy',
-          total_vram: 0,
-          available_vram: 0,
-          total_ram: 8_000_000_000,
-          available_ram: 4_000_000_000,
-          gpu_vendor: '',
-          last_heartbeat: new Date().toISOString(),
-          created_at: new Date().toISOString(),
-          updated_at: new Date().toISOString(),
-        },
-      ]),
+      body: JSON.stringify([nodeRecord]),
+    })
+  })
+
+  // The detail page fetches the single node via nodesApi.get(id).
+  await page.route(`**/api/nodes/${NODE_ID}`, (route) => {
+    route.fulfill({
+      status: 200,
+      contentType: 'application/json',
+      body: JSON.stringify(nodeRecord),
    })
  })

@@ -80,24 +89,18 @@ async function mockDistributedNodes(page, { onDelete } = {}) {
  })
 }

-async function expandNodeAndWaitForBackends(page) {
-  await page.goto('/app/nodes')
-  // Click the row to expand it. The chevron toggle and the row both work,
-  // but clicking the name cell is the most user-like.
-  await page.getByText(NODE_NAME).first().click()
-  // Backends, Capacity and Labels live behind a "Manage" <details>
-  // disclosure (the drawer was distilled to keep at-a-glance content
-  // lean — see distill refactor in the multi-replica branch). Open it
-  // by clicking the summary inside the .node-manage scope so the
-  // per-node backend table is in the DOM before assertions run.
-  await page.locator('.node-manage > summary').first().click()
+async function openNodeDetail(page) {
+  // The per-node backend table now lives on the deep-linkable detail page
+  // at /app/nodes/:id (the old expand-row + "Manage" disclosure was removed
+  // when the roster was restructured). Navigate straight there.
+  await page.goto(`/app/nodes/${NODE_ID}`)
  await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
 }

 test.describe('Nodes page — per-node backend actions', () => {
  test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
    await mockDistributedNodes(page)
-    await expandNodeAndWaitForBackends(page)
+    await openNodeDetail(page)

    // Negative: the old, ambiguous wording must not be used.
    await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
@@ -114,7 +117,7 @@ test.describe('Nodes page — per-node backend actions', () => {

  test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
    await mockDistributedNodes(page)
-    await expandNodeAndWaitForBackends(page)
+    await openNodeDetail(page)

    const deleteBtn = page.locator('button[title="Delete backend from this node"]')
    await expect(deleteBtn).toBeVisible()
@@ -128,7 +131,7 @@ test.describe('Nodes page — per-node backend actions', () => {
        postedBody = route.request().postDataJSON()
      },
    })
-    await expandNodeAndWaitForBackends(page)
+    await openNodeDetail(page)

    await page.locator('button[title="Delete backend from this node"]').click()

@@ -150,7 +153,7 @@ test.describe('Nodes page — per-node backend actions', () => {
        deleteCalls += 1
      },
    })
-    await expandNodeAndWaitForBackends(page)
+    await openNodeDetail(page)

    await page.locator('button[title="Delete backend from this node"]').click()

--- a/core/http/react-ui/e2e/nodes-roster.spec.js
+++ b/core/http/react-ui/e2e/nodes-roster.spec.js
@@ -0,0 +1,47 @@
+import { test, expect } from './coverage-fixtures.js'
+
+async function mockCluster(page, nodes) {
+  await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify(nodes) }))
+  await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
+  await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
+}
+
+test.describe('Nodes roster header', () => {
+  test('shows a cluster pulse line and no stat-card grid', async ({ page }) => {
+    await mockCluster(page, [
+      { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' },
+      { id: 'n2', name: 'beta', node_type: 'backend', address: '10.0.0.2:50051', status: 'draining' },
+    ])
+    await page.goto('/app/nodes')
+    await expect(page.locator('.cluster-pulse')).toBeVisible({ timeout: 15_000 })
+    await expect(page.locator('.cluster-pulse')).toContainText('2 nodes')
+    await expect(page.locator('.stat-grid')).toHaveCount(0)
+  })
+
+  test('shows an approval callout for pending nodes', async ({ page }) => {
+    await mockCluster(page, [{ id: 'n3', name: 'gamma', node_type: 'backend', address: '10.0.0.3:50051', status: 'pending' }])
+    await page.goto('/app/nodes')
+    await expect(page.locator('.attention-callout')).toContainText('approval', { timeout: 15_000 })
+  })
+})
+
+test.describe('Nodes roster panels', () => {
+  test('shows model chips without clicking and filters by type', async ({ page }) => {
+    await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([
+      { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' },
+      { id: 'a1', name: 'agent-1', node_type: 'agent', address: '10.0.0.9:50051', status: 'healthy' },
+    ]) }))
+    await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([
+      { node_id: 'n1', model_name: 'llama-3.3', state: 'loaded', in_flight: 2, replica_index: 0 },
+    ]) }))
+    await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' }))
+
+    await page.goto('/app/nodes')
+    // model chip visible without any expand click
+    await expect(page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('llama-3.3')).toBeVisible({ timeout: 15_000 })
+    // segmented filter: Agent shows the agent node, hides the backend node
+    await page.getByRole('radio', { name: /Agent/ }).click()
+    await expect(page.getByText('agent-1')).toBeVisible()
+    await expect(page.getByText('alpha')).toHaveCount(0)
+  })
+})
--- a/core/http/react-ui/e2e/page-render-smoke.spec.js
+++ b/core/http/react-ui/e2e/page-render-smoke.spec.js
@@ -21,6 +21,7 @@ const PAGES = [
  ['/app/backends', 'Backends'],
  ['/app/settings', 'Settings'],
  ['/app/nodes', 'Nodes'],
+  ['/app/scheduling', 'Scheduling'],
  ['/app/face', 'Face recognition'],
  ['/app/voice', 'Voice recognition'],
  ['/app/fine-tune', 'Fine-tuning'],
--- a/core/http/react-ui/e2e/scheduling.spec.js
+++ b/core/http/react-ui/e2e/scheduling.spec.js
@@ -0,0 +1,16 @@
+import { test, expect } from './coverage-fixtures.js'
+
+test.describe('Scheduling page', () => {
+  test('renders at /app/scheduling with rules from the API', async ({ page }) => {
+    await page.route('**/api/nodes/scheduling', (route) => {
+      route.fulfill({
+        status: 200, contentType: 'application/json',
+        body: JSON.stringify([{ model_name: 'llama-3.3', spread_all: true, min_replicas: 0, max_replicas: 0 }]),
+      })
+    })
+    await page.goto('/app/scheduling')
+    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
+    await expect(page).toHaveURL(/\/app\/scheduling$/)
+    await expect(page.getByText('llama-3.3')).toBeVisible()
+  })
+})
--- a/core/http/react-ui/public/locales/de/admin.json
+++ b/core/http/react-ui/public/locales/de/admin.json
@@ -43,6 +43,10 @@
    "title": "Verteilte Knoten",
    "subtitle": "Backend- und Agenten-Worker-Knoten verwalten"
  },
+  "scheduling": {
+    "title": "Planung",
+    "subtitle": "Modellplatzierung und Replikat-Regeln im gesamten Cluster"
+  },
  "p2p": {
    "title": "Verteilte KI-Berechnung",
    "subtitle": "Skalieren Sie Ihre KI-Workloads über mehrere Geräte mit Peer-to-Peer-Verteilung"
--- a/core/http/react-ui/public/locales/de/nav.json
+++ b/core/http/react-ui/public/locales/de/nav.json
@@ -50,6 +50,7 @@
    "backends": "Backends",
    "traces": "Traces",
    "nodes": "Knoten",
+    "scheduling": "Planung",
    "swarm": "Swarm",
    "system": "System",
    "settings": "Einstellungen",
--- a/core/http/react-ui/public/locales/en/admin.json
+++ b/core/http/react-ui/public/locales/en/admin.json
@@ -43,6 +43,10 @@
    "title": "Distributed Nodes",
    "subtitle": "Manage backend and agent worker nodes"
  },
+  "scheduling": {
+    "title": "Scheduling",
+    "subtitle": "Model placement and replica rules across the cluster"
+  },
  "p2p": {
    "title": "Distributed AI Computing",
    "subtitle": "Scale your AI workloads across multiple devices with peer-to-peer distribution"
--- a/core/http/react-ui/public/locales/en/nav.json
+++ b/core/http/react-ui/public/locales/en/nav.json
@@ -51,6 +51,7 @@
    "backends": "Backends",
    "traces": "Traces",
    "nodes": "Nodes",
+    "scheduling": "Scheduling",
    "swarm": "Swarm",
    "system": "System",
    "settings": "Settings",
--- a/core/http/react-ui/public/locales/es/admin.json
+++ b/core/http/react-ui/public/locales/es/admin.json
@@ -43,6 +43,10 @@
    "title": "Nodos distribuidos",
    "subtitle": "Administra nodos worker de backends y agentes"
  },
+  "scheduling": {
+    "title": "Planificación",
+    "subtitle": "Reglas de ubicación de modelos y réplicas en el clúster"
+  },
  "p2p": {
    "title": "Computación de IA distribuida",
    "subtitle": "Escala tus cargas de trabajo de IA en múltiples dispositivos con distribución peer-to-peer"
--- a/core/http/react-ui/public/locales/es/nav.json
+++ b/core/http/react-ui/public/locales/es/nav.json
@@ -50,6 +50,7 @@
    "backends": "Backends",
    "traces": "Trazas",
    "nodes": "Nodos",
+    "scheduling": "Planificación",
    "swarm": "Swarm",
    "system": "Sistema",
    "settings": "Configuración",
--- a/core/http/react-ui/public/locales/id/admin.json
+++ b/core/http/react-ui/public/locales/id/admin.json
@@ -43,6 +43,10 @@
    "title": "Node Terdistribusi",
    "subtitle": "Kelola node backend dan node worker"
  },
+  "scheduling": {
+    "title": "Penjadwalan",
+    "subtitle": "Aturan penempatan model dan replika di seluruh klaster"
+  },
  "p2p": {
    "title": "Komputasi AI Terdistribusi",
    "subtitle": "Skalakan beban kerja AI Anda ke beberapa perangkat dengan distribusi peer-to-peer"
--- a/core/http/react-ui/public/locales/id/nav.json
+++ b/core/http/react-ui/public/locales/id/nav.json
@@ -51,6 +51,7 @@
    "backends": "Backend",
    "traces": "Trace",
    "nodes": "Node",
+    "scheduling": "Penjadwalan",
    "swarm": "Swarm",
    "system": "Sistem",
    "settings": "Pengaturan",
--- a/core/http/react-ui/public/locales/it/admin.json
+++ b/core/http/react-ui/public/locales/it/admin.json
@@ -43,6 +43,10 @@
    "title": "Nodi distribuiti",
    "subtitle": "Gestisci i nodi worker dei backend e degli agenti"
  },
+  "scheduling": {
+    "title": "Pianificazione",
+    "subtitle": "Regole di posizionamento dei modelli e delle repliche nel cluster"
+  },
  "p2p": {
    "title": "Calcolo AI distribuito",
    "subtitle": "Scala i tuoi carichi di lavoro AI su più dispositivi con la distribuzione peer-to-peer"
--- a/core/http/react-ui/public/locales/it/nav.json
+++ b/core/http/react-ui/public/locales/it/nav.json
@@ -50,6 +50,7 @@
    "backends": "Backend",
    "traces": "Tracce",
    "nodes": "Nodi",
+    "scheduling": "Pianificazione",
    "swarm": "Swarm",
    "system": "Sistema",
    "settings": "Impostazioni",
--- a/core/http/react-ui/public/locales/ko/admin.json
+++ b/core/http/react-ui/public/locales/ko/admin.json
@@ -43,6 +43,10 @@
    "title": "분산 노드",
    "subtitle": "백엔드 및 에이전트 워커 노드를 관리합니다"
  },
+  "scheduling": {
+    "title": "스케줄링",
+    "subtitle": "클러스터 전반의 모델 배치 및 복제본 규칙"
+  },
  "p2p": {
    "title": "분산 AI 컴퓨팅",
    "subtitle": "피어 투 피어 분산으로 여러 기기에 걸쳐 AI 워크로드를 확장합니다"
--- a/core/http/react-ui/public/locales/ko/nav.json
+++ b/core/http/react-ui/public/locales/ko/nav.json
@@ -51,6 +51,7 @@
    "backends": "백엔드",
    "traces": "트레이스",
    "nodes": "노드",
+    "scheduling": "스케줄링",
    "swarm": "Swarm",
    "system": "시스템",
    "settings": "설정",
--- a/core/http/react-ui/public/locales/zh-CN/admin.json
+++ b/core/http/react-ui/public/locales/zh-CN/admin.json
@@ -43,6 +43,10 @@
    "title": "分布式节点",
    "subtitle": "管理后端和智能体工作节点"
  },
+  "scheduling": {
+    "title": "调度",
+    "subtitle": "集群中的模型放置和副本规则"
+  },
  "p2p": {
    "title": "分布式 AI 计算",
    "subtitle": "通过点对点分发将您的 AI 工作负载扩展到多个设备"
--- a/core/http/react-ui/public/locales/zh-CN/nav.json
+++ b/core/http/react-ui/public/locales/zh-CN/nav.json
@@ -50,6 +50,7 @@
    "backends": "后端",
    "traces": "追踪",
    "nodes": "节点",
+    "scheduling": "调度",
    "swarm": "Swarm",
    "system": "系统",
    "settings": "设置",
--- a/core/http/react-ui/src/App.css
+++ b/core/http/react-ui/src/App.css
@@ -8471,3 +8471,56 @@ select.input {
 .status-pill--error   .status-pill__dot { background: var(--color-error); }
 .status-pill--info    .status-pill__dot { background: var(--color-info); }
 .status-pill--muted   .status-pill__dot { background: var(--color-text-muted); }
+
+/* Nodes: cluster pulse + attention callout (replaces the stat-card strip) */
+.cluster-pulse {
+  font-size: var(--text-sm);
+  color: var(--color-text-muted);
+  margin: 0 0 var(--spacing-lg);
+}
+.cluster-pulse__strong { color: var(--color-text-primary); font-weight: 600; }
+
+.attention-callout {
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+  gap: var(--spacing-md);
+  padding: var(--spacing-sm) var(--spacing-md);
+  border-radius: var(--radius-md);
+  margin-bottom: var(--spacing-lg);
+  font-size: var(--text-sm);
+}
+.attention-callout--warn {
+  background: var(--color-warning-light);
+  border: 1px solid var(--color-warning-border);
+  color: var(--color-text-primary);
+}
+.attention-callout--error {
+  background: var(--color-error-light);
+  border: 1px solid var(--color-error-border);
+  color: var(--color-text-primary);
+}
+
+/* Node roster panels (Nodes page) */
+.node-roster { display: flex; flex-direction: column; gap: var(--spacing-sm); }
+.node-panel {
+  background: var(--color-bg-secondary);
+  border: 1px solid var(--color-border-subtle);
+  border-radius: var(--radius-lg);
+}
+.node-panel__main { padding: var(--spacing-md) var(--spacing-lg); cursor: pointer; }
+.node-panel:hover { border-color: var(--color-border); }
+.node-panel__head { display: flex; align-items: flex-start; justify-content: space-between; gap: var(--spacing-md); }
+.node-panel__id { display: flex; align-items: center; gap: var(--spacing-sm); flex-wrap: wrap; }
+.node-panel__name { font-weight: 600; }
+.node-panel__meta { display: flex; gap: var(--spacing-lg); margin-top: var(--spacing-sm); color: var(--color-text-muted); font-size: var(--text-xs); }
+.node-panel__models { display: flex; flex-wrap: wrap; gap: 6px; margin-top: var(--spacing-sm); }
+.model-chip {
+  display: inline-flex; align-items: center; gap: 5px;
+  font-family: var(--font-mono); font-size: 0.6875rem;
+  padding: 2px 8px; border-radius: var(--radius-sm); border: 1px solid;
+}
+.model-chip__dot { width: 6px; height: 6px; border-radius: 50%; }
+.model-chip__state { opacity: 0.85; font-style: normal; }
+.node-filter { margin-bottom: var(--spacing-lg); }
+.node-detail__metrics { display: flex; gap: var(--spacing-xl); margin: var(--spacing-md) 0 var(--spacing-lg); flex-wrap: wrap; }
--- a/core/http/react-ui/src/components/PatternListEditor.jsx
+++ b/core/http/react-ui/src/components/PatternListEditor.jsx
@@ -74,7 +74,18 @@ export default function PatternListEditor({ value, onChange }) {
            min={0}
            value={r.min_len || 0}
            title="Minimum match length (0 = no floor)"
-            onChange={e => update(i, { min_len: parseInt(e.target.value, 10) || 0 })}
+            // min={0} only constrains the spinner, not keyboard entry. Clamp a
+            // typed negative to 0 (a negative floor is meaningless and would
+            // disable the length filter). When we clamp, force the DOM value
+            // too: the resulting 0->0 state change is a no-op, so React's
+            // controlled input would otherwise keep displaying the rejected
+            // "-5" even though the saved value is 0.
+            onChange={e => {
+              const parsed = parseInt(e.target.value, 10)
+              const n = Math.max(0, parsed || 0)
+              if (parsed < 0) e.target.value = String(n)
+              update(i, { min_len: n })
+            }}
            style={{ width: 80, fontSize: '0.8125rem' }}
            aria-label="Minimum length"
          />
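The parse-and-clamp step in the `onChange` handler above can be sketched in isolation. This is a minimal illustration, not part of the diff: `clampMinLen` is a hypothetical helper name, and the React/DOM write-back is left out.

```javascript
// Hypothetical helper mirroring the handler's parse-and-clamp logic.
function clampMinLen(raw) {
  const parsed = parseInt(raw, 10)        // NaN for '' or non-numeric input
  const value = Math.max(0, parsed || 0)  // NaN and negatives both become 0
  // `clamped` is true only for typed negatives; NaN < 0 is false, so an
  // empty field does not trigger the DOM write-back.
  return { value, clamped: parsed < 0 }
}

console.log(clampMinLen('-5')) // value 0, clamped true
console.log(clampMinLen('12')) // value 12, clamped false
console.log(clampMinLen(''))   // value 0, clamped false
```

Only the `clamped` case needs the forced `e.target.value` write, because that is the one case where the stored value stays 0 while the input displays something else.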
--- a/core/http/react-ui/src/components/console/consoleConfig.js
+++ b/core/http/react-ui/src/components/console/consoleConfig.js
@@ -59,6 +59,7 @@ export const operateConsole = {
      titleKey: 'operate.cluster',
      items: [
        { path: '/app/nodes', icon: 'fas fa-network-wired', labelKey: 'items.nodes', adminOnly: true, feature: 'distributed' },
+        { path: '/app/scheduling', icon: 'fas fa-calendar-alt', labelKey: 'items.scheduling', adminOnly: true, feature: 'distributed' },
        { path: '/app/p2p', icon: 'fas fa-circle-nodes', labelKey: 'items.swarm', adminOnly: true },
      ],
    },
--- a/core/http/react-ui/src/components/nodes/AttentionCallout.jsx
+++ b/core/http/react-ui/src/components/nodes/AttentionCallout.jsx
@@ -0,0 +1,31 @@
+export default function AttentionCallout({ nodes, onApprove }) {
+  const pending = nodes.filter(n => n.status === 'pending')
+  const unhealthy = nodes.filter(n => n.status === 'unhealthy' || n.status === 'offline')
+  if (pending.length === 0 && unhealthy.length === 0) return null
+
+  if (pending.length > 0) {
+    const first = pending[0]
+    const extra = pending.length - 1
+    return (
+      <div className="attention-callout attention-callout--warn">
+        <span>
+          <i className="fas fa-exclamation-circle" />{' '}
+          <strong>{pending.length} node{pending.length > 1 ? 's' : ''} awaiting approval</strong>
+          {' - '}{first.name}{extra > 0 ? ` +${extra} more` : ''}
+        </span>
+        <button className="btn btn-primary btn-sm" onClick={() => onApprove(first.id)}>
+          <i className="fas fa-check" /> Approve {first.name}
+        </button>
+      </div>
+    )
+  }
+  return (
+    <div className="attention-callout attention-callout--error">
+      <span>
+        <i className="fas fa-exclamation-triangle" />{' '}
+        <strong>{unhealthy.length} node{unhealthy.length > 1 ? 's' : ''} unhealthy</strong>
+        {' - '}{unhealthy.map(n => n.name).slice(0, 3).join(', ')}
+      </span>
+    </div>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/CapacityEditor.jsx
+++ b/core/http/react-ui/src/components/nodes/CapacityEditor.jsx
@@ -0,0 +1,196 @@
+import { useState, useEffect, useCallback } from 'react'
+import { nodesApi } from '../../utils/api'
+import LoadingSpinner from '../LoadingSpinner'
+
+/**
+ * Inline editor for a node's per-model replica capacity.
+ *
+ * UX intent: discoverable affordance (pencil icon) that opens an inline
+ * input - never a modal for a single field. The source-of-truth note is
+ * shown inline so operators understand that a value set here persists
+ * across worker restarts until Reset; hiding this in a tooltip would bury
+ * too important a caveat.
+ *
+ * `confirmShrink` is a hook the parent provides so the page can render its
+ * own confirm dialog (it has access to all nodes and can phrase the message
+ * with full context).
+ */
+export default function CapacityEditor({ node, loadedModelCounts, onUpdate, confirmShrink, addToast }) {
+  const current = node.max_replicas_per_model || 1
+  const isOverride = !!node.max_replicas_per_model_manually_set
+  const [editing, setEditing] = useState(false)
+  const [draft, setDraft] = useState(String(current))
+  const [saving, setSaving] = useState(false)
+  const [resetting, setResetting] = useState(false)
+
+  // Reset draft when current value changes (server response, etc.)
+  useEffect(() => {
+    if (!editing) setDraft(String(current))
+  }, [current, editing])
+
+  const cancel = useCallback(() => {
+    setEditing(false)
+    setDraft(String(current))
+  }, [current])
+
+  const save = useCallback(async () => {
+    const value = parseInt(draft, 10)
+    if (!Number.isFinite(value) || value < 1) {
+      addToast('Replica capacity must be 1 or higher', 'error')
+      return
+    }
+    if (value === current) {
+      setEditing(false)
+      return
+    }
+    // Reducing the cap below current loaded replicas: confirm so the operator
+    // sees the consequence (running replicas keep going until idle eviction).
+    const maxLoadedAcrossModels = Math.max(0, ...Object.values(loadedModelCounts || {}))
+    if (value < maxLoadedAcrossModels) {
+      const proceed = await confirmShrink({ node, newValue: value, currentLoaded: maxLoadedAcrossModels })
+      if (!proceed) return
+    }
+    setSaving(true)
+    try {
+      await nodesApi.updateMaxReplicasPerModel(node.id, value)
+      addToast(`Replica capacity set to ${value} on ${node.name}`, 'success')
+      setEditing(false)
+      onUpdate?.(value)
+    } catch (err) {
+      addToast(`Could not change replica capacity: ${err.message || err}`, 'error')
+    } finally {
+      setSaving(false)
+    }
+  }, [draft, current, node, loadedModelCounts, confirmShrink, onUpdate, addToast])
+
+  const onKeyDown = (e) => {
+    if (e.key === 'Enter') { e.preventDefault(); save() }
+    else if (e.key === 'Escape') { e.preventDefault(); cancel() }
+  }
+
+  const reset = useCallback(async () => {
+    setResetting(true)
+    try {
+      await nodesApi.resetMaxReplicasPerModel(node.id)
+      addToast(`Override cleared on ${node.name}; worker flag will apply on next re-registration`, 'success')
+      onUpdate?.(null)
+    } catch (err) {
+      addToast(`Could not reset override: ${err.message || err}`, 'error')
+    } finally {
+      setResetting(false)
+    }
+  }, [node, onUpdate, addToast])
+
+  return (
+    <div style={{
+      display: 'flex', alignItems: 'flex-start', gap: 'var(--spacing-md)',
+    }}>
+      <i className="fas fa-layer-group" style={{ color: 'var(--color-text-muted)', marginTop: 3 }} aria-hidden="true" />
+      <div style={{ flex: 1, minWidth: 0 }}>
+        <div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
+          <label
+            htmlFor={`capacity-${node.id}`}
+            style={{ fontSize: '0.8125rem', fontWeight: 600, color: 'var(--color-text-primary)' }}
+          >
+            Max replicas per model
+          </label>
+          {editing ? (
+            <>
+              <input
+                id={`capacity-${node.id}`}
+                type="number"
+                min={1}
+                value={draft}
+                disabled={saving}
+                onChange={(e) => setDraft(e.target.value)}
+                onKeyDown={onKeyDown}
+                autoFocus
+                aria-describedby={`capacity-hint-${node.id}`}
+                style={{
+                  width: 72, padding: '4px 8px', borderRadius: 'var(--radius-sm)',
+                  border: '1px solid var(--color-border)', background: 'var(--color-bg-primary)',
+                  fontFamily: 'var(--font-mono)', fontSize: '0.8125rem',
+                  color: 'var(--color-text-primary)',
+                }}
+              />
+              <button
+                className="btn btn-primary btn-sm"
+                onClick={save}
+                disabled={saving}
+                style={{ minHeight: 32 }}
+                aria-label="Save replica capacity"
+              >
+                {saving ? <LoadingSpinner size="xs" /> : <><i className="fas fa-check" /> Save</>}
+              </button>
+              <button
+                className="btn btn-secondary btn-sm"
+                onClick={cancel}
+                disabled={saving}
+                style={{ minHeight: 32 }}
+                aria-label="Cancel"
+              >
+                Cancel
+              </button>
+            </>
+          ) : (
+            <>
+              <span
+                className="cell-mono"
+                style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}
+              >
+                {current}
+              </span>
+              {isOverride && (
+                <span
+                  title="This value was set from the UI. It will persist across worker restarts until you click Reset."
+                  style={{
+                    display: 'inline-block', fontSize: '0.6875rem', padding: '1px 6px',
+                    borderRadius: 'var(--radius-sm)', fontWeight: 500,
+                    background: 'var(--color-bg-primary)',
+                    border: '1px solid var(--color-warning, #d97706)',
+                    color: 'var(--color-warning, #d97706)',
+                  }}
+                >
+                  override
+                </span>
+              )}
+              <button
+                onClick={() => setEditing(true)}
+                aria-label={`Edit replica capacity (currently ${current})`}
+                title="Change replica capacity for this node"
+                style={{
+                  display: 'inline-flex', alignItems: 'center', justifyContent: 'center',
+                  minWidth: 32, minHeight: 32, padding: 4, borderRadius: 'var(--radius-sm)',
+                  border: '1px solid var(--color-border-subtle)',
+                  background: 'transparent', color: 'var(--color-text-muted)', cursor: 'pointer',
+                }}
+              >
+                <i className="fas fa-pencil-alt" />
+              </button>
+              {isOverride && (
+                <button
+                  onClick={reset}
+                  disabled={resetting}
+                  aria-label="Clear admin override and let the worker flag apply"
+                  title="Clear override; the worker's --max-replicas-per-model flag will apply on the next re-registration"
+                  className="btn btn-secondary btn-sm"
+                  style={{ minHeight: 32 }}
+                >
+                  {resetting ? <LoadingSpinner size="xs" /> : <><i className="fas fa-undo" /> Reset</>}
+                </button>
+              )}
+            </>
+          )}
+        </div>
+        <div
+          id={`capacity-hint-${node.id}`}
+          style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 4, lineHeight: 1.4 }}
+        >
+          {isOverride
+            ? <>Set from here. <strong>Reset</strong> to use the worker's default.</>
+            : <>Saved values stick across worker restarts.</>}
+        </div>
+      </div>
+    </div>
+  )
+}
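The `confirmShrink` contract used by `CapacityEditor` (await a promise, proceed only on a truthy answer) can be sketched without React. This is a minimal sketch under assumptions: `createConfirmGate` and `maxLoaded` are hypothetical names, the model names are made up, and a plain closure stands in for the page's dialog state.

```javascript
// Hypothetical stand-in for the page's dialog state (the diff uses React state).
function createConfirmGate() {
  let pending = null
  return {
    // Called by the editor: the promise resolves once the operator answers.
    ask(ctx) {
      return new Promise((resolve) => { pending = { ...ctx, resolve } })
    },
    // Called by the dialog buttons: settle the promise and close the dialog.
    answer(ok) {
      if (!pending) return
      pending.resolve(ok)
      pending = null
    },
    current: () => pending,
  }
}

// Same threshold the editor computes before asking for confirmation.
function maxLoaded(loadedModelCounts) {
  return Math.max(0, ...Object.values(loadedModelCounts || {}))
}

const gate = createConfirmGate()
const currentLoaded = maxLoaded({ 'llama-3.3': 3, 'other-model': 1 })
const newValue = 2
let answerPromise = null
if (newValue < currentLoaded) {
  answerPromise = gate.ask({ newValue, currentLoaded })
  answerPromise.then((ok) => console.log(ok ? 'proceed with save' : 'save cancelled'))
}
gate.answer(true) // operator confirms
```

Keeping the `resolve` callback alongside the dialog context lets the page own the dialog wording while the editor stays a simple `await`.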
--- a/core/http/react-ui/src/components/nodes/ClusterPulse.jsx
+++ b/core/http/react-ui/src/components/nodes/ClusterPulse.jsx
@@ -0,0 +1,18 @@
+import { formatVRAM } from './nodeStatus'
+
+export default function ClusterPulse({ nodes }) {
+  const total = nodes.length
+  const healthy = nodes.filter(n => n.status === 'healthy').length
+  const draining = nodes.filter(n => n.status === 'draining').length
+  const usedVRAM = nodes.reduce((s, n) =>
+    (n.total_vram && n.available_vram != null) ? s + (n.total_vram - n.available_vram) : s, 0)
+  const vramStr = formatVRAM(usedVRAM)
+  return (
+    <p className="cluster-pulse">
+      <span className="cluster-pulse__strong">{total} {total === 1 ? 'node' : 'nodes'}</span>
+      {' · '}<span style={{ color: 'var(--color-success)' }}>{healthy} healthy</span>
+      {draining > 0 && <>{' · '}<span style={{ color: 'var(--color-warning)' }}>{draining} draining</span></>}
+      {vramStr && <>{' · '}{vramStr} VRAM in use</>}
+    </p>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/KeyValueChips.jsx
+++ b/core/http/react-ui/src/components/nodes/KeyValueChips.jsx
@@ -0,0 +1,98 @@
+import { useState } from 'react'
+
+/**
+ * Controlled chip-builder for { key: value } maps. Replaces the prior
+ * comma-separated-string Node Selector input AND the bespoke Labels editor
+ * in the node drawer - both were rendering the same chip pattern with
+ * subtly different markup.
+ *
+ * Fully controlled: parent owns the map and decides what onAdd/onRemove
+ * does (form state for the scheduling form; API calls for the live
+ * labels editor). The component just renders chips and a key/value input
+ * row.
+ *
+ * Props:
+ *   pairs       - current map of key -> value
+ *   onAdd(k,v)  - called when the user adds a pair (parent handles dedup
+ *                 and persistence side effects)
+ *   onRemove(k) - called when a chip's × is clicked
+ *   placeholderKey, placeholderValue - input hints
+ *   ariaLabel   - accessible name for the section
+ */
+export default function KeyValueChips({ pairs, onAdd, onRemove, placeholderKey = 'key', placeholderValue = 'value', ariaLabel }) {
+  const [k, setK] = useState('')
+  const [v, setV] = useState('')
+
+  const add = () => {
+    const key = k.trim()
+    if (!key) return
+    onAdd(key, v.trim())
+    setK(''); setV('')
+  }
+  const onKeyDown = (e) => {
+    if (e.key === 'Enter') { e.preventDefault(); add() }
+  }
+
+  const entries = pairs ? Object.entries(pairs) : []
+  return (
+    <div role="group" aria-label={ariaLabel}>
+      {entries.length > 0 && (
+        <div style={{ display: 'flex', flexWrap: 'wrap', gap: 4, marginBottom: 'var(--spacing-xs)' }}>
+          {entries.map(([key, val]) => (
+            <span key={key} style={{
+              display: 'inline-flex', alignItems: 'center', gap: 4,
+              fontSize: '0.75rem', padding: '2px 8px',
+              borderRadius: 'var(--radius-sm)',
+              background: 'var(--color-bg-tertiary)',
+              border: '1px solid var(--color-border-subtle)',
+              fontFamily: 'var(--font-mono)',
+            }}>
+              {key}={val}
+              <button
+                type="button"
+                onClick={(e) => { e.stopPropagation(); onRemove(key) }}
+                aria-label={`Remove ${key}`}
+                title="Remove"
+                style={{
+                  background: 'none', border: 'none', cursor: 'pointer',
+                  color: 'var(--color-text-muted)', fontSize: '0.625rem', padding: 0,
+                }}
+              >
+                <i className="fas fa-times" />
+              </button>
+            </span>
+          ))}
+        </div>
+      )}
+      <div style={{ display: 'flex', gap: 'var(--spacing-xs)', alignItems: 'stretch' }}>
+        <input
+          className="input"
+          type="text"
+          placeholder={placeholderKey}
+          value={k}
+          onChange={e => setK(e.target.value)}
+          onKeyDown={onKeyDown}
+          style={{ flex: 1 }}
+        />
+        <input
+          className="input"
+          type="text"
+          placeholder={placeholderValue}
+          value={v}
+          onChange={e => setV(e.target.value)}
+          onKeyDown={onKeyDown}
+          style={{ flex: 1 }}
+        />
+        <button
+          type="button"
+          className="btn btn-secondary btn-sm"
+          onClick={add}
+          disabled={!k.trim()}
+          style={{ minHeight: 36 }}
+        >
+          <i className="fas fa-plus" /> Add
+        </button>
+      </div>
+    </div>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/ModelChip.jsx
+++ b/core/http/react-ui/src/components/nodes/ModelChip.jsx
@@ -0,0 +1,12 @@
+import { modelStateConfig } from './nodeStatus'
+
+export default function ModelChip({ model }) {
+  const cfg = modelStateConfig[model.state] || modelStateConfig.idle
+  return (
+    <span className="model-chip" style={{ background: cfg.bg, color: cfg.color, borderColor: cfg.border }}>
+      <span className="model-chip__dot" style={{ background: cfg.color }} />
+      {model.model_name}
+      {model.state !== 'loaded' && <span className="model-chip__state"> {model.state}</span>}
+    </span>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/NodePanel.jsx
+++ b/core/http/react-ui/src/components/nodes/NodePanel.jsx
@@ -0,0 +1,60 @@
+import { useNavigate } from 'react-router-dom'
+import StatusPill from './StatusPill'
+import ModelChip from './ModelChip'
+import ActionMenu from '../ActionMenu'
+import { formatVRAM } from './nodeStatus'
+
+export default function NodePanel({ node, models = [], onApprove, onDrain, onResume, onRemove }) {
+  const navigate = useNavigate()
+  const isAgent = node.node_type === 'agent'
+  const open = () => navigate(`/app/nodes/${node.id}`)
+  const usedVRAM = node.total_vram && node.available_vram != null ? node.total_vram - node.available_vram : null
+
+  return (
+    <div className="node-panel">
+      <div className="node-panel__main" onClick={open} role="button" tabIndex={0}
+        onKeyDown={(e) => { if (e.target === e.currentTarget && (e.key === 'Enter' || e.key === ' ')) { e.preventDefault(); open() } }}>
+        <div className="node-panel__head">
+          <div className="node-panel__id">
+            <StatusPill status={node.status} />
+            <span className="node-panel__name">{node.name}</span>
+            <span className="cell-mono cell-muted">{node.address}</span>
+          </div>
+          <div className="node-panel__actions" onClick={(e) => e.stopPropagation()}>
+            {node.status === 'pending' && (
+              <button className="btn btn-primary btn-sm" onClick={() => onApprove(node.id)}>
+                <i className="fas fa-check" /> Approve
+              </button>
+            )}
+            <ActionMenu
+              ariaLabel={`Actions for ${node.name}`}
+              triggerLabel={`Actions for ${node.name}`}
+              items={[
+                { key: 'resume', icon: 'fa-play', label: 'Resume', hidden: node.status !== 'draining', onClick: () => onResume(node.id) },
+                { key: 'drain', icon: 'fa-pause', label: 'Drain', hidden: node.status === 'draining' || node.status === 'pending', onClick: () => onDrain(node.id) },
+                { divider: true, hidden: node.status === 'pending' },
+                { key: 'remove', icon: 'fa-trash', label: 'Remove from cluster', danger: true, onClick: () => onRemove(node) },
+              ]}
+            />
+          </div>
+        </div>
+
+        {!isAgent && (
+          <>
+            <div className="node-panel__meta">
+              {node.total_vram > 0 && (
+                <span className="cell-mono">VRAM {formatVRAM(usedVRAM) || '0'} / {formatVRAM(node.total_vram)}</span>
+              )}
+              <span className="cell-mono">{node.in_flight_count || 0} in-flight</span>
+            </div>
+            <div className="node-panel__models">
+              {models.length === 0
+                ? <span className="cell-muted">No models loaded</span>
+                : models.map(m => <ModelChip key={`${m.model_name}-${m.replica_index ?? 0}`} model={m} />)}
+            </div>
+          </>
+        )}
+      </div>
+    </div>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/StatusPill.jsx
+++ b/core/http/react-ui/src/components/nodes/StatusPill.jsx
@@ -0,0 +1,11 @@
+import { statusConfig } from './nodeStatus'
+
+export default function StatusPill({ status }) {
+  const cfg = statusConfig[status] || statusConfig.unhealthy
+  return (
+    <span className="node-status" style={{ color: cfg.color }}>
+      <span className="node-status__dot" style={{ background: cfg.color }} />
+      {cfg.label}
+    </span>
+  )
+}
--- a/core/http/react-ui/src/components/nodes/nodeStatus.js
+++ b/core/http/react-ui/src/components/nodes/nodeStatus.js
@@ -0,0 +1,34 @@
+export const statusConfig = {
+  healthy: { color: 'var(--color-success)', label: 'Healthy' },
+  unhealthy: { color: 'var(--color-error)', label: 'Unhealthy' },
+  offline: { color: 'var(--color-error)', label: 'Offline' },
+  registering: { color: 'var(--color-primary)', label: 'Registering' },
+  draining: { color: 'var(--color-warning)', label: 'Draining' },
+  pending: { color: 'var(--color-warning)', label: 'Pending Approval' },
+}
+
+export const modelStateConfig = {
+  loaded: { bg: 'var(--color-success-light)', color: 'var(--color-success)', border: 'var(--color-success-border)' },
+  loading: { bg: 'var(--color-primary-light)', color: 'var(--color-primary)', border: 'var(--color-primary-border)' },
+  unloading: { bg: 'var(--color-warning-light)', color: 'var(--color-warning)', border: 'var(--color-warning-border)' },
+  idle: { bg: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)', border: 'var(--color-border-subtle)' },
+}
+
+export function formatVRAM(bytes) {
+  if (!bytes) return null
+  const gb = bytes / (1024 * 1024 * 1024)
+  return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
+}
+
+export function timeAgo(dateString) {
+  if (!dateString) return 'never'
+  const seconds = Math.floor((Date.now() - new Date(dateString).getTime()) / 1000)
+  if (seconds < 0) return 'just now'
+  if (seconds < 60) return `${seconds}s ago`
+  const minutes = Math.floor(seconds / 60)
+  if (minutes < 60) return `${minutes}m ago`
+  const hours = Math.floor(minutes / 60)
+  if (hours < 24) return `${hours}h ago`
+  const days = Math.floor(hours / 24)
+  return `${days}d ago`
+}
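A short usage sketch for `formatVRAM`, combined with the used-VRAM reduction from `ClusterPulse`. The function body is copied from the diff so the block runs on its own; the sample node objects and the `GiB` constant are made up for illustration.

```javascript
// Copied from nodeStatus.js above so the example is self-contained.
function formatVRAM(bytes) {
  if (!bytes || bytes === 0) return null
  const gb = bytes / (1024 * 1024 * 1024)
  return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
}

const GiB = 1024 ** 3

// Illustrative nodes: one reporting both fields, one with no GPU,
// and one that has not reported available_vram yet.
const nodes = [
  { total_vram: 24 * GiB, available_vram: 8 * GiB },
  { total_vram: 0, available_vram: 0 },
  { total_vram: 16 * GiB },
]

// Same reduction as ClusterPulse: skip nodes that lack either figure.
const usedVRAM = nodes.reduce((sum, n) =>
  (n.total_vram && n.available_vram != null) ? sum + (n.total_vram - n.available_vram) : sum, 0)

console.log(formatVRAM(usedVRAM))          // "16.0 GB"
console.log(formatVRAM(512 * 1024 * 1024)) // "512 MB"
console.log(formatVRAM(0))                 // null
```

Because `formatVRAM` returns `null` for zero, callers such as `ClusterPulse` can hide the VRAM segment entirely with a simple truthiness check.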
--- a/core/http/react-ui/src/pages/NodeDetail.jsx
+++ b/core/http/react-ui/src/pages/NodeDetail.jsx
@@ -0,0 +1,352 @@
+import { useState, useEffect, useCallback } from 'react'
+import { useParams, useNavigate, useOutletContext } from 'react-router-dom'
+import { nodesApi } from '../utils/api'
+import PageHeader from '../components/PageHeader'
+import LoadingSpinner from '../components/LoadingSpinner'
+import ConfirmDialog from '../components/ConfirmDialog'
+import StatusPill from '../components/nodes/StatusPill'
+import CapacityEditor from '../components/nodes/CapacityEditor'
+import KeyValueChips from '../components/nodes/KeyValueChips'
+import { formatVRAM, modelStateConfig, timeAgo } from '../components/nodes/nodeStatus'
+
+// Deep-linkable node management home. Reached by clicking a roster panel on
+// /app/nodes. Surfaces what's running here plus the management affordances
+// (capacity, backends, labels, drain/resume/remove) that previously lived in
+// the expanded-row "Manage" drawer.
+export default function NodeDetail() {
+  const { id } = useParams()
+  const navigate = useNavigate()
+  const { addToast } = useOutletContext()
+  const [node, setNode] = useState(null)
+  const [models, setModels] = useState([])
+  const [backends, setBackends] = useState([])
+  const [loading, setLoading] = useState(true)
+  const [confirmRemove, setConfirmRemove] = useState(false)
+  const [confirmUnload, setConfirmUnload] = useState(null)
+  const [confirmDeleteBackend, setConfirmDeleteBackend] = useState(null)
+  // Promise-based shrink confirmation: CapacityEditor awaits this hook so the
+  // page owns the dialog (it can phrase the message with full node context).
+  const [confirmShrinkState, setConfirmShrinkState] = useState(null)
+
+  const refresh = useCallback(async () => {
+    try {
+      const n = await nodesApi.get(id)
+      setNode(n)
+      const [m, b] = await Promise.all([nodesApi.getModels(id), nodesApi.getBackends(id)])
+      setModels(Array.isArray(m) ? m : [])
+      setBackends(Array.isArray(b) ? b : [])
+    } catch (err) {
+      addToast(`Failed to load node: ${err.message}`, 'error')
+    } finally {
+      setLoading(false)
+    }
+  }, [id, addToast])
+
+  useEffect(() => { refresh() }, [refresh])
+
+  const confirmShrink = useCallback((ctx) => new Promise((resolve) => {
+    setConfirmShrinkState({ ...ctx, resolve })
+  }), [])
+
+  if (loading) return <div className="page page--wide" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
+  if (!node) return <div className="page page--wide"><PageHeader title="Node not found" /></div>
+
+  const drain = async () => { try { await nodesApi.drain(id); addToast('Node set to draining', 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const resume = async () => { try { await nodesApi.resume(id); addToast('Node resumed', 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const remove = async () => { try { await nodesApi.delete(id); addToast('Node removed', 'success'); navigate('/app/nodes') } catch (e) { addToast(e.message, 'error') } }
+  const unload = async (name) => { try { await nodesApi.unloadModel(id, name); addToast(`Model "${name}" unloaded`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const upgradeBackend = async (name) => { try { await nodesApi.installBackend(id, name); addToast(`Backend "${name}" upgraded`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const deleteBackend = async (name) => { try { await nodesApi.deleteBackend(id, name); addToast(`Backend "${name}" deleted`, 'success'); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const addLabel = async (k, v) => { try { await nodesApi.mergeLabels(id, { [k]: v }); refresh() } catch (e) { addToast(e.message, 'error') } }
+  const delLabel = async (k) => { try { await nodesApi.deleteLabel(id, k); refresh() } catch (e) { addToast(e.message, 'error') } }
+
+  const usedVRAM = node.total_vram && node.available_vram != null ? node.total_vram - node.available_vram : 0
+  // {modelName: replicaCount} of loaded models so the shrink confirm can warn
+  // if the new cap is below the actual count of any single model on this node.
+  const loadedModelCounts = (() => {
+    const counts = {}
+    models.forEach(m => { if (m.state === 'loaded') counts[m.model_name] = (counts[m.model_name] || 0) + 1 })
+    return counts
+  })()
+
+  return (
+    <div className="page page--wide">
+      <PageHeader
+        eyebrow={<a onClick={() => navigate('/app/nodes')} style={{ cursor: 'pointer', color: 'var(--color-primary)' }}><i className="fas fa-arrow-left" style={{ marginRight: 6 }} aria-hidden="true" />Cluster</a>}
+        title={<><StatusPill status={node.status} /> {node.name}</>}
+        supporting={node.address}
+        actions={
+          <>
+            {node.status === 'draining'
+              ? <button className="btn btn-secondary btn-sm" onClick={resume}><i className="fas fa-play" /> Resume</button>
+              : <button className="btn btn-secondary btn-sm" onClick={drain}><i className="fas fa-pause" /> Drain</button>}
+            <button className="btn btn-danger btn-sm" onClick={() => setConfirmRemove(true)}><i className="fas fa-trash" /> Remove</button>
+          </>
+        }
+      />
+
+      {/* Inline metrics row: VRAM / in-flight - no boxes, just labelled values. */}
+      <div className="node-detail__metrics">
+        {node.total_vram > 0 && (
+          <div>
+            <div className="drawer-eyebrow">VRAM</div>
+            <span className="cell-mono">{formatVRAM(usedVRAM) || '0'} / {formatVRAM(node.total_vram)}</span>
+          </div>
+        )}
+        <div>
+          <div className="drawer-eyebrow">In-flight</div>
+          <span className="cell-mono">{node.in_flight_count || 0}</span>
+        </div>
+        {node.node_type !== 'agent' && (
+          <div style={{ minWidth: 0 }}>
+            <div className="drawer-eyebrow">Capacity</div>
+            <CapacityEditor
+              node={node}
+              loadedModelCounts={loadedModelCounts}
+              confirmShrink={confirmShrink}
+              addToast={addToast}
+              onUpdate={() => refresh()}
+            />
+          </div>
+        )}
+      </div>
+
+      {/* Running models */}
+      <div style={{ marginTop: 'var(--spacing-lg)' }}>
+        <div className="drawer-eyebrow">Running models</div>
+        {models.length === 0 ? (
+          <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: '0 0 var(--spacing-md) 0' }}>
+            <i className="fas fa-cube" style={{ marginRight: 6, opacity: 0.6 }} aria-hidden="true" />
+            No models loaded yet - they'll appear here when scheduled to this node.
+          </p>
+        ) : (
+          <table className="table" style={{ margin: 0 }}>
+            <thead>
+              <tr>
+                <th>Model</th>
+                <th>State</th>
+                <th>In-Flight</th>
+                <th style={{ width: 40 }}>Logs</th>
+                <th style={{ textAlign: 'right' }}>Actions</th>
+              </tr>
+            </thead>
+            <tbody>
+              {(() => {
+                // Pre-compute per-model replica counts so the disambiguation
+                // pill only renders when this node actually hosts >1 replica
+                // of the same model. Single-replica deployments stay clean.
+                const replicaCounts = {}
+                models.forEach(m => { replicaCounts[m.model_name] = (replicaCounts[m.model_name] || 0) + 1 })
+                return models.map(m => {
+                  const stCfg = modelStateConfig[m.state] || modelStateConfig.idle
+                  const showReplica = (replicaCounts[m.model_name] || 0) > 1
+                  // Per-replica process key - what the worker stores logs under and what the
+                  // store's GetLines/Subscribe match on for replica-scoped filtering.
+                  const processKey = `${m.model_name}#${m.replica_index ?? 0}`
+                  return (
+                    <tr key={m.id || processKey}>
+                      <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
+                        {m.model_name}
+                        {showReplica && (
+                          <span
+                            className="cell-mono"
+                            aria-label={`replica ${m.replica_index ?? 0}`}
+                            title={`Replica ${m.replica_index ?? 0} on this node`}
+                            style={{
+                              marginLeft: 8, padding: '1px 6px', borderRadius: 'var(--radius-sm)',
+                              background: 'var(--color-bg-tertiary)',
+                              border: '1px solid var(--color-border-subtle)',
+                              fontSize: '0.6875rem', fontWeight: 500,
+                              color: 'var(--color-text-secondary)',
+                            }}
+                          >
+                            rep {m.replica_index ?? 0}
+                          </span>
+                        )}
+                      </td>
+                      <td>
+                        <span style={{
+                          display: 'inline-block', padding: '2px 8px', borderRadius: 'var(--radius-sm)',
+                          fontSize: '0.75rem', fontWeight: 500,
+                          background: stCfg.bg, color: stCfg.color, border: `1px solid ${stCfg.border}`,
+                        }}>
+                          {m.state}
+                        </span>
+                      </td>
+                      <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
+                        {m.in_flight ?? 0}
+                      </td>
+                      <td>
+                        <a
+                          href="#"
+                          onClick={(e) => {
+                            e.preventDefault()
+                            // Send the replica-scoped process key (modelName#replicaIndex).
+                            navigate(`/app/node-backend-logs/${id}/${encodeURIComponent(processKey)}`)
+                          }}
+                          style={{ fontSize: '0.75rem', color: 'var(--color-primary)' }}
+                          title={showReplica ? `View backend logs for replica ${m.replica_index ?? 0}` : 'View backend logs'}
+                        >
+                          <i className="fas fa-terminal" />
+                        </a>
+                      </td>
+                      <td style={{ textAlign: 'right' }}>
+                        <button
+                          className="btn btn-danger btn-sm"
+                          title={m.in_flight > 0 ? 'Unload model (has in-flight requests)' : 'Unload model'}
+                          onClick={() => setConfirmUnload({ modelName: m.model_name, inFlight: m.in_flight ?? 0 })}
+                        >
+                          <i className="fas fa-stop" />
+                        </button>
+                      </td>
+                    </tr>
+                  )
+                })
+              })()}
+            </tbody>
+          </table>
+        )}
+      </div>
+
+      {/* Installed backends */}
+      <div style={{ marginTop: 'var(--spacing-lg)' }}>
+        <div style={{
+          display: 'flex', alignItems: 'center', justifyContent: 'space-between',
+          marginBottom: 'var(--spacing-sm)',
+        }}>
+          <div className="drawer-eyebrow" style={{ margin: 0 }}>Installed backends</div>
+          <button
+            type="button"
+            className="btn btn-secondary btn-sm"
+            onClick={() => navigate(`/app/backends?target=${encodeURIComponent(id)}`)}
+            title={`Install a backend on ${node.name}`}
+          >
+            <i className="fas fa-plus" /> Add backend
+          </button>
+        </div>
+        {backends.length === 0 ? (
+          <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: 0 }}>
+            None installed. <a href="#" style={{ color: 'var(--color-primary)' }} onClick={(e) => { e.preventDefault(); navigate(`/app/backends?target=${encodeURIComponent(id)}`) }}>Install one from the gallery</a> to schedule models here.
+          </p>
+        ) : (
+          <table className="table" style={{ margin: 0 }}>
+            <thead>
+              <tr>
+                <th>Name</th>
+                <th>Type</th>
+                <th>Installed At</th>
+                <th style={{ textAlign: 'right' }}>Actions</th>
+              </tr>
+            </thead>
+            <tbody>
+              {backends.map(b => (
+                <tr key={b.name}>
+                  <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
+                    {b.name}
+                  </td>
+                  <td>
+                    <span style={{
+                      display: 'inline-block', padding: '2px 8px', borderRadius: 'var(--radius-sm)',
+                      fontSize: '0.75rem', fontWeight: 500,
+                      background: b.is_system ? 'var(--color-bg-tertiary)' : 'var(--color-primary-light)',
+                      color: b.is_system ? 'var(--color-text-muted)' : 'var(--color-primary)',
+                      border: `1px solid ${b.is_system ? 'var(--color-border-subtle)' : 'var(--color-primary-border)'}`,
+                    }}>
+                      {b.is_system ? 'system' : 'gallery'}
+                    </span>
+                  </td>
+                  <td style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>
+                    {b.installed_at ? timeAgo(b.installed_at) : '-'}
+                  </td>
+                  <td style={{ textAlign: 'right' }}>
+                    {!b.is_system && (
+                      <div style={{ display: 'inline-flex', gap: 'var(--spacing-xs)' }}>
+                        <button
+                          className="btn btn-secondary btn-sm"
+                          onClick={() => upgradeBackend(b.name)}
+                          title="Upgrade backend on this node"
+                        >
+                          <i className="fas fa-arrow-up" />
+                        </button>
+                        <button
+                          className="btn btn-danger-ghost btn-sm"
+                          onClick={() => setConfirmDeleteBackend({ backend: b.name })}
+                          title="Delete backend from this node"
+                        >
+                          <i className="fas fa-trash" />
+                        </button>
+                      </div>
+                    )}
+                  </td>
+                </tr>
+              ))}
+            </tbody>
+          </table>
+        )}
+      </div>
+
+      {/* Labels - node.replica-slots is filtered out so the Capacity editor
+          stays the single source of truth for that label. */}
+      <div style={{ marginTop: 'var(--spacing-lg)' }}>
+        <div className="drawer-eyebrow">Labels</div>
+        <KeyValueChips
+          pairs={Object.fromEntries(Object.entries(node.labels || {}).filter(([k]) => k !== 'node.replica-slots'))}
+          onAdd={addLabel}
+          onRemove={delLabel}
+          placeholderKey="key"
+          placeholderValue="value"
+          ariaLabel="Node labels"
+        />
+      </div>
+
+      <ConfirmDialog
+        open={confirmRemove}
+        title="Remove node"
+        message={`Remove "${node.name}" from the cluster? This will deregister it.`}
+        confirmLabel="Remove"
+        danger
+        onConfirm={() => { remove(); setConfirmRemove(false) }}
+        onCancel={() => setConfirmRemove(false)}
+      />
+
+      <ConfirmDialog
+        open={!!confirmUnload}
+        title="Unload Model"
+        message={
+          confirmUnload
+            ? confirmUnload.inFlight > 0
+              ? `"${confirmUnload.modelName}" currently has ${confirmUnload.inFlight} in-flight request(s). Unloading will interrupt them. Continue?`
+              : `Unload "${confirmUnload.modelName}" from ${node.name}?`
+            : ''
+        }
+        confirmLabel="Unload"
+        danger={confirmUnload?.inFlight > 0}
+        onConfirm={() => { if (confirmUnload) unload(confirmUnload.modelName); setConfirmUnload(null) }}
+        onCancel={() => setConfirmUnload(null)}
+      />
+
+      <ConfirmDialog
+        open={!!confirmDeleteBackend}
+        title="Delete Backend"
+        message={confirmDeleteBackend ? `Delete "${confirmDeleteBackend.backend}" from ${node.name}? This removes the backend files from this node only.` : ''}
+        confirmLabel="Delete"
+        danger
+        onConfirm={() => { if (confirmDeleteBackend) deleteBackend(confirmDeleteBackend.backend); setConfirmDeleteBackend(null) }}
+        onCancel={() => setConfirmDeleteBackend(null)}
+      />
+
+      <ConfirmDialog
+        open={!!confirmShrinkState}
+        title="Reduce replica capacity"
+        message={
+          confirmShrinkState
+            ? `${node.name} currently has ${confirmShrinkState.currentLoaded} replica(s) of at least one model loaded. Reducing the cap to ${confirmShrinkState.newValue} won't evict anything immediately - running replicas keep going, but the reconciler will trim down on the next idle window. Continue?`
+            : ''
+        }
+        confirmLabel="Reduce"
+        onConfirm={() => { confirmShrinkState?.resolve(true); setConfirmShrinkState(null) }}
+        onCancel={() => { confirmShrinkState?.resolve(false); setConfirmShrinkState(null) }}
+      />
+    </div>
+  )
+}
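
The replica bookkeeping in NodeDetail.jsx above can be sketched in isolation. This is a minimal, non-authoritative sketch: `countLoadedReplicas`, `processKey`, `maxLoadedReplicas`, and `needsShrinkConfirm` are hypothetical helper names, not part of this change. The shrink rule is an assumption taken from the code comment (warn when the new cap is below the replica count of any single model), since CapacityEditor itself is not shown here. The sample model names are placeholders.

```javascript
// Hypothetical helpers for illustration only; none of these names exist in the PR.

// Count loaded replicas per model, mirroring loadedModelCounts in NodeDetail.
function countLoadedReplicas(models) {
  const counts = {}
  for (const m of models) {
    if (m.state === 'loaded') counts[m.model_name] = (counts[m.model_name] || 0) + 1
  }
  return counts
}

// Replica-scoped process key (modelName#replicaIndex); a missing index maps to 0.
function processKey(m) {
  return `${m.model_name}#${m.replica_index ?? 0}`
}

// Largest replica count of any single model; 0 when nothing is loaded.
function maxLoadedReplicas(counts) {
  return Math.max(0, ...Object.values(counts))
}

// Assumed rule: ask for confirmation only when the new cap is below that maximum.
function needsShrinkConfirm(counts, newCap) {
  return newCap < maxLoadedReplicas(counts)
}

const models = [
  { model_name: 'model-a', replica_index: 0, state: 'loaded' },
  { model_name: 'model-a', replica_index: 1, state: 'loaded' },
  { model_name: 'model-b', state: 'loading' },
]
const counts = countLoadedReplicas(models)

console.log(counts)                         // { 'model-a': 2 }
console.log(models.map(processKey))         // [ 'model-a#0', 'model-a#1', 'model-b#0' ]
console.log(needsShrinkConfirm(counts, 1))  // true
console.log(needsShrinkConfirm(counts, 2))  // false
```

Keeping these as pure functions would let the counting and the shrink rule be unit-tested without rendering the page. Whether the real CapacityEditor applies exactly this comparison is not confirmed by the diff.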
--- a/core/http/react-ui/src/pages/Nodes.jsx
+++ b/core/http/react-ui/src/pages/Nodes.jsx
--- a/core/http/react-ui/src/pages/Scheduling.jsx
+++ b/core/http/react-ui/src/pages/Scheduling.jsx
@@ -0,0 +1,438 @@
+import { useState, useEffect, useCallback } from 'react'
+import { useOutletContext } from 'react-router-dom'
+import { useTranslation } from 'react-i18next'
+import { nodesApi } from '../utils/api'
+import PageHeader from '../components/PageHeader'
+import ConfirmDialog from '../components/ConfirmDialog'
+import ResponsiveTable from '../components/ResponsiveTable'
+import SearchableModelSelect from '../components/SearchableModelSelect'
+import KeyValueChips from '../components/nodes/KeyValueChips'
+
+// Numeric input with quick-pick preset chips. Picked over a slider because
+// replica counts are exact specs (operator math), not fuzzy estimates. The
+// chips give one-click access to common values without the slider's
+// precision/special-value problems (e.g. MaxReplicas=0 = "no limit").
+function ReplicaInput({ id, label, value, onChange, presets }) {
+  return (
+    <div style={{ flex: 1 }}>
+      <label className="form-label" htmlFor={id}>{label}</label>
+      <input
+        id={id}
+        className="input"
+        type="number"
+        min={0}
+        value={value}
+        onChange={e => onChange(parseInt(e.target.value, 10) || 0)}
+      />
+      <div style={{ display: 'flex', gap: 4, flexWrap: 'wrap', marginTop: 6 }}>
+        {presets.map(({ v, l }) => {
+          const active = value === v
+          return (
+            <button
+              key={v}
+              type="button"
+              onClick={() => onChange(v)}
+              aria-pressed={active}
+              className="cell-mono"
+              style={{
+                padding: '2px 8px',
+                borderRadius: 'var(--radius-sm)',
+                fontSize: '0.6875rem',
+                fontWeight: 500,
+                cursor: 'pointer',
+                background: active ? 'var(--color-primary-light)' : 'transparent',
+                border: `1px solid ${active ? 'var(--color-primary-border)' : 'var(--color-border-subtle)'}`,
+                color: active ? 'var(--color-primary)' : 'var(--color-text-muted)',
+              }}
+            >{l || v}</button>
+          )
+        })}
+      </div>
+    </div>
+  )
+}
+
+function SchedulingForm({ onSave, onCancel }) {
+  const [mode, setMode] = useState('placement')
+  const [modelName, setModelName] = useState('')
+  // Selector is now a chip-builder map instead of a comma-separated string.
+  // Operators were copying syntax from docs and missing commas; the chip UI
+  // makes the key=value structure self-documenting.
+  const [selector, setSelector] = useState({})
+  const [minReplicas, setMinReplicas] = useState(1)
+  const [maxReplicas, setMaxReplicas] = useState(0)
+  // Prefix-cache routing controls. Empty routePolicy means "inherit the
+  // cluster default"; the three thresholds at 0 likewise inherit, so they
+  // override the cluster settings only when explicitly set to a non-zero value.
+  const [routePolicy, setRoutePolicy] = useState('')
+  const [balanceAbsThreshold, setBalanceAbsThreshold] = useState(0)
+  const [balanceRelThreshold, setBalanceRelThreshold] = useState(0)
+  const [minPrefixMatch, setMinPrefixMatch] = useState(0)
+
+  const hasSelector = Object.keys(selector).length > 0
+
+  const isValid = () => {
+    if (!modelName) return false
+    if (mode === 'placement') return hasSelector
+    if (mode === 'spread') return true
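+    // maxReplicas === 0 means "no limit"; otherwise it must not be below minReplicas.
+    return (minReplicas > 0 || maxReplicas > 0) && (maxReplicas === 0 || maxReplicas >= minReplicas)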
+  }
+
+  const handleSubmit = () => {
+    onSave({
+      model_name: modelName,
+      node_selector: hasSelector ? selector : undefined,
+      min_replicas: mode === 'autoscaling' ? minReplicas : 0,
+      max_replicas: mode === 'autoscaling' ? maxReplicas : 0,
+      spread_all: mode === 'spread',
+      route_policy: routePolicy,
+      balance_abs_threshold: balanceAbsThreshold,
+      balance_rel_threshold: balanceRelThreshold,
+      min_prefix_match: minPrefixMatch,
+    })
+  }
+
+  return (
+    <div className="card" style={{ padding: 'var(--spacing-lg)', marginBottom: 'var(--spacing-md)' }}>
+      {/* Mode selector — uses the project's segmented control instead of two
+          50%-width filled buttons that competed visually with the actual
+          primary action (Save). */}
+      <div role="radiogroup" aria-label="Scheduling mode" className="segmented" style={{ marginBottom: 'var(--spacing-xs)' }}>
+        <button
+          type="button" role="radio" aria-checked={mode === 'placement'}
+          className={`segmented__item${mode === 'placement' ? ' is-active' : ''}`}
+          onClick={() => setMode('placement')}
+        >
+          <i className="fas fa-thumbtack" aria-hidden="true" /> Pin to nodes
+        </button>
+        <button
+          type="button" role="radio" aria-checked={mode === 'autoscaling'}
+          className={`segmented__item${mode === 'autoscaling' ? ' is-active' : ''}`}
+          onClick={() => setMode('autoscaling')}
+        >
+          <i className="fas fa-arrows-up-down" aria-hidden="true" /> Auto-scale
+        </button>
+        <button
+          type="button" role="radio" aria-checked={mode === 'spread'}
+          className={`segmented__item${mode === 'spread' ? ' is-active' : ''}`}
+          onClick={() => setMode('spread')}
+        >
+          <i className="fas fa-network-wired" aria-hidden="true" /> Spread to all
+        </button>
+      </div>
+      <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)', margin: '0 0 var(--spacing-lg) 0' }}>
+        {mode === 'placement'
+          ? 'Restrict this model to specific nodes. Loaded on demand, evictable when idle.'
+          : mode === 'spread'
+          ? 'Run one replica on every node matching the selector (all healthy nodes when empty). Tracks nodes joining and leaving.'
+          : 'Maintain a target replica count across the cluster. Min ≥ 1 protects from eviction.'}
+      </p>
+
+      {/* Linear vertical flow — model picker is the visual focus, then the
+          mode-specific fields below. No 2-column grid (the mismatched widths
+          made the form look raw). */}
+      <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
+        <div>
+          <label className="form-label" htmlFor="sched-model">Model</label>
+          {/* Searchable combobox so a long gallery doesn't force the operator
+              to scroll through hundreds of entries. Free-text is allowed —
+              you can pre-create a rule for a model that hasn't been
+              installed yet, which is a real workflow when standing up a new
+              node and pre-staging its scheduling policy. */}
+          <SearchableModelSelect
+            value={modelName}
+            onChange={setModelName}
+            placeholder="Type to search models, or paste a name..."
+          />
+        </div>
+
+        <div>
+          <label className="form-label">
+            Node selector{mode === 'placement' ? '' : ' (optional)'}
+          </label>
+          <KeyValueChips
+            pairs={selector}
+            onAdd={(k, v) => setSelector(prev => ({ ...prev, [k]: v }))}
+            onRemove={(k) => setSelector(prev => { const n = { ...prev }; delete n[k]; return n })}
+            placeholderKey="key (e.g. gpu.vendor)"
+            placeholderValue="value (e.g. nvidia)"
+            ariaLabel="Node selector"
+          />
+          <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
+            {mode === 'placement'
+              ? 'Models will load only on nodes that match all listed labels.'
+              : (hasSelector ? 'Replicas land only on matching nodes.' : 'Empty = any healthy node.')}
+          </span>
+        </div>
+
+        {mode === 'autoscaling' && (
+          <div style={{ display: 'flex', gap: 'var(--spacing-md)' }}>
+            <ReplicaInput
+              id="sched-min"
+              label="Min replicas"
+              value={minReplicas}
+              onChange={setMinReplicas}
+              presets={[{ v: 1 }, { v: 2 }, { v: 3 }, { v: 4 }]}
+            />
+            <ReplicaInput
+              id="sched-max"
+              label="Max replicas"
+              value={maxReplicas}
+              onChange={setMaxReplicas}
+              presets={[{ v: 0, l: 'no limit' }, { v: 2 }, { v: 4 }, { v: 8 }]}
+            />
+          </div>
+        )}
+
+        {/* Per-model routing policy. Left empty/zero these inherit the
+            cluster-wide defaults; set them to override how requests for this
+            model are spread across replicas. */}
+        <div>
+          <label className="form-label" htmlFor="sched-route-policy">Routing policy</label>
+          <select
+            id="sched-route-policy"
+            className="input"
+            value={routePolicy}
+            onChange={e => setRoutePolicy(e.target.value)}
+          >
+            <option value="">Default (cluster setting)</option>
+            <option value="round_robin">Round Robin</option>
+            <option value="prefix_cache">Prefix Cache</option>
+          </select>
+          <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
+            Prefix Cache routes shared-prefix requests to the same replica to reuse its KV cache, falling back to round-robin when replicas are imbalanced.
+          </span>
+        </div>
+
+        {routePolicy === 'prefix_cache' && (
+          <div style={{ display: 'flex', gap: 'var(--spacing-md)' }}>
+            <div style={{ flex: 1 }}>
+              <label className="form-label" htmlFor="sched-min-prefix-match">Min prefix match</label>
+              <input
+                id="sched-min-prefix-match"
+                className="input"
+                type="number"
+                step="0.05"
+                min="0"
+                max="1"
+                value={minPrefixMatch}
+                onChange={e => setMinPrefixMatch(parseFloat(e.target.value) || 0)}
+              />
+              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
+                Fraction of the prompt (0..1) that must match a cached prefix before affinity kicks in. 0 inherits the default.
+              </span>
+            </div>
+            <div style={{ flex: 1 }}>
+              <label className="form-label" htmlFor="sched-balance-abs">Balance abs threshold</label>
+              <input
+                id="sched-balance-abs"
+                className="input"
+                type="number"
+                min="0"
+                value={balanceAbsThreshold}
+                onChange={e => setBalanceAbsThreshold(parseInt(e.target.value, 10) || 0)}
+              />
+              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
+                Max absolute in-flight gap allowed before falling back to round-robin. 0 inherits the default.
+              </span>
+            </div>
+            <div style={{ flex: 1 }}>
+              <label className="form-label" htmlFor="sched-balance-rel">Balance rel threshold</label>
+              <input
+                id="sched-balance-rel"
+                className="input"
+                type="number"
+                step="0.1"
+                min="0"
+                value={balanceRelThreshold}
+                onChange={e => setBalanceRelThreshold(parseFloat(e.target.value) || 0)}
+              />
+              <span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', display: 'block', marginTop: 6 }}>
+                Max relative in-flight ratio (&gt;= 1) allowed before falling back to round-robin. 0 inherits the default.
+              </span>
+            </div>
+          </div>
+        )}
+      </div>
+
+      {/* Hairline divider above the actions, matching the project's form pattern. */}
+      <div style={{
+        display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end',
+        marginTop: 'var(--spacing-lg)', paddingTop: 'var(--spacing-md)',
+        borderTop: '1px solid var(--color-border-subtle)',
+      }}>
+        <button className="btn btn-secondary btn-sm" onClick={onCancel}>Cancel</button>
+        <button className="btn btn-primary btn-sm" onClick={handleSubmit} disabled={!isValid()}>Save rule</button>
+      </div>
+    </div>
+  )
+}
+
+export default function Scheduling() {
+  const { addToast } = useOutletContext()
+  const { t } = useTranslation('admin')
+  const [schedulingConfigs, setSchedulingConfigs] = useState([])
+  const [showForm, setShowForm] = useState(false)
+  const [confirmDelete, setConfirmDelete] = useState(null)
+
+  const fetchScheduling = useCallback(async () => {
+    try {
+      const data = await nodesApi.listScheduling()
+      setSchedulingConfigs(Array.isArray(data) ? data : [])
+    } catch { setSchedulingConfigs([]) }
+  }, [])
+
+  useEffect(() => { fetchScheduling() }, [fetchScheduling])
+
+  const handleSave = async (config) => {
+    try {
+      await nodesApi.setScheduling(config)
+      addToast('Scheduling rule saved', 'success')
+      setShowForm(false)
+      fetchScheduling()
+    } catch (err) { addToast(`Failed to save rule: ${err.message}`, 'error') }
+  }
+
+  const handleDelete = async (model) => {
+    try {
+      await nodesApi.deleteScheduling(model)
+      addToast('Scheduling rule removed', 'success')
+      setConfirmDelete(null)
+      fetchScheduling()
+    } catch (err) { addToast(`Failed to remove rule: ${err.message}`, 'error') }
+  }
+
+  return (
+    <div className="page page--wide">
+      <PageHeader
+        title={<><i className="fas fa-calendar-alt" style={{ marginRight: 'var(--spacing-sm)' }} />{t('scheduling.title')}</>}
+        supporting={t('scheduling.subtitle')}
+      />
+      <div>
+        <button className="btn btn-primary btn-sm" style={{ marginBottom: 'var(--spacing-md)' }}
+          onClick={() => setShowForm(f => !f)}>
+          <i className="fas fa-plus" style={{ marginRight: 6 }} />
+          Add Scheduling Rule
+        </button>
+        {showForm && <SchedulingForm onSave={handleSave} onCancel={() => setShowForm(false)} />}
+        {schedulingConfigs.length === 0 && !showForm ? (
+          <p style={{ fontSize: '0.875rem', color: 'var(--color-text-muted)', textAlign: 'center', padding: 'var(--spacing-xl) 0' }}>
+            No scheduling rules configured. Add a rule to control how models are placed on nodes.
+          </p>
+        ) : schedulingConfigs.length > 0 && (
+          <ResponsiveTable>
+              <thead><tr>
+                <th>Model</th>
+                <th>Mode</th>
+                <th>Node Selector</th>
+                <th>Min Replicas</th>
+                <th>Max Replicas</th>
+                <th>Routing</th>
+                <th>Thresholds</th>
+                <th>Status</th>
+                <th style={{ textAlign: 'right' }}>Actions</th>
+              </tr></thead>
+              <tbody>
+                {schedulingConfigs.map(cfg => {
+                  const isSpread = !!cfg.spread_all
+                  const isAutoScaling = !isSpread && (cfg.min_replicas > 0 || cfg.max_replicas > 0)
+                  const hasSelector = !!cfg.node_selector
+                  const modeLabel = isSpread ? 'Spread' : isAutoScaling ? 'Auto-scaling' : hasSelector ? 'Placement' : 'Inactive'
+                  const modeColor = isSpread ? 'var(--color-warning)' : isAutoScaling ? 'var(--color-success)' : hasSelector ? 'var(--color-primary)' : 'var(--color-text-muted)'
+                  // Cooldown: reconciler tripped the circuit breaker because cluster
+                  // capacity is exhausted. Surface so the operator sees it instead
+                  // of the model silently failing to scale.
+                  const unsatisfiableUntil = cfg.unsatisfiable_until ? new Date(cfg.unsatisfiable_until) : null
+                  const isUnsatisfiable = unsatisfiableUntil && unsatisfiableUntil.getTime() > Date.now()
+                  return (
+                  <tr key={cfg.id || cfg.model_name}>
+                    <td style={{ fontWeight: 600, fontSize: '0.875rem' }}>{cfg.model_name}</td>
+                    <td>
+                      <span style={{
+                        display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px', borderRadius: "var(--radius-sm)",
+                        background: 'var(--color-bg-tertiary)', border: `1px solid ${modeColor}`,
+                        color: modeColor, fontWeight: 600,
+                      }}>{modeLabel}</span>
+                    </td>
+                    <td>
+                      {cfg.node_selector ? (() => {
+                        try {
+                          const sel = typeof cfg.node_selector === 'string' ? JSON.parse(cfg.node_selector) : cfg.node_selector
+                          return Object.entries(sel).map(([k,v]) => (
+                            <span key={k} style={{
+                              display: 'inline-block', fontSize: '0.75rem', padding: '2px 6px', borderRadius: "var(--radius-sm)",
+                              background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
+                              fontFamily: 'var(--font-mono)', marginRight: 4,
+                            }}>{k}={v}</span>
+                          ))
+                        } catch { return <span style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}>{cfg.node_selector}</span> }
+                      })() : <span style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}>Any node</span>}
+                    </td>
+                    <td style={{ fontFamily: 'var(--font-mono)' }}>
+                      {isSpread
+                        ? <span style={{
+                            display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px', borderRadius: "var(--radius-sm)",
+                            background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-warning)',
+                            color: 'var(--color-warning)', fontWeight: 600, fontFamily: 'var(--font-sans)',
+                          }}>Spread: all matching nodes</span>
+                        : isAutoScaling ? cfg.min_replicas : '-'}
+                    </td>
+                    <td style={{ fontFamily: 'var(--font-mono)' }}>
+                      {isSpread ? '-' : isAutoScaling ? (cfg.max_replicas || 'no limit') : '-'}
+                    </td>
+                    <td style={{ fontSize: '0.8125rem' }}>
+                      {cfg.route_policy || 'default'}
+                    </td>
+                    <td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem', color: 'var(--color-text-muted)' }}>
+                      {cfg.route_policy === 'prefix_cache' ? (
+                        <>
+                          <div>match: {cfg.min_prefix_match ? cfg.min_prefix_match : 'inherit'}</div>
+                          <div>abs: {cfg.balance_abs_threshold ? cfg.balance_abs_threshold : 'inherit'}</div>
+                          <div>rel: {cfg.balance_rel_threshold ? cfg.balance_rel_threshold : 'inherit'}</div>
+                        </>
+                      ) : '-'}
+                    </td>
+                    <td>
+                      {isUnsatisfiable ? (
+                        <span
+                          title={`Reconciler couldn't satisfy this rule (capacity exhausted). Will retry by ${unsatisfiableUntil.toLocaleString()}, or sooner on a node lifecycle change.`}
+                          style={{
+                            display: 'inline-block', fontSize: '0.75rem', padding: '2px 8px',
+                            borderRadius: 'var(--radius-sm)', fontWeight: 600,
+                            background: 'var(--color-bg-tertiary)',
+                            border: '1px solid var(--color-warning, #d97706)',
+                            color: 'var(--color-warning, #d97706)',
+                          }}
+                        >
+                          <i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />
+                          Unsatisfiable until {unsatisfiableUntil.toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })}
+                        </span>
+                      ) : (
+                        <span style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>OK</span>
+                      )}
+                    </td>
+                    <td style={{ textAlign: 'right' }}>
+                      <button className="btn btn-danger btn-sm" onClick={() => setConfirmDelete(cfg.model_name)}>
+                        <i className="fas fa-trash" />
+                      </button>
+                    </td>
+                  </tr>
+                  )
+                })}
+              </tbody>
+          </ResponsiveTable>
+        )}
+      </div>
+
+      <ConfirmDialog
+        open={!!confirmDelete}
+        title="Remove scheduling rule"
+        message={confirmDelete ? `Remove the scheduling rule for "${confirmDelete}"?` : ''}
+        confirmLabel="Remove"
+        danger
+        onConfirm={() => confirmDelete && handleDelete(confirmDelete)}
+        onCancel={() => setConfirmDelete(null)}
+      />
+    </div>
+  )
+}
--- a/core/http/react-ui/src/router.jsx
+++ b/core/http/react-ui/src/router.jsx
@@ -69,7 +69,9 @@ const Studio = page('studio', () => import('./pages/Studio'))
 const FaceRecognition = page('face', () => import('./pages/FaceRecognition'))
 const VoiceRecognition = page('voice', () => import('./pages/VoiceRecognition'))
 const Nodes = page('nodes', () => import('./pages/Nodes'))
+const Scheduling = page('scheduling', () => import('./pages/Scheduling'))
 const NodeBackendLogs = page(null, () => import('./pages/NodeBackendLogs'))
+const NodeDetail = page(null, () => import('./pages/NodeDetail'))
 const NotFound = page(null, () => import('./pages/NotFound'))
 const Usage = page('usage', () => import('./pages/Usage'))
 const Users = page('users', () => import('./pages/Users'))
@@ -152,6 +154,8 @@ const appChildren = [
      { path: 'backend-logs/:modelId', element: <Admin><BackendLogs /></Admin> },
      { path: 'p2p', element: <Admin><P2P /></Admin> },
      { path: 'nodes', element: <Admin><Nodes /></Admin> },
+      { path: 'nodes/:id', element: <Admin><NodeDetail /></Admin> },
+      { path: 'scheduling', element: <Admin><Scheduling /></Admin> },
      { path: 'node-backend-logs/:nodeId/:modelId', element: <Admin><NodeBackendLogs /></Admin> },
      { path: 'usage', element: <Usage /> },
      { path: 'users', element: <RequireAuthEnabled><Admin><Users /></Admin></RequireAuthEnabled> },
--- a/core/http/react-ui/src/utils/api.js
+++ b/core/http/react-ui/src/utils/api.js
@@ -568,6 +568,7 @@ export const nodesApi = {
    method: 'DELETE',
  }),
  listScheduling: () => fetchJSON(API_CONFIG.endpoints.nodesScheduling),
+  allModels: () => fetchJSON(API_CONFIG.endpoints.nodesModels),
  setScheduling: (config) => postJSON(API_CONFIG.endpoints.nodesScheduling, config),
  deleteScheduling: (model) => fetchJSON(API_CONFIG.endpoints.nodesSchedulingModel(model), { method: 'DELETE' }),
 }
--- a/core/http/react-ui/src/utils/config.js
+++ b/core/http/react-ui/src/utils/config.js
@@ -144,6 +144,7 @@ export const API_CONFIG = {
    nodeLabelKey: (id, key) => `/api/nodes/${id}/labels/${key}`,
    nodeMaxReplicasPerModel: (id) => `/api/nodes/${id}/max-replicas-per-model`,
    nodesScheduling: '/api/nodes/scheduling',
+    nodesModels: '/api/nodes/models',
    nodesSchedulingModel: (model) => `/api/nodes/scheduling/${encodeURIComponent(model)}`,
  },
 }
--- a/core/http/routes/nodes.go
+++ b/core/http/routes/nodes.go
@@ -71,6 +71,9 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade
 	admin := e.Group("/api/nodes", readyMw, adminMw)
 	admin.GET("", localai.ListNodesEndpoint(registry))

+	// Cluster-wide loaded models (registered before /:id to avoid route conflicts)
+	admin.GET("/models", localai.ListAllNodeModelsEndpoint(registry))
+
 	// Model scheduling (registered before /:id to avoid route conflicts)
 	admin.GET("/scheduling", localai.ListSchedulingEndpoint(registry))
 	admin.GET("/scheduling/:model", localai.GetSchedulingEndpoint(registry))
--- a/core/services/messaging/subjects.go
+++ b/core/services/messaging/subjects.go
@@ -64,6 +64,22 @@ func SubjectGalleryProgress(opID string) string {
 	return subjectGalleryPrefix + sanitizeSubjectToken(opID) + ".progress"
 }

+// SubjectStagingProgress returns the NATS subject a frontend replica publishes
+// file-staging progress on. Staging progress is otherwise per-process state
+// (the SmartRouter's in-memory StagingTracker), so without this broadcast a
+// /api/operations poll that round-robins onto a replica that did not originate
+// the staging op sees nothing - the progress row flickers in multi-replica
+// deployments. Peers subscribe to the wildcard and merge.
+func SubjectStagingProgress(modelID string) string {
+	return subjectStagingPrefix + sanitizeSubjectToken(modelID) + ".progress"
+}
+
+const subjectStagingPrefix = "staging."
+
+// SubjectStagingProgressWildcard matches every replica's staging-progress
+// broadcasts so a peer can mirror staging ops it did not originate.
+const SubjectStagingProgressWildcard = "staging.*.progress"
+
 // SubjectGalleryOpStart and SubjectGalleryOpEnd are broadcast subjects for the
 // in-memory OpCache lifecycle. Frontend replicas publish to these when an
 // admin admits a new install/delete (Start) and when an operation is
--- a/core/services/nodes/router.go
+++ b/core/services/nodes/router.go
@@ -359,8 +359,21 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 		}
 	}

-	// Step 2: Model not loaded — schedule loading with distributed lock to prevent duplicates
-	loadModel := func() (*RouteResult, error) {
+	// Step 2: Model not loaded — schedule loading with distributed lock to prevent duplicates.
+	//
+	// Detach the cold-load from the caller's context. Staging a model can
+	// transfer multiple GB to a worker, which takes far longer than any client
+	// keeps its HTTP request open — a browser refresh, an ingress/LB idle
+	// timeout, or a round-robined retry landing on another replica all cancel
+	// the request context. If staging were bound to it, the multi-GB upload
+	// would abort with "context canceled" mid-transfer and large models could
+	// never finish staging (the model-load outage). WithoutCancel keeps the
+	// request's values (prefix chain, etc.) but drops its cancellation and
+	// deadline. Each long step still has its own bound (the file stager's
+	// resume budget, LoadModel's 5m timeout), and the per-model advisory lock
+	// below de-dupes concurrent loaders across replicas.
+	loadCtx := context.WithoutCancel(ctx)
+	loadModel := func(ctx context.Context) (*RouteResult, error) {
 		// Re-check after acquiring lock — another request may have loaded it
 		node, nm, err := r.registry.FindAndLockNodeWithModel(ctx, trackingKey, candidateNodeIDs, pref)
 		if err == nil && node != nil {
@@ -433,9 +446,9 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 	if r.db != nil {
 		lockKey := advisorylock.KeyFromString("model-load:" + trackingKey)
 		var result *RouteResult
-		lockErr := advisorylock.WithLockCtx(ctx, r.db, lockKey, func() error {
+		lockErr := advisorylock.WithLockCtx(loadCtx, r.db, lockKey, func() error {
 			var err error
-			result, err = loadModel()
+			result, err = loadModel(loadCtx)
 			return err
 		})
 		if lockErr != nil {
@@ -444,7 +457,7 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 		return result, nil
 	}
 	// No DB (non-distributed) — proceed without lock
-	return loadModel()
+	return loadModel(loadCtx)
 }

 // parseSelectorJSON decodes a JSON node selector string into a map.
--- a/core/services/nodes/router_staging_context_test.go
+++ b/core/services/nodes/router_staging_context_test.go
@@ -0,0 +1,80 @@
+package nodes
+
+import (
+	"context"
+	"errors"
+	"os"
+	"path/filepath"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/mudler/LocalAI/core/services/messaging"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+)
+
+// cancelOnStageStager simulates the triggering HTTP request being abandoned
+// (client disconnect, ingress idle-timeout) the moment a multi-GB file starts
+// staging. It cancels the request context and records whether the context the
+// stager itself received was cancelled as a result.
+type cancelOnStageStager struct {
+	fakeFileStager
+	cancelRequest context.CancelFunc
+	staged        bool
+	ctxErrOnStage error
+}
+
+func (s *cancelOnStageStager) EnsureRemote(ctx context.Context, _, _, key string) (string, error) {
+	s.staged = true
+	// Mid-transfer: the client gives up on the (minutes-long) request.
+	if s.cancelRequest != nil {
+		s.cancelRequest()
+	}
+	// A multi-GB upload must survive this. If staging were bound to the
+	// request context, ctx would now be cancelled and the real HTTP stager
+	// would abort with "context canceled", exactly the production outage.
+	s.ctxErrOnStage = ctx.Err()
+	return "/remote/" + key, nil
+}
+
+var _ = Describe("Route cold-load staging context", func() {
+	It("detaches staging from the request context so a client disconnect cannot abort a multi-GB transfer", func() {
+		// A real model file so stageModelFiles actually calls the stager
+		// (non-existent paths are skipped).
+		tmp := GinkgoT().TempDir()
+		modelFile := filepath.Join(tmp, "big.gguf")
+		Expect(os.WriteFile(modelFile, []byte("weights"), 0o644)).To(Succeed())
+
+		reg := &fakeModelRouter{
+			findAndLockErr: errors.New("not loaded"),
+			findIdleNode:   &BackendNode{ID: "n1", Name: "worker-1", Address: "10.0.0.1:50051"},
+		}
+		backend := &stubBackend{loadResult: &pb.Result{Success: true}}
+		factory := &stubClientFactory{client: backend}
+		unloader := &fakeUnloader{installReply: &messaging.BackendInstallReply{
+			Success: true,
+			Address: "10.0.0.1:9001",
+		}}
+		stager := &cancelOnStageStager{}
+
+		router := NewSmartRouter(reg, SmartRouterOptions{
+			Unloader:      unloader,
+			ClientFactory: factory,
+			FileStager:    stager,
+			// DB nil: no advisory lock, exercises the same detached load ctx.
+		})
+
+		ctx, cancel := context.WithCancel(context.Background())
+		stager.cancelRequest = cancel
+		defer cancel()
+
+		result, err := router.Route(ctx, "big-model", filepath.Join("models", "big.gguf"), "llama-cpp",
+			&pb.ModelOptions{Model: "big.gguf", ModelFile: modelFile}, false)
+
+		Expect(err).ToNot(HaveOccurred())
+		Expect(result).ToNot(BeNil())
+		Expect(stager.staged).To(BeTrue(), "staging must have been attempted")
+		Expect(stager.ctxErrOnStage).ToNot(HaveOccurred(),
+			"staging context must survive cancellation of the triggering request")
+	})
+})
--- a/core/services/nodes/staging_progress.go
+++ b/core/services/nodes/staging_progress.go
@@ -5,58 +5,138 @@ import (
 	"fmt"
 	"sync"
 	"time"
+
+	"github.com/mudler/LocalAI/core/services/messaging"
 )

 // StagingStatus represents the current progress of a model staging operation.
 type StagingStatus struct {
-	ModelID    string  `json:"model_id"`
-	NodeName   string  `json:"node_name"`
-	FileName   string  `json:"file_name"`
-	BytesSent  int64   `json:"bytes_sent"`
-	TotalBytes int64   `json:"total_bytes"`
-	Progress   float64 `json:"progress"` // 0-100 overall progress
-	Speed      string  `json:"speed"`
-	FileIndex  int     `json:"file_index"`
-	TotalFiles int     `json:"total_files"`
-	Message    string  `json:"message"`
+	ModelID    string    `json:"model_id"`
+	NodeName   string    `json:"node_name"`
+	FileName   string    `json:"file_name"`
+	BytesSent  int64     `json:"bytes_sent"`
+	TotalBytes int64     `json:"total_bytes"`
+	Progress   float64   `json:"progress"` // 0-100 overall progress
+	Speed      string    `json:"speed"`
+	FileIndex  int       `json:"file_index"`
+	TotalFiles int       `json:"total_files"`
+	Message    string    `json:"message"`
 	StartedAt  time.Time `json:"started_at"`
 }

+const (
+	// stagingBroadcastInterval bounds how often byte-level UpdateFile ticks are
+	// re-broadcast to peers (leading-edge debounce). State transitions (Start,
+	// FileComplete, Complete) always publish so peers never miss them.
+	stagingBroadcastInterval = time.Second
+	// stagingRemoteTTL drops a mirrored (remote) op whose last update is older
+	// than this. NATS pub/sub is fire-and-forget, so a missed Done event would
+	// otherwise leave a phantom staging row on a peer forever; a live op
+	// refreshes its mirror at least every stagingBroadcastInterval.
+	stagingRemoteTTL = 60 * time.Second
+)
+
+// stagingEntry wraps a StagingStatus with the bookkeeping needed to keep peer
+// replicas consistent: whether this op is mirrored from a peer (remote) vs.
+// owned locally, when it was last updated (for remote-mirror expiry), and when
+// its byte progress was last broadcast (for debounce).
+type stagingEntry struct {
+	status    StagingStatus
+	remote    bool
+	updatedAt time.Time
+	lastPub   time.Time
+}
+
 // StagingTracker tracks active file staging operations in-memory.
 // Used by SmartRouter to publish progress and by /api/operations to surface it.
+//
+// In distributed mode each frontend replica runs its own tracker. The replica
+// performing a transfer owns the op locally and broadcasts progress over NATS
+// (SetPublisher); peers mirror it via ApplyRemote (SubscribeBroadcasts) so a
+// /api/operations poll that round-robins onto any replica surfaces the op.
 type StagingTracker struct {
-	mu     sync.RWMutex
-	active map[string]*StagingStatus
+	mu        sync.RWMutex
+	active    map[string]*stagingEntry
+	publisher messaging.Publisher
+}
+
+// StagingProgressEvent is the wire payload a frontend replica broadcasts on
+// SubjectStagingProgress so peer replicas can mirror a staging op they did not
+// originate. Done signals the op finished (peers drop their mirrored copy).
+type StagingProgressEvent struct {
+	ModelID string         `json:"model_id"`
+	Status  *StagingStatus `json:"status,omitempty"`
+	Done    bool           `json:"done"`
 }

 // NewStagingTracker creates a new tracker.
 func NewStagingTracker() *StagingTracker {
 	return &StagingTracker{
-		active: make(map[string]*StagingStatus),
+		active: make(map[string]*stagingEntry),
 	}
 }

+// SetPublisher wires the NATS publisher used to broadcast staging progress to
+// peer replicas. A nil publisher keeps the tracker standalone (no broadcast).
+func (t *StagingTracker) SetPublisher(p messaging.Publisher) {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	t.publisher = p
+}
+
+// SubscribeBroadcasts subscribes to peer replicas' staging-progress broadcasts
+// and mirrors them into this tracker, so /api/operations on any replica surfaces
+// staging ops it did not originate. Returns the subscription for cleanup.
+func (t *StagingTracker) SubscribeBroadcasts(nc messaging.MessagingClient) (messaging.Subscription, error) {
+	return messaging.SubscribeJSON(nc, messaging.SubjectStagingProgressWildcard, func(evt StagingProgressEvent) {
+		if evt.ModelID == "" {
+			return
+		}
+		t.ApplyRemote(evt)
+	})
+}
+
+// publishStaging emits an event to the per-model staging subject. The publisher
+// is captured by the caller under the lock and passed in, so publishing happens
+// outside the lock (a slow NATS link must not stall the staging copy loop).
+func publishStaging(p messaging.Publisher, evt StagingProgressEvent) {
+	if p == nil {
+		return
+	}
+	_ = p.Publish(messaging.SubjectStagingProgress(evt.ModelID), evt)
+}
+
 // Start registers a new staging operation for the given model.
 func (t *StagingTracker) Start(modelID, nodeName string, totalFiles int) {
 	t.mu.Lock()
-	defer t.mu.Unlock()
-	t.active[modelID] = &StagingStatus{
-		ModelID:    modelID,
-		NodeName:   nodeName,
-		TotalFiles: totalFiles,
-		StartedAt:  time.Now(),
-		Message:    "Preparing to stage model files",
+	e := &stagingEntry{
+		status: StagingStatus{
+			ModelID:    modelID,
+			NodeName:   nodeName,
+			TotalFiles: totalFiles,
+			StartedAt:  time.Now(),
+			Message:    "Preparing to stage model files",
+		},
+		updatedAt: time.Now(),
+		// lastPub stays zero so the first UpdateFile tick always broadcasts.
 	}
+	t.active[modelID] = e
+	pub := t.publisher
+	snap := e.status
+	t.mu.Unlock()
+
+	publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
 }

 // UpdateFile updates the tracker with current file transfer progress.
 func (t *StagingTracker) UpdateFile(modelID, fileName string, fileIndex int, bytesSent, totalBytes int64, speed string) {
 	t.mu.Lock()
-	defer t.mu.Unlock()
-	s, ok := t.active[modelID]
+	e, ok := t.active[modelID]
 	if !ok {
+		t.mu.Unlock()
 		return
 	}
+	s := &e.status
 	s.FileName = fileName
 	s.FileIndex = fileIndex
 	s.BytesSent = bytesSent
@@ -79,52 +159,121 @@ func (t *StagingTracker) UpdateFile(modelID, fileName string, fileIndex int, byt
 	} else {
 		s.Message = fmt.Sprintf("Staging %s", fileName)
 	}
+
+	e.updatedAt = time.Now()
+	// Leading-edge debounce: byte ticks fire many times per second; only
+	// re-broadcast at most once per stagingBroadcastInterval.
+	var pub messaging.Publisher
+	var snap StagingStatus
+	if time.Since(e.lastPub) >= stagingBroadcastInterval {
+		e.lastPub = time.Now()
+		pub = t.publisher
+		snap = e.status
+	}
+	t.mu.Unlock()
+
+	if pub != nil {
+		publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
+	}
 }

 // FileComplete marks a single file as done within a staging operation.
 func (t *StagingTracker) FileComplete(modelID string, fileIndex, totalFiles int) {
 	t.mu.Lock()
-	defer t.mu.Unlock()
-	s, ok := t.active[modelID]
+	e, ok := t.active[modelID]
 	if !ok {
+		t.mu.Unlock()
 		return
 	}
+	s := &e.status
 	if totalFiles > 0 {
 		s.Progress = float64(fileIndex) / float64(totalFiles) * 100
 	}
 	s.BytesSent = 0
 	s.TotalBytes = 0
 	s.Speed = ""
+	e.updatedAt = time.Now()
+	e.lastPub = time.Now()
+	pub := t.publisher
+	snap := e.status
+	t.mu.Unlock()
+
+	// Always broadcast a per-file completion so peers' progress bars advance.
+	publishStaging(pub, StagingProgressEvent{ModelID: modelID, Status: &snap})
 }

 // Complete removes a staging operation (it's done).
 func (t *StagingTracker) Complete(modelID string) {
 	t.mu.Lock()
-	defer t.mu.Unlock()
+	_, ok := t.active[modelID]
 	delete(t.active, modelID)
+	pub := t.publisher
+	t.mu.Unlock()
+
+	if ok {
+		// Tell peers to drop their mirrored copy.
+		publishStaging(pub, StagingProgressEvent{ModelID: modelID, Done: true})
+	}
 }

-// GetAll returns a snapshot of all active staging operations.
+// ApplyRemote merges a peer replica's staging broadcast into this tracker. It
+// never re-broadcasts (no echo loop). A locally-owned op is authoritative: a
+// remote event for the same model is ignored, so the origin replica receiving
+// its own broadcast (and any stray peer event) cannot clobber or delete it.
+func (t *StagingTracker) ApplyRemote(evt StagingProgressEvent) {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+
+	if existing, ok := t.active[evt.ModelID]; ok && !existing.remote {
+		// We own this op locally — ignore peer chatter about it.
+		return
+	}
+	if evt.Done {
+		delete(t.active, evt.ModelID)
+		return
+	}
+	if evt.Status == nil {
+		return
+	}
+	t.active[evt.ModelID] = &stagingEntry{
+		status:    *evt.Status,
+		remote:    true,
+		updatedAt: time.Now(),
+	}
+}
+
+// GetAll returns a snapshot of all active staging operations. Stale remote
+// mirrors (a peer op whose Done event was missed) are pruned here so they don't
+// linger in the UI.
 func (t *StagingTracker) GetAll() map[string]StagingStatus {
-	t.mu.RLock()
-	defer t.mu.RUnlock()
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	now := time.Now()
 	result := make(map[string]StagingStatus, len(t.active))
-	for k, v := range t.active {
-		result[k] = *v
+	for k, e := range t.active {
+		if e.remote && now.Sub(e.updatedAt) > stagingRemoteTTL {
+			delete(t.active, k)
+			continue
+		}
+		result[k] = e.status
 	}
 	return result
 }

-// Get returns the status of a specific staging operation, or nil if not active.
+// Get returns the status of a specific staging operation, or nil if not active
+// (or a stale remote mirror).
 func (t *StagingTracker) Get(modelID string) *StagingStatus {
 	t.mu.RLock()
 	defer t.mu.RUnlock()
-	s, ok := t.active[modelID]
+	e, ok := t.active[modelID]
 	if !ok {
 		return nil
 	}
-	copy := *s
-	return &copy
+	if e.remote && time.Since(e.updatedAt) > stagingRemoteTTL {
+		return nil
+	}
+	s := e.status
+	return &s
 }

 // StagingProgressCallback is called by file stagers to report byte-level progress.
--- a/core/services/nodes/staging_progress_broadcast_test.go
+++ b/core/services/nodes/staging_progress_broadcast_test.go
@@ -0,0 +1,109 @@
+package nodes
+
+import (
+	"encoding/json"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/mudler/LocalAI/core/services/messaging"
+)
+
+// decodeStagingEvents extracts every StagingProgressEvent the fake messaging
+// client captured, in publish order.
+func decodeStagingEvents(mc *fakeMessagingClient) []StagingProgressEvent {
+	mc.mu.Lock()
+	defer mc.mu.Unlock()
+	var out []StagingProgressEvent
+	for _, p := range mc.published {
+		var evt StagingProgressEvent
+		if err := json.Unmarshal(p.Data, &evt); err != nil {
+			continue
+		}
+		if evt.ModelID == "" {
+			continue
+		}
+		out = append(out, evt)
+	}
+	return out
+}
+
+var _ = Describe("StagingTracker cross-replica broadcast", func() {
+	Context("when a publisher is wired (distributed mode)", func() {
+		It("broadcasts staging progress so a peer replica surfaces an op it did not originate", func() {
+			mc := &fakeMessagingClient{}
+			origin := NewStagingTracker()
+			origin.SetPublisher(mc)
+
+			origin.Start("model-x", "worker-1", 1)
+			origin.UpdateFile("model-x", "weights.gguf", 1, 5<<30, 10<<30, "100 MiB/s")
+
+			events := decodeStagingEvents(mc)
+			Expect(events).ToNot(BeEmpty(), "writes must be broadcast over NATS")
+			Expect(mc.published[0].Subject).To(Equal(messaging.SubjectStagingProgress("model-x")))
+
+			// A peer replica that never ran the op merges the broadcast.
+			peer := NewStagingTracker()
+			for _, evt := range events {
+				peer.ApplyRemote(evt)
+			}
+
+			all := peer.GetAll()
+			Expect(all).To(HaveKey("model-x"))
+			Expect(all["model-x"].NodeName).To(Equal("worker-1"))
+			Expect(all["model-x"].FileName).To(Equal("weights.gguf"))
+			Expect(all["model-x"].TotalBytes).To(Equal(int64(10 << 30)))
+		})
+
+		It("removes the op from the peer when the origin completes it", func() {
+			mc := &fakeMessagingClient{}
+			origin := NewStagingTracker()
+			origin.SetPublisher(mc)
+
+			origin.Start("model-x", "worker-1", 1)
+			origin.Complete("model-x")
+
+			peer := NewStagingTracker()
+			for _, evt := range decodeStagingEvents(mc) {
+				peer.ApplyRemote(evt)
+			}
+			Expect(peer.GetAll()).ToNot(HaveKey("model-x"))
+		})
+
+		It("does not let a peer broadcast clobber an op this replica is itself running", func() {
+			local := NewStagingTracker()
+			local.Start("model-x", "worker-local", 2)
+			local.UpdateFile("model-x", "weights.gguf", 1, 9<<30, 10<<30, "")
+
+			// A stray/older remote event for the SAME modelID must not overwrite
+			// the authoritative local state, nor delete it.
+			local.ApplyRemote(StagingProgressEvent{
+				ModelID: "model-x",
+				Status:  &StagingStatus{ModelID: "model-x", NodeName: "worker-other", FileName: "stale.gguf"},
+			})
+			local.ApplyRemote(StagingProgressEvent{ModelID: "model-x", Done: true})
+
+			all := local.GetAll()
+			Expect(all).To(HaveKey("model-x"))
+			Expect(all["model-x"].NodeName).To(Equal("worker-local"))
+			Expect(all["model-x"].FileName).To(Equal("weights.gguf"))
+		})
+	})
+
+	Context("when no publisher is wired (standalone mode)", func() {
+		It("does not broadcast", func() {
+			mc := &fakeMessagingClient{}
+			t := NewStagingTracker()
+			t.Start("model-x", "worker-1", 1)
+			t.UpdateFile("model-x", "weights.gguf", 1, 1<<30, 10<<30, "")
+			Expect(mc.published).To(BeEmpty())
+		})
+	})
+})
+
+var _ = Describe("SubjectStagingProgress", func() {
+	It("namespaces by model id and matches the wildcard prefix", func() {
+		Expect(messaging.SubjectStagingProgress("model-x")).To(Equal("staging.model-x.progress"))
+		Expect(messaging.SubjectStagingProgressWildcard).To(Equal("staging.*.progress"))
+	})
+})
--- a/core/services/routing/piiadapter/openai_completion.go
+++ b/core/services/routing/piiadapter/openai_completion.go
@@ -44,7 +44,7 @@ func applyAnyText(v any, elem int, text string) any {
 	if elem < 0 {
 		return text
 	}
-	if arr, ok := v.([]any); ok && elem >= 0 && elem < len(arr) {
+	if arr, ok := v.([]any); ok && elem < len(arr) {
 		arr[elem] = text
 	}
 	return v
--- a/core/services/routing/piidetector/pattern.go
+++ b/core/services/routing/piidetector/pattern.go
@@ -39,8 +39,9 @@ type patternDetector struct {
 // When tracing is enabled it records a pattern_pii BackendTrace so the matches
 // (group, byte range, text) show in the Traces UI alongside NER detections.
 func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntity, error) {
+	tracing := d.appConfig != nil && d.appConfig.EnableTracing
 	var start time.Time
-	if d.appConfig != nil && d.appConfig.EnableTracing {
+	if tracing {
 		trace.InitBackendTracingIfEnabled(d.appConfig.TracingMaxItems, d.appConfig.TracingMaxBodyBytes)
 		start = time.Now()
 	}
@@ -50,12 +51,12 @@ func (d *patternDetector) Detect(_ context.Context, text string) ([]pii.NEREntit
 	var traceEnts []backend.TokenEntity
 	for _, mt := range matches {
 		out = append(out, pii.NEREntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
-		if d.appConfig != nil && d.appConfig.EnableTracing {
+		if tracing {
 			traceEnts = append(traceEnts, backend.TokenEntity{Group: mt.Group, Start: mt.Start, End: mt.End, Score: 1.0, Text: mt.Text})
 		}
 	}

-	if d.appConfig != nil && d.appConfig.EnableTracing {
+	if tracing {
 		trace.RecordBackendTrace(patternPIITrace(d.modelName, text, traceEnts, start))
 	}
 	return out, nil
--- a/core/services/routing/piipattern/grammar.go
+++ b/core/services/routing/piipattern/grammar.go
@@ -28,10 +28,16 @@ const (
 	// credential shape, small enough that the compiled program stays tiny.
 	MaxPatternLen = 256
 	// MaxQuantifier caps an explicit {n,m} upper bound. RE2 expands a bounded
-	// repeat into that many copies, so an uncapped {0,1000000} would blow up
-	// the compiled program's memory. Unbounded {n,} (no upper) is a loop, not
-	// an expansion, and is allowed.
-	MaxQuantifier = 4096
+	// repeat into that many copies, so a large bound inflates the compiled
+	// program. Go's regexp/syntax independently rejects any bound above 1000
+	// at Parse time, so this cap MUST stay strictly below 1000 to be a live
+	// guard rather than dead code shadowed by the parser: a bound in
+	// (MaxQuantifier, 1000] reaches walk and is rejected here with an
+	// actionable error, while >1000 is caught earlier by Parse. 512 is far
+	// larger than any real credential token yet keeps the guard meaningful and
+	// is defence in depth should the stdlib cap ever rise. Unbounded {n,} (no
+	// upper) is a loop, not an expansion, and is allowed.
+	MaxQuantifier = 512
 	// MaxAlternation caps the arms of a single `a|b|c` alternation.
 	MaxAlternation = 64
 	// MaxAST bounds recursion depth so a pathologically nested pattern can't
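
Review note on the grammar.go change above: the stdlib behaviour the new comment depends on can be checked directly. This is a small sketch, assuming `syntax.Perl` flags (the flags the real validator passes to `syntax.Parse` are not shown in this diff); `parses` is a hypothetical helper. It probes whether `regexp/syntax` accepts a repeat bound of a given size.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parses reports whether regexp/syntax accepts the pattern a{n}.
func parses(n int) bool {
	_, err := syntax.Parse(fmt.Sprintf(`a{%d}`, n), syntax.Perl)
	return err == nil
}

func main() {
	// 512 is the new MaxQuantifier; 1000 is the parser's own limit.
	fmt.Println(parses(512), parses(1000), parses(1001))
}
```

A bound of 1000 still parses while 1001 does not, so a cap of 4096 could never be the binding check, and a cap of 512 leaves the band from 513 to 1000 for `walk` to reject.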
--- a/core/services/routing/piipattern/piipattern_test.go
+++ b/core/services/routing/piipattern/piipattern_test.go
@@ -1,6 +1,7 @@
 package piipattern

 import (
+	"fmt"
 	"strings"
 	"testing"

@@ -36,6 +37,45 @@ var _ = Describe("ValidatePattern", func() {
 	)
 })

+var _ = Describe("MaxQuantifier guard (must stay live, not dead code)", func() {
+	// Go's regexp/syntax hard-caps repeat bounds at 1000 and rejects anything
+	// larger at Parse time, before walk() runs. So the walk() {n,m} guard only
+	// fires for bounds in (MaxQuantifier, 1000]; if MaxQuantifier ever creeps
+	// to >= 1000 the guard becomes unreachable dead code. These specs pin the
+	// relationship and prove the guard is the binding constraint in that band.
+	const stdlibRepeatCap = 1000
+
+	It("is strictly below the stdlib repeat cap so the guard is reachable", func() {
+		Expect(MaxQuantifier).To(BeNumerically("<", stdlibRepeatCap),
+			"MaxQuantifier must be < %d or walk()'s {n,m} guard is dead code (Parse rejects larger bounds first)", stdlibRepeatCap)
+	})
+
+	It("accepts a bound at exactly MaxQuantifier", func() {
+		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier))).To(Succeed())
+	})
+
+	It("rejects a bound just above MaxQuantifier with our actionable error (proves the guard runs)", func() {
+		// MaxQuantifier+1 is still parseable (<= stdlib cap), so it reaches
+		// walk(), where our guard — not the parser — rejects it.
+		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, MaxQuantifier+1))
+		Expect(err).To(HaveOccurred())
+		Expect(err.Error()).To(ContainSubstring("bound is too large"),
+			"a bound in (MaxQuantifier, stdlib cap] must be rejected by walk(), not the parser")
+	})
+
+	It("rejects an unbounded {n,} whose lower bound exceeds MaxQuantifier", func() {
+		err := ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d,}`, MaxQuantifier+1))
+		Expect(err).To(HaveOccurred())
+		Expect(err.Error()).To(ContainSubstring("bound is too large"))
+	})
+
+	It("still fails closed above the stdlib cap (Parse rejects before walk)", func() {
+		// >1000: caught by syntax.Parse; the message is the parser's, but it
+		// still fails closed — defence in depth.
+		Expect(ValidatePattern(fmt.Sprintf(`sk-ant-[A-Za-z0-9]{%d}`, stdlibRepeatCap+1))).NotTo(Succeed())
+	})
+})
+
 var _ = Describe("Compile", func() {
 	It("compiles a valid pattern with leftmost-longest semantics", func() {
 		re, err := Compile(`sk-ant-[A-Za-z0-9_-]{4,}`)
--- a/docs/content/features/distributed-mode.md
+++ b/docs/content/features/distributed-mode.md
@@ -311,7 +311,7 @@ Used by the WebUI and admin API consumers. Requires admin authentication.
 | `POST` | `/api/nodes/:id/models/unload` | Unload a model from a worker |
 | `POST` | `/api/nodes/:id/models/delete` | Delete model files from a worker |

-The **Nodes** page in the React WebUI provides a visual overview of all registered workers, their statuses, and loaded models.
+The **Nodes** page in the React WebUI provides a visual overview of all registered workers, their statuses, and loaded models. The page opens with a one-line **cluster pulse** summarising node health and an **attention callout** that surfaces nodes needing action (for example, pending approvals). Below that, a roster of **node panels** lists each worker with its model chips inline (no expand click needed) and can be filtered with an **All / Backend / Agent** segmented control. Selecting a panel opens a dedicated **node detail page** at `/app/nodes/:id` with per-node metrics, models, and backend actions. Model scheduling lives on its own **Scheduling** page (separate nav item), not as a tab on the Nodes page.

 ## Node Approval

@@ -554,7 +554,7 @@ local-ai worker \

 ## Model Scheduling

-Model scheduling controls where models are placed and how many replicas are maintained. It combines two optional features:
+Model scheduling controls where models are placed and how many replicas are maintained. In the React WebUI it has its own **Scheduling** page (a top-level nav item, separate from the Nodes page). It combines two optional features:

 ### Node Selectors

--- a/docs/content/features/face-recognition.md
+++ b/docs/content/features/face-recognition.md
@@ -7,93 +7,16 @@ url = "/features/face-recognition/"

 ![Face recognition: 1:N match against a vector store, with an anti-spoofing liveness gate that can veto a verification](/images/diagrams/face-recognition-flow.png)

-LocalAI supports face recognition: face verification (1:1), face
-identification (1:N) against a built-in vector store, face embedding,
-face detection, demographic analysis (age / gender), and antispoofing /
-liveness detection.
+LocalAI supports face recognition through the `insightface` backend:
+face verification (1:1), face identification (1:N) against a built-in
+vector store, face embedding, face detection, demographic analysis
+(age / gender), and antispoofing / liveness detection.

-The same `/v1/face/*` HTTP API is served by two backends:
+The backend ships **two interchangeable engines** under one image, each
+paired with a distinct gallery entry so users can pick by license and
+accuracy needs.

- **`face-detect` (recommended, default).** A standalone C++/ggml
-  engine ([face-detect.cpp](https://github.com/mudler/face-detect.cpp)):
-  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
-  single self-describing GGUF. This is the recommended option for new
-  deployments.
- **`insightface` (Python).** The original ONNX Runtime backend. Still
-  supported; see [the Python backend](#insightface-python-backend) below.
-
-Both backends expose the identical wire format, so the API examples in
-this page work with either - only the gallery entry name (the `model`
-field) changes.
-
-## face-detect (ggml) backend
-
-The `face-detect` backend reads the detector and recognizer architecture
-(`facedetect.arch`) directly from the GGUF metadata, so installing a
-gallery entry is all that is needed to select an engine. It drives the
-Embeddings / Detect / FaceVerify / FaceAnalyze gRPC rpcs behind the
-`/v1/face/{embed,verify,analyze,detect,register,identify,forget}`
-endpoints.
-
-### Licensing - read this first
-
-| Gallery entry | Detector + recognizer | Embedding dim | License |
-|---|---|---|---|
-| `face-detect-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** (upstream insightface weights) |
-| `face-detect-buffalo-m` | SCRFD-2.5GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** |
-| `face-detect-buffalo-s` | SCRFD-500MF + MBF + GenderAge | 512 | **Non-commercial research only** |
-| `face-detect-yunet-sface` | YuNet + SFace (OpenCV Zoo) | 128 | **Apache 2.0 - commercial-safe** |
-
-The insightface buffalo packs (buffalo_l / buffalo_m / buffalo_s) are
-released by the upstream maintainers for **non-commercial research use
-only**. Pick the `face-detect-yunet-sface` entry for production /
-commercial deployments.
-
-### Quickstart
-
-Install the commercial-safe entry (recommended for copy-paste):
-
-```bash
-local-ai models install face-detect-yunet-sface
-```
-
-Verify that two images depict the same person:
-
-```bash
-curl -sX POST http://localhost:8080/v1/face/verify \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "face-detect-yunet-sface",
-    "img1": "https://example.com/alice_1.jpg",
-    "img2": "https://example.com/alice_2.jpg"
-  }'
-```
-
-Detect faces and analyze demographics (buffalo entries populate
-age / gender; YuNet + SFace returns regions only):
-
-```bash
-curl -sX POST http://localhost:8080/v1/face/detect \
-  -H "Content-Type: application/json" \
-  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/group.jpg"}'
-
-curl -sX POST http://localhost:8080/v1/face/analyze \
-  -H "Content-Type: application/json" \
-  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/alice.jpg"}'
-```
-
-The 1:N register / identify / forget workflow and the rest of the API
-are identical to the [API reference](#api-reference) below - just pass a
-`face-detect-*` model name. The per-engine verify thresholds are ~0.35
-for the buffalo ArcFace/MBF recognizers and ~0.363 for SFace.
-
-## insightface (Python) backend
-
-The `insightface` backend ships **two interchangeable engines** under
-one image, each paired with a distinct gallery entry so users can pick
-by license and accuracy needs.
-
-### Licensing - read this first
+## Licensing — read this first

 | Gallery entry | Detector + recognizer | Size | License |
 |---|---|---|---|
--- a/docs/content/features/voice-recognition.md
+++ b/docs/content/features/voice-recognition.md
@@ -7,92 +7,16 @@ url = "/features/voice-recognition/"

 ![Voice recognition: register, identify, and forget voiceprints in a vector store, for 1:1 verify or 1:N identify](/images/diagrams/voice-recognition-flow.png)

-LocalAI supports voice (speaker) recognition: speaker verification
-(1:1), speaker identification (1:N) against a built-in vector store,
-speaker embedding, and demographic analysis (age / gender / emotion
-from voice).
+LocalAI supports voice (speaker) recognition through the
+`speaker-recognition` backend: speaker verification (1:1), speaker
+identification (1:N) against a built-in vector store, speaker
+embedding, and demographic analysis (age / gender / emotion from
+voice).

 The audio analog to [Face Recognition](/features/face-recognition/),
-served over the same `/v1/voice/*` HTTP API by two backends:
+following the same two-engine pattern under one image.

- **`voice-detect` (recommended, default).** A standalone C++/ggml
-  engine ([voice-detect.cpp](https://github.com/mudler/voice-detect.cpp)):
-  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
-  single self-describing GGUF. This is the recommended option for new
-  deployments.
- **`speaker-recognition` (Python).** The original SpeechBrain / ONNX
-  backend. Still supported; see [the Python backend](#speaker-recognition-python-backend)
-  below.
-
-Both backends expose the identical wire format, so the API examples on
-this page work with either - only the gallery entry name (the `model`
-field) changes.
-
-## voice-detect (ggml) backend
-
-The `voice-detect` backend reads the embedding (or analysis)
-architecture (`voicedetect.arch`) directly from the GGUF metadata, so
-installing a gallery entry is all that is needed to select an engine. It
-drives the VoiceEmbed / VoiceVerify / VoiceAnalyze gRPC rpcs behind the
-`/v1/voice/{embed,verify,analyze,register,identify,forget}` endpoints.
-
-### Gallery entries
-
-| Gallery entry | Model | Embedding dim | License |
-|---|---|---|---|
-| `voice-detect-ecapa-tdnn` | SpeechBrain ECAPA-TDNN (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
-| `voice-detect-wespeaker-resnet34` | WeSpeaker ResNet34 (VoxCeleb) | 256 | CC-BY-4.0 |
-| `voice-detect-eres2net` | 3D-Speaker ERes2Net (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
-| `voice-detect-campplus` | 3D-Speaker CAM++ (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
-| `voice-detect-emotion-wav2vec2` | audEERING wav2vec2 (age / gender / emotion) | analyze head | **CC-BY-NC-SA-4.0 - non-commercial** |
-
-The four speaker-recognition entries drive verify / embed / identify.
-`voice-detect-emotion-wav2vec2` is the analysis head behind
-`/v1/voice/analyze` (continuous age estimate plus gender and emotion
-class scores) and is **non-commercial / research use only**.
-
-### Quickstart
-
-Install the default entry (recommended for copy-paste):
-
-```bash
-local-ai models install voice-detect-ecapa-tdnn
-```
-
-Verify that two audio clips were spoken by the same person:
-
-```bash
-curl -sX POST http://localhost:8080/v1/voice/verify \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "voice-detect-ecapa-tdnn",
-    "audio1": "https://example.com/alice_1.wav",
-    "audio2": "https://example.com/alice_2.wav"
-  }'
-```
-
-Analyze age / gender / emotion (install the analyze entry first):
-
-```bash
-local-ai models install voice-detect-emotion-wav2vec2
-
-curl -sX POST http://localhost:8080/v1/voice/analyze \
-  -H "Content-Type: application/json" \
-  -d '{"model": "voice-detect-emotion-wav2vec2", "audio": "https://example.com/alice.wav"}'
-```
-
-The 1:N register / identify / forget workflow and the rest of the API
-are identical to the [API reference](#api-reference) below - just pass a
-`voice-detect-*` model name. The default verify threshold is ~0.25 for
-the ECAPA-TDNN / ERes2Net / CAM++ recognizers and ~0.30 for WeSpeaker
-ResNet34.
-
-## speaker-recognition (Python) backend
-
-The `speaker-recognition` backend follows the same two-engine pattern
-under one image.
-
-### Engines
+## Engines

 | Gallery entry | Model | Size | License |
 |---|---|---|---|
--- a/docs/content/getting-started/models.md
+++ b/docs/content/getting-started/models.md
@@ -131,6 +131,10 @@ local-ai run ollama://gemma:2b
 local-ai run oci://localai/phi-2:latest
 ```

+{{% notice note %}}
+When pulling models from Ollama or OCI registries, LocalAI identifies itself with a `LocalAI/<version>` `User-Agent` header so registry operators can attribute usage to LocalAI.
+{{% /notice %}}
+
 ### Run Models via URI

 To run models via URI, specify a URI to a model file or a configuration file when starting LocalAI. Valid syntax includes:
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -97,8 +97,6 @@ All backends listed here can be installed on demand from the [Backend Gallery]({
 | [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
 | [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
 | [sam3.cpp](https://github.com/PABannier/sam3.cpp) | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
-| [face-detect.cpp](https://github.com/mudler/face-detect.cpp) | Native face detection, recognition, embedding, demographics and anti-spoofing (SCRFD/ArcFace, YuNet/SFace) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
-| [voice-detect.cpp](https://github.com/mudler/voice-detect.cpp) | Native speaker (voice) recognition and voice analysis (ECAPA-TDNN, WeSpeaker, ERes2Net, CAM++, wav2vec2) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
 | [insightface](https://github.com/deepinsight/insightface) | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
 | [speaker-recognition](https://speechbrain.github.io/) | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |

--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1,4 +1,142 @@
 ---
+- name: "qwythos-9b-claude-mythos-5-1m"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF
+  description: |
+    # Qwythos-9B
+
+    **Developed by Empero**
+
+    **Qwythos-9B** is a full-parameter reasoning model built on top of a **deeply uncensored Qwen3.5-9B base** and post-trained on **over 500 million tokens** of high-quality Claude Mythos and Claude Fable traces, with chain-of-thought generated in-house by Empero AI's internal tool **rethink**.
+
+    The result is a compact, fast, **dramatically more capable** 9B reasoning model. Headline capabilities:
+
+    ...
+  license: "apache-2.0"
+  tags:
+    - llm
+    - gguf
+    - vision
+    - multimodal
+    - reasoning
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    mmproj: llama-cpp/mmproj/Qwythos-9B-Claude-Mythos-5-1M-GGUF/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
+    options:
+      - use_jinja:true
+      - spec_type:draft-mtp
+      - spec_n_max:6
+      - spec_p_min:0.75
+    parameters:
+      model: llama-cpp/models/Qwythos-9B-Claude-Mythos-5-1M-GGUF/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/Qwythos-9B-Claude-Mythos-5-1M-GGUF/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
+      sha256: 24ee22e0f5d9f0d3d615809607f365c728d9b0c3f3fb6eb19d8bd83a1c2933d8
+      uri: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF/resolve/main/Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf
+    - filename: llama-cpp/mmproj/Qwythos-9B-Claude-Mythos-5-1M-GGUF/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
+      sha256: f70dc3509053962b0d0d3ee8a7eacebf5d60aa560cad78254ae8698516ae029f
+      uri: https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF/resolve/main/mmproj-Qwythos-9B-Claude-Mythos-5-1M-f16.gguf
+- name: "glm-5.2"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/unsloth/GLM-5.2-GGUF
+  description: |
+    # GLM-5.2
+
+    👋 Join our WeChat or Discord community.
+
+    📖 Check out the GLM-5.2 blog and GLM-5 Technical report.
+
+    📍 Use GLM-5.2 API services on Z.ai API Platform.
+
+    🔜 Try GLM-5.2 here.
+
+    [Paper]
+    [GitHub]
+
+    ## Introduction
+
+    We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a **solid 1M-token context**. GLM-5.2's new capabilities include:
+      - **Solid 1M Context:** A solid 1M-token context that stably sustains long-horizon work
+      - **Advanced Coding with Flexible Effort**: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
+      - **Improved Architecture**: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
+      - **Pure Open**: An MIT open-source license — no regional limits, technical access without borders
+
+    ## Benchmark
+
+    ## Serve GLM-5.2 Locally
+
+    ...
+  license: "mit"
+  tags:
+    - llm
+    - gguf
+  icon: https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/bench_52.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    options:
+      - use_jinja:true
+      - spec_type:draft-mtp
+      - spec_n_max:6
+      - spec_p_min:0.75
+    parameters:
+      min_p: 0.01
+      model: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
+      repeat_penalty: 1
+      temperature: 1
+      top_k: -1
+      top_p: 0.95
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
+      sha256: 3256ac8c290273f0965ff39e93a8bcd07dc99bcd23e923bd4b7306ef39061038
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00001-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00002-of-00011.gguf
+      sha256: 1020105e78d862988a6cabb3a78eafa75f29666ab8a5fd10de1b9b8c8a6bc5e8
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00002-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00003-of-00011.gguf
+      sha256: 0b36f406e120759290894ea4960d5086f9b362a8c8f9c7fcaad24b4471172efb
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00003-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00004-of-00011.gguf
+      sha256: 04b19199f52ba29e7f9966b15df3fbc2d1e5c56cd6343c405076be7174d49d32
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00004-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00005-of-00011.gguf
+      sha256: 5cb76d724ee16e80c1cb6aba29aacd76161e7a6f147079be3447501c06d95f2c
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00005-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00006-of-00011.gguf
+      sha256: ec2c65255c834b686f066e350bc5b8d8a7020cd1133f0ee9e819d2fb5d3afad0
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00006-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00007-of-00011.gguf
+      sha256: 53c8328852ca0b6791a9a9243bcc56157305adca8526a646054389845e7445a9
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00007-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00008-of-00011.gguf
+      sha256: 9a23bfb21c5f6fcc94b0329c108ec1ef3fdbd815c57eeb0bf105d26861d7271e
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00008-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00009-of-00011.gguf
+      sha256: 71088054fb1a09a4f38e2ee8a726526790660a4f77ead817f75cb7a484bdb0b8
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00009-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00010-of-00011.gguf
+      sha256: 848db99658faf24971df23638281305a15bdc187cbcaed968952ed9e9c835b50
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00010-of-00011.gguf
+    - filename: llama-cpp/models/GLM-5.2-GGUF/GLM-5.2-UD-Q4_K_M-00011-of-00011.gguf
+      sha256: 629e23bce250fb500d9a190de7249c2882af524aacc112ce507a871ed5bebf90
+      uri: https://huggingface.co/unsloth/GLM-5.2-GGUF/resolve/main/UD-Q4_K_M/GLM-5.2-UD-Q4_K_M-00011-of-00011.gguf
 - name: "qwen3.6-35b-a3b-nvfp4-mtp"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
@@ -1114,6 +1252,98 @@
    - filename: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
      sha256: 01b76572f80b7d2ebee80a27cb9c3699c26b04cae1c402eee7664fc17a4b5ce6
      uri: https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf
+- name: "privacy-filter-nemotron"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
+  urls:
+    - https://huggingface.co/OpenMed/privacy-filter-nemotron
+    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
+  description: |
+    A fine-grained English PII token-classification model: a fine-tune of
+    openai/privacy-filter by OpenMed on NVIDIA's Nemotron-PII dataset. It labels
+    every token with a BIOES tag over 55 PII categories (221 classes), trading
+    the multilingual sibling's language breadth for category depth - identity,
+    contact, address, dates, government IDs, financial, healthcare, enterprise,
+    vehicle and digital entities (including api_key, ipv4/ipv6 and mac_address).
+    For multilingual text prefer privacy-filter-multilingual instead.
+
+    In LocalAI this is a PII detector for the NER redactor tier: set
+    known_usecases to [token_classify] (as below); any other model then opts into
+    redaction by listing this one under pii.detectors. The detection policy
+    (which categories to mask vs block, and the score threshold) lives on this
+    model's own pii_detection block - see the overrides below. It runs locally
+    with no Python, served by the standalone privacy-filter backend's
+    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
+    entity spans).
+
+    Architecture: gpt-oss-style sparse MoE (8 layers, d_model 640, 128 experts
+    top-4, ~1.5B total / ~50M active per token), bidirectional banded attention,
+    o200k tokenizer and a 221-way token-classification head; served via the
+    openai-privacy-filter architecture. F16, ~2.8 GB. (A smaller Q8_0 quant
+    exists on the GGUF repo for RAM-constrained use - validate it on your own
+    data, since for PII a single dropped span is a leak.)
+  license: apache-2.0
+  tags:
+    - token-classification
+    - ner
+    - pii
+    - privacy
+    - nemotron
+    - gguf
+  overrides:
+    backend: privacy-filter
+    embeddings: true
+    known_usecases:
+      - token_classify
+    parameters:
+      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
+    pii_detection:
+      min_score: 0.5
+      default_action: mask
+  files:
+    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-f16.gguf
+      sha256: 70dfe91ff220ff04594168a83e296dcc2054449cde77f98d0e782edbb6a31f5a
+      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-f16.gguf
+- name: "privacy-filter-nemotron-q8"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
+  urls:
+    - https://huggingface.co/OpenMed/privacy-filter-nemotron
+    - https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF
+  description: |
+    Q8_0 quant of privacy-filter-nemotron (~1.64 GB, vs ~2.8 GB for F16) for
+    RAM-constrained / edge use (e.g. a 4 GB Raspberry Pi 5). The MoE expert
+    weights are stored 8-bit; attention, embeddings and the classifier head
+    stay F16. Same model, policy and runtime as the F16 entry - see
+    privacy-filter-nemotron for the full description.
+
+    Prefer the F16 entry when you can afford it: it is the reference artifact.
+    On a mixed-PII document the publisher measured q8 matching F16 on 99.93% of
+    token labels with an identical span set at threshold 0.5 - but one token
+    flipped, and for PII a single dropped span is a leak. Treat q8 as a
+    deliberate size/speed tradeoff and validate it on your own data.
+  license: apache-2.0
+  tags:
+    - token-classification
+    - ner
+    - pii
+    - privacy
+    - nemotron
+    - gguf
+  overrides:
+    backend: privacy-filter
+    embeddings: true
+    known_usecases:
+      - token_classify
+    parameters:
+      model: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
+    pii_detection:
+      min_score: 0.5
+      default_action: mask
+  files:
+    - filename: privacy-filter/models/privacy-filter-nemotron/privacy-filter-nemotron-q8.gguf
+      sha256: 2ec11c154e572a2686f4d77e861b7f74e6917e09638fe9bd27156d48bd99e21a
+      uri: https://huggingface.co/LocalAI-io/privacy-filter-nemotron-GGUF/resolve/main/privacy-filter-nemotron-q8.gguf
 - name: "secret-filter"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  description: |
@@ -8449,248 +8679,6 @@
    - filename: MiniFASNetV1SE.onnx
      sha256: ebab7f90c7833fbccd46d3a555410e78d969db5438e169b6524be444862b3676
      uri: https://github.com/yakhyo/face-anti-spoofing/releases/download/weights/MiniFASNetV1SE.onnx
- name: face-detect-buffalo-l
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/deepinsight/insightface
-  description: |
-    Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF
-    detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and
-    shipped as a single GGUF for the `face-detect` backend. Highest
-    accuracy of the buffalo line.
-
-    No Python / onnxruntime / torch runtime: face-detect.cpp reads the
-    detector and embedder architecture (`facedetect.arch`) directly from
-    the GGUF metadata, so installing this entry is all that is needed to
-    select buffalo_l. Drives the Embedding / Detect / FaceVerify /
-    FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect}
-    REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof
-    ensemble, available via the FaceVerify `anti_spoof` request flag.
-    NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see
-    `face-detect-yunet-sface`.
-  license: insightface-non-commercial
-  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - research-only
-    - gpu
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.35
-    parameters:
-      model: face-detect-buffalo-l.gguf
-  files:
-    - filename: face-detect-buffalo-l.gguf
-      sha256: 6ed070f6e569beeed542ddd5603bcbc9eb8ea57f728f7d8013d6a90b2b952116
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_l.gguf
- name: face-detect-buffalo-m
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/deepinsight/insightface
-  description: |
-    Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF
-    detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF
-    for the `face-detect` backend. Same recognition accuracy as
-    `buffalo_l` with a cheaper detector: a good balance on mid-range
-    hardware.
-
-    The architecture (`facedetect.arch`) is read from the GGUF metadata,
-    so this entry alone selects the buffalo_m engine. This GGUF also
-    embeds the MiniFASNet anti-spoof ensemble, available via the
-    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
-    ONLY.
-  license: insightface-non-commercial
-  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - research-only
-    - gpu
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.35
-    parameters:
-      model: face-detect-buffalo-m.gguf
-  files:
-    - filename: face-detect-buffalo-m.gguf
-      sha256: 0f7527eeb97b88719bf7e11e43ab8af6f05999357d767f8dde53db3c586c1c3f
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_m.gguf
- name: face-detect-buffalo-s
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/deepinsight/insightface
-  description: |
-    Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF
-    detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the
-    `face-detect` backend. Small and CPU-friendly: a good fit for
-    mid-range and edge deployments.
-
-    The architecture (`facedetect.arch`) is read from the GGUF metadata,
-    so this entry alone selects the buffalo_s engine. This GGUF also
-    embeds the MiniFASNet anti-spoof ensemble, available via the
-    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
-    ONLY.
-  license: insightface-non-commercial
-  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - research-only
-    - edge
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.35
-    parameters:
-      model: face-detect-buffalo-s.gguf
-  files:
-    - filename: face-detect-buffalo-s.gguf
-      sha256: 7490b1efbc8746b188a5aef0adf5e3d1a2dc9607abd474018893f95571999969
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_s.gguf
- name: face-detect-buffalo-sc
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/deepinsight/insightface
-  description: |
-    Face recognition with insightface's `buffalo_sc` pack (SCRFD-500M
-    detector + a small ArcFace embedder), converted to a C++/ggml GGUF
-    for the `face-detect` backend. This is the smallest insightface
-    pack: the lightest option for low-resource and edge deployments.
-
-    The architecture (`facedetect.arch`) is read from the GGUF metadata,
-    so this entry alone selects the buffalo_sc engine. If this GGUF
-    embeds the MiniFASNet anti-spoof ensemble, it is available via the
-    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
-    ONLY.
-  license: insightface-non-commercial
-  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - research-only
-    - edge
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.35
-    parameters:
-      model: face-detect-buffalo-sc.gguf
-  files:
-    - filename: face-detect-buffalo-sc.gguf
-      sha256: f754c0e32d5efbbc53d7efca13be2807676bf5db20a8594ef96b32afa2c482b1
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/buffalo_sc.gguf
- name: face-detect-antelopev2
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/deepinsight/insightface
-  description: |
-    Face recognition with insightface's `antelopev2` pack (SCRFD-10G
-    detector + ArcFace glint360k R100, 512-d embedder), converted to a
-    C++/ggml GGUF for the `face-detect` backend. The higher-accuracy
-    insightface pack: heavier, but the best fit when recognition
-    quality matters more than speed.
-
-    The architecture (`facedetect.arch`) is read from the GGUF metadata,
-    so this entry alone selects the antelopev2 engine. If this GGUF
-    embeds the MiniFASNet anti-spoof ensemble, it is available via the
-    FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE
-    ONLY.
-  license: insightface-non-commercial
-  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - research-only
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.35
-    parameters:
-      model: face-detect-antelopev2.gguf
-  files:
-    - filename: face-detect-antelopev2.gguf
-      sha256: 245e657e51754fbf075dd43d80a80a2d14a60c2fc42a3220f63eef17a315e96c
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/antelopev2.gguf
- name: face-detect-yunet-sface
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/face-detect.cpp
-    - https://github.com/opencv/opencv_zoo
-  description: |
-    Face recognition with OpenCV Zoo weights: YuNet detector + SFace
-    128-d recognizer, converted to a C++/ggml GGUF for the `face-detect`
-    backend. APACHE 2.0: safe for commercial use. Lower accuracy than the
-    buffalo packs and no demographic head, but the commercial-friendly
-    alternative to the insightface buffalo line.
-
-    The architecture (`facedetect.arch`) is read from the GGUF metadata,
-    so this entry alone selects the YuNet + SFace engine.
-  license: apache-2.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - face-recognition
-    - face-verification
-    - face-embedding
-    - commercial-ok
-    - gpu
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: face-detect
-    known_usecases:
-      - face_recognition
-      - detection
-      - embeddings
-    options:
-      - verify_threshold:0.363
-    parameters:
-      model: face-detect-yunet-sface.gguf
-  files:
-    - filename: face-detect-yunet-sface.gguf
-      sha256: 9ce78d4ba0ae9d5e8c91a0e145d511558d1d90f5d9c1f4131cca9bb4bce60902
-      uri: https://huggingface.co/mudler/face-detect-gguf/resolve/main/yunet-sface.gguf
 - name: speechbrain-ecapa-tdnn
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
@@ -8760,217 +8748,6 @@
    - filename: wespeaker_voxceleb_resnet34.onnx
      sha256: 7bb2f06e9df17cdf1ef14ee8a15ab08ed28e8d0ef5054ee135741560df2ec068
      uri: https://huggingface.co/Wespeaker/wespeaker-voxceleb-resnet34-LM/resolve/main/voxceleb_resnet34_LM.onnx
- name: voice-detect-ecapa-tdnn
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-    - https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
-  description: |
-    Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained
-    on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the
-    `voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal
-    Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe.
-
-    No Python / torch runtime: voice-detect.cpp reads the embedding
-    architecture (`voicedetect.arch`) directly from the GGUF metadata,
-    so installing this entry is all that is needed to select ECAPA-TDNN.
-    Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the
-    /v1/voice/{verify,embed,register,identify,forget} REST endpoints.
-  license: apache-2.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - speaker-verification
-    - speaker-embedding
-    - commercial-ok
-    - cpu
-    - gpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    options:
-      - verify_threshold:0.25
-    parameters:
-      model: voice-detect-ecapa-tdnn-voxceleb.gguf
-  files:
-    - filename: voice-detect-ecapa-tdnn-voxceleb.gguf
-      sha256: 68046a1fdfb7843f460962db4739fbd381cc5c3ab93d1505e75e2f4c0dc19b8f
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/ecapa-tdnn-voxceleb.gguf
- name: voice-detect-wespeaker-resnet34
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-    - https://github.com/wenet-e2e/wespeaker
-  description: |
-    Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb,
-    converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d
-    embeddings, CPU-friendly and runtime-free (no onnxruntime or torch).
-    CC-BY-4.0.
-
-    Use when you want WeSpeaker's ResNet34 topology instead of
-    ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read
-    from the GGUF metadata, so this entry alone selects the engine.
-  license: cc-by-4.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - speaker-verification
-    - speaker-embedding
-    - commercial-ok
-    - edge
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    options:
-      - verify_threshold:0.25
-    parameters:
-      model: voice-detect-wespeaker-resnet34.gguf
-  files:
-    - filename: voice-detect-wespeaker-resnet34.gguf
-      sha256: 72040372494eafec299836bc1977cfc13c603cb486674ed59b0f4c03758d29da
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/wespeaker-resnet34-voxceleb.gguf
- name: voice-detect-eres2net
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-    - https://huggingface.co/iic/speech_eres2net_sv_en_voxceleb_16k
-  description: |
-    Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb,
-    converted to a C++/ggml GGUF for the `voice-detect` backend.
-    192-d embeddings with strong verification accuracy. APACHE 2.0.
-
-    The embedding architecture (`voicedetect.arch`) is read from the
-    GGUF metadata, so this entry alone selects the ERes2Net engine.
-  license: apache-2.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - speaker-verification
-    - speaker-embedding
-    - commercial-ok
-    - cpu
-    - gpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    options:
-      - verify_threshold:0.25
-    parameters:
-      model: voice-detect-eres2net.gguf
-  files:
-    - filename: voice-detect-eres2net.gguf
-      sha256: d39f53c7a4d39734740a86a07521b9a819ee8ea56c1a9436eba611ab733a3d06
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/eres2net-base-zh-cn.gguf
- name: voice-detect-campplus
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-    - https://huggingface.co/iic/speech_campplus_sv_en_voxceleb_16k
-  description: |
-    Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb,
-    converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d
-    embeddings, a fast context-aware masking topology well-suited to
-    CPU and edge deployments. APACHE 2.0.
-
-    The embedding architecture (`voicedetect.arch`) is read from the
-    GGUF metadata, so this entry alone selects the CAM++ engine.
-  license: apache-2.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - speaker-verification
-    - speaker-embedding
-    - commercial-ok
-    - edge
-    - cpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    options:
-      - verify_threshold:0.25
-    parameters:
-      model: voice-detect-campplus.gguf
-  files:
-    - filename: voice-detect-campplus.gguf
-      sha256: a6e34c6d230cff26e37b71a2df0907fde1de425654e28d9d5cacca32e02a13d3
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/campplus-zh-cn.gguf
- name: voice-detect-emotion-wav2vec2
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://github.com/mudler/voice-detect.cpp
-    - https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
-  description: |
-    Voice analysis (age / gender / emotion) with audEERING's wav2vec2
-    model, converted to a C++/ggml GGUF for the `voice-detect` backend.
-    Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST
-    endpoint, returning a continuous age estimate plus gender and
-    emotion class scores for a single utterance. CC-BY-NC-SA-4.0 -
-    research / non-commercial use only.
-
-    The analysis architecture (`voicedetect.arch`) is read from the
-    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
-  license: cc-by-nc-sa-4.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - voice-analysis
-    - emotion-recognition
-    - cpu
-    - gpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    parameters:
-      model: voice-detect-emotion-wav2vec2.gguf
-  files:
-    - filename: voice-detect-emotion-wav2vec2.gguf
-      sha256: 9e9793e4f77a27f4ae068bcb29c2b6fe2f74881799e2cfea0f8e436ad3765e50
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/emotion-wav2vec2-superb-er.gguf
- name: voice-detect-age-gender-wav2vec2
-  url: github:mudler/LocalAI/gallery/virtual.yaml@master
-  urls:
-    - https://huggingface.co/audeering/wav2vec2-large-robust-24-ft-age-gender
-    - https://github.com/mudler/voice-detect.cpp
-  description: |
-    wav2vec2-large-robust age + gender analysis head
-    (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a
-    C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze
-    gRPC rpc and the /v1/voice/analyze REST endpoint, returning a
-    continuous age estimate plus gender class scores for a single
-    utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only.
-
-    The analysis architecture (`voicedetect.arch`) is read from the
-    GGUF metadata, so this entry alone selects the wav2vec2 analyze head.
-  license: cc-by-nc-sa-4.0
-  icon: https://avatars.githubusercontent.com/u/95302084
-  tags:
-    - voice-recognition
-    - voice-analysis
-    - research-only
-    - cpu
-    - gpu
-  last_checked: "2026-06-22"
-  overrides:
-    backend: voice-detect
-    known_usecases:
-      - speaker_recognition
-    parameters:
-      model: voice-detect-age-gender-wav2vec2.gguf
-  files:
-    - filename: voice-detect-age-gender-wav2vec2.gguf
-      sha256: d92486b3f1ea7baf6a90f1026b7b8e9848b3a8332bccfb01cc8889eed7069064
-      uri: https://huggingface.co/mudler/voice-detect-gguf/resolve/main/age-gender-wav2vec2-audeering.gguf
 - name: rfdetr-base
  url: github:mudler/LocalAI/gallery/virtual.yaml@master
  urls:
--- a/pkg/oci/blob.go
+++ b/pkg/oci/blob.go
@@ -11,6 +11,8 @@ import (

 	oras "oras.land/oras-go/v2"
 	"oras.land/oras-go/v2/registry/remote"
+	"oras.land/oras-go/v2/registry/remote/auth"
+	"oras.land/oras-go/v2/registry/remote/retry"
 )

 func FetchImageBlob(ctx context.Context, r, reference, dst string, statusReader func(ocispec.Descriptor) io.Writer) error {
@@ -28,6 +30,16 @@ func FetchImageBlob(ctx context.Context, r, reference, dst string, statusReader
 	}
 	repo.SkipReferrersGC = true

+	// Identify LocalAI to the registry. This mirrors oras' auth.DefaultClient
+	// (same retry policy) but advertises a LocalAI User-Agent instead of the
+	// library default.
+	client := &auth.Client{
+		Client: retry.DefaultClient,
+		Cache:  auth.NewCache(),
+	}
+	client.SetUserAgent(UserAgent())
+	repo.Client = client
+
 	// https://github.com/oras-project/oras/blob/main/cmd/oras/internal/option/remote.go#L364
 	// https://github.com/oras-project/oras/blob/main/cmd/oras/root/blob/fetch.go#L136
 	desc, reader, err := oras.Fetch(ctx, repo.Blobs(), reference, oras.DefaultFetchOptions)
--- a/pkg/oci/image.go
+++ b/pkg/oci/image.go
@@ -176,6 +176,7 @@ func GetImage(targetImage, targetPlatform string, auth *registrytypes.AuthConfig
 	opts := []remote.Option{
 		remote.WithTransport(tr),
 		remote.WithPlatform(*platform),
+		remote.WithUserAgent(UserAgent()),
 	}
 	if auth != nil {
 		opts = append(opts, remote.WithAuth(staticAuth{auth}))
@@ -223,6 +224,7 @@ func GetImageDigest(targetImage, targetPlatform string, auth *registrytypes.Auth
 	opts := []remote.Option{
 		remote.WithTransport(tr),
 		remote.WithPlatform(*platform),
+		remote.WithUserAgent(UserAgent()),
 	}
 	if auth != nil {
 		opts = append(opts, remote.WithAuth(staticAuth{auth}))
--- a/pkg/oci/ollama.go
+++ b/pkg/oci/ollama.go
@@ -47,6 +47,7 @@ func OllamaModelManifest(image string) (*Manifest, error) {
 		return nil, err
 	}
 	req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")
+	req.Header.Set("User-Agent", UserAgent())
 	client := httpclient.New(httpclient.WithFollowRedirects())
 	resp, err := client.Do(req)
 	if err != nil {
--- a/pkg/oci/useragent.go
+++ b/pkg/oci/useragent.go
@@ -0,0 +1,19 @@
+package oci
+
+import (
+	"fmt"
+
+	"github.com/mudler/LocalAI/internal"
+)
+
+// UserAgent returns the User-Agent string LocalAI sends on outbound registry
+// requests (OCI registries and Ollama). It identifies the client as LocalAI
+// and, when the binary was built with a version stamp, appends it so registries
+// can attribute client-side usage to LocalAI rather than to the generic
+// User-Agent of the underlying transport library.
+func UserAgent() string {
+	if internal.Version == "" {
+		return "LocalAI"
+	}
+	return fmt.Sprintf("LocalAI/%s", internal.Version)
+}
--- a/pkg/oci/useragent_test.go
+++ b/pkg/oci/useragent_test.go
@@ -0,0 +1,32 @@
+package oci_test
+
+import (
+	"github.com/mudler/LocalAI/internal"
+	. "github.com/mudler/LocalAI/pkg/oci"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+var _ = Describe("OCI", func() {
+	Context("UserAgent", func() {
+		var savedVersion string
+
+		BeforeEach(func() {
+			savedVersion = internal.Version
+		})
+
+		AfterEach(func() {
+			internal.Version = savedVersion
+		})
+
+		It("identifies as LocalAI when no version is stamped", func() {
+			internal.Version = ""
+			Expect(UserAgent()).To(Equal("LocalAI"))
+		})
+
+		It("appends the build version when one is stamped", func() {
+			internal.Version = "v3.2.1"
+			Expect(UserAgent()).To(Equal("LocalAI/v3.2.1"))
+		})
+	})
+})
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -1021,6 +1021,25 @@ const docTemplate = `{
                }
            }
        },
+        "/api/nodes/models": {
+            "get": {
+                "tags": [
+                    "Nodes"
+                ],
+                "summary": "List all loaded models cluster-wide",
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "array",
+                            "items": {
+                                "$ref": "#/definitions/nodes.NodeModel"
+                            }
+                        }
+                    }
+                }
+            }
+        },
        "/api/nodes/{id}/max-replicas-per-model": {
            "put": {
                "tags": [
@@ -3754,6 +3773,52 @@ const docTemplate = `{
                }
            }
        },
+        "nodes.NodeModel": {
+            "type": "object",
+            "properties": {
+                "address": {
+                    "description": "gRPC address for this replica's backend process",
+                    "type": "string"
+                },
+                "backend_type": {
+                    "description": "e.g. \"llama-cpp\"; used by reconciler to replicate loads",
+                    "type": "string"
+                },
+                "created_at": {
+                    "type": "string"
+                },
+                "id": {
+                    "type": "string"
+                },
+                "in_flight": {
+                    "description": "number of active requests on this replica",
+                    "type": "integer"
+                },
+                "last_used": {
+                    "type": "string"
+                },
+                "loading_by": {
+                    "description": "frontend ID that triggered loading",
+                    "type": "string"
+                },
+                "model_name": {
+                    "type": "string"
+                },
+                "node_id": {
+                    "type": "string"
+                },
+                "replica_index": {
+                    "type": "integer"
+                },
+                "state": {
+                    "description": "loading, loaded, unloading, idle",
+                    "type": "string"
+                },
+                "updated_at": {
+                    "type": "string"
+                }
+            }
+        },
        "proto.MemoryUsageData": {
            "type": "object",
            "properties": {
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -1018,6 +1018,25 @@
                }
            }
        },
+        "/api/nodes/models": {
+            "get": {
+                "tags": [
+                    "Nodes"
+                ],
+                "summary": "List all loaded models cluster-wide",
+                "responses": {
+                    "200": {
+                        "description": "OK",
+                        "schema": {
+                            "type": "array",
+                            "items": {
+                                "$ref": "#/definitions/nodes.NodeModel"
+                            }
+                        }
+                    }
+                }
+            }
+        },
        "/api/nodes/{id}/max-replicas-per-model": {
            "put": {
                "tags": [
@@ -3751,6 +3770,52 @@
                }
            }
        },
+        "nodes.NodeModel": {
+            "type": "object",
+            "properties": {
+                "address": {
+                    "description": "gRPC address for this replica's backend process",
+                    "type": "string"
+                },
+                "backend_type": {
+                    "description": "e.g. \"llama-cpp\"; used by reconciler to replicate loads",
+                    "type": "string"
+                },
+                "created_at": {
+                    "type": "string"
+                },
+                "id": {
+                    "type": "string"
+                },
+                "in_flight": {
+                    "description": "number of active requests on this replica",
+                    "type": "integer"
+                },
+                "last_used": {
+                    "type": "string"
+                },
+                "loading_by": {
+                    "description": "frontend ID that triggered loading",
+                    "type": "string"
+                },
+                "model_name": {
+                    "type": "string"
+                },
+                "node_id": {
+                    "type": "string"
+                },
+                "replica_index": {
+                    "type": "integer"
+                },
+                "state": {
+                    "description": "loading, loaded, unloading, idle",
+                    "type": "string"
+                },
+                "updated_at": {
+                    "type": "string"
+                }
+            }
+        },
        "proto.MemoryUsageData": {
            "type": "object",
            "properties": {
Author	SHA1	Message	Date
dependabot[bot]	7e2b69e895	chore(deps): bump torch in /backend/python/vllm Bumps torch from 2.9.1+cpu to 2.12.1+xpu. --- updated-dependencies: - dependency-name: torch dependency-version: 2.12.1+xpu dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-22 18:33:32 +00:00
Richard Palethorpe	63bcbf6c12	fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401 ) * fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier Follow-up to the NER tier engine (#10360), already on master. This carries only the incremental review fixes and tests that postdate that merge — the feature itself is not re-introduced. Review fixes: - openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText (the `elem < 0` guard above already returns). - application.go: collapse ResolvePIIPolicy's inline re-implementation of PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the "explicit pii.enabled wins, else cloud-proxy default" rule) and return true past the !enabled guard where it is provable. - pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in patternDetector.Detect into one local. - grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire — dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000] is rejected here with an actionable error; >1000 still fails closed via Parse. Specs pin the relationship so the guard can't silently revert. - PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and force the DOM value back when clamping (min={0} only constrained the spinner, so a negative reached saved config and silently disabled the length filter). Tests: - piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead). - model-config.spec.js: assert the min_len clamp, and that entity_actions collapses a duplicate group to a single row (map semantics; regression guard against emitting an array that drops a row on save). - tests/e2e-backends: token_classify capability driving the TokenClassify gRPC RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned spans (entity.Text == text[start:end]) at threshold 0. 
Verified on CPU via `make test-extra-backend-privacy-filter` (3/3 specs). - Makefile: test-extra-backend-privacy-filter wrapper. - tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact (mask + block) through the full HTTP -> detector -> redactor path; gated on PII_NER_MODEL_GGUF so the default suite is unaffected. - .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job running the container harness on CPU. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(gallery): add privacy-filter-nemotron (f16 + q8) GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing privacy-filter-multilingual entry, trading language breadth for category depth. - privacy-filter-nemotron: F16 reference artifact (~2.8 GB). - privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge use; description notes the size/speed tradeoff and to validate on your own data (a single dropped span is a PII leak). Both run on the privacy-filter backend with known_usecases [token_classify] and a default mask policy (min_score 0.5); operators add per-category entity_actions as needed. sha256s taken from the HF repo's LFS object ids. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-22 18:26:19 +02:00
LocalAI [bot]	95b058e1c5	feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447 ) * chore: gitignore SDD scratch directory Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(nodes): add GET /api/nodes/models cluster-wide loaded-models endpoint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): add nodesApi.allModels() for cluster-wide model roster Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): move Scheduling to its own page and nav item Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): replace nodes stat-card strip with cluster pulse + attention callout Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): node-panel roster with inline model chips and segmented filter Replace the Nodes table with a full-width node-panel roster that shows each backend node's running-model chips without an expand click, plus an All/Backend/Agent segmented filter. Per-node detail (models, backends, labels, capacity) moves to the node detail page. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): add deep-linkable node detail page at /app/nodes/:id Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ui): remove em-dash from CapacityEditor comment; align detail spec backend mock Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(ui): nodes page cleanup, hover/chip polish, docs for restructured cluster view Nodes.jsx dead-code sweep confirmed clean (no StatCard/table/expand state/scheduling-form leftovers). 
Two App.css polish fixes: move the node-panel hover border-color onto the bordered element so hover gives real feedback, and add the missing .model-chip__state rule the ModelChip component already emits. Update distributed-mode docs prose to describe the restructured cluster view (cluster pulse, attention callout, node-panel roster with inline model chips, All/Backend/Agent filter, node detail page at /app/nodes/:id, Scheduling as its own page). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(ui): drop unused gpuVendorLabel export from nodeStatus Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 18:24:29 +02:00
LocalAI [bot]	f2abcc7503	chore(model gallery): 🤖 add 1 new models via gallery agent (#10445 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 16:09:16 +02:00
Adira	62c99c10b3	fix(diffusers): pin diffusers and transformers to a known-good pair (#9979 ) (#10442 ) fix(diffusers): pin diffusers and transformers to a known-good pair The diffusers backend tracked git+https://github.com/huggingface/diffusers (main) with an unpinned transformers. transformers v5 restructured CLIPTextModel and removed the .text_model attribute that diffusers' single -file loader reads, so loading any single-file Stable Diffusion checkpoint fails: create_diffusers_clip_model_from_ldm (single_file_utils.py) position_embedding_dim = model.text_model.embeddings.position_embedding... AttributeError: 'CLIPTextModel' object has no attribute 'text_model' No released diffusers (<=0.38.0) supports transformers v5 - only unreleased diffusers main does. Because the requirements tracked main plus an unpinned transformers, every backend image froze whichever pair existed at build time, and images built once transformers v5 shipped but before diffusers main caught up are permanently broken. Pin the last known-good released pair across all requirements files: diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no functionality is lost, and builds become reproducible instead of drifting into the broken window. Fixes #9979 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-22 12:38:06 +02:00
LocalAI [bot]	7226bb9f30	chore: ⬆️ Update CrispStrobe/CrispASR to `7a8cb80907341c0204bd0488c1244764f4163883` (#10315 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 12:21:58 +02:00
LocalAI [bot]	569d9bbd9e	fix(distributed): broadcast file-staging progress across replicas (#10440 ) File-staging progress lived only in the SmartRouter's in-memory StagingTracker on the replica performing the transfer. In a multi-replica deployment behind a round-robin load balancer, a /api/operations poll that lands on any other replica saw no staging row, so the progress ("processing file ... Total ... Current ...") flickered in and out as polls rotated between frontends. Mirror the pattern already used for gallery-install progress: the origin replica broadcasts staging ticks over NATS (SubjectStagingProgress, a new staging.<model>.progress subject), and peers merge them via ApplyRemote (SubscribeBroadcasts on the wildcard). Byte-level ticks are leading-edge debounced (~1/s); Start/FileComplete/Complete always publish. A locally-owned op stays authoritative so the origin's own echo and stray peer events can't clobber it, and mirrored remote ops expire after a TTL so a missed Done event can't leave a phantom row. The UI read path (StagingTracker.GetAll) is unchanged. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 09:28:07 +02:00
LocalAI [bot]	682fb2718c	fix(distributed): detach cold-load staging from the request context (#10438) A model not yet loaded on a worker is staged lazily on the inference request path. Staging a multi-GB model takes minutes - far longer than any client keeps its HTTP request open - so a browser refresh, an ingress/LB idle-timeout, or a round-robined retry landing on another frontend replica cancels the request context and aborts the upload with "context canceled" mid-transfer. Large models then never finish staging, so they never load (observed in a 2-replica deployment: both frontends repeatedly failed to stage a 15.7 GB GGUF, each attempt dying at a different offset). Bind the cold load (staging + LoadModel + the per-model advisory lock) to context.WithoutCancel(ctx): it keeps the request's values (prefix chain) but drops cancellation/deadline. Each long step keeps its own bound (the file stager's resume budget, LoadModel's 5m timeout), and the advisory lock still de-dupes concurrent loaders across replicas. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 09:06:20 +02:00
LocalAI [bot]	20c643e1f6	chore(model gallery): 🤖 add 1 new models via gallery agent (#10439) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 08:46:34 +02:00
VJSai	64a4351f3a	feat: send a LocalAI User-Agent on registry pulls (#10434) LocalAI pulls models from OCI registries (via go-containerregistry), the Ollama registry, and OCI blob stores (via oras), but every request went out with the underlying library's generic User-Agent, so registry operators had no way to attribute traffic to LocalAI. Add an oci.UserAgent() helper that returns "LocalAI" (or "LocalAI/<version>" when the binary is built with a version stamp via internal.Version) and wire it into all three pull paths: - pkg/oci/image.go: remote.WithUserAgent on the go-containerregistry image and digest requests - pkg/oci/ollama.go: a User-Agent header on the Ollama manifest request - pkg/oci/blob.go: a LocalAI User-Agent on the oras blob client. This mirrors oras' auth.DefaultClient (same retry.DefaultClient policy); only the advertised User-Agent changes. Implements #6258. Assisted-by: Claude:claude-opus-4-8 golangci-lint Signed-off-by: Vijay Sai <vijaysaijnv@gmail.com>	2026-06-22 08:44:12 +02:00
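The helper described in 64a4351f3a might look roughly like the sketch below, which uses only the standard library. The names `userAgent` and `newManifestRequest`, the explicit version parameter and the example URL are assumptions made for illustration; the commit itself refers to `oci.UserAgent()` reading `internal.Version`, whose exact shape is not shown in the log.

```go
package main

import (
	"fmt"
	"net/http"
)

// userAgent returns "LocalAI" when no version stamp is available and
// "LocalAI/<version>" otherwise. The version is passed in here; the
// real helper is described as reading a build-time stamp instead.
func userAgent(version string) string {
	if version == "" {
		return "LocalAI"
	}
	return "LocalAI/" + version
}

// newManifestRequest builds a GET request that advertises the LocalAI
// User-Agent instead of the HTTP library's default one.
func newManifestRequest(url, version string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent(version))
	return req, nil
}

func main() {
	// Hypothetical registry URL, used only to construct the request.
	req, err := newManifestRequest("https://registry.example/v2/model/manifests/latest", "v1.2.3")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```

Keeping the string in one helper means all three pull paths advertise the same value, which is the point of the change for registry operators.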
LocalAI [bot]	b7d67f5779	chore: ⬆️ Update ggml-org/llama.cpp to `7c082bc417bbe53210a83df4ba5b49e18ce6193c` (#10417) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 08:43:40 +02:00