feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis

Adds face recognition as a new first-class capability in LocalAI via the `insightface` Python backend, with a pluggable two-engine design so non-commercial (insightface model packs) and commercial-safe (OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.

New gRPC RPCs (backend/backend.proto):
  * FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
  * FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse

Existing Embedding and Detect RPCs are reused (face image in PredictOptions.Images / DetectOptions.src) for face embedding and face detection respectively.

New HTTP endpoints under /v1/face/:
  * verify — 1:1 image pair same-person decision
  * analyze — per-face age + gender (emotion/race reserved)
  * register — 1:N enrollment; stores embedding in vector store
  * identify — 1:N recognition; detect → embed → StoresFind
  * forget — remove a registered face by opaque ID

Service layer (core/services/facerecognition/) introduces a `Registry` interface with one in-memory `storeRegistry` impl backed by LocalAI's existing local-store gRPC vector backend. HTTP handlers depend on the interface, not on StoresSet/StoresFind directly, so a persistent PostgreSQL/pgvector implementation can be slotted in via a single constructor change in core/application (TODO marker in the package doc).

New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired into FLAG_DETECTION so /v1/detection works for face bounding boxes.
Gallery (backend/index.yaml) ships three entries:
  * insightface-buffalo-l — SCRFD-10GF + ArcFace R50 + genderage (~326MB pre-baked; non-commercial research use only)
  * insightface-opencv — YuNet + SFace (~40MB pre-baked; Apache 2.0)
  * insightface-buffalo-s — SCRFD-500MF + MBF (runtime download; non-commercial)

Python backend (backend/python/insightface/):
  * engines.py — FaceEngine protocol with InsightFaceEngine and OnnxDirectEngine; resolves model paths relative to the backend directory so the same gallery config works in docker-scratch and in the e2e-backends rootfs-extraction harness.
  * backend.py — gRPC servicer implementing Health, LoadModel, Status, Embedding, Detect, FaceVerify, FaceAnalyze.
  * install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the backend directory so first-run is offline-clean (the final scratch image only preserves files under /<backend>/).
  * test.py — parametrized unit tests over both engines.

Tests:
  * Registry unit tests (go test -race ./core/services/facerecognition/...) — in-memory fake grpc.Backend, table-driven, covers register/identify/forget/error paths + concurrent access.
  * tests/e2e-backends/backend_test.go extended with face caps (face_detect, face_embed, face_verify, face_analyze); relative ordering + configurable verifyCeiling per engine.
  * Makefile targets: test-extra-backend-insightface-buffalo-l, -opencv, and the -all aggregate.
  * CI: .github/workflows/test-extra.yml gains tests-insightface-grpc, auto-triggered by changes under backend/python/insightface/.

Docs:
  * docs/content/features/face-recognition.md — feature page with license table, quickstart (defaults to the commercial-safe model), models matrix, API reference, 1:N workflow, storage caveats.
  * Cross-refs in object-detection.md, stores.md, embeddings.md, and whats-new.md.
  * Contributor README at backend/python/insightface/README.md.

Verified end-to-end:
  * buffalo_l: 6/6 specs (health, load, face_detect, face_embed, face_verify, face_analyze).
  * opencv: 5/5 specs (same minus face_analyze — SFace has no demographic head; correctly skipped via BACKEND_TEST_CAPS).

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): move engine selection to model gallery, collapse backend entries

The previous commit put engine/model_pack options on backend gallery entries (`backend/index.yaml`). That was wrong — `GalleryBackend` (core/gallery/backend_types.go:32) has no `options` field, so the YAML decoder silently dropped those keys and all three "different insightface-*" backend entries resolved to the same container image with no distinguishing configuration.

Correct split:
  * `backend/index.yaml` now has ONE `insightface` backend entry shipping the CPU + CUDA 12 container images. The Python backend bundles both the non-commercial insightface model packs (buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo weights (YuNet + SFace); the active engine is selected at LoadModel time via `options: ["engine:..."]`.
  * `gallery/index.yaml` gains three model entries — `insightface-buffalo-l`, `insightface-opencv`, `insightface-buffalo-s` — each setting the appropriate `overrides.backend` + `overrides.options` so installing one actually gives the user the intended engine. This matches how `rfdetr-base` lives in the model gallery against the `rfdetr` backend.

The earlier e2e tests passed despite this bug because the Makefile targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC, bypassing any gallery resolution entirely. No code changes needed.

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): cover all supported models in the gallery + drop weight baking

Follows up on the model-gallery split: adds entries for every model configuration either engine actually supports, and switches weight delivery from image-baked to LocalAI's standard gallery mechanism.
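Engine selection through `options: ["engine:..."]` at LoadModel time comes down to parsing `key:value` strings. The sketch below does that under a few assumptions. The first `:` separates key from value, so absolute paths such as `detector_onnx:/tmp/cache/yunet.onnx` survive intact. A missing `engine` key falls back to a caller-supplied default. `parseOptions` and `engineFor` are hypothetical names, and the engine identifiers used in the test are placeholders, because the real backend is Python and its accepted values are not listed here.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions turns LoadModel-style "key:value" strings into a map.
// Only the first ':' splits, so values may themselves contain colons.
func parseOptions(opts []string) (map[string]string, error) {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		key, val, ok := strings.Cut(o, ":")
		if !ok || strings.TrimSpace(key) == "" {
			return nil, fmt.Errorf("malformed option %q: want key:value", o)
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return out, nil
}

// engineFor returns the requested engine, or fallback when none was given.
func engineFor(opts map[string]string, fallback string) string {
	if e, ok := opts["engine"]; ok && e != "" {
		return e
	}
	return fallback
}
```

Because selection is driven by options rather than by separate backend images, one container can serve every gallery model entry. The gallery entry only has to set `overrides.options`.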
Gallery now has seven `insightface-*` model entries (gallery/index.yaml):

  insightface (family) — non-commercial research use
    • buffalo-l (326MB) — SCRFD-10GF + ResNet50 + genderage, default
    • buffalo-m (313MB) — SCRFD-2.5GF + ResNet50 + genderage
    • buffalo-s (159MB) — SCRFD-500MF + MBF + genderage
    • buffalo-sc (16MB) — SCRFD-500MF + MBF, recognition only (no landmarks, no demographics — analyze returns empty attributes)
    • antelopev2 (407MB) — SCRFD-10GF + ResNet100@Glint360K + genderage

  OpenCV Zoo family — Apache 2.0 commercial-safe
    • opencv — YuNet + SFace fp32 (~40MB)
    • opencv-int8 — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)

Model weights are no longer baked into the backend image. The image now ships only the Python runtime + libraries (~275MB content size, ~1.18GB disk vs ~1.21GB when weights were baked).

Weights flow through LocalAI's gallery mechanism:
  * OpenCV variants list `files:` with ONNX URIs + SHA-256, so `local-ai models install insightface-opencv` pulls them into the models directory exactly like any other gallery-managed model.
  * insightface packs (upstream distributes .zip archives only, not individual ONNX files) auto-download on first LoadModel via FaceAnalysis' built-in machinery, rooted at the LocalAI models directory so they live alongside everything else — same pattern `rfdetr` uses with `inference.get_model()`.

Backend changes (backend/python/insightface/):
  * backend.py — LoadModel propagates `ModelOptions.ModelPath` (the LocalAI models directory) to engines via a `_model_dir` hint. This replaces the earlier ModelFile-dirname approach; ModelPath is the canonical "models directory" variable set by the Go loader (pkg/model/initializers.go:144) and is always populated.
  * engines.py::_resolve_model_path — picks up `model_dir` and searches it (plus basename-in-model-dir) before falling back to the dev script-dir. This is how OnnxDirectEngine finds gallery-downloaded YuNet/SFace files by filename only.
  * engines.py::_flatten_insightface_pack — new helper that works around an upstream packaging inconsistency: buffalo_l/s/sc zips expand flat, but buffalo_m and antelopev2 zips wrap their ONNX files in a redundant `<name>/` directory. insightface's own loader looks one level too shallow and fails. We call `ensure_available()` explicitly, flatten if nested, then hand to FaceAnalysis.
  * engines.py::InsightFaceEngine.prepare — root-resolution order now includes the `_model_dir` hint so packs download into the LocalAI models directory by default.
  * install.sh — no longer pre-downloads any weights. Everything is gallery-managed now.
  * smoke.py (new) — parametrized smoke test that iterates over every gallery configuration, simulating the LocalAI install flow (creates a models dir, fetches OpenCV files with checksum verification, lets insightface auto-download its packs), then runs detect + embed + verify (+ analyze where supported) through the in-process BackendServicer.
  * test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/` paths; downloads ONNX files to a temp dir at setUpClass time and passes ModelPath accordingly.

Registry change (core/services/facerecognition/store_registry.go):
  * `dim=0` in NewStoreRegistry now means "accept whatever dimension arrives" — needed because the backend supports 512-d ArcFace/MBF and 128-d SFace via the same Registry. A non-zero dim still fails fast with ErrDimensionMismatch.
  * core/application plumbs `faceEmbeddingDim = 0`, explaining the rationale in the comment.

Backend gallery description updated to reflect that the image carries no weights — it's just Python + engines.
Smoke-tested all 7 configurations against the rebuilt image (with the flatten fix applied), exit 0:

  PASS: insightface-buffalo-l faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-sc faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-s faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-m faces=6 dim=512 same-dist=0.000
  PASS: insightface-antelopev2 faces=6 dim=512 same-dist=0.000
  PASS: insightface-opencv faces=6 dim=128 same-dist=0.000
  PASS: insightface-opencv-int8 faces=6 dim=128 same-dist=0.000
  7/7 passed

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim

CI regression from the previous commit: I moved OpenCV Zoo weight delivery to LocalAI's gallery `files:` mechanism, but the test-extra-backend-insightface-opencv target was still passing relative paths `detector_onnx:models/opencv/yunet.onnx` in BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over gRPC without going through the gallery, so those relative paths resolved to nothing and OpenCV's ONNXImporter failed:

  LoadModel failed: Failed to load face engine: OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx

Fix: add an `insightface-opencv-models` prerequisite target that fetches the two ONNX files (YuNet + SFace) to a deterministic host cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256, and skips the download on re-runs. The opencv test target depends on it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend finds the files via its normal absolute-path resolution branch.

Also refresh the buffalo_l comment: it no longer says "pre-baked" (nothing is — the pack auto-downloads from upstream's GitHub release on first LoadModel, same as in CI).

Locally verified: `make test-extra-backend-insightface-opencv` passes 5/5 specs (health, load, face_detect, face_embed, face_verify).
Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs

The docs promised that /v1/embeddings returns face vectors when you send an image data-URI. That was never true: /v1/embeddings is OpenAI-compatible and text-only by contract — its handler goes through `core/backend/embeddings.go::ModelEmbedding`, which sets `predictOptions.Embeddings = s` (a string of TEXT to embed) and never populates `predictOptions.Images[]`. The Python backend's Embedding gRPC method does handle Images[] (that's how /v1/face/register reaches it internally via `backend.FaceEmbed`), but the HTTP embeddings endpoint wasn't wired to populate it.

Rather than overload /v1/embeddings with image-vs-text detection — messy, and the endpoint is OpenAI-compatible by design — add a dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed` (already used internally by /v1/face/register and /v1/face/identify). Matches LocalAI's convention of a dedicated path per non-standard flow (/v1/rerank, /v1/detection, /v1/face/verify etc.).

Response:

  {
    "embedding": [<dim> floats, L2-normed],
    "dim": int,        // 512 for ArcFace R50 / MBF, 128 for SFace
    "model": "<name>"
  }

Live-tested on the opencv engine: returns a 128-d L2-normalized vector (sum(x^2) = 1.0000).

Sentinel in docs updated to note /v1/embeddings is text-only and point image users at /v1/face/embed instead.

Assisted-by: Claude:claude-opus-4-7

* fix(http): map malformed image input + gRPC status codes to proper 4xx

Image-input failures on LocalAI's single-image endpoints (/v1/detection, /v1/face/{verify,analyze,embed,register,identify}) have historically returned 500 — even when the client was the one who sent garbage. Classic example: you POST an "image" that isn't a URL, isn't a data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim that's its fault.
Two helpers land in core/http/endpoints/localai/images.go and every single-image handler is switched over:

  * decodeImageInput(s)
    Wraps utils.GetContentURIAsBase64 and turns any failure (invalid URL, not a data-URI, download error, etc.) into echo.NewHTTPError(400, "invalid image input: ...").

  * mapBackendError(err)
    Inspects the gRPC status on a backend call error and maps:
      INVALID_ARGUMENT    → 400 Bad Request
      NOT_FOUND           → 404 Not Found
      FAILED_PRECONDITION → 412 Precondition Failed
      UNIMPLEMENTED       → 501 Not Implemented
    All other codes fall through unchanged (still 500).

Before, my 1×1 PNG error-path test returned:
  HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
  HTTP 400 "failed to decode one or both images"

Scope-limited to the LocalAI single-image endpoints. The multi-modal paths (middleware/request.go, openresponses/responses.go, openai/realtime.go) intentionally log-and-skip individual media parts when decoding fails — different design intent (graceful degradation of a multi-part message), not a 400-worthy failure. Left untouched.

Live-verified: every error case in /tmp/face_errors.py now returns 4xx with a meaningful message; the "image with no face (1x1 PNG)" case specifically went from 500 → 400.

Assisted-by: Claude:claude-opus-4-7

* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis

Follows up on the discovery that LocalAI's gallery `files:` mechanism handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy piper voices use exactly this pattern. Insightface packs are zip archives, so we can now deliver them the same way every other gallery-managed model gets delivered: declaratively, checksum-verified, through LocalAI's standard download+extract pipeline.

Two changes:

1. Gallery (gallery/index.yaml) — every insightface-* entry gains a `files:` list with the pack zip's URI + SHA-256.
   `local-ai models install insightface-buffalo-l` now fetches the zip, verifies the hash, and extracts it into the models directory. No more reliance on insightface's library-internal `ensure_available()` auto-download or its hardcoded `BASE_REPO_URL`.

2. InsightFaceEngine (backend/python/insightface/engines.py) — drops the FaceAnalysis wrapper and drives insightface's `model_zoo` directly. The ~50 lines FaceAnalysis provides — glob ONNX files, route each through `model_zoo.get_model()`, build a `{taskname: model}` dict, loop per-face at inference — are reimplemented in `InsightFaceEngine`. The actual inference classes (RetinaFace, ArcFaceONNX, Attribute, Landmark) are still insightface's — we only replicate the glue, so drift risk against upstream is minimal.

Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx` layout that doesn't match what LocalAI's zip extraction produces. LocalAI unpacks archives flat into `<models_dir>`. Upstream packs are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant `<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new `_locate_insightface_pack` helper searches both locations plus legacy paths and returns whichever has ONNX files. Replaces the earlier `_flatten_insightface_pack` helper (which tried to fight FaceAnalysis's layout expectations; now we just find the files wherever they are).

Net effect for users: install once via LocalAI's managed flow, weights live alongside every other model, progress shows in the jobs endpoint, no first-load network call. Same API surface, cleaner plumbing.

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched

The e2e suite drives LoadModel over gRPC without going through LocalAI's gallery flow, so the engine's `_model_dir` option (normally populated from ModelPath) is empty.
Previously the insightface target relied on FaceAnalysis auto-download to paper over this, but we dropped FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l target started failing at LoadModel with "no insightface pack found".

Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip (same SHA as the gallery entry), extract it on the host, and pass `root:<dir>` so the engine locates the pack without needing ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI fast; it covers the same insightface engine code path as buffalo_l. Face analyze cap dropped since buffalo_sc has no age/gender head.

Assisted-by: Claude:claude-opus-4-7[1m]

* feat(face-recognition): surface face-recognition in advertised feature maps

The six /v1/face/* endpoints were missing from every place LocalAI advertises its feature surface to clients:
  * api_instructions — the machine-readable capability index at GET /api/instructions. Added `face-recognition` as a dedicated instruction area with an intro that calls out the in-memory registry caveat and the /v1/face/embed vs /v1/embeddings split.
  * auth/permissions — added FeatureFaceRecognition constant, routed all six face endpoints through it so admins can gate them per-user like any other API feature. Default ON (matches the other API features).
  * React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a follow-up (noted in the plan).

Instruction count bumped 9 → 10; test updated.

Assisted-by: Claude:claude-opus-4-7[1m]

* docs(agents): capture advertising-surface steps in the endpoint guide

Before this change, adding a new /v1/* endpoint reliably missed one or more of: the swagger @Tags annotation, the /api/instructions registry, the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The endpoint would work but be invisible to API consumers, admins, and the UI — and nothing in the existing docs said to look in those places.
Extend .agents/api-endpoints-and-auth.md with a new "Advertising surfaces" section covering all four surfaces (swagger tags, /api/instructions, capabilities.js, docs/), and expand the closing checklist so it's impossible to ship a feature without visiting each one. Hoist a one-liner reminder into AGENTS.md's Quick Reference so agents skim it before diving in.

Assisted-by: Claude:claude-opus-4-7[1m]
2026-05-19 05:58:05 -04:00 · 2026-04-22 21:55:41 +02:00
parent d16f19f1eb
commit 20baec77ab
59 changed files with 3625 additions and 24 deletions
--- a/.agents/api-endpoints-and-auth.md
+++ b/.agents/api-endpoints-and-auth.md
@@ -2,6 +2,8 @@

 This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.

+> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
+
 ## Architecture overview

 Authentication and authorization flow through three layers:
@@ -234,6 +236,66 @@ Use these HTTP status codes:

 If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.

+## Advertising surfaces — where to register a new capability
+
+Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
+
+### 1. Swagger `@Tags` annotation (mandatory)
+
+Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
+
+```go
+// MyEndpoint does X.
+// @Summary Do X.
+// @Tags my-capability
+// @Param request body schema.MyRequest true "payload"
+// @Success 200 {object} schema.MyResponse "Response"
+// @Router /v1/my-endpoint [post]
+func MyEndpoint(...) echo.HandlerFunc { ... }
+```
+
+Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
+
+After adding endpoints, regenerate the embedded spec so the runtime serves it:
+
+```bash
+make protogen-go         # ensures gRPC codegen is fresh first
+make swagger             # regenerates swagger/swagger.json
+```
+
+### 2. `/api/instructions` registry (for new capability areas)
+
+`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
+
+**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
+
+Add an entry to `instructionDefs`:
+
+```go
+{
+    Name:        "my-capability",             // URL segment at /api/instructions/my-capability
+    Description: "Short sentence describing the capability",
+    Tags:        []string{"my-capability"},   // must match swagger @Tags
+    Intro:       "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
+},
+```
+
+Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
+
+### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
+
+If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
+
+```js
+export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
+```
+
+React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
+
+### 4. `docs/content/` (user-facing documentation)
+
+A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
+
 ## Path protection rules

 The global auth middleware classifies paths as API paths or non-API paths:
@@ -248,12 +310,23 @@ If you add endpoints under a new top-level path prefix, add it to `isAPIPath()`

 When adding a new endpoint:

+**Routing & auth**
 - [ ] Handler in `core/http/endpoints/`
 - [ ] Route registered in appropriate `core/http/routes/` file
 - [ ] Auth level chosen: public / standard / admin / feature-gated
-- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
+- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
+- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
+- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
 - [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
-- [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
 - [ ] If token-counting: `usageMiddleware` added to middleware chain
-- [ ] Error responses use `schema.ErrorResponse` format
+
+**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
+- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
+- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
+- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
+- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
+
+**Quality**
+- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
 - [ ] Tests cover both authenticated and unauthenticated access
+- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation
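The checklist above points at `mapBackendError` for turning backend failures into proper HTTP statuses. The mapping itself is a small switch. This sketch stays stdlib-only by mirroring the gRPC canonical code numbers in a local `grpcCode` type. Real code would use `google.golang.org/grpc/codes` together with `status.FromError`. `httpStatusFor` is a hypothetical name. The table comes from the fix(http) commit message: InvalidArgument → 400, NotFound → 404, FailedPrecondition → 412, Unimplemented → 501, and everything else 500.

```go
package main

import "net/http"

// grpcCode mirrors the canonical gRPC status code numbers.
type grpcCode int

const (
	codeInvalidArgument    grpcCode = 3
	codeNotFound           grpcCode = 5
	codeFailedPrecondition grpcCode = 9
	codeUnimplemented      grpcCode = 12
)

// httpStatusFor maps a backend gRPC code to the HTTP status a handler returns.
// Unlisted codes fall through to 500, as the commit describes.
func httpStatusFor(c grpcCode) int {
	switch c {
	case codeInvalidArgument:
		return http.StatusBadRequest
	case codeNotFound:
		return http.StatusNotFound
	case codeFailedPrecondition:
		return http.StatusPreconditionFailed
	case codeUnimplemented:
		return http.StatusNotImplemented
	default:
		return http.StatusInternalServerError
	}
}
```

Keeping the default at 500 is deliberate. Only codes that clearly describe a client-side or capability problem are downgraded, so real server faults still surface as server errors.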
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -711,6 +711,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "8"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-12-insightface'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "insightface"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'cublas'
            cuda-major-version: "12"
            cuda-minor-version: "8"
@@ -2626,6 +2639,20 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          # insightface (face recognition)
+          - build-type: ''
+            cuda-major-version: ""
+            cuda-minor-version: ""
+            platforms: 'linux/amd64,linux/arm64'
+            tag-latest: 'auto'
+            tag-suffix: '-cpu-insightface'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "insightface"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'intel'
            cuda-major-version: ""
            cuda-minor-version: ""
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -38,6 +38,7 @@ jobs:
      qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
      voxtral: ${{ steps.detect.outputs.voxtral }}
      kokoros: ${{ steps.detect.outputs.kokoros }}
+      insightface: ${{ steps.detect.outputs.insightface }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
@@ -751,3 +752,29 @@ jobs:
      - name: Test kokoros
        run: |
          make -C backend/rust/kokoros test
+  tests-insightface-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+              make build-essential curl unzip ca-certificates git tar
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.26.0'
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+          df -h
+      - name: Build insightface backend image and run both model configurations
+        run: |
+          make test-extra-backend-insightface-all
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -34,5 +34,6 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
 - **Go style**: Prefer `any` over `interface{}`
 - **Comments**: Explain *why*, not *what*
 - **Docs**: Update `docs/content/` when adding features or changing config
+- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
 - **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
 - **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad

 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -434,6 +434,7 @@ prepare-test-extra: protogen-python
 	$(MAKE) -C backend/python/ace-step
 	$(MAKE) -C backend/python/trl
 	$(MAKE) -C backend/python/tinygrad
+	$(MAKE) -C backend/python/insightface
 	$(MAKE) -C backend/rust/kokoros kokoros-grpc

 test-extra: prepare-test-extra
@@ -457,6 +458,7 @@ test-extra: prepare-test-extra
 	$(MAKE) -C backend/python/ace-step test
 	$(MAKE) -C backend/python/trl test
 	$(MAKE) -C backend/python/tinygrad test
+	$(MAKE) -C backend/python/insightface test
 	$(MAKE) -C backend/rust/kokoros test

 ##
@@ -507,6 +509,13 @@ test-extra-backend: protogen-go
 	BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
 	BACKEND_TEST_CACHE_TYPE_K="$$BACKEND_TEST_CACHE_TYPE_K" \
 	BACKEND_TEST_CACHE_TYPE_V="$$BACKEND_TEST_CACHE_TYPE_V" \
+	BACKEND_TEST_FACE_IMAGE_1_URL="$$BACKEND_TEST_FACE_IMAGE_1_URL" \
+	BACKEND_TEST_FACE_IMAGE_1_FILE="$$BACKEND_TEST_FACE_IMAGE_1_FILE" \
+	BACKEND_TEST_FACE_IMAGE_2_URL="$$BACKEND_TEST_FACE_IMAGE_2_URL" \
+	BACKEND_TEST_FACE_IMAGE_2_FILE="$$BACKEND_TEST_FACE_IMAGE_2_FILE" \
+	BACKEND_TEST_FACE_IMAGE_3_URL="$$BACKEND_TEST_FACE_IMAGE_3_URL" \
+	BACKEND_TEST_FACE_IMAGE_3_FILE="$$BACKEND_TEST_FACE_IMAGE_3_FILE" \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING="$$BACKEND_TEST_VERIFY_DISTANCE_CEILING" \
 	go test -v -timeout 30m ./tests/e2e-backends/...

 ## Convenience wrappers: build the image, then exercise it.
@@ -603,6 +612,107 @@ test-extra-backend-tinygrad-all: \
 	test-extra-backend-tinygrad-sd \
 	test-extra-backend-tinygrad-whisper

+## insightface — face recognition.
+##
+## Face fixtures default to the sample images shipped in the
+## deepinsight/insightface repository (MIT-licensed). For offline or
+## local runs, override them with BACKEND_TEST_FACE_IMAGE_{1,2,3}_FILE
+## pointing at local paths.
+FACE_IMAGE_1_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_2_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_3_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/mask_white.jpg
+
+## Host-side cache for the OpenCV Zoo face ONNX files used by the
+## opencv e2e target. The backend image no longer bakes model weights —
+## gallery installs bring them via `files:` — but the e2e suite drives
+## LoadModel over gRPC directly without going through the gallery. We
+## pre-download the ONNX files to a stable host path and pass absolute
+## paths in BACKEND_TEST_OPTIONS; `make` skips the downloads when the
+## SHA-256 already matches.
+INSIGHTFACE_OPENCV_DIR := /tmp/localai-insightface-opencv-cache
+INSIGHTFACE_OPENCV_YUNET_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
+INSIGHTFACE_OPENCV_SFACE_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
+INSIGHTFACE_OPENCV_YUNET_SHA := 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
+INSIGHTFACE_OPENCV_SFACE_SHA := 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
+
+## buffalo_sc (insightface) — pack zip + SHA-256 mirrors the gallery
+## entry so the e2e target matches exactly what `local-ai models install
+## insightface-buffalo-sc` would have fetched. Smallest insightface pack
+## (~16MB) — keeps CI fast while still covering the insightface engine
+## code path end-to-end.
+INSIGHTFACE_BUFFALO_SC_DIR := /tmp/localai-insightface-buffalo-sc-cache
+INSIGHTFACE_BUFFALO_SC_URL := https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
+INSIGHTFACE_BUFFALO_SC_SHA := 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
+
+.PHONY: insightface-opencv-models
+insightface-opencv-models:
+	@mkdir -p $(INSIGHTFACE_OPENCV_DIR)
+	@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_YUNET_SHA)" ]; then \
+		echo "Fetching YuNet..."; \
+		curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx $(INSIGHTFACE_OPENCV_YUNET_URL); \
+		echo "$(INSIGHTFACE_OPENCV_YUNET_SHA)  $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx" | sha256sum -c; \
+	fi
+	@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/sface.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_SFACE_SHA)" ]; then \
+		echo "Fetching SFace..."; \
+		curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/sface.onnx $(INSIGHTFACE_OPENCV_SFACE_URL); \
+		echo "$(INSIGHTFACE_OPENCV_SFACE_SHA)  $(INSIGHTFACE_OPENCV_DIR)/sface.onnx" | sha256sum -c; \
+	fi
+
+.PHONY: insightface-buffalo-sc-models
+insightface-buffalo-sc-models:
+	@mkdir -p $(INSIGHTFACE_BUFFALO_SC_DIR)
+	@if [ "$$(sha256sum $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_BUFFALO_SC_SHA)" ]; then \
+		echo "Fetching buffalo_sc..."; \
+		curl -fsSL -o $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip $(INSIGHTFACE_BUFFALO_SC_URL); \
+		echo "$(INSIGHTFACE_BUFFALO_SC_SHA)  $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip" | sha256sum -c; \
+		rm -f $(INSIGHTFACE_BUFFALO_SC_DIR)/*.onnx; \
+	fi
+	@if [ ! -f "$(INSIGHTFACE_BUFFALO_SC_DIR)/det_500m.onnx" ]; then \
+		echo "Extracting buffalo_sc..."; \
+		unzip -o -q $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip -d $(INSIGHTFACE_BUFFALO_SC_DIR); \
+	fi
+
+## buffalo_sc — smallest insightface pack (SCRFD-500MF detector + MBF
+## recognizer, ~16MB). Exercises the insightface engine code path
+## (model_zoo-backed inference) without the ~326MB buffalo_l download.
+## No age/gender/landmark heads — face_analyze is dropped from caps.
+## The pack is pre-fetched on the host and passed as `root:<dir>` since
+## the e2e suite drives LoadModel directly without going through
+## LocalAI's gallery flow (which is what would normally populate
+## ModelPath and in turn the engine's `_model_dir` option).
+test-extra-backend-insightface-buffalo-sc: docker-build-insightface insightface-buffalo-sc-models
+	BACKEND_IMAGE=local-ai-backend:insightface \
+	BACKEND_TEST_MODEL_NAME=insightface-buffalo-sc \
+	BACKEND_TEST_OPTIONS=engine:insightface,model_pack:buffalo_sc,root:$(INSIGHTFACE_BUFFALO_SC_DIR) \
+	BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
+	BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
+	BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
+	BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
+	$(MAKE) test-extra-backend
+
+## OpenCV Zoo YuNet + SFace — Apache 2.0, commercial-safe. face_analyze
+## cap is dropped (SFace has no demographic head). The ONNX files are
+## pre-fetched on the host via the insightface-opencv-models target and
+## passed as absolute paths, since the e2e suite drives LoadModel
+## directly without going through LocalAI's gallery flow.
+test-extra-backend-insightface-opencv: docker-build-insightface insightface-opencv-models
+	BACKEND_IMAGE=local-ai-backend:insightface \
+	BACKEND_TEST_MODEL_NAME=insightface-opencv \
+	BACKEND_TEST_OPTIONS=engine:onnx_direct,detector_onnx:$(INSIGHTFACE_OPENCV_DIR)/yunet.onnx,recognizer_onnx:$(INSIGHTFACE_OPENCV_DIR)/sface.onnx \
+	BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
+	BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
+	BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
+	BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
+	$(MAKE) test-extra-backend
+
+## Aggregate — runs both face-recognition model configurations so CI
+## catches regressions across engines together.
+test-extra-backend-insightface-all: \
+	test-extra-backend-insightface-buffalo-sc \
+	test-extra-backend-insightface-opencv
+
 ## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
 ## tool-call extraction via sglang's native qwen parser. CPU builds use
 ## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
@@ -748,6 +858,7 @@ BACKEND_OUTETTS = outetts|python|.|false|true
 BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
 BACKEND_COQUI = coqui|python|.|false|true
 BACKEND_RFDETR = rfdetr|python|.|false|true
+BACKEND_INSIGHTFACE = insightface|python|.|false|true
 BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
 BACKEND_NEUTTS = neutts|python|.|false|true
 BACKEND_KOKORO = kokoro|python|.|false|true
@@ -819,6 +930,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
 $(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
 $(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
+$(eval $(call generate-docker-build-target,$(BACKEND_INSIGHTFACE)))
 $(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
@@ -853,7 +965,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
 docker-save-%: backend-images
 	docker save local-ai-backend:$* -o backend-images/$*.tar

-docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp
+docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface

 ########################################################
 ### Mock Backend for E2E Tests
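The `insightface-opencv-models` and `insightface-buffalo-sc-models` targets use a verify-then-fetch idiom: hash the cached file, download only on a mismatch, then re-check the fresh copy. Below is a minimal Python sketch of the same idiom. `sha256_of` and `ensure_cached` are hypothetical helpers and are not part of this PR. The sketch leaves out the buffalo_sc target's unzip step.

```python
from __future__ import annotations

import hashlib
import os
import urllib.request


def sha256_of(path: str) -> str | None:
    """Hex SHA-256 of a file, or None if it does not exist (like `2>/dev/null`)."""
    if not os.path.exists(path):
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def ensure_cached(path: str, url: str, expected_sha: str) -> bool:
    """Fetch `url` into `path` unless the cached copy already matches.

    Returns False on a cache hit (no network access at all) and True after a
    verified download. Raises ValueError when the fresh download does not
    match, which is the point where `sha256sum -c` fails the make target.
    """
    if sha256_of(path) == expected_sha:
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, path)
    actual = sha256_of(path)
    if actual != expected_sha:
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```

The cache-hit branch never touches the network. That is why repeated `make` runs stay offline once the files are present.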
--- a/backend/backend.proto
+++ b/backend/backend.proto
@@ -24,6 +24,8 @@ service Backend {
  rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
  rpc Status(HealthMessage) returns (StatusResponse) {}
  rpc Detect(DetectOptions) returns (DetectResponse) {}
+  rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
+  rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}

  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
@@ -475,6 +477,57 @@ message DetectResponse {
  repeated Detection Detections = 1;
 }

+// --- Face recognition messages ---
+
+message FacialArea {
+  float x = 1;
+  float y = 2;
+  float w = 3;
+  float h = 4;
+}
+
+message FaceVerifyRequest {
+  string img1 = 1;              // base64-encoded image
+  string img2 = 2;              // base64-encoded image
+  float  threshold = 3;         // cosine-distance threshold; 0 = use backend default
+  bool   anti_spoofing = 4;     // reserved for future MiniFASNet bolt-on
+}
+
+message FaceVerifyResponse {
+  bool       verified = 1;
+  float      distance = 2;      // 1 - cosine_similarity
+  float      threshold = 3;
+  float      confidence = 4;    // 0-100
+  string     model = 5;         // e.g. "buffalo_l"
+  FacialArea img1_area = 6;
+  FacialArea img2_area = 7;
+  float      processing_time_ms = 8;
+}
+
+message FaceAnalyzeRequest {
+  string          img = 1;          // base64-encoded image
+  repeated string actions = 2;      // subset of ["age","gender","emotion","race"]; empty = all supported
+  bool            anti_spoofing = 3;
+}
+
+message FaceAnalysis {
+  FacialArea         region = 1;
+  float              face_confidence = 2;
+  float              age = 3;
+  string             dominant_gender = 4;   // "Man" | "Woman"
+  map<string, float> gender = 5;
+  string             dominant_emotion = 6;  // reserved; empty in MVP
+  map<string, float> emotion = 7;
+  string             dominant_race = 8;     // not populated
+  map<string, float> race = 9;
+  bool               is_real = 10;          // anti-spoofing result when enabled
+  float              antispoof_score = 11;
+}
+
+message FaceAnalyzeResponse {
+  repeated FaceAnalysis faces = 1;
+}
+
 message ToolFormatMarkers {
  string format_type = 1;           // "json_native", "tag_with_json", "tag_with_tagged"

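The comments on `FaceVerifyResponse` define how `distance`, `threshold` and `confidence` relate. Below is a minimal client-side sketch of those semantics. It mirrors the arithmetic in `backend.py`'s `FaceVerify` and assumes, as the backend comment states, that both embeddings are already L2-normalized. `VerifyResult`, `l2_normalize` and `verify_decision` are hypothetical names, and plain lists stand in for protobuf messages.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

DEFAULT_VERIFY_THRESHOLD = 0.35  # same default as backend.py uses for buffalo_l


@dataclass
class VerifyResult:
    verified: bool
    distance: float    # 1 - cosine_similarity
    threshold: float   # threshold actually applied
    confidence: float  # 0-100


def l2_normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def verify_decision(e1, e2, threshold=0.0, default=DEFAULT_VERIFY_THRESHOLD):
    """Compute FaceVerifyResponse-style fields from two L2-normalized embeddings."""
    if threshold <= 0:  # proto: 0 = use backend default
        threshold = default
    sim = sum(a * b for a, b in zip(e1, e2))  # dot product == cosine for unit vectors
    distance = 1.0 - sim
    confidence = (
        max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
    )
    return VerifyResult(distance < threshold, distance, threshold, confidence)
```

A distance exactly at the threshold scores 0 confidence and is rejected. The 0-100 value is a linear rescaling relative to the threshold in use, not a calibrated probability.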
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -168,6 +168,43 @@
    nvidia-cuda-13: "cuda13-rfdetr"
    nvidia-cuda-12: "cuda12-rfdetr"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-rfdetr"
+- &insightface
+  name: "insightface"
+  alias: "insightface"
+  # The upstream insightface library is MIT-licensed, but its pretrained
+  # model packs (buffalo_l, buffalo_s, antelopev2) are released for
+  # NON-COMMERCIAL research use only. For commercial use, pick the
+  # OpenCV Zoo YuNet + SFace pair (Apache 2.0) instead. Select the engine
+  # via model-gallery entries (insightface-buffalo-l / insightface-opencv
+  # / insightface-buffalo-s) or set `options` in your model YAML.
+  license: "mixed"
+  description: |
+    Face recognition backend powered by `insightface` (ONNX Runtime).
+    Provides face verification (/v1/face/verify), face analysis
+    (/v1/face/analyze), face embedding (/v1/embeddings), face
+    detection (/v1/detection), and 1:N identification
+    (/v1/face/{register,identify,forget}).
+    Ships two engines in a single image: one that drives the insightface
+    model packs (buffalo_l/s/m/sc, antelopev2 — non-commercial research
+    use only) and one that drives OpenCV Zoo's YuNet + SFace pair
+    (Apache 2.0 — commercial-safe). Select via `options: ["engine:..."]`
+    in your model YAML, or install one of the ready-made model-gallery
+    entries under the `insightface-*` prefix.
+    The backend image contains only code and Python deps; all model
+    weights are managed by LocalAI's gallery download mechanism.
+  urls:
+    - https://github.com/deepinsight/insightface
+    - https://github.com/opencv/opencv_zoo
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - gpu
+    - cpu
+  capabilities:
+    default: "cpu-insightface"
+    nvidia: "cuda12-insightface"
+    nvidia-cuda-12: "cuda12-insightface"
 - &sam3cpp
  name: "sam3-cpp"
  alias: "sam3-cpp"
@@ -3709,3 +3746,30 @@
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization
+# insightface (face recognition) — development and concrete image entries
+- !!merge <<: *insightface
+  name: "insightface-development"
+  capabilities:
+    default: "cpu-insightface-development"
+    nvidia: "cuda12-insightface-development"
+    nvidia-cuda-12: "cuda12-insightface-development"
+- !!merge <<: *insightface
+  name: "cpu-insightface"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface"
+  mirrors:
+    - localai/localai-backends:latest-cpu-insightface
+- !!merge <<: *insightface
+  name: "cuda12-insightface"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-insightface"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-insightface
+- !!merge <<: *insightface
+  name: "cpu-insightface-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-insightface"
+  mirrors:
+    - localai/localai-backends:master-cpu-insightface
+- !!merge <<: *insightface
+  name: "cuda12-insightface-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-insightface"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-insightface
--- a/backend/python/insightface/Makefile
+++ b/backend/python/insightface/Makefile
@@ -0,0 +1,13 @@
+.DEFAULT_GOAL := install
+
+.PHONY: install
+install:
+	bash install.sh
+
+.PHONY: protogen-clean
+protogen-clean:
+	$(RM) backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: clean
+clean: protogen-clean
+	rm -rf venv __pycache__
--- a/backend/python/insightface/README.md
+++ b/backend/python/insightface/README.md
@@ -0,0 +1,67 @@
+# insightface backend (LocalAI)
+
+Face recognition backend built on ONNX Runtime. Provides face
+verification (1:1), face analysis (age/gender), face detection, face
+embedding, and — via LocalAI's built-in vector store — 1:N
+identification.
+
+## Engines
+
+This backend ships with **two** interchangeable engines, selected via
+an `engine:<name>` entry in the `LoadModel` `Options` list:
+
+| engine | Implementation | Models | License |
+|---|---|---|---|
+| `insightface` (default) | `insightface.model_zoo` (SCRFD + ArcFace + genderage) | `buffalo_l`, `buffalo_m`, `buffalo_s`, `buffalo_sc`, `antelopev2` | **Non-commercial research use only** |
+| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |
+
+Both engines implement the same `FaceEngine` protocol in `engines.py`,
+so the gRPC servicer in `backend.py` doesn't need to know which one is
+active.
+
+## LoadModel options
+
+Common:
+
+| option | default | description |
+|---|---|---|
+| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
+| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
+| `det_thresh` | `0.5` | detector confidence threshold |
+| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
+
+`insightface` engine:
+
+| option | default | description |
+|---|---|---|
+| `model_pack` | `buffalo_l` | which insightface pack to load |
+
+`onnx_direct` engine:
+
+| option | default | description |
+|---|---|---|
+| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
+| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
+
+## Adding a new model pack
+
+1. If it's an insightface pack, just add a new gallery entry in
+   `backend/index.yaml` that delivers the pack via `files:` and sets
+   `options: ["engine:insightface", "model_pack:<name>"]`. No code
+   change is needed; the engine never auto-downloads packs itself.
+2. If it's an Apache-licensed ONNX pair, add a gallery entry with
+   `options: ["engine:onnx_direct", "detector_onnx:...",
+   "recognizer_onnx:..."]`. If the detector or recognizer has a
+   different input-tensor shape than YuNet/SFace, you may need a new
+   engine implementation in `engines.py`; the two-engine seam makes
+   that a self-contained change.
+
+## Running tests locally
+
+```bash
+make -C backend/python/insightface         # install Python deps
+make -C backend/python/insightface test    # run test.py
+```
+
+The OpenCV Zoo tests skip gracefully when `/models/opencv/*.onnx` is
+absent (e.g. on dev boxes where `install.sh` wasn't run).
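The options in the tables above arrive on the wire as `key:value` strings in `LoadModel`'s `Options` list. Below is a minimal sketch of how `backend.py`'s `_parse_options` turns them into a dict. It splits each entry on the first `:` only and skips entries that have no `:` at all. `parse_det_size` is a hypothetical stand-in for the `_parse_det_size` helper that `engines.py` calls but whose body is not shown here. Its fallback-to-default behaviour is an assumption.

```python
from __future__ import annotations


def parse_options(raw: list[str]) -> dict[str, str]:
    """Turn entries like "engine:onnx_direct" into {"engine": "onnx_direct"}.

    Only the first ':' separates key from value, so a value that itself
    contains ':' is kept intact. Entries without ':' are silently skipped.
    """
    out: dict[str, str] = {}
    for entry in raw:
        if ":" not in entry:
            continue
        k, v = entry.split(":", 1)
        out[k.strip()] = v.strip()
    return out


def parse_det_size(s: str, default: tuple[int, int] = (640, 640)) -> tuple[int, int]:
    """Hypothetical parser for det_size values such as "640x640" or "320x320"."""
    try:
        w, h = s.lower().split("x", 1)
        return int(w), int(h)
    except (AttributeError, ValueError):
        return default  # assumption: fall back to the documented default
```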
--- a/backend/python/insightface/backend.py
+++ b/backend/python/insightface/backend.py
@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+"""gRPC server for the insightface face recognition backend.
+
+Implements Health / LoadModel / Status plus the face-specific methods:
+Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
+delegated to engines.py — this file is just the gRPC plumbing.
+"""
+import argparse
+import base64
+import os
+import signal
+import sys
+import time
+from concurrent import futures
+from io import BytesIO
+
+import backend_pb2
+import backend_pb2_grpc
+import cv2
+import grpc
+import numpy as np
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
+from grpc_auth import get_auth_interceptors  # noqa: E402
+
+from engines import FaceEngine, build_engine  # noqa: E402
+
+_ONE_DAY = 60 * 60 * 24
+MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
+
+# Default cosine-distance threshold for "same person" on buffalo_l
+# ArcFace R50. Clients can override per-request; clients using SFace
+# should pass threshold≈0.4 since the distance distribution is wider.
+DEFAULT_VERIFY_THRESHOLD = 0.35
+
+
+def _decode_image(src: str) -> np.ndarray | None:
+    """Decode a base64-encoded image into an OpenCV BGR numpy array."""
+    if not src:
+        return None
+    try:
+        data = base64.b64decode(src, validate=False)
+    except Exception:
+        return None
+    arr = np.frombuffer(data, dtype=np.uint8)
+    if arr.size == 0:
+        return None
+    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
+    return img
+
+
+def _parse_options(raw: list[str]) -> dict[str, str]:
+    out: dict[str, str] = {}
+    for entry in raw:
+        if ":" not in entry:
+            continue
+        k, v = entry.split(":", 1)
+        out[k.strip()] = v.strip()
+    return out
+
+
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+    def __init__(self) -> None:
+        self.engine: FaceEngine | None = None
+        self.engine_name: str = ""
+        self.model_name: str = ""
+        self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
+
+    def Health(self, request, context):
+        return backend_pb2.Reply(message=bytes("OK", "utf-8"))
+
+    def LoadModel(self, request, context):
+        options = _parse_options(list(request.Options))
+        # Surface LocalAI's models directory (ModelPath) so engines can
+        # anchor relative paths: OnnxDirectEngine's detector_onnx /
+        # recognizer_onnx point at gallery-managed files that LocalAI
+        # dropped there, and InsightFaceEngine looks there for the packs
+        # the gallery extracted alongside every other managed model.
+        # Stored under a private key to avoid clashing with user options.
+        if request.ModelPath:
+            options["_model_dir"] = request.ModelPath
+
+        engine_name = options.get("engine", "insightface")
+        try:
+            engine = build_engine(engine_name)
+            engine.prepare(options)
+        except Exception as err:  # pragma: no cover - exercised via e2e
+            return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
+
+        self.engine, self.engine_name = engine, engine_name  # publish only after prepare() succeeds
+        self.model_name = request.Model or options.get("model_pack", "")
+        if "verify_threshold" in options:
+            try:
+                self.verify_threshold = float(options["verify_threshold"])
+            except ValueError:
+                pass
+        print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
+        return backend_pb2.Result(success=True, message="Model loaded successfully")
+
+    def Status(self, request, context):
+        state = (
+            backend_pb2.StatusResponse.READY
+            if self.engine is not None
+            else backend_pb2.StatusResponse.UNINITIALIZED
+        )
+        return backend_pb2.StatusResponse(state=state)
+
+    def Embedding(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.EmbeddingResult()
+        if not request.Images:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("Embedding requires Images[0] to be a base64 image")
+            return backend_pb2.EmbeddingResult()
+
+        img = _decode_image(request.Images[0])
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.EmbeddingResult()
+
+        vec = self.engine.embed(img)
+        if vec is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected")
+            return backend_pb2.EmbeddingResult()
+        return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
+
+    def Detect(self, request, context):
+        if self.engine is None:
+            return backend_pb2.DetectResponse()
+        img = _decode_image(request.src)
+        if img is None:
+            return backend_pb2.DetectResponse()
+        detections = []
+        for d in self.engine.detect(img):
+            x1, y1, x2, y2 = d.bbox
+            detections.append(
+                backend_pb2.Detection(
+                    x=float(x1),
+                    y=float(y1),
+                    width=float(x2 - x1),
+                    height=float(y2 - y1),
+                    confidence=float(d.score),
+                    class_name="face",
+                )
+            )
+        return backend_pb2.DetectResponse(Detections=detections)
+
+    def FaceVerify(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceVerifyResponse()
+
+        img1 = _decode_image(request.img1)
+        img2 = _decode_image(request.img2)
+        if img1 is None or img2 is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        threshold = request.threshold if request.threshold > 0 else self.verify_threshold
+
+        start = time.time()
+        e1 = self.engine.embed(img1)
+        e2 = self.engine.embed(img2)
+        if e1 is None or e2 is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected in one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        # Both engines return L2-normalized vectors, so the dot product
+        # is the cosine similarity directly.
+        sim = float(np.dot(e1, e2))
+        distance = 1.0 - sim
+        verified = distance < threshold
+        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
+
+        def _region(img) -> backend_pb2.FacialArea:
+            dets = self.engine.detect(img)
+            if not dets:
+                return backend_pb2.FacialArea()
+            best = max(dets, key=lambda d: d.score)
+            x1, y1, x2, y2 = best.bbox
+            return backend_pb2.FacialArea(x=float(x1), y=float(y1), w=float(x2 - x1), h=float(y2 - y1))
+
+        return backend_pb2.FaceVerifyResponse(
+            verified=verified,
+            distance=float(distance),
+            threshold=float(threshold),
+            confidence=float(confidence),
+            model=self.model_name or self.engine_name,
+            img1_area=_region(img1),
+            img2_area=_region(img2),
+            processing_time_ms=float((time.time() - start) * 1000.0),
+        )
+
+    def FaceAnalyze(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceAnalyzeResponse()
+        img = _decode_image(request.img)
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.FaceAnalyzeResponse()
+
+        faces = []
+        for attrs in self.engine.analyze(img):
+            x, y, w, h = attrs.region
+            fa = backend_pb2.FaceAnalysis(
+                region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
+                face_confidence=float(attrs.face_confidence),
+            )
+            if attrs.age is not None:
+                fa.age = float(attrs.age)
+            if attrs.dominant_gender:
+                fa.dominant_gender = attrs.dominant_gender
+            for k, v in attrs.gender.items():
+                fa.gender[k] = float(v)
+            faces.append(fa)
+        return backend_pb2.FaceAnalyzeResponse(faces=faces)
+
+
+def serve(address: str) -> None:
+    server = grpc.server(
+        futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
+        options=[
+            ("grpc.max_message_length", 50 * 1024 * 1024),
+            ("grpc.max_send_message_length", 50 * 1024 * 1024),
+            ("grpc.max_receive_message_length", 50 * 1024 * 1024),
+        ],
+        interceptors=get_auth_interceptors(),
+    )
+    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+    server.add_insecure_port(address)
+    server.start()
+    print("[insightface] Server started. Listening on: " + address, file=sys.stderr)
+
+    def _stop(sig, frame):  # pragma: no cover
+        print("[insightface] shutting down", file=sys.stderr)
+        server.stop(0)
+        sys.exit(0)
+
+    signal.signal(signal.SIGINT, _stop)
+    signal.signal(signal.SIGTERM, _stop)
+
+    try:
+        while True:
+            time.sleep(_ONE_DAY)
+    except KeyboardInterrupt:
+        server.stop(0)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
+    parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
+    args = parser.parse_args()
+    print(f"[insightface] startup: {args}", file=sys.stderr)
+    serve(args.addr)
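The engines report boxes in corner form `(x1, y1, x2, y2)`. `Detection` and `FacialArea` carry an origin plus a size (`x, y, width, height` / `x, y, w, h`). Below is a minimal sketch of the conversion `Detect` performs and of the highest-score pick done by `FaceVerify`'s `_region` helper. `FaceDetection` here is a trimmed, hypothetical stand-in for the dataclass in `engines.py`, with landmarks omitted. `to_xywh` and `best_region` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FaceDetection:
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2), corner form
    score: float


def to_xywh(bbox: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Convert corner form to the (x, y, w, h) layout used by FacialArea / Detection."""
    x1, y1, x2, y2 = bbox
    return float(x1), float(y1), float(x2 - x1), float(y2 - y1)


def best_region(dets: list[FaceDetection]) -> tuple[float, float, float, float] | None:
    """Pick the highest-scoring face, as _region does.

    Returns None when nothing was detected; the servicer sends an empty
    (all-zero) FacialArea in that case.
    """
    if not dets:
        return None
    best = max(dets, key=lambda d: d.score)
    return to_xywh(best.bbox)
```

`_region` runs its own `detect()` pass, so the area it reports is the highest-scoring face. If an engine's `embed()` selects faces differently, that may not be the face that was embedded.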
--- a/backend/python/insightface/engines.py
+++ b/backend/python/insightface/engines.py
@@ -0,0 +1,382 @@
+"""Face recognition engine implementations for the LocalAI insightface backend.
+
+Two engines are provided:
+
+    * InsightFaceEngine  — drives insightface.model_zoo directly (no
+                           FaceAnalysis wrapper). Supports the buffalo_*
+                           and antelopev2 packs: SCRFD detector + ArcFace
+                           recognizer + genderage head where shipped.
+                           NON-COMMERCIAL research use only (upstream license).
+
+    * OnnxDirectEngine   — loads detector + recognizer ONNX files directly
+                           via onnxruntime. Used for OpenCV Zoo models
+                           (YuNet + SFace) and any future Apache-licensed
+                           model set. Does not support analyze().
+
+Both engines expose the same interface so the gRPC servicer (backend.py)
+can dispatch without knowing which one is active.
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Protocol
+
+import cv2
+import numpy as np
+
+
+@dataclass
+class FaceDetection:
+    bbox: tuple[float, float, float, float]  # x1, y1, x2, y2
+    score: float
+    landmarks: np.ndarray | None = None      # 5x2 keypoints when available
+
+
+@dataclass
+class FaceAttributes:
+    region: tuple[float, float, float, float]  # x, y, w, h
+    face_confidence: float
+    age: float | None = None
+    dominant_gender: str | None = None
+    gender: dict[str, float] = field(default_factory=dict)
+
+
+class FaceEngine(Protocol):
+    """Minimal interface every engine must implement."""
+
+    def prepare(self, options: dict[str, str]) -> None: ...
+    def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
+    def embed(self, img: np.ndarray) -> np.ndarray | None: ...
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
+
+
+# ─── InsightFaceEngine ────────────────────────────────────────────────
+
+class InsightFaceEngine:
+    """Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
+
+    FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
+    in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
+    build a `{taskname: model}` dict, then loop per-face at inference).
+    We reimplement the same loop here so we can:
+
+      1. Load packs from whatever directory LocalAI's gallery extracted
+         them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
+         nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
+         without needing a specific layout on disk.
+      2. Skip insightface's built-in auto-download entirely: weight
+         delivery is LocalAI's gallery `files:` job now, checksum-
+         verified and cached alongside every other managed model.
+
+    The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
+    Landmark) stay in insightface — we only reimplement the ~50 lines
+    of glue around them.
+    """
+
+    def __init__(self) -> None:
+        self.models: dict[str, Any] = {}
+        self.det_model: Any = None
+        self.model_pack: str = "buffalo_l"
+        self.det_size: tuple[int, int] = (640, 640)
+        self.det_thresh: float = 0.5
+        self._providers: list[str] = ["CPUExecutionProvider"]
+
+    def prepare(self, options: dict[str, str]) -> None:
+        import glob
+        import os
+
+        from insightface.model_zoo import model_zoo
+
+        self.model_pack = options.get("model_pack", "buffalo_l")
+        self.det_size = _parse_det_size(options.get("det_size", "640x640"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        pack_dir = _locate_insightface_pack(options, self.model_pack)
+        if pack_dir is None:
+            raise ValueError(
+                f"no insightface pack '{self.model_pack}' found — install via "
+                f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
+            )
+
+        onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
+        if not onnx_files:
+            raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
+
+        # Request CUDA first: onnxruntime-gpu uses it when a GPU is present,
+        # otherwise sessions fall back to CPU, which keeps the CPU-only image
+        # working. ctx_id=0 below means "first GPU if any, else CPU".
+        self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
+
+        self.models = {}
+        for onnx_file in onnx_files:
+            m = model_zoo.get_model(onnx_file, providers=self._providers)
+            if m is None:
+                continue
+            # First occurrence of each taskname wins (matches FaceAnalysis).
+            if m.taskname not in self.models:
+                self.models[m.taskname] = m
+
+        if "detection" not in self.models:
+            raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
+        self.det_model = self.models["detection"]
+
+        self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
+        for name, m in self.models.items():
+            if name != "detection":
+                m.prepare(0)
+
+    def _faces(self, img: np.ndarray) -> list[Any]:
+        """Run detection + all non-detection models per face."""
+        if self.det_model is None:
+            return []
+        from insightface.app.common import Face
+
+        bboxes, kpss = self.det_model.detect(img, max_num=0)
+        if bboxes is None or bboxes.shape[0] == 0:
+            return []
+        faces: list[Any] = []
+        for i in range(bboxes.shape[0]):
+            bbox = bboxes[i, 0:4]
+            det_score = bboxes[i, 4]
+            kps = kpss[i] if kpss is not None else None
+            face = Face(bbox=bbox, kps=kps, det_score=det_score)
+            for name, m in self.models.items():
+                if name == "detection":
+                    continue
+                m.get(img, face)
+            faces.append(face)
+        return faces
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        return [
+            FaceDetection(
+                bbox=tuple(float(v) for v in f.bbox),
+                score=float(f.det_score),
+                landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
+            )
+            for f in self._faces(img)
+        ]
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        faces = self._faces(img)
+        if not faces:
+            return None
+        best = max(faces, key=lambda f: float(f.det_score))
+        if getattr(best, "normed_embedding", None) is None:
+            return None
+        return np.asarray(best.normed_embedding, dtype=np.float32)
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        out: list[FaceAttributes] = []
+        for f in self._faces(img):
+            x1, y1, x2, y2 = (float(v) for v in f.bbox)
+            region = (x1, y1, x2 - x1, y2 - y1)
+            attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
+            age = getattr(f, "age", None)
+            if age is not None:
+                attrs.age = float(age)
+            gender = getattr(f, "gender", None)
+            if gender is not None:
+                # genderage head emits argmax, not probabilities —
+                # one-hot dict keeps the API stable.
+                attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
+                attrs.gender = {
+                    "Man": 1.0 if int(gender) == 1 else 0.0,
+                    "Woman": 0.0 if int(gender) == 1 else 1.0,
+                }
+            out.append(attrs)
+        return out
+
+
+# ─── OnnxDirectEngine ─────────────────────────────────────────────────
+
+class OnnxDirectEngine:
+    """Loads detector + recognizer ONNX files directly.
+
+    Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
+    exposes a C++-level API via cv2.FaceDetectorYN which accepts the
+    ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
+    Both are Apache 2.0 licensed.
+    """
+
+    def __init__(self) -> None:
+        self.detector_path: str = ""
+        self.recognizer_path: str = ""
+        self.input_size: tuple[int, int] = (320, 320)
+        self.det_thresh: float = 0.5
+        self._detector: Any = None
+        self._recognizer: Any = None
+
+    def prepare(self, options: dict[str, str]) -> None:
+        raw_det = options.get("detector_onnx", "")
+        raw_rec = options.get("recognizer_onnx", "")
+        if not raw_det or not raw_rec:
+            raise ValueError(
+                "onnx_direct engine requires both detector_onnx and recognizer_onnx options"
+            )
+        model_dir = options.get("_model_dir")
+        self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
+        self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
+        self.input_size = _parse_det_size(options.get("det_size", "320x320"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        # cv2.FaceDetectorYN needs its input size to match the frame, so
+        # detect()/embed() call setInputSize() per image; this is only the initial size.
+        self._detector = cv2.FaceDetectorYN.create(
+            self.detector_path,
+            "",
+            self.input_size,
+            score_threshold=self.det_thresh,
+            nms_threshold=0.3,
+            top_k=5000,
+        )
+        self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        if self._detector is None:
+            return []
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None:
+            return []
+        out: list[FaceDetection] = []
+        for row in faces:
+            x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
+            # Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
+            landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
+            score = float(row[-1])
+            out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
+        return out
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        if self._detector is None or self._recognizer is None:
+            return None
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None or len(faces) == 0:
+            return None
+        # Pick the highest-score face (last column is score).
+        best = max(faces, key=lambda r: float(r[-1]))
+        aligned = self._recognizer.alignCrop(img, best)
+        feat = self._recognizer.feature(aligned)
+        vec = np.asarray(feat, dtype=np.float32).flatten()
+        # SFace outputs a 128-dim feature; L2-normalize so a dot product is the
+        # cosine similarity, as with buffalo_l's already-normed 512-dim embedding.
+        norm = float(np.linalg.norm(vec))
+        if norm == 0:
+            return None
+        return vec / norm
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        # OpenCV Zoo does not ship a demographic classifier; report
+        # only the face-detection regions so callers can still see
+        # how many faces were detected.
+        return [
+            FaceAttributes(
+                region=(
+                    d.bbox[0],
+                    d.bbox[1],
+                    d.bbox[2] - d.bbox[0],
+                    d.bbox[3] - d.bbox[1],
+                ),
+                face_confidence=d.score,
+            )
+            for d in self.detect(img)
+        ]
+
+
+# ─── helpers ──────────────────────────────────────────────────────────
+
+def _parse_det_size(raw: str) -> tuple[int, int]:
+    raw = raw.strip().lower().replace(" ", "")
+    if "x" in raw:
+        w, h = raw.split("x", 1)
+        return (int(w), int(h))
+    n = int(raw)
+    return (n, n)
+
+
+def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
+    """Find the directory holding the insightface pack's ONNX files.
+
+    LocalAI's gallery `files:` extracts the pack zip straight into the
+    models directory. Upstream packs are inconsistent:
+
+      buffalo_l/s/sc  — flat zip, ONNX lands at `<models_dir>/*.onnx`
+      buffalo_m, antelopev2  — wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`
+
+    We search, in order:
+      1. `<models_dir>/<name>/`: wrapped-zip layout.
+      2. `<models_dir>/models/<name>/`: where FaceAnalysis auto-download
+         puts packs (handy for dev setups that reuse old `~/.insightface` caches).
+      3. `<models_dir>/`: flat-zip layout directly in the models dir.
+      4. If a `root` option is set (`~` is expanded): `<root>/models/<name>/`,
+         then `<root>/<name>/`, then `<root>/` itself.
+
+    Returns the first directory whose contents include `*.onnx`.
+    """
+    import glob
+    import os
+
+    model_dir = options.get("_model_dir") or ""
+    explicit_root = options.get("root")
+
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, name))
+        candidates.append(os.path.join(model_dir, "models", name))
+        candidates.append(model_dir)
+    if explicit_root:
+        expanded = os.path.expanduser(explicit_root)
+        candidates.append(os.path.join(expanded, "models", name))
+        candidates.append(os.path.join(expanded, name))
+        candidates.append(expanded)
+
+    for c in candidates:
+        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
+            return c
+    return None
+
+
+def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
+    """Resolve an ONNX file path across the paths LocalAI might deliver it from.
+
+    Search order:
+      1. The path itself if it already resolves (absolute, or relative to CWD).
+      2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`) —
+         this is how LocalAI surfaces gallery-managed files. When the gallery
+         entry lists `files:`, each one lands under the models directory and
+         backends load them via filename anchored by ModelFile.
+      3. `<script_dir>/<path-without-leading-slash>` — covers dev layouts
+         where someone manually dropped weights inside the backend dir.
+
+    If none hit, return the literal input so cv2/insightface surfaces a
+    clearer error naming the actually-attempted path.
+    """
+    import os
+
+    if os.path.isfile(path):
+        return path
+    stripped = path.lstrip("/")
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, os.path.basename(path)))
+        candidates.append(os.path.join(model_dir, stripped))
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    candidates.append(os.path.join(script_dir, stripped))
+    for c in candidates:
+        if os.path.isfile(c):
+            return c
+    return path
+
+
+def build_engine(name: str) -> FaceEngine:
+    """Factory for the engine selected by LoadModel options."""
+    key = name.strip().lower()
+    if key in ("", "insightface"):
+        return InsightFaceEngine()
+    if key in ("onnx_direct", "onnx-direct", "opencv"):
+        return OnnxDirectEngine()
+    raise ValueError(f"unknown engine: {name!r}")
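The two box conventions in engines.py (YuNet's `x, y, w, h, landmarks..., score` rows versus the `(x1, y1, x2, y2)` bbox in `FaceDetection` and the `(x, y, w, h)` region in `FaceAttributes`) are easy to mix up. Below is a minimal plain-Python sketch (no cv2/numpy) that restates the conversions done in `OnnxDirectEngine.detect()` and `analyze()`. It is not the backend's code: `Detection`, `yunet_row_to_detection` and `xyxy_to_xywh` are hypothetical names, and the sample row is made-up data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class Detection:
    bbox: tuple[float, float, float, float]  # x1, y1, x2, y2
    score: float
    landmarks: list[tuple[float, float]] | None = None  # five (x, y) points


def yunet_row_to_detection(row: Sequence[float]) -> Detection:
    """Split one YuNet row: x, y, w, h, ten landmark coords, score (last)."""
    x, y, w, h = (float(v) for v in row[:4])
    landmarks = None
    if len(row) >= 14:
        coords = [float(v) for v in row[4:14]]
        # (lx1, ly1, ..., lx5, ly5) -> [(lx1, ly1), ..., (lx5, ly5)]
        landmarks = list(zip(coords[0::2], coords[1::2]))
    return Detection(bbox=(x, y, x + w, y + h), score=float(row[-1]), landmarks=landmarks)


def xyxy_to_xywh(bbox: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Corner box (x1, y1, x2, y2) -> region (x, y, width, height)."""
    x1, y1, x2, y2 = bbox
    return (x1, y1, x2 - x1, y2 - y1)


row = [10, 20, 100, 120, 40, 60, 80, 60, 60, 90, 45, 110, 75, 110, 0.93]
det = yunet_row_to_detection(row)
print(det.bbox)                # (10.0, 20.0, 110.0, 140.0)
print(xyxy_to_xywh(det.bbox))  # (10.0, 20.0, 100.0, 120.0)
```

Round-tripping a YuNet row through both helpers gives back the original `(x, y, w, h)`, which is a cheap sanity check when adding another detector to `OnnxDirectEngine`.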
--- a/backend/python/insightface/install.sh
+++ b/backend/python/insightface/install.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname "$0")
+if [ -d "$backend_dir/common" ]; then
+    source "$backend_dir/common/libbackend.sh"
+else
+    source "$backend_dir/../common/libbackend.sh"
+fi
+
+installRequirements
+
+# We deliberately do NOT pre-bake any model weights here. Two reasons:
+#
+#   1. Weights should follow LocalAI's gallery-managed download flow
+#      like every other backend. For OpenCV Zoo (YuNet + SFace) the
+#      gallery entries in gallery/index.yaml list the ONNX files via
+#      `files:` with URI + SHA-256 — LocalAI fetches them into the
+#      models directory on `local-ai models install`.
+#
+#   2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
+#      buffalo_sc, antelopev2), upstream distributes zip archives
+#      only (no individual ONNX URLs). The gallery lists the zip under
+#      `files:` and it is extracted into the models directory, where
+#      InsightFaceEngine finds the ONNX files (flat or nested layout);
+#      insightface's own auto-download is never used.
+#
+# Net effect: the backend image ships only Python deps (~150MB CPU).
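The comment above leans on the gallery's SHA-256 checksums for weight delivery. smoke.py and test.py verify their OpenCV Zoo downloads by reading each file fully into memory; a minimal alternative sketch streams the file through `hashlib` in fixed-size chunks so large model files never need to fit in RAM at once. `sha256_file` and `verify_sha256` are hypothetical helpers, not part of LocalAI or the backend.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path: str, expected: str) -> None:
    """Raise RuntimeError if the file's digest differs from `expected` (case-insensitive)."""
    got = sha256_file(path)
    if got != expected.lower():
        raise RuntimeError(f"sha256 mismatch for {path}: want {expected}, got {got}")
```

Calling `verify_sha256(dest, sha256)` right after `urllib.request.urlretrieve(uri, dest)` would be a drop-in replacement for the whole-file hashing in `_download_if_missing`.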
--- a/backend/python/insightface/requirements-cpu.txt
+++ b/backend/python/insightface/requirements-cpu.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements-cublas12.txt
+++ b/backend/python/insightface/requirements-cublas12.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime-gpu
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements.txt
+++ b/backend/python/insightface/requirements.txt
@@ -0,0 +1,3 @@
+grpcio==1.71.0
+protobuf
+grpcio-tools
--- a/backend/python/insightface/run.sh
+++ b/backend/python/insightface/run.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+backend_dir=$(dirname "$0")
+if [ -d "$backend_dir/common" ]; then
+    source "$backend_dir/common/libbackend.sh"
+else
+    source "$backend_dir/../common/libbackend.sh"
+fi
+
+startBackend "$@"
--- a/backend/python/insightface/smoke.py
+++ b/backend/python/insightface/smoke.py
@@ -0,0 +1,264 @@
+#!/usr/bin/env python3
+"""Smoke-test every face recognition model configuration shipped in the
+gallery. Simulates what LocalAI does at runtime: for each config, sets
+up a models directory, fetches any required files via URL (as the
+gallery's `files:` list would), then loads + detects + embeds via the
+in-process BackendServicer — matching the gRPC surface end users hit.
+
+Run inside the built backend image (venv already has insightface /
+onnxruntime / opencv-python-headless):
+
+    python smoke.py
+
+Network is required to download the OpenCV Zoo ONNX files on first run.
+The insightface packs are not fetched here: InsightFaceEngine expects them
+already extracted in the models directory (reuse one via LOCALAI_MODELS_PATH).
+"""
+from __future__ import annotations
+
+import base64
+import hashlib
+import os
+import sys
+import traceback
+import urllib.request
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+from backend import BackendServicer  # noqa: E402
+
+
+# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
+# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
+OPENCV_FILES = {
+    "fp32": [
+        (
+            "face_detection_yunet_2023mar.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+            "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+        ),
+        (
+            "face_recognition_sface_2021dec.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+            "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+        ),
+    ],
+    "int8": [
+        (
+            "face_detection_yunet_2023mar_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
+            "321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
+        ),
+        (
+            "face_recognition_sface_2021dec_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
+            "2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
+        ),
+    ],
+}
+
+
+CONFIGS = [
+    {
+        "name": "insightface-buffalo-l",
+        "options": ["engine:insightface", "model_pack:buffalo_l"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-sc",
+        "options": ["engine:insightface", "model_pack:buffalo_sc"],
+        # buffalo_sc ships detector + recognizer only: no landmark or genderage models.
+        "has_analyze": False,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-s",
+        "options": ["engine:insightface", "model_pack:buffalo_s"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-m",
+        "options": ["engine:insightface", "model_pack:buffalo_m"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-antelopev2",
+        "options": ["engine:insightface", "model_pack:antelopev2"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-opencv",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "fp32",
+    },
+    {
+        "name": "insightface-opencv-int8",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar_int8.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "int8",
+    },
+]
+
+
+class _FakeContext:
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+def _encode_image(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_sample_image() -> str:
+    from insightface.data import get_image as ins_get_image
+
+    return _encode_image(ins_get_image("t1"))
+
+
+def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
+    dest = os.path.join(model_dir, filename)
+    if os.path.isfile(dest):
+        h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
+        if h == sha256:
+            return
+    sys.stderr.write(f"  fetching {filename} from {uri}\n")
+    sys.stderr.flush()
+    urllib.request.urlretrieve(uri, dest)
+    h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
+    if h != sha256:
+        raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
+
+
+def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
+    # Mirror LocalAI's gallery flow: populate model_dir with the
+    # gallery's listed files before calling LoadModel.
+    if cfg["needs_opencv_files"]:
+        for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
+            _download_if_missing(model_dir, filename, uri, sha256)
+
+    svc = BackendServicer()
+    ctx = _FakeContext()
+
+    load_res = svc.LoadModel(
+        backend_pb2.ModelOptions(
+            Model=cfg["name"],
+            Options=cfg["options"],
+            # ModelPath is what the Go loader sets to ml.ModelPath, i.e.
+            # LocalAI's models directory. The backend anchors relative ONNX
+            # paths and the insightface pack lookup here.
+            ModelPath=model_dir,
+        ),
+        ctx,
+    )
+    if not load_res.success:
+        return False, f"LoadModel: {load_res.message}"
+
+    det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+    if len(det_res.Detections) == 0:
+        return False, "Detect returned no faces"
+    for d in det_res.Detections:
+        if d.class_name != "face":
+            return False, f"Detect returned class_name={d.class_name!r}"
+
+    emb_ctx = _FakeContext()
+    emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
+    if emb_ctx.code is not None:
+        return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
+    if len(emb_res.embeddings) == 0:
+        return False, "Embedding returned empty vector"
+    norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
+    if not (0.8 <= norm_sq <= 1.2):
+        return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
+
+    ver_ctx = _FakeContext()
+    ver_res = svc.FaceVerify(
+        backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
+    )
+    if ver_ctx.code is not None:
+        return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
+    if not ver_res.verified:
+        return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
+    if ver_res.distance > 0.1:
+        return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
+
+    if cfg["has_analyze"]:
+        an_ctx = _FakeContext()
+        an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
+        if an_ctx.code is not None:
+            return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
+        if len(an_res.faces) == 0:
+            return False, "FaceAnalyze returned no faces"
+        f0 = an_res.faces[0]
+        if f0.age <= 0:
+            return False, f"FaceAnalyze age not populated (age={f0.age})"
+        if f0.dominant_gender not in ("Man", "Woman"):
+            return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
+
+    n_dets = len(det_res.Detections)
+    dim = len(emb_res.embeddings)
+    return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
+
+
+def main() -> int:
+    # Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
+    # default to a fresh temp dir.
+    model_dir = os.environ.get("LOCALAI_MODELS_PATH")
+    if not model_dir:
+        import tempfile
+
+        model_dir = tempfile.mkdtemp(prefix="face-smoke-")
+    os.makedirs(model_dir, exist_ok=True)
+    print(f"model_dir={model_dir}", file=sys.stderr)
+
+    print("Preparing sample image from insightface.data...", file=sys.stderr)
+    img_b64 = _load_sample_image()
+
+    results: list[tuple[str, bool, str]] = []
+    for cfg in CONFIGS:
+        sys.stderr.write(f"\n=== {cfg['name']} ===\n")
+        sys.stderr.flush()
+        try:
+            ok, detail = _run_one(cfg, img_b64, model_dir)
+        except Exception:
+            ok, detail = False, traceback.format_exc().splitlines()[-1]
+        results.append((cfg["name"], ok, detail))
+        print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s}  {detail}")
+        sys.stdout.flush()
+
+    print("\n=== summary ===")
+    passed = sum(1 for _, ok, _ in results if ok)
+    total = len(results)
+    for name, ok, detail in results:
+        mark = "✓" if ok else "✗"
+        print(f"  {mark} {name:30s} {detail}")
+    print(f"\n{passed}/{total} passed")
+    return 0 if passed == total else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
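Each `CONFIGS` entry passes its settings as `key:value` strings, while the engines' `prepare()` reads a `dict[str, str]` that also carries a `_model_dir` key. That conversion lives in backend.py, which is not part of this excerpt, so the sketch below is an assumption about its shape rather than the actual code; `parse_backend_options` is a hypothetical name.

```python
from __future__ import annotations


def parse_backend_options(raw: list[str], model_path: str = "") -> dict[str, str]:
    """Turn ["key:value", ...] into the dict an engine's prepare() reads."""
    opts: dict[str, str] = {}
    for item in raw:
        # partition() splits on the FIRST colon only, so values such as
        # "/models/a:b" keep their remaining colons intact.
        key, sep, value = item.partition(":")
        if not sep or not key.strip():
            continue  # ignore malformed entries in this sketch
        opts[key.strip()] = value.strip()
    if model_path:
        # Assumption: the servicer exposes ModelOptions.ModelPath to engines
        # under the "_model_dir" key that engines.py reads.
        opts.setdefault("_model_dir", model_path)
    return opts


opts = parse_backend_options(
    ["engine:onnx_direct", "det_size:640x640", "det_thresh:0.6"],
    model_path="/models",
)
print(opts)
# {'engine': 'onnx_direct', 'det_size': '640x640', 'det_thresh': '0.6', '_model_dir': '/models'}
```

Splitting on the first colon only matters for options like `root`, whose values are filesystem paths.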
--- a/backend/python/insightface/test.py
+++ b/backend/python/insightface/test.py
@@ -0,0 +1,234 @@
+"""Unit tests for the insightface gRPC backend.
+
+The servicer is instantiated in-process (no gRPC channel) and driven
+directly. Sample images come from insightface.data (bundled with the pip
+package); OnnxDirectEngineTest downloads the OpenCV Zoo ONNX files it needs.
+
+Tests are parametrized over both engines (InsightFaceEngine and
+OnnxDirectEngine) where applicable.
+"""
+from __future__ import annotations
+
+import base64
+import os
+import sys
+import unittest
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+
+from backend import BackendServicer  # noqa: E402
+
+# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
+# to mirror LocalAI's gallery `files:` flow (the backend image itself
+# doesn't ship model weights).
+OPENCV_FILES = [
+    (
+        "face_detection_yunet_2023mar.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+        "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+    ),
+    (
+        "face_recognition_sface_2021dec.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+        "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+    ),
+]
+
+
+def _encode(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_insightface_samples() -> dict[str, str]:
+    """Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.
+
+    t1 and t2 are two different group photos; the tests use them as
+    stand-ins for "same person" (t1 vs t1) and "different people" (t1 vs t2).
+    """
+    from insightface.data import get_image as ins_get_image
+
+    return {
+        "t1": _encode(ins_get_image("t1")),
+        "t2": _encode(ins_get_image("t2")),
+    }
+
+
+class _FakeContext:
+    """Minimal stand-in for grpc.ServicerContext."""
+
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+class _Harness:
+    def __init__(self, servicer: BackendServicer) -> None:
+        self.svc = servicer
+
+    def health(self):
+        return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())
+
+    def load(self, options: list[str], model_path: str = ""):
+        return self.svc.LoadModel(
+            backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
+            _FakeContext(),
+        )
+
+    def detect(self, img_b64: str):
+        return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+
+    def embed(self, img_b64: str):
+        ctx = _FakeContext()
+        res = self.svc.Embedding(
+            backend_pb2.PredictOptions(Images=[img_b64]),
+            ctx,
+        )
+        return res, ctx
+
+    def verify(self, a: str, b: str, threshold: float = 0.0):
+        return self.svc.FaceVerify(
+            backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
+            _FakeContext(),
+        )
+
+    def analyze(self, img_b64: str):
+        return self.svc.FaceAnalyze(
+            backend_pb2.FaceAnalyzeRequest(img=img_b64),
+            _FakeContext(),
+        )
+
+
+class InsightFaceEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_health(self):
+        self.assertEqual(self.harness.health().message, b"OK")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+            self.assertGreater(d.width, 0)
+            self.assertGreater(d.height, 0)
+
+    def test_embedding_is_l2_normed(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertEqual(len(res.embeddings), 512)
+        norm_sq = sum(x * x for x in res.embeddings)
+        self.assertAlmostEqual(norm_sq, 1.0, places=2)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"])
+        self.assertTrue(res.verified)
+        self.assertLess(res.distance, 0.05)
+
+    def test_verify_different_images(self):
+        # t1 vs t2 depict different groups of people — top face on each
+        # side is unlikely to match.
+        res = self.harness.verify(self.samples["t1"], self.samples["t2"])
+        # We assert only that some numerical answer came back; the
+        # matches-or-not determination depends on which face each side
+        # picked and isn't a stable test assertion.
+        self.assertGreaterEqual(res.distance, 0.0)
+
+    def test_analyze_has_age_and_gender(self):
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertGreater(face.face_confidence, 0.0)
+            # Age should be populated for buffalo_l.
+            self.assertGreater(face.age, 0.0)
+            self.assertIn(face.dominant_gender, ("Man", "Woman"))
+
+
+def _prepare_opencv_models_dir() -> str | None:
+    """Download OpenCV Zoo face ONNX files into a temp dir the way
+    LocalAI's gallery would. Returns the directory, or None if
+    downloads failed (network-restricted sandbox).
+    """
+    import hashlib
+    import tempfile
+    import urllib.request
+
+    root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(
+        prefix="opencv-face-"
+    )
+    for filename, uri, sha256 in OPENCV_FILES:
+        dest = os.path.join(root, filename)
+        if os.path.isfile(dest):
+            if hashlib.sha256(open(dest, "rb").read()).hexdigest() == sha256:
+                continue
+        try:
+            urllib.request.urlretrieve(uri, dest)
+        except Exception:
+            return None
+        if hashlib.sha256(open(dest, "rb").read()).hexdigest() != sha256:
+            return None
+    return root
+
+
+class OnnxDirectEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.model_dir = _prepare_opencv_models_dir()
+        if cls.model_dir is None:
+            raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(
+            [
+                "engine:onnx_direct",
+                "detector_onnx:face_detection_yunet_2023mar.onnx",
+                "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+            ],
+            model_path=cls.model_dir,
+        )
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+
+    def test_embedding_nonempty(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertGreater(len(res.embeddings), 0)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
+        self.assertTrue(res.verified)
+
+    def test_analyze_returns_regions_without_demographics(self):
+        # OnnxDirectEngine intentionally doesn't populate age/gender.
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertEqual(face.dominant_gender, "")
+            self.assertEqual(face.age, 0.0)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/backend/python/insightface/test.sh
+++ b/backend/python/insightface/test.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+runUnittests
--- a/core/application/application.go
+++ b/core/application/application.go
@@ -7,17 +7,28 @@ import (
 	"sync/atomic"
 	"time"

+	corebackend "github.com/mudler/LocalAI/core/backend"
 	"github.com/mudler/LocalAI/core/config"
 	mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
 	"github.com/mudler/LocalAI/core/services/agentpool"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
 	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/templates"
+	pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
 	"github.com/mudler/LocalAI/pkg/model"
 	"github.com/mudler/xlog"
 	"gorm.io/gorm"
 )

+// faceEmbeddingDim is the expected dimension for face embeddings.
+// Set to 0 so the Registry accepts whatever dim the loaded recognizer
+// produces — ArcFace R50 is 512-d, MBF is 512-d, SFace is 128-d, and
+// the insightface backend can load any of them via LoadModel options.
+// Locking this to a specific value would force a single recognizer
+// family per deployment; we keep the door open instead.
+const faceEmbeddingDim = 0
+
 type Application struct {
 	backendLoader      *config.ModelConfigLoader
 	modelLoader        *model.ModelLoader
@@ -27,6 +38,7 @@ type Application struct {
 	galleryService     *galleryop.GalleryService
 	agentJobService    *agentpool.AgentJobService
 	agentPoolService   atomic.Pointer[agentpool.AgentPoolService]
+	faceRegistry       facerecognition.Registry
 	authDB             *gorm.DB
 	watchdogMutex      sync.Mutex
 	watchdogStop       chan bool
@@ -50,12 +62,23 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
 		mcpTools.CloseMCPSessions(modelName)
 	})

-	return &Application{
+	app := &Application{
 		backendLoader:      config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
 		modelLoader:        ml,
 		applicationConfig:  appConfig,
 		templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
 	}
+
+	// Face-recognition registry backed by LocalAI's built-in vector store.
+	// The resolver closes over the ModelLoader so the Registry stays
+	// decoupled from loader plumbing; swapping in a postgres-backed
+	// implementation later is a single construction change here.
+	faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
+		return corebackend.StoreBackend(ml, appConfig, storeName, "")
+	}
+	app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
+
+	return app
 }

 func (a *Application) ModelConfigLoader() *config.ModelConfigLoader {
@@ -99,6 +122,14 @@ func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
 	return a.agentPoolService.Load()
 }

+// FaceRegistry returns the face-recognition registry used for 1:N
+// identification. The current implementation is backed by the
+// in-memory local-store backend; see core/services/facerecognition
+// for the interface and the postgres TODO.
+func (a *Application) FaceRegistry() facerecognition.Registry {
+	return a.faceRegistry
+}
+
 // AuthDB returns the auth database connection, or nil if auth is not enabled.
 func (a *Application) AuthDB() *gorm.DB {
 	return a.authDB
--- a/core/backend/face_analyze.go
+++ b/core/backend/face_analyze.go
@@ -0,0 +1,60 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/trace"
+	"github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+func FaceAnalyze(
+	img string,
+	actions []string,
+	antiSpoofing bool,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) (*proto.FaceAnalyzeResponse, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	var startTime time.Time
+	if appConfig.EnableTracing {
+		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
+		startTime = time.Now()
+	}
+
+	res, err := faceModel.FaceAnalyze(context.Background(), &proto.FaceAnalyzeRequest{
+		Img:          img,
+		Actions:      actions,
+		AntiSpoofing: antiSpoofing,
+	})
+
+	if appConfig.EnableTracing {
+		errStr := ""
+		if err != nil {
+			errStr = err.Error()
+		}
+		trace.RecordBackendTrace(trace.BackendTrace{
+			Timestamp: startTime,
+			Duration:  time.Since(startTime),
+			Type:      trace.BackendTraceFaceAnalyze,
+			ModelName: modelConfig.Name,
+			Backend:   modelConfig.Backend,
+			Error:     errStr,
+		})
+	}
+
+	return res, err
+}
--- a/core/backend/face_embed.go
+++ b/core/backend/face_embed.go
@@ -0,0 +1,43 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+// FaceEmbed loads the face recognition backend and returns the face
+// embedding for the base64-encoded image (512-d for ArcFace/MBF, 128-d
+// for SFace). Unlike ModelEmbedding it passes the image through
+// PredictOptions.Images; the insightface backend picks the
+// highest-confidence face and returns its L2-normalized embedding.
+func FaceEmbed(
+	imgBase64 string,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) ([]float32, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	predictOpts := gRPCPredictOpts(modelConfig, loader.ModelPath)
+	predictOpts.Images = []string{imgBase64}
+
+	res, err := faceModel.Embeddings(context.Background(), predictOpts)
+	if err != nil {
+		return nil, err
+	}
+	if len(res.Embeddings) == 0 {
+		return nil, fmt.Errorf("face embedding returned empty vector (no face detected?)")
+	}
+	return res.Embeddings, nil
+}
--- a/core/backend/face_verify.go
+++ b/core/backend/face_verify.go
@@ -0,0 +1,61 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/trace"
+	"github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+func FaceVerify(
+	img1, img2 string,
+	threshold float32,
+	antiSpoofing bool,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) (*proto.FaceVerifyResponse, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	var startTime time.Time
+	if appConfig.EnableTracing {
+		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
+		startTime = time.Now()
+	}
+
+	res, err := faceModel.FaceVerify(context.Background(), &proto.FaceVerifyRequest{
+		Img1:         img1,
+		Img2:         img2,
+		Threshold:    threshold,
+		AntiSpoofing: antiSpoofing,
+	})
+
+	if appConfig.EnableTracing {
+		errStr := ""
+		if err != nil {
+			errStr = err.Error()
+		}
+		trace.RecordBackendTrace(trace.BackendTrace{
+			Timestamp: startTime,
+			Duration:  time.Since(startTime),
+			Type:      trace.BackendTraceFaceVerify,
+			ModelName: modelConfig.Name,
+			Backend:   modelConfig.Backend,
+			Error:     errStr,
+		})
+	}
+
+	return res, err
+}
--- a/core/config/model_config.go
+++ b/core/config/model_config.go
@@ -588,6 +588,7 @@ const (
 	FLAG_VAD              ModelConfigUsecase = 0b010000000000
 	FLAG_VIDEO            ModelConfigUsecase = 0b100000000000
 	FLAG_DETECTION        ModelConfigUsecase = 0b1000000000000
+	FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000

 	// Common Subsets
 	FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@@ -611,6 +612,7 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
 		"FLAG_LLM":              FLAG_LLM,
 		"FLAG_VIDEO":            FLAG_VIDEO,
 		"FLAG_DETECTION":        FLAG_DETECTION,
+		"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
 	}
 }

@@ -651,7 +653,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
 	nonTextGenBackends := []string{
 		"whisper", "piper", "kokoro",
 		"diffusers", "stablediffusion", "stablediffusion-ggml",
-		"rerankers", "silero-vad", "rfdetr",
+		"rerankers", "silero-vad", "rfdetr", "insightface",
 		"transformers-musicgen", "ace-step", "acestep-cpp",
 	}

@@ -728,12 +730,19 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
 	}

 	if (u & FLAG_DETECTION) == FLAG_DETECTION {
-		detectionBackends := []string{"rfdetr", "sam3-cpp"}
+		detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
 		if !slices.Contains(detectionBackends, c.Backend) {
 			return false
 		}
 	}

+	if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
+		faceBackends := []string{"insightface"}
+		if !slices.Contains(faceBackends, c.Backend) {
+			return false
+		}
+	}
+
 	if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
 		soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
 		if !slices.Contains(soundGenBackends, c.Backend) {
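Each usecase occupies its own bit, and `GuessUsecases` gates a requested mask against per-flag backend allow-lists. The standalone sketch below reproduces only the two gates touched in this diff. The `guess` helper and the `int` underlying type are assumptions for illustration, not the real signature.

```go
package main

import (
	"fmt"
	"slices"
)

// ModelConfigUsecase stands in for the config type; int is assumed here.
type ModelConfigUsecase int

const (
	FLAG_DETECTION        ModelConfigUsecase = 0b1000000000000  // bit 12
	FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000 // bit 13
)

// guess mirrors the two gates in GuessUsecases: if the requested mask
// includes a flag, the backend must appear in that flag's allow-list.
func guess(backend string, u ModelConfigUsecase) bool {
	if u&FLAG_DETECTION == FLAG_DETECTION &&
		!slices.Contains([]string{"rfdetr", "sam3-cpp", "insightface"}, backend) {
		return false
	}
	if u&FLAG_FACE_RECOGNITION == FLAG_FACE_RECOGNITION &&
		!slices.Contains([]string{"insightface"}, backend) {
		return false
	}
	return true
}

func main() {
	fmt.Println(guess("insightface", FLAG_DETECTION|FLAG_FACE_RECOGNITION)) // true
	fmt.Println(guess("rfdetr", FLAG_DETECTION))                            // true
	fmt.Println(guess("rfdetr", FLAG_FACE_RECOGNITION))                     // false
}
```

Listing `insightface` under both gates is what lets the same model serve `/v1/detection` and the `/v1/face/*` routes.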
--- a/core/http/auth/features.go
+++ b/core/http/auth/features.go
@@ -57,6 +57,14 @@ var RouteFeatureRegistry = []RouteFeature{
 	// Detection
 	{"POST", "/v1/detection", FeatureDetection},

+	// Face recognition
+	{"POST", "/v1/face/verify", FeatureFaceRecognition},
+	{"POST", "/v1/face/analyze", FeatureFaceRecognition},
+	{"POST", "/v1/face/embed", FeatureFaceRecognition},
+	{"POST", "/v1/face/register", FeatureFaceRecognition},
+	{"POST", "/v1/face/identify", FeatureFaceRecognition},
+	{"POST", "/v1/face/forget", FeatureFaceRecognition},
+
 	// Video
 	{"POST", "/video", FeatureVideo},

@@ -151,5 +159,6 @@ func APIFeatureMetas() []FeatureMeta {
 		{FeatureTokenize, "Tokenize", true},
 		{FeatureMCP, "MCP", true},
 		{FeatureStores, "Stores", true},
+		{FeatureFaceRecognition, "Face Recognition", true},
 	}
 }
--- a/core/http/auth/permissions.go
+++ b/core/http/auth/permissions.go
@@ -51,6 +51,7 @@ const (
 	FeatureTokenize           = "tokenize"
 	FeatureMCP                = "mcp"
 	FeatureStores             = "stores"
+	FeatureFaceRecognition    = "face_recognition"
 )

 // AgentFeatures lists agent-related features (default OFF).
@@ -64,6 +65,7 @@ var APIFeatures = []string{
 	FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
 	FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings, FeatureSound,
 	FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP, FeatureStores,
+	FeatureFaceRecognition,
 }

 // AllFeatures lists all known features (used by UI and validation).
--- a/core/http/endpoints/localai/api_instructions.go
+++ b/core/http/endpoints/localai/api_instructions.go
@@ -73,6 +73,12 @@ var instructionDefs = []instructionDef{
 		Description: "Video generation from text prompts",
 		Tags:        []string{"video"},
 	},
+	{
+		Name:        "face-recognition",
+		Description: "Face verification (1:1), identification (1:N), embedding, and demographic analysis",
+		Tags:        []string{"face-recognition"},
+		Intro:       "The /v1/face/register, /identify, and /forget endpoints build on a vector store — registrations are in-memory by default and lost on restart. Use /v1/face/embed for a raw embedding; /v1/embeddings is OpenAI-compatible and text-only.",
+	},
 }

 // swaggerState holds parsed swagger spec data, initialised once.
--- a/core/http/endpoints/localai/api_instructions_test.go
+++ b/core/http/endpoints/localai/api_instructions_test.go
@@ -39,7 +39,7 @@ var _ = Describe("API Instructions Endpoints", func() {

 			instructions, ok := resp["instructions"].([]any)
 			Expect(ok).To(BeTrue())
-			Expect(instructions).To(HaveLen(9))
+			Expect(instructions).To(HaveLen(10))

 			// Verify each instruction has required fields and correct URL format
 			for _, s := range instructions {
@@ -73,6 +73,7 @@ var _ = Describe("API Instructions Endpoints", func() {
 				"model-management",
 				"monitoring",
 				"agents",
+				"face-recognition",
 			))
 		})
 	})
--- a/core/http/endpoints/localai/detection.go
+++ b/core/http/endpoints/localai/detection.go
@@ -9,7 +9,6 @@ import (
 	"github.com/mudler/LocalAI/core/http/middleware"
 	"github.com/mudler/LocalAI/core/schema"
 	"github.com/mudler/LocalAI/pkg/model"
-	"github.com/mudler/LocalAI/pkg/utils"
 	"github.com/mudler/xlog"
 )

@@ -34,14 +33,14 @@ func DetectionEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC

 		xlog.Debug("Detection", "image", input.Image, "modelFile", "modelFile", "backend", cfg.Backend)

-		image, err := utils.GetContentURIAsBase64(input.Image)
+		image, err := decodeImageInput(input.Image)
 		if err != nil {
 			return err
 		}

 		res, err := backend.Detection(image, input.Prompt, input.Points, input.Boxes, input.Threshold, ml, appConfig, *cfg)
 		if err != nil {
-			return err
+			return mapBackendError(err)
 		}

 		response := schema.DetectionResponse{
--- a/core/http/endpoints/localai/face_analyze.go
+++ b/core/http/endpoints/localai/face_analyze.go
@@ -0,0 +1,69 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceAnalyzeEndpoint returns demographic attributes for faces in an image.
+// @Summary Analyze demographic attributes (age, gender, ...) of faces.
+// @Tags face-recognition
+// @Param request body schema.FaceAnalyzeRequest true "query params"
+// @Success 200 {object} schema.FaceAnalyzeResponse "Response"
+// @Router /v1/face/analyze [post]
+func FaceAnalyzeEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceAnalyzeRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceAnalyze", "model", cfg.Name, "backend", cfg.Backend, "actions", input.Actions)
+		res, err := backend.FaceAnalyze(img, input.Actions, input.AntiSpoofing, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		response := schema.FaceAnalyzeResponse{
+			Faces: make([]schema.FaceAnalysis, len(res.GetFaces())),
+		}
+		for i, f := range res.GetFaces() {
+			response.Faces[i] = schema.FaceAnalysis{
+				Region: schema.FacialArea{
+					X: f.GetRegion().GetX(),
+					Y: f.GetRegion().GetY(),
+					W: f.GetRegion().GetW(),
+					H: f.GetRegion().GetH(),
+				},
+				FaceConfidence:  f.GetFaceConfidence(),
+				Age:             f.GetAge(),
+				DominantGender:  f.GetDominantGender(),
+				Gender:          f.GetGender(),
+				DominantEmotion: f.GetDominantEmotion(),
+				Emotion:         f.GetEmotion(),
+				DominantRace:    f.GetDominantRace(),
+				Race:            f.GetRace(),
+				IsReal:          f.GetIsReal(),
+				AntispoofScore:  f.GetAntispoofScore(),
+			}
+		}
+
+		return c.JSON(http.StatusOK, response)
+	}
+}
--- a/core/http/endpoints/localai/face_embed.go
+++ b/core/http/endpoints/localai/face_embed.go
@@ -0,0 +1,54 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceEmbedEndpoint extracts a face embedding vector from an image.
+//
+// Distinct from /v1/embeddings, which is OpenAI-compatible and text-only
+// by contract (its `input` field is a string or string list of TEXT to
+// embed). Passing an image data-URI to /v1/embeddings does not work —
+// use this endpoint instead.
+//
+// @Summary Extract a face embedding from an image.
+// @Tags face-recognition
+// @Param request body schema.FaceEmbedRequest true "query params"
+// @Success 200 {object} schema.FaceEmbedResponse "Response"
+// @Router /v1/face/embed [post]
+func FaceEmbedEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceEmbedRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceEmbed", "model", cfg.Name, "backend", cfg.Backend)
+		vec, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+		return c.JSON(http.StatusOK, schema.FaceEmbedResponse{
+			Embedding: vec,
+			Dim:       len(vec),
+			Model:     cfg.Name,
+		})
+	}
+}
--- a/core/http/endpoints/localai/face_forget.go
+++ b/core/http/endpoints/localai/face_forget.go
@@ -0,0 +1,45 @@
+package localai
+
+import (
+	"errors"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/xlog"
+)
+
+// FaceForgetEndpoint removes a previously-registered face by ID.
+// @Summary Remove a previously-registered face by ID.
+// @Tags face-recognition
+// @Param request body schema.FaceForgetRequest true "query params"
+// @Success 204 "No Content"
+// @Router /v1/face/forget [post]
+func FaceForgetEndpoint(registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceForgetRequest)
+		if !ok {
+			// Forget doesn't need a face model loaded — fall back to a raw bind
+			// when the request extractor hasn't run (e.g. when the route was
+			// registered without SetModelAndConfig).
+			input = new(schema.FaceForgetRequest)
+			if err := c.Bind(input); err != nil {
+				return echo.ErrBadRequest
+			}
+		}
+		if input.ID == "" {
+			return echo.NewHTTPError(http.StatusBadRequest, "id is required")
+		}
+
+		xlog.Debug("FaceForget", "id", input.ID)
+		if err := registry.Forget(c.Request().Context(), input.ID); err != nil {
+			if errors.Is(err, facerecognition.ErrNotFound) {
+				return echo.NewHTTPError(http.StatusNotFound, err.Error())
+			}
+			return err
+		}
+		return c.NoContent(http.StatusNoContent)
+	}
+}
--- a/core/http/endpoints/localai/face_identify.go
+++ b/core/http/endpoints/localai/face_identify.go
@@ -0,0 +1,80 @@
+package localai
+
+import (
+	"cmp"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// defaultIdentifyThreshold is the cosine-distance cutoff applied when
+// the client does not specify one. Tuned for buffalo_l ArcFace R50;
+// other recognizers (e.g. SFace) should override it explicitly.
+const defaultIdentifyThreshold = float32(0.35)
+
+// FaceIdentifyEndpoint runs 1:N identification against the registered store.
+// @Summary Identify a face against the registered database (1:N recognition).
+// @Tags face-recognition
+// @Param request body schema.FaceIdentifyRequest true "query params"
+// @Success 200 {object} schema.FaceIdentifyResponse "Response"
+// @Router /v1/face/identify [post]
+func FaceIdentifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceIdentifyRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		topK := cmp.Or(input.TopK, 5)
+		threshold := cmp.Or(input.Threshold, defaultIdentifyThreshold)
+
+		xlog.Debug("FaceIdentify", "model", cfg.Name, "topK", topK, "threshold", threshold)
+		probe, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		matches, err := registry.Identify(c.Request().Context(), probe, topK)
+		if err != nil {
+			return err
+		}
+
+		response := schema.FaceIdentifyResponse{
+			Matches: make([]schema.FaceIdentifyMatch, len(matches)),
+		}
+		for i, m := range matches {
+			confidence := (1 - m.Distance/threshold) * 100
+			if confidence < 0 {
+				confidence = 0
+			}
+			if confidence > 100 {
+				confidence = 100
+			}
+			response.Matches[i] = schema.FaceIdentifyMatch{
+				ID:         m.ID,
+				Name:       m.Metadata.Name,
+				Labels:     m.Metadata.Labels,
+				Distance:   m.Distance,
+				Confidence: confidence,
+				Match:      m.Distance <= threshold,
+			}
+		}
+		return c.JSON(http.StatusOK, response)
+	}
+}
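The per-match scoring in `FaceIdentifyEndpoint` maps distance linearly onto a 0 to 100 confidence relative to the threshold. Pulled out as a pure function (the name `scoreMatch` is hypothetical), the same arithmetic is easy to test in isolation:

```go
package main

import (
	"cmp"
	"fmt"
)

// Same default as the endpoint (tuned for buffalo_l per its comment).
const defaultIdentifyThreshold = float32(0.35)

// scoreMatch reproduces the endpoint's arithmetic: a zero threshold falls
// back to the default, confidence falls linearly from 100 at distance 0
// to 0 at the threshold and is clamped to [0, 100], and a match means
// distance <= threshold.
func scoreMatch(distance, threshold float32) (confidence float32, match bool) {
	threshold = cmp.Or(threshold, defaultIdentifyThreshold)
	confidence = max(0, min(100, (1-distance/threshold)*100))
	return confidence, distance <= threshold
}

func main() {
	for _, d := range []float32{0, 0.175, 0.35, 0.7} {
		c, ok := scoreMatch(d, 0)
		fmt.Printf("distance=%.3f confidence=%.1f match=%v\n", d, c, ok)
	}
}
```

A probe exactly at the threshold reports `match: true` with confidence 0, so clients should read the boolean rather than treat a zero confidence as a rejection.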
--- a/core/http/endpoints/localai/face_register.go
+++ b/core/http/endpoints/localai/face_register.go
@@ -0,0 +1,60 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceRegisterEndpoint enrolls a face into the 1:N identification store.
+// @Summary Register a face for 1:N identification.
+// @Tags face-recognition
+// @Param request body schema.FaceRegisterRequest true "query params"
+// @Success 200 {object} schema.FaceRegisterResponse "Response"
+// @Router /v1/face/register [post]
+func FaceRegisterEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceRegisterRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+		if input.Name == "" {
+			return echo.NewHTTPError(http.StatusBadRequest, "name is required")
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceRegister", "model", cfg.Name, "name", input.Name)
+		embedding, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		stored, err := registry.Register(c.Request().Context(), embedding, facerecognition.Metadata{
+			Name:   input.Name,
+			Labels: input.Labels,
+		})
+		if err != nil {
+			return err
+		}
+		return c.JSON(http.StatusOK, schema.FaceRegisterResponse{
+			ID:           stored.ID,
+			Name:         stored.Name,
+			RegisteredAt: stored.RegisteredAt,
+		})
+	}
+}
--- a/core/http/endpoints/localai/face_verify.go
+++ b/core/http/endpoints/localai/face_verify.go
@@ -0,0 +1,68 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceVerifyEndpoint compares two images and reports whether they depict the same person.
+// @Summary Verify that two images depict the same person.
+// @Tags face-recognition
+// @Param request body schema.FaceVerifyRequest true "query params"
+// @Success 200 {object} schema.FaceVerifyResponse "Response"
+// @Router /v1/face/verify [post]
+func FaceVerifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceVerifyRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img1, err := decodeImageInput(input.Img1)
+		if err != nil {
+			return err
+		}
+		img2, err := decodeImageInput(input.Img2)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceVerify", "model", cfg.Name, "backend", cfg.Backend)
+		res, err := backend.FaceVerify(img1, img2, input.Threshold, input.AntiSpoofing, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		return c.JSON(http.StatusOK, schema.FaceVerifyResponse{
+			Verified:   res.GetVerified(),
+			Distance:   res.GetDistance(),
+			Threshold:  res.GetThreshold(),
+			Confidence: res.GetConfidence(),
+			Model:      res.GetModel(),
+			Img1Area: schema.FacialArea{
+				X: res.GetImg1Area().GetX(),
+				Y: res.GetImg1Area().GetY(),
+				W: res.GetImg1Area().GetW(),
+				H: res.GetImg1Area().GetH(),
+			},
+			Img2Area: schema.FacialArea{
+				X: res.GetImg2Area().GetX(),
+				Y: res.GetImg2Area().GetY(),
+				W: res.GetImg2Area().GetW(),
+				H: res.GetImg2Area().GetH(),
+			},
+			ProcessingTimeMs: res.GetProcessingTimeMs(),
+		})
+	}
+}
--- a/core/http/endpoints/localai/images.go
+++ b/core/http/endpoints/localai/images.go
@@ -0,0 +1,55 @@
+package localai
+
+import (
+	"fmt"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/pkg/utils"
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/status"
+)
+
+// decodeImageInput resolves a URL, data-URI, or plain-string image
+// input to a base64 payload ready for the gRPC surface. Errors from
+// the underlying utils helper (bad URL, not a data-URI, download
+// failure, etc.) trace back to what the client sent, so we surface
+// them as 400 rather than the default 500; API consumers can then
+// distinguish "you sent bad input" from "our server broke".
+//
+// This is the single-input path for endpoints where the image IS the
+// request (detection, face recognition, etc.). The multi-modal message
+// paths (chat completions, responses API, realtime) intentionally
+// log-and-skip individual media parts; they don't use this helper.
+func decodeImageInput(s string) (string, error) {
+	img, err := utils.GetContentURIAsBase64(s)
+	if err != nil {
+		return "", echo.NewHTTPError(http.StatusBadRequest, fmt.Sprintf("invalid image input: %v", err))
+	}
+	return img, nil
+}
+
+// mapBackendError converts the gRPC status code a backend returns into
+// a matching HTTP status: InvalidArgument -> 400, NotFound -> 404,
+// FailedPrecondition -> 412, Unimplemented -> 501. Any other error
+// passes through unchanged and defaults to 500; without this mapping,
+// "your input was bad" would be misreported as a server fault. Pass
+// any err from a `core/backend/*` call through this before returning.
+func mapBackendError(err error) error {
+	if err == nil {
+		return nil
+	}
+	if st, ok := status.FromError(err); ok {
+		switch st.Code() {
+		case codes.InvalidArgument:
+			return echo.NewHTTPError(http.StatusBadRequest, st.Message())
+		case codes.NotFound:
+			return echo.NewHTTPError(http.StatusNotFound, st.Message())
+		case codes.FailedPrecondition:
+			return echo.NewHTTPError(http.StatusPreconditionFailed, st.Message())
+		case codes.Unimplemented:
+			return echo.NewHTTPError(http.StatusNotImplemented, st.Message())
+		}
+	}
+	return err
+}
--- a/core/http/react-ui/src/utils/capabilities.js
+++ b/core/http/react-ui/src/utils/capabilities.js
@@ -12,3 +12,4 @@ export const CAP_TOKENIZE = 'FLAG_TOKENIZE'
 export const CAP_VAD = 'FLAG_VAD'
 export const CAP_VIDEO = 'FLAG_VIDEO'
 export const CAP_DETECTION = 'FLAG_DETECTION'
+export const CAP_FACE_RECOGNITION = 'FLAG_FACE_RECOGNITION'
--- a/core/http/routes/localai.go
+++ b/core/http/routes/localai.go
@@ -97,6 +97,28 @@ func RegisterLocalAIRoutes(router *echo.Echo,
 		requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
 		requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }))

+	// Face recognition endpoints
+	faceMw := []echo.MiddlewareFunc{
+		requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_FACE_RECOGNITION)),
+	}
+	router.POST("/v1/face/verify",
+		localai.FaceVerifyEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceVerifyRequest) }))...)
+	router.POST("/v1/face/analyze",
+		localai.FaceAnalyzeEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceAnalyzeRequest) }))...)
+	router.POST("/v1/face/embed",
+		localai.FaceEmbedEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceEmbedRequest) }))...)
+	router.POST("/v1/face/register",
+		localai.FaceRegisterEndpoint(cl, ml, appConfig, app.FaceRegistry()),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceRegisterRequest) }))...)
+	router.POST("/v1/face/identify",
+		localai.FaceIdentifyEndpoint(cl, ml, appConfig, app.FaceRegistry()),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceIdentifyRequest) }))...)
+	// Forget does not load a face model — it only needs the registry.
+	router.POST("/v1/face/forget", localai.FaceForgetEndpoint(app.FaceRegistry()))
+
 	ttsHandler := localai.TTSEndpoint(cl, ml, appConfig)
 	router.POST("/tts",
 		ttsHandler,
--- a/core/schema/localai.go
+++ b/core/schema/localai.go
@@ -173,6 +173,123 @@ type Detection struct {
 	Mask       string  `json:"mask,omitempty"` // base64-encoded PNG segmentation mask
 }

+// ─── Face recognition ──────────────────────────────────────────────
+//
+// FacialArea describes a bounding box for a detected face.
+type FacialArea struct {
+	X float32 `json:"x"`
+	Y float32 `json:"y"`
+	W float32 `json:"w"`
+	H float32 `json:"h"`
+}
+
+// FaceVerifyRequest compares two images to decide whether they depict
+// the same person. Img1 and Img2 accept URL, base64, or data-URI.
+type FaceVerifyRequest struct {
+	BasicModelRequest
+	Img1         string  `json:"img1"`
+	Img2         string  `json:"img2"`
+	Threshold    float32 `json:"threshold,omitempty"`
+	AntiSpoofing bool    `json:"anti_spoofing,omitempty"`
+}
+
+type FaceVerifyResponse struct {
+	Verified         bool       `json:"verified"`
+	Distance         float32    `json:"distance"`
+	Threshold        float32    `json:"threshold"`
+	Confidence       float32    `json:"confidence"`
+	Model            string     `json:"model"`
+	Img1Area         FacialArea `json:"img1_area"`
+	Img2Area         FacialArea `json:"img2_area"`
+	ProcessingTimeMs float32    `json:"processing_time_ms,omitempty"`
+}
+
+// FaceAnalyzeRequest asks the backend for demographic attributes on
+// every face detected in Img.
+type FaceAnalyzeRequest struct {
+	BasicModelRequest
+	Img          string   `json:"img"`
+	Actions      []string `json:"actions,omitempty"` // subset of {"age","gender","emotion","race"}
+	AntiSpoofing bool     `json:"anti_spoofing,omitempty"`
+}
+
+type FaceAnalyzeResponse struct {
+	Faces []FaceAnalysis `json:"faces"`
+}
+
+type FaceAnalysis struct {
+	Region          FacialArea         `json:"region"`
+	FaceConfidence  float32            `json:"face_confidence"`
+	Age             float32            `json:"age,omitempty"`
+	DominantGender  string             `json:"dominant_gender,omitempty"`
+	Gender          map[string]float32 `json:"gender,omitempty"`
+	DominantEmotion string             `json:"dominant_emotion,omitempty"`
+	Emotion         map[string]float32 `json:"emotion,omitempty"`
+	DominantRace    string             `json:"dominant_race,omitempty"`
+	Race            map[string]float32 `json:"race,omitempty"`
+	IsReal          bool               `json:"is_real,omitempty"`
+	AntispoofScore  float32            `json:"antispoof_score,omitempty"`
+}
+
+// FaceEmbedRequest extracts a face embedding from an image. Distinct
+// from /v1/embeddings (which is OpenAI-compatible and text-only); this
+// endpoint accepts URL / base64 / data-URI image inputs.
+type FaceEmbedRequest struct {
+	BasicModelRequest
+	Img string `json:"img"`
+}
+
+type FaceEmbedResponse struct {
+	Embedding []float32 `json:"embedding"`
+	Dim       int       `json:"dim"`
+	Model     string    `json:"model,omitempty"`
+}
+
+// FaceRegisterRequest enrolls a face into the 1:N recognition store.
+type FaceRegisterRequest struct {
+	BasicModelRequest
+	Img    string            `json:"img"`
+	Name   string            `json:"name"`
+	Labels map[string]string `json:"labels,omitempty"`
+	Store  string            `json:"store,omitempty"` // vector store model; empty = local-store default
+}
+
+type FaceRegisterResponse struct {
+	ID           string    `json:"id"`
+	Name         string    `json:"name"`
+	RegisteredAt time.Time `json:"registered_at"`
+}
+
+// FaceIdentifyRequest runs 1:N recognition: embed the probe and
+// return the top-K nearest registered faces.
+type FaceIdentifyRequest struct {
+	BasicModelRequest
+	Img       string  `json:"img"`
+	TopK      int     `json:"top_k,omitempty"`
+	Threshold float32 `json:"threshold,omitempty"` // optional cutoff on distance
+	Store     string  `json:"store,omitempty"`
+}
+
+type FaceIdentifyResponse struct {
+	Matches []FaceIdentifyMatch `json:"matches"`
+}
+
+type FaceIdentifyMatch struct {
+	ID         string            `json:"id"`
+	Name       string            `json:"name"`
+	Labels     map[string]string `json:"labels,omitempty"`
+	Distance   float32           `json:"distance"`
+	Confidence float32           `json:"confidence"`
+	Match      bool              `json:"match"` // true when distance <= threshold
+}
+
+// FaceForgetRequest removes a previously-registered face by ID.
+type FaceForgetRequest struct {
+	BasicModelRequest
+	ID    string `json:"id"`
+	Store string `json:"store,omitempty"`
+}
+
 type ImportModelRequest struct {
 	URI         string          `json:"uri"`
 	Preferences json.RawMessage `json:"preferences,omitempty"`
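`Threshold` is tagged `omitempty`, so a request that leaves it at zero serializes without the field. That omission is what lets the server fall back to the per-engine default. The sketch below uses trimmed, hypothetical mirrors of `FaceVerifyRequest` and `FaceAnalysis` (the real request also embeds `BasicModelRequest`) to show the behavior with `encoding/json`.

The same rule drops `is_real: false` from `FaceAnalysis` output. A client therefore cannot tell a face judged fake from one where anti-spoofing never ran. A `*bool` field would preserve that difference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verifyRequest is a trimmed, hypothetical mirror of schema.FaceVerifyRequest.
type verifyRequest struct {
	Model     string  `json:"model"`
	Img1      string  `json:"img1"`
	Img2      string  `json:"img2"`
	Threshold float32 `json:"threshold,omitempty"`
}

// analysis mirrors only the anti-spoofing fields of schema.FaceAnalysis.
type analysis struct {
	IsReal         bool    `json:"is_real,omitempty"`
	AntispoofScore float32 `json:"antispoof_score,omitempty"`
}

func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Zero threshold: the field is omitted, so the engine default applies.
	fmt.Println(encode(verifyRequest{Model: "insightface-opencv", Img1: "a.jpg", Img2: "b.jpg"}))
	// Explicit threshold: the field is sent.
	fmt.Println(encode(verifyRequest{Model: "insightface-opencv", Img1: "a.jpg", Img2: "b.jpg", Threshold: 0.5}))
	// A "fake face" verdict with a zero score serializes as an empty object.
	fmt.Println(encode(analysis{IsReal: false}))
}
```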
--- a/core/services/facerecognition/registry.go
+++ b/core/services/facerecognition/registry.go
@@ -0,0 +1,60 @@
+// Package facerecognition provides a swappable backing store for face
+// embeddings and the 1:N identification pipeline that sits on top of it.
+//
+// The current implementation (NewStoreRegistry) is backed by LocalAI's
+// in-memory local-store gRPC backend. This is in-memory only — all
+// registrations are lost when LocalAI restarts.
+//
+// TODO: add a persistent PostgreSQL/pgvector-backed implementation for
+// production deployments. The Registry interface is explicitly designed
+// so the swap is a constructor change in core/application, with zero
+// HTTP-handler changes.
+package facerecognition
+
+import (
+	"context"
+	"errors"
+	"time"
+)
+
+// Registry stores face embeddings keyed by an opaque ID and supports
+// approximate similarity search. Implementations are expected to be
+// safe for concurrent use.
+type Registry interface {
+	// Register stores a face embedding alongside its metadata.
+	// Returns the stored metadata with ID and RegisteredAt populated.
+	// A registry built with a non-zero dimension rejects other lengths.
+	Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error)
+
+	// Identify returns up to topK matches for the probe embedding,
+	// sorted by ascending distance (closest first).
+	Identify(ctx context.Context, probe []float32, topK int) ([]Match, error)
+
+	// Forget removes a previously-registered embedding by ID.
+	// Returns ErrNotFound if the ID is unknown.
+	Forget(ctx context.Context, id string) error
+}
+
+// Metadata is the user-supplied payload stored alongside a face embedding.
+type Metadata struct {
+	// ID is populated by the registry at Register time and should not be
+	// set by the caller. It is echoed back in Match.Metadata.
+	ID           string            `json:"id"`
+	Name         string            `json:"name"`
+	Labels       map[string]string `json:"labels,omitempty"`
+	RegisteredAt time.Time         `json:"registered_at"`
+}
+
+// Match is a single result from Identify, ranked by similarity.
+type Match struct {
+	ID       string
+	Metadata Metadata
+	Distance float32 // 1 - cosine_similarity; lower = closer
+}
+
+// Sentinel errors; callers should compare with errors.Is.
+var (
+	ErrNotFound          = errors.New("facerecognition: id not found")
+	ErrEmptyEmbedding    = errors.New("facerecognition: embedding is empty")
+	ErrDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
+)
--- a/core/services/facerecognition/registry_test.go
+++ b/core/services/facerecognition/registry_test.go
@@ -0,0 +1,253 @@
+package facerecognition_test
+
+import (
+	"context"
+	"errors"
+	"math"
+	"sync"
+	"testing"
+
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/grpc"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+
+	grpclib "google.golang.org/grpc"
+)
+
+const dim = 4 // tiny test-friendly embedding dimension
+
+func TestRegisterIdentifyForget(t *testing.T) {
+	t.Parallel()
+
+	reg, fake := newTestRegistry(t)
+	ctx := t.Context()
+
+	alice := []float32{1, 0, 0, 0}
+	bob := []float32{0, 1, 0, 0}
+
+	aliceMeta, err := reg.Register(ctx, alice, facerecognition.Metadata{Name: "Alice"})
+	if err != nil {
+		t.Fatalf("Register Alice: %v", err)
+	}
+	if aliceMeta.ID == "" {
+		t.Fatalf("Register returned empty ID")
+	}
+	if aliceMeta.RegisteredAt.IsZero() {
+		t.Fatalf("Register did not populate RegisteredAt")
+	}
+
+	bobMeta, err := reg.Register(ctx, bob, facerecognition.Metadata{Name: "Bob"})
+	if err != nil {
+		t.Fatalf("Register Bob: %v", err)
+	}
+	if bobMeta.ID == aliceMeta.ID {
+		t.Fatalf("IDs should be distinct, got %q twice", bobMeta.ID)
+	}
+	aliceID := aliceMeta.ID
+	if got, want := fake.len(), 2; got != want {
+		t.Fatalf("fake store has %d entries, want %d", got, want)
+	}
+
+	// Identify an Alice-like probe — she should win.
+	matches, err := reg.Identify(ctx, []float32{0.99, 0.01, 0, 0}, 2)
+	if err != nil {
+		t.Fatalf("Identify: %v", err)
+	}
+	if len(matches) == 0 {
+		t.Fatalf("no matches returned")
+	}
+	if matches[0].Metadata.Name != "Alice" {
+		t.Fatalf("top match name = %q, want Alice", matches[0].Metadata.Name)
+	}
+	if matches[0].ID != aliceID {
+		t.Fatalf("top match ID = %q, want %q", matches[0].ID, aliceID)
+	}
+	// Sorted ascending by distance.
+	for i := 1; i < len(matches); i++ {
+		if matches[i].Distance < matches[i-1].Distance {
+			t.Fatalf("matches not sorted by distance: %v", matches)
+		}
+	}
+
+	// Forget Alice → she's gone, Bob remains.
+	if err := reg.Forget(ctx, aliceID); err != nil {
+		t.Fatalf("Forget Alice: %v", err)
+	}
+	if got, want := fake.len(), 1; got != want {
+		t.Fatalf("after Forget, store has %d entries, want %d", got, want)
+	}
+
+	// Forget unknown ID → ErrNotFound (checkable via errors.Is).
+	if err := reg.Forget(ctx, "nonexistent"); !errors.Is(err, facerecognition.ErrNotFound) {
+		t.Fatalf("Forget unknown: err = %v, want ErrNotFound", err)
+	}
+}
+
+func TestRegisterRejectsBadEmbedding(t *testing.T) {
+	t.Parallel()
+
+	reg, _ := newTestRegistry(t)
+	ctx := t.Context()
+
+	tests := []struct {
+		name    string
+		embed   []float32
+		wantErr error
+	}{
+		{"empty", []float32{}, facerecognition.ErrEmptyEmbedding},
+		{"wrong_dim", []float32{1, 2}, facerecognition.ErrDimensionMismatch},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			_, err := reg.Register(ctx, tc.embed, facerecognition.Metadata{Name: "x"})
+			if !errors.Is(err, tc.wantErr) {
+				t.Fatalf("err = %v, want wrapping %v", err, tc.wantErr)
+			}
+		})
+	}
+}
+
+func TestConcurrent(t *testing.T) {
+	t.Parallel()
+
+	reg, _ := newTestRegistry(t)
+	ctx := t.Context()
+
+	done := make(chan struct{})
+	for i := range 32 {
+		go func(i int) {
+			embed := []float32{float32(i % 4), float32((i + 1) % 4), 0, 1}
+			meta, err := reg.Register(ctx, embed, facerecognition.Metadata{Name: "n"})
+			if err == nil {
+				_, _ = reg.Identify(ctx, embed, 3)
+				_ = reg.Forget(ctx, meta.ID)
+			}
+			done <- struct{}{}
+		}(i)
+	}
+	for range 32 {
+		<-done
+	}
+}
+
+// ─── fake gRPC backend ───────────────────────────────────────────────
+
+func newTestRegistry(t *testing.T) (facerecognition.Registry, *fakeBackend) {
+	t.Helper()
+	fake := &fakeBackend{}
+	resolver := func(_ context.Context, _ string) (grpc.Backend, error) {
+		return fake, nil
+	}
+	return facerecognition.NewStoreRegistry(resolver, "test-store", dim), fake
+}
+
+// fakeBackend implements just enough of grpc.Backend for the store
+// helpers. All other methods panic so any accidental dependency is
+// visible in tests.
+type fakeBackend struct {
+	grpc.Backend // nil embedded interface: any method not overridden below panics
+
+	mu   sync.Mutex
+	keys [][]float32
+	vals [][]byte
+}
+
+func (f *fakeBackend) len() int {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	return len(f.keys)
+}
+
+func (f *fakeBackend) StoresSet(_ context.Context, in *pb.StoresSetOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	for i, k := range in.Keys {
+		f.keys = append(f.keys, append([]float32(nil), k.Floats...))
+		f.vals = append(f.vals, append([]byte(nil), in.Values[i].Bytes...))
+	}
+	return &pb.Result{Success: true}, nil
+}
+
+func (f *fakeBackend) StoresDelete(_ context.Context, in *pb.StoresDeleteOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	for _, k := range in.Keys {
+		idx := f.findKey(k.Floats)
+		if idx < 0 {
+			continue
+		}
+		f.keys = append(f.keys[:idx], f.keys[idx+1:]...)
+		f.vals = append(f.vals[:idx], f.vals[idx+1:]...)
+	}
+	return &pb.Result{Success: true}, nil
+}
+
+func (f *fakeBackend) StoresFind(_ context.Context, in *pb.StoresFindOptions, _ ...grpclib.CallOption) (*pb.StoresFindResult, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+
+	type scored struct {
+		key []float32
+		val []byte
+		sim float32
+	}
+	results := make([]scored, 0, len(f.keys))
+	for i, k := range f.keys {
+		results = append(results, scored{k, f.vals[i], cosine(k, in.Key.Floats)})
+	}
+	// Sort descending by similarity.
+	for i := 0; i < len(results); i++ {
+		for j := i + 1; j < len(results); j++ {
+			if results[j].sim > results[i].sim {
+				results[i], results[j] = results[j], results[i]
+			}
+		}
+	}
+
+	top := int(in.TopK)
+	if top <= 0 || top > len(results) {
+		top = len(results)
+	}
+	out := &pb.StoresFindResult{}
+	for _, r := range results[:top] {
+		out.Keys = append(out.Keys, &pb.StoresKey{Floats: r.key})
+		out.Values = append(out.Values, &pb.StoresValue{Bytes: r.val})
+		out.Similarities = append(out.Similarities, r.sim)
+	}
+	return out, nil
+}
+
+func (f *fakeBackend) findKey(target []float32) int {
+	for i, k := range f.keys {
+		if equalFloats(k, target) {
+			return i
+		}
+	}
+	return -1
+}
+
+func equalFloats(a, b []float32) bool {
+	if len(a) != len(b) {
+		return false
+	}
+	for i := range a {
+		if a[i] != b[i] {
+			return false
+		}
+	}
+	return true
+}
+
+func cosine(a, b []float32) float32 {
+	var dot, na, nb float64
+	for i := range a {
+		dot += float64(a[i]) * float64(b[i])
+		na += float64(a[i]) * float64(a[i])
+		nb += float64(b[i]) * float64(b[i])
+	}
+	if na == 0 || nb == 0 {
+		return 0
+	}
+	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
+}
--- a/core/services/facerecognition/store_registry.go
+++ b/core/services/facerecognition/store_registry.go
@@ -0,0 +1,142 @@
+package facerecognition
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"sort"
+	"sync"
+	"time"
+
+	"github.com/google/uuid"
+
+	"github.com/mudler/LocalAI/pkg/grpc"
+	"github.com/mudler/LocalAI/pkg/store"
+)
+
+// StoreResolver resolves a named vector store to a gRPC backend. The
+// HTTP handler layer wires this to backend.StoreBackend so the
+// registry stays decoupled from the ModelLoader plumbing.
+type StoreResolver func(ctx context.Context, storeName string) (grpc.Backend, error)
+
+// NewStoreRegistry returns a Registry backed by LocalAI's generic
+// StoresSet / StoresFind / StoresDelete gRPC surface.
+//
+// storeName selects which vector-store model to use (defaults to the
+// local-store Go backend). `dim` is the expected embedding dimension;
+// pass 0 to accept whatever dimension arrives (useful when the face
+// backend exposes multiple recognizers of different sizes, e.g.
+// ArcFace R50 at 512 vs SFace at 128). A non-zero dim is enforced at
+// Register time and fails fast with ErrDimensionMismatch.
+func NewStoreRegistry(resolve StoreResolver, storeName string, dim int) Registry {
+	return &storeRegistry{
+		resolve:   resolve,
+		storeName: storeName,
+		dim:       dim,
+	}
+}
+
+type storeRegistry struct {
+	resolve   StoreResolver
+	storeName string
+	dim       int
+
+	// TODO(postgres): the local-store gRPC surface keys by embedding
+	// vector and exposes no "list all" method, so we cannot delete by
+	// ID without remembering the embedding. This in-memory index is
+	// updated on every Register and lost on restart, which is acceptable while
+	// the only implementation is itself in-memory. A persistent
+	// implementation must rebuild this index at startup.
+	idIndex sync.Map // map[string][]float32
+}
+
+func (r *storeRegistry) Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error) {
+	if len(embedding) == 0 {
+		return Metadata{}, ErrEmptyEmbedding
+	}
+	if r.dim != 0 && len(embedding) != r.dim {
+		return Metadata{}, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(embedding))
+	}
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+
+	meta.ID = uuid.NewString()
+	if meta.RegisteredAt.IsZero() {
+		meta.RegisteredAt = time.Now().UTC()
+	}
+
+	payload, err := json.Marshal(meta)
+	if err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: marshal metadata: %w", err)
+	}
+
+	if err := store.SetSingle(ctx, backend, embedding, payload); err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: set: %w", err)
+	}
+
+	// Retain a copy so Forget can look up the embedding by ID.
+	embCopy := append([]float32(nil), embedding...)
+	r.idIndex.Store(meta.ID, embCopy)
+	return meta, nil
+}
+
+func (r *storeRegistry) Identify(ctx context.Context, probe []float32, topK int) ([]Match, error) {
+	if len(probe) == 0 {
+		return nil, ErrEmptyEmbedding
+	}
+	if r.dim != 0 && len(probe) != r.dim {
+		return nil, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(probe))
+	}
+	if topK <= 0 {
+		topK = 5
+	}
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return nil, fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+
+	_, values, similarities, err := store.Find(ctx, backend, probe, topK)
+	if err != nil {
+		return nil, fmt.Errorf("facerecognition: find: %w", err)
+	}
+
+	matches := make([]Match, 0, len(values))
+	for i, raw := range values {
+		var meta Metadata
+		if err := json.Unmarshal(raw, &meta); err != nil {
+			// Skip unreadable entries instead of failing the whole query —
+			// the store may contain non-face records in shared deployments.
+			continue
+		}
+		matches = append(matches, Match{
+			ID:       meta.ID,
+			Metadata: meta,
+			Distance: 1 - similarities[i],
+		})
+	}
+
+	sort.SliceStable(matches, func(i, j int) bool { return matches[i].Distance < matches[j].Distance })
+	return matches, nil
+}
+
+func (r *storeRegistry) Forget(ctx context.Context, id string) error {
+	raw, ok := r.idIndex.Load(id)
+	if !ok {
+		return ErrNotFound
+	}
+	embedding := raw.([]float32)
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+	if err := store.DeleteSingle(ctx, backend, embedding); err != nil {
+		return fmt.Errorf("facerecognition: delete: %w", err)
+	}
+	r.idIndex.Delete(id)
+	return nil
+}
--- a/core/services/nodes/health_mock_test.go
+++ b/core/services/nodes/health_mock_test.go
@@ -168,6 +168,12 @@ func (c *fakeBackendClient) SoundGeneration(_ context.Context, _ *pb.SoundGenera
 func (c *fakeBackendClient) Detect(_ context.Context, _ *pb.DetectOptions, _ ...ggrpc.CallOption) (*pb.DetectResponse, error) {
 	return nil, nil
 }
+func (c *fakeBackendClient) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	return nil, nil
+}
+func (c *fakeBackendClient) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	return nil, nil
+}
 func (c *fakeBackendClient) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
 	return nil, nil
 }
--- a/core/services/nodes/inflight_test.go
+++ b/core/services/nodes/inflight_test.go
@@ -91,6 +91,14 @@ func (f *fakeGRPCBackend) Detect(_ context.Context, _ *pb.DetectOptions, _ ...gg
 	return &pb.DetectResponse{}, nil
 }

+func (f *fakeGRPCBackend) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	return &pb.FaceVerifyResponse{}, nil
+}
+
+func (f *fakeGRPCBackend) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	return &pb.FaceAnalyzeResponse{}, nil
+}
+
 func (f *fakeGRPCBackend) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
 	return &pb.TranscriptResult{}, nil
 }
--- a/core/trace/backend_trace.go
+++ b/core/trace/backend_trace.go
@@ -24,6 +24,8 @@ const (
 	BackendTraceRerank          BackendTraceType = "rerank"
 	BackendTraceTokenize        BackendTraceType = "tokenize"
 	BackendTraceDetection       BackendTraceType = "detection"
+	BackendTraceFaceVerify      BackendTraceType = "face_verify"
+	BackendTraceFaceAnalyze     BackendTraceType = "face_analyze"
 	BackendTraceModelLoad       BackendTraceType = "model_load"
 )

--- a/docs/content/features/embeddings.md
+++ b/docs/content/features/embeddings.md
@@ -7,6 +7,10 @@ url = "/features/embeddings/"

 LocalAI supports generating embeddings for text or list of tokens.

+For face embeddings, see [Face Recognition](/features/face-recognition/):
+it produces L2-normalized vectors tuned for face similarity (512-d for
+the insightface recognizers, 128-d for OpenCV's SFace).
+
 For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings

 ## Model compatibility
--- a/docs/content/features/face-recognition.md
+++ b/docs/content/features/face-recognition.md
@@ -0,0 +1,228 @@
+++
+disableToc = false
+title = "Face Recognition"
+weight = 14
+url = "/features/face-recognition/"
+++
+
+LocalAI supports face recognition through the `insightface` backend:
+face verification (1:1), face identification (1:N) against a built-in
+vector store, face embedding, face detection, and demographic analysis
+(age / gender).
+
+The backend ships **two interchangeable engines** under one image: the
+insightface model-pack engine and a direct ONNX engine for the OpenCV
+Zoo models. Each gallery entry selects an engine and a model set, so
+users can pick by license and accuracy needs.
+
+## Licensing — read this first
+
+| Gallery entry | Detector + recognizer | Size | License |
+|---|---|---|---|
+| `insightface-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | ~326 MB | **Non-commercial research only** (upstream insightface weights) |
+| `insightface-buffalo-s` | SCRFD-500MF + MBF + GenderAge | ~159 MB | **Non-commercial research only** |
+| `insightface-opencv` | YuNet + SFace | ~40 MB | **Apache 2.0 — commercial-safe** |
+
+The `insightface` Python library itself is MIT, but the pretrained model
+packs (buffalo_l, buffalo_s, antelopev2) are released by the upstream
+maintainers for **non-commercial research use only**. Pick the
+`insightface-opencv` entry for production / commercial deployments.
+
+## Quickstart
+
+Install the commercial-safe gallery entry (the safest choice to copy-paste):
+
+```bash
+local-ai models install insightface-opencv
+```
+
+Verify that two images depict the same person:
+
+```bash
+curl -sX POST http://localhost:8080/v1/face/verify \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "insightface-opencv",
+    "img1": "https://example.com/alice_1.jpg",
+    "img2": "https://example.com/alice_2.jpg"
+  }'
+```
+
+Response:
+
+```json
+{
+  "verified": true,
+  "distance": 0.27,
+  "threshold": 0.35,
+  "confidence": 23.1,
+  "model": "insightface-opencv",
+  "img1_area": { "x": 120.4, "y": 82.1, "w": 198.3, "h": 260.5 },
+  "img2_area": { "x": 110.8, "y": 95.0, "w": 205.6, "h": 268.2 },
+  "processing_time_ms": 412.0
+}
+```
+
+## 1:N identification workflow (register → identify → forget)
+
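+This is the primary "face recognition" flow. Under the hood it uses
+LocalAI's built-in in-memory vector store, so there is no external
+database to stand up. Use the same `model` to register and identify:
+embeddings from different recognizers are not comparable (512-d for
+ArcFace/MBF, 128-d for SFace).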
+
+1. Register known faces:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/register \
+      -H "Content-Type: application/json" \
+      -d '{
+        "model": "insightface-buffalo-l",
+        "name": "Alice",
+        "img": "https://example.com/alice.jpg"
+      }'
+    # → {"id": "8b7...", "name": "Alice", "registered_at": "2026-04-21T..."}
+    ```
+
+2. Identify an unknown probe:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/identify \
+      -H "Content-Type: application/json" \
+      -d '{
+        "model": "insightface-buffalo-l",
+        "img": "https://example.com/unknown.jpg",
+        "top_k": 5
+      }'
+    # → {"matches": [{"id":"8b7...","name":"Alice","distance":0.22,"match":true,...}]}
+    ```
+
+3. Remove a person by ID:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/forget \
+      -H "Content-Type: application/json" \
+      -d '{"id": "8b7..."}'
+    # → 204 No Content
+    ```
+
+{{% alert icon="⚠️" color="warning" %}}
+**Storage caveat.** The default vector store is in-memory. All
+registered faces are lost when LocalAI restarts. Persistent storage
+(pgvector) is a tracked future enhancement — the face-recognition HTTP
+API is designed to swap the backing store without changing the wire
+format.
+{{% /alert %}}
+
+## API reference
+
+### `POST /v1/face/verify` (1:1)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | gallery entry name (e.g. `insightface-buffalo-l`) |
+| `img1`, `img2` | string | URL, base64, or data-URI |
+| `threshold` | float, optional | cosine-distance cutoff; default depends on engine |
+| `anti_spoofing` | bool, optional | reserved — unused in the current release |
+
+Returns `verified`, `distance`, `threshold`, `confidence`, `model`,
+`img1_area`, `img2_area`, and `processing_time_ms`.
+
+### `POST /v1/face/analyze`
+
+Returns demographic attributes for every detected face. Request fields:
+
+| field | type | description |
+|---|---|---|
+| `model` | string | gallery entry |
+| `img` | string | URL / base64 / data-URI |
+| `actions` | string[] | subset of `["age","gender","emotion","race"]`; empty = all supported |
+
+Only `insightface-buffalo-l` / `insightface-buffalo-s` populate age and
+gender (genderage head). `insightface-opencv` returns face regions with
+empty attributes — SFace has no demographic classifier. Emotion and
+race are always empty in the current release.
+
+### `POST /v1/face/register` (1:N enrollment)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face recognition model |
+| `img` | string | face to enroll |
+| `name` | string | human-readable label |
+| `labels` | map[string]string, optional | arbitrary metadata |
+| `store` | string, optional | vector store model; defaults to local-store |
+
+Returns `{id, name, registered_at}`. The `id` is an opaque UUID used by
+`/v1/face/identify` and `/v1/face/forget`.
+
+### `POST /v1/face/identify` (1:N recognition)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face recognition model |
+| `img` | string | probe image |
+| `top_k` | int, optional | max matches to return; default 5 |
+| `threshold` | float, optional | cosine-distance cutoff for `match`; default depends on engine (~0.35 for ArcFace) |
+| `store` | string, optional | vector store model; defaults to local-store |
+
+Returns a list of matches sorted by ascending distance, each with `id`,
+`name`, `labels`, `distance`, `confidence`, and `match`
+(`distance ≤ threshold`).
+
+### `POST /v1/face/forget`
+
+| field | type | description |
+|---|---|---|
+| `id` | string | ID returned by `/v1/face/register` |
+
+Returns `204 No Content` on success, `404 Not Found` if the ID is
+unknown.
+
+### `POST /v1/face/embed`
+
+Returns the L2-normalized face embedding vector for the detected face.
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face model |
+| `img` | string | URL / base64 / data-URI |
+
+Returns `{embedding: float[], dim: int, model: string}`. Dimension is
+512 for the insightface ArcFace/MBF recognizers and 128 for OpenCV's
+SFace.
+
+> **Note:** the OpenAI-compatible `/v1/embeddings` endpoint is
+> intentionally text-only by contract (`input` is a string or a list of
+> strings to embed as text), so passing an image data-URI there does
+> not produce a face embedding. Use `/v1/face/embed` for image inputs.
+
+### Reused endpoint
+
+- `POST /v1/detection` — returns face bounding boxes with
+  `class_name: "face"`; works for both engines.
+
+## Choosing an engine
+
+| Need | Entry |
+|---|---|
+| Commercial product | `insightface-opencv` |
+| Highest accuracy (research / demos) | `insightface-buffalo-l` |
+| Edge / low-memory / research | `insightface-buffalo-s` |
+
+The recommended default `threshold` for `/v1/face/verify` and
+`/v1/face/identify` depends on the recognizer:
+
+| Recognizer | Cosine-distance threshold |
+|---|---|
+| ArcFace R50 (`buffalo_l`) | ~0.35 |
+| MBF (`buffalo_s`) | ~0.40 |
+| SFace (`opencv`) | ~0.50 |
+
+The per-engine default applies only when `threshold` is omitted. If
+you pass it explicitly, re-tune the value whenever you switch engines.
+
+## Related features
+
+- [Object Detection](/features/object-detection/) — generic bounding-box
+  detection; `/v1/detection` works with the insightface backend too.
+- [Embeddings](/features/embeddings/) — raw vector extraction; face
+  embeddings reuse the same backend `Embedding` RPC under the hood but
+  are exposed through `/v1/face/embed`.
+- [Stores](/features/stores/) — the generic vector store powering the
+  1:N recognition pipeline.
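The relationship between the `distance`, `threshold`, and `verified` fields can be checked by reproducing the 1:1 decision from two `/v1/face/embed` results. This is a sketch that rests on two assumptions:

- The embeddings are L2-normalized, as documented, so cosine similarity is their dot product.
- Distance is `1 - cosine_similarity`, as in the registry's `Match.Distance`.

The server-side computation may differ in detail. `verify` and the recognizer keys in `defaultThreshold` are hypothetical names. The values are the approximate thresholds from the "Choosing an engine" table.

```go
package main

import (
	"errors"
	"fmt"
)

// Approximate cosine-distance thresholds per recognizer (hypothetical keys).
var defaultThreshold = map[string]float32{
	"arcface_r50": 0.35, // insightface-buffalo-l
	"mbf":         0.40, // insightface-buffalo-s
	"sface":       0.50, // insightface-opencv
}

// verify returns the cosine distance between two L2-normalized embeddings
// and whether it falls at or below the threshold.
func verify(a, b []float32, threshold float32) (float32, bool, error) {
	if len(a) == 0 || len(a) != len(b) {
		// e.g. a 512-d ArcFace vector compared with a 128-d SFace vector
		return 0, false, errors.New("embeddings must be non-empty and the same length")
	}
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	dist := 1 - dot
	return dist, dist <= threshold, nil
}

func main() {
	a := []float32{0.6, 0.8, 0}
	b := []float32{0.8, 0.6, 0}
	dist, ok, err := verify(a, b, defaultThreshold["sface"])
	if err != nil {
		panic(err)
	}
	fmt.Printf("distance=%.2f verified=%v\n", dist, ok)
}
```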
--- a/docs/content/features/object-detection.md
+++ b/docs/content/features/object-detection.md
@@ -7,6 +7,11 @@ url = "/features/object-detection/"

 LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).

+For detecting **faces** specifically, see the dedicated
+[Face Recognition](/features/face-recognition/) feature — its
+`/v1/detection` support is tuned for face bounding boxes and includes
+a commercial-safe model option (`insightface-opencv`).
+
 ## Overview

 Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
--- a/docs/content/features/stores.md
+++ b/docs/content/features/stores.md
@@ -9,6 +9,14 @@ url = '/stores'
 Stores are an experimental feature to help with querying data using similarity search. It is
 a low level API that consists of only `get`, `set`, `delete` and `find`.

+{{% alert icon="💡" color="info" %}}
+**Face recognition uses this store.** The 1:N face identification flow
+(`/v1/face/register`, `/v1/face/identify`, `/v1/face/forget`) is built
+on top of the generic store — see
+[Face Recognition](/features/face-recognition/) for the face-oriented
+API.
+{{% /alert %}}
+
 For example if you have an embedding of some text and want to find text with similar embeddings.
 You can create embeddings for chunks of all your text then compare them against the embedding of the text you
 are searching on.
--- a/docs/content/whats-new.md
+++ b/docs/content/whats-new.md
@@ -10,6 +10,10 @@ Release notes have been now moved completely over Github releases.

 You can see the release notes [here](https://github.com/mudler/LocalAI/releases).

+## 2026 Highlights
+
+- **April 2026**: [Face recognition backend](/features/face-recognition/) — `insightface`-powered 1:1 verification, 1:N identification, face embedding, face detection, and demographic analysis. Ships both a non-commercial `buffalo_l` model and an Apache 2.0 OpenCV Zoo alternative.
+
 ## 2024 Highlights

 - **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -3706,6 +3706,169 @@
    - filename: arcee-ai_AFM-4.5B-Q4_K_M.gguf
      sha256: f05516b323f581bebae1af2cbf900d83a2569b0a60c54366daf4a9c15ae30d4f
      uri: huggingface://bartowski/arcee-ai_AFM-4.5B-GGUF/arcee-ai_AFM-4.5B-Q4_K_M.gguf
+- &insightface_buffalo_l
+  name: "insightface-buffalo-l"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  # insightface library is MIT; pretrained packs are NON-COMMERCIAL.
+  license: "insightface-non-commercial"
+  description: |
+    Face recognition using insightface's `buffalo_l` pack
+    (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB).
+    Default choice with high accuracy (`insightface-antelopev2` is the largest pack).
+
+    Weights are delivered via LocalAI's gallery mechanism (SHA-256 verified,
+    cached in the models directory like any other managed model).
+    NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-l}
+    options: ["engine:insightface", "model_pack:buffalo_l"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_l.zip
+      sha256: 80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
+- &insightface_buffalo_m
+  name: "insightface-buffalo-m"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace +
+    genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a
+    cheaper detector — good balance on mid-range hardware.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-m}
+    options: ["engine:insightface", "model_pack:buffalo_m"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_m.zip
+      sha256: d98264bd8f2dc75cbc2ddce2a14e636e02bb857b3051c234b737bf3b614edca9
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_m.zip
+- &insightface_buffalo_s
+  name: "insightface-buffalo-s"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder +
+    genderage, ~159MB). Good fit for mid-range CPU deployments.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-s}
+    options: ["engine:insightface", "model_pack:buffalo_s"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_s.zip
+      sha256: d85a87f503f691807cd8bb97128bdf7a0660326cd9cd02657127fa978bab8b5e
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_s.zip
+- &insightface_buffalo_sc
+  name: "insightface-buffalo-sc"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB).
+    NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty
+    attributes for this pack. Ideal for edge/embedded deployments where
+    only verification and embedding are needed.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-sc}
+    options: ["engine:insightface", "model_pack:buffalo_sc"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_sc.zip
+      sha256: 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
+- &insightface_antelopev2
+  name: "insightface-antelopev2"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer +
+    genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on
+    harder benchmarks; pays for it in GPU memory.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-antelopev2}
+    options: ["engine:insightface", "model_pack:antelopev2"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: antelopev2.zip
+      sha256: 8e182f14fc6e80b3bfa375b33eb6cff7ee05d8ef7633e738d1c89021dcf0c5c5
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip
+- &insightface_opencv
+  name: "insightface-opencv"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  license: apache-2.0
+  description: |
+    Face recognition using OpenCV Zoo weights: YuNet detector + SFace
+    128-d recognizer (fp32). APACHE 2.0 — safe for commercial use.
+    Lower accuracy than insightface packs, no demographic head
+    (`/v1/face/analyze` returns detection regions only).
+    Weights are downloaded on install via LocalAI's gallery mechanism
+    (~40MB).
+  tags: [face-recognition, face-verification, face-embedding, commercial-ok, gpu, cpu]
+  urls: [https://github.com/opencv/opencv_zoo]
+  overrides:
+    backend: insightface
+    parameters: {model: face_detection_yunet_2023mar.onnx}
+    options:
+      - "engine:onnx_direct"
+      - "detector_onnx:face_detection_yunet_2023mar.onnx"
+      - "recognizer_onnx:face_recognition_sface_2021dec.onnx"
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: face_detection_yunet_2023mar.onnx
+      sha256: 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
+    - filename: face_recognition_sface_2021dec.onnx
+      sha256: 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
+- &insightface_opencv_int8
+  name: "insightface-opencv-int8"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  license: apache-2.0
+  description: |
+    Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB).
+    Roughly 3x smaller and noticeably faster on CPU than the fp32 variant
+    at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe.
+    Weights are downloaded on install via LocalAI's gallery mechanism.
+  tags: [face-recognition, face-verification, face-embedding, commercial-ok, edge, cpu]
+  urls: [https://github.com/opencv/opencv_zoo]
+  overrides:
+    backend: insightface
+    parameters: {model: face_detection_yunet_2023mar_int8.onnx}
+    options:
+      - "engine:onnx_direct"
+      - "detector_onnx:face_detection_yunet_2023mar_int8.onnx"
+      - "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx"
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: face_detection_yunet_2023mar_int8.onnx
+      sha256: 321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
+    - filename: face_recognition_sface_2021dec_int8.onnx
+      sha256: 2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
 - &rfdetr
  name: "rfdetr-base"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
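The gallery entries above configure the backend through `key:value` strings in `options` (for example `engine:onnx_direct`, `model_pack:buffalo_l`, `detector_onnx:<file>`). The consuming backend (`backend/python/insightface/`) is Python; the Go sketch below only illustrates the convention, assuming each option splits on its first colon and that a later duplicate key wins. `parseOptions` is a hypothetical helper, not a LocalAI function.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions is a hypothetical helper that turns gallery-style
// "key:value" option strings into a map. It splits on the first colon
// only, so values may contain colons; entries without a colon are kept
// as keys with an empty value, and later duplicates overwrite earlier ones.
func parseOptions(opts []string) map[string]string {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		key, value, _ := strings.Cut(o, ":")
		key = strings.TrimSpace(key)
		if key == "" {
			continue // ignore empty entries
		}
		out[key] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	opts := parseOptions([]string{
		"engine:onnx_direct",
		"detector_onnx:face_detection_yunet_2023mar.onnx",
		"recognizer_onnx:face_recognition_sface_2021dec.onnx",
	})
	fmt.Println(opts["engine"], opts["detector_onnx"], opts["recognizer_onnx"])
}
```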
--- a/pkg/grpc/backend.go
+++ b/pkg/grpc/backend.go
@@ -54,6 +54,8 @@ type Backend interface {
 	TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
 	SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
 	Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
+	FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
+	FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error)
 	AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
 	AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
 	TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)
--- a/pkg/grpc/base/base.go
+++ b/pkg/grpc/base/base.go
@@ -81,6 +81,14 @@ func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
 	return pb.DetectResponse{}, fmt.Errorf("unimplemented")
 }

+func (llm *Base) FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
+	return pb.FaceVerifyResponse{}, fmt.Errorf("unimplemented")
+}
+
+func (llm *Base) FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
+	return pb.FaceAnalyzeResponse{}, fmt.Errorf("unimplemented")
+}
+
 func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
 	return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
 }
--- a/pkg/grpc/client.go
+++ b/pkg/grpc/client.go
@@ -580,6 +580,42 @@ func (c *Client) Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.
 	return client.Detect(ctx, in, opts...)
 }

+func (c *Client) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	if !c.parallel {
+		c.opMutex.Lock()
+		defer c.opMutex.Unlock()
+	}
+	c.setBusy(true)
+	defer c.setBusy(false)
+	c.wdMark()
+	defer c.wdUnMark()
+	conn, err := c.dial()
+	if err != nil {
+		return nil, err
+	}
+	defer conn.Close()
+	client := pb.NewBackendClient(conn)
+	return client.FaceVerify(ctx, in, opts...)
+}
+
+func (c *Client) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	if !c.parallel {
+		c.opMutex.Lock()
+		defer c.opMutex.Unlock()
+	}
+	c.setBusy(true)
+	defer c.setBusy(false)
+	c.wdMark()
+	defer c.wdUnMark()
+	conn, err := c.dial()
+	if err != nil {
+		return nil, err
+	}
+	defer conn.Close()
+	client := pb.NewBackendClient(conn)
+	return client.FaceAnalyze(ctx, in, opts...)
+}
+
 func (c *Client) AudioEncode(ctx context.Context, in *pb.AudioEncodeRequest, opts ...grpc.CallOption) (*pb.AudioEncodeResult, error) {
 	if !c.parallel {
 		c.opMutex.Lock()
--- a/pkg/grpc/embed.go
+++ b/pkg/grpc/embed.go
@@ -71,6 +71,14 @@ func (e *embedBackend) Detect(ctx context.Context, in *pb.DetectOptions, opts ..
 	return e.s.Detect(ctx, in)
 }

+func (e *embedBackend) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	return e.s.FaceVerify(ctx, in)
+}
+
+func (e *embedBackend) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	return e.s.FaceAnalyze(ctx, in)
+}
+
 func (e *embedBackend) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error) {
 	return e.s.AudioTranscription(ctx, in)
 }
--- a/pkg/grpc/interface.go
+++ b/pkg/grpc/interface.go
@@ -17,6 +17,8 @@ type AIModel interface {
 	GenerateImage(*pb.GenerateImageRequest) error
 	GenerateVideo(*pb.GenerateVideoRequest) error
 	Detect(*pb.DetectOptions) (pb.DetectResponse, error)
+	FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error)
+	FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error)
 	AudioTranscription(*pb.TranscriptRequest) (pb.TranscriptResult, error)
 	AudioTranscriptionStream(*pb.TranscriptRequest, chan *pb.TranscriptStreamResponse) error
 	TTS(*pb.TTSRequest) error
--- a/pkg/grpc/server.go
+++ b/pkg/grpc/server.go
@@ -151,6 +151,30 @@ func (s *server) Detect(ctx context.Context, in *pb.DetectOptions) (*pb.DetectRe
 	return &res, nil
 }

+func (s *server) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest) (*pb.FaceVerifyResponse, error) {
+	if s.llm.Locking() {
+		s.llm.Lock()
+		defer s.llm.Unlock()
+	}
+	res, err := s.llm.FaceVerify(in)
+	if err != nil {
+		return nil, err
+	}
+	return &res, nil
+}
+
+func (s *server) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest) (*pb.FaceAnalyzeResponse, error) {
+	if s.llm.Locking() {
+		s.llm.Lock()
+		defer s.llm.Unlock()
+	}
+	res, err := s.llm.FaceAnalyze(in)
+	if err != nil {
+		return nil, err
+	}
+	return &res, nil
+}
+
 func (s *server) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest) (*pb.TranscriptResult, error) {
 	if s.llm.Locking() {
 		s.llm.Lock()
--- a/tests/e2e-backends/backend_test.go
+++ b/tests/e2e-backends/backend_test.go
@@ -2,6 +2,7 @@ package e2ebackends_test

 import (
 	"context"
+	"encoding/base64"
 	"fmt"
 	"io"
 	"net"
@@ -83,13 +84,18 @@ const (
 	capTools         = "tools"
 	capTranscription = "transcription"
 	capImage         = "image"
+	capFaceDetect    = "face_detect"
+	capFaceEmbed     = "face_embed"
+	capFaceVerify    = "face_verify"
+	capFaceAnalyze   = "face_analyze"

-	defaultPrompt      = "The capital of France is"
-	streamPrompt       = "Once upon a time"
-	defaultToolPrompt  = "What's the weather like in Paris, France?"
-	defaultToolName    = "get_weather"
-	defaultImagePrompt = "a photograph of an astronaut riding a horse"
-	defaultImageSteps  = 4
+	defaultPrompt             = "The capital of France is"
+	streamPrompt              = "Once upon a time"
+	defaultToolPrompt         = "What's the weather like in Paris, France?"
+	defaultToolName           = "get_weather"
+	defaultImagePrompt        = "a photograph of an astronaut riding a horse"
+	defaultImageSteps         = 4
+	defaultVerifyDistanceCeil = float32(0.6) // upper bound for a same-person pair; SFace pairs land closer to 0.5, ArcFace closer to 0.35.
 )

 func defaultCaps() map[string]bool {
@@ -127,12 +133,21 @@ var _ = Describe("Backend container", Ordered, func() {
 		modelName  string // set when a HuggingFace model id is used
 		mmprojFile string // optional multimodal projector
 		audioFile  string // optional audio fixture for transcription specs
-		addr       string
-		serverCmd  *exec.Cmd
-		conn       *grpc.ClientConn
-		client     pb.BackendClient
-		prompt     string
-		options    []string
+		// Face fixtures: two photos of the same person + one different person.
+		faceFile1 string
+		faceFile2 string
+		faceFile3 string
+		// verifyCeiling is the upper-bound cosine distance for a
+		// same-person pair; each model configuration can override it via
+		// BACKEND_TEST_VERIFY_DISTANCE_CEILING because SFace's distance
+		// distribution is wider than ArcFace's.
+		verifyCeiling float32
+		addr          string
+		serverCmd     *exec.Cmd
+		conn          *grpc.ClientConn
+		client        pb.BackendClient
+		prompt        string
+		options       []string
 	)

 	BeforeAll(func() {
@@ -197,6 +212,12 @@ var _ = Describe("Backend container", Ordered, func() {
 			}
 		}

+		// Face fixtures for the face-recognition specs.
+		faceFile1 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_1", "face_a_1.jpg")
+		faceFile2 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_2", "face_a_2.jpg")
+		faceFile3 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_3", "face_b.jpg")
+		verifyCeiling = envFloat32("BACKEND_TEST_VERIFY_DISTANCE_CEILING", defaultVerifyDistanceCeil)
+
 		// Pick a free port and launch the backend.
 		port, err := freeport.GetFreePort()
 		Expect(err).NotTo(HaveOccurred())
@@ -533,6 +554,120 @@ var _ = Describe("Backend container", Ordered, func() {
 		GinkgoWriter.Printf("AudioTranscriptionStream: deltas=%d assembled=%q final=%q\n",
 			len(deltas), assembled.String(), finalText)
 	})
+
+	// ─── face recognition specs ─────────────────────────────────────────
+
+	It("detects faces via Detect", func() {
+		if !caps[capFaceDetect] {
+			Skip("face_detect capability not enabled")
+		}
+		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
+
+		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+		defer cancel()
+		res, err := client.Detect(ctx, &pb.DetectOptions{Src: base64File(faceFile1)})
+		Expect(err).NotTo(HaveOccurred())
+		Expect(res.GetDetections()).NotTo(BeEmpty(), "Detect returned no faces")
+		for _, d := range res.GetDetections() {
+			Expect(d.GetClassName()).To(Equal("face"))
+			Expect(d.GetWidth()).To(BeNumerically(">", 0))
+			Expect(d.GetHeight()).To(BeNumerically(">", 0))
+		}
+		GinkgoWriter.Printf("face_detect: %d faces\n", len(res.GetDetections()))
+	})
+
+	It("produces face embeddings via Embedding", func() {
+		if !caps[capFaceEmbed] {
+			Skip("face_embed capability not enabled")
+		}
+		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
+
+		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+		defer cancel()
+		res, err := client.Embedding(ctx, &pb.PredictOptions{Images: []string{base64File(faceFile1)}})
+		Expect(err).NotTo(HaveOccurred())
+		vec := res.GetEmbeddings()
+		Expect(vec).NotTo(BeEmpty(), "Embedding returned empty vector")
+		// Face embeddings are L2-normalized — expect unit norm.
+		var sumSq float64
+		for _, v := range vec {
+			sumSq += float64(v) * float64(v)
+		}
+		Expect(sumSq).To(BeNumerically("~", 1.0, 0.05),
+			"face embedding should be L2-normed (sum(x^2)=%.3f, dim=%d)", sumSq, len(vec))
+		GinkgoWriter.Printf("face_embed: dim=%d\n", len(vec))
+	})
+
+	It("verifies faces via FaceVerify", func() {
+		if !caps[capFaceVerify] {
+			Skip("face_verify capability not enabled")
+		}
+		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
+
+		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+		defer cancel()
+
+		// Same image twice — expected verified=true with very small distance.
+		b1 := base64File(faceFile1)
+		same, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b1, Threshold: verifyCeiling})
+		Expect(err).NotTo(HaveOccurred())
+		Expect(same.GetVerified()).To(BeTrue(), "same image should verify: dist=%.3f", same.GetDistance())
+		Expect(same.GetDistance()).To(BeNumerically("<", 0.1))
+		GinkgoWriter.Printf("face_verify(same): dist=%.3f confidence=%.1f\n", same.GetDistance(), same.GetConfidence())
+
+		// Different people: assert relative ordering when the detector
+		// actually finds a face in both images. Some fixtures (masked
+		// faces, profile shots, etc.) are legitimately borderline for
+		// SCRFD's default threshold, so any FaceVerify error here
+		// (typically NotFound for the second image) is logged and the
+		// cross-person check is skipped rather than failing the suite.
+		// The same-image assertion above is the definitive proof that
+		// the RPC works end-to-end.
+		if faceFile3 != "" {
+			b3 := base64File(faceFile3)
+			diff, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b3, Threshold: verifyCeiling})
+			if err != nil {
+				GinkgoWriter.Printf("face_verify(diff): skipped — %v\n", err)
+			} else {
+				Expect(diff.GetDistance()).To(BeNumerically(">", same.GetDistance()),
+					"cross-person distance %.3f should exceed same-image distance %.3f", diff.GetDistance(), same.GetDistance())
+				GinkgoWriter.Printf("face_verify(diff): dist=%.3f verified=%v\n", diff.GetDistance(), diff.GetVerified())
+			}
+		}
+
+		// If a second photo of the same person was provided, its distance
+		// should fall under the ceiling: d(a1,a2) < ceiling. Best-effort as
+		// above: skip on error (e.g. no face found in the second image).
+		if faceFile2 != "" {
+			b2 := base64File(faceFile2)
+			sp, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b2, Threshold: verifyCeiling})
+			if err != nil {
+				GinkgoWriter.Printf("face_verify(same-person): skipped — %v\n", err)
+			} else {
+				Expect(sp.GetDistance()).To(BeNumerically("<", verifyCeiling),
+					"same-person (different photos) distance %.3f exceeds ceiling %.3f", sp.GetDistance(), verifyCeiling)
+				GinkgoWriter.Printf("face_verify(same-person): dist=%.3f verified=%v\n", sp.GetDistance(), sp.GetVerified())
+			}
+		}
+	})
+
+	It("analyzes faces via FaceAnalyze", func() {
+		if !caps[capFaceAnalyze] {
+			Skip("face_analyze capability not enabled")
+		}
+		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
+
+		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+		defer cancel()
+		res, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{Img: base64File(faceFile1)})
+		Expect(err).NotTo(HaveOccurred())
+		Expect(res.GetFaces()).NotTo(BeEmpty(), "FaceAnalyze returned no faces")
+		for _, f := range res.GetFaces() {
+			Expect(f.GetFaceConfidence()).To(BeNumerically(">", 0))
+			Expect(f.GetAge()).To(BeNumerically(">", 0), "age should be populated by analyze-capable engines")
+			Expect(f.GetDominantGender()).To(BeElementOf("Man", "Woman"))
+		}
+		GinkgoWriter.Printf("face_analyze: %d faces\n", len(res.GetFaces()))
+	})
 })

 // extractImage runs `docker create` + `docker export` to materialise the image
@@ -589,6 +724,43 @@ func envInt32(name string, def int32) int32 {
 	return v
 }

+func envFloat32(name string, def float32) float32 {
+	raw := os.Getenv(name)
+	if raw == "" {
+		return def
+	}
+	var v float32
+	if _, err := fmt.Sscanf(raw, "%f", &v); err != nil {
+		return def
+	}
+	return v
+}
+
+// resolveFaceFixture returns the local path of a face-fixture image,
+// preferring <prefix>_FILE when set and otherwise downloading
+// <prefix>_URL into workDir/defaultName. It returns an empty string when
+// neither is configured; callers decide whether to fail or skip.
+func resolveFaceFixture(workDir, prefix, defaultName string) string {
+	if path := os.Getenv(prefix + "_FILE"); path != "" {
+		return path
+	}
+	url := os.Getenv(prefix + "_URL")
+	if url == "" {
+		return ""
+	}
+	dest := filepath.Join(workDir, defaultName)
+	downloadFile(url, dest)
+	return dest
+}
+
+// base64File reads a file and returns its base64 encoding.
+func base64File(path string) string {
+	GinkgoHelper()
+	data, err := os.ReadFile(path)
+	Expect(err).NotTo(HaveOccurred(), "reading %s", path)
+	return base64.StdEncoding.EncodeToString(data)
+}
+
 func keys(m map[string]bool) []string {
 	out := make([]string, 0, len(m))
 	for k, v := range m {
--- a/tests/fixtures/faces/README.md
+++ b/tests/fixtures/faces/README.md
@@ -0,0 +1,27 @@
+# Face-recognition e2e fixtures
+
+The face-recognition e2e tests (`tests/e2e-backends/backend_test.go`)
+don't require committed fixture JPEGs. They follow the same pattern as
+`BACKEND_TEST_AUDIO_URL` (whisper): the Makefile target passes
+HTTP URLs via `BACKEND_TEST_FACE_IMAGE_*_URL`, and the suite downloads
+them at `BeforeAll` time.
+
+The default URLs in the `Makefile` targets point at NASA's
+public-domain astronaut portraits on nasa.gov. NASA images are
+generally in the public domain as works of the U.S. federal government
+(see <https://www.nasa.gov/nasa-brand-center/images-and-media/>).
+
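+If you want to run the suite fully offline, drop three JPEGs into this
+directory with the names the Makefile expects and set the `_FILE`
+variants of the env vars (`BACKEND_TEST_FACE_IMAGE_1_FILE`,
+`BACKEND_TEST_FACE_IMAGE_2_FILE`, `BACKEND_TEST_FACE_IMAGE_3_FILE`)
+instead of the `_URL` ones: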
+
+```
+tests/fixtures/faces/person_a_1.jpg   # person A, photo 1
+tests/fixtures/faces/person_a_2.jpg   # person A, photo 2 (different angle/lighting)
+tests/fixtures/faces/person_b.jpg     # a different person
+```
+
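+The suite checks that a cross-person pair is farther apart than an image
+compared with itself (`d(a1,b) > d(a1,a1)`) and that two photos of person A
+fall under an absolute distance ceiling (`d(a1,a2) < ceiling`). The ceiling
+is set per model via `BACKEND_TEST_VERIFY_DISTANCE_CEILING` so SFace (which
+has a wider distance distribution than ArcFace) can share the same suite.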