feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis

Adds face recognition as a new first-class capability in LocalAI via the `insightface` Python backend, with a pluggable two-engine design so non-commercial (insightface model packs) and commercial-safe (OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.

New gRPC RPCs (backend/backend.proto):

  * FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
  * FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse

Existing Embedding and Detect RPCs are reused (face image in PredictOptions.Images / DetectOptions.src) for face embedding and face detection respectively.

New HTTP endpoints under /v1/face/:

  * verify — 1:1 image pair same-person decision
  * analyze — per-face age + gender (emotion/race reserved)
  * register — 1:N enrollment; stores embedding in vector store
  * identify — 1:N recognition; detect → embed → StoresFind
  * forget — remove a registered face by opaque ID

Service layer (core/services/facerecognition/) introduces a `Registry` interface with one in-memory `storeRegistry` impl backed by LocalAI's existing local-store gRPC vector backend. HTTP handlers depend on the interface, not on StoresSet/StoresFind directly, so a persistent PostgreSQL/pgvector implementation can be slotted in via a single constructor change in core/application (TODO marker in the package doc).

New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired into FLAG_DETECTION so /v1/detection works for face bounding boxes.

Gallery (backend/index.yaml) ships three entries:

  * insightface-buffalo-l — SCRFD-10GF + ArcFace R50 + genderage (~326MB pre-baked; non-commercial research use only)
  * insightface-opencv — YuNet + SFace (~40MB pre-baked; Apache 2.0)
  * insightface-buffalo-s — SCRFD-500MF + MBF (runtime download; non-commercial)

Python backend (backend/python/insightface/):

  * engines.py — FaceEngine protocol with InsightFaceEngine and OnnxDirectEngine; resolves model paths relative to the backend directory so the same gallery config works in docker-scratch and in the e2e-backends rootfs-extraction harness.
  * backend.py — gRPC servicer implementing Health, LoadModel, Status, Embedding, Detect, FaceVerify, FaceAnalyze.
  * install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the backend directory so first-run is offline-clean (the final scratch image only preserves files under /<backend>/).
  * test.py — parametrized unit tests over both engines.

Tests:

  * Registry unit tests (go test -race ./core/services/facerecognition/...) — in-memory fake grpc.Backend, table-driven, covers register/identify/forget/error paths + concurrent access.
  * tests/e2e-backends/backend_test.go extended with face caps (face_detect, face_embed, face_verify, face_analyze); relative ordering + configurable verifyCeiling per engine.
  * Makefile targets: test-extra-backend-insightface-buffalo-l, -opencv, and the -all aggregate.
  * CI: .github/workflows/test-extra.yml gains tests-insightface-grpc, auto-triggered by changes under backend/python/insightface/.

Docs:

  * docs/content/features/face-recognition.md — feature page with license table, quickstart (defaults to the commercial-safe model), models matrix, API reference, 1:N workflow, storage caveats.
  * Cross-refs in object-detection.md, stores.md, embeddings.md, and whats-new.md.
  * Contributor README at backend/python/insightface/README.md.

Verified end-to-end:

  * buffalo_l: 6/6 specs (health, load, face_detect, face_embed, face_verify, face_analyze).
  * opencv: 5/5 specs (same minus face_analyze — SFace has no demographic head; correctly skipped via BACKEND_TEST_CAPS).

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): move engine selection to model gallery, collapse backend entries

The previous commit put engine/model_pack options on backend gallery entries (`backend/index.yaml`). That was wrong — `GalleryBackend` (core/gallery/backend_types.go:32) has no `options` field, so the YAML decoder silently dropped those keys and all three "different insightface-*" backend entries resolved to the same container image with no distinguishing configuration.

Correct split:

  * `backend/index.yaml` now has ONE `insightface` backend entry shipping the CPU + CUDA 12 container images. The Python backend bundles both the non-commercial insightface model packs (buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo weights (YuNet + SFace); the active engine is selected at LoadModel time via `options: ["engine:..."]`.
  * `gallery/index.yaml` gains three model entries — `insightface-buffalo-l`, `insightface-opencv`, `insightface-buffalo-s` — each setting the appropriate `overrides.backend` + `overrides.options` so installing one actually gives the user the intended engine.

This matches how `rfdetr-base` lives in the model gallery against the `rfdetr` backend.

The earlier e2e tests passed despite this bug because the Makefile targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC, bypassing any gallery resolution entirely. No code changes needed.

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): cover all supported models in the gallery + drop weight baking

Follows up on the model-gallery split: adds entries for every model configuration either engine actually supports, and switches weight delivery from image-baked to LocalAI's standard gallery mechanism.

Gallery now has seven `insightface-*` model entries (gallery/index.yaml):

  insightface (family) — non-commercial research use
    • buffalo-l (326MB) — SCRFD-10GF + ResNet50 + genderage, default
    • buffalo-m (313MB) — SCRFD-2.5GF + ResNet50 + genderage
    • buffalo-s (159MB) — SCRFD-500MF + MBF + genderage
    • buffalo-sc (16MB) — SCRFD-500MF + MBF, recognition only (no landmarks, no demographics — analyze returns empty attributes)
    • antelopev2 (407MB) — SCRFD-10GF + ResNet100@Glint360K + genderage

  OpenCV Zoo family — Apache 2.0 commercial-safe
    • opencv — YuNet + SFace fp32 (~40MB)
    • opencv-int8 — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)

Model weights are no longer baked into the backend image. The image now ships only the Python runtime + libraries (~275MB content size, ~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through LocalAI's gallery mechanism:

  * OpenCV variants list `files:` with ONNX URIs + SHA-256, so `local-ai models install insightface-opencv` pulls them into the models directory exactly like any other gallery-managed model.
  * insightface packs (upstream distributes .zip archives only, not individual ONNX files) auto-download on first LoadModel via FaceAnalysis' built-in machinery, rooted at the LocalAI models directory so they live alongside everything else — same pattern `rfdetr` uses with `inference.get_model()`.

Backend changes (backend/python/insightface/):

  * backend.py — LoadModel propagates `ModelOptions.ModelPath` (the LocalAI models directory) to engines via a `_model_dir` hint. This replaces the earlier ModelFile-dirname approach; ModelPath is the canonical "models directory" variable set by the Go loader (pkg/model/initializers.go:144) and is always populated.
  * engines.py::_resolve_model_path — picks up `model_dir` and searches it (plus basename-in-model-dir) before falling back to the dev script-dir. This is how OnnxDirectEngine finds gallery-downloaded YuNet/SFace files by filename only.
  * engines.py::_flatten_insightface_pack — new helper that works around an upstream packaging inconsistency: buffalo_l/s/sc zips expand flat, but buffalo_m and antelopev2 zips wrap their ONNX files in a redundant `<name>/` directory. insightface's own loader looks one level too shallow and fails. We call `ensure_available()` explicitly, flatten if nested, then hand to FaceAnalysis.
  * engines.py::InsightFaceEngine.prepare — root-resolution order now includes the `_model_dir` hint so packs download into the LocalAI models directory by default.
  * install.sh — no longer pre-downloads any weights. Everything is gallery-managed now.
  * smoke.py (new) — parametrized smoke test that iterates over every gallery configuration, simulating the LocalAI install flow (creates a models dir, fetches OpenCV files with checksum verification, lets insightface auto-download its packs), then runs detect + embed + verify (+ analyze where supported) through the in-process BackendServicer.
  * test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/` paths; downloads ONNX files to a temp dir at setUpClass time and passes ModelPath accordingly.

Registry change (core/services/facerecognition/store_registry.go):

  * `dim=0` in NewStoreRegistry now means "accept whatever dimension arrives" — needed because the backend supports 512-d ArcFace/MBF and 128-d SFace via the same Registry. A non-zero dim still fails fast with ErrDimensionMismatch.
  * core/application plumbs `faceEmbeddingDim = 0`, explaining the rationale in the comment.

Backend gallery description updated to reflect that the image carries no weights — it's just Python + engines.

Smoke-tested all 7 configurations against the rebuilt image (with the flatten fix applied), exit 0:

  PASS: insightface-buffalo-l faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-sc faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-s faces=6 dim=512 same-dist=0.000
  PASS: insightface-buffalo-m faces=6 dim=512 same-dist=0.000
  PASS: insightface-antelopev2 faces=6 dim=512 same-dist=0.000
  PASS: insightface-opencv faces=6 dim=128 same-dist=0.000
  PASS: insightface-opencv-int8 faces=6 dim=128 same-dist=0.000
  7/7 passed

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim

CI regression from the previous commit: I moved OpenCV Zoo weight delivery to LocalAI's gallery `files:` mechanism, but the test-extra-backend-insightface-opencv target was still passing relative paths `detector_onnx:models/opencv/yunet.onnx` in BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over gRPC without going through the gallery, so those relative paths resolved to nothing and OpenCV's ONNXImporter failed:

  LoadModel failed: Failed to load face engine: OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx

Fix: add an `insightface-opencv-models` prerequisite target that fetches the two ONNX files (YuNet + SFace) to a deterministic host cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256, and skips the download on re-runs. The opencv test target depends on it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend finds the files via its normal absolute-path resolution branch.

Also refresh the buffalo_l comment: it no longer says "pre-baked" (nothing is — the pack auto-downloads from upstream's GitHub release on first LoadModel, same as in CI).

Locally verified: `make test-extra-backend-insightface-opencv` passes 5/5 specs (health, load, face_detect, face_embed, face_verify).

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs

The docs promised that /v1/embeddings returns face vectors when you send an image data-URI. That was never true: /v1/embeddings is OpenAI-compatible and text-only by contract — its handler goes through `core/backend/embeddings.go::ModelEmbedding`, which sets `predictOptions.Embeddings = s` (a string of TEXT to embed) and never populates `predictOptions.Images[]`. The Python backend's Embedding gRPC method does handle Images[] (that's how /v1/face/register reaches it internally via `backend.FaceEmbed`), but the HTTP embeddings endpoint wasn't wired to populate it.

Rather than overload /v1/embeddings with image-vs-text detection — messy, and the endpoint is OpenAI-compatible by design — add a dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed` (already used internally by /v1/face/register and /v1/face/identify). Matches LocalAI's convention of a dedicated path per non-standard flow (/v1/rerank, /v1/detection, /v1/face/verify etc.).

Response:

  {
    "embedding": [<dim> floats, L2-normed],
    "dim": int,       // 512 for ArcFace R50 / MBF, 128 for SFace
    "model": "<name>"
  }

Live-tested on the opencv engine: returns a 128-d L2-normalized vector (sum(x^2) = 1.0000).

Sentinel in docs updated to note /v1/embeddings is text-only and point image users at /v1/face/embed instead.

Assisted-by: Claude:claude-opus-4-7

* fix(http): map malformed image input + gRPC status codes to proper 4xx

Image-input failures on LocalAI's single-image endpoints (/v1/detection, /v1/face/{verify,analyze,embed,register,identify}) have historically returned 500 — even when the client was the one who sent garbage. Classic example: you POST an "image" that isn't a URL, isn't a data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim that's its fault.

Two helpers land in core/http/endpoints/localai/images.go and every single-image handler is switched over:

  * decodeImageInput(s)
      Wraps utils.GetContentURIAsBase64 and turns any failure (invalid URL, not a data-URI, download error, etc.) into echo.NewHTTPError(400, "invalid image input: ...").
  * mapBackendError(err)
      Inspects the gRPC status on a backend call error and maps:
        INVALID_ARGUMENT    → 400 Bad Request
        NOT_FOUND           → 404 Not Found
        FAILED_PRECONDITION → 412 Precondition Failed
        Unimplemented       → 501 Not Implemented
      All other codes fall through unchanged (still 500).

Before, my 1×1 PNG error-path test returned:

  HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"

After:

  HTTP 400 "failed to decode one or both images"

Scope-limited to the LocalAI single-image endpoints. The multi-modal paths (middleware/request.go, openresponses/responses.go, openai/realtime.go) intentionally log-and-skip individual media parts when decoding fails — different design intent (graceful degradation of a multi-part message), not a 400-worthy failure. Left untouched.

Live-verified: every error case in /tmp/face_errors.py now returns 4xx with a meaningful message; the "image with no face (1x1 PNG)" case specifically went from 500 → 400.

Assisted-by: Claude:claude-opus-4-7

* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis

Follows up on the discovery that LocalAI's gallery `files:` mechanism handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy piper voices use exactly this pattern. Insightface packs are zip archives, so we can now deliver them the same way every other gallery-managed model gets delivered: declaratively, checksum-verified, through LocalAI's standard download+extract pipeline.

Two changes:

  1. Gallery (gallery/index.yaml) — every insightface-* entry gains a `files:` list with the pack zip's URI + SHA-256.
     `local-ai models install insightface-buffalo-l` now fetches the zip, verifies the hash, and extracts it into the models directory. No more reliance on insightface's library-internal `ensure_available()` auto-download or its hardcoded `BASE_REPO_URL`.

  2. InsightFaceEngine (backend/python/insightface/engines.py) — drops the FaceAnalysis wrapper and drives insightface's `model_zoo` directly. The ~50 lines FaceAnalysis provides — glob ONNX files, route each through `model_zoo.get_model()`, build a `{taskname: model}` dict, loop per-face at inference — are reimplemented in `InsightFaceEngine`. The actual inference classes (RetinaFace, ArcFaceONNX, Attribute, Landmark) are still insightface's — we only replicate the glue, so drift risk against upstream is minimal.

Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx` layout that doesn't match what LocalAI's zip extraction produces. LocalAI unpacks archives flat into `<models_dir>`. Upstream packs are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant `<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new `_locate_insightface_pack` helper searches both locations plus legacy paths and returns whichever has ONNX files. Replaces the earlier `_flatten_insightface_pack` helper (which tried to fight FaceAnalysis's layout expectations; now we just find the files wherever they are).

Net effect for users: install once via LocalAI's managed flow, weights live alongside every other model, progress shows in the jobs endpoint, no first-load network call. Same API surface, cleaner plumbing.

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched

The e2e suite drives LoadModel over gRPC without going through LocalAI's gallery flow, so the engine's `_model_dir` option (normally populated from ModelPath) is empty.
Previously the insightface target relied on FaceAnalysis auto-download to paper over this, but we dropped FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l target started failing at LoadModel with "no insightface pack found".

Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip (same SHA as the gallery entry), extract it on the host, and pass `root:<dir>` so the engine locates the pack without needing ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI fast; it covers the same insightface engine code path as buffalo_l. Face analyze cap dropped since buffalo_sc has no age/gender head.

Assisted-by: Claude:claude-opus-4-7[1m]

* feat(face-recognition): surface face-recognition in advertised feature maps

The six /v1/face/* endpoints were missing from every place LocalAI advertises its feature surface to clients:

  * api_instructions — the machine-readable capability index at GET /api/instructions. Added `face-recognition` as a dedicated instruction area with an intro that calls out the in-memory registry caveat and the /v1/face/embed vs /v1/embeddings split.
  * auth/permissions — added FeatureFaceRecognition constant, routed all six face endpoints through it so admins can gate them per-user like any other API feature. Default ON (matches the other API features).
  * React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a follow-up (noted in the plan).

Instruction count bumped 9 → 10; test updated.

Assisted-by: Claude:claude-opus-4-7[1m]

* docs(agents): capture advertising-surface steps in the endpoint guide

Before this change, adding a new /v1/* endpoint reliably missed one or more of: the swagger @Tags annotation, the /api/instructions registry, the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The endpoint would work but be invisible to API consumers, admins, and the UI — and nothing in the existing docs said to look in those places.

Extend .agents/api-endpoints-and-auth.md with a new "Advertising surfaces" section covering all four surfaces (swagger tags, /api/instructions, capabilities.js, docs/), and expand the closing checklist so it's impossible to ship a feature without visiting each one. Hoist a one-liner reminder into AGENTS.md's Quick Reference so agents skim it before diving in.

Assisted-by: Claude:claude-opus-4-7[1m]
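
Illustration (not part of the commit): the gRPC-to-HTTP mapping described for `mapBackendError` in the fix(http) entry can be sketched in a few lines. The real helper is Go code in core/http/endpoints/localai/images.go and is not shown in this diff; `GRPC_TO_HTTP` and `map_backend_status` below are hypothetical names, and only the four mappings and the 500 fallback come from the message above.

```python
from http import HTTPStatus

# Mapping table copied from the commit message; the names used here are
# hypothetical and do not exist in the LocalAI code base.
GRPC_TO_HTTP = {
    "INVALID_ARGUMENT": HTTPStatus.BAD_REQUEST,
    "NOT_FOUND": HTTPStatus.NOT_FOUND,
    "FAILED_PRECONDITION": HTTPStatus.PRECONDITION_FAILED,
    "UNIMPLEMENTED": HTTPStatus.NOT_IMPLEMENTED,
}


def map_backend_status(grpc_code: str) -> int:
    """Return the HTTP status for a gRPC status-code name.

    Codes outside the table keep the previous behavior (500).
    """
    status = GRPC_TO_HTTP.get(grpc_code.upper(), HTTPStatus.INTERNAL_SERVER_ERROR)
    return int(status)
```

Under this sketch, the backend's "failed to decode one or both images" error (INVALID_ARGUMENT) would surface as a 400, while an unlisted code such as INTERNAL would still surface as a 500.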
2026-04-29 11:37:40 -04:00 · 2026-04-22 21:55:41 +02:00
parent d16f19f1eb
commit 20baec77ab
59 changed files with 3625 additions and 24 deletions
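
Illustration (not part of the commit): the FaceVerify decision in backend.py reduces to a cosine distance between two L2-normalized embeddings, a threshold comparison, and a clamped confidence score. The sketch below restates that arithmetic in plain Python so it can be tried without a model; `l2_normalize` and `verify_decision` are hypothetical helper names, and the 0.35 default is the `DEFAULT_VERIFY_THRESHOLD` from the diff.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (both engines are described as doing this)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def verify_decision(e1, e2, threshold=0.35):
    """Mirror the FaceVerify arithmetic for two unit-length embeddings.

    Returns (verified, distance, confidence).
    """
    # For unit vectors the dot product is the cosine similarity.
    sim = sum(a * b for a, b in zip(e1, e2))
    distance = 1.0 - sim
    verified = distance < threshold
    if threshold > 0:
        # Linear ramp: 100 at distance 0, 0 at the threshold, clamped to [0, 100].
        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0))
    else:
        confidence = 0.0
    return verified, distance, confidence
```

With this formula a distance of half the threshold maps to a confidence of 50, and any distance at or beyond the threshold maps to 0 and is reported as not verified.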
--- a/backend/python/insightface/Makefile
+++ b/backend/python/insightface/Makefile
@@ -0,0 +1,13 @@
+.DEFAULT_GOAL := install
+
+.PHONY: install
+install:
+	bash install.sh
+
+.PHONY: protogen-clean
+protogen-clean:
+	$(RM) backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: clean
+clean: protogen-clean
+	rm -rf venv __pycache__
--- a/backend/python/insightface/README.md
+++ b/backend/python/insightface/README.md
@@ -0,0 +1,67 @@
+# insightface backend (LocalAI)
+
+Face recognition backend that runs ONNX models. Provides face
+verification (1:1), face analysis (age/gender), face detection, face
+embedding, and — via LocalAI's built-in vector store — 1:N
+identification.
+
+## Engines
+
+This backend ships with **two** interchangeable engines selected via
+`LoadModel.Options["engine"]`:
+
+| engine | Implementation | Models | License |
+|---|---|---|---|
+| `insightface` (default) | insightface `model_zoo` (driven directly) | `buffalo_l`, `buffalo_m`, `buffalo_s`, `buffalo_sc`, `antelopev2` | **Non-commercial research use only** |
+| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |
+
+Both engines implement the same `FaceEngine` protocol in `engines.py`,
+so the gRPC servicer in `backend.py` doesn't need to know which one is
+active.
+
+## LoadModel options
+
+Common:
+
+| option | default | description |
+|---|---|---|
+| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
+| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
+| `det_thresh` | `0.5` | detector confidence threshold |
+| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
+
+`insightface` engine:
+
+| option | default | description |
+|---|---|---|
+| `model_pack` | `buffalo_l` | which insightface pack to load |
+
+`onnx_direct` engine:
+
+| option | default | description |
+|---|---|---|
+| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
+| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
+
+## Adding a new model pack
+
+1. If it's an insightface pack (a zip archive the gallery can fetch
+   via `files:`), just add a new model entry in `gallery/index.yaml`
+   with `overrides.options: ["engine:insightface",
+   "model_pack:<name>"]`. No code change.
+2. If it's an Apache-licensed ONNX pair, add a model entry with
+   `options: ["engine:onnx_direct", "detector_onnx:...",
+   "recognizer_onnx:..."]`. If the detector or recognizer has a
+   different input-tensor shape than YuNet/SFace, you may need a new
+   engine implementation in `engines.py`; the two-engine seam makes
+   that a self-contained change.
+
+## Running tests locally
+
+```bash
+make -C backend/python/insightface         # install deps (no weights are downloaded)
+make -C backend/python/insightface test    # run test.py
+```
+
+The OpenCV Zoo tests download the YuNet + SFace ONNX files to a temp
+dir at `setUpClass` time, so no pre-installed weights are required.
--- a/backend/python/insightface/backend.py
+++ b/backend/python/insightface/backend.py
@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+"""gRPC server for the insightface face recognition backend.
+
+Implements Health / LoadModel / Status plus the face-specific methods:
+Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
+delegated to engines.py — this file is just the gRPC plumbing.
+"""
+import argparse
+import base64
+import os
+import signal
+import sys
+import time
+from concurrent import futures
+from io import BytesIO
+
+import backend_pb2
+import backend_pb2_grpc
+import cv2
+import grpc
+import numpy as np
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
+from grpc_auth import get_auth_interceptors  # noqa: E402
+
+from engines import FaceEngine, build_engine  # noqa: E402
+
+_ONE_DAY = 60 * 60 * 24
+MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
+
+# Default cosine-distance threshold for "same person" on buffalo_l
+# ArcFace R50. Clients can override per-request; clients using SFace
+# should pass threshold≈0.4 since the distance distribution is wider.
+DEFAULT_VERIFY_THRESHOLD = 0.35
+
+
+def _decode_image(src: str) -> np.ndarray | None:
+    """Decode a base64-encoded image into an OpenCV BGR numpy array."""
+    if not src:
+        return None
+    try:
+        data = base64.b64decode(src, validate=False)
+    except Exception:
+        return None
+    arr = np.frombuffer(data, dtype=np.uint8)
+    if arr.size == 0:
+        return None
+    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
+    return img
+
+
+def _parse_options(raw: list[str]) -> dict[str, str]:
+    out: dict[str, str] = {}
+    for entry in raw:
+        if ":" not in entry:
+            continue
+        k, v = entry.split(":", 1)
+        out[k.strip()] = v.strip()
+    return out
+
+
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+    def __init__(self) -> None:
+        self.engine: FaceEngine | None = None
+        self.engine_name: str = ""
+        self.model_name: str = ""
+        self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
+
+    def Health(self, request, context):
+        return backend_pb2.Reply(message=bytes("OK", "utf-8"))
+
+    def LoadModel(self, request, context):
+        options = _parse_options(list(request.Options))
+        # Surface LocalAI's models directory (ModelPath) so engines can
+        # anchor relative paths — OnnxDirectEngine's detector_onnx /
+        # recognizer_onnx point at gallery-managed files that LocalAI
+        # dropped there, and InsightFaceEngine locates its extracted
+        # packs in that same directory alongside every other managed model.
+        # Private key to avoid clashing with user-provided options.
+        if request.ModelPath:
+            options["_model_dir"] = request.ModelPath
+
+        engine_name = options.get("engine", "insightface")
+        try:
+            self.engine = build_engine(engine_name)
+            self.engine.prepare(options)
+        except Exception as err:  # pragma: no cover - exercised via e2e
+            return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
+
+        self.engine_name = engine_name
+        self.model_name = request.Model or options.get("model_pack", "")
+        if "verify_threshold" in options:
+            try:
+                self.verify_threshold = float(options["verify_threshold"])
+            except ValueError:
+                pass
+        print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
+        return backend_pb2.Result(success=True, message="Model loaded successfully")
+
+    def Status(self, request, context):
+        state = (
+            backend_pb2.StatusResponse.READY
+            if self.engine is not None
+            else backend_pb2.StatusResponse.UNINITIALIZED
+        )
+        return backend_pb2.StatusResponse(state=state)
+
+    def Embedding(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.EmbeddingResult()
+        if not request.Images:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("Embedding requires Images[0] to be a base64 image")
+            return backend_pb2.EmbeddingResult()
+
+        img = _decode_image(request.Images[0])
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.EmbeddingResult()
+
+        vec = self.engine.embed(img)
+        if vec is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected")
+            return backend_pb2.EmbeddingResult()
+        return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
+
+    def Detect(self, request, context):
+        if self.engine is None:
+            return backend_pb2.DetectResponse()
+        img = _decode_image(request.src)
+        if img is None:
+            return backend_pb2.DetectResponse()
+        detections = []
+        for d in self.engine.detect(img):
+            x1, y1, x2, y2 = d.bbox
+            detections.append(
+                backend_pb2.Detection(
+                    x=float(x1),
+                    y=float(y1),
+                    width=float(x2 - x1),
+                    height=float(y2 - y1),
+                    confidence=float(d.score),
+                    class_name="face",
+                )
+            )
+        return backend_pb2.DetectResponse(Detections=detections)
+
+    def FaceVerify(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceVerifyResponse()
+
+        img1 = _decode_image(request.img1)
+        img2 = _decode_image(request.img2)
+        if img1 is None or img2 is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        threshold = request.threshold if request.threshold > 0 else self.verify_threshold
+
+        start = time.time()
+        e1 = self.engine.embed(img1)
+        e2 = self.engine.embed(img2)
+        if e1 is None or e2 is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected in one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        # Both engines return L2-normalized vectors, so the dot product
+        # is the cosine similarity directly.
+        sim = float(np.dot(e1, e2))
+        distance = 1.0 - sim
+        verified = distance < threshold
+        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
+
+        def _region(img) -> backend_pb2.FacialArea:
+            dets = self.engine.detect(img)
+            if not dets:
+                return backend_pb2.FacialArea()
+            best = max(dets, key=lambda d: d.score)
+            x1, y1, x2, y2 = best.bbox
+            return backend_pb2.FacialArea(x=float(x1), y=float(y1), w=float(x2 - x1), h=float(y2 - y1))
+
+        return backend_pb2.FaceVerifyResponse(
+            verified=verified,
+            distance=float(distance),
+            threshold=float(threshold),
+            confidence=float(confidence),
+            model=self.model_name or self.engine_name,
+            img1_area=_region(img1),
+            img2_area=_region(img2),
+            processing_time_ms=float((time.time() - start) * 1000.0),
+        )
+
+    def FaceAnalyze(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceAnalyzeResponse()
+        img = _decode_image(request.img)
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.FaceAnalyzeResponse()
+
+        faces = []
+        for attrs in self.engine.analyze(img):
+            x, y, w, h = attrs.region
+            fa = backend_pb2.FaceAnalysis(
+                region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
+                face_confidence=float(attrs.face_confidence),
+            )
+            if attrs.age is not None:
+                fa.age = float(attrs.age)
+            if attrs.dominant_gender:
+                fa.dominant_gender = attrs.dominant_gender
+            for k, v in attrs.gender.items():
+                fa.gender[k] = float(v)
+            faces.append(fa)
+        return backend_pb2.FaceAnalyzeResponse(faces=faces)
+
+
+def serve(address: str) -> None:
+    server = grpc.server(
+        futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
+        options=[
+            ("grpc.max_message_length", 50 * 1024 * 1024),
+            ("grpc.max_send_message_length", 50 * 1024 * 1024),
+            ("grpc.max_receive_message_length", 50 * 1024 * 1024),
+        ],
+        interceptors=get_auth_interceptors(),
+    )
+    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+    server.add_insecure_port(address)
+    server.start()
+    print("[insightface] Server started. Listening on: " + address, file=sys.stderr)
+
+    def _stop(sig, frame):  # pragma: no cover
+        print("[insightface] shutting down", file=sys.stderr)
+        server.stop(0)
+        sys.exit(0)
+
+    signal.signal(signal.SIGINT, _stop)
+    signal.signal(signal.SIGTERM, _stop)
+
+    try:
+        while True:
+            time.sleep(_ONE_DAY)
+    except KeyboardInterrupt:
+        server.stop(0)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
+    parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
+    args = parser.parse_args()
+    print(f"[insightface] startup: {args}", file=sys.stderr)
+    serve(args.addr)
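Review note on the `FaceAnalyze` response above: the servicer emits `FacialArea` as `x, y, w, h`, while `FaceDetection.bbox` in engines.py is `x1, y1, x2, y2`, and both engines convert between the two inline. A minimal sketch of that conversion, assuming plain float tuples; `xyxy_to_xywh` and `xywh_to_xyxy` are hypothetical helper names, not functions in this PR:

```python
def xyxy_to_xywh(bbox):
    """Convert corner form (x1, y1, x2, y2) to origin + size (x, y, w, h)."""
    x1, y1, x2, y2 = (float(v) for v in bbox)
    return (x1, y1, x2 - x1, y2 - y1)


def xywh_to_xyxy(region):
    """Convert origin + size (x, y, w, h) back to corner form (x1, y1, x2, y2)."""
    x, y, w, h = (float(v) for v in region)
    return (x, y, x + w, y + h)
```

Keeping both directions next to each other makes it easier to check that the YuNet rows (already `x, y, w, h`) and the SCRFD boxes (corner form) end up in the same shape on the wire.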
--- a/backend/python/insightface/engines.py
+++ b/backend/python/insightface/engines.py
@@ -0,0 +1,382 @@
+"""Face recognition engine implementations for the LocalAI insightface backend.
+
+Two engines are provided:
+
+    * InsightFaceEngine  - drives insightface.model_zoo directly (no
+                           FaceAnalysis wrapper). buffalo_l / buffalo_s /
+                           antelopev2 packs: SCRFD + ArcFace + genderage.
+                           NON-COMMERCIAL research use only (upstream license).
+
+    * OnnxDirectEngine   - loads detector + recognizer ONNX files via
+                           OpenCV (cv2.FaceDetectorYN / FaceRecognizerSF).
+                           Used for OpenCV Zoo YuNet + SFace and any future
+                           Apache-licensed model set. analyze() returns
+                           detection regions only (no age/gender).
+
+Both engines expose the same interface so the gRPC servicer (backend.py)
+can dispatch without knowing which one is active.
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Protocol
+
+import cv2
+import numpy as np
+
+
+@dataclass
+class FaceDetection:
+    bbox: tuple[float, float, float, float]  # x1, y1, x2, y2
+    score: float
+    landmarks: np.ndarray | None = None      # 5x2 keypoints when available
+
+
+@dataclass
+class FaceAttributes:
+    region: tuple[float, float, float, float]  # x, y, w, h
+    face_confidence: float
+    age: float | None = None
+    dominant_gender: str | None = None
+    gender: dict[str, float] = field(default_factory=dict)
+
+
+class FaceEngine(Protocol):
+    """Minimal interface every engine must implement."""
+
+    def prepare(self, options: dict[str, str]) -> None: ...
+    def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
+    def embed(self, img: np.ndarray) -> np.ndarray | None: ...
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
+
+
+# ─── InsightFaceEngine ────────────────────────────────────────────────
+
+class InsightFaceEngine:
+    """Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
+
+    FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
+    in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
+    build a `{taskname: model}` dict, then loop per-face at inference).
+    We reimplement the same loop here so we can:
+
+      1. Load packs from whatever directory LocalAI's gallery extracted
+         them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
+         nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
+         without needing a specific layout on disk.
+      2. Skip insightface's built-in auto-download entirely: weight
+         delivery is LocalAI's gallery `files:` job now, checksum-
+         verified and cached alongside every other managed model.
+
+    The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
+    Landmark) stay in insightface — we only reimplement the ~50 lines
+    of glue around them.
+    """
+
+    def __init__(self) -> None:
+        self.models: dict[str, Any] = {}
+        self.det_model: Any = None
+        self.model_pack: str = "buffalo_l"
+        self.det_size: tuple[int, int] = (640, 640)
+        self.det_thresh: float = 0.5
+        self._providers: list[str] = ["CPUExecutionProvider"]
+
+    def prepare(self, options: dict[str, str]) -> None:
+        import glob
+        import os
+
+        from insightface.model_zoo import model_zoo
+
+        self.model_pack = options.get("model_pack", "buffalo_l")
+        self.det_size = _parse_det_size(options.get("det_size", "640x640"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        pack_dir = _locate_insightface_pack(options, self.model_pack)
+        if pack_dir is None:
+            raise ValueError(
+                f"no insightface pack '{self.model_pack}' found — install via "
+                f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
+            )
+
+        onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
+        if not onnx_files:
+            raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
+
+        # CUDAExecutionProvider is picked automatically by onnxruntime-gpu
+        # when available; falling back to CPU keeps the CPU-only image
+        # working. ctx_id=0 means "first GPU if any, else CPU".
+        self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
+
+        self.models = {}
+        for onnx_file in onnx_files:
+            m = model_zoo.get_model(onnx_file, providers=self._providers)
+            if m is None:
+                continue
+            # First occurrence of each taskname wins (matches FaceAnalysis).
+            if m.taskname not in self.models:
+                self.models[m.taskname] = m
+
+        if "detection" not in self.models:
+            raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
+        self.det_model = self.models["detection"]
+
+        self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
+        for name, m in self.models.items():
+            if name != "detection":
+                m.prepare(0)
+
+    def _faces(self, img: np.ndarray) -> list[Any]:
+        """Run detection + all non-detection models per face."""
+        if self.det_model is None:
+            return []
+        from insightface.app.common import Face
+
+        bboxes, kpss = self.det_model.detect(img, max_num=0)
+        if bboxes is None or bboxes.shape[0] == 0:
+            return []
+        faces: list[Any] = []
+        for i in range(bboxes.shape[0]):
+            bbox = bboxes[i, 0:4]
+            det_score = bboxes[i, 4]
+            kps = kpss[i] if kpss is not None else None
+            face = Face(bbox=bbox, kps=kps, det_score=det_score)
+            for name, m in self.models.items():
+                if name == "detection":
+                    continue
+                m.get(img, face)
+            faces.append(face)
+        return faces
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        return [
+            FaceDetection(
+                bbox=tuple(float(v) for v in f.bbox),
+                score=float(f.det_score),
+                landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
+            )
+            for f in self._faces(img)
+        ]
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        faces = self._faces(img)
+        if not faces:
+            return None
+        best = max(faces, key=lambda f: float(f.det_score))
+        if getattr(best, "normed_embedding", None) is None:
+            return None
+        return np.asarray(best.normed_embedding, dtype=np.float32)
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        out: list[FaceAttributes] = []
+        for f in self._faces(img):
+            x1, y1, x2, y2 = (float(v) for v in f.bbox)
+            region = (x1, y1, x2 - x1, y2 - y1)
+            attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
+            age = getattr(f, "age", None)
+            if age is not None:
+                attrs.age = float(age)
+            gender = getattr(f, "gender", None)
+            if gender is not None:
+                # genderage head emits argmax, not probabilities —
+                # one-hot dict keeps the API stable.
+                attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
+                attrs.gender = {
+                    "Man": 1.0 if int(gender) == 1 else 0.0,
+                    "Woman": 0.0 if int(gender) == 1 else 1.0,
+                }
+            out.append(attrs)
+        return out
+
+
+# ─── OnnxDirectEngine ─────────────────────────────────────────────────
+
+class OnnxDirectEngine:
+    """Loads detector + recognizer ONNX files directly.
+
+    Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
+    exposes a C++-level API via cv2.FaceDetectorYN which accepts the
+    ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
+    Both are Apache 2.0 licensed.
+    """
+
+    def __init__(self) -> None:
+        self.detector_path: str = ""
+        self.recognizer_path: str = ""
+        self.input_size: tuple[int, int] = (320, 320)
+        self.det_thresh: float = 0.5
+        self._detector: Any = None
+        self._recognizer: Any = None
+
+    def prepare(self, options: dict[str, str]) -> None:
+        raw_det = options.get("detector_onnx", "")
+        raw_rec = options.get("recognizer_onnx", "")
+        if not raw_det or not raw_rec:
+            raise ValueError(
+                "onnx_direct engine requires both detector_onnx and recognizer_onnx options"
+            )
+        model_dir = options.get("_model_dir")
+        self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
+        self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
+        self.input_size = _parse_det_size(options.get("det_size", "320x320"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        # YuNet is a fixed-size detector; size is reset per detect() call to
+        # match the input frame.
+        self._detector = cv2.FaceDetectorYN.create(
+            self.detector_path,
+            "",
+            self.input_size,
+            score_threshold=self.det_thresh,
+            nms_threshold=0.3,
+            top_k=5000,
+        )
+        self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        if self._detector is None:
+            return []
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None:
+            return []
+        out: list[FaceDetection] = []
+        for row in faces:
+            x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
+            # Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
+            landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
+            score = float(row[-1])
+            out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
+        return out
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        if self._detector is None or self._recognizer is None:
+            return None
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None or len(faces) == 0:
+            return None
+        # Pick the highest-score face (last column is score).
+        best = max(faces, key=lambda r: float(r[-1]))
+        aligned = self._recognizer.alignCrop(img, best)
+        feat = self._recognizer.feature(aligned)
+        vec = np.asarray(feat, dtype=np.float32).flatten()
+        # SFace outputs a 128-dim feature; L2-normalize to make dot-product
+        # comparable to buffalo_l's already-normed 512-dim embedding.
+        norm = float(np.linalg.norm(vec))
+        if norm == 0:
+            return None
+        return vec / norm
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        # OpenCV Zoo does not ship a demographic classifier; report
+        # only the face-detection regions so callers can still see
+        # how many faces were detected.
+        return [
+            FaceAttributes(
+                region=(
+                    d.bbox[0],
+                    d.bbox[1],
+                    d.bbox[2] - d.bbox[0],
+                    d.bbox[3] - d.bbox[1],
+                ),
+                face_confidence=d.score,
+            )
+            for d in self.detect(img)
+        ]
+
+
+# ─── helpers ──────────────────────────────────────────────────────────
+
+def _parse_det_size(raw: str) -> tuple[int, int]:
+    raw = raw.strip().lower().replace(" ", "")
+    if "x" in raw:
+        w, h = raw.split("x", 1)
+        return (int(w), int(h))
+    n = int(raw)
+    return (n, n)
+
+
+def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
+    """Find the directory holding the insightface pack's ONNX files.
+
+    LocalAI's gallery `files:` extracts the pack zip straight into the
+    models directory. Upstream packs are inconsistent:
+
+      buffalo_l/s/sc  — flat zip, ONNX lands at `<models_dir>/*.onnx`
+      buffalo_m, antelopev2  — wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`
+
+    We search, in order:
+      1. `<models_dir>/<name>/`: wrapped-zip layout.
+      2. `<models_dir>/models/<name>/`: insightface's FaceAnalysis
+         auto-download layout (handy for dev environments that still
+         have old `~/.insightface` caches).
+      3. `<models_dir>/`: flat-zip layout directly in the models dir.
+      4. If the `root` option is set: `<root>/models/<name>/`,
+         `<root>/<name>/`, then `<root>/`.
+    Returns the first directory whose contents include `*.onnx`.
+    """
+    import glob
+    import os
+
+    model_dir = options.get("_model_dir") or ""
+    explicit_root = options.get("root")
+
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, name))
+        candidates.append(os.path.join(model_dir, "models", name))
+        candidates.append(model_dir)
+    if explicit_root:
+        expanded = os.path.expanduser(explicit_root)
+        candidates.append(os.path.join(expanded, "models", name))
+        candidates.append(os.path.join(expanded, name))
+        candidates.append(expanded)
+
+    for c in candidates:
+        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
+            return c
+    return None
+
+
+def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
+    """Resolve an ONNX file path across the paths LocalAI might deliver it from.
+
+    Search order:
+      1. The path itself if it already resolves (absolute, or relative to CWD).
+      2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`) —
+         this is how LocalAI surfaces gallery-managed files. When the gallery
+         entry lists `files:`, each one lands under the models directory and
+         backends load them via filename anchored by ModelFile.
+      3. `<script_dir>/<path-without-leading-slash>` — covers dev layouts
+         where someone manually dropped weights inside the backend dir.
+
+    If none hit, return the literal input so cv2/insightface surfaces a
+    clearer error naming the actually-attempted path.
+    """
+    import os
+
+    if os.path.isfile(path):
+        return path
+    stripped = path.lstrip("/")
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, os.path.basename(path)))
+        candidates.append(os.path.join(model_dir, stripped))
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    candidates.append(os.path.join(script_dir, stripped))
+    for c in candidates:
+        if os.path.isfile(c):
+            return c
+    return path
+
+
+def build_engine(name: str) -> FaceEngine:
+    """Factory for the engine selected by LoadModel options."""
+    key = name.strip().lower()
+    if key in ("", "insightface"):
+        return InsightFaceEngine()
+    if key in ("onnx_direct", "onnx-direct", "opencv"):
+        return OnnxDirectEngine()
+    raise ValueError(f"unknown engine: {name!r}")
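Review note on the two `embed()` implementations above: both return L2-normalized vectors (`normed_embedding` for insightface, `vec / norm` for SFace), so a 1:1 comparison reduces to a dot product. A minimal stdlib-only sketch, assuming the verifier uses cosine distance (the body of `FaceVerify` is not part of this hunk); `l2_normalize`, `cosine_distance`, and `is_same_person` are hypothetical names, and the threshold is whatever the caller supplies, not a recommended default:

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length; return None for the zero vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return None
    return [x / norm for x in vec]


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two unit-length vectors."""
    if len(a) != len(b):
        # e.g. a 128-dim SFace vector compared against a 512-dim ArcFace one
        raise ValueError("embedding dimensions differ")
    return 1.0 - sum(x * y for x, y in zip(a, b))


def is_same_person(a, b, threshold):
    """Hypothetical decision rule: accept when the distance is within threshold."""
    return cosine_distance(a, b) <= threshold
```

The dimension check matters in practice: embeddings produced by one engine are not comparable with those from the other, so a registry that mixes them should fail loudly rather than return a meaningless distance.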
--- a/backend/python/insightface/install.sh
+++ b/backend/python/insightface/install.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname "$0")
+if [ -d "$backend_dir/common" ]; then
+    source "$backend_dir/common/libbackend.sh"
+else
+    source "$backend_dir/../common/libbackend.sh"
+fi
+
+installRequirements
+
+# We deliberately do NOT pre-bake any model weights here. Two reasons:
+#
+#   1. Weights should follow LocalAI's gallery-managed download flow
+#      like every other backend. For OpenCV Zoo (YuNet + SFace) the
+#      gallery entries in gallery/index.yaml list the ONNX files via
+#      `files:` with URI + SHA-256 — LocalAI fetches them into the
+#      models directory on `local-ai models install`.
+#
+#   2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
+#      buffalo_sc, antelopev2), upstream distributes zip archives
+#      only (no individual ONNX URLs). The gallery `files:` flow
+#      extracts each zip into the models directory and engines.py
+#      locates the ONNX files there at LoadModel; insightface's own
+#      auto-download (`FaceAnalysis(name=<pack>, root=<dir>)`) is unused.
+#
+# Net effect: the backend image ships only Python deps (~150MB CPU).
--- a/backend/python/insightface/requirements-cpu.txt
+++ b/backend/python/insightface/requirements-cpu.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements-cublas12.txt
+++ b/backend/python/insightface/requirements-cublas12.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime-gpu
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements.txt
+++ b/backend/python/insightface/requirements.txt
@@ -0,0 +1,3 @@
+grpcio==1.71.0
+protobuf
+grpcio-tools
--- a/backend/python/insightface/run.sh
+++ b/backend/python/insightface/run.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+backend_dir=$(dirname "$0")
+if [ -d "$backend_dir/common" ]; then
+    source "$backend_dir/common/libbackend.sh"
+else
+    source "$backend_dir/../common/libbackend.sh"
+fi
+
+startBackend "$@"
--- a/backend/python/insightface/smoke.py
+++ b/backend/python/insightface/smoke.py
@@ -0,0 +1,264 @@
+#!/usr/bin/env python3
+"""Smoke-test every face recognition model configuration shipped in the
+gallery. Simulates what LocalAI does at runtime: for each config, sets
+up a models directory, fetches any required files via URL (as the
+gallery's `files:` list would), then loads + detects + embeds via the
+in-process BackendServicer — matching the gRPC surface end users hit.
+
+Run inside the built backend image (venv already has insightface /
+onnxruntime / opencv-python-headless):
+
+    python smoke.py
+
+Network is required to download the OpenCV Zoo ONNX files on first run.
+The insightface packs are not fetched here: engines.py expects them to be
+already extracted under the models directory (see LOCALAI_MODELS_PATH).
+"""
+from __future__ import annotations
+
+import base64
+import hashlib
+import os
+import sys
+import traceback
+import urllib.request
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+from backend import BackendServicer  # noqa: E402
+
+
+# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
+# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
+OPENCV_FILES = {
+    "fp32": [
+        (
+            "face_detection_yunet_2023mar.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+            "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+        ),
+        (
+            "face_recognition_sface_2021dec.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+            "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+        ),
+    ],
+    "int8": [
+        (
+            "face_detection_yunet_2023mar_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
+            "321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
+        ),
+        (
+            "face_recognition_sface_2021dec_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
+            "2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
+        ),
+    ],
+}
+
+
+CONFIGS = [
+    {
+        "name": "insightface-buffalo-l",
+        "options": ["engine:insightface", "model_pack:buffalo_l"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-sc",
+        "options": ["engine:insightface", "model_pack:buffalo_sc"],
+        # buffalo_sc ships detector + recognizer only: no landmarks, no genderage.
+        "has_analyze": False,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-s",
+        "options": ["engine:insightface", "model_pack:buffalo_s"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-m",
+        "options": ["engine:insightface", "model_pack:buffalo_m"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-antelopev2",
+        "options": ["engine:insightface", "model_pack:antelopev2"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-opencv",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "fp32",
+    },
+    {
+        "name": "insightface-opencv-int8",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar_int8.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "int8",
+    },
+]
+
+
+class _FakeContext:
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+def _encode_image(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_sample_image() -> str:
+    from insightface.data import get_image as ins_get_image
+
+    return _encode_image(ins_get_image("t1"))
+
+
+def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
+    def _digest(path: str) -> str:
+        with open(path, "rb") as fh:  # close the handle instead of leaking it
+            return hashlib.sha256(fh.read()).hexdigest()
+    dest = os.path.join(model_dir, filename)
+    if os.path.isfile(dest) and _digest(dest) == sha256:
+        return  # cached copy already matches the expected checksum
+    print(f"  fetching {filename} from {uri}", file=sys.stderr, flush=True)
+    urllib.request.urlretrieve(uri, dest)
+    h = _digest(dest)
+    if h != sha256:
+        raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
+
+
+def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
+    # Mirror LocalAI's gallery flow: populate model_dir with the
+    # gallery's listed files before calling LoadModel.
+    if cfg["needs_opencv_files"]:
+        for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
+            _download_if_missing(model_dir, filename, uri, sha256)
+
+    svc = BackendServicer()
+    ctx = _FakeContext()
+
+    load_res = svc.LoadModel(
+        backend_pb2.ModelOptions(
+            Model=cfg["name"],
+            Options=cfg["options"],
+            # ModelPath is what the Go loader sets to ml.ModelPath,
+            # i.e. LocalAI's models directory. The backend anchors
+            # relative paths and the insightface pack lookup here.
+            ModelPath=model_dir,
+        ),
+        ctx,
+    )
+    if not load_res.success:
+        return False, f"LoadModel: {load_res.message}"
+
+    det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+    if len(det_res.Detections) == 0:
+        return False, "Detect returned no faces"
+    for d in det_res.Detections:
+        if d.class_name != "face":
+            return False, f"Detect returned class_name={d.class_name!r}"
+
+    emb_ctx = _FakeContext()
+    emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
+    if emb_ctx.code is not None:
+        return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
+    if len(emb_res.embeddings) == 0:
+        return False, "Embedding returned empty vector"
+    norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
+    if not (0.8 <= norm_sq <= 1.2):
+        return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
+
+    ver_ctx = _FakeContext()
+    ver_res = svc.FaceVerify(
+        backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
+    )
+    if ver_ctx.code is not None:
+        return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
+    if not ver_res.verified:
+        return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
+    if ver_res.distance > 0.1:
+        return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
+
+    if cfg["has_analyze"]:
+        an_ctx = _FakeContext()
+        an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
+        if an_ctx.code is not None:
+            return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
+        if len(an_res.faces) == 0:
+            return False, "FaceAnalyze returned no faces"
+        f0 = an_res.faces[0]
+        if f0.age <= 0:
+            return False, f"FaceAnalyze age not populated (age={f0.age})"
+        if f0.dominant_gender not in ("Man", "Woman"):
+            return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
+
+    n_dets = len(det_res.Detections)
+    dim = len(emb_res.embeddings)
+    return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
+
+
+def main() -> int:
+    # Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
+    # default to a fresh temp dir.
+    model_dir = os.environ.get("LOCALAI_MODELS_PATH")
+    if not model_dir:
+        import tempfile
+
+        model_dir = tempfile.mkdtemp(prefix="face-smoke-")
+    os.makedirs(model_dir, exist_ok=True)
+    print(f"model_dir={model_dir}", file=sys.stderr)
+
+    print("Preparing sample image from insightface.data...", file=sys.stderr)
+    img_b64 = _load_sample_image()
+
+    results: list[tuple[str, bool, str]] = []
+    for cfg in CONFIGS:
+        sys.stderr.write(f"\n=== {cfg['name']} ===\n")
+        sys.stderr.flush()
+        try:
+            ok, detail = _run_one(cfg, img_b64, model_dir)
+        except Exception:
+            ok, detail = False, traceback.format_exc().splitlines()[-1]
+        results.append((cfg["name"], ok, detail))
+        print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s}  {detail}")
+        sys.stdout.flush()
+
+    print("\n=== summary ===")
+    passed = sum(1 for _, ok, _ in results if ok)
+    total = len(results)
+    for name, ok, detail in results:
+        mark = "✓" if ok else "✗"
+        print(f"  {mark} {name:30s} {detail}")
+    print(f"\n{passed}/{total} passed")
+    return 0 if passed == total else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
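Review note on `CONFIGS` above: every `Options` entry is a `key:value` string, while `FaceEngine.prepare` takes a `dict[str, str]`. The parsing itself lives in `LoadModel`, which is outside this excerpt, so the following is only a sketch of one way to do it; `parse_options` is a hypothetical helper, and splitting on the first colon only is an assumption chosen so that values containing colons survive:

```python
def parse_options(raw):
    """Turn ["key:value", ...] into {"key": "value", ...}.

    Splits on the first colon only; entries without a colon are skipped.
    Later duplicates overwrite earlier ones.
    """
    opts = {}
    for item in raw:
        key, sep, value = item.partition(":")
        if not sep:
            continue  # not a key:value pair
        opts[key.strip()] = value.strip()
    return opts
```

For example, `parse_options(["engine:onnx_direct", "det_size:320x320"])` yields a dict that `build_engine(opts["engine"])` and `prepare(opts)` could consume directly.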
--- a/backend/python/insightface/test.py
+++ b/backend/python/insightface/test.py
@@ -0,0 +1,234 @@
+"""Unit tests for the insightface gRPC backend.
+
+The servicer is instantiated in-process (no gRPC channel) and driven
+directly. Sample images come from insightface.data, which ships with the
+pip package; only the OpenCV Zoo weights are downloaded on demand.
+
+Tests are parametrized over both engines (InsightFaceEngine and
+OnnxDirectEngine) where applicable.
+"""
+from __future__ import annotations
+
+import base64
+import os
+import sys
+import unittest
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+
+from backend import BackendServicer  # noqa: E402
+
+# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
+# to mirror LocalAI's gallery `files:` flow (the backend image itself
+# doesn't ship model weights).
+OPENCV_FILES = [
+    (
+        "face_detection_yunet_2023mar.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+        "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+    ),
+    (
+        "face_recognition_sface_2021dec.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+        "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+    ),
+]
+
+
+def _encode(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_insightface_samples() -> dict[str, str]:
+    """Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.
+
+    t1 and t2 are two different group photos. Tests pair t1 with itself
+    for the same-person case and t1 with t2 for the differing case.
+    """
+    from insightface.data import get_image as ins_get_image
+
+    return {
+        "t1": _encode(ins_get_image("t1")),
+        "t2": _encode(ins_get_image("t2")),
+    }
+
+
+class _FakeContext:
+    """Minimal stand-in for grpc.ServicerContext."""
+
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+class _Harness:
+    def __init__(self, servicer: BackendServicer) -> None:
+        self.svc = servicer
+
+    def health(self):
+        return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())
+
+    def load(self, options: list[str], model_path: str = ""):
+        return self.svc.LoadModel(
+            backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
+            _FakeContext(),
+        )
+
+    def detect(self, img_b64: str):
+        return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+
+    def embed(self, img_b64: str):
+        ctx = _FakeContext()
+        res = self.svc.Embedding(
+            backend_pb2.PredictOptions(Images=[img_b64]),
+            ctx,
+        )
+        return res, ctx
+
+    def verify(self, a: str, b: str, threshold: float = 0.0):
+        return self.svc.FaceVerify(
+            backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
+            _FakeContext(),
+        )
+
+    def analyze(self, img_b64: str):
+        return self.svc.FaceAnalyze(
+            backend_pb2.FaceAnalyzeRequest(img=img_b64),
+            _FakeContext(),
+        )
+
+
+class InsightFaceEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_health(self):
+        self.assertEqual(self.harness.health().message, b"OK")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+            self.assertGreater(d.width, 0)
+            self.assertGreater(d.height, 0)
+
+    def test_embedding_is_l2_normed(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertEqual(len(res.embeddings), 512)
+        norm_sq = sum(x * x for x in res.embeddings)
+        self.assertAlmostEqual(norm_sq, 1.0, places=2)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"])
+        self.assertTrue(res.verified)
+        self.assertLess(res.distance, 0.05)
+
+    def test_verify_different_images(self):
+        # t1 and t2 depict different groups of people, so the top face
+        # on each side is unlikely to match.
+        res = self.harness.verify(self.samples["t1"], self.samples["t2"])
+        # The match/no-match outcome depends on which face each side
+        # picks, so it is not a stable assertion. Check only that a
+        # non-negative distance came back.
+        self.assertGreaterEqual(res.distance, 0.0)
+
+    def test_analyze_has_age_and_gender(self):
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertGreater(face.face_confidence, 0.0)
+            # Age should be populated for buffalo_l.
+            self.assertGreater(face.age, 0.0)
+            self.assertIn(face.dominant_gender, ("Man", "Woman"))
+
+
+def _prepare_opencv_models_dir() -> str | None:
+    """Download the OpenCV Zoo face ONNX files into OPENCV_FACE_MODELS_DIR, or
+    a fresh temp dir if unset, the way LocalAI's gallery would. Returns the
+    directory, or None if a download or SHA-256 check fails (e.g. no network).
+    """
+    import hashlib
+    import tempfile
+    import urllib.request
+    from pathlib import Path
+
+    root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(prefix="opencv-face-")
+    for filename, uri, sha256 in OPENCV_FILES:
+        dest = Path(root, filename)
+        # Reuse a cached file only if its checksum still matches.
+        if dest.is_file() and hashlib.sha256(dest.read_bytes()).hexdigest() == sha256:
+            continue
+        try:
+            urllib.request.urlretrieve(uri, dest)
+        except Exception:
+            return None
+        # read_bytes() closes the file, unlike a bare open().read().
+        if hashlib.sha256(dest.read_bytes()).hexdigest() != sha256:
+            return None
+    return root
+
+
+class OnnxDirectEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.model_dir = _prepare_opencv_models_dir()
+        if cls.model_dir is None:
+            raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(
+            [
+                "engine:onnx_direct",
+                "detector_onnx:face_detection_yunet_2023mar.onnx",
+                "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+            ],
+            model_path=cls.model_dir,
+        )
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+
+    def test_embedding_nonempty(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertGreater(len(res.embeddings), 0)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
+        self.assertTrue(res.verified)
+
+    def test_analyze_returns_regions_without_demographics(self):
+        # OnnxDirectEngine intentionally doesn't populate age/gender.
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertEqual(face.dominant_gender, "")
+            self.assertEqual(face.age, 0.0)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/backend/python/insightface/test.sh
+++ b/backend/python/insightface/test.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname "$0")
+if [ -d "$backend_dir/common" ]; then
+    source "$backend_dir/common/libbackend.sh"
+else
+    source "$backend_dir/../common/libbackend.sh"
+fi
+
+runUnittests