feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis

Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.

New gRPC RPCs (backend/backend.proto):
  * FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
  * FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse

Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.

New HTTP endpoints under /v1/face/:
  * verify     — 1:1 image pair same-person decision
  * analyze    — per-face age + gender (emotion/race reserved)
  * register   — 1:N enrollment; stores embedding in vector store
  * identify   — 1:N recognition; detect → embed → StoresFind
  * forget     — remove a registered face by opaque ID

Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).

New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.

Gallery (backend/index.yaml) ships three entries:
  * insightface-buffalo-l   — SCRFD-10GF + ArcFace R50 + genderage
                              (~326MB pre-baked; non-commercial research use only)
  * insightface-opencv      — YuNet + SFace (~40MB pre-baked; Apache 2.0)
  * insightface-buffalo-s   — SCRFD-500MF + MBF (runtime download; non-commercial)

Python backend (backend/python/insightface/):
  * engines.py — FaceEngine protocol with InsightFaceEngine and
    OnnxDirectEngine; resolves model paths relative to the backend
    directory so the same gallery config works in docker-scratch and
    in the e2e-backends rootfs-extraction harness.
  * backend.py — gRPC servicer implementing Health, LoadModel, Status,
    Embedding, Detect, FaceVerify, FaceAnalyze.
  * install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
    backend directory so first-run is offline-clean (the final scratch
    image only preserves files under /<backend>/).
  * test.py — parametrized unit tests over both engines.

Tests:
  * Registry unit tests (go test -race ./core/services/facerecognition/...)
    — in-memory fake grpc.Backend, table-driven, covers register/
    identify/forget/error paths + concurrent access.
  * tests/e2e-backends/backend_test.go extended with face caps
    (face_detect, face_embed, face_verify, face_analyze); relative
    ordering + configurable verifyCeiling per engine.
  * Makefile targets: test-extra-backend-insightface-buffalo-l,
    -opencv, and the -all aggregate.
  * CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
    auto-triggered by changes under backend/python/insightface/.

Docs:
  * docs/content/features/face-recognition.md — feature page with
    license table, quickstart (defaults to the commercial-safe model),
    models matrix, API reference, 1:N workflow, storage caveats.
  * Cross-refs in object-detection.md, stores.md, embeddings.md, and
    whats-new.md.
  * Contributor README at backend/python/insightface/README.md.

Verified end-to-end:
  * buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
    face_verify, face_analyze).
  * opencv: 5/5 specs (same minus face_analyze — SFace has no
    demographic head; correctly skipped via BACKEND_TEST_CAPS).

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): move engine selection to model gallery, collapse backend entries

The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.

Correct split:

  * `backend/index.yaml` now has ONE `insightface` backend entry
    shipping the CPU + CUDA 12 container images. The Python backend
    bundles both the non-commercial insightface model packs
    (buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
    weights (YuNet + SFace); the active engine is selected at
    LoadModel time via `options: ["engine:..."]`.

  * `gallery/index.yaml` gains three model entries —
    `insightface-buffalo-l`, `insightface-opencv`,
    `insightface-buffalo-s` — each setting the appropriate
    `overrides.backend` + `overrides.options` so installing one
    actually gives the user the intended engine. This matches how
    `rfdetr-base` lives in the model gallery against the `rfdetr`
    backend.

The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): cover all supported models in the gallery + drop weight baking

Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.

Gallery now has seven `insightface-*` model entries (gallery/index.yaml):

  insightface (family)  — non-commercial research use
    • buffalo-l   (326MB)  — SCRFD-10GF + ResNet50 + genderage, default
    • buffalo-m   (313MB)  — SCRFD-2.5GF + ResNet50 + genderage
    • buffalo-s   (159MB)  — SCRFD-500MF + MBF + genderage
    • buffalo-sc  (16MB)   — SCRFD-500MF + MBF, recognition only
                             (no landmarks, no demographics — analyze
                             returns empty attributes)
    • antelopev2  (407MB)  — SCRFD-10GF + ResNet100@Glint360K + genderage

  OpenCV Zoo family — Apache 2.0 commercial-safe
    • opencv       — YuNet + SFace fp32 (~40MB)
    • opencv-int8  — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)

Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:

  * OpenCV variants list `files:` with ONNX URIs + SHA-256, so
    `local-ai models install insightface-opencv` pulls them into the
    models directory exactly like any other gallery-managed model.

  * insightface packs (upstream distributes .zip archives only, not
    individual ONNX files) auto-download on first LoadModel via
    FaceAnalysis' built-in machinery, rooted at the LocalAI models
    directory so they live alongside everything else — same pattern
    `rfdetr` uses with `inference.get_model()`.

Backend changes (backend/python/insightface/):

  * backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
    LocalAI models directory) to engines via a `_model_dir` hint.
    This replaces the earlier ModelFile-dirname approach; ModelPath
    is the canonical "models directory" variable set by the Go loader
    (pkg/model/initializers.go:144) and is always populated.

  * engines.py::_resolve_model_path — picks up `model_dir` and searches
    it (plus basename-in-model-dir) before falling back to the dev
    script-dir. This is how OnnxDirectEngine finds gallery-downloaded
    YuNet/SFace files by filename only.

  * engines.py::_flatten_insightface_pack — new helper that works
    around an upstream packaging inconsistency: buffalo_l/s/sc zips
    expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
    files in a redundant `<name>/` directory. insightface's own
    loader looks one level too shallow and fails. We call
    `ensure_available()` explicitly, flatten if nested, then hand to
    FaceAnalysis.

  * engines.py::InsightFaceEngine.prepare — root-resolution order now
    includes the `_model_dir` hint so packs download into the LocalAI
    models directory by default.

  * install.sh — no longer pre-downloads any weights. Everything is
    gallery-managed now.

  * smoke.py (new) — parametrized smoke test that iterates over every
    gallery configuration, simulating the LocalAI install flow
    (creates a models dir, fetches OpenCV files with checksum
    verification, lets insightface auto-download its packs), then
    runs detect + embed + verify (+ analyze where supported) through
    the in-process BackendServicer.

  * test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
    paths; downloads ONNX files to a temp dir at setUpClass time and
    passes ModelPath accordingly.

Registry change (core/services/facerecognition/store_registry.go):

  * `dim=0` in NewStoreRegistry now means "accept whatever dimension
    arrives" — needed because the backend supports 512-d ArcFace/MBF
    and 128-d SFace via the same Registry. A non-zero dim still fails
    fast with ErrDimensionMismatch.

  * core/application plumbs `faceEmbeddingDim = 0`, explaining the
    rationale in the comment.

Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.

Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:

    PASS: insightface-buffalo-l    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-sc   faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-s    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-m    faces=6 dim=512 same-dist=0.000
    PASS: insightface-antelopev2   faces=6 dim=512 same-dist=0.000
    PASS: insightface-opencv       faces=6 dim=128 same-dist=0.000
    PASS: insightface-opencv-int8  faces=6 dim=128 same-dist=0.000
    7/7 passed

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim

CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:

    LoadModel failed: Failed to load face engine:
    OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx

Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.

Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).

Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs

The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.

Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).

Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).

Response:

    {
      "embedding": [<dim> floats, L2-normed],
      "dim": int,           // 512 for ArcFace R50 / MBF, 128 for SFace
      "model": "<name>"
    }

Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Sentinel in docs updated to note /v1/embeddings
is text-only and point image users at /v1/face/embed instead.

Assisted-by: Claude:claude-opus-4-7

* fix(http): map malformed image input + gRPC status codes to proper 4xx

Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.

Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:

  * decodeImageInput(s)
      Wraps utils.GetContentURIAsBase64 and turns any failure
      (invalid URL, not a data-URI, download error, etc.) into
      echo.NewHTTPError(400, "invalid image input: ...").

  * mapBackendError(err)
      Inspects the gRPC status on a backend call error and maps:
        INVALID_ARGUMENT     → 400 Bad Request
        NOT_FOUND            → 404 Not Found
        FAILED_PRECONDITION  → 412 Precondition Failed
        Unimplemented        → 501 Not Implemented
      All other codes fall through unchanged (still 500).

Before, my 1×1 PNG error-path test returned:
    HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
    HTTP 400 "failed to decode one or both images"

Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.

Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.

Assisted-by: Claude:claude-opus-4-7

* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis

Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.

Two changes:

1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
   `files:` list with the pack zip's URI + SHA-256. `local-ai models
   install insightface-buffalo-l` now fetches the zip, verifies the
   hash, and extracts it into the models directory. No more reliance
   on insightface's library-internal `ensure_available()` auto-download
   or its hardcoded `BASE_REPO_URL`.

2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
   the FaceAnalysis wrapper and drives insightface's `model_zoo`
   directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
   route each through `model_zoo.get_model()`, build a
   `{taskname: model}` dict, loop per-face at inference — are
   reimplemented in `InsightFaceEngine`. The actual inference classes
   (RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
   insightface's — we only replicate the glue, so drift risk against
   upstream is minimal.

   Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
   layout that doesn't match what LocalAI's zip extraction produces.
   LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
   are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
   at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
   `<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
   `_locate_insightface_pack` helper searches both locations plus
   legacy paths and returns whichever has ONNX files. Replaces the
   earlier `_flatten_insightface_pack` helper (which tried to fight
   FaceAnalysis's layout expectations; now we just find the files
   wherever they are).

Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched

The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".

Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.

Face analyze cap dropped since buffalo_sc has no age/gender head.

Assisted-by: Claude:claude-opus-4-7[1m]

* feat(face-recognition): surface face-recognition in advertised feature maps

The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:

  * api_instructions — the machine-readable capability index at
    GET /api/instructions. Added `face-recognition` as a dedicated
    instruction area with an intro that calls out the in-memory
    registry caveat and the /v1/face/embed vs /v1/embeddings split.
  * auth/permissions — added FeatureFaceRecognition constant, routed
    all six face endpoints through it so admins can gate them per-user
    like any other API feature. Default ON (matches the other API
    features).
  * React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
    FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
    follow-up (noted in the plan).

Instruction count bumped 9 → 10; test updated.

Assisted-by: Claude:claude-opus-4-7[1m]

* docs(agents): capture advertising-surface steps in the endpoint guide

Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.

Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.

Assisted-by: Claude:claude-opus-4-7[1m]
This commit is contained in:
Ettore Di Giacinto
2026-04-22 21:55:41 +02:00
committed by GitHub
parent d16f19f1eb
commit 20baec77ab
59 changed files with 3625 additions and 24 deletions

View File

@@ -2,6 +2,8 @@
This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
## Architecture overview
Authentication and authorization flow through three layers:
@@ -234,6 +236,66 @@ Use these HTTP status codes:
If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
## Advertising surfaces — where to register a new capability
Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
### 1. Swagger `@Tags` annotation (mandatory)
Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
```go
// MyEndpoint does X.
// @Summary Do X.
// @Tags my-capability
// @Param request body schema.MyRequest true "payload"
// @Success 200 {object} schema.MyResponse "Response"
// @Router /v1/my-endpoint [post]
func MyEndpoint(...) echo.HandlerFunc { ... }
```
Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
After adding endpoints, regenerate the embedded spec so the runtime serves it:
```bash
make protogen-go # ensures gRPC codegen is fresh first
make swagger # regenerates swagger/swagger.json
```
### 2. `/api/instructions` registry (for new capability areas)
`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
Add an entry to `instructionDefs`:
```go
{
Name: "my-capability", // URL segment at /api/instructions/my-capability
Description: "Short sentence describing the capability",
Tags: []string{"my-capability"}, // must match swagger @Tags
Intro: "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
},
```
Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
```js
export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
```
React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
### 4. `docs/content/` (user-facing documentation)
A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
## Path protection rules
The global auth middleware classifies paths as API paths or non-API paths:
@@ -248,12 +310,23 @@ If you add endpoints under a new top-level path prefix, add it to `isAPIPath()`
When adding a new endpoint:
**Routing & auth**
- [ ] Handler in `core/http/endpoints/`
- [ ] Route registered in appropriate `core/http/routes/` file
- [ ] Auth level chosen: public / standard / admin / feature-gated
- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
- [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
- [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
- [ ] If token-counting: `usageMiddleware` added to middleware chain
- [ ] Error responses use `schema.ErrorResponse` format
**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
**Quality**
- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
- [ ] Tests cover both authenticated and unauthenticated access
- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation

View File

@@ -711,6 +711,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-insightface'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "insightface"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -2626,6 +2639,20 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
# insightface (face recognition)
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-insightface'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "insightface"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""

View File

@@ -38,6 +38,7 @@ jobs:
qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
voxtral: ${{ steps.detect.outputs.voxtral }}
kokoros: ${{ steps.detect.outputs.kokoros }}
insightface: ${{ steps.detect.outputs.insightface }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
@@ -751,3 +752,29 @@ jobs:
- name: Test kokoros
run: |
make -C backend/rust/kokoros test
tests-insightface-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
make build-essential curl unzip ca-certificates git tar
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.26.0'
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
df -h
- name: Build insightface backend image and run both model configurations
run: |
make test-extra-backend-insightface-all

View File

@@ -34,5 +34,6 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI

116
Makefile
View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
GOCMD=go
GOTEST=$(GOCMD) test
@@ -434,6 +434,7 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/ace-step
$(MAKE) -C backend/python/trl
$(MAKE) -C backend/python/tinygrad
$(MAKE) -C backend/python/insightface
$(MAKE) -C backend/rust/kokoros kokoros-grpc
test-extra: prepare-test-extra
@@ -457,6 +458,7 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/python/ace-step test
$(MAKE) -C backend/python/trl test
$(MAKE) -C backend/python/tinygrad test
$(MAKE) -C backend/python/insightface test
$(MAKE) -C backend/rust/kokoros test
##
@@ -507,6 +509,13 @@ test-extra-backend: protogen-go
BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
BACKEND_TEST_CACHE_TYPE_K="$$BACKEND_TEST_CACHE_TYPE_K" \
BACKEND_TEST_CACHE_TYPE_V="$$BACKEND_TEST_CACHE_TYPE_V" \
BACKEND_TEST_FACE_IMAGE_1_URL="$$BACKEND_TEST_FACE_IMAGE_1_URL" \
BACKEND_TEST_FACE_IMAGE_1_FILE="$$BACKEND_TEST_FACE_IMAGE_1_FILE" \
BACKEND_TEST_FACE_IMAGE_2_URL="$$BACKEND_TEST_FACE_IMAGE_2_URL" \
BACKEND_TEST_FACE_IMAGE_2_FILE="$$BACKEND_TEST_FACE_IMAGE_2_FILE" \
BACKEND_TEST_FACE_IMAGE_3_URL="$$BACKEND_TEST_FACE_IMAGE_3_URL" \
BACKEND_TEST_FACE_IMAGE_3_FILE="$$BACKEND_TEST_FACE_IMAGE_3_FILE" \
BACKEND_TEST_VERIFY_DISTANCE_CEILING="$$BACKEND_TEST_VERIFY_DISTANCE_CEILING" \
go test -v -timeout 30m ./tests/e2e-backends/...
## Convenience wrappers: build the image, then exercise it.
@@ -603,6 +612,107 @@ test-extra-backend-tinygrad-all: \
test-extra-backend-tinygrad-sd \
test-extra-backend-tinygrad-whisper
## insightface — face recognition.
##
## Face fixtures default to the sample images shipped in the
## deepinsight/insightface repository (MIT-licensed). For offline/local
## runs override with BACKEND_TEST_FACE_IMAGE_{1,2,3}_FILE pointing at
## local paths.
FACE_IMAGE_1_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
FACE_IMAGE_2_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
FACE_IMAGE_3_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/mask_white.jpg
## Host-side cache for the OpenCV Zoo face ONNX files used by the
## opencv e2e target. The backend image no longer bakes model weights —
## gallery installs bring them via `files:` — but the e2e suite drives
## LoadModel over gRPC directly without going through the gallery. We
## pre-download the ONNX files to a stable host path and pass absolute
## paths in BACKEND_TEST_OPTIONS; `make` skips the downloads when the
## SHA-256 already matches.
INSIGHTFACE_OPENCV_DIR := /tmp/localai-insightface-opencv-cache
INSIGHTFACE_OPENCV_YUNET_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
INSIGHTFACE_OPENCV_SFACE_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
INSIGHTFACE_OPENCV_YUNET_SHA := 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
INSIGHTFACE_OPENCV_SFACE_SHA := 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
## buffalo_sc (insightface) — pack zip + SHA-256 mirrors the gallery
## entry so the e2e target matches exactly what `local-ai models install
## insightface-buffalo-sc` would have fetched. Smallest insightface pack
## (~16MB) — keeps CI fast while still covering the insightface engine
## code path end-to-end.
INSIGHTFACE_BUFFALO_SC_DIR := /tmp/localai-insightface-buffalo-sc-cache
INSIGHTFACE_BUFFALO_SC_URL := https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
INSIGHTFACE_BUFFALO_SC_SHA := 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
.PHONY: insightface-opencv-models
insightface-opencv-models:
@mkdir -p $(INSIGHTFACE_OPENCV_DIR)
@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_YUNET_SHA)" ]; then \
echo "Fetching YuNet..."; \
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx $(INSIGHTFACE_OPENCV_YUNET_URL); \
echo "$(INSIGHTFACE_OPENCV_YUNET_SHA) $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx" | sha256sum -c; \
fi
@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/sface.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_SFACE_SHA)" ]; then \
echo "Fetching SFace..."; \
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/sface.onnx $(INSIGHTFACE_OPENCV_SFACE_URL); \
echo "$(INSIGHTFACE_OPENCV_SFACE_SHA) $(INSIGHTFACE_OPENCV_DIR)/sface.onnx" | sha256sum -c; \
fi
.PHONY: insightface-buffalo-sc-models
insightface-buffalo-sc-models:
@mkdir -p $(INSIGHTFACE_BUFFALO_SC_DIR)
@if [ "$$(sha256sum $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_BUFFALO_SC_SHA)" ]; then \
echo "Fetching buffalo_sc..."; \
curl -fsSL -o $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip $(INSIGHTFACE_BUFFALO_SC_URL); \
echo "$(INSIGHTFACE_BUFFALO_SC_SHA) $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip" | sha256sum -c; \
rm -f $(INSIGHTFACE_BUFFALO_SC_DIR)/*.onnx; \
fi
@if [ ! -f "$(INSIGHTFACE_BUFFALO_SC_DIR)/det_500m.onnx" ]; then \
echo "Extracting buffalo_sc..."; \
unzip -o -q $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip -d $(INSIGHTFACE_BUFFALO_SC_DIR); \
fi
## buffalo_sc — smallest insightface pack (SCRFD-500MF detector + MBF
## recognizer, ~16MB). Exercises the insightface engine code path
## (model_zoo-backed inference) without the ~326MB buffalo_l download.
## No age/gender/landmark heads — face_analyze is dropped from caps.
## The pack is pre-fetched on the host and passed as `root:<dir>` since
## the e2e suite drives LoadModel directly without going through
## LocalAI's gallery flow (which is what would normally populate
## ModelPath and in turn the engine's `_model_dir` option).
test-extra-backend-insightface-buffalo-sc: docker-build-insightface insightface-buffalo-sc-models
BACKEND_IMAGE=local-ai-backend:insightface \
BACKEND_TEST_MODEL_NAME=insightface-buffalo-sc \
BACKEND_TEST_OPTIONS=engine:insightface,model_pack:buffalo_sc,root:$(INSIGHTFACE_BUFFALO_SC_DIR) \
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
$(MAKE) test-extra-backend
## OpenCV Zoo YuNet + SFace — Apache 2.0, commercial-safe. face_analyze
## cap is dropped (SFace has no demographic head). The ONNX files are
## pre-fetched on the host via the insightface-opencv-models target and
## passed as absolute paths, since the e2e suite drives LoadModel
## directly without going through LocalAI's gallery flow.
test-extra-backend-insightface-opencv: docker-build-insightface insightface-opencv-models
BACKEND_IMAGE=local-ai-backend:insightface \
BACKEND_TEST_MODEL_NAME=insightface-opencv \
BACKEND_TEST_OPTIONS=engine:onnx_direct,detector_onnx:$(INSIGHTFACE_OPENCV_DIR)/yunet.onnx,recognizer_onnx:$(INSIGHTFACE_OPENCV_DIR)/sface.onnx \
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
$(MAKE) test-extra-backend
## Aggregate — runs both face-recognition model configurations so CI
## catches regressions across engines together.
test-extra-backend-insightface-all: \
test-extra-backend-insightface-buffalo-sc \
test-extra-backend-insightface-opencv
## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
## tool-call extraction via sglang's native qwen parser. CPU builds use
## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
@@ -748,6 +858,7 @@ BACKEND_OUTETTS = outetts|python|.|false|true
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
BACKEND_COQUI = coqui|python|.|false|true
BACKEND_RFDETR = rfdetr|python|.|false|true
BACKEND_INSIGHTFACE = insightface|python|.|false|true
BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
BACKEND_NEUTTS = neutts|python|.|false|true
BACKEND_KOKORO = kokoro|python|.|false|true
@@ -819,6 +930,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
$(eval $(call generate-docker-build-target,$(BACKEND_INSIGHTFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
@@ -853,7 +965,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface
########################################################
### Mock Backend for E2E Tests

View File

@@ -24,6 +24,8 @@ service Backend {
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
rpc Detect(DetectOptions) returns (DetectResponse) {}
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
rpc StoresSet(StoresSetOptions) returns (Result) {}
rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
@@ -475,6 +477,57 @@ message DetectResponse {
repeated Detection Detections = 1;
}
// --- Face recognition messages ---
message FacialArea {
float x = 1;
float y = 2;
float w = 3;
float h = 4;
}
message FaceVerifyRequest {
string img1 = 1; // base64-encoded image
string img2 = 2; // base64-encoded image
float threshold = 3; // cosine-distance threshold; 0 = use backend default
bool anti_spoofing = 4; // reserved for future MiniFASNet bolt-on
}
message FaceVerifyResponse {
bool verified = 1;
float distance = 2; // 1 - cosine_similarity
float threshold = 3;
float confidence = 4; // 0-100
string model = 5; // e.g. "buffalo_l"
FacialArea img1_area = 6;
FacialArea img2_area = 7;
float processing_time_ms = 8;
}
message FaceAnalyzeRequest {
string img = 1; // base64-encoded image
repeated string actions = 2; // subset of ["age","gender","emotion","race"]; empty = all-supported
bool anti_spoofing = 3;
}
message FaceAnalysis {
FacialArea region = 1;
float face_confidence = 2;
float age = 3;
string dominant_gender = 4; // "Man" | "Woman"
map<string, float> gender = 5;
string dominant_emotion = 6; // reserved; empty in MVP
map<string, float> emotion = 7;
string dominant_race = 8; // not populated
map<string, float> race = 9;
bool is_real = 10; // anti-spoofing result when enabled
float antispoof_score = 11;
}
message FaceAnalyzeResponse {
repeated FaceAnalysis faces = 1;
}
message ToolFormatMarkers {
string format_type = 1; // "json_native", "tag_with_json", "tag_with_tagged"

View File

@@ -168,6 +168,43 @@
nvidia-cuda-13: "cuda13-rfdetr"
nvidia-cuda-12: "cuda12-rfdetr"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-rfdetr"
- &insightface
name: "insightface"
alias: "insightface"
# Upstream insightface library is MIT. The pretrained model packs
# (buffalo_l, buffalo_s, antelopev2) are released for NON-COMMERCIAL
# research use only. The backend image also pre-bakes OpenCV Zoo
# YuNet + SFace (Apache 2.0) for commercial use. Pick the engine
# via model-gallery entries (insightface-buffalo-l / insightface-opencv
# / insightface-buffalo-s) or set `options` in your model YAML.
license: "mixed"
description: |
Face recognition backend powered by `insightface` (ONNX Runtime).
Provides face verification (/v1/face/verify), face analysis
(/v1/face/analyze), face embedding (/v1/embeddings), face
detection (/v1/detection), and 1:N identification
(/v1/face/{register,identify,forget}).
Ships two engines in a single image: one that drives the insightface
model packs (buffalo_l/s/m/sc, antelopev2 — non-commercial research
use only) and one that drives OpenCV Zoo's YuNet + SFace pair
(Apache 2.0 — commercial-safe). Select via `options: ["engine:..."]`
in your model YAML, or install one of the ready-made model-gallery
entries under the `insightface-*` prefix.
The backend image contains only code and Python deps; all model
weights are managed by LocalAI's gallery download mechanism.
urls:
- https://github.com/deepinsight/insightface
- https://github.com/opencv/opencv_zoo
tags:
- face-recognition
- face-verification
- face-embedding
- gpu
- cpu
capabilities:
default: "cpu-insightface"
nvidia: "cuda12-insightface"
nvidia-cuda-12: "cuda12-insightface"
- &sam3cpp
name: "sam3-cpp"
alias: "sam3-cpp"
@@ -3709,3 +3746,30 @@
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization
# insightface (face recognition) — development and concrete image entries
- !!merge <<: *insightface
name: "insightface-development"
capabilities:
default: "cpu-insightface-development"
nvidia: "cuda12-insightface-development"
nvidia-cuda-12: "cuda12-insightface-development"
- !!merge <<: *insightface
name: "cpu-insightface"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface"
mirrors:
- localai/localai-backends:latest-cpu-insightface
- !!merge <<: *insightface
name: "cuda12-insightface"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-insightface"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-insightface
- !!merge <<: *insightface
name: "cpu-insightface-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-insightface"
mirrors:
- localai/localai-backends:master-cpu-insightface
- !!merge <<: *insightface
name: "cuda12-insightface-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-insightface"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-insightface

View File

@@ -0,0 +1,13 @@
.DEFAULT_GOAL := install
.PHONY: install
install:
bash install.sh
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -0,0 +1,67 @@
# insightface backend (LocalAI)
Face recognition backend backed by ONNX Runtime. Provides face
verification (1:1), face analysis (age/gender), face detection, face
embedding, and — via LocalAI's built-in vector store — 1:N
identification.
## Engines
This backend ships with **two** interchangeable engines selected via
`LoadModel.Options["engine"]`:
| engine | Implementation | Models | License |
|---|---|---|---|
| `insightface` (default) | `insightface.app.FaceAnalysis` | `buffalo_l`, `buffalo_s`, `antelopev2` | **Non-commercial research use only** |
| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |
Both engines implement the same `FaceEngine` protocol in `engines.py`,
so the gRPC servicer in `backend.py` doesn't need to know which one is
active.
## LoadModel options
Common:
| option | default | description |
|---|---|---|
| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
| `det_thresh` | `0.5` | detector confidence threshold |
| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
`insightface` engine:
| option | default | description |
|---|---|---|
| `model_pack` | `buffalo_l` | which insightface pack to load |
`onnx_direct` engine:
| option | default | description |
|---|---|---|
| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
## Adding a new model pack
1. If it's an insightface pack (auto-downloadable or manually extracted
into `~/.insightface/models/<name>/`), just add a new gallery entry
in `backend/index.yaml` with `options: ["engine:insightface",
"model_pack:<name>"]`. No code change.
2. If it's an Apache-licensed ONNX pair, add a gallery entry with
`options: ["engine:onnx_direct", "detector_onnx:...",
"recognizer_onnx:..."]`. If the detector or recognizer has a
different input-tensor shape than YuNet/SFace, you may need a new
engine implementation in `engines.py`; the two-engine seam makes
that a self-contained change.
## Running tests locally
```bash
make -C backend/python/insightface # install deps + bake models
make -C backend/python/insightface test # run test.py
```
The OpenCV Zoo tests skip gracefully when `/models/opencv/*.onnx` is
absent (e.g. on dev boxes where `install.sh` wasn't run).

View File

@@ -0,0 +1,265 @@
#!/usr/bin/env python3
"""gRPC server for the insightface face recognition backend.
Implements Health / LoadModel / Status plus the face-specific methods:
Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
delegated to engines.py — this file is just the gRPC plumbing.
"""
import argparse
import base64
import os
import signal
import sys
import time
from concurrent import futures
from io import BytesIO
import backend_pb2
import backend_pb2_grpc
import cv2
import grpc
import numpy as np
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
from grpc_auth import get_auth_interceptors # noqa: E402
from engines import FaceEngine, build_engine # noqa: E402
_ONE_DAY = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
# Default cosine-distance threshold for "same person" on buffalo_l
# ArcFace R50. Clients can override per-request; clients using SFace
# should pass threshold≈0.4 since the distance distribution is wider.
DEFAULT_VERIFY_THRESHOLD = 0.35
def _decode_image(src: str) -> np.ndarray | None:
"""Decode a base64-encoded image into an OpenCV BGR numpy array."""
if not src:
return None
try:
data = base64.b64decode(src, validate=False)
except Exception:
return None
arr = np.frombuffer(data, dtype=np.uint8)
if arr.size == 0:
return None
img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
return img
def _parse_options(raw: list[str]) -> dict[str, str]:
out: dict[str, str] = {}
for entry in raw:
if ":" not in entry:
continue
k, v = entry.split(":", 1)
out[k.strip()] = v.strip()
return out
class BackendServicer(backend_pb2_grpc.BackendServicer):
def __init__(self) -> None:
self.engine: FaceEngine | None = None
self.engine_name: str = ""
self.model_name: str = ""
self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", "utf-8"))
def LoadModel(self, request, context):
options = _parse_options(list(request.Options))
# Surface LocalAI's models directory (ModelPath) so engines can
# anchor relative paths — OnnxDirectEngine's detector_onnx /
# recognizer_onnx point at gallery-managed files that LocalAI
# dropped there, and InsightFaceEngine auto-downloads its packs
# into that same directory alongside every other managed model.
# Private key to avoid clashing with user-provided options.
if request.ModelPath:
options["_model_dir"] = request.ModelPath
engine_name = options.get("engine", "insightface")
try:
self.engine = build_engine(engine_name)
self.engine.prepare(options)
except Exception as err: # pragma: no cover - exercised via e2e
return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
self.engine_name = engine_name
self.model_name = request.Model or options.get("model_pack", "")
if "verify_threshold" in options:
try:
self.verify_threshold = float(options["verify_threshold"])
except ValueError:
pass
print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
return backend_pb2.Result(success=True, message="Model loaded successfully")
def Status(self, request, context):
state = (
backend_pb2.StatusResponse.READY
if self.engine is not None
else backend_pb2.StatusResponse.UNINITIALIZED
)
return backend_pb2.StatusResponse(state=state)
def Embedding(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.EmbeddingResult()
if not request.Images:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("Embedding requires Images[0] to be a base64 image")
return backend_pb2.EmbeddingResult()
img = _decode_image(request.Images[0])
if img is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode image")
return backend_pb2.EmbeddingResult()
vec = self.engine.embed(img)
if vec is None:
context.set_code(grpc.StatusCode.NOT_FOUND)
context.set_details("no face detected")
return backend_pb2.EmbeddingResult()
return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
def Detect(self, request, context):
if self.engine is None:
return backend_pb2.DetectResponse()
img = _decode_image(request.src)
if img is None:
return backend_pb2.DetectResponse()
detections = []
for d in self.engine.detect(img):
x1, y1, x2, y2 = d.bbox
detections.append(
backend_pb2.Detection(
x=float(x1),
y=float(y1),
width=float(x2 - x1),
height=float(y2 - y1),
confidence=float(d.score),
class_name="face",
)
)
return backend_pb2.DetectResponse(Detections=detections)
def FaceVerify(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.FaceVerifyResponse()
img1 = _decode_image(request.img1)
img2 = _decode_image(request.img2)
if img1 is None or img2 is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode one or both images")
return backend_pb2.FaceVerifyResponse()
threshold = request.threshold if request.threshold > 0 else self.verify_threshold
start = time.time()
e1 = self.engine.embed(img1)
e2 = self.engine.embed(img2)
if e1 is None or e2 is None:
context.set_code(grpc.StatusCode.NOT_FOUND)
context.set_details("no face detected in one or both images")
return backend_pb2.FaceVerifyResponse()
# Both engines return L2-normalized vectors, so the dot product
# is the cosine similarity directly.
sim = float(np.dot(e1, e2))
distance = 1.0 - sim
verified = distance < threshold
confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
def _region(img) -> backend_pb2.FacialArea:
dets = self.engine.detect(img)
if not dets:
return backend_pb2.FacialArea()
best = max(dets, key=lambda d: d.score)
x1, y1, x2, y2 = best.bbox
return backend_pb2.FacialArea(x=x1, y=y1, w=x2 - x1, h=y2 - y1)
return backend_pb2.FaceVerifyResponse(
verified=verified,
distance=float(distance),
threshold=float(threshold),
confidence=float(confidence),
model=self.model_name or self.engine_name,
img1_area=_region(img1),
img2_area=_region(img2),
processing_time_ms=float((time.time() - start) * 1000.0),
)
def FaceAnalyze(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.FaceAnalyzeResponse()
img = _decode_image(request.img)
if img is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode image")
return backend_pb2.FaceAnalyzeResponse()
faces = []
for attrs in self.engine.analyze(img):
x, y, w, h = attrs.region
fa = backend_pb2.FaceAnalysis(
region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
face_confidence=float(attrs.face_confidence),
)
if attrs.age is not None:
fa.age = float(attrs.age)
if attrs.dominant_gender:
fa.dominant_gender = attrs.dominant_gender
for k, v in attrs.gender.items():
fa.gender[k] = float(v)
faces.append(fa)
return backend_pb2.FaceAnalyzeResponse(faces=faces)
def serve(address: str) -> None:
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
("grpc.max_message_length", 50 * 1024 * 1024),
("grpc.max_send_message_length", 50 * 1024 * 1024),
("grpc.max_receive_message_length", 50 * 1024 * 1024),
],
interceptors=get_auth_interceptors(),
)
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("[insightface] Server started. Listening on: " + address, file=sys.stderr)
def _stop(sig, frame): # pragma: no cover
print("[insightface] shutting down")
server.stop(0)
sys.exit(0)
signal.signal(signal.SIGINT, _stop)
signal.signal(signal.SIGTERM, _stop)
try:
while True:
time.sleep(_ONE_DAY)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
args = parser.parse_args()
print(f"[insightface] startup: {args}", file=sys.stderr)
serve(args.addr)

View File

@@ -0,0 +1,382 @@
"""Face recognition engine implementations for the LocalAI insightface backend.
Two engines are provided:
* InsightFaceEngine — wraps insightface.app.FaceAnalysis. Supports
buffalo_l / buffalo_s / antelopev2 model packs
with SCRFD detector + ArcFace recognizer +
genderage head. NON-COMMERCIAL research use
only (upstream license).
* OnnxDirectEngine — loads detector + recognizer ONNX files directly
via onnxruntime. Used for OpenCV Zoo models
(YuNet + SFace) and any future Apache-licensed
model set. Does not support analyze().
Both engines expose the same interface so the gRPC servicer (backend.py)
can dispatch without knowing which one is active.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Protocol
import cv2
import numpy as np
@dataclass
class FaceDetection:
bbox: tuple[float, float, float, float] # x1, y1, x2, y2
score: float
landmarks: np.ndarray | None = None # 5x2 keypoints when available
@dataclass
class FaceAttributes:
region: tuple[float, float, float, float] # x, y, w, h
face_confidence: float
age: float | None = None
dominant_gender: str | None = None
gender: dict[str, float] = field(default_factory=dict)
class FaceEngine(Protocol):
"""Minimal interface every engine must implement."""
def prepare(self, options: dict[str, str]) -> None: ...
def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
def embed(self, img: np.ndarray) -> np.ndarray | None: ...
def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
# ─── InsightFaceEngine ────────────────────────────────────────────────
class InsightFaceEngine:
"""Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
build a `{taskname: model}` dict, then loop per-face at inference).
We reimplement the same loop here so we can:
1. Load packs from whatever directory LocalAI's gallery extracted
them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
without needing a specific layout on disk.
2. Skip insightface's built-in auto-download entirely: weight
delivery is LocalAI's gallery `files:` job now, checksum-
verified and cached alongside every other managed model.
The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
Landmark) stay in insightface — we only reimplement the ~50 lines
of glue around them.
"""
def __init__(self) -> None:
self.models: dict[str, Any] = {}
self.det_model: Any = None
self.model_pack: str = "buffalo_l"
self.det_size: tuple[int, int] = (640, 640)
self.det_thresh: float = 0.5
self._providers: list[str] = ["CPUExecutionProvider"]
def prepare(self, options: dict[str, str]) -> None:
import glob
import os
from insightface.model_zoo import model_zoo
self.model_pack = options.get("model_pack", "buffalo_l")
self.det_size = _parse_det_size(options.get("det_size", "640x640"))
self.det_thresh = float(options.get("det_thresh", "0.5"))
pack_dir = _locate_insightface_pack(options, self.model_pack)
if pack_dir is None:
raise ValueError(
f"no insightface pack '{self.model_pack}' found — install via "
f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
)
onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
if not onnx_files:
raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
# CUDAExecutionProvider is picked automatically by onnxruntime-gpu
# when available; falling back to CPU keeps the CPU-only image
# working. ctx_id=0 means "first GPU if any, else CPU".
self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
self.models = {}
for onnx_file in onnx_files:
m = model_zoo.get_model(onnx_file, providers=self._providers)
if m is None:
continue
# First occurrence of each taskname wins (matches FaceAnalysis).
if m.taskname not in self.models:
self.models[m.taskname] = m
if "detection" not in self.models:
raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
self.det_model = self.models["detection"]
self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
for name, m in self.models.items():
if name != "detection":
m.prepare(0)
def _faces(self, img: np.ndarray) -> list[Any]:
"""Run detection + all non-detection models per face."""
if self.det_model is None:
return []
from insightface.app.common import Face
bboxes, kpss = self.det_model.detect(img, max_num=0)
if bboxes is None or bboxes.shape[0] == 0:
return []
faces: list[Any] = []
for i in range(bboxes.shape[0]):
bbox = bboxes[i, 0:4]
det_score = bboxes[i, 4]
kps = kpss[i] if kpss is not None else None
face = Face(bbox=bbox, kps=kps, det_score=det_score)
for name, m in self.models.items():
if name == "detection":
continue
m.get(img, face)
faces.append(face)
return faces
def detect(self, img: np.ndarray) -> list[FaceDetection]:
return [
FaceDetection(
bbox=tuple(float(v) for v in f.bbox),
score=float(f.det_score),
landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
)
for f in self._faces(img)
]
def embed(self, img: np.ndarray) -> np.ndarray | None:
faces = self._faces(img)
if not faces:
return None
best = max(faces, key=lambda f: float(f.det_score))
if getattr(best, "normed_embedding", None) is None:
return None
return np.asarray(best.normed_embedding, dtype=np.float32)
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
out: list[FaceAttributes] = []
for f in self._faces(img):
x1, y1, x2, y2 = (float(v) for v in f.bbox)
region = (x1, y1, x2 - x1, y2 - y1)
attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
age = getattr(f, "age", None)
if age is not None:
attrs.age = float(age)
gender = getattr(f, "gender", None)
if gender is not None:
# genderage head emits argmax, not probabilities —
# one-hot dict keeps the API stable.
attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
attrs.gender = {
"Man": 1.0 if int(gender) == 1 else 0.0,
"Woman": 0.0 if int(gender) == 1 else 1.0,
}
out.append(attrs)
return out
# ─── OnnxDirectEngine ─────────────────────────────────────────────────
class OnnxDirectEngine:
"""Loads detector + recognizer ONNX files directly.
Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
exposes a C++-level API via cv2.FaceDetectorYN which accepts the
ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
Both are Apache 2.0 licensed.
"""
def __init__(self) -> None:
self.detector_path: str = ""
self.recognizer_path: str = ""
self.input_size: tuple[int, int] = (320, 320)
self.det_thresh: float = 0.5
self._detector: Any = None
self._recognizer: Any = None
def prepare(self, options: dict[str, str]) -> None:
raw_det = options.get("detector_onnx", "")
raw_rec = options.get("recognizer_onnx", "")
if not raw_det or not raw_rec:
raise ValueError(
"onnx_direct engine requires both detector_onnx and recognizer_onnx options"
)
model_dir = options.get("_model_dir")
self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
self.input_size = _parse_det_size(options.get("det_size", "320x320"))
self.det_thresh = float(options.get("det_thresh", "0.5"))
# YuNet is a fixed-size detector; size is reset per detect() call to
# match the input frame.
self._detector = cv2.FaceDetectorYN.create(
self.detector_path,
"",
self.input_size,
score_threshold=self.det_thresh,
nms_threshold=0.3,
top_k=5000,
)
self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
def detect(self, img: np.ndarray) -> list[FaceDetection]:
if self._detector is None:
return []
h, w = img.shape[:2]
self._detector.setInputSize((w, h))
retval, faces = self._detector.detect(img)
if faces is None:
return []
out: list[FaceDetection] = []
for row in faces:
x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
# Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
score = float(row[-1])
out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
return out
def embed(self, img: np.ndarray) -> np.ndarray | None:
if self._detector is None or self._recognizer is None:
return None
h, w = img.shape[:2]
self._detector.setInputSize((w, h))
retval, faces = self._detector.detect(img)
if faces is None or len(faces) == 0:
return None
# Pick the highest-score face (last column is score).
best = max(faces, key=lambda r: float(r[-1]))
aligned = self._recognizer.alignCrop(img, best)
feat = self._recognizer.feature(aligned)
vec = np.asarray(feat, dtype=np.float32).flatten()
# SFace outputs a 128-dim feature; L2-normalize to make dot-product
# comparable to buffalo_l's already-normed 512-dim embedding.
norm = float(np.linalg.norm(vec))
if norm == 0:
return None
return vec / norm
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
# OpenCV Zoo does not ship a demographic classifier; report
# only the face-detection regions so callers can still see
# how many faces were detected.
return [
FaceAttributes(
region=(
d.bbox[0],
d.bbox[1],
d.bbox[2] - d.bbox[0],
d.bbox[3] - d.bbox[1],
),
face_confidence=d.score,
)
for d in self.detect(img)
]
# ─── helpers ──────────────────────────────────────────────────────────
def _parse_det_size(raw: str) -> tuple[int, int]:
raw = raw.strip().lower().replace(" ", "")
if "x" in raw:
w, h = raw.split("x", 1)
return (int(w), int(h))
n = int(raw)
return (n, n)
def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
"""Find the directory holding the insightface pack's ONNX files.
LocalAI's gallery `files:` extracts the pack zip straight into the
models directory. Upstream packs are inconsistent:
buffalo_l/s/sc — flat zip, ONNX lands at `<models_dir>/*.onnx`
buffalo_m, antelopev2 — wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`
We search, in order:
1. `<models_dir>/<name>/` — wrapped-zip layout, or insightface's
own FaceAnalysis-style `<root>/models/<name>/` layout.
2. `<models_dir>/models/<name>/` — insightface's FaceAnalysis
auto-download lands here (handy for dev environments that
still have old `~/.insightface` caches).
3. `<models_dir>/` — flat-zip layout directly in models dir.
Returns the first directory whose contents include `*.onnx`.
"""
import glob
import os
model_dir = options.get("_model_dir") or ""
explicit_root = options.get("root")
candidates: list[str] = []
if model_dir:
candidates.append(os.path.join(model_dir, name))
candidates.append(os.path.join(model_dir, "models", name))
candidates.append(model_dir)
if explicit_root:
expanded = os.path.expanduser(explicit_root)
candidates.append(os.path.join(expanded, "models", name))
candidates.append(os.path.join(expanded, name))
candidates.append(expanded)
for c in candidates:
if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
return c
return None
def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
"""Resolve an ONNX file path across the paths LocalAI might deliver it from.
Search order:
1. The path itself if it already resolves (absolute, or relative to CWD).
2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`) —
this is how LocalAI surfaces gallery-managed files. When the gallery
entry lists `files:`, each one lands under the models directory and
backends load them via filename anchored by ModelFile.
3. `<script_dir>/<path-without-leading-slash>` — covers dev layouts
where someone manually dropped weights inside the backend dir.
If none hit, return the literal input so cv2/insightface surfaces a
clearer error naming the actually-attempted path.
"""
import os
if os.path.isfile(path):
return path
stripped = path.lstrip("/")
candidates: list[str] = []
if model_dir:
candidates.append(os.path.join(model_dir, os.path.basename(path)))
candidates.append(os.path.join(model_dir, stripped))
script_dir = os.path.dirname(os.path.abspath(__file__))
candidates.append(os.path.join(script_dir, stripped))
for c in candidates:
if os.path.isfile(c):
return c
return path
def build_engine(name: str) -> FaceEngine:
"""Factory for the engine selected by LoadModel options."""
key = name.strip().lower()
if key in ("", "insightface"):
return InsightFaceEngine()
if key in ("onnx_direct", "onnx-direct", "opencv"):
return OnnxDirectEngine()
raise ValueError(f"unknown engine: {name!r}")

View File

@@ -0,0 +1,28 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
installRequirements
# We deliberately do NOT pre-bake any model weights here. Two reasons:
#
# 1. Weights should follow LocalAI's gallery-managed download flow
# like every other backend. For OpenCV Zoo (YuNet + SFace) the
# gallery entries in gallery/index.yaml list the ONNX files via
# `files:` with URI + SHA-256 — LocalAI fetches them into the
# models directory on `local-ai models install`.
#
# 2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
# buffalo_sc, antelopev2), upstream distributes zip archives
# only (no individual ONNX URLs). We rely on insightface's own
# auto-download machinery (`FaceAnalysis(name=<pack>, root=<dir>)`)
# at first LoadModel, pointed at a writable directory. This
# matches how rfdetr behaves (uses `inference.get_model()`).
#
# Net effect: the backend image ships only Python deps (~150MB CPU).

View File

@@ -0,0 +1,7 @@
insightface
onnxruntime
opencv-python-headless
numpy
onnx
cython
scikit-image

View File

@@ -0,0 +1,7 @@
insightface
onnxruntime-gpu
opencv-python-headless
numpy
onnx
cython
scikit-image

View File

@@ -0,0 +1,3 @@
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -0,0 +1,9 @@
#!/bin/bash
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
startBackend $@

View File

@@ -0,0 +1,264 @@
#!/usr/bin/env python3
"""Smoke-test every face recognition model configuration shipped in the
gallery. Simulates what LocalAI does at runtime: for each config, sets
up a models directory, fetches any required files via URL (as the
gallery's `files:` list would), then loads + detects + embeds via the
in-process BackendServicer — matching the gRPC surface end users hit.
Run inside the built backend image (venv already has insightface /
onnxruntime / opencv-python-headless):
python smoke.py
Network is required for the insightface packs (fetched via upstream's
FaceAnalysis auto-download at first LoadModel) and for downloading
the OpenCV Zoo ONNX files on first run.
"""
from __future__ import annotations
import base64
import hashlib
import os
import sys
import traceback
import urllib.request
import cv2
import numpy as np
sys.path.insert(0, os.path.dirname(__file__))
import backend_pb2 # noqa: E402
from backend import BackendServicer # noqa: E402
# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
OPENCV_FILES = {
"fp32": [
(
"face_detection_yunet_2023mar.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
"8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
),
(
"face_recognition_sface_2021dec.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
"0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
),
],
"int8": [
(
"face_detection_yunet_2023mar_int8.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
"321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
),
(
"face_recognition_sface_2021dec_int8.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
"2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
),
],
}
CONFIGS = [
{
"name": "insightface-buffalo-l",
"options": ["engine:insightface", "model_pack:buffalo_l"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-sc",
"options": ["engine:insightface", "model_pack:buffalo_sc"],
# buffalo_sc has recognizer only — no landmarks, no genderage.
"has_analyze": False,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-s",
"options": ["engine:insightface", "model_pack:buffalo_s"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-m",
"options": ["engine:insightface", "model_pack:buffalo_m"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-antelopev2",
"options": ["engine:insightface", "model_pack:antelopev2"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-opencv",
"options": [
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar.onnx",
"recognizer_onnx:face_recognition_sface_2021dec.onnx",
],
"has_analyze": False,
"needs_opencv_files": "fp32",
},
{
"name": "insightface-opencv-int8",
"options": [
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar_int8.onnx",
"recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
],
"has_analyze": False,
"needs_opencv_files": "int8",
},
]
class _FakeContext:
def __init__(self) -> None:
self.code = None
self.details = None
def set_code(self, code):
self.code = code
def set_details(self, details):
self.details = details
def _encode_image(img: np.ndarray) -> str:
_, buf = cv2.imencode(".jpg", img)
return base64.b64encode(buf.tobytes()).decode("ascii")
def _load_sample_image() -> str:
from insightface.data import get_image as ins_get_image
return _encode_image(ins_get_image("t1"))
def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
dest = os.path.join(model_dir, filename)
if os.path.isfile(dest):
h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
if h == sha256:
return
sys.stderr.write(f" fetching {filename} from {uri}\n")
sys.stderr.flush()
urllib.request.urlretrieve(uri, dest)
h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
if h != sha256:
raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
# Mirror LocalAI's gallery flow: populate model_dir with the
# gallery's listed files before calling LoadModel.
if cfg["needs_opencv_files"]:
for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
_download_if_missing(model_dir, filename, uri, sha256)
svc = BackendServicer()
ctx = _FakeContext()
load_res = svc.LoadModel(
backend_pb2.ModelOptions(
Model=cfg["name"],
Options=cfg["options"],
# ModelPath is what the Go loader sets to ml.ModelPath —
# LocalAI's models directory. The backend anchors relative
# paths and insightface auto-download root here.
ModelPath=model_dir,
),
ctx,
)
if not load_res.success:
return False, f"LoadModel: {load_res.message}"
det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
if len(det_res.Detections) == 0:
return False, "Detect returned no faces"
for d in det_res.Detections:
if d.class_name != "face":
return False, f"Detect returned class_name={d.class_name!r}"
emb_ctx = _FakeContext()
emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
if emb_ctx.code is not None:
return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
if len(emb_res.embeddings) == 0:
return False, "Embedding returned empty vector"
norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
if not (0.8 <= norm_sq <= 1.2):
return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
ver_ctx = _FakeContext()
ver_res = svc.FaceVerify(
backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
)
if ver_ctx.code is not None:
return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
if not ver_res.verified:
return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
if ver_res.distance > 0.1:
return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
if cfg["has_analyze"]:
an_ctx = _FakeContext()
an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
if an_ctx.code is not None:
return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
if len(an_res.faces) == 0:
return False, "FaceAnalyze returned no faces"
f0 = an_res.faces[0]
if f0.age <= 0:
return False, f"FaceAnalyze age not populated (age={f0.age})"
if f0.dominant_gender not in ("Man", "Woman"):
return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
n_dets = len(det_res.Detections)
dim = len(emb_res.embeddings)
return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
def main() -> int:
# Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
# default to a fresh temp dir.
model_dir = os.environ.get("LOCALAI_MODELS_PATH")
if not model_dir:
import tempfile
model_dir = tempfile.mkdtemp(prefix="face-smoke-")
os.makedirs(model_dir, exist_ok=True)
print(f"model_dir={model_dir}", file=sys.stderr)
print("Preparing sample image from insightface.data...", file=sys.stderr)
img_b64 = _load_sample_image()
results: list[tuple[str, bool, str]] = []
for cfg in CONFIGS:
sys.stderr.write(f"\n=== {cfg['name']} ===\n")
sys.stderr.flush()
try:
ok, detail = _run_one(cfg, img_b64, model_dir)
except Exception:
ok, detail = False, traceback.format_exc().splitlines()[-1]
results.append((cfg["name"], ok, detail))
print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s} {detail}")
sys.stdout.flush()
print("\n=== summary ===")
passed = sum(1 for _, ok, _ in results if ok)
total = len(results)
for name, ok, detail in results:
mark = "" if ok else ""
print(f" {mark} {name:30s} {detail}")
print(f"\n{passed}/{total} passed")
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,234 @@
"""Unit tests for the insightface gRPC backend.
The servicer is instantiated in-process (no gRPC channel) and driven
directly. Images come from insightface.data which ships with the pip
package — no external downloads.
Tests are parametrized over both engines (InsightFaceEngine and
OnnxDirectEngine) where applicable.
"""
from __future__ import annotations
import base64
import os
import sys
import unittest
import cv2
import numpy as np
sys.path.insert(0, os.path.dirname(__file__))
import backend_pb2 # noqa: E402
from backend import BackendServicer # noqa: E402
# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
# to mirror LocalAI's gallery `files:` flow (the backend image itself
# doesn't ship model weights).
OPENCV_FILES = [
(
"face_detection_yunet_2023mar.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
"8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
),
(
"face_recognition_sface_2021dec.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
"0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
),
]
def _encode(img: np.ndarray) -> str:
_, buf = cv2.imencode(".jpg", img)
return base64.b64encode(buf.tobytes()).decode("ascii")
def _load_insightface_samples() -> dict[str, str]:
"""Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.
t1 is a group photo, t2 a different one. We reuse both as
stand-ins for "Alice photo 1/2" and "Bob".
"""
from insightface.data import get_image as ins_get_image
return {
"t1": _encode(ins_get_image("t1")),
"t2": _encode(ins_get_image("t2")),
}
class _FakeContext:
"""Minimal stand-in for grpc.ServicerContext."""
def __init__(self) -> None:
self.code = None
self.details = None
def set_code(self, code):
self.code = code
def set_details(self, details):
self.details = details
class _Harness:
def __init__(self, servicer: BackendServicer) -> None:
self.svc = servicer
def health(self):
return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())
def load(self, options: list[str], model_path: str = ""):
return self.svc.LoadModel(
backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
_FakeContext(),
)
def detect(self, img_b64: str):
return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
def embed(self, img_b64: str):
ctx = _FakeContext()
res = self.svc.Embedding(
backend_pb2.PredictOptions(Images=[img_b64]),
ctx,
)
return res, ctx
def verify(self, a: str, b: str, threshold: float = 0.0):
return self.svc.FaceVerify(
backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
_FakeContext(),
)
def analyze(self, img_b64: str):
return self.svc.FaceAnalyze(
backend_pb2.FaceAnalyzeRequest(img=img_b64),
_FakeContext(),
)
class InsightFaceEngineTest(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.samples = _load_insightface_samples()
cls.harness = _Harness(BackendServicer())
load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
if not load.success:
raise unittest.SkipTest(f"LoadModel failed: {load.message}")
def test_health(self):
self.assertEqual(self.harness.health().message, b"OK")
def test_detect_finds_face(self):
res = self.harness.detect(self.samples["t1"])
self.assertGreater(len(res.Detections), 0)
for d in res.Detections:
self.assertEqual(d.class_name, "face")
self.assertGreater(d.width, 0)
self.assertGreater(d.height, 0)
def test_embedding_is_l2_normed(self):
res, ctx = self.harness.embed(self.samples["t1"])
self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
self.assertEqual(len(res.embeddings), 512)
norm_sq = sum(x * x for x in res.embeddings)
self.assertAlmostEqual(norm_sq, 1.0, places=2)
def test_verify_same_image(self):
res = self.harness.verify(self.samples["t1"], self.samples["t1"])
self.assertTrue(res.verified)
self.assertLess(res.distance, 0.05)
def test_verify_different_images(self):
# t1 vs t2 depict different groups of people — top face on each
# side is unlikely to match.
res = self.harness.verify(self.samples["t1"], self.samples["t2"])
# We assert only that some numerical answer came back; the
# matches-or-not determination depends on which face each side
# picked and isn't a stable test assertion.
self.assertGreaterEqual(res.distance, 0.0)
def test_analyze_has_age_and_gender(self):
res = self.harness.analyze(self.samples["t1"])
self.assertGreater(len(res.faces), 0)
for face in res.faces:
self.assertGreater(face.face_confidence, 0.0)
# Age should be populated for buffalo_l.
self.assertGreater(face.age, 0.0)
self.assertIn(face.dominant_gender, ("Man", "Woman"))
def _prepare_opencv_models_dir() -> str | None:
"""Download OpenCV Zoo face ONNX files into a temp dir the way
LocalAI's gallery would. Returns the directory, or None if
downloads failed (network-restricted sandbox).
"""
import hashlib
import tempfile
import urllib.request
root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(
prefix="opencv-face-"
)
for filename, uri, sha256 in OPENCV_FILES:
dest = os.path.join(root, filename)
if os.path.isfile(dest):
if hashlib.sha256(open(dest, "rb").read()).hexdigest() == sha256:
continue
try:
urllib.request.urlretrieve(uri, dest)
except Exception:
return None
if hashlib.sha256(open(dest, "rb").read()).hexdigest() != sha256:
return None
return root
class OnnxDirectEngineTest(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.samples = _load_insightface_samples()
cls.model_dir = _prepare_opencv_models_dir()
if cls.model_dir is None:
raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
cls.harness = _Harness(BackendServicer())
load = cls.harness.load(
[
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar.onnx",
"recognizer_onnx:face_recognition_sface_2021dec.onnx",
],
model_path=cls.model_dir,
)
if not load.success:
raise unittest.SkipTest(f"LoadModel failed: {load.message}")
def test_detect_finds_face(self):
res = self.harness.detect(self.samples["t1"])
self.assertGreater(len(res.Detections), 0)
for d in res.Detections:
self.assertEqual(d.class_name, "face")
def test_embedding_nonempty(self):
res, ctx = self.harness.embed(self.samples["t1"])
self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
self.assertGreater(len(res.embeddings), 0)
def test_verify_same_image(self):
res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
self.assertTrue(res.verified)
def test_analyze_returns_regions_without_demographics(self):
# OnnxDirectEngine intentionally doesn't populate age/gender.
res = self.harness.analyze(self.samples["t1"])
self.assertGreater(len(res.faces), 0)
for face in res.faces:
self.assertEqual(face.dominant_gender, "")
self.assertEqual(face.age, 0.0)
if __name__ == "__main__":
unittest.main()

View File

@@ -0,0 +1,11 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
runUnittests

View File

@@ -7,17 +7,28 @@ import (
"sync/atomic"
"time"
corebackend "github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/services/agentpool"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/templates"
pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
"gorm.io/gorm"
)
// faceEmbeddingDim is the expected dimension for face embeddings.
// Set to 0 so the Registry accepts whatever dim the loaded recognizer
// produces — ArcFace R50 is 512-d, MBF is 512-d, SFace is 128-d, and
// the insightface backend can load any of them via LoadModel options.
// Locking this to a specific value would force a single recognizer
// family per deployment; we keep the door open instead.
const faceEmbeddingDim = 0
type Application struct {
backendLoader *config.ModelConfigLoader
modelLoader *model.ModelLoader
@@ -27,6 +38,7 @@ type Application struct {
galleryService *galleryop.GalleryService
agentJobService *agentpool.AgentJobService
agentPoolService atomic.Pointer[agentpool.AgentPoolService]
faceRegistry facerecognition.Registry
authDB *gorm.DB
watchdogMutex sync.Mutex
watchdogStop chan bool
@@ -50,12 +62,23 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
mcpTools.CloseMCPSessions(modelName)
})
return &Application{
app := &Application{
backendLoader: config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
modelLoader: ml,
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
}
// Face-recognition registry backed by LocalAI's built-in vector store.
// The resolver closes over the ModelLoader so the Registry stays
// decoupled from loader plumbing; swapping in a postgres-backed
// implementation later is a single construction change here.
faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
return corebackend.StoreBackend(ml, appConfig, storeName, "")
}
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
return app
}
func (a *Application) ModelConfigLoader() *config.ModelConfigLoader {
@@ -99,6 +122,14 @@ func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
return a.agentPoolService.Load()
}
// FaceRegistry returns the face-recognition registry used for 1:N
// identification. The current implementation is backed by the
// in-memory local-store backend; see core/services/facerecognition
// for the interface and the postgres TODO.
func (a *Application) FaceRegistry() facerecognition.Registry {
return a.faceRegistry
}
// AuthDB returns the auth database connection, or nil if auth is not enabled.
func (a *Application) AuthDB() *gorm.DB {
return a.authDB

View File

@@ -0,0 +1,60 @@
package backend
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
"github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
)
func FaceAnalyze(
img string,
actions []string,
antiSpoofing bool,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) (*proto.FaceAnalyzeResponse, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
startTime = time.Now()
}
res, err := faceModel.FaceAnalyze(context.Background(), &proto.FaceAnalyzeRequest{
Img: img,
Actions: actions,
AntiSpoofing: antiSpoofing,
})
if appConfig.EnableTracing {
errStr := ""
if err != nil {
errStr = err.Error()
}
trace.RecordBackendTrace(trace.BackendTrace{
Timestamp: startTime,
Duration: time.Since(startTime),
Type: trace.BackendTraceFaceAnalyze,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Error: errStr,
})
}
return res, err
}

View File

@@ -0,0 +1,43 @@
package backend
import (
"context"
"fmt"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/pkg/model"
)
// FaceEmbed loads the face recognition backend and returns a 512-d
// face embedding for the base64-encoded image. Unlike ModelEmbedding
// it passes the image through PredictOptions.Images — the insightface
// backend picks the highest-confidence face and returns its
// L2-normalized embedding.
func FaceEmbed(
imgBase64 string,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) ([]float32, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
predictOpts := gRPCPredictOpts(modelConfig, loader.ModelPath)
predictOpts.Images = []string{imgBase64}
res, err := faceModel.Embeddings(context.Background(), predictOpts)
if err != nil {
return nil, err
}
if len(res.Embeddings) == 0 {
return nil, fmt.Errorf("face embedding returned empty vector (no face detected?)")
}
return res.Embeddings, nil
}

View File

@@ -0,0 +1,61 @@
package backend
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
"github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
)
func FaceVerify(
img1, img2 string,
threshold float32,
antiSpoofing bool,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) (*proto.FaceVerifyResponse, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
startTime = time.Now()
}
res, err := faceModel.FaceVerify(context.Background(), &proto.FaceVerifyRequest{
Img1: img1,
Img2: img2,
Threshold: threshold,
AntiSpoofing: antiSpoofing,
})
if appConfig.EnableTracing {
errStr := ""
if err != nil {
errStr = err.Error()
}
trace.RecordBackendTrace(trace.BackendTrace{
Timestamp: startTime,
Duration: time.Since(startTime),
Type: trace.BackendTraceFaceVerify,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Error: errStr,
})
}
return res, err
}

View File

@@ -588,6 +588,7 @@ const (
FLAG_VAD ModelConfigUsecase = 0b010000000000
FLAG_VIDEO ModelConfigUsecase = 0b100000000000
FLAG_DETECTION ModelConfigUsecase = 0b1000000000000
FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000
// Common Subsets
FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@@ -611,6 +612,7 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
"FLAG_LLM": FLAG_LLM,
"FLAG_VIDEO": FLAG_VIDEO,
"FLAG_DETECTION": FLAG_DETECTION,
"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
}
}
@@ -651,7 +653,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
nonTextGenBackends := []string{
"whisper", "piper", "kokoro",
"diffusers", "stablediffusion", "stablediffusion-ggml",
"rerankers", "silero-vad", "rfdetr",
"rerankers", "silero-vad", "rfdetr", "insightface",
"transformers-musicgen", "ace-step", "acestep-cpp",
}
@@ -728,12 +730,19 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
}
if (u & FLAG_DETECTION) == FLAG_DETECTION {
detectionBackends := []string{"rfdetr", "sam3-cpp"}
detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
if !slices.Contains(detectionBackends, c.Backend) {
return false
}
}
if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
faceBackends := []string{"insightface"}
if !slices.Contains(faceBackends, c.Backend) {
return false
}
}
if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
if !slices.Contains(soundGenBackends, c.Backend) {

View File

@@ -57,6 +57,14 @@ var RouteFeatureRegistry = []RouteFeature{
// Detection
{"POST", "/v1/detection", FeatureDetection},
// Face recognition
{"POST", "/v1/face/verify", FeatureFaceRecognition},
{"POST", "/v1/face/analyze", FeatureFaceRecognition},
{"POST", "/v1/face/embed", FeatureFaceRecognition},
{"POST", "/v1/face/register", FeatureFaceRecognition},
{"POST", "/v1/face/identify", FeatureFaceRecognition},
{"POST", "/v1/face/forget", FeatureFaceRecognition},
// Video
{"POST", "/video", FeatureVideo},
@@ -151,5 +159,6 @@ func APIFeatureMetas() []FeatureMeta {
{FeatureTokenize, "Tokenize", true},
{FeatureMCP, "MCP", true},
{FeatureStores, "Stores", true},
{FeatureFaceRecognition, "Face Recognition", true},
}
}

View File

@@ -51,6 +51,7 @@ const (
FeatureTokenize = "tokenize"
FeatureMCP = "mcp"
FeatureStores = "stores"
FeatureFaceRecognition = "face_recognition"
)
// AgentFeatures lists agent-related features (default OFF).
@@ -64,6 +65,7 @@ var APIFeatures = []string{
FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings, FeatureSound,
FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP, FeatureStores,
FeatureFaceRecognition,
}
// AllFeatures lists all known features (used by UI and validation).

View File

@@ -73,6 +73,12 @@ var instructionDefs = []instructionDef{
Description: "Video generation from text prompts",
Tags: []string{"video"},
},
{
Name: "face-recognition",
Description: "Face verification (1:1), identification (1:N), embedding, and demographic analysis",
Tags: []string{"face-recognition"},
Intro: "The /v1/face/register, /identify, and /forget endpoints build on a vector store — registrations are in-memory by default and lost on restart. Use /v1/face/embed for a raw embedding; /v1/embeddings is OpenAI-compatible and text-only.",
},
}
// swaggerState holds parsed swagger spec data, initialised once.

View File

@@ -39,7 +39,7 @@ var _ = Describe("API Instructions Endpoints", func() {
instructions, ok := resp["instructions"].([]any)
Expect(ok).To(BeTrue())
Expect(instructions).To(HaveLen(9))
Expect(instructions).To(HaveLen(10))
// Verify each instruction has required fields and correct URL format
for _, s := range instructions {
@@ -73,6 +73,7 @@ var _ = Describe("API Instructions Endpoints", func() {
"model-management",
"monitoring",
"agents",
"face-recognition",
))
})
})

View File

@@ -9,7 +9,6 @@ import (
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/mudler/xlog"
)
@@ -34,14 +33,14 @@ func DetectionEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC
xlog.Debug("Detection", "image", input.Image, "modelFile", "modelFile", "backend", cfg.Backend)
image, err := utils.GetContentURIAsBase64(input.Image)
image, err := decodeImageInput(input.Image)
if err != nil {
return err
}
res, err := backend.Detection(image, input.Prompt, input.Points, input.Boxes, input.Threshold, ml, appConfig, *cfg)
if err != nil {
return err
return mapBackendError(err)
}
response := schema.DetectionResponse{

View File

@@ -0,0 +1,69 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceAnalyzeEndpoint returns demographic attributes for faces in an image.
// @Summary Analyze demographic attributes (age, gender, ...) of faces.
// @Tags face-recognition
// @Param request body schema.FaceAnalyzeRequest true "query params"
// @Success 200 {object} schema.FaceAnalyzeResponse "Response"
// @Router /v1/face/analyze [post]
func FaceAnalyzeEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceAnalyzeRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceAnalyze", "model", cfg.Name, "backend", cfg.Backend, "actions", input.Actions)
res, err := backend.FaceAnalyze(img, input.Actions, input.AntiSpoofing, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
response := schema.FaceAnalyzeResponse{
Faces: make([]schema.FaceAnalysis, len(res.GetFaces())),
}
for i, f := range res.GetFaces() {
response.Faces[i] = schema.FaceAnalysis{
Region: schema.FacialArea{
X: f.GetRegion().GetX(),
Y: f.GetRegion().GetY(),
W: f.GetRegion().GetW(),
H: f.GetRegion().GetH(),
},
FaceConfidence: f.GetFaceConfidence(),
Age: f.GetAge(),
DominantGender: f.GetDominantGender(),
Gender: f.GetGender(),
DominantEmotion: f.GetDominantEmotion(),
Emotion: f.GetEmotion(),
DominantRace: f.GetDominantRace(),
Race: f.GetRace(),
IsReal: f.GetIsReal(),
AntispoofScore: f.GetAntispoofScore(),
}
}
return c.JSON(http.StatusOK, response)
}
}

View File

@@ -0,0 +1,54 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceEmbedEndpoint extracts a face embedding vector from an image.
//
// Distinct from /v1/embeddings, which is OpenAI-compatible and text-only
// by contract (its `input` field is a string or string list of TEXT to
// embed). Passing an image data-URI to /v1/embeddings does not work —
// use this endpoint instead.
//
// @Summary Extract a face embedding from an image.
// @Tags face-recognition
// @Param request body schema.FaceEmbedRequest true "query params"
// @Success 200 {object} schema.FaceEmbedResponse "Response"
// @Router /v1/face/embed [post]
func FaceEmbedEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceEmbedRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceEmbed", "model", cfg.Name, "backend", cfg.Backend)
vec, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
return c.JSON(http.StatusOK, schema.FaceEmbedResponse{
Embedding: vec,
Dim: len(vec),
Model: cfg.Name,
})
}
}

View File

@@ -0,0 +1,45 @@
package localai
import (
"errors"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/xlog"
)
// FaceForgetEndpoint removes a previously-registered face by ID.
// @Summary Remove a previously-registered face by ID.
// @Tags face-recognition
// @Param request body schema.FaceForgetRequest true "query params"
// @Success 204 "No Content"
// @Router /v1/face/forget [post]
func FaceForgetEndpoint(registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceForgetRequest)
if !ok {
// Forget doesn't need a face model loaded — fall back to a raw bind
// when the request extractor hasn't run (e.g. when the route was
// registered without SetModelAndConfig).
input = new(schema.FaceForgetRequest)
if err := c.Bind(input); err != nil {
return echo.ErrBadRequest
}
}
if input.ID == "" {
return echo.NewHTTPError(http.StatusBadRequest, "id is required")
}
xlog.Debug("FaceForget", "id", input.ID)
if err := registry.Forget(c.Request().Context(), input.ID); err != nil {
if errors.Is(err, facerecognition.ErrNotFound) {
return echo.NewHTTPError(http.StatusNotFound, err.Error())
}
return err
}
return c.NoContent(http.StatusNoContent)
}
}

View File

@@ -0,0 +1,80 @@
package localai
import (
"cmp"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// defaultIdentifyThreshold is the cosine-distance cutoff applied when
// the client does not specify one. Tuned for buffalo_l ArcFace R50;
// other recognizers (e.g. SFace) should override it explicitly.
const defaultIdentifyThreshold = float32(0.35)
// FaceIdentifyEndpoint runs 1:N identification against the registered store.
// @Summary Identify a face against the registered database (1:N recognition).
// @Tags face-recognition
// @Param request body schema.FaceIdentifyRequest true "query params"
// @Success 200 {object} schema.FaceIdentifyResponse "Response"
// @Router /v1/face/identify [post]
func FaceIdentifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceIdentifyRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
topK := cmp.Or(input.TopK, 5)
threshold := cmp.Or(input.Threshold, defaultIdentifyThreshold)
xlog.Debug("FaceIdentify", "model", cfg.Name, "topK", topK, "threshold", threshold)
probe, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
matches, err := registry.Identify(c.Request().Context(), probe, topK)
if err != nil {
return err
}
response := schema.FaceIdentifyResponse{
Matches: make([]schema.FaceIdentifyMatch, len(matches)),
}
for i, m := range matches {
confidence := (1 - m.Distance/threshold) * 100
if confidence < 0 {
confidence = 0
}
if confidence > 100 {
confidence = 100
}
response.Matches[i] = schema.FaceIdentifyMatch{
ID: m.ID,
Name: m.Metadata.Name,
Labels: m.Metadata.Labels,
Distance: m.Distance,
Confidence: confidence,
Match: m.Distance <= threshold,
}
}
return c.JSON(http.StatusOK, response)
}
}

View File

@@ -0,0 +1,60 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceRegisterEndpoint enrolls a face into the 1:N identification store.
// @Summary Register a face for 1:N identification.
// @Tags face-recognition
// @Param request body schema.FaceRegisterRequest true "query params"
// @Success 200 {object} schema.FaceRegisterResponse "Response"
// @Router /v1/face/register [post]
func FaceRegisterEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceRegisterRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
if input.Name == "" {
return echo.NewHTTPError(http.StatusBadRequest, "name is required")
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceRegister", "model", cfg.Name, "name", input.Name)
embedding, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
stored, err := registry.Register(c.Request().Context(), embedding, facerecognition.Metadata{
Name: input.Name,
Labels: input.Labels,
})
if err != nil {
return err
}
return c.JSON(http.StatusOK, schema.FaceRegisterResponse{
ID: stored.ID,
Name: stored.Name,
RegisteredAt: stored.RegisteredAt,
})
}
}

View File

@@ -0,0 +1,68 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceVerifyEndpoint compares two images and reports whether they depict the same person.
// @Summary Verify that two images depict the same person.
// @Tags face-recognition
// @Param request body schema.FaceVerifyRequest true "query params"
// @Success 200 {object} schema.FaceVerifyResponse "Response"
// @Router /v1/face/verify [post]
func FaceVerifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceVerifyRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img1, err := decodeImageInput(input.Img1)
if err != nil {
return err
}
img2, err := decodeImageInput(input.Img2)
if err != nil {
return err
}
xlog.Debug("FaceVerify", "model", cfg.Name, "backend", cfg.Backend)
res, err := backend.FaceVerify(img1, img2, input.Threshold, input.AntiSpoofing, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
return c.JSON(http.StatusOK, schema.FaceVerifyResponse{
Verified: res.GetVerified(),
Distance: res.GetDistance(),
Threshold: res.GetThreshold(),
Confidence: res.GetConfidence(),
Model: res.GetModel(),
Img1Area: schema.FacialArea{
X: res.GetImg1Area().GetX(),
Y: res.GetImg1Area().GetY(),
W: res.GetImg1Area().GetW(),
H: res.GetImg1Area().GetH(),
},
Img2Area: schema.FacialArea{
X: res.GetImg2Area().GetX(),
Y: res.GetImg2Area().GetY(),
W: res.GetImg2Area().GetW(),
H: res.GetImg2Area().GetH(),
},
ProcessingTimeMs: res.GetProcessingTimeMs(),
})
}
}

View File

@@ -0,0 +1,55 @@
package localai
import (
"fmt"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/pkg/utils"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
// decodeImageInput resolves a URL, data-URI, or plain-string image
// input to a base64 payload ready for the gRPC surface. Errors from
// the underlying utils helper (bad URL, not a data-URI, download
// failure, etc.) are all caused by what the client sent — we surface
// them as 400 rather than the default 500 so API consumers can
// distinguish "you sent bad input" from "our server broke".
//
// This is the single-input path for endpoints where the image IS the
// request (detection, face recognition, etc.). The multi-modal message
// paths (chat completions, responses API, realtime) intentionally
// log-and-skip individual media parts; they don't use this helper.
func decodeImageInput(s string) (string, error) {
img, err := utils.GetContentURIAsBase64(s)
if err != nil {
return "", echo.NewHTTPError(http.StatusBadRequest, fmt.Sprintf("invalid image input: %v", err))
}
return img, nil
}
// mapBackendError converts the gRPC status code a backend returns into
// a matching HTTP status. Without this, every backend error defaults
// to 500 — which lies to API consumers when the backend is telling us
// "your input was bad" (INVALID_ARGUMENT) or "the resource doesn't
// exist" (NOT_FOUND). Pass any err from a `core/backend/*` call
// through this before returning from a handler.
func mapBackendError(err error) error {
if err == nil {
return nil
}
if st, ok := status.FromError(err); ok {
switch st.Code() {
case codes.InvalidArgument:
return echo.NewHTTPError(http.StatusBadRequest, st.Message())
case codes.NotFound:
return echo.NewHTTPError(http.StatusNotFound, st.Message())
case codes.FailedPrecondition:
return echo.NewHTTPError(http.StatusPreconditionFailed, st.Message())
case codes.Unimplemented:
return echo.NewHTTPError(http.StatusNotImplemented, st.Message())
}
}
return err
}

View File

@@ -12,3 +12,4 @@ export const CAP_TOKENIZE = 'FLAG_TOKENIZE'
export const CAP_VAD = 'FLAG_VAD'
export const CAP_VIDEO = 'FLAG_VIDEO'
export const CAP_DETECTION = 'FLAG_DETECTION'
export const CAP_FACE_RECOGNITION = 'FLAG_FACE_RECOGNITION'

View File

@@ -97,6 +97,28 @@ func RegisterLocalAIRoutes(router *echo.Echo,
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }))
// Face recognition endpoints
faceMw := []echo.MiddlewareFunc{
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_FACE_RECOGNITION)),
}
router.POST("/v1/face/verify",
localai.FaceVerifyEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceVerifyRequest) }))...)
router.POST("/v1/face/analyze",
localai.FaceAnalyzeEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceAnalyzeRequest) }))...)
router.POST("/v1/face/embed",
localai.FaceEmbedEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceEmbedRequest) }))...)
router.POST("/v1/face/register",
localai.FaceRegisterEndpoint(cl, ml, appConfig, app.FaceRegistry()),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceRegisterRequest) }))...)
router.POST("/v1/face/identify",
localai.FaceIdentifyEndpoint(cl, ml, appConfig, app.FaceRegistry()),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceIdentifyRequest) }))...)
// Forget does not load a face model — it only needs the registry.
router.POST("/v1/face/forget", localai.FaceForgetEndpoint(app.FaceRegistry()))
ttsHandler := localai.TTSEndpoint(cl, ml, appConfig)
router.POST("/tts",
ttsHandler,

View File

@@ -173,6 +173,123 @@ type Detection struct {
Mask string `json:"mask,omitempty"` // base64-encoded PNG segmentation mask
}
// ─── Face recognition ──────────────────────────────────────────────
//
// FacialArea describes a bounding box for a detected face.
type FacialArea struct {
X float32 `json:"x"`
Y float32 `json:"y"`
W float32 `json:"w"`
H float32 `json:"h"`
}
// FaceVerifyRequest compares two images to decide whether they depict
// the same person. Img1 and Img2 accept URL, base64, or data-URI.
type FaceVerifyRequest struct {
BasicModelRequest
Img1 string `json:"img1"`
Img2 string `json:"img2"`
Threshold float32 `json:"threshold,omitempty"`
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
}
type FaceVerifyResponse struct {
Verified bool `json:"verified"`
Distance float32 `json:"distance"`
Threshold float32 `json:"threshold"`
Confidence float32 `json:"confidence"`
Model string `json:"model"`
Img1Area FacialArea `json:"img1_area"`
Img2Area FacialArea `json:"img2_area"`
ProcessingTimeMs float32 `json:"processing_time_ms,omitempty"`
}
// FaceAnalyzeRequest asks the backend for demographic attributes on
// every face detected in Img.
type FaceAnalyzeRequest struct {
BasicModelRequest
Img string `json:"img"`
Actions []string `json:"actions,omitempty"` // subset of {"age","gender","emotion","race"}
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
}
type FaceAnalyzeResponse struct {
Faces []FaceAnalysis `json:"faces"`
}
type FaceAnalysis struct {
Region FacialArea `json:"region"`
FaceConfidence float32 `json:"face_confidence"`
Age float32 `json:"age,omitempty"`
DominantGender string `json:"dominant_gender,omitempty"`
Gender map[string]float32 `json:"gender,omitempty"`
DominantEmotion string `json:"dominant_emotion,omitempty"`
Emotion map[string]float32 `json:"emotion,omitempty"`
DominantRace string `json:"dominant_race,omitempty"`
Race map[string]float32 `json:"race,omitempty"`
IsReal bool `json:"is_real,omitempty"`
AntispoofScore float32 `json:"antispoof_score,omitempty"`
}
// FaceEmbedRequest extracts a face embedding from an image. Distinct
// from /v1/embeddings (which is OpenAI-compatible and text-only); this
// endpoint accepts URL / base64 / data-URI image inputs.
type FaceEmbedRequest struct {
BasicModelRequest
Img string `json:"img"`
}
type FaceEmbedResponse struct {
Embedding []float32 `json:"embedding"`
Dim int `json:"dim"`
Model string `json:"model,omitempty"`
}
// FaceRegisterRequest enrolls a face into the 1:N recognition store.
type FaceRegisterRequest struct {
BasicModelRequest
Img string `json:"img"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
Store string `json:"store,omitempty"` // vector store model; empty = local-store default
}
type FaceRegisterResponse struct {
ID string `json:"id"`
Name string `json:"name"`
RegisteredAt time.Time `json:"registered_at"`
}
// FaceIdentifyRequest runs 1:N recognition: embed the probe and
// return the top-K nearest registered faces.
type FaceIdentifyRequest struct {
BasicModelRequest
Img string `json:"img"`
TopK int `json:"top_k,omitempty"`
Threshold float32 `json:"threshold,omitempty"` // optional cutoff on distance
Store string `json:"store,omitempty"`
}
type FaceIdentifyResponse struct {
Matches []FaceIdentifyMatch `json:"matches"`
}
type FaceIdentifyMatch struct {
ID string `json:"id"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
Distance float32 `json:"distance"`
Confidence float32 `json:"confidence"`
Match bool `json:"match"` // true when distance <= threshold
}
// FaceForgetRequest removes a previously-registered face by ID.
type FaceForgetRequest struct {
BasicModelRequest
ID string `json:"id"`
Store string `json:"store,omitempty"`
}
type ImportModelRequest struct {
URI string `json:"uri"`
Preferences json.RawMessage `json:"preferences,omitempty"`

View File

@@ -0,0 +1,60 @@
// Package facerecognition provides a swappable backing store for face
// embeddings and the 1:N identification pipeline that sits on top of it.
//
// The current implementation (NewStoreRegistry) is backed by LocalAI's
// in-memory local-store gRPC backend. This is in-memory only — all
// registrations are lost when LocalAI restarts.
//
// TODO: add a persistent PostgreSQL/pgvector-backed implementation for
// production deployments. The Registry interface is explicitly designed
// so the swap is a constructor change in core/application, with zero
// HTTP-handler changes.
package facerecognition
import (
"context"
"errors"
"time"
)
// Registry stores face embeddings keyed by an opaque ID and supports
// approximate similarity search. Implementations are expected to be
// safe for concurrent use.
type Registry interface {
// Register stores a face embedding alongside its metadata.
// Returns the stored metadata with ID and RegisteredAt populated.
// The embedding length must match the registry's expected dimension.
Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error)
// Identify returns up to topK matches for the probe embedding,
// sorted by ascending distance (closest first).
Identify(ctx context.Context, probe []float32, topK int) ([]Match, error)
// Forget removes a previously-registered embedding by ID.
// Returns ErrNotFound if the ID is unknown.
Forget(ctx context.Context, id string) error
}
// Metadata is the user-supplied payload stored alongside a face embedding.
type Metadata struct {
// ID is populated by the registry at Register time and should not be
// set by the caller. It is echoed back in Match.Metadata.
ID string `json:"id"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
RegisteredAt time.Time `json:"registered_at"`
}
// Match is a single result from Identify, ranked by similarity.
type Match struct {
ID string
Metadata Metadata
Distance float32 // 1 - cosine_similarity; lower = closer
}
// Sentinel errors; callers should compare with errors.Is.
var (
ErrNotFound = errors.New("facerecognition: id not found")
ErrEmptyEmbedding = errors.New("facerecognition: embedding is empty")
ErrDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
)

View File

@@ -0,0 +1,253 @@
package facerecognition_test
import (
"context"
"errors"
"math"
"sync"
"testing"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/grpc"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
grpclib "google.golang.org/grpc"
)
const dim = 4 // tiny test-friendly embedding dimension
func TestRegisterIdentifyForget(t *testing.T) {
t.Parallel()
reg, fake := newTestRegistry(t)
ctx := t.Context()
alice := []float32{1, 0, 0, 0}
bob := []float32{0, 1, 0, 0}
aliceMeta, err := reg.Register(ctx, alice, facerecognition.Metadata{Name: "Alice"})
if err != nil {
t.Fatalf("Register Alice: %v", err)
}
if aliceMeta.ID == "" {
t.Fatalf("Register returned empty ID")
}
if aliceMeta.RegisteredAt.IsZero() {
t.Fatalf("Register did not populate RegisteredAt")
}
bobMeta, err := reg.Register(ctx, bob, facerecognition.Metadata{Name: "Bob"})
if err != nil {
t.Fatalf("Register Bob: %v", err)
}
if bobMeta.ID == aliceMeta.ID {
t.Fatalf("IDs should be distinct, got %q twice", bobMeta.ID)
}
aliceID := aliceMeta.ID
if got, want := fake.len(), 2; got != want {
t.Fatalf("fake store has %d entries, want %d", got, want)
}
// Identify an Alice-like probe — she should win.
matches, err := reg.Identify(ctx, []float32{0.99, 0.01, 0, 0}, 2)
if err != nil {
t.Fatalf("Identify: %v", err)
}
if len(matches) == 0 {
t.Fatalf("no matches returned")
}
if matches[0].Metadata.Name != "Alice" {
t.Fatalf("top match name = %q, want Alice", matches[0].Metadata.Name)
}
if matches[0].ID != aliceID {
t.Fatalf("top match ID = %q, want %q", matches[0].ID, aliceID)
}
// Sorted ascending by distance.
for i := 1; i < len(matches); i++ {
if matches[i].Distance < matches[i-1].Distance {
t.Fatalf("matches not sorted by distance: %v", matches)
}
}
// Forget Alice → she's gone, Bob remains.
if err := reg.Forget(ctx, aliceID); err != nil {
t.Fatalf("Forget Alice: %v", err)
}
if got, want := fake.len(), 1; got != want {
t.Fatalf("after Forget, store has %d entries, want %d", got, want)
}
// Forget unknown ID → ErrNotFound (checkable via errors.Is).
if err := reg.Forget(ctx, "nonexistent"); !errors.Is(err, facerecognition.ErrNotFound) {
t.Fatalf("Forget unknown: err = %v, want ErrNotFound", err)
}
}
func TestRegisterRejectsBadEmbedding(t *testing.T) {
t.Parallel()
reg, _ := newTestRegistry(t)
ctx := t.Context()
tests := []struct {
name string
embed []float32
wantErr error
}{
{"empty", []float32{}, facerecognition.ErrEmptyEmbedding},
{"wrong_dim", []float32{1, 2}, facerecognition.ErrDimensionMismatch},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
t.Parallel()
_, err := reg.Register(ctx, tc.embed, facerecognition.Metadata{Name: "x"})
if !errors.Is(err, tc.wantErr) {
t.Fatalf("err = %v, want wrapping %v", err, tc.wantErr)
}
})
}
}
func TestConcurrent(t *testing.T) {
t.Parallel()
reg, _ := newTestRegistry(t)
ctx := t.Context()
done := make(chan struct{})
for i := range 32 {
go func(i int) {
embed := []float32{float32(i % 4), float32((i + 1) % 4), 0, 1}
meta, err := reg.Register(ctx, embed, facerecognition.Metadata{Name: "n"})
if err == nil {
_, _ = reg.Identify(ctx, embed, 3)
_ = reg.Forget(ctx, meta.ID)
}
done <- struct{}{}
}(i)
}
for range 32 {
<-done
}
}
// ─── fake gRPC backend ───────────────────────────────────────────────
func newTestRegistry(t *testing.T) (facerecognition.Registry, *fakeBackend) {
t.Helper()
fake := &fakeBackend{}
resolver := func(_ context.Context, _ string) (grpc.Backend, error) {
return fake, nil
}
return facerecognition.NewStoreRegistry(resolver, "test-store", dim), fake
}
// fakeBackend implements just enough of grpc.Backend for the store
// helpers. All other methods panic so any accidental dependency is
// visible in tests.
type fakeBackend struct {
grpc.Backend // embed to inherit no-op default method set via panic
mu sync.Mutex
keys [][]float32
vals [][]byte
}
func (f *fakeBackend) len() int {
f.mu.Lock()
defer f.mu.Unlock()
return len(f.keys)
}
func (f *fakeBackend) StoresSet(_ context.Context, in *pb.StoresSetOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
f.mu.Lock()
defer f.mu.Unlock()
for i, k := range in.Keys {
f.keys = append(f.keys, append([]float32(nil), k.Floats...))
f.vals = append(f.vals, append([]byte(nil), in.Values[i].Bytes...))
}
return &pb.Result{Success: true}, nil
}
func (f *fakeBackend) StoresDelete(_ context.Context, in *pb.StoresDeleteOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
f.mu.Lock()
defer f.mu.Unlock()
for _, k := range in.Keys {
idx := f.findKey(k.Floats)
if idx < 0 {
continue
}
f.keys = append(f.keys[:idx], f.keys[idx+1:]...)
f.vals = append(f.vals[:idx], f.vals[idx+1:]...)
}
return &pb.Result{Success: true}, nil
}
func (f *fakeBackend) StoresFind(_ context.Context, in *pb.StoresFindOptions, _ ...grpclib.CallOption) (*pb.StoresFindResult, error) {
f.mu.Lock()
defer f.mu.Unlock()
type scored struct {
key []float32
val []byte
sim float32
}
results := make([]scored, 0, len(f.keys))
for i, k := range f.keys {
results = append(results, scored{k, f.vals[i], cosine(k, in.Key.Floats)})
}
// Sort descending by similarity.
for i := 0; i < len(results); i++ {
for j := i + 1; j < len(results); j++ {
if results[j].sim > results[i].sim {
results[i], results[j] = results[j], results[i]
}
}
}
top := int(in.TopK)
if top <= 0 || top > len(results) {
top = len(results)
}
out := &pb.StoresFindResult{}
for _, r := range results[:top] {
out.Keys = append(out.Keys, &pb.StoresKey{Floats: r.key})
out.Values = append(out.Values, &pb.StoresValue{Bytes: r.val})
out.Similarities = append(out.Similarities, r.sim)
}
return out, nil
}
func (f *fakeBackend) findKey(target []float32) int {
for i, k := range f.keys {
if equalFloats(k, target) {
return i
}
}
return -1
}
func equalFloats(a, b []float32) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}
func cosine(a, b []float32) float32 {
var dot, na, nb float64
for i := range a {
dot += float64(a[i]) * float64(b[i])
na += float64(a[i]) * float64(a[i])
nb += float64(b[i]) * float64(b[i])
}
if na == 0 || nb == 0 {
return 0
}
return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

View File

@@ -0,0 +1,142 @@
package facerecognition
import (
"context"
"encoding/json"
"fmt"
"sort"
"sync"
"time"
"github.com/google/uuid"
"github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/LocalAI/pkg/store"
)
// StoreResolver resolves a named vector store to a gRPC backend. The
// HTTP handler layer wires this to backend.StoreBackend so the
// registry stays decoupled from the ModelLoader plumbing.
type StoreResolver func(ctx context.Context, storeName string) (grpc.Backend, error)
// NewStoreRegistry returns a Registry backed by LocalAI's generic
// StoresSet / StoresFind / StoresDelete gRPC surface.
//
// storeName selects which vector-store model to use (defaults to the
// local-store Go backend). `dim` is the expected embedding dimension;
// pass 0 to accept whatever dimension arrives (useful when the face
// backend exposes multiple recognizers of different sizes, e.g.
// ArcFace R50 at 512 vs SFace at 128). A non-zero dim is enforced at
// Register time and fails fast with ErrDimensionMismatch.
func NewStoreRegistry(resolve StoreResolver, storeName string, dim int) Registry {
return &storeRegistry{
resolve: resolve,
storeName: storeName,
dim: dim,
}
}
type storeRegistry struct {
resolve StoreResolver
storeName string
dim int
// TODO(postgres): the local-store gRPC surface keys by embedding
// vector and exposes no "list all" method, so we cannot delete by
// ID without remembering the embedding. This in-memory index is
// rebuilt on every Register and lost on restart — acceptable while
// the only implementation is itself in-memory. A persistent
// implementation must rebuild this index at startup.
idIndex sync.Map // map[string][]float32
}
func (r *storeRegistry) Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error) {
if len(embedding) == 0 {
return Metadata{}, ErrEmptyEmbedding
}
if r.dim != 0 && len(embedding) != r.dim {
return Metadata{}, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(embedding))
}
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return Metadata{}, fmt.Errorf("facerecognition: resolve store: %w", err)
}
meta.ID = uuid.NewString()
if meta.RegisteredAt.IsZero() {
meta.RegisteredAt = time.Now().UTC()
}
payload, err := json.Marshal(meta)
if err != nil {
return Metadata{}, fmt.Errorf("facerecognition: marshal metadata: %w", err)
}
if err := store.SetSingle(ctx, backend, embedding, payload); err != nil {
return Metadata{}, fmt.Errorf("facerecognition: set: %w", err)
}
// Retain a copy so Forget can look up the embedding by ID.
embCopy := append([]float32(nil), embedding...)
r.idIndex.Store(meta.ID, embCopy)
return meta, nil
}
func (r *storeRegistry) Identify(ctx context.Context, probe []float32, topK int) ([]Match, error) {
if len(probe) == 0 {
return nil, ErrEmptyEmbedding
}
if r.dim != 0 && len(probe) != r.dim {
return nil, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(probe))
}
if topK <= 0 {
topK = 5
}
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return nil, fmt.Errorf("facerecognition: resolve store: %w", err)
}
_, values, similarities, err := store.Find(ctx, backend, probe, topK)
if err != nil {
return nil, fmt.Errorf("facerecognition: find: %w", err)
}
matches := make([]Match, 0, len(values))
for i, raw := range values {
var meta Metadata
if err := json.Unmarshal(raw, &meta); err != nil {
// Skip unreadable entries instead of failing the whole query —
// the store may contain non-face records in shared deployments.
continue
}
matches = append(matches, Match{
ID: meta.ID,
Metadata: meta,
Distance: 1 - similarities[i],
})
}
sort.SliceStable(matches, func(i, j int) bool { return matches[i].Distance < matches[j].Distance })
return matches, nil
}
func (r *storeRegistry) Forget(ctx context.Context, id string) error {
raw, ok := r.idIndex.Load(id)
if !ok {
return ErrNotFound
}
embedding := raw.([]float32)
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return fmt.Errorf("facerecognition: resolve store: %w", err)
}
if err := store.DeleteSingle(ctx, backend, embedding); err != nil {
return fmt.Errorf("facerecognition: delete: %w", err)
}
r.idIndex.Delete(id)
return nil
}

View File

@@ -168,6 +168,12 @@ func (c *fakeBackendClient) SoundGeneration(_ context.Context, _ *pb.SoundGenera
func (c *fakeBackendClient) Detect(_ context.Context, _ *pb.DetectOptions, _ ...ggrpc.CallOption) (*pb.DetectResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
return nil, nil
}

View File

@@ -91,6 +91,14 @@ func (f *fakeGRPCBackend) Detect(_ context.Context, _ *pb.DetectOptions, _ ...gg
return &pb.DetectResponse{}, nil
}
func (f *fakeGRPCBackend) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
return &pb.FaceVerifyResponse{}, nil
}
func (f *fakeGRPCBackend) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
return &pb.FaceAnalyzeResponse{}, nil
}
func (f *fakeGRPCBackend) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
return &pb.TranscriptResult{}, nil
}

View File

@@ -24,6 +24,8 @@ const (
BackendTraceRerank BackendTraceType = "rerank"
BackendTraceTokenize BackendTraceType = "tokenize"
BackendTraceDetection BackendTraceType = "detection"
BackendTraceFaceVerify BackendTraceType = "face_verify"
BackendTraceFaceAnalyze BackendTraceType = "face_analyze"
BackendTraceModelLoad BackendTraceType = "model_load"
)

View File

@@ -7,6 +7,10 @@ url = "/features/embeddings/"
LocalAI supports generating embeddings for text or list of tokens.
For face embeddings specifically, see the
[Face Recognition](/features/face-recognition/) feature — it produces
512-d L2-normalized vectors tuned for face similarity.
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
## Model compatibility

View File

@@ -0,0 +1,228 @@
+++
disableToc = false
title = "Face Recognition"
weight = 14
url = "/features/face-recognition/"
+++
LocalAI supports face recognition through the `insightface` backend:
face verification (1:1), face identification (1:N) against a built-in
vector store, face embedding, face detection, and demographic analysis
(age / gender).
The backend ships **two interchangeable engines** under one image, each
paired with a distinct gallery entry so users can pick by license and
accuracy needs.
## Licensing — read this first
| Gallery entry | Detector + recognizer | Size | License |
|---|---|---|---|
| `insightface-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | ~326 MB | **Non-commercial research only** (upstream insightface weights) |
| `insightface-buffalo-s` | SCRFD-500MF + MBF + GenderAge | ~159 MB | **Non-commercial research only** |
| `insightface-opencv` | YuNet + SFace | ~40 MB | **Apache 2.0 — commercial-safe** |
The `insightface` Python library itself is MIT, but the pretrained model
packs (buffalo_l, buffalo_s, antelopev2) are released by the upstream
maintainers for **non-commercial research use only**. Pick the
`insightface-opencv` entry for production / commercial deployments.
## Quickstart
Pull the commercial-safe backend (recommended for copy-paste):
```bash
local-ai models install insightface-opencv
```
Verify that two images depict the same person:
```bash
curl -sX POST http://localhost:8080/v1/face/verify \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-opencv",
"img1": "https://example.com/alice_1.jpg",
"img2": "https://example.com/alice_2.jpg"
}'
```
Response:
```json
{
"verified": true,
"distance": 0.27,
"threshold": 0.35,
"confidence": 23.1,
"model": "insightface-opencv",
"img1_area": { "x": 120.4, "y": 82.1, "w": 198.3, "h": 260.5 },
"img2_area": { "x": 110.8, "y": 95.0, "w": 205.6, "h": 268.2 },
"processing_time_ms": 412.0
}
```
## 1:N identification workflow (register → identify → forget)
This is the primary "face recognition" flow. Under the hood it uses
LocalAI's built-in in-memory vector store — no external database to
stand up.
1. Register known faces:
```bash
curl -sX POST http://localhost:8080/v1/face/register \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-buffalo-l",
"name": "Alice",
"img": "https://example.com/alice.jpg"
}'
# → {"id": "8b7...", "name": "Alice", "registered_at": "2026-04-21T..."}
```
2. Identify an unknown probe:
```bash
curl -sX POST http://localhost:8080/v1/face/identify \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-buffalo-l",
"img": "https://example.com/unknown.jpg",
"top_k": 5
}'
# → {"matches": [{"id":"8b7...","name":"Alice","distance":0.22,"match":true,...}]}
```
3. Remove a person by ID:
```bash
curl -sX POST http://localhost:8080/v1/face/forget \
-d '{"id": "8b7..."}'
# → 204 No Content
```
{{% alert icon="⚠️" color="warning" %}}
**Storage caveat.** The default vector store is in-memory. All
registered faces are lost when LocalAI restarts. Persistent storage
(pgvector) is a tracked future enhancement — the face-recognition HTTP
API is designed to swap the backing store without changing the wire
format.
{{% /alert %}}
## API reference
### `POST /v1/face/verify` (1:1)
| field | type | description |
|---|---|---|
| `model` | string | gallery entry name (e.g. `insightface-buffalo-l`) |
| `img1`, `img2` | string | URL, base64, or data-URI |
| `threshold` | float, optional | cosine-distance cutoff; default depends on engine |
| `anti_spoofing` | bool, optional | reserved — unused in the current release |
Returns `verified`, `distance`, `threshold`, `confidence`, `model`,
`img1_area`, `img2_area`, and `processing_time_ms`.
### `POST /v1/face/analyze`
Returns demographic attributes for every detected face:
| field | type | description |
|---|---|---|
| `model` | string | gallery entry |
| `img` | string | URL / base64 / data-URI |
| `actions` | string[] | subset of `["age","gender","emotion","race"]`; empty = all supported |
Only `insightface-buffalo-l` / `insightface-buffalo-s` populate age and
gender (genderage head). `insightface-opencv` returns face regions with
empty attributes — SFace has no demographic classifier. Emotion and
race are always empty in the current release.
### `POST /v1/face/register` (1:N enrollment)
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | face to enroll |
| `name` | string | human-readable label |
| `labels` | map[string]string, optional | arbitrary metadata |
| `store` | string, optional | vector store model; defaults to local-store |
Returns `{id, name, registered_at}`. The `id` is an opaque UUID used by
`/v1/face/identify` and `/v1/face/forget`.
### `POST /v1/face/identify` (1:N recognition)
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | probe image |
| `top_k` | int, optional | max matches to return; default 5 |
| `threshold` | float, optional | cosine-distance cutoff; default 0.35 (ArcFace) |
| `store` | string, optional | vector store model; defaults to local-store |
Returns a list of matches sorted by ascending distance, each with `id`,
`name`, `labels`, `distance`, `confidence`, and `match`
(`distance ≤ threshold`).
### `POST /v1/face/forget`
| field | type | description |
|---|---|---|
| `id` | string | ID returned by `/v1/face/register` |
Returns `204 No Content` on success, `404 Not Found` if the ID is
unknown.
### `POST /v1/face/embed`
Returns the L2-normalized face embedding vector for the detected face.
| field | type | description |
|---|---|---|
| `model` | string | face model |
| `img` | string | URL / base64 / data-URI |
Returns `{embedding: float[], dim: int, model: string}`. Dimension is
512 for the insightface ArcFace/MBF recognizers and 128 for OpenCV's
SFace.
> **Note:** the OpenAI-compatible `/v1/embeddings` endpoint is
> intentionally text-only by contract (`input` is a string or list of
> strings of TEXT to embed) — passing an image data-URI there does
> nothing useful. Use `/v1/face/embed` for image inputs.
### Reused endpoint
- `POST /v1/detection` — returns face bounding boxes with
`class_name: "face"`; works for both engines.
## Choosing an engine
| Need | Entry |
|---|---|
| Commercial product | `insightface-opencv` |
| Highest accuracy (research / demos) | `insightface-buffalo-l` |
| Edge / low-memory / research | `insightface-buffalo-s` |
The recommended default `threshold` for `/v1/face/verify` and
`/v1/face/identify` depends on the recognizer:
| Recognizer | Cosine-distance threshold |
|---|---|
| ArcFace R50 (`buffalo_l`) | ~0.35 |
| MBF (`buffalo_s`) | ~0.40 |
| SFace (`opencv`) | ~0.50 |
Pass `threshold` explicitly when switching engines — the per-engine
default only fires when the field is omitted.
## Related features
- [Object Detection](/features/object-detection/) — generic bounding-box
detection; `/v1/detection` works with the insightface backend too.
- [Embeddings](/features/embeddings/) — raw vector extraction; face
embeddings live in the same endpoint under the hood.
- [Stores](/features/stores/) — the generic vector store powering the
1:N recognition pipeline.

View File

@@ -7,6 +7,11 @@ url = "/features/object-detection/"
LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).
For detecting **faces** specifically, see the dedicated
[Face Recognition](/features/face-recognition/) feature — its
`/v1/detection` support is tuned for face bounding boxes and ships
with commercially-safe model options.
## Overview
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.

View File

@@ -9,6 +9,14 @@ url = '/stores'
Stores are an experimental feature to help with querying data using similarity search. It is
a low level API that consists of only `get`, `set`, `delete` and `find`.
{{% alert icon="💡" color="info" %}}
**Face recognition uses this store.** The 1:N face identification flow
(`/v1/face/register`, `/v1/face/identify`, `/v1/face/forget`) is built
on top of the generic store — see
[Face Recognition](/features/face-recognition/) for the face-oriented
API.
{{% /alert %}}
For example if you have an embedding of some text and want to find text with similar embeddings.
You can create embeddings for chunks of all your text then compare them against the embedding of the text you
are searching on.

View File

@@ -10,6 +10,10 @@ Release notes have been now moved completely over Github releases.
You can see the release notes [here](https://github.com/mudler/LocalAI/releases).
## 2026 Highlights
- **April 2026**: [Face recognition backend](/features/face-recognition/) — `insightface`-powered 1:1 verification, 1:N identification, face embedding, face detection, and demographic analysis. Ships both a non-commercial `buffalo_l` model and an Apache 2.0 OpenCV Zoo alternative.
## 2024 Highlights
- **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)

View File

@@ -3706,6 +3706,169 @@
- filename: arcee-ai_AFM-4.5B-Q4_K_M.gguf
sha256: f05516b323f581bebae1af2cbf900d83a2569b0a60c54366daf4a9c15ae30d4f
uri: huggingface://bartowski/arcee-ai_AFM-4.5B-GGUF/arcee-ai_AFM-4.5B-Q4_K_M.gguf
- &insightface_buffalo_l
name: "insightface-buffalo-l"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
# insightface library is MIT; pretrained packs are NON-COMMERCIAL.
license: "insightface-non-commercial"
description: |
Face recognition using insightface's `buffalo_l` pack
(SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB).
Default choice, highest accuracy.
Weights delivered via LocalAI's gallery mechanism (SHA-256 verified,
cached in the models directory like any other managed model).
NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-l}
options: ["engine:insightface", "model_pack:buffalo_l"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_l.zip
sha256: 80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
- &insightface_buffalo_m
name: "insightface-buffalo-m"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace +
genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a
cheaper detector — good balance on mid-range hardware.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-m}
options: ["engine:insightface", "model_pack:buffalo_m"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_m.zip
sha256: d98264bd8f2dc75cbc2ddce2a14e636e02bb857b3051c234b737bf3b614edca9
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_m.zip
- &insightface_buffalo_s
name: "insightface-buffalo-s"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder +
genderage, ~159MB). Good fit for mid-range CPU deployments.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-s}
options: ["engine:insightface", "model_pack:buffalo_s"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_s.zip
sha256: d85a87f503f691807cd8bb97128bdf7a0660326cd9cd02657127fa978bab8b5e
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_s.zip
- &insightface_buffalo_sc
name: "insightface-buffalo-sc"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB).
NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty
attributes for this pack. Ideal for edge/embedded deployments where
only verification and embedding are needed.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-sc}
options: ["engine:insightface", "model_pack:buffalo_sc"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_sc.zip
sha256: 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
- &insightface_antelopev2
name: "insightface-antelopev2"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer +
genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on
harder benchmarks; pays for it in GPU memory.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-antelopev2}
options: ["engine:insightface", "model_pack:antelopev2"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: antelopev2.zip
sha256: 8e182f14fc6e80b3bfa375b33eb6cff7ee05d8ef7633e738d1c89021dcf0c5c5
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip
- &insightface_opencv
name: "insightface-opencv"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
license: apache-2.0
description: |
Face recognition using OpenCV Zoo weights: YuNet detector + SFace
128-d recognizer (fp32). APACHE 2.0 — safe for commercial use.
Lower accuracy than insightface packs, no demographic head
(`/v1/face/analyze` returns detection regions only).
Weights are downloaded on install via LocalAI's gallery mechanism
(~40MB).
tags: [face-recognition, face-verification, face-embedding, commercial-ok, gpu, cpu]
urls: [https://github.com/opencv/opencv_zoo]
overrides:
backend: insightface
parameters: {model: face_detection_yunet_2023mar.onnx}
options:
- "engine:onnx_direct"
- "detector_onnx:face_detection_yunet_2023mar.onnx"
- "recognizer_onnx:face_recognition_sface_2021dec.onnx"
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: face_detection_yunet_2023mar.onnx
sha256: 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
- filename: face_recognition_sface_2021dec.onnx
sha256: 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
- &insightface_opencv_int8
name: "insightface-opencv-int8"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
license: apache-2.0
description: |
Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB).
Roughly 3x smaller and noticeably faster on CPU than the fp32 variant
at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe.
Weights are downloaded on install via LocalAI's gallery mechanism.
tags: [face-recognition, face-verification, face-embedding, commercial-ok, edge, cpu]
urls: [https://github.com/opencv/opencv_zoo]
overrides:
backend: insightface
parameters: {model: face_detection_yunet_2023mar_int8.onnx}
options:
- "engine:onnx_direct"
- "detector_onnx:face_detection_yunet_2023mar_int8.onnx"
- "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx"
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: face_detection_yunet_2023mar_int8.onnx
sha256: 321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
- filename: face_recognition_sface_2021dec_int8.onnx
sha256: 2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
- &rfdetr
name: "rfdetr-base"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"

View File

@@ -54,6 +54,8 @@ type Backend interface {
TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error)
AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)

View File

@@ -81,6 +81,14 @@ func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
return pb.DetectResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
return pb.FaceVerifyResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
return pb.FaceAnalyzeResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
}

View File

@@ -580,6 +580,42 @@ func (c *Client) Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.
return client.Detect(ctx, in, opts...)
}
func (c *Client) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error) {
if !c.parallel {
c.opMutex.Lock()
defer c.opMutex.Unlock()
}
c.setBusy(true)
defer c.setBusy(false)
c.wdMark()
defer c.wdUnMark()
conn, err := c.dial()
if err != nil {
return nil, err
}
defer conn.Close()
client := pb.NewBackendClient(conn)
return client.FaceVerify(ctx, in, opts...)
}
func (c *Client) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
if !c.parallel {
c.opMutex.Lock()
defer c.opMutex.Unlock()
}
c.setBusy(true)
defer c.setBusy(false)
c.wdMark()
defer c.wdUnMark()
conn, err := c.dial()
if err != nil {
return nil, err
}
defer conn.Close()
client := pb.NewBackendClient(conn)
return client.FaceAnalyze(ctx, in, opts...)
}
func (c *Client) AudioEncode(ctx context.Context, in *pb.AudioEncodeRequest, opts ...grpc.CallOption) (*pb.AudioEncodeResult, error) {
if !c.parallel {
c.opMutex.Lock()

View File

@@ -71,6 +71,14 @@ func (e *embedBackend) Detect(ctx context.Context, in *pb.DetectOptions, opts ..
return e.s.Detect(ctx, in)
}
func (e *embedBackend) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error) {
return e.s.FaceVerify(ctx, in)
}
func (e *embedBackend) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
return e.s.FaceAnalyze(ctx, in)
}
func (e *embedBackend) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error) {
return e.s.AudioTranscription(ctx, in)
}

View File

@@ -17,6 +17,8 @@ type AIModel interface {
GenerateImage(*pb.GenerateImageRequest) error
GenerateVideo(*pb.GenerateVideoRequest) error
Detect(*pb.DetectOptions) (pb.DetectResponse, error)
FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error)
FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error)
AudioTranscription(*pb.TranscriptRequest) (pb.TranscriptResult, error)
AudioTranscriptionStream(*pb.TranscriptRequest, chan *pb.TranscriptStreamResponse) error
TTS(*pb.TTSRequest) error

View File

@@ -151,6 +151,30 @@ func (s *server) Detect(ctx context.Context, in *pb.DetectOptions) (*pb.DetectRe
return &res, nil
}
func (s *server) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest) (*pb.FaceVerifyResponse, error) {
if s.llm.Locking() {
s.llm.Lock()
defer s.llm.Unlock()
}
res, err := s.llm.FaceVerify(in)
if err != nil {
return nil, err
}
return &res, nil
}
func (s *server) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest) (*pb.FaceAnalyzeResponse, error) {
if s.llm.Locking() {
s.llm.Lock()
defer s.llm.Unlock()
}
res, err := s.llm.FaceAnalyze(in)
if err != nil {
return nil, err
}
return &res, nil
}
func (s *server) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest) (*pb.TranscriptResult, error) {
if s.llm.Locking() {
s.llm.Lock()

View File

@@ -2,6 +2,7 @@ package e2ebackends_test
import (
"context"
"encoding/base64"
"fmt"
"io"
"net"
@@ -83,13 +84,18 @@ const (
capTools = "tools"
capTranscription = "transcription"
capImage = "image"
capFaceDetect = "face_detect"
capFaceEmbed = "face_embed"
capFaceVerify = "face_verify"
capFaceAnalyze = "face_analyze"
defaultPrompt = "The capital of France is"
streamPrompt = "Once upon a time"
defaultToolPrompt = "What's the weather like in Paris, France?"
defaultToolName = "get_weather"
defaultImagePrompt = "a photograph of an astronaut riding a horse"
defaultImageSteps = 4
defaultPrompt = "The capital of France is"
streamPrompt = "Once upon a time"
defaultToolPrompt = "What's the weather like in Paris, France?"
defaultToolName = "get_weather"
defaultImagePrompt = "a photograph of an astronaut riding a horse"
defaultImageSteps = 4
defaultVerifyDistanceCeil = float32(0.6) // upper bound for same-person; SFace runs closer to 0.5 ArcFace to 0.35.
)
func defaultCaps() map[string]bool {
@@ -127,12 +133,21 @@ var _ = Describe("Backend container", Ordered, func() {
modelName string // set when a HuggingFace model id is used
mmprojFile string // optional multimodal projector
audioFile string // optional audio fixture for transcription specs
addr string
serverCmd *exec.Cmd
conn *grpc.ClientConn
client pb.BackendClient
prompt string
options []string
// Face fixtures: two photos of the same person + one different person.
faceFile1 string
faceFile2 string
faceFile3 string
// verifyCeiling is the upper-bound cosine distance for a
// same-person pair; each model configuration can override it via
// BACKEND_TEST_VERIFY_DISTANCE_CEILING because SFace's distance
// distribution is wider than ArcFace's.
verifyCeiling float32
addr string
serverCmd *exec.Cmd
conn *grpc.ClientConn
client pb.BackendClient
prompt string
options []string
)
BeforeAll(func() {
@@ -197,6 +212,12 @@ var _ = Describe("Backend container", Ordered, func() {
}
}
// Face fixtures for the face-recognition specs.
faceFile1 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_1", "face_a_1.jpg")
faceFile2 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_2", "face_a_2.jpg")
faceFile3 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_3", "face_b.jpg")
verifyCeiling = envFloat32("BACKEND_TEST_VERIFY_DISTANCE_CEILING", defaultVerifyDistanceCeil)
// Pick a free port and launch the backend.
port, err := freeport.GetFreePort()
Expect(err).NotTo(HaveOccurred())
@@ -533,6 +554,120 @@ var _ = Describe("Backend container", Ordered, func() {
GinkgoWriter.Printf("AudioTranscriptionStream: deltas=%d assembled=%q final=%q\n",
len(deltas), assembled.String(), finalText)
})
// ─── face recognition specs ─────────────────────────────────────────
It("detects faces via Detect", func() {
if !caps[capFaceDetect] {
Skip("face_detect capability not enabled")
}
Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
res, err := client.Detect(ctx, &pb.DetectOptions{Src: base64File(faceFile1)})
Expect(err).NotTo(HaveOccurred())
Expect(res.GetDetections()).NotTo(BeEmpty(), "Detect returned no faces")
for _, d := range res.GetDetections() {
Expect(d.GetClassName()).To(Equal("face"))
Expect(d.GetWidth()).To(BeNumerically(">", 0))
Expect(d.GetHeight()).To(BeNumerically(">", 0))
}
GinkgoWriter.Printf("face_detect: %d faces\n", len(res.GetDetections()))
})
It("produces face embeddings via Embedding", func() {
if !caps[capFaceEmbed] {
Skip("face_embed capability not enabled")
}
Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
res, err := client.Embedding(ctx, &pb.PredictOptions{Images: []string{base64File(faceFile1)}})
Expect(err).NotTo(HaveOccurred())
vec := res.GetEmbeddings()
Expect(vec).NotTo(BeEmpty(), "Embedding returned empty vector")
// Face embeddings are L2-normalized — expect unit norm.
var sumSq float64
for _, v := range vec {
sumSq += float64(v) * float64(v)
}
Expect(sumSq).To(BeNumerically("~", 1.0, 0.05),
"face embedding should be L2-normed (sum(x^2)=%.3f, dim=%d)", sumSq, len(vec))
GinkgoWriter.Printf("face_embed: dim=%d\n", len(vec))
})
It("verifies faces via FaceVerify", func() {
if !caps[capFaceVerify] {
Skip("face_verify capability not enabled")
}
Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
// Same image twice — expected verified=true with very small distance.
b1 := base64File(faceFile1)
same, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b1, Threshold: verifyCeiling})
Expect(err).NotTo(HaveOccurred())
Expect(same.GetVerified()).To(BeTrue(), "same image should verify: dist=%.3f", same.GetDistance())
Expect(same.GetDistance()).To(BeNumerically("<", 0.1))
GinkgoWriter.Printf("face_verify(same): dist=%.3f confidence=%.1f\n", same.GetDistance(), same.GetConfidence())
// Different images — assert relative ordering when the detector
// actually finds a face in both images. Some fixtures (masked
// faces, profile shots, etc.) are legitimately borderline for
// SCRFD's default threshold, so we don't fail the suite when the
// second image gets a NotFound — we just log and skip the
// cross-person check. The same-image assertion above is the
// definitive proof the RPC works end-to-end.
if faceFile3 != "" {
b3 := base64File(faceFile3)
diff, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b3, Threshold: verifyCeiling})
if err != nil {
GinkgoWriter.Printf("face_verify(diff): skipped — %v\n", err)
} else {
Expect(diff.GetDistance()).To(BeNumerically(">", same.GetDistance()),
"cross-person distance %.3f should exceed same-image distance %.3f", diff.GetDistance(), same.GetDistance())
GinkgoWriter.Printf("face_verify(diff): dist=%.3f verified=%v\n", diff.GetDistance(), diff.GetVerified())
}
}
// If two photos of the same person were provided, the ordering
// should also hold: d(a1,a2) < ceiling. Best-effort as above —
// skip if the detector doesn't find a face in the second image.
if faceFile2 != "" {
b2 := base64File(faceFile2)
sp, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b2, Threshold: verifyCeiling})
if err != nil {
GinkgoWriter.Printf("face_verify(same-person): skipped — %v\n", err)
} else {
Expect(sp.GetDistance()).To(BeNumerically("<", verifyCeiling),
"same-person (different photos) distance %.3f exceeds ceiling %.3f", sp.GetDistance(), verifyCeiling)
GinkgoWriter.Printf("face_verify(same-person): dist=%.3f verified=%v\n", sp.GetDistance(), sp.GetVerified())
}
}
})
It("analyzes faces via FaceAnalyze", func() {
if !caps[capFaceAnalyze] {
Skip("face_analyze capability not enabled")
}
Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
res, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{Img: base64File(faceFile1)})
Expect(err).NotTo(HaveOccurred())
Expect(res.GetFaces()).NotTo(BeEmpty(), "FaceAnalyze returned no faces")
for _, f := range res.GetFaces() {
Expect(f.GetFaceConfidence()).To(BeNumerically(">", 0))
Expect(f.GetAge()).To(BeNumerically(">", 0), "age should be populated by analyze-capable engines")
Expect(f.GetDominantGender()).To(BeElementOf("Man", "Woman"))
}
GinkgoWriter.Printf("face_analyze: %d faces\n", len(res.GetFaces()))
})
})
// extractImage runs `docker create` + `docker export` to materialise the image
@@ -589,6 +724,43 @@ func envInt32(name string, def int32) int32 {
return v
}
func envFloat32(name string, def float32) float32 {
raw := os.Getenv(name)
if raw == "" {
return def
}
var v float32
if _, err := fmt.Sscanf(raw, "%f", &v); err != nil {
return def
}
return v
}
// resolveFaceFixture returns the local path of a face-fixture image,
// preferring BACKEND_TEST_<prefix>_FILE when set and otherwise
// downloading BACKEND_TEST_<prefix>_URL into workDir. Returns an empty
// string when neither is configured — specs that need it should skip.
func resolveFaceFixture(workDir, prefix, defaultName string) string {
if path := os.Getenv(prefix + "_FILE"); path != "" {
return path
}
url := os.Getenv(prefix + "_URL")
if url == "" {
return ""
}
dest := filepath.Join(workDir, defaultName)
downloadFile(url, dest)
return dest
}
// base64File reads a file and returns its base64 encoding.
func base64File(path string) string {
GinkgoHelper()
data, err := os.ReadFile(path)
Expect(err).NotTo(HaveOccurred(), "reading %s", path)
return base64.StdEncoding.EncodeToString(data)
}
func keys(m map[string]bool) []string {
out := make([]string, 0, len(m))
for k, v := range m {

27
tests/fixtures/faces/README.md vendored Normal file
View File

@@ -0,0 +1,27 @@
# Face-recognition e2e fixtures
The face-recognition e2e tests (`tests/e2e-backends/backend_test.go`)
don't require committed fixture JPEGs. They follow the same pattern as
`BACKEND_TEST_AUDIO_URL` (whisper): the Makefile target passes
HTTP URLs via `BACKEND_TEST_FACE_IMAGE_*_URL`, and the suite downloads
them at `BeforeAll` time.
For the Makefile targets in `Makefile`, the defaults point at NASA's
public-domain astronaut portraits on nasa.gov. NASA images are released
into the public domain by U.S. federal work (see
<https://www.nasa.gov/nasa-brand-center/images-and-media/>).
If you want to run the suite fully offline, drop three JPEGs into this
directory with the names the Makefile expects and flip the env vars to
the `_FILE` variants:
```
tests/fixtures/faces/person_a_1.jpg # person A, photo 1
tests/fixtures/faces/person_a_2.jpg # person A, photo 2 (different angle/lighting)
tests/fixtures/faces/person_b.jpg # a different person
```
The suite asserts *relative* ordering only (`d(a1,a2) < d(a1,b)`) — the
absolute distance ceiling is set per-model via
`BACKEND_TEST_VERIFY_DISTANCE_CEILING` so SFace (which uses a wider
distance distribution than ArcFace) can share the same suite.