chore(turboquant): bump fork pin to rebase/upstream-sync-april-2026

Move the TurboQuant llama.cpp fork pin from feature/turboquant-kv-cache (627ebbc6) to rebase/upstream-sync-april-2026 (7f320bb8), picking up the upstream-sync work on the fork.

Assisted-by: Claude:claude-opus-4-7
fix: remove unsafe sprintf() in grpc-server.cpp (#9486)
2026-05-24 00:26:34 -04:00 · 2026-04-22 20:01:49 +00:00 · 2026-04-22 21:57:29 +02:00 · 2026-04-22 21:56:11 +02:00 · 2026-04-22 21:55:41 +02:00 · 2026-04-22 13:19:55 +02:00
111 changed files with 5890 additions and 347 deletions
--- a/.agents/ai-coding-assistants.md
+++ b/.agents/ai-coding-assistants.md
@@ -0,0 +1,101 @@
+# AI Coding Assistants
+
+This document provides guidance for AI tools and developers using AI
+assistance when contributing to LocalAI.
+
+**LocalAI follows the same guidelines as the Linux kernel project for
+AI-assisted contributions.** See the upstream policy here:
+<https://docs.kernel.org/process/coding-assistants.html>
+
+The rules below mirror that policy, adapted to LocalAI's license and
+project layout. If anything is unclear, the kernel document is the
+authoritative reference for intent.
+
+AI tools helping with LocalAI development should follow the standard
+project development process:
+
+- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
+  conventions, and PR guidelines
+- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
+  logging, and documentation conventions
+- [.agents/building-and-testing.md](building-and-testing.md) — build and
+  test procedures
+
+## Licensing and Legal Requirements
+
+All contributions must comply with LocalAI's licensing requirements:
+
+- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
+  file
+- New source files should use the SPDX license identifier `MIT` where
+  applicable to the file type
+- Contributions must be compatible with the MIT License and must not
+  introduce code under incompatible licenses (e.g., GPL) without an
+  explicit discussion with maintainers
+
+## Signed-off-by and Developer Certificate of Origin
+
+**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
+certify the Developer Certificate of Origin (DCO). The human submitter
+is responsible for:
+
+- Reviewing all AI-generated code
+- Ensuring compliance with licensing requirements
+- Adding their own `Signed-off-by` tag (when the project requires DCO)
+  to certify the contribution
+- Taking full responsibility for the contribution
+
+AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
+A human reviewer owns the contribution; the AI's involvement is recorded
+via `Assisted-by` (see below).
+
+## Attribution
+
+When AI tools contribute to LocalAI development, proper attribution helps
+track the evolving role of AI in the development process. Contributions
+should include an `Assisted-by` tag in the commit message trailer in the
+following format:
+
+```
+Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+```
+
+Where:
+
+- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
+  `Copilot`, `Cursor`)
+- `MODEL_VERSION` — specific model version used (e.g.,
+  `claude-opus-4-7`, `gpt-5`)
+- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
+  agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
+
+Basic development tools (git, go, make, editors) should **not** be listed.
+
+### Example
+
+```
+fix(llama-cpp): handle empty tool call arguments
+
+Previously the parser panicked when the model returned a tool call with
+an empty arguments object. Fall back to an empty JSON object in that
+case so downstream consumers receive a valid payload.
+
+Assisted-by: Claude:claude-opus-4-7 golangci-lint
+Signed-off-by: Jane Developer <jane@example.com>
+```
+
+## Scope and Responsibility
+
+Using an AI assistant does not reduce the contributor's responsibility.
+The human submitter must:
+
+- Understand every line that lands in the PR
+- Verify that generated code compiles, passes tests, and follows the
+  project style
+- Confirm that any referenced APIs, flags, or file paths actually exist
+  in the current tree (AI models may hallucinate identifiers)
+- Not submit AI output verbatim without review
+
+Reviewers may ask for clarification on any change regardless of how it
+was produced. "An AI wrote it" is not an acceptable answer to a design
+question.
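
A minimal sketch, in Go, of how the `Assisted-by` trailer format described in the file above could be parsed and checked, for instance from a commit-msg hook. The `AssistedBy` type and `parseAssistedBy` function are hypothetical names used only for illustration and are not part of LocalAI; the sketch also assumes each tool is a single whitespace-separated token.

```go
package main

import (
	"fmt"
	"strings"
)

// AssistedBy is a hypothetical representation of one parsed trailer:
//   Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
type AssistedBy struct {
	Agent string
	Model string
	Tools []string
}

// parseAssistedBy parses a single commit-message line. It returns an error
// when the line is not an Assisted-by trailer or when the mandatory
// AGENT_NAME:MODEL_VERSION pair is missing or incomplete.
func parseAssistedBy(line string) (AssistedBy, error) {
	rest, ok := strings.CutPrefix(strings.TrimSpace(line), "Assisted-by:")
	if !ok {
		return AssistedBy{}, fmt.Errorf("not an Assisted-by trailer: %q", line)
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return AssistedBy{}, fmt.Errorf("missing AGENT_NAME:MODEL_VERSION in %q", line)
	}
	agent, model, found := strings.Cut(fields[0], ":")
	if !found || agent == "" || model == "" {
		return AssistedBy{}, fmt.Errorf("malformed AGENT_NAME:MODEL_VERSION in %q", line)
	}
	// Everything after the first field is treated as an optional tool name.
	return AssistedBy{Agent: agent, Model: model, Tools: fields[1:]}, nil
}

func main() {
	t, err := parseAssistedBy("Assisted-by: Claude:claude-opus-4-7 golangci-lint")
	fmt.Println(t.Agent, t.Model, t.Tools, err)
}
```

The check is purely syntactic. It cannot tell whether a listed tool is a "basic development tool", and it says nothing about who added a `Signed-off-by` line, so the human review described in the policy is still required.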
--- a/.agents/api-endpoints-and-auth.md
+++ b/.agents/api-endpoints-and-auth.md
@@ -2,6 +2,8 @@

 This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.

+> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
+
 ## Architecture overview

 Authentication and authorization flow through three layers:
@@ -234,6 +236,66 @@ Use these HTTP status codes:

 If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.

+## Advertising surfaces — where to register a new capability
+
+Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
+
+### 1. Swagger `@Tags` annotation (mandatory)
+
+Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
+
+```go
+// MyEndpoint does X.
+// @Summary Do X.
+// @Tags my-capability
+// @Param request body schema.MyRequest true "payload"
+// @Success 200 {object} schema.MyResponse "Response"
+// @Router /v1/my-endpoint [post]
+func MyEndpoint(...) echo.HandlerFunc { ... }
+```
+
+Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
+
+After adding endpoints, regenerate the embedded spec so the runtime serves it:
+
+```bash
+make protogen-go         # ensures gRPC codegen is fresh first
+make swagger             # regenerates swagger/swagger.json
+```
+
+### 2. `/api/instructions` registry (for new capability areas)
+
+`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
+
+**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
+
+Add an entry to `instructionDefs`:
+
+```go
+{
+    Name:        "my-capability",             // URL segment at /api/instructions/my-capability
+    Description: "Short sentence describing the capability",
+    Tags:        []string{"my-capability"},   // must match swagger @Tags
+    Intro:       "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
+},
+```
+
+Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
+
+### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
+
+If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
+
+```js
+export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
+```
+
+React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
+
+### 4. `docs/content/` (user-facing documentation)
+
+A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
+
 ## Path protection rules

 The global auth middleware classifies paths as API paths or non-API paths:
@@ -248,12 +310,23 @@ If you add endpoints under a new top-level path prefix, add it to `isAPIPath()`

 When adding a new endpoint:

+**Routing & auth**
 - [ ] Handler in `core/http/endpoints/`
 - [ ] Route registered in appropriate `core/http/routes/` file
 - [ ] Auth level chosen: public / standard / admin / feature-gated
- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
+- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
+- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
+- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
 - [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
- [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
 - [ ] If token-counting: `usageMiddleware` added to middleware chain
- [ ] Error responses use `schema.ErrorResponse` format
+
+**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
+- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
+- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
+- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
+- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
+
+**Quality**
+- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
 - [ ] Tests cover both authenticated and unauthenticated access
+- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation
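
The relationship between swagger `@Tags` and `instructionDefs` described in the guide above can be sketched as a small consistency check. This is an illustration only: the `instructionDef` type name and the `uncoveredTags` helper are assumptions made for the example (the diff shows only the field names `Name`, `Description`, `Tags`, and `Intro`), and the real registry in `api_instructions.go` may be shaped differently.

```go
package main

import (
	"fmt"
	"sort"
)

// instructionDef mirrors the fields shown in the guide; the type name is
// hypothetical.
type instructionDef struct {
	Name        string
	Description string
	Tags        []string
	Intro       string
}

// uncoveredTags returns, sorted and de-duplicated, every swagger tag that
// no instruction definition lists in its Tags slice. A non-empty result
// would mean a new capability area was added without a registry entry.
func uncoveredTags(defs []instructionDef, swaggerTags []string) []string {
	covered := make(map[string]bool)
	for _, d := range defs {
		for _, t := range d.Tags {
			covered[t] = true
		}
	}
	seen := make(map[string]bool)
	var missing []string
	for _, t := range swaggerTags {
		if !covered[t] && !seen[t] {
			seen[t] = true
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	defs := []instructionDef{
		{Name: "audio", Description: "Audio endpoints", Tags: []string{"audio"}},
	}
	// "my-capability" is used by a handler but has no registry entry yet.
	fmt.Println(uncoveredTags(defs, []string{"audio", "my-capability"}))
}
```

A check along these lines could sit next to the expected-length assertion in `api_instructions_test.go`, assuming the set of swagger tags can be collected from the generated spec.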
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -30,6 +30,7 @@ jobs:
      skip-drivers: ${{ matrix.skip-drivers }}
      context: ${{ matrix.context }}
      ubuntu-version: ${{ matrix.ubuntu-version }}
+      amdgpu-targets: ${{ matrix.amdgpu-targets }}
    secrets:
      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -710,6 +711,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "8"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-12-insightface'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "insightface"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'cublas'
            cuda-major-version: "12"
            cuda-minor-version: "8"
@@ -1623,19 +1637,6 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
-          - build-type: 'hipblas'
-            cuda-major-version: ""
-            cuda-minor-version: ""
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: '-gpu-rocm-hipblas-whisperx'
-            runs-on: 'bigger-runner'
-            base-image: "rocm/dev-ubuntu-24.04:7.2.1"
-            skip-drivers: 'false'
-            backend: "whisperx"
-            dockerfile: "./backend/Dockerfile.python"
-            context: "./"
-            ubuntu-version: '2404'
          - build-type: 'hipblas'
            cuda-major-version: ""
            cuda-minor-version: ""
@@ -2596,6 +2597,20 @@ jobs:
            dockerfile: "./backend/Dockerfile.golang"
            context: "./"
            ubuntu-version: '2404'
+          # kokoros (Rust TTS)
+          - build-type: ''
+            cuda-major-version: ""
+            cuda-minor-version: ""
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-cpu-kokoros'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "kokoros"
+            dockerfile: "./backend/Dockerfile.rust"
+            context: "./"
+            ubuntu-version: '2404'
          # local-store
          - build-type: ''
            cuda-major-version: ""
@@ -2624,6 +2639,20 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          # insightface (face recognition)
+          - build-type: ''
+            cuda-major-version: ""
+            cuda-minor-version: ""
+            platforms: 'linux/amd64,linux/arm64'
+            tag-latest: 'auto'
+            tag-suffix: '-cpu-insightface'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "insightface"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'intel'
            cuda-major-version: ""
            cuda-minor-version: ""
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -58,6 +58,11 @@ on:
        required: false
        default: '2204'
        type: string
+      amdgpu-targets:
+        description: 'AMD GPU targets for ROCm/HIP builds'
+        required: false
+        default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
+        type: string
    secrets:
      dockerUsername:
        required: false
@@ -214,6 +219,7 @@ jobs:
            BASE_IMAGE=${{ inputs.base-image }}
            BACKEND=${{ inputs.backend }}
            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=gha
@@ -235,6 +241,7 @@ jobs:
            BASE_IMAGE=${{ inputs.base-image }}
            BACKEND=${{ inputs.backend }}
            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=gha
--- a/.github/workflows/gallery-agent.yaml
+++ b/.github/workflows/gallery-agent.yaml
@@ -54,24 +54,41 @@ jobs:
          REPO: ${{ github.repository }}
          SEARCH: 'gallery agent in:title'
        run: |
-          # Walk open gallery-agent PRs and act on maintainer comments:
+          # Walk gallery-agent PRs and act on maintainer comments:
          #   /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
          #   /gallery-agent recreate  → close without label (next run may repropose)
          # Only comments from OWNER / MEMBER / COLLABORATOR are honored so
          # random users can't drive the bot.
+          #
+          # We scan both open PRs AND recently-closed PRs that don't already
+          # carry the blacklist label. This covers the common flow where a
+          # maintainer writes /gallery-agent blacklist and immediately clicks
+          # Close — without this, the next scheduled run wouldn't see the
+          # command (PR is already closed) and would repropose the model.
          gh label create gallery-agent/blacklisted \
            --repo "$REPO" --color ededed \
            --description "gallery-agent must not repropose this model" 2>/dev/null || true

-          prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
+          prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
+            --json number --jq '.[].number')
+          # Closed PRs from the last 14 days that don't yet have the blacklist label.
+          # Bounded window keeps the scan cheap while covering late-applied commands.
+          since=$(date -u -d '14 days ago' +%Y-%m-%d)
+          prs_closed=$(gh pr list --repo "$REPO" --state closed \
+            --search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
+            --json number --jq '.[].number')
+          prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
          for pr in $prs; do
+            state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
            cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
              --jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
            if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
-              echo "PR #$pr: blacklist command found"
+              echo "PR #$pr: blacklist command found (state=$state)"
              gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
-              gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
-            elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
+              if [ "$state" = "OPEN" ]; then
+                gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
+              fi
+            elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
              echo "PR #$pr: recreate command found"
              gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
            fi
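
The decision logic in the updated script can be sketched in Go as follows. This is a minimal model of the shell branches above, not code from the repository: the `action` type and `decide` function are hypothetical names, and `comments` stands in for the concatenated bodies of maintainer comments. The `(?m)` flag is added so that `^` and `$` match at line boundaries, which approximates grep's line-by-line matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same patterns as the grep -E expressions in the workflow, with (?m) so
// that ^ and $ apply per line of a multi-line comment body.
var (
	blacklistRe = regexp.MustCompile(`(?m)(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)`)
	recreateRe  = regexp.MustCompile(`(?m)(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)`)
)

// action describes what the workflow would do to one PR.
type action struct {
	AddLabel bool // add the gallery-agent/blacklisted label
	Close    bool // close the PR with a comment
}

// decide mirrors the if/elif chain: blacklist wins over recreate and is
// honored even on closed PRs (label only); recreate applies to open PRs only.
func decide(state, comments string) action {
	switch {
	case blacklistRe.MatchString(comments):
		return action{AddLabel: true, Close: state == "OPEN"}
	case state == "OPEN" && recreateRe.MatchString(comments):
		return action{Close: true}
	}
	return action{}
}

func main() {
	fmt.Printf("%+v\n", decide("CLOSED", "/gallery-agent blacklist"))
	fmt.Printf("%+v\n", decide("OPEN", "/gallery-agent recreate"))
}
```

Modelling the branches as a pure function makes the new behavior easy to see: a closed PR with a blacklist command is labeled but not closed again, and a recreate command on an already-closed PR is a no-op.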
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -38,6 +38,7 @@ jobs:
      qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
      voxtral: ${{ steps.detect.outputs.voxtral }}
      kokoros: ${{ steps.detect.outputs.kokoros }}
+      insightface: ${{ steps.detect.outputs.insightface }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
@@ -751,3 +752,29 @@ jobs:
      - name: Test kokoros
        run: |
          make -C backend/rust/kokoros test
+  tests-insightface-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+              make build-essential curl unzip ca-certificates git tar
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.26.0'
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+          df -h
+      - name: Build insightface backend image and run both model configurations
+        run: |
+          make test-extra-backend-insightface-all
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,11 +1,23 @@
 # LocalAI Agent Instructions

-This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
+This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
+
+Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
+
+## Policy for AI-Assisted Contributions
+
+LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
+
+- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
+- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
+- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
+- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.

 ## Topics

 | File | When to read |
 |------|-------------|
+| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
 | [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
 | [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
 | [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
@@ -22,5 +34,6 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
 - **Go style**: Prefer `any` over `interface{}`
 - **Comments**: Explain *why*, not *what*
 - **Docs**: Update `docs/content/` when adding features or changing config
+- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
 - **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
 - **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -13,6 +13,7 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
  - [Development Workflow](#development-workflow)
  - [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
 - [Coding Guidelines](#coding-guidelines)
+- [AI Coding Assistants](#ai-coding-assistants)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Community and Communication](#community-and-communication)
@@ -185,7 +186,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to

 This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.

-For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
+For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.

 ### General Principles

@@ -211,6 +212,26 @@ For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific gui
 - Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
 - Be responsive to review feedback and keep discussions constructive.

+## AI Coding Assistants
+
+LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
+
+The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
+
+- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
+- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
+- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
+
+  ```
+  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+  ```
+
+  Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
+
+  Basic development tools (git, go, make, editors) should not be listed.
+- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
+- Contributions must remain compatible with LocalAI's **MIT License**.
+
 ## Testing

 All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad

 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -434,6 +434,7 @@ prepare-test-extra: protogen-python
 	$(MAKE) -C backend/python/ace-step
 	$(MAKE) -C backend/python/trl
 	$(MAKE) -C backend/python/tinygrad
+	$(MAKE) -C backend/python/insightface
 	$(MAKE) -C backend/rust/kokoros kokoros-grpc

 test-extra: prepare-test-extra
@@ -457,6 +458,7 @@ test-extra: prepare-test-extra
 	$(MAKE) -C backend/python/ace-step test
 	$(MAKE) -C backend/python/trl test
 	$(MAKE) -C backend/python/tinygrad test
+	$(MAKE) -C backend/python/insightface test
 	$(MAKE) -C backend/rust/kokoros test

 ##
@@ -507,6 +509,13 @@ test-extra-backend: protogen-go
 	BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
 	BACKEND_TEST_CACHE_TYPE_K="$$BACKEND_TEST_CACHE_TYPE_K" \
 	BACKEND_TEST_CACHE_TYPE_V="$$BACKEND_TEST_CACHE_TYPE_V" \
+	BACKEND_TEST_FACE_IMAGE_1_URL="$$BACKEND_TEST_FACE_IMAGE_1_URL" \
+	BACKEND_TEST_FACE_IMAGE_1_FILE="$$BACKEND_TEST_FACE_IMAGE_1_FILE" \
+	BACKEND_TEST_FACE_IMAGE_2_URL="$$BACKEND_TEST_FACE_IMAGE_2_URL" \
+	BACKEND_TEST_FACE_IMAGE_2_FILE="$$BACKEND_TEST_FACE_IMAGE_2_FILE" \
+	BACKEND_TEST_FACE_IMAGE_3_URL="$$BACKEND_TEST_FACE_IMAGE_3_URL" \
+	BACKEND_TEST_FACE_IMAGE_3_FILE="$$BACKEND_TEST_FACE_IMAGE_3_FILE" \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING="$$BACKEND_TEST_VERIFY_DISTANCE_CEILING" \
 	go test -v -timeout 30m ./tests/e2e-backends/...

 ## Convenience wrappers: build the image, then exercise it.
@@ -603,6 +612,107 @@ test-extra-backend-tinygrad-all: \
 	test-extra-backend-tinygrad-sd \
 	test-extra-backend-tinygrad-whisper

+## insightface — face recognition.
+##
+## Face fixtures default to the sample images shipped in the
+## deepinsight/insightface repository (MIT-licensed). For offline/local
+## runs override with BACKEND_TEST_FACE_IMAGE_{1,2,3}_FILE pointing at
+## local paths.
+FACE_IMAGE_1_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_2_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_3_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/mask_white.jpg
+
+## Host-side cache for the OpenCV Zoo face ONNX files used by the
+## opencv e2e target. The backend image no longer bakes model weights —
+## gallery installs bring them via `files:` — but the e2e suite drives
+## LoadModel over gRPC directly without going through the gallery. We
+## pre-download the ONNX files to a stable host path and pass absolute
+## paths in BACKEND_TEST_OPTIONS; `make` skips the downloads when the
+## SHA-256 already matches.
+INSIGHTFACE_OPENCV_DIR := /tmp/localai-insightface-opencv-cache
+INSIGHTFACE_OPENCV_YUNET_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
+INSIGHTFACE_OPENCV_SFACE_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
+INSIGHTFACE_OPENCV_YUNET_SHA := 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
+INSIGHTFACE_OPENCV_SFACE_SHA := 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
+
+## buffalo_sc (insightface) — pack zip + SHA-256 mirrors the gallery
+## entry so the e2e target matches exactly what `local-ai models install
+## insightface-buffalo-sc` would have fetched. Smallest insightface pack
+## (~16MB) — keeps CI fast while still covering the insightface engine
+## code path end-to-end.
+INSIGHTFACE_BUFFALO_SC_DIR := /tmp/localai-insightface-buffalo-sc-cache
+INSIGHTFACE_BUFFALO_SC_URL := https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
+INSIGHTFACE_BUFFALO_SC_SHA := 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
+
+.PHONY: insightface-opencv-models
+insightface-opencv-models:
+	@mkdir -p $(INSIGHTFACE_OPENCV_DIR)
+	@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_YUNET_SHA)" ]; then \
+		echo "Fetching YuNet..."; \
+		curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx $(INSIGHTFACE_OPENCV_YUNET_URL); \
+		echo "$(INSIGHTFACE_OPENCV_YUNET_SHA)  $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx" | sha256sum -c; \
+	fi
+	@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/sface.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_SFACE_SHA)" ]; then \
+		echo "Fetching SFace..."; \
+		curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/sface.onnx $(INSIGHTFACE_OPENCV_SFACE_URL); \
+		echo "$(INSIGHTFACE_OPENCV_SFACE_SHA)  $(INSIGHTFACE_OPENCV_DIR)/sface.onnx" | sha256sum -c; \
+	fi
+
+.PHONY: insightface-buffalo-sc-models
+insightface-buffalo-sc-models:
+	@mkdir -p $(INSIGHTFACE_BUFFALO_SC_DIR)
+	@if [ "$$(sha256sum $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_BUFFALO_SC_SHA)" ]; then \
+		echo "Fetching buffalo_sc..."; \
+		curl -fsSL -o $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip $(INSIGHTFACE_BUFFALO_SC_URL); \
+		echo "$(INSIGHTFACE_BUFFALO_SC_SHA)  $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip" | sha256sum -c; \
+		rm -f $(INSIGHTFACE_BUFFALO_SC_DIR)/*.onnx; \
+	fi
+	@if [ ! -f "$(INSIGHTFACE_BUFFALO_SC_DIR)/det_500m.onnx" ]; then \
+		echo "Extracting buffalo_sc..."; \
+		unzip -o -q $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip -d $(INSIGHTFACE_BUFFALO_SC_DIR); \
+	fi
+
+## buffalo_sc — smallest insightface pack (SCRFD-500MF detector + MBF
+## recognizer, ~16MB). Exercises the insightface engine code path
+## (model_zoo-backed inference) without the ~326MB buffalo_l download.
+## No age/gender/landmark heads — face_analyze is dropped from caps.
+## The pack is pre-fetched on the host and passed as `root:<dir>` since
+## the e2e suite drives LoadModel directly without going through
+## LocalAI's gallery flow (which is what would normally populate
+## ModelPath and in turn the engine's `_model_dir` option).
+test-extra-backend-insightface-buffalo-sc: docker-build-insightface insightface-buffalo-sc-models
+	BACKEND_IMAGE=local-ai-backend:insightface \
+	BACKEND_TEST_MODEL_NAME=insightface-buffalo-sc \
+	BACKEND_TEST_OPTIONS=engine:insightface,model_pack:buffalo_sc,root:$(INSIGHTFACE_BUFFALO_SC_DIR) \
+	BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
+	BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
+	BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
+	BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
+	$(MAKE) test-extra-backend
+
+## OpenCV Zoo YuNet + SFace — Apache 2.0, commercial-safe. face_analyze
+## cap is dropped (SFace has no demographic head). The ONNX files are
+## pre-fetched on the host via the insightface-opencv-models target and
+## passed as absolute paths, since the e2e suite drives LoadModel
+## directly without going through LocalAI's gallery flow.
+test-extra-backend-insightface-opencv: docker-build-insightface insightface-opencv-models
+	BACKEND_IMAGE=local-ai-backend:insightface \
+	BACKEND_TEST_MODEL_NAME=insightface-opencv \
+	BACKEND_TEST_OPTIONS=engine:onnx_direct,detector_onnx:$(INSIGHTFACE_OPENCV_DIR)/yunet.onnx,recognizer_onnx:$(INSIGHTFACE_OPENCV_DIR)/sface.onnx \
+	BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
+	BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
+	BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
+	BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
+	BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
+	$(MAKE) test-extra-backend
+
+## Aggregate — runs both face-recognition model configurations so CI
+## catches regressions across engines together.
+test-extra-backend-insightface-all: \
+	test-extra-backend-insightface-buffalo-sc \
+	test-extra-backend-insightface-opencv
+
 ## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
 ## tool-call extraction via sglang's native qwen parser. CPU builds use
 ## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
@@ -748,6 +858,7 @@ BACKEND_OUTETTS = outetts|python|.|false|true
 BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
 BACKEND_COQUI = coqui|python|.|false|true
 BACKEND_RFDETR = rfdetr|python|.|false|true
+BACKEND_INSIGHTFACE = insightface|python|.|false|true
 BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
 BACKEND_NEUTTS = neutts|python|.|false|true
 BACKEND_KOKORO = kokoro|python|.|false|true
@@ -819,6 +930,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
 $(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
 $(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
+$(eval $(call generate-docker-build-target,$(BACKEND_INSIGHTFACE)))
 $(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
@@ -853,7 +965,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
 docker-save-%: backend-images
 	docker save local-ai-backend:$* -o backend-images/$*.tar

-docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp
+docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface

 ########################################################
 ### Mock Backend for E2E Tests
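
Review note on the `insightface-opencv-models` and `insightface-buffalo-sc-models` targets in the Makefile hunk above: both follow the same idiom, which is to hash the cached file, download only on a mismatch, and verify again after the download. The Python sketch below restates only the "is the cache still good?" check for readers less comfortable with Make and shell quoting. The helper names `sha256_of` and `needs_fetch` are illustrative and do not exist in the repository.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str | None:
    """Return the hex SHA-256 of a file, or None when the file is missing."""
    if not path.is_file():
        return None
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large model files are not loaded whole.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_fetch(path: Path, expected_sha: str) -> bool:
    """True when the cached file is absent or its digest differs."""
    return sha256_of(path) != expected_sha
```

A missing file and a corrupted file take the same branch, which mirrors the `2>/dev/null` on the `sha256sum` call: an unreadable cache entry simply compares unequal to the pinned digest and triggers a fresh download.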
--- a/backend/backend.proto
+++ b/backend/backend.proto
@@ -24,6 +24,8 @@ service Backend {
  rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
  rpc Status(HealthMessage) returns (StatusResponse) {}
  rpc Detect(DetectOptions) returns (DetectResponse) {}
+  rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
+  rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}

  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
@@ -475,6 +477,57 @@ message DetectResponse {
  repeated Detection Detections = 1;
 }

+// --- Face recognition messages ---
+
+message FacialArea {
+  float x = 1;
+  float y = 2;
+  float w = 3;
+  float h = 4;
+}
+
+message FaceVerifyRequest {
+  string img1 = 1;              // base64-encoded image
+  string img2 = 2;              // base64-encoded image
+  float  threshold = 3;         // cosine-distance threshold; 0 = use backend default
+  bool   anti_spoofing = 4;     // reserved for future MiniFASNet bolt-on
+}
+
+message FaceVerifyResponse {
+  bool       verified = 1;
+  float      distance = 2;      // 1 - cosine_similarity
+  float      threshold = 3;
+  float      confidence = 4;    // 0-100
+  string     model = 5;         // e.g. "buffalo_l"
+  FacialArea img1_area = 6;
+  FacialArea img2_area = 7;
+  float      processing_time_ms = 8;
+}
+
+message FaceAnalyzeRequest {
+  string          img = 1;          // base64-encoded image
+  repeated string actions = 2;      // subset of ["age","gender","emotion","race"]; empty = all-supported
+  bool            anti_spoofing = 3;
+}
+
+message FaceAnalysis {
+  FacialArea         region = 1;
+  float              face_confidence = 2;
+  float              age = 3;
+  string             dominant_gender = 4;   // "Man" | "Woman"
+  map<string, float> gender = 5;
+  string             dominant_emotion = 6;  // reserved; empty in MVP
+  map<string, float> emotion = 7;
+  string             dominant_race = 8;     // not populated
+  map<string, float> race = 9;
+  bool               is_real = 10;          // anti-spoofing result when enabled
+  float              antispoof_score = 11;
+}
+
+message FaceAnalyzeResponse {
+  repeated FaceAnalysis faces = 1;
+}
+
 message ToolFormatMarkers {
  string format_type = 1;           // "json_native", "tag_with_json", "tag_with_tagged"

--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
+IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/ik-llama-cpp/grpc-server.cpp
+++ b/backend/cpp/ik-llama-cpp/grpc-server.cpp
@@ -326,7 +326,7 @@ struct llama_client_slot
       char buffer[512];
        double t_token = t_prompt_processing / num_prompt_tokens_processed;
        double n_tokens_second = 1e3 / t_prompt_processing * num_prompt_tokens_processed;
-        sprintf(buffer, "prompt eval time     = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
+        snprintf(buffer, sizeof(buffer), "prompt eval time     = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
                t_prompt_processing, num_prompt_tokens_processed,
                t_token, n_tokens_second);
        LOG_INFO(buffer, {
@@ -340,7 +340,7 @@ struct llama_client_slot

        t_token = t_token_generation / n_decoded;
        n_tokens_second = 1e3 / t_token_generation * n_decoded;
-        sprintf(buffer, "generation eval time = %10.2f ms / %5d runs   (%8.2f ms per token, %8.2f tokens per second)",
+        snprintf(buffer, sizeof(buffer), "generation eval time = %10.2f ms / %5d runs   (%8.2f ms per token, %8.2f tokens per second)",
                t_token_generation, n_decoded,
                t_token, n_tokens_second);
        LOG_INFO(buffer, {
@@ -352,7 +352,7 @@ struct llama_client_slot
            {"n_tokens_second",    n_tokens_second},
        });

-        sprintf(buffer, "          total time = %10.2f ms", t_prompt_processing + t_token_generation);
+        snprintf(buffer, sizeof(buffer), "          total time = %10.2f ms", t_prompt_processing + t_token_generation);
        LOG_INFO(buffer, {
            {"slot_id",             id},
            {"task_id",             task_id},
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
+LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/turboquant/Makefile
+++ b/backend/cpp/turboquant/Makefile
@@ -1,7 +1,7 @@

-# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
+# Pinned to the HEAD of rebase/upstream-sync-april-2026 on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+TURBOQUANT_VERSION?=7f320bb89f68096240a517783674cc17c66b7ad2
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant

 CMAKE_ARGS?=
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d
+STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/stablediffusion-ggml/gosd.cpp
+++ b/backend/go/stablediffusion-ggml/gosd.cpp
@@ -1106,6 +1106,11 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
            const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
            const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
            const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
+            // Force MP4 container. Distributed LocalAI hands us a staging
+            // path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
+            // extension; relying on filename suffix makes ffmpeg bail with
+            // "Unable to choose an output format".
+            const_cast<char*>("-f"), const_cast<char*>("mp4"),
            const_cast<char*>(dst),
            nullptr
        };
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
+WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -168,6 +168,43 @@
    nvidia-cuda-13: "cuda13-rfdetr"
    nvidia-cuda-12: "cuda12-rfdetr"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-rfdetr"
+- &insightface
+  name: "insightface"
+  alias: "insightface"
+  # Upstream insightface library is MIT. The pretrained model packs
+  # (buffalo_l, buffalo_s, antelopev2) are released for NON-COMMERCIAL
+  # research use only. OpenCV Zoo YuNet + SFace (Apache 2.0) cover
+  # commercial use; the gallery fetches all weights. Pick the engine
+  # via model-gallery entries (insightface-buffalo-l / insightface-opencv
+  # / insightface-buffalo-s) or set `options` in your model YAML.
+  license: "mixed"
+  description: |
+    Face recognition backend powered by `insightface` (ONNX Runtime).
+    Provides face verification (/v1/face/verify), face analysis
+    (/v1/face/analyze), face embedding (/v1/embeddings), face
+    detection (/v1/detection), and 1:N identification
+    (/v1/face/{register,identify,forget}).
+    Ships two engines in a single image: one that drives the insightface
+    model packs (buffalo_l/s/m/sc, antelopev2 — non-commercial research
+    use only) and one that drives OpenCV Zoo's YuNet + SFace pair
+    (Apache 2.0 — commercial-safe). Select via `options: ["engine:..."]`
+    in your model YAML, or install one of the ready-made model-gallery
+    entries under the `insightface-*` prefix.
+    The backend image contains only code and Python deps; all model
+    weights are managed by LocalAI's gallery download mechanism.
+  urls:
+    - https://github.com/deepinsight/insightface
+    - https://github.com/opencv/opencv_zoo
+  tags:
+    - face-recognition
+    - face-verification
+    - face-embedding
+    - gpu
+    - cpu
+  capabilities:
+    default: "cpu-insightface"
+    nvidia: "cuda12-insightface"
+    nvidia-cuda-12: "cuda12-insightface"
 - &sam3cpp
  name: "sam3-cpp"
  alias: "sam3-cpp"
@@ -587,7 +624,6 @@
  alias: "whisperx"
  capabilities:
    nvidia: "cuda12-whisperx"
-    amd: "rocm-whisperx"
    metal: "metal-whisperx"
    default: "cpu-whisperx"
    nvidia-cuda-13: "cuda13-whisperx"
@@ -2745,7 +2781,6 @@
  name: "whisperx-development"
  capabilities:
    nvidia: "cuda12-whisperx-development"
-    amd: "rocm-whisperx-development"
    metal: "metal-whisperx-development"
    default: "cpu-whisperx-development"
    nvidia-cuda-13: "cuda13-whisperx-development"
@@ -2771,16 +2806,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
- !!merge <<: *whisperx
-  name: "rocm-whisperx"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
-  mirrors:
-    - localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
-  name: "rocm-whisperx-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
-  mirrors:
-    - localai/localai-backends:master-gpu-rocm-hipblas-whisperx
 - !!merge <<: *whisperx
  name: "cuda13-whisperx"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
@@ -3721,3 +3746,30 @@
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization
+# insightface (face recognition) — development and concrete image entries
+- !!merge <<: *insightface
+  name: "insightface-development"
+  capabilities:
+    default: "cpu-insightface-development"
+    nvidia: "cuda12-insightface-development"
+    nvidia-cuda-12: "cuda12-insightface-development"
+- !!merge <<: *insightface
+  name: "cpu-insightface"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface"
+  mirrors:
+    - localai/localai-backends:latest-cpu-insightface
+- !!merge <<: *insightface
+  name: "cuda12-insightface"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-insightface"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-insightface
+- !!merge <<: *insightface
+  name: "cpu-insightface-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-insightface"
+  mirrors:
+    - localai/localai-backends:master-cpu-insightface
+- !!merge <<: *insightface
+  name: "cuda12-insightface-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-insightface"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-insightface
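
Review note on the `!!merge <<: *insightface` entries above: a YAML merge key copies the anchor's keys into the entry, and any key the entry sets itself replaces the anchor's value outright. Nested maps are not deep-merged, which is presumably why `insightface-development` restates all three `capabilities` keys. The sketch below imitates that behaviour with plain dictionaries; `ANCHOR` is a trimmed stand-in and `merge_entry` is a hypothetical helper, neither of which is LocalAI code.

```python
# Trimmed, illustrative stand-in for the `&insightface` anchor.
ANCHOR = {
    "name": "insightface",
    "license": "mixed",
    "capabilities": {
        "default": "cpu-insightface",
        "nvidia": "cuda12-insightface",
        "nvidia-cuda-12": "cuda12-insightface",
    },
}


def merge_entry(anchor: dict, entry: dict) -> dict:
    """Imitate a YAML merge key: the entry's own keys win, shallowly."""
    return {**anchor, **entry}


# A concrete image entry adds name/uri and inherits everything else.
concrete = merge_entry(ANCHOR, {
    "name": "cpu-insightface",
    "uri": "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface",
})

# The development meta entry replaces the whole capabilities map.
development = merge_entry(ANCHOR, {
    "name": "insightface-development",
    "capabilities": {
        "default": "cpu-insightface-development",
        "nvidia": "cuda12-insightface-development",
        "nvidia-cuda-12": "cuda12-insightface-development",
    },
})
```

Because the merge is shallow, an entry that overrode only `capabilities.default` would lose the `nvidia` keys rather than inherit them.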
--- a/backend/python/insightface/Makefile
+++ b/backend/python/insightface/Makefile
@@ -0,0 +1,13 @@
+.DEFAULT_GOAL := install
+
+.PHONY: install
+install:
+	bash install.sh
+
+.PHONY: protogen-clean
+protogen-clean:
+	$(RM) backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: clean
+clean: protogen-clean
+	rm -rf venv __pycache__
--- a/backend/python/insightface/README.md
+++ b/backend/python/insightface/README.md
@@ -0,0 +1,67 @@
+# insightface backend (LocalAI)
+
+Face recognition backend backed by ONNX Runtime. Provides face
+verification (1:1), face analysis (age/gender), face detection, face
+embedding, and — via LocalAI's built-in vector store — 1:N
+identification.
+
+## Engines
+
+This backend ships with **two** interchangeable engines selected via
+`LoadModel.Options["engine"]`:
+
+| engine | Implementation | Models | License |
+|---|---|---|---|
+| `insightface` (default) | insightface `model_zoo` | `buffalo_l`, `buffalo_s`, `buffalo_sc`, `antelopev2` | **Non-commercial research use only** |
+| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |
+
+Both engines implement the same `FaceEngine` protocol in `engines.py`,
+so the gRPC servicer in `backend.py` doesn't need to know which one is
+active.
+
+## LoadModel options
+
+Common:
+
+| option | default | description |
+|---|---|---|
+| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
+| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
+| `det_thresh` | `0.5` | detector confidence threshold |
+| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
+
+`insightface` engine:
+
+| option | default | description |
+|---|---|---|
+| `model_pack` | `buffalo_l` | which insightface pack to load |
+
+`onnx_direct` engine:
+
+| option | default | description |
+|---|---|---|
+| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
+| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
+
+## Adding a new model pack
+
+1. If it's an insightface pack (auto-downloadable or manually extracted
+   into `~/.insightface/models/<name>/`), just add a new gallery entry
+   in `backend/index.yaml` with `options: ["engine:insightface",
+   "model_pack:<name>"]`. No code change.
+2. If it's an Apache-licensed ONNX pair, add a gallery entry with
+   `options: ["engine:onnx_direct", "detector_onnx:...",
+   "recognizer_onnx:..."]`. If the detector or recognizer has a
+   different input-tensor shape than YuNet/SFace, you may need a new
+   engine implementation in `engines.py`; the two-engine seam makes
+   that a self-contained change.
+
+## Running tests locally
+
+```bash
+make -C backend/python/insightface         # install deps (weights come from the gallery)
+make -C backend/python/insightface test    # run test.py
+```
+
+The OpenCV Zoo tests skip gracefully when `/models/opencv/*.onnx` is
+absent (e.g. on dev boxes where `install.sh` wasn't run).
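
Review note on the "LoadModel options" tables in the README above: options reach the backend as `key:value` strings and are split on the first colon only, so values such as file paths may themselves contain colons. The sketch below copies the logic of `_parse_options` from `backend.py` in this PR into a standalone function. The name `parse_options` and the sample option list are illustrative; the `root:C:/models` value in particular is a made-up example.

```python
def parse_options(raw: list[str]) -> dict[str, str]:
    """Turn ["key:value", ...] into a dict, splitting on the first colon."""
    out: dict[str, str] = {}
    for entry in raw:
        if ":" not in entry:
            # Entries without a colon are silently ignored.
            continue
        key, value = entry.split(":", 1)
        out[key.strip()] = value.strip()
    return out


opts = parse_options([
    "engine:onnx_direct",
    "detector_onnx:/tmp/localai-insightface-opencv-cache/yunet.onnx",
    "det_size: 320x320",
    "root:C:/models",   # value keeps its own colon
    "verbose",          # no colon, dropped
])
```

Two consequences are worth keeping in mind when writing a model YAML: a later duplicate key overwrites an earlier one, and a bare flag without a colon has no effect at all.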
--- a/backend/python/insightface/backend.py
+++ b/backend/python/insightface/backend.py
@@ -0,0 +1,265 @@
+#!/usr/bin/env python3
+"""gRPC server for the insightface face recognition backend.
+
+Implements Health / LoadModel / Status plus the face-specific methods:
+Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
+delegated to engines.py — this file is just the gRPC plumbing.
+"""
+import argparse
+import base64
+import os
+import signal
+import sys
+import time
+from concurrent import futures
+from io import BytesIO
+
+import backend_pb2
+import backend_pb2_grpc
+import cv2
+import grpc
+import numpy as np
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
+from grpc_auth import get_auth_interceptors  # noqa: E402
+
+from engines import FaceEngine, build_engine  # noqa: E402
+
+_ONE_DAY = 60 * 60 * 24
+MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
+
+# Default cosine-distance threshold for "same person" on buffalo_l
+# ArcFace R50. Clients can override per-request; clients using SFace
+# should pass threshold≈0.4 since the distance distribution is wider.
+DEFAULT_VERIFY_THRESHOLD = 0.35
+
+
+def _decode_image(src: str) -> np.ndarray | None:
+    """Decode a base64-encoded image into an OpenCV BGR numpy array."""
+    if not src:
+        return None
+    try:
+        data = base64.b64decode(src, validate=False)
+    except Exception:
+        return None
+    arr = np.frombuffer(data, dtype=np.uint8)
+    if arr.size == 0:
+        return None
+    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
+    return img
+
+
+def _parse_options(raw: list[str]) -> dict[str, str]:
+    out: dict[str, str] = {}
+    for entry in raw:
+        if ":" not in entry:
+            continue
+        k, v = entry.split(":", 1)
+        out[k.strip()] = v.strip()
+    return out
+
+
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+    def __init__(self) -> None:
+        self.engine: FaceEngine | None = None
+        self.engine_name: str = ""
+        self.model_name: str = ""
+        self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
+
+    def Health(self, request, context):
+        return backend_pb2.Reply(message=bytes("OK", "utf-8"))
+
+    def LoadModel(self, request, context):
+        options = _parse_options(list(request.Options))
+        # Surface LocalAI's models directory (ModelPath) so engines can
+        # anchor relative paths — OnnxDirectEngine's detector_onnx /
+        # recognizer_onnx point at gallery-managed files that LocalAI
+        # dropped there, and InsightFaceEngine auto-downloads its packs
+        # into that same directory alongside every other managed model.
+        # Private key to avoid clashing with user-provided options.
+        if request.ModelPath:
+            options["_model_dir"] = request.ModelPath
+
+        engine_name = options.get("engine", "insightface")
+        try:
+            self.engine = build_engine(engine_name)
+            self.engine.prepare(options)
+        except Exception as err:  # pragma: no cover - exercised via e2e
+            return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
+
+        self.engine_name = engine_name
+        self.model_name = request.Model or options.get("model_pack", "")
+        if "verify_threshold" in options:
+            try:
+                self.verify_threshold = float(options["verify_threshold"])
+            except ValueError:
+                pass
+        print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
+        return backend_pb2.Result(success=True, message="Model loaded successfully")
+
+    def Status(self, request, context):
+        state = (
+            backend_pb2.StatusResponse.READY
+            if self.engine is not None
+            else backend_pb2.StatusResponse.UNINITIALIZED
+        )
+        return backend_pb2.StatusResponse(state=state)
+
+    def Embedding(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.EmbeddingResult()
+        if not request.Images:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("Embedding requires Images[0] to be a base64 image")
+            return backend_pb2.EmbeddingResult()
+
+        img = _decode_image(request.Images[0])
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.EmbeddingResult()
+
+        vec = self.engine.embed(img)
+        if vec is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected")
+            return backend_pb2.EmbeddingResult()
+        return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
+
+    def Detect(self, request, context):
+        if self.engine is None:
+            return backend_pb2.DetectResponse()
+        img = _decode_image(request.src)
+        if img is None:
+            return backend_pb2.DetectResponse()
+        detections = []
+        for d in self.engine.detect(img):
+            x1, y1, x2, y2 = d.bbox
+            detections.append(
+                backend_pb2.Detection(
+                    x=float(x1),
+                    y=float(y1),
+                    width=float(x2 - x1),
+                    height=float(y2 - y1),
+                    confidence=float(d.score),
+                    class_name="face",
+                )
+            )
+        return backend_pb2.DetectResponse(Detections=detections)
+
+    def FaceVerify(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceVerifyResponse()
+
+        img1 = _decode_image(request.img1)
+        img2 = _decode_image(request.img2)
+        if img1 is None or img2 is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        threshold = request.threshold if request.threshold > 0 else self.verify_threshold
+
+        start = time.time()
+        e1 = self.engine.embed(img1)
+        e2 = self.engine.embed(img2)
+        if e1 is None or e2 is None:
+            context.set_code(grpc.StatusCode.NOT_FOUND)
+            context.set_details("no face detected in one or both images")
+            return backend_pb2.FaceVerifyResponse()
+
+        # Both engines return L2-normalized vectors, so the dot product
+        # is the cosine similarity directly.
+        sim = float(np.dot(e1, e2))
+        distance = 1.0 - sim
+        verified = distance < threshold
+        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
+
+        def _region(img) -> backend_pb2.FacialArea:
+            dets = self.engine.detect(img)
+            if not dets:
+                return backend_pb2.FacialArea()
+            best = max(dets, key=lambda d: d.score)
+            x1, y1, x2, y2 = best.bbox
+            return backend_pb2.FacialArea(x=float(x1), y=float(y1), w=float(x2 - x1), h=float(y2 - y1))
+
+        return backend_pb2.FaceVerifyResponse(
+            verified=verified,
+            distance=float(distance),
+            threshold=float(threshold),
+            confidence=float(confidence),
+            model=self.model_name or self.engine_name,
+            img1_area=_region(img1),
+            img2_area=_region(img2),
+            processing_time_ms=float((time.time() - start) * 1000.0),
+        )
+
+    def FaceAnalyze(self, request, context):
+        if self.engine is None:
+            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
+            context.set_details("face model not loaded")
+            return backend_pb2.FaceAnalyzeResponse()
+        img = _decode_image(request.img)
+        if img is None:
+            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
+            context.set_details("failed to decode image")
+            return backend_pb2.FaceAnalyzeResponse()
+
+        faces = []
+        for attrs in self.engine.analyze(img):
+            x, y, w, h = attrs.region
+            fa = backend_pb2.FaceAnalysis(
+                region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
+                face_confidence=float(attrs.face_confidence),
+            )
+            if attrs.age is not None:
+                fa.age = float(attrs.age)
+            if attrs.dominant_gender:
+                fa.dominant_gender = attrs.dominant_gender
+            for k, v in attrs.gender.items():
+                fa.gender[k] = float(v)
+            faces.append(fa)
+        return backend_pb2.FaceAnalyzeResponse(faces=faces)
+
+
+def serve(address: str) -> None:
+    server = grpc.server(
+        futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
+        options=[
+            ("grpc.max_message_length", 50 * 1024 * 1024),
+            ("grpc.max_send_message_length", 50 * 1024 * 1024),
+            ("grpc.max_receive_message_length", 50 * 1024 * 1024),
+        ],
+        interceptors=get_auth_interceptors(),
+    )
+    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+    server.add_insecure_port(address)
+    server.start()
+    print("[insightface] Server started. Listening on: " + address, file=sys.stderr)
+
+    def _stop(sig, frame):  # pragma: no cover
+        print("[insightface] shutting down")
+        server.stop(0)
+        sys.exit(0)
+
+    signal.signal(signal.SIGINT, _stop)
+    signal.signal(signal.SIGTERM, _stop)
+
+    try:
+        while True:
+            time.sleep(_ONE_DAY)
+    except KeyboardInterrupt:
+        server.stop(0)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
+    parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
+    args = parser.parse_args()
+    print(f"[insightface] startup: {args}", file=sys.stderr)
+    serve(args.addr)
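
Review note on `FaceVerify` in `backend.py` above: the decision reduces to three lines of arithmetic on two L2-normalized embeddings. The sketch below reproduces that arithmetic in pure Python, without numpy or gRPC, so the threshold and confidence behaviour can be checked in isolation. The function names `l2_normalize` and `verify` are illustrative, and the sketch assumes non-zero input vectors.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def verify(e1: list[float], e2: list[float], threshold: float = 0.35):
    """Return (verified, distance, confidence) as FaceVerify computes them.

    Both inputs are assumed to be L2-normalized, so the dot product is
    the cosine similarity.
    """
    sim = sum(a * b for a, b in zip(e1, e2))
    distance = 1.0 - sim
    verified = distance < threshold
    if threshold > 0:
        # Linear ramp: 100 at distance 0, 0 at the threshold, clamped.
        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0))
    else:
        confidence = 0.0
    return verified, distance, confidence


same = verify(l2_normalize([3.0, 4.0]), l2_normalize([3.0, 4.0]))
different = verify(l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0]))
```

One property follows directly from the clamp: confidence reaches 0 exactly at the threshold, so every rejected pair reports a confidence of 0 and only accepted pairs carry a graded score.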
--- a/backend/python/insightface/engines.py
+++ b/backend/python/insightface/engines.py
@@ -0,0 +1,382 @@
+"""Face recognition engine implementations for the LocalAI insightface backend.
+
+Two engines are provided:
+
+    * InsightFaceEngine  - drives insightface's model_zoo directly (no
+                           FaceAnalysis wrapper). Loads buffalo_* and
+                           antelopev2 model packs with SCRFD detector +
+                           ArcFace recognizer + genderage head.
+                           NON-COMMERCIAL research use only (upstream license).
+
+    * OnnxDirectEngine   - loads detector + recognizer ONNX files through
+                           cv2.FaceDetectorYN / cv2.FaceRecognizerSF. Used for
+                           OpenCV Zoo models (YuNet + SFace) and any future
+                           Apache-licensed set. analyze() reports regions only.
+
+Both engines expose the same interface so the gRPC servicer (backend.py)
+can dispatch without knowing which one is active.
+"""
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any, Protocol
+
+import cv2
+import numpy as np
+
+
+@dataclass
+class FaceDetection:
+    bbox: tuple[float, float, float, float]  # x1, y1, x2, y2
+    score: float
+    landmarks: np.ndarray | None = None      # 5x2 keypoints when available
+
+
+@dataclass
+class FaceAttributes:
+    region: tuple[float, float, float, float]  # x, y, w, h
+    face_confidence: float
+    age: float | None = None
+    dominant_gender: str | None = None
+    gender: dict[str, float] = field(default_factory=dict)
+
+
+class FaceEngine(Protocol):
+    """Minimal interface every engine must implement."""
+
+    def prepare(self, options: dict[str, str]) -> None: ...
+    def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
+    def embed(self, img: np.ndarray) -> np.ndarray | None: ...
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
+
+
+# ─── InsightFaceEngine ────────────────────────────────────────────────
+
+class InsightFaceEngine:
+    """Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
+
+    FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
+    in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
+    build a `{taskname: model}` dict, then loop per-face at inference).
+    We reimplement the same loop here so we can:
+
+      1. Load packs from whatever directory LocalAI's gallery extracted
+         them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
+         nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
+         without needing a specific layout on disk.
+      2. Skip insightface's built-in auto-download entirely: weight
+         delivery is LocalAI's gallery `files:` job now, checksum-
+         verified and cached alongside every other managed model.
+
+    The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
+    Landmark) stay in insightface — we only reimplement the ~50 lines
+    of glue around them.
+    """
+
+    def __init__(self) -> None:
+        self.models: dict[str, Any] = {}
+        self.det_model: Any = None
+        self.model_pack: str = "buffalo_l"
+        self.det_size: tuple[int, int] = (640, 640)
+        self.det_thresh: float = 0.5
+        self._providers: list[str] = ["CPUExecutionProvider"]
+
+    def prepare(self, options: dict[str, str]) -> None:
+        import glob
+        import os
+
+        from insightface.model_zoo import model_zoo
+
+        self.model_pack = options.get("model_pack", "buffalo_l")
+        self.det_size = _parse_det_size(options.get("det_size", "640x640"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        pack_dir = _locate_insightface_pack(options, self.model_pack)
+        if pack_dir is None:
+            raise ValueError(
+                f"no insightface pack '{self.model_pack}' found — install via "
+                f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
+            )
+
+        onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
+        if not onnx_files:
+            raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
+
+        # CUDAExecutionProvider is picked automatically by onnxruntime-gpu
+        # when available; falling back to CPU keeps the CPU-only image
+        # working. ctx_id=0 means "first GPU if any, else CPU".
+        self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
+
+        self.models = {}
+        for onnx_file in onnx_files:
+            m = model_zoo.get_model(onnx_file, providers=self._providers)
+            if m is None:
+                continue
+            # First occurrence of each taskname wins (matches FaceAnalysis).
+            if m.taskname not in self.models:
+                self.models[m.taskname] = m
+
+        if "detection" not in self.models:
+            raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
+        self.det_model = self.models["detection"]
+
+        self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
+        for name, m in self.models.items():
+            if name != "detection":
+                m.prepare(0)
+
+    def _faces(self, img: np.ndarray) -> list[Any]:
+        """Run detection + all non-detection models per face."""
+        if self.det_model is None:
+            return []
+        from insightface.app.common import Face
+
+        bboxes, kpss = self.det_model.detect(img, max_num=0)
+        if bboxes is None or bboxes.shape[0] == 0:
+            return []
+        faces: list[Any] = []
+        for i in range(bboxes.shape[0]):
+            bbox = bboxes[i, 0:4]
+            det_score = bboxes[i, 4]
+            kps = kpss[i] if kpss is not None else None
+            face = Face(bbox=bbox, kps=kps, det_score=det_score)
+            for name, m in self.models.items():
+                if name == "detection":
+                    continue
+                m.get(img, face)
+            faces.append(face)
+        return faces
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        return [
+            FaceDetection(
+                bbox=tuple(float(v) for v in f.bbox),
+                score=float(f.det_score),
+                landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
+            )
+            for f in self._faces(img)
+        ]
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        faces = self._faces(img)
+        if not faces:
+            return None
+        best = max(faces, key=lambda f: float(f.det_score))
+        if getattr(best, "normed_embedding", None) is None:
+            return None
+        return np.asarray(best.normed_embedding, dtype=np.float32)
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        out: list[FaceAttributes] = []
+        for f in self._faces(img):
+            x1, y1, x2, y2 = (float(v) for v in f.bbox)
+            region = (x1, y1, x2 - x1, y2 - y1)
+            attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
+            age = getattr(f, "age", None)
+            if age is not None:
+                attrs.age = float(age)
+            gender = getattr(f, "gender", None)
+            if gender is not None:
+                # genderage head emits argmax, not probabilities —
+                # one-hot dict keeps the API stable.
+                attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
+                attrs.gender = {
+                    "Man": 1.0 if int(gender) == 1 else 0.0,
+                    "Woman": 0.0 if int(gender) == 1 else 1.0,
+                }
+            out.append(attrs)
+        return out
+
+
+# ─── OnnxDirectEngine ─────────────────────────────────────────────────
+
+class OnnxDirectEngine:
+    """Loads detector + recognizer ONNX files directly.
+
+    Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
+    exposes a C++-level API via cv2.FaceDetectorYN which accepts the
+    ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
+    Both are Apache 2.0 licensed.
+    """
+
+    def __init__(self) -> None:
+        self.detector_path: str = ""
+        self.recognizer_path: str = ""
+        self.input_size: tuple[int, int] = (320, 320)
+        self.det_thresh: float = 0.5
+        self._detector: Any = None
+        self._recognizer: Any = None
+
+    def prepare(self, options: dict[str, str]) -> None:
+        raw_det = options.get("detector_onnx", "")
+        raw_rec = options.get("recognizer_onnx", "")
+        if not raw_det or not raw_rec:
+            raise ValueError(
+                "onnx_direct engine requires both detector_onnx and recognizer_onnx options"
+            )
+        model_dir = options.get("_model_dir")
+        self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
+        self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
+        self.input_size = _parse_det_size(options.get("det_size", "320x320"))
+        self.det_thresh = float(options.get("det_thresh", "0.5"))
+
+        # YuNet needs its input size set explicitly; detect() and embed()
+        # reset it on every call to match the input frame.
+        self._detector = cv2.FaceDetectorYN.create(
+            self.detector_path,
+            "",
+            self.input_size,
+            score_threshold=self.det_thresh,
+            nms_threshold=0.3,
+            top_k=5000,
+        )
+        self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
+
+    def detect(self, img: np.ndarray) -> list[FaceDetection]:
+        if self._detector is None:
+            return []
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None:
+            return []
+        out: list[FaceDetection] = []
+        for row in faces:
+            x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
+            # Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
+            landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
+            score = float(row[-1])
+            out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
+        return out
+
+    def embed(self, img: np.ndarray) -> np.ndarray | None:
+        if self._detector is None or self._recognizer is None:
+            return None
+        h, w = img.shape[:2]
+        self._detector.setInputSize((w, h))
+        retval, faces = self._detector.detect(img)
+        if faces is None or len(faces) == 0:
+            return None
+        # Pick the highest-score face (last column is score).
+        best = max(faces, key=lambda r: float(r[-1]))
+        aligned = self._recognizer.alignCrop(img, best)
+        feat = self._recognizer.feature(aligned)
+        vec = np.asarray(feat, dtype=np.float32).flatten()
+        # SFace outputs a 128-dim feature; L2-normalize to make dot-product
+        # comparable to buffalo_l's already-normed 512-dim embedding.
+        norm = float(np.linalg.norm(vec))
+        if norm == 0:
+            return None
+        return vec / norm
+
+    def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
+        # OpenCV Zoo does not ship a demographic classifier; report
+        # only the face-detection regions so callers can still see
+        # how many faces were detected.
+        return [
+            FaceAttributes(
+                region=(
+                    d.bbox[0],
+                    d.bbox[1],
+                    d.bbox[2] - d.bbox[0],
+                    d.bbox[3] - d.bbox[1],
+                ),
+                face_confidence=d.score,
+            )
+            for d in self.detect(img)
+        ]
+
+
+# ─── helpers ──────────────────────────────────────────────────────────
+
+def _parse_det_size(raw: str) -> tuple[int, int]:
+    raw = raw.strip().lower().replace(" ", "")
+    if "x" in raw:
+        w, h = raw.split("x", 1)
+        return (int(w), int(h))
+    n = int(raw)
+    return (n, n)
+
+
+def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
+    """Find the directory holding the insightface pack's ONNX files.
+
+    LocalAI's gallery `files:` extracts the pack zip straight into the
+    models directory. Upstream packs are inconsistent:
+
+      buffalo_l/s/sc  — flat zip, ONNX lands at `<models_dir>/*.onnx`
+      buffalo_m, antelopev2  — wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`
+
+    We search, in order:
+      1. `<models_dir>/<name>/`: wrapped-zip layout, the same shape as
+         FaceAnalysis's `<root>/models/<name>/`.
+      2. `<models_dir>/models/<name>/`: FaceAnalysis auto-download layout
+         (handy for dev environments with old `~/.insightface` caches).
+      3. `<models_dir>/`: flat-zip layout directly in the models dir.
+    With the `root` option set, `<root>/models/<name>/`, `<root>/<name>/`
+    and `<root>/` are tried next. Returns the first directory that
+    directly contains `*.onnx`, or None.
+    """
+    import glob
+    import os
+
+    model_dir = options.get("_model_dir") or ""
+    explicit_root = options.get("root")
+
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, name))
+        candidates.append(os.path.join(model_dir, "models", name))
+        candidates.append(model_dir)
+    if explicit_root:
+        expanded = os.path.expanduser(explicit_root)
+        candidates.append(os.path.join(expanded, "models", name))
+        candidates.append(os.path.join(expanded, name))
+        candidates.append(expanded)
+
+    for c in candidates:
+        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
+            return c
+    return None
+
+
+def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
+    """Resolve an ONNX file path across the paths LocalAI might deliver it from.
+
+    Search order:
+      1. The path itself if it already resolves (absolute, or relative to CWD).
+      2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`) —
+         this is how LocalAI surfaces gallery-managed files. When the gallery
+         entry lists `files:`, each one lands under the models directory and
+         backends load them via filename anchored by ModelFile.
+      3. `<script_dir>/<path-without-leading-slash>` — covers dev layouts
+         where someone manually dropped weights inside the backend dir.
+
+    If none hit, return the literal input so cv2/insightface surfaces a
+    clearer error naming the actually-attempted path.
+    """
+    import os
+
+    if os.path.isfile(path):
+        return path
+    stripped = path.lstrip("/")
+    candidates: list[str] = []
+    if model_dir:
+        candidates.append(os.path.join(model_dir, os.path.basename(path)))
+        candidates.append(os.path.join(model_dir, stripped))
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    candidates.append(os.path.join(script_dir, stripped))
+    for c in candidates:
+        if os.path.isfile(c):
+            return c
+    return path
+
+
+def build_engine(name: str) -> FaceEngine:
+    """Factory for the engine selected by LoadModel options."""
+    key = name.strip().lower()
+    if key in ("", "insightface"):
+        return InsightFaceEngine()
+    if key in ("onnx_direct", "onnx-direct", "opencv"):
+        return OnnxDirectEngine()
+    raise ValueError(f"unknown engine: {name!r}")
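Note on `_locate_insightface_pack`: the search order described in its docstring can be exercised without insightface installed. The sketch below is a minimal stand-alone illustration covering only the three `<models_dir>` candidates. The names `pack_candidates`, `first_dir_with_onnx` and `demo`, and the placeholder ONNX file names, are hypothetical and not part of the backend.

```python
import glob
import os
import tempfile


def pack_candidates(model_dir, name):
    # Same order as the docstring: wrapped zip, FaceAnalysis-style, flat zip.
    return [
        os.path.join(model_dir, name),
        os.path.join(model_dir, "models", name),
        model_dir,
    ]


def first_dir_with_onnx(candidates):
    # Return the first existing directory that directly contains a *.onnx file.
    for c in candidates:
        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
            return c
    return None


def demo():
    with tempfile.TemporaryDirectory() as models_dir:
        # Flat layout: a placeholder ONNX file sits directly in the models dir.
        open(os.path.join(models_dir, "detector.onnx"), "wb").close()
        flat = first_dir_with_onnx(pack_candidates(models_dir, "buffalo_l"))

        # Wrapped layout: a placeholder ONNX file sits in <models_dir>/<name>/.
        wrapped_dir = os.path.join(models_dir, "antelopev2")
        os.makedirs(wrapped_dir)
        open(os.path.join(wrapped_dir, "detector.onnx"), "wb").close()
        wrapped = first_dir_with_onnx(pack_candidates(models_dir, "antelopev2"))

        return flat == models_dir, wrapped == wrapped_dir


if __name__ == "__main__":
    print(demo())
```

Because the flat models directory is checked last, a wrapped pack directory takes precedence over loose ONNX files that happen to sit next to it.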
--- a/backend/python/insightface/install.sh
+++ b/backend/python/insightface/install.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+installRequirements
+
+# We deliberately do NOT pre-bake any model weights here. Two reasons:
+#
+#   1. Weights should follow LocalAI's gallery-managed download flow
+#      like every other backend. For OpenCV Zoo (YuNet + SFace) the
+#      gallery entries in gallery/index.yaml list the ONNX files via
+#      `files:` with URI + SHA-256 — LocalAI fetches them into the
+#      models directory on `local-ai models install`.
+#
+#   2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
+#      buffalo_sc, antelopev2), upstream distributes zip archives
+#      only (no individual ONNX URLs). The gallery delivers those
+#      archives into the models directory as well, and engines.py
+#      locates the extracted ONNX files at LoadModel instead of
+#      calling insightface's auto-download machinery.
+#
+# Net effect: the backend image ships only Python deps (~150MB CPU).
--- a/backend/python/insightface/requirements-cpu.txt
+++ b/backend/python/insightface/requirements-cpu.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements-cublas12.txt
+++ b/backend/python/insightface/requirements-cublas12.txt
@@ -0,0 +1,7 @@
+insightface
+onnxruntime-gpu
+opencv-python-headless
+numpy
+onnx
+cython
+scikit-image
--- a/backend/python/insightface/requirements.txt
+++ b/backend/python/insightface/requirements.txt
@@ -0,0 +1,3 @@
+grpcio==1.71.0
+protobuf
+grpcio-tools
--- a/backend/python/insightface/run.sh
+++ b/backend/python/insightface/run.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+startBackend "$@"
--- a/backend/python/insightface/smoke.py
+++ b/backend/python/insightface/smoke.py
@@ -0,0 +1,264 @@
+#!/usr/bin/env python3
+"""Smoke-test every face recognition model configuration shipped in the
+gallery. Simulates what LocalAI does at runtime: for each config, sets
+up a models directory, fetches any required files via URL (as the
+gallery's `files:` list would), then loads + detects + embeds via the
+in-process BackendServicer — matching the gRPC surface end users hit.
+
+Run inside the built backend image (venv already has insightface /
+onnxruntime / opencv-python-headless):
+
+    python smoke.py
+
+Network is required to download the OpenCV Zoo ONNX files on first
+run. The insightface packs are not fetched here: engines.py expects
+them to be extracted under the models directory already.
+"""
+from __future__ import annotations
+
+import base64
+import hashlib
+import os
+import sys
+import traceback
+import urllib.request
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+from backend import BackendServicer  # noqa: E402
+
+
+# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
+# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
+OPENCV_FILES = {
+    "fp32": [
+        (
+            "face_detection_yunet_2023mar.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+            "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+        ),
+        (
+            "face_recognition_sface_2021dec.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+            "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+        ),
+    ],
+    "int8": [
+        (
+            "face_detection_yunet_2023mar_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
+            "321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
+        ),
+        (
+            "face_recognition_sface_2021dec_int8.onnx",
+            "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
+            "2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
+        ),
+    ],
+}
+
+
+CONFIGS = [
+    {
+        "name": "insightface-buffalo-l",
+        "options": ["engine:insightface", "model_pack:buffalo_l"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-sc",
+        "options": ["engine:insightface", "model_pack:buffalo_sc"],
+        # buffalo_sc ships detector + recognizer only: no landmarks, no genderage.
+        "has_analyze": False,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-s",
+        "options": ["engine:insightface", "model_pack:buffalo_s"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-buffalo-m",
+        "options": ["engine:insightface", "model_pack:buffalo_m"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-antelopev2",
+        "options": ["engine:insightface", "model_pack:antelopev2"],
+        "has_analyze": True,
+        "needs_opencv_files": None,
+    },
+    {
+        "name": "insightface-opencv",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "fp32",
+    },
+    {
+        "name": "insightface-opencv-int8",
+        "options": [
+            "engine:onnx_direct",
+            "detector_onnx:face_detection_yunet_2023mar_int8.onnx",
+            "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
+        ],
+        "has_analyze": False,
+        "needs_opencv_files": "int8",
+    },
+]
+
+
+class _FakeContext:
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+def _encode_image(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_sample_image() -> str:
+    from insightface.data import get_image as ins_get_image
+
+    return _encode_image(ins_get_image("t1"))
+
+
+def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
+    dest = os.path.join(model_dir, filename)
+    if os.path.isfile(dest):
+        h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
+        if h == sha256:
+            return
+    sys.stderr.write(f"  fetching {filename} from {uri}\n")
+    sys.stderr.flush()
+    urllib.request.urlretrieve(uri, dest)
+    h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
+    if h != sha256:
+        raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
+
+
+def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
+    # Mirror LocalAI's gallery flow: populate model_dir with the
+    # gallery's listed files before calling LoadModel.
+    if cfg["needs_opencv_files"]:
+        for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
+            _download_if_missing(model_dir, filename, uri, sha256)
+
+    svc = BackendServicer()
+    ctx = _FakeContext()
+
+    load_res = svc.LoadModel(
+        backend_pb2.ModelOptions(
+            Model=cfg["name"],
+            Options=cfg["options"],
+            # ModelPath is what the Go loader sets to ml.ModelPath —
+            # LocalAI's models directory. The backend anchors relative
+            # paths and insightface auto-download root here.
+            ModelPath=model_dir,
+        ),
+        ctx,
+    )
+    if not load_res.success:
+        return False, f"LoadModel: {load_res.message}"
+
+    det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+    if len(det_res.Detections) == 0:
+        return False, "Detect returned no faces"
+    for d in det_res.Detections:
+        if d.class_name != "face":
+            return False, f"Detect returned class_name={d.class_name!r}"
+
+    emb_ctx = _FakeContext()
+    emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
+    if emb_ctx.code is not None:
+        return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
+    if len(emb_res.embeddings) == 0:
+        return False, "Embedding returned empty vector"
+    norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
+    if not (0.8 <= norm_sq <= 1.2):
+        return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
+
+    ver_ctx = _FakeContext()
+    ver_res = svc.FaceVerify(
+        backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
+    )
+    if ver_ctx.code is not None:
+        return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
+    if not ver_res.verified:
+        return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
+    if ver_res.distance > 0.1:
+        return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
+
+    if cfg["has_analyze"]:
+        an_ctx = _FakeContext()
+        an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
+        if an_ctx.code is not None:
+            return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
+        if len(an_res.faces) == 0:
+            return False, "FaceAnalyze returned no faces"
+        f0 = an_res.faces[0]
+        if f0.age <= 0:
+            return False, f"FaceAnalyze age not populated (age={f0.age})"
+        if f0.dominant_gender not in ("Man", "Woman"):
+            return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
+
+    n_dets = len(det_res.Detections)
+    dim = len(emb_res.embeddings)
+    return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
+
+
+def main() -> int:
+    # Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
+    # default to a fresh temp dir.
+    model_dir = os.environ.get("LOCALAI_MODELS_PATH")
+    if not model_dir:
+        import tempfile
+
+        model_dir = tempfile.mkdtemp(prefix="face-smoke-")
+    os.makedirs(model_dir, exist_ok=True)
+    print(f"model_dir={model_dir}", file=sys.stderr)
+
+    print("Preparing sample image from insightface.data...", file=sys.stderr)
+    img_b64 = _load_sample_image()
+
+    results: list[tuple[str, bool, str]] = []
+    for cfg in CONFIGS:
+        sys.stderr.write(f"\n=== {cfg['name']} ===\n")
+        sys.stderr.flush()
+        try:
+            ok, detail = _run_one(cfg, img_b64, model_dir)
+        except Exception:
+            ok, detail = False, traceback.format_exc().splitlines()[-1]
+        results.append((cfg["name"], ok, detail))
+        print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s}  {detail}")
+        sys.stdout.flush()
+
+    print("\n=== summary ===")
+    passed = sum(1 for _, ok, _ in results if ok)
+    total = len(results)
+    for name, ok, detail in results:
+        mark = "✓" if ok else "✗"
+        print(f"  {mark} {name:30s} {detail}")
+    print(f"\n{passed}/{total} passed")
+    return 0 if passed == total else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
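Note on the smoke checks in `_run_one`: they require `sum(x^2)` of the embedding to be close to 1 and the same-image distance to be close to 0. The sketch below shows why both hold for L2-normalized vectors. It assumes that the distance reported by FaceVerify is a cosine distance; the servicer's implementation is not shown in this part of the diff, so that is an assumption. `l2_normalize` and `cosine_distance` are hypothetical helper names.

```python
import math


def l2_normalize(vec):
    # Scale a vector to unit length; return None for the zero vector,
    # mirroring the guard in OnnxDirectEngine.embed().
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return None
    return [x / norm for x in vec]


def cosine_distance(a, b):
    # For unit-length vectors the dot product is the cosine similarity,
    # so the distance is simply 1 - dot(a, b).
    return 1.0 - sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    a = l2_normalize([3.0, 4.0])
    print(sum(x * x for x in a))   # squared norm of a unit vector
    print(cosine_distance(a, a))   # same vector compared with itself
```

Under that assumption, comparing an image with itself yields a distance near 0 regardless of embedding dimension, which is why the same check can be applied to both the 512-dim and 128-dim engines.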
--- a/backend/python/insightface/test.py
+++ b/backend/python/insightface/test.py
@@ -0,0 +1,234 @@
+"""Unit tests for the insightface gRPC backend.
+
+The servicer is instantiated in-process (no gRPC channel) and driven
+directly. Images come from insightface.data, which ships with the pip
+package; only the OpenCV Zoo ONNX files are downloaded on demand.
+
+Each engine (InsightFaceEngine and OnnxDirectEngine) has its own test
+class.
+"""
+from __future__ import annotations
+
+import base64
+import os
+import sys
+import unittest
+
+import cv2
+import numpy as np
+
+sys.path.insert(0, os.path.dirname(__file__))
+
+import backend_pb2  # noqa: E402
+
+from backend import BackendServicer  # noqa: E402
+
+# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
+# to mirror LocalAI's gallery `files:` flow (the backend image itself
+# doesn't ship model weights).
+OPENCV_FILES = [
+    (
+        "face_detection_yunet_2023mar.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
+        "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
+    ),
+    (
+        "face_recognition_sface_2021dec.onnx",
+        "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
+        "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
+    ),
+]
+
+
+def _encode(img: np.ndarray) -> str:
+    _, buf = cv2.imencode(".jpg", img)
+    return base64.b64encode(buf.tobytes()).decode("ascii")
+
+
+def _load_insightface_samples() -> dict[str, str]:
+    """Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.
+
+    t1 is a group photo, t2 a different one. We reuse both as
+    stand-ins for "Alice photo 1/2" and "Bob".
+    """
+    from insightface.data import get_image as ins_get_image
+
+    return {
+        "t1": _encode(ins_get_image("t1")),
+        "t2": _encode(ins_get_image("t2")),
+    }
+
+
+class _FakeContext:
+    """Minimal stand-in for grpc.ServicerContext."""
+
+    def __init__(self) -> None:
+        self.code = None
+        self.details = None
+
+    def set_code(self, code):
+        self.code = code
+
+    def set_details(self, details):
+        self.details = details
+
+
+class _Harness:
+    def __init__(self, servicer: BackendServicer) -> None:
+        self.svc = servicer
+
+    def health(self):
+        return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())
+
+    def load(self, options: list[str], model_path: str = ""):
+        return self.svc.LoadModel(
+            backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
+            _FakeContext(),
+        )
+
+    def detect(self, img_b64: str):
+        return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
+
+    def embed(self, img_b64: str):
+        ctx = _FakeContext()
+        res = self.svc.Embedding(
+            backend_pb2.PredictOptions(Images=[img_b64]),
+            ctx,
+        )
+        return res, ctx
+
+    def verify(self, a: str, b: str, threshold: float = 0.0):
+        return self.svc.FaceVerify(
+            backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
+            _FakeContext(),
+        )
+
+    def analyze(self, img_b64: str):
+        return self.svc.FaceAnalyze(
+            backend_pb2.FaceAnalyzeRequest(img=img_b64),
+            _FakeContext(),
+        )
+
+
+class InsightFaceEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_health(self):
+        self.assertEqual(self.harness.health().message, b"OK")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+            self.assertGreater(d.width, 0)
+            self.assertGreater(d.height, 0)
+
+    def test_embedding_is_l2_normed(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertEqual(len(res.embeddings), 512)
+        norm_sq = sum(x * x for x in res.embeddings)
+        self.assertAlmostEqual(norm_sq, 1.0, places=2)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"])
+        self.assertTrue(res.verified)
+        self.assertLess(res.distance, 0.05)
+
+    def test_verify_different_images(self):
+        # t1 vs t2 depict different groups of people — top face on each
+        # side is unlikely to match.
+        res = self.harness.verify(self.samples["t1"], self.samples["t2"])
+        # We assert only that some numerical answer came back; the
+        # matches-or-not determination depends on which face each side
+        # picked and isn't a stable test assertion.
+        self.assertGreaterEqual(res.distance, 0.0)
+
+    def test_analyze_has_age_and_gender(self):
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertGreater(face.face_confidence, 0.0)
+            # Age should be populated for buffalo_l.
+            self.assertGreater(face.age, 0.0)
+            self.assertIn(face.dominant_gender, ("Man", "Woman"))
+
+
+def _prepare_opencv_models_dir() -> str | None:
+    """Download OpenCV Zoo face ONNX files into a temp dir the way
+    LocalAI's gallery would. Returns the directory, or None if
+    downloads failed (network-restricted sandbox).
+    """
+    import hashlib
+    import tempfile
+    import urllib.request
+
+    root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(
+        prefix="opencv-face-"
+    )
+    for filename, uri, sha256 in OPENCV_FILES:
+        dest = os.path.join(root, filename)
+        if os.path.isfile(dest):
+            if hashlib.sha256(open(dest, "rb").read()).hexdigest() == sha256:
+                continue
+        try:
+            urllib.request.urlretrieve(uri, dest)
+        except Exception:
+            return None
+        if hashlib.sha256(open(dest, "rb").read()).hexdigest() != sha256:
+            return None
+    return root
+
+
+class OnnxDirectEngineTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.samples = _load_insightface_samples()
+        cls.model_dir = _prepare_opencv_models_dir()
+        if cls.model_dir is None:
+            raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
+        cls.harness = _Harness(BackendServicer())
+        load = cls.harness.load(
+            [
+                "engine:onnx_direct",
+                "detector_onnx:face_detection_yunet_2023mar.onnx",
+                "recognizer_onnx:face_recognition_sface_2021dec.onnx",
+            ],
+            model_path=cls.model_dir,
+        )
+        if not load.success:
+            raise unittest.SkipTest(f"LoadModel failed: {load.message}")
+
+    def test_detect_finds_face(self):
+        res = self.harness.detect(self.samples["t1"])
+        self.assertGreater(len(res.Detections), 0)
+        for d in res.Detections:
+            self.assertEqual(d.class_name, "face")
+
+    def test_embedding_nonempty(self):
+        res, ctx = self.harness.embed(self.samples["t1"])
+        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
+        self.assertGreater(len(res.embeddings), 0)
+
+    def test_verify_same_image(self):
+        res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
+        self.assertTrue(res.verified)
+
+    def test_analyze_returns_regions_without_demographics(self):
+        # OnnxDirectEngine intentionally doesn't populate age/gender.
+        res = self.harness.analyze(self.samples["t1"])
+        self.assertGreater(len(res.faces), 0)
+        for face in res.faces:
+            self.assertEqual(face.dominant_gender, "")
+            self.assertEqual(face.age, 0.0)
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/backend/python/insightface/test.sh
+++ b/backend/python/insightface/test.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+runUnittests
--- a/backend/python/whisperx/requirements-hipblas.txt
+++ b/backend/python/whisperx/requirements-hipblas.txt
@@ -1,6 +0,0 @@
-# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
-# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
-# the rocm7.2.1 runtime via AMD's forward-compatibility window.
--extra-index-url https://download.pytorch.org/whl/rocm6.4
-torch==2.8.0+rocm6.4
-whisperx @ git+https://github.com/m-bain/whisperX.git
--- a/core/application/application.go
+++ b/core/application/application.go
@@ -7,17 +7,28 @@ import (
 	"sync/atomic"
 	"time"

+	corebackend "github.com/mudler/LocalAI/core/backend"
 	"github.com/mudler/LocalAI/core/config"
 	mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
 	"github.com/mudler/LocalAI/core/services/agentpool"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
 	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/templates"
+	pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
 	"github.com/mudler/LocalAI/pkg/model"
 	"github.com/mudler/xlog"
 	"gorm.io/gorm"
 )

+// faceEmbeddingDim is the expected dimension for face embeddings.
+// Set to 0 so the Registry accepts whatever dim the loaded recognizer
+// produces — ArcFace R50 is 512-d, MBF is 512-d, SFace is 128-d, and
+// the insightface backend can load any of them via LoadModel options.
+// Locking this to a specific value would force a single recognizer
+// family per deployment; we keep the door open instead.
+const faceEmbeddingDim = 0
+
 type Application struct {
 	backendLoader      *config.ModelConfigLoader
 	modelLoader        *model.ModelLoader
@@ -27,6 +38,7 @@ type Application struct {
 	galleryService     *galleryop.GalleryService
 	agentJobService    *agentpool.AgentJobService
 	agentPoolService   atomic.Pointer[agentpool.AgentPoolService]
+	faceRegistry       facerecognition.Registry
 	authDB             *gorm.DB
 	watchdogMutex      sync.Mutex
 	watchdogStop       chan bool
@@ -50,12 +62,23 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
 		mcpTools.CloseMCPSessions(modelName)
 	})

-	return &Application{
+	app := &Application{
 		backendLoader:      config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
 		modelLoader:        ml,
 		applicationConfig:  appConfig,
 		templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
 	}
+
+	// Face-recognition registry backed by LocalAI's built-in vector store.
+	// The resolver closes over the ModelLoader so the Registry stays
+	// decoupled from loader plumbing; swapping in a postgres-backed
+	// implementation later is a single construction change here.
+	faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
+		return corebackend.StoreBackend(ml, appConfig, storeName, "")
+	}
+	app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
+
+	return app
 }

 func (a *Application) ModelConfigLoader() *config.ModelConfigLoader {
@@ -99,6 +122,14 @@ func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
 	return a.agentPoolService.Load()
 }

+// FaceRegistry returns the face-recognition registry used for 1:N
+// identification. The current implementation is backed by the
+// in-memory local-store backend; see core/services/facerecognition
+// for the interface and the postgres TODO.
+func (a *Application) FaceRegistry() facerecognition.Registry {
+	return a.faceRegistry
+}
+
 // AuthDB returns the auth database connection, or nil if auth is not enabled.
 func (a *Application) AuthDB() *gorm.DB {
 	return a.authDB
--- a/core/backend/face_analyze.go
+++ b/core/backend/face_analyze.go
@@ -0,0 +1,60 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/trace"
+	"github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+func FaceAnalyze(
+	img string,
+	actions []string,
+	antiSpoofing bool,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) (*proto.FaceAnalyzeResponse, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	var startTime time.Time
+	if appConfig.EnableTracing {
+		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
+		startTime = time.Now()
+	}
+
+	res, err := faceModel.FaceAnalyze(context.Background(), &proto.FaceAnalyzeRequest{
+		Img:          img,
+		Actions:      actions,
+		AntiSpoofing: antiSpoofing,
+	})
+
+	if appConfig.EnableTracing {
+		errStr := ""
+		if err != nil {
+			errStr = err.Error()
+		}
+		trace.RecordBackendTrace(trace.BackendTrace{
+			Timestamp: startTime,
+			Duration:  time.Since(startTime),
+			Type:      trace.BackendTraceFaceAnalyze,
+			ModelName: modelConfig.Name,
+			Backend:   modelConfig.Backend,
+			Error:     errStr,
+		})
+	}
+
+	return res, err
+}
--- a/core/backend/face_embed.go
+++ b/core/backend/face_embed.go
@@ -0,0 +1,43 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+// FaceEmbed loads the face recognition backend and returns a face
+// embedding for the base64-encoded image (512-d for ArcFace, 128-d
+// for SFace). Unlike ModelEmbedding it passes the image through
+// PredictOptions.Images; the insightface backend picks the
+// highest-confidence face and returns its L2-normalized embedding.
+func FaceEmbed(
+	imgBase64 string,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) ([]float32, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	predictOpts := gRPCPredictOpts(modelConfig, loader.ModelPath)
+	predictOpts.Images = []string{imgBase64}
+
+	res, err := faceModel.Embeddings(context.Background(), predictOpts)
+	if err != nil {
+		return nil, err
+	}
+	if len(res.Embeddings) == 0 {
+		return nil, fmt.Errorf("face embedding returned empty vector (no face detected?)")
+	}
+	return res.Embeddings, nil
+}
--- a/core/backend/face_verify.go
+++ b/core/backend/face_verify.go
@@ -0,0 +1,61 @@
+package backend
+
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/trace"
+	"github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/LocalAI/pkg/model"
+)
+
+func FaceVerify(
+	img1, img2 string,
+	threshold float32,
+	antiSpoofing bool,
+	loader *model.ModelLoader,
+	appConfig *config.ApplicationConfig,
+	modelConfig config.ModelConfig,
+) (*proto.FaceVerifyResponse, error) {
+	opts := ModelOptions(modelConfig, appConfig)
+	faceModel, err := loader.Load(opts...)
+	if err != nil {
+		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
+		return nil, err
+	}
+	if faceModel == nil {
+		return nil, fmt.Errorf("could not load face recognition model")
+	}
+
+	var startTime time.Time
+	if appConfig.EnableTracing {
+		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
+		startTime = time.Now()
+	}
+
+	res, err := faceModel.FaceVerify(context.Background(), &proto.FaceVerifyRequest{
+		Img1:         img1,
+		Img2:         img2,
+		Threshold:    threshold,
+		AntiSpoofing: antiSpoofing,
+	})
+
+	if appConfig.EnableTracing {
+		errStr := ""
+		if err != nil {
+			errStr = err.Error()
+		}
+		trace.RecordBackendTrace(trace.BackendTrace{
+			Timestamp: startTime,
+			Duration:  time.Since(startTime),
+			Type:      trace.BackendTraceFaceVerify,
+			ModelName: modelConfig.Name,
+			Backend:   modelConfig.Backend,
+			Error:     errStr,
+		})
+	}
+
+	return res, err
+}
--- a/core/backend/llm.go
+++ b/core/backend/llm.go
@@ -40,6 +40,12 @@ type TokenUsage struct {
 	ChatDeltas             []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
 }

+func needsThinkingProbe(c *config.ModelConfig) bool {
+	return c.TemplateConfig.UseTokenizerTemplate &&
+		(c.ReasoningConfig.DisableReasoning == nil ||
+			c.ReasoningConfig.DisableReasoningTagPrefill == nil)
+}
+
 // HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
 // Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
 func (t TokenUsage) HasChatDeltaContent() bool {
@@ -100,11 +106,9 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
 	// tokenizer template path is active) and the multimodal media marker (needed
 	// by custom chat templates so markers line up with what mtmd expects).
 	// We probe whenever any of those slots is still empty.
-	needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
-		c.ReasoningConfig.DisableReasoning == nil &&
-		c.ReasoningConfig.DisableReasoningTagPrefill == nil
+	shouldProbeThinking := needsThinkingProbe(c)
 	needsMarkerProbe := c.MediaMarker == ""
-	if needsThinkingProbe || needsMarkerProbe {
+	if shouldProbeThinking || needsMarkerProbe {
 		modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
 		config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
 		// Update the config in the loader so it persists for future requests
--- a/core/backend/llm_probe_test.go
+++ b/core/backend/llm_probe_test.go
@@ -0,0 +1,29 @@
+package backend
+
+import (
+	"github.com/mudler/LocalAI/core/config"
+
+	"github.com/gpustack/gguf-parser-go/util/ptr"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+var _ = Describe("thinking probe gating", func() {
+	It("probes tokenizer-template models when any reasoning default is still unset", func() {
+		cfg := &config.ModelConfig{
+			TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
+		}
+		Expect(needsThinkingProbe(cfg)).To(BeTrue())
+
+		cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
+		Expect(needsThinkingProbe(cfg)).To(BeTrue())
+
+		cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
+		Expect(needsThinkingProbe(cfg)).To(BeFalse())
+	})
+
+	It("does not probe when tokenizer templates are disabled", func() {
+		cfg := &config.ModelConfig{}
+		Expect(needsThinkingProbe(cfg)).To(BeFalse())
+	})
+})
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {

 	app, err := application.New(opts...)
 	if err != nil {
-		return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
+		return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n  1. Check that your models directory exists and is accessible: %s\n  2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n  3. Check available disk space and file permissions\n  4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
 	}

 	appHTTP, err := http.API(app)
--- a/core/cli/transcript.go
+++ b/core/cli/transcript.go
@@ -3,7 +3,6 @@ package cli
 import (
 	"context"
 	"encoding/json"
-	"errors"
 	"fmt"
 	"strings"

@@ -60,7 +59,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {

 	c, exists := cl.GetModelConfig(t.Model)
 	if !exists {
-		return errors.New("model not found")
+		return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
 	}

 	c.Threads = &t.Threads
--- a/core/cli/util.go
+++ b/core/cli/util.go
@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {

 func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
 	if len(u.Args) == 0 {
-		return fmt.Errorf("no GGUF file provided")
+		return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
 	}
 	// We try to guess only if we don't have a template defined already
 	f, err := gguf.ParseGGUFFile(u.Args[0])
--- a/core/cli/worker.go
+++ b/core/cli/worker.go
@@ -21,6 +21,7 @@ import (
 	"github.com/mudler/LocalAI/core/cli/workerregistry"
 	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/core/gallery"
+	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/messaging"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/storage"
@@ -597,12 +598,20 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
 	// Try to find the backend binary
 	backendPath := s.findBackend(req.Backend)
 	if backendPath == "" {
-		// Backend not found locally — try auto-installing from gallery
-		xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
-		if err := gallery.InstallBackendFromGallery(
-			context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
-		); err != nil {
-			return "", fmt.Errorf("installing backend from gallery: %w", err)
+		if req.URI != "" {
+			xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
+			if err := galleryop.InstallExternalBackend(
+				context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
+			); err != nil {
+				return "", fmt.Errorf("installing external backend: %w", err)
+			}
+		} else {
+			xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
+			if err := gallery.InstallBackendFromGallery(
+				context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
+			); err != nil {
+				return "", fmt.Errorf("installing backend from gallery: %w", err)
+			}
 		}
 		// Re-register after install and retry
 		gallery.RegisterBackends(s.systemState, s.ml)
--- a/core/cli/worker/worker_p2p.go
+++ b/core/cli/worker/worker_p2p.go
@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
 	// Check if the token is set
 	// as we always need it.
 	if r.Token == "" {
-		return fmt.Errorf("Token is required")
+		return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
 	}

 	port, err := freeport.GetFreePort()
--- a/core/config/gguf.go
+++ b/core/config/gguf.go
@@ -125,19 +125,7 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 			return
 		}

-		cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
-
-		// Use the rendered template to detect if thinking token is at the end
-		// This reuses the existing DetectThinkingStartToken function
-		if metadata.RenderedTemplate != "" {
-			thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
-			thinkingForcedOpen := thinkingStartToken != ""
-			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
-			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
-		} else {
-			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
-			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
-		}
+		applyDetectedThinkingConfig(cfg, metadata)

 		// Extract tool format markers from autoparser analysis
 		if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -180,3 +168,34 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 		}
 	}
 }
+
+func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
+	if cfg == nil || metadata == nil {
+		return
+	}
+
+	// Respect explicit YAML/user config. Backend probing should only fill defaults
+	// when the reasoning mode has not already been set.
+	if cfg.ReasoningConfig.DisableReasoning == nil {
+		cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
+	}
+
+	// Respect explicit prefill config for the same reason. Only infer the
+	// default prefill behavior when the user did not set it.
+	if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
+		// Use the rendered template to detect if thinking token is at the end.
+		// This reuses the existing DetectThinkingStartToken function.
+		if metadata.RenderedTemplate != "" {
+			thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
+			thinkingForcedOpen := thinkingStartToken != ""
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
+		} else {
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
+		}
+		return
+	}
+
+	xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
+}
--- a/core/config/gguf_reasoning_test.go
+++ b/core/config/gguf_reasoning_test.go
@@ -0,0 +1,101 @@
+package config
+
+import (
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+	"github.com/mudler/LocalAI/pkg/reasoning"
+
+	"github.com/gpustack/gguf-parser-go/util/ptr"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+var _ = Describe("GGUF backend metadata reasoning defaults", func() {
+	It("fills reasoning defaults when unset", func() {
+		cfg := &ModelConfig{
+			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
+		}
+
+		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
+			SupportsThinking: true,
+			RenderedTemplate: "{{ bos_token }}<think>",
+		})
+
+		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
+		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
+	})
+
+	It("preserves fully explicit reasoning settings", func() {
+		cfg := &ModelConfig{
+			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
+			ReasoningConfig: reasoning.Config{
+				DisableReasoning:           ptr.To(true),
+				DisableReasoningTagPrefill: ptr.To(true),
+			},
+		}
+
+		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
+			SupportsThinking: true,
+			RenderedTemplate: "{{ bos_token }}<think>",
+		})
+
+		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
+		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
+	})
+
+	It("preserves explicit disable while still inferring missing prefill", func() {
+		cfg := &ModelConfig{
+			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
+			ReasoningConfig: reasoning.Config{
+				DisableReasoning: ptr.To(true),
+			},
+		}
+
+		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
+			SupportsThinking: true,
+			RenderedTemplate: "{{ bos_token }}<think>",
+		})
+
+		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
+		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
+	})
+
+	It("preserves explicit prefill while still inferring missing disable flag", func() {
+		cfg := &ModelConfig{
+			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
+			ReasoningConfig: reasoning.Config{
+				DisableReasoningTagPrefill: ptr.To(true),
+			},
+		}
+
+		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
+			SupportsThinking: true,
+			RenderedTemplate: "{{ bos_token }}<think>",
+		})
+
+		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
+		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
+	})
+
+	It("defaults to disabling reasoning when backend does not support thinking", func() {
+		cfg := &ModelConfig{
+			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
+		}
+
+		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
+			SupportsThinking: false,
+		})
+
+		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
+		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
+		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
+	})
+})
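
Note on `applyDetectedThinkingConfig`: the tests above exercise a "fill only when nil" rule for each pointer field independently. The core of that rule can be sketched with a small helper. `fillDefault` is a hypothetical name for illustration; the real function inlines the nil checks.

```go
package main

import "fmt"

// fillDefault stores a pointer to v in *dst only when *dst is nil,
// so an explicitly configured value is never overwritten.
func fillDefault(dst **bool, v bool) {
	if *dst == nil {
		*dst = &v
	}
}

func main() {
	var unset *bool       // nothing configured
	explicit := new(bool) // user explicitly configured false

	fillDefault(&unset, true)
	fillDefault(&explicit, true)

	fmt.Println(*unset, *explicit)
}
```

Using `*bool` rather than `bool` is what makes "unset" distinguishable from an explicit `false`.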
--- a/core/config/model_config.go
+++ b/core/config/model_config.go
@@ -588,6 +588,7 @@ const (
 	FLAG_VAD              ModelConfigUsecase = 0b010000000000
 	FLAG_VIDEO            ModelConfigUsecase = 0b100000000000
 	FLAG_DETECTION        ModelConfigUsecase = 0b1000000000000
+	FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000

 	// Common Subsets
 	FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@@ -611,6 +612,7 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
 		"FLAG_LLM":              FLAG_LLM,
 		"FLAG_VIDEO":            FLAG_VIDEO,
 		"FLAG_DETECTION":        FLAG_DETECTION,
+		"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
 	}
 }

@@ -651,7 +653,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
 	nonTextGenBackends := []string{
 		"whisper", "piper", "kokoro",
 		"diffusers", "stablediffusion", "stablediffusion-ggml",
-		"rerankers", "silero-vad", "rfdetr",
+		"rerankers", "silero-vad", "rfdetr", "insightface",
 		"transformers-musicgen", "ace-step", "acestep-cpp",
 	}

@@ -728,12 +730,19 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
 	}

 	if (u & FLAG_DETECTION) == FLAG_DETECTION {
-		detectionBackends := []string{"rfdetr", "sam3-cpp"}
+		detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
 		if !slices.Contains(detectionBackends, c.Backend) {
 			return false
 		}
 	}

+	if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
+		faceBackends := []string{"insightface"}
+		if !slices.Contains(faceBackends, c.Backend) {
+			return false
+		}
+	}
+
 	if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
 		soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
 		if !slices.Contains(soundGenBackends, c.Backend) {
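
Note on the new usecase flag: `GuessUsecases` tests each requested bit with `(u & FLAG) == FLAG` and then checks the backend against an allow-list. The sketch below reduces that to the two flags touched in this hunk. `usecase`, `has`, and `guess` are hypothetical simplifications, and `guess` covers only the face-recognition gate.

```go
package main

import "fmt"

type usecase uint32

// Literals copied from the diff; each has exactly one bit set.
const (
	flagDetection       usecase = 0b1000000000000
	flagFaceRecognition usecase = 0b10000000000000
)

// has reports whether every bit of flag is set in u.
func has(u, flag usecase) bool {
	return u&flag == flag
}

// guess is a simplified take on the per-flag backend gate: when the
// face-recognition flag is requested, the backend must be insightface.
func guess(backend string, u usecase) bool {
	if has(u, flagFaceRecognition) && backend != "insightface" {
		return false
	}
	return true
}

func main() {
	requested := flagDetection | flagFaceRecognition
	fmt.Println(has(requested, flagFaceRecognition))
	fmt.Println(has(flagDetection, flagFaceRecognition))
	fmt.Println(guess("rfdetr", requested), guess("insightface", requested))
}
```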
--- a/core/config/model_config_loader.go
+++ b/core/config/model_config_loader.go
@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
 		bcl.configs[c.Name] = *c
 	} else {
 		if err != nil {
-			return fmt.Errorf("config is not valid: %w", err)
+			return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
 		}
-		return fmt.Errorf("config is not valid")
+		return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
 	}

 	return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
 		files = append(files, info)
 	}
 	for _, file := range files {
-		// Skip templates, YAML and .keep files
-		if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
-			strings.HasPrefix(file.Name(), ".") {
+		// Only load real YAML config files and ignore dotfiles or backup variants
+		ext := strings.ToLower(filepath.Ext(file.Name()))
+		if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
 			continue
 		}

--- a/core/config/model_test.go
+++ b/core/config/model_test.go
@@ -2,6 +2,7 @@ package config

 import (
 	"os"
+	"path/filepath"

 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
@@ -109,5 +110,50 @@ options:
 			Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))

 		})
+
+		It("Only loads files ending with yaml or yml", func() {
+			tmpdir, err := os.MkdirTemp("", "model-config-loader")
+			Expect(err).ToNot(HaveOccurred())
+			defer os.RemoveAll(tmpdir)
+
+			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
+				`name: "foo-model"
+description: "formal config"
+backend: "llama-cpp"
+parameters:
+  model: "foo.gguf"
+`), 0644)
+			Expect(err).ToNot(HaveOccurred())
+
+			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
+				`name: "foo-model"
+description: "backup config"
+backend: "llama-cpp"
+parameters:
+  model: "foo-backup.gguf"
+`), 0644)
+			Expect(err).ToNot(HaveOccurred())
+
+			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
+				`name: "foo-backup-only"
+description: "timestamped backup config"
+backend: "llama-cpp"
+parameters:
+  model: "foo-timestamped.gguf"
+`), 0644)
+			Expect(err).ToNot(HaveOccurred())
+
+			bcl := NewModelConfigLoader(tmpdir)
+			err = bcl.LoadModelConfigsFromPath(tmpdir)
+			Expect(err).ToNot(HaveOccurred())
+
+			configs := bcl.GetAllModelsConfigs()
+			Expect(configs).To(HaveLen(1))
+			Expect(configs[0].Name).To(Equal("foo-model"))
+			Expect(configs[0].Description).To(Equal("formal config"))
+
+			_, exists := bcl.GetModelConfig("foo-backup-only")
+			Expect(exists).To(BeFalse())
+		})
 	})
 })
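
Note on the loader filter: the old check used `strings.Contains`, so any file name containing `.yaml` or `.yml` was loaded, including backups such as `foo.yaml.bak`. The new check compares the lowercased final extension. The sketch below reproduces both predicates for comparison. `loadsOld` and `loadsNew` are hypothetical helper names; the real code inlines the condition in the loop.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// loadsOld reproduces the previous substring-based check.
func loadsOld(name string) bool {
	skip := !strings.Contains(name, ".yaml") && !strings.Contains(name, ".yml") ||
		strings.HasPrefix(name, ".")
	return !skip
}

// loadsNew reproduces the extension-based check from the diff.
func loadsNew(name string) bool {
	ext := strings.ToLower(filepath.Ext(name))
	skip := (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(name, ".")
	return !skip
}

func main() {
	names := []string{"foo.yaml", "foo.yaml.bak", "foo.yaml.bak.123", ".hidden.yml", "FOO.YML"}
	for _, n := range names {
		fmt.Printf("%-18s old=%v new=%v\n", n, loadsOld(n), loadsNew(n))
	}
}
```

One side effect visible in the sketch: because the extension is lowercased, a file named `FOO.YML` is accepted by the new check, whereas the old case-sensitive substring match skipped it.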
--- a/core/gallery/backends.go
+++ b/core/gallery/backends.go
@@ -110,7 +110,13 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
 		if err != nil {
 			return err
 		}
-		if backends.Exists(name) {
+		// Only short-circuit if the install is *actually usable*. An orphaned
+		// meta entry whose concrete was removed still shows up in
+		// ListSystemBackends with a RunFile pointing at a path that no longer
+		// exists; returning early there leaves the caller with a broken
+		// alias and the worker fails with "backend not found after install
+		// attempt" on every retry. Re-install in that case.
+		if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
 			return nil
 		}
 	}
@@ -375,17 +381,44 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
 	}

 	if metadata != nil && metadata.MetaBackendFor != "" {
-		metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
-		xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
-		if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
-			return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
+		concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
+		xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
+		// If the concrete the meta points to is already gone (earlier delete,
+		// partial install, or manual cleanup), keep going and remove the
+		// orphaned meta dir. Previously we returned an error here, which made
+		// the orphaned meta impossible to uninstall from the UI: the delete
+		// kept failing and every subsequent install short-circuited because
+		// the stale metadata made backends.Exists(name) return true.
+		if _, statErr := os.Stat(concreteDirectory); statErr == nil {
+			os.RemoveAll(concreteDirectory)
+		} else if os.IsNotExist(statErr) {
+			xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
+				"meta", name, "concrete", metadata.MetaBackendFor)
+		} else {
+			return statErr
 		}
-		os.RemoveAll(metaBackendDirectory)
 	}

 	return os.RemoveAll(backendDirectory)
 }

+// isBackendRunnable reports whether the given backend entry can actually be
+// invoked. A meta backend is runnable only if its concrete's run.sh still
+// exists on disk; concrete backends are considered runnable as long as their
+// RunFile is set (ListSystemBackends only emits them when the runfile is
+// present). Used to guard the "already installed" short-circuit so an
+// orphaned meta pointing at a missing concrete triggers a real reinstall
+// rather than being silently skipped.
+func isBackendRunnable(b SystemBackend) bool {
+	if b.RunFile == "" {
+		return false
+	}
+	if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
+		return false
+	}
+	return true
+}
+
 type SystemBackend struct {
 	Name             string
 	RunFile          string
--- a/core/gallery/backends_test.go
+++ b/core/gallery/backends_test.go
@@ -952,6 +952,58 @@ var _ = Describe("Gallery Backends", func() {
 			err = DeleteBackendFromSystem(systemState, "non-existent")
 			Expect(err).To(HaveOccurred())
 		})
+
+		It("removes an orphaned meta backend whose concrete is missing", func() {
+			// Real scenario from the dev cluster: the concrete got wiped
+			// (partial install, manual cleanup, previous crash) but the meta
+			// directory's metadata.json still points at it. The old code
+			// errored with "meta backend X not found" and left the orphan in
+			// place, making the backend impossible to uninstall.
+			metaName := "meta-backend"
+			concreteName := "concrete-backend-that-vanished"
+			metaPath := filepath.Join(tempDir, metaName)
+			Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
+
+			meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
+			data, err := json.MarshalIndent(meta, "", "  ")
+			Expect(err).NotTo(HaveOccurred())
+			Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
+
+			// Concrete directory intentionally absent.
+			systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
+			Expect(err).NotTo(HaveOccurred())
+
+			Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
+			Expect(metaPath).NotTo(BeADirectory())
+		})
+	})
+
+	Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
+		It("reports an orphaned meta as not runnable when its concrete is missing", func() {
+			// Seed state: meta dir exists with metadata pointing at a
+			// concrete that was removed from disk. ListSystemBackends still
+			// surfaces the meta via its metadata.Name → the old short-circuit
+			// at `if backends.Exists(name) { return nil }` returned silently,
+			// leaving the worker's findBackend() with a dead alias forever.
+			// The fix: require the backend to be runnable before we skip.
+			metaName := "meta-orphan"
+			concreteName := "concrete-gone"
+			metaPath := filepath.Join(tempDir, metaName)
+			Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
+			meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
+			data, err := json.MarshalIndent(meta, "", "  ")
+			Expect(err).NotTo(HaveOccurred())
+			Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
+
+			systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
+			Expect(err).NotTo(HaveOccurred())
+
+			listed, err := ListSystemBackends(systemState)
+			Expect(err).NotTo(HaveOccurred())
+			b, ok := listed.Get(metaName)
+			Expect(ok).To(BeTrue())
+			Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
+		})
 	})

 	Describe("ListSystemBackends", func() {
--- a/core/http/auth/features.go
+++ b/core/http/auth/features.go
@@ -57,6 +57,14 @@ var RouteFeatureRegistry = []RouteFeature{
 	// Detection
 	{"POST", "/v1/detection", FeatureDetection},

+	// Face recognition
+	{"POST", "/v1/face/verify", FeatureFaceRecognition},
+	{"POST", "/v1/face/analyze", FeatureFaceRecognition},
+	{"POST", "/v1/face/embed", FeatureFaceRecognition},
+	{"POST", "/v1/face/register", FeatureFaceRecognition},
+	{"POST", "/v1/face/identify", FeatureFaceRecognition},
+	{"POST", "/v1/face/forget", FeatureFaceRecognition},
+
 	// Video
 	{"POST", "/video", FeatureVideo},

@@ -151,5 +159,6 @@ func APIFeatureMetas() []FeatureMeta {
 		{FeatureTokenize, "Tokenize", true},
 		{FeatureMCP, "MCP", true},
 		{FeatureStores, "Stores", true},
+		{FeatureFaceRecognition, "Face Recognition", true},
 	}
 }
--- a/core/http/auth/permissions.go
+++ b/core/http/auth/permissions.go
@@ -51,6 +51,7 @@ const (
 	FeatureTokenize           = "tokenize"
 	FeatureMCP                = "mcp"
 	FeatureStores             = "stores"
+	FeatureFaceRecognition    = "face_recognition"
 )

 // AgentFeatures lists agent-related features (default OFF).
@@ -64,6 +65,7 @@ var APIFeatures = []string{
 	FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
 	FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings, FeatureSound,
 	FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP, FeatureStores,
+	FeatureFaceRecognition,
 }

 // AllFeatures lists all known features (used by UI and validation).
--- a/core/http/endpoints/localai/api_instructions.go
+++ b/core/http/endpoints/localai/api_instructions.go
@@ -73,6 +73,12 @@ var instructionDefs = []instructionDef{
 		Description: "Video generation from text prompts",
 		Tags:        []string{"video"},
 	},
+	{
+		Name:        "face-recognition",
+		Description: "Face verification (1:1), identification (1:N), embedding, and demographic analysis",
+		Tags:        []string{"face-recognition"},
+		Intro:       "The /v1/face/register, /identify, and /forget endpoints build on a vector store — registrations are in-memory by default and lost on restart. Use /v1/face/embed for a raw embedding; /v1/embeddings is OpenAI-compatible and text-only.",
+	},
 }

 // swaggerState holds parsed swagger spec data, initialised once.
--- a/core/http/endpoints/localai/api_instructions_test.go
+++ b/core/http/endpoints/localai/api_instructions_test.go
@@ -39,7 +39,7 @@ var _ = Describe("API Instructions Endpoints", func() {

 			instructions, ok := resp["instructions"].([]any)
 			Expect(ok).To(BeTrue())
-			Expect(instructions).To(HaveLen(9))
+			Expect(instructions).To(HaveLen(10))

 			// Verify each instruction has required fields and correct URL format
 			for _, s := range instructions {
@@ -73,6 +73,7 @@ var _ = Describe("API Instructions Endpoints", func() {
 				"model-management",
 				"monitoring",
 				"agents",
+				"face-recognition",
 			))
 		})
 	})
--- a/core/http/endpoints/localai/backend_monitor.go
+++ b/core/http/endpoints/localai/backend_monitor.go
@@ -9,19 +9,26 @@ import (
 // BackendMonitorEndpoint returns the status of the specified backend
 // @Summary Backend monitor endpoint
 // @Tags monitoring
-// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
+// @Param model query string true "Name of the model to monitor"
 // @Success 200 {object} proto.StatusResponse "Response"
 // @Router /backend/monitor [get]
 func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
 	return func(c echo.Context) error {
-
-		input := new(schema.BackendMonitorRequest)
-		// Get input data from the request body
-		if err := c.Bind(input); err != nil {
-			return err
+		model := c.QueryParam("model")
+		// Fall back to binding the request body so pre-existing clients that
+		// sent `{"model": "..."}` with GET keep working.
+		if model == "" {
+			input := new(schema.BackendMonitorRequest)
+			if err := c.Bind(input); err != nil {
+				return err
+			}
+			model = input.Model
+		}
+		if model == "" {
+			return echo.NewHTTPError(400, "model query parameter is required")
 		}

-		resp, err := bm.CheckAndSample(input.Model)
+		resp, err := bm.CheckAndSample(model)
 		if err != nil {
 			return err
 		}
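
Note on the handler change above: the model name is now read from the `model` query parameter, with the JSON body kept only as a fallback. A client-side sketch of building such a request URL follows. The base URL, the model names, and the `monitorURL` helper are assumptions for illustration; only the `/backend/monitor` path and the `model` parameter come from the diff.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// monitorURL is a hypothetical helper that builds the GET URL for the
// backend monitor endpoint, escaping the model name as a query value.
func monitorURL(base, model string) string {
	q := url.Values{}
	q.Set("model", model)
	return strings.TrimRight(base, "/") + "/backend/monitor?" + q.Encode()
}

func main() {
	// Assumed local instance; adjust to wherever the server actually listens.
	fmt.Println(monitorURL("http://localhost:8080", "qwen3"))
	// Model names containing reserved characters are percent-encoded.
	fmt.Println(monitorURL("http://localhost:8080/", "org/model"))
}
```

Sending the name in the query string avoids relying on GET request bodies, which many HTTP clients and proxies drop.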
--- a/core/http/endpoints/localai/detection.go
+++ b/core/http/endpoints/localai/detection.go
@@ -9,7 +9,6 @@ import (
 	"github.com/mudler/LocalAI/core/http/middleware"
 	"github.com/mudler/LocalAI/core/schema"
 	"github.com/mudler/LocalAI/pkg/model"
-	"github.com/mudler/LocalAI/pkg/utils"
 	"github.com/mudler/xlog"
 )

@@ -34,14 +33,14 @@ func DetectionEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC

 		xlog.Debug("Detection", "image", input.Image, "modelFile", "modelFile", "backend", cfg.Backend)

-		image, err := utils.GetContentURIAsBase64(input.Image)
+		image, err := decodeImageInput(input.Image)
 		if err != nil {
 			return err
 		}

 		res, err := backend.Detection(image, input.Prompt, input.Points, input.Boxes, input.Threshold, ml, appConfig, *cfg)
 		if err != nil {
-			return err
+			return mapBackendError(err)
 		}

 		response := schema.DetectionResponse{
--- a/core/http/endpoints/localai/face_analyze.go
+++ b/core/http/endpoints/localai/face_analyze.go
@@ -0,0 +1,69 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceAnalyzeEndpoint returns demographic attributes for faces in an image.
+// @Summary Analyze demographic attributes (age, gender, ...) of faces.
+// @Tags face-recognition
+// @Param request body schema.FaceAnalyzeRequest true "query params"
+// @Success 200 {object} schema.FaceAnalyzeResponse "Response"
+// @Router /v1/face/analyze [post]
+func FaceAnalyzeEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceAnalyzeRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceAnalyze", "model", cfg.Name, "backend", cfg.Backend, "actions", input.Actions)
+		res, err := backend.FaceAnalyze(img, input.Actions, input.AntiSpoofing, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		response := schema.FaceAnalyzeResponse{
+			Faces: make([]schema.FaceAnalysis, len(res.GetFaces())),
+		}
+		for i, f := range res.GetFaces() {
+			response.Faces[i] = schema.FaceAnalysis{
+				Region: schema.FacialArea{
+					X: f.GetRegion().GetX(),
+					Y: f.GetRegion().GetY(),
+					W: f.GetRegion().GetW(),
+					H: f.GetRegion().GetH(),
+				},
+				FaceConfidence:  f.GetFaceConfidence(),
+				Age:             f.GetAge(),
+				DominantGender:  f.GetDominantGender(),
+				Gender:          f.GetGender(),
+				DominantEmotion: f.GetDominantEmotion(),
+				Emotion:         f.GetEmotion(),
+				DominantRace:    f.GetDominantRace(),
+				Race:            f.GetRace(),
+				IsReal:          f.GetIsReal(),
+				AntispoofScore:  f.GetAntispoofScore(),
+			}
+		}
+
+		return c.JSON(http.StatusOK, response)
+	}
+}
--- a/core/http/endpoints/localai/face_embed.go
+++ b/core/http/endpoints/localai/face_embed.go
@@ -0,0 +1,54 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceEmbedEndpoint extracts a face embedding vector from an image.
+//
+// Distinct from /v1/embeddings, which is OpenAI-compatible and text-only
+// by contract (its `input` field is a string or string list of TEXT to
+// embed). Passing an image data-URI to /v1/embeddings does not work —
+// use this endpoint instead.
+//
+// @Summary Extract a face embedding from an image.
+// @Tags face-recognition
+// @Param request body schema.FaceEmbedRequest true "query params"
+// @Success 200 {object} schema.FaceEmbedResponse "Response"
+// @Router /v1/face/embed [post]
+func FaceEmbedEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceEmbedRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceEmbed", "model", cfg.Name, "backend", cfg.Backend)
+		vec, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+		return c.JSON(http.StatusOK, schema.FaceEmbedResponse{
+			Embedding: vec,
+			Dim:       len(vec),
+			Model:     cfg.Name,
+		})
+	}
+}
--- a/core/http/endpoints/localai/face_forget.go
+++ b/core/http/endpoints/localai/face_forget.go
@@ -0,0 +1,45 @@
+package localai
+
+import (
+	"errors"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/xlog"
+)
+
+// FaceForgetEndpoint removes a previously-registered face by ID.
+// @Summary Remove a previously-registered face by ID.
+// @Tags face-recognition
+// @Param request body schema.FaceForgetRequest true "query params"
+// @Success 204 "No Content"
+// @Router /v1/face/forget [post]
+func FaceForgetEndpoint(registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceForgetRequest)
+		if !ok {
+			// Forget doesn't need a face model loaded — fall back to a raw bind
+			// when the request extractor hasn't run (e.g. when the route was
+			// registered without SetModelAndConfig).
+			input = new(schema.FaceForgetRequest)
+			if err := c.Bind(input); err != nil {
+				return echo.ErrBadRequest
+			}
+		}
+		if input.ID == "" {
+			return echo.NewHTTPError(http.StatusBadRequest, "id is required")
+		}
+
+		xlog.Debug("FaceForget", "id", input.ID)
+		if err := registry.Forget(c.Request().Context(), input.ID); err != nil {
+			if errors.Is(err, facerecognition.ErrNotFound) {
+				return echo.NewHTTPError(http.StatusNotFound, err.Error())
+			}
+			return err
+		}
+		return c.NoContent(http.StatusNoContent)
+	}
+}
--- a/core/http/endpoints/localai/face_identify.go
+++ b/core/http/endpoints/localai/face_identify.go
@@ -0,0 +1,80 @@
+package localai
+
+import (
+	"cmp"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// defaultIdentifyThreshold is the cosine-distance cutoff applied when
+// the client does not specify one. Tuned for buffalo_l ArcFace R50;
+// other recognizers (e.g. SFace) should override it explicitly.
+const defaultIdentifyThreshold = float32(0.35)
+
+// FaceIdentifyEndpoint runs 1:N identification against the registered store.
+// @Summary Identify a face against the registered database (1:N recognition).
+// @Tags face-recognition
+// @Param request body schema.FaceIdentifyRequest true "query params"
+// @Success 200 {object} schema.FaceIdentifyResponse "Response"
+// @Router /v1/face/identify [post]
+func FaceIdentifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceIdentifyRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		topK := cmp.Or(input.TopK, 5)
+		threshold := cmp.Or(input.Threshold, defaultIdentifyThreshold)
+
+		xlog.Debug("FaceIdentify", "model", cfg.Name, "topK", topK, "threshold", threshold)
+		probe, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		matches, err := registry.Identify(c.Request().Context(), probe, topK)
+		if err != nil {
+			return err
+		}
+
+		response := schema.FaceIdentifyResponse{
+			Matches: make([]schema.FaceIdentifyMatch, len(matches)),
+		}
+		for i, m := range matches {
+			confidence := (1 - m.Distance/threshold) * 100
+			if confidence < 0 {
+				confidence = 0
+			}
+			if confidence > 100 {
+				confidence = 100
+			}
+			response.Matches[i] = schema.FaceIdentifyMatch{
+				ID:         m.ID,
+				Name:       m.Metadata.Name,
+				Labels:     m.Metadata.Labels,
+				Distance:   m.Distance,
+				Confidence: confidence,
+				Match:      m.Distance <= threshold,
+			}
+		}
+		return c.JSON(http.StatusOK, response)
+	}
+}
--- a/core/http/endpoints/localai/face_register.go
+++ b/core/http/endpoints/localai/face_register.go
@@ -0,0 +1,60 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceRegisterEndpoint enrolls a face into the 1:N identification store.
+// @Summary Register a face for 1:N identification.
+// @Tags face-recognition
+// @Param request body schema.FaceRegisterRequest true "query params"
+// @Success 200 {object} schema.FaceRegisterResponse "Response"
+// @Router /v1/face/register [post]
+func FaceRegisterEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceRegisterRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+		if input.Name == "" {
+			return echo.NewHTTPError(http.StatusBadRequest, "name is required")
+		}
+
+		img, err := decodeImageInput(input.Img)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceRegister", "model", cfg.Name, "name", input.Name)
+		embedding, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		stored, err := registry.Register(c.Request().Context(), embedding, facerecognition.Metadata{
+			Name:   input.Name,
+			Labels: input.Labels,
+		})
+		if err != nil {
+			return err
+		}
+		return c.JSON(http.StatusOK, schema.FaceRegisterResponse{
+			ID:           stored.ID,
+			Name:         stored.Name,
+			RegisteredAt: stored.RegisteredAt,
+		})
+	}
+}
--- a/core/http/endpoints/localai/face_verify.go
+++ b/core/http/endpoints/localai/face_verify.go
@@ -0,0 +1,68 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/xlog"
+)
+
+// FaceVerifyEndpoint compares two images and reports whether they depict the same person.
+// @Summary Verify that two images depict the same person.
+// @Tags face-recognition
+// @Param request body schema.FaceVerifyRequest true "query params"
+// @Success 200 {object} schema.FaceVerifyResponse "Response"
+// @Router /v1/face/verify [post]
+func FaceVerifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceVerifyRequest)
+		if !ok || input.Model == "" {
+			return echo.ErrBadRequest
+		}
+		cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
+		if !ok || cfg == nil {
+			return echo.ErrBadRequest
+		}
+
+		img1, err := decodeImageInput(input.Img1)
+		if err != nil {
+			return err
+		}
+		img2, err := decodeImageInput(input.Img2)
+		if err != nil {
+			return err
+		}
+
+		xlog.Debug("FaceVerify", "model", cfg.Name, "backend", cfg.Backend)
+		res, err := backend.FaceVerify(img1, img2, input.Threshold, input.AntiSpoofing, ml, appConfig, *cfg)
+		if err != nil {
+			return mapBackendError(err)
+		}
+
+		return c.JSON(http.StatusOK, schema.FaceVerifyResponse{
+			Verified:   res.GetVerified(),
+			Distance:   res.GetDistance(),
+			Threshold:  res.GetThreshold(),
+			Confidence: res.GetConfidence(),
+			Model:      res.GetModel(),
+			Img1Area: schema.FacialArea{
+				X: res.GetImg1Area().GetX(),
+				Y: res.GetImg1Area().GetY(),
+				W: res.GetImg1Area().GetW(),
+				H: res.GetImg1Area().GetH(),
+			},
+			Img2Area: schema.FacialArea{
+				X: res.GetImg2Area().GetX(),
+				Y: res.GetImg2Area().GetY(),
+				W: res.GetImg2Area().GetW(),
+				H: res.GetImg2Area().GetH(),
+			},
+			ProcessingTimeMs: res.GetProcessingTimeMs(),
+		})
+	}
+}
--- a/core/http/endpoints/localai/images.go
+++ b/core/http/endpoints/localai/images.go
@@ -0,0 +1,55 @@
+package localai
+
+import (
+	"fmt"
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/pkg/utils"
+	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/status"
+)
+
+// decodeImageInput resolves a URL, data-URI, or plain-string image
+// input to a base64 payload ready for the gRPC surface. Errors from
+// the underlying utils helper (bad URL, not a data-URI, download
+// failure, etc.) are all caused by what the client sent — we surface
+// them as 400 rather than the default 500 so API consumers can
+// distinguish "you sent bad input" from "our server broke".
+//
+// This is the single-input path for endpoints where the image IS the
+// request (detection, face recognition, etc.). The multi-modal message
+// paths (chat completions, responses API, realtime) intentionally
+// log-and-skip individual media parts; they don't use this helper.
+func decodeImageInput(s string) (string, error) {
+	img, err := utils.GetContentURIAsBase64(s)
+	if err != nil {
+		return "", echo.NewHTTPError(http.StatusBadRequest, fmt.Sprintf("invalid image input: %v", err))
+	}
+	return img, nil
+}
+
+// mapBackendError converts the gRPC status code a backend returns into
+// a matching HTTP status. Without this, every backend error defaults
+// to 500 — which lies to API consumers when the backend is telling us
+// "your input was bad" (INVALID_ARGUMENT) or "the resource doesn't
+// exist" (NOT_FOUND). Pass any err from a `core/backend/*` call
+// through this before returning from a handler.
+func mapBackendError(err error) error {
+	if err == nil {
+		return nil
+	}
+	if st, ok := status.FromError(err); ok {
+		switch st.Code() {
+		case codes.InvalidArgument:
+			return echo.NewHTTPError(http.StatusBadRequest, st.Message())
+		case codes.NotFound:
+			return echo.NewHTTPError(http.StatusNotFound, st.Message())
+		case codes.FailedPrecondition:
+			return echo.NewHTTPError(http.StatusPreconditionFailed, st.Message())
+		case codes.Unimplemented:
+			return echo.NewHTTPError(http.StatusNotImplemented, st.Message())
+		}
+	}
+	return err
+}
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
 		if err := c.Bind(&req); err != nil || req.Backend == "" {
 			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
 		}
-		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
+		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
 		if err != nil {
 			xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
 			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
--- a/core/http/endpoints/localai/settings.go
+++ b/core/http/endpoints/localai/settings.go
@@ -110,6 +110,27 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			})
 		}

+		// The UI reads ApiKeys from GET /api/settings, which already returns the
+		// merged env+runtime list. When the user clicks Save, the same merged
+		// list comes back in the POST body. Strip the env-supplied keys from
+		// the incoming list before we persist or re-merge, otherwise each save
+		// duplicates the env keys on top of the previous merge (#9071).
+		if settings.ApiKeys != nil {
+			envKeys := startupConfig.ApiKeys
+			envSet := make(map[string]struct{}, len(envKeys))
+			for _, k := range envKeys {
+				envSet[k] = struct{}{}
+			}
+			runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
+			for _, k := range *settings.ApiKeys {
+				if _, fromEnv := envSet[k]; fromEnv {
+					continue
+				}
+				runtimeOnly = append(runtimeOnly, k)
+			}
+			settings.ApiKeys = &runtimeOnly
+		}
+
 		settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
 		settingsJSON, err := json.MarshalIndent(settings, "", "  ")
 		if err != nil {
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -147,6 +147,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 		result := ""
 		lastEmittedCount := 0
 		sentInitialRole := false
+		sentReasoning := false
 		hasChatDeltaToolCalls := false
 		hasChatDeltaContent := false

@@ -190,6 +191,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 					}},
 					Object: "chat.completion.chunk",
 				}
+				sentReasoning = true
 			}

 			// Stream content deltas (cleaned of reasoning tags) while no tool calls
@@ -363,7 +365,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 			functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
 		}
 		xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
-		noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0
+		// noAction is a sentinel "just answer" pseudo-function — not a real
+		// tool call. Scan the whole slice rather than only index 0 so we
+		// don't drop a real tool call that happens to follow a noAction
+		// entry, and so the default branch isn't entered with only noAction
+		// entries to emit as tool_calls.
+		noActionToRun := !hasRealCall(functionResults, noAction)

 		switch {
 		case noActionToRun:
@@ -377,108 +384,31 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 				usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
 			}

-			if sentInitialRole {
-				// Content was already streamed during the callback — just emit usage.
-				delta := &schema.Message{}
-				if reasoning != "" && extractor.Reasoning() == "" {
-					delta.Reasoning = &reasoning
-				}
-				responses <- schema.OpenAIResponse{
-					ID: id, Created: created, Model: req.Model,
-					Choices: []schema.Choice{{Delta: delta, Index: 0}},
-					Object:  "chat.completion.chunk",
-					Usage:   usage,
-				}
-			} else {
-				// Content was NOT streamed — send everything at once (fallback).
-				responses <- schema.OpenAIResponse{
-					ID: id, Created: created, Model: req.Model,
-					Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
-					Object:  "chat.completion.chunk",
-				}
-
-				result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
-				if err != nil {
-					xlog.Error("error handling question", "error", err)
-					return err
-				}
-
-				delta := &schema.Message{Content: &result}
-				if reasoning != "" {
-					delta.Reasoning = &reasoning
-				}
-				responses <- schema.OpenAIResponse{
-					ID: id, Created: created, Model: req.Model,
-					Choices: []schema.Choice{{Delta: delta, Index: 0}},
-					Object:  "chat.completion.chunk",
-					Usage:   usage,
+			var result string
+			if !sentInitialRole {
+				var hqErr error
+				result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
+				if hqErr != nil {
+					xlog.Error("error handling question", "error", hqErr)
+					return hqErr
 				}
 			}
+			for _, chunk := range buildNoActionFinalChunks(
+				id, req.Model, created,
+				sentInitialRole, sentReasoning,
+				result, reasoning, usage,
+			) {
+				responses <- chunk
+			}

 		default:
-			for i, ss := range functionResults {
-				name, args := ss.Name, ss.Arguments
-				toolCallID := ss.ID
-				if toolCallID == "" {
-					toolCallID = id
-				}
-
-				if i < lastEmittedCount {
-					// Already emitted during streaming by the incremental
-					// JSON/XML parser — skip to avoid duplicate tool calls.
-					continue
-				}
-
-				// Tool call not yet emitted — send name + args (two chunks).
-				initialMessage := schema.OpenAIResponse{
-					ID:      id,
-					Created: created,
-					Model:   req.Model,
-					Choices: []schema.Choice{{
-						Delta: &schema.Message{
-							Role: "assistant",
-							ToolCalls: []schema.ToolCall{
-								{
-									Index: i,
-									ID:    toolCallID,
-									Type:  "function",
-									FunctionCall: schema.FunctionCall{
-										Name: name,
-									},
-								},
-							},
-						},
-						Index:        0,
-						FinishReason: nil,
-					}},
-					Object: "chat.completion.chunk",
-				}
-				responses <- initialMessage
-
-				responses <- schema.OpenAIResponse{
-					ID:      id,
-					Created: created,
-					Model:   req.Model,
-					Choices: []schema.Choice{{
-						Delta: &schema.Message{
-							Role:    "assistant",
-							Content: textContentToReturn,
-							ToolCalls: []schema.ToolCall{
-								{
-									Index: i,
-									ID:    toolCallID,
-									Type:  "function",
-									FunctionCall: schema.FunctionCall{
-										Arguments: args,
-									},
-								},
-							},
-						},
-						Index:        0,
-						FinishReason: nil,
-					}},
-					Object: "chat.completion.chunk",
-				}
+			for _, chunk := range buildDeferredToolCallChunks(
+				id, req.Model, created,
+				functionResults, lastEmittedCount,
+				sentInitialRole, *textContentToReturn,
+				sentReasoning, reasoning,
+			) {
+				responses <- chunk
 			}
 		}

--- a/core/http/endpoints/openai/chat_emit.go
+++ b/core/http/endpoints/openai/chat_emit.go
@@ -0,0 +1,233 @@
+package openai
+
+import (
+	"fmt"
+
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/functions"
+)
+
+// hasRealCall reports whether functionResults contains at least one
+// entry whose Name is something other than the noAction sentinel.
+// Used by processTools to decide between the "answer the question"
+// path and the real tool-call flush.
+func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
+	for _, fc := range functionResults {
+		if fc.Name != noAction {
+			return true
+		}
+	}
+	return false
+}
+
+// buildNoActionFinalChunks produces the closing SSE chunks for the
+// noActionToRun branch of processTools (i.e. the model chose the "answer"
+// pseudo-function or emitted no tool calls at all).
+//
+// When content was already streamed (contentAlreadyStreamed=true) the
+// helper emits a single trailing usage chunk, optionally carrying
+// reasoning that was produced but not streamed incrementally. When
+// content was not streamed it emits a role chunk followed by a
+// content+reasoning+usage chunk — the "send everything at once" fallback.
+//
+// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
+// probing the extractor's Go-side state: the C++ autoparser delivers
+// reasoning through ProcessChatDeltaReasoning which populates a
+// separate accumulator that extractor.Reasoning() does not expose.
+// Without this guard the callback would stream reasoning incrementally
+// and the final chunk would duplicate it.
+func buildNoActionFinalChunks(
+	id, model string,
+	created int,
+	contentAlreadyStreamed bool,
+	reasoningAlreadyStreamed bool,
+	content string,
+	reasoning string,
+	usage schema.OpenAIUsage,
+) []schema.OpenAIResponse {
+	var out []schema.OpenAIResponse
+
+	if contentAlreadyStreamed {
+		delta := &schema.Message{}
+		if reasoning != "" && !reasoningAlreadyStreamed {
+			r := reasoning
+			delta.Reasoning = &r
+		}
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{Delta: delta, Index: 0}},
+			Object:  "chat.completion.chunk",
+			Usage:   usage,
+		})
+		return out
+	}
+
+	// Content was not streamed — send role, then content (+reasoning) + usage.
+	out = append(out, schema.OpenAIResponse{
+		ID: id, Created: created, Model: model,
+		Choices: []schema.Choice{{
+			Delta: &schema.Message{Role: "assistant"},
+			Index: 0,
+		}},
+		Object: "chat.completion.chunk",
+	})
+
+	c := content
+	delta := &schema.Message{Content: &c}
+	if reasoning != "" && !reasoningAlreadyStreamed {
+		r := reasoning
+		delta.Reasoning = &r
+	}
+	out = append(out, schema.OpenAIResponse{
+		ID: id, Created: created, Model: model,
+		Choices: []schema.Choice{{Delta: delta, Index: 0}},
+		Object:  "chat.completion.chunk",
+		Usage:   usage,
+	})
+	return out
+}
+
+// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
+// were discovered only during final parsing (i.e. after the streaming
+// callback finished). The caller forwards every returned chunk to the
+// responses channel.
+//
+// Guarantees:
+//   - tool calls with i < lastEmittedCount are skipped (already streamed)
+//   - each emitted call yields two chunks: name-only, then args-only
+//   - no chunk ever carries both non-empty Content and non-empty ToolCalls
+//   - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
+//   - if !reasoningAlreadyStreamed && reasoningContent != "",
+//     a reasoning chunk is emitted first
+//   - if !contentAlreadyStreamed && textContent != "",
+//     a role chunk followed by a content chunk is emitted (after reasoning)
+//   - chunk order: [reasoning?] [role, content]? (name, args)+
+//   - fallback IDs for empty ss.ID are unique per index so a client can
+//     match tool_result messages back to the right call
+func buildDeferredToolCallChunks(
+	id, model string,
+	created int,
+	functionResults []functions.FuncCallResults,
+	lastEmittedCount int,
+	contentAlreadyStreamed bool,
+	textContent string,
+	reasoningAlreadyStreamed bool,
+	reasoningContent string,
+) []schema.OpenAIResponse {
+	// If every call was already emitted incrementally there's nothing to
+	// flush — and no reason to emit a standalone reasoning/content chunk.
+	hasDeferred := false
+	for i := range functionResults {
+		if i >= lastEmittedCount {
+			hasDeferred = true
+			break
+		}
+	}
+	if !hasDeferred {
+		return nil
+	}
+
+	var out []schema.OpenAIResponse
+
+	// Reasoning first — the callback path at processTools emits reasoning
+	// incrementally in its own chunks, but when the C++ autoparser only
+	// surfaces reasoning as a final aggregate the callback never sees it.
+	// Recover it here (no duplication: contentAlreadyStreamed and
+	// reasoningAlreadyStreamed track what the callback already sent).
+	if !reasoningAlreadyStreamed && reasoningContent != "" {
+		r := reasoningContent
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{
+				Delta: &schema.Message{Reasoning: &r},
+				Index: 0,
+			}},
+			Object: "chat.completion.chunk",
+		})
+	}
+
+	// Then content, when it wasn't streamed via the callback. Emit role
+	// and content in separate deltas — the OpenAI streaming contract
+	// forbids bundling content alongside tool_calls in one delta.
+	if !contentAlreadyStreamed && textContent != "" {
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{
+				Delta: &schema.Message{Role: "assistant"},
+				Index: 0,
+			}},
+			Object: "chat.completion.chunk",
+		})
+		c := textContent
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{
+				Delta: &schema.Message{Content: &c},
+				Index: 0,
+			}},
+			Object: "chat.completion.chunk",
+		})
+	}
+
+	for i, ss := range functionResults {
+		if i < lastEmittedCount {
+			// Already streamed by the incremental JSON/XML parser during
+			// the token callback — skip to avoid a duplicate emission.
+			continue
+		}
+
+		toolCallID := ss.ID
+		if toolCallID == "" {
+			// Unique per-index fallback so multiple empty-ID calls don't
+			// collide on the same request ID (clients match tool results
+			// back by tool_call_id).
+			toolCallID = fmt.Sprintf("%s-%d", id, i)
+		}
+
+		// Name chunk.
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{
+				Delta: &schema.Message{
+					Role: "assistant",
+					ToolCalls: []schema.ToolCall{{
+						Index: i,
+						ID:    toolCallID,
+						Type:  "function",
+						FunctionCall: schema.FunctionCall{
+							Name: ss.Name,
+						},
+					}},
+				},
+				Index:        0,
+				FinishReason: nil,
+			}},
+			Object: "chat.completion.chunk",
+		})
+
+		// Args chunk — no Content here. Either it was streamed through
+		// the token callback earlier, or the role+content pair above
+		// already delivered it.
+		out = append(out, schema.OpenAIResponse{
+			ID: id, Created: created, Model: model,
+			Choices: []schema.Choice{{
+				Delta: &schema.Message{
+					Role: "assistant",
+					ToolCalls: []schema.ToolCall{{
+						Index: i,
+						ID:    toolCallID,
+						Type:  "function",
+						FunctionCall: schema.FunctionCall{
+							Arguments: ss.Arguments,
+						},
+					}},
+				},
+				Index:        0,
+				FinishReason: nil,
+			}},
+			Object: "chat.completion.chunk",
+		})
+	}
+
+	return out
+}
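Reviewer note: the per-index fallback ID matters mostly on the client side, where name and args deltas are merged back into one call. The sketch below is a minimal, self-contained illustration under stated assumptions: `toolCallDelta`, `call`, `deferredDeltas`, and `assemble` are hypothetical stand-ins, not LocalAI types or functions, and a real client may key on `Index` rather than `ID`.

```go
package main

import "fmt"

// toolCallDelta is a hypothetical, trimmed-down stand-in for schema.ToolCall.
type toolCallDelta struct {
	Index     int
	ID        string
	Name      string
	Arguments string
}

// call is a hypothetical stand-in for functions.FuncCallResults.
type call struct{ Name, Arguments, ID string }

// fallbackID mirrors the "%s-%d" fallback used for calls with an empty ID.
func fallbackID(requestID string, index int, id string) string {
	if id != "" {
		return id
	}
	return fmt.Sprintf("%s-%d", requestID, index)
}

// deferredDeltas emits a (name, args) pair per call, skipping the first
// lastEmitted calls, which are assumed to have been streamed already.
func deferredDeltas(requestID string, calls []call, lastEmitted int) []toolCallDelta {
	var out []toolCallDelta
	for i, c := range calls {
		if i < lastEmitted {
			continue
		}
		id := fallbackID(requestID, i, c.ID)
		out = append(out,
			toolCallDelta{Index: i, ID: id, Name: c.Name},
			toolCallDelta{Index: i, ID: id, Arguments: c.Arguments},
		)
	}
	return out
}

// assemble shows one way a client might merge deltas back into complete
// calls, keyed by tool call ID.
func assemble(deltas []toolCallDelta) map[string]toolCallDelta {
	merged := map[string]toolCallDelta{}
	for _, d := range deltas {
		m := merged[d.ID]
		m.Index, m.ID = d.Index, d.ID
		m.Name += d.Name
		m.Arguments += d.Arguments
		merged[d.ID] = m
	}
	return merged
}

func main() {
	calls := []call{
		{Name: "a", Arguments: "{}"},
		{Name: "b", Arguments: `{"q":"x"}`},
		{Name: "c", Arguments: "{}", ID: "tcC"},
	}
	merged := assemble(deferredDeltas("req", calls, 1))
	fmt.Println(len(merged), merged["req-1"].Name, merged["tcC"].Index)
}
```

If every empty-ID call fell back to the bare request ID, a client merging by ID as above would concatenate the names and arguments of unrelated calls into a single entry.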
--- a/core/http/endpoints/openai/chat_emit_test.go
+++ b/core/http/endpoints/openai/chat_emit_test.go
@@ -0,0 +1,717 @@
+package openai
+
+import (
+	"fmt"
+
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/functions"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// contentOf extracts the string payload from a chunk's delta.Content,
+// transparently handling both *string and string underlying types so
+// assertions don't have to care which one the helper produced.
+func contentOf(ch schema.OpenAIResponse) string {
+	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
+		return ""
+	}
+	switch v := ch.Choices[0].Delta.Content.(type) {
+	case *string:
+		if v == nil {
+			return ""
+		}
+		return *v
+	case string:
+		return v
+	default:
+		return ""
+	}
+}
+
+// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
+// *string on schema.Message.
+func reasoningOf(ch schema.OpenAIResponse) string {
+	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
+		return ""
+	}
+	r := ch.Choices[0].Delta.Reasoning
+	if r == nil {
+		return ""
+	}
+	return *r
+}
+
+// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
+func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
+	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
+		return nil
+	}
+	return ch.Choices[0].Delta.ToolCalls
+}
+
+// expectSpecCompliant enforces the invariants on every chunk:
+//   - Object == "chat.completion.chunk"
+//   - Exactly one Choice with Index==0
+//   - No delta ever carries both non-empty Content and non-empty ToolCalls
+//   - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
+func expectSpecCompliant(chunks []schema.OpenAIResponse) {
+	for i, ch := range chunks {
+		Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
+		Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
+		Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)
+
+		hasContent := contentOf(ch) != ""
+		hasReasoning := reasoningOf(ch) != ""
+		hasToolCalls := len(toolCallsOf(ch)) > 0
+
+		if hasContent && hasToolCalls {
+			Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
+		}
+		if hasReasoning && hasToolCalls {
+			Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
+		}
+	}
+}
+
+// expectMetadata asserts every chunk carries the same id/model/created.
+func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
+	for i, ch := range chunks {
+		Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
+		Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
+		Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
+	}
+}
+
+var _ = Describe("buildDeferredToolCallChunks", func() {
+	const (
+		testID      = "req"
+		testModel   = "test-model"
+		testCreated = 1700000000
+	)
+
+	Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
+		It("emits only the tool_call chunks, no Content anywhere", func() {
+			results := []functions.FuncCallResults{
+				{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "Let me search…",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2), "two chunks: name, args")
+
+			// Name chunk
+			tc0 := toolCallsOf(chunks[0])
+			Expect(tc0).To(HaveLen(1))
+			Expect(tc0[0].Index).To(Equal(0))
+			Expect(tc0[0].ID).To(Equal("tc1"))
+			Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
+			Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+
+			// Args chunk — MUST NOT carry Content
+			tc1 := toolCallsOf(chunks[1])
+			Expect(tc1).To(HaveLen(1))
+			Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
+			Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
+			Expect(contentOf(chunks[1])).To(BeEmpty(),
+				"args chunk must not duplicate already-streamed content")
+		})
+	})
+
+	Describe("Case B — autoparser / content not streamed", func() {
+		It("emits role, content, then name+args", func() {
+			results := []functions.FuncCallResults{
+				{Name: "do", Arguments: "{}", ID: "tc1"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				false, "Here is my plan…",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(4), "role, content, name, args")
+
+			// Role chunk
+			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+			Expect(toolCallsOf(chunks[0])).To(BeEmpty())
+
+			// Content chunk
+			Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
+			Expect(toolCallsOf(chunks[1])).To(BeEmpty())
+
+			// Name + args chunks
+			Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
+			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
+			Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
+			Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
+		})
+	})
+
+	Describe("Case C — multiple deferred calls, content already streamed", func() {
+		It("emits (name, args) × 3 with no Content anywhere", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tcA"},
+				{Name: "b", Arguments: "{}", ID: "tcB"},
+				{Name: "c", Arguments: "{}", ID: "tcC"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "some narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(6))
+
+			for i := 0; i < 3; i++ {
+				Expect(contentOf(chunks[2*i])).To(BeEmpty(),
+					"call #%d name chunk must not carry Content", i)
+				Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
+					"call #%d args chunk must not carry Content", i)
+				Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
+				Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
+			}
+			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
+			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
+			Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
+		})
+	})
+
+	Describe("Case D — partial incremental emission", func() {
+		It("emits only the deferred tail (call #1), skipping #0", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc0"},
+				{Name: "b", Arguments: "{}", ID: "tc1"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 1,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2))
+			Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
+			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
+			Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
+			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
+		})
+	})
+
+	Describe("Case E — all calls already emitted incrementally", func() {
+		It("emits nothing", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc0"},
+				{Name: "b", Arguments: "{}", ID: "tc1"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 2,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(BeEmpty())
+		})
+	})
+
+	Describe("Case F — content not streamed but textContent empty", func() {
+		It("emits only the tool call chunks, no leading role/content", func() {
+			results := []functions.FuncCallResults{
+				{Name: "x", Arguments: "{}", ID: "tcX"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				false, "",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2))
+			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
+			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
+		})
+	})
+
+	Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
+		It("emits a deterministic per-index fallback", func() {
+			results := []functions.FuncCallResults{
+				{Name: "x", Arguments: "{}", ID: ""},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2))
+			expectedID := fmt.Sprintf("%s-%d", testID, 0)
+			Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
+			Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
+		})
+	})
+
+	Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
+		It("avoids the collision bug where every empty-ID call shared the request id", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: ""},
+				{Name: "b", Arguments: "{}", ID: ""},
+				{Name: "c", Arguments: "{}", ID: ""},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(6))
+
+			ids := map[string]int{}
+			for _, ch := range chunks {
+				for _, tc := range toolCallsOf(ch) {
+					ids[tc.ID]++
+				}
+			}
+			// Each call yields a name chunk + args chunk → each distinct ID
+			// should appear in exactly two chunks. Three distinct IDs
+			// overall.
+			Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
+			for id, n := range ids {
+				Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
+			}
+		})
+	})
+
+	Describe("Case H — indices preserved across skip with multiple calls", func() {
+		It("emits Index fields matching functionResults positions", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc0"},
+				{Name: "b", Arguments: "{}", ID: "tc1"},
+				{Name: "c", Arguments: "{}", ID: "tc2"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 1,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(4))
+
+			Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
+			Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
+			Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
+			Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
+		})
+	})
+
+	Describe("Case I — explicit non-empty ID is preserved", func() {
+		It("does not touch ss.ID when it's already set", func() {
+			results := []functions.FuncCallResults{
+				{Name: "x", Arguments: "{}", ID: "abc123"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2))
+			Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
+			Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
+		})
+	})
+
+	Describe("Case J — chunk-shape sanity", func() {
+		It("splits Name into the first chunk and Arguments into the second", func() {
+			results := []functions.FuncCallResults{
+				{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "narration",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2))
+
+			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
+			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
+
+			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
+			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
+		})
+	})
+
+	Describe("Case K — metadata propagation", func() {
+		It("stamps every chunk with the same id/model/created", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tcA"},
+				{Name: "b", Arguments: "{}", ID: "tcB"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				false, "hello",
+				true, "",
+			)
+
+			expectSpecCompliant(chunks)
+			expectMetadata(chunks, testID, testModel, testCreated)
+		})
+	})
+
+	Describe("Case L — Choices[0].Index == 0 invariant", func() {
+		It("is upheld across every branch the helper can take", func() {
+			scenarios := []struct {
+				name              string
+				functionResults   []functions.FuncCallResults
+				lastEmittedCount  int
+				contentStreamed   bool
+				text              string
+				reasoningStreamed bool
+				reasoning         string
+			}{
+				{"streamed-content-deferred-call",
+					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
+					0, true, "hi", true, ""},
+				{"unstreamed-content-deferred-call",
+					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
+					0, false, "hello", true, ""},
+				{"unstreamed-reasoning-and-content",
+					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
+					0, false, "hello", false, "thinking…"},
+				{"partial-incremental",
+					[]functions.FuncCallResults{
+						{Name: "a", Arguments: "{}"},
+						{Name: "b", Arguments: "{}"}},
+					1, true, "hi", true, ""},
+			}
+			for _, sc := range scenarios {
+				chunks := buildDeferredToolCallChunks(
+					testID, testModel, testCreated,
+					sc.functionResults, sc.lastEmittedCount,
+					sc.contentStreamed, sc.text,
+					sc.reasoningStreamed, sc.reasoning,
+				)
+				for i, ch := range chunks {
+					Expect(ch.Choices[0].Index).To(Equal(0),
+						"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
+				}
+			}
+		})
+	})
+
+	Describe("Case M — spec compliance across every scenario", func() {
+		It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
+			scenarios := []struct {
+				name              string
+				functionResults   []functions.FuncCallResults
+				lastEmittedCount  int
+				contentStreamed   bool
+				text              string
+				reasoningStreamed bool
+				reasoning         string
+			}{
+				{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
+					0, true, "already-streamed", true, ""},
+				{"C", []functions.FuncCallResults{
+					{Name: "a", Arguments: "{}", ID: "tc0"},
+					{Name: "b", Arguments: "{}", ID: "tc1"}},
+					0, true, "already-streamed", true, ""},
+				{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
+					0, false, "plan", true, ""},
+				{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
+					0, false, "plan", false, "thinking…"},
+			}
+			for _, sc := range scenarios {
+				chunks := buildDeferredToolCallChunks(
+					testID, testModel, testCreated,
+					sc.functionResults, sc.lastEmittedCount,
+					sc.contentStreamed, sc.text,
+					sc.reasoningStreamed, sc.reasoning,
+				)
+				for i, ch := range chunks {
+					hasContent := contentOf(ch) != ""
+					hasReasoning := reasoningOf(ch) != ""
+					hasToolCalls := len(toolCallsOf(ch)) > 0
+					Expect(hasContent && hasToolCalls).To(BeFalse(),
+						"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
+					Expect(hasReasoning && hasToolCalls).To(BeFalse(),
+						"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
+				}
+			}
+		})
+	})
+
+	Describe("Case N — empty functionResults", func() {
+		It("emits nothing, including no leading role/content/reasoning", func() {
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				nil, 0,
+				false, "ignored",
+				false, "ignored",
+			)
+			Expect(chunks).To(BeEmpty())
+		})
+	})
+
+	Describe("Case O — content not streamed but all calls already emitted", func() {
+		It("emits nothing, not even a standalone content chunk", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc0"},
+				{Name: "b", Arguments: "{}", ID: "tc1"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 2,
+				false, "narration",
+				false, "thinking…",
+			)
+			Expect(chunks).To(BeEmpty(),
+				"no tool_calls to trigger on, so no leading role/content/reasoning either")
+		})
+	})
+
+	Describe("Reasoning — autoparser delivered reasoning only at end", func() {
+		It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "streamed content",
+				false, "model's private thoughts",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(3), "reasoning, name, args")
+
+			Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+			Expect(toolCallsOf(chunks[0])).To(BeEmpty())
+
+			// The following two are the tool_call name + args chunks.
+			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
+			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
+		})
+
+		It("emits reasoning before role+content when neither was streamed", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				false, "final plan",
+				false, "private thoughts",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
+
+			Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
+			Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
+			Expect(contentOf(chunks[2])).To(Equal("final plan"))
+			Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
+			Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
+		})
+
+		It("does not re-emit reasoning that was already streamed", func() {
+			results := []functions.FuncCallResults{
+				{Name: "a", Arguments: "{}", ID: "tc"},
+			}
+			chunks := buildDeferredToolCallChunks(
+				testID, testModel, testCreated,
+				results, 0,
+				true, "streamed",
+				true, "already-sent reasoning",
+			)
+
+			expectSpecCompliant(chunks)
+			Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
+			for _, ch := range chunks {
+				Expect(reasoningOf(ch)).To(BeEmpty())
+			}
+		})
+	})
+})
+
+var _ = Describe("hasRealCall", func() {
+	const noAction = "answer"
+
+	It("returns false for nil and empty slices", func() {
+		Expect(hasRealCall(nil, noAction)).To(BeFalse())
+		Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
+	})
+
+	It("returns false when every entry is the noAction sentinel", func() {
+		results := []functions.FuncCallResults{
+			{Name: noAction, Arguments: `{"message":"hi"}`},
+			{Name: noAction, Arguments: `{"message":"hello"}`},
+		}
+		Expect(hasRealCall(results, noAction)).To(BeFalse())
+	})
+
+	It("returns true when the only entry is a real call", func() {
+		results := []functions.FuncCallResults{
+			{Name: "search", Arguments: "{}"},
+		}
+		Expect(hasRealCall(results, noAction)).To(BeTrue())
+	})
+
+	It("returns true when a real call follows a noAction entry", func() {
+		// This is the regression the follow-up fixes: the old
+		// functionResults[0].Name == noAction check would declare this
+		// noActionToRun and drop the real call entirely.
+		results := []functions.FuncCallResults{
+			{Name: noAction, Arguments: `{"message":"hi"}`},
+			{Name: "search", Arguments: "{}"},
+		}
+		Expect(hasRealCall(results, noAction)).To(BeTrue())
+	})
+
+	It("returns true when a real call precedes a noAction entry", func() {
+		results := []functions.FuncCallResults{
+			{Name: "search", Arguments: "{}"},
+			{Name: noAction, Arguments: `{"message":"hi"}`},
+		}
+		Expect(hasRealCall(results, noAction)).To(BeTrue())
+	})
+})
+
+var _ = Describe("buildNoActionFinalChunks", func() {
+	const (
+		testID      = "req"
+		testModel   = "test-model"
+		testCreated = 1700000000
+	)
+	usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
+
+	Describe("Content streamed — trailing usage chunk", func() {
+		It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				true, true,
+				"", "already-streamed-reasoning", usage,
+			)
+
+			Expect(chunks).To(HaveLen(1))
+			Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+			Expect(reasoningOf(chunks[0])).To(BeEmpty(),
+				"reasoning must not be re-emitted once it was streamed via the callback")
+		})
+
+		It("emits a trailing reasoning delivery when reasoning came only at end", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				true, false,
+				"", "autoparser final reasoning", usage,
+			)
+
+			Expect(chunks).To(HaveLen(1))
+			Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+			Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
+		})
+
+		It("omits reasoning when it's empty regardless of streamed flag", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				true, false,
+				"", "", usage,
+			)
+
+			Expect(chunks).To(HaveLen(1))
+			Expect(reasoningOf(chunks[0])).To(BeEmpty())
+		})
+	})
+
+	Describe("Content not streamed — role, then content+usage", func() {
+		It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				false, true,
+				"the answer", "already-streamed-reasoning", usage,
+			)
+
+			Expect(chunks).To(HaveLen(2))
+			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
+			Expect(contentOf(chunks[0])).To(BeEmpty())
+
+			Expect(contentOf(chunks[1])).To(Equal("the answer"))
+			Expect(reasoningOf(chunks[1])).To(BeEmpty(),
+				"reasoning must not be re-emitted if it was streamed earlier")
+			Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
+		})
+
+		It("emits role, then content+reasoning when reasoning was not streamed", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				false, false,
+				"the answer", "autoparser final reasoning", usage,
+			)
+
+			Expect(chunks).To(HaveLen(2))
+			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
+
+			Expect(contentOf(chunks[1])).To(Equal("the answer"))
+			Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
+			Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
+		})
+
+		It("still emits content even when reasoning is empty", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				false, false,
+				"just an answer", "", usage,
+			)
+
+			Expect(chunks).To(HaveLen(2))
+			Expect(contentOf(chunks[1])).To(Equal("just an answer"))
+			Expect(reasoningOf(chunks[1])).To(BeEmpty())
+		})
+	})
+
+	Describe("Metadata and shape invariants", func() {
+		It("stamps every chunk with the same id/model/created and object", func() {
+			chunks := buildNoActionFinalChunks(
+				testID, testModel, testCreated,
+				false, false,
+				"hi", "reasoning", usage,
+			)
+			for i, ch := range chunks {
+				Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
+				Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
+				Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
+				Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
+				Expect(ch.Choices).To(HaveLen(1))
+				Expect(ch.Choices[0].Index).To(Equal(0))
+			}
+		})
+	})
+})
--- a/core/http/middleware/trace.go
+++ b/core/http/middleware/trace.go
@@ -3,6 +3,7 @@ package middleware
 import (
 	"bytes"
 	"io"
+	"mime"
 	"net/http"
 	"slices"
 	"sync"
@@ -94,7 +95,8 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {

 			initializeTracing(app.ApplicationConfig().TracingMaxItems)

-			if c.Request().Header.Get("Content-Type") != "application/json" {
+			ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
+			if ct != "application/json" {
 				return next(c)
 			}

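The switch to `mime.ParseMediaType` in the hunk above means a header such as `application/json; charset=utf-8` is no longer skipped by the trace middleware. A minimal standalone sketch of that comparison follows; `isJSON` is a hypothetical helper name used only for illustration, not part of the patch.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSON reports whether a Content-Type header value names application/json.
// Parameters such as charset are ignored, and the media type is compared in
// lowercase because ParseMediaType normalizes it. On a parse error the
// returned media type is empty, so the comparison simply fails.
func isJSON(header string) bool {
	ct, _, _ := mime.ParseMediaType(header)
	return ct == "application/json"
}

func main() {
	headers := []string{
		"application/json",
		"application/json; charset=utf-8",
		"Application/JSON",
		"text/plain",
		"",
	}
	for _, h := range headers {
		fmt.Printf("%q -> %v\n", h, isJSON(h))
	}
}
```

Discarding the parse error is deliberate in this sketch: a malformed or missing header yields an empty media type, which falls through to the non-JSON branch.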
--- a/core/http/react-ui/src/utils/capabilities.js
+++ b/core/http/react-ui/src/utils/capabilities.js
@@ -12,3 +12,4 @@ export const CAP_TOKENIZE = 'FLAG_TOKENIZE'
 export const CAP_VAD = 'FLAG_VAD'
 export const CAP_VIDEO = 'FLAG_VIDEO'
 export const CAP_DETECTION = 'FLAG_DETECTION'
+export const CAP_FACE_RECOGNITION = 'FLAG_FACE_RECOGNITION'
--- a/core/http/routes/localai.go
+++ b/core/http/routes/localai.go
@@ -97,6 +97,28 @@ func RegisterLocalAIRoutes(router *echo.Echo,
 		requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
 		requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }))

+	// Face recognition endpoints
+	faceMw := []echo.MiddlewareFunc{
+		requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_FACE_RECOGNITION)),
+	}
+	router.POST("/v1/face/verify",
+		localai.FaceVerifyEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceVerifyRequest) }))...)
+	router.POST("/v1/face/analyze",
+		localai.FaceAnalyzeEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceAnalyzeRequest) }))...)
+	router.POST("/v1/face/embed",
+		localai.FaceEmbedEndpoint(cl, ml, appConfig),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceEmbedRequest) }))...)
+	router.POST("/v1/face/register",
+		localai.FaceRegisterEndpoint(cl, ml, appConfig, app.FaceRegistry()),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceRegisterRequest) }))...)
+	router.POST("/v1/face/identify",
+		localai.FaceIdentifyEndpoint(cl, ml, appConfig, app.FaceRegistry()),
+		append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceIdentifyRequest) }))...)
+	// Forget does not load a face model — it only needs the registry.
+	router.POST("/v1/face/forget", localai.FaceForgetEndpoint(app.FaceRegistry()))
+
 	ttsHandler := localai.TTSEndpoint(cl, ml, appConfig)
 	router.POST("/tts",
 		ttsHandler,
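Each face route above builds its middleware list with `append(faceMw, ...)`. That is safe as written, because a slice literal has capacity equal to its length and every `append` therefore allocates a fresh backing array. The sketch below shows the aliasing hazard that would appear if the shared slice ever gained spare capacity, and one defensive way to write the append. `withExtra` and the string stand-ins for middleware are illustrative assumptions, not code from the patch.

```go
package main

import "fmt"

// withExtra returns base plus one extra element without ever writing into
// base's backing array: the full slice expression limits capacity to
// len(base), which forces append to allocate.
func withExtra(base []string, extra string) []string {
	return append(base[:len(base):len(base)], extra)
}

func main() {
	// A shared prefix with spare capacity, unlike the literal in the patch.
	shared := make([]string, 1, 4)
	shared[0] = "filter"

	// Plain append reuses the spare capacity, so the second call overwrites
	// the element written by the first.
	a := append(shared, "verify")
	b := append(shared, "analyze")
	fmt.Println(a[1], b[1])

	// The clipped variant keeps the two lists independent.
	c := withExtra(shared, "verify")
	d := withExtra(shared, "analyze")
	fmt.Println(c[1], d[1])
}
```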
--- a/core/http/routes/ui_api.go
+++ b/core/http/routes/ui_api.go
@@ -23,7 +23,6 @@ import (
 	"github.com/mudler/LocalAI/core/gallery"
 	"github.com/mudler/LocalAI/core/http/auth"
 	"github.com/mudler/LocalAI/core/http/endpoints/localai"
-	"github.com/mudler/LocalAI/core/http/middleware"
 	"github.com/mudler/LocalAI/core/p2p"
 	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/pkg/model"
@@ -1458,24 +1457,5 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
 		app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
 	}

-	// Logs API (admin only)
-	app.GET("/api/traces", func(c echo.Context) error {
-		if !appConfig.EnableTracing {
-			return c.JSON(503, map[string]any{
-				"error": "Tracing disabled",
-			})
-		}
-		traces := middleware.GetTraces()
-		return c.JSON(200, map[string]any{
-			"traces": traces,
-		})
-	}, adminMiddleware)
-
-	app.POST("/api/traces/clear", func(c echo.Context) error {
-		middleware.ClearTraces()
-		return c.JSON(200, map[string]any{
-			"message": "Traces cleared",
-		})
-	}, adminMiddleware)
 }

--- a/core/schema/localai.go
+++ b/core/schema/localai.go
@@ -173,6 +173,123 @@ type Detection struct {
 	Mask       string  `json:"mask,omitempty"` // base64-encoded PNG segmentation mask
 }

+// ─── Face recognition ──────────────────────────────────────────────
+//
+// FacialArea describes a bounding box for a detected face.
+type FacialArea struct {
+	X float32 `json:"x"`
+	Y float32 `json:"y"`
+	W float32 `json:"w"`
+	H float32 `json:"h"`
+}
+
+// FaceVerifyRequest compares two images to decide whether they depict
+// the same person. Img1 and Img2 accept URL, base64, or data-URI.
+type FaceVerifyRequest struct {
+	BasicModelRequest
+	Img1         string  `json:"img1"`
+	Img2         string  `json:"img2"`
+	Threshold    float32 `json:"threshold,omitempty"`
+	AntiSpoofing bool    `json:"anti_spoofing,omitempty"`
+}
+
+type FaceVerifyResponse struct {
+	Verified         bool       `json:"verified"`
+	Distance         float32    `json:"distance"`
+	Threshold        float32    `json:"threshold"`
+	Confidence       float32    `json:"confidence"`
+	Model            string     `json:"model"`
+	Img1Area         FacialArea `json:"img1_area"`
+	Img2Area         FacialArea `json:"img2_area"`
+	ProcessingTimeMs float32    `json:"processing_time_ms,omitempty"`
+}
+
+// FaceAnalyzeRequest asks the backend for demographic attributes on
+// every face detected in Img.
+type FaceAnalyzeRequest struct {
+	BasicModelRequest
+	Img          string   `json:"img"`
+	Actions      []string `json:"actions,omitempty"` // subset of {"age","gender","emotion","race"}
+	AntiSpoofing bool     `json:"anti_spoofing,omitempty"`
+}
+
+type FaceAnalyzeResponse struct {
+	Faces []FaceAnalysis `json:"faces"`
+}
+
+type FaceAnalysis struct {
+	Region          FacialArea         `json:"region"`
+	FaceConfidence  float32            `json:"face_confidence"`
+	Age             float32            `json:"age,omitempty"`
+	DominantGender  string             `json:"dominant_gender,omitempty"`
+	Gender          map[string]float32 `json:"gender,omitempty"`
+	DominantEmotion string             `json:"dominant_emotion,omitempty"`
+	Emotion         map[string]float32 `json:"emotion,omitempty"`
+	DominantRace    string             `json:"dominant_race,omitempty"`
+	Race            map[string]float32 `json:"race,omitempty"`
+	IsReal          bool               `json:"is_real,omitempty"`
+	AntispoofScore  float32            `json:"antispoof_score,omitempty"`
+}
+
+// FaceEmbedRequest extracts a face embedding from an image. Distinct
+// from /v1/embeddings (which is OpenAI-compatible and text-only); this
+// endpoint accepts URL / base64 / data-URI image inputs.
+type FaceEmbedRequest struct {
+	BasicModelRequest
+	Img string `json:"img"`
+}
+
+type FaceEmbedResponse struct {
+	Embedding []float32 `json:"embedding"`
+	Dim       int       `json:"dim"`
+	Model     string    `json:"model,omitempty"`
+}
+
+// FaceRegisterRequest enrolls a face into the 1:N recognition store.
+type FaceRegisterRequest struct {
+	BasicModelRequest
+	Img    string            `json:"img"`
+	Name   string            `json:"name"`
+	Labels map[string]string `json:"labels,omitempty"`
+	Store  string            `json:"store,omitempty"` // vector store model; empty = local-store default
+}
+
+type FaceRegisterResponse struct {
+	ID           string    `json:"id"`
+	Name         string    `json:"name"`
+	RegisteredAt time.Time `json:"registered_at"`
+}
+
+// FaceIdentifyRequest runs 1:N recognition: embed the probe and
+// return the top-K nearest registered faces.
+type FaceIdentifyRequest struct {
+	BasicModelRequest
+	Img       string  `json:"img"`
+	TopK      int     `json:"top_k,omitempty"`
+	Threshold float32 `json:"threshold,omitempty"` // optional cutoff on distance
+	Store     string  `json:"store,omitempty"`
+}
+
+type FaceIdentifyResponse struct {
+	Matches []FaceIdentifyMatch `json:"matches"`
+}
+
+type FaceIdentifyMatch struct {
+	ID         string            `json:"id"`
+	Name       string            `json:"name"`
+	Labels     map[string]string `json:"labels,omitempty"`
+	Distance   float32           `json:"distance"`
+	Confidence float32           `json:"confidence"`
+	Match      bool              `json:"match"` // true when distance <= threshold
+}
+
+// FaceForgetRequest removes a previously-registered face by ID.
+type FaceForgetRequest struct {
+	BasicModelRequest
+	ID    string `json:"id"`
+	Store string `json:"store,omitempty"`
+}
+
 type ImportModelRequest struct {
 	URI         string          `json:"uri"`
 	Preferences json.RawMessage `json:"preferences,omitempty"`
--- a/core/services/facerecognition/registry.go
+++ b/core/services/facerecognition/registry.go
@@ -0,0 +1,60 @@
+// Package facerecognition provides a swappable backing store for face
+// embeddings and the 1:N identification pipeline that sits on top of it.
+//
+// The current implementation (NewStoreRegistry) is backed by LocalAI's
+// in-memory local-store gRPC backend. This is in-memory only — all
+// registrations are lost when LocalAI restarts.
+//
+// TODO: add a persistent PostgreSQL/pgvector-backed implementation for
+// production deployments. The Registry interface is explicitly designed
+// so the swap is a constructor change in core/application, with zero
+// HTTP-handler changes.
+package facerecognition
+
+import (
+	"context"
+	"errors"
+	"time"
+)
+
+// Registry stores face embeddings keyed by an opaque ID and supports
+// approximate similarity search. Implementations are expected to be
+// safe for concurrent use.
+type Registry interface {
+	// Register stores a face embedding alongside its metadata.
+	// Returns the stored metadata with ID and RegisteredAt populated.
+	// The embedding length must match the registry's expected dimension.
+	Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error)
+
+	// Identify returns up to topK matches for the probe embedding,
+	// sorted by ascending distance (closest first).
+	Identify(ctx context.Context, probe []float32, topK int) ([]Match, error)
+
+	// Forget removes a previously-registered embedding by ID.
+	// Returns ErrNotFound if the ID is unknown.
+	Forget(ctx context.Context, id string) error
+}
+
+// Metadata is the user-supplied payload stored alongside a face embedding.
+type Metadata struct {
+	// ID is populated by the registry at Register time and should not be
+	// set by the caller. It is echoed back in Match.Metadata.
+	ID           string            `json:"id"`
+	Name         string            `json:"name"`
+	Labels       map[string]string `json:"labels,omitempty"`
+	RegisteredAt time.Time         `json:"registered_at"`
+}
+
+// Match is a single result from Identify, ranked by similarity.
+type Match struct {
+	ID       string
+	Metadata Metadata
+	Distance float32 // 1 - cosine_similarity; lower = closer
+}
+
+// Sentinel errors; callers should compare with errors.Is.
+var (
+	ErrNotFound          = errors.New("facerecognition: id not found")
+	ErrEmptyEmbedding    = errors.New("facerecognition: embedding is empty")
+	ErrDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
+)
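`Match.Distance` is documented above as `1 - cosine_similarity`, with lower values meaning closer faces. A minimal sketch of that computation and of a threshold check in the style of the `match` field (`distance <= threshold`) follows. The helper name `cosineDistance` and the threshold value `0.35` are assumptions for illustration only; the patch does not state a default threshold.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineDistance returns 1 - cosine similarity of a and b.
// A zero vector is treated as having similarity 0, giving distance 1.
func cosineDistance(a, b []float32) (float32, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("embeddings must be non-empty and of equal length")
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1, nil
	}
	return float32(1 - dot/(math.Sqrt(na)*math.Sqrt(nb))), nil
}

func main() {
	const threshold = 0.35 // hypothetical cutoff, not taken from the patch

	probe := []float32{0.99, 0.01, 0, 0}
	enrolled := map[string][]float32{
		"Alice": {1, 0, 0, 0},
		"Bob":   {0, 1, 0, 0},
	}
	for name, emb := range enrolled {
		d, err := cosineDistance(probe, emb)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: distance=%.4f match=%v\n", name, d, d <= threshold)
	}
}
```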
--- a/core/services/facerecognition/registry_test.go
+++ b/core/services/facerecognition/registry_test.go
@@ -0,0 +1,253 @@
+package facerecognition_test
+
+import (
+	"context"
+	"errors"
+	"math"
+	"sync"
+	"testing"
+
+	"github.com/mudler/LocalAI/core/services/facerecognition"
+	"github.com/mudler/LocalAI/pkg/grpc"
+	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
+
+	grpclib "google.golang.org/grpc"
+)
+
+const dim = 4 // tiny test-friendly embedding dimension
+
+func TestRegisterIdentifyForget(t *testing.T) {
+	t.Parallel()
+
+	reg, fake := newTestRegistry(t)
+	ctx := t.Context()
+
+	alice := []float32{1, 0, 0, 0}
+	bob := []float32{0, 1, 0, 0}
+
+	aliceMeta, err := reg.Register(ctx, alice, facerecognition.Metadata{Name: "Alice"})
+	if err != nil {
+		t.Fatalf("Register Alice: %v", err)
+	}
+	if aliceMeta.ID == "" {
+		t.Fatalf("Register returned empty ID")
+	}
+	if aliceMeta.RegisteredAt.IsZero() {
+		t.Fatalf("Register did not populate RegisteredAt")
+	}
+
+	bobMeta, err := reg.Register(ctx, bob, facerecognition.Metadata{Name: "Bob"})
+	if err != nil {
+		t.Fatalf("Register Bob: %v", err)
+	}
+	if bobMeta.ID == aliceMeta.ID {
+		t.Fatalf("IDs should be distinct, got %q twice", bobMeta.ID)
+	}
+	aliceID := aliceMeta.ID
+	if got, want := fake.len(), 2; got != want {
+		t.Fatalf("fake store has %d entries, want %d", got, want)
+	}
+
+	// Identify an Alice-like probe — she should win.
+	matches, err := reg.Identify(ctx, []float32{0.99, 0.01, 0, 0}, 2)
+	if err != nil {
+		t.Fatalf("Identify: %v", err)
+	}
+	if len(matches) == 0 {
+		t.Fatalf("no matches returned")
+	}
+	if matches[0].Metadata.Name != "Alice" {
+		t.Fatalf("top match name = %q, want Alice", matches[0].Metadata.Name)
+	}
+	if matches[0].ID != aliceID {
+		t.Fatalf("top match ID = %q, want %q", matches[0].ID, aliceID)
+	}
+	// Sorted ascending by distance.
+	for i := 1; i < len(matches); i++ {
+		if matches[i].Distance < matches[i-1].Distance {
+			t.Fatalf("matches not sorted by distance: %v", matches)
+		}
+	}
+
+	// Forget Alice → she's gone, Bob remains.
+	if err := reg.Forget(ctx, aliceID); err != nil {
+		t.Fatalf("Forget Alice: %v", err)
+	}
+	if got, want := fake.len(), 1; got != want {
+		t.Fatalf("after Forget, store has %d entries, want %d", got, want)
+	}
+
+	// Forget unknown ID → ErrNotFound (checkable via errors.Is).
+	if err := reg.Forget(ctx, "nonexistent"); !errors.Is(err, facerecognition.ErrNotFound) {
+		t.Fatalf("Forget unknown: err = %v, want ErrNotFound", err)
+	}
+}
+
+func TestRegisterRejectsBadEmbedding(t *testing.T) {
+	t.Parallel()
+
+	reg, _ := newTestRegistry(t)
+	ctx := t.Context()
+
+	tests := []struct {
+		name    string
+		embed   []float32
+		wantErr error
+	}{
+		{"empty", []float32{}, facerecognition.ErrEmptyEmbedding},
+		{"wrong_dim", []float32{1, 2}, facerecognition.ErrDimensionMismatch},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			_, err := reg.Register(ctx, tc.embed, facerecognition.Metadata{Name: "x"})
+			if !errors.Is(err, tc.wantErr) {
+				t.Fatalf("err = %v, want wrapping %v", err, tc.wantErr)
+			}
+		})
+	}
+}
+
+func TestConcurrent(t *testing.T) {
+	t.Parallel()
+
+	reg, _ := newTestRegistry(t)
+	ctx := t.Context()
+
+	done := make(chan struct{})
+	for i := range 32 {
+		go func(i int) {
+			embed := []float32{float32(i % 4), float32((i + 1) % 4), 0, 1}
+			meta, err := reg.Register(ctx, embed, facerecognition.Metadata{Name: "n"})
+			if err == nil {
+				_, _ = reg.Identify(ctx, embed, 3)
+				_ = reg.Forget(ctx, meta.ID)
+			}
+			done <- struct{}{}
+		}(i)
+	}
+	for range 32 {
+		<-done
+	}
+}
+
+// ─── fake gRPC backend ───────────────────────────────────────────────
+
+func newTestRegistry(t *testing.T) (facerecognition.Registry, *fakeBackend) {
+	t.Helper()
+	fake := &fakeBackend{}
+	resolver := func(_ context.Context, _ string) (grpc.Backend, error) {
+		return fake, nil
+	}
+	return facerecognition.NewStoreRegistry(resolver, "test-store", dim), fake
+}
+
+// fakeBackend implements just enough of grpc.Backend for the store
+// helpers. All other methods panic so any accidental dependency is
+// visible in tests.
+type fakeBackend struct {
+	grpc.Backend // embedded nil interface: calling any method not overridden below panics
+
+	mu   sync.Mutex
+	keys [][]float32
+	vals [][]byte
+}
+
+func (f *fakeBackend) len() int {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	return len(f.keys)
+}
+
+func (f *fakeBackend) StoresSet(_ context.Context, in *pb.StoresSetOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	for i, k := range in.Keys {
+		f.keys = append(f.keys, append([]float32(nil), k.Floats...))
+		f.vals = append(f.vals, append([]byte(nil), in.Values[i].Bytes...))
+	}
+	return &pb.Result{Success: true}, nil
+}
+
+func (f *fakeBackend) StoresDelete(_ context.Context, in *pb.StoresDeleteOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	for _, k := range in.Keys {
+		idx := f.findKey(k.Floats)
+		if idx < 0 {
+			continue
+		}
+		f.keys = append(f.keys[:idx], f.keys[idx+1:]...)
+		f.vals = append(f.vals[:idx], f.vals[idx+1:]...)
+	}
+	return &pb.Result{Success: true}, nil
+}
+
+func (f *fakeBackend) StoresFind(_ context.Context, in *pb.StoresFindOptions, _ ...grpclib.CallOption) (*pb.StoresFindResult, error) {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+
+	type scored struct {
+		key []float32
+		val []byte
+		sim float32
+	}
+	results := make([]scored, 0, len(f.keys))
+	for i, k := range f.keys {
+		results = append(results, scored{k, f.vals[i], cosine(k, in.Key.Floats)})
+	}
+	// Sort descending by similarity.
+	for i := 0; i < len(results); i++ {
+		for j := i + 1; j < len(results); j++ {
+			if results[j].sim > results[i].sim {
+				results[i], results[j] = results[j], results[i]
+			}
+		}
+	}
+
+	top := int(in.TopK)
+	if top <= 0 || top > len(results) {
+		top = len(results)
+	}
+	out := &pb.StoresFindResult{}
+	for _, r := range results[:top] {
+		out.Keys = append(out.Keys, &pb.StoresKey{Floats: r.key})
+		out.Values = append(out.Values, &pb.StoresValue{Bytes: r.val})
+		out.Similarities = append(out.Similarities, r.sim)
+	}
+	return out, nil
+}
+
+func (f *fakeBackend) findKey(target []float32) int {
+	for i, k := range f.keys {
+		if equalFloats(k, target) {
+			return i
+		}
+	}
+	return -1
+}
+
+func equalFloats(a, b []float32) bool {
+	if len(a) != len(b) {
+		return false
+	}
+	for i := range a {
+		if a[i] != b[i] {
+			return false
+		}
+	}
+	return true
+}
+
+func cosine(a, b []float32) float32 {
+	var dot, na, nb float64
+	for i := range a {
+		dot += float64(a[i]) * float64(b[i])
+		na += float64(a[i]) * float64(a[i])
+		nb += float64(b[i]) * float64(b[i])
+	}
+	if na == 0 || nb == 0 {
+		return 0
+	}
+	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
+}
--- a/core/services/facerecognition/store_registry.go
+++ b/core/services/facerecognition/store_registry.go
@@ -0,0 +1,142 @@
+package facerecognition
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"sort"
+	"sync"
+	"time"
+
+	"github.com/google/uuid"
+
+	"github.com/mudler/LocalAI/pkg/grpc"
+	"github.com/mudler/LocalAI/pkg/store"
+)
+
+// StoreResolver resolves a named vector store to a gRPC backend. The
+// HTTP handler layer wires this to backend.StoreBackend so the
+// registry stays decoupled from the ModelLoader plumbing.
+type StoreResolver func(ctx context.Context, storeName string) (grpc.Backend, error)
+
+// NewStoreRegistry returns a Registry backed by LocalAI's generic
+// StoresSet / StoresFind / StoresDelete gRPC surface.
+//
+// storeName selects which vector-store model to use (defaults to the
+// local-store Go backend). `dim` is the expected embedding dimension;
+// pass 0 to accept whatever dimension arrives (useful when the face
+// backend exposes multiple recognizers of different sizes, e.g.
+// ArcFace R50 at 512 vs SFace at 128). A non-zero dim is enforced at
+// Register time and fails fast with ErrDimensionMismatch.
+func NewStoreRegistry(resolve StoreResolver, storeName string, dim int) Registry {
+	return &storeRegistry{
+		resolve:   resolve,
+		storeName: storeName,
+		dim:       dim,
+	}
+}
+
+type storeRegistry struct {
+	resolve   StoreResolver
+	storeName string
+	dim       int
+
+	// TODO(postgres): the local-store gRPC surface keys by embedding
+	// vector and exposes no "list all" method, so we cannot delete by
+	// ID without remembering the embedding. This in-memory index is
+	// updated on every Register and lost on restart, which is acceptable while
+	// the only implementation is itself in-memory. A persistent
+	// implementation must rebuild this index at startup.
+	idIndex sync.Map // map[string][]float32
+}
+
+func (r *storeRegistry) Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error) {
+	if len(embedding) == 0 {
+		return Metadata{}, ErrEmptyEmbedding
+	}
+	if r.dim != 0 && len(embedding) != r.dim {
+		return Metadata{}, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(embedding))
+	}
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+
+	meta.ID = uuid.NewString()
+	if meta.RegisteredAt.IsZero() {
+		meta.RegisteredAt = time.Now().UTC()
+	}
+
+	payload, err := json.Marshal(meta)
+	if err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: marshal metadata: %w", err)
+	}
+
+	if err := store.SetSingle(ctx, backend, embedding, payload); err != nil {
+		return Metadata{}, fmt.Errorf("facerecognition: set: %w", err)
+	}
+
+	// Retain a copy so Forget can look up the embedding by ID.
+	embCopy := append([]float32(nil), embedding...)
+	r.idIndex.Store(meta.ID, embCopy)
+	return meta, nil
+}
+
+func (r *storeRegistry) Identify(ctx context.Context, probe []float32, topK int) ([]Match, error) {
+	if len(probe) == 0 {
+		return nil, ErrEmptyEmbedding
+	}
+	if r.dim != 0 && len(probe) != r.dim {
+		return nil, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(probe))
+	}
+	if topK <= 0 {
+		topK = 5
+	}
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return nil, fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+
+	_, values, similarities, err := store.Find(ctx, backend, probe, topK)
+	if err != nil {
+		return nil, fmt.Errorf("facerecognition: find: %w", err)
+	}
+
+	matches := make([]Match, 0, len(values))
+	for i, raw := range values {
+		var meta Metadata
+		if err := json.Unmarshal(raw, &meta); err != nil {
+			// Skip unreadable entries instead of failing the whole query —
+			// the store may contain non-face records in shared deployments.
+			continue
+		}
+		matches = append(matches, Match{
+			ID:       meta.ID,
+			Metadata: meta,
+			Distance: 1 - similarities[i],
+		})
+	}
+
+	sort.SliceStable(matches, func(i, j int) bool { return matches[i].Distance < matches[j].Distance })
+	return matches, nil
+}
+
+func (r *storeRegistry) Forget(ctx context.Context, id string) error {
+	raw, ok := r.idIndex.Load(id)
+	if !ok {
+		return ErrNotFound
+	}
+	embedding := raw.([]float32)
+
+	backend, err := r.resolve(ctx, r.storeName)
+	if err != nil {
+		return fmt.Errorf("facerecognition: resolve store: %w", err)
+	}
+	if err := store.DeleteSingle(ctx, backend, embedding); err != nil {
+		return fmt.Errorf("facerecognition: delete: %w", err)
+	}
+	r.idIndex.Delete(id)
+	return nil
+}
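`Register` and `Identify` wrap `ErrDimensionMismatch` with `%w`, so callers are expected to test for the sentinels with `errors.Is` rather than `==`. The sketch below shows how a caller might translate these errors into HTTP status codes. The sentinels are redeclared locally so the example is self-contained, and both `classify` and the chosen status codes are hypothetical; the real handlers are not part of this section.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the sentinel errors declared in registry.go.
var (
	errNotFound          = errors.New("facerecognition: id not found")
	errEmptyEmbedding    = errors.New("facerecognition: embedding is empty")
	errDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
)

// wrapDim mirrors the wrapping style used by Register and Identify.
func wrapDim(want, got int) error {
	return fmt.Errorf("%w: expected %d, got %d", errDimensionMismatch, want, got)
}

// classify maps a registry error to an HTTP status code (assumed mapping).
func classify(err error) int {
	switch {
	case err == nil:
		return 200
	case errors.Is(err, errNotFound):
		return 404
	case errors.Is(err, errEmptyEmbedding), errors.Is(err, errDimensionMismatch):
		return 400
	default:
		return 500
	}
}

func main() {
	err := wrapDim(4, 2)
	// Direct comparison fails on a wrapped error; errors.Is unwraps it.
	fmt.Println(err == errDimensionMismatch, errors.Is(err, errDimensionMismatch))
	fmt.Println(classify(err), classify(errNotFound), classify(errors.New("boom")))
}
```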
--- a/core/services/messaging/subjects.go
+++ b/core/services/messaging/subjects.go
@@ -124,8 +124,13 @@ func SubjectNodeBackendInstall(nodeID string) string {
 // BackendInstallRequest is the payload for a backend.install NATS request.
 type BackendInstallRequest struct {
 	Backend          string `json:"backend"`
-	ModelID          string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
+	ModelID          string `json:"model_id,omitempty"`
 	BackendGalleries string `json:"backend_galleries,omitempty"`
+	// URI is set for external installs (OCI image, URL, or path). When non-empty
+	// the worker routes to InstallExternalBackend instead of the gallery lookup.
+	URI   string `json:"uri,omitempty"`
+	Name  string `json:"name,omitempty"`
+	Alias string `json:"alias,omitempty"`
 }

 // BackendInstallReply is the response from a backend.install NATS request.
--- a/core/services/nodes/health_mock_test.go
+++ b/core/services/nodes/health_mock_test.go
@@ -168,6 +168,12 @@ func (c *fakeBackendClient) SoundGeneration(_ context.Context, _ *pb.SoundGenera
 func (c *fakeBackendClient) Detect(_ context.Context, _ *pb.DetectOptions, _ ...ggrpc.CallOption) (*pb.DetectResponse, error) {
 	return nil, nil
 }
+func (c *fakeBackendClient) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	return nil, nil
+}
+func (c *fakeBackendClient) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	return nil, nil
+}
 func (c *fakeBackendClient) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
 	return nil, nil
 }
--- a/core/services/nodes/inflight_test.go
+++ b/core/services/nodes/inflight_test.go
@@ -91,6 +91,14 @@ func (f *fakeGRPCBackend) Detect(_ context.Context, _ *pb.DetectOptions, _ ...gg
 	return &pb.DetectResponse{}, nil
 }

+func (f *fakeGRPCBackend) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
+	return &pb.FaceVerifyResponse{}, nil
+}
+
+func (f *fakeGRPCBackend) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
+	return &pb.FaceAnalyzeResponse{}, nil
+}
+
 func (f *fakeGRPCBackend) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
 	return &pb.TranscriptResult{}, nil
 }
--- a/core/services/nodes/managers_distributed.go
+++ b/core/services/nodes/managers_distributed.go
@@ -106,6 +106,13 @@ func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context
 		if node.Status == StatusPending {
 			continue
 		}
+		// Backend lifecycle ops only make sense on backend-type workers.
+		// Agent workers don't subscribe to backend.install/delete/list, so
+		// enqueueing for them guarantees a forever-retrying row that the
+		// reconciler can never drain. Silently skip — they aren't consumers.
+		if node.NodeType != "" && node.NodeType != NodeTypeBackend {
+			continue
+		}
 		if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
 			xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
 			result.Nodes = append(result.Nodes, NodeOpStatus{
@@ -286,7 +293,7 @@ func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *gall
 	backendName := op.GalleryElementName

 	_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
-		reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
+		reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
 		if err != nil {
 			return err
 		}
@@ -304,7 +311,7 @@ func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name str
 	galleriesJSON, _ := json.Marshal(d.backendGalleries)

 	_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
-		reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
+		reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
 		if err != nil {
 			return err
 		}
--- a/core/services/nodes/reconciler.go
+++ b/core/services/nodes/reconciler.go
@@ -3,12 +3,14 @@ package nodes
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"time"

 	"github.com/mudler/LocalAI/core/services/advisorylock"
 	grpcclient "github.com/mudler/LocalAI/pkg/grpc"
 	"github.com/mudler/xlog"
+	"github.com/nats-io/nats.go"
 	"gorm.io/gorm"
 )

@@ -186,7 +188,7 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
 		case OpBackendDelete:
 			_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
 		case OpBackendInstall, OpBackendUpgrade:
-			reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries))
+			reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
 			if err != nil {
 				applyErr = err
 			} else if !reply.Success {
@@ -206,12 +208,47 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
 			}
 			continue
 		}
+
+		// ErrNoResponders means the node has no active NATS subscription for
+		// this subject. Either its connection dropped, or it's the wrong
+		// node type entirely. Mark unhealthy so the health monitor's
+		// heartbeat-only pass doesn't immediately flip it back — and so
+		// ListDuePendingBackendOps (which filters by status=healthy) stops
+		// picking the row until the node genuinely recovers.
+		if errors.Is(applyErr, nats.ErrNoResponders) {
+			xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
+				"op", op.Op, "backend", op.Backend, "node", op.NodeID)
+			_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
+		}
+
+		// Dead-letter cap: after maxAttempts the row is the reconciler
+		// equivalent of a poison message. Delete it loudly so the queue
+		// doesn't churn NATS every tick forever — operators can re-issue
+		// the op from the UI if they still want it applied.
+		if op.Attempts+1 >= maxPendingBackendOpAttempts {
+			xlog.Error("Reconciler: abandoning pending backend op after max attempts",
+				"op", op.Op, "backend", op.Backend, "node", op.NodeID,
+				"attempts", op.Attempts+1, "last_error", applyErr)
+			if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
+				xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
+			}
+			continue
+		}
+
 		_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
 		xlog.Warn("Reconciler: pending backend op retry failed",
 			"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
 	}
 }

+// maxPendingBackendOpAttempts caps how many times the reconciler retries a
+// failing row before dead-lettering it. Ten attempts at exponential backoff
+// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
+// worker restart or network blip. Poisoned rows beyond that are almost
+// certainly structural (wrong node type, non-existent gallery entry) and no
+// amount of further retrying will help.
+const maxPendingBackendOpAttempts = 10
+
 // probeLoadedModels gRPC-health-checks model addresses that the DB says are
 // loaded. If a model's backend process is gone (OOM, crash, manual restart)
 // we remove the row so ghosts don't linger. Only probes rows older than
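The comment on `maxPendingBackendOpAttempts` estimates more than an hour of retries before a row is dead-lettered. The arithmetic can be checked with a short sketch, assuming the delay starts at 30s, doubles after each failure, and is capped at 15m. The actual schedule is defined outside this hunk, so the doubling rule and the helper name `totalBackoffSeconds` are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// totalBackoffSeconds sums the waits of a doubling backoff that starts at
// baseSec and never exceeds maxSec, over the given number of waits.
func totalBackoffSeconds(baseSec, maxSec, waits int) int {
	total := 0
	delay := baseSec
	for i := 0; i < waits; i++ {
		total += delay
		delay *= 2
		if delay > maxSec {
			delay = maxSec
		}
	}
	return total
}

func main() {
	// Ten attempts are separated by nine waits.
	secs := totalBackoffSeconds(30, 15*60, 9)
	total := time.Duration(secs) * time.Second
	fmt.Println("total wait across 10 attempts:", total)
	fmt.Println("exceeds one hour:", total > time.Hour)
}
```

Under these assumptions the nine waits are 30s, 1m, 2m, 4m, 8m and then four waits at the 15m cap, which is consistent with the ">1h" figure in the comment.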
--- a/core/services/nodes/reconciler_test.go
+++ b/core/services/nodes/reconciler_test.go
@@ -373,4 +373,30 @@ var _ = Describe("ReplicaReconciler — state reconciliation", func() {
 			Expect(row.NextRetryAt).To(BeTemporally(">", before))
 		})
 	})
+
+	Describe("NewNodeRegistry malformed-row pruning", func() {
+		It("drops queue rows for agent nodes and empty backend names on startup", func() {
+			agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
+			Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
+			backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
+			Expect(registry.Register(context.Background(), backend, true)).To(Succeed())
+
+			// Three rows: one for a valid backend node (should survive),
+			// one for an agent node (pruned), one for an empty backend name
+			// on the valid node (pruned).
+			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
+			Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
+			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())
+
+			// Re-instantiating the registry runs the cleanup migration.
+			_, err := NewNodeRegistry(db)
+			Expect(err).ToNot(HaveOccurred())
+
+			var rows []PendingBackendOp
+			Expect(db.Find(&rows).Error).To(Succeed())
+			Expect(rows).To(HaveLen(1))
+			Expect(rows[0].NodeID).To(Equal(backend.ID))
+			Expect(rows[0].Backend).To(Equal("foo"))
+		})
+	})
 })
--- a/core/services/nodes/registry.go
+++ b/core/services/nodes/registry.go
@@ -148,6 +148,30 @@ func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
 	}); err != nil {
 		return nil, fmt.Errorf("migrating node tables: %w", err)
 	}
+
+	// One-shot cleanup of queue rows that can never drain: ops targeted at
+	// agent workers (wrong subscription set), at non-existent nodes, or with
+	// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
+	// new ones from being written, but rows persisted by earlier versions
+	// keep the reconciler busy retrying a permanently-failing NATS request
+	// every 30s. Guarded by the same migration advisory lock so only one
+	// frontend runs it.
+	_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
+		res := db.Exec(`
+			DELETE FROM pending_backend_ops
+			WHERE backend = ''
+			   OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
+		`, NodeTypeBackend)
+		if res.Error != nil {
+			xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
+			return res.Error
+		}
+		if res.RowsAffected > 0 {
+			xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
+		}
+		return nil
+	})
+
 	return &NodeRegistry{db: db}, nil
 }

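The cleanup predicate in the hunk above can be sketched in plain Go as a reading aid for the SQL `WHERE` clause. This is a minimal sketch, not the registry's implementation: the `node` and `pendingOp` types, the `shouldPrune` and `survivors` helpers, and the literal node-type strings `"backend"` and `"agent"` are all hypothetical stand-ins for the real GORM models and constants.

```go
package main

import "fmt"

// node and pendingOp are hypothetical, simplified stand-ins for the
// BackendNode and PendingBackendOp models in the diff.
type node struct {
	ID       string
	NodeType string // assumed values: "backend", "agent", or "" (unset)
}

type pendingOp struct {
	NodeID  string
	Backend string
}

// shouldPrune mirrors the DELETE's WHERE clause: a row is dropped when its
// backend name is empty, or when its node is missing or is neither a backend
// worker nor a node with an empty type.
func shouldPrune(op pendingOp, nodes map[string]node) bool {
	if op.Backend == "" {
		return true
	}
	n, ok := nodes[op.NodeID]
	if !ok {
		return true // row points at a node that no longer exists
	}
	return n.NodeType != "backend" && n.NodeType != ""
}

// survivors returns the ops the cleanup would leave in the queue.
func survivors(ops []pendingOp, nodes map[string]node) []pendingOp {
	var kept []pendingOp
	for _, op := range ops {
		if !shouldPrune(op, nodes) {
			kept = append(kept, op)
		}
	}
	return kept
}

func main() {
	nodes := map[string]node{
		"b1": {ID: "b1", NodeType: "backend"},
		"a1": {ID: "a1", NodeType: "agent"},
	}
	ops := []pendingOp{
		{NodeID: "b1", Backend: "foo"},   // valid backend node
		{NodeID: "a1", Backend: "foo"},   // agent node
		{NodeID: "b1", Backend: ""},      // empty backend name
		{NodeID: "gone", Backend: "foo"}, // non-existent node
	}
	fmt.Println(survivors(ops, nodes))
}
```

The fourth row covers the non-existent-node case, which the SQL handles through `NOT IN` but the new test does not exercise.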
--- a/core/services/nodes/router.go
+++ b/core/services/nodes/router.go
@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
 		return "", fmt.Errorf("no NATS connection for backend installation")
 	}

-	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
+	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
 	if err != nil {
 		return "", err
 	}
--- a/core/services/nodes/router_test.go
+++ b/core/services/nodes/router_test.go
@@ -244,7 +244,7 @@ type fakeUnloader struct {
 	unloadErr    error
 }

-func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
+func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
 	return f.installReply, f.installErr
 }

--- a/core/services/nodes/unloader.go
+++ b/core/services/nodes/unloader.go
@@ -17,7 +17,7 @@ type backendStopRequest struct {
 // NodeCommandSender abstracts NATS-based commands to worker nodes.
 // Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
 type NodeCommandSender interface {
-	InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
+	InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
 	DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
 	ListBackends(nodeID string) (*messaging.BackendListReply, error)
 	StopBackend(nodeID, backend string) error
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
 // The worker installs the backend from gallery (if not already installed),
 // starts the gRPC process, and replies when ready.
 // Timeout: 5 minutes (gallery install can take a while).
-func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
+func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
 	subject := messaging.SubjectNodeBackendInstall(nodeID)
 	xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)

@@ -80,6 +80,9 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
 		Backend:          backendType,
 		ModelID:          modelID,
 		BackendGalleries: galleriesJSON,
+		URI:              uri,
+		Name:             name,
+		Alias:            alias,
 	}, 5*time.Minute)
 }

--- a/core/trace/backend_trace.go
+++ b/core/trace/backend_trace.go
@@ -24,6 +24,8 @@ const (
 	BackendTraceRerank          BackendTraceType = "rerank"
 	BackendTraceTokenize        BackendTraceType = "tokenize"
 	BackendTraceDetection       BackendTraceType = "detection"
+	BackendTraceFaceVerify      BackendTraceType = "face_verify"
+	BackendTraceFaceAnalyze     BackendTraceType = "face_analyze"
 	BackendTraceModelLoad       BackendTraceType = "model_load"
 )

--- a/docs/content/features/backend-monitor.md
+++ b/docs/content/features/backend-monitor.md
@@ -14,11 +14,13 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend

 ### Request

-The request body is JSON:
+The model to monitor is passed as a query parameter:

-| Parameter | Type     | Required | Description                    |
-|-----------|----------|----------|--------------------------------|
-| `model`   | `string` | Yes      | Name of the model to monitor   |
+| Parameter | Type     | Required | Location | Description                    |
+|-----------|----------|----------|----------|--------------------------------|
+| `model`   | `string` | Yes      | query    | Name of the model to monitor   |
+
+For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.

 ### Response

@@ -42,9 +44,7 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
 ### Usage

 ```bash
-curl http://localhost:8080/backend/monitor \
-  -H "Content-Type: application/json" \
-  -d '{"model": "my-model"}'
+curl "http://localhost:8080/backend/monitor?model=my-model"
 ```

 ### Example response
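For clients that build the request programmatically, the query-parameter form documented above can be sketched as follows. This is a minimal sketch: `monitorURL` is a hypothetical helper and the base URL is an assumption taken from the curl example. It only shows that the model name should be query-escaped, which matters for names containing `/` or spaces.

```go
package main

import (
	"fmt"
	"net/url"
)

// monitorURL builds the /backend/monitor URL with the model passed as a
// query parameter. The helper name and base URL are illustrative only.
func monitorURL(baseURL, model string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u = u.JoinPath("backend", "monitor")
	q := u.Query()
	q.Set("model", model) // Encode() escapes reserved characters
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	u, err := monitorURL("http://localhost:8080", "my-model")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```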
--- a/docs/content/features/embeddings.md
+++ b/docs/content/features/embeddings.md
@@ -7,6 +7,10 @@ url = "/features/embeddings/"

 LocalAI supports generating embeddings for text or list of tokens.

+For face embeddings specifically, see the
+[Face Recognition](/features/face-recognition/) feature — it produces
+L2-normalized vectors (512-d, or 128-d for SFace) tuned for face similarity.
+
 For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings

 ## Model compatibility
--- a/docs/content/features/face-recognition.md
+++ b/docs/content/features/face-recognition.md
@@ -0,0 +1,228 @@
+++
+disableToc = false
+title = "Face Recognition"
+weight = 14
+url = "/features/face-recognition/"
+++
+
+LocalAI supports face recognition through the `insightface` backend:
+face verification (1:1), face identification (1:N) against a built-in
+vector store, face embedding, face detection, and demographic analysis
+(age / gender).
+
+The backend ships **two interchangeable engines** under one image,
+exposed through separate gallery entries so users can pick by license
+and accuracy needs.
+
+## Licensing — read this first
+
+| Gallery entry | Detector + recognizer | Size | License |
+|---|---|---|---|
+| `insightface-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | ~326 MB | **Non-commercial research only** (upstream insightface weights) |
+| `insightface-buffalo-s` | SCRFD-500MF + MBF + GenderAge | ~159 MB | **Non-commercial research only** |
+| `insightface-opencv` | YuNet + SFace | ~40 MB | **Apache 2.0 — commercial-safe** |
+
+The `insightface` Python library itself is MIT, but the pretrained model
+packs (buffalo_l, buffalo_s, antelopev2) are released by the upstream
+maintainers for **non-commercial research use only**. Pick the
+`insightface-opencv` entry for production / commercial deployments.
+
+## Quickstart
+
+Pull the commercial-safe backend (recommended for copy-paste):
+
+```bash
+local-ai models install insightface-opencv
+```
+
+Verify that two images depict the same person:
+
+```bash
+curl -sX POST http://localhost:8080/v1/face/verify \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "insightface-opencv",
+    "img1": "https://example.com/alice_1.jpg",
+    "img2": "https://example.com/alice_2.jpg"
+  }'
+```
+
+Response:
+
+```json
+{
+  "verified": true,
+  "distance": 0.27,
+  "threshold": 0.35,
+  "confidence": 23.1,
+  "model": "insightface-opencv",
+  "img1_area": { "x": 120.4, "y": 82.1, "w": 198.3, "h": 260.5 },
+  "img2_area": { "x": 110.8, "y": 95.0, "w": 205.6, "h": 268.2 },
+  "processing_time_ms": 412.0
+}
+```
+
+## 1:N identification workflow (register → identify → forget)
+
+This is the primary "face recognition" flow. Under the hood it uses
+LocalAI's built-in in-memory vector store — no external database to
+stand up.
+
+1. Register known faces:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/register \
+      -H "Content-Type: application/json" \
+      -d '{
+        "model": "insightface-buffalo-l",
+        "name": "Alice",
+        "img": "https://example.com/alice.jpg"
+      }'
+    # → {"id": "8b7...", "name": "Alice", "registered_at": "2026-04-21T..."}
+    ```
+
+2. Identify an unknown probe:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/identify \
+      -H "Content-Type: application/json" \
+      -d '{
+        "model": "insightface-buffalo-l",
+        "img": "https://example.com/unknown.jpg",
+        "top_k": 5
+      }'
+    # → {"matches": [{"id":"8b7...","name":"Alice","distance":0.22,"match":true,...}]}
+    ```
+
+3. Remove a person by ID:
+
+    ```bash
+    curl -sX POST http://localhost:8080/v1/face/forget \
+      -H "Content-Type: application/json" -d '{"id": "8b7..."}'
+    # → 204 No Content
+    ```
+
+{{% alert icon="⚠️" color="warning" %}}
+**Storage caveat.** The default vector store is in-memory. All
+registered faces are lost when LocalAI restarts. Persistent storage
+(pgvector) is a tracked future enhancement — the face-recognition HTTP
+API is designed to swap the backing store without changing the wire
+format.
+{{% /alert %}}
+
+## API reference
+
+### `POST /v1/face/verify` (1:1)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | gallery entry name (e.g. `insightface-buffalo-l`) |
+| `img1`, `img2` | string | URL, base64, or data-URI |
+| `threshold` | float, optional | cosine-distance cutoff; default depends on engine |
+| `anti_spoofing` | bool, optional | reserved — unused in the current release |
+
+Returns `verified`, `distance`, `threshold`, `confidence`, `model`,
+`img1_area`, `img2_area`, and `processing_time_ms`.
+
+### `POST /v1/face/analyze`
+
+Returns demographic attributes for every detected face:
+
+| field | type | description |
+|---|---|---|
+| `model` | string | gallery entry |
+| `img` | string | URL / base64 / data-URI |
+| `actions` | string[] | subset of `["age","gender","emotion","race"]`; empty = all supported |
+
+Only `insightface-buffalo-l` / `insightface-buffalo-s` populate age and
+gender (genderage head). `insightface-opencv` returns face regions with
+empty attributes — SFace has no demographic classifier. Emotion and
+race are always empty in the current release.
+
+### `POST /v1/face/register` (1:N enrollment)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face recognition model |
+| `img` | string | face to enroll |
+| `name` | string | human-readable label |
+| `labels` | map[string]string, optional | arbitrary metadata |
+| `store` | string, optional | vector store model; defaults to local-store |
+
+Returns `{id, name, registered_at}`. The `id` is an opaque UUID used by
+`/v1/face/identify` and `/v1/face/forget`.
+
+### `POST /v1/face/identify` (1:N recognition)
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face recognition model |
+| `img` | string | probe image |
+| `top_k` | int, optional | max matches to return; default 5 |
+| `threshold` | float, optional | cosine-distance cutoff; default depends on engine (0.35 for ArcFace) |
+| `store` | string, optional | vector store model; defaults to local-store |
+
+Returns a list of matches sorted by ascending distance, each with `id`,
+`name`, `labels`, `distance`, `confidence`, and `match`
+(`distance ≤ threshold`).
+
+### `POST /v1/face/forget`
+
+| field | type | description |
+|---|---|---|
+| `id` | string | ID returned by `/v1/face/register` |
+
+Returns `204 No Content` on success, `404 Not Found` if the ID is
+unknown.
+
+### `POST /v1/face/embed`
+
+Returns the L2-normalized face embedding vector for the detected face.
+
+| field | type | description |
+|---|---|---|
+| `model` | string | face model |
+| `img` | string | URL / base64 / data-URI |
+
+Returns `{embedding: float[], dim: int, model: string}`. Dimension is
+512 for the insightface ArcFace/MBF recognizers and 128 for OpenCV's
+SFace.
+
+> **Note:** the OpenAI-compatible `/v1/embeddings` endpoint is
+> intentionally text-only by contract (`input` is a string or list of
+> strings of TEXT to embed) — passing an image data-URI there does
+> nothing useful. Use `/v1/face/embed` for image inputs.
+
+### Reused endpoint
+
+- `POST /v1/detection` — returns face bounding boxes with
+  `class_name: "face"`; works for both engines.
+
+## Choosing an engine
+
+| Need | Entry |
+|---|---|
+| Commercial product | `insightface-opencv` |
+| Highest accuracy (research / demos) | `insightface-buffalo-l` |
+| Edge / low-memory / research | `insightface-buffalo-s` |
+
+The recommended default `threshold` for `/v1/face/verify` and
+`/v1/face/identify` depends on the recognizer:
+
+| Recognizer | Cosine-distance threshold |
+|---|---|
+| ArcFace R50 (`buffalo_l`) | ~0.35 |
+| MBF (`buffalo_s`) | ~0.40 |
+| SFace (`opencv`) | ~0.50 |
+
+Pass `threshold` explicitly when switching engines — the per-engine
+default only fires when the field is omitted.
+
+## Related features
+
+- [Object Detection](/features/object-detection/) — generic bounding-box
+  detection; `/v1/detection` works with the insightface backend too.
+- [Embeddings](/features/embeddings/) — raw vector extraction; face
+  embeddings are served by the dedicated `/v1/face/embed` endpoint.
+- [Stores](/features/stores/) — the generic vector store powering the
+  1:N recognition pipeline.
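The `match` rule documented for `/v1/face/identify` (`distance ≤ threshold`) can be illustrated with a small Go sketch. It assumes that "cosine distance" means `1 - cosine similarity`, which the doc does not spell out, and that embeddings are L2-normalized as stated for `/v1/face/embed`. The helper names are hypothetical and the two-dimensional vectors are toy values, not real face embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// l2Normalize returns a unit-length copy of v; a zero vector is returned as-is.
func l2Normalize(v []float64) []float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	out := make([]float64, len(v))
	norm := math.Sqrt(sum)
	if norm == 0 {
		copy(out, v)
		return out
	}
	for i, x := range v {
		out[i] = x / norm
	}
	return out
}

// cosineDistance assumes a and b are L2-normalized and of equal length,
// in which case cosine distance reduces to 1 - dot(a, b).
func cosineDistance(a, b []float64) float64 {
	var dot float64
	for i := range a {
		dot += a[i] * b[i]
	}
	return 1 - dot
}

// isMatch applies the documented rule: match when distance <= threshold.
func isMatch(distance, threshold float64) bool {
	return distance <= threshold
}

func main() {
	probe := l2Normalize([]float64{3, 4})
	enrolled := l2Normalize([]float64{4, 3})
	d := cosineDistance(probe, enrolled)
	// 0.35 is the ArcFace threshold from the table above.
	fmt.Printf("distance=%.2f match=%v\n", d, isMatch(d, 0.35))
}
```

Because the threshold is compared against a raw distance, a value tuned for one recognizer (for example 0.35 for ArcFace) is not interchangeable with another engine's (for example 0.50 for SFace).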
--- a/docs/content/features/object-detection.md
+++ b/docs/content/features/object-detection.md
@@ -7,6 +7,11 @@ url = "/features/object-detection/"

 LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).

+For detecting **faces** specifically, see the dedicated
+[Face Recognition](/features/face-recognition/) feature — its
+`/v1/detection` support is tuned for face bounding boxes and ships
+with commercially-safe model options.
+
 ## Overview

 Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
--- a/docs/content/features/stores.md
+++ b/docs/content/features/stores.md
@@ -9,6 +9,14 @@ url = '/stores'
 Stores are an experimental feature to help with querying data using similarity search. It is
 a low level API that consists of only `get`, `set`, `delete` and `find`.

+{{% alert icon="💡" color="info" %}}
+**Face recognition uses this store.** The 1:N face identification flow
+(`/v1/face/register`, `/v1/face/identify`, `/v1/face/forget`) is built
+on top of the generic store — see
+[Face Recognition](/features/face-recognition/) for the face-oriented
+API.
+{{% /alert %}}
+
 For example if you have an embedding of some text and want to find text with similar embeddings.
 You can create embeddings for chunks of all your text then compare them against the embedding of the text you
 are searching on.
--- a/docs/content/reference/_index.md
+++ b/docs/content/reference/_index.md
@@ -130,6 +130,19 @@ Reference for system information commands and diagnostics.

 ---

+### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
+Policy for AI-assisted contributions — licensing, DCO, and attribution.
+
+**Key topics:**
+- Aligned with the Linux kernel's AI assistants policy
+- Signed-off-by and DCO rules
+- `Assisted-by` commit trailer format
+- Scope and responsibility of the human submitter
+
+**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)
+
+---
+
 ## Quick Links

 | Task | Documentation |
@@ -138,6 +151,7 @@ Reference for system information commands and diagnostics.
 | CLI commands | [CLI Reference](cli-reference.md) |
 | Check compatibility | [Compatibility Table](compatibility-table.md) |
 | System diagnostics | [System Info](system-info.md) |
+| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |

 ---

--- a/docs/content/reference/ai-coding-assistants.md
+++ b/docs/content/reference/ai-coding-assistants.md
@@ -0,0 +1,79 @@
+
+++
+disableToc = false
+title = "AI Coding Assistants"
+weight = 28
+++
+
+This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.
+
+**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.
+
+AI tools helping with LocalAI development should follow the standard project development process:
+
+- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
+- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
+- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth
+
+## Licensing and Legal Requirements
+
+All contributions must comply with LocalAI's licensing requirements:
+
+- LocalAI is licensed under the **MIT License**
+- New source files should use the SPDX license identifier `MIT` where applicable to the file type
+- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers
+
+## Signed-off-by and Developer Certificate of Origin
+
+**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:
+
+- Reviewing all AI-generated code
+- Ensuring compliance with licensing requirements
+- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
+- Taking full responsibility for the contribution
+
+AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).
+
+## Attribution
+
+When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:
+
+```
+Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+```
+
+Where:
+
+- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
+- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
+- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
+
+Basic development tools (git, go, make, editors) should **not** be listed.
+
+### Example
+
+```
+fix(llama-cpp): handle empty tool call arguments
+
+Previously the parser panicked when the model returned a tool call with
+an empty arguments object. Fall back to an empty JSON object in that
+case so downstream consumers receive a valid payload.
+
+Assisted-by: Claude:claude-opus-4-7 golangci-lint
+Signed-off-by: Jane Developer <jane@example.com>
+```
+
+## Scope and Responsibility
+
+Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:
+
+- Understand every line that lands in the PR
+- Verify that generated code compiles, passes tests, and follows the project style
+- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
+- Not submit AI output verbatim without review
+
+Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.
+
+{{% notice note %}}
+This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
+{{% /notice %}}
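The `Assisted-by` trailer format described in the policy above is simple enough to check mechanically. The following Go sketch parses a single trailer line. It is illustrative only: `assistedBy` and `parseAssistedBy` are hypothetical names, not part of LocalAI's tooling, and tools are split on whitespace, so a multi-word tool name such as `go vet` would be read as two entries.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// assistedBy holds the parts of one Assisted-by trailer (hypothetical type).
type assistedBy struct {
	Agent string
	Model string
	Tools []string
}

// parseAssistedBy parses a line of the form
// "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]".
func parseAssistedBy(line string) (assistedBy, error) {
	const prefix = "Assisted-by:"
	if !strings.HasPrefix(line, prefix) {
		return assistedBy{}, errors.New("not an Assisted-by trailer")
	}
	fields := strings.Fields(strings.TrimPrefix(line, prefix))
	if len(fields) == 0 {
		return assistedBy{}, errors.New("missing AGENT_NAME:MODEL_VERSION")
	}
	agent, model, ok := strings.Cut(fields[0], ":")
	if !ok || agent == "" || model == "" {
		return assistedBy{}, errors.New("expected AGENT_NAME:MODEL_VERSION")
	}
	// Anything after the first field is treated as an optional tool name.
	return assistedBy{Agent: agent, Model: model, Tools: fields[1:]}, nil
}

func main() {
	t, err := parseAssistedBy("Assisted-by: Claude:claude-opus-4-7 golangci-lint")
	if err != nil {
		panic(err)
	}
	fmt.Printf("agent=%s model=%s tools=%v\n", t.Agent, t.Model, t.Tools)
}
```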
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 |---------|-------------|-------------|
 | [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
 | [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
-| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
+| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
 | [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
 | [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
 | [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
--- a/docs/content/whats-new.md
+++ b/docs/content/whats-new.md
@@ -10,6 +10,10 @@ Release notes have been now moved completely over Github releases.

 You can see the release notes [here](https://github.com/mudler/LocalAI/releases).

+## 2026 Highlights
+
+- **April 2026**: [Face recognition backend](/features/face-recognition/) — `insightface`-powered 1:1 verification, 1:N identification, face embedding, face detection, and demographic analysis. Ships both a non-commercial `buffalo_l` model and an Apache 2.0 OpenCV Zoo alternative.
+
 ## 2024 Highlights

 - **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1,4 +1,268 @@
 ---
+- name: "qwen3.6-27b"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
+  description: |
+    # Qwen3.6-27B
+
+    [](https://chat.qwen.ai)
+
+    > [!Note]
+    > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
+    >
+    > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
+
+    Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.
+
+    ## Qwen3.6 Highlights
+
+    This release delivers substantial upgrades, particularly in
+
+      - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
+      - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.
+
+    For more details, please refer to our blog post Qwen3.6-27B.
+
+    ## Model Overview
+
+    ...
+  license: "apache-2.0"
+  tags:
+    - llm
+    - gguf
+    - qwen
+  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    mmproj: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
+    options:
+      - use_jinja:true
+    parameters:
+      min_p: 0
+      model: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
+      presence_penalty: 1.5
+      repeat_penalty: 1
+      temperature: 0.7
+      top_k: 20
+      top_p: 0.8
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
+      sha256: 5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0
+      uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf
+    - filename: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
+      sha256: fdc443e974cad1f61c45af1cfd5580855855ddce0d6c14cc500a5714c486ac1d
+      uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/mmproj-F32.gguf
+- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
+  description: |
+    # 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
+
+    A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.
+
+    The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.
+
+      - **Developed by:** @hesamation
+      - **Base model:** `Qwen/Qwen3.6-35B-A3B`
+      - **License:** apache-2.0
+
+    This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.
+
+    [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)
+
+    ## Benchmark Results
+
+    The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.
+
+    ...
+  license: "apache-2.0"
+  tags:
+    - llm
+    - gguf
+    - qwen
+    - reasoning
+  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    options:
+      - use_jinja:true
+    parameters:
+      min_p: 0
+      model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
+      presence_penalty: 1.5
+      repeat_penalty: 1
+      temperature: 0.7
+      top_k: 20
+      top_p: 0.8
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
+      sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
+      uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
+- name: "qwen3.5-9b-glm5.1-distill-v1"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
+  description: |
+    # 🪐 Qwen3.5-9B-GLM5.1-Distill-v1
+
+    ## 📌 Model Overview
+
+    **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
+    **Base Model:** Qwen3.5-9B
+    **Training Type:** Supervised Fine-Tuning (SFT, Distillation)
+    **Parameter Scale:** 9B
+    **Training Framework:** Unsloth
+
+    This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.
+
+    The primary goals are to:
+
+      - Improve **structured reasoning ability**
+      - Enhance **instruction-following consistency**
+      - Activate **latent knowledge via better reasoning structure**
+
+    ## 📊 Training Data
+
+    ### Main Dataset
+
+      - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
+      - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
+      - Generated from a **GLM-5.1 teacher model**
+      - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
+      - Training used a **filtered subset**, not the full source dataset.
+
+    ### Auxiliary Dataset
+
+      - `Jackrong/Qwen3.5-reasoning-700x`
+
+    ...
+  license: "apache-2.0"
+  tags:
+    - llm
+    - gguf
+    - qwen
+    - instruction-tuned
+    - reasoning
+  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
+    options:
+      - use_jinja:true
+    parameters:
+      min_p: 0
+      model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
+      presence_penalty: 1.5
+      repeat_penalty: 1
+      temperature: 0.7
+      top_k: 20
+      top_p: 0.8
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
+      sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
+      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
+    - filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
+      sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
+      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
+- name: "supergemma4-26b-uncensored-v2"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
+  description: |
+    Hugging Face |
+    GitHub |
+    Launch Blog |
+    Documentation
+
+    License: Apache 2.0 | Authors: Google DeepMind
+
+    Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
+
+    Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
+
+    Gemma 4 introduces key **capability and architectural advancements**:
+
+    * **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
+
+    ...
+  license: "gemma"
+  tags:
+    - llm
+    - gguf
+  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    options:
+      - use_jinja:true
+    parameters:
+      model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
+      sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
+      uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
+- name: "qwopus-glm-18b-merged"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
+  description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n  - Improve **structured reasoning ability**\n  - Enhance **instruction-following consistency**\n  - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n  - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n  - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n  - Generated from a **GLM-5.1 teacher model**\n  - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n  - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n  - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
+  license: "apache-2.0"
+  tags:
+    - llm
+    - gguf
+    - reasoning
+  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
+  overrides:
+    backend: llama-cpp
+    function:
+      automatic_tool_parsing_fallback: true
+      grammar:
+        disable: true
+    known_usecases:
+      - chat
+    options:
+      - use_jinja:true
+    parameters:
+      model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
+    template:
+      use_tokenizer_template: true
+  files:
+    - filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
+      sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
+      uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
 - name: "qwen3.6-35b-a3b-apex"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
@@ -887,6 +1151,8 @@
    - gpu
  overrides:
    backend: neutts
+    parameters:
+      model: neuphonic/neutts-air
    known_usecases:
      - tts
 - name: vllm-omni-z-image-turbo
@@ -3502,6 +3768,169 @@
    - filename: arcee-ai_AFM-4.5B-Q4_K_M.gguf
      sha256: f05516b323f581bebae1af2cbf900d83a2569b0a60c54366daf4a9c15ae30d4f
      uri: huggingface://bartowski/arcee-ai_AFM-4.5B-GGUF/arcee-ai_AFM-4.5B-Q4_K_M.gguf
+- &insightface_buffalo_l
+  name: "insightface-buffalo-l"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  # insightface library is MIT; pretrained packs are NON-COMMERCIAL.
+  license: "insightface-non-commercial"
+  description: |
+    Face recognition using insightface's `buffalo_l` pack
+    (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB).
+    Default choice, highest accuracy.
+
+    Weights are delivered via LocalAI's gallery mechanism (SHA-256 verified,
+    cached in the models directory like any other managed model).
+    NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-l}
+    options: ["engine:insightface", "model_pack:buffalo_l"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_l.zip
+      sha256: 80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
+- &insightface_buffalo_m
+  name: "insightface-buffalo-m"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace +
+    genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a
+    cheaper detector — good balance on mid-range hardware.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-m}
+    options: ["engine:insightface", "model_pack:buffalo_m"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_m.zip
+      sha256: d98264bd8f2dc75cbc2ddce2a14e636e02bb857b3051c234b737bf3b614edca9
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_m.zip
+- &insightface_buffalo_s
+  name: "insightface-buffalo-s"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder +
+    genderage, ~159MB). Good fit for mid-range CPU deployments.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-s}
+    options: ["engine:insightface", "model_pack:buffalo_s"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_s.zip
+      sha256: d85a87f503f691807cd8bb97128bdf7a0660326cd9cd02657127fa978bab8b5e
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_s.zip
+- &insightface_buffalo_sc
+  name: "insightface-buffalo-sc"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB).
+    NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty
+    attributes for this pack. Ideal for edge/embedded deployments where
+    only verification and embedding are needed.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-buffalo-sc}
+    options: ["engine:insightface", "model_pack:buffalo_sc"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: buffalo_sc.zip
+      sha256: 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
+- &insightface_antelopev2
+  name: "insightface-antelopev2"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
+  license: "insightface-non-commercial"
+  description: |
+    Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer +
+    genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on
+    harder benchmarks; pays for it in GPU memory.
+    NON-COMMERCIAL RESEARCH USE ONLY.
+  tags: [face-recognition, face-verification, face-embedding, research-only, gpu]
+  urls: [https://github.com/deepinsight/insightface]
+  overrides:
+    backend: insightface
+    parameters: {model: insightface-antelopev2}
+    options: ["engine:insightface", "model_pack:antelopev2"]
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: antelopev2.zip
+      sha256: 8e182f14fc6e80b3bfa375b33eb6cff7ee05d8ef7633e738d1c89021dcf0c5c5
+      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip
+- &insightface_opencv
+  name: "insightface-opencv"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  license: apache-2.0
+  description: |
+    Face recognition using OpenCV Zoo weights: YuNet detector + SFace
+    128-d recognizer (fp32). APACHE 2.0 — safe for commercial use.
+    Lower accuracy than insightface packs, no demographic head
+    (`/v1/face/analyze` returns detection regions only).
+    Weights are downloaded on install via LocalAI's gallery mechanism
+    (~40MB).
+  tags: [face-recognition, face-verification, face-embedding, commercial-ok, gpu, cpu]
+  urls: [https://github.com/opencv/opencv_zoo]
+  overrides:
+    backend: insightface
+    parameters: {model: face_detection_yunet_2023mar.onnx}
+    options:
+      - "engine:onnx_direct"
+      - "detector_onnx:face_detection_yunet_2023mar.onnx"
+      - "recognizer_onnx:face_recognition_sface_2021dec.onnx"
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: face_detection_yunet_2023mar.onnx
+      sha256: 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
+    - filename: face_recognition_sface_2021dec.onnx
+      sha256: 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
+- &insightface_opencv_int8
+  name: "insightface-opencv-int8"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  license: apache-2.0
+  description: |
+    Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB).
+    Roughly 3x smaller and noticeably faster on CPU than the fp32 variant
+    at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe.
+    Weights are downloaded on install via LocalAI's gallery mechanism.
+  tags: [face-recognition, face-verification, face-embedding, commercial-ok, edge, cpu]
+  urls: [https://github.com/opencv/opencv_zoo]
+  overrides:
+    backend: insightface
+    parameters: {model: face_detection_yunet_2023mar_int8.onnx}
+    options:
+      - "engine:onnx_direct"
+      - "detector_onnx:face_detection_yunet_2023mar_int8.onnx"
+      - "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx"
+    known_usecases: [face_recognition, detection, embeddings]
+  files:
+    - filename: face_detection_yunet_2023mar_int8.onnx
+      sha256: 321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
+    - filename: face_recognition_sface_2021dec_int8.onnx
+      sha256: 2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a
+      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
 - &rfdetr
  name: "rfdetr-base"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
@@ -15189,11 +15618,13 @@
      model: wan2.1_t2v_1.3b-q8_0.gguf
  files:
    - filename: "wan2.1_t2v_1.3b-q8_0.gguf"
+      sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
      uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
+      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
 - name: wan-2.1-i2v-14b-480p-ggml
  license: apache-2.0
  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
@@ -15214,11 +15645,103 @@
      model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
    options:
      - "clip_vision_path:clip_vision_h.safetensors"
+      - "diffusion_model"
+      - "vae_decode_only:false"
+      - "sampler:euler"
+      - "flow_shift:3.0"
+      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
+      - "vae_path:wan_2.1_vae.safetensors"
  files:
    - filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
+      sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
      uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
+    - filename: "umt5-xxl-encoder-Q8_0.gguf"
+      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
+      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
+    - filename: "clip_vision_h.safetensors"
+      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
+- name: wan-2.1-flf2v-14b-720p-ggml
+  license: apache-2.0
+  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
+  description: |
+    Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
+    Takes a start and end reference image and interpolates a 33-frame clip
+    between them. Unlike the plain I2V variant, this model feeds the end
+    frame through clip_vision as well, so it conditions semantically (not
+    just in pixel-space) on both endpoints. That makes it the right choice
+    for seamless loops (start_image == end_image) and clean narrative cuts.
+    Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
+    text encoder, and clip_vision_h as I2V 14B.
+  urls:
+    - https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
+  tags:
+    - image-to-video
+    - first-last-frame-to-video
+    - wan
+    - video-generation
+    - cpu
+    - gpu
+  overrides:
+    parameters:
+      model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
+    options:
+      - "clip_vision_path:clip_vision_h.safetensors"
+      - "diffusion_model"
+      - "vae_decode_only:false"
+      - "sampler:euler"
+      - "flow_shift:3.0"
+      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
+      - "vae_path:wan_2.1_vae.safetensors"
+  files:
+    - filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
+      sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
+      uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
+    - filename: "wan_2.1_vae.safetensors"
+      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
+    - filename: "umt5-xxl-encoder-Q8_0.gguf"
+      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
+      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
+    - filename: "clip_vision_h.safetensors"
+      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
+- name: wan-2.1-i2v-14b-720p-ggml
+  license: apache-2.0
+  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
+  description: |
+    Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
+    Native 720p sibling of the 480p I2V model: animates a single
+    reference image into a 33-frame clip at up to 1280x720. Trained
+    purely as image-to-video (no first-last-frame interpolation path),
+    so motion is freer and better suited to single-anchor animation
+    than repurposing the FLF2V 720P variant for i2v. Shares the same
+    VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
+    and FLF2V 14B 720P entries.
+  urls:
+    - https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
+  tags:
+    - image-to-video
+    - wan
+    - video-generation
+    - cpu
+    - gpu
+  overrides:
+    parameters:
+      model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
+    options:
+      - "clip_vision_path:clip_vision_h.safetensors"
+      - "diffusion_model"
+      - "vae_decode_only:false"
+      - "sampler:euler"
+      - "flow_shift:3.0"
+      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
+      - "vae_path:wan_2.1_vae.safetensors"
+  files:
+    - filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
+      sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
+      uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
+    - filename: "wan_2.1_vae.safetensors"
+      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
    - filename: "clip_vision_h.safetensors"
--- a/go.mod
+++ b/go.mod
@@ -8,13 +8,13 @@ require (
 	github.com/Masterminds/sprig/v3 v3.3.0
 	github.com/alecthomas/kong v1.14.0
 	github.com/anthropics/anthropic-sdk-go v1.27.0
-	github.com/aws/aws-sdk-go-v2 v1.41.5
-	github.com/aws/aws-sdk-go-v2/config v1.32.14
-	github.com/aws/aws-sdk-go-v2/credentials v1.19.14
-	github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
+	github.com/aws/aws-sdk-go-v2 v1.41.6
+	github.com/aws/aws-sdk-go-v2/config v1.32.16
+	github.com/aws/aws-sdk-go-v2/credentials v1.19.15
+	github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
 	github.com/charmbracelet/glamour v1.0.0
-	github.com/containerd/containerd v1.7.30
-	github.com/coreos/go-oidc/v3 v3.17.0
+	github.com/containerd/containerd v1.7.31
+	github.com/coreos/go-oidc/v3 v3.18.0
 	github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
 	github.com/ebitengine/purego v0.10.0
 	github.com/emirpasic/gods/v2 v2.0.0-alpha
@@ -35,7 +35,7 @@ require (
 	github.com/lithammer/fuzzysearch v1.1.8
 	github.com/mholt/archiver/v3 v3.5.1
 	github.com/microcosm-cc/bluemonday v1.0.27
-	github.com/modelcontextprotocol/go-sdk v1.4.1
+	github.com/modelcontextprotocol/go-sdk v1.5.0
 	github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
 	github.com/mudler/edgevpn v0.31.1
 	github.com/mudler/go-processmanager v0.1.0
@@ -75,24 +75,23 @@ require (
 )

 require (
-	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
-	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
-	github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
-	github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
-	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
-	github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
-	github.com/aws/smithy-go v1.24.2 // indirect
+	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
+	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
+	github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
+	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
+	github.com/aws/smithy-go v1.25.0 // indirect
 	github.com/bahlo/generic-list-go v0.2.0 // indirect
 	github.com/buger/jsonparser v1.1.1 // indirect
-	github.com/go-jose/go-jose/v4 v4.1.3 // indirect
+	github.com/go-jose/go-jose/v4 v4.1.4 // indirect
 	github.com/jinzhu/inflection v1.0.0 // indirect
 	github.com/jinzhu/now v1.1.5 // indirect
 	github.com/mattn/go-sqlite3 v1.14.24 // indirect
--- a/go.sum
+++ b/go.sum
@@ -70,44 +70,42 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
 github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
 github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
 github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
-github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
-github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
-github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
-github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
-github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
-github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
-github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
-github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
-github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
-github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
-github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
-github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
-github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
-github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
-github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
-github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
-github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
-github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
-github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
-github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
-github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
-github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
-github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
-github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
-github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
-github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
-github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
-github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
-github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
-github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
+github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
+github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
+github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
+github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
+github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
+github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
+github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
+github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
+github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
+github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
+github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
+github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
+github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
+github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
+github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
+github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
+github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
+github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
+github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
+github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
+github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
+github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
+github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
+github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
+github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
+github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
+github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
+github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
 github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -198,8 +196,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
 github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
 github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
 github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
-github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
-github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
+github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
+github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
 github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
 github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
 github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
@@ -212,8 +210,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
 github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
 github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
 github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
-github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
-github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
+github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
+github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
 github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
 github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
 github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
@@ -336,8 +334,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
 github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
 github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
 github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
-github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
-github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
+github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
+github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
 github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
 github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
 github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
@@ -385,8 +383,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
 github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
 github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
 github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
-github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
-github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
+github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
+github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
 github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
 github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
 github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -691,8 +689,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
 github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
 github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
 github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
-github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
-github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
+github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
+github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
--- a/pkg/grpc/backend.go
+++ b/pkg/grpc/backend.go
@@ -54,6 +54,8 @@ type Backend interface {
 	TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
 	SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
 	Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
+	FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
+	FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error)
 	AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
 	AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
 	TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)
--- a/pkg/grpc/base/base.go
+++ b/pkg/grpc/base/base.go
@@ -81,6 +81,14 @@ func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
 	return pb.DetectResponse{}, fmt.Errorf("unimplemented")
 }

+func (llm *Base) FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
+	return pb.FaceVerifyResponse{}, fmt.Errorf("unimplemented")
+}
+
+func (llm *Base) FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
+	return pb.FaceAnalyzeResponse{}, fmt.Errorf("unimplemented")
+}
+
 func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
 	return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
 }