mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-24 00:26:34 -04:00
Compare commits
44 Commits
docs/wan-g
...
bump/turbo
| Author | SHA1 | Date | |
|---|---|---|---|
| | 5f7a0c3b26 | | |
| | bbeacf140d | | |
| | 6820ec468f | | |
| | 20baec77ab | | |
| | d16f19f1eb | | |
| | cd7b035716 | | |
| | 0f3bb2d647 | | |
| | 607efe5a4c | | |
| | 7d8c1d5e45 | | |
| | d18d434bb2 | | |
| | 39573ecd2a | | |
| | a7dbb2a83d | | |
| | 3ad9b16c29 | | |
| | c806d5ab73 | | |
| | 47efaf5b43 | | |
| | 315b634a91 | | |
| | 6b245299d7 | | |
| | 677c0315c1 | | |
| | 478522ce4d | | |
| | c54897ad44 | | |
| | 8bb1e8f21f | | |
| | cd94a0b61a | | |
| | 047bc48fa9 | | |
| | 01bd8ae5d0 | | |
| | d9808769be | | |
| | 5973c0a9df | | |
| | 486b5e25a3 | | |
| | c66c41e8d7 | | |
| | 02bb715c0a | | |
| | 8ab56e2ad3 | | |
| | ecf85fde9e | | |
| | 6480715a16 | | |
| | f683231811 | | |
| | 960757f0e8 | | |
| | 865fd552f5 | | |
| | cb77a5a4b9 | | |
| | 60633c4dd5 | | |
| | 9e44944cc1 | | |
| | 372eb08dcf | | |
| | 28091d626e | | |
| | cae79d9107 | | |
| | babbbc6ec8 | | |
| | 3804497186 | | |
| | fda1c553a1 | | |
101
.agents/ai-coding-assistants.md
Normal file
@@ -0,0 +1,101 @@
+# AI Coding Assistants
+
+This document provides guidance for AI tools and developers using AI
+assistance when contributing to LocalAI.
+
+**LocalAI follows the same guidelines as the Linux kernel project for
+AI-assisted contributions.** See the upstream policy here:
+<https://docs.kernel.org/process/coding-assistants.html>
+
+The rules below mirror that policy, adapted to LocalAI's license and
+project layout. If anything is unclear, the kernel document is the
+authoritative reference for intent.
+
+AI tools helping with LocalAI development should follow the standard
+project development process:
+
+- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
+conventions, and PR guidelines
+- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
+logging, and documentation conventions
+- [.agents/building-and-testing.md](building-and-testing.md) — build and
+test procedures
+
+## Licensing and Legal Requirements
+
+All contributions must comply with LocalAI's licensing requirements:
+
+- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
+file
+- New source files should use the SPDX license identifier `MIT` where
+applicable to the file type
+- Contributions must be compatible with the MIT License and must not
+introduce code under incompatible licenses (e.g., GPL) without an
+explicit discussion with maintainers
+
+## Signed-off-by and Developer Certificate of Origin
+
+**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
+certify the Developer Certificate of Origin (DCO). The human submitter
+is responsible for:
+
+- Reviewing all AI-generated code
+- Ensuring compliance with licensing requirements
+- Adding their own `Signed-off-by` tag (when the project requires DCO)
+to certify the contribution
+- Taking full responsibility for the contribution
+
+AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
+A human reviewer owns the contribution; the AI's involvement is recorded
+via `Assisted-by` (see below).
+
+## Attribution
+
+When AI tools contribute to LocalAI development, proper attribution helps
+track the evolving role of AI in the development process. Contributions
+should include an `Assisted-by` tag in the commit message trailer in the
+following format:
+
+```
+Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+```
+
+Where:
+
+- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
+`Copilot`, `Cursor`)
+- `MODEL_VERSION` — specific model version used (e.g.,
+`claude-opus-4-7`, `gpt-5`)
+- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
+agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
+
+Basic development tools (git, go, make, editors) should **not** be listed.
+
+### Example
+
+```
+fix(llama-cpp): handle empty tool call arguments
+
+Previously the parser panicked when the model returned a tool call with
+an empty arguments object. Fall back to an empty JSON object in that
+case so downstream consumers receive a valid payload.
+
+Assisted-by: Claude:claude-opus-4-7 golangci-lint
+Signed-off-by: Jane Developer <jane@example.com>
+```
+
+## Scope and Responsibility
+
+Using an AI assistant does not reduce the contributor's responsibility.
+The human submitter must:
+
+- Understand every line that lands in the PR
+- Verify that generated code compiles, passes tests, and follows the
+project style
+- Confirm that any referenced APIs, flags, or file paths actually exist
+in the current tree (AI models may hallucinate identifiers)
+- Not submit AI output verbatim without review
+
+Reviewers may ask for clarification on any change regardless of how it
+was produced. "An AI wrote it" is not an acceptable answer to a design
+question.
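
The `Assisted-by` trailer format defined in `.agents/ai-coding-assistants.md` is regular enough to check mechanically. The following is a minimal Go sketch, not part of this change set, of how a commit-lint step might parse one trailer line. The names `AssistedBy` and `ParseAssistedBy` are hypothetical, and the sketch assumes tools are whitespace-separated single tokens, so a two-word tool such as `go vet` would be split into two entries.

```go
package main

import (
	"fmt"
	"strings"
)

// AssistedBy holds the fields of one Assisted-by trailer.
// Hypothetical type for illustration; it does not exist in LocalAI.
type AssistedBy struct {
	Agent string   // AGENT_NAME, e.g. "Claude"
	Model string   // MODEL_VERSION, e.g. "claude-opus-4-7"
	Tools []string // optional analysis tools, e.g. "golangci-lint"
}

// ParseAssistedBy parses a single commit-message line of the form
// "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]".
func ParseAssistedBy(line string) (AssistedBy, error) {
	const prefix = "Assisted-by:"
	if !strings.HasPrefix(line, prefix) {
		return AssistedBy{}, fmt.Errorf("not an Assisted-by trailer: %q", line)
	}
	// Everything after the prefix is split on whitespace.
	fields := strings.Fields(strings.TrimPrefix(line, prefix))
	if len(fields) == 0 {
		return AssistedBy{}, fmt.Errorf("missing AGENT_NAME:MODEL_VERSION")
	}
	// The first token must contain a non-empty agent and model around ':'.
	agent, model, ok := strings.Cut(fields[0], ":")
	if !ok || agent == "" || model == "" {
		return AssistedBy{}, fmt.Errorf("want AGENT_NAME:MODEL_VERSION, got %q", fields[0])
	}
	return AssistedBy{Agent: agent, Model: model, Tools: fields[1:]}, nil
}

func main() {
	trailer, err := ParseAssistedBy("Assisted-by: Claude:claude-opus-4-7 golangci-lint")
	if err != nil {
		panic(err)
	}
	fmt.Println(trailer.Agent, trailer.Model, trailer.Tools)
}
```

A check like this only validates the shape of the trailer. It cannot tell whether a `Signed-off-by` line was added by a human, which remains the submitter's responsibility under the policy.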
@@ -2,6 +2,8 @@
 
 This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
 
+> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
+
 ## Architecture overview
 
 Authentication and authorization flow through three layers:
@@ -234,6 +236,66 @@ Use these HTTP status codes:
 
 If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
 
+## Advertising surfaces — where to register a new capability
+
+Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
+
+### 1. Swagger `@Tags` annotation (mandatory)
+
+Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
+
+```go
+// MyEndpoint does X.
+// @Summary Do X.
+// @Tags my-capability
+// @Param request body schema.MyRequest true "payload"
+// @Success 200 {object} schema.MyResponse "Response"
+// @Router /v1/my-endpoint [post]
+func MyEndpoint(...) echo.HandlerFunc { ... }
+```
+
+Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
+
+After adding endpoints, regenerate the embedded spec so the runtime serves it:
+
+```bash
+make protogen-go # ensures gRPC codegen is fresh first
+make swagger # regenerates swagger/swagger.json
+```
+
+### 2. `/api/instructions` registry (for new capability areas)
+
+`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
+
+**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
+
+Add an entry to `instructionDefs`:
+
+```go
+{
+Name: "my-capability", // URL segment at /api/instructions/my-capability
+Description: "Short sentence describing the capability",
+Tags: []string{"my-capability"}, // must match swagger @Tags
+Intro: "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
+},
+```
+
+Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
+
+### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
+
+If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
+
+```js
+export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
+```
+
+React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
+
+### 4. `docs/content/` (user-facing documentation)
+
+A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
+
 ## Path protection rules
 
 The global auth middleware classifies paths as API paths or non-API paths:
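
Step 2 of the advertising-surfaces section says that a new swagger tag needs a matching `instructionDefs` entry. As an illustration only, here is a Go sketch of a check that reports swagger tags not covered by any registry entry. The `instructionDef` struct is a local stand-in that copies the four field names from the example entry; the real type in `api_instructions.go` may differ. `uncoveredTags` is a hypothetical helper, not an existing LocalAI function, and the registry contents in `main` are invented sample data.

```go
package main

import (
	"fmt"
	"sort"
)

// instructionDef is a stand-in with the four fields shown in the guide's
// example entry. Assumption: the real definition may have a different shape.
type instructionDef struct {
	Name        string
	Description string
	Tags        []string
	Intro       string
}

// uncoveredTags returns, sorted and de-duplicated, every swagger tag that
// no definition lists in its Tags slice.
func uncoveredTags(defs []instructionDef, swaggerTags []string) []string {
	covered := make(map[string]bool)
	for _, d := range defs {
		for _, tag := range d.Tags {
			covered[tag] = true
		}
	}
	var missing []string
	for _, tag := range swaggerTags {
		if !covered[tag] {
			missing = append(missing, tag)
			covered[tag] = true // report each missing tag only once
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Sample registry; the entries are illustrative, not the real list.
	defs := []instructionDef{
		{Name: "audio", Description: "Audio endpoints", Tags: []string{"audio"}},
		{Name: "images", Description: "Image endpoints", Tags: []string{"images"}},
	}
	// "my-capability" is the placeholder tag used in the guide.
	fmt.Println(uncoveredTags(defs, []string{"audio", "images", "my-capability"}))
}
```

A check along these lines would complement the expected-length assertion mentioned in the guide: the count catches a forgotten test update, while a tag comparison would catch a handler whose `@Tags` value never reached the registry.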
@@ -248,12 +310,23 @@ If you add endpoints under a new top-level path prefix, add it to `isAPIPath()`
 
 When adding a new endpoint:
 
+**Routing & auth**
 - [ ] Handler in `core/http/endpoints/`
 - [ ] Route registered in appropriate `core/http/routes/` file
-- [ ] Auth level chosen: public / standard / admin / feature-gated
-- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
+- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
+- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
+- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
 - [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
 - [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
 - [ ] If token-counting: `usageMiddleware` added to middleware chain
-- [ ] Error responses use `schema.ErrorResponse` format
+
+**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
+- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
+- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
+- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
+- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
+
+**Quality**
+- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
 - [ ] Tests cover both authenticated and unauthenticated access
+- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation
55
.github/workflows/backend.yml
vendored
@@ -30,6 +30,7 @@ jobs:
 skip-drivers: ${{ matrix.skip-drivers }}
 context: ${{ matrix.context }}
 ubuntu-version: ${{ matrix.ubuntu-version }}
+amdgpu-targets: ${{ matrix.amdgpu-targets }}
 secrets:
 dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
 dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -710,6 +711,19 @@ jobs:
 dockerfile: "./backend/Dockerfile.python"
 context: "./"
 ubuntu-version: '2404'
+- build-type: 'cublas'
+cuda-major-version: "12"
+cuda-minor-version: "8"
+platforms: 'linux/amd64'
+tag-latest: 'auto'
+tag-suffix: '-gpu-nvidia-cuda-12-insightface'
+runs-on: 'ubuntu-latest'
+base-image: "ubuntu:24.04"
+skip-drivers: 'false'
+backend: "insightface"
+dockerfile: "./backend/Dockerfile.python"
+context: "./"
+ubuntu-version: '2404'
 - build-type: 'cublas'
 cuda-major-version: "12"
 cuda-minor-version: "8"
@@ -1623,19 +1637,6 @@ jobs:
 dockerfile: "./backend/Dockerfile.python"
 context: "./"
 ubuntu-version: '2404'
-- build-type: 'hipblas'
-cuda-major-version: ""
-cuda-minor-version: ""
-platforms: 'linux/amd64'
-tag-latest: 'auto'
-tag-suffix: '-gpu-rocm-hipblas-whisperx'
-runs-on: 'bigger-runner'
-base-image: "rocm/dev-ubuntu-24.04:7.2.1"
-skip-drivers: 'false'
-backend: "whisperx"
-dockerfile: "./backend/Dockerfile.python"
-context: "./"
-ubuntu-version: '2404'
 - build-type: 'hipblas'
 cuda-major-version: ""
 cuda-minor-version: ""
@@ -2596,6 +2597,20 @@ jobs:
 dockerfile: "./backend/Dockerfile.golang"
 context: "./"
 ubuntu-version: '2404'
+# kokoros (Rust TTS)
+- build-type: ''
+cuda-major-version: ""
+cuda-minor-version: ""
+platforms: 'linux/amd64'
+tag-latest: 'auto'
+tag-suffix: '-cpu-kokoros'
+runs-on: 'ubuntu-latest'
+base-image: "ubuntu:24.04"
+skip-drivers: 'false'
+backend: "kokoros"
+dockerfile: "./backend/Dockerfile.rust"
+context: "./"
+ubuntu-version: '2404'
 # local-store
 - build-type: ''
 cuda-major-version: ""
@@ -2624,6 +2639,20 @@ jobs:
 dockerfile: "./backend/Dockerfile.python"
 context: "./"
 ubuntu-version: '2404'
+# insightface (face recognition)
+- build-type: ''
+cuda-major-version: ""
+cuda-minor-version: ""
+platforms: 'linux/amd64,linux/arm64'
+tag-latest: 'auto'
+tag-suffix: '-cpu-insightface'
+runs-on: 'ubuntu-latest'
+base-image: "ubuntu:24.04"
+skip-drivers: 'false'
+backend: "insightface"
+dockerfile: "./backend/Dockerfile.python"
+context: "./"
+ubuntu-version: '2404'
 - build-type: 'intel'
 cuda-major-version: ""
 cuda-minor-version: ""
7
.github/workflows/backend_build.yml
vendored
@@ -58,6 +58,11 @@ on:
 required: false
 default: '2204'
 type: string
+amdgpu-targets:
+description: 'AMD GPU targets for ROCm/HIP builds'
+required: false
+default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
+type: string
 secrets:
 dockerUsername:
 required: false
@@ -214,6 +219,7 @@ jobs:
 BASE_IMAGE=${{ inputs.base-image }}
 BACKEND=${{ inputs.backend }}
 UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
 context: ${{ inputs.context }}
 file: ${{ inputs.dockerfile }}
 cache-from: type=gha
@@ -235,6 +241,7 @@ jobs:
 BASE_IMAGE=${{ inputs.base-image }}
 BACKEND=${{ inputs.backend }}
 UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
 context: ${{ inputs.context }}
 file: ${{ inputs.dockerfile }}
 cache-from: type=gha
27
.github/workflows/gallery-agent.yaml
vendored
@@ -54,24 +54,41 @@ jobs:
 REPO: ${{ github.repository }}
 SEARCH: 'gallery agent in:title'
 run: |
-# Walk open gallery-agent PRs and act on maintainer comments:
+# Walk gallery-agent PRs and act on maintainer comments:
 # /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
 # /gallery-agent recreate → close without label (next run may repropose)
 # Only comments from OWNER / MEMBER / COLLABORATOR are honored so
 # random users can't drive the bot.
+#
+# We scan both open PRs AND recently-closed PRs that don't already
+# carry the blacklist label. This covers the common flow where a
+# maintainer writes /gallery-agent blacklist and immediately clicks
+# Close — without this, the next scheduled run wouldn't see the
+# command (PR is already closed) and would repropose the model.
 gh label create gallery-agent/blacklisted \
 --repo "$REPO" --color ededed \
 --description "gallery-agent must not repropose this model" 2>/dev/null || true
 
-prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
+prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
+--json number --jq '.[].number')
+# Closed PRs from the last 14 days that don't yet have the blacklist label.
+# Bounded window keeps the scan cheap while covering late-applied commands.
+since=$(date -u -d '14 days ago' +%Y-%m-%d)
+prs_closed=$(gh pr list --repo "$REPO" --state closed \
+--search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
+--json number --jq '.[].number')
+prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
 for pr in $prs; do
+state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
 cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
 --jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
 if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
-echo "PR #$pr: blacklist command found"
+echo "PR #$pr: blacklist command found (state=$state)"
 gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
-gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
-elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
+if [ "$state" = "OPEN" ]; then
+gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
+fi
+elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
 echo "PR #$pr: recreate command found"
 gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
 fi
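
The per-PR decision made by the updated `gallery-agent.yaml` step can be summarized in a few lines. The Go sketch below is an illustration of that shell logic, not code from the repository: `decide` and `hasCommand` are hypothetical names, the action strings `label` and `close` are invented for the example, and the hunk is read as adding the `state` checks (inferred from the hunk header's line counts). The regular expressions are copied from the `grep -qE` patterns, and matching is done line by line to mirror how `grep` treats multi-line input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns copied from the workflow's grep -qE calls.
var (
	blacklistRe = regexp.MustCompile(`(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)`)
	recreateRe  = regexp.MustCompile(`(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)`)
)

// hasCommand reports whether any single line of comments matches re,
// which is how grep evaluates a multi-line input.
func hasCommand(re *regexp.Regexp, comments string) bool {
	for _, line := range strings.Split(comments, "\n") {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

// decide returns the actions for one PR, given its state ("OPEN" or
// "CLOSED") and the bodies of maintainer comments joined by newlines.
func decide(state, comments string) []string {
	switch {
	case hasCommand(blacklistRe, comments):
		// Blacklist wins over recreate and always labels the PR;
		// it closes the PR only if it is still open.
		actions := []string{"label"}
		if state == "OPEN" {
			actions = append(actions, "close")
		}
		return actions
	case state == "OPEN" && hasCommand(recreateRe, comments):
		// Recreate only makes sense for PRs that are still open.
		return []string{"close"}
	}
	return nil
}

func main() {
	fmt.Println(decide("CLOSED", "not for us\n/gallery-agent blacklist"))
	fmt.Println(decide("OPEN", "/gallery-agent recreate"))
	fmt.Println(decide("CLOSED", "/gallery-agent recreate"))
}
```

Read this way, the change makes the blacklist command effective on PRs that a maintainer has already closed by hand, while a `recreate` comment on a closed PR is ignored because there is nothing left to close.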
27
.github/workflows/test-extra.yml
vendored
@@ -38,6 +38,7 @@ jobs:
 qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
 voxtral: ${{ steps.detect.outputs.voxtral }}
 kokoros: ${{ steps.detect.outputs.kokoros }}
+insightface: ${{ steps.detect.outputs.insightface }}
 steps:
 - name: Checkout repository
 uses: actions/checkout@v6
@@ -751,3 +752,29 @@ jobs:
 - name: Test kokoros
 run: |
 make -C backend/rust/kokoros test
+tests-insightface-grpc:
+needs: detect-changes
+if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
+runs-on: ubuntu-latest
+timeout-minutes: 90
+steps:
+- name: Clone
+uses: actions/checkout@v6
+with:
+submodules: true
+- name: Dependencies
+run: |
+sudo apt-get update
+sudo apt-get install -y --no-install-recommends \
+make build-essential curl unzip ca-certificates git tar
+- name: Setup Go
+uses: actions/setup-go@v5
+with:
+go-version: '1.26.0'
+- name: Free disk space
+run: |
+sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+df -h
+- name: Build insightface backend image and run both model configurations
+run: |
+make test-extra-backend-insightface-all
15
AGENTS.md
@@ -1,11 +1,23 @@
 # LocalAI Agent Instructions
 
-This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
+This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
+
+Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
+
+## Policy for AI-Assisted Contributions
+
+LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
+
+- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
+- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
+- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
+- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
 
 ## Topics
 
 | File | When to read |
 |------|-------------|
+| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
 | [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
 | [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
 | [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
@@ -22,5 +34,6 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
 - **Go style**: Prefer `any` over `interface{}`
 - **Comments**: Explain *why*, not *what*
 - **Docs**: Update `docs/content/` when adding features or changing config
+- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
 - **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
 - **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
@@ -13,6 +13,7 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
 - [Development Workflow](#development-workflow)
 - [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
 - [Coding Guidelines](#coding-guidelines)
+- [AI Coding Assistants](#ai-coding-assistants)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Community and Communication](#community-and-communication)
@@ -185,7 +186,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to
 
 This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
 
-For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
+For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
 
 ### General Principles
 
@@ -211,6 +212,26 @@ For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific gui
 - Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
 - Be responsive to review feedback and keep discussions constructive.
 
+## AI Coding Assistants
+
+LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
+
+The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
+
+- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
+- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
+- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
+
+```
+Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+```
+
+Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
+
+Basic development tools (git, go, make, editors) should not be listed.
+- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
+- Contributions must remain compatible with LocalAI's **MIT License**.
+
 ## Testing
 
 All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
116
Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
 
 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -434,6 +434,7 @@ prepare-test-extra: protogen-python
 $(MAKE) -C backend/python/ace-step
 $(MAKE) -C backend/python/trl
 $(MAKE) -C backend/python/tinygrad
+$(MAKE) -C backend/python/insightface
 $(MAKE) -C backend/rust/kokoros kokoros-grpc
 
 test-extra: prepare-test-extra
@@ -457,6 +458,7 @@ test-extra: prepare-test-extra
 $(MAKE) -C backend/python/ace-step test
 $(MAKE) -C backend/python/trl test
 $(MAKE) -C backend/python/tinygrad test
+$(MAKE) -C backend/python/insightface test
 $(MAKE) -C backend/rust/kokoros test
 
 ##
@@ -507,6 +509,13 @@ test-extra-backend: protogen-go
 BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
 BACKEND_TEST_CACHE_TYPE_K="$$BACKEND_TEST_CACHE_TYPE_K" \
 BACKEND_TEST_CACHE_TYPE_V="$$BACKEND_TEST_CACHE_TYPE_V" \
+BACKEND_TEST_FACE_IMAGE_1_URL="$$BACKEND_TEST_FACE_IMAGE_1_URL" \
+BACKEND_TEST_FACE_IMAGE_1_FILE="$$BACKEND_TEST_FACE_IMAGE_1_FILE" \
+BACKEND_TEST_FACE_IMAGE_2_URL="$$BACKEND_TEST_FACE_IMAGE_2_URL" \
+BACKEND_TEST_FACE_IMAGE_2_FILE="$$BACKEND_TEST_FACE_IMAGE_2_FILE" \
+BACKEND_TEST_FACE_IMAGE_3_URL="$$BACKEND_TEST_FACE_IMAGE_3_URL" \
+BACKEND_TEST_FACE_IMAGE_3_FILE="$$BACKEND_TEST_FACE_IMAGE_3_FILE" \
+BACKEND_TEST_VERIFY_DISTANCE_CEILING="$$BACKEND_TEST_VERIFY_DISTANCE_CEILING" \
 go test -v -timeout 30m ./tests/e2e-backends/...
 
 ## Convenience wrappers: build the image, then exercise it.
@@ -603,6 +612,107 @@ test-extra-backend-tinygrad-all: \
 test-extra-backend-tinygrad-sd \
 test-extra-backend-tinygrad-whisper
 
+## insightface — face recognition.
+##
+## Face fixtures default to the sample images shipped in the
+## deepinsight/insightface repository (MIT-licensed). For offline/local
+## runs override with BACKEND_TEST_FACE_IMAGE_{1,2,3}_FILE pointing at
+## local paths.
+FACE_IMAGE_1_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_2_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
+FACE_IMAGE_3_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/mask_white.jpg
+
+## Host-side cache for the OpenCV Zoo face ONNX files used by the
+## opencv e2e target. The backend image no longer bakes model weights —
+## gallery installs bring them via `files:` — but the e2e suite drives
+## LoadModel over gRPC directly without going through the gallery. We
+## pre-download the ONNX files to a stable host path and pass absolute
+## paths in BACKEND_TEST_OPTIONS; `make` skips the downloads when the
+## SHA-256 already matches.
+INSIGHTFACE_OPENCV_DIR := /tmp/localai-insightface-opencv-cache
+INSIGHTFACE_OPENCV_YUNET_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
+INSIGHTFACE_OPENCV_SFACE_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
+INSIGHTFACE_OPENCV_YUNET_SHA := 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
+INSIGHTFACE_OPENCV_SFACE_SHA := 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
+
+## buffalo_sc (insightface) — pack zip + SHA-256 mirrors the gallery
+## entry so the e2e target matches exactly what `local-ai models install
+## insightface-buffalo-sc` would have fetched. Smallest insightface pack
+## (~16MB) — keeps CI fast while still covering the insightface engine
+## code path end-to-end.
+INSIGHTFACE_BUFFALO_SC_DIR := /tmp/localai-insightface-buffalo-sc-cache
+INSIGHTFACE_BUFFALO_SC_URL := https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
+INSIGHTFACE_BUFFALO_SC_SHA := 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
+
+.PHONY: insightface-opencv-models
+insightface-opencv-models:
+@mkdir -p $(INSIGHTFACE_OPENCV_DIR)
+@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_YUNET_SHA)" ]; then \
echo "Fetching YuNet..."; \
|
||||
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx $(INSIGHTFACE_OPENCV_YUNET_URL); \
|
||||
echo "$(INSIGHTFACE_OPENCV_YUNET_SHA) $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx" | sha256sum -c; \
|
||||
fi
|
||||
@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/sface.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_SFACE_SHA)" ]; then \
|
||||
echo "Fetching SFace..."; \
|
||||
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/sface.onnx $(INSIGHTFACE_OPENCV_SFACE_URL); \
|
||||
echo "$(INSIGHTFACE_OPENCV_SFACE_SHA) $(INSIGHTFACE_OPENCV_DIR)/sface.onnx" | sha256sum -c; \
|
||||
fi
|
||||
|
||||
.PHONY: insightface-buffalo-sc-models
|
||||
insightface-buffalo-sc-models:
|
||||
@mkdir -p $(INSIGHTFACE_BUFFALO_SC_DIR)
|
||||
@if [ "$$(sha256sum $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_BUFFALO_SC_SHA)" ]; then \
|
||||
echo "Fetching buffalo_sc..."; \
|
||||
curl -fsSL -o $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip $(INSIGHTFACE_BUFFALO_SC_URL); \
|
||||
echo "$(INSIGHTFACE_BUFFALO_SC_SHA) $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip" | sha256sum -c; \
|
||||
rm -f $(INSIGHTFACE_BUFFALO_SC_DIR)/*.onnx; \
|
||||
fi
|
||||
@if [ ! -f "$(INSIGHTFACE_BUFFALO_SC_DIR)/det_500m.onnx" ]; then \
|
||||
echo "Extracting buffalo_sc..."; \
|
||||
unzip -o -q $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip -d $(INSIGHTFACE_BUFFALO_SC_DIR); \
|
||||
fi
|
||||
|
||||
## buffalo_sc — smallest insightface pack (SCRFD-500MF detector + MBF
|
||||
## recognizer, ~16MB). Exercises the insightface engine code path
|
||||
## (model_zoo-backed inference) without the ~326MB buffalo_l download.
|
||||
## No age/gender/landmark heads — face_analyze is dropped from caps.
|
||||
## The pack is pre-fetched on the host and passed as `root:<dir>` since
|
||||
## the e2e suite drives LoadModel directly without going through
|
||||
## LocalAI's gallery flow (which is what would normally populate
|
||||
## ModelPath and in turn the engine's `_model_dir` option).
|
||||
test-extra-backend-insightface-buffalo-sc: docker-build-insightface insightface-buffalo-sc-models
|
||||
BACKEND_IMAGE=local-ai-backend:insightface \
|
||||
BACKEND_TEST_MODEL_NAME=insightface-buffalo-sc \
|
||||
BACKEND_TEST_OPTIONS=engine:insightface,model_pack:buffalo_sc,root:$(INSIGHTFACE_BUFFALO_SC_DIR) \
|
||||
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
|
||||
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
|
||||
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
|
||||
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
|
||||
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
## OpenCV Zoo YuNet + SFace — Apache 2.0, commercial-safe. face_analyze
|
||||
## cap is dropped (SFace has no demographic head). The ONNX files are
|
||||
## pre-fetched on the host via the insightface-opencv-models target and
|
||||
## passed as absolute paths, since the e2e suite drives LoadModel
|
||||
## directly without going through LocalAI's gallery flow.
|
||||
test-extra-backend-insightface-opencv: docker-build-insightface insightface-opencv-models
|
||||
BACKEND_IMAGE=local-ai-backend:insightface \
|
||||
BACKEND_TEST_MODEL_NAME=insightface-opencv \
|
||||
BACKEND_TEST_OPTIONS=engine:onnx_direct,detector_onnx:$(INSIGHTFACE_OPENCV_DIR)/yunet.onnx,recognizer_onnx:$(INSIGHTFACE_OPENCV_DIR)/sface.onnx \
|
||||
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
|
||||
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
|
||||
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
|
||||
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
|
||||
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
## Aggregate — runs both face-recognition model configurations so CI
|
||||
## catches regressions across engines together.
|
||||
test-extra-backend-insightface-all: \
|
||||
test-extra-backend-insightface-buffalo-sc \
|
||||
test-extra-backend-insightface-opencv
|
||||
|
||||
## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
|
||||
## tool-call extraction via sglang's native qwen parser. CPU builds use
|
||||
## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
|
||||
@@ -748,6 +858,7 @@ BACKEND_OUTETTS = outetts|python|.|false|true
|
||||
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
|
||||
BACKEND_COQUI = coqui|python|.|false|true
|
||||
BACKEND_RFDETR = rfdetr|python|.|false|true
|
||||
BACKEND_INSIGHTFACE = insightface|python|.|false|true
|
||||
BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
|
||||
BACKEND_NEUTTS = neutts|python|.|false|true
|
||||
BACKEND_KOKORO = kokoro|python|.|false|true
|
||||
@@ -819,6 +930,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_INSIGHTFACE)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
|
||||
@@ -853,7 +965,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
|
||||
docker-save-%: backend-images
|
||||
docker save local-ai-backend:$* -o backend-images/$*.tar
|
||||
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface
|
||||
|
||||
########################################################
|
||||
### Mock Backend for E2E Tests
|
||||
|
||||
@@ -24,6 +24,8 @@ service Backend {
|
||||
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
|
||||
rpc Status(HealthMessage) returns (StatusResponse) {}
|
||||
rpc Detect(DetectOptions) returns (DetectResponse) {}
|
||||
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
|
||||
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
|
||||
|
||||
rpc StoresSet(StoresSetOptions) returns (Result) {}
|
||||
rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
|
||||
@@ -475,6 +477,57 @@ message DetectResponse {
|
||||
repeated Detection Detections = 1;
|
||||
}
|
||||
|
||||
// --- Face recognition messages ---
|
||||
|
||||
message FacialArea {
|
||||
float x = 1;
|
||||
float y = 2;
|
||||
float w = 3;
|
||||
float h = 4;
|
||||
}
|
||||
|
||||
message FaceVerifyRequest {
|
||||
string img1 = 1; // base64-encoded image
|
||||
string img2 = 2; // base64-encoded image
|
||||
float threshold = 3; // cosine-distance threshold; 0 = use backend default
|
||||
bool anti_spoofing = 4; // reserved for future MiniFASNet bolt-on
|
||||
}
|
||||
|
||||
message FaceVerifyResponse {
|
||||
bool verified = 1;
|
||||
float distance = 2; // 1 - cosine_similarity
|
||||
float threshold = 3;
|
||||
float confidence = 4; // 0-100
|
||||
string model = 5; // e.g. "buffalo_l"
|
||||
FacialArea img1_area = 6;
|
||||
FacialArea img2_area = 7;
|
||||
float processing_time_ms = 8;
|
||||
}
|
||||
|
||||
message FaceAnalyzeRequest {
|
||||
string img = 1; // base64-encoded image
|
||||
repeated string actions = 2; // subset of ["age","gender","emotion","race"]; empty = all-supported
|
||||
bool anti_spoofing = 3;
|
||||
}
|
||||
|
||||
message FaceAnalysis {
|
||||
FacialArea region = 1;
|
||||
float face_confidence = 2;
|
||||
float age = 3;
|
||||
string dominant_gender = 4; // "Man" | "Woman"
|
||||
map<string, float> gender = 5;
|
||||
string dominant_emotion = 6; // reserved; empty in MVP
|
||||
map<string, float> emotion = 7;
|
||||
string dominant_race = 8; // not populated
|
||||
map<string, float> race = 9;
|
||||
bool is_real = 10; // anti-spoofing result when enabled
|
||||
float antispoof_score = 11;
|
||||
}
|
||||
|
||||
message FaceAnalyzeResponse {
|
||||
repeated FaceAnalysis faces = 1;
|
||||
}
|
||||
|
||||
message ToolFormatMarkers {
|
||||
string format_type = 1; // "json_native", "tag_with_json", "tag_with_tagged"
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
|
||||
IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
|
||||
IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
|
||||
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -326,7 +326,7 @@ struct llama_client_slot
|
||||
char buffer[512];
|
||||
double t_token = t_prompt_processing / num_prompt_tokens_processed;
|
||||
double n_tokens_second = 1e3 / t_prompt_processing * num_prompt_tokens_processed;
|
||||
sprintf(buffer, "prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
|
||||
snprintf(buffer, sizeof(buffer), "prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
|
||||
t_prompt_processing, num_prompt_tokens_processed,
|
||||
t_token, n_tokens_second);
|
||||
LOG_INFO(buffer, {
|
||||
@@ -340,7 +340,7 @@ struct llama_client_slot
|
||||
|
||||
t_token = t_token_generation / n_decoded;
|
||||
n_tokens_second = 1e3 / t_token_generation * n_decoded;
|
||||
sprintf(buffer, "generation eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)",
|
||||
snprintf(buffer, sizeof(buffer), "generation eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)",
|
||||
t_token_generation, n_decoded,
|
||||
t_token, n_tokens_second);
|
||||
LOG_INFO(buffer, {
|
||||
@@ -352,7 +352,7 @@ struct llama_client_slot
|
||||
{"n_tokens_second", n_tokens_second},
|
||||
});
|
||||
|
||||
sprintf(buffer, " total time = %10.2f ms", t_prompt_processing + t_token_generation);
|
||||
snprintf(buffer, sizeof(buffer), " total time = %10.2f ms", t_prompt_processing + t_token_generation);
|
||||
LOG_INFO(buffer, {
|
||||
{"slot_id", id},
|
||||
{"task_id", task_id},
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
|
||||
LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
|
||||
LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
|
||||
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
|
||||
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
|
||||
# Pinned to the HEAD of rebase/upstream-sync-april-2026 on https://github.com/TheTom/llama-cpp-turboquant.
|
||||
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
|
||||
TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
|
||||
TURBOQUANT_VERSION?=7f320bb89f68096240a517783674cc17c66b7ad2
|
||||
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
|
||||
|
||||
# stablediffusion.cpp (ggml)
|
||||
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
|
||||
STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d
|
||||
STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78
|
||||
|
||||
CMAKE_ARGS+=-DGGML_MAX_NAME=128
|
||||
|
||||
|
||||
@@ -1106,6 +1106,11 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
|
||||
const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
|
||||
const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
|
||||
const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
|
||||
// Force MP4 container. Distributed LocalAI hands us a staging
|
||||
// path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
|
||||
// extension; relying on filename suffix makes ffmpeg bail with
|
||||
// "Unable to choose an output format".
|
||||
const_cast<char*>("-f"), const_cast<char*>("mp4"),
|
||||
const_cast<char*>(dst),
|
||||
nullptr
|
||||
};
|
||||
|
||||
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
|
||||
|
||||
# whisper.cpp version
|
||||
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
|
||||
WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
|
||||
WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
|
||||
SO_TARGET?=libgowhisper.so
|
||||
|
||||
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
|
||||
|
||||
@@ -168,6 +168,43 @@
|
||||
nvidia-cuda-13: "cuda13-rfdetr"
|
||||
nvidia-cuda-12: "cuda12-rfdetr"
|
||||
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-rfdetr"
|
||||
- &insightface
|
||||
name: "insightface"
|
||||
alias: "insightface"
|
||||
# Upstream insightface library is MIT. The pretrained model packs
|
||||
# (buffalo_l, buffalo_s, antelopev2) are released for NON-COMMERCIAL
|
||||
# research use only. The backend also supports OpenCV Zoo
|
||||
# YuNet + SFace (Apache 2.0) for commercial use. Pick the engine
|
||||
# via model-gallery entries (insightface-buffalo-l / insightface-opencv
|
||||
# / insightface-buffalo-s) or set `options` in your model YAML.
|
||||
license: "mixed"
|
||||
description: |
|
||||
Face recognition backend powered by `insightface` (ONNX Runtime).
|
||||
Provides face verification (/v1/face/verify), face analysis
|
||||
(/v1/face/analyze), face embedding (/v1/embeddings), face
|
||||
detection (/v1/detection), and 1:N identification
|
||||
(/v1/face/{register,identify,forget}).
|
||||
Ships two engines in a single image: one that drives the insightface
|
||||
model packs (buffalo_l/s/m/sc, antelopev2 — non-commercial research
|
||||
use only) and one that drives OpenCV Zoo's YuNet + SFace pair
|
||||
(Apache 2.0 — commercial-safe). Select via `options: ["engine:..."]`
|
||||
in your model YAML, or install one of the ready-made model-gallery
|
||||
entries under the `insightface-*` prefix.
|
||||
The backend image contains only code and Python deps; all model
|
||||
weights are managed by LocalAI's gallery download mechanism.
|
||||
urls:
|
||||
- https://github.com/deepinsight/insightface
|
||||
- https://github.com/opencv/opencv_zoo
|
||||
tags:
|
||||
- face-recognition
|
||||
- face-verification
|
||||
- face-embedding
|
||||
- gpu
|
||||
- cpu
|
||||
capabilities:
|
||||
default: "cpu-insightface"
|
||||
nvidia: "cuda12-insightface"
|
||||
nvidia-cuda-12: "cuda12-insightface"
|
||||
- &sam3cpp
|
||||
name: "sam3-cpp"
|
||||
alias: "sam3-cpp"
|
||||
@@ -587,7 +624,6 @@
|
||||
alias: "whisperx"
|
||||
capabilities:
|
||||
nvidia: "cuda12-whisperx"
|
||||
amd: "rocm-whisperx"
|
||||
metal: "metal-whisperx"
|
||||
default: "cpu-whisperx"
|
||||
nvidia-cuda-13: "cuda13-whisperx"
|
||||
@@ -2745,7 +2781,6 @@
|
||||
name: "whisperx-development"
|
||||
capabilities:
|
||||
nvidia: "cuda12-whisperx-development"
|
||||
amd: "rocm-whisperx-development"
|
||||
metal: "metal-whisperx-development"
|
||||
default: "cpu-whisperx-development"
|
||||
nvidia-cuda-13: "cuda13-whisperx-development"
|
||||
@@ -2771,16 +2806,6 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
|
||||
- !!merge <<: *whisperx
|
||||
name: "rocm-whisperx"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
|
||||
- !!merge <<: *whisperx
|
||||
name: "rocm-whisperx-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-rocm-hipblas-whisperx
|
||||
- !!merge <<: *whisperx
|
||||
name: "cuda13-whisperx"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
|
||||
@@ -3721,3 +3746,30 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization
|
||||
# insightface (face recognition) — development and concrete image entries
|
||||
- !!merge <<: *insightface
|
||||
name: "insightface-development"
|
||||
capabilities:
|
||||
default: "cpu-insightface-development"
|
||||
nvidia: "cuda12-insightface-development"
|
||||
nvidia-cuda-12: "cuda12-insightface-development"
|
||||
- !!merge <<: *insightface
|
||||
name: "cpu-insightface"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-cpu-insightface
|
||||
- !!merge <<: *insightface
|
||||
name: "cuda12-insightface"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-insightface"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-insightface
|
||||
- !!merge <<: *insightface
|
||||
name: "cpu-insightface-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-insightface"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-cpu-insightface
|
||||
- !!merge <<: *insightface
|
||||
name: "cuda12-insightface-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-insightface"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-insightface
|
||||
|
||||
13
backend/python/insightface/Makefile
Normal file
@@ -0,0 +1,13 @@
|
||||
.DEFAULT_GOAL := install
|
||||
|
||||
.PHONY: install
|
||||
install:
|
||||
bash install.sh
|
||||
|
||||
.PHONY: protogen-clean
|
||||
protogen-clean:
|
||||
$(RM) backend_pb2_grpc.py backend_pb2.py
|
||||
|
||||
.PHONY: clean
|
||||
clean: protogen-clean
|
||||
rm -rf venv __pycache__
|
||||
67
backend/python/insightface/README.md
Normal file
@@ -0,0 +1,67 @@
|
||||
# insightface backend (LocalAI)

Face recognition backend built on ONNX Runtime. Provides face
verification (1:1), face analysis (age/gender), face detection, face
embedding, and, via LocalAI's built-in vector store, 1:N
identification.

## Engines

This backend ships with **two** interchangeable engines selected via
`LoadModel.Options["engine"]`:

| engine | Implementation | Models | License |
|---|---|---|---|
| `insightface` (default) | `insightface.app.FaceAnalysis` | `buffalo_l`, `buffalo_s`, `antelopev2` | **Non-commercial research use only** |
| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |

Both engines implement the same `FaceEngine` protocol in `engines.py`,
so the gRPC servicer in `backend.py` doesn't need to know which one is
active.
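
To make the engine seam concrete, here is a minimal sketch of what such a protocol could look like. It is inferred only from the calls `backend.py` makes (`prepare`, `detect`, `embed`) and is not the actual contents of `engines.py`; `Detection`, `StubEngine`, and the `_ENGINES` registry are hypothetical names used for illustration, and the image is a plain dict rather than a real array.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass
class Detection:
    """Hypothetical detection record: a box plus a confidence score."""
    bbox: tuple  # (x1, y1, x2, y2) in pixels
    score: float


@runtime_checkable
class FaceEngine(Protocol):
    """Assumed engine surface, based on how backend.py uses an engine."""

    def prepare(self, options: dict) -> None: ...
    def detect(self, img: Any) -> list: ...
    def embed(self, img: Any) -> Optional[list]: ...


class StubEngine:
    """Hypothetical engine: reports one full-frame face and a fixed embedding."""

    def prepare(self, options: dict) -> None:
        # Options arrive as strings, so numeric values are converted here.
        self.det_thresh = float(options.get("det_thresh", "0.5"))

    def detect(self, img: Any) -> list:
        width, height = img["width"], img["height"]
        return [Detection(bbox=(0.0, 0.0, float(width), float(height)), score=1.0)]

    def embed(self, img: Any) -> Optional[list]:
        return [1.0, 0.0, 0.0]


# Hypothetical registry mapping the "engine" option to an implementation.
_ENGINES = {"stub": StubEngine}


def build_engine(name: str) -> FaceEngine:
    """Instantiate the engine registered under `name`."""
    try:
        return _ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown engine: {name!r}") from None
```

Because the servicer would only depend on the protocol, adding a third engine under this sketch means registering one more class; nothing in the gRPC layer has to change.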

## LoadModel options

Common:

| option | default | description |
|---|---|---|
| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
| `det_thresh` | `0.5` | detector confidence threshold |
| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
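
How `verify_threshold` is applied can be sketched as follows. The arithmetic mirrors `FaceVerify` in `backend.py`: cosine distance is `1 - similarity` on L2-normalized embeddings, a pair is verified when the distance is below the threshold, and confidence is clamped to 0-100. The helper names `l2_normalize` and `verify` are illustrative only, and the vectors are toy values rather than real face embeddings.

```python
import math

DEFAULT_VERIFY_THRESHOLD = 0.35  # same default as the table above


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def verify(e1, e2, threshold=0.0):
    """Return (verified, distance, confidence) for two unit-length embeddings.

    A threshold of 0 (or below) means "use the default", matching the
    per-request override convention of FaceVerifyRequest.threshold.
    """
    if threshold <= 0:
        threshold = DEFAULT_VERIFY_THRESHOLD
    similarity = sum(a * b for a, b in zip(e1, e2))
    distance = 1.0 - similarity
    verified = distance < threshold
    confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0))
    return verified, distance, confidence


same = l2_normalize([3.0, 4.0])
other = l2_normalize([0.0, 1.0])
print(verify(same, same))
print(verify(l2_normalize([1.0, 0.0]), other))
```

A distance exactly at the threshold counts as not verified, since the comparison is strict.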

`insightface` engine:

| option | default | description |
|---|---|---|
| `model_pack` | `buffalo_l` | which insightface pack to load |

`onnx_direct` engine:

| option | default | description |
|---|---|---|
| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
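
Options reach the backend as `key:value` strings. The sketch below shows one way to turn such a list into a dictionary, splitting on the first colon only (as `_parse_options` in `backend.py` does) so that paths and URLs keep their own colons. `parse_det_size` is a hypothetical helper; the `WxH` format is assumed from the `det_size` defaults listed in the table, and the paths are placeholders.

```python
def parse_options(raw):
    """Turn ["key:value", ...] into a dict, ignoring entries without a colon."""
    out = {}
    for entry in raw:
        if ":" not in entry:
            continue  # malformed entries are skipped rather than rejected
        key, value = entry.split(":", 1)  # only the first colon separates
        out[key.strip()] = value.strip()
    return out


def parse_det_size(value):
    """Parse an assumed "WIDTHxHEIGHT" string such as "640x640"."""
    width, sep, height = value.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    return int(width), int(height)


options = parse_options([
    "engine:onnx_direct",
    "detector_onnx:/models/yunet.onnx",    # placeholder path
    "recognizer_onnx:/models/sface.onnx",  # placeholder path
    "det_size:320x320",
])
print(options["engine"], parse_det_size(options["det_size"]))
```

All values stay strings after parsing, so each engine is responsible for converting numeric options such as `det_thresh` itself.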

## Adding a new model pack

1. If it's an insightface pack (auto-downloadable or manually extracted
   into `~/.insightface/models/<name>/`), just add a new gallery entry
   in `backend/index.yaml` with `options: ["engine:insightface",
   "model_pack:<name>"]`. No code change.
2. If it's an Apache-licensed ONNX pair, add a gallery entry with
   `options: ["engine:onnx_direct", "detector_onnx:...",
   "recognizer_onnx:..."]`. If the detector or recognizer has a
   different input-tensor shape than YuNet/SFace, you may need a new
   engine implementation in `engines.py`; the two-engine seam makes
   that a self-contained change.

## Running tests locally

```bash
make -C backend/python/insightface        # install deps + bake models
make -C backend/python/insightface test   # run test.py
```

The OpenCV Zoo tests skip gracefully when `/models/opencv/*.onnx` is
absent (e.g. on dev boxes where `install.sh` wasn't run).
|
||||
265
backend/python/insightface/backend.py
Normal file
@@ -0,0 +1,265 @@
|
||||
#!/usr/bin/env python3
|
||||
"""gRPC server for the insightface face recognition backend.
|
||||
|
||||
Implements Health / LoadModel / Status plus the face-specific methods:
|
||||
Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
|
||||
delegated to engines.py — this file is just the gRPC plumbing.
|
||||
"""
|
||||
import argparse
|
||||
import base64
|
||||
import os
|
||||
import signal
|
||||
import sys
|
||||
import time
|
||||
from concurrent import futures
|
||||
from io import BytesIO
|
||||
|
||||
import backend_pb2
|
||||
import backend_pb2_grpc
|
||||
import cv2
|
||||
import grpc
|
||||
import numpy as np
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
|
||||
from grpc_auth import get_auth_interceptors # noqa: E402
|
||||
|
||||
from engines import FaceEngine, build_engine # noqa: E402
|
||||
|
||||
_ONE_DAY = 60 * 60 * 24
|
||||
MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
|
||||
|
||||
# Default cosine-distance threshold for "same person" on buffalo_l
|
||||
# ArcFace R50. Clients can override per-request; clients using SFace
|
||||
# should pass threshold≈0.4 since the distance distribution is wider.
|
||||
DEFAULT_VERIFY_THRESHOLD = 0.35
|
||||
|
||||
|
||||
def _decode_image(src: str) -> np.ndarray | None:
|
||||
"""Decode a base64-encoded image into an OpenCV BGR numpy array."""
|
||||
if not src:
|
||||
return None
|
||||
try:
|
||||
data = base64.b64decode(src, validate=False)
|
||||
except Exception:
|
||||
return None
|
||||
arr = np.frombuffer(data, dtype=np.uint8)
|
||||
if arr.size == 0:
|
||||
return None
|
||||
img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
|
||||
return img
|
||||
|
||||
|
||||
def _parse_options(raw: list[str]) -> dict[str, str]:
|
||||
out: dict[str, str] = {}
|
||||
for entry in raw:
|
||||
if ":" not in entry:
|
||||
continue
|
||||
k, v = entry.split(":", 1)
|
||||
out[k.strip()] = v.strip()
|
||||
return out
|
||||
|
||||
|
||||
class BackendServicer(backend_pb2_grpc.BackendServicer):
|
||||
def __init__(self) -> None:
|
||||
self.engine: FaceEngine | None = None
|
||||
self.engine_name: str = ""
|
||||
self.model_name: str = ""
|
||||
self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
|
||||
|
||||
def Health(self, request, context):
|
||||
return backend_pb2.Reply(message=bytes("OK", "utf-8"))
|
||||
|
||||
def LoadModel(self, request, context):
|
||||
options = _parse_options(list(request.Options))
|
||||
# Surface LocalAI's models directory (ModelPath) so engines can
|
||||
# anchor relative paths — OnnxDirectEngine's detector_onnx /
|
||||
# recognizer_onnx point at gallery-managed files that LocalAI
|
||||
# dropped there, and InsightFaceEngine auto-downloads its packs
|
||||
# into that same directory alongside every other managed model.
|
||||
# Private key to avoid clashing with user-provided options.
|
||||
if request.ModelPath:
|
||||
options["_model_dir"] = request.ModelPath
|
||||
|
||||
engine_name = options.get("engine", "insightface")
|
||||
try:
|
||||
self.engine = build_engine(engine_name)
|
||||
self.engine.prepare(options)
|
||||
except Exception as err: # pragma: no cover - exercised via e2e
|
||||
return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
|
||||
|
||||
self.engine_name = engine_name
|
||||
self.model_name = request.Model or options.get("model_pack", "")
|
||||
if "verify_threshold" in options:
|
||||
try:
|
||||
self.verify_threshold = float(options["verify_threshold"])
|
||||
except ValueError:
|
||||
pass
|
||||
print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
|
||||
return backend_pb2.Result(success=True, message="Model loaded successfully")
|
||||
|
||||
def Status(self, request, context):
|
||||
state = (
|
||||
backend_pb2.StatusResponse.READY
|
||||
if self.engine is not None
|
||||
else backend_pb2.StatusResponse.UNINITIALIZED
|
||||
)
|
||||
return backend_pb2.StatusResponse(state=state)
|
||||
|
||||
def Embedding(self, request, context):
|
||||
if self.engine is None:
|
||||
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
|
||||
context.set_details("face model not loaded")
|
||||
return backend_pb2.EmbeddingResult()
|
||||
if not request.Images:
|
||||
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
|
||||
context.set_details("Embedding requires Images[0] to be a base64 image")
|
||||
return backend_pb2.EmbeddingResult()
|
||||
|
||||
img = _decode_image(request.Images[0])
|
||||
if img is None:
|
||||
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
|
||||
context.set_details("failed to decode image")
|
||||
return backend_pb2.EmbeddingResult()
|
||||
|
||||
vec = self.engine.embed(img)
|
||||
if vec is None:
|
||||
context.set_code(grpc.StatusCode.NOT_FOUND)
|
||||
context.set_details("no face detected")
|
||||
return backend_pb2.EmbeddingResult()
|
||||
return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
|
||||
|
||||
def Detect(self, request, context):
|
||||
if self.engine is None:
|
||||
return backend_pb2.DetectResponse()
|
||||
img = _decode_image(request.src)
|
||||
if img is None:
|
||||
return backend_pb2.DetectResponse()
|
||||
detections = []
|
||||
for d in self.engine.detect(img):
|
||||
x1, y1, x2, y2 = d.bbox
|
||||
detections.append(
|
||||
backend_pb2.Detection(
|
||||
x=float(x1),
|
||||
y=float(y1),
|
||||
width=float(x2 - x1),
|
||||
height=float(y2 - y1),
|
||||
confidence=float(d.score),
|
||||
class_name="face",
|
||||
)
|
||||
)
|
||||
return backend_pb2.DetectResponse(Detections=detections)
|
||||
|
||||
def FaceVerify(self, request, context):
|
||||
if self.engine is None:
|
||||
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
|
||||
context.set_details("face model not loaded")
|
||||
return backend_pb2.FaceVerifyResponse()
|
||||
|
||||
img1 = _decode_image(request.img1)
|
||||
img2 = _decode_image(request.img2)
|
||||
if img1 is None or img2 is None:
|
||||
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
|
||||
context.set_details("failed to decode one or both images")
|
||||
return backend_pb2.FaceVerifyResponse()
|
||||
|
||||
threshold = request.threshold if request.threshold > 0 else self.verify_threshold
|
||||
|
||||
start = time.time()
|
||||
e1 = self.engine.embed(img1)
|
||||
e2 = self.engine.embed(img2)
|
||||
if e1 is None or e2 is None:
|
||||
context.set_code(grpc.StatusCode.NOT_FOUND)
|
||||
context.set_details("no face detected in one or both images")
|
||||
return backend_pb2.FaceVerifyResponse()
|
||||
|
||||
# Both engines return L2-normalized vectors, so the dot product
|
||||
# is the cosine similarity directly.
|
||||
sim = float(np.dot(e1, e2))
|
||||
distance = 1.0 - sim
|
||||
verified = distance < threshold
|
||||
confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
|
||||
|
||||
def _region(img) -> backend_pb2.FacialArea:
|
||||
dets = self.engine.detect(img)
|
||||
if not dets:
|
||||
return backend_pb2.FacialArea()
|
||||
best = max(dets, key=lambda d: d.score)
|
||||
x1, y1, x2, y2 = best.bbox
|
||||
return backend_pb2.FacialArea(x=x1, y=y1, w=x2 - x1, h=y2 - y1)
|
||||
|
||||
return backend_pb2.FaceVerifyResponse(
|
||||
verified=verified,
|
||||
distance=float(distance),
|
||||
threshold=float(threshold),
|
||||
confidence=float(confidence),
|
||||
model=self.model_name or self.engine_name,
|
||||
img1_area=_region(img1),
|
||||
img2_area=_region(img2),
|
||||
processing_time_ms=float((time.time() - start) * 1000.0),
|
||||
)
|
||||
|
||||
    def FaceAnalyze(self, request, context):
        if self.engine is None:
            context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
            context.set_details("face model not loaded")
            return backend_pb2.FaceAnalyzeResponse()
        img = _decode_image(request.img)
        if img is None:
            context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
            context.set_details("failed to decode image")
            return backend_pb2.FaceAnalyzeResponse()

        faces = []
        for attrs in self.engine.analyze(img):
            x, y, w, h = attrs.region
            fa = backend_pb2.FaceAnalysis(
                region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
                face_confidence=float(attrs.face_confidence),
            )
            if attrs.age is not None:
                fa.age = float(attrs.age)
            if attrs.dominant_gender:
                fa.dominant_gender = attrs.dominant_gender
            for k, v in attrs.gender.items():
                fa.gender[k] = float(v)
            faces.append(fa)
        return backend_pb2.FaceAnalyzeResponse(faces=faces)
|
||||
|
||||
|
||||
def serve(address: str) -> None:
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
        options=[
            ("grpc.max_message_length", 50 * 1024 * 1024),
            ("grpc.max_send_message_length", 50 * 1024 * 1024),
            ("grpc.max_receive_message_length", 50 * 1024 * 1024),
        ],
        interceptors=get_auth_interceptors(),
    )
    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
    server.add_insecure_port(address)
    server.start()
    print("[insightface] Server started. Listening on: " + address, file=sys.stderr)

    def _stop(sig, frame):  # pragma: no cover
        print("[insightface] shutting down")
        server.stop(0)
        sys.exit(0)

    signal.signal(signal.SIGINT, _stop)
    signal.signal(signal.SIGTERM, _stop)

    try:
        while True:
            time.sleep(_ONE_DAY)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
    parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
    args = parser.parse_args()
    print(f"[insightface] startup: {args}", file=sys.stderr)
    serve(args.addr)
|
||||
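The distance, verdict, and confidence arithmetic used by FaceVerify in backend.py above can be tried in isolation. The following is a minimal sketch, not the backend's code path: `verify_scores` is a hypothetical helper name, and the 0.35 threshold is an arbitrary illustration value (the backend's default `verify_threshold` is not shown in this diff). It assumes both inputs are already L2-normalized, as the comment in FaceVerify states.

```python
import numpy as np


def verify_scores(e1: np.ndarray, e2: np.ndarray, threshold: float) -> tuple[bool, float, float]:
    """Return (verified, distance, confidence) for two L2-normalized embeddings.

    Hypothetical helper mirroring the arithmetic in FaceVerify.
    """
    # For unit vectors the dot product equals the cosine similarity.
    sim = float(np.dot(e1, e2))
    distance = 1.0 - sim
    verified = distance < threshold
    # Confidence falls linearly from 100 at distance 0 to 0 at the threshold,
    # and is clamped to [0, 100] beyond that.
    if threshold > 0:
        confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0))
    else:
        confidence = 0.0
    return verified, distance, confidence


# Two toy unit vectors: identical, then orthogonal.
a = np.array([1.0, 0.0], dtype=np.float32)
b = np.array([0.0, 1.0], dtype=np.float32)
same = verify_scores(a, a, threshold=0.35)  # illustrative threshold only
diff = verify_scores(a, b, threshold=0.35)
```

Identical vectors give distance 0 and full confidence; orthogonal vectors give distance 1, which is past the threshold, so the confidence clamps to 0.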
382
backend/python/insightface/engines.py
Normal file
@@ -0,0 +1,382 @@
|
||||
"""Face recognition engine implementations for the LocalAI insightface backend.
|
||||
|
||||
Two engines are provided:

  * InsightFaceEngine: drives insightface's model_zoo loaders directly
                       (no FaceAnalysis wrapper). Supports the
                       buffalo_l / buffalo_s / buffalo_m / buffalo_sc /
                       antelopev2 model packs: SCRFD detector, ArcFace
                       recognizer and, where the pack ships one, a
                       genderage head. NON-COMMERCIAL research use
                       only (upstream license).

  * OnnxDirectEngine: loads detector + recognizer ONNX files directly
                      through OpenCV (cv2.FaceDetectorYN and
                      cv2.FaceRecognizerSF). Used for OpenCV Zoo models
                      (YuNet + SFace) and any future Apache-licensed
                      model set. analyze() returns detection regions
                      only, with no age or gender attributes.
|
||||
|
||||
Both engines expose the same interface so the gRPC servicer (backend.py)
|
||||
can dispatch without knowing which one is active.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Protocol
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
|
||||
@dataclass
|
||||
class FaceDetection:
|
||||
bbox: tuple[float, float, float, float] # x1, y1, x2, y2
|
||||
score: float
|
||||
landmarks: np.ndarray | None = None # 5x2 keypoints when available
|
||||
|
||||
|
||||
@dataclass
|
||||
class FaceAttributes:
|
||||
region: tuple[float, float, float, float] # x, y, w, h
|
||||
face_confidence: float
|
||||
age: float | None = None
|
||||
dominant_gender: str | None = None
|
||||
gender: dict[str, float] = field(default_factory=dict)
|
||||
|
||||
|
||||
class FaceEngine(Protocol):
|
||||
"""Minimal interface every engine must implement."""
|
||||
|
||||
def prepare(self, options: dict[str, str]) -> None: ...
|
||||
def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
|
||||
def embed(self, img: np.ndarray) -> np.ndarray | None: ...
|
||||
def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
|
||||
|
||||
|
||||
# ─── InsightFaceEngine ────────────────────────────────────────────────
|
||||
|
||||
class InsightFaceEngine:
|
||||
"""Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
|
||||
|
||||
FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
|
||||
in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
|
||||
build a `{taskname: model}` dict, then loop per-face at inference).
|
||||
We reimplement the same loop here so we can:
|
||||
|
||||
1. Load packs from whatever directory LocalAI's gallery extracted
|
||||
them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
|
||||
nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
|
||||
without needing a specific layout on disk.
|
||||
2. Skip insightface's built-in auto-download entirely: weight
|
||||
delivery is LocalAI's gallery `files:` job now, checksum-
|
||||
verified and cached alongside every other managed model.
|
||||
|
||||
The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
|
||||
Landmark) stay in insightface — we only reimplement the ~50 lines
|
||||
of glue around them.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.models: dict[str, Any] = {}
|
||||
self.det_model: Any = None
|
||||
self.model_pack: str = "buffalo_l"
|
||||
self.det_size: tuple[int, int] = (640, 640)
|
||||
self.det_thresh: float = 0.5
|
||||
self._providers: list[str] = ["CPUExecutionProvider"]
|
||||
|
||||
def prepare(self, options: dict[str, str]) -> None:
|
||||
import glob
|
||||
import os
|
||||
|
||||
from insightface.model_zoo import model_zoo
|
||||
|
||||
self.model_pack = options.get("model_pack", "buffalo_l")
|
||||
self.det_size = _parse_det_size(options.get("det_size", "640x640"))
|
||||
self.det_thresh = float(options.get("det_thresh", "0.5"))
|
||||
|
||||
pack_dir = _locate_insightface_pack(options, self.model_pack)
|
||||
if pack_dir is None:
|
||||
raise ValueError(
|
||||
f"no insightface pack '{self.model_pack}' found — install via "
|
||||
f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
|
||||
)
|
||||
|
||||
onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
|
||||
if not onnx_files:
|
||||
raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
|
||||
|
||||
# CUDAExecutionProvider is picked automatically by onnxruntime-gpu
|
||||
# when available; falling back to CPU keeps the CPU-only image
|
||||
# working. ctx_id=0 means "first GPU if any, else CPU".
|
||||
self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
|
||||
|
||||
self.models = {}
|
||||
for onnx_file in onnx_files:
|
||||
m = model_zoo.get_model(onnx_file, providers=self._providers)
|
||||
if m is None:
|
||||
continue
|
||||
# First occurrence of each taskname wins (matches FaceAnalysis).
|
||||
if m.taskname not in self.models:
|
||||
self.models[m.taskname] = m
|
||||
|
||||
if "detection" not in self.models:
|
||||
raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
|
||||
self.det_model = self.models["detection"]
|
||||
|
||||
self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
|
||||
for name, m in self.models.items():
|
||||
if name != "detection":
|
||||
m.prepare(0)
|
||||
|
||||
def _faces(self, img: np.ndarray) -> list[Any]:
|
||||
"""Run detection + all non-detection models per face."""
|
||||
if self.det_model is None:
|
||||
return []
|
||||
from insightface.app.common import Face
|
||||
|
||||
bboxes, kpss = self.det_model.detect(img, max_num=0)
|
||||
if bboxes is None or bboxes.shape[0] == 0:
|
||||
return []
|
||||
faces: list[Any] = []
|
||||
for i in range(bboxes.shape[0]):
|
||||
bbox = bboxes[i, 0:4]
|
||||
det_score = bboxes[i, 4]
|
||||
kps = kpss[i] if kpss is not None else None
|
||||
face = Face(bbox=bbox, kps=kps, det_score=det_score)
|
||||
for name, m in self.models.items():
|
||||
if name == "detection":
|
||||
continue
|
||||
m.get(img, face)
|
||||
faces.append(face)
|
||||
return faces
|
||||
|
||||
def detect(self, img: np.ndarray) -> list[FaceDetection]:
|
||||
return [
|
||||
FaceDetection(
|
||||
bbox=tuple(float(v) for v in f.bbox),
|
||||
score=float(f.det_score),
|
||||
landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
|
||||
)
|
||||
for f in self._faces(img)
|
||||
]
|
||||
|
||||
def embed(self, img: np.ndarray) -> np.ndarray | None:
|
||||
faces = self._faces(img)
|
||||
if not faces:
|
||||
return None
|
||||
best = max(faces, key=lambda f: float(f.det_score))
|
||||
if getattr(best, "normed_embedding", None) is None:
|
||||
return None
|
||||
return np.asarray(best.normed_embedding, dtype=np.float32)
|
||||
|
||||
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
|
||||
out: list[FaceAttributes] = []
|
||||
for f in self._faces(img):
|
||||
x1, y1, x2, y2 = (float(v) for v in f.bbox)
|
||||
region = (x1, y1, x2 - x1, y2 - y1)
|
||||
attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
|
||||
age = getattr(f, "age", None)
|
||||
if age is not None:
|
||||
attrs.age = float(age)
|
||||
gender = getattr(f, "gender", None)
|
||||
if gender is not None:
|
||||
# genderage head emits argmax, not probabilities —
|
||||
# one-hot dict keeps the API stable.
|
||||
attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
|
||||
attrs.gender = {
|
||||
"Man": 1.0 if int(gender) == 1 else 0.0,
|
||||
"Woman": 0.0 if int(gender) == 1 else 1.0,
|
||||
}
|
||||
out.append(attrs)
|
||||
return out
|
||||
|
||||
|
||||
# ─── OnnxDirectEngine ─────────────────────────────────────────────────
|
||||
|
||||
class OnnxDirectEngine:
|
||||
"""Loads detector + recognizer ONNX files directly.
|
||||
|
||||
Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
|
||||
exposes a C++-level API via cv2.FaceDetectorYN which accepts the
|
||||
ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
|
||||
Both are Apache 2.0 licensed.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.detector_path: str = ""
|
||||
self.recognizer_path: str = ""
|
||||
self.input_size: tuple[int, int] = (320, 320)
|
||||
self.det_thresh: float = 0.5
|
||||
self._detector: Any = None
|
||||
self._recognizer: Any = None
|
||||
|
||||
def prepare(self, options: dict[str, str]) -> None:
|
||||
raw_det = options.get("detector_onnx", "")
|
||||
raw_rec = options.get("recognizer_onnx", "")
|
||||
if not raw_det or not raw_rec:
|
||||
raise ValueError(
|
||||
"onnx_direct engine requires both detector_onnx and recognizer_onnx options"
|
||||
)
|
||||
model_dir = options.get("_model_dir")
|
||||
self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
|
||||
self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
|
||||
self.input_size = _parse_det_size(options.get("det_size", "320x320"))
|
||||
self.det_thresh = float(options.get("det_thresh", "0.5"))
|
||||
|
||||
# YuNet is a fixed-size detector; size is reset per detect() call to
|
||||
# match the input frame.
|
||||
self._detector = cv2.FaceDetectorYN.create(
|
||||
self.detector_path,
|
||||
"",
|
||||
self.input_size,
|
||||
score_threshold=self.det_thresh,
|
||||
nms_threshold=0.3,
|
||||
top_k=5000,
|
||||
)
|
||||
self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
|
||||
|
||||
def detect(self, img: np.ndarray) -> list[FaceDetection]:
|
||||
if self._detector is None:
|
||||
return []
|
||||
h, w = img.shape[:2]
|
||||
self._detector.setInputSize((w, h))
|
||||
retval, faces = self._detector.detect(img)
|
||||
if faces is None:
|
||||
return []
|
||||
out: list[FaceDetection] = []
|
||||
for row in faces:
|
||||
x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
|
||||
# Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
|
||||
landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
|
||||
score = float(row[-1])
|
||||
out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
|
||||
return out
|
||||
|
||||
def embed(self, img: np.ndarray) -> np.ndarray | None:
|
||||
if self._detector is None or self._recognizer is None:
|
||||
return None
|
||||
h, w = img.shape[:2]
|
||||
self._detector.setInputSize((w, h))
|
||||
retval, faces = self._detector.detect(img)
|
||||
if faces is None or len(faces) == 0:
|
||||
return None
|
||||
# Pick the highest-score face (last column is score).
|
||||
best = max(faces, key=lambda r: float(r[-1]))
|
||||
aligned = self._recognizer.alignCrop(img, best)
|
||||
feat = self._recognizer.feature(aligned)
|
||||
vec = np.asarray(feat, dtype=np.float32).flatten()
|
||||
# SFace outputs a 128-dim feature; L2-normalize it so the dot product is
# a cosine similarity, as with buffalo_l's already-normed 512-dim embedding.
|
||||
norm = float(np.linalg.norm(vec))
|
||||
if norm == 0:
|
||||
return None
|
||||
return vec / norm
|
||||
|
||||
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
|
||||
# OpenCV Zoo does not ship a demographic classifier; report
|
||||
# only the face-detection regions so callers can still see
|
||||
# how many faces were detected.
|
||||
return [
|
||||
FaceAttributes(
|
||||
region=(
|
||||
d.bbox[0],
|
||||
d.bbox[1],
|
||||
d.bbox[2] - d.bbox[0],
|
||||
d.bbox[3] - d.bbox[1],
|
||||
),
|
||||
face_confidence=d.score,
|
||||
)
|
||||
for d in self.detect(img)
|
||||
]
|
||||
|
||||
|
||||
# ─── helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
def _parse_det_size(raw: str) -> tuple[int, int]:
    raw = raw.strip().lower().replace(" ", "")
    if "x" in raw:
        w, h = raw.split("x", 1)
        return (int(w), int(h))
    n = int(raw)
    return (n, n)
|
||||
|
||||
|
||||
def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
    """Find the directory holding the insightface pack's ONNX files.

    LocalAI's gallery `files:` extracts the pack zip straight into the
    models directory. Upstream packs are inconsistent:

      buffalo_l/s/sc: flat zip, ONNX lands at `<models_dir>/*.onnx`
      buffalo_m, antelopev2: wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`

    We search, in order:
      1. `<models_dir>/<name>/`: wrapped-zip layout.
      2. `<models_dir>/models/<name>/`: where insightface's FaceAnalysis
         auto-download lands (handy for dev environments that still
         have old `~/.insightface` caches).
      3. `<models_dir>/`: flat-zip layout directly in the models dir.
      4. If the `root` option is set: `<root>/models/<name>/`,
         `<root>/<name>/`, then `<root>/` (with `~` expanded).

    Returns the first directory that directly contains `*.onnx` files,
    or None if no candidate matches.
    """
    import glob
    import os

    model_dir = options.get("_model_dir") or ""
    explicit_root = options.get("root")

    candidates: list[str] = []
    if model_dir:
        candidates.append(os.path.join(model_dir, name))
        candidates.append(os.path.join(model_dir, "models", name))
        candidates.append(model_dir)
    if explicit_root:
        expanded = os.path.expanduser(explicit_root)
        candidates.append(os.path.join(expanded, "models", name))
        candidates.append(os.path.join(expanded, name))
        candidates.append(expanded)

    for c in candidates:
        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
            return c
    return None
|
||||
|
||||
|
||||
def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
    """Resolve an ONNX file path across the paths LocalAI might deliver it from.

    Search order:
      1. The path itself if it already resolves (absolute, or relative to CWD).
      2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`):
         this is how LocalAI surfaces gallery-managed files. When the gallery
         entry lists `files:`, each one lands under the models directory and
         backends load them via filename anchored by ModelFile.
      3. `<script_dir>/<path-without-leading-slash>`: covers dev layouts
         where someone manually dropped weights inside the backend dir.

    If none hit, return the literal input so cv2/insightface surfaces a
    clearer error naming the actually-attempted path.
    """
    import os

    if os.path.isfile(path):
        return path
    stripped = path.lstrip("/")
    candidates: list[str] = []
    if model_dir:
        candidates.append(os.path.join(model_dir, os.path.basename(path)))
        candidates.append(os.path.join(model_dir, stripped))
    script_dir = os.path.dirname(os.path.abspath(__file__))
    candidates.append(os.path.join(script_dir, stripped))
    for c in candidates:
        if os.path.isfile(c):
            return c
    return path
|
||||
|
||||
|
||||
def build_engine(name: str) -> FaceEngine:
    """Factory for the engine selected by LoadModel options."""
    key = name.strip().lower()
    if key in ("", "insightface"):
        return InsightFaceEngine()
    if key in ("onnx_direct", "onnx-direct", "opencv"):
        return OnnxDirectEngine()
    raise ValueError(f"unknown engine: {name!r}")
|
||||
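The pack lookup described in `_locate_insightface_pack` above can be exercised against throwaway directories. This is a simplified sketch under stated assumptions: `locate_pack` is a hypothetical stand-in that covers only the three models-directory candidates (not the `root` option), and the ONNX file names are empty placeholder files, not real weights.

```python
from __future__ import annotations

import glob
import os
import tempfile


def locate_pack(model_dir: str, name: str) -> str | None:
    """Hypothetical, simplified lookup: models-directory candidates only."""
    candidates = [
        os.path.join(model_dir, name),            # wrapped zip: <models_dir>/<name>/
        os.path.join(model_dir, "models", name),  # FaceAnalysis-style cache layout
        model_dir,                                # flat zip: <models_dir>/*.onnx
    ]
    for c in candidates:
        # A candidate only counts if it directly contains at least one ONNX file.
        if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
            return c
    return None


def _touch(path: str) -> None:
    """Create an empty placeholder file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "wb").close()


with tempfile.TemporaryDirectory() as flat, tempfile.TemporaryDirectory() as nested:
    _touch(os.path.join(flat, "det_10g.onnx"))                          # flat layout
    _touch(os.path.join(nested, "antelopev2", "scrfd_10g_bnkps.onnx"))  # wrapped layout
    flat_hit = locate_pack(flat, "buffalo_l") == flat
    nested_hit = locate_pack(nested, "antelopev2") == os.path.join(nested, "antelopev2")
    missing = locate_pack(nested, "buffalo_l")
```

One consequence worth noting: in this sketch the flat-layout fallback matches whatever ONNX files sit directly in the models directory, regardless of the requested pack name, so two flat packs sharing one directory would not be told apart.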
28
backend/python/insightface/install.sh
Executable file
@@ -0,0 +1,28 @@
|
||||
#!/bin/bash
set -e

backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
    source $backend_dir/common/libbackend.sh
else
    source $backend_dir/../common/libbackend.sh
fi

installRequirements

# We deliberately do NOT pre-bake any model weights here. Two reasons:
#
#   1. Weights should follow LocalAI's gallery-managed download flow
#      like every other backend. For OpenCV Zoo (YuNet + SFace) the
#      gallery entries in gallery/index.yaml list the ONNX files via
#      `files:` with URI + SHA-256, and LocalAI fetches them into the
#      models directory on `local-ai models install`.
#
#   2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
#      buffalo_sc, antelopev2), upstream distributes zip archives
#      only (no individual ONNX URLs). These are also delivered via
#      the gallery `files:` flow and extracted into the models
#      directory; engines.py loads the ONNX files from there rather
#      than using insightface's FaceAnalysis auto-download.
#
# Net effect: the backend image ships only Python deps (~150MB CPU).
|
||||
7
backend/python/insightface/requirements-cpu.txt
Normal file
@@ -0,0 +1,7 @@
|
||||
insightface
onnxruntime
opencv-python-headless
numpy
onnx
cython
scikit-image
|
||||
7
backend/python/insightface/requirements-cublas12.txt
Normal file
@@ -0,0 +1,7 @@
|
||||
insightface
onnxruntime-gpu
opencv-python-headless
numpy
onnx
cython
scikit-image
|
||||
3
backend/python/insightface/requirements.txt
Normal file
@@ -0,0 +1,3 @@
|
||||
grpcio==1.71.0
protobuf
grpcio-tools
|
||||
9
backend/python/insightface/run.sh
Executable file
@@ -0,0 +1,9 @@
|
||||
#!/bin/bash
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
    source $backend_dir/common/libbackend.sh
else
    source $backend_dir/../common/libbackend.sh
fi

startBackend $@
|
||||
264
backend/python/insightface/smoke.py
Normal file
@@ -0,0 +1,264 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Smoke-test every face recognition model configuration shipped in the
|
||||
gallery. Simulates what LocalAI does at runtime: for each config, sets
|
||||
up a models directory, fetches any required files via URL (as the
|
||||
gallery's `files:` list would), then loads + detects + embeds via the
|
||||
in-process BackendServicer — matching the gRPC surface end users hit.
|
||||
|
||||
Run inside the built backend image (venv already has insightface /
|
||||
onnxruntime / opencv-python-headless):
|
||||
|
||||
python smoke.py
|
||||
|
||||
Network is required on first run to download the OpenCV Zoo ONNX
files. The insightface packs are not fetched by this script: engines.py
skips upstream's FaceAnalysis auto-download, so the packs must already
be present under the models directory (see LOCALAI_MODELS_PATH below).
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import base64
|
||||
import hashlib
|
||||
import os
|
||||
import sys
|
||||
import traceback
|
||||
import urllib.request
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
sys.path.insert(0, os.path.dirname(__file__))
|
||||
|
||||
import backend_pb2 # noqa: E402
|
||||
from backend import BackendServicer # noqa: E402
|
||||
|
||||
|
||||
# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
|
||||
# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
|
||||
OPENCV_FILES = {
|
||||
"fp32": [
|
||||
(
|
||||
"face_detection_yunet_2023mar.onnx",
|
||||
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
|
||||
"8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
|
||||
),
|
||||
(
|
||||
"face_recognition_sface_2021dec.onnx",
|
||||
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
|
||||
"0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
|
||||
),
|
||||
],
|
||||
"int8": [
|
||||
(
|
||||
"face_detection_yunet_2023mar_int8.onnx",
|
||||
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
|
||||
"321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
|
||||
),
|
||||
(
|
||||
"face_recognition_sface_2021dec_int8.onnx",
|
||||
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
|
||||
"2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
|
||||
),
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
CONFIGS = [
|
||||
{
|
||||
"name": "insightface-buffalo-l",
|
||||
"options": ["engine:insightface", "model_pack:buffalo_l"],
|
||||
"has_analyze": True,
|
||||
"needs_opencv_files": None,
|
||||
},
|
||||
{
|
||||
"name": "insightface-buffalo-sc",
|
||||
"options": ["engine:insightface", "model_pack:buffalo_sc"],
|
||||
# buffalo_sc ships only a detector and a recognizer: no landmarks, no genderage.
|
||||
"has_analyze": False,
|
||||
"needs_opencv_files": None,
|
||||
},
|
||||
{
|
||||
"name": "insightface-buffalo-s",
|
||||
"options": ["engine:insightface", "model_pack:buffalo_s"],
|
||||
"has_analyze": True,
|
||||
"needs_opencv_files": None,
|
||||
},
|
||||
{
|
||||
"name": "insightface-buffalo-m",
|
||||
"options": ["engine:insightface", "model_pack:buffalo_m"],
|
||||
"has_analyze": True,
|
||||
"needs_opencv_files": None,
|
||||
},
|
||||
{
|
||||
"name": "insightface-antelopev2",
|
||||
"options": ["engine:insightface", "model_pack:antelopev2"],
|
||||
"has_analyze": True,
|
||||
"needs_opencv_files": None,
|
||||
},
|
||||
{
|
||||
"name": "insightface-opencv",
|
||||
"options": [
|
||||
"engine:onnx_direct",
|
||||
"detector_onnx:face_detection_yunet_2023mar.onnx",
|
||||
"recognizer_onnx:face_recognition_sface_2021dec.onnx",
|
||||
],
|
||||
"has_analyze": False,
|
||||
"needs_opencv_files": "fp32",
|
||||
},
|
||||
{
|
||||
"name": "insightface-opencv-int8",
|
||||
"options": [
|
||||
"engine:onnx_direct",
|
||||
"detector_onnx:face_detection_yunet_2023mar_int8.onnx",
|
||||
"recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
|
||||
],
|
||||
"has_analyze": False,
|
||||
"needs_opencv_files": "int8",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
class _FakeContext:
|
||||
def __init__(self) -> None:
|
||||
self.code = None
|
||||
self.details = None
|
||||
|
||||
def set_code(self, code):
|
||||
self.code = code
|
||||
|
||||
def set_details(self, details):
|
||||
self.details = details
|
||||
|
||||
|
||||
def _encode_image(img: np.ndarray) -> str:
|
||||
_, buf = cv2.imencode(".jpg", img)
|
||||
return base64.b64encode(buf.tobytes()).decode("ascii")
|
||||
|
||||
|
||||
def _load_sample_image() -> str:
|
||||
from insightface.data import get_image as ins_get_image
|
||||
|
||||
return _encode_image(ins_get_image("t1"))
|
||||
|
||||
|
||||
def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
    def _digest(path: str) -> str:
        # Use a context manager so the file handle is closed promptly.
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()

    dest = os.path.join(model_dir, filename)
    # Skip the download when a cached copy already matches the checksum.
    if os.path.isfile(dest) and _digest(dest) == sha256:
        return
    sys.stderr.write(f" fetching {filename} from {uri}\n")
    sys.stderr.flush()
    urllib.request.urlretrieve(uri, dest)
    h = _digest(dest)
    if h != sha256:
        raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
|
||||
|
||||
|
||||
def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
|
||||
# Mirror LocalAI's gallery flow: populate model_dir with the
|
||||
# gallery's listed files before calling LoadModel.
|
||||
if cfg["needs_opencv_files"]:
|
||||
for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
|
||||
_download_if_missing(model_dir, filename, uri, sha256)
|
||||
|
||||
svc = BackendServicer()
|
||||
ctx = _FakeContext()
|
||||
|
||||
load_res = svc.LoadModel(
|
||||
backend_pb2.ModelOptions(
|
||||
Model=cfg["name"],
|
||||
Options=cfg["options"],
|
||||
# ModelPath is what the Go loader sets to ml.ModelPath, i.e.
# LocalAI's models directory. The backend anchors relative
# paths and the insightface pack lookup here.
|
||||
ModelPath=model_dir,
|
||||
),
|
||||
ctx,
|
||||
)
|
||||
if not load_res.success:
|
||||
return False, f"LoadModel: {load_res.message}"
|
||||
|
||||
det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
|
||||
if len(det_res.Detections) == 0:
|
||||
return False, "Detect returned no faces"
|
||||
for d in det_res.Detections:
|
||||
if d.class_name != "face":
|
||||
return False, f"Detect returned class_name={d.class_name!r}"
|
||||
|
||||
emb_ctx = _FakeContext()
|
||||
emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
|
||||
if emb_ctx.code is not None:
|
||||
return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
|
||||
if len(emb_res.embeddings) == 0:
|
||||
return False, "Embedding returned empty vector"
|
||||
norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
|
||||
if not (0.8 <= norm_sq <= 1.2):
|
||||
return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
|
||||
|
||||
ver_ctx = _FakeContext()
|
||||
ver_res = svc.FaceVerify(
|
||||
backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
|
||||
)
|
||||
if ver_ctx.code is not None:
|
||||
return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
|
||||
if not ver_res.verified:
|
||||
return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
|
||||
if ver_res.distance > 0.1:
|
||||
return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
|
||||
|
||||
if cfg["has_analyze"]:
|
||||
an_ctx = _FakeContext()
|
||||
an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
|
||||
if an_ctx.code is not None:
|
||||
return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
|
||||
if len(an_res.faces) == 0:
|
||||
return False, "FaceAnalyze returned no faces"
|
||||
f0 = an_res.faces[0]
|
||||
if f0.age <= 0:
|
||||
return False, f"FaceAnalyze age not populated (age={f0.age})"
|
||||
if f0.dominant_gender not in ("Man", "Woman"):
|
||||
return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
|
||||
|
||||
n_dets = len(det_res.Detections)
|
||||
dim = len(emb_res.embeddings)
|
||||
return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
|
||||
|
||||
|
||||
def main() -> int:
|
||||
# Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
|
||||
# default to a fresh temp dir.
|
||||
model_dir = os.environ.get("LOCALAI_MODELS_PATH")
|
||||
if not model_dir:
|
||||
import tempfile
|
||||
|
||||
model_dir = tempfile.mkdtemp(prefix="face-smoke-")
|
||||
os.makedirs(model_dir, exist_ok=True)
|
||||
print(f"model_dir={model_dir}", file=sys.stderr)
|
||||
|
||||
print("Preparing sample image from insightface.data...", file=sys.stderr)
|
||||
img_b64 = _load_sample_image()
|
||||
|
||||
results: list[tuple[str, bool, str]] = []
|
||||
for cfg in CONFIGS:
|
||||
sys.stderr.write(f"\n=== {cfg['name']} ===\n")
|
||||
sys.stderr.flush()
|
||||
try:
|
||||
ok, detail = _run_one(cfg, img_b64, model_dir)
|
||||
except Exception:
|
||||
ok, detail = False, traceback.format_exc().splitlines()[-1]
|
||||
results.append((cfg["name"], ok, detail))
|
||||
print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s} {detail}")
|
||||
sys.stdout.flush()
|
||||
|
||||
print("\n=== summary ===")
|
||||
passed = sum(1 for _, ok, _ in results if ok)
|
||||
total = len(results)
|
||||
for name, ok, detail in results:
|
||||
mark = "✓" if ok else "✗"
|
||||
print(f" {mark} {name:30s} {detail}")
|
||||
print(f"\n{passed}/{total} passed")
|
||||
return 0 if passed == total else 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
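The cache check in `_download_if_missing` above reads each file fully into memory before hashing. For larger weights, a chunked digest is a common alternative. The sketch below is an illustration only: `sha256_file` and `is_cached` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_cached(path: str, expected_sha256: str) -> bool:
    """True only when the file exists and its digest matches."""
    return os.path.isfile(path) and sha256_file(path) == expected_sha256


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as fh:
        fh.write(b"abc")
    abc_digest = sha256_file(sample)
    cached = is_cached(sample, abc_digest)
    stale = is_cached(sample, "0" * 64)
    absent = is_cached(os.path.join(tmp, "missing.bin"), abc_digest)
```

A download helper could call `is_cached` before fetching and `sha256_file` again afterwards to reject a corrupted transfer, which is the same two-step check the smoke script performs.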
234
backend/python/insightface/test.py
Normal file
@@ -0,0 +1,234 @@
"""Unit tests for the insightface gRPC backend.

The servicer is instantiated in-process (no gRPC channel) and driven
directly. Images come from insightface.data which ships with the pip
package — no external downloads.

Tests are parametrized over both engines (InsightFaceEngine and
OnnxDirectEngine) where applicable.
"""
from __future__ import annotations

import base64
import os
import sys
import unittest

import cv2
import numpy as np

sys.path.insert(0, os.path.dirname(__file__))

import backend_pb2  # noqa: E402

from backend import BackendServicer  # noqa: E402

# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
# to mirror LocalAI's gallery `files:` flow (the backend image itself
# doesn't ship model weights).
OPENCV_FILES = [
    (
        "face_detection_yunet_2023mar.onnx",
        "https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
        "8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
    ),
    (
        "face_recognition_sface_2021dec.onnx",
        "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
        "0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
    ),
]


def _encode(img: np.ndarray) -> str:
    _, buf = cv2.imencode(".jpg", img)
    return base64.b64encode(buf.tobytes()).decode("ascii")


def _load_insightface_samples() -> dict[str, str]:
    """Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.

    t1 is a group photo, t2 a different one. We reuse both as
    stand-ins for "Alice photo 1/2" and "Bob".
    """
    from insightface.data import get_image as ins_get_image

    return {
        "t1": _encode(ins_get_image("t1")),
        "t2": _encode(ins_get_image("t2")),
    }


class _FakeContext:
    """Minimal stand-in for grpc.ServicerContext."""

    def __init__(self) -> None:
        self.code = None
        self.details = None

    def set_code(self, code):
        self.code = code

    def set_details(self, details):
        self.details = details


class _Harness:
    def __init__(self, servicer: BackendServicer) -> None:
        self.svc = servicer

    def health(self):
        return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())

    def load(self, options: list[str], model_path: str = ""):
        return self.svc.LoadModel(
            backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
            _FakeContext(),
        )

    def detect(self, img_b64: str):
        return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())

    def embed(self, img_b64: str):
        ctx = _FakeContext()
        res = self.svc.Embedding(
            backend_pb2.PredictOptions(Images=[img_b64]),
            ctx,
        )
        return res, ctx

    def verify(self, a: str, b: str, threshold: float = 0.0):
        return self.svc.FaceVerify(
            backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
            _FakeContext(),
        )

    def analyze(self, img_b64: str):
        return self.svc.FaceAnalyze(
            backend_pb2.FaceAnalyzeRequest(img=img_b64),
            _FakeContext(),
        )


class InsightFaceEngineTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.samples = _load_insightface_samples()
        cls.harness = _Harness(BackendServicer())
        load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
        if not load.success:
            raise unittest.SkipTest(f"LoadModel failed: {load.message}")

    def test_health(self):
        self.assertEqual(self.harness.health().message, b"OK")

    def test_detect_finds_face(self):
        res = self.harness.detect(self.samples["t1"])
        self.assertGreater(len(res.Detections), 0)
        for d in res.Detections:
            self.assertEqual(d.class_name, "face")
            self.assertGreater(d.width, 0)
            self.assertGreater(d.height, 0)

    def test_embedding_is_l2_normed(self):
        res, ctx = self.harness.embed(self.samples["t1"])
        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
        self.assertEqual(len(res.embeddings), 512)
        norm_sq = sum(x * x for x in res.embeddings)
        self.assertAlmostEqual(norm_sq, 1.0, places=2)

    def test_verify_same_image(self):
        res = self.harness.verify(self.samples["t1"], self.samples["t1"])
        self.assertTrue(res.verified)
        self.assertLess(res.distance, 0.05)

    def test_verify_different_images(self):
        # t1 vs t2 depict different groups of people — top face on each
        # side is unlikely to match.
        res = self.harness.verify(self.samples["t1"], self.samples["t2"])
        # We assert only that some numerical answer came back; the
        # matches-or-not determination depends on which face each side
        # picked and isn't a stable test assertion.
        self.assertGreaterEqual(res.distance, 0.0)

    def test_analyze_has_age_and_gender(self):
        res = self.harness.analyze(self.samples["t1"])
        self.assertGreater(len(res.faces), 0)
        for face in res.faces:
            self.assertGreater(face.face_confidence, 0.0)
            # Age should be populated for buffalo_l.
            self.assertGreater(face.age, 0.0)
            self.assertIn(face.dominant_gender, ("Man", "Woman"))


def _prepare_opencv_models_dir() -> str | None:
    """Download OpenCV Zoo face ONNX files into a temp dir the way
    LocalAI's gallery would. Returns the directory, or None if
    downloads failed (network-restricted sandbox).
    """
    import hashlib
    import tempfile
    import urllib.request

    root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(
        prefix="opencv-face-"
    )
    for filename, uri, sha256 in OPENCV_FILES:
        dest = os.path.join(root, filename)
        if os.path.isfile(dest):
            if hashlib.sha256(open(dest, "rb").read()).hexdigest() == sha256:
                continue
        try:
            urllib.request.urlretrieve(uri, dest)
        except Exception:
            return None
        if hashlib.sha256(open(dest, "rb").read()).hexdigest() != sha256:
            return None
    return root


class OnnxDirectEngineTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.samples = _load_insightface_samples()
        cls.model_dir = _prepare_opencv_models_dir()
        if cls.model_dir is None:
            raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
        cls.harness = _Harness(BackendServicer())
        load = cls.harness.load(
            [
                "engine:onnx_direct",
                "detector_onnx:face_detection_yunet_2023mar.onnx",
                "recognizer_onnx:face_recognition_sface_2021dec.onnx",
            ],
            model_path=cls.model_dir,
        )
        if not load.success:
            raise unittest.SkipTest(f"LoadModel failed: {load.message}")

    def test_detect_finds_face(self):
        res = self.harness.detect(self.samples["t1"])
        self.assertGreater(len(res.Detections), 0)
        for d in res.Detections:
            self.assertEqual(d.class_name, "face")

    def test_embedding_nonempty(self):
        res, ctx = self.harness.embed(self.samples["t1"])
        self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
        self.assertGreater(len(res.embeddings), 0)

    def test_verify_same_image(self):
        res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
        self.assertTrue(res.verified)

    def test_analyze_returns_regions_without_demographics(self):
        # OnnxDirectEngine intentionally doesn't populate age/gender.
        res = self.harness.analyze(self.samples["t1"])
        self.assertGreater(len(res.faces), 0)
        for face in res.faces:
            self.assertEqual(face.dominant_gender, "")
            self.assertEqual(face.age, 0.0)


if __name__ == "__main__":
    unittest.main()
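The download helper in the test file above checks each model file against a pinned SHA-256 digest by reading the whole file into memory through an unclosed handle. As a minimal sketch only (not part of the commit), the same check can be written with chunked reads and a context manager. The names `sha256_of_file` and `CHUNK_SIZE` are hypothetical and introduced here for illustration.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    # The context manager closes the handle even if a read raises.
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A caller would compare the returned string with the expected digest from `OPENCV_FILES`, exactly as the existing helper does.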
11
backend/python/insightface/test.sh
Executable file
@@ -0,0 +1,11 @@
#!/bin/bash
set -e

backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
    source $backend_dir/common/libbackend.sh
else
    source $backend_dir/../common/libbackend.sh
fi

runUnittests
@@ -1,6 +0,0 @@
-# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
-# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
-# the rocm7.2.1 runtime via AMD's forward-compatibility window.
---extra-index-url https://download.pytorch.org/whl/rocm6.4
-torch==2.8.0+rocm6.4
-whisperx @ git+https://github.com/m-bain/whisperX.git
@@ -7,17 +7,28 @@ import (
 	"sync/atomic"
 	"time"

+	corebackend "github.com/mudler/LocalAI/core/backend"
 	"github.com/mudler/LocalAI/core/config"
 	mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
 	"github.com/mudler/LocalAI/core/services/agentpool"
+	"github.com/mudler/LocalAI/core/services/facerecognition"
 	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/templates"
+	pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
 	"github.com/mudler/LocalAI/pkg/model"
 	"github.com/mudler/xlog"
 	"gorm.io/gorm"
 )

+// faceEmbeddingDim is the expected dimension for face embeddings.
+// Set to 0 so the Registry accepts whatever dim the loaded recognizer
+// produces — ArcFace R50 is 512-d, MBF is 512-d, SFace is 128-d, and
+// the insightface backend can load any of them via LoadModel options.
+// Locking this to a specific value would force a single recognizer
+// family per deployment; we keep the door open instead.
+const faceEmbeddingDim = 0
+
 type Application struct {
 	backendLoader *config.ModelConfigLoader
 	modelLoader *model.ModelLoader
@@ -27,6 +38,7 @@ type Application struct {
 	galleryService *galleryop.GalleryService
 	agentJobService *agentpool.AgentJobService
 	agentPoolService atomic.Pointer[agentpool.AgentPoolService]
+	faceRegistry facerecognition.Registry
 	authDB *gorm.DB
 	watchdogMutex sync.Mutex
 	watchdogStop chan bool
@@ -50,12 +62,23 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
 		mcpTools.CloseMCPSessions(modelName)
 	})

-	return &Application{
+	app := &Application{
 		backendLoader:      config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
 		modelLoader:        ml,
 		applicationConfig:  appConfig,
 		templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
 	}
+
+	// Face-recognition registry backed by LocalAI's built-in vector store.
+	// The resolver closes over the ModelLoader so the Registry stays
+	// decoupled from loader plumbing; swapping in a postgres-backed
+	// implementation later is a single construction change here.
+	faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
+		return corebackend.StoreBackend(ml, appConfig, storeName, "")
+	}
+	app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
+
+	return app
 }

 func (a *Application) ModelConfigLoader() *config.ModelConfigLoader {
@@ -99,6 +122,14 @@ func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
 	return a.agentPoolService.Load()
 }

+// FaceRegistry returns the face-recognition registry used for 1:N
+// identification. The current implementation is backed by the
+// in-memory local-store backend; see core/services/facerecognition
+// for the interface and the postgres TODO.
+func (a *Application) FaceRegistry() facerecognition.Registry {
+	return a.faceRegistry
+}
+
 // AuthDB returns the auth database connection, or nil if auth is not enabled.
 func (a *Application) AuthDB() *gorm.DB {
 	return a.authDB
60
core/backend/face_analyze.go
Normal file
@@ -0,0 +1,60 @@
package backend

import (
	"context"
	"fmt"
	"time"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/trace"
	"github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/LocalAI/pkg/model"
)

func FaceAnalyze(
	img string,
	actions []string,
	antiSpoofing bool,
	loader *model.ModelLoader,
	appConfig *config.ApplicationConfig,
	modelConfig config.ModelConfig,
) (*proto.FaceAnalyzeResponse, error) {
	opts := ModelOptions(modelConfig, appConfig)
	faceModel, err := loader.Load(opts...)
	if err != nil {
		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
		return nil, err
	}
	if faceModel == nil {
		return nil, fmt.Errorf("could not load face recognition model")
	}

	var startTime time.Time
	if appConfig.EnableTracing {
		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
		startTime = time.Now()
	}

	res, err := faceModel.FaceAnalyze(context.Background(), &proto.FaceAnalyzeRequest{
		Img:          img,
		Actions:      actions,
		AntiSpoofing: antiSpoofing,
	})

	if appConfig.EnableTracing {
		errStr := ""
		if err != nil {
			errStr = err.Error()
		}
		trace.RecordBackendTrace(trace.BackendTrace{
			Timestamp: startTime,
			Duration:  time.Since(startTime),
			Type:      trace.BackendTraceFaceAnalyze,
			ModelName: modelConfig.Name,
			Backend:   modelConfig.Backend,
			Error:     errStr,
		})
	}

	return res, err
}
43
core/backend/face_embed.go
Normal file
@@ -0,0 +1,43 @@
package backend

import (
	"context"
	"fmt"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/pkg/model"
)

// FaceEmbed loads the face recognition backend and returns the face
// embedding for the base64-encoded image (512-d for ArcFace and MBF,
// 128-d for SFace). Unlike ModelEmbedding it passes the image through
// PredictOptions.Images; the insightface backend picks the
// highest-confidence face and returns its L2-normalized embedding.
func FaceEmbed(
	imgBase64 string,
	loader *model.ModelLoader,
	appConfig *config.ApplicationConfig,
	modelConfig config.ModelConfig,
) ([]float32, error) {
	opts := ModelOptions(modelConfig, appConfig)
	faceModel, err := loader.Load(opts...)
	if err != nil {
		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
		return nil, err
	}
	if faceModel == nil {
		return nil, fmt.Errorf("could not load face recognition model")
	}

	predictOpts := gRPCPredictOpts(modelConfig, loader.ModelPath)
	predictOpts.Images = []string{imgBase64}

	res, err := faceModel.Embeddings(context.Background(), predictOpts)
	if err != nil {
		return nil, err
	}
	if len(res.Embeddings) == 0 {
		return nil, fmt.Errorf("face embedding returned empty vector (no face detected?)")
	}
	return res.Embeddings, nil
}
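The FaceEmbed comment says the backend returns an L2-normalized embedding. A minimal sketch of what that property buys: for unit-length vectors, cosine distance reduces to one minus the dot product. Treating the backend's `distance` as a cosine distance is an assumption made here, not something this diff confirms, and `l2_normalize` and `cosine_distance` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale `vec` to unit Euclidean length; the zero vector is rejected."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - dot(a, b); valid as cosine distance only for unit vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return 1.0 - sum(x * y for x, y in zip(a, b))
```

Under that assumption, comparing an image with itself gives a distance near zero, which is the property `test_verify_same_image` asserts with `distance < 0.05`.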
61
core/backend/face_verify.go
Normal file
@@ -0,0 +1,61 @@
package backend

import (
	"context"
	"fmt"
	"time"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/trace"
	"github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/LocalAI/pkg/model"
)

func FaceVerify(
	img1, img2 string,
	threshold float32,
	antiSpoofing bool,
	loader *model.ModelLoader,
	appConfig *config.ApplicationConfig,
	modelConfig config.ModelConfig,
) (*proto.FaceVerifyResponse, error) {
	opts := ModelOptions(modelConfig, appConfig)
	faceModel, err := loader.Load(opts...)
	if err != nil {
		recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
		return nil, err
	}
	if faceModel == nil {
		return nil, fmt.Errorf("could not load face recognition model")
	}

	var startTime time.Time
	if appConfig.EnableTracing {
		trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
		startTime = time.Now()
	}

	res, err := faceModel.FaceVerify(context.Background(), &proto.FaceVerifyRequest{
		Img1:         img1,
		Img2:         img2,
		Threshold:    threshold,
		AntiSpoofing: antiSpoofing,
	})

	if appConfig.EnableTracing {
		errStr := ""
		if err != nil {
			errStr = err.Error()
		}
		trace.RecordBackendTrace(trace.BackendTrace{
			Timestamp: startTime,
			Duration:  time.Since(startTime),
			Type:      trace.BackendTraceFaceVerify,
			ModelName: modelConfig.Name,
			Backend:   modelConfig.Backend,
			Error:     errStr,
		})
	}

	return res, err
}
@@ -40,6 +40,12 @@ type TokenUsage struct {
 	ChatDeltas []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
 }

+func needsThinkingProbe(c *config.ModelConfig) bool {
+	return c.TemplateConfig.UseTokenizerTemplate &&
+		(c.ReasoningConfig.DisableReasoning == nil ||
+			c.ReasoningConfig.DisableReasoningTagPrefill == nil)
+}
+
 // HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
 // Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
 func (t TokenUsage) HasChatDeltaContent() bool {
@@ -100,11 +106,9 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
 	// tokenizer template path is active) and the multimodal media marker (needed
 	// by custom chat templates so markers line up with what mtmd expects).
 	// We probe whenever any of those slots is still empty.
-	needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
-		c.ReasoningConfig.DisableReasoning == nil &&
-		c.ReasoningConfig.DisableReasoningTagPrefill == nil
+	shouldProbeThinking := needsThinkingProbe(c)
 	needsMarkerProbe := c.MediaMarker == ""
-	if needsThinkingProbe || needsMarkerProbe {
+	if shouldProbeThinking || needsMarkerProbe {
 		modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
 		config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
 		// Update the config in the loader so it persists for future requests
29
core/backend/llm_probe_test.go
Normal file
@@ -0,0 +1,29 @@
package backend

import (
	"github.com/mudler/LocalAI/core/config"

	"github.com/gpustack/gguf-parser-go/util/ptr"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("thinking probe gating", func() {
	It("probes tokenizer-template models when any reasoning default is still unset", func() {
		cfg := &config.ModelConfig{
			TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
		}
		Expect(needsThinkingProbe(cfg)).To(BeTrue())

		cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
		Expect(needsThinkingProbe(cfg)).To(BeTrue())

		cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
		Expect(needsThinkingProbe(cfg)).To(BeFalse())
	})

	It("does not probe when tokenizer templates are disabled", func() {
		cfg := &config.ModelConfig{}
		Expect(needsThinkingProbe(cfg)).To(BeFalse())
	})
})
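The spec above pins down the new gating rule: a probe runs when at least one reasoning setting is still unset, whereas the replaced inline condition required both to be unset. A small sketch of the two predicates, written in Python with `None` standing in for a nil pointer. The function names are illustrative only and do not exist in LocalAI.

```python
from typing import Optional


def needs_probe_old(use_tokenizer_template: bool,
                    disable_reasoning: Optional[bool],
                    disable_prefill: Optional[bool]) -> bool:
    """Replaced condition: probe only if both settings are unset."""
    return (use_tokenizer_template
            and disable_reasoning is None
            and disable_prefill is None)


def needs_probe_new(use_tokenizer_template: bool,
                    disable_reasoning: Optional[bool],
                    disable_prefill: Optional[bool]) -> bool:
    """New condition: probe if at least one setting is still unset."""
    return use_tokenizer_template and (
        disable_reasoning is None or disable_prefill is None
    )
```

The two differ exactly when one setting is explicit and the other is not: the old predicate skipped the probe and left the second setting unfilled, the new one still probes.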
@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {

 	app, err := application.New(opts...)
 	if err != nil {
-		return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
+		return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n 1. Check that your models directory exists and is accessible: %s\n 2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n 3. Check available disk space and file permissions\n 4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
 	}

 	appHTTP, err := http.API(app)
@@ -3,7 +3,6 @@ package cli
 import (
 	"context"
 	"encoding/json"
-	"errors"
 	"fmt"
 	"strings"

@@ -60,7 +59,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {

 	c, exists := cl.GetModelConfig(t.Model)
 	if !exists {
-		return errors.New("model not found")
+		return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
 	}

 	c.Threads = &t.Threads
@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {

 func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
 	if len(u.Args) == 0 {
-		return fmt.Errorf("no GGUF file provided")
+		return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
 	}
 	// We try to guess only if we don't have a template defined already
 	f, err := gguf.ParseGGUFFile(u.Args[0])
@@ -21,6 +21,7 @@ import (
 	"github.com/mudler/LocalAI/core/cli/workerregistry"
 	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/core/gallery"
+	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/messaging"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/storage"
@@ -597,12 +598,20 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
 	// Try to find the backend binary
 	backendPath := s.findBackend(req.Backend)
 	if backendPath == "" {
-		// Backend not found locally — try auto-installing from gallery
-		xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
-		if err := gallery.InstallBackendFromGallery(
-			context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
-		); err != nil {
-			return "", fmt.Errorf("installing backend from gallery: %w", err)
+		if req.URI != "" {
+			xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
+			if err := galleryop.InstallExternalBackend(
+				context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
+			); err != nil {
+				return "", fmt.Errorf("installing backend from gallery: %w", err)
+			}
+		} else {
+			xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
+			if err := gallery.InstallBackendFromGallery(
+				context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
+			); err != nil {
+				return "", fmt.Errorf("installing backend from gallery: %w", err)
+			}
 		}
 		// Re-register after install and retry
 		gallery.RegisterBackends(s.systemState, s.ml)
@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
 	// Check if the token is set
 	// as we always need it.
 	if r.Token == "" {
-		return fmt.Errorf("Token is required")
+		return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
 	}

 	port, err := freeport.GetFreePort()
@@ -125,19 +125,7 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 		return
 	}

-	cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
-
-	// Use the rendered template to detect if thinking token is at the end
-	// This reuses the existing DetectThinkingStartToken function
-	if metadata.RenderedTemplate != "" {
-		thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
-		thinkingForcedOpen := thinkingStartToken != ""
-		cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
-		xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
-	} else {
-		cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
-		xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
-	}
+	applyDetectedThinkingConfig(cfg, metadata)

 	// Extract tool format markers from autoparser analysis
 	if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -180,3 +168,34 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 		}
 	}
 }
+
+func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
+	if cfg == nil || metadata == nil {
+		return
+	}
+
+	// Respect explicit YAML/user config. Backend probing should only fill defaults
+	// when the reasoning mode has not already been set.
+	if cfg.ReasoningConfig.DisableReasoning == nil {
+		cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
+	}
+
+	// Respect explicit prefill config for the same reason. Only infer the
+	// default prefill behavior when the user did not set it.
+	if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
+		// Use the rendered template to detect if thinking token is at the end.
+		// This reuses the existing DetectThinkingStartToken function.
+		if metadata.RenderedTemplate != "" {
+			thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
+			thinkingForcedOpen := thinkingStartToken != ""
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
+		} else {
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
+		}
+		return
+	}
+
+	xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
+}
101
core/config/gguf_reasoning_test.go
Normal file
@@ -0,0 +1,101 @@
package config

import (
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/LocalAI/pkg/reasoning"

	"github.com/gpustack/gguf-parser-go/util/ptr"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("GGUF backend metadata reasoning defaults", func() {
	It("fills reasoning defaults when unset", func() {
		cfg := &ModelConfig{
			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
		}

		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
			SupportsThinking: true,
			RenderedTemplate: "{{ bos_token }}<think>",
		})

		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
	})

	It("preserves fully explicit reasoning settings", func() {
		cfg := &ModelConfig{
			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
			ReasoningConfig: reasoning.Config{
				DisableReasoning:           ptr.To(true),
				DisableReasoningTagPrefill: ptr.To(true),
			},
		}

		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
			SupportsThinking: true,
			RenderedTemplate: "{{ bos_token }}<think>",
		})

		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
	})

	It("preserves explicit disable while still inferring missing prefill", func() {
		cfg := &ModelConfig{
			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
			ReasoningConfig: reasoning.Config{
				DisableReasoning: ptr.To(true),
			},
		}

		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
			SupportsThinking: true,
			RenderedTemplate: "{{ bos_token }}<think>",
		})

		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
	})

	It("preserves explicit prefill while still inferring missing disable flag", func() {
		cfg := &ModelConfig{
			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
			ReasoningConfig: reasoning.Config{
				DisableReasoningTagPrefill: ptr.To(true),
			},
		}

		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
			SupportsThinking: true,
			RenderedTemplate: "{{ bos_token }}<think>",
		})

		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
	})

	It("defaults to disabling reasoning when backend does not support thinking", func() {
		cfg := &ModelConfig{
			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
		}

		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
			SupportsThinking: false,
		})

		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
	})
})
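The cases above all exercise one rule: backend-detected values fill a setting only when the user left it unset. A minimal sketch of that rule in Python, with `None` playing the role of a nil `*bool`. `ReasoningSettings` and `apply_detected` are hypothetical names, and `thinking_forced_open` stands in for the result of the template inspection, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReasoningSettings:
    # None means "not set by the user", like a nil pointer in the Go config.
    disable_reasoning: Optional[bool] = None
    disable_prefill: Optional[bool] = None


def apply_detected(settings: ReasoningSettings,
                   supports_thinking: bool,
                   thinking_forced_open: bool) -> None:
    """Fill only the unset fields; explicit user values are left alone."""
    if settings.disable_reasoning is None:
        settings.disable_reasoning = not supports_thinking
    if settings.disable_prefill is None:
        settings.disable_prefill = not thinking_forced_open
```

Using a tri-state (`None`, `True`, `False`) rather than a plain boolean is what lets the code tell "explicitly disabled" apart from "never configured".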
@@ -588,6 +588,7 @@ const (
 	FLAG_VAD ModelConfigUsecase = 0b010000000000
 	FLAG_VIDEO ModelConfigUsecase = 0b100000000000
 	FLAG_DETECTION ModelConfigUsecase = 0b1000000000000
+	FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000

 	// Common Subsets
 	FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@@ -611,6 +612,7 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
 		"FLAG_LLM": FLAG_LLM,
 		"FLAG_VIDEO": FLAG_VIDEO,
 		"FLAG_DETECTION": FLAG_DETECTION,
+		"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
 	}
 }

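Each usecase occupies a single bit, so the new `FLAG_FACE_RECOGNITION` takes the next unused bit and combined sets are built with bitwise OR. A minimal sketch of the same scheme using Python's `enum.IntFlag`. The `Usecase` class and `has_all` helper are hypothetical; only the three bit values are copied from the diff.

```python
from enum import IntFlag


class Usecase(IntFlag):
    # Bit values copied from the Go constants in the diff.
    VIDEO = 0b100000000000
    DETECTION = 0b1000000000000
    FACE_RECOGNITION = 0b10000000000000


def has_all(requested: Usecase, flag: Usecase) -> bool:
    """Mirror of the Go test `(u & FLAG) == FLAG`: every bit of `flag` is set."""
    return (requested & flag) == flag
```

Comparing against the flag itself, rather than testing for a non-zero result, also behaves correctly for multi-bit subsets such as `FLAG_LLM`.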
@@ -651,7 +653,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
 	nonTextGenBackends := []string{
 		"whisper", "piper", "kokoro",
 		"diffusers", "stablediffusion", "stablediffusion-ggml",
-		"rerankers", "silero-vad", "rfdetr",
+		"rerankers", "silero-vad", "rfdetr", "insightface",
 		"transformers-musicgen", "ace-step", "acestep-cpp",
 	}

@@ -728,12 +730,19 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
|
||||
}
|
||||
|
||||
if (u & FLAG_DETECTION) == FLAG_DETECTION {
|
||||
detectionBackends := []string{"rfdetr", "sam3-cpp"}
|
||||
detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
|
||||
if !slices.Contains(detectionBackends, c.Backend) {
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
|
||||
faceBackends := []string{"insightface"}
|
||||
if !slices.Contains(faceBackends, c.Backend) {
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
|
||||
soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
|
||||
if !slices.Contains(soundGenBackends, c.Backend) {
|
||||
|
||||
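The usecase guards in the hunk above pair a bitmask test with a per-usecase backend allow-list. A minimal sketch of that pattern follows; the names `usecase`, `flagDetection`, `flagFaceRecognition`, and `backendSupports` are hypothetical stand-ins rather than LocalAI's real identifiers, and only the backend lists are taken from the diff.

```go
package main

import (
	"fmt"
	"slices"
)

// usecase is a hypothetical stand-in for ModelConfigUsecase.
type usecase uint32

const (
	flagDetection usecase = 1 << iota
	flagFaceRecognition
)

// backendSupports returns false as soon as a requested usecase bit is set
// and the backend is not in that usecase's allow-list.
func backendSupports(backend string, u usecase) bool {
	if u&flagDetection == flagDetection {
		if !slices.Contains([]string{"rfdetr", "sam3-cpp", "insightface"}, backend) {
			return false
		}
	}
	if u&flagFaceRecognition == flagFaceRecognition {
		if !slices.Contains([]string{"insightface"}, backend) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(backendSupports("insightface", flagDetection|flagFaceRecognition))
	fmt.Println(backendSupports("rfdetr", flagDetection))
	fmt.Println(backendSupports("rfdetr", flagFaceRecognition))
}
```

Because each guard only rejects, a backend listed for several usecases (here `insightface`) passes a combined mask, while a backend missing from any requested list fails it.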
@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
 		bcl.configs[c.Name] = *c
 	} else {
 		if err != nil {
-			return fmt.Errorf("config is not valid: %w", err)
+			return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
 		}
-		return fmt.Errorf("config is not valid")
+		return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
 	}
 
 	return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
 		files = append(files, info)
 	}
 	for _, file := range files {
-		// Skip templates, YAML and .keep files
-		if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
-			strings.HasPrefix(file.Name(), ".") {
+		// Only load real YAML config files and ignore dotfiles or backup variants
+		ext := strings.ToLower(filepath.Ext(file.Name()))
+		if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
 			continue
 		}
 
|
||||
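The loader change above swaps a substring match for an exact extension match, so backup files such as `foo.yaml.bak` are no longer picked up. A minimal sketch of the new predicate, pulled out into a hypothetical helper `isModelConfigFile` (the helper name is an assumption; the condition mirrors the diff):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isModelConfigFile reports whether name should be loaded as a model config:
// it must not be a dotfile and its final extension must be .yaml or .yml.
func isModelConfigFile(name string) bool {
	if strings.HasPrefix(name, ".") {
		return false
	}
	ext := strings.ToLower(filepath.Ext(name))
	return ext == ".yaml" || ext == ".yml"
}

func main() {
	names := []string{"foo.yaml", "BAR.YML", "foo.yaml.bak", "foo.yaml.bak.123", ".hidden.yml"}
	for _, name := range names {
		fmt.Printf("%-18s %v\n", name, isModelConfigFile(name))
	}
}
```

`filepath.Ext` returns only the suffix after the last dot, which is why `foo.yaml.bak` yields `.bak` and is skipped, whereas the old `strings.Contains` check would have accepted it.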
@@ -2,6 +2,7 @@ package config
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
@@ -109,5 +110,50 @@ options:
|
||||
Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))
|
||||
|
||||
})
|
||||
|
||||
It("Only loads files ending with yaml or yml", func() {
|
||||
tmpdir, err := os.MkdirTemp("", "model-config-loader")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpdir)
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
|
||||
`name: "foo-model"
|
||||
description: "formal config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
|
||||
`name: "foo-model"
|
||||
description: "backup config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo-backup.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
|
||||
`name: "foo-backup-only"
|
||||
description: "timestamped backup config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo-timestamped.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
bcl := NewModelConfigLoader(tmpdir)
|
||||
err = bcl.LoadModelConfigsFromPath(tmpdir)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
configs := bcl.GetAllModelsConfigs()
|
||||
Expect(configs).To(HaveLen(1))
|
||||
Expect(configs[0].Name).To(Equal("foo-model"))
|
||||
Expect(configs[0].Description).To(Equal("formal config"))
|
||||
|
||||
_, exists := bcl.GetModelConfig("foo-backup-only")
|
||||
Expect(exists).To(BeFalse())
|
||||
})
|
||||
})
|
||||
})
|
||||
|
||||
@@ -110,7 +110,13 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if backends.Exists(name) {
|
||||
// Only short-circuit if the install is *actually usable*. An orphaned
|
||||
// meta entry whose concrete was removed still shows up in
|
||||
// ListSystemBackends with a RunFile pointing at a path that no longer
|
||||
// exists; returning early there leaves the caller with a broken
|
||||
// alias and the worker fails with "backend not found after install
|
||||
// attempt" on every retry. Re-install in that case.
|
||||
if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
|
||||
return nil
|
||||
}
|
||||
}
|
||||
@@ -375,17 +381,44 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
|
||||
}
|
||||
|
||||
if metadata != nil && metadata.MetaBackendFor != "" {
|
||||
metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
|
||||
xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
|
||||
if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
|
||||
return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
|
||||
concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
|
||||
xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
|
||||
// If the concrete the meta points to is already gone (earlier delete,
|
||||
// partial install, or manual cleanup), keep going and remove the
|
||||
// orphaned meta dir. Previously we returned an error here, which made
|
||||
// the orphaned meta impossible to uninstall from the UI — the delete
|
||||
// kept failing and every subsequent install short-circuited because
|
||||
// the stale meta metadata made ListSystemBackends.Exists(name) true.
|
||||
if _, statErr := os.Stat(concreteDirectory); statErr == nil {
|
||||
os.RemoveAll(concreteDirectory)
|
||||
} else if os.IsNotExist(statErr) {
|
||||
xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
|
||||
"meta", name, "concrete", metadata.MetaBackendFor)
|
||||
} else {
|
||||
return statErr
|
||||
}
|
||||
os.RemoveAll(metaBackendDirectory)
|
||||
}
|
||||
|
||||
return os.RemoveAll(backendDirectory)
|
||||
}
|
||||
|
||||
// isBackendRunnable reports whether the given backend entry can actually be
|
||||
// invoked. A meta backend is runnable only if its concrete's run.sh still
|
||||
// exists on disk; concrete backends are considered runnable as long as their
|
||||
// RunFile is set (ListSystemBackends only emits them when the runfile is
|
||||
// present). Used to guard the "already installed" short-circuit so an
|
||||
// orphaned meta pointing at a missing concrete triggers a real reinstall
|
||||
// rather than being silently skipped.
|
||||
func isBackendRunnable(b SystemBackend) bool {
|
||||
if b.RunFile == "" {
|
||||
return false
|
||||
}
|
||||
if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
|
||||
return false
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
type SystemBackend struct {
|
||||
Name string
|
||||
RunFile string
|
||||
|
||||
@@ -952,6 +952,58 @@ var _ = Describe("Gallery Backends", func() {
|
||||
err = DeleteBackendFromSystem(systemState, "non-existent")
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("removes an orphaned meta backend whose concrete is missing", func() {
|
||||
// Real scenario from the dev cluster: the concrete got wiped
|
||||
// (partial install, manual cleanup, previous crash) but the meta
|
||||
// directory + metadata.json still points at it. The old code
|
||||
// errored with "meta backend X not found" and left the orphan in
|
||||
// place, making the backend impossible to uninstall.
|
||||
metaName := "meta-backend"
|
||||
concreteName := "concrete-backend-that-vanished"
|
||||
metaPath := filepath.Join(tempDir, metaName)
|
||||
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
|
||||
|
||||
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
|
||||
data, err := json.MarshalIndent(meta, "", " ")
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
|
||||
|
||||
// Concrete directory intentionally absent.
|
||||
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
|
||||
Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
|
||||
Expect(metaPath).NotTo(BeADirectory())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
|
||||
It("re-runs install when the meta's concrete is missing", func() {
|
||||
// Seed state: meta dir exists with metadata pointing at a
|
||||
// concrete that was removed from disk. ListSystemBackends still
|
||||
// surfaces the meta via its metadata.Name → the old short-circuit
|
||||
// at `if backends.Exists(name) { return nil }` returned silently,
|
||||
// leaving the worker's findBackend() with a dead alias forever.
|
||||
// The fix: require the backend to be runnable before we skip.
|
||||
metaName := "meta-orphan"
|
||||
concreteName := "concrete-gone"
|
||||
metaPath := filepath.Join(tempDir, metaName)
|
||||
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
|
||||
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
|
||||
data, err := json.MarshalIndent(meta, "", " ")
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
|
||||
|
||||
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
|
||||
listed, err := ListSystemBackends(systemState)
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
b, ok := listed.Get(metaName)
|
||||
Expect(ok).To(BeTrue())
|
||||
Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
|
||||
})
|
||||
})
|
||||
|
||||
Describe("ListSystemBackends", func() {
|
||||
|
||||
@@ -57,6 +57,14 @@ var RouteFeatureRegistry = []RouteFeature{
|
||||
// Detection
|
||||
{"POST", "/v1/detection", FeatureDetection},
|
||||
|
||||
// Face recognition
|
||||
{"POST", "/v1/face/verify", FeatureFaceRecognition},
|
||||
{"POST", "/v1/face/analyze", FeatureFaceRecognition},
|
||||
{"POST", "/v1/face/embed", FeatureFaceRecognition},
|
||||
{"POST", "/v1/face/register", FeatureFaceRecognition},
|
||||
{"POST", "/v1/face/identify", FeatureFaceRecognition},
|
||||
{"POST", "/v1/face/forget", FeatureFaceRecognition},
|
||||
|
||||
// Video
|
||||
{"POST", "/video", FeatureVideo},
|
||||
|
||||
@@ -151,5 +159,6 @@ func APIFeatureMetas() []FeatureMeta {
|
||||
{FeatureTokenize, "Tokenize", true},
|
||||
{FeatureMCP, "MCP", true},
|
||||
{FeatureStores, "Stores", true},
|
||||
{FeatureFaceRecognition, "Face Recognition", true},
|
||||
}
|
||||
}
|
||||
|
||||
@@ -51,6 +51,7 @@ const (
|
||||
FeatureTokenize = "tokenize"
|
||||
FeatureMCP = "mcp"
|
||||
FeatureStores = "stores"
|
||||
FeatureFaceRecognition = "face_recognition"
|
||||
)
|
||||
|
||||
// AgentFeatures lists agent-related features (default OFF).
|
||||
@@ -64,6 +65,7 @@ var APIFeatures = []string{
|
||||
FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
|
||||
FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings, FeatureSound,
|
||||
FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP, FeatureStores,
|
||||
FeatureFaceRecognition,
|
||||
}
|
||||
|
||||
// AllFeatures lists all known features (used by UI and validation).
|
||||
|
||||
@@ -73,6 +73,12 @@ var instructionDefs = []instructionDef{
|
||||
Description: "Video generation from text prompts",
|
||||
Tags: []string{"video"},
|
||||
},
|
||||
{
|
||||
Name: "face-recognition",
|
||||
Description: "Face verification (1:1), identification (1:N), embedding, and demographic analysis",
|
||||
Tags: []string{"face-recognition"},
|
||||
Intro: "The /v1/face/register, /identify, and /forget endpoints build on a vector store — registrations are in-memory by default and lost on restart. Use /v1/face/embed for a raw embedding; /v1/embeddings is OpenAI-compatible and text-only.",
|
||||
},
|
||||
}
|
||||
|
||||
// swaggerState holds parsed swagger spec data, initialised once.
|
||||
|
||||
@@ -39,7 +39,7 @@ var _ = Describe("API Instructions Endpoints", func() {
 
 			instructions, ok := resp["instructions"].([]any)
 			Expect(ok).To(BeTrue())
-			Expect(instructions).To(HaveLen(9))
+			Expect(instructions).To(HaveLen(10))
 
 			// Verify each instruction has required fields and correct URL format
 			for _, s := range instructions {
@@ -73,6 +73,7 @@ var _ = Describe("API Instructions Endpoints", func() {
|
||||
"model-management",
|
||||
"monitoring",
|
||||
"agents",
|
||||
"face-recognition",
|
||||
))
|
||||
})
|
||||
})
|
||||
|
||||
@@ -9,19 +9,26 @@ import (
 // BackendMonitorEndpoint returns the status of the specified backend
 // @Summary Backend monitor endpoint
 // @Tags monitoring
-// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
+// @Param model query string true "Name of the model to monitor"
 // @Success 200 {object} proto.StatusResponse "Response"
 // @Router /backend/monitor [get]
 func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
 	return func(c echo.Context) error {
-
-		input := new(schema.BackendMonitorRequest)
-		// Get input data from the request body
-		if err := c.Bind(input); err != nil {
-			return err
+		model := c.QueryParam("model")
+		// Fall back to binding the request body so pre-existing clients that
+		// sent `{"model": "..."}` with GET keep working.
+		if model == "" {
+			input := new(schema.BackendMonitorRequest)
+			if err := c.Bind(input); err != nil {
+				return err
+			}
+			model = input.Model
+		}
+		if model == "" {
+			return echo.NewHTTPError(400, "model query parameter is required")
 		}
 
-		resp, err := bm.CheckAndSample(input.Model)
+		resp, err := bm.CheckAndSample(model)
 		if err != nil {
 			return err
 		}
|
||||
@@ -9,7 +9,6 @@ import (
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
@@ -34,14 +33,14 @@ func DetectionEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC
|
||||
|
||||
xlog.Debug("Detection", "image", input.Image, "modelFile", "modelFile", "backend", cfg.Backend)
|
||||
|
||||
image, err := utils.GetContentURIAsBase64(input.Image)
|
||||
image, err := decodeImageInput(input.Image)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
res, err := backend.Detection(image, input.Prompt, input.Points, input.Boxes, input.Threshold, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return err
|
||||
return mapBackendError(err)
|
||||
}
|
||||
|
||||
response := schema.DetectionResponse{
|
||||
|
||||
69
core/http/endpoints/localai/face_analyze.go
Normal file
@@ -0,0 +1,69 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// FaceAnalyzeEndpoint returns demographic attributes for faces in an image.
|
||||
// @Summary Analyze demographic attributes (age, gender, ...) of faces.
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceAnalyzeRequest true "query params"
|
||||
// @Success 200 {object} schema.FaceAnalyzeResponse "Response"
|
||||
// @Router /v1/face/analyze [post]
|
||||
func FaceAnalyzeEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceAnalyzeRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
|
||||
if !ok || cfg == nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
|
||||
img, err := decodeImageInput(input.Img)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
xlog.Debug("FaceAnalyze", "model", cfg.Name, "backend", cfg.Backend, "actions", input.Actions)
|
||||
res, err := backend.FaceAnalyze(img, input.Actions, input.AntiSpoofing, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return mapBackendError(err)
|
||||
}
|
||||
|
||||
response := schema.FaceAnalyzeResponse{
|
||||
Faces: make([]schema.FaceAnalysis, len(res.GetFaces())),
|
||||
}
|
||||
for i, f := range res.GetFaces() {
|
||||
response.Faces[i] = schema.FaceAnalysis{
|
||||
Region: schema.FacialArea{
|
||||
X: f.GetRegion().GetX(),
|
||||
Y: f.GetRegion().GetY(),
|
||||
W: f.GetRegion().GetW(),
|
||||
H: f.GetRegion().GetH(),
|
||||
},
|
||||
FaceConfidence: f.GetFaceConfidence(),
|
||||
Age: f.GetAge(),
|
||||
DominantGender: f.GetDominantGender(),
|
||||
Gender: f.GetGender(),
|
||||
DominantEmotion: f.GetDominantEmotion(),
|
||||
Emotion: f.GetEmotion(),
|
||||
DominantRace: f.GetDominantRace(),
|
||||
Race: f.GetRace(),
|
||||
IsReal: f.GetIsReal(),
|
||||
AntispoofScore: f.GetAntispoofScore(),
|
||||
}
|
||||
}
|
||||
|
||||
return c.JSON(http.StatusOK, response)
|
||||
}
|
||||
}
|
||||
54
core/http/endpoints/localai/face_embed.go
Normal file
@@ -0,0 +1,54 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// FaceEmbedEndpoint extracts a face embedding vector from an image.
|
||||
//
|
||||
// Distinct from /v1/embeddings, which is OpenAI-compatible and text-only
|
||||
// by contract (its `input` field is a string or string list of TEXT to
|
||||
// embed). Passing an image data-URI to /v1/embeddings does not work —
|
||||
// use this endpoint instead.
|
||||
//
|
||||
// @Summary Extract a face embedding from an image.
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceEmbedRequest true "query params"
|
||||
// @Success 200 {object} schema.FaceEmbedResponse "Response"
|
||||
// @Router /v1/face/embed [post]
|
||||
func FaceEmbedEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceEmbedRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
|
||||
if !ok || cfg == nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
|
||||
img, err := decodeImageInput(input.Img)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
xlog.Debug("FaceEmbed", "model", cfg.Name, "backend", cfg.Backend)
|
||||
vec, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return mapBackendError(err)
|
||||
}
|
||||
return c.JSON(http.StatusOK, schema.FaceEmbedResponse{
|
||||
Embedding: vec,
|
||||
Dim: len(vec),
|
||||
Model: cfg.Name,
|
||||
})
|
||||
}
|
||||
}
|
||||
45
core/http/endpoints/localai/face_forget.go
Normal file
@@ -0,0 +1,45 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/core/services/facerecognition"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// FaceForgetEndpoint removes a previously-registered face by ID.
|
||||
// @Summary Remove a previously-registered face by ID.
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceForgetRequest true "query params"
|
||||
// @Success 204 "No Content"
|
||||
// @Router /v1/face/forget [post]
|
||||
func FaceForgetEndpoint(registry facerecognition.Registry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceForgetRequest)
|
||||
if !ok {
|
||||
// Forget doesn't need a face model loaded — fall back to a raw bind
|
||||
// when the request extractor hasn't run (e.g. when the route was
|
||||
// registered without SetModelAndConfig).
|
||||
input = new(schema.FaceForgetRequest)
|
||||
if err := c.Bind(input); err != nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
}
|
||||
if input.ID == "" {
|
||||
return echo.NewHTTPError(http.StatusBadRequest, "id is required")
|
||||
}
|
||||
|
||||
xlog.Debug("FaceForget", "id", input.ID)
|
||||
if err := registry.Forget(c.Request().Context(), input.ID); err != nil {
|
||||
if errors.Is(err, facerecognition.ErrNotFound) {
|
||||
return echo.NewHTTPError(http.StatusNotFound, err.Error())
|
||||
}
|
||||
return err
|
||||
}
|
||||
return c.NoContent(http.StatusNoContent)
|
||||
}
|
||||
}
|
||||
80
core/http/endpoints/localai/face_identify.go
Normal file
@@ -0,0 +1,80 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"cmp"
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/core/services/facerecognition"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// defaultIdentifyThreshold is the cosine-distance cutoff applied when
|
||||
// the client does not specify one. Tuned for buffalo_l ArcFace R50;
|
||||
// other recognizers (e.g. SFace) should override it explicitly.
|
||||
const defaultIdentifyThreshold = float32(0.35)
|
||||
|
||||
// FaceIdentifyEndpoint runs 1:N identification against the registered store.
|
||||
// @Summary Identify a face against the registered database (1:N recognition).
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceIdentifyRequest true "query params"
|
||||
// @Success 200 {object} schema.FaceIdentifyResponse "Response"
|
||||
// @Router /v1/face/identify [post]
|
||||
func FaceIdentifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceIdentifyRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
|
||||
if !ok || cfg == nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
|
||||
img, err := decodeImageInput(input.Img)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
topK := cmp.Or(input.TopK, 5)
|
||||
threshold := cmp.Or(input.Threshold, defaultIdentifyThreshold)
|
||||
|
||||
xlog.Debug("FaceIdentify", "model", cfg.Name, "topK", topK, "threshold", threshold)
|
||||
probe, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return mapBackendError(err)
|
||||
}
|
||||
|
||||
matches, err := registry.Identify(c.Request().Context(), probe, topK)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
response := schema.FaceIdentifyResponse{
|
||||
Matches: make([]schema.FaceIdentifyMatch, len(matches)),
|
||||
}
|
||||
for i, m := range matches {
|
||||
confidence := (1 - m.Distance/threshold) * 100
|
||||
if confidence < 0 {
|
||||
confidence = 0
|
||||
}
|
||||
if confidence > 100 {
|
||||
confidence = 100
|
||||
}
|
||||
response.Matches[i] = schema.FaceIdentifyMatch{
|
||||
ID: m.ID,
|
||||
Name: m.Metadata.Name,
|
||||
Labels: m.Metadata.Labels,
|
||||
Distance: m.Distance,
|
||||
Confidence: confidence,
|
||||
Match: m.Distance <= threshold,
|
||||
}
|
||||
}
|
||||
return c.JSON(http.StatusOK, response)
|
||||
}
|
||||
}
|
||||
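The identify handler above turns a cosine distance into a 0 to 100 confidence by scaling against the threshold and clamping. A minimal sketch of that arithmetic, with a hypothetical helper name `confidence`; it assumes a positive threshold (the handler substitutes `defaultIdentifyThreshold` when the client sends zero):

```go
package main

import "fmt"

// confidence maps a distance to a percentage: 100 at distance 0, falling
// linearly to 0 at the threshold, and clamped to [0, 100] beyond that.
// The threshold is assumed to be positive.
func confidence(distance, threshold float32) float32 {
	c := (1 - distance/threshold) * 100
	if c < 0 {
		return 0
	}
	if c > 100 {
		return 100
	}
	return c
}

func main() {
	const threshold = float32(0.35)
	for _, d := range []float32{0, 0.1, 0.35, 0.7} {
		fmt.Printf("distance=%.2f confidence=%.1f match=%v\n", d, confidence(d, threshold), d <= threshold)
	}
}
```

A candidate sitting exactly on the threshold still counts as a match (`distance <= threshold`) but reports a confidence of 0, so the two fields should be read together.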
60
core/http/endpoints/localai/face_register.go
Normal file
@@ -0,0 +1,60 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/core/services/facerecognition"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// FaceRegisterEndpoint enrolls a face into the 1:N identification store.
|
||||
// @Summary Register a face for 1:N identification.
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceRegisterRequest true "query params"
|
||||
// @Success 200 {object} schema.FaceRegisterResponse "Response"
|
||||
// @Router /v1/face/register [post]
|
||||
func FaceRegisterEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceRegisterRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
|
||||
if !ok || cfg == nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
if input.Name == "" {
|
||||
return echo.NewHTTPError(http.StatusBadRequest, "name is required")
|
||||
}
|
||||
|
||||
img, err := decodeImageInput(input.Img)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
xlog.Debug("FaceRegister", "model", cfg.Name, "name", input.Name)
|
||||
embedding, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return mapBackendError(err)
|
||||
}
|
||||
|
||||
stored, err := registry.Register(c.Request().Context(), embedding, facerecognition.Metadata{
|
||||
Name: input.Name,
|
||||
Labels: input.Labels,
|
||||
})
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return c.JSON(http.StatusOK, schema.FaceRegisterResponse{
|
||||
ID: stored.ID,
|
||||
Name: stored.Name,
|
||||
RegisteredAt: stored.RegisteredAt,
|
||||
})
|
||||
}
|
||||
}
|
||||
68
core/http/endpoints/localai/face_verify.go
Normal file
@@ -0,0 +1,68 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/core/backend"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/xlog"
|
||||
)
|
||||
|
||||
// FaceVerifyEndpoint compares two images and reports whether they depict the same person.
|
||||
// @Summary Verify that two images depict the same person.
|
||||
// @Tags face-recognition
|
||||
// @Param request body schema.FaceVerifyRequest true "query params"
|
||||
// @Success 200 {object} schema.FaceVerifyResponse "Response"
|
||||
// @Router /v1/face/verify [post]
|
||||
func FaceVerifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceVerifyRequest)
|
||||
if !ok || input.Model == "" {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
|
||||
if !ok || cfg == nil {
|
||||
return echo.ErrBadRequest
|
||||
}
|
||||
|
||||
img1, err := decodeImageInput(input.Img1)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
img2, err := decodeImageInput(input.Img2)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
xlog.Debug("FaceVerify", "model", cfg.Name, "backend", cfg.Backend)
|
||||
res, err := backend.FaceVerify(img1, img2, input.Threshold, input.AntiSpoofing, ml, appConfig, *cfg)
|
||||
if err != nil {
|
||||
return mapBackendError(err)
|
||||
}
|
||||
|
||||
return c.JSON(http.StatusOK, schema.FaceVerifyResponse{
|
||||
Verified: res.GetVerified(),
|
||||
Distance: res.GetDistance(),
|
||||
Threshold: res.GetThreshold(),
|
||||
Confidence: res.GetConfidence(),
|
||||
Model: res.GetModel(),
|
||||
Img1Area: schema.FacialArea{
|
||||
X: res.GetImg1Area().GetX(),
|
||||
Y: res.GetImg1Area().GetY(),
|
||||
W: res.GetImg1Area().GetW(),
|
||||
H: res.GetImg1Area().GetH(),
|
||||
},
|
||||
Img2Area: schema.FacialArea{
|
||||
X: res.GetImg2Area().GetX(),
|
||||
Y: res.GetImg2Area().GetY(),
|
||||
W: res.GetImg2Area().GetW(),
|
||||
H: res.GetImg2Area().GetH(),
|
||||
},
|
||||
ProcessingTimeMs: res.GetProcessingTimeMs(),
|
||||
})
|
||||
}
|
||||
}
|
||||
55
core/http/endpoints/localai/images.go
Normal file
@@ -0,0 +1,55 @@
|
||||
package localai
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"net/http"
|
||||
|
||||
"github.com/labstack/echo/v4"
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
"google.golang.org/grpc/codes"
|
||||
"google.golang.org/grpc/status"
|
||||
)
|
||||
|
||||
// decodeImageInput resolves a URL, data-URI, or plain-string image
|
||||
// input to a base64 payload ready for the gRPC surface. Errors from
|
||||
// the underlying utils helper (bad URL, not a data-URI, download
|
||||
// failure, etc.) are all caused by what the client sent — we surface
|
||||
// them as 400 rather than the default 500 so API consumers can
|
||||
// distinguish "you sent bad input" from "our server broke".
|
||||
//
|
||||
// This is the single-input path for endpoints where the image IS the
|
||||
// request (detection, face recognition, etc.). The multi-modal message
|
||||
// paths (chat completions, responses API, realtime) intentionally
|
||||
// log-and-skip individual media parts; they don't use this helper.
|
||||
func decodeImageInput(s string) (string, error) {
|
||||
img, err := utils.GetContentURIAsBase64(s)
|
||||
if err != nil {
|
||||
return "", echo.NewHTTPError(http.StatusBadRequest, fmt.Sprintf("invalid image input: %v", err))
|
||||
}
|
||||
return img, nil
|
||||
}
|
||||
|
||||
// mapBackendError converts the gRPC status code a backend returns into
|
||||
// a matching HTTP status. Without this, every backend error defaults
|
||||
// to 500 — which lies to API consumers when the backend is telling us
|
||||
// "your input was bad" (INVALID_ARGUMENT) or "the resource doesn't
|
||||
// exist" (NOT_FOUND). Pass any err from a `core/backend/*` call
|
||||
// through this before returning from a handler.
|
||||
func mapBackendError(err error) error {
|
||||
if err == nil {
|
||||
return nil
|
||||
}
|
||||
if st, ok := status.FromError(err); ok {
|
||||
switch st.Code() {
|
||||
case codes.InvalidArgument:
|
||||
return echo.NewHTTPError(http.StatusBadRequest, st.Message())
|
||||
case codes.NotFound:
|
||||
return echo.NewHTTPError(http.StatusNotFound, st.Message())
|
||||
case codes.FailedPrecondition:
|
||||
return echo.NewHTTPError(http.StatusPreconditionFailed, st.Message())
|
||||
case codes.Unimplemented:
|
||||
return echo.NewHTTPError(http.StatusNotImplemented, st.Message())
|
||||
}
|
||||
}
|
||||
return err
|
||||
}
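The status mapping in `mapBackendError` can be tried in isolation with a stdlib-only sketch. The `backendCode` type and its constants below are hypothetical stand-ins for `google.golang.org/grpc/codes`, and `httpStatusFor` is an illustrative name, not part of LocalAI. The sketch also assumes that an error passed through unchanged ends up as a 500, which the helper's comment implies but the code itself leaves to the framework.

```go
package main

import (
	"fmt"
	"net/http"
)

// backendCode is a hypothetical stand-in for grpc's codes.Code so the
// sketch compiles without third-party modules.
type backendCode int

const (
	codeUnknown backendCode = iota
	codeInvalidArgument
	codeNotFound
	codeFailedPrecondition
	codeUnimplemented
)

// httpStatusFor mirrors the switch in mapBackendError. Codes that the
// switch does not list fall through to 500 here (assumed default).
func httpStatusFor(c backendCode) int {
	switch c {
	case codeInvalidArgument:
		return http.StatusBadRequest
	case codeNotFound:
		return http.StatusNotFound
	case codeFailedPrecondition:
		return http.StatusPreconditionFailed
	case codeUnimplemented:
		return http.StatusNotImplemented
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	codes := []backendCode{
		codeInvalidArgument, codeNotFound,
		codeFailedPrecondition, codeUnimplemented, codeUnknown,
	}
	for _, c := range codes {
		fmt.Printf("code %d -> HTTP %d\n", c, httpStatusFor(c))
	}
}
```

Keeping the table in one function means every handler that forwards a backend error reports the same status for the same backend code.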
|
||||
@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
|
||||
if err := c.Bind(&req); err != nil || req.Backend == "" {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
|
||||
}
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
|
||||
if err != nil {
|
||||
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
|
||||
|
||||
@@ -110,6 +110,27 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
|
||||
})
|
||||
}
|
||||
|
||||
// The UI reads ApiKeys from GET /api/settings, which already returns the
|
||||
// merged env+runtime list. When the user clicks Save, the same merged
|
||||
// list comes back in the POST body. Strip the env-supplied keys from
|
||||
// the incoming list before we persist or re-merge, otherwise each save
|
||||
// duplicates the env keys on top of the previous merge (#9071).
|
||||
if settings.ApiKeys != nil {
|
||||
envKeys := startupConfig.ApiKeys
|
||||
envSet := make(map[string]struct{}, len(envKeys))
|
||||
for _, k := range envKeys {
|
||||
envSet[k] = struct{}{}
|
||||
}
|
||||
runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
|
||||
for _, k := range *settings.ApiKeys {
|
||||
if _, fromEnv := envSet[k]; fromEnv {
|
||||
continue
|
||||
}
|
||||
runtimeOnly = append(runtimeOnly, k)
|
||||
}
|
||||
settings.ApiKeys = &runtimeOnly
|
||||
}
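The filtering step above can be read as a small pure function. The sketch below is a standalone restatement under that assumption; `stripEnvKeys` is a hypothetical name, and the real handler works on `settings.ApiKeys` and `startupConfig.ApiKeys` in place rather than through a helper.

```go
package main

import "fmt"

// stripEnvKeys returns the entries of incoming that were not supplied
// through the environment, preserving their original order.
func stripEnvKeys(incoming, envKeys []string) []string {
	envSet := make(map[string]struct{}, len(envKeys))
	for _, k := range envKeys {
		envSet[k] = struct{}{}
	}
	runtimeOnly := make([]string, 0, len(incoming))
	for _, k := range incoming {
		if _, fromEnv := envSet[k]; fromEnv {
			continue // env keys are merged back in later, so drop them here
		}
		runtimeOnly = append(runtimeOnly, k)
	}
	return runtimeOnly
}

func main() {
	env := []string{"env-key"}
	// The UI posts back the merged list it received from GET /api/settings.
	posted := []string{"env-key", "ui-key"}
	fmt.Println(stripEnvKeys(posted, env)) // prints [ui-key]
}
```

Because the result is always a fresh, non-nil slice, a POST body that contains only env keys persists an empty runtime list instead of a duplicated one.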
|
||||
|
||||
settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
|
||||
settingsJSON, err := json.MarshalIndent(settings, "", " ")
|
||||
if err != nil {
|
||||
|
||||
@@ -147,6 +147,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
result := ""
|
||||
lastEmittedCount := 0
|
||||
sentInitialRole := false
|
||||
sentReasoning := false
|
||||
hasChatDeltaToolCalls := false
|
||||
hasChatDeltaContent := false
|
||||
|
||||
@@ -190,6 +191,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
sentReasoning = true
|
||||
}
|
||||
|
||||
// Stream content deltas (cleaned of reasoning tags) while no tool calls
|
||||
@@ -363,7 +365,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
|
||||
}
|
||||
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
|
||||
noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0
|
||||
// noAction is a sentinel "just answer" pseudo-function — not a real
|
||||
// tool call. Scan the whole slice rather than only index 0 so we
|
||||
// don't drop a real tool call that happens to follow a noAction
|
||||
// entry, and so the default branch isn't entered with only noAction
|
||||
// entries to emit as tool_calls.
|
||||
noActionToRun := !hasRealCall(functionResults, noAction)
|
||||
|
||||
switch {
|
||||
case noActionToRun:
|
||||
@@ -377,108 +384,31 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
|
||||
}
|
||||
|
||||
if sentInitialRole {
|
||||
// Content was already streamed during the callback — just emit usage.
|
||||
delta := &schema.Message{}
|
||||
if reasoning != "" && extractor.Reasoning() == "" {
|
||||
delta.Reasoning = &reasoning
|
||||
}
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
}
|
||||
} else {
|
||||
// Content was NOT streamed — send everything at once (fallback).
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
|
||||
result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
|
||||
if err != nil {
|
||||
xlog.Error("error handling question", "error", err)
|
||||
return err
|
||||
}
|
||||
|
||||
delta := &schema.Message{Content: &result}
|
||||
if reasoning != "" {
|
||||
delta.Reasoning = &reasoning
|
||||
}
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
var result string
|
||||
if !sentInitialRole {
|
||||
var hqErr error
|
||||
result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
|
||||
if hqErr != nil {
|
||||
xlog.Error("error handling question", "error", hqErr)
|
||||
return hqErr
|
||||
}
|
||||
}
|
||||
for _, chunk := range buildNoActionFinalChunks(
|
||||
id, req.Model, created,
|
||||
sentInitialRole, sentReasoning,
|
||||
result, reasoning, usage,
|
||||
) {
|
||||
responses <- chunk
|
||||
}
|
||||
|
||||
default:
|
||||
for i, ss := range functionResults {
|
||||
name, args := ss.Name, ss.Arguments
|
||||
toolCallID := ss.ID
|
||||
if toolCallID == "" {
|
||||
toolCallID = id
|
||||
}
|
||||
|
||||
if i < lastEmittedCount {
|
||||
// Already emitted during streaming by the incremental
|
||||
// JSON/XML parser — skip to avoid duplicate tool calls.
|
||||
continue
|
||||
}
|
||||
|
||||
// Tool call not yet emitted — send name + args (two chunks).
|
||||
initialMessage := schema.OpenAIResponse{
|
||||
ID: id,
|
||||
Created: created,
|
||||
Model: req.Model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
ToolCalls: []schema.ToolCall{
|
||||
{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Name: name,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
responses <- initialMessage
|
||||
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id,
|
||||
Created: created,
|
||||
Model: req.Model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
Content: textContentToReturn,
|
||||
ToolCalls: []schema.ToolCall{
|
||||
{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Arguments: args,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
for _, chunk := range buildDeferredToolCallChunks(
|
||||
id, req.Model, created,
|
||||
functionResults, lastEmittedCount,
|
||||
sentInitialRole, *textContentToReturn,
|
||||
sentReasoning, reasoning,
|
||||
) {
|
||||
responses <- chunk
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
233
core/http/endpoints/openai/chat_emit.go
Normal file
@@ -0,0 +1,233 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/functions"
|
||||
)
|
||||
|
||||
// hasRealCall reports whether functionResults contains at least one
|
||||
// entry whose Name is something other than the noAction sentinel.
|
||||
// Used by processTools to decide between the "answer the question"
|
||||
// path and the real tool-call flush.
|
||||
func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
|
||||
for _, fc := range functionResults {
|
||||
if fc.Name != noAction {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
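To see why scanning the whole slice matters, the sketch below compares the earlier index-0 check with the `hasRealCall` scan on a mixed result list. The `call` type and the `"answer"` sentinel value are assumptions made for illustration; the real code uses `functions.FuncCallResults` and takes the sentinel name as a parameter.

```go
package main

import "fmt"

// call is a hypothetical stand-in for functions.FuncCallResults.
type call struct{ Name string }

// noAction is an assumed sentinel name, chosen only for this sketch.
const noAction = "answer"

// firstIsNoAction reproduces the earlier check: only index 0 is inspected.
func firstIsNoAction(calls []call) bool {
	return len(calls) == 0 || calls[0].Name == noAction
}

// hasRealCall reports whether any entry is something other than the sentinel.
func hasRealCall(calls []call) bool {
	for _, c := range calls {
		if c.Name != noAction {
			return true
		}
	}
	return false
}

func main() {
	mixed := []call{{Name: noAction}, {Name: "search"}}
	fmt.Println("index-0 check treats as no action:", firstIsNoAction(mixed))
	fmt.Println("full scan treats as no action:   ", !hasRealCall(mixed))
}
```

With the index-0 check the trailing `search` call would be dropped; the full scan keeps it on the tool-call path.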
|
||||
|
||||
// buildNoActionFinalChunks produces the closing SSE chunks for the
|
||||
// noActionToRun branch of processTools (i.e. the model chose the "answer"
|
||||
// pseudo-function or emitted no tool calls at all).
|
||||
//
|
||||
// When content was already streamed (contentAlreadyStreamed=true) the
|
||||
// helper emits a single trailing usage chunk, optionally carrying
|
||||
// reasoning that was produced but not streamed incrementally. When
|
||||
// content was not streamed it emits a role chunk followed by a
|
||||
// content+reasoning+usage chunk — the "send everything at once" fallback.
|
||||
//
|
||||
// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
|
||||
// probing the extractor's Go-side state: the C++ autoparser delivers
|
||||
// reasoning through ProcessChatDeltaReasoning which populates a
|
||||
// separate accumulator that extractor.Reasoning() does not expose.
|
||||
// Without this guard the callback would stream reasoning incrementally
|
||||
// and the final chunk would duplicate it.
|
||||
func buildNoActionFinalChunks(
|
||||
id, model string,
|
||||
created int,
|
||||
contentAlreadyStreamed bool,
|
||||
reasoningAlreadyStreamed bool,
|
||||
content string,
|
||||
reasoning string,
|
||||
usage schema.OpenAIUsage,
|
||||
) []schema.OpenAIResponse {
|
||||
var out []schema.OpenAIResponse
|
||||
|
||||
if contentAlreadyStreamed {
|
||||
delta := &schema.Message{}
|
||||
if reasoning != "" && !reasoningAlreadyStreamed {
|
||||
r := reasoning
|
||||
delta.Reasoning = &r
|
||||
}
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
})
|
||||
return out
|
||||
}
|
||||
|
||||
// Content was not streamed — send role, then content (+reasoning) + usage.
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{Role: "assistant"},
|
||||
Index: 0,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
|
||||
c := content
|
||||
delta := &schema.Message{Content: &c}
|
||||
if reasoning != "" && !reasoningAlreadyStreamed {
|
||||
r := reasoning
|
||||
delta.Reasoning = &r
|
||||
}
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
})
|
||||
return out
|
||||
}
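The branching in `buildNoActionFinalChunks` can be summarised by which fields each closing chunk carries. The sketch below returns plain labels instead of `schema.OpenAIResponse` values; `planNoActionChunks` and the label strings are hypothetical and exist only to make the two paths easy to compare.

```go
package main

import "fmt"

// planNoActionChunks lists, in order, what each closing chunk would carry.
// It models only the branching, not the real response structs.
func planNoActionChunks(contentStreamed, reasoningStreamed bool, reasoning string) []string {
	carriesReasoning := reasoning != "" && !reasoningStreamed
	if contentStreamed {
		// Content already went out through the callback: one trailing chunk.
		if carriesReasoning {
			return []string{"reasoning+usage"}
		}
		return []string{"usage"}
	}
	// Fallback: role first, then everything else at once.
	last := "content+usage"
	if carriesReasoning {
		last = "content+reasoning+usage"
	}
	return []string{"role", last}
}

func main() {
	fmt.Println(planNoActionChunks(true, true, "thinking"))
	fmt.Println(planNoActionChunks(true, false, "thinking"))
	fmt.Println(planNoActionChunks(false, false, "thinking"))
	fmt.Println(planNoActionChunks(false, true, ""))
}
```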
|
||||
|
||||
// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
|
||||
// were discovered only during final parsing (i.e. after the streaming
|
||||
// callback finished). The caller forwards every returned chunk to the
|
||||
// responses channel.
|
||||
//
|
||||
// Guarantees:
|
||||
// - tool calls with i < lastEmittedCount are skipped (already streamed)
|
||||
// - each emitted call yields two chunks: name-only, then args-only
|
||||
// - no chunk ever carries both non-empty Content and non-empty ToolCalls
|
||||
// - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
|
||||
// - if !reasoningAlreadyStreamed && reasoningContent != "",
|
||||
// a reasoning chunk is emitted first
|
||||
// - if !contentAlreadyStreamed && textContent != "",
|
||||
// a role chunk followed by a content chunk is emitted (after reasoning)
|
||||
// - chunks order: [reasoning?] [role+content?] (name, args)+
|
||||
// - fallback IDs for empty ss.ID are unique per index so a client can
|
||||
// match tool_result messages back to the right call
|
||||
func buildDeferredToolCallChunks(
|
||||
id, model string,
|
||||
created int,
|
||||
functionResults []functions.FuncCallResults,
|
||||
lastEmittedCount int,
|
||||
contentAlreadyStreamed bool,
|
||||
textContent string,
|
||||
reasoningAlreadyStreamed bool,
|
||||
reasoningContent string,
|
||||
) []schema.OpenAIResponse {
|
||||
// If every call was already emitted incrementally there's nothing to
|
||||
// flush — and no reason to emit a standalone reasoning/content chunk.
|
||||
hasDeferred := false
|
||||
for i := range functionResults {
|
||||
if i >= lastEmittedCount {
|
||||
hasDeferred = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !hasDeferred {
|
||||
return nil
|
||||
}
|
||||
|
||||
var out []schema.OpenAIResponse
|
||||
|
||||
// Reasoning first — the callback path at processTools emits reasoning
|
||||
// incrementally in its own chunks, but when the C++ autoparser only
|
||||
// surfaces reasoning as a final aggregate the callback never sees it.
|
||||
// Recover it here (no duplication: contentAlreadyStreamed and
|
||||
// reasoningAlreadyStreamed track what the callback already sent).
|
||||
if !reasoningAlreadyStreamed && reasoningContent != "" {
|
||||
r := reasoningContent
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{Reasoning: &r},
|
||||
Index: 0,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
}
|
||||
|
||||
// Then content, when it wasn't streamed via the callback. Emit role
|
||||
// and content in separate deltas — the OpenAI streaming contract
|
||||
// forbids bundling content alongside tool_calls in one delta.
|
||||
if !contentAlreadyStreamed && textContent != "" {
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{Role: "assistant"},
|
||||
Index: 0,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
c := textContent
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{Content: &c},
|
||||
Index: 0,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
}
|
||||
|
||||
for i, ss := range functionResults {
|
||||
if i < lastEmittedCount {
|
||||
// Already streamed by the incremental JSON/XML parser during
|
||||
// the token callback — skip to avoid a duplicate emission.
|
||||
continue
|
||||
}
|
||||
|
||||
toolCallID := ss.ID
|
||||
if toolCallID == "" {
|
||||
// Unique per-index fallback so multiple empty-ID calls don't
|
||||
// collide on the same request ID (clients match tool results
|
||||
// back by tool_call_id).
|
||||
toolCallID = fmt.Sprintf("%s-%d", id, i)
|
||||
}
|
||||
|
||||
// Name chunk.
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
ToolCalls: []schema.ToolCall{{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Name: ss.Name,
|
||||
},
|
||||
}},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
|
||||
// Args chunk — no Content here. Either it was streamed through
|
||||
// the token callback earlier, or the role+content pair above
|
||||
// already delivered it.
|
||||
out = append(out, schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
ToolCalls: []schema.ToolCall{{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Arguments: ss.Arguments,
|
||||
},
|
||||
}},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
})
|
||||
}
|
||||
|
||||
return out
|
||||
}
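The ordering guarantee listed in the doc comment, `[reasoning?] [role+content?] (name, args)+`, can be checked with a label-only sketch. `planDeferredChunks` is a hypothetical helper that takes the number of parsed calls instead of the real `functions.FuncCallResults` slice and returns strings rather than response chunks.

```go
package main

import "fmt"

// planDeferredChunks returns the ordered labels of the chunks that would be
// flushed for tool calls not yet emitted during streaming.
func planDeferredChunks(
	numCalls, lastEmitted int,
	contentStreamed bool, text string,
	reasoningStreamed bool, reasoning string,
) []string {
	start := lastEmitted
	if start < 0 {
		start = 0
	}
	if start >= numCalls {
		return nil // nothing deferred: no reasoning or content chunk either
	}
	var out []string
	if !reasoningStreamed && reasoning != "" {
		out = append(out, "reasoning")
	}
	if !contentStreamed && text != "" {
		out = append(out, "role", "content")
	}
	for i := start; i < numCalls; i++ {
		// Two chunks per call, both tagged with the call's original index.
		out = append(out, fmt.Sprintf("name[%d]", i), fmt.Sprintf("args[%d]", i))
	}
	return out
}

func main() {
	fmt.Println(planDeferredChunks(1, 0, false, "plan", false, "thinking"))
	fmt.Println(planDeferredChunks(3, 1, true, "narration", true, ""))
	fmt.Println(planDeferredChunks(2, 2, true, "narration", true, ""))
}
```

Keeping content and reasoning out of the name and args chunks is what lets a client treat every `tool_calls` delta as carrying tool data only.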
|
||||
717
core/http/endpoints/openai/chat_emit_test.go
Normal file
@@ -0,0 +1,717 @@
|
||||
package openai
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
|
||||
"github.com/mudler/LocalAI/core/schema"
|
||||
"github.com/mudler/LocalAI/pkg/functions"
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
)
|
||||
|
||||
// contentOf extracts the string payload from a chunk's delta.Content,
|
||||
// transparently handling both *string and string underlying types so
|
||||
// assertions don't have to care which one the helper produced.
|
||||
func contentOf(ch schema.OpenAIResponse) string {
|
||||
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
|
||||
return ""
|
||||
}
|
||||
switch v := ch.Choices[0].Delta.Content.(type) {
|
||||
case *string:
|
||||
if v == nil {
|
||||
return ""
|
||||
}
|
||||
return *v
|
||||
case string:
|
||||
return v
|
||||
default:
|
||||
return ""
|
||||
}
|
||||
}
|
||||
|
||||
// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
|
||||
// *string on schema.Message.
|
||||
func reasoningOf(ch schema.OpenAIResponse) string {
|
||||
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
|
||||
return ""
|
||||
}
|
||||
r := ch.Choices[0].Delta.Reasoning
|
||||
if r == nil {
|
||||
return ""
|
||||
}
|
||||
return *r
|
||||
}
|
||||
|
||||
// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
|
||||
func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
|
||||
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
|
||||
return nil
|
||||
}
|
||||
return ch.Choices[0].Delta.ToolCalls
|
||||
}
|
||||
|
||||
// expectSpecCompliant enforces the invariants on every chunk:
|
||||
// - Object == "chat.completion.chunk"
|
||||
// - Exactly one Choice with Index==0
|
||||
// - No delta ever carries both non-empty Content and non-empty ToolCalls
|
||||
// - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
|
||||
func expectSpecCompliant(chunks []schema.OpenAIResponse) {
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
|
||||
Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
|
||||
Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)
|
||||
|
||||
hasContent := contentOf(ch) != ""
|
||||
hasReasoning := reasoningOf(ch) != ""
|
||||
hasToolCalls := len(toolCallsOf(ch)) > 0
|
||||
|
||||
if hasContent && hasToolCalls {
|
||||
Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
|
||||
}
|
||||
if hasReasoning && hasToolCalls {
|
||||
Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// expectMetadata asserts every chunk carries the same id/model/created.
|
||||
func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
|
||||
Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
|
||||
Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
|
||||
}
|
||||
}
|
||||
|
||||
var _ = Describe("buildDeferredToolCallChunks", func() {
|
||||
const (
|
||||
testID = "req"
|
||||
testModel = "test-model"
|
||||
testCreated = 1700000000
|
||||
)
|
||||
|
||||
Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
|
||||
It("emits only the tool_call chunks, no Content anywhere", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "Let me search…",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2), "two chunks: name, args")
|
||||
|
||||
// Name chunk
|
||||
tc0 := toolCallsOf(chunks[0])
|
||||
Expect(tc0).To(HaveLen(1))
|
||||
Expect(tc0[0].Index).To(Equal(0))
|
||||
Expect(tc0[0].ID).To(Equal("tc1"))
|
||||
Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
|
||||
Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// Args chunk — MUST NOT carry Content
|
||||
tc1 := toolCallsOf(chunks[1])
|
||||
Expect(tc1).To(HaveLen(1))
|
||||
Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
|
||||
Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
|
||||
Expect(contentOf(chunks[1])).To(BeEmpty(),
|
||||
"args chunk must not duplicate already-streamed content")
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case B — autoparser / content not streamed", func() {
|
||||
It("emits role, content, then name+args", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "do", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "Here is my plan…",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(4), "role, content, name, args")
|
||||
|
||||
// Role chunk
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// Content chunk
|
||||
Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
|
||||
Expect(toolCallsOf(chunks[1])).To(BeEmpty())
|
||||
|
||||
// Name + args chunks
|
||||
Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
|
||||
Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
|
||||
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case C — multiple deferred calls, content already streamed", func() {
|
||||
It("emits (name, args) × 3 with no Content anywhere", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tcA"},
|
||||
{Name: "b", Arguments: "{}", ID: "tcB"},
|
||||
{Name: "c", Arguments: "{}", ID: "tcC"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "some narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(6))
|
||||
|
||||
for i := 0; i < 3; i++ {
|
||||
Expect(contentOf(chunks[2*i])).To(BeEmpty(),
|
||||
"call #%d name chunk must not carry Content", i)
|
||||
Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
|
||||
"call #%d args chunk must not carry Content", i)
|
||||
Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
|
||||
Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
|
||||
}
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
|
||||
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case D — partial incremental emission", func() {
|
||||
It("emits only the deferred tail (call #1), skipping #0", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 1,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
|
||||
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case E — all calls already emitted incrementally", func() {
|
||||
It("emits nothing", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 2,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case F — content not streamed but textContent empty", func() {
|
||||
It("emits only the tool call chunks, no leading role/content", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: "tcX"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
|
||||
It("emits a deterministic per-index fallback", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: ""},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
expectedID := fmt.Sprintf("%s-%d", testID, 0)
|
||||
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
|
||||
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
|
||||
It("avoids the collision bug where every empty-ID call shared the request id", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: ""},
|
||||
{Name: "b", Arguments: "{}", ID: ""},
|
||||
{Name: "c", Arguments: "{}", ID: ""},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(6))
|
||||
|
||||
ids := map[string]int{}
|
||||
for _, ch := range chunks {
|
||||
for _, tc := range toolCallsOf(ch) {
|
||||
ids[tc.ID]++
|
||||
}
|
||||
}
|
||||
// Each call yields a name chunk + args chunk → each distinct ID
|
||||
// should appear in exactly two chunks. Three distinct IDs
|
||||
// overall.
|
||||
Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
|
||||
for id, n := range ids {
|
||||
Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case H — indices preserved across skip with multiple calls", func() {
|
||||
It("emits Index fields matching functionResults positions", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
{Name: "c", Arguments: "{}", ID: "tc2"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 1,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(4))
|
||||
|
||||
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
|
||||
Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case I — explicit non-empty ID is preserved", func() {
|
||||
It("does not touch ss.ID when it's already set", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: "abc123"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
|
||||
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case J — chunk-shape sanity", func() {
|
||||
It("splits Name into the first chunk and Arguments into the second", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
|
||||
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case K — metadata propagation", func() {
|
||||
It("stamps every chunk with the same id/model/created", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tcA"},
|
||||
{Name: "b", Arguments: "{}", ID: "tcB"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "hello",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
expectMetadata(chunks, testID, testModel, testCreated)
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case L — Choices[0].Index == 0 invariant", func() {
|
||||
It("is upheld across every branch the helper can take", func() {
|
||||
scenarios := []struct {
|
||||
name string
|
||||
functionResults []functions.FuncCallResults
|
||||
lastEmittedCount int
|
||||
contentStreamed bool
|
||||
text string
|
||||
reasoningStreamed bool
|
||||
reasoning string
|
||||
}{
|
||||
{"streamed-content-deferred-call",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, true, "hi", true, ""},
|
||||
{"unstreamed-content-deferred-call",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, false, "hello", true, ""},
|
||||
{"unstreamed-reasoning-and-content",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, false, "hello", false, "thinking…"},
|
||||
{"partial-incremental",
|
||||
[]functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}"},
|
||||
{Name: "b", Arguments: "{}"}},
|
||||
1, true, "hi", true, ""},
|
||||
}
|
||||
for _, sc := range scenarios {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
sc.functionResults, sc.lastEmittedCount,
|
||||
sc.contentStreamed, sc.text,
|
||||
sc.reasoningStreamed, sc.reasoning,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.Choices[0].Index).To(Equal(0),
|
||||
"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
|
||||
}
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case M — spec compliance across every scenario", func() {
|
||||
It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
|
||||
scenarios := []struct {
|
||||
name string
|
||||
functionResults []functions.FuncCallResults
|
||||
lastEmittedCount int
|
||||
contentStreamed bool
|
||||
text string
|
||||
reasoningStreamed bool
|
||||
reasoning string
|
||||
}{
|
||||
{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, true, "already-streamed", true, ""},
|
||||
{"C", []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"}},
|
||||
0, true, "already-streamed", true, ""},
|
||||
{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, false, "plan", true, ""},
|
||||
{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, false, "plan", false, "thinking…"},
|
||||
}
|
||||
for _, sc := range scenarios {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
sc.functionResults, sc.lastEmittedCount,
|
||||
sc.contentStreamed, sc.text,
|
||||
sc.reasoningStreamed, sc.reasoning,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
hasContent := contentOf(ch) != ""
|
||||
hasReasoning := reasoningOf(ch) != ""
|
||||
hasToolCalls := len(toolCallsOf(ch)) > 0
|
||||
Expect(hasContent && hasToolCalls).To(BeFalse(),
|
||||
"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
|
||||
Expect(hasReasoning && hasToolCalls).To(BeFalse(),
|
||||
"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
|
||||
}
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case N — empty functionResults", func() {
|
||||
It("emits nothing, including no leading role/content/reasoning", func() {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
nil, 0,
|
||||
false, "ignored",
|
||||
false, "ignored",
|
||||
)
|
||||
Expect(chunks).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case O — content not streamed but all calls already emitted", func() {
|
||||
It("emits nothing, not even a standalone content chunk", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 2,
|
||||
false, "narration",
|
||||
false, "thinking…",
|
||||
)
|
||||
Expect(chunks).To(BeEmpty(),
|
||||
"no tool_calls to trigger on, so no leading role/content/reasoning either")
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Reasoning — autoparser delivered reasoning only at end", func() {
|
||||
It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "streamed content",
|
||||
false, "model's private thoughts",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(3), "reasoning, name, args")
|
||||
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// The following two are the tool_call name + args chunks.
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
|
||||
It("emits reasoning before role+content when neither was streamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "final plan",
|
||||
false, "private thoughts",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
|
||||
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
|
||||
Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[2])).To(Equal("final plan"))
|
||||
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
|
||||
It("does not re-emit reasoning that was already streamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "streamed",
|
||||
true, "already-sent reasoning",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
|
||||
for _, ch := range chunks {
|
||||
Expect(reasoningOf(ch)).To(BeEmpty())
|
||||
}
|
||||
})
|
||||
})
|
||||
})
|
||||
|
||||
var _ = Describe("hasRealCall", func() {
|
||||
const noAction = "answer"
|
||||
|
||||
It("returns false for nil and empty slices", func() {
|
||||
Expect(hasRealCall(nil, noAction)).To(BeFalse())
|
||||
Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
|
||||
})
|
||||
|
||||
It("returns false when every entry is the noAction sentinel", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: noAction, Arguments: `{"message":"hi"}`},
|
||||
{Name: noAction, Arguments: `{"message":"hello"}`},
|
||||
}
|
||||
Expect(hasRealCall(results, noAction)).To(BeFalse())
|
||||
})
|
||||
|
||||
It("returns true when only one entry is a real call", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "search", Arguments: "{}"},
|
||||
}
|
||||
Expect(hasRealCall(results, noAction)).To(BeTrue())
|
||||
})
|
||||
|
||||
It("returns true when a real call follows a noAction entry", func() {
|
||||
// This is the regression the follow-up fixes: the old
|
||||
// functionResults[0].Name == noAction check would declare this
|
||||
// noActionToRun and drop the real call entirely.
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: noAction, Arguments: `{"message":"hi"}`},
|
||||
{Name: "search", Arguments: "{}"},
|
||||
}
|
||||
Expect(hasRealCall(results, noAction)).To(BeTrue())
|
||||
})
|
||||
|
||||
It("returns true when a real call precedes a noAction entry", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "search", Arguments: "{}"},
|
||||
{Name: noAction, Arguments: `{"message":"hi"}`},
|
||||
}
|
||||
Expect(hasRealCall(results, noAction)).To(BeTrue())
|
||||
})
|
||||
})
|
||||
|
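The specs above pin down what `hasRealCall` must return, but the helper itself is not part of this hunk. A minimal sketch consistent with those specs follows, assuming the helper simply scans for any entry whose name differs from the sentinel. The type `funcCall` and the function name `hasRealCallSketch` are hypothetical stand-ins for illustration, not the project's actual code.

```go
package main

import "fmt"

// funcCall is a hypothetical stand-in for functions.FuncCallResults,
// reduced to the two fields the specs above populate.
type funcCall struct {
	Name      string
	Arguments string
}

// hasRealCallSketch reports whether at least one entry names something
// other than the noAction sentinel. Checking every entry, rather than
// only results[0], is what keeps a real call that follows a noAction
// entry from being dropped.
func hasRealCallSketch(results []funcCall, noAction string) bool {
	for _, r := range results {
		if r.Name != noAction {
			return true
		}
	}
	return false
}

func main() {
	mixed := []funcCall{
		{Name: "answer", Arguments: `{"message":"hi"}`},
		{Name: "search", Arguments: "{}"},
	}
	fmt.Println(hasRealCallSketch(mixed, "answer"))
	fmt.Println(hasRealCallSketch(nil, "answer"))
}
```

An index-0 check would classify `mixed` as "no action to run"; the loop form treats position as irrelevant, which matches both ordering specs.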
||||
var _ = Describe("buildNoActionFinalChunks", func() {
|
||||
const (
|
||||
testID = "req"
|
||||
testModel = "test-model"
|
||||
testCreated = 1700000000
|
||||
)
|
||||
usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
|
||||
|
||||
Describe("Content streamed — trailing usage chunk", func() {
|
||||
It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, true,
|
||||
"", "already-streamed-reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(reasoningOf(chunks[0])).To(BeEmpty(),
|
||||
"reasoning must not be re-emitted once it was streamed via the callback")
|
||||
})
|
||||
|
||||
It("emits a trailing reasoning delivery when reasoning came only at end", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, false,
|
||||
"", "autoparser final reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("omits reasoning when it's empty regardless of streamed flag", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, false,
|
||||
"", "", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(reasoningOf(chunks[0])).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Content not streamed — role, then content+usage", func() {
|
||||
It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, true,
|
||||
"the answer", "already-streamed-reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
Expect(contentOf(chunks[1])).To(Equal("the answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(BeEmpty(),
|
||||
"reasoning must not be re-emitted if it was streamed earlier")
|
||||
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("emits role, then content+reasoning when reasoning was not streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"the answer", "autoparser final reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
|
||||
Expect(contentOf(chunks[1])).To(Equal("the answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
|
||||
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("still emits content even when reasoning is empty", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"just an answer", "", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(contentOf(chunks[1])).To(Equal("just an answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Metadata and shape invariants", func() {
|
||||
It("stamps every chunk with the same id/model/created and object", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"hi", "reasoning", usage,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
|
||||
Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
|
||||
Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
|
||||
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
|
||||
Expect(ch.Choices).To(HaveLen(1))
|
||||
Expect(ch.Choices[0].Index).To(Equal(0))
|
||||
}
|
||||
})
|
||||
})
|
||||
})
|
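The `buildNoActionFinalChunks` specs describe two shapes: a single trailing chunk when content was already streamed, and a role chunk followed by a content chunk when it was not. In both shapes, reasoning is attached only if it was never streamed. The sketch below reproduces that decision table under simplifying assumptions: `chunk` is a hypothetical flattened type (the real code uses the schema response types), and placing usage on the last chunk only is inferred from the assertions, not confirmed by the helper's source.

```go
package main

import "fmt"

// chunk is a hypothetical, flattened stand-in for one streamed chunk.
type chunk struct {
	Role        string
	Content     string
	Reasoning   string
	TotalTokens int
}

// noActionFinalChunksSketch mirrors the behaviour asserted by the specs:
// reasoning is delivered here only when it was not streamed earlier, and
// the role and content chunks appear only when content was not streamed.
func noActionFinalChunksSketch(contentStreamed, reasoningStreamed bool, content, reasoning string, totalTokens int) []chunk {
	pending := ""
	if !reasoningStreamed {
		pending = reasoning // may still be empty, in which case nothing is sent
	}
	if contentStreamed {
		// Only the trailing usage chunk is left to send.
		return []chunk{{Reasoning: pending, TotalTokens: totalTokens}}
	}
	return []chunk{
		{Role: "assistant"},
		{Content: content, Reasoning: pending, TotalTokens: totalTokens},
	}
}

func main() {
	for _, c := range noActionFinalChunksSketch(false, false, "the answer", "final reasoning", 12) {
		fmt.Printf("%+v\n", c)
	}
}
```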
||||
@@ -3,6 +3,7 @@ package middleware
|
||||
import (
|
||||
"bytes"
|
||||
"io"
|
||||
"mime"
|
||||
"net/http"
|
||||
"slices"
|
||||
"sync"
|
||||
@@ -94,7 +95,8 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
|
||||
|
||||
initializeTracing(app.ApplicationConfig().TracingMaxItems)
|
||||
|
||||
- if c.Request().Header.Get("Content-Type") != "application/json" {
+ ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
+ if ct != "application/json" {
|
||||
return next(c)
|
||||
}
|
||||
|
||||
|
||||
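The middleware change above swaps an exact string comparison on the `Content-Type` header for `mime.ParseMediaType`, so that values carrying parameters (for example a charset) are still recognised as JSON. A small self-contained sketch of the difference follows; the helper name `isJSONContentType` is introduced here for illustration only and does not appear in the diff.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONContentType reports whether a Content-Type header value denotes
// application/json. ParseMediaType strips parameters and lowercases the
// media type; on a parse failure the returned type is empty, so the
// comparison below simply yields false.
func isJSONContentType(header string) bool {
	mediaType, _, _ := mime.ParseMediaType(header)
	return mediaType == "application/json"
}

func main() {
	for _, h := range []string{
		"application/json",
		"application/json; charset=utf-8",
		"text/plain",
		"",
	} {
		exact := h == "application/json"
		fmt.Printf("%q exact=%v parsed=%v\n", h, exact, isJSONContentType(h))
	}
}
```

Only the second header distinguishes the two approaches: the exact comparison rejects it, while the parsed comparison accepts it.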
1
core/http/react-ui/src/utils/capabilities.js
vendored
@@ -12,3 +12,4 @@ export const CAP_TOKENIZE = 'FLAG_TOKENIZE'
|
||||
export const CAP_VAD = 'FLAG_VAD'
|
||||
export const CAP_VIDEO = 'FLAG_VIDEO'
|
||||
export const CAP_DETECTION = 'FLAG_DETECTION'
|
||||
export const CAP_FACE_RECOGNITION = 'FLAG_FACE_RECOGNITION'
|
||||
|
||||
@@ -97,6 +97,28 @@ func RegisterLocalAIRoutes(router *echo.Echo,
|
||||
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
|
||||
requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }))
|
||||
|
||||
// Face recognition endpoints
|
||||
faceMw := []echo.MiddlewareFunc{
|
||||
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_FACE_RECOGNITION)),
|
||||
}
|
||||
router.POST("/v1/face/verify",
|
||||
localai.FaceVerifyEndpoint(cl, ml, appConfig),
|
||||
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceVerifyRequest) }))...)
|
||||
router.POST("/v1/face/analyze",
|
||||
localai.FaceAnalyzeEndpoint(cl, ml, appConfig),
|
||||
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceAnalyzeRequest) }))...)
|
||||
router.POST("/v1/face/embed",
|
||||
localai.FaceEmbedEndpoint(cl, ml, appConfig),
|
||||
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceEmbedRequest) }))...)
|
||||
router.POST("/v1/face/register",
|
||||
localai.FaceRegisterEndpoint(cl, ml, appConfig, app.FaceRegistry()),
|
||||
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceRegisterRequest) }))...)
|
||||
router.POST("/v1/face/identify",
|
||||
localai.FaceIdentifyEndpoint(cl, ml, appConfig, app.FaceRegistry()),
|
||||
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceIdentifyRequest) }))...)
|
||||
// Forget does not load a face model — it only needs the registry.
|
||||
router.POST("/v1/face/forget", localai.FaceForgetEndpoint(app.FaceRegistry()))
|
||||
|
||||
ttsHandler := localai.TTSEndpoint(cl, ml, appConfig)
|
||||
router.POST("/tts",
|
||||
ttsHandler,
|
||||
|
||||
@@ -23,7 +23,6 @@ import (
|
||||
"github.com/mudler/LocalAI/core/gallery"
|
||||
"github.com/mudler/LocalAI/core/http/auth"
|
||||
"github.com/mudler/LocalAI/core/http/endpoints/localai"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/p2p"
|
||||
"github.com/mudler/LocalAI/core/services/galleryop"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
@@ -1458,24 +1457,5 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
|
||||
app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
|
||||
}
|
||||
|
||||
-	// Logs API (admin only)
-	app.GET("/api/traces", func(c echo.Context) error {
-		if !appConfig.EnableTracing {
-			return c.JSON(503, map[string]any{
-				"error": "Tracing disabled",
-			})
-		}
-		traces := middleware.GetTraces()
-		return c.JSON(200, map[string]any{
-			"traces": traces,
-		})
-	}, adminMiddleware)
-
-	app.POST("/api/traces/clear", func(c echo.Context) error {
-		middleware.ClearTraces()
-		return c.JSON(200, map[string]any{
-			"message": "Traces cleared",
-		})
-	}, adminMiddleware)
|
||||
}
|
||||
|
||||
|
||||
@@ -173,6 +173,123 @@ type Detection struct {
|
||||
Mask string `json:"mask,omitempty"` // base64-encoded PNG segmentation mask
|
||||
}
|
||||
|
||||
// ─── Face recognition ──────────────────────────────────────────────
|
||||
//
|
||||
// FacialArea describes a bounding box for a detected face.
|
||||
type FacialArea struct {
|
||||
X float32 `json:"x"`
|
||||
Y float32 `json:"y"`
|
||||
W float32 `json:"w"`
|
||||
H float32 `json:"h"`
|
||||
}
|
||||
|
||||
// FaceVerifyRequest compares two images to decide whether they depict
|
||||
// the same person. Img1 and Img2 accept URL, base64, or data-URI.
|
||||
type FaceVerifyRequest struct {
|
||||
BasicModelRequest
|
||||
Img1 string `json:"img1"`
|
||||
Img2 string `json:"img2"`
|
||||
Threshold float32 `json:"threshold,omitempty"`
|
||||
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
|
||||
}
|
||||
|
||||
type FaceVerifyResponse struct {
|
||||
Verified bool `json:"verified"`
|
||||
Distance float32 `json:"distance"`
|
||||
Threshold float32 `json:"threshold"`
|
||||
Confidence float32 `json:"confidence"`
|
||||
Model string `json:"model"`
|
||||
Img1Area FacialArea `json:"img1_area"`
|
||||
Img2Area FacialArea `json:"img2_area"`
|
||||
ProcessingTimeMs float32 `json:"processing_time_ms,omitempty"`
|
||||
}
|
||||
|
||||
// FaceAnalyzeRequest asks the backend for demographic attributes on
|
||||
// every face detected in Img.
|
||||
type FaceAnalyzeRequest struct {
|
||||
BasicModelRequest
|
||||
Img string `json:"img"`
|
||||
Actions []string `json:"actions,omitempty"` // subset of {"age","gender","emotion","race"}
|
||||
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
|
||||
}
|
||||
|
||||
type FaceAnalyzeResponse struct {
|
||||
Faces []FaceAnalysis `json:"faces"`
|
||||
}
|
||||
|
||||
type FaceAnalysis struct {
|
||||
Region FacialArea `json:"region"`
|
||||
FaceConfidence float32 `json:"face_confidence"`
|
||||
Age float32 `json:"age,omitempty"`
|
||||
DominantGender string `json:"dominant_gender,omitempty"`
|
||||
Gender map[string]float32 `json:"gender,omitempty"`
|
||||
DominantEmotion string `json:"dominant_emotion,omitempty"`
|
||||
Emotion map[string]float32 `json:"emotion,omitempty"`
|
||||
DominantRace string `json:"dominant_race,omitempty"`
|
||||
Race map[string]float32 `json:"race,omitempty"`
|
||||
IsReal bool `json:"is_real,omitempty"`
|
||||
AntispoofScore float32 `json:"antispoof_score,omitempty"`
|
||||
}
|
||||
|
||||
// FaceEmbedRequest extracts a face embedding from an image. Distinct
|
||||
// from /v1/embeddings (which is OpenAI-compatible and text-only); this
|
||||
// endpoint accepts URL / base64 / data-URI image inputs.
|
||||
type FaceEmbedRequest struct {
|
||||
BasicModelRequest
|
||||
Img string `json:"img"`
|
||||
}
|
||||
|
||||
type FaceEmbedResponse struct {
|
||||
Embedding []float32 `json:"embedding"`
|
||||
Dim int `json:"dim"`
|
||||
Model string `json:"model,omitempty"`
|
||||
}
|
||||
|
||||
// FaceRegisterRequest enrolls a face into the 1:N recognition store.
|
||||
type FaceRegisterRequest struct {
|
||||
BasicModelRequest
|
||||
Img string `json:"img"`
|
||||
Name string `json:"name"`
|
||||
Labels map[string]string `json:"labels,omitempty"`
|
||||
Store string `json:"store,omitempty"` // vector store model; empty = local-store default
|
||||
}
|
||||
|
||||
type FaceRegisterResponse struct {
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
RegisteredAt time.Time `json:"registered_at"`
|
||||
}
|
||||
|
||||
// FaceIdentifyRequest runs 1:N recognition: embed the probe and
|
||||
// return the top-K nearest registered faces.
|
||||
type FaceIdentifyRequest struct {
|
||||
BasicModelRequest
|
||||
Img string `json:"img"`
|
||||
TopK int `json:"top_k,omitempty"`
|
||||
Threshold float32 `json:"threshold,omitempty"` // optional cutoff on distance
|
||||
Store string `json:"store,omitempty"`
|
||||
}
|
||||
|
||||
type FaceIdentifyResponse struct {
|
||||
Matches []FaceIdentifyMatch `json:"matches"`
|
||||
}
|
||||
|
||||
type FaceIdentifyMatch struct {
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
Labels map[string]string `json:"labels,omitempty"`
|
||||
Distance float32 `json:"distance"`
|
||||
Confidence float32 `json:"confidence"`
|
||||
Match bool `json:"match"` // true when distance <= threshold
|
||||
}
|
||||
|
||||
// FaceForgetRequest removes a previously-registered face by ID.
|
||||
type FaceForgetRequest struct {
|
||||
BasicModelRequest
|
||||
ID string `json:"id"`
|
||||
Store string `json:"store,omitempty"`
|
||||
}
|
||||
|
||||
type ImportModelRequest struct {
|
||||
URI string `json:"uri"`
|
||||
Preferences json.RawMessage `json:"preferences,omitempty"`
|
||||
|
||||
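To make the wire format of the new request types concrete, here is a sketch that encodes a body for `POST /v1/face/verify`. The struct `verifyRequest` is a local mirror of the JSON tags on `FaceVerifyRequest`; the `model` field is an assumption about what the embedded `BasicModelRequest` contributes, and the model name and image URLs are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// verifyRequest mirrors the JSON tags of schema.FaceVerifyRequest.
// The Model field is assumed to come from BasicModelRequest.
type verifyRequest struct {
	Model        string  `json:"model,omitempty"`
	Img1         string  `json:"img1"`
	Img2         string  `json:"img2"`
	Threshold    float32 `json:"threshold,omitempty"`
	AntiSpoofing bool    `json:"anti_spoofing,omitempty"`
}

// encodeVerify marshals the request into the JSON body a client would send.
func encodeVerify(r verifyRequest) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	body, err := encodeVerify(verifyRequest{
		Model: "face-model", // placeholder model name
		Img1:  "https://example.com/a.jpg",
		Img2:  "https://example.com/b.jpg",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because `threshold` and `anti_spoofing` are tagged `omitempty`, zero values never reach the wire, so a receiver cannot tell an explicit `0` threshold from an omitted one.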
60
core/services/facerecognition/registry.go
Normal file
@@ -0,0 +1,60 @@
|
||||
// Package facerecognition provides a swappable backing store for face
|
||||
// embeddings and the 1:N identification pipeline that sits on top of it.
|
||||
//
|
||||
// The current implementation (NewStoreRegistry) is backed by LocalAI's
|
||||
// in-memory local-store gRPC backend. This is in-memory only — all
|
||||
// registrations are lost when LocalAI restarts.
|
||||
//
|
||||
// TODO: add a persistent PostgreSQL/pgvector-backed implementation for
|
||||
// production deployments. The Registry interface is explicitly designed
|
||||
// so the swap is a constructor change in core/application, with zero
|
||||
// HTTP-handler changes.
|
||||
package facerecognition
|
||||
|
||||
import (
|
||||
"context"
|
||||
"errors"
|
||||
"time"
|
||||
)
|
||||
|
||||
// Registry stores face embeddings keyed by an opaque ID and supports
|
||||
// approximate similarity search. Implementations are expected to be
|
||||
// safe for concurrent use.
|
||||
type Registry interface {
|
||||
// Register stores a face embedding alongside its metadata.
|
||||
// Returns the stored metadata with ID and RegisteredAt populated.
|
||||
// The embedding length must match the registry's expected dimension.
|
||||
Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error)
|
||||
|
||||
// Identify returns up to topK matches for the probe embedding,
|
||||
// sorted by ascending distance (closest first).
|
||||
Identify(ctx context.Context, probe []float32, topK int) ([]Match, error)
|
||||
|
||||
// Forget removes a previously-registered embedding by ID.
|
||||
// Returns ErrNotFound if the ID is unknown.
|
||||
Forget(ctx context.Context, id string) error
|
||||
}
|
||||
|
||||
// Metadata is the user-supplied payload stored alongside a face embedding.
|
||||
type Metadata struct {
|
||||
// ID is populated by the registry at Register time and should not be
|
||||
// set by the caller. It is echoed back in Match.Metadata.
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
Labels map[string]string `json:"labels,omitempty"`
|
||||
RegisteredAt time.Time `json:"registered_at"`
|
||||
}
|
||||
|
||||
// Match is a single result from Identify, ranked by similarity.
|
||||
type Match struct {
|
||||
ID string
|
||||
Metadata Metadata
|
||||
Distance float32 // 1 - cosine_similarity; lower = closer
|
||||
}
|
||||
|
||||
// Sentinel errors; callers should compare with errors.Is.
|
||||
var (
|
||||
ErrNotFound = errors.New("facerecognition: id not found")
|
||||
ErrEmptyEmbedding = errors.New("facerecognition: embedding is empty")
|
||||
ErrDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
|
||||
)
|
||||
253
core/services/facerecognition/registry_test.go
Normal file
@@ -0,0 +1,253 @@
|
||||
package facerecognition_test
|
||||
|
||||
import (
|
||||
"context"
|
||||
"errors"
|
||||
"math"
|
||||
"sync"
|
||||
"testing"
|
||||
|
||||
"github.com/mudler/LocalAI/core/services/facerecognition"
|
||||
"github.com/mudler/LocalAI/pkg/grpc"
|
||||
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
|
||||
|
||||
grpclib "google.golang.org/grpc"
|
||||
)
|
||||
|
||||
const dim = 4 // tiny test-friendly embedding dimension
|
||||
|
||||
func TestRegisterIdentifyForget(t *testing.T) {
|
||||
t.Parallel()
|
||||
|
||||
reg, fake := newTestRegistry(t)
|
||||
ctx := t.Context()
|
||||
|
||||
alice := []float32{1, 0, 0, 0}
|
||||
bob := []float32{0, 1, 0, 0}
|
||||
|
||||
aliceMeta, err := reg.Register(ctx, alice, facerecognition.Metadata{Name: "Alice"})
|
||||
if err != nil {
|
||||
t.Fatalf("Register Alice: %v", err)
|
||||
}
|
||||
if aliceMeta.ID == "" {
|
||||
t.Fatalf("Register returned empty ID")
|
||||
}
|
||||
if aliceMeta.RegisteredAt.IsZero() {
|
||||
t.Fatalf("Register did not populate RegisteredAt")
|
||||
}
|
||||
|
||||
bobMeta, err := reg.Register(ctx, bob, facerecognition.Metadata{Name: "Bob"})
|
||||
if err != nil {
|
||||
t.Fatalf("Register Bob: %v", err)
|
||||
}
|
||||
if bobMeta.ID == aliceMeta.ID {
|
||||
t.Fatalf("IDs should be distinct, got %q twice", bobMeta.ID)
|
||||
}
|
||||
aliceID := aliceMeta.ID
|
||||
if got, want := fake.len(), 2; got != want {
|
||||
t.Fatalf("fake store has %d entries, want %d", got, want)
|
||||
}
|
||||
|
||||
// Identify an Alice-like probe — she should win.
|
||||
matches, err := reg.Identify(ctx, []float32{0.99, 0.01, 0, 0}, 2)
|
||||
if err != nil {
|
||||
t.Fatalf("Identify: %v", err)
|
||||
}
|
||||
if len(matches) == 0 {
|
||||
t.Fatalf("no matches returned")
|
||||
}
|
||||
if matches[0].Metadata.Name != "Alice" {
|
||||
t.Fatalf("top match name = %q, want Alice", matches[0].Metadata.Name)
|
||||
}
|
||||
if matches[0].ID != aliceID {
|
||||
t.Fatalf("top match ID = %q, want %q", matches[0].ID, aliceID)
|
||||
}
|
||||
// Sorted ascending by distance.
|
||||
for i := 1; i < len(matches); i++ {
|
||||
if matches[i].Distance < matches[i-1].Distance {
|
||||
t.Fatalf("matches not sorted by distance: %v", matches)
|
||||
}
|
||||
}
|
||||
|
||||
// Forget Alice → she's gone, Bob remains.
|
||||
if err := reg.Forget(ctx, aliceID); err != nil {
|
||||
t.Fatalf("Forget Alice: %v", err)
|
||||
}
|
||||
if got, want := fake.len(), 1; got != want {
|
||||
t.Fatalf("after Forget, store has %d entries, want %d", got, want)
|
||||
}
|
||||
|
||||
// Forget unknown ID → ErrNotFound (checkable via errors.Is).
|
||||
if err := reg.Forget(ctx, "nonexistent"); !errors.Is(err, facerecognition.ErrNotFound) {
|
||||
t.Fatalf("Forget unknown: err = %v, want ErrNotFound", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestRegisterRejectsBadEmbedding(t *testing.T) {
|
||||
t.Parallel()
|
||||
|
||||
reg, _ := newTestRegistry(t)
|
||||
ctx := t.Context()
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
embed []float32
|
||||
wantErr error
|
||||
}{
|
||||
{"empty", []float32{}, facerecognition.ErrEmptyEmbedding},
|
||||
{"wrong_dim", []float32{1, 2}, facerecognition.ErrDimensionMismatch},
|
||||
}
|
||||
for _, tc := range tests {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
t.Parallel()
|
||||
_, err := reg.Register(ctx, tc.embed, facerecognition.Metadata{Name: "x"})
|
||||
if !errors.Is(err, tc.wantErr) {
|
||||
t.Fatalf("err = %v, want wrapping %v", err, tc.wantErr)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestConcurrent(t *testing.T) {
|
||||
t.Parallel()
|
||||
|
||||
reg, _ := newTestRegistry(t)
|
||||
ctx := t.Context()
|
||||
|
||||
done := make(chan struct{})
|
||||
for i := range 32 {
|
||||
go func(i int) {
|
||||
embed := []float32{float32(i % 4), float32((i + 1) % 4), 0, 1}
|
||||
meta, err := reg.Register(ctx, embed, facerecognition.Metadata{Name: "n"})
|
||||
if err == nil {
|
||||
_, _ = reg.Identify(ctx, embed, 3)
|
||||
_ = reg.Forget(ctx, meta.ID)
|
||||
}
|
||||
done <- struct{}{}
|
||||
}(i)
|
||||
}
|
||||
for range 32 {
|
||||
<-done
|
||||
}
|
||||
}
|
||||
|
||||
// ─── fake gRPC backend ───────────────────────────────────────────────
|
||||
|
||||
func newTestRegistry(t *testing.T) (facerecognition.Registry, *fakeBackend) {
|
||||
t.Helper()
|
||||
fake := &fakeBackend{}
|
||||
resolver := func(_ context.Context, _ string) (grpc.Backend, error) {
|
||||
return fake, nil
|
||||
}
|
||||
return facerecognition.NewStoreRegistry(resolver, "test-store", dim), fake
|
||||
}
|
||||
|
||||
// fakeBackend implements just enough of grpc.Backend for the store
|
||||
// helpers. All other methods panic so any accidental dependency is
|
||||
// visible in tests.
|
||||
type fakeBackend struct {
|
||||
grpc.Backend // embedded interface left nil: any method not overridden below panics when called
|
||||
|
||||
mu sync.Mutex
|
||||
keys [][]float32
|
||||
vals [][]byte
|
||||
}
|
||||
|
||||
func (f *fakeBackend) len() int {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
return len(f.keys)
|
||||
}
|
||||
|
||||
func (f *fakeBackend) StoresSet(_ context.Context, in *pb.StoresSetOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
for i, k := range in.Keys {
|
||||
f.keys = append(f.keys, append([]float32(nil), k.Floats...))
|
||||
f.vals = append(f.vals, append([]byte(nil), in.Values[i].Bytes...))
|
||||
}
|
||||
return &pb.Result{Success: true}, nil
|
||||
}
|
||||
|
||||
func (f *fakeBackend) StoresDelete(_ context.Context, in *pb.StoresDeleteOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
for _, k := range in.Keys {
|
||||
idx := f.findKey(k.Floats)
|
||||
if idx < 0 {
|
||||
continue
|
||||
}
|
||||
f.keys = append(f.keys[:idx], f.keys[idx+1:]...)
|
||||
f.vals = append(f.vals[:idx], f.vals[idx+1:]...)
|
||||
}
|
||||
return &pb.Result{Success: true}, nil
|
||||
}
|
||||
|
||||
func (f *fakeBackend) StoresFind(_ context.Context, in *pb.StoresFindOptions, _ ...grpclib.CallOption) (*pb.StoresFindResult, error) {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
|
||||
type scored struct {
|
||||
key []float32
|
||||
val []byte
|
||||
sim float32
|
||||
}
|
||||
results := make([]scored, 0, len(f.keys))
|
||||
for i, k := range f.keys {
|
||||
results = append(results, scored{k, f.vals[i], cosine(k, in.Key.Floats)})
|
||||
}
|
||||
// Sort descending by similarity.
|
||||
for i := 0; i < len(results); i++ {
|
||||
for j := i + 1; j < len(results); j++ {
|
||||
if results[j].sim > results[i].sim {
|
||||
results[i], results[j] = results[j], results[i]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
top := int(in.TopK)
|
||||
if top <= 0 || top > len(results) {
|
||||
top = len(results)
|
||||
}
|
||||
out := &pb.StoresFindResult{}
|
||||
for _, r := range results[:top] {
|
||||
out.Keys = append(out.Keys, &pb.StoresKey{Floats: r.key})
|
||||
out.Values = append(out.Values, &pb.StoresValue{Bytes: r.val})
|
||||
out.Similarities = append(out.Similarities, r.sim)
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
func (f *fakeBackend) findKey(target []float32) int {
|
||||
for i, k := range f.keys {
|
||||
if equalFloats(k, target) {
|
||||
return i
|
||||
}
|
||||
}
|
||||
return -1
|
||||
}
|
||||
|
||||
func equalFloats(a, b []float32) bool {
|
||||
if len(a) != len(b) {
|
||||
return false
|
||||
}
|
||||
for i := range a {
|
||||
if a[i] != b[i] {
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
func cosine(a, b []float32) float32 {
|
||||
var dot, na, nb float64
|
||||
for i := range a {
|
||||
dot += float64(a[i]) * float64(b[i])
|
||||
na += float64(a[i]) * float64(a[i])
|
||||
nb += float64(b[i]) * float64(b[i])
|
||||
}
|
||||
if na == 0 || nb == 0 {
|
||||
return 0
|
||||
}
|
||||
return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
|
||||
}
|
||||
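`Match.Distance` is documented as `1 - cosine_similarity`, while the fake backend above returns similarities. The following sketch shows the conversion on the vectors the test uses. The function names are chosen for this example, and the length guard is an addition that the test helper `cosine` does not have.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 for mismatched lengths or zero-norm inputs.
func cosineSimilarity(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// cosineDistance converts similarity to the distance convention used by
// Match.Distance: lower means closer.
func cosineDistance(a, b []float32) float32 {
	return 1 - cosineSimilarity(a, b)
}

func main() {
	alice := []float32{1, 0, 0, 0}
	bob := []float32{0, 1, 0, 0}
	probe := []float32{0.99, 0.01, 0, 0}
	fmt.Printf("alice: %.5f\n", cosineDistance(probe, alice))
	fmt.Printf("bob:   %.5f\n", cosineDistance(probe, bob))
}
```

With these inputs the probe lies almost on top of Alice's vector and nearly orthogonal to Bob's, which is why the test expects Alice as the top match.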
142
core/services/facerecognition/store_registry.go
Normal file
@@ -0,0 +1,142 @@
|
||||
package facerecognition
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"sort"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/google/uuid"
|
||||
|
||||
"github.com/mudler/LocalAI/pkg/grpc"
|
||||
"github.com/mudler/LocalAI/pkg/store"
|
||||
)
|
||||
|
||||
// StoreResolver resolves a named vector store to a gRPC backend. The
|
||||
// HTTP handler layer wires this to backend.StoreBackend so the
|
||||
// registry stays decoupled from the ModelLoader plumbing.
|
||||
type StoreResolver func(ctx context.Context, storeName string) (grpc.Backend, error)
|
||||
|
||||
// NewStoreRegistry returns a Registry backed by LocalAI's generic
|
||||
// StoresSet / StoresFind / StoresDelete gRPC surface.
|
||||
//
|
||||
// storeName selects which vector-store model to use (defaults to the
|
||||
// local-store Go backend). `dim` is the expected embedding dimension;
|
||||
// pass 0 to accept whatever dimension arrives (useful when the face
|
||||
// backend exposes multiple recognizers of different sizes, e.g.
|
||||
// ArcFace R50 at 512 vs SFace at 128). A non-zero dim is enforced at
|
||||
// Register time and fails fast with ErrDimensionMismatch.
|
||||
func NewStoreRegistry(resolve StoreResolver, storeName string, dim int) Registry {
|
||||
return &storeRegistry{
|
||||
resolve: resolve,
|
||||
storeName: storeName,
|
||||
dim: dim,
|
||||
}
|
||||
}
|
||||
|
||||
type storeRegistry struct {
|
||||
resolve StoreResolver
|
||||
storeName string
|
||||
dim int
|
||||
|
||||
// TODO(postgres): the local-store gRPC surface keys by embedding
|
||||
// vector and exposes no "list all" method, so we cannot delete by
|
||||
// ID without remembering the embedding. This in-memory index is
|
||||
// rebuilt on every Register and lost on restart — acceptable while
|
||||
// the only implementation is itself in-memory. A persistent
|
||||
// implementation must rebuild this index at startup.
|
||||
idIndex sync.Map // map[string][]float32
|
||||
}
|
||||
|
||||
func (r *storeRegistry) Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error) {
|
||||
if len(embedding) == 0 {
|
||||
return Metadata{}, ErrEmptyEmbedding
|
||||
}
|
||||
if r.dim != 0 && len(embedding) != r.dim {
|
||||
return Metadata{}, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(embedding))
|
||||
}
|
||||
|
||||
backend, err := r.resolve(ctx, r.storeName)
|
||||
if err != nil {
|
||||
return Metadata{}, fmt.Errorf("facerecognition: resolve store: %w", err)
|
||||
}
|
||||
|
||||
meta.ID = uuid.NewString()
|
||||
if meta.RegisteredAt.IsZero() {
|
||||
meta.RegisteredAt = time.Now().UTC()
|
||||
}
|
||||
|
||||
payload, err := json.Marshal(meta)
|
||||
if err != nil {
|
||||
return Metadata{}, fmt.Errorf("facerecognition: marshal metadata: %w", err)
|
||||
}
|
||||
|
||||
if err := store.SetSingle(ctx, backend, embedding, payload); err != nil {
|
||||
return Metadata{}, fmt.Errorf("facerecognition: set: %w", err)
|
||||
}
|
||||
|
||||
// Retain a copy so Forget can look up the embedding by ID.
|
||||
embCopy := append([]float32(nil), embedding...)
|
||||
r.idIndex.Store(meta.ID, embCopy)
|
||||
return meta, nil
|
||||
}
|
||||
|
||||
func (r *storeRegistry) Identify(ctx context.Context, probe []float32, topK int) ([]Match, error) {
|
||||
if len(probe) == 0 {
|
||||
return nil, ErrEmptyEmbedding
|
||||
}
|
||||
if r.dim != 0 && len(probe) != r.dim {
|
||||
return nil, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(probe))
|
||||
}
|
||||
if topK <= 0 {
|
||||
topK = 5
|
||||
}
|
||||
|
||||
backend, err := r.resolve(ctx, r.storeName)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("facerecognition: resolve store: %w", err)
|
||||
}
|
||||
|
||||
_, values, similarities, err := store.Find(ctx, backend, probe, topK)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("facerecognition: find: %w", err)
|
||||
}
|
||||
|
||||
matches := make([]Match, 0, len(values))
|
||||
for i, raw := range values {
|
||||
var meta Metadata
|
||||
if err := json.Unmarshal(raw, &meta); err != nil {
|
||||
// Skip unreadable entries instead of failing the whole query —
|
||||
// the store may contain non-face records in shared deployments.
|
||||
continue
|
||||
}
|
||||
matches = append(matches, Match{
|
||||
ID: meta.ID,
|
||||
Metadata: meta,
|
||||
Distance: 1 - similarities[i],
|
||||
})
|
||||
}
|
||||
|
||||
sort.SliceStable(matches, func(i, j int) bool { return matches[i].Distance < matches[j].Distance })
|
||||
return matches, nil
|
||||
}
|
||||
|
||||
func (r *storeRegistry) Forget(ctx context.Context, id string) error {
|
||||
raw, ok := r.idIndex.Load(id)
|
||||
if !ok {
|
||||
return ErrNotFound
|
||||
}
|
||||
embedding := raw.([]float32)
|
||||
|
||||
backend, err := r.resolve(ctx, r.storeName)
|
||||
if err != nil {
|
||||
return fmt.Errorf("facerecognition: resolve store: %w", err)
|
||||
}
|
||||
if err := store.DeleteSingle(ctx, backend, embedding); err != nil {
|
||||
return fmt.Errorf("facerecognition: delete: %w", err)
|
||||
}
|
||||
r.idIndex.Delete(id)
|
||||
return nil
|
||||
}
|
||||
@@ -124,8 +124,13 @@ func SubjectNodeBackendInstall(nodeID string) string {
|
||||
// BackendInstallRequest is the payload for a backend.install NATS request.
|
||||
type BackendInstallRequest struct {
|
||||
Backend string `json:"backend"`
|
||||
ModelID string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
|
||||
ModelID string `json:"model_id,omitempty"`
|
||||
BackendGalleries string `json:"backend_galleries,omitempty"`
|
||||
// URI is set for external installs (OCI image, URL, or path). When non-empty
|
||||
// the worker routes to InstallExternalBackend instead of the gallery lookup.
|
||||
URI string `json:"uri,omitempty"`
|
||||
Name string `json:"name,omitempty"`
|
||||
Alias string `json:"alias,omitempty"`
|
||||
}
|
||||
|
||||
// BackendInstallReply is the response from a backend.install NATS request.
|
||||
|
||||
@@ -168,6 +168,12 @@ func (c *fakeBackendClient) SoundGeneration(_ context.Context, _ *pb.SoundGenera
|
||||
func (c *fakeBackendClient) Detect(_ context.Context, _ *pb.DetectOptions, _ ...ggrpc.CallOption) (*pb.DetectResponse, error) {
|
||||
return nil, nil
|
||||
}
|
||||
func (c *fakeBackendClient) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
|
||||
return nil, nil
|
||||
}
|
||||
func (c *fakeBackendClient) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
|
||||
return nil, nil
|
||||
}
|
||||
func (c *fakeBackendClient) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
@@ -91,6 +91,14 @@ func (f *fakeGRPCBackend) Detect(_ context.Context, _ *pb.DetectOptions, _ ...gg
|
||||
return &pb.DetectResponse{}, nil
|
||||
}
|
||||
|
||||
func (f *fakeGRPCBackend) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
|
||||
return &pb.FaceVerifyResponse{}, nil
|
||||
}
|
||||
|
||||
func (f *fakeGRPCBackend) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
|
||||
return &pb.FaceAnalyzeResponse{}, nil
|
||||
}
|
||||
|
||||
func (f *fakeGRPCBackend) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
|
||||
return &pb.TranscriptResult{}, nil
|
||||
}
|
||||
|
||||
@@ -106,6 +106,13 @@ func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context
|
||||
if node.Status == StatusPending {
|
||||
continue
|
||||
}
|
||||
// Backend lifecycle ops only make sense on backend-type workers.
|
||||
// Agent workers don't subscribe to backend.install/delete/list, so
|
||||
// enqueueing for them guarantees a forever-retrying row that the
|
||||
// reconciler can never drain. Silently skip — they aren't consumers.
|
||||
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
|
||||
continue
|
||||
}
|
||||
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
|
||||
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
|
||||
result.Nodes = append(result.Nodes, NodeOpStatus{
|
||||
@@ -286,7 +293,7 @@ func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *gall
|
||||
backendName := op.GalleryElementName
|
||||
|
||||
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
|
||||
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
|
||||
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
@@ -304,7 +311,7 @@ func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name str
|
||||
galleriesJSON, _ := json.Marshal(d.backendGalleries)
|
||||
|
||||
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
|
||||
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
|
||||
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -3,12 +3,14 @@ package nodes
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"github.com/mudler/LocalAI/core/services/advisorylock"
|
||||
grpcclient "github.com/mudler/LocalAI/pkg/grpc"
|
||||
"github.com/mudler/xlog"
|
||||
"github.com/nats-io/nats.go"
|
||||
"gorm.io/gorm"
|
||||
)
|
||||
|
||||
@@ -186,7 +188,7 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
|
||||
case OpBackendDelete:
|
||||
_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
|
||||
case OpBackendInstall, OpBackendUpgrade:
|
||||
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries))
|
||||
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
|
||||
if err != nil {
|
||||
applyErr = err
|
||||
} else if !reply.Success {
|
||||
@@ -206,12 +208,47 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
// ErrNoResponders means the node has no active NATS subscription for
|
||||
// this subject. Either its connection dropped, or it's the wrong
|
||||
// node type entirely. Mark unhealthy so the health monitor's
|
||||
// heartbeat-only pass doesn't immediately flip it back — and so
|
||||
// ListDuePendingBackendOps (which filters by status=healthy) stops
|
||||
// picking the row until the node genuinely recovers.
|
||||
if errors.Is(applyErr, nats.ErrNoResponders) {
|
||||
xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID)
|
||||
_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
|
||||
}
|
||||
|
||||
// Dead-letter cap: after maxAttempts the row is the reconciler
|
||||
// equivalent of a poison message. Delete it loudly so the queue
|
||||
// doesn't churn NATS every tick forever — operators can re-issue
|
||||
// the op from the UI if they still want it applied.
|
||||
if op.Attempts+1 >= maxPendingBackendOpAttempts {
|
||||
xlog.Error("Reconciler: abandoning pending backend op after max attempts",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID,
|
||||
"attempts", op.Attempts+1, "last_error", applyErr)
|
||||
if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
|
||||
xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
|
||||
xlog.Warn("Reconciler: pending backend op retry failed",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
|
||||
}
|
||||
}
|
||||
|
||||
// maxPendingBackendOpAttempts caps how many times the reconciler retries a
|
||||
// failing row before dead-lettering it. Ten attempts at exponential backoff
|
||||
// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
|
||||
// worker restart or network blip. Poisoned rows beyond that are almost
|
||||
// certainly structural (wrong node type, non-existent gallery entry) and no
|
||||
// amount of further retrying will help.
|
||||
const maxPendingBackendOpAttempts = 10
|
||||
|
||||
// probeLoadedModels gRPC-health-checks model addresses that the DB says are
|
||||
// loaded. If a model's backend process is gone (OOM, crash, manual restart)
|
||||
// we remove the row so ghosts don't linger. Only probes rows older than
|
||||
|
||||
@@ -373,4 +373,30 @@ var _ = Describe("ReplicaReconciler — state reconciliation", func() {
|
||||
Expect(row.NextRetryAt).To(BeTemporally(">", before))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("NewNodeRegistry malformed-row pruning", func() {
|
||||
It("drops queue rows for agent nodes and non-existent nodes on startup", func() {
|
||||
agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
|
||||
Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
|
||||
backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
|
||||
Expect(registry.Register(context.Background(), backend, true)).To(Succeed())
|
||||
|
||||
// Three rows: one for a valid backend node (should survive),
|
||||
// one for an agent node (pruned), one for an empty backend name
|
||||
// on the valid node (pruned).
|
||||
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
|
||||
Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
|
||||
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())
|
||||
|
||||
// Re-instantiating the registry runs the cleanup migration.
|
||||
_, err := NewNodeRegistry(db)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
var rows []PendingBackendOp
|
||||
Expect(db.Find(&rows).Error).To(Succeed())
|
||||
Expect(rows).To(HaveLen(1))
|
||||
Expect(rows[0].NodeID).To(Equal(backend.ID))
|
||||
Expect(rows[0].Backend).To(Equal("foo"))
|
||||
})
|
||||
})
|
||||
})
|
||||
|
||||
@@ -148,6 +148,30 @@ func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
|
||||
}); err != nil {
|
||||
return nil, fmt.Errorf("migrating node tables: %w", err)
|
||||
}
|
||||
|
||||
// One-shot cleanup of queue rows that can never drain: ops targeted at
|
||||
// agent workers (wrong subscription set), at non-existent nodes, or with
|
||||
// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
|
||||
// new ones from being written, but rows persisted by earlier versions
|
||||
// keep the reconciler busy retrying a permanently-failing NATS request
|
||||
// every 30s. Guarded by the same migration advisory lock so only one
|
||||
// frontend runs it.
|
||||
_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
|
||||
res := db.Exec(`
|
||||
DELETE FROM pending_backend_ops
|
||||
WHERE backend = ''
|
||||
OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
|
||||
`, NodeTypeBackend)
|
||||
if res.Error != nil {
|
||||
xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
|
||||
return res.Error
|
||||
}
|
||||
if res.RowsAffected > 0 {
|
||||
xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
return &NodeRegistry{db: db}, nil
|
||||
}
|
||||
|
||||
|
||||
@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
|
||||
return "", fmt.Errorf("no NATS connection for backend installation")
|
||||
}
|
||||
|
||||
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
|
||||
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
|
||||
@@ -244,7 +244,7 @@ type fakeUnloader struct {
|
||||
unloadErr error
|
||||
}
|
||||
|
||||
func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
|
||||
func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
|
||||
return f.installReply, f.installErr
|
||||
}
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ type backendStopRequest struct {
|
||||
// NodeCommandSender abstracts NATS-based commands to worker nodes.
|
||||
// Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
|
||||
type NodeCommandSender interface {
|
||||
InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
|
||||
InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
|
||||
DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
|
||||
ListBackends(nodeID string) (*messaging.BackendListReply, error)
|
||||
StopBackend(nodeID, backend string) error
|
||||
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
|
||||
// The worker installs the backend from gallery (if not already installed),
|
||||
// starts the gRPC process, and replies when ready.
|
||||
// Timeout: 5 minutes (gallery install can take a while).
|
||||
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
|
||||
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
|
||||
subject := messaging.SubjectNodeBackendInstall(nodeID)
|
||||
xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)
|
||||
|
||||
@@ -80,6 +80,9 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
|
||||
Backend: backendType,
|
||||
ModelID: modelID,
|
||||
BackendGalleries: galleriesJSON,
|
||||
URI: uri,
|
||||
Name: name,
|
||||
Alias: alias,
|
||||
}, 5*time.Minute)
|
||||
}
|
||||
|
||||
|
||||
@@ -24,6 +24,8 @@ const (
|
||||
BackendTraceRerank BackendTraceType = "rerank"
|
||||
BackendTraceTokenize BackendTraceType = "tokenize"
|
||||
BackendTraceDetection BackendTraceType = "detection"
|
||||
BackendTraceFaceVerify BackendTraceType = "face_verify"
|
||||
BackendTraceFaceAnalyze BackendTraceType = "face_analyze"
|
||||
BackendTraceModelLoad BackendTraceType = "model_load"
|
||||
)
|
||||
|
||||
|
||||
@@ -14,11 +14,13 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend
|
||||
|
||||
### Request
|
||||
|
||||
The request body is JSON:
|
||||
The model to monitor is passed as a query parameter:
|
||||
|
||||
| Parameter | Type | Required | Description |
|
||||
|-----------|----------|----------|--------------------------------|
|
||||
| `model` | `string` | Yes | Name of the model to monitor |
|
||||
| Parameter | Type | Required | Location | Description |
|
||||
|-----------|----------|----------|----------|--------------------------------|
|
||||
| `model` | `string` | Yes | query | Name of the model to monitor |
|
||||
|
||||
For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.
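
As a minimal sketch of the query-parameter form described above, the following Go snippet builds the request URL. The base address `http://localhost:8080` and the helper name `monitorURL` are assumptions for illustration, not part of LocalAI.

```go
package main

import (
	"fmt"
	"net/url"
)

// monitorURL is a hypothetical helper: it builds the /backend/monitor URL
// with the model name passed as the `model` query parameter.
func monitorURL(base, model string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/backend/monitor"
	q := url.Values{}
	q.Set("model", model) // Encode() escapes the value for safe use in a URL
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	s, err := monitorURL("http://localhost:8080", "my-model")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Building the query with `url.Values` rather than string concatenation keeps model names with spaces or other reserved characters correctly escaped.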
|
||||
|
||||
### Response
|
||||
|
||||
@@ -42,9 +44,7 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
|
||||
### Usage
|
||||
|
||||
```bash
|
||||
curl http://localhost:8080/backend/monitor \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model": "my-model"}'
|
||||
curl "http://localhost:8080/backend/monitor?model=my-model"
|
||||
```
|
||||
|
||||
### Example response
|
||||
|
||||
@@ -7,6 +7,10 @@ url = "/features/embeddings/"
|
||||
|
||||
LocalAI supports generating embeddings for text or list of tokens.
|
||||
|
||||
For face embeddings specifically, see the
|
||||
[Face Recognition](/features/face-recognition/) feature — it produces
|
||||
L2-normalized vectors (512-d for the ArcFace/MBF recognizers, 128-d for SFace) tuned for face similarity.
|
||||
|
||||
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
|
||||
|
||||
## Model compatibility
|
||||
|
||||
228
docs/content/features/face-recognition.md
Normal file
@@ -0,0 +1,228 @@
|
||||
+++
|
||||
disableToc = false
|
||||
title = "Face Recognition"
|
||||
weight = 14
|
||||
url = "/features/face-recognition/"
|
||||
+++
|
||||
|
||||
LocalAI supports face recognition through the `insightface` backend:
|
||||
face verification (1:1), face identification (1:N) against a built-in
|
||||
vector store, face embedding, face detection, and demographic analysis
|
||||
(age / gender).
|
||||
|
||||
The backend ships **two interchangeable engines** under one image, each
|
||||
paired with a distinct gallery entry so users can pick by license and
|
||||
accuracy needs.
|
||||
|
||||
## Licensing — read this first
|
||||
|
||||
| Gallery entry | Detector + recognizer | Size | License |
|---|---|---|---|
| `insightface-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | ~326 MB | **Non-commercial research only** (upstream insightface weights) |
| `insightface-buffalo-s` | SCRFD-500MF + MBF + GenderAge | ~159 MB | **Non-commercial research only** |
| `insightface-opencv` | YuNet + SFace | ~40 MB | **Apache 2.0 (commercial-safe)** |
|
||||
|
||||
The `insightface` Python library itself is MIT, but the pretrained model
|
||||
packs (buffalo_l, buffalo_s, antelopev2) are released by the upstream
|
||||
maintainers for **non-commercial research use only**. Pick the
|
||||
`insightface-opencv` entry for production / commercial deployments.
|
||||
|
||||
## Quickstart
|
||||
|
||||
Pull the commercial-safe backend (recommended for copy-paste):
|
||||
|
||||
```bash
local-ai models install insightface-opencv
```
|
||||
|
||||
Verify that two images depict the same person:
|
||||
|
||||
```bash
curl -sX POST http://localhost:8080/v1/face/verify \
  -H "Content-Type: application/json" \
  -d '{
    "model": "insightface-opencv",
    "img1": "https://example.com/alice_1.jpg",
    "img2": "https://example.com/alice_2.jpg"
  }'
```
|
||||
|
||||
Response:
|
||||
|
||||
```json
{
  "verified": true,
  "distance": 0.27,
  "threshold": 0.35,
  "confidence": 23.1,
  "model": "insightface-opencv",
  "img1_area": { "x": 120.4, "y": 82.1, "w": 198.3, "h": 260.5 },
  "img2_area": { "x": 110.8, "y": 95.0, "w": 205.6, "h": 268.2 },
  "processing_time_ms": 412.0
}
```
|
||||
|
||||
## 1:N identification workflow (register → identify → forget)
|
||||
|
||||
This is the primary "face recognition" flow. Under the hood it uses
|
||||
LocalAI's built-in in-memory vector store — no external database to
|
||||
stand up.
|
||||
|
||||
1. Register known faces:
|
||||
|
||||
```bash
curl -sX POST http://localhost:8080/v1/face/register \
  -H "Content-Type: application/json" \
  -d '{
    "model": "insightface-buffalo-l",
    "name": "Alice",
    "img": "https://example.com/alice.jpg"
  }'
# → {"id": "8b7...", "name": "Alice", "registered_at": "2026-04-21T..."}
```
|
||||
|
||||
2. Identify an unknown probe:
|
||||
|
||||
```bash
curl -sX POST http://localhost:8080/v1/face/identify \
  -H "Content-Type: application/json" \
  -d '{
    "model": "insightface-buffalo-l",
    "img": "https://example.com/unknown.jpg",
    "top_k": 5
  }'
# → {"matches": [{"id":"8b7...","name":"Alice","distance":0.22,"match":true,...}]}
```
|
||||
|
||||
3. Remove a person by ID:
|
||||
|
||||
```bash
curl -sX POST http://localhost:8080/v1/face/forget \
  -H "Content-Type: application/json" \
  -d '{"id": "8b7..."}'
# → 204 No Content
```
|
||||
|
||||
{{% alert icon="⚠️" color="warning" %}}
|
||||
**Storage caveat.** The default vector store is in-memory. All
|
||||
registered faces are lost when LocalAI restarts. Persistent storage
|
||||
(pgvector) is a tracked future enhancement — the face-recognition HTTP
|
||||
API is designed to swap the backing store without changing the wire
|
||||
format.
|
||||
{{% /alert %}}
|
||||
|
||||
## API reference
|
||||
|
||||
### `POST /v1/face/verify` (1:1)
|
||||
|
||||
| field | type | description |
|---|---|---|
| `model` | string | gallery entry name (e.g. `insightface-buffalo-l`) |
| `img1`, `img2` | string | URL, base64, or data-URI |
| `threshold` | float, optional | cosine-distance cutoff; default depends on engine |
| `anti_spoofing` | bool, optional | reserved; unused in the current release |
|
||||
|
||||
Returns `verified`, `distance`, `threshold`, `confidence`, `model`,
|
||||
`img1_area`, `img2_area`, and `processing_time_ms`.
|
||||
|
||||
### `POST /v1/face/analyze`
|
||||
|
||||
Returns demographic attributes for every detected face:
|
||||
|
||||
| field | type | description |
|---|---|---|
| `model` | string | gallery entry |
| `img` | string | URL / base64 / data-URI |
| `actions` | string[] | subset of `["age","gender","emotion","race"]`; empty = all supported |
|
||||
|
||||
Only `insightface-buffalo-l` / `insightface-buffalo-s` populate age and
|
||||
gender (genderage head). `insightface-opencv` returns face regions with
|
||||
empty attributes — SFace has no demographic classifier. Emotion and
|
||||
race are always empty in the current release.
|
||||
|
||||
### `POST /v1/face/register` (1:N enrollment)
|
||||
|
||||
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | face to enroll |
| `name` | string | human-readable label |
| `labels` | map[string]string, optional | arbitrary metadata |
| `store` | string, optional | vector store model; defaults to local-store |
|
||||
|
||||
Returns `{id, name, registered_at}`. The `id` is an opaque UUID used by
|
||||
`/v1/face/identify` and `/v1/face/forget`.
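
A minimal sketch of client-side payload types for this endpoint, written in Go. The type and function names (`RegisterRequest`, `RegisterResponse`, `encodeRegister`, `decodeRegister`) are hypothetical; only the JSON field names come from the table above. Parsing `registered_at` as an RFC 3339 timestamp is an assumption based on the example response shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// RegisterRequest mirrors the documented request fields (hypothetical type).
type RegisterRequest struct {
	Model  string            `json:"model"`
	Img    string            `json:"img"`
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"` // optional
	Store  string            `json:"store,omitempty"`  // optional
}

// RegisterResponse mirrors {id, name, registered_at} (hypothetical type).
// Assumption: registered_at is an RFC 3339 timestamp.
type RegisterResponse struct {
	ID           string    `json:"id"`
	Name         string    `json:"name"`
	RegisteredAt time.Time `json:"registered_at"`
}

// encodeRegister serializes a request body; optional fields are omitted
// when left at their zero value.
func encodeRegister(req RegisterRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeRegister parses a response body into RegisterResponse.
func decodeRegister(body string) (RegisterResponse, error) {
	var resp RegisterResponse
	err := json.Unmarshal([]byte(body), &resp)
	return resp, err
}

func main() {
	body, err := encodeRegister(RegisterRequest{
		Model: "insightface-opencv",
		Img:   "https://example.com/alice.jpg",
		Name:  "Alice",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Keep the returned `id` on the client side: it is the only handle `/v1/face/forget` accepts.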
|
||||
|
||||
### `POST /v1/face/identify` (1:N recognition)
|
||||
|
||||
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | probe image |
| `top_k` | int, optional | max matches to return; default 5 |
| `threshold` | float, optional | cosine-distance cutoff; default 0.35 (ArcFace) |
| `store` | string, optional | vector store model; defaults to local-store |
|
||||
|
||||
Returns a list of matches sorted by ascending distance, each with `id`,
|
||||
`name`, `labels`, `distance`, `confidence`, and `match`
|
||||
(`distance ≤ threshold`).
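
The ranking rule can be sketched in Go as follows. This is an illustration only, not the handler's actual code: `Candidate` and `rank` are hypothetical names, and the sketch assumes the store returns one cosine similarity per candidate, which is converted to a distance as `1 - similarity`.

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a hypothetical, trimmed-down match record.
type Candidate struct {
	Name     string
	Distance float32
	Match    bool
}

// rank converts similarities to cosine distances, flags entries at or
// below the threshold, and sorts by ascending distance.
func rank(names []string, sims []float32, threshold float32) []Candidate {
	n := len(names)
	if len(sims) < n {
		n = len(sims) // guard against mismatched slice lengths
	}
	out := make([]Candidate, 0, n)
	for i := 0; i < n; i++ {
		d := 1 - sims[i]
		out = append(out, Candidate{Name: names[i], Distance: d, Match: d <= threshold})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Distance < out[j].Distance })
	return out
}

func main() {
	for _, c := range rank([]string{"Bob", "Alice"}, []float32{0.5, 0.75}, 0.35) {
		fmt.Println(c.Name, c.Match)
	}
}
```

A stable sort keeps the store's original order among candidates with equal distance, so repeated queries return ties in a predictable order.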
|
||||
|
||||
### `POST /v1/face/forget`
|
||||
|
||||
| field | type | description |
|---|---|---|
| `id` | string | ID returned by `/v1/face/register` |
|
||||
|
||||
Returns `204 No Content` on success, `404 Not Found` if the ID is
|
||||
unknown.
|
||||
|
||||
### `POST /v1/face/embed`
|
||||
|
||||
Returns the L2-normalized face embedding vector for the detected face.
|
||||
|
||||
| field | type | description |
|---|---|---|
| `model` | string | face model |
| `img` | string | URL / base64 / data-URI |
|
||||
|
||||
Returns `{embedding: float[], dim: int, model: string}`. Dimension is
|
||||
512 for the insightface ArcFace/MBF recognizers and 128 for OpenCV's
|
||||
SFace.
|
||||
|
||||
> **Note:** the OpenAI-compatible `/v1/embeddings` endpoint is
|
||||
> intentionally text-only by contract (`input` is a string or list of
|
||||
> strings of TEXT to embed) — passing an image data-URI there does
|
||||
> nothing useful. Use `/v1/face/embed` for image inputs.
|
||||
|
||||
### Reused endpoint
|
||||
|
||||
- `POST /v1/detection` — returns face bounding boxes with
|
||||
`class_name: "face"`; works for both engines.
|
||||
|
||||
## Choosing an engine
|
||||
|
||||
| Need | Entry |
|---|---|
| Commercial product | `insightface-opencv` |
| Highest accuracy (research / demos) | `insightface-buffalo-l` |
| Edge / low-memory / research | `insightface-buffalo-s` |
|
||||
|
||||
The recommended default `threshold` for `/v1/face/verify` and
|
||||
`/v1/face/identify` depends on the recognizer:
|
||||
|
||||
| Recognizer | Cosine-distance threshold |
|---|---|
| ArcFace R50 (`buffalo_l`) | ~0.35 |
| MBF (`buffalo_s`) | ~0.40 |
| SFace (`opencv`) | ~0.50 |
|
||||
|
||||
Pass `threshold` explicitly when switching engines — the per-engine
|
||||
default only fires when the field is omitted.
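
A client that switches engines can make the fallback explicit. The Go sketch below is a hedged illustration: the map keys pair the gallery entry names with the approximate thresholds from the table above, and `effectiveThreshold` is a hypothetical helper, not the server's actual lookup.

```go
package main

import "fmt"

// defaultThresholds holds the approximate per-recognizer values from the
// table above, keyed by gallery entry name (an assumed mapping).
var defaultThresholds = map[string]float32{
	"insightface-buffalo-l": 0.35, // ArcFace R50
	"insightface-buffalo-s": 0.40, // MBF
	"insightface-opencv":    0.50, // SFace
}

// effectiveThreshold returns the explicit value when one is supplied and
// otherwise falls back to the per-engine default. The bool is false when
// no explicit value is given and the model is unknown.
func effectiveThreshold(model string, explicit *float32) (float32, bool) {
	if explicit != nil {
		return *explicit, true
	}
	t, ok := defaultThresholds[model]
	return t, ok
}

func main() {
	t, ok := effectiveThreshold("insightface-opencv", nil)
	fmt.Println(t, ok)
}
```

Using a pointer for the explicit value distinguishes "field omitted" from "field set to zero", which mirrors how the default is described as applying only when the field is absent.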
|
||||
|
||||
## Related features
|
||||
|
||||
- [Object Detection](/features/object-detection/) — generic bounding-box
|
||||
detection; `/v1/detection` works with the insightface backend too.
|
||||
- [Embeddings](/features/embeddings/) — raw vector extraction; face
|
||||
embeddings live in the same endpoint under the hood.
|
||||
- [Stores](/features/stores/) — the generic vector store powering the
|
||||
1:N recognition pipeline.
|
||||
@@ -7,6 +7,11 @@ url = "/features/object-detection/"
|
||||
|
||||
LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).
|
||||
|
||||
For detecting **faces** specifically, see the dedicated
|
||||
[Face Recognition](/features/face-recognition/) feature — its
|
||||
`/v1/detection` support is tuned for face bounding boxes and ships
|
||||
with commercially-safe model options.
|
||||
|
||||
## Overview
|
||||
|
||||
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.
|
||||
|
||||
@@ -9,6 +9,14 @@ url = '/stores'
|
||||
Stores are an experimental feature to help with querying data using similarity search. It is
|
||||
a low level API that consists of only `get`, `set`, `delete` and `find`.
|
||||
|
||||
{{% alert icon="💡" color="info" %}}
|
||||
**Face recognition uses this store.** The 1:N face identification flow
|
||||
(`/v1/face/register`, `/v1/face/identify`, `/v1/face/forget`) is built
|
||||
on top of the generic store — see
|
||||
[Face Recognition](/features/face-recognition/) for the face-oriented
|
||||
API.
|
||||
{{% /alert %}}
|
||||
|
||||
For example if you have an embedding of some text and want to find text with similar embeddings.
|
||||
You can create embeddings for chunks of all your text then compare them against the embedding of the text you
|
||||
are searching on.
|
||||
|
||||
@@ -130,6 +130,19 @@ Reference for system information commands and diagnostics.
|
||||
|
||||
---
|
||||
|
||||
### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
|
||||
Policy for AI-assisted contributions — licensing, DCO, and attribution.
|
||||
|
||||
**Key topics:**
|
||||
- Aligned with the Linux kernel's AI assistants policy
|
||||
- Signed-off-by and DCO rules
|
||||
- `Assisted-by` commit trailer format
|
||||
- Scope and responsibility of the human submitter
|
||||
|
||||
**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)
|
||||
|
||||
---
|
||||
|
||||
## Quick Links
|
||||
|
||||
| Task | Documentation |
|
||||
@@ -138,6 +151,7 @@ Reference for system information commands and diagnostics.
|
||||
| CLI commands | [CLI Reference](cli-reference.md) |
|
||||
| Check compatibility | [Compatibility Table](compatibility-table.md) |
|
||||
| System diagnostics | [System Info](system-info.md) |
|
||||
| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |
|
||||
|
||||
---
|
||||
|
||||
|
||||
79
docs/content/reference/ai-coding-assistants.md
Normal file
@@ -0,0 +1,79 @@
+++
disableToc = false
title = "AI Coding Assistants"
weight = 28
+++

This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.

**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.

AI tools helping with LocalAI development should follow the standard project development process:

- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth

## Licensing and Legal Requirements

All contributions must comply with LocalAI's licensing requirements:

- LocalAI is licensed under the **MIT License**
- New source files should use the SPDX license identifier `MIT` where applicable to the file type
- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers

## Signed-off-by and Developer Certificate of Origin

**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:

- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
- Taking full responsibility for the contribution

AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).

## Attribution

When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:

```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```

Where:

- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)

Basic development tools (git, go, make, editors) should **not** be listed.
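
The trailer format above can be built and parsed with a short Python sketch. Nothing here is part of LocalAI's tooling: `format_trailer`, `parse_trailer`, `TRAILER_RE` and `BASIC_TOOLS` are hypothetical names, and the sketch assumes every agent, model and tool name is a single whitespace-free token (a multi-word name such as `go vet` would need a convention the policy does not spell out).

```python
import re

# Matches "Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]".
# Assumption: names contain no whitespace and AGENT_NAME contains no colon.
TRAILER_RE = re.compile(
    r"^Assisted-by: (?P<agent>[^\s:]+):(?P<model>\S+)(?P<tools>(?: \S+)*)$"
)

# Basic development tools that the policy says should not be listed.
BASIC_TOOLS = {"git", "go", "make"}


def format_trailer(agent, model, tools=()):
    """Build an Assisted-by trailer, dropping basic development tools."""
    listed = [tool for tool in tools if tool not in BASIC_TOOLS]
    return " ".join([f"Assisted-by: {agent}:{model}", *listed])


def parse_trailer(line):
    """Return (agent, model, tools), or None if the line is not a valid trailer."""
    match = TRAILER_RE.match(line.strip())
    if match is None:
        return None
    return match["agent"], match["model"], match["tools"].split()


trailer = format_trailer("Claude", "claude-opus-4-7", ["golangci-lint", "git"])
print(trailer)
print(parse_trailer(trailer))
```

Keeping the basic-tool filter in the formatter means a caller can pass everything the agent invoked and still get a trailer that lists only the specialized tools.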

### Example

```
fix(llama-cpp): handle empty tool call arguments

Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.

Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
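
A reviewer or a local hook could check a message like the one above against the trailer rules. The following Python sketch is one possible heuristic, not an official LocalAI check: `trailers_of`, `check_commit_message` and the `require_dco` flag are hypothetical, trailers are assumed to sit in the last paragraph of the message, and an AI `Co-Authored-By` is detected only by matching the agent name taken from `Assisted-by`.

```python
def trailers_of(message):
    """Return (key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a subject line alone carries no trailers
    pairs = []
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs.append((key.strip().lower(), value.strip()))
    return pairs


def check_commit_message(message, require_dco=True):
    """Return a list of policy problems (empty when none are found)."""
    pairs = trailers_of(message)
    assisted = [value for key, value in pairs if key == "assisted-by"]
    # Agent name is the part before the first colon, e.g. "Claude".
    agents = [value.split(":", 1)[0].lower() for value in assisted]
    agents = [agent for agent in agents if agent]

    problems = []
    signed = any(key == "signed-off-by" for key, _ in pairs)
    if assisted and require_dco and not signed:
        problems.append("AI-assisted commit lacks a human Signed-off-by")
    for key, value in pairs:
        if key == "co-authored-by" and any(a in value.lower() for a in agents):
            problems.append(f"Co-Authored-By names an AI agent: {value}")
    return problems


good = """fix(llama-cpp): handle empty tool call arguments

Fall back to an empty JSON object when arguments are empty.

Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
"""
bad = good.replace(
    "Signed-off-by: Jane Developer <jane@example.com>",
    "Co-Authored-By: Claude <claude@example.com>",
)
print(check_commit_message(good))
print(check_commit_message(bad))
```

A check like this cannot tell whether the `Signed-off-by` was really added by a human, so it complements review rather than replacing it.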

## Scope and Responsibility

Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:

- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the project style
- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review

Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.

{{% notice note %}}
This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
{{% /notice %}}
@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
-| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
+| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
@@ -10,6 +10,10 @@ Release notes have been now moved completely over Github releases.

You can see the release notes [here](https://github.com/mudler/LocalAI/releases).

## 2026 Highlights

- **April 2026**: [Face recognition backend](/features/face-recognition/) — `insightface`-powered 1:1 verification, 1:N identification, face embedding, face detection, and demographic analysis. Ships both a non-commercial `buffalo_l` model and an Apache 2.0 OpenCV Zoo alternative.

## 2024 Highlights

- **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)
@@ -1,4 +1,268 @@
---
- name: "qwen3.6-27b"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
    - https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
  description: |
    # Qwen3.6-27B

    [](https://chat.qwen.ai)

    > [!Note]
    > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
    >
    > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

    Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.

    ## Qwen3.6 Highlights

    This release delivers substantial upgrades, particularly in

    - **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
    - **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.

    For more details, please refer to our blog post Qwen3.6-27B.

    ## Model Overview

    ...
  license: "apache-2.0"
  tags:
    - llm
    - gguf
    - qwen
  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
  overrides:
    backend: llama-cpp
    function:
      automatic_tool_parsing_fallback: true
      grammar:
        disable: true
    known_usecases:
      - chat
    mmproj: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
    options:
      - use_jinja:true
    parameters:
      min_p: 0
      model: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
      presence_penalty: 1.5
      repeat_penalty: 1
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    template:
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
      sha256: 5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0
      uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf
    - filename: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
      sha256: fdc443e974cad1f61c45af1cfd5580855855ddce0d6c14cc500a5714c486ac1d
      uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/mmproj-F32.gguf
- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
    - https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
  description: |
    # 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled

    A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.

    The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.

    - **Developed by:** @hesamation
    - **Base model:** `Qwen/Qwen3.6-35B-A3B`
    - **License:** apache-2.0

    This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.

    [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)

    ## Benchmark Results

    The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.

    ...
  license: "apache-2.0"
  tags:
    - llm
    - gguf
    - qwen
    - reasoning
  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
  overrides:
    backend: llama-cpp
    function:
      automatic_tool_parsing_fallback: true
      grammar:
        disable: true
    known_usecases:
      - chat
    options:
      - use_jinja:true
    parameters:
      min_p: 0
      model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
      presence_penalty: 1.5
      repeat_penalty: 1
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    template:
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
      sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
      uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
- name: "qwen3.5-9b-glm5.1-distill-v1"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
    - https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
  description: |
    # 🪐 Qwen3.5-9B-GLM5.1-Distill-v1

    ## 📌 Model Overview

    **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
    **Base Model:** Qwen3.5-9B
    **Training Type:** Supervised Fine-Tuning (SFT, Distillation)
    **Parameter Scale:** 9B
    **Training Framework:** Unsloth

    This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.

    The primary goals are to:

    - Improve **structured reasoning ability**
    - Enhance **instruction-following consistency**
    - Activate **latent knowledge via better reasoning structure**

    ## 📊 Training Data

    ### Main Dataset

    - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
    - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
    - Generated from a **GLM-5.1 teacher model**
    - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
    - Training used a **filtered subset**, not the full source dataset.

    ### Auxiliary Dataset

    - `Jackrong/Qwen3.5-reasoning-700x`

    ...
  license: "apache-2.0"
  tags:
    - llm
    - gguf
    - qwen
    - instruction-tuned
    - reasoning
  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
  overrides:
    backend: llama-cpp
    function:
      automatic_tool_parsing_fallback: true
      grammar:
        disable: true
    known_usecases:
      - chat
    mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
    options:
      - use_jinja:true
    parameters:
      min_p: 0
      model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
      presence_penalty: 1.5
      repeat_penalty: 1
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    template:
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
      sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
    - filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
      sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
- name: "supergemma4-26b-uncensored-v2"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
    - https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
  description: |
    Hugging Face |
    GitHub |
    Launch Blog |
    Documentation

    License: Apache 2.0 | Authors: Google DeepMind

    Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

    Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

    Gemma 4 introduces key **capability and architectural advancements**:

    * **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes.

    ...
  license: "gemma"
  tags:
    - llm
    - gguf
  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
  overrides:
    backend: llama-cpp
    function:
      automatic_tool_parsing_fallback: true
      grammar:
        disable: true
    known_usecases:
      - chat
    options:
      - use_jinja:true
    parameters:
      model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
    template:
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
      sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
      uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
- name: "qwopus-glm-18b-merged"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
    - https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
  description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n - Improve **structured reasoning ability**\n - Enhance **instruction-following consistency**\n - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n - Generated from a **GLM-5.1 teacher model**\n - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
  license: "apache-2.0"
  tags:
    - llm
    - gguf
    - reasoning
  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
  overrides:
    backend: llama-cpp
    function:
      automatic_tool_parsing_fallback: true
      grammar:
        disable: true
    known_usecases:
      - chat
    options:
      - use_jinja:true
    parameters:
      model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
    template:
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
      sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
      uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
- name: "qwen3.6-35b-a3b-apex"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
@@ -887,6 +1151,8 @@
    - gpu
  overrides:
    backend: neutts
    parameters:
      model: neuphonic/neutts-air
    known_usecases:
      - tts
- name: vllm-omni-z-image-turbo
@@ -3502,6 +3768,169 @@
    - filename: arcee-ai_AFM-4.5B-Q4_K_M.gguf
      sha256: f05516b323f581bebae1af2cbf900d83a2569b0a60c54366daf4a9c15ae30d4f
      uri: huggingface://bartowski/arcee-ai_AFM-4.5B-GGUF/arcee-ai_AFM-4.5B-Q4_K_M.gguf
- &insightface_buffalo_l
  name: "insightface-buffalo-l"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  # insightface library is MIT; pretrained packs are NON-COMMERCIAL.
  license: "insightface-non-commercial"
  description: |
    Face recognition using insightface's `buffalo_l` pack
    (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB).
    Default choice, highest accuracy.

    Weights delivered via LocalAI's gallery mechanism (SHA-256 verified,
    cached in the models directory like any other managed model).
    NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.
  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
  urls: [https://github.com/deepinsight/insightface]
  overrides:
    backend: insightface
    parameters: {model: insightface-buffalo-l}
    options: ["engine:insightface", "model_pack:buffalo_l"]
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: buffalo_l.zip
      sha256: 80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
- &insightface_buffalo_m
  name: "insightface-buffalo-m"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  license: "insightface-non-commercial"
  description: |
    Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace +
    genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a
    cheaper detector — good balance on mid-range hardware.
    NON-COMMERCIAL RESEARCH USE ONLY.
  tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
  urls: [https://github.com/deepinsight/insightface]
  overrides:
    backend: insightface
    parameters: {model: insightface-buffalo-m}
    options: ["engine:insightface", "model_pack:buffalo_m"]
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: buffalo_m.zip
      sha256: d98264bd8f2dc75cbc2ddce2a14e636e02bb857b3051c234b737bf3b614edca9
      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_m.zip
- &insightface_buffalo_s
  name: "insightface-buffalo-s"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  license: "insightface-non-commercial"
  description: |
    Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder +
    genderage, ~159MB). Good fit for mid-range CPU deployments.
    NON-COMMERCIAL RESEARCH USE ONLY.
  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
  urls: [https://github.com/deepinsight/insightface]
  overrides:
    backend: insightface
    parameters: {model: insightface-buffalo-s}
    options: ["engine:insightface", "model_pack:buffalo_s"]
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: buffalo_s.zip
      sha256: d85a87f503f691807cd8bb97128bdf7a0660326cd9cd02657127fa978bab8b5e
      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_s.zip
- &insightface_buffalo_sc
  name: "insightface-buffalo-sc"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  license: "insightface-non-commercial"
  description: |
    Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB).
    NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty
    attributes for this pack. Ideal for edge/embedded deployments where
    only verification and embedding are needed.
    NON-COMMERCIAL RESEARCH USE ONLY.
  tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
  urls: [https://github.com/deepinsight/insightface]
  overrides:
    backend: insightface
    parameters: {model: insightface-buffalo-sc}
    options: ["engine:insightface", "model_pack:buffalo_sc"]
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: buffalo_sc.zip
      sha256: 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
- &insightface_antelopev2
  name: "insightface-antelopev2"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
  license: "insightface-non-commercial"
  description: |
    Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer +
    genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on
    harder benchmarks; pays for it in GPU memory.
    NON-COMMERCIAL RESEARCH USE ONLY.
  tags: [face-recognition, face-verification, face-embedding, research-only, gpu]
  urls: [https://github.com/deepinsight/insightface]
  overrides:
    backend: insightface
    parameters: {model: insightface-antelopev2}
    options: ["engine:insightface", "model_pack:antelopev2"]
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: antelopev2.zip
      sha256: 8e182f14fc6e80b3bfa375b33eb6cff7ee05d8ef7633e738d1c89021dcf0c5c5
      uri: https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip
- &insightface_opencv
  name: "insightface-opencv"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  license: apache-2.0
  description: |
    Face recognition using OpenCV Zoo weights: YuNet detector + SFace
    128-d recognizer (fp32). APACHE 2.0 — safe for commercial use.
    Lower accuracy than insightface packs, no demographic head
    (`/v1/face/analyze` returns detection regions only).
    Weights are downloaded on install via LocalAI's gallery mechanism
    (~40MB).
  tags: [face-recognition, face-verification, face-embedding, commercial-ok, gpu, cpu]
  urls: [https://github.com/opencv/opencv_zoo]
  overrides:
    backend: insightface
    parameters: {model: face_detection_yunet_2023mar.onnx}
    options:
      - "engine:onnx_direct"
      - "detector_onnx:face_detection_yunet_2023mar.onnx"
      - "recognizer_onnx:face_recognition_sface_2021dec.onnx"
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: face_detection_yunet_2023mar.onnx
      sha256: 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
    - filename: face_recognition_sface_2021dec.onnx
      sha256: 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
- &insightface_opencv_int8
  name: "insightface-opencv-int8"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  license: apache-2.0
  description: |
    Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB).
    Roughly 3x smaller and noticeably faster on CPU than the fp32 variant
    at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe.
    Weights are downloaded on install via LocalAI's gallery mechanism.
  tags: [face-recognition, face-verification, face-embedding, commercial-ok, edge, cpu]
  urls: [https://github.com/opencv/opencv_zoo]
  overrides:
    backend: insightface
    parameters: {model: face_detection_yunet_2023mar_int8.onnx}
    options:
      - "engine:onnx_direct"
      - "detector_onnx:face_detection_yunet_2023mar_int8.onnx"
      - "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx"
    known_usecases: [face_recognition, detection, embeddings]
  files:
    - filename: face_detection_yunet_2023mar_int8.onnx
      sha256: 321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294
      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
    - filename: face_recognition_sface_2021dec_int8.onnx
      sha256: 2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a
      uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
- &rfdetr
  name: "rfdetr-base"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
@@ -15189,11 +15618,13 @@
      model: wan2.1_t2v_1.3b-q8_0.gguf
  files:
    - filename: "wan2.1_t2v_1.3b-q8_0.gguf"
      sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
      uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- name: wan-2.1-i2v-14b-480p-ggml
  license: apache-2.0
  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
@@ -15214,11 +15645,103 @@
      model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
    options:
      - "clip_vision_path:clip_vision_h.safetensors"
      - "diffusion_model"
      - "vae_decode_only:false"
      - "sampler:euler"
      - "flow_shift:3.0"
      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
      - "vae_path:wan_2.1_vae.safetensors"
  files:
    - filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
      sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
      uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
    - filename: "clip_vision_h.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-flf2v-14b-720p-ggml
  license: apache-2.0
  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
  description: |
    Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
    Takes a start and end reference image and interpolates a 33-frame clip
    between them. Unlike the plain I2V variant this model feeds the end
    frame through clip_vision as well, so it conditions semantically (not
    just in pixel-space) on both endpoints. That makes it the right choice
    for seamless loops (start_image == end_image) and clean narrative cuts.
    Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
    text encoder, and clip_vision_h as I2V 14B.
  urls:
    - https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
  tags:
    - image-to-video
    - first-last-frame-to-video
    - wan
    - video-generation
    - cpu
    - gpu
  overrides:
    parameters:
      model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
    options:
      - "clip_vision_path:clip_vision_h.safetensors"
      - "diffusion_model"
      - "vae_decode_only:false"
      - "sampler:euler"
      - "flow_shift:3.0"
      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
      - "vae_path:wan_2.1_vae.safetensors"
  files:
    - filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
      sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
      uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
    - filename: "clip_vision_h.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-i2v-14b-720p-ggml
  license: apache-2.0
  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
  description: |
    Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
    Native 720p sibling of the 480p I2V model: animates a single
    reference image into a 33-frame clip at up to 1280x720. Trained
    purely as image-to-video (no first-last-frame interpolation path),
    so motion is freer and better-suited to single-anchor animation
    than repurposing the FLF2V 720P variant for i2v. Shares the same
    VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
    and FLF2V 14B 720P entries.
  urls:
    - https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
  tags:
    - image-to-video
    - wan
    - video-generation
    - cpu
    - gpu
  overrides:
    parameters:
      model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
    options:
      - "clip_vision_path:clip_vision_h.safetensors"
      - "diffusion_model"
      - "vae_decode_only:false"
      - "sampler:euler"
      - "flow_shift:3.0"
      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
      - "vae_path:wan_2.1_vae.safetensors"
  files:
    - filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
      sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
      uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
    - filename: "clip_vision_h.safetensors"
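The `options` lists in the gallery entries above mix `key:value` strings with a bare flag (`diffusion_model`). As a rough illustration of how such a list could be split into settings, here is a minimal Go sketch. `parseOptions` is a hypothetical helper and not LocalAI's actual parser; storing a bare entry with the value `"true"` is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions splits option strings of the form "key:value" into a map.
// Only the first colon separates key from value, so values may contain colons.
// Assumption: an entry without a colon (such as "diffusion_model") is a
// boolean flag and is stored with the value "true".
func parseOptions(opts []string) map[string]string {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		key, value, found := strings.Cut(o, ":")
		if !found {
			value = "true"
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	// Option strings copied from the gallery entries in the diff.
	opts := []string{
		"clip_vision_path:clip_vision_h.safetensors",
		"diffusion_model",
		"vae_decode_only:false",
		"sampler:euler",
		"flow_shift:3.0",
		"t5xxl_path:umt5-xxl-encoder-Q8_0.gguf",
		"vae_path:wan_2.1_vae.safetensors",
	}
	parsed := parseOptions(opts)
	for _, k := range []string{"sampler", "flow_shift", "diffusion_model"} {
		fmt.Printf("%s = %s\n", k, parsed[k])
	}
}
```

Values stay as strings here; a real consumer would still need to convert entries such as `flow_shift` to a number before use.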
45
go.mod
@@ -8,13 +8,13 @@ require (
	github.com/Masterminds/sprig/v3 v3.3.0
	github.com/alecthomas/kong v1.14.0
	github.com/anthropics/anthropic-sdk-go v1.27.0
	github.com/aws/aws-sdk-go-v2 v1.41.5
	github.com/aws/aws-sdk-go-v2/config v1.32.14
	github.com/aws/aws-sdk-go-v2/credentials v1.19.14
	github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
	github.com/aws/aws-sdk-go-v2 v1.41.6
	github.com/aws/aws-sdk-go-v2/config v1.32.16
	github.com/aws/aws-sdk-go-v2/credentials v1.19.15
	github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
	github.com/charmbracelet/glamour v1.0.0
	github.com/containerd/containerd v1.7.30
	github.com/coreos/go-oidc/v3 v3.17.0
	github.com/containerd/containerd v1.7.31
	github.com/coreos/go-oidc/v3 v3.18.0
	github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
	github.com/ebitengine/purego v0.10.0
	github.com/emirpasic/gods/v2 v2.0.0-alpha
@@ -35,7 +35,7 @@ require (
	github.com/lithammer/fuzzysearch v1.1.8
	github.com/mholt/archiver/v3 v3.5.1
	github.com/microcosm-cc/bluemonday v1.0.27
	github.com/modelcontextprotocol/go-sdk v1.4.1
	github.com/modelcontextprotocol/go-sdk v1.5.0
	github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
	github.com/mudler/edgevpn v0.31.1
	github.com/mudler/go-processmanager v0.1.0
@@ -75,24 +75,23 @@ require (
)

require (
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
	github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
	github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
	github.com/aws/smithy-go v1.24.2 // indirect
	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
	github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
	github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
	github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
	github.com/aws/smithy-go v1.25.0 // indirect
	github.com/bahlo/generic-list-go v0.2.0 // indirect
	github.com/buger/jsonparser v1.1.1 // indirect
	github.com/go-jose/go-jose/v4 v4.1.3 // indirect
	github.com/go-jose/go-jose/v4 v4.1.4 // indirect
	github.com/jinzhu/inflection v1.0.0 // indirect
	github.com/jinzhu/now v1.1.5 // indirect
	github.com/mattn/go-sqlite3 v1.14.24 // indirect
94
go.sum
@@ -70,44 +70,42 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -198,8 +196,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
@@ -212,8 +210,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
@@ -336,8 +334,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
@@ -385,8 +383,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -691,8 +689,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
@@ -54,6 +54,8 @@ type Backend interface {
	TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
	SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
	Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
	FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
	FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error)
	AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
	AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
	TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)
@@ -81,6 +81,14 @@ func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
	return pb.DetectResponse{}, fmt.Errorf("unimplemented")
}

func (llm *Base) FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
	return pb.FaceVerifyResponse{}, fmt.Errorf("unimplemented")
}

func (llm *Base) FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
	return pb.FaceAnalyzeResponse{}, fmt.Errorf("unimplemented")
}

func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
	return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
}
Some files were not shown because too many files have changed in this diff