Compare commits

..

44 Commits

Author SHA1 Message Date
Ettore Di Giacinto
5f7a0c3b26 chore(turboquant): bump fork pin to rebase/upstream-sync-april-2026
Move the TurboQuant llama.cpp fork pin from feature/turboquant-kv-cache
(627ebbc6) to rebase/upstream-sync-april-2026 (7f320bb8), picking up the
upstream-sync work on the fork.

Assisted-by: Claude:claude-opus-4-7
2026-04-22 20:01:49 +00:00
orbisai0security
bbeacf140d fix: remove unsafe sprintf() in grpc-server.cpp (#9486)
fix: V-001 security vulnerability

Automated security fix generated by Orbis Security AI
2026-04-22 21:57:29 +02:00
LocalAI [bot]
6820ec468f chore(model gallery): 🤖 add 1 new models via gallery agent (#9491)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 21:56:11 +02:00
Ettore Di Giacinto
20baec77ab feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)
* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis

Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.

New gRPC RPCs (backend/backend.proto):
  * FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
  * FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse

Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.

New HTTP endpoints under /v1/face/:
  * verify     — 1:1 image pair same-person decision
  * analyze    — per-face age + gender (emotion/race reserved)
  * register   — 1:N enrollment; stores embedding in vector store
  * identify   — 1:N recognition; detect → embed → StoresFind
  * forget     — remove a registered face by opaque ID

Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).

New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.

Gallery (backend/index.yaml) ships three entries:
  * insightface-buffalo-l   — SCRFD-10GF + ArcFace R50 + genderage
                              (~326MB pre-baked; non-commercial research use only)
  * insightface-opencv      — YuNet + SFace (~40MB pre-baked; Apache 2.0)
  * insightface-buffalo-s   — SCRFD-500MF + MBF (runtime download; non-commercial)

Python backend (backend/python/insightface/):
  * engines.py — FaceEngine protocol with InsightFaceEngine and
    OnnxDirectEngine; resolves model paths relative to the backend
    directory so the same gallery config works in docker-scratch and
    in the e2e-backends rootfs-extraction harness.
  * backend.py — gRPC servicer implementing Health, LoadModel, Status,
    Embedding, Detect, FaceVerify, FaceAnalyze.
  * install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
    backend directory so first-run is offline-clean (the final scratch
    image only preserves files under /<backend>/).
  * test.py — parametrized unit tests over both engines.

Tests:
  * Registry unit tests (go test -race ./core/services/facerecognition/...)
    — in-memory fake grpc.Backend, table-driven, covers register/
    identify/forget/error paths + concurrent access.
  * tests/e2e-backends/backend_test.go extended with face caps
    (face_detect, face_embed, face_verify, face_analyze); relative
    ordering + configurable verifyCeiling per engine.
  * Makefile targets: test-extra-backend-insightface-buffalo-l,
    -opencv, and the -all aggregate.
  * CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
    auto-triggered by changes under backend/python/insightface/.

Docs:
  * docs/content/features/face-recognition.md — feature page with
    license table, quickstart (defaults to the commercial-safe model),
    models matrix, API reference, 1:N workflow, storage caveats.
  * Cross-refs in object-detection.md, stores.md, embeddings.md, and
    whats-new.md.
  * Contributor README at backend/python/insightface/README.md.

Verified end-to-end:
  * buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
    face_verify, face_analyze).
  * opencv: 5/5 specs (same minus face_analyze — SFace has no
    demographic head; correctly skipped via BACKEND_TEST_CAPS).

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): move engine selection to model gallery, collapse backend entries

The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.

Correct split:

  * `backend/index.yaml` now has ONE `insightface` backend entry
    shipping the CPU + CUDA 12 container images. The Python backend
    bundles both the non-commercial insightface model packs
    (buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
    weights (YuNet + SFace); the active engine is selected at
    LoadModel time via `options: ["engine:..."]`.

  * `gallery/index.yaml` gains three model entries —
    `insightface-buffalo-l`, `insightface-opencv`,
    `insightface-buffalo-s` — each setting the appropriate
    `overrides.backend` + `overrides.options` so installing one
    actually gives the user the intended engine. This matches how
    `rfdetr-base` lives in the model gallery against the `rfdetr`
    backend.

The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): cover all supported models in the gallery + drop weight baking

Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.

Gallery now has seven `insightface-*` model entries (gallery/index.yaml):

  insightface (family)  — non-commercial research use
    • buffalo-l   (326MB)  — SCRFD-10GF + ResNet50 + genderage, default
    • buffalo-m   (313MB)  — SCRFD-2.5GF + ResNet50 + genderage
    • buffalo-s   (159MB)  — SCRFD-500MF + MBF + genderage
    • buffalo-sc  (16MB)   — SCRFD-500MF + MBF, recognition only
                             (no landmarks, no demographics — analyze
                             returns empty attributes)
    • antelopev2  (407MB)  — SCRFD-10GF + ResNet100@Glint360K + genderage

  OpenCV Zoo family — Apache 2.0 commercial-safe
    • opencv       — YuNet + SFace fp32 (~40MB)
    • opencv-int8  — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)

Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:

  * OpenCV variants list `files:` with ONNX URIs + SHA-256, so
    `local-ai models install insightface-opencv` pulls them into the
    models directory exactly like any other gallery-managed model.

  * insightface packs (upstream distributes .zip archives only, not
    individual ONNX files) auto-download on first LoadModel via
    FaceAnalysis' built-in machinery, rooted at the LocalAI models
    directory so they live alongside everything else — same pattern
    `rfdetr` uses with `inference.get_model()`.

Backend changes (backend/python/insightface/):

  * backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
    LocalAI models directory) to engines via a `_model_dir` hint.
    This replaces the earlier ModelFile-dirname approach; ModelPath
    is the canonical "models directory" variable set by the Go loader
    (pkg/model/initializers.go:144) and is always populated.

  * engines.py::_resolve_model_path — picks up `model_dir` and searches
    it (plus basename-in-model-dir) before falling back to the dev
    script-dir. This is how OnnxDirectEngine finds gallery-downloaded
    YuNet/SFace files by filename only.

  * engines.py::_flatten_insightface_pack — new helper that works
    around an upstream packaging inconsistency: buffalo_l/s/sc zips
    expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
    files in a redundant `<name>/` directory. insightface's own
    loader looks one level too shallow and fails. We call
    `ensure_available()` explicitly, flatten if nested, then hand to
    FaceAnalysis.

  * engines.py::InsightFaceEngine.prepare — root-resolution order now
    includes the `_model_dir` hint so packs download into the LocalAI
    models directory by default.

  * install.sh — no longer pre-downloads any weights. Everything is
    gallery-managed now.

  * smoke.py (new) — parametrized smoke test that iterates over every
    gallery configuration, simulating the LocalAI install flow
    (creates a models dir, fetches OpenCV files with checksum
    verification, lets insightface auto-download its packs), then
    runs detect + embed + verify (+ analyze where supported) through
    the in-process BackendServicer.

  * test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
    paths; downloads ONNX files to a temp dir at setUpClass time and
    passes ModelPath accordingly.

Registry change (core/services/facerecognition/store_registry.go):

  * `dim=0` in NewStoreRegistry now means "accept whatever dimension
    arrives" — needed because the backend supports 512-d ArcFace/MBF
    and 128-d SFace via the same Registry. A non-zero dim still fails
    fast with ErrDimensionMismatch.

  * core/application plumbs `faceEmbeddingDim = 0`, explaining the
    rationale in the comment.

Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.

Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:

    PASS: insightface-buffalo-l    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-sc   faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-s    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-m    faces=6 dim=512 same-dist=0.000
    PASS: insightface-antelopev2   faces=6 dim=512 same-dist=0.000
    PASS: insightface-opencv       faces=6 dim=128 same-dist=0.000
    PASS: insightface-opencv-int8  faces=6 dim=128 same-dist=0.000
    7/7 passed

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim

CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:

    LoadModel failed: Failed to load face engine:
    OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx

Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.

Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).

Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs

The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.

Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).

Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).

Response:

    {
      "embedding": [<dim> floats, L2-normed],
      "dim": int,           // 512 for ArcFace R50 / MBF, 128 for SFace
      "model": "<name>"
    }

Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Sentinel in docs updated to note /v1/embeddings
is text-only and point image users at /v1/face/embed instead.

Assisted-by: Claude:claude-opus-4-7

* fix(http): map malformed image input + gRPC status codes to proper 4xx

Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.

Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:

  * decodeImageInput(s)
      Wraps utils.GetContentURIAsBase64 and turns any failure
      (invalid URL, not a data-URI, download error, etc.) into
      echo.NewHTTPError(400, "invalid image input: ...").

  * mapBackendError(err)
      Inspects the gRPC status on a backend call error and maps:
        INVALID_ARGUMENT     → 400 Bad Request
        NOT_FOUND            → 404 Not Found
        FAILED_PRECONDITION  → 412 Precondition Failed
        Unimplemented        → 501 Not Implemented
      All other codes fall through unchanged (still 500).

Before, my 1×1 PNG error-path test returned:
    HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
    HTTP 400 "failed to decode one or both images"

Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.

Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.

Assisted-by: Claude:claude-opus-4-7

* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis

Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.

Two changes:

1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
   `files:` list with the pack zip's URI + SHA-256. `local-ai models
   install insightface-buffalo-l` now fetches the zip, verifies the
   hash, and extracts it into the models directory. No more reliance
   on insightface's library-internal `ensure_available()` auto-download
   or its hardcoded `BASE_REPO_URL`.

2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
   the FaceAnalysis wrapper and drives insightface's `model_zoo`
   directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
   route each through `model_zoo.get_model()`, build a
   `{taskname: model}` dict, loop per-face at inference — are
   reimplemented in `InsightFaceEngine`. The actual inference classes
   (RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
   insightface's — we only replicate the glue, so drift risk against
   upstream is minimal.

   Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
   layout that doesn't match what LocalAI's zip extraction produces.
   LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
   are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
   at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
   `<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
   `_locate_insightface_pack` helper searches both locations plus
   legacy paths and returns whichever has ONNX files. Replaces the
   earlier `_flatten_insightface_pack` helper (which tried to fight
   FaceAnalysis's layout expectations; now we just find the files
   wherever they are).

Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched

The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".

Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.

Face analyze cap dropped since buffalo_sc has no age/gender head.

Assisted-by: Claude:claude-opus-4-7[1m]

* feat(face-recognition): surface face-recognition in advertised feature maps

The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:

  * api_instructions — the machine-readable capability index at
    GET /api/instructions. Added `face-recognition` as a dedicated
    instruction area with an intro that calls out the in-memory
    registry caveat and the /v1/face/embed vs /v1/embeddings split.
  * auth/permissions — added FeatureFaceRecognition constant, routed
    all six face endpoints through it so admins can gate them per-user
    like any other API feature. Default ON (matches the other API
    features).
  * React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
    FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
    follow-up (noted in the plan).

Instruction count bumped 9 → 10; test updated.

Assisted-by: Claude:claude-opus-4-7[1m]

* docs(agents): capture advertising-surface steps in the endpoint guide

Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.

Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.

Assisted-by: Claude:claude-opus-4-7[1m]
2026-04-22 21:55:41 +02:00
Richard Palethorpe
d16f19f1eb fix(kokoros): Build and publish the backend images from CI/CD (#9487)
* fix(kokoros): Build and publish the backend images from CI/CD

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Delete .claude/agents

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/commands

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/settings.json

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/skills

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-22 13:19:55 +02:00
LocalAI [bot]
cd7b035716 chore: ⬆️ Update ggml-org/llama.cpp to 5a4cd6741fc33227cdacb329f355ab21f8481de2 (#9479)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 08:58:19 +02:00
LocalAI [bot]
0f3bb2d647 chore(model gallery): 🤖 add 1 new models via gallery agent (#9481)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 08:22:05 +02:00
Adira
607efe5a4c fix(backend-monitor): accept model as a query parameter (#9411)
The /backend/monitor endpoint is routed as GET but its handler bound the
model name from a request body, which is invalid per REST and breaks
Swagger UI and OpenAPI codegen tools that refuse to send bodies with GET.

Switch to reading ?model=<name> as a query parameter and update the
Swagger annotation, regenerated spec files, and documentation. The
handler still falls back to body binding when the query parameter is
absent, so existing clients sending {"model": "..."} continue to work.

Fixes #9207

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-04-21 22:06:35 +02:00
Ettore Di Giacinto
7d8c1d5e45 fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush (#9470)
* fix(streaming): dedupe content, recover reasoning, unique tool IDs

When tool calls are discovered only during final parsing (after the
streaming token callback returns), processTools' default switch branch
used to emit the full accumulated content alongside the tool_call args
chunk. Clients that accumulate delta.content per the OpenAI streaming
contract end up showing every narration line twice. Three related bugs
in the same flush path:

1. Content duplication: the args chunk carried Content:textContentToReturn
   even though the text had already been streamed token-by-token via
   the token callback, so delta.content was both the running total and
   bundled with tool_calls in one delta (two spec violations).
2. Reasoning drop: when the C++ autoparser surfaces reasoning only as
   a final aggregate (no incremental tokens), the callback never emits
   it and the flush branch didn't either, silently losing it.
3. tool_call ID collision: empty ss.ID fell back to the request id, so
   multiple empty-ID calls in the same turn all shared the same id,
   breaking tool_result matching by tool_call_id.

Extracted the block into buildDeferredToolCallChunks (pure function,
unit-testable) and added 19 Ginkgo specs covering streamed vs.
not-streamed content/reasoning, single vs. multi call, and
incremental-vs-deferred emission. Every case asserts the invariant
that no delta carries both non-empty Content/Reasoning and non-empty
ToolCalls.

Fix summary:
- emit reasoning in its own leading chunk when !reasoningAlreadyStreamed
- emit role+content in their own chunks when !contentAlreadyStreamed
- drop Content from the tool_call args chunk
- fallback to fmt.Sprintf("%s-%d", id, i) for empty ss.ID so calls stay
  uniquely addressable

Reproduced live against qwen3.6-35b-a3b-apex served by LocalAI with
the C++ autoparser; the full-content replay chunk that preceded each
tool_calls block is gone after the fix.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): dedupe reasoning in the noActionToRun final chunk

extractor.Reasoning() returns only the Go-side extractor's lastReasoning
accumulator (pkg/reasoning/extractor.go:129). ChatDelta reasoning
coming through ProcessChatDeltaReasoning lives in a separate
accumulator (cdLastStrippedReasoning) that Reasoning() does not
expose. The "reasoning != \"\" && extractor.Reasoning() == \"\"" guard
therefore fires exactly when the autoparser streamed reasoning
incrementally via the callback — producing a duplicate final delivery.

Replace both guard sites in the noActionToRun branch with the
sentReasoning flag introduced in the previous commit. Extract the
closing-chunk logic into buildNoActionFinalChunks so the refactor is
testable; the helper mirrors buildDeferredToolCallChunks.

Add Ginkgo coverage for both the content-streamed and
content-not-streamed paths: reasoning is dropped when it was streamed,
delivered once when it arrived only as a final aggregate, and omitted
when empty. Metadata invariants carried over from the sibling helper.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): detect noActionToRun anywhere in functionResults

The previous condition only looked at functionResults[0].Name, which
misbehaved when a real tool call followed a noAction sentinel — the
noAction shadowed the real call and the whole turn was treated as a
question to answer, silently dropping the tool call. The mirror case,
[realCall, noActionCall], fell into the default branch and emitted the
noAction entry as if it were a real tool_call.

Replace with hasRealCall, which scans the slice and returns true as
soon as it finds a non-noAction entry. noActionToRun now matches the
semantic intent: "every entry is the noAction sentinel (or the slice
is empty)".

Note: this does not change incremental emission, where noAction
entries may still be forwarded as tool_call chunks by the XML/JSON
iterative parsers. That is a separate layer (functions.Parse*) and
addressing it requires threading noAction through the parser APIs —
out of scope for this change.

Assisted-by: Claude:claude-opus-4-7 go vet
2026-04-21 21:59:33 +02:00
leinasi2014
d18d434bb2 Respect explicit reasoning config during GGUF thinking probe (#9463)
Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-21 21:53:10 +02:00
Ettore Di Giacinto
39573ecd2a chore(whisperx): drop ROCm/hipblas build target (#9474)
whisperx has no upstream AMD GPU support and its core transcription path
(faster-whisper -> ctranslate2) falls back to CPU on AMD since the PyPI
ctranslate2 is CUDA-only. The torch rocm wheels would accelerate only the
alignment/diarization stages, producing a misleadingly half-working image.

Drop the hipblas variant rather than shipping a partially accelerated build
users can't distinguish from the real thing. AMD hosts now fall through
the capability map to cpu-whisperx / cpu-whisperx-development.

Also removes the now-dangling rocm-whisperx assertion from
pkg/system/capabilities_test.go and the ROCm mention from the whisperx
row in docs/content/reference/compatibility-table.md.

Assisted-by: Claude Code:claude-opus-4-7
2026-04-21 21:50:18 +02:00
Ettore Di Giacinto
a7dbb2a83d fix(gallery-agent): process blacklist command on recently-closed PRs (#9473)
The command-processing step only walked open PRs, so when a maintainer
wrote `/gallery-agent blacklist` and immediately closed the PR, the
next scheduled run missed the command, the `gallery-agent/blacklisted`
label was never applied, and the skip-URL step (which only pulls URLs
from closed PRs carrying that label) re-proposed the model on the next
cron.

Also scan closed gallery-agent PRs from the last 14 days that don't
already carry the blacklist label, and apply the label retroactively
when the command is present. Close/recreate actions still only run on
open PRs.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 16:29:13 +02:00
dependabot[bot]
3ad9b16c29 chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 (#9455)
Bumps [github.com/coreos/go-oidc/v3](https://github.com/coreos/go-oidc) from 3.17.0 to 3.18.0.
- [Release notes](https://github.com/coreos/go-oidc/releases)
- [Commits](https://github.com/coreos/go-oidc/compare/v3.17.0...v3.18.0)

---
updated-dependencies:
- dependency-name: github.com/coreos/go-oidc/v3
  dependency-version: 3.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:31:02 +02:00
dependabot[bot]
c806d5ab73 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 (#9456)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.14 to 1.32.16.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.14...config/v1.32.16)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.16
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:30:22 +02:00
LocalAI [bot]
47efaf5b43 Fix: Add model parameter to neutts-air gallery definition (#8793)
fix: Add model parameter to neutts-air gallery definition

The neutts-air model entry was missing the 'model' parameter in its
configuration, which caused LocalAI to fail with an 'Unrecognized model'
error when trying to use it. This change adds the required model parameter
pointing to the HuggingFace repository (neuphonic/neutts-air) so the backend
can properly load the model.

Fixes #8792

Signed-off-by: localai-bot <localai-bot@example.com>
Co-authored-by: localai-bot <localai-bot@example.com>
2026-04-21 11:56:00 +02:00
LocalAI [bot]
315b634a91 feat: improve CLI error messages with actionable guidance (#8880)
- transcript.go: Model not found error now suggests available models commands
- util.go: GGUF error explains format and how to get models
- worker_p2p.go: Token error explains purpose and how to obtain one
- run.go: Startup failure includes troubleshooting steps and docs link
- model_config_loader.go: Config validation errors include file path and guidance

Refs: H2 - UX Review Issue

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 11:53:26 +02:00
dependabot[bot]
6b245299d7 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 (#9454)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:43:00 +02:00
dependabot[bot]
677c0315c1 chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 (#9453)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.30 to 1.7.31.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.30...v1.7.31)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-version: 1.7.31
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:43 +02:00
dependabot[bot]
478522ce4d chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 (#9452)
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.97.1 to 1.99.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.97.1...service/s3/v1.99.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.99.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:27 +02:00
Ettore Di Giacinto
c54897ad44 fix(tests): update InstallBackend call sites for new URI/Name/Alias params (#9467)
Commit 02bb715c (#9446) added uri, name, alias parameters to
RemoteUnloaderAdapter.InstallBackend but missed the e2e test call
sites, breaking the distributed test build. Pass empty strings to
match the pattern used by the other non-URI call sites.

Assisted-by: Claude Code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 11:41:38 +02:00
LocalAI [bot]
8bb1e8f21f chore: ⬆️ Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 (#9448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:15:45 +02:00
LocalAI [bot]
cd94a0b61a chore: ⬆️ Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe (#9450)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:09:05 +02:00
LocalAI [bot]
047bc48fa9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:07:07 +02:00
sec171
01bd8ae5d0 [gallery] Fix duplicate sha256 keys in Wan models (#9461)
Fix duplicate sha256 keys in wan models gallery

The wan models previously defined the `sha256` key twice in their files lists,
which triggered strict mapping key checks in the YAML parser and resulted
in unmarshal errors that crashed the `/api/models` loading. This removes
the redundant trailing `sha256` keys from the Wan model definitions.

Assisted-by: Antigravity:Gemini-3.1-Pro-High [multi_replace_file_content, run_command]

Signed-off-by: Alex <codecrusher24@gmail.com>
2026-04-21 11:06:36 +02:00
LocalAI [bot]
d9808769be chore(model-gallery): ⬆️ update checksum (#9451)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:58 +02:00
LocalAI [bot]
5973c0a9df chore: ⬆️ Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e (#9447)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:19 +02:00
leinasi2014
486b5e25a3 fix(config): ignore yaml backup files in model loader (#9443)
Only load files whose real extension is .yaml or .yml so backup files like model.yaml.bak do not override active configs. Add a regression test covering plain and timestamped backup files.

Assisted-by: Codex:gpt-5.4 docker

Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
2026-04-20 23:41:39 +02:00
Russell Sim
c66c41e8d7 fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445)
Commit 8839a71c exposed AMDGPU_TARGETS as an ARG/ENV in
Dockerfile.llama-cpp so GPU targets could be overridden, but never
wired the value through the CI workflow inputs. Without it, Docker
receives AMDGPU_TARGETS="" which overrides the Makefile's ?= default,
causing all hipblas builds to compile only for gfx906 regardless of
the target list in the Makefile.

Add amdgpu-targets as a workflow_call input with the same default list
as the Makefile, and pass it as AMDGPU_TARGETS in the build-args of
both the push and PR build steps.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:41:19 +02:00
Russell Sim
02bb715c0a fix(distributed): pass ExternalURI through NATS backend install (#9446)
When installing a backend with a custom OCI URI in distributed mode,
the URI was captured in ManagementOp.ExternalURI by the HTTP handler
but never forwarded to workers. BackendInstallRequest had no URI field,
so workers fell through to the gallery lookup and failed with
"no backend found with name <custom-name>".

Add URI/Name/Alias fields to BackendInstallRequest and thread them from
ManagementOp through DistributedBackendManager.InstallBackend() and the
RemoteUnloaderAdapter. On the worker side, route to InstallExternalBackend
when URI is set instead of InstallBackendFromGallery. Update all
remaining InstallBackend call sites (UpgradeBackend, reconciler
pending-op drain, router auto-install) to pass empty strings for the
new params.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:39:35 +02:00
Ettore Di Giacinto
8ab56e2ad3 feat(gallery): add wan i2v 720p (#9457)
feat(gallery): add Wan 2.1 I2V 14B 720P + pin all wan ggufs by sha256

Adds a new entry for the native-720p image-to-video sibling of the
480p I2V model (wan-2.1-i2v-14b-480p-ggml). The 720p I2V model is
trained purely as image-to-video — no first-last-frame interpolation
path — so motion is freer than repurposing the FLF2V 720P variant as
an i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h
auxiliary files as the existing 480p I2V and 720p FLF2V entries, so
no new aux downloads are introduced.

Also pins the main diffusion gguf by sha256 for the new entry and for
the three existing wan entries that were previously missing a hash
(wan-2.1-t2v-1.3b-ggml, wan-2.1-i2v-14b-480p-ggml,
wan-2.1-flf2v-14b-720p-ggml). Hashes were fetched from HuggingFace's
x-linked-etag header per .agents/adding-gallery-models.md.

Assisted-by: Claude:claude-opus-4-7
2026-04-20 23:34:11 +02:00
pjbrzozowski
ecf85fde9e fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427)
The API Traces tab in /app/traces always showed (0) traces despite requests
being recorded.

The /api/traces endpoint was registered in both localai.go and ui_api.go.
The ui_api.go version wrapped the response as {"traces": [...]} instead of
the flat []APIExchange array that both the React UI (Traces.jsx) and the
legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go
handler, Array.isArray(apiData) always returned false, making the API Traces
tab permanently empty.

Remove the duplicate endpoints from ui_api.go so only the correct flat-array
version in localai.go is served.

Also use mime.ParseMediaType for the Content-Type check in the trace
middleware so requests with parameters (e.g. application/json; charset=utf-8)
are still traced.

Signed-off-by: Pawel Brzozowski <paul@ontux.net>
Co-authored-by: Pawel Brzozowski <paul@ontux.net>
2026-04-20 18:44:49 +02:00
Sai Asish Y
6480715a16 fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438)
GET /api/settings returns settings.ApiKeys as the merged env+runtime list
via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and
round-trips it back on POST /api/settings unchanged.

UpdateSettingsEndpoint was then doing:

    appConfig.ApiKeys = append(envKeys, runtimeKeys...)

where runtimeKeys already contained envKeys (because the UI got them from
the merged GET). Every save therefore duplicated the env keys on top of
the previous merge, and also wrote the duplicates to runtime_settings.json
so the duplication survived restarts and compounded with each save. This
is the user-visible behaviour in #9071: the Web UI shows the keys
twice / three times after consecutive saves.

Before we marshal the settings to disk or call ApplyRuntimeSettings, drop
any incoming key that already appears in startupConfig.ApiKeys. The file
on disk now stores only the genuinely runtime-added keys; the subsequent
append(envKeys, runtimeKeys...) produces one copy of each env key, as
intended. Behaviour is unchanged for users who never had env keys set.

Fixes #9071

Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>
2026-04-20 10:36:54 +02:00
Ettore Di Giacinto
f683231811 feat(gallery): add Wan 2.1 FLF2V 14B 720P (#9440)
First-last-frame-to-video variant of the 14B Wan family. Accepts a
start and end reference image and — unlike the pure i2v path — runs
both through clip_vision, so the final frame lands on the end image
both in pixel and semantic space. Right pick for seamless loops
(start_image == end_image) and narrative A→B cuts.

Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the
I2V 14B entry. Options block mirrors i2v's full-list-in-override
style so the template merge doesn't drop fields.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:34:36 +02:00
LocalAI [bot]
960757f0e8 chore(model gallery): 🤖 add 1 new models via gallery agent (#9436)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 08:48:47 +02:00
Ettore Di Giacinto
865fd552f5 docs(agents): adopt kernel's AI coding assistants policy
Align LocalAI with the Linux kernel project's policy for AI-assisted
contributions (https://docs.kernel.org/process/coding-assistants.html).

- Add .agents/ai-coding-assistants.md with the full policy adapted to
  LocalAI's MIT license: no Signed-off-by or Co-Authored-By from AI,
  attribute AI involvement via an Assisted-by: trailer, human submitter
  owns the contribution.
- Surface the rules at the entry points: AGENTS.md (and its CLAUDE.md
  symlink) and CONTRIBUTING.md.
- Publish a user-facing reference page at
  docs/content/reference/ai-coding-assistants.md and link it from the
  references index.

Assisted-by: Claude:claude-opus-4-7
2026-04-19 22:50:54 +00:00
LocalAI [bot]
cb77a5a4b9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9425)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:42:44 +02:00
Ettore Di Giacinto
60633c4dd5 fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435)
gen_video's ffmpeg subprocess was relying on the filename extension to
choose the output container. Distributed LocalAI hands the backend a
staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to
.mp4 only after the backend returns, so ffmpeg saw a .tmp extension and
bailed with "Unable to choose an output format". Inference had already
completed and the frames were piped in, producing the cryptic
"video inference failed (code 1)" at the API layer.

Pass -f mp4 explicitly so the container is selected by flag instead of
by filename suffix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 00:41:54 +02:00
Ettore Di Giacinto
9e44944cc1 fix(i2v): Add new options to the model configuration
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-20 00:27:05 +02:00
Ettore Di Giacinto
372eb08dcf fix(gallery): allow uninstalling orphaned meta backends + force reinstall (#9434)
Two interrelated bugs that combined to make a meta backend impossible
to uninstall once its concrete had been removed from disk (partial
install, earlier crash, manual cleanup).

1. DeleteBackendFromSystem returned "meta backend %q not found" and
   bailed out early when the concrete directory didn't exist,
   preventing the orphaned meta dir from ever being removed. Treat a
   missing concrete as idempotent success — log a warning and continue
   to remove the orphan meta.

2. InstallBackendFromGallery's "already installed, skip" short-circuit
   only checked that the name was known (`backends.Exists(name)`); an
   orphaned meta whose RunFile points at a missing concrete still
   satisfies that check, so every reinstall returned nil without doing
   anything. Afterwards the worker's findBackend returned empty and we
   kept looping with "backend %q not found after install attempt".
   Require the entry to be actually runnable (run.sh stat-able, not a
   directory) before skipping.

New helper isBackendRunnable centralises the runnability test so both
the install guard and future callers stay in sync. Tests cover the
orphaned-meta delete path and the non-runnable short-circuit case.
2026-04-20 00:10:19 +02:00
LocalAI [bot]
28091d626e chore: ⬆️ Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 (#9430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:01:48 +02:00
LocalAI [bot]
cae79d9107 feat(swagger): update swagger (#9431)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:50 +02:00
LocalAI [bot]
babbbc6ec8 chore: ⬆️ Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad (#9429)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:19 +02:00
LocalAI [bot]
3804497186 chore: ⬆️ Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 (#9428)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:07 +02:00
Ettore Di Giacinto
fda1c553a1 fix(distributed): stop queue loops on agent nodes + dead-letter cap (#9433)
pending_backend_ops rows targeting agent-type workers looped forever:
the reconciler fan-out hit a NATS subject the worker doesn't subscribe
to, returned ErrNoResponders, we marked the node unhealthy, and the
health monitor flipped it back to healthy on the next heartbeat. Next
tick, same row, same failure.

Three related fixes:

1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend.
   Agent workers handle agent NATS subjects, not backend.install /
   delete / list, so enqueueing for them guarantees an infinite retry
   loop. Silent skip is correct — they aren't consumers of these ops.

2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on
   nats.ErrNoResponders: mark the node unhealthy before recording the
   failure, so subsequent ListDuePendingBackendOps (filters by
   status=healthy) stops picking the row until the node actually
   recovers. Matches the synchronous fan-out path.

3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of
   exponential backoff the row is a poison message; further retries
   just thrash NATS. Row is deleted and logged at ERROR so it stays
   visible without staying infinite.

Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows
that target agent-type nodes, non-existent nodes, or carry an empty
backend name. Guarded by the same schema-migration advisory lock so
only one instance performs it. The guards above prevent new rows of
this shape; this closes the migration gap for existing ones.

Tests: the prune migration (valid row stays, agent + empty-name rows
drop) on top of existing upsert / backoff coverage.
2026-04-19 23:38:43 +02:00
111 changed files with 5890 additions and 347 deletions

View File

@@ -0,0 +1,101 @@
# AI Coding Assistants
This document provides guidance for AI tools and developers using AI
assistance when contributing to LocalAI.
**LocalAI follows the same guidelines as the Linux kernel project for
AI-assisted contributions.** See the upstream policy here:
<https://docs.kernel.org/process/coding-assistants.html>
The rules below mirror that policy, adapted to LocalAI's license and
project layout. If anything is unclear, the kernel document is the
authoritative reference for intent.
AI tools helping with LocalAI development should follow the standard
project development process:
- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
conventions, and PR guidelines
- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
logging, and documentation conventions
- [.agents/building-and-testing.md](building-and-testing.md) — build and
test procedures
## Licensing and Legal Requirements
All contributions must comply with LocalAI's licensing requirements:
- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
file
- New source files should use the SPDX license identifier `MIT` where
applicable to the file type
- Contributions must be compatible with the MIT License and must not
introduce code under incompatible licenses (e.g., GPL) without an
explicit discussion with maintainers
## Signed-off-by and Developer Certificate of Origin
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
certify the Developer Certificate of Origin (DCO). The human submitter
is responsible for:
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO)
to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
A human reviewer owns the contribution; the AI's involvement is recorded
via `Assisted-by` (see below).
## Attribution
When AI tools contribute to LocalAI development, proper attribution helps
track the evolving role of AI in the development process. Contributions
should include an `Assisted-by` tag in the commit message trailer in the
following format:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Where:
- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
`Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g.,
`claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
Basic development tools (git, go, make, editors) should **not** be listed.
### Example
```
fix(llama-cpp): handle empty tool call arguments
Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.
Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility.
The human submitter must:
- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the
project style
- Confirm that any referenced APIs, flags, or file paths actually exist
in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review
Reviewers may ask for clarification on any change regardless of how it
was produced. "An AI wrote it" is not an acceptable answer to a design
question.

View File

@@ -2,6 +2,8 @@
This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
## Architecture overview
Authentication and authorization flow through three layers:
@@ -234,6 +236,66 @@ Use these HTTP status codes:
If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
## Advertising surfaces — where to register a new capability
Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
### 1. Swagger `@Tags` annotation (mandatory)
Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
```go
// MyEndpoint does X.
// @Summary Do X.
// @Tags my-capability
// @Param request body schema.MyRequest true "payload"
// @Success 200 {object} schema.MyResponse "Response"
// @Router /v1/my-endpoint [post]
func MyEndpoint(...) echo.HandlerFunc { ... }
```
Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
After adding endpoints, regenerate the embedded spec so the runtime serves it:
```bash
make protogen-go # ensures gRPC codegen is fresh first
make swagger # regenerates swagger/swagger.json
```
### 2. `/api/instructions` registry (for new capability areas)
`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
Add an entry to `instructionDefs`:
```go
{
Name: "my-capability", // URL segment at /api/instructions/my-capability
Description: "Short sentence describing the capability",
Tags: []string{"my-capability"}, // must match swagger @Tags
Intro: "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
},
```
Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
```js
export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
```
React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
### 4. `docs/content/` (user-facing documentation)
A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
## Path protection rules
The global auth middleware classifies paths as API paths or non-API paths:
@@ -248,12 +310,23 @@ If you add endpoints under a new top-level path prefix, add it to `isAPIPath()`
When adding a new endpoint:
**Routing & auth**
- [ ] Handler in `core/http/endpoints/`
- [ ] Route registered in appropriate `core/http/routes/` file
- [ ] Auth level chosen: public / standard / admin / feature-gated
- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
- [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
- [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
- [ ] If token-counting: `usageMiddleware` added to middleware chain
- [ ] Error responses use `schema.ErrorResponse` format
**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
**Quality**
- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
- [ ] Tests cover both authenticated and unauthenticated access
- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation

View File

@@ -30,6 +30,7 @@ jobs:
skip-drivers: ${{ matrix.skip-drivers }}
context: ${{ matrix.context }}
ubuntu-version: ${{ matrix.ubuntu-version }}
amdgpu-targets: ${{ matrix.amdgpu-targets }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -710,6 +711,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-insightface'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "insightface"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -1623,19 +1637,6 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-whisperx'
runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
skip-drivers: 'false'
backend: "whisperx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2596,6 +2597,20 @@ jobs:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# kokoros (Rust TTS)
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-kokoros'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "kokoros"
dockerfile: "./backend/Dockerfile.rust"
context: "./"
ubuntu-version: '2404'
# local-store
- build-type: ''
cuda-major-version: ""
@@ -2624,6 +2639,20 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
# insightface (face recognition)
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-insightface'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "insightface"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""

View File

@@ -58,6 +58,11 @@ on:
required: false
default: '2204'
type: string
amdgpu-targets:
description: 'AMD GPU targets for ROCm/HIP builds'
required: false
default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
type: string
secrets:
dockerUsername:
required: false
@@ -214,6 +219,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
@@ -235,6 +241,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha

View File

@@ -54,24 +54,41 @@ jobs:
REPO: ${{ github.repository }}
SEARCH: 'gallery agent in:title'
run: |
# Walk open gallery-agent PRs and act on maintainer comments:
# Walk gallery-agent PRs and act on maintainer comments:
# /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
# /gallery-agent recreate → close without label (next run may repropose)
# Only comments from OWNER / MEMBER / COLLABORATOR are honored so
# random users can't drive the bot.
#
# We scan both open PRs AND recently-closed PRs that don't already
# carry the blacklist label. This covers the common flow where a
# maintainer writes /gallery-agent blacklist and immediately clicks
# Close — without this, the next scheduled run wouldn't see the
# command (PR is already closed) and would repropose the model.
gh label create gallery-agent/blacklisted \
--repo "$REPO" --color ededed \
--description "gallery-agent must not repropose this model" 2>/dev/null || true
prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
--json number --jq '.[].number')
# Closed PRs from the last 14 days that don't yet have the blacklist label.
# Bounded window keeps the scan cheap while covering late-applied commands.
since=$(date -u -d '14 days ago' +%Y-%m-%d)
prs_closed=$(gh pr list --repo "$REPO" --state closed \
--search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
--json number --jq '.[].number')
prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
for pr in $prs; do
state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
--jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
echo "PR #$pr: blacklist command found"
echo "PR #$pr: blacklist command found (state=$state)"
gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
if [ "$state" = "OPEN" ]; then
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
fi
elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
echo "PR #$pr: recreate command found"
gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
fi

View File

@@ -38,6 +38,7 @@ jobs:
qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
voxtral: ${{ steps.detect.outputs.voxtral }}
kokoros: ${{ steps.detect.outputs.kokoros }}
insightface: ${{ steps.detect.outputs.insightface }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
@@ -751,3 +752,29 @@ jobs:
- name: Test kokoros
run: |
make -C backend/rust/kokoros test
tests-insightface-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
make build-essential curl unzip ca-certificates git tar
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.26.0'
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
df -h
- name: Build insightface backend image and run both model configurations
run: |
make test-extra-backend-insightface-all

View File

@@ -1,11 +1,23 @@
# LocalAI Agent Instructions
This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
## Policy for AI-Assisted Contributions
LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
## Topics
| File | When to read |
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
@@ -22,5 +34,6 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI

View File

@@ -13,6 +13,7 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
- [Development Workflow](#development-workflow)
- [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
- [Coding Guidelines](#coding-guidelines)
- [AI Coding Assistants](#ai-coding-assistants)
- [Testing](#testing)
- [Documentation](#documentation)
- [Community and Communication](#community-and-communication)
@@ -185,7 +186,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to
This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
### General Principles
@@ -211,6 +212,26 @@ For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific gui
- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
- Be responsive to review feedback and keep discussions constructive.
## AI Coding Assistants
LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
Basic development tools (git, go, make, editors) should not be listed.
- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
- Contributions must remain compatible with LocalAI's **MIT License**.
## Testing
All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.

116
Makefile
View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
GOCMD=go
GOTEST=$(GOCMD) test
@@ -434,6 +434,7 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/ace-step
$(MAKE) -C backend/python/trl
$(MAKE) -C backend/python/tinygrad
$(MAKE) -C backend/python/insightface
$(MAKE) -C backend/rust/kokoros kokoros-grpc
test-extra: prepare-test-extra
@@ -457,6 +458,7 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/python/ace-step test
$(MAKE) -C backend/python/trl test
$(MAKE) -C backend/python/tinygrad test
$(MAKE) -C backend/python/insightface test
$(MAKE) -C backend/rust/kokoros test
##
@@ -507,6 +509,13 @@ test-extra-backend: protogen-go
BACKEND_TEST_TOOL_NAME="$$BACKEND_TEST_TOOL_NAME" \
BACKEND_TEST_CACHE_TYPE_K="$$BACKEND_TEST_CACHE_TYPE_K" \
BACKEND_TEST_CACHE_TYPE_V="$$BACKEND_TEST_CACHE_TYPE_V" \
BACKEND_TEST_FACE_IMAGE_1_URL="$$BACKEND_TEST_FACE_IMAGE_1_URL" \
BACKEND_TEST_FACE_IMAGE_1_FILE="$$BACKEND_TEST_FACE_IMAGE_1_FILE" \
BACKEND_TEST_FACE_IMAGE_2_URL="$$BACKEND_TEST_FACE_IMAGE_2_URL" \
BACKEND_TEST_FACE_IMAGE_2_FILE="$$BACKEND_TEST_FACE_IMAGE_2_FILE" \
BACKEND_TEST_FACE_IMAGE_3_URL="$$BACKEND_TEST_FACE_IMAGE_3_URL" \
BACKEND_TEST_FACE_IMAGE_3_FILE="$$BACKEND_TEST_FACE_IMAGE_3_FILE" \
BACKEND_TEST_VERIFY_DISTANCE_CEILING="$$BACKEND_TEST_VERIFY_DISTANCE_CEILING" \
go test -v -timeout 30m ./tests/e2e-backends/...
## Convenience wrappers: build the image, then exercise it.
@@ -603,6 +612,107 @@ test-extra-backend-tinygrad-all: \
test-extra-backend-tinygrad-sd \
test-extra-backend-tinygrad-whisper
## insightface — face recognition.
##
## Face fixtures default to the sample images shipped in the
## deepinsight/insightface repository (MIT-licensed). For offline/local
## runs override with BACKEND_TEST_FACE_IMAGE_{1,2,3}_FILE pointing at
## local paths.
FACE_IMAGE_1_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
FACE_IMAGE_2_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/t1.jpg
FACE_IMAGE_3_URL ?= https://github.com/deepinsight/insightface/raw/master/python-package/insightface/data/images/mask_white.jpg
## Host-side cache for the OpenCV Zoo face ONNX files used by the
## opencv e2e target. The backend image no longer bakes model weights —
## gallery installs bring them via `files:` — but the e2e suite drives
## LoadModel over gRPC directly without going through the gallery. We
## pre-download the ONNX files to a stable host path and pass absolute
## paths in BACKEND_TEST_OPTIONS; `make` skips the downloads when the
## SHA-256 already matches.
INSIGHTFACE_OPENCV_DIR := /tmp/localai-insightface-opencv-cache
INSIGHTFACE_OPENCV_YUNET_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
INSIGHTFACE_OPENCV_SFACE_URL := https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
INSIGHTFACE_OPENCV_YUNET_SHA := 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
INSIGHTFACE_OPENCV_SFACE_SHA := 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
## buffalo_sc (insightface) — pack zip + SHA-256 mirrors the gallery
## entry so the e2e target matches exactly what `local-ai models install
## insightface-buffalo-sc` would have fetched. Smallest insightface pack
## (~16MB) — keeps CI fast while still covering the insightface engine
## code path end-to-end.
INSIGHTFACE_BUFFALO_SC_DIR := /tmp/localai-insightface-buffalo-sc-cache
INSIGHTFACE_BUFFALO_SC_URL := https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
INSIGHTFACE_BUFFALO_SC_SHA := 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
.PHONY: insightface-opencv-models
insightface-opencv-models:
@mkdir -p $(INSIGHTFACE_OPENCV_DIR)
@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_YUNET_SHA)" ]; then \
echo "Fetching YuNet..."; \
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx $(INSIGHTFACE_OPENCV_YUNET_URL); \
echo "$(INSIGHTFACE_OPENCV_YUNET_SHA) $(INSIGHTFACE_OPENCV_DIR)/yunet.onnx" | sha256sum -c; \
fi
@if [ "$$(sha256sum $(INSIGHTFACE_OPENCV_DIR)/sface.onnx 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_OPENCV_SFACE_SHA)" ]; then \
echo "Fetching SFace..."; \
curl -fsSL -o $(INSIGHTFACE_OPENCV_DIR)/sface.onnx $(INSIGHTFACE_OPENCV_SFACE_URL); \
echo "$(INSIGHTFACE_OPENCV_SFACE_SHA) $(INSIGHTFACE_OPENCV_DIR)/sface.onnx" | sha256sum -c; \
fi
.PHONY: insightface-buffalo-sc-models
insightface-buffalo-sc-models:
@mkdir -p $(INSIGHTFACE_BUFFALO_SC_DIR)
@if [ "$$(sha256sum $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip 2>/dev/null | awk '{print $$1}')" != "$(INSIGHTFACE_BUFFALO_SC_SHA)" ]; then \
echo "Fetching buffalo_sc..."; \
curl -fsSL -o $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip $(INSIGHTFACE_BUFFALO_SC_URL); \
echo "$(INSIGHTFACE_BUFFALO_SC_SHA) $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip" | sha256sum -c; \
rm -f $(INSIGHTFACE_BUFFALO_SC_DIR)/*.onnx; \
fi
@if [ ! -f "$(INSIGHTFACE_BUFFALO_SC_DIR)/det_500m.onnx" ]; then \
echo "Extracting buffalo_sc..."; \
unzip -o -q $(INSIGHTFACE_BUFFALO_SC_DIR)/buffalo_sc.zip -d $(INSIGHTFACE_BUFFALO_SC_DIR); \
fi
## buffalo_sc — smallest insightface pack (SCRFD-500MF detector + MBF
## recognizer, ~16MB). Exercises the insightface engine code path
## (model_zoo-backed inference) without the ~326MB buffalo_l download.
## No age/gender/landmark heads — face_analyze is dropped from caps.
## The pack is pre-fetched on the host and passed as `root:<dir>` since
## the e2e suite drives LoadModel directly without going through
## LocalAI's gallery flow (which is what would normally populate
## ModelPath and in turn the engine's `_model_dir` option).
test-extra-backend-insightface-buffalo-sc: docker-build-insightface insightface-buffalo-sc-models
BACKEND_IMAGE=local-ai-backend:insightface \
BACKEND_TEST_MODEL_NAME=insightface-buffalo-sc \
BACKEND_TEST_OPTIONS=engine:insightface,model_pack:buffalo_sc,root:$(INSIGHTFACE_BUFFALO_SC_DIR) \
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
$(MAKE) test-extra-backend
## OpenCV Zoo YuNet + SFace — Apache 2.0, commercial-safe. face_analyze
## cap is dropped (SFace has no demographic head). The ONNX files are
## pre-fetched on the host via the insightface-opencv-models target and
## passed as absolute paths, since the e2e suite drives LoadModel
## directly without going through LocalAI's gallery flow.
test-extra-backend-insightface-opencv: docker-build-insightface insightface-opencv-models
BACKEND_IMAGE=local-ai-backend:insightface \
BACKEND_TEST_MODEL_NAME=insightface-opencv \
BACKEND_TEST_OPTIONS=engine:onnx_direct,detector_onnx:$(INSIGHTFACE_OPENCV_DIR)/yunet.onnx,recognizer_onnx:$(INSIGHTFACE_OPENCV_DIR)/sface.onnx \
BACKEND_TEST_CAPS=health,load,face_detect,face_embed,face_verify \
BACKEND_TEST_FACE_IMAGE_1_URL=$(FACE_IMAGE_1_URL) \
BACKEND_TEST_FACE_IMAGE_2_URL=$(FACE_IMAGE_2_URL) \
BACKEND_TEST_FACE_IMAGE_3_URL=$(FACE_IMAGE_3_URL) \
BACKEND_TEST_VERIFY_DISTANCE_CEILING=0.55 \
$(MAKE) test-extra-backend
## Aggregate — runs both face-recognition model configurations so CI
## catches regressions across engines together.
test-extra-backend-insightface-all: \
test-extra-backend-insightface-buffalo-sc \
test-extra-backend-insightface-opencv
## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
## tool-call extraction via sglang's native qwen parser. CPU builds use
## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
@@ -748,6 +858,7 @@ BACKEND_OUTETTS = outetts|python|.|false|true
BACKEND_FASTER_WHISPER = faster-whisper|python|.|false|true
BACKEND_COQUI = coqui|python|.|false|true
BACKEND_RFDETR = rfdetr|python|.|false|true
BACKEND_INSIGHTFACE = insightface|python|.|false|true
BACKEND_KITTEN_TTS = kitten-tts|python|.|false|true
BACKEND_NEUTTS = neutts|python|.|false|true
BACKEND_KOKORO = kokoro|python|.|false|true
@@ -819,6 +930,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_COQUI)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR)))
$(eval $(call generate-docker-build-target,$(BACKEND_INSIGHTFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_KITTEN_TTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_NEUTTS)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKORO)))
@@ -853,7 +965,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface
########################################################
### Mock Backend for E2E Tests

View File

@@ -24,6 +24,8 @@ service Backend {
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
rpc Detect(DetectOptions) returns (DetectResponse) {}
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
rpc StoresSet(StoresSetOptions) returns (Result) {}
rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
@@ -475,6 +477,57 @@ message DetectResponse {
repeated Detection Detections = 1;
}
// --- Face recognition messages ---
message FacialArea {
float x = 1;
float y = 2;
float w = 3;
float h = 4;
}
message FaceVerifyRequest {
string img1 = 1; // base64-encoded image
string img2 = 2; // base64-encoded image
float threshold = 3; // cosine-distance threshold; 0 = use backend default
bool anti_spoofing = 4; // reserved for future MiniFASNet bolt-on
}
message FaceVerifyResponse {
bool verified = 1;
float distance = 2; // 1 - cosine_similarity
float threshold = 3;
float confidence = 4; // 0-100
string model = 5; // e.g. "buffalo_l"
FacialArea img1_area = 6;
FacialArea img2_area = 7;
float processing_time_ms = 8;
}
message FaceAnalyzeRequest {
string img = 1; // base64-encoded image
repeated string actions = 2; // subset of ["age","gender","emotion","race"]; empty = all-supported
bool anti_spoofing = 3;
}
message FaceAnalysis {
FacialArea region = 1;
float face_confidence = 2;
float age = 3;
string dominant_gender = 4; // "Man" | "Woman"
map<string, float> gender = 5;
string dominant_emotion = 6; // reserved; empty in MVP
map<string, float> emotion = 7;
string dominant_race = 8; // not populated
map<string, float> race = 9;
bool is_real = 10; // anti-spoofing result when enabled
float antispoof_score = 11;
}
message FaceAnalyzeResponse {
repeated FaceAnalysis faces = 1;
}
message ToolFormatMarkers {
string format_type = 1; // "json_native", "tag_with_json", "tag_with_tagged"

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -326,7 +326,7 @@ struct llama_client_slot
char buffer[512];
double t_token = t_prompt_processing / num_prompt_tokens_processed;
double n_tokens_second = 1e3 / t_prompt_processing * num_prompt_tokens_processed;
sprintf(buffer, "prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
snprintf(buffer, sizeof(buffer), "prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
t_prompt_processing, num_prompt_tokens_processed,
t_token, n_tokens_second);
LOG_INFO(buffer, {
@@ -340,7 +340,7 @@ struct llama_client_slot
t_token = t_token_generation / n_decoded;
n_tokens_second = 1e3 / t_token_generation * n_decoded;
sprintf(buffer, "generation eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)",
snprintf(buffer, sizeof(buffer), "generation eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)",
t_token_generation, n_decoded,
t_token, n_tokens_second);
LOG_INFO(buffer, {
@@ -352,7 +352,7 @@ struct llama_client_slot
{"n_tokens_second", n_tokens_second},
});
sprintf(buffer, " total time = %10.2f ms", t_prompt_processing + t_token_generation);
snprintf(buffer, sizeof(buffer), " total time = %10.2f ms", t_prompt_processing + t_token_generation);
LOG_INFO(buffer, {
{"slot_id", id},
{"task_id", task_id},

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -1,7 +1,7 @@
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Pinned to the HEAD of rebase/upstream-sync-april-2026 on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
TURBOQUANT_VERSION?=7f320bb89f68096240a517783674cc17c66b7ad2
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d
STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -1106,6 +1106,11 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
// Force MP4 container. Distributed LocalAI hands us a staging
// path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
// extension; relying on filename suffix makes ffmpeg bail with
// "Unable to choose an output format".
const_cast<char*>("-f"), const_cast<char*>("mp4"),
const_cast<char*>(dst),
nullptr
};

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -168,6 +168,43 @@
nvidia-cuda-13: "cuda13-rfdetr"
nvidia-cuda-12: "cuda12-rfdetr"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-rfdetr"
- &insightface
name: "insightface"
alias: "insightface"
# Upstream insightface library is MIT. The pretrained model packs
# (buffalo_l, buffalo_s, antelopev2) are released for NON-COMMERCIAL
# research use only. The backend image also pre-bakes OpenCV Zoo
# YuNet + SFace (Apache 2.0) for commercial use. Pick the engine
# via model-gallery entries (insightface-buffalo-l / insightface-opencv
# / insightface-buffalo-s) or set `options` in your model YAML.
license: "mixed"
description: |
Face recognition backend powered by `insightface` (ONNX Runtime).
Provides face verification (/v1/face/verify), face analysis
(/v1/face/analyze), face embedding (/v1/embeddings), face
detection (/v1/detection), and 1:N identification
(/v1/face/{register,identify,forget}).
Ships two engines in a single image: one that drives the insightface
model packs (buffalo_l/s/m/sc, antelopev2 — non-commercial research
use only) and one that drives OpenCV Zoo's YuNet + SFace pair
(Apache 2.0 — commercial-safe). Select via `options: ["engine:..."]`
in your model YAML, or install one of the ready-made model-gallery
entries under the `insightface-*` prefix.
The backend image contains only code and Python deps; all model
weights are managed by LocalAI's gallery download mechanism.
urls:
- https://github.com/deepinsight/insightface
- https://github.com/opencv/opencv_zoo
tags:
- face-recognition
- face-verification
- face-embedding
- gpu
- cpu
capabilities:
default: "cpu-insightface"
nvidia: "cuda12-insightface"
nvidia-cuda-12: "cuda12-insightface"
- &sam3cpp
name: "sam3-cpp"
alias: "sam3-cpp"
@@ -587,7 +624,6 @@
alias: "whisperx"
capabilities:
nvidia: "cuda12-whisperx"
amd: "rocm-whisperx"
metal: "metal-whisperx"
default: "cpu-whisperx"
nvidia-cuda-13: "cuda13-whisperx"
@@ -2745,7 +2781,6 @@
name: "whisperx-development"
capabilities:
nvidia: "cuda12-whisperx-development"
amd: "rocm-whisperx-development"
metal: "metal-whisperx-development"
default: "cpu-whisperx-development"
nvidia-cuda-13: "cuda13-whisperx-development"
@@ -2771,16 +2806,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "cuda13-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
@@ -3721,3 +3746,30 @@
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization
# insightface (face recognition) — development and concrete image entries
- !!merge <<: *insightface
name: "insightface-development"
capabilities:
default: "cpu-insightface-development"
nvidia: "cuda12-insightface-development"
nvidia-cuda-12: "cuda12-insightface-development"
- !!merge <<: *insightface
name: "cpu-insightface"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-insightface"
mirrors:
- localai/localai-backends:latest-cpu-insightface
- !!merge <<: *insightface
name: "cuda12-insightface"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-insightface"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-insightface
- !!merge <<: *insightface
name: "cpu-insightface-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-insightface"
mirrors:
- localai/localai-backends:master-cpu-insightface
- !!merge <<: *insightface
name: "cuda12-insightface-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-insightface"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-insightface

View File

@@ -0,0 +1,13 @@
.DEFAULT_GOAL := install
.PHONY: install
install:
bash install.sh
.PHONY: protogen-clean
protogen-clean:
$(RM) backend_pb2_grpc.py backend_pb2.py
.PHONY: clean
clean: protogen-clean
rm -rf venv __pycache__

View File

@@ -0,0 +1,67 @@
# insightface backend (LocalAI)
Face recognition backend backed by ONNX Runtime. Provides face
verification (1:1), face analysis (age/gender), face detection, face
embedding, and — via LocalAI's built-in vector store — 1:N
identification.
## Engines
This backend ships with **two** interchangeable engines selected via
`LoadModel.Options["engine"]`:
| engine | Implementation | Models | License |
|---|---|---|---|
| `insightface` (default) | `insightface.app.FaceAnalysis` | `buffalo_l`, `buffalo_s`, `antelopev2` | **Non-commercial research use only** |
| `onnx_direct` | OpenCV `FaceDetectorYN` + `FaceRecognizerSF` | OpenCV Zoo YuNet + SFace | Apache 2.0 (commercial-safe) |
Both engines implement the same `FaceEngine` protocol in `engines.py`,
so the gRPC servicer in `backend.py` doesn't need to know which one is
active.
## LoadModel options
Common:
| option | default | description |
|---|---|---|
| `engine` | `insightface` | one of `insightface`, `onnx_direct` |
| `det_size` | `640x640` (insightface), `320x320` (onnx_direct) | detector input size |
| `det_thresh` | `0.5` | detector confidence threshold |
| `verify_threshold` | `0.35` | default cosine distance cutoff for FaceVerify |
`insightface` engine:
| option | default | description |
|---|---|---|
| `model_pack` | `buffalo_l` | which insightface pack to load |
`onnx_direct` engine:
| option | default | description |
|---|---|---|
| `detector_onnx` | *(required)* | path to YuNet-compatible ONNX |
| `recognizer_onnx` | *(required)* | path to SFace-compatible ONNX |
## Adding a new model pack
1. If it's an insightface pack (auto-downloadable or manually extracted
into `~/.insightface/models/<name>/`), just add a new gallery entry
in `backend/index.yaml` with `options: ["engine:insightface",
"model_pack:<name>"]`. No code change.
2. If it's an Apache-licensed ONNX pair, add a gallery entry with
`options: ["engine:onnx_direct", "detector_onnx:...",
"recognizer_onnx:..."]`. If the detector or recognizer has a
different input-tensor shape than YuNet/SFace, you may need a new
engine implementation in `engines.py`; the two-engine seam makes
that a self-contained change.
## Running tests locally
```bash
make -C backend/python/insightface # install deps + bake models
make -C backend/python/insightface test # run test.py
```
The OpenCV Zoo tests skip gracefully when `/models/opencv/*.onnx` is
absent (e.g. on dev boxes where `install.sh` wasn't run).

View File

@@ -0,0 +1,265 @@
#!/usr/bin/env python3
"""gRPC server for the insightface face recognition backend.
Implements Health / LoadModel / Status plus the face-specific methods:
Embedding, Detect, FaceVerify, FaceAnalyze. The heavy lifting is
delegated to engines.py — this file is just the gRPC plumbing.
"""
import argparse
import base64
import os
import signal
import sys
import time
from concurrent import futures
from io import BytesIO
import backend_pb2
import backend_pb2_grpc
import cv2
import grpc
import numpy as np
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "common"))
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "common"))
from grpc_auth import get_auth_interceptors # noqa: E402
from engines import FaceEngine, build_engine # noqa: E402
_ONE_DAY = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get("PYTHON_GRPC_MAX_WORKERS", "1"))
# Default cosine-distance threshold for "same person" on buffalo_l
# ArcFace R50. Clients can override per-request; clients using SFace
# should pass threshold≈0.4 since the distance distribution is wider.
DEFAULT_VERIFY_THRESHOLD = 0.35
def _decode_image(src: str) -> np.ndarray | None:
"""Decode a base64-encoded image into an OpenCV BGR numpy array."""
if not src:
return None
try:
data = base64.b64decode(src, validate=False)
except Exception:
return None
arr = np.frombuffer(data, dtype=np.uint8)
if arr.size == 0:
return None
img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
return img
def _parse_options(raw: list[str]) -> dict[str, str]:
out: dict[str, str] = {}
for entry in raw:
if ":" not in entry:
continue
k, v = entry.split(":", 1)
out[k.strip()] = v.strip()
return out
class BackendServicer(backend_pb2_grpc.BackendServicer):
def __init__(self) -> None:
self.engine: FaceEngine | None = None
self.engine_name: str = ""
self.model_name: str = ""
self.verify_threshold: float = DEFAULT_VERIFY_THRESHOLD
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", "utf-8"))
def LoadModel(self, request, context):
options = _parse_options(list(request.Options))
# Surface LocalAI's models directory (ModelPath) so engines can
# anchor relative paths — OnnxDirectEngine's detector_onnx /
# recognizer_onnx point at gallery-managed files that LocalAI
# dropped there, and InsightFaceEngine auto-downloads its packs
# into that same directory alongside every other managed model.
# Private key to avoid clashing with user-provided options.
if request.ModelPath:
options["_model_dir"] = request.ModelPath
engine_name = options.get("engine", "insightface")
try:
self.engine = build_engine(engine_name)
self.engine.prepare(options)
except Exception as err: # pragma: no cover - exercised via e2e
return backend_pb2.Result(success=False, message=f"Failed to load face engine: {err}")
self.engine_name = engine_name
self.model_name = request.Model or options.get("model_pack", "")
if "verify_threshold" in options:
try:
self.verify_threshold = float(options["verify_threshold"])
except ValueError:
pass
print(f"[insightface] engine={engine_name} model={self.model_name} loaded", file=sys.stderr)
return backend_pb2.Result(success=True, message="Model loaded successfully")
def Status(self, request, context):
state = (
backend_pb2.StatusResponse.READY
if self.engine is not None
else backend_pb2.StatusResponse.UNINITIALIZED
)
return backend_pb2.StatusResponse(state=state)
def Embedding(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.EmbeddingResult()
if not request.Images:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("Embedding requires Images[0] to be a base64 image")
return backend_pb2.EmbeddingResult()
img = _decode_image(request.Images[0])
if img is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode image")
return backend_pb2.EmbeddingResult()
vec = self.engine.embed(img)
if vec is None:
context.set_code(grpc.StatusCode.NOT_FOUND)
context.set_details("no face detected")
return backend_pb2.EmbeddingResult()
return backend_pb2.EmbeddingResult(embeddings=[float(x) for x in vec])
def Detect(self, request, context):
if self.engine is None:
return backend_pb2.DetectResponse()
img = _decode_image(request.src)
if img is None:
return backend_pb2.DetectResponse()
detections = []
for d in self.engine.detect(img):
x1, y1, x2, y2 = d.bbox
detections.append(
backend_pb2.Detection(
x=float(x1),
y=float(y1),
width=float(x2 - x1),
height=float(y2 - y1),
confidence=float(d.score),
class_name="face",
)
)
return backend_pb2.DetectResponse(Detections=detections)
def FaceVerify(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.FaceVerifyResponse()
img1 = _decode_image(request.img1)
img2 = _decode_image(request.img2)
if img1 is None or img2 is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode one or both images")
return backend_pb2.FaceVerifyResponse()
threshold = request.threshold if request.threshold > 0 else self.verify_threshold
start = time.time()
e1 = self.engine.embed(img1)
e2 = self.engine.embed(img2)
if e1 is None or e2 is None:
context.set_code(grpc.StatusCode.NOT_FOUND)
context.set_details("no face detected in one or both images")
return backend_pb2.FaceVerifyResponse()
# Both engines return L2-normalized vectors, so the dot product
# is the cosine similarity directly.
sim = float(np.dot(e1, e2))
distance = 1.0 - sim
verified = distance < threshold
confidence = max(0.0, min(100.0, (1.0 - distance / threshold) * 100.0)) if threshold > 0 else 0.0
def _region(img) -> backend_pb2.FacialArea:
dets = self.engine.detect(img)
if not dets:
return backend_pb2.FacialArea()
best = max(dets, key=lambda d: d.score)
x1, y1, x2, y2 = best.bbox
return backend_pb2.FacialArea(x=x1, y=y1, w=x2 - x1, h=y2 - y1)
return backend_pb2.FaceVerifyResponse(
verified=verified,
distance=float(distance),
threshold=float(threshold),
confidence=float(confidence),
model=self.model_name or self.engine_name,
img1_area=_region(img1),
img2_area=_region(img2),
processing_time_ms=float((time.time() - start) * 1000.0),
)
def FaceAnalyze(self, request, context):
if self.engine is None:
context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
context.set_details("face model not loaded")
return backend_pb2.FaceAnalyzeResponse()
img = _decode_image(request.img)
if img is None:
context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
context.set_details("failed to decode image")
return backend_pb2.FaceAnalyzeResponse()
faces = []
for attrs in self.engine.analyze(img):
x, y, w, h = attrs.region
fa = backend_pb2.FaceAnalysis(
region=backend_pb2.FacialArea(x=float(x), y=float(y), w=float(w), h=float(h)),
face_confidence=float(attrs.face_confidence),
)
if attrs.age is not None:
fa.age = float(attrs.age)
if attrs.dominant_gender:
fa.dominant_gender = attrs.dominant_gender
for k, v in attrs.gender.items():
fa.gender[k] = float(v)
faces.append(fa)
return backend_pb2.FaceAnalyzeResponse(faces=faces)
def serve(address: str) -> None:
server = grpc.server(
futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
options=[
("grpc.max_message_length", 50 * 1024 * 1024),
("grpc.max_send_message_length", 50 * 1024 * 1024),
("grpc.max_receive_message_length", 50 * 1024 * 1024),
],
interceptors=get_auth_interceptors(),
)
backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
server.add_insecure_port(address)
server.start()
print("[insightface] Server started. Listening on: " + address, file=sys.stderr)
def _stop(sig, frame): # pragma: no cover
print("[insightface] shutting down")
server.stop(0)
sys.exit(0)
signal.signal(signal.SIGINT, _stop)
signal.signal(signal.SIGTERM, _stop)
try:
while True:
time.sleep(_ONE_DAY)
except KeyboardInterrupt:
server.stop(0)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run the insightface gRPC server.")
parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
args = parser.parse_args()
print(f"[insightface] startup: {args}", file=sys.stderr)
serve(args.addr)

View File

@@ -0,0 +1,382 @@
"""Face recognition engine implementations for the LocalAI insightface backend.
Two engines are provided:
* InsightFaceEngine — wraps insightface.app.FaceAnalysis. Supports
buffalo_l / buffalo_s / antelopev2 model packs
with SCRFD detector + ArcFace recognizer +
genderage head. NON-COMMERCIAL research use
only (upstream license).
* OnnxDirectEngine — loads detector + recognizer ONNX files directly
via onnxruntime. Used for OpenCV Zoo models
(YuNet + SFace) and any future Apache-licensed
model set. Does not support analyze().
Both engines expose the same interface so the gRPC servicer (backend.py)
can dispatch without knowing which one is active.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Protocol
import cv2
import numpy as np
@dataclass
class FaceDetection:
bbox: tuple[float, float, float, float] # x1, y1, x2, y2
score: float
landmarks: np.ndarray | None = None # 5x2 keypoints when available
@dataclass
class FaceAttributes:
region: tuple[float, float, float, float] # x, y, w, h
face_confidence: float
age: float | None = None
dominant_gender: str | None = None
gender: dict[str, float] = field(default_factory=dict)
class FaceEngine(Protocol):
"""Minimal interface every engine must implement."""
def prepare(self, options: dict[str, str]) -> None: ...
def detect(self, img: np.ndarray) -> list[FaceDetection]: ...
def embed(self, img: np.ndarray) -> np.ndarray | None: ...
def analyze(self, img: np.ndarray) -> list[FaceAttributes]: ...
# ─── InsightFaceEngine ────────────────────────────────────────────────
class InsightFaceEngine:
"""Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
FaceAnalysis is a thin 50-line orchestration (glob for ONNX files
in `<root>/models/<name>/`, route each through `model_zoo.get_model`,
build a `{taskname: model}` dict, then loop per-face at inference).
We reimplement the same loop here so we can:
1. Load packs from whatever directory LocalAI's gallery extracted
them into — flat (buffalo_l/s/sc — ONNX at `<dir>/*.onnx`) or
nested (buffalo_m/antelopev2 — ONNX at `<dir>/<name>/*.onnx`)
without needing a specific layout on disk.
2. Skip insightface's built-in auto-download entirely: weight
delivery is LocalAI's gallery `files:` job now, checksum-
verified and cached alongside every other managed model.
The actual inference classes (RetinaFace, ArcFaceONNX, Attribute,
Landmark) stay in insightface — we only reimplement the ~50 lines
of glue around them.
"""
def __init__(self) -> None:
self.models: dict[str, Any] = {}
self.det_model: Any = None
self.model_pack: str = "buffalo_l"
self.det_size: tuple[int, int] = (640, 640)
self.det_thresh: float = 0.5
self._providers: list[str] = ["CPUExecutionProvider"]
def prepare(self, options: dict[str, str]) -> None:
import glob
import os
from insightface.model_zoo import model_zoo
self.model_pack = options.get("model_pack", "buffalo_l")
self.det_size = _parse_det_size(options.get("det_size", "640x640"))
self.det_thresh = float(options.get("det_thresh", "0.5"))
pack_dir = _locate_insightface_pack(options, self.model_pack)
if pack_dir is None:
raise ValueError(
f"no insightface pack '{self.model_pack}' found — install via "
f"`local-ai models install insightface-{self.model_pack.replace('_', '-')}`"
)
onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
if not onnx_files:
raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
# CUDAExecutionProvider is picked automatically by onnxruntime-gpu
# when available; falling back to CPU keeps the CPU-only image
# working. ctx_id=0 means "first GPU if any, else CPU".
self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
self.models = {}
for onnx_file in onnx_files:
m = model_zoo.get_model(onnx_file, providers=self._providers)
if m is None:
continue
# First occurrence of each taskname wins (matches FaceAnalysis).
if m.taskname not in self.models:
self.models[m.taskname] = m
if "detection" not in self.models:
raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
self.det_model = self.models["detection"]
self.det_model.prepare(0, input_size=self.det_size, det_thresh=self.det_thresh)
for name, m in self.models.items():
if name != "detection":
m.prepare(0)
def _faces(self, img: np.ndarray) -> list[Any]:
"""Run detection + all non-detection models per face."""
if self.det_model is None:
return []
from insightface.app.common import Face
bboxes, kpss = self.det_model.detect(img, max_num=0)
if bboxes is None or bboxes.shape[0] == 0:
return []
faces: list[Any] = []
for i in range(bboxes.shape[0]):
bbox = bboxes[i, 0:4]
det_score = bboxes[i, 4]
kps = kpss[i] if kpss is not None else None
face = Face(bbox=bbox, kps=kps, det_score=det_score)
for name, m in self.models.items():
if name == "detection":
continue
m.get(img, face)
faces.append(face)
return faces
def detect(self, img: np.ndarray) -> list[FaceDetection]:
return [
FaceDetection(
bbox=tuple(float(v) for v in f.bbox),
score=float(f.det_score),
landmarks=np.array(f.kps) if getattr(f, "kps", None) is not None else None,
)
for f in self._faces(img)
]
def embed(self, img: np.ndarray) -> np.ndarray | None:
faces = self._faces(img)
if not faces:
return None
best = max(faces, key=lambda f: float(f.det_score))
if getattr(best, "normed_embedding", None) is None:
return None
return np.asarray(best.normed_embedding, dtype=np.float32)
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
out: list[FaceAttributes] = []
for f in self._faces(img):
x1, y1, x2, y2 = (float(v) for v in f.bbox)
region = (x1, y1, x2 - x1, y2 - y1)
attrs = FaceAttributes(region=region, face_confidence=float(f.det_score))
age = getattr(f, "age", None)
if age is not None:
attrs.age = float(age)
gender = getattr(f, "gender", None)
if gender is not None:
# genderage head emits argmax, not probabilities —
# one-hot dict keeps the API stable.
attrs.dominant_gender = "Man" if int(gender) == 1 else "Woman"
attrs.gender = {
"Man": 1.0 if int(gender) == 1 else 0.0,
"Woman": 0.0 if int(gender) == 1 else 1.0,
}
out.append(attrs)
return out
# ─── OnnxDirectEngine ─────────────────────────────────────────────────
class OnnxDirectEngine:
"""Loads detector + recognizer ONNX files directly.
Supports the OpenCV Zoo YuNet + SFace pair out of the box. YuNet
exposes a C++-level API via cv2.FaceDetectorYN which accepts the
ONNX file directly; SFace is driven through cv2.FaceRecognizerSF.
Both are Apache 2.0 licensed.
"""
def __init__(self) -> None:
self.detector_path: str = ""
self.recognizer_path: str = ""
self.input_size: tuple[int, int] = (320, 320)
self.det_thresh: float = 0.5
self._detector: Any = None
self._recognizer: Any = None
def prepare(self, options: dict[str, str]) -> None:
raw_det = options.get("detector_onnx", "")
raw_rec = options.get("recognizer_onnx", "")
if not raw_det or not raw_rec:
raise ValueError(
"onnx_direct engine requires both detector_onnx and recognizer_onnx options"
)
model_dir = options.get("_model_dir")
self.detector_path = _resolve_model_path(raw_det, model_dir=model_dir)
self.recognizer_path = _resolve_model_path(raw_rec, model_dir=model_dir)
self.input_size = _parse_det_size(options.get("det_size", "320x320"))
self.det_thresh = float(options.get("det_thresh", "0.5"))
# YuNet is a fixed-size detector; size is reset per detect() call to
# match the input frame.
self._detector = cv2.FaceDetectorYN.create(
self.detector_path,
"",
self.input_size,
score_threshold=self.det_thresh,
nms_threshold=0.3,
top_k=5000,
)
self._recognizer = cv2.FaceRecognizerSF.create(self.recognizer_path, "")
def detect(self, img: np.ndarray) -> list[FaceDetection]:
if self._detector is None:
return []
h, w = img.shape[:2]
self._detector.setInputSize((w, h))
retval, faces = self._detector.detect(img)
if faces is None:
return []
out: list[FaceDetection] = []
for row in faces:
x, y, fw, fh = float(row[0]), float(row[1]), float(row[2]), float(row[3])
# Landmarks at columns 4..13 are (lx1,ly1,...,lx5,ly5).
landmarks = np.array(row[4:14], dtype=np.float32).reshape(5, 2) if len(row) >= 14 else None
score = float(row[-1])
out.append(FaceDetection(bbox=(x, y, x + fw, y + fh), score=score, landmarks=landmarks))
return out
def embed(self, img: np.ndarray) -> np.ndarray | None:
if self._detector is None or self._recognizer is None:
return None
h, w = img.shape[:2]
self._detector.setInputSize((w, h))
retval, faces = self._detector.detect(img)
if faces is None or len(faces) == 0:
return None
# Pick the highest-score face (last column is score).
best = max(faces, key=lambda r: float(r[-1]))
aligned = self._recognizer.alignCrop(img, best)
feat = self._recognizer.feature(aligned)
vec = np.asarray(feat, dtype=np.float32).flatten()
# SFace outputs a 128-dim feature; L2-normalize to make dot-product
# comparable to buffalo_l's already-normed 512-dim embedding.
norm = float(np.linalg.norm(vec))
if norm == 0:
return None
return vec / norm
def analyze(self, img: np.ndarray) -> list[FaceAttributes]:
# OpenCV Zoo does not ship a demographic classifier; report
# only the face-detection regions so callers can still see
# how many faces were detected.
return [
FaceAttributes(
region=(
d.bbox[0],
d.bbox[1],
d.bbox[2] - d.bbox[0],
d.bbox[3] - d.bbox[1],
),
face_confidence=d.score,
)
for d in self.detect(img)
]
# ─── helpers ──────────────────────────────────────────────────────────
def _parse_det_size(raw: str) -> tuple[int, int]:
raw = raw.strip().lower().replace(" ", "")
if "x" in raw:
w, h = raw.split("x", 1)
return (int(w), int(h))
n = int(raw)
return (n, n)
def _locate_insightface_pack(options: dict[str, str], name: str) -> str | None:
"""Find the directory holding the insightface pack's ONNX files.
LocalAI's gallery `files:` extracts the pack zip straight into the
models directory. Upstream packs are inconsistent:
buffalo_l/s/sc — flat zip, ONNX lands at `<models_dir>/*.onnx`
buffalo_m, antelopev2 — wrapped zip, ONNX lands at `<models_dir>/<name>/*.onnx`
We search, in order:
1. `<models_dir>/<name>/` — wrapped-zip layout, or insightface's
own FaceAnalysis-style `<root>/models/<name>/` layout.
2. `<models_dir>/models/<name>/` — insightface's FaceAnalysis
auto-download lands here (handy for dev environments that
still have old `~/.insightface` caches).
3. `<models_dir>/` — flat-zip layout directly in models dir.
Returns the first directory whose contents include `*.onnx`.
"""
import glob
import os
model_dir = options.get("_model_dir") or ""
explicit_root = options.get("root")
candidates: list[str] = []
if model_dir:
candidates.append(os.path.join(model_dir, name))
candidates.append(os.path.join(model_dir, "models", name))
candidates.append(model_dir)
if explicit_root:
expanded = os.path.expanduser(explicit_root)
candidates.append(os.path.join(expanded, "models", name))
candidates.append(os.path.join(expanded, name))
candidates.append(expanded)
for c in candidates:
if os.path.isdir(c) and glob.glob(os.path.join(c, "*.onnx")):
return c
return None
def _resolve_model_path(path: str, model_dir: str | None = None) -> str:
"""Resolve an ONNX file path across the paths LocalAI might deliver it from.
Search order:
1. The path itself if it already resolves (absolute, or relative to CWD).
2. `model_dir` (typically `os.path.dirname(ModelOptions.ModelFile)`) —
this is how LocalAI surfaces gallery-managed files. When the gallery
entry lists `files:`, each one lands under the models directory and
backends load them via filename anchored by ModelFile.
3. `<script_dir>/<path-without-leading-slash>` — covers dev layouts
where someone manually dropped weights inside the backend dir.
If none hit, return the literal input so cv2/insightface surfaces a
clearer error naming the actually-attempted path.
"""
import os
if os.path.isfile(path):
return path
stripped = path.lstrip("/")
candidates: list[str] = []
if model_dir:
candidates.append(os.path.join(model_dir, os.path.basename(path)))
candidates.append(os.path.join(model_dir, stripped))
script_dir = os.path.dirname(os.path.abspath(__file__))
candidates.append(os.path.join(script_dir, stripped))
for c in candidates:
if os.path.isfile(c):
return c
return path
def build_engine(name: str) -> FaceEngine:
"""Factory for the engine selected by LoadModel options."""
key = name.strip().lower()
if key in ("", "insightface"):
return InsightFaceEngine()
if key in ("onnx_direct", "onnx-direct", "opencv"):
return OnnxDirectEngine()
raise ValueError(f"unknown engine: {name!r}")

View File

@@ -0,0 +1,28 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
installRequirements
# We deliberately do NOT pre-bake any model weights here. Two reasons:
#
# 1. Weights should follow LocalAI's gallery-managed download flow
# like every other backend. For OpenCV Zoo (YuNet + SFace) the
# gallery entries in gallery/index.yaml list the ONNX files via
# `files:` with URI + SHA-256 — LocalAI fetches them into the
# models directory on `local-ai models install`.
#
# 2. For insightface model packs (buffalo_l, buffalo_s, buffalo_m,
# buffalo_sc, antelopev2), upstream distributes zip archives
# only (no individual ONNX URLs). We rely on insightface's own
# auto-download machinery (`FaceAnalysis(name=<pack>, root=<dir>)`)
# at first LoadModel, pointed at a writable directory. This
# matches how rfdetr behaves (uses `inference.get_model()`).
#
# Net effect: the backend image ships only Python deps (~150MB CPU).

View File

@@ -0,0 +1,7 @@
insightface
onnxruntime
opencv-python-headless
numpy
onnx
cython
scikit-image

View File

@@ -0,0 +1,7 @@
insightface
onnxruntime-gpu
opencv-python-headless
numpy
onnx
cython
scikit-image

View File

@@ -0,0 +1,3 @@
grpcio==1.71.0
protobuf
grpcio-tools

View File

@@ -0,0 +1,9 @@
#!/bin/bash
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
startBackend $@

View File

@@ -0,0 +1,264 @@
#!/usr/bin/env python3
"""Smoke-test every face recognition model configuration shipped in the
gallery. Simulates what LocalAI does at runtime: for each config, sets
up a models directory, fetches any required files via URL (as the
gallery's `files:` list would), then loads + detects + embeds via the
in-process BackendServicer — matching the gRPC surface end users hit.
Run inside the built backend image (venv already has insightface /
onnxruntime / opencv-python-headless):
python smoke.py
Network is required for the insightface packs (fetched via upstream's
FaceAnalysis auto-download at first LoadModel) and for downloading
the OpenCV Zoo ONNX files on first run.
"""
from __future__ import annotations
import base64
import hashlib
import os
import sys
import traceback
import urllib.request
import cv2
import numpy as np
sys.path.insert(0, os.path.dirname(__file__))
import backend_pb2 # noqa: E402
from backend import BackendServicer # noqa: E402
# Gallery `files:` for the OpenCV variants — same URIs + SHA-256s as
# gallery/index.yaml lists. Tuples: (filename, uri, sha256).
OPENCV_FILES = {
"fp32": [
(
"face_detection_yunet_2023mar.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
"8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
),
(
"face_recognition_sface_2021dec.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
"0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
),
],
"int8": [
(
"face_detection_yunet_2023mar_int8.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx",
"321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294",
),
(
"face_recognition_sface_2021dec_int8.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx",
"2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a",
),
],
}
CONFIGS = [
{
"name": "insightface-buffalo-l",
"options": ["engine:insightface", "model_pack:buffalo_l"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-sc",
"options": ["engine:insightface", "model_pack:buffalo_sc"],
# buffalo_sc has recognizer only — no landmarks, no genderage.
"has_analyze": False,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-s",
"options": ["engine:insightface", "model_pack:buffalo_s"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-buffalo-m",
"options": ["engine:insightface", "model_pack:buffalo_m"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-antelopev2",
"options": ["engine:insightface", "model_pack:antelopev2"],
"has_analyze": True,
"needs_opencv_files": None,
},
{
"name": "insightface-opencv",
"options": [
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar.onnx",
"recognizer_onnx:face_recognition_sface_2021dec.onnx",
],
"has_analyze": False,
"needs_opencv_files": "fp32",
},
{
"name": "insightface-opencv-int8",
"options": [
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar_int8.onnx",
"recognizer_onnx:face_recognition_sface_2021dec_int8.onnx",
],
"has_analyze": False,
"needs_opencv_files": "int8",
},
]
class _FakeContext:
def __init__(self) -> None:
self.code = None
self.details = None
def set_code(self, code):
self.code = code
def set_details(self, details):
self.details = details
def _encode_image(img: np.ndarray) -> str:
_, buf = cv2.imencode(".jpg", img)
return base64.b64encode(buf.tobytes()).decode("ascii")
def _load_sample_image() -> str:
from insightface.data import get_image as ins_get_image
return _encode_image(ins_get_image("t1"))
def _download_if_missing(model_dir: str, filename: str, uri: str, sha256: str) -> None:
dest = os.path.join(model_dir, filename)
if os.path.isfile(dest):
h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
if h == sha256:
return
sys.stderr.write(f" fetching {filename} from {uri}\n")
sys.stderr.flush()
urllib.request.urlretrieve(uri, dest)
h = hashlib.sha256(open(dest, "rb").read()).hexdigest()
if h != sha256:
raise RuntimeError(f"sha256 mismatch for {filename}: want {sha256}, got {h}")
def _run_one(cfg: dict, img_b64: str, model_dir: str) -> tuple[bool, str]:
# Mirror LocalAI's gallery flow: populate model_dir with the
# gallery's listed files before calling LoadModel.
if cfg["needs_opencv_files"]:
for filename, uri, sha256 in OPENCV_FILES[cfg["needs_opencv_files"]]:
_download_if_missing(model_dir, filename, uri, sha256)
svc = BackendServicer()
ctx = _FakeContext()
load_res = svc.LoadModel(
backend_pb2.ModelOptions(
Model=cfg["name"],
Options=cfg["options"],
# ModelPath is what the Go loader sets to ml.ModelPath —
# LocalAI's models directory. The backend anchors relative
# paths and insightface auto-download root here.
ModelPath=model_dir,
),
ctx,
)
if not load_res.success:
return False, f"LoadModel: {load_res.message}"
det_res = svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
if len(det_res.Detections) == 0:
return False, "Detect returned no faces"
for d in det_res.Detections:
if d.class_name != "face":
return False, f"Detect returned class_name={d.class_name!r}"
emb_ctx = _FakeContext()
emb_res = svc.Embedding(backend_pb2.PredictOptions(Images=[img_b64]), emb_ctx)
if emb_ctx.code is not None:
return False, f"Embedding set error code {emb_ctx.code}: {emb_ctx.details}"
if len(emb_res.embeddings) == 0:
return False, "Embedding returned empty vector"
norm_sq = sum(float(x) * float(x) for x in emb_res.embeddings)
if not (0.8 <= norm_sq <= 1.2):
return False, f"Embedding not L2-normed (sum(x^2)={norm_sq:.3f})"
ver_ctx = _FakeContext()
ver_res = svc.FaceVerify(
backend_pb2.FaceVerifyRequest(img1=img_b64, img2=img_b64), ver_ctx
)
if ver_ctx.code is not None:
return False, f"FaceVerify set error code {ver_ctx.code}: {ver_ctx.details}"
if not ver_res.verified:
return False, f"Same-image FaceVerify not verified (dist={ver_res.distance:.3f})"
if ver_res.distance > 0.1:
return False, f"Same-image distance suspiciously high ({ver_res.distance:.3f})"
if cfg["has_analyze"]:
an_ctx = _FakeContext()
an_res = svc.FaceAnalyze(backend_pb2.FaceAnalyzeRequest(img=img_b64), an_ctx)
if an_ctx.code is not None:
return False, f"FaceAnalyze set error code {an_ctx.code}: {an_ctx.details}"
if len(an_res.faces) == 0:
return False, "FaceAnalyze returned no faces"
f0 = an_res.faces[0]
if f0.age <= 0:
return False, f"FaceAnalyze age not populated (age={f0.age})"
if f0.dominant_gender not in ("Man", "Woman"):
return False, f"FaceAnalyze dominant_gender={f0.dominant_gender!r}"
n_dets = len(det_res.Detections)
dim = len(emb_res.embeddings)
return True, f"faces={n_dets} dim={dim} same-dist={ver_res.distance:.3f}"
def main() -> int:
# Honor LOCALAI_MODELS_PATH to re-use cached downloads across runs;
# default to a fresh temp dir.
model_dir = os.environ.get("LOCALAI_MODELS_PATH")
if not model_dir:
import tempfile
model_dir = tempfile.mkdtemp(prefix="face-smoke-")
os.makedirs(model_dir, exist_ok=True)
print(f"model_dir={model_dir}", file=sys.stderr)
print("Preparing sample image from insightface.data...", file=sys.stderr)
img_b64 = _load_sample_image()
results: list[tuple[str, bool, str]] = []
for cfg in CONFIGS:
sys.stderr.write(f"\n=== {cfg['name']} ===\n")
sys.stderr.flush()
try:
ok, detail = _run_one(cfg, img_b64, model_dir)
except Exception:
ok, detail = False, traceback.format_exc().splitlines()[-1]
results.append((cfg["name"], ok, detail))
print(f"{'PASS' if ok else 'FAIL'}: {cfg['name']:30s} {detail}")
sys.stdout.flush()
print("\n=== summary ===")
passed = sum(1 for _, ok, _ in results if ok)
total = len(results)
for name, ok, detail in results:
mark = "" if ok else ""
print(f" {mark} {name:30s} {detail}")
print(f"\n{passed}/{total} passed")
return 0 if passed == total else 1
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,234 @@
"""Unit tests for the insightface gRPC backend.
The servicer is instantiated in-process (no gRPC channel) and driven
directly. Images come from insightface.data which ships with the pip
package — no external downloads.
Tests are parametrized over both engines (InsightFaceEngine and
OnnxDirectEngine) where applicable.
"""
from __future__ import annotations
import base64
import os
import sys
import unittest
import cv2
import numpy as np
sys.path.insert(0, os.path.dirname(__file__))
import backend_pb2 # noqa: E402
from backend import BackendServicer # noqa: E402
# OpenCV Zoo face ONNX files — downloaded on demand in OnnxDirectEngineTest
# to mirror LocalAI's gallery `files:` flow (the backend image itself
# doesn't ship model weights).
OPENCV_FILES = [
(
"face_detection_yunet_2023mar.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx",
"8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4",
),
(
"face_recognition_sface_2021dec.onnx",
"https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx",
"0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79",
),
]
def _encode(img: np.ndarray) -> str:
_, buf = cv2.imencode(".jpg", img)
return base64.b64encode(buf.tobytes()).decode("ascii")
def _load_insightface_samples() -> dict[str, str]:
"""Return {'t1': <b64>, 't2': <b64>} from insightface.data.get_image.
t1 is a group photo, t2 a different one. We reuse both as
stand-ins for "Alice photo 1/2" and "Bob".
"""
from insightface.data import get_image as ins_get_image
return {
"t1": _encode(ins_get_image("t1")),
"t2": _encode(ins_get_image("t2")),
}
class _FakeContext:
"""Minimal stand-in for grpc.ServicerContext."""
def __init__(self) -> None:
self.code = None
self.details = None
def set_code(self, code):
self.code = code
def set_details(self, details):
self.details = details
class _Harness:
def __init__(self, servicer: BackendServicer) -> None:
self.svc = servicer
def health(self):
return self.svc.Health(backend_pb2.HealthMessage(), _FakeContext())
def load(self, options: list[str], model_path: str = ""):
return self.svc.LoadModel(
backend_pb2.ModelOptions(Model="test", Options=options, ModelPath=model_path),
_FakeContext(),
)
def detect(self, img_b64: str):
return self.svc.Detect(backend_pb2.DetectOptions(src=img_b64), _FakeContext())
def embed(self, img_b64: str):
ctx = _FakeContext()
res = self.svc.Embedding(
backend_pb2.PredictOptions(Images=[img_b64]),
ctx,
)
return res, ctx
def verify(self, a: str, b: str, threshold: float = 0.0):
return self.svc.FaceVerify(
backend_pb2.FaceVerifyRequest(img1=a, img2=b, threshold=threshold),
_FakeContext(),
)
def analyze(self, img_b64: str):
return self.svc.FaceAnalyze(
backend_pb2.FaceAnalyzeRequest(img=img_b64),
_FakeContext(),
)
class InsightFaceEngineTest(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.samples = _load_insightface_samples()
cls.harness = _Harness(BackendServicer())
load = cls.harness.load(["engine:insightface", "model_pack:buffalo_l"])
if not load.success:
raise unittest.SkipTest(f"LoadModel failed: {load.message}")
def test_health(self):
self.assertEqual(self.harness.health().message, b"OK")
def test_detect_finds_face(self):
res = self.harness.detect(self.samples["t1"])
self.assertGreater(len(res.Detections), 0)
for d in res.Detections:
self.assertEqual(d.class_name, "face")
self.assertGreater(d.width, 0)
self.assertGreater(d.height, 0)
def test_embedding_is_l2_normed(self):
res, ctx = self.harness.embed(self.samples["t1"])
self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
self.assertEqual(len(res.embeddings), 512)
norm_sq = sum(x * x for x in res.embeddings)
self.assertAlmostEqual(norm_sq, 1.0, places=2)
def test_verify_same_image(self):
res = self.harness.verify(self.samples["t1"], self.samples["t1"])
self.assertTrue(res.verified)
self.assertLess(res.distance, 0.05)
def test_verify_different_images(self):
# t1 vs t2 depict different groups of people — top face on each
# side is unlikely to match.
res = self.harness.verify(self.samples["t1"], self.samples["t2"])
# We assert only that some numerical answer came back; the
# matches-or-not determination depends on which face each side
# picked and isn't a stable test assertion.
self.assertGreaterEqual(res.distance, 0.0)
def test_analyze_has_age_and_gender(self):
res = self.harness.analyze(self.samples["t1"])
self.assertGreater(len(res.faces), 0)
for face in res.faces:
self.assertGreater(face.face_confidence, 0.0)
# Age should be populated for buffalo_l.
self.assertGreater(face.age, 0.0)
self.assertIn(face.dominant_gender, ("Man", "Woman"))
def _prepare_opencv_models_dir() -> str | None:
"""Download OpenCV Zoo face ONNX files into a temp dir the way
LocalAI's gallery would. Returns the directory, or None if
downloads failed (network-restricted sandbox).
"""
import hashlib
import tempfile
import urllib.request
root = os.environ.get("OPENCV_FACE_MODELS_DIR") or tempfile.mkdtemp(
prefix="opencv-face-"
)
for filename, uri, sha256 in OPENCV_FILES:
dest = os.path.join(root, filename)
if os.path.isfile(dest):
if hashlib.sha256(open(dest, "rb").read()).hexdigest() == sha256:
continue
try:
urllib.request.urlretrieve(uri, dest)
except Exception:
return None
if hashlib.sha256(open(dest, "rb").read()).hexdigest() != sha256:
return None
return root
class OnnxDirectEngineTest(unittest.TestCase):
@classmethod
def setUpClass(cls):
cls.samples = _load_insightface_samples()
cls.model_dir = _prepare_opencv_models_dir()
if cls.model_dir is None:
raise unittest.SkipTest("OpenCV Zoo ONNX files could not be downloaded")
cls.harness = _Harness(BackendServicer())
load = cls.harness.load(
[
"engine:onnx_direct",
"detector_onnx:face_detection_yunet_2023mar.onnx",
"recognizer_onnx:face_recognition_sface_2021dec.onnx",
],
model_path=cls.model_dir,
)
if not load.success:
raise unittest.SkipTest(f"LoadModel failed: {load.message}")
def test_detect_finds_face(self):
res = self.harness.detect(self.samples["t1"])
self.assertGreater(len(res.Detections), 0)
for d in res.Detections:
self.assertEqual(d.class_name, "face")
def test_embedding_nonempty(self):
res, ctx = self.harness.embed(self.samples["t1"])
self.assertIsNone(ctx.code, f"Embedding error: {ctx.details}")
self.assertGreater(len(res.embeddings), 0)
def test_verify_same_image(self):
res = self.harness.verify(self.samples["t1"], self.samples["t1"], threshold=0.4)
self.assertTrue(res.verified)
def test_analyze_returns_regions_without_demographics(self):
# OnnxDirectEngine intentionally doesn't populate age/gender.
res = self.harness.analyze(self.samples["t1"])
self.assertGreater(len(res.faces), 0)
for face in res.faces:
self.assertEqual(face.dominant_gender, "")
self.assertEqual(face.age, 0.0)
if __name__ == "__main__":
unittest.main()

View File

@@ -0,0 +1,11 @@
#!/bin/bash
set -e
backend_dir=$(dirname $0)
if [ -d $backend_dir/common ]; then
source $backend_dir/common/libbackend.sh
else
source $backend_dir/../common/libbackend.sh
fi
runUnittests

View File

@@ -1,6 +0,0 @@
# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
# the rocm7.2.1 runtime via AMD's forward-compatibility window.
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
whisperx @ git+https://github.com/m-bain/whisperX.git

View File

@@ -7,17 +7,28 @@ import (
"sync/atomic"
"time"
corebackend "github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/services/agentpool"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/templates"
pkggrpc "github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
"gorm.io/gorm"
)
// faceEmbeddingDim is the expected dimension for face embeddings.
// Set to 0 so the Registry accepts whatever dim the loaded recognizer
// produces — ArcFace R50 is 512-d, MBF is 512-d, SFace is 128-d, and
// the insightface backend can load any of them via LoadModel options.
// Locking this to a specific value would force a single recognizer
// family per deployment; we keep the door open instead.
const faceEmbeddingDim = 0
type Application struct {
backendLoader *config.ModelConfigLoader
modelLoader *model.ModelLoader
@@ -27,6 +38,7 @@ type Application struct {
galleryService *galleryop.GalleryService
agentJobService *agentpool.AgentJobService
agentPoolService atomic.Pointer[agentpool.AgentPoolService]
faceRegistry facerecognition.Registry
authDB *gorm.DB
watchdogMutex sync.Mutex
watchdogStop chan bool
@@ -50,12 +62,23 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
mcpTools.CloseMCPSessions(modelName)
})
return &Application{
app := &Application{
backendLoader: config.NewModelConfigLoader(appConfig.SystemState.Model.ModelsPath),
modelLoader: ml,
applicationConfig: appConfig,
templatesEvaluator: templates.NewEvaluator(appConfig.SystemState.Model.ModelsPath),
}
// Face-recognition registry backed by LocalAI's built-in vector store.
// The resolver closes over the ModelLoader so the Registry stays
// decoupled from loader plumbing; swapping in a postgres-backed
// implementation later is a single construction change here.
faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
return corebackend.StoreBackend(ml, appConfig, storeName, "")
}
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
return app
}
func (a *Application) ModelConfigLoader() *config.ModelConfigLoader {
@@ -99,6 +122,14 @@ func (a *Application) AgentPoolService() *agentpool.AgentPoolService {
return a.agentPoolService.Load()
}
// FaceRegistry returns the face-recognition registry used for 1:N
// identification. The current implementation is backed by the
// in-memory local-store backend; see core/services/facerecognition
// for the interface and the postgres TODO.
func (a *Application) FaceRegistry() facerecognition.Registry {
return a.faceRegistry
}
// AuthDB returns the auth database connection, or nil if auth is not enabled.
func (a *Application) AuthDB() *gorm.DB {
return a.authDB

View File

@@ -0,0 +1,60 @@
package backend
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
"github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
)
func FaceAnalyze(
img string,
actions []string,
antiSpoofing bool,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) (*proto.FaceAnalyzeResponse, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
startTime = time.Now()
}
res, err := faceModel.FaceAnalyze(context.Background(), &proto.FaceAnalyzeRequest{
Img: img,
Actions: actions,
AntiSpoofing: antiSpoofing,
})
if appConfig.EnableTracing {
errStr := ""
if err != nil {
errStr = err.Error()
}
trace.RecordBackendTrace(trace.BackendTrace{
Timestamp: startTime,
Duration: time.Since(startTime),
Type: trace.BackendTraceFaceAnalyze,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Error: errStr,
})
}
return res, err
}

View File

@@ -0,0 +1,43 @@
package backend
import (
"context"
"fmt"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/pkg/model"
)
// FaceEmbed loads the face recognition backend and returns a 512-d
// face embedding for the base64-encoded image. Unlike ModelEmbedding
// it passes the image through PredictOptions.Images — the insightface
// backend picks the highest-confidence face and returns its
// L2-normalized embedding.
func FaceEmbed(
imgBase64 string,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) ([]float32, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
predictOpts := gRPCPredictOpts(modelConfig, loader.ModelPath)
predictOpts.Images = []string{imgBase64}
res, err := faceModel.Embeddings(context.Background(), predictOpts)
if err != nil {
return nil, err
}
if len(res.Embeddings) == 0 {
return nil, fmt.Errorf("face embedding returned empty vector (no face detected?)")
}
return res.Embeddings, nil
}

View File

@@ -0,0 +1,61 @@
package backend
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/trace"
"github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
)
func FaceVerify(
img1, img2 string,
threshold float32,
antiSpoofing bool,
loader *model.ModelLoader,
appConfig *config.ApplicationConfig,
modelConfig config.ModelConfig,
) (*proto.FaceVerifyResponse, error) {
opts := ModelOptions(modelConfig, appConfig)
faceModel, err := loader.Load(opts...)
if err != nil {
recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil)
return nil, err
}
if faceModel == nil {
return nil, fmt.Errorf("could not load face recognition model")
}
var startTime time.Time
if appConfig.EnableTracing {
trace.InitBackendTracingIfEnabled(appConfig.TracingMaxItems)
startTime = time.Now()
}
res, err := faceModel.FaceVerify(context.Background(), &proto.FaceVerifyRequest{
Img1: img1,
Img2: img2,
Threshold: threshold,
AntiSpoofing: antiSpoofing,
})
if appConfig.EnableTracing {
errStr := ""
if err != nil {
errStr = err.Error()
}
trace.RecordBackendTrace(trace.BackendTrace{
Timestamp: startTime,
Duration: time.Since(startTime),
Type: trace.BackendTraceFaceVerify,
ModelName: modelConfig.Name,
Backend: modelConfig.Backend,
Error: errStr,
})
}
return res, err
}

View File

@@ -40,6 +40,12 @@ type TokenUsage struct {
ChatDeltas []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
}
func needsThinkingProbe(c *config.ModelConfig) bool {
return c.TemplateConfig.UseTokenizerTemplate &&
(c.ReasoningConfig.DisableReasoning == nil ||
c.ReasoningConfig.DisableReasoningTagPrefill == nil)
}
// HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
// Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
func (t TokenUsage) HasChatDeltaContent() bool {
@@ -100,11 +106,9 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
// tokenizer template path is active) and the multimodal media marker (needed
// by custom chat templates so markers line up with what mtmd expects).
// We probe whenever any of those slots is still empty.
needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
c.ReasoningConfig.DisableReasoning == nil &&
c.ReasoningConfig.DisableReasoningTagPrefill == nil
shouldProbeThinking := needsThinkingProbe(c)
needsMarkerProbe := c.MediaMarker == ""
if needsThinkingProbe || needsMarkerProbe {
if shouldProbeThinking || needsMarkerProbe {
modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
// Update the config in the loader so it persists for future requests

View File

@@ -0,0 +1,29 @@
package backend
import (
"github.com/mudler/LocalAI/core/config"
"github.com/gpustack/gguf-parser-go/util/ptr"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("thinking probe gating", func() {
It("probes tokenizer-template models when any reasoning default is still unset", func() {
cfg := &config.ModelConfig{
TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
}
Expect(needsThinkingProbe(cfg)).To(BeTrue())
cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
Expect(needsThinkingProbe(cfg)).To(BeTrue())
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
Expect(needsThinkingProbe(cfg)).To(BeFalse())
})
It("does not probe when tokenizer templates are disabled", func() {
cfg := &config.ModelConfig{}
Expect(needsThinkingProbe(cfg)).To(BeFalse())
})
})

View File

@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
app, err := application.New(opts...)
if err != nil {
return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n 1. Check that your models directory exists and is accessible: %s\n 2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n 3. Check available disk space and file permissions\n 4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
}
appHTTP, err := http.API(app)

View File

@@ -3,7 +3,6 @@ package cli
import (
"context"
"encoding/json"
"errors"
"fmt"
"strings"
@@ -60,7 +59,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {
c, exists := cl.GetModelConfig(t.Model)
if !exists {
return errors.New("model not found")
return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
}
c.Threads = &t.Threads

View File

@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {
func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
if len(u.Args) == 0 {
return fmt.Errorf("no GGUF file provided")
return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
}
// We try to guess only if we don't have a template defined already
f, err := gguf.ParseGGUFFile(u.Args[0])

View File

@@ -21,6 +21,7 @@ import (
"github.com/mudler/LocalAI/core/cli/workerregistry"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/storage"
@@ -597,12 +598,20 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
// Try to find the backend binary
backendPath := s.findBackend(req.Backend)
if backendPath == "" {
// Backend not found locally — try auto-installing from gallery
xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
if err := gallery.InstallBackendFromGallery(
context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
if req.URI != "" {
xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
if err := galleryop.InstallExternalBackend(
context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
}
} else {
xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
if err := gallery.InstallBackendFromGallery(
context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
}
}
// Re-register after install and retry
gallery.RegisterBackends(s.systemState, s.ml)

View File

@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
// Check if the token is set
// as we always need it.
if r.Token == "" {
return fmt.Errorf("Token is required")
return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
}
port, err := freeport.GetFreePort()

View File

@@ -125,19 +125,7 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
return
}
cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
// Use the rendered template to detect if thinking token is at the end
// This reuses the existing DetectThinkingStartToken function
if metadata.RenderedTemplate != "" {
thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
thinkingForcedOpen := thinkingStartToken != ""
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
} else {
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
}
applyDetectedThinkingConfig(cfg, metadata)
// Extract tool format markers from autoparser analysis
if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -180,3 +168,34 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
}
}
}
func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
if cfg == nil || metadata == nil {
return
}
// Respect explicit YAML/user config. Backend probing should only fill defaults
// when the reasoning mode has not already been set.
if cfg.ReasoningConfig.DisableReasoning == nil {
cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
}
// Respect explicit prefill config for the same reason. Only infer the
// default prefill behavior when the user did not set it.
if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
// Use the rendered template to detect if thinking token is at the end.
// This reuses the existing DetectThinkingStartToken function.
if metadata.RenderedTemplate != "" {
thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
thinkingForcedOpen := thinkingStartToken != ""
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
} else {
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
}
return
}
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
}

View File

@@ -0,0 +1,101 @@
package config
import (
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/reasoning"
"github.com/gpustack/gguf-parser-go/util/ptr"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("GGUF backend metadata reasoning defaults", func() {
It("fills reasoning defaults when unset", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
})
It("preserves fully explicit reasoning settings", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoning: ptr.To(true),
DisableReasoningTagPrefill: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
It("preserves explicit disable while still inferring missing prefill", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoning: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
})
It("preserves explicit prefill while still inferring missing disable flag", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoningTagPrefill: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
It("defaults to disabling reasoning when backend does not support thinking", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: false,
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
})

View File

@@ -588,6 +588,7 @@ const (
FLAG_VAD ModelConfigUsecase = 0b010000000000
FLAG_VIDEO ModelConfigUsecase = 0b100000000000
FLAG_DETECTION ModelConfigUsecase = 0b1000000000000
FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000
// Common Subsets
FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
@@ -611,6 +612,7 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
"FLAG_LLM": FLAG_LLM,
"FLAG_VIDEO": FLAG_VIDEO,
"FLAG_DETECTION": FLAG_DETECTION,
"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
}
}
@@ -651,7 +653,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
nonTextGenBackends := []string{
"whisper", "piper", "kokoro",
"diffusers", "stablediffusion", "stablediffusion-ggml",
"rerankers", "silero-vad", "rfdetr",
"rerankers", "silero-vad", "rfdetr", "insightface",
"transformers-musicgen", "ace-step", "acestep-cpp",
}
@@ -728,12 +730,19 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
}
if (u & FLAG_DETECTION) == FLAG_DETECTION {
detectionBackends := []string{"rfdetr", "sam3-cpp"}
detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
if !slices.Contains(detectionBackends, c.Backend) {
return false
}
}
if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
faceBackends := []string{"insightface"}
if !slices.Contains(faceBackends, c.Backend) {
return false
}
}
if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
if !slices.Contains(soundGenBackends, c.Backend) {

View File

@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
bcl.configs[c.Name] = *c
} else {
if err != nil {
return fmt.Errorf("config is not valid: %w", err)
return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
}
return fmt.Errorf("config is not valid")
return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
}
return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
files = append(files, info)
}
for _, file := range files {
// Skip templates, YAML and .keep files
if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
strings.HasPrefix(file.Name(), ".") {
// Only load real YAML config files and ignore dotfiles or backup variants
ext := strings.ToLower(filepath.Ext(file.Name()))
if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
continue
}

View File

@@ -2,6 +2,7 @@ package config
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
@@ -109,5 +110,50 @@ options:
Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))
})
It("Only loads files ending with yaml or yml", func() {
tmpdir, err := os.MkdirTemp("", "model-config-loader")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpdir)
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
`name: "foo-model"
description: "formal config"
backend: "llama-cpp"
parameters:
model: "foo.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
`name: "foo-model"
description: "backup config"
backend: "llama-cpp"
parameters:
model: "foo-backup.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
`name: "foo-backup-only"
description: "timestamped backup config"
backend: "llama-cpp"
parameters:
model: "foo-timestamped.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
bcl := NewModelConfigLoader(tmpdir)
err = bcl.LoadModelConfigsFromPath(tmpdir)
Expect(err).ToNot(HaveOccurred())
configs := bcl.GetAllModelsConfigs()
Expect(configs).To(HaveLen(1))
Expect(configs[0].Name).To(Equal("foo-model"))
Expect(configs[0].Description).To(Equal("formal config"))
_, exists := bcl.GetModelConfig("foo-backup-only")
Expect(exists).To(BeFalse())
})
})
})

View File

@@ -110,7 +110,13 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
if err != nil {
return err
}
if backends.Exists(name) {
// Only short-circuit if the install is *actually usable*. An orphaned
// meta entry whose concrete was removed still shows up in
// ListSystemBackends with a RunFile pointing at a path that no longer
// exists; returning early there leaves the caller with a broken
// alias and the worker fails with "backend not found after install
// attempt" on every retry. Re-install in that case.
if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
return nil
}
}
@@ -375,17 +381,44 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
}
if metadata != nil && metadata.MetaBackendFor != "" {
metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
// If the concrete the meta points to is already gone (earlier delete,
// partial install, or manual cleanup), keep going and remove the
// orphaned meta dir. Previously we returned an error here, which made
// the orphaned meta impossible to uninstall from the UI — the delete
// kept failing and every subsequent install short-circuited because
// the stale meta metadata made ListSystemBackends.Exists(name) true.
if _, statErr := os.Stat(concreteDirectory); statErr == nil {
os.RemoveAll(concreteDirectory)
} else if os.IsNotExist(statErr) {
xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
"meta", name, "concrete", metadata.MetaBackendFor)
} else {
return statErr
}
os.RemoveAll(metaBackendDirectory)
}
return os.RemoveAll(backendDirectory)
}
// isBackendRunnable reports whether the given backend entry can actually be
// invoked. A meta backend is runnable only if its concrete's run.sh still
// exists on disk; concrete backends are considered runnable as long as their
// RunFile is set (ListSystemBackends only emits them when the runfile is
// present). Used to guard the "already installed" short-circuit so an
// orphaned meta pointing at a missing concrete triggers a real reinstall
// rather than being silently skipped.
func isBackendRunnable(b SystemBackend) bool {
if b.RunFile == "" {
return false
}
if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
return false
}
return true
}
type SystemBackend struct {
Name string
RunFile string

View File

@@ -952,6 +952,58 @@ var _ = Describe("Gallery Backends", func() {
err = DeleteBackendFromSystem(systemState, "non-existent")
Expect(err).To(HaveOccurred())
})
It("removes an orphaned meta backend whose concrete is missing", func() {
// Real scenario from the dev cluster: the concrete got wiped
// (partial install, manual cleanup, previous crash) but the meta
// directory + metadata.json still points at it. The old code
// errored with "meta backend X not found" and left the orphan in
// place, making the backend impossible to uninstall.
metaName := "meta-backend"
concreteName := "concrete-backend-that-vanished"
metaPath := filepath.Join(tempDir, metaName)
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
data, err := json.MarshalIndent(meta, "", " ")
Expect(err).NotTo(HaveOccurred())
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
// Concrete directory intentionally absent.
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
Expect(err).NotTo(HaveOccurred())
Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
Expect(metaPath).NotTo(BeADirectory())
})
})
Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
It("re-runs install when the meta's concrete is missing", func() {
// Seed state: meta dir exists with metadata pointing at a
// concrete that was removed from disk. ListSystemBackends still
// surfaces the meta via its metadata.Name → the old short-circuit
// at `if backends.Exists(name) { return nil }` returned silently,
// leaving the worker's findBackend() with a dead alias forever.
// The fix: require the backend to be runnable before we skip.
metaName := "meta-orphan"
concreteName := "concrete-gone"
metaPath := filepath.Join(tempDir, metaName)
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
data, err := json.MarshalIndent(meta, "", " ")
Expect(err).NotTo(HaveOccurred())
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
Expect(err).NotTo(HaveOccurred())
listed, err := ListSystemBackends(systemState)
Expect(err).NotTo(HaveOccurred())
b, ok := listed.Get(metaName)
Expect(ok).To(BeTrue())
Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
})
})
Describe("ListSystemBackends", func() {

View File

@@ -57,6 +57,14 @@ var RouteFeatureRegistry = []RouteFeature{
// Detection
{"POST", "/v1/detection", FeatureDetection},
// Face recognition
{"POST", "/v1/face/verify", FeatureFaceRecognition},
{"POST", "/v1/face/analyze", FeatureFaceRecognition},
{"POST", "/v1/face/embed", FeatureFaceRecognition},
{"POST", "/v1/face/register", FeatureFaceRecognition},
{"POST", "/v1/face/identify", FeatureFaceRecognition},
{"POST", "/v1/face/forget", FeatureFaceRecognition},
// Video
{"POST", "/video", FeatureVideo},
@@ -151,5 +159,6 @@ func APIFeatureMetas() []FeatureMeta {
{FeatureTokenize, "Tokenize", true},
{FeatureMCP, "MCP", true},
{FeatureStores, "Stores", true},
{FeatureFaceRecognition, "Face Recognition", true},
}
}

View File

@@ -51,6 +51,7 @@ const (
FeatureTokenize = "tokenize"
FeatureMCP = "mcp"
FeatureStores = "stores"
FeatureFaceRecognition = "face_recognition"
)
// AgentFeatures lists agent-related features (default OFF).
@@ -64,6 +65,7 @@ var APIFeatures = []string{
FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings, FeatureSound,
FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP, FeatureStores,
FeatureFaceRecognition,
}
// AllFeatures lists all known features (used by UI and validation).

View File

@@ -73,6 +73,12 @@ var instructionDefs = []instructionDef{
Description: "Video generation from text prompts",
Tags: []string{"video"},
},
{
Name: "face-recognition",
Description: "Face verification (1:1), identification (1:N), embedding, and demographic analysis",
Tags: []string{"face-recognition"},
Intro: "The /v1/face/register, /identify, and /forget endpoints build on a vector store — registrations are in-memory by default and lost on restart. Use /v1/face/embed for a raw embedding; /v1/embeddings is OpenAI-compatible and text-only.",
},
}
// swaggerState holds parsed swagger spec data, initialised once.

View File

@@ -39,7 +39,7 @@ var _ = Describe("API Instructions Endpoints", func() {
instructions, ok := resp["instructions"].([]any)
Expect(ok).To(BeTrue())
Expect(instructions).To(HaveLen(9))
Expect(instructions).To(HaveLen(10))
// Verify each instruction has required fields and correct URL format
for _, s := range instructions {
@@ -73,6 +73,7 @@ var _ = Describe("API Instructions Endpoints", func() {
"model-management",
"monitoring",
"agents",
"face-recognition",
))
})
})

View File

@@ -9,19 +9,26 @@ import (
// BackendMonitorEndpoint returns the status of the specified backend
// @Summary Backend monitor endpoint
// @Tags monitoring
// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
// @Param model query string true "Name of the model to monitor"
// @Success 200 {object} proto.StatusResponse "Response"
// @Router /backend/monitor [get]
func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
return func(c echo.Context) error {
input := new(schema.BackendMonitorRequest)
// Get input data from the request body
if err := c.Bind(input); err != nil {
return err
model := c.QueryParam("model")
// Fall back to binding the request body so pre-existing clients that
// sent `{"model": "..."}` with GET keep working.
if model == "" {
input := new(schema.BackendMonitorRequest)
if err := c.Bind(input); err != nil {
return err
}
model = input.Model
}
if model == "" {
return echo.NewHTTPError(400, "model query parameter is required")
}
resp, err := bm.CheckAndSample(input.Model)
resp, err := bm.CheckAndSample(model)
if err != nil {
return err
}

View File

@@ -9,7 +9,6 @@ import (
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/mudler/xlog"
)
@@ -34,14 +33,14 @@ func DetectionEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC
xlog.Debug("Detection", "image", input.Image, "modelFile", "modelFile", "backend", cfg.Backend)
image, err := utils.GetContentURIAsBase64(input.Image)
image, err := decodeImageInput(input.Image)
if err != nil {
return err
}
res, err := backend.Detection(image, input.Prompt, input.Points, input.Boxes, input.Threshold, ml, appConfig, *cfg)
if err != nil {
return err
return mapBackendError(err)
}
response := schema.DetectionResponse{

View File

@@ -0,0 +1,69 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceAnalyzeEndpoint returns demographic attributes for faces in an image.
// @Summary Analyze demographic attributes (age, gender, ...) of faces.
// @Tags face-recognition
// @Param request body schema.FaceAnalyzeRequest true "query params"
// @Success 200 {object} schema.FaceAnalyzeResponse "Response"
// @Router /v1/face/analyze [post]
func FaceAnalyzeEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceAnalyzeRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceAnalyze", "model", cfg.Name, "backend", cfg.Backend, "actions", input.Actions)
res, err := backend.FaceAnalyze(img, input.Actions, input.AntiSpoofing, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
response := schema.FaceAnalyzeResponse{
Faces: make([]schema.FaceAnalysis, len(res.GetFaces())),
}
for i, f := range res.GetFaces() {
response.Faces[i] = schema.FaceAnalysis{
Region: schema.FacialArea{
X: f.GetRegion().GetX(),
Y: f.GetRegion().GetY(),
W: f.GetRegion().GetW(),
H: f.GetRegion().GetH(),
},
FaceConfidence: f.GetFaceConfidence(),
Age: f.GetAge(),
DominantGender: f.GetDominantGender(),
Gender: f.GetGender(),
DominantEmotion: f.GetDominantEmotion(),
Emotion: f.GetEmotion(),
DominantRace: f.GetDominantRace(),
Race: f.GetRace(),
IsReal: f.GetIsReal(),
AntispoofScore: f.GetAntispoofScore(),
}
}
return c.JSON(http.StatusOK, response)
}
}

View File

@@ -0,0 +1,54 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceEmbedEndpoint extracts a face embedding vector from an image.
//
// Distinct from /v1/embeddings, which is OpenAI-compatible and text-only
// by contract (its `input` field is a string or string list of TEXT to
// embed). Passing an image data-URI to /v1/embeddings does not work —
// use this endpoint instead.
//
// @Summary Extract a face embedding from an image.
// @Tags face-recognition
// @Param request body schema.FaceEmbedRequest true "query params"
// @Success 200 {object} schema.FaceEmbedResponse "Response"
// @Router /v1/face/embed [post]
func FaceEmbedEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceEmbedRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceEmbed", "model", cfg.Name, "backend", cfg.Backend)
vec, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
return c.JSON(http.StatusOK, schema.FaceEmbedResponse{
Embedding: vec,
Dim: len(vec),
Model: cfg.Name,
})
}
}

View File

@@ -0,0 +1,45 @@
package localai
import (
"errors"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/xlog"
)
// FaceForgetEndpoint removes a previously-registered face by ID.
// @Summary Remove a previously-registered face by ID.
// @Tags face-recognition
// @Param request body schema.FaceForgetRequest true "query params"
// @Success 204 "No Content"
// @Router /v1/face/forget [post]
func FaceForgetEndpoint(registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceForgetRequest)
if !ok {
// Forget doesn't need a face model loaded — fall back to a raw bind
// when the request extractor hasn't run (e.g. when the route was
// registered without SetModelAndConfig).
input = new(schema.FaceForgetRequest)
if err := c.Bind(input); err != nil {
return echo.ErrBadRequest
}
}
if input.ID == "" {
return echo.NewHTTPError(http.StatusBadRequest, "id is required")
}
xlog.Debug("FaceForget", "id", input.ID)
if err := registry.Forget(c.Request().Context(), input.ID); err != nil {
if errors.Is(err, facerecognition.ErrNotFound) {
return echo.NewHTTPError(http.StatusNotFound, err.Error())
}
return err
}
return c.NoContent(http.StatusNoContent)
}
}

View File

@@ -0,0 +1,80 @@
package localai
import (
"cmp"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// defaultIdentifyThreshold is the cosine-distance cutoff applied when
// the client does not specify one. Tuned for buffalo_l ArcFace R50;
// other recognizers (e.g. SFace) should override it explicitly.
const defaultIdentifyThreshold = float32(0.35)
// FaceIdentifyEndpoint runs 1:N identification against the registered store.
// @Summary Identify a face against the registered database (1:N recognition).
// @Tags face-recognition
// @Param request body schema.FaceIdentifyRequest true "query params"
// @Success 200 {object} schema.FaceIdentifyResponse "Response"
// @Router /v1/face/identify [post]
func FaceIdentifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceIdentifyRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
topK := cmp.Or(input.TopK, 5)
threshold := cmp.Or(input.Threshold, defaultIdentifyThreshold)
xlog.Debug("FaceIdentify", "model", cfg.Name, "topK", topK, "threshold", threshold)
probe, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
matches, err := registry.Identify(c.Request().Context(), probe, topK)
if err != nil {
return err
}
response := schema.FaceIdentifyResponse{
Matches: make([]schema.FaceIdentifyMatch, len(matches)),
}
for i, m := range matches {
confidence := (1 - m.Distance/threshold) * 100
if confidence < 0 {
confidence = 0
}
if confidence > 100 {
confidence = 100
}
response.Matches[i] = schema.FaceIdentifyMatch{
ID: m.ID,
Name: m.Metadata.Name,
Labels: m.Metadata.Labels,
Distance: m.Distance,
Confidence: confidence,
Match: m.Distance <= threshold,
}
}
return c.JSON(http.StatusOK, response)
}
}

View File

@@ -0,0 +1,60 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceRegisterEndpoint enrolls a face into the 1:N identification store.
// @Summary Register a face for 1:N identification.
// @Tags face-recognition
// @Param request body schema.FaceRegisterRequest true "query params"
// @Success 200 {object} schema.FaceRegisterResponse "Response"
// @Router /v1/face/register [post]
func FaceRegisterEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, registry facerecognition.Registry) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceRegisterRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
if input.Name == "" {
return echo.NewHTTPError(http.StatusBadRequest, "name is required")
}
img, err := decodeImageInput(input.Img)
if err != nil {
return err
}
xlog.Debug("FaceRegister", "model", cfg.Name, "name", input.Name)
embedding, err := backend.FaceEmbed(img, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
stored, err := registry.Register(c.Request().Context(), embedding, facerecognition.Metadata{
Name: input.Name,
Labels: input.Labels,
})
if err != nil {
return err
}
return c.JSON(http.StatusOK, schema.FaceRegisterResponse{
ID: stored.ID,
Name: stored.Name,
RegisteredAt: stored.RegisteredAt,
})
}
}

View File

@@ -0,0 +1,68 @@
package localai
import (
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// FaceVerifyEndpoint compares two images and reports whether they depict the same person.
// @Summary Verify that two images depict the same person.
// @Tags face-recognition
// @Param request body schema.FaceVerifyRequest true "query params"
// @Success 200 {object} schema.FaceVerifyResponse "Response"
// @Router /v1/face/verify [post]
func FaceVerifyEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.FaceVerifyRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
img1, err := decodeImageInput(input.Img1)
if err != nil {
return err
}
img2, err := decodeImageInput(input.Img2)
if err != nil {
return err
}
xlog.Debug("FaceVerify", "model", cfg.Name, "backend", cfg.Backend)
res, err := backend.FaceVerify(img1, img2, input.Threshold, input.AntiSpoofing, ml, appConfig, *cfg)
if err != nil {
return mapBackendError(err)
}
return c.JSON(http.StatusOK, schema.FaceVerifyResponse{
Verified: res.GetVerified(),
Distance: res.GetDistance(),
Threshold: res.GetThreshold(),
Confidence: res.GetConfidence(),
Model: res.GetModel(),
Img1Area: schema.FacialArea{
X: res.GetImg1Area().GetX(),
Y: res.GetImg1Area().GetY(),
W: res.GetImg1Area().GetW(),
H: res.GetImg1Area().GetH(),
},
Img2Area: schema.FacialArea{
X: res.GetImg2Area().GetX(),
Y: res.GetImg2Area().GetY(),
W: res.GetImg2Area().GetW(),
H: res.GetImg2Area().GetH(),
},
ProcessingTimeMs: res.GetProcessingTimeMs(),
})
}
}

View File

@@ -0,0 +1,55 @@
package localai
import (
"fmt"
"net/http"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/pkg/utils"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
// decodeImageInput resolves a URL, data-URI, or plain-string image
// input to a base64 payload ready for the gRPC surface. Errors from
// the underlying utils helper (bad URL, not a data-URI, download
// failure, etc.) are all caused by what the client sent — we surface
// them as 400 rather than the default 500 so API consumers can
// distinguish "you sent bad input" from "our server broke".
//
// This is the single-input path for endpoints where the image IS the
// request (detection, face recognition, etc.). The multi-modal message
// paths (chat completions, responses API, realtime) intentionally
// log-and-skip individual media parts; they don't use this helper.
func decodeImageInput(s string) (string, error) {
img, err := utils.GetContentURIAsBase64(s)
if err != nil {
return "", echo.NewHTTPError(http.StatusBadRequest, fmt.Sprintf("invalid image input: %v", err))
}
return img, nil
}
// mapBackendError converts the gRPC status code a backend returns into
// a matching HTTP status. Without this, every backend error defaults
// to 500 — which lies to API consumers when the backend is telling us
// "your input was bad" (INVALID_ARGUMENT) or "the resource doesn't
// exist" (NOT_FOUND). Pass any err from a `core/backend/*` call
// through this before returning from a handler.
func mapBackendError(err error) error {
if err == nil {
return nil
}
if st, ok := status.FromError(err); ok {
switch st.Code() {
case codes.InvalidArgument:
return echo.NewHTTPError(http.StatusBadRequest, st.Message())
case codes.NotFound:
return echo.NewHTTPError(http.StatusNotFound, st.Message())
case codes.FailedPrecondition:
return echo.NewHTTPError(http.StatusPreconditionFailed, st.Message())
case codes.Unimplemented:
return echo.NewHTTPError(http.StatusNotImplemented, st.Message())
}
}
return err
}

View File

@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
if err := c.Bind(&req); err != nil || req.Backend == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
}
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
if err != nil {
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))

View File

@@ -110,6 +110,27 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
})
}
// The UI reads ApiKeys from GET /api/settings, which already returns the
// merged env+runtime list. When the user clicks Save, the same merged
// list comes back in the POST body. Strip the env-supplied keys from
// the incoming list before we persist or re-merge, otherwise each save
// duplicates the env keys on top of the previous merge (#9071).
if settings.ApiKeys != nil {
envKeys := startupConfig.ApiKeys
envSet := make(map[string]struct{}, len(envKeys))
for _, k := range envKeys {
envSet[k] = struct{}{}
}
runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
for _, k := range *settings.ApiKeys {
if _, fromEnv := envSet[k]; fromEnv {
continue
}
runtimeOnly = append(runtimeOnly, k)
}
settings.ApiKeys = &runtimeOnly
}
settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
settingsJSON, err := json.MarshalIndent(settings, "", " ")
if err != nil {

View File

@@ -147,6 +147,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
result := ""
lastEmittedCount := 0
sentInitialRole := false
sentReasoning := false
hasChatDeltaToolCalls := false
hasChatDeltaContent := false
@@ -190,6 +191,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
}},
Object: "chat.completion.chunk",
}
sentReasoning = true
}
// Stream content deltas (cleaned of reasoning tags) while no tool calls
@@ -363,7 +365,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
}
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0
// noAction is a sentinel "just answer" pseudo-function — not a real
// tool call. Scan the whole slice rather than only index 0 so we
// don't drop a real tool call that happens to follow a noAction
// entry, and so the default branch isn't entered with only noAction
// entries to emit as tool_calls.
noActionToRun := !hasRealCall(functionResults, noAction)
switch {
case noActionToRun:
@@ -377,108 +384,31 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
}
if sentInitialRole {
// Content was already streamed during the callback — just emit usage.
delta := &schema.Message{}
if reasoning != "" && extractor.Reasoning() == "" {
delta.Reasoning = &reasoning
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
}
} else {
// Content was NOT streamed — send everything at once (fallback).
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
Object: "chat.completion.chunk",
}
result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
if err != nil {
xlog.Error("error handling question", "error", err)
return err
}
delta := &schema.Message{Content: &result}
if reasoning != "" {
delta.Reasoning = &reasoning
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
var result string
if !sentInitialRole {
var hqErr error
result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
if hqErr != nil {
xlog.Error("error handling question", "error", hqErr)
return hqErr
}
}
for _, chunk := range buildNoActionFinalChunks(
id, req.Model, created,
sentInitialRole, sentReasoning,
result, reasoning, usage,
) {
responses <- chunk
}
default:
for i, ss := range functionResults {
name, args := ss.Name, ss.Arguments
toolCallID := ss.ID
if toolCallID == "" {
toolCallID = id
}
if i < lastEmittedCount {
// Already emitted during streaming by the incremental
// JSON/XML parser — skip to avoid duplicate tool calls.
continue
}
// Tool call not yet emitted — send name + args (two chunks).
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: name,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
responses <- initialMessage
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
Content: textContentToReturn,
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Arguments: args,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
for _, chunk := range buildDeferredToolCallChunks(
id, req.Model, created,
functionResults, lastEmittedCount,
sentInitialRole, *textContentToReturn,
sentReasoning, reasoning,
) {
responses <- chunk
}
}

View File

@@ -0,0 +1,233 @@
package openai
import (
"fmt"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
)
// hasRealCall reports whether functionResults contains at least one
// entry whose Name is something other than the noAction sentinel.
// Used by processTools to decide between the "answer the question"
// path and the real tool-call flush.
func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
for _, fc := range functionResults {
if fc.Name != noAction {
return true
}
}
return false
}
// buildNoActionFinalChunks produces the closing SSE chunks for the
// noActionToRun branch of processTools (i.e. the model chose the "answer"
// pseudo-function or emitted no tool calls at all).
//
// When content was already streamed (contentAlreadyStreamed=true) the
// helper emits a single trailing usage chunk, optionally carrying
// reasoning that was produced but not streamed incrementally. When
// content was not streamed it emits a role chunk followed by a
// content+reasoning+usage chunk — the "send everything at once" fallback.
//
// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
// probing the extractor's Go-side state: the C++ autoparser delivers
// reasoning through ProcessChatDeltaReasoning which populates a
// separate accumulator that extractor.Reasoning() does not expose.
// Without this guard the callback would stream reasoning incrementally
// and the final chunk would duplicate it.
func buildNoActionFinalChunks(
id, model string,
created int,
contentAlreadyStreamed bool,
reasoningAlreadyStreamed bool,
content string,
reasoning string,
usage schema.OpenAIUsage,
) []schema.OpenAIResponse {
var out []schema.OpenAIResponse
if contentAlreadyStreamed {
delta := &schema.Message{}
if reasoning != "" && !reasoningAlreadyStreamed {
r := reasoning
delta.Reasoning = &r
}
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
})
return out
}
// Content was not streamed — send role, then content (+reasoning) + usage.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Role: "assistant"},
Index: 0,
}},
Object: "chat.completion.chunk",
})
c := content
delta := &schema.Message{Content: &c}
if reasoning != "" && !reasoningAlreadyStreamed {
r := reasoning
delta.Reasoning = &r
}
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
})
return out
}
// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
// were discovered only during final parsing (i.e. after the streaming
// callback finished). The caller forwards every returned chunk to the
// responses channel.
//
// Guarantees:
// - tool calls with i < lastEmittedCount are skipped (already streamed)
// - each emitted call yields two chunks: name-only, then args-only
// - no chunk ever carries both non-empty Content and non-empty ToolCalls
// - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
// - if !reasoningAlreadyStreamed && reasoningContent != "",
// a reasoning chunk is emitted first
// - if !contentAlreadyStreamed && textContent != "",
// a role chunk followed by a content chunk is emitted (after reasoning)
// - chunks order: [reasoning?] [role+content?] (name, args)+
// - fallback IDs for empty ss.ID are unique per index so a client can
// match tool_result messages back to the right call
func buildDeferredToolCallChunks(
id, model string,
created int,
functionResults []functions.FuncCallResults,
lastEmittedCount int,
contentAlreadyStreamed bool,
textContent string,
reasoningAlreadyStreamed bool,
reasoningContent string,
) []schema.OpenAIResponse {
// If every call was already emitted incrementally there's nothing to
// flush — and no reason to emit a standalone reasoning/content chunk.
hasDeferred := false
for i := range functionResults {
if i >= lastEmittedCount {
hasDeferred = true
break
}
}
if !hasDeferred {
return nil
}
var out []schema.OpenAIResponse
// Reasoning first — the callback path at processTools emits reasoning
// incrementally in its own chunks, but when the C++ autoparser only
// surfaces reasoning as a final aggregate the callback never sees it.
// Recover it here (no duplication: contentAlreadyStreamed and
// reasoningAlreadyStreamed track what the callback already sent).
if !reasoningAlreadyStreamed && reasoningContent != "" {
r := reasoningContent
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Reasoning: &r},
Index: 0,
}},
Object: "chat.completion.chunk",
})
}
// Then content, when it wasn't streamed via the callback. Emit role
// and content in separate deltas — the OpenAI streaming contract
// forbids bundling content alongside tool_calls in one delta.
if !contentAlreadyStreamed && textContent != "" {
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Role: "assistant"},
Index: 0,
}},
Object: "chat.completion.chunk",
})
c := textContent
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Content: &c},
Index: 0,
}},
Object: "chat.completion.chunk",
})
}
for i, ss := range functionResults {
if i < lastEmittedCount {
// Already streamed by the incremental JSON/XML parser during
// the token callback — skip to avoid a duplicate emission.
continue
}
toolCallID := ss.ID
if toolCallID == "" {
// Unique per-index fallback so multiple empty-ID calls don't
// collide on the same request ID (clients match tool results
// back by tool_call_id).
toolCallID = fmt.Sprintf("%s-%d", id, i)
}
// Name chunk.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: ss.Name,
},
}},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
})
// Args chunk — no Content here. Either it was streamed through
// the token callback earlier, or the role+content pair above
// already delivered it.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Arguments: ss.Arguments,
},
}},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
})
}
return out
}

View File

@@ -0,0 +1,717 @@
package openai
import (
"fmt"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// contentOf extracts the string payload from a chunk's delta.Content,
// transparently handling both *string and string underlying types so
// assertions don't have to care which one the helper produced.
func contentOf(ch schema.OpenAIResponse) string {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return ""
}
switch v := ch.Choices[0].Delta.Content.(type) {
case *string:
if v == nil {
return ""
}
return *v
case string:
return v
default:
return ""
}
}
// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
// *string on schema.Message.
func reasoningOf(ch schema.OpenAIResponse) string {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return ""
}
r := ch.Choices[0].Delta.Reasoning
if r == nil {
return ""
}
return *r
}
// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return nil
}
return ch.Choices[0].Delta.ToolCalls
}
// expectSpecCompliant enforces the invariants on every chunk:
// - Object == "chat.completion.chunk"
// - Exactly one Choice with Index==0
// - No delta ever carries both non-empty Content and non-empty ToolCalls
// - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
func expectSpecCompliant(chunks []schema.OpenAIResponse) {
for i, ch := range chunks {
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)
hasContent := contentOf(ch) != ""
hasReasoning := reasoningOf(ch) != ""
hasToolCalls := len(toolCallsOf(ch)) > 0
if hasContent && hasToolCalls {
Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
}
if hasReasoning && hasToolCalls {
Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
}
}
}
// expectMetadata asserts every chunk carries the same id/model/created.
func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
for i, ch := range chunks {
Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
}
}
var _ = Describe("buildDeferredToolCallChunks", func() {
const (
testID = "req"
testModel = "test-model"
testCreated = 1700000000
)
Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
It("emits only the tool_call chunks, no Content anywhere", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "Let me search…",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2), "two chunks: name, args")
// Name chunk
tc0 := toolCallsOf(chunks[0])
Expect(tc0).To(HaveLen(1))
Expect(tc0[0].Index).To(Equal(0))
Expect(tc0[0].ID).To(Equal("tc1"))
Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
Expect(contentOf(chunks[0])).To(BeEmpty())
// Args chunk — MUST NOT carry Content
tc1 := toolCallsOf(chunks[1])
Expect(tc1).To(HaveLen(1))
Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
Expect(contentOf(chunks[1])).To(BeEmpty(),
"args chunk must not duplicate already-streamed content")
})
})
Describe("Case B — autoparser / content not streamed", func() {
It("emits role, content, then name+args", func() {
results := []functions.FuncCallResults{
{Name: "do", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "Here is my plan…",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(4), "role, content, name, args")
// Role chunk
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
// Content chunk
Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
Expect(toolCallsOf(chunks[1])).To(BeEmpty())
// Name + args chunks
Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case C — multiple deferred calls, content already streamed", func() {
It("emits (name, args) × 3 with no Content anywhere", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tcA"},
{Name: "b", Arguments: "{}", ID: "tcB"},
{Name: "c", Arguments: "{}", ID: "tcC"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "some narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(6))
for i := 0; i < 3; i++ {
Expect(contentOf(chunks[2*i])).To(BeEmpty(),
"call #%d name chunk must not carry Content", i)
Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
"call #%d args chunk must not carry Content", i)
Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
}
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
})
})
Describe("Case D — partial incremental emission", func() {
It("emits only the deferred tail (call #1), skipping #0", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 1,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case E — all calls already emitted incrementally", func() {
It("emits nothing", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 2,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(BeEmpty())
})
})
Describe("Case F — content not streamed but textContent empty", func() {
It("emits only the tool call chunks, no leading role/content", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: "tcX"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
It("emits a deterministic per-index fallback", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: ""},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
expectedID := fmt.Sprintf("%s-%d", testID, 0)
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
})
})
Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
It("avoids the collision bug where every empty-ID call shared the request id", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: ""},
{Name: "b", Arguments: "{}", ID: ""},
{Name: "c", Arguments: "{}", ID: ""},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(6))
ids := map[string]int{}
for _, ch := range chunks {
for _, tc := range toolCallsOf(ch) {
ids[tc.ID]++
}
}
// Each call yields a name chunk + args chunk → each distinct ID
// should appear in exactly two chunks. Three distinct IDs
// overall.
Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
for id, n := range ids {
Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
}
})
})
Describe("Case H — indices preserved across skip with multiple calls", func() {
It("emits Index fields matching functionResults positions", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
{Name: "c", Arguments: "{}", ID: "tc2"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 1,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(4))
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
})
})
Describe("Case I — explicit non-empty ID is preserved", func() {
It("does not touch ss.ID when it's already set", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: "abc123"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
})
})
Describe("Case J — chunk-shape sanity", func() {
It("splits Name into the first chunk and Arguments into the second", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
})
})
Describe("Case K — metadata propagation", func() {
It("stamps every chunk with the same id/model/created", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tcA"},
{Name: "b", Arguments: "{}", ID: "tcB"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "hello",
true, "",
)
expectSpecCompliant(chunks)
expectMetadata(chunks, testID, testModel, testCreated)
})
})
Describe("Case L — Choices[0].Index == 0 invariant", func() {
It("is upheld across every branch the helper can take", func() {
scenarios := []struct {
name string
functionResults []functions.FuncCallResults
lastEmittedCount int
contentStreamed bool
text string
reasoningStreamed bool
reasoning string
}{
{"streamed-content-deferred-call",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, true, "hi", true, ""},
{"unstreamed-content-deferred-call",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, false, "hello", true, ""},
{"unstreamed-reasoning-and-content",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, false, "hello", false, "thinking…"},
{"partial-incremental",
[]functions.FuncCallResults{
{Name: "a", Arguments: "{}"},
{Name: "b", Arguments: "{}"}},
1, true, "hi", true, ""},
}
for _, sc := range scenarios {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
sc.functionResults, sc.lastEmittedCount,
sc.contentStreamed, sc.text,
sc.reasoningStreamed, sc.reasoning,
)
for i, ch := range chunks {
Expect(ch.Choices[0].Index).To(Equal(0),
"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
}
}
})
})
Describe("Case M — spec compliance across every scenario", func() {
It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
scenarios := []struct {
name string
functionResults []functions.FuncCallResults
lastEmittedCount int
contentStreamed bool
text string
reasoningStreamed bool
reasoning string
}{
{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, true, "already-streamed", true, ""},
{"C", []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"}},
0, true, "already-streamed", true, ""},
{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, false, "plan", true, ""},
{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, false, "plan", false, "thinking…"},
}
for _, sc := range scenarios {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
sc.functionResults, sc.lastEmittedCount,
sc.contentStreamed, sc.text,
sc.reasoningStreamed, sc.reasoning,
)
for i, ch := range chunks {
hasContent := contentOf(ch) != ""
hasReasoning := reasoningOf(ch) != ""
hasToolCalls := len(toolCallsOf(ch)) > 0
Expect(hasContent && hasToolCalls).To(BeFalse(),
"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
Expect(hasReasoning && hasToolCalls).To(BeFalse(),
"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
}
}
})
})
Describe("Case N — empty functionResults", func() {
It("emits nothing, including no leading role/content/reasoning", func() {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
nil, 0,
false, "ignored",
false, "ignored",
)
Expect(chunks).To(BeEmpty())
})
})
Describe("Case O — content not streamed but all calls already emitted", func() {
It("emits nothing, not even a standalone content chunk", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 2,
false, "narration",
false, "thinking…",
)
Expect(chunks).To(BeEmpty(),
"no tool_calls to trigger on, so no leading role/content/reasoning either")
})
})
Describe("Reasoning — autoparser delivered reasoning only at end", func() {
It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "streamed content",
false, "model's private thoughts",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(3), "reasoning, name, args")
Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
// The following two are the tool_call name + args chunks.
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
})
It("emits reasoning before role+content when neither was streamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "final plan",
false, "private thoughts",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[2])).To(Equal("final plan"))
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
})
It("does not re-emit reasoning that was already streamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "streamed",
true, "already-sent reasoning",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
for _, ch := range chunks {
Expect(reasoningOf(ch)).To(BeEmpty())
}
})
})
})
var _ = Describe("hasRealCall", func() {
const noAction = "answer"
It("returns false for nil and empty slices", func() {
Expect(hasRealCall(nil, noAction)).To(BeFalse())
Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
})
It("returns false when every entry is the noAction sentinel", func() {
results := []functions.FuncCallResults{
{Name: noAction, Arguments: `{"message":"hi"}`},
{Name: noAction, Arguments: `{"message":"hello"}`},
}
Expect(hasRealCall(results, noAction)).To(BeFalse())
})
It("returns true when only one entry is a real call", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: "{}"},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
It("returns true when a real call follows a noAction entry", func() {
// This is the regression the follow-up fixes: the old
// functionResults[0].Name == noAction check would declare this
// noActionToRun and drop the real call entirely.
results := []functions.FuncCallResults{
{Name: noAction, Arguments: `{"message":"hi"}`},
{Name: "search", Arguments: "{}"},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
It("returns true when a real call precedes a noAction entry", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: "{}"},
{Name: noAction, Arguments: `{"message":"hi"}`},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
})
var _ = Describe("buildNoActionFinalChunks", func() {
const (
testID = "req"
testModel = "test-model"
testCreated = 1700000000
)
usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
Describe("Content streamed — trailing usage chunk", func() {
It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, true,
"", "already-streamed-reasoning", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(reasoningOf(chunks[0])).To(BeEmpty(),
"reasoning must not be re-emitted once it was streamed via the callback")
})
It("emits a trailing reasoning delivery when reasoning came only at end", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, false,
"", "autoparser final reasoning", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
})
It("omits reasoning when it's empty regardless of streamed flag", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, false,
"", "", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(reasoningOf(chunks[0])).To(BeEmpty())
})
})
Describe("Content not streamed — role, then content+usage", func() {
It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, true,
"the answer", "already-streamed-reasoning", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(contentOf(chunks[1])).To(Equal("the answer"))
Expect(reasoningOf(chunks[1])).To(BeEmpty(),
"reasoning must not be re-emitted if it was streamed earlier")
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
})
It("emits role, then content+reasoning when reasoning was not streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"the answer", "autoparser final reasoning", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[1])).To(Equal("the answer"))
Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
})
It("still emits content even when reasoning is empty", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"just an answer", "", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(contentOf(chunks[1])).To(Equal("just an answer"))
Expect(reasoningOf(chunks[1])).To(BeEmpty())
})
})
Describe("Metadata and shape invariants", func() {
It("stamps every chunk with the same id/model/created and object", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"hi", "reasoning", usage,
)
for i, ch := range chunks {
Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
Expect(ch.Choices).To(HaveLen(1))
Expect(ch.Choices[0].Index).To(Equal(0))
}
})
})
})

View File

@@ -3,6 +3,7 @@ package middleware
import (
"bytes"
"io"
"mime"
"net/http"
"slices"
"sync"
@@ -94,7 +95,8 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
initializeTracing(app.ApplicationConfig().TracingMaxItems)
if c.Request().Header.Get("Content-Type") != "application/json" {
ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
if ct != "application/json" {
return next(c)
}

View File

@@ -12,3 +12,4 @@ export const CAP_TOKENIZE = 'FLAG_TOKENIZE'
export const CAP_VAD = 'FLAG_VAD'
export const CAP_VIDEO = 'FLAG_VIDEO'
export const CAP_DETECTION = 'FLAG_DETECTION'
export const CAP_FACE_RECOGNITION = 'FLAG_FACE_RECOGNITION'

View File

@@ -97,6 +97,28 @@ func RegisterLocalAIRoutes(router *echo.Echo,
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_DETECTION)),
requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.DetectionRequest) }))
// Face recognition endpoints
faceMw := []echo.MiddlewareFunc{
requestExtractor.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_FACE_RECOGNITION)),
}
router.POST("/v1/face/verify",
localai.FaceVerifyEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceVerifyRequest) }))...)
router.POST("/v1/face/analyze",
localai.FaceAnalyzeEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceAnalyzeRequest) }))...)
router.POST("/v1/face/embed",
localai.FaceEmbedEndpoint(cl, ml, appConfig),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceEmbedRequest) }))...)
router.POST("/v1/face/register",
localai.FaceRegisterEndpoint(cl, ml, appConfig, app.FaceRegistry()),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceRegisterRequest) }))...)
router.POST("/v1/face/identify",
localai.FaceIdentifyEndpoint(cl, ml, appConfig, app.FaceRegistry()),
append(faceMw, requestExtractor.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.FaceIdentifyRequest) }))...)
// Forget does not load a face model — it only needs the registry.
router.POST("/v1/face/forget", localai.FaceForgetEndpoint(app.FaceRegistry()))
ttsHandler := localai.TTSEndpoint(cl, ml, appConfig)
router.POST("/tts",
ttsHandler,

View File

@@ -23,7 +23,6 @@ import (
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/http/endpoints/localai"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/p2p"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/pkg/model"
@@ -1458,24 +1457,5 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
}
// Logs API (admin only)
app.GET("/api/traces", func(c echo.Context) error {
if !appConfig.EnableTracing {
return c.JSON(503, map[string]any{
"error": "Tracing disabled",
})
}
traces := middleware.GetTraces()
return c.JSON(200, map[string]any{
"traces": traces,
})
}, adminMiddleware)
app.POST("/api/traces/clear", func(c echo.Context) error {
middleware.ClearTraces()
return c.JSON(200, map[string]any{
"message": "Traces cleared",
})
}, adminMiddleware)
}

View File

@@ -173,6 +173,123 @@ type Detection struct {
Mask string `json:"mask,omitempty"` // base64-encoded PNG segmentation mask
}
// ─── Face recognition ──────────────────────────────────────────────
//
// FacialArea describes a bounding box for a detected face.
type FacialArea struct {
X float32 `json:"x"`
Y float32 `json:"y"`
W float32 `json:"w"`
H float32 `json:"h"`
}
// FaceVerifyRequest compares two images to decide whether they depict
// the same person. Img1 and Img2 accept URL, base64, or data-URI.
type FaceVerifyRequest struct {
BasicModelRequest
Img1 string `json:"img1"`
Img2 string `json:"img2"`
Threshold float32 `json:"threshold,omitempty"`
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
}
type FaceVerifyResponse struct {
Verified bool `json:"verified"`
Distance float32 `json:"distance"`
Threshold float32 `json:"threshold"`
Confidence float32 `json:"confidence"`
Model string `json:"model"`
Img1Area FacialArea `json:"img1_area"`
Img2Area FacialArea `json:"img2_area"`
ProcessingTimeMs float32 `json:"processing_time_ms,omitempty"`
}
// FaceAnalyzeRequest asks the backend for demographic attributes on
// every face detected in Img.
type FaceAnalyzeRequest struct {
BasicModelRequest
Img string `json:"img"`
Actions []string `json:"actions,omitempty"` // subset of {"age","gender","emotion","race"}
AntiSpoofing bool `json:"anti_spoofing,omitempty"`
}
type FaceAnalyzeResponse struct {
Faces []FaceAnalysis `json:"faces"`
}
type FaceAnalysis struct {
Region FacialArea `json:"region"`
FaceConfidence float32 `json:"face_confidence"`
Age float32 `json:"age,omitempty"`
DominantGender string `json:"dominant_gender,omitempty"`
Gender map[string]float32 `json:"gender,omitempty"`
DominantEmotion string `json:"dominant_emotion,omitempty"`
Emotion map[string]float32 `json:"emotion,omitempty"`
DominantRace string `json:"dominant_race,omitempty"`
Race map[string]float32 `json:"race,omitempty"`
IsReal bool `json:"is_real,omitempty"`
AntispoofScore float32 `json:"antispoof_score,omitempty"`
}
// FaceEmbedRequest extracts a face embedding from an image. Distinct
// from /v1/embeddings (which is OpenAI-compatible and text-only); this
// endpoint accepts URL / base64 / data-URI image inputs.
type FaceEmbedRequest struct {
BasicModelRequest
Img string `json:"img"`
}
type FaceEmbedResponse struct {
Embedding []float32 `json:"embedding"`
Dim int `json:"dim"`
Model string `json:"model,omitempty"`
}
// FaceRegisterRequest enrolls a face into the 1:N recognition store.
type FaceRegisterRequest struct {
BasicModelRequest
Img string `json:"img"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
Store string `json:"store,omitempty"` // vector store model; empty = local-store default
}
type FaceRegisterResponse struct {
ID string `json:"id"`
Name string `json:"name"`
RegisteredAt time.Time `json:"registered_at"`
}
// FaceIdentifyRequest runs 1:N recognition: embed the probe and
// return the top-K nearest registered faces.
type FaceIdentifyRequest struct {
BasicModelRequest
Img string `json:"img"`
TopK int `json:"top_k,omitempty"`
Threshold float32 `json:"threshold,omitempty"` // optional cutoff on distance
Store string `json:"store,omitempty"`
}
type FaceIdentifyResponse struct {
Matches []FaceIdentifyMatch `json:"matches"`
}
type FaceIdentifyMatch struct {
ID string `json:"id"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
Distance float32 `json:"distance"`
Confidence float32 `json:"confidence"`
Match bool `json:"match"` // true when distance <= threshold
}
// FaceForgetRequest removes a previously-registered face by ID.
type FaceForgetRequest struct {
BasicModelRequest
ID string `json:"id"`
Store string `json:"store,omitempty"`
}
type ImportModelRequest struct {
URI string `json:"uri"`
Preferences json.RawMessage `json:"preferences,omitempty"`

View File

@@ -0,0 +1,60 @@
// Package facerecognition provides a swappable backing store for face
// embeddings and the 1:N identification pipeline that sits on top of it.
//
// The current implementation (NewStoreRegistry) is backed by LocalAI's
// in-memory local-store gRPC backend. This is in-memory only — all
// registrations are lost when LocalAI restarts.
//
// TODO: add a persistent PostgreSQL/pgvector-backed implementation for
// production deployments. The Registry interface is explicitly designed
// so the swap is a constructor change in core/application, with zero
// HTTP-handler changes.
package facerecognition
import (
"context"
"errors"
"time"
)
// Registry stores face embeddings keyed by an opaque ID and supports
// approximate similarity search. Implementations are expected to be
// safe for concurrent use.
type Registry interface {
// Register stores a face embedding alongside its metadata.
// Returns the stored metadata with ID and RegisteredAt populated.
// The embedding length must match the registry's expected dimension.
Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error)
// Identify returns up to topK matches for the probe embedding,
// sorted by ascending distance (closest first).
Identify(ctx context.Context, probe []float32, topK int) ([]Match, error)
// Forget removes a previously-registered embedding by ID.
// Returns ErrNotFound if the ID is unknown.
Forget(ctx context.Context, id string) error
}
// Metadata is the user-supplied payload stored alongside a face embedding.
type Metadata struct {
// ID is populated by the registry at Register time and should not be
// set by the caller. It is echoed back in Match.Metadata.
ID string `json:"id"`
Name string `json:"name"`
Labels map[string]string `json:"labels,omitempty"`
RegisteredAt time.Time `json:"registered_at"`
}
// Match is a single result from Identify, ranked by similarity.
type Match struct {
ID string
Metadata Metadata
Distance float32 // 1 - cosine_similarity; lower = closer
}
// Sentinel errors; callers should compare with errors.Is.
var (
ErrNotFound = errors.New("facerecognition: id not found")
ErrEmptyEmbedding = errors.New("facerecognition: embedding is empty")
ErrDimensionMismatch = errors.New("facerecognition: embedding dimension mismatch")
)

View File

@@ -0,0 +1,253 @@
package facerecognition_test
import (
"context"
"errors"
"math"
"sync"
"testing"
"github.com/mudler/LocalAI/core/services/facerecognition"
"github.com/mudler/LocalAI/pkg/grpc"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
grpclib "google.golang.org/grpc"
)
const dim = 4 // tiny test-friendly embedding dimension
func TestRegisterIdentifyForget(t *testing.T) {
t.Parallel()
reg, fake := newTestRegistry(t)
ctx := t.Context()
alice := []float32{1, 0, 0, 0}
bob := []float32{0, 1, 0, 0}
aliceMeta, err := reg.Register(ctx, alice, facerecognition.Metadata{Name: "Alice"})
if err != nil {
t.Fatalf("Register Alice: %v", err)
}
if aliceMeta.ID == "" {
t.Fatalf("Register returned empty ID")
}
if aliceMeta.RegisteredAt.IsZero() {
t.Fatalf("Register did not populate RegisteredAt")
}
bobMeta, err := reg.Register(ctx, bob, facerecognition.Metadata{Name: "Bob"})
if err != nil {
t.Fatalf("Register Bob: %v", err)
}
if bobMeta.ID == aliceMeta.ID {
t.Fatalf("IDs should be distinct, got %q twice", bobMeta.ID)
}
aliceID := aliceMeta.ID
if got, want := fake.len(), 2; got != want {
t.Fatalf("fake store has %d entries, want %d", got, want)
}
// Identify an Alice-like probe — she should win.
matches, err := reg.Identify(ctx, []float32{0.99, 0.01, 0, 0}, 2)
if err != nil {
t.Fatalf("Identify: %v", err)
}
if len(matches) == 0 {
t.Fatalf("no matches returned")
}
if matches[0].Metadata.Name != "Alice" {
t.Fatalf("top match name = %q, want Alice", matches[0].Metadata.Name)
}
if matches[0].ID != aliceID {
t.Fatalf("top match ID = %q, want %q", matches[0].ID, aliceID)
}
// Sorted ascending by distance.
for i := 1; i < len(matches); i++ {
if matches[i].Distance < matches[i-1].Distance {
t.Fatalf("matches not sorted by distance: %v", matches)
}
}
// Forget Alice → she's gone, Bob remains.
if err := reg.Forget(ctx, aliceID); err != nil {
t.Fatalf("Forget Alice: %v", err)
}
if got, want := fake.len(), 1; got != want {
t.Fatalf("after Forget, store has %d entries, want %d", got, want)
}
// Forget unknown ID → ErrNotFound (checkable via errors.Is).
if err := reg.Forget(ctx, "nonexistent"); !errors.Is(err, facerecognition.ErrNotFound) {
t.Fatalf("Forget unknown: err = %v, want ErrNotFound", err)
}
}
func TestRegisterRejectsBadEmbedding(t *testing.T) {
t.Parallel()
reg, _ := newTestRegistry(t)
ctx := t.Context()
tests := []struct {
name string
embed []float32
wantErr error
}{
{"empty", []float32{}, facerecognition.ErrEmptyEmbedding},
{"wrong_dim", []float32{1, 2}, facerecognition.ErrDimensionMismatch},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
t.Parallel()
_, err := reg.Register(ctx, tc.embed, facerecognition.Metadata{Name: "x"})
if !errors.Is(err, tc.wantErr) {
t.Fatalf("err = %v, want wrapping %v", err, tc.wantErr)
}
})
}
}
func TestConcurrent(t *testing.T) {
t.Parallel()
reg, _ := newTestRegistry(t)
ctx := t.Context()
done := make(chan struct{})
for i := range 32 {
go func(i int) {
embed := []float32{float32(i % 4), float32((i + 1) % 4), 0, 1}
meta, err := reg.Register(ctx, embed, facerecognition.Metadata{Name: "n"})
if err == nil {
_, _ = reg.Identify(ctx, embed, 3)
_ = reg.Forget(ctx, meta.ID)
}
done <- struct{}{}
}(i)
}
for range 32 {
<-done
}
}
// ─── fake gRPC backend ───────────────────────────────────────────────
func newTestRegistry(t *testing.T) (facerecognition.Registry, *fakeBackend) {
t.Helper()
fake := &fakeBackend{}
resolver := func(_ context.Context, _ string) (grpc.Backend, error) {
return fake, nil
}
return facerecognition.NewStoreRegistry(resolver, "test-store", dim), fake
}
// fakeBackend implements just enough of grpc.Backend for the store
// helpers. All other methods panic so any accidental dependency is
// visible in tests.
type fakeBackend struct {
grpc.Backend // embed to inherit no-op default method set via panic
mu sync.Mutex
keys [][]float32
vals [][]byte
}
func (f *fakeBackend) len() int {
f.mu.Lock()
defer f.mu.Unlock()
return len(f.keys)
}
func (f *fakeBackend) StoresSet(_ context.Context, in *pb.StoresSetOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
f.mu.Lock()
defer f.mu.Unlock()
for i, k := range in.Keys {
f.keys = append(f.keys, append([]float32(nil), k.Floats...))
f.vals = append(f.vals, append([]byte(nil), in.Values[i].Bytes...))
}
return &pb.Result{Success: true}, nil
}
func (f *fakeBackend) StoresDelete(_ context.Context, in *pb.StoresDeleteOptions, _ ...grpclib.CallOption) (*pb.Result, error) {
f.mu.Lock()
defer f.mu.Unlock()
for _, k := range in.Keys {
idx := f.findKey(k.Floats)
if idx < 0 {
continue
}
f.keys = append(f.keys[:idx], f.keys[idx+1:]...)
f.vals = append(f.vals[:idx], f.vals[idx+1:]...)
}
return &pb.Result{Success: true}, nil
}
func (f *fakeBackend) StoresFind(_ context.Context, in *pb.StoresFindOptions, _ ...grpclib.CallOption) (*pb.StoresFindResult, error) {
f.mu.Lock()
defer f.mu.Unlock()
type scored struct {
key []float32
val []byte
sim float32
}
results := make([]scored, 0, len(f.keys))
for i, k := range f.keys {
results = append(results, scored{k, f.vals[i], cosine(k, in.Key.Floats)})
}
// Sort descending by similarity.
for i := 0; i < len(results); i++ {
for j := i + 1; j < len(results); j++ {
if results[j].sim > results[i].sim {
results[i], results[j] = results[j], results[i]
}
}
}
top := int(in.TopK)
if top <= 0 || top > len(results) {
top = len(results)
}
out := &pb.StoresFindResult{}
for _, r := range results[:top] {
out.Keys = append(out.Keys, &pb.StoresKey{Floats: r.key})
out.Values = append(out.Values, &pb.StoresValue{Bytes: r.val})
out.Similarities = append(out.Similarities, r.sim)
}
return out, nil
}
func (f *fakeBackend) findKey(target []float32) int {
for i, k := range f.keys {
if equalFloats(k, target) {
return i
}
}
return -1
}
func equalFloats(a, b []float32) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if a[i] != b[i] {
return false
}
}
return true
}
func cosine(a, b []float32) float32 {
var dot, na, nb float64
for i := range a {
dot += float64(a[i]) * float64(b[i])
na += float64(a[i]) * float64(a[i])
nb += float64(b[i]) * float64(b[i])
}
if na == 0 || nb == 0 {
return 0
}
return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

View File

@@ -0,0 +1,142 @@
package facerecognition
import (
"context"
"encoding/json"
"fmt"
"sort"
"sync"
"time"
"github.com/google/uuid"
"github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/LocalAI/pkg/store"
)
// StoreResolver resolves a named vector store to a gRPC backend. The
// HTTP handler layer wires this to backend.StoreBackend so the
// registry stays decoupled from the ModelLoader plumbing.
type StoreResolver func(ctx context.Context, storeName string) (grpc.Backend, error)
// NewStoreRegistry returns a Registry backed by LocalAI's generic
// StoresSet / StoresFind / StoresDelete gRPC surface.
//
// storeName selects which vector-store model to use (defaults to the
// local-store Go backend). `dim` is the expected embedding dimension;
// pass 0 to accept whatever dimension arrives (useful when the face
// backend exposes multiple recognizers of different sizes, e.g.
// ArcFace R50 at 512 vs SFace at 128). A non-zero dim is enforced at
// Register time and fails fast with ErrDimensionMismatch.
func NewStoreRegistry(resolve StoreResolver, storeName string, dim int) Registry {
return &storeRegistry{
resolve: resolve,
storeName: storeName,
dim: dim,
}
}
type storeRegistry struct {
resolve StoreResolver
storeName string
dim int
// TODO(postgres): the local-store gRPC surface keys by embedding
// vector and exposes no "list all" method, so we cannot delete by
// ID without remembering the embedding. This in-memory index is
// rebuilt on every Register and lost on restart — acceptable while
// the only implementation is itself in-memory. A persistent
// implementation must rebuild this index at startup.
idIndex sync.Map // map[string][]float32
}
func (r *storeRegistry) Register(ctx context.Context, embedding []float32, meta Metadata) (Metadata, error) {
if len(embedding) == 0 {
return Metadata{}, ErrEmptyEmbedding
}
if r.dim != 0 && len(embedding) != r.dim {
return Metadata{}, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(embedding))
}
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return Metadata{}, fmt.Errorf("facerecognition: resolve store: %w", err)
}
meta.ID = uuid.NewString()
if meta.RegisteredAt.IsZero() {
meta.RegisteredAt = time.Now().UTC()
}
payload, err := json.Marshal(meta)
if err != nil {
return Metadata{}, fmt.Errorf("facerecognition: marshal metadata: %w", err)
}
if err := store.SetSingle(ctx, backend, embedding, payload); err != nil {
return Metadata{}, fmt.Errorf("facerecognition: set: %w", err)
}
// Retain a copy so Forget can look up the embedding by ID.
embCopy := append([]float32(nil), embedding...)
r.idIndex.Store(meta.ID, embCopy)
return meta, nil
}
func (r *storeRegistry) Identify(ctx context.Context, probe []float32, topK int) ([]Match, error) {
if len(probe) == 0 {
return nil, ErrEmptyEmbedding
}
if r.dim != 0 && len(probe) != r.dim {
return nil, fmt.Errorf("%w: expected %d, got %d", ErrDimensionMismatch, r.dim, len(probe))
}
if topK <= 0 {
topK = 5
}
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return nil, fmt.Errorf("facerecognition: resolve store: %w", err)
}
_, values, similarities, err := store.Find(ctx, backend, probe, topK)
if err != nil {
return nil, fmt.Errorf("facerecognition: find: %w", err)
}
matches := make([]Match, 0, len(values))
for i, raw := range values {
var meta Metadata
if err := json.Unmarshal(raw, &meta); err != nil {
// Skip unreadable entries instead of failing the whole query —
// the store may contain non-face records in shared deployments.
continue
}
matches = append(matches, Match{
ID: meta.ID,
Metadata: meta,
Distance: 1 - similarities[i],
})
}
sort.SliceStable(matches, func(i, j int) bool { return matches[i].Distance < matches[j].Distance })
return matches, nil
}
func (r *storeRegistry) Forget(ctx context.Context, id string) error {
raw, ok := r.idIndex.Load(id)
if !ok {
return ErrNotFound
}
embedding := raw.([]float32)
backend, err := r.resolve(ctx, r.storeName)
if err != nil {
return fmt.Errorf("facerecognition: resolve store: %w", err)
}
if err := store.DeleteSingle(ctx, backend, embedding); err != nil {
return fmt.Errorf("facerecognition: delete: %w", err)
}
r.idIndex.Delete(id)
return nil
}

View File

@@ -124,8 +124,13 @@ func SubjectNodeBackendInstall(nodeID string) string {
// BackendInstallRequest is the payload for a backend.install NATS request.
type BackendInstallRequest struct {
Backend string `json:"backend"`
ModelID string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
ModelID string `json:"model_id,omitempty"`
BackendGalleries string `json:"backend_galleries,omitempty"`
// URI is set for external installs (OCI image, URL, or path). When non-empty
// the worker routes to InstallExternalBackend instead of the gallery lookup.
URI string `json:"uri,omitempty"`
Name string `json:"name,omitempty"`
Alias string `json:"alias,omitempty"`
}
// BackendInstallReply is the response from a backend.install NATS request.

View File

@@ -168,6 +168,12 @@ func (c *fakeBackendClient) SoundGeneration(_ context.Context, _ *pb.SoundGenera
func (c *fakeBackendClient) Detect(_ context.Context, _ *pb.DetectOptions, _ ...ggrpc.CallOption) (*pb.DetectResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
return nil, nil
}
func (c *fakeBackendClient) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
return nil, nil
}

View File

@@ -91,6 +91,14 @@ func (f *fakeGRPCBackend) Detect(_ context.Context, _ *pb.DetectOptions, _ ...gg
return &pb.DetectResponse{}, nil
}
func (f *fakeGRPCBackend) FaceVerify(_ context.Context, _ *pb.FaceVerifyRequest, _ ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
return &pb.FaceVerifyResponse{}, nil
}
func (f *fakeGRPCBackend) FaceAnalyze(_ context.Context, _ *pb.FaceAnalyzeRequest, _ ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
return &pb.FaceAnalyzeResponse{}, nil
}
func (f *fakeGRPCBackend) AudioTranscription(_ context.Context, _ *pb.TranscriptRequest, _ ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
return &pb.TranscriptResult{}, nil
}

View File

@@ -106,6 +106,13 @@ func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context
if node.Status == StatusPending {
continue
}
// Backend lifecycle ops only make sense on backend-type workers.
// Agent workers don't subscribe to backend.install/delete/list, so
// enqueueing for them guarantees a forever-retrying row that the
// reconciler can never drain. Silently skip — they aren't consumers.
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
continue
}
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
result.Nodes = append(result.Nodes, NodeOpStatus{
@@ -286,7 +293,7 @@ func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *gall
backendName := op.GalleryElementName
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
if err != nil {
return err
}
@@ -304,7 +311,7 @@ func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name str
galleriesJSON, _ := json.Marshal(d.backendGalleries)
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
if err != nil {
return err
}

View File

@@ -3,12 +3,14 @@ package nodes
import (
"context"
"encoding/json"
"errors"
"fmt"
"time"
"github.com/mudler/LocalAI/core/services/advisorylock"
grpcclient "github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/xlog"
"github.com/nats-io/nats.go"
"gorm.io/gorm"
)
@@ -186,7 +188,7 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
case OpBackendDelete:
_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
case OpBackendInstall, OpBackendUpgrade:
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries))
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
if err != nil {
applyErr = err
} else if !reply.Success {
@@ -206,12 +208,47 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
}
continue
}
// ErrNoResponders means the node has no active NATS subscription for
// this subject. Either its connection dropped, or it's the wrong
// node type entirely. Mark unhealthy so the health monitor's
// heartbeat-only pass doesn't immediately flip it back — and so
// ListDuePendingBackendOps (which filters by status=healthy) stops
// picking the row until the node genuinely recovers.
if errors.Is(applyErr, nats.ErrNoResponders) {
xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
"op", op.Op, "backend", op.Backend, "node", op.NodeID)
_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
}
// Dead-letter cap: after maxAttempts the row is the reconciler
// equivalent of a poison message. Delete it loudly so the queue
// doesn't churn NATS every tick forever — operators can re-issue
// the op from the UI if they still want it applied.
if op.Attempts+1 >= maxPendingBackendOpAttempts {
xlog.Error("Reconciler: abandoning pending backend op after max attempts",
"op", op.Op, "backend", op.Backend, "node", op.NodeID,
"attempts", op.Attempts+1, "last_error", applyErr)
if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
}
continue
}
_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
xlog.Warn("Reconciler: pending backend op retry failed",
"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
}
}
// maxPendingBackendOpAttempts caps how many times the reconciler retries a
// failing row before dead-lettering it. Ten attempts at exponential backoff
// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
// worker restart or network blip. Poisoned rows beyond that are almost
// certainly structural (wrong node type, non-existent gallery entry) and no
// amount of further retrying will help.
const maxPendingBackendOpAttempts = 10
// probeLoadedModels gRPC-health-checks model addresses that the DB says are
// loaded. If a model's backend process is gone (OOM, crash, manual restart)
// we remove the row so ghosts don't linger. Only probes rows older than

View File

@@ -373,4 +373,30 @@ var _ = Describe("ReplicaReconciler — state reconciliation", func() {
Expect(row.NextRetryAt).To(BeTemporally(">", before))
})
})
Describe("NewNodeRegistry malformed-row pruning", func() {
It("drops queue rows for agent nodes and non-existent nodes on startup", func() {
agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
Expect(registry.Register(context.Background(), backend, true)).To(Succeed())
// Three rows: one for a valid backend node (should survive),
// one for an agent node (pruned), one for an empty backend name
// on the valid node (pruned).
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())
// Re-instantiating the registry runs the cleanup migration.
_, err := NewNodeRegistry(db)
Expect(err).ToNot(HaveOccurred())
var rows []PendingBackendOp
Expect(db.Find(&rows).Error).To(Succeed())
Expect(rows).To(HaveLen(1))
Expect(rows[0].NodeID).To(Equal(backend.ID))
Expect(rows[0].Backend).To(Equal("foo"))
})
})
})

View File

@@ -148,6 +148,30 @@ func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
}); err != nil {
return nil, fmt.Errorf("migrating node tables: %w", err)
}
// One-shot cleanup of queue rows that can never drain: ops targeted at
// agent workers (wrong subscription set), at non-existent nodes, or with
// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
// new ones from being written, but rows persisted by earlier versions
// keep the reconciler busy retrying a permanently-failing NATS request
// every 30s. Guarded by the same migration advisory lock so only one
// frontend runs it.
_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
res := db.Exec(`
DELETE FROM pending_backend_ops
WHERE backend = ''
OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
`, NodeTypeBackend)
if res.Error != nil {
xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
return res.Error
}
if res.RowsAffected > 0 {
xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
}
return nil
})
return &NodeRegistry{db: db}, nil
}

View File

@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
return "", fmt.Errorf("no NATS connection for backend installation")
}
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
if err != nil {
return "", err
}

View File

@@ -244,7 +244,7 @@ type fakeUnloader struct {
unloadErr error
}
func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
return f.installReply, f.installErr
}

View File

@@ -17,7 +17,7 @@ type backendStopRequest struct {
// NodeCommandSender abstracts NATS-based commands to worker nodes.
// Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
type NodeCommandSender interface {
InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
ListBackends(nodeID string) (*messaging.BackendListReply, error)
StopBackend(nodeID, backend string) error
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
// The worker installs the backend from gallery (if not already installed),
// starts the gRPC process, and replies when ready.
// Timeout: 5 minutes (gallery install can take a while).
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
subject := messaging.SubjectNodeBackendInstall(nodeID)
xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)
@@ -80,6 +80,9 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
Backend: backendType,
ModelID: modelID,
BackendGalleries: galleriesJSON,
URI: uri,
Name: name,
Alias: alias,
}, 5*time.Minute)
}

View File

@@ -24,6 +24,8 @@ const (
BackendTraceRerank BackendTraceType = "rerank"
BackendTraceTokenize BackendTraceType = "tokenize"
BackendTraceDetection BackendTraceType = "detection"
BackendTraceFaceVerify BackendTraceType = "face_verify"
BackendTraceFaceAnalyze BackendTraceType = "face_analyze"
BackendTraceModelLoad BackendTraceType = "model_load"
)

View File

@@ -14,11 +14,13 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend
### Request
The request body is JSON:
The model to monitor is passed as a query parameter:
| Parameter | Type | Required | Description |
|-----------|----------|----------|--------------------------------|
| `model` | `string` | Yes | Name of the model to monitor |
| Parameter | Type | Required | Location | Description |
|-----------|----------|----------|----------|--------------------------------|
| `model` | `string` | Yes | query | Name of the model to monitor |
For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.
### Response
@@ -42,9 +44,7 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
### Usage
```bash
curl http://localhost:8080/backend/monitor \
-H "Content-Type: application/json" \
-d '{"model": "my-model"}'
curl "http://localhost:8080/backend/monitor?model=my-model"
```
### Example response

View File

@@ -7,6 +7,10 @@ url = "/features/embeddings/"
LocalAI supports generating embeddings for text or list of tokens.
For face embeddings specifically, see the
[Face Recognition](/features/face-recognition/) feature — it produces
512-d L2-normalized vectors tuned for face similarity.
For the API documentation you can refer to the OpenAI docs: https://platform.openai.com/docs/api-reference/embeddings
## Model compatibility

View File

@@ -0,0 +1,228 @@
+++
disableToc = false
title = "Face Recognition"
weight = 14
url = "/features/face-recognition/"
+++
LocalAI supports face recognition through the `insightface` backend:
face verification (1:1), face identification (1:N) against a built-in
vector store, face embedding, face detection, and demographic analysis
(age / gender).
The backend ships **two interchangeable engines** under one image, each
paired with a distinct gallery entry so users can pick by license and
accuracy needs.
## Licensing — read this first
| Gallery entry | Detector + recognizer | Size | License |
|---|---|---|---|
| `insightface-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | ~326 MB | **Non-commercial research only** (upstream insightface weights) |
| `insightface-buffalo-s` | SCRFD-500MF + MBF + GenderAge | ~159 MB | **Non-commercial research only** |
| `insightface-opencv` | YuNet + SFace | ~40 MB | **Apache 2.0 — commercial-safe** |
The `insightface` Python library itself is MIT, but the pretrained model
packs (buffalo_l, buffalo_s, antelopev2) are released by the upstream
maintainers for **non-commercial research use only**. Pick the
`insightface-opencv` entry for production / commercial deployments.
## Quickstart
Pull the commercial-safe backend (recommended for copy-paste):
```bash
local-ai models install insightface-opencv
```
Verify that two images depict the same person:
```bash
curl -sX POST http://localhost:8080/v1/face/verify \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-opencv",
"img1": "https://example.com/alice_1.jpg",
"img2": "https://example.com/alice_2.jpg"
}'
```
Response:
```json
{
"verified": true,
"distance": 0.27,
"threshold": 0.35,
"confidence": 23.1,
"model": "insightface-opencv",
"img1_area": { "x": 120.4, "y": 82.1, "w": 198.3, "h": 260.5 },
"img2_area": { "x": 110.8, "y": 95.0, "w": 205.6, "h": 268.2 },
"processing_time_ms": 412.0
}
```
## 1:N identification workflow (register → identify → forget)
This is the primary "face recognition" flow. Under the hood it uses
LocalAI's built-in in-memory vector store — no external database to
stand up.
1. Register known faces:
```bash
curl -sX POST http://localhost:8080/v1/face/register \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-buffalo-l",
"name": "Alice",
"img": "https://example.com/alice.jpg"
}'
# → {"id": "8b7...", "name": "Alice", "registered_at": "2026-04-21T..."}
```
2. Identify an unknown probe:
```bash
curl -sX POST http://localhost:8080/v1/face/identify \
-H "Content-Type: application/json" \
-d '{
"model": "insightface-buffalo-l",
"img": "https://example.com/unknown.jpg",
"top_k": 5
}'
# → {"matches": [{"id":"8b7...","name":"Alice","distance":0.22,"match":true,...}]}
```
3. Remove a person by ID:
```bash
curl -sX POST http://localhost:8080/v1/face/forget \
-d '{"id": "8b7..."}'
# → 204 No Content
```
{{% alert icon="⚠️" color="warning" %}}
**Storage caveat.** The default vector store is in-memory. All
registered faces are lost when LocalAI restarts. Persistent storage
(pgvector) is a tracked future enhancement — the face-recognition HTTP
API is designed to swap the backing store without changing the wire
format.
{{% /alert %}}
## API reference
### `POST /v1/face/verify` (1:1)
| field | type | description |
|---|---|---|
| `model` | string | gallery entry name (e.g. `insightface-buffalo-l`) |
| `img1`, `img2` | string | URL, base64, or data-URI |
| `threshold` | float, optional | cosine-distance cutoff; default depends on engine |
| `anti_spoofing` | bool, optional | reserved — unused in the current release |
Returns `verified`, `distance`, `threshold`, `confidence`, `model`,
`img1_area`, `img2_area`, and `processing_time_ms`.
### `POST /v1/face/analyze`
Returns demographic attributes for every detected face:
| field | type | description |
|---|---|---|
| `model` | string | gallery entry |
| `img` | string | URL / base64 / data-URI |
| `actions` | string[] | subset of `["age","gender","emotion","race"]`; empty = all supported |
Only `insightface-buffalo-l` / `insightface-buffalo-s` populate age and
gender (genderage head). `insightface-opencv` returns face regions with
empty attributes — SFace has no demographic classifier. Emotion and
race are always empty in the current release.
### `POST /v1/face/register` (1:N enrollment)
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | face to enroll |
| `name` | string | human-readable label |
| `labels` | map[string]string, optional | arbitrary metadata |
| `store` | string, optional | vector store model; defaults to local-store |
Returns `{id, name, registered_at}`. The `id` is an opaque UUID used by
`/v1/face/identify` and `/v1/face/forget`.
### `POST /v1/face/identify` (1:N recognition)
| field | type | description |
|---|---|---|
| `model` | string | face recognition model |
| `img` | string | probe image |
| `top_k` | int, optional | max matches to return; default 5 |
| `threshold` | float, optional | cosine-distance cutoff; default 0.35 (ArcFace) |
| `store` | string, optional | vector store model; defaults to local-store |
Returns a list of matches sorted by ascending distance, each with `id`,
`name`, `labels`, `distance`, `confidence`, and `match`
(`distance ≤ threshold`).
### `POST /v1/face/forget`
| field | type | description |
|---|---|---|
| `id` | string | ID returned by `/v1/face/register` |
Returns `204 No Content` on success, `404 Not Found` if the ID is
unknown.
### `POST /v1/face/embed`
Returns the L2-normalized face embedding vector for the detected face.
| field | type | description |
|---|---|---|
| `model` | string | face model |
| `img` | string | URL / base64 / data-URI |
Returns `{embedding: float[], dim: int, model: string}`. Dimension is
512 for the insightface ArcFace/MBF recognizers and 128 for OpenCV's
SFace.
> **Note:** the OpenAI-compatible `/v1/embeddings` endpoint is
> intentionally text-only by contract (`input` is a string or list of
> strings of TEXT to embed) — passing an image data-URI there does
> nothing useful. Use `/v1/face/embed` for image inputs.
### Reused endpoint
- `POST /v1/detection` — returns face bounding boxes with
`class_name: "face"`; works for both engines.
## Choosing an engine
| Need | Entry |
|---|---|
| Commercial product | `insightface-opencv` |
| Highest accuracy (research / demos) | `insightface-buffalo-l` |
| Edge / low-memory / research | `insightface-buffalo-s` |
The recommended default `threshold` for `/v1/face/verify` and
`/v1/face/identify` depends on the recognizer:
| Recognizer | Cosine-distance threshold |
|---|---|
| ArcFace R50 (`buffalo_l`) | ~0.35 |
| MBF (`buffalo_s`) | ~0.40 |
| SFace (`opencv`) | ~0.50 |
Pass `threshold` explicitly when switching engines — the per-engine
default only fires when the field is omitted.
## Related features
- [Object Detection](/features/object-detection/) — generic bounding-box
detection; `/v1/detection` works with the insightface backend too.
- [Embeddings](/features/embeddings/) — raw vector extraction; face
embeddings live in the same endpoint under the hood.
- [Stores](/features/stores/) — the generic vector store powering the
1:N recognition pipeline.

View File

@@ -7,6 +7,11 @@ url = "/features/object-detection/"
LocalAI supports object detection and image segmentation through various backends. This feature allows you to identify and locate objects within images with high accuracy and real-time performance. Available backends include [RF-DETR](https://github.com/roboflow/rf-detr) for object detection and [sam3.cpp](https://github.com/PABannier/sam3.cpp) for image segmentation (SAM 3/2/EdgeTAM).
For detecting **faces** specifically, see the dedicated
[Face Recognition](/features/face-recognition/) feature — its
`/v1/detection` support is tuned for face bounding boxes and ships
with commercially-safe model options.
## Overview
Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures.

View File

@@ -9,6 +9,14 @@ url = '/stores'
Stores are an experimental feature to help with querying data using similarity search. It is
a low level API that consists of only `get`, `set`, `delete` and `find`.
{{% alert icon="💡" color="info" %}}
**Face recognition uses this store.** The 1:N face identification flow
(`/v1/face/register`, `/v1/face/identify`, `/v1/face/forget`) is built
on top of the generic store — see
[Face Recognition](/features/face-recognition/) for the face-oriented
API.
{{% /alert %}}
For example if you have an embedding of some text and want to find text with similar embeddings.
You can create embeddings for chunks of all your text then compare them against the embedding of the text you
are searching on.

View File

@@ -130,6 +130,19 @@ Reference for system information commands and diagnostics.
---
### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
Policy for AI-assisted contributions — licensing, DCO, and attribution.
**Key topics:**
- Aligned with the Linux kernel's AI assistants policy
- Signed-off-by and DCO rules
- `Assisted-by` commit trailer format
- Scope and responsibility of the human submitter
**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)
---
## Quick Links
| Task | Documentation |
@@ -138,6 +151,7 @@ Reference for system information commands and diagnostics.
| CLI commands | [CLI Reference](cli-reference.md) |
| Check compatibility | [Compatibility Table](compatibility-table.md) |
| System diagnostics | [System Info](system-info.md) |
| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |
---

View File

@@ -0,0 +1,79 @@
+++
disableToc = false
title = "AI Coding Assistants"
weight = 28
+++
This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.
**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.
AI tools helping with LocalAI development should follow the standard project development process:
- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth
## Licensing and Legal Requirements
All contributions must comply with LocalAI's licensing requirements:
- LocalAI is licensed under the **MIT License**
- New source files should use the SPDX license identifier `MIT` where applicable to the file type
- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers
## Signed-off-by and Developer Certificate of Origin
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).
## Attribution
When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Where:
- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
Basic development tools (git, go, make, editors) should **not** be listed.
### Example
```
fix(llama-cpp): handle empty tool call arguments
Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.
Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:
- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the project style
- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review
Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.
{{% notice note %}}
This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
{{% /notice %}}

View File

@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

View File

@@ -10,6 +10,10 @@ Release notes have been now moved completely over Github releases.
You can see the release notes [here](https://github.com/mudler/LocalAI/releases).
## 2026 Highlights
- **April 2026**: [Face recognition backend](/features/face-recognition/) — `insightface`-powered 1:1 verification, 1:N identification, face embedding, face detection, and demographic analysis. Ships both a non-commercial `buffalo_l` model and an Apache 2.0 OpenCV Zoo alternative.
## 2024 Highlights
- **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)

View File

@@ -1,4 +1,268 @@
---
- name: "qwen3.6-27b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
description: |
# Qwen3.6-27B
[](https://chat.qwen.ai)
> [!Note]
> This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.
## Qwen3.6 Highlights
This release delivers substantial upgrades, particularly in
- **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
- **Thinking Preservation:** we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.
For more details, please refer to our blog post Qwen3.6-27B.
## Model Overview
...
license: "apache-2.0"
tags:
- llm
- gguf
- qwen
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
mmproj: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
options:
- use_jinja:true
parameters:
min_p: 0
model: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
presence_penalty: 1.5
repeat_penalty: 1
temperature: 0.7
top_k: 20
top_p: 0.8
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf
sha256: 5ed60d0af4650a854b1755bd392f9aef4872643dc25a254bc68043fa638392a0
uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/Qwen3.6-27B-Q4_K_M.gguf
- filename: llama-cpp/mmproj/Qwen3.6-27B-GGUF/mmproj-F32.gguf
sha256: fdc443e974cad1f61c45af1cfd5580855855ddce0d6c14cc500a5714c486ac1d
uri: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/resolve/main/mmproj-F32.gguf
- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
description: |
# 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.
The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.
- **Developed by:** @hesamation
- **Base model:** `Qwen/Qwen3.6-35B-A3B`
- **License:** apache-2.0
This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.
[](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)
## Benchmark Results
The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.
...
license: "apache-2.0"
tags:
- llm
- gguf
- qwen
- reasoning
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
min_p: 0
model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
presence_penalty: 1.5
repeat_penalty: 1
temperature: 0.7
top_k: 20
top_p: 0.8
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
- name: "qwen3.5-9b-glm5.1-distill-v1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
description: |
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1
## 📌 Model Overview
**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
**Base Model:** Qwen3.5-9B
**Training Type:** Supervised Fine-Tuning (SFT, Distillation)
**Parameter Scale:** 9B
**Training Framework:** Unsloth
This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.
The primary goals are to:
- Improve **structured reasoning ability**
- Enhance **instruction-following consistency**
- Activate **latent knowledge via better reasoning structure**
## 📊 Training Data
### Main Dataset
- `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
- Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
- Generated from a **GLM-5.1 teacher model**
- Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
- Training used a **filtered subset**, not the full source dataset.
### Auxiliary Dataset
- `Jackrong/Qwen3.5-reasoning-700x`
...
license: "apache-2.0"
tags:
- llm
- gguf
- qwen
- instruction-tuned
- reasoning
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
options:
- use_jinja:true
parameters:
min_p: 0
model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
presence_penalty: 1.5
repeat_penalty: 1
temperature: 0.7
top_k: 20
top_p: 0.8
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
- filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
- name: "supergemma4-26b-uncensored-v2"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
description: |
Hugging Face |
GitHub |
Launch Blog |
Documentation
License: Apache 2.0 | Authors: Google DeepMind
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
Gemma 4 introduces key **capability and architectural advancements**:
* **Reasoning** All models in the family are designed as highly capable reasoners, with configurable thinking modes.
...
license: "gemma"
tags:
- llm
- gguf
icon: https://ai.google.dev/gemma/images/gemma4_banner.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
- name: "qwopus-glm-18b-merged"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n - Improve **structured reasoning ability**\n - Enhance **instruction-following consistency**\n - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n - Generated from a **GLM-5.1 teacher model**\n - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
license: "apache-2.0"
tags:
- llm
- gguf
- reasoning
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
- name: "qwen3.6-35b-a3b-apex"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
@@ -887,6 +1151,8 @@
- gpu
overrides:
backend: neutts
parameters:
model: neuphonic/neutts-air
known_usecases:
- tts
- name: vllm-omni-z-image-turbo
@@ -3502,6 +3768,169 @@
- filename: arcee-ai_AFM-4.5B-Q4_K_M.gguf
sha256: f05516b323f581bebae1af2cbf900d83a2569b0a60c54366daf4a9c15ae30d4f
uri: huggingface://bartowski/arcee-ai_AFM-4.5B-GGUF/arcee-ai_AFM-4.5B-Q4_K_M.gguf
- &insightface_buffalo_l
name: "insightface-buffalo-l"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
# insightface library is MIT; pretrained packs are NON-COMMERCIAL.
license: "insightface-non-commercial"
description: |
Face recognition using insightface's `buffalo_l` pack
(SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder + genderage head, ~326MB).
Default choice, highest accuracy.
Weights delivered via LocalAI's gallery mechanism (SHA-256 verified,
cached in the models directory like any other managed model).
NON-COMMERCIAL RESEARCH USE ONLY. For commercial use see `insightface-opencv`.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-l}
options: ["engine:insightface", "model_pack:buffalo_l"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_l.zip
sha256: 80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_l.zip
- &insightface_buffalo_m
name: "insightface-buffalo-m"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Mid-tier insightface pack (SCRFD-2.5GF detector + ResNet50 ArcFace +
genderage, ~313MB). Same recognition accuracy as `buffalo_l` with a
cheaper detector — good balance on mid-range hardware.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-m}
options: ["engine:insightface", "model_pack:buffalo_m"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_m.zip
sha256: d98264bd8f2dc75cbc2ddce2a14e636e02bb857b3051c234b737bf3b614edca9
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_m.zip
- &insightface_buffalo_s
name: "insightface-buffalo-s"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Small insightface pack (SCRFD-500MF detector + MBF 512-d embedder +
genderage, ~159MB). Good fit for mid-range CPU deployments.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-s}
options: ["engine:insightface", "model_pack:buffalo_s"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_s.zip
sha256: d85a87f503f691807cd8bb97128bdf7a0660326cd9cd02657127fa978bab8b5e
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_s.zip
- &insightface_buffalo_sc
name: "insightface-buffalo-sc"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Ultra-small insightface pack (SCRFD-500MF + MBF recognition only, ~16MB).
NO landmarks, NO age/gender head — `/v1/face/analyze` returns empty
attributes for this pack. Ideal for edge/embedded deployments where
only verification and embedding are needed.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, edge, cpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-buffalo-sc}
options: ["engine:insightface", "model_pack:buffalo_sc"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: buffalo_sc.zip
sha256: 57d31b56b6ffa911c8a73cfc1707c73cab76efe7f13b675a05223bf42de47c72
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/buffalo_sc.zip
- &insightface_antelopev2
name: "insightface-antelopev2"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
icon: https://avatars.githubusercontent.com/u/53104118?s=200&v=4
license: "insightface-non-commercial"
description: |
Largest insightface pack (SCRFD-10GF + ResNet100@Glint360K recognizer +
genderage, ~407MB). Higher recognition accuracy than `buffalo_l` on
harder benchmarks; pays for it in GPU memory.
NON-COMMERCIAL RESEARCH USE ONLY.
tags: [face-recognition, face-verification, face-embedding, research-only, gpu]
urls: [https://github.com/deepinsight/insightface]
overrides:
backend: insightface
parameters: {model: insightface-antelopev2}
options: ["engine:insightface", "model_pack:antelopev2"]
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: antelopev2.zip
sha256: 8e182f14fc6e80b3bfa375b33eb6cff7ee05d8ef7633e738d1c89021dcf0c5c5
uri: https://github.com/deepinsight/insightface/releases/download/v0.7/antelopev2.zip
- &insightface_opencv
name: "insightface-opencv"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
license: apache-2.0
description: |
Face recognition using OpenCV Zoo weights: YuNet detector + SFace
128-d recognizer (fp32). APACHE 2.0 — safe for commercial use.
Lower accuracy than insightface packs, no demographic head
(`/v1/face/analyze` returns detection regions only).
Weights are downloaded on install via LocalAI's gallery mechanism
(~40MB).
tags: [face-recognition, face-verification, face-embedding, commercial-ok, gpu, cpu]
urls: [https://github.com/opencv/opencv_zoo]
overrides:
backend: insightface
parameters: {model: face_detection_yunet_2023mar.onnx}
options:
- "engine:onnx_direct"
- "detector_onnx:face_detection_yunet_2023mar.onnx"
- "recognizer_onnx:face_recognition_sface_2021dec.onnx"
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: face_detection_yunet_2023mar.onnx
sha256: 8f2383e4dd3cfbb4553ea8718107fc0423210dc964f9f4280604804ed2552fa4
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
- filename: face_recognition_sface_2021dec.onnx
sha256: 0ba9fbfa01b5270c96627c4ef784da859931e02f04419c829e83484087c34e79
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx
- &insightface_opencv_int8
name: "insightface-opencv-int8"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
license: apache-2.0
description: |
Int8-quantized OpenCV Zoo face pair (YuNet int8 + SFace int8, ~12MB).
Roughly 3x smaller and noticeably faster on CPU than the fp32 variant
at comparable accuracy for face tasks. APACHE 2.0 — commercial-safe.
Weights are downloaded on install via LocalAI's gallery mechanism.
tags: [face-recognition, face-verification, face-embedding, commercial-ok, edge, cpu]
urls: [https://github.com/opencv/opencv_zoo]
overrides:
backend: insightface
parameters: {model: face_detection_yunet_2023mar_int8.onnx}
options:
- "engine:onnx_direct"
- "detector_onnx:face_detection_yunet_2023mar_int8.onnx"
- "recognizer_onnx:face_recognition_sface_2021dec_int8.onnx"
known_usecases: [face_recognition, detection, embeddings]
files:
- filename: face_detection_yunet_2023mar_int8.onnx
sha256: 321aa5a6afabf7ecc46a3d06bfab2b579dc96eb5c3be7edd365fa04502ad9294
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar_int8.onnx
- filename: face_recognition_sface_2021dec_int8.onnx
sha256: 2b0e941e6f16cc048c20aee0c8e31f569118f65d702914540f7bfdc14048d78a
uri: https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec_int8.onnx
- &rfdetr
name: "rfdetr-base"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
@@ -15189,11 +15618,13 @@
model: wan2.1_t2v_1.3b-q8_0.gguf
files:
- filename: "wan2.1_t2v_1.3b-q8_0.gguf"
sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- name: wan-2.1-i2v-14b-480p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
@@ -15214,11 +15645,103 @@
model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- filename: "clip_vision_h.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-flf2v-14b-720p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
description: |
Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
Takes a start and end reference image and interpolates a 33-frame clip
between them. Unlike the plain I2V variant this model feeds the end
frame through clip_vision as well, so it conditions semantically (not
just in pixel-space) on both endpoints. That makes it the right choice
for seamless loops (start_image == end_image) and clean narrative cuts.
Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
text encoder, and clip_vision_h as I2V 14B.
urls:
- https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
tags:
- image-to-video
- first-last-frame-to-video
- wan
- video-generation
- cpu
- gpu
overrides:
parameters:
model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- filename: "clip_vision_h.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-i2v-14b-720p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
description: |
Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
Native 720p sibling of the 480p I2V model: animates a single
reference image into a 33-frame clip at up to 1280x720. Trained
purely as image-to-video (no first-last-frame interpolation path),
so motion is freer and better-suited to single-anchor animation
than repurposing the FLF2V 720P variant for i2v. Shares the same
VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
and FLF2V 14B 720P entries.
urls:
- https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
tags:
- image-to-video
- wan
- video-generation
- cpu
- gpu
overrides:
parameters:
model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
- filename: "clip_vision_h.safetensors"

45
go.mod
View File

@@ -8,13 +8,13 @@ require (
github.com/Masterminds/sprig/v3 v3.3.0
github.com/alecthomas/kong v1.14.0
github.com/anthropics/anthropic-sdk-go v1.27.0
github.com/aws/aws-sdk-go-v2 v1.41.5
github.com/aws/aws-sdk-go-v2/config v1.32.14
github.com/aws/aws-sdk-go-v2/credentials v1.19.14
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
github.com/aws/aws-sdk-go-v2 v1.41.6
github.com/aws/aws-sdk-go-v2/config v1.32.16
github.com/aws/aws-sdk-go-v2/credentials v1.19.15
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
github.com/charmbracelet/glamour v1.0.0
github.com/containerd/containerd v1.7.30
github.com/coreos/go-oidc/v3 v3.17.0
github.com/containerd/containerd v1.7.31
github.com/coreos/go-oidc/v3 v3.18.0
github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
github.com/ebitengine/purego v0.10.0
github.com/emirpasic/gods/v2 v2.0.0-alpha
@@ -35,7 +35,7 @@ require (
github.com/lithammer/fuzzysearch v1.1.8
github.com/mholt/archiver/v3 v3.5.1
github.com/microcosm-cc/bluemonday v1.0.27
github.com/modelcontextprotocol/go-sdk v1.4.1
github.com/modelcontextprotocol/go-sdk v1.5.0
github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
github.com/mudler/edgevpn v0.31.1
github.com/mudler/go-processmanager v0.1.0
@@ -75,24 +75,23 @@ require (
)
require (
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
github.com/aws/smithy-go v1.24.2 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
github.com/aws/smithy-go v1.25.0 // indirect
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/go-jose/go-jose/v4 v4.1.3 // indirect
github.com/go-jose/go-jose/v4 v4.1.4 // indirect
github.com/jinzhu/inflection v1.0.0 // indirect
github.com/jinzhu/now v1.1.5 // indirect
github.com/mattn/go-sqlite3 v1.14.24 // indirect

94
go.sum
View File

@@ -70,44 +70,42 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -198,8 +196,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
@@ -212,8 +210,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
@@ -336,8 +334,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
@@ -385,8 +383,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -691,8 +689,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=

View File

@@ -54,6 +54,8 @@ type Backend interface {
TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...grpc.CallOption) (*pb.FaceAnalyzeResponse, error)
AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)

View File

@@ -81,6 +81,14 @@ func (llm *Base) Detect(*pb.DetectOptions) (pb.DetectResponse, error) {
return pb.DetectResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) FaceVerify(*pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
return pb.FaceVerifyResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) FaceAnalyze(*pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
return pb.FaceAnalyzeResponse{}, fmt.Errorf("unimplemented")
}
func (llm *Base) TokenizeString(opts *pb.PredictOptions) (pb.TokenizationResponse, error) {
return pb.TokenizationResponse{}, fmt.Errorf("unimplemented")
}

Some files were not shown because too many files have changed in this diff Show More