* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis
Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.
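For orientation, a plausible shape for that shared engine contract, sketched in Python; only the FaceEngine / InsightFaceEngine / OnnxDirectEngine names come from this change, the method names and signatures are assumptions:

```python
# Sketch of the pluggable two-engine design: both engines satisfy a single
# FaceEngine protocol, so the gRPC servicer never needs to know which one
# is loaded. Method names and signatures here are illustrative assumptions.
from typing import Protocol
import numpy as np

class FaceEngine(Protocol):
    def detect(self, img: np.ndarray) -> list[dict]: ...   # boxes + confidences
    def embed(self, img: np.ndarray) -> np.ndarray: ...    # L2-normalized vector
    def analyze(self, img: np.ndarray) -> list[dict]: ...  # age/gender per face

class InsightFaceEngine:
    """Non-commercial insightface model packs (buffalo_*, antelopev2)."""

class OnnxDirectEngine:
    """Commercial-safe OpenCV Zoo weights (YuNet detector + SFace recognizer)."""
```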
New gRPC RPCs (backend/backend.proto):
* FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
* FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse
Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.
New HTTP endpoints under /v1/face/:
* verify — 1:1 image pair same-person decision
* analyze — per-face age + gender (emotion/race reserved)
* register — 1:N enrollment; stores embedding in vector store
* identify — 1:N recognition; detect → embed → StoresFind
* forget — remove a registered face by opaque ID
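A minimal client sketch against these endpoints; the JSON field names (img1/img2/img/name) are assumptions loosely mirroring the proto messages, not a documented request schema:

```python
import base64
import requests

BASE = "http://localhost:8080"

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# 1:1 verification: are these two images the same person?
verify = requests.post(f"{BASE}/v1/face/verify", json={
    "model": "insightface-opencv",
    "img1": b64("alice_1.jpg"),
    "img2": b64("alice_2.jpg"),
}).json()

# 1:N: enroll once, then recognize later.
requests.post(f"{BASE}/v1/face/register", json={
    "model": "insightface-opencv", "img": b64("alice_1.jpg"), "name": "alice",
})
match = requests.post(f"{BASE}/v1/face/identify", json={
    "model": "insightface-opencv", "img": b64("alice_3.jpg"),
}).json()
print(verify, match)
```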
Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).
New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.
Gallery (backend/index.yaml) ships three entries:
* insightface-buffalo-l — SCRFD-10GF + ArcFace R50 + genderage
(~326MB pre-baked; non-commercial research use only)
* insightface-opencv — YuNet + SFace (~40MB pre-baked; Apache 2.0)
* insightface-buffalo-s — SCRFD-500MF + MBF (runtime download; non-commercial)
Python backend (backend/python/insightface/):
* engines.py — FaceEngine protocol with InsightFaceEngine and
OnnxDirectEngine; resolves model paths relative to the backend
directory so the same gallery config works in docker-scratch and
in the e2e-backends rootfs-extraction harness.
* backend.py — gRPC servicer implementing Health, LoadModel, Status,
Embedding, Detect, FaceVerify, FaceAnalyze.
* install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
backend directory so first-run is offline-clean (the final scratch
image only preserves files under /<backend>/).
* test.py — parametrized unit tests over both engines.
Tests:
* Registry unit tests (go test -race ./core/services/facerecognition/...)
— in-memory fake grpc.Backend, table-driven, covers register/
identify/forget/error paths + concurrent access.
* tests/e2e-backends/backend_test.go extended with face caps
(face_detect, face_embed, face_verify, face_analyze); relative
ordering + configurable verifyCeiling per engine.
* Makefile targets: test-extra-backend-insightface-buffalo-l,
-opencv, and the -all aggregate.
* CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
auto-triggered by changes under backend/python/insightface/.
Docs:
* docs/content/features/face-recognition.md — feature page with
license table, quickstart (defaults to the commercial-safe model),
models matrix, API reference, 1:N workflow, storage caveats.
* Cross-refs in object-detection.md, stores.md, embeddings.md, and
whats-new.md.
* Contributor README at backend/python/insightface/README.md.
Verified end-to-end:
* buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
face_verify, face_analyze).
* opencv: 5/5 specs (same minus face_analyze — SFace has no
demographic head; correctly skipped via BACKEND_TEST_CAPS).
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): move engine selection to model gallery, collapse backend entries
The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.
Correct split:
* `backend/index.yaml` now has ONE `insightface` backend entry
shipping the CPU + CUDA 12 container images. The Python backend
bundles both the non-commercial insightface model packs
(buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
weights (YuNet + SFace); the active engine is selected at
LoadModel time via `options: ["engine:..."]` (a parsing sketch follows this list).
* `gallery/index.yaml` gains three model entries —
`insightface-buffalo-l`, `insightface-opencv`,
`insightface-buffalo-s` — each setting the appropriate
`overrides.backend` + `overrides.options` so installing one
actually gives the user the intended engine. This matches how
`rfdetr-base` lives in the model gallery against the `rfdetr`
backend.
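Roughly, the backend turns those option strings into an engine choice at LoadModel time; a sketch only, with the option keys beyond `engine:` and the default value being assumptions:

```python
# Hypothetical sketch of option parsing in backend.py's LoadModel:
# ModelOptions.Options arrives as "key:value" strings from the gallery overrides.
def parse_options(options: list[str]) -> dict[str, str]:
    parsed = {}
    for opt in options:
        key, _, value = opt.partition(":")
        parsed[key.strip()] = value.strip()
    return parsed

opts = parse_options(["engine:insightface", "model_pack:buffalo_l"])
engine_name = opts.get("engine", "insightface")  # default is an assumption
# engine = InsightFaceEngine(...) if engine_name == "insightface" else OnnxDirectEngine(...)
```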
The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): cover all supported models in the gallery + drop weight baking
Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.
Gallery now has seven `insightface-*` model entries (gallery/index.yaml):
insightface (family) — non-commercial research use
• buffalo-l (326MB) — SCRFD-10GF + ResNet50 + genderage, default
• buffalo-m (313MB) — SCRFD-2.5GF + ResNet50 + genderage
• buffalo-s (159MB) — SCRFD-500MF + MBF + genderage
• buffalo-sc (16MB) — SCRFD-500MF + MBF, recognition only
(no landmarks, no demographics — analyze
returns empty attributes)
• antelopev2 (407MB) — SCRFD-10GF + ResNet100@Glint360K + genderage
OpenCV Zoo family — Apache 2.0 commercial-safe
• opencv — YuNet + SFace fp32 (~40MB)
• opencv-int8 — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)
Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:
* OpenCV variants list `files:` with ONNX URIs + SHA-256, so
`local-ai models install insightface-opencv` pulls them into the
models directory exactly like any other gallery-managed model.
* insightface packs (upstream distributes .zip archives only, not
individual ONNX files) auto-download on first LoadModel via
FaceAnalysis' built-in machinery, rooted at the LocalAI models
directory so they live alongside everything else — same pattern
`rfdetr` uses with `inference.get_model()`.
Backend changes (backend/python/insightface/):
* backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
LocalAI models directory) to engines via a `_model_dir` hint.
This replaces the earlier ModelFile-dirname approach; ModelPath
is the canonical "models directory" variable set by the Go loader
(pkg/model/initializers.go:144) and is always populated.
* engines.py::_resolve_model_path — picks up `model_dir` and searches
it (plus basename-in-model-dir) before falling back to the dev
script-dir. This is how OnnxDirectEngine finds gallery-downloaded
YuNet/SFace files by filename only (a resolution sketch follows this list).
* engines.py::_flatten_insightface_pack — new helper that works
around an upstream packaging inconsistency: buffalo_l/s/sc zips
expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
files in a redundant `<name>/` directory. insightface's own
loader looks one level too shallow and fails. We call
`ensure_available()` explicitly, flatten if nested, then hand to
FaceAnalysis.
* engines.py::InsightFaceEngine.prepare — root-resolution order now
includes the `_model_dir` hint so packs download into the LocalAI
models directory by default.
* install.sh — no longer pre-downloads any weights. Everything is
gallery-managed now.
* smoke.py (new) — parametrized smoke test that iterates over every
gallery configuration, simulating the LocalAI install flow
(creates a models dir, fetches OpenCV files with checksum
verification, lets insightface auto-download its packs), then
runs detect + embed + verify (+ analyze where supported) through
the in-process BackendServicer.
* test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
paths; downloads ONNX files to a temp dir at setUpClass time and
passes ModelPath accordingly.
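A condensed sketch of that resolution order; the function body is illustrative, only the search order (absolute path, models dir, basename in models dir, dev script dir) is taken from the description above:

```python
import os

def _resolve_model_path(path: str, model_dir: str | None, script_dir: str) -> str:
    # Illustrative only, not the engines.py implementation.
    if os.path.isabs(path) and os.path.exists(path):
        return path
    candidates = []
    if model_dir:
        candidates.append(os.path.join(model_dir, path))
        candidates.append(os.path.join(model_dir, os.path.basename(path)))
    candidates.append(os.path.join(script_dir, path))  # dev-time fallback
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return path  # let the engine raise a clear "not found" error downstream
```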
Registry change (core/services/facerecognition/store_registry.go):
* `dim=0` in NewStoreRegistry now means "accept whatever dimension
arrives" — needed because the backend supports 512-d ArcFace/MBF
and 128-d SFace via the same Registry. A non-zero dim still fails
fast with ErrDimensionMismatch.
* core/application plumbs `faceEmbeddingDim = 0`, with the rationale
explained in a comment.
Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.
Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:
PASS: insightface-buffalo-l faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-sc faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-s faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-m faces=6 dim=512 same-dist=0.000
PASS: insightface-antelopev2 faces=6 dim=512 same-dist=0.000
PASS: insightface-opencv faces=6 dim=128 same-dist=0.000
PASS: insightface-opencv-int8 faces=6 dim=128 same-dist=0.000
7/7 passed
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim
CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:
LoadModel failed: Failed to load face engine:
OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx
Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.
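The same fetch-and-verify pattern, sketched in Python rather than Make; URL, digest, and filename below are placeholders, not the real YuNet/SFace values:

```python
import hashlib
import os
import urllib.request

CACHE = "/tmp/localai-insightface-opencv-cache"

def fetch_verified(url: str, sha256: str, dest_name: str) -> str:
    """Download once into the cache, verify SHA-256, reuse on re-runs."""
    os.makedirs(CACHE, exist_ok=True)
    dest = os.path.join(CACHE, dest_name)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    with open(dest, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != sha256:
        os.remove(dest)
        raise RuntimeError(f"checksum mismatch for {dest_name}")
    return dest  # absolute path, ready for BACKEND_TEST_OPTIONS

# yunet_path = fetch_verified("https://example.invalid/yunet.onnx", "<sha256>", "yunet.onnx")
```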
Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).
Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs
The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.
Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).
Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).
Response:
{
"embedding": [<dim> floats, L2-normed],
"dim": int, // 512 for ArcFace R50 / MBF, 128 for SFace
"model": "<name>"
}
Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Sentinel in docs updated to note /v1/embeddings
is text-only and point image users at /v1/face/embed instead.
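The same normalization check as a client sketch; the request field names are assumptions mirroring the other /v1/face/ endpoints:

```python
import base64
import requests

with open("face.jpg", "rb") as f:
    img = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:8080/v1/face/embed",
                     json={"model": "insightface-opencv", "img": img}).json()
vec = resp["embedding"]
assert resp["dim"] == len(vec)                     # 128 for SFace, 512 for ArcFace/MBF
assert abs(sum(x * x for x in vec) - 1.0) < 1e-3   # L2-normalized
```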
Assisted-by: Claude:claude-opus-4-7
* fix(http): map malformed image input + gRPC status codes to proper 4xx
Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.
Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:
* decodeImageInput(s)
Wraps utils.GetContentURIAsBase64 and turns any failure
(invalid URL, not a data-URI, download error, etc.) into
echo.NewHTTPError(400, "invalid image input: ...").
* mapBackendError(err)
Inspects the gRPC status on a backend call error and maps:
INVALID_ARGUMENT → 400 Bad Request
NOT_FOUND → 404 Not Found
FAILED_PRECONDITION → 412 Precondition Failed
UNIMPLEMENTED → 501 Not Implemented
All other codes fall through unchanged (still 500).
Before, my 1×1 PNG error-path test returned:
HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
HTTP 400 "failed to decode one or both images"
Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.
Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.
Assisted-by: Claude:claude-opus-4-7
* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis
Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.
Two changes:
1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
`files:` list with the pack zip's URI + SHA-256. `local-ai models
install insightface-buffalo-l` now fetches the zip, verifies the
hash, and extracts it into the models directory. No more reliance
on insightface's library-internal `ensure_available()` auto-download
or its hardcoded `BASE_REPO_URL`.
2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
the FaceAnalysis wrapper and drives insightface's `model_zoo`
directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
route each through `model_zoo.get_model()`, build a
`{taskname: model}` dict, loop per-face at inference — are
reimplemented in `InsightFaceEngine`. The actual inference classes
(RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
insightface's — we only replicate the glue, so drift risk against
upstream is minimal.
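Roughly, the replicated glue looks like the sketch below; the model_zoo calls are indicative of insightface's API as described above, per-face inference is elided, and exact signatures should be treated as assumptions:

```python
# Sketch: load an insightface pack without FaceAnalysis by globbing its ONNX
# files, routing each through model_zoo.get_model(), and indexing by taskname.
import glob
import os
from insightface import model_zoo

def load_pack(pack_dir: str, ctx_id: int = -1, det_size=(640, 640)) -> dict:
    models = {}
    for onnx_path in sorted(glob.glob(os.path.join(pack_dir, "*.onnx"))):
        model = model_zoo.get_model(onnx_path)
        if model is None or model.taskname in models:
            continue  # first model per task wins, mirroring FaceAnalysis
        if model.taskname == "detection":
            model.prepare(ctx_id=ctx_id, input_size=det_size)
        else:
            model.prepare(ctx_id=ctx_id)
        models[model.taskname] = model
    return models  # e.g. {"detection": ..., "recognition": ..., "genderage": ...}

# Inference then runs models["detection"] once per image and the recognition /
# attribute models once per detected face (loop omitted here).
```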
Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
layout that doesn't match what LocalAI's zip extraction produces.
LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
`<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
`_locate_insightface_pack` helper searches both locations plus
legacy paths and returns whichever has ONNX files. Replaces the
earlier `_flatten_insightface_pack` helper (which tried to fight
FaceAnalysis's layout expectations; now we just find the files
wherever they are).
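A sketch of that search under the layouts described above; the legacy candidate and the helper body are illustrative, not the literal engines.py code:

```python
import glob
import os

def _locate_insightface_pack(models_dir: str, name: str) -> str | None:
    """Return the first directory that actually contains the pack's ONNX files."""
    candidates = [
        models_dir,                                # buffalo_l/s/sc: extracted flat
        os.path.join(models_dir, name),            # buffalo_m/antelopev2: nested <name>/
        os.path.join(models_dir, "models", name),  # legacy FaceAnalysis-style layout
    ]
    for directory in candidates:
        if glob.glob(os.path.join(directory, "*.onnx")):
            return directory
    return None
```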
Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched
The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".
Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.
Face analyze cap dropped since buffalo_sc has no age/gender head.
Assisted-by: Claude:claude-opus-4-7[1m]
* feat(face-recognition): surface face-recognition in advertised feature maps
The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:
* api_instructions — the machine-readable capability index at
GET /api/instructions. Added `face-recognition` as a dedicated
instruction area with an intro that calls out the in-memory
registry caveat and the /v1/face/embed vs /v1/embeddings split.
* auth/permissions — added FeatureFaceRecognition constant, routed
all six face endpoints through it so admins can gate them per-user
like any other API feature. Default ON (matches the other API
features).
* React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
follow-up (noted in the plan).
Instruction count bumped 9 → 10; test updated.
Assisted-by: Claude:claude-opus-4-7[1m]
* docs(agents): capture advertising-surface steps in the endpoint guide
Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.
Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.
Assisted-by: Claude:claude-opus-4-7[1m]
backend/backend.proto (Protocol Buffer, 751 lines, 22 KiB)
syntax = "proto3";

option go_package = "github.com/go-skynet/LocalAI/pkg/grpc/proto";
option java_multiple_files = true;
option java_package = "io.skynet.localai.backend";
option java_outer_classname = "LocalAIBackend";

package backend;

service Backend {
  rpc Health(HealthMessage) returns (Reply) {}
  rpc Free(HealthMessage) returns (Result) {}
  rpc Predict(PredictOptions) returns (Reply) {}
  rpc LoadModel(ModelOptions) returns (Result) {}
  rpc PredictStream(PredictOptions) returns (stream Reply) {}
  rpc Embedding(PredictOptions) returns (EmbeddingResult) {}
  rpc GenerateImage(GenerateImageRequest) returns (Result) {}
  rpc GenerateVideo(GenerateVideoRequest) returns (Result) {}
  rpc AudioTranscription(TranscriptRequest) returns (TranscriptResult) {}
  rpc AudioTranscriptionStream(TranscriptRequest) returns (stream TranscriptStreamResponse) {}
  rpc TTS(TTSRequest) returns (Result) {}
  rpc TTSStream(TTSRequest) returns (stream Reply) {}
  rpc SoundGeneration(SoundGenerationRequest) returns (Result) {}
  rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
  rpc Status(HealthMessage) returns (StatusResponse) {}
  rpc Detect(DetectOptions) returns (DetectResponse) {}
  rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
  rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}

  rpc StoresSet(StoresSetOptions) returns (Result) {}
  rpc StoresDelete(StoresDeleteOptions) returns (Result) {}
  rpc StoresGet(StoresGetOptions) returns (StoresGetResult) {}
  rpc StoresFind(StoresFindOptions) returns (StoresFindResult) {}

  rpc Rerank(RerankRequest) returns (RerankResult) {}

  rpc GetMetrics(MetricsRequest) returns (MetricsResponse);

  rpc VAD(VADRequest) returns (VADResponse) {}

  rpc AudioEncode(AudioEncodeRequest) returns (AudioEncodeResult) {}
  rpc AudioDecode(AudioDecodeRequest) returns (AudioDecodeResult) {}

  rpc ModelMetadata(ModelOptions) returns (ModelMetadataResponse) {}

  // Fine-tuning RPCs
  rpc StartFineTune(FineTuneRequest) returns (FineTuneJobResult) {}
  rpc FineTuneProgress(FineTuneProgressRequest) returns (stream FineTuneProgressUpdate) {}
  rpc StopFineTune(FineTuneStopRequest) returns (Result) {}
  rpc ListCheckpoints(ListCheckpointsRequest) returns (ListCheckpointsResponse) {}
  rpc ExportModel(ExportModelRequest) returns (Result) {}

  // Quantization RPCs
  rpc StartQuantization(QuantizationRequest) returns (QuantizationJobResult) {}
  rpc QuantizationProgress(QuantizationProgressRequest) returns (stream QuantizationProgressUpdate) {}
  rpc StopQuantization(QuantizationStopRequest) returns (Result) {}
}

// Define the empty request
message MetricsRequest {}

message MetricsResponse {
  int32 slot_id = 1;
  string prompt_json_for_slot = 2; // Stores the prompt as a JSON string.
  float tokens_per_second = 3;
  int32 tokens_generated = 4;
  int32 prompt_tokens_processed = 5;
}

message RerankRequest {
  string query = 1;
  repeated string documents = 2;
  int32 top_n = 3;
}

message RerankResult {
  Usage usage = 1;
  repeated DocumentResult results = 2;
}

message Usage {
  int32 total_tokens = 1;
  int32 prompt_tokens = 2;
}

message DocumentResult {
  int32 index = 1;
  string text = 2;
  float relevance_score = 3;
}

message StoresKey {
  repeated float Floats = 1;
}

message StoresValue {
  bytes Bytes = 1;
}

message StoresSetOptions {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresDeleteOptions {
  repeated StoresKey Keys = 1;
}

message StoresGetOptions {
  repeated StoresKey Keys = 1;
}

message StoresGetResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
}

message StoresFindOptions {
  StoresKey Key = 1;
  int32 TopK = 2;
}

message StoresFindResult {
  repeated StoresKey Keys = 1;
  repeated StoresValue Values = 2;
  repeated float Similarities = 3;
}

message HealthMessage {}

// The request message containing the user's name.
message PredictOptions {
  string Prompt = 1;
  int32 Seed = 2;
  int32 Threads = 3;
  int32 Tokens = 4;
  int32 TopK = 5;
  int32 Repeat = 6;
  int32 Batch = 7;
  int32 NKeep = 8;
  float Temperature = 9;
  float Penalty = 10;
  bool F16KV = 11;
  bool DebugMode = 12;
  repeated string StopPrompts = 13;
  bool IgnoreEOS = 14;
  float TailFreeSamplingZ = 15;
  float TypicalP = 16;
  float FrequencyPenalty = 17;
  float PresencePenalty = 18;
  int32 Mirostat = 19;
  float MirostatETA = 20;
  float MirostatTAU = 21;
  bool PenalizeNL = 22;
  string LogitBias = 23;
  bool MLock = 25;
  bool MMap = 26;
  bool PromptCacheAll = 27;
  bool PromptCacheRO = 28;
  string Grammar = 29;
  string MainGPU = 30;
  string TensorSplit = 31;
  float TopP = 32;
  string PromptCachePath = 33;
  bool Debug = 34;
  repeated int32 EmbeddingTokens = 35;
  string Embeddings = 36;
  float RopeFreqBase = 37;
  float RopeFreqScale = 38;
  float NegativePromptScale = 39;
  string NegativePrompt = 40;
  int32 NDraft = 41;
  repeated string Images = 42;
  bool UseTokenizerTemplate = 43;
  repeated Message Messages = 44;
  repeated string Videos = 45;
  repeated string Audios = 46;
  string CorrelationId = 47;
  string Tools = 48; // JSON array of available tools/functions for tool calling
  string ToolChoice = 49; // JSON string or object specifying tool choice behavior
  int32 Logprobs = 50; // Number of top logprobs to return (maps to OpenAI logprobs parameter)
  int32 TopLogprobs = 51; // Number of top logprobs to return per token (maps to OpenAI top_logprobs parameter)
  map<string, string> Metadata = 52; // Generic per-request metadata (e.g., enable_thinking)
  float MinP = 53; // Minimum probability sampling threshold (0.0 = disabled)
}

// ToolCallDelta represents an incremental tool call update from the C++ parser.
// Used for both streaming (partial diffs) and non-streaming (final tool calls).
message ToolCallDelta {
  int32 index = 1; // tool call index (0-based)
  string id = 2; // tool call ID (e.g., "call_abc123")
  string name = 3; // function name (set on first appearance)
  string arguments = 4; // arguments chunk (incremental in streaming, full in non-streaming)
}

// ChatDelta represents incremental content/reasoning/tool_call updates parsed by the C++ backend.
message ChatDelta {
  string content = 1; // content text delta
  string reasoning_content = 2; // reasoning/thinking text delta
  repeated ToolCallDelta tool_calls = 3; // tool call deltas
}

// The response message containing the result
message Reply {
  bytes message = 1;
  int32 tokens = 2;
  int32 prompt_tokens = 3;
  double timing_prompt_processing = 4;
  double timing_token_generation = 5;
  bytes audio = 6;
  bytes logprobs = 7; // JSON-encoded logprobs data matching OpenAI format
  repeated ChatDelta chat_deltas = 8; // Parsed chat deltas from C++ autoparser (streaming + non-streaming)
}

message GrammarTrigger {
  string word = 1;
}

message ModelOptions {
  string Model = 1;
  int32 ContextSize = 2;
  int32 Seed = 3;
  int32 NBatch = 4;
  bool F16Memory = 5;
  bool MLock = 6;
  bool MMap = 7;
  bool VocabOnly = 8;
  bool LowVRAM = 9;
  bool Embeddings = 10;
  bool NUMA = 11;
  int32 NGPULayers = 12;
  string MainGPU = 13;
  string TensorSplit = 14;
  int32 Threads = 15;
  float RopeFreqBase = 17;
  float RopeFreqScale = 18;
  float RMSNormEps = 19;
  int32 NGQA = 20;
  string ModelFile = 21;

  // Diffusers
  string PipelineType = 26;
  string SchedulerType = 27;
  bool CUDA = 28;
  float CFGScale = 29;
  bool IMG2IMG = 30;
  string CLIPModel = 31;
  string CLIPSubfolder = 32;
  int32 CLIPSkip = 33;
  string ControlNet = 48;

  string Tokenizer = 34;

  // LLM (llama.cpp)
  string LoraBase = 35;
  string LoraAdapter = 36;
  float LoraScale = 42;

  bool NoMulMatQ = 37;
  string DraftModel = 39;

  string AudioPath = 38;

  // vllm
  string Quantization = 40;
  float GPUMemoryUtilization = 50;
  bool TrustRemoteCode = 51;
  bool EnforceEager = 52;
  int32 SwapSpace = 53;
  int32 MaxModelLen = 54;
  int32 TensorParallelSize = 55;
  string LoadFormat = 58;
  bool DisableLogStatus = 66;
  string DType = 67;
  int32 LimitImagePerPrompt = 68;
  int32 LimitVideoPerPrompt = 69;
  int32 LimitAudioPerPrompt = 70;

  string MMProj = 41;

  string RopeScaling = 43;
  float YarnExtFactor = 44;
  float YarnAttnFactor = 45;
  float YarnBetaFast = 46;
  float YarnBetaSlow = 47;

  string Type = 49;

  string FlashAttention = 56;
  bool NoKVOffload = 57;

  string ModelPath = 59;

  repeated string LoraAdapters = 60;
  repeated float LoraScales = 61;

  repeated string Options = 62;

  string CacheTypeKey = 63;
  string CacheTypeValue = 64;

  repeated GrammarTrigger GrammarTriggers = 65;

  bool Reranking = 71;

  repeated string Overrides = 72;
}

message Result {
  string message = 1;
  bool success = 2;
}

message EmbeddingResult {
  repeated float embeddings = 1;
}

message TranscriptRequest {
  string dst = 2;
  string language = 3;
  uint32 threads = 4;
  bool translate = 5;
  bool diarize = 6;
  string prompt = 7;
  float temperature = 8;
  repeated string timestamp_granularities = 9;
  bool stream = 10;
}

message TranscriptResult {
  repeated TranscriptSegment segments = 1;
  string text = 2;
  string language = 3;
  float duration = 4;
}

message TranscriptStreamResponse {
  string delta = 1;
  TranscriptResult final_result = 2;
}

message TranscriptSegment {
  int32 id = 1;
  int64 start = 2;
  int64 end = 3;
  string text = 4;
  repeated int32 tokens = 5;
  string speaker = 6;
}

message GenerateImageRequest {
  int32 height = 1;
  int32 width = 2;
  int32 step = 4;
  int32 seed = 5;
  string positive_prompt = 6;
  string negative_prompt = 7;
  string dst = 8;
  string src = 9;

  // Diffusers
  string EnableParameters = 10;
  int32 CLIPSkip = 11;

  // Reference images for models that support them (e.g., Flux Kontext)
  repeated string ref_images = 12;
}

message GenerateVideoRequest {
  string prompt = 1;
  string negative_prompt = 2; // Negative prompt for video generation
  string start_image = 3; // Path or base64 encoded image for the start frame
  string end_image = 4; // Path or base64 encoded image for the end frame
  int32 width = 5;
  int32 height = 6;
  int32 num_frames = 7; // Number of frames to generate
  int32 fps = 8; // Frames per second
  int32 seed = 9;
  float cfg_scale = 10; // Classifier-free guidance scale
  int32 step = 11; // Number of inference steps
  string dst = 12; // Output path for the generated video
}

message TTSRequest {
  string text = 1;
  string model = 2;
  string dst = 3;
  string voice = 4;
  optional string language = 5;
}

message VADRequest {
  repeated float audio = 1;
}

message VADSegment {
  float start = 1;
  float end = 2;
}

message VADResponse {
  repeated VADSegment segments = 1;
}

message SoundGenerationRequest {
  string text = 1;
  string model = 2;
  string dst = 3;
  optional float duration = 4;
  optional float temperature = 5;
  optional bool sample = 6;
  optional string src = 7;
  optional int32 src_divisor = 8;
  optional bool think = 9;
  optional string caption = 10;
  optional string lyrics = 11;
  optional int32 bpm = 12;
  optional string keyscale = 13;
  optional string language = 14;
  optional string timesignature = 15;
  optional bool instrumental = 17;
}

message TokenizationResponse {
  int32 length = 1;
  repeated int32 tokens = 2;
}

message MemoryUsageData {
  uint64 total = 1;
  map<string, uint64> breakdown = 2;
}

message StatusResponse {
  enum State {
    UNINITIALIZED = 0;
    BUSY = 1;
    READY = 2;
    ERROR = -1;
  }
  State state = 1;
  MemoryUsageData memory = 2;
}

message Message {
  string role = 1;
  string content = 2;
  // Optional fields for OpenAI-compatible message format
  string name = 3; // Tool name (for tool messages)
  string tool_call_id = 4; // Tool call ID (for tool messages)
  string reasoning_content = 5; // Reasoning content (for thinking models)
  string tool_calls = 6; // Tool calls as JSON string (for assistant messages with tool calls)
}

message DetectOptions {
  string src = 1;
  string prompt = 2; // Text prompt (for SAM 3 PCS mode)
  repeated float points = 3; // Point coordinates as [x1, y1, label1, x2, y2, label2, ...] (label: 1=pos, 0=neg)
  repeated float boxes = 4; // Box coordinates as [x1, y1, x2, y2, ...]
  float threshold = 5; // Detection confidence threshold
}

message Detection {
  float x = 1;
  float y = 2;
  float width = 3;
  float height = 4;
  float confidence = 5;
  string class_name = 6;
  bytes mask = 7; // PNG-encoded binary segmentation mask
}

message DetectResponse {
  repeated Detection Detections = 1;
}

// --- Face recognition messages ---

message FacialArea {
  float x = 1;
  float y = 2;
  float w = 3;
  float h = 4;
}

message FaceVerifyRequest {
  string img1 = 1; // base64-encoded image
  string img2 = 2; // base64-encoded image
  float threshold = 3; // cosine-distance threshold; 0 = use backend default
  bool anti_spoofing = 4; // reserved for future MiniFASNet bolt-on
}

message FaceVerifyResponse {
  bool verified = 1;
  float distance = 2; // 1 - cosine_similarity
  float threshold = 3;
  float confidence = 4; // 0-100
  string model = 5; // e.g. "buffalo_l"
  FacialArea img1_area = 6;
  FacialArea img2_area = 7;
  float processing_time_ms = 8;
}

message FaceAnalyzeRequest {
  string img = 1; // base64-encoded image
  repeated string actions = 2; // subset of ["age","gender","emotion","race"]; empty = all-supported
  bool anti_spoofing = 3;
}

message FaceAnalysis {
  FacialArea region = 1;
  float face_confidence = 2;
  float age = 3;
  string dominant_gender = 4; // "Man" | "Woman"
  map<string, float> gender = 5;
  string dominant_emotion = 6; // reserved; empty in MVP
  map<string, float> emotion = 7;
  string dominant_race = 8; // not populated
  map<string, float> race = 9;
  bool is_real = 10; // anti-spoofing result when enabled
  float antispoof_score = 11;
}

message FaceAnalyzeResponse {
  repeated FaceAnalysis faces = 1;
}

message ToolFormatMarkers {
  string format_type = 1; // "json_native", "tag_with_json", "tag_with_tagged"

  // Tool section markers
  string section_start = 2; // e.g., "<tool_call>", "[TOOL_CALLS]"
  string section_end = 3; // e.g., "</tool_call>"
  string per_call_start = 4; // e.g., "<|tool_call_begin|>"
  string per_call_end = 5; // e.g., "<|tool_call_end|>"

  // Function name markers (TAG_WITH_JSON / TAG_WITH_TAGGED)
  string func_name_prefix = 6; // e.g., "<function="
  string func_name_suffix = 7; // e.g., ">"
  string func_close = 8; // e.g., "</function>"

  // Argument markers (TAG_WITH_TAGGED)
  string arg_name_prefix = 9; // e.g., "<param="
  string arg_name_suffix = 10; // e.g., ">"
  string arg_value_prefix = 11;
  string arg_value_suffix = 12; // e.g., "</param>"
  string arg_separator = 13; // e.g., "\n"

  // JSON format fields (JSON_NATIVE)
  string name_field = 14; // e.g., "name"
  string args_field = 15; // e.g., "arguments"
  string id_field = 16; // e.g., "id"
  bool fun_name_is_key = 17;
  bool tools_array_wrapped = 18;
  reserved 19;

  // Reasoning markers
  string reasoning_start = 20; // e.g., "<think>"
  string reasoning_end = 21; // e.g., "</think>"

  // Content markers
  string content_start = 22;
  string content_end = 23;

  // Args wrapper markers
  string args_start = 24; // e.g., "<args>"
  string args_end = 25; // e.g., "</args>"

  // JSON parameter ordering
  string function_field = 26; // e.g., "function" (wrapper key in JSON)
  repeated string parameter_order = 27;

  // Generated ID field (alternative field name for generated IDs)
  string gen_id_field = 28; // e.g., "call_id"

  // Call ID markers (position and delimiters for tool call IDs)
  string call_id_position = 29; // "none", "pre_func_name", "between_func_and_args", "post_args"
  string call_id_prefix = 30; // e.g., "[CALL_ID]"
  string call_id_suffix = 31; // e.g., ""
}

message AudioEncodeRequest {
  bytes pcm_data = 1;
  int32 sample_rate = 2;
  int32 channels = 3;
  map<string, string> options = 4;
}

message AudioEncodeResult {
  repeated bytes frames = 1;
  int32 sample_rate = 2;
  int32 samples_per_frame = 3;
}

message AudioDecodeRequest {
  repeated bytes frames = 1;
  map<string, string> options = 2;
}

message AudioDecodeResult {
  bytes pcm_data = 1;
  int32 sample_rate = 2;
  int32 samples_per_frame = 3;
}

message ModelMetadataResponse {
  bool supports_thinking = 1;
  string rendered_template = 2; // The rendered chat template with enable_thinking=true (empty if not applicable)
  ToolFormatMarkers tool_format = 3; // Auto-detected tool format markers from differential template analysis
  string media_marker = 4; // Marker the backend expects in the prompt for each multimodal input (images/audio/video). Empty when the backend does not use a marker.
}

// Fine-tuning messages

message FineTuneRequest {
  // Model identification
  string model = 1; // HF model name or local path
  string training_type = 2; // "lora", "loha", "lokr", "full" — what parameters to train
  string training_method = 3; // "sft", "dpo", "grpo", "rloo", "reward", "kto", "orpo", "network_training"

  // Adapter config (universal across LoRA/LoHa/LoKr for LLM + diffusion)
  int32 adapter_rank = 10; // LoRA rank (r), default 16
  int32 adapter_alpha = 11; // scaling factor, default 16
  float adapter_dropout = 12; // default 0.0
  repeated string target_modules = 13; // layer names to adapt

  // Universal training hyperparameters
  float learning_rate = 20; // default 2e-4
  int32 num_epochs = 21; // default 3
  int32 batch_size = 22; // default 2
  int32 gradient_accumulation_steps = 23; // default 4
  int32 warmup_steps = 24; // default 5
  int32 max_steps = 25; // 0 = use epochs
  int32 save_steps = 26; // 0 = only save final
  float weight_decay = 27; // default 0.01
  bool gradient_checkpointing = 28;
  string optimizer = 29; // adamw_8bit, adamw, sgd, adafactor, prodigy
  int32 seed = 30; // default 3407
  string mixed_precision = 31; // fp16, bf16, fp8, no

  // Dataset
  string dataset_source = 40; // HF dataset ID, local file/dir path
  string dataset_split = 41; // train, test, etc.

  // Output
  string output_dir = 50;
  string job_id = 51; // client-assigned or auto-generated

  // Resume training from a checkpoint
  string resume_from_checkpoint = 55; // path to checkpoint dir to resume from

  // Backend-specific AND method-specific extensibility
  map<string, string> extra_options = 60;
}

message FineTuneJobResult {
  string job_id = 1;
  bool success = 2;
  string message = 3;
}

message FineTuneProgressRequest {
  string job_id = 1;
}

message FineTuneProgressUpdate {
  string job_id = 1;
  int32 current_step = 2;
  int32 total_steps = 3;
  float current_epoch = 4;
  float total_epochs = 5;
  float loss = 6;
  float learning_rate = 7;
  float grad_norm = 8;
  float eval_loss = 9;
  float eta_seconds = 10;
  float progress_percent = 11;
  string status = 12; // queued, caching, loading_model, loading_dataset, training, saving, completed, failed, stopped
  string message = 13;
  string checkpoint_path = 14; // set when a checkpoint is saved
  string sample_path = 15; // set when a sample is generated (video/image backends)
  map<string, float> extra_metrics = 16; // method-specific metrics
}

message FineTuneStopRequest {
  string job_id = 1;
  bool save_checkpoint = 2;
}

message ListCheckpointsRequest {
  string output_dir = 1;
}

message ListCheckpointsResponse {
  repeated CheckpointInfo checkpoints = 1;
}

message CheckpointInfo {
  string path = 1;
  int32 step = 2;
  float epoch = 3;
  float loss = 4;
  string created_at = 5;
}

message ExportModelRequest {
  string checkpoint_path = 1;
  string output_path = 2;
  string export_format = 3; // lora, loha, lokr, merged_16bit, merged_4bit, gguf, diffusers
  string quantization_method = 4; // for GGUF: q4_k_m, q5_k_m, q8_0, f16, etc.
  string model = 5; // base model name (for merge operations)
  map<string, string> extra_options = 6;
}

// Quantization messages

message QuantizationRequest {
  string model = 1; // HF model name or local path
  string quantization_type = 2; // q4_k_m, q5_k_m, q8_0, f16, etc.
  string output_dir = 3; // where to write output files
  string job_id = 4; // client-assigned job ID
  map<string, string> extra_options = 5; // hf_token, custom flags, etc.
}

message QuantizationJobResult {
  string job_id = 1;
  bool success = 2;
  string message = 3;
}

message QuantizationProgressRequest {
  string job_id = 1;
}

message QuantizationProgressUpdate {
  string job_id = 1;
  float progress_percent = 2;
  string status = 3; // queued, downloading, converting, quantizing, completed, failed, stopped
  string message = 4;
  string output_file = 5; // set when completed — path to the output GGUF file
  map<string, float> extra_metrics = 6; // e.g. file_size_mb, compression_ratio
}

message QuantizationStopRequest {
  string job_id = 1;
}