Mirror of https://github.com/mudler/LocalAI.git (synced 2026-04-30 12:08:13 -04:00)
* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis
Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.
New gRPC RPCs (backend/backend.proto):
* FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
* FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse
Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.
New HTTP endpoints under /v1/face/:
* verify — 1:1 image pair same-person decision
* analyze — per-face age + gender (emotion/race reserved)
* register — 1:N enrollment; stores embedding in vector store
* identify — 1:N recognition; detect → embed → StoresFind
* forget — remove a registered face by opaque ID
Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).
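For orientation, a minimal sketch of the shape such a Registry could take. The method set is inferred from the /v1/face/register, /v1/face/identify, and /v1/face/forget endpoints above, not copied from the actual source:

    package facerecognition

    import "context"

    // Match is a hypothetical result row for a 1:N lookup.
    type Match struct {
    	ID       string  // opaque ID returned by Register
    	Name     string  // label supplied at enrollment
    	Distance float32 // embedding-space distance to the probe
    }

    // Registry is sketched from the endpoints it backs; the real
    // interface in core/services/facerecognition/ may differ in
    // names and signatures.
    type Registry interface {
    	// Register enrolls an embedding and returns an opaque ID (/v1/face/register).
    	Register(ctx context.Context, name string, embedding []float32) (string, error)
    	// Identify returns the closest enrolled faces for a probe (/v1/face/identify).
    	Identify(ctx context.Context, embedding []float32, topK int) ([]Match, error)
    	// Forget removes a registered face by its opaque ID (/v1/face/forget).
    	Forget(ctx context.Context, id string) error
    }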
New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.
Gallery (backend/index.yaml) ships three entries:
* insightface-buffalo-l — SCRFD-10GF + ArcFace R50 + genderage
(~326MB pre-baked; non-commercial research use only)
* insightface-opencv — YuNet + SFace (~40MB pre-baked; Apache 2.0)
* insightface-buffalo-s — SCRFD-500MF + MBF (runtime download; non-commercial)
Python backend (backend/python/insightface/):
* engines.py — FaceEngine protocol with InsightFaceEngine and
OnnxDirectEngine; resolves model paths relative to the backend
directory so the same gallery config works in docker-scratch and
in the e2e-backends rootfs-extraction harness.
* backend.py — gRPC servicer implementing Health, LoadModel, Status,
Embedding, Detect, FaceVerify, FaceAnalyze.
* install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
backend directory so first-run is offline-clean (the final scratch
image only preserves files under /<backend>/).
* test.py — parametrized unit tests over both engines.
Tests:
* Registry unit tests (go test -race ./core/services/facerecognition/...)
— in-memory fake grpc.Backend, table-driven, covers register/
identify/forget/error paths + concurrent access.
* tests/e2e-backends/backend_test.go extended with face caps
(face_detect, face_embed, face_verify, face_analyze); relative
ordering + configurable verifyCeiling per engine.
* Makefile targets: test-extra-backend-insightface-buffalo-l,
-opencv, and the -all aggregate.
* CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
auto-triggered by changes under backend/python/insightface/.
Docs:
* docs/content/features/face-recognition.md — feature page with
license table, quickstart (defaults to the commercial-safe model),
models matrix, API reference, 1:N workflow, storage caveats.
* Cross-refs in object-detection.md, stores.md, embeddings.md, and
whats-new.md.
* Contributor README at backend/python/insightface/README.md.
Verified end-to-end:
* buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
face_verify, face_analyze).
* opencv: 5/5 specs (same minus face_analyze — SFace has no
demographic head; correctly skipped via BACKEND_TEST_CAPS).
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): move engine selection to model gallery, collapse backend entries
The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.
Correct split:
* `backend/index.yaml` now has ONE `insightface` backend entry
shipping the CPU + CUDA 12 container images. The Python backend
bundles both the non-commercial insightface model packs
(buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
weights (YuNet + SFace); the active engine is selected at
LoadModel time via `options: ["engine:..."]` (see the
option-parsing sketch after this list).
* `gallery/index.yaml` gains three model entries —
`insightface-buffalo-l`, `insightface-opencv`,
`insightface-buffalo-s` — each setting the appropriate
`overrides.backend` + `overrides.options` so installing one
actually gives the user the intended engine. This matches how
`rfdetr-base` lives in the model gallery against the `rfdetr`
backend.
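For illustration, the `key:value` option convention can be split like this. A hedged Go sketch only: the real parsing happens in the Python backend at LoadModel time, and `parseOptions` is a name invented here:

    package options

    import "strings"

    // parseOptions splits LoadModel options of the form "key:value"
    // (e.g. "engine:opencv") into a lookup map. Illustrative only.
    func parseOptions(opts []string) map[string]string {
    	out := make(map[string]string, len(opts))
    	for _, o := range opts {
    		if k, v, ok := strings.Cut(o, ":"); ok {
    			out[k] = v
    		}
    	}
    	return out // parseOptions([]string{"engine:opencv"}) -> map[engine:opencv]
    }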
The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): cover all supported models in the gallery + drop weight baking
Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.
Gallery now has seven `insightface-*` model entries (gallery/index.yaml):
insightface (family) — non-commercial research use
• buffalo-l (326MB) — SCRFD-10GF + ResNet50 + genderage, default
• buffalo-m (313MB) — SCRFD-2.5GF + ResNet50 + genderage
• buffalo-s (159MB) — SCRFD-500MF + MBF + genderage
• buffalo-sc (16MB) — SCRFD-500MF + MBF, recognition only
(no landmarks, no demographics — analyze
returns empty attributes)
• antelopev2 (407MB) — SCRFD-10GF + ResNet100@Glint360K + genderage
OpenCV Zoo family — Apache 2.0 commercial-safe
• opencv — YuNet + SFace fp32 (~40MB)
• opencv-int8 — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)
Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:
* OpenCV variants list `files:` with ONNX URIs + SHA-256, so
`local-ai models install insightface-opencv` pulls them into the
models directory exactly like any other gallery-managed model.
* insightface packs (upstream distributes .zip archives only, not
individual ONNX files) auto-download on first LoadModel via
FaceAnalysis' built-in machinery, rooted at the LocalAI models
directory so they live alongside everything else — same pattern
`rfdetr` uses with `inference.get_model()`.
Backend changes (backend/python/insightface/):
* backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
LocalAI models directory) to engines via a `_model_dir` hint.
This replaces the earlier ModelFile-dirname approach; ModelPath
is the canonical "models directory" variable set by the Go loader
(pkg/model/initializers.go:144) and is always populated.
* engines.py::_resolve_model_path — picks up `model_dir` and searches
it (plus basename-in-model-dir) before falling back to the dev
script-dir. This is how OnnxDirectEngine finds gallery-downloaded
YuNet/SFace files by filename only (see the search-order sketch
after this list).
* engines.py::_flatten_insightface_pack — new helper that works
around an upstream packaging inconsistency: buffalo_l/s/sc zips
expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
files in a redundant `<name>/` directory. insightface's own
loader looks one level too shallow and fails. We call
`ensure_available()` explicitly, flatten if nested, then hand to
FaceAnalysis.
* engines.py::InsightFaceEngine.prepare — root-resolution order now
includes the `_model_dir` hint so packs download into the LocalAI
models directory by default.
* install.sh — no longer pre-downloads any weights. Everything is
gallery-managed now.
* smoke.py (new) — parametrized smoke test that iterates over every
gallery configuration, simulating the LocalAI install flow
(creates a models dir, fetches OpenCV files with checksum
verification, lets insightface auto-download its packs), then
runs detect + embed + verify (+ analyze where supported) through
the in-process BackendServicer.
* test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
paths; downloads ONNX files to a temp dir at setUpClass time and
passes ModelPath accordingly.
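The `_resolve_model_path` search order, restated as a Go sketch. The actual helper is Python; the function name and exact candidate order here are illustrative:

    package engines

    import (
    	"os"
    	"path/filepath"
    )

    // resolveModelPath mirrors the search order described above: the
    // path as given, then under the models dir, then its basename in
    // the models dir, then the dev script-dir fallback.
    func resolveModelPath(p, modelDir, scriptDir string) (string, bool) {
    	candidates := []string{
    		p,                                         // absolute or cwd-relative
    		filepath.Join(modelDir, p),                // under the models dir
    		filepath.Join(modelDir, filepath.Base(p)), // basename-in-model-dir
    		filepath.Join(scriptDir, p),               // dev fallback
    	}
    	for _, c := range candidates {
    		if _, err := os.Stat(c); err == nil {
    			return c, true
    		}
    	}
    	return "", false
    }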
Registry change (core/services/facerecognition/store_registry.go):
* `dim=0` in NewStoreRegistry now means "accept whatever dimension
arrives" — needed because the backend supports 512-d ArcFace/MBF
and 128-d SFace via the same Registry (sketched after this list).
A non-zero dim still fails fast with ErrDimensionMismatch.
* core/application plumbs `faceEmbeddingDim = 0`, explaining the
rationale in the comment.
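A sketch of the dim==0 behavior under those assumptions. The error name follows the commit text; the surrounding struct and method are illustrative:

    package facerecognition

    import "errors"

    var ErrDimensionMismatch = errors.New("embedding dimension mismatch")

    type storeRegistry struct {
    	dim int // 0 = accept whatever dimension arrives
    }

    func (r *storeRegistry) checkDim(embedding []float32) error {
    	// dim == 0 disables the check: 512-d ArcFace/MBF and 128-d
    	// SFace vectors can flow through the same Registry instance.
    	if r.dim != 0 && len(embedding) != r.dim {
    		return ErrDimensionMismatch
    	}
    	return nil
    }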
Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.
Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:
PASS: insightface-buffalo-l faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-sc faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-s faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-m faces=6 dim=512 same-dist=0.000
PASS: insightface-antelopev2 faces=6 dim=512 same-dist=0.000
PASS: insightface-opencv faces=6 dim=128 same-dist=0.000
PASS: insightface-opencv-int8 faces=6 dim=128 same-dist=0.000
7/7 passed
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim
CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:
LoadModel failed: Failed to load face engine:
OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx
Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.
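The cache-then-verify pattern the target implements, as a hedged Go sketch (the real prerequisite is a Makefile target; URLs, paths, and hashes are placeholders supplied by the caller):

    package fetchcache

    import (
    	"crypto/sha256"
    	"encoding/hex"
    	"fmt"
    	"io"
    	"net/http"
    	"os"
    )

    // fetchVerified downloads url to dest unless a file with the
    // expected SHA-256 is already cached, mirroring the Makefile
    // prerequisite: fetch once, verify, skip on re-runs.
    func fetchVerified(url, dest, wantSHA string) error {
    	if sum, err := fileSHA256(dest); err == nil && sum == wantSHA {
    		return nil // cache hit: skip the download
    	}
    	resp, err := http.Get(url)
    	if err != nil {
    		return err
    	}
    	defer resp.Body.Close()
    	f, err := os.Create(dest)
    	if err != nil {
    		return err
    	}
    	if _, err := io.Copy(f, resp.Body); err != nil {
    		f.Close()
    		return err
    	}
    	f.Close()
    	sum, err := fileSHA256(dest)
    	if err != nil {
    		return err
    	}
    	if sum != wantSHA {
    		return fmt.Errorf("sha256 mismatch for %s: got %s", dest, sum)
    	}
    	return nil
    }

    func fileSHA256(path string) (string, error) {
    	f, err := os.Open(path)
    	if err != nil {
    		return "", err
    	}
    	defer f.Close()
    	h := sha256.New()
    	if _, err := io.Copy(h, f); err != nil {
    		return "", err
    	}
    	return hex.EncodeToString(h.Sum(nil)), nil
    }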
Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).
Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs
The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.
Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).
Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).
Response:
{
"embedding": [<dim> floats, L2-normed],
"dim": int, // 512 for ArcFace R50 / MBF, 128 for SFace
"model": "<name>"
}
Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Docs updated to note that /v1/embeddings is
text-only and to point image users at /v1/face/embed instead.
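A client-side restatement of that check: given the response shape shown above, a unit-normalized embedding's squared components sum to ~1. The struct and helper names here are illustrative:

    package faceclient

    import "math"

    // embedResponse matches the /v1/face/embed response shape above.
    type embedResponse struct {
    	Embedding []float64 `json:"embedding"`
    	Dim       int       `json:"dim"`
    	Model     string    `json:"model"`
    }

    // isUnitNorm reports whether sum(x^2) is ~1.0, i.e. the vector is
    // L2-normalized the way the live test observed.
    func isUnitNorm(v []float64) bool {
    	var sum float64
    	for _, x := range v {
    		sum += x * x
    	}
    	return math.Abs(sum-1.0) < 1e-4
    }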
Assisted-by: Claude:claude-opus-4-7
* fix(http): map malformed image input + gRPC status codes to proper 4xx
Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.
Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:
* decodeImageInput(s)
Wraps utils.GetContentURIAsBase64 and turns any failure
(invalid URL, not a data-URI, download error, etc.) into
echo.NewHTTPError(400, "invalid image input: ...").
* mapBackendError(err)
Inspects the gRPC status on a backend call error and maps:
INVALID_ARGUMENT → 400 Bad Request
NOT_FOUND → 404 Not Found
FAILED_PRECONDITION → 412 Precondition Failed
UNIMPLEMENTED → 501 Not Implemented
All other codes fall through unchanged (still 500).
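A minimal sketch of that mapping, assuming the echo-style error constructor named above; the code table is the commit's, the surrounding plumbing is illustrative:

    package localai

    import (
    	"github.com/labstack/echo/v4"
    	"google.golang.org/grpc/codes"
    	"google.golang.org/grpc/status"
    )

    // mapBackendError translates gRPC status codes on backend errors
    // into HTTP errors per the table above; anything else passes
    // through unchanged (still 500).
    func mapBackendError(err error) error {
    	st, ok := status.FromError(err)
    	if !ok {
    		return err // not a gRPC status error: leave untouched
    	}
    	switch st.Code() {
    	case codes.InvalidArgument:
    		return echo.NewHTTPError(400, st.Message())
    	case codes.NotFound:
    		return echo.NewHTTPError(404, st.Message())
    	case codes.FailedPrecondition:
    		return echo.NewHTTPError(412, st.Message())
    	case codes.Unimplemented:
    		return echo.NewHTTPError(501, st.Message())
    	default:
    		return err
    	}
    }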
Before, my 1×1 PNG error-path test returned:
HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
HTTP 400 "failed to decode one or both images"
Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.
Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.
Assisted-by: Claude:claude-opus-4-7
* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis
Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.
Two changes:
1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
`files:` list with the pack zip's URI + SHA-256. `local-ai models
install insightface-buffalo-l` now fetches the zip, verifies the
hash, and extracts it into the models directory. No more reliance
on insightface's library-internal `ensure_available()` auto-download
or its hardcoded `BASE_REPO_URL`.
2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
the FaceAnalysis wrapper and drives insightface's `model_zoo`
directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
route each through `model_zoo.get_model()`, build a
`{taskname: model}` dict, loop per-face at inference — are
reimplemented in `InsightFaceEngine`. The actual inference classes
(RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
insightface's — we only replicate the glue, so drift risk against
upstream is minimal.
Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
layout that doesn't match what LocalAI's zip extraction produces.
LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
`<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
`_locate_insightface_pack` helper searches both locations plus
legacy paths and returns whichever has ONNX files. Replaces the
earlier `_flatten_insightface_pack` helper (which tried to fight
FaceAnalysis's layout expectations; now we just find the files
wherever they are).
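The search described above, restated as a hedged Go sketch (the real helper is Python's `_locate_insightface_pack`; the two layouts come from the commit text, and legacy paths would be extra candidates):

    package engines

    import "path/filepath"

    // locatePack returns the first directory that actually contains
    // ONNX files, trying the flat layout first, then the nested
    // <name>/ layout that buffalo_m/antelopev2 zips produce.
    func locatePack(modelsDir, name string) (string, bool) {
    	candidates := []string{
    		modelsDir,                      // buffalo_l/s/sc: ONNX at zip root
    		filepath.Join(modelsDir, name), // buffalo_m/antelopev2: nested dir
    	}
    	for _, dir := range candidates {
    		if matches, _ := filepath.Glob(filepath.Join(dir, "*.onnx")); len(matches) > 0 {
    			return dir, true
    		}
    	}
    	return "", false
    }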
Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched
The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".
Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.
Face analyze cap dropped since buffalo_sc has no age/gender head.
Assisted-by: Claude:claude-opus-4-7[1m]
* feat(face-recognition): surface face-recognition in advertised feature maps
The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:
* api_instructions — the machine-readable capability index at
GET /api/instructions. Added `face-recognition` as a dedicated
instruction area with an intro that calls out the in-memory
registry caveat and the /v1/face/embed vs /v1/embeddings split.
* auth/permissions — added FeatureFaceRecognition constant, routed
all six face endpoints through it so admins can gate them per-user
like any other API feature. Default ON (matches the other API
features).
* React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
follow-up (noted in the plan).
Instruction count bumped 9 → 10; test updated.
Assisted-by: Claude:claude-opus-4-7[1m]
* docs(agents): capture advertising-surface steps in the endpoint guide
Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.
Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.
Assisted-by: Claude:claude-opus-4-7[1m]
821 lines · 30 KiB · Go
package config

import (
	"fmt"
	"os"
	"regexp"
	"slices"
	"strings"

	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/pkg/downloader"
	"github.com/mudler/LocalAI/pkg/functions"
	"github.com/mudler/LocalAI/pkg/reasoning"
	"github.com/mudler/cogito"
	"gopkg.in/yaml.v3"
)

const (
	RAND_SEED = -1
)

// @Description TTS configuration
type TTSConfig struct {

	// Voice wav path or id
	Voice string `yaml:"voice,omitempty" json:"voice,omitempty"`

	AudioPath string `yaml:"audio_path,omitempty" json:"audio_path,omitempty"`
}

// @Description ModelConfig represents a model configuration
type ModelConfig struct {
	modelConfigFile string `yaml:"-" json:"-"`
	modelTemplate string `yaml:"-" json:"-"`
	schema.PredictionOptions `yaml:"parameters,omitempty" json:"parameters,omitempty"`
	Name string `yaml:"name,omitempty" json:"name,omitempty"`

	F16 *bool `yaml:"f16,omitempty" json:"f16,omitempty"`
	Threads *int `yaml:"threads,omitempty" json:"threads,omitempty"`
	Debug *bool `yaml:"debug,omitempty" json:"debug,omitempty"`
	Roles map[string]string `yaml:"roles,omitempty" json:"roles,omitempty"`
	Embeddings *bool `yaml:"embeddings,omitempty" json:"embeddings,omitempty"`
	Backend string `yaml:"backend,omitempty" json:"backend,omitempty"`
	TemplateConfig TemplateConfig `yaml:"template,omitempty" json:"template,omitempty"`
	KnownUsecaseStrings []string `yaml:"known_usecases,omitempty" json:"known_usecases,omitempty"`
	KnownUsecases *ModelConfigUsecase `yaml:"-" json:"-"`
	Pipeline Pipeline `yaml:"pipeline,omitempty" json:"pipeline,omitempty"`

	PromptStrings, InputStrings []string `yaml:"-" json:"-"`
	InputToken [][]int `yaml:"-" json:"-"`
	functionCallString, functionCallNameString string `yaml:"-" json:"-"`
	ResponseFormat string `yaml:"-" json:"-"`
	ResponseFormatMap map[string]any `yaml:"-" json:"-"`

	// MediaMarker is the runtime-discovered multimodal marker the backend expects
	// in the prompt (e.g. "<__media__>" or a random "<__media_<rand>__>" picked by
	// llama.cpp). Populated on first successful ModelMetadata call. Empty until
	// then — callers must fall back to templates.DefaultMultiMediaMarker.
	MediaMarker string `yaml:"-" json:"-"`

	FunctionsConfig functions.FunctionsConfig `yaml:"function,omitempty" json:"function,omitempty"`
	ReasoningConfig reasoning.Config `yaml:"reasoning,omitempty" json:"reasoning,omitempty"`

	FeatureFlag FeatureFlag `yaml:"feature_flags,omitempty" json:"feature_flags,omitempty"` // Feature Flag registry. We move fast, and features may break on a per model/backend basis. Registry for (usually temporary) flags that indicate aborting something early.
	// LLM configs (GPT4ALL, Llama.cpp, ...)
	LLMConfig `yaml:",inline" json:",inline"`

	// Diffusers
	Diffusers Diffusers `yaml:"diffusers,omitempty" json:"diffusers,omitempty"`
	Step int `yaml:"step,omitempty" json:"step,omitempty"`

	// GRPC Options
	GRPC GRPC `yaml:"grpc,omitempty" json:"grpc,omitempty"`

	// TTS specifics
	TTSConfig `yaml:"tts,omitempty" json:"tts,omitempty"`

	// CUDA
	// Explicitly enable CUDA or not (some backends might need it)
	CUDA bool `yaml:"cuda,omitempty" json:"cuda,omitempty"`

	DownloadFiles []File `yaml:"download_files,omitempty" json:"download_files,omitempty"`

	Description string `yaml:"description,omitempty" json:"description,omitempty"`
	Usage string `yaml:"usage,omitempty" json:"usage,omitempty"`
	Disabled *bool `yaml:"disabled,omitempty" json:"disabled,omitempty"`
	Pinned *bool `yaml:"pinned,omitempty" json:"pinned,omitempty"`

	Options []string `yaml:"options,omitempty" json:"options,omitempty"`
	Overrides []string `yaml:"overrides,omitempty" json:"overrides,omitempty"`

	MCP MCPConfig `yaml:"mcp,omitempty" json:"mcp,omitempty"`
	Agent AgentConfig `yaml:"agent,omitempty" json:"agent,omitempty"`
}

// @Description MCP configuration
type MCPConfig struct {
	Servers string `yaml:"remote,omitempty" json:"remote,omitempty"`
	Stdio string `yaml:"stdio,omitempty" json:"stdio,omitempty"`
}

// @Description Agent configuration
type AgentConfig struct {
	MaxAttempts int `yaml:"max_attempts,omitempty" json:"max_attempts,omitempty"`
	MaxIterations int `yaml:"max_iterations,omitempty" json:"max_iterations,omitempty"`
	EnableReasoning bool `yaml:"enable_reasoning,omitempty" json:"enable_reasoning,omitempty"`
	EnablePlanning bool `yaml:"enable_planning,omitempty" json:"enable_planning,omitempty"`
	EnableMCPPrompts bool `yaml:"enable_mcp_prompts,omitempty" json:"enable_mcp_prompts,omitempty"`
	EnablePlanReEvaluator bool `yaml:"enable_plan_re_evaluator,omitempty" json:"enable_plan_re_evaluator,omitempty"`
	DisableSinkState bool `yaml:"disable_sink_state,omitempty" json:"disable_sink_state,omitempty"`
	LoopDetection int `yaml:"loop_detection,omitempty" json:"loop_detection,omitempty"`
	MaxAdjustmentAttempts int `yaml:"max_adjustment_attempts,omitempty" json:"max_adjustment_attempts,omitempty"`
	ForceReasoningTool bool `yaml:"force_reasoning_tool,omitempty" json:"force_reasoning_tool,omitempty"`
}

// HasMCPServers returns true if any MCP servers (remote or stdio) are configured.
func (c MCPConfig) HasMCPServers() bool {
	return c.Servers != "" || c.Stdio != ""
}

func (c *MCPConfig) MCPConfigFromYAML() (MCPGenericConfig[MCPRemoteServers], MCPGenericConfig[MCPSTDIOServers], error) {
	var remote MCPGenericConfig[MCPRemoteServers]
	var stdio MCPGenericConfig[MCPSTDIOServers]

	if err := yaml.Unmarshal([]byte(c.Servers), &remote); err != nil {
		return remote, stdio, err
	}

	if err := yaml.Unmarshal([]byte(c.Stdio), &stdio); err != nil {
		return remote, stdio, err
	}
	return remote, stdio, nil
}

// @Description MCP generic configuration
type MCPGenericConfig[T any] struct {
	Servers T `yaml:"mcpServers,omitempty" json:"mcpServers,omitempty"`
}
type MCPRemoteServers map[string]MCPRemoteServer
type MCPSTDIOServers map[string]MCPSTDIOServer

// @Description MCP remote server configuration
type MCPRemoteServer struct {
	URL string `json:"url,omitempty"`
	Token string `json:"token,omitempty"`
}

// @Description MCP STDIO server configuration
type MCPSTDIOServer struct {
	Args []string `json:"args,omitempty"`
	Env map[string]string `json:"env,omitempty"`
	Command string `json:"command,omitempty"`
}

// @Description Pipeline defines other models to use for audio-to-audio
type Pipeline struct {
	TTS string `yaml:"tts,omitempty" json:"tts,omitempty"`
	LLM string `yaml:"llm,omitempty" json:"llm,omitempty"`
	Transcription string `yaml:"transcription,omitempty" json:"transcription,omitempty"`
	VAD string `yaml:"vad,omitempty" json:"vad,omitempty"`
}

// @Description File configuration for model downloads
type File struct {
	Filename string `yaml:"filename,omitempty" json:"filename,omitempty"`
	SHA256 string `yaml:"sha256,omitempty" json:"sha256,omitempty"`
	URI downloader.URI `yaml:"uri,omitempty" json:"uri,omitempty"`
}

type FeatureFlag map[string]*bool

func (ff FeatureFlag) Enabled(s string) bool {
	if v, exists := ff[s]; exists && v != nil {
		return *v
	}
	return false
}

// @Description GRPC configuration
type GRPC struct {
	Attempts int `yaml:"attempts,omitempty" json:"attempts,omitempty"`
	AttemptsSleepTime int `yaml:"attempts_sleep_time,omitempty" json:"attempts_sleep_time,omitempty"`
}

// @Description Diffusers configuration
type Diffusers struct {
	CUDA bool `yaml:"cuda,omitempty" json:"cuda,omitempty"`
	PipelineType string `yaml:"pipeline_type,omitempty" json:"pipeline_type,omitempty"`
	SchedulerType string `yaml:"scheduler_type,omitempty" json:"scheduler_type,omitempty"`
	EnableParameters string `yaml:"enable_parameters,omitempty" json:"enable_parameters,omitempty"` // A list of comma separated parameters to specify
	IMG2IMG bool `yaml:"img2img,omitempty" json:"img2img,omitempty"` // Image to Image Diffuser
	ClipSkip int `yaml:"clip_skip,omitempty" json:"clip_skip,omitempty"` // Skip every N frames
	ClipModel string `yaml:"clip_model,omitempty" json:"clip_model,omitempty"` // Clip model to use
	ClipSubFolder string `yaml:"clip_subfolder,omitempty" json:"clip_subfolder,omitempty"` // Subfolder to use for clip model
	ControlNet string `yaml:"control_net,omitempty" json:"control_net,omitempty"`
}

// @Description LLMConfig is a struct that holds the configuration that are generic for most of the LLM backends.
type LLMConfig struct {
	SystemPrompt string `yaml:"system_prompt,omitempty" json:"system_prompt,omitempty"`
	TensorSplit string `yaml:"tensor_split,omitempty" json:"tensor_split,omitempty"`
	MainGPU string `yaml:"main_gpu,omitempty" json:"main_gpu,omitempty"`
	RMSNormEps float32 `yaml:"rms_norm_eps,omitempty" json:"rms_norm_eps,omitempty"`
	NGQA int32 `yaml:"ngqa,omitempty" json:"ngqa,omitempty"`
	PromptCachePath string `yaml:"prompt_cache_path,omitempty" json:"prompt_cache_path,omitempty"`
	PromptCacheAll bool `yaml:"prompt_cache_all,omitempty" json:"prompt_cache_all,omitempty"`
	PromptCacheRO bool `yaml:"prompt_cache_ro,omitempty" json:"prompt_cache_ro,omitempty"`
	MirostatETA *float64 `yaml:"mirostat_eta,omitempty" json:"mirostat_eta,omitempty"`
	MirostatTAU *float64 `yaml:"mirostat_tau,omitempty" json:"mirostat_tau,omitempty"`
	Mirostat *int `yaml:"mirostat,omitempty" json:"mirostat,omitempty"`
	NGPULayers *int `yaml:"gpu_layers,omitempty" json:"gpu_layers,omitempty"`
	MMap *bool `yaml:"mmap,omitempty" json:"mmap,omitempty"`
	MMlock *bool `yaml:"mmlock,omitempty" json:"mmlock,omitempty"`
	LowVRAM *bool `yaml:"low_vram,omitempty" json:"low_vram,omitempty"`
	Reranking *bool `yaml:"reranking,omitempty" json:"reranking,omitempty"`
	Grammar string `yaml:"grammar,omitempty" json:"grammar,omitempty"`
	StopWords []string `yaml:"stopwords,omitempty" json:"stopwords,omitempty"`
	Cutstrings []string `yaml:"cutstrings,omitempty" json:"cutstrings,omitempty"`
	ExtractRegex []string `yaml:"extract_regex,omitempty" json:"extract_regex,omitempty"`
	TrimSpace []string `yaml:"trimspace,omitempty" json:"trimspace,omitempty"`
	TrimSuffix []string `yaml:"trimsuffix,omitempty" json:"trimsuffix,omitempty"`

	ContextSize *int `yaml:"context_size,omitempty" json:"context_size,omitempty"`
	NUMA bool `yaml:"numa,omitempty" json:"numa,omitempty"`
	LoraAdapter string `yaml:"lora_adapter,omitempty" json:"lora_adapter,omitempty"`
	LoraBase string `yaml:"lora_base,omitempty" json:"lora_base,omitempty"`
	LoraAdapters []string `yaml:"lora_adapters,omitempty" json:"lora_adapters,omitempty"`
	LoraScales []float32 `yaml:"lora_scales,omitempty" json:"lora_scales,omitempty"`
	LoraScale float32 `yaml:"lora_scale,omitempty" json:"lora_scale,omitempty"`
	NoMulMatQ bool `yaml:"no_mulmatq,omitempty" json:"no_mulmatq,omitempty"`
	DraftModel string `yaml:"draft_model,omitempty" json:"draft_model,omitempty"`
	NDraft int32 `yaml:"n_draft,omitempty" json:"n_draft,omitempty"`
	Quantization string `yaml:"quantization,omitempty" json:"quantization,omitempty"`
	LoadFormat string `yaml:"load_format,omitempty" json:"load_format,omitempty"`
	GPUMemoryUtilization float32 `yaml:"gpu_memory_utilization,omitempty" json:"gpu_memory_utilization,omitempty"` // vLLM
	TrustRemoteCode bool `yaml:"trust_remote_code,omitempty" json:"trust_remote_code,omitempty"` // vLLM
	EnforceEager bool `yaml:"enforce_eager,omitempty" json:"enforce_eager,omitempty"` // vLLM
	SwapSpace int `yaml:"swap_space,omitempty" json:"swap_space,omitempty"` // vLLM
	MaxModelLen int `yaml:"max_model_len,omitempty" json:"max_model_len,omitempty"` // vLLM
	TensorParallelSize int `yaml:"tensor_parallel_size,omitempty" json:"tensor_parallel_size,omitempty"` // vLLM
	DisableLogStatus bool `yaml:"disable_log_stats,omitempty" json:"disable_log_stats,omitempty"` // vLLM
	DType string `yaml:"dtype,omitempty" json:"dtype,omitempty"` // vLLM
	LimitMMPerPrompt LimitMMPerPrompt `yaml:"limit_mm_per_prompt,omitempty" json:"limit_mm_per_prompt,omitempty"` // vLLM
	MMProj string `yaml:"mmproj,omitempty" json:"mmproj,omitempty"`

	FlashAttention *string `yaml:"flash_attention,omitempty" json:"flash_attention,omitempty"`
	NoKVOffloading bool `yaml:"no_kv_offloading,omitempty" json:"no_kv_offloading,omitempty"`
	CacheTypeK string `yaml:"cache_type_k,omitempty" json:"cache_type_k,omitempty"`
	CacheTypeV string `yaml:"cache_type_v,omitempty" json:"cache_type_v,omitempty"`

	RopeScaling string `yaml:"rope_scaling,omitempty" json:"rope_scaling,omitempty"`
	ModelType string `yaml:"type,omitempty" json:"type,omitempty"`

	YarnExtFactor float32 `yaml:"yarn_ext_factor,omitempty" json:"yarn_ext_factor,omitempty"`
	YarnAttnFactor float32 `yaml:"yarn_attn_factor,omitempty" json:"yarn_attn_factor,omitempty"`
	YarnBetaFast float32 `yaml:"yarn_beta_fast,omitempty" json:"yarn_beta_fast,omitempty"`
	YarnBetaSlow float32 `yaml:"yarn_beta_slow,omitempty" json:"yarn_beta_slow,omitempty"`

	CFGScale float32 `yaml:"cfg_scale,omitempty" json:"cfg_scale,omitempty"` // Classifier-Free Guidance Scale
}

// @Description LimitMMPerPrompt is a struct that holds the configuration for the limit-mm-per-prompt config in vLLM
type LimitMMPerPrompt struct {
	LimitImagePerPrompt int `yaml:"image,omitempty" json:"image,omitempty"`
	LimitVideoPerPrompt int `yaml:"video,omitempty" json:"video,omitempty"`
	LimitAudioPerPrompt int `yaml:"audio,omitempty" json:"audio,omitempty"`
}

// @Description TemplateConfig is a struct that holds the configuration of the templating system
type TemplateConfig struct {
	// Chat is the template used in the chat completion endpoint
	Chat string `yaml:"chat,omitempty" json:"chat,omitempty"`

	// ChatMessage is the template used for chat messages
	ChatMessage string `yaml:"chat_message,omitempty" json:"chat_message,omitempty"`

	// Completion is the template used for completion requests
	Completion string `yaml:"completion,omitempty" json:"completion,omitempty"`

	// Edit is the template used for edit completion requests
	Edit string `yaml:"edit,omitempty" json:"edit,omitempty"`

	// Functions is the template used when tools are present in the client requests
	Functions string `yaml:"function,omitempty" json:"function,omitempty"`

	// UseTokenizerTemplate is a flag that indicates if the tokenizer template should be used.
	// Note: this is mostly consumed for backends such as vllm and transformers
	// that can use the tokenizers specified in the JSON config files of the models
	UseTokenizerTemplate bool `yaml:"use_tokenizer_template,omitempty" json:"use_tokenizer_template,omitempty"`

	// JoinChatMessagesByCharacter is a string that will be used to join chat messages together.
	// It defaults to \n
	JoinChatMessagesByCharacter *string `yaml:"join_chat_messages_by_character,omitempty" json:"join_chat_messages_by_character,omitempty"`

	Multimodal string `yaml:"multimodal,omitempty" json:"multimodal,omitempty"`

	ReplyPrefix string `yaml:"reply_prefix,omitempty" json:"reply_prefix,omitempty"`
}

func (c *ModelConfig) syncKnownUsecasesFromString() {
	c.KnownUsecases = GetUsecasesFromYAML(c.KnownUsecaseStrings)
	// Make sure the usecases are valid, we rewrite with what we identified
	c.KnownUsecaseStrings = []string{}
	for k, usecase := range GetAllModelConfigUsecases() {
		if c.HasUsecases(usecase) {
			c.KnownUsecaseStrings = append(c.KnownUsecaseStrings, k)
		}
	}
}

func (c *ModelConfig) UnmarshalYAML(value *yaml.Node) error {
	type BCAlias ModelConfig
	var aux BCAlias
	if err := value.Decode(&aux); err != nil {
		return err
	}

	mc := ModelConfig(aux)
	*c = mc
	c.syncKnownUsecasesFromString()
	return nil
}

func (c *ModelConfig) SetFunctionCallString(s string) {
	c.functionCallString = s
}

func (c *ModelConfig) SetFunctionCallNameString(s string) {
	c.functionCallNameString = s
}

func (c *ModelConfig) ShouldUseFunctions() bool {
	return ((c.functionCallString != "none" || c.functionCallString == "") || c.ShouldCallSpecificFunction())
}

func (c *ModelConfig) ShouldCallSpecificFunction() bool {
	return len(c.functionCallNameString) > 0
}

// MMProjFileName returns the filename of the MMProj file
// If the MMProj is a URL, it will return the MD5 of the URL which is the filename
func (c *ModelConfig) MMProjFileName() string {
	uri := downloader.URI(c.MMProj)
	if uri.LooksLikeURL() {
		f, _ := uri.FilenameFromUrl()
		return f
	}

	return c.MMProj
}

func (c *ModelConfig) IsMMProjURL() bool {
	uri := downloader.URI(c.MMProj)
	return uri.LooksLikeURL()
}

func (c *ModelConfig) IsModelURL() bool {
	uri := downloader.URI(c.Model)
	return uri.LooksLikeURL()
}

// ModelFileName returns the filename of the model
// If the model is a URL, it will return the MD5 of the URL which is the filename
func (c *ModelConfig) ModelFileName() string {
	uri := downloader.URI(c.Model)
	if uri.LooksLikeURL() {
		f, _ := uri.FilenameFromUrl()
		return f
	}

	return c.Model
}

func (c *ModelConfig) FunctionToCall() string {
	if c.functionCallNameString != "" &&
		c.functionCallNameString != "none" && c.functionCallNameString != "auto" {
		return c.functionCallNameString
	}

	return c.functionCallString
}

func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) {
	lo := &LoadOptions{}
	lo.Apply(opts...)

	ctx := lo.ctxSize
	threads := lo.threads
	f16 := lo.f16
	debug := lo.debug

	// Apply model-family-specific inference defaults before generic fallbacks.
	// This ensures gallery-installed and runtime-loaded models get optimal parameters.
	ApplyInferenceDefaults(cfg, cfg.Name, cfg.Model)

	// https://github.com/ggerganov/llama.cpp/blob/75cd4c77292034ecec587ecb401366f57338f7c0/common/sampling.h#L22
	defaultTopP := 0.95
	defaultTopK := 40
	defaultMinP := 0.0
	defaultTemp := 0.9
	// https://github.com/mudler/LocalAI/issues/2780
	defaultMirostat := 0
	defaultMirostatTAU := 5.0
	defaultMirostatETA := 0.1
	defaultTypicalP := 1.0
	defaultTFZ := 1.0
	defaultZero := 0

	trueV := true
	falseV := false

	if cfg.Seed == nil {
		// random number generator seed
		defaultSeed := RAND_SEED
		cfg.Seed = &defaultSeed
	}

	if cfg.TopK == nil {
		cfg.TopK = &defaultTopK
	}

	if cfg.MinP == nil {
		cfg.MinP = &defaultMinP
	}

	if cfg.TypicalP == nil {
		cfg.TypicalP = &defaultTypicalP
	}

	if cfg.TFZ == nil {
		cfg.TFZ = &defaultTFZ
	}

	if cfg.MMap == nil {
		// MMap is enabled by default

		// Only exception is for Intel GPUs
		if os.Getenv("XPU") != "" {
			cfg.MMap = &falseV
		} else {
			cfg.MMap = &trueV
		}
	}

	if cfg.MMlock == nil {
		// MMlock is disabled by default
		cfg.MMlock = &falseV
	}

	if cfg.TopP == nil {
		cfg.TopP = &defaultTopP
	}
	if cfg.Temperature == nil {
		cfg.Temperature = &defaultTemp
	}

	if cfg.Maxtokens == nil {
		cfg.Maxtokens = &defaultZero
	}

	if cfg.Mirostat == nil {
		cfg.Mirostat = &defaultMirostat
	}

	if cfg.MirostatETA == nil {
		cfg.MirostatETA = &defaultMirostatETA
	}

	if cfg.MirostatTAU == nil {
		cfg.MirostatTAU = &defaultMirostatTAU
	}

	if cfg.LowVRAM == nil {
		cfg.LowVRAM = &falseV
	}

	if cfg.Embeddings == nil {
		cfg.Embeddings = &falseV
	}

	if cfg.Reranking == nil {
		cfg.Reranking = &falseV
	}

	if threads == 0 {
		// Threads can't be 0
		threads = 4
	}

	if cfg.Threads == nil {
		cfg.Threads = &threads
	}

	if cfg.F16 == nil {
		cfg.F16 = &f16
	}

	if cfg.Debug == nil {
		cfg.Debug = &falseV
	}

	if debug {
		cfg.Debug = &trueV
	}

	// If a context size was provided via LoadOptions, apply it before hooks so they
	// don't override it with their own defaults.
	if ctx != 0 && cfg.ContextSize == nil {
		cfg.ContextSize = &ctx
	}
	runBackendHooks(cfg, lo.modelPath)
	cfg.syncKnownUsecasesFromString()
}

func (c *ModelConfig) Validate() (bool, error) {
	downloadedFileNames := []string{}
	for _, f := range c.DownloadFiles {
		downloadedFileNames = append(downloadedFileNames, f.Filename)
	}
	validationTargets := []string{c.Backend, c.Model, c.MMProj}
	validationTargets = append(validationTargets, downloadedFileNames...)
	// Simple validation to make sure the model can be correctly loaded
	for _, n := range validationTargets {
		if n == "" {
			continue
		}
		if strings.HasPrefix(n, string(os.PathSeparator)) ||
			strings.Contains(n, "..") {
			return false, fmt.Errorf("invalid file path: %s", n)
		}
	}

	if c.Backend != "" {
		// a regex that checks that is a string name with no special characters, except '-' and '_'
		re := regexp.MustCompile(`^[a-zA-Z0-9-_]+$`)
		if !re.MatchString(c.Backend) {
			return false, fmt.Errorf("invalid backend name: %s", c.Backend)
		}
	}

	// Validate MCP configuration if present
	if c.MCP.Servers != "" || c.MCP.Stdio != "" {
		if _, _, err := c.MCP.MCPConfigFromYAML(); err != nil {
			return false, fmt.Errorf("invalid MCP configuration: %w", err)
		}
	}

	return true, nil
}

func (c *ModelConfig) HasTemplate() bool {
	return c.TemplateConfig.Completion != "" || c.TemplateConfig.Edit != "" || c.TemplateConfig.Chat != "" || c.TemplateConfig.ChatMessage != "" || c.TemplateConfig.UseTokenizerTemplate
}

func (c *ModelConfig) GetModelConfigFile() string {
	return c.modelConfigFile
}

// GetModelTemplate returns the model's chat template if available
func (c *ModelConfig) GetModelTemplate() string {
	return c.modelTemplate
}

// IsDisabled returns true if the model is disabled
func (c *ModelConfig) IsDisabled() bool {
	return c.Disabled != nil && *c.Disabled
}

// IsPinned returns true if the model is pinned (excluded from idle unloading and eviction)
func (c *ModelConfig) IsPinned() bool {
	return c.Pinned != nil && *c.Pinned
}

type ModelConfigUsecase int

const (
	FLAG_ANY ModelConfigUsecase = 0b000000000000
	FLAG_CHAT ModelConfigUsecase = 0b000000000001
	FLAG_COMPLETION ModelConfigUsecase = 0b000000000010
	FLAG_EDIT ModelConfigUsecase = 0b000000000100
	FLAG_EMBEDDINGS ModelConfigUsecase = 0b000000001000
	FLAG_RERANK ModelConfigUsecase = 0b000000010000
	FLAG_IMAGE ModelConfigUsecase = 0b000000100000
	FLAG_TRANSCRIPT ModelConfigUsecase = 0b000001000000
	FLAG_TTS ModelConfigUsecase = 0b000010000000
	FLAG_SOUND_GENERATION ModelConfigUsecase = 0b000100000000
	FLAG_TOKENIZE ModelConfigUsecase = 0b001000000000
	FLAG_VAD ModelConfigUsecase = 0b010000000000
	FLAG_VIDEO ModelConfigUsecase = 0b100000000000
	FLAG_DETECTION ModelConfigUsecase = 0b1000000000000
	FLAG_FACE_RECOGNITION ModelConfigUsecase = 0b10000000000000

	// Common Subsets
	FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT
)

func GetAllModelConfigUsecases() map[string]ModelConfigUsecase {
	return map[string]ModelConfigUsecase{
		// Note: FLAG_ANY is intentionally excluded from this map
		// because it's 0 and would always match in HasUsecases checks
		"FLAG_CHAT": FLAG_CHAT,
		"FLAG_COMPLETION": FLAG_COMPLETION,
		"FLAG_EDIT": FLAG_EDIT,
		"FLAG_EMBEDDINGS": FLAG_EMBEDDINGS,
		"FLAG_RERANK": FLAG_RERANK,
		"FLAG_IMAGE": FLAG_IMAGE,
		"FLAG_TRANSCRIPT": FLAG_TRANSCRIPT,
		"FLAG_TTS": FLAG_TTS,
		"FLAG_SOUND_GENERATION": FLAG_SOUND_GENERATION,
		"FLAG_TOKENIZE": FLAG_TOKENIZE,
		"FLAG_VAD": FLAG_VAD,
		"FLAG_LLM": FLAG_LLM,
		"FLAG_VIDEO": FLAG_VIDEO,
		"FLAG_DETECTION": FLAG_DETECTION,
		"FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION,
	}
}

func stringToFlag(s string) string {
	return "FLAG_" + strings.ToUpper(s)
}

func GetUsecasesFromYAML(input []string) *ModelConfigUsecase {
	if len(input) == 0 {
		return nil
	}
	result := FLAG_ANY
	flags := GetAllModelConfigUsecases()
	for _, str := range input {
		for _, flag := range []string{stringToFlag(str), str} {
			f, exists := flags[flag]
			if exists {
				result |= f
			}
		}
	}
	return &result
}

// HasUsecases examines a ModelConfig and determines which endpoints have a chance of success.
func (c *ModelConfig) HasUsecases(u ModelConfigUsecase) bool {
	if (c.KnownUsecases != nil) && ((u & *c.KnownUsecases) == u) {
		return true
	}
	return c.GuessUsecases(u)
}

// GuessUsecases is a **heuristic based** function, as the backend in question may not be loaded yet, and the config may not record what it's useful at.
// In its current state, this function should ideally check for properties of the config like templates, rather than the direct backend name checks for the lower half.
// This avoids the maintenance burden of updating this list for each new backend - but unfortunately, that's the best option for some services currently.
func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
	// Backends that are clearly not text-generation
	nonTextGenBackends := []string{
		"whisper", "piper", "kokoro",
		"diffusers", "stablediffusion", "stablediffusion-ggml",
		"rerankers", "silero-vad", "rfdetr", "insightface",
		"transformers-musicgen", "ace-step", "acestep-cpp",
	}

	if (u & FLAG_CHAT) == FLAG_CHAT {
		if c.TemplateConfig.Chat == "" && c.TemplateConfig.ChatMessage == "" && !c.TemplateConfig.UseTokenizerTemplate {
			return false
		}
		if slices.Contains(nonTextGenBackends, c.Backend) {
			return false
		}
		if c.Embeddings != nil && *c.Embeddings {
			return false
		}
	}
	if (u & FLAG_COMPLETION) == FLAG_COMPLETION {
		if c.TemplateConfig.Completion == "" {
			return false
		}
		if slices.Contains(nonTextGenBackends, c.Backend) {
			return false
		}
	}
	if (u & FLAG_EDIT) == FLAG_EDIT {
		if c.TemplateConfig.Edit == "" {
			return false
		}
	}
	if (u & FLAG_EMBEDDINGS) == FLAG_EMBEDDINGS {
		if c.Embeddings == nil || !*c.Embeddings {
			return false
		}
	}
	if (u & FLAG_IMAGE) == FLAG_IMAGE {
		imageBackends := []string{"diffusers", "stablediffusion", "stablediffusion-ggml"}
		if !slices.Contains(imageBackends, c.Backend) {
			return false
		}

		if c.Backend == "diffusers" && c.Diffusers.PipelineType == "" {
			return false
		}

	}
	if (u & FLAG_VIDEO) == FLAG_VIDEO {
		videoBackends := []string{"diffusers", "stablediffusion", "vllm-omni"}
		if !slices.Contains(videoBackends, c.Backend) {
			return false
		}

		if c.Backend == "diffusers" && c.Diffusers.PipelineType == "" {
			return false
		}

	}
	if (u & FLAG_RERANK) == FLAG_RERANK {
		if c.Backend != "rerankers" && (c.Reranking == nil || !*c.Reranking) {
			return false
		}
	}
	if (u & FLAG_TRANSCRIPT) == FLAG_TRANSCRIPT {
		if c.Backend != "whisper" {
			return false
		}
		// whisper models with vad_only option are VAD, not transcription
		if slices.Contains(c.Options, "vad_only") {
			return false
		}
	}
	if (u & FLAG_TTS) == FLAG_TTS {
		ttsBackends := []string{"piper", "transformers-musicgen", "kokoro"}
		if !slices.Contains(ttsBackends, c.Backend) {
			return false
		}
	}

	if (u & FLAG_DETECTION) == FLAG_DETECTION {
		detectionBackends := []string{"rfdetr", "sam3-cpp", "insightface"}
		if !slices.Contains(detectionBackends, c.Backend) {
			return false
		}
	}

	if (u & FLAG_FACE_RECOGNITION) == FLAG_FACE_RECOGNITION {
		faceBackends := []string{"insightface"}
		if !slices.Contains(faceBackends, c.Backend) {
			return false
		}
	}

	if (u & FLAG_SOUND_GENERATION) == FLAG_SOUND_GENERATION {
		soundGenBackends := []string{"transformers-musicgen", "ace-step", "acestep-cpp", "mock-backend"}
		if !slices.Contains(soundGenBackends, c.Backend) {
			return false
		}
	}

	if (u & FLAG_TOKENIZE) == FLAG_TOKENIZE {
		tokenizeCapableBackends := []string{"llama.cpp", "rwkv"}
		if !slices.Contains(tokenizeCapableBackends, c.Backend) {
			return false
		}
	}

	if (u & FLAG_VAD) == FLAG_VAD {
		if c.Backend != "silero-vad" && !(c.Backend == "whisper" && slices.Contains(c.Options, "vad_only")) {
			return false
		}
	}

	return true
}

// BuildCogitoOptions generates cogito options from the model configuration
// It accepts a context, MCP sessions, and optional callback functions for status, reasoning, tool calls, and tool results
func (c *ModelConfig) BuildCogitoOptions() []cogito.Option {
	cogitoOpts := []cogito.Option{
		cogito.WithIterations(3), // default to 3 iterations
		cogito.WithMaxAttempts(3), // default to 3 attempts
		cogito.WithForceReasoning(),
	}

	// Apply agent configuration options
	if c.Agent.EnableReasoning {
		cogitoOpts = append(cogitoOpts, cogito.WithForceReasoning())
	}

	if c.Agent.EnablePlanning {
		cogitoOpts = append(cogitoOpts, cogito.EnableAutoPlan)
	}

	if c.Agent.EnableMCPPrompts {
		cogitoOpts = append(cogitoOpts, cogito.EnableMCPPrompts)
	}

	if c.Agent.EnablePlanReEvaluator {
		cogitoOpts = append(cogitoOpts, cogito.EnableAutoPlanReEvaluator)
	}

	if c.Agent.MaxIterations != 0 {
		cogitoOpts = append(cogitoOpts, cogito.WithIterations(c.Agent.MaxIterations))
	}

	if c.Agent.MaxAttempts != 0 {
		cogitoOpts = append(cogitoOpts, cogito.WithMaxAttempts(c.Agent.MaxAttempts))
	}

	if c.Agent.DisableSinkState {
		cogitoOpts = append(cogitoOpts, cogito.DisableSinkState)
	}

	if c.Agent.LoopDetection != 0 {
		cogitoOpts = append(cogitoOpts, cogito.WithLoopDetection(c.Agent.LoopDetection))
	}

	if c.Agent.MaxAdjustmentAttempts != 0 {
		cogitoOpts = append(cogitoOpts, cogito.WithMaxAdjustmentAttempts(c.Agent.MaxAdjustmentAttempts))
	}

	if c.Agent.ForceReasoningTool {
		cogitoOpts = append(cogitoOpts, cogito.WithForceReasoningTool())
	}

	return cogitoOpts
}