mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-29 02:46:37 -04:00
Merge origin/master + pin-sync paged backend to 0ed235ea
master auto-bumped the stock llama-cpp pin 9d5d882d -> 0ed235ea and updated the shared grpc-server.cpp. The paged backend's pin must track the stock pin (the grpc-server.cpp is shared), so bump its LLAMA_VERSION to match.

All 28 paged patches apply clean on 0ed235ea (verified against a fresh upstream clone). The bf16-tau state-serialization fix (patch 0026) is included. Bit-exact gate + full grpc-server build verify on GPU/CI to follow.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@@ -67,6 +67,7 @@ The frontend is a standard LocalAI instance with distributed mode enabled. These
| `--registration-require-auth` | `LOCALAI_REGISTRATION_REQUIRE_AUTH` | `false` | Fail startup when distributed mode is enabled but the registration token is empty (node endpoints and worker file-transfer would otherwise be unauthenticated) |
| `--distributed-require-auth` | `LOCALAI_DISTRIBUTED_REQUIRE_AUTH` | `false` | **Umbrella switch.** Implies both `--nats-require-auth` and `--registration-require-auth` - one knob to lock down the NATS bus *and* the registration/file-transfer layer. Set this in production instead of the two granular flags. |
| `--auto-approve-nodes` | `LOCALAI_AUTO_APPROVE_NODES` | `false` | Auto-approve new worker nodes (skip admin approval) |
| `--distributed-shared-models` | `LOCALAI_DISTRIBUTED_SHARED_MODELS` | `false` | Assert that every node mounts the **same** models directory at the **same** path (a shared volume). When `true`, the router skips file staging entirely and workers load models directly from the shared path instead of re-downloading them. See [Shared models directory](#shared-models-directory). |
| `--auth` | `LOCALAI_AUTH` | `false` | **Must be `true`** for distributed mode |
| `--auth-database-url` | `LOCALAI_AUTH_DATABASE_URL` | *(required)* | PostgreSQL connection URL |
| `--backend-install-timeout` | `LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT` | `15m` | How long the frontend waits for a worker to acknowledge a backend install before considering the request stalled. Raise it when workers pull large backend images over slow links. If a worker takes longer than this, the operation shows as "still installing in background" in the admin UI and clears once the worker finishes. |
@@ -133,6 +134,14 @@ When S3 is not configured, model files are transferred directly from the fronten

For high-throughput or very large model files, S3 can be more efficient since it avoids streaming through the frontend.

### Shared models directory

If every node (frontend and workers) mounts the **same** models directory at the **same** path - for example a shared volume or network filesystem, as shown in the "Shared Volume Mode" section of `docker-compose.distributed.yaml` - the model files are already present on each worker at their canonical path. In that case staging is wasted work: it copies files that already exist into a per-model subdirectory the worker then loads from, which shows up as a re-download of a model you already have.

Set `LOCALAI_DISTRIBUTED_SHARED_MODELS=true` (or `--distributed-shared-models`) on the frontend to skip staging entirely. The router then leaves the model's absolute paths untouched and the worker loads them directly from the shared volume.

This flag is a contract you assert: all nodes must mount identical paths. Leave it off (the default) when workers have independent models directories - the frontend stages files to them over HTTP (or S3) as described above.

{{% notice warning %}}
The worker HTTP file transfer server is authenticated by `LOCALAI_REGISTRATION_TOKEN`. If the token is **empty**, the server **fails open** - anyone who can reach the port gets read/write access to the worker's models/staging/data directories (a remote model-poisoning / exfiltration vector). The worker logs a loud warning at startup in this case. Always set `LOCALAI_REGISTRATION_TOKEN` in distributed mode, and set `LOCALAI_DISTRIBUTED_REQUIRE_AUTH=true` (frontend **and** workers) to make a missing token *or* missing NATS credentials a hard startup error rather than a silent fail-open. Firewall the file-transfer port (gRPC base − 1) so only the frontend can reach it.
{{% /notice %}}

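As a rough illustration of the settings discussed above, the following sketch exports the three environment variables and derives the file-transfer port from a gRPC base port. The token value and the base port `50051` are placeholders chosen for this example, not values taken from the documentation.

```bash
# Frontend and workers: turn a missing token or missing NATS credentials
# into a hard startup error.
export LOCALAI_DISTRIBUTED_REQUIRE_AUTH=true

# Placeholder secret for illustration only; generate your own.
export LOCALAI_REGISTRATION_TOKEN="change-me"

# Frontend only, and only when every node mounts the same models path.
export LOCALAI_DISTRIBUTED_SHARED_MODELS=true

# The file-transfer port is the gRPC base port minus one.
GRPC_BASE_PORT=50051   # assumed example value
FILE_TRANSFER_PORT=$((GRPC_BASE_PORT - 1))
echo "firewall this port for everyone but the frontend: $FILE_TRANSFER_PORT"
```

Leave `LOCALAI_DISTRIBUTED_SHARED_MODELS` unset when workers keep independent models directories.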
@@ -7,16 +7,93 @@ url = "/features/face-recognition/"

LocalAI supports face recognition through the `insightface` backend:
face verification (1:1), face identification (1:N) against a built-in
vector store, face embedding, face detection, demographic analysis
(age / gender), and antispoofing / liveness detection.
LocalAI supports face recognition: face verification (1:1), face
identification (1:N) against a built-in vector store, face embedding,
face detection, demographic analysis (age / gender), and antispoofing /
liveness detection.

The backend ships **two interchangeable engines** under one image, each
paired with a distinct gallery entry so users can pick by license and
accuracy needs.
The same `/v1/face/*` HTTP API is served by two backends:

## Licensing - read this first
- **`face-detect` (recommended, default).** A standalone C++/ggml
  engine ([face-detect.cpp](https://github.com/mudler/face-detect.cpp)):
  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
  single self-describing GGUF. This is the recommended option for new
  deployments.
- **`insightface` (Python).** The original ONNX Runtime backend. Still
  supported; see [the Python backend](#insightface-python-backend) below.

Both backends expose the identical wire format, so the API examples on
this page work with either - only the gallery entry name (the `model`
field) changes.

## face-detect (ggml) backend

The `face-detect` backend reads the detector and recognizer architecture
(`facedetect.arch`) directly from the GGUF metadata, so installing a
gallery entry is all that is needed to select an engine. It drives the
Embeddings / Detect / FaceVerify / FaceAnalyze gRPC rpcs behind the
`/v1/face/{embed,verify,analyze,detect,register,identify,forget}`
endpoints.

### Licensing - read this first

| Gallery entry | Detector + recognizer | Embedding dim | License |
|---|---|---|---|
| `face-detect-buffalo-l` | SCRFD-10GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** (upstream insightface weights) |
| `face-detect-buffalo-m` | SCRFD-2.5GF + ArcFace R50 + GenderAge | 512 | **Non-commercial research only** |
| `face-detect-buffalo-s` | SCRFD-500MF + MBF + GenderAge | 512 | **Non-commercial research only** |
| `face-detect-yunet-sface` | YuNet + SFace (OpenCV Zoo) | 128 | **Apache 2.0 - commercial-safe** |

The insightface buffalo packs (buffalo_l / buffalo_m / buffalo_s) are
released by the upstream maintainers for **non-commercial research use
only**. Pick the `face-detect-yunet-sface` entry for production /
commercial deployments.

### Quickstart

Install the commercial-safe entry (recommended for copy-paste):

```bash
local-ai models install face-detect-yunet-sface
```

Verify that two images depict the same person:

```bash
curl -sX POST http://localhost:8080/v1/face/verify \
  -H "Content-Type: application/json" \
  -d '{
    "model": "face-detect-yunet-sface",
    "img1": "https://example.com/alice_1.jpg",
    "img2": "https://example.com/alice_2.jpg"
  }'
```

Detect faces and analyze demographics (buffalo entries populate
age / gender; YuNet + SFace returns regions only):

```bash
curl -sX POST http://localhost:8080/v1/face/detect \
  -H "Content-Type: application/json" \
  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/group.jpg"}'

curl -sX POST http://localhost:8080/v1/face/analyze \
  -H "Content-Type: application/json" \
  -d '{"model": "face-detect-buffalo-l", "img": "https://example.com/alice.jpg"}'
```

The 1:N register / identify / forget workflow and the rest of the API
are identical to the [API reference](#api-reference) below - just pass a
`face-detect-*` model name. The per-engine verify thresholds are ~0.35
for the buffalo ArcFace/MBF recognizers and ~0.363 for SFace.

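To make the per-engine thresholds concrete, here is a minimal sketch of how a client script might apply them. The helper names `face_threshold` and `is_same_person` are hypothetical, and the sketch assumes the score being compared is a distance (smaller means more similar); check the API reference for the actual response fields and comparison direction before relying on it.

```bash
# Hypothetical helper: map a gallery entry to the approximate verify
# threshold quoted above.
face_threshold() {
  case "$1" in
    face-detect-buffalo-*)   echo "0.35"  ;;
    face-detect-yunet-sface) echo "0.363" ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Assumption: the score is a distance, so a value below the threshold
# counts as a match. Returns 0 for a match, 1 otherwise, 2 on unknown model.
is_same_person() {
  threshold="$(face_threshold "$1")" || return 2
  awk -v d="$2" -v t="$threshold" 'BEGIN { exit !(d < t) }'
}

if is_same_person face-detect-yunet-sface 0.21; then
  echo "same person"
else
  echo "different person"
fi
```

Keeping the thresholds in one lookup avoids reusing the buffalo value with SFace, whose threshold differs slightly.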
## insightface (Python) backend

The `insightface` backend ships **two interchangeable engines** under
one image, each paired with a distinct gallery entry so users can pick
by license and accuracy needs.

### Licensing - read this first

| Gallery entry | Detector + recognizer | Size | License |
|---|---|---|---|
@@ -7,16 +7,92 @@ url = "/features/voice-recognition/"

LocalAI supports voice (speaker) recognition through the
`speaker-recognition` backend: speaker verification (1:1), speaker
identification (1:N) against a built-in vector store, speaker
embedding, and demographic analysis (age / gender / emotion from
voice).
LocalAI supports voice (speaker) recognition: speaker verification
(1:1), speaker identification (1:N) against a built-in vector store,
speaker embedding, and demographic analysis (age / gender / emotion
from voice).

The audio analog to [Face Recognition](/features/face-recognition/),
following the same two-engine pattern under one image.
served over the same `/v1/voice/*` HTTP API by two backends:

## Engines
- **`voice-detect` (recommended, default).** A standalone C++/ggml
  engine ([voice-detect.cpp](https://github.com/mudler/voice-detect.cpp)):
  no Python, no onnxruntime, no torch runtime. Each gallery entry is a
  single self-describing GGUF. This is the recommended option for new
  deployments.
- **`speaker-recognition` (Python).** The original SpeechBrain / ONNX
  backend. Still supported; see [the Python backend](#speaker-recognition-python-backend)
  below.

Both backends expose the identical wire format, so the API examples on
this page work with either - only the gallery entry name (the `model`
field) changes.

## voice-detect (ggml) backend

The `voice-detect` backend reads the embedding (or analysis)
architecture (`voicedetect.arch`) directly from the GGUF metadata, so
installing a gallery entry is all that is needed to select an engine. It
drives the VoiceEmbed / VoiceVerify / VoiceAnalyze gRPC rpcs behind the
`/v1/voice/{embed,verify,analyze,register,identify,forget}` endpoints.

### Gallery entries

| Gallery entry | Model | Embedding dim | License |
|---|---|---|---|
| `voice-detect-ecapa-tdnn` | SpeechBrain ECAPA-TDNN (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
| `voice-detect-wespeaker-resnet34` | WeSpeaker ResNet34 (VoxCeleb) | 256 | CC-BY-4.0 |
| `voice-detect-eres2net` | 3D-Speaker ERes2Net (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
| `voice-detect-campplus` | 3D-Speaker CAM++ (VoxCeleb) | 192 | **Apache 2.0 - commercial-safe** |
| `voice-detect-emotion-wav2vec2` | audEERING wav2vec2 (age / gender / emotion) | analyze head | **CC-BY-NC-SA-4.0 - non-commercial** |

The four speaker-recognition entries drive verify / embed / identify.
`voice-detect-emotion-wav2vec2` is the analysis head behind
`/v1/voice/analyze` (continuous age estimate plus gender and emotion
class scores) and is **non-commercial / research use only**.

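For deployments where licensing matters, one way to avoid installing a restricted entry by accident is to filter the gallery list by license first. The sketch below is illustrative only: the entry names and licenses are copied from the table above, the helper names are hypothetical, and the install commands are printed as a dry run instead of being executed.

```bash
# Gallery entries and licenses copied from the table above (name|license).
voice_entries() {
  cat <<'EOF'
voice-detect-ecapa-tdnn|Apache-2.0
voice-detect-wespeaker-resnet34|CC-BY-4.0
voice-detect-eres2net|Apache-2.0
voice-detect-campplus|Apache-2.0
voice-detect-emotion-wav2vec2|CC-BY-NC-SA-4.0
EOF
}

# Print only the entries the table labels commercial-safe (Apache 2.0).
commercial_safe_entries() {
  voice_entries | awk -F'|' '$2 == "Apache-2.0" { print $1 }'
}

# Dry run: show the install commands without running them.
commercial_safe_entries | while read -r entry; do
  echo "local-ai models install $entry"
done
```

The WeSpeaker entry is excluded here only because the table does not label it commercial-safe; review its CC-BY-4.0 attribution terms yourself before adding it to the list.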
### Quickstart

Install the default entry (recommended for copy-paste):

```bash
local-ai models install voice-detect-ecapa-tdnn
```

Verify that two audio clips were spoken by the same person:

```bash
curl -sX POST http://localhost:8080/v1/voice/verify \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voice-detect-ecapa-tdnn",
    "audio1": "https://example.com/alice_1.wav",
    "audio2": "https://example.com/alice_2.wav"
  }'
```

Analyze age / gender / emotion (install the analyze entry first):

```bash
local-ai models install voice-detect-emotion-wav2vec2

curl -sX POST http://localhost:8080/v1/voice/analyze \
  -H "Content-Type: application/json" \
  -d '{"model": "voice-detect-emotion-wav2vec2", "audio": "https://example.com/alice.wav"}'
```

The 1:N register / identify / forget workflow and the rest of the API
are identical to the [API reference](#api-reference) below - just pass a
`voice-detect-*` model name. The default verify threshold is ~0.25 for
the ECAPA-TDNN / ERes2Net / CAM++ recognizers and ~0.30 for WeSpeaker
ResNet34.

## speaker-recognition (Python) backend

The `speaker-recognition` backend follows the same two-engine pattern
under one image.

### Engines

| Gallery entry | Model | Size | License |
|---|---|---|---|
@@ -97,6 +97,8 @@ All backends listed here can be installed on demand from the [Backend Gallery]({
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [sam3.cpp](https://github.com/PABannier/sam3.cpp) | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [face-detect.cpp](https://github.com/mudler/face-detect.cpp) | Native face detection, recognition, embedding, demographics and anti-spoofing (SCRFD/ArcFace, YuNet/SFace) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [voice-detect.cpp](https://github.com/mudler/voice-detect.cpp) | Native speaker (voice) recognition and voice analysis (ECAPA-TDNN, WeSpeaker, ERes2Net, CAM++, wav2vec2) in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [insightface](https://github.com/deepinsight/insightface) | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
| [speaker-recognition](https://speechbrain.github.io/) | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |