mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-29 11:37:40 -04:00
* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis
Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.
New gRPC RPCs (backend/backend.proto):
* FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
* FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse
Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.
New HTTP endpoints under /v1/face/:
* verify — 1:1 image pair same-person decision
* analyze — per-face age + gender (emotion/race reserved)
* register — 1:N enrollment; stores embedding in vector store
* identify — 1:N recognition; detect → embed → StoresFind
* forget — remove a registered face by opaque ID
Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).
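A minimal sketch of what such a pluggable registry could look like. The method names, signatures, and the toy map-backed implementation below are illustrative assumptions, not the actual core/services/facerecognition API:

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is a hypothetical sketch of the pluggable 1:N face store.
// Method names and types are assumptions for illustration only.
type Registry interface {
	Register(id string, embedding []float32) error
	Identify(embedding []float32, topK int) ([]string, error)
	Forget(id string) error
}

// memRegistry is a toy in-memory implementation standing in for the
// store-backed one described above.
type memRegistry struct {
	faces map[string][]float32
}

func newMemRegistry() *memRegistry {
	return &memRegistry{faces: map[string][]float32{}}
}

func (m *memRegistry) Register(id string, emb []float32) error {
	m.faces[id] = emb
	return nil
}

func (m *memRegistry) Identify(emb []float32, topK int) ([]string, error) {
	// A real implementation would rank candidates by vector distance;
	// the toy version just returns stored IDs up to topK.
	out := []string{}
	for id := range m.faces {
		if len(out) == topK {
			break
		}
		out = append(out, id)
	}
	return out, nil
}

func (m *memRegistry) Forget(id string) error {
	if _, ok := m.faces[id]; !ok {
		return errors.New("unknown face id")
	}
	delete(m.faces, id)
	return nil
}

func main() {
	var r Registry = newMemRegistry()
	_ = r.Register("alice", []float32{0.1, 0.2})
	ids, _ := r.Identify([]float32{0.1, 0.2}, 5)
	fmt.Println(len(ids)) // 1
	fmt.Println(r.Forget("alice") == nil)
}
```

Because the HTTP handlers depend only on the interface, swapping the in-memory store for a pgvector-backed one is a constructor change, as the commit notes.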
New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.
Gallery (backend/index.yaml) ships three entries:
* insightface-buffalo-l — SCRFD-10GF + ArcFace R50 + genderage
(~326MB pre-baked; non-commercial research use only)
* insightface-opencv — YuNet + SFace (~40MB pre-baked; Apache 2.0)
* insightface-buffalo-s — SCRFD-500MF + MBF (runtime download; non-commercial)
Python backend (backend/python/insightface/):
* engines.py — FaceEngine protocol with InsightFaceEngine and
OnnxDirectEngine; resolves model paths relative to the backend
directory so the same gallery config works in docker-scratch and
in the e2e-backends rootfs-extraction harness.
* backend.py — gRPC servicer implementing Health, LoadModel, Status,
Embedding, Detect, FaceVerify, FaceAnalyze.
* install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
backend directory so first-run is offline-clean (the final scratch
image only preserves files under /<backend>/).
* test.py — parametrized unit tests over both engines.
Tests:
* Registry unit tests (go test -race ./core/services/facerecognition/...)
— in-memory fake grpc.Backend, table-driven, covers register/
identify/forget/error paths + concurrent access.
* tests/e2e-backends/backend_test.go extended with face caps
(face_detect, face_embed, face_verify, face_analyze); relative
ordering + configurable verifyCeiling per engine.
* Makefile targets: test-extra-backend-insightface-buffalo-l,
-opencv, and the -all aggregate.
* CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
auto-triggered by changes under backend/python/insightface/.
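The verifyCeiling check described above reduces to a cosine-distance threshold over L2-normalized embeddings. A minimal sketch (the function names are illustrative; the ~0.5 vs ~0.35 figures mirror the suite's comments on SFace vs ArcFace distance distributions):

```go
package main

import "fmt"

// cosineDistance returns 1 - cosine similarity. For L2-normalized
// vectors this is simply 1 - dot(a, b).
func cosineDistance(a, b []float32) float32 {
	var dot float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
	}
	return float32(1 - dot)
}

// samePerson applies an engine-specific ceiling — looser for SFace
// than for ArcFace, since SFace's distances run wider.
func samePerson(a, b []float32, ceiling float32) bool {
	return cosineDistance(a, b) <= ceiling
}

func main() {
	v := []float32{1, 0} // exactly unit-length
	w := []float32{0, 1}
	fmt.Println(cosineDistance(v, v)) // 0: identical vectors
	fmt.Println(samePerson(v, v, 0.6))
	fmt.Println(samePerson(v, w, 0.6)) // orthogonal: distance 1, fails
}
```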
Docs:
* docs/content/features/face-recognition.md — feature page with
license table, quickstart (defaults to the commercial-safe model),
models matrix, API reference, 1:N workflow, storage caveats.
* Cross-refs in object-detection.md, stores.md, embeddings.md, and
whats-new.md.
* Contributor README at backend/python/insightface/README.md.
Verified end-to-end:
* buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
face_verify, face_analyze).
* opencv: 5/5 specs (same minus face_analyze — SFace has no
demographic head; correctly skipped via BACKEND_TEST_CAPS).
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): move engine selection to model gallery, collapse backend entries
The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.
Correct split:
* `backend/index.yaml` now has ONE `insightface` backend entry
shipping the CPU + CUDA 12 container images. The Python backend
bundles both the non-commercial insightface model packs
(buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
weights (YuNet + SFace); the active engine is selected at
LoadModel time via `options: ["engine:..."]`.
* `gallery/index.yaml` gains three model entries —
`insightface-buffalo-l`, `insightface-opencv`,
`insightface-buffalo-s` — each setting the appropriate
`overrides.backend` + `overrides.options` so installing one
actually gives the user the intended engine. This matches how
`rfdetr-base` lives in the model gallery against the `rfdetr`
backend.
The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): cover all supported models in the gallery + drop weight baking
Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.
Gallery now has seven `insightface-*` model entries (gallery/index.yaml):
insightface (family) — non-commercial research use
• buffalo-l (326MB) — SCRFD-10GF + ResNet50 + genderage, default
• buffalo-m (313MB) — SCRFD-2.5GF + ResNet50 + genderage
• buffalo-s (159MB) — SCRFD-500MF + MBF + genderage
• buffalo-sc (16MB) — SCRFD-500MF + MBF, recognition only
(no landmarks, no demographics — analyze
returns empty attributes)
• antelopev2 (407MB) — SCRFD-10GF + ResNet100@Glint360K + genderage
OpenCV Zoo family — Apache 2.0 commercial-safe
• opencv — YuNet + SFace fp32 (~40MB)
• opencv-int8 — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)
Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:
* OpenCV variants list `files:` with ONNX URIs + SHA-256, so
`local-ai models install insightface-opencv` pulls them into the
models directory exactly like any other gallery-managed model.
* insightface packs (upstream distributes .zip archives only, not
individual ONNX files) auto-download on first LoadModel via
FaceAnalysis' built-in machinery, rooted at the LocalAI models
directory so they live alongside everything else — same pattern
`rfdetr` uses with `inference.get_model()`.
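Shape-wise, a gallery model entry using the `files:` mechanism looks roughly like the following sketch — the URI, hash, and exact field layout below are placeholders for illustration, not the real index.yaml content:

```yaml
- name: insightface-opencv
  overrides:
    backend: insightface
    options:
      - "engine:opencv"       # selects the OpenCV engine at LoadModel time
  files:
    - filename: yunet.onnx
      uri: https://example.com/yunet.onnx       # placeholder URI
      sha256: "<sha256-of-the-onnx-file>"       # placeholder hash
```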
Backend changes (backend/python/insightface/):
* backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
LocalAI models directory) to engines via a `_model_dir` hint.
This replaces the earlier ModelFile-dirname approach; ModelPath
is the canonical "models directory" variable set by the Go loader
(pkg/model/initializers.go:144) and is always populated.
* engines.py::_resolve_model_path — picks up `model_dir` and searches
it (plus basename-in-model-dir) before falling back to the dev
script-dir. This is how OnnxDirectEngine finds gallery-downloaded
YuNet/SFace files by filename only.
* engines.py::_flatten_insightface_pack — new helper that works
around an upstream packaging inconsistency: buffalo_l/s/sc zips
expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
files in a redundant `<name>/` directory. insightface's own
loader looks one level too shallow and fails. We call
`ensure_available()` explicitly, flatten if nested, then hand to
FaceAnalysis.
* engines.py::InsightFaceEngine.prepare — root-resolution order now
includes the `_model_dir` hint so packs download into the LocalAI
models directory by default.
* install.sh — no longer pre-downloads any weights. Everything is
gallery-managed now.
* smoke.py (new) — parametrized smoke test that iterates over every
gallery configuration, simulating the LocalAI install flow
(creates a models dir, fetches OpenCV files with checksum
verification, lets insightface auto-download its packs), then
runs detect + embed + verify (+ analyze where supported) through
the in-process BackendServicer.
* test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
paths; downloads ONNX files to a temp dir at setUpClass time and
passes ModelPath accordingly.
Registry change (core/services/facerecognition/store_registry.go):
* `dim=0` in NewStoreRegistry now means "accept whatever dimension
arrives" — needed because the backend supports 512-d ArcFace/MBF
and 128-d SFace via the same Registry. A non-zero dim still fails
fast with ErrDimensionMismatch.
* core/application plumbs `faceEmbeddingDim = 0`, explaining the
rationale in the comment.
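The dim=0 wildcard amounts to a guard like this sketch (names are illustrative stand-ins, not the actual store_registry.go code):

```go
package main

import (
	"errors"
	"fmt"
)

// errDimensionMismatch stands in for the package's ErrDimensionMismatch.
var errDimensionMismatch = errors.New("embedding dimension mismatch")

// checkDim enforces a fixed embedding dimension when want > 0;
// want == 0 accepts whatever arrives (512-d ArcFace/MBF or 128-d SFace).
func checkDim(want, got int) error {
	if want != 0 && want != got {
		return fmt.Errorf("%w: want %d, got %d", errDimensionMismatch, want, got)
	}
	return nil
}

func main() {
	fmt.Println(checkDim(0, 512) == nil)   // true: wildcard accepts any
	fmt.Println(checkDim(0, 128) == nil)   // true
	fmt.Println(checkDim(512, 128) == nil) // false: fails fast
}
```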
Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.
Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:
PASS: insightface-buffalo-l faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-sc faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-s faces=6 dim=512 same-dist=0.000
PASS: insightface-buffalo-m faces=6 dim=512 same-dist=0.000
PASS: insightface-antelopev2 faces=6 dim=512 same-dist=0.000
PASS: insightface-opencv faces=6 dim=128 same-dist=0.000
PASS: insightface-opencv-int8 faces=6 dim=128 same-dist=0.000
7/7 passed
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim
CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:
LoadModel failed: Failed to load face engine:
OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx
Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.
Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).
Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs
The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.
Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).
Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).
Response:
{
"embedding": [<dim> floats, L2-normed],
"dim": int, // 512 for ArcFace R50 / MBF, 128 for SFace
"model": "<name>"
}
Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Docs updated to note /v1/embeddings is text-only
and to point image users at /v1/face/embed instead.
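The sum(x^2) = 1 check from the live test is just a norm computation over the returned embedding; a sketch (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// isL2Normalized reports whether the squared magnitudes sum to 1
// within a small tolerance, as expected of face embeddings.
func isL2Normalized(v []float32, eps float64) bool {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	return math.Abs(sum-1) <= eps
}

func main() {
	fmt.Println(isL2Normalized([]float32{0.6, 0.8}, 1e-6)) // true: 0.36+0.64
	fmt.Println(isL2Normalized([]float32{1, 1}, 1e-6))     // false: norm 2
}
```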
Assisted-by: Claude:claude-opus-4-7
* fix(http): map malformed image input + gRPC status codes to proper 4xx
Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.
Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:
* decodeImageInput(s)
Wraps utils.GetContentURIAsBase64 and turns any failure
(invalid URL, not a data-URI, download error, etc.) into
echo.NewHTTPError(400, "invalid image input: ...").
* mapBackendError(err)
Inspects the gRPC status on a backend call error and maps:
INVALID_ARGUMENT → 400 Bad Request
NOT_FOUND → 404 Not Found
FAILED_PRECONDITION → 412 Precondition Failed
UNIMPLEMENTED → 501 Not Implemented
All other codes fall through unchanged (still 500).
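Stripped of the grpc/status plumbing, the mapping is a simple code-to-status table. This stand-alone version is illustrative only — the real mapBackendError inspects the gRPC status on the error rather than a string:

```go
package main

import (
	"fmt"
	"net/http"
)

// httpStatusFor maps a gRPC status code name to the HTTP status used
// by the single-image endpoints; anything unlisted stays a 500.
func httpStatusFor(grpcCode string) int {
	switch grpcCode {
	case "INVALID_ARGUMENT":
		return http.StatusBadRequest // 400
	case "NOT_FOUND":
		return http.StatusNotFound // 404
	case "FAILED_PRECONDITION":
		return http.StatusPreconditionFailed // 412
	case "UNIMPLEMENTED":
		return http.StatusNotImplemented // 501
	default:
		return http.StatusInternalServerError // 500
	}
}

func main() {
	fmt.Println(httpStatusFor("INVALID_ARGUMENT"))  // 400
	fmt.Println(httpStatusFor("DEADLINE_EXCEEDED")) // 500
}
```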
Before, my 1×1 PNG error-path test returned:
HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
HTTP 400 "failed to decode one or both images"
Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.
Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.
Assisted-by: Claude:claude-opus-4-7
* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis
Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.
Two changes:
1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
`files:` list with the pack zip's URI + SHA-256. `local-ai models
install insightface-buffalo-l` now fetches the zip, verifies the
hash, and extracts it into the models directory. No more reliance
on insightface's library-internal `ensure_available()` auto-download
or its hardcoded `BASE_REPO_URL`.
2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
the FaceAnalysis wrapper and drives insightface's `model_zoo`
directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
route each through `model_zoo.get_model()`, build a
`{taskname: model}` dict, loop per-face at inference — are
reimplemented in `InsightFaceEngine`. The actual inference classes
(RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
insightface's — we only replicate the glue, so drift risk against
upstream is minimal.
Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
layout that doesn't match what LocalAI's zip extraction produces.
LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
`<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
`_locate_insightface_pack` helper searches both locations plus
legacy paths and returns whichever has ONNX files. Replaces the
earlier `_flatten_insightface_pack` helper (which tried to fight
FaceAnalysis's layout expectations; now we just find the files
wherever they are).
Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.
Assisted-by: Claude:claude-opus-4-7
* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched
The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".
Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.
Face analyze cap dropped since buffalo_sc has no age/gender head.
Assisted-by: Claude:claude-opus-4-7
* feat(face-recognition): surface face-recognition in advertised feature maps
The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:
* api_instructions — the machine-readable capability index at
GET /api/instructions. Added `face-recognition` as a dedicated
instruction area with an intro that calls out the in-memory
registry caveat and the /v1/face/embed vs /v1/embeddings split.
* auth/permissions — added FeatureFaceRecognition constant, routed
all six face endpoints through it so admins can gate them per-user
like any other API feature. Default ON (matches the other API
features).
* React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
follow-up (noted in the plan).
Instruction count bumped 9 → 10; test updated.
Assisted-by: Claude:claude-opus-4-7
* docs(agents): capture advertising-surface steps in the endpoint guide
Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.
Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.
Assisted-by: Claude:claude-opus-4-7
773 lines
27 KiB
Go
package e2ebackends_test

import (
	"context"
	"encoding/base64"
	"fmt"
	"io"
	"net"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"

	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"github.com/phayes/freeport"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// Environment variables consumed by the suite.
//
// Required:
//
//	BACKEND_IMAGE              Docker image tag to test (e.g. local-ai-backend:llama-cpp).
//
// Required model source (one of):
//
//	BACKEND_TEST_MODEL_URL     HTTP(S) URL of a model file to download before the test.
//	BACKEND_TEST_MODEL_FILE    Path to an already-available model file (skips download).
//	BACKEND_TEST_MODEL_NAME    HuggingFace model id (e.g. "Qwen/Qwen2.5-0.5B-Instruct").
//	                           Passed verbatim as ModelOptions.Model; backends like vllm
//	                           resolve it themselves and no local file is downloaded.
//
// Optional:
//
//	BACKEND_TEST_MMPROJ_URL    HTTP(S) URL of an mmproj file (audio/vision encoder)
//	                           to download alongside the main model — required for
//	                           multimodal models like Qwen3-ASR-0.6B-GGUF.
//	BACKEND_TEST_MMPROJ_FILE   Path to an already-available mmproj file.
//	BACKEND_TEST_AUDIO_URL     HTTP(S) URL of a sample audio file used by the
//	                           transcription specs.
//	BACKEND_TEST_AUDIO_FILE    Path to an already-available sample audio file.
//	BACKEND_TEST_CAPS          Comma-separated list of capabilities to exercise.
//	                           Supported values: health, load, predict, stream,
//	                           embeddings, tools, transcription, image, face_detect,
//	                           face_embed, face_verify, face_analyze.
//	                           Defaults to "health,load,predict,stream".
//	                           A backend that only does embeddings would set this to
//	                           "health,load,embeddings"; an image-generation backend
//	                           that cannot be driven by a text prompt can set it to
//	                           "health,load,image".
//	                           "tools" asks the backend to extract a tool call from the
//	                           model output into ChatDelta.tool_calls.
//	                           "image" exercises the GenerateImage RPC and asserts a
//	                           non-empty file is written to the requested dst path.
//	BACKEND_TEST_IMAGE_PROMPT  Override the positive prompt for the image spec
//	                           (default: "a photograph of an astronaut riding a horse").
//	BACKEND_TEST_IMAGE_STEPS   Override the diffusion step count for the image spec
//	                           (default: 4 — keeps CPU-only runs under a few minutes).
//	BACKEND_TEST_PROMPT        Override the prompt used by predict/stream specs.
//	BACKEND_TEST_CTX_SIZE      Override the context size passed to LoadModel (default 512).
//	BACKEND_TEST_THREADS       Override Threads passed to LoadModel (default 4).
//	BACKEND_TEST_OPTIONS       Comma-separated Options[] entries passed to LoadModel,
//	                           e.g. "tool_parser:hermes,reasoning_parser:qwen3".
//	BACKEND_TEST_CACHE_TYPE_K  Sets ModelOptions.CacheTypeKey (llama.cpp -ctk),
//	                           e.g. "q8_0" — exercises KV-cache quantization code paths.
//	BACKEND_TEST_CACHE_TYPE_V  Sets ModelOptions.CacheTypeValue (llama.cpp -ctv).
//	BACKEND_TEST_TOOL_PROMPT   Override the user prompt for the tools spec
//	                           (default: "What's the weather like in Paris, France?").
//	BACKEND_TEST_TOOL_NAME     Override the function name expected in the tool call
//	                           (default: "get_weather").
//	BACKEND_TEST_FACE_IMAGE_1  Override the first same-person face fixture.
//	BACKEND_TEST_FACE_IMAGE_2  Override the second same-person face fixture.
//	BACKEND_TEST_FACE_IMAGE_3  Override the different-person face fixture.
//	BACKEND_TEST_VERIFY_DISTANCE_CEILING
//	                           Upper-bound cosine distance for a same-person pair
//	                           (default 0.6); SFace's distance distribution is wider
//	                           than ArcFace's.
//
// The suite is intentionally model-format-agnostic: it only ever passes the
// file path to LoadModel, so GGUF, ONNX, safetensors, .bin etc. all work so
// long as the backend under test accepts that format.
const (
	capHealth        = "health"
	capLoad          = "load"
	capPredict       = "predict"
	capStream        = "stream"
	capEmbeddings    = "embeddings"
	capTools         = "tools"
	capTranscription = "transcription"
	capImage         = "image"
	capFaceDetect    = "face_detect"
	capFaceEmbed     = "face_embed"
	capFaceVerify    = "face_verify"
	capFaceAnalyze   = "face_analyze"

	defaultPrompt      = "The capital of France is"
	streamPrompt       = "Once upon a time"
	defaultToolPrompt  = "What's the weather like in Paris, France?"
	defaultToolName    = "get_weather"
	defaultImagePrompt = "a photograph of an astronaut riding a horse"
	defaultImageSteps  = 4

	// Upper bound for a same-person pair: SFace distances typically run
	// closer to 0.5, ArcFace closer to 0.35.
	defaultVerifyDistanceCeil = float32(0.6)
)

func defaultCaps() map[string]bool {
	return map[string]bool{
		capHealth:  true,
		capLoad:    true,
		capPredict: true,
		capStream:  true,
	}
}

// parseCaps reads BACKEND_TEST_CAPS and returns the enabled capability set.
// An empty/unset value falls back to defaultCaps().
func parseCaps() map[string]bool {
	raw := strings.TrimSpace(os.Getenv("BACKEND_TEST_CAPS"))
	if raw == "" {
		return defaultCaps()
	}
	caps := map[string]bool{}
	for _, part := range strings.Split(raw, ",") {
		part = strings.TrimSpace(strings.ToLower(part))
		if part != "" {
			caps[part] = true
		}
	}
	return caps
}

var _ = Describe("Backend container", Ordered, func() {
	var (
		caps       map[string]bool
		workDir    string
		binaryDir  string
		modelFile  string // set when a local file is used
		modelName  string // set when a HuggingFace model id is used
		mmprojFile string // optional multimodal projector
		audioFile  string // optional audio fixture for transcription specs

		// Face fixtures: two photos of the same person + one different person.
		faceFile1 string
		faceFile2 string
		faceFile3 string

		// verifyCeiling is the upper-bound cosine distance for a
		// same-person pair; each model configuration can override it via
		// BACKEND_TEST_VERIFY_DISTANCE_CEILING because SFace's distance
		// distribution is wider than ArcFace's.
		verifyCeiling float32

		addr      string
		serverCmd *exec.Cmd
		conn      *grpc.ClientConn
		client    pb.BackendClient
		prompt    string
		options   []string
	)

	BeforeAll(func() {
		image := os.Getenv("BACKEND_IMAGE")
		Expect(image).NotTo(BeEmpty(), "BACKEND_IMAGE env var must be set (e.g. local-ai-backend:llama-cpp)")

		modelURL := os.Getenv("BACKEND_TEST_MODEL_URL")
		modelFile = os.Getenv("BACKEND_TEST_MODEL_FILE")
		modelName = os.Getenv("BACKEND_TEST_MODEL_NAME")
		Expect(modelURL != "" || modelFile != "" || modelName != "").To(BeTrue(),
			"one of BACKEND_TEST_MODEL_URL, BACKEND_TEST_MODEL_FILE, or BACKEND_TEST_MODEL_NAME must be set")

		caps = parseCaps()
		GinkgoWriter.Printf("Testing image=%q with capabilities=%v\n", image, keys(caps))

		prompt = os.Getenv("BACKEND_TEST_PROMPT")
		if prompt == "" {
			prompt = defaultPrompt
		}

		if raw := strings.TrimSpace(os.Getenv("BACKEND_TEST_OPTIONS")); raw != "" {
			for _, opt := range strings.Split(raw, ",") {
				opt = strings.TrimSpace(opt)
				if opt != "" {
					options = append(options, opt)
				}
			}
		}

		var err error
		workDir, err = os.MkdirTemp("", "backend-e2e-*")
		Expect(err).NotTo(HaveOccurred())

		// Extract the image filesystem so we can run run.sh directly.
		binaryDir = filepath.Join(workDir, "rootfs")
		Expect(os.MkdirAll(binaryDir, 0o755)).To(Succeed())
		extractImage(image, binaryDir)
		Expect(filepath.Join(binaryDir, "run.sh")).To(BeAnExistingFile())

		// Download the model once if not provided and no HF name given.
		if modelFile == "" && modelName == "" {
			modelFile = filepath.Join(workDir, "model.bin")
			downloadFile(modelURL, modelFile)
		}

		// Multimodal projector (mmproj): required by audio/vision-capable
		// llama.cpp models like Qwen3-ASR-0.6B-GGUF. Either file or URL.
		mmprojFile = os.Getenv("BACKEND_TEST_MMPROJ_FILE")
		if mmprojFile == "" {
			if url := os.Getenv("BACKEND_TEST_MMPROJ_URL"); url != "" {
				mmprojFile = filepath.Join(workDir, "mmproj.bin")
				downloadFile(url, mmprojFile)
			}
		}

		// Audio fixture for the transcription specs.
		audioFile = os.Getenv("BACKEND_TEST_AUDIO_FILE")
		if audioFile == "" {
			if url := os.Getenv("BACKEND_TEST_AUDIO_URL"); url != "" {
				audioFile = filepath.Join(workDir, "sample.wav")
				downloadFile(url, audioFile)
			}
		}

		// Face fixtures for the face-recognition specs.
		faceFile1 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_1", "face_a_1.jpg")
		faceFile2 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_2", "face_a_2.jpg")
		faceFile3 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_3", "face_b.jpg")
		verifyCeiling = envFloat32("BACKEND_TEST_VERIFY_DISTANCE_CEILING", defaultVerifyDistanceCeil)

		// Pick a free port and launch the backend.
		port, err := freeport.GetFreePort()
		Expect(err).NotTo(HaveOccurred())
		addr = fmt.Sprintf("127.0.0.1:%d", port)

		Expect(os.Chmod(filepath.Join(binaryDir, "run.sh"), 0o755)).To(Succeed())
		// Mark any other top-level files executable (extraction may strip perms).
		entries, _ := os.ReadDir(binaryDir)
		for _, e := range entries {
			if !e.IsDir() && !strings.HasSuffix(e.Name(), ".sh") {
				_ = os.Chmod(filepath.Join(binaryDir, e.Name()), 0o755)
			}
		}

		serverCmd = exec.Command(filepath.Join(binaryDir, "run.sh"), "--addr="+addr)
		serverCmd.Stdout = GinkgoWriter
		serverCmd.Stderr = GinkgoWriter
		Expect(serverCmd.Start()).To(Succeed())

		// Wait for the gRPC port to accept connections.
		Eventually(func() error {
			c, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
			if err != nil {
				return err
			}
			_ = c.Close()
			return nil
		}, 30*time.Second, 200*time.Millisecond).Should(Succeed(), "backend did not start")

		conn, err = grpc.Dial(addr,
			grpc.WithTransportCredentials(insecure.NewCredentials()),
			grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(50*1024*1024)),
		)
		Expect(err).NotTo(HaveOccurred())
		client = pb.NewBackendClient(conn)
	})

	AfterAll(func() {
		if conn != nil {
			_ = conn.Close()
		}
		if serverCmd != nil && serverCmd.Process != nil {
			_ = serverCmd.Process.Kill()
			_, _ = serverCmd.Process.Wait()
		}
		if workDir != "" {
			_ = os.RemoveAll(workDir)
		}
	})

	It("responds to Health", func() {
		if !caps[capHealth] {
			Skip("health capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		res, err := client.Health(ctx, &pb.HealthMessage{})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty())
	})

	It("loads the model", func() {
		if !caps[capLoad] {
			Skip("load capability not enabled")
		}
		ctxSize := envInt32("BACKEND_TEST_CTX_SIZE", 512)
		threads := envInt32("BACKEND_TEST_THREADS", 4)

		// Prefer a HuggingFace model id when provided (e.g. for vllm);
		// otherwise fall back to a downloaded/local file path.
		modelRef := modelFile
		var modelPath string
		if modelName != "" {
			modelRef = modelName
		} else {
			modelPath = modelFile
		}

		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
		defer cancel()
		res, err := client.LoadModel(ctx, &pb.ModelOptions{
			Model:          modelRef,
			ModelFile:      modelPath,
			ContextSize:    ctxSize,
			Threads:        threads,
			NGPULayers:     0,
			MMap:           true,
			NBatch:         128,
			Options:        options,
			MMProj:         mmprojFile,
			CacheTypeKey:   os.Getenv("BACKEND_TEST_CACHE_TYPE_K"),
			CacheTypeValue: os.Getenv("BACKEND_TEST_CACHE_TYPE_V"),
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetSuccess()).To(BeTrue(), "LoadModel failed: %s", res.GetMessage())
	})

	It("generates output via Predict", func() {
		if !caps[capPredict] {
			Skip("predict capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Prompt:      prompt,
			Tokens:      20,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty(), "Predict produced empty output")
		GinkgoWriter.Printf("Predict: %q (tokens=%d, prompt_tokens=%d)\n",
			res.GetMessage(), res.GetTokens(), res.GetPromptTokens())
	})

	It("streams output via PredictStream", func() {
		if !caps[capStream] {
			Skip("stream capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		stream, err := client.PredictStream(ctx, &pb.PredictOptions{
			Prompt:      streamPrompt,
			Tokens:      20,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
		})
		Expect(err).NotTo(HaveOccurred())

		var chunks int
		var combined string
		for {
			msg, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if len(msg.GetMessage()) > 0 {
				chunks++
				combined += string(msg.GetMessage())
			}
		}
		Expect(chunks).To(BeNumerically(">", 0), "no stream chunks received")
		GinkgoWriter.Printf("Stream: %d chunks, combined=%q\n", chunks, combined)
	})

	It("computes embeddings via Embedding", func() {
		if !caps[capEmbeddings] {
			Skip("embeddings capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.Embedding(ctx, &pb.PredictOptions{
			Embeddings: prompt,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetEmbeddings()).NotTo(BeEmpty(), "Embedding returned empty vector")
		GinkgoWriter.Printf("Embedding: %d dims\n", len(res.GetEmbeddings()))
	})

	It("generates an image via GenerateImage", func() {
		if !caps[capImage] {
			Skip("image capability not enabled")
		}

		imgPrompt := os.Getenv("BACKEND_TEST_IMAGE_PROMPT")
		if imgPrompt == "" {
			imgPrompt = defaultImagePrompt
		}
		steps := envInt32("BACKEND_TEST_IMAGE_STEPS", defaultImageSteps)

		dst := filepath.Join(workDir, "generated.png")

		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
		defer cancel()
		res, err := client.GenerateImage(ctx, &pb.GenerateImageRequest{
			PositivePrompt: imgPrompt,
			NegativePrompt: "",
			Width:          512,
			Height:         512,
			Step:           steps,
			Seed:           42,
			Dst:            dst,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetSuccess()).To(BeTrue(), "GenerateImage failed: %s", res.GetMessage())

		info, err := os.Stat(dst)
		Expect(err).NotTo(HaveOccurred(), "GenerateImage did not write a file at %s", dst)
		Expect(info.Size()).To(BeNumerically(">", int64(0)),
			"GenerateImage wrote an empty file at %s", dst)
		GinkgoWriter.Printf("GenerateImage: wrote %s (%d bytes)\n", dst, info.Size())
	})

	It("extracts tool calls into ChatDelta", func() {
		if !caps[capTools] {
			Skip("tools capability not enabled")
		}

		toolPrompt := os.Getenv("BACKEND_TEST_TOOL_PROMPT")
		if toolPrompt == "" {
			toolPrompt = defaultToolPrompt
		}
		toolName := os.Getenv("BACKEND_TEST_TOOL_NAME")
		if toolName == "" {
			toolName = defaultToolName
		}

		toolsJSON := fmt.Sprintf(`[{
			"type": "function",
			"function": {
				"name": %q,
				"description": "Get the current weather for a location",
				"parameters": {
					"type": "object",
					"properties": {
						"location": {
							"type": "string",
							"description": "The city and state, e.g. San Francisco, CA"
						}
					},
					"required": ["location"]
				}
			}
		}]`, toolName)

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Messages: []*pb.Message{
				{Role: "system", Content: "You are a helpful assistant. Use the provided tool when the user asks about weather."},
				{Role: "user", Content: toolPrompt},
			},
			Tools:                toolsJSON,
			ToolChoice:           "auto",
			UseTokenizerTemplate: true,
			Tokens:               200,
			Temperature:          0.1,
		})
		Expect(err).NotTo(HaveOccurred())

		// Collect tool calls from every delta — some backends emit a single
		// final delta, others stream incremental pieces in one Reply.
		var toolCalls []*pb.ToolCallDelta
		for _, delta := range res.GetChatDeltas() {
			toolCalls = append(toolCalls, delta.GetToolCalls()...)
		}

		GinkgoWriter.Printf("Tool call: raw=%q deltas=%d tool_calls=%d\n",
			string(res.GetMessage()), len(res.GetChatDeltas()), len(toolCalls))

		Expect(toolCalls).NotTo(BeEmpty(),
			"Predict did not return any ToolCallDelta. raw=%q", string(res.GetMessage()))

		matched := false
		for _, tc := range toolCalls {
			GinkgoWriter.Printf("  - idx=%d id=%q name=%q args=%q\n",
				tc.GetIndex(), tc.GetId(), tc.GetName(), tc.GetArguments())
			if tc.GetName() == toolName {
				matched = true
			}
		}
		Expect(matched).To(BeTrue(),
			"Expected a tool call named %q in ChatDelta.tool_calls", toolName)
	})

	It("transcribes audio via AudioTranscription", func() {
		if !caps[capTranscription] {
			Skip("transcription capability not enabled")
		}
		Expect(audioFile).NotTo(BeEmpty(),
			"BACKEND_TEST_AUDIO_FILE or BACKEND_TEST_AUDIO_URL must be set when transcription cap is enabled")

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		res, err := client.AudioTranscription(ctx, &pb.TranscriptRequest{
			Dst:         audioFile,
			Threads:     uint32(envInt32("BACKEND_TEST_THREADS", 4)),
			Temperature: 0.0,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(strings.TrimSpace(res.GetText())).NotTo(BeEmpty(),
			"AudioTranscription returned empty text")
		GinkgoWriter.Printf("AudioTranscription: text=%q language=%q duration=%v\n",
			res.GetText(), res.GetLanguage(), res.GetDuration())
	})

	It("streams audio transcription via AudioTranscriptionStream", func() {
		if !caps[capTranscription] {
			Skip("transcription capability not enabled")
		}
		Expect(audioFile).NotTo(BeEmpty(),
			"BACKEND_TEST_AUDIO_FILE or BACKEND_TEST_AUDIO_URL must be set when transcription cap is enabled")

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		stream, err := client.AudioTranscriptionStream(ctx, &pb.TranscriptRequest{
			Dst:         audioFile,
			Threads:     uint32(envInt32("BACKEND_TEST_THREADS", 4)),
			Temperature: 0.0,
			Stream:      true,
		})
		Expect(err).NotTo(HaveOccurred())

		var deltas []string
		var assembled strings.Builder
		var finalText string
		for {
			chunk, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if d := chunk.GetDelta(); d != "" {
				deltas = append(deltas, d)
				assembled.WriteString(d)
			}
			if final := chunk.GetFinalResult(); final != nil && final.GetText() != "" {
				finalText = final.GetText()
			}
		}
		// The stream must emit at least one delta; the assembled text and any
		// final transcript are included in the failure message for diagnosis.
		Expect(deltas).NotTo(BeEmpty(),
			"AudioTranscriptionStream did not emit any deltas (assembled=%q final=%q)",
			assembled.String(), finalText)

		// If a final event also arrived, it should match the assembled deltas.
		if finalText != "" && assembled.Len() > 0 {
			Expect(finalText).To(Equal(assembled.String()),
				"final transcript should match concatenated deltas")
		}
		GinkgoWriter.Printf("AudioTranscriptionStream: deltas=%d assembled=%q final=%q\n",
			len(deltas), assembled.String(), finalText)
	})

	// ─── face recognition specs ─────────────────────────────────────────

	It("detects faces via Detect", func() {
		if !caps[capFaceDetect] {
			Skip("face_detect capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		res, err := client.Detect(ctx, &pb.DetectOptions{Src: base64File(faceFile1)})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetDetections()).NotTo(BeEmpty(), "Detect returned no faces")
		for _, d := range res.GetDetections() {
			Expect(d.GetClassName()).To(Equal("face"))
			Expect(d.GetWidth()).To(BeNumerically(">", 0))
			Expect(d.GetHeight()).To(BeNumerically(">", 0))
		}
		GinkgoWriter.Printf("face_detect: %d faces\n", len(res.GetDetections()))
	})

	It("produces face embeddings via Embedding", func() {
		if !caps[capFaceEmbed] {
			Skip("face_embed capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.Embedding(ctx, &pb.PredictOptions{Images: []string{base64File(faceFile1)}})
		Expect(err).NotTo(HaveOccurred())
		vec := res.GetEmbeddings()
		Expect(vec).NotTo(BeEmpty(), "Embedding returned empty vector")
		// Face embeddings are L2-normalized — expect unit norm.
		var sumSq float64
		for _, v := range vec {
			sumSq += float64(v) * float64(v)
		}
		Expect(sumSq).To(BeNumerically("~", 1.0, 0.05),
			"face embedding should be L2-normed (sum(x^2)=%.3f, dim=%d)", sumSq, len(vec))
		GinkgoWriter.Printf("face_embed: dim=%d\n", len(vec))
	})

	It("verifies faces via FaceVerify", func() {
		if !caps[capFaceVerify] {
			Skip("face_verify capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()

		// Same image twice — expect verified=true with a very small distance.
		b1 := base64File(faceFile1)
		same, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b1, Threshold: verifyCeiling})
		Expect(err).NotTo(HaveOccurred())
		Expect(same.GetVerified()).To(BeTrue(), "same image should verify: dist=%.3f", same.GetDistance())
		Expect(same.GetDistance()).To(BeNumerically("<", 0.1))
		GinkgoWriter.Printf("face_verify(same): dist=%.3f confidence=%.1f\n", same.GetDistance(), same.GetConfidence())

		// Different images — assert relative ordering when the detector
		// actually finds a face in both images. Some fixtures (masked
		// faces, profile shots, etc.) are legitimately borderline for
		// SCRFD's default threshold, so we don't fail the suite when the
		// second image gets a NotFound — we just log and skip the
		// cross-person check. The same-image assertion above is the
		// definitive proof the RPC works end-to-end.
		if faceFile3 != "" {
			b3 := base64File(faceFile3)
			diff, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b3, Threshold: verifyCeiling})
			if err != nil {
				GinkgoWriter.Printf("face_verify(diff): skipped — %v\n", err)
			} else {
				Expect(diff.GetDistance()).To(BeNumerically(">", same.GetDistance()),
					"cross-person distance %.3f should exceed same-image distance %.3f", diff.GetDistance(), same.GetDistance())
				GinkgoWriter.Printf("face_verify(diff): dist=%.3f verified=%v\n", diff.GetDistance(), diff.GetVerified())
			}
		}

		// If two photos of the same person were provided, the ordering
		// should also hold: d(a1,a2) < ceiling. Best-effort as above —
		// skip if the detector doesn't find a face in the second image.
		if faceFile2 != "" {
			b2 := base64File(faceFile2)
			sp, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b2, Threshold: verifyCeiling})
			if err != nil {
				GinkgoWriter.Printf("face_verify(same-person): skipped — %v\n", err)
			} else {
				Expect(sp.GetDistance()).To(BeNumerically("<", verifyCeiling),
					"same-person (different photos) distance %.3f exceeds ceiling %.3f", sp.GetDistance(), verifyCeiling)
				GinkgoWriter.Printf("face_verify(same-person): dist=%.3f verified=%v\n", sp.GetDistance(), sp.GetVerified())
			}
		}
	})

	It("analyzes faces via FaceAnalyze", func() {
		if !caps[capFaceAnalyze] {
			Skip("face_analyze capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{Img: base64File(faceFile1)})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetFaces()).NotTo(BeEmpty(), "FaceAnalyze returned no faces")
		for _, f := range res.GetFaces() {
			Expect(f.GetFaceConfidence()).To(BeNumerically(">", 0))
			Expect(f.GetAge()).To(BeNumerically(">", 0), "age should be populated by analyze-capable engines")
			Expect(f.GetDominantGender()).To(BeElementOf("Man", "Woman"))
		}
		GinkgoWriter.Printf("face_analyze: %d faces\n", len(res.GetFaces()))
	})
})

// extractImage runs `docker create` + `docker export` to materialise the image
// rootfs into dest. Using export (not save) avoids dealing with layer tarballs.
func extractImage(image, dest string) {
	GinkgoHelper()
	// The backend images have no default ENTRYPOINT/CMD, so docker create fails
	// unless we override one; run.sh is harmless and guaranteed to exist.
	create := exec.Command("docker", "create", "--entrypoint=/run.sh", image)
	out, err := create.CombinedOutput()
	Expect(err).NotTo(HaveOccurred(), "docker create failed: %s", string(out))
	cid := strings.TrimSpace(string(out))
	DeferCleanup(func() {
		_ = exec.Command("docker", "rm", "-f", cid).Run()
	})

	// Pipe `docker export <cid>` into `tar -xf - -C dest`.
	exp := exec.Command("docker", "export", cid)
	expOut, err := exp.StdoutPipe()
	Expect(err).NotTo(HaveOccurred())
	exp.Stderr = GinkgoWriter
	Expect(exp.Start()).To(Succeed())

	tar := exec.Command("tar", "-xf", "-", "-C", dest)
	tar.Stdin = expOut
	tar.Stderr = GinkgoWriter
	Expect(tar.Run()).To(Succeed())
	Expect(exp.Wait()).To(Succeed())
}

// downloadFile fetches url into dest using curl -L. Used for CI convenience;
// local runs can use BACKEND_TEST_MODEL_FILE to skip downloading.
func downloadFile(url, dest string) {
	GinkgoHelper()
	cmd := exec.Command("curl", "-sSfL", "-o", dest, url)
	cmd.Stdout = GinkgoWriter
	cmd.Stderr = GinkgoWriter
	Expect(cmd.Run()).To(Succeed(), "failed to download %s", url)
	fi, err := os.Stat(dest)
	Expect(err).NotTo(HaveOccurred())
	Expect(fi.Size()).To(BeNumerically(">", 1024), "downloaded file is suspiciously small")
}

func envInt32(name string, def int32) int32 {
	raw := os.Getenv(name)
	if raw == "" {
		return def
	}
	var v int32
	if _, err := fmt.Sscanf(raw, "%d", &v); err != nil {
		return def
	}
	return v
}

func envFloat32(name string, def float32) float32 {
	raw := os.Getenv(name)
	if raw == "" {
		return def
	}
	var v float32
	if _, err := fmt.Sscanf(raw, "%f", &v); err != nil {
		return def
	}
	return v
}

// resolveFaceFixture returns the local path of a face-fixture image,
// preferring BACKEND_TEST_<prefix>_FILE when set and otherwise
// downloading BACKEND_TEST_<prefix>_URL into workDir. Returns an empty
// string when neither is configured — specs that need it should skip.
func resolveFaceFixture(workDir, prefix, defaultName string) string {
	if path := os.Getenv(prefix + "_FILE"); path != "" {
		return path
	}
	url := os.Getenv(prefix + "_URL")
	if url == "" {
		return ""
	}
	dest := filepath.Join(workDir, defaultName)
	downloadFile(url, dest)
	return dest
}

// base64File reads a file and returns its base64 encoding.
func base64File(path string) string {
	GinkgoHelper()
	data, err := os.ReadFile(path)
	Expect(err).NotTo(HaveOccurred(), "reading %s", path)
	return base64.StdEncoding.EncodeToString(data)
}

// keys returns the names of all enabled capabilities.
func keys(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k, v := range m {
		if v {
			out = append(out, k)
		}
	}
	return out
}