mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-17 13:10:23 -04:00
* test(e2e-backends): allow BACKEND_BINARY for native-built backends
Adds an escape hatch for hardware-gated backends (e.g. ds4) where the
model is too large for Docker build context. When BACKEND_BINARY points
at a run.sh produced by 'make -C backend/cpp/<name> package', the suite
skips docker image extraction and drives the binary directly.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(e2e-backends): validate BACKEND_BINARY basename + log actual source
Two follow-ups from the cbcf5148 code review:
- BACKEND_BINARY now requires a path whose basename is `run.sh`. Without
this check, `filepath.Dir(binary)` silently discarded the filename, so
pointing the env var at an arbitrary binary failed later with a
confusing assertion that named a path the user never typed.
- The "Testing image=..." debug line printed an empty string when the
binary path was used, hiding the actual source in CI logs. The line
now reports whichever of BACKEND_IMAGE / BACKEND_BINARY is in effect
as `src=...`.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): scaffold ds4 backend dir
Adds prepare.sh, run.sh, and a .gitignore. CMakeLists, Makefile, and the
implementation arrive in follow-up commits.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): add backend Makefile
Drives ds4's upstream Makefile to produce engine .o files (CUDA on Linux
when BUILD_TYPE=cublas, Metal on Darwin, otherwise CPU debug path), then
invokes CMake on our wrapper.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): add CMakeLists for grpc-server
Generates protoc stubs from backend.proto, links grpc-server.cpp +
dsml_parser.cpp + dsml_renderer.cpp + kv_cache.cpp against pre-built
ds4 engine .o files. DS4_GPU=cuda|metal|cpu selects the backend.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): grpc-server skeleton + module stubs
The minimum that links: Backend service with Health + Free; other RPCs
default to UNIMPLEMENTED. Stub headers/sources for dsml_parser,
dsml_renderer, and kv_cache are in place so CMake links cleanly even
before those modules ship.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): implement LoadModel
Opens engine + creates session sized to ContextSize (default 32768).
Backend is compile-time: CPU when DS4_NO_GPU, Metal on __APPLE__, else
CUDA. MTP/speculative options are accepted via ModelOptions.Options[]
(mtp_path, mtp_draft, mtp_margin). kv_cache_dir option is captured into
g_kv_cache_dir for the cache module (Task 19 wires it in).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): implement TokenizeString
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): implement Predict (plain text)
Tool calls + thinking-mode split arrive in Task 13 once dsml_parser is in.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): implement PredictStream (plain text)
ChatDelta + reasoning/tool_calls split arrives in Task 14.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): implement Status RPC
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): add DSML streaming parser
Classifies raw model-emitted token text into CONTENT / REASONING /
TOOL_START / TOOL_ARGS / TOOL_END events. Markers it watches for are the
literal DSML strings rendered by ds4_server.c's prompt template
(<|DSML|tool_calls>, <|DSML|invoke name=...>, <think>, etc.) - these are
plain text the model emits, not special tokens.
Partial markers split across token chunks are buffered until a full marker
or a definitively-not-a-marker '<' is observed. RandomToolId() generates
the API-side tool call id (call_xxx) that exact-replay would key on.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): split hex escapes in DSML markers + add cstring/cstdio includes
C++ \x hex escapes have no length cap. '\x9cD' was read as a single escape
producing byte 0xCD, eating the 'D'. The markers were never actually matching
the DSML text the model emits. Split each escape with adjacent string literal
concatenation so the byte sequence is exactly EF BD 9C 44 (|D) at runtime.
Also adds <cstring> and <cstdio> includes (libstdc++ 13 does not transitively
expose std::strlen / std::snprintf via <string>).
The local plan file (uncommitted) was also updated with the same fixes so
Task 16's dsml_renderer.cpp does not re-introduce the bug.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): wire DsmlParser into Predict (ChatDelta)
Non-streaming Predict now emits one ChatDelta carrying content,
reasoning_content, and tool_calls[] parsed from the model's DSML output.
Reply.message still carries the raw model bytes for backends that prefer
the regex fallback path.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): wire DsmlParser into PredictStream
Per-token ChatDelta writes: content/reasoning_content go incrementally,
tool_calls emit TOOL_START as one delta (id + name) followed by
TOOL_ARGS deltas with incremental JSON. The Go-side aggregator
(pkg/functions/chat_deltas.go) reassembles them.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): chat template + reasoning_effort mapping
UseTokenizerTemplate=true + Messages -> ds4_chat_begin / append /
assistant_prefix. PredictOptions.Metadata['enable_thinking'] and
['reasoning_effort'] map to ds4_think_mode (DS4_THINK_HIGH default;
'max'/'xhigh' -> DS4_THINK_MAX; disabled -> DS4_THINK_NONE).
Tool-call rendering for assistant turns with tool_calls JSON arrives in
the next commit (dsml_renderer).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): render assistant tool_calls + tool results to DSML
Closes the round-trip: when an OpenAI client sends a multi-turn chat
where prior turns contain tool_calls or role=tool messages, build_prompt
serializes them back to the DSML shape the model was trained on. Mirrors
ds4_server.c's prompt renderer; uses nlohmann::json for parsing the
OpenAI tool_calls payload.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): disk KV cache module
Dir-based cache keyed by SHA1(rendered prompt prefix). File format:
'DS4G' magic + version + ctx_size + prefix_len + prefix + payload_bytes
+ ds4_session_save_payload output. NOT bit-compatible with ds4-server's
KVC files - that interop is a follow-up plan. LoadLongestPrefix walks
the dir picking the longest stored prefix that prefixes the incoming
prompt.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): wire KvCache into Predict/PredictStream
LoadModel reads 'kv_cache_dir' from ModelOptions.Options[], passes it to
g_kv_cache.SetDir. Each Predict/PredictStream computes a render text for
the request, tries LoadLongestPrefix to recover state, then Saves the
new state after generation. ds4_session_sync handles the live-cache
fast path internally, so the disk cache only matters for cold-starts
and cross-session reuse.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): add package.sh
Linux: bundles libc + ld + libstdc++ + libgomp + GPU runtime libs into
package/lib so the FROM scratch image boots without a host libc.
Darwin is handled by scripts/build/ds4-darwin.sh which uses otool -L.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): rename namespace ds4_backend -> ds4cpp
ds4.h defines 'typedef enum {...} ds4_backend' which collides with our
C++ 'namespace ds4_backend' anywhere a TU includes both. kv_cache.h
includes ds4.h directly and surfaces the conflict immediately; other
TUs would hit it once gRPC dev headers are available.
Renames the C++ namespace to ds4cpp across all wrapper files and the
plan, leaving the upstream ds4 typedef untouched.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend): add Dockerfile.ds4
Single-stage builder (CUDA devel image for cublas, ubuntu:24.04 for cpu)
-> FROM scratch with packaged grpc-server + bundled runtime libs.
nlohmann-json3-dev is required for dsml_renderer's JSON handling.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(make): wire backend/cpp/ds4 + ds4-darwin into root Makefile
BACKEND_DS4 entry + generate-docker-build-target eval + docker-build-ds4
in docker-build-backends + .NOTPARALLEL guards. Also adds the
backends/ds4-darwin target which delegates to scripts/build/ds4-darwin.sh
(landed in Task 24).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: add backend-matrix entries for ds4 (cpu + cuda13, per-arch)
Two entries per build (amd64 + arm64) so backend-merge-jobs assembles a
multi-arch manifest. Skipping cuda12 - ds4 was validated against CUDA 13.
Darwin Metal is handled outside this matrix by backend_build_darwin.yml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/index): add ds4 meta + image entries
cpu + cuda13 x latest + master. Darwin Metal builds publish under
ds4-darwin via the existing llama-cpp-darwin OCI pipeline.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(scripts/build): add ds4-darwin.sh
Native macOS/Metal build for the ds4 backend. Mirrors llama-cpp-darwin.sh:
make grpc-server -> otool -L for dylib bundling -> OCI tar that
'local-ai backends install' consumes via the backends/ds4-darwin
Makefile target.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(darwin): build ds4-darwin in backend_build_darwin
Adds a 'Build ds4 backend (Darwin Metal)' step that runs the
backends/ds4-darwin Makefile target on the macOS runner.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(import): auto-detect ds4 weights via DS4Importer
Adds core/gallery/importers/ds4.go which matches on the antirez/deepseek-v4-gguf
repo URI and the DeepSeek-V4-Flash-*.gguf filename pattern. Registered before
LlamaCPPImporter so ds4 weights route to backend: ds4 instead of falling
through to llama-cpp.
Also lists ds4 in /backends/known so the /import-model UI surfaces it as a
manual choice for users who want to force the backend on a non-canonical URI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add deepseek-v4-flash-q2 (ds4 backend)
One-click install of the q2 weights with backend: ds4.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(.agents): add ds4-backend.md
Documents the backend shape, DSML state machine, thinking-mode mapping,
disk KV cache, build matrix (cpu/cuda13/Darwin), and the BACKEND_BINARY
hardware-validation path.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): pass UBUNTU_VERSION + arch env vars to install-base-deps
The .docker/install-base-deps.sh script needs UBUNTU_VERSION (defaults to
2404), TARGETARCH, SKIP_DRIVERS, and APT_MIRROR/APT_PORTS_MIRROR exported
into the environment so it can pick the right cuda-keyring / cudss / nvpl
debs and apt mirrors. Dockerfile.ds4 was declaring some of the ARGs but not
re-exporting them via ENV. Mirrors Dockerfile.llama-cpp's pattern.
Without this fix 'make docker-build-ds4 BUILD_TYPE=cublas CUDA_MAJOR_VERSION=13'
failed at:
/usr/local/sbin/install-base-deps: line 120: UBUNTU_VERSION: unbound variable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/index): add Metal image entries for ds4
Adds metal-ds4 + metal-ds4-development image entries pointing at
quay.io/go-skynet/local-ai-backends:{latest,master}-metal-darwin-arm64-ds4
(built by scripts/build/ds4-darwin.sh on macOS arm64 runners), plus the
'metal' and 'metal-darwin-arm64' capability mappings on the ds4 meta and
ds4-development variant.
Closes a gap from the initial Task 23 landing - the Darwin Metal build
script and CI workflow step were already wired (Tasks 24-25), but the
gallery had no image entry for users to install the Metal variant.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ci): use ubuntu:24.04 base for ds4 cuda13 matrix entries
The initial Task 22 matrix landing used base-image: 'nvidia/cuda:13.0.0-devel-ubuntu24.04'
which clashes with install-base-deps.sh's cuda-keyring step:
E: Conflicting values set for option Signed-By regarding source
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/sbsa/
The canonical pattern (llama-cpp, ik-llama-cpp, turboquant) uses plain
'ubuntu:24.04' + 'skip-drivers: false' so install-base-deps installs CUDA
from scratch via its own keyring setup. Adopting that here.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): drop install-base-deps.sh dependency
The .docker/install-base-deps.sh pipeline is built around the llama-cpp
needs: NVIDIA keyring + cuda-toolkit apt + gRPC-from-source build at
/opt/grpc. For ds4 we don't need any of that:
- CUDA: nvidia/cuda:13.0.0-devel-ubuntu24.04 ships /usr/local/cuda
ready to go; install-base-deps's keyring step then conflicts with
the pre-installed Signed-By.
- gRPC: ds4's grpc-server.cpp only links against grpc++; system
libgrpc++-dev (apt) is sufficient, no source build needed.
Replaced the install-base-deps invocation in Dockerfile.ds4 with a
direct 'apt-get install libgrpc++-dev libprotobuf-dev protobuf-compiler-grpc
nlohmann-json3-dev cmake build-essential pkg-config git'. Matrix entries
back to nvidia/cuda base + skip-drivers=true so install-base-deps would
no-op even if some downstream tooling calls it.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): correct proto accessors + alias grpc::Status as GStatus
Two compile bugs caught by the docker build:
1. proto::Message uses snake_case accessors. The build_prompt loop called
m.toolcalls() / m.toolcallid() - the protoc-generated names are
m.tool_calls() / m.tool_call_id(). The bug originated in the plan text
and propagated into the wrapper.
2. The Status RPC method shadowed the 'using grpc::Status' alias, so any
later method declaration using Status as a return type failed to parse
('Status does not name a type' starting at LoadModel). Solution: alias
grpc::Status as GStatus instead, with no 'using' clause that would
conflict. All RPC method declarations and return-statement constructions
now use GStatus.
A code reviewer had already flagged the Status-shadow concern as 'minor'
on the original Task 10 commit; it turned out to be a real compile
blocker under libstdc++ 13 once the surrounding methods were filled in.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): preserve TOOL_ARGS content in dsml_parser Flush
When the model emitted a parameter value that arrived in the same buffer
as the surrounding tool_call markers (e.g. the buffered tail after a
literal '</think>' opened the model output), the parser deferred all
buffered bytes to Flush() because looks_like_prefix() always returns
true while buf starts with '<'. Flush() then drained the buffer as
plain CONTENT/REASONING regardless of parser state, so the bytes
between the parameter open and close markers were classified as
CONTENT instead of TOOL_ARGS.
Symptom: the model emitted
<|DSML|parameter name="location" string="true">Paris, France</|DSML|parameter>
and the assembled tool_call arguments came out as {"location":""} -
the opener and closer were emitted into the args stream but the
"Paris, France" content went to the assistant message instead.
Fix:
1. Flush() now uses the same state-aware emit logic as DrainPlain:
PARAM_VALUE bytes become TOOL_ARGS (json-escaped when string),
THINK bytes become REASONING, TEXT bytes become CONTENT, and
INVOKE / TOOL_CALLS structural whitespace is discarded.
2. looks_like_prefix() restricts its leading-'<' fallback to buffers
that have not yet seen a '>'. Without that change, char-by-char
feeds would discard the '<' of '<|DSML|invoke name="..."' once
the marker prefix length was reached but the closing quote/'>'
were still in flight.
Verified with a standalone harness that runs the failing input three
ways (single Feed, split-after-'>', and char-by-char) and aggregates
TOOL_ARGS for tool index 0: all three now produce
{"location":"Paris, France"}.
Assisted-by: Claude:opus-4.7 [Read,Edit,Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(backend/cpp/ds4): use ds4_session_sync + manual generation loop for KV persistence
ds4_engine_generate_argmax() is a self-contained helper that doesn't take or
update a ds4_session - it manages its own internal state. Our Predict and
PredictStream methods created g_session via ds4_session_create() but then
called ds4_engine_generate_argmax(), so g_session's KV state never advanced.
ds4_session_payload_bytes(g_session) returned 0 and the disk KV cache save
correctly rejected with 'session has no valid checkpoint to save'.
Switch both RPCs to the proper session API:
ds4_session_sync(g_session, &prompt, ...)
loop:
int token = ds4_session_argmax(g_session)
if token == eos: break
emit(token)
ds4_session_eval(g_session, token, ...)
After the loop the session has a real checkpoint and ds4_session_save_payload
writes the KV state to disk. Verified end-to-end on a DGX Spark GB10: three
.kv files (15-30 MB each) are written when BACKEND_TEST_OPTIONS sets
kv_cache_dir, and the e2e tool-call assertion still passes.
Also added stderr diagnostics to KvCache (enabled/disabled at SetDir; per-save
path + payload_bytes + result) so future failures are visible instead of
silent. The 'wrote ok' lines are low-volume - one per Predict/PredictStream
when the cache is enabled - and skipped entirely when the option is unset.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): use ds4_session_eval_speculative_argmax when MTP loaded
Wires MTP (Multi-Token Prediction) speculative decoding into the manual
generation loop in both Predict and PredictStream. When the upstream MTP
weights are loaded via 'mtp_path:' option AND we're on CUDA / Metal,
ds4_engine_mtp_draft_tokens() returns >0 and we switch the inner loop to
ds4_session_eval_speculative_argmax(), which can accept N>1 tokens per
verifier step. When MTP is not loaded (no option, CPU backend, or weights
absent), we fall through to the simple ds4_session_argmax + ds4_session_eval
path with no behavior change.
Validated on a DGX Spark GB10 with the optional MTP GGUF
(DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf, ~3.6 GB). LoadModel logs
'ds4: MTP support model loaded ... (draft=2)' on stderr.
Caveat per upstream README: 'currently provides at most a slight speedup,
not a meaningful generation-speed win'. Wired now mainly to track the
upstream API; bigger speedups arrive when ds4 improves the speculative path.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend/cpp/ds4): honor PredictOptions sampling with DSML-aware override
Mirrors ds4_server.c:7102-7115 sampling-policy semantics on the LocalAI
gRPC side. The generation loop now consults compute_sample_params() per
token to pick the effective (temperature, top_k, top_p, min_p), based on:
1. Request defaults: PredictOptions.temperature / .topk / .topp / .minp
2. Thinking-mode override: when enable_thinking != false, force T=1.0,
top_k=0, top_p=1.0, min_p=0.0 (creativity for the reasoning pass and
the trailing content)
3. DSML structural override: when DsmlParser::IsInDsmlStructural()
returns true (we are between tool-call markers but NOT in a param
value payload), force T=0.0 so protocol bytes parse cleanly
When the effective temperature is 0, we keep using ds4_session_argmax +
MTP speculative path (matches ds4-server's gate that only enables MTP for
greedy positions). When > 0, we call ds4_session_sample(s, T, ...) with
a per-thread RNG seeded from system_clock and fall back to single-token
ds4_session_eval.
New public method on DsmlParser: IsInDsmlStructural() encodes which states
need protocol-byte determinism. PARAM_VALUE is excluded (payload uses user
sampling); TEXT and THINK are excluded (no tool-call context to protect).
Verified on the DGX Spark GB10: the e2e suite still passes with all 5
specs including tools, and the Predict output now varies between runs
(creative sampling active) while the tool-call args remain a clean
'{"location":"Paris, France"}' because the parser-state check forces
greedy on the structural bytes.
UX note: thinking mode is ON by default (matching ds4-server). Users who
want deterministic output should set Metadata.enable_thinking = false.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add sha256 to deepseek-v4-flash-q2 entry
Per HF LFS metadata for antirez/deepseek-v4-gguf:
size: 86720111200 bytes (~80.76 GiB)
sha256: 31598c67c8b8744d3bcebcd19aa62253c6dc43cef3b8adf9f593656c9e86fd8c
LocalAI's downloader verifies sha256 when present, so users who install
deepseek-v4-flash-q2 from the gallery get integrity-checked weights and
the partial-download issue (an 81 GB file is easy to truncate) becomes
recoverable instead of silently producing a broken backend.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
1255 lines · 47 KiB · Go
package e2ebackends_test

import (
	"context"
	"encoding/base64"
	"fmt"
	"io"
	"net"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"

	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"github.com/phayes/freeport"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// Environment variables consumed by the suite.
//
// Required (one of):
//
//	BACKEND_IMAGE              Docker image tag to test (e.g. local-ai-backend:llama-cpp).
//
// Required model source (one of):
//
//	BACKEND_TEST_MODEL_URL     HTTP(S) URL of a model file to download before the test.
//	BACKEND_TEST_MODEL_FILE    Path to an already-available model file (skips download).
//	BACKEND_TEST_MODEL_NAME    HuggingFace model id (e.g. "Qwen/Qwen2.5-0.5B-Instruct").
//	                           Passed verbatim as ModelOptions.Model; backends like vllm
//	                           resolve it themselves and no local file is downloaded.
//
// Optional:
//
//	BACKEND_TEST_MMPROJ_URL    HTTP(S) URL of an mmproj file (audio/vision encoder)
//	                           to download alongside the main model — required for
//	                           multimodal models like Qwen3-ASR-0.6B-GGUF.
//	BACKEND_TEST_MMPROJ_FILE   Path to an already-available mmproj file.
//	BACKEND_TEST_EXTRA_FILES   Pipe-separated list of companion files to download
//	                           next to the main model. Each entry is "<url>" or
//	                           "<url>#<local-name>" (the optional suffix renames
//	                           the file on disk — useful for sherpa-onnx models
//	                           whose loader expects specific names like
//	                           encoder.int8.onnx).
//	BACKEND_TEST_AUDIO_URL     HTTP(S) URL of a sample audio file used by the
//	                           transcription specs.
//	BACKEND_TEST_AUDIO_FILE    Path to an already-available sample audio file.
//	BACKEND_TEST_CAPS          Comma-separated list of capabilities to exercise.
//	                           Supported values: health, load, predict, stream,
//	                           embeddings, tools, transcription, image.
//	                           Defaults to "health,load,predict,stream".
//	                           A backend that only does embeddings would set this to
//	                           "health,load,embeddings"; an image-generation backend
//	                           that cannot be driven by a text prompt can set it to
//	                           "health,load,image".
//	                           "tools" asks the backend to extract a tool call from the
//	                           model output into ChatDelta.tool_calls.
//	                           "image" exercises the GenerateImage RPC and asserts a
//	                           non-empty file is written to the requested dst path.
//	BACKEND_TEST_IMAGE_PROMPT  Override the positive prompt for the image spec
//	                           (default: "a photograph of an astronaut riding a horse").
//	BACKEND_TEST_IMAGE_STEPS   Override the diffusion step count for the image spec
//	                           (default: 4 — keeps CPU-only runs under a few minutes).
//	BACKEND_TEST_PROMPT        Override the prompt used by predict/stream specs.
//	BACKEND_TEST_CTX_SIZE      Override the context size passed to LoadModel (default 512).
//	BACKEND_TEST_THREADS       Override Threads passed to LoadModel (default 4).
//	BACKEND_TEST_OPTIONS       Comma-separated Options[] entries passed to LoadModel,
//	                           e.g. "tool_parser:hermes,reasoning_parser:qwen3".
//	BACKEND_TEST_CACHE_TYPE_K  Sets ModelOptions.CacheTypeKey (llama.cpp -ctk),
//	                           e.g. "q8_0" — exercises KV-cache quantization code paths.
//	BACKEND_TEST_CACHE_TYPE_V  Sets ModelOptions.CacheTypeValue (llama.cpp -ctv).
//	BACKEND_TEST_TOOL_PROMPT   Override the user prompt for the tools spec
//	                           (default: "What's the weather like in Paris, France?").
//	BACKEND_TEST_TOOL_NAME     Override the function name expected in the tool call
//	                           (default: "get_weather").
//	BACKEND_TEST_TTS_TEXT      Override the text synthesized by the tts/ttsstream
//	                           specs (default: "The quick brown fox jumps over the
//	                           lazy dog.").
//
// The suite is intentionally model-format-agnostic: it only ever passes the
// file path to LoadModel, so GGUF, ONNX, safetensors, .bin etc. all work so
// long as the backend under test accepts that format.
const (
	capHealth         = "health"
	capLoad           = "load"
	capPredict        = "predict"
	capStream         = "stream"
	capEmbeddings     = "embeddings"
	capTools          = "tools"
	capTranscription  = "transcription"
	capTTS            = "tts"
	capImage          = "image"
	capFaceDetect     = "face_detect"
	capFaceEmbed      = "face_embed"
	capFaceVerify     = "face_verify"
	capFaceAnalyze    = "face_analyze"
	capFaceAntispoof  = "face_antispoof"
	capVoiceEmbed     = "voice_embed"
	capVoiceVerify    = "voice_verify"
	capVoiceAnalyze   = "voice_analyze"
	capAudioTransform = "audio_transform"
	capLogprobs       = "logprobs"
	capLogitBias      = "logit_bias"

	defaultPrompt             = "The capital of France is"
	streamPrompt              = "Once upon a time"
	defaultToolPrompt         = "What's the weather like in Paris, France?"
	defaultToolName           = "get_weather"
	defaultImagePrompt        = "a photograph of an astronaut riding a horse"
	defaultImageSteps         = 4
	defaultVerifyDistanceCeil = float32(0.6) // upper bound for same-person; SFace runs closer to 0.5, ArcFace to 0.35.
	defaultTTSText            = "The quick brown fox jumps over the lazy dog."
)

func defaultCaps() map[string]bool {
	return map[string]bool{
		capHealth:  true,
		capLoad:    true,
		capPredict: true,
		capStream:  true,
	}
}

// splitURLAndName parses a "<url>#<local-name>" entry. The #name suffix is
// optional — if absent, defaultName is returned. Used by the main-model
// and extras download paths so a test can rename downloaded files to the
// shape the backend's loader expects.
func splitURLAndName(entry, defaultName string) (url, name string) {
	if hash := strings.Index(entry, "#"); hash >= 0 {
		return entry[:hash], entry[hash+1:]
	}
	return entry, defaultName
}

// parseCaps reads BACKEND_TEST_CAPS and returns the enabled capability set.
// An empty/unset value falls back to defaultCaps().
func parseCaps() map[string]bool {
	raw := strings.TrimSpace(os.Getenv("BACKEND_TEST_CAPS"))
	if raw == "" {
		return defaultCaps()
	}
	caps := map[string]bool{}
	for _, part := range strings.Split(raw, ",") {
		part = strings.TrimSpace(strings.ToLower(part))
		if part != "" {
			caps[part] = true
		}
	}
	return caps
}

var _ = Describe("Backend container", Ordered, func() {
	var (
		caps       map[string]bool
		workDir    string
		binaryDir  string
		modelFile  string // set when a local file is used
		modelName  string // set when a HuggingFace model id is used
		mmprojFile string // optional multimodal projector
		audioFile  string // optional audio fixture for transcription specs
		// Face fixtures: two photos of the same person + one different person.
		faceFile1 string
		faceFile2 string
		faceFile3 string
		// Spoof fixture: a photo that the antispoofing model should
		// classify as fake (e.g. printed photo / screen replay). Only
		// exercised when capFaceAntispoof is enabled and the env var
		// is set.
		faceSpoofFile string
		// Voice fixtures: two clips of the same speaker + one different speaker.
		voiceFile1 string
		voiceFile2 string
		voiceFile3 string
		// voiceVerifyCeiling is the upper-bound cosine distance for a
		// same-speaker pair; varies with the recognizer (ECAPA-TDNN
		// runs close to 0.2, WeSpeaker around 0.3).
		voiceVerifyCeiling float32
		// verifyCeiling is the upper-bound cosine distance for a
		// same-person pair; each model configuration can override it via
		// BACKEND_TEST_VERIFY_DISTANCE_CEILING because SFace's distance
		// distribution is wider than ArcFace's.
		verifyCeiling float32
		addr          string
		serverCmd     *exec.Cmd
		conn          *grpc.ClientConn
		client        pb.BackendClient
		prompt        string
		options       []string
	)

	BeforeAll(func() {
		image := os.Getenv("BACKEND_IMAGE")
		// BACKEND_BINARY is an escape hatch for hardware-gated backends (e.g. ds4)
		// where building a full Docker image around an 80+ GB model is impractical.
		// Points at a `run.sh` produced by `make -C backend/cpp/<name> package`.
		binary := os.Getenv("BACKEND_BINARY")
		Expect(image != "" || binary != "").To(BeTrue(),
			"either BACKEND_IMAGE or BACKEND_BINARY env var must be set")
		Expect(image != "" && binary != "").To(BeFalse(),
			"BACKEND_IMAGE and BACKEND_BINARY are mutually exclusive")
		if binary != "" {
			Expect(filepath.Base(binary)).To(Equal("run.sh"),
				"BACKEND_BINARY must point at a run.sh produced by 'make -C backend/cpp/<name> package'")
		}

		modelURL := os.Getenv("BACKEND_TEST_MODEL_URL")
		modelFile = os.Getenv("BACKEND_TEST_MODEL_FILE")
		modelName = os.Getenv("BACKEND_TEST_MODEL_NAME")
		Expect(modelURL != "" || modelFile != "" || modelName != "").To(BeTrue(),
			"one of BACKEND_TEST_MODEL_URL, BACKEND_TEST_MODEL_FILE, or BACKEND_TEST_MODEL_NAME must be set")

		caps = parseCaps()
		src := image
		if src == "" {
			src = binary
		}
		GinkgoWriter.Printf("Testing src=%q with capabilities=%v\n", src, keys(caps))

		prompt = os.Getenv("BACKEND_TEST_PROMPT")
		if prompt == "" {
			prompt = defaultPrompt
		}

		if raw := strings.TrimSpace(os.Getenv("BACKEND_TEST_OPTIONS")); raw != "" {
			for _, opt := range strings.Split(raw, ",") {
				opt = strings.TrimSpace(opt)
				if opt != "" {
					options = append(options, opt)
				}
			}
		}

		var err error
		workDir, err = os.MkdirTemp("", "backend-e2e-*")
		Expect(err).NotTo(HaveOccurred())

		if image != "" {
			binaryDir = filepath.Join(workDir, "rootfs")
			Expect(os.MkdirAll(binaryDir, 0o755)).To(Succeed())
			extractImage(image, binaryDir)
		} else {
			binaryDir = filepath.Dir(binary)
		}
		Expect(filepath.Join(binaryDir, "run.sh")).To(BeAnExistingFile())

		// Download the model once if not provided and no HF name given.
		// BACKEND_TEST_MODEL_URL accepts an optional "#<local-name>" suffix
		// for cases where the backend expects the model file to have a
		// specific name (e.g. sherpa-onnx's online recognizer finds
		// encoder/decoder/joiner by filename substring).
		if modelFile == "" && modelName == "" {
			url, name := splitURLAndName(modelURL, "model.bin")
			modelFile = filepath.Join(workDir, name)
			downloadFile(url, modelFile)
		}

		// Multi-file models (sherpa-onnx streaming zipformer, sherpa-onnx
		// Omnilingual, any split encoder/decoder/joiner bundle) need
		// companion files next to the main model. BACKEND_TEST_EXTRA_FILES
		// is a pipe-separated list of "<url>[#<local-name>]" entries; each
		// is downloaded into the same directory as modelFile. The optional
		// <local-name> renames the saved file (useful when upstream URLs
		// have stamp/version suffixes the loader doesn't recognise).
		if extraSpec := strings.TrimSpace(os.Getenv("BACKEND_TEST_EXTRA_FILES")); extraSpec != "" && modelFile != "" {
			modelDir := filepath.Dir(modelFile)
			for _, entry := range strings.Split(extraSpec, "|") {
				entry = strings.TrimSpace(entry)
				if entry == "" {
					continue
				}
				url, name := splitURLAndName(entry, filepath.Base(entry))
				downloadFile(url, filepath.Join(modelDir, name))
			}
		}

		// Multimodal projector (mmproj): required by audio/vision-capable
		// llama.cpp models like Qwen3-ASR-0.6B-GGUF. Either file or URL.
		mmprojFile = os.Getenv("BACKEND_TEST_MMPROJ_FILE")
		if mmprojFile == "" {
			if url := os.Getenv("BACKEND_TEST_MMPROJ_URL"); url != "" {
				mmprojFile = filepath.Join(workDir, "mmproj.bin")
				downloadFile(url, mmprojFile)
			}
		}

		// Audio fixture for the transcription specs.
		audioFile = os.Getenv("BACKEND_TEST_AUDIO_FILE")
		if audioFile == "" {
			if url := os.Getenv("BACKEND_TEST_AUDIO_URL"); url != "" {
				audioFile = filepath.Join(workDir, "sample.wav")
				downloadFile(url, audioFile)
			}
		}

		// Face fixtures for the face-recognition specs.
		faceFile1 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_1", "face_a_1.jpg")
		faceFile2 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_2", "face_a_2.jpg")
		faceFile3 = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_IMAGE_3", "face_b.jpg")
		faceSpoofFile = resolveFaceFixture(workDir, "BACKEND_TEST_FACE_SPOOF_IMAGE", "face_spoof.jpg")
		verifyCeiling = envFloat32("BACKEND_TEST_VERIFY_DISTANCE_CEILING", defaultVerifyDistanceCeil)

		// Voice fixtures for the voice-recognition specs. Same resolver
		// as faces — the helper is content-agnostic.
		voiceFile1 = resolveFaceFixture(workDir, "BACKEND_TEST_VOICE_AUDIO_1", "voice_a_1.wav")
		voiceFile2 = resolveFaceFixture(workDir, "BACKEND_TEST_VOICE_AUDIO_2", "voice_a_2.wav")
		voiceFile3 = resolveFaceFixture(workDir, "BACKEND_TEST_VOICE_AUDIO_3", "voice_b.wav")
|
|
voiceVerifyCeiling = envFloat32("BACKEND_TEST_VOICE_VERIFY_DISTANCE_CEILING", 0.4)
|
|
|
|
// Pick a free port and launch the backend.
|
|
port, err := freeport.GetFreePort()
|
|
Expect(err).NotTo(HaveOccurred())
|
|
addr = fmt.Sprintf("127.0.0.1:%d", port)
|
|
|
|
Expect(os.Chmod(filepath.Join(binaryDir, "run.sh"), 0o755)).To(Succeed())
|
|
// Mark any other top-level files executable (extraction may strip perms).
|
|
entries, _ := os.ReadDir(binaryDir)
|
|
for _, e := range entries {
|
|
if !e.IsDir() && !strings.HasSuffix(e.Name(), ".sh") {
|
|
_ = os.Chmod(filepath.Join(binaryDir, e.Name()), 0o755)
|
|
}
|
|
}
|
|
|
|
serverCmd = exec.Command(filepath.Join(binaryDir, "run.sh"), "--addr="+addr)
|
|
serverCmd.Stdout = GinkgoWriter
|
|
serverCmd.Stderr = GinkgoWriter
|
|
Expect(serverCmd.Start()).To(Succeed())
|
|
|
|
// Wait for the gRPC port to accept connections.
|
|
Eventually(func() error {
|
|
c, err := net.DialTimeout("tcp", addr, 500*time.Millisecond)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
_ = c.Close()
|
|
return nil
|
|
}, 30*time.Second, 200*time.Millisecond).Should(Succeed(), "backend did not start")
|
|
|
|
conn, err = grpc.Dial(addr,
|
|
grpc.WithTransportCredentials(insecure.NewCredentials()),
|
|
grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(50*1024*1024)),
|
|
)
|
|
Expect(err).NotTo(HaveOccurred())
|
|
client = pb.NewBackendClient(conn)
|
|
})
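	// The "<url>[#<local-name>]" convention used by BACKEND_TEST_MODEL_URL
	// and BACKEND_TEST_EXTRA_FILES above can be sketched as follows.
	// splitSpec is a hypothetical stand-in for illustration only, not the
	// suite's actual splitURLAndName helper:
	//
	// ```go
	// package main
	//
	// import (
	// 	"fmt"
	// 	"strings"
	// )
	//
	// // splitSpec splits "<url>[#<local-name>]" into its parts, falling
	// // back to a default name when no "#" suffix is present.
	// func splitSpec(spec, fallback string) (url, name string) {
	// 	if i := strings.LastIndex(spec, "#"); i >= 0 {
	// 		return spec[:i], spec[i+1:]
	// 	}
	// 	return spec, fallback
	// }
	//
	// func main() {
	// 	u, n := splitSpec("https://host/model-v2.onnx#encoder.onnx", "model.bin")
	// 	fmt.Println(u, n) // https://host/model-v2.onnx encoder.onnx
	// 	u, n = splitSpec("https://host/model.bin", "model.bin")
	// 	fmt.Println(u, n) // https://host/model.bin model.bin
	// }
	// ```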

	AfterAll(func() {
		if conn != nil {
			_ = conn.Close()
		}
		if serverCmd != nil && serverCmd.Process != nil {
			_ = serverCmd.Process.Kill()
			_, _ = serverCmd.Process.Wait()
		}
		if workDir != "" {
			_ = os.RemoveAll(workDir)
		}
	})

	It("responds to Health", func() {
		if !caps[capHealth] {
			Skip("health capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		res, err := client.Health(ctx, &pb.HealthMessage{})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty())
	})

	It("loads the model", func() {
		if !caps[capLoad] {
			Skip("load capability not enabled")
		}
		ctxSize := envInt32("BACKEND_TEST_CTX_SIZE", 512)
		threads := envInt32("BACKEND_TEST_THREADS", 4)

		// Prefer a HuggingFace model id when provided (e.g. for vllm);
		// otherwise fall back to a downloaded/local file path.
		modelRef := modelFile
		var modelPath string
		if modelName != "" {
			modelRef = modelName
		} else {
			modelPath = modelFile
		}

		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
		defer cancel()
		res, err := client.LoadModel(ctx, &pb.ModelOptions{
			Model:          modelRef,
			ModelFile:      modelPath,
			ContextSize:    ctxSize,
			Threads:        threads,
			NGPULayers:     0,
			MMap:           true,
			NBatch:         128,
			Options:        options,
			MMProj:         mmprojFile,
			CacheTypeKey:   os.Getenv("BACKEND_TEST_CACHE_TYPE_K"),
			CacheTypeValue: os.Getenv("BACKEND_TEST_CACHE_TYPE_V"),
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetSuccess()).To(BeTrue(), "LoadModel failed: %s", res.GetMessage())
	})

	It("generates output via Predict", func() {
		if !caps[capPredict] {
			Skip("predict capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Prompt:      prompt,
			Tokens:      20,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty(), "Predict produced empty output")
		GinkgoWriter.Printf("Predict: %q (tokens=%d, prompt_tokens=%d)\n",
			res.GetMessage(), res.GetTokens(), res.GetPromptTokens())
	})

	It("streams output via PredictStream", func() {
		if !caps[capStream] {
			Skip("stream capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		stream, err := client.PredictStream(ctx, &pb.PredictOptions{
			Prompt:      streamPrompt,
			Tokens:      20,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
		})
		Expect(err).NotTo(HaveOccurred())

		var chunks int
		var combined string
		var firstChunks []string
		for {
			msg, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if len(msg.GetMessage()) > 0 {
				chunks++
				combined += string(msg.GetMessage())
				if len(firstChunks) < 2 {
					firstChunks = append(firstChunks, string(msg.GetMessage()))
				}
			}
		}
		Expect(chunks).To(BeNumerically(">", 0), "no stream chunks received")
		// Regression guard: a bug in llama-cpp's grpc-server.cpp caused the
		// role-init array element to get the same ChatDelta stamped, duplicating
		// the first content token. Applies to any streaming backend.
		if len(firstChunks) >= 2 {
			Expect(firstChunks[0]).NotTo(Equal(firstChunks[1]),
				"first content token was duplicated: %v", firstChunks)
		}
		GinkgoWriter.Printf("Stream: %d chunks, combined=%q\n", chunks, combined)
	})

	// Logprobs: backends that wire OpenAI-compatible logprobs return a
	// JSON-encoded payload in Reply.logprobs (see backend.proto). The exact
	// shape is backend-specific; we only assert that the field is populated
	// when requested. Gated by capLogprobs because not every backend
	// implements it.
	It("returns logprobs when requested", func() {
		if !caps[capLogprobs] {
			Skip("logprobs capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Prompt:      prompt,
			Tokens:      10,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
			Logprobs:    1,
			TopLogprobs: 1,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty(), "Predict produced empty output")
		Expect(res.GetLogprobs()).NotTo(BeEmpty(), "Reply.logprobs was empty when requested")
		GinkgoWriter.Printf("Logprobs: %d bytes\n", len(res.GetLogprobs()))
	})

	// Logit bias: encoded as a JSON string keyed by token id. We don't
	// know the model's tokenizer, so we exercise the API path with a
	// nonsense bias map that any backend should accept and ignore for
	// unknown ids. The assertion is that the request succeeds — proving
	// the LogitBias plumbing is wired end-to-end.
	It("accepts logit_bias when supplied", func() {
		if !caps[capLogitBias] {
			Skip("logit_bias capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Prompt:      prompt,
			Tokens:      10,
			Temperature: 0.1,
			TopK:        40,
			TopP:        0.9,
			LogitBias:   `{"1":-100}`,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetMessage()).NotTo(BeEmpty(), "Predict produced empty output with logit_bias")
	})

	It("computes embeddings via Embedding", func() {
		if !caps[capEmbeddings] {
			Skip("embeddings capability not enabled")
		}
		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.Embedding(ctx, &pb.PredictOptions{
			Embeddings: prompt,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetEmbeddings()).NotTo(BeEmpty(), "Embedding returned empty vector")
		GinkgoWriter.Printf("Embedding: %d dims\n", len(res.GetEmbeddings()))
	})

	It("generates an image via GenerateImage", func() {
		if !caps[capImage] {
			Skip("image capability not enabled")
		}

		imgPrompt := os.Getenv("BACKEND_TEST_IMAGE_PROMPT")
		if imgPrompt == "" {
			imgPrompt = defaultImagePrompt
		}
		steps := envInt32("BACKEND_TEST_IMAGE_STEPS", defaultImageSteps)

		dst := filepath.Join(workDir, "generated.png")

		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
		defer cancel()
		res, err := client.GenerateImage(ctx, &pb.GenerateImageRequest{
			PositivePrompt: imgPrompt,
			NegativePrompt: "",
			Width:          512,
			Height:         512,
			Step:           steps,
			Seed:           42,
			Dst:            dst,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetSuccess()).To(BeTrue(), "GenerateImage failed: %s", res.GetMessage())

		info, err := os.Stat(dst)
		Expect(err).NotTo(HaveOccurred(), "GenerateImage did not write a file at %s", dst)
		Expect(info.Size()).To(BeNumerically(">", int64(0)),
			"GenerateImage wrote an empty file at %s", dst)
		GinkgoWriter.Printf("GenerateImage: wrote %s (%d bytes)\n", dst, info.Size())
	})

	It("extracts tool calls into ChatDelta", func() {
		if !caps[capTools] {
			Skip("tools capability not enabled")
		}

		toolPrompt := os.Getenv("BACKEND_TEST_TOOL_PROMPT")
		if toolPrompt == "" {
			toolPrompt = defaultToolPrompt
		}
		toolName := os.Getenv("BACKEND_TEST_TOOL_NAME")
		if toolName == "" {
			toolName = defaultToolName
		}

		toolsJSON := fmt.Sprintf(`[{
			"type": "function",
			"function": {
				"name": %q,
				"description": "Get the current weather for a location",
				"parameters": {
					"type": "object",
					"properties": {
						"location": {
							"type": "string",
							"description": "The city and state, e.g. San Francisco, CA"
						}
					},
					"required": ["location"]
				}
			}
		}]`, toolName)

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		res, err := client.Predict(ctx, &pb.PredictOptions{
			Messages: []*pb.Message{
				{Role: "system", Content: "You are a helpful assistant. Use the provided tool when the user asks about weather."},
				{Role: "user", Content: toolPrompt},
			},
			Tools:                toolsJSON,
			ToolChoice:           "auto",
			UseTokenizerTemplate: true,
			Tokens:               200,
			Temperature:          0.1,
		})
		Expect(err).NotTo(HaveOccurred())

		// Collect tool calls from every delta — some backends emit a single
		// final delta, others stream incremental pieces in one Reply.
		var toolCalls []*pb.ToolCallDelta
		for _, delta := range res.GetChatDeltas() {
			toolCalls = append(toolCalls, delta.GetToolCalls()...)
		}

		GinkgoWriter.Printf("Tool call: raw=%q deltas=%d tool_calls=%d\n",
			string(res.GetMessage()), len(res.GetChatDeltas()), len(toolCalls))

		Expect(toolCalls).NotTo(BeEmpty(),
			"Predict did not return any ToolCallDelta. raw=%q", string(res.GetMessage()))

		matched := false
		for _, tc := range toolCalls {
			GinkgoWriter.Printf("  - idx=%d id=%q name=%q args=%q\n",
				tc.GetIndex(), tc.GetId(), tc.GetName(), tc.GetArguments())
			if tc.GetName() == toolName {
				matched = true
			}
		}
		Expect(matched).To(BeTrue(),
			"Expected a tool call named %q in ChatDelta.tool_calls", toolName)
	})

	It("transcribes audio via AudioTranscription", func() {
		if !caps[capTranscription] {
			Skip("transcription capability not enabled")
		}
		Expect(audioFile).NotTo(BeEmpty(),
			"BACKEND_TEST_AUDIO_FILE or BACKEND_TEST_AUDIO_URL must be set when transcription cap is enabled")

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		res, err := client.AudioTranscription(ctx, &pb.TranscriptRequest{
			Dst:         audioFile,
			Threads:     uint32(envInt32("BACKEND_TEST_THREADS", 4)),
			Temperature: 0.0,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(strings.TrimSpace(res.GetText())).NotTo(BeEmpty(),
			"AudioTranscription returned empty text")
		GinkgoWriter.Printf("AudioTranscription: text=%q language=%q duration=%v\n",
			res.GetText(), res.GetLanguage(), res.GetDuration())
	})

	It("streams audio transcription via AudioTranscriptionStream", func() {
		if !caps[capTranscription] {
			Skip("transcription capability not enabled")
		}
		Expect(audioFile).NotTo(BeEmpty(),
			"BACKEND_TEST_AUDIO_FILE or BACKEND_TEST_AUDIO_URL must be set when transcription cap is enabled")

		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		defer cancel()
		stream, err := client.AudioTranscriptionStream(ctx, &pb.TranscriptRequest{
			Dst:         audioFile,
			Threads:     uint32(envInt32("BACKEND_TEST_THREADS", 4)),
			Temperature: 0.0,
			Stream:      true,
		})
		Expect(err).NotTo(HaveOccurred())

		var deltas []string
		var assembled strings.Builder
		var finalText string
		for {
			chunk, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if d := chunk.GetDelta(); d != "" {
				deltas = append(deltas, d)
				assembled.WriteString(d)
			}
			if final := chunk.GetFinalResult(); final != nil && final.GetText() != "" {
				finalText = final.GetText()
			}
		}
		// At least one delta must arrive; the final event may also carry text.
		Expect(deltas).NotTo(BeEmpty(),
			"AudioTranscriptionStream did not emit any deltas (assembled=%q final=%q)",
			assembled.String(), finalText)

		// If both arrived, the final event should match the assembled deltas.
		if finalText != "" && assembled.Len() > 0 {
			Expect(finalText).To(Equal(assembled.String()),
				"final transcript should match concatenated deltas")
		}
		GinkgoWriter.Printf("AudioTranscriptionStream: deltas=%d assembled=%q final=%q\n",
			len(deltas), assembled.String(), finalText)
	})
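	// The delta-assembly invariant the spec above asserts can be shown in
	// isolation: when a stream emits incremental deltas and a final
	// result, the concatenated deltas should reproduce the final
	// transcript. A minimal sketch with hard-coded sample data:
	//
	// ```go
	// package main
	//
	// import (
	// 	"fmt"
	// 	"strings"
	// )
	//
	// func main() {
	// 	// Simulated incremental deltas from a streaming transcriber.
	// 	deltas := []string{"hello", " ", "world"}
	// 	var assembled strings.Builder
	// 	for _, d := range deltas {
	// 		assembled.WriteString(d)
	// 	}
	// 	final := "hello world" // simulated final-result payload
	// 	fmt.Println(assembled.String() == final)
	// }
	// ```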

	// ─── face recognition specs ─────────────────────────────────────────

	It("detects faces via Detect", func() {
		if !caps[capFaceDetect] {
			Skip("face_detect capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		res, err := client.Detect(ctx, &pb.DetectOptions{Src: base64File(faceFile1)})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetDetections()).NotTo(BeEmpty(), "Detect returned no faces")
		for _, d := range res.GetDetections() {
			Expect(d.GetClassName()).To(Equal("face"))
			Expect(d.GetWidth()).To(BeNumerically(">", 0))
			Expect(d.GetHeight()).To(BeNumerically(">", 0))
		}
		GinkgoWriter.Printf("face_detect: %d faces\n", len(res.GetDetections()))
	})

	It("produces face embeddings via Embedding", func() {
		if !caps[capFaceEmbed] {
			Skip("face_embed capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.Embedding(ctx, &pb.PredictOptions{Images: []string{base64File(faceFile1)}})
		Expect(err).NotTo(HaveOccurred())
		vec := res.GetEmbeddings()
		Expect(vec).NotTo(BeEmpty(), "Embedding returned empty vector")
		// Face embeddings are L2-normalized — expect unit norm.
		var sumSq float64
		for _, v := range vec {
			sumSq += float64(v) * float64(v)
		}
		Expect(sumSq).To(BeNumerically("~", 1.0, 0.05),
			"face embedding should be L2-normed (sum(x^2)=%.3f, dim=%d)", sumSq, len(vec))
		GinkgoWriter.Printf("face_embed: dim=%d\n", len(vec))
	})
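	// The unit-norm check above generalizes: the L2 norm of a normalized
	// embedding is sqrt(sum(x^2)) ≈ 1. A standalone sketch with a toy
	// vector (the values are illustrative, not model output):
	//
	// ```go
	// package main
	//
	// import (
	// 	"fmt"
	// 	"math"
	// )
	//
	// // l2Norm returns sqrt(sum(x^2)); for an L2-normalized embedding
	// // this should be ~1 within the spec's ±0.05 tolerance on sum(x^2).
	// func l2Norm(vec []float32) float64 {
	// 	var sumSq float64
	// 	for _, v := range vec {
	// 		sumSq += float64(v) * float64(v)
	// 	}
	// 	return math.Sqrt(sumSq)
	// }
	//
	// func main() {
	// 	v := []float32{0.6, 0.8} // toy unit vector: 0.36 + 0.64 = 1
	// 	fmt.Printf("%.2f\n", l2Norm(v)) // 1.00
	// }
	// ```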

	It("verifies faces via FaceVerify", func() {
		if !caps[capFaceVerify] {
			Skip("face_verify capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()

		// Same image twice — expect verified=true with a very small distance.
		b1 := base64File(faceFile1)
		same, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b1, Threshold: verifyCeiling})
		Expect(err).NotTo(HaveOccurred())
		Expect(same.GetVerified()).To(BeTrue(), "same image should verify: dist=%.3f", same.GetDistance())
		Expect(same.GetDistance()).To(BeNumerically("<", 0.1))
		GinkgoWriter.Printf("face_verify(same): dist=%.3f confidence=%.1f\n", same.GetDistance(), same.GetConfidence())

		// Different images — assert relative ordering when the detector
		// actually finds a face in both images. Some fixtures (masked
		// faces, profile shots, etc.) are legitimately borderline for
		// SCRFD's default threshold, so we don't fail the suite when the
		// second image gets a NotFound — we just log and skip the
		// cross-person check. The same-image assertion above is the
		// definitive proof the RPC works end-to-end.
		if faceFile3 != "" {
			b3 := base64File(faceFile3)
			diff, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b3, Threshold: verifyCeiling})
			if err != nil {
				GinkgoWriter.Printf("face_verify(diff): skipped — %v\n", err)
			} else {
				Expect(diff.GetDistance()).To(BeNumerically(">", same.GetDistance()),
					"cross-person distance %.3f should exceed same-image distance %.3f", diff.GetDistance(), same.GetDistance())
				GinkgoWriter.Printf("face_verify(diff): dist=%.3f verified=%v\n", diff.GetDistance(), diff.GetVerified())
			}
		}

		// If two photos of the same person were provided, the ordering
		// should also hold: d(a1,a2) < ceiling. Best-effort as above —
		// skip if the detector doesn't find a face in the second image.
		if faceFile2 != "" {
			b2 := base64File(faceFile2)
			sp, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{Img1: b1, Img2: b2, Threshold: verifyCeiling})
			if err != nil {
				GinkgoWriter.Printf("face_verify(same-person): skipped — %v\n", err)
			} else {
				Expect(sp.GetDistance()).To(BeNumerically("<", verifyCeiling),
					"same-person (different photos) distance %.3f exceeds ceiling %.3f", sp.GetDistance(), verifyCeiling)
				GinkgoWriter.Printf("face_verify(same-person): dist=%.3f verified=%v\n", sp.GetDistance(), sp.GetVerified())
			}
		}

		// Liveness: exercise BOTH real and spoof paths when the cap is
		// enabled. Gated on capFaceAntispoof so model configs without
		// MiniFASNet weights (which would correctly surface
		// FAILED_PRECONDITION) can still run the rest of the verify
		// spec.
		if caps[capFaceAntispoof] {
			// (a) Real-face path: same image twice → both is_real=true,
			// verified stays true, scores populated.
			asReal, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{
				Img1: b1, Img2: b1, Threshold: verifyCeiling, AntiSpoofing: true,
			})
			Expect(err).NotTo(HaveOccurred(), "FaceVerify(anti_spoofing=true, real) failed")
			Expect(asReal.GetImg1IsReal()).To(BeTrue(), "real face should be is_real=true (score=%.3f)", asReal.GetImg1AntispoofScore())
			Expect(asReal.GetImg2IsReal()).To(BeTrue(), "real face should be is_real=true (score=%.3f)", asReal.GetImg2AntispoofScore())
			Expect(asReal.GetImg1AntispoofScore()).To(BeNumerically(">", 0), "img1_antispoof_score must be populated")
			Expect(asReal.GetImg2AntispoofScore()).To(BeNumerically(">", 0), "img2_antispoof_score must be populated")
			Expect(asReal.GetVerified()).To(BeTrue(), "same image + real face should still verify with liveness on")
			GinkgoWriter.Printf("face_antispoof(verify,real): img1_score=%.3f img2_score=%.3f\n",
				asReal.GetImg1AntispoofScore(), asReal.GetImg2AntispoofScore())

			// (b) Spoof path: img2 is a known-spoof fixture → img2 is
			// classified as fake, and the liveness veto forces
			// verified=false regardless of img1/img2 similarity (which
			// could go either way). Skipped if no spoof fixture was
			// provided, since a synthetic spoof is not a reliable
			// assertion.
			if faceSpoofFile != "" {
				bSpoof := base64File(faceSpoofFile)
				asFake, err := client.FaceVerify(ctx, &pb.FaceVerifyRequest{
					Img1: b1, Img2: bSpoof, Threshold: verifyCeiling, AntiSpoofing: true,
				})
				Expect(err).NotTo(HaveOccurred(), "FaceVerify(anti_spoofing=true, spoof img2) failed")
				Expect(asFake.GetImg1IsReal()).To(BeTrue(), "img1 (real) should still be is_real=true")
				Expect(asFake.GetImg2IsReal()).To(BeFalse(), "spoof fixture must classify as is_real=false (score=%.3f)", asFake.GetImg2AntispoofScore())
				Expect(asFake.GetVerified()).To(BeFalse(), "failed liveness on img2 must force verified=false regardless of similarity")
				GinkgoWriter.Printf("face_antispoof(verify,spoof): img1_score=%.3f img2_score=%.3f verified=%v\n",
					asFake.GetImg1AntispoofScore(), asFake.GetImg2AntispoofScore(), asFake.GetVerified())
			}
		}
	})

	It("analyzes faces via FaceAnalyze", func() {
		if !caps[capFaceAnalyze] {
			Skip("face_analyze capability not enabled")
		}
		Expect(faceFile1).NotTo(BeEmpty(), "BACKEND_TEST_FACE_IMAGE_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{Img: base64File(faceFile1)})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetFaces()).NotTo(BeEmpty(), "FaceAnalyze returned no faces")
		for _, f := range res.GetFaces() {
			Expect(f.GetFaceConfidence()).To(BeNumerically(">", 0))
			Expect(f.GetAge()).To(BeNumerically(">", 0), "age should be populated by analyze-capable engines")
			Expect(f.GetDominantGender()).To(BeElementOf("Man", "Woman"))
		}
		GinkgoWriter.Printf("face_analyze: %d faces\n", len(res.GetFaces()))

		// Liveness: exercise BOTH real and spoof paths. Gated on
		// capFaceAntispoof.
		if caps[capFaceAntispoof] {
			// (a) Real: every face on the real-face fixture must
			// classify as is_real=true with a non-zero score.
			asReal, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{
				Img: base64File(faceFile1), AntiSpoofing: true,
			})
			Expect(err).NotTo(HaveOccurred(), "FaceAnalyze(anti_spoofing=true, real) failed")
			Expect(asReal.GetFaces()).NotTo(BeEmpty())
			for _, f := range asReal.GetFaces() {
				Expect(f.GetIsReal()).To(BeTrue(), "real-face fixture must classify as is_real=true (score=%.3f)", f.GetAntispoofScore())
				Expect(f.GetAntispoofScore()).To(BeNumerically(">", 0), "antispoof_score must be populated")
			}
			GinkgoWriter.Printf("face_antispoof(analyze,real): %d faces\n", len(asReal.GetFaces()))

			// (b) Spoof: at least one detected face on the spoof
			// fixture must classify as is_real=false. Skipped if no
			// spoof fixture was provided.
			if faceSpoofFile != "" {
				asFake, err := client.FaceAnalyze(ctx, &pb.FaceAnalyzeRequest{
					Img: base64File(faceSpoofFile), AntiSpoofing: true,
				})
				Expect(err).NotTo(HaveOccurred(), "FaceAnalyze(anti_spoofing=true, spoof) failed")
				Expect(asFake.GetFaces()).NotTo(BeEmpty(), "detector must find a face in the spoof fixture")
				sawFake := false
				for _, f := range asFake.GetFaces() {
					if !f.GetIsReal() {
						sawFake = true
					}
					GinkgoWriter.Printf("face_antispoof(analyze,spoof): is_real=%v score=%.3f\n", f.GetIsReal(), f.GetAntispoofScore())
				}
				Expect(sawFake).To(BeTrue(), "known spoof fixture must produce at least one is_real=false face")
			}
		}
	})

	// ─── voice (speaker) recognition specs ──────────────────────────────

	It("produces speaker embeddings via VoiceEmbed", func() {
		if !caps[capVoiceEmbed] {
			Skip("voice_embed capability not enabled")
		}
		Expect(voiceFile1).NotTo(BeEmpty(), "BACKEND_TEST_VOICE_AUDIO_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.VoiceEmbed(ctx, &pb.VoiceEmbedRequest{Audio: voiceFile1})
		Expect(err).NotTo(HaveOccurred())
		vec := res.GetEmbedding()
		Expect(vec).NotTo(BeEmpty(), "VoiceEmbed returned empty vector")
		GinkgoWriter.Printf("voice_embed: dim=%d\n", len(vec))
	})

	It("verifies speakers via VoiceVerify", func() {
		if !caps[capVoiceVerify] {
			Skip("voice_verify capability not enabled")
		}
		Expect(voiceFile1).NotTo(BeEmpty(), "BACKEND_TEST_VOICE_AUDIO_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()

		// Same clip twice — expect verified=true with a near-zero distance.
		same, err := client.VoiceVerify(ctx, &pb.VoiceVerifyRequest{
			Audio1: voiceFile1, Audio2: voiceFile1, Threshold: voiceVerifyCeiling,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(same.GetVerified()).To(BeTrue(), "same clip should verify: dist=%.3f", same.GetDistance())
		Expect(same.GetDistance()).To(BeNumerically("<", 0.05),
			"identical-clip distance should be near zero, got %.3f", same.GetDistance())
		GinkgoWriter.Printf("voice_verify(same): dist=%.3f confidence=%.1f\n", same.GetDistance(), same.GetConfidence())

		// Cross-pair distances — assert relative ordering only: each
		// cross-clip distance must exceed the same-clip distance. We
		// don't require the fixtures to contain true same-speaker pairs —
		// good same-speaker audio is hard to source un-gated. RPC
		// correctness is pinned by the same-clip check above; the pair
		// distances here assert that the embedding actually encodes
		// speaker information (ordering changes with speaker identity).
		var d12, d13 float32
		if voiceFile3 != "" {
			res, err := client.VoiceVerify(ctx, &pb.VoiceVerifyRequest{
				Audio1: voiceFile1, Audio2: voiceFile3, Threshold: voiceVerifyCeiling,
			})
			if err != nil {
				GinkgoWriter.Printf("voice_verify(1vs3): skipped — %v\n", err)
			} else {
				d13 = res.GetDistance()
				Expect(d13).To(BeNumerically(">", same.GetDistance()),
					"cross-clip distance %.3f should exceed same-clip distance %.3f", d13, same.GetDistance())
				GinkgoWriter.Printf("voice_verify(1vs3): dist=%.3f verified=%v\n", d13, res.GetVerified())
			}
		}

		if voiceFile2 != "" {
			res, err := client.VoiceVerify(ctx, &pb.VoiceVerifyRequest{
				Audio1: voiceFile1, Audio2: voiceFile2, Threshold: voiceVerifyCeiling,
			})
			if err != nil {
				GinkgoWriter.Printf("voice_verify(1vs2): skipped — %v\n", err)
			} else {
				d12 = res.GetDistance()
				Expect(d12).To(BeNumerically(">", same.GetDistance()),
					"cross-clip distance %.3f should exceed same-clip distance %.3f", d12, same.GetDistance())
				GinkgoWriter.Printf("voice_verify(1vs2): dist=%.3f verified=%v\n", d12, res.GetVerified())
			}
		}

		// If both pair distances were computed, record their ordering.
		// We log rather than assert: ordering depends on the specific
		// fixtures used, and CI defaults point at three different speakers.
		if d12 > 0 && d13 > 0 {
			GinkgoWriter.Printf("voice_verify ordering: d(1,2)=%.3f d(1,3)=%.3f\n", d12, d13)
		}
	})

	It("analyzes voice via VoiceAnalyze", func() {
		if !caps[capVoiceAnalyze] {
			Skip("voice_analyze capability not enabled")
		}
		Expect(voiceFile1).NotTo(BeEmpty(), "BACKEND_TEST_VOICE_AUDIO_1_FILE or _URL must be set")

		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		defer cancel()
		res, err := client.VoiceAnalyze(ctx, &pb.VoiceAnalyzeRequest{Audio: voiceFile1})
		Expect(err).NotTo(HaveOccurred())
		Expect(res.GetSegments()).NotTo(BeEmpty(), "VoiceAnalyze returned no segments")
		for _, s := range res.GetSegments() {
			Expect(s.GetAge()).To(BeNumerically(">", 0), "age should be populated by analyze-capable engines")
			// Audeering's age-gender head outputs female / male / child;
			// LocalAI capitalises those to Female / Male / Child. Custom
			// checkpoints wired via the age_gender_model option may use
			// different labels, so accept anything non-empty.
			Expect(s.GetDominantGender()).NotTo(BeEmpty())
		}
		GinkgoWriter.Printf("voice_analyze: %d segments\n", len(res.GetSegments()))
	})

	It("synthesizes speech via TTS", func() {
		if !caps[capTTS] {
			Skip("tts capability not enabled")
		}
		text := os.Getenv("BACKEND_TEST_TTS_TEXT")
		if text == "" {
			text = defaultTTSText
		}
		dst := filepath.Join(workDir, "tts.wav")

		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
		defer cancel()
		_, err := client.TTS(ctx, &pb.TTSRequest{Text: text, Dst: dst})
		Expect(err).NotTo(HaveOccurred())

		info, err := os.Stat(dst)
		Expect(err).NotTo(HaveOccurred(), "TTS did not write a file at %s", dst)
		Expect(info.Size()).To(BeNumerically(">", int64(1024)),
			"TTS output too small: %d bytes", info.Size())
		GinkgoWriter.Printf("TTS: wrote %s (%d bytes)\n", dst, info.Size())
	})

	It("streams PCM via TTSStream", func() {
		if !caps[capTTS] {
			Skip("tts capability not enabled")
		}
		text := os.Getenv("BACKEND_TEST_TTS_TEXT")
		if text == "" {
			text = defaultTTSText
		}

		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
		defer cancel()
		stream, err := client.TTSStream(ctx, &pb.TTSRequest{Text: text})
		Expect(err).NotTo(HaveOccurred())

		var chunks, totalBytes int
		for {
			reply, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if audio := reply.GetAudio(); len(audio) > 0 {
				chunks++
				totalBytes += len(audio)
			}
		}
		// Header + at least one PCM chunk proves real streaming (not emit-once).
		Expect(chunks).To(BeNumerically(">=", 2),
			"expected >=2 chunks (header + PCM), got %d (bytes=%d)", chunks, totalBytes)
		Expect(totalBytes).To(BeNumerically(">", 1024),
			"streamed audio too short: %d bytes", totalBytes)
		GinkgoWriter.Printf("TTSStream: %d chunks, %d bytes\n", chunks, totalBytes)
	})
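	// Back-of-envelope check behind the ">1024 bytes" floor above: even a
	// tenth of a second of 16 kHz mono s16 PCM already exceeds it. The
	// sample rate and format here are illustrative assumptions for the
	// sketch, not values the spec reads from the stream:
	//
	// ```go
	// package main
	//
	// import "fmt"
	//
	// func main() {
	// 	const (
	// 		sampleRate     = 16000 // Hz, assumed for illustration
	// 		bytesPerSample = 2     // s16 PCM
	// 		durationMillis = 100
	// 	)
	// 	// bytes = rate * width * duration
	// 	bytes := sampleRate * bytesPerSample * durationMillis / 1000
	// 	fmt.Println(bytes) // 3200
	// }
	// ```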

	It("transforms audio via AudioTransform (batch)", func() {
		if !caps[capAudioTransform] {
			Skip("audio_transform capability not enabled")
		}
		// Need an audio fixture — reuse the transcription audio knob.
		Expect(audioFile).NotTo(BeEmpty(),
			"BACKEND_TEST_AUDIO_FILE or BACKEND_TEST_AUDIO_URL must be set when audio_transform cap is enabled")

		dst := filepath.Join(workDir, "transformed.wav")
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
		defer cancel()
		res, err := client.AudioTransform(ctx, &pb.AudioTransformRequest{
			AudioPath: audioFile,
			Dst:       dst,
		})
		Expect(err).NotTo(HaveOccurred())
		Expect(res).NotTo(BeNil())
		Expect(res.SampleRate).To(BeNumerically(">", int32(0)),
			"AudioTransform did not report a sample rate")
		Expect(res.Samples).To(BeNumerically(">", int32(0)),
			"AudioTransform did not report any output samples")
		Expect(res.ReferenceProvided).To(BeFalse())

		info, err := os.Stat(dst)
		Expect(err).NotTo(HaveOccurred(), "AudioTransform did not write a file at %s", dst)
		Expect(info.Size()).To(BeNumerically(">", int64(1024)),
			"AudioTransform output too small: %d bytes", info.Size())
		GinkgoWriter.Printf("AudioTransform: wrote %s (%d bytes, sr=%d, samples=%d)\n",
			dst, info.Size(), res.SampleRate, res.Samples)
	})

	It("streams audio via AudioTransformStream (bidi)", func() {
		if !caps[capAudioTransform] {
			Skip("audio_transform capability not enabled")
		}

		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
		defer cancel()
		stream, err := client.AudioTransformStream(ctx)
		Expect(err).NotTo(HaveOccurred())

		// First message: Config. Pick the most permissive defaults so the
		// test works against any audio-transform backend (LocalVQE wants
		// 16 kHz / 256-sample / s16; other backends may default differently).
		err = stream.Send(&pb.AudioTransformFrameRequest{
			Payload: &pb.AudioTransformFrameRequest_Config{
				Config: &pb.AudioTransformStreamConfig{
					SampleFormat: pb.AudioTransformStreamConfig_S16_LE,
				},
			},
		})
		Expect(err).NotTo(HaveOccurred())

		// Send a handful of synthetic silent frames — 256 mono s16 samples
		// each — and assert the backend echoes a frame back per input.
		const (
			frameSamples = 256
			sampleSize   = 2 // s16
			nFrames      = 5
		)
		silentFrame := make([]byte, frameSamples*sampleSize)
		for i := 0; i < nFrames; i++ {
			err = stream.Send(&pb.AudioTransformFrameRequest{
				Payload: &pb.AudioTransformFrameRequest_Frame{
					Frame: &pb.AudioTransformFrame{AudioPcm: silentFrame},
				},
			})
			Expect(err).NotTo(HaveOccurred(),
				"sending frame %d failed", i)
		}
		Expect(stream.CloseSend()).To(Succeed())

		var rxFrames int
		var rxBytes int
		for {
			resp, err := stream.Recv()
			if err == io.EOF {
				break
			}
			Expect(err).NotTo(HaveOccurred())
			if pcm := resp.GetPcm(); len(pcm) > 0 {
				rxFrames++
				rxBytes += len(pcm)
			}
		}
		Expect(rxFrames).To(BeNumerically(">=", nFrames),
			"AudioTransformStream returned %d frames for %d sent", rxFrames, nFrames)
		GinkgoWriter.Printf("AudioTransformStream: rx %d frames, %d bytes\n", rxFrames, rxBytes)
	})
})

// extractImage runs `docker create` + `docker export` to materialise the image
// rootfs into dest. Using export (not save) avoids dealing with layer tarballs.
func extractImage(image, dest string) {
	GinkgoHelper()
	// The backend images have no default ENTRYPOINT/CMD, so docker create fails
	// unless we override one; run.sh is harmless and guaranteed to exist.
	create := exec.Command("docker", "create", "--entrypoint=/run.sh", image)
	out, err := create.CombinedOutput()
	Expect(err).NotTo(HaveOccurred(), "docker create failed: %s", string(out))
	cid := strings.TrimSpace(string(out))
	DeferCleanup(func() {
		_ = exec.Command("docker", "rm", "-f", cid).Run()
	})

	// Pipe `docker export <cid>` into `tar -xf - -C dest`.
	exp := exec.Command("docker", "export", cid)
	expOut, err := exp.StdoutPipe()
	Expect(err).NotTo(HaveOccurred())
	exp.Stderr = GinkgoWriter
	Expect(exp.Start()).To(Succeed())

	tar := exec.Command("tar", "-xf", "-", "-C", dest)
	tar.Stdin = expOut
	tar.Stderr = GinkgoWriter
	Expect(tar.Run()).To(Succeed())
	Expect(exp.Wait()).To(Succeed())
}

// downloadFile fetches url into dest using curl -L. Used for CI convenience;
// local runs can use BACKEND_TEST_MODEL_FILE to skip downloading.
// Retry flags guard against transient CI network hiccups (github.com in
// particular has been flaky from GHA runners, timing out TCP connects).
func downloadFile(url, dest string) {
	GinkgoHelper()
	cmd := exec.Command("curl", "-sSfL",
		"--connect-timeout", "30",
		"--max-time", "600",
		"--retry", "5",
		"--retry-delay", "5",
		"--retry-all-errors",
		"-o", dest, url)
	cmd.Stdout = GinkgoWriter
	cmd.Stderr = GinkgoWriter
	Expect(cmd.Run()).To(Succeed(), "failed to download %s", url)
	fi, err := os.Stat(dest)
	Expect(err).NotTo(HaveOccurred())
	Expect(fi.Size()).To(BeNumerically(">", 1024), "downloaded file is suspiciously small")
}

// envInt32 reads name from the environment, returning def when the variable
// is unset or does not parse as an integer.
func envInt32(name string, def int32) int32 {
	raw := os.Getenv(name)
	if raw == "" {
		return def
	}
	var v int32
	if _, err := fmt.Sscanf(raw, "%d", &v); err != nil {
		return def
	}
	return v
}

// envFloat32 reads name from the environment, returning def when the variable
// is unset or does not parse as a float.
func envFloat32(name string, def float32) float32 {
	raw := os.Getenv(name)
	if raw == "" {
		return def
	}
	var v float32
	if _, err := fmt.Sscanf(raw, "%f", &v); err != nil {
		return def
	}
	return v
}

// resolveFaceFixture returns the local path of a face-fixture image,
// preferring BACKEND_TEST_<prefix>_FILE when set and otherwise
// downloading BACKEND_TEST_<prefix>_URL into workDir. Returns an empty
// string when neither is configured — specs that need it should skip.
func resolveFaceFixture(workDir, prefix, defaultName string) string {
	if path := os.Getenv(prefix + "_FILE"); path != "" {
		return path
	}
	url := os.Getenv(prefix + "_URL")
	if url == "" {
		return ""
	}
	dest := filepath.Join(workDir, defaultName)
	downloadFile(url, dest)
	return dest
}

// base64File reads a file and returns its base64 encoding.
func base64File(path string) string {
	GinkgoHelper()
	data, err := os.ReadFile(path)
	Expect(err).NotTo(HaveOccurred(), "reading %s", path)
	return base64.StdEncoding.EncodeToString(data)
}

// keys returns the names of the entries in m whose value is true,
// i.e. the enabled capabilities.
func keys(m map[string]bool) []string {
	out := make([]string, 0, len(m))
	for k, v := range m {
		if v {
			out = append(out, k)
		}
	}
	return out
}