Compare commits

..

1 Commits

Author SHA1 Message Date
dependabot[bot]
e90d2f42f2 chore(deps): update transformers requirement
Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v5.9.0...v5.10.2)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.10.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-08 18:33:54 +00:00
617 changed files with 7658 additions and 48109 deletions

View File

@@ -198,27 +198,6 @@ docker-build-backends: ... docker-build-<backend-name>
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
## Documenting the backend (README + docs)
A backend is not "added" until it is discoverable. Update the user-facing docs:
- **`docs/content/features/backends.md`** - add the backend to the right
category in the "LocalAI supports various types of backends" list (and add a
new category if it introduces a new modality, e.g. sound classification).
- If the backend introduces a **new API surface** (a new endpoint or a realtime
capability), document it under `docs/content/` where its area lives (audio,
vision, etc.) and follow the api-endpoints checklist in
[api-endpoints-and-auth.md](api-endpoints-and-auth.md).
**If the backend is a native C/C++/GGML engine created and maintained by the
LocalAI team** (a from-scratch port like `parakeet.cpp`, `ced.cpp`,
`vibevoice.cpp`, `rf-detr.cpp`, not a wrapper around a third-party runtime), it
ALSO belongs in the top-level **`README.md`** table under "native C/C++/GGML
engines ... developed and maintained by the LocalAI project itself". Add a row
linking the upstream engine repo with a one-line description. This is the
project's showcase of its own engines; a new in-house backend that is missing
from it is a documentation bug.
## 5. Verification Checklist
After adding a new backend, verify:
@@ -232,8 +211,6 @@ After adding a new backend, verify:
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
- [ ] Documented: added to the category list in `docs/content/features/backends.md` (and any new endpoint/realtime capability documented under `docs/content/`)
- [ ] If it is an in-house native C/C++/GGML engine, added to the maintained-engines table in the top-level `README.md`
## Bundling runtime shared libraries (`package.sh`)

View File

@@ -44,39 +44,6 @@ maps to `DS4_THINK_HIGH`. We pass the chosen mode to `ds4_chat_append_assistant_
via `ModelOptions.Options[] = "kv_cache_dir:/some/path"`. Format is **our own** -
NOT bit-compatible with ds4-server's KVC files (interop is a follow-up plan).
## Engine options (LoadModel)
`LoadModel` maps `ModelOptions.Options[]` (`"key:value"`, from model-YAML
`options:`) onto `ds4_engine_options` through a **declarative table**
(`kEngineOptSpecs` + `apply_engine_option` in `grpc-server.cpp`). The struct is
plain C with no reflection, so the field set is enumerated once in the table;
adding a future engine knob is a one-line table row, not a new branch. Unknown
keys are ignored (back-compat). A bare flag (`ssd_streaming` with no value)
means `true`. Path-type values (`mtp_path`, `expert_profile_path`,
`directional_steering_file`) resolve **relative to the model directory**, so a
gallery entry can reference a companion file it downloaded by bare filename;
absolute values pass through. `ds4_role` / `ds4_layers` / `ds4_listen` /
`ds4_route_timeout` / `kv_cache_dir` keep their dedicated handling (validation
+ coordinator wiring) and are not in the table.
Wired keys: `mtp_path`, `mtp_draft`, `mtp_margin`, `prefill_chunk`,
`power_percent`, `warm_weights`, `quality`, `ssd_streaming`,
`ssd_streaming_cold`, `ssd_streaming_preload_experts`,
`ssd_streaming_cache_experts` (count or `NGB`, sets both experts+bytes via
`ds4_parse_streaming_cache_experts_arg`), `simulate_used_memory` (`NGB` via
`ds4_parse_gib_arg`), `expert_profile_path`, `directional_steering_file`,
`directional_steering_attn`, `directional_steering_ffn`.
## SSD streaming (running models larger than RAM)
ds4's **SSD streaming** keeps non-routed weights resident and streams routed MoE
experts from the GGUF on cache misses, turning "does it fit in RAM" into a speed
spectrum. **Metal (Darwin) only** - it is a no-op on CUDA/CPU. Enable with
`options: ["ssd_streaming"]`; size the routed-expert cache with
`ssd_streaming_cache_experts:NGB` (omit for ds4's automatic 80%-of-working-set
budget). Gallery entries built on this: `deepseek-v4-flash-q4-ssd` (153 GB Flash
on a 128 GB Mac) and `deepseek-v4-pro-q2-ssd` (433 GB Pro, experimental).
## Build matrix
| Build | Where | Notes |

View File

@@ -70,12 +70,6 @@ if [ "${BUILD_TYPE:-}" = "vulkan" ] && [ "${SKIP_DRIVERS:-false}" = "false" ]; t
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe + Arm SoC) and their ICD
# manifests. The LunarG SDK below only provides the loader and shader
# tooling, not hardware drivers — without Mesa the packaged Vulkan backend
# would ship a loader that finds no GPU. package-gpu-libs.sh bundles these
# .so files plus their deps into the backend so it stays self-contained.
apt-get install -y mesa-vulkan-drivers libdrm2
if [ "amd64" = "${TARGETARCH:-}" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz"
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz

View File

@@ -31,15 +31,6 @@ backend/python/**/source
backend/cpp/llama-cpp/llama.cpp
backend/cpp/llama-cpp-*-build
# privacy-filter: same in-place pattern. The Makefile fetches privacy-filter.cpp
# at the pinned commit (or symlinks a PRIVACY_FILTER_SRC checkout for local dev).
# A stale dir/symlink COPY'd into the image makes the clone step fail (dangling
# symlink) or compile against the wrong commit, so keep host build state out.
backend/cpp/privacy-filter/privacy-filter.cpp
backend/cpp/privacy-filter/build
backend/cpp/privacy-filter/grpc-server
backend/cpp/privacy-filter/package
# Rust backend build output (sources are tracked; target/ is generated)
backend/rust/*/target

View File

File diff suppressed because it is too large Load Diff

View File

@@ -98,7 +98,6 @@ jobs:
/opt/homebrew/Cellar/hiredis
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
@@ -110,10 +109,7 @@ jobs:
# Without explicitly installing them, a brew cache-hit run restores
# ccache's Cellar dir but skips installing those transitive deps,
# and ccache fails at runtime with `dyld: Library not loaded`.
# nlohmann-json is header-only and required by the ds4 backend
# (dsml_renderer.cpp includes <nlohmann/json.hpp>); on Linux it comes
# from the apt-installed nlohmann-json3-dev in the build image.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd
# Force-reinstall ccache so brew re-validates its full runtime-dep
# closure on every run. This is the durable fix: when the upstream
# ccache formula gains a new transitive dep (as it has multiple times
@@ -132,7 +128,7 @@ jobs:
# and decides "already installed" without re-linking, so on a cache-
# hit run the formulas aren't on PATH. Force-link them; --overwrite
# tolerates pre-existing symlinks from earlier installs.
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json 2>/dev/null || true
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd 2>/dev/null || true
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
@@ -152,7 +148,6 @@ jobs:
/opt/homebrew/Cellar/hiredis
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----

View File

@@ -26,10 +26,6 @@ jobs:
variable: "DS4_VERSION"
branch: "main"
file: "backend/cpp/ds4/Makefile"
- repository: "localai-org/privacy-filter.cpp"
variable: "PRIVACY_FILTER_VERSION"
branch: "master"
file: "backend/cpp/privacy-filter/Makefile"
- repository: "ggml-org/whisper.cpp"
variable: "WHISPER_CPP_VERSION"
branch: "master"
@@ -42,22 +38,6 @@ jobs:
variable: "PARAKEET_VERSION"
branch: "master"
file: "backend/go/parakeet-cpp/Makefile"
- repository: "mudler/ced.cpp"
variable: "CED_VERSION"
branch: "master"
file: "backend/go/ced/Makefile"
- repository: "mudler/voice-detect.cpp"
variable: "VOICEDETECT_VERSION"
branch: "master"
file: "backend/go/voice-detect/Makefile"
- repository: "mudler/face-detect.cpp"
variable: "FACEDETECT_VERSION"
branch: "master"
file: "backend/go/face-detect/Makefile"
- repository: "mudler/depth-anything.cpp"
variable: "DEPTHANYTHING_VERSION"
branch: "master"
file: "backend/go/depth-anything-cpp/Makefile"
- repository: "leejet/stable-diffusion.cpp"
variable: "STABLEDIFFUSION_GGML_VERSION"
branch: "master"
@@ -82,18 +62,10 @@ jobs:
variable: "RFDETR_VERSION"
branch: "main"
file: "backend/go/rfdetr-cpp/Makefile"
- repository: "mudler/locate-anything.cpp"
variable: "LOCATEANYTHING_VERSION"
branch: "master"
file: "backend/go/locate-anything-cpp/Makefile"
- repository: "ServeurpersoCom/qwentts.cpp"
- repository: "predict-woo/qwen3-tts.cpp"
variable: "QWEN3TTS_CPP_VERSION"
branch: "master"
branch: "main"
file: "backend/go/qwen3-tts-cpp/Makefile"
- repository: "ServeurpersoCom/omnivoice.cpp"
variable: "OMNIVOICE_VERSION"
branch: "master"
file: "backend/go/omnivoice-cpp/Makefile"
- repository: "localai-org/vibevoice.cpp"
variable: "VIBEVOICE_CPP_VERSION"
branch: "master"

View File

@@ -21,10 +21,7 @@ jobs:
uses: securego/gosec@v2.27.1
with:
# we let the report trigger content trigger a failure using the GitHub Security features.
# backend/go/supertonic is excluded: it vendors upstream supertone-inc/supertonic
# (helper.go), whose findings (G304 model-file loads, G404 math/rand for flow-matching
# noise, G104 unhandled errors) are inherent to that upstream code, not ours to rewrite.
args: '-no-fail -exclude-dir=backend/go/supertonic -fmt sarif -out results.sarif ./...'
args: '-no-fail -fmt sarif -out results.sarif ./...'
- name: Upload SARIF file
if: ${{ github.actor != 'dependabot[bot]' }}
uses: github/codeql-action/upload-sarif@v4

View File

@@ -38,7 +38,6 @@ jobs:
acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
rfdetr-cpp: ${{ steps.detect.outputs.rfdetr-cpp }}
locate-anything-cpp: ${{ steps.detect.outputs.locate-anything-cpp }}
vibevoice-cpp: ${{ steps.detect.outputs.vibevoice-cpp }}
localvqe: ${{ steps.detect.outputs.localvqe }}
voxtral: ${{ steps.detect.outputs.voxtral }}
@@ -564,7 +563,7 @@ jobs:
- name: Run e2e-backends smoke
env:
BACKEND_IMAGE: quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp
BACKEND_TEST_CAPS: health,load,predict,stream,logprobs,logit_bias,tokenize
BACKEND_TEST_CAPS: health,load,predict,stream,logprobs,logit_bias
run: |
make test-extra-backend
# Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked LLM.
@@ -902,45 +901,6 @@ jobs:
- name: Test rfdetr-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/rfdetr-cpp test
# Per-backend e2e for locate-anything-cpp: builds the .so + Go binary and
# runs `make -C backend/go/locate-anything-cpp test`. test.sh fetches the
# locate-anything-q8_0 GGUF (~6.3 GB, NVIDIA LocateAnything-3B) from the
# published mudler/locate-anything.cpp-gguf HF repo + a COCO image, then the
# Go wire test loads the model and runs an open-vocabulary Detect, asserting
# at least one labeled box. Heavier than the other Go backends (it is a 3B),
# so it is gated to changes under backend/go/locate-anything-cpp/.
tests-locate-anything-cpp:
needs: detect-changes
if: needs.detect-changes.outputs.locate-anything-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev
- name: Setup Go
uses: actions/setup-go@v5
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build locate-anything-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/locate-anything-cpp
- name: Test locate-anything-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/locate-anything-cpp test
# Per-backend smoke for vibevoice-cpp: builds the .so + Go binary and
# runs `make -C backend/go/vibevoice-cpp test`. test.sh auto-downloads
# the published mudler/vibevoice.cpp-models bundle (TTS Q8_0 + ASR Q4_K

View File

@@ -74,8 +74,6 @@ linters:
paths:
# Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
- 'backend/go/whisper/sources'
# Vendored upstream supertonic pipeline (supertone-inc/supertonic go/helper.go).
- 'backend/go/supertonic/helper.go'
- 'docs/'
rules:
# CLI entry points: kong's `env:"..."` tag is the legitimate env→struct

View File

@@ -108,7 +108,6 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
cuda-nvrtc-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio
GOCMD=go
GOTEST=$(GOCMD) test
@@ -180,7 +180,7 @@ osx-signed: build
## Run
run: ## run local-ai
CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) run ./cmd/local-ai
CGO_LDFLAGS="$(CGO_LDFLAGS)" $(GOCMD) run ./
prepare-test: protogen-go build-mock-backend
@@ -566,7 +566,6 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/speaker-recognition
$(MAKE) -C backend/rust/kokoros kokoros-grpc
$(MAKE) -C backend/go/rfdetr-cpp
$(MAKE) -C backend/go/locate-anything-cpp
test-extra: prepare-test-extra
$(MAKE) -C backend/python/transformers test
@@ -594,9 +593,6 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/python/speaker-recognition test
$(MAKE) -C backend/rust/kokoros test
$(MAKE) -C backend/go/rfdetr-cpp test
$(MAKE) -C backend/go/locate-anything-cpp test
$(MAKE) -C backend/go/depth-anything-cpp test
$(MAKE) -C backend/go/supertonic test
##
## End-to-end gRPC tests that exercise a built backend container image.
@@ -1164,10 +1160,6 @@ BACKEND_TURBOQUANT = turboquant|turboquant|.|false|false
# Single-model; hardware-only validation lives at tests/e2e-backends/
# (BACKEND_BINARY mode); see docs/superpowers/plans/2026-05-11-ds4-backend.md.
BACKEND_DS4 = ds4|ds4|.|false|false
# privacy-filter wraps the standalone privacy-filter.cpp GGML engine (the
# openai-privacy-filter PII/NER token classifier) — the TokenClassify RPC for
# the PII redactor tier, on stock ggml with no llama.cpp carry-patches.
BACKEND_PRIVACY_FILTER = privacy-filter|privacy-filter|.|false|false
# Golang backends
BACKEND_PIPER = piper|golang|.|false|true
@@ -1179,16 +1171,13 @@ BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|tr
BACKEND_WHISPER = whisper|golang|.|false|true
BACKEND_CRISPASR = crispasr|golang|.|false|true
BACKEND_PARAKEET_CPP = parakeet-cpp|golang|.|false|true
BACKEND_DEPTH_ANYTHING_CPP = depth-anything-cpp|golang|.|false|true
BACKEND_VOXTRAL = voxtral|golang|.|false|true
BACKEND_ACESTEP_CPP = acestep-cpp|golang|.|false|true
BACKEND_QWEN3_TTS_CPP = qwen3-tts-cpp|golang|.|false|true
BACKEND_OMNIVOICE_CPP = omnivoice-cpp|golang|.|false|true
BACKEND_VIBEVOICE_CPP = vibevoice-cpp|golang|.|false|true
BACKEND_LOCALVQE = localvqe|golang|.|false|true
BACKEND_OPUS = opus|golang|.|false|true
BACKEND_SHERPA_ONNX = sherpa-onnx|golang|.|false|true
BACKEND_SUPERTONIC = supertonic|golang|.|false|true
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
@@ -1262,7 +1251,6 @@ $(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_IK_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_TURBOQUANT)))
$(eval $(call generate-docker-build-target,$(BACKEND_DS4)))
$(eval $(call generate-docker-build-target,$(BACKEND_PRIVACY_FILTER)))
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
$(eval $(call generate-docker-build-target,$(BACKEND_CLOUD_PROXY)))
@@ -1272,7 +1260,6 @@ $(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_CRISPASR)))
$(eval $(call generate-docker-build-target,$(BACKEND_PARAKEET_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_DEPTH_ANYTHING_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXTRAL)))
$(eval $(call generate-docker-build-target,$(BACKEND_OPUS)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
@@ -1305,7 +1292,6 @@ $(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACESTEP_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_QWEN3_TTS_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_OMNIVOICE_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCALVQE)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX)))
@@ -1318,13 +1304,12 @@ $(eval $(call generate-docker-build-target,$(BACKEND_KOKOROS)))
$(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_RFDETR_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
$(eval $(call generate-docker-build-target,$(BACKEND_SUPERTONIC)))
# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-ds4 docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-crispasr docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-liquid-audio docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-rfdetr-cpp docker-build-qwen3-tts-cpp docker-build-omnivoice-cpp docker-build-vibevoice-cpp docker-build-localvqe docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx docker-build-cloud-proxy docker-build-supertonic docker-build-depth-anything-cpp docker-build-privacy-filter
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-ds4 docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-crispasr docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-liquid-audio docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-rfdetr-cpp docker-build-qwen3-tts-cpp docker-build-vibevoice-cpp docker-build-localvqe docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx docker-build-cloud-proxy
########################################################
### Mock Backend for E2E Tests

View File

@@ -29,18 +29,6 @@
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
<!-- Keep these links, translations synced daily. -->
<p align="center">
<a href="https://zdoc.app/de/mudler/LocalAI">Deutsch</a> |
<a href="https://zdoc.app/es/mudler/LocalAI">Español</a> |
<a href="https://zdoc.app/fr/mudler/LocalAI">français</a> |
<a href="https://zdoc.app/ja/mudler/LocalAI">日本語</a> |
<a href="https://zdoc.app/ko/mudler/LocalAI">한국어</a> |
<a href="https://zdoc.app/pt/mudler/LocalAI">Português</a> |
<a href="https://zdoc.app/ru/mudler/LocalAI">Русский</a> |
<a href="https://zdoc.app/zh/mudler/LocalAI">中文</a>
</p>
**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
**A small core, not a bundle.** Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.
@@ -161,26 +149,12 @@ local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
local-ai run oci://localai/phi-2:latest
```
To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, `/models` lists installed models and `/model <name>` switches between them.
```bash
# Terminal 1
local-ai run llama-3.2-1b-instruct:q4_k_m
# Terminal 2
local-ai chat --model llama-3.2-1b-instruct:q4_k_m
```
> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).
For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).
## Latest News
- **June 2026**: New [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus [streaming of the realtime LLM / TTS / transcription pipeline stages](https://github.com/mudler/LocalAI/pull/10176) and [configurable WebRTC ICE candidates](https://github.com/mudler/LocalAI/pull/10231).
- **June 2026**: Big speech push: the [parakeet.cpp](https://github.com/mudler/parakeet.cpp) ASR engine gains [NeMo-faithful segment timestamps](https://github.com/mudler/LocalAI/pull/10207), a [multilingual streaming Nemotron-3.5 model](https://github.com/mudler/LocalAI/pull/10199), [dynamic batching for concurrent transcription](https://github.com/mudler/LocalAI/pull/10112) and [CUDA graphs](https://github.com/mudler/LocalAI/pull/10273); the new [CrispASR backend](https://github.com/mudler/LocalAI/pull/10099) adds multi-architecture ASR + TTS, and [60 Piper TTS voices across 42 languages](https://github.com/mudler/LocalAI/pull/10296) land in the gallery (plus [per-request TTS instructions and params](https://github.com/mudler/LocalAI/pull/10172)).
- **June 2026**: New backends and models: [locate-anything.cpp](https://github.com/mudler/LocalAI/pull/10264) for open-vocabulary object detection via ggml, [Ideogram4 image generation](https://github.com/mudler/LocalAI/pull/10201) in stablediffusion-ggml, [llama.cpp video input](https://github.com/mudler/LocalAI/pull/10216), and the [Gemma 4 QAT family with MTP speculative-decoding pairs](https://github.com/mudler/LocalAI/pull/10215). Plus an [interactive CLI chat mode](https://github.com/mudler/LocalAI/pull/10226) and [RAG source citations in agent responses](https://github.com/mudler/LocalAI/pull/10228).
- **June 2026**: Distributed mode hardening: [prefix-cache-aware routing](https://github.com/mudler/LocalAI/pull/10071), a [production-ready request router with auto-sized embedding/rerank batches](https://github.com/mudler/LocalAI/pull/10104), [ds4 layer-split distributed inference](https://github.com/mudler/LocalAI/pull/10098), [NATS JWT auth + TLS/mTLS](https://github.com/mudler/LocalAI/pull/10159), and [resumable file uploads](https://github.com/mudler/LocalAI/pull/10109).
- **May 2026**: **LocalAI 4.3.0** - `llama.cpp` [prompt cache on by default](https://github.com/mudler/LocalAI/pull/9925) (repeated system prompts collapse from minutes to seconds), [keyless cosign signing of backend OCI images](https://github.com/mudler/LocalAI/pull/9823), [per-API-key + per-user usage attribution](https://github.com/mudler/LocalAI/pull/9920), Distributed v3 with [per-request replica routing](https://github.com/mudler/LocalAI/pull/9968). [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.3.0)
- **May 2026**: **LocalAI 4.2.0** - LocalAI sees and hears: [voice recognition](https://github.com/mudler/LocalAI/pull/9500), [face recognition + antispoofing liveness](https://github.com/mudler/LocalAI/pull/9480), speaker diarization. Plus [drop-in Ollama API](https://github.com/mudler/LocalAI/pull/9284), [video generation](https://github.com/mudler/LocalAI/pull/9420), redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.2.0)
- **April 2026**: **LocalAI 4.1.0** - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.1.0)
@@ -220,29 +194,10 @@ For older news and full release notes, see [GitHub Releases](https://github.com/
## Supported Backends & Acceleration
LocalAI supports **60+ backends** including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
LocalAI supports **36+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
### Backends built by us
Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:
| Backend | What it does |
|---------|-------------|
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
| [ced.cpp](https://github.com/mudler/ced.cpp) | C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition |
| [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C |
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation |
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) |
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose estimation |
| [privacy-filter.cpp](https://github.com/localai-org/privacy-filter.cpp) | Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier |
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
## Resources
- [Documentation](https://localai.io/)
@@ -252,7 +207,7 @@ We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-
- [Integrations & community projects](https://localai.io/docs/integrations/)
- [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
- [Examples](https://github.com/mudler/LocalAI-examples) — including the [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (Go client for the Realtime API with tool calling)
- [Examples](https://github.com/mudler/LocalAI-examples)
## Team

View File

@@ -65,12 +65,7 @@ RUN <<EOT bash
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils && \
apt-get install -y mesa-vulkan-drivers libdrm2
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe) + their manifests. The
# LunarG SDK below only provides the loader and shader tooling, not
# hardware drivers — without Mesa, package-gpu-libs.sh has no ICD to
# bundle and the packaged backend finds no GPU at runtime.
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
@@ -211,16 +206,6 @@ RUN if [ "${BACKEND}" = "opus" ]; then \
apt-get clean && rm -rf /var/lib/apt/lists/*; \
fi
# CrispASR's piper TTS backend dlopens libespeak-ng at runtime to phonemize
# non-English text (the MIT-clean path; English uses a built-in G2P). Install
# the espeak-ng runtime + its libpcaudio/libsonic deps + voice data so
# package.sh can bundle them into the FROM scratch image.
RUN if [ "${BACKEND}" = "crispasr" ]; then \
apt-get update && apt-get install -y --no-install-recommends \
espeak-ng-data libespeak-ng1 libpcaudio0 libsonic0 && \
apt-get clean && rm -rf /var/lib/apt/lists/*; \
fi
COPY . /LocalAI
RUN git config --global --add safe.directory /LocalAI

View File

@@ -1,109 +0,0 @@
ARG BASE_IMAGE=ubuntu:24.04
# BUILDER_BASE_IMAGE defaults to BASE_IMAGE so the Dockerfile parses when no
# prebuilt base is supplied; the builder-prebuilt stage is only entered when
# BUILDER_TARGET=builder-prebuilt, so the fallback content is harmless
# (BuildKit prunes the unreferenced builder).
ARG BUILDER_BASE_IMAGE=${BASE_IMAGE}
# BUILDER_TARGET selects which builder stage the scratch image copies from.
# Declared before any FROM so it is usable in `FROM ${BUILDER_TARGET}`. The
# backend_build workflow sets it to builder-prebuilt when the matrix entry
# provides builder-base-image, else builder-fromsource (the local default).
ARG BUILDER_TARGET=builder-fromsource
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
# privacy-filter: standalone GGML engine for the openai-privacy-filter PII/NER
# token classifier, wrapped as a LocalAI gRPC backend.
#
# Mirrors backend/Dockerfile.llama-cpp: the build toolchain (gRPC + cmake +
# protoc + conditional CUDA/Vulkan) comes from the shared
# .docker/install-base-deps.sh (from-source path) or a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (CI path) — nothing GPU-specific
# is hand-rolled here. BUILD_TYPE selects the engine backend in the Makefile:
# "" = cpu, "cublas" -> -DPF_CUDA=ON, "vulkan" -> -DPF_VULKAN=ON.
# ============================================================================
# Stage: builder-fromsource — self-contained build. Runs the same install
# script backend/Dockerfile.base-grpc-builder runs, so this path is
# bit-equivalent to the prebuilt base. Used when BUILDER_TARGET=builder-fromsource
# (the default; local `make backends/privacy-filter`).
# ============================================================================
FROM ${BASE_IMAGE} AS builder-fromsource
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG CMAKE_FROM_SOURCE=false
# CUDA Toolkit 13.x needs CMake 3.31.9+ for correct toolchain/arch detection.
ARG CMAKE_VERSION=3.31.10
ARG GRPC_VERSION=v1.65.0
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG SKIP_DRIVERS=false
ARG TARGETARCH
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
ENV BUILD_TYPE=${BUILD_TYPE} \
CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} \
CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
CMAKE_FROM_SOURCE=${CMAKE_FROM_SOURCE} \
CMAKE_VERSION=${CMAKE_VERSION} \
GRPC_VERSION=${GRPC_VERSION} \
GRPC_MAKEFLAGS=${GRPC_MAKEFLAGS} \
SKIP_DRIVERS=${SKIP_DRIVERS} \
TARGETARCH=${TARGETARCH} \
UBUNTU_VERSION=${UBUNTU_VERSION} \
APT_MIRROR=${APT_MIRROR} \
APT_PORTS_MIRROR=${APT_PORTS_MIRROR} \
DEBIAN_FRONTEND=noninteractive
# CUDA on PATH (a no-op when CUDA is not installed, e.g. cpu/vulkan builds).
ENV PATH=/usr/local/cuda/bin:${PATH}
WORKDIR /build
# apt deps + cmake + protoc + gRPC + conditional CUDA/Vulkan, all from the
# shared script (the source of truth that base-grpc-builder also runs).
RUN --mount=type=bind,source=.docker/install-base-deps.sh,target=/usr/local/sbin/install-base-deps \
--mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
bash /usr/local/sbin/install-base-deps
# install-base-deps installs gRPC under /opt/grpc; copy it to /usr/local so the
# backend's find_package(gRPC CONFIG) resolves it at the canonical prefix.
RUN cp -a /opt/grpc/. /usr/local/
COPY . /LocalAI
RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package
# ============================================================================
# Stage: builder-prebuilt — FROM a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (gRPC at /opt/grpc + apt deps +
# CUDA/Vulkan already installed). Used in CI when the matrix entry sets
# builder-base-image.
# ============================================================================
FROM ${BUILDER_BASE_IMAGE} AS builder-prebuilt
ARG BUILD_TYPE
ARG TARGETARCH
ENV BUILD_TYPE=${BUILD_TYPE}
# CUDA on PATH (a no-op for the cpu/vulkan base images).
ENV PATH=/usr/local/cuda/bin:${PATH}
# Mirror builder-fromsource: the base-grpc image installs gRPC to /opt/grpc but
# does not copy it to /usr/local.
RUN cp -a /opt/grpc/. /usr/local/
COPY . /LocalAI
RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package
# ============================================================================
# Final stage — copy the package output from the selected builder. BuildKit
# does not expand variables in `COPY --from=`, so alias the chosen builder to a
# fixed stage name first.
# ============================================================================
FROM ${BUILDER_TARGET} AS builder
FROM scratch
COPY --from=builder /LocalAI/backend/cpp/privacy-filter/package/. ./

View File

@@ -66,12 +66,7 @@ RUN <<EOT bash
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils && \
apt-get install -y mesa-vulkan-drivers libdrm2
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe) + their manifests. The
# LunarG SDK below only provides the loader and shader tooling, not
# hardware drivers — without Mesa, package-gpu-libs.sh has no ICD to
# bundle and the packaged backend finds no GPU at runtime.
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
@@ -131,7 +126,6 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
cuda-nvrtc-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \

View File

@@ -24,10 +24,6 @@ service Backend {
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
rpc Detect(DetectOptions) returns (DetectResponse) {}
// SoundDetection runs an audio-tagging / sound-event-classification model
// (e.g. CED over the AudioSet ontology) on a clip and returns scored labels.
rpc SoundDetection(SoundDetectionRequest) returns (SoundDetectionResponse) {}
rpc Depth(DepthRequest) returns (DepthResponse) {}
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
rpc VoiceVerify(VoiceVerifyRequest) returns (VoiceVerifyResponse) {}
@@ -674,53 +670,6 @@ message DetectResponse {
repeated Detection Detections = 1;
}
// --- Sound-event classification / audio tagging messages (CED) ---
message SoundDetectionRequest {
string src = 1; // audio file path (LocalAI writes the upload to disk)
int32 top_k = 2; // number of top tags to return (0 = all classes)
float threshold = 3; // optional: drop tags scoring below this
}
message SoundClass {
string label = 1; // AudioSet class name, e.g. "Baby cry, infant cry"
float score = 2; // per-class probability (multi-label, independent)
int32 index = 3; // class index in the model ontology
}
message SoundDetectionResponse {
repeated SoundClass detections = 1; // score-descending
}
// --- Depth estimation messages (Depth Anything 3) ---
message DepthRequest {
string src = 1; // input image (filesystem path or base64-encoded payload)
string dst = 2; // optional output directory for exports (glb/colmap)
bool include_depth = 3; // return the per-pixel metric depth map
bool include_confidence = 4; // return the per-pixel confidence map (DualDPT)
bool include_pose = 5; // return camera extrinsics/intrinsics (DualDPT)
bool include_sky = 6; // return the per-pixel sky map (mono models)
bool include_points = 7; // back-project to a 3D point cloud (DualDPT)
float points_conf_thresh = 8; // keep points with confidence >= this threshold
repeated string exports = 9; // requested exports: "glb", "colmap"
}
message DepthResponse {
int32 width = 1; // processed depth-map width
int32 height = 2; // processed depth-map height
repeated float depth = 3; // width*height row-major metric depth
repeated float confidence = 4; // width*height row-major confidence (DualDPT)
repeated float sky = 5; // width*height row-major sky map (mono)
repeated float extrinsics = 6; // 12 floats, 3x4 row-major (world-to-camera)
repeated float intrinsics = 7; // 9 floats, 3x3 row-major
int32 num_points = 8; // number of 3D points
repeated float points = 9; // num_points*3 xyz, world space
bytes point_colors = 10; // num_points*3 uint8 rgb
repeated string export_paths = 11; // paths written for the requested exports
bool is_metric = 12; // depth is in metric units
}
// --- Face recognition messages ---
message FacialArea {

View File

@@ -9,22 +9,6 @@ option(DS4_NATIVE "Compile with -march=native / -mcpu=native" ON)
set(DS4_GPU "cpu" CACHE STRING "GPU backend: cpu, cuda, or metal")
set(DS4_DIR "${CMAKE_CURRENT_SOURCE_DIR}/ds4" CACHE PATH "Path to cloned ds4 source")
if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
# Homebrew installs protobuf/grpc under a non-default prefix. The generated
# backend.pb.cc / backend.grpc.pb.cc pull in google/protobuf and grpcpp
# headers, but the hw_grpc_proto library links neither target, so on macOS
# the headers (e.g. google/protobuf/runtime_version.h) are never on the
# compiler's include path. Add the Homebrew prefix globally, matching the
# llama-cpp backend which builds on Darwin CI.
if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "arm64")
set(HOMEBREW_DEFAULT_PREFIX "/opt/homebrew")
else()
set(HOMEBREW_DEFAULT_PREFIX "/usr/local")
endif()
link_directories("${HOMEBREW_DEFAULT_PREFIX}/lib")
include_directories("${HOMEBREW_DEFAULT_PREFIX}/include")
endif()
find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)

View File

@@ -1,10 +1,10 @@
# ds4 backend Makefile.
#
# Upstream pin lives below as DS4_VERSION?=80ebbc396aee40eedc1d829222f3362d10fa4c6c
# Upstream pin lives below as DS4_VERSION?=c463029c205c2ec8d7ab6c0df4a3f52979091286
# (.github/bump_deps.sh) can find and update it - matches the
# llama-cpp / ik-llama-cpp / turboquant convention.
DS4_VERSION?=80ebbc396aee40eedc1d829222f3362d10fa4c6c
DS4_VERSION?=c463029c205c2ec8d7ab6c0df4a3f52979091286
DS4_REPO?=https://github.com/antirez/ds4
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

View File

@@ -25,8 +25,6 @@ extern "C" {
#include <chrono>
#include <climits>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
@@ -107,130 +105,6 @@ static bool parse_layers_spec(const std::string &spec, ds4_distributed_layers *o
return true;
}
// Parse a boolean LoadModel option. An empty value (a bare flag-style option
// like "ssd_streaming" with no colon) means true so model YAMLs can write
// options: ["ssd_streaming"] to enable a switch.
static bool parse_bool_option(const std::string &s, bool *out) {
if (s.empty() || s == "true" || s == "1" || s == "yes" || s == "on") { *out = true; return true; }
if (s == "false" || s == "0" || s == "no" || s == "off") { *out = false; return true; }
return false;
}
// Table-driven mapping from LoadModel option keys to ds4_engine_options fields.
// ds4_engine_options is a fixed C struct with no reflection, so the field set
// is enumerated once here; adding a future engine knob is a one-line table
// entry rather than a new branch in LoadModel. Two fields need ds4's own typed
// parsers (Gib, CacheExperts) so a plain string passthrough can't cover them.
enum class DsOptType { Bool, Int, Uint, Float, Str, Gib, CacheExperts };
struct DsOptSpec {
const char *key;
DsOptType type;
size_t off; // byte offset into ds4_engine_options
size_t off2; // second offset (CacheExperts writes experts + bytes)
bool is_path; // Str values: resolve a relative value against the model dir
};
static const DsOptSpec kEngineOptSpecs[] = {
{"mtp_path", DsOptType::Str, offsetof(ds4_engine_options, mtp_path), 0, true},
{"mtp_draft", DsOptType::Int, offsetof(ds4_engine_options, mtp_draft_tokens), 0},
{"mtp_margin", DsOptType::Float, offsetof(ds4_engine_options, mtp_margin), 0},
{"prefill_chunk", DsOptType::Uint, offsetof(ds4_engine_options, prefill_chunk), 0},
{"power_percent", DsOptType::Int, offsetof(ds4_engine_options, power_percent), 0},
{"warm_weights", DsOptType::Bool, offsetof(ds4_engine_options, warm_weights), 0},
{"quality", DsOptType::Bool, offsetof(ds4_engine_options, quality), 0},
{"ssd_streaming", DsOptType::Bool, offsetof(ds4_engine_options, ssd_streaming), 0},
{"ssd_streaming_cold", DsOptType::Bool, offsetof(ds4_engine_options, ssd_streaming_cold), 0},
{"ssd_streaming_preload_experts", DsOptType::Uint, offsetof(ds4_engine_options, ssd_streaming_preload_experts), 0},
{"ssd_streaming_cache_experts", DsOptType::CacheExperts, offsetof(ds4_engine_options, ssd_streaming_cache_experts),
offsetof(ds4_engine_options, ssd_streaming_cache_bytes)},
{"simulate_used_memory", DsOptType::Gib, offsetof(ds4_engine_options, simulate_used_memory_bytes), 0},
{"expert_profile_path", DsOptType::Str, offsetof(ds4_engine_options, expert_profile_path), 0, true},
{"directional_steering_file", DsOptType::Str, offsetof(ds4_engine_options, directional_steering_file), 0, true},
{"directional_steering_attn", DsOptType::Float, offsetof(ds4_engine_options, directional_steering_attn), 0},
{"directional_steering_ffn", DsOptType::Float, offsetof(ds4_engine_options, directional_steering_ffn), 0},
};
// Apply a single key:value LoadModel option to the engine options struct.
// Unknown keys are ignored (back-compat: callers pass mixed option sets).
// String values are copied into `storage`, whose elements the engine reads by
// pointer during ds4_engine_open; `storage` MUST have reserved capacity so
// push_back never reallocates and dangles an earlier c_str(). Returns false
// with `err` set when a recognized key has an invalid value.
static bool apply_engine_option(ds4_engine_options *opt, const std::string &key,
const std::string &val, const std::string &model_dir,
std::vector<std::string> &storage, std::string &err) {
const DsOptSpec *spec = nullptr;
for (const auto &s : kEngineOptSpecs) {
if (key == s.key) { spec = &s; break; }
}
if (!spec) return true; // unknown key: ignore
char *base = reinterpret_cast<char *>(opt);
switch (spec->type) {
case DsOptType::Bool: {
bool b = false;
if (!parse_bool_option(val, &b)) { err = key + " must be true/false"; return false; }
*reinterpret_cast<bool *>(base + spec->off) = b;
return true;
}
case DsOptType::Int: {
char *end = nullptr;
long v = std::strtol(val.c_str(), &end, 10);
if (val.empty() || !end || *end != '\0') { err = key + " must be an integer"; return false; }
*reinterpret_cast<int *>(base + spec->off) = static_cast<int>(v);
return true;
}
case DsOptType::Uint: {
char *end = nullptr;
long v = std::strtol(val.c_str(), &end, 10);
if (val.empty() || !end || *end != '\0' || v < 0 || v > static_cast<long>(UINT32_MAX)) {
err = key + " must be a non-negative integer"; return false;
}
*reinterpret_cast<uint32_t *>(base + spec->off) = static_cast<uint32_t>(v);
return true;
}
case DsOptType::Float: {
char *end = nullptr;
float f = std::strtof(val.c_str(), &end);
if (val.empty() || !end || *end != '\0') { err = key + " must be a number"; return false; }
*reinterpret_cast<float *>(base + spec->off) = f;
return true;
}
case DsOptType::Str: {
// Resolve a relative path option (e.g. mtp_path: a sibling GGUF the
// gallery downloaded next to the model) against the model directory, so
// YAMLs reference companion files by name. Absolute values pass through.
if (spec->is_path && !model_dir.empty() && !val.empty() && val.front() != '/') {
storage.push_back(model_dir + "/" + val);
} else {
storage.push_back(val);
}
*reinterpret_cast<const char **>(base + spec->off) = storage.back().c_str();
return true;
}
case DsOptType::Gib: {
uint64_t bytes = 0;
if (!ds4_parse_gib_arg(val.c_str(), &bytes)) {
err = key + " must be a GiB value, e.g. 64GB"; return false;
}
*reinterpret_cast<uint64_t *>(base + spec->off) = bytes;
return true;
}
case DsOptType::CacheExperts: {
uint32_t experts = 0;
uint64_t bytes = 0;
if (!ds4_parse_streaming_cache_experts_arg(val.c_str(), &experts, &bytes)) {
err = key + " must be a positive expert count or a <number>GB budget"; return false;
}
*reinterpret_cast<uint32_t *>(base + spec->off) = experts;
*reinterpret_cast<uint64_t *>(base + spec->off2) = bytes;
return true;
}
}
return true;
}
// When acting as a distributed coordinator, block until the worker route
// covers all layers (ds4_session_distributed_route_ready == 1) or the timeout
// elapses. Returns an empty string on success, or an error message to return
@@ -602,10 +476,39 @@ public:
return GStatus::OK;
}
std::string mtp_path;
int mtp_draft = 0;
float mtp_margin = 3.0f;
std::string ds4_role, ds4_layers, ds4_listen;
for (const auto &opt : request->options()) {
auto [k, v] = split_option(opt);
if (k == "mtp_path") mtp_path = v;
else if (k == "mtp_draft") mtp_draft = std::stoi(v);
else if (k == "mtp_margin") mtp_margin = std::stof(v);
else if (k == "kv_cache_dir") g_kv_cache_dir = v;
else if (k == "ds4_role") ds4_role = v;
else if (k == "ds4_layers") ds4_layers = v;
else if (k == "ds4_listen") ds4_listen = v;
else if (k == "ds4_route_timeout") {
if (!parse_positive_int(v, &g_route_timeout_sec)) {
result->set_success(false);
result->set_message("ds4: ds4_route_timeout must be a positive integer");
return GStatus::OK;
}
}
}
g_kv_cache.SetDir(g_kv_cache_dir);
ds4_engine_options opt = {};
opt.model_path = model_path.c_str();
opt.mtp_path = mtp_path.empty() ? nullptr : mtp_path.c_str();
opt.n_threads = request->threads() > 0 ? request->threads() : 0;
opt.mtp_margin = 3.0f; // ds4 default; overridable via the mtp_margin option
opt.mtp_draft_tokens = mtp_draft;
opt.mtp_margin = mtp_margin;
opt.directional_steering_file = nullptr;
opt.warm_weights = false;
opt.quality = false;
#if defined(DS4_NO_GPU)
opt.backend = DS4_BACKEND_CPU;
@@ -615,46 +518,6 @@ public:
opt.backend = DS4_BACKEND_CUDA;
#endif
// Stable storage for string-valued engine options. The engine reads
// these by pointer during ds4_engine_open, so the std::string backing
// store must outlive the call and not reallocate; reserve up front so
// push_back keeps every prior c_str() valid. Static + clear() reuses
// the buffer across LoadModel calls (the old engine is closed above).
static std::vector<std::string> s_opt_strings;
s_opt_strings.clear();
s_opt_strings.reserve(sizeof(kEngineOptSpecs) / sizeof(kEngineOptSpecs[0]));
// Directory of the main model, used to resolve relative path options.
std::string model_dir;
if (auto slash = model_path.find_last_of('/'); slash != std::string::npos) {
model_dir = model_path.substr(0, slash);
}
std::string ds4_role, ds4_layers, ds4_listen;
for (const auto &o : request->options()) {
auto [k, v] = split_option(o);
if (k == "kv_cache_dir") { g_kv_cache_dir = v; continue; }
else if (k == "ds4_role") { ds4_role = v; continue; }
else if (k == "ds4_layers") { ds4_layers = v; continue; }
else if (k == "ds4_listen") { ds4_listen = v; continue; }
else if (k == "ds4_route_timeout") {
if (!parse_positive_int(v, &g_route_timeout_sec)) {
result->set_success(false);
result->set_message("ds4: ds4_route_timeout must be a positive integer");
return GStatus::OK;
}
continue;
}
std::string err;
if (!apply_engine_option(&opt, k, v, model_dir, s_opt_strings, err)) {
result->set_success(false);
result->set_message("ds4: " + err);
return GStatus::OK;
}
}
g_kv_cache.SetDir(g_kv_cache_dir);
// Coordinator wiring. 'ds4_role:coordinator' enables layer-split
// distributed inference: this process listens on ds4_listen and owns
// the ds4_layers slice; workers dial in (see `local-ai worker

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
IK_LLAMA_VERSION?=6b9de3dbaa21ae95ea80638e5ee836795cc48c93
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62
LLAMA_VERSION?=9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -18,18 +18,6 @@
#if __has_include("server-chat.cpp")
#include "server-chat.cpp"
#endif
// server-schema.cpp exists only in llama.cpp after the upstream refactor that
// extracted the JSON request-schema evaluation (previously the static
// server_task::params_from_json_cmpl) into server_schema::eval_llama_cmpl_schema.
// server-context.cpp and grpc-server.cpp both call into it, so its definitions
// must be part of this translation unit or the link fails. __has_include keeps
// the source compatible with older pins/forks (e.g. llama-cpp-turboquant) that
// predate the split and still expose params_from_json_cmpl (see the guarded
// call sites below).
#if __has_include("server-schema.cpp")
#define LOCALAI_HAS_SERVER_SCHEMA 1
#include "server-schema.cpp"
#endif
#include "server-context.cpp"
// LocalAI
@@ -393,15 +381,6 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, const
});
}
// for each video in the request, add the video data
for (int i = 0; i < predict->videos_size(); i++) {
data["video_data"].push_back(json
{
{"id", i},
{"data", predict->videos(i)},
});
}
data["stop"] = predict->stopprompts();
// data["n_probs"] = predict->nprobs();
//TODO: images,
@@ -1524,7 +1503,7 @@ public:
msg_json["role"] = msg.role();
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
@@ -1575,16 +1554,6 @@ public:
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
@@ -1619,16 +1588,6 @@ public:
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
@@ -1934,27 +1893,25 @@ public:
body_json["min_p"] = data["min_p"];
}
// Forward the chat_template_kwargs the Go layer resolved (model config
// chat_template_kwargs + per-request metadata: enable_thinking,
// reasoning_effort, preserve_thinking, ...). One generic merge replaces
// the previous per-key handling - new template levers need no C++ change.
// oaicompat_chat_params_parse reads these from body_json.
// Pass enable_thinking via chat_template_kwargs (where oaicompat_chat_params_parse reads it)
const auto& metadata = request->metadata();
auto ctk_it = metadata.find("chat_template_kwargs");
if (ctk_it != metadata.end() && !ctk_it->second.empty()) {
try {
json ctk = json::parse(ctk_it->second);
if (ctk.is_object()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
for (auto& el : ctk.items()) {
body_json["chat_template_kwargs"][el.key()] = el.value();
}
}
} catch (const std::exception & e) {
SRV_WRN("failed to parse chat_template_kwargs metadata: %s\n", e.what());
auto et_it = metadata.find("enable_thinking");
if (et_it != metadata.end()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["enable_thinking"] = (et_it->second == "true");
}
// Pass reasoning_effort via chat_template_kwargs too: the lever
// jinja templates like gpt-oss (Harmony) / LFM2.5 read, distinct
// from enable_thinking which those templates ignore.
auto re_it = metadata.find("reasoning_effort");
if (re_it != metadata.end() && !re_it->second.empty()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["reasoning_effort"] = re_it->second;
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
@@ -2082,16 +2039,6 @@ public:
files.push_back(decoded_data);
}
}
const auto &video_data = data.find("video_data");
if (video_data != data.end() && video_data->is_array())
{
for (const auto &video : *video_data)
{
auto decoded_data = base64_decode(video["data"].get<std::string>());
files.push_back(decoded_data);
}
}
}
const bool has_mtmd = ctx_server.impl->mctx != nullptr;
@@ -2114,11 +2061,7 @@ public:
task.index = i;
task.tokens = std::move(inputs[i]);
#ifdef LOCALAI_HAS_SERVER_SCHEMA
task.params = server_schema::eval_llama_cmpl_schema(
#else
task.params = server_task::params_from_json_cmpl(
#endif
ctx_server.impl->vocab,
params_base,
ctx_server.get_meta().slot_n_ctx,
@@ -2132,7 +2075,7 @@ public:
// cannot detect tool calls or separate reasoning from content.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by eval_llama_cmpl_schema
// oaicompat_model is already populated by params_from_json_cmpl
tasks.push_back(std::move(task));
}
@@ -2348,7 +2291,7 @@ public:
}
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
@@ -2401,16 +2344,6 @@ public:
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
@@ -2450,16 +2383,6 @@ public:
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
SRV_INF("[CONTENT DEBUG] Predict: Message %d created content array with media\n", i);
} else if (!msg.tool_calls().empty()) {
@@ -2774,26 +2697,25 @@ public:
body_json["min_p"] = data["min_p"];
}
// Forward the chat_template_kwargs the Go layer resolved (model config
// chat_template_kwargs + per-request metadata: enable_thinking,
// reasoning_effort, preserve_thinking, ...). One generic merge replaces
// the previous per-key handling - new template levers need no C++ change.
// Pass enable_thinking via chat_template_kwargs (where oaicompat_chat_params_parse reads it)
const auto& predict_metadata = request->metadata();
auto predict_ctk_it = predict_metadata.find("chat_template_kwargs");
if (predict_ctk_it != predict_metadata.end() && !predict_ctk_it->second.empty()) {
try {
json ctk = json::parse(predict_ctk_it->second);
if (ctk.is_object()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
for (auto& el : ctk.items()) {
body_json["chat_template_kwargs"][el.key()] = el.value();
}
}
} catch (const std::exception & e) {
SRV_WRN("failed to parse chat_template_kwargs metadata: %s\n", e.what());
auto predict_et_it = predict_metadata.find("enable_thinking");
if (predict_et_it != predict_metadata.end()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["enable_thinking"] = (predict_et_it->second == "true");
}
// Pass reasoning_effort via chat_template_kwargs too: the lever
// jinja templates like gpt-oss (Harmony) / LFM2.5 read, distinct
// from enable_thinking which those templates ignore.
auto predict_re_it = predict_metadata.find("reasoning_effort");
if (predict_re_it != predict_metadata.end() && !predict_re_it->second.empty()) {
if (!body_json.contains("chat_template_kwargs")) {
body_json["chat_template_kwargs"] = json::object();
}
body_json["chat_template_kwargs"]["reasoning_effort"] = predict_re_it->second;
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
@@ -2923,16 +2845,6 @@ public:
files.push_back(decoded_data);
}
}
const auto &video_data = data.find("video_data");
if (video_data != data.end() && video_data->is_array())
{
for (const auto &video : *video_data)
{
auto decoded_data = base64_decode(video["data"].get<std::string>());
files.push_back(decoded_data);
}
}
}
// process files
@@ -2956,11 +2868,7 @@ public:
task.index = i;
task.tokens = std::move(inputs[i]);
#ifdef LOCALAI_HAS_SERVER_SCHEMA
task.params = server_schema::eval_llama_cmpl_schema(
#else
task.params = server_task::params_from_json_cmpl(
#endif
ctx_server.impl->vocab,
params_base,
ctx_server.get_meta().slot_n_ctx,
@@ -2972,7 +2880,7 @@ public:
// reasoning, tool calls, and content are classified into ChatDeltas.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by eval_llama_cmpl_schema
// oaicompat_model is already populated by params_from_json_cmpl
tasks.push_back(std::move(task));
}
@@ -3509,7 +3417,7 @@ public:
if (body.count("prompt") != 0) {
const bool add_special = json_value(body, "add_special", false);
llama_tokens tokens = tokenize_mixed(ctx_server.impl->vocab, body.at("prompt"), add_special, true);
llama_tokens tokens = tokenize_mixed(ctx_server.impl->vocab, body.at("content"), add_special, true);
for (const auto& token : tokens) {

View File

@@ -1,9 +0,0 @@
/privacy-filter.cpp
build/
package/
grpc-server
*.o
backend.pb.cc
backend.pb.h
backend.grpc.pb.cc
backend.grpc.pb.h

View File

@@ -1,69 +0,0 @@
cmake_minimum_required(VERSION 3.21)
project(privacy-filter-grpc-server LANGUAGES CXX C)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(TARGET grpc-server)
# Path to the privacy-filter.cpp engine sources. The Makefile arranges for this
# to exist (clone of a pinned commit, or a symlink to PRIVACY_FILTER_SRC).
set(PRIVACY_FILTER_DIR "${CMAKE_CURRENT_SOURCE_DIR}/privacy-filter.cpp"
CACHE PATH "Path to the privacy-filter.cpp engine source tree")
find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)
find_package(Protobuf REQUIRED)
endif()
find_package(gRPC CONFIG QUIET)
if(NOT gRPC_FOUND)
# Ubuntu's apt-installed grpc++ does not ship a CMake config - fall back.
find_library(GRPCPP_LIB grpc++ REQUIRED)
find_library(GRPCPP_REFLECTION_LIB grpc++_reflection REQUIRED)
add_library(gRPC::grpc++ INTERFACE IMPORTED)
set_target_properties(gRPC::grpc++ PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_LIB}")
add_library(gRPC::grpc++_reflection INTERFACE IMPORTED)
set_target_properties(gRPC::grpc++_reflection PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_REFLECTION_LIB}")
endif()
find_program(_PROTOC NAMES protoc REQUIRED)
find_program(_GRPC_CPP_PLUGIN NAMES grpc_cpp_plugin REQUIRED)
get_filename_component(HW_PROTO "${CMAKE_CURRENT_SOURCE_DIR}/../../backend.proto" ABSOLUTE)
get_filename_component(HW_PROTO_PATH "${HW_PROTO}" PATH)
set(HW_PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.cc")
set(HW_PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.h")
set(HW_GRPC_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.cc")
set(HW_GRPC_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.h")
add_custom_command(
OUTPUT "${HW_PROTO_SRCS}" "${HW_PROTO_HDRS}" "${HW_GRPC_SRCS}" "${HW_GRPC_HDRS}"
COMMAND ${_PROTOC}
ARGS --grpc_out "${CMAKE_CURRENT_BINARY_DIR}"
--cpp_out "${CMAKE_CURRENT_BINARY_DIR}"
-I "${HW_PROTO_PATH}"
--plugin=protoc-gen-grpc="${_GRPC_CPP_PLUGIN}"
"${HW_PROTO}"
DEPENDS "${HW_PROTO}")
add_library(hw_grpc_proto STATIC
${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
# Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
# PF_VULKAN is honored when passed on the cmake command line (it lands in the
# shared cache the engine reads).
set(PF_BUILD_TOOLS OFF CACHE BOOL "" FORCE)
set(PF_BUILD_TESTS OFF CACHE BOOL "" FORCE)
add_subdirectory(${PRIVACY_FILTER_DIR} ${CMAKE_CURRENT_BINARY_DIR}/privacy-filter.cpp)
add_executable(${TARGET} grpc-server.cpp)
target_link_libraries(${TARGET} PRIVATE
pf
hw_grpc_proto
gRPC::grpc++
gRPC::grpc++_reflection
protobuf::libprotobuf
Threads::Threads)

View File

@@ -1,77 +0,0 @@
# privacy-filter backend Makefile.
#
# Wraps the standalone privacy-filter.cpp GGML engine (the openai-privacy-filter
# PII/NER token classifier) as a LocalAI gRPC backend. The engine source is
# fetched at the pin below — .github/workflows/bump_deps.yaml finds and updates
# PRIVACY_FILTER_VERSION, matching the llama-cpp / ds4 convention.
#
# Local development: point at a working checkout instead of cloning, e.g.
# make PRIVACY_FILTER_SRC=$HOME/c/privacy-filter.cpp grpc-server
PRIVACY_FILTER_VERSION?=98f52c5ef2250f207cc6b9a6aef05393a120cb7c
PRIVACY_FILTER_REPO?=https://github.com/localai-org/privacy-filter.cpp
PRIVACY_FILTER_SRC?=
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
BUILD_DIR := build
BUILD_TYPE ?=
NATIVE ?= false
JOBS ?= $(shell nproc 2>/dev/null || echo 4)
CMAKE_ARGS ?= -DCMAKE_BUILD_TYPE=Release
# GPU backends; the default (cpu) needs no extra flags. 'cublas' is LocalAI's
# name for the CUDA build (matches llama-cpp / ds4), mapping to the engine's
# GGML_CUDA path; 'vulkan' selects the ggml Vulkan backend.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS += -DPF_CUDA=ON
endif
ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS += -DPF_VULKAN=ON
endif
# Portable binaries for distribution: disable -march=native unless asked.
ifneq ($(NATIVE),true)
CMAKE_ARGS += -DGGML_NATIVE=OFF
endif
.PHONY: grpc-server package clean purge test all
all: grpc-server
# Provide the engine sources at ./privacy-filter.cpp. With PRIVACY_FILTER_SRC
# set we symlink a local checkout (instant, no network); otherwise we clone the
# pinned commit and its ggml submodule. The directory/symlink is the target, so
# make only does this once — run 'make purge && make' to refetch after a bump.
privacy-filter.cpp:
ifneq ($(PRIVACY_FILTER_SRC),)
ln -sfn $(abspath $(PRIVACY_FILTER_SRC)) privacy-filter.cpp
else
mkdir -p privacy-filter.cpp
cd privacy-filter.cpp && \
git init -q && \
git remote add origin $(PRIVACY_FILTER_REPO) && \
git fetch --depth 1 origin $(PRIVACY_FILTER_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1
endif
grpc-server: privacy-filter.cpp
@echo "Building privacy-filter grpc-server ($(BUILD_TYPE)) with $(CMAKE_ARGS)"
mkdir -p $(BUILD_DIR)
cd $(BUILD_DIR) && cmake $(CMAKE_ARGS) $(CURRENT_MAKEFILE_DIR) && cmake --build . --config Release -j $(JOBS)
cp $(BUILD_DIR)/grpc-server grpc-server
package: grpc-server
bash package.sh
test:
@echo "privacy-filter backend: parity/regression coverage lives in the engine repo"
clean:
rm -rf $(BUILD_DIR) grpc-server package
# 'privacy-filter.cpp' may be a symlink (PRIVACY_FILTER_SRC) — rm without a
# trailing slash removes the link, never the linked-to checkout.
purge: clean
rm -rf privacy-filter.cpp

View File

@@ -1,210 +0,0 @@
// privacy-filter LocalAI gRPC backend.
//
// Thin shim over privacy-filter.cpp's flat C API (include/pf.h): a standalone
// GGML engine for the openai-privacy-filter token-classification model family
// (PII NER). It replaces the llama.cpp-patched TokenClassify path for this one
// model family — same GGUF files, no llama.cpp carry-patches.
//
// Only the RPCs the PII tier needs are implemented: LoadModel, TokenClassify,
// plus Health / Status / Free. Everything else inherits the generated base
// class default (UNIMPLEMENTED).
#include "backend.pb.h"
#include "backend.grpc.pb.h"
#include "pf.h"
#include <grpcpp/grpcpp.h>
#include <grpcpp/server.h>
#include <grpcpp/server_builder.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
#include <atomic>
#include <chrono>
#include <csignal>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
using grpc::Server;
using grpc::ServerBuilder;
using grpc::ServerContext;
// NOTE: do NOT alias grpc::Status as Status — the Status RPC method below would
// shadow the type and break the other method signatures. Use GStatus instead.
using GStatus = ::grpc::Status;
using grpc::StatusCode;
namespace {
// The engine is single-model-per-process: LocalAI spawns one backend process
// per loaded model. g_mu guards (re)load against in-flight classification.
std::mutex g_mu;
pf_ctx * g_ctx = nullptr;
std::atomic<Server *> g_server{nullptr};
// Resolve the device string the engine expects ("cpu" / "gpu" / "cuda" /
// "vulkan", optionally ":N"). Priority: an explicit "device:..." in
// ModelOptions.Options, then a non-zero NGPULayers as a coarse "use the GPU"
// signal, else CPU. "gpu" lets the engine pick whichever GPU backend this
// binary was compiled with (CUDA or Vulkan), so the same config works on
// either build; pin "device:cuda"/"device:vulkan" to be explicit.
std::string resolve_device(const backend::ModelOptions * opts) {
for (const auto & o : opts->options()) {
const std::string prefix = "device:";
if (o.rfind(prefix, 0) == 0) {
return o.substr(prefix.size());
}
}
if (opts->ngpulayers() > 0) {
return "gpu";
}
return "cpu";
}
class PrivacyFilterBackend final : public backend::Backend::Service {
public:
GStatus Health(ServerContext *, const backend::HealthMessage *,
backend::Reply * reply) override {
reply->set_message("OK");
return GStatus::OK;
}
GStatus Status(ServerContext *, const backend::HealthMessage *,
backend::StatusResponse * response) override {
std::lock_guard<std::mutex> lock(g_mu);
response->set_state(g_ctx ? backend::StatusResponse::READY
: backend::StatusResponse::UNINITIALIZED);
return GStatus::OK;
}
GStatus LoadModel(ServerContext *, const backend::ModelOptions * request,
backend::Result * result) override {
std::lock_guard<std::mutex> lock(g_mu);
// ModelFile is the absolute path LocalAI resolves; Model is the bare
// name. Prefer the former, fall back to the latter.
const std::string path =
!request->modelfile().empty() ? request->modelfile() : request->model();
if (path.empty()) {
result->set_success(false);
result->set_message("no model path supplied");
return GStatus::OK;
}
const std::string device = resolve_device(request);
if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }
pf_ctx * ctx = pf_load(path.c_str(), device.c_str(), request->threads());
const char * err = pf_last_error(ctx);
if (err) {
result->set_success(false);
result->set_message(std::string("privacy-filter load failed: ") + err);
pf_free(ctx);
return GStatus::OK;
}
// ContextSize, when set, becomes the per-forward window. The engine
// ignores values that are too small to window (<= 2*halo) and just
// runs a single forward, so passing it through is always safe.
if (request->contextsize() > 0) {
pf_set_window(ctx, request->contextsize());
}
g_ctx = ctx;
result->set_success(true);
result->set_message("privacy-filter loaded (" + device + ")");
return GStatus::OK;
}
GStatus TokenClassify(ServerContext *, const backend::TokenClassifyRequest * request,
backend::TokenClassifyResponse * response) override {
std::lock_guard<std::mutex> lock(g_mu);
if (!g_ctx) {
return GStatus(StatusCode::FAILED_PRECONDITION, "Model not loaded");
}
const std::string & text = request->text();
if (text.empty()) {
return GStatus::OK; // no text -> no entities
}
pf_entity * ents = nullptr;
size_t n = 0;
if (pf_classify(g_ctx, text.data(), text.size(), request->threshold(), &ents, &n) != 0) {
const char * err = pf_last_error(g_ctx);
return GStatus(StatusCode::INTERNAL,
std::string("TokenClassify failed: ") + (err ? err : "unknown"));
}
// Byte offsets are into the original UTF-8 text; the engine already
// applied the threshold and whitespace-trimmed span edges.
for (size_t i = 0; i < n; i++) {
backend::TokenClassifyEntity * ent = response->add_entities();
ent->set_entity_group(ents[i].label ? ents[i].label : "");
ent->set_start(ents[i].start);
ent->set_end(ents[i].end);
ent->set_score(ents[i].score);
ent->set_text(text.substr((size_t) ents[i].start,
(size_t) (ents[i].end - ents[i].start)));
}
pf_entities_free(ents, n);
return GStatus::OK;
}
GStatus Free(ServerContext *, const backend::HealthMessage *,
backend::Result * result) override {
std::lock_guard<std::mutex> lock(g_mu);
if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }
result->set_success(true);
return GStatus::OK;
}
};
void RunServer(const std::string & addr) {
PrivacyFilterBackend service;
grpc::EnableDefaultHealthCheckService(true);
grpc::reflection::InitProtoReflectionServerBuilderPlugin();
ServerBuilder builder;
builder.AddListeningPort(addr, grpc::InsecureServerCredentials());
builder.RegisterService(&service);
builder.SetMaxReceiveMessageSize(64 * 1024 * 1024);
builder.SetMaxSendMessageSize(64 * 1024 * 1024);
std::unique_ptr<Server> server(builder.BuildAndStart());
if (!server) {
std::cerr << "privacy-filter grpc-server: failed to bind " << addr << "\n";
std::exit(1);
}
g_server = server.get();
std::cerr << "privacy-filter grpc-server listening on " << addr << "\n";
server->Wait();
}
void signal_handler(int) {
if (auto * srv = g_server.load()) {
srv->Shutdown(std::chrono::system_clock::now() + std::chrono::seconds(3));
}
}
} // namespace
int main(int argc, char * argv[]) {
std::string addr = "127.0.0.1:50051";
for (int i = 1; i < argc; ++i) {
std::string a = argv[i];
const std::string addr_flag = "--addr=";
if (a.rfind(addr_flag, 0) == 0) addr = a.substr(addr_flag.size());
else if (a == "--addr" && i + 1 < argc) addr = argv[++i];
else if (a == "--help" || a == "-h") {
std::cout << "Usage: grpc-server --addr=HOST:PORT\n";
return 0;
}
}
std::signal(SIGINT, signal_handler);
std::signal(SIGTERM, signal_handler);
RunServer(addr);
return 0;
}

View File

@@ -1,39 +0,0 @@
#!/bin/bash
# Assemble package/ for the from-scratch backend image: the grpc-server binary,
# run.sh, the dynamic loader, and every shared library the binary needs.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/grpc-server" "$CURDIR/package/"
cp -rfv "$CURDIR/run.sh" "$CURDIR/package/"
# The dynamic loader, renamed to lib/ld.so so run.sh can invoke it explicitly
# (makes the image independent of the host's glibc layout).
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
else
echo "package.sh: unknown architecture" >&2; exit 1
fi
# Bundle the binary's transitive shared deps (libstdc++, libgomp, and the apt
# grpc++/protobuf/absl stack) by walking ldd — robust to whichever of those are
# linked shared vs static. The loader line (no "=>") is skipped; ld.so above
# already covers it.
ldd "$CURDIR/grpc-server" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u | \
while read -r so; do
[ -f "$so" ] && cp -arfLv "$so" "$CURDIR/package/lib/"
done
# Vulkan loader / GPU libs when building the GPU variant.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "privacy-filter package contents:"
ls -lah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -1,9 +0,0 @@
#!/bin/bash
# Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
if [ -f "$CURDIR/lib/ld.so" ]; then
exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
fi
exec "$CURDIR/grpc-server" "$@"

View File

@@ -1,11 +0,0 @@
.cache/
sources/
build/
package/
ced-grpc
# build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in ced.cpp upstream.
*.so
*.so.*
ced_capi.h
compile_commands.json

View File

@@ -1,77 +0,0 @@
# ced sound-classification backend Makefile.
#
# Upstream pin lives below as CED_VERSION?=<sha> so .github/bump_deps.sh can find
# and update it (matches the parakeet-cpp / whisper.cpp convention).
#
# Local dev shortcut: symlink an out-of-tree ced.cpp shared build + header and
# skip the clone/cmake steps entirely:
# ln -sf /path/to/ced.cpp/build-shared/libced.so .
# ln -sf /path/to/ced.cpp/include/ced_capi.h .
# go build -o ced-grpc .
CED_VERSION?=c04ac14b7992d00584d9e812c9bb6268598a6ce7
CED_REPO?=https://github.com/mudler/ced.cpp
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
BUILD_TYPE?=
NATIVE?=false
# Static-link ggml into libced.so (PIC) so the shared lib is self-contained:
# dlopen needs no libggml*.so alongside it, only system libs the runtime image
# already provides.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DCED_SHARED=ON -DCED_BUILD_CLI=OFF -DCED_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# ced.cpp gates its ggml backends behind CED_GGML_* options (set(... CACHE BOOL
# "" FORCE)), so forward those instead of a bare -DGGML_CUDA=ON.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DCED_GGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DCED_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DCED_GGML_VULKAN=ON
endif
.PHONY: ced-grpc package build clean purge test all
all: ced-grpc
sources/ced.cpp:
mkdir -p sources/ced.cpp
cd sources/ced.cpp && \
git init -q && \
git remote add origin $(CED_REPO) && \
git fetch --depth 1 origin $(CED_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1 --single-branch
libced.so: sources/ced.cpp
cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
cp -fv sources/ced.cpp/include/ced_capi.h ./
ced-grpc: libced.so main.go goced.go
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o ced-grpc .
package: ced-grpc
bash package.sh
build: package
test:
LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
clean: purge
rm -rf libced.so* ced_capi.h package ced-grpc
purge:
rm -rf sources/ced.cpp

View File

@@ -1,130 +0,0 @@
package main
// Go side of the ced backend: purego bindings over ced_capi.h plus the gRPC
// SoundDetection implementation.
//
// SKETCH: the pb.SoundDetection* types come from backend.proto (regenerate with
// `make protogen-go`). The C side is single-threaded per ctx, so we guard the
// engine with engineMu; LocalAI also serializes via base.SingleThread.
import (
"context"
"encoding/json"
"errors"
"fmt"
"sort"
"sync"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// purego-bound entry points from libced.so. Names match ced_capi.h exactly.
var (
CppAbiVersion func() int32
CppLoad func(ggufPath string) uintptr
CppFree func(ctx uintptr)
CppLastError func(ctx uintptr) string
CppNumClasses func(ctx uintptr) int32
CppSampleRate func(ctx uintptr) int32
CppClassifyPathJSON func(ctx uintptr, wavPath string, topK int32) uintptr
CppClassifyPcmJSON func(ctx uintptr, pcm []float32, nSamples int32, sampleRate int32, topK int32) uintptr
CppFreeString func(s uintptr)
)
// cstr copies a malloc'd C string (returned as uintptr) into a Go string and
// frees the original via ced_capi_free_string. Empty/0 -> "".
func cstr(p uintptr) string {
if p == 0 {
return ""
}
defer CppFreeString(p)
var b []byte
for i := 0; ; i++ {
ch := *(*byte)(unsafe.Pointer(p + uintptr(i))) //nolint:govet // #nosec G103 -- C-owned NUL-terminated string from libced (not Go-GC memory)
if ch == 0 {
break
}
b = append(b, ch)
}
return string(b)
}
// Ced is the gRPC backend. One loaded CED model per instance.
type Ced struct {
base.Base
ctxPtr uintptr
engineMu sync.Mutex
}
// Load resolves the GGUF and opens the C-API context.
func (c *Ced) Load(opts *pb.ModelOptions) error {
if opts.ModelFile == "" {
return errors.New("ced: ModelFile is required")
}
ctx := CppLoad(opts.ModelFile)
if ctx == 0 {
return fmt.Errorf("ced: ced_capi_load failed for %q: %s", opts.ModelFile, CppLastError(0))
}
c.ctxPtr = ctx
return nil
}
// jsonTag mirrors the ced_capi JSON tag objects.
type jsonTag struct {
Index int `json:"index"`
Score float32 `json:"score"`
Label string `json:"label"`
}
// SoundDetection classifies the clip at req.Src and returns scored AudioSet tags.
func (c *Ced) SoundDetection(ctx context.Context, req *pb.SoundDetectionRequest) (*pb.SoundDetectionResponse, error) {
if c.ctxPtr == 0 {
return nil, errors.New("ced: model not loaded")
}
if req.GetSrc() == "" {
return nil, errors.New("ced: SoundDetectionRequest.src (audio path) is required")
}
topK := req.GetTopK()
if topK <= 0 {
topK = 10 // sensible default for a tagging response
}
c.engineMu.Lock()
out := cstr(CppClassifyPathJSON(c.ctxPtr, req.GetSrc(), topK))
lastErr := CppLastError(c.ctxPtr)
c.engineMu.Unlock()
if out == "" {
return nil, fmt.Errorf("ced: classification failed: %s", lastErr)
}
var tags []jsonTag
if err := json.Unmarshal([]byte(out), &tags); err != nil {
return nil, fmt.Errorf("ced: bad classifier JSON: %w", err)
}
thr := req.GetThreshold()
resp := &pb.SoundDetectionResponse{}
for _, t := range tags {
if t.Score < thr {
continue
}
resp.Detections = append(resp.Detections, &pb.SoundClass{
Label: t.Label, Score: t.Score, Index: int32(t.Index),
})
}
sort.Slice(resp.Detections, func(i, j int) bool {
return resp.Detections[i].Score > resp.Detections[j].Score
})
return resp, nil
}
func (c *Ced) Free() error {
c.engineMu.Lock()
defer c.engineMu.Unlock()
if c.ctxPtr != 0 {
CppFree(c.ctxPtr)
c.ctxPtr = 0
}
return nil
}

View File

@@ -1,59 +0,0 @@
package main
// ced sound-classification backend. Started internally by LocalAI: one gRPC
// server per loaded model. Loads libced.so via purego and registers the flat
// C-API declared in ced_capi.h. The library name can be overridden with
// CED_LIBRARY (mirrors PARAKEET_LIBRARY / WHISPER_LIBRARY); the default looks
// for the .so next to this binary.
//
// SKETCH: requires `make protogen-go` after the backend.proto SoundDetection
// addition, and a built libced.so (see Makefile). See DESIGN.md.
import (
"flag"
"fmt"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var addr = flag.String("addr", "localhost:50051", "the address to connect to")
type libFunc struct {
ptr any
name string
}
func main() {
libName := os.Getenv("CED_LIBRARY")
if libName == "" {
libName = "libced.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("ced: dlopen %q: %w", libName, err))
}
// Bound 1:1 to ced_capi.h. char*-returning functions are declared uintptr
// so we can free the same pointer with ced_capi_free_string after copying
// (purego's string return would copy and leak the original).
for _, lf := range []libFunc{
{&CppAbiVersion, "ced_capi_abi_version"},
{&CppLoad, "ced_capi_load"},
{&CppFree, "ced_capi_free"},
{&CppLastError, "ced_capi_last_error"},
{&CppNumClasses, "ced_capi_num_classes"},
{&CppSampleRate, "ced_capi_sample_rate"},
{&CppClassifyPathJSON, "ced_capi_classify_path_json"},
{&CppClassifyPcmJSON, "ced_capi_classify_pcm_json"},
{&CppFreeString, "ced_capi_free_string"},
} {
purego.RegisterLibFunc(lf.ptr, lib, lf.name)
}
fmt.Fprintf(os.Stderr, "[ced] ABI=%d\n", CppAbiVersion())
flag.Parse()
if err := grpc.StartServer(*addr, &Ced{}); err != nil {
panic(err)
}
}

View File

@@ -1,60 +0,0 @@
#!/bin/bash
#
# Bundle the ced-grpc binary, libced.so, the core runtime libs (libc/libstdc++/
# libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE so the package
# is self-contained. Mirrors backend/go/parakeet-cpp/package.sh; run.sh routes
# the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc is used.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libced.so not found in $CURDIR, run 'make' first" >&2
exit 1
}
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -1,15 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the sibling backends).
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/ced-grpc" "$@"
fi
exec "$CURDIR/ced-grpc" "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221
CRISPASR_VERSION?=f7838a306687f22c281d29c250f879a4ab3df2d7
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -67,7 +67,7 @@ sources/CrispASR:
# it, so ${CMAKE_SOURCE_DIR} is THIS backend dir and the talk-llama sources
# aren't found. Rewrite to ${PROJECT_SOURCE_DIR} (the crispasr project root),
# which is correct both standalone and as a subproject. Idempotent.
sed -i.bak 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt && rm -f sources/CrispASR/src/CMakeLists.txt.bak
sed -i 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt
# Detect OS
UNAME_S := $(shell uname -s)

View File

@@ -47,74 +47,6 @@ extern "C" void set_abort(int v) {
g_abort.store(v, std::memory_order_relaxed);
}
// --- word-level timestamp accessors ---
extern "C" {
int crispasr_session_result_n_words(crispasr_session_result *r, int seg_i);
const char *crispasr_session_result_word_text(crispasr_session_result *r,
int seg_i, int word_i);
int64_t crispasr_session_result_word_t0(crispasr_session_result *r, int seg_i,
int word_i);
int64_t crispasr_session_result_word_t1(crispasr_session_result *r, int seg_i,
int word_i);
// Parakeet-specific word accessors
int crispasr_parakeet_result_n_words(void *r);
const char *crispasr_parakeet_result_word_text(void *r, int word_i);
int64_t crispasr_parakeet_result_word_t0(void *r, int word_i);
int64_t crispasr_parakeet_result_word_t1(void *r, int word_i);
}
void *get_result(void) { return g_result; }
int get_word_count(int seg_i) {
if (!g_result)
return 0;
return crispasr_session_result_n_words(g_result, seg_i);
}
const char *get_word_text(int seg_i, int word_i) {
if (!g_result)
return "";
return crispasr_session_result_word_text(g_result, seg_i, word_i);
}
int64_t get_word_t0(int seg_i, int word_i) {
if (!g_result)
return 0;
return crispasr_session_result_word_t0(g_result, seg_i, word_i);
}
int64_t get_word_t1(int seg_i, int word_i) {
if (!g_result)
return 0;
return crispasr_session_result_word_t1(g_result, seg_i, word_i);
}
// Parakeet-specific word accessors
int get_parakeet_word_count(void) {
if (!g_result)
return 0;
return crispasr_parakeet_result_n_words(g_result);
}
const char *get_parakeet_word_text(int word_i) {
if (!g_result)
return "";
return crispasr_parakeet_result_word_text(g_result, word_i);
}
int64_t get_parakeet_word_t0(int word_i) {
if (!g_result)
return 0;
return crispasr_parakeet_result_word_t0(g_result, word_i);
}
int64_t get_parakeet_word_t1(int word_i) {
if (!g_result)
return 0;
return crispasr_parakeet_result_word_t1(g_result, word_i);
}
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void *data) {
const char *level_str;

View File

@@ -20,18 +20,4 @@ float *tts_synthesize(const char *text, int *out_n_samples); // 24kHz mono float
void tts_free(float *pcm);
int tts_set_voice(const char *name); // best-effort speaker selection; 0 ok
int tts_set_voice_file(const char *path, const char *ref_text); // load voice pack (.gguf) or zero-shot clone (.wav + ref_text)
// --- word-level timestamp accessors ---
// Session-based (works for whisper-like backends)
void *get_result(void);
int get_word_count(int seg_i);
const char *get_word_text(int seg_i, int word_i);
int64_t get_word_t0(int seg_i, int word_i);
int64_t get_word_t1(int seg_i, int word_i);
// Parakeet-specific (global word list, no segment index)
int get_parakeet_word_count(void);
const char *get_parakeet_word_text(int word_i);
int64_t get_parakeet_word_t0(int word_i);
int64_t get_parakeet_word_t1(int word_i);
}

View File

@@ -11,7 +11,6 @@ import (
"github.com/go-audio/audio"
"github.com/go-audio/wav"
gguf "github.com/gpustack/gguf-parser-go"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/utils"
@@ -34,55 +33,10 @@ var (
CppTTSFree func(ptr uintptr)
CppTTSSetVoice func(name string) int
CppTTSSetVoiceFile func(path string, refText string) int
// Word-level timestamp accessors (session-based, per-segment)
CppGetWordCount func(segI int) int
CppGetWordText func(segI int, wordI int) string
CppGetWordT0 func(segI int, wordI int) int64
CppGetWordT1 func(segI int, wordI int) int64
// Parakeet-specific word accessors (global, no segment index)
CppGetParakeetWordCount func() int
CppGetParakeetWordText func(wordI int) string
CppGetParakeetWordT0 func(wordI int) int64
CppGetParakeetWordT1 func(wordI int) int64
)
type CrispASR struct {
base.SingleThread
// sampleRate is the output rate (Hz) of the loaded TTS engine's PCM, used to
// write a correct WAV header. Most CrispASR TTS backends emit 24 kHz, but
// piper returns its model's native rate (16 kHz for x_low/low voices,
// 22.05 kHz for medium/high), so it is read from the GGUF metadata at Load.
sampleRate int
}
// defaultTTSSampleRate is the output rate assumed for CrispASR TTS engines that
// don't advertise one in GGUF metadata (vibevoice/orpheus/chatterbox/qwen3-tts
// all emit 24 kHz). piper is the exception and carries piper.sample_rate.
const defaultTTSSampleRate = 24000
// piperSampleRate reads the piper.sample_rate metadata key from a GGUF model.
// CrispASR's piper backend returns PCM at the model's native rate without
// resampling, so the WAV header must match it. Returns ok=false for non-piper
// models (key absent) or an unreadable file, letting the caller fall back to
// defaultTTSSampleRate.
func piperSampleRate(modelPath string) (int, bool) {
// Only scalar architecture keys are read, so skip the large array metadata
// (phoneme map) and mmap the header - same rationale as pkg/vram's reader.
f, err := gguf.ParseGGUFFile(modelPath, gguf.UseMMap(), gguf.SkipLargeMetadata())
if err != nil {
return 0, false
}
kv, ok := f.Header.MetadataKV.Get("piper.sample_rate")
if !ok || kv.ValueType != gguf.GGUFMetadataValueTypeUint32 {
return 0, false
}
rate := int(kv.ValueUint32())
if rate <= 0 {
return 0, false
}
return rate, true
}
// splitOption splits a "prefix:value" model option into its key and value,
@@ -149,14 +103,6 @@ func (w *CrispASR) Load(opts *pb.ModelOptions) error {
return fmt.Errorf("Failed to load CrispASR transcription model")
}
// Determine the TTS output sample rate for the WAV header. piper voices
// carry their native rate in GGUF metadata and CrispASR does not resample;
// every other engine emits the 24 kHz default.
w.sampleRate = defaultTTSSampleRate
if rate, ok := piperSampleRate(opts.ModelFile); ok {
w.sampleRate = rate
}
// Load the companion file (codec/tokenizer/s3gen) after the session is open.
// rc==0 means success or "not applicable" for the active backend; only a
// negative code is fatal.
@@ -224,28 +170,6 @@ func (w *CrispASR) VAD(req *pb.VADRequest) (pb.VADResponse, error) {
}, nil
}
// isValidWord reports whether a TranscriptWord contains recognisable speech
// content. The parakeet-specific word accessors can return stale initialisation
// data (model name, binary blobs) when a segment has no real speech. A word is
// considered valid only when:
// - the text is non-empty after trimming,
// - it contains no U+FFFD replacement characters (from binary data scrubbing),
// - both timestamps are non-negative,
// - the word has positive duration (end > start).
func isValidWord(w *pb.TranscriptWord) bool {
txt := strings.TrimSpace(w.Text)
if txt == "" {
return false
}
if strings.ContainsRune(txt, '\uFFFD') {
return false
}
if w.Start < 0 || w.End < 0 || w.End <= w.Start {
return false
}
return true
}
func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRequest) (pb.TranscriptResult, error) {
if err := ctx.Err(); err != nil {
return pb.TranscriptResult{}, status.Error(codes.Canceled, "transcription cancelled")
@@ -324,54 +248,15 @@ func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRe
// IDs, so Tokens is left empty.
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
// Populate word-level timestamps. Try session-based functions first
// (per-segment); fall back to parakeet-specific functions (global word
// list with no segment index — only populated on the first segment to
// avoid duplication).
words := []*pb.TranscriptWord{}
wordCount := CppGetWordCount(i)
if wordCount == 0 && i == 0 {
wordCount = CppGetParakeetWordCount()
for j := 0; j < wordCount; j++ {
w := &pb.TranscriptWord{
Start: CppGetParakeetWordT0(j) * (10000000),
End: CppGetParakeetWordT1(j) * (10000000),
Text: strings.ToValidUTF8(strings.Clone(CppGetParakeetWordText(j)), "<22>"),
}
if isValidWord(w) {
words = append(words, w)
}
}
} else {
for j := 0; j < wordCount; j++ {
w := &pb.TranscriptWord{
Start: CppGetWordT0(i, j) * (10000000),
End: CppGetWordT1(i, j) * (10000000),
Text: strings.ToValidUTF8(strings.Clone(CppGetWordText(i, j)), "<22>"),
}
if isValidWord(w) {
words = append(words, w)
}
}
}
// Skip empty segments with no recognisable content (e.g. trailing
// silence segments that parakeet emits with stale init data).
trimmed := strings.TrimSpace(txt)
if trimmed == "" && len(words) == 0 {
continue
}
segment := &pb.TranscriptSegment{
Id: int32(i),
Text: txt,
Start: s, End: t,
Words: words,
}
segments = append(segments, segment)
text += " " + trimmed
text += " " + strings.TrimSpace(txt)
}
return pb.TranscriptResult{
@@ -463,20 +348,13 @@ func (w *CrispASR) AudioTranscriptionStream(ctx context.Context, opts *pb.Transc
s := CppGetSegmentStart(i) * 10000000
t := CppGetSegmentEnd(i) * 10000000
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
// Skip empty segments (e.g. trailing silence that parakeet emits
// with stale init data).
trimmed := strings.TrimSpace(txt)
if trimmed == "" && s == t {
continue
}
segments = append(segments, &pb.TranscriptSegment{
Id: int32(i),
Text: txt,
Start: s, End: t,
})
trimmed := strings.TrimSpace(txt)
if trimmed == "" {
continue
}
@@ -512,7 +390,7 @@ func (w *CrispASR) synthesize(text string) ([]float32, error) {
}
defer CppTTSFree(ptr)
src := unsafe.Slice((*float32)(unsafe.Pointer(ptr)), int(n)) //nolint:govet // ptr addresses C-allocated PCM returned across the purego boundary; copied out immediately below, before tts_free.
out := make([]float32, int(n)) // copy out of C memory before free
out := make([]float32, int(n)) // copy out of C memory before free
copy(out, src)
return out, nil
}
@@ -539,7 +417,7 @@ func (w *CrispASR) TTS(req *pb.TTSRequest) error {
if err != nil {
return err
}
return writeWAV(req.Dst, pcm, w.sampleRate)
return writeWAV24k(req.Dst, pcm)
}
// TTSStream is the streaming counterpart to TTS. CrispASR has no progressive
@@ -569,7 +447,7 @@ func (w *CrispASR) TTSStream(req *pb.TTSRequest, results chan []byte) error {
}
defer func() { _ = os.Remove(dst) }()
if err := writeWAV(dst, pcm, w.sampleRate); err != nil {
if err := writeWAV24k(dst, pcm); err != nil {
return err
}
@@ -581,14 +459,14 @@ func (w *CrispASR) TTSStream(req *pb.TTSRequest, results chan []byte) error {
return nil
}
// writeWAV writes pcm as a sampleRate Hz, mono, 16-bit PCM WAV at dst.
func writeWAV(dst string, pcm []float32, sampleRate int) error {
// writeWAV24k writes pcm as a 24000 Hz, mono, 16-bit PCM WAV at dst.
func writeWAV24k(dst string, pcm []float32) error {
f, err := os.Create(dst)
if err != nil {
return fmt.Errorf("crispasr: create %q: %w", dst, err)
}
enc := wav.NewEncoder(f, sampleRate, 16, 1, 1)
enc := wav.NewEncoder(f, 24000, 16, 1, 1)
ints := make([]int, len(pcm))
for i, s := range pcm {
if s > 1 {
@@ -599,7 +477,7 @@ func writeWAV(dst string, pcm []float32, sampleRate int) error {
ints[i] = int(s * 32767)
}
buf := &audio.IntBuffer{
Format: &audio.Format{NumChannels: 1, SampleRate: sampleRate},
Format: &audio.Format{NumChannels: 1, SampleRate: 24000},
Data: ints,
SourceBitDepth: 16,
}

View File

@@ -1,164 +0,0 @@
package main
import (
"bytes"
"encoding/binary"
"os"
"path/filepath"
"github.com/go-audio/wav"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// GGUF metadata value type tags (subset) from the GGUF spec.
const (
ggufTypeUint32 uint32 = 4
ggufTypeString uint32 = 8
)
type ggufKV struct {
key string
vtype uint32
val any
}
// writeMinimalGGUF emits a valid, tensor-less GGUF file carrying only the given
// metadata key-values. Enough for the header-only parse path piperSampleRate
// uses; avoids pulling a real multi-MB voice into the test.
func writeMinimalGGUF(path string, kvs []ggufKV) error {
var b bytes.Buffer
b.WriteString("GGUF") // magic
_ = binary.Write(&b, binary.LittleEndian, uint32(3)) // version
_ = binary.Write(&b, binary.LittleEndian, uint64(0)) // tensor count
_ = binary.Write(&b, binary.LittleEndian, uint64(len(kvs)))
for _, kv := range kvs {
_ = binary.Write(&b, binary.LittleEndian, uint64(len(kv.key)))
b.WriteString(kv.key)
_ = binary.Write(&b, binary.LittleEndian, kv.vtype)
switch v := kv.val.(type) {
case uint32:
_ = binary.Write(&b, binary.LittleEndian, v)
case string:
_ = binary.Write(&b, binary.LittleEndian, uint64(len(v)))
b.WriteString(v)
}
}
return os.WriteFile(path, b.Bytes(), 0o644)
}
// wavSampleRate decodes the WAV header at path and returns its sample rate.
func wavSampleRate(path string) (int, error) {
f, err := os.Open(path)
if err != nil {
return 0, err
}
defer func() { _ = f.Close() }()
dec := wav.NewDecoder(f)
dec.ReadInfo()
return int(dec.SampleRate), nil
}
var _ = Describe("piper sample rate", func() {
Context("piperSampleRate", func() {
It("reads piper.sample_rate from a piper GGUF (medium = 22050)", func() {
p := filepath.Join(GinkgoT().TempDir(), "voice.gguf")
Expect(writeMinimalGGUF(p, []ggufKV{
{key: "general.architecture", vtype: ggufTypeString, val: "piper"},
{key: "piper.sample_rate", vtype: ggufTypeUint32, val: uint32(22050)},
})).To(Succeed())
rate, ok := piperSampleRate(p)
Expect(ok).To(BeTrue(), "piper.sample_rate should be found")
Expect(rate).To(Equal(22050))
})
It("reads the low-quality rate (16000)", func() {
p := filepath.Join(GinkgoT().TempDir(), "voice.gguf")
Expect(writeMinimalGGUF(p, []ggufKV{
{key: "piper.sample_rate", vtype: ggufTypeUint32, val: uint32(16000)},
})).To(Succeed())
rate, ok := piperSampleRate(p)
Expect(ok).To(BeTrue())
Expect(rate).To(Equal(16000))
})
It("returns ok=false for a non-piper GGUF (no piper.sample_rate key)", func() {
p := filepath.Join(GinkgoT().TempDir(), "other.gguf")
Expect(writeMinimalGGUF(p, []ggufKV{
{key: "general.architecture", vtype: ggufTypeString, val: "vibevoice"},
})).To(Succeed())
_, ok := piperSampleRate(p)
Expect(ok).To(BeFalse())
})
It("returns ok=false for an unreadable/non-GGUF file", func() {
p := filepath.Join(GinkgoT().TempDir(), "garbage.gguf")
Expect(os.WriteFile(p, []byte("not a gguf"), 0o644)).To(Succeed())
_, ok := piperSampleRate(p)
Expect(ok).To(BeFalse())
})
})
// End-to-end through the built .so. Gated on CRISPASR_PIPER_MODEL_PATH (a
// real piper voice GGUF) like the other model-backed specs; never runs in
// default CI. Proves CrispASR's piper backend output rate flows into the
// WAV header instead of the hardcoded 24 kHz default.
Context("piper TTS end-to-end", func() {
It("writes the WAV at the model's native piper.sample_rate", func() {
model := os.Getenv("CRISPASR_PIPER_MODEL_PATH")
if model == "" {
Skip("set CRISPASR_PIPER_MODEL_PATH to run the piper e2e spec")
}
ensureLibLoaded()
expected, ok := piperSampleRate(model)
Expect(ok).To(BeTrue(), "model should carry piper.sample_rate metadata")
w := &CrispASR{}
Expect(w.Load(&pb.ModelOptions{
ModelFile: model,
Options: []string{"backend:piper"},
Threads: 4,
})).To(Succeed())
dst := filepath.Join(GinkgoT().TempDir(), "piper.wav")
Expect(w.TTS(&pb.TTSRequest{Text: "Hello from CrispASR piper.", Dst: dst})).To(Succeed())
info, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(info.Size()).To(BeNumerically(">", 1024), "expected a non-trivial WAV")
rate, err := wavSampleRate(dst)
Expect(err).ToNot(HaveOccurred())
Expect(rate).To(Equal(expected),
"WAV header rate must equal the model's native piper.sample_rate, not the 24k default")
})
})
Context("writeWAV", func() {
It("writes the WAV header at the given sample rate (22050 for piper, not the 24k default)", func() {
dst := filepath.Join(GinkgoT().TempDir(), "out.wav")
pcm := make([]float32, 220) // 10 ms of silence is enough for a header
Expect(writeWAV(dst, pcm, 22050)).To(Succeed())
rate, err := wavSampleRate(dst)
Expect(err).ToNot(HaveOccurred())
Expect(rate).To(Equal(22050))
})
It("writes a 16000 Hz header for low-quality piper voices", func() {
dst := filepath.Join(GinkgoT().TempDir(), "out.wav")
pcm := make([]float32, 160)
Expect(writeWAV(dst, pcm, 16000)).To(Succeed())
rate, err := wavSampleRate(dst)
Expect(err).ToNot(HaveOccurred())
Expect(rate).To(Equal(16000))
})
})
})

View File

@@ -44,14 +44,6 @@ func main() {
{&CppTTSFree, "tts_free"},
{&CppTTSSetVoice, "tts_set_voice"},
{&CppTTSSetVoiceFile, "tts_set_voice_file"},
{&CppGetWordCount, "get_word_count"},
{&CppGetWordText, "get_word_text"},
{&CppGetWordT0, "get_word_t0"},
{&CppGetWordT1, "get_word_t1"},
{&CppGetParakeetWordCount, "get_parakeet_word_count"},
{&CppGetParakeetWordText, "get_parakeet_word_text"},
{&CppGetParakeetWordT0, "get_parakeet_word_t0"},
{&CppGetParakeetWordT1, "get_parakeet_word_t1"},
}
for _, lf := range libFuncs {

View File

@@ -51,32 +51,6 @@ else
exit 1
fi
# Bundle espeak-ng (+ its libpcaudio/libsonic runtime deps) and its voice data so
# the piper TTS backend can phonemize non-English text. CrispASR dlopens
# libespeak-ng.so.1 at runtime (the MIT-clean path); the dlopen succeeds loading
# libespeak-ng but FAILS if libpcaudio/libsonic are absent, so all three .so are
# required. run.sh points CRISPASR_ESPEAK_DATA_PATH at the bundled data dir.
# Best-effort: only copied when present, so a local dev build without espeak-ng
# installed still packages the rest (English voices keep working).
ESPEAK_LIBDIR=""
for d in /usr/lib/x86_64-linux-gnu /usr/lib/aarch64-linux-gnu; do
if [ -f "$d/libespeak-ng.so.1" ]; then
ESPEAK_LIBDIR="$d"
break
fi
done
if [ -n "$ESPEAK_LIBDIR" ]; then
echo "Bundling espeak-ng from $ESPEAK_LIBDIR ..."
cp -arfLv "$ESPEAK_LIBDIR/libespeak-ng.so.1" $CURDIR/package/lib/
cp -arfLv "$ESPEAK_LIBDIR/libpcaudio.so.0" $CURDIR/package/lib/
cp -arfLv "$ESPEAK_LIBDIR/libsonic.so.0" $CURDIR/package/lib/
if [ -d "$ESPEAK_LIBDIR/espeak-ng-data" ]; then
cp -arfLv "$ESPEAK_LIBDIR/espeak-ng-data" $CURDIR/package/
fi
else
echo "espeak-ng not found; non-English piper voices will not phonemize"
fi
# Package GPU libraries based on BUILD_TYPE
# The GPU library packaging script will detect BUILD_TYPE and copy appropriate GPU libraries
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"

View File

@@ -41,11 +41,6 @@ fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export CRISPASR_LIBRARY=$LIBRARY
# Point piper's espeak-ng phonemizer at the bundled voice data. The variable
# names the directory CONTAINING espeak-ng-data (package.sh drops it next to
# this script). Harmless when espeak-ng wasn't bundled.
export CRISPASR_ESPEAK_DATA_PATH=$CURDIR
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"

View File

@@ -1,7 +0,0 @@
sources/
build*/
package/
libdepthanythingcpp*.so
depth-anything-cpp
test-models/
test-data/

View File

@@ -1,28 +0,0 @@
cmake_minimum_required(VERSION 3.18)
project(libdepthanythingcpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# Static-link ggml into the depth-anything shared library so the resulting .so
# has no runtime dependency on an external libggml — only on
# libc/libstdc++/libgomp, which the LocalAI package step bundles into the
# docker image.
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build static libraries" FORCE)
# depth-anything.cpp build switches: skip CLI/tests, but build libdepthanything
# itself as a SHARED library (DA_SHARED) while ggml stays static
# (BUILD_SHARED_LIBS OFF above). The da_capi_* C ABI is compiled into
# src/da_capi.cpp and re-exported by that shared library, so no extra MODULE
# wrapper is needed (unlike locate-anything.cpp).
set(DA_BUILD_CLI OFF CACHE BOOL "Disable depth-anything CLI" FORCE)
set(DA_BUILD_TESTS OFF CACHE BOOL "Disable depth-anything tests" FORCE)
set(DA_SHARED ON CACHE BOOL "Build libdepthanything as a shared lib" FORCE)
add_subdirectory(./sources/depth-anything.cpp)
# Emit libdepthanything.so into the top-level build dir so the Makefile can
# rename it to the per-variant libdepthanythingcpp-<variant>.so.
set_target_properties(depthanything PROPERTIES
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

View File

@@ -1,141 +0,0 @@
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# depth-anything.cpp. Pin to a specific commit for a stable build; a squash
# merge upstream can orphan a branch, so the native version is pinned by SHA.
# This SHA adds the Depth Anything V2 engine + C-API routing (depth-only,
# relative + metric) on top of the nested two-file metric C-API (abi_version 4,
# da_capi_load_nested) required by the depth-anything-3-nested gallery model.
# It is kept alive by the upstream tag da2-support (survives a squash-merge);
# repoint to the master merge commit once mudler/depth-anything.cpp PR #1 lands.
DEPTHANYTHING_REPO?=https://github.com/mudler/depth-anything.cpp.git
DEPTHANYTHING_VERSION?=f4e17dea695dd12ae76bea98ba58030996b98118
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# Forward LocalAI's BUILD_TYPE to the matching ggml backend switch. depth-anything.cpp
# force-sets GGML_CUDA/GGML_VULKAN/GGML_METAL from its own DA_GGML_* options, so
# those must be toggled via the DA_GGML_* names (a bare -DGGML_CUDA=ON would be
# overridden); the remaining ggml switches pass straight through.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON -DDA_GGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON
else ifeq ($(BUILD_TYPE),hipblas)
ROCM_HOME ?= /opt/rocm
ROCM_PATH ?= /opt/rocm
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
AMDGPU_TARGETS?=gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1200,gfx1201
CMAKE_ARGS+=-DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=$(AMDGPU_TARGETS)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
CMAKE_ARGS+=-DDA_GGML_METAL=ON
endif
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
endif
sources/depth-anything.cpp:
mkdir -p sources && \
git clone --recursive $(DEPTHANYTHING_REPO) sources/depth-anything.cpp && \
cd sources/depth-anything.cpp && \
git checkout $(DEPTHANYTHING_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
# Detect OS
UNAME_S := $(shell uname -s)
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libdepthanythingcpp-fallback.so
endif
depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o depth-anything-cpp ./
package: depth-anything-cpp
bash package.sh
build: package
clean: purge
rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
purge:
rm -rf build*
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libdepthanythingcpp-avx.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-avx2.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx2${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:avx512${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
endif
# Build fallback variant (all platforms)
libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
libdepthanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
all: depth-anything-cpp package
# `test` is invoked by the top-level Makefile's `test-extra` target. It builds
# the backend binary + the fallback shared library (needed for dlopen at
# runtime), then runs test.sh which downloads a small GGUF + a test image and
# exercises the gRPC Load/Predict wire path via the Go smoke test in
# main_test.go.
test: depth-anything-cpp libdepthanythingcpp-fallback.so
bash test.sh

View File

@@ -1,556 +0,0 @@
package main
// godepthanythingcpp.go - gRPC handlers (Load, Predict, GenerateImage) for the
// depth-anything-cpp backend, wrapping the Depth Anything 3 ggml C-API
// (libdepthanythingcpp-<variant>.so) via purego.
//
// Embeds base.SingleThread to default the unimplemented RPCs to "not supported"
// and to serialize calls — the C side shares a ggml graph allocator and is NOT
// reentrant, so all inference must run one-at-a-time.
//
// Depth has no native OpenAI endpoint, so the model is exposed two ways:
//
// - GenerateImage(src, dst): run depth on the src image and write a
// min-max-normalised grayscale depth PNG to dst.
// - Predict(images[0]): run depth+pose and return a JSON blob with the depth
// dimensions, depth stats and the camera extrinsics (3x4) / intrinsics (3x3).
import (
"encoding/base64"
"encoding/json"
"fmt"
"image"
"image/png"
"math"
"os"
"path/filepath"
"strings"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// C-API function pointers, registered in main.go via purego. The da_capi_*
// symbols live inside libdepthanything (src/da_capi.cpp) and are re-exported by
// the DA_SHARED build.
var (
// da_capi_load(const char* gguf_path, int n_threads) -> da_ctx* (0 = fail)
CapiLoad func(gguf string, nThreads int32) uintptr
// da_capi_load_nested(const char* anyview_gguf, const char* metric_gguf,
// int n_threads) -> da_ctx* (0 = fail). The returned ctx serves the nested
// metric model: depth/pose calls produce final metric-scale depth + scaled pose.
CapiLoadNested func(anyview string, metric string, nThreads int32) uintptr
// da_capi_free(da_ctx* ctx) — safe on a 0 handle.
CapiFree func(handle uintptr)
// da_capi_last_error(da_ctx* ctx) -> const char* (owned by ctx, "" if none).
// purego marshals the returned C string into a Go string (a copy), so we
// never free it.
CapiLastError func(handle uintptr) string
// da_capi_depth_path(ctx, image_path, out_h*, out_w*) -> float* depth map
// (row-major H*W); nil on error. Caller frees via da_capi_free_floats.
CapiDepthPath func(handle uintptr, imagePath string, outH *int32, outW *int32) *float32
// da_capi_free_floats(float* p)
CapiFreeFloats func(p *float32)
// da_capi_pose_path(ctx, image_path, out_ext[12], out_intr[9]) -> 0 ok, -1 err
CapiPosePath func(handle uintptr, imagePath string, outExt *float32, outIntr *float32) int32
// da_capi_depth_dense(ctx, image_path, out_h*, out_w*, out_depth**, out_conf**,
// out_sky**, out_ext[12], out_intr[9], out_is_metric*) -> 0 ok, -1 err.
// Each non-NULL out_depth/out_conf/out_sky receives a malloc'd float[H*W] (free
// via da_capi_free_floats); buffers the model doesn't produce are set NULL.
CapiDepthDense func(handle uintptr, imagePath string,
outH, outW *int32,
outDepth, outConf, outSky **float32,
outExt, outIntr *float32,
outIsMetric *int32) int32
// da_capi_points(ctx, image_path, conf_thresh, out_n*, out_xyz**, out_rgb**) ->
// 0 ok, -1 err. *out_xyz = malloc'd float[3*N] (free via da_capi_free_floats),
// *out_rgb = malloc'd uint8[3*N] (free via da_capi_free_bytes).
CapiPoints func(handle uintptr, imagePath string, confThresh float32,
outN *int32, outXyz **float32, outRgb **byte) int32
// da_capi_free_bytes(unsigned char* p)
CapiFreeBytes func(p *byte)
// da_capi_export_glb(ctx, image_path, out_glb) -> 0 ok, -1 err
CapiExportGlb func(handle uintptr, imagePath string, outGlb string) int32
// da_capi_export_colmap(ctx, image_path, out_dir, binary) -> 0 ok, -1 err
CapiExportColmap func(handle uintptr, imagePath string, outDir string, binary int32) int32
)
type DepthAnythingCpp struct {
base.SingleThread
handle uintptr
}
// Load loads the GGUF model at opts.ModelFile (joined with opts.ModelPath if
// relative) and stores the da_ctx handle for later inference calls.
func (r *DepthAnythingCpp) Load(opts *pb.ModelOptions) error {
modelFile := opts.ModelFile
if modelFile == "" {
modelFile = opts.Model
}
if modelFile == "" {
return fmt.Errorf("depth-anything-cpp: ModelFile is empty")
}
resolve := func(name string) string {
if filepath.IsAbs(name) {
return name
}
return filepath.Join(opts.ModelPath, name)
}
modelPath := resolve(modelFile)
if _, err := os.Stat(modelPath); err != nil {
return fmt.Errorf("depth-anything-cpp: model file not found: %s: %w", modelPath, err)
}
// Nested metric models are a two-file pair: the main model is the anyview
// (GIANT) branch and the metric (ViT-L + DPT/sky) branch is named via a
// "metric_model:<filename>" entry in opts.Options. When present we load both
// branches so the engine runs the nested metric alignment.
metricFile := optionValue(opts.Options, "metric_model")
threads := opts.Threads
if threads <= 0 {
threads = 4
}
// Release previous model if any (re-Load).
if r.handle != 0 {
CapiFree(r.handle)
r.handle = 0
}
var h uintptr
if metricFile != "" {
metricPath := resolve(metricFile)
if _, err := os.Stat(metricPath); err != nil {
return fmt.Errorf("depth-anything-cpp: metric_model file not found: %s: %w", metricPath, err)
}
h = CapiLoadNested(modelPath, metricPath, threads)
if h == 0 {
if msg := CapiLastError(0); msg != "" {
return fmt.Errorf("depth-anything-cpp: da_capi_load_nested failed for %s + %s: %s", modelPath, metricPath, msg)
}
return fmt.Errorf("depth-anything-cpp: da_capi_load_nested failed for %s + %s", modelPath, metricPath)
}
} else {
h = CapiLoad(modelPath, threads)
if h == 0 {
// da_capi_last_error needs a ctx; on a failed load we have none (it
// returns "" for a null ctx), so the text is best-effort.
if msg := CapiLastError(0); msg != "" {
return fmt.Errorf("depth-anything-cpp: da_capi_load failed for %s: %s", modelPath, msg)
}
return fmt.Errorf("depth-anything-cpp: da_capi_load failed for %s", modelPath)
}
}
r.handle = h
return nil
}
// optionValue returns the value of the first "key:value" entry in opts whose key
// matches (case-sensitive), or "" if absent. Mirrors how other LocalAI backends
// read ModelOptions.Options.
func optionValue(opts []string, key string) string {
prefix := key + ":"
for _, o := range opts {
if strings.HasPrefix(o, prefix) {
return strings.TrimSpace(o[len(prefix):])
}
}
return ""
}
// depthResult is the JSON payload returned by Predict.
type depthResult struct {
DepthW int `json:"depth_w"`
DepthH int `json:"depth_h"`
DepthMin float32 `json:"depth_min"`
DepthMax float32 `json:"depth_max"`
Extrinsics [12]float32 `json:"extrinsics"` // 3x4 row-major
Intrinsics [9]float32 `json:"intrinsics"` // 3x3 row-major
}
// Predict runs depth+pose on the first supplied image and returns depth
// statistics + camera pose as a JSON string. LocalAI wraps the string into the
// Reply.Message of the gRPC response. The image in Images[0] may be a
// filesystem path or a base64-encoded payload.
func (r *DepthAnythingCpp) Predict(opts *pb.PredictOptions) (string, error) {
imgs := opts.GetImages()
if len(imgs) == 0 {
return "", fmt.Errorf("depth-anything-cpp: Predict requires an image in Images[]")
}
imgPath, cleanup, err := materializeImage(imgs[0])
if err != nil {
return "", fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
depth, h, w, ext, intr, err := r.runDepthPose(imgPath)
if err != nil {
return "", err
}
dmin, dmax := minMax(depth)
payload, err := json.Marshal(depthResult{
DepthW: w, DepthH: h,
DepthMin: dmin, DepthMax: dmax,
Extrinsics: ext, Intrinsics: intr,
})
if err != nil {
return "", fmt.Errorf("depth-anything-cpp: marshal: %w", err)
}
return string(payload), nil
}
// GenerateImage runs depth on req.Src and writes a normalised grayscale depth
// PNG to req.Dst.
func (r *DepthAnythingCpp) GenerateImage(req *pb.GenerateImageRequest) error {
if req.GetSrc() == "" {
return fmt.Errorf("depth-anything-cpp: GenerateImage requires src")
}
if req.GetDst() == "" {
return fmt.Errorf("depth-anything-cpp: GenerateImage requires dst")
}
imgPath, cleanup, err := materializeImage(req.GetSrc())
if err != nil {
return fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
depth, h, w, _, _, err := r.runDepthPose(imgPath)
if err != nil {
return err
}
return writeDepthPNG(req.GetDst(), depth, h, w)
}
// Depth is the typed Depth RPC. It runs the Depth Anything 3 pipeline on the
// request's src image and fills a DepthResponse honoring the include_* flags and
// exports: per-pixel metric depth + confidence (DualDPT) or depth + sky (mono),
// camera extrinsics/intrinsics, an optional back-projected 3D point cloud and
// glb/COLMAP exports. The src may be a filesystem path or a base64 payload.
func (r *DepthAnythingCpp) Depth(in *pb.DepthRequest) (pb.DepthResponse, error) {
// Accumulate into locals and return a single composite literal at the end:
// returning a named pb.DepthResponse value would copy its embedded mutex
// (go vet copylocks).
if r.handle == 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: model not loaded")
}
if in.GetSrc() == "" {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: Depth requires src")
}
imgPath, cleanup, err := materializeImage(in.GetSrc())
if err != nil {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: %w", err)
}
defer cleanup()
// Dense per-pixel output + pose. Pass buffer pointers only for the
// requested maps so the native side can skip unrequested work; ext/intr
// must always point at 12/9 floats per the C ABI.
var (
h, w, isMetric int32
depthPtr, confPtr *float32
skyPtr *float32
ext [12]float32
intr [9]float32
pDepth, pConf, pSky **float32
)
if in.GetIncludeDepth() {
pDepth = &depthPtr
}
if in.GetIncludeConfidence() {
pConf = &confPtr
}
if in.GetIncludeSky() {
pSky = &skyPtr
}
rc := CapiDepthDense(r.handle, imgPath, &h, &w, pDepth, pConf, pSky, &ext[0], &intr[0], &isMetric)
if rc != 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: da_capi_depth_dense failed (rc=%d): %s", rc, r.lastError())
}
n := int(h) * int(w)
var (
depth, conf, sky []float32
extrinsics, intrinsic []float32
numPoints int32
points []float32
pointColors []byte
exportPaths []string
)
if depthPtr != nil {
depth = copyFloats(depthPtr, n)
CapiFreeFloats(depthPtr)
}
if confPtr != nil {
conf = copyFloats(confPtr, n)
CapiFreeFloats(confPtr)
}
if skyPtr != nil {
sky = copyFloats(skyPtr, n)
CapiFreeFloats(skyPtr)
}
if in.GetIncludePose() {
extrinsics = append([]float32(nil), ext[:]...)
intrinsic = append([]float32(nil), intr[:]...)
}
// 3D point cloud (DualDPT / pose-capable models only).
if in.GetIncludePoints() {
var (
np int32
xyzPtr *float32
rgbPtr *byte
)
if rc := CapiPoints(r.handle, imgPath, in.GetPointsConfThresh(), &np, &xyzPtr, &rgbPtr); rc != 0 {
return pb.DepthResponse{}, fmt.Errorf("depth-anything-cpp: da_capi_points failed (rc=%d): %s", rc, r.lastError())
}
numPoints = np
if xyzPtr != nil {
points = copyFloats(xyzPtr, int(np)*3)
CapiFreeFloats(xyzPtr)
}
if rgbPtr != nil {
pointColors = copyBytes(rgbPtr, int(np)*3)
CapiFreeBytes(rgbPtr)
}
}
// Exports (glb / colmap). They are written under in.Dst (a directory); a
// temp dir is used when Dst is empty.
if len(in.GetExports()) > 0 {
exportPaths, err = r.runExports(imgPath, in.GetDst(), in.GetExports())
if err != nil {
return pb.DepthResponse{}, err
}
}
return pb.DepthResponse{
Width: w,
Height: h,
Depth: depth,
Confidence: conf,
Sky: sky,
Extrinsics: extrinsics,
Intrinsics: intrinsic,
NumPoints: numPoints,
Points: points,
PointColors: pointColors,
ExportPaths: exportPaths,
IsMetric: isMetric != 0,
}, nil
}
// runExports writes the requested exports for imgPath into dstDir and returns
// the written paths. Supported exports: "glb", "colmap".
func (r *DepthAnythingCpp) runExports(imgPath, dstDir string, exports []string) ([]string, error) {
if dstDir == "" {
tmp, err := os.MkdirTemp("", "depth-anything-export-*")
if err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir export dir: %w", err)
}
dstDir = tmp
} else if err := os.MkdirAll(dstDir, 0o750); err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir %s: %w", dstDir, err)
}
var paths []string
for _, exp := range exports {
switch exp {
case "glb":
out := filepath.Join(dstDir, "pointcloud.glb")
if rc := CapiExportGlb(r.handle, imgPath, out); rc != 0 {
return nil, fmt.Errorf("depth-anything-cpp: da_capi_export_glb failed (rc=%d): %s", rc, r.lastError())
}
paths = append(paths, out)
case "colmap":
out := filepath.Join(dstDir, "colmap")
if err := os.MkdirAll(out, 0o750); err != nil {
return nil, fmt.Errorf("depth-anything-cpp: mkdir %s: %w", out, err)
}
if rc := CapiExportColmap(r.handle, imgPath, out, 1); rc != 0 {
return nil, fmt.Errorf("depth-anything-cpp: da_capi_export_colmap failed (rc=%d): %s", rc, r.lastError())
}
paths = append(paths, out)
default:
return nil, fmt.Errorf("depth-anything-cpp: unknown export %q (want glb|colmap)", exp)
}
}
return paths, nil
}
// copyFloats copies n float32 values from a C heap pointer into a fresh Go
// slice so the C buffer can be freed afterwards.
func copyFloats(p *float32, n int) []float32 {
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
out := make([]float32, n)
copy(out, src)
return out
}
// copyBytes copies n bytes from a C heap pointer into a fresh Go slice.
func copyBytes(p *byte, n int) []byte {
if p == nil || n <= 0 {
return nil
}
src := unsafe.Slice(p, n)
out := make([]byte, n)
copy(out, src)
return out
}
// runDepthPose runs depth estimation then pose recovery on an image file. It
// returns the row-major depth map (length h*w), its dimensions, the 3x4
// extrinsics (12 floats) and 3x3 intrinsics (9 floats).
// runDepthPose returns depth + camera pose via two C-API calls (depth then pose).
// For a nested metric model both calls run the full two-branch pipeline, so this
// path infers twice; the typed Depth RPC (single da_capi_depth_dense call) is the
// efficient path for nested models.
func (r *DepthAnythingCpp) runDepthPose(imagePath string) (depth []float32, h, w int, ext [12]float32, intr [9]float32, err error) {
if r.handle == 0 {
err = fmt.Errorf("depth-anything-cpp: model not loaded")
return
}
var ch, cw int32
ptr := CapiDepthPath(r.handle, imagePath, &ch, &cw)
if ptr == nil {
err = fmt.Errorf("depth-anything-cpp: da_capi_depth_path failed: %s", r.lastError())
return
}
h, w = int(ch), int(cw)
n := h * w
if n > 0 {
src := unsafe.Slice(ptr, n)
depth = make([]float32, n)
copy(depth, src)
}
CapiFreeFloats(ptr)
if rc := CapiPosePath(r.handle, imagePath, &ext[0], &intr[0]); rc != 0 {
err = fmt.Errorf("depth-anything-cpp: da_capi_pose_path failed (rc=%d): %s", rc, r.lastError())
return
}
return
}
// lastError returns the context's last error string, or "" if none.
func (r *DepthAnythingCpp) lastError() string {
if CapiLastError == nil || r.handle == 0 {
return ""
}
return CapiLastError(r.handle)
}
// materializeImage returns a filesystem path for an image argument that may be
// either an existing path or a base64-encoded payload. When the input is
// base64 it is decoded into a temp file; cleanup removes it (no-op for a path).
func materializeImage(arg string) (path string, cleanup func(), err error) {
cleanup = func() {}
if _, statErr := os.Stat(arg); statErr == nil {
return arg, cleanup, nil
}
// Strip an optional data URL prefix (data:image/...;base64,<payload>).
b64 := arg
if i := indexComma(b64); i >= 0 && hasDataPrefix(b64) {
b64 = b64[i+1:]
}
data, decErr := base64.StdEncoding.DecodeString(b64)
if decErr != nil {
return "", cleanup, fmt.Errorf("image is neither an existing path nor valid base64: %v", decErr)
}
f, tErr := os.CreateTemp("", "depth-anything-*.img")
if tErr != nil {
return "", cleanup, tErr
}
if _, wErr := f.Write(data); wErr != nil {
_ = f.Close()
_ = os.Remove(f.Name())
return "", cleanup, wErr
}
_ = f.Close()
name := f.Name()
return name, func() { _ = os.Remove(name) }, nil
}
func hasDataPrefix(s string) bool {
return len(s) >= 5 && s[:5] == "data:"
}
func indexComma(s string) int {
for i := 0; i < len(s); i++ {
if s[i] == ',' {
return i
}
}
return -1
}
// writeDepthPNG min-max normalises a depth map and writes it as an 8-bit
// grayscale PNG. Near = bright (255), far = dark (0), matching the usual
// depth-map convention for inverse-depth-like outputs.
func writeDepthPNG(dst string, depth []float32, h, w int) error {
if h <= 0 || w <= 0 || len(depth) < h*w {
return fmt.Errorf("depth-anything-cpp: writeDepthPNG: bad dims h=%d w=%d len=%d", h, w, len(depth))
}
dmin, dmax := minMax(depth)
span := dmax - dmin
if span <= 0 || math.IsNaN(float64(span)) {
span = 1
}
img := image.NewGray(image.Rect(0, 0, w, h))
for y := 0; y < h; y++ {
for x := 0; x < w; x++ {
v := depth[y*w+x]
n := (v - dmin) / span // 0..1
if math.IsNaN(float64(n)) {
n = 0
}
if n < 0 {
n = 0
} else if n > 1 {
n = 1
}
img.Pix[y*img.Stride+x] = uint8(n * 255)
}
}
// dst is the gRPC-provided output path chosen by the LocalAI core (the
// intended write destination for the rendered depth map), not
// attacker-controlled input, so the variable path is expected here.
f, err := os.Create(dst) // #nosec G304
if err != nil {
return err
}
defer func() { _ = f.Close() }()
return png.Encode(f, img)
}
func minMax(v []float32) (mn, mx float32) {
if len(v) == 0 {
return 0, 0
}
mn, mx = v[0], v[0]
for _, x := range v {
if math.IsNaN(float64(x)) || math.IsInf(float64(x), 0) {
continue
}
if x < mn {
mn = x
}
if x > mx {
mx = x
}
}
return mn, mx
}

View File

@@ -1,62 +0,0 @@
package main
// main.go - entry point for the depth-anything-cpp gRPC backend.
//
// Dlopens libdepthanythingcpp-<variant>.so via purego at the path in
// DEPTHANYTHING_LIBRARY (set by run.sh based on /proc/cpuinfo), registers the
// da_capi_* C ABI symbols, then starts the gRPC server.
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("DEPTHANYTHING_LIBRARY")
if libName == "" {
libName = "./libdepthanythingcpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CapiLoad, "da_capi_load"},
{&CapiLoadNested, "da_capi_load_nested"},
{&CapiFree, "da_capi_free"},
{&CapiLastError, "da_capi_last_error"},
{&CapiDepthPath, "da_capi_depth_path"},
{&CapiFreeFloats, "da_capi_free_floats"},
{&CapiPosePath, "da_capi_pose_path"},
{&CapiDepthDense, "da_capi_depth_dense"},
{&CapiPoints, "da_capi_points"},
{&CapiFreeBytes, "da_capi_free_bytes"},
{&CapiExportGlb, "da_capi_export_glb"},
{&CapiExportColmap, "da_capi_export_colmap"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &DepthAnythingCpp{}); err != nil {
panic(err)
}
}

View File

@@ -1,167 +0,0 @@
package main
// main_test.go - end-to-end smoke test for the depth-anything-cpp gRPC backend.
//
// Spawns the compiled depth-anything-cpp binary on a free local port, dials it
// via gRPC, and exercises LoadModel + Predict against the test fixtures
// downloaded by test.sh: the small (vits) f32 GGUF of Depth Anything 3 and a
// real photo. Asserts that Predict returns a JSON payload with a positive
// depth-map width/height.
//
// The spec Skip()s cleanly if its fixtures (the model, the test image, the
// built binary, or the fallback .so) are missing, so the test target stays
// usable on a fresh checkout / on CI runners where the model hasn't been
// downloaded.
import (
"context"
"encoding/base64"
"encoding/json"
"fmt"
"net"
"os"
"os/exec"
"path/filepath"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
func TestDepth(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "depth-anything-cpp backend smoke suite")
}
// freePort grabs an ephemeral TCP port and immediately releases it so the
// spawned backend can bind to it. There is a tiny TOCTOU window here but in
// practice it's adequate for a smoke test on a quiet runner.
func freePort() int {
l, err := net.Listen("tcp", "127.0.0.1:0")
Expect(err).ToNot(HaveOccurred(), "freePort listen")
port := l.Addr().(*net.TCPAddr).Port
Expect(l.Close()).To(Succeed())
return port
}
// startBackend spawns the depth-anything-cpp binary on the given port and waits
// until it accepts TCP connections (up to 10s). It mirrors how main.go resolves
// the purego library: the DEPTHANYTHING_LIBRARY env var points the dlopen at the
// freshly built fallback .so. The returned cleanup func kills the process.
func startBackend(port int) func() {
binary, err := filepath.Abs("./depth-anything-cpp")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(binary); err != nil {
Skip(fmt.Sprintf("backend binary not built: %s (run `make depth-anything-cpp` first)", binary))
}
libPath, err := filepath.Abs("./libdepthanythingcpp-fallback.so")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(libPath); err != nil {
Skip(fmt.Sprintf("fallback library not built: %s (run `make libdepthanythingcpp-fallback.so` first)", libPath))
}
addr := fmt.Sprintf("127.0.0.1:%d", port)
cmd := exec.Command(binary, "--addr", addr)
cmd.Env = append(os.Environ(), "DEPTHANYTHING_LIBRARY="+libPath)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
Expect(cmd.Start()).To(Succeed())
cleanup := func() {
if cmd.Process != nil {
_ = cmd.Process.Kill()
_, _ = cmd.Process.Wait()
}
}
deadline := time.Now().Add(10 * time.Second)
for time.Now().Before(deadline) {
c, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
if err == nil {
_ = c.Close()
return cleanup
}
time.Sleep(200 * time.Millisecond)
}
cleanup()
Fail(fmt.Sprintf("backend did not become ready on %s within 10s", addr))
return func() {}
}
// loadTestImage reads the test image downloaded by test.sh and returns its
// base64-encoded content (one of the wire formats accepted by Predict).
func loadTestImage() string {
imgPath, err := filepath.Abs("test-data/test.jpg")
Expect(err).ToNot(HaveOccurred())
imgBytes, err := os.ReadFile(imgPath)
if err != nil {
Skip(fmt.Sprintf("test image not present: %s (run test.sh first)", imgPath))
}
return base64.StdEncoding.EncodeToString(imgBytes)
}
// dialBackend opens a gRPC client connection to the spawned backend.
func dialBackend(port int) (pb.BackendClient, func()) {
addr := fmt.Sprintf("127.0.0.1:%d", port)
conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
Expect(err).ToNot(HaveOccurred())
return pb.NewBackendClient(conn), func() { _ = conn.Close() }
}
// modelPathOrSkip resolves the model file under ./test-models/ and Skip()s the
// current spec if it's missing (not present on a fresh checkout / on CI runners
// without the download).
func modelPathOrSkip(name string) string {
modelDir, err := filepath.Abs("test-models")
Expect(err).ToNot(HaveOccurred())
modelPath := filepath.Join(modelDir, name)
if _, err := os.Stat(modelPath); err != nil {
Skip(fmt.Sprintf("model not present: %s (run test.sh first)", modelPath))
}
return modelPath
}
var _ = Describe("depth-anything-cpp backend", func() {
It("runs depth+pose against a known-good image", func() {
modelPath := modelPathOrSkip("depth-anything-small-f32.gguf")
imgB64 := loadTestImage()
port := freePort()
cleanup := startBackend(port)
defer cleanup()
client, closeConn := dialBackend(port)
defer closeConn()
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Minute)
defer cancel()
loadResp, err := client.LoadModel(ctx, &pb.ModelOptions{
Model: "depth-anything-small-f32.gguf",
ModelFile: modelPath,
Threads: 4,
})
Expect(err).ToNot(HaveOccurred(), "LoadModel")
Expect(loadResp.GetSuccess()).To(BeTrue(), "LoadModel reported failure: %s", loadResp.GetMessage())
// Predict runs depth+pose and returns the JSON depthResult in Reply.Message.
reply, err := client.Predict(ctx, &pb.PredictOptions{
Images: []string{imgB64},
})
Expect(err).ToNot(HaveOccurred(), "Predict")
var res depthResult
Expect(json.Unmarshal(reply.GetMessage(), &res)).To(Succeed(), "Predict returned non-JSON: %q", string(reply.GetMessage()))
Expect(res.DepthW).To(BeNumerically(">", 0), "depth width should be positive")
Expect(res.DepthH).To(BeNumerically(">", 0), "depth height should be positive")
_, _ = fmt.Fprintf(GinkgoWriter, "depth OK: %dx%d min=%.3f max=%.3f\n",
res.DepthW, res.DepthH, res.DepthMin, res.DepthMax)
})
})

View File

@@ -1,64 +0,0 @@
package main
// nested_e2e_test.go - e2e smoke for the nested two-file metric model. Loads the
// anyview branch as the main model and points the metric branch via the
// "metric_model:<file>" option (exactly as the depth-anything-3-nested gallery
// entry does), then exercises the typed Depth RPC and asserts a metric depth map.
//
// Skips cleanly unless both nested GGUFs are present under ./test-models/ and the
// backend binary + fallback .so are built.
import (
"context"
"fmt"
"path/filepath"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("depth-anything-cpp nested metric model", func() {
It("loads the two-file pair via the metric_model option and returns metric depth", func() {
anyviewPath := modelPathOrSkip("depth-anything-nested-anyview.gguf")
_ = modelPathOrSkip("depth-anything-nested-metric.gguf")
imgB64 := loadTestImage()
port := freePort()
cleanup := startBackend(port)
defer cleanup()
client, closeConn := dialBackend(port)
defer closeConn()
ctx, cancel := context.WithTimeout(context.Background(), 25*time.Minute)
defer cancel()
loadResp, err := client.LoadModel(ctx, &pb.ModelOptions{
Model: "depth-anything-nested-anyview.gguf",
ModelFile: anyviewPath,
ModelPath: filepath.Dir(anyviewPath),
Options: []string{"metric_model:depth-anything-nested-metric.gguf"},
Threads: 8,
})
Expect(err).ToNot(HaveOccurred(), "LoadModel(nested)")
Expect(loadResp.GetSuccess()).To(BeTrue(), "LoadModel reported failure: %s", loadResp.GetMessage())
resp, err := client.Depth(ctx, &pb.DepthRequest{
Src: imgB64,
IncludeDepth: true,
IncludePose: true,
})
Expect(err).ToNot(HaveOccurred(), "Depth(nested)")
Expect(resp.GetWidth()).To(BeNumerically(">", 0), "depth width")
Expect(resp.GetHeight()).To(BeNumerically(">", 0), "depth height")
Expect(resp.GetIsMetric()).To(BeTrue(), "nested output must be metric")
Expect(len(resp.GetDepth())).To(Equal(int(resp.GetWidth())*int(resp.GetHeight())), "dense depth length")
Expect(len(resp.GetExtrinsics())).To(Equal(12), "extrinsics 3x4")
Expect(resp.GetIntrinsics()[0]).To(BeNumerically(">", 0), "fx > 0")
_, _ = fmt.Fprintf(GinkgoWriter, "nested depth OK: %dx%d is_metric=%v fx=%.2f\n",
resp.GetWidth(), resp.GetHeight(), resp.GetIsMetric(), resp.GetIntrinsics()[0])
})
})

View File

@@ -1,20 +0,0 @@
package main
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = DescribeTable("optionValue",
func(opts []string, key, want string) {
Expect(optionValue(opts, key)).To(Equal(want))
},
Entry("present", []string{"foo:bar", "metric_model:m.gguf"}, "metric_model", "m.gguf"),
Entry("absent", []string{"foo:bar"}, "metric_model", ""),
Entry("nil", []string(nil), "metric_model", ""),
Entry("trims space", []string{"metric_model: m.gguf "}, "metric_model", "m.gguf"),
Entry("value with colon", []string{"metric_model:a:b.gguf"}, "metric_model", "a:b.gguf"),
Entry("first wins", []string{"metric_model:first.gguf", "metric_model:second.gguf"}, "metric_model", "first.gguf"),
Entry("empty value", []string{"metric_model:"}, "metric_model", ""),
Entry("prefix not key", []string{"metric_model_extra:x"}, "metric_model", ""),
)

View File

@@ -1,59 +0,0 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,52 +0,0 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/depth-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/depth-anything-cpp "$@"

View File

@@ -1,45 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
echo "Running depth-anything-cpp backend tests..."
# Test model from the mudler/depth-anything.cpp-gguf HuggingFace repo. The small
# (vits) f32 GGUF is the lightest backbone (~131 MB), so it keeps the download
# cheap. It is resumed with `curl -C -` and skipped entirely if already present.
DEPTHANYTHING_MODEL_DIR="${DEPTHANYTHING_MODEL_DIR:-$CURDIR/test-models}"
DEPTHANYTHING_MODEL_FILE="${DEPTHANYTHING_MODEL_FILE:-depth-anything-small-f32.gguf}"
DEPTHANYTHING_MODEL_URL="${DEPTHANYTHING_MODEL_URL:-https://huggingface.co/mudler/depth-anything.cpp-gguf/resolve/main/depth-anything-small-f32.gguf}"
mkdir -p "$DEPTHANYTHING_MODEL_DIR"
if [ ! -f "$DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE" ]; then
echo "Downloading depth-anything small f32 model (~131 MB)..."
# -C - resumes a partial download so an interrupted run doesn't restart from 0.
curl -L -C - -o "$DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE" "$DEPTHANYTHING_MODEL_URL" --progress-bar
fi
# Use a real photo (people + cars) from the upstream rf-detr.cpp repo (~46 KB).
# Depth estimation needs real content; a synthetic image would be degenerate.
TEST_IMAGE_DIR="$CURDIR/test-data"
TEST_IMAGE_FILE="$TEST_IMAGE_DIR/test.jpg"
TEST_IMAGE_URL="${TEST_IMAGE_URL:-https://raw.githubusercontent.com/mudler/rf-detr.cpp/main/tests/fixtures/ci/test_image.jpg}"
mkdir -p "$TEST_IMAGE_DIR"
if [ ! -f "$TEST_IMAGE_FILE" ]; then
echo "Downloading test image..."
curl -L -o "$TEST_IMAGE_FILE" "$TEST_IMAGE_URL" --progress-bar
fi
echo "depth-anything-cpp test setup complete."
echo " model: $DEPTHANYTHING_MODEL_DIR/$DEPTHANYTHING_MODEL_FILE"
echo " test image: $TEST_IMAGE_FILE"
# Run the Go smoke test: spawns the backend binary on a free port, calls
# LoadModel + Predict via gRPC against the downloaded GGUF + image.
echo ""
echo "Running Go smoke test..."
cd "$CURDIR"
go test -v -timeout 30m ./...

View File

@@ -1,18 +0,0 @@
# Fetched upstream sources
sources/
# CMake build directories
build*/
# build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in face-detect.cpp upstream.
*.so
*.so.*
facedetect_capi.h
compile_commands.json
# Compiled backend binary
face-detect-grpc
# Packaging output
package/

View File

@@ -1,97 +0,0 @@
# face-detect backend Makefile.
#
# Upstream pin lives below as FACEDETECT_VERSION?=636a1963... (.github/bump_deps.sh
# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
# convention).
#
# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
# symlink the .so + header into this directory and skip the clone/cmake steps:
#
# ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
# ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
# go build -o face-detect-grpc .
#
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.
FACEDETECT_VERSION?=636a19631a400694a08edb7e707288003b7093aa
FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
BUILD_TYPE?=
NATIVE?=false
# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
# side, so only the facedetect_capi_* surface is exported.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
endif
.PHONY: face-detect-grpc package build clean purge test all
all: face-detect-grpc
# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
# bump, run 'make purge && make' to refetch.
sources/face-detect.cpp:
mkdir -p sources/face-detect.cpp
cd sources/face-detect.cpp && \
git init -q && \
git remote add origin $(FACEDETECT_REPO) && \
git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1 --single-branch
# Build the shared lib + header out-of-tree, then stage them next to the Go
# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
# them up.
libfacedetect.so: sources/face-detect.cpp
cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./
face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .
package: face-detect-grpc
bash package.sh
build: package
# Test target. The embed/detect/verify/analyze smoke specs are gated on
# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
# heavy specs auto-skip and only the pure-Go parsing specs run.
test:
LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
clean: purge
rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc
purge:
rm -rf sources/face-detect.cpp

View File

@@ -1,416 +0,0 @@
package main
import (
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"math"
"os"
"path/filepath"
"strings"
"time"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/xlog"
)
// purego-bound entry points from libfacedetect.so. Names match
// facedetect_capi.h exactly so a `nm libfacedetect.so | grep facedetect_capi`
// is enough to spot drift.
//
// The opaque ctx and the malloc'd char*/float* return values are declared as
// uintptr so we get the raw pointer back and can release it via the matching
// capi free function. purego's native string/[]float32 returns would copy and
// forget the original pointer, leaking the C-owned buffer on every call.
var (
CppAbiVersion func() int32
CppLoad func(ggufPath string) uintptr
CppFree func(ctx uintptr)
CppLastError func(ctx uintptr) string
CppFreeString func(s uintptr)
CppFreeVec func(v uintptr)
CppEmbedPath func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
CppEmbedRGB func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
CppDetectJSON func(ctx uintptr, imagePath string) uintptr
CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
)
// FaceDetect implements the face-recognition (biometric) subset of the Backend
// gRPC service over libfacedetect.so. The C side keeps a single loaded model
// pack plus a per-ctx last-error buffer and is not reentrant, so
// base.SingleThread serializes every call.
type FaceDetect struct {
base.SingleThread
opts loadOptions
ctxPtr uintptr
}
func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
}
if model == "" {
return errors.New("face-detect: ModelFile is required")
}
f.opts = parseOptions(opts.Options)
if f.opts.modelName == "" {
f.opts.modelName = filepath.Base(model)
}
xlog.Info("face-detect: loading model", "model", model,
"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())
ctx := CppLoad(model)
if ctx == 0 {
// The last-error buffer lives on the ctx that was never returned, so
// surface the path the operator tried to load instead.
return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
}
f.ctxPtr = ctx
return nil
}
// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
// the supplied image. Mirroring the Python face backend, the image is read from
// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
// server wraps the returned slice in an EmbeddingResult.
func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
if f.ctxPtr == 0 {
return nil, errors.New("face-detect: model not loaded")
}
if len(req.Images) == 0 || req.Images[0] == "" {
return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
}
path, cleanup, err := materializeImage(req.Images[0])
if err != nil {
return nil, err
}
defer cleanup()
return f.embedPath(path)
}
func (f *FaceDetect) embedPath(path string) ([]float32, error) {
var vec uintptr
var dim int32
rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
if rc != 0 || vec == 0 || dim <= 0 {
return nil, f.lastErr("embed", path)
}
defer CppFreeVec(vec)
// Copy out of the C-owned malloc'd buffer before freeing it. The
// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
// nor moves this buffer and we copy immediately.
src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
out := make([]float32, int(dim))
copy(out, src)
return out, nil
}
// Detect runs SCRFD over the image and returns one Detection per face. The
// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
// width/height, so the corners are converted. The 5 facial landmarks the engine
// also returns are dropped: the Detection message has no field for them.
func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
if f.ctxPtr == 0 {
return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
}
if req.Src == "" {
return pb.DetectResponse{}, errors.New("face-detect: src image is required")
}
path, cleanup, err := materializeImage(req.Src)
if err != nil {
return pb.DetectResponse{}, err
}
defer cleanup()
faces, err := f.detectFaces(path)
if err != nil {
return pb.DetectResponse{}, err
}
dets := make([]*pb.Detection, 0, len(faces))
for _, fc := range faces {
if req.Threshold > 0 && fc.Score < req.Threshold {
continue
}
x, y, w, h := fc.xywh()
dets = append(dets, &pb.Detection{
X: x,
Y: y,
Width: w,
Height: h,
Confidence: fc.Score,
ClassName: "face",
})
}
return pb.DetectResponse{Detections: dets}, nil
}
// FaceVerify embeds the primary face in each image and reports whether they are
// the same identity by cosine distance against a threshold. A request threshold
// <= 0 falls back to the model-configured default (verify_threshold option,
// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
// veto internally (verified forced false on a spoof); the per-image liveness
// scores are not exposed by the verify entry point, so img*_is_real /
// img*_antispoof_score stay at their zero values.
func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
if f.ctxPtr == 0 {
return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
}
if req.Img1 == "" || req.Img2 == "" {
return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
}
path1, cleanup1, err := materializeImage(req.Img1)
if err != nil {
return pb.FaceVerifyResponse{}, err
}
defer cleanup1()
path2, cleanup2, err := materializeImage(req.Img2)
if err != nil {
return pb.FaceVerifyResponse{}, err
}
defer cleanup2()
threshold := req.Threshold
if threshold <= 0 {
threshold = f.opts.verifyThreshold
}
antiSpoof := int32(0)
if req.AntiSpoofing {
antiSpoof = 1
}
started := time.Now()
var distance float32
var verified int32
rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
unsafe.Pointer(&distance), unsafe.Pointer(&verified))
if rc != 0 {
return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
}
elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
// matching the Python face backend's reporting.
confidence := float32(0)
if threshold > 0 {
confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
}
return pb.FaceVerifyResponse{
Verified: verified != 0,
Distance: distance,
Threshold: threshold,
Confidence: confidence,
Model: f.opts.modelName,
Img1Area: f.bestArea(path1),
Img2Area: f.bestArea(path2),
ProcessingTimeMs: elapsedMs,
}, nil
}
// FaceAnalyze runs the genderage head on every detected face. The C-API returns
// "M"/"F" gender labels and a rounded age; the labels are normalized to the
// "Man"/"Woman" values the proto documents.
func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
if f.ctxPtr == 0 {
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
}
if req.Img == "" {
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
}
path, cleanup, err := materializeImage(req.Img)
if err != nil {
return pb.FaceAnalyzeResponse{}, err
}
defer cleanup()
ptr := CppAnalyzeJSON(f.ctxPtr, path)
if ptr == 0 {
return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
}
defer CppFreeString(ptr)
faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
if err != nil {
return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
}
return pb.FaceAnalyzeResponse{Faces: faces}, nil
}
// faceBox is one entry of the detect/analyze JSON documents the engine emits.
type faceBox struct {
Score float32 `json:"score"`
Box []float32 `json:"box"`
Age float32 `json:"age"`
Gender string `json:"gender"`
}
// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
// proto carries. A short or missing box yields zeros.
func (b faceBox) xywh() (x, y, w, h float32) {
if len(b.Box) < 4 {
return 0, 0, 0, 0
}
return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
}
type facesJSON struct {
Faces []faceBox `json:"faces"`
}
func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
ptr := CppDetectJSON(f.ctxPtr, path)
if ptr == 0 {
return nil, f.lastErr("detect", path)
}
defer CppFreeString(ptr)
var doc facesJSON
if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
}
return doc.Faces, nil
}
// bestArea returns the FacialArea of the highest-scoring face in an image, or an
// empty area when detection fails or finds nothing. Best-effort: verify already
// succeeded, so a missing region must not turn a valid match into an error.
func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
faces, err := f.detectFaces(path)
if err != nil || len(faces) == 0 {
return &pb.FacialArea{}
}
best := faces[0]
for _, fc := range faces[1:] {
if fc.Score > best.Score {
best = fc
}
}
x, y, w, h := best.xywh()
return &pb.FacialArea{X: x, Y: y, W: w, H: h}
}
// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
// The engine reports gender as "M"/"F"; both the dominant label and the score
// map are filled with the "Man"/"Woman" form the proto documents.
func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
var parsed facesJSON
if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
return nil, err
}
out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
for _, fc := range parsed.Faces {
x, y, w, h := fc.xywh()
fa := &pb.FaceAnalysis{
Region: &pb.FacialArea{X: x, Y: y, W: w, H: h},
FaceConfidence: fc.Score,
Age: fc.Age,
}
if label := normalizeGender(fc.Gender); label != "" {
fa.DominantGender = label
fa.Gender = map[string]float32{label: 1.0}
}
out = append(out, fa)
}
return out, nil
}
// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
// proto documents. Unknown codes pass through unchanged.
func normalizeGender(g string) string {
switch strings.ToUpper(strings.TrimSpace(g)) {
case "M":
return "Man"
case "F":
return "Woman"
case "":
return ""
default:
return g
}
}
// materializeImage decodes a base64 image payload into a temp file and returns
// its path plus a cleanup func. As a convenience for callers that already pass a
// filesystem path (e.g. a test fixture), an existing path is used as-is with a
// no-op cleanup. data: URI prefixes are stripped before decoding.
func materializeImage(src string) (path string, cleanup func(), err error) {
noop := func() {}
if src == "" {
return "", noop, errors.New("face-detect: empty image input")
}
if _, statErr := os.Stat(src); statErr == nil {
return src, noop, nil
}
payload := src
if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
payload = payload[i+1:]
}
data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
if decErr != nil || len(data) == 0 {
return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
}
tmp, createErr := os.CreateTemp("", "face-detect-*.img")
if createErr != nil {
return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
}
cleanup = func() { _ = os.Remove(tmp.Name()) }
if _, wErr := tmp.Write(data); wErr != nil {
_ = tmp.Close()
cleanup()
return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
}
if cErr := tmp.Close(); cErr != nil {
cleanup()
return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
}
return tmp.Name(), cleanup, nil
}
// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
func (f *FaceDetect) lastErr(op, subject string) error {
msg := strings.TrimSpace(CppLastError(f.ctxPtr))
if msg == "" {
msg = "no error detail"
}
return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
}
// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
//
// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
// moves the buffer and we dereference it immediately to copy the bytes out.
func goStringFromCPtr(cptr uintptr) string {
if cptr == 0 {
return ""
}
p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
n := 0
for *(*byte)(unsafe.Add(p, n)) != 0 {
n++
}
return string(unsafe.Slice((*byte)(p), n))
}

View File

@@ -1,230 +0,0 @@
package main
import (
"encoding/base64"
"os"
"sync"
"testing"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestFaceDetect(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "face-detect Backend Suite")
}
var (
libLoadOnce sync.Once
libLoadErr error
)
// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
// bridge without spinning up the gRPC server. Records the error (the smoke
// specs skip themselves) when libfacedetect.so is not loadable from cwd
// (LD_LIBRARY_PATH or a symlink in ./).
func ensureLibLoaded() error {
libLoadOnce.Do(func() {
libName := os.Getenv("FACEDETECT_LIBRARY")
if libName == "" {
libName = "libfacedetect.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
libLoadErr = err
return
}
purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
})
return libLoadErr
}
var _ = Describe("parseOptions", func() {
It("defaults verify_threshold to 0.35", func() {
o := parseOptions(nil)
Expect(o.verifyThreshold).To(Equal(float32(0.35)))
Expect(o.modelName).To(Equal(""))
})
It("parses verify_threshold, threshold alias and model_name", func() {
o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
Expect(o.verifyThreshold).To(Equal(float32(0.4)))
Expect(o.modelName).To(Equal("buffalo_l"))
o2 := parseOptions([]string{"threshold:0.3"})
Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
})
It("ignores non-positive thresholds and keeps the default", func() {
o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
Expect(o.verifyThreshold).To(Equal(float32(0.35)))
})
})
var _ = Describe("normalizeGender", func() {
It("maps M/F codes to Man/Woman", func() {
Expect(normalizeGender("M")).To(Equal("Man"))
Expect(normalizeGender("f")).To(Equal("Woman"))
Expect(normalizeGender(" m ")).To(Equal("Man"))
})
It("passes empty and unknown codes through", func() {
Expect(normalizeGender("")).To(Equal(""))
Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
})
})
var _ = Describe("faceBox.xywh", func() {
It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
b := faceBox{Box: []float32{10, 20, 50, 80}}
x, y, w, h := b.xywh()
Expect(x).To(Equal(float32(10)))
Expect(y).To(Equal(float32(20)))
Expect(w).To(Equal(float32(40)))
Expect(h).To(Equal(float32(60)))
})
It("returns zeros for a short box", func() {
x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
})
})
var _ = Describe("parseAnalyzeJSON", func() {
It("maps region, age and gender for each face", func() {
doc := `{"faces":[
{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
faces, err := parseAnalyzeJSON(doc)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(HaveLen(2))
Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
Expect(faces[0].DominantGender).To(Equal("Man"))
Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
Expect(faces[0].Region.W).To(Equal(float32(40)))
Expect(faces[0].Region.H).To(Equal(float32(60)))
Expect(faces[1].DominantGender).To(Equal("Woman"))
})
It("tolerates a missing gender field", func() {
faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(HaveLen(1))
Expect(faces[0].DominantGender).To(Equal(""))
Expect(faces[0].Gender).To(BeEmpty())
})
It("returns no faces for an empty document", func() {
faces, err := parseAnalyzeJSON(`{"faces":[]}`)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(BeEmpty())
})
It("returns an error on malformed JSON", func() {
_, err := parseAnalyzeJSON(`{not-json`)
Expect(err).To(HaveOccurred())
})
})
var _ = Describe("materializeImage", func() {
It("decodes a base64 payload to a temp file", func() {
payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
path, cleanup, err := materializeImage(payload)
Expect(err).ToNot(HaveOccurred())
defer cleanup()
data, rerr := os.ReadFile(path)
Expect(rerr).ToNot(HaveOccurred())
Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
})
It("strips a data: URI prefix before decoding", func() {
payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
path, cleanup, err := materializeImage(payload)
Expect(err).ToNot(HaveOccurred())
defer cleanup()
data, rerr := os.ReadFile(path)
Expect(rerr).ToNot(HaveOccurred())
Expect(data).To(Equal([]byte("hello")))
})
It("uses an existing path as-is", func() {
tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
Expect(err).ToNot(HaveOccurred())
defer func() { _ = os.Remove(tmp.Name()) }()
Expect(tmp.Close()).To(Succeed())
path, cleanup, err := materializeImage(tmp.Name())
Expect(err).ToNot(HaveOccurred())
defer cleanup()
Expect(path).To(Equal(tmp.Name()))
})
It("errors on input that is neither a path nor base64", func() {
_, _, err := materializeImage("not base64!!!")
Expect(err).To(HaveOccurred())
})
})
// The specs below exercise the real C-API end to end. They run only when both a
// model GGUF and a test image are provided, and skip cleanly otherwise so the
// suite stays green without large assets.
var _ = Describe("FaceDetect end-to-end", Ordered, func() {
var (
f *FaceDetect
modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
)
BeforeAll(func() {
if modelPath == "" || imagePath == "" {
Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
}
if err := ensureLibLoaded(); err != nil {
Skip("libfacedetect.so not loadable: " + err.Error())
}
f = &FaceDetect{}
Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
})
It("embeds the primary face in an image", func() {
emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
Expect(err).ToNot(HaveOccurred())
Expect(emb).ToNot(BeEmpty())
})
It("detects at least one face", func() {
resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Detections).ToNot(BeEmpty())
Expect(resp.Detections[0].ClassName).To(Equal("face"))
})
It("verifies an image against itself as the same identity", func() {
resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Verified).To(BeTrue())
Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
})
It("analyzes age/gender for each face", func() {
resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Faces).ToNot(BeEmpty())
})
})

View File

@@ -1,65 +0,0 @@
package main
// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libfacedetect.so via purego and registers the flat C-API entry points
// declared in facedetect_capi.h. The library name can be overridden with
// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
// convention in the sibling backends); the default looks for the .so next to
// this binary (resolved via LD_LIBRARY_PATH by run.sh).
import (
"flag"
"fmt"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
libName := os.Getenv("FACEDETECT_LIBRARY")
if libName == "" {
libName = "libfacedetect.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
}
// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
// uintptr so the raw pointer can be freed via the matching capi free fn.
libFuncs := []LibFuncs{
{&CppAbiVersion, "facedetect_capi_abi_version"},
{&CppLoad, "facedetect_capi_load"},
{&CppFree, "facedetect_capi_free"},
{&CppLastError, "facedetect_capi_last_error"},
{&CppFreeString, "facedetect_capi_free_string"},
{&CppFreeVec, "facedetect_capi_free_vec"},
{&CppEmbedPath, "facedetect_capi_embed_path"},
{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
{&CppDetectJSON, "facedetect_capi_detect_path_json"},
{&CppVerifyPaths, "facedetect_capi_verify_paths"},
{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())
flag.Parse()
if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
panic(err)
}
}

View File

@@ -1,47 +0,0 @@
package main
import (
"strconv"
"strings"
)
// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
// face backend ships with so the two implementations agree on verdicts out of
// the box.
const defaultVerifyThreshold float32 = 0.35
// loadOptions holds the parsed model-level options for face-detect.
type loadOptions struct {
verifyThreshold float32
modelName string
}
func splitOption(o string) (key, value string, ok bool) {
i := strings.Index(o, ":")
if i < 0 {
return "", "", false
}
return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}
// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: verify_threshold 0.35, model_name derived from the file.
func parseOptions(opts []string) loadOptions {
o := loadOptions{verifyThreshold: defaultVerifyThreshold}
for _, oo := range opts {
key, value, ok := splitOption(oo)
if !ok {
continue
}
switch key {
case "verify_threshold", "threshold":
if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
o.verifyThreshold = float32(f)
}
case "model_name":
o.modelName = value
}
}
return o
}

View File

@@ -1,68 +0,0 @@
#!/bin/bash
#
# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
# is used instead of the host's.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
# LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
exit 1
}
# Detect architecture and copy the core runtime libs libfacedetect.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
# BUILD_TYPE so the backend can reach the GPU without the runtime base image
# shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -1,16 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the voice-detect /
# whisper / parakeet backends' runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
fi
exec "$CURDIR/face-detect-grpc" "$@"

View File

@@ -1,15 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
cd "$CURDIR"
echo "Running face-detect backend tests..."
# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
# specs run only when a model + image are provided via
# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
# auto-skip.
LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
echo "face-detect tests completed."

View File

@@ -1,7 +0,0 @@
sources/
build*/
package/
liblocateanythingcpp*.so
locate-anything-cpp
test-models/
test-data/

View File

@@ -1,57 +0,0 @@
cmake_minimum_required(VERSION 3.18)
project(liblocateanythingcpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# Static-link ggml + locate_anything so the resulting .so has no runtime
# dependency on extra ggml/locate_anything shared libraries — only on
# libc/libstdc++/libgomp, which the LocalAI package step bundles into the
# docker image.
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build static libraries" FORCE)
# locate-anything.cpp build switches: skip CLI/tests, keep static lib.
set(LA_BUILD_CLI OFF CACHE BOOL "Disable locate-anything CLI" FORCE)
set(LA_BUILD_TESTS OFF CACHE BOOL "Disable locate-anything tests" FORCE)
set(LA_SHARED OFF CACHE BOOL "Build locate_anything as static lib" FORCE)
# Unlike rt-detr.cpp, locate-anything.cpp ships no in-tree ggml patches, so
# there is no apply_ggml_patches.sh hook to shim here.
add_subdirectory(./sources/locate-anything.cpp)
# locate-anything.cpp's top-level CMakeLists points its own target's include
# dirs at ${CMAKE_SOURCE_DIR}/{include,src,third_party,...}. CMAKE_SOURCE_DIR
# is the *top-level* source dir of the whole CMake tree, so when we pull it in
# via add_subdirectory it resolves to OUR directory, not theirs, and the
# locate_anything target fails to find its own headers (la_capi.h, stb_image.h,
# la_gguf_keys.h). Re-add the correct, subdir-relative include paths to the
# already-defined target so it compiles regardless of where it's nested.
set(LA_SRC ${CMAKE_CURRENT_SOURCE_DIR}/sources/locate-anything.cpp)
target_include_directories(locate_anything PRIVATE
${LA_SRC}/include
${LA_SRC}/src
${LA_SRC}/third_party
${LA_SRC}/third_party/stb)
# locate-anything.cpp's C-API symbols already live inside liblocate_anything
# (src/la_capi.cpp is compiled into the lib). We re-export them via a MODULE
# library that links locate_anything so the symbols are visible at dlopen time.
add_library(locateanythingcpp MODULE
sources/locate-anything.cpp/src/la_capi.cpp)
target_include_directories(locateanythingcpp PRIVATE
sources/locate-anything.cpp/include
sources/locate-anything.cpp/src
sources/locate-anything.cpp/third_party
sources/locate-anything.cpp/third_party/stb
)
target_link_libraries(locateanythingcpp PRIVATE locate_anything ggml)
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9.0)
target_link_libraries(locateanythingcpp PRIVATE stdc++fs)
endif()
set_property(TARGET locateanythingcpp PROPERTY CXX_STANDARD 17)
set_target_properties(locateanythingcpp PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

View File

@@ -1,134 +0,0 @@
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# locate-anything.cpp. Pin to a specific commit for a stable build; leaving
# this on `master` always picks up the latest C-API surface (incl. the
# per-detection accessor functions used by golocateanythingcpp.go).
LOCATEANYTHING_REPO?=https://github.com/mudler/locate-anything.cpp.git
LOCATEANYTHING_VERSION?=92c1682da792c1e8a5dec91acc2be4b02c742ded
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# Forward LocalAI's BUILD_TYPE to the matching ggml backend switch.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON -DLA_GGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON
else ifeq ($(BUILD_TYPE),hipblas)
ROCM_HOME ?= /opt/rocm
ROCM_PATH ?= /opt/rocm
export CXX=$(ROCM_HOME)/llvm/bin/clang++
export CC=$(ROCM_HOME)/llvm/bin/clang
AMDGPU_TARGETS?=gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1200,gfx1201
CMAKE_ARGS+=-DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=$(AMDGPU_TARGETS)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DLA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
CMAKE_ARGS+=-DLA_GGML_METAL=ON
endif
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
endif
sources/locate-anything.cpp:
mkdir -p sources && \
git clone --recursive $(LOCATEANYTHING_REPO) sources/locate-anything.cpp && \
cd sources/locate-anything.cpp && \
git checkout $(LOCATEANYTHING_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
# Detect OS
UNAME_S := $(shell uname -s)
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = liblocateanythingcpp-fallback.so
endif
locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o locate-anything-cpp ./
package: locate-anything-cpp
bash package.sh
build: package
clean: purge
rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources
purge:
rm -rf build*
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
liblocateanythingcpp-avx.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:avx${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
liblocateanythingcpp-avx2.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:avx2${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:avx512${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
endif
# Build fallback variant (all platforms)
liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
liblocateanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)
all: locate-anything-cpp package
# `test` is invoked by the top-level Makefile's `test-extra` target. It builds
# the backend binary + the fallback shared library (needed for dlopen at
# runtime), then runs test.sh which downloads the q8_0 GGUF + COCO image and
# exercises the gRPC Load/Detect wire path via the Go smoke test in
# main_test.go.
test: locate-anything-cpp liblocateanythingcpp-fallback.so
bash test.sh

View File

@@ -1,174 +0,0 @@
package main
// golocateanythingcpp.go - gRPC handlers (Load, Detect) for the
// locate-anything-cpp backend.
//
// Embeds base.SingleThread to default unimplemented RPCs to "not supported"
// while we only implement open-vocabulary object detection (Detect).
import (
"encoding/base64"
"fmt"
"os"
"path/filepath"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// la_ctx* is an opaque handle. la_capi_load returns it directly (0 == failure),
// unlike rfdetr's out-parameter convention.
var (
// la_capi_load(const char* gguf_path, int n_threads) -> la_ctx* (0 = fail)
CapiLoad func(gguf string, nThreads int32) uintptr
// la_capi_free(la_ctx* ctx)
CapiFree func(handle uintptr)
// la_capi_locate_path(ctx, image_path, prompt, mode) -> char* json (0 = err)
CapiLocatePath func(handle uintptr, imagePath string, prompt string, mode int32) uintptr
// la_capi_locate_buffer(ctx, bytes, len, prompt, mode) -> char* json (0 = err)
CapiLocateBuffer func(handle uintptr, bytes uintptr, length uintptr, prompt string, mode int32) uintptr
// la_capi_get_n_detections(ctx) -> int
CapiGetNDetections func(handle uintptr) int32
// la_capi_get_detection_box(ctx, i, out_xyxy[4]) -> int (0 on success)
CapiGetDetectionBox func(handle uintptr, i int32, outXYXY uintptr) int32
// la_capi_get_detection_label(ctx, i, buf, buf_size) -> int (required size incl NUL; two-call sizing)
CapiGetDetectionLabel func(handle uintptr, i int32, buf uintptr, bufSize int32) int32
// la_capi_free_string(char* s)
CapiFreeString func(s uintptr)
// la_capi_last_error(ctx) -> const char* (owned by ctx, "" if none / null ctx).
// purego marshals the returned C string into a Go string (a copy), so we
// never free it and avoid raw pointer arithmetic.
CapiLastError func(handle uintptr) string
)
type LocateAnythingCpp struct {
base.SingleThread
handle uintptr
}
// Load loads the GGUF model at opts.ModelFile (joined with opts.ModelPath if
// relative) and stores the la_ctx handle for later Detect calls.
func (r *LocateAnythingCpp) Load(opts *pb.ModelOptions) error {
modelFile := opts.ModelFile
if modelFile == "" {
modelFile = opts.Model
}
if modelFile == "" {
return fmt.Errorf("locate-anything-cpp: ModelFile is empty")
}
var modelPath string
if filepath.IsAbs(modelFile) {
modelPath = modelFile
} else {
modelPath = filepath.Join(opts.ModelPath, modelFile)
}
if _, err := os.Stat(modelPath); err != nil {
return fmt.Errorf("locate-anything-cpp: model file not found: %s: %w", modelPath, err)
}
threads := opts.Threads
if threads <= 0 {
threads = 4
}
// Release previous model if any (re-Load).
if r.handle != 0 {
CapiFree(r.handle)
r.handle = 0
}
h := CapiLoad(modelPath, threads)
if h == 0 {
// la_capi_last_error needs a ctx; on a failed load we have none (it
// returns "" for a null ctx), so the text is best-effort. Surface it
// when present.
if msg := CapiLastError(0); msg != "" {
return fmt.Errorf("locate-anything-cpp: la_capi_load failed for %s: %s", modelPath, msg)
}
return fmt.Errorf("locate-anything-cpp: la_capi_load failed for %s", modelPath)
}
r.handle = h
return nil
}
// Detect runs open-vocabulary detection on the base64-encoded image in opts.Src
// using the required text prompt in opts.Prompt, returning one pb.Detection per
// located object with its predicted label as ClassName.
func (r *LocateAnythingCpp) Detect(opts *pb.DetectOptions) (pb.DetectResponse, error) {
if r.handle == 0 {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: model not loaded")
}
// Open-vocabulary detection is prompt-driven; without a prompt there is
// nothing to locate.
prompt := opts.Prompt
if prompt == "" {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: a text prompt is required (open-vocabulary detection)")
}
// Decode base64 image and write to temp file.
imgData, err := base64.StdEncoding.DecodeString(opts.Src)
if err != nil {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: failed to decode base64 image: %w", err)
}
tmpFile, err := os.CreateTemp("", "locate-anything-*.img")
if err != nil {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: failed to create temp file: %w", err)
}
defer func() { _ = os.Remove(tmpFile.Name()) }()
if _, err := tmpFile.Write(imgData); err != nil {
_ = tmpFile.Close()
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: failed to write temp file: %w", err)
}
if err := tmpFile.Close(); err != nil {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: failed to close temp file: %w", err)
}
// mode 0 = hybrid (Parallel Box Decoding). The JSON return value is unused:
// structured detections are read via the accessor functions. Still must
// free the returned string.
jsonPtr := CapiLocatePath(r.handle, tmpFile.Name(), prompt, 0)
if jsonPtr != 0 {
CapiFreeString(jsonPtr)
}
n := CapiGetNDetections(r.handle)
if n < 0 {
return pb.DetectResponse{}, fmt.Errorf("locate-anything-cpp: invalid n_detections=%d", n)
}
detections := make([]*pb.Detection, 0, n)
for i := int32(0); i < n; i++ {
var xyxy [4]float32 // x1, y1, x2, y2
if CapiGetDetectionBox(r.handle, i, uintptr(unsafe.Pointer(&xyxy[0]))) != 0 {
continue
}
// Two-call sizing for the label string.
label := ""
need := CapiGetDetectionLabel(r.handle, i, 0, 0)
if need > 0 {
buf := make([]byte, need)
CapiGetDetectionLabel(r.handle, i, uintptr(unsafe.Pointer(&buf[0])), need)
label = string(buf[:need-1])
}
detections = append(detections, &pb.Detection{
X: xyxy[0],
Y: xyxy[1],
Width: xyxy[2] - xyxy[0],
Height: xyxy[3] - xyxy[1],
Confidence: 1.0,
ClassName: label,
})
}
return pb.DetectResponse{
Detections: detections,
}, nil
}

View File

@@ -1,59 +0,0 @@
package main
// main.go - entry point for the locate-anything-cpp gRPC backend.
//
// Dlopens liblocateanythingcpp-<variant>.so via purego at the path in
// LOCATEANYTHING_LIBRARY (set by run.sh based on /proc/cpuinfo), registers
// the la_capi_* C ABI symbols, then starts the gRPC server.
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("LOCATEANYTHING_LIBRARY")
if libName == "" {
libName = "./liblocateanythingcpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CapiLoad, "la_capi_load"},
{&CapiFree, "la_capi_free"},
{&CapiLocatePath, "la_capi_locate_path"},
{&CapiLocateBuffer, "la_capi_locate_buffer"},
{&CapiGetNDetections, "la_capi_get_n_detections"},
{&CapiGetDetectionBox, "la_capi_get_detection_box"},
{&CapiGetDetectionLabel, "la_capi_get_detection_label"},
{&CapiFreeString, "la_capi_free_string"},
{&CapiLastError, "la_capi_last_error"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &LocateAnythingCpp{}); err != nil {
panic(err)
}
}

View File

@@ -1,176 +0,0 @@
package main
// main_test.go - end-to-end smoke test for the locate-anything-cpp gRPC backend.
//
// Spawns the compiled locate-anything-cpp binary on a free local port, dials it
// via gRPC, and exercises LoadModel + Detect against the test fixtures
// downloaded by test.sh: the q8_0 GGUF of nvidia/LocateAnything-3B and a real
// COCO image with people + cars. Asserts that open-vocabulary detection driven
// by a text prompt returns at least one detection, each carrying a non-empty
// class name and a bounding box of non-zero size.
//
// The spec Skip()s cleanly if its fixtures (the ~6.3 GB model, the test image,
// the built binary, or the fallback .so) are missing, so the test target stays
// usable on a fresh checkout / on CI runners where the large model hasn't been
// downloaded.
import (
"context"
"encoding/base64"
"fmt"
"net"
"os"
"os/exec"
"path/filepath"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
func TestDetect(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "locate-anything-cpp backend smoke suite")
}
// freePort grabs an ephemeral TCP port and immediately releases it so the
// spawned backend can bind to it. There is a tiny TOCTOU window here but in
// practice it's adequate for a smoke test on a quiet runner.
func freePort() int {
l, err := net.Listen("tcp", "127.0.0.1:0")
Expect(err).ToNot(HaveOccurred(), "freePort listen")
port := l.Addr().(*net.TCPAddr).Port
Expect(l.Close()).To(Succeed())
return port
}
// startBackend spawns the locate-anything-cpp binary on the given port and
// waits until it accepts TCP connections (up to 10s). It mirrors how main.go
// resolves the purego library: the LOCATEANYTHING_LIBRARY env var points the
// dlopen at the freshly built fallback .so, and the la_capi_* symbols are
// registered there. The returned cleanup func kills the process and reaps it.
func startBackend(port int) func() {
binary, err := filepath.Abs("./locate-anything-cpp")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(binary); err != nil {
Skip(fmt.Sprintf("backend binary not built: %s (run `make locate-anything-cpp` first)", binary))
}
libPath, err := filepath.Abs("./liblocateanythingcpp-fallback.so")
Expect(err).ToNot(HaveOccurred())
if _, err := os.Stat(libPath); err != nil {
Skip(fmt.Sprintf("fallback library not built: %s (run `make liblocateanythingcpp-fallback.so` first)", libPath))
}
addr := fmt.Sprintf("127.0.0.1:%d", port)
cmd := exec.Command(binary, "--addr", addr)
cmd.Env = append(os.Environ(), "LOCATEANYTHING_LIBRARY="+libPath)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
Expect(cmd.Start()).To(Succeed())
cleanup := func() {
if cmd.Process != nil {
_ = cmd.Process.Kill()
_, _ = cmd.Process.Wait()
}
}
deadline := time.Now().Add(10 * time.Second)
for time.Now().Before(deadline) {
c, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
if err == nil {
_ = c.Close()
return cleanup
}
time.Sleep(200 * time.Millisecond)
}
cleanup()
Fail(fmt.Sprintf("backend did not become ready on %s within 10s", addr))
return func() {}
}
// loadTestImage reads the COCO test image downloaded by test.sh and returns its
// base64-encoded content (the wire format accepted by the Detect RPC).
func loadTestImage() string {
imgPath, err := filepath.Abs("test-data/test.jpg")
Expect(err).ToNot(HaveOccurred())
imgBytes, err := os.ReadFile(imgPath)
if err != nil {
Skip(fmt.Sprintf("test image not present: %s (run test.sh first)", imgPath))
}
return base64.StdEncoding.EncodeToString(imgBytes)
}
// dialBackend opens a gRPC client connection to the spawned backend.
func dialBackend(port int) (pb.BackendClient, func()) {
addr := fmt.Sprintf("127.0.0.1:%d", port)
conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
Expect(err).ToNot(HaveOccurred())
return pb.NewBackendClient(conn), func() { _ = conn.Close() }
}
// modelPathOrSkip resolves the model file under ./test-models/ and Skip()s the
// current spec if it's missing (the ~6.3 GB GGUF is not present on a fresh
// checkout / on CI runners without the download).
func modelPathOrSkip(name string) string {
modelDir, err := filepath.Abs("test-models")
Expect(err).ToNot(HaveOccurred())
modelPath := filepath.Join(modelDir, name)
if _, err := os.Stat(modelPath); err != nil {
Skip(fmt.Sprintf("model not present: %s (run test.sh first)", modelPath))
}
return modelPath
}
var _ = Describe("locate-anything-cpp backend", func() {
It("runs open-vocabulary detection against a known-good COCO image", func() {
modelPath := modelPathOrSkip("locate-anything-q8_0.gguf")
imgB64 := loadTestImage()
port := freePort()
cleanup := startBackend(port)
defer cleanup()
client, closeConn := dialBackend(port)
defer closeConn()
// The q8_0 model is ~6.3 GB and hybrid Parallel Box Decoding on CPU is
// not cheap, so give LoadModel + Detect a generous deadline.
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Minute)
defer cancel()
loadResp, err := client.LoadModel(ctx, &pb.ModelOptions{
Model: "locate-anything-q8_0.gguf",
ModelFile: modelPath,
Threads: 4,
})
Expect(err).ToNot(HaveOccurred(), "LoadModel")
Expect(loadResp.GetSuccess()).To(BeTrue(), "LoadModel reported failure: %s", loadResp.GetMessage())
// Open-vocabulary detection is prompt-driven; the prompt names the
// classes to locate (people + cars), separated by the </c> control token.
detResp, err := client.Detect(ctx, &pb.DetectOptions{
Src: imgB64,
Prompt: "Locate all the instances that matches the following description: person</c>car.",
})
Expect(err).ToNot(HaveOccurred(), "Detect")
Expect(detResp.GetDetections()).ToNot(BeEmpty(), "no detections returned on a known-good COCO image")
_, _ = fmt.Fprintf(GinkgoWriter, "detection OK: %d detections\n", len(detResp.GetDetections()))
for i, d := range detResp.GetDetections() {
Expect(d.GetClassName()).ToNot(BeEmpty(), "detection %d has empty class_name", i)
Expect(d.GetWidth()).To(BeNumerically(">", float32(0)),
"detection %d has non-positive width", i)
Expect(d.GetHeight()).To(BeNumerically(">", float32(0)),
"detection %d has non-positive height", i)
_, _ = fmt.Fprintf(GinkgoWriter, " [%d] %s box=(%.1f,%.1f,%.1fx%.1f)\n",
i, d.GetClassName(), d.GetX(), d.GetY(), d.GetWidth(), d.GetHeight())
}
})
})

View File

@@ -1,59 +0,0 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,52 +0,0 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LOCATEANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/locate-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/locate-anything-cpp "$@"

View File

@@ -1,47 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
echo "Running locate-anything-cpp backend tests..."
# Test model from the mudler/locate-anything.cpp-gguf HuggingFace repo. This is
# the q8_0 quantization of nvidia/LocateAnything-3B (~6.3 GB), so the download
# is the slow step. It is resumed with `curl -C -` and skipped entirely if the
# file is already present.
LOCATEANYTHING_MODEL_DIR="${LOCATEANYTHING_MODEL_DIR:-$CURDIR/test-models}"
LOCATEANYTHING_MODEL_FILE="${LOCATEANYTHING_MODEL_FILE:-locate-anything-q8_0.gguf}"
LOCATEANYTHING_MODEL_URL="${LOCATEANYTHING_MODEL_URL:-https://huggingface.co/mudler/locate-anything.cpp-gguf/resolve/main/locate-anything-q8_0.gguf}"
mkdir -p "$LOCATEANYTHING_MODEL_DIR"
if [ ! -f "$LOCATEANYTHING_MODEL_DIR/$LOCATEANYTHING_MODEL_FILE" ]; then
echo "Downloading locate-anything q8_0 model (~6.3 GB, this is slow)..."
# -C - resumes a partial download so an interrupted run doesn't restart from 0.
curl -L -C - -o "$LOCATEANYTHING_MODEL_DIR/$LOCATEANYTHING_MODEL_FILE" "$LOCATEANYTHING_MODEL_URL" --progress-bar
fi
# Use a real COCO test image (people + cars) from the upstream rf-detr.cpp repo
# (~46 KB). Open-vocabulary detection needs real content to locate, so a
# synthetic image would trivially yield zero detections.
TEST_IMAGE_DIR="$CURDIR/test-data"
TEST_IMAGE_FILE="$TEST_IMAGE_DIR/test.jpg"
TEST_IMAGE_URL="${TEST_IMAGE_URL:-https://raw.githubusercontent.com/mudler/rf-detr.cpp/main/tests/fixtures/ci/test_image.jpg}"
mkdir -p "$TEST_IMAGE_DIR"
if [ ! -f "$TEST_IMAGE_FILE" ]; then
echo "Downloading COCO test image..."
curl -L -o "$TEST_IMAGE_FILE" "$TEST_IMAGE_URL" --progress-bar
fi
echo "locate-anything-cpp test setup complete."
echo " model: $LOCATEANYTHING_MODEL_DIR/$LOCATEANYTHING_MODEL_FILE"
echo " test image: $TEST_IMAGE_FILE"
# Run the Go smoke test: spawns the backend binary on a free port, calls
# LoadModel + Detect via gRPC against the downloaded GGUF + COCO image.
echo ""
echo "Running Go smoke test..."
cd "$CURDIR"
go test -v -timeout 30m ./...

View File

@@ -1,17 +0,0 @@
# Fetched upstream sources
sources/
# CMake build directories
build*/
# Compiled shared libraries
*.so
# Compiled backend binary
omnivoice-cpp
# Packaging output
package/
# Downloaded e2e models
omnivoice-models/

View File

@@ -1,53 +0,0 @@
cmake_minimum_required(VERSION 3.14)
project(gomnivoicecpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(OMNIVOICE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/omnivoice.cpp)
# Override upstream's CMAKE_CUDA_ARCHITECTURES before add_subdirectory.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "75-virtual;80-virtual;86-real;89-real")
endif()
# Add the upstream project. Its own CMakeLists adds ggml + builds
# omnivoice-core (STATIC, contains src/omnivoice.cpp i.e. the ov_* impl).
# EXCLUDE_FROM_ALL keeps its CLI tools/tests from building unless referenced.
add_subdirectory(${OMNIVOICE_DIR} omnivoice EXCLUDE_FROM_ALL)
# Upstream generates version.h into its own CMAKE_CURRENT_BINARY_DIR and adds
# the top-level ${CMAKE_BINARY_DIR} to omnivoice-core's include path. When the
# project is nested under add_subdirectory those two directories differ
# (<build>/omnivoice vs <build>), so omnivoice.cpp cannot find version.h. Point
# omnivoice-core at the subproject binary dir where version.h is actually
# generated. (Fix lives here, never in the fetched upstream checkout.)
target_include_directories(omnivoice-core PRIVATE ${CMAKE_BINARY_DIR}/omnivoice)
add_library(gomnivoicecpp MODULE cpp/gomnivoicecpp.cpp)
target_link_libraries(gomnivoicecpp PRIVATE omnivoice-core)
target_include_directories(gomnivoicecpp PRIVATE ${OMNIVOICE_DIR}/src)
target_include_directories(gomnivoicecpp SYSTEM PRIVATE ${OMNIVOICE_DIR}/ggml/include)
# Link GPU backends if the upstream ggml created them.
foreach(backend blas cuda metal vulkan sycl)
if(TARGET ggml-${backend})
target_link_libraries(gomnivoicecpp PRIVATE ggml-${backend})
if(backend STREQUAL "cuda")
find_package(CUDAToolkit QUIET)
if(CUDAToolkit_FOUND)
target_link_libraries(gomnivoicecpp PRIVATE CUDA::cudart)
endif()
endif()
endif()
endforeach()
if(MSVC)
target_compile_options(gomnivoicecpp PRIVATE /W4 /wd4100 /wd4505)
else()
target_compile_options(gomnivoicecpp PRIVATE -Wall -Wextra
-Wno-unused-parameter -Wno-unused-function)
endif()
set_property(TARGET gomnivoicecpp PROPERTY CXX_STANDARD 17)
set_target_properties(gomnivoicecpp PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

View File

@@ -1,122 +0,0 @@
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# omnivoice.cpp version
OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
OMNIVOICE_VERSION?=96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd
SO_TARGET?=libgomnivoicecpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DGGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),clblas)
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DGGML_HIPBLAS=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON
else ifeq ($(OS),Darwin)
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
CMAKE_ARGS+=-DGGML_METAL=ON
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
endif
endif
ifeq ($(BUILD_TYPE),sycl_f16)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx \
-DGGML_SYCL_F16=ON
endif
ifeq ($(BUILD_TYPE),sycl_f32)
CMAKE_ARGS+=-DGGML_SYCL=ON \
-DCMAKE_C_COMPILER=icx \
-DCMAKE_CXX_COMPILER=icpx
endif
sources/omnivoice.cpp:
mkdir -p sources/omnivoice.cpp
cd sources/omnivoice.cpp && \
git init && \
git remote add origin $(OMNIVOICE_REPO) && \
git fetch origin && \
git checkout $(OMNIVOICE_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
# Detect OS
UNAME_S := $(shell uname -s)
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
else
VARIANT_TARGETS = libgomnivoicecpp-fallback.so
endif
omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o omnivoice-cpp ./
package: omnivoice-cpp
bash package.sh
build: package
clean: purge
rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
purge:
rm -rf build*
.NOTPARALLEL:
ifeq ($(UNAME_S),Linux)
libgomnivoicecpp-avx.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx${RESET})
SO_TARGET=libgomnivoicecpp-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx.so
libgomnivoicecpp-avx2.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx2${RESET})
SO_TARGET=libgomnivoicecpp-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx2.so
libgomnivoicecpp-avx512.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:avx512${RESET})
SO_TARGET=libgomnivoicecpp-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-avx512.so
endif
libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:fallback${RESET})
SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.so
libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
test: omnivoice-cpp
@echo "Running omnivoice-cpp tests..."
bash test.sh
@echo "omnivoice-cpp tests completed."
all: omnivoice-cpp package

View File

@@ -1,129 +0,0 @@
package main
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"runtime"
"github.com/go-audio/audio"
"github.com/go-audio/wav"
)
const omnivoiceSampleRate = 24000
// wavHeader24k returns a 44-byte WAV header for a streaming 24 kHz mono 16-bit
// PCM stream, with placeholder (0xFFFFFFFF) sizes since the total length is
// unknown up front. Emitted as the first chunk of TTSStream so the HTTP layer
// receives a self-describing WAV (the gRPC TTSStream path never sets Message,
// so the backend owns the header - see core/backend/tts.go:ModelTTSStream).
func wavHeader24k() []byte {
var buf bytes.Buffer
w := func(v any) { _ = binary.Write(&buf, binary.LittleEndian, v) }
buf.WriteString("RIFF")
w(uint32(0xFFFFFFFF))
buf.WriteString("WAVE")
buf.WriteString("fmt ")
w(uint32(16)) // Subchunk1Size
w(uint16(1)) // PCM
w(uint16(1)) // mono
w(uint32(omnivoiceSampleRate)) // sample rate
w(uint32(omnivoiceSampleRate * 2)) // byte rate = SR * blockAlign
w(uint16(2)) // block align (16-bit mono)
w(uint16(16)) // bits per sample
buf.WriteString("data")
w(uint32(0xFFFFFFFF))
return buf.Bytes()
}
// floatToPCM16LE clamps each sample to [-1,1] and encodes it as little-endian
// signed 16-bit PCM.
func floatToPCM16LE(samples []float32) []byte {
out := make([]byte, len(samples)*2)
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
v := int16(s * 32767)
out[i*2] = byte(v)
out[i*2+1] = byte(v >> 8)
}
return out
}
// writeWAV24k writes samples as a finalized 24 kHz mono 16-bit WAV at dst.
func writeWAV24k(dst string, samples []float32) error {
f, err := os.Create(dst)
if err != nil {
return fmt.Errorf("omnivoice: create %q: %w", dst, err)
}
enc := wav.NewEncoder(f, omnivoiceSampleRate, 16, 1, 1)
ints := make([]int, len(samples))
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
ints[i] = int(s * 32767)
}
b := &audio.IntBuffer{
Format: &audio.Format{NumChannels: 1, SampleRate: omnivoiceSampleRate},
Data: ints,
SourceBitDepth: 16,
}
if err := enc.Write(b); err != nil {
_ = enc.Close()
_ = f.Close()
return fmt.Errorf("omnivoice: encode WAV: %w", err)
}
if err := enc.Close(); err != nil {
_ = f.Close()
return fmt.Errorf("omnivoice: finalize WAV: %w", err)
}
return f.Close()
}
// readWAVAsFloat decodes a WAV file (any sample rate/channels) to a mono
// float32 slice in [-1,1] for use as reference audio. OmniVoice expects 24 kHz;
// callers should supply 24 kHz reference clips.
func readWAVAsFloat(path string) ([]float32, error) {
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("omnivoice: open ref %q: %w", path, err)
}
defer func() { _ = f.Close() }()
dec := wav.NewDecoder(f)
buf, err := dec.FullPCMBuffer()
if err != nil {
return nil, fmt.Errorf("omnivoice: decode ref %q: %w", path, err)
}
ch := int(buf.Format.NumChannels)
if ch < 1 {
ch = 1
}
bitDepth := int(buf.SourceBitDepth)
if bitDepth == 0 {
bitDepth = 16
}
scale := float32(int64(1) << uint(bitDepth-1))
n := len(buf.Data) / ch
out := make([]float32, n)
for i := 0; i < n; i++ {
// Downmix to mono by averaging channels.
var acc int
for c := 0; c < ch; c++ {
acc += buf.Data[i*ch+c]
}
out[i] = float32(acc) / float32(ch) / scale
}
return out, nil
}
// runtimeKeepAlive prevents the GC from reclaiming the reference-audio slice
// while its backing pointer is in use across the C call.
func runtimeKeepAlive(v any) { runtime.KeepAlive(v) }

View File

@@ -1,166 +0,0 @@
#include "gomnivoicecpp.h"
#include "ggml-backend.h"
#include "omnivoice.h"
#include <cstdio>
#include <cstdlib>
#include <cstring>
static ov_context *g_ctx = nullptr;
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void * /*data*/) {
if (!log)
return;
const char *lvl = "?????";
switch (level) {
case GGML_LOG_LEVEL_DEBUG: lvl = "DEBUG"; break;
case GGML_LOG_LEVEL_INFO: lvl = "INFO"; break;
case GGML_LOG_LEVEL_WARN: lvl = "WARN"; break;
case GGML_LOG_LEVEL_ERROR: lvl = "ERROR"; break;
default: break;
}
fprintf(stderr, "[%-5s] %s", lvl, log);
fflush(stderr);
}
int omni_load(const char *model_path, const char *codec_path, int use_fa,
int clamp_fp16) {
ggml_log_set(ggml_log_cb, nullptr);
ggml_backend_load_all();
if (!model_path || model_path[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: model_path is required\n");
return 1;
}
if (!codec_path || codec_path[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: codec_path is required\n");
return 2;
}
ov_init_params p;
ov_init_default_params(&p);
p.model_path = model_path;
p.codec_path = codec_path;
p.use_fa = use_fa != 0;
p.clamp_fp16 = clamp_fp16 != 0;
fprintf(stderr, "[omnivoice-cpp] Loading model=%s codec=%s\n", model_path,
codec_path);
g_ctx = ov_init(&p);
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] FATAL: ov_init failed: %s\n",
ov_last_error());
return 3;
}
fprintf(stderr, "[omnivoice-cpp] Model loaded (%s)\n", ov_version());
return 0;
}
// Fill an ov_tts_params from the flat wrapper arguments.
static void fill_params(ov_tts_params *tp, const char *text, const char *lang,
const char *instruct, const float *ref_samples,
int ref_n, const char *ref_text, long long seed,
int denoise) {
ov_tts_default_params(tp);
tp->text = text ? text : "";
tp->lang = lang ? lang : "";
if (instruct && instruct[0] != '\0')
tp->instruct = instruct;
if (ref_samples && ref_n > 0) {
tp->ref_audio_24k = ref_samples;
tp->ref_n_samples = ref_n;
if (ref_text && ref_text[0] != '\0')
tp->ref_text = ref_text;
tp->denoise = denoise != 0;
}
if (seed >= 0)
tp->mg_seed = (uint64_t)seed;
}
float *omni_tts(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, int *out_n) {
if (out_n)
*out_n = 0;
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] ERROR: model not loaded\n");
return nullptr;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: text is required\n");
return nullptr; // omni_tts: out_n already 0
}
ov_tts_params tp;
fill_params(&tp, text, lang, instruct, ref_samples, ref_n, ref_text, seed,
denoise);
ov_audio out = {0};
enum ov_status rc = ov_synthesize(g_ctx, &tp, &out);
if (rc != OV_STATUS_OK || out.n_samples <= 0 || !out.samples) {
fprintf(stderr, "[omnivoice-cpp] ERROR: synthesize failed (rc=%d): %s\n",
(int)rc, ov_last_error());
ov_audio_free(&out);
return nullptr;
}
// Copy into a plain malloc buffer the Go side can free symmetrically via
// omni_pcm_free; then release the ov_audio-owned buffer.
size_t bytes = (size_t)out.n_samples * sizeof(float);
float *buf = (float *)malloc(bytes);
if (!buf) {
fprintf(stderr, "[omnivoice-cpp] ERROR: malloc(%zu) failed\n", bytes);
ov_audio_free(&out);
return nullptr;
}
memcpy(buf, out.samples, bytes);
if (out_n)
*out_n = out.n_samples;
ov_audio_free(&out);
return buf;
}
int omni_tts_stream(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, omni_pcm_chunk_cb cb,
void *user_data) {
if (!g_ctx) {
fprintf(stderr, "[omnivoice-cpp] ERROR: model not loaded\n");
return 1;
}
if (!cb) {
fprintf(stderr, "[omnivoice-cpp] ERROR: stream callback is null\n");
return 2;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[omnivoice-cpp] ERROR: text is required\n");
return 4;
}
ov_tts_params tp;
fill_params(&tp, text, lang, instruct, ref_samples, ref_n, ref_text, seed,
denoise);
// ov_audio_chunk_cb has the identical signature to omni_pcm_chunk_cb
// (bool vs int return are ABI-compatible; non-zero == true).
tp.on_chunk = (ov_audio_chunk_cb)cb;
tp.on_chunk_user_data = user_data;
ov_audio out = {0}; // stays empty in streaming mode
enum ov_status rc = ov_synthesize(g_ctx, &tp, &out);
ov_audio_free(&out);
if (rc != OV_STATUS_OK && rc != OV_STATUS_CANCELLED) {
fprintf(stderr, "[omnivoice-cpp] ERROR: stream synth failed (rc=%d): %s\n",
(int)rc, ov_last_error());
return 3;
}
return 0;
}
void omni_pcm_free(float *p) { free(p); }
void omni_unload(void) {
if (g_ctx) {
ov_free(g_ctx);
g_ctx = nullptr;
}
}

View File

@@ -1,38 +0,0 @@
#pragma once
#include <cstdint>
extern "C" {
// Streaming PCM chunk callback. samples is mono float PCM at 24 kHz, valid
// only for the duration of the call. Return non-zero to continue, 0 to abort.
typedef int (*omni_pcm_chunk_cb)(const float *samples, int n_samples,
void *user_data);
// Load the LM (model_path) + codec (codec_path) GGUFs. use_fa / clamp_fp16
// map to ov_init_params. Returns 0 on success, non-zero on failure.
int omni_load(const char *model_path, const char *codec_path, int use_fa,
int clamp_fp16);
// Synthesize to a malloc'd float PCM buffer (caller frees via omni_pcm_free).
// ref_samples != null && ref_n > 0 => voice cloning (ref_text optional).
// instruct != null && non-empty => voice design. seed < 0 keeps the default
// MaskGIT seed. denoise toggles the <|denoise|> marker (only with a reference).
// Writes the sample count to *out_n. Returns NULL on failure (out_n set to 0).
float *omni_tts(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, int *out_n);
// Streaming synthesis: cb is invoked per PCM chunk as audio is produced.
// Same reference/design/seed semantics as omni_tts. Returns 0 on success.
int omni_tts_stream(const char *text, const char *lang, const char *instruct,
const float *ref_samples, int ref_n, const char *ref_text,
long long seed, int denoise, omni_pcm_chunk_cb cb,
void *user_data);
// Free a buffer returned by omni_tts.
void omni_pcm_free(float *p);
// Release the OmniVoice context.
void omni_unload(void);
}

View File

@@ -1,74 +0,0 @@
package main
import (
"os"
"strings"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func ttsReq(text, voice string, lang *string, dst string) *pb.TTSRequest {
return &pb.TTSRequest{Text: text, Voice: voice, Language: lang, Dst: dst}
}
var _ = Describe("OmniVoice e2e", Label("e2e"), func() {
var loaded bool
BeforeEach(func() {
modelPath := os.Getenv("OMNIVOICE_MODEL")
codecPath := os.Getenv("OMNIVOICE_CODEC")
if modelPath == "" || codecPath == "" {
Skip("OMNIVOICE_MODEL / OMNIVOICE_CODEC not set; skipping e2e")
}
if !loaded {
lib := os.Getenv("OMNIVOICE_LIBRARY")
if lib == "" {
lib = "./libgomnivoicecpp-fallback.so"
}
h, err := purego.Dlopen(lib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
Expect(err).ToNot(HaveOccurred())
purego.RegisterLibFunc(&CppLoad, h, "omni_load")
purego.RegisterLibFunc(&CppTTS, h, "omni_tts")
purego.RegisterLibFunc(&CppTTSStream, h, "omni_tts_stream")
purego.RegisterLibFunc(&CppPCMFree, h, "omni_pcm_free")
purego.RegisterLibFunc(&CppUnload, h, "omni_unload")
Expect(CppLoad(modelPath, codecPath, 0, 0)).To(Equal(0))
loaded = true
}
})
It("synthesizes a WAV file via TTS", func() {
b := &OmnivoiceCpp{opts: loadOptions{seed: 42, denoise: true}}
dst := GinkgoT().TempDir() + "/out.wav"
lang := "en"
err := b.TTS(ttsReq("Hello world.", "", &lang, dst))
Expect(err).ToNot(HaveOccurred())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
It("streams audio chunks via TTSStream", func() {
b := &OmnivoiceCpp{opts: loadOptions{seed: 42, denoise: true}}
results := make(chan []byte, 1024)
lang := "en"
done := make(chan error, 1)
go func() { done <- b.TTSStream(ttsReq("Hello there, streaming test.", "", &lang, ""), results) }()
var chunks int
var first []byte
for c := range results {
if chunks == 0 {
first = c
}
chunks++
}
Expect(<-done).ToNot(HaveOccurred())
Expect(chunks).To(BeNumerically(">=", 2))
Expect(string(first[0:4])).To(Equal("RIFF"))
Expect(strings.HasPrefix(string(first[8:12]), "WAVE")).To(BeTrue())
})
})

View File

@@ -1,246 +0,0 @@
package main
import (
"fmt"
"os"
"path/filepath"
"strings"
"sync"
"unsafe"
"github.com/ebitengine/purego"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
var (
// omni_load(model_path, codec_path, use_fa, clamp_fp16) int
CppLoad func(modelPath, codecPath string, useFA, clampFP16 int) int
// omni_tts(text, lang, instruct, ref_samples, ref_n, ref_text, seed, denoise, out_n) -> float* (uintptr)
CppTTS func(text, lang, instruct string, refSamples unsafe.Pointer, refN int,
refText string, seed int64, denoise int, outN unsafe.Pointer) uintptr
// omni_tts_stream(text, lang, instruct, ref_samples, ref_n, ref_text, seed, denoise, cb, user) int
CppTTSStream func(text, lang, instruct string, refSamples unsafe.Pointer, refN int,
refText string, seed int64, denoise int, cb uintptr, user uintptr) int
CppPCMFree func(ptr uintptr)
CppUnload func()
)
type OmnivoiceCpp struct {
base.SingleThread
opts loadOptions
// audioPath is the model-config reference voice (tts.audio_path), used as
// the default voice-cloning reference when a request does not set Voice.
audioPath string
}
func (o *OmnivoiceCpp) Load(opts *pb.ModelOptions) error {
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
}
o.opts = parseOptions(opts.Options)
// Resolve the codec/tokenizer GGUF: explicit option, else auto-discover a
// *tokenizer*.gguf sibling of the base model.
codec := o.opts.codecPath
if codec != "" && !filepath.IsAbs(codec) {
codec = filepath.Join(filepath.Dir(model), codec)
}
if codec == "" {
codec = discoverTokenizer(filepath.Dir(model))
}
if codec == "" {
return fmt.Errorf("omnivoice: no codec/tokenizer GGUF found; set option 'tokenizer:<file>'")
}
o.opts.codecPath = codec
// tts.audio_path (ModelOptions.AudioPath) is the config-level voice-cloning
// reference: a default reference WAV used when a request omits Voice.
// Resolved relative to the model directory like the codec.
o.audioPath = opts.AudioPath
if o.audioPath != "" && !filepath.IsAbs(o.audioPath) {
o.audioPath = filepath.Join(filepath.Dir(model), o.audioPath)
}
useFA := boolToInt(o.opts.useFA)
clamp := boolToInt(o.opts.clampFP16)
fmt.Fprintf(os.Stderr, "[omnivoice-cpp] Load model=%s codec=%s use_fa=%d clamp_fp16=%d\n",
model, codec, useFA, clamp)
if rc := CppLoad(model, codec, useFA, clamp); rc != 0 {
return fmt.Errorf("omnivoice: failed to load model (rc=%d)", rc)
}
return nil
}
// discoverTokenizer returns the first *tokenizer*.gguf in dir, or "".
func discoverTokenizer(dir string) string {
entries, err := os.ReadDir(dir)
if err != nil {
return ""
}
for _, e := range entries {
name := strings.ToLower(e.Name())
if strings.Contains(name, "tokenizer") && strings.HasSuffix(name, ".gguf") {
return filepath.Join(dir, e.Name())
}
}
return ""
}
func boolToInt(b bool) int {
if b {
return 1
}
return 0
}
// refAudio loads the reference WAV (voice cloning) if voice points to a file.
// Returns nil if no cloning (empty or non-path - voice design uses Instructions).
func (o *OmnivoiceCpp) refAudio(voice string) ([]float32, error) {
v := strings.TrimSpace(voice)
if v == "" {
return nil, nil
}
if _, err := os.Stat(v); err != nil {
return nil, nil
}
return readWAVAsFloat(v)
}
// refAudioFor resolves the cloning reference for a request: the per-request
// Voice takes precedence, falling back to the model-config audio_path. Empty
// result means no cloning (voice design via Instructions still applies).
func (o *OmnivoiceCpp) refAudioFor(req *pb.TTSRequest) ([]float32, error) {
voice := strings.TrimSpace(req.Voice)
if voice == "" {
voice = o.audioPath
}
return o.refAudio(voice)
}
func reqParam(req *pb.TTSRequest, key string) string {
if req.Params == nil {
return ""
}
return req.Params[key]
}
func (o *OmnivoiceCpp) seedFor(req *pb.TTSRequest) int64 {
if s := reqParam(req, "seed"); s != "" {
var n int64
if _, err := fmt.Sscan(s, &n); err == nil {
return n
}
}
return o.opts.seed
}
func optStr(p *string) string {
if p == nil {
return ""
}
return *p
}
func (o *OmnivoiceCpp) TTS(req *pb.TTSRequest) error {
if req.Dst == "" {
return fmt.Errorf("omnivoice: TTS requires a destination path")
}
lang := normalizeLanguage(optStr(req.Language))
instruct := optStr(req.Instructions)
refText := reqParam(req, "ref_text")
seed := o.seedFor(req)
ref, err := o.refAudioFor(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
var n int32
ptr := CppTTS(req.Text, lang, instruct, refPtr, len(ref), refText, seed,
boolToInt(o.opts.denoise), unsafe.Pointer(&n))
runtimeKeepAlive(ref)
if ptr == 0 || n <= 0 {
return fmt.Errorf("omnivoice: synthesis failed")
}
defer CppPCMFree(ptr)
src := unsafe.Slice((*float32)(unsafe.Pointer(ptr)), int(n)) //nolint:govet // C-allocated PCM, copied out before free
out := make([]float32, int(n))
copy(out, src)
return writeWAV24k(req.Dst, out)
}
// streamState carries the active TTSStream channel to the single shared C
// callback. base.SingleThread serializes TTS/TTSStream, so one global slot is
// safe and avoids leaking a purego callback per request (purego callbacks
// cannot be freed and are capped).
var (
streamMu sync.Mutex
streamChan chan []byte
streamCbOnce sync.Once
streamCbPtr uintptr
)
// streamCallback is registered once and forwards each PCM chunk to streamChan.
func streamCallback(samples *float32, nSamples int32, _ uintptr) uintptr {
if nSamples <= 0 || samples == nil || streamChan == nil {
return 1 // continue
}
src := unsafe.Slice(samples, int(nSamples))
cp := make([]float32, int(nSamples)) // copy out of C memory before returning
copy(cp, src)
streamChan <- floatToPCM16LE(cp)
return 1 // continue
}
func (o *OmnivoiceCpp) TTSStream(req *pb.TTSRequest, results chan []byte) error {
defer close(results)
if req.Text == "" {
return fmt.Errorf("omnivoice: TTSStream requires text")
}
streamCbOnce.Do(func() {
streamCbPtr = purego.NewCallback(streamCallback)
})
lang := normalizeLanguage(optStr(req.Language))
instruct := optStr(req.Instructions)
refText := reqParam(req, "ref_text")
seed := o.seedFor(req)
ref, err := o.refAudioFor(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
// Emit the WAV header first so the HTTP layer gets a self-describing stream.
results <- wavHeader24k()
streamMu.Lock()
streamChan = results
rc := CppTTSStream(req.Text, lang, instruct, refPtr, len(ref), refText, seed,
boolToInt(o.opts.denoise), streamCbPtr, 0)
streamChan = nil
streamMu.Unlock()
runtimeKeepAlive(ref)
if rc != 0 {
return fmt.Errorf("omnivoice: streaming synthesis failed (rc=%d)", rc)
}
return nil
}

View File

@@ -1,90 +0,0 @@
package main
import (
"bytes"
"encoding/binary"
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestOmnivoiceCpp(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "omnivoice-cpp suite")
}
var _ = Describe("normalizeLanguage", func() {
DescribeTable("maps caller language to OmniVoice codes",
func(in, want string) {
Expect(normalizeLanguage(in)).To(Equal(want))
},
Entry("empty stays empty", "", ""),
Entry("english full name", "English", "en"),
Entry("chinese full name", "Chinese", "zh"),
Entry("locale suffix stripped", "en-US", "en"),
Entry("underscore locale", "zh_CN", "zh"),
Entry("already a code", "en", "en"),
Entry("unknown passes through normalized", "xx", "xx"),
)
})
var _ = Describe("parseOptions", func() {
It("extracts codec, use_fa, clamp_fp16, seed, denoise", func() {
o := parseOptions([]string{
"tokenizer:tok.gguf",
"use_fa:true",
"clamp_fp16:true",
"seed:7",
"denoise:false",
"unknown:ignored",
})
Expect(o.codecPath).To(Equal("tok.gguf"))
Expect(o.useFA).To(BeTrue())
Expect(o.clampFP16).To(BeTrue())
Expect(o.seed).To(Equal(int64(7)))
Expect(o.denoise).To(BeFalse())
})
It("accepts codec: as an alias for tokenizer:", func() {
o := parseOptions([]string{"codec:c.gguf"})
Expect(o.codecPath).To(Equal("c.gguf"))
})
It("defaults seed to -1 and denoise to true", func() {
o := parseOptions(nil)
Expect(o.seed).To(Equal(int64(-1)))
Expect(o.denoise).To(BeTrue())
})
})
var _ = Describe("wavHeader24k", func() {
It("emits a 44-byte streaming WAV header at 24 kHz mono 16-bit", func() {
h := wavHeader24k()
Expect(h).To(HaveLen(44))
Expect(string(h[0:4])).To(Equal("RIFF"))
Expect(string(h[8:12])).To(Equal("WAVE"))
Expect(string(h[12:16])).To(Equal("fmt "))
Expect(string(h[36:40])).To(Equal("data"))
var sampleRate uint32
Expect(binary.Read(bytes.NewReader(h[24:28]), binary.LittleEndian, &sampleRate)).To(Succeed())
Expect(sampleRate).To(Equal(uint32(24000)))
})
})
var _ = Describe("floatToPCM16LE", func() {
It("clamps and converts float PCM to little-endian int16 bytes", func() {
b := floatToPCM16LE([]float32{0, 1.0, -1.0, 2.0, -2.0})
Expect(b).To(HaveLen(10)) // 5 samples * 2 bytes
read := func(off int) int16 {
var v int16
_ = binary.Read(bytes.NewReader(b[off:off+2]), binary.LittleEndian, &v)
return v
}
Expect(read(0)).To(Equal(int16(0)))
Expect(read(2)).To(Equal(int16(32767)))
Expect(read(4)).To(Equal(int16(-32767)))
Expect(read(6)).To(Equal(int16(32767))) // clamped from 2.0
Expect(read(8)).To(Equal(int16(-32767))) // clamped from -2.0
})
})

View File

@@ -1,48 +0,0 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
libName := os.Getenv("OMNIVOICE_LIBRARY")
if libName == "" {
libName = "./libgomnivoicecpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoad, "omni_load"},
{&CppTTS, "omni_tts"},
{&CppTTSStream, "omni_tts_stream"},
{&CppPCMFree, "omni_pcm_free"},
{&CppUnload, "omni_unload"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &OmnivoiceCpp{}); err != nil {
panic(err)
}
}

View File

@@ -1,74 +0,0 @@
package main
import (
"strconv"
"strings"
)
// loadOptions holds the parsed model-level options for OmniVoice.
type loadOptions struct {
codecPath string
useFA bool
clampFP16 bool
seed int64
denoise bool
}
func splitOption(o string) (key, value string, ok bool) {
i := strings.Index(o, ":")
if i < 0 {
return "", "", false
}
return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}
// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: seed -1 (engine default), denoise true.
func parseOptions(opts []string) loadOptions {
o := loadOptions{seed: -1, denoise: true}
for _, oo := range opts {
key, value, ok := splitOption(oo)
if !ok {
continue
}
switch key {
case "tokenizer", "codec":
o.codecPath = value
case "use_fa":
o.useFA = value == "true" || value == "1"
case "clamp_fp16":
o.clampFP16 = value == "true" || value == "1"
case "seed":
if n, err := strconv.ParseInt(value, 10, 64); err == nil {
o.seed = n
}
case "denoise":
o.denoise = value == "true" || value == "1"
}
}
return o
}
// languageNameAliases maps full language names to OmniVoice codes. OmniVoice's
// lang hint accepts "" (auto), "en", "zh" per the upstream convention; other
// codes pass through and the engine treats unknown hints as auto.
var languageNameAliases = map[string]string{
"english": "en",
"chinese": "zh",
}
// normalizeLanguage lowercases, trims, strips a region/locale suffix, and
// resolves common full names. Empty stays empty so the engine auto-detects.
func normalizeLanguage(lang string) string {
lang = strings.ToLower(strings.TrimSpace(lang))
if lang == "" {
return ""
}
if i := strings.IndexAny(lang, "-_."); i >= 0 {
lang = lang[:i]
}
if code, ok := languageNameAliases[lang]; ok {
return code
}
return lang
}

View File

@@ -1,64 +0,0 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
# This script is used in the final stage of the Dockerfile
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,52 +0,0 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx2.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx512.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/omnivoice-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/omnivoice-cpp "$@"

View File

@@ -1,30 +0,0 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
cd "$CURDIR"
echo "Running omnivoice-cpp backend tests..."
if [ -z "$OMNIVOICE_MODEL" ]; then
MODEL_DIR="./omnivoice-models"
mkdir -p "$MODEL_DIR"
REPO_ID="Serveurperso/OmniVoice-GGUF"
BASE_URL="https://huggingface.co/${REPO_ID}/resolve/main"
FILES=( "omnivoice-base-Q4_K_M.gguf" "omnivoice-tokenizer-Q4_K_M.gguf" )
for file in "${FILES[@]}"; do
dest="${MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file}"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
fi
done
export OMNIVOICE_MODEL="${MODEL_DIR}/omnivoice-base-Q4_K_M.gguf"
export OMNIVOICE_CODEC="${MODEL_DIR}/omnivoice-tokenizer-Q4_K_M.gguf"
fi
go test -v -timeout 1200s .
echo "All omnivoice-cpp e2e tests passed."

View File

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
# Upstream pin lives below as PARAKEET_VERSION?=e270af73b94c9a5c37ec516230219ed4580e1db6
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
PARAKEET_VERSION?=e270af73b94c9a5c37ec516230219ed4580e1db6
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go
@@ -39,10 +39,7 @@ endif
# is overwritten back to OFF and the build silently falls back to CPU. Forward the
# PARAKEET_GGML_* options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
# GGML_CUDA_GRAPHS is OFF by ggml default; enabling it gives a small free
# speedup (~1% measured on GB10, never negative) by capturing/replaying the
# CUDA graph. Not gated by parakeet.cpp, so it passes straight through to ggml.
CMAKE_ARGS+=-DPARAKEET_GGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON
CMAKE_ARGS+=-DPARAKEET_GGML_CUDA=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)

View File

@@ -98,21 +98,17 @@ type transcriptJSON struct {
}
// streamFeedJSON mirrors the document returned by
// parakeet_capi_stream_feed_json / parakeet_capi_stream_finalize_json (ABI v5):
// parakeet_capi_stream_feed_json / parakeet_capi_stream_finalize_json (ABI v4):
//
// {"text":"...","eou":0,"eob":0,"frame_sec":0.080000,
// {"text":"...","eou":0,"frame_sec":0.080000,
// "words":[{"w":"...","start":0.480,"end":0.640,"conf":0.9100}, ...]}
//
// "text" is the newly-finalized text since the last call; "eou" is 1 when an
// <EOU> (end of utterance) fired this feed and "eob" is 1 when an <EOB>
// (backchannel) fired. ABI v4 conflated the two into "eou"; v5 split them, so
// we read both and treat either as an utterance boundary for segmentation.
// "words" are the words finalized this call with absolute (stream-relative)
// start/end seconds.
// <EOU>/<EOB> fired this feed; "words" are the words finalized this call with
// absolute (stream-relative) start/end seconds.
type streamFeedJSON struct {
Text string `json:"text"`
Eou int `json:"eou"`
Eob int `json:"eob"`
FrameSec float64 `json:"frame_sec"`
Words []transcriptWord `json:"words"`
}
@@ -487,10 +483,7 @@ type streamSegmenter struct {
func (s *streamSegmenter) add(doc streamFeedJSON) {
s.cur = append(s.cur, doc.Words...)
// Close the segment on either turn signal: <EOU> (end of utterance) or
// <EOB> (backchannel). ABI v4 reported both via "eou"; v5 split them, so we
// OR them here to keep the v4 segmentation boundaries.
if doc.Eou != 0 || doc.Eob != 0 {
if doc.Eou != 0 {
s.flush()
}
}
@@ -678,12 +671,11 @@ func (p *ParakeetCpp) AudioTranscriptionStream(ctx context.Context, opts *pb.Tra
return nil
}
// streamJSON drives the streaming JSON entry points (present since ABI v4): each
// feed/finalize returns a {text,eou,eob,frame_sec,words} document. The
// newly-finalized text is emitted as a delta (unchanged streaming contract)
// while words are accumulated into per-utterance segments (closed on <EOU> or
// <EOB>) so the closing FinalResult carries timestamped segments. Runs under
// engineMu (already held by the caller).
// streamJSON drives the ABI v4 streaming JSON entry points: each feed/finalize
// returns a {text,eou,frame_sec,words} document. The newly-finalized text is
// emitted as a delta (unchanged streaming contract) while words are accumulated
// into per-utterance segments (closed on EOU) so the closing FinalResult carries
// timestamped segments. Runs under engineMu (already held by the caller).
func (p *ParakeetCpp) streamJSON(ctx context.Context, stream uintptr, data []float32,
duration float32, results chan *pb.TranscriptStreamResponse) error {
var (

View File

@@ -1,68 +1,23 @@
#!/bin/bash
#
# Bundle the parakeet-cpp-grpc binary, libparakeet.so, the core runtime
# libs (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active
# BUILD_TYPE so the package is self-contained. Mirrors
# backend/go/whisper/package.sh; run.sh routes the (CGO_ENABLED=0) binary
# through lib/ld.so so the packaged libc is used instead of the host's.
# L0 packaging stub: copy the binary, run.sh and libparakeet.so* into
# package/. The full ldd walk (libc, libstdc++, libgomp, GPU runtimes,
# arch detection) lands in L3, mirroring backend/go/whisper/package.sh.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libparakeet.so + any soname symlinks (libparakeet.so.X[.Y]). purego.Dlopen
# resolves it via LD_LIBRARY_PATH, which run.sh points at lib/.
# libparakeet.so + any soname symlinks (libparakeet.so.X, libparakeet.so.X.Y).
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
exit 1
}
# Detect architecture and copy the core runtime libs libparakeet.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers)
# based on BUILD_TYPE so the backend can reach the GPU without the runtime
# base image shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
echo "L0 package layout (full ldd walk lands in L3):"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -124,17 +124,4 @@ var _ = Describe("streaming segment assembly", func() {
Expect(acc.segments()).To(HaveLen(1))
Expect(acc.segments()[0].Text).To(Equal("hi there"))
})
// ABI v5 split <EOB> (backchannel) out of the "eou" flag into its own "eob"
// field; a backchannel must still close the segment as it did in v4.
It("closes a segment on EOB (backchannel) too", func() {
acc := &streamSegmenter{}
acc.add(streamFeedJSON{Text: "uh huh", Eou: 0, Eob: 1, Words: []transcriptWord{
{W: "uh", Start: 0.0, End: 0.2}, {W: "huh", Start: 0.2, End: 0.5},
}})
segs := acc.segments()
Expect(segs).To(HaveLen(1))
Expect(segs[0].Text).To(Equal("uh huh"))
Expect(segs[0].End).To(Equal(secondsToNanos(0.5)))
})
})

View File

@@ -3,36 +3,35 @@ project(goqwen3ttscpp LANGUAGES C CXX)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(QWENTTS_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/qwentts.cpp)
set(QWEN3TTS_DIR ${CMAKE_CURRENT_SOURCE_DIR}/sources/qwen3-tts.cpp)
# Override upstream's CMAKE_CUDA_ARCHITECTURES before add_subdirectory.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES "75-virtual;80-virtual;86-real;89-real")
endif()
# Add the upstream project. Its own CMakeLists adds ggml + cpp-httplib + yyjson
# and builds qwen-core (STATIC, the qt_* impl). EXCLUDE_FROM_ALL keeps its CLI
# tools / tts-server / tests from building unless referenced.
add_subdirectory(${QWENTTS_DIR} qwentts EXCLUDE_FROM_ALL)
# Build ggml from the upstream's submodule FIRST, so that ggml/ggml-base/ggml-cpu
# CMake targets exist when the upstream project references them by name.
# The upstream CMakeLists.txt uses target_link_libraries(... ggml ggml-base ggml-cpu)
# with target_link_directories pointing at a pre-built ggml/build/. By adding ggml
# as a subdirectory here, CMake resolves those names as targets instead.
add_subdirectory(${QWEN3TTS_DIR}/ggml ggml EXCLUDE_FROM_ALL)
# Upstream generates version.h into its own CMAKE_CURRENT_BINARY_DIR and adds
# the top-level ${CMAKE_BINARY_DIR} to qwen-core's include path. Under
# add_subdirectory those two dirs differ (<build>/qwentts vs <build>), so
# qwen.cpp cannot find version.h. Point qwen-core at the subproject binary dir
# where version.h is actually generated. (Fix lives here, never in the fetched
# upstream checkout.)
target_include_directories(qwen-core PRIVATE ${CMAKE_BINARY_DIR}/qwentts)
# Now add the upstream project
add_subdirectory(${QWEN3TTS_DIR} qwen3tts EXCLUDE_FROM_ALL)
add_library(goqwen3ttscpp MODULE cpp/goqwen3ttscpp.cpp)
target_link_libraries(goqwen3ttscpp PRIVATE qwen-core)
target_link_libraries(goqwen3ttscpp PRIVATE qwen3_tts)
target_include_directories(goqwen3ttscpp PRIVATE ${QWENTTS_DIR}/src)
target_include_directories(goqwen3ttscpp SYSTEM PRIVATE ${QWENTTS_DIR}/ggml/include)
target_include_directories(goqwen3ttscpp PRIVATE ${QWEN3TTS_DIR}/src)
target_include_directories(goqwen3ttscpp SYSTEM PRIVATE ${QWEN3TTS_DIR}/ggml/include)
# Link GPU backends if the upstream ggml created them.
foreach(backend blas cuda metal vulkan sycl)
# Link GPU backends if available
foreach(backend blas cuda metal vulkan)
if(TARGET ggml-${backend})
target_link_libraries(goqwen3ttscpp PRIVATE ggml-${backend})
string(TOUPPER ${backend} BACKEND_UPPER)
target_compile_definitions(goqwen3ttscpp PRIVATE QWEN3TTS_HAVE_${BACKEND_UPPER})
if(backend STREQUAL "cuda")
find_package(CUDAToolkit QUIET)
if(CUDAToolkit_FOUND)
@@ -45,8 +44,12 @@ endforeach()
if(MSVC)
target_compile_options(goqwen3ttscpp PRIVATE /W4 /wd4100 /wd4505)
else()
target_compile_options(goqwen3ttscpp PRIVATE -Wall -Wextra
-Wno-unused-parameter -Wno-unused-function)
target_compile_options(goqwen3ttscpp PRIVATE -Wall -Wextra -Wshadow -Wconversion
-Wno-unused-parameter -Wno-unused-function -Wno-sign-conversion)
endif()
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9.0)
target_link_libraries(goqwen3ttscpp PRIVATE stdc++fs)
endif()
set_property(TARGET goqwen3ttscpp PROPERTY CXX_STANDARD 17)

View File

@@ -6,9 +6,9 @@ GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1)
# qwentts.cpp version
QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
QWEN3TTS_CPP_VERSION?=4536dcdce27c3764a93a06d6bf64026b124962f5
# qwen3-tts.cpp version
QWEN3TTS_REPO?=https://github.com/predict-woo/qwen3-tts.cpp
QWEN3TTS_CPP_VERSION?=136e5d36c17083da0321fd96512dc7b263f94a44
SO_TARGET?=libgoqwen3ttscpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -49,9 +49,9 @@ ifeq ($(BUILD_TYPE),sycl_f32)
-DCMAKE_CXX_COMPILER=icpx
endif
sources/qwentts.cpp:
mkdir -p sources/qwentts.cpp
cd sources/qwentts.cpp && \
sources/qwen3-tts.cpp:
mkdir -p sources/qwen3-tts.cpp
cd sources/qwen3-tts.cpp && \
git init && \
git remote add origin $(QWEN3TTS_REPO) && \
git fetch origin && \
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
build: package
clean: purge
rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
rm -rf libgoqwen3ttscpp*.so package sources/qwen3-tts.cpp qwen3-tts-cpp
purge:
rm -rf build*
@@ -88,24 +88,24 @@ purge:
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libgoqwen3ttscpp-avx.so: sources/qwentts.cpp
libgoqwen3ttscpp-avx.so: sources/qwen3-tts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx${RESET})
SO_TARGET=libgoqwen3ttscpp-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx.so
libgoqwen3ttscpp-avx2.so: sources/qwentts.cpp
libgoqwen3ttscpp-avx2.so: sources/qwen3-tts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx2${RESET})
SO_TARGET=libgoqwen3ttscpp-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx2.so
libgoqwen3ttscpp-avx512.so: sources/qwentts.cpp
libgoqwen3ttscpp-avx512.so: sources/qwen3-tts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:avx512${RESET})
SO_TARGET=libgoqwen3ttscpp-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-avx512.so
endif
# Build fallback variant (all platforms)
libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
libgoqwen3ttscpp-fallback.so: sources/qwen3-tts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:fallback${RESET})
SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.so

View File

@@ -1,128 +0,0 @@
package main
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"runtime"
"github.com/go-audio/audio"
"github.com/go-audio/wav"
)
const qwen3ttsSampleRate = 24000
// wavHeader24k returns a 44-byte WAV header for a streaming 24 kHz mono 16-bit
// PCM stream, with placeholder (0xFFFFFFFF) sizes since the total length is
// unknown up front. Emitted as the first chunk of TTSStream so the HTTP layer
// receives a self-describing WAV (the gRPC TTSStream path never sets Message,
// so the backend owns the header - see core/backend/tts.go:ModelTTSStream).
func wavHeader24k() []byte {
var buf bytes.Buffer
w := func(v any) { _ = binary.Write(&buf, binary.LittleEndian, v) }
buf.WriteString("RIFF")
w(uint32(0xFFFFFFFF))
buf.WriteString("WAVE")
buf.WriteString("fmt ")
w(uint32(16)) // Subchunk1Size
w(uint16(1)) // PCM
w(uint16(1)) // mono
w(uint32(qwen3ttsSampleRate)) // sample rate
w(uint32(qwen3ttsSampleRate * 2)) // byte rate = SR * blockAlign
w(uint16(2)) // block align (16-bit mono)
w(uint16(16)) // bits per sample
buf.WriteString("data")
w(uint32(0xFFFFFFFF))
return buf.Bytes()
}
// floatToPCM16LE clamps each sample to [-1,1] and encodes it as little-endian
// signed 16-bit PCM.
func floatToPCM16LE(samples []float32) []byte {
out := make([]byte, len(samples)*2)
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
v := int16(s * 32767)
out[i*2] = byte(v)
out[i*2+1] = byte(v >> 8)
}
return out
}
// writeWAV24k writes samples as a finalized 24 kHz mono 16-bit WAV at dst.
func writeWAV24k(dst string, samples []float32) error {
f, err := os.Create(dst)
if err != nil {
return fmt.Errorf("qwen3-tts: create %q: %w", dst, err)
}
enc := wav.NewEncoder(f, qwen3ttsSampleRate, 16, 1, 1)
ints := make([]int, len(samples))
for i, s := range samples {
if s > 1 {
s = 1
} else if s < -1 {
s = -1
}
ints[i] = int(s * 32767)
}
b := &audio.IntBuffer{
Format: &audio.Format{NumChannels: 1, SampleRate: qwen3ttsSampleRate},
Data: ints,
SourceBitDepth: 16,
}
if err := enc.Write(b); err != nil {
_ = enc.Close()
_ = f.Close()
return fmt.Errorf("qwen3-tts: encode WAV: %w", err)
}
if err := enc.Close(); err != nil {
_ = f.Close()
return fmt.Errorf("qwen3-tts: finalize WAV: %w", err)
}
return f.Close()
}
// readWAVAsFloat decodes a WAV file (any sample rate/channels) to a mono
// float32 slice in [-1,1] for use as cloning reference audio. qwentts expects
// 24 kHz; callers should supply 24 kHz reference clips.
func readWAVAsFloat(path string) ([]float32, error) {
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("qwen3-tts: open ref %q: %w", path, err)
}
defer func() { _ = f.Close() }()
dec := wav.NewDecoder(f)
buf, err := dec.FullPCMBuffer()
if err != nil {
return nil, fmt.Errorf("qwen3-tts: decode ref %q: %w", path, err)
}
ch := int(buf.Format.NumChannels)
if ch < 1 {
ch = 1
}
bitDepth := int(buf.SourceBitDepth)
if bitDepth == 0 {
bitDepth = 16
}
scale := float32(int64(1) << uint(bitDepth-1))
n := len(buf.Data) / ch
out := make([]float32, n)
for i := 0; i < n; i++ {
var acc int
for c := 0; c < ch; c++ {
acc += buf.Data[i*ch+c]
}
out[i] = float32(acc) / float32(ch) / scale
}
return out, nil
}
// runtimeKeepAlive prevents the GC from reclaiming the reference-audio slice
// while its backing pointer is in use across the C call.
func runtimeKeepAlive(v any) { runtime.KeepAlive(v) }

View File

@@ -1,54 +0,0 @@
package main
import (
"path/filepath"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// These specs pin the voice-selection logic in resolveRequest, in particular
// the config-level audio_path (tts.audio_path -> ModelOptions.AudioPath) being
// used as the default voice-cloning reference. No model/C library is needed:
// resolveRequest only reads the reference WAV via readWAVAsFloat (pure Go).
var _ = Describe("resolveRequest voice/clone selection", func() {
var dir, refWav string
BeforeEach(func() {
dir = GinkgoT().TempDir()
refWav = filepath.Join(dir, "ref.wav")
// 0.5s of non-silent 24kHz mono audio as a clone reference.
samples := make([]float32, qwen3ttsSampleRate/2)
for i := range samples {
samples[i] = 0.1
}
Expect(writeWAV24k(refWav, samples)).To(Succeed())
})
It("uses the config audio_path as the clone reference when Voice is empty", func() {
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi"})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(BeEmpty())
Expect(len(ref)).To(Equal(qwen3ttsSampleRate / 2))
})
It("lets a per-request audio Voice override audio_path", func() {
other := filepath.Join(dir, "other.wav")
Expect(writeWAV24k(other, make([]float32, 100))).To(Succeed())
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi", Voice: other})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(BeEmpty())
Expect(len(ref)).To(Equal(100))
})
It("does not trigger audio_path cloning for a named-speaker Voice", func() {
q := &Qwen3TtsCpp{audioPath: refWav}
_, _, speaker, _, ref, _, err := q.resolveRequest(&pb.TTSRequest{Text: "hi", Voice: "serena"})
Expect(err).ToNot(HaveOccurred())
Expect(speaker).To(Equal("serena"))
Expect(ref).To(BeNil())
})
})

View File

@@ -1,191 +1,161 @@
#include "goqwen3ttscpp.h"
#include "ggml-backend.h"
#include "qwen.h"
#include "qwen3_tts.h"
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
static qt_context *g_ctx = nullptr;
using namespace qwen3_tts;
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void * /*data*/) {
// Global engine (loaded once, reused across requests)
static Qwen3TTS *g_engine = nullptr;
static bool g_loaded = false;
static int g_threads = 4;
static void ggml_log_cb(enum ggml_log_level level, const char *log, void *data) {
const char *level_str;
if (!log)
return;
const char *lvl = "?????";
switch (level) {
case GGML_LOG_LEVEL_DEBUG: lvl = "DEBUG"; break;
case GGML_LOG_LEVEL_INFO: lvl = "INFO"; break;
case GGML_LOG_LEVEL_WARN: lvl = "WARN"; break;
case GGML_LOG_LEVEL_ERROR: lvl = "ERROR"; break;
default: break;
case GGML_LOG_LEVEL_DEBUG:
level_str = "DEBUG";
break;
case GGML_LOG_LEVEL_INFO:
level_str = "INFO";
break;
case GGML_LOG_LEVEL_WARN:
level_str = "WARN";
break;
case GGML_LOG_LEVEL_ERROR:
level_str = "ERROR";
break;
default:
level_str = "?????";
break;
}
fprintf(stderr, "[%-5s] %s", lvl, log);
fprintf(stderr, "[%-5s] ", level_str);
fputs(log, stderr);
fflush(stderr);
}
int qt3_load(const char *talker_path, const char *codec_path, int use_fa,
int clamp_fp16) {
// Map language string to language_id token used by the model
static int language_to_id(const char *lang) {
if (!lang || lang[0] == '\0')
return 2050; // default: English
std::string l(lang);
if (l == "en")
return 2050;
if (l == "ru")
return 2069;
if (l == "zh")
return 2055;
if (l == "ja")
return 2058;
if (l == "ko")
return 2064;
if (l == "de")
return 2053;
if (l == "fr")
return 2061;
if (l == "es")
return 2054;
if (l == "it")
return 2056;
if (l == "pt")
return 2057;
fprintf(stderr, "[qwen3-tts-cpp] Unknown language '%s', defaulting to English\n",
lang);
return 2050;
}
int load_model(const char *model_dir, int n_threads) {
ggml_log_set(ggml_log_cb, nullptr);
ggml_backend_load_all();
if (!talker_path || talker_path[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: talker_path is required\n");
if (n_threads <= 0)
n_threads = 4;
g_threads = n_threads;
fprintf(stderr, "[qwen3-tts-cpp] Loading models from %s (threads=%d)\n",
model_dir, n_threads);
g_engine = new Qwen3TTS();
if (!g_engine->load_models(model_dir)) {
fprintf(stderr, "[qwen3-tts-cpp] FATAL: failed to load models from %s\n",
model_dir);
delete g_engine;
g_engine = nullptr;
return 1;
}
if (!codec_path || codec_path[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: codec_path is required\n");
return 2;
}
qt_init_params p;
qt_init_default_params(&p);
p.talker_path = talker_path;
p.codec_path = codec_path;
p.use_fa = use_fa != 0;
p.clamp_fp16 = clamp_fp16 != 0;
fprintf(stderr, "[qwen3-tts-cpp] Loading talker=%s codec=%s\n", talker_path,
codec_path);
g_ctx = qt_init(&p);
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] FATAL: qt_init failed: %s\n",
qt_last_error());
return 3;
}
fprintf(stderr, "[qwen3-tts-cpp] Model loaded (%s)\n", qt_version());
g_loaded = true;
fprintf(stderr, "[qwen3-tts-cpp] Models loaded successfully\n");
return 0;
}
// Fill a qt_tts_params from the flat wrapper arguments. Unset/zero scalars keep
// the qt defaults (temperature 0.9, top_k 50, top_p 1.0, rep 1.05, max 2048).
static void fill_params(qt_tts_params *tp, const char *text, const char *lang,
const char *instruct, const char *speaker,
const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens) {
qt_tts_default_params(tp);
tp->text = text ? text : "";
if (lang && lang[0] != '\0')
tp->lang = lang; // else keep default NULL -> auto
if (instruct && instruct[0] != '\0')
tp->instruct = instruct;
if (speaker && speaker[0] != '\0')
tp->speaker = speaker;
if (ref_samples && ref_n > 0) {
tp->ref_audio_24k = ref_samples;
tp->ref_n_samples = ref_n;
if (ref_text && ref_text[0] != '\0')
tp->ref_text = ref_text;
}
if (seed >= 0)
tp->seed = (int64_t)seed; // else default -1 (random)
if (temperature > 0.0f)
tp->temperature = temperature;
if (top_k > 0)
tp->top_k = top_k;
if (top_p > 0.0f)
tp->top_p = top_p;
if (repetition_penalty > 0.0f)
tp->repetition_penalty = repetition_penalty;
if (max_new_tokens > 0)
tp->max_new_tokens = max_new_tokens;
}
float *qt3_tts(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, int *out_n) {
if (out_n)
*out_n = 0;
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: model not loaded\n");
return nullptr;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text is required\n");
return nullptr;
}
qt_tts_params tp;
fill_params(&tp, text, lang, instruct, speaker, ref_samples, ref_n,
ref_text, seed, temperature, top_k, top_p, repetition_penalty,
max_new_tokens);
qt_audio out = {0};
enum qt_status rc = qt_synthesize(g_ctx, &tp, &out);
if (rc != QT_STATUS_OK || out.n_samples <= 0 || !out.samples) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesize failed (rc=%d): %s\n",
(int)rc, qt_last_error());
qt_audio_free(&out);
return nullptr;
}
// Copy into a plain malloc buffer the Go side frees via qt3_pcm_free.
size_t bytes = (size_t)out.n_samples * sizeof(float);
float *buf = (float *)malloc(bytes);
if (!buf) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: malloc(%zu) failed\n", bytes);
qt_audio_free(&out);
return nullptr;
}
memcpy(buf, out.samples, bytes);
if (out_n)
*out_n = out.n_samples;
qt_audio_free(&out);
return buf;
}
int qt3_tts_stream(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, qt3_chunk_cb cb, void *user_data) {
if (!g_ctx) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: model not loaded\n");
int synthesize(const char *text, const char *ref_audio_path, const char *dst,
const char *language, float temperature, float top_p,
int top_k, float repetition_penalty, int max_audio_tokens,
int n_threads) {
if (!g_loaded || !g_engine) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: models not loaded\n");
return 1;
}
if (!cb) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: stream callback is null\n");
if (!text || !dst) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text and dst are required\n");
return 2;
}
if (!text || text[0] == '\0') {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: text is required\n");
tts_params params;
params.max_audio_tokens = max_audio_tokens > 0 ? max_audio_tokens : 4096;
params.temperature = temperature;
params.top_p = top_p;
params.top_k = top_k;
params.repetition_penalty = repetition_penalty;
params.n_threads = n_threads > 0 ? n_threads : g_threads;
params.language_id = language_to_id(language);
fprintf(stderr, "[qwen3-tts-cpp] Synthesizing: text='%.50s%s', lang_id=%d, "
"temp=%.2f, threads=%d\n",
text, (strlen(text) > 50 ? "..." : ""), params.language_id,
temperature, params.n_threads);
tts_result result;
bool has_ref = ref_audio_path && ref_audio_path[0] != '\0';
if (has_ref) {
fprintf(stderr, "[qwen3-tts-cpp] Voice cloning with ref: %s\n",
ref_audio_path);
result = g_engine->synthesize_with_voice(text, ref_audio_path, params);
} else {
result = g_engine->synthesize(text, params);
}
if (!result.success) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesis failed: %s\n",
result.error_msg.c_str());
return 3;
}
int n_samples = (int)result.audio.size();
if (n_samples == 0) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: synthesis produced no samples\n");
return 4;
}
qt_tts_params tp;
fill_params(&tp, text, lang, instruct, speaker, ref_samples, ref_n,
ref_text, seed, temperature, top_k, top_p, repetition_penalty,
max_new_tokens);
// qt_audio_chunk_cb has the identical signature to qt3_chunk_cb
// (bool vs int return are ABI-compatible; non-zero == true).
tp.on_chunk = (qt_audio_chunk_cb)cb;
tp.on_chunk_user_data = user_data;
qt_audio out = {0}; // stays empty in streaming mode
enum qt_status rc = qt_synthesize(g_ctx, &tp, &out);
qt_audio_free(&out);
if (rc != QT_STATUS_OK && rc != QT_STATUS_CANCELLED) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: stream synth failed (rc=%d): %s\n",
(int)rc, qt_last_error());
return 3;
fprintf(stderr,
"[qwen3-tts-cpp] Synthesis done: %d samples (%.2fs @ 24kHz)\n",
n_samples, (float)n_samples / 24000.0f);
if (!save_audio_file(dst, result.audio, result.sample_rate)) {
fprintf(stderr, "[qwen3-tts-cpp] ERROR: failed to write %s\n", dst);
return 5;
}
fprintf(stderr, "[qwen3-tts-cpp] Wrote %s\n", dst);
return 0;
}
void qt3_pcm_free(float *p) { free(p); }
void qt3_unload(void) {
if (g_ctx) {
qt_free(g_ctx);
g_ctx = nullptr;
}
}
int qt3_n_speakers(void) { return g_ctx ? qt_n_speakers(g_ctx) : 0; }
const char *qt3_speaker_name(int i) {
return g_ctx ? qt_speaker_name(g_ctx, i) : nullptr;
}

View File

@@ -1,47 +1,12 @@
#pragma once
#include <cstddef>
#include <cstdint>
extern "C" {
// Streaming PCM chunk callback. samples is mono float PCM at 24 kHz, valid
// only for the duration of the call. Return non-zero to continue, 0 to abort.
typedef int (*qt3_chunk_cb)(const float *samples, int n_samples,
void *user_data);
// Load the talker + codec/tokenizer GGUFs. use_fa / clamp_fp16 map to
// qt_init_params (the qt ABI exposes no thread count; ggml uses its own
// default). Returns 0 on success, non-zero on failure.
int qt3_load(const char *talker_path, const char *codec_path, int use_fa,
int clamp_fp16);
// Synthesize to a malloc'd float PCM buffer (caller frees via qt3_pcm_free).
// The synthesis mode (base / custom_voice / voice_design) is auto-detected by
// qt from the talker GGUF; speaker is honoured only for custom_voice, instruct
// for voice_design / custom_voice, and ref_samples (+ optional ref_text) drive
// base-mode cloning. qt enforces the rules and we surface qt_last_error() on
// QT_STATUS_MODE_INVALID. Writes the sample count to *out_n. Returns NULL on
// failure (out_n set to 0).
float *qt3_tts(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, int *out_n);
// Streaming synthesis: cb is invoked per PCM chunk as audio is produced. Same
// param semantics as qt3_tts. Returns 0 on success.
int qt3_tts_stream(const char *text, const char *lang, const char *instruct,
const char *speaker, const float *ref_samples, int ref_n,
const char *ref_text, long long seed, float temperature,
int top_k, float top_p, float repetition_penalty,
int max_new_tokens, qt3_chunk_cb cb, void *user_data);
// Free a buffer returned by qt3_tts.
void qt3_pcm_free(float *p);
// Release the qt context.
void qt3_unload(void);
// Named-speaker introspection (custom_voice models). Returns 0 / NULL when no
// model is loaded or the index is out of range.
int qt3_n_speakers(void);
const char *qt3_speaker_name(int i);
int load_model(const char *model_dir, int n_threads);
int synthesize(const char *text, const char *ref_audio_path, const char *dst,
const char *language, float temperature, float top_p,
int top_k, float repetition_penalty, int max_audio_tokens,
int n_threads);
}

View File

@@ -1,95 +0,0 @@
package main
import (
"math"
"os"
"strings"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func ttsReq(text, voice string, lang *string, dst string) *pb.TTSRequest {
return &pb.TTSRequest{Text: text, Voice: voice, Language: lang, Dst: dst}
}
var _ = Describe("qwen3-tts-cpp e2e", Label("e2e"), func() {
var loaded bool
BeforeEach(func() {
modelPath := os.Getenv("QWEN3TTS_MODEL")
codecPath := os.Getenv("QWEN3TTS_CODEC")
if modelPath == "" || codecPath == "" {
Skip("QWEN3TTS_MODEL / QWEN3TTS_CODEC not set; skipping e2e")
}
if !loaded {
lib := os.Getenv("QWEN3TTS_LIBRARY")
if lib == "" {
lib = "./libgoqwen3ttscpp-fallback.so"
}
h, err := purego.Dlopen(lib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
Expect(err).ToNot(HaveOccurred())
purego.RegisterLibFunc(&CppLoad, h, "qt3_load")
purego.RegisterLibFunc(&CppTTS, h, "qt3_tts")
purego.RegisterLibFunc(&CppTTSStream, h, "qt3_tts_stream")
purego.RegisterLibFunc(&CppPCMFree, h, "qt3_pcm_free")
purego.RegisterLibFunc(&CppUnload, h, "qt3_unload")
Expect(CppLoad(modelPath, codecPath, 1, 0)).To(Equal(0))
loaded = true
}
})
It("synthesizes a WAV file via TTS", func() {
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}}
dst := GinkgoT().TempDir() + "/out.wav"
lang := "english"
err := b.TTS(ttsReq("Hello world.", "", &lang, dst))
Expect(err).ToNot(HaveOccurred())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
It("streams audio chunks via TTSStream", func() {
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}}
results := make(chan []byte, 1024)
lang := "english"
done := make(chan error, 1)
go func() { done <- b.TTSStream(ttsReq("Hello there, streaming test.", "", &lang, ""), results) }()
var chunks int
var first []byte
for c := range results {
if chunks == 0 {
first = c
}
chunks++
}
Expect(<-done).ToNot(HaveOccurred())
Expect(chunks).To(BeNumerically(">=", 2))
Expect(string(first[0:4])).To(Equal("RIFF"))
Expect(strings.HasPrefix(string(first[8:12]), "WAVE")).To(BeTrue())
})
It("clones a voice from the config audio_path reference", func() {
// 1s of 24kHz mono audio as a clone reference; the base model carries
// a speaker encoder, so audio_path drives x-vector voice cloning.
ref := GinkgoT().TempDir() + "/ref.wav"
samples := make([]float32, qwen3ttsSampleRate)
for i := range samples {
samples[i] = float32(0.05 * math.Sin(float64(i)*0.06))
}
Expect(writeWAV24k(ref, samples)).To(Succeed())
b := &Qwen3TtsCpp{opts: loadOptions{seed: 42, useFA: true}, audioPath: ref}
dst := GinkgoT().TempDir() + "/clone.wav"
lang := "english"
// Empty Voice -> the config audio_path is used as the clone reference.
Expect(b.TTS(ttsReq("Cloned voice test.", "", &lang, dst))).To(Succeed())
fi, err := os.Stat(dst)
Expect(err).ToNot(HaveOccurred())
Expect(fi.Size()).To(BeNumerically(">", int64(44)))
})
})

View File

@@ -5,225 +5,108 @@ import (
"os"
"path/filepath"
"strings"
"sync"
"unsafe"
"github.com/ebitengine/purego"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
var (
// qt3_load(talker_path, codec_path, use_fa, clamp_fp16) int
CppLoad func(talkerPath, codecPath string, useFA, clampFP16 int) int
// qt3_tts(text, lang, instruct, speaker, ref_samples, ref_n, ref_text,
// seed, temperature, top_k, top_p, rep_pen, max_new, out_n) -> float*
CppTTS func(text, lang, instruct, speaker string, refSamples unsafe.Pointer,
refN int, refText string, seed int64, temperature float32, topK int,
topP, repPen float32, maxNew int, outN unsafe.Pointer) uintptr
// qt3_tts_stream(..., cb, user) int
CppTTSStream func(text, lang, instruct, speaker string, refSamples unsafe.Pointer,
refN int, refText string, seed int64, temperature float32, topK int,
topP, repPen float32, maxNew int, cb uintptr, user uintptr) int
CppPCMFree func(ptr uintptr)
CppUnload func()
CppLoadModel func(modelDir string, nThreads int) int
CppSynthesize func(text, refAudioPath, dst, language string,
temperature, topP float32, topK int,
repetitionPenalty float32, maxAudioTokens, nThreads int) int
)
type Qwen3TtsCpp struct {
base.SingleThread
opts loadOptions
// audioPath is the model-config reference voice (tts.audio_path), the
// default clone reference when a request omits an audio Voice.
audioPath string
threads int
}
// languageNameAliases maps common full language names to the canonical
// two-letter code understood by the C++ language_to_id table.
var languageNameAliases = map[string]string{
"english": "en",
"russian": "ru",
"chinese": "zh",
"japanese": "ja",
"korean": "ko",
"german": "de",
"french": "fr",
"spanish": "es",
"italian": "it",
"portuguese": "pt",
}
// normalizeLanguage coerces a caller-supplied language into the canonical code
// the model expects. It lowercases, trims, strips any region/locale suffix
// (en-US, en_US, ja.JP -> en/ja), and resolves common full names (english -> en).
// An empty input stays empty so the C++ side applies its English default; an
// unrecognized value is returned normalized so C++ can log it and default.
func normalizeLanguage(lang string) string {
lang = strings.ToLower(strings.TrimSpace(lang))
if lang == "" {
return ""
}
// Strip region/locale suffix: keep the segment before the first separator.
if i := strings.IndexAny(lang, "-_."); i >= 0 {
lang = lang[:i]
}
if code, ok := languageNameAliases[lang]; ok {
return code
}
return lang
}
func (q *Qwen3TtsCpp) Load(opts *pb.ModelOptions) error {
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
// ModelFile is the model directory path (containing GGUF files)
modelDir := opts.ModelFile
if modelDir == "" {
modelDir = opts.ModelPath
}
q.opts = parseOptions(opts.Options)
// Resolve the codec/tokenizer GGUF: explicit option, else auto-discover a
// *tokenizer*.gguf sibling of the talker model.
codec := q.opts.codecPath
if codec != "" && !filepath.IsAbs(codec) {
codec = filepath.Join(filepath.Dir(model), codec)
}
if codec == "" {
codec = discoverTokenizer(filepath.Dir(model))
}
if codec == "" {
return fmt.Errorf("qwen3-tts: no codec/tokenizer GGUF found; set option 'tokenizer:<file>'")
}
q.opts.codecPath = codec
q.audioPath = opts.AudioPath
if q.audioPath != "" && !filepath.IsAbs(q.audioPath) {
q.audioPath = filepath.Join(filepath.Dir(model), q.audioPath)
// Resolve relative paths
if !filepath.IsAbs(modelDir) && opts.ModelPath != "" {
modelDir = filepath.Join(opts.ModelPath, modelDir)
}
useFA := boolToInt(q.opts.useFA)
clamp := boolToInt(q.opts.clampFP16)
fmt.Fprintf(os.Stderr, "[qwen3-tts-cpp] Load talker=%s codec=%s use_fa=%d clamp_fp16=%d\n",
model, codec, useFA, clamp)
if rc := CppLoad(model, codec, useFA, clamp); rc != 0 {
return fmt.Errorf("qwen3-tts: failed to load model (rc=%d)", rc)
threads := int(opts.Threads)
if threads <= 0 {
threads = 4
}
q.threads = threads
fmt.Fprintf(os.Stderr, "[qwen3-tts-cpp] Loading models from: %s (threads=%d)\n", modelDir, threads)
if ret := CppLoadModel(modelDir, threads); ret != 0 {
return fmt.Errorf("failed to load qwen3-tts model (error code: %d)", ret)
}
return nil
}
// discoverTokenizer returns the first *tokenizer*.gguf in dir, or "".
func discoverTokenizer(dir string) string {
entries, err := os.ReadDir(dir)
if err != nil {
return ""
}
for _, e := range entries {
name := strings.ToLower(e.Name())
if strings.Contains(name, "tokenizer") && strings.HasSuffix(name, ".gguf") {
return filepath.Join(dir, e.Name())
}
}
return ""
}
func boolToInt(b bool) int {
if b {
return 1
}
return 0
}
func optStr(p *string) string {
if p == nil {
return ""
}
return *p
}
// resolveRequest derives the synthesis inputs from a TTSRequest:
// language, instruct, speaker, ref-audio samples, ref-text and sampling.
func (q *Qwen3TtsCpp) resolveRequest(req *pb.TTSRequest) (lang, instruct, speaker, refText string, ref []float32, s sampling, err error) {
lang = normalizeLanguage(optStr(req.Language))
instruct = optStr(req.Instructions)
var refPath string
speaker, refPath = resolveVoice(req.Voice)
if refPath == "" && speaker == "" && q.audioPath != "" {
// No per-request voice: fall back to the config clone reference.
refPath = q.audioPath
}
if refPath != "" {
ref, err = readWAVAsFloat(refPath)
if err != nil {
return
}
}
if req.Params != nil {
refText = req.Params["ref_text"]
}
s = parseSampling(req.Params, q.opts.seed)
return
}
func (q *Qwen3TtsCpp) TTS(req *pb.TTSRequest) error {
if req.Dst == "" {
return fmt.Errorf("qwen3-tts: TTS requires a destination path")
}
if req.Text == "" {
return fmt.Errorf("qwen3-tts: TTS requires text")
}
lang, instruct, speaker, refText, ref, s, err := q.resolveRequest(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
text := req.Text
voice := req.Voice // reference audio path for voice cloning (empty = no cloning)
dst := req.Dst
language := ""
if req.Language != nil {
language = normalizeLanguage(*req.Language)
}
var n int32
ptr := CppTTS(req.Text, lang, instruct, speaker, refPtr, len(ref), refText,
s.seed, s.temperature, s.topK, s.topP, s.repPen, s.maxNew, unsafe.Pointer(&n))
runtimeKeepAlive(ref)
if ptr == 0 {
return fmt.Errorf("qwen3-tts: synthesis failed")
}
// Register the free as soon as we own a non-null buffer, so the n<=0 guard
// below cannot leak it (defensive: the C contract returns NULL on failure).
defer CppPCMFree(ptr)
if n <= 0 {
return fmt.Errorf("qwen3-tts: synthesis produced no samples")
}
src := unsafe.Slice((*float32)(unsafe.Pointer(ptr)), int(n)) //nolint:govet // C-allocated PCM, copied out before free
out := make([]float32, int(n))
copy(out, src)
return writeWAV24k(req.Dst, out)
}
// Synthesis parameters with sensible defaults
temperature := float32(0.9)
topP := float32(0.8)
topK := 50
repetitionPenalty := float32(1.05)
maxAudioTokens := 4096
// streamState carries the active TTSStream channel to the single shared C
// callback. base.SingleThread serializes TTS/TTSStream, so one global slot is
// safe and avoids leaking a purego callback per request (purego callbacks
// cannot be freed and are capped).
var (
streamMu sync.Mutex
streamChan chan []byte
streamCbOnce sync.Once
streamCbPtr uintptr
)
// streamCallback is registered once and forwards each PCM chunk to streamChan.
func streamCallback(samples *float32, nSamples int32, _ uintptr) uintptr {
if nSamples <= 0 || samples == nil || streamChan == nil {
return 1 // continue
}
src := unsafe.Slice(samples, int(nSamples))
cp := make([]float32, int(nSamples)) // copy out of C memory before returning
copy(cp, src)
streamChan <- floatToPCM16LE(cp)
return 1 // continue
}
func (q *Qwen3TtsCpp) TTSStream(req *pb.TTSRequest, results chan []byte) error {
defer close(results)
if req.Text == "" {
return fmt.Errorf("qwen3-tts: TTSStream requires text")
if ret := CppSynthesize(text, voice, dst, language,
temperature, topP, topK, repetitionPenalty,
maxAudioTokens, q.threads); ret != 0 {
return fmt.Errorf("failed to synthesize audio (error code: %d)", ret)
}
streamCbOnce.Do(func() {
streamCbPtr = purego.NewCallback(streamCallback)
})
lang, instruct, speaker, refText, ref, s, err := q.resolveRequest(req)
if err != nil {
return err
}
var refPtr unsafe.Pointer
if len(ref) > 0 {
refPtr = unsafe.Pointer(&ref[0])
}
// Emit the WAV header first so the HTTP layer gets a self-describing stream.
results <- wavHeader24k()
streamMu.Lock()
streamChan = results
rc := CppTTSStream(req.Text, lang, instruct, speaker, refPtr, len(ref), refText,
s.seed, s.temperature, s.topK, s.topP, s.repPen, s.maxNew, streamCbPtr, 0)
streamChan = nil
streamMu.Unlock()
runtimeKeepAlive(ref)
if rc != 0 {
return fmt.Errorf("qwen3-tts: streaming synthesis failed (rc=%d)", rc)
}
return nil
}

View File

@@ -0,0 +1,53 @@
package main
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestLanguageNormalization(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "qwen3-tts-cpp language normalization")
}
var _ = Describe("normalizeLanguage", func() {
DescribeTable("maps caller input to the canonical model language code",
func(input, expected string) {
Expect(normalizeLanguage(input)).To(Equal(expected))
},
// Canonical codes pass through unchanged
Entry("canonical en", "en", "en"),
Entry("canonical zh", "zh", "zh"),
Entry("canonical pt", "pt", "pt"),
// Case-insensitive
Entry("uppercase", "EN", "en"),
Entry("mixed case", "Ja", "ja"),
// Surrounding whitespace
Entry("trims whitespace", " en ", "en"),
// Region/locale stripping
Entry("BCP-47 region", "en-US", "en"),
Entry("underscore region", "en_US", "en"),
Entry("dotted locale", "ja.JP", "ja"),
Entry("region + case", "ZH-CN", "zh"),
// Full-name aliases
Entry("english name", "english", "en"),
Entry("chinese name cased", "Chinese", "zh"),
Entry("japanese name", "japanese", "ja"),
Entry("russian name", "russian", "ru"),
Entry("portuguese name", "portuguese", "pt"),
// Empty stays empty (C++ applies the English default)
Entry("empty", "", ""),
Entry("whitespace only", " ", ""),
// Unknown values pass through normalized so C++ can log + default
Entry("unknown code", "klingon", "klingon"),
Entry("unknown with region", "xx-YY", "xx"),
)
})

View File

@@ -19,25 +19,24 @@ type LibFuncs struct {
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("QWEN3TTS_LIBRARY")
if libName == "" {
libName = "./libgoqwen3ttscpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoad, "qt3_load"},
{&CppTTS, "qt3_tts"},
{&CppTTSStream, "qt3_tts_stream"},
{&CppPCMFree, "qt3_pcm_free"},
{&CppUnload, "qt3_unload"},
{&CppLoadModel, "load_model"},
{&CppSynthesize, "synthesize"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
purego.RegisterLibFunc(lf.FuncPtr, gosd, lf.Name)
}
flag.Parse()

Some files were not shown because too many files have changed in this diff Show More