Compare commits


46 Commits

Author SHA1 Message Date
Ettore Di Giacinto
449a51ff0b fix(fish-speech): allow invalid_reference_casting so tokenizers builds on darwin
On darwin arm64 the fish-speech editable install (pip install
--no-build-isolation -e) compiles the transitive `tokenizers` Python
package's Rust extension from source, because no prebuilt wheel is
available for that platform (Linux installs a prebuilt manylinux wheel and
never compiles it, so this only breaks on macOS). The pinned tokenizers
crate version that fish-speech's stack resolves to contains a `&T` ->
`&mut T` cast that the macOS CI runner's newer Rust toolchain rejects via
the now-deny-by-default `invalid_reference_casting` lint:

    error: casting `&T` to `&mut T` is undefined behavior ...
    error: could not compile `tokenizers` (lib) due to 1 previous error
    ERROR: Failed building wheel for tokenizers

This failed the fish-speech darwin/metal (mps) backend image build in the
v4.5.5 release CI while all Linux variants built fine.

Fix: export RUSTFLAGS with `-A invalid_reference_casting` (appended to any
existing value, not clobbering) before installRequirements so the
unchanged third-party crate compiles as it did under the older toolchain.
Version-agnostic and harmless on Linux, where no Rust compile happens.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 09:27:02 +00:00
LocalAI [bot]
de2ec2f136 feat(backends): add voice-detect + face-detect ggml backends (replace Python insightface/speaker-recognition) (#10441)
* feat(voice-detect): add Go purego backend for voice-detect.cpp

Add backend/go/voice-detect implementing the Backend gRPC voice subset
(VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego,
mirroring the parakeet-cpp / omnivoice-cpp backends.

The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and
float-vector returns are owned by Go and released through the matching capi
free functions, with the per-ctx last error surfaced into Go errors. Calls are
serialized via base.SingleThread since the C context is not reentrant.

Proto field mapping:
- VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model.
- VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the
  verify_threshold option, default 0.25) -> verify_paths -> verified/distance/
  threshold/confidence/model/processing_time_ms.
- VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion
  document maps to a single VoiceAnalysis segment (start/end 0; gender "label"
  -> dominant_gender with the remaining float scores as the gender map; emotion
  label/scores -> dominant_emotion/emotion).
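For illustration, the VoiceAnalyze mapping can be sketched in Go. The JSON shape used here (a `gender` object holding a `label` key plus per-class float scores, an `emotion` object with `label` and `scores`, and a numeric `age`) is an assumption inferred from the description. `VoiceAnalysis`, `analyzeDoc` and `mapAnalyze` are hypothetical names, not the backend's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VoiceAnalysis is a hypothetical stand-in for the proto VoiceAnalysis
// segment; the generated Go type and its field names may differ.
type VoiceAnalysis struct {
	Start, End      float64
	Age             float64
	DominantGender  string
	Gender          map[string]float64
	DominantEmotion string
	Emotion         map[string]float64
}

// analyzeDoc is an ASSUMED shape for the analyze_path_json output:
// {"age": N, "gender": {"label": "...", "<class>": score, ...},
//  "emotion": {"label": "...", "scores": {"<class>": score, ...}}}
type analyzeDoc struct {
	Age     float64                    `json:"age"`
	Gender  map[string]json.RawMessage `json:"gender"`
	Emotion struct {
		Label  string             `json:"label"`
		Scores map[string]float64 `json:"scores"`
	} `json:"emotion"`
}

// mapAnalyze turns the whole-file analysis document into a single segment
// whose Start/End stay at 0.
func mapAnalyze(raw []byte) (VoiceAnalysis, error) {
	var doc analyzeDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return VoiceAnalysis{}, err
	}
	seg := VoiceAnalysis{Age: doc.Age, Gender: map[string]float64{}}
	for key, val := range doc.Gender {
		if key == "label" {
			// The "label" entry becomes the dominant gender...
			if err := json.Unmarshal(val, &seg.DominantGender); err != nil {
				return VoiceAnalysis{}, fmt.Errorf("gender label: %w", err)
			}
			continue
		}
		// ...and every remaining float entry is kept as a per-class score.
		var score float64
		if err := json.Unmarshal(val, &score); err != nil {
			return VoiceAnalysis{}, fmt.Errorf("gender score %q: %w", key, err)
		}
		seg.Gender[key] = score
	}
	seg.DominantEmotion = doc.Emotion.Label
	seg.Emotion = doc.Emotion.Scores
	return seg, nil
}

func main() {
	raw := []byte(`{"age": 34,
		"gender": {"label": "female", "female": 0.91, "male": 0.09},
		"emotion": {"label": "neutral", "scores": {"neutral": 0.7, "happy": 0.3}}}`)
	seg, err := mapAnalyze(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(seg.DominantGender, seg.Gender["female"], seg.DominantEmotion)
}
```

Decoding `gender` as raw messages lets the string `label` and the float scores share one object without a custom unmarshaler.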

The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so
with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external
libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo
tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs
gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(voice-detect): wire backend into index, gallery and build

Register the voice-detect.cpp speaker-recognition + voice-analysis
backend (added in Voice-INT-A) into LocalAI's distribution surfaces,
mirroring the ced backend (the closest mudler C++/ggml audio analogue):

- backend/index.yaml: add the &voicedetect meta-backend (capabilities
  platform map, no top-level uri) plus the full set of concrete per-arch
  image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the
  -development variants). Referential integrity audited - every alias
  target resolves.
- gallery/index.yaml: add 5 model entries on backend voice-detect -
  ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the
  wav2vec2 age/gender/emotion analyze model. The engine architecture is
  read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are
  not yet published: each files: entry points at the intended
  mudler/voice-detect-gguf location with a TODO to fill sha256 after
  upload (no fabricated hashes).
- .github/backend-matrix.yml: add the linux build matrix block + the
  darwin metal entry mirroring ced.
- .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via
  VOICEDETECT_VERSION (pin 47546430, short SHA 4754643).
- core/config/backend_capabilities.go: register voice-detect in the
  backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze ->
  speaker_recognition), mirroring speaker-recognition.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): add purego Go backend for face-detect.cpp

Add the LocalAI Go backend that dlopens libfacedetect.so (the flat
facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect
backend. Implements the Face subset of the Backend gRPC service:

- Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path
  -> L2-normalized ArcFace embedding.
- Detect(DetectOptions): src -> detect_path_json -> Detection boxes
  (class_name "face", [x1,y1,x2,y2] -> x/y/w/h).
- FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof ->
  verify_paths; best-effort img areas via detect.
- FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face
  age + gender ("M"/"F" normalized to "Man"/"Woman").
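A minimal Go sketch of the two conversions in the list: a corner box `[x1,y1,x2,y2]` becomes origin plus size, and the engine's gender codes are normalized. `Detection`, `toDetection` and `normalizeGender` are hypothetical names; passing unknown gender codes through unchanged is an assumption.

```go
package main

import "fmt"

// Detection is a hypothetical stand-in for the proto Detection message.
type Detection struct {
	ClassName  string
	X, Y, W, H float32
}

// toDetection converts a corner-format box [x1, y1, x2, y2] into the
// x/y/w/h form and tags it with class "face".
func toDetection(box [4]float32) Detection {
	return Detection{
		ClassName: "face",
		X:         box[0],
		Y:         box[1],
		W:         box[2] - box[0],
		H:         box[3] - box[1],
	}
}

// normalizeGender maps the engine's "M"/"F" codes onto "Man"/"Woman".
// Any other value is returned unchanged (assumption).
func normalizeGender(g string) string {
	switch g {
	case "M":
		return "Man"
	case "F":
		return "Woman"
	default:
		return g
	}
}

func main() {
	d := toDetection([4]float32{10, 20, 110, 170})
	fmt.Printf("%s x=%g y=%g w=%g h=%g\n", d.ClassName, d.X, d.Y, d.W, d.H)
	fmt.Println(normalizeGender("F"))
}
```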

The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib
with ggml + vendored libjpeg-turbo static (PIC), so the .so is
ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_
symbols). Gated Ginkgo e2e mirrors voice-detect.

Note for the gallery-wiring task: backend registration (index.yaml,
gallery, core/config/backend_capabilities.go) is intentionally not
touched here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(voice-detect): replace em dashes in net-new descriptions

Project style forbids em/en dashes. Replace the three U+2014 chars
introduced by the voice-detect gallery/index wiring with `-`/`:`.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(face-detect): wire backend into index, gallery and build

Register the face-detect.cpp face detection / embedding / verification /
analysis backend (added in Face-INT-A) into LocalAI's distribution
surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml
recognition analogue):

- backend/index.yaml: add the &facedetect meta-backend (capabilities
  platform map, no top-level uri to avoid the meta-backend gotcha) plus
  the full set of concrete per-arch image entries (cpu/cuda12/cuda13/
  metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants),
  22 entries. Referential integrity audited: every alias target resolves.
- gallery/index.yaml: add 4 model entries on backend face-detect -
  face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL)
  and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the
  commercial-friendly alternative). The detector/embedder architecture is
  read from GGUF metadata (facedetect.arch) at load; only the real
  verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF
  artifacts are not yet published: each files: entry points at the
  intended mudler/face-detect-gguf location with a TODO to fill sha256
  after upload (no fabricated hashes).
- core/config/backend_capabilities.go: register face-detect in the
  backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze ->
  face_recognition), mirroring insightface.
- .github/backend-matrix.yml: add the linux build matrix block + the
  darwin metal entry mirroring voice-detect.
- .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via
  FACEDETECT_VERSION (pin 636a1963).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(recon): voice-detect metal build branch + face-detect gallery usecases

Add the missing metal BUILD_TYPE branch to the voice-detect Makefile
forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the
darwin metal CI artifact is built with the Metal backend instead of
CPU-only.

Expand the 4 face-detect gallery models' known_usecases to
[face_recognition, detection, embeddings] to match the backend
capabilities map and the mirrored insightface-buffalo entries, so
auto-selection for /v1/detect and /embeddings works.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(recon): document voice-detect and face-detect ggml backends

Document the new standalone C++/ggml biometric backends as the
recommended/default option for face and voice recognition, keeping the
existing Python insightface / speaker-recognition backends framed as the
legacy path.

- features/face-recognition.md: add a face-detect (ggml) backend section
  with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface
  Apache-2.0), licensing, and verify/detect/analyze quickstart.
- features/voice-recognition.md: add a voice-detect (ggml) backend
  section with the gallery entries (ecapa-tdnn, wespeaker-resnet34,
  eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial
  analyze head) and quickstart.
- reference/compatibility-table.md: add face-detect.cpp and
  voice-detect.cpp rows to the Vision, Detection & Recognition table.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(gallery): publish recon backend GGUF uris + sha256

Fill in the published HuggingFace GGUF uris and verified sha256 for the
9 recon gallery entries (voice-detect-* and face-detect-*), and remove
the TODO publish markers. Correct the eres2net, campplus, and
emotion-wav2vec2 uris to the actual published filenames.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model

Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now
embedded and re-uploaded under the same filenames/uris) and note the
FaceVerify anti_spoof request flag in each description. Add a new
voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion
model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(gallery): add face-detect-buffalo-sc and antelopev2 packs

Add gallery entries for two newly-published insightface face packs on
the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small
ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k
R100, 512-d). Both are non-commercial research-only.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): honor LocalAI per-model threads in voice/face-detect backends

LocalAI spawns one backend process per model and serves requests
concurrently, so the engines' own min(hardware_concurrency, 8) default
can oversubscribe cores. Forward the per-model Threads value from the
gRPC LoadModel options into the engine via VOICEDETECT_THREADS /
FACEDETECT_THREADS (read at backend construction) before the capi load.
A non-positive Threads is treated as unset, leaving the engine default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to CPU-optimized engine commits

voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached
pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads).
Brings the CPU optimizations into the LocalAI backend builds. GGUF format and
parity unchanged, so the published HF GGUFs remain valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-2 CPU-optimized engines

voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context,
wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp
-> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace
BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-3 Winograd engines

voice-detect.cpp -> 45122ec (Winograd F(2x2,3x3) for WeSpeaker/ERes2Net 3x3
convs, -22%/-20% @8t); face-detect.cpp -> cd5c962 (Winograd F(4x4,3x3) for
SCRFD large maps, -22% @1t on top of F(2x2), more load-stable). Parity held
(cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump backend pins to round-4 Winograd engines (CPU opt complete)

voice-detect.cpp -> d2839ca (CAM++ FCM 2D convs through Winograd, -15.5%/-10.3%);
face-detect.cpp -> c1db23d (AVX2-vectorized Winograd tile transforms, SCRFD
detect -14%/-9.6%). Final CPU optimization round; the conv-kernel lever class is
now exhausted (parity held at cosine=1.0; GGUF format unchanged, HF GGUFs valid).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump face-detect pin to deep-kernel engine (7ae5c4d)

face-detect.cpp -> 7ae5c4d: register-blocked winograd-domain GEMM microkernel
(2.8x isolated GFLOP/s), AVX-512 zmm evolution behind runtime CPUID dispatch
(ship-safe, AVX2 fallback bit-identical), bias/relu fused into the winograd
output transform, and SFace Conv+BN fold + bias/PReLU fusion. SCRFD detect
~1.4x faster end-to-end vs the round-4 baseline; parity bit-exact; portable
single binary (function-multiversioned, no global -mavx512f).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ECAPA operand-order win (e9c56ae)

voice-detect.cpp -> e9c56ae: weight-as-src0 mul_mat order in ECAPA's F32
conv1d_same (routes through tinyBLAS sgemm); ECAPA embed 1.67x @1t / ~1.3x @8t,
parity cosine=1.0. Isolated to encoder.cpp (ECAPA-only); ERes2Net/CAM++/WeSpeaker
do not call conv1d_same so are provably unaffected.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to FMA-throughput engines (voice f7b9f89, face 2d2d5f0)

face -> 2d2d5f0: route ArcFace 3x3 body convs through the AVX-512 winograd
microkernel (kWinoMinSize 80->14); ArcFace 1.62x @1t, SCRFD detect reaches 0.966x
MLAS @1t, no regression. voice -> f7b9f89: runtime-CPUID-dispatched AVX-512
winograd-GEMM microkernel (ship-safe, AVX2 fallback bit-identical); WeSpeaker
1.90x @1t. Parity cosine=1.0 throughout; portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to MLAS-class direct-conv engines (voice 7ecfd07, face be22d67)

Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s,
within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback,
fused bias/relu. voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%)
+ ERes2Net, CAM++ pinned to Winograd. face be22d67: shape-gated to the ArcFace
recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression).
Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice pin to Phase-A blocked backbone (f4e7eef)

WeSpeaker ResNet34 runs as one nChw16c blocked island (2 reorders/forward vs
~60) on AVX-512, default; per-conv directconv fallback on AVX2. +2.9% @1t /
+17-19% @8t vs per-conv directconv, parity cosine=1.0. The conv microkernel is
already FMA-bound near peak (~0.86-0.98x MLAS-implied); residual to MLAS is
sub-peak edge + non-conv tail, documented in docs/cpu-optimization.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to breadth blocked-backbone (voice 7f66871, face d80092b)

voice 7f66871: AVX2-vectorized (ymm) blocked island - AVX2-only hosts now run
the blocked backbone for WeSpeaker (2.3x over per-conv-AVX2, cosine=1.0);
ERes2Net stays per-conv (blocked regresses, opt-in only); CAM++ Winograd-pinned.
face d80092b: ArcFace recognizer blocked island, AVX-512 default (-13% @8t, ~0.90x
MLAS, the closest conv result), auto per-conv on AVX2; SCRFD untouched on Winograd
(0 island invocations during detect). Parity cosine=1.0 / detect <=1px throughout.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to small-spatial + stem conv kernels (voice 99b1804, face 47fdab6)

Measured-gap-driven conv kernels: small-spatial (fill the register tile when
output width <= tile width) + small-IC stem + strided-1x1/downsample recovery.
ArcFace recognizer 0.57 -> 0.70x MLAS @1t (the closest conv model), WeSpeaker
0.65 -> 0.79x @1t. Parity cosine=1.0 / detect <=1px. The OC-block-sharing lever
was a measured dead-end (deep stride-1 is L3-weight-bandwidth bound, not
read-port bound) and was NOT shipped. Kernel ceiling reached; further gap needs
an algorithm-class change (cache-blocked weight-stationary GEMM, or q8 weights).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to GPU persistent-graph + multi-model-safe cache (voice 45d2e6b, face 0a4799a)

GPU wins (CUDA/ggml backend, no CPU-path change): persistent per-shape graph+context
cache in Backend::compute() eliminates the per-call cudaGraph re-instantiation churn
-> wav2vec2 emotion+age-gender now AT GPU parity with torch-cuDNN on GB10 (0.97-0.98x),
CAM++ -5.7ms; bit-identical parity. Cache hardened multi-model-safe (invalidate-on-free
keyed by the ModelLoader weights buffer) so LocalAI multi-model hosting cannot stale-hit.
Conv models still trail cuDNN (im2col-materialization-bound) - cuDNN implicit-GEMM lever next.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump pins to cuDNN-conv-capable engines (voice b6e4356, face 6107a24)

Adds the opt-in cuDNN implicit-GEMM conv path (VOICEDETECT_GGML_CUDNN /
FACEDETECT_GGML_CUDNN, DEFAULT OFF -> zero build/runtime dep until enabled).
On GPU it kills the im2col-materialization bottleneck and reaches torch-cuDNN
parity on the spill-bound convs: SCRFD detect 14.8->6.4ms (2.3x, ~parity),
WeSpeaker ~parity, ERes2Net beats torch (1.10x); ArcFace/CAM++ neutral (no
spill). Parity exact (SCRFD <=1px, cosine=1.0). To USE it in LocalAI, the CUDA
backend build must enable the flag AND bundle libcudnn - deferred until a
cuDNN-bundled GPU image; flag stays OFF here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(recon): enable cuDNN conv path on arm64+CUDA13 recon backends

The voice-detect.cpp / face-detect.cpp engines have an opt-in cuDNN
implicit-GEMM conv path behind VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN
(default OFF) that kills im2col on the GPU and reaches torch-cuDNN parity
(SCRFD 2.3x, WeSpeaker/ERes2Net parity), measured on the GB10
(arm64, CUDA 13, sm_121a).

Enable it for the CUDA build, but only where cuDNN actually ships: the
arm64 + CUDA 13 image (GB10/Jetson/L4T). x86 CUDA images carry no cuDNN,
so flipping it on globally for BUILD_TYPE=cublas would be a link failure.
The Makefiles gate on CUDA_MAJOR_VERSION=13 + arch (TARGETARCH from the
matrix/Docker build, uname -m fallback for local builds).
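For clarity, the gate can be restated as a Go predicate. This is a hedged restatement of the Makefile condition, not the Makefile itself. The function names, the treatment of BUILD_TYPE=cublas as the "CUDA build", and the `aarch64` -> `arm64` normalization of the `uname -m` fallback are assumptions.

```go
package main

import "fmt"

// normalizeArch maps `uname -m` spellings onto Docker TARGETARCH names
// (assumed mapping for the local-build fallback).
func normalizeArch(a string) string {
	switch a {
	case "aarch64":
		return "arm64"
	case "x86_64":
		return "amd64"
	}
	return a
}

// cudnnEnabled is true only for a CUDA (cublas) build on CUDA 13 targeting
// arm64, the only image that ships cuDNN. targetArch comes from the
// matrix/Docker build; unameM is used only when targetArch is empty.
func cudnnEnabled(buildType, cudaMajor, targetArch, unameM string) bool {
	arch := targetArch
	if arch == "" {
		arch = normalizeArch(unameM)
	}
	return buildType == "cublas" && cudaMajor == "13" && arch == "arm64"
}

func main() {
	fmt.Println(cudnnEnabled("cublas", "13", "", "aarch64")) // local arm64 build
	fmt.Println(cudnnEnabled("cublas", "13", "amd64", ""))   // x86 CUDA image
}
```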

backend/Dockerfile.golang already installs the runtime libcudnn9-cuda-13
in the arm64+CUDA13 apt block; add the matching libcudnn9-dev-cuda-13 so
the build-time link resolves.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): bump voice-detect pin to ERes2Net blocked-default (30beecd)

Defaults VD_ERES2NET_BLOCKED ON: routes the ERes2Net Res2Net body through the
blocked nChw16c AVX-512 directconv island instead of the 1x1 mul_mat fast path
(CONT-transpose + skinny low-K GEMM). On the shipped GGML_NATIVE=OFF build (ggml
mul_mat is AVX2-only) this wins ~2x at every thread count (2.07x@1t, 2.2x@4t,
2.05x@8t); pure-AVX2 fallback still 1.3-1.62x. Parity exact (cosine=1.000000 vs
golden), so registered voices + verify/identify thresholds are unaffected. The
prior default-OFF rested on a stale comment whose 23% regression only held on
the non-shipping GGML_NATIVE=ON build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(readme): announce native voice-detect + face-detect backends in Latest News

Add a Latest News entry for the new from-scratch C++/ggml biometric backends
(voice-detect.cpp + face-detect.cpp) that replace the Python insightface and
speaker-recognition backends: no Python/onnxruntime at inference, self-contained
GGUF, bit-exact parity, GPU cuDNN parity. Mirrors the parakeet.cpp /
locate-anything.cpp native-backend news entries. Refs PR #10441.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(recon): re-pin to the squashed engine release commits

The voice-detect.cpp and face-detect.cpp histories were squashed to a single
release commit, which orphaned the previous pins (voice 30beecd, face 6107a24).
Re-pin to the new single-commit SHAs (voice 3d51077, face 06914b0); the tree is
identical, so the backend build is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 09:29:08 +02:00
LocalAI [bot]
d3a26f961d fix(ik-llama): port multimodal path to mtmd API and bump to f96eaddb (#10534) (#10568)
* fix(ik-llama): port multimodal path to mtmd API and bump to f96eaddb (#10534)

The IK_LLAMA_VERSION bump to f96eaddba8bed6a9a5e628bbf6a566775c70b49c pulls in
upstream commit "Prune examples/llava", which deletes examples/llava (clip.* /
llava.*). The ik-llama backend's grpc-server.cpp built a local `myclip` library
from those files and called the removed clip/llava C API, so the bump no longer
builds.

ik_llama keeps its multimodal stack in the surviving `mtmd` library
(examples/mtmd/, public headers mtmd.h + mtmd-helper.h). This ports the backend's
multimodal path onto the high-level mtmd_* / mtmd_helper_* API in place, leaving
the text path (which still uses ik_llama's retained old common API) untouched:

- Makefile: bump IK_LLAMA_VERSION to f96eaddb.
- prepare.sh: drop the clip/llava source copy + sed block; mtmd is a library
  target, no source copy needed.
- CMakeLists.txt: remove the `myclip` target; link `mtmd` and add its include
  dir; build grpc-server as C++17 (mtmd headers require it).
- patches: drop 0002 (targeted the deleted examples/llava/clip.cpp; the mtmd
  clip.cpp never calls ggml_quantize_chunk, so the fix is unneeded). Keep 0001
  (verified still applies).
- grpc-server.cpp / utils.hpp: replace clip_model_load + clip_image_load_from_bytes
  + llava_image_embed_make_with_clip_img + the manual [img-N] prefix splitting and
  per-image llava_embd_batch decode loop with mtmd_init_from_file (moved after the
  model load, which it requires), mtmd_helper_bitmap_init_from_buf, mtmd_tokenize
  and mtmd_helper_eval_chunks. Legacy [img-N] tags are translated, in order, into
  mtmd media markers (mtmd_default_marker()); the post-image suffix text stays on
  the normal token path so the sampling loop is unchanged.

Supersedes #10534.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ik-llama): align json alias to ordered_json to resolve mtmd.h conflict (#10534)

mtmd.h declares `using json = nlohmann::ordered_json` at global scope (and its
mtmd.cpp depends on it), while ik_llama's whole server/common stack also uses
ordered_json. Our grpc-server.cpp/utils.hpp kept a plain `nlohmann::json` alias,
which now collides with mtmd.h once it is included for the multimodal port:
"conflicting declaration 'using json = ...'". Switch our two aliases to
ordered_json to match; it is API-compatible (utils.hpp already used ordered_json
for its log helper) and our json never crosses into an unordered-json API.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 08:57:11 +02:00
LocalAI [bot]
13b1ae53bc chore: ⬆️ Update ggml-org/llama.cpp to 0ed235ea2c17a19fc8238668653946721ed136fd (#10536)
* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): link server-stream.cpp TU into grpc-server for upstream 0ed235ea (#10536)

Upstream llama.cpp 0ed235ea added an SSE stream-resumption layer in a new
translation unit tools/server/server-stream.cpp, which defines
stream_session, stream_pipe_producer and the g_stream_sessions manager.
server-context.cpp (already #included into grpc-server.cpp) now calls into
it via spipe->cleanup(), stream_aware_should_stop() and
stream_session_attach_pipe(), so without the new TU the grpc-server link
fails on every arch with:

  undefined reference to `stream_pipe_producer::cleanup()'

prepare.sh already copies every tools/server/* file into tools/grpc-server/,
so the source is present; the only missing piece was including its
definitions. Add an __has_include-guarded #include "server-stream.cpp"
before server-context.cpp, mirroring the existing server-chat.cpp and
server-schema.cpp guards, keeping the source compatible with older
pins/forks that predate the split. The file is self-contained (its only
external symbols come from server-common, already in the TU) so it adds no
new undefined references; the http route-handler factories it also defines
are unused in the grpc path but harmless.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(llama-cpp): build renamed ggml-rpc-server target for upstream 0ed235ea (#10536)

Upstream renamed the RPC server CMake target and binary from `rpc-server`
to `ggml-rpc-server` (tools/rpc/CMakeLists.txt: `set(TARGET ggml-rpc-server)`),
so the RPC-enabled grpc build failed with "No rule to make target 'rpc-server'".
The grpc-server itself links fine after the server-stream.cpp fix; this only
updates the RPC target name and the binary path copied to llama-cpp-rpc-server.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 08:56:40 +02:00
LocalAI [bot]
e68ca109c5 chore: ⬆️ Update CrispStrobe/CrispASR to 6514c9da00b03a2f0f1b49a43fae4f3a01a41844 (#10535)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-28 08:56:24 +02:00
LocalAI [bot]
6740e988d2 chore: ⬆️ Update ggml-org/whisper.cpp to 0ae02cdb2c7317b50991367c165736ce42ed96ac (#10532)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-28 08:56:06 +02:00
LocalAI [bot]
ade9cc9e37 fix(openresponses): bound resume-stream buffer and enforce response ownership (#10569)
The background=true resumable-stream path had two latent issues.

1. Unbounded resume buffer. AppendEvent grew StreamEvents without limit, so
   a long-running or abandoned background generation could consume process
   memory without bound. The store now caps the buffer (event count and total
   bytes, mirroring llama.cpp's byte-capped slot ring), evicting oldest events
   from the front and advancing a droppedThrough watermark. GetEventsAfter
   returns ErrOffsetLost when the requested starting_after is below the
   watermark, and handleStreamResume surfaces that as HTTP 409 before
   committing to the SSE response, so a resuming client gets a clear error
   instead of a silently truncated stream.

2. Missing ownership check (IDOR). GET /responses/:id, its stream resume, and
   /cancel looked up responses purely by ID, letting any caller who knows or
   guesses an ID read or cancel another caller's response. Responses now carry
   the creating caller's identity (auth.GetUser), stamped at creation and
   compared on read/cancel/resume; a mismatch returns 404 (not 403) so
   existence is not leaked. Backward compatible: responses with no owner
   (single-key / no-auth deployments) remain accessible.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 02:02:15 +02:00
LocalAI [bot]
471e38e4e7 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9956436c925a367daeab097598b1ea1f32d3503f (#10533)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-28 01:55:44 +02:00
LocalAI [bot]
f3d829e2ef feat(distributed): add LOCALAI_DISTRIBUTED_SHARED_MODELS to skip staging on shared volumes (#10556) (#10566)
In distributed mode, even when the frontend and workers share the same
models directory via a shared volume mount, starting a model on a worker
re-staged (re-downloaded) it: stageModelFiles always uploads model files
into a tracking-key-namespaced subdir on the worker, and the staging probe
only checks that staged location, so a file already present on the shared
volume at the canonical path was never reused.

Add a config switch LOCALAI_DISTRIBUTED_SHARED_MODELS (default false). When
enabled, the operator asserts that all nodes mount the SAME models directory
at the SAME path, so staging is unnecessary: the frontend's absolute model
paths are already valid on the worker. In that mode stageModelFiles returns
the cloned opts unchanged without uploading, leaving the path fields pointing
at their canonical absolute paths so the worker loads them directly from the
shared volume.

The value is plumbed from DistributedConfig through SmartRouterOptions into
the SmartRouter. Docs and docker-compose.distributed.yaml updated.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:23:07 +02:00
LocalAI [bot]
91885c2c7e fix(distributed): return empty backend list for agent nodes instead of failing backend.list (#10545) (#10565)
Opening an AGENT-type worker node's detail page errored with
"failed to list backends on node" / NATS "nodes.<id>.backend.list:
no responders available". Agent workers only subscribe to agent.*,
jobs.*, mcp.* and <prefix>.backend.stop; they never subscribe to
backend.list, so the per-node ListBackendsOnNodeEndpoint request had
no responder and timed out.

The aggregate cluster-wide list already guards this in
managers_distributed.go (skip nodes whose NodeType is set and not
"backend"). The single-node endpoint lacked the same guard. Thread the
NodeRegistry into ListBackendsOnNodeEndpoint and short-circuit to an
empty (non-nil) list for non-backend node types before issuing the
doomed NATS request, mirroring the aggregate-list gate so both views
stay consistent.
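The shared gate reduces to a single predicate checked before the request is sent. In this Go sketch, `Node`, `isBackendNode` and the `request` callback are hypothetical simplifications of the registry and the NATS call. The `NodeType` rule follows the description above: a node is skipped only when the field is set and is not "backend".

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, trimmed-down registry entry.
type Node struct {
	ID       string
	NodeType string // "" or "backend" count as backend workers; e.g. "agent" does not
}

// isBackendNode mirrors the aggregate-list gate: skip only nodes whose
// NodeType is set and is not "backend".
func isBackendNode(n Node) bool {
	return n.NodeType == "" || n.NodeType == "backend"
}

// listBackendsOnNode short-circuits non-backend nodes to an empty, non-nil
// list instead of sending a backend.list request that has no responder.
func listBackendsOnNode(n Node, request func(nodeID string) ([]string, error)) ([]string, error) {
	if !isBackendNode(n) {
		return []string{}, nil
	}
	return request(n.ID)
}

func main() {
	noResponders := func(id string) ([]string, error) {
		return nil, errors.New("nodes." + id + ".backend.list: no responders available")
	}
	list, err := listBackendsOnNode(Node{ID: "w1", NodeType: "agent"}, noResponders)
	fmt.Println(len(list), list != nil, err)
}
```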


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:22:48 +02:00
LocalAI [bot]
f1fcafb888 fix(gallery): match mmproj/model quant as a whole token so F16 no longer selects BF16 (#10559) (#10564)
pickPreferredGroup matched a quant preference against the shard base
filename with strings.Contains. Because `f16` is a substring of `bf16`,
asking for the `F16` mmproj quant would wrongly satisfy a `BF16` file and
select it when its group came first.

Match the preference as a whole token instead: it must be delimited by a
non-alphanumeric character (or the string start/end) on both outer edges.
Separators inside the preference itself (e.g. `ud-q4_k_xl`) are left
untouched, and all occurrences are scanned before rejecting.
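The whole-token rule can be sketched in Go as follows. This is an illustration, not the actual pickPreferredGroup code. The name `containsToken` is hypothetical, and case-insensitive matching is an assumption of this sketch. Only the two outer edges of each occurrence are checked, so separators inside the preference (such as `-` in `ud-q4_k_xl`) are never inspected.

```go
package main

import (
	"fmt"
	"strings"
)

// isAlnum reports whether b is an ASCII letter or digit.
func isAlnum(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
}

// containsToken reports whether pref occurs in name bounded on both outer
// edges by a non-alphanumeric byte or the start/end of name. Every
// occurrence is tried before giving up.
func containsToken(name, pref string) bool {
	name, pref = strings.ToLower(name), strings.ToLower(pref)
	if pref == "" {
		return false
	}
	for start := 0; start <= len(name)-len(pref); {
		i := strings.Index(name[start:], pref)
		if i < 0 {
			return false
		}
		i += start
		end := i + len(pref)
		leftOK := i == 0 || !isAlnum(name[i-1])
		rightOK := end == len(name) || !isAlnum(name[end])
		if leftOK && rightOK {
			return true
		}
		start = i + 1 // keep scanning later occurrences
	}
	return false
}

func main() {
	for _, name := range []string{"mmproj-model-BF16.gguf", "mmproj-model-F16.gguf"} {
		fmt.Println(name, containsToken(name, "F16"))
	}
}
```

With this rule, `F16` no longer matches inside `bf16` because the preceding `b` is alphanumeric. A file such as `bf16-f16.gguf` still matches on its second occurrence.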


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:21:33 +02:00
LocalAI [bot]
fdff114701 ci(vibevoice): skip the ASR transcription e2e on release tag builds (#10567)
The `tests-vibevoice-cpp-grpc-transcription` job downloads the vibevoice ASR
model (`vibevoice-asr-q4_k.gguf`, ~10 GB) and decodes it through the
e2e-backends harness. On release tag pushes the detect step forces the full
matrix (run-all=true), so this job runs and consistently times out: the inner
`go test -timeout 30m` cannot pull a 10 GB file from HuggingFace's throttled
Xet CDN within budget (curl --max-time 600 with 5 retries can spend 50 minutes
or more downloading, well past the 30m deadline), leaving an orphaned curl and
a 30m panic. It has been red on every release (v4.5.3/4/5).

Guard the job's `if` with `!startsWith(github.ref, 'refs/tags/')` so it no
longer runs on tag/release builds. It still runs on PRs and branch pushes that
touch vibevoice-cpp, so real regressions are caught off the release path. A
proper fix (a small ASR test GGUF) can re-enable it on tags later.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 00:40:21 +02:00
LocalAI [bot]
1154be5eea fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)
The GGUF metadata parser (gpustack/gguf-parser-go) cannot read NVFP4-quantized
GGUFs at all: it errors with "read tensor info 0: This quantized type is
currently unsupported" because NVFP4 is a ggml tensor type it does not know.
When ParseGGUFFile errors, the llama-cpp defaults hook skips guessGGUFFromFile
entirely and the deferred fallback sets the context window to the conservative
GGUFFallbackContextSize (1024). The result: a model trained for a 262144-token
context runs with n_ctx=1024, and every prompt over ~1k tokens fails with
"request (N tokens) exceeds the available context size (1024 tokens)".

Two changes:

- Drop GGUFFallbackContextSize (1024) and fall back to DefaultContextSize
  (4096) in both the GGUF run-estimate path (gguf.go) and the deferred hook
  fallback (hooks_llamacpp.go). 1024 is a sensible floor for a tiny CPU GGUF
  but a footgun for a large, long-context model whose header simply cannot be
  parsed. Strengthen the existing "GGUF unreadable" test to assert the value.

- Set context_size explicitly on the four NVFP4 gallery entries
  (qwen3.6-35b-a3b-nvfp4-mtp, qwopus3.6-27b-v2-mtp-nvfp4,
  qwopus3.6-27b-coder-mtp-nvfp4, qwen3.6-27b-nvfp4-mtp) so the parser failure
  is irrelevant for them. 32768 matches sibling Qwen entries and is safe on
  memory; operators can raise it toward the 262144 train length.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:34:52 +02:00
LocalAI [bot]
8aba4fdba3 chore(fish-speech): drop the darwin/metal build target (#10561)
The fish-speech metal-darwin-arm64 backend build has been failing on every
release (v4.5.3, v4.5.4, v4.5.5) and is a standing red on the darwin backend
matrix. fish-speech pulls `tokenizers` transitively from its upstream source
(`pip install -e fish-speech-src`), and on darwin/arm64 there is no prebuilt
wheel for the pinned old `tokenizers` version, so pip builds it from source.
Modern rustc rejects that old crate as a hard error:

    error: casting `&T` to `&mut T` is undefined behavior ...
       --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
       = note: `#[deny(invalid_reference_casting)]` on by default
    error: could not compile `tokenizers` (lib) due to 1 previous error

This is deterministic, not a flake, and there is no clean fix that does not
either pin a stale Rust toolchain or downgrade a soundness lint guarding real
UB. Until upstream fish-speech moves to a tokenizers version that compiles on
current toolchains, drop darwin support so the release backend build stays
green. The Linux/CUDA/ROCm/Intel/L4T variants are unaffected.

Removes:
- the `-metal-darwin-arm64-fish-speech` entry from `includeDarwin` in
  backend-matrix.yml
- the `metal:` capability mappings and the concrete `metal-fish-speech` /
  `metal-fish-speech-development` gallery entries in backend/index.yaml
- the now-unused darwin-only requirements-mps.txt

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:24:21 +02:00
LocalAI [bot]
d7d7721eae feat(distributed): SyncedMap component + migrate finetune/quant/agent-tasks to cross-replica state (#10542)
* feat(distributed): add SyncedMap cross-replica in-memory state component

Introduce core/services/syncstate.SyncedMap[K,V]: a thread-safe in-memory map
that keeps itself consistent across frontend replicas via NATS, with an optional
pluggable durable Store and hydrate-from-source convergence.

Several features keep process-local state surfaced to the API (finetune/quant
jobs, agent tasks, model configs) and each hand-wired the same in-memory + NATS
broadcast + read-through-store legs - or forgot to, reintroducing cross-replica
staleness. SyncedMap makes that consistency a configuration choice:

- local writes mutate the map, write through the Store, then broadcast a delta;
- the apply path is memory-only and never re-publishes or re-writes the Store
  (structural echo-loop guard, mirroring galleryop.mergeStatus);
- on Start and on NATS reconnect the map re-hydrates from the source (Store, else
  Loader); an optional periodic Reconcile repairs silent drift;
- standalone mode (nil NATS client) is a strict in-memory no-op.
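The split between the local-write and apply paths can be sketched in Go as follows. This is a minimal illustration, not the actual syncstate.SyncedMap API. The `delta` wire format, the two-method `Store` interface and the `publish` callback (standing in for a NATS publish) are assumptions, and hydrate/reconnect/reconcile are omitted. The key property is that `Apply` never writes the store or re-publishes, so a delta cannot bounce between replicas.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// delta is a hypothetical wire format for one broadcast change.
type delta[K comparable, V any] struct {
	Op    string `json:"op"` // "set" or "delete"
	Key   K      `json:"key"`
	Value V      `json:"value"`
}

// Store is an assumed minimal durable-store interface.
type Store[K comparable, V any] interface {
	Put(k K, v V) error
	Remove(k K) error
}

type SyncedMap[K comparable, V any] struct {
	mu      sync.RWMutex
	m       map[K]V
	store   Store[K, V]        // nil: no durable store
	publish func([]byte) error // nil: standalone mode, no broadcast
}

func NewSyncedMap[K comparable, V any](store Store[K, V], publish func([]byte) error) *SyncedMap[K, V] {
	return &SyncedMap[K, V]{m: map[K]V{}, store: store, publish: publish}
}

// Set is the local-write path: mutate memory, write through, then broadcast.
func (s *SyncedMap[K, V]) Set(k K, v V) error {
	s.mu.Lock()
	s.m[k] = v
	s.mu.Unlock() // lock is not held during store I/O or publishing
	if s.store != nil {
		if err := s.store.Put(k, v); err != nil {
			return err
		}
	}
	return s.broadcast(delta[K, V]{Op: "set", Key: k, Value: v})
}

// Delete is the local-delete path.
func (s *SyncedMap[K, V]) Delete(k K) error {
	s.mu.Lock()
	delete(s.m, k)
	s.mu.Unlock()
	if s.store != nil {
		if err := s.store.Remove(k); err != nil {
			return err
		}
	}
	return s.broadcast(delta[K, V]{Op: "delete", Key: k})
}

func (s *SyncedMap[K, V]) broadcast(d delta[K, V]) error {
	if s.publish == nil {
		return nil
	}
	b, err := json.Marshal(d)
	if err != nil {
		return err
	}
	return s.publish(b)
}

// Apply handles a peer's delta. It is memory-only: no store write and no
// re-publish, which is the structural echo-loop guard.
func (s *SyncedMap[K, V]) Apply(payload []byte) error {
	var d delta[K, V]
	if err := json.Unmarshal(payload, &d); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	switch d.Op {
	case "set":
		s.m[d.Key] = d.Value
	case "delete":
		delete(s.m, d.Key)
	}
	return nil
}

func (s *SyncedMap[K, V]) Get(k K) (V, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

func main() {
	var b *SyncedMap[string, string]
	a := NewSyncedMap[string, string](nil, func(p []byte) error { return b.Apply(p) })
	b = NewSyncedMap[string, string](nil, func(p []byte) error { return a.Apply(p) })
	_ = a.Set("job-1", "running")
	fmt.Println(b.Get("job-1"))
}
```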

Reconnect re-hydrate is wired via a new *messaging.Client.OnReconnect callback,
consumed through an optional type-assertion so MessagingClient stays minimal.
Adds messaging.SubjectSyncStateDelta and a reusable testutil.FakeBus (synchronous
in-process MessagingClient with wildcard matching) for adopter tests.

Component only; service migrations follow in subsequent commits.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(finetune): back jobs with SyncedMap for cross-replica consistency

FineTuneService kept jobs in a process-local map and, although it wrote them to
Postgres, ListJobs/GetJob never read the store back and the wired natsClient was
never used - so in distributed mode a job created on one replica was invisible to
the others. Replace the map and the dead client with a syncstate.SyncedMap keyed
by job ID, value *schema.FineTuneJob (the exact REST shape, so responses are
unchanged).

- Add a Store adapter (core/services/finetune/syncstore.go) over FineTuneStore,
  plus FineTuneStore.ListAll (global hydrate; per-user List kept) and an
  idempotent Upsert (create-or-update; Create alone fails on dup key).
- Writes go through SyncedMap.Set/Delete (write-through + broadcast); reads use
  List/Get. The on-disk state.json path becomes the standalone Loader, keeping
  single-node restart recovery (stale->stopped / exporting->failed fixups).
- Fold SetNATSClient/SetFineTuneStore into NewFineTuneService; app.go passes the
  distributed NATS client + store when distributed, nil otherwise.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(agentpool): back agent tasks with SyncedMap for cross-replica consistency

AgentJobService.ListTasks read the process-local tasks map only, while ListJobs
already read through the DB persister + dispatcher NATS - so in distributed mode
a task created on one replica was invisible to the others. Back tasks with a
syncstate.SyncedMap keyed by task ID (value schema.Task, the exact REST shape);
jobs are left untouched.

- Store adapter (task_syncstore.go) over the existing JobPersister
  (LoadTasks/SaveTask/DeleteTask); reads svc.persister/userID live so a persister
  swap needs no rebuild. No new persister methods required.
- Task reads -> SyncedMap.List/Get; create/update -> Set (write-through +
  broadcast); delete -> Delete. The file persister now owns its own task set so
  the write-through path does not re-enter the SyncedMap lock (deadlock guard).
- The distributed NATS client is not available at construction (start() precedes
  initDistributed), so it is injected via SetTaskSyncNATS, which rebuilds the
  still-empty map before Start/hydrate. Wired at the main, restart, and per-user
  (UserServicesManager) distributed sites.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(quantization): back jobs with SyncedMap + durable QuantStore

QuantizationService kept jobs in a process-local map persisted only to a local
state.json, so in distributed mode jobs were neither visible across replicas nor
durable cluster-wide. Back jobs with a syncstate.SyncedMap keyed by job ID
(value *schema.QuantizationJob, the exact REST shape).

- New distributed.QuantStore (GORM, table quantization_jobs) mirroring
  FineTuneStore: Create/Get/ListAll/Upsert(idempotent)/Delete, registered for
  AutoMigrate via distributed.InitStores (Stores.Quant).
- New adapter (quantization/syncstore.go) over QuantStore implementing
  syncstate.Store, with record<->schema conversion.
- Reads go through List/Get, writes through Set/Delete (write-through +
  broadcast); state.json is kept as the standalone Loader for single-node restart
  recovery (stale-job fixups preserved).
- app.go passes the distributed NATS client + QuantStore when distributed, nil
  otherwise; Start/Close lifecycle mirrors finetune.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(syncstate): annotate gosec G118 false positive on lifeCtx

gosec flagged the WithCancel in Start as "cancellation function not called"
because the returned cancel is stored on the struct rather than called/deferred
in scope. It is invoked in Close (covered by tests), and lifeCtx must outlive
Start to drive the reconnect/reconcile goroutines. Suppress the verified false
positive with a justified #nosec G118.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(distributed): e2e two-replica SyncedMap sync over real NATS + Postgres

Adds the real-infrastructure counterpart to the fake-bus unit tests, in the
existing distributed e2e suite (testcontainers NATS + PostgreSQL). Two SyncedMap
instances stand in for two frontend replicas - each with its OWN NATS connection
to a shared server and a SHARED Postgres store (the distributed-mode invariant) -
and assert, over the wire:

- a create on replica A is observed by replica B;
- an update and a delete propagate A -> B (delete prunes, which a reload cannot);
- a late-joining replica recovers a job it never received a delta for, via store
  hydrate on Start (the at-most-once gap a fake bus cannot exercise);
- a local Set is written through to the shared Postgres store.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:23:51 +02:00
Nicholas Ciechanowski
c548150f99 fix(distributed): missing agent NATS permission (#10549)
Signed-off-by: Nicholas Ciechanowski <nicholas@ciech.anow.ski>
2026-06-27 21:10:12 +00:00
LocalAI [bot]
ec26b86dd4 docs: ⬆️ update docs version mudler/LocalAI (#10560)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 22:36:02 +02:00
LocalAI [bot]
d11b202dd2 fix(backends): whisper darwin run.sh loads whichever fallback lib exists (.so/.dylib) (#10553)
fix(backends): whisper darwin run.sh loads whichever fallback lib exists

The macOS branch hardcoded WHISPER_LIBRARY=$CURDIR/libgowhisper-fallback.dylib,
but the cmake build emits a Mach-O named libgowhisper-fallback.so on darwin, so
the Go loader panicked at runtime ("dlopen ...dylib: no such file") and the
backend exited ("grpc service not ready") — breaking e.g. the silero-vad-ggml
VAD on darwin. Pick whichever of .dylib/.so is present so it is robust to the
build's naming either way.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 14:07:56 +02:00
LocalAI [bot]
e95018ef70 chore(model gallery): 🤖 add 1 new models via gallery agent (#10544)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 09:42:46 +02:00
LocalAI [bot]
0258f8af55 fix(backends): repair release CI build/test breaks (kokoros, fish-speech, llama-cpp-quantization, sglang) (#10547)
* fix(kokoros): implement new Backend RPCs to fix the build

The backend.proto grew five RPCs (SoundDetection, Depth, TokenClassify,
Score and the bidi-streaming Forward) that the kokoros gRPC service never
implemented, so the trait impl (six missing items, counting the
`ForwardStream` associated type) no longer satisfies `Backend`:

    error[E0046]: not all trait items implemented, missing:
      `sound_detection`, `depth`, `token_classify`, `score`,
      `ForwardStream`, `forward`

kokoros is a TTS backend with no use for these, so add `unimplemented`
stubs (plus the `ForwardStream` associated type) matching the existing
pattern for every other unsupported RPC in this file.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(fish-speech): add setuptools-rust for the editable source install

install.sh installs the fish-speech source tree editable with
`--no-build-isolation`, which means the build backends of its transitive
dependencies must already be present in the venv. One of them builds a
Rust extension and its metadata step fails with:

    ModuleNotFoundError: No module named 'setuptools_rust'

Add setuptools-rust to requirements.txt so installRequirements provisions
it before the editable install runs.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(llama-cpp-quantization): vendor convert_hf_to_gguf.py with conversion/

Upstream llama.cpp split the model-specific logic out of the single
convert_hf_to_gguf.py file into a sibling `conversion/` package, so the
script now starts with `from conversion import ...`. Downloading just the
one file therefore fails at runtime with:

    ModuleNotFoundError: No module named 'conversion'

Clone the repo (reusing the clone already needed to build llama-quantize)
and copy both the script and the `conversion/` package into the backend
dir. Python puts the script's own directory on sys.path[0], so the package
resolves when it sits beside the script.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(sglang): pin the CPU source build to sglang v0.5.11

The CPU profile builds sgl-kernel from a `git clone` of sglang with no
ref, so it always tracks master. Recent master added CPU kernels (e.g.
mamba/fla.cpp) that fail to compile in our builder:

    constexpr variable 'scale' must be initialized by a constant
    static library kineto_LIBRARY-NOTFOUND not found

Pin the clone to v0.5.11, the same release the GPU path already floors on
(requirements-cublas12-after.txt). Overridable via SGLANG_VERSION so the
pin can be bumped deliberately.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 09:42:22 +02:00
LocalAI [bot]
14b29ebf4e fix(backends): derive darwin RUN_BINARY from the exec line only (#10541)
golang-darwin.sh's packaging check derived the launch binary by grepping every
$CURDIR/... reference in run.sh and taking the last one. Backends that pick a
runtime CPU variant assign it via unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so`
lines, so the heuristic returned `libgo<x>-avx512.so` — a variant Darwin never
builds (arm64 builds only fallback) — and the check then failed with
"package/libgo<x>-avx512.so not found ... refusing to package (#10267)",
breaking the darwin builds for whisper, sam3-cpp, vibevoice-cpp and friends.

Scan only the `exec` line(s) (the actual launch contract) and tolerate a
quoted `exec "$CURDIR"/<binary>`. parakeet-cpp's parakeet-cpp-grpc and the
quoted-only backends (sherpa/piper/opus) resolve correctly; no Linux change.
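The exec-line heuristic is sketched below in Go for illustration only; the real check is shell in golang-darwin.sh. The name `runBinary` is hypothetical. So is the choice to take the last `$CURDIR` reference on an exec line, which keeps the binary rather than a loader in `exec $CURDIR/lib/ld.so $CURDIR/<binary>`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// curdirRef matches $CURDIR/<path> and the quoted form "$CURDIR"/<path>.
var curdirRef = regexp.MustCompile(`\$CURDIR"?/([^\s"]+)`)

// runBinary returns the last $CURDIR reference found on exec lines only, so
// LIBRARY=$CURDIR/libgo<x>-avx512.so variant assignments are ignored.
func runBinary(runSh string) (string, bool) {
	bin := ""
	for _, line := range strings.Split(runSh, "\n") {
		if !strings.HasPrefix(strings.TrimSpace(line), "exec ") {
			continue
		}
		for _, m := range curdirRef.FindAllStringSubmatch(line, -1) {
			bin = m[1]
		}
	}
	return bin, bin != ""
}

func main() {
	runSh := "LIBRARY=$CURDIR/libgowhisper-avx512.so\nexec $CURDIR/whisper \"$@\"\n"
	fmt.Println(runBinary(runSh))
}
```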

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 02:05:40 +02:00
LocalAI [bot]
f0d0bff232 fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524) (#10538)
The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the
tokenizer-template path and blindly json::parse'd each message's content
string. LocalAI's Go layer always flattens content to a plain string, so a
user prompt that merely looks like JSON (e.g. mealie's ingredient array
["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and
rejected by oaicompat_chat_params_parse with "unsupported content[].type".

Normalize content per role instead: user/system/developer content is opaque
text and is never JSON-sniffed; assistant/tool content still collapses a literal
JSON null/object (tool-call bookkeeping) to a string, but a plain string is
never turned into an array/scalar. The array defense is role-independent, so the
role gate only governs the benign null/object case.

While here, extract the duplicated per-message reconstruction and the
pre-template content sanitization into shared, unit-tested helpers
(message_content.h) so the streaming (PredictStream) and non-streaming (Predict)
paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the
dead tool-role parse branches, and the redundant Predict-only tool_calls branch,
while preserving the prior #7324 (null content -> "") and #7528 (tool array
content -> string) fixes.

Tests:
- backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for
  all three helpers (#10524, #7324, #7528, multimodal), discovered and run by
  `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired
  as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON).
- core/schema/message_test.go: Go regression pinning that ToProto flattens a
  JSON-array-looking text part to the verbatim string.
- prepare.sh now copies message_content.h into the build tree.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:42:05 +02:00
LocalAI [bot]
64150ca7ab fix(distributed): broadcast admin model-config changes across replicas (#10540)
In distributed mode the admin model endpoints (/models/edit, /models/import,
/models/toggle-state and the PATCH config-json endpoint) wrote the YAML to the
shared models dir but reloaded only the local replica's in-memory
ModelConfigLoader. With multiple frontend replicas behind one service, a save
landed on whichever replica handled the request; peers kept serving their stale
in-memory view, so a load-balanced request was a coin-flip between old and new
config (a created alias visible on one replica and missing on the other, an
edited alias target diverging, etc.).

The NATS cache-invalidation channel (SubjectCacheInvalidateModels +
OnModelsChanged) already existed for the gallery install/delete path; these
admin endpoints simply never published on it. Wire them up via a new
GalleryService.BroadcastModelsChanged helper (no-op in standalone mode).

Also fix delete propagation: LoadModelConfigsFromPath is additive and never
drops an entry whose file is gone, so the subscriber hook (which only reloaded
from disk) could not propagate a removal. ApplyRemoteChange now honors the
event op - pruning the element on "delete" and reloading otherwise - and shuts
down any running instance of the affected model so the new config takes effect.
This closes the same latent gap on the gallery delete path.
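The op-aware subscriber can be sketched in Go as follows. The `ModelsChanged` event shape, the simplified `Loader` and its callbacks are hypothetical, and so is this signature of ApplyRemoteChange. The sketch only illustrates the rule: "delete" prunes explicitly because a disk reload is additive, any other op reloads, and the running instance is stopped either way.

```go
package main

import "fmt"

// ModelsChanged is a hypothetical shape for the cache-invalidation event.
type ModelsChanged struct {
	Op    string // "delete" prunes; any other op reloads from disk
	Model string
}

// Loader is a simplified stand-in for an in-memory model-config loader.
type Loader struct {
	configs  map[string]string                // model name -> config (simplified)
	readDisk func(name string) (string, bool) // re-read the YAML from the shared models dir
	shutdown func(name string)                // stop a running instance of the model
}

// ApplyRemoteChange honors the event op. A reload from disk is additive and
// can never drop an entry whose file is gone, so deletes prune explicitly.
func (l *Loader) ApplyRemoteChange(ev ModelsChanged) {
	if ev.Op == "delete" {
		delete(l.configs, ev.Model)
	} else if cfg, ok := l.readDisk(ev.Model); ok {
		l.configs[ev.Model] = cfg
	}
	// Stop any running instance so the next request loads the new config.
	l.shutdown(ev.Model)
}

func main() {
	disk := map[string]string{"alias": "target: qwen3"}
	l := &Loader{
		configs:  map[string]string{"alias": "target: llama3", "old": "target: phi"},
		readDisk: func(n string) (string, bool) { c, ok := disk[n]; return c, ok },
		shutdown: func(n string) { fmt.Println("stopping", n) },
	}
	l.ApplyRemoteChange(ModelsChanged{Op: "update", Model: "alias"})
	l.ApplyRemoteChange(ModelsChanged{Op: "delete", Model: "old"})
	fmt.Println(l.configs)
}
```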


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:36:57 +02:00
LocalAI [bot]
f98b0f1c1e fix(gpu-libs): bundle transitive deps of GPU runtime libs (#10537) (#10539)
fix(gpu-libs): bundle transitive deps of GPU runtime libs

The per-vendor packagers in package-gpu-libs.sh copy an explicit allowlist
of top-level GPU runtime libraries (libamdhip64, libhipblas, librocblas, the
CUDA/Intel equivalents, ...) but never resolved their transitive
dependencies. Backends run through the bundled lib/ld.so with
LD_LIBRARY_PATH=lib, so any transitive dep not in the allowlist is a fatal
"cannot open shared object file" at load time.

On recent ROCm (base image rocm 7.2.1) the runtime libs link against
librocprofiler-register.so.0, which is not in the allowlist, so the rocm
llama-cpp backend (and every other GPU backend sharing this script) failed
to load with:

  librocprofiler-register.so.0: cannot open shared object file

The Vulkan path already solved this class of problem with copy_elf_deps
(ldd-based transitive resolution), but that sweep was only wired into the
Vulkan ICD path. This adds a generic sweep_transitive_deps that runs the
same ldd resolution over everything the allowlist already bundled, and wires
it into the ROCm, CUDA and Intel packagers. ldd returns the full recursive
closure, so one pass suffices; core libc-family deps are skipped via
is_core_lib so we never shadow the loader's own libc/libstdc++.

Adds a self-contained regression test (gcc + ldd) that fabricates a primary
lib linking a transitive lib and asserts the sweep bundles the dependency.

Fixes #10537

Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:36:33 +02:00
LocalAI [bot]
2c96c2d08e chore: ⬆️ Update mudler/parakeet.cpp to f469a57270a1cc4554acb15febf60e56619673b9 (#10530)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 00:50:51 +02:00
LocalAI [bot]
f01a969f7b docs: ⬆️ update docs version mudler/LocalAI (#10531)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 00:29:29 +02:00
LocalAI [bot]
56600eec3e fix(nodes): show a node's existing labels on the detail view (#10529)
fix(nodes): return labels in single-node GET so the detail view shows them

The node detail view (/app/nodes/:id) reads `node.labels` to render a
node's existing labels, but the single-node GET endpoint returned a bare
BackendNode whose Labels live in a separate table - so the list was always
empty and operators could only add labels, never see what was already set
(#10527). The same response also lacked in_flight_count and model_count.

Add NodeRegistry.GetWithExtras, mirroring the existing List vs ListWithExtras
split: bare Get stays cheap for the routing hot paths and existence checks,
while the detail endpoint uses the enriched variant to attach the labels map
and live counts. No frontend change is needed - the UI already renders
existing labels once the data is present.

Closes #10527


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 23:06:42 +02:00
LocalAI [bot]
c4fa256cdf chore(model gallery): 🤖 add 1 new models via gallery agent (#10526)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-26 22:31:22 +02:00
LocalAI [bot]
17c1fc74b2 fix(backends): darwin packaging for silero-vad (last Linux-only Go backend) (#10528)
fix(backends): darwin packaging for silero-vad

silero-vad was the last Go backend with Linux-only darwin packaging:
- package.sh fell through to "Could not detect architecture" -> exit 1 on
  macOS (no Darwin branch), so its darwin image never packaged.
- run.sh exported LD_LIBRARY_PATH, which macOS dyld ignores, so the bundled
  libonnxruntime.dylib couldn't be found at runtime.

Add a Darwin branch to package.sh (skip the glibc/ld.so bundling; add an
@loader_path/lib rpath so @rpath resolves to package/lib/) and a
DYLD_LIBRARY_PATH branch to run.sh — mirroring the piper darwin fix (#10525).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 22:31:06 +02:00
LocalAI [bot]
068d397acf fix(backends): set rpath on the piper darwin binary so it can load its bundled libs (#10525)
The metal-darwin-arm64-piper backend crashed at launch on macOS:

    DYLD "Library missing"
      Library not loaded: @rpath/libucd.dylib
      Referenced from: .../piper
      Reason: no LC_RPATH's found

The piper binary links libucd, libespeak-ng, libpiper_phonemize and
libonnxruntime via @rpath, but ships with no LC_RPATH, so dyld cannot
expand @rpath and aborts before piper runs. The libraries themselves are
already bundled in package/lib/ by package.sh.

Additionally, package.sh's architecture detection only handled the Linux
glibc loaders (/lib64/ld-linux-x86-64.so.2, /lib/ld-linux-aarch64.so.1)
and otherwise hit `echo "Error: Could not detect architecture"; exit 1`,
so on macOS packaging failed outright.

Add a Darwin branch (before the Linux checks) that skips the glibc/ld.so
bundling macOS has no use for and instead runs
`install_name_tool -add_rpath @loader_path/lib` on the piper binary, so
@rpath resolves to the bundled package/lib/ directory.

Also mirror sherpa-onnx/opus in run.sh: export DYLD_LIBRARY_PATH on
Darwin (LD_LIBRARY_PATH is Linux-only) as a defensive fallback.

Validated by hand on Apple Silicon: with the rpath added, piper
synthesized a real WAV. The darwin build is validated in CI.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 15:10:15 +02:00
LocalAI [bot]
5b3572f8b8 feat(macos): sign and notarize the DMG, app, and server binary (#10510)
Produce a Gatekeeper-clean macOS distribution with no user workaround:

- Launcher DMG + the LocalAI.app inside it are built via fyne, codesigned
  with the Developer ID under the hardened runtime, then the DMG is signed,
  notarized (notarytool) and stapled. Replaces macos-dmg-creator (which had
  no signing hook) with fyne package + hdiutil so we control the .app before
  packaging.
- The bare local-ai darwin server binary is signed + notarized via
  GoReleaser's native notarize block (quill backend, runs on Linux).
- All signing is gated on secrets being present, so forks/PRs/local builds
  stay unsigned and green (contrib/macos/sign-and-notarize.sh no-ops).
- Add hardened-runtime entitlements and FyneApp.toml for deterministic
  packaging; update macOS install docs to drop the quarantine workaround.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 12:45:51 +02:00
LocalAI [bot]
6afe127cd4 fix(backends): make the opus backend build and package on macOS/Darwin (#10523)
The opus Go backend (WebRTC audio codec) never built on macOS, so the
published master-metal-darwin-arm64-opus image shipped source only — no
opus binary and no libopusshim — because every step assumed Linux.

- Makefile: hardcoded libopusshim.so with no OS handling. Mirror
  sherpa-onnx: SHIM_EXT=so / dylib on Darwin and build
  libopusshim.$(SHIM_EXT). On Darwin link the shim with
  -undefined dynamic_lookup so it resolves opus_encoder_ctl from the
  already globally-loaded libopus (codec.go dlopens it RTLD_GLOBAL
  first) instead of baking an absolute Homebrew path into the dylib,
  keeping the packaged shim relocatable.
- run.sh: hardcoded LD_LIBRARY_PATH + libopusshim.so even on macOS. Add
  a Darwin branch exporting DYLD_LIBRARY_PATH and the .dylib shim, like
  sherpa-onnx/run.sh.
- package.sh: bundle libopusshim.$(SHIM_EXT) and libopus*.dylib (not
  just .so) into package/lib so the OCI image (which ships package/.)
  is self-contained on a runtime with no Homebrew; add a Darwin arch
  branch so it doesn't warn/skip.
- backend_build_darwin.yml: install + link opus and pkg-config via brew
  so the Makefile's `pkg-config opus` resolves on the macOS runner, and
  cache opus' Cellar dir.

Go code is unchanged; darwin build is validated in CI.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 11:19:50 +02:00
LocalAI [bot]
f58dcefed4 fix(backends): ship the package/ dir for darwin go backend images (#10522)
fix(backends): ship the package/ dir for darwin go backends

golang-darwin.sh packaged the whole backend source/build dir as the OCI
image (backend/go/$BACKEND/.), so the runtime dylibs ended up under
package/lib and backend-assets/lib while run.sh looks in $CURDIR/lib. As a
result a backend like sherpa-onnx could not dlopen its libsherpa-shim.dylib
at runtime and exited immediately (the model then 500s with "grpc service
not ready"); it started fine only when run from inside package/.

Ship package/. instead — the self-contained run.sh + binary + lib/ bundle —
matching the Linux Dockerfile.golang (`COPY .../package/. ./`). Backends
that don't assemble a package/ fall back to the backend dir, and the
binary-existence guard now checks the directory actually shipped.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 08:48:27 +02:00
LocalAI [bot]
11b062f8f4 chore(model gallery): 🤖 add 1 new models via gallery agent (#10521)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-26 07:43:29 +02:00
LocalAI [bot]
114eeaae81 feat(backends): make PreferDevelopmentBackends install the development image as primary (#10520)
When LOCALAI_PREFER_DEV_BACKENDS is set, install the -development image as the
primary backend URI (keeping the released image reachable as the first
fallback), instead of only reaching development as a download fallback when the
released image is missing. This lets an operator force backends built from the
development branch — e.g. to pick up a fix already on master before a release.

Threads PreferDevelopmentBackends through SystemState so InstallBackend can see
it, and reuses the same development-URI convention as the existing failure-path
fallback (released tag -> branch tag + dev suffix). The unexported developmentURI
helper is covered by a Ginkgo spec.
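The resulting install order can be sketched as below. This is a hypothetical illustration, not LocalAI's code. `candidateURIs` and `toyDevURI` are invented names, and `toyDevURI` only mimics "released tag -> branch tag + dev suffix" with a made-up mapping. The real convention lives in the unexported developmentURI helper.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateURIs returns backend image URIs in the order an installer would
// try them. With preferDev the development image is primary and the released
// image is the first fallback; otherwise development is only a fallback.
// devURI stands in for LocalAI's unexported developmentURI helper.
func candidateURIs(released string, preferDev bool, devURI func(string) string) []string {
	dev := devURI(released)
	if preferDev {
		return []string{dev, released}
	}
	return []string{released, dev}
}

// toyDevURI is a made-up mapping used only for this example; it is NOT the
// real released-tag -> development-tag convention.
func toyDevURI(uri string) string {
	return strings.Replace(uri, ":latest-", ":master-", 1) + "-development"
}

func main() {
	released := "registry.example/backends:latest-cpu-demo" // hypothetical image reference
	fmt.Println("prefer dev:", candidateURIs(released, true, toyDevURI))
	fmt.Println("default:   ", candidateURIs(released, false, toyDevURI))
}
```

Passing the mapping in as a function keeps the ordering logic independent of whatever tag convention the real helper uses.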

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 07:42:45 +02:00
LocalAI [bot]
d388f874de feat(backends): darwin/Metal build for the privacy-filter backend (#10513)
* feat(backends): darwin/Metal build for the privacy-filter backend (timeboxed try)

The privacy-filter.cpp engine is already Metal-capable on Apple Silicon: it pulls
ggml and never forces GGML_METAL=OFF, and ggml defaults Metal ON on Apple, so a
plain Darwin build is Metal-enabled. grpc++/protobuf resolve from Homebrew via
find_package(... CONFIG). It just had no darwin build path - the existing
package.sh and run.sh are Linux-only and there was no make target / workflow step.

Adds the bespoke darwin path, modeled on the ds4 one:
- scripts/build/privacy-filter-darwin.sh: native make grpc-server, otool -L dylib
  bundling, create-oci-image (no Linux package.sh).
- Makefile: backends/privacy-filter-darwin target (+ .NOTPARALLEL).
- .github/workflows/backend_build_darwin.yml: gated build step for privacy-filter.
- scripts/changed-backends.js: inferBackendPathDarwin special-case -> backend/cpp.
- .github/backend-matrix.yml: includeDarwin entry (lang go, like ds4/llama-cpp).
- backend/index.yaml: metal: capability + metal-privacy-filter(-development) entries.
- backend/cpp/privacy-filter/run.sh: DYLD_LIBRARY_PATH branch on Darwin.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(privacy-filter): macOS proto include + bundle ggml dylibs

Validated natively on an M4 (the build/package/load chain now works with Metal):

- CMakeLists.txt: hw_grpc_proto compiles the generated proto/grpc sources but
  only linked the binary dir, so on macOS it could not find protobuf's headers
  (runtime_version.h) - Homebrew puts them under /opt/homebrew, not /usr/include.
  Link protobuf::libprotobuf + gRPC::grpc++ so their include dirs propagate. No-op
  on Linux (apt headers are already on the default search path).
- privacy-filter-darwin.sh: bundle the ggml shared libs the binary @rpath-links
  (libggml{,-base,-cpu,-blas,-metal}); the otool -L walk only catches on-disk
  absolute deps and missed them. Resolved at runtime by run.sh's DYLD_LIBRARY_PATH.

M4 check: arm64 grpc-server links @rpath/libggml-metal.0.dylib; with the 15 ggml
dylibs + grpc/protobuf bundled, it loads clean (no dyld errors) and prints usage.
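A small Go parser can illustrate why the otool -L walk missed the ggml libraries. It rests on three assumptions. `splitDeps` is a hypothetical name. The sample output is illustrative rather than captured from the M4 run. privacy-filter-darwin.sh itself is a shell script. Entries recorded as @rpath/... name no on-disk path, so a walk that only follows absolute paths never sees them. Those libraries have to be bundled explicitly and are then found at runtime through run.sh's DYLD_LIBRARY_PATH.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitDeps parses `otool -L` output into absolute-path dependencies and
// @rpath-relative ones (hypothetical helper). System libraries under
// /usr/lib and /System are skipped because macOS ships them.
func splitDeps(otoolOut string) (absolute, rpath []string) {
	sc := bufio.NewScanner(strings.NewReader(otoolOut))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasSuffix(line, ":") {
			continue // blank line or the "<binary>:" header
		}
		// Drop the trailing "(compatibility version ..., current version ...)".
		if i := strings.Index(line, " (compatibility version"); i >= 0 {
			line = line[:i]
		}
		switch {
		case strings.HasPrefix(line, "@rpath/"):
			rpath = append(rpath, strings.TrimPrefix(line, "@rpath/"))
		case strings.HasPrefix(line, "/usr/lib/"), strings.HasPrefix(line, "/System/"):
			// provided by the OS; never bundled
		case strings.HasPrefix(line, "/"):
			absolute = append(absolute, line)
		}
	}
	return absolute, rpath
}

func main() {
	// Illustrative output shape only; not captured from a real build.
	sample := "grpc-server:\n" +
		"\t@rpath/libggml-metal.0.dylib (compatibility version 0.0.0, current version 0.0.0)\n" +
		"\t/opt/homebrew/opt/grpc/lib/libgrpc++.dylib (compatibility version 1.0.0, current version 1.0.0)\n" +
		"\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1.0.0)\n"
	abs, rp := splitDeps(sample)
	fmt.Println("absolute:", abs)
	fmt.Println("@rpath:", rp)
}
```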

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 01:18:41 +02:00
LocalAI [bot]
86677495a2 chore: ⬆️ Update ggml-org/llama.cpp to 9d5d882d8cd0f0a9283d87ed5e6fe3ee0d925fb1 (#10514)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-26 01:15:40 +02:00
LocalAI [bot]
253aedff06 chore: ⬆️ Update CrispStrobe/CrispASR to 8f1218141b792b8868861c1af17ba1e361b05dc0 (#10502)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-26 01:08:09 +02:00
LocalAI [bot]
74f07ecc35 fix(backends): quote $CURDIR in run.sh (fixes backends in paths with spaces) (#10519)
fix(backends): quote $CURDIR in run.sh so backends work in paths with spaces

The backend launcher scripts derive their own directory with
CURDIR=$(dirname "$(realpath $0)") and then referenced it unquoted as
$CURDIR (e.g. [ -f $CURDIR/lib/ld.so ], export LD_LIBRARY_PATH=$CURDIR/lib:...,
exec $CURDIR/<binary> "$@"). When a backend is installed under a path that
contains a space - notably macOS's ~/Library/Application Support/... - bash
word-splits the unquoted $CURDIR, so the test builtin fails with
"binary operator expected" and exec tries to run ".../Library/Application",
yielding "No such file or directory". The backend never starts, surfacing as
a gRPC "service not ready" error and an HTTP 500. Quote $CURDIR (and the
realpath "$0") in every affected run.sh; no logic changes.
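The failure mode can be modeled in Go as an approximation of bash semantics. An unquoted expansion is split into words on default IFS whitespace (roughly `strings.Fields`), while a quoted one stays a single word. The helper names and the install path are hypothetical, and the globbing that bash also applies to unquoted expansions is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUnquoted approximates bash word splitting of an unquoted $VAR with
// the default IFS (space, tab, newline): the value becomes several words.
func splitUnquoted(value string) []string {
	return strings.Fields(value)
}

// splitQuoted models "$VAR": the value always stays a single word.
func splitQuoted(value string) []string {
	return []string{value}
}

func main() {
	curdir := "/Users/me/Library/Application Support/localai/backends/demo" // hypothetical install path
	// exec $CURDIR/demo would try to run the first word as the program.
	fmt.Printf("%q\n", splitUnquoted(curdir+"/demo"))
	fmt.Printf("%q\n", splitQuoted(curdir+"/demo"))
}
```

The first word of the unquoted form ends at ".../Library/Application", which is exactly the path exec reports as missing.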

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 01:02:48 +02:00
LocalAI [bot]
ae0da454a7 chore: pin localrecall to tagged v0.6.3 (#10518)
#10517 pinned the pseudo-version of the postgres connection-timeout fix;
mudler/LocalRecall@v0.6.3 now tags that exact commit. Use the clean release
tag instead of the pseudo-version. No code change.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 01:02:15 +02:00
LocalAI [bot]
179210b970 chore: bump localrecall for postgres per-connection timeouts (#10517)
* chore: bump localrecall for postgres per-connection timeouts

Pulls mudler/LocalRecall#49: sets lock_timeout / idle_in_transaction_session_timeout
(on by default) plus an opt-in statement_timeout on every pooled connection, so a
corrupt/wedged index (e.g. a BM25 insert spinning on a buffer-content lock)
can no longer hold its relation lock forever and head-of-line block the
whole vector store.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* docs(agents): document PostgreSQL connection safety timeouts

Note the POSTGRES_LOCK_TIMEOUT / POSTGRES_IDLE_IN_TRANSACTION_TIMEOUT /
POSTGRES_STATEMENT_TIMEOUT env vars read by the embedded vector store, and
that safe defaults are on automatically.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 00:53:03 +02:00
LocalAI [bot]
6c03e46390 chore: ⬆️ Update ikawrakow/ik_llama.cpp to b84902d2ad27c34f989f23947200c4b91b1568fd (#10515)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 23:42:21 +02:00
LocalAI [bot]
f2ed63e39a docs(backends): make OS coverage explicit + require darwin support for new backends (#10516)
docs(backends): make OS coverage explicit + require darwin for new backends

The backend matrix is the source of truth for which OS a backend ships on, but
that was never written down, so backends were landing Linux-only by default even
when the engine builds fine on macOS.

- .github/backend-matrix.yml: header block documenting the two matrices
  (include = Linux, includeDarwin = macOS/Apple Silicon) and the policy that new
  backends target every OS they can build for.
- .agents/adding-backends.md: a 'Cover every OS' subsection in step 2 (full darwin
  wiring: includeDarwin entry, index.yaml metal: + metal-<backend> entries,
  run.sh DYLD branch + inferBackendPathDarwin case for C++ backends, the
  hw_grpc_proto protobuf/grpc link gotcha, and the path-filter touch) plus a
  verification-checklist item.
- AGENTS.md (CLAUDE.md): Quick Reference pointer so it surfaces every session.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 23:26:39 +02:00
LocalAI [bot]
286c508ce0 feat(backends): darwin build for the localvqe backend (acoustic echo cancellation) (#10512)
feat(backends): darwin build for the localvqe backend

LocalVQE (acoustic echo cancellation / noise suppression / dereverberation)
already builds on Darwin - its Makefile takes the OS=Darwin branch with
GGML_METAL=OFF (upstream is CPU + Vulkan only), producing a native arm64 CPU
image. It was just never wired into CI.

- .github/backend-matrix.yml: add localvqe to includeDarwin (build-type metal,
  lang go) - the darwin/arm64 build profile; the backend itself stays CPU.
- backend/index.yaml: metal: capability + concrete metal-localvqe(-development)
  entries pointing at the -metal-darwin-arm64-localvqe images.
- backend/go/localvqe/Makefile: note on the existing Darwin branch (also the
  per-backend change the CI path filter needs to build it here).


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 22:54:36 +02:00
LocalAI [bot]
d1a9d59917 feat(backends): darwin/Metal builds for vision C++/ggml backends (depth-anything, locate-anything, rfdetr-cpp, sam3-cpp) (#10511)
feat(backends): darwin/Metal builds for the vision C++/ggml backends

depth-anything-cpp, locate-anything-cpp, rfdetr-cpp and sam3-cpp already carry
a Darwin/Metal path in their Makefiles (GGML_METAL=ON when build-type=metal),
but were never wired into CI, so no Metal image was published and Apple Silicon
could not install them.

- .github/backend-matrix.yml: add the four to includeDarwin (build-type metal,
  lang go), matching the other go+ggml *-cpp Metal entries.
- backend/index.yaml: add metal: to each backend's capabilities map (main and
  -development) plus concrete metal-<backend>(-development) entries pointing at
  the latest/master -metal-darwin-arm64-<backend> images.
- backend/go/*/Makefile: a one-line note on the existing Darwin branch (also
  the per-backend change the CI path filter needs to actually build them here).


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 22:07:56 +02:00
LocalAI [bot]
f72046b5b5 fix(auth): make advisory locks dialect-aware and harden SQLite DSN (#10509)
* fix(auth): make advisory locks dialect-aware and harden SQLite DSN

Fixes #10506.

Two failures hit deployments that use the default SQLite auth database:

1. advisorylock executed PostgreSQL-only SQL (pg_advisory_lock /
   pg_try_advisory_lock) unconditionally. On a SQLite auth DB the job
   store, agent store and node registry migrations failed with
   "no such function: pg_advisory_lock". WithLockCtx/TryWithLockCtx now
   branch on the gorm dialect: PostgreSQL keeps the cross-process advisory
   lock, every other dialect uses a context-aware, per-key in-process lock
   (a SQLite auth DB is effectively single-process, so serializing within
   the process is sufficient).

2. The SQLite auth DSN set no busy timeout, so transient SQLITE_BUSY over
   network-backed storage (SMB/CIFS/NFS, e.g. Azure Files) failed the auth
   migration immediately with "database is locked". The DSN now sets
   _busy_timeout=5000 and _txlock=immediate (caller-supplied values are
   preserved). WAL is intentionally not enabled since its shared-memory
   mmap does not work over network filesystems. Docs note that PostgreSQL
   should be used when the data directory lives on shared storage.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(jobs): regression test for #10506 SQLite job store migration

Exercises the exact caller chain that failed in the issue:
auth.InitDB(sqlite) -> jobs.NewJobStore -> advisorylock.WithLockCtx ->
AutoMigrate. Before the dialect-aware advisory lock fix this failed with
"no such function: pg_advisory_lock"; the test now asserts it migrates
cleanly on a SQLite auth DB.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 17:18:55 +02:00
186 changed files with 8747 additions and 1329 deletions


@@ -102,6 +102,24 @@ Multi-arch backends are NOT a single matrix entry with `platforms: 'linux/amd64,
Entries whose `dockerfile` is `./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` must also set a `builder-base-image` field pointing at a prebuilt base from `quay.io/go-skynet/ci-cache:base-grpc-*` (CI builds these via `.github/workflows/base-images.yml`). The mapping is by `(build-type, platforms)`; see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~25–35 min cold). Local `make backends/<name>` ignores `builder-base-image` and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds.
### Cover every OS the project supports (Linux **and** Darwin)
`.github/backend-matrix.yml` has two matrices, and they are the source of truth for which OS a backend ships on:
- `include:` — the **Linux** matrix (x86_64 + arm64; CPU and CUDA / ROCm / SYCL / Vulkan).
- `includeDarwin:` — the **macOS / Apple Silicon** matrix (arm64; Metal where the engine supports it, otherwise a native arm64 CPU build).
**A new backend must target every OS it can build for — do not ship Linux-only by default.** A backend that appears only under `include:` is silently unavailable on macOS even when its code would run there. Most C/C++/GGML engines build on Darwin out of the box (ggml defaults `GGML_METAL=ON` on Apple, so a plain build is Metal-enabled), and many Python backends do too (CPU / MPS wheels). If a backend genuinely cannot support an OS (e.g. CUDA-only, no CPU variant), state that in the PR description instead of omitting it silently.
Wiring a backend into `includeDarwin:` is more than the matrix entry:
1. **`includeDarwin:` entry** — `tag-suffix: "-metal-darwin-arm64-<backend>"`, `build-type: "metal"`, `lang: "go"` for go+ggml backends; omit `build-type` for the bespoke C++ ones (llama-cpp / ds4 / privacy-filter). Match an existing entry of the same shape.
2. **`backend/index.yaml`** — add `metal:` to the backend's `capabilities` map (main and `-development`) and concrete `metal-<backend>` / `metal-<backend>-development` image entries pointing at the `-metal-darwin-arm64-<backend>` images.
3. **C/C++ backends only** — add an `inferBackendPathDarwin` case in `scripts/changed-backends.js` returning `backend/cpp/<backend>/` (the generic fallthrough assumes `backend/<lang>/`, which is wrong for a C++ source tree driven with `lang: go`), and give `run.sh` a Darwin branch that exports `DYLD_LIBRARY_PATH` instead of `LD_LIBRARY_PATH`. If the build is bespoke (single `grpc-server` + dylib bundling), model it on `scripts/build/ds4-darwin.sh` and add a `backends/<backend>-darwin` make target plus a gated step in `.github/workflows/backend_build_darwin.yml`.
4. **C++ proto gotcha** — if the backend compiles the generated gRPC/protobuf in a separate CMake target (e.g. `hw_grpc_proto`), that target must link `protobuf::libprotobuf` + `gRPC::grpc++` so the Homebrew include dirs propagate; otherwise macOS fails with `google/protobuf/runtime_version.h not found` (Linux hides this because apt headers sit in `/usr/include`).
The CI path filter only builds a backend on a PR when a file under its directory changes, so a darwin-only YAML edit builds nothing — touch a file under `backend/<lang>/<backend>/` (a one-line comment is enough) in the same PR.
## 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
@@ -225,6 +243,7 @@ After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/backend-matrix.yml` for all desired platforms (per-arch entries with `platform-tag` for multi-arch; `builder-base-image` for llama-cpp / ik-llama-cpp / turboquant)
- [ ] **OS coverage considered**: added to `includeDarwin:` (macOS/Apple Silicon) if the backend can build there — with the `backend/index.yaml` `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD branch and `inferBackendPathDarwin` case for C++ backends — or the PR explains why an OS is unsupported. Do not ship Linux-only by default.
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml


@@ -2,6 +2,28 @@
# Matrix data for backend container image builds.
# Consumed by scripts/changed-backends.js for both backend.yml and backend_pr.yml.
# This file is NOT a workflow — it has no top-level 'on:' or 'jobs:'.
#
# OS / platform coverage — READ THIS WHEN ADDING A BACKEND
# --------------------------------------------------------
# This file is the source of truth for which OS each backend is built and
# published for. A backend ships ONLY for the matrices it appears in:
# - Linux -> the `include:` matrix below (x86_64 + arm64; CPU and
# CUDA / ROCm / SYCL / Vulkan variants).
# - macOS -> the `includeDarwin:` matrix (Apple Silicon / arm64; Metal where
# the engine supports it, otherwise a native arm64 CPU build).
#
# New backends must target EVERY OS they can build for, not just Linux. A backend
# listed only under `include:` is silently unavailable on macOS even when its code
# would run there. Most C/C++/GGML engines build on Darwin (ggml defaults
# GGML_METAL=ON on Apple, so a plain build is Metal-enabled), and many Python
# backends do too (CPU / MPS). If a backend genuinely cannot support an OS, say so
# in its PR description rather than silently omitting it.
#
# Adding a backend to `includeDarwin:` is more than one line — see the darwin
# checklist in .agents/adding-backends.md (includeDarwin entry, the index.yaml
# `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD
# branch for C/C++ backends, and the inferBackendPathDarwin case in
# scripts/changed-backends.js so the path filter actually builds it).
# Linux matrix (consumed by backend-jobs).
include:
@@ -3723,6 +3745,302 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# voice-detect
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-voice-detect'
base-image: "ubuntu:24.04"
ubuntu-version: '2404'
runs-on: 'ubuntu-24.04-arm'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-voice-detect'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-voice-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-voice-detect'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-voice-detect'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-voice-detect'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
runs-on: 'ubuntu-latest'
skip-drivers: 'false'
backend: "voice-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# face-detect
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-face-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-face-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-face-detect'
base-image: "ubuntu:24.04"
ubuntu-version: '2404'
runs-on: 'ubuntu-24.04-arm'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-face-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-face-detect'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-face-detect'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-face-detect'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-face-detect'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-face-detect'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-face-detect'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-face-detect'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
runs-on: 'ubuntu-latest'
skip-drivers: 'false'
backend: "face-detect"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# acestep-cpp
- build-type: ''
cuda-major-version: ""
@@ -4906,6 +5224,14 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-ced"
build-type: "metal"
lang: "go"
- backend: "voice-detect"
tag-suffix: "-metal-darwin-arm64-voice-detect"
build-type: "metal"
lang: "go"
- backend: "face-detect"
tag-suffix: "-metal-darwin-arm64-face-detect"
build-type: "metal"
lang: "go"
- backend: "acestep-cpp"
tag-suffix: "-metal-darwin-arm64-acestep-cpp"
build-type: "metal"
@@ -4922,6 +5248,37 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
build-type: "metal"
lang: "go"
# Vision/utility C++/ggml backends (go+cgo). Their Makefiles already carry a
# Darwin/Metal path (GGML_METAL=ON when build-type=metal); this just builds and
# publishes the metal image so Apple Silicon can install them.
- backend: "depth-anything-cpp"
tag-suffix: "-metal-darwin-arm64-depth-anything-cpp"
build-type: "metal"
lang: "go"
- backend: "locate-anything-cpp"
tag-suffix: "-metal-darwin-arm64-locate-anything-cpp"
build-type: "metal"
lang: "go"
- backend: "rfdetr-cpp"
tag-suffix: "-metal-darwin-arm64-rfdetr-cpp"
build-type: "metal"
lang: "go"
- backend: "sam3-cpp"
tag-suffix: "-metal-darwin-arm64-sam3-cpp"
build-type: "metal"
lang: "go"
# privacy-filter (PII/NER) is a C++/ggml backend built by a bespoke darwin
# script (make backends/privacy-filter-darwin); ggml defaults Metal ON on Apple
# so the build is Metal-enabled. lang=go drives runner/toolchain selection only.
- backend: "privacy-filter"
tag-suffix: "-metal-darwin-arm64-privacy-filter"
lang: "go"
# LocalVQE has no Metal path; on Apple Silicon it builds CPU-only (GGML_METAL
# OFF) but is still a native arm64 image. Uses the darwin/metal build profile.
- backend: "localvqe"
tag-suffix: "-metal-darwin-arm64-localvqe"
build-type: "metal"
lang: "go"
- backend: "voxtral"
tag-suffix: "-metal-darwin-arm64-voxtral"
build-type: "metal"
@@ -4938,9 +5295,6 @@ includeDarwin:
- backend: "qwen-tts"
tag-suffix: "-metal-darwin-arm64-qwen-tts"
build-type: "mps"
- backend: "fish-speech"
tag-suffix: "-metal-darwin-arm64-fish-speech"
build-type: "mps"
- backend: "voxcpm"
tag-suffix: "-metal-darwin-arm64-voxcpm"
build-type: "mps"


@@ -99,6 +99,7 @@ jobs:
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
/opt/homebrew/Cellar/opus
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
@@ -113,7 +114,12 @@ jobs:
# nlohmann-json is header-only and required by the ds4 backend
# (dsml_renderer.cpp includes <nlohmann/json.hpp>); on Linux it comes
# from the apt-installed nlohmann-json3-dev in the build image.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json
# opus + pkg-config are required by the opus go backend: its
# Makefile/package.sh call `pkg-config --cflags/--libs opus` to build
# libopusshim.dylib and to locate libopus.dylib for bundling. brew's
# pkg-config defaults its search path to the Homebrew prefix so the
# opus.pc is found.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json opus pkg-config
# Force-reinstall ccache so brew re-validates its full runtime-dep
# closure on every run. This is the durable fix: when the upstream
# ccache formula gains a new transitive dep (as it has multiple times
@@ -132,7 +138,7 @@ jobs:
# and decides "already installed" without re-linking, so on a cache-
# hit run the formulas aren't on PATH. Force-link them; --overwrite
# tolerates pre-existing symlinks from earlier installs.
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json 2>/dev/null || true
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json opus pkg-config 2>/dev/null || true
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
@@ -153,6 +159,7 @@ jobs:
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
/opt/homebrew/Cellar/opus
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----
@@ -228,8 +235,17 @@ jobs:
run: |
make backends/ds4-darwin
# privacy-filter is a C++/ggml backend like ds4 - a single grpc-server with
# otool dylib bundling - so it gets its own bespoke darwin script rather than
# the generic build-darwin-go-backend path.
- name: Build privacy-filter backend (Darwin Metal)
if: inputs.backend == 'privacy-filter'
run: |
make protogen-go
make backends/privacy-filter-darwin
- name: Build ${{ inputs.backend }}-darwin
if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4'
if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
run: |
make protogen-go
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend


@@ -46,6 +46,14 @@ jobs:
variable: "CED_VERSION"
branch: "master"
file: "backend/go/ced/Makefile"
- repository: "mudler/voice-detect.cpp"
variable: "VOICEDETECT_VERSION"
branch: "master"
file: "backend/go/voice-detect/Makefile"
- repository: "mudler/face-detect.cpp"
variable: "FACEDETECT_VERSION"
branch: "master"
file: "backend/go/face-detect/Makefile"
- repository: "mudler/depth-anything.cpp"
variable: "DEPTHANYTHING_VERSION"
branch: "master"

@@ -24,6 +24,11 @@ jobs:
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
MACOS_SIGN_P12: ${{ secrets.MACOS_CERTIFICATE }}
MACOS_SIGN_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
launcher-build-darwin:
runs-on: macos-latest
steps:
@@ -35,9 +40,19 @@ jobs:
uses: actions/setup-go@v5
with:
go-version: 1.23
- name: Build launcher for macOS ARM64
run: |
make build-launcher-darwin
- name: Import signing certificate
env:
MACOS_CERTIFICATE: ${{ secrets.MACOS_CERTIFICATE }}
MACOS_CERTIFICATE_PWD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
MACOS_CI_KEYCHAIN_PWD: ${{ secrets.MACOS_CI_KEYCHAIN_PWD }}
run: bash contrib/macos/sign-and-notarize.sh import-cert
- name: Build, sign and notarize the DMG
env:
MACOS_SIGN_IDENTITY: ${{ secrets.MACOS_SIGN_IDENTITY }}
MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
run: make release-launcher-darwin
- name: Upload DMG to Release
uses: softprops/action-gh-release@v3
with:

@@ -1008,7 +1008,11 @@ jobs:
# image + working dir.
tests-vibevoice-cpp-grpc-transcription:
needs: detect-changes
if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
# Skip on release tag pushes: the ASR Q4_K model is ~10 GB and cannot be
# pulled from HF within the inner `go test -timeout 30m` budget on a CI
# runner, so every tag build hung and timed out. Still runs on PRs/branch
# pushes that touch vibevoice-cpp so regressions are caught off the release path.
if: (needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true') && !startsWith(github.ref, 'refs/tags/')
runs-on: bigger-runner
timeout-minutes: 150
steps:

@@ -121,3 +121,19 @@ jobs:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
# Fast standalone unit tests for the backends' pure C++ helpers - currently the
# llama-cpp message reconstruction (backend/cpp/llama-cpp/message_content.h),
# which guards the OpenAI chat content normalization (mudler/LocalAI#10524,
# #7324, #7528). The runner discovers every *_test.cpp under backend/cpp/, so
# new pure-C++ unit tests are picked up with no CI changes. These need only the
# C++ stdlib + nlohmann/json, so they run on every PR without the full
# llama.cpp + gRPC backend build. (The same suite is also wired as an opt-in
# CMake/ctest target, -DLLAMA_GRPC_BUILD_TESTS=ON, for in-backend-build runs.)
tests-backend-cpp:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
- name: Run backend C++ unit tests
run: make test-backend-cpp

.gitignore

@@ -94,3 +94,6 @@ core/http/react-ui/test-results/
# SDD / brainstorm scratch (agent-driven development)
.superpowers/
# Local Apple signing material (never commit)
.certs/

@@ -9,7 +9,8 @@ source:
enabled: true
name_template: '{{ .ProjectName }}-{{ .Tag }}-source'
builds:
- main: ./cmd/local-ai
- id: local-ai
main: ./cmd/local-ai
env:
- CGO_ENABLED=0
ldflags:
@@ -35,3 +36,19 @@ snapshot:
version_template: "{{ .Tag }}-next"
changelog:
use: github-native
# Sign + notarize the macOS server binary via the quill backend (runs on Linux,
# no macOS runner needed). Disabled automatically when MACOS_SIGN_P12 is unset
# (forks / PRs), so those builds stay unsigned and green.
notarize:
macos:
- enabled: '{{ isEnvSet "MACOS_SIGN_P12" }}'
ids:
- local-ai
sign:
certificate: "{{.Env.MACOS_SIGN_P12}}"
password: "{{.Env.MACOS_SIGN_PASSWORD}}"
notarize:
issuer_id: "{{.Env.MACOS_NOTARY_ISSUER_ID}}"
key_id: "{{.Env.MACOS_NOTARY_KEY_ID}}"
key: "{{.Env.MACOS_NOTARY_KEY}}"
wait: true

@@ -43,4 +43,5 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **Backend OS coverage**: a new backend must target every OS it can build for, not just Linux. `.github/backend-matrix.yml` has two matrices — `include:` (Linux) and `includeDarwin:` (macOS / Apple Silicon). Most C/C++/GGML and many Python backends build on Darwin too — wire the `includeDarwin` entry + `backend/index.yaml` `metal:` entries, or say in the PR why an OS is unsupported. See the darwin checklist in [.agents/adding-backends.md](.agents/adding-backends.md).
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter backends/privacy-filter-darwin
GOCMD=go
GOTEST=$(GOCMD) test
@@ -103,7 +103,7 @@ COVERAGE_E2E_LABELS?=!real-models
COVERAGE_EXCLUDE_RE?=grpc/proto/.*[.]pb[.]go
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-backend-cpp test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
all: help
@@ -201,6 +201,13 @@ test: prepare-test
OPUS_SHIM_LIBRARY=$(abspath ./pkg/opus/shim/libopusshim.so) \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
## Compiles and runs the standalone C++ unit tests for the backends (pure
## helpers that depend only on the stdlib + nlohmann/json, no full backend
## build). Discovers every *_test.cpp under backend/cpp/ - see
## backend/cpp/run-unit-tests.sh. Set NLOHMANN_INCLUDE to skip the header fetch.
test-backend-cpp:
bash backend/cpp/run-unit-tests.sh
## Runs the core suite ($(TEST_PATHS)) with statement-coverage instrumentation
## and writes a merged profile to $(COVERAGE_PROFILE). Deliberately omits
## --fail-fast so a single failure doesn't truncate the coverage number, and
@@ -1129,6 +1136,10 @@ backends/ds4-darwin: build
bash ./scripts/build/ds4-darwin.sh
./local-ai backends install "ocifile://$(abspath ./backend-images/ds4.tar)"
backends/privacy-filter-darwin: build
bash ./scripts/build/privacy-filter-darwin.sh
./local-ai backends install "ocifile://$(abspath ./backend-images/privacy-filter.tar)"
build-darwin-python-backend: build
bash ./scripts/build/python-darwin.sh
@@ -1449,13 +1460,32 @@ docs: docs/static/gallery.html
########################################################
## fyne cross-platform build
build-launcher-darwin: build-launcher
go run github.com/tiagomelo/macos-dmg-creator/cmd/createdmg@latest \
--appName "LocalAI" \
--appBinaryPath "$(LAUNCHER_BINARY_NAME)" \
--bundleIdentifier "com.localai.launcher" \
--iconPath "core/http/static/logo.png" \
--outputDir "dist/"
# Build LocalAI.app from the launcher via fyne (metadata read from cmd/launcher/FyneApp.toml).
# Signing happens via contrib/macos/sign-and-notarize.sh, which is a no-op when the signing
# secrets are unset, so unsigned local/fork builds keep working.
build-launcher-darwin:
rm -rf dist/LocalAI.app cmd/launcher/LocalAI.app
mkdir -p dist
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os darwin -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)
mv cmd/launcher/LocalAI.app dist/LocalAI.app
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.app
# Wrap the (signed) app into a drag-to-Applications DMG via hdiutil, then sign the DMG.
dmg-launcher-darwin: build-launcher-darwin
rm -rf dist/dmg dist/LocalAI.dmg
mkdir -p dist/dmg
cp -R dist/LocalAI.app dist/dmg/LocalAI.app
ln -s /Applications dist/dmg/Applications
hdiutil create -volname "LocalAI" -srcfolder dist/dmg -ov -format UDZO dist/LocalAI.dmg
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.dmg
# Submit the DMG to Apple notarization and staple the ticket (no-op without notary secrets).
notarize-launcher-darwin: dmg-launcher-darwin
bash contrib/macos/sign-and-notarize.sh notarize dist/LocalAI.dmg
# Single entrypoint for CI: build -> sign app -> dmg -> sign dmg -> notarize -> staple.
release-launcher-darwin: notarize-launcher-darwin
@echo "dist/LocalAI.dmg is ready"
build-launcher-linux:
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv launcher.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv LocalAI.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz

@@ -177,6 +177,7 @@ For more details, see the [Getting Started guide](https://localai.io/basics/gett
## Latest News
- **June 2026**: New native biometric backends from the LocalAI team: [voice-detect.cpp](https://github.com/mudler/voice-detect.cpp) for speaker recognition and voice analysis (ECAPA-TDNN, WeSpeaker, ERes2Net, CAM++, wav2vec2 age/gender/emotion) and [face-detect.cpp](https://github.com/mudler/face-detect.cpp) for face detection, recognition, demographics and anti-spoofing (SCRFD/ArcFace, YuNet/SFace). Both are from-scratch C++/ggml engines with no Python or onnxruntime at inference, self-contained GGUF weights, bit-exact parity with the reference, and GPU cuDNN parity, replacing the heavier Python `insightface` and `speaker-recognition` backends ([PR #10441](https://github.com/mudler/LocalAI/pull/10441)).
- **June 2026**: New [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus [streaming of the realtime LLM / TTS / transcription pipeline stages](https://github.com/mudler/LocalAI/pull/10176) and [configurable WebRTC ICE candidates](https://github.com/mudler/LocalAI/pull/10231).
- **June 2026**: Big speech push: the [parakeet.cpp](https://github.com/mudler/parakeet.cpp) ASR engine gains [NeMo-faithful segment timestamps](https://github.com/mudler/LocalAI/pull/10207), a [multilingual streaming Nemotron-3.5 model](https://github.com/mudler/LocalAI/pull/10199), [dynamic batching for concurrent transcription](https://github.com/mudler/LocalAI/pull/10112) and [CUDA graphs](https://github.com/mudler/LocalAI/pull/10273); the new [CrispASR backend](https://github.com/mudler/LocalAI/pull/10099) adds multi-architecture ASR + TTS, and [60 Piper TTS voices across 42 languages](https://github.com/mudler/LocalAI/pull/10296) land in the gallery (plus [per-request TTS instructions and params](https://github.com/mudler/LocalAI/pull/10172)).
- **June 2026**: New backends and models: [locate-anything.cpp](https://github.com/mudler/LocalAI/pull/10264) for open-vocabulary object detection via ggml, [Ideogram4 image generation](https://github.com/mudler/LocalAI/pull/10201) in stablediffusion-ggml, [llama.cpp video input](https://github.com/mudler/LocalAI/pull/10216), and the [Gemma 4 QAT family with MTP speculative-decoding pairs](https://github.com/mudler/LocalAI/pull/10215). Plus an [interactive CLI chat mode](https://github.com/mudler/LocalAI/pull/10226) and [RAG source citations in agent responses](https://github.com/mudler/LocalAI/pull/10228).

@@ -137,7 +137,7 @@ RUN <<EOT bash
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} libcudnn9-dev-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/*

@@ -1,15 +1,6 @@
## Clip/LLaVA library for multimodal support — built locally from copied sources
set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..)
target_include_directories(myclip PUBLIC ../../common)
target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if (NOT MSVC)
target_compile_options(${TARGET} PRIVATE -Wno-cast-qual)
endif()
## Multimodal support is provided by the in-tree `mtmd` library target
## (examples/mtmd/), which the grpc-server links and includes below. clip/llava
## were pruned upstream; the high-level mtmd_* / mtmd_helper_* API is used instead.
set(TARGET grpc-server)
set(CMAKE_CXX_STANDARD 17)
@@ -67,12 +58,16 @@ add_library(hw_grpc_proto
${hw_proto_hdrs} )
add_executable(${TARGET} grpc-server.cpp json.hpp)
target_link_libraries(${TARGET} PRIVATE common llama myclip ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
# mtmd public headers (mtmd.h / mtmd-helper.h) live in examples/mtmd/.
# Linking the mtmd target also propagates this include dir, but we add it
# explicitly for clarity.
target_include_directories(${TARGET} PRIVATE ../mtmd)
target_link_libraries(${TARGET} PRIVATE common llama mtmd ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
absl::flags_parse
gRPC::${_REFLECTION}
gRPC::${_GRPC_GRPCPP}
protobuf::${_PROTOBUF_LIBPROTOBUF})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
target_compile_features(${TARGET} PRIVATE cxx_std_17)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=d5507e33ae7ee2b7b41475f08044d3bde3b839ee
IK_LLAMA_VERSION?=f96eaddba8bed6a9a5e628bbf6a566775c70b49c
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

@@ -11,8 +11,8 @@
#include <memory>
#include <string>
#include <getopt.h>
#include "clip.h"
#include "llava.h"
#include "mtmd.h"
#include "mtmd-helper.h"
#include "log.h"
#include "common.h"
#include "json.hpp"
@@ -45,7 +45,9 @@ using backend::HealthMessage;
///// LLAMA.CPP server code below
using json = nlohmann::json;
// Match mtmd.h and ik_llama's server/common headers, which all use
// nlohmann::ordered_json; a plain nlohmann::json alias collides at global scope.
using json = nlohmann::ordered_json;
struct server_params
{
@@ -219,6 +221,11 @@ struct llama_client_slot
// multimodal
std::vector<slot_image> images;
// Full prompt with mtmd media markers (mtmd_default_marker()) substituted in
// place of the legacy [img-N] tags, covering the text up to and including the
// last image. The text after the last image is kept in params.input_suffix and
// decoded through the normal token path so the sampling loop is unchanged.
std::string mtmd_prompt;
// stats
size_t sent_count = 0;
@@ -252,14 +259,14 @@ struct llama_client_slot
for (slot_image & img : images)
{
free(img.image_embedding);
if (img.img_data) {
clip_image_u8_free(img.img_data);
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
}
img.prefix_prompt = "";
}
images.clear();
mtmd_prompt = "";
}
bool has_budget(gpt_params &global_params) {
@@ -396,46 +403,13 @@ struct llama_metrics {
}
};
struct llava_embd_batch {
std::vector<llama_pos> pos;
std::vector<int32_t> n_seq_id;
std::vector<llama_seq_id> seq_id_0;
std::vector<llama_seq_id *> seq_ids;
std::vector<int8_t> logits;
llama_batch batch;
llava_embd_batch(float * embd, int32_t n_tokens, llama_pos pos_0, llama_seq_id seq_id) {
pos .resize(n_tokens);
n_seq_id.resize(n_tokens);
seq_ids .resize(n_tokens + 1);
logits .resize(n_tokens);
seq_id_0.resize(1);
seq_id_0[0] = seq_id;
seq_ids [n_tokens] = nullptr;
batch = {
/*n_tokens =*/ n_tokens,
/*tokens =*/ nullptr,
/*embd =*/ embd,
/*pos =*/ pos.data(),
/*n_seq_id =*/ n_seq_id.data(),
/*seq_id =*/ seq_ids.data(),
/*logits =*/ logits.data(),
};
for (int i = 0; i < n_tokens; i++) {
batch.pos [i] = pos_0 + i;
batch.n_seq_id[i] = 1;
batch.seq_id [i] = seq_id_0.data();
batch.logits [i] = false;
}
}
};
struct llama_server_context
{
llama_model *model = nullptr;
llama_context *ctx = nullptr;
const llama_vocab * vocab = nullptr;
clip_ctx *clp_ctx = nullptr;
mtmd_context *mctx = nullptr;
gpt_params params;
@@ -491,11 +465,6 @@ struct llama_server_context
if (!params.mmproj.path.empty()) {
multimodal = true;
LOG_INFO("Multi Modal Mode Enabled", {});
clp_ctx = clip_model_load(params.mmproj.path.c_str(), /*verbosity=*/ 1);
if(clp_ctx == nullptr) {
LOG_ERR("unable to load clip model: %s", params.mmproj.path.c_str());
return false;
}
if (params.n_ctx < 2048) { // request larger context for the image embedding
params.n_ctx = 2048;
@@ -512,10 +481,24 @@ struct llama_server_context
}
if (multimodal) {
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
const int n_embd_llm = llama_model_n_embd(model);
if (n_embd_clip != n_embd_llm) {
LOG("%s: embedding dim of the multimodal projector (%d) is not equal to that of LLaMA (%d). Make sure that you use the correct mmproj file.\n", __func__, n_embd_clip, n_embd_llm);
// mtmd_init_from_file requires the already-loaded text model, so it must
// run AFTER llama_init_from_gpt_params. It validates the projector
// against the model internally and returns nullptr on dim mismatch, so
// the explicit clip_n_mmproj_embd check is no longer needed.
mtmd_context_params mparams = mtmd_context_params_default();
mparams.use_gpu = params.mmproj_use_gpu;
mparams.print_timings = false;
mparams.n_threads = params.n_threads_mtmd != -1 ? params.n_threads_mtmd
: params.n_threads_batch != -1 ? params.n_threads_batch
: params.n_threads;
mparams.verbosity = GGML_LOG_LEVEL_INFO;
mparams.flash_attn_type = params.flash_attn ? LLAMA_FLASH_ATTN_TYPE_ENABLED
: LLAMA_FLASH_ATTN_TYPE_DISABLED;
mparams.image_min_tokens = params.image_min_tokens;
mparams.image_max_tokens = params.image_max_tokens;
mctx = mtmd_init_from_file(params.mmproj.path.c_str(), model, mparams);
if (mctx == nullptr) {
LOG_ERR("unable to load multimodal projector: %s", params.mmproj.path.c_str());
llama_free(ctx);
llama_free_model(model);
return false;
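The `mparams.n_threads` assignment in the hunk above picks the first thread setting that is not the `-1` "unset" sentinel: the mtmd-specific count, then the batch count, then the general count. A minimal standalone sketch of that precedence, assuming `-1` is the only "unset" value; the name `resolve_mtmd_threads` is hypothetical and is not part of ik_llama.cpp or mtmd.

```cpp
// Hypothetical helper mirroring the ternary chain used for mparams.n_threads.
// A value of -1 means "not set", so fall back from the mtmd-specific value to
// the batch value, and finally to the general thread count.
int resolve_mtmd_threads(int n_threads_mtmd, int n_threads_batch, int n_threads) {
    if (n_threads_mtmd != -1) {
        return n_threads_mtmd;   // explicit mtmd setting wins
    }
    if (n_threads_batch != -1) {
        return n_threads_batch;  // otherwise reuse the batch thread count
    }
    return n_threads;            // last resort: the general thread count
}
```

For example, `resolve_mtmd_threads(-1, 6, 8)` yields the batch count `6`, while `resolve_mtmd_threads(-1, -1, 8)` falls through to `8`.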
@@ -865,8 +848,8 @@ struct llama_server_context
slot_image img_sl;
img_sl.id = img.count("id") != 0 ? img["id"].get<int>() : slot->images.size();
img_sl.img_data = clip_image_u8_init();
if (!clip_image_load_from_bytes(image_buffer.data(), image_buffer.size(), img_sl.img_data))
img_sl.bitmap = mtmd_helper_bitmap_init_from_buf(mctx, image_buffer.data(), image_buffer.size());
if (img_sl.bitmap == nullptr)
{
LOG_ERR("%s: failed to load image, slot_id: %d, img_sl_id: %d",
__func__,
@@ -879,50 +862,74 @@ struct llama_server_context
{"slot_id", slot->id},
{"img_sl_id", img_sl.id}
});
img_sl.request_encode_image = true;
slot->images.push_back(img_sl);
}
// process prompt
// example: system prompt [img-102] user [img-103] describe [img-134] -> [{id: 102, prefix: 'system prompt '}, {id: 103, prefix: ' user '}, {id: 134, prefix: ' describe '}]}
// Translate the legacy [img-N] tags into mtmd media markers, in
// order, and collect the matching bitmaps in marker order so they
// line up with the markers passed to mtmd_tokenize(). The text after
// the last image stays in input_suffix and is decoded through the
// normal token path, so the sampling loop is unchanged.
// example: system prompt [img-102] user [img-103] describe [img-134]
if (slot->images.size() > 0 && !slot->prompt.is_array())
{
const std::string marker = mtmd_default_marker();
std::string prompt = slot->prompt.get<std::string>();
size_t pos = 0, begin_prefix = 0;
std::string built_prompt;
std::vector<slot_image> ordered;
size_t pos = 0, copy_from = 0;
std::string pattern = "[img-";
while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
size_t end_prefix = pos;
pos += pattern.length();
size_t end_pos = prompt.find(']', pos);
if (end_pos != std::string::npos)
{
std::string image_id = prompt.substr(pos, end_pos - pos);
try
{
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
{
if (img.id == img_id) {
found = true;
img.prefix_prompt = prompt.substr(begin_prefix, end_prefix - begin_prefix);
begin_prefix = end_pos + 1;
break;
}
}
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
slot->images.clear();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
slot->images.clear();
return false;
auto free_images = [&]() {
for (slot_image &img : slot->images) {
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
}
}
slot->images.clear();
};
while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
size_t tag_begin = pos;
pos += pattern.length();
size_t end_pos = prompt.find(']', pos);
if (end_pos == std::string::npos) {
break;
}
std::string image_id = prompt.substr(pos, end_pos - pos);
try
{
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
{
if (img.id == img_id) {
found = true;
// text before this tag, then the media marker
built_prompt += prompt.substr(copy_from, tag_begin - copy_from);
built_prompt += marker;
copy_from = end_pos + 1;
ordered.push_back(img);
break;
}
}
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
free_images();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
free_images();
return false;
}
pos = end_pos + 1;
}
// bitmaps are consumed in marker order by mtmd_tokenize()
slot->images = ordered;
slot->mtmd_prompt = built_prompt;
slot->prompt = "";
slot->params.input_suffix = prompt.substr(begin_prefix);
slot->params.input_suffix = prompt.substr(copy_from);
slot->params.cache_prompt = false; // multimodal doesn't support cache prompt
}
}
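The prompt-processing hunk above rewrites each legacy `[img-N]` tag into a media marker, records the images in marker order, and leaves the text after the last tag as the suffix. The sketch below reproduces only the string handling, under these assumptions: images are identified by plain integer ids, and the marker is passed in as a parameter (the real code uses `mtmd_default_marker()`). `MarkerPrompt` and `translate_image_tags` are hypothetical names.

```cpp
#include <exception>
#include <string>
#include <vector>

// Hypothetical result of translating legacy "[img-N]" tags into media markers.
struct MarkerPrompt {
    std::string prompt;      // text up to and including the last marker
    std::string suffix;      // text after the last tag (decoded as plain tokens)
    std::vector<int> order;  // image ids in the order their markers appear
};

// Returns false if a tag has a non-numeric id or references an unknown image.
bool translate_image_tags(const std::string &input,
                          const std::vector<int> &known_ids,
                          const std::string &marker,
                          MarkerPrompt &out) {
    const std::string pattern = "[img-";
    out = MarkerPrompt{};
    size_t pos = 0, copy_from = 0;
    while ((pos = input.find(pattern, pos)) != std::string::npos) {
        const size_t tag_begin = pos;
        pos += pattern.size();
        const size_t end_pos = input.find(']', pos);
        if (end_pos == std::string::npos) {
            break;  // unterminated tag: the rest stays in the suffix
        }
        int id = 0;
        try {
            id = std::stoi(input.substr(pos, end_pos - pos));
        } catch (const std::exception &) {
            return false;  // covers both invalid_argument and out_of_range
        }
        bool found = false;
        for (int known : known_ids) {
            if (known == id) { found = true; break; }
        }
        if (!found) {
            return false;
        }
        // text before this tag, then the media marker
        out.prompt += input.substr(copy_from, tag_begin - copy_from);
        out.prompt += marker;
        out.order.push_back(id);
        copy_from = end_pos + 1;
        pos = end_pos + 1;
    }
    out.suffix = input.substr(copy_from);
    return true;
}
```

Catching `std::exception` rather than only `std::invalid_argument` also handles ids too large for `int`, for which `std::stoi` throws `std::out_of_range`.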
@@ -1176,21 +1183,10 @@ struct llama_server_context
bool process_images(llama_client_slot &slot) const
{
for (slot_image &img : slot.images)
{
if (!img.request_encode_image)
{
continue;
}
if (!llava_image_embed_make_with_clip_img(clp_ctx, params.n_threads, img.img_data, &img.image_embedding, &img.image_tokens)) {
LOG("Error processing the given image");
return false;
}
img.request_encode_image = false;
}
// With the mtmd pipeline, image encoding is no longer eager: the bitmaps
// are tokenized and encoded together with the surrounding text inside
// ingest_images() via mtmd_tokenize() + mtmd_helper_eval_chunks(). This
// just reports whether the slot carries any images to process.
return slot.images.size() > 0;
}
@@ -1435,69 +1431,70 @@ struct llama_server_context
}
}
// for multiple images processing
// Tokenize the multimodal prompt (text interleaved with media markers) together
// with the slot's bitmaps, then decode the resulting chunks into the llama
// context via the high-level mtmd helper. The helper runs llama_decode() on the
// text chunks and mtmd_encode() + llama_decode() on the image chunks, handling
// batching and any pre/post decode setup (e.g. non-causal attention for gemma3).
// Advances slot.n_past by the number of positions consumed, then leaves the
// post-image suffix tokens in `batch` so the normal decode + sampling loop
// produces the first generated token.
bool ingest_images(llama_client_slot &slot, int n_batch)
{
int image_idx = 0;
while (image_idx < (int) slot.images.size())
if (mctx == nullptr)
{
slot_image &img = slot.images[image_idx];
LOG("%s : multimodal context is not initialized\n", __func__);
return false;
}
// process prefix prompt
for (int32_t i = 0; i < (int32_t) batch.n_tokens; i += n_batch)
{
const int32_t n_tokens = std::min(n_batch, (int32_t) (batch.n_tokens - i));
llama_batch batch_view = {
n_tokens,
batch.token + i,
nullptr,
batch.pos + i,
batch.n_seq_id + i,
batch.seq_id + i,
batch.logits + i,
};
if (llama_decode(ctx, batch_view))
{
LOG("%s : failed to eval\n", __func__);
return false;
}
}
// bitmaps stay owned by slot.images (freed on reset()); pass non-owning ptrs
std::vector<const mtmd_bitmap *> bitmaps;
bitmaps.reserve(slot.images.size());
for (const slot_image &img : slot.images)
{
bitmaps.push_back(img.bitmap);
}
// process image with llm
for (int i = 0; i < img.image_tokens; i += n_batch)
{
int n_eval = img.image_tokens - i;
if (n_eval > n_batch)
{
n_eval = n_batch;
}
mtmd_input_text inp_txt;
inp_txt.text = slot.mtmd_prompt.c_str();
inp_txt.add_special = add_bos_token;
inp_txt.parse_special = true;
const int n_embd = llama_model_n_embd(model);
float * embd = img.image_embedding + i * n_embd;
llava_embd_batch llava_batch = llava_embd_batch(embd, n_eval, slot.n_past, 0);
if (llama_decode(ctx, llava_batch.batch))
{
LOG("%s : failed to eval image\n", __func__);
return false;
}
slot.n_past += n_eval;
}
image_idx++;
mtmd::input_chunks chunks(mtmd_input_chunks_init());
int32_t res = mtmd_tokenize(mctx,
chunks.ptr.get(),
&inp_txt,
bitmaps.data(),
bitmaps.size());
if (res != 0)
{
LOG("%s : failed to tokenize multimodal prompt, res = %d\n", __func__, res);
return false;
}
common_batch_clear(batch);
const llama_pos start_pos = (llama_pos) system_tokens.size() + slot.n_past;
llama_pos new_n_past = start_pos;
if (mtmd_helper_eval_chunks(mctx,
ctx,
chunks.ptr.get(),
start_pos,
slot.id,
n_batch,
/*logits_last=*/ false,
&new_n_past) != 0)
{
LOG("%s : failed to eval multimodal chunks\n", __func__);
return false;
}
slot.n_past += (int32_t) (new_n_past - start_pos);
// append prefix of next image
const auto json_prompt = (image_idx >= (int) slot.images.size()) ?
slot.params.input_suffix : // no more images, then process suffix prompt
(json)(slot.images[image_idx].prefix_prompt);
std::vector<llama_token> append_tokens = tokenize(json_prompt, false); // has next image
for (int i = 0; i < (int) append_tokens.size(); ++i)
{
common_batch_add(batch, append_tokens[i], system_tokens.size() + slot.n_past, { slot.id }, true);
slot.n_past += 1;
}
// queue the post-image suffix text for the normal decode + sampling path
common_batch_clear(batch);
std::vector<llama_token> suffix_tokens = tokenize(slot.params.input_suffix, false);
for (llama_token tok : suffix_tokens)
{
common_batch_add(batch, tok, system_tokens.size() + slot.n_past, { slot.id }, false);
slot.n_past += 1;
}
return true;
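`ingest_images()` starts the mtmd evaluation at `system_tokens.size() + slot.n_past`, adds the number of positions the helper reports through `new_n_past` back onto `slot.n_past`, and then gives each suffix token the next absolute position. A simplified sketch of that bookkeeping, with the model calls replaced by a plain `consumed` count; `SlotPositions` and `advance_after_media` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the position bookkeeping in ingest_images().
struct SlotPositions {
    int32_t n_system = 0;  // stands in for system_tokens.size()
    int32_t n_past = 0;    // stands in for slot.n_past (relative to n_system)
};

// `consumed` stands in for (new_n_past - start_pos) as reported by the mtmd
// helper. Returns the absolute positions assigned to the queued suffix tokens.
std::vector<int32_t> advance_after_media(SlotPositions &s, int32_t consumed,
                                         int32_t n_suffix_tokens) {
    const int32_t start_pos = s.n_system + s.n_past;
    const int32_t new_n_past = start_pos + consumed;  // written back by the helper
    s.n_past += new_n_past - start_pos;
    std::vector<int32_t> suffix_pos;
    for (int32_t i = 0; i < n_suffix_tokens; ++i) {
        suffix_pos.push_back(s.n_system + s.n_past);  // next absolute position
        s.n_past += 1;
    }
    return suffix_pos;
}
```

With a 10-token system prompt and 300 consumed positions, the three suffix tokens land at positions 310, 311 and 312, leaving `n_past` at 303.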
@@ -1884,8 +1881,11 @@ struct llama_server_context
const bool has_images = process_images(slot);
// process the prefix of first image
std::vector<llama_token> prefix_tokens = has_images ? tokenize(slot.images[0].prefix_prompt, add_bos_token) : prompt_tokens;
// For the multimodal path the whole pre-image / inter-image text is
// tokenized and decoded inside ingest_images() via mtmd, so no prefix
// tokens are queued here; the post-image suffix is appended by
// ingest_images() for the normal decode + sampling loop.
std::vector<llama_token> prefix_tokens = has_images ? std::vector<llama_token>() : prompt_tokens;
int32_t slot_npast = slot.n_past_se > 0 ? slot.n_past_se : slot.n_past;

@@ -1,11 +0,0 @@
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2494,7 +2494,7 @@
}
new_data = work.data();
- new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr);
+ new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr, nullptr);
} else {
new_type = cur->type;
new_data = cur->data;

@@ -17,28 +17,9 @@ cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
cp -r utils.hpp llama.cpp/examples/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/examples/grpc-server/
## Copy clip/llava files for multimodal support (built as myclip library)
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
# Prepend llama.h include to llava.h
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
# Copy clip-impl.h if it exists
if [ -f llama.cpp/examples/llava/clip-impl.h ]; then
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h
fi
# Copy stb_image.h
if [ -f llama.cpp/vendor/stb/stb_image.h ]; then
cp -rfv llama.cpp/vendor/stb/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
elif [ -f llama.cpp/common/stb_image.h ]; then
cp -rfv llama.cpp/common/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
fi
## Fix API compatibility in llava.cpp (llama_n_embd -> llama_model_n_embd)
if [ -f llama.cpp/examples/grpc-server/llava.cpp ]; then
sed -i 's/llama_n_embd(/llama_model_n_embd(/g' llama.cpp/examples/grpc-server/llava.cpp
fi
## Multimodal support is provided by the `mtmd` library target (examples/mtmd/),
## which the grpc-server links and includes directly. No source copy is needed:
## clip/llava were pruned upstream and the high-level mtmd_* API is used instead.
set +e
if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -13,28 +13,28 @@ grep -e "flags" /proc/cpuinfo | head -1
# ik_llama.cpp requires AVX2 — default to avx2 binary
BINARY=ik-llama-cpp-avx2
if [ -e $CURDIR/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
if [ -e "$CURDIR"/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 NOT found, using fallback"
BINARY=ik-llama-cpp-fallback
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"
exec "$CURDIR"/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec $CURDIR/ik-llama-cpp-fallback "$@"
exec "$CURDIR"/ik-llama-cpp-fallback "$@"
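The launcher defaults to the AVX2 build. It switches to the fallback only when the fallback binary is installed and `/proc/cpuinfo` lacks the `avx2` flag. Quoting `"$CURDIR"` keeps paths that contain spaces intact. The selection rule itself can be sketched in C++ as follows. The function names are hypothetical and not part of the backend. Splitting on whitespace only approximates the script's `\savx2\s` grep pattern.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True when `flags` (text of a /proc/cpuinfo "flags" line) contains `feature`
// as a whole whitespace-separated token, so a longer flag name that merely
// starts with "avx2" does not count.
bool has_cpu_flag(const std::string& flags, const std::string& feature) {
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        if (token == feature) return true;
    }
    return false;
}

// Mirrors the wrapper's rule: default to the AVX2 build and switch to the
// fallback only when it is installed and the CPU lacks AVX2.
std::string select_binary(const std::string& flags, bool fallback_installed) {
    if (fallback_installed && !has_cpu_flag(flags, "avx2")) {
        return "ik-llama-cpp-fallback";
    }
    return "ik-llama-cpp-avx2";
}
```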

@@ -11,9 +11,12 @@
#include "json.hpp"
#include "clip.h"
#include "mtmd.h"
using json = nlohmann::json;
// mtmd.h and ik_llama's entire server/common stack (chat.h, server-common.h,
// server-task.h, ...) declare `using json = nlohmann::ordered_json`, so match it
// here: a plain `nlohmann::json` alias collides with mtmd.h's at global scope.
using json = nlohmann::ordered_json;
extern bool server_verbose;
@@ -111,13 +114,12 @@ struct slot_image
{
int32_t id;
bool request_encode_image = false;
float * image_embedding = nullptr;
int32_t image_tokens = 0;
clip_image_u8 * img_data;
std::string prefix_prompt; // before of this image
// mtmd bitmap (image/audio) decoded from the request buffer. Owned by the
// slot; freed via mtmd_bitmap_free() on reset. The high-level mtmd pipeline
// (mtmd_tokenize + mtmd_helper_eval_chunks) consumes these directly, so the
// legacy eager-encode fields (embedding/tokens) and per-image prefix prompt
// are no longer needed.
mtmd_bitmap * bitmap = nullptr;
};
// completion token output with probabilities
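The new `slot_image` keeps a raw `mtmd_bitmap *` that the slot owns and frees with `mtmd_bitmap_free()` on reset. One alternative way to enforce that ownership is `std::unique_ptr` with a custom deleter. The sketch below demonstrates the pattern with hypothetical stand-in types (`fake_bitmap`, `fake_bitmap_free`, `slot_media`) instead of the real mtmd API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for an opaque C handle and its free function
// (playing the roles of mtmd_bitmap / mtmd_bitmap_free in this sketch).
struct fake_bitmap { int id; };
static int g_freed = 0;  // counts frees so the sketch can be checked
void fake_bitmap_free(fake_bitmap* b) { ++g_freed; delete b; }

// Deleter type so std::unique_ptr calls the C-style free function.
// unique_ptr never invokes the deleter for a null pointer.
struct bitmap_deleter {
    void operator()(fake_bitmap* b) const { fake_bitmap_free(b); }
};
using bitmap_ptr = std::unique_ptr<fake_bitmap, bitmap_deleter>;

// Hypothetical slot entry: owns its bitmap; reset() releases it early and
// the destructor releases it otherwise, so it is freed exactly once.
struct slot_media {
    bitmap_ptr bitmap;
    void reset() { bitmap.reset(); }
};
```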

@@ -87,3 +87,18 @@ target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()
# Unit test for the message-content normalization helper (message_content.h).
# Off by default so the normal backend build is untouched; enable with
# -DLLAMA_GRPC_BUILD_TESTS=ON and run via ctest. It reuses llama.cpp's vendored
# <nlohmann/json.hpp> (propagated by the common helpers library) so it has no
# extra dependency beyond what the backend already builds against.
option(LLAMA_GRPC_BUILD_TESTS "Build grpc-server unit tests" OFF)
if(LLAMA_GRPC_BUILD_TESTS)
enable_testing()
add_executable(message_content_test message_content_test.cpp message_content.h)
target_include_directories(message_content_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(message_content_test PRIVATE ${_LLAMA_COMMON_TARGET})
target_compile_features(message_content_test PRIVATE cxx_std_17)
add_test(NAME message_content_test COMMAND message_content_test)
endif()

@@ -1,5 +1,5 @@
LLAMA_VERSION?=8be759e6f70d629638a7eb70db3824cbdcea370b
LLAMA_VERSION?=0ed235ea2c17a19fc8238668653946721ed136fd
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=
@@ -156,11 +156,11 @@ llama-cpp-grpc: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build purge
$(info ${GREEN}I llama-cpp build info:grpc${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" TARGET="--target grpc-server --target ggml-rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/grpc-server llama-cpp-grpc
llama-cpp-rpc-server: llama-cpp-grpc
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/llama.cpp/build/bin/rpc-server llama-cpp-rpc-server
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/llama.cpp/build/bin/ggml-rpc-server llama-cpp-rpc-server
llama.cpp:
mkdir -p llama.cpp

@@ -30,6 +30,19 @@
#define LOCALAI_HAS_SERVER_SCHEMA 1
#include "server-schema.cpp"
#endif
// server-stream.cpp exists only in llama.cpp after the upstream refactor that
// added the SSE stream-resumption layer (stream_session/stream_pipe_producer).
// server-context.cpp calls into it (spipe->cleanup(), stream_aware_should_stop,
// stream_session_attach_pipe), so its definitions must be part of this
// translation unit or the link fails with "undefined reference to
// stream_pipe_producer::cleanup()". The file is self-contained (its only
// external symbols come from server-common, already pulled in above) and the
// http route-handler factories it also defines are unused here but harmless.
// __has_include keeps the source compatible with older pins/forks that predate
// the split.
#if __has_include("server-stream.cpp")
#include "server-stream.cpp"
#endif
#include "server-context.cpp"
// LocalAI
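Guarding the `server-stream.cpp` include with `__has_include` lets one source file build against pins both before and after the upstream split. Here is a minimal sketch of the same pattern, applied to a standard header and to a deliberately missing one. The `SKETCH_*` macro names and the missing header's name are hypothetical.

```cpp
#include <cassert>

// Pull in a header only when the toolchain can find it, and record the result
// in a macro that later code can test.
#if defined(__has_include)
#  if __has_include(<optional>)
#    include <optional>
#    define SKETCH_HAS_OPTIONAL 1
#  endif
#  if __has_include("definitely-missing-sketch-header.h")
#    define SKETCH_HAS_MISSING 1
#  endif
#endif
#ifndef SKETCH_HAS_OPTIONAL
#  define SKETCH_HAS_OPTIONAL 0
#endif
#ifndef SKETCH_HAS_MISSING
#  define SKETCH_HAS_MISSING 0
#endif

bool optional_header_available() { return SKETCH_HAS_OPTIONAL == 1; }
bool missing_header_available() { return SKETCH_HAS_MISSING == 1; }
```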
@@ -39,6 +52,7 @@
#include "common.h"
#include "arg.h"
#include "chat-auto-parser.h"
#include "message_content.h"
#include <getopt.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
#include <grpcpp/grpcpp.h>
@@ -1616,242 +1630,20 @@ public:
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
json msg_json;
msg_json["role"] = msg.role();
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
}
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
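For the last user message, the removed inline code placed the text first, then each image as a `data:image/jpeg;base64,` URL, then each audio clip tagged `"wav"`, and finally each video. Below is a simplified, parser-free sketch of that ordering. `ContentPart` and `build_parts` are hypothetical. The sketch ignores the case where the content was already a JSON array. The body of `build_reconstructed_message` is not shown here, so this sketch describes the code it replaces, not the helper itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one OpenAI-style content part.
struct ContentPart {
    std::string type;     // "text", "image_url", "input_audio", "input_video"
    std::string payload;  // text, data URI, or base64 media
    std::string format;   // only set for audio ("wav")
};

// Text first (if any), then images as JPEG data URIs, then audio, then video.
std::vector<ContentPart> build_parts(const std::string& text,
                                     const std::vector<std::string>& images,
                                     const std::vector<std::string>& audios,
                                     const std::vector<std::string>& videos) {
    std::vector<ContentPart> parts;
    if (!text.empty()) parts.push_back({"text", text, ""});
    for (const auto& img : images)
        parts.push_back({"image_url", "data:image/jpeg;base64," + img, ""});
    for (const auto& a : audios) parts.push_back({"input_audio", a, "wav"});
    for (const auto& v : videos) parts.push_back({"input_video", v, ""});
    return parts;
}
```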
@@ -2072,36 +1864,7 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
llama_grpc::normalize_template_message(body_json["messages"][idx]);
}
}
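The pre-template loop replaced here made sure every message reached the chat template with string or array content. Missing, null, and non-object scalar content became `""`. Objects were serialized. Tool messages with array content were serialized to a string, and all other strings and arrays passed through unchanged. Below is a minimal, parser-free model of those rules. `Kind`, `Content`, and `normalize_for_template` are hypothetical. The sketch mirrors the removed inline checks, not the unseen body of `normalize_template_message`.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a message's "content" field: `kind` is its JSON type,
// `text` holds the string value or, for arrays/objects, their serialized JSON.
enum class Kind { Missing, Null, String, Array, Object, Other };
struct Content { Kind kind; std::string text; };

Content normalize_for_template(const std::string& role, Content c) {
    switch (c.kind) {
        case Kind::Missing:
        case Kind::Null:
        case Kind::Other:
            return {Kind::String, ""};              // templates need a string
        case Kind::Object:
            return {Kind::String, c.text};          // serialize objects
        case Kind::Array:
            if (role == "tool") return {Kind::String, c.text};  // tool content must be a string
            return c;                               // content parts pass through
        case Kind::String:
            return c;
    }
    return c;
}
```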
@@ -2433,264 +2196,20 @@ public:
SRV_INF("[CONTENT DEBUG] Predict: Processing %d messages\n", request->messages_size());
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
json msg_json;
msg_json["role"] = msg.role();
SRV_INF("[CONTENT DEBUG] Predict: Message %d: role=%s, content_empty=%d, content_length=%zu\n",
i, msg.role().c_str(), msg.content().empty() ? 1 : 0, msg.content().size());
if (!msg.content().empty()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content (first 200 chars): %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
}
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d parsed JSON is null, converting to empty string\n", i);
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content is object, converting to string\n", i);
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content_val was null, setting to empty string\n", i);
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
SRV_INF("[CONTENT DEBUG] Predict: Message %d content set, type=%s\n",
i, content_val.is_string() ? "string" :
content_val.is_array() ? "array" :
content_val.is_object() ? "object" : "other");
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
SRV_INF("[CONTENT DEBUG] Predict: Message %d created content array with media\n", i);
} else if (!msg.tool_calls().empty()) {
// Tool call messages may have null content, but templates expect string
// IMPORTANT: Set to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls, setting content to space (not empty string)\n", i);
msg_json["content"] = " ";
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] Predict: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
}
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
@@ -2911,36 +2430,7 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] Predict: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
llama_grpc::normalize_template_message(body_json["messages"][idx]);
}
}
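
The sanitizer loop above, now delegated to `llama_grpc::normalize_template_message`, applies a small per-message decision table. The sketch below restates that table as a standalone C++ function so the branch order is easy to check. `Kind`, `Action` and `template_action` are hypothetical names made up for this illustration; the backend itself operates on `nlohmann::ordered_json` values directly.

```cpp
#include <cassert>
#include <string>

// Hypothetical labels for the JSON kinds the sanitizer distinguishes.
enum class Kind { Missing, Null, String, Array, Object, Number, Bool };

// Hypothetical labels for what happens to "content" before templating.
enum class Action { Keep, SetEmpty, Stringify };

// Restates the branch order of llama_grpc::normalize_template_message.
Action template_action(const std::string& role, Kind content) {
    switch (content) {
        case Kind::Missing:               // templates expect the field to exist
        case Kind::Null:                  // #7324: null breaks content[:N]
            return Action::SetEmpty;
        case Kind::Array:                 // #7528: tool content must be a string;
            return role == "tool"         // other arrays are multimodal typed parts
                       ? Action::Stringify
                       : Action::Keep;
        case Kind::Object:                // tool-call bookkeeping object -> text
            return Action::Stringify;
        case Kind::Number:
        case Kind::Bool:                  // other scalars -> ""
            return Action::SetEmpty;
        case Kind::String:
            return Action::Keep;
    }
    return Action::Keep;                  // unreachable; keeps -Wreturn-type quiet
}
```

Only the tool role is affected by the array branch, which is why a multimodal user message survives this pass intact.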

@@ -0,0 +1,192 @@
#pragma once
#include <string>
#include <vector>
#include <nlohmann/json.hpp>
namespace llama_grpc {
// Normalizes a proto message's content string into the JSON value used when
// reconstructing OpenAI-format messages for the tokenizer (jinja) template.
//
// Shared by the streaming (PredictStream) and non-streaming (Predict) message
// reconstruction paths so the two cannot drift.
//
// LocalAI's Go layer (schema.Messages.ToProto) always sends content as a plain
// text string; multimodal media travels in separate proto fields, never inside
// content. So user/system/developer content is *only ever* opaque text and must
// NOT be JSON-sniffed: a prompt that merely looks like JSON (e.g. an ingredient
// list ["1/4 cup sugar", ...]) would otherwise be reinterpreted as structured
// content parts and rejected by oaicompat_chat_params_parse with
// "unsupported content[].type" (https://github.com/mudler/LocalAI/issues/10524).
// ("developer" is OpenAI's newer alias for "system" and likewise carries
// human-authored text.)
//
// For every other role (assistant, tool, function, ...) we still collapse a
// literal JSON null to "" and a JSON object (tool-call bookkeeping) to its
// dumped string, but we never turn a plain string into an array/scalar. The
// array defense is therefore role-independent (arrays/scalars fall through for
// every role); the role gate only governs the null/object case.
inline nlohmann::ordered_json normalize_message_content(const std::string& role,
const std::string& content) {
nlohmann::ordered_json content_val = content;
if (role != "user" && role != "system" && role != "developer") {
try {
nlohmann::ordered_json parsed = nlohmann::ordered_json::parse(content);
if (parsed.is_null()) {
content_val = "";
} else if (parsed.is_object()) {
content_val = parsed.dump();
}
// arrays / scalars: keep the original plain-text string as-is
} catch (const nlohmann::ordered_json::parse_error&) {
// Not JSON, already the plain string
}
}
return content_val;
}
// Final safety pass applied to each reconstructed OpenAI message right before it
// is handed to oaicompat_chat_params_parse (jinja templating). Jinja templates
// assume content is a string: a literal null breaks slicing such as
// message.content[:N] (#7324), and a tool message with array content is rejected
// (#7528). A multimodal user message legitimately carries a typed-part array
// ({type:text}, {type:image_url}, ...), which must be left intact. Shared by the
// streaming and non-streaming paths so this invariant cannot drift between them.
inline void normalize_template_message(nlohmann::ordered_json& msg) {
if (!msg.contains("content")) {
msg["content"] = ""; // templates expect the field to exist
return;
}
nlohmann::ordered_json& content = msg["content"];
const std::string role = (msg.contains("role") && msg["role"].is_string())
? msg["role"].get<std::string>()
: std::string();
if (content.is_null()) {
content = ""; // #7324: null would crash content[:N] slicing
} else if (role == "tool" && content.is_array()) {
content = content.dump(); // #7528: tool messages must have string content
} else if (!content.is_string() && !content.is_array()) {
if (content.is_object()) {
content = content.dump(); // tool-call bookkeeping object -> string
} else {
content = ""; // other scalar (number/bool) -> empty
}
}
// string, or a non-tool (multimodal) typed-part array: leave untouched
}
// One proto message's data, flattened to plain types so the reconstruction logic
// can be shared and unit-tested without protobuf. The streaming and non-streaming
// predict paths both populate this from proto::Message + the request's media.
struct ReconstructedMessageInput {
std::string role;
std::string content; // proto.Message.content (always a plain string)
std::string name;
std::string tool_call_id;
std::string reasoning_content;
std::string tool_calls; // tool_calls as a JSON string, or empty
bool is_last_user_msg = false; // attach request media to this message
std::vector<std::string> images; // base64 (jpeg)
std::vector<std::string> audios; // base64 (wav)
std::vector<std::string> videos; // base64
};
// Appends the request's media as OpenAI typed content parts. Imperative (not
// brace-init) to avoid nlohmann's object-vs-array initializer-list ambiguity.
inline void append_media_parts(nlohmann::ordered_json& content_array,
const std::vector<std::string>& images,
const std::vector<std::string>& audios,
const std::vector<std::string>& videos) {
for (const auto& img : images) {
nlohmann::ordered_json image_chunk;
image_chunk["type"] = "image_url";
nlohmann::ordered_json image_url;
image_url["url"] = "data:image/jpeg;base64," + img;
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
for (const auto& aud : audios) {
nlohmann::ordered_json audio_chunk;
audio_chunk["type"] = "input_audio";
nlohmann::ordered_json input_audio;
input_audio["data"] = aud;
input_audio["format"] = "wav"; // default; could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
for (const auto& vid : videos) {
nlohmann::ordered_json video_chunk;
video_chunk["type"] = "input_video";
nlohmann::ordered_json input_video;
input_video["data"] = vid;
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
// Reconstructs a single OpenAI-format message (the object fed to
// oaicompat_chat_params_parse) from a proto message. Shared by PredictStream and
// Predict so the content/multimodal/tool_calls handling cannot drift between the
// two stream modes (it previously lived as two ~150-line copies with a redundant
// Predict-only tool_calls->" " branch). Guarantees content is always a string or
// a typed-part array, never null/missing.
inline nlohmann::ordered_json build_reconstructed_message(const ReconstructedMessageInput& in) {
nlohmann::ordered_json msg_json;
msg_json["role"] = in.role;
const bool has_media = !in.images.empty() || !in.audios.empty() || !in.videos.empty();
if (!in.content.empty()) {
nlohmann::ordered_json content_val = normalize_message_content(in.role, in.content);
if (content_val.is_string() && in.is_last_user_msg && has_media) {
// Last user message + media: build a typed-part array (text first).
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
nlohmann::ordered_json text_part;
text_part["type"] = "text";
text_part["text"] = content_val.get<std::string>();
content_array.push_back(text_part);
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
} else if (in.is_last_user_msg && has_media) {
// No text but media on the last user message: media-only typed array.
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else {
// Empty content (any role, incl. tool/assistant): templates need a string.
msg_json["content"] = "";
}
if (!in.name.empty()) {
msg_json["name"] = in.name;
}
if (!in.tool_call_id.empty()) {
msg_json["tool_call_id"] = in.tool_call_id;
}
if (!in.reasoning_content.empty()) {
msg_json["reasoning_content"] = in.reasoning_content;
}
if (!in.tool_calls.empty()) {
try {
nlohmann::ordered_json tool_calls = nlohmann::ordered_json::parse(in.tool_calls);
msg_json["tool_calls"] = tool_calls;
// tool_calls + empty content: use " " rather than "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat turns "" into null, which breaks
// templates that slice message.content[:tool_start_length] (#7324).
if (!msg_json.contains("content") ||
(msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
msg_json["content"] = " ";
}
} catch (const nlohmann::ordered_json::parse_error&) {
// Malformed tool_calls JSON: leave content as-is (prior behavior).
}
}
return msg_json;
}
} // namespace llama_grpc
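
A minimal model of the role gate in `normalize_message_content`, assuming the caller has already classified what the content string would parse as. `Parsed`, `Result`, `is_human_text_role` and `normalize_outcome` are hypothetical names; the real helper calls `nlohmann::ordered_json::parse` and treats a caught `parse_error` as "not JSON". The point it illustrates: human-authored roles are never sniffed, and for other roles only a literal null or an object changes the value.

```cpp
#include <cassert>
#include <string>

// Hypothetical: what the content string would parse as (NotJson = parse_error).
enum class Parsed { NotJson, Null, Object, Array, Scalar };

// Hypothetical: what normalize_message_content does with the string.
enum class Result { Verbatim, Empty, Redumped };

// user/system/developer content is opaque human text and is never parsed.
bool is_human_text_role(const std::string& role) {
    return role == "user" || role == "system" || role == "developer";
}

Result normalize_outcome(const std::string& role, Parsed parsed) {
    if (is_human_text_role(role)) {
        return Result::Verbatim;          // #10524: no JSON sniffing at all
    }
    switch (parsed) {
        case Parsed::Null:
            return Result::Empty;         // #7324: literal null -> ""
        case Parsed::Object:
            return Result::Redumped;      // object -> dumped string (still a string)
        default:
            return Result::Verbatim;      // arrays, scalars, non-JSON: original text
    }
}
```

Every outcome is still a string, which is what lets the later reconstruction step treat content uniformly.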

@@ -0,0 +1,234 @@
// Unit tests for the shared message-reconstruction helpers (message_content.h).
//
// Build & run standalone, where <inc>/nlohmann/json.hpp is the single header:
//   g++ -std=c++17 -I<inc> message_content_test.cpp -o t && ./t
// or via CMake: -DLLAMA_GRPC_BUILD_TESTS=ON, then ctest.
//
// Regression coverage for:
// #10524 - a user/system prompt that is itself a JSON-array string must stay
// plain text, never be reinterpreted as OpenAI structured parts.
// #7324 - assistant/tool null content -> "" (templates slice content[:N]);
// assistant+tool_calls+empty content -> " " (not "", which becomes null).
// #7528 - tool message array content must reach the template as a string.
// multimodal - last user message text + media -> typed-part array, media kept.
#include <cassert>
#include <iostream>
#include <string>
#include "message_content.h"
using nlohmann::ordered_json;
using llama_grpc::normalize_message_content;
using llama_grpc::normalize_template_message;
using llama_grpc::build_reconstructed_message;
using llama_grpc::ReconstructedMessageInput;
static int failures = 0;
static void check(bool ok, const std::string& name, const std::string& detail = "") {
if (!ok) {
std::cerr << "FAIL " << name << (detail.empty() ? "" : ": " + detail) << "\n";
failures++;
}
}
// ---- normalize_message_content -------------------------------------------
static void expect_norm_string(const char* name, const std::string& role,
const std::string& content, const std::string& want) {
auto got = normalize_message_content(role, content);
if (!got.is_string()) {
check(false, name, "expected a JSON string, got " +
std::string(got.is_array() ? "array" : got.is_object() ? "object" : "other") +
" (" + got.dump() + ")");
return;
}
check(got.get<std::string>() == want, name, "expected \"" + want + "\", got \"" + got.get<std::string>() + "\"");
}
static void test_normalize() {
const std::string ingredients = R"(["1/4 cup brown sugar, packed","1 pound ground beef"])";
// #10524 - JSON-array text must stay a string. Role-INDEPENDENT array defense.
for (const char* role : {"user", "system", "developer", "function", "assistant", "tool"}) {
expect_norm_string((std::string("json_array_stays_text:") + role).c_str(), role, ingredients, ingredients);
}
// #10524 - user/system/developer JSON-object text stays verbatim (NOT re-dumped).
expect_norm_string("user_json_object_verbatim", "user", R"({"a":1})", R"({"a":1})");
expect_norm_string("system_json_object_verbatim", "system", R"({"a":1})", R"({"a":1})");
expect_norm_string("developer_json_object_verbatim", "developer", R"({"a":1})", R"({"a":1})");
// Plain text unchanged for all roles.
expect_norm_string("user_plain_text", "user", "hello world", "hello world");
expect_norm_string("assistant_non_json_text_kept", "assistant", "hi [unclosed", "hi [unclosed");
// #7324 boundary - user/system/developer literal "null" preserved (never parsed).
expect_norm_string("user_literal_null_stays", "user", "null", "null");
expect_norm_string("system_literal_null_stays", "system", "null", "null");
expect_norm_string("developer_literal_null_stays", "developer", "null", "null");
// #7324 - assistant/tool literal null collapses to empty string.
expect_norm_string("assistant_null_to_empty", "assistant", "null", "");
expect_norm_string("tool_null_to_empty", "tool", "null", "");
// #7324/#7528 - assistant/tool object bookkeeping stringified (stays a string).
check(normalize_message_content("assistant", R"({"tool":"x"})").is_string(), "assistant_object_stringified");
check(normalize_message_content("tool", R"({"error":"boom"})").is_string(), "tool_object_stringified");
// #10524-family - a bare scalar that parses as a JSON number stays the string.
expect_norm_string("assistant_scalar_number_stays_string", "assistant", "42", "42");
// baseline - empty content stays empty.
expect_norm_string("user_empty_stays_empty", "user", "", "");
}
// ---- normalize_template_message (BEFORE TEMPLATE sanitizer) ---------------
static void test_template_sanitizer() {
// #7528 - a tool message with an ACTUAL array becomes a string.
{
ordered_json msg = {{"role", "tool"}, {"content", ordered_json::array({{{"type", "text"}, {"text", "r"}}})}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_tool_array_to_string", "got " + msg["content"].dump());
}
// #7324 - null content -> "" for any role.
{
ordered_json msg = {{"role", "assistant"}, {"content", nullptr}};
normalize_template_message(msg);
check(msg["content"].is_string() && msg["content"] == "", "before_template_null_to_empty");
}
// object content -> dumped string (would otherwise throw at the template).
{
ordered_json msg = {{"role", "assistant"}, {"content", {{"x", 1}}}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_object_to_string", "got " + msg["content"].dump());
}
// missing content field -> "".
{
ordered_json msg = {{"role", "user"}};
normalize_template_message(msg);
check(msg.contains("content") && msg["content"] == "", "before_template_missing_to_empty");
}
// multimodal: a well-typed user array must be left UNTOUCHED (role!=tool).
{
ordered_json parts = ordered_json::array();
parts.push_back({{"type", "text"}, {"text", "x"}});
ordered_json img; img["type"] = "image_url"; img["image_url"] = {{"url", "data:..."}};
parts.push_back(img);
ordered_json msg = {{"role", "user"}, {"content", parts}};
normalize_template_message(msg);
check(msg["content"].is_array() && msg["content"].size() == 2, "before_template_user_typed_array_preserved",
"got " + msg["content"].dump());
}
// a plain string is left untouched.
{
ordered_json msg = {{"role", "user"}, {"content", "hello"}};
normalize_template_message(msg);
check(msg["content"] == "hello", "before_template_string_untouched");
}
}
// ---- build_reconstructed_message ----------------------------------------
static void test_reconstruction() {
const std::string ingredients = R"(["1/4 cup brown sugar","1 pound ground beef"])";
// #10524 end-state - user JSON-array text, no media -> string content.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients;
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == ingredients, "recon_user_json_array_string",
"got " + m["content"].dump());
}
// multimodal - user text + one image on last user msg -> typed array, image kept.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 2, "recon_multimodal_text_plus_image",
"got " + m["content"].dump());
check(m["content"][0]["type"] == "text" && m["content"][0]["text"] == ingredients, "recon_multimodal_text_first");
check(m["content"][1]["type"] == "image_url", "recon_multimodal_image_kept");
}
// multimodal media-only - empty text + image on last user msg.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ""; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 1 && m["content"][0]["type"] == "image_url",
"recon_media_only", "got " + m["content"].dump());
}
// #7528 - tool array-string content stays a string.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = R"(["a","b"])"; in.tool_call_id = "call_1";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == R"(["a","b"])", "recon_tool_array_string",
"got " + m["content"].dump());
check(m["tool_call_id"] == "call_1", "recon_tool_call_id_set");
}
// tool empty content -> "".
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == "", "recon_tool_empty_to_string");
}
// #7324 - assistant + tool_calls + empty content -> " " (single space, not "").
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == " ", "recon_toolcalls_empty_content_space",
"got " + m["content"].dump());
check(m["tool_calls"].is_array() && m["tool_calls"].size() == 1, "recon_toolcalls_parsed");
}
// assistant + tool_calls + real content keeps the content.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "I'll call f";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"] == "I'll call f", "recon_toolcalls_with_content_kept");
}
// assistant null content -> "".
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "null";
auto m = build_reconstructed_message(in);
check(m["content"] == "", "recon_assistant_null_to_empty");
}
// malformed tool_calls JSON must not throw; content preserved.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "hi"; in.tool_calls = "{not json";
auto m = build_reconstructed_message(in);
check(m["content"] == "hi" && !m.contains("tool_calls"), "recon_malformed_toolcalls_safe");
}
// optional fields: name + reasoning carried through.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "result"; in.name = "get_weather"; in.reasoning_content = "thinking";
auto m = build_reconstructed_message(in);
check(m["name"] == "get_weather" && m["reasoning_content"] == "thinking", "recon_optional_fields");
}
}
int main() {
test_normalize();
test_template_sanitizer();
test_reconstruction();
if (failures == 0) {
std::cout << "OK: all message_content tests passed\n";
return 0;
}
std::cerr << failures << " test(s) failed\n";
return 1;
}
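
The content-shape rules that the reconstruction tests above pin down can be summarized as one function. This is a hedged sketch, not code from the backend: `Shape` and `content_shape` are hypothetical, and the function assumes the caller passes the already-normalized text (which `build_reconstructed_message` always obtains as a string).

```cpp
#include <cassert>
#include <string>

// Hypothetical labels for the content shapes build_reconstructed_message emits.
enum class Shape { Text, TextThenMedia, MediaOnly, EmptyString, SingleSpace };

// raw_empty:      proto content was ""
// normalized:     content after normalize_message_content (always a string)
// tool_calls_ok:  tool_calls was non-empty AND parsed as JSON
Shape content_shape(bool raw_empty, const std::string& normalized,
                    bool is_last_user_msg, bool has_media, bool tool_calls_ok) {
    const bool attach_media = is_last_user_msg && has_media;
    if (attach_media) {
        // Typed-part array: text part first when there is text, else media only.
        // Arrays are never rewritten to " " by the tool_calls rule.
        return raw_empty ? Shape::MediaOnly : Shape::TextThenMedia;
    }
    Shape shape = (raw_empty || normalized.empty()) ? Shape::EmptyString : Shape::Text;
    // #7324: with valid tool_calls, "" becomes " " because llama.cpp maps "" to null.
    if (tool_calls_ok && shape == Shape::EmptyString) {
        shape = Shape::SingleSpace;
    }
    return shape;
}
```

Media is attached only to the message flagged as the last user message, so a tool or assistant message with the request's media still gets plain string content.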

@@ -18,6 +18,10 @@ done
cp -r CMakeLists.txt llama.cpp/tools/grpc-server/
cp -r grpc-server.cpp llama.cpp/tools/grpc-server/
# Shared message-reconstruction helpers (included by grpc-server.cpp) and their
# unit test (compiled only when -DLLAMA_GRPC_BUILD_TESTS=ON).
cp -r message_content.h llama.cpp/tools/grpc-server/
cp -r message_content_test.cpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/cpp-httplib/httplib.h llama.cpp/tools/grpc-server/

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -16,37 +16,37 @@ BINARY=llama-cpp-fallback
# CPU_ALL_VARIANTS: ggml's backend registry dlopens the best libggml-cpu-*.so for this
# host, so no shell-side AVX probing. GPU images (cublas/sycl/vulkan/hipblas) ship only
# llama-cpp-fallback (the accelerator does the compute), so fall back to it when absent.
if [ -e $CURDIR/llama-cpp-cpu-all ]; then
if [ -e "$CURDIR"/llama-cpp-cpu-all ]; then
BINARY=llama-cpp-cpu-all
fi
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e $CURDIR/llama-cpp-grpc ]; then
if [ -e "$CURDIR"/llama-cpp-grpc ]; then
BINARY=llama-cpp-grpc
fi
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
fi
fi
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"
exec "$CURDIR"/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec $CURDIR/llama-cpp-fallback "$@"
exec "$CURDIR"/llama-cpp-fallback "$@"
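
The launcher's binary selection is a precedence rule: start from `llama-cpp-fallback`, upgrade to `llama-cpp-cpu-all` when it was shipped, and prefer `llama-cpp-grpc` only when `LLAMACPP_GRPC_SERVERS` is set and that binary exists. A C++ restatement for illustration, with a hypothetical `exists` callback standing in for the shell's `[ -e ... ]` test (`pick_llama_binary` is not a real function in the repository):

```cpp
#include <cassert>
#include <functional>
#include <string>

// exists:       stands in for `[ -e "$CURDIR"/<name> ]`
// grpc_servers: value of LLAMACPP_GRPC_SERVERS ("" when unset)
std::string pick_llama_binary(const std::function<bool(const std::string&)>& exists,
                              const std::string& grpc_servers) {
    std::string binary = "llama-cpp-fallback";   // GPU images ship only this one
    if (exists("llama-cpp-cpu-all")) {
        binary = "llama-cpp-cpu-all";             // ggml dlopens the best CPU variant
    }
    if (!grpc_servers.empty() && exists("llama-cpp-grpc")) {
        binary = "llama-cpp-grpc";                // distributed mode wins when configured
    }
    return binary;
}
```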

@@ -51,6 +51,14 @@ add_library(hw_grpc_proto STATIC
${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
# The generated proto/grpc sources include protobuf and grpc++ headers, so this
# library must see their include dirs. Linking the imported targets propagates
# them. On Linux the apt headers live in /usr/include (default search path) so
# this was a no-op; on macOS the Homebrew headers are under /opt/homebrew and
# would otherwise be missed (runtime_version.h not found).
target_link_libraries(hw_grpc_proto PUBLIC
protobuf::libprotobuf
gRPC::grpc++)
# Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
# PF_VULKAN is honored when passed on the cmake command line (it lands in the

@@ -2,7 +2,13 @@
# Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
# macOS has no bundled ld.so; the darwin package ships only dylibs under lib/,
# resolved via DYLD_LIBRARY_PATH (the ld.so branch below is skipped there).
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$DYLD_LIBRARY_PATH"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
fi
if [ -f "$CURDIR/lib/ld.so" ]; then
exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
fi

backend/cpp/run-unit-tests.sh Executable file
@@ -0,0 +1,71 @@
#!/bin/bash
#
# Discovers and runs every standalone C++ unit test under backend/cpp/.
#
# A "standalone" unit test is a *_test.cpp that depends only on the C++ standard
# library and nlohmann/json (single header) - i.e. it exercises pure helpers and
# does not need the full llama.cpp + gRPC backend build. Tests that DO need the
# backend build use the CMake/ctest path (e.g. -DLLAMA_GRPC_BUILD_TESTS=ON)
# instead and are skipped here.
#
# This keeps CI generic: adding a new pure-C++ unit test file named *_test.cpp in
# an active backend source dir is picked up automatically, with no CI edits.
#
# Env:
# NLOHMANN_INCLUDE include dir that contains nlohmann/json.hpp. If unset, the
# nlohmann/json single header is fetched to a temp dir.
# CXX compiler (default: g++).
# JSON_VERSION nlohmann/json tag to fetch when NLOHMANN_INCLUDE is unset
# (default: v3.11.3).
set -uo pipefail
ROOT="$(cd "$(dirname "$0")" && pwd)"
CXX="${CXX:-g++}"
JSON_VERSION="${JSON_VERSION:-v3.11.3}"
JSON_INC="${NLOHMANN_INCLUDE:-}"
if [ -z "$JSON_INC" ]; then
JSON_INC="$(mktemp -d)"
mkdir -p "$JSON_INC/nlohmann"
echo "Fetching nlohmann/json ${JSON_VERSION} single header..."
if ! curl -L -sf \
"https://raw.githubusercontent.com/nlohmann/json/${JSON_VERSION}/single_include/nlohmann/json.hpp" \
-o "$JSON_INC/nlohmann/json.hpp"; then
echo "ERROR: failed to fetch nlohmann/json header" >&2
exit 1
fi
fi
# Active source dirs only - exclude per-variant build copies, dev snapshots and
# the vendored upstream llama.cpp tree.
mapfile -t tests < <(find "$ROOT" -name '*_test.cpp' \
-not -path '*/llama.cpp/*' \
-not -path '*-build/*' \
-not -path '*-dev/*' \
-not -path '*fallback*' | sort)
if [ "${#tests[@]}" -eq 0 ]; then
echo "No standalone C++ unit tests found under $ROOT"
exit 0
fi
fail=0
for test_src in "${tests[@]}"; do
name="$(basename "$test_src" .cpp)"
bin="$(mktemp -d)/$name"
echo "==> $test_src"
if ! "$CXX" -std=c++17 -Wall -Wextra \
-I"$JSON_INC" -I"$(dirname "$test_src")" \
"$test_src" -o "$bin"; then
echo "COMPILE FAILED: $test_src" >&2
fail=1
continue
fi
if ! "$bin"; then
echo "TEST FAILED: $test_src" >&2
fail=1
fi
done
echo "Ran ${#tests[@]} standalone C++ unit test file(s)"
exit "$fail"
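
For readers porting the discovery step, here is a sketch of the same filter in C++17 with `std::filesystem`. `is_standalone_test_path` and `discover_tests` are hypothetical helpers, not part of the repository. The substring checks mirror the `find -not -path` globs; unlike `find -name`, the sketch keeps only regular files, and `std::sort` orders bytewise, which may differ slightly from `sort` under a non-C locale.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <initializer_list>
#include <string>
#include <vector>

// Mirrors the find filters: basename ends in _test.cpp and the full path avoids
// the vendored tree, per-variant build copies, dev snapshots and fallback dirs.
bool is_standalone_test_path(const std::string& path) {
    const std::string suffix = "_test.cpp";
    const std::string base = std::filesystem::path(path).filename().string();
    if (base.size() < suffix.size() ||
        base.compare(base.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return false;
    }
    for (const char* excluded : {"/llama.cpp/", "-build/", "-dev/", "fallback"}) {
        if (path.find(excluded) != std::string::npos) {
            return false;
        }
    }
    return true;
}

// Walks root recursively and returns matching test sources, sorted.
std::vector<std::string> discover_tests(const std::filesystem::path& root) {
    std::vector<std::string> tests;
    for (const auto& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (entry.is_regular_file() && is_standalone_test_path(entry.path().string())) {
            tests.push_back(entry.path().string());
        }
    }
    std::sort(tests.begin(), tests.end());
    return tests;
}
```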

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,36 +15,36 @@ BINARY=turboquant-fallback
# x86/arm64 ship a single turboquant-cpu-all built with ggml CPU_ALL_VARIANTS: ggml's
# backend registry dlopens the best libggml-cpu-*.so for this host, so no shell-side
# probing. ROCm ships only turboquant-fallback, so fall back to it when cpu-all is absent.
if [ -e $CURDIR/turboquant-cpu-all ]; then
if [ -e "$CURDIR"/turboquant-cpu-all ]; then
BINARY=turboquant-cpu-all
fi
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e $CURDIR/turboquant-grpc ]; then
if [ -e "$CURDIR"/turboquant-grpc ]; then
BINARY=turboquant-grpc
fi
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
fi
fi
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"
exec "$CURDIR"/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec $CURDIR/turboquant-fallback "$@"
exec "$CURDIR"/turboquant-fallback "$@"

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -21,20 +21,20 @@ if [ "$(uname)" = "Darwin" ]; then
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
if [ -e "$CURDIR"/libgoacestepcpp-avx.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgoacestepcpp-avx2.so ]; then
if [ -e "$CURDIR"/libgoacestepcpp-avx2.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx2.so"
fi
fi
@@ -42,22 +42,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgoacestepcpp-avx512.so ]; then
if [ -e "$CURDIR"/libgoacestepcpp-avx512.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export ACESTEP_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/acestep-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/acestep-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/acestep-cpp "$@"
exec "$CURDIR"/acestep-cpp "$@"
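
The AVX probing in this launcher is a cascade in which the last satisfied check wins; the same pattern recurs in the CrispASR and depth-anything launchers. A sketch of that rule, assuming the `flags` line from `/proc/cpuinfo` has already been read; `parse_cpu_flags` and `pick_variant` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Splits a /proc/cpuinfo "flags" value into a set of flag names.
std::set<std::string> parse_cpu_flags(const std::string& flags_line) {
    std::set<std::string> flags;
    std::istringstream in(flags_line);
    for (std::string f; in >> f;) flags.insert(f);
    return flags;
}

// Start from -fallback; each later tier (avx, avx2, avx512f) overrides the
// choice when the CPU flag is present AND that variant file was shipped.
std::string pick_variant(const std::set<std::string>& cpu_flags,
                         const std::set<std::string>& shipped,
                         const std::string& prefix, const std::string& ext) {
    std::string chosen = prefix + "-fallback" + ext;
    const std::pair<const char*, const char*> tiers[] = {
        {"avx", "-avx"}, {"avx2", "-avx2"}, {"avx512f", "-avx512"}};
    for (const auto& [flag, suffix] : tiers) {
        const std::string candidate = prefix + suffix + ext;
        if (cpu_flags.count(flag) && shipped.count(candidate)) chosen = candidate;
    }
    return chosen;
}
```

Because a missing file only skips its own tier, a host with avx512f but no shipped avx512 build still lands on the best lower variant that exists.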

@@ -4,10 +4,10 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
export CED_LIBRARY="$CURDIR/lib/libced.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
fi
# If a self-contained ld.so was packaged, route through it so the packaged

@@ -1,6 +1,6 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
exec $CURDIR/cloud-proxy "$@"
exec "$CURDIR"/cloud-proxy "$@"

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (pinned commit SHA)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc
CRISPASR_VERSION?=6514c9da00b03a2f0f1b49a43fae4f3a01a41844
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgocrispasr-avx.so ]; then
if [ -e "$CURDIR"/libgocrispasr-avx.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgocrispasr-avx2.so ]; then
if [ -e "$CURDIR"/libgocrispasr-avx2.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx2.so"
fi
fi
@@ -36,12 +36,12 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgocrispasr-avx512.so ]; then
if [ -e "$CURDIR"/libgocrispasr-avx512.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export CRISPASR_LIBRARY=$LIBRARY
@@ -49,14 +49,14 @@ export CRISPASR_LIBRARY=$LIBRARY
# Point piper's espeak-ng phonemizer at the bundled voice data. The variable
# names the directory CONTAINING espeak-ng-data (package.sh drops it next to
# this script). Harmless when espeak-ng wasn't bundled.
export CRISPASR_ESPEAK_DATA_PATH=$CURDIR
export CRISPASR_ESPEAK_DATA_PATH="$CURDIR"
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/crispasr "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/crispasr "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/crispasr "$@"
exec "$CURDIR"/crispasr "$@"

@@ -40,6 +40,8 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
if [ -e "$CURDIR"/libdepthanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx2.so ]; then
if [ -e "$CURDIR"/libdepthanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx512.so ]; then
if [ -e "$CURDIR"/libdepthanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/depth-anything-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/depth-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/depth-anything-cpp "$@"
exec "$CURDIR"/depth-anything-cpp "$@"
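The launcher above picks the most capable CPU build that is actually present. It starts from the fallback library and upgrades to the AVX, AVX2 and AVX512 variants in turn, each only when `/proc/cpuinfo` advertises the flag and the matching `.so` exists, so a later match overrides an earlier one. The Go sketch below shows that precedence. It assumes the CPU flags have already been parsed into a set, and it uses a hypothetical `exists` callback in place of the script's `[ -e ... ]` tests. The function name is illustrative and does not appear in the backend.

```go
package main

import "fmt"

// pickDepthAnythingLib mirrors the precedence in run.sh: start from the
// fallback build and let each later check (avx, avx2, avx512f) override the
// previous choice when the CPU flag is present AND the variant file exists.
// Hypothetical helper for illustration; run.sh does this with grep and [ -e ].
func pickDepthAnythingLib(dir string, cpuFlags map[string]bool, exists func(string) bool) string {
	lib := dir + "/libdepthanythingcpp-fallback.so"
	variants := []struct{ flag, suffix string }{
		{"avx", "avx"},
		{"avx2", "avx2"},
		{"avx512f", "avx512"},
	}
	for _, v := range variants {
		candidate := dir + "/libdepthanythingcpp-" + v.suffix + ".so"
		if cpuFlags[v.flag] && exists(candidate) {
			lib = candidate
		}
	}
	return lib
}

func main() {
	// An AVX2-capable CPU where the avx512 build was not shipped.
	flags := map[string]bool{"avx": true, "avx2": true}
	shipped := map[string]bool{
		"/opt/da/libdepthanythingcpp-avx.so":  true,
		"/opt/da/libdepthanythingcpp-avx2.so": true,
	}
	fmt.Println(pickDepthAnythingLib("/opt/da", flags, func(p string) bool { return shipped[p] }))
}
```

Because each check only overrides on success, a missing variant file never downgrades an earlier, valid choice.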

backend/go/face-detect/.gitignore

@@ -0,0 +1,18 @@
# Fetched upstream sources
sources/
# CMake build directories
build*/
# build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in face-detect.cpp upstream.
*.so
*.so.*
facedetect_capi.h
compile_commands.json
# Compiled backend binary
face-detect-grpc
# Packaging output
package/


@@ -0,0 +1,110 @@
# face-detect backend Makefile.
#
# Upstream pin lives below as FACEDETECT_VERSION?=06914b0... (.github/bump_deps.sh
# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
# convention).
#
# Local dev shortcut: if you already have an out-of-tree face-detect.cpp build,
# symlink the .so + header into this directory and skip the clone/cmake steps:
#
# ln -sf /path/to/face-detect.cpp/build-shared/libfacedetect.so .
# ln -sf /path/to/face-detect.cpp/include/facedetect_capi.h .
# go build -o face-detect-grpc .
#
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.
FACEDETECT_VERSION?=06914b077d52f90d5421299138e7be6bdd06b5e8
FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
BUILD_TYPE?=
NATIVE?=false
# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))
# Build ggml + the vendored libjpeg-turbo statically into libfacedetect.so (PIC)
# so the shared lib is self-contained: dlopen needs no libggml*.so alongside it,
# only system libs (libstdc++/libgomp/libc) the runtime image already provides.
# The vendored jpeg symbols are hidden via -Wl,--exclude-libs,ALL on the C++
# side, so only the facedetect_capi_* surface is exported.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DFACEDETECT_SHARED=ON -DFACEDETECT_BUILD_CLI=OFF -DFACEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# face-detect.cpp gates its GGML backends behind FACEDETECT_GGML_* options and
# does set(GGML_CUDA ${FACEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
# -DGGML_CUDA=ON is overwritten back to OFF. Forward the FACEDETECT_GGML_*
# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DFACEDETECT_GGML_CUDA=ON
# Opt-in cuDNN implicit-GEMM conv path (kills im2col on GPU, SCRFD 2.3x
# vs torch-cuDNN parity). Only the arm64 + CUDA 13 image (GB10/Jetson/L4T)
# ships libcudnn9 + the -dev headers, so gate cuDNN to that variant.
# x86 CUDA images carry no cuDNN -> enabling it there is a link failure.
ifeq ($(CUDA_MAJOR_VERSION),13)
ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
CMAKE_ARGS+=-DFACEDETECT_GGML_CUDNN=ON
endif
endif
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DFACEDETECT_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DFACEDETECT_GGML_VULKAN=ON
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DFACEDETECT_GGML_METAL=ON
endif
.PHONY: face-detect-grpc package build clean purge test all
all: face-detect-grpc
# Clone the upstream face-detect.cpp source at the pinned commit. Directory acts
# as the target so make only re-clones when missing. After a FACEDETECT_VERSION
# bump, run 'make purge && make' to refetch.
sources/face-detect.cpp:
	mkdir -p sources/face-detect.cpp
	cd sources/face-detect.cpp && \
	git init -q && \
	git remote add origin $(FACEDETECT_REPO) && \
	git fetch --depth 1 origin $(FACEDETECT_VERSION) && \
	git checkout FETCH_HEAD && \
	git submodule update --init --recursive --depth 1 --single-branch
# Build the shared lib + header out-of-tree, then stage them next to the Go
# sources so purego.Dlopen("libfacedetect.so") and the cgo-less build both pick
# them up.
libfacedetect.so: sources/face-detect.cpp
	cmake -B sources/face-detect.cpp/build-shared -S sources/face-detect.cpp $(CMAKE_ARGS)
	cmake --build sources/face-detect.cpp/build-shared --config Release -j$(JOBS) --target facedetect
	cp -fv sources/face-detect.cpp/build-shared/libfacedetect.so* ./ 2>/dev/null || true
	cp -fv sources/face-detect.cpp/include/facedetect_capi.h ./
face-detect-grpc: libfacedetect.so main.go gofacedetect.go options.go
	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o face-detect-grpc .
package: face-detect-grpc
	bash package.sh
build: package
# Test target. The embed/detect/verify/analyze smoke specs are gated on
# FACEDETECT_BACKEND_TEST_MODEL + FACEDETECT_BACKEND_TEST_IMAGE; without them the
# heavy specs auto-skip and only the pure-Go parsing specs run.
test:
	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
clean: purge
	rm -rf libfacedetect.so* facedetect_capi.h package face-detect-grpc
purge:
	rm -rf sources/face-detect.cpp
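Most of the `BUILD_TYPE` switch in this Makefile is a direct lookup. The one non-obvious rule is that cuDNN is enabled only for `cublas` builds with `CUDA_MAJOR_VERSION=13` on `arm64`/`aarch64`. The Go sketch below restates the same decision table for readers tracing which `-D` flags a given matrix entry gets. The function name is hypothetical, and the sketch models only the flags appended by the `BUILD_TYPE` branch. It does not cover the base `CMAKE_ARGS` or the `NATIVE` toggle.

```go
package main

import (
	"fmt"
	"slices"
)

// backendCMakeFlags restates the BUILD_TYPE branch of the face-detect
// Makefile. Hypothetical helper: it returns only the flags that branch appends.
func backendCMakeFlags(buildType, cudaMajor, arch string) []string {
	switch buildType {
	case "cublas":
		flags := []string{"-DFACEDETECT_GGML_CUDA=ON"}
		// cuDNN is gated to the CUDA 13 arm64 image, the only one shipping libcudnn9.
		if cudaMajor == "13" && slices.Contains([]string{"arm64", "aarch64"}, arch) {
			flags = append(flags, "-DFACEDETECT_GGML_CUDNN=ON")
		}
		return flags
	case "openblas":
		return []string{"-DGGML_BLAS=ON", "-DGGML_BLAS_VENDOR=OpenBLAS"}
	case "hipblas":
		return []string{"-DFACEDETECT_GGML_HIP=ON"}
	case "vulkan":
		return []string{"-DFACEDETECT_GGML_VULKAN=ON"}
	case "metal":
		return []string{"-DFACEDETECT_GGML_METAL=ON"}
	default:
		return nil // CPU build: no backend-specific flags appended
	}
}

func main() {
	for _, tc := range []struct{ buildType, cuda, arch string }{
		{"cublas", "13", "arm64"},
		{"cublas", "12", "amd64"},
		{"vulkan", "", "amd64"},
	} {
		fmt.Println(tc.buildType, tc.cuda, tc.arch, "->", backendCMakeFlags(tc.buildType, tc.cuda, tc.arch))
	}
}
```

All GPU backends go through the `FACEDETECT_GGML_*` options, because upstream force-overwrites the bare `GGML_*` cache entries. OpenBLAS is the exception and still uses `GGML_BLAS` directly.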


@@ -0,0 +1,431 @@
package main
import (
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"math"
"os"
"path/filepath"
"strconv"
"strings"
"time"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/xlog"
)
// purego-bound entry points from libfacedetect.so. The symbol names they are
// registered under (see main.go) match facedetect_capi.h exactly, so a
// `nm libfacedetect.so | grep facedetect_capi` is enough to spot drift.
//
// The opaque ctx and the malloc'd char*/float* return values are declared as
// uintptr so we get the raw pointer back and can release it via the matching
// capi free function. purego's native string/[]float32 returns would copy and
// forget the original pointer, leaking the C-owned buffer on every call.
var (
CppAbiVersion func() int32
CppLoad func(ggufPath string) uintptr
CppFree func(ctx uintptr)
CppLastError func(ctx uintptr) string
CppFreeString func(s uintptr)
CppFreeVec func(v uintptr)
CppEmbedPath func(ctx uintptr, imagePath string, outVec, outDim unsafe.Pointer) int32
CppEmbedRGB func(ctx uintptr, rgb []byte, width, height int32, outVec, outDim unsafe.Pointer) int32
CppDetectJSON func(ctx uintptr, imagePath string) uintptr
CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, antiSpoof int32, outDistance, outVerified unsafe.Pointer) int32
CppAnalyzeJSON func(ctx uintptr, imagePath string) uintptr
)
// FaceDetect implements the face-recognition (biometric) subset of the Backend
// gRPC service over libfacedetect.so. The C side keeps a single loaded model
// pack plus a per-ctx last-error buffer and is not reentrant, so
// base.SingleThread serializes every call.
type FaceDetect struct {
base.SingleThread
opts loadOptions
ctxPtr uintptr
}
func (f *FaceDetect) Load(opts *pb.ModelOptions) error {
model := opts.ModelFile
if model == "" {
model = opts.ModelPath
}
if !filepath.IsAbs(model) && opts.ModelPath != "" {
model = filepath.Join(opts.ModelPath, model)
}
if model == "" {
return errors.New("face-detect: ModelFile is required")
}
f.opts = parseOptions(opts.Options)
if f.opts.modelName == "" {
f.opts.modelName = filepath.Base(model)
}
// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
// one backend process per model and serves requests concurrently, so the
// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
// FACEDETECT_THREADS is read by the engine at backend construction, so it
// must be set before the capi load. A non-positive Threads means "unset":
// leave the env alone so the engine keeps its sane default.
threads := opts.Threads
if threads > 0 {
if err := os.Setenv("FACEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
return fmt.Errorf("face-detect: set FACEDETECT_THREADS: %w", err)
}
xlog.Info("face-detect: applying LocalAI thread budget", "threads", threads)
}
xlog.Info("face-detect: loading model", "model", model,
"verify_threshold", f.opts.verifyThreshold, "abi", CppAbiVersion())
ctx := CppLoad(model)
if ctx == 0 {
// The last-error buffer lives on the ctx that was never returned, so
// surface the path the operator tried to load instead.
return fmt.Errorf("face-detect: facedetect_capi_load failed for %q", model)
}
f.ctxPtr = ctx
return nil
}
// Embeddings returns the L2-normalized ArcFace embedding of the primary face in
// the supplied image. Mirroring the Python face backend, the image is read from
// Images[0] as a base64 payload; materializeImage decodes it to a temp file so
// the path-based C-API can run its own decode (cv2.imread parity). The gRPC
// server wraps the returned slice in an EmbeddingResult.
func (f *FaceDetect) Embeddings(req *pb.PredictOptions) ([]float32, error) {
if f.ctxPtr == 0 {
return nil, errors.New("face-detect: model not loaded")
}
if len(req.Images) == 0 || req.Images[0] == "" {
return nil, errors.New("face-detect: Embedding requires Images[0] to be a base64 image")
}
path, cleanup, err := materializeImage(req.Images[0])
if err != nil {
return nil, err
}
defer cleanup()
return f.embedPath(path)
}
func (f *FaceDetect) embedPath(path string) ([]float32, error) {
var vec uintptr
var dim int32
rc := CppEmbedPath(f.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
if rc != 0 || vec == 0 || dim <= 0 {
return nil, f.lastErr("embed", path)
}
defer CppFreeVec(vec)
// Copy out of the C-owned malloc'd buffer before freeing it. The
// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
// nor moves this buffer and we copy immediately.
src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
out := make([]float32, int(dim))
copy(out, src)
return out, nil
}
// Detect runs SCRFD over the image and returns one Detection per face. The
// C-API emits a box as [x1,y1,x2,y2] in pixels; the proto carries x/y plus
// width/height, so the corners are converted. The 5 facial landmarks the engine
// also returns are dropped: the Detection message has no field for them.
func (f *FaceDetect) Detect(req *pb.DetectOptions) (pb.DetectResponse, error) {
if f.ctxPtr == 0 {
return pb.DetectResponse{}, errors.New("face-detect: model not loaded")
}
if req.Src == "" {
return pb.DetectResponse{}, errors.New("face-detect: src image is required")
}
path, cleanup, err := materializeImage(req.Src)
if err != nil {
return pb.DetectResponse{}, err
}
defer cleanup()
faces, err := f.detectFaces(path)
if err != nil {
return pb.DetectResponse{}, err
}
dets := make([]*pb.Detection, 0, len(faces))
for _, fc := range faces {
if req.Threshold > 0 && fc.Score < req.Threshold {
continue
}
x, y, w, h := fc.xywh()
dets = append(dets, &pb.Detection{
X: x,
Y: y,
Width: w,
Height: h,
Confidence: fc.Score,
ClassName: "face",
})
}
return pb.DetectResponse{Detections: dets}, nil
}
// FaceVerify embeds the primary face in each image and reports whether they are
// the same identity by cosine distance against a threshold. A request threshold
// <= 0 falls back to the model-configured default (verify_threshold option,
// 0.35 if unset). When anti_spoofing is set, the C-API applies a MiniFASNet
// veto internally (verified forced false on a spoof); the per-image liveness
// scores are not exposed by the verify entry point, so img*_is_real /
// img*_antispoof_score stay at their zero values.
func (f *FaceDetect) FaceVerify(req *pb.FaceVerifyRequest) (pb.FaceVerifyResponse, error) {
if f.ctxPtr == 0 {
return pb.FaceVerifyResponse{}, errors.New("face-detect: model not loaded")
}
if req.Img1 == "" || req.Img2 == "" {
return pb.FaceVerifyResponse{}, errors.New("face-detect: img1 and img2 are required")
}
path1, cleanup1, err := materializeImage(req.Img1)
if err != nil {
return pb.FaceVerifyResponse{}, err
}
defer cleanup1()
path2, cleanup2, err := materializeImage(req.Img2)
if err != nil {
return pb.FaceVerifyResponse{}, err
}
defer cleanup2()
threshold := req.Threshold
if threshold <= 0 {
threshold = f.opts.verifyThreshold
}
antiSpoof := int32(0)
if req.AntiSpoofing {
antiSpoof = 1
}
started := time.Now()
var distance float32
var verified int32
rc := CppVerifyPaths(f.ctxPtr, path1, path2, threshold, antiSpoof,
unsafe.Pointer(&distance), unsafe.Pointer(&verified))
if rc != 0 {
return pb.FaceVerifyResponse{}, f.lastErr("verify", req.Img1[:min(8, len(req.Img1))]+"...")
}
elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
// matching the Python face backend's reporting.
confidence := float32(0)
if threshold > 0 {
confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
}
return pb.FaceVerifyResponse{
Verified: verified != 0,
Distance: distance,
Threshold: threshold,
Confidence: confidence,
Model: f.opts.modelName,
Img1Area: f.bestArea(path1),
Img2Area: f.bestArea(path2),
ProcessingTimeMs: elapsedMs,
}, nil
}
// FaceAnalyze runs the genderage head on every detected face. The C-API returns
// "M"/"F" gender labels and a rounded age; the labels are normalized to the
// "Man"/"Woman" values the proto documents.
func (f *FaceDetect) FaceAnalyze(req *pb.FaceAnalyzeRequest) (pb.FaceAnalyzeResponse, error) {
if f.ctxPtr == 0 {
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: model not loaded")
}
if req.Img == "" {
return pb.FaceAnalyzeResponse{}, errors.New("face-detect: img is required")
}
path, cleanup, err := materializeImage(req.Img)
if err != nil {
return pb.FaceAnalyzeResponse{}, err
}
defer cleanup()
ptr := CppAnalyzeJSON(f.ctxPtr, path)
if ptr == 0 {
return pb.FaceAnalyzeResponse{}, f.lastErr("analyze", path)
}
defer CppFreeString(ptr)
faces, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
if err != nil {
return pb.FaceAnalyzeResponse{}, fmt.Errorf("face-detect: analyze JSON: %w", err)
}
return pb.FaceAnalyzeResponse{Faces: faces}, nil
}
// faceBox is one entry of the detect/analyze JSON documents the engine emits.
type faceBox struct {
Score float32 `json:"score"`
Box []float32 `json:"box"`
Age float32 `json:"age"`
Gender string `json:"gender"`
}
// xywh converts the engine's [x1,y1,x2,y2] box into the x/y/width/height the
// proto carries. A short or missing box yields zeros.
func (b faceBox) xywh() (x, y, w, h float32) {
if len(b.Box) < 4 {
return 0, 0, 0, 0
}
return b.Box[0], b.Box[1], b.Box[2] - b.Box[0], b.Box[3] - b.Box[1]
}
type facesJSON struct {
Faces []faceBox `json:"faces"`
}
func (f *FaceDetect) detectFaces(path string) ([]faceBox, error) {
ptr := CppDetectJSON(f.ctxPtr, path)
if ptr == 0 {
return nil, f.lastErr("detect", path)
}
defer CppFreeString(ptr)
var doc facesJSON
if err := json.Unmarshal([]byte(goStringFromCPtr(ptr)), &doc); err != nil {
return nil, fmt.Errorf("face-detect: detect JSON: %w", err)
}
return doc.Faces, nil
}
// bestArea returns the FacialArea of the highest-scoring face in an image, or an
// empty area when detection fails or finds nothing. Best-effort: verify already
// succeeded, so a missing region must not turn a valid match into an error.
func (f *FaceDetect) bestArea(path string) *pb.FacialArea {
faces, err := f.detectFaces(path)
if err != nil || len(faces) == 0 {
return &pb.FacialArea{}
}
best := faces[0]
for _, fc := range faces[1:] {
if fc.Score > best.Score {
best = fc
}
}
x, y, w, h := best.xywh()
return &pb.FacialArea{X: x, Y: y, W: w, H: h}
}
// parseAnalyzeJSON maps the engine's analyze document onto FaceAnalysis entries.
// The engine reports gender as "M"/"F"; both the dominant label and the score
// map are filled with the "Man"/"Woman" form the proto documents.
func parseAnalyzeJSON(doc string) ([]*pb.FaceAnalysis, error) {
var parsed facesJSON
if err := json.Unmarshal([]byte(doc), &parsed); err != nil {
return nil, err
}
out := make([]*pb.FaceAnalysis, 0, len(parsed.Faces))
for _, fc := range parsed.Faces {
x, y, w, h := fc.xywh()
fa := &pb.FaceAnalysis{
Region: &pb.FacialArea{X: x, Y: y, W: w, H: h},
FaceConfidence: fc.Score,
Age: fc.Age,
}
if label := normalizeGender(fc.Gender); label != "" {
fa.DominantGender = label
fa.Gender = map[string]float32{label: 1.0}
}
out = append(out, fa)
}
return out, nil
}
// normalizeGender maps the engine's "M"/"F" code to the "Man"/"Woman" labels the
// proto documents. Unknown codes pass through unchanged.
func normalizeGender(g string) string {
switch strings.ToUpper(strings.TrimSpace(g)) {
case "M":
return "Man"
case "F":
return "Woman"
case "":
return ""
default:
return g
}
}
// materializeImage decodes a base64 image payload into a temp file and returns
// its path plus a cleanup func. As a convenience for callers that already pass a
// filesystem path (e.g. a test fixture), an existing path is used as-is with a
// no-op cleanup. data: URI prefixes are stripped before decoding.
func materializeImage(src string) (path string, cleanup func(), err error) {
noop := func() {}
if src == "" {
return "", noop, errors.New("face-detect: empty image input")
}
if _, statErr := os.Stat(src); statErr == nil {
return src, noop, nil
}
payload := src
if i := strings.Index(payload, ","); strings.HasPrefix(payload, "data:") && i >= 0 {
payload = payload[i+1:]
}
data, decErr := base64.StdEncoding.DecodeString(strings.TrimSpace(payload))
if decErr != nil || len(data) == 0 {
return "", noop, errors.New("face-detect: image is neither an existing path nor valid base64")
}
tmp, createErr := os.CreateTemp("", "face-detect-*.img")
if createErr != nil {
return "", noop, fmt.Errorf("face-detect: create temp image: %w", createErr)
}
cleanup = func() { _ = os.Remove(tmp.Name()) }
if _, wErr := tmp.Write(data); wErr != nil {
_ = tmp.Close()
cleanup()
return "", noop, fmt.Errorf("face-detect: write temp image: %w", wErr)
}
if cErr := tmp.Close(); cErr != nil {
cleanup()
return "", noop, fmt.Errorf("face-detect: close temp image: %w", cErr)
}
return tmp.Name(), cleanup, nil
}
// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
func (f *FaceDetect) lastErr(op, subject string) error {
msg := strings.TrimSpace(CppLastError(f.ctxPtr))
if msg == "" {
msg = "no error detail"
}
return fmt.Errorf("face-detect: %s failed for %q: %s", op, subject, msg)
}
// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
//
// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
// moves the buffer and we dereference it immediately to copy the bytes out.
func goStringFromCPtr(cptr uintptr) string {
if cptr == 0 {
return ""
}
p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
n := 0
for *(*byte)(unsafe.Add(p, n)) != 0 {
n++
}
return string(unsafe.Slice((*byte)(p), n))
}
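FaceVerify resolves its threshold in two steps. It uses the request value if that value is positive. Otherwise it falls back to the `verify_threshold` option, which itself defaults to 0.35. It then reports confidence as 100·(1 − distance/threshold), clamped to [0, 100]. The standalone Go sketch below reproduces only that arithmetic, so the reported numbers can be checked by hand. The helper names are hypothetical, and the sketch does not call the C-API.

```go
package main

import (
	"fmt"
	"math"
)

// resolveThreshold mirrors FaceVerify: a non-positive request threshold falls
// back to the model-level default (verify_threshold, 0.35 unless overridden).
func resolveThreshold(requested, modelDefault float32) float32 {
	if requested <= 0 {
		return modelDefault
	}
	return requested
}

// verifyConfidence mirrors FaceVerify's reporting: 100 at distance 0, falling
// linearly to 0 at the threshold, clamped to [0, 100].
func verifyConfidence(distance, threshold float32) float32 {
	if threshold <= 0 {
		return 0
	}
	c := (1.0 - float64(distance)/float64(threshold)) * 100.0
	return float32(math.Max(0, math.Min(100, c)))
}

func main() {
	th := resolveThreshold(0, 0.35) // request left the threshold unset
	for _, d := range []float32{0, 0.175, 0.35, 0.5} {
		fmt.Printf("distance=%.3f threshold=%.2f confidence=%.1f\n", d, th, verifyConfidence(d, th))
	}
}
```

Any distance at or beyond the threshold reports 0 confidence, so `Verified` and a non-zero `Confidence` always agree.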


@@ -0,0 +1,230 @@
package main
import (
"encoding/base64"
"os"
"sync"
"testing"
"github.com/ebitengine/purego"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestFaceDetect(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "face-detect Backend Suite")
}
var (
libLoadOnce sync.Once
libLoadErr error
)
// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
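// bridge without spinning up the gRPC server. When libfacedetect.so cannot be
// loaded (it must be reachable through LD_LIBRARY_PATH, e.g. a symlink in ./
// picked up by `make test`, or named via FACEDETECT_LIBRARY), the error is
// recorded and returned so the smoke specs can skip themselves.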
func ensureLibLoaded() error {
libLoadOnce.Do(func() {
libName := os.Getenv("FACEDETECT_LIBRARY")
if libName == "" {
libName = "libfacedetect.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
libLoadErr = err
return
}
purego.RegisterLibFunc(&CppAbiVersion, lib, "facedetect_capi_abi_version")
purego.RegisterLibFunc(&CppLoad, lib, "facedetect_capi_load")
purego.RegisterLibFunc(&CppFree, lib, "facedetect_capi_free")
purego.RegisterLibFunc(&CppLastError, lib, "facedetect_capi_last_error")
purego.RegisterLibFunc(&CppFreeString, lib, "facedetect_capi_free_string")
purego.RegisterLibFunc(&CppFreeVec, lib, "facedetect_capi_free_vec")
purego.RegisterLibFunc(&CppEmbedPath, lib, "facedetect_capi_embed_path")
purego.RegisterLibFunc(&CppEmbedRGB, lib, "facedetect_capi_embed_rgb")
purego.RegisterLibFunc(&CppDetectJSON, lib, "facedetect_capi_detect_path_json")
purego.RegisterLibFunc(&CppVerifyPaths, lib, "facedetect_capi_verify_paths")
purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "facedetect_capi_analyze_path_json")
})
return libLoadErr
}
var _ = Describe("parseOptions", func() {
It("defaults verify_threshold to 0.35", func() {
o := parseOptions(nil)
Expect(o.verifyThreshold).To(Equal(float32(0.35)))
Expect(o.modelName).To(Equal(""))
})
It("parses verify_threshold, threshold alias and model_name", func() {
o := parseOptions([]string{"verify_threshold:0.4", "model_name:buffalo_l", "unknown:x"})
Expect(o.verifyThreshold).To(Equal(float32(0.4)))
Expect(o.modelName).To(Equal("buffalo_l"))
o2 := parseOptions([]string{"threshold:0.3"})
Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
})
It("ignores non-positive thresholds and keeps the default", func() {
o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
Expect(o.verifyThreshold).To(Equal(float32(0.35)))
})
})
var _ = Describe("normalizeGender", func() {
It("maps M/F codes to Man/Woman", func() {
Expect(normalizeGender("M")).To(Equal("Man"))
Expect(normalizeGender("f")).To(Equal("Woman"))
Expect(normalizeGender(" m ")).To(Equal("Man"))
})
It("passes empty and unknown codes through", func() {
Expect(normalizeGender("")).To(Equal(""))
Expect(normalizeGender("nonbinary")).To(Equal("nonbinary"))
})
})
var _ = Describe("faceBox.xywh", func() {
It("converts an [x1,y1,x2,y2] box to x/y/width/height", func() {
b := faceBox{Box: []float32{10, 20, 50, 80}}
x, y, w, h := b.xywh()
Expect(x).To(Equal(float32(10)))
Expect(y).To(Equal(float32(20)))
Expect(w).To(Equal(float32(40)))
Expect(h).To(Equal(float32(60)))
})
It("returns zeros for a short box", func() {
x, y, w, h := faceBox{Box: []float32{1, 2}}.xywh()
Expect([]float32{x, y, w, h}).To(Equal([]float32{0, 0, 0, 0}))
})
})
var _ = Describe("parseAnalyzeJSON", func() {
It("maps region, age and gender for each face", func() {
doc := `{"faces":[
{"score":0.997,"box":[10,20,50,80],"age":31,"gender":"M"},
{"score":0.81,"box":[0,0,40,40],"age":24,"gender":"F"}]}`
faces, err := parseAnalyzeJSON(doc)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(HaveLen(2))
Expect(faces[0].FaceConfidence).To(BeNumerically("~", 0.997, 1e-4))
Expect(faces[0].Age).To(BeNumerically("~", 31, 1e-4))
Expect(faces[0].DominantGender).To(Equal("Man"))
Expect(faces[0].Gender).To(HaveKeyWithValue("Man", float32(1.0)))
Expect(faces[0].Region.W).To(Equal(float32(40)))
Expect(faces[0].Region.H).To(Equal(float32(60)))
Expect(faces[1].DominantGender).To(Equal("Woman"))
})
It("tolerates a missing gender field", func() {
faces, err := parseAnalyzeJSON(`{"faces":[{"score":0.5,"box":[0,0,10,10],"age":40}]}`)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(HaveLen(1))
Expect(faces[0].DominantGender).To(Equal(""))
Expect(faces[0].Gender).To(BeEmpty())
})
It("returns no faces for an empty document", func() {
faces, err := parseAnalyzeJSON(`{"faces":[]}`)
Expect(err).ToNot(HaveOccurred())
Expect(faces).To(BeEmpty())
})
It("returns an error on malformed JSON", func() {
_, err := parseAnalyzeJSON(`{not-json`)
Expect(err).To(HaveOccurred())
})
})
var _ = Describe("materializeImage", func() {
It("decodes a base64 payload to a temp file", func() {
payload := base64.StdEncoding.EncodeToString([]byte("\xff\xd8\xff\xe0fake-jpeg"))
path, cleanup, err := materializeImage(payload)
Expect(err).ToNot(HaveOccurred())
defer cleanup()
data, rerr := os.ReadFile(path)
Expect(rerr).ToNot(HaveOccurred())
Expect(data).To(Equal([]byte("\xff\xd8\xff\xe0fake-jpeg")))
})
It("strips a data: URI prefix before decoding", func() {
payload := "data:image/png;base64," + base64.StdEncoding.EncodeToString([]byte("hello"))
path, cleanup, err := materializeImage(payload)
Expect(err).ToNot(HaveOccurred())
defer cleanup()
data, rerr := os.ReadFile(path)
Expect(rerr).ToNot(HaveOccurred())
Expect(data).To(Equal([]byte("hello")))
})
It("uses an existing path as-is", func() {
tmp, err := os.CreateTemp("", "face-detect-fixture-*.bin")
Expect(err).ToNot(HaveOccurred())
defer func() { _ = os.Remove(tmp.Name()) }()
Expect(tmp.Close()).To(Succeed())
path, cleanup, err := materializeImage(tmp.Name())
Expect(err).ToNot(HaveOccurred())
defer cleanup()
Expect(path).To(Equal(tmp.Name()))
})
It("errors on input that is neither a path nor base64", func() {
_, _, err := materializeImage("not base64!!!")
Expect(err).To(HaveOccurred())
})
})
// The specs below exercise the real C-API end to end. They run only when both a
// model GGUF and a test image are provided, and skip cleanly otherwise so the
// suite stays green without large assets.
var _ = Describe("FaceDetect end-to-end", Ordered, func() {
var (
f *FaceDetect
modelPath = os.Getenv("FACEDETECT_BACKEND_TEST_MODEL")
imagePath = os.Getenv("FACEDETECT_BACKEND_TEST_IMAGE")
)
BeforeAll(func() {
if modelPath == "" || imagePath == "" {
Skip("set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE to run the e2e specs")
}
if err := ensureLibLoaded(); err != nil {
Skip("libfacedetect.so not loadable: " + err.Error())
}
f = &FaceDetect{}
Expect(f.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
})
It("embeds the primary face in an image", func() {
emb, err := f.Embeddings(&pb.PredictOptions{Images: []string{imagePath}})
Expect(err).ToNot(HaveOccurred())
Expect(emb).ToNot(BeEmpty())
})
It("detects at least one face", func() {
resp, err := f.Detect(&pb.DetectOptions{Src: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Detections).ToNot(BeEmpty())
Expect(resp.Detections[0].ClassName).To(Equal("face"))
})
It("verifies an image against itself as the same identity", func() {
resp, err := f.FaceVerify(&pb.FaceVerifyRequest{Img1: imagePath, Img2: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Verified).To(BeTrue())
Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
})
It("analyzes age/gender for each face", func() {
resp, err := f.FaceAnalyze(&pb.FaceAnalyzeRequest{Img: imagePath})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Faces).ToNot(BeEmpty())
})
})


@@ -0,0 +1,65 @@
package main
// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libfacedetect.so via purego and registers the flat C-API entry points
// declared in facedetect_capi.h. The library name can be overridden with
// FACEDETECT_LIBRARY (mirrors the VOICEDETECT_LIBRARY / PARAKEET_LIBRARY
// convention in the sibling backends); the bare default name is resolved by the
// dynamic loader through LD_LIBRARY_PATH, which run.sh points at the packaged
// lib/ directory and at this binary's own directory.
import (
"flag"
"fmt"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to listen on")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
libName := os.Getenv("FACEDETECT_LIBRARY")
if libName == "" {
libName = "libfacedetect.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("face-detect: dlopen %q: %w", libName, err))
}
// Bound 1:1 to facedetect_capi.h. char*/float* returns are registered as
// uintptr so the raw pointer can be freed via the matching capi free fn.
libFuncs := []LibFuncs{
{&CppAbiVersion, "facedetect_capi_abi_version"},
{&CppLoad, "facedetect_capi_load"},
{&CppFree, "facedetect_capi_free"},
{&CppLastError, "facedetect_capi_last_error"},
{&CppFreeString, "facedetect_capi_free_string"},
{&CppFreeVec, "facedetect_capi_free_vec"},
{&CppEmbedPath, "facedetect_capi_embed_path"},
{&CppEmbedRGB, "facedetect_capi_embed_rgb"},
{&CppDetectJSON, "facedetect_capi_detect_path_json"},
{&CppVerifyPaths, "facedetect_capi_verify_paths"},
{&CppAnalyzeJSON, "facedetect_capi_analyze_path_json"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
}
fmt.Fprintf(os.Stderr, "[face-detect] ABI=%d\n", CppAbiVersion())
flag.Parse()
if err := grpc.StartServer(*addr, &FaceDetect{}); err != nil {
panic(err)
}
}


@@ -0,0 +1,47 @@
package main
import (
"strconv"
"strings"
)
// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
// not set one. Matches the insightface buffalo_l ArcFace R50 default the Python
// face backend ships with so the two implementations agree on verdicts out of
// the box.
const defaultVerifyThreshold float32 = 0.35
// loadOptions holds the parsed model-level options for face-detect.
type loadOptions struct {
verifyThreshold float32
modelName string
}
func splitOption(o string) (key, value string, ok bool) {
i := strings.Index(o, ":")
if i < 0 {
return "", "", false
}
return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}
// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: verify_threshold 0.35; model_name is left empty here and
// Load falls back to the model file's base name.
func parseOptions(opts []string) loadOptions {
o := loadOptions{verifyThreshold: defaultVerifyThreshold}
for _, oo := range opts {
key, value, ok := splitOption(oo)
if !ok {
continue
}
switch key {
case "verify_threshold", "threshold":
if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
o.verifyThreshold = float32(f)
}
case "model_name":
o.modelName = value
}
}
return o
}
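`splitOption` splits on the first colon only. A value may therefore contain colons itself, as in a hypothetical `model_name:org:variant`, while an entry with no colon is skipped. Since Go 1.18, the standard library's `strings.Cut` has the same first-separator semantics. The sketch below is an illustrative equivalent written with it, not a change the backend makes. `splitOptionCut` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOptionCut behaves like splitOption in options.go: split at the first
// ':' only, trim both sides, and report ok=false when there is no ':'.
func splitOptionCut(o string) (key, value string, ok bool) {
	k, v, found := strings.Cut(o, ":")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	for _, in := range []string{"verify_threshold: 0.4", "model_name:org:variant", "no-colon"} {
		k, v, ok := splitOptionCut(in)
		fmt.Printf("%q -> key=%q value=%q ok=%v\n", in, k, v, ok)
	}
}
```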


@@ -0,0 +1,68 @@
#!/bin/bash
#
# Bundle the face-detect-grpc binary, libfacedetect.so, the core runtime libs
# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
# so the package is self-contained. Mirrors backend/go/voice-detect/package.sh;
# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
# is used instead of the host's.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/face-detect-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libfacedetect.so + any soname symlinks. purego.Dlopen resolves it via
# LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libfacedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libfacedetect.so not found in $CURDIR, run 'make' first" >&2
exit 1
}
# Detect architecture and copy the core runtime libs libfacedetect.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
# BUILD_TYPE so the backend can reach the GPU without the runtime base image
# shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

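The architecture detection in package.sh chooses a loader and a library directory by probing which glibc loader exists on the build host. Here is a sketch of that decision in Go, with the filesystem check injected so it can be tested. `planBundle` and `bundlePlan` are illustrative names, not part of the backend.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
)

// bundlePlan says where package.sh would read the loader and core libs from.
type bundlePlan struct {
	Loader string // copied to package/lib/ld.so
	LibDir string // source directory for the core shared libraries
}

// coreLibs is the list package.sh copies next to the loader.
var coreLibs = []string{
	"libc.so.6", "libgcc_s.so.1", "libstdc++.so.6", "libm.so.6",
	"libgomp.so.1", "libdl.so.2", "librt.so.1", "libpthread.so.0",
}

// planBundle follows the script's order: x86_64 loader, then aarch64 loader,
// then Darwin (nothing to bundle), else an error. exists is injected so the
// logic can be checked without a real filesystem (hypothetical helper).
func planBundle(exists func(string) bool, goos string) (*bundlePlan, error) {
	switch {
	case exists("/lib64/ld-linux-x86-64.so.2"):
		return &bundlePlan{Loader: "/lib64/ld-linux-x86-64.so.2", LibDir: "/lib/x86_64-linux-gnu"}, nil
	case exists("/lib/ld-linux-aarch64.so.1"):
		return &bundlePlan{Loader: "/lib/ld-linux-aarch64.so.1", LibDir: "/lib/aarch64-linux-gnu"}, nil
	case goos == "darwin":
		return nil, nil // dyld resolves system libraries; no loader is bundled
	default:
		return nil, errors.New("could not detect architecture")
	}
}

// copies lists (source, name under package/lib) pairs, loader first.
func (p *bundlePlan) copies() [][2]string {
	out := [][2]string{{p.Loader, "ld.so"}}
	for _, l := range coreLibs {
		out = append(out, [2]string{p.LibDir + "/" + l, l})
	}
	return out
}

func main() {
	fileExists := func(path string) bool {
		_, err := os.Stat(path)
		return err == nil
	}
	plan, err := planBundle(fileExists, runtime.GOOS)
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case plan == nil:
		fmt.Println("darwin: no loader bundled")
	default:
		for _, c := range plan.copies() {
			fmt.Printf("cp %s package/lib/%s\n", c[0], c[1])
		}
	}
}
```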
@@ -0,0 +1,16 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the voice-detect /
# whisper / parakeet backends' runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/face-detect-grpc" "$@"
fi
exec "$CURDIR/face-detect-grpc" "$@"

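run.sh prepends the bundled directories to `LD_LIBRARY_PATH` and, when `lib/ld.so` was packaged, runs the binary through that loader instead of executing it directly. A hedged Go sketch of the resulting environment value and argv follows; the helper names and the install directory in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// libraryPath mirrors "$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}": bundled
// directories first, then whatever the caller already had (possibly empty).
func libraryPath(curdir, existing string) string {
	return strings.Join([]string{filepath.Join(curdir, "lib"), curdir, existing}, ":")
}

// launchArgv returns the argv run.sh would exec. With a bundled loader the
// binary becomes the loader's first argument (hypothetical helper).
func launchArgv(curdir, binary string, hasLoader bool, args []string) []string {
	var argv []string
	if hasLoader {
		argv = append(argv, filepath.Join(curdir, "lib", "ld.so"))
	}
	argv = append(argv, filepath.Join(curdir, binary))
	return append(argv, args...)
}

func main() {
	curdir := "/opt/backends/face-detect" // hypothetical install directory
	_, err := os.Stat(filepath.Join(curdir, "lib", "ld.so"))
	fmt.Println("LD_LIBRARY_PATH=" + libraryPath(curdir, os.Getenv("LD_LIBRARY_PATH")))
	fmt.Println(strings.Join(launchArgv(curdir, "face-detect-grpc", err == nil, os.Args[1:]), " "))
}
```

Running the program through the packaged `ld.so` keeps the dynamic loader and the bundled libc from the same glibc build; relying on `LD_LIBRARY_PATH` alone would pair the host's loader (named in the binary's `PT_INTERP`) with the packaged libc.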
@@ -0,0 +1,15 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
cd "$CURDIR"
echo "Running face-detect backend tests..."
# The pure-Go parsing specs always run. The embed/detect/verify/analyze smoke
# specs run only when a model + image are provided via
# FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE; otherwise they
# auto-skip.
LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
echo "face-detect tests completed."

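test.sh relies on the smoke specs skipping themselves when their inputs are missing. One way to express that gate is sketched below. The helper is hypothetical, and inside a real spec the `ok == false` branch would call the framework's skip function (`t.Skip` with the `testing` package, or `Skip` in Ginkgo).

```go
package main

import (
	"fmt"
	"os"
)

// smokeInputs reports the model/image pair for the heavy smoke specs and
// whether both are set; callers skip those specs when ok is false.
// getenv is injected so the gate can be tested (hypothetical helper).
func smokeInputs(getenv func(string) string) (model, image string, ok bool) {
	model = getenv("FACEDETECT_BACKEND_TEST_MODEL")
	image = getenv("FACEDETECT_BACKEND_TEST_IMAGE")
	return model, image, model != "" && image != ""
}

func main() {
	if m, img, ok := smokeInputs(os.Getenv); ok {
		fmt.Println("running smoke specs with", m, img)
	} else {
		fmt.Println("smoke specs skipped: set FACEDETECT_BACKEND_TEST_MODEL and FACEDETECT_BACKEND_TEST_IMAGE")
	}
}
```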
@@ -1,6 +1,6 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
exec $CURDIR/local-store "$@"
exec "$CURDIR"/local-store "$@"

@@ -32,6 +32,8 @@ endif
ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DLOCALVQE_VULKAN=ON
else ifeq ($(OS),Darwin)
# Apple Silicon: CPU-only (no Metal upstream); built + published as an arm64
# image by CI (includeDarwin in .github/backend-matrix.yml) for macOS install.
CMAKE_ARGS+=-DGGML_METAL=OFF
endif

@@ -1,34 +1,34 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
# LocalVQE's runtime CPU-variant loader (ggml_backend_load_all) searches
# get_executable_path() and current_path() — the second one is what saves us
# when /proc/self/exe resolves to lib/ld.so under the bundled-loader path.
# So we cd into $CURDIR (where all the libggml-cpu-*.so files live) before
# So we cd into "$CURDIR" (where all the libggml-cpu-*.so files live) before
# exec'ing the binary.
cd "$CURDIR"
if [ "$(uname)" = "Darwin" ]; then
# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
export DYLD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$DYLD_LIBRARY_PATH
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.dylib
export DYLD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$DYLD_LIBRARY_PATH
LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.dylib
if [ ! -e "$LOCALVQE_LIBRARY" ]; then
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
fi
export LOCALVQE_LIBRARY
else
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
export LD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
fi
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LOCALVQE_LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/localvqe "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/localvqe "$@"
fi
echo "Using library: $LOCALVQE_LIBRARY"
exec $CURDIR/localvqe "$@"
exec "$CURDIR"/localvqe "$@"

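On macOS, run.sh prefers `liblocalvqe.dylib` and only falls back to the `.so` name if the dylib is missing; on Linux it always uses the `.so`. A small Go sketch of that selection; `localVQELibrary` and the install directory are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// localVQELibrary mirrors run.sh: on Darwin prefer liblocalvqe.dylib and fall
// back to liblocalvqe.so if the dylib is missing; elsewhere use the .so.
// exists is injected for testing (hypothetical helper).
func localVQELibrary(curdir, goos string, exists func(string) bool) string {
	if goos == "darwin" {
		dylib := filepath.Join(curdir, "liblocalvqe.dylib")
		if exists(dylib) {
			return dylib
		}
	}
	return filepath.Join(curdir, "liblocalvqe.so")
}

func main() {
	curdir := "/opt/backends/localvqe" // hypothetical install directory
	exists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	fmt.Println("LOCALVQE_LIBRARY=" + localVQELibrary(curdir, runtime.GOOS, exists))
}
```

The script also `cd`s into `$CURDIR` because, per its own comment, `ggml_backend_load_all` searches the current directory for the `libggml-cpu-*.so` variants, which is what still finds them when `/proc/self/exe` resolves to `lib/ld.so`.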
@@ -33,6 +33,8 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DLA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
if [ -e "$CURDIR"/liblocateanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx2.so ]; then
if [ -e "$CURDIR"/liblocateanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx512.so ]; then
if [ -e "$CURDIR"/liblocateanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LOCATEANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/locate-anything-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/locate-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/locate-anything-cpp "$@"
exec "$CURDIR"/locate-anything-cpp "$@"

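The Linux branch of this run.sh (and of the omnivoice, qwen3-tts, rfdetr, sam3, stablediffusion-ggml and vibevoice-cpp scripts, which share the pattern) walks an upgrade chain: fallback, then avx, avx2 and avx512. Each step is taken only if the CPU advertises the flag and that variant was packaged. Below is a Go sketch of the same logic; the function names and the library prefix are illustrative. Splitting `/proc/cpuinfo` into whitespace-separated tokens approximates `grep "\savx\s"`, which also matches whole flags only (the sketch additionally accepts a flag at the very end of a line, which the grep pattern would not).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cpuFlags collects whitespace-separated tokens from /proc/cpuinfo text, so
// "avx" matches only the whole flag, never "avx2" or "avx512f".
func cpuFlags(cpuinfo string) map[string]bool {
	flags := map[string]bool{}
	for _, tok := range strings.Fields(cpuinfo) {
		flags[tok] = true
	}
	return flags
}

// selectVariant mirrors the scripts: start from the fallback build and move to
// avx, avx2, avx512 in turn, each only if the CPU reports the flag AND the
// variant exists. Later matches overwrite earlier ones (hypothetical helper).
func selectVariant(prefix string, flags map[string]bool, exists func(string) bool) string {
	lib := prefix + "-fallback.so"
	for _, v := range []struct{ flag, suffix string }{
		{"avx", "-avx.so"}, {"avx2", "-avx2.so"}, {"avx512f", "-avx512.so"},
	} {
		if flags[v.flag] && exists(prefix+v.suffix) {
			lib = prefix + v.suffix
		}
	}
	return lib
}

func main() {
	info, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		info = nil // not Linux: no flags, so only the fallback can be chosen
	}
	exists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	// Hypothetical install prefix.
	fmt.Println(selectVariant("/opt/backends/locate-anything-cpp/liblocateanythingcpp", cpuFlags(string(info)), exists))
}
```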
@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
if [ -e "$CURDIR"/libgomnivoicecpp-avx.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx2.so ]; then
if [ -e "$CURDIR"/libgomnivoicecpp-avx2.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx512.so ]; then
if [ -e "$CURDIR"/libgomnivoicecpp-avx512.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/omnivoice-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/omnivoice-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/omnivoice-cpp "$@"
exec "$CURDIR"/omnivoice-cpp "$@"

@@ -1,13 +1,30 @@
GOCMD?=go
GO_TAGS?=
# The opus shim is a small C wrapper around libopus' variadic
# opus_encoder_ctl (see csrc/opus_shim.c). It is built as a shared library
# and dlopen'd at runtime by the Go backend (codec.go). The extension is
# OS-specific: Linux uses .so, macOS uses .dylib. OS is exported by the root
# Makefile (`export OS := $(shell uname -s)`).
SHIM_EXT=so
OPUS_CFLAGS := $(shell pkg-config --cflags opus)
OPUS_LIBS := $(shell pkg-config --libs opus)
SHIM_LDFLAGS := $(OPUS_LIBS)
libopusshim.so: csrc/opus_shim.c
$(CC) -shared -fPIC -o $@ $< $(OPUS_CFLAGS) $(OPUS_LIBS)
ifeq ($(OS),Darwin)
SHIM_EXT=dylib
# Resolve libopus symbols lazily from the already globally-loaded
# libopus (codec.go dlopens it RTLD_GLOBAL before the shim) rather than
# recording an absolute Homebrew path in the dylib. This keeps the
# packaged shim relocatable on machines that have no Homebrew.
SHIM_LDFLAGS := -undefined dynamic_lookup
endif
opus: libopusshim.so
libopusshim.$(SHIM_EXT): csrc/opus_shim.c
$(CC) -shared -fPIC -o $@ $< $(OPUS_CFLAGS) $(SHIM_LDFLAGS)
opus: libopusshim.$(SHIM_EXT)
$(GOCMD) build -tags "$(GO_TAGS)" -o opus ./
package: opus
@@ -16,4 +33,7 @@ package: opus
build: package
clean:
rm -f opus libopusshim.so
rm -f opus libopusshim.$(SHIM_EXT)
rm -rf package
.PHONY: build package clean

@@ -8,13 +8,23 @@ mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/opus $CURDIR/package/
cp -avf $CURDIR/run.sh $CURDIR/package/
# Copy the opus shim library
cp -avf $CURDIR/libopusshim.so $CURDIR/package/lib/
# The shim extension is OS-specific (.so on Linux, .dylib on macOS).
SHIM_EXT=so
if [ "$(uname)" = "Darwin" ]; then
SHIM_EXT=dylib
fi
# Copy system libopus
# Copy the opus shim library
cp -avf $CURDIR/libopusshim.$SHIM_EXT $CURDIR/package/lib/
# Copy system libopus so the backend is self-contained: the runtime base
# image has neither libopus-dev (Linux) nor Homebrew (macOS), so codec.go's
# dlopen would otherwise fail. Both name patterns are attempted; only the
# host's matching one exists.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists opus; then
LIBOPUS_DIR=$(pkg-config --variable=libdir opus)
cp -avfL $LIBOPUS_DIR/libopus.so* $CURDIR/package/lib/ 2>/dev/null || true
cp -avf $LIBOPUS_DIR/libopus.so* $CURDIR/package/lib/ 2>/dev/null || true
cp -avf $LIBOPUS_DIR/libopus*.dylib $CURDIR/package/lib/ 2>/dev/null || true
fi
# Detect architecture and copy appropriate libraries
@@ -38,6 +48,8 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin — system libraries linked dynamically, no bundled loader needed"
else
echo "Warning: Could not detect architecture for system library bundling"
fi

@@ -1,15 +1,20 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY=$CURDIR/lib/libopusshim.so
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/opus "$@"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY="$CURDIR"/lib/libopusshim.dylib
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY="$CURDIR"/lib/libopusshim.so
fi
exec $CURDIR/opus "$@"
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
exec "$CURDIR"/lib/ld.so "$CURDIR"/opus "$@"
fi
exec "$CURDIR"/opus "$@"

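The opus run.sh now branches on the OS for two things: the loader's search-path variable and the shim's file extension. A Go sketch of that mapping; `runtimeEnv` is a hypothetical name. The same Darwin/Linux split of `DYLD_LIBRARY_PATH` versus `LD_LIBRARY_PATH` appears in the piper, silero-vad, sherpa-onnx and supertonic scripts.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// runtimeEnv returns the search-path variable and shim path run.sh exports:
// dyld reads DYLD_LIBRARY_PATH and the shim is a .dylib on macOS; elsewhere
// the glibc loader reads LD_LIBRARY_PATH and the shim is a .so.
func runtimeEnv(goos, curdir string) (pathVar, shim string) {
	lib := filepath.Join(curdir, "lib")
	if goos == "darwin" {
		return "DYLD_LIBRARY_PATH", filepath.Join(lib, "libopusshim.dylib")
	}
	return "LD_LIBRARY_PATH", filepath.Join(lib, "libopusshim.so")
}

func main() {
	curdir := "/opt/backends/opus" // hypothetical install directory
	v, shim := runtimeEnv(runtime.GOOS, curdir)
	fmt.Printf("%s=%s/lib:...\nOPUS_SHIM_LIBRARY=%s\n", v, curdir, shim)
}
```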
@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
# Upstream pin lives below as PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
# so .github/bump_deps.sh can find and update it, matching the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go

@@ -4,10 +4,10 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
fi

@@ -16,7 +16,15 @@ cp -rfv $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/sources/go-piper/piper-phonemize/pi/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
if [ "$(uname)" = "Darwin" ]; then
# macOS has no glibc loader to bundle. The piper binary links its bundled
# libs (libucd, libespeak-ng, libpiper_phonemize, libonnxruntime) via
# @rpath but ships with no LC_RPATH, so dyld aborts at launch with
# "Library not loaded: @rpath/libucd.dylib ... no LC_RPATH's found".
# Add an @loader_path/lib rpath so @rpath resolves to package/lib/.
echo "Detected macOS; adding @loader_path/lib rpath so bundled libs resolve via @rpath..."
install_name_tool -add_rpath @loader_path/lib "$CURDIR/package/piper"
elif [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so

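The piper fix works because dyld expands an `@rpath/...` install name by trying each `LC_RPATH` entry of the loading image, with `@loader_path` replaced by that image's directory. The Go function below is a deliberately simplified model of that lookup; real dyld has further search rules, such as `@executable_path` and environment-variable overrides. It shows why zero `LC_RPATH` entries fail and why adding `@loader_path/lib` resolves to `package/lib/`. All names are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveRpath is a simplified model (not dyld itself) of "@rpath/..."
// expansion: each LC_RPATH entry is tried in order, with "@loader_path"
// replaced by the directory of the loading image.
func resolveRpath(installName, loaderPath string, rpaths []string, exists func(string) bool) (string, error) {
	if !strings.HasPrefix(installName, "@rpath/") {
		return installName, nil // not rpath-relative; used as-is
	}
	rest := strings.TrimPrefix(installName, "@rpath/")
	loaderDir := filepath.Dir(loaderPath)
	for _, rp := range rpaths {
		dir := strings.Replace(rp, "@loader_path", loaderDir, 1)
		if cand := filepath.Join(dir, rest); exists(cand) {
			return cand, nil
		}
	}
	return "", fmt.Errorf("library not loaded: %s (tried %d LC_RPATH entries)", installName, len(rpaths))
}

func main() {
	// Fake package layout: only <package>/lib/libucd.dylib exists.
	files := map[string]bool{"/opt/piper/lib/libucd.dylib": true}
	exists := func(p string) bool { return files[p] }
	for _, rpaths := range [][]string{nil, {"@loader_path/lib"}} {
		path, err := resolveRpath("@rpath/libucd.dylib", "/opt/piper/piper", rpaths, exists)
		fmt.Println(rpaths, path, err)
	}
}
```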
@@ -1,15 +1,20 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
export ESPEAK_NG_DATA=$CURDIR/espeak-ng-data
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ESPEAK_NG_DATA="$CURDIR"/espeak-ng-data
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/piper "$@"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
exec $CURDIR/piper "$@"
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
exec "$CURDIR"/lib/ld.so "$CURDIR"/piper "$@"
fi
exec "$CURDIR"/piper "$@"

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then
if [ -e "$CURDIR"/libgoqwen3ttscpp-avx.so ]; then
LIBRARY="$CURDIR/libgoqwen3ttscpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgoqwen3ttscpp-avx2.so ]; then
if [ -e "$CURDIR"/libgoqwen3ttscpp-avx2.so ]; then
LIBRARY="$CURDIR/libgoqwen3ttscpp-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgoqwen3ttscpp-avx512.so ]; then
if [ -e "$CURDIR"/libgoqwen3ttscpp-avx512.so ]; then
LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export QWEN3TTS_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/qwen3-tts-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/qwen3-tts-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/qwen3-tts-cpp "$@"
exec "$CURDIR"/qwen3-tts-cpp "$@"

@@ -34,6 +34,8 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DRFDETR_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/librfdetrcpp-avx.so ]; then
if [ -e "$CURDIR"/librfdetrcpp-avx.so ]; then
LIBRARY="$CURDIR/librfdetrcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/librfdetrcpp-avx2.so ]; then
if [ -e "$CURDIR"/librfdetrcpp-avx2.so ]; then
LIBRARY="$CURDIR/librfdetrcpp-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/librfdetrcpp-avx512.so ]; then
if [ -e "$CURDIR"/librfdetrcpp-avx512.so ]; then
LIBRARY="$CURDIR/librfdetrcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export RFDETR_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/rfdetr-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/rfdetr-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/rfdetr-cpp "$@"
exec "$CURDIR"/rfdetr-cpp "$@"

@@ -31,6 +31,8 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,20 +15,20 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgosam3-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosam3-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosam3-avx.so ]; then
if [ -e "$CURDIR"/libgosam3-avx.so ]; then
LIBRARY="$CURDIR/libgosam3-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgosam3-avx2.so ]; then
if [ -e "$CURDIR"/libgosam3-avx2.so ]; then
LIBRARY="$CURDIR/libgosam3-avx2.so"
fi
fi
@@ -36,22 +36,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgosam3-avx512.so ]; then
if [ -e "$CURDIR"/libgosam3-avx512.so ]; then
LIBRARY="$CURDIR/libgosam3-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export SAM3_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/sam3-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/sam3-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/sam3-cpp "$@"
exec "$CURDIR"/sam3-cpp "$@"

@@ -1,19 +1,19 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export SHERPA_SHIM_LIBRARY=$CURDIR/lib/libsherpa-shim.dylib
export SHERPA_ONNX_LIBRARY=$CURDIR/lib/libsherpa-onnx-c-api.dylib
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
export SHERPA_SHIM_LIBRARY="$CURDIR"/lib/libsherpa-shim.dylib
export SHERPA_ONNX_LIBRARY="$CURDIR"/lib/libsherpa-onnx-c-api.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/sherpa-onnx "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/sherpa-onnx "$@"
fi
exec $CURDIR/sherpa-onnx "$@"
exec "$CURDIR"/sherpa-onnx "$@"

@@ -15,7 +15,14 @@ cp -avf $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
if [ "$(uname)" = "Darwin" ]; then
# macOS has no glibc loader to bundle. silero-vad links its bundled
# libonnxruntime via @rpath but ships with no LC_RPATH, so dyld can't find
# it at runtime. Add an @loader_path/lib rpath so @rpath resolves to
# package/lib/ (matching the piper darwin fix, #10525).
echo "Detected macOS; adding @loader_path/lib rpath so bundled libs resolve via @rpath..."
install_name_tool -add_rpath @loader_path/lib "$CURDIR/package/silero-vad"
elif [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so

@@ -1,14 +1,18 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/silero-vad "$@"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
exec $CURDIR/silero-vad "$@"
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
exec "$CURDIR"/lib/ld.so "$CURDIR"/silero-vad "$@"
fi
exec "$CURDIR"/silero-vad "$@"

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f
STABLEDIFFUSION_GGML_VERSION?=9956436c925a367daeab097598b1ea1f32d3503f
CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -20,20 +20,20 @@ if [ "$(uname)" = "Darwin" ]; then
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgosd-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosd-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosd-avx.so ]; then
if [ -e "$CURDIR"/libgosd-avx.so ]; then
LIBRARY="$CURDIR/libgosd-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgosd-avx2.so ]; then
if [ -e "$CURDIR"/libgosd-avx2.so ]; then
LIBRARY="$CURDIR/libgosd-avx2.so"
fi
fi
@@ -41,22 +41,22 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgosd-avx512.so ]; then
if [ -e "$CURDIR"/libgosd-avx512.so ]; then
LIBRARY="$CURDIR/libgosd-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export SD_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/stablediffusion-ggml "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/stablediffusion-ggml "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/stablediffusion-ggml "$@"
exec "$CURDIR"/stablediffusion-ggml "$@"

@@ -1,21 +1,21 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
# macOS uses dyld: there is no ld.so loader, and the search path env
# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH="$CURDIR"/lib/libonnxruntime.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH="$CURDIR"/lib/libonnxruntime.so
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/supertonic "$@"
fi
fi
exec $CURDIR/supertonic "$@"
exec "$CURDIR"/supertonic "$@"

@@ -1,7 +1,7 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -14,41 +14,41 @@ fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgovibevoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgovibevoicecpp-avx.so ]; then
if [ -e "$CURDIR"/libgovibevoicecpp-avx.so ]; then
LIBRARY="$CURDIR/libgovibevoicecpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgovibevoicecpp-avx2.so ]; then
if [ -e "$CURDIR"/libgovibevoicecpp-avx2.so ]; then
LIBRARY="$CURDIR/libgovibevoicecpp-avx2.so"
fi
fi
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgovibevoicecpp-avx512.so ]; then
if [ -e "$CURDIR"/libgovibevoicecpp-avx512.so ]; then
LIBRARY="$CURDIR/libgovibevoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export VIBEVOICECPP_LIBRARY=$LIBRARY
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/vibevoice-cpp "$@"
exec "$CURDIR"/lib/ld.so "$CURDIR"/vibevoice-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/vibevoice-cpp "$@"
exec "$CURDIR"/vibevoice-cpp "$@"

backend/go/voice-detect/.gitignore

@@ -0,0 +1,18 @@
# Fetched upstream sources
sources/
# CMake build directories
build*/
# Build artifacts staged in-tree by the Makefile (copied from sources/) or
# symlinked for local dev; the real sources live upstream in voice-detect.cpp.
*.so
*.so.*
voicedetect_capi.h
compile_commands.json
# Compiled backend binary
voice-detect-grpc
# Packaging output
package/

@@ -0,0 +1,107 @@
# voice-detect backend Makefile.
#
# Upstream pin lives below as VOICEDETECT_VERSION?=3d51077... (.github/bump_deps.sh
# can find and update it - matches the parakeet.cpp / whisper.cpp / ds4 convention).
#
# Local dev shortcut: if you already have an out-of-tree voice-detect.cpp build,
# symlink the .so + header into this directory and skip the clone/cmake steps:
#
# ln -sf /path/to/voice-detect.cpp/build-shared/libvoicedetect.so .
# ln -sf /path/to/voice-detect.cpp/include/voicedetect_capi.h .
# go build -o voice-detect-grpc .
#
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.
VOICEDETECT_VERSION?=3d510772357538c5182808ac7de2278b84824e24
VOICEDETECT_REPO?=https://github.com/mudler/voice-detect.cpp
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
BUILD_TYPE?=
NATIVE?=false
# Resolve the target arch. The backend matrix / Docker build pass TARGETARCH
# (amd64|arm64); fall back to uname -m (aarch64|x86_64) for a local build.
RECON_ARCH?=$(or $(TARGETARCH),$(shell uname -m))
# Build ggml statically into libvoicedetect.so (PIC) so the shared lib is
# self-contained: dlopen needs no libggml*.so alongside it, only system libs
# (libstdc++/libgomp/libc) that the runtime image already provides.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DVOICEDETECT_SHARED=ON -DVOICEDETECT_BUILD_CLI=OFF -DVOICEDETECT_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# voice-detect.cpp gates its GGML backends behind VOICEDETECT_GGML_* options and
# does set(GGML_CUDA ${VOICEDETECT_GGML_CUDA} CACHE BOOL "" FORCE), so a bare
# -DGGML_CUDA=ON is overwritten back to OFF. Forward the VOICEDETECT_GGML_*
# options instead. (openblas is not gated, so -DGGML_BLAS passes through.)
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDA=ON
# Opt-in cuDNN implicit-GEMM convolution path (avoids im2col on the GPU and
# reaches parity with torch + cuDNN). Only the arm64 + CUDA 13 image
# (GB10/Jetson/L4T) ships libcudnn9 and its -dev headers, so cuDNN is gated to
# that variant; x86 CUDA images carry no cuDNN, so enabling it there fails at link time.
ifeq ($(CUDA_MAJOR_VERSION),13)
ifneq (,$(filter arm64 aarch64,$(RECON_ARCH)))
CMAKE_ARGS+=-DVOICEDETECT_GGML_CUDNN=ON
endif
endif
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DVOICEDETECT_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DVOICEDETECT_GGML_VULKAN=ON
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DVOICEDETECT_GGML_METAL=ON
endif
.PHONY: voice-detect-grpc package build clean purge test all
all: voice-detect-grpc
# Clone the upstream voice-detect.cpp source at the pinned commit. Directory acts
# as the target so make only re-clones when missing. After a VOICEDETECT_VERSION
# bump, run 'make purge && make' to refetch.
sources/voice-detect.cpp:
mkdir -p sources/voice-detect.cpp
cd sources/voice-detect.cpp && \
git init -q && \
git remote add origin $(VOICEDETECT_REPO) && \
git fetch --depth 1 origin $(VOICEDETECT_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1 --single-branch
# Build the shared lib + header out-of-tree, then stage them next to the Go
# sources so purego.Dlopen("libvoicedetect.so") and the cgo-less build both pick
# them up.
libvoicedetect.so: sources/voice-detect.cpp
	cmake -B sources/voice-detect.cpp/build-shared -S sources/voice-detect.cpp $(CMAKE_ARGS)
	cmake --build sources/voice-detect.cpp/build-shared --config Release -j$(JOBS) --target voicedetect
	cp -fv sources/voice-detect.cpp/build-shared/libvoicedetect.so* ./ 2>/dev/null || true
	cp -fv sources/voice-detect.cpp/include/voicedetect_capi.h ./
voice-detect-grpc: libvoicedetect.so main.go govoicedetect.go options.go
	CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voice-detect-grpc .
package: voice-detect-grpc
	bash package.sh
build: package
# Test target. The embed/verify smoke specs are gated on
# VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV; without them the
# heavy specs auto-skip and only the pure-Go parsing specs run.
test:
	LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
clean: purge
	rm -rf libvoicedetect.so* voicedetect_capi.h package voice-detect-grpc
purge:
	rm -rf sources/voice-detect.cpp

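The BUILD_TYPE chain above is easier to audit as a lookup. The sketch below is a hypothetical Go rendering of the same branches. `cmakeBackendArgs` and `joinArgs` are illustrative names and are not part of the build. The sketch shows that cuDNN is added only for the CUDA 13 arm64 variant and that OpenBLAS is the only backend passed as a plain `GGML_*` flag. The arch values assume the same `arm64`/`aarch64` spellings the Makefile filters on.

```go
package main

import "strings"

// cmakeBackendArgs is a hypothetical Go rendering of the Makefile's BUILD_TYPE
// conditionals: it returns only the backend-specific CMake flags, leaving out
// the base CMAKE_ARGS and the NATIVE=false tweak.
func cmakeBackendArgs(buildType, cudaMajor, arch string) []string {
	switch buildType {
	case "cublas":
		args := []string{"-DVOICEDETECT_GGML_CUDA=ON"}
		// cuDNN is added only for the CUDA 13 arm64 image, the one variant
		// that ships libcudnn9 and its headers.
		if cudaMajor == "13" && (arch == "arm64" || arch == "aarch64") {
			args = append(args, "-DVOICEDETECT_GGML_CUDNN=ON")
		}
		return args
	case "openblas":
		// BLAS is not gated by voice-detect.cpp, so plain GGML_* flags work.
		return []string{"-DGGML_BLAS=ON", "-DGGML_BLAS_VENDOR=OpenBLAS"}
	case "hipblas":
		return []string{"-DVOICEDETECT_GGML_HIP=ON"}
	case "vulkan":
		return []string{"-DVOICEDETECT_GGML_VULKAN=ON"}
	case "metal":
		return []string{"-DVOICEDETECT_GGML_METAL=ON"}
	default:
		// CPU (or any unrecognized BUILD_TYPE): no backend flags.
		return nil
	}
}

// joinArgs renders flags as they would appear on the cmake command line.
func joinArgs(args []string) string {
	return strings.Join(args, " ")
}
```

For example, `joinArgs(cmakeBackendArgs("cublas", "13", "x86_64"))` yields only the CUDA flag. This matches the Makefile's intent of keeping cuDNN off the x86 images.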
@@ -0,0 +1,273 @@
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
	"unsafe"

	"github.com/mudler/LocalAI/pkg/grpc/base"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	"github.com/mudler/xlog"
)

// purego-bound entry points from libvoicedetect.so. Names match
// voicedetect_capi.h exactly so a `nm libvoicedetect.so | grep voicedetect_capi`
// is enough to spot drift.
//
// The opaque ctx and the malloc'd char*/float* return values are declared as
// uintptr so we get the raw pointer back and can release it via the matching
// capi free function. purego's native string/[]float32 returns would copy and
// forget the original pointer, leaking the C-owned buffer on every call.
var (
	CppAbiVersion  func() int32
	CppLoad        func(ggufPath string) uintptr
	CppFree        func(ctx uintptr)
	CppLastError   func(ctx uintptr) string
	CppFreeString  func(s uintptr)
	CppFreeVec     func(v uintptr)
	CppEmbedPath   func(ctx uintptr, wavPath string, outVec, outDim unsafe.Pointer) int32
	CppEmbedPCM    func(ctx uintptr, pcm []float32, nSamples, sampleRate int32, outVec, outDim unsafe.Pointer) int32
	CppVerifyPaths func(ctx uintptr, a, b string, threshold float32, outDistance, outVerified unsafe.Pointer) int32
	CppAnalyzeJSON func(ctx uintptr, wavPath string) uintptr
)

// VoiceDetect implements the speaker-recognition voice subset of the Backend
// gRPC service over libvoicedetect.so. The C side keeps a single loaded model
// plus a per-ctx last-error buffer and is not reentrant, so base.SingleThread
// serializes every call.
type VoiceDetect struct {
	base.SingleThread
	opts   loadOptions
	ctxPtr uintptr
}

func (v *VoiceDetect) Load(opts *pb.ModelOptions) error {
	// Resolve a relative ModelFile against ModelPath. When only ModelPath is
	// set it is used as-is; joining it with itself would double the path.
	model := opts.ModelFile
	if model == "" {
		model = opts.ModelPath
	} else if !filepath.IsAbs(model) && opts.ModelPath != "" {
		model = filepath.Join(opts.ModelPath, model)
	}
	if model == "" {
		return errors.New("voice-detect: ModelFile is required")
	}
	v.opts = parseOptions(opts.Options)
	if v.opts.modelName == "" {
		v.opts.modelName = filepath.Base(model)
	}
	// Propagate LocalAI's per-model thread budget to the engine. LocalAI spawns
	// one backend process per model and serves requests concurrently, so the
	// engine's own min(hardware_concurrency, 8) default can oversubscribe cores.
	// VOICEDETECT_THREADS is read by the engine at backend construction, so it
	// must be set before the capi load. A non-positive Threads means "unset":
	// leave the env alone so the engine keeps its sane default.
	threads := opts.Threads
	if threads > 0 {
		if err := os.Setenv("VOICEDETECT_THREADS", strconv.Itoa(int(threads))); err != nil {
			return fmt.Errorf("voice-detect: set VOICEDETECT_THREADS: %w", err)
		}
		xlog.Info("voice-detect: applying LocalAI thread budget", "threads", threads)
	}
	xlog.Info("voice-detect: loading model", "model", model,
		"verify_threshold", v.opts.verifyThreshold, "abi", CppAbiVersion())
	ctx := CppLoad(model)
	if ctx == 0 {
		// The last-error buffer lives on the ctx that was never returned, so
		// surface the path the operator tried to load instead.
		return fmt.Errorf("voice-detect: voicedetect_capi_load failed for %q", model)
	}
	v.ctxPtr = ctx
	return nil
}

// VoiceEmbed returns the L2-normalized speaker embedding for an audio clip.
// The request carries a filesystem PATH; the HTTP layer materializes
// base64/URL/data-URI inputs to a temp file before the gRPC call.
func (v *VoiceDetect) VoiceEmbed(req *pb.VoiceEmbedRequest) (pb.VoiceEmbedResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio == "" {
		return pb.VoiceEmbedResponse{}, errors.New("voice-detect: audio path is required")
	}
	emb, err := v.embedPath(req.Audio)
	if err != nil {
		return pb.VoiceEmbedResponse{}, err
	}
	return pb.VoiceEmbedResponse{Embedding: emb, Model: v.opts.modelName}, nil
}

func (v *VoiceDetect) embedPath(path string) ([]float32, error) {
	var vec uintptr
	var dim int32
	rc := CppEmbedPath(v.ctxPtr, path, unsafe.Pointer(&vec), unsafe.Pointer(&dim))
	if rc != 0 || vec == 0 || dim <= 0 {
		return nil, v.lastErr("embed", path)
	}
	defer CppFreeVec(vec)
	// Copy out of the C-owned malloc'd buffer before freeing it. The
	// uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
	// a C heap pointer from Go-managed memory; safe here, the GC neither tracks
	// nor moves this buffer and we copy immediately.
	src := unsafe.Slice((*float32)(unsafe.Pointer(vec)), int(dim)) //nolint:govet // C-owned malloc'd vector, copied out before free
	out := make([]float32, int(dim))
	copy(out, src)
	return out, nil
}

// VoiceVerify embeds two clips and reports whether they are the same speaker by
// cosine distance against a threshold. A request threshold <= 0 falls back to
// the model-configured default (verify_threshold option, 0.25 if unset).
func (v *VoiceDetect) VoiceVerify(req *pb.VoiceVerifyRequest) (pb.VoiceVerifyResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio1 == "" || req.Audio2 == "" {
		return pb.VoiceVerifyResponse{}, errors.New("voice-detect: audio1 and audio2 are required")
	}
	threshold := req.Threshold
	if threshold <= 0 {
		threshold = v.opts.verifyThreshold
	}
	started := time.Now()
	var distance float32
	var verified int32
	rc := CppVerifyPaths(v.ctxPtr, req.Audio1, req.Audio2, threshold,
		unsafe.Pointer(&distance), unsafe.Pointer(&verified))
	if rc != 0 {
		return pb.VoiceVerifyResponse{}, v.lastErr("verify", req.Audio1+","+req.Audio2)
	}
	elapsedMs := float32(time.Since(started).Seconds() * 1000.0)
	// Confidence decays linearly from 100 at distance 0 to 0 at the threshold,
	// matching the Python speaker-recognition backend's reporting.
	confidence := float32(0)
	if threshold > 0 {
		confidence = float32(math.Max(0, math.Min(100, (1.0-float64(distance)/float64(threshold))*100.0)))
	}
	return pb.VoiceVerifyResponse{
		Verified:         verified != 0,
		Distance:         distance,
		Threshold:        threshold,
		Confidence:       confidence,
		Model:            v.opts.modelName,
		ProcessingTimeMs: elapsedMs,
	}, nil
}

// VoiceAnalyze runs the age/gender/emotion heads on a single clip. The C-API
// always evaluates every supported head, so the request's actions filter is
// advisory and the full analysis is returned as a single segment (the engine
// does not produce time-bounded segments).
func (v *VoiceDetect) VoiceAnalyze(req *pb.VoiceAnalyzeRequest) (pb.VoiceAnalyzeResponse, error) {
	if v.ctxPtr == 0 {
		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: model not loaded")
	}
	if req.Audio == "" {
		return pb.VoiceAnalyzeResponse{}, errors.New("voice-detect: audio path is required")
	}
	ptr := CppAnalyzeJSON(v.ctxPtr, req.Audio)
	if ptr == 0 {
		return pb.VoiceAnalyzeResponse{}, v.lastErr("analyze", req.Audio)
	}
	defer CppFreeString(ptr)
	seg, err := parseAnalyzeJSON(goStringFromCPtr(ptr))
	if err != nil {
		return pb.VoiceAnalyzeResponse{}, fmt.Errorf("voice-detect: analyze JSON for %q: %w", req.Audio, err)
	}
	return pb.VoiceAnalyzeResponse{Segments: []*pb.VoiceAnalysis{seg}}, nil
}

// analyzeJSON mirrors the document returned by voicedetect_capi_analyze_path_json:
//
//	{"age":42.0,
//	 "gender":{"label":"female","female":0.88,"male":0.12},
//	 "emotion":{"label":"neutral","scores":{"neutral":0.7, ...}}}
//
// gender is a mixed object (a "label" string plus per-class float scores), so
// it is decoded into raw messages and split in parseAnalyzeJSON.
type analyzeJSON struct {
	Age     float32                    `json:"age"`
	Gender  map[string]json.RawMessage `json:"gender"`
	Emotion struct {
		Label  string             `json:"label"`
		Scores map[string]float32 `json:"scores"`
	} `json:"emotion"`
}

// parseAnalyzeJSON maps the engine's analyze document onto a VoiceAnalysis.
// start/end stay 0: the model emits a single whole-utterance result, not
// time-bounded segments.
func parseAnalyzeJSON(doc string) (*pb.VoiceAnalysis, error) {
	var a analyzeJSON
	if err := json.Unmarshal([]byte(doc), &a); err != nil {
		return nil, err
	}
	seg := &pb.VoiceAnalysis{
		Age:             a.Age,
		DominantEmotion: a.Emotion.Label,
		Emotion:         a.Emotion.Scores,
	}
	if len(a.Gender) > 0 {
		gender := make(map[string]float32, len(a.Gender))
		for k, raw := range a.Gender {
			if k == "label" {
				_ = json.Unmarshal(raw, &seg.DominantGender)
				continue
			}
			var score float32
			if err := json.Unmarshal(raw, &score); err == nil {
				gender[k] = score
			}
		}
		seg.Gender = gender
	}
	return seg, nil
}

// lastErr wraps the C-API's per-ctx last-error buffer into a Go error.
func (v *VoiceDetect) lastErr(op, subject string) error {
	msg := strings.TrimSpace(CppLastError(v.ctxPtr))
	if msg == "" {
		msg = "no error detail"
	}
	return fmt.Errorf("voice-detect: %s failed for %q: %s", op, subject, msg)
}

// goStringFromCPtr copies a NUL-terminated C string into Go memory. cptr is a
// malloc'd buffer the caller owns; release it via CppFreeString after the copy.
//
// The uintptr->Pointer conversion trips vet's unsafeptr check, which can't tell
// a C heap pointer from Go-managed memory. Safe here: the GC neither tracks nor
// moves the buffer and we dereference it immediately to copy the bytes out.
func goStringFromCPtr(cptr uintptr) string {
	if cptr == 0 {
		return ""
	}
	p := unsafe.Pointer(cptr) //nolint:govet // C-owned malloc'd buffer, not Go-GC memory (see doc above)
	n := 0
	for *(*byte)(unsafe.Add(p, n)) != 0 {
		n++
	}
	return string(unsafe.Slice((*byte)(p), n))
}

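VoiceVerify delegates the distance itself to `voicedetect_capi_verify_paths` and only derives the confidence in Go. The sketch below covers both halves. It assumes the engine's cosine distance is 1 minus the cosine similarity, which for L2-normalized embeddings is 1 minus the dot product. `cosineDistance` and `verifyConfidence` are hypothetical helpers and are not part of the backend.

```go
package main

import "math"

// cosineDistance returns 1 - cos(a, b). It normalizes explicitly, so it also
// works for vectors that are not already L2-normalized. Mismatched lengths or
// zero vectors return 2 (the largest cosine distance), a choice made for this
// sketch only.
func cosineDistance(a, b []float32) float32 {
	if len(a) != len(b) || len(a) == 0 {
		return 2
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 2
	}
	return float32(1 - dot/(math.Sqrt(na)*math.Sqrt(nb)))
}

// verifyConfidence mirrors the backend's linear mapping: 100 at distance 0,
// 0 at or beyond the threshold, clamped to [0, 100]. A non-positive threshold
// yields 0, as in VoiceVerify.
func verifyConfidence(distance, threshold float32) float32 {
	if threshold <= 0 {
		return 0
	}
	c := (1 - float64(distance)/float64(threshold)) * 100
	return float32(math.Max(0, math.Min(100, c)))
}
```

With the default threshold of 0.25, a distance of 0.125 maps to a confidence of 50. Any distance at or above 0.25 maps to 0.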
@@ -0,0 +1,144 @@
package main

import (
	"os"
	"sync"
	"testing"

	"github.com/ebitengine/purego"
	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestVoiceDetect(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "voice-detect Backend Suite")
}

var (
	libLoadOnce sync.Once
	libLoadErr  error
)

// ensureLibLoaded mirrors main.go's bootstrap so a Go test can drive the C-API
// bridge without spinning up the gRPC server. If libvoicedetect.so cannot be
// loaded (it must be resolvable via LD_LIBRARY_PATH or a symlink in ./), the
// error is recorded and returned, and the smoke specs skip themselves.
func ensureLibLoaded() error {
	libLoadOnce.Do(func() {
		libName := os.Getenv("VOICEDETECT_LIBRARY")
		if libName == "" {
			libName = "libvoicedetect.so"
		}
		lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
		if err != nil {
			libLoadErr = err
			return
		}
		purego.RegisterLibFunc(&CppAbiVersion, lib, "voicedetect_capi_abi_version")
		purego.RegisterLibFunc(&CppLoad, lib, "voicedetect_capi_load")
		purego.RegisterLibFunc(&CppFree, lib, "voicedetect_capi_free")
		purego.RegisterLibFunc(&CppLastError, lib, "voicedetect_capi_last_error")
		purego.RegisterLibFunc(&CppFreeString, lib, "voicedetect_capi_free_string")
		purego.RegisterLibFunc(&CppFreeVec, lib, "voicedetect_capi_free_vec")
		purego.RegisterLibFunc(&CppEmbedPath, lib, "voicedetect_capi_embed_path")
		purego.RegisterLibFunc(&CppEmbedPCM, lib, "voicedetect_capi_embed_pcm")
		purego.RegisterLibFunc(&CppVerifyPaths, lib, "voicedetect_capi_verify_paths")
		purego.RegisterLibFunc(&CppAnalyzeJSON, lib, "voicedetect_capi_analyze_path_json")
	})
	return libLoadErr
}

var _ = Describe("parseOptions", func() {
	It("defaults verify_threshold to 0.25", func() {
		o := parseOptions(nil)
		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
		Expect(o.modelName).To(Equal(""))
	})
	It("parses verify_threshold, threshold alias and model_name", func() {
		o := parseOptions([]string{"verify_threshold:0.4", "model_name:ecapa", "unknown:x"})
		Expect(o.verifyThreshold).To(Equal(float32(0.4)))
		Expect(o.modelName).To(Equal("ecapa"))
		o2 := parseOptions([]string{"threshold:0.3"})
		Expect(o2.verifyThreshold).To(Equal(float32(0.3)))
	})
	It("ignores non-positive thresholds and keeps the default", func() {
		o := parseOptions([]string{"verify_threshold:0", "threshold:-1"})
		Expect(o.verifyThreshold).To(Equal(float32(0.25)))
	})
})

var _ = Describe("parseAnalyzeJSON", func() {
	It("maps age, gender label+scores and emotion label+scores", func() {
		doc := `{"age":42.0,
			"gender":{"label":"female","female":0.88,"male":0.12},
			"emotion":{"label":"neutral","scores":{"neutral":0.7,"happy":0.2,"sad":0.1}}}`
		seg, err := parseAnalyzeJSON(doc)
		Expect(err).ToNot(HaveOccurred())
		Expect(seg.Age).To(BeNumerically("~", 42.0, 1e-4))
		Expect(seg.Start).To(Equal(float32(0)))
		Expect(seg.End).To(Equal(float32(0)))
		Expect(seg.DominantGender).To(Equal("female"))
		Expect(seg.Gender).To(HaveKeyWithValue("female", BeNumerically("~", 0.88, 1e-4)))
		Expect(seg.Gender).To(HaveKeyWithValue("male", BeNumerically("~", 0.12, 1e-4)))
		// The "label" entry is consumed into DominantGender, not the score map.
		Expect(seg.Gender).ToNot(HaveKey("label"))
		Expect(seg.DominantEmotion).To(Equal("neutral"))
		Expect(seg.Emotion).To(HaveKeyWithValue("neutral", BeNumerically("~", 0.7, 1e-4)))
		Expect(seg.Emotion).To(HaveKeyWithValue("happy", BeNumerically("~", 0.2, 1e-4)))
	})
	It("tolerates a missing gender block", func() {
		seg, err := parseAnalyzeJSON(`{"age":30.0,"emotion":{"label":"happy","scores":{"happy":1.0}}}`)
		Expect(err).ToNot(HaveOccurred())
		Expect(seg.DominantGender).To(Equal(""))
		Expect(seg.DominantEmotion).To(Equal("happy"))
	})
	It("returns an error on malformed JSON", func() {
		_, err := parseAnalyzeJSON(`{not-json`)
		Expect(err).To(HaveOccurred())
	})
})

// The specs below exercise the real C-API end to end. They run only when both a
// model GGUF and a test WAV are provided, and skip cleanly otherwise so the
// suite stays green without large assets.
var _ = Describe("VoiceDetect end-to-end", Ordered, func() {
	var (
		v         *VoiceDetect
		modelPath = os.Getenv("VOICEDETECT_BACKEND_TEST_MODEL")
		wavPath   = os.Getenv("VOICEDETECT_BACKEND_TEST_WAV")
	)
	BeforeAll(func() {
		if modelPath == "" || wavPath == "" {
			Skip("set VOICEDETECT_BACKEND_TEST_MODEL and VOICEDETECT_BACKEND_TEST_WAV to run the e2e specs")
		}
		if err := ensureLibLoaded(); err != nil {
			Skip("libvoicedetect.so not loadable: " + err.Error())
		}
		v = &VoiceDetect{}
		Expect(v.Load(&pb.ModelOptions{ModelFile: modelPath})).To(Succeed())
	})
	It("embeds an audio clip", func() {
		resp, err := v.VoiceEmbed(&pb.VoiceEmbedRequest{Audio: wavPath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Embedding).ToNot(BeEmpty())
		Expect(resp.Model).ToNot(BeEmpty())
	})
	It("verifies a clip against itself as the same speaker", func() {
		resp, err := v.VoiceVerify(&pb.VoiceVerifyRequest{Audio1: wavPath, Audio2: wavPath})
		Expect(err).ToNot(HaveOccurred())
		Expect(resp.Verified).To(BeTrue())
		Expect(resp.Distance).To(BeNumerically("<=", resp.Threshold))
	})
})

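ensureLibLoaded pairs a `sync.Once` with a package-level error. Since Go 1.21, `sync.OnceValue` caches a single return value, which folds the two variables into one function value. The sketch below assumes Go 1.21 or newer. `loadLibrary` and `loadAttempts` are hypothetical names, and the body only simulates a failed dlopen instead of calling purego.

```go
package main

import (
	"errors"
	"sync"
)

// loadAttempts counts how often the loader body actually runs.
var loadAttempts int

// loadLibrary is a hypothetical stand-in for ensureLibLoaded: sync.OnceValue
// runs the body on the first call and returns the cached result afterwards.
// The body only simulates a failed dlopen instead of calling purego.
var loadLibrary = sync.OnceValue(func() error {
	loadAttempts++
	return errors.New("dlopen libvoicedetect.so: not found")
})
```

A spec would then call `if err := loadLibrary(); err != nil { Skip(...) }` exactly as it calls ensureLibLoaded today.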
@@ -0,0 +1,64 @@
package main

// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libvoicedetect.so via purego and registers the flat C-API entry points
// declared in voicedetect_capi.h. The library name can be overridden with
// VOICEDETECT_LIBRARY (mirrors the PARAKEET_LIBRARY / OMNIVOICE_LIBRARY
// convention in the sibling backends); the default looks for the .so next to
// this binary (resolved via LD_LIBRARY_PATH by run.sh).

import (
	"flag"
	"fmt"
	"os"

	"github.com/ebitengine/purego"
	grpc "github.com/mudler/LocalAI/pkg/grpc"
)

var (
	addr = flag.String("addr", "localhost:50051", "the address to connect to")
)

type LibFuncs struct {
	FuncPtr any
	Name    string
}

func main() {
	libName := os.Getenv("VOICEDETECT_LIBRARY")
	if libName == "" {
		libName = "libvoicedetect.so"
	}
	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(fmt.Errorf("voice-detect: dlopen %q: %w", libName, err))
	}
	// Bound 1:1 to voicedetect_capi.h. char*/float* returns are registered as
	// uintptr so the raw pointer can be freed via the matching capi free fn.
	libFuncs := []LibFuncs{
		{&CppAbiVersion, "voicedetect_capi_abi_version"},
		{&CppLoad, "voicedetect_capi_load"},
		{&CppFree, "voicedetect_capi_free"},
		{&CppLastError, "voicedetect_capi_last_error"},
		{&CppFreeString, "voicedetect_capi_free_string"},
		{&CppFreeVec, "voicedetect_capi_free_vec"},
		{&CppEmbedPath, "voicedetect_capi_embed_path"},
		{&CppEmbedPCM, "voicedetect_capi_embed_pcm"},
		{&CppVerifyPaths, "voicedetect_capi_verify_paths"},
		{&CppAnalyzeJSON, "voicedetect_capi_analyze_path_json"},
	}
	for _, lf := range libFuncs {
		purego.RegisterLibFunc(lf.FuncPtr, lib, lf.Name)
	}
	fmt.Fprintf(os.Stderr, "[voice-detect] ABI=%d\n", CppAbiVersion())
	flag.Parse()
	if err := grpc.StartServer(*addr, &VoiceDetect{}); err != nil {
		panic(err)
	}
}

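main.go only logs the ABI version it finds. The sketch below factors out the library-name lookup and adds a fail-fast ABI comparison. It makes two assumptions. First, `expectedABI` is a hypothetical constant; a real value would have to track what `voicedetect_capi_abi_version()` reports for the pinned source. Second, `libraryName` and `checkABI` are illustrative helpers that the backend does not define.

```go
package main

import "fmt"

// expectedABI is a hypothetical value: the real constant would have to track
// whatever voicedetect_capi_abi_version() reports for the pinned source.
const expectedABI int32 = 1

// libraryName resolves the shared library to dlopen. A non-empty
// VOICEDETECT_LIBRARY wins; otherwise the bare default name is left to the
// dynamic loader, which searches LD_LIBRARY_PATH (set by run.sh). getenv is
// injected (os.Getenv in production) so the lookup is easy to test.
func libraryName(getenv func(string) string) string {
	if name := getenv("VOICEDETECT_LIBRARY"); name != "" {
		return name
	}
	return "libvoicedetect.so"
}

// checkABI turns an ABI mismatch into an error at startup instead of letting
// a call through a mismatched signature misbehave later.
func checkABI(got int32) error {
	if got != expectedABI {
		return fmt.Errorf("voice-detect: library ABI %d, expected %d", got, expectedABI)
	}
	return nil
}
```

In main this would read as `purego.Dlopen(libraryName(os.Getenv), purego.RTLD_NOW|purego.RTLD_GLOBAL)`. After the registration loop, `checkABI(CppAbiVersion())` would be checked in place of the plain log line.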
@@ -0,0 +1,46 @@
package main

import (
	"strconv"
	"strings"
)

// defaultVerifyThreshold is the cosine-distance cutoff used when a request does
// not set one. Matches the Python speaker-recognition backend's default so the
// two implementations agree on verdicts out of the box.
const defaultVerifyThreshold float32 = 0.25

// loadOptions holds the parsed model-level options for voice-detect.
type loadOptions struct {
	verifyThreshold float32
	modelName       string
}

// splitOption splits a "key:value" option on its first ':' and trims both
// halves; ok is false when the option has no ':'.
func splitOption(o string) (key, value string, ok bool) {
	i := strings.Index(o, ":")
	if i < 0 {
		return "", "", false
	}
	return strings.TrimSpace(o[:i]), strings.TrimSpace(o[i+1:]), true
}

// parseOptions reads the backend "key:value" option slice. Unknown keys are
// ignored. Defaults: verify_threshold 0.25, model_name derived from the file.
func parseOptions(opts []string) loadOptions {
	o := loadOptions{verifyThreshold: defaultVerifyThreshold}
	for _, oo := range opts {
		key, value, ok := splitOption(oo)
		if !ok {
			continue
		}
		switch key {
		case "verify_threshold", "threshold":
			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
				o.verifyThreshold = float32(f)
			}
		case "model_name":
			o.modelName = value
		}
	}
	return o
}

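splitOption hand-rolls the first-colon split with `strings.Index`. Since Go 1.18, `strings.Cut` does the same split in one call. The sketch below rebuilds the same parse on `strings.Cut` and assumes Go 1.18 or newer. `parseKV` is a hypothetical name, and it returns the two fields directly instead of a loadOptions.

```go
package main

import (
	"strconv"
	"strings"
)

// parseKV is a hypothetical strings.Cut variant of parseOptions. Like the
// original it splits on the first ':' (so values may contain colons), trims
// both halves, skips entries without a ':', and keeps the 0.25 default unless
// a positive float threshold is given.
func parseKV(opts []string) (threshold float32, modelName string) {
	threshold = 0.25
	for _, o := range opts {
		key, value, ok := strings.Cut(o, ":")
		if !ok {
			continue
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		switch key {
		case "verify_threshold", "threshold":
			if f, err := strconv.ParseFloat(value, 32); err == nil && f > 0 {
				threshold = float32(f)
			}
		case "model_name":
			modelName = value
		}
	}
	return threshold, modelName
}
```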
@@ -0,0 +1,68 @@
#!/bin/bash
#
# Bundle the voice-detect-grpc binary, libvoicedetect.so, the core runtime libs
# (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE
# so the package is self-contained. Mirrors backend/go/parakeet-cpp/package.sh;
# run.sh routes the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc
# is used instead of the host's.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/voice-detect-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libvoicedetect.so + any soname symlinks. purego.Dlopen resolves it via
# LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libvoicedetect.so* "$CURDIR/package/lib/" 2>/dev/null || {
    echo "ERROR: libvoicedetect.so not found in $CURDIR, run 'make' first" >&2
    exit 1
}
# Detect architecture and copy the core runtime libs libvoicedetect.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    echo "Detected x86_64 architecture, copying x86_64 libraries..."
    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    echo "Detected ARM64 architecture, copying ARM64 libraries..."
    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
    cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
    cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
    echo "Detected Darwin"
else
    echo "Error: Could not detect architecture"
    exit 1
fi
# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) based on
# BUILD_TYPE so the backend can reach the GPU without the runtime base image
# shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
    echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
    package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

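package.sh picks the loader by probing for the loader file itself rather than asking `uname`. The sketch below is a hypothetical Go mirror of the same mapping, keyed on GOOS/GOARCH and called for example as `loaderFor(runtime.GOOS, runtime.GOARCH)`. `loaderFor` is an illustrative name. The paths are the ones the script copies to `lib/ld.so`, and an empty result stands for "no bundled loader".

```go
package main

// loaderFor is a hypothetical Go mirror of package.sh's loader choice. It maps
// GOOS/GOARCH to the glibc dynamic loader the script copies to lib/ld.so and
// returns "" when no loader is bundled: on Darwin (which package.sh accepts)
// and on any other platform (which package.sh rejects as undetected).
func loaderFor(goos, goarch string) string {
	if goos != "linux" {
		return ""
	}
	switch goarch {
	case "amd64":
		return "/lib64/ld-linux-x86-64.so.2"
	case "arm64":
		return "/lib/ld-linux-aarch64.so.1"
	default:
		return ""
	}
}
```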
backend/go/voice-detect/run.sh Executable file
@@ -0,0 +1,16 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
# Append the caller's LD_LIBRARY_PATH only when it is set: an empty trailing
# entry would make the dynamic loader also search the current directory.
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the whisper /
# parakeet backends' runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
    echo "Using lib/ld.so"
    exec "$CURDIR/lib/ld.so" "$CURDIR/voice-detect-grpc" "$@"
fi
exec "$CURDIR/voice-detect-grpc" "$@"

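A common pitfall when prepending to LD_LIBRARY_PATH is leaving a trailing `:` when the variable was unset. glibc's loader treats that empty component as the current directory. The sketch below builds the value without empty components. `prependLibraryPath` is a hypothetical helper, shown for launchers written in Go rather than shell.

```go
package main

import "strings"

// prependLibraryPath returns dirs followed by the existing LD_LIBRARY_PATH
// entries, joined with ':' and with empty components dropped, so an unset or
// empty existing value never adds the current directory to the search path.
func prependLibraryPath(existing string, dirs ...string) string {
	var parts []string
	for _, d := range dirs {
		if d != "" {
			parts = append(parts, d)
		}
	}
	for _, p := range strings.Split(existing, ":") {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ":")
}
```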
backend/go/voice-detect/test.sh Executable file
@@ -0,0 +1,14 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
cd "$CURDIR"
echo "Running voice-detect backend tests..."
# The pure-Go parsing specs always run. The embed/verify smoke specs run only
# when a model and a WAV are provided via VOICEDETECT_BACKEND_TEST_MODEL and
# VOICEDETECT_BACKEND_TEST_WAV; otherwise they auto-skip.
LD_LIBRARY_PATH="$CURDIR:${LD_LIBRARY_PATH:-}" go test -v -timeout 1200s .
echo "voice-detect tests completed."

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -15,35 +15,35 @@ fi
if [ "$(uname)" = "Darwin" ]; then
    # macOS: single dylib variant (Metal or Accelerate)
    LIBRARY="$CURDIR/libgovoxtral-fallback.dylib"
    export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
    export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
    LIBRARY="$CURDIR/libgovoxtral-fallback.so"
    if grep -q -e "\savx\s" /proc/cpuinfo ; then
        echo "CPU: AVX found OK"
        if [ -e $CURDIR/libgovoxtral-avx.so ]; then
        if [ -e "$CURDIR"/libgovoxtral-avx.so ]; then
            LIBRARY="$CURDIR/libgovoxtral-avx.so"
        fi
    fi
    if grep -q -e "\savx2\s" /proc/cpuinfo ; then
        echo "CPU: AVX2 found OK"
        if [ -e $CURDIR/libgovoxtral-avx2.so ]; then
        if [ -e "$CURDIR"/libgovoxtral-avx2.so ]; then
            LIBRARY="$CURDIR/libgovoxtral-avx2.so"
        fi
    fi
    export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export VOXTRAL_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it (Linux only)
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
    echo "Using lib/ld.so"
    echo "Using library: $LIBRARY"
    exec $CURDIR/lib/ld.so $CURDIR/voxtral "$@"
    exec "$CURDIR"/lib/ld.so "$CURDIR"/voxtral "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/voxtral "$@"
exec "$CURDIR"/voxtral "$@"

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea
WHISPER_CPP_VERSION?=0ae02cdb2c7317b50991367c165736ce42ed96ac
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
cd /
@@ -13,22 +13,28 @@ if [ "$(uname)" != "Darwin" ]; then
fi
if [ "$(uname)" = "Darwin" ]; then
    # macOS: single dylib variant (Metal or Accelerate)
    LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
    export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
    # macOS: single fallback variant (Metal/Accelerate). The cmake build emits a
    # Mach-O named .so, but tolerate .dylib too; pick whichever exists so the Go
    # loader doesn't panic on a hardcoded name that isn't on disk.
    if [ -e "$CURDIR/libgowhisper-fallback.dylib" ]; then
        LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
    else
        LIBRARY="$CURDIR/libgowhisper-fallback.so"
    fi
    export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
    LIBRARY="$CURDIR/libgowhisper-fallback.so"
    if grep -q -e "\savx\s" /proc/cpuinfo ; then
        echo "CPU: AVX found OK"
        if [ -e $CURDIR/libgowhisper-avx.so ]; then
        if [ -e "$CURDIR"/libgowhisper-avx.so ]; then
            LIBRARY="$CURDIR/libgowhisper-avx.so"
        fi
    fi
    if grep -q -e "\savx2\s" /proc/cpuinfo ; then
        echo "CPU: AVX2 found OK"
        if [ -e $CURDIR/libgowhisper-avx2.so ]; then
        if [ -e "$CURDIR"/libgowhisper-avx2.so ]; then
            LIBRARY="$CURDIR/libgowhisper-avx2.so"
        fi
    fi
@@ -36,22 +42,22 @@ else
    # Check avx 512
    if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
        echo "CPU: AVX512F found OK"
        if [ -e $CURDIR/libgowhisper-avx512.so ]; then
        if [ -e "$CURDIR"/libgowhisper-avx512.so ]; then
            LIBRARY="$CURDIR/libgowhisper-avx512.so"
        fi
    fi
    export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export WHISPER_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
if [ -f "$CURDIR"/lib/ld.so ]; then
    echo "Using lib/ld.so"
    echo "Using library: $LIBRARY"
    exec $CURDIR/lib/ld.so $CURDIR/whisper "$@"
    exec "$CURDIR"/lib/ld.so "$CURDIR"/whisper "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/whisper "$@"
exec "$CURDIR"/whisper "$@"

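The visible part of the whisper run.sh picks a library variant by letting each wider instruction-set check overwrite LIBRARY, but only when that variant's file exists, so the last matching check wins. The sketch below is a hypothetical Go mirror of that precedence, and `pickWhisperLibrary` is an illustrative name. The script works on absolute `"$CURDIR"/...` paths and reads flags from /proc/cpuinfo. The sketch instead takes bare file names, a flag set, and an existence check as inputs.

```go
package main

// pickWhisperLibrary mirrors the run.sh selection: start from the fallback
// build and let each later (wider) check override it, but only when the CPU
// has the flag and the variant file exists. flags holds /proc/cpuinfo-style
// flag names; exists reports whether a candidate file is present.
func pickWhisperLibrary(flags map[string]bool, exists func(string) bool) string {
	lib := "libgowhisper-fallback.so"
	candidates := []struct{ flag, file string }{
		{"avx", "libgowhisper-avx.so"},
		{"avx2", "libgowhisper-avx2.so"},
		{"avx512f", "libgowhisper-avx512.so"},
	}
	for _, c := range candidates {
		if flags[c.flag] && exists(c.file) {
			lib = c.file
		}
	}
	return lib
}
```

Because a missing file never overrides an earlier pick, an AVX2 machine whose package contains only the AVX variant still ends up on `libgowhisper-avx.so`.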
@@ -209,6 +209,78 @@
    nvidia-cuda-12: "cuda12-ced"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced"
- &voicedetect
  name: "voice-detect"
  alias: "voice-detect"
  license: mit
  icon: https://avatars.githubusercontent.com/u/95302084
  description: |
    voice-detect speaker recognition and voice analysis.
    voice-detect.cpp is a C++/ggml engine that produces L2-normalised
    speaker embeddings (ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker
    ERes2Net, CAM++) for voice verification and 1:N identification, plus
    a wav2vec2 age / gender / emotion analysis head. It replaces the
    Python speaker-recognition backend and is exposed through the Voice*
    gRPC RPCs and the /v1/voice/* REST endpoints. It runs on CPU, NVIDIA
    CUDA, AMD ROCm/HIP, Intel SYCL, Vulkan, Apple Metal and NVIDIA Jetson (L4T) targets.
  urls:
    - https://github.com/mudler/voice-detect.cpp
  tags:
    - voice-recognition
    - speaker-verification
    - speaker-embedding
    - CPU
    - GPU
    - CUDA
    - HIP
  capabilities:
    default: "cpu-voice-detect"
    nvidia: "cuda12-voice-detect"
    intel: "intel-sycl-f16-voice-detect"
    metal: "metal-voice-detect"
    amd: "rocm-voice-detect"
    vulkan: "vulkan-voice-detect"
    nvidia-l4t: "nvidia-l4t-arm64-voice-detect"
    nvidia-cuda-13: "cuda13-voice-detect"
    nvidia-cuda-12: "cuda12-voice-detect"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect"
- &facedetect
  name: "face-detect"
  alias: "face-detect"
  license: mit
  icon: https://avatars.githubusercontent.com/u/95302084
  description: |
    face-detect face detection, embedding, verification and analysis.
    face-detect.cpp is a C++/ggml engine that runs SCRFD / YuNet face
    detection and ArcFace / SFace 512-d (or 128-d) L2-normalised face
    embeddings for verification and 1:N identification, plus a landmark /
    age / gender analysis head. It replaces the Python insightface backend
    and is exposed through the Embedding, Detect and Face* gRPC RPCs and
    the /v1/face/* REST endpoints. It runs on CPU, NVIDIA CUDA, AMD
    ROCm/HIP, Intel SYCL, Vulkan, Apple Metal and NVIDIA Jetson (L4T) targets.
  urls:
    - https://github.com/mudler/face-detect.cpp
  tags:
    - face-recognition
    - face-verification
    - face-embedding
    - CPU
    - GPU
    - CUDA
    - HIP
  capabilities:
    default: "cpu-face-detect"
    nvidia: "cuda12-face-detect"
    intel: "intel-sycl-f16-face-detect"
    metal: "metal-face-detect"
    amd: "rocm-face-detect"
    vulkan: "vulkan-face-detect"
    nvidia-l4t: "nvidia-l4t-arm64-face-detect"
    nvidia-cuda-13: "cuda13-face-detect"
    nvidia-cuda-12: "cuda12-face-detect"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect"
- &voxtral
  name: "voxtral"
  alias: "voxtral"
@@ -340,6 +412,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp"
    intel: "intel-sycl-f32-sam3-cpp"
    vulkan: "vulkan-sam3-cpp"
    metal: "metal-sam3-cpp"
- &rfdetrcpp
  name: "rfdetr-cpp"
  alias: "rfdetr-cpp"
@@ -368,6 +441,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp"
    intel: "intel-sycl-f32-rfdetr-cpp"
    vulkan: "vulkan-rfdetr-cpp"
    metal: "metal-rfdetr-cpp"
- &locateanything
  name: "locate-anything"
  alias: "locate-anything"
@@ -397,6 +471,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp"
    intel: "intel-sycl-f32-locate-anything-cpp"
    vulkan: "vulkan-locate-anything-cpp"
    metal: "metal-locate-anything-cpp"
- !!merge <<: *locateanything
  name: "locate-anything-development"
  capabilities:
@@ -409,6 +484,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp-development"
    intel: "intel-sycl-f32-locate-anything-cpp-development"
    vulkan: "vulkan-locate-anything-cpp-development"
    metal: "metal-locate-anything-cpp-development"
- !!merge <<: *locateanything
  name: "cpu-locate-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-locate-anything-cpp"
@@ -419,6 +495,16 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-locate-anything-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-locate-anything-cpp
- !!merge <<: *locateanything
  name: "metal-locate-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-locate-anything-cpp"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-locate-anything-cpp
- !!merge <<: *locateanything
  name: "metal-locate-anything-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-locate-anything-cpp"
  mirrors:
    - localai/localai-backends:master-metal-darwin-arm64-locate-anything-cpp
- !!merge <<: *locateanything
  name: "cuda12-locate-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-locate-anything-cpp"
@@ -517,6 +603,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp"
    intel: "intel-sycl-f32-depth-anything-cpp"
    vulkan: "vulkan-depth-anything-cpp"
    metal: "metal-depth-anything-cpp"
- !!merge <<: *depthanything
  name: "depth-anything-development"
  capabilities:
@@ -529,6 +616,7 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp-development"
    intel: "intel-sycl-f32-depth-anything-cpp-development"
    vulkan: "vulkan-depth-anything-cpp-development"
    metal: "metal-depth-anything-cpp-development"
- !!merge <<: *depthanything
  name: "cpu-depth-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-depth-anything-cpp"
@@ -539,6 +627,16 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-depth-anything-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-depth-anything-cpp
- !!merge <<: *depthanything
  name: "metal-depth-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-depth-anything-cpp"
  mirrors:
    - localai/localai-backends:latest-metal-darwin-arm64-depth-anything-cpp
- !!merge <<: *depthanything
  name: "metal-depth-anything-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-depth-anything-cpp"
  mirrors:
    - localai/localai-backends:master-metal-darwin-arm64-depth-anything-cpp
- !!merge <<: *depthanything
  name: "cuda12-depth-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-depth-anything-cpp"
@@ -1031,6 +1129,8 @@
    nvidia-l4t: "vulkan-localvqe"
    nvidia-l4t-cuda-12: "vulkan-localvqe"
    nvidia-l4t-cuda-13: "vulkan-localvqe"
    # Apple Silicon: CPU build (LocalVQE has no Metal path); still arm64-native.
    metal: "metal-localvqe"
- &privacyfilter
  name: "privacy-filter"
  alias: "privacy-filter"
@@ -1067,6 +1167,7 @@
    amd: "vulkan-privacy-filter"
    intel: "vulkan-privacy-filter"
    vulkan: "vulkan-privacy-filter"
    metal: "metal-privacy-filter"
- &faster-whisper
  icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
  description: |
@@ -1327,7 +1428,6 @@
    intel: "intel-fish-speech"
    amd: "rocm-fish-speech"
    nvidia-l4t: "nvidia-l4t-fish-speech"
    metal: "metal-fish-speech"
    default: "cpu-fish-speech"
    nvidia-cuda-13: "cuda13-fish-speech"
    nvidia-cuda-12: "cuda12-fish-speech"
@@ -2799,6 +2899,236 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-ced
## voice-detect
- !!merge <<: *voicedetect
  name: "voice-detect-development"
  capabilities:
    default: "cpu-voice-detect-development"
    nvidia: "cuda12-voice-detect-development"
    intel: "intel-sycl-f16-voice-detect-development"
    metal: "metal-voice-detect-development"
    amd: "rocm-voice-detect-development"
    vulkan: "vulkan-voice-detect-development"
    nvidia-l4t: "nvidia-l4t-arm64-voice-detect-development"
    nvidia-cuda-13: "cuda13-voice-detect-development"
    nvidia-cuda-12: "cuda12-voice-detect-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-voice-detect-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-voice-detect-development"
- !!merge <<: *voicedetect
name: "nvidia-l4t-arm64-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-voice-detect"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-voice-detect
- !!merge <<: *voicedetect
name: "nvidia-l4t-arm64-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-voice-detect"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-voice-detect
- !!merge <<: *voicedetect
name: "cuda13-nvidia-l4t-arm64-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-voice-detect
- !!merge <<: *voicedetect
name: "cuda13-nvidia-l4t-arm64-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-voice-detect
- !!merge <<: *voicedetect
name: "cpu-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voice-detect"
mirrors:
- localai/localai-backends:latest-cpu-voice-detect
- !!merge <<: *voicedetect
name: "cpu-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voice-detect"
mirrors:
- localai/localai-backends:master-cpu-voice-detect
- !!merge <<: *voicedetect
name: "metal-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voice-detect"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-voice-detect
- !!merge <<: *voicedetect
name: "metal-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voice-detect"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-voice-detect
- !!merge <<: *voicedetect
name: "cuda12-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-voice-detect
- !!merge <<: *voicedetect
name: "cuda12-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-voice-detect
- !!merge <<: *voicedetect
name: "rocm-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-voice-detect
- !!merge <<: *voicedetect
name: "rocm-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-voice-detect
- !!merge <<: *voicedetect
name: "intel-sycl-f32-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f32-voice-detect
- !!merge <<: *voicedetect
name: "intel-sycl-f32-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f32-voice-detect
- !!merge <<: *voicedetect
name: "intel-sycl-f16-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f16-voice-detect
- !!merge <<: *voicedetect
name: "intel-sycl-f16-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f16-voice-detect
- !!merge <<: *voicedetect
name: "vulkan-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-voice-detect
- !!merge <<: *voicedetect
name: "vulkan-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-vulkan-voice-detect
- !!merge <<: *voicedetect
name: "cuda13-voice-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voice-detect"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-voice-detect
- !!merge <<: *voicedetect
name: "cuda13-voice-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voice-detect"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-voice-detect
## face-detect
- !!merge <<: *facedetect
name: "face-detect-development"
capabilities:
default: "cpu-face-detect-development"
nvidia: "cuda12-face-detect-development"
intel: "intel-sycl-f16-face-detect-development"
metal: "metal-face-detect-development"
amd: "rocm-face-detect-development"
vulkan: "vulkan-face-detect-development"
nvidia-l4t: "nvidia-l4t-arm64-face-detect-development"
nvidia-cuda-13: "cuda13-face-detect-development"
nvidia-cuda-12: "cuda12-face-detect-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-face-detect-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-face-detect-development"
- !!merge <<: *facedetect
name: "nvidia-l4t-arm64-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-face-detect"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-face-detect
- !!merge <<: *facedetect
name: "nvidia-l4t-arm64-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-face-detect"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-face-detect
- !!merge <<: *facedetect
name: "cuda13-nvidia-l4t-arm64-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-face-detect
- !!merge <<: *facedetect
name: "cuda13-nvidia-l4t-arm64-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-face-detect
- !!merge <<: *facedetect
name: "cpu-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-face-detect"
mirrors:
- localai/localai-backends:latest-cpu-face-detect
- !!merge <<: *facedetect
name: "cpu-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-face-detect"
mirrors:
- localai/localai-backends:master-cpu-face-detect
- !!merge <<: *facedetect
name: "metal-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-face-detect"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-face-detect
- !!merge <<: *facedetect
name: "metal-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-face-detect"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-face-detect
- !!merge <<: *facedetect
name: "cuda12-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-face-detect
- !!merge <<: *facedetect
name: "cuda12-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-face-detect"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-face-detect
- !!merge <<: *facedetect
name: "rocm-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-face-detect
- !!merge <<: *facedetect
name: "rocm-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-face-detect"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-face-detect
- !!merge <<: *facedetect
name: "intel-sycl-f32-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f32-face-detect
- !!merge <<: *facedetect
name: "intel-sycl-f32-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-face-detect"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f32-face-detect
- !!merge <<: *facedetect
name: "intel-sycl-f16-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-intel-sycl-f16-face-detect
- !!merge <<: *facedetect
name: "intel-sycl-f16-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-face-detect"
mirrors:
- localai/localai-backends:master-gpu-intel-sycl-f16-face-detect
- !!merge <<: *facedetect
name: "vulkan-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-vulkan-face-detect
- !!merge <<: *facedetect
name: "vulkan-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-face-detect"
mirrors:
- localai/localai-backends:master-gpu-vulkan-face-detect
- !!merge <<: *facedetect
name: "cuda13-face-detect"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-face-detect"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-face-detect
- !!merge <<: *facedetect
name: "cuda13-face-detect-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-face-detect"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-face-detect
## stablediffusion-ggml
- !!merge <<: *stablediffusionggml
name: "cpu-stablediffusion-ggml"
@@ -2909,6 +3239,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-privacy-filter"
mirrors:
- localai/localai-backends:master-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
name: "metal-privacy-filter"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-privacy-filter"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-privacy-filter
- !!merge <<: *privacyfilter
name: "metal-privacy-filter-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-privacy-filter"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-privacy-filter
- !!merge <<: *privacyfilter
name: "cuda13-privacy-filter"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-privacy-filter"
@@ -3220,6 +3560,7 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp-development"
intel: "intel-sycl-f32-sam3-cpp-development"
vulkan: "vulkan-sam3-cpp-development"
metal: "metal-sam3-cpp-development"
- !!merge <<: *sam3cpp
name: "cpu-sam3-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-sam3-cpp"
@@ -3230,6 +3571,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-sam3-cpp"
mirrors:
- localai/localai-backends:master-cpu-sam3-cpp
- !!merge <<: *sam3cpp
name: "metal-sam3-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-sam3-cpp"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-sam3-cpp
- !!merge <<: *sam3cpp
name: "metal-sam3-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-sam3-cpp"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-sam3-cpp
- !!merge <<: *sam3cpp
name: "cuda12-sam3-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sam3-cpp"
@@ -3303,6 +3654,7 @@
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp-development"
intel: "intel-sycl-f32-rfdetr-cpp-development"
vulkan: "vulkan-rfdetr-cpp-development"
metal: "metal-rfdetr-cpp-development"
- !!merge <<: *rfdetrcpp
name: "cpu-rfdetr-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-rfdetr-cpp"
@@ -3313,6 +3665,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-rfdetr-cpp"
mirrors:
- localai/localai-backends:master-cpu-rfdetr-cpp
- !!merge <<: *rfdetrcpp
name: "metal-rfdetr-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-rfdetr-cpp"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-rfdetr-cpp
- !!merge <<: *rfdetrcpp
name: "metal-rfdetr-cpp-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-rfdetr-cpp"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-rfdetr-cpp
- !!merge <<: *rfdetrcpp
name: "cuda12-rfdetr-cpp"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rfdetr-cpp"
@@ -4101,6 +4463,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-localvqe"
mirrors:
- localai/localai-backends:master-gpu-vulkan-localvqe
- !!merge <<: *localvqecpp
name: "metal-localvqe"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-localvqe"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-localvqe
- !!merge <<: *localvqecpp
name: "metal-localvqe-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-localvqe"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-localvqe
## kokoro
- !!merge <<: *kokoro
name: "kokoro-development"
@@ -4799,7 +5171,6 @@
intel: "intel-fish-speech-development"
amd: "rocm-fish-speech-development"
nvidia-l4t: "nvidia-l4t-fish-speech-development"
metal: "metal-fish-speech-development"
default: "cpu-fish-speech-development"
nvidia-cuda-13: "cuda13-fish-speech-development"
nvidia-cuda-12: "cuda12-fish-speech-development"
@@ -4875,16 +5246,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech
- !!merge <<: *fish-speech
name: "metal-fish-speech"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-fish-speech"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-fish-speech
- !!merge <<: *fish-speech
name: "metal-fish-speech-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-fish-speech"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-fish-speech
## faster-qwen3-tts
- !!merge <<: *faster-qwen3-tts
name: "faster-qwen3-tts-development"
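
The `capabilities` maps in these hunks key a concrete backend entry by platform, and the fish-speech hunks drop the `metal` key. Below is a minimal Go sketch of such a lookup. It assumes that a platform missing from the map falls back to the `default` entry. `resolveBackend` is a hypothetical helper for illustration, not LocalAI's resolver:

```go
package main

import "fmt"

// resolveBackend is a hypothetical helper: it returns the concrete backend
// entry for a platform key and falls back to the "default" entry when the
// meta backend does not list that platform. The fallback rule is an
// assumption made for illustration.
func resolveBackend(capabilities map[string]string, platform string) (string, bool) {
	if name, ok := capabilities[platform]; ok {
		return name, true
	}
	name, ok := capabilities["default"]
	return name, ok
}

func main() {
	// Subset of the fish-speech capabilities after the metal key was removed.
	fishSpeech := map[string]string{
		"default":        "cpu-fish-speech",
		"intel":          "intel-fish-speech",
		"amd":            "rocm-fish-speech",
		"nvidia-cuda-12": "cuda12-fish-speech",
	}
	name, _ := resolveBackend(fishSpeech, "metal")
	fmt.Println(name) // cpu-fish-speech
}
```

Under that assumption, Apple Silicon hosts would now resolve fish-speech to the CPU image rather than a Metal one.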

@@ -13,6 +13,17 @@ fi
# fish-speech uses pyrootutils which requires a .project-root marker
touch "${backend_dir}/.project-root"
# On darwin arm64 the transitive `tokenizers` dep compiles its Rust extension
# from source (Linux uses prebuilt manylinux wheels, so it never compiles
# there). The pinned tokenizers crate that fish-speech's stack resolves to
# contains a `&T` -> `&mut T` cast that trips the now-deny-by-default
# `invalid_reference_casting` lint in the macOS runner's newer Rust toolchain,
# breaking the build (seen in the v4.5.5 release CI fish-speech darwin/metal
# job). Allow that lint so the unchanged third-party crate compiles as before.
# Append rather than clobber any pre-existing RUSTFLAGS; harmless on Linux
# where no Rust compile happens.
export RUSTFLAGS="${RUSTFLAGS:-} -A invalid_reference_casting"
installRequirements
# Clone fish-speech source (the pip package doesn't include inference modules)

@@ -1,2 +0,0 @@
torch
torchaudio

@@ -7,3 +7,7 @@ setuptools
six
scipy
numpy
# fish-speech is installed editable with --no-build-isolation, so the build
# backends of its transitive deps must already be in the venv. One of them
# builds a Rust extension and needs setuptools-rust present at metadata time.
setuptools-rust

@@ -11,14 +11,31 @@ fi
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade "
installRequirements
# Fetch convert_hf_to_gguf.py from llama.cpp
# Fetch convert_hf_to_gguf.py from llama.cpp.
# Upstream split the model-specific logic out of the single file into a
# sibling `conversion/` package (convert_hf_to_gguf.py now does
# `from conversion import ...`), so a single-file download no longer runs —
# it fails with `ModuleNotFoundError: No module named 'conversion'`. We clone
# the repo and copy both the script and the package; Python puts the script's
# own directory on sys.path[0], so the package resolves when placed beside it.
LLAMA_CPP_CONVERT_VERSION="${LLAMA_CPP_CONVERT_VERSION:-master}"
LLAMA_CPP_SRC="${EDIR}/llama.cpp"
CONVERT_SCRIPT="${EDIR}/convert_hf_to_gguf.py"
if [ ! -f "${CONVERT_SCRIPT}" ]; then
echo "Downloading convert_hf_to_gguf.py from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
curl -L --fail --retry 3 \
"https://raw.githubusercontent.com/ggml-org/llama.cpp/${LLAMA_CPP_CONVERT_VERSION}/convert_hf_to_gguf.py" \
-o "${CONVERT_SCRIPT}" || echo "Warning: Failed to download convert_hf_to_gguf.py."
cloneLlamaCpp() {
if [ ! -d "${LLAMA_CPP_SRC}/.git" ]; then
git clone --depth 1 --branch "${LLAMA_CPP_CONVERT_VERSION}" \
https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}" 2>/dev/null || \
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}"
fi
}
if [ ! -f "${CONVERT_SCRIPT}" ] || [ ! -d "${EDIR}/conversion" ]; then
echo "Fetching convert_hf_to_gguf.py + conversion/ from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
cloneLlamaCpp
cp "${LLAMA_CPP_SRC}/convert_hf_to_gguf.py" "${CONVERT_SCRIPT}"
rm -rf "${EDIR}/conversion"
cp -r "${LLAMA_CPP_SRC}/conversion" "${EDIR}/conversion"
fi
# Install gguf package from the same llama.cpp commit to keep them in sync
@@ -41,12 +58,7 @@ QUANTIZE_BIN="${EDIR}/llama-quantize"
if [ ! -x "${QUANTIZE_BIN}" ] && ! command -v llama-quantize &>/dev/null; then
if command -v cmake &>/dev/null; then
echo "Building llama-quantize from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
LLAMA_CPP_SRC="${EDIR}/llama.cpp"
if [ ! -d "${LLAMA_CPP_SRC}" ]; then
git clone --depth 1 --branch "${LLAMA_CPP_CONVERT_VERSION}" \
https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}" 2>/dev/null || \
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}"
fi
cloneLlamaCpp # reuses the clone fetched for convert_hf_to_gguf.py
cmake -B "${LLAMA_CPP_SRC}/build" -S "${LLAMA_CPP_SRC}" -DGGML_NATIVE=OFF -DBUILD_SHARED_LIBS=OFF
cmake --build "${LLAMA_CPP_SRC}/build" --target llama-quantize -j"$(nproc 2>/dev/null || echo 2)"
cp "${LLAMA_CPP_SRC}/build/bin/llama-quantize" "${QUANTIZE_BIN}"

@@ -85,9 +85,15 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
# The resulting binary still requires an AVX-512 capable CPU at runtime,
# same constraint sglang upstream documents in docker/xeon.Dockerfile.
# Pin the source build to the same release the GPU path floors on
# (0.5.11, see requirements-cublas12-after.txt). An unpinned master clone
# pulls in newer CPU kernels (e.g. mamba/fla.cpp) that fail to compile
# (constexpr non-constant + kineto_LIBRARY-NOTFOUND). Bump deliberately.
SGLANG_VERSION="${SGLANG_VERSION:-v0.5.11}"
_sgl_src=$(mktemp -d)
trap 'rm -rf "${_sgl_src}"' EXIT
git clone --depth 1 https://github.com/sgl-project/sglang "${_sgl_src}/sglang"
git clone --depth 1 --branch "${SGLANG_VERSION}" \
https://github.com/sgl-project/sglang "${_sgl_src}/sglang"
# Patch -march=native → -march=sapphirerapids in the CPU kernel CMakeLists
sed -i 's/-march=native/-march=sapphirerapids/g' \

@@ -1,6 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/cpu
accelerate
torch==2.12.0+cpu
torch==2.9.1+cpu
torchvision
torchaudio
transformers

@@ -1,23 +1,23 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH=$CURDIR/lib:${LD_LIBRARY_PATH:-}
export LD_LIBRARY_PATH="$CURDIR"/lib:${LD_LIBRARY_PATH:-}
# SSL certificates for model auto-download
if [ -d "$CURDIR/etc/ssl/certs" ]; then
export SSL_CERT_DIR=$CURDIR/etc/ssl/certs
export SSL_CERT_DIR="$CURDIR"/etc/ssl/certs
fi
# espeak-ng data directory
if [ -d "$CURDIR/espeak-ng-data" ]; then
export ESPEAK_NG_DATA=$CURDIR/espeak-ng-data
export ESPEAK_NG_DATA="$CURDIR"/espeak-ng-data
fi
# Use bundled ld.so if present (portability)
if [ -f $CURDIR/lib/ld.so ]; then
exec $CURDIR/lib/ld.so $CURDIR/kokoros-grpc "$@"
if [ -f "$CURDIR"/lib/ld.so ]; then
exec "$CURDIR"/lib/ld.so "$CURDIR"/kokoros-grpc "$@"
fi
exec $CURDIR/kokoros-grpc "$@"
exec "$CURDIR"/kokoros-grpc "$@"

@@ -570,6 +570,43 @@ impl Backend for KokorosService {
) -> Result<Response<backend::Result>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn sound_detection(
&self,
_: Request<backend::SoundDetectionRequest>,
) -> Result<Response<backend::SoundDetectionResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn depth(
&self,
_: Request<backend::DepthRequest>,
) -> Result<Response<backend::DepthResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn token_classify(
&self,
_: Request<backend::TokenClassifyRequest>,
) -> Result<Response<backend::TokenClassifyResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn score(
&self,
_: Request<backend::ScoreRequest>,
) -> Result<Response<backend::ScoreResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
type ForwardStream = ReceiverStream<Result<backend::ForwardReply, Status>>;
async fn forward(
&self,
_: Request<tonic::Streaming<backend::ForwardRequest>>,
) -> Result<Response<Self::ForwardStream>, Status> {
Err(Status::unimplemented("Not supported"))
}
}
#[cfg(test)]

@@ -0,0 +1,8 @@
Website = "https://localai.io"
[Details]
Icon = "../../core/http/static/logo.png"
Name = "LocalAI"
ID = "com.localai.launcher"
Version = "0.0.0"
Build = 1

@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.network.server</key>
<true/>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
</dict>
</plist>

@@ -0,0 +1,84 @@
#!/usr/bin/env bash
# Code-sign and notarize macOS artifacts for LocalAI.
# Every sub-command is a no-op (exit 0) when its required secret is unset,
# so unsigned builds (forks, local dev, PRs) keep working.
set -euo pipefail
ENTITLEMENTS="contrib/macos/Launcher.entitlements"
KEYCHAIN="localai-ci.keychain-db"
cmd_import_cert() {
if [ -z "${MACOS_CERTIFICATE:-}" ]; then
echo "[sign] MACOS_CERTIFICATE unset: skipping cert import (unsigned build)"
return 0
fi
local certfile keychain_pwd default_keychain
certfile="$(mktemp).p12"
keychain_pwd="${MACOS_CI_KEYCHAIN_PWD:?MACOS_CI_KEYCHAIN_PWD required when signing}"
echo "$MACOS_CERTIFICATE" | base64 --decode > "$certfile"
security create-keychain -p "$keychain_pwd" "$KEYCHAIN"
security set-keychain-settings -lut 21600 "$KEYCHAIN"
security unlock-keychain -p "$keychain_pwd" "$KEYCHAIN"
security import "$certfile" -k "$KEYCHAIN" -P "${MACOS_CERTIFICATE_PWD:?}" \
-T /usr/bin/codesign -T /usr/bin/security
security set-key-partition-list -S apple-tool:,apple:,codesign: \
-s -k "$keychain_pwd" "$KEYCHAIN" >/dev/null
default_keychain="$(security default-keychain | tr -d ' "')"
security list-keychains -d user -s "$KEYCHAIN" "$default_keychain"
rm -f "$certfile"
echo "[sign] certificate imported into $KEYCHAIN"
}
cmd_sign() {
local target="$1"
if [ -z "${MACOS_SIGN_IDENTITY:-}" ]; then
echo "[sign] MACOS_SIGN_IDENTITY unset: skipping codesign of $target"
return 0
fi
case "$target" in
*.app)
# Hardened runtime + entitlements are required for notarizing the app bundle.
codesign --deep --force --options runtime --timestamp \
--entitlements "$ENTITLEMENTS" \
--sign "$MACOS_SIGN_IDENTITY" "$target"
;;
*)
# A disk image carries no entitlements/runtime; just sign the container.
codesign --force --timestamp --sign "$MACOS_SIGN_IDENTITY" "$target"
;;
esac
codesign --verify --strict --verbose=2 "$target"
echo "[sign] signed $target"
}
cmd_notarize() {
local dmg="$1"
if [ -z "${MACOS_NOTARY_KEY:-}" ]; then
echo "[notarize] MACOS_NOTARY_KEY unset: skipping notarization of $dmg"
return 0
fi
local keyfile
keyfile="$(mktemp).p8"
echo "$MACOS_NOTARY_KEY" | base64 --decode > "$keyfile"
xcrun notarytool submit "$dmg" \
--key "$keyfile" \
--key-id "${MACOS_NOTARY_KEY_ID:?}" \
--issuer "${MACOS_NOTARY_ISSUER_ID:?}" \
--wait
rm -f "$keyfile"
xcrun stapler staple "$dmg"
xcrun stapler validate "$dmg"
echo "[notarize] notarized and stapled $dmg"
}
main() {
local sub="${1:-}"; shift || true
case "$sub" in
import-cert) cmd_import_cert ;;
sign) cmd_sign "$@" ;;
notarize) cmd_notarize "$@" ;;
*) echo "usage: $0 {import-cert|sign <path>|notarize <dmg>}" >&2; exit 2 ;;
esac
}
main "$@"

@@ -37,6 +37,8 @@ func (a *Application) RestartAgentJobService() error {
if d.JobStore != nil {
agentJobService.SetDistributedJobStore(d.JobStore)
}
// Keep agent tasks consistent across replicas (same client the dispatcher uses).
agentJobService.SetTaskSyncNATS(d.Nats)
}
// Start the service

@@ -604,6 +604,10 @@ func (a *Application) StartAgentPool() {
usm.SetJobDBStore(s)
}
}
// Keep per-user agent tasks consistent across replicas (nil in standalone).
if d := a.Distributed(); d != nil {
usm.SetJobSyncNATS(d.Nats)
}
aps.SetUserServicesManager(usm)
a.agentPoolService.Store(aps)

@@ -355,6 +355,7 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
PrefixProvider: prefixProvider,
PrefixConfig: prefixCfg,
Pressure: pressure,
SharedModels: cfg.Distributed.SharedModels,
})
// Wire staging-progress broadcasting so file-staging shows up on every

@@ -16,6 +16,7 @@ import (
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/jobs"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/modeladmin"
"github.com/mudler/LocalAI/core/services/monitoring"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/routing/admission"
@@ -279,6 +280,9 @@ func New(opts ...config.AppOption) (*Application, error) {
if application.agentJobService != nil {
application.agentJobService.SetDistributedBackends(distSvc.Dispatcher)
application.agentJobService.SetDistributedJobStore(distSvc.JobStore)
// Keep agent tasks consistent across replicas (jobs already sync via the
// dispatcher + DB read-through). Same NATS client the dispatcher uses.
application.agentJobService.SetTaskSyncNATS(distSvc.Nats)
}
// Wire skill store into AgentPoolService (wired at pool start time via closure)
// The actual wiring happens in StartAgentPool since the pool doesn't exist yet.
@@ -330,9 +334,14 @@ func New(opts ...config.AppOption) (*Application, error) {
gs := application.galleryService
sys := options.SystemState
cfgLoaderOpts := options.ToConfigLoaderOptions()
gs.OnModelsChanged = func(_ messaging.CacheInvalidateEvent) {
if err := application.ModelConfigLoader().LoadModelConfigsFromPath(sys.Model.ModelsPath, cfgLoaderOpts...); err != nil {
xlog.Warn("Failed to reload model configs after peer invalidation", "error", err)
gs.OnModelsChanged = func(evt messaging.CacheInvalidateEvent) {
// ApplyRemoteChange honors the op: a "delete" prunes the element
// (a reload-from-path is additive and cannot drop it), anything
// else reloads from disk; a named element's running instance is
// shut down so the new config takes effect. The originating
// replica reloads inline and never depends on this path.
if err := modeladmin.ApplyRemoteChange(application.ModelConfigLoader(), application.modelLoader, sys.Model.ModelsPath, evt, cfgLoaderOpts...); err != nil {
xlog.Warn("Failed to apply peer model config change", "error", err)
}
}
if err := application.galleryService.SubscribeBroadcasts(); err != nil {

Some files were not shown because too many files have changed in this diff.