Compare commits

..

29 Commits

Author SHA1 Message Date
Ettore Di Giacinto
6815bb2034 chore(recon): re-pin to squashed release commits (voice 1db1759, face e22260d)
Both engine repos were squashed to a single release commit and force-pushed, which
orphaned the previous pins (voice 30beecd, face 6107a24). Re-pin to the new
single-commit SHAs. These also fold in a real fix: the persistent graph-cache
fingerprint now includes op_params, so two structurally identical GGML_OP_CUSTOM
graphs (e.g. a blocked 3x3 vs a blocked 1x1 strided conv) can no longer false-hit
the cache and replay the wrong kernel. voice CI is now green (was failing
conv1x1_s2 with an out-of-bounds write on the GGML_NATIVE=OFF build); face stays
green. WeSpeaker embed parity 1.0 vs golden.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-29 15:22:32 +00:00
Ettore Di Giacinto
0b65e9cb3e chore(recon): re-pin to the squashed engine release commits
The voice-detect.cpp and face-detect.cpp histories were squashed to a single
release commit, which orphaned the previous pins (voice 30beecd, face 6107a24).
Re-pin to the new single-commit SHAs (voice 3d51077, face 06914b0); the tree is
identical, so the backend build is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-27 23:49:31 +00:00
Ettore Di Giacinto
3e91eafed3 docs(readme): announce native voice-detect + face-detect backends in Latest News
Add a Latest News entry for the new from-scratch C++/ggml biometric backends
(voice-detect.cpp + face-detect.cpp) that replace the Python insightface and
speaker-recognition backends: no Python/onnxruntime at inference, self-contained
GGUF, bit-exact parity, GPU cuDNN parity. Mirrors the parakeet.cpp /
locate-anything.cpp native-backend news entries. Refs PR #10441.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 22:14:41 +00:00
Ettore Di Giacinto
814b2a7c6c chore(recon): bump voice-detect pin to ERes2Net blocked-default (30beecd)
Defaults VD_ERES2NET_BLOCKED ON: routes the ERes2Net Res2Net body through the
blocked nChw16c AVX-512 directconv island instead of the 1x1 mul_mat fast path
(CONT-transpose + skinny low-K GEMM). On the shipped GGML_NATIVE=OFF build (ggml
mul_mat is AVX2-only) this wins ~2x at every thread count (2.07x@1t, 2.2x@4t,
2.05x@8t); pure-AVX2 fallback still 1.3-1.62x. Parity exact (cosine=1.000000 vs
golden), so registered voices + verify/identify thresholds are unaffected. The
prior default-OFF rested on a stale comment whose 23pct regression only held on
the non-shipping GGML_NATIVE=ON build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 19:51:03 +00:00
Ettore Di Giacinto
7cbb743b25 feat(recon): enable cuDNN conv path on arm64+CUDA13 recon backends
The voice-detect.cpp / face-detect.cpp engines have an opt-in cuDNN
implicit-GEMM conv path behind VOICEDETECT_GGML_CUDNN / FACEDETECT_GGML_CUDNN
(default OFF) that kills im2col on the GPU and reaches torch-cuDNN parity
(SCRFD 2.3x, WeSpeaker/ERes2Net parity), measured on the GB10
(arm64, CUDA 13, sm_121a).

Enable it for the CUDA build, but only where cuDNN actually ships: the
arm64 + CUDA 13 image (GB10/Jetson/L4T). x86 CUDA images carry no cuDNN,
so flipping it on globally for BUILD_TYPE=cublas would be a link failure.
The Makefiles gate on CUDA_MAJOR_VERSION=13 + arch (TARGETARCH from the
matrix/Docker build, uname -m fallback for local builds).

backend/Dockerfile.golang already installs the runtime libcudnn9-cuda-13
in the arm64+CUDA13 apt block; add the matching libcudnn9-dev-cuda-13 so
the build-time link resolves.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 15:54:12 +00:00
Ettore Di Giacinto
9684c5dd7e chore(recon): bump pins to cuDNN-conv-capable engines (voice b6e4356, face 6107a24)
Adds the opt-in cuDNN implicit-GEMM conv path (VOICEDETECT_GGML_CUDNN /
FACEDETECT_GGML_CUDNN, DEFAULT OFF -> zero build/runtime dep until enabled).
On GPU it kills the im2col-materialization bottleneck and reaches torch-cuDNN
parity on the spill-bound convs: SCRFD detect 14.8->6.4ms (2.3x, ~parity),
WeSpeaker ~parity, ERes2Net beats torch (1.10x); ArcFace/CAM++ neutral (no
spill). Parity exact (SCRFD <=1px, cosine=1.0). To USE it in LocalAI, the CUDA
backend build must enable the flag AND bundle libcudnn - deferred until a
cuDNN-bundled GPU image; flag stays OFF here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 15:39:42 +00:00
Ettore Di Giacinto
628b8a8e01 chore(recon): bump pins to GPU persistent-graph + multi-model-safe cache (voice 45d2e6b, face 0a4799a)
GPU wins (CUDA/ggml backend, no CPU-path change): persistent per-shape graph+context
cache in Backend::compute() eliminates the per-call cudaGraph re-instantiation churn
-> wav2vec2 emotion+age-gender now AT GPU parity with torch-cuDNN on GB10 (0.97-0.98x),
CAM++ -5.7ms; bit-identical parity. Cache hardened multi-model-safe (invalidate-on-free
keyed by the ModelLoader weights buffer) so LocalAI multi-model hosting cannot stale-hit.
Conv models still trail cuDNN (im2col-materialization-bound) - cuDNN implicit-GEMM lever next.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-24 13:03:48 +00:00
Ettore Di Giacinto
c4df41d209 chore(recon): bump pins to small-spatial + stem conv kernels (voice 99b1804, face 47fdab6)
Measured-gap-driven conv kernels: small-spatial (fill the register tile when
output width <= tile width) + small-IC stem + strided-1x1/downsample recovery.
ArcFace recognizer 0.57 -> 0.70x MLAS @1t (the closest conv model), WeSpeaker
0.65 -> 0.79x @1t. Parity cosine=1.0 / detect <=1px. The OC-block-sharing lever
was a measured dead-end (deep stride-1 is L3-weight-bandwidth bound, not
read-port bound) and was NOT shipped. Kernel ceiling reached; further gap needs
an algorithm-class change (cache-blocked weight-stationary GEMM, or q8 weights).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 23:44:04 +00:00
Ettore Di Giacinto
c1a3afc980 chore(recon): bump pins to breadth blocked-backbone (voice 7f66871, face d80092b)
voice 7f66871: AVX2-vectorized (ymm) blocked island - AVX2-only hosts now run
the blocked backbone for WeSpeaker (2.3x over per-conv-AVX2, cosine=1.0);
ERes2Net stays per-conv (blocked regresses, opt-in only); CAM++ Winograd-pinned.
face d80092b: ArcFace recognizer blocked island, AVX-512 default (-13% @8t, ~0.90x
MLAS, the closest conv result), auto per-conv on AVX2; SCRFD untouched on Winograd
(0 island invocations during detect). Parity cosine=1.0 / detect <=1px throughout.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 20:45:39 +00:00
Ettore Di Giacinto
f9a465ee25 chore(recon): bump voice pin to Phase-A blocked backbone (f4e7eef)
WeSpeaker ResNet34 runs as one nChw16c blocked island (2 reorders/forward vs
~60) on AVX-512, default; per-conv directconv fallback on AVX2. +2.9% @1t /
+17-19% @8t vs per-conv directconv, parity cosine=1.0. The conv microkernel is
already FMA-bound near peak (~0.86-0.98x MLAS-implied); residual to MLAS is
sub-peak edge + non-conv tail, documented in docs/cpu-optimization.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 19:58:08 +00:00
Ettore Di Giacinto
48e22da165 chore(recon): bump pins to MLAS-class direct-conv engines (voice 7ecfd07, face be22d67)
Hand-tuned nChw16c AVX-512 register-tiled direct-conv microkernel (~263 GFLOP/s,
within 6-7% of MLAS per-op efficiency), runtime-CPUID-dispatched + AVX2 fallback,
fused bias/relu. voice 7ecfd07: default 3x3-s1 kernel for WeSpeaker (+37%/+32%)
+ ERes2Net, CAM++ pinned to Winograd. face be22d67: shape-gated to the ArcFace
recognizer body (+25-27% @8t); SCRFD detector stays on Winograd (no regression).
Parity cosine=1.0 / detect <=1px on AVX-512 + AVX2 paths. Portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 16:25:39 +00:00
Ettore Di Giacinto
f940dc858a chore(recon): bump pins to FMA-throughput engines (voice f7b9f89, face 2d2d5f0)
face -> 2d2d5f0: route ArcFace 3x3 body convs through the AVX-512 winograd
microkernel (kWinoMinSize 80->14); ArcFace 1.62x @1t, SCRFD detect to 0.966 of
MLAS @1t, no regression. voice -> f7b9f89: runtime-CPUID-dispatched AVX-512
winograd-GEMM microkernel (ship-safe, AVX2 fallback bit-identical); WeSpeaker
1.90x @1t. Parity cosine=1.0 throughout; portable single binaries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 13:17:56 +00:00
Ettore Di Giacinto
f6d93591bd chore(recon): bump voice-detect pin to ECAPA operand-order win (e9c56ae)
voice-detect.cpp -> e9c56ae: weight-as-src0 mul_mat order in ECAPA's F32
conv1d_same (routes through tinyBLAS sgemm); ECAPA embed 1.67x @1t / ~1.3x @8t,
parity cosine=1.0. Isolated to encoder.cpp (ECAPA-only); ERes2Net/CAM++/WeSpeaker
do not call conv1d_same so are provably unaffected.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 09:17:41 +00:00
Ettore Di Giacinto
594576f440 chore(recon): bump face-detect pin to deep-kernel engine (7ae5c4d)
face-detect.cpp -> 7ae5c4d: register-blocked winograd-domain GEMM microkernel
(2.8x isolated GFLOP/s), AVX-512 zmm evolution behind runtime CPUID dispatch
(ship-safe, AVX2 fallback bit-identical), bias/relu fused into the winograd
output transform, and SFace Conv+BN fold + bias/PReLU fusion. SCRFD detect
~1.4x faster end-to-end vs the round-4 baseline; parity bit-exact; portable
single binary (function-multiversioned, no global -mavx512f).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-23 00:20:27 +00:00
Ettore Di Giacinto
5614b39782 chore(recon): bump backend pins to round-4 Winograd engines (CPU opt complete)
voice-detect.cpp -> d2839ca (CAM++ FCM 2D convs through Winograd, -15.5%/-10.3%);
face-detect.cpp -> c1db23d (AVX2-vectorized Winograd tile transforms, SCRFD
detect -14%/-9.6%). Final CPU optimization round; the conv-kernel lever class is
now exhausted (parity held cosine=1.0; GGUF/parity unchanged, HF GGUFs valid).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 20:42:36 +00:00
Ettore Di Giacinto
b4f7a36d6d chore(recon): bump backend pins to round-3 Winograd engines
voice-detect.cpp -> 45122ec (Winograd F(2x2,3x3) for WeSpeaker/ERes2Net 3x3
convs, -22%/-20% @8t); face-detect.cpp -> cd5c962 (Winograd F(4x4,3x3) for
SCRFD large maps, -22% @1t on top of F(2x2), more load-stable). Parity held
(cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 20:04:10 +00:00
Ettore Di Giacinto
c6170b875d chore(recon): bump backend pins to round-2 CPU-optimized engines
voice-detect.cpp -> fe7e6a3 (ERes2Net 1x1->mul_mat, CAM++ layout+context,
wav2vec2 conv-LN, ECAPA capture-drop, AVX512 dispatch opt-in); face-detect.cpp
-> 9c8adb7 (AVX2 Winograd F(2x2,3x3) for SCRFD/ArcFace 3x3 convs, ArcFace
BN-fold). Parity unchanged (cosine=1.0); GGUF format unchanged, HF GGUFs valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 18:27:59 +00:00
Ettore Di Giacinto
a9c7484986 chore(recon): bump backend pins to CPU-optimized engine commits
voice-detect.cpp -> 0d9c1b3 (radix-2 FFT FBank, threads, flash attn + cached
pos-conv); face-detect.cpp -> 523aee1 (thread-gated direct conv, threads).
Brings the CPU optimizations into the LocalAI backend builds. GGUF format and
parity unchanged, so the published HF GGUFs remain valid.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 15:16:21 +00:00
Ettore Di Giacinto
e05dece93c feat(recon): honor LocalAI per-model threads in voice/face-detect backends
LocalAI spawns one backend process per model and serves requests
concurrently, so the engines' own min(hardware_concurrency, 8) default
can oversubscribe cores. Forward the per-model Threads value from the
gRPC LoadModel options into the engine via VOICEDETECT_THREADS /
FACEDETECT_THREADS (read at backend construction) before the capi load.
A non-positive Threads is treated as unset, leaving the engine default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 13:32:02 +00:00
Ettore Di Giacinto
7c2a347e79 feat(gallery): add face-detect-buffalo-sc and antelopev2 packs
Add gallery entries for two newly-published insightface face packs on
the face-detect backend: buffalo_sc (smallest pack, SCRFD-500M + small
ArcFace) and antelopev2 (higher-accuracy, SCRFD-10G + ArcFace glint360k
R100, 512-d). Both are non-commercial research-only.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 10:48:00 +00:00
Ettore Di Giacinto
6e0c491380 feat(gallery): re-embed buffalo anti-spoof + add audeering age/gender voice model
Update the 3 buffalo face-detect GGUF sha256 (anti-spoof ensemble now
embedded and re-uploaded under the same filenames/uris) and note the
FaceVerify anti_spoof request flag in each description. Add a new
voice-detect-age-gender-wav2vec2 gallery entry mirroring the emotion
model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 09:42:01 +00:00
Ettore Di Giacinto
2bcdfe2a68 chore(gallery): publish recon backend GGUF uris + sha256
Fill in the published HuggingFace GGUF uris and verified sha256 for the
9 recon gallery entries (voice-detect-* and face-detect-*), and remove
the TODO publish markers. Correct the eres2net, campplus, and
emotion-wav2vec2 uris to the actual published filenames.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 08:48:30 +00:00
Ettore Di Giacinto
b843f498ca docs(recon): document voice-detect and face-detect ggml backends
Document the new standalone C++/ggml biometric backends as the
recommended/default option for face and voice recognition, keeping the
existing Python insightface / speaker-recognition backends framed as the
legacy path.

- features/face-recognition.md: add a face-detect (ggml) backend section
  with the gallery entries (buffalo-l/m/s non-commercial, yunet-sface
  Apache-2.0), licensing, and verify/detect/analyze quickstart.
- features/voice-recognition.md: add a voice-detect (ggml) backend
  section with the gallery entries (ecapa-tdnn, wespeaker-resnet34,
  eres2net, campplus speaker recognizers; emotion-wav2vec2 non-commercial
  analyze head) and quickstart.
- reference/compatibility-table.md: add face-detect.cpp and
  voice-detect.cpp rows to the Vision, Detection & Recognition table.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 08:43:30 +00:00
Ettore Di Giacinto
46d7d59a82 fix(recon): voice-detect metal build branch + face-detect gallery usecases
Add the missing metal BUILD_TYPE branch to the voice-detect Makefile
forwarding -DVOICEDETECT_GGML_METAL=ON, mirroring face-detect, so the
darwin metal CI artifact is built with the Metal backend instead of
CPU-only.

Expand the 4 face-detect gallery models' known_usecases to
[face_recognition, detection, embeddings] to match the backend
capabilities map and the mirrored insightface-buffalo entries, so
auto-selection for /v1/detect and /embeddings works.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 07:24:54 +00:00
Ettore Di Giacinto
e3bca9a172 feat(face-detect): wire backend into index, gallery and build
Register the face-detect.cpp face detection / embedding / verification /
analysis backend (added in Face-INT-A) into LocalAI's distribution
surfaces, mirroring the voice-detect wiring (the closest mudler C++/ggml
recognition analogue):

- backend/index.yaml: add the &facedetect meta-backend (capabilities
  platform map, no top-level uri to avoid the meta-backend gotcha) plus
  the full set of concrete per-arch image entries (cpu/cuda12/cuda13/
  metal/rocm/sycl-f16/sycl-f32/vulkan/l4t and the -development variants),
  22 entries. Referential integrity audited: every alias target resolves.
- gallery/index.yaml: add 4 model entries on backend face-detect -
  face-detect-buffalo-l/m/s (insightface SCRFD + ArcFace/MBF, NON-COMMERCIAL)
  and face-detect-yunet-sface (OpenCV-Zoo YuNet + SFace, APACHE-2.0, the
  commercial-friendly alternative). The detector/embedder architecture is
  read from GGUF metadata (facedetect.arch) at load; only the real
  verify_threshold option is set (0.35 buffalo, 0.363 sface). GGUF
  artifacts are not yet published: each files: entry points at the
  intended mudler/face-detect-gguf location with a TODO to fill sha256
  after upload (no fabricated hashes).
- core/config/backend_capabilities.go: register face-detect in the
  backend capability map (Embedding/Detect/FaceVerify/FaceAnalyze ->
  face_recognition), mirroring insightface.
- .github/backend-matrix.yml: add the linux build matrix block + the
  darwin metal entry mirroring voice-detect.
- .github/workflows/bump_deps.yaml: track mudler/face-detect.cpp via
  FACEDETECT_VERSION (pin 636a1963).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 07:13:45 +00:00
Ettore Di Giacinto
a19ab22186 fix(voice-detect): replace em dashes in net-new descriptions
Project style forbids em/en dashes. Replace the three U+2014 chars
introduced by the voice-detect gallery/index wiring with `-`/`:`.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 00:28:58 +00:00
Ettore Di Giacinto
91d08d88e6 feat(face-detect): add purego Go backend for face-detect.cpp
Add the LocalAI Go backend that dlopens libfacedetect.so (the flat
facedetect_capi_* C-ABI) via purego, mirroring the sibling voice-detect
backend. Implements the Face subset of the Backend gRPC service:

- Embeddings(PredictOptions): Images[0] base64 -> temp file -> embed_path
  -> L2-normalized ArcFace embedding.
- Detect(DetectOptions): src -> detect_path_json -> Detection boxes
  (class_name "face", [x1,y1,x2,y2] -> x/y/w/h).
- FaceVerify(FaceVerifyRequest): two images + threshold + anti_spoof ->
  verify_paths; best-effort img areas via detect.
- FaceAnalyze(FaceAnalyzeRequest): img -> analyze_path_json -> per-face
  age + gender ("M"/"F" normalized to "Man"/"Woman").

The Makefile pins face-detect.cpp to 636a1963 and builds the shared lib
with ggml + vendored libjpeg-turbo static (PIC), so the .so is
ldd-clean (no libggml) and exports only facedetect_capi_* (no jpeg_
symbols). Gated Ginkgo e2e mirrors voice-detect.

Note for the gallery-wiring task: backend registration (index.yaml,
gallery, core/config/backend_capabilities.go) is intentionally not
touched here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 00:26:15 +00:00
Ettore Di Giacinto
2c5ed413cb feat(voice-detect): wire backend into index, gallery and build
Register the voice-detect.cpp speaker-recognition + voice-analysis
backend (added in Voice-INT-A) into LocalAI's distribution surfaces,
mirroring the ced backend (the closest mudler C++/ggml audio analogue):

- backend/index.yaml: add the &voicedetect meta-backend (capabilities
  platform map, no top-level uri) plus the full set of concrete per-arch
  image entries (cpu/cuda12/cuda13/metal/rocm/sycl/vulkan/l4t and the
  -development variants). Referential integrity audited - every alias
  target resolves.
- gallery/index.yaml: add 5 model entries on backend voice-detect -
  ECAPA-TDNN, WeSpeaker ResNet34, 3D-Speaker ERes2Net, CAM++ and the
  wav2vec2 age/gender/emotion analyze model. The engine architecture is
  read from GGUF metadata (voicedetect.arch) at load. GGUF artifacts are
  not yet published: each files: entry points at the intended
  mudler/voice-detect-gguf location with a TODO to fill sha256 after
  upload (no fabricated hashes).
- .github/backend-matrix.yml: add the linux build matrix block + the
  darwin metal entry mirroring ced.
- .github/workflows/bump_deps.yaml: track mudler/voice-detect.cpp via
  VOICEDETECT_VERSION (pin 47546430, = 4754643).
- core/config/backend_capabilities.go: register voice-detect in the
  backend capability map (VoiceVerify/VoiceEmbed/VoiceAnalyze ->
  speaker_recognition), mirroring speaker-recognition.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 00:15:00 +00:00
Ettore Di Giacinto
01e098a844 feat(voice-detect): add Go purego backend for voice-detect.cpp
Add backend/go/voice-detect implementing the Backend gRPC voice subset
(VoiceEmbed/VoiceVerify/VoiceAnalyze) over libvoicedetect.so via purego,
mirroring the parakeet-cpp / omnivoice-cpp backends.

The flat voicedetect_capi C ABI is dlopen'd cgo-less; malloc'd string and
float-vector returns are owned by Go and released through the matching capi
free functions, with the per-ctx last error surfaced into Go errors. Calls are
serialized via base.SingleThread since the C context is not reentrant.

Proto field mapping:
- VoiceEmbed: VoiceEmbedRequest.audio (path) -> embed_path -> Embedding+Model.
- VoiceVerify: audio1/audio2 + threshold (<=0 falls back to the
  verify_threshold option, default 0.25) -> verify_paths -> verified/distance/
  threshold/confidence/model/processing_time_ms.
- VoiceAnalyze: audio (path) -> analyze_path_json; the JSON age/gender/emotion
  document maps to a single VoiceAnalysis segment (start/end 0; gender "label"
  -> dominant_gender with the remaining float scores as the gender map; emotion
  label/scores -> dominant_emotion/emotion).

The Makefile pins voice-detect.cpp to 47546430, clones+builds libvoicedetect.so
with ggml static-linked (PIC, GGML_NATIVE off) so dlopen needs no external
libggml/libvoicedetect; ldd on the artifact shows only system libs. Ginkgo
tests cover option parsing and analyze-JSON mapping; embed/verify smoke specs
gate on VOICEDETECT_BACKEND_TEST_MODEL + VOICEDETECT_BACKEND_TEST_WAV.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-22 00:00:32 +00:00
364 changed files with 3451 additions and 12931 deletions

View File

@@ -102,24 +102,6 @@ Multi-arch backends are NOT a single matrix entry with `platforms: 'linux/amd64,
Entries whose `dockerfile` is `./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` must also set a `builder-base-image` field pointing at a prebuilt base from `quay.io/go-skynet/ci-cache:base-grpc-*` (CI builds these via `.github/workflows/base-images.yml`). The mapping is by `(build-type, platforms)` — see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~2535 min cold). Local `make backends/<name>` ignores `builder-base-image` and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds.
### Cover every OS the project supports (Linux **and** Darwin)
`.github/backend-matrix.yml` has two matrices, and they are the source of truth for which OS a backend ships on:
- `include:` — the **Linux** matrix (x86_64 + arm64; CPU and CUDA / ROCm / SYCL / Vulkan).
- `includeDarwin:` — the **macOS / Apple Silicon** matrix (arm64; Metal where the engine supports it, otherwise a native arm64 CPU build).
**A new backend must target every OS it can build for — do not ship Linux-only by default.** A backend that appears only under `include:` is silently unavailable on macOS even when its code would run there. Most C/C++/GGML engines build on Darwin out of the box (ggml defaults `GGML_METAL=ON` on Apple, so a plain build is Metal-enabled), and many Python backends do too (CPU / MPS wheels). If a backend genuinely cannot support an OS (e.g. CUDA-only, no CPU variant), state that in the PR description instead of omitting it silently.
Wiring a backend into `includeDarwin:` is more than the matrix entry:
1. **`includeDarwin:` entry** — `tag-suffix: "-metal-darwin-arm64-<backend>"`, `build-type: "metal"`, `lang: "go"` for go+ggml backends; omit `build-type` for the bespoke C++ ones (llama-cpp / ds4 / privacy-filter). Match an existing entry of the same shape.
2. **`backend/index.yaml`** — add `metal:` to the backend's `capabilities` map (main and `-development`) and concrete `metal-<backend>` / `metal-<backend>-development` image entries pointing at the `-metal-darwin-arm64-<backend>` images.
3. **C/C++ backends only** — add an `inferBackendPathDarwin` case in `scripts/changed-backends.js` returning `backend/cpp/<backend>/` (the generic fallthrough assumes `backend/<lang>/`, which is wrong for a C++ source tree driven with `lang: go`), and give `run.sh` a Darwin branch that exports `DYLD_LIBRARY_PATH` instead of `LD_LIBRARY_PATH`. If the build is bespoke (single `grpc-server` + dylib bundling), model it on `scripts/build/ds4-darwin.sh` and add a `backends/<backend>-darwin` make target plus a gated step in `.github/workflows/backend_build_darwin.yml`.
4. **C++ proto gotcha** — if the backend compiles the generated gRPC/protobuf in a separate CMake target (e.g. `hw_grpc_proto`), that target must link `protobuf::libprotobuf` + `gRPC::grpc++` so the Homebrew include dirs propagate; otherwise macOS fails with `google/protobuf/runtime_version.h not found` (Linux hides this because apt headers sit in `/usr/include`).
The CI path filter only builds a backend on a PR when a file under its directory changes, so a darwin-only YAML edit builds nothing — touch a file under `backend/<lang>/<backend>/` (a one-line comment is enough) in the same PR.
## 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
@@ -243,7 +225,6 @@ After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/backend-matrix.yml` for all desired platforms (per-arch entries with `platform-tag` for multi-arch; `builder-base-image` for llama-cpp / ik-llama-cpp / turboquant)
- [ ] **OS coverage considered**: added to `includeDarwin:` (macOS/Apple Silicon) if the backend can build there — with the `backend/index.yaml` `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD branch and `inferBackendPathDarwin` case for C++ backends — or the PR explains why an OS is unsupported. Do not ship Linux-only by default.
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml

View File

@@ -17,29 +17,19 @@ if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
fi
cd /LocalAI/backend/cpp/llama-cpp
if [ -z "${BUILD_TYPE:-}" ]; then
# Pure CPU image (BUILD_TYPE empty): one build with ggml CPU_ALL_VARIANTS replaces the
# per-microarch binaries (x86: avx/avx2/avx512/fallback; arm64: armv8.x/armv9.x). ggml
# dlopens the best libggml-cpu-*.so at runtime by probing host CPU features.
#
# arm64: the CPU_ALL_VARIANTS table includes armv9.2 SME variants whose -march=...+sme is
# rejected by the Ubuntu 24.04 default gcc-13. gcc-14 accepts it, so build the arm64
# variants with it (the host never *selects* SME unless it has it, but every variant must
# still compile).
if [ "${TARGETARCH}" = "arm64" ]; then
apt-get update -qq && apt-get install -y -qq gcc-14 g++-14
export CC=gcc-14 CXX=g++-14
fi
make llama-cpp-cpu-all
else
# GPU build (cublas/hipblas/sycl/vulkan/...): the accelerator does the compute, so a
# single fallback CPU build is enough - no per-microarch CPU variants needed. (This also
# keeps the heavy GPU backend compile from also building the whole CPU variant matrix,
# and avoids the gcc-14 apt step on GPU base images such as nvidia l4t.)
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
else
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-avx
make llama-cpp-avx2
make llama-cpp-avx512
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
fi
make llama-cpp-grpc
make llama-cpp-rpc-server
ccache -s || true

View File

@@ -19,21 +19,17 @@ fi
cd /LocalAI/backend/cpp/turboquant
if [ -z "${BUILD_TYPE:-}" ]; then
# Pure CPU image: one ggml CPU_ALL_VARIANTS build replaces the per-microarch binaries.
# arm64: the armv9.2 SME variants need gcc-14 (gcc-13 rejects +sme).
if [ "${TARGETARCH}" = "arm64" ]; then
apt-get update -qq && apt-get install -y -qq gcc-14 g++-14
export CC=gcc-14 CXX=g++-14
fi
make turboquant-cpu-all
else
# GPU build (cublas/hipblas/sycl/vulkan/...): single fallback CPU build, the accelerator
# does the compute. Keeps the GPU compile from also building the CPU variant matrix and
# avoids the gcc-14 apt step on GPU base images such as nvidia l4t.
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
make turboquant-fallback
make turboquant-grpc
make turboquant-rpc-server
else
make turboquant-avx
make turboquant-avx2
make turboquant-avx512
make turboquant-fallback
make turboquant-grpc
make turboquant-rpc-server
fi
make turboquant-grpc
make turboquant-rpc-server
ccache -s || true

View File

@@ -2,28 +2,6 @@
# Matrix data for backend container image builds.
# Consumed by scripts/changed-backends.js for both backend.yml and backend_pr.yml.
# This file is NOT a workflow — it has no top-level 'on:' or 'jobs:'.
#
# OS / platform coverage — READ THIS WHEN ADDING A BACKEND
# --------------------------------------------------------
# This file is the source of truth for which OS each backend is built and
# published for. A backend ships ONLY for the matrices it appears in:
# - Linux -> the `include:` matrix below (x86_64 + arm64; CPU and
# CUDA / ROCm / SYCL / Vulkan variants).
# - macOS -> the `includeDarwin:` matrix (Apple Silicon / arm64; Metal where
# the engine supports it, otherwise a native arm64 CPU build).
#
# New backends must target EVERY OS they can build for, not just Linux. A backend
# listed only under `include:` is silently unavailable on macOS even when its code
# would run there. Most C/C++/GGML engines build on Darwin (ggml defaults
# GGML_METAL=ON on Apple, so a plain build is Metal-enabled), and many Python
# backends do too (CPU / MPS). If a backend genuinely cannot support an OS, say so
# in its PR description rather than silently omitting it.
#
# Adding a backend to `includeDarwin:` is more than one line — see the darwin
# checklist in .agents/adding-backends.md (includeDarwin entry, the index.yaml
# `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD
# branch for C/C++ backends, and the inferBackendPathDarwin case in
# scripts/changed-backends.js so the path filter actually builds it).
# Linux matrix (consumed by backend-jobs).
include:
@@ -5248,37 +5226,6 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
build-type: "metal"
lang: "go"
# Vision/utility C++/ggml backends (go+cgo). Their Makefiles already carry a
# Darwin/Metal path (GGML_METAL=ON when build-type=metal); this just builds and
# publishes the metal image so Apple Silicon can install them.
- backend: "depth-anything-cpp"
tag-suffix: "-metal-darwin-arm64-depth-anything-cpp"
build-type: "metal"
lang: "go"
- backend: "locate-anything-cpp"
tag-suffix: "-metal-darwin-arm64-locate-anything-cpp"
build-type: "metal"
lang: "go"
- backend: "rfdetr-cpp"
tag-suffix: "-metal-darwin-arm64-rfdetr-cpp"
build-type: "metal"
lang: "go"
- backend: "sam3-cpp"
tag-suffix: "-metal-darwin-arm64-sam3-cpp"
build-type: "metal"
lang: "go"
# privacy-filter (PII/NER) is a C++/ggml backend built by a bespoke darwin
# script (make backends/privacy-filter-darwin); ggml defaults Metal ON on Apple
# so the build is Metal-enabled. lang=go drives runner/toolchain selection only.
- backend: "privacy-filter"
tag-suffix: "-metal-darwin-arm64-privacy-filter"
lang: "go"
# LocalVQE has no Metal path; on Apple Silicon it builds CPU-only (GGML_METAL
# OFF) but is still a native arm64 image. Uses the darwin/metal build profile.
- backend: "localvqe"
tag-suffix: "-metal-darwin-arm64-localvqe"
build-type: "metal"
lang: "go"
- backend: "voxtral"
tag-suffix: "-metal-darwin-arm64-voxtral"
build-type: "metal"
@@ -5295,6 +5242,9 @@ includeDarwin:
- backend: "qwen-tts"
tag-suffix: "-metal-darwin-arm64-qwen-tts"
build-type: "mps"
- backend: "fish-speech"
tag-suffix: "-metal-darwin-arm64-fish-speech"
build-type: "mps"
- backend: "voxcpm"
tag-suffix: "-metal-darwin-arm64-voxcpm"
build-type: "mps"
@@ -5328,19 +5278,6 @@ includeDarwin:
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
# vLLM on Apple Silicon via vllm-metal (MLX). The install is custom
# (backend/python/vllm/install.sh has a darwin branch); lang stays python so
# backend_build_darwin.yml drives it through build-darwin-python-backend ->
# scripts/build/python-darwin.sh, which runs the backend's install.sh.
- backend: "vllm"
tag-suffix: "-metal-darwin-arm64-vllm"
build-type: "mps"
- backend: "trl"
tag-suffix: "-metal-darwin-arm64-trl"
build-type: "mps"
- backend: "liquid-audio"
tag-suffix: "-metal-darwin-arm64-liquid-audio"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"
@@ -5357,10 +5294,6 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
build-type: "metal"
lang: "go"
- backend: "supertonic"
tag-suffix: "-metal-darwin-arm64-supertonic"
build-type: "metal"
lang: "go"
- backend: "local-store"
tag-suffix: "-metal-darwin-arm64-local-store"
build-type: "metal"

View File

@@ -1,55 +0,0 @@
#!/bin/bash
# Bump the single vllm-metal pin (VLLM_METAL_VERSION) in the vLLM backend's
# darwin (Apple Silicon) install path. The macOS/Metal build
# (backend/python/vllm/install.sh, Darwin branch) installs vllm-metal, which is
# version-locked to a specific vLLM source release. install.sh derives that vLLM
# version at build time from vllm-metal's own installer (`vllm_v=`) at the pinned
# tag, so there is only ONE value to bump here -- mirroring bump_vllm_wheel.sh,
# which bumps the Linux cu130 wheel pin.
#
# This deliberately tracks vllm-project/vllm-metal, NOT vllm-project/vllm: the
# darwin build can only use the exact vLLM version vllm-metal supports, so it may
# lag the Linux pin (requirements-cublas13-after.txt) until vllm-metal catches up.
set -xe
REPO=$1 # vllm-project/vllm-metal
FILE=$2 # backend/python/vllm/install.sh
VAR=$3 # VLLM_METAL_VERSION (used for the workflow's output file names)
if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then
echo "usage: $0 <repo> <install-file> <var-name>" >&2
exit 1
fi
# vllm-metal ships frequent dev releases, all flagged as non-prerelease, so
# /releases/latest returns the newest one (with its cp312 wheel asset).
LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \
"https://api.github.com/repos/$REPO/releases/latest" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])")
# The coupled vLLM source version lives in vllm-metal's installer at that tag.
NEW_VLLM_VERSION=$(curl -fsSL \
"https://raw.githubusercontent.com/$REPO/$LATEST_TAG/install.sh" \
| grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -1 | cut -d'"' -f2)
if [ -z "$LATEST_TAG" ] || [ -z "$NEW_VLLM_VERSION" ]; then
echo "Could not resolve vllm-metal tag ($LATEST_TAG) or its vllm_v ($NEW_VLLM_VERSION)." >&2
exit 1
fi
set +e
CURRENT_TAG=$(grep -oE 'VLLM_METAL_VERSION="[^"]*"' "$FILE" | head -1 | cut -d'"' -f2)
set -e
# Rewrite the single pin. install.sh derives VLLM_VERSION from this tag at build
# time, so there is nothing else to touch. peter-evans/create-pull-request opens
# no PR on a clean tree, so a no-op rewrite (already current) is safe.
sed -i "$FILE" \
-e "s|VLLM_METAL_VERSION=\"[^\"]*\"|VLLM_METAL_VERSION=\"$LATEST_TAG\"|"
if [ -z "$CURRENT_TAG" ]; then
echo "Could not find VLLM_METAL_VERSION=\"...\" in $FILE." >&2
exit 0
fi
echo "vllm-metal ${CURRENT_TAG} -> ${LATEST_TAG} (builds vLLM ${NEW_VLLM_VERSION}): https://github.com/$REPO/releases/tag/${LATEST_TAG}" >> "${VAR}_message.txt"
echo "${LATEST_TAG}" >> "${VAR}_commit.txt"

View File

@@ -44,7 +44,7 @@ jobs:
has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
steps:
- name: Checkout repository
uses: actions/checkout@v7
uses: actions/checkout@v6
- name: Setup Bun
uses: oven-sh/setup-bun@v2

View File

@@ -101,7 +101,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true

View File

@@ -57,7 +57,7 @@ jobs:
HOMEBREW_NO_ANALYTICS: '1'
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
@@ -99,7 +99,6 @@ jobs:
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
/opt/homebrew/Cellar/opus
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
@@ -114,12 +113,7 @@ jobs:
# nlohmann-json is header-only and required by the ds4 backend
# (dsml_renderer.cpp includes <nlohmann/json.hpp>); on Linux it comes
# from the apt-installed nlohmann-json3-dev in the build image.
# opus + pkg-config are required by the opus go backend: its
# Makefile/package.sh call `pkg-config --cflags/--libs opus` to build
# libopusshim.dylib and to locate libopus.dylib for bundling. brew's
# pkg-config defaults its search path to the Homebrew prefix so the
# opus.pc is found.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json opus pkg-config
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json
# Force-reinstall ccache so brew re-validates its full runtime-dep
# closure on every run. This is the durable fix: when the upstream
# ccache formula gains a new transitive dep (as it has multiple times
@@ -138,7 +132,7 @@ jobs:
# and decides "already installed" without re-linking, so on a cache-
# hit run the formulas aren't on PATH. Force-link them; --overwrite
# tolerates pre-existing symlinks from earlier installs.
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json opus pkg-config 2>/dev/null || true
brew link --overwrite protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache blake3 fmt hiredis xxhash zstd nlohmann-json 2>/dev/null || true
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
@@ -159,7 +153,6 @@ jobs:
/opt/homebrew/Cellar/xxhash
/opt/homebrew/Cellar/zstd
/opt/homebrew/Cellar/nlohmann-json
/opt/homebrew/Cellar/opus
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----
@@ -235,17 +228,8 @@ jobs:
run: |
make backends/ds4-darwin
# privacy-filter is a C++/ggml backend like ds4 - a single grpc-server with
# otool dylib bundling - so it gets its own bespoke darwin script rather than
# the generic build-darwin-go-backend path.
- name: Build privacy-filter backend (Darwin Metal)
if: inputs.backend == 'privacy-filter'
run: |
make protogen-go
make backends/privacy-filter-darwin
- name: Build ${{ inputs.backend }}-darwin
if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4'
run: |
make protogen-go
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend

View File

@@ -49,7 +49,7 @@ jobs:
# Sparse checkout: the merge job needs `.github/scripts/` (for the
# keepalive cleanup script) but none of the source tree.
- name: Checkout (.github/scripts only)
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
sparse-checkout: |
.github/scripts

View File

@@ -23,7 +23,7 @@ jobs:
has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
steps:
- name: Checkout repository
uses: actions/checkout@v7
uses: actions/checkout@v6
- name: Setup Bun
uses: oven-sh/setup-bun@v2

View File

@@ -127,7 +127,7 @@ jobs:
# the original l4t matrix entry which set skip-drivers: 'true'.
skip-drivers: 'true'
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
with:
submodules: false
- name: Free disk space

View File

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -25,7 +25,7 @@ jobs:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -47,7 +47,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Configure apt mirror on runner

View File

@@ -14,7 +14,7 @@ jobs:
bump:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- uses: actions/setup-go@v5
with:

View File

@@ -100,7 +100,7 @@ jobs:
file: "backend/go/vibevoice-cpp/Makefile"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- name: Bump dependencies 🔧
id: bump
run: |
@@ -136,7 +136,7 @@ jobs:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- name: Bump vLLM cu130 wheel pin 🔧
id: bump
run: |
@@ -162,39 +162,3 @@ jobs:
branch: "update/VLLM_VERSION"
body: ${{ steps.bump.outputs.message }}
signoff: true
bump-vllm-metal:
# The darwin (Apple Silicon) vLLM build installs vllm-metal, which is locked
# to a specific vLLM source release. install.sh pins both VLLM_METAL_VERSION
# (the wheel release) and VLLM_VERSION (the vLLM it builds against); this job
# tracks vllm-project/vllm-metal and rewrites both atomically. Separate from
# bump-vllm-wheel because darwin follows vllm-metal, not vllm/vllm latest.
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- name: Bump vllm-metal pin 🔧
id: bump
run: |
bash .github/bump_vllm_metal.sh vllm-project/vllm-metal backend/python/vllm/install.sh VLLM_METAL_VERSION
{
echo 'message<<EOF'
cat "VLLM_METAL_VERSION_message.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
{
echo 'commit<<EOF'
cat "VLLM_METAL_VERSION_commit.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
rm -rfv VLLM_METAL_VERSION_message.txt VLLM_METAL_VERSION_commit.txt
- name: Create Pull Request
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
commit-message: ':arrow_up: Update vllm-project/vllm-metal (darwin)'
title: 'chore: :arrow_up: Update vllm-metal (darwin) to `${{ steps.bump.outputs.commit }}`'
branch: "update/VLLM_METAL_VERSION"
body: ${{ steps.bump.outputs.message }}
signoff: true

View File

@@ -13,7 +13,7 @@ jobs:
- repository: "mudler/LocalAI"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- name: Bump dependencies 🔧
run: |
bash .github/bump_docs.sh ${{ matrix.repository }}

View File

@@ -8,7 +8,7 @@ jobs:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Install dependencies

View File

@@ -16,7 +16,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- uses: actions/setup-go@v5

View File

@@ -31,7 +31,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -44,7 +44,7 @@ jobs:
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
- name: Cache Intel images
uses: docker/build-push-action@v7

View File

@@ -28,7 +28,7 @@ jobs:
HUGO_VERSION: "0.146.3"
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0 # needed for enableGitInfo
submodules: true

View File

@@ -80,7 +80,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
- name: Configure apt mirror on runner
id: apt_mirror

View File

@@ -36,7 +36,7 @@ jobs:
# Sparse checkout: needed for .github/scripts/ (the keepalive cleanup
# script). Skips the rest of the source tree.
- name: Checkout (.github/scripts only)
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
sparse-checkout: |
.github/scripts

View File

@@ -20,7 +20,7 @@ jobs:
golangci-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
with:
# Full history so golangci-lint's new-from-merge-base can reach
# origin/master and compute the diff against it.

View File

@@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -24,35 +24,20 @@ jobs:
args: release --clean
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
MACOS_SIGN_P12: ${{ secrets.MACOS_CERTIFICATE }}
MACOS_SIGN_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
launcher-build-darwin:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: 1.23
- name: Import signing certificate
env:
MACOS_CERTIFICATE: ${{ secrets.MACOS_CERTIFICATE }}
MACOS_CERTIFICATE_PWD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
MACOS_CI_KEYCHAIN_PWD: ${{ secrets.MACOS_CI_KEYCHAIN_PWD }}
run: bash contrib/macos/sign-and-notarize.sh import-cert
- name: Build, sign and notarize the DMG
env:
MACOS_SIGN_IDENTITY: ${{ secrets.MACOS_SIGN_IDENTITY }}
MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
run: make release-launcher-darwin
- name: Build launcher for macOS ARM64
run: |
make build-launcher-darwin
- name: Upload DMG to Release
uses: softprops/action-gh-release@v3
with:
@@ -61,7 +46,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Configure apt mirror on runner

View File

@@ -14,7 +14,7 @@ jobs:
GO111MODULE: on
steps:
- name: Checkout Source
uses: actions/checkout@v7
uses: actions/checkout@v6
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}

View File

@@ -50,7 +50,7 @@ jobs:
parakeet-cpp: ${{ steps.detect.outputs.parakeet-cpp }}
steps:
- name: Checkout repository
uses: actions/checkout@v7
uses: actions/checkout@v6
- name: Setup Bun
uses: oven-sh/setup-bun@v2
- name: Install dependencies
@@ -67,7 +67,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -90,7 +90,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -113,7 +113,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -137,7 +137,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -158,7 +158,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -178,7 +178,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -240,7 +240,7 @@ jobs:
# sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
# df -h
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -265,7 +265,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -288,7 +288,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -309,7 +309,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -330,7 +330,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -351,7 +351,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -373,7 +373,7 @@ jobs:
# timeout-minutes: 45
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -394,7 +394,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -415,7 +415,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -436,7 +436,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -462,7 +462,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -484,7 +484,7 @@ jobs:
timeout-minutes: 30
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -513,7 +513,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -530,7 +530,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -552,7 +552,7 @@ jobs:
timeout-minutes: 20
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -579,7 +579,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -604,7 +604,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -625,7 +625,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -645,7 +645,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -664,7 +664,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -681,7 +681,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -698,7 +698,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -741,7 +741,7 @@ jobs:
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -783,7 +783,7 @@ jobs:
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v7
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -808,7 +808,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -840,7 +840,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -876,7 +876,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -915,7 +915,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -952,7 +952,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -987,7 +987,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -1008,16 +1008,12 @@ jobs:
# image + working dir.
tests-vibevoice-cpp-grpc-transcription:
needs: detect-changes
# Skip on release tag pushes: the ASR Q4_K model is ~10 GB and cannot be
# pulled from HF within the inner `go test -timeout 30m` budget on a CI
# runner, so every tag build hung and timed out. Still runs on PRs/branch
# pushes that touch vibevoice-cpp so regressions are caught off the release path.
if: (needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true') && !startsWith(github.ref, 'refs/tags/')
if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: bigger-runner
timeout-minutes: 150
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -1046,7 +1042,7 @@ jobs:
timeout-minutes: 60
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
@@ -1062,7 +1058,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -1095,7 +1091,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -1118,7 +1114,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -1144,7 +1140,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies

View File

@@ -21,7 +21,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Free disk space
@@ -84,7 +84,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
@@ -121,19 +121,3 @@ jobs:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
# Fast standalone unit tests for the backends' pure C++ helpers - currently the
# llama-cpp message reconstruction (backend/cpp/llama-cpp/message_content.h),
# which guards the OpenAI chat content normalization (mudler/LocalAI#10524,
# #7324, #7528). The runner discovers every *_test.cpp under backend/cpp/, so
# new pure-C++ unit tests are picked up with no CI changes. These need only the
# C++ stdlib + nlohmann/json, so they run on every PR without the full
# llama.cpp + gRPC backend build. (The same suite is also wired as an opt-in
# CMake/ctest target, -DLLAMA_GRPC_BUILD_TESTS=ON, for in-backend-build runs.)
tests-backend-cpp:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v7
- name: Run backend C++ unit tests
run: make test-backend-cpp

View File

@@ -62,7 +62,7 @@ jobs:
sudo rm -rfv build || true
df -h
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies

View File

@@ -21,7 +21,7 @@ jobs:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Configure apt mirror on runner

View File

@@ -1,97 +0,0 @@
---
name: 'PII NER tier E2E (live GGUF, CPU)'
# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the
# hermetic tests/e2e suite cannot cover (it only exercises the in-process
# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB
# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand.
#
# This drives the container-level harness (tests/e2e-backends) via
# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image,
# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned
# TokenClassify spans. The complementary HTTP-path specs in tests/e2e
# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired.
on:
workflow_dispatch:
schedule:
- cron: '0 3 * * *'
push:
branches:
- master
paths:
- 'backend/cpp/privacy-filter/**'
- 'backend/Dockerfile.privacy-filter'
- 'core/services/routing/pii/**'
- 'core/services/routing/piidetector/**'
- 'core/backend/token_classify.go'
- 'core/http/endpoints/localai/pii.go'
- 'core/schema/pii.go'
- 'tests/e2e-backends/**'
- 'tests/e2e/e2e_pii_ner_test.go'
- 'tests/e2e/e2e_suite_test.go'
- '.github/workflows/tests-pii-ner-e2e.yml'
pull_request:
paths:
- 'backend/cpp/privacy-filter/**'
- 'backend/Dockerfile.privacy-filter'
- 'core/services/routing/pii/**'
- 'core/services/routing/piidetector/**'
- 'core/backend/token_classify.go'
- 'core/http/endpoints/localai/pii.go'
- 'core/schema/pii.go'
- 'tests/e2e-backends/**'
- 'tests/e2e/e2e_pii_ner_test.go'
- 'tests/e2e/e2e_suite_test.go'
- '.github/workflows/tests-pii-ner-e2e.yml'
concurrency:
group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
tests-pii-ner-e2e:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v7
with:
submodules: true
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
sudo docker image prune --all --force || true
df -h
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
- name: Proto Dependencies
run: |
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential
# Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on
# CPU and runs the token_classify capability spec (byte-offset contract).
- name: Run live PII NER backend E2E
run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

View File

@@ -23,7 +23,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v7
uses: actions/checkout@v6
with:
submodules: true
- name: Configure apt mirror on runner

View File

@@ -10,7 +10,7 @@ jobs:
fail-fast: false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- uses: actions/checkout@v6
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- uses: actions/setup-go@v5

6
.gitignore vendored
View File

@@ -91,9 +91,3 @@ core/http/react-ui/test-results/
# Local worktrees
.worktrees/
# SDD / brainstorm scratch (agent-driven development)
.superpowers/
# Local Apple signing material (never commit)
.certs/

View File

@@ -9,8 +9,7 @@ source:
enabled: true
name_template: '{{ .ProjectName }}-{{ .Tag }}-source'
builds:
- id: local-ai
main: ./cmd/local-ai
- main: ./cmd/local-ai
env:
- CGO_ENABLED=0
ldflags:
@@ -36,19 +35,3 @@ snapshot:
version_template: "{{ .Tag }}-next"
changelog:
use: github-native
# Sign + notarize the macOS server binary via the quill backend (runs on Linux,
# no macOS runner needed). Disabled automatically when MACOS_SIGN_P12 is unset
# (forks / PRs), so those builds stay unsigned and green.
notarize:
macos:
- enabled: '{{ isEnvSet "MACOS_SIGN_P12" }}'
ids:
- local-ai
sign:
certificate: "{{.Env.MACOS_SIGN_P12}}"
password: "{{.Env.MACOS_SIGN_PASSWORD}}"
notarize:
issuer_id: "{{.Env.MACOS_NOTARY_ISSUER_ID}}"
key_id: "{{.Env.MACOS_NOTARY_KEY_ID}}"
key: "{{.Env.MACOS_NOTARY_KEY}}"
wait: true

View File

@@ -43,5 +43,4 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **Backend OS coverage**: a new backend must target every OS it can build for, not just Linux. `.github/backend-matrix.yml` has two matrices — `include:` (Linux) and `includeDarwin:` (macOS / Apple Silicon). Most C/C++/GGML and many Python backends build on Darwin too — wire the `includeDarwin` entry + `backend/index.yaml` `metal:` entries, or say in the PR why an OS is unsupported. See the darwin checklist in [.agents/adding-backends.md](.agents/adding-backends.md).
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter backends/privacy-filter-darwin
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter
GOCMD=go
GOTEST=$(GOCMD) test
@@ -103,7 +103,7 @@ COVERAGE_E2E_LABELS?=!real-models
COVERAGE_EXCLUDE_RE?=grpc/proto/.*[.]pb[.]go
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-backend-cpp test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
all: help
@@ -201,13 +201,6 @@ test: prepare-test
OPUS_SHIM_LIBRARY=$(abspath ./pkg/opus/shim/libopusshim.so) \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
## Compiles and runs the standalone C++ unit tests for the backends (pure
## helpers that depend only on the stdlib + nlohmann/json, no full backend
## build). Discovers every *_test.cpp under backend/cpp/ - see
## backend/cpp/run-unit-tests.sh. Set NLOHMANN_INCLUDE to skip the header fetch.
test-backend-cpp:
bash backend/cpp/run-unit-tests.sh
## Runs the core suite ($(TEST_PATHS)) with statement-coverage instrumentation
## and writes a merged profile to $(COVERAGE_PROFILE). Deliberately omits
## --fail-fast so a single failure doesn't truncate the coverage number, and
@@ -697,16 +690,6 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
BACKEND_TEST_CTX_SIZE=2048 \
$(MAKE) test-extra-backend
## privacy-filter: the PII/NER token-classification backend. Exercises the
## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets
## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M
## active params). This is the live-backend coverage for the PII NER tier.
test-extra-backend-privacy-filter: docker-build-privacy-filter
BACKEND_IMAGE=local-ai-backend:privacy-filter \
BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \
BACKEND_TEST_CAPS=health,load,token_classify \
$(MAKE) test-extra-backend
## vllm is resolved from a HuggingFace model id (no file download) and
## exercises Predict + streaming + tool-call extraction via the hermes parser.
## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU
@@ -1136,10 +1119,6 @@ backends/ds4-darwin: build
bash ./scripts/build/ds4-darwin.sh
./local-ai backends install "ocifile://$(abspath ./backend-images/ds4.tar)"
backends/privacy-filter-darwin: build
bash ./scripts/build/privacy-filter-darwin.sh
./local-ai backends install "ocifile://$(abspath ./backend-images/privacy-filter.tar)"
build-darwin-python-backend: build
bash ./scripts/build/python-darwin.sh
@@ -1460,32 +1439,13 @@ docs: docs/static/gallery.html
########################################################
## fyne cross-platform build
# Build LocalAI.app from the launcher via fyne (metadata read from cmd/launcher/FyneApp.toml).
# Signing happens via contrib/macos/sign-and-notarize.sh, which is a no-op when the signing
# secrets are unset, so unsigned local/fork builds keep working.
build-launcher-darwin:
rm -rf dist/LocalAI.app cmd/launcher/LocalAI.app
mkdir -p dist
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os darwin -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)
mv cmd/launcher/LocalAI.app dist/LocalAI.app
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.app
# Wrap the (signed) app into a drag-to-Applications DMG via hdiutil, then sign the DMG.
dmg-launcher-darwin: build-launcher-darwin
rm -rf dist/dmg dist/LocalAI.dmg
mkdir -p dist/dmg
cp -R dist/LocalAI.app dist/dmg/LocalAI.app
ln -s /Applications dist/dmg/Applications
hdiutil create -volname "LocalAI" -srcfolder dist/dmg -ov -format UDZO dist/LocalAI.dmg
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.dmg
# Submit the DMG to Apple notarization and staple the ticket (no-op without notary secrets).
notarize-launcher-darwin: dmg-launcher-darwin
bash contrib/macos/sign-and-notarize.sh notarize dist/LocalAI.dmg
# Single entrypoint for CI: build -> sign app -> dmg -> sign dmg -> notarize -> staple.
release-launcher-darwin: notarize-launcher-darwin
@echo "dist/LocalAI.dmg is ready"
build-launcher-darwin: build-launcher
go run github.com/tiagomelo/macos-dmg-creator/cmd/createdmg@latest \
--appName "LocalAI" \
--appBinaryPath "$(LAUNCHER_BINARY_NAME)" \
--bundleIdentifier "com.localai.launcher" \
--iconPath "core/http/static/logo.png" \
--outputDir "dist/"
build-launcher-linux:
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv LocalAI.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv launcher.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz

View File

@@ -1,6 +1,15 @@
## Multimodal support is provided by the in-tree `mtmd` library target
## (examples/mtmd/), which the grpc-server links and includes below. clip/llava
## were pruned upstream; the high-level mtmd_* / mtmd_helper_* API is used instead.
## Clip/LLaVA library for multimodal support — built locally from copied sources
set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..)
target_include_directories(myclip PUBLIC ../../common)
target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if (NOT MSVC)
target_compile_options(${TARGET} PRIVATE -Wno-cast-qual)
endif()
set(TARGET grpc-server)
set(CMAKE_CXX_STANDARD 17)
@@ -58,16 +67,12 @@ add_library(hw_grpc_proto
${hw_proto_hdrs} )
add_executable(${TARGET} grpc-server.cpp json.hpp)
# mtmd public headers (mtmd.h / mtmd-helper.h) live in examples/mtmd/.
# Linking the mtmd target also propagates this include dir, but we add it
# explicitly for clarity.
target_include_directories(${TARGET} PRIVATE ../mtmd)
target_link_libraries(${TARGET} PRIVATE common llama mtmd ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
target_link_libraries(${TARGET} PRIVATE common llama myclip ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
absl::flags_parse
gRPC::${_REFLECTION}
gRPC::${_GRPC_GRPCPP}
protobuf::${_PROTOBUF_LIBPROTOBUF})
target_compile_features(${TARGET} PRIVATE cxx_std_17)
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=f96eaddba8bed6a9a5e628bbf6a566775c70b49c
IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -11,8 +11,8 @@
#include <memory>
#include <string>
#include <getopt.h>
#include "mtmd.h"
#include "mtmd-helper.h"
#include "clip.h"
#include "llava.h"
#include "log.h"
#include "common.h"
#include "json.hpp"
@@ -45,9 +45,7 @@ using backend::HealthMessage;
///// LLAMA.CPP server code below
// Match mtmd.h and ik_llama's server/common headers, which all use
// nlohmann::ordered_json; a plain nlohmann::json alias collides at global scope.
using json = nlohmann::ordered_json;
using json = nlohmann::json;
struct server_params
{
@@ -221,11 +219,6 @@ struct llama_client_slot
// multimodal
std::vector<slot_image> images;
// Full prompt with mtmd media markers (mtmd_default_marker()) substituted in
// place of the legacy [img-N] tags, covering the text up to and including the
// last image. The text after the last image is kept in params.input_suffix and
// decoded through the normal token path so the sampling loop is unchanged.
std::string mtmd_prompt;
// stats
size_t sent_count = 0;
@@ -259,14 +252,14 @@ struct llama_client_slot
for (slot_image & img : images)
{
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
free(img.image_embedding);
if (img.img_data) {
clip_image_u8_free(img.img_data);
}
img.prefix_prompt = "";
}
images.clear();
mtmd_prompt = "";
}
bool has_budget(gpt_params &global_params) {
@@ -403,13 +396,46 @@ struct llama_metrics {
}
};
struct llava_embd_batch {
std::vector<llama_pos> pos;
std::vector<int32_t> n_seq_id;
std::vector<llama_seq_id> seq_id_0;
std::vector<llama_seq_id *> seq_ids;
std::vector<int8_t> logits;
llama_batch batch;
llava_embd_batch(float * embd, int32_t n_tokens, llama_pos pos_0, llama_seq_id seq_id) {
pos .resize(n_tokens);
n_seq_id.resize(n_tokens);
seq_ids .resize(n_tokens + 1);
logits .resize(n_tokens);
seq_id_0.resize(1);
seq_id_0[0] = seq_id;
seq_ids [n_tokens] = nullptr;
batch = {
/*n_tokens =*/ n_tokens,
/*tokens =*/ nullptr,
/*embd =*/ embd,
/*pos =*/ pos.data(),
/*n_seq_id =*/ n_seq_id.data(),
/*seq_id =*/ seq_ids.data(),
/*logits =*/ logits.data(),
};
for (int i = 0; i < n_tokens; i++) {
batch.pos [i] = pos_0 + i;
batch.n_seq_id[i] = 1;
batch.seq_id [i] = seq_id_0.data();
batch.logits [i] = false;
}
}
};
struct llama_server_context
{
llama_model *model = nullptr;
llama_context *ctx = nullptr;
const llama_vocab * vocab = nullptr;
mtmd_context *mctx = nullptr;
clip_ctx *clp_ctx = nullptr;
gpt_params params;
@@ -465,6 +491,11 @@ struct llama_server_context
if (!params.mmproj.path.empty()) {
multimodal = true;
LOG_INFO("Multi Modal Mode Enabled", {});
clp_ctx = clip_model_load(params.mmproj.path.c_str(), /*verbosity=*/ 1);
if(clp_ctx == nullptr) {
LOG_ERR("unable to load clip model: %s", params.mmproj.path.c_str());
return false;
}
if (params.n_ctx < 2048) { // request larger context for the image embedding
params.n_ctx = 2048;
@@ -481,24 +512,10 @@ struct llama_server_context
}
if (multimodal) {
// mtmd_init_from_file requires the already-loaded text model, so it must
// run AFTER llama_init_from_gpt_params. It validates the projector
// against the model internally and returns nullptr on dim mismatch, so
// the explicit clip_n_mmproj_embd check is no longer needed.
mtmd_context_params mparams = mtmd_context_params_default();
mparams.use_gpu = params.mmproj_use_gpu;
mparams.print_timings = false;
mparams.n_threads = params.n_threads_mtmd != -1 ? params.n_threads_mtmd
: params.n_threads_batch != -1 ? params.n_threads_batch
: params.n_threads;
mparams.verbosity = GGML_LOG_LEVEL_INFO;
mparams.flash_attn_type = params.flash_attn ? LLAMA_FLASH_ATTN_TYPE_ENABLED
: LLAMA_FLASH_ATTN_TYPE_DISABLED;
mparams.image_min_tokens = params.image_min_tokens;
mparams.image_max_tokens = params.image_max_tokens;
mctx = mtmd_init_from_file(params.mmproj.path.c_str(), model, mparams);
if (mctx == nullptr) {
LOG_ERR("unable to load multimodal projector: %s", params.mmproj.path.c_str());
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
const int n_embd_llm = llama_model_n_embd(model);
if (n_embd_clip != n_embd_llm) {
LOG("%s: embedding dim of the multimodal projector (%d) is not equal to that of LLaMA (%d). Make sure that you use the correct mmproj file.\n", __func__, n_embd_clip, n_embd_llm);
llama_free(ctx);
llama_free_model(model);
return false;
@@ -848,8 +865,8 @@ struct llama_server_context
slot_image img_sl;
img_sl.id = img.count("id") != 0 ? img["id"].get<int>() : slot->images.size();
img_sl.bitmap = mtmd_helper_bitmap_init_from_buf(mctx, image_buffer.data(), image_buffer.size());
if (img_sl.bitmap == nullptr)
img_sl.img_data = clip_image_u8_init();
if (!clip_image_load_from_bytes(image_buffer.data(), image_buffer.size(), img_sl.img_data))
{
LOG_ERR("%s: failed to load image, slot_id: %d, img_sl_id: %d",
__func__,
@@ -862,74 +879,50 @@ struct llama_server_context
{"slot_id", slot->id},
{"img_sl_id", img_sl.id}
});
img_sl.request_encode_image = true;
slot->images.push_back(img_sl);
}
// Translate the legacy [img-N] tags into mtmd media markers, in
// order, and collect the matching bitmaps in marker order so they
// line up with the markers passed to mtmd_tokenize(). The text after
// the last image stays in input_suffix and is decoded through the
// normal token path, so the sampling loop is unchanged.
// example: system prompt [img-102] user [img-103] describe [img-134]
// process prompt
// example: system prompt [img-102] user [img-103] describe [img-134] -> [{id: 102, prefix: 'system prompt '}, {id: 103, prefix: ' user '}, {id: 134, prefix: ' describe '}]}
if (slot->images.size() > 0 && !slot->prompt.is_array())
{
const std::string marker = mtmd_default_marker();
std::string prompt = slot->prompt.get<std::string>();
std::string built_prompt;
std::vector<slot_image> ordered;
size_t pos = 0, copy_from = 0;
size_t pos = 0, begin_prefix = 0;
std::string pattern = "[img-";
auto free_images = [&]() {
for (slot_image &img : slot->images) {
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
}
}
slot->images.clear();
};
while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
size_t tag_begin = pos;
size_t end_prefix = pos;
pos += pattern.length();
size_t end_pos = prompt.find(']', pos);
if (end_pos == std::string::npos) {
break;
}
std::string image_id = prompt.substr(pos, end_pos - pos);
try
if (end_pos != std::string::npos)
{
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
std::string image_id = prompt.substr(pos, end_pos - pos);
try
{
if (img.id == img_id) {
found = true;
// text before this tag, then the media marker
built_prompt += prompt.substr(copy_from, tag_begin - copy_from);
built_prompt += marker;
copy_from = end_pos + 1;
ordered.push_back(img);
break;
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
{
if (img.id == img_id) {
found = true;
img.prefix_prompt = prompt.substr(begin_prefix, end_prefix - begin_prefix);
begin_prefix = end_pos + 1;
break;
}
}
}
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
free_images();
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
slot->images.clear();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
slot->images.clear();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
free_images();
return false;
}
pos = end_pos + 1;
}
// bitmaps are consumed in marker order by mtmd_tokenize()
slot->images = ordered;
slot->mtmd_prompt = built_prompt;
slot->prompt = "";
slot->params.input_suffix = prompt.substr(copy_from);
slot->params.input_suffix = prompt.substr(begin_prefix);
slot->params.cache_prompt = false; // multimodal doesn't support cache prompt
}
}
@@ -1183,10 +1176,21 @@ struct llama_server_context
bool process_images(llama_client_slot &slot) const
{
// With the mtmd pipeline, image encoding is no longer eager: the bitmaps
// are tokenized and encoded together with the surrounding text inside
// ingest_images() via mtmd_tokenize() + mtmd_helper_eval_chunks(). This
// just reports whether the slot carries any images to process.
for (slot_image &img : slot.images)
{
if (!img.request_encode_image)
{
continue;
}
if (!llava_image_embed_make_with_clip_img(clp_ctx, params.n_threads, img.img_data, &img.image_embedding, &img.image_tokens)) {
LOG("Error processing the given image");
return false;
}
img.request_encode_image = false;
}
return slot.images.size() > 0;
}
@@ -1431,70 +1435,69 @@ struct llama_server_context
}
}
// Tokenize the multimodal prompt (text interleaved with media markers) together
// with the slot's bitmaps, then decode the resulting chunks into the llama
// context via the high-level mtmd helper. The helper runs llama_decode() on the
// text chunks and mtmd_encode() + llama_decode() on the image chunks, handling
// batching and any pre/post decode setup (e.g. non-causal attention for gemma3).
// Advances slot.n_past by the number of positions consumed, then leaves the
// post-image suffix tokens in `batch` so the normal decode + sampling loop
// produces the first generated token.
// for multiple images processing
bool ingest_images(llama_client_slot &slot, int n_batch)
{
if (mctx == nullptr)
{
LOG("%s : multimodal context is not initialized\n", __func__);
return false;
}
int image_idx = 0;
// bitmaps stay owned by slot.images (freed on reset()); pass non-owning ptrs
std::vector<const mtmd_bitmap *> bitmaps;
bitmaps.reserve(slot.images.size());
for (const slot_image &img : slot.images)
while (image_idx < (int) slot.images.size())
{
bitmaps.push_back(img.bitmap);
}
slot_image &img = slot.images[image_idx];
mtmd_input_text inp_txt;
inp_txt.text = slot.mtmd_prompt.c_str();
inp_txt.add_special = add_bos_token;
inp_txt.parse_special = true;
// process prefix prompt
for (int32_t i = 0; i < (int32_t) batch.n_tokens; i += n_batch)
{
const int32_t n_tokens = std::min(n_batch, (int32_t) (batch.n_tokens - i));
llama_batch batch_view = {
n_tokens,
batch.token + i,
nullptr,
batch.pos + i,
batch.n_seq_id + i,
batch.seq_id + i,
batch.logits + i,
};
if (llama_decode(ctx, batch_view))
{
LOG("%s : failed to eval\n", __func__);
return false;
}
}
mtmd::input_chunks chunks(mtmd_input_chunks_init());
int32_t res = mtmd_tokenize(mctx,
chunks.ptr.get(),
&inp_txt,
bitmaps.data(),
bitmaps.size());
if (res != 0)
{
LOG("%s : failed to tokenize multimodal prompt, res = %d\n", __func__, res);
return false;
}
// process image with llm
for (int i = 0; i < img.image_tokens; i += n_batch)
{
int n_eval = img.image_tokens - i;
if (n_eval > n_batch)
{
n_eval = n_batch;
}
const llama_pos start_pos = (llama_pos) system_tokens.size() + slot.n_past;
llama_pos new_n_past = start_pos;
if (mtmd_helper_eval_chunks(mctx,
ctx,
chunks.ptr.get(),
start_pos,
slot.id,
n_batch,
/*logits_last=*/ false,
&new_n_past) != 0)
{
LOG("%s : failed to eval multimodal chunks\n", __func__);
return false;
}
slot.n_past += (int32_t) (new_n_past - start_pos);
const int n_embd = llama_model_n_embd(model);
float * embd = img.image_embedding + i * n_embd;
llava_embd_batch llava_batch = llava_embd_batch(embd, n_eval, slot.n_past, 0);
if (llama_decode(ctx, llava_batch.batch))
{
LOG("%s : failed to eval image\n", __func__);
return false;
}
slot.n_past += n_eval;
}
image_idx++;
// queue the post-image suffix text for the normal decode + sampling path
common_batch_clear(batch);
std::vector<llama_token> suffix_tokens = tokenize(slot.params.input_suffix, false);
for (llama_token tok : suffix_tokens)
{
common_batch_add(batch, tok, system_tokens.size() + slot.n_past, { slot.id }, false);
slot.n_past += 1;
common_batch_clear(batch);
// append prefix of next image
const auto json_prompt = (image_idx >= (int) slot.images.size()) ?
slot.params.input_suffix : // no more images, then process suffix prompt
(json)(slot.images[image_idx].prefix_prompt);
std::vector<llama_token> append_tokens = tokenize(json_prompt, false); // has next image
for (int i = 0; i < (int) append_tokens.size(); ++i)
{
common_batch_add(batch, append_tokens[i], system_tokens.size() + slot.n_past, { slot.id }, true);
slot.n_past += 1;
}
}
return true;
@@ -1881,11 +1884,8 @@ struct llama_server_context
const bool has_images = process_images(slot);
// For the multimodal path the whole pre-image / inter-image text is
// tokenized and decoded inside ingest_images() via mtmd, so no prefix
// tokens are queued here; the post-image suffix is appended by
// ingest_images() for the normal decode + sampling loop.
std::vector<llama_token> prefix_tokens = has_images ? std::vector<llama_token>() : prompt_tokens;
// process the prefix of first image
std::vector<llama_token> prefix_tokens = has_images ? tokenize(slot.images[0].prefix_prompt, add_bos_token) : prompt_tokens;
int32_t slot_npast = slot.n_past_se > 0 ? slot.n_past_se : slot.n_past;

View File

@@ -0,0 +1,11 @@
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2494,7 +2494,7 @@
}
new_data = work.data();
- new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr);
+ new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr, nullptr);
} else {
new_type = cur->type;
new_data = cur->data;

View File

@@ -17,9 +17,28 @@ cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
cp -r utils.hpp llama.cpp/examples/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/examples/grpc-server/
## Multimodal support is provided by the `mtmd` library target (examples/mtmd/),
## which the grpc-server links and includes directly. No source copy is needed:
## clip/llava were pruned upstream and the high-level mtmd_* API is used instead.
## Copy clip/llava files for multimodal support (built as myclip library)
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
# Prepend llama.h include to llava.h
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
# Copy clip-impl.h if it exists
if [ -f llama.cpp/examples/llava/clip-impl.h ]; then
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h
fi
# Copy stb_image.h
if [ -f llama.cpp/vendor/stb/stb_image.h ]; then
cp -rfv llama.cpp/vendor/stb/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
elif [ -f llama.cpp/common/stb_image.h ]; then
cp -rfv llama.cpp/common/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
fi
## Fix API compatibility in llava.cpp (llama_n_embd -> llama_model_n_embd)
if [ -f llama.cpp/examples/grpc-server/llava.cpp ]; then
sed -i 's/llama_n_embd(/llama_model_n_embd(/g' llama.cpp/examples/grpc-server/llava.cpp
fi
set +e
if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -13,28 +13,28 @@ grep -e "flags" /proc/cpuinfo | head -1
# ik_llama.cpp requires AVX2 — default to avx2 binary
BINARY=ik-llama-cpp-avx2
if [ -e "$CURDIR"/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
if [ -e $CURDIR/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 NOT found, using fallback"
BINARY=ik-llama-cpp-fallback
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec "$CURDIR"/$BINARY "$@"
exec $CURDIR/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec "$CURDIR"/ik-llama-cpp-fallback "$@"
exec $CURDIR/ik-llama-cpp-fallback "$@"

View File

@@ -11,12 +11,9 @@
#include "json.hpp"
#include "mtmd.h"
#include "clip.h"
// mtmd.h and ik_llama's entire server/common stack (chat.h, server-common.h,
// server-task.h, ...) declare `using json = nlohmann::ordered_json`, so match it
// here: a plain `nlohmann::json` alias collides with mtmd.h's at global scope.
using json = nlohmann::ordered_json;
using json = nlohmann::json;
extern bool server_verbose;
@@ -114,12 +111,13 @@ struct slot_image
{
int32_t id;
// mtmd bitmap (image/audio) decoded from the request buffer. Owned by the
// slot; freed via mtmd_bitmap_free() on reset. The high-level mtmd pipeline
// (mtmd_tokenize + mtmd_helper_eval_chunks) consumes these directly, so the
// legacy eager-encode fields (embedding/tokens) and per-image prefix prompt
// are no longer needed.
mtmd_bitmap * bitmap = nullptr;
bool request_encode_image = false;
float * image_embedding = nullptr;
int32_t image_tokens = 0;
clip_image_u8 * img_data;
std::string prefix_prompt; // before of this image
};
// completion token output with probabilities

View File

@@ -50,13 +50,8 @@ add_custom_command(
"${hw_proto}"
DEPENDS "${hw_proto}")
# hw_grpc_proto: force STATIC. Under the CPU_ALL_VARIANTS build BUILD_SHARED_LIBS=ON
# (ggml/llama become shared), which would otherwise make this glue library a DSO. As a
# DSO it references the hidden-visibility symbols in the static libprotobuf.a, which the
# linker cannot satisfy ("hidden symbol ... in libprotobuf.a is referenced by DSO").
# Keeping it STATIC links protobuf/gRPC directly into the grpc-server executable while
# only ggml/llama stay shared. No effect on the static variants (already BUILD_SHARED_LIBS=OFF).
add_library(hw_grpc_proto STATIC
# hw_grpc_proto
add_library(hw_grpc_proto
${hw_grpc_srcs}
${hw_grpc_hdrs}
${hw_proto_srcs}
@@ -87,18 +82,3 @@ target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()
# Unit test for the message-content normalization helper (message_content.h).
# Off by default so the normal backend build is untouched; enable with
# -DLLAMA_GRPC_BUILD_TESTS=ON and run via ctest. It reuses llama.cpp's vendored
# <nlohmann/json.hpp> (propagated by the common helpers library) so it has no
# extra dependency beyond what the backend already builds against.
option(LLAMA_GRPC_BUILD_TESTS "Build grpc-server unit tests" OFF)
if(LLAMA_GRPC_BUILD_TESTS)
enable_testing()
add_executable(message_content_test message_content_test.cpp message_content.h)
target_include_directories(message_content_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(message_content_test PRIVATE ${_LLAMA_COMMON_TARGET})
target_compile_features(message_content_test PRIVATE cxx_std_17)
add_test(NAME message_content_test COMMAND message_content_test)
endif()

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=0ed235ea2c17a19fc8238668653946721ed136fd
LLAMA_VERSION?=e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=
@@ -10,16 +10,8 @@ TARGET?=--target grpc-server
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
ARCH?=$(shell uname -m)
# Shared libs default to OFF: we link static gRPC and the avx/avx2/avx512/fallback
# variants are fully static. The CPU_ALL_VARIANTS build flips SHARED_LIBS=ON (ggml/llama
# become shared so the dynamic CPU backends work; gRPC stays static via its imported
# targets). SHARED_LIBS is a make variable, not an appended -D, so it survives the
# recursive sub-make into the VARIANT build dir (which re-parses this Makefile) instead
# of being re-clobbered by a second -DBUILD_SHARED_LIBS=OFF. EXTRA_CMAKE_ARGS is the hook
# the CPU_ALL_VARIANTS target uses to inject -DGGML_BACKEND_DL/-DGGML_CPU_ALL_VARIANTS.
SHARED_LIBS?=OFF
EXTRA_CMAKE_ARGS?=
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=$(SHARED_LIBS) -DLLAMA_CURL=OFF $(EXTRA_CMAKE_ARGS)
# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
ifeq ($(NATIVE),false)
@@ -128,39 +120,15 @@ llama-cpp-fallback: llama.cpp
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback
# Single-build CPU backend using ggml's CPU_ALL_VARIANTS. Produces ONE grpc-server
# plus a set of dlopen-able libggml-cpu-*.so (sandybridge/haswell/skylakex/...) that
# ggml's backend registry selects from at runtime by probing host CPU features.
# Replaces the avx/avx2/avx512/fallback multi-binary build on x86.
#
# CPU_ALL_VARIANTS requires GGML_BACKEND_DL, which requires BUILD_SHARED_LIBS=ON, so we
# pass SHARED_LIBS=ON and the DL flags as make variables (NOT pre-expanded into the
# CMAKE_ARGS env string): command-line make variables propagate through every recursive
# sub-make, so the deepest VARIANT-dir build computes BUILD_SHARED_LIBS=ON consistently.
# Only ggml/llama go shared - gRPC is found via its static imported targets, so the
# grpc-server binary keeps static gRPC and only dynamically links ggml.
#
# TARGET adds "ggml": the per-microarch backends are runtime-dlopened, not link deps of
# grpc-server, so they only build because each is an add_dependencies() of the ggml target.
llama-cpp-cpu-all: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build purge
$(info ${GREEN}I llama-cpp build info:cpu-all-variants${RESET})
$(MAKE) SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" VARIANT="llama-cpp-cpu-all-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/grpc-server llama-cpp-cpu-all
rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs
find $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \;
@echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/
llama-cpp-grpc: llama.cpp
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build purge
$(info ${GREEN}I llama-cpp build info:grpc${RESET})
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" TARGET="--target grpc-server --target ggml-rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" TARGET="--target grpc-server --target rpc-server" $(MAKE) VARIANT="llama-cpp-grpc-build" build-llama-cpp-grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/grpc-server llama-cpp-grpc
llama-cpp-rpc-server: llama-cpp-grpc
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/llama.cpp/build/bin/ggml-rpc-server llama-cpp-rpc-server
cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build/llama.cpp/build/bin/rpc-server llama-cpp-rpc-server
llama.cpp:
mkdir -p llama.cpp

View File

@@ -30,19 +30,6 @@
#define LOCALAI_HAS_SERVER_SCHEMA 1
#include "server-schema.cpp"
#endif
// server-stream.cpp exists only in llama.cpp after the upstream refactor that
// added the SSE stream-resumption layer (stream_session/stream_pipe_producer).
// server-context.cpp calls into it (spipe->cleanup(), stream_aware_should_stop,
// stream_session_attach_pipe), so its definitions must be part of this
// translation unit or the link fails with "undefined reference to
// stream_pipe_producer::cleanup()". The file is self-contained (its only
// external symbols come from server-common, already pulled in above) and the
// http route-handler factories it also defines are unused here but harmless.
// __has_include keeps the source compatible with older pins/forks that predate
// the split.
#if __has_include("server-stream.cpp")
#include "server-stream.cpp"
#endif
#include "server-context.cpp"
// LocalAI
@@ -50,9 +37,7 @@
#include "backend.pb.h"
#include "backend.grpc.pb.h"
#include "common.h"
#include "arg.h"
#include "chat-auto-parser.h"
#include "message_content.h"
#include <getopt.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
#include <grpcpp/grpcpp.h>
@@ -607,10 +592,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.checkpoint_min_step = 256;
#endif
// Raw upstream llama-server flags collected from any option entry that
// starts with '-'. Applied once after the loop via common_params_parse.
std::vector<std::string> extra_argv;
// decode options. Options are in form optname:optvale, or if booleans only optname.
for (int i = 0; i < request->options_size(); i++) {
std::string opt = request->options(i);
@@ -1099,31 +1080,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} catch (...) {}
}
// --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) ---
} else if (!strcmp(optname, "cpu_moe")) {
// Bool-style flag: keep all MoE expert weights on CPU.
const bool enable = (optval == NULL) ||
optval_str == "true" || optval_str == "1" || optval_str == "yes" ||
optval_str == "on" || optval_str == "enabled";
if (enable) {
params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override());
}
} else if (!strcmp(optname, "n_cpu_moe")) {
if (optval != NULL) {
try {
int n = std::stoi(optval_str);
if (n < 0) n = 0;
// Keep override-name storage alive for the lifetime of the
// params struct (mirrors upstream arg.cpp's function-local static).
static std::list<std::string> buft_overrides_main;
for (int i = 0; i < n; ++i) {
buft_overrides_main.push_back(llm_ffn_exps_block_regex(i));
params.tensor_buft_overrides.push_back(
{buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()});
}
} catch (...) {}
}
// --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) ---
} else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) {
// Format: <tensor regex>=<buffer type>,<tensor regex>=<buffer type>,...
@@ -1155,30 +1111,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
else { cur.push_back(c); }
}
if (!cur.empty()) flush(cur);
// --- generic passthrough: any entry starting with '-' is a raw
// upstream llama-server flag, forwarded verbatim to the parser. ---
} else if (optname[0] == '-') {
std::string flag = optname;
// These flags make upstream's parser exit() (printing usage /
// completion), which would kill the backend process. Skip them.
if (flag == "-h" || flag == "--help" || flag == "--usage" ||
flag == "--version" || flag == "--license" ||
flag == "--list-devices" || flag == "-cl" ||
flag == "--cache-list" ||
flag.rfind("--completion", 0) == 0) {
fprintf(stderr,
"[llama-cpp] ignoring passthrough flag that would exit: %s\n",
flag.c_str());
} else {
extra_argv.push_back(flag);
// Preserve the whole value after the first ':' so embedded
// colons (e.g. host:port) survive strtok's truncation of optval.
auto colon = opt.find(':');
if (colon != std::string::npos) {
extra_argv.push_back(opt.substr(colon + 1));
}
}
}
}
@@ -1214,6 +1146,27 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
}
}
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
@@ -1306,69 +1259,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.sampling.grammar_triggers.push_back(std::move(trigger));
}
}
// Apply any raw upstream flags last so an explicit passthrough flag wins
// over the LocalAI-resolved field it maps to (e.g. --ctx-size beats
// context_size). This is the same parser llama-server itself uses.
if (!extra_argv.empty()) {
// common_params_parser_init resets a few fields for the SERVER example
// (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated
// passthrough flag can't silently clobber LocalAI's resolved value.
const int saved_n_parallel = params.n_parallel;
std::vector<char *> argv;
std::string prog = "llama-server";
argv.push_back(prog.data());
for (auto & a : extra_argv) {
argv.push_back(a.data());
}
// ctx_arg.params is a reference, so this overlays the given flags onto
// `params` in place. Returns false on a recoverable parse error (and
// self-restores params); may exit() on a hard error, exactly as
// passing the same bad flag to llama-server would.
if (!common_params_parse((int)argv.size(), argv.data(), params,
LLAMA_EXAMPLE_SERVER)) {
fprintf(stderr,
"[llama-cpp] failed to parse passthrough options; ignoring them\n");
}
// Restore n_parallel unless a passthrough flag explicitly set it
// (parser_init's reset sentinel for SERVER is -1).
if (params.n_parallel == -1) {
params.n_parallel = saved_n_parallel;
}
}
// Terminate/pad the override vectors only after BOTH the named-option loop
// and the generic passthrough (common_params_parse above) have pushed their
// real entries, so back() is the null sentinel the model loader asserts on.
// Running these before the passthrough let a passthrough flag (--cpu-moe,
// --override-tensor, --override-kv, ...) append a real entry after the
// sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for
// kv_overrides. Double-termination is harmless (the while is a no-op if the
// passthrough parse already padded; an extra trailing null is ignored).
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
@@ -1630,20 +1520,242 @@ public:
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
json msg_json;
msg_json["role"] = msg.role();
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
}
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
@@ -1864,7 +1976,36 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
llama_grpc::normalize_template_message(body_json["messages"][idx]);
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
}
}
@@ -2196,20 +2337,264 @@ public:
SRV_INF("[CONTENT DEBUG] Predict: Processing %d messages\n", request->messages_size());
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
json msg_json;
msg_json["role"] = msg.role();
SRV_INF("[CONTENT DEBUG] Predict: Message %d: role=%s, content_empty=%d, content_length=%zu\n",
i, msg.role().c_str(), msg.content().empty() ? 1 : 0, msg.content().size());
if (!msg.content().empty()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content (first 200 chars): %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
}
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d parsed JSON is null, converting to empty string\n", i);
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content is object, converting to string\n", i);
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content_val was null, setting to empty string\n", i);
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
SRV_INF("[CONTENT DEBUG] Predict: Message %d content set, type=%s\n",
i, content_val.is_string() ? "string" :
content_val.is_array() ? "array" :
content_val.is_object() ? "object" : "other");
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
SRV_INF("[CONTENT DEBUG] Predict: Message %d created content array with media\n", i);
} else if (!msg.tool_calls().empty()) {
// Tool call messages may have null content, but templates expect string
// IMPORTANT: Set to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls, setting content to space (not empty string)\n", i);
msg_json["content"] = " ";
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] Predict: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
}
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
@@ -2430,7 +2815,36 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] Predict: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
llama_grpc::normalize_template_message(body_json["messages"][idx]);
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
}
}

View File

@@ -1,192 +0,0 @@
#pragma once
#include <string>
#include <vector>
#include <nlohmann/json.hpp>
namespace llama_grpc {
// Normalizes a proto message's content string into the JSON value used when
// reconstructing OpenAI-format messages for the tokenizer (jinja) template.
//
// Shared by the streaming (PredictStream) and non-streaming (Predict) message
// reconstruction paths so the two cannot drift.
//
// LocalAI's Go layer (schema.Messages.ToProto) always sends content as a plain
// text string; multimodal media travels in separate proto fields, never inside
// content. So user/system/developer content is *only ever* opaque text and must
// NOT be JSON-sniffed: a prompt that merely looks like JSON (e.g. an ingredient
// list ["1/4 cup sugar", ...]) would otherwise be reinterpreted as structured
// content parts and rejected by oaicompat_chat_params_parse with
// "unsupported content[].type" (https://github.com/mudler/LocalAI/issues/10524).
// (developer is OpenAI's modern system alias - same "human-authored text" nature.)
//
// For assistant/tool messages we still collapse a literal JSON null/object
// (tool-call bookkeeping) to a string, but we never turn a plain string into an
// array/scalar. The array defense is therefore role-independent (arrays/scalars
// fall through for every role); the role gate only governs the null/object case.
inline nlohmann::ordered_json normalize_message_content(const std::string& role,
const std::string& content) {
nlohmann::ordered_json content_val = content;
if (role != "user" && role != "system" && role != "developer") {
try {
nlohmann::ordered_json parsed = nlohmann::ordered_json::parse(content);
if (parsed.is_null()) {
content_val = "";
} else if (parsed.is_object()) {
content_val = parsed.dump();
}
// arrays / scalars: keep the original plain-text string as-is
} catch (const nlohmann::ordered_json::parse_error&) {
// Not JSON, already the plain string
}
}
return content_val;
}
// Final safety pass applied to each reconstructed OpenAI message right before it
// is handed to oaicompat_chat_params_parse (jinja templating). Jinja templates
// assume content is a string: a literal null breaks slicing such as
// message.content[:N] (#7324), and a tool message with array content is rejected
// (#7528). A multimodal user message legitimately carries a typed-part array
// ({type:text}, {type:image_url}, ...), which must be left intact. Shared by the
// streaming and non-streaming paths so this invariant cannot drift between them.
inline void normalize_template_message(nlohmann::ordered_json& msg) {
if (!msg.contains("content")) {
msg["content"] = ""; // templates expect the field to exist
return;
}
nlohmann::ordered_json& content = msg["content"];
const std::string role = (msg.contains("role") && msg["role"].is_string())
? msg["role"].get<std::string>()
: std::string();
if (content.is_null()) {
content = ""; // #7324: null would crash content[:N] slicing
} else if (role == "tool" && content.is_array()) {
content = content.dump(); // #7528: tool messages must have string content
} else if (!content.is_string() && !content.is_array()) {
if (content.is_object()) {
content = content.dump(); // tool-call bookkeeping object -> string
} else {
content = ""; // other scalar (number/bool) -> empty
}
}
// string, or a non-tool (multimodal) typed-part array: leave untouched
}
// One proto message's data, flattened to plain types so the reconstruction logic
// can be shared and unit-tested without protobuf. The streaming and non-streaming
// predict paths both populate this from proto::Message + the request's media.
struct ReconstructedMessageInput {
std::string role;
std::string content; // proto.Message.content (always a plain string)
std::string name;
std::string tool_call_id;
std::string reasoning_content;
std::string tool_calls; // tool_calls as a JSON string, or empty
bool is_last_user_msg = false; // attach request media to this message
std::vector<std::string> images; // base64 (jpeg)
std::vector<std::string> audios; // base64 (wav)
std::vector<std::string> videos; // base64
};
// Appends the request's media as OpenAI typed content parts. Imperative (not
// brace-init) to avoid nlohmann's object-vs-array initializer-list ambiguity.
inline void append_media_parts(nlohmann::ordered_json& content_array,
const std::vector<std::string>& images,
const std::vector<std::string>& audios,
const std::vector<std::string>& videos) {
for (const auto& img : images) {
nlohmann::ordered_json image_chunk;
image_chunk["type"] = "image_url";
nlohmann::ordered_json image_url;
image_url["url"] = "data:image/jpeg;base64," + img;
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
for (const auto& aud : audios) {
nlohmann::ordered_json audio_chunk;
audio_chunk["type"] = "input_audio";
nlohmann::ordered_json input_audio;
input_audio["data"] = aud;
input_audio["format"] = "wav"; // default; could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
for (const auto& vid : videos) {
nlohmann::ordered_json video_chunk;
video_chunk["type"] = "input_video";
nlohmann::ordered_json input_video;
input_video["data"] = vid;
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
// Reconstructs a single OpenAI-format message (the object fed to
// oaicompat_chat_params_parse) from a proto message. Shared by PredictStream and
// Predict so the content/multimodal/tool_calls handling cannot drift between the
// two stream modes (it previously lived as two ~150-line copies with a redundant
// Predict-only tool_calls->" " branch). Guarantees content is always a string or
// a typed-part array, never null/missing.
inline nlohmann::ordered_json build_reconstructed_message(const ReconstructedMessageInput& in) {
nlohmann::ordered_json msg_json;
msg_json["role"] = in.role;
const bool has_media = !in.images.empty() || !in.audios.empty() || !in.videos.empty();
if (!in.content.empty()) {
nlohmann::ordered_json content_val = normalize_message_content(in.role, in.content);
if (content_val.is_string() && in.is_last_user_msg && has_media) {
// Last user message + media: build a typed-part array (text first).
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
nlohmann::ordered_json text_part;
text_part["type"] = "text";
text_part["text"] = content_val.get<std::string>();
content_array.push_back(text_part);
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
} else if (in.is_last_user_msg && has_media) {
// No text but media on the last user message: media-only typed array.
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else {
// Empty content (any role, incl. tool/assistant): templates need a string.
msg_json["content"] = "";
}
if (!in.name.empty()) {
msg_json["name"] = in.name;
}
if (!in.tool_call_id.empty()) {
msg_json["tool_call_id"] = in.tool_call_id;
}
if (!in.reasoning_content.empty()) {
msg_json["reasoning_content"] = in.reasoning_content;
}
if (!in.tool_calls.empty()) {
try {
nlohmann::ordered_json tool_calls = nlohmann::ordered_json::parse(in.tool_calls);
msg_json["tool_calls"] = tool_calls;
// tool_calls + empty/blank content: use " " not "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat turns "" into null, which breaks
// templates that slice message.content[:tool_start_length] (#7324).
if (!msg_json.contains("content") ||
(msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
msg_json["content"] = " ";
}
} catch (const nlohmann::ordered_json::parse_error&) {
// Malformed tool_calls JSON: leave content as-is (prior behavior).
}
}
return msg_json;
}
} // namespace llama_grpc

View File

@@ -1,234 +0,0 @@
// Unit tests for the shared message-reconstruction helpers (message_content.h).
//
// Build & run standalone (nlohmann/json single header on the include path):
// g++ -std=c++17 -I<dir-with-nlohmann> message_content_test.cpp -o t && ./t
// or via CMake: -DLLAMA_GRPC_BUILD_TESTS=ON then ctest.
//
// Regression coverage for:
// #10524 - a user/system prompt that is itself a JSON-array string must stay
// plain text, never be reinterpreted as OpenAI structured parts.
// #7324 - assistant/tool null content -> "" (templates slice content[:N]);
// assistant+tool_calls+empty content -> " " (not "", which becomes null).
// #7528 - tool message array content must reach the template as a string.
// multimodal - last user message text + media -> typed-part array, media kept.
#include <cassert>
#include <iostream>
#include <string>
#include "message_content.h"
using nlohmann::ordered_json;
using llama_grpc::normalize_message_content;
using llama_grpc::normalize_template_message;
using llama_grpc::build_reconstructed_message;
using llama_grpc::ReconstructedMessageInput;
static int failures = 0;
static void check(bool ok, const std::string& name, const std::string& detail = "") {
if (!ok) {
std::cerr << "FAIL " << name << (detail.empty() ? "" : ": " + detail) << "\n";
failures++;
}
}
// ---- normalize_message_content -------------------------------------------
static void expect_norm_string(const char* name, const std::string& role,
const std::string& content, const std::string& want) {
auto got = normalize_message_content(role, content);
if (!got.is_string()) {
check(false, name, "expected a JSON string, got " +
std::string(got.is_array() ? "array" : got.is_object() ? "object" : "other") +
" (" + got.dump() + ")");
return;
}
check(got.get<std::string>() == want, name, "expected \"" + want + "\", got \"" + got.get<std::string>() + "\"");
}
static void test_normalize() {
const std::string ingredients = R"(["1/4 cup brown sugar, packed","1 pound ground beef"])";
// #10524 - JSON-array text must stay a string. Role-INDEPENDENT array defense.
for (const char* role : {"user", "system", "developer", "function", "assistant", "tool"}) {
expect_norm_string((std::string("json_array_stays_text:") + role).c_str(), role, ingredients, ingredients);
}
// #10524 - user/system/developer JSON-object text stays verbatim (NOT re-dumped).
expect_norm_string("user_json_object_verbatim", "user", R"({"a":1})", R"({"a":1})");
expect_norm_string("system_json_object_verbatim", "system", R"({"a":1})", R"({"a":1})");
expect_norm_string("developer_json_object_verbatim", "developer", R"({"a":1})", R"({"a":1})");
// Plain text unchanged for all roles.
expect_norm_string("user_plain_text", "user", "hello world", "hello world");
expect_norm_string("assistant_non_json_text_kept", "assistant", "hi [unclosed", "hi [unclosed");
// #7324 boundary - user/system/developer literal "null" preserved (never parsed).
expect_norm_string("user_literal_null_stays", "user", "null", "null");
expect_norm_string("system_literal_null_stays", "system", "null", "null");
expect_norm_string("developer_literal_null_stays", "developer", "null", "null");
// #7324 - assistant/tool literal null collapses to empty string.
expect_norm_string("assistant_null_to_empty", "assistant", "null", "");
expect_norm_string("tool_null_to_empty", "tool", "null", "");
// #7324/#7528 - assistant/tool object bookkeeping stringified (stays a string).
check(normalize_message_content("assistant", R"({"tool":"x"})").is_string(), "assistant_object_stringified");
check(normalize_message_content("tool", R"({"error":"boom"})").is_string(), "tool_object_stringified");
// #10524-family - a bare scalar that parses as a JSON number stays the string.
expect_norm_string("assistant_scalar_number_stays_string", "assistant", "42", "42");
// baseline - empty content stays empty.
expect_norm_string("user_empty_stays_empty", "user", "", "");
}
// ---- normalize_template_message (BEFORE TEMPLATE sanitizer) ---------------
static void test_template_sanitizer() {
// #7528 - a tool message with an ACTUAL array becomes a string.
{
ordered_json msg = {{"role", "tool"}, {"content", ordered_json::array({{{"type", "text"}, {"text", "r"}}})}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_tool_array_to_string", "got " + msg["content"].dump());
}
// #7324 - null content -> "" for any role.
{
ordered_json msg = {{"role", "assistant"}, {"content", nullptr}};
normalize_template_message(msg);
check(msg["content"].is_string() && msg["content"] == "", "before_template_null_to_empty");
}
// object content -> dumped string (would otherwise throw at the template).
{
ordered_json msg = {{"role", "assistant"}, {"content", {{"x", 1}}}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_object_to_string", "got " + msg["content"].dump());
}
// missing content field -> "".
{
ordered_json msg = {{"role", "user"}};
normalize_template_message(msg);
check(msg.contains("content") && msg["content"] == "", "before_template_missing_to_empty");
}
// multimodal: a well-typed user array must be left UNTOUCHED (role!=tool).
{
ordered_json parts = ordered_json::array();
parts.push_back({{"type", "text"}, {"text", "x"}});
ordered_json img; img["type"] = "image_url"; img["image_url"] = {{"url", "data:..."}};
parts.push_back(img);
ordered_json msg = {{"role", "user"}, {"content", parts}};
normalize_template_message(msg);
check(msg["content"].is_array() && msg["content"].size() == 2, "before_template_user_typed_array_preserved",
"got " + msg["content"].dump());
}
// a plain string is left untouched.
{
ordered_json msg = {{"role", "user"}, {"content", "hello"}};
normalize_template_message(msg);
check(msg["content"] == "hello", "before_template_string_untouched");
}
}
// ---- build_reconstructed_message ----------------------------------------
static void test_reconstruction() {
const std::string ingredients = R"(["1/4 cup brown sugar","1 pound ground beef"])";
// #10524 end-state - user JSON-array text, no media -> string content.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients;
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == ingredients, "recon_user_json_array_string",
"got " + m["content"].dump());
}
// multimodal - user text + one image on last user msg -> typed array, image kept.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 2, "recon_multimodal_text_plus_image",
"got " + m["content"].dump());
check(m["content"][0]["type"] == "text" && m["content"][0]["text"] == ingredients, "recon_multimodal_text_first");
check(m["content"][1]["type"] == "image_url", "recon_multimodal_image_kept");
}
// multimodal media-only - empty text + image on last user msg.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ""; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 1 && m["content"][0]["type"] == "image_url",
"recon_media_only", "got " + m["content"].dump());
}
// #7528 - tool array-string content stays a string.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = R"(["a","b"])"; in.tool_call_id = "call_1";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == R"(["a","b"])", "recon_tool_array_string",
"got " + m["content"].dump());
check(m["tool_call_id"] == "call_1", "recon_tool_call_id_set");
}
// tool empty content -> "".
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == "", "recon_tool_empty_to_string");
}
// #7324 - assistant + tool_calls + empty content -> " " (single space, not "").
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == " ", "recon_toolcalls_empty_content_space",
"got " + m["content"].dump());
check(m["tool_calls"].is_array() && m["tool_calls"].size() == 1, "recon_toolcalls_parsed");
}
// assistant + tool_calls + real content keeps the content.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "I'll call f";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"] == "I'll call f", "recon_toolcalls_with_content_kept");
}
// assistant null content -> "".
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "null";
auto m = build_reconstructed_message(in);
check(m["content"] == "", "recon_assistant_null_to_empty");
}
// malformed tool_calls JSON must not throw; content preserved.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "hi"; in.tool_calls = "{not json";
auto m = build_reconstructed_message(in);
check(m["content"] == "hi" && !m.contains("tool_calls"), "recon_malformed_toolcalls_safe");
}
// optional fields: name + reasoning carried through.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "result"; in.name = "get_weather"; in.reasoning_content = "thinking";
auto m = build_reconstructed_message(in);
check(m["name"] == "get_weather" && m["reasoning_content"] == "thinking", "recon_optional_fields");
}
}
int main() {
test_normalize();
test_template_sanitizer();
test_reconstruction();
if (failures == 0) {
std::cout << "OK: all message_content tests passed\n";
return 0;
}
std::cerr << failures << " test(s) failed\n";
return 1;
}

View File

@@ -14,22 +14,6 @@ mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/llama-cpp-* $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/
# Bundle the ggml shared backends produced by the CPU_ALL_VARIANTS build (libggml-base.so,
# libggml.so, libllama.so and the per-microarch libggml-cpu-*.so), all into package/lib.
#
# Two distinct resolution mechanisms both land here:
# - NEEDED deps (libggml-base/libggml/libllama): resolved by the dynamic linker via the
# LD_LIBRARY_PATH=$CURDIR/lib that run.sh exports.
# - The per-microarch libggml-cpu-*.so are NOT linked; ggml *discovers* them at runtime by
# scanning the executable's own directory (readlink /proc/self/exe). run.sh launches via
# the bundled $CURDIR/lib/ld.so, so /proc/self/exe -> .../lib/ld.so and ggml scans lib/.
# That is why the variants must sit in lib/ (next to ld.so), not just on the link path.
# No-op on builds (arm64/darwin) that don't produce the all-variants set.
if [ -d "$CURDIR/ggml-shared-libs" ]; then
echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..."
cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/
fi
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture

View File

@@ -18,10 +18,6 @@ done
cp -r CMakeLists.txt llama.cpp/tools/grpc-server/
cp -r grpc-server.cpp llama.cpp/tools/grpc-server/
# Shared message-reconstruction helpers (included by grpc-server.cpp) and their
# unit test (compiled only when -DLLAMA_GRPC_BUILD_TESTS=ON).
cp -r message_content.h llama.cpp/tools/grpc-server/
cp -r message_content_test.cpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/cpp-httplib/httplib.h llama.cpp/tools/grpc-server/

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,41 +12,55 @@ grep -e "flags" /proc/cpuinfo | head -1
BINARY=llama-cpp-fallback
# CPU images (x86, arm64, darwin) ship a single llama-cpp-cpu-all built with ggml
# CPU_ALL_VARIANTS: ggml's backend registry dlopens the best libggml-cpu-*.so for this
# host, so no shell-side AVX probing. GPU images (cublas/sycl/vulkan/hipblas) ship only
# llama-cpp-fallback (the accelerator does the compute), so fall back to it when absent.
if [ -e "$CURDIR"/llama-cpp-cpu-all ]; then
BINARY=llama-cpp-cpu-all
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/llama-cpp-avx ]; then
BINARY=llama-cpp-avx
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/llama-cpp-avx2 ]; then
BINARY=llama-cpp-avx2
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/llama-cpp-avx512 ]; then
BINARY=llama-cpp-avx512
fi
fi
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e "$CURDIR"/llama-cpp-grpc ]; then
if [ -e $CURDIR/llama-cpp-grpc ]; then
BINARY=llama-cpp-grpc
fi
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
fi
fi
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec "$CURDIR"/$BINARY "$@"
exec $CURDIR/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec "$CURDIR"/llama-cpp-fallback "$@"
exec $CURDIR/llama-cpp-fallback "$@"

View File

@@ -51,14 +51,6 @@ add_library(hw_grpc_proto STATIC
${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
# The generated proto/grpc sources include protobuf and grpc++ headers, so this
# library must see their include dirs. Linking the imported targets propagates
# them. On Linux the apt headers live in /usr/include (default search path) so
# this was a no-op; on macOS the Homebrew headers are under /opt/homebrew and
# would otherwise be missed (runtime_version.h not found).
target_link_libraries(hw_grpc_proto PUBLIC
protobuf::libprotobuf
gRPC::grpc++)
# Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
# PF_VULKAN is honored when passed on the cmake command line (it lands in the

View File

@@ -2,13 +2,7 @@
# Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
set -e
CURDIR=$(dirname "$(realpath "$0")")
# macOS has no bundled ld.so; the darwin package ships only dylibs under lib/,
# resolved via DYLD_LIBRARY_PATH (the ld.so branch below is skipped there).
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$DYLD_LIBRARY_PATH"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
fi
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
if [ -f "$CURDIR/lib/ld.so" ]; then
exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
fi

View File

@@ -1,71 +0,0 @@
#!/bin/bash
#
# Discovers and runs every standalone C++ unit test under backend/cpp/.
#
# A "standalone" unit test is a *_test.cpp that depends only on the C++ standard
# library and nlohmann/json (single header) - i.e. it exercises pure helpers and
# does not need the full llama.cpp + gRPC backend build. Tests that DO need the
# backend build use the CMake/ctest path (e.g. -DLLAMA_GRPC_BUILD_TESTS=ON)
# instead and are skipped here.
#
# This keeps CI generic: adding a new pure-C++ unit test file named *_test.cpp in
# an active backend source dir is picked up automatically, with no CI edits.
#
# Env:
# NLOHMANN_INCLUDE include dir that contains nlohmann/json.hpp. If unset, the
# nlohmann/json single header is fetched to a temp dir.
# CXX compiler (default: g++).
# JSON_VERSION nlohmann/json tag to fetch when NLOHMANN_INCLUDE is unset
# (default: v3.11.3).
set -uo pipefail
ROOT="$(cd "$(dirname "$0")" && pwd)"
CXX="${CXX:-g++}"
JSON_VERSION="${JSON_VERSION:-v3.11.3}"
JSON_INC="${NLOHMANN_INCLUDE:-}"
if [ -z "$JSON_INC" ]; then
JSON_INC="$(mktemp -d)"
mkdir -p "$JSON_INC/nlohmann"
echo "Fetching nlohmann/json ${JSON_VERSION} single header..."
if ! curl -L -sf \
"https://raw.githubusercontent.com/nlohmann/json/${JSON_VERSION}/single_include/nlohmann/json.hpp" \
-o "$JSON_INC/nlohmann/json.hpp"; then
echo "ERROR: failed to fetch nlohmann/json header" >&2
exit 1
fi
fi
# Active source dirs only - exclude per-variant build copies, dev snapshots and
# the vendored upstream llama.cpp tree.
mapfile -t tests < <(find "$ROOT" -name '*_test.cpp' \
-not -path '*/llama.cpp/*' \
-not -path '*-build/*' \
-not -path '*-dev/*' \
-not -path '*fallback*' | sort)
if [ "${#tests[@]}" -eq 0 ]; then
echo "No standalone C++ unit tests found under $ROOT"
exit 0
fi
fail=0
for test_src in "${tests[@]}"; do
name="$(basename "$test_src" .cpp)"
bin="$(mktemp -d)/$name"
echo "==> $test_src"
if ! "$CXX" -std=c++17 -Wall -Wextra \
-I"$JSON_INC" -I"$(dirname "$test_src")" \
"$test_src" -o "$bin"; then
echo "COMPILE FAILED: $test_src" >&2
fail=1
continue
fi
if ! "$bin"; then
echo "TEST FAILED: $test_src" >&2
fail=1
fi
done
echo "Ran ${#tests[@]} standalone C++ unit test file(s)"
exit "$fail"

View File

@@ -65,29 +65,6 @@ turboquant-avx:
turboquant-fallback:
$(call turboquant-build,fallback,-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)
# Single-build CPU backend via ggml CPU_ALL_VARIANTS (mirrors llama-cpp-cpu-all).
# turboquant reuses backend/cpp/llama-cpp's CMakeLists.txt (hw_grpc_proto STATIC) and
# Makefile (SHARED_LIBS make-var + EXTRA_CMAKE_ARGS), so this passes the same overrides
# through to the copied build: SHARED_LIBS=ON, the DL flags, and --target ggml (which
# pulls in the per-microarch libggml-cpu-*.so via ggml's add_dependencies). The .so set
# is collected for package.sh to bundle into package/lib.
turboquant-cpu-all:
rm -rf $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build
cp -rf $(LLAMA_CPP_DIR) $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build purge
bash $(CURRENT_MAKEFILE_DIR)/patch-grpc-server.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server.cpp
$(info $(GREEN)I turboquant build info:cpu-all-variants$(RESET))
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build llama.cpp
bash $(CURRENT_MAKEFILE_DIR)/apply-patches.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp $(PATCHES_DIR)
SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" \
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server turboquant-cpu-all
rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs
find $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \;
@echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/
turboquant-grpc:
$(call turboquant-build,grpc,-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server --target rpc-server)

View File

@@ -14,15 +14,6 @@ mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/turboquant-* $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/
# Bundle the ggml shared backends from the CPU_ALL_VARIANTS build into package/lib. ggml
# discovers the per-microarch libggml-cpu-*.so by scanning the executable directory, which
# (via the bundled lib/ld.so that run.sh launches through) resolves to lib/. See the
# matching comment in backend/cpp/llama-cpp/package.sh. No-op on the fallback/ROCm builds.
if [ -d "$CURDIR/ggml-shared-libs" ]; then
echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..."
cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/
fi
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,39 +12,54 @@ grep -e "flags" /proc/cpuinfo | head -1
BINARY=turboquant-fallback
# x86/arm64 ship a single turboquant-cpu-all built with ggml CPU_ALL_VARIANTS: ggml's
# backend registry dlopens the best libggml-cpu-*.so for this host, so no shell-side
# probing. ROCm ships only turboquant-fallback, so fall back to it when cpu-all is absent.
if [ -e "$CURDIR"/turboquant-cpu-all ]; then
BINARY=turboquant-cpu-all
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/turboquant-avx ]; then
BINARY=turboquant-avx
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/turboquant-avx2 ]; then
BINARY=turboquant-avx2
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/turboquant-avx512 ]; then
BINARY=turboquant-avx512
fi
fi
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e "$CURDIR"/turboquant-grpc ]; then
if [ -e $CURDIR/turboquant-grpc ]; then
BINARY=turboquant-grpc
fi
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
fi
fi
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec "$CURDIR"/$BINARY "$@"
exec $CURDIR/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec "$CURDIR"/turboquant-fallback "$@"
exec $CURDIR/turboquant-fallback "$@"

View File

@@ -117,8 +117,7 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \
cd .. && \
(mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET)
test: acestep-cpp
@echo "Running acestep-cpp tests..."

View File

@@ -4,7 +4,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -23,11 +22,7 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("ACESTEP_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgoacestepcpp-fallback.dylib"
} else {
libName = "./libgoacestepcpp-fallback.so"
}
libName = "./libgoacestepcpp-fallback.so"
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -13,7 +13,6 @@ mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/acestep-cpp $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,29 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single library variant (Metal or Accelerate). The goacestepcpp
# target is built as a CMake MODULE, which emits a .dylib for a SHARED
# build but a .so for a MODULE build on Apple, so prefer .dylib and fall
# back to .so.
LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib"
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
fi
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e "$CURDIR"/libgoacestepcpp-avx.so ]; then
if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e "$CURDIR"/libgoacestepcpp-avx2.so ]; then
if [ -e $CURDIR/libgoacestepcpp-avx2.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx2.so"
fi
fi
@@ -42,22 +32,21 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e "$CURDIR"/libgoacestepcpp-avx512.so ]; then
if [ -e $CURDIR/libgoacestepcpp-avx512.so ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ACESTEP_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/acestep-cpp "$@"
exec $CURDIR/lib/ld.so $CURDIR/acestep-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec "$CURDIR"/acestep-cpp "$@"
exec $CURDIR/acestep-cpp "$@"

View File

@@ -57,7 +57,6 @@ libced.so: sources/ced.cpp
cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true
cp -fv sources/ced.cpp/include/ced_capi.h ./
ced-grpc: libced.so main.go goced.go

View File

@@ -12,7 +12,6 @@ import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ type libFunc struct {
func main() {
libName := os.Getenv("CED_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "libced.dylib"
} else {
libName = "libced.so"
}
libName = "libced.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {

View File

@@ -15,12 +15,10 @@ mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then
echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libced.so not found in $CURDIR, run 'make' first" >&2
exit 1
fi
}
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."

View File

@@ -3,12 +3,7 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
export CED_LIBRARY="$CURDIR/lib/libced.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
fi
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the sibling backends).

View File

@@ -1,6 +1,6 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
exec "$CURDIR"/cloud-proxy "$@"
exec $CURDIR/cloud-proxy "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
CRISPASR_VERSION?=6514c9da00b03a2f0f1b49a43fae4f3a01a41844
CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -75,8 +75,7 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgocrispasr-fallback.dylib
VARIANT_TARGETS = libgocrispasr-fallback.so
endif
crispasr: main.go gocrispasr.go $(VARIANT_TARGETS)
@@ -88,7 +87,7 @@ package: crispasr
build: package
clean: purge
rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr
rm -rf libgocrispasr*.so package sources/CrispASR crispasr
purge:
rm -rf build*
@@ -119,21 +118,13 @@ libgocrispasr-fallback.so: sources/CrispASR
SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
# Build fallback variant as a dylib (Darwin)
libgocrispasr-fallback.dylib: sources/CrispASR
$(MAKE) purge
$(info ${GREEN}I crispasr build info:fallback (dylib)${RESET})
SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
(mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET)
test: crispasr
CGO_ENABLED=0 $(GOCMD) test -v ./...

View File

@@ -4,7 +4,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("CRISPASR_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgocrispasr-fallback.dylib"
} else {
libName = "./libgocrispasr-fallback.so"
}
libName = "./libgocrispasr-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/crispasr $CURDIR/package/
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e "$CURDIR"/libgocrispasr-avx.so ]; then
if [ -e $CURDIR/libgocrispasr-avx.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e "$CURDIR"/libgocrispasr-avx2.so ]; then
if [ -e $CURDIR/libgocrispasr-avx2.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx2.so"
fi
fi
@@ -36,27 +32,26 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e "$CURDIR"/libgocrispasr-avx512.so ]; then
if [ -e $CURDIR/libgocrispasr-avx512.so ]; then
LIBRARY="$CURDIR/libgocrispasr-avx512.so"
fi
fi
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export CRISPASR_LIBRARY=$LIBRARY
# Point piper's espeak-ng phonemizer at the bundled voice data. The variable
# names the directory CONTAINING espeak-ng-data (package.sh drops it next to
# this script). Harmless when espeak-ng wasn't bundled.
export CRISPASR_ESPEAK_DATA_PATH="$CURDIR"
export CRISPASR_ESPEAK_DATA_PATH=$CURDIR
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/crispasr "$@"
exec $CURDIR/lib/ld.so $CURDIR/crispasr "$@"
fi
echo "Using library: $LIBRARY"
exec "$CURDIR"/crispasr "$@"
exec $CURDIR/crispasr "$@"

View File

@@ -40,8 +40,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
@@ -79,7 +77,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib
VARIANT_TARGETS = libdepthanythingcpp-fallback.so
endif
depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
@@ -91,7 +89,7 @@ package: depth-anything-cpp
build: package
clean: purge
rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources
rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
purge:
rm -rf build*
@@ -118,19 +116,11 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
else
libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
endif
libdepthanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -138,8 +128,7 @@ libdepthanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
(mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
all: depth-anything-cpp package

View File

@@ -9,7 +9,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("DEPTHANYTHING_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libdepthanythingcpp-fallback.dylib"
} else {
libName = "./libdepthanythingcpp-fallback.so"
}
libName = "./libdepthanythingcpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e "$CURDIR"/libdepthanythingcpp-avx.so ]; then
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e "$CURDIR"/libdepthanythingcpp-avx2.so ]; then
if [ -e $CURDIR/libdepthanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx2.so"
fi
fi
@@ -36,22 +32,21 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e "$CURDIR"/libdepthanythingcpp-avx512.so ]; then
if [ -e $CURDIR/libdepthanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/depth-anything-cpp "$@"
exec $CURDIR/lib/ld.so $CURDIR/depth-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec "$CURDIR"/depth-anything-cpp "$@"
exec $CURDIR/depth-anything-cpp "$@"

View File

@@ -1,6 +1,6 @@
# face-detect backend Makefile.
#
# Upstream pin lives below as FACEDETECT_VERSION?=06914b0... (.github/bump_deps.sh
# Upstream pin lives below as FACEDETECT_VERSION?=e22260d... (.github/bump_deps.sh
# can find and update it - matches the voice-detect / parakeet.cpp / whisper.cpp
# convention).
#
@@ -14,7 +14,7 @@
# The default target below does the proper clone-at-pin + cmake build so CI does
# not need a side-checkout.
FACEDETECT_VERSION?=06914b077d52f90d5421299138e7be6bdd06b5e8
FACEDETECT_VERSION?=e22260d5d5490b37b021b7f795079f386d553afd
FACEDETECT_REPO?=https://github.com/mudler/face-detect.cpp
GOCMD?=go

View File

@@ -1,6 +1,6 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
exec "$CURDIR"/local-store "$@"
exec $CURDIR/local-store "$@"

View File

@@ -32,8 +32,6 @@ endif
ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DLOCALVQE_VULKAN=ON
else ifeq ($(OS),Darwin)
# Apple Silicon: CPU-only (no Metal upstream); built + published as an arm64
# image by CI (includeDarwin in .github/backend-matrix.yml) for macOS install.
CMAKE_ARGS+=-DGGML_METAL=OFF
endif
@@ -69,9 +67,8 @@ $(LIB_SENTINEL): sources/LocalVQE
# that the loader picks at runtime. We must build every target — the
# default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY
# routes all of them into build/bin; copy them out next to the binary.
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib .
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* .
cp -P build/bin/libggml*.so* . 2>/dev/null || true
cp -P build/bin/libggml*.dylib . 2>/dev/null || true
touch $(LIB_SENTINEL)
liblocalvqe.so: $(LIB_SENTINEL)

View File

@@ -4,7 +4,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("LOCALVQE_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./liblocalvqe.dylib"
} else {
libName = "./liblocalvqe.so"
}
libName = "./liblocalvqe.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -15,9 +15,7 @@ cp -avf $CURDIR/localvqe $CURDIR/package/
# liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime
# variants — LocalVQE picks the matching CPU variant at load time.
cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -1,34 +1,23 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
# LocalVQE's runtime CPU-variant loader (ggml_backend_load_all) searches
# get_executable_path() and current_path() — the second one is what saves us
# when /proc/self/exe resolves to lib/ld.so under the bundled-loader path.
# So we cd into "$CURDIR" (where all the libggml-cpu-*.so files live) before
# So we cd into $CURDIR (where all the libggml-cpu-*.so files live) before
# exec'ing the binary.
cd "$CURDIR"
if [ "$(uname)" = "Darwin" ]; then
# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
export DYLD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$DYLD_LIBRARY_PATH
LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.dylib
if [ ! -e "$LOCALVQE_LIBRARY" ]; then
LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
fi
export LOCALVQE_LIBRARY
else
export LD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
fi
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LOCALVQE_LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/localvqe "$@"
exec $CURDIR/lib/ld.so $CURDIR/localvqe "$@"
fi
echo "Using library: $LOCALVQE_LIBRARY"
exec "$CURDIR"/localvqe "$@"
exec $CURDIR/localvqe "$@"

View File

@@ -33,8 +33,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DGGML_VULKAN=ON -DLA_GGML_VULKAN=ON
else ifeq ($(OS),Darwin)
# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
ifneq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DGGML_METAL=OFF
else
@@ -72,7 +70,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib
VARIANT_TARGETS = liblocateanythingcpp-fallback.so
endif
locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
@@ -84,7 +82,7 @@ package: locate-anything-cpp
build: package
clean: purge
rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources
rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources
purge:
rm -rf build*
@@ -111,19 +109,11 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
else
liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
endif
liblocateanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -131,8 +121,7 @@ liblocateanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
(mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)
all: locate-anything-cpp package

View File

@@ -9,7 +9,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("LOCATEANYTHING_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./liblocateanythingcpp-fallback.dylib"
} else {
libName = "./liblocateanythingcpp-fallback.so"
}
libName = "./liblocateanythingcpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e "$CURDIR"/liblocateanythingcpp-avx.so ]; then
if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e "$CURDIR"/liblocateanythingcpp-avx2.so ]; then
if [ -e $CURDIR/liblocateanythingcpp-avx2.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx2.so"
fi
fi
@@ -36,22 +32,21 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e "$CURDIR"/liblocateanythingcpp-avx512.so ]; then
if [ -e $CURDIR/liblocateanythingcpp-avx512.so ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LOCATEANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/locate-anything-cpp "$@"
exec $CURDIR/lib/ld.so $CURDIR/locate-anything-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec "$CURDIR"/locate-anything-cpp "$@"
exec $CURDIR/locate-anything-cpp "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# omnivoice.cpp version
OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607
OMNIVOICE_VERSION?=96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd
SO_TARGET?=libgomnivoicecpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,7 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib
VARIANT_TARGETS = libgomnivoicecpp-fallback.so
endif
omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
@@ -78,7 +77,7 @@ package: omnivoice-cpp
build: package
clean: purge
rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp
rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
purge:
rm -rf build*
@@ -107,20 +106,13 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.dylib
libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
cd .. && \
(mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
test: omnivoice-cpp
@echo "Running omnivoice-cpp tests..."

View File

@@ -4,7 +4,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("OMNIVOICE_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgomnivoicecpp-fallback.dylib"
} else {
libName = "./libgomnivoicecpp-fallback.so"
}
libName = "./libgomnivoicecpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -2,7 +2,7 @@
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
cd /
@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e "$CURDIR"/libgomnivoicecpp-avx.so ]; then
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e "$CURDIR"/libgomnivoicecpp-avx2.so ]; then
if [ -e $CURDIR/libgomnivoicecpp-avx2.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx2.so"
fi
fi
@@ -36,22 +32,21 @@ else
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e "$CURDIR"/libgomnivoicecpp-avx512.so ]; then
if [ -e $CURDIR/libgomnivoicecpp-avx512.so ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec "$CURDIR"/lib/ld.so "$CURDIR"/omnivoice-cpp "$@"
exec $CURDIR/lib/ld.so $CURDIR/omnivoice-cpp "$@"
fi
echo "Using library: $LIBRARY"
exec "$CURDIR"/omnivoice-cpp "$@"
exec $CURDIR/omnivoice-cpp "$@"

View File

@@ -1,30 +1,13 @@
GOCMD?=go
GO_TAGS?=
# The opus shim is a small C wrapper around libopus' variadic
# opus_encoder_ctl (see csrc/opus_shim.c). It is built as a shared library
# and dlopen'd at runtime by the Go backend (codec.go). The extension is
# OS-specific: Linux uses .so, macOS uses .dylib. OS is exported by the root
# Makefile (`export OS := $(shell uname -s)`).
SHIM_EXT=so
OPUS_CFLAGS := $(shell pkg-config --cflags opus)
OPUS_LIBS := $(shell pkg-config --libs opus)
SHIM_LDFLAGS := $(OPUS_LIBS)
ifeq ($(OS),Darwin)
SHIM_EXT=dylib
# Resolve libopus symbols lazily from the already globally-loaded
# libopus (codec.go dlopens it RTLD_GLOBAL before the shim) rather than
# recording an absolute Homebrew path in the dylib. This keeps the
# packaged shim relocatable on machines that have no Homebrew.
SHIM_LDFLAGS := -undefined dynamic_lookup
endif
libopusshim.so: csrc/opus_shim.c
$(CC) -shared -fPIC -o $@ $< $(OPUS_CFLAGS) $(OPUS_LIBS)
libopusshim.$(SHIM_EXT): csrc/opus_shim.c
$(CC) -shared -fPIC -o $@ $< $(OPUS_CFLAGS) $(SHIM_LDFLAGS)
opus: libopusshim.$(SHIM_EXT)
opus: libopusshim.so
$(GOCMD) build -tags "$(GO_TAGS)" -o opus ./
package: opus
@@ -33,7 +16,4 @@ package: opus
build: package
clean:
rm -f opus libopusshim.$(SHIM_EXT)
rm -rf package
.PHONY: build package clean
rm -f opus libopusshim.so

View File

@@ -8,23 +8,13 @@ mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/opus $CURDIR/package/
cp -avf $CURDIR/run.sh $CURDIR/package/
# The shim extension is OS-specific (.so on Linux, .dylib on macOS).
SHIM_EXT=so
if [ "$(uname)" = "Darwin" ]; then
SHIM_EXT=dylib
fi
# Copy the opus shim library
cp -avf $CURDIR/libopusshim.$SHIM_EXT $CURDIR/package/lib/
cp -avf $CURDIR/libopusshim.so $CURDIR/package/lib/
# Copy system libopus so the backend is self-contained: the runtime base
# image has neither libopus-dev (Linux) nor Homebrew (macOS), so codec.go's
# dlopen would otherwise fail. Both name patterns are attempted; only the
# host's matching one exists.
# Copy system libopus
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists opus; then
LIBOPUS_DIR=$(pkg-config --variable=libdir opus)
cp -avf $LIBOPUS_DIR/libopus.so* $CURDIR/package/lib/ 2>/dev/null || true
cp -avf $LIBOPUS_DIR/libopus*.dylib $CURDIR/package/lib/ 2>/dev/null || true
cp -avfL $LIBOPUS_DIR/libopus.so* $CURDIR/package/lib/ 2>/dev/null || true
fi
# Detect architecture and copy appropriate libraries
@@ -48,8 +38,6 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin — system libraries linked dynamically, no bundled loader needed"
else
echo "Warning: Could not detect architecture for system library bundling"
fi

View File

@@ -1,20 +1,15 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY="$CURDIR"/lib/libopusshim.dylib
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY="$CURDIR"/lib/libopusshim.so
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OPUS_SHIM_LIBRARY=$CURDIR/lib/libopusshim.so
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec "$CURDIR"/lib/ld.so "$CURDIR"/opus "$@"
exec $CURDIR/lib/ld.so $CURDIR/opus "$@"
fi
exec "$CURDIR"/opus "$@"
exec $CURDIR/opus "$@"

View File

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go
@@ -74,7 +74,6 @@ libparakeet.so: sources/parakeet.cpp
cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS)
cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./
parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go

View File

@@ -2,17 +2,15 @@ package main
// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads the parakeet shared library via purego and registers the flat
// C-API entry points declared in parakeet_capi.h. The library name can be
// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY /
// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default
// looks next to this binary for libparakeet.so on Linux and
// libparakeet.dylib on macOS.
// Loads libparakeet.so via purego and registers the flat C-API entry
// points declared in parakeet_capi.h. The library name can be overridden
// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY
// convention in the sibling backends); the default looks for the .so next
// to this binary.
import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -30,11 +28,7 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("PARAKEET_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "libparakeet.dylib"
} else {
libName = "libparakeet.so"
}
libName = "libparakeet.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -16,15 +16,12 @@ mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libparakeet shared lib + any soname symlinks. On Linux this is
# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen
# resolves it via the *_LIBRARY_PATH that run.sh points at lib/.
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then
echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2
# libparakeet.so + any soname symlinks (libparakeet.so.X[.Y]). purego.Dlopen
# resolves it via LD_LIBRARY_PATH, which run.sh points at lib/.
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
exit 1
fi
}
# Detect architecture and copy the core runtime libs libparakeet.so links
# against, plus the matching dynamic loader as lib/ld.so.
@@ -51,7 +48,7 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1

View File

@@ -3,17 +3,11 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
fi
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
# If a self-contained ld.so was packaged, route through it so the
# packaged libc / libstdc++ are used instead of the host's (matches the
# whisper backend's runtime layout). Linux only.
# whisper backend's runtime layout).
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@"

View File

@@ -16,15 +16,7 @@ cp -rfv $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/sources/go-piper/piper-phonemize/pi/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries
if [ "$(uname)" = "Darwin" ]; then
# macOS has no glibc loader to bundle. The piper binary links its bundled
# libs (libucd, libespeak-ng, libpiper_phonemize, libonnxruntime) via
# @rpath but ships with no LC_RPATH, so dyld aborts at launch with
# "Library not loaded: @rpath/libucd.dylib ... no LC_RPATH's found".
# Add an @loader_path/lib rpath so @rpath resolves to package/lib/.
echo "Detected macOS; adding @loader_path/lib rpath so bundled libs resolve via @rpath..."
install_name_tool -add_rpath @loader_path/lib "$CURDIR/package/piper"
elif [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so

View File

@@ -1,20 +1,15 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath "$0")")
CURDIR=$(dirname "$(realpath $0)")
export ESPEAK_NG_DATA="$CURDIR"/espeak-ng-data
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
export ESPEAK_NG_DATA=$CURDIR/espeak-ng-data
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec "$CURDIR"/lib/ld.so "$CURDIR"/piper "$@"
exec $CURDIR/lib/ld.so $CURDIR/piper "$@"
fi
exec "$CURDIR"/piper "$@"
exec $CURDIR/piper "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# qwentts.cpp version
QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42
QWEN3TTS_CPP_VERSION?=4536dcdce27c3764a93a06d6bf64026b124962f5
SO_TARGET?=libgoqwen3ttscpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so
endif
qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
build: package
clean: purge
rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp
rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
purge:
rm -rf build*
@@ -110,20 +110,13 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.dylib
libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \
cd .. && \
(mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null)
mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET)
test: qwen3-tts-cpp
@echo "Running qwen3-tts-cpp tests..."

View File

@@ -4,7 +4,6 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("QWEN3TTS_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgoqwen3ttscpp-fallback.dylib"
} else {
libName = "./libgoqwen3ttscpp-fallback.so"
}
libName = "./libgoqwen3ttscpp-fallback.so"
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

Some files were not shown because too many files have changed in this diff Show More