Compare commits

...

93 Commits

Author SHA1 Message Date
Ettore Di Giacinto
87d5734c33 fix(config): gate parallel-slot default on per-device VRAM too (#10485)
The first #10485 fix (#10494) made the Blackwell physical-batch boost
per-device/context-aware, which neutralized the big compute-buffer OOM, but
the reporter's 2x16 GiB consumer Blackwell still OOM'd. Tracing the post-fix
log: the model now loads its weights, builds the main context and warms up
fine, and dies only on the *last* allocation — the MTP draft context's 800 MiB
KV cache on the tighter device.

#10411 changed only two defaults: the physical batch (now gated) and a
VRAM-scaled parallel-slot count. The KV cache is unified (n_ctx_seq == full
context proves slots share the budget, so parallel doesn't multiply KV), but
n_seq_max=4 still adds per-slot compute-graph / context-checkpoint / output
scratch. On a device packed ~99% by a 27B model spanning both cards, that
overhead is the few-hundred-MiB straw — which is why reverting #10411 (and only
#10411) restores a working load.

Gate the parallel-slot default on the same per-device headroom predicate as the
batch boost: when a large context already fills a single card
(largeContextForDevice), keep n_parallel=1. A user running one big-context model
that barely fits across two consumer GPUs is not serving four concurrent
tenants. Small contexts and large unified-memory devices (GB10) keep full
concurrency. Applied on both the single-host path and the distributed router.

Also make the auto-tuning visible and reversible (the debugging here needed
DEBUG logs and a git bisect):

  - Log the effective performance-relevant runtime options at INFO once per
    model load ("effective runtime tuning …": context, n_batch, n_gpu_layers,
    parallel, flash_attention, f16) so an admin can see what will run and pin or
    override any value in the model YAML.
  - LOCALAI_DISABLE_HARDWARE_DEFAULTS=true skips the hardware auto-tuning
    entirely (mirrors LOCALAI_DISABLE_GUESSING) for stock llama.cpp behavior.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
2026-06-25 12:57:19 +00:00
LocalAI [bot]
693e3eec05 chore(model gallery): 🤖 add 1 new models via gallery agent (#10505)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:52 +02:00
LocalAI [bot]
f1e5071321 chore: ⬆️ Update leejet/stable-diffusion.cpp to 8caa3f908ae6d4a4bef531e73b9a969f266a3d1f (#10493)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:31 +02:00
LocalAI [bot]
93d6255de3 chore: ⬆️ Update ggml-org/llama.cpp to 8be759e6f70d629638a7eb70db3824cbdcea370b (#10501)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:17 +02:00
LocalAI [bot]
fe4f425fb5 fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482) (#10504)
* fix(http): harden BaseURL proxy scheme/host detection

Split comma-separated X-Forwarded-Proto and honor the RFC 7239 Forwarded
header so generated links use https behind common reverse-proxy setups.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(http): honor explicit external base URL in BaseURL

When _external_base_url is set in the request context it dictates the
origin (scheme+host+port); the proxy path prefix is still appended.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(config): generalize LOCALAI_BASE_URL to ExternalBaseURL

LOCALAI_BASE_URL now sets a single instance-wide external base URL used
for OAuth callbacks and all self-referential links. A Pre middleware
stamps it into the request context for middleware.BaseURL.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: document LOCALAI_BASE_URL and reverse-proxy headers

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(http): cover parseForwarded edge cases; clarify base-url flag group

Adds direct unit coverage for quoted/malformed/multi-element Forwarded
headers and regroups the external base URL flag away from auth-only.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:10:59 +02:00
LocalAI [bot]
fae9f6356f chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 9dbe7ea26a01b30fccb117ae5e86807c1dc23d42 (#10499)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:10:41 +02:00
LocalAI [bot]
066abf82c0 feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough (#10490)
* feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options

Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main
model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets
users keep MoE expert weights on CPU to manage VRAM on large MoE models.

Closes part of #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward unknown '-' options to upstream arg parser

Any options: entry starting with '-' is collected and passed verbatim to
llama.cpp's own common_params_parse (LLAMA_EXAMPLE_SERVER) at the end of
params_parse, so every upstream llama-server flag works without a new
hand-wired branch. Passthrough runs last and wins on overlap; n_parallel is
snapshotted to survive parser_init's SERVER reset, and help/usage/completion
flags are skipped to avoid exiting the backend.

Closes #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(llama-cpp): document cpu_moe/n_cpu_moe and option passthrough

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): terminate tensor/kv override vectors after passthrough

The tensor_buft_overrides padding and the kv/draft override terminators
ran before the generic option passthrough, so a passthrough flag
(--cpu-moe, --override-tensor, --override-kv, ...) appended a real entry
after the null sentinel - tripping the model loader's
back().pattern == nullptr assertion (crash) or being silently dropped.
Move all three termination/padding blocks to the end of params_parse,
after both the named-option loop and common_params_parse have pushed
their real entries. Also widen the exit()-flag skip list so --version,
--license, --list-devices and --cache-list cannot terminate the backend.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:10:08 +02:00
LocalAI [bot]
a7fec9a49d feat(backends): add darwin/metal (MPS) build for trl (#10487)
* feat(backends): add darwin/metal (MPS) build for trl

Authors backend/python/trl/requirements-mps.txt and wires trl into the
darwin CI matrix and gallery so the MPS training path can be built and
validated on Apple Silicon. The MPS variant installs plain PyPI torch
wheels (MPS-capable on macOS arm64) and the trl training stack; bitsandbytes
is omitted as it is a CUDA-only dependency with poor Apple Silicon support.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(trl): guard uv-only --index-strategy for the pip/darwin path

The darwin/MPS build installs with pip (USE_PIP=true), which rejects the
uv-only --index-strategy flag and failed the darwin backend build. Add it
only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:36 +02:00
LocalAI [bot]
c678530cf0 fix(backends): darwin/metal support across purego Go backends (#10481)
* fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path)

The parakeet-cpp backend had no macOS support and panicked at startup on
Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so".
Fix it across the same four layers the sibling voxtral backend already
handles correctly:

- main.go: default the dlopen target to libparakeet.dylib on darwin
  (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins.
- Makefile: also stage the built libparakeet.dylib next to the Go sources.
- package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when
  bundling instead of hard-failing when no .so is present (the macOS case);
  note that on Darwin only system frameworks are linked.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the
  packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux.

Mirrors backend/go/voxtral.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(backends): darwin/metal support across purego Go backends

The parakeet-cpp fix in the previous commit was an instance of a bug
shared by nearly every purego/dlopen Go backend: the dlopen target was
hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the
backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the
.dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling
this correctly.

Apply the same four-layer fix (mirroring backend/go/voxtral) to the
remaining affected backends:

  whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp,
  qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp,
  depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe

Per backend:
- main.go (sherpa-onnx: backend.go, two libraries): default the dlopen
  target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the
  existing <BACKEND>_LIBRARY env override still wins.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at
  the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant
  (avx/avx2/avx512) selection unchanged in the else branch.
- package.sh: also bundle the .dylib and stop hard-failing when no .so is
  present (the macOS case).
- Makefile: also stage the built .dylib.

Notes:
- stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE,
  which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls
  back to .so so both layouts work.
- sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only
  run.sh and the two dlopen defaults needed fixing.

Linux behavior is unchanged. Verified gofmt-clean and
`CGO_ENABLED=0 go build` for every backend.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:18 +02:00
LocalAI [bot]
3c63431e46 chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 0f37401bebe9b20c0160a888e592108fc1d17607 (#10492)
⬆️ Update ServeurpersoCom/omnivoice.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:58 +02:00
LocalAI [bot]
3f647a2764 chore: ⬆️ Update ikawrakow/ik_llama.cpp to d5507e33ae7ee2b7b41475f08044d3bde3b839ee (#10498)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:42 +02:00
LocalAI [bot]
f88981cdce feat(ui): data-driven hardware model recommendations + gallery surfacing (#10500)
* feat(ui): make hardware starter models data-driven

The empty-state starter widget recommended from a hardcoded list, which
drifts as the gallery evolves. Add useRecommendedModels: it queries the
live gallery for chat-capable models (their natural curated order, since
the gallery exposes no popularity signal), estimates size/VRAM for the top
candidates via the existing estimate endpoint, and ranks by hardware fit -
smallest on CPU-only boxes, largest-that-fits on GPUs.

StarterModels now renders those live picks and keeps the curated static
list only as an offline/trimmed-gallery fallback.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): recommend models for your hardware in the gallery

Hardware-aware recommendations were only shown on the first-run empty
state. Surface them on the main Models gallery too: a dismissible
"Recommended for your hardware" strip at the top, sharing the
useRecommendedModels fit-ranking with the starter widget. CPU-only boxes
get small models; GPUs get the largest picks that fit VRAM, with size and
VRAM shown per card. One-click install; dismissal persists per browser.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): gpu-mid tier + NVIDIA NVFP4 model recommendations

Refine the hardware recommendation tiers and curated picks:

- Add a gpu-mid tier (8-24GB VRAM) between gpu-small and gpu-large, so
  ~27B-class models are suggested separately from the 30B+ large tier.
- Detect NVIDIA GPUs (resources.gpus[].vendor) and, on NVIDIA only, prefer
  NVFP4 + MTP variants (Blackwell-optimised); NVFP4 models are filtered out
  of recommendations on non-NVIDIA hardware where they can't run. This
  applies to both the live ranking and the static fallback, with an NVFP4
  badge shown on those picks.
- Refresh the curated fallback to current models: Gemma-4 QAT Q4 builds at
  every tier, low qwen3.5 (4B distilled / 9B) on CPU/small, qwen3.6-27b
  and MTP variants at mid, qwen3.6/qwen3.5 35B-A3B apex/distilled at large.
  All names verified against gallery/index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 00:22:45 +02:00
LocalAI [bot]
0d6de15ae9 fix(config): per-device VRAM headroom for Blackwell defaults (#10485) (#10494)
The hardware-tuned defaults from #10411 were measured on a GB10 / DGX Spark
(128 GiB unified memory) and over-provisioned multi-GPU consumer Blackwell
(e.g. 2x16 GiB RTX 50-series) into CUDA OOM during model init:

  - The Blackwell physical batch (512 -> 2048) sets both n_batch and n_ubatch.
    The compute buffer scales ~n_ubatch * n_ctx and is allocated PER DEVICE
    (it can't be split across GPUs), so a large context turns ub2048 into
    multi-GiB of scratch that must fit one 16 GiB card.
  - The VRAM-scaled parallel-slot default tiered off TotalAvailableVRAM(),
    which SUMS all GPUs (2x16 -> "32 GiB" -> 8 slots), but the allocations
    are per-device.

Make both decisions per-device and context-aware:

  - xsysinfo.MinPerGPUVRAM() reports the smallest device's VRAM; localGPU()
    uses it so the parallel tier and batch guard reason about one card.
  - PhysicalBatchForContext(gpu, ctx) raises the batch only when the extra
    compute buffer fits VRAM/4 at this model's context (16 GiB crosses over
    ~174k ctx, 32 GiB ~349k; GB10 reports system RAM so it still clears it).
  - Apply hardware defaults AFTER runBackendHooks in SetDefaults so the
    GGUF-guessed context is resolved before the batch decision.
  - The distributed router gates the node batch the same way.

Unified-memory devices (GB10, Apple) report system RAM as their single
device's VRAM, so they keep the prefill win.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 00:07:48 +02:00
LocalAI [bot]
5c3d48ab50 feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496)
* feat(ui): remember last-used model per capability

ModelSelector auto-selected the first option whenever the bound value was
empty or stale, so every visit to the Home chat box, Image, TTS or Talk
pages reset the choice to whatever sorted first. Persist the user's pick
in localStorage keyed by capability and prefer it on auto-select when the
model is still available, falling back to the first option otherwise.

Because every modality picker funnels through ModelSelector, this fixes
the friction everywhere at once. External-options callers pass no
capability and keep the previous first-item behaviour.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add visibility-aware polling hook

The app had 26 hand-rolled setInterval polls, none of which paused when
the browser tab was hidden, so backgrounded dashboards kept hitting the
server every few seconds for data nobody was looking at.

Add usePolling: runs immediately, polls on a fixed interval, pauses while
document.hidden, fires a catch-up poll on return, and guards against
overlapping slow requests. Route useResources (the highest-frequency
shared poll) through it. Further callers can be migrated incrementally.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): hardware-aware starter models on empty home

A fresh install dropped admins straight into a 1000+ model gallery with
no guidance. Add a StarterModels widget to the empty-state wizard that
recommends a small, curated set tuned to the detected hardware:

- CPU-only machines (no GPU VRAM) are steered to genuinely small models
  (1-4B, Q4) that stay responsive without a GPU.
- GPU machines get suggestions scaled to available VRAM.

Curated names are real gallery entries, intersected against the live
gallery at render time so a trimmed/custom gallery degrades gracefully.
Install is one click via the existing model-install API.

Also routes Home's cluster and system-info polls through usePolling so a
backgrounded home page stops fetching.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): optional token-cost estimates on usage dashboard

The usage dashboard tracked tokens but had no monetary view. Multi-user
deployments that bill back or budget compute had to export and compute
cost elsewhere.

Add an opt-in pricing control: admins set $ per 1M prompt/completion
tokens (stored per-browser). When set, an estimated-cost summary card and
per-model / per-user cost columns appear, computed from recorded token
counts. The entire cost surface stays hidden until a price is entered, so
the default view is unchanged. Cost is clearly labelled an estimate -
LocalAI itself has no notion of price.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): label icon-only send buttons for screen readers

The chat and agent-chat send buttons were a bare paper-plane icon with
no accessible name, so screen readers announced only "button". Add an
aria-label/title ("Send message") and mark the icon aria-hidden. An audit
of all icon-only buttons found these were the only two unlabeled controls;
the rest already carry visible text.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:30:08 +02:00
LocalAI [bot]
764b0352b9 docs: ⬆️ update docs version mudler/LocalAI (#10491)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:24 +02:00
LocalAI [bot]
75ba2daba1 chore(model-gallery): ⬆️ update checksum (#10495)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:04 +02:00
LocalAI [bot]
62b14fd635 feat(backends): add darwin/metal build for liquid-audio (#10486)
* feat(backends): add darwin/metal build for liquid-audio

Wire the already-MPS-ready liquid-audio backend (it ships
requirements-mps.txt) into the darwin CI matrix and the gallery so
metal-darwin-arm64 images are built and selectable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* ci(liquid-audio): trigger darwin build via requirements-mps note

The changed-backends path filter only builds a backend when a file under
its directory changes. The metal wiring lived in index.yaml + the matrix,
so the darwin job was skipped. Add a documenting comment to the MPS
requirements so CI actually exercises the darwin build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path

Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which
rejects the uv-only --index-strategy flag and failed the darwin backend build.
Add it only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:16:27 +02:00
LocalAI [bot]
193d0e6aef fix(backends): darwin/metal support for supertonic (#10488)
The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and
packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed
ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec
path, while package.sh hard-failed on any non-Linux host. On macOS dyld has
no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib.

This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481
landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but
which omitted supertonic:

- run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH
  at libonnxruntime.dylib; guard the ld.so exec path to Linux only.
- package.sh: recognize Darwin instead of erroring out; the bundled .dylib is
  resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle.
- helper.go: platform-native default library extension (dylib on darwin) for
  the last-resort dlopen fallback.

It also wires the darwin CI build and gallery entries, resolving the
inconsistency where backend/index.yaml advertised metal for supertonic but no
includeDarwin matrix entry built the image:

- .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry.
- backend/index.yaml: declare metal capabilities and add the concrete
  metal-supertonic / metal-supertonic-development child entries.

The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX
Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:19:03 +02:00
LocalAI [bot]
482314c623 fix(realtime): resolve model aliases for pipeline sub-models (#10484)
Realtime pipeline sub-models (llm/transcription/tts/vad/sound-detection)
were loaded via cl.LoadModelConfigFileByName without alias resolution,
unlike top-level API requests which resolve aliases in
core/http/middleware/request.go. So a pipeline that references an alias
(e.g. `pipeline.llm: default`, where `default` is an alias for a real
LLM) reached model loading as the alias stub with an empty Backend.

This was silently broken on a single host (it failed downstream) and a
hard error in distributed/p2p mode:

    routing model : loading model default: ... installing backend on
    node X: backend name is empty

Fix by routing every pipeline sub-model load through a small helper that
follows a single alias hop (mirroring the top-level resolution), so
non-alias sub-models behave identically and aliased ones get the
target's full config (Backend, Model, ...).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:50:44 +02:00
Dedy F. Setyawan
e8ae88a2a0 i18n(id): update and complete Indonesian translations (#10480)
- translate remaining English strings in chat, common, home, and media locales.
- fix typo and improve wording consistency (e.g., klaster -> kluster, otomasi -> automasi).

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-06-24 18:35:21 +02:00
Richard Palethorpe
e1994579f8 fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS (#10474)
pii_default_detectors was applied to the live config only by a live
POST /api/settings (ApplyRuntimeSettings) — neither the startup loader nor
the config file watcher read it back. So after a restart the persisted
default detectors were dropped, and the cloud-proxy MITM listener (which
resolves each intercept host's detectors once at start via ResolvePIIPolicy)
came up with an empty set and forwarded intercepted traffic unredacted, even
though the MITM model had pii.enabled:true and the defaults were on disk.
Request-side default redaction broke the same way.

- startup.go: loadRuntimeSettingsFromFile now applies pii_default_detectors,
  before startMITMIfConfigured, with env > file precedence.
- config_file_watcher.go: apply pii_default_detectors on live file edits,
  matching the existing env-guard pattern used for the other fields.
- settings endpoint: rebuild the MITM listener when pii_default_detectors
  changes (its per-host detector map is frozen at listener start), not only
  on a mitm_listen change — so toggling a default detector takes effect on
  cloud-proxy traffic immediately.
- new LOCALAI_PII_DEFAULT_DETECTORS env var / CLI flag (WithPIIDefaultDetectors)
  so the default detector set can be pinned at boot for immutable deployments.

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-24 11:08:57 +02:00
LocalAI [bot]
e5620989dd refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476)
PR #10475 fixed SoundDetection in-flight tracking, but the underlying trap
remains: InFlightTrackingClient embedded the whole grpc.Backend interface
"for passthrough of untracked methods", so any newly added inference method
is silently satisfied by the embedded passthrough and never wrapped with
track(). That leaves onFirstComplete unfired and in-flight stuck at 1 - the
exact SoundDetection bug, waiting to recur for the next backend method.

Close the gap at the type level instead of relying on reviewers to remember:

- Split grpc.Backend into two composed sub-interfaces: InferenceBackend
  (methods that are one discrete inference call and must be tracked) and
  ControlBackend (control-plane calls plus the streaming constructors whose
  work spans the returned stream, safe to pass through). The classification
  now lives next to the interface it documents.
- InFlightTrackingClient embeds only grpc.ControlBackend and implements every
  InferenceBackend method explicitly, delegating to an inner InferenceBackend.
  A `var _ grpc.Backend = (*InFlightTrackingClient)(nil)` assertion makes the
  package fail to compile if any inference method is left unwrapped.

Now adding a method to InferenceBackend is a build error (at the assertion and
every call site: "does not implement grpc.Backend (missing method X)"), not a
silent runtime leak - and the obvious fix is to copy a neighbouring wrapper,
which calls track(). No runtime guard or reviewer vigilance required.

Pure refactor: the composed Backend interface is identical to the old flat
one, so all implementers and consumers are unaffected (verified with a full
`go build ./...`). Behaviour is unchanged; the existing nodes suite passes.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 11:08:29 +02:00
LocalAI [bot]
fc618dcee6 fix(distributed): track in-flight for SoundDetection requests (#10475)
The distributed router wraps backend clients in InFlightTrackingClient so
the eviction logic knows which replicas are actively serving. Every
inference method must be wrapped: track() increments in-flight on entry
and decrements (plus fires onFirstComplete, which releases the load-time
reservation) on return.

SoundDetection was added after the tracking client and never got a
wrapper, so its calls fell through to the embedded passthrough Backend.
The increment/decrement never ran and, critically, onFirstComplete never
fired, so the reservation set at model load was never released - leaving
in-flight stuck at 1 and the replica permanently ineligible for eviction.

Wrap SoundDetection like the other non-LLM methods and cover it in the
"non-LLM inference methods track in-flight" table test.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 10:13:37 +02:00
LocalAI [bot]
e6042080c0 fix(agents): URL-decode collection/agent name path params (#10443) (#10471)
fix(agents): URL-decode collection/agent name path params

Collection and agent names carry a "legacy-api-key:" prefix, so the ':'
arrives percent-encoded as %3A in the request path. Echo routes such
paths via URL.RawPath and stores the matched path-param value still
escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch"
and the store lookup 404'd ("collection not found").

This was second-order fallout of #10375/#10387: once colons became valid
in names, the URL-decode gap surfaced on every name-bearing endpoint.

Add a decodedParam helper that url.PathUnescape's the param (falling back
to the raw value on invalid encoding) and wire it into all collection
endpoints and the agent :name endpoints, which share the identical
prefix. The entry endpoints already unescaped c.Param("*"); this closes
the same gap for :name.

Fixes #10443


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 09:42:09 +02:00
LocalAI [bot]
0f3b24436d chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a (#10468)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:43 +02:00
LocalAI [bot]
4b6f911835 chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea (#10466)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:14 +02:00
LocalAI [bot]
a5e28942a6 chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:54 +02:00
LocalAI [bot]
dba9cd7ca4 chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc (#10465)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:34 +02:00
LocalAI [bot]
c93190de50 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e (#10456)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:13 +02:00
LocalAI [bot]
4dbf69f889 chore(model gallery): 🤖 add 1 new models via gallery agent (#10472)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 00:00:26 +02:00
LocalAI [bot]
deb430f3ec chore(model-gallery): ⬆️ update checksum (#10469)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 23:15:47 +02:00
LocalAI [bot]
dd8c8778e2 chore(model gallery): 🤖 add 1 new models via gallery agent (#10464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 15:43:21 +02:00
LocalAI [bot]
06a7b6cadb chore: ⬆️ Update leejet/stable-diffusion.cpp to f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f (#10457)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:29:07 +02:00
LocalAI [bot]
67c8889866 chore: ⬆️ Update CrispStrobe/CrispASR to 63b57289255267edf66e43e33bc3911e04a2e92d (#10455)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:49 +02:00
LocalAI [bot]
1d49041c85 chore: ⬆️ Update ggml-org/llama.cpp to 73618f27a801c0b8614ceaf3547d3c2a99baae14 (#10458)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:09 +02:00
LocalAI [bot]
2edc4e25b3 chore: ⬆️ Update ggml-org/whisper.cpp to bae6bc02b1940bbfb87b6a0299c565e563b916d1 (#10459)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:27:51 +02:00
Richard Palethorpe
7888067914 fix(settings): merge partial /api/settings updates instead of overwriting (#10463)
POST /api/settings rebuilt runtime_settings.json from only the request
body, so a focused admin page that submits a single field wiped every
other persisted setting. The Middleware proxy tab (mitm_listen) and
detector table (pii_default_detectors), plus the MCP SetBranding tool
(instance_name/instance_tagline), all POST partial bodies; the
no-omitempty api_keys and pii_default_detectors fields even round-tripped
as JSON null.

Read the persisted settings and overlay only the fields the request set
(RuntimeSettings.MergeNonNil) before writing. Every field is a pointer, so
the reflection-based merge is total over the struct and any field added
later is preserved automatically. Absent or null fields are now kept;
clearing a setting is done by sending its explicit empty/zero value
(api_keys [], mitm_listen "", etc.), unchanged from before. The full
Settings page sends every field, so its Save behaves identically.

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-23 13:27:34 +02:00
LocalAI [bot]
9eedbf537a chore(model gallery): 🤖 add 1 new models via gallery agent (#10461)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 08:04:46 +02:00
LocalAI [bot]
69c16481c8 fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460)
PR #10454 added a `cancellable bool` parameter to GalleryStore.UpdateProgress
but missed two callers under tests/e2e/distributed, breaking the build on
master (golangci-lint and tests-e2e-backend both failed to compile with
"not enough arguments in call to ... UpdateProgress").

Pass cancellable=true (both ops are downloading installs, which are
cancellable) and assert the flag is persisted, exercising the new behavior.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 23:45:22 +02:00
LocalAI [bot]
56f8a6623f fix(galleryop): persist cancellable so restarted in-flight ops stay cancellable (#10454)
In distributed mode a model/backend install marks OpStatus.Cancellable=true
while downloading, but the gallery_operations row never recorded it:
UpdateStatus persisted only progress/status and Create left the cancellable
column at its zero value. After a replica restart Hydrate rebuilt the op with
cancellable=false, /api/operations reported false, and the UI hid the cancel
button - the orphaned op then lingered until the 30-minute stale reaper
expired it ("stays there on restart, can't cancel, after a bit it expires").

Persist the flag on every progress tick and at row creation (installs are
cancellable, deletes are not), and clear it on terminal transitions. A
rehydrated in-flight op is now cancellable, so an admin can dismiss the
orphaned op immediately instead of waiting out the reaper. The functional
cancel path already survived restart (CancelOperation persists store.Cancel
even with no live CancelFunc); this restores the UI affordance that drives it.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 22:41:16 +02:00
Ettore Di Giacinto
4755d676a3 Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar)" (#10453)
Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top…"

This reverts commit 9d54a599b0.
2026-06-22 21:59:05 +02:00
dependabot[bot]
10184b5e28 chore(deps): bump actions/checkout from 6 to 7 (#10451)
Bumps [actions/checkout](https://github.com/actions/checkout) from 6 to 7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v6...v7)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-22 21:38:37 +02:00
LocalAI [bot]
fdf475ec5f feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446)
* feat(realtime): add pipeline.compaction config + resolution

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(realtime): extract itemID helper, reuse in item.retrieve

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(realtime): drop duplicate Ginkgo bootstrap, fold specs into openai suite

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement conversation.item.delete

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement input_audio_buffer.clear

Add a handler for the input_audio_buffer.clear client event that discards
a partially-captured utterance (raw PCM + buffered Opus frames) via a
unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement conversation.item.truncate (text)

Clears both .Text and .Transcript of the assistant content part at
contentIndex so barge-in truncation also works for audio turns whose
spoken words live in .Transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): add Conversation.Memory + pair-safe compactionCut

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, avoids panic)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(realtime): gofmt compaction test helper closures

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): inject rolling memory into the prompt + summary builders

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): server-side summarize-then-drop compactor

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(realtime): unit-test prefixMatches eviction-safety predicate

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): resolve summarizer model + schedule compaction per turn

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(realtime): document conversation compaction + new item events

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(realtime): resolve summary model inside compaction goroutine (lazy, off-path)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(realtime): reuse reasoning.ExtractReasoningComplete for summary stripping

Replace the bespoke <think> regex in the compactor with the shared
pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of
the realtime path and covering all reasoning tag families, not just <think>.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(config): register pipeline.compaction fields in meta registry

TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have
a UI/meta registry entry; add the four pipeline.compaction.* leaves so they
render with proper labels/descriptions instead of the reflection fallback.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 21:28:49 +02:00
LocalAI [bot]
9d54a599b0 feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar) (#10449)
* feat(ui): add shared DeploymentContext (features + p2p signal)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(ui): extract launchAssistantChat shared helper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): role/mode-aware landing redirect at /app

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): pin Cluster group and collapse Create for cluster admins

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): desktop top navbar with mode pill and admin-via-chat jump

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): admin token-usage meter in the top navbar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): top-navbar breakpoint handoff + assistant jump from chat page

M1: the desktop .top-navbar was hidden at max-width 768px while the
.mobile-header only appears at max-width 639px, leaving 640-768px with
neither bar so admins lost the mode pill, token meter and admin-via-chat
jump. Hide the top bar at 639px instead so it covers every width the rail
sidebar is shown and hands off to the mobile-header exactly at 639px.

M2: the navbar 'Admin via chat' button wrote localStorage and called
navigate('/app/chat'), but when already on the chat page Chat does not
remount so its mount-time payload reader never fired and the click was a
no-op until reload. The payload consume logic is factored into a shared
callback; the launcher now dispatches a localai-open-assistant event that
the mounted Chat listens for to re-consume the payload. Mount behavior is
unchanged.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 21:27:43 +02:00
Richard Palethorpe
63bcbf6c12 fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401)
* fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier

Follow-up to the NER tier engine (#10360), already on master. This carries
only the incremental review fixes and tests that postdate that merge — the
feature itself is not re-introduced.

Review fixes:
- openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText
  (the `elem < 0` guard above already returns).
- application.go: collapse ResolvePIIPolicy's inline re-implementation of
  PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the
  "explicit pii.enabled wins, else cloud-proxy default" rule) and return true
  past the !enabled guard where it is provable.
- pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in
  patternDetector.Detect into one local.
- grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat
  bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire —
  dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000]
  is rejected here with an actionable error; >1000 still fails closed via
  Parse. Specs pin the relationship so the guard can't silently revert.
- PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and
  force the DOM value back when clamping (min={0} only constrained the spinner,
  so a negative reached saved config and silently disabled the length filter).

Tests:
- piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead).
- model-config.spec.js: assert the min_len clamp, and that entity_actions
  collapses a duplicate group to a single row (map semantics; regression guard
  against emitting an array that drops a row on save).
- tests/e2e-backends: token_classify capability driving the TokenClassify gRPC
  RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned
  spans (entity.Text == text[start:end]) at threshold 0. Verified on CPU via
  `make test-extra-backend-privacy-filter` (3/3 specs).
- Makefile: test-extra-backend-privacy-filter wrapper.
- tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact
  (mask + block) through the full HTTP -> detector -> redactor path; gated on
  PII_NER_MODEL_GGUF so the default suite is unaffected.
- .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job
  running the container harness on CPU.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(gallery): add privacy-filter-nemotron (f16 + q8)

GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English
PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from
openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing
privacy-filter-multilingual entry, trading language breadth for category depth.

- privacy-filter-nemotron: F16 reference artifact (~2.8 GB).
- privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge
  use; description notes the size/speed tradeoff and to validate on your own
  data (a single dropped span is a PII leak).

Both run on the privacy-filter backend with known_usecases [token_classify] and
a default mask policy (min_score 0.5); operators add per-category entity_actions
as needed. sha256s taken from the HF repo's LFS object ids.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-22 18:26:19 +02:00
LocalAI [bot]
95b058e1c5 feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447)
* chore: gitignore SDD scratch directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(nodes): add GET /api/nodes/models cluster-wide loaded-models endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add nodesApi.allModels() for cluster-wide model roster

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): move Scheduling to its own page and nav item

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): replace nodes stat-card strip with cluster pulse + attention callout

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): node-panel roster with inline model chips and segmented filter

Replace the Nodes table with a full-width node-panel roster that shows
each backend node's running-model chips without an expand click, plus an
All/Backend/Agent segmented filter. Per-node detail (models, backends,
labels, capacity) moves to the node detail page.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add deep-linkable node detail page at /app/nodes/:id

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): remove em-dash from CapacityEditor comment; align detail spec backend mock

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(ui): nodes page cleanup, hover/chip polish, docs for restructured cluster view

Nodes.jsx dead-code sweep confirmed clean (no StatCard/table/expand
state/scheduling-form leftovers). Two App.css polish fixes: move the
node-panel hover border-color onto the bordered element so hover gives
real feedback, and add the missing .model-chip__state rule the
ModelChip component already emits. Update distributed-mode docs prose to
describe the restructured cluster view (cluster pulse, attention
callout, node-panel roster with inline model chips, All/Backend/Agent
filter, node detail page at /app/nodes/:id, Scheduling as its own page).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(ui): drop unused gpuVendorLabel export from nodeStatus

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 18:24:29 +02:00
LocalAI [bot]
f2abcc7503 chore(model gallery): 🤖 add 1 new models via gallery agent (#10445)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 16:09:16 +02:00
Adira
62c99c10b3 fix(diffusers): pin diffusers and transformers to a known-good pair (#9979) (#10442)
fix(diffusers): pin diffusers and transformers to a known-good pair

The diffusers backend tracked git+https://github.com/huggingface/diffusers
(main) with an unpinned transformers. transformers v5 restructured
CLIPTextModel and removed the .text_model attribute that diffusers' single
-file loader reads, so loading any single-file Stable Diffusion checkpoint
fails:

    create_diffusers_clip_model_from_ldm (single_file_utils.py)
    position_embedding_dim = model.text_model.embeddings.position_embedding...
    AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

No released diffusers (<=0.38.0) supports transformers v5 - only unreleased
diffusers main does. Because the requirements tracked main plus an unpinned
transformers, every backend image froze whichever pair existed at build
time, and images built once transformers v5 shipped but before diffusers
main caught up are permanently broken.

Pin the last known-good released pair across all requirements files:
diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every
pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no
functionality is lost, and builds become reproducible instead of drifting
into the broken window.

Fixes #9979

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-06-22 12:38:06 +02:00
LocalAI [bot]
7226bb9f30 chore: ⬆️ Update CrispStrobe/CrispASR to 7a8cb80907341c0204bd0488c1244764f4163883 (#10315)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 12:21:58 +02:00
LocalAI [bot]
569d9bbd9e fix(distributed): broadcast file-staging progress across replicas (#10440)
File-staging progress lived only in the SmartRouter's in-memory
StagingTracker on the replica performing the transfer. In a multi-replica
deployment behind a round-robin load balancer, a /api/operations poll
that lands on any other replica saw no staging row, so the progress
("processing file ... Total ... Current ...") flickered in and out as
polls rotated between frontends.

Mirror the pattern already used for gallery-install progress: the origin
replica broadcasts staging ticks over NATS (SubjectStagingProgress, a
new staging.<model>.progress subject), and peers merge them via
ApplyRemote (SubscribeBroadcasts on the wildcard). Byte-level ticks are
leading-edge debounced (~1/s); Start/FileComplete/Complete always
publish. A locally-owned op stays authoritative so the origin's own echo
and stray peer events can't clobber it, and mirrored remote ops expire
after a TTL so a missed Done event can't leave a phantom row. The UI read
path (StagingTracker.GetAll) is unchanged.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 09:28:07 +02:00
LocalAI [bot]
682fb2718c fix(distributed): detach cold-load staging from the request context (#10438)
A model not yet loaded on a worker is staged lazily on the inference
request path. Staging a multi-GB model takes minutes - far longer than
any client keeps its HTTP request open - so a browser refresh, an
ingress/LB idle-timeout, or a round-robined retry landing on another
frontend replica cancels the request context and aborts the upload with
"context canceled" mid-transfer. Large models then never finish staging,
so they never load (observed in a 2-replica deployment: both frontends
repeatedly failed to stage a 15.7 GB GGUF, each attempt dying at a
different offset).

Bind the cold load (staging + LoadModel + the per-model advisory lock) to
context.WithoutCancel(ctx): it keeps the request's values (prefix chain)
but drops cancellation/deadline. Each long step keeps its own bound (the
file stager's resume budget, LoadModel's 5m timeout), and the advisory
lock still de-dupes concurrent loaders across replicas.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 09:06:20 +02:00
LocalAI [bot]
20c643e1f6 chore(model gallery): 🤖 add 1 new models via gallery agent (#10439)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 08:46:34 +02:00
VJSai
64a4351f3a feat: send a LocalAI User-Agent on registry pulls (#10434)
LocalAI pulls models from OCI registries (via go-containerregistry), the
Ollama registry, and OCI blob stores (via oras), but every request went
out with the underlying library's generic User-Agent, so registry
operators had no way to attribute traffic to LocalAI.

Add an oci.UserAgent() helper that returns "LocalAI" (or
"LocalAI/<version>" when the binary is built with a version stamp via
internal.Version) and wire it into all three pull paths:

- pkg/oci/image.go: remote.WithUserAgent on the go-containerregistry
  image and digest requests
- pkg/oci/ollama.go: a User-Agent header on the Ollama manifest request
- pkg/oci/blob.go: a LocalAI User-Agent on the oras blob client. This
  mirrors oras' auth.DefaultClient (same retry.DefaultClient policy);
  only the advertised User-Agent changes.

Implements #6258.


Assisted-by: Claude:claude-opus-4-8 golangci-lint

Signed-off-by: Vijay Sai <vijaysaijnv@gmail.com>
2026-06-22 08:44:12 +02:00
LocalAI [bot]
b7d67f5779 chore: ⬆️ Update ggml-org/llama.cpp to 7c082bc417bbe53210a83df4ba5b49e18ce6193c (#10417)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 08:43:40 +02:00
LocalAI [bot]
600dafd20b feat(ced): sound-event classification backend (CED audio tagger) (#10425)
* feat(ced): sketch sound-classification backend (CED audio tagger)

Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry,
footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend.

SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist
in DESIGN.md):
- backend/backend.proto: new SoundDetection rpc + SoundClass messages
  (run `make protogen-go` to regenerate pkg/grpc/proto).
- backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h),
  goced.go (Ced gRPC backend: Load + SoundDetection), Makefile
  (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh,
  package.sh, .gitignore.
- DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability
  registration checklist), gallery/index + CI registration, and a scoping
  note for the realtime/websocket live-recognition path (sliding-window
  classify over the existing ws transport + voicegate; the ced C-API
  per-PCM entry point is already window-friendly).

Backend code does not compile until protogen-go regenerates the pb types
and a libced.so is built (Makefile clones+builds it).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): REST /v1/audio/classification endpoint + capability registration

Wires the ced sound-event classification backend (AudioSet audio tagger)
end to end through the REST surface, mirroring the transcription path.

- Handler: core/http/endpoints/openai/sound_classification.go parses the
  multipart audio upload, temp-files it, resolves the model config and
  calls the SoundDetection RPC; returns {model, detections[]} JSON.
- Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection)
  loads the model and normalizes the proto response into schema types.
- Schema: core/schema/sound_classification.go (SoundClassificationResult).
- gRPC layer: SoundDetection wired through the LocalAI wrapper (interface,
  Backend client, Client, embed, server, base default) so the loader-typed
  client exposes the RPC; proto regenerated via make protogen-go.
- Route: POST /v1/audio/classification (+ /audio/classification alias) with
  the audio/multipart default-model middleware in routes/openai.go.
- Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_
  CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap +
  GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase
  option; /api/instructions audio area updated; auth RouteFeatureRegistry +
  FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI
  usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter
  + i18n; docs page features/audio-classification.md + whats-new + crosslink.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): realtime sound-event detection over the websocket API

When a realtime pipeline configures a sound-classification model, each
VAD-committed utterance (the same window the transcription path produces)
is also run through the CED sound-event classifier and the scored AudioSet
tags are emitted as a new server event. No new backend rpc is needed: the
SoundDetection gRPC method already exists on this branch.

- config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty)
  beside Transcription/VAD.
- realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the
  ModelInterface; implement it on wrappedModel and transcriptOnlyModel by
  calling backend.ModelSoundDetection with the session's sound-classification
  model config (mirrors how Transcribe dispatches). Load the optional config
  in newModel / newTranscriptionOnlyModel; nil config keeps it additive.
- types: add ConversationItemSoundDetectionEvent (item_id, content_index,
  detections[]{label,score,index}) with type conversation.item.sound_detection,
  its ServerEventType constant and MarshalJSON, mirroring the transcription
  completed event.
- realtime: add emitSoundDetection (unary path: classify the committed window,
  build the event, t.SendEvent) and wire it at the utterance-commit hook right
  after emitTranscription; gated on session.SoundDetectionEnabled (resolved
  from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0).
  Its error is logged via xlog but never aborts the turn.
- test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections,
  classifier error) plus a SoundDetection method on the fakeModel double.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): implement SoundDetection in nodes backend test doubles

The SoundDetection method added to the grpc backend interface left two
test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so
core/services/nodes failed to compile under `go vet`/`go test` (go build
missed it: the doubles live in _test.go). Add the method to both,
mirroring their existing Detect mock. Repairs CI for the nodes package.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): decouple realtime sound detection from VAD (sound-only sessions)

Sound-event detection must activate on sounds, not speech, so it no longer
runs through the voice VAD/transcription path. A sound-detection-only
pipeline (sound_detection set, no transcription/LLM) now:

- is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline
  stage),
- builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS
  loaded), and
- defaults the session to turn_detection none (no VAD) with no transcription
  stage, so the client drives windowing via input_audio_buffer.commit
  (option A: client-side sliding window). The per-PCM C-API already supports
  arbitrary windows.

commitUtterance gains a sound-only branch: it emits the
conversation.item.sound_detection event (scored AudioSet tags) and stops -
no transcription, no LLM response. generateResponse is now guarded on a
transcription stage being present, so a sound-only turn never invokes the LLM.

Existing transcription/VAD sessions are unchanged (additive). Added a
commitUtterance sound-only Ginkgo spec asserting it emits the sound event and
neither transcribes nor generates a response. go vet + golangci-lint
(new-from-merge-base) clean; openai suite green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): register sound-classification backend in gallery + CI

Mechanical backend-image registration for the ced sound-event classifier,
mirroring the parakeet-cpp Go/purego backend everywhere it is wired up.

- .github/backend-matrix.yml: add the ced build matrix, field-for-field copies
  of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64,
  l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan
  amd64/arm64, rocm hipblas, and the metal darwin entry), changing only
  backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang.
- backend/index.yaml: add the &ced meta anchor (capabilities map per platform)
  plus ced-development and the per-arch image entries, each uri/mirror
  tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is
  intentionally deferred pending the HuggingFace publish (TODO note inline).
- scripts/changed-backends.js: add an explicit item.backend === "ced" branch in
  inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as
  the parakeet-cpp branch (before the generic golang fallthrough).
- .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in
  backend/go/ced/Makefile so the daily bot bumps the pin.
- swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so
  the existing /v1/audio/classification annotations land in the generated spec.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): server-side windowing for realtime sound detection (option B)

Adds an optional server-driven sliding-window classifier so a sound-only
realtime client only has to stream audio (no input_audio_buffer.commit):

- Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs.
  When both > 0 on a sound-only session, the server classifies the last
  window of streamed audio every hop and emits a conversation.item.sound_
  detection event; the input buffer is trimmed to one window so a long
  stream stays bounded. When unset, the session stays client-driven
  (option A). Runs independent of VAD (sound events are not speech).
- handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so
  it is unit-testable) + writeWindowWAV, which declares the true
  InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples
  correctly. Goroutine is started after toggleVAD and torn down with the
  session (close + wg.Wait).
- Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta
  registry; the earlier realtime commit added pipeline.sound_detection
  without a registry entry, failing TestAllFieldsHaveRegistryEntries. This
  fixes that and covers the two new knobs.

Tests: classifySoundWindow emits an event + trims the buffer to one window,
no-ops on too-little audio; writeWindowWAV declares the given sample rate.
go build/vet + golangci-lint (new-from-merge-base) clean; config + openai
suites green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0)

The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0,
converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced +
known_usecases: sound_classification) and two gallery/index.yaml entries
(ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and
removes the now-resolved TODO from backend/index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add tiny/mini/small GGUF model gallery entries

Publishes the rest of the CED family (same architecture, metadata-driven port
verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds
their f16 + q8_0 gallery entries:

  ced-tiny  (5.5M, edge/Pi-class)  f16 11MB / q8_0 6MB
  ced-mini  (9.6M)                 f16 19MB / q8_0 11MB
  ced-small (22M)                  f16 42MB / q8_0 23MB

All sha256-pinned. ced-base remains the accuracy default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo

All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single
HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8
gallery model entries' urls + file uris accordingly. sha256 and filenames are
unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): bump CED_VERSION to the short-clip fix

Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip
shorter than target_length (~10.11s): time_pos_embed was added at its full
63-frame grid instead of being sliced to the clip's actual time grid, tripping
ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s
windows) and gated with a short-clip parity test upstream.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive

- README.md: add ced.cpp to the "native C/C++/GGML engines developed and
  maintained by the LocalAI project" table.
- docs/content/features/backends.md: add a Sound Classification backend
  category (sound-event classification / audio tagging) listing ced.cpp.
- .agents/adding-backends.md: add a "Documenting the backend" section and two
  verification-checklist items requiring new backends to be documented in the
  backends.md category list, and in-house native engines to be added to the
  README maintained-engines table. This directive was missing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): repin CED_VERSION to the v0.1.0 release commit

ced.cpp history was squashed into a single release commit (tagged v0.1.0), so
the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the
v0.1.0 release commit, so the backend builds against a commit that exists.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths

- sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of
  the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler.
- goced.go: reading a NUL-terminated C string from a libced-owned buffer.
  #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since
  the uintptr is a C-owned malloc'd buffer, not Go-GC memory.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 01:00:28 +02:00
LocalAI [bot]
ce8a3e9266 chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 4536dcdce27c3764a93a06d6bf64026b124962f5 (#10431)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 01:00:10 +02:00
LocalAI [bot]
a88d9d2de3 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 6c00e87ac84404af588ad2e65935bd6f079c696f (#10430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:49 +02:00
LocalAI [bot]
1cf1bf32e1 chore: ⬆️ Update leejet/stable-diffusion.cpp to b12098f5d09fc83da36e65c784f7bdb16a5a5ebf (#10429)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:33 +02:00
LocalAI [bot]
f45c6acc54 chore(model gallery): 🤖 add 1 new models via gallery agent (#10437)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:08 +02:00
LocalAI [bot]
1a1bd57469 chore(model gallery): 🤖 add 1 new models via gallery agent (#10436)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:46:56 +02:00
LocalAI [bot]
1f29e96030 chore(model gallery): 🤖 add 1 new models via gallery agent (#10433)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-21 23:51:43 +02:00
LocalAI [bot]
64560a974b chore(model gallery): 🤖 add 1 new models via gallery agent (#10432)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-21 23:31:17 +02:00
LocalAI [bot]
32c47706ae feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424)
* feat(realtime): add voice_recognition enforce + identity config

Add Enforce *bool and Identity *VoiceIdentityConfig to
PipelineVoiceRecognition, plus EnforceGate/IdentityEnabled/
AnnounceEnabled/PersonalizeEnabled helpers. Enforce nil defaults to
gating (backward compatible); identity surfacing is independent of the
gate.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): add Speaker type and conversation.item.speaker event

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(realtime): split voiceGate into Resolve + authorize

Split the speaker authorization into a Resolve step (embed once, produce a
types.Speaker identity) and a pure authorize policy step, with a 0..100
confidence score mirroring /v1/voice/identify. The legacy Authorize wrapper is
kept so existing specs stay green.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): resolve speaker per turn and emit conversation.item.speaker

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): personalize LLM turns with recognized speaker

Set the per-message name field on each recognized user turn and append a
current-speaker note to the system message, both gated by the voice
recognition identity config.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(realtime): document speaker identity surfacing and personalization

Document the new voice_recognition keys (enforce, identity.*) and the
LocalAI-extension conversation.item.speaker server event in the realtime
feature docs.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(realtime): cover when:first+identity re-resolution and multi-speaker history

Add two integration specs to harden the speaker-aware realtime path:

- when:first with an Identity block re-resolves the speaker every turn even
  though re-authorization is skipped after the first match: a later resolve
  error now fails closed, while a clean later resolve still surfaces and names
  the speaker.
- multi-speaker history attribution: each user turn carries its own per-message
  name and the injected system note reflects the latest speaker.

Test-only change; no production behavior was modified.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): surface speaker labels in conversation.item.speaker

Carry the registered speaker's labels (identify mode) on types.Speaker so
they flow into the conversation.item.speaker event and the stored item.
Verify mode has no labels, so the field is omitted there.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(e2e): cover conversation.item.speaker over a real websocket

Add a realtime-pipeline-identity config (verify mode, enforce:false, identity
announce+announce_unknown+personalize) and two e2e specs driving the real
server over a real WebSocket with the mock VoiceEmbed backend: an authorized
speaker yields a conversation.item.speaker event naming e2e-speaker (matched
true) and reaches response.done; an unauthorized speaker yields an unknown
(matched false, no name) event and still responds, proving enforce:false
never drops a turn.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(config): register voice_recognition enforce + identity fields

The meta registry coverage test (TestAllFieldsHaveRegistryEntries) requires
every config field to have an entry in core/config/meta/registry.go. The new
voice_recognition.enforce and voice_recognition.identity.* fields were missing,
failing tests-linux and tests-apple. Add registry entries (toggles) so the
fields are surfaced in the model-config editor and the coverage test passes.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-21 21:07:10 +02:00
Tai An
e58870a573 feat(react-ui/chat): paste images from clipboard into chat input (#10428)
The chat input only accepted attachments via the file picker, so users
who copied an image from a webpage or a screen region had to first save
it to a file before attaching it (#10361).

Add an onPaste handler on the input textarea that pulls image items out
of the clipboard and routes them through the same staging path as the
file picker. The per-file processing in handleFileChange is extracted
into a shared processFiles helper so both entry points stay in sync.
Clipboard images, which arrive unnamed or as a generic "image.png", are
given unique typed names so multiple pastes don't collide, and the
default paste is suppressed only when an image is actually attached so
normal text paste is unaffected.

Closes #10361

Signed-off-by: Anai-Guo <antai12232931@outlook.com>
2026-06-21 18:20:56 +02:00
LocalAI [bot]
8fab1d2e45 fix(ci): namespace-import js-yaml in changed-backends.js (Bun ESM: missing default export) (#10427)
fix(ci): use namespace import for js-yaml in changed-backends.js

js-yaml's ESM build exposes only named exports (load, dump, ...) and no
default export. Bun's strict ESM interop rejects the default import with
'Missing default export in module js-yaml.mjs', failing the detect-changes
and generate-matrix CI jobs. Import the namespace instead; yaml.load (the
only usage) resolves to the named export, so behavior is unchanged.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-21 17:52:02 +02:00
LocalAI [bot]
7b462a0d51 fix(backend): call vram.EstimateModelMultiContext (master build broken: undefined vram.EstimateModel) (#10426)
fix(backend): call vram.EstimateModelMultiContext for model size estimate

core/backend/options.go called vram.EstimateModel, which does not exist in
the vram package (it exposes EstimateModelMultiContext). This broke the build
on master (undefined: vram.EstimateModel). Use EstimateModelMultiContext with
a nil context-size slice (defaults to a single 8192 estimate); the returned
MultiContextEstimate.SizeBytes is exactly what the caller consumes, so size
estimation behavior is unchanged.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-21 17:51:46 +02:00
LocalAI [bot]
aed181e6c1 chore(model gallery): 🤖 add 1 new models via gallery agent (#10423)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-21 17:40:55 +02:00
OrbisAI Security
a556cd9afc fix: the trl backend's _do_training method directly ... in backend.py (#10422)
* fix: V-001 security vulnerability

Automated security fix generated by OrbisAI Security

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>

* fix: the trl backend's _do_training method directly ... in backend.py

The TRL backend's _do_training method directly uses request

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>

---------

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>
2026-06-21 17:40:29 +02:00
Leoy
b50b1fe418 feat(watchdog): add size-aware LRU eviction mode (#9527)
* feat(watchdog): add size-aware LRU eviction mode

When the model count hits the LRU limit or the memory reclaimer fires,
evict the largest model by on-disk file size first rather than the
least-recently-used one.  For GGUF models the file size is a reliable
proxy for GPU/RAM footprint, so evicting the largest candidate maximises
freed memory per eviction round while keeping small utility models
(embeddings, classifiers, rerankers) resident.

Changes:
- `pkg/model/watchdog.go`: add `sizeAwareEviction` flag and
  `modelSizes map[string]int64` to `WatchDog`; sort candidates by
  `sizeBytes` desc (LRU time as tiebreaker) when the flag is set;
  add `RegisterModelSize`, `SetSizeAwareEviction`, `GetSizeAwareEviction`
- `pkg/model/watchdog_options.go`: add `WithSizeAwareEviction` option
- `pkg/model/initializers.go`: stat model file after load and call
  `RegisterModelSize` so size data is available before the first eviction
- `core/config/application_config.go`, `runtime_settings.go`: add
  `SizeAwareEviction` field and `WithSizeAwareEviction` app option;
  expose via `ToRuntimeSettings` / `ApplyRuntimeSettings` for the
  `POST /api/settings` live-reload path
- `core/cli/run.go`: add `--size-aware-eviction` flag /
  `LOCALAI_SIZE_AWARE_EVICTION` env var
- `core/application/startup.go`, `watchdog.go`: wire the new option
  through to `NewWatchDog`
- `pkg/model/watchdog_test.go`: 5 new specs — option enable, dynamic
  toggle, largest-first ordering, equal-size LRU tiebreaker, no-size
  fallback to LRU, and size-map cleanup on eviction

Closes #9375

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* refactor(watchdog): use vram estimation scaffolding for model size

Replace the brittle os.Stat(modelFile) approach with a proper call to
pkg/vram, which handles multi-file models (DownloadFiles, MMProj) and
all weight file types, not just single GGUF files.

- Add estimateModelSizeBytes() in core/backend/options.go that collects
  all weight file URIs from the model config, resolves them to file://
  URIs, and calls vram.Estimate() with the shared DefaultCachedSizeResolver
  (15-min TTL cache avoids redundant stat calls on repeated loads)
- Thread the result through via a new WithModelSizeBytes() loader option
- In initializers.go, consume the pre-computed size instead of calling
  os.Stat; if no size was supplied (e.g. for external/router-dispatched
  models) the registration is simply skipped

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* refactor(watchdog): use EstimateModel with HF fallback for size estimation

Switch estimateModelSizeBytes from calling vram.Estimate directly to the
unified vram.EstimateModel entry point, which adds automatic fallbacks:
file-based GGUF metadata → HF API → size string.

Also extract the HuggingFace repo ID from model URIs (huggingface://,
hf://, https://huggingface.co/ and org/model short-form) and pass it
as ModelEstimateInput.HFRepo, so models not yet downloaded locally can
still get a size estimate via the HF API.

Addresses @mudler's review feedback: "better to rely on EstimateModel
and pass by the HF URL of the model extracted from the URI".

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* feat(webui): add Size-Aware Eviction toggle to settings page

The size-aware eviction setting was wired through the CLI flag and the
RuntimeSettings live-reload path (POST /api/settings) but had no handle
on the React settings page, so it could not be toggled from the UI.

Add a Size-Aware Eviction toggle to the Watchdog section, next to the
existing Force Eviction When Busy / LRU eviction handles. The settings
page loads and saves the whole RuntimeSettings object, so the new
size_aware_eviction key is picked up with no extra plumbing.

Addresses @mudler's review feedback: the application config setting
should land on the same UI settings page as the other handles.

Signed-off-by: supermario_leo <leo.stack@outlook.com>

---------

Signed-off-by: supermario_leo <leo.stack@outlook.com>
2026-06-21 17:17:04 +02:00
pos-ei-don
b4c0dc67fe feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346) (#10351)
* fix(vllm): don't stream raw tool-call markup as content when a tool parser is active

When a tool_parser is configured and the request carries tools, the streaming
loop emitted every text delta as delta.content — including the model's raw
tool-call markup (e.g. <tool_call>...) — because extract_tool_calls only runs
on the full output after the stream. Clients streaming a tool call therefore
saw the unparsed tool-call syntax as assistant content.

Buffer the text while a tool parser is active for the request; the existing
end-of-stream chat_delta already carries the parsed tool_calls (or the cleaned
content), which the Go side converts to SSE deltas. Non-tool-parser streaming
is unchanged.

Add a server-less regression test covering both the tool-call case (no raw
markup leaked as content) and the plain-text case (content delivered exactly
once — guards against double-emitting the buffered content).

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* test(vllm): add expectedFailure test for progressive streaming with tool parser (Case 3, #582)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* test(vllm): add Cases 4+5 — marker split across chunks + false-positive prefix (TDD, Option B state machine, #582)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* feat(vllm): progressive streaming via parser.extract_tool_calls_streaming

When a tool parser is active for a tool-enabled streaming request,
#10346 buffers the entire generation and surfaces it on the final
chunk to prevent raw tool-call markup from leaking as delta.content.
This is correct but turns the request into effectively non-streaming
for plain-text responses — the client sees nothing until the model
stops.

Every concrete tool parser shipped with vLLM 0.23+ already implements
extract_tool_calls_streaming (Granite4, Qwen3Coder, DeepSeekV31, Jamba,
Ernie45, Hermes2Pro, llama3_json, mistral, …). Use it: instantiate
the parser before the streaming loop and call its streaming method per
delta, emitting DeltaMessage(content=…) or DeltaMessage(tool_calls=[…])
when the parser is ready.

Falls back to the existing #10346 buffer path when:
  - the parser does not have extract_tool_calls_streaming, OR
  - extract_tool_calls_streaming raises mid-stream (logged, the
    rest of the request finishes via post-loop extract_tool_calls).

Tests (TestStreamingToolParser):
  1. Buffer path: no markup leaked, no content duplication
  2. Native streaming: plain-text response streams progressively
  3. Native streaming: tool_call structured, no markup leaked
  4. Native streaming exception → graceful fallback, no markup, no crash
  5. No tool parser → unchanged per-delta content stream

E2E verified against qwen3_coder on vLLM 0.23.0 (NVIDIA GB10 / arm64 / CUDA 13).

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* docs(vllm): add server-side TTFT benchmark for the streaming tool-parser path

Self-contained stdlib-only script that measures time-to-first-token (TTFT)
for the vLLM backend's two streaming scenarios:

  - tool_call:  request mentions a tool; model is expected to call it
  - plain_text: request offers a tool but explicitly asks for prose

Use this to compare:
  - the buffer-all path (#10346)         → plain_text TTFT ≈ total response time
  - the native-streaming path (this PR)  → plain_text TTFT ≈ true first-token time

  python examples/vllm-bench/ttft_streaming_tool_parser.py \\
      --url http://localhost:8080 --model my-coder --runs 3

Lives under examples/ so it does not interfere with the test suite.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* examples/vllm-bench: add long-text scenario (8 paragraphs, 1500 tokens)

The long-text scenario shows the buffering vs streaming difference most
dramatically: with the buffer-all path, the client receives nothing for
20+ seconds and then the entire 1500-token response at once. With native
streaming, the first token arrives in tens of milliseconds and the
response flows progressively.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

---------

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
Co-authored-by: Philipp Wacker <philipp.wacker@ibf-solutions.com>
2026-06-21 17:07:15 +02:00
番茄摔成番茄酱
01fa12e0de feat(nemo): enable word-level timestamps for ASR models (#10297)
* feat(nemo): enable word-level timestamps for ASR models

The nemo backend ignored timestamp_granularities and always returned a
single segment with start=0 end=0, making word-level timestamps
impossible to obtain even though the NeMo models (parakeet-tdt, etc.)
fully support them.

Changes:
- Add _get_stride_seconds() to compute frame duration from the model's
  preprocessor window_stride and encoder subsampling_factor.
- Add _build_segments_with_words() that extracts word offsets from the
  NeMo Hypothesis.timestamp dict and converts frame indices to
  nanosecond timestamps.
- Support 'word' granularity (one segment per word) and 'segment'
  granularity (merge at time-gap boundaries using a dynamic threshold).
- Populate TranscriptSegment.words with TranscriptWord entries so
  callers get both segment-level and word-level timing.
- Only request timestamps from NeMo when the caller actually asks for
  them (timestamp_granularities is non-empty), keeping the fast path
  unchanged for callers that don't need timestamps.

Tested with nvidia/parakeet-tdt-0.6b-v3 on the JFK "ask not" clip:
  curl -X POST /v1/audio/transcriptions \
    -F file=@jfk.wav -F model=nemo-parakeet-tdt-0.6b \
    -F 'timestamp_granularities[]=word' -F response_format=verbose_json
  → each word has correct start/end times in seconds.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

* fix(nemo): address Copilot review feedback

- Narrow exception handling in _get_stride_seconds to catch only
  AttributeError, KeyError, TypeError instead of bare Exception, and
  emit a warning when falling back to the hardcoded stride.
- Remove explicit return_hypotheses=False when timestamps are requested;
  timestamps=True already forces NeMo to return Hypothesis objects.
- Add a warning when NeMo does not return Hypothesis objects despite
  timestamps being requested.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

---------

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-21 17:04:19 +02:00
番茄摔成番茄酱
cf7f9573a2 fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421)
The parakeet-specific word accessors can return stale initialisation
data (model name, binary blobs) for segments with no real speech.
Add isValidWord() to filter out words that have:
- empty or whitespace-only text
- U+FFFD replacement characters (from binary data scrubbing)
- negative timestamps
- zero duration (end <= start)

Also skip empty segments entirely when they have no recognisable
content (empty text AND no valid words), preventing spurious subtitle
entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'.

Applies to both AudioTranscription and AudioTranscriptionStream.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-21 17:03:33 +02:00
pos-ei-don
c6303104c7 fix(vllm): structured outputs silently ignored on vLLM >= 0.23 (GuidedDecodingParams removed) (#10343)
fix(vllm): structured outputs silently ignored on vLLM >= 0.23

vLLM >= 0.23 removed GuidedDecodingParams (now StructuredOutputsParams) and
renamed the SamplingParams field guided_decoding -> structured_outputs. The
import failed, HAS_GUIDED_DECODING became False, and the whole guided-decoding
block was skipped, so response_format / grammar constraints were silently
ignored. Adapt the existing request.Grammar path to the new class/field.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
2026-06-21 17:02:31 +02:00
LocalAI [bot]
3e96d811b7 fix(ui): keep row action menu anchored and stop scroll snap on /app/manage (#10419)
Opening a model row's kebab (ActionMenu) on the Manage dashboard snapped the
page scroll to the top and rendered the menu detached from its trigger, making
it impossible to operate.

Two compounding causes:

- The menu auto-focus called el.focus() without preventScroll, so the browser
  scrolled the focused element into view, yanking the page to the top.
- The position:fixed Popover was rendered inline inside the table row. The
  editorial UI overhaul added hover transforms to rows/cards, and a transformed
  ancestor re-anchors position:fixed to itself instead of the viewport, so the
  menu (positioned from the trigger's viewport rect) landed in the wrong place.

Fix: portal the Popover to document.body so position:fixed always resolves
against the viewport, position it before paint with useLayoutEffect (no {0,0}
flash), and pass preventScroll:true to both focus calls.

Adds an e2e regression test that reproduces the symptom (scroll jumped from 564
to 0 on the old code) and asserts the menu tracks its trigger.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 23:25:29 +02:00
LocalAI [bot]
23f225260c refactor(config): single source of truth for default values (#10418)
refactor(config): single source of truth for default values across config + backend

Defaults were decided in two areas with duplicated/drifted literals: the config
SetDefaults tiers vs core/backend/options.go's grpcModelOpts (which translates a
ModelConfig to the backend wire format and supplied its own fallbacks). They had
drifted - n_gpu_layers 9999999 (options.go) vs 99999999 (gguf.go), two 512 batch
constants, context 1024 (gguf) vs 4096 (backend) scattered as bare literals.

Introduce core/config/defaults.go as the canonical home (DefaultContextSize=4096,
GGUFFallbackContextSize=1024, DefaultNGPULayers=99999999, DefaultFlashAttention=
auto). gguf.go / hooks_llamacpp.go use them directly; core/backend references them
(backend imports config, never the reverse) so DefaultContextSize/DefaultBatchSize
and the flash-attn / n_gpu_layers fallbacks resolve to one place. The two context
values (1024 GGUF-no-estimate vs 4096 general) are kept distinct but now named +
documented, not blind literals. Behavior-preserving; config + backend suites green.

Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 22:58:36 +02:00
LocalAI [bot]
aef10723c9 feat(config): prefix caching default + consolidate scattered defaults (#10415)
* feat(config): enable cross-request prefix caching for serving (Phase 2)

The llama.cpp backend ships n_cache_reuse=0 (cross-request KV prefix reuse via
shifting disabled). Enable it by default (256) so repeated prefixes - system
prompts, RAG context, agent scaffolds, multi-turn chat - aren't recomputed. This
is the universally-useful part of 'paged attention' (shared-prefix reuse, which
the upstream maintainers themselves identify as where paged attn actually helps)
and needs none of the block-KV machinery.

Lives in a serving_defaults.go sibling to hardware_defaults.go (device-driven vs
serving-policy defaults); both run from SetDefaults and only fill unset values.
Explicit cache_reuse/n_cache_reuse always wins. Device-independent, so it
propagates to distributed nodes via the model options with no router change.
Shares the backendOptionSet helper with the Phase-1 parallel default.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(config): extract generic fallback defaults into ApplyGenericDefaults

Behavior-preserving: move the inline sampling-param + runtime-flag fallbacks out
of SetDefaults into ApplyGenericDefaults, completing the domain-grouped tiers
(ApplyInferenceDefaults=family, ApplyHardwareDefaults=device, ApplyServingDefaults
=serving, ApplyGenericDefaults=generic fallbacks). SetDefaults is now a clean
orchestrator. Same order (runs after the family/hardware/serving tiers so those
win) and same conditions (TopK gated on UsesLlamaSamplerDefaults, MMap on XPU).
No behavior change; full config suite green. (NGPULayers stays in the GGUF-read
path for now - it's device-driven but coupled to model-size detection; a separate
follow-up.)

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 22:44:44 +02:00
LocalAI [bot]
9565db5f94 feat(models): model aliases - redirect a model name to another configured model (#10414)
* feat(config): add model alias field and self-validation

Add ModelConfig.Alias (yaml: alias), IsAlias(), and an alias
short-circuit at the top of Validate() that rejects self-reference and
forbids setting backend/parameters.model on a pure-redirect alias.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(config): resolve and validate model alias targets in the loader

Assisted-by: Claude:opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(middleware): resolve model aliases and stamp requested/served identity

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(modeladmin): reject alias configs with invalid targets on create/edit

Validate alias targets at create/swap entry points (ImportModelEndpoint,
EditYAML, PatchConfig) so a dangling, chained, or disabled alias target is
rejected at save time rather than surfacing as a runtime error.

Assisted-by: Claude:opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(api): add GET /api/aliases to list model aliases

Adds an admin-gated read-only endpoint that lists every model alias
config as {name, target} pairs, backed by the loader's existing
GetAllModelsConfigs().

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(mcp): add set_alias and list_aliases tools

Expose model-alias management over the LocalAI Assistant MCP surface:
list_aliases (read-only, GET /api/aliases) and set_alias (mutating).
SetAlias is swap-first: PATCH /api/models/config-json/:name swaps an
existing alias's target (validated, non-destructive) and a 404 falls
back to POST /models/import to create a fresh {name, alias} config. The
inproc client mirrors this via ConfigService.PatchConfig + a create path
modeled on ImportModelEndpoint. Deletion reuses delete_model.

Assisted-by: Claude:claude-opus-4 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(mcp): replace em dashes in alias tool comments

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(config-meta): expose alias as a model-select field

Add an 'alias' section to DefaultSections() and an 'alias' field override
in DefaultRegistry() so the schema-driven React editor renders the new
top-level ModelConfig.Alias field as a model picker in its own section.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add alias template card and Manage alias badge

Add an 'Alias / Routing' template to the create-flow gallery that seeds a
minimal name + alias config, and a read-only 'alias -> target' badge on the
Manage Models tab. The capabilities row payload does not carry the alias
field, so the badge resolves targets from GET /api/aliases looked up by name.

Assisted-by: Claude:claude-opus-4 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs: document model aliases

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(swagger): regenerate for GET /api/aliases

Adds the /api/aliases path and AliasInfo schema generated from the
ListAliasesEndpoint annotation.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(localai): check os.RemoveAll error in aliases_test

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: correct alias conversion docs and advertise /api/aliases in instructions

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(mcp): write alias config 0600 to satisfy gosec G306

The inproc createAlias path wrote the alias YAML with 0644, which gosec
flags as a new G306 finding on the PR. The LocalAI process is the sole
reader/writer of model configs, so 0600 is correct and keeps the scan clean.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 22:38:42 +02:00
LocalAI [bot]
e19c43cf04 feat(gallery): add Depth Anything V2 models + bump native version (#10413)
* feat(gallery): add Depth Anything V2 models + bump native version

Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is
depth-only (no camera pose, no confidence) and ships both relative
(relative inverse depth) and metric (depth in metres) variants. The Go
backend is model-agnostic, so no backend code changes are required — only
a native version bump and new gallery entries.

- backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the
  depth-anything.cpp commit that adds the DA2 engine + C-API routing
  (e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag
  da2-support so it survives a squash-merge).
- gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus
  Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models
  carry the metric-depth tag; none carry camera-pose.

Assisted-by: Claude:claude-opus-4-8

* chore(depth-anything-cpp): pin to merged DA2 master commit

PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint
the pin from the pre-merge commit to the canonical master commit.

Assisted-by: Claude:claude-opus-4-8

---------

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 14:56:16 +02:00
LocalAI [bot]
b081247d95 feat(config): hardware-tuned defaults — Blackwell batch + VRAM-scaled concurrency (#10411)
* feat(config): node-aware hardware defaults — larger physical batch on Blackwell

A larger physical batch (n_batch/n_ubatch) materially lifts MoE prefill on
NVIDIA Blackwell consumer GPUs (sm_120/121, incl. GB10 / DGX Spark) — measured
on a GB10 with Qwen3-Coder-30B-A3B, the prefill ceiling rises (ub512 ~2994 ->
ub2048 ~3316 t/s) and saturates around 2048.

The heuristic lives in core/config alongside the other config overriders
(ApplyInferenceDefaults, guessDefaultsFromFile/NGPULayers) — they all fill the
ModelConfig from heuristics, so hardware tuning is the same domain and stays in
one place. It is parameterized on a GPU descriptor (not direct detection) so it
works in both deployment shapes:

- Single host: SetDefaults applies it with the LocalGPU.
- Distributed: only the worker sees the GPU, so the worker reports its compute
  capability on registration (gpu_compute_capability -> BackendNode), and the
  router re-applies the SAME core/config heuristic for the SELECTED node before
  loading — fixing the case where the frontend has no GPU at all.

Explicit `batch:` always wins (only managed default values are touched).
xsysinfo gains NVIDIAComputeCapability() (detection only); all interpretation
lives in core/config. Tests: core/config, pkg/xsysinfo, core/services/nodes.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(config): injectable local-GPU seam + single-instance coverage

Make local GPU detection an injectable package var (localGPU) so the
single-instance path (SetDefaults -> ApplyHardwareDefaults) is deterministically
testable without a real GPU, mirroring the distributed override's coverage.
Adds specs asserting SetDefaults sets the Blackwell physical batch, leaves it
unset on non-Blackwell, and never overrides an explicit batch.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(config): default concurrent serving (n_parallel) by GPU VRAM

The llama.cpp backend defaults n_parallel=1, which serializes multi-user requests
and leaves continuous batching off (it auto-enables only at n_parallel>1). Fold a
VRAM-scaled parallel-slot default into the hardware-config path so multi-user
serving works out of the box: >=32GiB->8, >=8GiB->4, >=4GiB->2, else unchanged.
With the backend's unified KV the slots SHARE the context budget, so this adds
concurrency without multiplying KV memory. Explicit parallel/n_parallel always
wins. EnsureParallelOption is shared by the single-host path (ApplyHardwareDefaults
with the local GPU) and the distributed router (per selected node's reported VRAM,
since the frontend may have no GPU). LocalGPU now also reports VRAM.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 14:45:59 +02:00
LocalAI [bot]
1be959ce30 docs: mention apex-quant in the README (#10412)
Add apex-quant (MoE per-tensor/per-layer quantization recipe) to the
"Backends built by us" section as a note after the engines table, since
it is a quantization recipe rather than a native inference engine.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 11:04:56 +02:00
LocalAI [bot]
518381278e chore: ⬆️ Update ggml-org/llama.cpp to e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62 (#10392)
* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): adapt grpc-server to upstream server-schema split

Upstream llama.cpp (e475fa2) extracted the JSON request-schema evaluation
out of the static server_task::params_from_json_cmpl into the new
server_schema::eval_llama_cmpl_schema (tools/server/server-schema.cpp).
The grpc-server unity build still called the old static member, breaking
every llama-cpp backend build with "no member named 'params_from_json_cmpl'
in 'server_task'".

Pull server-schema.cpp into the translation unit and call the new function,
keeping both guarded by __has_include so forks that predate the split (e.g.
llama-cpp-turboquant, which still exposes params_from_json_cmpl) keep
compiling against the old static member.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 08:22:22 +02:00
LocalAI [bot]
93706fec57 chore: ⬆️ Update mudler/parakeet.cpp to db755a78d39f789bb7d4e3935158a9e8105dbe36 (#10393)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:33 +02:00
LocalAI [bot]
11aee03a80 chore: ⬆️ Update localai-org/privacy-filter.cpp to 98f52c5ef2250f207cc6b9a6aef05393a120cb7c (#10394)
⬆️ Update localai-org/privacy-filter.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:21 +02:00
LocalAI [bot]
8915f2ab91 chore: ⬆️ Update ggml-org/whisper.cpp to 5ed76e9a079962f1c85cfce44edd325c27ef1f97 (#10396)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:06 +02:00
LocalAI [bot]
f143d7f688 chore: ⬆️ Update ikawrakow/ik_llama.cpp to d47f484d299cafad2e606afc0d31677a91b242d0 (#10410)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:51 +02:00
LocalAI [bot]
dd928f0bdd chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 26fcea5468e4069bc72d1f2fcc812c985e7361bb (#10409)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:36 +02:00
LocalAI [bot]
c43a752afc chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd (#10408)
⬆️ Update ServeurpersoCom/omnivoice.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:22 +02:00
LocalAI [bot]
079ac0e15a fix(realtime): raise WebRTC data-channel max-message-size + keep sendLoop alive (#10407)
* fix(realtime): raise WebRTC data-channel max-message-size for large events

Browsers advertise a conservative SCTP max-message-size in their SDP offer
(Chrome uses 256 KiB). pion enforces the remote's advertised value on send, so
a single realtime event larger than it cannot be sent over the "oai-events"
data channel: SendText fails, the event is dropped, and the turn silently
yields no response. Some turns legitimately produce a >256 KiB JSON event —
notably tool calls with sizeable schemas or results.

Browsers advertise the value conservatively but their SCTP stacks reassemble
much larger messages, so raise the max-message-size honored for our own
server-generated events by rewriting the attribute in the offer before
SetRemoteDescription.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(realtime): keep the WebRTC sendLoop alive when one event send fails

A failed SendText on the oai-events data channel exited the sender goroutine,
so a single dropped event (e.g. one over the negotiated SCTP max-message-size)
tore down the session and silently dropped every subsequent event. Log and skip
the offending event instead and keep draining; a genuinely dead transport is
still handled by the closed / connection-state path.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-19 21:36:25 +02:00
LocalAI [bot]
2e734bf560 fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping (#10406)
* fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping

Large model installs would hang forever or never finish. Three defects in
the HTTP download path, all hit by big GGUF pulls over a slow or flaky link:

1. No stall timeout. The shared download client sets no body deadline
   (correct for streaming) but also no read-idle timeout, and the
   transport's IdleConnTimeout does not cover an in-flight body read. A
   silently-dropped TCP connection (no FIN/RST) blocked the body Read
   forever, freezing an install at N bytes until an external reaper killed
   it. Add an idle-timeout reader that closes the body after a window of
   zero progress (DownloadStallTimeout, default 60s), turning an indefinite
   hang into a fast, retryable error. A read that returns data resets the
   clock, so a slow-but-steady transfer is unaffected.

2. Cancellation deleted the partial. On context.Canceled the code removed
   the .partial file, so any frontend restart (deploy, OOM) mid-download
   wiped all progress and the retry restarted from zero. At slow egress,
   files larger than the restart interval never completed. Keep the
   .partial on cancel so the next attempt resumes via Range.

3. Partials leaked. Cleanup only ran on the context-cancel path, never on a
   stall or a SIGKILL/OOM, so abandoned .partial files accumulated and could
   fill the models volume. Add CleanupStalePartialFiles and reap partials
   older than 24h on startup.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(downloader): discard the .partial on a deliberate user cancel

Review follow-up. The previous commit kept the .partial on every cancellation
so restarts could resume, but that also left a dangling partial when a user
*intentionally* cancelled an install — the file lingered until the 24h reaper.

Distinguish the two: cancel the gallery operation's context with a cause
(downloader.ErrUserCancelled) so the download layer can tell a deliberate
abort (discard the partial) from an incidental one such as a shutdown/restart
(keep it for resume). Detect cancellation via the context rather than the
returned error, because an HTTP request cancelled with a cause surfaces the
cause error, not context.Canceled.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(downloader): resolve gosec G122 in CleanupStalePartialFiles

CI's code-scanning (gosec) flagged G122 (symlink TOCTOU) for the os.Remove
call inside the filepath.WalkDir callback. Collect the stale paths during the
walk and delete them afterwards instead of mutating the tree from inside the
callback. Behavior is unchanged; the existing specs still pass.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-19 21:35:21 +02:00
番茄摔成番茄酱
72d46c1115 feat(crispasr): add word-level timestamp support (#10403)
* feat(crispasr): add word-level timestamp support

Add word-level timestamp extraction to the crispasr backend by calling
the CrispASR C library's word accessor functions that are already
exported by libgocraspasr but were not previously bound by the Go
wrapper.

Two families of word functions are supported:

1. Session-based (get_word_count/text/t0/t1) — works per-segment for
   whisper-like backends.
2. Parakeet-specific (get_parakeet_word_count/text/t0/t1) — returns a
   global word list for TDT/CTC/RNNT parakeet models where the session
   API does not expose per-segment word data.

The Go code tries session-based first and falls back to parakeet-specific
when the session word count is zero.

Depends on #10402 (grpc server Words forwarding) for the words to reach
the HTTP response.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

* fix(crispasr): use portable sed -i.bak for macOS compatibility

BSD sed requires -i '' for in-place editing while GNU sed uses -i.
Replace with -i.bak which works on both platforms, then remove the
backup file.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

---------

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-19 21:34:30 +02:00
Richard Palethorpe
606128e4e9 feat(vulkan): make Vulkan backends self-contained on the GPU (#10404)
Vulkan backends bundled their own loader and ICD manifests but neither the
Mesa driver the manifests point at nor a way to make the loader find them,
so on a runtime base image without Mesa the loader enumerated zero devices
and the GPU silently fell back to CPU (only NVIDIA worked, since its ICD is
injected by the container toolkit).

- scripts/build/package-gpu-libs.sh: for each installed ICD manifest, bundle
  the driver .so its library_path names — no hard-coded, platform-dependent
  soname list — plus that driver's ldd dependencies, skipping manifests whose
  driver isn't installed. Rewrite each library_path to a bare soname so the
  bundled driver resolves via the LD_LIBRARY_PATH run.sh already sets.
- .docker/install-base-deps.sh, backend/Dockerfile.golang,
  backend/Dockerfile.python: install mesa-vulkan-drivers in every Vulkan
  builder so the driver + manifests exist to be packaged (the LunarG SDK
  ships only the loader and shader tooling).
- pkg/model/process.go: when a backend ships vulkan/icd.d/, point the loader
  at it via VK_DRIVER_FILES/VK_ICD_FILENAMES at launch (no-op otherwise).
  Covered by pkg/model/process_vulkan_test.go.
- backend/go/parakeet-cpp/package.sh: complete the L0 stub (was missing the
  libc-family ldd walk + GPU-lib packaging) by mirroring whisper, so the
  vulkan-parakeet image actually bundles its GPU runtime.

Assisted-by: Claude Code:claude-opus-4-8

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-19 17:16:33 +02:00
Souheab
59c7ad5153 fix(nix flake): ensure nix flake builds successfully (#10399)
* Use inference defaults in repo src rather than fetching

there are inference_defaults.json already in the repo so we can use
those, they are regularly updated with github actions, and we avoid hash
mismatch errors in the flake this way

Signed-off-by: Souheab <souheab@protonmail.com>

* Update vendor hash

Signed-off-by: Souheab <souheab@protonmail.com>

* Create react-ui derivation as it is required for go build

Signed-off-by: Souheab <souheab@protonmail.com>

* Add FHS env wrapper to make #!/bin/bash scripts work

Signed-off-by: Souheab <souheab@protonmail.com>

* use pkgs.importNpmLock to deal with npm dependencies instead of using npmDepsHash

Signed-off-by: Souheab <souheab@protonmail.com>

---------

Signed-off-by: Souheab <souheab@protonmail.com>
2026-06-19 17:15:18 +02:00
番茄摔成番茄酱
78d682224a fix(grpc): forward word-level timestamps in AudioTranscription wrapper (#10402)
The gRPC server wrapper in pkg/grpc/server.go reconstructs
TranscriptSegment messages when relaying AudioTranscription results
from backends. The Words field was not being copied, causing all
word-level timestamps to be silently dropped regardless of backend
support.

This was introduced when PR #9621 added the TranscriptWord proto
message and transcriptResultFromProto (server-side), but did not
update the server-side gRPC relay to forward the new field.

Fixes #9306

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-19 14:59:50 +02:00
357 changed files with 14771 additions and 2554 deletions

View File

@@ -198,6 +198,27 @@ docker-build-backends: ... docker-build-<backend-name>
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
## Documenting the backend (README + docs)
A backend is not "added" until it is discoverable. Update the user-facing docs:
- **`docs/content/features/backends.md`** - add the backend to the right
category in the "LocalAI supports various types of backends" list (and add a
new category if it introduces a new modality, e.g. sound classification).
- If the backend introduces a **new API surface** (a new endpoint or a realtime
capability), document it under `docs/content/` where its area lives (audio,
vision, etc.) and follow the api-endpoints checklist in
[api-endpoints-and-auth.md](api-endpoints-and-auth.md).
**If the backend is a native C/C++/GGML engine created and maintained by the
LocalAI team** (a from-scratch port like `parakeet.cpp`, `ced.cpp`,
`vibevoice.cpp`, `rf-detr.cpp`, not a wrapper around a third-party runtime), it
ALSO belongs in the top-level **`README.md`** table under "native C/C++/GGML
engines ... developed and maintained by the LocalAI project itself". Add a row
linking the upstream engine repo with a one-line description. This is the
project's showcase of its own engines; a new in-house backend that is missing
from it is a documentation bug.
## 5. Verification Checklist
After adding a new backend, verify:
@@ -211,6 +232,8 @@ After adding a new backend, verify:
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
- [ ] Documented: added to the category list in `docs/content/features/backends.md` (and any new endpoint/realtime capability documented under `docs/content/`)
- [ ] If it is an in-house native C/C++/GGML engine, added to the maintained-engines table in the top-level `README.md`
## Bundling runtime shared libraries (`package.sh`)

View File

@@ -70,6 +70,12 @@ if [ "${BUILD_TYPE:-}" = "vulkan" ] && [ "${SKIP_DRIVERS:-false}" = "false" ]; t
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe + Arm SoC) and their ICD
# manifests. The LunarG SDK below only provides the loader and shader
# tooling, not hardware drivers — without Mesa the packaged Vulkan backend
# would ship a loader that finds no GPU. package-gpu-libs.sh bundles these
# .so files plus their deps into the backend so it stays self-contained.
apt-get install -y mesa-vulkan-drivers libdrm2
if [ "amd64" = "${TARGETARCH:-}" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz"
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz

View File

@@ -3575,6 +3575,154 @@ include:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# ced
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-ced'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-ced'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-ced'
base-image: "ubuntu:24.04"
ubuntu-version: '2404'
runs-on: 'ubuntu-24.04-arm'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-ced'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-ced'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-ced'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-ced'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platform-tag: 'amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-ced'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/arm64'
platform-tag: 'arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-ced'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-ced'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2204'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-ced'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
runs-on: 'ubuntu-latest'
skip-drivers: 'false'
backend: "ced"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# acestep-cpp
- build-type: ''
cuda-major-version: ""
@@ -4754,6 +4902,10 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-parakeet-cpp"
build-type: "metal"
lang: "go"
- backend: "ced"
tag-suffix: "-metal-darwin-arm64-ced"
build-type: "metal"
lang: "go"
- backend: "acestep-cpp"
tag-suffix: "-metal-darwin-arm64-acestep-cpp"
build-type: "metal"
@@ -4822,6 +4974,12 @@ includeDarwin:
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
- backend: "trl"
tag-suffix: "-metal-darwin-arm64-trl"
build-type: "mps"
- backend: "liquid-audio"
tag-suffix: "-metal-darwin-arm64-liquid-audio"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"
@@ -4838,6 +4996,10 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
build-type: "metal"
lang: "go"
- backend: "supertonic"
tag-suffix: "-metal-darwin-arm64-supertonic"
build-type: "metal"
lang: "go"
- backend: "local-store"
tag-suffix: "-metal-darwin-arm64-local-store"
build-type: "metal"

View File

@@ -44,7 +44,7 @@ jobs:
has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
uses: actions/checkout@v7
- name: Setup Bun
uses: oven-sh/setup-bun@v2

View File

@@ -101,7 +101,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true

View File

@@ -57,7 +57,7 @@ jobs:
HOMEBREW_NO_ANALYTICS: '1'
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true

View File

@@ -49,7 +49,7 @@ jobs:
# Sparse checkout: the merge job needs `.github/scripts/` (for the
# keepalive cleanup script) but none of the source tree.
- name: Checkout (.github/scripts only)
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
sparse-checkout: |
.github/scripts

View File

@@ -23,7 +23,7 @@ jobs:
has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
uses: actions/checkout@v7
- name: Setup Bun
uses: oven-sh/setup-bun@v2

View File

@@ -127,7 +127,7 @@ jobs:
# the original l4t matrix entry which set skip-drivers: 'true'.
skip-drivers: 'true'
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
submodules: false
- name: Free disk space

View File

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Set up Go
@@ -25,7 +25,7 @@ jobs:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Set up Go
@@ -47,7 +47,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Configure apt mirror on runner

View File

@@ -14,7 +14,7 @@ jobs:
bump:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- uses: actions/setup-go@v5
with:

View File

@@ -42,6 +42,10 @@ jobs:
variable: "PARAKEET_VERSION"
branch: "master"
file: "backend/go/parakeet-cpp/Makefile"
- repository: "mudler/ced.cpp"
variable: "CED_VERSION"
branch: "master"
file: "backend/go/ced/Makefile"
- repository: "mudler/depth-anything.cpp"
variable: "DEPTHANYTHING_VERSION"
branch: "master"
@@ -88,7 +92,7 @@ jobs:
file: "backend/go/vibevoice-cpp/Makefile"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Bump dependencies 🔧
id: bump
run: |
@@ -124,7 +128,7 @@ jobs:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Bump vLLM cu130 wheel pin 🔧
id: bump
run: |

View File

@@ -13,7 +13,7 @@ jobs:
- repository: "mudler/LocalAI"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Bump dependencies 🔧
run: |
bash .github/bump_docs.sh ${{ matrix.repository }}

View File

@@ -8,7 +8,7 @@ jobs:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Install dependencies

View File

@@ -16,7 +16,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- uses: actions/setup-go@v5

View File

@@ -31,7 +31,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -44,7 +44,7 @@ jobs:
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
- name: Cache Intel images
uses: docker/build-push-action@v7

View File

@@ -28,7 +28,7 @@ jobs:
HUGO_VERSION: "0.146.3"
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0 # needed for enableGitInfo
submodules: true

View File

@@ -80,7 +80,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
- name: Configure apt mirror on runner
id: apt_mirror

View File

@@ -36,7 +36,7 @@ jobs:
# Sparse checkout: needed for .github/scripts/ (the keepalive cleanup
# script). Skips the rest of the source tree.
- name: Checkout (.github/scripts only)
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
sparse-checkout: |
.github/scripts

View File

@@ -20,7 +20,7 @@ jobs:
golangci-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
with:
# Full history so golangci-lint's new-from-merge-base can reach
# origin/master and compute the diff against it.

View File

@@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Set up Go
@@ -28,7 +28,7 @@ jobs:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Set up Go
@@ -46,7 +46,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
fetch-depth: 0
- name: Configure apt mirror on runner

View File

@@ -14,7 +14,7 @@ jobs:
GO111MODULE: on
steps:
- name: Checkout Source
uses: actions/checkout@v6
uses: actions/checkout@v7
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}

View File

@@ -50,7 +50,7 @@ jobs:
parakeet-cpp: ${{ steps.detect.outputs.parakeet-cpp }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
uses: actions/checkout@v7
- name: Setup Bun
uses: oven-sh/setup-bun@v2
- name: Install dependencies
@@ -67,7 +67,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -90,7 +90,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -113,7 +113,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -137,7 +137,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -158,7 +158,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -178,7 +178,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -240,7 +240,7 @@ jobs:
# sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
# df -h
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -265,7 +265,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -288,7 +288,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -309,7 +309,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -330,7 +330,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -351,7 +351,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -373,7 +373,7 @@ jobs:
# timeout-minutes: 45
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -394,7 +394,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -415,7 +415,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -436,7 +436,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -462,7 +462,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -484,7 +484,7 @@ jobs:
timeout-minutes: 30
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -513,7 +513,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -530,7 +530,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -552,7 +552,7 @@ jobs:
timeout-minutes: 20
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -579,7 +579,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -604,7 +604,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -625,7 +625,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -645,7 +645,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -664,7 +664,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -681,7 +681,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -698,7 +698,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -741,7 +741,7 @@ jobs:
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -783,7 +783,7 @@ jobs:
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v6
# uses: actions/checkout@v7
# with:
# submodules: true
# - name: Dependencies
@@ -808,7 +808,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -840,7 +840,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -876,7 +876,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -915,7 +915,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -952,7 +952,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -987,7 +987,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -1013,7 +1013,7 @@ jobs:
timeout-minutes: 150
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -1042,7 +1042,7 @@ jobs:
timeout-minutes: 60
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go
@@ -1058,7 +1058,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -1091,7 +1091,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -1114,7 +1114,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies
@@ -1140,7 +1140,7 @@ jobs:
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies

View File

@@ -21,7 +21,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Free disk space
@@ -84,7 +84,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}

View File

@@ -62,7 +62,7 @@ jobs:
sudo rm -rfv build || true
df -h
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Dependencies

View File

@@ -21,7 +21,7 @@ jobs:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Configure apt mirror on runner

97
.github/workflows/tests-pii-ner-e2e.yml vendored Normal file
View File

@@ -0,0 +1,97 @@
---
name: 'PII NER tier E2E (live GGUF, CPU)'
# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the
# hermetic tests/e2e suite cannot cover (it only exercises the in-process
# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB
# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand.
#
# This drives the container-level harness (tests/e2e-backends) via
# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image,
# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned
# TokenClassify spans. The complementary HTTP-path specs in tests/e2e
# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired.
on:
workflow_dispatch:
schedule:
- cron: '0 3 * * *'
push:
branches:
- master
paths:
- 'backend/cpp/privacy-filter/**'
- 'backend/Dockerfile.privacy-filter'
- 'core/services/routing/pii/**'
- 'core/services/routing/piidetector/**'
- 'core/backend/token_classify.go'
- 'core/http/endpoints/localai/pii.go'
- 'core/schema/pii.go'
- 'tests/e2e-backends/**'
- 'tests/e2e/e2e_pii_ner_test.go'
- 'tests/e2e/e2e_suite_test.go'
- '.github/workflows/tests-pii-ner-e2e.yml'
pull_request:
paths:
- 'backend/cpp/privacy-filter/**'
- 'backend/Dockerfile.privacy-filter'
- 'core/services/routing/pii/**'
- 'core/services/routing/piidetector/**'
- 'core/backend/token_classify.go'
- 'core/http/endpoints/localai/pii.go'
- 'core/schema/pii.go'
- 'tests/e2e-backends/**'
- 'tests/e2e/e2e_pii_ner_test.go'
- 'tests/e2e/e2e_suite_test.go'
- '.github/workflows/tests-pii-ner-e2e.yml'
concurrency:
group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
tests-pii-ner-e2e:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v7
with:
submodules: true
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true
sudo docker image prune --all --force || true
df -h
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
- name: Proto Dependencies
run: |
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential
# Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on
# CPU and runs the token_classify capability spec (byte-offset contract).
- name: Run live PII NER backend E2E
run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

View File

@@ -23,7 +23,7 @@ jobs:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v6
uses: actions/checkout@v7
with:
submodules: true
- name: Configure apt mirror on runner

View File

@@ -10,7 +10,7 @@ jobs:
fail-fast: false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@v7
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- uses: actions/setup-go@v5

3
.gitignore vendored
View File

@@ -91,3 +91,6 @@ core/http/react-ui/test-results/
# Local worktrees
.worktrees/
# SDD / brainstorm scratch (agent-driven development)
.superpowers/

View File

@@ -690,6 +690,16 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
BACKEND_TEST_CTX_SIZE=2048 \
$(MAKE) test-extra-backend
## privacy-filter: the PII/NER token-classification backend. Exercises the
## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets
## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M
## active params). This is the live-backend coverage for the PII NER tier.
test-extra-backend-privacy-filter: docker-build-privacy-filter
BACKEND_IMAGE=local-ai-backend:privacy-filter \
BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \
BACKEND_TEST_CAPS=health,load,token_classify \
$(MAKE) test-extra-backend
## vllm is resolved from a HuggingFace model id (no file download) and
## exercises Predict + streaming + tool-call extraction via the hermes parser.
## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU

View File

@@ -231,6 +231,7 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native
| Backend | What it does |
|---------|-------------|
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
| [ced.cpp](https://github.com/mudler/ced.cpp) | C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition |
| [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C |
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation |
@@ -240,6 +241,8 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
## Resources
- [Documentation](https://localai.io/)

View File

@@ -65,7 +65,12 @@ RUN <<EOT bash
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils && \
apt-get install -y mesa-vulkan-drivers libdrm2
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe) + their manifests. The
# LunarG SDK below only provides the loader and shader tooling, not
# hardware drivers — without Mesa, package-gpu-libs.sh has no ICD to
# bundle and the packaged backend finds no GPU at runtime.
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \

View File

@@ -66,7 +66,12 @@ RUN <<EOT bash
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils && \
apt-get install -y mesa-vulkan-drivers libdrm2
# Mesa Vulkan ICD drivers (ANV/RADV/lavapipe) + their manifests. The
# LunarG SDK below only provides the loader and shader tooling, not
# hardware drivers — without Mesa, package-gpu-libs.sh has no ICD to
# bundle and the packaged backend finds no GPU at runtime.
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \

View File

@@ -24,6 +24,9 @@ service Backend {
rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
rpc Status(HealthMessage) returns (StatusResponse) {}
rpc Detect(DetectOptions) returns (DetectResponse) {}
// SoundDetection runs an audio-tagging / sound-event-classification model
// (e.g. CED over the AudioSet ontology) on a clip and returns scored labels.
rpc SoundDetection(SoundDetectionRequest) returns (SoundDetectionResponse) {}
rpc Depth(DepthRequest) returns (DepthResponse) {}
rpc FaceVerify(FaceVerifyRequest) returns (FaceVerifyResponse) {}
rpc FaceAnalyze(FaceAnalyzeRequest) returns (FaceAnalyzeResponse) {}
@@ -671,6 +674,24 @@ message DetectResponse {
repeated Detection Detections = 1;
}
// --- Sound-event classification / audio tagging messages (CED) ---
message SoundDetectionRequest {
string src = 1; // audio file path (LocalAI writes the upload to disk)
int32 top_k = 2; // number of top tags to return (0 = all classes)
float threshold = 3; // optional: drop tags scoring below this
}
message SoundClass {
string label = 1; // AudioSet class name, e.g. "Baby cry, infant cry"
float score = 2; // per-class probability (multi-label, independent)
int32 index = 3; // class index in the model ontology
}
message SoundDetectionResponse {
repeated SoundClass detections = 1; // score-descending
}
// --- Depth estimation messages (Depth Anything 3) ---
message DepthRequest {

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=b3dfb7858cfcb9166e92f366e5af87f19ebc94be
IK_LLAMA_VERSION?=d5507e33ae7ee2b7b41475f08044d3bde3b839ee
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=f3e182816421c648188b5eab269853bf1531d950
LLAMA_VERSION?=8be759e6f70d629638a7eb70db3824cbdcea370b
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -18,6 +18,18 @@
#if __has_include("server-chat.cpp")
#include "server-chat.cpp"
#endif
// server-schema.cpp exists only in llama.cpp after the upstream refactor that
// extracted the JSON request-schema evaluation (previously the static
// server_task::params_from_json_cmpl) into server_schema::eval_llama_cmpl_schema.
// server-context.cpp and grpc-server.cpp both call into it, so its definitions
// must be part of this translation unit or the link fails. __has_include keeps
// the source compatible with older pins/forks (e.g. llama-cpp-turboquant) that
// predate the split and still expose params_from_json_cmpl (see the guarded
// call sites below).
#if __has_include("server-schema.cpp")
#define LOCALAI_HAS_SERVER_SCHEMA 1
#include "server-schema.cpp"
#endif
#include "server-context.cpp"
// LocalAI
@@ -25,6 +37,7 @@
#include "backend.pb.h"
#include "backend.grpc.pb.h"
#include "common.h"
#include "arg.h"
#include "chat-auto-parser.h"
#include <getopt.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
@@ -580,6 +593,10 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.checkpoint_min_step = 256;
#endif
// Raw upstream llama-server flags collected from any option entry that
// starts with '-'. Applied once after the loop via common_params_parse.
std::vector<std::string> extra_argv;
// decode options. Options are in form optname:optvale, or if booleans only optname.
for (int i = 0; i < request->options_size(); i++) {
std::string opt = request->options(i);
@@ -1068,6 +1085,31 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} catch (...) {}
}
// --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) ---
} else if (!strcmp(optname, "cpu_moe")) {
// Bool-style flag: keep all MoE expert weights on CPU.
const bool enable = (optval == NULL) ||
optval_str == "true" || optval_str == "1" || optval_str == "yes" ||
optval_str == "on" || optval_str == "enabled";
if (enable) {
params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override());
}
} else if (!strcmp(optname, "n_cpu_moe")) {
if (optval != NULL) {
try {
int n = std::stoi(optval_str);
if (n < 0) n = 0;
// Keep override-name storage alive for the lifetime of the
// params struct (mirrors upstream arg.cpp's function-local static).
static std::list<std::string> buft_overrides_main;
for (int i = 0; i < n; ++i) {
buft_overrides_main.push_back(llm_ffn_exps_block_regex(i));
params.tensor_buft_overrides.push_back(
{buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()});
}
} catch (...) {}
}
// --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) ---
} else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) {
// Format: <tensor regex>=<buffer type>,<tensor regex>=<buffer type>,...
@@ -1099,6 +1141,30 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
else { cur.push_back(c); }
}
if (!cur.empty()) flush(cur);
// --- generic passthrough: any entry starting with '-' is a raw
// upstream llama-server flag, forwarded verbatim to the parser. ---
} else if (optname[0] == '-') {
std::string flag = optname;
// These flags make upstream's parser exit() (printing usage /
// completion), which would kill the backend process. Skip them.
if (flag == "-h" || flag == "--help" || flag == "--usage" ||
flag == "--version" || flag == "--license" ||
flag == "--list-devices" || flag == "-cl" ||
flag == "--cache-list" ||
flag.rfind("--completion", 0) == 0) {
fprintf(stderr,
"[llama-cpp] ignoring passthrough flag that would exit: %s\n",
flag.c_str());
} else {
extra_argv.push_back(flag);
// Preserve the whole value after the first ':' so embedded
// colons (e.g. host:port) survive strtok's truncation of optval.
auto colon = opt.find(':');
if (colon != std::string::npos) {
extra_argv.push_back(opt.substr(colon + 1));
}
}
}
}
@@ -1134,27 +1200,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
}
}
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
@@ -1247,6 +1292,69 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.sampling.grammar_triggers.push_back(std::move(trigger));
}
}
// Apply any raw upstream flags last so an explicit passthrough flag wins
// over the LocalAI-resolved field it maps to (e.g. --ctx-size beats
// context_size). This is the same parser llama-server itself uses.
if (!extra_argv.empty()) {
// common_params_parser_init resets a few fields for the SERVER example
// (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated
// passthrough flag can't silently clobber LocalAI's resolved value.
const int saved_n_parallel = params.n_parallel;
std::vector<char *> argv;
std::string prog = "llama-server";
argv.push_back(prog.data());
for (auto & a : extra_argv) {
argv.push_back(a.data());
}
// ctx_arg.params is a reference, so this overlays the given flags onto
// `params` in place. Returns false on a recoverable parse error (and
// self-restores params); may exit() on a hard error, exactly as
// passing the same bad flag to llama-server would.
if (!common_params_parse((int)argv.size(), argv.data(), params,
LLAMA_EXAMPLE_SERVER)) {
fprintf(stderr,
"[llama-cpp] failed to parse passthrough options; ignoring them\n");
}
// Restore n_parallel unless a passthrough flag explicitly set it
// (parser_init's reset sentinel for SERVER is -1).
if (params.n_parallel == -1) {
params.n_parallel = saved_n_parallel;
}
}
// Terminate/pad the override vectors only after BOTH the named-option loop
// and the generic passthrough (common_params_parse above) have pushed their
// real entries, so back() is the null sentinel the model loader asserts on.
// Running these before the passthrough let a passthrough flag (--cpu-moe,
// --override-tensor, --override-kv, ...) append a real entry after the
// sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for
// kv_overrides. Double-termination is harmless (the while is a no-op if the
// passthrough parse already padded; an extra trailing null is ignored).
if (!params.kv_overrides.empty()) {
params.kv_overrides.emplace_back();
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
// Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
// the main-model handling above.
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
@@ -2102,7 +2210,11 @@ public:
task.index = i;
task.tokens = std::move(inputs[i]);
#ifdef LOCALAI_HAS_SERVER_SCHEMA
task.params = server_schema::eval_llama_cmpl_schema(
#else
task.params = server_task::params_from_json_cmpl(
#endif
ctx_server.impl->vocab,
params_base,
ctx_server.get_meta().slot_n_ctx,
@@ -2116,7 +2228,7 @@ public:
// cannot detect tool calls or separate reasoning from content.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by params_from_json_cmpl
// oaicompat_model is already populated by eval_llama_cmpl_schema
tasks.push_back(std::move(task));
}
@@ -2940,7 +3052,11 @@ public:
task.index = i;
task.tokens = std::move(inputs[i]);
#ifdef LOCALAI_HAS_SERVER_SCHEMA
task.params = server_schema::eval_llama_cmpl_schema(
#else
task.params = server_task::params_from_json_cmpl(
#endif
ctx_server.impl->vocab,
params_base,
ctx_server.get_meta().slot_n_ctx,
@@ -2952,7 +3068,7 @@ public:
// reasoning, tool calls, and content are classified into ChatDeltas.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by params_from_json_cmpl
// oaicompat_model is already populated by eval_llama_cmpl_schema
tasks.push_back(std::move(task));
}

View File

@@ -8,7 +8,7 @@
# Local development: point at a working checkout instead of cloning, e.g.
# make PRIVACY_FILTER_SRC=$HOME/c/privacy-filter.cpp grpc-server
PRIVACY_FILTER_VERSION?=646342f7a59c6b7d195185eac60bad762e572f1d
PRIVACY_FILTER_VERSION?=98f52c5ef2250f207cc6b9a6aef05393a120cb7c
PRIVACY_FILTER_REPO?=https://github.com/localai-org/privacy-filter.cpp
PRIVACY_FILTER_SRC?=

View File

@@ -117,7 +117,8 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \
cd .. && \
mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: acestep-cpp
@echo "Running acestep-cpp tests..."

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,7 +23,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("ACESTEP_LIBRARY")
if libName == "" {
libName = "./libgoacestepcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgoacestepcpp-fallback.dylib"
} else {
libName = "./libgoacestepcpp-fallback.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -13,6 +13,7 @@ mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/acestep-cpp $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -12,9 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single library variant (Metal or Accelerate). The goacestepcpp
# target is built as a CMake MODULE, which emits a .dylib for a SHARED
# build but a .so for a MODULE build on Apple, so prefer .dylib and fall
# back to .so.
LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib"
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
@@ -36,9 +46,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ACESTEP_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

11
backend/go/ced/.gitignore vendored Normal file
View File

@@ -0,0 +1,11 @@
.cache/
sources/
build/
package/
ced-grpc
# build artifacts staged in-tree by the Makefile (cp from sources/) or
# symlinked for local dev; the real sources live in ced.cpp upstream.
*.so
*.so.*
ced_capi.h
compile_commands.json

78
backend/go/ced/Makefile Normal file
View File

@@ -0,0 +1,78 @@
# ced sound-classification backend Makefile.
#
# Upstream pin lives below as CED_VERSION?=<sha> so .github/bump_deps.sh can find
# and update it (matches the parakeet-cpp / whisper.cpp convention).
#
# Local dev shortcut: symlink an out-of-tree ced.cpp shared build + header and
# skip the clone/cmake steps entirely:
# ln -sf /path/to/ced.cpp/build-shared/libced.so .
# ln -sf /path/to/ced.cpp/include/ced_capi.h .
# go build -o ced-grpc .
CED_VERSION?=c04ac14b7992d00584d9e812c9bb6268598a6ce7
CED_REPO?=https://github.com/mudler/ced.cpp
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
BUILD_TYPE?=
NATIVE?=false
# Static-link ggml into libced.so (PIC) so the shared lib is self-contained:
# dlopen needs no libggml*.so alongside it, only system libs the runtime image
# already provides.
CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DCED_SHARED=ON -DCED_BUILD_CLI=OFF -DCED_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
endif
# ced.cpp gates its ggml backends behind CED_GGML_* options (set(... CACHE BOOL
# "" FORCE)), so forward those instead of a bare -DGGML_CUDA=ON.
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DCED_GGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON
else ifeq ($(BUILD_TYPE),openblas)
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DCED_GGML_HIP=ON
else ifeq ($(BUILD_TYPE),vulkan)
CMAKE_ARGS+=-DCED_GGML_VULKAN=ON
endif
.PHONY: ced-grpc package build clean purge test all
all: ced-grpc
sources/ced.cpp:
mkdir -p sources/ced.cpp
cd sources/ced.cpp && \
git init -q && \
git remote add origin $(CED_REPO) && \
git fetch --depth 1 origin $(CED_VERSION) && \
git checkout FETCH_HEAD && \
git submodule update --init --recursive --depth 1 --single-branch
libced.so: sources/ced.cpp
cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true
cp -fv sources/ced.cpp/include/ced_capi.h ./
ced-grpc: libced.so main.go goced.go
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o ced-grpc .
package: ced-grpc
bash package.sh
build: package
test:
LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1
clean: purge
rm -rf libced.so* ced_capi.h package ced-grpc
purge:
rm -rf sources/ced.cpp

130
backend/go/ced/goced.go Normal file
View File

@@ -0,0 +1,130 @@
package main
// Go side of the ced backend: purego bindings over ced_capi.h plus the gRPC
// SoundDetection implementation.
//
// SKETCH: the pb.SoundDetection* types come from backend.proto (regenerate with
// `make protogen-go`). The C side is single-threaded per ctx, so we guard the
// engine with engineMu; LocalAI also serializes via base.SingleThread.
import (
"context"
"encoding/json"
"errors"
"fmt"
"sort"
"sync"
"unsafe"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// purego-bound entry points from libced.so. Names match ced_capi.h exactly.
var (
CppAbiVersion func() int32
CppLoad func(ggufPath string) uintptr
CppFree func(ctx uintptr)
CppLastError func(ctx uintptr) string
CppNumClasses func(ctx uintptr) int32
CppSampleRate func(ctx uintptr) int32
CppClassifyPathJSON func(ctx uintptr, wavPath string, topK int32) uintptr
CppClassifyPcmJSON func(ctx uintptr, pcm []float32, nSamples int32, sampleRate int32, topK int32) uintptr
CppFreeString func(s uintptr)
)
// cstr copies a malloc'd C string (returned as uintptr) into a Go string and
// frees the original via ced_capi_free_string. Empty/0 -> "".
func cstr(p uintptr) string {
if p == 0 {
return ""
}
defer CppFreeString(p)
var b []byte
for i := 0; ; i++ {
ch := *(*byte)(unsafe.Pointer(p + uintptr(i))) //nolint:govet // #nosec G103 -- C-owned NUL-terminated string from libced (not Go-GC memory)
if ch == 0 {
break
}
b = append(b, ch)
}
return string(b)
}
// Ced is the gRPC backend. One loaded CED model per instance.
type Ced struct {
base.Base
ctxPtr uintptr
engineMu sync.Mutex
}
// Load resolves the GGUF and opens the C-API context.
func (c *Ced) Load(opts *pb.ModelOptions) error {
if opts.ModelFile == "" {
return errors.New("ced: ModelFile is required")
}
ctx := CppLoad(opts.ModelFile)
if ctx == 0 {
return fmt.Errorf("ced: ced_capi_load failed for %q: %s", opts.ModelFile, CppLastError(0))
}
c.ctxPtr = ctx
return nil
}
// jsonTag mirrors the ced_capi JSON tag objects.
type jsonTag struct {
Index int `json:"index"`
Score float32 `json:"score"`
Label string `json:"label"`
}
// SoundDetection classifies the clip at req.Src and returns scored AudioSet tags.
func (c *Ced) SoundDetection(ctx context.Context, req *pb.SoundDetectionRequest) (*pb.SoundDetectionResponse, error) {
if c.ctxPtr == 0 {
return nil, errors.New("ced: model not loaded")
}
if req.GetSrc() == "" {
return nil, errors.New("ced: SoundDetectionRequest.src (audio path) is required")
}
topK := req.GetTopK()
if topK <= 0 {
topK = 10 // sensible default for a tagging response
}
c.engineMu.Lock()
out := cstr(CppClassifyPathJSON(c.ctxPtr, req.GetSrc(), topK))
lastErr := CppLastError(c.ctxPtr)
c.engineMu.Unlock()
if out == "" {
return nil, fmt.Errorf("ced: classification failed: %s", lastErr)
}
var tags []jsonTag
if err := json.Unmarshal([]byte(out), &tags); err != nil {
return nil, fmt.Errorf("ced: bad classifier JSON: %w", err)
}
thr := req.GetThreshold()
resp := &pb.SoundDetectionResponse{}
for _, t := range tags {
if t.Score < thr {
continue
}
resp.Detections = append(resp.Detections, &pb.SoundClass{
Label: t.Label, Score: t.Score, Index: int32(t.Index),
})
}
sort.Slice(resp.Detections, func(i, j int) bool {
return resp.Detections[i].Score > resp.Detections[j].Score
})
return resp, nil
}
func (c *Ced) Free() error {
c.engineMu.Lock()
defer c.engineMu.Unlock()
if c.ctxPtr != 0 {
CppFree(c.ctxPtr)
c.ctxPtr = 0
}
return nil
}

64
backend/go/ced/main.go Normal file
View File

@@ -0,0 +1,64 @@
package main
// ced sound-classification backend. Started internally by LocalAI: one gRPC
// server per loaded model. Loads libced.so via purego and registers the flat
// C-API declared in ced_capi.h. The library name can be overridden with
// CED_LIBRARY (mirrors PARAKEET_LIBRARY / WHISPER_LIBRARY); the default looks
// for the .so next to this binary.
//
// SKETCH: requires `make protogen-go` after the backend.proto SoundDetection
// addition, and a built libced.so (see Makefile). See DESIGN.md.
import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var addr = flag.String("addr", "localhost:50051", "the address to connect to")
type libFunc struct {
ptr any
name string
}
func main() {
libName := os.Getenv("CED_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "libced.dylib"
} else {
libName = "libced.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("ced: dlopen %q: %w", libName, err))
}
// Bound 1:1 to ced_capi.h. char*-returning functions are declared uintptr
// so we can free the same pointer with ced_capi_free_string after copying
// (purego's string return would copy and leak the original).
for _, lf := range []libFunc{
{&CppAbiVersion, "ced_capi_abi_version"},
{&CppLoad, "ced_capi_load"},
{&CppFree, "ced_capi_free"},
{&CppLastError, "ced_capi_last_error"},
{&CppNumClasses, "ced_capi_num_classes"},
{&CppSampleRate, "ced_capi_sample_rate"},
{&CppClassifyPathJSON, "ced_capi_classify_path_json"},
{&CppClassifyPcmJSON, "ced_capi_classify_pcm_json"},
{&CppFreeString, "ced_capi_free_string"},
} {
purego.RegisterLibFunc(lf.ptr, lib, lf.name)
}
fmt.Fprintf(os.Stderr, "[ced] ABI=%d\n", CppAbiVersion())
flag.Parse()
if err := grpc.StartServer(*addr, &Ced{}); err != nil {
panic(err)
}
}

62
backend/go/ced/package.sh Executable file
View File

@@ -0,0 +1,62 @@
#!/bin/bash
#
# Bundle the ced-grpc binary, libced.so, the core runtime libs (libc/libstdc++/
# libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE so the package
# is self-contained. Mirrors backend/go/parakeet-cpp/package.sh; run.sh routes
# the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc is used.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then
echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2
exit 1
fi
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

20
backend/go/ced/run.sh Executable file
View File

@@ -0,0 +1,20 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath "$0")")
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export CED_LIBRARY="$CURDIR/lib/libced.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
fi
# If a self-contained ld.so was packaged, route through it so the packaged
# libc / libstdc++ are used instead of the host's (matches the sibling backends).
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/ced-grpc" "$@"
fi
exec "$CURDIR/ced-grpc" "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221
CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -67,7 +67,7 @@ sources/CrispASR:
# it, so ${CMAKE_SOURCE_DIR} is THIS backend dir and the talk-llama sources
# aren't found. Rewrite to ${PROJECT_SOURCE_DIR} (the crispasr project root),
# which is correct both standalone and as a subproject. Idempotent.
sed -i 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt
sed -i.bak 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt && rm -f sources/CrispASR/src/CMakeLists.txt.bak
# Detect OS
UNAME_S := $(shell uname -s)
@@ -75,7 +75,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so
else
VARIANT_TARGETS = libgocrispasr-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgocrispasr-fallback.dylib
endif
crispasr: main.go gocrispasr.go $(VARIANT_TARGETS)
@@ -87,7 +88,7 @@ package: crispasr
build: package
clean: purge
rm -rf libgocrispasr*.so package sources/CrispASR crispasr
rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr
purge:
rm -rf build*
@@ -118,13 +119,21 @@ libgocrispasr-fallback.so: sources/CrispASR
SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
# Build fallback variant as a dylib (Darwin)
libgocrispasr-fallback.dylib: sources/CrispASR
$(MAKE) purge
$(info ${GREEN}I crispasr build info:fallback (dylib)${RESET})
SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
rm -rfv build*
libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null)
test: crispasr
CGO_ENABLED=0 $(GOCMD) test -v ./...

View File

@@ -47,6 +47,74 @@ extern "C" void set_abort(int v) {
g_abort.store(v, std::memory_order_relaxed);
}
// --- word-level timestamp accessors ---
extern "C" {
int crispasr_session_result_n_words(crispasr_session_result *r, int seg_i);
const char *crispasr_session_result_word_text(crispasr_session_result *r,
int seg_i, int word_i);
int64_t crispasr_session_result_word_t0(crispasr_session_result *r, int seg_i,
int word_i);
int64_t crispasr_session_result_word_t1(crispasr_session_result *r, int seg_i,
int word_i);
// Parakeet-specific word accessors
int crispasr_parakeet_result_n_words(void *r);
const char *crispasr_parakeet_result_word_text(void *r, int word_i);
int64_t crispasr_parakeet_result_word_t0(void *r, int word_i);
int64_t crispasr_parakeet_result_word_t1(void *r, int word_i);
}
void *get_result(void) { return g_result; }
int get_word_count(int seg_i) {
if (!g_result)
return 0;
return crispasr_session_result_n_words(g_result, seg_i);
}
const char *get_word_text(int seg_i, int word_i) {
if (!g_result)
return "";
return crispasr_session_result_word_text(g_result, seg_i, word_i);
}
int64_t get_word_t0(int seg_i, int word_i) {
if (!g_result)
return 0;
return crispasr_session_result_word_t0(g_result, seg_i, word_i);
}
int64_t get_word_t1(int seg_i, int word_i) {
if (!g_result)
return 0;
return crispasr_session_result_word_t1(g_result, seg_i, word_i);
}
// Parakeet-specific word accessors
int get_parakeet_word_count(void) {
if (!g_result)
return 0;
return crispasr_parakeet_result_n_words(g_result);
}
const char *get_parakeet_word_text(int word_i) {
if (!g_result)
return "";
return crispasr_parakeet_result_word_text(g_result, word_i);
}
int64_t get_parakeet_word_t0(int word_i) {
if (!g_result)
return 0;
return crispasr_parakeet_result_word_t0(g_result, word_i);
}
int64_t get_parakeet_word_t1(int word_i) {
if (!g_result)
return 0;
return crispasr_parakeet_result_word_t1(g_result, word_i);
}
static void ggml_log_cb(enum ggml_log_level level, const char *log,
void *data) {
const char *level_str;

View File

@@ -20,4 +20,18 @@ float *tts_synthesize(const char *text, int *out_n_samples); // 24kHz mono float
void tts_free(float *pcm);
int tts_set_voice(const char *name); // best-effort speaker selection; 0 ok
int tts_set_voice_file(const char *path, const char *ref_text); // load voice pack (.gguf) or zero-shot clone (.wav + ref_text)
// --- word-level timestamp accessors ---
// Session-based (works for whisper-like backends)
void *get_result(void);
int get_word_count(int seg_i);
const char *get_word_text(int seg_i, int word_i);
int64_t get_word_t0(int seg_i, int word_i);
int64_t get_word_t1(int seg_i, int word_i);
// Parakeet-specific (global word list, no segment index)
int get_parakeet_word_count(void);
const char *get_parakeet_word_text(int word_i);
int64_t get_parakeet_word_t0(int word_i);
int64_t get_parakeet_word_t1(int word_i);
}

View File

@@ -34,6 +34,18 @@ var (
CppTTSFree func(ptr uintptr)
CppTTSSetVoice func(name string) int
CppTTSSetVoiceFile func(path string, refText string) int
// Word-level timestamp accessors (session-based, per-segment)
CppGetWordCount func(segI int) int
CppGetWordText func(segI int, wordI int) string
CppGetWordT0 func(segI int, wordI int) int64
CppGetWordT1 func(segI int, wordI int) int64
// Parakeet-specific word accessors (global, no segment index)
CppGetParakeetWordCount func() int
CppGetParakeetWordText func(wordI int) string
CppGetParakeetWordT0 func(wordI int) int64
CppGetParakeetWordT1 func(wordI int) int64
)
type CrispASR struct {
@@ -212,6 +224,28 @@ func (w *CrispASR) VAD(req *pb.VADRequest) (pb.VADResponse, error) {
}, nil
}
// isValidWord reports whether a TranscriptWord contains recognisable speech
// content. The parakeet-specific word accessors can return stale initialisation
// data (model name, binary blobs) when a segment has no real speech. A word is
// considered valid only when:
// - the text is non-empty after trimming,
// - it contains no U+FFFD replacement characters (from binary data scrubbing),
// - both timestamps are non-negative,
// - the word has positive duration (end > start).
func isValidWord(w *pb.TranscriptWord) bool {
txt := strings.TrimSpace(w.Text)
if txt == "" {
return false
}
if strings.ContainsRune(txt, '\uFFFD') {
return false
}
if w.Start < 0 || w.End < 0 || w.End <= w.Start {
return false
}
return true
}
func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRequest) (pb.TranscriptResult, error) {
if err := ctx.Err(); err != nil {
return pb.TranscriptResult{}, status.Error(codes.Canceled, "transcription cancelled")
@@ -290,15 +324,54 @@ func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRe
// IDs, so Tokens is left empty.
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
// Populate word-level timestamps. Try session-based functions first
// (per-segment); fall back to parakeet-specific functions (global word
// list with no segment index — only populated on the first segment to
// avoid duplication).
words := []*pb.TranscriptWord{}
wordCount := CppGetWordCount(i)
if wordCount == 0 && i == 0 {
wordCount = CppGetParakeetWordCount()
for j := 0; j < wordCount; j++ {
w := &pb.TranscriptWord{
Start: CppGetParakeetWordT0(j) * (10000000),
End: CppGetParakeetWordT1(j) * (10000000),
Text: strings.ToValidUTF8(strings.Clone(CppGetParakeetWordText(j)), "<22>"),
}
if isValidWord(w) {
words = append(words, w)
}
}
} else {
for j := 0; j < wordCount; j++ {
w := &pb.TranscriptWord{
Start: CppGetWordT0(i, j) * (10000000),
End: CppGetWordT1(i, j) * (10000000),
Text: strings.ToValidUTF8(strings.Clone(CppGetWordText(i, j)), "<22>"),
}
if isValidWord(w) {
words = append(words, w)
}
}
}
// Skip empty segments with no recognisable content (e.g. trailing
// silence segments that parakeet emits with stale init data).
trimmed := strings.TrimSpace(txt)
if trimmed == "" && len(words) == 0 {
continue
}
segment := &pb.TranscriptSegment{
Id: int32(i),
Text: txt,
Start: s, End: t,
Words: words,
}
segments = append(segments, segment)
text += " " + strings.TrimSpace(txt)
text += " " + trimmed
}
return pb.TranscriptResult{
@@ -390,13 +463,20 @@ func (w *CrispASR) AudioTranscriptionStream(ctx context.Context, opts *pb.Transc
s := CppGetSegmentStart(i) * 10000000
t := CppGetSegmentEnd(i) * 10000000
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
// Skip empty segments (e.g. trailing silence that parakeet emits
// with stale init data).
trimmed := strings.TrimSpace(txt)
if trimmed == "" && s == t {
continue
}
segments = append(segments, &pb.TranscriptSegment{
Id: int32(i),
Text: txt,
Start: s, End: t,
})
trimmed := strings.TrimSpace(txt)
if trimmed == "" {
continue
}

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("CRISPASR_LIBRARY")
if libName == "" {
libName = "./libgocrispasr-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgocrispasr-fallback.dylib"
} else {
libName = "./libgocrispasr-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
@@ -44,6 +49,14 @@ func main() {
{&CppTTSFree, "tts_free"},
{&CppTTSSetVoice, "tts_set_voice"},
{&CppTTSSetVoiceFile, "tts_set_voice_file"},
{&CppGetWordCount, "get_word_count"},
{&CppGetWordText, "get_word_text"},
{&CppGetWordT0, "get_word_t0"},
{&CppGetWordT1, "get_word_t1"},
{&CppGetParakeetWordCount, "get_parakeet_word_count"},
{&CppGetParakeetWordText, "get_parakeet_word_text"},
{&CppGetParakeetWordT0, "get_parakeet_word_t0"},
{&CppGetParakeetWordT1, "get_parakeet_word_t1"},
}
for _, lf := range libFuncs {

View File

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/crispasr $CURDIR/package/
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/
cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgocrispasr-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgocrispasr-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgocrispasr-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export CRISPASR_LIBRARY=$LIBRARY
# Point piper's espeak-ng phonemizer at the bundled voice data. The variable

View File

@@ -8,11 +8,13 @@ JOBS?=$(shell nproc --ignore=1)
# depth-anything.cpp. Pin to a specific commit for a stable build; a squash
# merge upstream can orphan a branch, so the native version is pinned by SHA.
# This SHA adds the nested two-file metric C-API (abi_version 4,
# da_capi_load_nested) required by the depth-anything-3-nested gallery model;
# tag it (e.g. v0.1.3) upstream to keep the SHA alive.
# This SHA adds the Depth Anything V2 engine + C-API routing (depth-only,
# relative + metric) on top of the nested two-file metric C-API (abi_version 4,
# da_capi_load_nested) required by the depth-anything-3-nested gallery model.
# It is kept alive by the upstream tag da2-support (survives a squash-merge);
# repoint to the master merge commit once mudler/depth-anything.cpp PR #1 lands.
DEPTHANYTHING_REPO?=https://github.com/mudler/depth-anything.cpp.git
DEPTHANYTHING_VERSION?=cce5edc395fd1843806093d7ccc0c8b0d0b97b72
DEPTHANYTHING_VERSION?=f4e17dea695dd12ae76bea98ba58030996b98118
ifeq ($(NATIVE),false)
CMAKE_ARGS+=-DGGML_NATIVE=OFF
@@ -75,7 +77,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libdepthanythingcpp-fallback.so
VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib
endif
depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
@@ -87,7 +89,7 @@ package: depth-anything-cpp
build: package
clean: purge
rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources
purge:
rm -rf build*
@@ -114,11 +116,19 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
else
libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
rm -rfv build-$@
endif
libdepthanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -126,7 +136,8 @@ libdepthanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null)
all: depth-anything-cpp package

View File

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("DEPTHANYTHING_LIBRARY")
if libName == "" {
libName = "./libdepthanythingcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libdepthanythingcpp-fallback.dylib"
} else {
libName = "./libdepthanythingcpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export DEPTHANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -67,8 +67,9 @@ $(LIB_SENTINEL): sources/LocalVQE
# that the loader picks at runtime. We must build every target — the
# default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY
# routes all of them into build/bin; copy them out next to the binary.
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* .
cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib .
cp -P build/bin/libggml*.so* . 2>/dev/null || true
cp -P build/bin/libggml*.dylib . 2>/dev/null || true
touch $(LIB_SENTINEL)
liblocalvqe.so: $(LIB_SENTINEL)

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("LOCALVQE_LIBRARY")
if libName == "" {
libName = "./liblocalvqe.so"
if runtime.GOOS == "darwin" {
libName = "./liblocalvqe.dylib"
} else {
libName = "./liblocalvqe.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -15,7 +15,9 @@ cp -avf $CURDIR/localvqe $CURDIR/package/
# liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime
# variants — LocalVQE picks the matching CPU variant at load time.
cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true
cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -10,8 +10,19 @@ CURDIR=$(dirname "$(realpath $0)")
# exec'ing the binary.
cd "$CURDIR"
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
if [ "$(uname)" = "Darwin" ]; then
# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
export DYLD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$DYLD_LIBRARY_PATH
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.dylib
if [ ! -e "$LOCALVQE_LIBRARY" ]; then
LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
fi
export LOCALVQE_LIBRARY
else
export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
fi
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"

View File

@@ -70,7 +70,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = liblocateanythingcpp-fallback.so
VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib
endif
locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
@@ -82,7 +82,7 @@ package: locate-anything-cpp
build: package
clean: purge
rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources
rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources
purge:
rm -rf build*
@@ -109,11 +109,19 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
else
liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
rm -rfv build-$@
$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
rm -rfv build-$@
endif
liblocateanythingcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -121,7 +129,8 @@ liblocateanythingcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null)
all: locate-anything-cpp package

View File

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("LOCATEANYTHING_LIBRARY")
if libName == "" {
libName = "./liblocateanythingcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./liblocateanythingcpp-fallback.dylib"
} else {
libName = "./liblocateanythingcpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export LOCATEANYTHING_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# omnivoice.cpp version
OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
OMNIVOICE_VERSION?=2603355a5dfacae5cfc33531d5d0933221843509
OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607
SO_TARGET?=libgomnivoicecpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,7 +65,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
else
VARIANT_TARGETS = libgomnivoicecpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib
endif
omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
@@ -77,7 +78,7 @@ package: omnivoice-cpp
build: package
clean: purge
rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp
purge:
rm -rf build*
@@ -106,13 +107,20 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp
$(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
rm -rf build-libgomnivoicecpp-fallback.dylib
libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: omnivoice-cpp
@echo "Running omnivoice-cpp tests..."

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("OMNIVOICE_LIBRARY")
if libName == "" {
libName = "./libgomnivoicecpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgomnivoicecpp-fallback.dylib"
} else {
libName = "./libgomnivoicecpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export OMNIVOICE_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45
# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45
PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go
@@ -74,6 +74,7 @@ libparakeet.so: sources/parakeet.cpp
cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS)
cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS)
cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true
cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./
parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go

View File

@@ -2,15 +2,17 @@ package main
// Started internally by LocalAI - one gRPC server per loaded model.
//
// Loads libparakeet.so via purego and registers the flat C-API entry
// points declared in parakeet_capi.h. The library name can be overridden
// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY
// convention in the sibling backends); the default looks for the .so next
// to this binary.
// Loads the parakeet shared library via purego and registers the flat
// C-API entry points declared in parakeet_capi.h. The library name can be
// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY /
// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default
// looks next to this binary for libparakeet.so on Linux and
// libparakeet.dylib on macOS.
import (
"flag"
"fmt"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,7 +30,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("PARAKEET_LIBRARY")
if libName == "" {
libName = "libparakeet.so"
if runtime.GOOS == "darwin" {
libName = "libparakeet.dylib"
} else {
libName = "libparakeet.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -1,23 +1,71 @@
#!/bin/bash
#
# L0 packaging stub: copy the binary, run.sh and libparakeet.so* into
# package/. The full ldd walk (libc, libstdc++, libgomp, GPU runtimes,
# arch detection) lands in L3, mirroring backend/go/whisper/package.sh.
# Bundle the parakeet-cpp-grpc binary, libparakeet.so, the core runtime
# libs (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active
# BUILD_TYPE so the package is self-contained. Mirrors
# backend/go/whisper/package.sh; run.sh routes the (CGO_ENABLED=0) binary
# through lib/ld.so so the packaged libc is used instead of the host's.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
cp -avf "$CURDIR/run.sh" "$CURDIR/package/"
# libparakeet.so + any soname symlinks (libparakeet.so.X, libparakeet.so.X.Y).
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
# libparakeet shared lib + any soname symlinks. On Linux this is
# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen
# resolves it via the *_LIBRARY_PATH that run.sh points at lib/.
cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true
cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true
if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then
echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2
exit 1
}
fi
echo "L0 package layout (full ldd walk lands in L3):"
# Detect architecture and copy the core runtime libs libparakeet.so links
# against, plus the matching dynamic loader as lib/ld.so.
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6"
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2"
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
elif [ "$(uname -s)" = "Darwin" ]; then
echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers)
# based on BUILD_TYPE so the backend can reach the GPU without the runtime
# base image shipping those drivers.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah "$CURDIR/package/" "$CURDIR/package/lib/"

View File

@@ -3,11 +3,17 @@ set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
else
export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
fi
# If a self-contained ld.so was packaged, route through it so the
# packaged libc / libstdc++ are used instead of the host's (matches the
# whisper backend's runtime layout).
# whisper backend's runtime layout). Linux only.
if [ -f "$CURDIR/lib/ld.so" ]; then
echo "Using lib/ld.so"
exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# qwentts.cpp version
QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
QWEN3TTS_CPP_VERSION?=0bf4a18b22e8bb8718d95294e9f7f45c0d4270a4
QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42
SO_TARGET?=libgoqwen3ttscpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib
endif
qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
build: package
clean: purge
rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp
purge:
rm -rf build*
@@ -110,13 +110,20 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.so
# Build fallback variant as a dylib (Darwin)
libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp
$(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
rm -rf build-libgoqwen3ttscpp-fallback.dylib
libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \
cd .. && \
mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: qwen3-tts-cpp
@echo "Running qwen3-tts-cpp tests..."

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("QWEN3TTS_LIBRARY")
if libName == "" {
libName = "./libgoqwen3ttscpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgoqwen3ttscpp-fallback.dylib"
} else {
libName = "./libgoqwen3ttscpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/
cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export QWEN3TTS_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -71,7 +71,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = librfdetrcpp-avx.so librfdetrcpp-avx2.so librfdetrcpp-avx512.so librfdetrcpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = librfdetrcpp-fallback.so
VARIANT_TARGETS = librfdetrcpp-fallback.dylib
endif
rfdetr-cpp: main.go gorfdetrcpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: rfdetr-cpp
build: package
clean: purge
rm -rf librfdetrcpp*.so rfdetr-cpp package sources
rm -rf librfdetrcpp*.so librfdetrcpp*.dylib rfdetr-cpp package sources
purge:
rm -rf build*
@@ -110,11 +110,19 @@ librfdetrcpp-avx512.so: sources/rt-detr.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
librfdetrcpp-fallback.dylib: sources/rt-detr.cpp
rm -rfv build-$@
$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
rm -rfv build-$@
else
librfdetrcpp-fallback.so: sources/rt-detr.cpp
rm -rfv build-$@
$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
rm -rfv build-$@
endif
librfdetrcpp-custom: CMakeLists.txt
mkdir -p build-$(SO_TARGET) && \
@@ -122,7 +130,8 @@ librfdetrcpp-custom: CMakeLists.txt
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/librfdetrcpp.dylib ./$(SO_TARGET) 2>/dev/null)
all: rfdetr-cpp package

View File

@@ -9,6 +9,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("RFDETR_LIBRARY")
if libName == "" {
libName = "./librfdetrcpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./librfdetrcpp-fallback.dylib"
} else {
libName = "./librfdetrcpp-fallback.so"
}
}
rfdetrLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/librfdetrcpp-*.so $CURDIR/package/
cp -fv $CURDIR/librfdetrcpp-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/librfdetrcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/rfdetr-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/librfdetrcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/librfdetrcpp-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export RFDETR_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -66,7 +66,7 @@ ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgosam3-avx.so libgosam3-avx2.so libgosam3-avx512.so libgosam3-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgosam3-fallback.so
VARIANT_TARGETS = libgosam3-fallback.dylib
endif
sam3-cpp: main.go gosam3.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: sam3-cpp
build: package
clean: purge
rm -rf libgosam3*.so sam3-cpp package sources
rm -rf libgosam3*.so libgosam3*.dylib sam3-cpp package sources
purge:
rm -rf build*
@@ -105,11 +105,19 @@ libgosam3-avx512.so: sources/sam3.cpp
endif
# Build fallback variant (all platforms)
ifeq ($(UNAME_S),Darwin)
libgosam3-fallback.dylib: sources/sam3.cpp
$(MAKE) purge
$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
SO_TARGET=libgosam3-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
rm -rfv build*
else
libgosam3-fallback.so: sources/sam3.cpp
$(MAKE) purge
$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
SO_TARGET=libgosam3-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
rm -rfv build*
endif
libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
mkdir -p build-$(SO_TARGET) && \
@@ -117,6 +125,7 @@ libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgosam3.dylib ./$(SO_TARGET) 2>/dev/null)
all: sam3-cpp package

View File

@@ -3,6 +3,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("SAM3_LIBRARY")
if libName == "" {
libName = "./libgosam3-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgosam3-fallback.dylib"
} else {
libName = "./libgosam3-fallback.so"
}
}
gosamLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosam3-*.so $CURDIR/package/
cp -fv $CURDIR/libgosam3-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgosam3-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/sam3-cpp $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosam3-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgosam3-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosam3-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosam3-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgosam3-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SAM3_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -7,6 +7,7 @@ import (
"fmt"
"os"
"path/filepath"
"runtime"
"strconv"
"strings"
"sync"
@@ -238,11 +239,19 @@ func loadSherpaLibs() error {
func loadSherpaLibsOnce() error {
shimLib := os.Getenv("SHERPA_SHIM_LIBRARY")
if shimLib == "" {
shimLib = "libsherpa-shim.so"
if runtime.GOOS == "darwin" {
shimLib = "libsherpa-shim.dylib"
} else {
shimLib = "libsherpa-shim.so"
}
}
capiLib := os.Getenv("SHERPA_ONNX_LIBRARY")
if capiLib == "" {
capiLib = "libsherpa-onnx-c-api.so"
if runtime.GOOS == "darwin" {
capiLib = "libsherpa-onnx-c-api.dylib"
} else {
capiLib = "libsherpa-onnx-c-api.so"
}
}
shim, err := purego.Dlopen(shimLib, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -3,7 +3,13 @@ set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export SHERPA_SHIM_LIBRARY=$CURDIR/lib/libsherpa-shim.dylib
export SHERPA_ONNX_LIBRARY=$CURDIR/lib/libsherpa-onnx-c-api.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=7f0e728b7d42f2490dfa5dd9539082d904f2f6b2
STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f
CMAKE_ARGS+=-DGGML_MAX_NAME=128
@@ -131,6 +131,7 @@ libgosd-custom: CMakeLists.txt cpp/gosd.cpp cpp/gosd.h
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgosd.dylib ./$(SO_TARGET) 2>/dev/null)
all: stablediffusion-ggml package

View File

@@ -3,6 +3,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("SD_LIBRARY")
if libName == "" {
libName = "./libgosd-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgosd-fallback.dylib"
} else {
libName = "./libgosd-fallback.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

View File

@@ -12,6 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
cp -fv $CURDIR/libgosd-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -12,9 +12,18 @@ if [ "$(uname)" != "Darwin" ]; then
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" = "Darwin" ]; then
# macOS: single library variant (Metal or Accelerate). The gosd target is
# built as a CMake MODULE, which emits a .dylib for a SHARED build but a
# .so for a MODULE build on Apple, so prefer .dylib and fall back to .so.
LIBRARY="$CURDIR/libgosd-fallback.dylib"
if [ ! -e "$LIBRARY" ]; then
LIBRARY="$CURDIR/libgosd-fallback.so"
fi
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosd-avx.so ]; then
@@ -36,9 +45,10 @@ if [ "$(uname)" != "Darwin" ]; then
LIBRARY="$CURDIR/libgosd-avx512.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SD_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it

View File

@@ -16,6 +16,7 @@ import (
"os"
"path/filepath"
"regexp"
"runtime"
"strings"
"time"
"unicode"
@@ -943,7 +944,13 @@ func InitializeONNXRuntime() error {
}
}
if libPath == "" {
libPath = "/usr/local/lib/libonnxruntime.so"
// LocalAI: default to the platform-native shared library
// extension when nothing else is found (dyld vs ld.so).
if runtime.GOOS == "darwin" {
libPath = "/usr/local/lib/libonnxruntime.dylib"
} else {
libPath = "/usr/local/lib/libonnxruntime.so"
}
}
}
ort.SetSharedLibraryPath(libPath)

View File

@@ -32,6 +32,10 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
# macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in
# run.sh); there is no ld.so loader nor glibc to bundle.
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1

View File

@@ -3,12 +3,19 @@ set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ "$(uname)" = "Darwin" ]; then
# macOS uses dyld: there is no ld.so loader, and the search path env
# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
fi
fi
exec $CURDIR/supertonic "$@"

View File

@@ -70,8 +70,8 @@ UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgovibevoicecpp-fallback.so
# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
VARIANT_TARGETS = libgovibevoicecpp-fallback.dylib
endif
vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: vibevoice-cpp
build: package
clean: purge
rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp
rm -rf libgovibevoicecpp*.so libgovibevoicecpp*.dylib package sources/vibevoice.cpp vibevoice-cpp
purge:
rm -rf build*
@@ -119,13 +119,21 @@ libgovibevoicecpp-fallback.so: sources/vibevoice.cpp
SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
rm -rfv build*
# Build fallback variant as a dylib (Darwin)
libgovibevoicecpp-fallback.dylib: sources/vibevoice.cpp
$(MAKE) purge
$(info ${GREEN}I vibevoice-cpp build info:fallback (dylib)${RESET})
SO_TARGET=libgovibevoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
rm -rfv build*
libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \
cd .. && \
mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET)
(mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgovibevoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
test: vibevoice-cpp
@echo "Running vibevoice-cpp tests..."

View File

@@ -4,6 +4,7 @@ package main
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
func main() {
libName := os.Getenv("VIBEVOICECPP_LIBRARY")
if libName == "" {
libName = "./libgovibevoicecpp-fallback.so"
if runtime.GOOS == "darwin" {
libName = "./libgovibevoicecpp-fallback.dylib"
} else {
libName = "./libgovibevoicecpp-fallback.so"
}
}
lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)

Some files were not shown because too many files have changed in this diff Show More