feat(ds4): wire SSD streaming + quality engine options, add 128GB DeepSeek gallery models
The ds4 backend zero-initialized ds4_engine_options and exposed none of the
engine's tunable knobs, so SSD streaming (run a model larger than RAM by
streaming routed MoE experts from the GGUF on SSD) and the quality/perf knobs
were unreachable from LocalAI model YAMLs.
Map ModelOptions.Options onto ds4_engine_options through a declarative table
(kEngineOptSpecs + apply_engine_option) instead of per-field branches: the
struct is fixed C with no reflection, so the field set is enumerated once and a
future knob is a one-line table row. Two fields use ds4's own typed parsers
(GiB budgets, cache-experts count-or-NGB). Bare flags (e.g. "ssd_streaming")
mean true; path-type options (mtp_path, expert_profile_path,
directional_steering_file) resolve relative to the model directory so a gallery
entry can reference a companion file by bare filename. mtp_draft/mtp_margin are
now validated rather than parsed with throwing std::stoi/std::stof.
Add gallery entries for the 128 GB class:
- deepseek-v4-flash-q2-q4 (~91 GB, mixed q2/q4, fits RAM, higher quality)
- deepseek-v4-flash-q4-ssd (~153 GB full 4-bit, runs on 128 GB via SSD streaming)
- deepseek-v4-flash-q2-mtp (~81 GB + MTP speculative draft weights)
- deepseek-v4-pro-q2-ssd (~433 GB Pro, experimental SSD streaming)
SSD streaming is Metal (Darwin) only; the options are inert on CUDA/CPU.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): add depth-anything-3-metric-large gallery entry
DA3METRIC-LARGE (ViT-L) single-file metric-scale depth + sky, served by the
existing depth-anything backend (same single-GGUF path as mono-large). GGUF
published at mudler/depth-anything.cpp-gguf.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): serve nested metric model (two-file load)
The DA3 nested model needs both branches (anyview GIANT + metric ViT-L) loaded
together. Wire it through the backend:
- Load reads a 'metric_model:<file>' entry from ModelOptions.Options and, when
present, calls da_capi_load_nested(anyview, metric) instead of da_capi_load
(registers the new abi-4 symbol; helper optionValue + unit test).
- gallery: depth-anything-3-nested (model=anyview, options=metric branch, both
GGUFs fetched) for metric-scale depth + pose.
- bump depth-anything.cpp pin to cce5edc (abi 4 / da_capi_load_nested).
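
A minimal sketch of how a 'key:value' entry such as 'metric_model:<file>' could
be looked up in the options list. The helper name optionValue comes from this
commit; its signature, the split on the first ':' and the file names below are
assumptions for illustration, not the backend's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// optionValue returns the value of the first "key:value" entry whose key
// matches, plus a flag saying whether such an entry exists.
// The signature is an assumption; only the name comes from the commit.
func optionValue(opts []string, key string) (string, bool) {
	for _, o := range opts {
		// Split on the first ':' only, so values may contain colons.
		k, v, ok := strings.Cut(o, ":")
		if ok && strings.TrimSpace(k) == key {
			return strings.TrimSpace(v), true
		}
	}
	return "", false
}

func main() {
	// Hypothetical options list as it might arrive from a model YAML.
	opts := []string{"threads:4", "metric_model:metric.gguf"}
	if metric, ok := optionValue(opts, "metric_model"); ok {
		fmt.Println("nested load, metric branch:", metric)
	} else {
		fmt.Println("single-file load")
	}
}
```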
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The Hugo relearn theme does not provide an "alert" shortcode, so the
docs deploy failed at the Build site step:
    failed to extract shortcode: template for shortcode "alert" not found
    docs/content/features/image-generation.md:106
Convert the vae_decode_only note to the theme-supported notice shortcode
used everywhere else in the docs.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery
Mirrors the locate-anything-cpp backend to register a new depth-anything
backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via
purego (cgo-less, no Python at inference).
- backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage),
purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts
building depth-anything.cpp's DA_SHARED static .so per CPU variant.
- backend/index.yaml: depth-anything backend meta + all hardware-variant
capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t).
- gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32,
small, large, giant, mono-large).
- .github/backend-matrix.yml: one build entry per hardware variant.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): typed Depth RPC + REST endpoint exposing full DA3 data
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): pin depth-anything.cpp to e0b6814 (ABI 3 dense C-API)
The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3);
pin the native build to the commit that exports them.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): pin depth-anything.cpp to v0.1.0 release (b515c31)
Repoint the native version from the now-orphaned e0b6814 to the
b515c31 release commit, kept alive by the upstream v0.1.0 tag.
C-API is unchanged (da_capi_abi_version == 3).
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): wire depth-anything-cpp into build, CI bump, and importer
The backend dir, gallery index, and CI build-matrix were present but the
backend was never wired into the integration points that adding-backends.md
requires:
- root Makefile: add to .NOTPARALLEL, the test-extra chain, a BACKEND_*
definition, the docker-build target eval, and docker-build-backends
(mirrors parakeet-cpp; the backend's own Makefile already documented that
its `test` target is driven by test-extra).
- bump_deps.yaml: register the DEPTHANYTHING_VERSION pin so the daily
auto-bump bot tracks mudler/depth-anything.cpp master (it cannot see an
unregistered Makefile pin).
- import form: add a preference-only KnownBackend entry so depth-anything is
selectable at /import-model (mirrors sam3-cpp; no reliable GGUF auto-detect
signal, so pref-only per the doc's default).
changed-backends.js needs no entry: the generic golang suffix branch already
resolves backend/go/depth-anything-cpp/.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): auto-detect importer for depth-anything GGUFs
Replace the preference-only entry with a real auto-detect importer
(mirrors parakeet-cpp / locate-anything):
- DepthAnythingImporter matches a .gguf whose name carries a
depth-anything token (depth-anything-<size>-<quant>.gguf), so
/import-model recognises mudler/depth-anything.cpp-gguf repos and direct
GGUF URLs without an explicit backend preference. preferences.backend=
"depth-anything" still forces it.
- Registered before LlamaCPPImporter so its GGUF bundles aren't claimed by
the generic .gguf importer; the narrow name match means it cannot claim
arbitrary llama GGUFs or the upstream safetensors PyTorch repos.
- Multi-quant repos pick the smallest quant by default (q4_k -> ... -> f32,
depth stays >0.998 corr even at q4_k); quantizations preference overrides.
- Drops the now-redundant knownPrefOnlyBackends entry (importer-backed
backends are not listed there, matching parakeet-cpp).
- Table-driven Ginkgo test covers detection, negative cases (llama GGUF,
upstream safetensors), default/override/fallback quant pick, and direct
URL import. 10/10 specs pass.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): check conn.Close error in grpc Depth client (errcheck)
The new Depth() client method used a bare `defer conn.Close()`. golangci-lint
runs with new-from-merge-base, so although the 39 sibling methods use the same
bare form (grandfathered), the newly added line trips errcheck. Drop the result
explicitly to satisfy the linter.
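
A sketch of the pattern, assuming errcheck's default settings (blank
assignments are not reported). The fakeConn type and the use function are
hypothetical stand-ins for the gRPC connection and the Depth() client method.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fakeConn is a hypothetical stand-in for a gRPC client connection.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error {
	c.closed = true
	return errors.New("close failed")
}

// use closes c on return and explicitly discards the Close error,
// instead of the bare `defer c.Close()` that errcheck reports.
func use(c io.Closer) {
	defer func() { _ = c.Close() }()
	// ... the RPC call would go here ...
}

func main() {
	c := &fakeConn{}
	use(c)
	fmt.Println("closed:", c.closed)
}
```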
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): bump depth-anything.cpp to v0.1.1 (embeddable CMake)
v0.1.0 (b515c31) used ${CMAKE_SOURCE_DIR} for its include dirs, which
points at the parent project when built via add_subdirectory() as this
backend does, so the container build failed with missing stb_image.h /
da_gguf_keys.h. v0.1.1 (2d42897) switches to project-relative paths.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): resolve gosec findings in the backend wrapper
The code-scanning gate flagged three new failure-level alerts in
godepthanythingcpp.go (gosec runs with -no-fail; GitHub gates on new alerts):
- G301: export dirs were created with 0o755. Tighten to 0o750 (no world
access needed for backend-written export output).
- G304: writeDepthPNG creates req.GetDst(). That path is chosen by the
LocalAI core as the intended output destination (same pattern every
image backend uses), not attacker input, so annotate with #nosec G304
and document why.
The remaining G103 "audit unsafe" notes on the unsafe.Slice C-buffer copies
are warning-level (the same purego interop whisper/parakeet use) and do not
gate the check, per the supertonic exclusion precedent in secscan.yaml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): bump depth-anything.cpp to v0.1.2 (CUDA cross-build arch)
v0.1.1 forced CMAKE_CUDA_ARCHITECTURES=native, which breaks the GPU-less
l4t/cublas CI builds (nvcc "Unsupported gpu architecture 'compute_'" on
CMake 3.22). v0.1.2 (442eea4) drops the override and lets ggml pick its
default cross-build arch list.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(config): add chat_template_kwargs model field + resolver
Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier,
plus ResolveChatTemplateKwargs which layers the config map under coerced
request metadata. Foundation for generic jinja chat-template kwargs (issue #10329).
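
A minimal sketch of the layering, not the actual ResolveChatTemplateKwargs
code. The function name resolveKwargs and the coercion rule (only "true" and
"false" become booleans, everything else stays a string) are assumptions made
for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// coerce maps the strings "true"/"false" to booleans and leaves every
// other value as a string. This rule is an assumption.
func coerce(s string) any {
	switch s {
	case "true":
		return true
	case "false":
		return false
	}
	return s
}

// resolveKwargs copies the model-config map first, then writes the coerced
// request metadata on top, so per-request values win on key collisions.
func resolveKwargs(cfg map[string]any, meta map[string]string) map[string]any {
	out := make(map[string]any, len(cfg)+len(meta))
	for k, v := range cfg {
		out[k] = v
	}
	for k, v := range meta {
		out[k] = coerce(v)
	}
	return out
}

func main() {
	cfg := map[string]any{"enable_thinking": true, "preserve_thinking": true}
	meta := map[string]string{"enable_thinking": "false", "reasoning_effort": "high"}
	blob, err := json.Marshal(resolveKwargs(cfg, meta))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(blob))
}
```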
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend): forward resolved chat_template_kwargs blob to backends
gRPCPredictOpts now merges per-request client metadata over the server-derived
enable_thinking/reasoning_effort (reaching all backends via the standalone keys)
and serialises the resolved chat_template_kwargs map into a JSON blob for
llama.cpp, written last so a client cannot clobber it. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(http): wire request metadata to config.RequestMetadata
The OpenAI request metadata field was parsed but unused; stamp it onto the
per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs
overrides. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama-cpp): generic chat_template_kwargs merge (drop per-key blocks)
Replace the per-key enable_thinking/reasoning_effort handling in both the
streaming and non-streaming chat paths with a single block that parses the
chat_template_kwargs JSON blob resolved by the Go layer and merges every key
into body_json. New jinja template levers (e.g. preserve_thinking) now need
no C++ change. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: document custom chat_template_kwargs (model + per-request)
Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(backend): pin reasoning_effort as a string in the chat_template_kwargs blob
Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(http): e2e guard pinning chat_template_kwargs forwarded to gRPC
Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the
received PredictOptions.Metadata, and an app_test.go spec that drives a real
/v1/chat/completions request (model chat_template_kwargs + per-request metadata
override) and asserts the exact metadata + chat_template_kwargs blob the REST
layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(config): grandfather chat_template_kwargs in registry coverage
chat_template_kwargs is a free-form map[string]any (like engine_args, already
on the list), not a scalar the config UI registry can surface, so it is exempt
from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries
failure introduced by the new field. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* ⬆️ Update leejet/stable-diffusion.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(stablediffusion-ggml): adapt gosd.cpp to upstream sd_ctx_params_t API
The bump to 5a34bc7 restructured sd_ctx_params_t: the boolean CPU-offload
knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu,
keep_control_net_on_cpu) were replaced by backend assignment specs
(backend/params_backend), and vae_decode_only / free_params_immediately
were dropped entirely. The build broke with "no member named ..." on
every arch.
Translate the legacy options we still accept from gallery configs into
the new backend assignment specs, mirroring prepare_backend_assignments()
in the upstream CLI, so offload_params_to_cpu / keep_*_on_cpu keep
working. vae_decode_only is parsed and ignored for config compatibility.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(stablediffusion-ggml): expose backend/params placement options
The upstream bump introduced new sd_ctx_params_t fields for device and
memory placement (backend, params_backend, rpc_servers, max_vram,
stream_layers) plus PuLID-Flux weights (pulid_weights_path). Wire them up
as backend options so models can be split across CPU/GPU/disk/RPC:
- backend: per-component compute placement (e.g. clip=cpu,vae=cuda0)
- params_backend: per-component weight storage incl. disk mmap
- max_vram / stream_layers: graph-cut segmented parameter offload budget
- rpc_servers: offload compute to remote RPC servers
- pulid_weights_path: PuLID-Flux identity injection
The legacy keep_*_on_cpu / offload_params_to_cpu booleans now seed and
compose with the explicit backend/params_backend specs, matching upstream
prepare_backend_assignments(). Option values are taken as everything after
the first ':' so colon-bearing values (rpc_servers host:port) survive
parsing. Documented the new options in the image-generation guide.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* feat(stablediffusion-ggml): distributed RPC across ggml workers
Enable the ggml RPC backend (-DSD_RPC=ON) so image generation can be
sharded across remote rpc-server workers. The ggml rpc-server is
backend-agnostic, so this reuses the exact same worker pool as the
llama.cpp backend - one set of `local-ai worker llama-cpp-rpc` /
`p2p-llama-cpp-rpc` workers accelerates both text and image generation.
RPC servers are selected by precedence:
- the explicit `rpc_servers` option, else
- the LLAMACPP_GRPC_SERVERS env var, which LocalAI's p2p worker mode
populates automatically with discovered workers (the backend inherits
it from the parent process env), so distributed image generation needs
no per-model configuration.
Documented manual and p2p setup in the image-generation guide.
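
The precedence can be sketched in Go as follows. The real selection lives in
the C++ backend; pickRPCServers and its getenv parameter are hypothetical and
only illustrate the order described above.

```go
package main

import (
	"fmt"
	"os"
)

// pickRPCServers returns the explicit rpc_servers option when it is set,
// otherwise the LLAMACPP_GRPC_SERVERS environment value (possibly empty).
// getenv is injected so the lookup can be exercised without touching the
// process environment.
func pickRPCServers(explicit string, getenv func(string) string) string {
	if explicit != "" {
		return explicit
	}
	return getenv("LLAMACPP_GRPC_SERVERS")
}

func main() {
	// Hypothetical host:port value for the explicit option.
	fmt.Println(pickRPCServers("10.0.0.2:50052", os.Getenv))
	// Falls back to whatever the parent process exported, if anything.
	fmt.Println(pickRPCServers("", os.Getenv))
}
```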
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* ⬆️ Update antirez/ds4
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(ds4): add Homebrew include/lib prefix for Darwin grpc-proto build
The darwin/metal ds4 backend job runs for the first time on this bump
(it was skipped on prior ds4 PRs) and fails compiling backend.pb.cc with
'google/protobuf/runtime_version.h' file not found.
hw_grpc_proto links neither protobuf::libprotobuf nor gRPC::grpc++, so
the generated proto sources rely on default system include paths. That
works on Linux (/usr/include) but not on macOS, where Homebrew installs
under /opt/homebrew. Add the Homebrew prefix to include/link dirs on
Darwin, mirroring the llama-cpp backend that already builds on Darwin CI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): install nlohmann-json on Darwin CI for ds4 backend
After the protobuf include-path fix the ds4 darwin build advances to
compiling dsml_renderer.cpp, which includes <nlohmann/json.hpp> and
#errors when absent. On Linux the header comes from apt nlohmann-json3-dev
in the build image; the macOS runner had no equivalent. Add the
header-only nlohmann-json formula to the shared Darwin backend brew
install/link list and Homebrew cache, alongside the existing deps.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(ds4): build proper OCI image tar for Darwin backend
The darwin packaging referenced scripts/build/oci-pack.sh, which was
never added to the tree, so it fell back to a plain 'tar' that omits
manifest.json. 'local-ai backends install' then rejects the tarball
with 'file manifest.json not found in tar'.
Use './local-ai util create-oci-image' (already built by the 'build'
prerequisite of the backends/ds4-darwin target), mirroring
llama-cpp-darwin.sh, to emit a real OCI image the installer accepts.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
fix(launcher): truncate download status labels to stop dialog blowout
The download progress windows place a ProgressBar and a status Label in the
same VBox. On failure the status label is set to "Download failed: <error>",
and the error commonly contains a long, unbreakable URL/path. A Fyne label
with default settings reports its MinSize as the full single-line text width,
so a long message stretches the window — and the progress bar sharing the
VBox — arbitrarily wide (fixes #10355).
Set Truncation = fyne.TextTruncateEllipsis on the four affected status labels
(the main-window status label plus the status label in each of the three
showDownloadProgress implementations). Truncation collapses the label's
MinSize to roughly one character plus the ellipsis regardless of content, so
the window keeps its intended size. TextWrapWord is not enough because it
cannot break a spaceless URL. The full error text remains visible via the
dialog.ShowError call already present in each path.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The qwen3-tts backend migrated from predict-woo/qwen3-tts.cpp to
ServeurpersoCom/qwentts.cpp (the Makefile QWEN3TTS_REPO already points
there), but the bump_deps matrix still tracked the old repo. That made
the nightly bumper open PRs (e.g. #10334) against the wrong upstream.
Point the matrix entry at the new repo and its master branch.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The Gemma 4 QAT MTP assistant-head gallery entries currently fail to load in the stock llama.cpp backend with unknown architecture errors. Hide them until the assistant GGUFs are verified against the supported backend path.
Assisted-by: Codex:GPT-5 [gh] [git]
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(react-ui): localize SearchableSelect component
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
* feat(react-ui): localize ModelSelector component
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
* fix(react-ui): dynamically localize back navigation caption to match page title
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
* feat(react-ui): localize back navigation state on Models page
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
* feat(react-ui): localize ModelEditor page
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
* fix(react-ui): fix Indonesian typo 'Import' to 'Impor' in importModel locale
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
---------
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Expands sherpa-onnx Piper TTS coverage in the model gallery. Previously only
5 single-speaker Piper voices shipped (it_IT-paola, en_US-amy, es_ES-davefx,
fr_FR-siwis, de_DE-thorsten). This adds 19 entries:
Italian (it_IT): dii-high, miro-high, riccardo-x_low.
UK English (en_GB): alan (low+medium), alba-medium, aru-medium, cori
(high+medium), dii-high, jenny_dioco-medium, miro-high,
northern_english_male-medium, semaine-medium, southern_english_female
(low+medium), southern_english_male-medium, vctk-medium, sweetbbak-amy.
Each entry mirrors the existing Piper block (sherpa-onnx-tts.yaml base config).
sha256, ONNX path, sample rate and speaker count were read from the actual
release tarballs; licenses and source URLs were taken from each archive's
MODEL_CARD/README rather than assumed:
- dii/miro voices are OpenVoiceOS models under CC BY-NC-SA 4.0 (non-commercial),
labelled as such in both the license field and description.
- cori is LibriVox public-domain (cc0-1.0); OpenSLR-83 voices are CC BY-SA 4.0;
alba/vctk are CC BY 4.0.
- vctk (109), aru (12) and semaine (4) are multi-speaker; tagged accordingly
with a note to select the speaker via the numeric voice id.
The legacy underscore-named southern_english_female_medium duplicate is
intentionally skipped. No backend change is needed: sherpa-onnx auto-detects
single-speaker VITS vs Kokoro, and each tarball ships its own espeak-ng-data.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Composed realtime pipelines (VAD+STT+LLM+TTS) defaulted to unlimited history,
so a long-running session grew every turn and fed the whole conversation to the
LLM until its context window filled. Add an optional pipeline.max_history_items
to cap the trailing items per turn; an explicit value (including 0 = unlimited) wins
over the per-model-type default. Self-contained any-to-any models keep their
6-item default.
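
A minimal sketch of the cap, assuming the explicit setting is modelled as a
pointer so that "unset" and "0" can be told apart. effectiveLimit and
trimHistory are hypothetical names, not the pipeline's actual functions.

```go
package main

import "fmt"

// effectiveLimit returns the explicit value when one was configured
// (even 0), otherwise the per-model-type default.
func effectiveLimit(explicit *int, def int) int {
	if explicit != nil {
		return *explicit
	}
	return def
}

// trimHistory keeps only the trailing limit items; 0 means unlimited.
func trimHistory(items []string, limit int) []string {
	if limit <= 0 || len(items) <= limit {
		return items
	}
	return items[len(items)-limit:]
}

func main() {
	history := []string{"u1", "a1", "u2", "a2", "u3"}
	two := 2
	fmt.Println(trimHistory(history, effectiveLimit(&two, 0)))
	fmt.Println(len(trimHistory(history, effectiveLimit(nil, 0))))
}
```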
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(watchdog): start the live watchdog on a cold enable from Settings (#9125)
The React Settings "Enable Watchdog" master toggle only ever writes the
idle/busy flags; watchdog_enabled is vestigial in that UI. The live
start/stop decision in UpdateSettingsEndpoint keyed off the raw, stale
watchdog_enabled request field, so a cold enable (idle/busy=true,
watchdog_enabled=false) called StopWatchdog() and the watchdog stayed
stopped until the next restart - at which point startup re-derived it
from the idle flag. Net: enabling the watchdog appeared to do nothing.
Derive the run-state from idle||busy as the single source of truth,
mirroring the startup invariant:
- ApplyRuntimeSettings now sets WatchDog = idle||busy whenever either
field is present (so a full disable also brings it down), while an API
client posting only watchdog_enabled keeps its explicit value.
- Add ApplicationConfig.WatchdogShouldRun() mirroring startWatchdog's
gating (idle/busy, LRU eviction, memory reclaimer); the /api/settings
handler uses it to decide start vs stop.
- Belt-and-suspenders: the Settings.jsx master toggle also writes
watchdog_enabled = idle||busy.
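
The derivation can be sketched as below. The state and update types and their
field names are hypothetical; the sketch only mirrors the rule stated above
(idle||busy wins whenever either flag is posted, a lone watchdog_enabled keeps
its explicit value) and leaves out the LRU eviction and memory reclaimer
gating.

```go
package main

import "fmt"

// state is a hypothetical slice of the runtime config.
type state struct{ Idle, Busy, WatchDog bool }

// update models a settings request; nil means "field not posted".
type update struct{ Idle, Busy, WatchdogEnabled *bool }

// apply merges an update into the state and re-derives WatchDog from
// idle||busy whenever either of those two flags was part of the request.
func apply(s state, u update) state {
	if u.WatchdogEnabled != nil {
		s.WatchDog = *u.WatchdogEnabled
	}
	if u.Idle != nil {
		s.Idle = *u.Idle
	}
	if u.Busy != nil {
		s.Busy = *u.Busy
	}
	if u.Idle != nil || u.Busy != nil {
		s.WatchDog = s.Idle || s.Busy
	}
	return s
}

func main() {
	on, off := true, false
	// Cold enable from the UI: idle=true with a stale watchdog_enabled=false.
	s := apply(state{}, update{Idle: &on, Busy: &off, WatchdogEnabled: &off})
	fmt.Println("watchdog should run:", s.WatchDog)
}
```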
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
A model whose ModelFile is a single file (e.g. sherpa-onnx VITS/piper: the
.onnx) failed to load on remote worker nodes because the sibling assets the
backend resolves from the model dir — tokens.txt, lexicon.txt, the
espeak-ng-data / dict directories, Kokoro's voices.bin — were never staged.
Only the declared ModelFile was shipped, so the worker hit "failed to create
sherpa-onnx TTS engine" and TTS produced no audio.
Lean on the existing option-path staging instead of hardcoding filenames:
- stageGenericOptions now also resolves an option value relative to the model's
own directory (not just the frontend models dir), so a shared config can
declare companions with bare names regardless of whether Model includes a
subdirectory; and it expands directory-valued options (e.g. espeak-ng-data)
file-by-file rather than handing a directory fd to the stager.
- gallery/sherpa-onnx-tts.yaml declares the companion assets as option paths
(tokens, lexicon, espeak-ng-data, voices.bin, dict, per-lang lexicons). The
backend ignores these keys and keeps resolving siblings from the model dir;
they exist only so distributed staging ships them. Absent files are skipped.
Adds router_optionstage_test.go covering file + directory companion staging via
the model-dir fallback.
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add mock-backend VoiceEmbed/VoiceVerify (deterministic DC-offset speaker
discrimination) and a verify-mode gated realtime pipeline, then drive the
real HTTP/WS stack: an authorized speaker reaches response.done while an
unauthorized one is dropped before the LLM with a speaker_not_authorized
event.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): add pipeline voice_recognition gate config schema
Add the PipelineVoiceRecognition config block that gates a realtime
pipeline behind speaker verification (identify against the voice
registry, or verify against reference audios), with Normalize defaults
and Validate enum/shape checks. Register the new fields in the config
meta registry so the UI renders them with proper labels/components
(required by the registry-coverage gate).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* fix(realtime): range-check voice gate threshold and floor UI min
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): add cosineDistance helper for voice gate
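
For reference, cosine distance is 1 minus the cosine similarity of two
embeddings. The sketch below shows one way to compute it; the signature, the
float32 input type and the error handling are assumptions and may differ from
the helper added here.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineDistance returns 1 - (a.b)/(|a||b|). It reports an error for
// empty, mismatched or zero-norm inputs so a caller can fail closed.
func cosineDistance(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("embeddings must be non-empty and of equal length")
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("zero-norm embedding")
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb)), nil
}

func main() {
	d, err := cosineDistance([]float32{1, 0}, []float32{0, 1})
	if err != nil {
		panic(err)
	}
	fmt.Printf("distance between orthogonal vectors: %.1f\n", d)
}
```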
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): add voiceGate identify-mode authorization
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* test(realtime): cover voice gate fail-closed error paths
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): add voiceGate verify-mode authorization
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): add voiceGate decide policy helper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): add newVoiceGate constructor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* feat(realtime): gate pipeline responses behind voice recognition
Run speaker verification concurrently with transcription and join on a
hard barrier before generateResponse, so unauthorized utterances never
reach the LLM, tools, or TTS. Supports identify (registry) and verify
(reference) modes with multiple authorized speakers, per-utterance or
first-utterance checking, and drop-with-event or silent-drop on reject.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* fix(realtime): harden voice gate goroutine lifecycle
Only launch the verification goroutine on the transcription path and
drain it before the temp WAV is removed on the transcription-error
return, so an in-flight backend read never races the deferred cleanup.
Drop the write-only voiceMatched field; log the matched speaker instead.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* docs(realtime): document the voice_recognition pipeline gate
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* fix(realtime): fail closed on an incomplete voice_recognition block
A present voice_recognition block with no model previously disabled the
gate silently, authorizing every speaker. Treat block presence as the
intent signal and reject an empty model in Validate, so the session is
refused instead of running unprotected.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
* test(realtime): integration-test the voice gate through commitUtterance
Drive the real commitUtterance path (gate goroutine, hard join before the
LLM, reject event, when:first session trust) with the existing
transport/model doubles: authorized speakers reach a full response,
unauthorized ones are dropped before the LLM with a speaker_not_authorized
event, backend errors fail closed, drop_silent stays quiet, and when:first
trusts the session after one match.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Add the localai-org/localai-realtime-demo Go client to the README
Examples list and to the realtime docs (integrations + realtime feature
page). Refresh the Latest News section with June 2026 highlights pulled
from history since v4.3.0: realtime pipeline streaming, the parakeet.cpp
and CrispASR speech work, new backends (locate-anything.cpp, Ideogram4,
llama.cpp video input), and distributed-mode hardening.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Add a full Korean locale (core/http/react-ui/public/locales/ko/, 13 namespaces,
840 keys, full parity with en/) and register ko in SUPPORTED_LANGUAGES
(core/http/react-ui/src/i18n/index.js). All i18next {{interpolation}} and
_one/_other plural keys preserved; brand/model names kept untranslated.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: moduvoice <moduvoicr77@gmail.com>
* feat(omnivoice-cpp): add C wrapper + CMake/Makefile build over OmniVoice ov_* ABI
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): add option/language parsing + WAV framing helpers with tests
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): wire purego binding with TTS + streaming TTSStream
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* build(omnivoice-cpp): wire backend into root Makefile
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(omnivoice-cpp): add build matrix entries + dep-bump registration
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): register backend meta + image entries
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): expose as preference-only importable backend
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add omnivoice-cpp TTS models (Q8_0 default + BF16 HQ)
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(omnivoice-cpp): document the OmniVoice TTS backend
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(omnivoice-cpp): add env-gated e2e for TTS + streaming
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): honor tts.audio_path/tts.voice config as default cloning reference
The model config tts.audio_path (ModelOptions.AudioPath) and tts.voice now
provide a default voice-cloning reference used when a request omits Voice, so a
cloned voice can be pinned in the model YAML instead of passed per request. A
per-request voice still overrides. Paths resolve relative to the model dir.
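A minimal Go sketch of the fallback order described above, for illustration only. The function name resolveVoiceRef and its parameters are hypothetical. Checking tts.audio_path before tts.voice, passing absolute paths through unchanged, and resolving per-request voices against the model dir too are all assumptions, not confirmed by this change.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveVoiceRef picks the voice-cloning reference for a single request.
// Hypothetical helper: a per-request voice wins, then the model config
// defaults (audio_path before voice is an assumed order).
func resolveVoiceRef(requestVoice, configAudioPath, configVoice, modelDir string) string {
	ref := requestVoice
	if ref == "" {
		ref = configAudioPath
	}
	if ref == "" {
		ref = configVoice
	}
	// No reference at all, or already absolute: return unchanged (assumption).
	if ref == "" || filepath.IsAbs(ref) {
		return ref
	}
	// Relative references resolve against the model directory.
	return filepath.Join(modelDir, ref)
}

func main() {
	// The request omits Voice, so the config default is used.
	fmt.Println(resolveVoiceRef("", "speaker.wav", "", "/models"))
	// A per-request voice still overrides the pinned default.
	fmt.Println(resolveVoiceRef("other.wav", "speaker.wav", "", "/models"))
}
```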
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(omnivoice-cpp): add missing omnivoice-cpp-development backend meta
Mirrors the whisper/vibevoice convention: a -development meta aggregating the
master-tagged image variants (the production meta and per-variant prod+dev image
entries already existed; only the development meta aggregator was missing).
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Wire the Kokoro model family into the sherpa-onnx backend (which only
supported VITS/Piper before) and add gallery voices for Italian, English,
Spanish, French and German plus a multilingual Kokoro model.
- csrc/shim.{c,h}: kokoro_* config setters (model/voices/tokens/data_dir/
dict_dir/lexicon/lang/length_scale) mirroring the VITS path, with the
matching frees in tts_config_free.
- backend.go: loadTTS now detects a Kokoro model (a voices.bin beside the
ONNX) and routes to configureKokoroTTS, otherwise configureVitsTTS.
Kokoro picks up espeak-ng-data, the jieba dict and the per-language
lexicons (only one English variant, to avoid tens of thousands of
duplicate-word warnings at load); the language= option supplies the language hint.
- backend_test.go: functional test for isKokoroModel detection.
- gallery: 5 Piper VITS voices (it_IT-paola, en_US-amy, es_ES-davefx,
fr_FR-siwis, de_DE-thorsten) + kokoro-multi-lang-v1.0, served through
sherpa-onnx-tts.yaml with native streaming TTS.
Verified by building the backend and synthesizing with a real Piper and
Kokoro model (31/31 specs pass, including real-model synth smokes).
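The detection rule (a voices.bin beside the ONNX file) can be sketched in Go as follows. This is an illustration under assumptions, not the code in backend.go: kokoroVoicesPath and ttsFamily are hypothetical helpers, and the real isKokoroModel may check more than file existence.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// kokoroVoicesPath returns where voices.bin would sit for a given model file.
// Hypothetical helper introduced for this sketch.
func kokoroVoicesPath(modelPath string) string {
	return filepath.Join(filepath.Dir(modelPath), "voices.bin")
}

// isKokoroModel reports whether a voices.bin file exists next to the model.
func isKokoroModel(modelPath string) bool {
	info, err := os.Stat(kokoroVoicesPath(modelPath))
	return err == nil && !info.IsDir()
}

// ttsFamily mirrors the routing decision: Kokoro when detected, VITS otherwise.
func ttsFamily(modelPath string) string {
	if isKokoroModel(modelPath) {
		return "kokoro"
	}
	return "vits"
}

func main() {
	dir, err := os.MkdirTemp("", "kokoro-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	model := filepath.Join(dir, "model.onnx")
	fmt.Println(ttsFamily(model)) // prints "vits": no voices.bin yet

	if err := os.WriteFile(kokoroVoicesPath(model), nil, 0o644); err != nil {
		panic(err)
	}
	fmt.Println(ttsFamily(model)) // prints "kokoro"
}
```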
Assisted-by: Claude:claude-opus-4-8 gofmt golangci-lint go-test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
fix(xsysinfo): make reported system RAM total cgroup/lxcfs-aware (#8059)
GetSystemRAMInfo derived Total from memory.TotalMemory(), which on Linux
uses syscall.Sysinfo().Totalram - the HOST kernel total. lxcfs/LXD does
NOT virtualize that value, while MemAvailable (used for Free/Available)
IS virtualized. Inside an LXD container on a 128Gi host with a ~10Gi
container view, this produced Total=128Gi, Available=10Gi => Used=118Gi,
reporting ~92% RAM usage on an idle container.
Derive Total instead from the minimum of all non-zero, non-unlimited
candidates: cgroup v2 memory.max, cgroup v1 memory.limit_in_bytes (the
kernel unlimited sentinel is ignored), /proc/meminfo MemTotal (which
lxcfs virtualizes), and the syscall.Sysinfo total as the bare-metal
fallback. On bare metal every candidate is unlimited or equals the host
total, so behavior is unchanged.
The selection/parsing lives in a pure function chooseTotalMemory(...)
taking file CONTENTS, unit-tested without a real LXD host; OS file
reads stay in a thin wrapper.
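The selection rule can be sketched as a pure Go function over file contents. This is a hedged illustration, not the actual implementation: the signature, the helper names parseLimit and parseMemTotal, and the exact sentinel value treated as "unlimited" are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cgroupV1Unlimited is the value commonly seen in memory.limit_in_bytes when
// no limit is set. Assumption: anything at or above it means "unlimited".
const cgroupV1Unlimited uint64 = 9223372036854771712

// parseLimit parses the contents of a cgroup limit file. It returns 0 when
// there is no usable limit: empty, "max", unparseable, or the v1 sentinel.
func parseLimit(contents string) uint64 {
	s := strings.TrimSpace(contents)
	if s == "" || s == "max" {
		return 0
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil || v >= cgroupV1Unlimited {
		return 0
	}
	return v
}

// parseMemTotal extracts MemTotal (reported in kB) from /proc/meminfo
// contents and returns bytes, or 0 if the line is missing or malformed.
func parseMemTotal(meminfo string) uint64 {
	for _, line := range strings.Split(meminfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0
			}
			return kb * 1024
		}
	}
	return 0
}

// chooseTotalMemory returns the smallest non-zero candidate, or 0 if none.
// Hypothetical signature; the real function may take different arguments.
func chooseTotalMemory(cgroupV2Max, cgroupV1Limit, meminfo string, sysinfoTotal uint64) uint64 {
	candidates := []uint64{
		parseLimit(cgroupV2Max),
		parseLimit(cgroupV1Limit),
		parseMemTotal(meminfo),
		sysinfoTotal,
	}
	var best uint64
	for _, c := range candidates {
		if c != 0 && (best == 0 || c < best) {
			best = c
		}
	}
	return best
}

func main() {
	const gib = uint64(1) << 30
	// lxcfs view: MemTotal shows 10 GiB while sysinfo reports the 128 GiB host.
	total := chooseTotalMemory("max", "", "MemTotal:       10485760 kB\n", 128*gib)
	fmt.Println(total / gib) // prints 10
}
```

Because the function takes strings rather than paths, each case (LXD, cgroup-limited container, bare metal) can be exercised with literal inputs.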
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
fix(agents): make React agent chat timestamps format-agnostic
The agent SSE bridge emits the json_message timestamp in three different
encodings depending on deploy mode: an RFC3339 string (standalone agent
pool), Unix milliseconds (local dispatcher), and Unix nanoseconds (the
older NATS path). The React AgentChat handler passed data.timestamp
straight through, so the standalone string and any numeric value outside
the millisecond range rendered as "Invalid Timestamp" or a constant
epoch-ish time.
Add a small pure helper, normalizeTimestampMs, that accepts an RFC3339
string or a numeric epoch in s/ms/us/ns and returns JS milliseconds,
falling back to Date.now() on null/empty/unparseable input. Use it in
the json_message handler so the rendered time is correct regardless of
which backend path produced it.
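The unit detection can be illustrated with a magnitude heuristic. The actual helper is JavaScript in the React UI; the Go version below is only a sketch of the idea. The thresholds (1e11, 1e14, 1e17), the epochToMillis helper, and the explicit fallback parameter (standing in for Date.now()) are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// epochToMillis guesses the unit of a numeric epoch from its magnitude.
// Assumed thresholds: present-day values are ~1.7e9 s, ~1.7e12 ms,
// ~1.7e15 us and ~1.7e18 ns, so the cut-offs sit between those bands.
func epochToMillis(n float64) int64 {
	abs := math.Abs(n)
	switch {
	case abs < 1e11: // seconds
		return int64(n * 1000)
	case abs < 1e14: // milliseconds
		return int64(n)
	case abs < 1e17: // microseconds
		return int64(n / 1e3)
	default: // nanoseconds
		return int64(n / 1e6)
	}
}

// normalizeTimestampMs accepts an RFC3339 string or a numeric epoch and
// returns milliseconds, using fallbackMs for nil or unparseable input.
func normalizeTimestampMs(v any, fallbackMs int64) int64 {
	switch t := v.(type) {
	case string:
		parsed, err := time.Parse(time.RFC3339, t)
		if err != nil {
			return fallbackMs
		}
		return parsed.UnixMilli()
	case float64: // JSON numbers decode to float64
		if math.IsNaN(t) || math.IsInf(t, 0) {
			return fallbackMs
		}
		return epochToMillis(t)
	case int64:
		return epochToMillis(float64(t))
	default:
		return fallbackMs
	}
}

func main() {
	now := time.Now().UnixMilli()
	inputs := []any{"2026-06-01T12:00:00Z", float64(1700000000), int64(1700000000000000000), nil}
	for _, in := range inputs {
		fmt.Println(normalizeTimestampMs(in, now))
	}
}
```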
Fixes #9867
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Two small visual fixes in the React UI:
- Talk page pipeline summary: the four-column grid used
`repeat(4, 1fr)`, which resolves to `minmax(auto, 1fr)`, so each track
refuses to shrink below the min-content width of its `nowrap` model
name. Long names (e.g. a verbose GGUF LLM id) blew the grid out past
the container despite the per-cell ellipsis styling. Switching to
`minmax(0, 1fr)` lets the tracks shrink and the ellipsis take effect.
- Sidebar user avatar: the desktop collapsed look centers the avatar via
`.sidebar.collapsed .sidebar-user{-link}` rules, but the tablet
icon-rail (640-1023px) collapses visually through `.sidebar:not(.open)`
without necessarily carrying the `.collapsed` class, so the avatar kept
its left-aligned negative margins and looked misaligned. Mirror the
centering rules under `.sidebar:not(.open)`.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The gallery has metal-ds4 / metal-ds4-development entries, and the build
recipe exists (make backends/ds4-darwin, special-cased in
backend_build_darwin.yml), but ds4 was never listed in the darwin matrix,
so no metal-darwin-arm64-ds4 image was ever published and the entries
dangled.
- Add ds4 to the darwin matrix (includeDarwin), mirroring the llama-cpp
form (the reusable workflow builds it via 'make backends/ds4-darwin').
- Fix inferBackendPathDarwin in scripts/changed-backends.js to map ds4 to
backend/cpp/ds4/ (like llama-cpp): ds4 is C++ but the matrix entry carries
lang=go, so without this its darwin build would only ever run on a release
(FORCE_ALL), never incrementally when backend/cpp/ds4 changes.
sherpa-onnx and speaker-recognition are already in the darwin matrix on
master and are not changed here.
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>