Commit Graph

6815 Commits

Author SHA1 Message Date
Ettore Di Giacinto
46b76cb4ac test(http): cover parseForwarded edge cases; clarify base-url flag group
Adds direct unit coverage for quoted/malformed/multi-element Forwarded
headers and regroups the external base URL flag away from auth-only.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:14:25 +00:00
Ettore Di Giacinto
15c7ce059a docs: document LOCALAI_BASE_URL and reverse-proxy headers
Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:08:58 +00:00
Ettore Di Giacinto
975b54dfc5 feat(config): generalize LOCALAI_BASE_URL to ExternalBaseURL
LOCALAI_BASE_URL now sets a single instance-wide external base URL used
for OAuth callbacks and all self-referential links. A Pre middleware
stamps it into the request context for middleware.BaseURL.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:03:56 +00:00
Ettore Di Giacinto
2eec8bfeb9 feat(http): honor explicit external base URL in BaseURL
When _external_base_url is set in the request context it dictates the
origin (scheme+host+port); the proxy path prefix is still appended.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:58:21 +00:00
Ettore Di Giacinto
d9feac54dc fix(http): harden BaseURL proxy scheme/host detection
Split comma-separated X-Forwarded-Proto and honor the RFC 7239 Forwarded
header so generated links use https behind common reverse-proxy setups.

Refs #10482

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:57:43 +00:00
LocalAI [bot]
5c3d48ab50 feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496)
* feat(ui): remember last-used model per capability

ModelSelector auto-selected the first option whenever the bound value was
empty or stale, so every visit to the Home chat box, Image, TTS or Talk
pages reset the choice to whatever sorted first. Persist the user's pick
in localStorage keyed by capability and prefer it on auto-select when the
model is still available, falling back to the first option otherwise.

Because every modality picker funnels through ModelSelector, this fixes
the friction everywhere at once. External-options callers pass no
capability and keep the previous first-item behaviour.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add visibility-aware polling hook

The app had 26 hand-rolled setInterval polls, none of which paused when
the browser tab was hidden, so backgrounded dashboards kept hitting the
server every few seconds for data nobody was looking at.

Add usePolling: runs immediately, polls on a fixed interval, pauses while
document.hidden, fires a catch-up poll on return, and guards against
overlapping slow requests. Route useResources (the highest-frequency
shared poll) through it. Further callers can be migrated incrementally.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): hardware-aware starter models on empty home

A fresh install dropped admins straight into a 1000+ model gallery with
no guidance. Add a StarterModels widget to the empty-state wizard that
recommends a small, curated set tuned to the detected hardware:

- CPU-only machines (no GPU VRAM) are steered to genuinely small models
  (1-4B, Q4) that stay responsive without a GPU.
- GPU machines get suggestions scaled to available VRAM.

Curated names are real gallery entries, intersected against the live
gallery at render time so a trimmed/custom gallery degrades gracefully.
Install is one click via the existing model-install API.

Also routes Home's cluster and system-info polls through usePolling so a
backgrounded home page stops fetching.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): optional token-cost estimates on usage dashboard

The usage dashboard tracked tokens but had no monetary view. Multi-user
deployments that bill back or budget compute had to export and compute
cost elsewhere.

Add an opt-in pricing control: admins set $ per 1M prompt/completion
tokens (stored per-browser). When set, an estimated-cost summary card and
per-model / per-user cost columns appear, computed from recorded token
counts. The entire cost surface stays hidden until a price is entered, so
the default view is unchanged. Cost is clearly labelled an estimate -
LocalAI itself has no notion of price.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): label icon-only send buttons for screen readers

The chat and agent-chat send buttons were a bare paper-plane icon with
no accessible name, so screen readers announced only "button". Add an
aria-label/title ("Send message") and mark the icon aria-hidden. An audit
of all icon-only buttons found these were the only two unlabeled controls;
the rest already carry visible text.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:30:08 +02:00
LocalAI [bot]
764b0352b9 docs: ⬆️ update docs version mudler/LocalAI (#10491)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:24 +02:00
LocalAI [bot]
75ba2daba1 chore(model-gallery): ⬆️ update checksum (#10495)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 23:18:04 +02:00
LocalAI [bot]
62b14fd635 feat(backends): add darwin/metal build for liquid-audio (#10486)
* feat(backends): add darwin/metal build for liquid-audio

Wire the already-MPS-ready liquid-audio backend (it ships
requirements-mps.txt) into the darwin CI matrix and the gallery so
metal-darwin-arm64 images are built and selectable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* ci(liquid-audio): trigger darwin build via requirements-mps note

The changed-backends path filter only builds a backend when a file under
its directory changes. The metal wiring lived in index.yaml + the matrix,
so the darwin job was skipped. Add a documenting comment to the MPS
requirements so CI actually exercises the darwin build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path

Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which
rejects the uv-only --index-strategy flag and failed the darwin backend build.
Add it only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:16:27 +02:00
LocalAI [bot]
193d0e6aef fix(backends): darwin/metal support for supertonic (#10488)
The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and
packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed
ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec
path, while package.sh hard-failed on any non-Linux host. On macOS dyld has
no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib.

This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481
landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but
which omitted supertonic:

- run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH
  at libonnxruntime.dylib; guard the ld.so exec path to Linux only.
- package.sh: recognize Darwin instead of erroring out; the bundled .dylib is
  resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle.
- helper.go: platform-native default library extension (dylib on darwin) for
  the last-resort dlopen fallback.

It also wires the darwin CI build and gallery entries, resolving the
inconsistency where backend/index.yaml advertised metal for supertonic but no
includeDarwin matrix entry built the image:

- .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry.
- backend/index.yaml: declare metal capabilities and add the concrete
  metal-supertonic / metal-supertonic-development child entries.

The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX
Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:19:03 +02:00
LocalAI [bot]
482314c623 fix(realtime): resolve model aliases for pipeline sub-models (#10484)
Realtime pipeline sub-models (llm/transcription/tts/vad/sound-detection)
were loaded via cl.LoadModelConfigFileByName without alias resolution,
unlike top-level API requests which resolve aliases in
core/http/middleware/request.go. So a pipeline that references an alias
(e.g. `pipeline.llm: default`, where `default` is an alias for a real
LLM) reached model loading as the alias stub with an empty Backend.

This was silently broken on a single host (it failed downstream) and a
hard error in distributed/p2p mode:

    routing model : loading model default: ... installing backend on
    node X: backend name is empty

Fix by routing every pipeline sub-model load through a small helper that
follows a single alias hop (mirroring the top-level resolution), so
non-alias sub-models behave identically and aliased ones get the
target's full config (Backend, Model, ...).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:50:44 +02:00
Dedy F. Setyawan
e8ae88a2a0 i18n(id): update and complete Indonesian translations (#10480)
- translate remaining English strings in chat, common, home, and media locales.
- fix typo and improve wording consistency (e.g., klaster -> kluster, otomasi -> automasi).

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-06-24 18:35:21 +02:00
Richard Palethorpe
e1994579f8 fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS (#10474)
pii_default_detectors was applied to the live config only by a live
POST /api/settings (ApplyRuntimeSettings) — neither the startup loader nor
the config file watcher read it back. So after a restart the persisted
default detectors were dropped, and the cloud-proxy MITM listener (which
resolves each intercept host's detectors once at start via ResolvePIIPolicy)
came up with an empty set and forwarded intercepted traffic unredacted, even
though the MITM model had pii.enabled:true and the defaults were on disk.
Request-side default redaction broke the same way.

- startup.go: loadRuntimeSettingsFromFile now applies pii_default_detectors,
  before startMITMIfConfigured, with env > file precedence.
- config_file_watcher.go: apply pii_default_detectors on live file edits,
  matching the existing env-guard pattern used for the other fields.
- settings endpoint: rebuild the MITM listener when pii_default_detectors
  changes (its per-host detector map is frozen at listener start), not only
  on a mitm_listen change — so toggling a default detector takes effect on
  cloud-proxy traffic immediately.
- new LOCALAI_PII_DEFAULT_DETECTORS env var / CLI flag (WithPIIDefaultDetectors)
  so the default detector set can be pinned at boot for immutable deployments.

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-24 11:08:57 +02:00
LocalAI [bot]
e5620989dd refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476)
PR #10475 fixed SoundDetection in-flight tracking, but the underlying trap
remains: InFlightTrackingClient embedded the whole grpc.Backend interface
"for passthrough of untracked methods", so any newly added inference method
is silently satisfied by the embedded passthrough and never wrapped with
track(). That leaves onFirstComplete unfired and in-flight stuck at 1 - the
exact SoundDetection bug, waiting to recur for the next backend method.

Close the gap at the type level instead of relying on reviewers to remember:

- Split grpc.Backend into two composed sub-interfaces: InferenceBackend
  (methods that are one discrete inference call and must be tracked) and
  ControlBackend (control-plane calls plus the streaming constructors whose
  work spans the returned stream, safe to pass through). The classification
  now lives next to the interface it documents.
- InFlightTrackingClient embeds only grpc.ControlBackend and implements every
  InferenceBackend method explicitly, delegating to an inner InferenceBackend.
  A `var _ grpc.Backend = (*InFlightTrackingClient)(nil)` assertion makes the
  package fail to compile if any inference method is left unwrapped.

Now adding a method to InferenceBackend is a build error (at the assertion and
every call site: "does not implement grpc.Backend (missing method X)"), not a
silent runtime leak - and the obvious fix is to copy a neighbouring wrapper,
which calls track(). No runtime guard or reviewer vigilance required.

Pure refactor: the composed Backend interface is identical to the old flat
one, so all implementers and consumers are unaffected (verified with a full
`go build ./...`). Behaviour is unchanged; the existing nodes suite passes.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 11:08:29 +02:00
LocalAI [bot]
fc618dcee6 fix(distributed): track in-flight for SoundDetection requests (#10475)
The distributed router wraps backend clients in InFlightTrackingClient so
the eviction logic knows which replicas are actively serving. Every
inference method must be wrapped: track() increments in-flight on entry
and decrements (plus fires onFirstComplete, which releases the load-time
reservation) on return.

SoundDetection was added after the tracking client and never got a
wrapper, so its calls fell through to the embedded passthrough Backend.
The increment/decrement never ran and, critically, onFirstComplete never
fired, so the reservation set at model load was never released - leaving
in-flight stuck at 1 and the replica permanently ineligible for eviction.

Wrap SoundDetection like the other non-LLM methods and cover it in the
"non-LLM inference methods track in-flight" table test.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 10:13:37 +02:00
LocalAI [bot]
e6042080c0 fix(agents): URL-decode collection/agent name path params (#10443) (#10471)
fix(agents): URL-decode collection/agent name path params

Collection and agent names carry a "legacy-api-key:" prefix, so the ':'
arrives percent-encoded as %3A in the request path. Echo routes such
paths via URL.RawPath and stores the matched path-param value still
escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch"
and the store lookup 404'd ("collection not found").

This was second-order fallout of #10375/#10387: once colons became valid
in names, the URL-decode gap surfaced on every name-bearing endpoint.

Add a decodedParam helper that url.PathUnescape's the param (falling back
to the raw value on invalid encoding) and wire it into all collection
endpoints and the agent :name endpoints, which share the identical
prefix. The entry endpoints already unescaped c.Param("*"); this closes
the same gap for :name.

Fixes #10443


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 09:42:09 +02:00
LocalAI [bot]
0f3b24436d chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a (#10468)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:43 +02:00
LocalAI [bot]
4b6f911835 chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea (#10466)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:14 +02:00
LocalAI [bot]
a5e28942a6 chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:54 +02:00
LocalAI [bot]
dba9cd7ca4 chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc (#10465)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:34 +02:00
LocalAI [bot]
c93190de50 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e (#10456)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:13 +02:00
LocalAI [bot]
4dbf69f889 chore(model gallery): 🤖 add 1 new models via gallery agent (#10472)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 00:00:26 +02:00
LocalAI [bot]
deb430f3ec chore(model-gallery): ⬆️ update checksum (#10469)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
v4.5.0
2026-06-23 23:15:47 +02:00
LocalAI [bot]
dd8c8778e2 chore(model gallery): 🤖 add 1 new models via gallery agent (#10464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 15:43:21 +02:00
LocalAI [bot]
06a7b6cadb chore: ⬆️ Update leejet/stable-diffusion.cpp to f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f (#10457)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:29:07 +02:00
LocalAI [bot]
67c8889866 chore: ⬆️ Update CrispStrobe/CrispASR to 63b57289255267edf66e43e33bc3911e04a2e92d (#10455)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:49 +02:00
LocalAI [bot]
1d49041c85 chore: ⬆️ Update ggml-org/llama.cpp to 73618f27a801c0b8614ceaf3547d3c2a99baae14 (#10458)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:09 +02:00
LocalAI [bot]
2edc4e25b3 chore: ⬆️ Update ggml-org/whisper.cpp to bae6bc02b1940bbfb87b6a0299c565e563b916d1 (#10459)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:27:51 +02:00
Richard Palethorpe
7888067914 fix(settings): merge partial /api/settings updates instead of overwriting (#10463)
POST /api/settings rebuilt runtime_settings.json from only the request
body, so a focused admin page that submits a single field wiped every
other persisted setting. The Middleware proxy tab (mitm_listen) and
detector table (pii_default_detectors), plus the MCP SetBranding tool
(instance_name/instance_tagline), all POST partial bodies; the
no-omitempty api_keys and pii_default_detectors fields even round-tripped
as JSON null.

Read the persisted settings and overlay only the fields the request set
(RuntimeSettings.MergeNonNil) before writing. Every field is a pointer, so
the reflection-based merge is total over the struct and any field added
later is preserved automatically. Absent or null fields are now kept;
clearing a setting is done by sending its explicit empty/zero value
(api_keys [], mitm_listen "", etc.), unchanged from before. The full
Settings page sends every field, so its Save behaves identically.

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-23 13:27:34 +02:00
LocalAI [bot]
9eedbf537a chore(model gallery): 🤖 add 1 new models via gallery agent (#10461)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 08:04:46 +02:00
LocalAI [bot]
69c16481c8 fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460)
PR #10454 added a `cancellable bool` parameter to GalleryStore.UpdateProgress
but missed two callers under tests/e2e/distributed, breaking the build on
master (golangci-lint and tests-e2e-backend both failed to compile with
"not enough arguments in call to ... UpdateProgress").

Pass cancellable=true (both ops are downloading installs, which are
cancellable) and assert the flag is persisted, exercising the new behavior.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 23:45:22 +02:00
LocalAI [bot]
56f8a6623f fix(galleryop): persist cancellable so restarted in-flight ops stay cancellable (#10454)
In distributed mode a model/backend install marks OpStatus.Cancellable=true
while downloading, but the gallery_operations row never recorded it:
UpdateStatus persisted only progress/status and Create left the cancellable
column at its zero value. After a replica restart Hydrate rebuilt the op with
cancellable=false, /api/operations reported false, and the UI hid the cancel
button - the orphaned op then lingered until the 30-minute stale reaper
expired it ("stays there on restart, can't cancel, after a bit it expires").

Persist the flag on every progress tick and at row creation (installs are
cancellable, deletes are not), and clear it on terminal transitions. A
rehydrated in-flight op is now cancellable, so an admin can dismiss the
orphaned op immediately instead of waiting out the reaper. The functional
cancel path already survived restart (CancelOperation persists store.Cancel
even with no live CancelFunc); this restores the UI affordance that drives it.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 22:41:16 +02:00
Ettore Di Giacinto
4755d676a3 Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar)" (#10453)
Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top…"

This reverts commit 9d54a599b0.
2026-06-22 21:59:05 +02:00
dependabot[bot]
10184b5e28 chore(deps): bump actions/checkout from 6 to 7 (#10451)
Bumps [actions/checkout](https://github.com/actions/checkout) from 6 to 7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v6...v7)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-22 21:38:37 +02:00
LocalAI [bot]
fdf475ec5f feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446)
* feat(realtime): add pipeline.compaction config + resolution

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(realtime): extract itemID helper, reuse in item.retrieve

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(realtime): drop duplicate Ginkgo bootstrap, fold specs into openai suite

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement conversation.item.delete

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement input_audio_buffer.clear

Add a handler for the input_audio_buffer.clear client event that discards
a partially-captured utterance (raw PCM + buffered Opus frames) via a
unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): implement conversation.item.truncate (text)

Clears both .Text and .Transcript of the assistant content part at
contentIndex so barge-in truncation also works for audio turns whose
spoken words live in .Transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): add Conversation.Memory + pair-safe compactionCut

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, avoids panic)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(realtime): gofmt compaction test helper closures

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): inject rolling memory into the prompt + summary builders

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): server-side summarize-then-drop compactor

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(realtime): unit-test prefixMatches eviction-safety predicate

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(realtime): resolve summarizer model + schedule compaction per turn

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(realtime): document conversation compaction + new item events

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(realtime): resolve summary model inside compaction goroutine (lazy, off-path)

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(realtime): reuse reasoning.ExtractReasoningComplete for summary stripping

Replace the bespoke <think> regex in the compactor with the shared
pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of
the realtime path and covering all reasoning tag families, not just <think>.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(config): register pipeline.compaction fields in meta registry

TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have
a UI/meta registry entry; add the four pipeline.compaction.* leaves so they
render with proper labels/descriptions instead of the reflection fallback.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 21:28:49 +02:00
LocalAI [bot]
9d54a599b0 feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar) (#10449)
* feat(ui): add shared DeploymentContext (features + p2p signal)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(ui): extract launchAssistantChat shared helper

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): role/mode-aware landing redirect at /app

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): pin Cluster group and collapse Create for cluster admins

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): desktop top navbar with mode pill and admin-via-chat jump

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): admin token-usage meter in the top navbar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): top-navbar breakpoint handoff + assistant jump from chat page

M1: the desktop .top-navbar was hidden at max-width 768px while the
.mobile-header only appears at max-width 639px, leaving 640-768px with
neither bar so admins lost the mode pill, token meter and admin-via-chat
jump. Hide the top bar at 639px instead so it covers every width the rail
sidebar is shown and hands off to the mobile-header exactly at 639px.

M2: the navbar 'Admin via chat' button wrote localStorage and called
navigate('/app/chat'), but when already on the chat page Chat does not
remount so its mount-time payload reader never fired and the click was a
no-op until reload. The payload consume logic is factored into a shared
callback; the launcher now dispatches a localai-open-assistant event that
the mounted Chat listens for to re-consume the payload. Mount behavior is
unchanged.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 21:27:43 +02:00
Richard Palethorpe
63bcbf6c12 fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401)
* fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier

Follow-up to the NER tier engine (#10360), already on master. This carries
only the incremental review fixes and tests that postdate that merge — the
feature itself is not re-introduced.

Review fixes:
- openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText
  (the `elem < 0` guard above already returns).
- application.go: collapse ResolvePIIPolicy's inline re-implementation of
  PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the
  "explicit pii.enabled wins, else cloud-proxy default" rule) and return true
  past the !enabled guard where it is provable.
- pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in
  patternDetector.Detect into one local.
- grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat
  bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire —
  dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000]
  is rejected here with an actionable error; >1000 still fails closed via
  Parse. Specs pin the relationship so the guard can't silently revert.
- PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and
  force the DOM value back when clamping (min={0} only constrained the spinner,
  so a negative reached saved config and silently disabled the length filter).

Tests:
- piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead).
- model-config.spec.js: assert the min_len clamp, and that entity_actions
  collapses a duplicate group to a single row (map semantics; regression guard
  against emitting an array that drops a row on save).
- tests/e2e-backends: token_classify capability driving the TokenClassify gRPC
  RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned
  spans (entity.Text == text[start:end]) at threshold 0. Verified on CPU via
  `make test-extra-backend-privacy-filter` (3/3 specs).
- Makefile: test-extra-backend-privacy-filter wrapper.
- tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact
  (mask + block) through the full HTTP -> detector -> redactor path; gated on
  PII_NER_MODEL_GGUF so the default suite is unaffected.
- .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job
  running the container harness on CPU.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(gallery): add privacy-filter-nemotron (f16 + q8)

GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English
PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from
openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing
privacy-filter-multilingual entry, trading language breadth for category depth.

- privacy-filter-nemotron: F16 reference artifact (~2.8 GB).
- privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge
  use; description notes the size/speed tradeoff and to validate on your own
  data (a single dropped span is a PII leak).

Both run on the privacy-filter backend with known_usecases [token_classify] and
a default mask policy (min_score 0.5); operators add per-category entity_actions
as needed. sha256s taken from the HF repo's LFS object ids.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-22 18:26:19 +02:00
LocalAI [bot]
95b058e1c5 feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447)
* chore: gitignore SDD scratch directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(nodes): add GET /api/nodes/models cluster-wide loaded-models endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add nodesApi.allModels() for cluster-wide model roster

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): move Scheduling to its own page and nav item

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): replace nodes stat-card strip with cluster pulse + attention callout

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): node-panel roster with inline model chips and segmented filter

Replace the Nodes table with a full-width node-panel roster that shows
each backend node's running-model chips without an expand click, plus an
All/Backend/Agent segmented filter. Per-node detail (models, backends,
labels, capacity) moves to the node detail page.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add deep-linkable node detail page at /app/nodes/:id

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): remove em-dash from CapacityEditor comment; align detail spec backend mock

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(ui): nodes page cleanup, hover/chip polish, docs for restructured cluster view

Nodes.jsx dead-code sweep confirmed clean (no StatCard/table/expand
state/scheduling-form leftovers). Two App.css polish fixes: move the
node-panel hover border-color onto the bordered element so hover gives
real feedback, and add the missing .model-chip__state rule the
ModelChip component already emits. Update distributed-mode docs prose to
describe the restructured cluster view (cluster pulse, attention
callout, node-panel roster with inline model chips, All/Backend/Agent
filter, node detail page at /app/nodes/:id, Scheduling as its own page).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(ui): drop unused gpuVendorLabel export from nodeStatus

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 18:24:29 +02:00
LocalAI [bot]
f2abcc7503 chore(model gallery): 🤖 add 1 new models via gallery agent (#10445)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 16:09:16 +02:00
Adira
62c99c10b3 fix(diffusers): pin diffusers and transformers to a known-good pair (#9979) (#10442)
fix(diffusers): pin diffusers and transformers to a known-good pair

The diffusers backend tracked git+https://github.com/huggingface/diffusers
(main) with an unpinned transformers. transformers v5 restructured
CLIPTextModel and removed the .text_model attribute that diffusers' single
-file loader reads, so loading any single-file Stable Diffusion checkpoint
fails:

    create_diffusers_clip_model_from_ldm (single_file_utils.py)
    position_embedding_dim = model.text_model.embeddings.position_embedding...
    AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

No released diffusers (<=0.38.0) supports transformers v5 - only unreleased
diffusers main does. Because the requirements tracked main plus an unpinned
transformers, every backend image froze whichever pair existed at build
time, and images built once transformers v5 shipped but before diffusers
main caught up are permanently broken.

Pin the last known-good released pair across all requirements files:
diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every
pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no
functionality is lost, and builds become reproducible instead of drifting
into the broken window.

Fixes #9979

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-06-22 12:38:06 +02:00
LocalAI [bot]
7226bb9f30 chore: ⬆️ Update CrispStrobe/CrispASR to 7a8cb80907341c0204bd0488c1244764f4163883 (#10315)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 12:21:58 +02:00
LocalAI [bot]
569d9bbd9e fix(distributed): broadcast file-staging progress across replicas (#10440)
File-staging progress lived only in the SmartRouter's in-memory
StagingTracker on the replica performing the transfer. In a multi-replica
deployment behind a round-robin load balancer, a /api/operations poll
that lands on any other replica saw no staging row, so the progress
("processing file ... Total ... Current ...") flickered in and out as
polls rotated between frontends.

Mirror the pattern already used for gallery-install progress: the origin
replica broadcasts staging ticks over NATS (SubjectStagingProgress, a
new staging.<model>.progress subject), and peers merge them via
ApplyRemote (SubscribeBroadcasts on the wildcard). Byte-level ticks are
leading-edge debounced (~1/s); Start/FileComplete/Complete always
publish. A locally-owned op stays authoritative so the origin's own echo
and stray peer events can't clobber it, and mirrored remote ops expire
after a TTL so a missed Done event can't leave a phantom row. The UI read
path (StagingTracker.GetAll) is unchanged.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 09:28:07 +02:00
LocalAI [bot]
682fb2718c fix(distributed): detach cold-load staging from the request context (#10438)
A model not yet loaded on a worker is staged lazily on the inference
request path. Staging a multi-GB model takes minutes - far longer than
any client keeps its HTTP request open - so a browser refresh, an
ingress/LB idle-timeout, or a round-robined retry landing on another
frontend replica cancels the request context and aborts the upload with
"context canceled" mid-transfer. Large models then never finish staging,
so they never load (observed in a 2-replica deployment: both frontends
repeatedly failed to stage a 15.7 GB GGUF, each attempt dying at a
different offset).

Bind the cold load (staging + LoadModel + the per-model advisory lock) to
context.WithoutCancel(ctx): it keeps the request's values (prefix chain)
but drops cancellation/deadline. Each long step keeps its own bound (the
file stager's resume budget, LoadModel's 5m timeout), and the advisory
lock still de-dupes concurrent loaders across replicas.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 09:06:20 +02:00
LocalAI [bot]
20c643e1f6 chore(model gallery): 🤖 add 1 new models via gallery agent (#10439)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 08:46:34 +02:00
VJSai
64a4351f3a feat: send a LocalAI User-Agent on registry pulls (#10434)
LocalAI pulls models from OCI registries (via go-containerregistry), the
Ollama registry, and OCI blob stores (via oras), but every request went
out with the underlying library's generic User-Agent, so registry
operators had no way to attribute traffic to LocalAI.

Add an oci.UserAgent() helper that returns "LocalAI" (or
"LocalAI/<version>" when the binary is built with a version stamp via
internal.Version) and wire it into all three pull paths:

- pkg/oci/image.go: remote.WithUserAgent on the go-containerregistry
  image and digest requests
- pkg/oci/ollama.go: a User-Agent header on the Ollama manifest request
- pkg/oci/blob.go: a LocalAI User-Agent on the oras blob client. This
  mirrors oras' auth.DefaultClient (same retry.DefaultClient policy);
  only the advertised User-Agent changes.

Implements #6258.


Assisted-by: Claude:claude-opus-4-8 golangci-lint

Signed-off-by: Vijay Sai <vijaysaijnv@gmail.com>
2026-06-22 08:44:12 +02:00
LocalAI [bot]
b7d67f5779 chore: ⬆️ Update ggml-org/llama.cpp to 7c082bc417bbe53210a83df4ba5b49e18ce6193c (#10417)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 08:43:40 +02:00
LocalAI [bot]
600dafd20b feat(ced): sound-event classification backend (CED audio tagger) (#10425)
* feat(ced): sketch sound-classification backend (CED audio tagger)

Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry,
footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend.

SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist
in DESIGN.md):
- backend/backend.proto: new SoundDetection rpc + SoundClass messages
  (run `make protogen-go` to regenerate pkg/grpc/proto).
- backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h),
  goced.go (Ced gRPC backend: Load + SoundDetection), Makefile
  (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh,
  package.sh, .gitignore.
- DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability
  registration checklist), gallery/index + CI registration, and a scoping
  note for the realtime/websocket live-recognition path (sliding-window
  classify over the existing ws transport + voicegate; the ced C-API
  per-PCM entry point is already window-friendly).

Backend code does not compile until protogen-go regenerates the pb types
and a libced.so is built (Makefile clones+builds it).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): REST /v1/audio/classification endpoint + capability registration

Wires the ced sound-event classification backend (AudioSet audio tagger)
end to end through the REST surface, mirroring the transcription path.

- Handler: core/http/endpoints/openai/sound_classification.go parses the
  multipart audio upload, temp-files it, resolves the model config and
  calls the SoundDetection RPC; returns {model, detections[]} JSON.
- Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection)
  loads the model and normalizes the proto response into schema types.
- Schema: core/schema/sound_classification.go (SoundClassificationResult).
- gRPC layer: SoundDetection wired through the LocalAI wrapper (interface,
  Backend client, Client, embed, server, base default) so the loader-typed
  client exposes the RPC; proto regenerated via make protogen-go.
- Route: POST /v1/audio/classification (+ /audio/classification alias) with
  the audio/multipart default-model middleware in routes/openai.go.
- Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_
  CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap +
  GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase
  option; /api/instructions audio area updated; auth RouteFeatureRegistry +
  FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI
  usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter
  + i18n; docs page features/audio-classification.md + whats-new + crosslink.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): realtime sound-event detection over the websocket API

When a realtime pipeline configures a sound-classification model, each
VAD-committed utterance (the same window the transcription path produces)
is also run through the CED sound-event classifier and the scored AudioSet
tags are emitted as a new server event. No new backend rpc is needed: the
SoundDetection gRPC method already exists on this branch.

- config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty)
  beside Transcription/VAD.
- realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the
  ModelInterface; implement it on wrappedModel and transcriptOnlyModel by
  calling backend.ModelSoundDetection with the session's sound-classification
  model config (mirrors how Transcribe dispatches). Load the optional config
  in newModel / newTranscriptionOnlyModel; nil config keeps it additive.
- types: add ConversationItemSoundDetectionEvent (item_id, content_index,
  detections[]{label,score,index}) with type conversation.item.sound_detection,
  its ServerEventType constant and MarshalJSON, mirroring the transcription
  completed event.
- realtime: add emitSoundDetection (unary path: classify the committed window,
  build the event, t.SendEvent) and wire it at the utterance-commit hook right
  after emitTranscription; gated on session.SoundDetectionEnabled (resolved
  from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0).
  Its error is logged via xlog but never aborts the turn.
- test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections,
  classifier error) plus a SoundDetection method on the fakeModel double.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): implement SoundDetection in nodes backend test doubles

The SoundDetection method added to the grpc backend interface left two
test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so
core/services/nodes failed to compile under `go vet`/`go test` (go build
missed it: the doubles live in _test.go). Add the method to both,
mirroring their existing Detect mock. Repairs CI for the nodes package.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): decouple realtime sound detection from VAD (sound-only sessions)

Sound-event detection must activate on sounds, not speech, so it no longer
runs through the voice VAD/transcription path. A sound-detection-only
pipeline (sound_detection set, no transcription/LLM) now:

- is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline
  stage),
- builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS
  loaded), and
- defaults the session to turn_detection none (no VAD) with no transcription
  stage, so the client drives windowing via input_audio_buffer.commit
  (option A: client-side sliding window). The per-PCM C-API already supports
  arbitrary windows.

commitUtterance gains a sound-only branch: it emits the
conversation.item.sound_detection event (scored AudioSet tags) and stops -
no transcription, no LLM response. generateResponse is now guarded on a
transcription stage being present, so a sound-only turn never invokes the LLM.

Existing transcription/VAD sessions are unchanged (additive). Added a
commitUtterance sound-only Ginkgo spec asserting it emits the sound event and
neither transcribes nor generates a response. go vet + golangci-lint
(new-from-merge-base) clean; openai suite green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): register sound-classification backend in gallery + CI

Mechanical backend-image registration for the ced sound-event classifier,
mirroring the parakeet-cpp Go/purego backend everywhere it is wired up.

- .github/backend-matrix.yml: add the ced build matrix, field-for-field copies
  of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64,
  l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan
  amd64/arm64, rocm hipblas, and the metal darwin entry), changing only
  backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang.
- backend/index.yaml: add the &ced meta anchor (capabilities map per platform)
  plus ced-development and the per-arch image entries, each uri/mirror
  tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is
  intentionally deferred pending the HuggingFace publish (TODO note inline).
- scripts/changed-backends.js: add an explicit item.backend === "ced" branch in
  inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as
  the parakeet-cpp branch (before the generic golang fallthrough).
- .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in
  backend/go/ced/Makefile so the daily bot bumps the pin.
- swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so
  the existing /v1/audio/classification annotations land in the generated spec.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): server-side windowing for realtime sound detection (option B)

Adds an optional server-driven sliding-window classifier so a sound-only
realtime client only has to stream audio (no input_audio_buffer.commit):

- Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs.
  When both > 0 on a sound-only session, the server classifies the last
  window of streamed audio every hop and emits a conversation.item.sound_
  detection event; the input buffer is trimmed to one window so a long
  stream stays bounded. When unset, the session stays client-driven
  (option A). Runs independent of VAD (sound events are not speech).
- handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so
  it is unit-testable) + writeWindowWAV, which declares the true
  InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples
  correctly. Goroutine is started after toggleVAD and torn down with the
  session (close + wg.Wait).
- Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta
  registry; the earlier realtime commit added pipeline.sound_detection
  without a registry entry, failing TestAllFieldsHaveRegistryEntries. This
  fixes that and covers the two new knobs.

Tests: classifySoundWindow emits an event + trims the buffer to one window,
no-ops on too-little audio; writeWindowWAV declares the given sample rate.
go build/vet + golangci-lint (new-from-merge-base) clean; config + openai
suites green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0)

The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0,
converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced +
known_usecases: sound_classification) and two gallery/index.yaml entries
(ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and
removes the now-resolved TODO from backend/index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add tiny/mini/small GGUF model gallery entries

Publishes the rest of the CED family (same architecture, metadata-driven port
verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds
their f16 + q8_0 gallery entries:

  ced-tiny  (5.5M, edge/Pi-class)  f16 11MB / q8_0 6MB
  ced-mini  (9.6M)                 f16 19MB / q8_0 11MB
  ced-small (22M)                  f16 42MB / q8_0 23MB

All sha256-pinned. ced-base remains the accuracy default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo

All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single
HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8
gallery model entries' urls + file uris accordingly. sha256 and filenames are
unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): bump CED_VERSION to the short-clip fix

Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip
shorter than target_length (~10.11s): time_pos_embed was added at its full
63-frame grid instead of being sliced to the clip's actual time grid, tripping
ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s
windows) and gated with a short-clip parity test upstream.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive

- README.md: add ced.cpp to the "native C/C++/GGML engines developed and
  maintained by the LocalAI project" table.
- docs/content/features/backends.md: add a Sound Classification backend
  category (sound-event classification / audio tagging) listing ced.cpp.
- .agents/adding-backends.md: add a "Documenting the backend" section and two
  verification-checklist items requiring new backends to be documented in the
  backends.md category list, and in-house native engines to be added to the
  README maintained-engines table. This directive was missing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): repin CED_VERSION to the v0.1.0 release commit

ced.cpp history was squashed into a single release commit (tagged v0.1.0), so
the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the
v0.1.0 release commit, so the backend builds against a commit that exists.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths

- sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of
  the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler.
- goced.go: reading a NUL-terminated C string from a libced-owned buffer.
  #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since
  the uintptr is a C-owned malloc'd buffer, not Go-GC memory.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 01:00:28 +02:00
LocalAI [bot]
ce8a3e9266 chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 4536dcdce27c3764a93a06d6bf64026b124962f5 (#10431)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 01:00:10 +02:00
LocalAI [bot]
a88d9d2de3 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 6c00e87ac84404af588ad2e65935bd6f079c696f (#10430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:49 +02:00
LocalAI [bot]
1cf1bf32e1 chore: ⬆️ Update leejet/stable-diffusion.cpp to b12098f5d09fc83da36e65c784f7bdb16a5a5ebf (#10429)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:33 +02:00