Commit Graph

1515 Commits

Author SHA1 Message Date
LocalAI [bot]
f1e5071321 chore: ⬆️ Update leejet/stable-diffusion.cpp to 8caa3f908ae6d4a4bef531e73b9a969f266a3d1f (#10493)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:31 +02:00
LocalAI [bot]
93d6255de3 chore: ⬆️ Update ggml-org/llama.cpp to 8be759e6f70d629638a7eb70db3824cbdcea370b (#10501)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:11:17 +02:00
LocalAI [bot]
fae9f6356f chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 9dbe7ea26a01b30fccb117ae5e86807c1dc23d42 (#10499)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 08:10:41 +02:00
LocalAI [bot]
066abf82c0 feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough (#10490)
* feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options

Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main
model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets
users keep MoE expert weights on CPU to manage VRAM on large MoE models.

Closes part of #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward unknown '-' options to upstream arg parser

Any options: entry starting with '-' is collected and passed verbatim to
llama.cpp's own common_params_parse (LLAMA_EXAMPLE_SERVER) at the end of
params_parse, so every upstream llama-server flag works without a new
hand-wired branch. Passthrough runs last and wins on overlap; n_parallel is
snapshotted to survive parser_init's SERVER reset, and help/usage/completion
flags are skipped to avoid exiting the backend.

Closes #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(llama-cpp): document cpu_moe/n_cpu_moe and option passthrough

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): terminate tensor/kv override vectors after passthrough

The tensor_buft_overrides padding and the kv/draft override terminators
ran before the generic option passthrough, so a passthrough flag
(--cpu-moe, --override-tensor, --override-kv, ...) appended a real entry
after the null sentinel - tripping the model loader's
back().pattern == nullptr assertion (crash) or being silently dropped.
Move all three termination/padding blocks to the end of params_parse,
after both the named-option loop and common_params_parse have pushed
their real entries. Also widen the exit()-flag skip list so --version,
--license, --list-devices and --cache-list cannot terminate the backend.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:10:08 +02:00
LocalAI [bot]
a7fec9a49d feat(backends): add darwin/metal (MPS) build for trl (#10487)
* feat(backends): add darwin/metal (MPS) build for trl

Authors backend/python/trl/requirements-mps.txt and wires trl into the
darwin CI matrix and gallery so the MPS training path can be built and
validated on Apple Silicon. The MPS variant installs plain PyPI torch
wheels (MPS-capable on macOS arm64) and the trl training stack; bitsandbytes
is omitted as it is a CUDA-only dependency with poor Apple Silicon support.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(trl): guard uv-only --index-strategy for the pip/darwin path

The darwin/MPS build installs with pip (USE_PIP=true), which rejects the
uv-only --index-strategy flag and failed the darwin backend build. Add it
only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:36 +02:00
LocalAI [bot]
c678530cf0 fix(backends): darwin/metal support across purego Go backends (#10481)
* fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path)

The parakeet-cpp backend had no macOS support and panicked at startup on
Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so".
Fix it across the same four layers the sibling voxtral backend already
handles correctly:

- main.go: default the dlopen target to libparakeet.dylib on darwin
  (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins.
- Makefile: also stage the built libparakeet.dylib next to the Go sources.
- package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when
  bundling instead of hard-failing when no .so is present (the macOS case);
  note that on Darwin only system frameworks are linked.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the
  packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux.

Mirrors backend/go/voxtral.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(backends): darwin/metal support across purego Go backends

The parakeet-cpp fix in the previous commit was an instance of a bug
shared by nearly every purego/dlopen Go backend: the dlopen target was
hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the
backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the
.dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling
this correctly.

Apply the same four-layer fix (mirroring backend/go/voxtral) to the
remaining affected backends:

  whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp,
  qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp,
  depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe

Per backend:
- main.go (sherpa-onnx: backend.go, two libraries): default the dlopen
  target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the
  existing <BACKEND>_LIBRARY env override still wins.
- run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at
  the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant
  (avx/avx2/avx512) selection unchanged in the else branch.
- package.sh: also bundle the .dylib and stop hard-failing when no .so is
  present (the macOS case).
- Makefile: also stage the built .dylib.

Notes:
- stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE,
  which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls
  back to .so so both layouts work.
- sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only
  run.sh and the two dlopen defaults needed fixing.

Linux behavior is unchanged. Verified gofmt-clean and
`CGO_ENABLED=0 go build` for every backend.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-25 08:09:18 +02:00
LocalAI [bot]
3c63431e46 chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 0f37401bebe9b20c0160a888e592108fc1d17607 (#10492)
⬆️ Update ServeurpersoCom/omnivoice.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:58 +02:00
LocalAI [bot]
3f647a2764 chore: ⬆️ Update ikawrakow/ik_llama.cpp to d5507e33ae7ee2b7b41475f08044d3bde3b839ee (#10498)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-25 00:57:42 +02:00
LocalAI [bot]
62b14fd635 feat(backends): add darwin/metal build for liquid-audio (#10486)
* feat(backends): add darwin/metal build for liquid-audio

Wire the already-MPS-ready liquid-audio backend (it ships
requirements-mps.txt) into the darwin CI matrix and the gallery so
metal-darwin-arm64 images are built and selectable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* ci(liquid-audio): trigger darwin build via requirements-mps note

The changed-backends path filter only builds a backend when a file under
its directory changes. The metal wiring lived in index.yaml + the matrix,
so the darwin job was skipped. Add a documenting comment to the MPS
requirements so CI actually exercises the darwin build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

* fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path

Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which
rejects the uv-only --index-strategy flag and failed the darwin backend build.
Add it only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 23:16:27 +02:00
LocalAI [bot]
193d0e6aef fix(backends): darwin/metal support for supertonic (#10488)
The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and
packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed
ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec
path, while package.sh hard-failed on any non-Linux host. On macOS dyld has
no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib.

This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481
landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but
which omitted supertonic:

- run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH
  at libonnxruntime.dylib; guard the ld.so exec path to Linux only.
- package.sh: recognize Darwin instead of erroring out; the bundled .dylib is
  resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle.
- helper.go: platform-native default library extension (dylib on darwin) for
  the last-resort dlopen fallback.

It also wires the darwin CI build and gallery entries, resolving the
inconsistency where backend/index.yaml advertised metal for supertonic but no
includeDarwin matrix entry built the image:

- .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry.
- backend/index.yaml: declare metal capabilities and add the concrete
  metal-supertonic / metal-supertonic-development child entries.

The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX
Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:19:03 +02:00
LocalAI [bot]
0f3b24436d chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a (#10468)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:43 +02:00
LocalAI [bot]
4b6f911835 chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea (#10466)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:14 +02:00
LocalAI [bot]
a5e28942a6 chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:54 +02:00
LocalAI [bot]
dba9cd7ca4 chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc (#10465)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:34 +02:00
LocalAI [bot]
c93190de50 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e (#10456)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:13 +02:00
LocalAI [bot]
06a7b6cadb chore: ⬆️ Update leejet/stable-diffusion.cpp to f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f (#10457)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:29:07 +02:00
LocalAI [bot]
67c8889866 chore: ⬆️ Update CrispStrobe/CrispASR to 63b57289255267edf66e43e33bc3911e04a2e92d (#10455)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:49 +02:00
LocalAI [bot]
1d49041c85 chore: ⬆️ Update ggml-org/llama.cpp to 73618f27a801c0b8614ceaf3547d3c2a99baae14 (#10458)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:28:09 +02:00
LocalAI [bot]
2edc4e25b3 chore: ⬆️ Update ggml-org/whisper.cpp to bae6bc02b1940bbfb87b6a0299c565e563b916d1 (#10459)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-23 13:27:51 +02:00
Adira
62c99c10b3 fix(diffusers): pin diffusers and transformers to a known-good pair (#9979) (#10442)
fix(diffusers): pin diffusers and transformers to a known-good pair

The diffusers backend tracked git+https://github.com/huggingface/diffusers
(main) with an unpinned transformers. transformers v5 restructured
CLIPTextModel and removed the .text_model attribute that diffusers' single
-file loader reads, so loading any single-file Stable Diffusion checkpoint
fails:

    create_diffusers_clip_model_from_ldm (single_file_utils.py)
    position_embedding_dim = model.text_model.embeddings.position_embedding...
    AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

No released diffusers (<=0.38.0) supports transformers v5 - only unreleased
diffusers main does. Because the requirements tracked main plus an unpinned
transformers, every backend image froze whichever pair existed at build
time, and images built once transformers v5 shipped but before diffusers
main caught up are permanently broken.

Pin the last known-good released pair across all requirements files:
diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every
pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no
functionality is lost, and builds become reproducible instead of drifting
into the broken window.

Fixes #9979

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-06-22 12:38:06 +02:00
LocalAI [bot]
7226bb9f30 chore: ⬆️ Update CrispStrobe/CrispASR to 7a8cb80907341c0204bd0488c1244764f4163883 (#10315)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 12:21:58 +02:00
LocalAI [bot]
b7d67f5779 chore: ⬆️ Update ggml-org/llama.cpp to 7c082bc417bbe53210a83df4ba5b49e18ce6193c (#10417)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 08:43:40 +02:00
LocalAI [bot]
600dafd20b feat(ced): sound-event classification backend (CED audio tagger) (#10425)
* feat(ced): sketch sound-classification backend (CED audio tagger)

Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry,
footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend.

SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist
in DESIGN.md):
- backend/backend.proto: new SoundDetection rpc + SoundClass messages
  (run `make protogen-go` to regenerate pkg/grpc/proto).
- backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h),
  goced.go (Ced gRPC backend: Load + SoundDetection), Makefile
  (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh,
  package.sh, .gitignore.
- DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability
  registration checklist), gallery/index + CI registration, and a scoping
  note for the realtime/websocket live-recognition path (sliding-window
  classify over the existing ws transport + voicegate; the ced C-API
  per-PCM entry point is already window-friendly).

Backend code does not compile until protogen-go regenerates the pb types
and a libced.so is built (Makefile clones+builds it).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): REST /v1/audio/classification endpoint + capability registration

Wires the ced sound-event classification backend (AudioSet audio tagger)
end to end through the REST surface, mirroring the transcription path.

- Handler: core/http/endpoints/openai/sound_classification.go parses the
  multipart audio upload, temp-files it, resolves the model config and
  calls the SoundDetection RPC; returns {model, detections[]} JSON.
- Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection)
  loads the model and normalizes the proto response into schema types.
- Schema: core/schema/sound_classification.go (SoundClassificationResult).
- gRPC layer: SoundDetection wired through the LocalAI wrapper (interface,
  Backend client, Client, embed, server, base default) so the loader-typed
  client exposes the RPC; proto regenerated via make protogen-go.
- Route: POST /v1/audio/classification (+ /audio/classification alias) with
  the audio/multipart default-model middleware in routes/openai.go.
- Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_
  CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap +
  GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase
  option; /api/instructions audio area updated; auth RouteFeatureRegistry +
  FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI
  usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter
  + i18n; docs page features/audio-classification.md + whats-new + crosslink.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): realtime sound-event detection over the websocket API

When a realtime pipeline configures a sound-classification model, each
VAD-committed utterance (the same window the transcription path produces)
is also run through the CED sound-event classifier and the scored AudioSet
tags are emitted as a new server event. No new backend rpc is needed: the
SoundDetection gRPC method already exists on this branch.

- config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty)
  beside Transcription/VAD.
- realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the
  ModelInterface; implement it on wrappedModel and transcriptOnlyModel by
  calling backend.ModelSoundDetection with the session's sound-classification
  model config (mirrors how Transcribe dispatches). Load the optional config
  in newModel / newTranscriptionOnlyModel; nil config keeps it additive.
- types: add ConversationItemSoundDetectionEvent (item_id, content_index,
  detections[]{label,score,index}) with type conversation.item.sound_detection,
  its ServerEventType constant and MarshalJSON, mirroring the transcription
  completed event.
- realtime: add emitSoundDetection (unary path: classify the committed window,
  build the event, t.SendEvent) and wire it at the utterance-commit hook right
  after emitTranscription; gated on session.SoundDetectionEnabled (resolved
  from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0).
  Its error is logged via xlog but never aborts the turn.
- test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections,
  classifier error) plus a SoundDetection method on the fakeModel double.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): implement SoundDetection in nodes backend test doubles

The SoundDetection method added to the grpc backend interface left two
test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so
core/services/nodes failed to compile under `go vet`/`go test` (go build
missed it: the doubles live in _test.go). Add the method to both,
mirroring their existing Detect mock. Repairs CI for the nodes package.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): decouple realtime sound detection from VAD (sound-only sessions)

Sound-event detection must activate on sounds, not speech, so it no longer
runs through the voice VAD/transcription path. A sound-detection-only
pipeline (sound_detection set, no transcription/LLM) now:

- is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline
  stage),
- builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS
  loaded), and
- defaults the session to turn_detection none (no VAD) with no transcription
  stage, so the client drives windowing via input_audio_buffer.commit
  (option A: client-side sliding window). The per-PCM C-API already supports
  arbitrary windows.

commitUtterance gains a sound-only branch: it emits the
conversation.item.sound_detection event (scored AudioSet tags) and stops -
no transcription, no LLM response. generateResponse is now guarded on a
transcription stage being present, so a sound-only turn never invokes the LLM.

Existing transcription/VAD sessions are unchanged (additive). Added a
commitUtterance sound-only Ginkgo spec asserting it emits the sound event and
neither transcribes nor generates a response. go vet + golangci-lint
(new-from-merge-base) clean; openai suite green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): register sound-classification backend in gallery + CI

Mechanical backend-image registration for the ced sound-event classifier,
mirroring the parakeet-cpp Go/purego backend everywhere it is wired up.

- .github/backend-matrix.yml: add the ced build matrix, field-for-field copies
  of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64,
  l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan
  amd64/arm64, rocm hipblas, and the metal darwin entry), changing only
  backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang.
- backend/index.yaml: add the &ced meta anchor (capabilities map per platform)
  plus ced-development and the per-arch image entries, each uri/mirror
  tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is
  intentionally deferred pending the HuggingFace publish (TODO note inline).
- scripts/changed-backends.js: add an explicit item.backend === "ced" branch in
  inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as
  the parakeet-cpp branch (before the generic golang fallthrough).
- .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in
  backend/go/ced/Makefile so the daily bot bumps the pin.
- swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so
  the existing /v1/audio/classification annotations land in the generated spec.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): server-side windowing for realtime sound detection (option B)

Adds an optional server-driven sliding-window classifier so a sound-only
realtime client only has to stream audio (no input_audio_buffer.commit):

- Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs.
  When both > 0 on a sound-only session, the server classifies the last
  window of streamed audio every hop and emits a conversation.item.sound_
  detection event; the input buffer is trimmed to one window so a long
  stream stays bounded. When unset, the session stays client-driven
  (option A). Runs independent of VAD (sound events are not speech).
- handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so
  it is unit-testable) + writeWindowWAV, which declares the true
  InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples
  correctly. Goroutine is started after toggleVAD and torn down with the
  session (close + wg.Wait).
- Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta
  registry; the earlier realtime commit added pipeline.sound_detection
  without a registry entry, failing TestAllFieldsHaveRegistryEntries. This
  fixes that and covers the two new knobs.

Tests: classifySoundWindow emits an event + trims the buffer to one window,
no-ops on too-little audio; writeWindowWAV declares the given sample rate.
go build/vet + golangci-lint (new-from-merge-base) clean; config + openai
suites green.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0)

The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0,
converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced +
known_usecases: sound_classification) and two gallery/index.yaml entries
(ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and
removes the now-resolved TODO from backend/index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ced): add tiny/mini/small GGUF model gallery entries

Publishes the rest of the CED family (same architecture, metadata-driven port
verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds
their f16 + q8_0 gallery entries:

  ced-tiny  (5.5M, edge/Pi-class)  f16 11MB / q8_0 6MB
  ced-mini  (9.6M)                 f16 19MB / q8_0 11MB
  ced-small (22M)                  f16 42MB / q8_0 23MB

All sha256-pinned. ced-base remains the accuracy default.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo

All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single
HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8
gallery model entries' urls + file uris accordingly. sha256 and filenames are
unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): bump CED_VERSION to the short-clip fix

Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip
shorter than target_length (~10.11s): time_pos_embed was added at its full
63-frame grid instead of being sliced to the clip's actual time grid, tripping
ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s
windows) and gated with a short-clip parity test upstream.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive

- README.md: add ced.cpp to the "native C/C++/GGML engines developed and
  maintained by the LocalAI project" table.
- docs/content/features/backends.md: add a Sound Classification backend
  category (sound-event classification / audio tagging) listing ced.cpp.
- .agents/adding-backends.md: add a "Documenting the backend" section and two
  verification-checklist items requiring new backends to be documented in the
  backends.md category list, and in-house native engines to be added to the
  README maintained-engines table. This directive was missing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ced): repin CED_VERSION to the v0.1.0 release commit

ced.cpp history was squashed into a single release commit (tagged v0.1.0), so
the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the
v0.1.0 release commit, so the backend builds against a commit that exists.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths

- sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of
  the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler.
- goced.go: reading a NUL-terminated C string from a libced-owned buffer.
  #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since
  the uintptr is a C-owned malloc'd buffer, not Go-GC memory.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-22 01:00:28 +02:00
LocalAI [bot]
ce8a3e9266 chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 4536dcdce27c3764a93a06d6bf64026b124962f5 (#10431)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 01:00:10 +02:00
LocalAI [bot]
a88d9d2de3 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 6c00e87ac84404af588ad2e65935bd6f079c696f (#10430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:49 +02:00
LocalAI [bot]
1cf1bf32e1 chore: ⬆️ Update leejet/stable-diffusion.cpp to b12098f5d09fc83da36e65c784f7bdb16a5a5ebf (#10429)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-22 00:57:33 +02:00
OrbisAI Security
a556cd9afc fix: the trl backend's _do_training method directly ... in backend.py (#10422)
* fix: V-001 security vulnerability

Automated security fix generated by OrbisAI Security

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>

* fix: the trl backend's _do_training method directly ... in backend.py

The TRL backend's _do_training method directly uses request

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>

---------

Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>
2026-06-21 17:40:29 +02:00
pos-ei-don
b4c0dc67fe feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346) (#10351)
* fix(vllm): don't stream raw tool-call markup as content when a tool parser is active

When a tool_parser is configured and the request carries tools, the streaming
loop emitted every text delta as delta.content — including the model's raw
tool-call markup (e.g. <tool_call>...) — because extract_tool_calls only runs
on the full output after the stream. Clients streaming a tool call therefore
saw the unparsed tool-call syntax as assistant content.

Buffer the text while a tool parser is active for the request; the existing
end-of-stream chat_delta already carries the parsed tool_calls (or the cleaned
content), which the Go side converts to SSE deltas. Non-tool-parser streaming
is unchanged.

Add a server-less regression test covering both the tool-call case (no raw
markup leaked as content) and the plain-text case (content delivered exactly
once — guards against double-emitting the buffered content).

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* test(vllm): add expectedFailure test for progressive streaming with tool parser (Case 3, #582)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* test(vllm): add Cases 4+5 — marker split across chunks + false-positive prefix (TDD, Option B state machine, #582)

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* feat(vllm): progressive streaming via parser.extract_tool_calls_streaming

When a tool parser is active for a tool-enabled streaming request,
#10346 buffers the entire generation and surfaces it on the final
chunk to prevent raw tool-call markup from leaking as delta.content.
This is correct but turns the request into effectively non-streaming
for plain-text responses — the client sees nothing until the model
stops.

Every concrete tool parser shipped with vLLM 0.23+ already implements
extract_tool_calls_streaming (Granite4, Qwen3Coder, DeepSeekV31, Jamba,
Ernie45, Hermes2Pro, llama3_json, mistral, …). Use it: instantiate
the parser before the streaming loop and call its streaming method per
delta, emitting DeltaMessage(content=…) or DeltaMessage(tool_calls=[…])
when the parser is ready.

Falls back to the existing #10346 buffer path when:
  - the parser does not have extract_tool_calls_streaming, OR
  - extract_tool_calls_streaming raises mid-stream (logged, the
    rest of the request finishes via post-loop extract_tool_calls).

Tests (TestStreamingToolParser):
  1. Buffer path: no markup leaked, no content duplication
  2. Native streaming: plain-text response streams progressively
  3. Native streaming: tool_call structured, no markup leaked
  4. Native streaming exception → graceful fallback, no markup, no crash
  5. No tool parser → unchanged per-delta content stream

E2E verified against qwen3_coder on vLLM 0.23.0 (NVIDIA GB10 / arm64 / CUDA 13).

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* docs(vllm): add server-side TTFT benchmark for the streaming tool-parser path

Self-contained stdlib-only script that measures time-to-first-token (TTFT)
for the vLLM backend's two streaming scenarios:

  - tool_call:  request mentions a tool; model is expected to call it
  - plain_text: request offers a tool but explicitly asks for prose

Use this to compare:
  - the buffer-all path (#10346)         → plain_text TTFT ≈ total response time
  - the native-streaming path (this PR)  → plain_text TTFT ≈ true first-token time

  python examples/vllm-bench/ttft_streaming_tool_parser.py \\
      --url http://localhost:8080 --model my-coder --runs 3

Lives under examples/ so it does not interfere with the test suite.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

* examples/vllm-bench: add long-text scenario (8 paragraphs, 1500 tokens)

The long-text scenario shows the buffering vs streaming difference most
dramatically: with the buffer-all path, the client receives nothing for
20+ seconds and then the entire 1500-token response at once. With native
streaming, the first token arrives in tens of milliseconds and the
response flows progressively.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>

---------

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
Co-authored-by: Philipp Wacker <philipp.wacker@ibf-solutions.com>
2026-06-21 17:07:15 +02:00
番茄摔成番茄酱
01fa12e0de feat(nemo): enable word-level timestamps for ASR models (#10297)
* feat(nemo): enable word-level timestamps for ASR models

The nemo backend ignored timestamp_granularities and always returned a
single segment with start=0 end=0, making word-level timestamps
impossible to obtain even though the NeMo models (parakeet-tdt, etc.)
fully support them.

Changes:
- Add _get_stride_seconds() to compute frame duration from the model's
  preprocessor window_stride and encoder subsampling_factor.
- Add _build_segments_with_words() that extracts word offsets from the
  NeMo Hypothesis.timestamp dict and converts frame indices to
  nanosecond timestamps.
- Support 'word' granularity (one segment per word) and 'segment'
  granularity (merge at time-gap boundaries using a dynamic threshold).
- Populate TranscriptSegment.words with TranscriptWord entries so
  callers get both segment-level and word-level timing.
- Only request timestamps from NeMo when the caller actually asks for
  them (timestamp_granularities is non-empty), keeping the fast path
  unchanged for callers that don't need timestamps.

Tested with nvidia/parakeet-tdt-0.6b-v3 on the JFK "ask not" clip:
  curl -X POST /v1/audio/transcriptions \
    -F file=@jfk.wav -F model=nemo-parakeet-tdt-0.6b \
    -F 'timestamp_granularities[]=word' -F response_format=verbose_json
  → each word has correct start/end times in seconds.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

* fix(nemo): address Copilot review feedback

- Narrow exception handling in _get_stride_seconds to catch only
  AttributeError, KeyError, TypeError instead of bare Exception, and
  emit a warning when falling back to the hardcoded stride.
- Remove explicit return_hypotheses=False when timestamps are requested;
  timestamps=True already forces NeMo to return Hypothesis objects.
- Add a warning when NeMo does not return Hypothesis objects despite
  timestamps being requested.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

---------

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-21 17:04:19 +02:00
番茄摔成番茄酱
cf7f9573a2 fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421)
The parakeet-specific word accessors can return stale initialisation
data (model name, binary blobs) for segments with no real speech.
Add isValidWord() to filter out words that have:
- empty or whitespace-only text
- U+FFFD replacement characters (from binary data scrubbing)
- negative timestamps
- zero duration (end <= start)

Also skip empty segments entirely when they have no recognisable
content (empty text AND no valid words), preventing spurious subtitle
entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'.

Applies to both AudioTranscription and AudioTranscriptionStream.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-21 17:03:33 +02:00
pos-ei-don
c6303104c7 fix(vllm): structured outputs silently ignored on vLLM >= 0.23 (GuidedDecodingParams removed) (#10343)
fix(vllm): structured outputs silently ignored on vLLM >= 0.23

vLLM >= 0.23 removed GuidedDecodingParams (now StructuredOutputsParams) and
renamed the SamplingParams field guided_decoding -> structured_outputs. The
import failed, HAS_GUIDED_DECODING became False, and the whole guided-decoding
block was skipped, so response_format / grammar constraints were silently
ignored. Adapt the existing request.Grammar path to the new class/field.

Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>
2026-06-21 17:02:31 +02:00
LocalAI [bot]
e19c43cf04 feat(gallery): add Depth Anything V2 models + bump native version (#10413)
* feat(gallery): add Depth Anything V2 models + bump native version

Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is
depth-only (no camera pose, no confidence) and ships both relative
(relative inverse depth) and metric (depth in metres) variants. The Go
backend is model-agnostic, so no backend code changes are required — only
a native version bump and new gallery entries.

- backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the
  depth-anything.cpp commit that adds the DA2 engine + C-API routing
  (e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag
  da2-support so it survives a squash-merge).
- gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus
  Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models
  carry the metric-depth tag; none carry camera-pose.

Assisted-by: Claude:claude-opus-4-8

* chore(depth-anything-cpp): pin to merged DA2 master commit

PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint
the pin from the pre-merge commit to the canonical master commit.

Assisted-by: Claude:claude-opus-4-8

---------

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 14:56:16 +02:00
LocalAI [bot]
518381278e chore: ⬆️ Update ggml-org/llama.cpp to e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62 (#10392)
* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): adapt grpc-server to upstream server-schema split

Upstream llama.cpp (e475fa2) extracted the JSON request-schema evaluation
out of the static server_task::params_from_json_cmpl into the new
server_schema::eval_llama_cmpl_schema (tools/server/server-schema.cpp).
The grpc-server unity build still called the old static member, breaking
every llama-cpp backend build with "no member named 'params_from_json_cmpl'
in 'server_task'".

Pull server-schema.cpp into the translation unit and call the new function,
keeping both guarded by __has_include so forks that predate the split (e.g.
llama-cpp-turboquant, which still exposes params_from_json_cmpl) keep
compiling against the old static member.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 08:22:22 +02:00
LocalAI [bot]
93706fec57 chore: ⬆️ Update mudler/parakeet.cpp to db755a78d39f789bb7d4e3935158a9e8105dbe36 (#10393)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:33 +02:00
LocalAI [bot]
11aee03a80 chore: ⬆️ Update localai-org/privacy-filter.cpp to 98f52c5ef2250f207cc6b9a6aef05393a120cb7c (#10394)
⬆️ Update localai-org/privacy-filter.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:21 +02:00
LocalAI [bot]
8915f2ab91 chore: ⬆️ Update ggml-org/whisper.cpp to 5ed76e9a079962f1c85cfce44edd325c27ef1f97 (#10396)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:37:06 +02:00
LocalAI [bot]
f143d7f688 chore: ⬆️ Update ikawrakow/ik_llama.cpp to d47f484d299cafad2e606afc0d31677a91b242d0 (#10410)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:51 +02:00
LocalAI [bot]
dd928f0bdd chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to 26fcea5468e4069bc72d1f2fcc812c985e7361bb (#10409)
⬆️ Update ServeurpersoCom/qwentts.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:36 +02:00
LocalAI [bot]
c43a752afc chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to 96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd (#10408)
⬆️ Update ServeurpersoCom/omnivoice.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-20 01:36:22 +02:00
番茄摔成番茄酱
72d46c1115 feat(crispasr): add word-level timestamp support (#10403)
* feat(crispasr): add word-level timestamp support

Add word-level timestamp extraction to the crispasr backend by calling
the CrispASR C library's word accessor functions that are already
exported by libgocraspasr but were not previously bound by the Go
wrapper.

Two families of word functions are supported:

1. Session-based (get_word_count/text/t0/t1) — works per-segment for
   whisper-like backends.
2. Parakeet-specific (get_parakeet_word_count/text/t0/t1) — returns a
   global word list for TDT/CTC/RNNT parakeet models where the session
   API does not expose per-segment word data.

The Go code tries session-based first and falls back to parakeet-specific
when the session word count is zero.

Depends on #10402 (grpc server Words forwarding) for the words to reach
the HTTP response.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

* fix(crispasr): use portable sed -i.bak for macOS compatibility

BSD sed requires -i '' for in-place editing while GNU sed uses -i.
Replace with -i.bak which works on both platforms, then remove the
backup file.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>

---------

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-19 21:34:30 +02:00
Richard Palethorpe
606128e4e9 feat(vulkan): make Vulkan backends self-contained on the GPU (#10404)
Vulkan backends bundled their own loader and ICD manifests but neither the
Mesa driver the manifests point at nor a way to make the loader find them,
so on a runtime base image without Mesa the loader enumerated zero devices
and the GPU silently fell back to CPU (only NVIDIA worked, since its ICD is
injected by the container toolkit).

- scripts/build/package-gpu-libs.sh: for each installed ICD manifest, bundle
  the driver .so its library_path names — no hard-coded, platform-dependent
  soname list — plus that driver's ldd dependencies, skipping manifests whose
  driver isn't installed. Rewrite each library_path to a bare soname so the
  bundled driver resolves via the LD_LIBRARY_PATH run.sh already sets.
- .docker/install-base-deps.sh, backend/Dockerfile.golang,
  backend/Dockerfile.python: install mesa-vulkan-drivers in every Vulkan
  builder so the driver + manifests exist to be packaged (the LunarG SDK
  ships only the loader and shader tooling).
- pkg/model/process.go: when a backend ships vulkan/icd.d/, point the loader
  at it via VK_DRIVER_FILES/VK_ICD_FILENAMES at launch (no-op otherwise).
  Covered by pkg/model/process_vulkan_test.go.
- backend/go/parakeet-cpp/package.sh: complete the L0 stub (was missing the
  libc-family ldd walk + GPU-lib packaging) by mirroring whisper, so the
  vulkan-parakeet image actually bundles its GPU runtime.

Assisted-by: Claude Code:claude-opus-4-8

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-19 17:16:33 +02:00
LocalAI [bot]
4ad754eea3 chore: ⬆️ Update ikawrakow/ik_llama.cpp to b3dfb7858cfcb9166e92f366e5af87f19ebc94be (#10395)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-19 00:03:37 +02:00
Tai An
c3b3336654 fix(whisperx): use whisperx.diarize.DiarizationPipeline with token kwarg (#10389)
Signed-off-by: Anai-Guo <antai12232931@outlook.com>
2026-06-18 18:50:37 +02:00
Richard Palethorpe
3fa7b2955c feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)
Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase). Net change:

- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
  PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
  TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning
  whole conversations as one document), with a recreated bounded restricted-
  regex secret-matching pattern detector tier alongside it (per-model
  pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
  debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
  filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
  editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
  filter; token_classify known_usecase; batch sized to context for NER models.
  privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
  meta + image entries with a capabilities map) matching its CI matrix jobs,
  and an /import-model auto-detect importer (PrivacyFilterImporter, narrow
  privacy-filter GGUF detection) replacing the prior pref-only registration.

Reconciled against master's independent evolution:

- Dropped master's PIIPatternOverrides feature (global-pattern runtime
  overrides + /api/pii/patterns API + runtime_settings.json persistence). The
  per-model NER + pattern-detector design supersedes it; it was built on the
  global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
  removed the patch and restored master's grpc-server.cpp Score RPC (direct
  llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
  model_config validation forbidding score + chat/completion/embeddings on
  llama-cpp. token_classify is unaffected (it runs on the privacy-filter
  backend, not llama-cpp).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-18 11:45:22 +01:00
LocalAI [bot]
c133ca39dc chore: ⬆️ Update ggml-org/llama.cpp to f3e182816421c648188b5eab269853bf1531d950 (#10379)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 11:43:23 +02:00
LocalAI [bot]
91f97f2a54 chore: ⬆️ Update ggml-org/whisper.cpp to 86c40c3bd6fc86f1187fb751d111b49e0fc18e84 (#10382)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 08:34:43 +02:00
LocalAI [bot]
55f9ff6805 chore: ⬆️ Update mudler/parakeet.cpp to 92a5f0306be354c109150fe58ae4cc4f8a21ca45 (#10380)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 08:32:13 +02:00
LocalAI [bot]
5c2ae7857a chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 00:32:13 +02:00
LocalAI [bot]
4af360300f chore: ⬆️ Update ikawrakow/ik_llama.cpp to 71af16a6b7f6fb7315b346b4a51aad530599c3f5 (#10381)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-18 00:12:25 +02:00
LocalAI [bot]
95e7149c87 chore: ⬆️ Update ggml-org/llama.cpp to 74ade52741203e5c8f81eaf06a96cb1cfe15f2a3 (#10368)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-17 13:25:29 +02:00