LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-25 09:09:07 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	f1e5071321	chore: ⬆️ Update leejet/stable-diffusion.cpp to `8caa3f908ae6d4a4bef531e73b9a969f266a3d1f` (#10493 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:31 +02:00
LocalAI [bot]	93d6255de3	chore: ⬆️ Update ggml-org/llama.cpp to `8be759e6f70d629638a7eb70db3824cbdcea370b` (#10501 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:17 +02:00
LocalAI [bot]	fae9f6356f	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `9dbe7ea26a01b30fccb117ae5e86807c1dc23d42` (#10499 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:10:41 +02:00
LocalAI [bot]	066abf82c0	feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough (#10490 ) * feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets users keep MoE expert weights on CPU to manage VRAM on large MoE models. Closes part of #10483 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): forward unknown '-' options to upstream arg parser Any options: entry starting with '-' is collected and passed verbatim to llama.cpp's own common_params_parse (LLAMA_EXAMPLE_SERVER) at the end of params_parse, so every upstream llama-server flag works without a new hand-wired branch. Passthrough runs last and wins on overlap; n_parallel is snapshotted to survive parser_init's SERVER reset, and help/usage/completion flags are skipped to avoid exiting the backend. Closes #10483 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(llama-cpp): document cpu_moe/n_cpu_moe and option passthrough Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(llama-cpp): terminate tensor/kv override vectors after passthrough The tensor_buft_overrides padding and the kv/draft override terminators ran before the generic option passthrough, so a passthrough flag (--cpu-moe, --override-tensor, --override-kv, ...) appended a real entry after the null sentinel - tripping the model loader's back().pattern == nullptr assertion (crash) or being silently dropped. Move all three termination/padding blocks to the end of params_parse, after both the named-option loop and common_params_parse have pushed their real entries. Also widen the exit()-flag skip list so --version, --license, --list-devices and --cache-list cannot terminate the backend. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:10:08 +02:00
LocalAI [bot]	a7fec9a49d	feat(backends): add darwin/metal (MPS) build for trl (#10487 ) * feat(backends): add darwin/metal (MPS) build for trl Authors backend/python/trl/requirements-mps.txt and wires trl into the darwin CI matrix and gallery so the MPS training path can be built and validated on Apple Silicon. The MPS variant installs plain PyPI torch wheels (MPS-capable on macOS arm64) and the trl training stack; bitsandbytes is omitted as it is a CUDA-only dependency with poor Apple Silicon support. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(trl): guard uv-only --index-strategy for the pip/darwin path The darwin/MPS build installs with pip (USE_PIP=true), which rejects the uv-only --index-strategy flag and failed the darwin backend build. Add it only on the uv path; Linux/CUDA resolution is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:09:36 +02:00
LocalAI [bot]	c678530cf0	fix(backends): darwin/metal support across purego Go backends (#10481 ) * fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path) The parakeet-cpp backend had no macOS support and panicked at startup on Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so". Fix it across the same four layers the sibling voxtral backend already handles correctly: - main.go: default the dlopen target to libparakeet.dylib on darwin (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins. - Makefile: also stage the built libparakeet.dylib next to the Go sources. - package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when bundling instead of hard-failing when no .so is present (the macOS case); note that on Darwin only system frameworks are linked. - run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux. Mirrors backend/go/voxtral. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backends): darwin/metal support across purego Go backends The parakeet-cpp fix in the previous commit was an instance of a bug shared by nearly every purego/dlopen Go backend: the dlopen target was hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the .dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling this correctly. Apply the same four-layer fix (mirroring backend/go/voxtral) to the remaining affected backends: whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp, qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp, depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe Per backend: - main.go (sherpa-onnx: backend.go, two libraries): default the dlopen target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the existing <BACKEND>_LIBRARY env override still wins. - run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant (avx/avx2/avx512) selection unchanged in the else branch. - package.sh: also bundle the .dylib and stop hard-failing when no .so is present (the macOS case). - Makefile: also stage the built .dylib. Notes: - stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE, which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls back to .so so both layouts work. - sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only run.sh and the two dlopen defaults needed fixing. Linux behavior is unchanged. Verified gofmt-clean and `CGO_ENABLED=0 go build` for every backend. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:09:18 +02:00
LocalAI [bot]	3c63431e46	chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to `0f37401bebe9b20c0160a888e592108fc1d17607` (#10492 ) ⬆️ Update ServeurpersoCom/omnivoice.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 00:57:58 +02:00
LocalAI [bot]	3f647a2764	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d5507e33ae7ee2b7b41475f08044d3bde3b839ee` (#10498 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 00:57:42 +02:00
LocalAI [bot]	62b14fd635	feat(backends): add darwin/metal build for liquid-audio (#10486 ) * feat(backends): add darwin/metal build for liquid-audio Wire the already-MPS-ready liquid-audio backend (it ships requirements-mps.txt) into the darwin CI matrix and the gallery so metal-darwin-arm64 images are built and selectable. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * ci(liquid-audio): trigger darwin build via requirements-mps note The changed-backends path filter only builds a backend when a file under its directory changes. The metal wiring lived in index.yaml + the matrix, so the darwin job was skipped. Add a documenting comment to the MPS requirements so CI actually exercises the darwin build. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which rejects the uv-only --index-strategy flag and failed the darwin backend build. Add it only on the uv path; Linux/CUDA resolution is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 23:16:27 +02:00
LocalAI [bot]	193d0e6aef	fix(backends): darwin/metal support for supertonic (#10488 ) The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec path, while package.sh hard-failed on any non-Linux host. On macOS dyld has no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib. This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481 landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but which omitted supertonic: - run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH at libonnxruntime.dylib; guard the ld.so exec path to Linux only. - package.sh: recognize Darwin instead of erroring out; the bundled .dylib is resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle. - helper.go: platform-native default library extension (dylib on darwin) for the last-resort dlopen fallback. It also wires the darwin CI build and gallery entries, resolving the inconsistency where backend/index.yaml advertised metal for supertonic but no includeDarwin matrix entry built the image: - .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry. - backend/index.yaml: declare metal capabilities and add the concrete metal-supertonic / metal-supertonic-development child entries. The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 22:19:03 +02:00
LocalAI [bot]	0f3b24436d	chore: ⬆️ Update mudler/parakeet.cpp to `89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a` (#10468 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:43 +02:00
LocalAI [bot]	4b6f911835	chore: ⬆️ Update ggml-org/whisper.cpp to `43d78af5be58f41d6ffbc227d608f104577741ea` (#10466 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:14 +02:00
LocalAI [bot]	a5e28942a6	chore: ⬆️ Update ggml-org/llama.cpp to `be4a6a63eb2b848e19c277bdcf2bd399e8af76d9` (#10467 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:54 +02:00
LocalAI [bot]	dba9cd7ca4	chore: ⬆️ Update CrispStrobe/CrispASR to `96b2a6ee31d30389fed8a7ef1a54239b75231ddc` (#10465 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:34 +02:00
LocalAI [bot]	c93190de50	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `7ccf1d209588962b96eacca325b37e9b3e8faf5e` (#10456 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:13 +02:00
LocalAI [bot]	06a7b6cadb	chore: ⬆️ Update leejet/stable-diffusion.cpp to `f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f` (#10457 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:29:07 +02:00
LocalAI [bot]	67c8889866	chore: ⬆️ Update CrispStrobe/CrispASR to `63b57289255267edf66e43e33bc3911e04a2e92d` (#10455 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:49 +02:00
LocalAI [bot]	1d49041c85	chore: ⬆️ Update ggml-org/llama.cpp to `73618f27a801c0b8614ceaf3547d3c2a99baae14` (#10458 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:09 +02:00
LocalAI [bot]	2edc4e25b3	chore: ⬆️ Update ggml-org/whisper.cpp to `bae6bc02b1940bbfb87b6a0299c565e563b916d1` (#10459 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:27:51 +02:00
Adira	62c99c10b3	fix(diffusers): pin diffusers and transformers to a known-good pair (#9979 ) (#10442 ) fix(diffusers): pin diffusers and transformers to a known-good pair The diffusers backend tracked git+https://github.com/huggingface/diffusers (main) with an unpinned transformers. transformers v5 restructured CLIPTextModel and removed the .text_model attribute that diffusers' single -file loader reads, so loading any single-file Stable Diffusion checkpoint fails: create_diffusers_clip_model_from_ldm (single_file_utils.py) position_embedding_dim = model.text_model.embeddings.position_embedding... AttributeError: 'CLIPTextModel' object has no attribute 'text_model' No released diffusers (<=0.38.0) supports transformers v5 - only unreleased diffusers main does. Because the requirements tracked main plus an unpinned transformers, every backend image froze whichever pair existed at build time, and images built once transformers v5 shipped but before diffusers main caught up are permanently broken. Pin the last known-good released pair across all requirements files: diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no functionality is lost, and builds become reproducible instead of drifting into the broken window. Fixes #9979 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-22 12:38:06 +02:00
LocalAI [bot]	7226bb9f30	chore: ⬆️ Update CrispStrobe/CrispASR to `7a8cb80907341c0204bd0488c1244764f4163883` (#10315 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 12:21:58 +02:00
LocalAI [bot]	b7d67f5779	chore: ⬆️ Update ggml-org/llama.cpp to `7c082bc417bbe53210a83df4ba5b49e18ce6193c` (#10417 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 08:43:40 +02:00
LocalAI [bot]	600dafd20b	feat(ced): sound-event classification backend (CED audio tagger) (#10425 ) * feat(ced): sketch sound-classification backend (CED audio tagger) Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry, footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend. SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist in DESIGN.md): - backend/backend.proto: new SoundDetection rpc + SoundClass messages (run `make protogen-go` to regenerate pkg/grpc/proto). - backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h), goced.go (Ced gRPC backend: Load + SoundDetection), Makefile (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh, package.sh, .gitignore. - DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability registration checklist), gallery/index + CI registration, and a scoping note for the realtime/websocket live-recognition path (sliding-window classify over the existing ws transport + voicegate; the ced C-API per-PCM entry point is already window-friendly). Backend code does not compile until protogen-go regenerates the pb types and a libced.so is built (Makefile clones+builds it). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): REST /v1/audio/classification endpoint + capability registration Wires the ced sound-event classification backend (AudioSet audio tagger) end to end through the REST surface, mirroring the transcription path. - Handler: core/http/endpoints/openai/sound_classification.go parses the multipart audio upload, temp-files it, resolves the model config and calls the SoundDetection RPC; returns {model, detections[]} JSON. - Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection) loads the model and normalizes the proto response into schema types. - Schema: core/schema/sound_classification.go (SoundClassificationResult). - gRPC layer: SoundDetection wired through the LocalAI wrapper (interface, Backend client, Client, embed, server, base default) so the loader-typed client exposes the RPC; proto regenerated via make protogen-go. - Route: POST /v1/audio/classification (+ /audio/classification alias) with the audio/multipart default-model middleware in routes/openai.go. - Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_ CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap + GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase option; /api/instructions audio area updated; auth RouteFeatureRegistry + FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter + i18n; docs page features/audio-classification.md + whats-new + crosslink. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): realtime sound-event detection over the websocket API When a realtime pipeline configures a sound-classification model, each VAD-committed utterance (the same window the transcription path produces) is also run through the CED sound-event classifier and the scored AudioSet tags are emitted as a new server event. No new backend rpc is needed: the SoundDetection gRPC method already exists on this branch. - config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty) beside Transcription/VAD. - realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the ModelInterface; implement it on wrappedModel and transcriptOnlyModel by calling backend.ModelSoundDetection with the session's sound-classification model config (mirrors how Transcribe dispatches). Load the optional config in newModel / newTranscriptionOnlyModel; nil config keeps it additive. - types: add ConversationItemSoundDetectionEvent (item_id, content_index, detections[]{label,score,index}) with type conversation.item.sound_detection, its ServerEventType constant and MarshalJSON, mirroring the transcription completed event. - realtime: add emitSoundDetection (unary path: classify the committed window, build the event, t.SendEvent) and wire it at the utterance-commit hook right after emitTranscription; gated on session.SoundDetectionEnabled (resolved from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0). Its error is logged via xlog but never aborts the turn. - test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections, classifier error) plus a SoundDetection method on the fakeModel double. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): implement SoundDetection in nodes backend test doubles The SoundDetection method added to the grpc backend interface left two test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so core/services/nodes failed to compile under `go vet`/`go test` (go build missed it: the doubles live in _test.go). Add the method to both, mirroring their existing Detect mock. Repairs CI for the nodes package. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): decouple realtime sound detection from VAD (sound-only sessions) Sound-event detection must activate on sounds, not speech, so it no longer runs through the voice VAD/transcription path. A sound-detection-only pipeline (sound_detection set, no transcription/LLM) now: - is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline stage), - builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS loaded), and - defaults the session to turn_detection none (no VAD) with no transcription stage, so the client drives windowing via input_audio_buffer.commit (option A: client-side sliding window). The per-PCM C-API already supports arbitrary windows. commitUtterance gains a sound-only branch: it emits the conversation.item.sound_detection event (scored AudioSet tags) and stops - no transcription, no LLM response. generateResponse is now guarded on a transcription stage being present, so a sound-only turn never invokes the LLM. Existing transcription/VAD sessions are unchanged (additive). Added a commitUtterance sound-only Ginkgo spec asserting it emits the sound event and neither transcribes nor generates a response. go vet + golangci-lint (new-from-merge-base) clean; openai suite green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): register sound-classification backend in gallery + CI Mechanical backend-image registration for the ced sound-event classifier, mirroring the parakeet-cpp Go/purego backend everywhere it is wired up. - .github/backend-matrix.yml: add the ced build matrix, field-for-field copies of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64, l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan amd64/arm64, rocm hipblas, and the metal darwin entry), changing only backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang. - backend/index.yaml: add the &ced meta anchor (capabilities map per platform) plus ced-development and the per-arch image entries, each uri/mirror tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is intentionally deferred pending the HuggingFace publish (TODO note inline). - scripts/changed-backends.js: add an explicit item.backend === "ced" branch in inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as the parakeet-cpp branch (before the generic golang fallthrough). - .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in backend/go/ced/Makefile so the daily bot bumps the pin. - swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so the existing /v1/audio/classification annotations land in the generated spec. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): server-side windowing for realtime sound detection (option B) Adds an optional server-driven sliding-window classifier so a sound-only realtime client only has to stream audio (no input_audio_buffer.commit): - Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs. When both > 0 on a sound-only session, the server classifies the last window of streamed audio every hop and emits a conversation.item.sound_ detection event; the input buffer is trimmed to one window so a long stream stays bounded. When unset, the session stays client-driven (option A). Runs independent of VAD (sound events are not speech). - handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so it is unit-testable) + writeWindowWAV, which declares the true InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples correctly. Goroutine is started after toggleVAD and torn down with the session (close + wg.Wait). - Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta registry; the earlier realtime commit added pipeline.sound_detection without a registry entry, failing TestAllFieldsHaveRegistryEntries. This fixes that and covers the two new knobs. Tests: classifySoundWindow emits an event + trims the buffer to one window, no-ops on too-little audio; writeWindowWAV declares the given sample rate. go build/vet + golangci-lint (new-from-merge-base) clean; config + openai suites green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0) The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0, converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced + known_usecases: sound_classification) and two gallery/index.yaml entries (ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and removes the now-resolved TODO from backend/index.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add tiny/mini/small GGUF model gallery entries Publishes the rest of the CED family (same architecture, metadata-driven port verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds their f16 + q8_0 gallery entries: ced-tiny (5.5M, edge/Pi-class) f16 11MB / q8_0 6MB ced-mini (9.6M) f16 19MB / q8_0 11MB ced-small (22M) f16 42MB / q8_0 23MB All sha256-pinned. ced-base remains the accuracy default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8 gallery model entries' urls + file uris accordingly. sha256 and filenames are unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): bump CED_VERSION to the short-clip fix Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip shorter than target_length (~10.11s): time_pos_embed was added at its full 63-frame grid instead of being sliced to the clip's actual time grid, tripping ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s windows) and gated with a short-clip parity test upstream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive - README.md: add ced.cpp to the "native C/C++/GGML engines developed and maintained by the LocalAI project" table. - docs/content/features/backends.md: add a Sound Classification backend category (sound-event classification / audio tagging) listing ced.cpp. - .agents/adding-backends.md: add a "Documenting the backend" section and two verification-checklist items requiring new backends to be documented in the backends.md category list, and in-house native engines to be added to the README maintained-engines table. This directive was missing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): repin CED_VERSION to the v0.1.0 release commit ced.cpp history was squashed into a single release commit (tagged v0.1.0), so the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the v0.1.0 release commit, so the backend builds against a commit that exists. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths - sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler. - goced.go: reading a NUL-terminated C string from a libced-owned buffer. #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since the uintptr is a C-owned malloc'd buffer, not Go-GC memory. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 01:00:28 +02:00
LocalAI [bot]	ce8a3e9266	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `4536dcdce27c3764a93a06d6bf64026b124962f5` (#10431 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 01:00:10 +02:00
LocalAI [bot]	a88d9d2de3	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `6c00e87ac84404af588ad2e65935bd6f079c696f` (#10430 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:49 +02:00
LocalAI [bot]	1cf1bf32e1	chore: ⬆️ Update leejet/stable-diffusion.cpp to `b12098f5d09fc83da36e65c784f7bdb16a5a5ebf` (#10429 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:33 +02:00
OrbisAI Security	a556cd9afc	fix: the trl backend's _do_training method directly ... in backend.py (#10422 ) * fix: V-001 security vulnerability Automated security fix generated by OrbisAI Security Signed-off-by: orbisai0security <mediratta01.pally@gmail.com> * fix: the trl backend's _do_training method directly ... in backend.py The TRL backend's _do_training method directly uses request Signed-off-by: orbisai0security <mediratta01.pally@gmail.com> --------- Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>	2026-06-21 17:40:29 +02:00
pos-ei-don	b4c0dc67fe	feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346 ) (#10351 ) * fix(vllm): don't stream raw tool-call markup as content when a tool parser is active When a tool_parser is configured and the request carries tools, the streaming loop emitted every text delta as delta.content — including the model's raw tool-call markup (e.g. <tool_call>...) — because extract_tool_calls only runs on the full output after the stream. Clients streaming a tool call therefore saw the unparsed tool-call syntax as assistant content. Buffer the text while a tool parser is active for the request; the existing end-of-stream chat_delta already carries the parsed tool_calls (or the cleaned content), which the Go side converts to SSE deltas. Non-tool-parser streaming is unchanged. Add a server-less regression test covering both the tool-call case (no raw markup leaked as content) and the plain-text case (content delivered exactly once — guards against double-emitting the buffered content). Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * test(vllm): add expectedFailure test for progressive streaming with tool parser (Case 3, #582) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * test(vllm): add Cases 4+5 — marker split across chunks + false-positive prefix (TDD, Option B state machine, #582) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * feat(vllm): progressive streaming via parser.extract_tool_calls_streaming When a tool parser is active for a tool-enabled streaming request, #10346 buffers the entire generation and surfaces it on the final chunk to prevent raw tool-call markup from leaking as delta.content. This is correct but turns the request into effectively non-streaming for plain-text responses — the client sees nothing until the model stops. Every concrete tool parser shipped with vLLM 0.23+ already implements extract_tool_calls_streaming (Granite4, Qwen3Coder, DeepSeekV31, Jamba, Ernie45, Hermes2Pro, llama3_json, mistral, …). Use it: instantiate the parser before the streaming loop and call its streaming method per delta, emitting DeltaMessage(content=…) or DeltaMessage(tool_calls=[…]) when the parser is ready. Falls back to the existing #10346 buffer path when: - the parser does not have extract_tool_calls_streaming, OR - extract_tool_calls_streaming raises mid-stream (logged, the rest of the request finishes via post-loop extract_tool_calls). Tests (TestStreamingToolParser): 1. Buffer path: no markup leaked, no content duplication 2. Native streaming: plain-text response streams progressively 3. Native streaming: tool_call structured, no markup leaked 4. Native streaming exception → graceful fallback, no markup, no crash 5. No tool parser → unchanged per-delta content stream E2E verified against qwen3_coder on vLLM 0.23.0 (NVIDIA GB10 / arm64 / CUDA 13). Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * docs(vllm): add server-side TTFT benchmark for the streaming tool-parser path Self-contained stdlib-only script that measures time-to-first-token (TTFT) for the vLLM backend's two streaming scenarios: - tool_call: request mentions a tool; model is expected to call it - plain_text: request offers a tool but explicitly asks for prose Use this to compare: - the buffer-all path (#10346) → plain_text TTFT ≈ total response time - the native-streaming path (this PR) → plain_text TTFT ≈ true first-token time python examples/vllm-bench/ttft_streaming_tool_parser.py \\ --url http://localhost:8080 --model my-coder --runs 3 Lives under examples/ so it does not interfere with the test suite. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * examples/vllm-bench: add long-text scenario (8 paragraphs, 1500 tokens) The long-text scenario shows the buffering vs streaming difference most dramatically: with the buffer-all path, the client receives nothing for 20+ seconds and then the entire 1500-token response at once. With native streaming, the first token arrives in tens of milliseconds and the response flows progressively. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> --------- Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> Co-authored-by: Philipp Wacker <philipp.wacker@ibf-solutions.com>	2026-06-21 17:07:15 +02:00
番茄摔成番茄酱	01fa12e0de	feat(nemo): enable word-level timestamps for ASR models (#10297 ) * feat(nemo): enable word-level timestamps for ASR models The nemo backend ignored timestamp_granularities and always returned a single segment with start=0 end=0, making word-level timestamps impossible to obtain even though the NeMo models (parakeet-tdt, etc.) fully support them. Changes: - Add _get_stride_seconds() to compute frame duration from the model's preprocessor window_stride and encoder subsampling_factor. - Add _build_segments_with_words() that extracts word offsets from the NeMo Hypothesis.timestamp dict and converts frame indices to nanosecond timestamps. - Support 'word' granularity (one segment per word) and 'segment' granularity (merge at time-gap boundaries using a dynamic threshold). - Populate TranscriptSegment.words with TranscriptWord entries so callers get both segment-level and word-level timing. - Only request timestamps from NeMo when the caller actually asks for them (timestamp_granularities is non-empty), keeping the fast path unchanged for callers that don't need timestamps. Tested with nvidia/parakeet-tdt-0.6b-v3 on the JFK "ask not" clip: curl -X POST /v1/audio/transcriptions \ -F file=@jfk.wav -F model=nemo-parakeet-tdt-0.6b \ -F 'timestamp_granularities[]=word' -F response_format=verbose_json → each word has correct start/end times in seconds. Signed-off-by: fqscfqj <fqscfqj@outlook.com> * fix(nemo): address Copilot review feedback - Narrow exception handling in _get_stride_seconds to catch only AttributeError, KeyError, TypeError instead of bare Exception, and emit a warning when falling back to the hardcoded stride. - Remove explicit return_hypotheses=False when timestamps are requested; timestamps=True already forces NeMo to return Hypothesis objects. - Add a warning when NeMo does not return Hypothesis objects despite timestamps being requested. Signed-off-by: fqscfqj <fqscfqj@outlook.com> --------- Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-21 17:04:19 +02:00
番茄摔成番茄酱	cf7f9573a2	fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421 ) The parakeet-specific word accessors can return stale initialisation data (model name, binary blobs) for segments with no real speech. Add isValidWord() to filter out words that have: - empty or whitespace-only text - U+FFFD replacement characters (from binary data scrubbing) - negative timestamps - zero duration (end <= start) Also skip empty segments entirely when they have no recognisable content (empty text AND no valid words), preventing spurious subtitle entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'. Applies to both AudioTranscription and AudioTranscriptionStream. Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-21 17:03:33 +02:00
pos-ei-don	c6303104c7	fix(vllm): structured outputs silently ignored on vLLM >= 0.23 (GuidedDecodingParams removed) (#10343 ) fix(vllm): structured outputs silently ignored on vLLM >= 0.23 vLLM >= 0.23 removed GuidedDecodingParams (now StructuredOutputsParams) and renamed the SamplingParams field guided_decoding -> structured_outputs. The import failed, HAS_GUIDED_DECODING became False, and the whole guided-decoding block was skipped, so response_format / grammar constraints were silently ignored. Adapt the existing request.Grammar path to the new class/field. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-21 17:02:31 +02:00
LocalAI [bot]	e19c43cf04	feat(gallery): add Depth Anything V2 models + bump native version (#10413 ) * feat(gallery): add Depth Anything V2 models + bump native version Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is depth-only (no camera pose, no confidence) and ships both relative (relative inverse depth) and metric (depth in metres) variants. The Go backend is model-agnostic, so no backend code changes are required — only a native version bump and new gallery entries. - backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the depth-anything.cpp commit that adds the DA2 engine + C-API routing (e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag da2-support so it survives a squash-merge). - gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models carry the metric-depth tag; none carry camera-pose. Assisted-by: Claude:claude-opus-4-8 * chore(depth-anything-cpp): pin to merged DA2 master commit PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint the pin from the pre-merge commit to the canonical master commit. Assisted-by: Claude:claude-opus-4-8 --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-20 14:56:16 +02:00
LocalAI [bot]	518381278e	chore: ⬆️ Update ggml-org/llama.cpp to `e475fa2b5f9fb50c3d6fc3e7c6fdf1e004465b62` (#10392 ) * ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama-cpp): adapt grpc-server to upstream server-schema split Upstream llama.cpp (e475fa2) extracted the JSON request-schema evaluation out of the static server_task::params_from_json_cmpl into the new server_schema::eval_llama_cmpl_schema (tools/server/server-schema.cpp). The grpc-server unity build still called the old static member, breaking every llama-cpp backend build with "no member named 'params_from_json_cmpl' in 'server_task'". Pull server-schema.cpp into the translation unit and call the new function, keeping both guarded by __has_include so forks that predate the split (e.g. llama-cpp-turboquant, which still exposes params_from_json_cmpl) keep compiling against the old static member. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-20 08:22:22 +02:00
LocalAI [bot]	93706fec57	chore: ⬆️ Update mudler/parakeet.cpp to `db755a78d39f789bb7d4e3935158a9e8105dbe36` (#10393 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:37:33 +02:00
LocalAI [bot]	11aee03a80	chore: ⬆️ Update localai-org/privacy-filter.cpp to `98f52c5ef2250f207cc6b9a6aef05393a120cb7c` (#10394 ) ⬆️ Update localai-org/privacy-filter.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:37:21 +02:00
LocalAI [bot]	8915f2ab91	chore: ⬆️ Update ggml-org/whisper.cpp to `5ed76e9a079962f1c85cfce44edd325c27ef1f97` (#10396 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:37:06 +02:00
LocalAI [bot]	f143d7f688	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d47f484d299cafad2e606afc0d31677a91b242d0` (#10410 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:36:51 +02:00
LocalAI [bot]	dd928f0bdd	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `26fcea5468e4069bc72d1f2fcc812c985e7361bb` (#10409 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:36:36 +02:00
LocalAI [bot]	c43a752afc	chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to `96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd` (#10408 ) ⬆️ Update ServeurpersoCom/omnivoice.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-20 01:36:22 +02:00
番茄摔成番茄酱	72d46c1115	feat(crispasr): add word-level timestamp support (#10403 ) * feat(crispasr): add word-level timestamp support Add word-level timestamp extraction to the crispasr backend by calling the CrispASR C library's word accessor functions that are already exported by libgocraspasr but were not previously bound by the Go wrapper. Two families of word functions are supported: 1. Session-based (get_word_count/text/t0/t1) — works per-segment for whisper-like backends. 2. Parakeet-specific (get_parakeet_word_count/text/t0/t1) — returns a global word list for TDT/CTC/RNNT parakeet models where the session API does not expose per-segment word data. The Go code tries session-based first and falls back to parakeet-specific when the session word count is zero. Depends on #10402 (grpc server Words forwarding) for the words to reach the HTTP response. Signed-off-by: fqscfqj <fqscfqj@outlook.com> * fix(crispasr): use portable sed -i.bak for macOS compatibility BSD sed requires -i '' for in-place editing while GNU sed uses -i. Replace with -i.bak which works on both platforms, then remove the backup file. Signed-off-by: fqscfqj <fqscfqj@outlook.com> --------- Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-19 21:34:30 +02:00
Richard Palethorpe	606128e4e9	feat(vulkan): make Vulkan backends self-contained on the GPU (#10404 ) Vulkan backends bundled their own loader and ICD manifests but neither the Mesa driver the manifests point at nor a way to make the loader find them, so on a runtime base image without Mesa the loader enumerated zero devices and the GPU silently fell back to CPU (only NVIDIA worked, since its ICD is injected by the container toolkit). - scripts/build/package-gpu-libs.sh: for each installed ICD manifest, bundle the driver .so its library_path names — no hard-coded, platform-dependent soname list — plus that driver's ldd dependencies, skipping manifests whose driver isn't installed. Rewrite each library_path to a bare soname so the bundled driver resolves via the LD_LIBRARY_PATH run.sh already sets. - .docker/install-base-deps.sh, backend/Dockerfile.golang, backend/Dockerfile.python: install mesa-vulkan-drivers in every Vulkan builder so the driver + manifests exist to be packaged (the LunarG SDK ships only the loader and shader tooling). - pkg/model/process.go: when a backend ships vulkan/icd.d/, point the loader at it via VK_DRIVER_FILES/VK_ICD_FILENAMES at launch (no-op otherwise). Covered by pkg/model/process_vulkan_test.go. - backend/go/parakeet-cpp/package.sh: complete the L0 stub (was missing the libc-family ldd walk + GPU-lib packaging) by mirroring whisper, so the vulkan-parakeet image actually bundles its GPU runtime. Assisted-by: Claude Code:claude-opus-4-8 Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-19 17:16:33 +02:00
LocalAI [bot]	4ad754eea3	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `b3dfb7858cfcb9166e92f366e5af87f19ebc94be` (#10395 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-19 00:03:37 +02:00
Tai An	c3b3336654	fix(whisperx): use whisperx.diarize.DiarizationPipeline with token kwarg (#10389 ) Signed-off-by: Anai-Guo <antai12232931@outlook.com>	2026-06-18 18:50:37 +02:00
Richard Palethorpe	3fa7b2955c	feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360 ) Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see backup/pii-ner-tier-engine-prerebase). Net change: - privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan). TokenClassify moves off the patched llama.cpp path onto this backend. - PII filter reworked to be NER-centric (encoder/NER detection tier scanning whole conversations as one document), with a recreated bounded restricted- regex secret-matching pattern detector tier alongside it (per-model pii_detection.builtins / .patterns + core/services/routing/piipattern). - Detection labelled by source (ner vs pattern); backend trace / confidence / debug observability; analyze/redact exposed as a synchronous API. - Instance-wide default detector policy + per-usecase default-on; request filtering extended to completions, embeddings, edits & Ollama. - React UI: NER-centric PII editor, detector-models table, pattern/builtins editor, middleware default-policy UI. - Gallery: privacy-filter-multilingual token-classify model + NER install filter; token_classify known_usecase; batch sized to context for NER models. privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13 meta + image entries with a capabilities map) matching its CI matrix jobs, and an /import-model auto-detect importer (PrivacyFilterImporter, narrow privacy-filter GGUF detection) replacing the prior pref-only registration. Reconciled against master's independent evolution: - Dropped master's PIIPatternOverrides feature (global-pattern runtime overrides + /api/pii/patterns API + runtime_settings.json persistence). The per-model NER + pattern-detector design supersedes it; it was built on the global redactor pattern set this branch replaced. - Reverted the llama.cpp Score carry-patch (0006-server-task-type-score): removed the patch and restored master's grpc-server.cpp Score RPC (direct llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's model_config validation forbidding score + chat/completion/embeddings on llama-cpp. token_classify is unaffected (it runs on the privacy-filter backend, not llama-cpp). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-18 11:45:22 +01:00
LocalAI [bot]	c133ca39dc	chore: ⬆️ Update ggml-org/llama.cpp to `f3e182816421c648188b5eab269853bf1531d950` (#10379 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 11:43:23 +02:00
LocalAI [bot]	91f97f2a54	chore: ⬆️ Update ggml-org/whisper.cpp to `86c40c3bd6fc86f1187fb751d111b49e0fc18e84` (#10382 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 08:34:43 +02:00
LocalAI [bot]	55f9ff6805	chore: ⬆️ Update mudler/parakeet.cpp to `92a5f0306be354c109150fe58ae4cc4f8a21ca45` (#10380 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 08:32:13 +02:00
LocalAI [bot]	5c2ae7857a	chore: ⬆️ Update antirez/ds4 to `80ebbc396aee40eedc1d829222f3362d10fa4c6c` (#10378 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 00:32:13 +02:00
LocalAI [bot]	4af360300f	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `71af16a6b7f6fb7315b346b4a51aad530599c3f5` (#10381 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 00:12:25 +02:00
LocalAI [bot]	95e7149c87	chore: ⬆️ Update ggml-org/llama.cpp to `74ade52741203e5c8f81eaf06a96cb1cfe15f2a3` (#10368 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 13:25:29 +02:00

1 2 3 4 5 ...

1515 Commits