LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-17 04:56:52 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	0353d3bd77	chore: ⬆️ Update ggml-org/whisper.cpp to `3e9b7d0fef3528ee2208da3cdb873a2c53d2ae2f` (#9808) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-14 00:20:14 +02:00
LocalAI [bot]	602866a9d8	chore: ⬆️ Update ggml-org/whisper.cpp to `338cce1e58133261753243802a0e7a430118866d` (#9793) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-13 00:39:57 +02:00
LocalAI [bot]	19d59102d5	feat(whisper-cpp): implement streaming transcription (#9751 ) * test(whisper): wire e2e streaming transcription target Adds test-extra-backend-whisper-transcription, mirroring the existing llama-cpp / sherpa-onnx / vibevoice-cpp targets. The generic AudioTranscriptionStream spec at tests/e2e-backends/backend_test.go:644 fails today because backend/go/whisper has no streaming impl - this target is the failing TDD gate that the next phase makes pass. Confirmed RED locally: 3 Passed (health, load, offline transcription), 1 Failed (streaming spec hits its 300s context deadline because the base implementation returns 'unimplemented' but doesn't close the result channel, leaving the gRPC stream open until the client times out). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper-cpp): expose new_segment_callback to the Go side Adds set_new_segment_callback() and a C-side trampoline that whisper.cpp invokes once per new text segment during whisper_full(). The trampoline dispatches (idx_first, n_new, user_data) to a Go function pointer registered via purego.NewCallback - text and timings are pulled by Go through the existing get_segment_text/get_segment_t0/get_segment_t1 getters. Wires the hook only when streaming is actually requested, to avoid a per-segment function-pointer dispatch on the offline path. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper-cpp): implement AudioTranscriptionStream Wires whisper.cpp's new_segment_callback through purego back to Go so the streaming transcription RPC produces real, time-correlated deltas while whisper_full() is still decoding. Each segment becomes one TranscriptStreamResponse{Delta}; whisper_full's return is the TranscriptStreamResponse{FinalResult} carrying the full segment list, language, and duration. 
Per-call state is tracked in a sync.Map keyed by an atomic counter; the Go callback registered via purego.NewCallback is a singleton, dispatched through user_data. SingleThread today means only one entry is ever live, but the map shape matches the sherpa-onnx TTS callback pattern. The streaming path's final.Text is the literal concat of every emitted delta (a strings.Builder accumulated by onNewSegment) so the e2e invariant `final.Text == concat(deltas)` holds exactly. The first delta has no leading space; subsequent deltas are space-prefixed. The offline AudioTranscription path is unchanged. Closes the gap with sherpa-onnx, vibevoice-cpp, llama-cpp, and tinygrad, which already implement AudioTranscriptionStream. Verified GREEN locally: make test-extra-backend-whisper-transcription passes 4/4 specs (3 Passed initially under RED, +1 streaming spec now). Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(whisper-cpp): assert progressive multi-segment streaming Drives AudioTranscriptionStream against a real long-audio fixture and asserts len(deltas) >= 2. The generic e2e spec at tests/e2e-backends/backend_test.go:644 only checks len(deltas) >= 1 which is satisfied by both real and faked streaming - this spec is the guardrail that a future "fake" impl can't sneak past. Skipped by default (env-gated, like the cancellation spec); set WHISPER_LIBRARY, WHISPER_MODEL_PATH, and WHISPER_AUDIO_PATH to a 30+ second clip to run. Verified locally with a 55s 5x-JFK concat against ggml-base.en.bin: 1 Passed in 7.3s, deltas >= 2, finalSegmentCount >= 2, concat(deltas) == final.Text. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(whisper-cpp): add transcription gRPC e2e job Mirrors tests-sherpa-onnx-grpc-transcription / tests-llama-cpp-grpc-transcription. 
Runs make test-extra-backend-whisper-transcription whenever the whisper backend or the run-all switch fires, so a pin-bump or refactor that breaks streaming transcription gets caught before merge. The whisper output on detect-changes is already emitted by scripts/changed-backends.js (it iterates allBackendPaths); this PR just exposes it as a workflow output and consumes it. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(whisper-cpp): silence errcheck on AudioTranscriptionStream defers golangci-lint runs with new-from-merge-base=origin/master, so the identical defer patterns in the existing offline AudioTranscription path are grandfathered while the new ones in AudioTranscriptionStream trip errcheck. Wrap both defers in `func() { _ = ... }()` to match what errcheck wants without altering behavior. The errors from os.RemoveAll and *os.File.Close are not actionable inside a defer here (we're already returning), matching the offline path's contract. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-10 23:11:46 +02:00
LocalAI [bot]	28f33be48f	chore: ⬆️ Update ggml-org/whisper.cpp to `c33c5618b72bb345df029b730b36bc0e369845a3` (#9749) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-05-10 21:32:47 +02:00
LocalAI [bot]	2be07f61da	feat(whisper): honor client cancellation via ggml abort_callback (#9710 ) * refactor(transcription): propagate request ctx through ModelTranscription* Replaces context.Background() with the HTTP request ctx so client disconnects start cancelling the gRPC call. No backend-side abort wiring yet — that comes in a later commit. Pure plumbing. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cli): pass ctx to backend.ModelTranscription Follow-up to `e65d3e1f` which threaded ctx through ModelTranscription but missed the CLI caller. CLI commands have no request-scoped ctx, so context.Background() is correct here. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(audio): propagate request ctx into TTS, sound-gen, audio-transform Same ctx-plumbing pattern applied to the rest of the audio path. CLI callers use context.Background() since there is no request scope; HTTP callers use c.Request().Context(). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(backend): propagate request ctx into biometric, detection, rerank, diarization paths Replaces remaining context.Background() sites in core/backend with the caller's ctx. After this commit, every core/backend/.go entry point threads the request ctx end-to-end to the gRPC client. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> refactor(grpc): plumb ctx through AIModel.AudioTranscription{,Stream} Adds context.Context as first parameter to the AIModel interface methods that wrap whisper-style transcription. Server-side gRPC handler now forwards the per-RPC ctx (server-streaming uses stream.Context()). Whisper, Voxtral, vibevoice-cpp, and sherpa-onnx accept the parameter; none uses it yet — the actual cancellation primitive lands in the next commit so this is pure plumbing. 
Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): add abort_callback hook in the C++ bridge Installs a std::atomic<int> flag, wires it into whisper_full_params.abort_callback, and exposes a set_abort(int) C symbol so Go can flip the flag from a goroutine watching the request context. transcribe() now distinguishes abort (return 2) from real whisper_full failure (return 1). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): register set_abort symbol in the purego loader Adds the Go-side binding for the new C export so the next commit can call CppSetAbort(1) from a watcher goroutine on ctx.Done(). Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(whisper): honor ctx cancellation and return codes.Canceled A watcher goroutine watches ctx.Done() during AudioTranscription and calls CppSetAbort(1) on cancel. whisper_full sees abort_callback return true at the next compute graph step, returns non-zero, and the bridge returns 2 -> AudioTranscription maps that to codes.Canceled. Adds an opt-in test (gated on WHISPER_MODEL_PATH / WHISPER_AUDIO_PATH) that asserts cancellation latency under 5s and proves the abort flag resets cleanly so the next transcription succeeds. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(whisper): join the cancel watcher goroutine before returning Follow-up to `85edf9d2`. The previous commit used `defer close(done)` and called the watcher "joined synchronously" — but close() only signals, it does not block until the goroutine exits. That left a window where a late CppSetAbort(1) from a cancelled call could land on the next call, after its C-side g_abort reset but before whisper_full() began polling the abort callback, corrupting the second transcription. 
Switch to a sync.WaitGroup join so wg.Wait() blocks until the watcher has actually returned from its select. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(whisper): short-circuit pre-cancelled ctx in AudioTranscription If ctx is already Done() at entry, return codes.Canceled immediately instead of running the full transcription. The C-side g_abort reset happens at the start of transcribe() and would otherwise overwrite a watcher-set abort flag from an already-cancelled ctx, producing a spurious successful transcription on a request the client has already abandoned. Assisted-by: Claude:claude-haiku-4-5 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(tests/distributed): update testLLM mock for new AudioTranscription signature Phase B (`93c48e19`) added context.Context to AIModel.AudioTranscription but missed the testLLM mock in tests/e2e/distributed. CI golangci-lint caught it: testLLM did not implement grpc.AIModel because the method signature lacked the ctx parameter, which broke the distributed test suite compilation and cascaded through every backend-build job that runs `go build ./...`. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> test(whisper): port cancellation test to Ginkgo/Gomega Project policy (.agents/coding-style.md, enforced by golangci-lint forbidigo) is that all Go tests must use Ginkgo v2 + Gomega — no stdlib testing patterns (t.Skip, t.Fatalf, etc.). Convert the cancellation test to a Describe/It block with Skip(...) for env gating and Expect/HaveOccurred for assertions. Same coverage: cancel mid-flight returns codes.Canceled within 5s and a follow-up transcription succeeds, proving the C-side g_abort flag resets cleanly. 
Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-08 01:44:47 +02:00
LocalAI [bot]	806130bbc0	chore: ⬆️ Update ggml-org/whisper.cpp to `c81b2dabbc45484dee2ca6658cfe39c841df5c70` (#9712) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-08 01:44:32 +02:00
LocalAI [bot]	0b9344ef3d	chore: ⬆️ Update leejet/stable-diffusion.cpp to `90e87bc846f17059771efb8aaa31e9ef0cab6f78` (#9701) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-07 08:29:41 +02:00
LocalAI [bot]	a8d7d37a3c	fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) (#9682 ) * fix(docs): correct broken Hugo relrefs The Hugo build has been failing on master since the relevant pages landed: - text-generation.md:720 referenced `/docs/features/distributed-mode`, but Hugo `relref` paths are relative to the content root, not the rendered URL. Drop the `/docs/` prefix so the lookup matches the existing `features/...` form used elsewhere in the file. - audio-transform.md:144 referenced `tts.md`; the actual page is `text-to-audio.md`. Assisted-by: Claude:claude-opus-4-7[1m] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(kokoros): stub Diarize and AudioTransform Backend trait methods The recent backend.proto additions (Diarize, AudioTransform, AudioTransformStream) extended the gRPC Backend trait, breaking kokoros-grpc compilation with E0046 because the Rust implementation hadn't picked up the new methods. Add Unimplemented stubs matching the existing pattern for non-applicable RPCs in this TTS-only backend. Assisted-by: Claude:claude-opus-4-7[1m] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(vibevoice-cpp): track upstream ABI + wire 1.5B voice cloning Two recent commits in mudler/vibevoice.cpp reshaped the vv_capi_tts signature without a corresponding bump on the LocalAI side: 3bd759c "1.5b: unify into a single tts entry point" inserted a ref_audio_path parameter between voice_path and dst_wav_path. ad856bd "1.5b: multi-speaker dialog support" promoted that to a (const char* const* ref_audio_paths, int n_ref_audio_paths) pair for per-speaker conditioning. Because purego resolves symbols by name and not by signature, the build kept linking; at runtime the misaligned arguments turned the TTS->ASR closed-loop test into a SIGSEGV inside cgo. Track HEAD explicitly and bring the bridge in line with it: * Update the CppTTS purego binding to the 9-arg form. 
purego marshals []byte as a char by handing the C side the underlying array address; nil/empty maps to NULL, which matches the C contract for "no reference audio" on the realtime-0.5B path. Add a `ref_audio` gallery option (comma-separated, repeatable) that the 1.5B path consumes for runtime voice cloning. Multiple entries are interpreted as one WAV per speaker (Speaker 0..n-1). * TTSRequest.Voice now routes by extension/shape: `.wav` or a comma-separated list goes to ref_audio_paths; anything else stays on voice_path (realtime-0.5B's pre-baked voice gguf). * Pin VIBEVOICE_CPP_VERSION to ad856bd and wire the Makefile into the existing bump_deps matrix so future upstream rolls land as reviewable PRs instead of a silent CI break. Assisted-by: Claude:claude-opus-4-7[1m] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(vibevoice-cpp): use ModelOptions.AudioPath for 1.5B ref audio Use the existing audio_path field from ModelOptions (already plumbed through config_file's `audio_path:` YAML and consumed by other audio backends like kokoros) instead of inventing a custom `ref_audio:` Options[] string. Multi-speaker setups stay on a single comma- separated value. No behavior change beyond the gallery key name; per-call routing via TTSRequest.Voice is unchanged. Assisted-by: Claude:claude-opus-4-7[1m] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-06 10:36:59 +02:00
LocalAI [bot]	7fab5e3d21	chore: ⬆️ Update ggml-org/whisper.cpp to `4bf733672b2871d4153158af4f621a6dd9104f4a` (#9636) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-06 00:34:16 +02:00
Ettore Di Giacinto	e86ade54a6	feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654 ) * feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp Closes #1648. OpenAI-style multipart endpoint that returns "who spoke when". Single endpoint instead of the issue's three-endpoint sketch (refactor /vad, /vad/embedding, /diarization) — the typical client wants one call, and embeddings can land later as a sibling without breaking this surface. Response shape borrows from Pyannote/Deepgram: segments carry a normalised SPEAKER_NN id (zero-padded, stable across the response) plus the raw backend label, optional per-segment text when the backend bundles ASR, and a speakers summary in verbose_json. response_format also accepts rttm so consumers can pipe straight into pyannote.metrics / dscore. Backends: * vibevoice-cpp — Diarize() reuses the existing vv_capi_asr pass. vibevoice's ASR prompt asks the model to emit [{Start,End,Speaker,Content}] natively, so diarization is a by-product of the same pass; include_text=true preserves the transcript per segment, otherwise we drop it. * sherpa-onnx — wraps the upstream SherpaOnnxOfflineSpeakerDiarization C API (pyannote segmentation + speaker-embedding extractor + fast clustering). libsherpa-shim grew config builders, a SetClustering wrapper for per-call num_clusters/threshold overrides, and a segment_at accessor (purego can't read field arrays out of SherpaOnnxOfflineSpeakerDiarizationSegment[] directly). Plumbing: new Diarize gRPC RPC + DiarizeRequest / DiarizeSegment / DiarizeResponse messages, threaded through interface.go, base, server, client, embed. Default Base impl returns unimplemented. 
Capability surfaces all updated: FLAG_DIARIZATION usecase, FeatureAudioDiarization permission (default-on), RouteFeatureRegistry entries for /v1/audio/diarization and /audio/diarization, audio instruction-def description widened, CAP_DIARIZATION JS symbol, swagger regenerated, /api/instructions discovery map updated. Tests: * core/backend: speaker-label normalisation (first-seen → SPEAKER_NN, per-speaker totals, nil-safety, fallback to backend NumSpeakers when no segments). * core/http/endpoints/openai: RTTM rendering (file-id basename, negative duration clamping, fallback id). * tests/e2e: mock-backend grew a deterministic Diarize that emits raw labels "5","2","5" so the e2e suite verifies SPEAKER_NN remapping, verbose_json speakers summary + transcript pass-through (gated by include_text), RTTM bytes content-type, and rejection of unknown response_format. mock-diarize model config registered with known_usecases=[FLAG_DIARIZATION] to bypass the backend-name guard. Docs: new features/audio-diarization.md (request/response, RTTM example, sherpa-onnx + vibevoice setup), cross-link from audio-to-text.md, entry in whats-new.md. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(diarization): correct sherpa-onnx symbol name + lint cleanup CI failures on #9654: * sherpa-onnx-grpc-{tts,transcription} and sherpa-onnx-realtime panicked at backend startup with `undefined symbol: SherpaOnnxDestroyOfflineSpeakerDiarizationResult`. Upstream's actual symbol is SherpaOnnxOfflineSpeakerDiarizationDestroyResult (Destroy in the middle, not the prefix); the rest of the diarization surface follows the same naming pattern. The mismatched name made purego.RegisterLibFunc fail at dlopen time and crashed the gRPC server before the BeforeAll could probe Health, taking down every sherpa-onnx test job — not just the diarization-related ones. 
* golangci-lint flagged 5 errcheck violations on new defer cleanups (os.RemoveAll / Close / conn.Close); wrap each in a `defer func() { _ = X() }()` closure (matches the pattern other LocalAI files use for new code, since pre-existing bare defers are grandfathered in via new-from-merge-base). * golangci-lint also flagged forbidigo violations: the new diarization_test.go files used testing.T-style `t.Errorf` / `t.Fatalf`, which are forbidden by the project's coding-style policy (.agents/coding-style.md). Convert both files to Ginkgo/Gomega Describe/It with Expect(...) — they get picked up by the existing TestBackend / TestOpenAI suites, no new suite plumbing needed. * modernize linter: tightened the diarization segment loop to `for i := range int(numSegments)` (Go 1.22+ idiom). Verified locally: golangci-lint with new-from-merge-base=origin/master reports 0 issues across all touched packages, and the four mocked diarization e2e specs in tests/e2e/mock_backend_test.go still pass. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(vibevoice-cpp): convert non-WAV input via ffmpeg + raise ASR token budget Confirmed end-to-end against a real LocalAI instance with vibevoice-asr-q4_k loaded and the multi-speaker MP3 sample at vibevoice.cpp/samples/2p_argument.mp3: both /v1/audio/transcriptions and /v1/audio/diarization now succeed and return correctly attributed speaker turns for the full clip. Two latent issues surfaced once the diarization endpoint actually exercised the backend with a non-trivial input: 1. vv_capi_asr only accepts WAV via load_wav_24k_mono. The previous code passed the uploaded path straight through, so anything that wasn't already a 24 kHz mono s16le WAV failed at the C side with rc=-8 and the very unhelpful "vv_capi_asr failed". 
prepareWavInput shells out to ffmpeg ("-ar 24000 -ac 1 -acodec pcm_s16le") in a per-call temp dir, matching the rate the model was trained on; both AudioTranscription and Diarize now route through it. This is the same shape sherpa-onnx uses (utils.AudioToWav), but vibevoice needs 24 kHz rather than 16 kHz so we don't reuse that helper. 2. The C ABI's max_new_tokens defaults to 256 when 0 is passed. That's fine for a five-second clip but not for anything past ~10 s — vibevoice stops mid-JSON, the parse fails, and the caller sees a hard error. Pass a much larger budget (16 384 ≈ ~9 minutes of speech at the model's ~30 tok/s rate); generation stops at EOS so this is a cap rather than a target. 3. As a defensive belt-and-braces, mirror AudioTranscription's existing "fall back to a single segment if the model emits non-JSON text" pattern in Diarize, so partial / unusual model output never produces a 500. This kept the endpoint usable while diagnosing (1) and (2), and is the right behaviour to keep. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(vibevoice-cpp): pass valid WAVs through directly so ffmpeg is not required at runtime Spotted by tests-e2e-backend (1.25.x): the previous fix forced every incoming audio file through `ffmpeg -ar 24000 ...`, which meant the backend container — which does not ship ffmpeg — failed even for the existing happy path where the caller already uploads a WAV. The container-side error was: rpc error: code = Unknown desc = vibevoice-cpp: ffmpeg convert to 24k mono wav: exec: "ffmpeg": executable file not found in $PATH Reading vibevoice.cpp's audio_io.cpp, `load_wav_24k_mono` uses drwav and already accepts any PCM/IEEE-float WAV at any sample rate, downmixes multi-channel input to mono, and resamples to 24 kHz internally. So the only inputs that genuinely need an external converter are non-WAV formats (MP3, OGG, FLAC, ...). 
Detect WAVs by RIFF/WAVE magic at bytes 0..3 / 8..11 and pass them straight through with a no-op cleanup; everything else still goes through ffmpeg with the same 24 kHz mono s16le target. The result: * Container builds without ffmpeg keep working for WAV uploads (the e2e-backends fixture is jfk.wav at 16 kHz mono s16le). * MP3 and other non-WAV inputs still get the new ffmpeg conversion path so the diarization endpoint stays useful. * If the caller uploads a non-WAV but ffmpeg isn't on PATH, the surfaced error is still descriptive enough to act on. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] * fix(ci): make gcc-14 install in Dockerfile.golang best-effort for jammy bases The LocalVQE PR (`bb033b16`) made `gcc-14 g++-14` an unconditional apt install in backend/Dockerfile.golang and pointed update-alternatives at them. That works on the default `BASE_IMAGE=ubuntu:24.04` (noble has gcc-14 in main), but every Go backend that builds on `nvcr.io/nvidia/l4t-jetpack:r36.4.0` — jammy under the hood — now fails at the apt step: E: Unable to locate package gcc-14 This blocked unrelated jobs: backend-jobs(*-nvidia-l4t-arm64-{stablediffusion-ggml, sam3-cpp, whisper, acestep-cpp, qwen3-tts-cpp, vibevoice-cpp}). LocalVQE itself is only matrix-built on ubuntu:24.04 (CPU + Vulkan), so it doesn't actually need gcc-14 anywhere else. Make the gcc-14 install conditional on the package being available in the configured apt repos. On noble: identical behaviour to today (gcc-14 installed, update-alternatives points at it). On jammy: skip the gcc-14 stanza entirely and let build-essential's default gcc take over, which is what the other Go backends compile with anyway. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-05 15:10:13 +02:00
Richard Palethorpe	bb033b16a9	feat: add LocalVQE backend and audio transformations UI (#9640 ) feat(audio-transform): add LocalVQE backend, bidi gRPC RPC, Studio UI Introduce a generic "audio transform" capability for any audio-in / audio-out operation (echo cancellation, noise suppression, dereverberation, voice conversion, etc.) and ship LocalVQE as the first backend implementation. Backend protocol: - Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and bidirectional AudioTransformStream for low-latency frame-by-frame use. This is the first bidi stream in the proto; per-frame unary at LocalVQE's 16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server, embed,interface,base} with paired-channel ergonomics. LocalVQE backend (backend/go/localvqe/): - Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream shared lib + its libggml-cpu-.so runtime variants directly — no MODULE wrapper needed because LocalVQE handles CPU feature selection internally via GGML_BACKEND_DL. - Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it LocalVQE runs single-threaded at ~1× realtime instead of the documented ~9.6×. - Reference-length policy: zero-pad short refs, truncate long ones (the trailing portion can't have leaked into a mic that wasn't recording). - Ginkgo test suite (9 always-on specs + 2 model-gated). HTTP layer: - POST /audio/transformations (alias /audio/transform): multipart batch endpoint, accepts audio + optional reference + params[]=v form fields. Persists inputs alongside the output in GeneratedContentDir/audio so the React UI history can replay past (audio, reference, output) triples. - GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames (interleaved stereo mic+ref in, mono out). JSON session.update envelope for config; constants hoisted in core/schema/audio_transform.go. 
- ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing utils.AudioToWav (with passthrough fast-path), so the user can upload any format / rate without seeing the model's strict 16 kHz constraint. - BackendTraceAudioTransform integration so /api/backend-traces and the Traces UI light up with audio_snippet base64 and timing. - Routes registered under routes/localai.go (LocalAI extension; OpenAI has no /audio/transformations endpoint), traced via TraceMiddleware. Auth + capability + importer: - FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on, in APIFeatures), three RouteFeatureRegistry rows. - localvqe added to knownPrefOnlyBackends with modality "audio-transform". - Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on huggingface.co/LocalAI-io/LocalVQE). React UI: - New /app/transform page surfaced via a dedicated "Enhance" sidebar section (sibling of Tools / Biometrics) — the page is enhancement, not generation, so it lives outside Studio. Two AudioInput components (Upload + Record tabs, drag-drop, mic capture). - Echo-test button: records mic while playing the loaded reference through the speakers — the mic naturally picks up speaker bleed, giving a real (mic, ref) pair for AEC testing without leaving the UI. - Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls) and useAudioPeaks hook (shared module-scoped AudioContext to avoid hitting browser context limits with three players on one page); migrated TTS, Sound, Traces audio blocks to use it. - Past runs saved in localStorage via useMediaHistory('audio-transform') — the history entry stores all three URLs so clicking re-renders the full triple, not just the output. Build + e2e: - 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm, SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those two and let GPU-class hardware route through Vulkan in the gallery capabilities map. 
- tests-localvqe-grpc-transform job in test-extra.yml (gated on detect-changes.outputs.localvqe). - New audio_transform capability + 4 specs in tests/e2e-backends. - Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js (8 specs covering tabs, file upload, multipart shape, history, errors). Docs: - New docs/content/features/audio-transform.md covering the (audio, reference) mental model, batch + WebSocket wire formats, LocalVQE param keys, and a YAML config example. Cross-links from text-to-audio and audio-to-text feature pages. Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-04 22:07:11 +02:00
LocalAI [bot]	76971fb2aa	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3d6064b37ef4607917f8acf2ca8c8906d5087413` (#9617) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-30 08:43:42 +02:00
Ettore Di Giacinto	fe6eb57082	feat(vibevoice-cpp): add purego TTS+ASR backend (#9610 ) * feat(vibevoice-cpp): add purego TTS+ASR backend Wire up Microsoft VibeVoice via the vibevoice.cpp C ABI as a new purego-based Go backend that serves both Backend.TTS and Backend.AudioTranscription from a single gRPC binary. Mirrors the qwen3-tts-cpp / sherpa-onnx pattern so the variant matrix (cpu/cuda12/cuda13/metal/rocm/sycl-f16/f32/vulkan/l4t) and the e2e-backends gRPC harness reuse existing infrastructure. - backend/go/vibevoice-cpp/ - Makefile, CMakeLists, purego shim, gRPC Backend with model-dir auto-detection, closed-loop TTS->ASR smoke test - backend/index.yaml - &vibevoicecpp meta + 18 image entries - Makefile - .NOTPARALLEL, BACKEND_VIBEVOICE_CPP, docker-build wiring, test-extra-backend-vibevoice-cpp-{tts,transcription} e2e wrappers - .github/workflows/backend.yml - matrix entries for all variants - .github/workflows/test-extra.yml - per-backend smoke + 2 gRPC e2e jobs * feat(vibevoice-cpp): drop hardcoded glob detection, add gallery entries Refactor backend Load() to follow the standard Options[] convention used by sherpa-onnx and the rest of the multi-role backends: ModelFile is the primary gguf, supplementary paths come through opts.Options[] as key=value (or key:value for Make-target compat), resolved against opts.ModelPath. type=asr/tts decides the role of ModelFile when neither tts_model nor asr_model is set explicitly. Add gallery/index.yaml entries: - vibevoice-cpp - realtime 0.5B Q8_0 TTS + tokenizer + Carter voice - vibevoice-cpp-asr - long-form ASR Q8_0 + tokenizer Both pull from huggingface://mudler/vibevoice.cpp-models with sha256 verification. parameters.model + Options[] paths are siblings under {models_dir} per the qwen3-tts-cpp convention. Update Makefile e2e wrappers to pass BACKEND_TEST_OPTIONS comma+colon style, and tighten the per-backend Go closed-loop test to use the explicit Options API. 
* fix(vibevoice-cpp): force whole-archive link so vv_capi_* exports survive libvibevoice is a STATIC archive linked into the MODULE library. Without --whole-archive (or -force_load on Apple, /WHOLEARCHIVE on MSVC), the linker garbage-collects symbols not referenced from this translation unit - which means dlopen+RegisterLibFunc panics with 'undefined symbol: vv_capi_load' at backend startup, since purego looks them up by name and our cpp/govibevoicecpp.cpp doesn't call them directly. * test(vibevoice-cpp): rewrite suite with Ginkgo v2 Match the convention used by backend/go/sherpa-onnx/backend_test.go. The suite now covers backend semantics that don't need purego (Locking, empty-ModelFile rejection, TTS/ASR-without-loaded-model errors) on top of the gRPC lifecycle specs (Health, Load, closed-loop TTS->ASR). Model-dependent specs Skip() when VIBEVOICE_MODEL_DIR is unset, so `go test ./backend/go/vibevoice-cpp/` is green on a clean checkout and runs the heavyweight closed-loop spec when test.sh has staged the bundle. * fix(vibevoice-cpp): implement TTSStream + AudioTranscriptionStream The gRPC server's stream handlers (pkg/grpc/server.go) spawn a goroutine that ranges over a chan; the only thing closing that chan is the backend's own Stream method. With the default Base stub returning 'unimplemented' and never touching the chan, the server goroutine hangs forever and the client hits DeadlineExceeded - which is exactly what the e2e harness saw in the test-extra-backend-vibevoice-cpp-tts matrix run. TTSStream synthesizes via vv_capi_tts to a tempfile, then emits a streaming WAV header (chunk sizes 0xFFFFFFFF so HTTP clients can start playback before the full PCM lands) followed by the PCM body in 64 KB slices. The header + >=2 PCM frames satisfy the harness's 'expected >=2 chunks' assertion and give a real progressive stream. 
AudioTranscriptionStream runs the offline transcription, emits each segment as a delta, and closes with a final_result whose Text equals the concatenated deltas (the harness asserts those match). Two new Ginkgo specs guard the close-channel-on-error path so the deadline-exceeded regression can't come back silently. fix(vibevoice-cpp): silence errcheck on cleanup paths Lint flagged six unchecked Close()/Remove()/RemoveAll() calls along purely-cleanup deferred paths. Wrap each in '_ = ...' (or a closure for defers that take args) - matches what the rest of the LocalAI backend/go/* tree already does for these callsites. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(vibevoice-cpp): closed-loop slot fill + modelRoot-relative path resolution Two bugs the test-extra-backend-vibevoice-cpp-* CI matrix surfaced: 1. Closed-loop Load with ModelFile=tts.gguf + Options[asr_model=...] left v.ttsModel empty, because the default-fill block only ran when BOTH slots were empty. vv_capi_load then got tts="" + a voice and the C side rejected it with rc=-3 'TTS model required to load a voice'. Fix: ModelFile fills the primary role-slot (decided by 'type=' in Options, defaulting to tts) independently of the secondary, so ModelFile + asr_model resolves to both. 2. resolvePath stat'd CWD before falling back to relTo. With LocalAI launched from a directory that happens to contain a same-named file, supplementary Options[] paths could leak away from the models dir. Drop the CWD probe entirely - relative paths now always join onto opts.ModelPath (the gallery convention). New Ginkgo coverage: * 'ModelFile slot resolution' (4 specs) - asr_model+ModelFile, type=asr, explicit tts_model override, key:value variant. * 'resolvePath (relative-to-modelRoot)' (5 specs) - join, abs passthrough, empty input, empty relTo, and the CWD-trap regression test. * 'Load resolves relative Options paths against opts.ModelPath' - end- to-end gallery layout round-trip. 
Verified locally: 19/19 specs pass (with model bundle, including the closed-loop TTS->ASR; without bundle, 17 pass + 2 model-dependent skip). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(vibevoice-cpp): use gallery convention in closed-loop spec The 'loads the realtime TTS model' / closed-loop specs were passing already-prefixed paths into Options[]: Options: ['tokenizer=' + filepath.Join(modelDir, 'tokenizer.gguf')] Combined with no ModelPath set on the request, the backend's modelRoot fell back to filepath.Dir(ModelFile) = modelDir, then resolvePath joined the prefixed Options path on top of it - producing 'vibevoice-models/vibevoice-models/tokenizer.gguf' when the CI's VIBEVOICE_MODEL_DIR is the relative './vibevoice-models'. The fix is to mirror the gallery contract LocalAI core actually sends in production: ModelPath is the models root (absolute), ModelFile is a name under it, every Options[] path is relative to ModelPath. Uses filepath.Base() to get bare filenames. Verified locally with both VIBEVOICE_MODEL_DIR=/tmp/vv-bundle (abs) and VIBEVOICE_MODEL_DIR=vibevoice-models (the relative shape that broke CI). Both: 19/19 specs pass, ~55-60s. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(vibevoice-cpp): switch ASR to Q4_K + bump transcription timeout The Q8_0 ASR gguf is ~14 GB - too big to fit alongside the runner image, the docker build cache, and the test artifacts on a free ubuntu-latest GHA runner; 'test-extra-backend-vibevoice-cpp-transcription' was getting SIGTERM'd at 90 min before the model could finish loading. Switch to Q4_K (~10 GB on disk, slightly faster CPU decode) for: * the e2e harness Make target * the gallery 'vibevoice-cpp-asr' entry (parameters + files block) * the per-backend test.sh auto-download list Bump tests-vibevoice-cpp-grpc-transcription's timeout-minutes from 90 to 150 - even with Q4_K, the 30 s JFK clip on a CPU runner needs runway above the previous 90 min cap. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(vibevoice-cpp): drop transcription gRPC e2e job - too heavy for free runners The vibevoice ASR is a 7B-parameter model. Even on Q4_K (~10 GB on disk) a single 30 s transcription saturates the per-test 30 min timeout in the e2e-backends harness on a 4-core ubuntu-latest, and the 10 GB download + Docker layer + working space leaves no headroom on the runner's free disk. Two attempts in CI got SIGTERM'd at the LoadModel boundary - the bottleneck isn't tunable from the workflow side without a paid-tier runner. The per-backend tests-vibevoice-cpp job already runs the same AudioTranscription path via a closed-loop TTS->ASR Ginkgo spec - same gRPC contract, same model, single process - so the standalone tests-vibevoice-cpp-grpc-transcription job was redundant on top of the disk/CPU pressure. The Makefile target test-extra-backend-vibevoice-cpp-transcription stays for local invocation on workstations that can afford it - useful when developing the streaming codepaths. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(vibevoice-cpp): restore transcription gRPC e2e on bigger-runner Switch tests-vibevoice-cpp-grpc-transcription from ubuntu-latest to the self-hosted 'bigger-runner' label that GPU image builds in backend.yml use, plus the documented Free-disk-space prep step (purge dotnet / ghc / android / CodeQL caches) the disabled vllm/sglang entries in this file describe. That gives the 7B-param Q4_K ASR model the disk + CPU runway it needs. Keep timeout-minutes: 150 - even on a beefier runner the 30 s JFK decode plus 10 GB download has to fit comfortably. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(vibevoice-cpp): apt-get install make on bigger-runner before transcription e2e bigger-runner is a self-hosted bare runner without the standard ubuntu image's preinstalled build tools, so the previous job died at the very first command with 'make: command not found' (exit 127). 
Add the Dependencies step that the disabled vllm/sglang entries in this file already document - apt-get installs make + build-essential + curl + unzip + ca-certificates + git + tar before the make target runs. Mirrors how every other 'runs-on: bigger-runner' entry in backend.yml prepares the runner. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-29 22:22:14 +02:00
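Note on the TTSStream / AudioTranscriptionStream fix in the commit above: the hang it describes comes from a consumer ranging over a channel that nobody closes. A minimal sketch of the close-on-every-path pattern follows. The names streamSegments, collect and errNoModel are hypothetical and are not taken from the LocalAI tree; the real Stream methods have different signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoModel is a hypothetical error used only in this sketch.
var errNoModel = errors.New("no model loaded")

// streamSegments is a hypothetical stand-in for a backend Stream method.
// It owns the results channel and closes it on every return path,
// including the error path, so a consumer ranging over it always finishes.
func streamSegments(segments []string, loaded bool, results chan<- string) error {
	defer close(results)
	if !loaded {
		return errNoModel
	}
	for _, s := range segments {
		results <- s
	}
	return nil
}

// collect mimics the server-side consumer: it ranges over the channel
// until the producer closes it, then returns what it received together
// with the producer's error.
func collect(segments []string, loaded bool) ([]string, error) {
	results := make(chan string)
	errCh := make(chan error, 1)
	go func() { errCh <- streamSegments(segments, loaded, results) }()

	var got []string
	for s := range results {
		got = append(got, s)
	}
	return got, <-errCh
}

func main() {
	got, err := collect([]string{"hello", "world"}, true)
	fmt.Println(got, err)

	got, err = collect(nil, false)
	fmt.Println(len(got), err)
}
```

Without the deferred close, the second call would block forever in the range loop, which is the shape of the DeadlineExceeded failure the commit message reports.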
Richard Palethorpe	4443250756	chore: add golangci-lint with new-from-merge-base baseline (#9603 ) * chore: add golangci-lint with new-from-merge-base baseline Configure golangci-lint v2 with the standard linter set (errcheck, govet, ineffassign, unused) plus forbidigo, which enforces the Ginkgo/Gomega-only test convention from .agents/coding-style.md by rejecting stdlib testing calls (t.Errorf, t.Fatalf, t.Run, ...). staticcheck is disabled — the codebase has many pre-existing QF-style suggestions not worth gating on. issues.new-from-merge-base = master makes the lint job a gate for new issues only; the ~1300 pre-existing baseline stays visible via 'make lint-all' for incremental cleanup. CI runs 'make lint'. Backends needing C/C++ headers we don't install in the lint runner are excluded via a deny list in the Makefile (backend/go/{piper,silero-vad, llm}, cmd/launcher). Discovery still flows through 'go list ./...', so new packages are scanned automatically. To make backend/go/{sam3-cpp,stablediffusion-ggml,whisper} typecheckable, move their .cpp/.h sources into cpp/ subdirs (matching qwen3-tts-cpp / acestep-cpp). Without this 'go list' rejects the package because Go does not allow .cpp alongside .go without cgo. Fix two real bugs found by lint in tests/integration/ (run only via 'make test-stores', not default CI): a stale zerolog reference left over from the slog migration (`c37785b7`) and an unused 'os' import. Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write] Signed-off-by: Richard Palethorpe <io@richiejp.com> * ci(lint): generate proto sources and fetch full history The lint job was failing for two reasons: - pkg/grpc/proto/.go is generated, not checked in. Several packages import it, so without 'make protogen-go' typecheck fails project-wide with "no required module provides package github.com/mudler/LocalAI/ pkg/grpc/proto". 
- golangci-lint's new-from-merge-base needs to git-merge-base the PR against master, but actions/checkout's default shallow clone doesn't fetch master. fetch-depth: 0 brings full history; the config now references origin/master (the remote-tracking branch that survives the shallow checkout) instead of bare master (which doesn't exist locally after checkout). Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write] Signed-off-by: Richard Palethorpe <io@richiejp.com> ci(lint): stub react-ui/dist for go:embed glob core/http/app.go has //go:embed react-ui/dist/*. The glob must match at least one non-hidden entry or typecheck fails the whole core/http package. We don't need the real React bundle to lint Go code, so just touch an empty index.html to satisfy the embed. Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-28 22:07:44 +02:00
LocalAI [bot]	f3500223d7	chore: ⬆️ Update leejet/stable-diffusion.cpp to `a81677f59c92d90343aebca51dfed7decf0a0cb0` (#9586 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-28 08:44:10 +02:00
Ettore Di Giacinto	5b0196c7d0	fix(whisper): scrub invalid UTF-8 from segment text before protobuf marshal whisper.cpp can emit bytes that are not valid UTF-8 — typically a multibyte codepoint split across token boundaries. protobuf string fields reject those at marshal time, which would surface as a transcribe failure. Run strings.ToValidUTF8 on the segment text before it leaves the cgo boundary so the bad byte gets replaced with U+FFFD. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code]	2026-04-26 19:35:39 +00:00
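Note on the UTF-8 scrub in the commit above: a small sketch of what strings.ToValidUTF8 does to a codepoint cut in half. The helper name scrubSegment and the sample string are assumptions made for illustration; only the use of strings.ToValidUTF8 with U+FFFD comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// scrubSegment is a hypothetical helper: it replaces each run of invalid
// UTF-8 bytes with U+FFFD so the result is safe for a protobuf string field.
func scrubSegment(text string) string {
	return strings.ToValidUTF8(text, "\uFFFD")
}

func main() {
	// "é" is 0xC3 0xA9 in UTF-8. Keeping only the first byte simulates a
	// multibyte codepoint split across a token boundary.
	broken := "caf\xc3"
	fixed := scrubSegment(broken)

	fmt.Println(utf8.ValidString(broken), utf8.ValidString(fixed))
	fmt.Printf("%q\n", fixed)
}
```

Valid input passes through unchanged, so the scrub can be applied to every segment without a prior validity check.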
LocalAI [bot]	1c9592c77f	chore: ⬆️ Update leejet/stable-diffusion.cpp to `b8bdffc19962be7e5a84bfefeb2e31bd885b571a` (#9521 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-24 15:15:15 +02:00
Richard Palethorpe	13734ae9fa	feat: Add Sherpa ONNX backend for ASR and TTS (#8523 ) feat(backend): Add Sherpa ONNX backend and Omnilingual ASR Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same approach as opus/stablediffusion-ggml/whisper — a thin C shim (csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego can't reach directly: nested struct config writes, result-struct field reads, and the streaming TTS callback trampoline. The Go side uses opaque uintptr handles and purego.NewCallback for the TTS callback. Supports: - VAD via sherpa-onnx's Silero VAD - Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC - Online/streaming ASR: zipformer transducer with endpoint detection (AudioTranscriptionStream emits delta events during decode) - Offline TTS: VITS (LJS, etc.) - Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel, prefixed by a streaming WAV header Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR), silero-vad-sherpa, vits-ljs-sherpa. E2E coverage: tests/e2e-backends for offline + streaming ASR, tests/e2e for the full realtime pipeline (VAD + STT + TTS). Assisted-by: claude-opus-4-7-1M [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-24 14:40:06 +02:00
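Note on the "streaming WAV header" mentioned in the commit above: the vibevoice-cpp commit earlier in this log describes such a header as one whose chunk sizes are 0xFFFFFFFF, followed by PCM in 64 KB slices. The sketch below builds a header of that shape using the standard 44-byte PCM WAV layout. The function names and the 24 kHz mono 16-bit parameters are assumptions for illustration, not values taken from either backend.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// streamingWAVHeader returns a 44-byte PCM WAV header whose RIFF and data
// chunk sizes are 0xFFFFFFFF, which signals that the length is not known
// up front.
func streamingWAVHeader(sampleRate uint32, channels, bitsPerSample uint16) []byte {
	le := binary.LittleEndian
	blockAlign := channels * bitsPerSample / 8
	byteRate := sampleRate * uint32(blockAlign)

	h := make([]byte, 44)
	copy(h[0:4], "RIFF")
	le.PutUint32(h[4:8], 0xFFFFFFFF) // RIFF chunk size: unknown
	copy(h[8:12], "WAVE")
	copy(h[12:16], "fmt ")
	le.PutUint32(h[16:20], 16) // size of the fmt chunk for PCM
	le.PutUint16(h[20:22], 1)  // audio format 1 = PCM
	le.PutUint16(h[22:24], channels)
	le.PutUint32(h[24:28], sampleRate)
	le.PutUint32(h[28:32], byteRate)
	le.PutUint16(h[32:34], blockAlign)
	le.PutUint16(h[34:36], bitsPerSample)
	copy(h[36:40], "data")
	le.PutUint32(h[40:44], 0xFFFFFFFF) // data chunk size: unknown
	return h
}

// chunkPCM splits a PCM buffer into slices of at most size bytes.
func chunkPCM(pcm []byte, size int) [][]byte {
	var chunks [][]byte
	for len(pcm) > 0 {
		n := size
		if len(pcm) < n {
			n = len(pcm)
		}
		chunks = append(chunks, pcm[:n])
		pcm = pcm[n:]
	}
	return chunks
}

func main() {
	header := streamingWAVHeader(24000, 1, 16)
	chunks := chunkPCM(make([]byte, 150000), 64*1024)
	fmt.Println(len(header), len(chunks))
}
```

A consumer would send the header first and then each PCM slice in order; the sketch leaves the transport out.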
Ettore Di Giacinto	4906cbad04	feat: add biometrics UI (#9524 ) * feat(react-ui): add Face & Voice Recognition pages Expose the face and voice biometrics endpoints (/v1/face/, /v1/voice/) through the React UI. Each page has four tabs driving the six endpoints per modality: Analyze (demographics with bounding boxes / waveform segments), Compare (verify with a match gauge and live threshold slider), Enrollment (register / identify / forget with a top-K matches view), Embedding (raw vector inspector with sparkline + copy). MediaInput supports file upload plus live capture: webcam snap-to-canvas for face, MediaRecorder -> AudioContext -> 16-bit PCM mono WAV transcode for voice (libsndfile on the backend only handles WAV/FLAC/OGG natively). Sidebar gets a new Biometrics section feature-gated on face_recognition / voice_recognition; routes are wrapped in <RequireFeature>. No new dependencies -- Font Awesome icons picked from the Free set. Assisted-by: Claude:Opus 4.7 * fix(localai): accept data URI prefixes with codec/charset params Browser MediaRecorder produces data URIs like data:audio/webm;codecs=opus;base64,... so the pre-';base64,' section can carry multiple parameter segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go and core/http/endpoints/localai/audio.go only matched exactly one segment, so recordings straight from the React UI's live-capture tab failed the strip and then tripped the base64 decoder on the leading 'data:' literal, surfacing as "invalid audio base64: illegal base64 data at input byte 4" Widened both regexes to `^data:[^,]+?;base64,` so any number of ';param=value' segments between the mime type and ';base64,' are tolerated. Added a regression test covering the MediaRecorder shape. 
Assisted-by: Claude:Opus 4.7 * fix(insightface): scope pack ONNX loading to known manifests LocalAI's gallery extracts buffalo_* zips flat into the models directory, which inevitably mixes with ONNX files from other backends (opencv face engine, MiniFASNet antispoof, WeSpeaker voice embedding) and older buffalo pack installs. Feeding those foreign files into insightface's model_zoo.get_model() blows up inside the router -- it assumes a 4-D NCHW input and indexes `input_shape[2]` on tensors that aren't shaped like a face model, raising IndexError mid-load and leaving the backend unusable. The router's dispatch isn't amenable to per-file try/except alone (first-file-wins picks det_10g.onnx from buffalo_l even when the user asked for buffalo_sc -- alphabetical order happens to favour the wrong pack). Instead, ship an explicit manifest of the upstream v0.7 pack contents and scope the glob to that when the requested pack is known. The manifest is small and stable; future packs can be added alongside or fall through to the tolerance loop, which also swallows any remaining IndexError / ValueError from foreign files with a clear `[insightface] skipped` stderr line for diagnostics. Assisted-by: Claude:Opus 4.7 * fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders Pre-exported speaker-encoder ONNX graphs come in two shapes: rank-2 [batch, samples] -- some 3D-Speaker exports, take raw waveform directly. rank-3 [batch, frames, n_mels] -- WeSpeaker and most Kaldi- lineage encoders, expect pre-computed Kaldi FBank. OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` -- correct for rank-2, IndexError-on-input_shape[3] on rank-3, which surfaced to the UI as "Invalid rank for input: feats Got: 2 Expected: 3" Detect the input rank at session init and run Kaldi FBank (80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before the forward pass when rank>=3. All knobs are configurable via backend options for encoders that deviate from defaults. 
torchaudio.compliance.kaldi is already in the backend's requirements (SpeechBrain pulls torchaudio in), so no new dependency. Assisted-by: Claude:Opus 4.7 * fix(biometrics): isolate face and voice vector stores Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker 256-D) biometric embeddings were colliding inside a single in-memory local-store instance. Enrolling one after the other failed with "Try to add key with length N when existing length is M" because local-store correctly refuses to mix dimensions in one keyspace. The registries were constructed with `storeName=""`, which in StoreBackend() is just a WithModel() call. But ModelLoader's cache is keyed on `modelID`, not `model` -- so both registries collapsed to the same `modelID=""` slot and reused the same backend process despite looking isolated on paper. Three complementary fixes: 1. application.go -- give each registry a distinct default namespace ("localai-face-biometrics" / "localai-voice-biometrics"). The comment claimed isolation, now it's actually enforced. 2. stores.go -- pass the storeName as both WithModelID and WithModel so the ModelLoader cache key separates namespaces and the loader spawns distinct processes. 3. local-store/store.go -- drop the Load() `opts.Model != ""` guard. It was there to prevent generic model-loading loops from picking up local-store by accident, but that auto-load path is being retired; the guard now just blocks legitimate namespace isolation. opts.Model is treated as a tag; the per-tuple process isolation upstream handles discrimination. Assisted-by: Claude:Opus 4.7 * fix(gallery): stale-file cleanup and upgrade-tmp directory safety Two related robustness fixes for backend install/upgrade: pkg/downloader/uri.go OCI downloads passed through if filepath.Ext(filePath) != "" ... filePath = filepath.Dir(filePath) which was intended to redirect file-shaped download targets into their parent directory for OCI extraction. 
The heuristic misfires on directory-shaped paths with a dot-suffix -- gallery.UpgradeBackend uses tmpPath = "<backendsPath>/<name>.upgrade-tmp" and Go's filepath.Ext treats ".upgrade-tmp" as an extension. The rewrite landed the extraction at "<backendsPath>/", which then overwrote the real install (backends/<name>/) with a flat-layout file and left a stray run.sh at the top level. The tmp dir itself stayed empty, so the validation step that checked "<tmpPath>/run.sh" predictably failed with "upgrade validation failed: run.sh not found in new backend" Every manual upgrade silently corrupted the backends tree this way. Guard the rewrite behind "target isn't already an existing directory" -- InstallBackend / UpgradeBackend both pre-create the target as a directory, so they get the correct behaviour; existing file-path callers with a genuine dot-extension still get the parent redirect. core/gallery/backends.go InstallBackend's MkdirAll returned ENOTDIR when something at the target path was already a file (legacy dev builds dropped golang backend binaries directly at `<backendsPath>/<name>` instead of nesting them under their own subdir). That permanently blocked reinstall and upgrade for anyone carrying that state, since every retry hit the same error. Detect a pre-existing non-directory, warn, and remove it before the MkdirAll so the fresh install can write the correct nested layout with metadata.json + run.sh. Assisted-by: Claude:Opus 4.7 * fix(galleryop): refresh upgrade cache after backend ops UpgradeChecker caches the last upgrade-check result and only refreshes on the 6-hour tick or after an auto-upgrade cycle. Manual upgrades (POST /api/backends/upgrade/:name) go through the async galleryop worker, which completes the upgrade correctly but never tells UpgradeChecker to re-check -- so /api/backends/upgrades continued to list a just-upgraded backend as upgradeable, indistinguishable from a failed upgrade, for up to six hours. 
Add an optional `OnBackendOpCompleted func()` hook on GalleryService that fires after every successful install / upgrade / delete on the backend channel (async, so a slow callback doesn't stall the queue). startup.go wires it to UpgradeChecker.TriggerCheck after both services exist. Result: the upgrade banner clears within milliseconds of the worker finishing. Assisted-by: Claude:Opus 4.7 * build: prepend GOPATH/bin to PATH for protogen-go install-go-tools runs `go install` for protoc-gen-go and protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin. That directory isn't on every dev's PATH, and protoc resolves its code-gen plugins via PATH, so the immediately-following protoc invocation fails with "protoc-gen-go: program not found" which in turn blocks `make build` and any `make backends/%` target that depends on build. Prepend `go env GOPATH`/bin to PATH for the protoc invocation so the freshly-installed plugins are found without requiring a shell-profile change. Assisted-by: Claude:Opus 4.7 * refactor(ui-api): non-blocking backend upgrade handler with opcache POST /api/backends/upgrade/:name used to send the ManagementOp directly onto the unbuffered BackendGalleryChannel, which blocked the HTTP request whenever the galleryop worker was busy with a prior operation. The op also didn't show up in /api/operations, so the Backends UI couldn't reflect upgrade progress on the affected row. Register the op in opcache immediately, wrap it in a cancellable context, store the cancellation function on the GalleryService, and push onto the channel from a goroutine so the handler returns right away. Response gains a `jobID` field and a `message` string so clients have a consistent handle regardless of whether the op is queued or running. Pairs with the OnBackendOpCompleted hook added in the galleryop commit — together the UI sees the upgrade start, watches progress via /api/operations, and drops the "upgradeable" flag the moment the worker finishes. 
Assisted-by: Claude:Opus 4.7	2026-04-24 08:50:34 +02:00
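Note on the data URI fix in the commit above: the two patterns quoted in the message can be compared directly. The sketch below uses those patterns as written; the variable and function names and the sample payload "AAAA" are assumptions for illustration and do not mirror pkg/utils/base64.go.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
)

var (
	// legacyPrefix accepts exactly one segment between "data:" and ";base64,".
	legacyPrefix = regexp.MustCompile(`^data:([^;]+);base64,`)
	// widenedPrefix tolerates any number of ";param=value" segments.
	widenedPrefix = regexp.MustCompile(`^data:[^,]+?;base64,`)
)

// stripDataURIPrefix removes a leading data URI header, if present,
// and returns the remaining base64 payload.
func stripDataURIPrefix(s string) string {
	return widenedPrefix.ReplaceAllString(s, "")
}

func main() {
	recorded := "data:audio/webm;codecs=opus;base64,AAAA"

	// The legacy pattern does not match the MediaRecorder shape.
	fmt.Println(legacyPrefix.MatchString(recorded))

	payload := stripDataURIPrefix(recorded)
	raw, err := base64.StdEncoding.DecodeString(payload)
	fmt.Println(len(raw), err)
}
```

Input without a data URI header is returned unchanged, so callers can pass either form.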
LocalAI [bot]	eb00d9b178	chore: ⬆️ Update leejet/stable-diffusion.cpp to `c97702e1057c2fe13a7074cd9069cb9dd6edc1bf` (#9495 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-23 09:32:21 +02:00
LocalAI [bot]	cd94a0b61a	chore: ⬆️ Update ggml-org/whisper.cpp to `fc674574ca27cac59a15e5b22a09b9d9ad62aafe` (#9450 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-21 11:09:05 +02:00
Ettore Di Giacinto	60633c4dd5	fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435 ) gen_video's ffmpeg subprocess was relying on the filename extension to choose the output container. Distributed LocalAI hands the backend a staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to .mp4 only after the backend returns, so ffmpeg saw a .tmp extension and bailed with "Unable to choose an output format". Inference had already completed and the frames were piped in, producing the cryptic "video inference failed (code 1)" at the API layer. Pass -f mp4 explicitly so the container is selected by flag instead of by filename suffix. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 00:41:54 +02:00
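Note on the ffmpeg fix in the commit above: the point is that "-f mp4" placed before the output path selects the container by flag, so a staging name ending in .tmp no longer matters. The sketch below only assembles an argument list. The function muxArgs and the choice of reading from stdin are assumptions for illustration; the actual arguments used by gen_video are not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// muxArgs builds a hypothetical ffmpeg argument list that reads its input
// from stdin and writes to outPath. Because "-f mp4" precedes the output
// path, ffmpeg does not need to infer the container from the file name.
func muxArgs(outPath string) []string {
	return []string{
		// Overwrite the output file if it already exists.
		"-y",
		// Read the input stream from stdin.
		"-i", "pipe:0",
		// Force the MP4 muxer regardless of the output extension.
		"-f", "mp4",
		outPath,
	}
}

func main() {
	args := muxArgs("/staging/localai-output-NNN.tmp")
	fmt.Println("ffmpeg " + strings.Join(args, " "))
}
```

The same list could be handed to os/exec; that step is omitted here so the sketch runs without ffmpeg installed.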
LocalAI [bot]	3804497186	chore: ⬆️ Update leejet/stable-diffusion.cpp to `44cca3d626d301e2215d5e243277e8f0e65bfa78` (#9428 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 23:39:07 +02:00
LocalAI [bot]	e94a9a8f10	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7d33d4b2ddeafa672761a5880ec33bdff452504d` (#9417 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-19 09:26:58 +02:00
Ettore Di Giacinto	054c4b4b45	feat(stable-diffusion.ggml): add support for video generation (#9420 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-19 09:26:33 +02:00
LocalAI [bot]	86c673fd94	chore: ⬆️ Update ggml-org/whisper.cpp to `166c20b473d5f4d04052e699f992f625ea2a2fdd` (#9403 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-18 00:42:32 +02:00
LocalAI [bot]	b6a68e5df4	chore: ⬆️ Update leejet/stable-diffusion.cpp to `a564fdf642780d1df123f1c413b19961375b8346` (#9383 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 08:11:55 +02:00
LocalAI [bot]	7f88a3ba30	chore: ⬆️ Update leejet/stable-diffusion.cpp to `c41c5ded7af85e01b7fe442ff7950c720706d53a` (#9366 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 09:04:33 +02:00
Ettore Di Giacinto	87e6de1989	feat: wire transcription for llama.cpp, add streaming support (#9353 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-14 16:13:40 +02:00
Ettore Di Giacinto	151ad271f2	feat(rocm): bump to 7.x (#9323 ) feat(rocm): bump to 7.2.1 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-12 08:51:30 +02:00
Ettore Di Giacinto	7a0e6ae6d2	feat(qwen3tts.cpp): add new backend (#9316 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-11 23:14:26 +02:00
LocalAI [bot]	e4bfc42a2d	chore: ⬆️ Update leejet/stable-diffusion.cpp to `6b675a5ede9b0edf0a0f44191e8b79d7ef27615a` (#9320 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-11 23:07:30 +02:00
LocalAI [bot]	606f462da4	chore: ⬆️ Update PABannier/sam3.cpp to `01832ef85fcc8eb6488f1d01cd247f07e96ff5a9` (#9311 ) ⬆️ Update PABannier/sam3.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-11 08:38:30 +02:00
Ettore Di Giacinto	706cf5d43c	feat(sam.cpp): add sam.cpp detection backend (#9288 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-09 21:49:11 +02:00
LocalAI [bot]	7081b54c09	chore: ⬆️ Update leejet/stable-diffusion.cpp to `e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef` (#9281 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-09 08:12:23 +02:00
LocalAI [bot]	8e59346091	chore: ⬆️ Update leejet/stable-diffusion.cpp to `8afbeb6ba9702c15d41a38296f2ab1fe5c829fa0` (#9262 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-07 00:39:38 +02:00
LocalAI [bot]	e6e4e19633	chore: ⬆️ Update ace-step/acestep.cpp to `e0c8d75a672fca5684c88c68dbf6d12f58754258` (#9261 ) ⬆️ Update ace-step/acestep.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-07 00:39:24 +02:00
LocalAI [bot]	11637b5a1b	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7397ddaa86f4e8837d5261724678cde0f36d4d89` (#9242 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-06 10:52:51 +02:00
LocalAI [bot]	2d40725ca2	chore: ⬆️ Update leejet/stable-diffusion.cpp to `87ecb95cbc65dc8e58e3d88f4f4a59a0939796f5` (#9200 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-02 08:53:04 +02:00
LocalAI [bot]	ba7cdd532a	chore: ⬆️ Update leejet/stable-diffusion.cpp to `09b12d5f6d51d862749e8e0ee8baac8f012089e2` (#9195 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-01 00:48:25 +02:00
Copilot	87a63316c7	stablediffusion-ggml: replace hand-maintained enum string arrays with upstream API calls (#9192 ) * Initial plan * Remove hand-maintained enum string arrays in gosd.cpp, use upstream API functions Agent-Logs-Url: https://github.com/mudler/LocalAI/sessions/561fb489-89ed-4588-8f1e-7b967d91ba37 Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-31 14:53:38 +02:00
LocalAI [bot]	56db76599a	chore: ⬆️ Update ggml-org/whisper.cpp to `95ea8f9bfb03a15db08a8989966fd1ae3361e20d` (#9168 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-30 08:27:11 +02:00
LocalAI [bot]	ad57cdfefe	chore: ⬆️ Update leejet/stable-diffusion.cpp to `f16a110f8776398ef23a2a6b7b57522c2471637a` (#9167 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-30 08:26:45 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
LocalAI [bot]	7209457f53	chore: ⬆️ Update ace-step/acestep.cpp to `6f35c874ee11e86d511b860019b84976f5b52d3a` (#9128 ) ⬆️ Update ace-step/acestep.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-25 07:52:31 +01:00
LocalAI [bot]	bf92117259	chore: ⬆️ Update ggml-org/whisper.cpp to `76684141a5d059be71cbe23dc2f0ed552213ba2d` (#9094 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-22 00:57:28 +01:00
LocalAI [bot]	8036d22ec6	chore: ⬆️ Update ace-step/acestep.cpp to `7326a7bea0c2037982ec924f7364e998df70450c` (#9086 ) ⬆️ Update ace-step/acestep.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-22 00:56:52 +01:00
LocalAI [bot]	7d81bf0aa3	chore: ⬆️ Update ggml-org/whisper.cpp to `9386f239401074690479731c1e41683fbbeac557` (#9077 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-19 23:27:35 +01:00
LocalAI [bot]	dd1a8b174f	chore: ⬆️ Update ggml-org/whisper.cpp to `ef3463bb29ef90d25dfabfd1e75993111c52412d` (#9062 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-19 07:58:11 +01:00
LocalAI [bot]	8560a1e571	chore: ⬆️ Update ace-step/acestep.cpp to `ab020a9aefcd364423e0665da12babc6b0c7b507` (#9046 ) ⬆️ Update ace-step/acestep.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-18 08:54:15 +01:00

237 Commits