LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-03 13:56:46 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	c508e9d7c6	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7948df8ac1070f5f6881b8d34675821893eb97d6` (#10127 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:03 +02:00
LocalAI [bot]	55e754fd05	chore: ⬆️ Update ggml-org/whisper.cpp to `23ee03506a91ac3d3f0071b40e66a430eebdfa1d` (#10130 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 01:43:03 +02:00
LocalAI [bot]	7013e13f05	chore: ⬆️ Update ggml-org/llama.cpp to `399739d5c5978351f39e3454bfbfbab4f369088f` (#10119 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 14:24:51 +02:00
LocalAI [bot]	63f176346e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `be65ac7511b30379b003626c15224798929e33d4` (#10118 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:50 +02:00
LocalAI [bot]	af94d08729	chore: ⬆️ Update ggml-org/whisper.cpp to `fe69461618ffc50ba8afa65c25cc6c6e34d4537f` (#10117 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:34 +02:00
LocalAI [bot]	6795d38f50	chore: ⬆️ Update mudler/parakeet.cpp to `cb45f68068081af01e7092e91b038ee353eb56be` (#10116 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 23:57:15 +02:00
Richard Palethorpe	718223f33b	feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI (#10113 ) * chore(localvqe): update backend to v1.3, add v1.2/v1.3 gallery models Bump the LocalVQE backend pin 72bfb4c6 -> b0f0378a, which adds the v1.2 (1.3 M) and v1.3 (4.8 M) GGUF SHA-256s to the upstream released-models allowlist (and the arch_version=3 loader) so both load without LOCALVQE_ALLOW_UNHASHED. Add gallery entries for localvqe-v1.2-1.3m and localvqe-v1.3-4.8m (SHA-256 verified against the downloaded weights) and update the audio-transform docs to make v1.3 the current default while noting the compact v1.1/v1.2 alternatives. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(flake): add ffmpeg-headless to the dev shell pkg/utils/ffmpeg_test.go shells out to the `ffmpeg` CLI, and the pre-commit gate runs those tests via `make test-coverage`. Without ffmpeg in the dev shell the gate fails with "executable file not found in $PATH". The headless build provides the CLI without GUI/X deps. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(localvqe): parse WAV by walking RIFF sub-chunks Walk the RIFF chunk list instead of assuming the canonical 44-byte header layout. Real inputs (browser-recorded clips, ffmpeg output with an 18/40-byte extensible `fmt ` chunk or trailing LIST/INFO metadata) would otherwise splice header/metadata bytes into the PCM stream as an audible impulse. Honour the `data` chunk size and validate that both `fmt ` and `data` chunks are present. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(security-headers): allow blob: in connect-src for waveform fetch The waveform renderer XHRs/fetches a freshly-created blob: object URL (e.g. an uploaded or enhanced clip before it has a server URL). XHR/fetch of blob: is governed by connect-src, not media-src, so it was blocked by the CSP. Add blob: to connect-src. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(react-ui): add input/output spectrogram view to AudioTransform The transform page only showed time-domain amplitude waveforms, so you could see how loud a clip was but not which frequencies the model touched. Add a time x frequency spectrogram heatmap and render the input and output spectrums side by side, so it's visible which bands the enhancement attenuates (bright input bands that go dark in the output). Computed client-side via a Hann-windowed STFT over both clips (a small dependency-free radix-2 FFT), defaulting to the LocalVQE 512/256 frame geometry. This shows the net input->output spectral change; the model's internal gain mask is not exposed by the backend. - src/utils/fft.js radix-2 FFT - src/hooks/useSpectrogram.js decode + STFT -> normalised dB magnitude grid - src/components/audio/Spectrogram.jsx canvas heatmap (magma colormap) - AudioTransform.jsx dual-spectrogram panel + CSS - e2e spec + UI coverage baseline bump (38.29 -> 39.0; measured ~39.4-40.2) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(react-ui): make UI coverage deterministic, tighten the gate UI e2e line coverage swung ~1pp run-to-run (39.1% <-> 40.2%), which forced a loose 0.8pp tolerance on the monotonic gate — a band wide enough to let a real ~300-line regression through silently. The swing was a bug, not inherent jitter: the 'Create Agent navigates' spec ended on the URL assertion, so AgentCreate.jsx's ~400 lines were collected only when its render happened to beat the coverage teardown. Wait for the page to actually render (assert its heading) so those lines are covered every run. With the race gone, repeated runs land within ~0.013pp of each other, so: - tighten UI_COVERAGE_TOLERANCE 0.8 -> 0.1 (noise floor, not a drift band) - set the baseline to the real, reliably-achieved value (39.0 -> 39.86) Localised by running the V8-coverage suite repeatedly and diffing per-file line coverage; AgentCreate.jsx was the sole ~1pp flipper. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-31 23:56:46 +02:00
LocalAI [bot]	39e050d9e2	fix(parakeet-cpp): cublas/hipblas/vulkan builds were silently CPU-only (#10120 ) fix(parakeet-cpp): forward PARAKEET_GGML_* so cublas/hipblas/vulkan builds aren't silently CPU-only parakeet.cpp gates its GGML backends behind PARAKEET_GGML_CUDA/HIP/VULKAN and does set(GGML_CUDA ${PARAKEET_GGML_CUDA} CACHE BOOL "" FORCE), which overwrites a bare -DGGML_CUDA=ON back to OFF. So the backend's BUILD_TYPE=cublas (and hipblas, vulkan) produced a CPU-only libparakeet.so. Forward the PARAKEET_GGML_* options instead. Verified on a GB10 (CUDA 13): the lib now links libcudart/libcublas and registers the CUDA backend, vs a CPU-only lib before. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 23:56:07 +02:00
LocalAI [bot]	aa80d4681b	chore: ⬆️ Update ggml-org/llama.cpp to `d6588daa800058dfa54f1d7ea695b1a810c8ae18` (#10093 ) * ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama-cpp): skip begin-of-stream null partial in PredictStream Upstream llama.cpp (ggml-org/llama.cpp#23884), pulled in by this bump, now emits an initial "begin" partial whose to_json() returns null. It exists only to signal the HTTP layer to flush 200 status headers before any token is produced. gRPC has no such concept, and PredictStream had no guard: the null result was fed straight into build_reply_from_json, which threw an uncaught exception. That surfaced as a generic "Unexpected error in RPC handling" and the task was cancelled the instant it launched, breaking the PredictStream e2e spec. Skip null results in both the first-result handling and the streaming loop, mirroring upstream's own `if (first_result_json == nullptr)` guard. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 10:26:03 +00:00
LocalAI [bot]	76fe0bb929	feat(crispasr): add CrispASR backend — multi-architecture ASR + TTS (#10099 ) * feat(crispasr): backend source files (Go gRPC server, C-ABI shim, build files) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * polish(crispasr): brand error strings + fix stale shim comment Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): register backend in root Makefile Mirror the whisper Go backend registration for the new crispasr backend: NOTPARALLEL entry, prepare-test-extra/test-extra hooks, BACKEND_CRISPASR definition, docker-build target generation, and the docker-build-backends aggregate target. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): add backend build matrix entries Mirror the 11 whisper golang Dockerfile matrix entries (CPU amd64/arm64, CUDA 12/13, L4T CUDA 13, Intel SYCL f32/f16, Vulkan amd64/arm64, L4T arm64, ROCm hipblas) with backend and tag-suffix substituted to crispasr. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add crispasr backend gallery entries Add the crispasr meta anchor and its full set of image gallery entries (cpu, metal, cuda12/13, rocm, intel-sycl f32/f16, vulkan, L4T arm64, L4T cuda13 arm64, plus -development variants), mirroring the whisper backend gallery block. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): bump CRISPASR_VERSION via bump_deps workflow Track CrispStrobe/CrispASR main branch and bump CRISPASR_VERSION in backend/go/crispasr/Makefile. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): don't wire fixture-gated test into test-extra Mirror the whisper Go backend: its AudioTranscription test is gated on model/audio fixtures and skips in CI, so building crispasr (the heaviest ggml compile in the tree) inside the unit-test lane adds a long compile for zero coverage. The backend image build in backend-matrix.yml remains the authoritative compile check. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): add darwin metal build entry (mirror whisper) The metal-crispasr gallery entries and capabilities.metal mapping reference -metal-darwin-arm64-crispasr, which is only produced by an includeDarwin entry. Mirror whisper's darwin metal entry so the tag actually gets built. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): place hipblas matrix entry next to whisper twin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): register crispasr as pref-only ASR backend + test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): port whisper behavioral suite (cancellation + streaming) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): fix skip message env var names to CRISPASR_* Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): switch shim to crispasr_session_* multi-architecture API The shim used whisper_full(), which in CrispASR is the whisper-only path: libcrispasr only transcribes Whisper GGUFs through it. Multi-architecture transcription (Parakeet, Voxtral, Qwen3-ASR, Canary, Granite, FunASR, Paraformer, SenseVoice, ...) goes through the crispasr_session_* C-ABI, which auto-detects the architecture from the GGUF and dispatches to the matching backend. Rewrite the C shim around crispasr_session_open / _transcribe_lang / _result_* and add get_backend() so the selected backend is logged. load_model now takes a threads param (session_open binds n_threads at open). The session result is segment+word based with no token IDs and no per-decode callback, so drop n_tokens / get_token_id / get_segment_speaker_turn_next / set_new_segment_callback. set_abort is kept for API parity but is best-effort: the session transcribe is blocking with no abort hook. Update the purego bindings and gocrispasr.go to match: tokens are left empty, speaker-turn handling is removed, and AudioTranscriptionStream emits one delta per non-empty segment after the blocking decode returns (no progressive streaming via the session API), preserving the concat(deltas) == final.Text invariant. crispasr_session_set_translate is exported by libcrispasr but not declared in crispasr.h, so it is forward-declared in the shim alongside the open/transcribe/result functions. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): link full CrispASR backend set for multi-arch support The shim's crispasr_session_* dispatch calls into the per-architecture backend libs (parakeet, voxtral, qwen3_asr, canary, funasr, paraformer, sensevoice, ...), which CrispASR builds as static archives. Linking only crispasr + ggml dead-stripped every backend object from the final module (nm backend-symbol count: 0), leaving a whisper-only .so. Link the same backend set as crispasr-cli so the static archives are pulled in. After this the module carries the backend symbols (nm count 407, .so grows from ~2.1MB to ~6.7MB) and the session API can dispatch to every compiled-in architecture. Also rewrite ${CMAKE_SOURCE_DIR}/examples/talk-llama to ${PROJECT_SOURCE_DIR}/... in the vendored src/CMakeLists.txt: CrispASR locates its vendored llama.cpp via ${CMAKE_SOURCE_DIR}, which is wrong when CrispASR is add_subdirectory'd (CMAKE_SOURCE_DIR points at this backend dir, not the CrispASR root). PROJECT_SOURCE_DIR is correct both standalone and as a subproject; the sed is idempotent. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): adapt suite to session API (blocking, no decode callback) Register the new symbol set (drop the removed token/speaker/callback funcs, add get_backend; load_model now takes 2 args). The session transcribe is blocking with no abort hook, so a mid-decode cancel can't interrupt it: change the cancellation spec to cancel the context before the call and assert codes.Canceled from the pre-call ctx.Err() check, dropping the <5s mid-decode timing assertion. The streaming spec still holds with per-segment post-decode emission (>=2 deltas, concat(deltas) == final.Text). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add CrispASR ASR model entries (-crispasr) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(gallery): keep only session-auto-detectable CrispASR ASR models The crispasr backend loads models via crispasr_session_open, which auto-detects the backend from the GGUF general.architecture using crispasr_detect_backend_from_gguf. Architectures not in that detect map cannot be opened, so those gallery entries fail to load. Removed entries whose architecture is not wired into CrispASR v0.6.11's session auto-detect router (they can be re-added when upstream maps them): - Not in the detect map: data2vec, firered-asr, funasr, fun-asr-mlt-nano, glm-asr, hubert, kyutai-stt, mega-asr, mimo-asr, moonshine{,-de,-streaming,-tiny-de}, omniasr{,-llm,-llm-1b}, paraformer, sensevoice. - Pending verification (filename-heuristic routed, not arch-detected): parakeet-ctc-0.6b, parakeet-ctc-1.1b. Their GGUFs are routed to the fastconformer-ctc backend by a filename heuristic in the model registry, which implies general.architecture is not a mapped string. Kept the parakeet rnnt/tdt_ctc variants: convert-parakeet-to-gguf.py writes general.architecture="parakeet" unconditionally and encodes the rnnt/ctc distinction in metadata fields, so they session-auto-detect. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): TTS synthesis via crispasr_session_synthesize (24kHz) Add tts_synthesize/tts_free/tts_set_voice to the C-ABI shim. They reuse the already-open g_session (crispasr_session_open auto-detects a TTS model) and dispatch to the upstream synthesis call, which returns malloc'd 24 kHz mono float PCM. Orpheus needs a SNAC codec path that we do not set, so it returns NULL here and surfaces as an error Go-side. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): implement TTS/TTSStream gRPC methods Bind the new shim functions via purego and implement TTS, TTSStream and a writeWAV24k helper. synthesize copies the C-owned PCM out before freeing it; TTS writes a 24 kHz mono 16-bit WAV to req.Dst via go-audio/wav. CrispASR has no progressive synth, so TTSStream synthesizes fully, encodes to WAV, and emits the bytes as a single chunk; it owns the results-channel close (the gRPC server wrapper ranges until close), mirroring vibevoice-cpp's TTSStream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): log when a TTS voice override is not honored Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add CrispASR vibevoice-tts model entry Only vibevoice-tts works through the current shim: qwen3-tts, chatterbox, and orpheus require companion codec/s3gen/SNAC paths (set_codec_path / set_s3gen_path) that the shim doesn't wire yet, and kokoro/indextts/voxcpm2 aren't in the session auto-detect map. Those are follow-ups. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): gated TTS synthesis spec Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(crispasr): satisfy golangci-lint (errcheck defers + unsafeptr nolint) The crispasr Go file is entirely new, so new-from-merge-base lints every line (unlike the grandfathered whisper backend it was forked from): - handle os.RemoveAll / fh.Close return values in AudioTranscription - annotate the two intentional C-pointer unsafe.Slice sites with //nolint:govet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): backend: and codec: model options (explicit arch + companion files) Add two model-config options to the CrispASR backend via opts.Options: - backend:<name> selects an explicit CrispASR backend (bypassing auto-detect) by routing load_model through crispasr_session_open_explicit, unlocking architectures the detector won't pick on its own (qwen3, cohere, granite, voxtral, moonshine, mimo-asr, orpheus, kokoro, chatterbox, etc.). - codec:<path> loads a companion file (qwen3-tts codec, orpheus SNAC, chatterbox s3gen, or mimo-asr tokenizer) via the universal crispasr_session_set_codec_path setter after the session opens. A relative path resolves against the model directory. rc==0 means success or not-applicable; only a negative rc is fatal. The C shim load_model gains a backend_name argument and a new set_codec_path entry point; the Go bridge parses the prefix:value options and registers the new symbol. The vad_only path is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): expand CrispASR models via backend:/codec: options (explicit arch + companions) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(gallery): use virtual.yaml base for crispasr models The crispasr entries are just backend + model + a couple options, fully expressed inline via overrides:/files: in gallery/index.yaml. Point each url: at the shared gallery/virtual.yaml (the established 'virtual' model trick) and drop the 36 redundant per-model gallery/-crispasr.yaml files. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(gallery): drop voice-requiring TTS entries (keep vibevoice-tts) Real e2e showed qwen3-tts/orpheus/chatterbox don't synthesize through the current shim: the codec: companion loads fine, but these engines additionally need a voice pack / voice prompt / reference clip (qwen3-tts base errors 'no voice'; chatterbox is zero-shot cloning; orpheus uses named voices) that the backend doesn't wire. (qwen3-tts also can't auto-detect: its GGUF arch is 'qwen3tts', unmapped by the detector — would need backend:qwen3-tts.) Removed to avoid shipping non-working gallery entries; vibevoice-tts (built-in voice, e2e-verified) remains the working TTS. Voice-pack wiring is a follow-up. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): speaker: and voice: TTS options (baked speakers + voice packs/prompts) speaker:<name> -> crispasr_session_set_speaker_name (baked speakers: qwen3-tts CustomVoice, orpheus). voice:<path>(+voice_text:<ref>) -> crispasr_session_set_voice (voice-pack GGUF, or WAV zero-shot clone with ref text). Applied at Load as the default voice; req.Voice still overrides the speaker per request. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): re-add e2e-verified TTS engines (chatterbox, qwen3-tts-customvoice, orpheus) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 12:11:03 +02:00
LocalAI [bot]	e08492a2c3	chore: ⬆️ Update leejet/stable-diffusion.cpp to `d2797b86670622b6538123b4aeb5fbb6be2653c5` (#10094 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 00:42:13 +02:00
LocalAI [bot]	8a82753277	chore: ⬆️ Update antirez/ds4 to `ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdc` (#10095 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 00:10:33 +02:00
LocalAI [bot]	51ca109067	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `3f40e73c367ad9f0c1b1819f28c7348c26aa340d` (#10097 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 00:10:16 +02:00
LocalAI [bot]	07f6c15a37	feat(ds4): layer-split distributed inference (#10098 ) * feat(ds4): add standalone ds4-worker distributed worker binary Add worker_main.c, a minimal standalone worker that owns a slice of the model's transformer layers and serves activations over ds4's own TCP transport via ds4_dist_run(). It links the same engine objects the backend already builds (including ds4_distributed.o) and has NO gRPC/protobuf dependency, so it builds even on hosts lacking protobuf/grpc dev headers. Launched by `local-ai worker ds4-distributed`. Wire the ds4-worker CMake target (mirrors grpc-server's object/GPU/native handling) and have the Makefile copy + clean the binary alongside grpc-server. Ignore the built ds4-worker artifact. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ds4): package ds4-worker alongside grpc-server Copy the standalone ds4-worker binary into the backend package (Linux package.sh) and the Darwin OCI tar (ds4-darwin.sh: both the explicit copy and the otool dylib-bundling loop) so distributed workers ship with the backend. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ds4): tighten ds4-worker integer arg validation to match upstream Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ds4): wire grpc-server as distributed coordinator Add distributed COORDINATOR support to the ds4 backend's gRPC server. Distributed inference is an engine backend: when LoadModel receives 'ds4_role:coordinator', the process populates ds4_engine_options.distributed (role, layer slice, listen host/port) before ds4_engine_open, then the normal ds4_session_* generation path runs transparently once the worker route covers all layers. - New LoadModel options: ds4_role, ds4_layers (START:END or START:output), ds4_listen (host:port), ds4_route_timeout. - parse_layers_spec() maps the layer spec onto ds4_distributed_layers. - wait_route_ready() blocks generation until ds4_session_distributed_route_ready() reports full coverage (or timeout), gating both Predict and PredictStream; returns UNAVAILABLE on timeout/error. - No ds4_role => g_distributed stays false and wait_route_ready is a no-op, so single-node behavior is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ds4): don't block Status during route wait; validate coordinator opts Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(cli): add ds4-distributed worker exec helper Add the ds4WorkerArgs helper plus findDS4Backend/DS4Distributed.Run that resolve the ds4 backend via the gallery and exec the packaged ds4-worker binary. Unlike worker_llamacpp.go, ds4 bundles its own dynamic loader (lib/ld.so) for glibc compatibility, so when present we exec ds4-worker through that loader with LD_LIBRARY_PATH=<backend>/lib, mirroring backend/cpp/ds4/run.sh; otherwise we exec it directly. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(cli): register the ds4-distributed worker subcommand Wire DS4Distributed into the Worker kong command tree so `local-ai worker ds4-distributed` is available. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(ds4): document layer-split distributed inference Add a ds4 section to the distributed-mode feature docs (coordinator model YAML, manual worker command, layer-range semantics, the 'GGUF on every machine' requirement, coordinator-listens dial direction vs llama.cpp) and a terse Distributed mode section to the ds4 backend agent guide. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(ds4): opt-in hardware-gated distributed e2e spec Add a self-contained, opt-in Ginkgo spec to the backend e2e suite that spins a ds4 coordinator (via the packaged run.sh, loaded with ds4_role/ds4_layers/ds4_listen options) plus a ds4-worker process for the upper layers, then uses Eventually to assert a short successful Predict once the layer route forms, before tearing the worker down. Gated by BACKEND_TEST_DS4_DISTRIBUTED=1 (plus the existing BACKEND_BINARY + BACKEND_TEST_MODEL_FILE and optional layer/listen/accel knobs); compiles and skips cleanly with no env, hardware, or model. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(ds4): pass coordinator ctx to worker; lowercase error string Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(ds4): note distributed transport is plaintext/unauthenticated Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * style(ds4): replace em dashes in distributed docs/agent/test per repo convention Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ds4): link ds4-worker with the C++ driver for CUDA/Metal builds The ds4-worker target is built from worker_main.c (C), so CMake linked it with the C driver. The nvcc-built ds4_cuda.o (and Obj-C++ ds4_metal.o) reference the C++ runtime, so the CUDA/Metal builds failed with undefined libstdc++ symbols (std::__throw_length_error). The CPU build passed because ds4_cpu.o is pure C. Force LINKER_LANGUAGE CXX so libstdc++ is linked. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 00:09:55 +02:00
LocalAI [bot]	aee4611ab2	chore: ⬆️ Update mudler/parakeet.cpp to `30a307553f1965ceb38a1a922069a71e7dd67bf3` (#10092 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 22:48:09 +02:00
LocalAI [bot]	486467623c	chore: ⬆️ Update antirez/ds4 to `e16ead1e29c81a67bbb64e5b001117679cf9ce6e` (#10076 ) * ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(ds4): link new ds4_distributed.o into grpc-server build Upstream ds4 e16ead1e split distributed inference into a new translation unit (ds4_distributed.c/.h). ds4.c and ds4_cpu.o now reference its ds4_dist_* symbols, so the grpc-server link fails with undefined references unless that object is built and linked. Add ds4_distributed.o to both the upstream object build (Makefile) and the grpc-server link set (CMakeLists.txt) for every GPU mode. It is a single GPU-agnostic object, so it is built/linked unconditionally. Verified: the six undefined ds4_dist_session_* references in ds4_cpu.o are all defined by the newly built ds4_distributed.o (nm cross-check). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-30 22:08:30 +02:00
LocalAI [bot]	4912c9b73a	feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084 ) * feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text) Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends). L0 scope: - main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server. - goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy. - AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2. - Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper. - Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures. Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern. Word/segment timestamps (L1) and cache-aware streaming (L2) follow. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult: - Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader. - Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs. - Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens. - Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default). secondsToNanos mirrors the whisper backend's nanosecond convention. Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend. Cache-aware streaming (L2) follows. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation Wire AudioTranscriptionStream to the streaming RNN-T C-API: - Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes eou_out on <EOU>/<EOB>. - Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta. - <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired). - stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream. Honours context cancellation between chunks. Frees every malloc'd delta and the session. Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity) Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1. - .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and --parakeet-cpp tag-suffixes. - scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch. - .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master). - backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix. - Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp- transcription target (mirrors test-extra-backend-whisper-transcription). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown). - Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp". It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here). - Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer. - Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description. Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(parakeet-cpp): document the parakeet-cpp transcription backend Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job: - Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry). - Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(parakeet-cpp): drop em dashes from comments and docs Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add parakeet-cpp f16 models to the model gallery Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION - goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged). - Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): init the ggml submodule in the backend image clone The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend. Verified locally: clone + submodule + cmake configure now succeeds. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): statically link ggml into libparakeet.so The shared libparakeet.so linked ggml's shared libs (libggml.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image. Verified locally: ldd shows no libggml dependency. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model. Verified locally: the streaming spec passes against the non-streaming 110m model via fallback. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-30 14:46:10 +02:00
Richard Palethorpe	12d1f3a697	security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087 ) LocalAI's outbound HTTP clients used Go's default redirect policy, which follows up to 10 redirects. On a cross-host redirect Go forwards custom request headers — including credential headers such as Anthropic's x-api-key — to the redirect target (Go strips Authorization, Cookie and WWW-Authenticate cross-host, but NOT arbitrary custom headers). An attacker able to elicit a redirect from an upstream (a hijacked or spoofed upstream, DNS trickery, or a malicious upstream_url) then harvests the operator's provider API key. This was first reported against the cloud-proxy / MITM PII path (GHSA-3mj3-57v2-4636); the same class affects every other outbound client. Rather than patch each call site, add pkg/httpclient as the one sanctioned constructor for outbound HTTP and route everything through it. pkg/httpclient: - New(...) refuses redirects, TLS 1.2 floor, no body deadline (streaming/SSE safe) - NewWithTimeout(d) simple request/response calls - WithFollowRedirects opt-in following that still strips credential headers on any cross-host hop; different scheme/host/port == different origin, guarding the curl CVE-2022-27774 port-confusion class - WithTransport(rt) keep a custom transport (IP-pin, HTTP/2, a credential-injecting RoundTripper) - HardenedTransport() base transport with the TLS floor + bounded setup - Harden(c) apply the policy to a library-supplied http.Client - NoRedirect the CheckRedirect policy; wraps ErrRedirectBlocked Lint: a forbidigo rule flags http.DefaultClient and http.Get/Post/ PostForm/Head, pointing at pkg/httpclient (.golangci.yml, .agents/coding-style.md). forbidigo cannot match the &http.Client{} composite literal without also flagging legitimate http.Client type references, so that form is enforced by review. Migrates every non-test outbound call site across core/, pkg/, cmd/, and the Go backend (backend/go/cloud-proxy). Credential-bearing and internal-RPC clients refuse redirects; download / CDN / registry clients use WithFollowRedirects so they keep working while stripping secrets cross-host. The only credential-bearing client that follows redirects is the gated-download path (pkg/downloader/uri.go), which strips the token on the cross-host hop to the CDN. Hardening this closes, in passing: - MCP remote-server bearer token leaking via a redirect (the RoundTripper re-injected Authorization on every hop) - agent multimedia/webhook clients leaking user-supplied auth headers - cors_proxy following redirects, bypassing its SSRF IP-pin - downloader's authorized read path leaking the token cross-host Fixes: GHSA-3mj3-57v2-4636 (cloud-proxy leaks operator provider API key (x-api-key) to attacker host on cross-host redirect) Reported-by: tonghuaroot Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-30 12:04:10 +02:00
LocalAI [bot]	a7cad704b9	chore: ⬆️ Update ggml-org/llama.cpp to `22d66b567eef11cf2e9832f04db64ee0323a0fd0` (#10080 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 08:34:00 +02:00
LocalAI [bot]	7e4df67556	chore: ⬆️ Update ggml-org/whisper.cpp to `f24588a272ae8e23280d9c220536437164e6ed28` (#10078 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 01:09:52 +02:00
LocalAI [bot]	5b24b4dacc	chore: ⬆️ Update mudler/rf-detr.cpp to `65c0ffcc9a9bc9dae38252f63d0417c9845a6cf7` (#10075 ) ⬆️ Update mudler/rf-detr.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:55:41 +02:00
LocalAI [bot]	74281be340	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.22.0` (#10079 ) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:41 +02:00
LocalAI [bot]	cacf2f7a2c	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `8960c5ba5ee9db30ba838304373aa4dbec9f7cbd` (#10077 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:27 +02:00
LocalAI [bot]	b982c977d5	chore: ⬆️ Update ggml-org/whisper.cpp to `c932729a304f7d9eb5354afa38624cfa86a780cf` (#10051 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:42:06 +02:00
LocalAI [bot]	532ca1b3a2	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `6eff055a0cc0e427a6849cfcb5de531b4b82d667` (#10050 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:50 +02:00
LocalAI [bot]	00ad55b590	chore: ⬆️ Update ggml-org/llama.cpp to `751ebd17a58a8a513994509214373bb9e6a3d66c` (#10049 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:35 +02:00
LocalAI [bot]	4c58fd302f	chore: ⬆️ Update leejet/stable-diffusion.cpp to `0e4ee04488159b81d95a9ffcd983a077fd5dcb77` (#10048 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:18 +02:00
LocalAI [bot]	66582e7035	chore: ⬆️ Update antirez/ds4 to `22393e770ea8eb7501d8718d6f66c6374004e03f` (#10047 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:02 +02:00
LocalAI [bot]	c8ad67bbca	chore: ⬆️ Update mudler/rf-detr.cpp to `ecf64d7f7f20d73ebd906a983f398ed287256320` (#10035 ) ⬆️ Update mudler/rf-detr.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:39:47 +02:00
LocalAI [bot]	1c92b00918	fix(turboquant): guard upstream-only grpc-server fields for fork (#10043 ) fix(turboquant): guard upstream-only grpc-server fields for fork build backend/cpp/llama-cpp/grpc-server.cpp is reused by the turboquant build, which compiles against an older llama.cpp fork (TheTom/llama-cpp-turboquant). Two recent changes added references to upstream-only struct fields outside the existing LOCALAI_LEGACY_LLAMA_CPP_SPEC guards: - common_params::checkpoint_min_step (default + option handler), added with the ggml-org/llama.cpp 35c9b1f3 bump (#9998) - the common_params_speculative::draft tensor_buft_overrides sentinel termination (#9919), which sat after the guard's #endif The fork has neither field, so grpc-server.cpp failed to compile for every turboquant flavor. Wrap the three references in #ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC, matching the existing fork-compat guards, so the stock llama-cpp build is unchanged and the fork build skips them. Update patch-grpc-server.sh's doc comment to record what the macro now gates out. Verified by a local fallback-flavor turboquant build: grpc-server.cpp compiles against the fork and the backend image builds. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-28 17:37:54 +02:00
LocalAI [bot]	7763fb23a3	chore: ⬆️ Update antirez/ds4 to `072bc0feb187be5f374c08b16d0045e1ad7bc9bc` (#10036 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 08:41:03 +02:00
LocalAI [bot]	324277ccfd	chore: ⬆️ Update ggml-org/whisper.cpp to `6dcdd6536456158667747f724d6bd3a2ceaa8d88` (#10032 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:25:20 +02:00
LocalAI [bot]	10d02e6c59	chore: ⬆️ Update leejet/stable-diffusion.cpp to `29ab511fc75f89fbab148665eab1a8e10a139a72` (#10033 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:24:59 +02:00
LocalAI [bot]	05ae06c17b	chore: ⬆️ Update ggml-org/llama.cpp to `aa50b2c2ae91326d5aad956ceeb015d1d48e626b` (#10034 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:23:40 +02:00
LocalAI [bot]	81b6b94f0b	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `3bf7e836c2c5a895e8d12d3eb7e398ae7ab2f9ce` (#10037 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:21:45 +02:00
LocalAI [bot]	7a4ca8f60d	feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028 ) Adds a Go native gRPC backend that dlopens librfdetrcpp.so (built from mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC. Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large) and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/ SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace. Detection returns Bbox + class_name + confidence; segmentation also returns PNG-encoded per-detection masks via the rfdetr_capi accessor functions (rfdetr_capi_get_detection_{class_id,box,score,class_name, mask_png}). End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection model, 21 detections + valid PNG masks on the seg-nano model against the kitchen fixture). Wiring: - backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt, Makefile,run.sh,package.sh,test.sh,.gitignore} - Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target, .NOTPARALLEL, prepare-test-extra, test-extra - backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra - .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13 (arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16 - backend/index.yaml: rfdetr-cpp meta anchor + latest/development image entries for every matrix tag-suffix - .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking (mudler/rf-detr.cpp branch main) - gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 detection variants + 6 seg variants), all backed by mudler/rfdetr-cpp-* on HuggingFace with sha256 pinning on the F16 default - core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports (mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format repos stay on the Python rfdetr backend; explicit preferences.backend overrides both heuristics) - core/gallery/importers/rfdetr_test.go: table-driven coverage of the auto-routing + a live mudler/rfdetr-cpp-nano cross-check scripts/changed-backends.js needs no change: the existing Dockerfile.golang -> backend/go/${item.backend}/ branch already routes the 9 rfdetr-cpp matrix entries to the correct backend path. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-27 18:43:57 +02:00
LocalAI [bot]	4d01298048	chore: ⬆️ Update antirez/ds4 to `e8e8779b261c10f36ad6270ba732c8f0be5b62e3` (#10024 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 15:16:43 +02:00
LocalAI [bot]	b6055e7ecf	chore: ⬆️ Update leejet/stable-diffusion.cpp to `92dc7268fc4ffb0c0cc0bd52dfcefea91326e797` (#10023 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 15:16:23 +02:00
LocalAI [bot]	51bad74bf8	chore: ⬆️ Update ggml-org/llama.cpp to `0d18aaa9d1a8af3df9abccd828e22eeaac7f840b` (#10022 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:29:14 +02:00
LocalAI [bot]	f3236b74cf	chore: ⬆️ Update ggml-org/whisper.cpp to `27101c01dcac1676e2b6422256233cd0f1f9ae28` (#10021 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:28:55 +02:00
LocalAI [bot]	eed3ecff82	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d2da6da05c73aeb658a3d1751f386c24e6963856` (#10020 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:28:32 +02:00
番茄摔成番茄酱	df7623fd87	fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models (#10012 ) * fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models CTC models (e.g. Whisper) return List[str] from transcribe(), but TDT/RNNT models (e.g. parakeet-tdt-0.6b-v3) return List[Hypothesis] where the decoded text lives in the Hypothesis.text attribute. Previously, results[0] was assigned directly to the protobuf string field, causing silent empty output for non-CTC models. Now checks the return type and extracts .text from Hypothesis objects, with a safe fallback via getattr(). * refactor: simplify Hypothesis text extraction per Copilot review Use single getattr() call instead of hasattr() + double access, and return empty string for unknown types instead of str(result) to avoid leaking internal repr to clients.	2026-05-26 20:35:23 +00:00
番茄摔成番茄酱	4e5ec6f67b	fix(qwen-asr): enable timestamp output when forced_aligner is configured (#10013 ) * fix(qwen-asr): enable timestamp output when forced_aligner is configured Two bugs prevented timestamps from working in the qwen-asr backend: 1. transcribe() was called without return_time_stamps=True, so the forced aligner was loaded but never invoked. Now we pass return_time_stamps=True when a forced_aligner is present. 2. The timestamp parsing code expected (list, tuple) items, but the qwen_asr library returns ForcedAlignItem dataclass instances with .text, .start_time, .end_time attributes. Added hasattr() check to handle this correctly, falling back to tuple parsing for backward compatibility. * refactor: address Copilot review for qwen-asr timestamps - Wrap return_time_stamps kwarg in try/except TypeError for safety - Add defensive float() normalization for timestamp times - Use str() for text extraction to ensure string type * fix(qwen-asr): convert seconds to nanoseconds for Go time.Duration The Go server reads TranscriptSegment.start/end via time.Duration, which is in nanoseconds. Previously the backend sent milliseconds (* 1000), causing timestamps to be 1000x too small (e.g. 8e-8 instead of 0.08). Convert seconds → nanoseconds (* 1e9) instead. Also applies to the legacy tuple path for consistency. * feat(qwen-asr): respect timestamp_granularities (segment vs word) Read request.timestamp_granularities from the gRPC request. - 'word': return one segment per aligned item (character / word) - 'segment' (default): merge consecutive items at sentence boundaries Sentence boundaries detected via CJK punctuation (。！？；…) and Latin endings (. ! ? ;). This matches the OpenAI Whisper API contract where omitting the parameter defaults to segment-level. * fix(qwen-asr): escape smart quotes in punctuation set Unicode curly quotes (U+2018/2019) were being interpreted as Python string delimiters, causing SyntaxError. Use explicit unicode escapes. * fix(qwen-asr): use time-gap threshold for segment boundaries The forced aligner strips punctuation from its output, so text-based sentence detection doesn't work. Instead, detect segment boundaries by measuring time gaps between consecutive aligned items. Threshold = max(median_gap * 4, 0.3s). This cleanly separates intra-sentence gaps (< 0.24s) from inter-sentence gaps (> 0.3s) across Chinese, English, and other languages. * fix(qwen-asr): smart join with spaces for non-CJK tokens The forced aligner strips whitespace from tokenized text, so English words like ['hello', 'world'] were joined as 'helloworld'. Add _smart_join() that inserts spaces between non-CJK tokens while keeping CJK characters and punctuation unspaced. Works for Chinese, English, Korean, Japanese, and mixed-language text. --------- Co-authored-by: fqscfqj <fqsfqj@outlook.com>	2026-05-26 20:34:21 +00:00
dependabot[bot]	90c29f9258	chore(deps): bump protobuf from 6.33.5 to 7.35.0 in /backend/python/transformers (#10004 ) chore(deps): bump protobuf in /backend/python/transformers Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.5 to 7.35.0. - [Release notes](https://github.com/protocolbuffers/protobuf/releases) - [Commits](https://github.com/protocolbuffers/protobuf/commits) --- updated-dependencies: - dependency-name: protobuf dependency-version: 7.35.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-26 22:03:59 +02:00
dependabot[bot]	66aaa525e5	chore(deps): update transformers requirement from >=5.8.1 to >=5.9.0 in /backend/python/transformers (#10005 ) chore(deps): update transformers requirement Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v5.8.1...v5.9.0) --- updated-dependencies: - dependency-name: transformers dependency-version: 5.9.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-26 22:03:38 +02:00
dependabot[bot]	a29a06f6b6	chore(deps): bump sentence-transformers from 5.5.0 to 5.5.1 in /backend/python/transformers (#10007 ) chore(deps): bump sentence-transformers in /backend/python/transformers Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.5.0 to 5.5.1. - [Release notes](https://github.com/huggingface/sentence-transformers/releases) - [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.5.0...v5.5.1) --- updated-dependencies: - dependency-name: sentence-transformers dependency-version: 5.5.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-26 22:03:05 +02:00
LocalAI [bot]	4aad97971c	chore: ⬆️ Update ggml-org/llama.cpp to `35c9b1f39ebe5a7bb83986d64415a079218be78d` (#9998 ) * ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(llama-cpp): track upstream rename checkpoint_every_nt -> checkpoint_min_step Upstream llama.cpp renamed common_params::checkpoint_every_nt to checkpoint_min_step and changed its default from 8192 to 256. The semantics also shifted: it used to enforce a fixed checkpoint cadence during prefill, now it sets a minimum spacing between context checkpoints. Track the new field name in grpc-server.cpp and accept the old option names as backward- compatible aliases for users with existing configs. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:claude-opus-4-7 --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-26 08:34:41 +02:00
LocalAI [bot]	4a5219fa9c	chore: ⬆️ Update ggml-org/whisper.cpp to `e0fd1f6787a5bd4a4957dd97c5b64df882ee7b0c` (#9997 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 08:33:53 +02:00
LocalAI [bot]	b5a620294e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `1ceb5bd9df7784bcdf67dd9ed8bf0198b542ebc9` (#9994 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 08:33:37 +02:00
LocalAI [bot]	5d544a7868	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `b4e1d916c5ec7e75ea3c124dd090425a99fc613f` (#9995 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 23:57:17 +02:00

1 2 3 4 5 ...

1370 Commits