LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-09 09:18:19 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	378da34571	chore(llama-cpp): re-pin to upstream #24316, drop vendored stdin patch Upstream replaced the ad-hoc video stdin handling with a proper RAII refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc handling"), which includes the same `sp->stdin_file = nullptr` guard our patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to that branch head and drop patches/0001 - it's now redundant. Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode and the model answers correctly (red clip -> "Red", blue -> "Blue"). NOTE: #24316 is not yet merged, so this pins to its branch-head commit (28ca1e60). Re-pin to the squash-merge commit on master once it lands, otherwise `git fetch` may lose the commit after the branch is deleted. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 20:26:00 +00:00
Ettore Di Giacinto	564e431717	fix(llama-cpp): patch mtmd video stdin double-close (heap crash) Upstream mtmd video input (ggml-org/llama.cpp#24269) double-fcloses the ffmpeg/ffprobe stdin FILE: feed_stdin() fclose()s the FILE returned by subprocess_stdin() (which is sp->stdin_file), then subprocess_destroy() fclose()s the same pointer again -> heap corruption that aborts the backend on any base64 input_video request (the CLI --video file path is unaffected). Vendor a one-line fix (null sp->stdin_file after fclose) via prepare.sh's patches/ until upstream merges it. Verified e2e with gemma-4-e2b-it-qat-q4_0: video frames decode via ffmpeg and the model answers correctly (red clip -> 'Red', blue -> 'Blue'). Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 16:29:39 +00:00
Ettore Di Giacinto	35ff935845	feat(llama-cpp): forward video input to mtmd (template + non-template paths) Wire request->videos() into grpc-server.cpp mirroring the existing image and audio handling: a video_data build + non-template files extraction, and input_video chat chunks on the tokenizer-template path. allow_video is auto-set at model load by the vendored upstream chat_params. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 15:06:51 +00:00
Ettore Di Giacinto	37158c27df	chore(llama-cpp): bump to 8f83d6c for mtmd video input support Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 15:02:33 +00:00
LocalAI [bot]	c20225fc13	chore: ⬆️ Update CrispStrobe/CrispASR to `f7838a306687f22c281d29c250f879a4ab3df2d7` (#10177 ) * ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(crispasr): link crispasr-lib CMake target instead of crispasr The dependency-bump regeneration of this branch reset CMakeLists.txt to master and dropped the prior link-target fix, reintroducing the `cannot find -lcrispasr` failure. Upstream CrispASR (f7838a3) defines the library as the CMake target `crispasr-lib` (with OUTPUT_NAME crispasr); there is no target named `crispasr`, so target_link_libraries falls back to a bare `-lcrispasr` linker flag that cannot be resolved. Point the link at the real target name. Verified locally: CPU cmake-configure of the bumped source generates a gocrispasr link line referencing sources/CrispASR/src/libcrispasr.a with no dangling -lcrispasr. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 16:01:19 +02:00
LocalAI [bot]	337acc4c37	chore: ⬆️ Update antirez/ds4 to `c463029c205c2ec8d7ab6c0df4a3f52979091286` (#10189 ) * ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(ds4): link ds4_ssd.o into the backend build Upstream antirez/ds4 splits the SSD expert-cache into its own ds4_ssd.c translation unit, whose symbols (ds4_ssd_memory_lock_acquire/release, ds4_ssd_cache_experts_for_byte_budget, ds4_ssd_auto_cache_plan) are referenced by ds4.c/ds4_cpu.o. The dependency-bump automation regenerated this branch from clean master and dropped the prior linkage fix, so the cpu-ds4 / cublas-ds4 backend builds fail again with undefined references. Re-apply the ds4_ssd.o linkage GPU-agnostically (mirroring ds4_distributed.o) in both the backend Makefile (DS4_OBJ_TARGET + the engine-object build rule for every GPU mode) and CMakeLists.txt (list(APPEND DS4_OBJS ds4_ssd.o)). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 11:15:32 +02:00
LocalAI [bot]	2e93186043	chore: ⬆️ Update ggml-org/llama.cpp to `9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66` (#10210 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-08 09:35:17 +02:00
LocalAI [bot]	d07037e817	chore: ⬆️ Update leejet/stable-diffusion.cpp to `b3d56d0ba1bd437886079e339118e8e75bb79ee7` (#10211 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-08 09:03:57 +02:00
LocalAI [bot]	f6cc90d258	chore: ⬆️ Update mudler/parakeet.cpp to `e270af73b94c9a5c37ec516230219ed4580e1db6` (#10212 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-07 23:52:44 +02:00
LocalAI [bot]	a7cb587d96	feat(parakeet-cpp): real segment timestamps (NeMo-faithful) (#10207) * feat(parakeet-cpp): real segment timestamps (NeMo-faithful) Offline: replace the single synthetic whole-clip segment with multiple segments grouped exactly like NeMo's get_segment_offsets - a new segment after sentence-ending punctuation ('. ? !'), each carrying start/end and its time-window token ids. The optional model option segment_gap_threshold (NeMo's unit: encoder FRAMES, default 0=off) adds NeMo's silence-gap split, converted to seconds via the JSON frame_sec the engine now reports. Per-segment words are still gated behind timestamp_granularities=["word"]; a zero-word document falls back to a single text segment. Streaming: when libparakeet.so exposes the ABI v4 JSON entry points (probed), drive parakeet_capi_stream_feed_json / _finalize_json and accumulate the streamed per-word timestamps into per-utterance segments (EOU stays the boundary), so streaming FinalResult segments now carry start/end. Falls back to the text-only feed against an older library. Pure-Go specs cover splitWordsIntoSegments (punctuation + gap rules, NeMo elif order, fallback), transcriptResultFromDoc (multi-segment, token windows, word-granularity gate), and the streaming segmenter. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(audio): document parakeet-cpp segment timestamps + segment_gap_threshold Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(parakeet-cpp): update model-gated specs for multi-segment output The offline AudioTranscription specs asserted the old single synthetic segment (Segments HaveLen(1), Segments[0].Text == res.Text). With NeMo-faithful segmentation a multi-sentence clip now yields multiple punctuation-delimited segments, so assert the new contract instead: one-or-more time-ordered segments, each with text and (under word granularity) per-segment words whose span tracks the segment start/end. Caught by running the model-gated suite on the dgx (GB10) against the real tdt_ctc-110m + realtime_eou models. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-07 22:08:24 +02:00
LocalAI [bot]	f7c74ad2da	chore: ⬆️ Update ggml-org/llama.cpp to `31e82494c0a3913c919c1027fa70500fbf4c07dd` (#10191 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-07 10:43:17 +02:00
LocalAI [bot]	7402d1fd20	chore(turboquant): bump to 7d9715f1 + fix compilation against rebased fork (#10205) * chore(turboquant): bump TheTom/llama-cpp-turboquant to 7d9715f1 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(turboquant): drop obsolete legacy-spec shim after fork rebased The TheTom/llama-cpp-turboquant fork (pin c9aa86a) rebased past the upstream common_params_speculative refactor (ggml-org/llama.cpp #22397/#22838/#22964), the model_tgt rename (#22838) and get_media_marker (#21962). The old fork-compat shim forced now-wrong legacy code paths, breaking the build with errors like 'struct common_params_speculative has no member named mparams_dft / type' and 'server_context_impl has no member named model'. Remove the obsolete LOCALAI_LEGACY_LLAMA_CPP_SPEC branches from the shared grpc-server.cpp (stock llama-cpp and the modern fork both take the modern path now), and narrow the one remaining gap (the fork still lacks common_params::checkpoint_min_step) to a dedicated LOCALAI_TURBOQUANT_NO_CHECKPOINT_MIN_STEP guard injected by patch-grpc-server.sh. The patch script now only adds the turbo2/3/4 KV-cache types and injects that one macro. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(turboquant): HIP-port the fork's CUDA additions (copy2d 3D-peer + cudaEventCreate) The turboquant fork adds/modifies a few ggml-cuda.cu spots with CUDA APIs that ggml's HIP/MUSA shim does not provide, breaking the -gpu-rocm-hipblas-turboquant build. patches/0001-hip-guard-copy2d-peer-fastpath.patch (applied by apply-patches.sh) ports them: - Guard ggml_cuda_copy2d_across_devices's 3D-peer copy fast path with #if !defined(GGML_USE_HIP) && !defined(GGML_USE_MUSA) so HIP/MUSA fall through to the existing cudaMemcpyAsync staging fallback (HIP genuinely lacks cudaMemcpy3DPeerAsync, per the fork's own comment). - Create the device event in ggml_backend_cuda_device_event_new with the HIP-aliased cudaEventCreateWithFlags(.., cudaEventDisableTiming) instead of the un-aliased plain cudaEventCreate, matching this file's own usage elsewhere. CUDA builds are unaffected. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * ci(turboquant): drop the ROCm/hipblas build flavor The TheTom/llama-cpp-turboquant fork is not ROCm-clean at the current pin: beyond the CUDA-API gaps already patched (3D-peer copy, cudaEventCreate), its llama.cpp base fails to compile the flash-attention MMA f16 kernels for head-dim 640 under HIP (cols_per_warp evaluates to 0 -> division-by-zero / non-constant static asserts in fattn-mma-f16.cuh). That is a deep ggml-on-ROCm kernel issue, not something a small fork patch can paper over. Drop -gpu-rocm-hipblas-turboquant from the build matrix so turboquant still ships for cpu / cublas / vulkan / sycl. Re-add it once the fork's HIP path compiles (or upstream ggml fixes the large-head-dim MMA kernels for ROCm). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-07 10:42:06 +02:00
LocalAI [bot]	8c42695ef8	chore: ⬆️ Update ggml-org/whisper.cpp to `a8ec021f2750a473ff4a8f3883bc9fdf5feafa84` (#10202 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-07 08:37:42 +02:00
LocalAI [bot]	72e3241431	chore: ⬆️ Update mudler/parakeet.cpp to `abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67` (#10204 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-07 00:37:28 +02:00
LocalAI [bot]	f64b72dd7d	feat: support Ideogram4 in stablediffusion-ggml backend + gallery (#10201) * feat(stablediffusion-ggml): support Ideogram4 unconditional diffusion model Bump stable-diffusion.cpp from 1f9ee88 to b9254dd, the upstream commit that adds Ideogram4 support (leejet/stable-diffusion.cpp#1609). Ideogram4 derives its classifier-free guidance from a separate unconditional diffusion model, exposed upstream through the new sd_ctx_params_t.uncond_diffusion_model_path field. Wire that field into the gosd wrapper via a new uncond_diffusion_model_path option. The _path suffix is deliberate: the Go loader only resolves options whose name contains "path" to an absolute path under the model directory, so this keeps the option consistent with diffusion_model_path and high_noise_diffusion_model_path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): add Ideogram4 stablediffusion-ggml models Single-file GGUF weights for Ideogram4 are now published (stduhpf/ideogram-4-gguf), so add the model to the gallery. Ideogram4 is a text-to-image model with strong, accurate in-image text rendering, driven by a Qwen3-VL-8B text encoder and real classifier-free guidance from a separate unconditional diffusion model (the uncond_diffusion_model_path support added in the preceding commit). Two index entries, both built on gallery/virtual.yaml with the full config inlined in overrides (same pattern as the other models, no dedicated template file): - ideogram-4-iq4nl-ggml (4-bit, ~11.6GB diffusion) - ideogram-4-q8_0-ggml (8-bit, ~20GB diffusion) Each bundles the diffusion + unconditional GGUF (stduhpf), the Qwen3-VL-8B-Instruct text encoder (unsloth), and the FLUX.2 VAE (Comfy-Org mirror, non-gated). cfg_scale is 7 to match the upstream Ideogram4 default, since it performs real CFG unlike the guidance-distilled Flux/Z-Image models. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-06 22:50:12 +02:00
LocalAI [bot]	03c84cff28	feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support (#10199 ) * feat(parakeet-cpp): honor request language (multilingual nemotron) on batched + streaming paths Reads opts.GetLanguage() and threads it through to the new parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_stream_begin_lang C-API entry points, both probed with Dlsym so the backend still loads against an older libparakeet.so (falling back to the non-lang paths, i.e. model default). parakeet.cpp's batched C-API takes a single target_lang for the whole batch, so the dispatcher only coalesces same-language requests: a request whose language differs from the batch leader is held as a single carry-over and becomes the leader of the next batch, never dropped and never left waiting (including on shutdown). A new batcher test asserts no dispatched batch is ever mixed-language and that every submitted request still receives a reply. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gallery): add parakeet-cpp-nemotron-3.5-asr-streaming-0.6b; bump parakeet.cpp pin Adds the multilingual prompt-conditioned streaming model to the gallery (q8_0 default, OpenMDW-1.1) and bumps the parakeet-cpp backend pin to the parakeet.cpp commit that ships nemotron support plus batched causal subsampling and the batched target_lang C-API. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-06 13:53:10 +02:00
LocalAI [bot]	1e6c9cfd60	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `6b9de3dbaa21ae95ea80638e5ee836795cc48c93` (#10190 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-06 09:42:43 +02:00
LocalAI [bot]	0e6712f734	chore: ⬆️ Update mudler/parakeet.cpp to `843600590f96a31467a5199f827c253f34c110f7` (#10198 ) chore(parakeet-cpp): bump pin to banded long-audio attention (843600590) Update PARAKEET_VERSION to mudler/parakeet.cpp@843600590f (merge of parakeet.cpp#9). Brings NeMo rel_pos_local_attn banded/Longformer attention with the chunk-matmul construction: long audio now uses O(T*window) attention instead of global O(T^2), fixing the encoder OOM on long clips (~16.6-min clip: 54GB->9.4GB peak, ~4x faster) at NeMo's full [128,128] window. Short clips are unchanged (global path). No C-ABI change. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-06 09:25:25 +02:00
LocalAI [bot]	ba706422fb	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.22.1` (#10188 ) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-05 23:42:50 +02:00
LocalAI [bot]	e837921c2c	feat: forward reasoning_effort to the backend so jinja models honor it (#10184 ) * feat: forward reasoning_effort to the backend so jinja models honor it reasoning_effort was only mapped to the binary enable_thinking toggle and otherwise reached Go-side templates — it was never sent to the backend. So jinja-templated models whose chat template keys on reasoning_effort (gpt-oss Harmony, LFM2.5) could not be driven by it: LFM2.5 ignores enable_thinking and kept emitting <think>. Forward the effective reasoning_effort to the backend as a chat_template_kwarg (mirroring enable_thinking) in grpc-server.cpp, and put it in PredictOptions metadata (gRPCPredictOpts). Add a config-level default: ModelConfig.reasoning_effort and Pipeline.reasoning_effort, resolved by ModelConfig.ApplyReasoningEffort (request value overrides config default, none->disable / level->enable, an operator's reasoning.disable wins). request.go now uses that helper. Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): set the pipeline LLM's reasoning_effort Apply Pipeline.ReasoningEffort to the pipeline's LLM config when the realtime model is built (per-session copy, overrides the LLM's own reasoning_effort), and surface the resolved effort on the template input so Go-templated models get it too. jinja models receive it via the backend metadata. This lets a realtime pipeline disable thinking on models that only honor reasoning_effort (e.g. LFM2.5), which enable_thinking can't. Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-05 13:45:43 +00:00
LocalAI [bot]	a4e671779a	chore: ⬆️ Update ggml-org/whisper.cpp to `99613cb720b65036237d44b52f753b51f75c2797` (#10178 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-05 09:04:25 +02:00
LocalAI [bot]	7051b2e0a1	chore: ⬆️ Update ggml-org/llama.cpp to `7c158fbb4aec1bdc9c81d6ca0e785139f4826fae` (#10179 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-05 09:04:10 +02:00
LocalAI [bot]	469737101a	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `1520eda980564241434b791ce2bbbd128c4be9ea` (#10180 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-05 09:03:08 +02:00
LocalAI [bot]	858257eaf0	fix(distributed): self-heal stale 'model not loaded' routing (#10181 ) * fix(distributed): self-heal stale 'model not loaded' routing In distributed mode the registry can list a model as loaded on a node while the worker has evicted it (autonomous LRU eviction, an out-of-band unload, etc.) yet the backend process survives. The router's cached-node check only verifies the process is alive (probeHealth), so it routes there and inference fails with "<backend>: model not loaded" — and stays broken until the controller restarts and rebuilds its registry. InFlightTrackingClient now reconciles this: when a tracked inference call returns a model-not-loaded error, it drops the stale replica row (RemoveNodeModel) so the next request reloads the model on a healthy node instead of routing back to the evicted one. The original error is returned unchanged; only the registry is corrected. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(distributed): typed model-not-loaded error via gRPC status code Replace the controller-side error-string match with a shared, code-aware helper. Go error types don't survive the gRPC boundary, so the signal is carried as a status code (FailedPrecondition): - pkg/grpc/grpcerrors: ModelNotLoaded(backend) constructor + IsModelNotLoaded(err) checker (status-code first, message fallback for backends not yet migrated). - InFlightTrackingClient.reconcile now uses grpcerrors.IsModelNotLoaded. - Migrate the Go backends that emit this error (parakeet-cpp, cloud-proxy, rfdetr-cpp) to the typed constructor. Acting on a false positive is harmless (the model is just reloaded). Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-05 09:01:36 +02:00
LocalAI [bot]	994063ba9a	feat(qwen3-tts-cpp): normalize request language for flexible matching (#10174 ) The qwen3-tts.cpp backend honored the request `language` field only via exact lowercase two-letter codes in the C++ language_to_id table, silently defaulting to English for anything else (en-US, EN, english, ...). Add normalizeLanguage() in the Go handler: lowercase + trim, strip the region/locale suffix (en-US, pt_BR, zh-Hans -> en/pt/zh), and resolve common English full names (english -> en). The canonical codes match the existing C++ table, so no C++ change is needed. Covered by a pure-Go Ginkgo spec. Also document the language field and accepted forms under the Qwen3-TTS docs. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 17:26:31 +02:00
LocalAI [bot]	c1a55cf72d	chore: ⬆️ Update mudler/parakeet.cpp to `b11fe5bca78ad8b342dd559a43d76df3984bb447` (#10167 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:07:09 +02:00
LocalAI [bot]	96758841d8	chore: ⬆️ Update predict-woo/qwen3-tts.cpp to `136e5d36c17083da0321fd96512dc7b263f94a44` (#10165 ) ⬆️ Update predict-woo/qwen3-tts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:06:55 +02:00
LocalAI [bot]	7a59260621	chore: ⬆️ Update CrispStrobe/CrispASR to `13d54e110e1538e0f0bc3af0680b9ab246cfb48d` (#10145 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 12:06:32 +02:00
LocalAI [bot]	27e63b9a78	feat(tts): support per-request instructions and params (#10172 ) The OpenAI-compatible TTS endpoint accepts an `instructions` field, but it was silently dropped at the HTTP->gRPC boundary: neither schema.TTSRequest nor the gRPC TTSRequest proto carried it, so backends could only read such a value from static YAML options (identical for every request). This blocked per-line emotion/style and, for Qwen3-TTS VoiceDesign, limited a model config to a single designed voice. Plumb a generic per-request instruction string end to end, plus an optional backend-specific params map: - proto: add `optional string instructions` and `map<string,string> params` to TTSRequest. - schema: add Instructions (maps OpenAI `instructions`) and Params (LocalAI extension) to schema.TTSRequest. - core: thread both through ModelTTS/ModelTTSStream via a newTTSRequest helper that attaches instructions only when non-empty (so backends can fall back to YAML when unset); forward them from the /v1/audio/speech handler. - qwen-tts: prefer the per-request instruction over the YAML `instruct` option (used by both mode detection and generation) and merge per-request params. - chatterbox: merge per-request params (coerced to float/int/bool) over YAML options into generate() kwargs. Fully backward compatible: empty instructions fall back to the YAML option and backends that don't support style/voice instructions ignore the field. Closes #10164 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-04 11:45:02 +02:00
LocalAI [bot]	55c0911c23	chore: ⬆️ Update leejet/stable-diffusion.cpp to `1f9ee88e09c258053fa59d5e05e23dfb10fa0b13` (#10166 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 09:34:34 +02:00
LocalAI [bot]	f6cb6ab6d9	chore: ⬆️ Update ggml-org/llama.cpp to `94a220cd6745e6e3f8de62870b66fd5b9bc92700` (#10168 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 09:34:13 +02:00
LocalAI [bot]	a5c4f822f0	chore: ⬆️ Update antirez/ds4 to `477c0e82e2699b35a65fd0a1ed6fe66b41087dfe` (#10142 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:45:23 +02:00
LocalAI [bot]	0e4e8980e6	chore: ⬆️ Update ggml-org/llama.cpp to `5c394fdc8b564eff6faacc50a139529d875f0e36` (#10143 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:44:21 +02:00
LocalAI [bot]	9d10418593	fix(parakeet-cpp): convert audio before the non-batched transcribe path (#10161 ) The direct (non-batched) transcription path handed the original upload path straight to the C library via parakeet_capi_transcribe_path_json. That loader only understands 16 kHz mono WAV/PCM, so any other format (MP3, etc.) failed with "parakeet: failed to load audio: <file>". Only the batched path converted the input (via decodeWavMono16k -> utils.AudioToWav). Every other audio backend (whisper, crispasr) converts unconditionally with utils.AudioToWav before handing the file to its engine; the parakeet-cpp fallback was the lone exception. Extract a convertToWavMono16k helper (reused by decodeWavMono16k) that produces a 16 kHz mono WAV in a temp dir, and run the non-batched path through it before calling the C loader. WAV inputs already in the target format are passed through without ffmpeg. Add specs covering the helper (decodable copy + cleanup, and an error on a missing input) that need neither the model, the C library, nor ffmpeg. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-03 15:06:57 +02:00
dependabot[bot]	5470051d4d	chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/transformers (#10158 ) chore(deps): bump grpcio in /backend/python/transformers Bumps [grpcio](https://github.com/grpc/grpc) from 1.80.0 to 1.81.0. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.80.0...v1.81.0) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.81.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:38:43 +02:00
LocalAI [bot]	68c5eeebc3	chore: ⬆️ Update ggml-org/whisper.cpp to `610e664ba7cfe3af46125ed1b5a1184fccb51bcd` (#10140 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 10:38:28 +02:00
LocalAI [bot]	b7673d5b76	chore: ⬆️ Update leejet/stable-diffusion.cpp to `2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5` (#10144 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 10:37:51 +02:00
dependabot[bot]	eebf08ff1d	chore(deps): bump grpcio from 1.80.0 to 1.81.0 in /backend/python/vllm (#10157 ) Bumps [grpcio](https://github.com/grpc/grpc) from 1.80.0 to 1.81.0. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.80.0...v1.81.0) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.81.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 10:37:16 +02:00
LocalAI [bot]	d9ae6481fb	chore: ⬆️ Update mudler/parakeet.cpp to `9edf17c3ada66e0f881dcff155492867db7ac4cf` (#10141 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 08:49:47 +02:00
LocalAI [bot]	860f9d63ad	feat(parakeet-cpp): dynamic batching for concurrent transcription requests (#10112) * feat(parakeet-cpp): dynamic-batching scheduler (queue + dispatcher) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): dynamic batching for AudioTranscription via batched JSON C-API Drop SingleThread; route unary transcription through the in-process batcher which coalesces concurrent requests into one batched engine call. Streaming stays mutually exclusive via engineMu. Adds batch_max_size / batch_max_wait_ms options (size=1 disables; recommended on CPU). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): tear down dispatcher in Free; log batch config; preallocate; clarify stream lock Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): Ginkgo batcher tests; optional batch C-API binding with per-request fallback The batched JSON C-API symbol exists only in newer libparakeet.so (ABI >= 2); probe it with Dlsym and register optionally so the backend still loads against an older library, falling back to per-request transcription. Rewrites the batcher unit tests as Ginkgo/Gomega specs (forbidigo bans t.Fatal in tests). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): debug-log coalesced batch size in runBatch Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): default batch_max_size to 1 (batching opt-in) Dynamic batching now defaults off (batch_max_size:1, one request at a time). Raise batch_max_size to opt in: it is a large throughput win on GPU under concurrent load, but on CPU and low-concurrency setups it only adds latency, so off is the safer default. The startup log now states whether batching is on or off, and the audio-to-text docs are updated to match. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(parakeet-cpp): bump parakeet.cpp to 8a7c482 (batched decode + B=1 fast-path) parakeet.cpp PR #1 merged the batched encoder/decode and the B=1 encoder fast-path to master. Point PARAKEET_VERSION at that commit so the backend builds the batched C-API (parakeet_capi_transcribe_pcm_batch_json) that the dynamic batcher calls; the prior pin (30a3075) predated it, so only the per-request fallback path was exercised. Verified the shared lib builds with the backend's CMake flags and exports the batch symbol. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 14:49:02 +02:00
LocalAI [bot]	a5a0b3dc4e	chore: ⬆️ Update CrispStrobe/CrispASR to `05e60432bcb5bc2113f8c395a41e86497c11504a` (#10115) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 14:48:47 +02:00
番茄摔成番茄酱	94eca04c60	fix(nemo): pin texterrors to 1.1.6 for GLIBCXX compatibility (#10134) Pin texterrors==1.1.6 before nemo_toolkit[asr] in requirements-cublas13.txt. The texterrors package (a NeMo transitive dependency) contains a compiled C++ extension (texterrors_align.so) that may be built from source during OCI image creation. When built on systems with GCC 14+ (e.g. Ubuntu 24.04), the resulting binary requires GLIBCXX_3.4.32, which is not available in the default LocalAI container (Ubuntu 22.04, GLIBCXX up to 3.4.30). Pinning to 1.1.6 (the latest release) ensures: - Reproducible builds across environments - pip resolves the pre-built manylinux2014 wheel (needs only GLIBCXX_3.4.11) instead of potentially building from source with a newer toolchain Fixes #10056 Signed-off-by: 番茄摔成番茄酱 <fqscfqj@outlook.com>	2026-06-02 14:48:27 +02:00
LocalAI [bot]	35bd485d6a	chore: ⬆️ Update ggml-org/llama.cpp to `5dcb71166686799f0d873eab7386234302d05ecf` (#10128) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:35 +02:00
LocalAI [bot]	1fe96f8d9a	chore: ⬆️ Update mudler/parakeet.cpp to `8a7c48209d7882a7ce79a6b306270e4703194543` (#10129) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:19 +02:00
LocalAI [bot]	c508e9d7c6	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7948df8ac1070f5f6881b8d34675821893eb97d6` (#10127) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 09:06:03 +02:00
LocalAI [bot]	55e754fd05	chore: ⬆️ Update ggml-org/whisper.cpp to `23ee03506a91ac3d3f0071b40e66a430eebdfa1d` (#10130) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-02 01:43:03 +02:00
LocalAI [bot]	7013e13f05	chore: ⬆️ Update ggml-org/llama.cpp to `399739d5c5978351f39e3454bfbfbab4f369088f` (#10119) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 14:24:51 +02:00
LocalAI [bot]	63f176346e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `be65ac7511b30379b003626c15224798929e33d4` (#10118) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:50 +02:00
LocalAI [bot]	af94d08729	chore: ⬆️ Update ggml-org/whisper.cpp to `fe69461618ffc50ba8afa65c25cc6c6e34d4537f` (#10117) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 00:43:34 +02:00
LocalAI [bot]	6795d38f50	chore: ⬆️ Update mudler/parakeet.cpp to `cb45f68068081af01e7092e91b038ee353eb56be` (#10116) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-31 23:57:15 +02:00


1414 Commits