LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-23 06:34:48 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	4a5219fa9c	chore: ⬆️ Update ggml-org/whisper.cpp to `e0fd1f6787a5bd4a4957dd97c5b64df882ee7b0c` (#9997 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 08:33:53 +02:00
LocalAI [bot]	b5a620294e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `1ceb5bd9df7784bcdf67dd9ed8bf0198b542ebc9` (#9994 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 08:33:37 +02:00
LocalAI [bot]	5d544a7868	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `b4e1d916c5ec7e75ea3c124dd090425a99fc613f` (#9995 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 23:57:17 +02:00
LocalAI [bot]	87e01aa290	chore: ⬆️ Update antirez/ds4 to `ad0209f6a4b067574d2b4afe896c08c177156b31` (#9996 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 23:56:33 +02:00
LocalAI [bot]	de2ce74bea	fix(stablediffusion-ggml): mux LTX-2 audio into output MP4 (#9990 ) feat(stablediffusion-ggml): mux LTX-2 audio into output MP4 sd.cpp's generate_video now returns a sd_audio_t* alongside the video frames for models with an audio VAE (LTX-2.3). Our gosd wrapper was already collecting that pointer but immediately freed it without ever muxing it into the output, so LTX-2 generations landed as silent MP4s even though the audio VAE decode succeeded. Stage the planar float32 waveform to a temp WAV (IEEE float, header hand-built; samples interleaved on the fly), then add it as a second ffmpeg input with -c:a aac -map 0:v:0 -map 1:a:0 -shortest. The temp WAV is cleaned up unconditionally after ffmpeg exits, including on the write/waitpid error paths. Non-LTX models (Wan i2v / FLF2V) keep their current behaviour: audio arg is nullptr, the audio-related ffmpeg flags are not added, and no temp file is created. Assisted-by: Claude:claude-opus-4-7 Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-25 22:40:16 +02:00
LocalAI [bot]	b02e3ffe61	feat(stablediffusion-ggml): LTX-2 support + LTX-2.3 GGUF gallery entries (#9980 ) stable-diffusion.cpp gained LTX-2 video generation, which requires an audio VAE and an embeddings_connectors safetensors in addition to the usual diffusion model, VAE, and LLM text encoder. The pinned commit exposes audio_vae_path and embeddings_connectors_path on sd_ctx_params_t; wire both through the option parser so gallery entries can point at the LTX-specific assets. Ship six LTX-2.3 GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0 each) backed by a new ltx-ggml.yaml template that defaults to euler / cfg_scale 6.0 / vae_decode_only:false / diffusion_flash_attn / offload_params_to_cpu — matching the upstream LTX-2 CLI recipe. Each entry pulls the model GGUF plus the QAT gemma-3-12b-it text encoder, video VAE, audio VAE, and embeddings connectors needed for T2V / I2V / FLF2V. Assisted-by: Claude:claude-opus-4-7 [Claude-Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-25 13:00:28 +02:00
Richard Palethorpe	6a80e23733	feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802 ) Add a routing middleware stack and a cloud-proxy backend. * cloud-proxy: a Go gRPC backend that forwards OpenAI- and Anthropic-shaped chat requests to upstream providers, with an optional translate mode (OpenAI request -> Anthropic /v1/messages -> OpenAI response) and full tool-calling support. * routing: admission control, content-aware model routing (embedding cache + classifier + rerank + Arch-Router score), PII detection/redaction (regex + NER) with streaming filter and OpenAI/Anthropic adapters, and a per-user/per-key billing recorder backed by GORM or in-memory storage. * middleware: UsageMiddleware records usage via the billing recorder, plus admission, route-model, usage-stamp and trace middlewares. * observability: BackendTrace ring buffer stores full request bodies (capped), MITM proxy emits structured trace events, and router classifier decisions surface at /api/router/decide. * gallery: Arch-Router-1.5B (Q4_K_M and Q8_0). * UI: cloud-proxy model-editor fields, classifier system-prompt and score-normalization config, and a Traces page rendering request bodies. Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-25 09:28:27 +02:00
LocalAI [bot]	1dcd1ae915	chore: ⬆️ Update ggml-org/llama.cpp to `549b9d84330c327e6791fa812a7d60c0cf63572e` (#9974 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 09:22:56 +02:00
LocalAI [bot]	acad78a95a	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `9f7ba245ab41e118f03aa8dd5134d18a81159d02` (#9973 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 00:05:29 +02:00
LocalAI [bot]	c94d1e1f5b	chore: ⬆️ Update antirez/ds4 to `f91c12b50a1448527c435c028bfc70d1b00f6c33` (#9975 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-25 00:05:15 +02:00
Copilot	270c256409	Fix kokoros backend build break from Backend trait drift (#9972 ) * Initial plan * fix(kokoros): implement missing AudioToAudioStream trait stubs Agent-Logs-Url: https://github.com/mudler/LocalAI/sessions/e3c6b042-f055-4df9-a05e-e2d8434ee58b Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-24 22:39:15 +02:00
LocalAI [bot]	dcc5599f89	chore: ⬆️ Update leejet/stable-diffusion.cpp to `a397e03488cc27e1a42da646b82dfce9f50741c0` (#9965 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-24 08:35:36 +02:00
LocalAI [bot]	a95f4e63e0	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `642c038ccdf3dd08e6d9ac6fdc3b1c311ebd8a02` (#9966 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 23:52:51 +02:00
LocalAI [bot]	dfd19a3f88	chore: ⬆️ Update ggml-org/llama.cpp to `c0c7e147e7efa6c5858754b47259ba4880f8a906` (#9963 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 23:52:36 +02:00
LocalAI [bot]	63d84a5705	chore: ⬆️ Update antirez/ds4 to `444afce822057d87f14c4dec307dce24fd49b3ee` (#9964 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 23:51:53 +02:00
LocalAI [bot]	e4cc1f11f3	chore: ⬆️ Update ggml-org/llama.cpp to `1acee6bf8939948f9bcbf4b14034e4b475f06069` (#9952 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 08:38:29 +02:00
LocalAI [bot]	6ed269d0b9	chore: ⬆️ Update ggml-org/whisper.cpp to `0ccd896f5b882628e1c077f9769735ef4ce52860` (#9954 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 08:37:26 +02:00
LocalAI [bot]	5756fb046d	chore: ⬆️ Update leejet/stable-diffusion.cpp to `0baf721215f45335a5df8caf0ecb34e870c956e7` (#9955 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-23 08:37:10 +02:00
LocalAI [bot]	d0a59be9de	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `b3d39cff8bffbd67296d6badd4076a1486a0715c` (#9953 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-22 23:58:48 +02:00
LocalAI [bot]	5cda4f1ccf	fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950 ) * fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no version pins. That mirror started shipping torch 2.11.0 next to a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10 ABI, so uv landed on the mismatched pair and vllm crashed at import: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib (c10::MessageLogger's constructor signature changed between torch 2.10 and 2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so exported only the 2.11 form.) Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130 manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose Requires- Dist locks torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set automatically, so the mirror and the [tool.uv.sources] pinning are no longer needed. flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel, so the Dao-AILab package isn't required at runtime. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-.txt pyproject.toml only existed because uv pip install -r requirements.txt doesn't honor [tool.uv.sources]. The previous commit dropped [tool.uv. sources] (PyPI now serves the aarch64 + cu130 wheels directly), so the file no longer carries any logic the requirements-.txt path can't. Replace with the same two-file pattern every other build profile uses: - requirements-l4t13.txt (accelerate / torch / transformers / bitsandbytes - matches cublas13's split) - requirements-l4t13-after.txt (vllm; runs after the base resolve so the cu130 torch wheel lands first) install.sh's whole l4t13 elif branch goes away; libbackend.sh's installRequirements already handles the requirements-install.txt build- deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the runProtogen call, so falling through to the standard else: branch produces identical install behavior with less surface area. No functional change at install time - same wheels, same order. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels Same root cause and same fix as the vllm backend in the previous commits: the L4T13 sglang and vllm-omni backends both pulled their accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins, so they would silently land on the same torch 2.11 vs cu130-built wheel ABI mismatch the moment the mirror published an out-of-sync pair. sglang ------ - Drop pyproject.toml + [tool.uv.sources]. The historical comment said the [all] extra was unsafe on aarch64 because of decord, but sglang 0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and stop being capped at the 0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP recipes on Jetson Thor. - New requirements-l4t13.txt mirrors the cublas13 split (accelerate / torch / torchvision / torchaudio / transformers), requirements-l4t13- after.txt carries sglang[all]>=0.5.11. - install.sh's l4t13 elif branch goes away; falls through to the standard installRequirements path. vllm-omni --------- - requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own vllm_flash_attn fa2 + fa3 internally). - install.sh's l4t13 vllm-install branch collapses into the cublas13 branch since both now just run `pip install vllm --torch-backend=auto` against PyPI. - --index-strategy=unsafe-best-match is dropped from the top-level l4t13 guard; without the L4T mirror in the picture it had no purpose. The from-source vllm-omni install on top still keeps its existing `sed -i '/^fa3-fwd[[:space:]]==/d' requirements/cuda.txt` workaround - fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the [diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist depends on scikit_build_core without declaring it in build-system. requires, so under --no-build-isolation uv can't build it from source: × Failed to build `xatlas==0.0.11` ├─▶ The build backend returned an error ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1) ModuleNotFoundError: No module named 'scikit_build_core' help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas` Upstream sglang explicitly gates st_attn and vsa on `platform_machine != aarch64` inside the same [diffusion] extra but forgot xatlas - same class of bug that bit the old decord pin. Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base sglang.srt symbols (Engine, ServerArgs, FunctionCallParser, ReasoningParser); the [all] extras are optional accelerators not required at import time. cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-22 23:01:22 +02:00
LocalAI [bot]	4735345105	chore: ⬆️ Update ggml-org/llama.cpp to `bb28c1fe246b72276ee1d00ce89306be7b865766` (#9934 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-22 09:49:33 +02:00
LocalAI [bot]	7384fd800b	chore: ⬆️ Update antirez/ds4 to `8d576642c39b9a2d782a80159ba84ef5a81c0b81` (#9932 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-22 08:31:49 +02:00
LocalAI [bot]	6942713d85	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3a8788cb7d74f185d6b18688e9563015524ecaf5` (#9933 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-22 00:31:19 +02:00
LocalAI [bot]	0cf52c44d4	chore: ⬆️ Update ggml-org/whisper.cpp to `8443cf05e3fa8ce1b32348e1bcbcf8fc31f7f3ae` (#9929 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-21 23:24:01 +02:00
LocalAI [bot]	0d34cf7cbd	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `48a55f74e4c6e2aeda363dd386c1ac9170a0af71` (#9930 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-21 23:23:37 +02:00
LocalAI [bot]	959de86761	feat(llama-cpp): make server-side prompt cache work by default (#9925 ) Aligns LocalAI's llama-cpp gRPC backend with upstream's auto-on prompt cache path so repeated system prompts (agents, OpenAI/Anthropic-compatible CLIs, coding assistants) skip prefill on subsequent calls without any YAML changes. Reported in #9921. Upstream's server enables `kv_unified=true` (and bumps `n_parallel` to 4) when slot count is auto, which unlocks `cache_idle_slots`. LocalAI hardcodes `n_parallel=1` and so far also hardcoded `kv_unified=false`, which silently force-disables idle-slot saving at server init. The host prompt cache was allocated but never written across requests. Changes in backend/cpp/llama-cpp/grpc-server.cpp: - params.kv_unified: false -> true (single-slot path now benefits from the prompt cache; users can opt out with `kv_unified:false`) - params.n_ctx_checkpoints: 8 -> 32 (match upstream default) - params.cache_idle_slots = true initialized explicitly (upstream default) - params.checkpoint_every_nt = 8192 initialized explicitly (upstream default) - New option parsers: cache_idle_slots / idle_slots_cache, checkpoint_every_nt / checkpoint_every_n_tokens Docs: - features/text-generation.md: fix misleading `cache_ram` description (it's the host-side prompt cache, not the KV cache), document the kv_unified + cache_ram + cache_idle_slots interaction, add rows for the two newly-exposed options, and add a worked example for the agent/CLI workload from the issue. - advanced/model-configuration.md: mark the legacy `prompt_cache_path` / `prompt_cache_all` / `prompt_cache_ro` YAML fields as unused by the llama-cpp gRPC backend (they target upstream's CLI completion tool and are not consumed by grpc-server.cpp) and point readers at the new prompt-cache explainer. Closes #9921 Assisted-by: claude:opus-4.7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-21 16:31:48 +02:00
Richard Palethorpe	c68818a62e	fix(llama-cpp): terminate tensor_buft_overrides with sentinel (#9919 ) llama.cpp's model loader asserts back().pattern == nullptr on params.tensor_buft_overrides (and on params.kv_overrides.back().key[0] == 0) before binding them into llama_model_params. PR #8560 attempted to satisfy llama_params_fit's placeholder requirement by pre-filling params.tensor_buft_overrides up to llama_max_tensor_buft_overrides() before the option-parse loop. Any subsequent push_back from override_tensor / draft_cpu_moe / draft_n_cpu_moe / draft_override_tensor then appended real entries after the placeholders, leaving back() with a real pattern and tripping the assert. The draft override vector likewise had no terminator at all. Mirror upstream common/arg.cpp:645-658 instead: real entries are pushed during option parsing, and after parsing we pad the main vector up to ntbo (placeholders land at the end, so back() is always nullptr) and append a single {nullptr, nullptr} to the draft vector when it is non-empty. The existing kv_overrides terminator block already matches upstream and stays. Verified against ggml-org/llama.cpp@5cbaa5e: only tensor_buft_overrides (main + draft) and kv_overrides are sentinel-terminated common_params fields; everything else is size-driven std::vector. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-21 12:55:06 +02:00
LocalAI [bot]	12e056e96d	chore: ⬆️ Update ggml-org/llama.cpp to `ad277572619fcfb6ddd38f4c6437283a4b2b8636` (#9915 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-21 09:07:31 +02:00
LocalAI [bot]	b2d68a53a2	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `11a1fea9e291f12ce2c803a9d7812c30ca806bcf` (#9914 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 22:04:06 +00:00
LocalAI [bot]	1ffd82a050	chore: ⬆️ Update antirez/ds4 to `2606543be7a8c125a32cee37f5d1d85dc78f2fcf` (#9909 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 21:22:26 +00:00
LocalAI [bot]	f515168dbe	chore(acestep-cpp): bump pin to ed53caf and adapt wrapper to new API (#9908 ) The new ace-step.cpp revision moves backend initialization inside each `_load` call and drops the separate `DiTGGMLConfig` argument from `dit_ggml_load` (config now lives in `DiTGGML::cfg`, populated from GGUF metadata at load time). Drop the now-removed `_init_backend` calls and replace `g_dit_cfg` accesses with `g_dit.cfg`. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-20 21:05:32 +00:00
LocalAI [bot]	ef6ca34513	chore: ⬆️ Update leejet/stable-diffusion.cpp to `5b0267e941cade15bd80089d89838795d9f4baa6` (#9907 ) Adapt the C++ wrapper to the new `generate_video()` signature: upstream now returns `bool` and writes frames/audio via out-parameters (`sd_image_t`, `sd_audio_t`). Also set `p->fps` on the params struct (new upstream field) and free the returned audio handle on both the success and error paths. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-20 20:53:19 +00:00
dependabot[bot]	9413c3767f	chore(deps): update transformers requirement from >=5.8.0 to >=5.8.1 in /backend/python/transformers (#9883 ) chore(deps): update transformers requirement Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v5.8.0...v5.8.1) --- updated-dependencies: - dependency-name: transformers dependency-version: 5.8.1 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-20 22:16:02 +02:00
dependabot[bot]	3bf3cce232	chore(deps): bump sentence-transformers from 5.4.0 to 5.5.0 in /backend/python/transformers (#9888 ) chore(deps): bump sentence-transformers in /backend/python/transformers Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.4.0 to 5.5.0. - [Release notes](https://github.com/huggingface/sentence-transformers/releases) - [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.4.0...v5.5.0) --- updated-dependencies: - dependency-name: sentence-transformers dependency-version: 5.5.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-20 22:13:39 +02:00
LocalAI [bot]	06f8159035	chore: ⬆️ Update ggml-org/llama.cpp to `67ace021da905e27ecbdf1176b0eef578a5288c0` (#9897 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 22:05:58 +02:00
LocalAI [bot]	24e04d8e81	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `77413bc900f9a2bfd8a5407f184427bcc0825f6c` (#9899 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 01:02:53 +02:00
LocalAI [bot]	b9a49449ae	chore: ⬆️ Update ggml-org/whisper.cpp to `afa2ea544fb4b0448916b4a31ecd33c8685bd482` (#9898 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 01:02:25 +02:00
LocalAI [bot]	1879e11042	chore: ⬆️ Update antirez/ds4 to `599e49d253971451f710cb8323344e789906ed6c` (#9900 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 01:01:45 +02:00
LocalAI [bot]	4b02d23c0c	chore: ⬆️ Update ggml-org/llama.cpp to `5cbaa5e69e09bde3334cd8c355570553a0dca027` (#9876 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-19 08:06:16 +02:00
LocalAI [bot]	21140e96b2	chore: ⬆️ Update ggml-org/whisper.cpp to `47b9eb37a33c5031a1b667ace64477330b9f36c1` (#9877 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-19 08:05:56 +02:00
LocalAI [bot]	ca51606bfe	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `40aae0b6d86d50c0ee7011b3ce59a233203e430a` (#9875 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-19 08:01:41 +02:00
LocalAI [bot]	11cff1b309	chore: ⬆️ Update ggml-org/llama.cpp to `87589042cac2c390cec8d68fb2fad64e0a2a252a` (#9855 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-18 08:01:30 +02:00
LocalAI [bot]	3cba35ed32	chore: ⬆️ Update antirez/ds4 to `c9dd9499bfa57c1bbfbb4446eff963330ab5329b` (#9864 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-17 23:19:58 +02:00
LocalAI [bot]	265ae35231	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `c35189d83c91aad780aba62b89f2830cb2916223` (#9866 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-17 23:19:43 +02:00
LocalAI [bot]	6a48157a80	chore: ⬆️ Update leejet/stable-diffusion.cpp to `bd17f53b7386fb5f60e8587b75e73c4b2fed3426` (#9854 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-16 23:12:05 +02:00
LocalAI [bot]	41c838b2df	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `3e573cfea6e0a332eff822ffbdb1dd3b112e9051` (#9856 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-16 22:44:08 +02:00
LocalAI [bot]	21e793ad2a	chore: ⬆️ Update antirez/ds4 to `ef0a4905d05263df8e63689f2dd1efac618a752c` (#9857 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-16 22:43:46 +02:00
LocalAI [bot]	d77a9137d8	feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852 ) * feat(llama-cpp): bump to MTP-merge SHA and document draft-mtp spec type Update LLAMA_VERSION to 0253fb21 (post ggml-org/llama.cpp#22673 merge, 2026-05-16) to pick up Multi-Token Prediction support. No grpc-server.cpp changes are required: the existing `spec_type` option delegates to upstream's `common_speculative_types_from_names()`, which already accepts the new `draft-mtp` name. The `n_rs_seq` cparam needed by MTP is auto-derived inside `common_context_params_to_llama` from `params.speculative.need_n_rs_seq()`, and when no `draft_model` is set the upstream server builds the MTP context off the target model itself. Docs: extend the speculative-decoding section of the model-configuration guide with the new type, both load paths (MTP head embedded in the main GGUF vs. separate `mtp-.gguf` sibling), the PR's recommended `spec_n_max:2-3`, and the chained `draft-mtp,ngram-mod` recipe. Also notes that the upstream `-hf` auto-discovery of `mtp-.gguf` siblings is not wired through LocalAI's gRPC layer. Agent guide: short note explaining that new upstream spec types are picked up automatically and that MTP needs no gRPC plumbing. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): auto-detect MTP heads and enable draft-mtp on import + load Detect upstream's `<arch>.nextn_predict_layers` GGUF metadata key (set by `convert_hf_to_gguf.py` for Qwen3.5/3.6 family models and similar) and, when present and the user has not configured a `spec_type` explicitly, auto-append the upstream-recommended speculative-decoding tuple: - spec_type:draft-mtp - spec_n_max:6 - spec_p_min:0.75 The 0.75 p_min is pinned defensively because upstream marks the current default with a "change to 0.0f" TODO; locking it here keeps acceptance thresholds stable across future llama.cpp bumps. Detection runs in two places: - The model importer (`POST /models/import-uri`, the `/import-model` UI) range-fetches the GGUF header for HuggingFace / direct-URL imports via `gguf.ParseGGUFFileRemote`, with a 30s timeout and non-fatal error handling. OCI/Ollama URIs are skipped because the artifact is not directly streamable; the load-time hook covers them once the file is on disk. - The llama-cpp load-time hook (`guessGGUFFromFile`) reads the local header on every model start and appends the same options if `spec_type` is not already set. Both paths share `ApplyMTPDefaults` and respect an explicit user-set `spec_type:` / `speculative_type:` so YAML overrides win. Ginkgo specs cover the append, preserve-user-choice, legacy alias, and nil safety paths. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(importer): resolve huggingface:// URIs before MTP header probe `gguf.ParseGGUFFileRemote` only speaks HTTP(S), but the importer was handing it the raw `huggingface://...` URI directly (and similarly for any other custom downloader scheme). Live-test against `huggingface://ggml-org/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-MTP-Q8_0.gguf` exposed this: the probe failed with `unsupported protocol scheme "huggingface"`, was caught by the non-fatal error path, and the MTP options were silently never applied to the generated YAML. Route every candidate URI through `downloader.URI.ResolveURL()` and require the resolved form to be HTTP(S). After the fix the probe successfully reads `<arch>.nextn_predict_layers=1` from the real HF GGUF and the emitted ConfigFile carries spec_type:draft-mtp, spec_n_max:6, spec_p_min:0.75 as intended. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-16 22:42:48 +02:00
LocalAI [bot]	00b8989886	chore: ⬆️ Update ggml-org/llama.cpp to `1348f67c58f561808136e8a152a9eddec168f221` (#9842 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-16 08:41:09 +02:00
LocalAI [bot]	43e0d397ca	chore: ⬆️ Update ggml-org/whisper.cpp to `968eebe77225d25e57a3f981da7c696310f0e881` (#9843 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-16 00:30:04 +02:00

1 2 3 4 5 ...

1323 Commits