LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 04:46:54 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	237bce48e8	feat(ui): forking chat - retry any answer, copy, duplicate, branch (#10645 ) (#10654 ) * feat(ui): clone a chat into a new conversation (#10645) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): retry any assistant answer, not just the last (#10645) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): copy an entire chat to the clipboard (#10645) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): branch a new chat from any assistant answer (#10645) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ui): send truncated history on mid-conversation retry (#10645) Mid-conversation retry regenerated an answer with the downstream turns still in the model's context. handleRegenerate truncated the DOM history via updateChatSettings (a scheduled state update), but the synchronous sendMessage that followed read the stale, pre-truncation history from its closure to build the outbound API payload. Thread the intended base history explicitly through sendMessage's options.baseHistory so the request body matches the truncated view. Backward compatible: the normal send path (no baseHistory) is unchanged. Also guard two minor issues in Chat.jsx: the "Branch from here" button now renders under !isStreaming to match the retry button, and the duplicate toast only fires when forkChat returns a chat (not on a null result). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-03 00:04:44 +02:00
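The stale-history fix in the commit above can be illustrated with a minimal sketch. It is written in Python rather than the UI's JavaScript, and `build_request_messages` and `retry_answer` are hypothetical names, not functions from Chat.jsx. The idea is only that the truncated history is passed explicitly instead of being read back from state that may not have updated yet.

```python
def build_request_messages(state_history, base_history=None):
    """Build the outbound message list.

    When base_history is given it takes precedence over state_history,
    which may still hold the pre-truncation conversation.
    """
    history = state_history if base_history is None else base_history
    return [dict(message) for message in history]


def retry_answer(state_history, answer_index):
    """Regenerate the assistant answer at answer_index.

    Everything from that answer onward is dropped, and the truncated
    view is threaded through explicitly so the request matches it.
    """
    truncated = state_history[:answer_index]
    return build_request_messages(state_history, base_history=truncated)


history = [
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
]
payload = retry_answer(history, 1)
```

The normal send path calls `build_request_messages` without `base_history`, so it behaves as before.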
LocalAI [bot]	a4e6e01e4d	fix(process): give backend workers a parent-death safety net (#10639 ) * fix(grpc): self-terminate backend workers when LocalAI dies non-gracefully Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI spawns) can be orphaned and linger — holding VRAM and its listen port — if the LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before its own teardown runs. Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown -> ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when LocalAI receives a catchable signal and survives long enough to run its handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1, whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or option for a caller to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (it never builds the exec.Cmd itself), so it cannot set a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing tells the backend to exit and it is reparented to init. Fix: add a best-effort, backend-side safety net at the one shared choke point every out-of-process Go backend routes through — grpc.StartServer / RunServer in pkg/grpc. On startup it captures getppid() and polls; when the process is reparented (getppid changes / becomes 1 — the standard POSIX signal the original parent died) it logs and self-terminates. getppid() reparent detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged. 
Scope/limitations: covers Go-based backends (everything using pkg/grpc). The C++ backends (e.g. llama-cpp) and Python backends do not route through pkg/grpc and are not covered by this mechanism — they would each need an equivalent parent-death check (follow-up). The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language (suggested upstream follow-up; out of scope for this LocalAI-only PR). Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild process tree, lets the middle process exit to orphan the grandchild running the real watchParentDeath, and asserts it detects the reparent and self-terminates. Unix-only (build-tagged), runs in CI (Linux). Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(process): extend parent-death backstop to C++ and Python backends The Go parent-death watcher (pkg/grpc/parentwatch.go, commit `772b435d5`) only protects backends that route through pkg/grpc. C++ and Python backends don't, so the originally-reported case — the llama.cpp gRPC worker surviving a non-graceful LocalAI death — was still uncovered. Extend the same best-effort backstop to both languages, reusing the exact mechanism and semantics: - capture getppid() at startup, skip if already orphaned (<=1) - a background thread polls getppid() and self-exits on reparenting (getppid() != orig \|\| == 1), portable across Linux/macOS, no-op on Windows - same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style durations like 500ms/2s/1m) C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++ backend) as a dependency-free header parent_watch.h, wired into grpc-server.cpp's main() and copied at build time via prepare.sh. 
C++ backends have no shared server scaffolding, so other C++ backends (ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would each need the same one-line include+call as follow-ups. Python: implemented once in the shared common/parent_watch.py and armed from common/grpc_auth.py's get_auth_interceptors() — the single helper every one of the 35 Python backends invokes while building its gRPC server — so all Python backends (and future ones) are covered with no per-backend edits and no duplicated implementation. Tests (real process-tree reparent detection, mirroring the Go test): - backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh) - backend/python/common/parent_watch_test.py (python -m unittest) Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>	2026-07-02 19:16:48 +02:00
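The polling mechanism described in the commit above can be sketched roughly as follows. This is not the code in common/parent_watch.py: the function names, the exit code, and the exact duration parsing are assumptions made for illustration. Only the environment variable names, the falsy values, the 2s default, and the reparent condition come from the commit message.

```python
import os
import re
import threading
import time

_FALSY = {"false", "0", "no", "off"}
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def watch_enabled(env):
    """Default on; only an explicit falsy value disables the watcher."""
    value = env.get("LOCALAI_BACKEND_PARENT_WATCH", "")
    return value.strip().lower() not in _FALSY


def parse_interval(text, default=2.0):
    """Parse a small subset of Go-style durations such as 500ms, 2s, 1m."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", (text or "").strip())
    if not match:
        return default
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def is_reparented(original_ppid, current_ppid):
    """True once the parent PID changed or the process belongs to init."""
    return current_ppid != original_ppid or current_ppid == 1


def start_parent_watch(env=os.environ, on_orphaned=lambda: os._exit(1)):
    """Start a daemon thread that calls on_orphaned after reparenting.

    Returns None when the watcher is disabled, on Windows, or when the
    process is already orphaned at startup.
    """
    if os.name == "nt" or not watch_enabled(env):
        return None
    original = os.getppid()
    if original <= 1:
        return None
    interval = parse_interval(env.get("LOCALAI_BACKEND_PARENT_WATCH_INTERVAL"))

    def loop():
        while not is_reparented(original, os.getppid()):
            time.sleep(interval)
        on_orphaned()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Polling getppid() trades a detection delay of up to one interval for portability across Linux and macOS, which is the tradeoff the commit message describes.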
LocalAI [bot]	6eea3ef2ac	fix(backends): make backend install ops idempotent unless forced (#10643 ) * fix(backends): make backend install ops idempotent unless forced POST /backends/apply hardcoded force=true through LocalBackendManager.InstallBackend, so applying an already-installed backend re-downloaded and re-extracted the whole artifact every time. API clients that ensure a backend exists at startup paid a full OCI image pull on every boot. Backend install ops now default to non-forced — an installed, runnable backend short-circuits (the orphaned-meta reinstall path in InstallBackendFromGallery is preserved) — and reinstall stays available: - ManagementOp gains a Force field; the local manager passes it through instead of hardcoding true. - /backends/apply accepts an optional "force" boolean in the body. - The React UI install route keeps forcing, since its button doubles as the explicit "Reinstall backend" action. Distributed installs already behaved this way (workers skip when the binary exists unless force is set); this aligns single-node behavior. Assisted-by: Claude:claude-fable-5 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backends): don't force-reinstall LOCALAI_EXTERNAL_BACKENDS on boot The startup loop for LOCALAI_EXTERNAL_BACKENDS runs InstallExternalBackend for each listed backend on every boot, and its gallery-name path hardcoded force=true — so every start re-downloaded and re-extracted each listed backend's OCI image even when it was installed and runnable. Supervising apps that list several backends paid several full OCI pulls per launch. 
Give InstallExternalBackend an explicit force parameter (it only affects the gallery-name fallback; URI installs always write) and pass: - false from the boot loop and `local-ai backends install` (idempotent ensure — `backends upgrade` is the refresh path), - op.Force from the local manager's external-URI op, - the request's force on the worker install path and true on its upgrade path (behavior unchanged). Assisted-by: Claude:claude-fable-5 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-02 19:16:29 +02:00
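A minimal sketch of the idempotent-unless-forced behavior described above, under stated assumptions: `BackendStore` and its counter are hypothetical stand-ins for the real install path, and the sketch does not model the "installed but not runnable" reinstall case that the commit preserves.

```python
class BackendStore:
    """Toy model of backend installation with an optional force flag."""

    def __init__(self):
        self.installed = set()
        self.downloads = 0  # counts simulated artifact pulls

    def install(self, name, force=False):
        """Install a backend; skip the download if it is already present.

        Returns True when a download happened, False when the call was
        a no-op because the backend was installed and force was not set.
        """
        if name in self.installed and not force:
            return False
        self.downloads += 1
        self.installed.add(name)
        return True


def apply_request(store, body):
    """Handle an apply-style request body with an optional "force" boolean."""
    return store.install(body["id"], force=bool(body.get("force", False)))


store = BackendStore()
apply_request(store, {"id": "llama-cpp"})
apply_request(store, {"id": "llama-cpp"})
apply_request(store, {"id": "llama-cpp", "force": True})
```

Defaulting `force` to false is what lets a client call the endpoint on every boot without paying for a pull each time.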
LocalAI [bot]	ad97bcbbdd	chore(model gallery): 🤖 add 1 new models via gallery agent (#10644 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 19:16:09 +02:00
walcz-de	9d8ff90941	fix(cloud-proxy): parameter compatibility with newest reasoning models (#10640 ) Newest cloud reasoning models reject two parameters the cloud-proxy backend currently sends: - Anthropic (claude-opus-4-x) and OpenAI (gpt-5.x) return 400 when temperature is present: "'temperature' is deprecated for this model". OpenAI-compatible clients typically send only the server-side DEFAULT sampling values rather than user intent, so the translators now forward neither temperature nor top_p and let the upstream apply its own defaults. - OpenAI gpt-5.x rejects max_tokens ("Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead"). The OpenAI translator now serializes the token limit as max_completion_tokens, which current chat-completions models accept. Verified live against claude-opus-4-8, gpt-5.5 and gemini-3.1-pro (Gemini OpenAI-compat endpoint). Tests updated to the new contract. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: stefanwalcz <stefan.walcz@walcz.de>	2026-07-02 19:15:43 +02:00
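The parameter translation described above can be sketched as a pure function. The name `translate_openai_request` is hypothetical and the real translators are not written in Python; the sketch only encodes the two rules stated in the commit message: drop temperature and top_p, and send the token limit as max_completion_tokens.

```python
def translate_openai_request(request):
    """Return a copy of the request adjusted for newer reasoning models."""
    dropped = {"temperature", "top_p", "max_tokens"}
    translated = {k: v for k, v in request.items() if k not in dropped}
    # Re-emit the token limit under the name the upstream accepts.
    if request.get("max_tokens") is not None:
        translated["max_completion_tokens"] = request["max_tokens"]
    return translated


incoming = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 256,
}
outgoing = translate_openai_request(incoming)
```

The input dictionary is left untouched, so the caller can still log or inspect what the client originally sent.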
LocalAI [bot]	29001a88c1	fix(distributed): don't let a dead worker pin the model-load advisory lock (#10600 ) * fix(distributed): don't let a dead worker pin the model-load advisory lock In distributed mode a chat request could fail with: failed to route model with internal loader: routing model ...: loading model ...: advisorylock: acquiring lock <id>: ERROR: canceling statement due to lock timeout (SQLSTATE 55P03) Root cause is two independent defects in the cross-replica model-load path: 1. SmartRouter.Route holds a per-model PostgreSQL advisory lock for the whole cold-load sequence, which includes installBackendOnNode -> InstallBackend, a NATS request-reply with a 15m deadline (DefaultBackendInstallTimeout) that ignored ctx. When the chosen worker died mid-install, the holder sat on the lock for up to 15m. The detached loadCtx (WithoutCancel) had no deadline, so nothing capped the hold. 2. The acquiring statement, pg_advisory_lock(), is subject to any deployment global lock_timeout. A common operator setting (e.g. 10s) aborts the wait with SQLSTATE 55P03, so every other replica's request for that model hard-errored instead of waiting for the in-progress load and reusing it. For the ~15m window the model was effectively unroutable. Fixes: - advisorylock.WithLockCtx (postgres): SET lock_timeout = 0 on its dedicated connection (RESET before it returns to the pool) so the Go context, not a deployment-wide GUC, governs how long we wait. Waiters now block and then re-check, reusing the model another replica just loaded. - SmartRouter: bound the detached loadCtx with a single ModelLoadCeiling so the lock is always released in bounded time even if a sub-step wedges. Default is the configured backend.install deadline + 10m (staging + LoadModel margin), so a legitimately slow load is never cut. 
- installBackendOnNode: use singleflight.DoChan + select on ctx.Done() so the install wait honors cancellation; the ceiling can then actually free a caller pinned behind a dead worker. The shared install still coalesces via singleflight. Reproduced both defects as failing tests first (a real 55P03 against a testcontainer with a short lock_timeout; a wedged install that blocks Route) and confirmed green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(distributed): bound advisory-lock wait instead of disabling lock_timeout Setting lock_timeout = 0 to override a deployment's short global lock_timeout meant "wait forever" server-side. Safe for SmartRouter.Route (its loadCtx now carries the model-load ceiling) but unsafe for the schema-migration callers that pass context.Background(): a holder whose session never releases would hang them indefinitely. Derive the server-side lock_timeout from the caller's context instead: its remaining budget plus a margin (so the Go context's cancellation still wins with a clean error and the server bound is only a backstop), or a finite 30m backstop when the context has no deadline. Never zero - "wait forever" is no longer possible, while a deployment's hostile short lock_timeout is still overridden so legitimate cross-replica waits don't fail with 55P03. Added a spec proving a deadline-less waiter gives up at the (shrunk) backstop rather than hanging. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-07-02 09:52:51 +02:00
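The timeout arithmetic from the two commits above can be written out as a small sketch. The function names are hypothetical and the 5-second margin is an assumption (the commit message only says "plus a margin"); the 30m backstop, the 10m load margin, and the 15m install deadline are taken from the message.

```python
BACKSTOP_SECONDS = 30 * 60      # used when the caller's context has no deadline
LOAD_MARGIN_SECONDS = 10 * 60   # staging + LoadModel margin


def server_lock_timeout(remaining_seconds, margin_seconds=5.0):
    """Derive the server-side lock_timeout from the caller's budget.

    remaining_seconds is None when the context has no deadline. The
    result is never zero, so "wait forever" cannot be requested.
    """
    if remaining_seconds is None:
        return float(BACKSTOP_SECONDS)
    # The margin lets the client-side cancellation fire first.
    return max(remaining_seconds, 0.0) + margin_seconds


def model_load_ceiling(install_deadline_seconds):
    """Upper bound on how long a cold load may hold the advisory lock."""
    return install_deadline_seconds + LOAD_MARGIN_SECONDS
```

With the 15m install deadline mentioned above, `model_load_ceiling(15 * 60)` gives 1500 seconds, that is 25 minutes.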
LocalAI [bot]	b0bfa0852e	chore: ⬆️ Update CrispStrobe/CrispASR to `fcbc8718e654995e3bd2d0c98bcb8e55e297d23c` (#10634 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:48:20 +02:00
LocalAI [bot]	39a93e91cf	chore: ⬆️ Update vllm-metal (darwin) to `v0.3.0.dev20260701132215` (#10633 ) ⬆️ Update vllm-project/vllm-metal (darwin) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:48:08 +02:00
LocalAI [bot]	26e0c98967	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3590aa8d626e671a1b1dc84506ea2932a243a480` (#10631 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:54 +02:00
LocalAI [bot]	9acca54b25	chore: ⬆️ Update mudler/parakeet.cpp to `e8acc6172a94e20a952cf1843decace5d771a94b` (#10629 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:41 +02:00
LocalAI [bot]	2728e6000e	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `068b173649f2fd8dc96b35ada5a0b76d8985105d` (#10632 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:28 +02:00
LocalAI [bot]	006310d746	chore: ⬆️ Update ggml-org/llama.cpp to `4fc4ec5541b243957ae5099edb67372f8f3b550e` (#10630 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:15 +02:00
LocalAI [bot]	05acdb1778	chore: ⬆️ Update ggml-org/whisper.cpp to `6fc7c33b4c3a2cec83e4b65abd5e96a890480375` (#10635 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:01 +02:00
LocalAI [bot]	5e68b5700c	chore(model-gallery): ⬆️ update checksum (#10637 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:26:32 +02:00
pos-ei-don	7910018249	fix(vllm): non-streaming tool-call regression after #10351 (#10638 ) fix(vllm): non-streaming tool-call regression after #10351 (native_streaming is a capability flag, not a state flag) #10351 introduced native streaming via `parser.extract_tool_calls_streaming` and gated the post-loop `extract_tool_calls` block on `native_streaming and not native_streaming_error`. That works for streaming requests, but for non-streaming requests the same flag is still True (it only means "the parser can stream", not "we actually streamed"), so the block was skipped and the `elif` cleared `content = ""` — the tool call was silently lost. Symptom: non-streaming chat.completions with `tools=[...]` returns `finish_reason: "stop"` with `content: ""` and no `tool_calls`. Streaming requests are unaffected. Fix: gate both branches on `streaming` too, so the extract_tool_calls block runs for non-streaming requests (and for streaming requests that fell back to the buffered path). Reproduction (vLLM 0.24, Qwen3-Coder-Next-NVFP4, qwen3_coder parser): curl -s -X POST http://localhost:8080/v1/chat/completions \ -H 'Content-Type: application/json' \ -d '{"model":"coder","stream":false, "messages":[{"role":"user","content":"7*8 via calc"}], "tools":[{"type":"function","function":{"name":"calc", "parameters":{"type":"object", "properties":{"expression":{"type":"string"}}}}}]}' Before: finish_reason: "stop", content: "", tool_calls: [] After: finish_reason: "tool_calls", tool_calls[0].function.name: "calc" Streaming path re-verified in the same setup: delta.tool_calls arrives token-by-token, finish_reason: "tool_calls", no raw XML in content. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-07-02 09:26:14 +02:00
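The gating change described above reduces to one boolean decision, sketched here in simplified form. The function name is hypothetical and the real backend code has more branches; the sketch only shows why the capability flag alone is not enough.

```python
def needs_post_loop_extraction(streaming, native_streaming, native_streaming_error):
    """Decide whether the buffered extract_tool_calls pass must run.

    native_streaming only says the parser is able to stream. Tool calls
    were already emitted incrementally only if the request was actually
    streaming and native streaming did not fail.
    """
    streamed_natively = streaming and native_streaming and not native_streaming_error
    return not streamed_natively


# Non-streaming request with a streaming-capable parser: the case that regressed.
non_streaming = needs_post_loop_extraction(False, True, False)
# Streaming request handled natively: nothing left to extract afterwards.
native_ok = needs_post_loop_extraction(True, True, False)
# Streaming request that fell back to the buffered path.
fell_back = needs_post_loop_extraction(True, True, True)
```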
LocalAI [bot]	1a03712a6f	fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table (#10627 ) * fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table ROCm's bundled libdrm_amdgpu looks up the GPU ASIC ID table at a hardcoded fallback path, /opt/amdgpu/share/libdrm/amdgpu.ids, which is only populated by AMD's full amdgpu-install (graphics/DKMS) stack. The hipblas image is compute-only and doesn't have it, so every model load logs "No such file or directory" and the GPU can't be identified. Symlink it to the equivalent file already shipped by Ubuntu's libdrm-amdgpu1 package. Fixes #10624 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(hipblas): correct amdgpu.ids source package name in comment Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids is owned by libdrm-common, not libdrm-amdgpu1 as the comment said. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-02 09:25:14 +02:00
LocalAI [bot]	703ea32de6	chore: ⬆️ Update vllm-metal (darwin) to `v0.3.0.dev20260630095652` (#10616 ) ⬆️ Update vllm-project/vllm-metal (darwin) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:59 +02:00
LocalAI [bot]	751db06e35	chore: ⬆️ Update CrispStrobe/CrispASR to `8fd9db8fec8cb5e929d23d3267ed5817794feb1a` (#10615 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:41 +02:00
LocalAI [bot]	f46c0e9c83	docs: ⬆️ update docs version mudler/LocalAI (#10614 ) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:21 +02:00
LocalAI [bot]	0d8adfc59a	chore: ⬆️ Update ggml-org/llama.cpp to `0eca4d490e591d4e93058d07540cf47278a72577` (#10617 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 09:31:50 +02:00
LocalAI [bot]	43f2615e19	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.24.0` (#10618 ) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:53:03 +02:00
LocalAI [bot]	875c539ad5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `29431b31c89e79c10f8736e8f2742485ba1713d6` (#10620 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:52:36 +02:00
LocalAI [bot]	d641ded194	chore: ⬆️ Update ggml-org/whisper.cpp to `0874de3e8e8e48361dba85c7fe6d176f008bf158` (#10621 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:43:40 +02:00
LocalAI [bot]	40445fff05	chore: ⬆️ Update leejet/stable-diffusion.cpp to `484baa41e5e006c52dcd4addc38c830b9489745f` (#10619 ) * ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(stablediffusion-ggml): adapt to new generate_image() out-param signature leejet/stable-diffusion.cpp@484baa4 changed generate_image() from returning sd_image_t* to returning bool with images_out/num_images_out out-parameters (same pattern already used by generate_video()). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 08:32:57 +02:00
Tai An	057dee956a	fix(launcher): keep data/config under ~/.localai (#10610 ) (#10613 ) The launcher starts the server with run --models-path/--backends-path but leaves --data-path and the dynamic config dir unset, so the server falls back to its /data and /configuration defaults. The base for those defaults is kong.ExpandPath("."), i.e. the launcher process CWD (commonly the user's home root), producing ~/data and ~/configuration outside ~/.localai and an agent-pool stateDir under ~/data. Pass --data-path and --localai-config-dir explicitly, rooted at the launcher's own data directory (GetDataPath() -> ~/.localai), so data and config stay consistent with --models-path/--backends-path.	2026-06-30 22:14:59 +02:00
Adira	4ec39bb776	fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602 ) (#10607 ) * fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) When the watchdog evicts a model, deleteProcess calls the backend's gRPC Free() to release VRAM before stopping the process. Free is optional: backends that don't override it -- the generated UnimplementedBackendServer stub, many Python/external backends, or a federation proxy in distributed mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is reclaimed when the local process is stopped, or by the remote unloader for remote backends. Logging it as "WARN Error freeing GPU resources" made a benign, optional RPC look like a fault (the alarming line in #10602, seen in distributed mode where the model is remote and Free hits a stub). Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine failures still Warn. Free() is still attempted for every backend, so any backend that does implement it is unaffected. Add a reusable grpcerrors.IsUnimplemented helper following the package's existing code-based detection idiom (prefer the typed status code, fall back to the message across non-gRPC boundaries), with table tests. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> * fix(watchdog): log a non-Unimplemented Free() failure at error level Per review: now that the expected gRPC Unimplemented case is split out and logged at Debug, any remaining Free() error is a genuine failure to release VRAM, so surface it at error level instead of warn. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> --------- Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-30 22:14:01 +02:00
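A rough sketch of the log-level decision described above. Status codes are modeled as plain strings, the function names are hypothetical, and the substring check used for the message fallback is an assumption; the actual grpcerrors.IsUnimplemented helper is Go code and may match differently.

```python
def is_unimplemented(code, message=""):
    """Prefer the typed status code; fall back to the error message."""
    if code is not None:
        return code == "UNIMPLEMENTED"
    return "unimplemented" in message.lower()


def free_log_level(failed, code=None, message=""):
    """Pick the log level for the outcome of an optional Free() call.

    Returns None when the call succeeded, "debug" for the expected
    Unimplemented case, and "error" for a genuine failure.
    """
    if not failed:
        return None
    if is_unimplemented(code, message):
        return "debug"
    return "error"
```

Free() is still attempted for every backend in this scheme; only the way the result is reported changes.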
Ettore Di Giacinto	25ecb9f015	fix(gallery): use Q8_0 for lfm2.5-8b-a1b to fix poor tool-call quality The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B. Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF x-linked-etag) while keeping the native jinja tool-parsing config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-30 17:46:20 +00:00
LocalAI [bot]	2be495f9c0	fix(kokoros): implement AudioTranscriptionLive trait stub (#10612 ) The backend.proto AudioTranscriptionLive bidirectional streaming RPC added new required trait items (AudioTranscriptionLiveStream + audio_transcription_live) on the generated Backend trait. The kokoros (TTS) backend did not implement them, breaking its release build with E0046 (missing trait items). kokoros is text-to-speech and has no live-ASR support, so stub the method to return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 19:38:41 +02:00
LocalAI [bot]	02b007a31e	feat(config): default swa_full:true for sliding-window-attention models (#10611 ) LocalAI enables a cross-request prompt-prefix cache (cache_reuse, see core/config/serving_defaults.go) so repeated prefixes — system prompts, RAG context, agent scaffolds, multi-turn chat — are not reprocessed every turn. For sliding-window-attention (SWA) models (Gemma 2/3, Cohere2, Llama 4, ...) this silently does nothing: llama.cpp defaults to a reduced SWA KV cache sized to the sliding window, and that reduced cache cannot preserve a prompt prefix across requests, so every turn reprocesses the whole prompt anyway. llama.cpp's --swa-full (params.swa_full, already wired through the LocalAI llama.cpp backend's `swa_full` option) keeps the full KV cache so the shared prefix is reused. Enable it automatically, but only for models that are actually SWA: detection reads the gguf-parser-normalized `<arch>.attention.sliding_window` metadata (which also applies llama.cpp's family rules, e.g. Phi-3 → not SWA), right where the GGUF is already parsed for defaults. It is never applied to dense models (pure memory waste) and never overrides an explicit user `swa_full`/`n_swa` choice. Tradeoff: the full SWA cache scales with context_size, so it costs more memory at large contexts — hence the SWA gating and the documented `swa_full:false` opt-out. Assisted-by: Claude:claude-opus-4-8 [Claude Code] golangci-lint Co-authored-by: Ettore Di Giacinto <mudler@localai.io> v4.5.6	2026-06-30 17:58:17 +02:00
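The defaulting rule described above can be sketched as follows, assuming the GGUF metadata is available as a flat dictionary and the user's options as another. `default_swa_full` is a hypothetical name, and treating a missing or zero window as "not SWA" is an assumption of the sketch.

```python
def default_swa_full(metadata, arch, user_options):
    """Return True to enable swa_full by default, or None to leave it unset.

    The default applies only to sliding-window-attention models and
    never overrides an explicit swa_full or n_swa choice.
    """
    if "swa_full" in user_options or "n_swa" in user_options:
        return None
    window = metadata.get(f"{arch}.attention.sliding_window", 0)
    if window and window > 0:
        return True
    return None


swa_meta = {"gemma3.attention.sliding_window": 1024}
dense_meta = {"llama.context_length": 8192}
```

Returning None rather than False for dense models keeps the option out of their config entirely, which matches the "never applied to dense models" wording.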
LocalAI [bot]	fd8cebd0b3	fix(watchdog): persist UI-saved Check Interval across restarts (#10601 ) (#10605 ) fix(watchdog): persist a UI-saved Check Interval across restarts (#10601) The watchdog Check Interval saved via /api/settings reverted to 500ms on every restart, while the idle/busy timeouts persisted correctly. Root cause: NewApplicationConfig baseline-defaulted WatchDogInterval to 500ms, whereas the idle/busy timeouts default to 0. The startup loader (loadRuntimeSettingsFromFile) applies a persisted runtime_settings.json value only when the field is still at its zero default - its heuristic for "this wasn't set by an env var". Because the interval was always 500ms at that point, the loader never read the persisted value back, so the saved interval was silently discarded on each boot. Fix: drop the non-zero baseline default so the interval behaves like the sibling timeouts (0 = unset). The effective 500ms default is now supplied at the watchdog layer: WithWatchdogInterval ignores a non-positive value so DefaultWatchDogOptions' 500ms is preserved (and a 0 interval can never turn the watchdog loop into a busy spin). Also mirror the interval in the live config file watcher alongside idle/busy, and report the real 500ms default (not the stale "2s") from ToRuntimeSettings. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 17:48:14 +02:00
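The root cause and fix described above come down to a zero-value heuristic, sketched here with hypothetical function names and intervals expressed in milliseconds.

```python
DEFAULT_INTERVAL_MS = 500


def load_persisted(current_ms, persisted_ms):
    """Startup loader heuristic: apply the saved value only when the
    field is still at its zero default (meaning nothing else set it)."""
    if current_ms == 0 and persisted_ms:
        return persisted_ms
    return current_ms


def effective_interval(configured_ms):
    """Watchdog layer: a non-positive value falls back to the default,
    so a zero interval can never become a busy loop."""
    return configured_ms if configured_ms > 0 else DEFAULT_INTERVAL_MS


# Before the fix the baseline was 500, so the saved value was ignored.
before = effective_interval(load_persisted(500, 2000))
# After the fix the baseline is 0, so the saved value is applied.
after = effective_interval(load_persisted(0, 2000))
# With nothing saved, the 500ms default still takes effect.
unset = effective_interval(load_persisted(0, None))
```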
LocalAI [bot]	dd625921ff	fix(macos): staple the notarization ticket to the .app, not just the dmg (#10606 ) Stapling only the dmg leaves the LocalAI.app bundle with no embedded notarization ticket. Gatekeeper then falls back to an online notarization check on first launch, so the app fails to open on a Mac that is offline or behind a firewall, or once it has been copied out of the dmg — while it keeps working on the (online) build host, which masks the problem. Notarize and staple the .app before packaging it into the dmg so the bundle verifies offline. Adds a `notarize-app` subcommand to contrib/macos/sign-and-notarize.sh (zips the bundle for notarytool, then staples + validates) and invokes it from dmg-launcher-darwin. Stays a no-op when notary secrets are unset, so unsigned local/fork builds are unaffected. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 17:38:47 +02:00
LocalAI [bot]	d74f88357e	fix(tests): align openresponses test model name with GGUF-derived naming (#10589 ) (#10609 ) PR #10589 changed repo-root HuggingFace URI imports to name the model after the selected GGUF file rather than the repository. The Open Responses API integration test still requested the old repo-derived name ("Qwen3-VL-2B-Instruct-GGUF"), so every request 404'd on an unknown model and the suite has failed on master since `1a4f68ed4`. Update testModel to the name the importer now registers for the default q4_k_m quant ("Qwen3-VL-2B-Instruct-Q4_K_M") so the specs resolve the model again. The #10589 behaviour change is intentional; only the stale test needed updating. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 15:41:44 +02:00
Adira	dfaec3bd51	fix(import): strip file:// scheme from model path for local imports (#10599 ) Importing a model from a local directory (e.g. a HuggingFace checkout or an LM Studio store) via a file:// URI produced a config whose model field kept the scheme verbatim, e.g. model: file:///Users/u/.../Qwen3-4bit. The mlx and vllm backends treat that field as a HuggingFace repo id or local path and reject the file:// form with "Repo id must be in the form 'repo_name' or 'namespace/repo_name'", so the model imported fine but failed to load (issue #7461). Add a shared LocalModelPath helper that reduces a file:// URI to the bare filesystem path it points at and leaves HuggingFace/HTTP URIs untouched, and route the mlx, vllm, transformers and diffusers importers (all of which pass details.URI straight into the model field for from_pretrained-style loading) through it. Cover the helper directly plus end-to-end file:// import specs for the mlx and vllm importers. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-30 10:21:08 +02:00
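A minimal sketch of the path normalization described above, using Python's standard urllib.parse. `local_model_path` is a hypothetical counterpart to the Go LocalModelPath helper, and the example paths are made up. It ignores any host component in the URI, which is an assumption of the sketch.

```python
from urllib.parse import unquote, urlparse


def local_model_path(uri):
    """Reduce a file:// URI to a bare filesystem path.

    Anything that is not a file:// URI (HuggingFace repo ids, HTTP
    URLs, plain paths) is returned unchanged.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return uri
    # Percent-encoded characters such as %20 are decoded.
    return unquote(parsed.path)


local = local_model_path("file:///Users/u/models/Qwen3-4bit")
spaced = local_model_path("file:///tmp/my%20models/demo")
repo = local_model_path("Qwen/Qwen3-4B")
remote = local_model_path("https://huggingface.co/Qwen/Qwen3-4B")
```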
LocalAI [bot]	0e381897b5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `f74a6fb87b315b2c3154166e075360e15021a61d` (#10598) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:17:48 +02:00
LocalAI [bot]	b1af37257d	chore: ⬆️ Update CrispStrobe/CrispASR to `3b93758f9725d400eca82976f895e4cec3f31260` (#10597) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:17:11 +02:00
LocalAI [bot]	ebefa6dcca	chore: ⬆️ Update localai-org/privacy-filter.cpp to `595f59630c69d361b5196f2aba2c71c873d0c13c` (#10596) ⬆️ Update localai-org/privacy-filter.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:16:52 +02:00
LocalAI [bot]	605348925d	chore: ⬆️ Update ggml-org/llama.cpp to `6f4f53f2b7da54fcdbbecaaa734337c337ad6176` (#10595) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:16:37 +02:00
LocalAI [bot]	686ce10b54	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3b6c9ca97cfcda8e68e719e6670d06379fcbe943` (#10594) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-30 09:16:21 +02:00
pos-ei-don	2cee318fad	fix(functions): avoid quadratic-time debug logging in CleanupLLMResult / ParseFunctionCall (#10592) The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go) calls CleanupLLMResult / ParseFunctionCall once per delta chunk with the full accumulated LLM result so far. Both functions xlog.Debug the entire argument on entry and exit, so a single N-chunk stream emits roughly chunk_size * N^2 bytes of debug output. Under LOG_LEVEL=debug this was observed in a recent SGLang-via-LocalAI session on a DGX Spark host (about 50K tokens, long streaming generation) to drive container logs to ~96 GiB, which interacted with the streaming hot loop on the same filesystem and contributed to a host-wide hard hang once disk pressure built up. Workaround was setting LOG_LEVEL=info, but the quadratic shape remains a foot-gun for anyone intentionally enabling debug. Replace the four result-content debug arguments with len(...) plus a fixed-size head (200 bytes via a new truncForLog helper), bounding per-call output to a constant. The debug signal stays useful: the first 200 chars are enough to identify which generation is in flight, and the length lets you observe growth without paying for the payload itself. No API change. No behaviour change for LOG_LEVEL != debug. Signed-off-by: Poseidon <philipp.wacker@ibf-solutions.com> Co-authored-by: Poseidon <philipp.wacker@ibf-solutions.com>	2026-06-30 09:16:03 +02:00
Adira	1a4f68ed4a	fix(import): derive model name from selected GGUF for repo-root URIs (#10589) When importing a HuggingFace GGUF model from a repository-root URI (no file component, e.g. hf://owner/repo) with the Model Name field left blank, the importer named the model after the repository (filepath.Base(details.URI)) instead of the GGUF file it actually selected from the repo listing (issue #10587). Track whether the user supplied an explicit name; the URI base is now only a fallback. In the HuggingFace branch, once the model group is picked, re-derive the name from the selected GGUF via a new modelNameFromShardGroup helper that uses ShardGroup.Base minus the .gguf extension. For sharded models this yields a clean logical name (e.g. Qwen3-30B-A3B-Q4_K_M) rather than a shard filename like ...-00001-of-00002. An explicit name preference still always wins, and the .gguf/URL/OCI paths are unchanged. Add network-free unit specs covering name-from-GGUF, clean-name-from-shard-base, and explicit-name precedence, and update the live integration specs that had encoded the previous repo-name behaviour. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-30 09:03:27 +02:00
Adira	28d7397743	fix(openai): stop max_tokens streaming retry loop on reasoning models (#9716) (#10448) When a thinking model spends its entire max_tokens budget on the reasoning block, the C++ autoparser clears the raw Response and delivers reasoning-only ChatDeltas (no content, no tool calls). ComputeChoices' empty-response retry then fires and regenerates from scratch up to maxRetries times, each re-consuming the whole budget, instead of terminating with finish_reason "length" (issue #9716). Add a reachedTokenBudget helper and suppress both the built-in and caller-driven retries when the completion count has reached the configured max_tokens ceiling. Report finish_reason "length" instead of "stop" in the streaming and non-streaming chat paths when the budget was exhausted. Adds a deterministic regression test that counts backend invocations (previously 6, now 1) plus boundary tests for the helper. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Dennisadira <dennisadira@gmail.com>	2026-06-30 09:01:53 +02:00
Richard Palethorpe	5d0c43ec6e	feat(realtime): Semantic VAD EOU token (#10444) * feat(realtime): EOU-driven semantic_vad turn detection Add a `semantic_vad` turn-detection mode to the realtime API that feeds the transcription model live and decides "the user finished speaking" from the `<EOU>` end-of-utterance token rather than from silence alone. When EOU fires the turn commits immediately (~0.3s); otherwise it falls back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s). Plumbing, bottom to top: - proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof, mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus `TranscriptResult.eou` for the unary retranscribe gate. - pkg/grpc: client/server/base/embed scaffolding for the bidi stream, modeled on AudioTransformStream; release stream conns on terminal Recv. - parakeet-cpp: live transcription RPC with per-C-call engine locking (one live stream per turn, finalize+free at commit); bump parakeet.cpp to ABI v5: incremental StreamingMel (no more quadratic per-feed mel recompute that delayed EOU on long turns) and the <EOU>/<EOB> split; strip the literal <EOU>/<EOB> from offline text and set Eou. - core/backend: LiveTranscriptionSession wrapper + pipeline `turn_detection:` config block (type/eagerness/retranscribe). - realtime: semantic_vad integration: live input captions streamed as transcription deltas while the user speaks, EOU-immediate commit with eagerness fallback, optional retranscribe gate (batch re-decode must also end in <EOU> to confirm), clause synthesis off the LLM token callback, and per-turn live-transcription / model_load telemetry. - UI: show the realtime pipeline components as a vertical list. Docs and tests included; opt-in via the pipeline YAML or per-session `session.update`. Non-streaming STT backends degrade to silence-only.
Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash] Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): explicit formally-verified state machines + parakeet streaming driver The realtime API had several implicit state machines whose state was inferred from scattered booleans, channels, and five separate mutexes, leaving illegal/inconsistent states reachable. Make them explicit and keep the implementation in step with a formal design; rework the parakeet streaming backend along the same lines. Realtime state machines (M1-M5). Each is a sealed sum-type State/Event/Effect with a total, pure Next(state,event)->(state,[]effect) behind a single-writer Coordinator: M1 conncoord connection lifecycle: VAD toggle + once-only teardown (replaces vadServerStarted + a `done` channel closed from two sites). M2 turncoord turn detection: collapses speechStarted and the live-stream "turn open" flag into one state, so discardTurn can no longer desync them and suppress the next onset. M3 respcoord response coordination: serializes the dual-writer start/cancel so at most one response is live; one response.done per response.create. M4 compactcoord conversation compaction: single-flight (replaces the `compacting atomic.Bool` CAS). M5 ttscoord TTS pipeline: open->closing->closed, idempotent wait(), rejects enqueue-after-close (was a silent drop). The Coordinator/Sink/Next plumbing — only the sealed types and Next differed per machine — is extracted once into core/http/endpoints/openai/coordinator as a generic Coordinator[S,E,F]; each machine keeps its public API via type aliases, so no sink, call-site, or test moved. Hierarchy. session_lifecycle.fizz models M1 as the parent region with its children (M2/M3/M4) as one statechart and asserts ChildrenDieWithParent (conn torn => all children terminal, none start after teardown). 
respcoord and compactcoord gain an absorbing Terminated state + Shutdown event; conncoord's teardown drives the children terminal. This closes a compaction teardown gap: a fire-and-forget compaction could outlive a torn session — compactionSink now takes a session-scoped cancellable context + WaitGroup and joins the in-flight summarize+evict on shutdown. Formal verification. formal-verification/ holds one authoritative FizzBee spec per machine plus the composition spec, each with an always-assertion and a documented one-line edit that makes the checker fail (verified non-vacuous). scripts/realtime-conformance.sh is fail-closed: all Go conformance suites under -race AND a model-check of every .fizz spec; a missing FizzBee is a hard error (only the loud REALTIME_CONFORMANCE_SKIP_FIZZBEE=1 bypasses it, never in CI). FizzBee is pinned by sha256 and installed via scripts/install-fizzbee.sh into .tools/ (gitignored). Wired as make test-realtime-conformance, a CI workflow, and a pre-commit path filter. Go conformance tests are Ginkgo/Gomega (per the repo's forbidigo lint): transition tables + fixed-seed property walks + concurrent/-race specs, no rapid dependency. Design map: docs/design/realtime-state-machines.md. Parakeet streaming backend. The same treatment applied to the parakeet-cpp streaming paths: - AudioTranscriptionStream returns codes.Unimplemented for non-streaming models instead of decoding offline and emitting it as one delta + final. A client that asked for streaming learns the model cannot stream rather than receiving a batch result shaped like a stream. New grpcerrors.StreamTranscriptionUnsupported carries that signal; the HTTP /v1/audio/transcriptions stream path surfaces it as an SSE error event. Mirrors AudioTranscriptionLive, which already did this. - utteranceBoundary (boundary.go): a single definition of the end-of-utterance latch, replacing three open-coded finalEou toggles. Modelled as a two-valued type so illegal states are unrepresentable. 
- Shared decode driver (driver.go): streamFeedResult (one per-feed event) + feedChunk (hides the ABI v4 JSON vs text-only split) + feedSlices + flushTail. The feed loop is written once. - AudioTranscriptionLive becomes a bidi adapter: it streams the per-feed {delta,eou,eob,words} the realtime turn detector consumes and a terminal FinalResult carrying only Text. Segments/duration/eou are offline-only and no longer produced (nor read) on the live path; liveTraceState drops the terminal eou and keeps the per-feed eou_events count. - AudioTranscriptionStream + streamJSON merge into one driver-based function; streamSegmenter is generalized to the unified event with a text-only fallback that preserves the legacy (no-words) library's per-utterance segmentation. Verified: build/vet/gofumpt clean, golangci-lint 0 issues, all coordinator and parakeet packages under -race, the fail-closed conformance gate green, and make test-realtime (12 e2e WS+WebRTC). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-30 09:01:22 +02:00
pos-ei-don	6ab29ec8b9	fix(sglang): parse tool_call function arguments before applying the chat template (#10558) OpenAI wire format carries `function.arguments` as a JSON-encoded string, but chat templates (e.g. Qwen3-Coder) iterate over it as a mapping. The vllm backend already parses arguments before applying the chat template (PR #10256); this mirrors that fix in the sglang backend. Without this fix the second turn of any tool-using session (assistant returns tool_calls, user posts `role:"tool"` result, model is invoked with arguments still as a string) crashes inside transformers' Jinja chat-template rendering with: TypeError: Can only get item pairs from a mapping. File ".../transformers/utils/chat_template_utils.py", in render_jinja_template File ".../jinja2/filters.py", in do_items raise TypeError("Can only get item pairs from a mapping.") Reproduced on `lmsysorg/sglang:v0.5.14` via LocalAI v4.5.4 with `saricles/Qwen3-Coder-Next-NVFP4-GB10` (W4A4 NVFP4 / compressed-tensors) on NVIDIA DGX Spark (GB10, sm_121). After the patch, a tool-call roundtrip (assistant tool_calls -> tool result -> assistant final answer) returns http=200 with the expected follow-up content; no behaviour change on requests that don't carry tool_calls. Signed-off-by: Poseidon <philipp.wacker@ibf-solutions.com> Co-authored-by: Poseidon <philipp.wacker@ibf-solutions.com>	2026-06-30 09:00:51 +02:00
dependabot[bot]	036f950b1b	chore(deps): bump actions/cache from 4 to 6 (#10593) Bumps [actions/cache](https://github.com/actions/cache) from 4 to 6. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/v4...v6) --- updated-dependencies: - dependency-name: actions/cache dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-29 22:31:10 +02:00
LocalAI [bot]	5b7b914b4f	chore(recon): re-pin voice/face-detect to squashed release commits (+ graph-cache fix) (#10591) The voice-detect.cpp and face-detect.cpp engine repos were squashed to a single release commit, which orphaned the previous pins (voice 3d51077, face 06914b0). Re-pin to the new single-commit SHAs (voice 1db1759, face e22260d). These also fold in a real correctness fix: the persistent graph-cache fingerprint now includes op_params, so two structurally identical GGML_OP_CUSTOM graphs (a blocked 3x3 vs a blocked 1x1 strided conv) can no longer false-hit the cache and replay the wrong kernel. voice CI was failing test_blocked/conv1x1_s2 with an out-of-bounds write on the GGML_NATIVE=OFF build; both engine repos are now green and WeSpeaker embed parity is 1.0 vs golden. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-29 18:48:47 +02:00
LocalAI [bot]	d1cee4c52a	chore: ⬆️ Update vllm-metal (darwin) to `v0.3.0.dev20260628073537` (#10562) ⬆️ Update vllm-project/vllm-metal (darwin) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 09:13:22 +02:00
LocalAI [bot]	baaa0fe94f	chore: ⬆️ Update mudler/face-detect.cpp to `06914b077d52f90d5421299138e7be6bdd06b5e8` (#10580) ⬆️ Update mudler/face-detect.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:04:22 +02:00
LocalAI [bot]	c3b5c7c3fa	chore: ⬆️ Update mudler/voice-detect.cpp to `3d510772357538c5182808ac7de2278b84824e24` (#10581) ⬆️ Update mudler/voice-detect.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:03:43 +02:00
LocalAI [bot]	bd1ec8f2c2	chore: ⬆️ Update ggml-org/llama.cpp to `dbdaece23de9ac63f2e7ca9e6bfcdc4fc156a3fa` (#10582) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:03:20 +02:00
LocalAI [bot]	135debf9af	chore: ⬆️ Update CrispStrobe/CrispASR to `6b50f76e59700665358a1aabf5295597fa318e06` (#10583) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-29 08:03:06 +02:00

6927 Commits