LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-26 01:16:58 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	00288b21cc	docs(backends): make OS coverage explicit + require darwin for new backends The backend matrix is the source of truth for which OS a backend ships on, but that was never written down, so backends were landing Linux-only by default even when the engine builds fine on macOS. - .github/backend-matrix.yml: header block documenting the two matrices (include = Linux, includeDarwin = macOS/Apple Silicon) and the policy that new backends target every OS they can build for. - .agents/adding-backends.md: a 'Cover every OS' subsection in step 2 (full darwin wiring: includeDarwin entry, index.yaml metal: + metal-<backend> entries, run.sh DYLD branch + inferBackendPathDarwin case for C++ backends, the hw_grpc_proto protobuf/grpc link gotcha, and the path-filter touch) plus a verification-checklist item. - AGENTS.md (CLAUDE.md): Quick Reference pointer so it surfaces every session. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code]	2026-06-25 21:04:25 +00:00
LocalAI [bot]	286c508ce0	feat(backends): darwin build for the localvqe backend (acoustic echo cancellation) (#10512 ) feat(backends): darwin build for the localvqe backend LocalVQE (acoustic echo cancellation / noise suppression / dereverberation) already builds on Darwin - its Makefile takes the OS=Darwin branch with GGML_METAL=OFF (upstream is CPU + Vulkan only), producing a native arm64 CPU image. It was just never wired into CI. - .github/backend-matrix.yml: add localvqe to includeDarwin (build-type metal, lang go) - the darwin/arm64 build profile; the backend itself stays CPU. - backend/index.yaml: metal: capability + concrete metal-localvqe(-development) entries pointing at the -metal-darwin-arm64-localvqe images. - backend/go/localvqe/Makefile: note on the existing Darwin branch (also the per-backend change the CI path filter needs to build it here). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 22:54:36 +02:00
LocalAI [bot]	d1a9d59917	feat(backends): darwin/Metal builds for vision C++/ggml backends (depth-anything, locate-anything, rfdetr-cpp, sam3-cpp) (#10511 ) feat(backends): darwin/Metal builds for the vision C++/ggml backends depth-anything-cpp, locate-anything-cpp, rfdetr-cpp and sam3-cpp already carry a Darwin/Metal path in their Makefiles (GGML_METAL=ON when build-type=metal), but were never wired into CI, so no Metal image was published and Apple Silicon could not install them. - .github/backend-matrix.yml: add the four to includeDarwin (build-type metal, lang go), matching the other go+ggml -cpp Metal entries. - backend/index.yaml: add metal: to each backend's capabilities map (main and -development) plus concrete metal-<backend>(-development) entries pointing at the latest/master -metal-darwin-arm64-<backend> images. - backend/go//Makefile: a one-line note on the existing Darwin branch (also the per-backend change the CI path filter needs to actually build them here). Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 22:07:56 +02:00
LocalAI [bot]	f72046b5b5	fix(auth): make advisory locks dialect-aware and harden SQLite DSN (#10509 ) * fix(auth): make advisory locks dialect-aware and harden SQLite DSN Fixes #10506. Two failures hit deployments that use the default SQLite auth database: 1. advisorylock executed PostgreSQL-only SQL (pg_advisory_lock / pg_try_advisory_lock) unconditionally. On a SQLite auth DB the job store, agent store and node registry migrations failed with "no such function: pg_advisory_lock". WithLockCtx/TryWithLockCtx now branch on the gorm dialect: PostgreSQL keeps the cross-process advisory lock, every other dialect uses a context-aware, per-key in-process lock (a SQLite auth DB is effectively single-process, so serializing within the process is sufficient). 2. The SQLite auth DSN set no busy timeout, so transient SQLITE_BUSY over network-backed storage (SMB/CIFS/NFS, e.g. Azure Files) failed the auth migration immediately with "database is locked". The DSN now sets _busy_timeout=5000 and _txlock=immediate (caller-supplied values are preserved). WAL is intentionally not enabled since its shared-memory mmap does not work over network filesystems. Docs note that PostgreSQL should be used when the data directory lives on shared storage. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(jobs): regression test for #10506 SQLite job store migration Exercises the exact caller chain that failed in the issue: auth.InitDB(sqlite) -> jobs.NewJobStore -> advisorylock.WithLockCtx -> AutoMigrate. Before the dialect-aware advisory lock fix this failed with "no such function: pg_advisory_lock"; the test now asserts it migrates cleanly on a SQLite auth DB. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 17:18:55 +02:00
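The DSN hardening described in f72046b5b5 can be illustrated with a minimal sketch. It is written in Python for brevity rather than the project's Go, and `harden_sqlite_dsn` is a hypothetical name. Only the two parameters, their values, and the rule that caller-supplied values are preserved come from the commit message; the `path?key=value` DSN shape is an assumption.

```python
from urllib.parse import parse_qsl, urlencode

# Defaults named in the commit message; applied only when the caller has not set them.
DSN_DEFAULTS = {"_busy_timeout": "5000", "_txlock": "immediate"}


def harden_sqlite_dsn(dsn: str) -> str:
    """Fill in busy-timeout and txlock defaults (hypothetical helper, assumed DSN shape)."""
    path, _, query = dsn.partition("?")
    params = dict(parse_qsl(query, keep_blank_values=True))
    for key, value in DSN_DEFAULTS.items():
        params.setdefault(key, value)  # a caller-supplied value wins over the default
    return path + "?" + urlencode(params)
```

Calling `harden_sqlite_dsn("auth.db?_busy_timeout=100")` keeps the caller's timeout of 100 and only adds `_txlock=immediate`.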
LocalAI [bot]	79783120dd	fix(config): gate parallel-slot default on per-device VRAM too (#10485 ) (#10507 ) The first #10485 fix (#10494) made the Blackwell physical-batch boost per-device/context-aware, which neutralized the big compute-buffer OOM, but the reporter's 2x16 GiB consumer Blackwell still OOM'd. Tracing the post-fix log: the model now loads its weights, builds the main context and warms up fine, and dies only on the last allocation — the MTP draft context's 800 MiB KV cache on the tighter device. #10411 changed only two defaults: the physical batch (now gated) and a VRAM-scaled parallel-slot count. The KV cache is unified (n_ctx_seq == full context proves slots share the budget, so parallel doesn't multiply KV), but n_seq_max=4 still adds per-slot compute-graph / context-checkpoint / output scratch. On a device packed ~99% by a 27B model spanning both cards, that overhead is the few-hundred-MiB straw — which is why reverting #10411 (and only #10411) restores a working load. Gate the parallel-slot default on the same per-device headroom predicate as the batch boost: when a large context already fills a single card (largeContextForDevice), keep n_parallel=1. A user running one big-context model that barely fits across two consumer GPUs is not serving four concurrent tenants. Small contexts and large unified-memory devices (GB10) keep full concurrency. Applied on both the single-host path and the distributed router. Also make the auto-tuning visible and reversible (the debugging here needed DEBUG logs and a git bisect): - Log the effective performance-relevant runtime options at INFO once per model load ("effective runtime tuning …": context, n_batch, n_gpu_layers, parallel, flash_attention, f16) so an admin can see what will run and pin or override any value in the model YAML. - LOCALAI_DISABLE_HARDWARE_DEFAULTS=true skips the hardware auto-tuning entirely (mirrors LOCALAI_DISABLE_GUESSING) for stock llama.cpp behavior. 
Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 15:48:23 +02:00
LocalAI [bot]	4ac67d255d	feat: single-build ggml CPU_ALL_VARIANTS for llama-cpp + turboquant (x86/arm64/apple) (#10497 ) * feat(llama-cpp): single x86 CPU build via ggml CPU_ALL_VARIANTS Replace the per-microarch avx/avx2/avx512/fallback multi-binary build on x86 with a single grpc-server plus the dlopen-able libggml-cpu-.so set that ggml's backend registry selects at runtime by probing host CPU features. One build instead of four, broader microarch coverage (adds alderlake AVX-VNNI, zen4 AVX512-BF16, sapphirerapids AMX), and the shell-side /proc/cpuinfo probing in run.sh goes away. Build/link notes: - CPU_ALL_VARIANTS requires GGML_BACKEND_DL + BUILD_SHARED_LIBS=ON, so ggml/llama become shared objects. SHARED_LIBS is now a make variable (default OFF) so the override survives the recursive sub-make into the VARIANT build dir instead of being re-clobbered by the base flags. - The cpu-all target also builds "--target ggml": the per-microarch backends are runtime-dlopened, not link deps, so they only compile via ggml's add_dependencies(). - hw_grpc_proto is pinned STATIC. Under BUILD_SHARED_LIBS=ON it would otherwise become a DSO referencing hidden-visibility symbols in the static libprotobuf.a, which fails to link ("hidden symbol ... is referenced by DSO"). Keeping it static links gRPC/protobuf into the executable while only ggml/llama stay shared, so no PIC or base-image change is required. - package.sh bundles the libggml-.so set into package/lib; ggml finds them by scanning the bundled ld.so directory (/proc/self/exe), which run.sh launches from. Scope: x86 only. arm64/darwin keep the single fallback build. The ik-llama-cpp / turboquant forks and the other ggml C++ backends are unchanged; the same recipe applies but is out of scope here. Validated with a full docker build plus a live inference smoke test: the model loads, ggml selects the AVX512_BF16 variant on a Zen-class host, and tokens generate correctly. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(llama-cpp,turboquant): extend CPU_ALL_VARIANTS to arm64 + turboquant - llama-cpp: x86 AND arm64 now use the single llama-cpp-cpu-all build (only hipblas keeps the fallback build). ggml's arm64 variant table (armv8.x / armv9.x, plus apple_m* on darwin) is selected at runtime. - turboquant: same recipe via a turboquant-cpu-all target. turboquant copies backend/cpp/llama-cpp's CMakeLists.txt + Makefile per flavor, so the hw_grpc_proto STATIC fix and the SHARED_LIBS / EXTRA_CMAKE_ARGS make-vars are inherited; the target just passes SHARED_LIBS=ON, the DL flags and --target ggml through, then collects the .so set. run.sh and package.sh updated to ship/select turboquant-cpu-all. - Makefile lib-collection find now also matches .dylib (for the darwin build, which emits dylibs rather than .so). ik-llama-cpp is intentionally left unchanged: its pinned ggml has no CPU_ALL_VARIANTS support and its IQK kernels require AVX2, so the per-microarch dynamic backend set does not apply. Scope still excludes the darwin packaging wiring (separate change). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] feat(llama-cpp,turboquant): arm64 gcc-14 for SME variants + darwin cpu-all packaging - arm64: ggml CPU_ALL_VARIANTS builds armv9.2 SME variants whose -march=...+sme is rejected by the Ubuntu 24.04 default gcc-13. Build the arm64 variants with gcc-14 (installed in the compile step). The host only selects a variant it actually supports at runtime, but every variant must still compile. - darwin: scripts/build/llama-cpp-darwin.sh builds llama-cpp-cpu-all instead of the fallback binary, keeps Metal (GGML_METAL stays ON; --target ggml also builds ggml-metal). 
The per-microarch libggml-cpu-.dylib are placed in the package root next to the binary (darwin has no bundled ld.so, so ggml's executable-dir scan looks there), while the other shared dylibs go in lib/ for DYLD_LIBRARY_PATH. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] fix(llama-cpp-darwin): distribute ggml backends by suffix (.so root, .dylib lib) ggml emits its loadable backends (per-microarch CPU variants, metal, blas) with a .so suffix even on darwin, while the core libraries (ggml-base/ggml/llama/ llama-common/mtmd) use .dylib. Split the distribution by suffix: .so DL backends go in the package root for ggml's executable-directory scan, .dylib core libs go in lib/ for DYLD_LIBRARY_PATH. The previous .dylib name-pattern matched none of the variants. Verified on an M4: ggml loads the apple_m4 CPU variant (SME=1) and Metal, model loads and generates correct tokens. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(llama-cpp,turboquant): only CPU_ALL_VARIANTS for pure-CPU builds, GPU uses fallback The previous gate sent every non-hipblas build through llama-cpp-cpu-all, so the GPU image builds (cublas, sycl_f16/f32, vulkan, nvidia l4t) compiled the whole CPU microarch variant matrix on top of their already-huge GPU backend - blowing the build time (the sycl job was only 59% done after 2h11m) - and the arm64 l4t build failed at `apt-get install gcc-14` (exit 100) on the Jetson base. Gate on an empty BUILD_TYPE instead: only the pure CPU image (build-type: '' in .github/backend-matrix.yml) builds the CPU_ALL_VARIANTS set; every GPU build gets a single fallback CPU grpc-server, since the accelerator does the compute. This also confines the arm64 gcc-14 step (needed for the armv9.2 SME variants) to the CPU build, away from the GPU base images. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(llama-cpp): correct run.sh comment for arm64/darwin cpu-all arm64 and darwin CPU images now also ship llama-cpp-cpu-all (not fallback-only); only GPU images ship fallback-only. Fix the stale comment to match. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 15:47:03 +02:00
LocalAI [bot]	3a87d9e48f	feat(vllm): macOS/Metal support via vllm-metal (MLX) (#10489 ) * feat(vllm): macOS/Metal support via vllm-metal (MLX) Add an additive Apple-Silicon path to the existing vllm Python backend so vLLM runs on macOS via vllm-metal (github.com/vllm-project/vllm-metal). Spike outcome (proven on a real M4 / macOS 26.5, Qwen3-0.6B): - vllm-metal registers through vLLM's platform-plugin entry point (metal -> vllm_metal:register); MetalPlatform activates and runs on the GPU through MLX. - LocalAI's backend.py is UNCHANGED: AsyncEngineArgs(...) -> AsyncLLMEngine.from_engine_args transparently resolves to vLLM 0.23's v1 AsyncLLM MLX engine, and async generate produced correct output. - backend.py is NOT touched: its only empty_cache() call is CUDA-only (guarded by torch.cuda.is_available()), so the benign shutdown-only "Allocator for mps is not a DeviceAllocator" noise comes from vLLM's internal EngineCore teardown, not from our code. Changes (all gated behind a darwin condition; Linux/CUDA/ROCm/Intel paths are byte-for-byte unchanged): - install.sh: darwin branch forces PYTHON_VERSION=3.12 (vllm-metal requirement), creates/activates LocalAI's managed venv via ensureVenv, then reproduces vllm-metal's installer INTO that venv (build vLLM 0.23.0 from the release source tarball against requirements/cpu.txt, then install the prebuilt vllm-metal wheel from its latest GitHub release), and runs runProtogen. installRequirements is skipped on darwin. - backend-matrix.yml: add a vllm includeDarwin entry (mps, python). - index.yaml: add metal capability + concrete metal-vllm / metal-vllm-development child entries mirroring the metal-kitten-tts template. Version coupling: vllm-metal pins vLLM 0.23.0, equal to LocalAI's current vllm pin. Bumping vllm must be coordinated with a supporting vllm-metal release; documented in install.sh and requirements-cublas13-after.txt. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * chore(vllm): track the darwin vllm-metal pin via the autobumper The Apple Silicon build pinned vLLM 0.23.0 as a hidden string in install.sh while floating the vllm-metal wheel on releases/latest - the two could drift apart silently. Make both a tracked, reproducible pair (VLLM_METAL_VERSION + VLLM_VERSION), fetch the wheel by tag, and add .github/bump_vllm_metal.sh wired into bump_deps.yaml. It tracks vllm-project/vllm-metal (not vllm/vllm latest), reading the coupled vLLM source version from vllm-metal's own installer, and opens a bump PR - mirroring the existing bump_vllm_wheel.sh for the cu130 wheel. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * chore(vllm): derive the darwin vLLM version, drop the second pin Follow-up: VLLM_VERSION was still a hardcoded string duplicating what VLLM_METAL_VERSION already determines. Derive it at install time from vllm-metal's own installer (vllm_v=) at the pinned tag - one source of truth, no second value to drift. The bumper now touches only VLLM_METAL_VERSION; the derivation is immutable per tag, so builds stay reproducible. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(vllm): fetch the vllm-metal wheel without the GitHub API The darwin build resolved the wheel URL via api.github.com, whose unauthenticated rate limit (60/hr per IP) 403s on shared macOS runners (observed after the 9-min vLLM source build). Construct the release-asset download URL deterministically from the pinned tag and the cp312/arm64 wheel name instead - no API call, no rate limit. Verified the URL resolves (200). 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(vllm): fail Score cleanly when the engine returns no prompt_logprobs Audit of the Score path against vllm-metal (MLX on macOS): the engine accepts SamplingParams(prompt_logprobs=1) but returns an all-None prompt_logprobs list rather than computing it, so scoring is not supported there. The old guard treated the truthy [None] list as valid and silently scored every candidate as 0. Detect the all-None case and return UNIMPLEMENTED instead. No-op on Linux/CUDA, which populate real entries. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 15:46:19 +02:00
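The Score guard in 3a87d9e48f turns on the fact that a list such as `[None]` is truthy in Python. A minimal sketch of the all-None check follows; `has_usable_prompt_logprobs` is a hypothetical name and the real backend code is not shown in this log.

```python
def has_usable_prompt_logprobs(prompt_logprobs) -> bool:
    """Return True only if at least one entry carries real logprob data.

    Hypothetical helper: a plain truthiness test would accept [None],
    which is the case the commit message says was silently scored as 0.
    """
    if not prompt_logprobs:  # None or an empty list
        return False
    return any(entry is not None for entry in prompt_logprobs)
```

When this returns False, the commit message says the backend reports UNIMPLEMENTED instead of scoring every candidate as 0.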
LocalAI [bot]	693e3eec05	chore(model gallery): 🤖 add 1 new models via gallery agent (#10505) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:52 +02:00
LocalAI [bot]	f1e5071321	chore: ⬆️ Update leejet/stable-diffusion.cpp to `8caa3f908ae6d4a4bef531e73b9a969f266a3d1f` (#10493) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:31 +02:00
LocalAI [bot]	93d6255de3	chore: ⬆️ Update ggml-org/llama.cpp to `8be759e6f70d629638a7eb70db3824cbdcea370b` (#10501) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:11:17 +02:00
LocalAI [bot]	fe4f425fb5	fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482 ) (#10504 ) * fix(http): harden BaseURL proxy scheme/host detection Split comma-separated X-Forwarded-Proto and honor the RFC 7239 Forwarded header so generated links use https behind common reverse-proxy setups. Refs #10482 Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(http): honor explicit external base URL in BaseURL When _external_base_url is set in the request context it dictates the origin (scheme+host+port); the proxy path prefix is still appended. Refs #10482 Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(config): generalize LOCALAI_BASE_URL to ExternalBaseURL LOCALAI_BASE_URL now sets a single instance-wide external base URL used for OAuth callbacks and all self-referential links. A Pre middleware stamps it into the request context for middleware.BaseURL. Refs #10482 Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: document LOCALAI_BASE_URL and reverse-proxy headers Refs #10482 Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(http): cover parseForwarded edge cases; clarify base-url flag group Adds direct unit coverage for quoted/malformed/multi-element Forwarded headers and regroups the external base URL flag away from auth-only. Refs #10482 Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:10:59 +02:00
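The proxy scheme detection in fe4f425fb5 combines two steps: splitting a comma-separated X-Forwarded-Proto and reading `proto` from the RFC 7239 Forwarded header. The sketch below is in Python rather than the project's Go. `parse_forwarded` here is a simplified stand-in, and the order in which the two headers are consulted is an assumption, since the commit message does not state it. Quoted values containing commas or semicolons are not handled.

```python
def parse_forwarded(value: str) -> dict:
    """Parse the first element of an RFC 7239 Forwarded header into a dict (simplified)."""
    first_element = value.split(",", 1)[0]
    result = {}
    for pair in first_element.split(";"):
        name, sep, val = pair.partition("=")
        if not sep:
            continue  # skip malformed pairs that have no '='
        result[name.strip().lower()] = val.strip().strip('"')
    return result


def detect_scheme(headers: dict, default: str = "http") -> str:
    """Pick the external scheme; X-Forwarded-Proto is tried first (assumed order)."""
    first = headers.get("X-Forwarded-Proto", "").split(",", 1)[0].strip().lower()
    if first in ("http", "https"):
        return first
    proto = parse_forwarded(headers.get("Forwarded", "")).get("proto", "").lower()
    if proto in ("http", "https"):
        return proto
    return default
```

With a header of `X-Forwarded-Proto: https, http`, only the first value (set by the proxy closest to the client) is used, so generated links come out as https.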
LocalAI [bot]	fae9f6356f	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `9dbe7ea26a01b30fccb117ae5e86807c1dc23d42` (#10499) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 08:10:41 +02:00
LocalAI [bot]	066abf82c0	feat(llama-cpp): cpu_moe/n_cpu_moe options + generic upstream-flag passthrough (#10490 ) * feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets users keep MoE expert weights on CPU to manage VRAM on large MoE models. Closes part of #10483 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): forward unknown '-' options to upstream arg parser Any options: entry starting with '-' is collected and passed verbatim to llama.cpp's own common_params_parse (LLAMA_EXAMPLE_SERVER) at the end of params_parse, so every upstream llama-server flag works without a new hand-wired branch. Passthrough runs last and wins on overlap; n_parallel is snapshotted to survive parser_init's SERVER reset, and help/usage/completion flags are skipped to avoid exiting the backend. Closes #10483 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(llama-cpp): document cpu_moe/n_cpu_moe and option passthrough Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(llama-cpp): terminate tensor/kv override vectors after passthrough The tensor_buft_overrides padding and the kv/draft override terminators ran before the generic option passthrough, so a passthrough flag (--cpu-moe, --override-tensor, --override-kv, ...) appended a real entry after the null sentinel - tripping the model loader's back().pattern == nullptr assertion (crash) or being silently dropped. Move all three termination/padding blocks to the end of params_parse, after both the named-option loop and common_params_parse have pushed their real entries. Also widen the exit()-flag skip list so --version, --license, --list-devices and --cache-list cannot terminate the backend. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:10:08 +02:00
LocalAI [bot]	a7fec9a49d	feat(backends): add darwin/metal (MPS) build for trl (#10487 ) * feat(backends): add darwin/metal (MPS) build for trl Authors backend/python/trl/requirements-mps.txt and wires trl into the darwin CI matrix and gallery so the MPS training path can be built and validated on Apple Silicon. The MPS variant installs plain PyPI torch wheels (MPS-capable on macOS arm64) and the trl training stack; bitsandbytes is omitted as it is a CUDA-only dependency with poor Apple Silicon support. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(trl): guard uv-only --index-strategy for the pip/darwin path The darwin/MPS build installs with pip (USE_PIP=true), which rejects the uv-only --index-strategy flag and failed the darwin backend build. Add it only on the uv path; Linux/CUDA resolution is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:09:36 +02:00
LocalAI [bot]	c678530cf0	fix(backends): darwin/metal support across purego Go backends (#10481 ) * fix(parakeet-cpp): darwin/metal support (libparakeet.dylib + DYLD path) The parakeet-cpp backend had no macOS support and panicked at startup on Apple/Metal nodes when purego.Dlopen could not find "libparakeet.so". Fix it across the same four layers the sibling voxtral backend already handles correctly: - main.go: default the dlopen target to libparakeet.dylib on darwin (runtime.GOOS), libparakeet.so elsewhere; PARAKEET_LIBRARY still wins. - Makefile: also stage the built libparakeet.dylib next to the Go sources. - package.sh: accept either the Linux .so[.X.Y] or the macOS .dylib when bundling instead of hard-failing when no .so is present (the macOS case); note that on Darwin only system frameworks are linked. - run.sh: on Darwin set DYLD_LIBRARY_PATH and PARAKEET_LIBRARY to the packaged .dylib; keep LD_LIBRARY_PATH + .so on Linux. Mirrors backend/go/voxtral. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(backends): darwin/metal support across purego Go backends The parakeet-cpp fix in the previous commit was an instance of a bug shared by nearly every purego/dlopen Go backend: the dlopen target was hardcoded to a .so name and run.sh exported only LD_LIBRARY_PATH, so the backend panicked at startup on macOS/Apple-Metal nodes (dyld needs the .dylib name and DYLD_LIBRARY_PATH). voxtral was the only backend handling this correctly. Apply the same four-layer fix (mirroring backend/go/voxtral) to the remaining affected backends: whisper, sherpa-onnx, ced, stablediffusion-ggml, vibevoice-cpp, qwen3-tts-cpp, omnivoice-cpp, crispasr, acestep-cpp, locate-anything-cpp, depth-anything-cpp, rfdetr-cpp, sam3-cpp, localvqe Per backend: - main.go (sherpa-onnx: backend.go, two libraries): default the dlopen target to the .dylib on darwin (runtime.GOOS), .so elsewhere; the existing <BACKEND>_LIBRARY env override still wins. 
- run.sh: on Darwin set DYLD_LIBRARY_PATH and point <BACKEND>_LIBRARY at the packaged .dylib; keep LD_LIBRARY_PATH + the Linux CPU-variant (avx/avx2/avx512) selection unchanged in the else branch. - package.sh: also bundle the .dylib and stop hard-failing when no .so is present (the macOS case). - Makefile: also stage the built .dylib. Notes: - stablediffusion-ggml and acestep-cpp build their lib as a CMake MODULE, which emits .so (not .dylib) on macOS; run.sh prefers .dylib and falls back to .so so both layouts work. - sherpa-onnx was already partly darwin-aware (Makefile/package.sh); only run.sh and the two dlopen defaults needed fixing. Linux behavior is unchanged. Verified gofmt-clean and `CGO_ENABLED=0 go build` for every backend. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 08:09:18 +02:00
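The dlopen-target rule from c678530cf0 (the environment override wins, otherwise `.dylib` on darwin and `.so` elsewhere) can be sketched as follows. This is a Python illustration of logic the commit describes for Go's `runtime.GOOS`. `dlopen_target` is a hypothetical name, and deriving the variable name from the library stem is an assumption based on the `PARAKEET_LIBRARY` example.

```python
import os
import sys


def dlopen_target(stem: str, platform: str = sys.platform, env=None) -> str:
    """Return the shared-library name to load for a backend (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get(stem.upper() + "_LIBRARY")  # e.g. PARAKEET_LIBRARY
    if override:
        return override  # an explicit override always wins
    suffix = ".dylib" if platform == "darwin" else ".so"
    return "lib" + stem + suffix
```

Note that the commit also mentions backends built as a CMake MODULE, which emit `.so` even on macOS; that fallback lives in run.sh and is not modeled here.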
LocalAI [bot]	3c63431e46	chore: ⬆️ Update ServeurpersoCom/omnivoice.cpp to `0f37401bebe9b20c0160a888e592108fc1d17607` (#10492) ⬆️ Update ServeurpersoCom/omnivoice.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 00:57:58 +02:00
LocalAI [bot]	3f647a2764	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d5507e33ae7ee2b7b41475f08044d3bde3b839ee` (#10498) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-25 00:22:45 +02:00
LocalAI [bot]	f88981cdce	feat(ui): data-driven hardware model recommendations + gallery surfacing (#10500 ) * feat(ui): make hardware starter models data-driven The empty-state starter widget recommended from a hardcoded list, which drifts as the gallery evolves. Add useRecommendedModels: it queries the live gallery for chat-capable models (their natural curated order, since the gallery exposes no popularity signal), estimates size/VRAM for the top candidates via the existing estimate endpoint, and ranks by hardware fit - smallest on CPU-only boxes, largest-that-fits on GPUs. StarterModels now renders those live picks and keeps the curated static list only as an offline/trimmed-gallery fallback. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): recommend models for your hardware in the gallery Hardware-aware recommendations were only shown on the first-run empty state. Surface them on the main Models gallery too: a dismissible "Recommended for your hardware" strip at the top, sharing the useRecommendedModels fit-ranking with the starter widget. CPU-only boxes get small models; GPUs get the largest picks that fit VRAM, with size and VRAM shown per card. One-click install; dismissal persists per browser. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): gpu-mid tier + NVIDIA NVFP4 model recommendations Refine the hardware recommendation tiers and curated picks: - Add a gpu-mid tier (8-24GB VRAM) between gpu-small and gpu-large, so ~27B-class models are suggested separately from the 30B+ large tier. - Detect NVIDIA GPUs (resources.gpus[].vendor) and, on NVIDIA only, prefer NVFP4 + MTP variants (Blackwell-optimised); NVFP4 models are filtered out of recommendations on non-NVIDIA hardware where they can't run. This applies to both the live ranking and the static fallback, with an NVFP4 badge shown on those picks. 
- Refresh the curated fallback to current models: Gemma-4 QAT Q4 builds at every tier, low qwen3.5 (4B distilled / 9B) on CPU/small, qwen3.6-27b and MTP variants at mid, qwen3.6/qwen3.5 35B-A3B apex/distilled at large. All names verified against gallery/index.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 00:22:45 +02:00
LocalAI [bot]	0d6de15ae9	fix(config): per-device VRAM headroom for Blackwell defaults (#10485 ) (#10494 ) The hardware-tuned defaults from #10411 were measured on a GB10 / DGX Spark (128 GiB unified memory) and over-provisioned multi-GPU consumer Blackwell (e.g. 2x16 GiB RTX 50-series) into CUDA OOM during model init: - The Blackwell physical batch (512 -> 2048) sets both n_batch and n_ubatch. The compute buffer scales ~n_ubatch * n_ctx and is allocated PER DEVICE (it can't be split across GPUs), so a large context turns ub2048 into multi-GiB of scratch that must fit one 16 GiB card. - The VRAM-scaled parallel-slot default tiered off TotalAvailableVRAM(), which SUMS all GPUs (2x16 -> "32 GiB" -> 8 slots), but the allocations are per-device. Make both decisions per-device and context-aware: - xsysinfo.MinPerGPUVRAM() reports the smallest device's VRAM; localGPU() uses it so the parallel tier and batch guard reason about one card. - PhysicalBatchForContext(gpu, ctx) raises the batch only when the extra compute buffer fits VRAM/4 at this model's context (16 GiB crosses over ~174k ctx, 32 GiB ~349k; GB10 reports system RAM so it still clears it). - Apply hardware defaults AFTER runBackendHooks in SetDefaults so the GGUF-guessed context is resolved before the batch decision. - The distributed router gates the node batch the same way. Unified-memory devices (GB10, Apple) report system RAM as their single device's VRAM, so they keep the prefill win. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-25 00:07:48 +02:00
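The core of 0d6de15ae9 is the difference between summed and per-device VRAM. The Python stand-ins below mirror the names `TotalAvailableVRAM` and `MinPerGPUVRAM` from the commit message, but the bodies are a sketch, not the project's Go code. The tier thresholds and the batch-size formula are not given in the log and are deliberately left out.

```python
def total_available_vram(gpus_gib: list) -> float:
    """Sum across all devices: the figure the old parallel-slot tier was based on."""
    return sum(gpus_gib)


def min_per_gpu_vram(gpus_gib: list) -> float:
    """VRAM of the smallest device: what a per-device allocation must actually fit."""
    return min(gpus_gib) if gpus_gib else 0
```

For the 2x16 GiB case in the commit message, the sum reports 32 GiB while the per-device minimum reports 16 GiB, which is the budget a compute buffer that cannot be split across GPUs has to fit.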
LocalAI [bot]	5c3d48ab50	feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496 ) * feat(ui): remember last-used model per capability ModelSelector auto-selected the first option whenever the bound value was empty or stale, so every visit to the Home chat box, Image, TTS or Talk pages reset the choice to whatever sorted first. Persist the user's pick in localStorage keyed by capability and prefer it on auto-select when the model is still available, falling back to the first option otherwise. Because every modality picker funnels through ModelSelector, this fixes the friction everywhere at once. External-options callers pass no capability and keep the previous first-item behaviour. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): add visibility-aware polling hook The app had 26 hand-rolled setInterval polls, none of which paused when the browser tab was hidden, so backgrounded dashboards kept hitting the server every few seconds for data nobody was looking at. Add usePolling: runs immediately, polls on a fixed interval, pauses while document.hidden, fires a catch-up poll on return, and guards against overlapping slow requests. Route useResources (the highest-frequency shared poll) through it. Further callers can be migrated incrementally. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): hardware-aware starter models on empty home A fresh install dropped admins straight into a 1000+ model gallery with no guidance. Add a StarterModels widget to the empty-state wizard that recommends a small, curated set tuned to the detected hardware: - CPU-only machines (no GPU VRAM) are steered to genuinely small models (1-4B, Q4) that stay responsive without a GPU. - GPU machines get suggestions scaled to available VRAM. 
Curated names are real gallery entries, intersected against the live gallery at render time so a trimmed/custom gallery degrades gracefully. Install is one click via the existing model-install API. Also routes Home's cluster and system-info polls through usePolling so a backgrounded home page stops fetching. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): optional token-cost estimates on usage dashboard The usage dashboard tracked tokens but had no monetary view. Multi-user deployments that bill back or budget compute had to export and compute cost elsewhere. Add an opt-in pricing control: admins set $ per 1M prompt/completion tokens (stored per-browser). When set, an estimated-cost summary card and per-model / per-user cost columns appear, computed from recorded token counts. The entire cost surface stays hidden until a price is entered, so the default view is unchanged. Cost is clearly labelled an estimate - LocalAI itself has no notion of price. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ui): label icon-only send buttons for screen readers The chat and agent-chat send buttons were a bare paper-plane icon with no accessible name, so screen readers announced only "button". Add an aria-label/title ("Send message") and mark the icon aria-hidden. An audit of all icon-only buttons found these were the only two unlabeled controls; the rest already carry visible text. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 23:30:08 +02:00
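The token-cost estimate described in the entry above is simple arithmetic over recorded token counts. The sketch below shows it in Go for consistency with the rest of the repository; the actual dashboard code lives in the UI, and the function name and parameters here are assumptions made for illustration.

```go
package main

import "fmt"

// estimateCost returns an estimated cost given token counts and
// prices expressed in currency units per 1M tokens. Hypothetical
// helper: it only mirrors the "price per 1M prompt/completion tokens"
// rule stated in the commit message.
func estimateCost(promptTokens, completionTokens int64, promptPer1M, completionPer1M float64) float64 {
	const perMillion = 1_000_000.0
	return float64(promptTokens)/perMillion*promptPer1M +
		float64(completionTokens)/perMillion*completionPer1M
}

func main() {
	// 1M prompt tokens at 2.00 plus 0.5M completion tokens at 4.00.
	fmt.Printf("%.2f\n", estimateCost(1_000_000, 500_000, 2.0, 4.0)) // prints "4.00"
}
```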
LocalAI [bot]	764b0352b9	docs: ⬆️ update docs version mudler/LocalAI (#10491 ) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 23:18:24 +02:00
LocalAI [bot]	75ba2daba1	chore(model-gallery): ⬆️ update checksum (#10495 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 23:18:04 +02:00
LocalAI [bot]	62b14fd635	feat(backends): add darwin/metal build for liquid-audio (#10486 ) * feat(backends): add darwin/metal build for liquid-audio Wire the already-MPS-ready liquid-audio backend (it ships requirements-mps.txt) into the darwin CI matrix and the gallery so metal-darwin-arm64 images are built and selectable. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * ci(liquid-audio): trigger darwin build via requirements-mps note The changed-backends path filter only builds a backend when a file under its directory changes. The metal wiring lived in index.yaml + the matrix, so the darwin job was skipped. Add a documenting comment to the MPS requirements so CI actually exercises the darwin build. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which rejects the uv-only --index-strategy flag and failed the darwin backend build. Add it only on the uv path; Linux/CUDA resolution is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 23:16:27 +02:00
LocalAI [bot]	193d0e6aef	fix(backends): darwin/metal support for supertonic (#10488 ) The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec path, while package.sh hard-failed on any non-Linux host. On macOS dyld has no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib. This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481 landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but which omitted supertonic: - run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH at libonnxruntime.dylib; guard the ld.so exec path to Linux only. - package.sh: recognize Darwin instead of erroring out; the bundled .dylib is resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle. - helper.go: platform-native default library extension (dylib on darwin) for the last-resort dlopen fallback. It also wires the darwin CI build and gallery entries, resolving the inconsistency where backend/index.yaml advertised metal for supertonic but no includeDarwin matrix entry built the image: - .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry. - backend/index.yaml: declare metal capabilities and add the concrete metal-supertonic / metal-supertonic-development child entries. The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required. Assisted-by: Claude:opus-4.8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 22:19:03 +02:00
LocalAI [bot]	482314c623	fix(realtime): resolve model aliases for pipeline sub-models (#10484 ) Realtime pipeline sub-models (llm/transcription/tts/vad/sound-detection) were loaded via cl.LoadModelConfigFileByName without alias resolution, unlike top-level API requests which resolve aliases in core/http/middleware/request.go. So a pipeline that references an alias (e.g. `pipeline.llm: default`, where `default` is an alias for a real LLM) reached model loading as the alias stub with an empty Backend. This was silently broken on a single host (it failed downstream) and a hard error in distributed/p2p mode: routing model : loading model default: ... installing backend on node X: backend name is empty Fix by routing every pipeline sub-model load through a small helper that follows a single alias hop (mirroring the top-level resolution), so non-alias sub-models behave identically and aliased ones get the target's full config (Backend, Model, ...). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 21:50:44 +02:00
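The single alias hop described in the entry above might look roughly like this. It is a sketch under stated assumptions: modelConfig, its Alias field, the map-based lookup, and the model and backend names are hypothetical stand-ins, not the real config loader.

```go
package main

import "fmt"

// modelConfig is a hypothetical, trimmed-down stand-in for a model config.
type modelConfig struct {
	Name    string
	Alias   string // non-empty when this entry is only an alias stub
	Backend string
}

// resolveOneHop looks up name and, if it is an alias, returns the
// target's full config. It follows exactly one hop; non-alias entries
// are returned unchanged.
func resolveOneHop(configs map[string]modelConfig, name string) (modelConfig, error) {
	cfg, ok := configs[name]
	if !ok {
		return modelConfig{}, fmt.Errorf("model %q not found", name)
	}
	if cfg.Alias == "" {
		return cfg, nil
	}
	target, ok := configs[cfg.Alias]
	if !ok {
		return modelConfig{}, fmt.Errorf("alias %q points at unknown model %q", name, cfg.Alias)
	}
	return target, nil
}

func main() {
	// Illustrative names only.
	configs := map[string]modelConfig{
		"default": {Name: "default", Alias: "qwen"},
		"qwen":    {Name: "qwen", Backend: "llama-cpp"},
	}
	cfg, err := resolveOneHop(configs, "default")
	fmt.Println(cfg.Name, cfg.Backend, err) // prints "qwen llama-cpp <nil>"
}
```

Without the hop, the caller would receive the "default" stub, whose Backend is empty, which matches the "backend name is empty" failure quoted in the commit message.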
Dedy F. Setyawan	e8ae88a2a0	i18n(id): update and complete Indonesian translations (#10480 ) - translate remaining English strings in chat, common, home, and media locales. - fix typo and improve wording consistency (e.g., klaster -> kluster, otomasi -> automasi). Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>	2026-06-24 18:35:21 +02:00
Richard Palethorpe	e1994579f8	fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS (#10474 ) pii_default_detectors was applied to the live config only by a live POST /api/settings (ApplyRuntimeSettings) — neither the startup loader nor the config file watcher read it back. So after a restart the persisted default detectors were dropped, and the cloud-proxy MITM listener (which resolves each intercept host's detectors once at start via ResolvePIIPolicy) came up with an empty set and forwarded intercepted traffic unredacted, even though the MITM model had pii.enabled:true and the defaults were on disk. Request-side default redaction broke the same way. - startup.go: loadRuntimeSettingsFromFile now applies pii_default_detectors, before startMITMIfConfigured, with env > file precedence. - config_file_watcher.go: apply pii_default_detectors on live file edits, matching the existing env-guard pattern used for the other fields. - settings endpoint: rebuild the MITM listener when pii_default_detectors changes (its per-host detector map is frozen at listener start), not only on a mitm_listen change — so toggling a default detector takes effect on cloud-proxy traffic immediately. - new LOCALAI_PII_DEFAULT_DETECTORS env var / CLI flag (WithPIIDefaultDetectors) so the default detector set can be pinned at boot for immutable deployments. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-06-24 11:08:57 +02:00
LocalAI [bot]	e5620989dd	refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476 ) PR #10475 fixed SoundDetection in-flight tracking, but the underlying trap remains: InFlightTrackingClient embedded the whole grpc.Backend interface "for passthrough of untracked methods", so any newly added inference method is silently satisfied by the embedded passthrough and never wrapped with track(). That leaves onFirstComplete unfired and in-flight stuck at 1 - the exact SoundDetection bug, waiting to recur for the next backend method. Close the gap at the type level instead of relying on reviewers to remember: - Split grpc.Backend into two composed sub-interfaces: InferenceBackend (methods that are one discrete inference call and must be tracked) and ControlBackend (control-plane calls plus the streaming constructors whose work spans the returned stream, safe to pass through). The classification now lives next to the interface it documents. - InFlightTrackingClient embeds only grpc.ControlBackend and implements every InferenceBackend method explicitly, delegating to an inner InferenceBackend. A `var _ grpc.Backend = (*InFlightTrackingClient)(nil)` assertion makes the package fail to compile if any inference method is left unwrapped. Now adding a method to InferenceBackend is a build error (at the assertion and every call site: "does not implement grpc.Backend (missing method X)"), not a silent runtime leak - and the obvious fix is to copy a neighbouring wrapper, which calls track(). No runtime guard or reviewer vigilance required. Pure refactor: the composed Backend interface is identical to the old flat one, so all implementers and consumers are unaffected (verified with a full `go build ./...`). Behaviour is unchanged; the existing nodes suite passes. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 11:08:29 +02:00
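The compile-time contract described in the entry above can be sketched with a toy pair of interfaces. The method sets below are invented for illustration and are far smaller than the real grpc.Backend; the tracking counter is deliberately not goroutine-safe.

```go
package main

import "fmt"

// InferenceBackend: discrete inference calls that must be tracked.
type InferenceBackend interface {
	Predict(prompt string) string
	SoundDetection(audio []byte) string
}

// ControlBackend: control-plane calls that are safe to pass through.
type ControlBackend interface {
	Health() bool
}

// Backend is the composed interface consumers depend on.
type Backend interface {
	InferenceBackend
	ControlBackend
}

// trackingClient embeds only ControlBackend, so inference methods are
// NOT promoted automatically and each one must be written explicitly.
type trackingClient struct {
	ControlBackend
	inner    InferenceBackend
	inFlight int // not goroutine-safe; illustration only
	calls    int
}

// track increments in-flight on entry and returns the decrement.
func (c *trackingClient) track() func() {
	c.inFlight++
	c.calls++
	return func() { c.inFlight-- }
}

func (c *trackingClient) Predict(p string) string {
	defer c.track()()
	return c.inner.Predict(p)
}

func (c *trackingClient) SoundDetection(a []byte) string {
	defer c.track()()
	return c.inner.SoundDetection(a)
}

// Fails to compile if any InferenceBackend method lacks a wrapper.
var _ Backend = (*trackingClient)(nil)

type fakeBackend struct{}

func (fakeBackend) Predict(p string) string      { return "echo:" + p }
func (fakeBackend) SoundDetection([]byte) string { return "silence" }
func (fakeBackend) Health() bool                 { return true }

func main() {
	fb := fakeBackend{}
	c := &trackingClient{ControlBackend: fb, inner: fb}
	fmt.Println(c.Predict("hi"), c.SoundDetection(nil), c.Health(), c.calls, c.inFlight)
}
```

Adding a method to InferenceBackend in this sketch breaks the build at the `var _ Backend` line until a wrapper is written, which is the property the commit message relies on.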
LocalAI [bot]	fc618dcee6	fix(distributed): track in-flight for SoundDetection requests (#10475 ) The distributed router wraps backend clients in InFlightTrackingClient so the eviction logic knows which replicas are actively serving. Every inference method must be wrapped: track() increments in-flight on entry and decrements (plus fires onFirstComplete, which releases the load-time reservation) on return. SoundDetection was added after the tracking client and never got a wrapper, so its calls fell through to the embedded passthrough Backend. The increment/decrement never ran and, critically, onFirstComplete never fired, so the reservation set at model load was never released - leaving in-flight stuck at 1 and the replica permanently ineligible for eviction. Wrap SoundDetection like the other non-LLM methods and cover it in the "non-LLM inference methods track in-flight" table test. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 10:13:37 +02:00
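The track() behaviour described in the entry above (increment on entry, decrement on return, release the load-time reservation once) could be sketched like this. The tracker type, its fields, and the choice to model the reservation as an initial in-flight count of 1 are assumptions for illustration, not the router's actual code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tracker is a hypothetical in-flight counter for one replica.
type tracker struct {
	inFlight        atomic.Int64
	firstDone       sync.Once
	onFirstComplete func()
}

// newTracker starts with in-flight at 1 to model the reservation
// taken at model load; the first completed call releases it.
func newTracker() *tracker {
	tr := &tracker{}
	tr.inFlight.Store(1)
	tr.onFirstComplete = func() { tr.inFlight.Add(-1) }
	return tr
}

// track increments in-flight and returns a function that decrements
// it and fires onFirstComplete exactly once.
func (tr *tracker) track() func() {
	tr.inFlight.Add(1)
	return func() {
		tr.inFlight.Add(-1)
		tr.firstDone.Do(tr.onFirstComplete)
	}
}

func main() {
	tr := newTracker()
	done := tr.track()
	fmt.Println(tr.inFlight.Load()) // reservation + one active call
	done()
	fmt.Println(tr.inFlight.Load()) // reservation released
}
```

A method that bypasses track() never calls the returned function, so in this model the count stays at 1 indefinitely, which is the stuck state the commit describes.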
LocalAI [bot]	e6042080c0	fix(agents): URL-decode collection/agent name path params (#10443 ) (#10471 ) fix(agents): URL-decode collection/agent name path params Collection and agent names carry a "legacy-api-key:" prefix, so the ':' arrives percent-encoded as %3A in the request path. Echo routes such paths via URL.RawPath and stores the matched path-param value still escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch" and the store lookup 404'd ("collection not found"). This was second-order fallout of #10375/#10387: once colons became valid in names, the URL-decode gap surfaced on every name-bearing endpoint. Add a decodedParam helper that url.PathUnescape's the param (falling back to the raw value on invalid encoding) and wire it into all collection endpoints and the agent :name endpoints, which share the identical prefix. The entry endpoints already unescaped c.Param("*"); this closes the same gap for :name. Fixes #10443 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 09:42:09 +02:00
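The decode-with-fallback behaviour described in the entry above can be sketched with the standard library alone. The helper here takes the raw parameter string directly rather than an Echo context, which is a simplification made so the example stays self-contained.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodedParam percent-decodes a path parameter and falls back to the
// raw value when the encoding is invalid. Simplified signature: the
// commit's helper is wired to Echo path params, this one takes a string.
func decodedParam(raw string) string {
	decoded, err := url.PathUnescape(raw)
	if err != nil {
		return raw
	}
	return decoded
}

func main() {
	fmt.Println(decodedParam("legacy-api-key%3ALiteraryResearch")) // legacy-api-key:LiteraryResearch
	fmt.Println(decodedParam("bad%zzname"))                        // bad%zzname (invalid escape, raw value kept)
}
```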
LocalAI [bot]	0f3b24436d	chore: ⬆️ Update mudler/parakeet.cpp to `89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a` (#10468 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:43 +02:00
LocalAI [bot]	4b6f911835	chore: ⬆️ Update ggml-org/whisper.cpp to `43d78af5be58f41d6ffbc227d608f104577741ea` (#10466 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:14 +02:00
LocalAI [bot]	a5e28942a6	chore: ⬆️ Update ggml-org/llama.cpp to `be4a6a63eb2b848e19c277bdcf2bd399e8af76d9` (#10467 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:54 +02:00
LocalAI [bot]	dba9cd7ca4	chore: ⬆️ Update CrispStrobe/CrispASR to `96b2a6ee31d30389fed8a7ef1a54239b75231ddc` (#10465 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:34 +02:00
LocalAI [bot]	c93190de50	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `7ccf1d209588962b96eacca325b37e9b3e8faf5e` (#10456 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:13 +02:00
LocalAI [bot]	4dbf69f889	chore(model gallery): 🤖 add 1 new models via gallery agent (#10472 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 00:00:26 +02:00
LocalAI [bot]	deb430f3ec	chore(model-gallery): ⬆️ update checksum (#10469 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> v4.5.0	2026-06-23 23:15:47 +02:00
LocalAI [bot]	dd8c8778e2	chore(model gallery): 🤖 add 1 new models via gallery agent (#10464 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 15:43:21 +02:00
LocalAI [bot]	06a7b6cadb	chore: ⬆️ Update leejet/stable-diffusion.cpp to `f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f` (#10457 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:29:07 +02:00
LocalAI [bot]	67c8889866	chore: ⬆️ Update CrispStrobe/CrispASR to `63b57289255267edf66e43e33bc3911e04a2e92d` (#10455 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:49 +02:00
LocalAI [bot]	1d49041c85	chore: ⬆️ Update ggml-org/llama.cpp to `73618f27a801c0b8614ceaf3547d3c2a99baae14` (#10458 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:09 +02:00
LocalAI [bot]	2edc4e25b3	chore: ⬆️ Update ggml-org/whisper.cpp to `bae6bc02b1940bbfb87b6a0299c565e563b916d1` (#10459 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:27:51 +02:00
Richard Palethorpe	7888067914	fix(settings): merge partial /api/settings updates instead of overwriting (#10463 ) POST /api/settings rebuilt runtime_settings.json from only the request body, so a focused admin page that submits a single field wiped every other persisted setting. The Middleware proxy tab (mitm_listen) and detector table (pii_default_detectors), plus the MCP SetBranding tool (instance_name/instance_tagline), all POST partial bodies; the no-omitempty api_keys and pii_default_detectors fields even round-tripped as JSON null. Read the persisted settings and overlay only the fields the request set (RuntimeSettings.MergeNonNil) before writing. Every field is a pointer, so the reflection-based merge is total over the struct and any field added later is preserved automatically. Absent or null fields are now kept; clearing a setting is done by sending its explicit empty/zero value (api_keys [], mitm_listen "", etc.), unchanged from before. The full Settings page sends every field, so its Save behaves identically. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-23 13:27:34 +02:00
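The overlay rule described in the entry above (keep absent or null fields, apply only what the request set) can be sketched with reflection over a struct of pointer fields. The struct below is a three-field stand-in and mergeNonNil is a free function written for illustration; neither is the real RuntimeSettings type or its MergeNonNil method.

```go
package main

import (
	"fmt"
	"reflect"
)

// runtimeSettings is a hypothetical subset; every field is a pointer
// so "not sent" (nil) is distinguishable from "sent as empty".
type runtimeSettings struct {
	InstanceName *string
	MITMListen   *string
	APIKeys      *[]string
}

// mergeNonNil copies every non-nil pointer field of update onto base
// and leaves the rest of base untouched.
func mergeNonNil(base *runtimeSettings, update runtimeSettings) {
	dst := reflect.ValueOf(base).Elem()
	src := reflect.ValueOf(update)
	for i := 0; i < src.NumField(); i++ {
		f := src.Field(i)
		if f.Kind() == reflect.Ptr && !f.IsNil() {
			dst.Field(i).Set(f)
		}
	}
}

// ptr returns a pointer to v; small helper for building test values.
func ptr[T any](v T) *T { return &v }

func main() {
	persisted := runtimeSettings{InstanceName: ptr("lab"), MITMListen: ptr(":8443")}
	// A partial update that explicitly clears mitm_listen and sets nothing else.
	mergeNonNil(&persisted, runtimeSettings{MITMListen: ptr("")})
	fmt.Printf("%q %q\n", *persisted.InstanceName, *persisted.MITMListen) // prints "lab" ""
}
```

Because the loop walks the struct by reflection, a field added to the struct later is merged without touching the function, which is the property the commit message calls out.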
LocalAI [bot]	9eedbf537a	chore(model gallery): 🤖 add 1 new models via gallery agent (#10461 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 08:04:46 +02:00
LocalAI [bot]	69c16481c8	fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460 ) PR #10454 added a `cancellable bool` parameter to GalleryStore.UpdateProgress but missed two callers under tests/e2e/distributed, breaking the build on master (golangci-lint and tests-e2e-backend both failed to compile with "not enough arguments in call to ... UpdateProgress"). Pass cancellable=true (both ops are downloading installs, which are cancellable) and assert the flag is persisted, exercising the new behavior. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 23:45:22 +02:00
LocalAI [bot]	56f8a6623f	fix(galleryop): persist cancellable so restarted in-flight ops stay cancellable (#10454 ) In distributed mode a model/backend install marks OpStatus.Cancellable=true while downloading, but the gallery_operations row never recorded it: UpdateStatus persisted only progress/status and Create left the cancellable column at its zero value. After a replica restart Hydrate rebuilt the op with cancellable=false, /api/operations reported false, and the UI hid the cancel button - the orphaned op then lingered until the 30-minute stale reaper expired it ("stays there on restart, can't cancel, after a bit it expires"). Persist the flag on every progress tick and at row creation (installs are cancellable, deletes are not), and clear it on terminal transitions. A rehydrated in-flight op is now cancellable, so an admin can dismiss the orphaned op immediately instead of waiting out the reaper. The functional cancel path already survived restart (CancelOperation persists store.Cancel even with no live CancelFunc); this restores the UI affordance that drives it. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 22:41:16 +02:00
Ettore Di Giacinto	4755d676a3	Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar)" (#10453 ) Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top…" This reverts commit `9d54a599b0`.	2026-06-22 21:59:05 +02:00
dependabot[bot]	10184b5e28	chore(deps): bump actions/checkout from 6 to 7 (#10451 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 6 to 7. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v6...v7) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-22 21:38:37 +02:00
LocalAI [bot]	fdf475ec5f	feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446 ) * feat(realtime): add pipeline.compaction config + resolution Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): extract itemID helper, reuse in item.retrieve Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): drop duplicate Ginkgo bootstrap, fold specs into openai suite Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.delete Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement input_audio_buffer.clear Add a handler for the input_audio_buffer.clear client event that discards a partially-captured utterance (raw PCM + buffered Opus frames) via a unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.truncate (text) Clears both .Text and .Transcript of the assistant content part at contentIndex so barge-in truncation also works for audio turns whose spoken words live in .Transcript. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): add Conversation.Memory + pair-safe compactionCut Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, avoids panic) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(realtime): gofmt compaction test helper closures Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): inject rolling memory into the prompt + summary builders Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): server-side summarize-then-drop compactor Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): unit-test prefixMatches eviction-safety predicate Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): resolve summarizer model + schedule compaction per turn Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document conversation compaction + new item events Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): resolve summary model inside compaction goroutine (lazy, off-path) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): reuse reasoning.ExtractReasoningComplete for summary stripping Replace the bespoke <think> regex in the compactor with the shared pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of the realtime path and covering all reasoning tag families, not just <think>. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(config): register pipeline.compaction fields in meta registry TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have a UI/meta registry entry; add the four pipeline.compaction.* leaves so they render with proper labels/descriptions instead of the reflection fallback. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 21:28:49 +02:00
LocalAI [bot]	9d54a599b0	feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar) (#10449 ) * feat(ui): add shared DeploymentContext (features + p2p signal) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(ui): extract launchAssistantChat shared helper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): role/mode-aware landing redirect at /app Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): pin Cluster group and collapse Create for cluster admins Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): desktop top navbar with mode pill and admin-via-chat jump Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): admin token-usage meter in the top navbar Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ui): top-navbar breakpoint handoff + assistant jump from chat page M1: the desktop .top-navbar was hidden at max-width 768px while the .mobile-header only appears at max-width 639px, leaving 640-768px with neither bar so admins lost the mode pill, token meter and admin-via-chat jump. Hide the top bar at 639px instead so it covers every width the rail sidebar is shown and hands off to the mobile-header exactly at 639px. M2: the navbar 'Admin via chat' button wrote localStorage and called navigate('/app/chat'), but when already on the chat page Chat does not remount so its mount-time payload reader never fired and the click was a no-op until reload. The payload consume logic is factored into a shared callback; the launcher now dispatches a localai-open-assistant event that the mounted Chat listens for to re-consume the payload. Mount behavior is unchanged. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 21:27:43 +02:00

6829 Commits