LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-19 06:09:07 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	95e7149c87	chore: ⬆️ Update ggml-org/llama.cpp to `74ade52741203e5c8f81eaf06a96cb1cfe15f2a3` (#10368) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 13:25:29 +02:00
LocalAI [bot]	fd26c8c753	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `064d23a6f816d50491d8c9b35a0cafe546eaf4b5` (#10367) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 13:25:14 +02:00
LocalAI [bot]	e60c094a7d	feat(ds4): SSD streaming + quality engine options, 128GB DeepSeek gallery models (#10374 ) feat(ds4): wire SSD streaming + quality engine options, add 128GB DeepSeek gallery models The ds4 backend zero-initialized ds4_engine_options and exposed none of the engine's tunable knobs, so SSD streaming (run a model larger than RAM by streaming routed MoE experts from the GGUF on SSD) and the quality/perf knobs were unreachable from LocalAI model YAMLs. Map ModelOptions.Options onto ds4_engine_options through a declarative table (kEngineOptSpecs + apply_engine_option) instead of per-field branches: the struct is fixed C with no reflection, so the field set is enumerated once and a future knob is a one-line table row. Two fields use ds4's own typed parsers (GiB budgets, cache-experts count-or-NGB). Bare flags (e.g. "ssd_streaming") mean true; path-type options (mtp_path, expert_profile_path, directional_steering_file) resolve relative to the model directory so a gallery entry can reference a companion file by bare filename. mtp_draft/mtp_margin are now validated rather than parsed with throwing std::stoi/std::stof. Add gallery entries for the 128 GB class: - deepseek-v4-flash-q2-q4 (~91 GB, mixed q2/q4, fits RAM, higher quality) - deepseek-v4-flash-q4-ssd (~153 GB full 4-bit, runs on 128 GB via SSD streaming) - deepseek-v4-flash-q2-mtp (~81 GB + MTP speculative draft weights) - deepseek-v4-pro-q2-ssd (~433 GB Pro, experimental SSD streaming) SSD streaming is Metal (Darwin) only; the options are inert on CUDA/CPU. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-17 10:30:06 +02:00
LocalAI [bot]	980ec4a311	chore: ⬆️ Update antirez/ds4 to `cafc134f78a5a1890d98808d3102f4313573a1bc` (#10369) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:28:19 +02:00
LocalAI [bot]	dfd5a00e6f	chore: ⬆️ Update ggml-org/whisper.cpp to `9efddafb9153e1fb22bdc3dd3057072c99165ed2` (#10366) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:27:52 +02:00
LocalAI [bot]	63be479066	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7f0e728b7d42f2490dfa5dd9539082d904f2f6b2` (#10370) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:08:34 +02:00
LocalAI [bot]	4c6750fe6b	feat(depth): metric-large + nested metric model gallery entries (#10363 ) * feat(depth): add depth-anything-3-metric-large gallery entry DA3METRIC-LARGE (ViT-L) single-file metric-scale depth + sky, served by the existing depth-anything backend (same single-GGUF path as mono-large). GGUF published at mudler/depth-anything.cpp-gguf. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): serve nested metric model (two-file load) The DA3 nested model needs both branches (anyview GIANT + metric ViT-L) loaded together. Wire it through the backend: - Load reads a 'metric_model:<file>' entry from ModelOptions.Options and, when present, calls da_capi_load_nested(anyview, metric) instead of da_capi_load (registers the new abi-4 symbol; helper optionValue + unit test). - gallery: depth-anything-3-nested (model=anyview, options=metric branch, both GGUFs fetched) for metric-scale depth + pose. - bump depth-anything.cpp pin to cce5edc (abi 4 / da_capi_load_nested). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 22:03:58 +02:00
LocalAI [bot]	294170d3ed	feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery (#10352 ) * feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery Mirrors the locate-anything-cpp backend to register a new depth-anything backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via purego (cgo-less, no Python at inference). - backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage), purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts building depth-anything.cpp's DA_SHARED static .so per CPU variant. - backend/index.yaml: depth-anything backend meta + all hardware-variant capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t). - gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32, small, large, giant, mono-large). - .github/backend-matrix.yml: one build entry per hardware variant. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): typed Depth RPC + REST endpoint exposing full DA3 data Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): pin depth-anything.cpp to e0b6814 (ABI 3 dense C-API) The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3); pin the native build to the commit that exports them. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): pin depth-anything.cpp to v0.1.0 release (b515c31) Repoint the native version from the now-orphaned e0b6814 to the b515c31 release commit, kept alive by the upstream v0.1.0 tag. C-API is unchanged (da_capi_abi_version == 3). 
Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): wire depth-anything-cpp into build, CI bump, and importer The backend dir, gallery index, and CI build-matrix were present but the backend was never wired into the integration points that adding-backends.md requires: - root Makefile: add to .NOTPARALLEL, the test-extra chain, a BACKEND_* definition, the docker-build target eval, and docker-build-backends (mirrors parakeet-cpp; the backend's own Makefile already documented that its `test` target is driven by test-extra). - bump_deps.yaml: register the DEPTHANYTHING_VERSION pin so the daily auto-bump bot tracks mudler/depth-anything.cpp master (it cannot see an unregistered Makefile pin). - import form: add a preference-only KnownBackend entry so depth-anything is selectable at /import-model (mirrors sam3-cpp; no reliable GGUF auto-detect signal, so pref-only per the doc's default). changed-backends.js needs no entry: the generic golang suffix branch already resolves backend/go/depth-anything-cpp/. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): auto-detect importer for depth-anything GGUFs Replace the preference-only entry with a real auto-detect importer (mirrors parakeet-cpp / locate-anything): - DepthAnythingImporter matches a .gguf whose name carries a depth-anything token (depth-anything-<size>-<quant>.gguf), so /import-model recognises mudler/depth-anything.cpp-gguf repos and direct GGUF URLs without an explicit backend preference. preferences.backend= "depth-anything" still forces it. - Registered before LlamaCPPImporter so its GGUF bundles aren't claimed by the generic .gguf importer; the narrow name match means it cannot claim arbitrary llama GGUFs or the upstream safetensors PyTorch repos. - Multi-quant repos pick the smallest quant by default (q4_k -> ... -> f32, depth stays >0.998 corr even at q4_k); quantizations preference overrides. 
- Drops the now-redundant knownPrefOnlyBackends entry (importer-backed backends are not listed there, matching parakeet-cpp). - Table-driven Ginkgo test covers detection, negative cases (llama GGUF, upstream safetensors), default/override/fallback quant pick, and direct URL import. 10/10 specs pass. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): check conn.Close error in grpc Depth client (errcheck) The new Depth() client method used a bare `defer conn.Close()`. golangci-lint runs with new-from-merge-base, so although the 39 sibling methods use the same bare form (grandfathered), the newly added line trips errcheck. Drop the result explicitly to satisfy the linter. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): bump depth-anything.cpp to v0.1.1 (embeddable CMake) v0.1.0 (b515c31) used ${CMAKE_SOURCE_DIR} for its include dirs, which points at the parent project when built via add_subdirectory() as this backend does, so the container build failed with missing stb_image.h / da_gguf_keys.h. v0.1.1 (2d42897) switches to project-relative paths. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): resolve gosec findings in the backend wrapper The code-scanning gate flagged three new failure-level alerts in godepthanythingcpp.go (gosec runs with -no-fail; GitHub gates on new alerts): - G301: export dirs were created with 0o755. Tighten to 0o750 (no world access needed for backend-written export output). - G304: writeDepthPNG creates req.GetDst(). That path is chosen by the LocalAI core as the intended output destination (same pattern every image backend uses), not attacker input, so annotate with #nosec G304 and document why. 
The remaining G103 "audit unsafe" notes on the unsafe.Slice C-buffer copies are warning-level (the same purego interop whisper/parakeet use) and do not gate the check, per the supertonic exclusion precedent in secscan.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): bump depth-anything.cpp to v0.1.2 (CUDA cross-build arch) v0.1.1 forced CMAKE_CUDA_ARCHITECTURES=native, which breaks the GPU-less l4t/cublas CI builds (nvcc "Unsupported gpu architecture 'compute_'" on CMake 3.22). v0.1.2 (442eea4) drops the override and lets ggml pick its default cross-build arch list. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 16:28:28 +02:00
LocalAI [bot]	1ab61a0875	feat: generic chat_template_kwargs (model config + per-request metadata) (#10359 ) * feat(config): add chat_template_kwargs model field + resolver Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier, plus ResolveChatTemplateKwargs which layers the config map under coerced request metadata. Foundation for generic jinja chat-template kwargs (issue #10329). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(backend): forward resolved chat_template_kwargs blob to backends gRPCPredictOpts now merges per-request client metadata over the server-derived enable_thinking/reasoning_effort (reaching all backends via the standalone keys) and serialises the resolved chat_template_kwargs map into a JSON blob for llama.cpp, written last so a client cannot clobber it. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(http): wire request metadata to config.RequestMetadata The OpenAI request metadata field was parsed but unused; stamp it onto the per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs overrides. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): generic chat_template_kwargs merge (drop per-key blocks) Replace the per-key enable_thinking/reasoning_effort handling in both the streaming and non-streaming chat paths with a single block that parses the chat_template_kwargs JSON blob resolved by the Go layer and merges every key into body_json. New jinja template levers (e.g. preserve_thinking) now need no C++ change. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: document custom chat_template_kwargs (model + per-request) Issue #10329. 
Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(backend): pin reasoning_effort as a string in the chat_template_kwargs blob Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(http): e2e guard pinning chat_template_kwargs forwarded to gRPC Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the received PredictOptions.Metadata, and an app_test.go spec that drives a real /v1/chat/completions request (model chat_template_kwargs + per-request metadata override) and asserts the exact metadata + chat_template_kwargs blob the REST layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(config): grandfather chat_template_kwargs in registry coverage chat_template_kwargs is a free-form map[string]any (like engine_args, already on the list), not a scalar the config UI registry can surface, so it is exempt from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries failure introduced by the new field. Issue #10329. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 12:16:34 +02:00
LocalAI [bot]	f44034021e	chore: ⬆️ Update leejet/stable-diffusion.cpp to `5a34bc7f6e0621dd2f899daa64476eac667d7ed3` (#10335) * ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(stablediffusion-ggml): adapt gosd.cpp to upstream sd_ctx_params_t API The bump to 5a34bc7 restructured sd_ctx_params_t: the boolean CPU-offload knobs (offload_params_to_cpu, keep_clip_on_cpu, keep_vae_on_cpu, keep_control_net_on_cpu) were replaced by backend assignment specs (backend/params_backend), and vae_decode_only / free_params_immediately were dropped entirely. The build broke with "no member named ..." on every arch. Translate the legacy options we still accept from gallery configs into the new backend assignment specs, mirroring prepare_backend_assignments() in the upstream CLI, so offload_params_to_cpu / keep_*_on_cpu keep working. vae_decode_only is parsed and ignored for config compatibility. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] feat(stablediffusion-ggml): expose backend/params placement options The upstream bump introduced new sd_ctx_params_t fields for device and memory placement (backend, params_backend, rpc_servers, max_vram, stream_layers) plus PuLID-Flux weights (pulid_weights_path). Wire them up as backend options so models can be split across CPU/GPU/disk/RPC: - backend: per-component compute placement (e.g. clip=cpu,vae=cuda0) - params_backend: per-component weight storage incl. disk mmap - max_vram / stream_layers: graph-cut segmented parameter offload budget - rpc_servers: offload compute to remote RPC servers - pulid_weights_path: PuLID-Flux identity injection The legacy keep_*_on_cpu / offload_params_to_cpu booleans now seed and compose with the explicit backend/params_backend specs, matching upstream prepare_backend_assignments(). 
Option values are taken as everything after the first ':' so colon-bearing values (rpc_servers host:port) survive parsing. Documented the new options in the image-generation guide. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] feat(stablediffusion-ggml): distributed RPC across ggml workers Enable the ggml RPC backend (-DSD_RPC=ON) so image generation can be sharded across remote rpc-server workers. The ggml rpc-server is backend-agnostic, so this reuses the exact same worker pool as the llama.cpp backend - one set of `local-ai worker llama-cpp-rpc` / `p2p-llama-cpp-rpc` workers accelerates both text and image generation. RPC servers are selected by precedence: - the explicit `rpc_servers` option, else - the LLAMACPP_GRPC_SERVERS env var, which LocalAI's p2p worker mode populates automatically with discovered workers (the backend inherits it from the parent process env), so distributed image generation needs no per-model configuration. Documented manual and p2p setup in the image-generation guide. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 12:15:45 +02:00
LocalAI [bot]	6b9f1bd4b3	chore: ⬆️ Update antirez/ds4 to `e34a8086693ba7ca5cfabd2b9028ee52f0bfac2e` (#10350 ) * ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(ds4): add Homebrew include/lib prefix for Darwin grpc-proto build The darwin/metal ds4 backend job runs for the first time on this bump (it was skipped on prior ds4 PRs) and fails compiling backend.pb.cc with 'google/protobuf/runtime_version.h' file not found. hw_grpc_proto links neither protobuf::libprotobuf nor gRPC::grpc++, so the generated proto sources rely on default system include paths. That works on Linux (/usr/include) but not on macOS, where Homebrew installs under /opt/homebrew. Add the Homebrew prefix to include/link dirs on Darwin, mirroring the llama-cpp backend that already builds on Darwin CI. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ds4): install nlohmann-json on Darwin CI for ds4 backend After the protobuf include-path fix the ds4 darwin build advances to compiling dsml_renderer.cpp, which includes <nlohmann/json.hpp> and #errors when absent. On Linux the header comes from apt nlohmann-json3-dev in the build image; the macOS runner had no equivalent. Add the header-only nlohmann-json formula to the shared Darwin backend brew install/link list and Homebrew cache, alongside the existing deps. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ds4): build proper OCI image tar for Darwin backend The darwin packaging referenced scripts/build/oci-pack.sh, which was never added to the tree, so it fell back to a plain 'tar' that omits manifest.json. 'local-ai backends install' then rejects the tarball with 'file manifest.json not found in tar'. 
Use './local-ai util create-oci-image' (already built by the 'build' prerequisite of the backends/ds4-darwin target), mirroring llama-cpp-darwin.sh, to emit a real OCI image the installer accepts. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 09:59:50 +02:00
LocalAI [bot]	3d295adfa8	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `2f524850a1f67716bc0ba80ffa30ce39c5b8bd5f` (#10336) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-06-16 09:04:35 +02:00
LocalAI [bot]	4fa2064875	chore: ⬆️ Update ggml-org/llama.cpp to `7dad2f1a17d65b5e2034c277125bc9f97573a779` (#10337) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-16 08:22:26 +02:00
LocalAI [bot]	cb74399b3a	chore: ⬆️ Update ggml-org/whisper.cpp to `0ec0845110dc934911dc48e8c5beb5ad3189b3f3` (#10349) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-16 08:22:10 +02:00
dependabot[bot]	2388686369	chore(deps): bump grpcio from 1.81.0 to 1.81.1 in /backend/python/vllm (#10347) Bumps [grpcio](https://github.com/grpc/grpc) from 1.81.0 to 1.81.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.81.0...v1.81.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.81.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-15 22:57:38 +02:00
LocalAI [bot]	2df2876db2	feat(supertonic): add Supertonic ONNX TTS backend (CPU) (#10342 ) * feat(supertonic): vendor upstream Go TTS pipeline (helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add gRPC backend (Load/TTS/TTSStream, CPU) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): satisfy unused linter (use onnxProvider; exclude vendored helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(supertonic): unit tests for resolvers + gated end-to-end synthesis Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(supertonic): gofmt backend.go comment block Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add Makefile, run.sh, package.sh (CPU build) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(supertonic): wire backend into root Makefile Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): check ort.DestroyEnvironment return (errcheck) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): resolve voice_styles as sibling of onnx dir; guard trim; test voice Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add CPU build matrix + gallery index entries Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): expose as pref-only importable backend Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add 
Supertonic/supertonic-3 TTS model to the gallery 16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles) from HF Supertone/supertonic-3, served via the supertonic backend. Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the backend's resolveVoicesDir. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(meta): register pipeline.max_history_items config field Pre-existing on master: the field was added without a registry entry, failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the entry so it renders properly in the model-config UI. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(secscan): exclude vendored supertonic backend from gosec helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104 findings are inherent to upstream and the math/rand use is correct for flow-matching noise (crypto/rand would be wrong). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-15 16:54:11 +02:00
LocalAI [bot]	f648f07b13	chore: ⬆️ Update ggml-org/llama.cpp to `4988f6e866057afd130c1515ecef0c9bab9a15f8` (#10280) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-14 21:53:25 +02:00
LocalAI [bot]	61cde6fd77	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `5f917a64b391b7d31839845153a473a65f630458` (#10240) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-14 16:46:49 +02:00
LocalAI [bot]	692970e507	chore: ⬆️ Update leejet/stable-diffusion.cpp to `276025e054555166ec419413c6748ca79986ee93` (#10313) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-14 16:44:05 +02:00
LocalAI [bot]	36e3419203	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.23.0` (#10314) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-13 23:39:10 +02:00
LocalAI [bot]	4bb592cf91	feat(qwen3-tts-cpp): migrate to ServeurpersoCom/qwentts.cpp (streaming, speakers, voice design) (#10316 ) * feat(qwen3-tts-cpp): repoint upstream to ServeurpersoCom/qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): flatten qt_* ABI into qt3_* purego shim Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): build shim against upstream qwen-core static lib Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add option/language/voice/sampling parsing Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add 24kHz WAV encode/decode/stream-header helpers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): purego backend with streaming, speakers, voice design Map TTSRequest onto qwentts.cpp: instructions->instruct, voice->named speaker or clone-reference path, params map->ref_text + sampling. Add TTSStream over the qt chunk callback. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): unit specs + build-gated TTS/TTSStream e2e Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(qwen3-tts-cpp): close defensive PCM-free gap on zero-sample result Register CppPCMFree before the n<=0 guard so a non-null buffer with zero samples cannot leak (the C contract returns NULL on failure, so this is defensive). Raised in code review. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): advertise TTSStream capability Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): update backend index metadata for qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): qwentts.cpp models - base/customvoice/voicedesign, Q8_0 & Q4_K_M Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(qwen3-tts-cpp): release note for qwentts.cpp migration Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): cover audio_path voice-cloning fallback Add resolveRequest unit specs (config audio_path used as the clone reference when Voice is empty; per-request audio Voice overrides it; a named-speaker Voice does not trigger cloning) plus a real-inference e2e that clones from audio_path (confirmed ref_spk_emb=yes in the pipeline). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): drop the release-note doc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 23:09:59 +02:00
LocalAI [bot]	0854932a25	feat(omnivoice-cpp): add OmniVoice TTS backend (file + streaming, voice cloning + voice design) (#10310 ) * feat(omnivoice-cpp): add C wrapper + CMake/Makefile build over OmniVoice ov_* ABI Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): add option/language parsing + WAV framing helpers with tests Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): wire purego binding with TTS + streaming TTSStream Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(omnivoice-cpp): wire backend into root Makefile Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(omnivoice-cpp): add build matrix entries + dep-bump registration Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): register backend meta + image entries Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): expose as preference-only importable backend Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add omnivoice-cpp TTS models (Q8_0 default + BF16 HQ) Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(omnivoice-cpp): document the OmniVoice TTS backend Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(omnivoice-cpp): add env-gated e2e for TTS + streaming Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): honor tts.audio_path/tts.voice config as default cloning reference The model config tts.audio_path 
(ModelOptions.AudioPath) and tts.voice now provide a default voice-cloning reference used when a request omits Voice, so a cloned voice can be pinned in the model YAML instead of passed per request. A per-request voice still overrides. Paths resolve relative to the model dir. Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(omnivoice-cpp): add missing omnivoice-cpp-development backend meta Mirrors the whisper/vibevoice convention: a -development meta aggregating the master-tagged image variants (the production meta and per-variant prod+dev image entries already existed; only the development meta aggregator was missing). Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 21:28:46 +02:00
LocalAI [bot]	203410871b	feat(sherpa-onnx): add Kokoro TTS + multilingual Piper voices (#10309 ) Wire the Kokoro model family into the sherpa-onnx backend (which only supported VITS/Piper before) and add gallery voices for Italian, English, Spanish, French and German plus a multilingual Kokoro model. - csrc/shim.{c,h}: kokoro_* config setters (model/voices/tokens/data_dir/ dict_dir/lexicon/lang/length_scale) mirroring the VITS path, with the matching frees in tts_config_free. - backend.go: loadTTS now detects a Kokoro model (a voices.bin beside the ONNX) and routes to configureKokoroTTS, otherwise configureVitsTTS. Kokoro picks up espeak-ng-data, the jieba dict and the per-language lexicons (only one English variant, to avoid tens of thousands of duplicate-word warnings at load); the language= option hints the lang. - backend_test.go: functional test for isKokoroModel detection. - gallery: 5 Piper VITS voices (it_IT-paola, en_US-amy, es_ES-davefx, fr_FR-siwis, de_DE-thorsten) + kokoro-multi-lang-v1.0, served through sherpa-onnx-tts.yaml with native streaming TTS. Verified by building the backend and synthesizing with a real Piper and Kokoro model (31/31 specs pass, including real-model synth smokes). Assisted-by: Claude:claude-opus-4-8 gofmt golangci-lint go-test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 21:27:27 +02:00
LocalAI [bot]	0eca930b8d	fix(gallery): correct meta-backend definitions for platform auto-selection (#10299 ) fix(gallery): correct meta-backend definitions in backend/index.yaml Backends that ship per-platform images must be meta backends (a capabilities map and NO uri) so the right variant is auto-selected per platform - mirroring llama-cpp/whisper. Several entries were misdefined; fixed here: - Concrete base + metal sibling (could not select the Apple Silicon variant): silero-vad, piper, kitten-tts, local-store (+ their -development). Converted each anchor to a meta and added the cpu-<name> concrete. - mlx family (mlx, mlx-vlm, mlx-audio, mlx-distributed + -development): anchor had both a uri AND a capabilities map, so IsMeta() was false and the map was ignored (always resolved to the metal-darwin image); the metal-<name> target did not exist. Removed the uri and added the missing metal-<name> concretes. - Dangling capability targets: diffusers/kokoro nvidia-l4t-cuda-12 repointed to the existing nvidia-l4t-<name> concrete; coqui nvidia-cuda-13 key removed (no cuda13-coqui image). - locate-anything: the meta existed but its concrete entries were never added, so it was un-installable on every platform. Added the full concrete set plus the locate-anything-development meta, mirroring rfdetr-cpp. Image tags grounded against the published quay.io tags. - trl (cuda12/13): repointed the stale 'cublas-cuda12/13-trl' image tags to the actually-published 'gpu-nvidia-cuda-12/13-trl' tags (fixes #9236). Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 10:43:14 +02:00
LocalAI [bot]	0413fc03f8	fix(gallery): make opus a meta backend for platform auto-selection (#9813 ) (#10291 ) fix(gallery): make opus a meta backend so the platform variant is auto-selected (#9813) The realtime/WebRTC path loads the "opus" codec backend by name, but on macOS arm64 only "metal-opus" is installable, so Load("opus") failed with "opus backend not available". The root cause: unlike llama-cpp and whisper, the opus entry was a concrete CPU backend (it carried a uri and no capabilities map) rather than a meta backend, so nothing mapped "opus" to the platform-appropriate variant. Restructure opus to mirror llama-cpp/whisper: "opus" becomes a meta backend with a capabilities map (default -> cpu-opus, metal -> metal-opus) and no uri; the CPU image moves to a new "cpu-opus" concrete (and its dev variant to "cpu-opus-development"). Installing "opus" now resolves to metal-opus on Apple Silicon and cpu-opus elsewhere, and Load("opus") works on every platform via the meta pointer - so the realtime endpoint needs no special casing. This reverts the realtime_webrtc.go resolution helper from the earlier approach in favor of the gallery-level fix. Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 09:51:02 +02:00
LocalAI [bot]	7088572f75	fix(neutts): pin torchaudio to match torch (fixes undefined symbol) (#9798 ) (#10292 ) fix(neutts): pin torchaudio to match torch to avoid ABI mismatch (#9798) neucodec pulls torchaudio transitively but it was unpinned, so an incompatible torchaudio could be resolved against the pinned torch==2.8.0, producing the 'undefined symbol: torch_library_impl' load failure. Pin torchaudio==2.8.0 alongside torch in the cpu and cublas12 requirements. Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 09:28:41 +02:00
LocalAI [bot]	d28a5b6da1	chore: ⬆️ Update mudler/locate-anything.cpp to `92c1682da792c1e8a5dec91acc2be4b02c742ded` (#10282 ) ⬆️ Update mudler/locate-anything.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-13 09:01:17 +02:00
LocalAI [bot]	cf71e291b4	fix(darwin): fix vibevoice-cpp build linkage + fail-safe go backend packaging (#10276 ) * fix(darwin): never package a go backend build tree as a working image The darwin/arm64 vibevoice-cpp image shipped the source tree with a half-built CMake directory (build-libgovibevoicecpp-fallback.so/) and no backend binary, so the backend could never start: run.sh exec'd a vibevoice-cpp binary that was not in the package and LocalAI timed out waiting for the gRPC service. Two durable, backend-agnostic defenses: - backend/go/vibevoice-cpp/Makefile: mirror whisper's cleanup discipline so a partial CMake tree cannot survive into packaging. Run `make purge` before each variant build and `rm -rfv build` after. The old recipe only removed its build dir after a successful `mv`, so a failed build left the half-built tree behind. - scripts/build/golang-darwin.sh: before creating the OCI image, remove any stray build- directory and assert that the binary run.sh launches actually exists. A build that produced no binary now fails the job loudly instead of publishing a source tree as a working backend. The binary name is derived from run.sh's `exec $CURDIR/<binary>` line (parakeet-cpp launches parakeet-cpp-grpc, so it is not always ${BACKEND}) with a ${BACKEND} fallback. The underlying native build failure that left vibevoice-cpp half-built still needs to be reproduced and fixed on Apple Silicon; this change ensures such a failure can never again be published as a working image. Refs #10267 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(vibevoice-cpp): build libvibevoice.a on darwin (link target, not path) The darwin build failed with: No rule to make target 'vibevoice/libvibevoice.a', needed by 'libgovibevoicecpp.so'. Stop. The upstream vibevoice project is added with add_subdirectory(... EXCLUDE_FROM_ALL), so its `vibevoice` static-library target is only built when something links it as a target. 
The Apple branch linked only `$<TARGET_FILE:vibevoice>` - a bare archive path with no target reference - so CMake never emitted a rule to build libvibevoice.a, while the Linux branch worked because it passes the `vibevoice` target name inside the --whole-archive flags. Link the `vibevoice` target on Apple (establishing the build dependency) and apply -force_load as a separate link option to keep whole-archive semantics so purego can dlsym the vv_capi_* symbols. Refs #10267 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:13:50 +02:00
LocalAI [bot]	a7a7bd646b	fix(mlx): route vision-language models to the mlx-vlm backend (#10274 ) Vision-language checkpoints such as mlx-community/gemma-4-E4B-it-qat-4bit declare the "image-text-to-text" pipeline tag on HuggingFace. The mlx importer hardcoded backend "mlx" for every mlx-community model, so these VLMs were served by the text-only mlx-lm backend whose tokenizer does not carry the processor chat template. The template was never applied and the model produced degenerate, looping output that echoed the prompt. Detect the "image-text-to-text" pipeline tag in the importer and route those models to mlx-vlm, which applies the processor-aware chat template. An explicit backend preference still wins. As a defensive backstop, the mlx backend now warns loudly when the loaded model has no chat template, so a misrouted VLM surfaces the problem instead of silently looping. Fixes #10269 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:12:42 +02:00
LocalAI [bot]	722bdb87e9	chore: ⬆️ Update mudler/parakeet.cpp to `b8012f11e5269126eddb7f4fd02f891a2ccc29b0` (#10281 ) * ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(parakeet-cpp): close streaming segments on <EOB> after ABI v5 eou/eob split parakeet.cpp ABI v5 (the pin this PR bumps to) splits the streaming JSON "eou" flag: in v4 "eou":1 fired for either <EOU> (end of utterance) or <EOB> (backchannel); in v5 "eou" means <EOU> only, with a new separate "eob" field for the backchannel token. The streamSegmenter closed a segment on "eou" alone, so after the bump a backchannel token would silently stop ending a segment and merge into the next utterance. Read the new "eob" field and flush on either signal to preserve the v4 segmentation boundaries. The flat stream_feed eou_out path is unaffected: its mask is still non-zero for either event. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:12:04 +02:00
LocalAI [bot]	50dea8c983	feat(crispasr): bundle espeak-ng and add piper TTS voices to the gallery (#10283 ) CrispASR's piper backend phonemizes non-English text via espeak-ng (dlopen, the MIT-clean path; English uses a built-in G2P). The FROM scratch crispasr image shipped none of it, so non-English piper voices loaded but failed synthesis with "phonemization failed". Bundle the espeak-ng runtime so they work: - Dockerfile.golang: install espeak-ng-data + libespeak-ng1 and its libpcaudio0 / libsonic0 deps in the crispasr builder (espeak's dlopen fails without the latter two). - package.sh: copy libespeak-ng.so.1, libpcaudio.so.0, libsonic.so.0 into package/lib/ and the espeak-ng-data dir into the package root. - run.sh: export CRISPASR_ESPEAK_DATA_PATH so the bundled data is found. Add 9 single-speaker piper voices (de/en/it, incl. Italian paola + riccardo) to the gallery, run through backend:piper, hosted at LocalAI-Community/piper-voices-GGUF (converted from rhasspy/piper-voices with CrispASR's convert-piper-to-gguf.py). Only single-speaker low/medium voices are included; the engine does not yet support multi-speaker or high-quality piper decoders. All 9 verified end-to-end: each synthesizes a WAV at the model's native sample rate using only the image-bundled espeak payload. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:10:30 +02:00
LocalAI [bot]	46ba70632b	fix(crispasr): write piper TTS WAV at the model's native sample rate (#10277 ) CrispASR's piper backend returns PCM at the voice's native rate (from the GGUF piper.sample_rate key: 16 kHz for x_low/low, 22.05 kHz for medium/high) and does not resample, but the Go WAV encoder hardcoded 24000 Hz. Every piper voice was therefore written with a wrong header and played back at the wrong pitch/speed. Read piper.sample_rate from the model's GGUF metadata at Load via the vendored gguf-parser-go and use it for the WAV header, falling back to the 24 kHz default for the other CrispASR TTS engines (vibevoice/orpheus/chatterbox/qwen3-tts) that emit 24 kHz and carry no such key. Adds unit specs (minimal crafted GGUFs + WAV-header decode) and an env-gated end-to-end spec (CRISPASR_PIPER_MODEL_PATH). Verified e2e: en_GB-cori-medium synthesizes a 22050 Hz WAV through backend:piper. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:10:17 +02:00
LocalAI [bot]	60facc7252	fix(darwin): publish sherpa-onnx and speaker-recognition images for darwin/arm64 (#10275 ) Neither the sherpa-onnx nor the speaker-recognition backend had a darwin/arm64 image, so `local-ai backends install` failed with "no child with platform darwin/arm64" on macOS. This left /v1/audio/diarization (the sherpa-onnx path) and /v1/voice/embed without any usable backend on Apple Silicon. Both backends build on darwin/arm64: - sherpa-onnx (Go) already fetches the onnxruntime osx-arm64 runtime in its Makefile; it only needed a darwin matrix entry (build-type metal, lang go, like whisper and silero-vad). - speaker-recognition (Python) needed a requirements-mps.txt so the mps build installs plain onnxruntime (which ships a macOS arm64 wheel) instead of the onnxruntime-gpu pulled by its base requirements (which does not). Add both to the includeDarwin build matrix, wire the metal capability and metal image aliases into the gallery, and add the speaker-recognition requirements-mps.txt. Fixes #10268 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 22:32:42 +02:00
LocalAI [bot]	8c8204d3c4	feat(parakeet-cpp): enable GGML_CUDA_GRAPHS in the cublas build (#10273 ) ggml leaves GGML_CUDA_GRAPHS off by default. Passing -DGGML_CUDA_GRAPHS=ON for cublas builds lets the CUDA backend capture and replay the compute graph for a small free speedup (about 1% measured on a GB10, never negative). It is not gated by parakeet.cpp's CMake options, so it passes straight through to ggml. Assisted-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 18:47:36 +02:00
Richard Palethorpe	085fc53bbc	fix(router): production-ready request router + auto-size batch for embedding/rerank (#10104 ) * fix(router): score classifier production-readiness Conversation trimming runs through the classifier model's chat template and trims by exact token count, sized to the model's n_batch which is now scaled to context so long probes can't crash the backend. Missing chat_message templates are a hard error at router build time. Router- facing factories (Embedder/Scorer/Reranker/TokenCounter) re-resolve ModelConfig per call so a model installed post-startup doesn't bind a stub Backend="" config and silently fall into the loader's auto- iterate path. New 'vector_store' backend trace recorded inside localVectorStore on every Search/Insert — including the backend-load-failure path that previously vanished into an xlog.Warn — with outcome tagging (hit/miss/empty_store/backend_load_error/find_error/insert_error/ok). Companion cleanup drops misleading similarity:0 and input_tokens_count:0 from non-hit and text-mode traces. Gallery local-store-development aliases to 'local-store' so the master image satisfies pkg/model.LocalStoreBackend lookups from the embedding cache. Misc: llama-cpp TokenizeString reads the correct 'prompt' JSON key (the original bug); ModelTokenize nil-guard; non-fatal mitm proxy startup; PII 'route_local' renamed to 'allow' with docs/UI in sync; model-editor footer no longer eats the edit area on small screens; several config-editor template/dropdown/section fixes. Tests: e2e router specs (casual/code-hint + long-conversation trim), vector_store trace specs, lazy-factory specs, gallery dev-alias resolution, Playwright trace badge + scroll regression. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(backend): auto-size batch to context for embedding and rerank models Embedding and rerank models pool over the whole input in a single physical batch (n_ubatch). 
With batch left at the 512 default, the backend rejects longer inputs with "input is too large to process", silently capping a large-context embedder (e.g. 8k/32k) at 512 tokens. Size n_batch to the context for these single-pass usecases, mirroring the existing FLAG_SCORE behaviour; an explicit batch: still wins. Extracts EffectiveContextSize/EffectiveBatchSize from grpcModelOpts so the effective decode window has one home for other callers to reuse. Adds an e2e-aio regression test that embeds a >512-token input. The AIO embedding model is switched to nomic-embed-text-v1.5 (2048 context) because the previous granite model was capped at 512 tokens and could not exercise the larger batch. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(gallery): raise arch-router scoring output cap via parallel:64 Scoring decodes the whole prompt+candidate in a single llama_decode and reads one logit row per candidate token. The vendored llama.cpp server caps causal output rows at n_parallel, so the default of 1 aborts with GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) on multi-token route labels. Set options: [parallel:64] on both arch-router quant entries to lift the cap; kv_unified (the grpc-server default) keeps the full context per sequence, so this does not split the KV cache. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-12 16:21:15 +02:00
LocalAI [bot]	56cc4f63fc	feat(backend): locate-anything-cpp (open-vocabulary object detection via ggml) (#10264 ) * feat(backend): add locate-anything-cpp backend (open-vocab detection via la_capi) A Go/purego backend wrapping locate-anything.cpp's la_capi C ABI, implementing the gRPC Detect RPC: image + open-vocabulary text prompt -> labeled boxes. Mirrors backend/go/rfdetr-cpp; static-links ggml into a per-CPU-variant .so. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend): register locate-anything-cpp in build matrix Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): locate-anything gallery entry + model importer Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(backend): locate-anything-cpp Load+Detect wire test Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add locate-anything-3b model to the gallery index Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend): register locate-anything.cpp in bump_deps auto-bump Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io> * ci(test): e2e smoke for locate-anything-cpp in test-extra (loads the 3B + image, runs Detect) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: mudler <mudler@localai.io> Co-authored-by: mudler <mudler@localai.io>	2026-06-12 14:59:07 +02:00
LocalAI [bot]	a53f34e78f	chore: ⬆️ Update ggml-org/llama.cpp to `4c6595503fe45d5a39f88d194e270f64c7424677` (#10261 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-12 14:57:52 +02:00
LocalAI [bot]	006a9d38c7	chore: ⬆️ Update mudler/parakeet.cpp to `9db92be63179a27201d3b88d5d40c545b2ac48ae` (#10263 ) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-12 09:18:21 +02:00
LocalAI [bot]	892ce951ce	chore: ⬆️ Update antirez/ds4 to `d881f2a05e8ff6bec001315a36b794b4aa310173` (#10262 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-12 09:18:07 +02:00
LocalAI [bot]	9a88eb81e7	chore: ⬆️ Update CrispStrobe/CrispASR to `d745bda4386ae0f9d1d2f23fff8ec95d76428221` (#10260 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-12 09:17:34 +02:00
pos-ei-don	58cdc050e9	fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages (#10257 ) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-11 23:57:00 +02:00
pos-ei-don	b962f4a192	fix(vllm): parse tool_call function arguments before applying the chat template (#10256 ) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-11 23:55:38 +02:00
LocalAI [bot]	b6fcb3e1db	chore: ⬆️ Update CrispStrobe/CrispASR to `4b27392ffd0991a857594652cbb8b57e585bcd7b` (#10241 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-11 18:33:58 +02:00
LocalAI [bot]	ff09683d84	chore: ⬆️ Update ggml-org/llama.cpp to `ac4cddeb0dbd778f650bf568f6f08344a06abe3a` (#10239 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-11 18:33:38 +02:00
pos-ei-don	228a6dfe79	fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved to vllm.tokenizers) (#10252 ) fix(vllm): restore compatibility with vLLM >= 0.22 (get_tokenizer moved) vLLM 0.22 moved get_tokenizer from vllm.transformers_utils.tokenizer to vllm.tokenizers. Since the backend requirements install vllm unpinned, freshly built/installed vllm backends currently fail to start with ModuleNotFoundError: No module named 'vllm.transformers_utils.tokenizer' (surfacing as 'grpc service not ready' when loading a model). Use the same try/except version-compat import pattern already used elsewhere in this file: try the new vllm.tokenizers location first and fall back to the pre-0.22 path. Tested on a DGX Spark (GB10, ARM64) with the cuda13-nvidia-l4t-arm64-vllm backend and vllm 0.22.0: model load, chat completions and tool calls all work with this patch applied. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 09:05:23 +02:00
LocalAI [bot]	51a92b6093	chore: ⬆️ Update antirez/ds4 to `8384adf0f9fa0f3bb342dd925372de778b95b263` (#10242 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-11 00:10:34 +02:00
LocalAI [bot]	6b2badb837	chore: ⬆️ Update CrispStrobe/CrispASR to `c29f6653a516a3001d923944dad8892072cc7334` (#10236 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-10 16:16:24 +02:00
LocalAI [bot]	8b8506d01a	chore: ⬆️ Update ggml-org/llama.cpp to `039e20a2db9e87b2477c76cc04905f3e1acad77f` (#10223 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-10 12:22:03 +02:00
LocalAI [bot]	6910a0bb48	chore: ⬆️ Update antirez/ds4 to `91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7` (#10234 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-10 12:08:19 +02:00
LocalAI [bot]	cffd03b522	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `e6f8112f3ba126eed3ff5b30cdd08085414a7516` (#10233 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-10 12:07:49 +02:00
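
The two gallery fixes above (#10291 and #10299) both rely on the same rule: a backend is "meta" only when it has a capabilities map and no uri, and a meta backend resolves to a concrete variant per platform. A minimal sketch of that rule in Go is below. The `BackendEntry` type, its field names, and the `Resolve` helper are hypothetical and written only for illustration; they are not LocalAI's actual gallery types, and the fallback to a `default` key is an assumption based on the opus commit message (default -> cpu-opus, metal -> metal-opus).

```go
package main

import (
	"errors"
	"fmt"
)

// BackendEntry is a hypothetical, simplified stand-in for a gallery entry.
type BackendEntry struct {
	Name         string
	URI          string            // set only on concrete backends
	Capabilities map[string]string // capability -> concrete backend name
}

// IsMeta reports whether the entry is a meta backend:
// it must carry a capabilities map and no URI.
func (b BackendEntry) IsMeta() bool {
	return b.URI == "" && len(b.Capabilities) > 0
}

// Resolve returns the concrete backend name for a detected capability.
// Concrete entries resolve to themselves; meta entries look up the
// capability and fall back to the "default" key (an assumption here).
func (b BackendEntry) Resolve(capability string) (string, error) {
	if !b.IsMeta() {
		return b.Name, nil
	}
	if target, ok := b.Capabilities[capability]; ok {
		return target, nil
	}
	if target, ok := b.Capabilities["default"]; ok {
		return target, nil
	}
	return "", errors.New("no variant for capability " + capability)
}

func main() {
	opus := BackendEntry{
		Name: "opus",
		Capabilities: map[string]string{
			"default": "cpu-opus",
			"metal":   "metal-opus",
		},
	}
	for _, capability := range []string{"metal", "nvidia"} {
		name, err := opus.Resolve(capability)
		fmt.Println(capability, name, err)
	}
}
```

Under this sketch, an entry that carries both a uri and a capabilities map is not meta, so its map is never consulted. That matches the mlx misdefinition described in #10299, where the map was ignored until the uri was removed.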


1466 Commits