LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-01 20:53:15 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	aee4611ab2	chore: ⬆️ Update mudler/parakeet.cpp to `30a307553f1965ceb38a1a922069a71e7dd67bf3` (#10092) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> v4.3.6	2026-05-30 22:48:09 +02:00
LocalAI [bot]	486467623c	chore: ⬆️ Update antirez/ds4 to `e16ead1e29c81a67bbb64e5b001117679cf9ce6e` (#10076) * ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(ds4): link new ds4_distributed.o into grpc-server build Upstream ds4 e16ead1e split distributed inference into a new translation unit (ds4_distributed.c/.h). ds4.c and ds4_cpu.o now reference its ds4_dist_* symbols, so the grpc-server link fails with undefined references unless that object is built and linked. Add ds4_distributed.o to both the upstream object build (Makefile) and the grpc-server link set (CMakeLists.txt) for every GPU mode. It is a single GPU-agnostic object, so it is built/linked unconditionally. Verified: the six undefined ds4_dist_session_* references in ds4_cpu.o are all defined by the newly built ds4_distributed.o (nm cross-check). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-30 22:08:30 +02:00
LocalAI [bot]	4912c9b73a	feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084) * feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text) Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends). L0 scope: - main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server. - goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy. - AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2. - Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper. - Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures. Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern. Word/segment timestamps (L1) and cache-aware streaming (L2) follow. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult: - Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader. - Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs.
- Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens. - Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default). secondsToNanos mirrors the whisper backend's nanosecond convention. Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend. Cache-aware streaming (L2) follows. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation Wire AudioTranscriptionStream to the streaming RNN-T C-API: - Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes eou_out on <EOU>/<EOB>. - Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta. - <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired). - stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream. Honours context cancellation between chunks. Frees every malloc'd delta and the session. Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity) Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1. - .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and --parakeet-cpp tag-suffixes. - scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch. - .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master). - backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix. - Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp-transcription target (mirrors test-extra-backend-whisper-transcription). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown). - Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp".
It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here). - Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer. - Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description. Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(parakeet-cpp): document the parakeet-cpp transcription backend Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job: - Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry). - Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(parakeet-cpp): drop em dashes from comments and docs Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add parakeet-cpp f16 models to the model gallery Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION - goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged). - Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): init the ggml submodule in the backend image clone The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend. Verified locally: clone + submodule + cmake configure now succeeds. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): statically link ggml into libparakeet.so The shared libparakeet.so linked ggml's shared libs (libggml.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image. Verified locally: ldd shows no libggml dependency. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model. Verified locally: the streaming spec passes against the non-streaming 110m model via fallback. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-30 14:46:10 +02:00
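The L1 step of #10084 parses parakeet's JSON document and synthesises one whole-clip segment from the first word's start to the last word's end. A minimal Go sketch of that shaping step follows; the struct names (pkResult, segment) and the rounding in secondsToNanos are assumptions for illustration, not the backend's actual code or LocalAI's proto types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// Hypothetical structs for the JSON shape quoted in the commit:
// {"text", "words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}.
type pkWord struct {
	W     string  `json:"w"`
	Start float64 `json:"start"` // seconds
	End   float64 `json:"end"`   // seconds
	Conf  float64 `json:"conf"`
}

type pkToken struct {
	ID   int32   `json:"id"`
	T    float64 `json:"t"`
	Conf float64 `json:"conf"`
}

type pkResult struct {
	Text   string    `json:"text"`
	Words  []pkWord  `json:"words"`
	Tokens []pkToken `json:"tokens"`
}

// segment stands in for a transcript segment; it is not LocalAI's proto type.
type segment struct {
	StartNs, EndNs int64
	Text           string
	Tokens         []int32
}

// secondsToNanos converts float seconds to integer nanoseconds, rounding to
// avoid truncation error from binary floating point.
func secondsToNanos(s float64) int64 { return int64(math.Round(s * 1e9)) }

// wholeClipSegment spans the first word's start to the last word's end and
// carries every token id, since the model emits no native segment boundaries.
func wholeClipSegment(r pkResult) segment {
	seg := segment{Text: r.Text}
	if n := len(r.Words); n > 0 {
		seg.StartNs = secondsToNanos(r.Words[0].Start)
		seg.EndNs = secondsToNanos(r.Words[n-1].End)
	}
	for _, t := range r.Tokens {
		seg.Tokens = append(seg.Tokens, t.ID)
	}
	return seg
}

// parseSegment decodes the JSON document and shapes it into one segment.
func parseSegment(doc []byte) (segment, error) {
	var r pkResult
	if err := json.Unmarshal(doc, &r); err != nil {
		return segment{}, err
	}
	return wholeClipSegment(r), nil
}

func main() {
	doc := []byte(`{"text":"ask not","words":[{"w":"ask","start":0.5,"end":0.75,"conf":0.9},{"w":"not","start":0.75,"end":1.25,"conf":0.8}],"tokens":[{"id":7,"t":0.5,"conf":0.9},{"id":42,"t":0.75,"conf":0.8}]}`)
	seg, err := parseSegment(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(seg.StartNs, seg.EndNs, seg.Tokens) // 500000000 1250000000 [7 42]
}
```

An empty words array leaves the segment bounds at zero in this sketch; how the real backend handles that case is not stated in the commit.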
Richard Palethorpe	12d1f3a697	security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087) LocalAI's outbound HTTP clients used Go's default redirect policy, which follows up to 10 redirects. On a cross-host redirect Go forwards custom request headers, including credential headers such as Anthropic's x-api-key, to the redirect target (Go strips Authorization, Cookie and WWW-Authenticate cross-host, but NOT arbitrary custom headers). An attacker able to elicit a redirect from an upstream (a hijacked or spoofed upstream, DNS trickery, or a malicious upstream_url) then harvests the operator's provider API key. This was first reported against the cloud-proxy / MITM PII path (GHSA-3mj3-57v2-4636); the same class affects every other outbound client. Rather than patch each call site, add pkg/httpclient as the one sanctioned constructor for outbound HTTP and route everything through it. pkg/httpclient: - New(...) refuses redirects, TLS 1.2 floor, no body deadline (streaming/SSE safe) - NewWithTimeout(d) simple request/response calls - WithFollowRedirects opt-in following that still strips credential headers on any cross-host hop; different scheme/host/port == different origin, guarding the curl CVE-2022-27774 port-confusion class - WithTransport(rt) keep a custom transport (IP-pin, HTTP/2, a credential-injecting RoundTripper) - HardenedTransport() base transport with the TLS floor + bounded setup - Harden(c) apply the policy to a library-supplied http.Client - NoRedirect the CheckRedirect policy; wraps ErrRedirectBlocked Lint: a forbidigo rule flags http.DefaultClient and http.Get/Post/PostForm/Head, pointing at pkg/httpclient (.golangci.yml, .agents/coding-style.md). forbidigo cannot match the &http.Client{} composite literal without also flagging legitimate http.Client type references, so that form is enforced by review. Migrates every non-test outbound call site across core/, pkg/, cmd/, and the Go backend (backend/go/cloud-proxy).
Credential-bearing and internal-RPC clients refuse redirects; download / CDN / registry clients use WithFollowRedirects so they keep working while stripping secrets cross-host. The only credential-bearing client that follows redirects is the gated-download path (pkg/downloader/uri.go), which strips the token on the cross-host hop to the CDN. Hardening this closes, in passing: - MCP remote-server bearer token leaking via a redirect (the RoundTripper re-injected Authorization on every hop) - agent multimedia/webhook clients leaking user-supplied auth headers - cors_proxy following redirects, bypassing its SSRF IP-pin - downloader's authorized read path leaking the token cross-host Fixes: GHSA-3mj3-57v2-4636 (cloud-proxy leaks operator provider API key (x-api-key) to attacker host on cross-host redirect) Reported-by: tonghuaroot Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-30 12:04:10 +02:00
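A minimal sketch of the two redirect policies described in #10087, using only net/http: a NoRedirect CheckRedirect that wraps a sentinel error, and a follow policy that drops named credential headers once a hop changes scheme, host, or port. The identifiers echo the names in the commit message but are hypothetical stand-ins, not the pkg/httpclient implementation. The sketch relies on net/http copying the original request headers onto the redirected request before CheckRedirect runs.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// ErrRedirectBlocked is a stand-in for the sentinel named in the commit.
var ErrRedirectBlocked = errors.New("redirect blocked")

// NoRedirect refuses every redirect; http.Client wraps the error in *url.Error.
func NoRedirect(req *http.Request, via []*http.Request) error {
	return fmt.Errorf("%w: refused redirect to %s", ErrRedirectBlocked, req.URL.Host)
}

// effectivePort makes "host" and "host:443" compare equal over https.
func effectivePort(u *url.URL) string {
	if p := u.Port(); p != "" {
		return p
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

// sameOrigin: a different scheme, host, or port is a different origin.
func sameOrigin(a, b *url.URL) bool {
	return strings.EqualFold(a.Scheme, b.Scheme) &&
		strings.EqualFold(a.Hostname(), b.Hostname()) &&
		effectivePort(a) == effectivePort(b)
}

// stripOnCrossOrigin follows up to 10 redirects but deletes the named
// credential headers once a hop leaves the original request's origin.
func stripOnCrossOrigin(sensitive ...string) func(*http.Request, []*http.Request) error {
	return func(req *http.Request, via []*http.Request) error {
		if len(via) >= 10 {
			return errors.New("stopped after 10 redirects")
		}
		if !sameOrigin(via[0].URL, req.URL) {
			for _, h := range sensitive {
				req.Header.Del(h)
			}
		}
		return nil
	}
}

// refusesRedirect checks that a NoRedirect client surfaces ErrRedirectBlocked.
func refusesRedirect() bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/elsewhere", http.StatusFound)
	}))
	defer srv.Close()
	_, err := (&http.Client{CheckRedirect: NoRedirect}).Get(srv.URL)
	return errors.Is(err, ErrRedirectBlocked)
}

// keyAfterCrossOriginRedirect redirects from server a to server b (another
// port, so another origin) and returns the X-Api-Key value b received.
func keyAfterCrossOriginRedirect() string {
	got := make(chan string, 1)
	b := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got <- r.Header.Get("X-Api-Key")
	}))
	defer b.Close()
	a := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, b.URL, http.StatusFound)
	}))
	defer a.Close()

	req, _ := http.NewRequest(http.MethodGet, a.URL, nil)
	req.Header.Set("X-Api-Key", "secret")
	resp, err := (&http.Client{CheckRedirect: stripOnCrossOrigin("X-Api-Key")}).Do(req)
	if err != nil {
		return "request failed: " + err.Error()
	}
	resp.Body.Close()
	return <-got
}

func main() {
	fmt.Println(refusesRedirect())                    // true
	fmt.Printf("%q\n", keyAfterCrossOriginRedirect()) // ""
}
```

Comparing every hop against via[0] (the original request) rather than the previous hop is a deliberately conservative choice in this sketch.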
LocalAI [bot]	a7cad704b9	chore: ⬆️ Update ggml-org/llama.cpp to `22d66b567eef11cf2e9832f04db64ee0323a0fd0` (#10080) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 08:34:00 +02:00
LocalAI [bot]	7e4df67556	chore: ⬆️ Update ggml-org/whisper.cpp to `f24588a272ae8e23280d9c220536437164e6ed28` (#10078) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> v4.3.5	2026-05-30 01:09:52 +02:00
LocalAI [bot]	5b24b4dacc	chore: ⬆️ Update mudler/rf-detr.cpp to `65c0ffcc9a9bc9dae38252f63d0417c9845a6cf7` (#10075) ⬆️ Update mudler/rf-detr.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:55:41 +02:00
LocalAI [bot]	52fdb46892	docs: ⬆️ update docs version mudler/LocalAI (#10074) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:24:34 +02:00
LocalAI [bot]	b389f0fe5f	chore(model-gallery): ⬆️ update checksum (#10081) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:57 +02:00
LocalAI [bot]	74281be340	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.22.0` (#10079) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:41 +02:00
LocalAI [bot]	cacf2f7a2c	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `8960c5ba5ee9db30ba838304373aa4dbec9f7cbd` (#10077) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:27 +02:00
LocalAI [bot]	4a2cc64d07	feat(reasoning): honor per-request reasoning_effort on chat completions (#10082) The OpenAI `reasoning_effort` field only reached the prompt template; it never toggled the backend's thinking. Map it onto ReasoningConfig.DisableReasoning (which becomes the enable_thinking gRPC metadata) in the request merge, so reasoning_effort="none" disables reasoning per request: the use case from #10072 (run a single Qwen3-style model and turn reasoning off for low-latency tasks while keeping it on for others). Effort levels (minimal/low/medium/high) enable thinking unless the model config explicitly disabled it (reasoning.disable: true wins and is never re-enabled by a request); "none" always disables. Closes #10072 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-29 22:09:07 +00:00
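The merge rule in #10082 can be read as a small tri-state function. In the Go sketch below, configDisable models reasoning.disable (nil meaning unset) and the result models the value passed on as DisableReasoning (nil meaning leave the backend default alone). The function name, the pointer representation, and treating an empty or unrecognised effort as a no-op are assumptions, not LocalAI's code.

```go
package main

import "fmt"

// effectiveDisableReasoning sketches the per-request merge described in the
// commit. Hypothetical helper; not LocalAI's implementation.
func effectiveDisableReasoning(configDisable *bool, effort string) *bool {
	yes, no := true, false
	if configDisable != nil && *configDisable {
		return &yes // reasoning.disable: true always wins
	}
	switch effort {
	case "none":
		return &yes // "none" always disables
	case "minimal", "low", "medium", "high":
		return &no // any effort level enables thinking
	default:
		return configDisable // no (or unknown) effort: keep the config as-is
	}
}

// show renders the tri-state result for printing.
func show(p *bool) string {
	if p == nil {
		return "unset"
	}
	return fmt.Sprint(*p)
}

func main() {
	t := true
	fmt.Println(show(effectiveDisableReasoning(nil, "none"))) // true
	fmt.Println(show(effectiveDisableReasoning(nil, "high"))) // false
	fmt.Println(show(effectiveDisableReasoning(&t, "high")))  // true
	fmt.Println(show(effectiveDisableReasoning(nil, "")))     // unset
}
```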
Richard Palethorpe	4647770316	fix(model): track intentional stops, stop misreading clean shutdowns as crashes (#10060) Two separate issues made graceful backend shutdown look ungraceful in the logs, even though the processes were being terminated correctly (go-processmanager defaults to process-group SIGTERM + 15s grace + SIGKILL): 1. "failed to read PID": startProcess registers a per-process graceful-termination handler that calls Stop(), but StopAllGRPC (registered earlier, via app.Shutdown) already stopped and released store-tracked backends first. The second Stop() then failed reading the removed pidfile. Guard the handler with IsAlive() so it skips already-stopped processes; it still covers backends StopAllGRPC doesn't track (worker-supervised ones). 2. "Backend process exited unexpectedly" exitCode=-1: the exit watcher treated only exit codes 0/143 as clean. But a child killed by our own SIGTERM/SIGKILL is reported by Go as exitCode -1 (signal termination), not the shell's 128+signal convention, so every intentional stop logged a false crash warning. The exit code can't distinguish an intended stop from a signal-induced crash. Track intent directly instead: a stoppingProcs sync.Map (keyed by the *process.Process pointer) is marked wherever LocalAI calls Stop() on purpose, and the exit watcher uses it to pick the log level: Info "stopped" when intentional, Warn "exited unexpectedly" otherwise (still catching real crashes). The raw exit code is reported as a field but no longer interpreted. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-29 18:54:27 +02:00
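A minimal sketch of intent tracking with a sync.Map keyed by process pointer, as #10060 describes. The process type is a stand-in for go-processmanager's *process.Process, the log strings are illustrative, and removing the entry with LoadAndDelete once the exit is observed is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// process is a stand-in for *process.Process from go-processmanager.
type process struct{ name string }

// stoppingProcs records processes stopped on purpose, keyed by pointer, so two
// processes with the same name are still tracked separately.
var stoppingProcs sync.Map

// stopIntentionally marks intent before the real stop would be issued.
func stopIntentionally(p *process) {
	stoppingProcs.Store(p, struct{}{})
	// ... the actual Stop() call would go here
}

// exitLogLine is what an exit watcher might log. The exit code is reported
// but not interpreted; intent alone decides the level.
func exitLogLine(p *process, exitCode int) string {
	if _, intentional := stoppingProcs.LoadAndDelete(p); intentional {
		return fmt.Sprintf("INFO stopped %s exitCode=%d", p.name, exitCode)
	}
	return fmt.Sprintf("WARN %s exited unexpectedly exitCode=%d", p.name, exitCode)
}

func main() {
	a, b := &process{"llama"}, &process{"whisper"}
	stopIntentionally(a)
	fmt.Println(exitLogLine(a, -1)) // INFO stopped llama exitCode=-1
	fmt.Println(exitLogLine(b, -1)) // WARN whisper exited unexpectedly exitCode=-1
}
```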
LocalAI [bot]	3c9b9529c0	chore(model gallery): 🤖 add 1 new models via gallery agent (#10061) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 16:39:14 +02:00
TLoE419	fc2bd0986c	test(utils): cover path verification, sanitization, and unique naming (#9978) pkg/utils/path.go provides the security primitives for download paths (VerifyPath, InTrustedRoot) and the file-naming helpers used by every import flow (SanitizeFileName, GenerateUniqueFileName). None of them had test coverage, so a future regression in the traversal check or in the ".." stripping inside SanitizeFileName would land unnoticed. The new specs pin the lexical contract for each helper: - VerifyPath accepts strict descendants and inner traversal that stays inside the base, rejects "..", compound traversal, and the base path itself. An explicit spec documents that the check is purely lexical (filepath.Clean, not EvalSymlinks) so any future caller that needs symlink-aware defence knows to EvalSymlinks first. - InTrustedRoot rejects the trusted root and sibling directories, accepts deeply nested descendants. - SanitizeFileName covers the leading-directory and absolute-prefix paths plus the embedded ".." case ("foo..bar" -> "foobar") that the Clean+Base layer alone would leave intact. - GenerateUniqueFileName covers the no-collision, single-collision, walk-the-counter, and empty-extension cases using GinkgoT().TempDir() so the suite stays hermetic. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: TLoE419 <tloemizuchizu@gmail.com>	2026-05-29 10:40:08 +00:00
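The lexical VerifyPath contract pinned by #9978 can be sketched with path/filepath alone. insideBase is a hypothetical helper, not the signature in pkg/utils/path.go. Like the described check, it only cleans paths and never calls filepath.EvalSymlinks, so a symlink inside the base can still point outside it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideBase reports whether base joined with rel lexically resolves to a
// strict descendant of base. Hypothetical helper; symlinks are not resolved.
func insideBase(base, rel string) bool {
	b := filepath.Clean(base)
	target := filepath.Join(b, rel) // Join also cleans, collapsing inner ".."
	r, err := filepath.Rel(b, target)
	if err != nil || r == "." {
		return false // error, or the base path itself
	}
	// Reject ".." and anything that climbs out ("../x"), but allow names
	// that merely start with dots, such as "..foo".
	return r != ".." && !strings.HasPrefix(r, ".."+string(filepath.Separator))
}

func main() {
	for _, p := range []string{"models/a.gguf", "models/../b.gguf", "..", "models/../../etc/passwd", ""} {
		fmt.Printf("%q inside=%v\n", p, insideBase("/srv/models", p))
	}
}
```

The paths use Unix separators; on Windows the same logic applies with backslashes.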
Ching	a473a32678	test(react-ui): cover models gallery empty-state reset flow (#10019) Exercise the filtered empty-state path in the models gallery and verify that the clear-filters action restores the list and resets the filter selection. Assisted-by: Codex:gpt-5 Signed-off-by: Ching Kao <0980124jim@gmail.com>	2026-05-29 10:39:33 +00:00
LocalAI [bot]	3e220373b0	fix(functions): validate auto-detected XML tool-call names: robust glm-4.5/Hermes guard (#9722, supersedes #9940) (#10059) fix(functions): validate auto-detected XML tool-call names (#9722) The XML tool-call auto-detector tries every preset, including glm-4.5 whose tool block is <tool_call>name...</tool_call>. When a Hermes/NousResearch model emits <tool_call>{"name":"bash","arguments":{...}}</tool_call>, glm-4.5 mis-claims the block and returns the entire JSON object (or leading prose, or a JSON array) as the function NAME. The misparse then wins over the JSON parser, so streaming clients receive a tool call whose name is a JSON blob. Guard the auto-detect paths in ParseXMLIterative: a returned tool name must look like a real function name ([A-Za-z0-9_.-]+). Results that don't are dropped so auto-detection falls through to the next format and ultimately to JSON parsing, which handles Hermes correctly. An explicitly forced format (format != nil) is left untouched and trusted verbatim. This supersedes PR #9940, which dropped only names with a leading "{". That narrower check misses leading prose ("Sure: {...}"), JSON arrays ("[{...}]") and brace-less garbage ("name: bash, ..."); the name-shape check rejects all of them while still accepting legitimate glm-4.5 calls. The fix applies to both the streaming worker and the non-streaming ParseFunctionCall path, which both call ParseXMLIterative with auto-detection. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-29 12:03:33 +02:00
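The name-shape guard from #10059 can be sketched with the character class quoted in the commit, anchored so the whole name must match. toolCall, looksLikeFunctionName and keepPlausible are hypothetical names; trimming surrounding whitespace before matching is an assumption, and this is not the ParseXMLIterative code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// toolNameRe anchors the commit's character class so the entire name must match.
var toolNameRe = regexp.MustCompile(`^[A-Za-z0-9_.-]+$`)

// toolCall is a stand-in for a parsed call, not LocalAI's real type.
type toolCall struct {
	Name      string
	Arguments string
}

// looksLikeFunctionName trims surrounding whitespace (an assumption of this
// sketch) and then requires the whole name to match toolNameRe.
func looksLikeFunctionName(name string) bool {
	return toolNameRe.MatchString(strings.TrimSpace(name))
}

// keepPlausible drops auto-detected calls whose name is not a function name,
// so detection can fall through to the next format or to JSON parsing.
func keepPlausible(calls []toolCall) []toolCall {
	var out []toolCall
	for _, c := range calls {
		if looksLikeFunctionName(c.Name) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	for _, n := range []string{"bash", "get_weather.v2", `{"name":"bash"}`, "Sure: {...}", `[{"name":"bash"}]`, "name: bash, ..."} {
		fmt.Printf("%q -> %v\n", n, looksLikeFunctionName(n))
	}
}
```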
Richard Palethorpe	fbcd886a47	fix(application): stop backend processes synchronously on shutdown (#10058) application.New wires a fire-and-forget goroutine that runs StopAllGRPC + distributed.Shutdown when the app context is cancelled. Callers (tests, CLI signal handler) cancel the context and then exit immediately, so the test binary / process can terminate before that goroutine kills the spawned backend children. go-processmanager sets no Pdeathsig, so the orphans are reparented to init and survive, leaving dozens of stray mock-backend processes after an e2e run. Add Application.Shutdown(), which runs the same cleanup synchronously on the caller's stack and is idempotent via sync.Once. The context-cancel goroutine, the CLI signal handler, and the test suites all call it, so cleanup is deterministic and the duplicated teardown logic collapses to one place. The async goroutine remains as a safety net for callers that forget; sync.Once dedupes the double call. Wire e2e_suite_test and the two mock-backend Contexts in app_test to call Shutdown in their AfterSuite/AfterEach. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-29 11:40:43 +02:00
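The idempotent shutdown pattern from #10058 reduces to sync.Once plus a context-cancel safety net. In this sketch, Application and its cleanup field are stand-ins (cleanup represents StopAllGRPC and distributed.Shutdown). It relies on the documented sync.Once guarantee that no call to Do returns until the single call to f has returned, and it uses atomic.Int32, which needs Go 1.19 or later.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// Application is a stand-in; cleanup represents the real teardown work.
type Application struct {
	once    sync.Once
	cleanup func()
}

// Shutdown runs cleanup on the caller's goroutine, at most once overall.
func (a *Application) Shutdown() { a.once.Do(a.cleanup) }

// New keeps the context-cancel goroutine as a safety net; sync.Once dedupes
// it against explicit Shutdown calls.
func New(ctx context.Context, cleanup func()) *Application {
	a := &Application{cleanup: cleanup}
	go func() {
		<-ctx.Done()
		a.Shutdown()
	}()
	return a
}

// cleanupRuns cancels, then shuts down twice, and counts cleanup invocations.
func cleanupRuns() int32 {
	var runs atomic.Int32
	ctx, cancel := context.WithCancel(context.Background())
	app := New(ctx, func() { runs.Add(1) })
	cancel()
	app.Shutdown() // returns only after cleanup has finished, whoever ran it
	app.Shutdown()
	return runs.Load()
}

func main() {
	fmt.Println(cleanupRuns()) // 1
}
```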
泊舟	e1a782b70f	fix(openai): stop streaming tool-call double-emission when autoparser is active (#10055) Streaming /v1/chat/completions could emit the same logical tool call at multiple `index` values. In processStreamWithTools the Go-side iterative parser (ParseXMLIterative / ParseJSONIterative) runs on every token and emits tool-call deltas, while the C++ chat-template autoparser delivers its own tool calls via ChatDeltas that are flushed at end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths active the same call is emitted twice at different indices, so OpenAI clients that accumulate tool calls by `index` dispatch the tool N times. Skip the Go-side iterative parser once the autoparser is producing tool calls (hasChatDeltaToolCalls). The deferred flush stays guarded by lastEmittedCount, so the race where the Go parser emitted before the flag flipped also remains single-emission. Backends without an autoparser (e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected. Refs #9722 Signed-off-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com> Co-authored-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 11:39:09 +02:00
LocalAI [bot]	73cfedc023	fix: tool-call JSON leaks into content with stream+tools on tokenizer-template models (#10052) (#10057) * fix(grammars): honor properties_order entry at index 0 The JSON-schema-to-GBNF property sort used `aOrder != 0 && bOrder != 0` as its "is this key ordered?" guard. That treats index 0, the first key listed in properties_order, as unset, so `properties_order: name,arguments` fell back to alphabetical ordering and still emitted "arguments" before "name". Use presence in the order map instead: listed keys sort by their index and ahead of unlisted keys, which keep a stable alphabetical order. This makes the documented `properties_order: name,arguments` actually produce name-first tool-call JSON. Relates to #10052. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(functions): defer tool grammar to the backend when the tokenizer template owns templating (#10052) When use_tokenizer_template delegates templating to the backend (llama.cpp), the backend also owns tool-call grammar generation and parsing. LocalAI was still generating its own GBNF grammar and sending it down. With a grammar present, llama.cpp does not hand the tools to its template, so its native peg/json tool parser never engages: it streams the grammar-constrained tool-call JSON back as plain content instead of emitting tool_calls. In streaming mode the JSON object leaked into the content field, and the Go-side incremental detector never gated content because the LocalAI-generated grammar emitted "arguments" before "name". The GGUF auto-import path already couples use_tokenizer_template with grammar.disable, but that block is skipped when a template is already configured, so gallery and hand-written configs (e.g. qwen3) that set the tokenizer template directly never got the paired grammar.disable.
- SetDefaults now enforces the coupling for every config: when use_tokenizer_template is set, grammar generation is disabled and tools flow to the backend's native (name-first) pipeline. This also fixes already-installed models without editing each config. - Set function.grammar.disable in the shared gallery/qwen3.yaml, which is the base config referenced by every qwen3 gallery entry. Verified end to end against qwen3-4b with stream:true + tools: content no longer carries the tool-call JSON, reasoning is classified separately, and tool calls stream as proper name-first tool_calls deltas. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-29 10:12:53 +02:00
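The properties_order fix in #10057 comes down to the comparator. The sketch below contrasts a reconstruction of the old zero-index guard with a presence-based ordering: listed keys by index first, unlisted keys alphabetically after them. parseOrder, sortProperties and buggyLess are hypothetical names, and buggyLess reproduces only the guard quoted in the commit, not the rest of the GBNF generator.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseOrder turns "name,arguments" into {"name": 0, "arguments": 1}.
func parseOrder(spec string) map[string]int {
	order := map[string]int{}
	for i, k := range strings.Split(spec, ",") {
		if k = strings.TrimSpace(k); k != "" {
			order[k] = i
		}
	}
	return order
}

// buggyLess reconstructs the old guard for contrast: index 0 looks "unset",
// so a pair involving the first listed key falls back to alphabetical order.
func buggyLess(a, b string, order map[string]int) bool {
	aOrder, bOrder := order[a], order[b]
	if aOrder != 0 && bOrder != 0 {
		return aOrder < bOrder
	}
	return a < b
}

// sortProperties orders keys by presence in order rather than a non-zero index:
// listed keys first by index, unlisted keys after them alphabetically.
func sortProperties(keys []string, order map[string]int) {
	sort.SliceStable(keys, func(i, j int) bool {
		oi, iListed := order[keys[i]]
		oj, jListed := order[keys[j]]
		switch {
		case iListed && jListed:
			return oi < oj
		case iListed != jListed:
			return iListed // listed keys sort ahead of unlisted ones
		default:
			return keys[i] < keys[j]
		}
	})
}

func main() {
	order := parseOrder("name,arguments")
	keys := []string{"arguments", "name", "description"}
	sortProperties(keys, order)
	fmt.Println(keys)                                  // [name arguments description]
	fmt.Println(buggyLess("name", "arguments", order)) // false: the old guard misorders index 0
}
```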
LocalAI [bot]	b982c977d5	chore: ⬆️ Update ggml-org/whisper.cpp to `c932729a304f7d9eb5354afa38624cfa86a780cf` (#10051) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:42:06 +02:00
LocalAI [bot]	532ca1b3a2	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `6eff055a0cc0e427a6849cfcb5de531b4b82d667` (#10050) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:50 +02:00
LocalAI [bot]	00ad55b590	chore: ⬆️ Update ggml-org/llama.cpp to `751ebd17a58a8a513994509214373bb9e6a3d66c` (#10049) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:35 +02:00
LocalAI [bot]	4c58fd302f	chore: ⬆️ Update leejet/stable-diffusion.cpp to `0e4ee04488159b81d95a9ffcd983a077fd5dcb77` (#10048) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:18 +02:00
LocalAI [bot]	66582e7035	chore: ⬆️ Update antirez/ds4 to `22393e770ea8eb7501d8718d6f66c6374004e03f` (#10047) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:41:02 +02:00
LocalAI [bot]	1d13949588	docs: ⬆️ update docs version mudler/LocalAI (#10046) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:40:47 +02:00
LocalAI [bot]	c8ad67bbca	chore: ⬆️ Update mudler/rf-detr.cpp to `ecf64d7f7f20d73ebd906a983f398ed287256320` (#10035) ⬆️ Update mudler/rf-detr.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 08:39:47 +02:00
LocalAI [bot]	1c92b00918	fix(turboquant): guard upstream-only grpc-server fields for fork (#10043 ) fix(turboquant): guard upstream-only grpc-server fields for fork build backend/cpp/llama-cpp/grpc-server.cpp is reused by the turboquant build, which compiles against an older llama.cpp fork (TheTom/llama-cpp-turboquant). Two recent changes added references to upstream-only struct fields outside the existing LOCALAI_LEGACY_LLAMA_CPP_SPEC guards: - common_params::checkpoint_min_step (default + option handler), added with the ggml-org/llama.cpp 35c9b1f3 bump (#9998) - the common_params_speculative::draft tensor_buft_overrides sentinel termination (#9919), which sat after the guard's #endif The fork has neither field, so grpc-server.cpp failed to compile for every turboquant flavor. Wrap the three references in #ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC, matching the existing fork-compat guards, so the stock llama-cpp build is unchanged and the fork build skips them. Update patch-grpc-server.sh's doc comment to record what the macro now gates out. Verified by a local fallback-flavor turboquant build: grpc-server.cpp compiles against the fork and the backend image builds. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> v4.3.4	2026-05-28 17:37:54 +02:00
Richard Palethorpe	b81a6d01b3	perf(react-ui): code-split bundle, speed up coverage suite (#10042 ) * Curate the highlight.js build to ~29 languages (lib/core + the common set) instead of the full ~190-grammar default: -787 KB raw / -230 KB gz on the base bundle. * Code-split every route via React.lazy with a per-layout <Suspense> in App.jsx so the sidebar stays mounted on navigation. Initial entry chunk drops from 3194 KB raw / 887 KB gz to 397 KB / 122 KB (-87%). Warm chunks on sidebar hover/focus/touch via a preload registry so the click finds the chunk already in flight or cached. * Migrate Playwright coverage from istanbul (build-time counters) to native Chromium V8 coverage, with per-worker accumulation + conversion. Suite drops from 71s to 30s at 20 workers (~58%) at the non-instrumented floor. * Keep the coverage gate bundling-invariant: the coverage build inlines dynamic imports so every shipped source file lands in the denominator (otherwise untested page chunks silently drop out and inflate the percentage). Production builds stay code-split. * Add UI_TEST_WORKERS=N Makefile knob; tighten coverage tolerance to 0.8pp now that jitter sits near istanbul's ~0.5pp again. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> v4.3.3	2026-05-28 13:43:15 +02:00
Tai An	0fd666ee6e	fix(openresponses): populate Content and accept bare {role,content} items (#10039 ) (#10040 ) * fix(openresponses): populate Content and accept bare {role,content} items (#10039) Fixes mudler/LocalAI#10039 — `/v1/responses` silently returned empty output on any model whose YAML doesn't include a Go-side `template.chat_message` block. Three cooperating bugs: * `convertORInputToMessages` populated only `StringContent` for string input and for the `input.Instructions` system message, leaving the `Content` (any) field nil. * `TemplateMessages` gated all fallback content-rendering branches on `Content != nil && StringContent != ""` — but every branch in that function consumes `StringContent`, not `Content`. The `&&` silently dropped messages that had StringContent set and Content nil, producing an empty prompt that the 5× empty-retry guard then turned into a 200 OK with `output: []`. * The array-input branch of `convertORInputToMessages` dispatched on `itemMap["type"]` with no default, dropping bare `{role, content}` items emitted by the OpenAI Python SDK helper `client.responses.create(input=[{...}])`. Fix: * Set both `Content` and `StringContent` in the two openresponses message-construction sites that only set one. * Treat a bare `{role, content}` item (no `type`) as `type: "message"` for OpenAI-SDK compatibility. * Gate `TemplateMessages` fallback rendering on `StringContent != ""`, which is what every downstream branch in that function actually reads. Regression test added to `evaluator_test.go` covering the fallback path (no `ChatMessage` template) with a StringContent-only message, both with and without a role mapping. 
* test(openresponses): guard Content population and ToProto path (#10039) Add regression tests for the two seams the original fix touched but left uncovered: * convertORInputToMessages must populate both Content and StringContent for plain string input and for bare {role, content} array items (the OpenAI SDK shape that omits the type discriminator). Both are functional reds against the pre-fix code. * Messages.ToProto reads Content, not StringContent — this is the path UseTokenizerTemplate backends (imported GGUFs) take. The cases pin that contract so a future regression on the producer side is caught. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-28 07:21:48 +00:00
LocalAI [bot]	7763fb23a3	chore: ⬆️ Update antirez/ds4 to `072bc0feb187be5f374c08b16d0045e1ad7bc9bc` (#10036 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 08:41:03 +02:00
LocalAI [bot]	324277ccfd	chore: ⬆️ Update ggml-org/whisper.cpp to `6dcdd6536456158667747f724d6bd3a2ceaa8d88` (#10032 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:25:20 +02:00
LocalAI [bot]	10d02e6c59	chore: ⬆️ Update leejet/stable-diffusion.cpp to `29ab511fc75f89fbab148665eab1a8e10a139a72` (#10033 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:24:59 +02:00
LocalAI [bot]	05ae06c17b	chore: ⬆️ Update ggml-org/llama.cpp to `aa50b2c2ae91326d5aad956ceeb015d1d48e626b` (#10034 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:23:40 +02:00
LocalAI [bot]	2671e0c6f7	chore(model-gallery): ⬆️ update checksum (#10038 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:22:19 +02:00
LocalAI [bot]	81b6b94f0b	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `3bf7e836c2c5a895e8d12d3eb7e398ae7ab2f9ce` (#10037 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:21:45 +02:00
LocalAI [bot]	373dc44992	fix(react-ui): force .check() on hidden Toggle input in fits-filter e2e (#10031 ) * fix(react-ui): force .check() on hidden Toggle input in fits-filter e2e The polish PR (#10030) swapped the raw <input type=checkbox> for the shared <Toggle> component, which visually hides its native input via opacity:0;width:0;height:0. Playwright's .check() waits for visibility before clicking and times out after 30 s, breaking two UI E2E tests: - enabling fits filter hides models that exceed available VRAM - fits filter state persists after reload Pass { force: true } to skip the visibility check; the input is still the real focusable checkbox and toggles state on click. The companion .toBeChecked() assertion only reads state and works unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * fix(react-ui): click visible Toggle track in fits-filter e2e force:true skips the actionability checks but not the viewport check, and the Toggle's hidden input has width:0;height:0 so Playwright still reports "Element is outside of the viewport". Click the visible .toggle__track inside the filter-bar-group__toggle wrapper instead — that's what a real user clicks, and label-input association toggles the wrapped checkbox naturally. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> v4.3.2	2026-05-27 22:41:01 +02:00
LocalAI [bot]	02a0e70396	fix(react-ui): polish 'Fits in my GPU' filter to use design-system Toggle (#10030 ) * fix(react-ui): polish 'Fits in my GPU' filter to use design-system Toggle The recently added VRAM-fit filter in the Models page used a raw <input type="checkbox"> next to the themed range slider, breaking the visual language of the rest of the row. Swap it for the shared <Toggle> component (already used by Backends, Settings, Traces, AgentCreate), adopt the filter-bar-group__toggle class to drop the duplicated inline styles, add a fa-microchip icon to mirror the per-row fit indicator, and add a subtle left divider so the filter reads as separate from the context-size slider on its left. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(react-ui): move 'Fits in GPU' filter to filter row and unify copy Two follow-ups on the previous polish pass: 1. Move the toggle from the context-slider row into the filter-button row above. The toggle is a filter on the result set, not a config for VRAM estimation, so it belongs with the type chips and backend select. The context slider stays its own thing. 2. Unify the label copy. The same locale file had "Fits in my GPU" for the filter and "Fits in GPU" for the per-row indicator; pick the shorter, possessive-free variant everywhere (en/de/es/it/zh-CN). Update e2e selectors to match. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-27 21:09:14 +02:00
LocalAI [bot]	7a4ca8f60d	feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028 ) Adds a Go native gRPC backend that dlopens librfdetrcpp.so (built from mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC. Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large) and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/ SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace. Detection returns Bbox + class_name + confidence; segmentation also returns PNG-encoded per-detection masks via the rfdetr_capi accessor functions (rfdetr_capi_get_detection_{class_id,box,score,class_name, mask_png}). End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection model, 21 detections + valid PNG masks on the seg-nano model against the kitchen fixture). 
Wiring: - backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt, Makefile,run.sh,package.sh,test.sh,.gitignore} - Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target, .NOTPARALLEL, prepare-test-extra, test-extra - backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra - .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13 (arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16 - backend/index.yaml: rfdetr-cpp meta anchor + latest/development image entries for every matrix tag-suffix - .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking (mudler/rf-detr.cpp branch main) - gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 detection variants + 6 seg variants), all backed by mudler/rfdetr-cpp-* on HuggingFace with sha256 pinning on the F16 default - core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports (mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format repos stay on the Python rfdetr backend; explicit preferences.backend overrides both heuristics) - core/gallery/importers/rfdetr_test.go: table-driven coverage of the auto-routing + a live mudler/rfdetr-cpp-nano cross-check scripts/changed-backends.js needs no change: the existing Dockerfile.golang -> backend/go/${item.backend}/ branch already routes the 9 rfdetr-cpp matrix entries to the correct backend path. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-27 18:43:57 +02:00
LocalAI [bot]	893e69cbf8	fix(react-ui): share single /api/operations poller across consumers (#10029 ) useOperations() spun up its own setInterval per hook instance, so on pages like /app/models the OperationsBar in App.jsx plus the page's own useOperations() call each polled /api/operations at 1 Hz - 2 RPS sustained for the whole session, repeated on Backends and Chat. Lift the poller into an OperationsProvider mounted under AuthProvider so all consumers (OperationsBar, Models, Backends, Chat) share one timer. The hook file re-exports from the context to keep call sites unchanged. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-27 16:39:09 +02:00
Siddharth More	c9a1a7e6a0	UI: add 'Fits in my GPU' filter on Install Models (#10017 ) * feat(ui): add GPU fit filter on models install page * Delete docs/vram-fits-filter-backend-optionals.md Signed-off-by: Siddharth More <siddimore@gmail.com> --------- Signed-off-by: Siddharth More <siddimore@gmail.com>	2026-05-27 15:17:44 +02:00
LocalAI [bot]	4d01298048	chore: ⬆️ Update antirez/ds4 to `e8e8779b261c10f36ad6270ba732c8f0be5b62e3` (#10024 ) ⬆️ Update antirez/ds4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 15:16:43 +02:00
LocalAI [bot]	b6055e7ecf	chore: ⬆️ Update leejet/stable-diffusion.cpp to `92dc7268fc4ffb0c0cc0bd52dfcefea91326e797` (#10023 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 15:16:23 +02:00
LocalAI [bot]	51bad74bf8	chore: ⬆️ Update ggml-org/llama.cpp to `0d18aaa9d1a8af3df9abccd828e22eeaac7f840b` (#10022 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:29:14 +02:00
LocalAI [bot]	f3236b74cf	chore: ⬆️ Update ggml-org/whisper.cpp to `27101c01dcac1676e2b6422256233cd0f1f9ae28` (#10021 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:28:55 +02:00
LocalAI [bot]	eed3ecff82	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d2da6da05c73aeb658a3d1751f386c24e6963856` (#10020 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-27 00:28:32 +02:00
番茄摔成番茄酱	df7623fd87	fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models (#10012 ) * fix(nemo): extract Hypothesis.text for TDT/RNNT ASR models CTC models (e.g. Whisper) return List[str] from transcribe(), but TDT/RNNT models (e.g. parakeet-tdt-0.6b-v3) return List[Hypothesis] where the decoded text lives in the Hypothesis.text attribute. Previously, results[0] was assigned directly to the protobuf string field, causing silent empty output for non-CTC models. Now checks the return type and extracts .text from Hypothesis objects, with a safe fallback via getattr(). * refactor: simplify Hypothesis text extraction per Copilot review Use single getattr() call instead of hasattr() + double access, and return empty string for unknown types instead of str(result) to avoid leaking internal repr to clients.	2026-05-26 20:35:23 +00:00
番茄摔成番茄酱	4e5ec6f67b	fix(qwen-asr): enable timestamp output when forced_aligner is configured (#10013 ) * fix(qwen-asr): enable timestamp output when forced_aligner is configured Two bugs prevented timestamps from working in the qwen-asr backend: 1. transcribe() was called without return_time_stamps=True, so the forced aligner was loaded but never invoked. Now we pass return_time_stamps=True when a forced_aligner is present. 2. The timestamp parsing code expected (list, tuple) items, but the qwen_asr library returns ForcedAlignItem dataclass instances with .text, .start_time, .end_time attributes. Added hasattr() check to handle this correctly, falling back to tuple parsing for backward compatibility. * refactor: address Copilot review for qwen-asr timestamps - Wrap return_time_stamps kwarg in try/except TypeError for safety - Add defensive float() normalization for timestamp times - Use str() for text extraction to ensure string type * fix(qwen-asr): convert seconds to nanoseconds for Go time.Duration The Go server reads TranscriptSegment.start/end via time.Duration, which is in nanoseconds. Previously the backend sent milliseconds (* 1000), causing timestamps to be 1000x too small (e.g. 8e-8 instead of 0.08). Convert seconds → nanoseconds (* 1e9) instead. Also applies to the legacy tuple path for consistency. * feat(qwen-asr): respect timestamp_granularities (segment vs word) Read request.timestamp_granularities from the gRPC request. - 'word': return one segment per aligned item (character / word) - 'segment' (default): merge consecutive items at sentence boundaries Sentence boundaries detected via CJK punctuation (。！？；…) and Latin endings (. ! ? ;). This matches the OpenAI Whisper API contract where omitting the parameter defaults to segment-level. * fix(qwen-asr): escape smart quotes in punctuation set Unicode curly quotes (U+2018/2019) were being interpreted as Python string delimiters, causing SyntaxError. Use explicit unicode escapes. 
* fix(qwen-asr): use time-gap threshold for segment boundaries The forced aligner strips punctuation from its output, so text-based sentence detection doesn't work. Instead, detect segment boundaries by measuring time gaps between consecutive aligned items. Threshold = max(median_gap * 4, 0.3s). This cleanly separates intra-sentence gaps (< 0.24s) from inter-sentence gaps (> 0.3s) across Chinese, English, and other languages. * fix(qwen-asr): smart join with spaces for non-CJK tokens The forced aligner strips whitespace from tokenized text, so English words like ['hello', 'world'] were joined as 'helloworld'. Add _smart_join() that inserts spaces between non-CJK tokens while keeping CJK characters and punctuation unspaced. Works for Chinese, English, Korean, Japanese, and mixed-language text. --------- Co-authored-by: fqscfqj <fqsfqj@outlook.com>	2026-05-26 20:34:21 +00:00
Richard Palethorpe	8d70855ea6	test: add Go + React UI coverage gates and fill test gaps (#9989 ) - Strict monotonic Go coverage gate (make test-coverage-check, 45% baseline) run in CI; fixes ginkgo dropping all-but-one coverprofile across multiple recursive roots, builds with -tags auth, and folds in the in-process tests/e2e suite via --coverpkg. - React UI e2e coverage (make test-ui-coverage: vite-plugin-istanbul + nyc, nix-provided Chromium) plus e2e specs for 6 previously-untested pages, and a UI coverage gate (make test-ui-coverage-check) with a small tolerance since e2e line coverage jitters ~0.5pp run-to-run. - pre-commit hook: lint + coverage on Go changes, Playwright e2e + UI coverage gate on react-ui changes; install with make install-hooks. - New Go handler tests (settings, branding), hermetic base64 download test. - fix(ui): model editor reads vram_display (snake_case), so the VRAM estimate renders again; covered by a regression test. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-26 22:06:10 +02:00
dependabot[bot]	90c29f9258	chore(deps): bump protobuf from 6.33.5 to 7.35.0 in /backend/python/transformers (#10004 ) chore(deps): bump protobuf in /backend/python/transformers Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.5 to 7.35.0. - [Release notes](https://github.com/protocolbuffers/protobuf/releases) - [Commits](https://github.com/protocolbuffers/protobuf/commits) --- updated-dependencies: - dependency-name: protobuf dependency-version: 7.35.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-26 22:03:59 +02:00
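The qwen-asr commit (#10013) combines three rules:

- segment boundaries are taken from time gaps between aligned items, with threshold = max(median_gap * 4, 0.3 s);
- non-CJK tokens are re-joined with spaces;
- seconds are converted to nanoseconds for Go's `time.Duration`.

A minimal Python sketch of those rules follows. It is not the backend's code. `AlignedItem`, `smart_join` and `segment` are hypothetical names standing in for the ForcedAlignItem handling and `_smart_join()` described in the commit. The CJK code-point ranges are also an assumption.

```python
import unicodedata
from dataclasses import dataclass
from statistics import median

NS_PER_SECOND = 1_000_000_000  # Go time.Duration counts nanoseconds


@dataclass
class AlignedItem:
    # Hypothetical stand-in for the aligner's ForcedAlignItem (times in seconds).
    text: str
    start_time: float
    end_time: float


def _is_cjk(ch: str) -> bool:
    # Assumed ranges: CJK punctuation, kana, Han ideographs, full-width forms.
    cp = ord(ch)
    return (0x3000 <= cp <= 0x30FF or 0x3400 <= cp <= 0x4DBF
            or 0x4E00 <= cp <= 0x9FFF or 0xFF00 <= cp <= 0xFFEF)


def smart_join(tokens):
    # Insert a space only between two non-CJK tokens, never before punctuation.
    out = ""
    for tok in tokens:
        if not tok:
            continue
        if out and not (_is_cjk(out[-1]) or _is_cjk(tok[0])
                        or unicodedata.category(tok[0]).startswith("P")):
            out += " "
        out += tok
    return out


def segment(items, granularity="segment"):
    if not items:
        return []
    if granularity == "word":
        groups = [[it] for it in items]  # one segment per aligned item
    else:
        gaps = [max(0.0, b.start_time - a.end_time) for a, b in zip(items, items[1:])]
        threshold = max(median(gaps) * 4, 0.3) if gaps else 0.3
        groups = [[items[0]]]
        for prev, cur in zip(items, items[1:]):
            if cur.start_time - prev.end_time > threshold:
                groups.append([cur])  # large pause: start a new segment
            else:
                groups[-1].append(cur)
    return [
        {
            "text": smart_join([it.text for it in g]),
            "start_ns": int(round(g[0].start_time * NS_PER_SECOND)),
            "end_ns": int(round(g[-1].end_time * NS_PER_SECOND)),
        }
        for g in groups
    ]
```

The median-relative threshold scales with the speaker's pace. The 0.3 s floor keeps short intra-sentence pauses from splitting segments when the median gap is tiny.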
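The nemo fix (#10012) handles two kinds of `transcribe()` result elements: plain strings from CTC models and Hypothesis objects from TDT/RNNT models. For Hypothesis objects, the decoded text lives in `.text`. The commit uses a single `getattr()` and returns an empty string for unknown types. A minimal sketch of that rule follows. `Hypothesis` here is a hypothetical stand-in, not NeMo's class, and `transcript_text` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical stand-in for a TDT/RNNT hypothesis; only .text matters here.
    text: str


def transcript_text(result) -> str:
    if isinstance(result, str):
        return result  # CTC-style element: already the transcript
    text = getattr(result, "text", None)  # TDT/RNNT-style element
    # Unknown types yield "" rather than str(result), so no internal repr leaks.
    return text if isinstance(text, str) else ""
```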
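The openresponses fix (#10040) does two things:

- every constructed message carries both `Content` and `StringContent`;
- a bare `{role, content}` item with no `type` is treated as `type: "message"`.

The real change is in Go. The sketch below only restates those rules in Python for illustration. The dict keys `content` and `string_content` are assumed to mirror the Go fields, and `input_to_messages` is a hypothetical name. Non-message item types are omitted.

```python
from typing import Any, Optional


def _message(role: str, content: Any) -> dict:
    # Populate both fields: "content" mirrors Content (read by ToProto),
    # "string_content" mirrors StringContent (read by template rendering).
    return {
        "role": role,
        "content": content,
        "string_content": content if isinstance(content, str) else "",
    }


def input_to_messages(inp: Any, instructions: Optional[str] = None) -> list:
    messages = []
    if instructions:
        messages.append(_message("system", instructions))
    if isinstance(inp, str):
        messages.append(_message("user", inp))
        return messages
    for item in inp if isinstance(inp, list) else []:
        if not isinstance(item, dict):
            continue
        item_type = item.get("type")
        if item_type is None and "role" in item and "content" in item:
            item_type = "message"  # bare item, as emitted by the OpenAI SDK helper
        if item_type == "message":
            messages.append(_message(item["role"], item["content"]))
        # Other item types are out of scope for this sketch.
    return messages
```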

6521 Commits