LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-24 16:49:06 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	e6042080c0	fix(agents): URL-decode collection/agent name path params (#10443) (#10471) fix(agents): URL-decode collection/agent name path params Collection and agent names carry a "legacy-api-key:" prefix, so the ':' arrives percent-encoded as %3A in the request path. Echo routes such paths via URL.RawPath and stores the matched path-param value still escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch" and the store lookup 404'd ("collection not found"). This was second-order fallout of #10375/#10387: once colons became valid in names, the URL-decode gap surfaced on every name-bearing endpoint. Add a decodedParam helper that url.PathUnescape's the param (falling back to the raw value on invalid encoding) and wire it into all collection endpoints and the agent :name endpoints, which share the identical prefix. The entry endpoints already unescaped c.Param("*"); this closes the same gap for :name. Fixes #10443 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-24 09:42:09 +02:00
LocalAI [bot]	0f3b24436d	chore: ⬆️ Update mudler/parakeet.cpp to `89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a` (#10468) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:43 +02:00
LocalAI [bot]	4b6f911835	chore: ⬆️ Update ggml-org/whisper.cpp to `43d78af5be58f41d6ffbc227d608f104577741ea` (#10466) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:41:14 +02:00
LocalAI [bot]	a5e28942a6	chore: ⬆️ Update ggml-org/llama.cpp to `be4a6a63eb2b848e19c277bdcf2bd399e8af76d9` (#10467) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:54 +02:00
LocalAI [bot]	dba9cd7ca4	chore: ⬆️ Update CrispStrobe/CrispASR to `96b2a6ee31d30389fed8a7ef1a54239b75231ddc` (#10465) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:34 +02:00
LocalAI [bot]	c93190de50	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `7ccf1d209588962b96eacca325b37e9b3e8faf5e` (#10456) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 09:40:13 +02:00
LocalAI [bot]	4dbf69f889	chore(model gallery): 🤖 add 1 new models via gallery agent (#10472) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-24 00:00:26 +02:00
LocalAI [bot]	deb430f3ec	chore(model-gallery): ⬆️ update checksum (#10469) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> v4.5.0	2026-06-23 23:15:47 +02:00
LocalAI [bot]	dd8c8778e2	chore(model gallery): 🤖 add 1 new models via gallery agent (#10464) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 15:43:21 +02:00
LocalAI [bot]	06a7b6cadb	chore: ⬆️ Update leejet/stable-diffusion.cpp to `f440ad9c29dd8bc34e5d1f4b863832b96d6ea05f` (#10457) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:29:07 +02:00
LocalAI [bot]	67c8889866	chore: ⬆️ Update CrispStrobe/CrispASR to `63b57289255267edf66e43e33bc3911e04a2e92d` (#10455) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:49 +02:00
LocalAI [bot]	1d49041c85	chore: ⬆️ Update ggml-org/llama.cpp to `73618f27a801c0b8614ceaf3547d3c2a99baae14` (#10458) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:28:09 +02:00
LocalAI [bot]	2edc4e25b3	chore: ⬆️ Update ggml-org/whisper.cpp to `bae6bc02b1940bbfb87b6a0299c565e563b916d1` (#10459) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 13:27:51 +02:00
Richard Palethorpe	7888067914	fix(settings): merge partial /api/settings updates instead of overwriting (#10463) POST /api/settings rebuilt runtime_settings.json from only the request body, so a focused admin page that submits a single field wiped every other persisted setting. The Middleware proxy tab (mitm_listen) and detector table (pii_default_detectors), plus the MCP SetBranding tool (instance_name/instance_tagline), all POST partial bodies; the no-omitempty api_keys and pii_default_detectors fields even round-tripped as JSON null. Read the persisted settings and overlay only the fields the request set (RuntimeSettings.MergeNonNil) before writing. Every field is a pointer, so the reflection-based merge is total over the struct and any field added later is preserved automatically. Absent or null fields are now kept; clearing a setting is done by sending its explicit empty/zero value (api_keys [], mitm_listen "", etc.), unchanged from before. The full Settings page sends every field, so its Save behaves identically. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-23 13:27:34 +02:00
LocalAI [bot]	9eedbf537a	chore(model gallery): 🤖 add 1 new models via gallery agent (#10461) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-23 08:04:46 +02:00
LocalAI [bot]	69c16481c8	fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460) PR #10454 added a `cancellable bool` parameter to GalleryStore.UpdateProgress but missed two callers under tests/e2e/distributed, breaking the build on master (golangci-lint and tests-e2e-backend both failed to compile with "not enough arguments in call to ... UpdateProgress"). Pass cancellable=true (both ops are downloading installs, which are cancellable) and assert the flag is persisted, exercising the new behavior. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 23:45:22 +02:00
LocalAI [bot]	56f8a6623f	fix(galleryop): persist cancellable so restarted in-flight ops stay cancellable (#10454) In distributed mode a model/backend install marks OpStatus.Cancellable=true while downloading, but the gallery_operations row never recorded it: UpdateStatus persisted only progress/status and Create left the cancellable column at its zero value. After a replica restart Hydrate rebuilt the op with cancellable=false, /api/operations reported false, and the UI hid the cancel button - the orphaned op then lingered until the 30-minute stale reaper expired it ("stays there on restart, can't cancel, after a bit it expires"). Persist the flag on every progress tick and at row creation (installs are cancellable, deletes are not), and clear it on terminal transitions. A rehydrated in-flight op is now cancellable, so an admin can dismiss the orphaned op immediately instead of waiting out the reaper. The functional cancel path already survived restart (CancelOperation persists store.Cancel even with no live CancelFunc); this restores the UI affordance that drives it. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 22:41:16 +02:00
Ettore Di Giacinto	4755d676a3	Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar)" (#10453) Revert "feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top…" This reverts commit `9d54a599b0`.	2026-06-22 21:59:05 +02:00
dependabot[bot]	10184b5e28	chore(deps): bump actions/checkout from 6 to 7 (#10451) Bumps [actions/checkout](https://github.com/actions/checkout) from 6 to 7. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v6...v7) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-22 21:38:37 +02:00
LocalAI [bot]	fdf475ec5f	feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446 ) * feat(realtime): add pipeline.compaction config + resolution Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): extract itemID helper, reuse in item.retrieve Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): drop duplicate Ginkgo bootstrap, fold specs into openai suite Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.delete Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement input_audio_buffer.clear Add a handler for the input_audio_buffer.clear client event that discards a partially-captured utterance (raw PCM + buffered Opus frames) via a unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.truncate (text) Clears both .Text and .Transcript of the assistant content part at contentIndex so barge-in truncation also works for audio turns whose spoken words live in .Transcript. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): add Conversation.Memory + pair-safe compactionCut Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, avoids panic) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(realtime): gofmt compaction test helper closures Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): inject rolling memory into the prompt + summary builders Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): server-side summarize-then-drop compactor Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): unit-test prefixMatches eviction-safety predicate Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): resolve summarizer model + schedule compaction per turn Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document conversation compaction + new item events Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): resolve summary model inside compaction goroutine (lazy, off-path) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): reuse reasoning.ExtractReasoningComplete for summary stripping Replace the bespoke <think> regex in the compactor with the shared pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of the realtime path and covering all reasoning tag families, not just <think>. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(config): register pipeline.compaction fields in meta registry TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have a UI/meta registry entry; add the four pipeline.compaction.* leaves so they render with proper labels/descriptions instead of the reflection fallback. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 21:28:49 +02:00
LocalAI [bot]	9d54a599b0	feat(ui): role and deployment-mode adaptive UI (landing, sidebar, top navbar) (#10449 ) * feat(ui): add shared DeploymentContext (features + p2p signal) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(ui): extract launchAssistantChat shared helper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): role/mode-aware landing redirect at /app Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): pin Cluster group and collapse Create for cluster admins Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): desktop top navbar with mode pill and admin-via-chat jump Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): admin token-usage meter in the top navbar Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ui): top-navbar breakpoint handoff + assistant jump from chat page M1: the desktop .top-navbar was hidden at max-width 768px while the .mobile-header only appears at max-width 639px, leaving 640-768px with neither bar so admins lost the mode pill, token meter and admin-via-chat jump. Hide the top bar at 639px instead so it covers every width the rail sidebar is shown and hands off to the mobile-header exactly at 639px. M2: the navbar 'Admin via chat' button wrote localStorage and called navigate('/app/chat'), but when already on the chat page Chat does not remount so its mount-time payload reader never fired and the click was a no-op until reload. The payload consume logic is factored into a shared callback; the launcher now dispatches a localai-open-assistant event that the mounted Chat listens for to re-consume the payload. Mount behavior is unchanged. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 21:27:43 +02:00
Richard Palethorpe	63bcbf6c12	fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401 ) * fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier Follow-up to the NER tier engine (#10360), already on master. This carries only the incremental review fixes and tests that postdate that merge — the feature itself is not re-introduced. Review fixes: - openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText (the `elem < 0` guard above already returns). - application.go: collapse ResolvePIIPolicy's inline re-implementation of PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the "explicit pii.enabled wins, else cloud-proxy default" rule) and return true past the !enabled guard where it is provable. - pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in patternDetector.Detect into one local. - grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire — dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000] is rejected here with an actionable error; >1000 still fails closed via Parse. Specs pin the relationship so the guard can't silently revert. - PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and force the DOM value back when clamping (min={0} only constrained the spinner, so a negative reached saved config and silently disabled the length filter). Tests: - piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead). - model-config.spec.js: assert the min_len clamp, and that entity_actions collapses a duplicate group to a single row (map semantics; regression guard against emitting an array that drops a row on save). - tests/e2e-backends: token_classify capability driving the TokenClassify gRPC RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned spans (entity.Text == text[start:end]) at threshold 0. 
Verified on CPU via `make test-extra-backend-privacy-filter` (3/3 specs). - Makefile: test-extra-backend-privacy-filter wrapper. - tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact (mask + block) through the full HTTP -> detector -> redactor path; gated on PII_NER_MODEL_GGUF so the default suite is unaffected. - .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job running the container harness on CPU. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(gallery): add privacy-filter-nemotron (f16 + q8) GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing privacy-filter-multilingual entry, trading language breadth for category depth. - privacy-filter-nemotron: F16 reference artifact (~2.8 GB). - privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge use; description notes the size/speed tradeoff and to validate on your own data (a single dropped span is a PII leak). Both run on the privacy-filter backend with known_usecases [token_classify] and a default mask policy (min_score 0.5); operators add per-category entity_actions as needed. sha256s taken from the HF repo's LFS object ids. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-22 18:26:19 +02:00
LocalAI [bot]	95b058e1c5	feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447 ) * chore: gitignore SDD scratch directory Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(nodes): add GET /api/nodes/models cluster-wide loaded-models endpoint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): add nodesApi.allModels() for cluster-wide model roster Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): move Scheduling to its own page and nav item Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): replace nodes stat-card strip with cluster pulse + attention callout Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): node-panel roster with inline model chips and segmented filter Replace the Nodes table with a full-width node-panel roster that shows each backend node's running-model chips without an expand click, plus an All/Backend/Agent segmented filter. Per-node detail (models, backends, labels, capacity) moves to the node detail page. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(ui): add deep-linkable node detail page at /app/nodes/:id Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(ui): remove em-dash from CapacityEditor comment; align detail spec backend mock Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(ui): nodes page cleanup, hover/chip polish, docs for restructured cluster view Nodes.jsx dead-code sweep confirmed clean (no StatCard/table/expand state/scheduling-form leftovers). 
Two App.css polish fixes: move the node-panel hover border-color onto the bordered element so hover gives real feedback, and add the missing .model-chip__state rule the ModelChip component already emits. Update distributed-mode docs prose to describe the restructured cluster view (cluster pulse, attention callout, node-panel roster with inline model chips, All/Backend/Agent filter, node detail page at /app/nodes/:id, Scheduling as its own page). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(ui): drop unused gpuVendorLabel export from nodeStatus Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 18:24:29 +02:00
LocalAI [bot]	f2abcc7503	chore(model gallery): 🤖 add 1 new models via gallery agent (#10445) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 16:09:16 +02:00
Adira	62c99c10b3	fix(diffusers): pin diffusers and transformers to a known-good pair (#9979) (#10442) fix(diffusers): pin diffusers and transformers to a known-good pair The diffusers backend tracked git+https://github.com/huggingface/diffusers (main) with an unpinned transformers. transformers v5 restructured CLIPTextModel and removed the .text_model attribute that diffusers' single-file loader reads, so loading any single-file Stable Diffusion checkpoint fails: create_diffusers_clip_model_from_ldm (single_file_utils.py) position_embedding_dim = model.text_model.embeddings.position_embedding... AttributeError: 'CLIPTextModel' object has no attribute 'text_model' No released diffusers (<=0.38.0) supports transformers v5 - only unreleased diffusers main does. Because the requirements tracked main plus an unpinned transformers, every backend image froze whichever pair existed at build time, and images built once transformers v5 shipped but before diffusers main caught up are permanently broken. Pin the last known-good released pair across all requirements files: diffusers==0.38.0 and transformers==4.57.6. 0.38.0 still exposes every pipeline backend.py imports (Flux, Wan, Sana, LTX2, Qwen, GGUF), so no functionality is lost, and builds become reproducible instead of drifting into the broken window. Fixes #9979 Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-22 12:38:06 +02:00
LocalAI [bot]	7226bb9f30	chore: ⬆️ Update CrispStrobe/CrispASR to `7a8cb80907341c0204bd0488c1244764f4163883` (#10315) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 12:21:58 +02:00
LocalAI [bot]	569d9bbd9e	fix(distributed): broadcast file-staging progress across replicas (#10440) File-staging progress lived only in the SmartRouter's in-memory StagingTracker on the replica performing the transfer. In a multi-replica deployment behind a round-robin load balancer, a /api/operations poll that lands on any other replica saw no staging row, so the progress ("processing file ... Total ... Current ...") flickered in and out as polls rotated between frontends. Mirror the pattern already used for gallery-install progress: the origin replica broadcasts staging ticks over NATS (SubjectStagingProgress, a new staging.<model>.progress subject), and peers merge them via ApplyRemote (SubscribeBroadcasts on the wildcard). Byte-level ticks are leading-edge debounced (~1/s); Start/FileComplete/Complete always publish. A locally-owned op stays authoritative so the origin's own echo and stray peer events can't clobber it, and mirrored remote ops expire after a TTL so a missed Done event can't leave a phantom row. The UI read path (StagingTracker.GetAll) is unchanged. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 09:28:07 +02:00
LocalAI [bot]	682fb2718c	fix(distributed): detach cold-load staging from the request context (#10438) A model not yet loaded on a worker is staged lazily on the inference request path. Staging a multi-GB model takes minutes - far longer than any client keeps its HTTP request open - so a browser refresh, an ingress/LB idle-timeout, or a round-robined retry landing on another frontend replica cancels the request context and aborts the upload with "context canceled" mid-transfer. Large models then never finish staging, so they never load (observed in a 2-replica deployment: both frontends repeatedly failed to stage a 15.7 GB GGUF, each attempt dying at a different offset). Bind the cold load (staging + LoadModel + the per-model advisory lock) to context.WithoutCancel(ctx): it keeps the request's values (prefix chain) but drops cancellation/deadline. Each long step keeps its own bound (the file stager's resume budget, LoadModel's 5m timeout), and the advisory lock still de-dupes concurrent loaders across replicas. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 09:06:20 +02:00
LocalAI [bot]	20c643e1f6	chore(model gallery): 🤖 add 1 new models via gallery agent (#10439) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 08:46:34 +02:00
VJSai	64a4351f3a	feat: send a LocalAI User-Agent on registry pulls (#10434) LocalAI pulls models from OCI registries (via go-containerregistry), the Ollama registry, and OCI blob stores (via oras), but every request went out with the underlying library's generic User-Agent, so registry operators had no way to attribute traffic to LocalAI. Add an oci.UserAgent() helper that returns "LocalAI" (or "LocalAI/<version>" when the binary is built with a version stamp via internal.Version) and wire it into all three pull paths: - pkg/oci/image.go: remote.WithUserAgent on the go-containerregistry image and digest requests - pkg/oci/ollama.go: a User-Agent header on the Ollama manifest request - pkg/oci/blob.go: a LocalAI User-Agent on the oras blob client. This mirrors oras' auth.DefaultClient (same retry.DefaultClient policy); only the advertised User-Agent changes. Implements #6258. Assisted-by: Claude:claude-opus-4-8 golangci-lint Signed-off-by: Vijay Sai <vijaysaijnv@gmail.com>	2026-06-22 08:44:12 +02:00
LocalAI [bot]	b7d67f5779	chore: ⬆️ Update ggml-org/llama.cpp to `7c082bc417bbe53210a83df4ba5b49e18ce6193c` (#10417) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 08:43:40 +02:00
LocalAI [bot]	600dafd20b	feat(ced): sound-event classification backend (CED audio tagger) (#10425 ) * feat(ced): sketch sound-classification backend (CED audio tagger) Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry, footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend. SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist in DESIGN.md): - backend/backend.proto: new SoundDetection rpc + SoundClass messages (run `make protogen-go` to regenerate pkg/grpc/proto). - backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h), goced.go (Ced gRPC backend: Load + SoundDetection), Makefile (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh, package.sh, .gitignore. - DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability registration checklist), gallery/index + CI registration, and a scoping note for the realtime/websocket live-recognition path (sliding-window classify over the existing ws transport + voicegate; the ced C-API per-PCM entry point is already window-friendly). Backend code does not compile until protogen-go regenerates the pb types and a libced.so is built (Makefile clones+builds it). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): REST /v1/audio/classification endpoint + capability registration Wires the ced sound-event classification backend (AudioSet audio tagger) end to end through the REST surface, mirroring the transcription path. - Handler: core/http/endpoints/openai/sound_classification.go parses the multipart audio upload, temp-files it, resolves the model config and calls the SoundDetection RPC; returns {model, detections[]} JSON. - Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection) loads the model and normalizes the proto response into schema types. - Schema: core/schema/sound_classification.go (SoundClassificationResult). 
- gRPC layer: SoundDetection wired through the LocalAI wrapper (interface, Backend client, Client, embed, server, base default) so the loader-typed client exposes the RPC; proto regenerated via make protogen-go. - Route: POST /v1/audio/classification (+ /audio/classification alias) with the audio/multipart default-model middleware in routes/openai.go. - Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap + GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase option; /api/instructions audio area updated; auth RouteFeatureRegistry + FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter + i18n; docs page features/audio-classification.md + whats-new + crosslink. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): realtime sound-event detection over the websocket API When a realtime pipeline configures a sound-classification model, each VAD-committed utterance (the same window the transcription path produces) is also run through the CED sound-event classifier and the scored AudioSet tags are emitted as a new server event. No new backend rpc is needed: the SoundDetection gRPC method already exists on this branch. - config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty) beside Transcription/VAD. - realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the ModelInterface; implement it on wrappedModel and transcriptOnlyModel by calling backend.ModelSoundDetection with the session's sound-classification model config (mirrors how Transcribe dispatches). Load the optional config in newModel / newTranscriptionOnlyModel; nil config keeps it additive. 
- types: add ConversationItemSoundDetectionEvent (item_id, content_index, detections[]{label,score,index}) with type conversation.item.sound_detection, its ServerEventType constant and MarshalJSON, mirroring the transcription completed event. - realtime: add emitSoundDetection (unary path: classify the committed window, build the event, t.SendEvent) and wire it at the utterance-commit hook right after emitTranscription; gated on session.SoundDetectionEnabled (resolved from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0). Its error is logged via xlog but never aborts the turn. - test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections, classifier error) plus a SoundDetection method on the fakeModel double. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): implement SoundDetection in nodes backend test doubles The SoundDetection method added to the grpc backend interface left two test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so core/services/nodes failed to compile under `go vet`/`go test` (go build missed it: the doubles live in _test.go). Add the method to both, mirroring their existing Detect mock. Repairs CI for the nodes package. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): decouple realtime sound detection from VAD (sound-only sessions) Sound-event detection must activate on sounds, not speech, so it no longer runs through the voice VAD/transcription path. A sound-detection-only pipeline (sound_detection set, no transcription/LLM) now: - is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline stage), - builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS loaded), and - defaults the session to turn_detection none (no VAD) with no transcription stage, so the client drives windowing via input_audio_buffer.commit (option A: client-side sliding window). The per-PCM C-API already supports arbitrary windows. 
commitUtterance gains a sound-only branch: it emits the conversation.item.sound_detection event (scored AudioSet tags) and stops - no transcription, no LLM response. generateResponse is now guarded on a transcription stage being present, so a sound-only turn never invokes the LLM. Existing transcription/VAD sessions are unchanged (additive). Added a commitUtterance sound-only Ginkgo spec asserting it emits the sound event and neither transcribes nor generates a response. go vet + golangci-lint (new-from-merge-base) clean; openai suite green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): register sound-classification backend in gallery + CI Mechanical backend-image registration for the ced sound-event classifier, mirroring the parakeet-cpp Go/purego backend everywhere it is wired up. - .github/backend-matrix.yml: add the ced build matrix, field-for-field copies of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64, l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan amd64/arm64, rocm hipblas, and the metal darwin entry), changing only backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang. - backend/index.yaml: add the &ced meta anchor (capabilities map per platform) plus ced-development and the per-arch image entries, each uri/mirror tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is intentionally deferred pending the HuggingFace publish (TODO note inline). - scripts/changed-backends.js: add an explicit item.backend === "ced" branch in inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as the parakeet-cpp branch (before the generic golang fallthrough). - .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in backend/go/ced/Makefile so the daily bot bumps the pin. - swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so the existing /v1/audio/classification annotations land in the generated spec. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): server-side windowing for realtime sound detection (option B) Adds an optional server-driven sliding-window classifier so a sound-only realtime client only has to stream audio (no input_audio_buffer.commit): - Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs. When both > 0 on a sound-only session, the server classifies the last window of streamed audio every hop and emits a conversation.item.sound_detection event; the input buffer is trimmed to one window so a long stream stays bounded. When unset, the session stays client-driven (option A). Runs independent of VAD (sound events are not speech). - handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so it is unit-testable) + writeWindowWAV, which declares the true InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples correctly. Goroutine is started after toggleVAD and torn down with the session (close + wg.Wait). - Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta registry; the earlier realtime commit added pipeline.sound_detection without a registry entry, failing TestAllFieldsHaveRegistryEntries. This fixes that and covers the two new knobs. Tests: classifySoundWindow emits an event + trims the buffer to one window, no-ops on too-little audio; writeWindowWAV declares the given sample rate. go build/vet + golangci-lint (new-from-merge-base) clean; config + openai suites green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0) The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0, converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced + known_usecases: sound_classification) and two gallery/index.yaml entries (ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and removes the now-resolved TODO from backend/index.yaml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add tiny/mini/small GGUF model gallery entries Publishes the rest of the CED family (same architecture, metadata-driven port verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds their f16 + q8_0 gallery entries: ced-tiny (5.5M, edge/Pi-class) f16 11MB / q8_0 6MB ced-mini (9.6M) f16 19MB / q8_0 11MB ced-small (22M) f16 42MB / q8_0 23MB All sha256-pinned. ced-base remains the accuracy default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8 gallery model entries' urls + file uris accordingly. sha256 and filenames are unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): bump CED_VERSION to the short-clip fix Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip shorter than target_length (~10.11s): time_pos_embed was added at its full 63-frame grid instead of being sliced to the clip's actual time grid, tripping ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s windows) and gated with a short-clip parity test upstream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive - README.md: add ced.cpp to the "native C/C++/GGML engines developed and maintained by the LocalAI project" table. - docs/content/features/backends.md: add a Sound Classification backend category (sound-event classification / audio tagging) listing ced.cpp. - .agents/adding-backends.md: add a "Documenting the backend" section and two verification-checklist items requiring new backends to be documented in the backends.md category list, and in-house native engines to be added to the README maintained-engines table. 
This directive was missing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): repin CED_VERSION to the v0.1.0 release commit ced.cpp history was squashed into a single release commit (tagged v0.1.0), so the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the v0.1.0 release commit, so the backend builds against a commit that exists. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths - sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler. - goced.go: reading a NUL-terminated C string from a libced-owned buffer. #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since the uintptr is a C-owned malloc'd buffer, not Go-GC memory. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 01:00:28 +02:00
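The server-side windowing described in the ced commit above (classify the last window, then trim the input buffer to one window) can be sketched roughly as follows. This is a minimal illustration, not the LocalAI implementation: `trimToWindow`, the mono 16-bit PCM buffer, and the sample values are all hypothetical.

```go
package main

import "fmt"

// trimToWindow is a hypothetical helper: it keeps only the most recent
// windowMs milliseconds of mono PCM samples. It reports false (a no-op)
// when the buffer holds less than one full window.
func trimToWindow(buf []int16, windowMs, sampleRate int) ([]int16, bool) {
	n := windowMs * sampleRate / 1000 // samples per window
	if n <= 0 || len(buf) < n {
		return buf, false
	}
	// Copy the tail so the older samples can be garbage collected and a
	// long-running stream stays bounded to one window.
	return append([]int16(nil), buf[len(buf)-n:]...), true
}

func main() {
	buf := []int16{1, 2, 3, 4, 5}
	window, ok := trimToWindow(buf, 3, 1000) // 3 ms at 1 kHz = 3 samples
	fmt.Println(window, ok)
}
```

A ticker firing every hop would call such a helper, classify the returned window, and store it back as the new input buffer.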
LocalAI [bot]	ce8a3e9266	chore: ⬆️ Update ServeurpersoCom/qwentts.cpp to `4536dcdce27c3764a93a06d6bf64026b124962f5` (#10431 ) ⬆️ Update ServeurpersoCom/qwentts.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 01:00:10 +02:00
LocalAI [bot]	a88d9d2de3	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `6c00e87ac84404af588ad2e65935bd6f079c696f` (#10430 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:49 +02:00
LocalAI [bot]	1cf1bf32e1	chore: ⬆️ Update leejet/stable-diffusion.cpp to `b12098f5d09fc83da36e65c784f7bdb16a5a5ebf` (#10429 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:33 +02:00
LocalAI [bot]	f45c6acc54	chore(model gallery): 🤖 add 1 new models via gallery agent (#10437 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:57:08 +02:00
LocalAI [bot]	1a1bd57469	chore(model gallery): 🤖 add 1 new models via gallery agent (#10436 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-22 00:46:56 +02:00
LocalAI [bot]	1f29e96030	chore(model gallery): 🤖 add 1 new models via gallery agent (#10433 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-21 23:51:43 +02:00
LocalAI [bot]	64560a974b	chore(model gallery): 🤖 add 1 new models via gallery agent (#10432 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-21 23:31:17 +02:00
LocalAI [bot]	32c47706ae	feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424 ) * feat(realtime): add voice_recognition enforce + identity config Add Enforce bool and Identity VoiceIdentityConfig to PipelineVoiceRecognition, plus EnforceGate/IdentityEnabled/ AnnounceEnabled/PersonalizeEnabled helpers. Enforce nil defaults to gating (backward compatible); identity surfacing is independent of the gate. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): add Speaker type and conversation.item.speaker event Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): split voiceGate into Resolve + authorize Split the speaker authorization into a Resolve step (embed once, produce a types.Speaker identity) and a pure authorize policy step, with a 0..100 confidence score mirroring /v1/voice/identify. The legacy Authorize wrapper is kept so existing specs stay green. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): resolve speaker per turn and emit conversation.item.speaker Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): personalize LLM turns with recognized speaker Set the per-message name field on each recognized user turn and append a current-speaker note to the system message, both gated by the voice recognition identity config. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document speaker identity surfacing and personalization Document the new voice_recognition keys (enforce, identity.*) and the LocalAI-extension conversation.item.speaker server event in the realtime feature docs.
Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): cover when:first+identity re-resolution and multi-speaker history Add two integration specs to harden the speaker-aware realtime path: - when:first with an Identity block re-resolves the speaker every turn even though re-authorization is skipped after the first match: a later resolve error now fails closed, while a clean later resolve still surfaces and names the speaker. - multi-speaker history attribution: each user turn carries its own per-message name and the injected system note reflects the latest speaker. Test-only change; no production behavior was modified. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): surface speaker labels in conversation.item.speaker Carry the registered speaker's labels (identify mode) on types.Speaker so they flow into the conversation.item.speaker event and the stored item. Verify mode has no labels, so the field is omitted there. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(e2e): cover conversation.item.speaker over a real websocket Add a realtime-pipeline-identity config (verify mode, enforce:false, identity announce+announce_unknown+personalize) and two e2e specs driving the real server over a real WebSocket with the mock VoiceEmbed backend: an authorized speaker yields a conversation.item.speaker event naming e2e-speaker (matched true) and reaches response.done; an unauthorized speaker yields an unknown (matched false, no name) event and still responds, proving enforce:false never drops a turn. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(config): register voice_recognition enforce + identity fields The meta registry coverage test (TestAllFieldsHaveRegistryEntries) requires every config field to have an entry in core/config/meta/registry.go.
The new voice_recognition.enforce and voice_recognition.identity.* fields were missing, failing tests-linux and tests-apple. Add registry entries (toggles) so the fields are surfaced in the model-config editor and the coverage test passes. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-06-21 21:07:10 +02:00
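The "Enforce nil defaults to gating" behavior in the commit above is a common Go pattern for tri-state config flags. A minimal sketch, assuming the field is a `*bool`; the type and method names here are hypothetical stand-ins, not the actual PipelineVoiceRecognition code:

```go
package main

import "fmt"

// voiceRecognition is a hypothetical stand-in for the pipeline config block.
// A nil Enforce means the key was absent from the config file.
type voiceRecognition struct {
	Enforce *bool
}

// enforceGate reports whether unrecognized speakers should be dropped.
// An unset flag keeps the pre-existing (gating) behavior.
func (v voiceRecognition) enforceGate() bool {
	if v.Enforce == nil {
		return true
	}
	return *v.Enforce
}

func main() {
	off := false
	fmt.Println(voiceRecognition{}.enforceGate())              // key absent
	fmt.Println(voiceRecognition{Enforce: &off}.enforceGate()) // enforce: false
}
```

Using a pointer distinguishes "not configured" from an explicit `false`, which is what lets older configs keep gating without any changes.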
Tai An	e58870a573	feat(react-ui/chat): paste images from clipboard into chat input (#10428 ) The chat input only accepted attachments via the file picker, so users who copied an image from a webpage or a screen region had to first save it to a file before attaching it (#10361). Add an onPaste handler on the input textarea that pulls image items out of the clipboard and routes them through the same staging path as the file picker. The per-file processing in handleFileChange is extracted into a shared processFiles helper so both entry points stay in sync. Clipboard images, which arrive unnamed or as a generic "image.png", are given unique typed names so multiple pastes don't collide, and the default paste is suppressed only when an image is actually attached so normal text paste is unaffected. Closes #10361 Signed-off-by: Anai-Guo <antai12232931@outlook.com>	2026-06-21 18:20:56 +02:00
LocalAI [bot]	8fab1d2e45	fix(ci): namespace-import js-yaml in changed-backends.js (Bun ESM: missing default export) (#10427 ) fix(ci): use namespace import for js-yaml in changed-backends.js js-yaml's ESM build exposes only named exports (load, dump, ...) and no default export. Bun's strict ESM interop rejects the default import with 'Missing default export in module js-yaml.mjs', failing the detect-changes and generate-matrix CI jobs. Import the namespace instead; yaml.load (the only usage) resolves to the named export, so behavior is unchanged. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-21 17:52:02 +02:00
LocalAI [bot]	7b462a0d51	fix(backend): call vram.EstimateModelMultiContext (master build broken: undefined vram.EstimateModel) (#10426 ) fix(backend): call vram.EstimateModelMultiContext for model size estimate core/backend/options.go called vram.EstimateModel, which does not exist in the vram package (it exposes EstimateModelMultiContext). This broke the build on master (undefined: vram.EstimateModel). Use EstimateModelMultiContext with a nil context-size slice (defaults to a single 8192 estimate); the returned MultiContextEstimate.SizeBytes is exactly what the caller consumes, so size estimation behavior is unchanged. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-21 17:51:46 +02:00
LocalAI [bot]	aed181e6c1	chore(model gallery): 🤖 add 1 new models via gallery agent (#10423 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-21 17:40:55 +02:00
OrbisAI Security	a556cd9afc	fix: the trl backend's _do_training method directly ... in backend.py (#10422 ) * fix: V-001 security vulnerability Automated security fix generated by OrbisAI Security Signed-off-by: orbisai0security <mediratta01.pally@gmail.com> * fix: the trl backend's _do_training method directly ... in backend.py The TRL backend's _do_training method directly uses request Signed-off-by: orbisai0security <mediratta01.pally@gmail.com> --------- Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>	2026-06-21 17:40:29 +02:00
Leoy	b50b1fe418	feat(watchdog): add size-aware LRU eviction mode (#9527 ) * feat(watchdog): add size-aware LRU eviction mode When the model count hits the LRU limit or the memory reclaimer fires, evict the largest model by on-disk file size first rather than the least-recently-used one. For GGUF models the file size is a reliable proxy for GPU/RAM footprint, so evicting the largest candidate maximises freed memory per eviction round while keeping small utility models (embeddings, classifiers, rerankers) resident. Changes: - `pkg/model/watchdog.go`: add `sizeAwareEviction` flag and `modelSizes map[string]int64` to `WatchDog`; sort candidates by `sizeBytes` desc (LRU time as tiebreaker) when the flag is set; add `RegisterModelSize`, `SetSizeAwareEviction`, `GetSizeAwareEviction` - `pkg/model/watchdog_options.go`: add `WithSizeAwareEviction` option - `pkg/model/initializers.go`: stat model file after load and call `RegisterModelSize` so size data is available before the first eviction - `core/config/application_config.go`, `runtime_settings.go`: add `SizeAwareEviction` field and `WithSizeAwareEviction` app option; expose via `ToRuntimeSettings` / `ApplyRuntimeSettings` for the `POST /api/settings` live-reload path - `core/cli/run.go`: add `--size-aware-eviction` flag / `LOCALAI_SIZE_AWARE_EVICTION` env var - `core/application/startup.go`, `watchdog.go`: wire the new option through to `NewWatchDog` - `pkg/model/watchdog_test.go`: 5 new specs — option enable, dynamic toggle, largest-first ordering, equal-size LRU tiebreaker, no-size fallback to LRU, and size-map cleanup on eviction Closes #9375 Signed-off-by: supermario_leo <leo.stack@outlook.com> * refactor(watchdog): use vram estimation scaffolding for model size Replace the brittle os.Stat(modelFile) approach with a proper call to pkg/vram, which handles multi-file models (DownloadFiles, MMProj) and all weight file types, not just single GGUF files. 
- Add estimateModelSizeBytes() in core/backend/options.go that collects all weight file URIs from the model config, resolves them to file:// URIs, and calls vram.Estimate() with the shared DefaultCachedSizeResolver (15-min TTL cache avoids redundant stat calls on repeated loads) - Thread the result through via a new WithModelSizeBytes() loader option - In initializers.go, consume the pre-computed size instead of calling os.Stat; if no size was supplied (e.g. for external/router-dispatched models) the registration is simply skipped Signed-off-by: supermario_leo <leo.stack@outlook.com> * refactor(watchdog): use EstimateModel with HF fallback for size estimation Switch estimateModelSizeBytes from calling vram.Estimate directly to the unified vram.EstimateModel entry point, which adds automatic fallbacks: file-based GGUF metadata → HF API → size string. Also extract the HuggingFace repo ID from model URIs (huggingface://, hf://, https://huggingface.co/ and org/model short-form) and pass it as ModelEstimateInput.HFRepo, so models not yet downloaded locally can still get a size estimate via the HF API. Addresses @mudler's review feedback: "better to rely on EstimateModel and pass by the HF URL of the model extracted from the URI". Signed-off-by: supermario_leo <leo.stack@outlook.com> * feat(webui): add Size-Aware Eviction toggle to settings page The size-aware eviction setting was wired through the CLI flag and the RuntimeSettings live-reload path (POST /api/settings) but had no handle on the React settings page, so it could not be toggled from the UI. Add a Size-Aware Eviction toggle to the Watchdog section, next to the existing Force Eviction When Busy / LRU eviction handles. The settings page loads and saves the whole RuntimeSettings object, so the new size_aware_eviction key is picked up with no extra plumbing. Addresses @mudler's review feedback: the application config setting should land on the same UI settings page as the other handles. 
Signed-off-by: supermario_leo <leo.stack@outlook.com> --------- Signed-off-by: supermario_leo <leo.stack@outlook.com>	2026-06-21 17:17:04 +02:00
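The ordering described in the size-aware eviction commit above (largest model first, least-recently-used as the tiebreaker, plain LRU when the mode is off) might look like the following. This is a sketch only: the `candidate` struct, its fields, and `evictionOrder` are hypothetical and do not mirror the `WatchDog` internals.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical eviction candidate.
type candidate struct {
	name         string
	sizeBytes    int64 // 0 when no size was registered
	lastUsedUnix int64 // smaller means used longer ago
}

// evictionOrder returns a sorted copy of cands, first entry evicted first.
// With sizeAware set, larger models come first and equal sizes fall back
// to LRU order; otherwise the order is plain LRU.
func evictionOrder(cands []candidate, sizeAware bool) []candidate {
	out := append([]candidate(nil), cands...)
	sort.SliceStable(out, func(i, j int) bool {
		if sizeAware && out[i].sizeBytes != out[j].sizeBytes {
			return out[i].sizeBytes > out[j].sizeBytes
		}
		return out[i].lastUsedUnix < out[j].lastUsedUnix
	})
	return out
}

func main() {
	cands := []candidate{
		{"embedder", 100, 1},
		{"llm", 9000, 5},
		{"reranker", 100, 3},
	}
	for _, c := range evictionOrder(cands, true) {
		fmt.Println(c.name)
	}
}
```

When no sizes are registered every `sizeBytes` is equal, so the comparison degrades to LRU, matching the "no-size fallback to LRU" spec listed in the commit.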
pos-ei-don	b4c0dc67fe	feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346 ) (#10351 ) * fix(vllm): don't stream raw tool-call markup as content when a tool parser is active When a tool_parser is configured and the request carries tools, the streaming loop emitted every text delta as delta.content — including the model's raw tool-call markup (e.g. <tool_call>...) — because extract_tool_calls only runs on the full output after the stream. Clients streaming a tool call therefore saw the unparsed tool-call syntax as assistant content. Buffer the text while a tool parser is active for the request; the existing end-of-stream chat_delta already carries the parsed tool_calls (or the cleaned content), which the Go side converts to SSE deltas. Non-tool-parser streaming is unchanged. Add a server-less regression test covering both the tool-call case (no raw markup leaked as content) and the plain-text case (content delivered exactly once — guards against double-emitting the buffered content). Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * test(vllm): add expectedFailure test for progressive streaming with tool parser (Case 3, #582) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * test(vllm): add Cases 4+5 — marker split across chunks + false-positive prefix (TDD, Option B state machine, #582) Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * feat(vllm): progressive streaming via parser.extract_tool_calls_streaming When a tool parser is active for a tool-enabled streaming request, #10346 buffers the entire generation and surfaces it on the final chunk to prevent raw tool-call markup from leaking as delta.content. This is correct but turns the request into effectively non-streaming for plain-text responses — the client sees nothing until the model stops. 
Every concrete tool parser shipped with vLLM 0.23+ already implements extract_tool_calls_streaming (Granite4, Qwen3Coder, DeepSeekV31, Jamba, Ernie45, Hermes2Pro, llama3_json, mistral, …). Use it: instantiate the parser before the streaming loop and call its streaming method per delta, emitting DeltaMessage(content=…) or DeltaMessage(tool_calls=[…]) when the parser is ready. Falls back to the existing #10346 buffer path when: - the parser does not have extract_tool_calls_streaming, OR - extract_tool_calls_streaming raises mid-stream (logged, the rest of the request finishes via post-loop extract_tool_calls). Tests (TestStreamingToolParser): 1. Buffer path: no markup leaked, no content duplication 2. Native streaming: plain-text response streams progressively 3. Native streaming: tool_call structured, no markup leaked 4. Native streaming exception → graceful fallback, no markup, no crash 5. No tool parser → unchanged per-delta content stream E2E verified against qwen3_coder on vLLM 0.23.0 (NVIDIA GB10 / arm64 / CUDA 13). Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * docs(vllm): add server-side TTFT benchmark for the streaming tool-parser path Self-contained stdlib-only script that measures time-to-first-token (TTFT) for the vLLM backend's two streaming scenarios: - tool_call: request mentions a tool; model is expected to call it - plain_text: request offers a tool but explicitly asks for prose Use this to compare: - the buffer-all path (#10346) → plain_text TTFT ≈ total response time - the native-streaming path (this PR) → plain_text TTFT ≈ true first-token time python examples/vllm-bench/ttft_streaming_tool_parser.py \\ --url http://localhost:8080 --model my-coder --runs 3 Lives under examples/ so it does not interfere with the test suite. 
Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> * examples/vllm-bench: add long-text scenario (8 paragraphs, 1500 tokens) The long-text scenario shows the buffering vs streaming difference most dramatically: with the buffer-all path, the client receives nothing for 20+ seconds and then the entire 1500-token response at once. With native streaming, the first token arrives in tens of milliseconds and the response flows progressively. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> --------- Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com> Co-authored-by: Philipp Wacker <philipp.wacker@ibf-solutions.com>	2026-06-21 17:07:15 +02:00
番茄摔成番茄酱	01fa12e0de	feat(nemo): enable word-level timestamps for ASR models (#10297 ) * feat(nemo): enable word-level timestamps for ASR models The nemo backend ignored timestamp_granularities and always returned a single segment with start=0 end=0, making word-level timestamps impossible to obtain even though the NeMo models (parakeet-tdt, etc.) fully support them. Changes: - Add _get_stride_seconds() to compute frame duration from the model's preprocessor window_stride and encoder subsampling_factor. - Add _build_segments_with_words() that extracts word offsets from the NeMo Hypothesis.timestamp dict and converts frame indices to nanosecond timestamps. - Support 'word' granularity (one segment per word) and 'segment' granularity (merge at time-gap boundaries using a dynamic threshold). - Populate TranscriptSegment.words with TranscriptWord entries so callers get both segment-level and word-level timing. - Only request timestamps from NeMo when the caller actually asks for them (timestamp_granularities is non-empty), keeping the fast path unchanged for callers that don't need timestamps. Tested with nvidia/parakeet-tdt-0.6b-v3 on the JFK "ask not" clip: curl -X POST /v1/audio/transcriptions \ -F file=@jfk.wav -F model=nemo-parakeet-tdt-0.6b \ -F 'timestamp_granularities[]=word' -F response_format=verbose_json → each word has correct start/end times in seconds. Signed-off-by: fqscfqj <fqscfqj@outlook.com> * fix(nemo): address Copilot review feedback - Narrow exception handling in _get_stride_seconds to catch only AttributeError, KeyError, TypeError instead of bare Exception, and emit a warning when falling back to the hardcoded stride. - Remove explicit return_hypotheses=False when timestamps are requested; timestamps=True already forces NeMo to return Hypothesis objects. - Add a warning when NeMo does not return Hypothesis objects despite timestamps being requested. 
Signed-off-by: fqscfqj <fqscfqj@outlook.com> --------- Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-21 17:04:19 +02:00
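The frame-to-timestamp conversion mentioned in the nemo commit above comes down to simple arithmetic. The sketch below assumes the frame duration is the product of the preprocessor window stride and the encoder subsampling factor, which is the natural reading of the commit but is not confirmed by it; the backend itself is Python, and the function name and the example values (0.01 s stride, factor 8) are purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// framesToNanos converts an encoder frame index to nanoseconds, assuming
// one frame spans windowStride seconds times the subsampling factor.
func framesToNanos(frame int, windowStride float64, subsampling int) int64 {
	strideSeconds := windowStride * float64(subsampling)
	// Round to avoid off-by-one nanosecond results from float error.
	return int64(math.Round(float64(frame) * strideSeconds * 1e9))
}

func main() {
	// Illustrative values only: 0.01 s stride and 8x subsampling give
	// 0.08 s per frame.
	fmt.Println(framesToNanos(25, 0.01, 8))
}
```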
番茄摔成番茄酱	cf7f9573a2	fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421 ) The parakeet-specific word accessors can return stale initialisation data (model name, binary blobs) for segments with no real speech. Add isValidWord() to filter out words that have: - empty or whitespace-only text - U+FFFD replacement characters (from binary data scrubbing) - negative timestamps - zero duration (end <= start) Also skip empty segments entirely when they have no recognisable content (empty text AND no valid words), preventing spurious subtitle entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'. Applies to both AudioTranscription and AudioTranscriptionStream. Signed-off-by: fqscfqj <fqscfqj@outlook.com>	2026-06-21 17:03:33 +02:00
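The four rejection rules listed in the crispasr commit above translate into a small predicate. This is a sketch under assumptions: the `word` struct and its float-second timestamps are hypothetical, and only the rules themselves come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// word is a hypothetical word-level timestamp entry (times in seconds).
type word struct {
	Text       string
	Start, End float64
}

// isValidWord applies the filters named in the commit: reject empty or
// whitespace-only text, U+FFFD replacement characters, negative
// timestamps, and zero or negative duration.
func isValidWord(w word) bool {
	if strings.TrimSpace(w.Text) == "" {
		return false
	}
	if strings.ContainsRune(w.Text, '\uFFFD') {
		return false
	}
	if w.Start < 0 || w.End < 0 {
		return false
	}
	return w.End > w.Start
}

func main() {
	fmt.Println(isValidWord(word{"ask", 1.0, 1.2}))
	fmt.Println(isValidWord(word{"parakeet@rH\uFFFDI", 2733.592, 2733.592}))
}
```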
pos-ei-don	c6303104c7	fix(vllm): structured outputs silently ignored on vLLM >= 0.23 (GuidedDecodingParams removed) (#10343 ) fix(vllm): structured outputs silently ignored on vLLM >= 0.23 vLLM >= 0.23 removed GuidedDecodingParams (now StructuredOutputsParams) and renamed the SamplingParams field guided_decoding -> structured_outputs. The import failed, HAS_GUIDED_DECODING became False, and the whole guided-decoding block was skipped, so response_format / grammar constraints were silently ignored. Adapt the existing request.Grammar path to the new class/field. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-06-21 17:02:31 +02:00

6800 Commits