Commit Graph

6106 Commits

Author SHA1 Message Date
LocalAI [bot]
0f3bb2d647 chore(model gallery): 🤖 add 1 new models via gallery agent (#9481)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 08:22:05 +02:00
Adira
607efe5a4c fix(backend-monitor): accept model as a query parameter (#9411)
The /backend/monitor endpoint is routed as GET but its handler bound the
model name from a request body, which is invalid per REST and breaks
Swagger UI and OpenAPI codegen tools that refuse to send bodies with GET.

Switch to reading ?model=<name> as a query parameter and update the
Swagger annotation, regenerated spec files, and documentation. The
handler still falls back to body binding when the query parameter is
absent, so existing clients sending {"model": "..."} continue to work.

Fixes #9207

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-04-21 22:06:35 +02:00
Ettore Di Giacinto
7d8c1d5e45 fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush (#9470)
* fix(streaming): dedupe content, recover reasoning, unique tool IDs

When tool calls are discovered only during final parsing (after the
streaming token callback returns), processTools' default switch branch
used to emit the full accumulated content alongside the tool_call args
chunk. Clients that accumulate delta.content per the OpenAI streaming
contract end up showing every narration line twice. Three related bugs
in the same flush path:

1. Content duplication: the args chunk carried Content:textContentToReturn
   even though the text had already been streamed token-by-token via
   the token callback, so delta.content was both the running total and
   bundled with tool_calls in one delta (two spec violations).
2. Reasoning drop: when the C++ autoparser surfaces reasoning only as
   a final aggregate (no incremental tokens), the callback never emits
   it and the flush branch didn't either, silently losing it.
3. tool_call ID collision: empty ss.ID fell back to the request id, so
   multiple empty-ID calls in the same turn all shared the same id,
   breaking tool_result matching by tool_call_id.

Extracted the block into buildDeferredToolCallChunks (pure function,
unit-testable) and added 19 Ginkgo specs covering streamed vs.
not-streamed content/reasoning, single vs. multi call, and
incremental-vs-deferred emission. Every case asserts the invariant
that no delta carries both non-empty Content/Reasoning and non-empty
ToolCalls.

Fix summary:
- emit reasoning in its own leading chunk when !reasoningAlreadyStreamed
- emit role+content in their own chunks when !contentAlreadyStreamed
- drop Content from the tool_call args chunk
- fallback to fmt.Sprintf("%s-%d", id, i) for empty ss.ID so calls stay
  uniquely addressable

Reproduced live against qwen3.6-35b-a3b-apex served by LocalAI with
the C++ autoparser; the full-content replay chunk that preceded each
tool_calls block is gone after the fix.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): dedupe reasoning in the noActionToRun final chunk

extractor.Reasoning() returns only the Go-side extractor's lastReasoning
accumulator (pkg/reasoning/extractor.go:129). ChatDelta reasoning
coming through ProcessChatDeltaReasoning lives in a separate
accumulator (cdLastStrippedReasoning) that Reasoning() does not
expose. The "reasoning != \"\" && extractor.Reasoning() == \"\"" guard
therefore fires exactly when the autoparser streamed reasoning
incrementally via the callback — producing a duplicate final delivery.

Replace both guard sites in the noActionToRun branch with the
sentReasoning flag introduced in the previous commit. Extract the
closing-chunk logic into buildNoActionFinalChunks so the refactor is
testable; the helper mirrors buildDeferredToolCallChunks.

Add Ginkgo coverage for both the content-streamed and
content-not-streamed paths: reasoning is dropped when it was streamed,
delivered once when it arrived only as a final aggregate, and omitted
when empty. Metadata invariants carried over from the sibling helper.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): detect noActionToRun anywhere in functionResults

The previous condition only looked at functionResults[0].Name, which
misbehaved when a real tool call followed a noAction sentinel — the
noAction shadowed the real call and the whole turn was treated as a
question to answer, silently dropping the tool call. The mirror case,
[realCall, noActionCall], fell into the default branch and emitted the
noAction entry as if it were a real tool_call.

Replace with hasRealCall, which scans the slice and returns true as
soon as it finds a non-noAction entry. noActionToRun now matches the
semantic intent: "every entry is the noAction sentinel (or the slice
is empty)".

Note: this does not change incremental emission, where noAction
entries may still be forwarded as tool_call chunks by the XML/JSON
iterative parsers. That is a separate layer (functions.Parse*) and
addressing it requires threading noAction through the parser APIs —
out of scope for this change.

Assisted-by: Claude:claude-opus-4-7 go vet
2026-04-21 21:59:33 +02:00
leinasi2014
d18d434bb2 Respect explicit reasoning config during GGUF thinking probe (#9463)
Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-21 21:53:10 +02:00
Ettore Di Giacinto
39573ecd2a chore(whisperx): drop ROCm/hipblas build target (#9474)
whisperx has no upstream AMD GPU support and its core transcription path
(faster-whisper -> ctranslate2) falls back to CPU on AMD since the PyPI
ctranslate2 is CUDA-only. The torch rocm wheels would accelerate only the
alignment/diarization stages, producing a misleadingly half-working image.

Drop the hipblas variant rather than shipping a partially accelerated build
users can't distinguish from the real thing. AMD hosts now fall through
the capability map to cpu-whisperx / cpu-whisperx-development.

Also removes the now-dangling rocm-whisperx assertion from
pkg/system/capabilities_test.go and the ROCm mention from the whisperx
row in docs/content/reference/compatibility-table.md.

Assisted-by: Claude Code:claude-opus-4-7
2026-04-21 21:50:18 +02:00
Ettore Di Giacinto
a7dbb2a83d fix(gallery-agent): process blacklist command on recently-closed PRs (#9473)
The command-processing step only walked open PRs, so when a maintainer
wrote `/gallery-agent blacklist` and immediately closed the PR, the
next scheduled run missed the command, the `gallery-agent/blacklisted`
label was never applied, and the skip-URL step (which only pulls URLs
from closed PRs carrying that label) re-proposed the model on the next
cron.

Also scan closed gallery-agent PRs from the last 14 days that don't
already carry the blacklist label, and apply the label retroactively
when the command is present. Close/recreate actions still only run on
open PRs.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 16:29:13 +02:00
dependabot[bot]
3ad9b16c29 chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 (#9455)
Bumps [github.com/coreos/go-oidc/v3](https://github.com/coreos/go-oidc) from 3.17.0 to 3.18.0.
- [Release notes](https://github.com/coreos/go-oidc/releases)
- [Commits](https://github.com/coreos/go-oidc/compare/v3.17.0...v3.18.0)

---
updated-dependencies:
- dependency-name: github.com/coreos/go-oidc/v3
  dependency-version: 3.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:31:02 +02:00
dependabot[bot]
c806d5ab73 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 (#9456)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.14 to 1.32.16.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.14...config/v1.32.16)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.16
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:30:22 +02:00
LocalAI [bot]
47efaf5b43 Fix: Add model parameter to neutts-air gallery definition (#8793)
fix: Add model parameter to neutts-air gallery definition

The neutts-air model entry was missing the 'model' parameter in its
configuration, which caused LocalAI to fail with an 'Unrecognized model'
error when trying to use it. This change adds the required model parameter
pointing to the HuggingFace repository (neuphonic/neutts-air) so the backend
can properly load the model.

Fixes #8792

Signed-off-by: localai-bot <localai-bot@example.com>
Co-authored-by: localai-bot <localai-bot@example.com>
2026-04-21 11:56:00 +02:00
LocalAI [bot]
315b634a91 feat: improve CLI error messages with actionable guidance (#8880)
- transcript.go: Model not found error now suggests available models commands
- util.go: GGUF error explains format and how to get models
- worker_p2p.go: Token error explains purpose and how to obtain one
- run.go: Startup failure includes troubleshooting steps and docs link
- model_config_loader.go: Config validation errors include file path and guidance

Refs: H2 - UX Review Issue

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 11:53:26 +02:00
dependabot[bot]
6b245299d7 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 (#9454)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:43:00 +02:00
dependabot[bot]
677c0315c1 chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 (#9453)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.30 to 1.7.31.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.30...v1.7.31)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-version: 1.7.31
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:43 +02:00
dependabot[bot]
478522ce4d chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 (#9452)
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.97.1 to 1.99.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.97.1...service/s3/v1.99.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.99.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:27 +02:00
Ettore Di Giacinto
c54897ad44 fix(tests): update InstallBackend call sites for new URI/Name/Alias params (#9467)
Commit 02bb715c (#9446) added uri, name, alias parameters to
RemoteUnloaderAdapter.InstallBackend but missed the e2e test call
sites, breaking the distributed test build. Pass empty strings to
match the pattern used by the other non-URI call sites.

Assisted-by: Claude Code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 11:41:38 +02:00
LocalAI [bot]
8bb1e8f21f chore: ⬆️ Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 (#9448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:15:45 +02:00
LocalAI [bot]
cd94a0b61a chore: ⬆️ Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe (#9450)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:09:05 +02:00
LocalAI [bot]
047bc48fa9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:07:07 +02:00
sec171
01bd8ae5d0 [gallery] Fix duplicate sha256 keys in Wan models (#9461)
Fix duplicate sha256 keys in wan models gallery

The wan models previously defined the `sha256` key twice in their files lists,
which triggered strict mapping key checks in the YAML parser and resulted
in unmarshal errors that crashed the `/api/models` loading. This removes
the redundant trailing `sha256` keys from the Wan model definitions.

Assisted-by: Antigravity:Gemini-3.1-Pro-High [multi_replace_file_content, run_command]

Signed-off-by: Alex <codecrusher24@gmail.com>
2026-04-21 11:06:36 +02:00
LocalAI [bot]
d9808769be chore(model-gallery): ⬆️ update checksum (#9451)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:58 +02:00
LocalAI [bot]
5973c0a9df chore: ⬆️ Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e (#9447)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:19 +02:00
leinasi2014
486b5e25a3 fix(config): ignore yaml backup files in model loader (#9443)
Only load files whose real extension is .yaml or .yml so backup files like model.yaml.bak do not override active configs. Add a regression test covering plain and timestamped backup files.

Assisted-by: Codex:gpt-5.4 docker

Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
2026-04-20 23:41:39 +02:00
Russell Sim
c66c41e8d7 fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445)
Commit 8839a71c exposed AMDGPU_TARGETS as an ARG/ENV in
Dockerfile.llama-cpp so GPU targets could be overridden, but never
wired the value through the CI workflow inputs. Without it, Docker
receives AMDGPU_TARGETS="" which overrides the Makefile's ?= default,
causing all hipblas builds to compile only for gfx906 regardless of
the target list in the Makefile.

Add amdgpu-targets as a workflow_call input with the same default list
as the Makefile, and pass it as AMDGPU_TARGETS in the build-args of
both the push and PR build steps.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:41:19 +02:00
Russell Sim
02bb715c0a fix(distributed): pass ExternalURI through NATS backend install (#9446)
When installing a backend with a custom OCI URI in distributed mode,
the URI was captured in ManagementOp.ExternalURI by the HTTP handler
but never forwarded to workers. BackendInstallRequest had no URI field,
so workers fell through to the gallery lookup and failed with
"no backend found with name <custom-name>".

Add URI/Name/Alias fields to BackendInstallRequest and thread them from
ManagementOp through DistributedBackendManager.InstallBackend() and the
RemoteUnloaderAdapter. On the worker side, route to InstallExternalBackend
when URI is set instead of InstallBackendFromGallery. Update all
remaining InstallBackend call sites (UpgradeBackend, reconciler
pending-op drain, router auto-install) to pass empty strings for the
new params.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:39:35 +02:00
Ettore Di Giacinto
8ab56e2ad3 feat(gallery): add wan i2v 720p (#9457)
feat(gallery): add Wan 2.1 I2V 14B 720P + pin all wan ggufs by sha256

Adds a new entry for the native-720p image-to-video sibling of the
480p I2V model (wan-2.1-i2v-14b-480p-ggml). The 720p I2V model is
trained purely as image-to-video — no first-last-frame interpolation
path — so motion is freer than repurposing the FLF2V 720P variant as
an i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h
auxiliary files as the existing 480p I2V and 720p FLF2V entries, so
no new aux downloads are introduced.

Also pins the main diffusion gguf by sha256 for the new entry and for
the three existing wan entries that were previously missing a hash
(wan-2.1-t2v-1.3b-ggml, wan-2.1-i2v-14b-480p-ggml,
wan-2.1-flf2v-14b-720p-ggml). Hashes were fetched from HuggingFace's
x-linked-etag header per .agents/adding-gallery-models.md.

Assisted-by: Claude:claude-opus-4-7
2026-04-20 23:34:11 +02:00
pjbrzozowski
ecf85fde9e fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427)
The API Traces tab in /app/traces always showed (0) traces despite requests
being recorded.

The /api/traces endpoint was registered in both localai.go and ui_api.go.
The ui_api.go version wrapped the response as {"traces": [...]} instead of
the flat []APIExchange array that both the React UI (Traces.jsx) and the
legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go
handler, Array.isArray(apiData) always returned false, making the API Traces
tab permanently empty.

Remove the duplicate endpoints from ui_api.go so only the correct flat-array
version in localai.go is served.

Also use mime.ParseMediaType for the Content-Type check in the trace
middleware so requests with parameters (e.g. application/json; charset=utf-8)
are still traced.

Signed-off-by: Pawel Brzozowski <paul@ontux.net>
Co-authored-by: Pawel Brzozowski <paul@ontux.net>
2026-04-20 18:44:49 +02:00
Sai Asish Y
6480715a16 fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438)
GET /api/settings returns settings.ApiKeys as the merged env+runtime list
via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and
round-trips it back on POST /api/settings unchanged.

UpdateSettingsEndpoint was then doing:

    appConfig.ApiKeys = append(envKeys, runtimeKeys...)

where runtimeKeys already contained envKeys (because the UI got them from
the merged GET). Every save therefore duplicated the env keys on top of
the previous merge, and also wrote the duplicates to runtime_settings.json
so the duplication survived restarts and compounded with each save. This
is the user-visible behaviour in #9071: the Web UI shows the keys
twice / three times after consecutive saves.

Before we marshal the settings to disk or call ApplyRuntimeSettings, drop
any incoming key that already appears in startupConfig.ApiKeys. The file
on disk now stores only the genuinely runtime-added keys; the subsequent
append(envKeys, runtimeKeys...) produces one copy of each env key, as
intended. Behaviour is unchanged for users who never had env keys set.

Fixes #9071

Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>
2026-04-20 10:36:54 +02:00
Ettore Di Giacinto
f683231811 feat(gallery): add Wan 2.1 FLF2V 14B 720P (#9440)
First-last-frame-to-video variant of the 14B Wan family. Accepts a
start and end reference image and — unlike the pure i2v path — runs
both through clip_vision, so the final frame lands on the end image
both in pixel and semantic space. Right pick for seamless loops
(start_image == end_image) and narrative A→B cuts.

Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the
I2V 14B entry. Options block mirrors i2v's full-list-in-override
style so the template merge doesn't drop fields.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:34:36 +02:00
LocalAI [bot]
960757f0e8 chore(model gallery): 🤖 add 1 new models via gallery agent (#9436)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 08:48:47 +02:00
Ettore Di Giacinto
865fd552f5 docs(agents): adopt kernel's AI coding assistants policy
Align LocalAI with the Linux kernel project's policy for AI-assisted
contributions (https://docs.kernel.org/process/coding-assistants.html).

- Add .agents/ai-coding-assistants.md with the full policy adapted to
  LocalAI's MIT license: no Signed-off-by or Co-Authored-By from AI,
  attribute AI involvement via an Assisted-by: trailer, human submitter
  owns the contribution.
- Surface the rules at the entry points: AGENTS.md (and its CLAUDE.md
  symlink) and CONTRIBUTING.md.
- Publish a user-facing reference page at
  docs/content/reference/ai-coding-assistants.md and link it from the
  references index.

Assisted-by: Claude:claude-opus-4-7
2026-04-19 22:50:54 +00:00
LocalAI [bot]
cb77a5a4b9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9425)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:42:44 +02:00
Ettore Di Giacinto
60633c4dd5 fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435)
gen_video's ffmpeg subprocess was relying on the filename extension to
choose the output container. Distributed LocalAI hands the backend a
staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to
.mp4 only after the backend returns, so ffmpeg saw a .tmp extension and
bailed with "Unable to choose an output format". Inference had already
completed and the frames were piped in, producing the cryptic
"video inference failed (code 1)" at the API layer.

Pass -f mp4 explicitly so the container is selected by flag instead of
by filename suffix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 00:41:54 +02:00
Ettore Di Giacinto
9e44944cc1 fix(i2v): Add new options to the model configuration
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-20 00:27:05 +02:00
Ettore Di Giacinto
372eb08dcf fix(gallery): allow uninstalling orphaned meta backends + force reinstall (#9434)
Two interrelated bugs that combined to make a meta backend impossible
to uninstall once its concrete had been removed from disk (partial
install, earlier crash, manual cleanup).

1. DeleteBackendFromSystem returned "meta backend %q not found" and
   bailed out early when the concrete directory didn't exist,
   preventing the orphaned meta dir from ever being removed. Treat a
   missing concrete as idempotent success — log a warning and continue
   to remove the orphan meta.

2. InstallBackendFromGallery's "already installed, skip" short-circuit
   only checked that the name was known (`backends.Exists(name)`); an
   orphaned meta whose RunFile points at a missing concrete still
   satisfies that check, so every reinstall returned nil without doing
   anything. Afterwards the worker's findBackend returned empty and we
   kept looping with "backend %q not found after install attempt".
   Require the entry to be actually runnable (run.sh stat-able, not a
   directory) before skipping.

New helper isBackendRunnable centralises the runnability test so both
the install guard and future callers stay in sync. Tests cover the
orphaned-meta delete path and the non-runnable short-circuit case.
2026-04-20 00:10:19 +02:00
LocalAI [bot]
28091d626e chore: ⬆️ Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 (#9430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:01:48 +02:00
LocalAI [bot]
cae79d9107 feat(swagger): update swagger (#9431)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:50 +02:00
LocalAI [bot]
babbbc6ec8 chore: ⬆️ Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad (#9429)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:19 +02:00
LocalAI [bot]
3804497186 chore: ⬆️ Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 (#9428)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:07 +02:00
Ettore Di Giacinto
fda1c553a1 fix(distributed): stop queue loops on agent nodes + dead-letter cap (#9433)
pending_backend_ops rows targeting agent-type workers looped forever:
the reconciler fan-out hit a NATS subject the worker doesn't subscribe
to, returned ErrNoResponders, we marked the node unhealthy, and the
health monitor flipped it back to healthy on the next heartbeat. Next
tick, same row, same failure.

Three related fixes:

1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend.
   Agent workers handle agent NATS subjects, not backend.install /
   delete / list, so enqueueing for them guarantees an infinite retry
   loop. Silent skip is correct — they aren't consumers of these ops.

2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on
   nats.ErrNoResponders: mark the node unhealthy before recording the
   failure, so subsequent ListDuePendingBackendOps (filters by
   status=healthy) stops picking the row until the node actually
   recovers. Matches the synchronous fan-out path.

3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of
   exponential backoff the row is a poison message; further retries
   just thrash NATS. Row is deleted and logged at ERROR so it stays
   visible without staying infinite.

Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows
that target agent-type nodes, non-existent nodes, or carry an empty
backend name. Guarded by the same schema-migration advisory lock so
only one instance performs it. The guards above prevent new rows of
this shape; this closes the migration gap for existing ones.

Tests: the prune migration (valid row stays, agent + empty-name rows
drop) on top of existing upsert / backoff coverage.
2026-04-19 23:38:43 +02:00
Ettore Di Giacinto
b27de08fff chore(gallery): fixup wan
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 21:31:22 +00:00
Ettore Di Giacinto
510f791ccc feat(gallery): add stablediffusion-ggml-development meta backend 2026-04-19 20:16:33 +00:00
Ettore Di Giacinto
369c50a41c fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)
* fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc

The upstream PR #21203 (server: respect the ignore_eos flag) has been
merged into the TheTom/llama-cpp-turboquant feature/turboquant-kv-cache
branch. With the fix now in-tree, 0001-server-respect-the-ignore-eos-flag.patch
no longer applies (git apply sees its additions already present) and the
nightly turboquant bump fails.

Retire the patch and bump the pin to the first fork revision that carries
the merged fix (tag feature-turboquant-kv-cache-b8967-627ebbc). This matches
the contract in apply-patches.sh: drop patches once the fork catches up.

* fix(turboquant): patch out get_media_marker() call in grpc-server copy

CI turboquant docker build was failing with:

  grpc-server.cpp:2825:40: error: use of undeclared identifier
  'get_media_marker'

The call was added by 7809c5f5 (PR #9412) to propagate the mtmd random
per-server media marker upstream landed in ggml-org/llama.cpp#21962. The
TheTom/llama-cpp-turboquant fork branched before that PR, so its
server-common.cpp has no such symbol.

Extend patch-grpc-server.sh to substitute get_media_marker() with the
legacy "<__media__>" literal in the build-time grpc-server.cpp copy
under turboquant-<flavor>-build/. The fork's mtmd_default_marker()
returns exactly that string, and the Go layer falls back to the same
sentinel when media_marker is empty, so behavior on the turboquant path
is unchanged. Patched copy only — the shared source under
backend/cpp/llama-cpp/ keeps compiling against vanilla upstream.

Verified by running `make docker-build-turboquant` locally end-to-end:
all five flavors (avx, avx2, avx512, fallback, grpc+rpc-server) now
compile past the previous failure and the image tags successfully.
2026-04-19 21:05:21 +02:00
Ettore Di Giacinto
75a63f87d8 feat(distributed): sync state with frontends, better backend management reporting (#9426)
* fix(distributed): detect backend upgrades across worker nodes

Before this change `DistributedBackendManager.CheckUpgrades` delegated to the
local manager, which read backends from the frontend filesystem. In
distributed deployments the frontend has no backends installed locally —
they live on workers — so the upgrade-detection loop never ran and the UI
silently never surfaced upgrades even when the gallery advertised newer
versions or digests.

Worker-side: NATS backend.list reply now carries Version, URI and Digest
for each installed backend (read from metadata.json).

Frontend-side: DistributedBackendManager.ListBackends aggregates per-node
refs (name, status, version, digest) instead of deduping, and CheckUpgrades
feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint
factored out of CheckBackendUpgrades so both paths share the same core
logic.

Cluster drift policy: when per-node version/digest tuples disagree, the
backend is flagged upgradeable regardless of whether any single node
matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so
operators can see *why* it is out of sync. The next upgrade-all realigns
the cluster.

Tests cover: drift detection, unanimous-match (no upgrade), and the
empty-installed-version path that the old distributed code silently
missed.

* feat(ui): surface backend upgrades in the System page

The System page (Manage.jsx) only showed updates as a tiny inline arrow,
so operators routinely missed them. Port the Backend Gallery's upgrade UX
so System speaks the same visual language:

- Yellow banner at the top of the Backends tab when upgrades are pending,
  with an "Upgrade all" button (serial fan-out, matches the gallery) and a
  "Updates only" filter toggle.
- Warning pill (↑ N) next to the tab label so the count is glanceable even
  when the banner is scrolled out of view.
- Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button
  that silently flipped semantics between Reinstall and Upgrade), plus an
  "Update available" badge in the new Version column.
- New columns: Version (with upgrade + drift chips), Nodes (per-node
  attribution badges for distributed mode, degrading to a compact
  "on N nodes · M offline" chip above three nodes), Installed (relative
  time).
- System backends render a "Protected" chip instead of a bare "—" so rows
  still align and the reason is obvious.
- Delete uses the softer btn-danger-ghost so rows don't scream red; the
  ConfirmDialog still owns the "are you sure".

The upgrade checker also needed the same per-worker fix as the previous
commit: NewUpgradeChecker now takes a BackendManager getter so its
periodic runs call the distributed CheckUpgrades (which asks workers)
instead of the empty frontend filesystem. Without this the /api/backends/
upgrades endpoint stayed empty in distributed mode even with the protocol
change in place.

New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack,
.cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in
App.css so other pages can adopt them without duplicating styles.

* feat(ui): polish the Nodes page so it reads like a product

The Nodes page was the biggest visual liability in distributed mode.
Rework the main dashboard surfaces in place without changing behavior:

StatCards: uniform height (96px min), left accent bar colored by the
metric's semantic (success/warning/error/primary), icon lives in a
36x36 soft-tinted chip top-right, value is left-aligned and large.
Grid auto-fills so the row doesn't collapse on narrow viewports. This
replaces the previous thin-bordered boxes with inconsistent heights.

Table rows: expandable rows now show a chevron cue on the left (rotates
on expand) so users know rows open. Status cell became a dedicated chip
with an LED-style halo dot instead of a bare bullet. Action buttons gained
labels — "Approve", "Resume", "Drain" — so the icons aren't doing all
the semantic work; the destructive remove action uses the softer
btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog
still owning the real "are you sure". Applied cell-mono/cell-muted
utility classes so label chips and addresses share one spacing/font
grammar instead of re-declaring inline styles everywhere.

Expanded drawer: empty states for Loaded Models and Installed Backends
now render as a proper drawer-empty card (dashed border, icon, one-line
hint) instead of a plain muted string that read like broken formatting.

Tabs: three inline-styled buttons became the shared .tab class so they
inherit focus ring, hover state, and the rest of the design system —
matches the System page.

"Add more workers" toggle turned into a .nodes-add-worker dashed-border
button labelled "Register a new worker" (action voice) instead of a
chevron + muted link that operators kept mistaking for broken text.

New shared CSS primitives carry over to other pages:
.stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty,
.nodes-add-worker.

* feat(distributed): durable backend fan-out + state reconciliation

Two connected problems handled together:

1) Backend delete/install/upgrade used to silently skip non-healthy nodes,
   so a delete during an outage left a zombie on the offline node once it
   returned. The fan-out now records intent in a new pending_backend_ops
   table before attempting the NATS round-trip. Currently-healthy nodes
   get an immediate attempt; everyone else is queued. Unique index on
   (node_id, backend, op) means reissuing the same operation refreshes
   next_retry_at instead of stacking duplicates.

2) Loaded-model state could drift from reality: a worker OOM'd, got
   killed, or restarted a backend process would leave a node_models row
   claiming the model was still loaded, feeding ghost entries into the
   /api/nodes/models listing and the router's scheduling decisions.

The existing ReplicaReconciler gains two new passes that run under a
fresh KeyStateReconciler advisory lock (non-blocking, so one wedged
frontend doesn't freeze the cluster):

  - drainPendingBackendOps: retries queued ops whose next_retry_at has
    passed on currently-healthy nodes. Success deletes the row; failure
    bumps attempts and pushes next_retry_at out with exponential backoff
    (30s → 15m cap). ErrNoResponders also marks the node unhealthy.

  - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are
    loaded but hasn't seen touched in the last probeStaleAfter (2m).
    Unreachable addresses are removed from the registry. A pluggable
    ModelProber lets tests substitute a fake without standing up gRPC.

DistributedBackendManager exposes DeleteBackendDetailed so the HTTP
handler can surface per-node outcomes ("2 succeeded, 1 queued") to the
UI in a follow-up commit; the existing DeleteBackend still returns
error-only for callers that don't care about node breakdown.

Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx
on a new key so N frontends coordinate — the same pattern the health
monitor and replica reconciler already rely on. Single-node mode runs
both passes inline (adapter is nil, state drain is a no-op).

Tests cover the upsert semantics, backoff math, the probe removing an
unreachable model but keeping a reachable one, and filtering by
probeStaleAfter.

* feat(ui): show cluster distribution of models in the System page

When a frontend restarted in distributed mode, models that workers had
already loaded weren't visible until the operator clicked into each node
manually — the /api/models/capabilities endpoint only knew about
configs on the frontend's filesystem, not the registry-backed truth.

/api/models/capabilities now joins in ListAllLoadedModels() when the
registry is active, returning loaded_on[] with node id/name/state/status
for each model. Models that live in the registry but lack a local config
(the actual ghosts, not recovered from the frontend's file cache) still
surface with source="registry-only" so operators can see and persist
them; without that emission they'd be invisible to this frontend.

Manage → Models replaces the old Running/Idle pill with a distribution
cell that lists the first three nodes the model is loaded on as chips
colored by state (green loaded, blue loading, amber anything else). On
wider clusters the remaining count collapses into a +N chip with a
title-attribute breakdown. Disabled / single-node behavior unchanged.

Adopted models get an extra "Adopted" ghost-icon chip with hover copy
explaining what it means and how to make it permanent.

Distributed mode also enables a 10s auto-refresh and a "Last synced Xs
ago" indicator next to the Update button so ghost rows drop off within
one reconcile tick after their owning process dies. Non-distributed
mode is untouched — no polling, no cell-stack, same old Running/Idle.

* feat(ui): NodeDistributionChip — shared per-node attribution component

Large clusters were going to break the Manage → Backends Nodes column:
the old inline logic rendered every node as a badge and would shred the
layout at >10 workers, plus the Manage → Models distribution cell had
copy-pasted its own slightly-different version.

NodeDistributionChip handles any cluster size with two render modes:
  - small (≤3 nodes): inline chips of node names, colored by health.
  - large: a single "on N nodes · M offline · K drift" summary chip;
    clicking opens a Popover with a per-node table (name, status,
    version, digest for backends; name, status, state for models).

Drift counting mirrors the backend's summarizeNodeDrift so the UI
number matches UpgradeInfo.NodeDrift. Digests are truncated to the
docker-style 12-char form with the full value preserved in the title.

Popover is a new general-purpose primitive: fixed positioning anchored
to the trigger, flips above when there's no room below, closes on
outside-click or Escape, returns focus to the trigger. Uses .card as
its surface so theming is inherited. Also useful for a future
labels-editor popup and the user menu.

Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell
and uses the shared chip with context="backends" / "models"
respectively. Delete code removes ~40 lines of ad-hoc logic.

* feat(ui): shared FilterBar across the System page tabs

The Backends gallery had a nice search + chip + toggle strip; the System
page had nothing, so the two surfaces felt like different apps. Lift the
pattern into a reusable FilterBar and wire both System tabs through it.

New component core/http/react-ui/src/components/FilterBar.jsx renders a
search input, a role="tablist" chip row (aria-selected for a11y), and
optional toggles / right slot. Chips support an optional `count` which
the System page uses to show "User 3", "Updates 1" etc.

System Models tab: search by id or backend; chips for
All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in
distributed mode. "Last synced" + Update button live in the right slot.

System Backends tab: search by name/alias/meta-backend-for; chips for
All/User/System/Meta plus conditional Updates / Offline-nodes chips
when relevant. The old ad-hoc "Updates only" toggle from the upgrade
banner folded into the Updates chip — one source of truth for that
filter. Offline chip only appears in distributed mode when at least
one backend has an unhealthy node, so the chip row stays quiet on
healthy clusters.

Filter state persists in URL query params (mq/mf/bq/bf) so deep links
and tab switches keep the operator's filter context instead of
resetting every time.

Also adds an "Adopted" distribution path: when a model in
/api/models/capabilities carries source="registry-only" (discovered on
a worker but not configured locally), the Models tab shows a ghost chip
labelled "Adopted" with hover copy explaining how to persist it — this
is what closes the loop on the ghost-model story end-to-end.
2026-04-19 17:55:53 +02:00
Ettore Di Giacinto
9cd8d7951f fix(kokoros): implement audio_transcription_stream trait stub (#9422)
The backend.proto was updated to add AudioTranscriptionStream RPC, but
the Rust KokorosService was never updated to match the regenerated
tonic trait, breaking compilation with E0046.

Stubs the new streaming method as unimplemented, matching the pattern
used for the other streaming RPCs Kokoros does not support.
2026-04-19 13:29:58 +02:00
LocalAI [bot]
884bfb84c9 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe (#9418)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:27:11 +02:00
LocalAI [bot]
e94a9a8f10 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d (#9417)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-19 09:26:58 +02:00
Ettore Di Giacinto
054c4b4b45 feat(stable-diffusion.ggml): add support for video generation (#9420)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 09:26:33 +02:00
LocalAI [bot]
6e49dba27c chore: ⬆️ Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb (#9415)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:26:05 +02:00
Ettore Di Giacinto
e463820566 fix(ui): fix dark-theme colors in chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-18 23:01:01 +00:00
Keith Mattix II
8839a71c87 fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg (#9410)
Add gfx1151 (AMD Strix Halo / Ryzen AI MAX) to the default AMDGPU_TARGETS
list in the llama-cpp backend Makefile. ROCm 7.2.1 ships with gfx1151
Tensile libraries, so this architecture should be included in default builds.

Also expose AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so that
users building for non-default GPU architectures can override the target
list via --build-arg AMDGPU_TARGETS=<arch>. Previously, passing
-DAMDGPU_TARGETS=<arch> through CMAKE_ARGS was silently overridden by
the Makefile's own append of the default target list.

Fixes #9374

Signed-off-by: Keith Mattix <keithmattix2@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-18 20:39:40 +02:00
Ettore Di Giacinto
117f6430b8 fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)
The shared grpc-server CMakeLists hardcoded `llama-common`, the post-rename
target name in upstream llama.cpp. The turboquant fork branched before that
rename and still exposes the helpers library as `common`, so the name
silently degraded to a plain `-llama-common` link flag, the PUBLIC include
directory was never propagated, and tools/server/server-task.h failed to
find common.h during turboquant-<flavor> builds.
2026-04-18 20:30:28 +02:00