Compare commits

..

42 Commits

Author SHA1 Message Date
Ettore Di Giacinto
9eb21e9a20 fix(turboquant): patch ggml-hip CMakeLists to compile new f16-turbo fattn-vec instances
Fork commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under
ggml-cuda/template-instances/ (fattn-vec-instance-f16-turbo{2,3,4}_0.cu)
and wired matching FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO*)
dispatch cases into fattn.cu.

fattn.cu is shared with the HIP build via hipify, but the fork forgot
to mirror the new source files into ggml/src/ggml-hip/CMakeLists.txt.
CMake's ROCm branch carries a hand-curated template-instance list (used
when GGML_CUDA_FA_ALL_QUANTS is OFF, the default), so the HIP build
ended up with the extern template declarations but no matching
instantiations — the -gpu-rocm-hipblas-turboquant job failed partway
through the 3h+ build.

Add patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch, which the
existing apply-patches.sh machinery applies to the cloned fork sources
after fetch. The patch appends the three new f16-turbo instance files
to ggml-hip's source list in the same interleaved order used by
ggml-cuda's CMakeLists.txt. Drop this patch once the fork syncs the
ROCm list — the build will fail fast if the anchor context goes stale,
which is the signal to retire it.

CUDA builds were unaffected (ggml-cuda's CMakeLists.txt was updated
upstream) — the link failure was isolated to HIP.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-22 07:17:33 +00:00
mudler
85ff7a310f ⬆️ Update TheTom/llama-cpp-turboquant
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-21 21:28:32 +00:00
Adira
607efe5a4c fix(backend-monitor): accept model as a query parameter (#9411)
The /backend/monitor endpoint is routed as GET but its handler bound the
model name from a request body, which is invalid per REST and breaks
Swagger UI and OpenAPI codegen tools that refuse to send bodies with GET.

Switch to reading ?model=<name> as a query parameter and update the
Swagger annotation, regenerated spec files, and documentation. The
handler still falls back to body binding when the query parameter is
absent, so existing clients sending {"model": "..."} continue to work.

Fixes #9207

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-04-21 22:06:35 +02:00
Ettore Di Giacinto
7d8c1d5e45 fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush (#9470)
* fix(streaming): dedupe content, recover reasoning, unique tool IDs

When tool calls are discovered only during final parsing (after the
streaming token callback returns), processTools' default switch branch
used to emit the full accumulated content alongside the tool_call args
chunk. Clients that accumulate delta.content per the OpenAI streaming
contract end up showing every narration line twice. Three related bugs
in the same flush path:

1. Content duplication: the args chunk carried Content:textContentToReturn
   even though the text had already been streamed token-by-token via
   the token callback, so delta.content was both the running total and
   bundled with tool_calls in one delta (two spec violations).
2. Reasoning drop: when the C++ autoparser surfaces reasoning only as
   a final aggregate (no incremental tokens), the callback never emits
   it and the flush branch didn't either, silently losing it.
3. tool_call ID collision: empty ss.ID fell back to the request id, so
   multiple empty-ID calls in the same turn all shared the same id,
   breaking tool_result matching by tool_call_id.

Extracted the block into buildDeferredToolCallChunks (pure function,
unit-testable) and added 19 Ginkgo specs covering streamed vs.
not-streamed content/reasoning, single vs. multi call, and
incremental-vs-deferred emission. Every case asserts the invariant
that no delta carries both non-empty Content/Reasoning and non-empty
ToolCalls.

Fix summary:
- emit reasoning in its own leading chunk when !reasoningAlreadyStreamed
- emit role+content in their own chunks when !contentAlreadyStreamed
- drop Content from the tool_call args chunk
- fallback to fmt.Sprintf("%s-%d", id, i) for empty ss.ID so calls stay
  uniquely addressable

Reproduced live against qwen3.6-35b-a3b-apex served by LocalAI with
the C++ autoparser; the full-content replay chunk that preceded each
tool_calls block is gone after the fix.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): dedupe reasoning in the noActionToRun final chunk

extractor.Reasoning() returns only the Go-side extractor's lastReasoning
accumulator (pkg/reasoning/extractor.go:129). ChatDelta reasoning
coming through ProcessChatDeltaReasoning lives in a separate
accumulator (cdLastStrippedReasoning) that Reasoning() does not
expose. The "reasoning != \"\" && extractor.Reasoning() == \"\"" guard
therefore fires exactly when the autoparser streamed reasoning
incrementally via the callback — producing a duplicate final delivery.

Replace both guard sites in the noActionToRun branch with the
sentReasoning flag introduced in the previous commit. Extract the
closing-chunk logic into buildNoActionFinalChunks so the refactor is
testable; the helper mirrors buildDeferredToolCallChunks.

Add Ginkgo coverage for both the content-streamed and
content-not-streamed paths: reasoning is dropped when it was streamed,
delivered once when it arrived only as a final aggregate, and omitted
when empty. Metadata invariants carried over from the sibling helper.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): detect noActionToRun anywhere in functionResults

The previous condition only looked at functionResults[0].Name, which
misbehaved when a real tool call followed a noAction sentinel — the
noAction shadowed the real call and the whole turn was treated as a
question to answer, silently dropping the tool call. The mirror case,
[realCall, noActionCall], fell into the default branch and emitted the
noAction entry as if it were a real tool_call.

Replace with hasRealCall, which scans the slice and returns true as
soon as it finds a non-noAction entry. noActionToRun now matches the
semantic intent: "every entry is the noAction sentinel (or the slice
is empty)".

Note: this does not change incremental emission, where noAction
entries may still be forwarded as tool_call chunks by the XML/JSON
iterative parsers. That is a separate layer (functions.Parse*) and
addressing it requires threading noAction through the parser APIs —
out of scope for this change.

Assisted-by: Claude:claude-opus-4-7 go vet
2026-04-21 21:59:33 +02:00
leinasi2014
d18d434bb2 Respect explicit reasoning config during GGUF thinking probe (#9463)
Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-21 21:53:10 +02:00
Ettore Di Giacinto
39573ecd2a chore(whisperx): drop ROCm/hipblas build target (#9474)
whisperx has no upstream AMD GPU support and its core transcription path
(faster-whisper -> ctranslate2) falls back to CPU on AMD since the PyPI
ctranslate2 is CUDA-only. The torch rocm wheels would accelerate only the
alignment/diarization stages, producing a misleadingly half-working image.

Drop the hipblas variant rather than shipping a partially accelerated build
users can't distinguish from the real thing. AMD hosts now fall through
the capability map to cpu-whisperx / cpu-whisperx-development.

Also removes the now-dangling rocm-whisperx assertion from
pkg/system/capabilities_test.go and the ROCm mention from the whisperx
row in docs/content/reference/compatibility-table.md.

Assisted-by: Claude Code:claude-opus-4-7
2026-04-21 21:50:18 +02:00
Ettore Di Giacinto
a7dbb2a83d fix(gallery-agent): process blacklist command on recently-closed PRs (#9473)
The command-processing step only walked open PRs, so when a maintainer
wrote `/gallery-agent blacklist` and immediately closed the PR, the
next scheduled run missed the command, the `gallery-agent/blacklisted`
label was never applied, and the skip-URL step (which only pulls URLs
from closed PRs carrying that label) re-proposed the model on the next
cron.

Also scan closed gallery-agent PRs from the last 14 days that don't
already carry the blacklist label, and apply the label retroactively
when the command is present. Close/recreate actions still only run on
open PRs.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 16:29:13 +02:00
dependabot[bot]
3ad9b16c29 chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 (#9455)
Bumps [github.com/coreos/go-oidc/v3](https://github.com/coreos/go-oidc) from 3.17.0 to 3.18.0.
- [Release notes](https://github.com/coreos/go-oidc/releases)
- [Commits](https://github.com/coreos/go-oidc/compare/v3.17.0...v3.18.0)

---
updated-dependencies:
- dependency-name: github.com/coreos/go-oidc/v3
  dependency-version: 3.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:31:02 +02:00
dependabot[bot]
c806d5ab73 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 (#9456)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.14 to 1.32.16.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.14...config/v1.32.16)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.16
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:30:22 +02:00
LocalAI [bot]
47efaf5b43 Fix: Add model parameter to neutts-air gallery definition (#8793)
fix: Add model parameter to neutts-air gallery definition

The neutts-air model entry was missing the 'model' parameter in its
configuration, which caused LocalAI to fail with an 'Unrecognized model'
error when trying to use it. This change adds the required model parameter
pointing to the HuggingFace repository (neuphonic/neutts-air) so the backend
can properly load the model.

Fixes #8792

Signed-off-by: localai-bot <localai-bot@example.com>
Co-authored-by: localai-bot <localai-bot@example.com>
2026-04-21 11:56:00 +02:00
LocalAI [bot]
315b634a91 feat: improve CLI error messages with actionable guidance (#8880)
- transcript.go: Model not found error now suggests available models commands
- util.go: GGUF error explains format and how to get models
- worker_p2p.go: Token error explains purpose and how to obtain one
- run.go: Startup failure includes troubleshooting steps and docs link
- model_config_loader.go: Config validation errors include file path and guidance

Refs: H2 - UX Review Issue

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 11:53:26 +02:00
dependabot[bot]
6b245299d7 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 (#9454)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:43:00 +02:00
dependabot[bot]
677c0315c1 chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 (#9453)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.30 to 1.7.31.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.30...v1.7.31)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-version: 1.7.31
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:43 +02:00
dependabot[bot]
478522ce4d chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 (#9452)
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.97.1 to 1.99.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.97.1...service/s3/v1.99.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.99.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:27 +02:00
Ettore Di Giacinto
c54897ad44 fix(tests): update InstallBackend call sites for new URI/Name/Alias params (#9467)
Commit 02bb715c (#9446) added uri, name, alias parameters to
RemoteUnloaderAdapter.InstallBackend but missed the e2e test call
sites, breaking the distributed test build. Pass empty strings to
match the pattern used by the other non-URI call sites.

Assisted-by: Claude Code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 11:41:38 +02:00
LocalAI [bot]
8bb1e8f21f chore: ⬆️ Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 (#9448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:15:45 +02:00
LocalAI [bot]
cd94a0b61a chore: ⬆️ Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe (#9450)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:09:05 +02:00
LocalAI [bot]
047bc48fa9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:07:07 +02:00
sec171
01bd8ae5d0 [gallery] Fix duplicate sha256 keys in Wan models (#9461)
Fix duplicate sha256 keys in wan models gallery

The wan models previously defined the `sha256` key twice in their files lists,
which triggered strict mapping key checks in the YAML parser and resulted
in unmarshal errors that crashed the `/api/models` loading. This removes
the redundant trailing `sha256` keys from the Wan model definitions.

Assisted-by: Antigravity:Gemini-3.1-Pro-High [multi_replace_file_content, run_command]

Signed-off-by: Alex <codecrusher24@gmail.com>
2026-04-21 11:06:36 +02:00
LocalAI [bot]
d9808769be chore(model-gallery): ⬆️ update checksum (#9451)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:58 +02:00
LocalAI [bot]
5973c0a9df chore: ⬆️ Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e (#9447)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:19 +02:00
leinasi2014
486b5e25a3 fix(config): ignore yaml backup files in model loader (#9443)
Only load files whose real extension is .yaml or .yml so backup files like model.yaml.bak do not override active configs. Add a regression test covering plain and timestamped backup files.

Assisted-by: Codex:gpt-5.4 docker

Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
2026-04-20 23:41:39 +02:00
Russell Sim
c66c41e8d7 fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445)
Commit 8839a71c exposed AMDGPU_TARGETS as an ARG/ENV in
Dockerfile.llama-cpp so GPU targets could be overridden, but never
wired the value through the CI workflow inputs. Without it, Docker
receives AMDGPU_TARGETS="" which overrides the Makefile's ?= default,
causing all hipblas builds to compile only for gfx906 regardless of
the target list in the Makefile.

Add amdgpu-targets as a workflow_call input with the same default list
as the Makefile, and pass it as AMDGPU_TARGETS in the build-args of
both the push and PR build steps.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:41:19 +02:00
Russell Sim
02bb715c0a fix(distributed): pass ExternalURI through NATS backend install (#9446)
When installing a backend with a custom OCI URI in distributed mode,
the URI was captured in ManagementOp.ExternalURI by the HTTP handler
but never forwarded to workers. BackendInstallRequest had no URI field,
so workers fell through to the gallery lookup and failed with
"no backend found with name <custom-name>".

Add URI/Name/Alias fields to BackendInstallRequest and thread them from
ManagementOp through DistributedBackendManager.InstallBackend() and the
RemoteUnloaderAdapter. On the worker side, route to InstallExternalBackend
when URI is set instead of InstallBackendFromGallery. Update all
remaining InstallBackend call sites (UpgradeBackend, reconciler
pending-op drain, router auto-install) to pass empty strings for the
new params.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:39:35 +02:00
Ettore Di Giacinto
8ab56e2ad3 feat(gallery): add wan i2v 720p (#9457)
feat(gallery): add Wan 2.1 I2V 14B 720P + pin all wan ggufs by sha256

Adds a new entry for the native-720p image-to-video sibling of the
480p I2V model (wan-2.1-i2v-14b-480p-ggml). The 720p I2V model is
trained purely as image-to-video — no first-last-frame interpolation
path — so motion is freer than repurposing the FLF2V 720P variant as
an i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h
auxiliary files as the existing 480p I2V and 720p FLF2V entries, so
no new aux downloads are introduced.

Also pins the main diffusion gguf by sha256 for the new entry and for
the three existing wan entries that were previously missing a hash
(wan-2.1-t2v-1.3b-ggml, wan-2.1-i2v-14b-480p-ggml,
wan-2.1-flf2v-14b-720p-ggml). Hashes were fetched from HuggingFace's
x-linked-etag header per .agents/adding-gallery-models.md.

Assisted-by: Claude:claude-opus-4-7
2026-04-20 23:34:11 +02:00
pjbrzozowski
ecf85fde9e fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427)
The API Traces tab in /app/traces always showed (0) traces despite requests
being recorded.

The /api/traces endpoint was registered in both localai.go and ui_api.go.
The ui_api.go version wrapped the response as {"traces": [...]} instead of
the flat []APIExchange array that both the React UI (Traces.jsx) and the
legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go
handler, Array.isArray(apiData) always returned false, making the API Traces
tab permanently empty.

Remove the duplicate endpoints from ui_api.go so only the correct flat-array
version in localai.go is served.

Also use mime.ParseMediaType for the Content-Type check in the trace
middleware so requests with parameters (e.g. application/json; charset=utf-8)
are still traced.

Signed-off-by: Pawel Brzozowski <paul@ontux.net>
Co-authored-by: Pawel Brzozowski <paul@ontux.net>
2026-04-20 18:44:49 +02:00
Sai Asish Y
6480715a16 fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438)
GET /api/settings returns settings.ApiKeys as the merged env+runtime list
via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and
round-trips it back on POST /api/settings unchanged.

UpdateSettingsEndpoint was then doing:

    appConfig.ApiKeys = append(envKeys, runtimeKeys...)

where runtimeKeys already contained envKeys (because the UI got them from
the merged GET). Every save therefore duplicated the env keys on top of
the previous merge, and also wrote the duplicates to runtime_settings.json
so the duplication survived restarts and compounded with each save. This
is the user-visible behaviour in #9071: the Web UI shows the keys
twice / three times after consecutive saves.

Before we marshal the settings to disk or call ApplyRuntimeSettings, drop
any incoming key that already appears in startupConfig.ApiKeys. The file
on disk now stores only the genuinely runtime-added keys; the subsequent
append(envKeys, runtimeKeys...) produces one copy of each env key, as
intended. Behaviour is unchanged for users who never had env keys set.

Fixes #9071

Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>
2026-04-20 10:36:54 +02:00
Ettore Di Giacinto
f683231811 feat(gallery): add Wan 2.1 FLF2V 14B 720P (#9440)
First-last-frame-to-video variant of the 14B Wan family. Accepts a
start and end reference image and — unlike the pure i2v path — runs
both through clip_vision, so the final frame lands on the end image
both in pixel and semantic space. Right pick for seamless loops
(start_image == end_image) and narrative A→B cuts.

Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the
I2V 14B entry. Options block mirrors i2v's full-list-in-override
style so the template merge doesn't drop fields.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:34:36 +02:00
LocalAI [bot]
960757f0e8 chore(model gallery): 🤖 add 1 new models via gallery agent (#9436)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 08:48:47 +02:00
Ettore Di Giacinto
865fd552f5 docs(agents): adopt kernel's AI coding assistants policy
Align LocalAI with the Linux kernel project's policy for AI-assisted
contributions (https://docs.kernel.org/process/coding-assistants.html).

- Add .agents/ai-coding-assistants.md with the full policy adapted to
  LocalAI's MIT license: no Signed-off-by or Co-Authored-By from AI,
  attribute AI involvement via an Assisted-by: trailer, human submitter
  owns the contribution.
- Surface the rules at the entry points: AGENTS.md (and its CLAUDE.md
  symlink) and CONTRIBUTING.md.
- Publish a user-facing reference page at
  docs/content/reference/ai-coding-assistants.md and link it from the
  references index.

Assisted-by: Claude:claude-opus-4-7
2026-04-19 22:50:54 +00:00
LocalAI [bot]
cb77a5a4b9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9425)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:42:44 +02:00
Ettore Di Giacinto
60633c4dd5 fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435)
gen_video's ffmpeg subprocess was relying on the filename extension to
choose the output container. Distributed LocalAI hands the backend a
staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to
.mp4 only after the backend returns, so ffmpeg saw a .tmp extension and
bailed with "Unable to choose an output format". Inference had already
completed and the frames were piped in, producing the cryptic
"video inference failed (code 1)" at the API layer.

Pass -f mp4 explicitly so the container is selected by flag instead of
by filename suffix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 00:41:54 +02:00
Ettore Di Giacinto
9e44944cc1 fix(i2v): Add new options to the model configuration
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-20 00:27:05 +02:00
Ettore Di Giacinto
372eb08dcf fix(gallery): allow uninstalling orphaned meta backends + force reinstall (#9434)
Two interrelated bugs that combined to make a meta backend impossible
to uninstall once its concrete had been removed from disk (partial
install, earlier crash, manual cleanup).

1. DeleteBackendFromSystem returned "meta backend %q not found" and
   bailed out early when the concrete directory didn't exist,
   preventing the orphaned meta dir from ever being removed. Treat a
   missing concrete as idempotent success — log a warning and continue
   to remove the orphan meta.

2. InstallBackendFromGallery's "already installed, skip" short-circuit
   only checked that the name was known (`backends.Exists(name)`); an
   orphaned meta whose RunFile points at a missing concrete still
   satisfies that check, so every reinstall returned nil without doing
   anything. Afterwards the worker's findBackend returned empty and we
   kept looping with "backend %q not found after install attempt".
   Require the entry to be actually runnable (run.sh stat-able, not a
   directory) before skipping.

New helper isBackendRunnable centralises the runnability test so both
the install guard and future callers stay in sync. Tests cover the
orphaned-meta delete path and the non-runnable short-circuit case.
2026-04-20 00:10:19 +02:00
LocalAI [bot]
28091d626e chore: ⬆️ Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 (#9430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:01:48 +02:00
LocalAI [bot]
cae79d9107 feat(swagger): update swagger (#9431)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:50 +02:00
LocalAI [bot]
babbbc6ec8 chore: ⬆️ Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad (#9429)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:19 +02:00
LocalAI [bot]
3804497186 chore: ⬆️ Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 (#9428)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:07 +02:00
Ettore Di Giacinto
fda1c553a1 fix(distributed): stop queue loops on agent nodes + dead-letter cap (#9433)
pending_backend_ops rows targeting agent-type workers looped forever:
the reconciler fan-out hit a NATS subject the worker doesn't subscribe
to, returned ErrNoResponders, we marked the node unhealthy, and the
health monitor flipped it back to healthy on the next heartbeat. Next
tick, same row, same failure.

Three related fixes:

1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend.
   Agent workers handle agent NATS subjects, not backend.install /
   delete / list, so enqueueing for them guarantees an infinite retry
   loop. Silent skip is correct — they aren't consumers of these ops.

2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on
   nats.ErrNoResponders: mark the node unhealthy before recording the
   failure, so subsequent ListDuePendingBackendOps (filters by
   status=healthy) stops picking the row until the node actually
   recovers. Matches the synchronous fan-out path.

3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of
   exponential backoff the row is a poison message; further retries
   just thrash NATS. Row is deleted and logged at ERROR so it stays
   visible without staying infinite.

Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows
that target agent-type nodes, non-existent nodes, or carry an empty
backend name. Guarded by the same schema-migration advisory lock so
only one instance performs it. The guards above prevent new rows of
this shape; this closes the migration gap for existing ones.

Tests: the prune migration (valid row stays, agent + empty-name rows
drop) on top of existing upsert / backoff coverage.
2026-04-19 23:38:43 +02:00
Ettore Di Giacinto
b27de08fff chore(gallery): fixup wan
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 21:31:22 +00:00
Ettore Di Giacinto
510f791ccc feat(gallery): add stablediffusion-ggml-development meta backend 2026-04-19 20:16:33 +00:00
Ettore Di Giacinto
369c50a41c fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)
* fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc

The upstream PR #21203 (server: respect the ignore_eos flag) has been
merged into the TheTom/llama-cpp-turboquant feature/turboquant-kv-cache
branch. With the fix now in-tree, 0001-server-respect-the-ignore-eos-flag.patch
no longer applies (git apply sees its additions already present) and the
nightly turboquant bump fails.

Retire the patch and bump the pin to the first fork revision that carries
the merged fix (tag feature-turboquant-kv-cache-b8967-627ebbc). This matches
the contract in apply-patches.sh: drop patches once the fork catches up.

* fix(turboquant): patch out get_media_marker() call in grpc-server copy

CI turboquant docker build was failing with:

  grpc-server.cpp:2825:40: error: use of undeclared identifier
  'get_media_marker'

The call was added by 7809c5f5 (PR #9412) to propagate the mtmd random
per-server media marker upstream landed in ggml-org/llama.cpp#21962. The
TheTom/llama-cpp-turboquant fork branched before that PR, so its
server-common.cpp has no such symbol.

Extend patch-grpc-server.sh to substitute get_media_marker() with the
legacy "<__media__>" literal in the build-time grpc-server.cpp copy
under turboquant-<flavor>-build/. The fork's mtmd_default_marker()
returns exactly that string, and the Go layer falls back to the same
sentinel when media_marker is empty, so behavior on the turboquant path
is unchanged. Patched copy only — the shared source under
backend/cpp/llama-cpp/ keeps compiling against vanilla upstream.

Verified by running `make docker-build-turboquant` locally end-to-end:
all five flavors (avx, avx2, avx512, fallback, grpc+rpc-server) now
compile past the previous failure and the image tags successfully.
2026-04-19 21:05:21 +02:00
61 changed files with 2251 additions and 520 deletions

View File

@@ -0,0 +1,101 @@
# AI Coding Assistants
This document provides guidance for AI tools and developers using AI
assistance when contributing to LocalAI.
**LocalAI follows the same guidelines as the Linux kernel project for
AI-assisted contributions.** See the upstream policy here:
<https://docs.kernel.org/process/coding-assistants.html>
The rules below mirror that policy, adapted to LocalAI's license and
project layout. If anything is unclear, the kernel document is the
authoritative reference for intent.
AI tools helping with LocalAI development should follow the standard
project development process:
- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
conventions, and PR guidelines
- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
logging, and documentation conventions
- [.agents/building-and-testing.md](building-and-testing.md) — build and
test procedures
## Licensing and Legal Requirements
All contributions must comply with LocalAI's licensing requirements:
- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
file
- New source files should use the SPDX license identifier `MIT` where
applicable to the file type
- Contributions must be compatible with the MIT License and must not
introduce code under incompatible licenses (e.g., GPL) without an
explicit discussion with maintainers
## Signed-off-by and Developer Certificate of Origin
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
certify the Developer Certificate of Origin (DCO). The human submitter
is responsible for:
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO)
to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
A human reviewer owns the contribution; the AI's involvement is recorded
via `Assisted-by` (see below).
## Attribution
When AI tools contribute to LocalAI development, proper attribution helps
track the evolving role of AI in the development process. Contributions
should include an `Assisted-by` tag in the commit message trailer in the
following format:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Where:
- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
`Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g.,
`claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
Basic development tools (git, go, make, editors) should **not** be listed.
### Example
```
fix(llama-cpp): handle empty tool call arguments
Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.
Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility.
The human submitter must:
- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the
project style
- Confirm that any referenced APIs, flags, or file paths actually exist
in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review
Reviewers may ask for clarification on any change regardless of how it
was produced. "An AI wrote it" is not an acceptable answer to a design
question.

View File

@@ -30,6 +30,7 @@ jobs:
skip-drivers: ${{ matrix.skip-drivers }}
context: ${{ matrix.context }}
ubuntu-version: ${{ matrix.ubuntu-version }}
amdgpu-targets: ${{ matrix.amdgpu-targets }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -1623,19 +1624,6 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-whisperx'
runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
skip-drivers: 'false'
backend: "whisperx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""

View File

@@ -58,6 +58,11 @@ on:
required: false
default: '2204'
type: string
amdgpu-targets:
description: 'AMD GPU targets for ROCm/HIP builds'
required: false
default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
type: string
secrets:
dockerUsername:
required: false
@@ -214,6 +219,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
@@ -235,6 +241,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha

View File

@@ -54,24 +54,41 @@ jobs:
REPO: ${{ github.repository }}
SEARCH: 'gallery agent in:title'
run: |
# Walk open gallery-agent PRs and act on maintainer comments:
# Walk gallery-agent PRs and act on maintainer comments:
# /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
# /gallery-agent recreate → close without label (next run may repropose)
# Only comments from OWNER / MEMBER / COLLABORATOR are honored so
# random users can't drive the bot.
#
# We scan both open PRs AND recently-closed PRs that don't already
# carry the blacklist label. This covers the common flow where a
# maintainer writes /gallery-agent blacklist and immediately clicks
# Close — without this, the next scheduled run wouldn't see the
# command (PR is already closed) and would repropose the model.
gh label create gallery-agent/blacklisted \
--repo "$REPO" --color ededed \
--description "gallery-agent must not repropose this model" 2>/dev/null || true
prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
--json number --jq '.[].number')
# Closed PRs from the last 14 days that don't yet have the blacklist label.
# Bounded window keeps the scan cheap while covering late-applied commands.
since=$(date -u -d '14 days ago' +%Y-%m-%d)
prs_closed=$(gh pr list --repo "$REPO" --state closed \
--search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
--json number --jq '.[].number')
prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
for pr in $prs; do
state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
--jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
echo "PR #$pr: blacklist command found"
echo "PR #$pr: blacklist command found (state=$state)"
gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
if [ "$state" = "OPEN" ]; then
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
fi
elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
echo "PR #$pr: recreate command found"
gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
fi

View File

@@ -1,11 +1,23 @@
# LocalAI Agent Instructions
This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
## Policy for AI-Assisted Contributions
LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
## Topics
| File | When to read |
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |

View File

@@ -13,6 +13,7 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
- [Development Workflow](#development-workflow)
- [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
- [Coding Guidelines](#coding-guidelines)
- [AI Coding Assistants](#ai-coding-assistants)
- [Testing](#testing)
- [Documentation](#documentation)
- [Community and Communication](#community-and-communication)
@@ -185,7 +186,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to
This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
### General Principles
@@ -211,6 +212,26 @@ For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific gui
- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
- Be responsive to review feedback and keep discussions constructive.
## AI Coding Assistants
LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
Basic development tools (git, go, make, editors) should not be listed.
- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
- Contributions must remain compatible with LocalAI's **MIT License**.
## Testing
All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,38 +0,0 @@
From: LocalAI maintainers <noreply@localai.io>
Subject: [PATCH] gemma3: default rms norm eps when GGUF metadata key is missing
Some Gemma 3 GGUF files (notably those distributed via the Ollama
registry) do not embed the `gemma3.attention.layer_norm_rms_epsilon`
metadata key. ik_llama.cpp currently requires the key to be present and
fails the entire model load with:
error loading model hyperparameters:
key not found in model: gemma3.attention.layer_norm_rms_epsilon
Ollama's own loader silently falls back to ~1e-6 in the same situation,
which is the canonical Gemma 3 default (see google/gemma_pytorch
config.py and the Hugging Face Gemma3Config), so the model still loads
and works correctly.
Mirror that behavior here: pre-seed the field with the Gemma 3 default
and mark the metadata key as optional. This unblocks Ollama-converted
Gemma 3 models without affecting GGUFs that already carry the key.
Refs: ggml-org/llama.cpp#12367, ollama/ollama#10262, mudler/LocalAI#9414
---
src/llama-hparams.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/llama-hparams.cpp b/src/llama-hparams.cpp
--- a/src/llama-hparams.cpp
+++ b/src/llama-hparams.cpp
@@ -679,7 +679,8 @@
hparams.rope_freq_scale_train_swa = 1.0f;
ml.get_key(LLM_KV_ATTENTION_SLIDING_WINDOW, hparams.n_swa);
- ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
+ hparams.f_norm_rms_eps = 1e-6f; // Gemma 3 canonical default; some Ollama GGUFs omit the key
+ ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps, false);
switch (hparams.n_layer) {
case 26: model.type = e_model::MODEL_2B; break;

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
LLAMA_VERSION?=cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -1,38 +0,0 @@
From: LocalAI maintainers <noreply@localai.io>
Subject: [PATCH] gemma3: default rms norm eps when GGUF metadata key is missing
Some Gemma 3 GGUF files (notably those distributed via the Ollama
registry) do not embed the `gemma3.attention.layer_norm_rms_epsilon`
metadata key. llama.cpp currently requires the key to be present and
fails the entire model load with:
error loading model hyperparameters:
key not found in model: gemma3.attention.layer_norm_rms_epsilon
Ollama's own loader silently falls back to ~1e-6 in the same situation,
which is the canonical Gemma 3 default (see google/gemma_pytorch
config.py and the Hugging Face Gemma3Config), so the model still loads
and works correctly.
Mirror that behavior here: pre-seed the field with the Gemma 3 default
and mark the metadata key as optional. This unblocks Ollama-converted
Gemma 3 models without affecting GGUFs that already carry the key.
Refs: ggml-org/llama.cpp#12367, ollama/ollama#10262, mudler/LocalAI#9414
---
src/llama-model.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/llama-model.cpp b/src/llama-model.cpp
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@@ -1568,7 +1568,8 @@
hparams.f_final_logit_softcapping = 0.0f;
ml.get_key(LLM_KV_FINAL_LOGIT_SOFTCAPPING, hparams.f_final_logit_softcapping, false);
- ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
+ hparams.f_norm_rms_eps = 1e-6f; // Gemma 3 canonical default; some Ollama GGUFs omit the key
+ ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps, false);
switch (hparams.n_layer) {
case 18: type = LLM_TYPE_270M; break;

View File

@@ -1,7 +1,7 @@
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
TURBOQUANT_VERSION?=45f8a066ed5f5bb38c695cec532f6cef9f4efa9d
TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
CMAKE_ARGS?=

View File

@@ -1,13 +1,22 @@
#!/bin/bash
# Augment the shared backend/cpp/llama-cpp/grpc-server.cpp allow-list of KV-cache
# types so the gRPC `LoadModel` call accepts the TurboQuant-specific
# `turbo2` / `turbo3` / `turbo4` cache types.
# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
# turboquant build to account for two gaps between upstream and the fork:
#
# We do this on the *copy* sitting in turboquant-<flavor>-build/, never on the
# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
# compiling against vanilla upstream which does not know about GGML_TYPE_TURBO*.
# 1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
# fork-specific `turbo2` / `turbo3` / `turbo4` cache types.
# 2. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
# server-side random per-instance marker) with the legacy "<__media__>"
# literal. The fork branched before that PR, so server-common.cpp has no
# get_media_marker symbol. The fork's mtmd_default_marker() still returns
# "<__media__>", and Go-side tooling falls back to that sentinel when the
# backend does not expose media_marker, so substituting the literal keeps
# behavior identical on the turboquant path.
#
# Idempotent: skips the insertion if the marker is already present (so re-runs
# We patch the *copy* sitting in turboquant-<flavor>-build/, never the original
# under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps compiling
# against vanilla upstream.
#
# Idempotent: skips each insertion if its marker is already present (so re-runs
# of the same build dir don't double-insert).
set -euo pipefail
@@ -25,33 +34,47 @@ if [[ ! -f "$SRC" ]]; then
fi
if grep -q 'GGML_TYPE_TURBO2_0' "$SRC"; then
echo "==> $SRC already has TurboQuant cache types, skipping"
exit 0
echo "==> $SRC already has TurboQuant cache types, skipping KV allow-list patch"
else
echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
# Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder image
# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // turboquant fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> KV allow-list patch OK"
fi
echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
if grep -q 'get_media_marker()' "$SRC"; then
echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
# Only one call site today (ModelMetadata), but replace all occurrences to
# stay robust if upstream adds more. Use a temp file to avoid relying on
# sed -i portability (the builder image uses GNU sed, but keeping this
# consistent with the awk block above).
sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> get_media_marker() substitution OK"
else
echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
fi
# Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder image
# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // turboquant fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> patched OK"
echo "==> all patches applied"

View File

@@ -0,0 +1,47 @@
From: LocalAI turboquant backend maintainers <noreply@localai.io>
Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances
Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under ggml-cuda/:
- fattn-vec-instance-f16-turbo2_0.cu
- fattn-vec-instance-f16-turbo3_0.cu
- fattn-vec-instance-f16-turbo4_0.cu
and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
build path via hipify.
However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
is the default) and was never updated for the new F16-K + TURBO-V combos.
The HIP build therefore compiles the dispatch cases (which reference
ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
the matching template instantiations, causing a link-time failure in the
-gpu-rocm-hipblas-turboquant CI job.
Add the three new template instance files to ggml-hip's list so the HIP
build links cleanly. Drop this patch once the fork picks up the
corresponding upstream sync in ggml-hip/CMakeLists.txt.
--- a/ggml/src/ggml-hip/CMakeLists.txt
+++ b/ggml/src/ggml-hip/CMakeLists.txt
@@ -85,14 +85,17 @@ else()
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu

View File

@@ -1,83 +0,0 @@
From 660600081fb7b9b769ded5c805a2d39a419f0a0d Mon Sep 17 00:00:00 2001
From: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Date: Wed, 8 Apr 2026 11:12:15 -0400
Subject: [PATCH] server: respect the ignore eos flag (#21203)
---
tools/server/server-context.cpp | 3 +++
tools/server/server-context.h | 3 +++
tools/server/server-task.cpp | 3 ++-
tools/server/server-task.h | 1 +
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
index 9d3ac538..b31981c5 100644
--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
@@ -3033,6 +3033,8 @@ server_context_meta server_context::get_meta() const {
/* fim_rep_token */ llama_vocab_fim_rep(impl->vocab),
/* fim_sep_token */ llama_vocab_fim_sep(impl->vocab),
+ /* logit_bias_eog */ impl->params_base.sampling.logit_bias_eog,
+
/* model_vocab_type */ llama_vocab_type(impl->vocab),
/* model_vocab_n_tokens */ llama_vocab_n_tokens(impl->vocab),
/* model_n_ctx_train */ llama_model_n_ctx_train(impl->model),
@@ -3117,6 +3119,7 @@ std::unique_ptr<server_res_generator> server_routes::handle_completions_impl(
ctx_server.vocab,
params,
meta->slot_n_ctx,
+ meta->logit_bias_eog,
data);
task.id_slot = json_value(data, "id_slot", -1);
diff --git a/tools/server/server-context.h b/tools/server/server-context.h
index d7ce8735..6ea9afc0 100644
--- a/tools/server/server-context.h
+++ b/tools/server/server-context.h
@@ -39,6 +39,9 @@ struct server_context_meta {
llama_token fim_rep_token;
llama_token fim_sep_token;
+ // sampling
+ std::vector<llama_logit_bias> logit_bias_eog;
+
// model meta
enum llama_vocab_type model_vocab_type;
int32_t model_vocab_n_tokens;
diff --git a/tools/server/server-task.cpp b/tools/server/server-task.cpp
index 4cc87bc5..856b3f0e 100644
--- a/tools/server/server-task.cpp
+++ b/tools/server/server-task.cpp
@@ -239,6 +239,7 @@ task_params server_task::params_from_json_cmpl(
const llama_vocab * vocab,
const common_params & params_base,
const int n_ctx_slot,
+ const std::vector<llama_logit_bias> & logit_bias_eog,
const json & data) {
task_params params;
@@ -562,7 +563,7 @@ task_params server_task::params_from_json_cmpl(
if (params.sampling.ignore_eos) {
params.sampling.logit_bias.insert(
params.sampling.logit_bias.end(),
- defaults.sampling.logit_bias_eog.begin(), defaults.sampling.logit_bias_eog.end());
+ logit_bias_eog.begin(), logit_bias_eog.end());
}
}
diff --git a/tools/server/server-task.h b/tools/server/server-task.h
index d855bf08..243e47a8 100644
--- a/tools/server/server-task.h
+++ b/tools/server/server-task.h
@@ -209,6 +209,7 @@ struct server_task {
const llama_vocab * vocab,
const common_params & params_base,
const int n_ctx_slot,
+ const std::vector<llama_logit_bias> & logit_bias_eog,
const json & data);
// utility function
--
2.43.0

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d
STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -1106,6 +1106,11 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
// Force MP4 container. Distributed LocalAI hands us a staging
// path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
// extension; relying on filename suffix makes ffmpeg bail with
// "Unable to choose an output format".
const_cast<char*>("-f"), const_cast<char*>("mp4"),
const_cast<char*>(dst),
nullptr
};

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -587,7 +587,6 @@
alias: "whisperx"
capabilities:
nvidia: "cuda12-whisperx"
amd: "rocm-whisperx"
metal: "metal-whisperx"
default: "cpu-whisperx"
nvidia-cuda-13: "cuda13-whisperx"
@@ -1008,6 +1007,20 @@
nvidia-cuda-12: "cuda12-turboquant-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-turboquant-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-turboquant-development"
- !!merge <<: *stablediffusionggml
name: "stablediffusion-ggml-development"
capabilities:
default: "cpu-stablediffusion-ggml-development"
nvidia: "cuda12-stablediffusion-ggml-development"
intel: "intel-sycl-f16-stablediffusion-ggml-development"
# amd: "rocm-stablediffusion-ggml-development"
vulkan: "vulkan-stablediffusion-ggml-development"
nvidia-l4t: "nvidia-l4t-arm64-stablediffusion-ggml-development"
metal: "metal-stablediffusion-ggml-development"
nvidia-cuda-13: "cuda13-stablediffusion-ggml-development"
nvidia-cuda-12: "cuda12-stablediffusion-ggml-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-stablediffusion-ggml-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-stablediffusion-ggml-development"
- !!merge <<: *neutts
name: "cpu-neutts"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-neutts"
@@ -2731,7 +2744,6 @@
name: "whisperx-development"
capabilities:
nvidia: "cuda12-whisperx-development"
amd: "rocm-whisperx-development"
metal: "metal-whisperx-development"
default: "cpu-whisperx-development"
nvidia-cuda-13: "cuda13-whisperx-development"
@@ -2757,16 +2769,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "cuda13-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"

View File

@@ -1,6 +0,0 @@
# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
# the rocm7.2.1 runtime via AMD's forward-compatibility window.
--extra-index-url https://download.pytorch.org/whl/rocm6.4
torch==2.8.0+rocm6.4
whisperx @ git+https://github.com/m-bain/whisperX.git

View File

@@ -40,6 +40,12 @@ type TokenUsage struct {
ChatDeltas []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
}
func needsThinkingProbe(c *config.ModelConfig) bool {
return c.TemplateConfig.UseTokenizerTemplate &&
(c.ReasoningConfig.DisableReasoning == nil ||
c.ReasoningConfig.DisableReasoningTagPrefill == nil)
}
// HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
// Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
func (t TokenUsage) HasChatDeltaContent() bool {
@@ -100,11 +106,9 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
// tokenizer template path is active) and the multimodal media marker (needed
// by custom chat templates so markers line up with what mtmd expects).
// We probe whenever any of those slots is still empty.
needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
c.ReasoningConfig.DisableReasoning == nil &&
c.ReasoningConfig.DisableReasoningTagPrefill == nil
shouldProbeThinking := needsThinkingProbe(c)
needsMarkerProbe := c.MediaMarker == ""
if needsThinkingProbe || needsMarkerProbe {
if shouldProbeThinking || needsMarkerProbe {
modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
// Update the config in the loader so it persists for future requests

View File

@@ -0,0 +1,29 @@
package backend
import (
"github.com/mudler/LocalAI/core/config"
"github.com/gpustack/gguf-parser-go/util/ptr"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("thinking probe gating", func() {
It("probes tokenizer-template models when any reasoning default is still unset", func() {
cfg := &config.ModelConfig{
TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
}
Expect(needsThinkingProbe(cfg)).To(BeTrue())
cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
Expect(needsThinkingProbe(cfg)).To(BeTrue())
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
Expect(needsThinkingProbe(cfg)).To(BeFalse())
})
It("does not probe when tokenizer templates are disabled", func() {
cfg := &config.ModelConfig{}
Expect(needsThinkingProbe(cfg)).To(BeFalse())
})
})

View File

@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
app, err := application.New(opts...)
if err != nil {
return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n 1. Check that your models directory exists and is accessible: %s\n 2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n 3. Check available disk space and file permissions\n 4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
}
appHTTP, err := http.API(app)

View File

@@ -3,7 +3,6 @@ package cli
import (
"context"
"encoding/json"
"errors"
"fmt"
"strings"
@@ -60,7 +59,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {
c, exists := cl.GetModelConfig(t.Model)
if !exists {
return errors.New("model not found")
return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
}
c.Threads = &t.Threads

View File

@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {
func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
if len(u.Args) == 0 {
return fmt.Errorf("no GGUF file provided")
return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
}
// We try to guess only if we don't have a template defined already
f, err := gguf.ParseGGUFFile(u.Args[0])

View File

@@ -21,6 +21,7 @@ import (
"github.com/mudler/LocalAI/core/cli/workerregistry"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/storage"
@@ -597,12 +598,20 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
// Try to find the backend binary
backendPath := s.findBackend(req.Backend)
if backendPath == "" {
// Backend not found locally — try auto-installing from gallery
xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
if err := gallery.InstallBackendFromGallery(
context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
if req.URI != "" {
xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
if err := galleryop.InstallExternalBackend(
context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
}
} else {
xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
if err := gallery.InstallBackendFromGallery(
context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
); err != nil {
return "", fmt.Errorf("installing backend from gallery: %w", err)
}
}
// Re-register after install and retry
gallery.RegisterBackends(s.systemState, s.ml)

View File

@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
// Check if the token is set
// as we always need it.
if r.Token == "" {
return fmt.Errorf("Token is required")
return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
}
port, err := freeport.GetFreePort()

View File

@@ -125,19 +125,7 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
return
}
cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
// Use the rendered template to detect if thinking token is at the end
// This reuses the existing DetectThinkingStartToken function
if metadata.RenderedTemplate != "" {
thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
thinkingForcedOpen := thinkingStartToken != ""
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
} else {
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
}
applyDetectedThinkingConfig(cfg, metadata)
// Extract tool format markers from autoparser analysis
if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -180,3 +168,34 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
}
}
}
func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
if cfg == nil || metadata == nil {
return
}
// Respect explicit YAML/user config. Backend probing should only fill defaults
// when the reasoning mode has not already been set.
if cfg.ReasoningConfig.DisableReasoning == nil {
cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
}
// Respect explicit prefill config for the same reason. Only infer the
// default prefill behavior when the user did not set it.
if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
// Use the rendered template to detect if thinking token is at the end.
// This reuses the existing DetectThinkingStartToken function.
if metadata.RenderedTemplate != "" {
thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
thinkingForcedOpen := thinkingStartToken != ""
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
} else {
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
}
return
}
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
}

View File

@@ -0,0 +1,101 @@
package config
import (
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/reasoning"
"github.com/gpustack/gguf-parser-go/util/ptr"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("GGUF backend metadata reasoning defaults", func() {
It("fills reasoning defaults when unset", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
})
It("preserves fully explicit reasoning settings", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoning: ptr.To(true),
DisableReasoningTagPrefill: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
It("preserves explicit disable while still inferring missing prefill", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoning: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
})
It("preserves explicit prefill while still inferring missing disable flag", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
ReasoningConfig: reasoning.Config{
DisableReasoningTagPrefill: ptr.To(true),
},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: true,
RenderedTemplate: "{{ bos_token }}<think>",
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
It("defaults to disabling reasoning when backend does not support thinking", func() {
cfg := &ModelConfig{
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
}
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
SupportsThinking: false,
})
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
})
})

View File

@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
bcl.configs[c.Name] = *c
} else {
if err != nil {
return fmt.Errorf("config is not valid: %w", err)
return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
}
return fmt.Errorf("config is not valid")
return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
}
return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
files = append(files, info)
}
for _, file := range files {
// Skip templates, YAML and .keep files
if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
strings.HasPrefix(file.Name(), ".") {
// Only load real YAML config files and ignore dotfiles or backup variants
ext := strings.ToLower(filepath.Ext(file.Name()))
if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
continue
}

View File

@@ -2,6 +2,7 @@ package config
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
@@ -109,5 +110,50 @@ options:
Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))
})
It("Only loads files ending with yaml or yml", func() {
tmpdir, err := os.MkdirTemp("", "model-config-loader")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpdir)
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
`name: "foo-model"
description: "formal config"
backend: "llama-cpp"
parameters:
model: "foo.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
`name: "foo-model"
description: "backup config"
backend: "llama-cpp"
parameters:
model: "foo-backup.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
`name: "foo-backup-only"
description: "timestamped backup config"
backend: "llama-cpp"
parameters:
model: "foo-timestamped.gguf"
`), 0644)
Expect(err).ToNot(HaveOccurred())
bcl := NewModelConfigLoader(tmpdir)
err = bcl.LoadModelConfigsFromPath(tmpdir)
Expect(err).ToNot(HaveOccurred())
configs := bcl.GetAllModelsConfigs()
Expect(configs).To(HaveLen(1))
Expect(configs[0].Name).To(Equal("foo-model"))
Expect(configs[0].Description).To(Equal("formal config"))
_, exists := bcl.GetModelConfig("foo-backup-only")
Expect(exists).To(BeFalse())
})
})
})

View File

@@ -110,7 +110,13 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
if err != nil {
return err
}
if backends.Exists(name) {
// Only short-circuit if the install is *actually usable*. An orphaned
// meta entry whose concrete was removed still shows up in
// ListSystemBackends with a RunFile pointing at a path that no longer
// exists; returning early there leaves the caller with a broken
// alias and the worker fails with "backend not found after install
// attempt" on every retry. Re-install in that case.
if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
return nil
}
}
@@ -375,17 +381,44 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
}
if metadata != nil && metadata.MetaBackendFor != "" {
metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
// If the concrete the meta points to is already gone (earlier delete,
// partial install, or manual cleanup), keep going and remove the
// orphaned meta dir. Previously we returned an error here, which made
// the orphaned meta impossible to uninstall from the UI — the delete
// kept failing and every subsequent install short-circuited because
// the stale meta metadata made ListSystemBackends.Exists(name) true.
if _, statErr := os.Stat(concreteDirectory); statErr == nil {
os.RemoveAll(concreteDirectory)
} else if os.IsNotExist(statErr) {
xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
"meta", name, "concrete", metadata.MetaBackendFor)
} else {
return statErr
}
os.RemoveAll(metaBackendDirectory)
}
return os.RemoveAll(backendDirectory)
}
// isBackendRunnable reports whether the given backend entry can actually be
// invoked. A meta backend is runnable only if its concrete's run.sh still
// exists on disk; concrete backends are considered runnable as long as their
// RunFile is set (ListSystemBackends only emits them when the runfile is
// present). Used to guard the "already installed" short-circuit so an
// orphaned meta pointing at a missing concrete triggers a real reinstall
// rather than being silently skipped.
func isBackendRunnable(b SystemBackend) bool {
if b.RunFile == "" {
return false
}
if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
return false
}
return true
}
type SystemBackend struct {
Name string
RunFile string

View File

@@ -952,6 +952,58 @@ var _ = Describe("Gallery Backends", func() {
err = DeleteBackendFromSystem(systemState, "non-existent")
Expect(err).To(HaveOccurred())
})
It("removes an orphaned meta backend whose concrete is missing", func() {
// Real scenario from the dev cluster: the concrete got wiped
// (partial install, manual cleanup, previous crash) but the meta
// directory + metadata.json still points at it. The old code
// errored with "meta backend X not found" and left the orphan in
// place, making the backend impossible to uninstall.
metaName := "meta-backend"
concreteName := "concrete-backend-that-vanished"
metaPath := filepath.Join(tempDir, metaName)
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
data, err := json.MarshalIndent(meta, "", " ")
Expect(err).NotTo(HaveOccurred())
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
// Concrete directory intentionally absent.
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
Expect(err).NotTo(HaveOccurred())
Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
Expect(metaPath).NotTo(BeADirectory())
})
})
Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
It("re-runs install when the meta's concrete is missing", func() {
// Seed state: meta dir exists with metadata pointing at a
// concrete that was removed from disk. ListSystemBackends still
// surfaces the meta via its metadata.Name → the old short-circuit
// at `if backends.Exists(name) { return nil }` returned silently,
// leaving the worker's findBackend() with a dead alias forever.
// The fix: require the backend to be runnable before we skip.
metaName := "meta-orphan"
concreteName := "concrete-gone"
metaPath := filepath.Join(tempDir, metaName)
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
data, err := json.MarshalIndent(meta, "", " ")
Expect(err).NotTo(HaveOccurred())
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
Expect(err).NotTo(HaveOccurred())
listed, err := ListSystemBackends(systemState)
Expect(err).NotTo(HaveOccurred())
b, ok := listed.Get(metaName)
Expect(ok).To(BeTrue())
Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
})
})
Describe("ListSystemBackends", func() {

View File

@@ -9,19 +9,26 @@ import (
// BackendMonitorEndpoint returns the status of the specified backend
// @Summary Backend monitor endpoint
// @Tags monitoring
// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
// @Param model query string true "Name of the model to monitor"
// @Success 200 {object} proto.StatusResponse "Response"
// @Router /backend/monitor [get]
func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
return func(c echo.Context) error {
input := new(schema.BackendMonitorRequest)
// Get input data from the request body
if err := c.Bind(input); err != nil {
return err
model := c.QueryParam("model")
// Fall back to binding the request body so pre-existing clients that
// sent `{"model": "..."}` with GET keep working.
if model == "" {
input := new(schema.BackendMonitorRequest)
if err := c.Bind(input); err != nil {
return err
}
model = input.Model
}
if model == "" {
return echo.NewHTTPError(400, "model query parameter is required")
}
resp, err := bm.CheckAndSample(input.Model)
resp, err := bm.CheckAndSample(model)
if err != nil {
return err
}

View File

@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
if err := c.Bind(&req); err != nil || req.Backend == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
}
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
if err != nil {
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))

View File

@@ -110,6 +110,27 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
})
}
// The UI reads ApiKeys from GET /api/settings, which already returns the
// merged env+runtime list. When the user clicks Save, the same merged
// list comes back in the POST body. Strip the env-supplied keys from
// the incoming list before we persist or re-merge, otherwise each save
// duplicates the env keys on top of the previous merge (#9071).
if settings.ApiKeys != nil {
envKeys := startupConfig.ApiKeys
envSet := make(map[string]struct{}, len(envKeys))
for _, k := range envKeys {
envSet[k] = struct{}{}
}
runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
for _, k := range *settings.ApiKeys {
if _, fromEnv := envSet[k]; fromEnv {
continue
}
runtimeOnly = append(runtimeOnly, k)
}
settings.ApiKeys = &runtimeOnly
}
settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
settingsJSON, err := json.MarshalIndent(settings, "", " ")
if err != nil {

View File

@@ -147,6 +147,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
result := ""
lastEmittedCount := 0
sentInitialRole := false
sentReasoning := false
hasChatDeltaToolCalls := false
hasChatDeltaContent := false
@@ -190,6 +191,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
}},
Object: "chat.completion.chunk",
}
sentReasoning = true
}
// Stream content deltas (cleaned of reasoning tags) while no tool calls
@@ -363,7 +365,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
}
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0
// noAction is a sentinel "just answer" pseudo-function — not a real
// tool call. Scan the whole slice rather than only index 0 so we
// don't drop a real tool call that happens to follow a noAction
// entry, and so the default branch isn't entered with only noAction
// entries to emit as tool_calls.
noActionToRun := !hasRealCall(functionResults, noAction)
switch {
case noActionToRun:
@@ -377,108 +384,31 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
}
if sentInitialRole {
// Content was already streamed during the callback — just emit usage.
delta := &schema.Message{}
if reasoning != "" && extractor.Reasoning() == "" {
delta.Reasoning = &reasoning
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
}
} else {
// Content was NOT streamed — send everything at once (fallback).
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
Object: "chat.completion.chunk",
}
result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
if err != nil {
xlog.Error("error handling question", "error", err)
return err
}
delta := &schema.Message{Content: &result}
if reasoning != "" {
delta.Reasoning = &reasoning
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
var result string
if !sentInitialRole {
var hqErr error
result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
if hqErr != nil {
xlog.Error("error handling question", "error", hqErr)
return hqErr
}
}
for _, chunk := range buildNoActionFinalChunks(
id, req.Model, created,
sentInitialRole, sentReasoning,
result, reasoning, usage,
) {
responses <- chunk
}
default:
for i, ss := range functionResults {
name, args := ss.Name, ss.Arguments
toolCallID := ss.ID
if toolCallID == "" {
toolCallID = id
}
if i < lastEmittedCount {
// Already emitted during streaming by the incremental
// JSON/XML parser — skip to avoid duplicate tool calls.
continue
}
// Tool call not yet emitted — send name + args (two chunks).
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: name,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
responses <- initialMessage
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
Content: textContentToReturn,
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Arguments: args,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
for _, chunk := range buildDeferredToolCallChunks(
id, req.Model, created,
functionResults, lastEmittedCount,
sentInitialRole, *textContentToReturn,
sentReasoning, reasoning,
) {
responses <- chunk
}
}

View File

@@ -0,0 +1,233 @@
package openai
import (
"fmt"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
)
// hasRealCall reports whether functionResults contains at least one
// entry whose Name is something other than the noAction sentinel.
// Used by processTools to decide between the "answer the question"
// path and the real tool-call flush.
func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
for _, fc := range functionResults {
if fc.Name != noAction {
return true
}
}
return false
}
// buildNoActionFinalChunks produces the closing SSE chunks for the
// noActionToRun branch of processTools (i.e. the model chose the "answer"
// pseudo-function or emitted no tool calls at all).
//
// When content was already streamed (contentAlreadyStreamed=true) the
// helper emits a single trailing usage chunk, optionally carrying
// reasoning that was produced but not streamed incrementally. When
// content was not streamed it emits a role chunk followed by a
// content+reasoning+usage chunk — the "send everything at once" fallback.
//
// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
// probing the extractor's Go-side state: the C++ autoparser delivers
// reasoning through ProcessChatDeltaReasoning which populates a
// separate accumulator that extractor.Reasoning() does not expose.
// Without this guard the callback would stream reasoning incrementally
// and the final chunk would duplicate it.
func buildNoActionFinalChunks(
id, model string,
created int,
contentAlreadyStreamed bool,
reasoningAlreadyStreamed bool,
content string,
reasoning string,
usage schema.OpenAIUsage,
) []schema.OpenAIResponse {
var out []schema.OpenAIResponse
if contentAlreadyStreamed {
delta := &schema.Message{}
if reasoning != "" && !reasoningAlreadyStreamed {
r := reasoning
delta.Reasoning = &r
}
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
})
return out
}
// Content was not streamed — send role, then content (+reasoning) + usage.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Role: "assistant"},
Index: 0,
}},
Object: "chat.completion.chunk",
})
c := content
delta := &schema.Message{Content: &c}
if reasoning != "" && !reasoningAlreadyStreamed {
r := reasoning
delta.Reasoning = &r
}
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{Delta: delta, Index: 0}},
Object: "chat.completion.chunk",
Usage: usage,
})
return out
}
// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
// were discovered only during final parsing (i.e. after the streaming
// callback finished). The caller forwards every returned chunk to the
// responses channel.
//
// Guarantees:
// - tool calls with i < lastEmittedCount are skipped (already streamed)
// - each emitted call yields two chunks: name-only, then args-only
// - no chunk ever carries both non-empty Content and non-empty ToolCalls
// - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
// - if !reasoningAlreadyStreamed && reasoningContent != "",
// a reasoning chunk is emitted first
// - if !contentAlreadyStreamed && textContent != "",
// a role chunk followed by a content chunk is emitted (after reasoning)
// - chunks order: [reasoning?] [role+content?] (name, args)+
// - fallback IDs for empty ss.ID are unique per index so a client can
// match tool_result messages back to the right call
func buildDeferredToolCallChunks(
id, model string,
created int,
functionResults []functions.FuncCallResults,
lastEmittedCount int,
contentAlreadyStreamed bool,
textContent string,
reasoningAlreadyStreamed bool,
reasoningContent string,
) []schema.OpenAIResponse {
// If every call was already emitted incrementally there's nothing to
// flush — and no reason to emit a standalone reasoning/content chunk.
hasDeferred := false
for i := range functionResults {
if i >= lastEmittedCount {
hasDeferred = true
break
}
}
if !hasDeferred {
return nil
}
var out []schema.OpenAIResponse
// Reasoning first — the callback path at processTools emits reasoning
// incrementally in its own chunks, but when the C++ autoparser only
// surfaces reasoning as a final aggregate the callback never sees it.
// Recover it here (no duplication: contentAlreadyStreamed and
// reasoningAlreadyStreamed track what the callback already sent).
if !reasoningAlreadyStreamed && reasoningContent != "" {
r := reasoningContent
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Reasoning: &r},
Index: 0,
}},
Object: "chat.completion.chunk",
})
}
// Then content, when it wasn't streamed via the callback. Emit role
// and content in separate deltas — the OpenAI streaming contract
// forbids bundling content alongside tool_calls in one delta.
if !contentAlreadyStreamed && textContent != "" {
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Role: "assistant"},
Index: 0,
}},
Object: "chat.completion.chunk",
})
c := textContent
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{Content: &c},
Index: 0,
}},
Object: "chat.completion.chunk",
})
}
for i, ss := range functionResults {
if i < lastEmittedCount {
// Already streamed by the incremental JSON/XML parser during
// the token callback — skip to avoid a duplicate emission.
continue
}
toolCallID := ss.ID
if toolCallID == "" {
// Unique per-index fallback so multiple empty-ID calls don't
// collide on the same request ID (clients match tool results
// back by tool_call_id).
toolCallID = fmt.Sprintf("%s-%d", id, i)
}
// Name chunk.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: ss.Name,
},
}},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
})
// Args chunk — no Content here. Either it was streamed through
// the token callback earlier, or the role+content pair above
// already delivered it.
out = append(out, schema.OpenAIResponse{
ID: id, Created: created, Model: model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{{
Index: i,
ID: toolCallID,
Type: "function",
FunctionCall: schema.FunctionCall{
Arguments: ss.Arguments,
},
}},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
})
}
return out
}

View File

@@ -0,0 +1,717 @@
package openai
import (
"fmt"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// contentOf extracts the string payload from a chunk's delta.Content,
// transparently handling both *string and string underlying types so
// assertions don't have to care which one the helper produced.
func contentOf(ch schema.OpenAIResponse) string {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return ""
}
switch v := ch.Choices[0].Delta.Content.(type) {
case *string:
if v == nil {
return ""
}
return *v
case string:
return v
default:
return ""
}
}
// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
// *string on schema.Message.
func reasoningOf(ch schema.OpenAIResponse) string {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return ""
}
r := ch.Choices[0].Delta.Reasoning
if r == nil {
return ""
}
return *r
}
// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
return nil
}
return ch.Choices[0].Delta.ToolCalls
}
// expectSpecCompliant enforces the invariants on every chunk:
// - Object == "chat.completion.chunk"
// - Exactly one Choice with Index==0
// - No delta ever carries both non-empty Content and non-empty ToolCalls
// - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
func expectSpecCompliant(chunks []schema.OpenAIResponse) {
for i, ch := range chunks {
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)
hasContent := contentOf(ch) != ""
hasReasoning := reasoningOf(ch) != ""
hasToolCalls := len(toolCallsOf(ch)) > 0
if hasContent && hasToolCalls {
Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
}
if hasReasoning && hasToolCalls {
Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
}
}
}
// expectMetadata asserts every chunk carries the same id/model/created.
func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
for i, ch := range chunks {
Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
}
}
var _ = Describe("buildDeferredToolCallChunks", func() {
const (
testID = "req"
testModel = "test-model"
testCreated = 1700000000
)
Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
It("emits only the tool_call chunks, no Content anywhere", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "Let me search…",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2), "two chunks: name, args")
// Name chunk
tc0 := toolCallsOf(chunks[0])
Expect(tc0).To(HaveLen(1))
Expect(tc0[0].Index).To(Equal(0))
Expect(tc0[0].ID).To(Equal("tc1"))
Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
Expect(contentOf(chunks[0])).To(BeEmpty())
// Args chunk — MUST NOT carry Content
tc1 := toolCallsOf(chunks[1])
Expect(tc1).To(HaveLen(1))
Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
Expect(contentOf(chunks[1])).To(BeEmpty(),
"args chunk must not duplicate already-streamed content")
})
})
Describe("Case B — autoparser / content not streamed", func() {
It("emits role, content, then name+args", func() {
results := []functions.FuncCallResults{
{Name: "do", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "Here is my plan…",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(4), "role, content, name, args")
// Role chunk
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
// Content chunk
Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
Expect(toolCallsOf(chunks[1])).To(BeEmpty())
// Name + args chunks
Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case C — multiple deferred calls, content already streamed", func() {
It("emits (name, args) × 3 with no Content anywhere", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tcA"},
{Name: "b", Arguments: "{}", ID: "tcB"},
{Name: "c", Arguments: "{}", ID: "tcC"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "some narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(6))
for i := 0; i < 3; i++ {
Expect(contentOf(chunks[2*i])).To(BeEmpty(),
"call #%d name chunk must not carry Content", i)
Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
"call #%d args chunk must not carry Content", i)
Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
}
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
})
})
Describe("Case D — partial incremental emission", func() {
It("emits only the deferred tail (call #1), skipping #0", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 1,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case E — all calls already emitted incrementally", func() {
It("emits nothing", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 2,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(BeEmpty())
})
})
Describe("Case F — content not streamed but textContent empty", func() {
It("emits only the tool call chunks, no leading role/content", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: "tcX"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
})
})
Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
It("emits a deterministic per-index fallback", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: ""},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
expectedID := fmt.Sprintf("%s-%d", testID, 0)
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
})
})
Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
It("avoids the collision bug where every empty-ID call shared the request id", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: ""},
{Name: "b", Arguments: "{}", ID: ""},
{Name: "c", Arguments: "{}", ID: ""},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(6))
ids := map[string]int{}
for _, ch := range chunks {
for _, tc := range toolCallsOf(ch) {
ids[tc.ID]++
}
}
// Each call yields a name chunk + args chunk → each distinct ID
// should appear in exactly two chunks. Three distinct IDs
// overall.
Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
for id, n := range ids {
Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
}
})
})
Describe("Case H — indices preserved across skip with multiple calls", func() {
It("emits Index fields matching functionResults positions", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
{Name: "c", Arguments: "{}", ID: "tc2"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 1,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(4))
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
})
})
Describe("Case I — explicit non-empty ID is preserved", func() {
It("does not touch ss.ID when it's already set", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: "{}", ID: "abc123"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
})
})
Describe("Case J — chunk-shape sanity", func() {
It("splits Name into the first chunk and Arguments into the second", func() {
results := []functions.FuncCallResults{
{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "narration",
true, "",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
})
})
Describe("Case K — metadata propagation", func() {
It("stamps every chunk with the same id/model/created", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tcA"},
{Name: "b", Arguments: "{}", ID: "tcB"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "hello",
true, "",
)
expectSpecCompliant(chunks)
expectMetadata(chunks, testID, testModel, testCreated)
})
})
Describe("Case L — Choices[0].Index == 0 invariant", func() {
It("is upheld across every branch the helper can take", func() {
scenarios := []struct {
name string
functionResults []functions.FuncCallResults
lastEmittedCount int
contentStreamed bool
text string
reasoningStreamed bool
reasoning string
}{
{"streamed-content-deferred-call",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, true, "hi", true, ""},
{"unstreamed-content-deferred-call",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, false, "hello", true, ""},
{"unstreamed-reasoning-and-content",
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
0, false, "hello", false, "thinking…"},
{"partial-incremental",
[]functions.FuncCallResults{
{Name: "a", Arguments: "{}"},
{Name: "b", Arguments: "{}"}},
1, true, "hi", true, ""},
}
for _, sc := range scenarios {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
sc.functionResults, sc.lastEmittedCount,
sc.contentStreamed, sc.text,
sc.reasoningStreamed, sc.reasoning,
)
for i, ch := range chunks {
Expect(ch.Choices[0].Index).To(Equal(0),
"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
}
}
})
})
Describe("Case M — spec compliance across every scenario", func() {
It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
scenarios := []struct {
name string
functionResults []functions.FuncCallResults
lastEmittedCount int
contentStreamed bool
text string
reasoningStreamed bool
reasoning string
}{
{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, true, "already-streamed", true, ""},
{"C", []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"}},
0, true, "already-streamed", true, ""},
{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, false, "plan", true, ""},
{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
0, false, "plan", false, "thinking…"},
}
for _, sc := range scenarios {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
sc.functionResults, sc.lastEmittedCount,
sc.contentStreamed, sc.text,
sc.reasoningStreamed, sc.reasoning,
)
for i, ch := range chunks {
hasContent := contentOf(ch) != ""
hasReasoning := reasoningOf(ch) != ""
hasToolCalls := len(toolCallsOf(ch)) > 0
Expect(hasContent && hasToolCalls).To(BeFalse(),
"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
Expect(hasReasoning && hasToolCalls).To(BeFalse(),
"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
}
}
})
})
Describe("Case N — empty functionResults", func() {
It("emits nothing, including no leading role/content/reasoning", func() {
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
nil, 0,
false, "ignored",
false, "ignored",
)
Expect(chunks).To(BeEmpty())
})
})
Describe("Case O — content not streamed but all calls already emitted", func() {
It("emits nothing, not even a standalone content chunk", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc0"},
{Name: "b", Arguments: "{}", ID: "tc1"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 2,
false, "narration",
false, "thinking…",
)
Expect(chunks).To(BeEmpty(),
"no tool_calls to trigger on, so no leading role/content/reasoning either")
})
})
Describe("Reasoning — autoparser delivered reasoning only at end", func() {
It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "streamed content",
false, "model's private thoughts",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(3), "reasoning, name, args")
Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
// The following two are the tool_call name + args chunks.
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
})
It("emits reasoning before role+content when neither was streamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
false, "final plan",
false, "private thoughts",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[2])).To(Equal("final plan"))
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
})
It("does not re-emit reasoning that was already streamed", func() {
results := []functions.FuncCallResults{
{Name: "a", Arguments: "{}", ID: "tc"},
}
chunks := buildDeferredToolCallChunks(
testID, testModel, testCreated,
results, 0,
true, "streamed",
true, "already-sent reasoning",
)
expectSpecCompliant(chunks)
Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
for _, ch := range chunks {
Expect(reasoningOf(ch)).To(BeEmpty())
}
})
})
})
var _ = Describe("hasRealCall", func() {
const noAction = "answer"
It("returns false for nil and empty slices", func() {
Expect(hasRealCall(nil, noAction)).To(BeFalse())
Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
})
It("returns false when every entry is the noAction sentinel", func() {
results := []functions.FuncCallResults{
{Name: noAction, Arguments: `{"message":"hi"}`},
{Name: noAction, Arguments: `{"message":"hello"}`},
}
Expect(hasRealCall(results, noAction)).To(BeFalse())
})
It("returns true when only one entry is a real call", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: "{}"},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
It("returns true when a real call follows a noAction entry", func() {
// This is the regression the follow-up fixes: the old
// functionResults[0].Name == noAction check would declare this
// noActionToRun and drop the real call entirely.
results := []functions.FuncCallResults{
{Name: noAction, Arguments: `{"message":"hi"}`},
{Name: "search", Arguments: "{}"},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
It("returns true when a real call precedes a noAction entry", func() {
results := []functions.FuncCallResults{
{Name: "search", Arguments: "{}"},
{Name: noAction, Arguments: `{"message":"hi"}`},
}
Expect(hasRealCall(results, noAction)).To(BeTrue())
})
})
var _ = Describe("buildNoActionFinalChunks", func() {
const (
testID = "req"
testModel = "test-model"
testCreated = 1700000000
)
usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
Describe("Content streamed — trailing usage chunk", func() {
It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, true,
"", "already-streamed-reasoning", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(reasoningOf(chunks[0])).To(BeEmpty(),
"reasoning must not be re-emitted once it was streamed via the callback")
})
It("emits a trailing reasoning delivery when reasoning came only at end", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, false,
"", "autoparser final reasoning", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
})
It("omits reasoning when it's empty regardless of streamed flag", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
true, false,
"", "", usage,
)
Expect(chunks).To(HaveLen(1))
Expect(reasoningOf(chunks[0])).To(BeEmpty())
})
})
Describe("Content not streamed — role, then content+usage", func() {
It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, true,
"the answer", "already-streamed-reasoning", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[0])).To(BeEmpty())
Expect(contentOf(chunks[1])).To(Equal("the answer"))
Expect(reasoningOf(chunks[1])).To(BeEmpty(),
"reasoning must not be re-emitted if it was streamed earlier")
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
})
It("emits role, then content+reasoning when reasoning was not streamed", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"the answer", "autoparser final reasoning", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
Expect(contentOf(chunks[1])).To(Equal("the answer"))
Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
})
It("still emits content even when reasoning is empty", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"just an answer", "", usage,
)
Expect(chunks).To(HaveLen(2))
Expect(contentOf(chunks[1])).To(Equal("just an answer"))
Expect(reasoningOf(chunks[1])).To(BeEmpty())
})
})
Describe("Metadata and shape invariants", func() {
It("stamps every chunk with the same id/model/created and object", func() {
chunks := buildNoActionFinalChunks(
testID, testModel, testCreated,
false, false,
"hi", "reasoning", usage,
)
for i, ch := range chunks {
Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
Expect(ch.Choices).To(HaveLen(1))
Expect(ch.Choices[0].Index).To(Equal(0))
}
})
})
})

View File

@@ -3,6 +3,7 @@ package middleware
import (
"bytes"
"io"
"mime"
"net/http"
"slices"
"sync"
@@ -94,7 +95,8 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
initializeTracing(app.ApplicationConfig().TracingMaxItems)
if c.Request().Header.Get("Content-Type") != "application/json" {
ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
if ct != "application/json" {
return next(c)
}

View File

@@ -23,7 +23,6 @@ import (
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/http/endpoints/localai"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/p2p"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/pkg/model"
@@ -1458,24 +1457,5 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
}
// Logs API (admin only)
app.GET("/api/traces", func(c echo.Context) error {
if !appConfig.EnableTracing {
return c.JSON(503, map[string]any{
"error": "Tracing disabled",
})
}
traces := middleware.GetTraces()
return c.JSON(200, map[string]any{
"traces": traces,
})
}, adminMiddleware)
app.POST("/api/traces/clear", func(c echo.Context) error {
middleware.ClearTraces()
return c.JSON(200, map[string]any{
"message": "Traces cleared",
})
}, adminMiddleware)
}

View File

@@ -124,8 +124,13 @@ func SubjectNodeBackendInstall(nodeID string) string {
// BackendInstallRequest is the payload for a backend.install NATS request.
type BackendInstallRequest struct {
Backend string `json:"backend"`
ModelID string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
ModelID string `json:"model_id,omitempty"`
BackendGalleries string `json:"backend_galleries,omitempty"`
// URI is set for external installs (OCI image, URL, or path). When non-empty
// the worker routes to InstallExternalBackend instead of the gallery lookup.
URI string `json:"uri,omitempty"`
Name string `json:"name,omitempty"`
Alias string `json:"alias,omitempty"`
}
// BackendInstallReply is the response from a backend.install NATS request.

View File

@@ -106,6 +106,13 @@ func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context
if node.Status == StatusPending {
continue
}
// Backend lifecycle ops only make sense on backend-type workers.
// Agent workers don't subscribe to backend.install/delete/list, so
// enqueueing for them guarantees a forever-retrying row that the
// reconciler can never drain. Silently skip — they aren't consumers.
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
continue
}
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
result.Nodes = append(result.Nodes, NodeOpStatus{
@@ -286,7 +293,7 @@ func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *gall
backendName := op.GalleryElementName
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
if err != nil {
return err
}
@@ -304,7 +311,7 @@ func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name str
galleriesJSON, _ := json.Marshal(d.backendGalleries)
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
if err != nil {
return err
}

View File

@@ -3,12 +3,14 @@ package nodes
import (
"context"
"encoding/json"
"errors"
"fmt"
"time"
"github.com/mudler/LocalAI/core/services/advisorylock"
grpcclient "github.com/mudler/LocalAI/pkg/grpc"
"github.com/mudler/xlog"
"github.com/nats-io/nats.go"
"gorm.io/gorm"
)
@@ -186,7 +188,7 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
case OpBackendDelete:
_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
case OpBackendInstall, OpBackendUpgrade:
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries))
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
if err != nil {
applyErr = err
} else if !reply.Success {
@@ -206,12 +208,47 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
}
continue
}
// ErrNoResponders means the node has no active NATS subscription for
// this subject. Either its connection dropped, or it's the wrong
// node type entirely. Mark unhealthy so the health monitor's
// heartbeat-only pass doesn't immediately flip it back — and so
// ListDuePendingBackendOps (which filters by status=healthy) stops
// picking the row until the node genuinely recovers.
if errors.Is(applyErr, nats.ErrNoResponders) {
xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
"op", op.Op, "backend", op.Backend, "node", op.NodeID)
_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
}
// Dead-letter cap: after maxAttempts the row is the reconciler
// equivalent of a poison message. Delete it loudly so the queue
// doesn't churn NATS every tick forever — operators can re-issue
// the op from the UI if they still want it applied.
if op.Attempts+1 >= maxPendingBackendOpAttempts {
xlog.Error("Reconciler: abandoning pending backend op after max attempts",
"op", op.Op, "backend", op.Backend, "node", op.NodeID,
"attempts", op.Attempts+1, "last_error", applyErr)
if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
}
continue
}
_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
xlog.Warn("Reconciler: pending backend op retry failed",
"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
}
}
// maxPendingBackendOpAttempts caps how many times the reconciler retries a
// failing row before dead-lettering it. Ten attempts at exponential backoff
// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
// worker restart or network blip. Poisoned rows beyond that are almost
// certainly structural (wrong node type, non-existent gallery entry) and no
// amount of further retrying will help.
const maxPendingBackendOpAttempts = 10
// probeLoadedModels gRPC-health-checks model addresses that the DB says are
// loaded. If a model's backend process is gone (OOM, crash, manual restart)
// we remove the row so ghosts don't linger. Only probes rows older than

View File

@@ -373,4 +373,30 @@ var _ = Describe("ReplicaReconciler — state reconciliation", func() {
Expect(row.NextRetryAt).To(BeTemporally(">", before))
})
})
Describe("NewNodeRegistry malformed-row pruning", func() {
It("drops queue rows for agent nodes and non-existent nodes on startup", func() {
agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
Expect(registry.Register(context.Background(), backend, true)).To(Succeed())
// Three rows: one for a valid backend node (should survive),
// one for an agent node (pruned), one for an empty backend name
// on the valid node (pruned).
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())
// Re-instantiating the registry runs the cleanup migration.
_, err := NewNodeRegistry(db)
Expect(err).ToNot(HaveOccurred())
var rows []PendingBackendOp
Expect(db.Find(&rows).Error).To(Succeed())
Expect(rows).To(HaveLen(1))
Expect(rows[0].NodeID).To(Equal(backend.ID))
Expect(rows[0].Backend).To(Equal("foo"))
})
})
})

View File

@@ -148,6 +148,30 @@ func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
}); err != nil {
return nil, fmt.Errorf("migrating node tables: %w", err)
}
// One-shot cleanup of queue rows that can never drain: ops targeted at
// agent workers (wrong subscription set), at non-existent nodes, or with
// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
// new ones from being written, but rows persisted by earlier versions
// keep the reconciler busy retrying a permanently-failing NATS request
// every 30s. Guarded by the same migration advisory lock so only one
// frontend runs it.
_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
res := db.Exec(`
DELETE FROM pending_backend_ops
WHERE backend = ''
OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
`, NodeTypeBackend)
if res.Error != nil {
xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
return res.Error
}
if res.RowsAffected > 0 {
xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
}
return nil
})
return &NodeRegistry{db: db}, nil
}

View File

@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
return "", fmt.Errorf("no NATS connection for backend installation")
}
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
if err != nil {
return "", err
}

View File

@@ -244,7 +244,7 @@ type fakeUnloader struct {
unloadErr error
}
func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
return f.installReply, f.installErr
}

View File

@@ -17,7 +17,7 @@ type backendStopRequest struct {
// NodeCommandSender abstracts NATS-based commands to worker nodes.
// Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
type NodeCommandSender interface {
InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
ListBackends(nodeID string) (*messaging.BackendListReply, error)
StopBackend(nodeID, backend string) error
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
// The worker installs the backend from gallery (if not already installed),
// starts the gRPC process, and replies when ready.
// Timeout: 5 minutes (gallery install can take a while).
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
subject := messaging.SubjectNodeBackendInstall(nodeID)
xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)
@@ -80,6 +80,9 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
Backend: backendType,
ModelID: modelID,
BackendGalleries: galleriesJSON,
URI: uri,
Name: name,
Alias: alias,
}, 5*time.Minute)
}

View File

@@ -14,11 +14,13 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend
### Request
The request body is JSON:
The model to monitor is passed as a query parameter:
| Parameter | Type | Required | Description |
|-----------|----------|----------|--------------------------------|
| `model` | `string` | Yes | Name of the model to monitor |
| Parameter | Type | Required | Location | Description |
|-----------|----------|----------|----------|--------------------------------|
| `model` | `string` | Yes | query | Name of the model to monitor |
For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.
### Response
@@ -42,9 +44,7 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
### Usage
```bash
curl http://localhost:8080/backend/monitor \
-H "Content-Type: application/json" \
-d '{"model": "my-model"}'
curl "http://localhost:8080/backend/monitor?model=my-model"
```
### Example response

View File

@@ -130,6 +130,19 @@ Reference for system information commands and diagnostics.
---
### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
Policy for AI-assisted contributions — licensing, DCO, and attribution.
**Key topics:**
- Aligned with the Linux kernel's AI assistants policy
- Signed-off-by and DCO rules
- `Assisted-by` commit trailer format
- Scope and responsibility of the human submitter
**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)
---
## Quick Links
| Task | Documentation |
@@ -138,6 +151,7 @@ Reference for system information commands and diagnostics.
| CLI commands | [CLI Reference](cli-reference.md) |
| Check compatibility | [Compatibility Table](compatibility-table.md) |
| System diagnostics | [System Info](system-info.md) |
| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |
---

View File

@@ -0,0 +1,79 @@
+++
disableToc = false
title = "AI Coding Assistants"
weight = 28
+++
This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.
**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.
AI tools helping with LocalAI development should follow the standard project development process:
- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth
## Licensing and Legal Requirements
All contributions must comply with LocalAI's licensing requirements:
- LocalAI is licensed under the **MIT License**
- New source files should use the SPDX license identifier `MIT` where applicable to the file type
- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers
## Signed-off-by and Developer Certificate of Origin
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).
## Attribution
When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Where:
- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
Basic development tools (git, go, make, editors) should **not** be listed.
### Example
```
fix(llama-cpp): handle empty tool call arguments
Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.
Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:
- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the project style
- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review
Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.
{{% notice note %}}
This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
{{% /notice %}}

View File

@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

View File

@@ -1,4 +1,151 @@
---
- name: "qwen3.5-9b-glm5.1-distill-v1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
description: |
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1
## 📌 Model Overview
**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
**Base Model:** Qwen3.5-9B
**Training Type:** Supervised Fine-Tuning (SFT, Distillation)
**Parameter Scale:** 9B
**Training Framework:** Unsloth
This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.
The primary goals are to:
- Improve **structured reasoning ability**
- Enhance **instruction-following consistency**
- Activate **latent knowledge via better reasoning structure**
## 📊 Training Data
### Main Dataset
- `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
- Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
- Generated from a **GLM-5.1 teacher model**
- Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
- Training used a **filtered subset**, not the full source dataset.
### Auxiliary Dataset
- `Jackrong/Qwen3.5-reasoning-700x`
...
license: "apache-2.0"
tags:
- llm
- gguf
- qwen
- instruction-tuned
- reasoning
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
options:
- use_jinja:true
parameters:
min_p: 0
model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
presence_penalty: 1.5
repeat_penalty: 1
temperature: 0.7
top_k: 20
top_p: 0.8
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
- filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
- name: "supergemma4-26b-uncensored-v2"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
description: |
Hugging Face |
GitHub |
Launch Blog |
Documentation
License: Apache 2.0 | Authors: Google DeepMind
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
Gemma 4 introduces key **capability and architectural advancements**:
* **Reasoning** All models in the family are designed as highly capable reasoners, with configurable thinking modes.
...
license: "gemma"
tags:
- llm
- gguf
icon: https://ai.google.dev/gemma/images/gemma4_banner.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
- name: "qwopus-glm-18b-merged"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n - Improve **structured reasoning ability**\n - Enhance **instruction-following consistency**\n - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n - Generated from a **GLM-5.1 teacher model**\n - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
license: "apache-2.0"
tags:
- llm
- gguf
- reasoning
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
- name: "qwen3.6-35b-a3b-apex"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
@@ -887,6 +1034,8 @@
- gpu
overrides:
backend: neutts
parameters:
model: neuphonic/neutts-air
known_usecases:
- tts
- name: vllm-omni-z-image-turbo
@@ -15186,14 +15335,16 @@
- gpu
overrides:
parameters:
model: wan2.1-t2v-1.3B-Q8_0.gguf
model: wan2.1_t2v_1.3b-q8_0.gguf
files:
- filename: "wan2.1-t2v-1.3B-Q8_0.gguf"
uri: "huggingface://calcuis/wan-gguf/wan2.1-t2v-1.3B-Q8_0.gguf"
- filename: "wan2.1_t2v_1.3b-q8_0.gguf"
sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- name: wan-2.1-i2v-14b-480p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
@@ -15214,11 +15365,103 @@
model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- filename: "clip_vision_h.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-flf2v-14b-720p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
description: |
Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
Takes a start and end reference image and interpolates a 33-frame clip
between them. Unlike the plain I2V variant this model feeds the end
frame through clip_vision as well, so it conditions semantically (not
just in pixel-space) on both endpoints. That makes it the right choice
for seamless loops (start_image == end_image) and clean narrative cuts.
Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
text encoder, and clip_vision_h as I2V 14B.
urls:
- https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
tags:
- image-to-video
- first-last-frame-to-video
- wan
- video-generation
- cpu
- gpu
overrides:
parameters:
model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
- filename: "clip_vision_h.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
- name: wan-2.1-i2v-14b-720p-ggml
license: apache-2.0
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
description: |
Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
Native 720p sibling of the 480p I2V model: animates a single
reference image into a 33-frame clip at up to 1280x720. Trained
purely as image-to-video (no first-last-frame interpolation path),
so motion is freer and better-suited to single-anchor animation
than repurposing the FLF2V 720P variant for i2v. Shares the same
VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
and FLF2V 14B 720P entries.
urls:
- https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
tags:
- image-to-video
- wan
- video-generation
- cpu
- gpu
overrides:
parameters:
model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
options:
- "clip_vision_path:clip_vision_h.safetensors"
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "flow_shift:3.0"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"
files:
- filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
- filename: "wan_2.1_vae.safetensors"
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
- filename: "umt5-xxl-encoder-Q8_0.gguf"
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
- filename: "clip_vision_h.safetensors"

View File

@@ -9,11 +9,6 @@ config_file: |
- "diffusion_model"
- "vae_decode_only:false"
- "sampler:euler"
- "scheduler:discrete"
- "flow_shift:3.0"
- "diffusion_flash_attn:true"
- "offload_params_to_cpu:true"
- "keep_vae_on_cpu:true"
- "keep_clip_on_cpu:true"
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
- "vae_path:wan_2.1_vae.safetensors"

45
go.mod
View File

@@ -8,13 +8,13 @@ require (
github.com/Masterminds/sprig/v3 v3.3.0
github.com/alecthomas/kong v1.14.0
github.com/anthropics/anthropic-sdk-go v1.27.0
github.com/aws/aws-sdk-go-v2 v1.41.5
github.com/aws/aws-sdk-go-v2/config v1.32.14
github.com/aws/aws-sdk-go-v2/credentials v1.19.14
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
github.com/aws/aws-sdk-go-v2 v1.41.6
github.com/aws/aws-sdk-go-v2/config v1.32.16
github.com/aws/aws-sdk-go-v2/credentials v1.19.15
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
github.com/charmbracelet/glamour v1.0.0
github.com/containerd/containerd v1.7.30
github.com/coreos/go-oidc/v3 v3.17.0
github.com/containerd/containerd v1.7.31
github.com/coreos/go-oidc/v3 v3.18.0
github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
github.com/ebitengine/purego v0.10.0
github.com/emirpasic/gods/v2 v2.0.0-alpha
@@ -35,7 +35,7 @@ require (
github.com/lithammer/fuzzysearch v1.1.8
github.com/mholt/archiver/v3 v3.5.1
github.com/microcosm-cc/bluemonday v1.0.27
github.com/modelcontextprotocol/go-sdk v1.4.1
github.com/modelcontextprotocol/go-sdk v1.5.0
github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
github.com/mudler/edgevpn v0.31.1
github.com/mudler/go-processmanager v0.1.0
@@ -75,24 +75,23 @@ require (
)
require (
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
github.com/aws/smithy-go v1.24.2 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
github.com/aws/smithy-go v1.25.0 // indirect
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/go-jose/go-jose/v4 v4.1.3 // indirect
github.com/go-jose/go-jose/v4 v4.1.4 // indirect
github.com/jinzhu/inflection v1.0.0 // indirect
github.com/jinzhu/now v1.1.5 // indirect
github.com/mattn/go-sqlite3 v1.14.24 // indirect

94
go.sum
View File

@@ -70,44 +70,42 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -198,8 +196,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
@@ -212,8 +210,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
@@ -336,8 +334,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
@@ -385,8 +383,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -691,8 +689,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=

View File

@@ -159,7 +159,6 @@ var _ = Describe("CapabilityFilterDisabled", func() {
os.Setenv(capabilityEnv, "disable")
s := &SystemState{}
Expect(s.IsBackendCompatible("cuda12-whisperx", "quay.io/nvidia-cuda-12")).To(BeTrue())
Expect(s.IsBackendCompatible("rocm-whisperx", "quay.io/rocm")).To(BeTrue())
Expect(s.IsBackendCompatible("metal-whisperx", "quay.io/metal-darwin")).To(BeTrue())
Expect(s.IsBackendCompatible("intel-whisperx", "quay.io/intel-sycl")).To(BeTrue())
Expect(s.IsBackendCompatible("cpu-whisperx", "quay.io/cpu")).To(BeTrue())

View File

@@ -985,13 +985,11 @@ const docTemplate = `{
"summary": "Backend monitor endpoint",
"parameters": [
{
"description": "Backend statistics request",
"name": "request",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/schema.BackendMonitorRequest"
}
"type": "string",
"description": "Name of the model to monitor",
"name": "model",
"in": "query",
"required": true
}
],
"responses": {
@@ -2408,6 +2406,23 @@ const docTemplate = `{
}
}
},
"gallery.NodeDriftInfo": {
"type": "object",
"properties": {
"digest": {
"type": "string"
},
"node_id": {
"type": "string"
},
"node_name": {
"type": "string"
},
"version": {
"type": "string"
}
}
},
"gallery.UpgradeInfo": {
"type": "object",
"properties": {
@@ -2425,6 +2440,13 @@ const docTemplate = `{
},
"installed_version": {
"type": "string"
},
"node_drift": {
"description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
"type": "array",
"items": {
"$ref": "#/definitions/gallery.NodeDriftInfo"
}
}
}
},

View File

@@ -982,13 +982,11 @@
"summary": "Backend monitor endpoint",
"parameters": [
{
"description": "Backend statistics request",
"name": "request",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/schema.BackendMonitorRequest"
}
"type": "string",
"description": "Name of the model to monitor",
"name": "model",
"in": "query",
"required": true
}
],
"responses": {
@@ -2405,6 +2403,23 @@
}
}
},
"gallery.NodeDriftInfo": {
"type": "object",
"properties": {
"digest": {
"type": "string"
},
"node_id": {
"type": "string"
},
"node_name": {
"type": "string"
},
"version": {
"type": "string"
}
}
},
"gallery.UpgradeInfo": {
"type": "object",
"properties": {
@@ -2422,6 +2437,13 @@
},
"installed_version": {
"type": "string"
},
"node_drift": {
"description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
"type": "array",
"items": {
"$ref": "#/definitions/gallery.NodeDriftInfo"
}
}
}
},

View File

@@ -157,6 +157,17 @@ definitions:
type: string
type: array
type: object
gallery.NodeDriftInfo:
properties:
digest:
type: string
node_id:
type: string
node_name:
type: string
version:
type: string
type: object
gallery.UpgradeInfo:
properties:
available_digest:
@@ -169,6 +180,14 @@ definitions:
type: string
installed_version:
type: string
node_drift:
description: |-
NodeDrift lists nodes whose installed version or digest differs from
the cluster majority. Non-empty means the cluster has diverged and an
upgrade will realign it. Empty in single-node mode.
items:
$ref: '#/definitions/gallery.NodeDriftInfo'
type: array
type: object
galleryop.OpStatus:
properties:
@@ -2363,12 +2382,11 @@ paths:
/backend/monitor:
get:
parameters:
- description: Backend statistics request
in: body
name: request
- description: Name of the model to monitor
in: query
name: model
required: true
schema:
$ref: '#/definitions/schema.BackendMonitorRequest'
type: string
responses:
"200":
description: Response

View File

@@ -57,7 +57,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
FlushNATS(infra.NC)
adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "")
installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "", "", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(installReply.Success).To(BeTrue())
})
@@ -78,7 +78,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
FlushNATS(infra.NC)
adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "")
installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "", "", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(installReply.Success).To(BeFalse())
Expect(installReply.Error).To(ContainSubstring("backend not found"))