fix(distributed): stop queue loops on agent nodes + dead-letter cap

pending_backend_ops rows targeting agent-type workers looped forever. The reconciler fan-out hit a NATS subject the worker doesn't subscribe to and got ErrNoResponders. We marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick: same row, same failure.

Three related fixes:

1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing ops for them guarantees an infinite retry loop. A silent skip is correct: they aren't consumers of these ops.

2. The reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders. It marks the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps calls (which filter by status=healthy) stop picking up the row until the node actually recovers. This matches the synchronous fan-out path.

3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message, and further retries just thrash NATS. The row is deleted and logged at ERROR, so it stays visible without being retried forever.

Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes or non-existent nodes, or that carry an empty backend name. It is guarded by the same schema-migration advisory lock, so only one instance performs it. The guards above prevent new rows of this shape; the cleanup closes the migration gap for existing ones.

Tests: cover the prune migration (the valid row stays; the agent and empty-name rows are dropped) on top of the existing upsert / backoff coverage.
feat(ui): shared FilterBar across the System page tabs
2026-05-21 15:15:40 -04:00 · 2026-04-19 21:27:05 +00:00 · 2026-04-19 08:46:22 +00:00 · 2026-04-19 08:39:59 +00:00 · 2026-04-19 08:37:45 +00:00 · 2026-04-19 08:34:57 +00:00
58 changed files with 444 additions and 2222 deletions
--- a/.agents/ai-coding-assistants.md
+++ b/.agents/ai-coding-assistants.md
@@ -1,101 +0,0 @@
-# AI Coding Assistants
-
-This document provides guidance for AI tools and developers using AI
-assistance when contributing to LocalAI.
-
-**LocalAI follows the same guidelines as the Linux kernel project for
-AI-assisted contributions.** See the upstream policy here:
-<https://docs.kernel.org/process/coding-assistants.html>
-
-The rules below mirror that policy, adapted to LocalAI's license and
-project layout. If anything is unclear, the kernel document is the
-authoritative reference for intent.
-
-AI tools helping with LocalAI development should follow the standard
-project development process:
-
-- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
-  conventions, and PR guidelines
-- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
-  logging, and documentation conventions
-- [.agents/building-and-testing.md](building-and-testing.md) — build and
-  test procedures
-
-## Licensing and Legal Requirements
-
-All contributions must comply with LocalAI's licensing requirements:
-
-- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
-  file
-- New source files should use the SPDX license identifier `MIT` where
-  applicable to the file type
-- Contributions must be compatible with the MIT License and must not
-  introduce code under incompatible licenses (e.g., GPL) without an
-  explicit discussion with maintainers
-
-## Signed-off-by and Developer Certificate of Origin
-
-**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
-certify the Developer Certificate of Origin (DCO). The human submitter
-is responsible for:
-
-- Reviewing all AI-generated code
-- Ensuring compliance with licensing requirements
-- Adding their own `Signed-off-by` tag (when the project requires DCO)
-  to certify the contribution
-- Taking full responsibility for the contribution
-
-AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
-A human reviewer owns the contribution; the AI's involvement is recorded
-via `Assisted-by` (see below).
-
-## Attribution
-
-When AI tools contribute to LocalAI development, proper attribution helps
-track the evolving role of AI in the development process. Contributions
-should include an `Assisted-by` tag in the commit message trailer in the
-following format:
-
-```
-Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
-```
-
-Where:
-
-- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
-  `Copilot`, `Cursor`)
-- `MODEL_VERSION` — specific model version used (e.g.,
-  `claude-opus-4-7`, `gpt-5`)
-- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
-  agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
-
-Basic development tools (git, go, make, editors) should **not** be listed.
-
-### Example
-
-```
-fix(llama-cpp): handle empty tool call arguments
-
-Previously the parser panicked when the model returned a tool call with
-an empty arguments object. Fall back to an empty JSON object in that
-case so downstream consumers receive a valid payload.
-
-Assisted-by: Claude:claude-opus-4-7 golangci-lint
-Signed-off-by: Jane Developer <jane@example.com>
-```
-
-## Scope and Responsibility
-
-Using an AI assistant does not reduce the contributor's responsibility.
-The human submitter must:
-
-- Understand every line that lands in the PR
-- Verify that generated code compiles, passes tests, and follows the
-  project style
-- Confirm that any referenced APIs, flags, or file paths actually exist
-  in the current tree (AI models may hallucinate identifiers)
-- Not submit AI output verbatim without review
-
-Reviewers may ask for clarification on any change regardless of how it
-was produced. "An AI wrote it" is not an acceptable answer to a design
-question.
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -30,7 +30,6 @@ jobs:
      skip-drivers: ${{ matrix.skip-drivers }}
      context: ${{ matrix.context }}
      ubuntu-version: ${{ matrix.ubuntu-version }}
-      amdgpu-targets: ${{ matrix.amdgpu-targets }}
    secrets:
      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -1624,6 +1623,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          - build-type: 'hipblas'
+            cuda-major-version: ""
+            cuda-minor-version: ""
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-rocm-hipblas-whisperx'
+            runs-on: 'bigger-runner'
+            base-image: "rocm/dev-ubuntu-24.04:7.2.1"
+            skip-drivers: 'false'
+            backend: "whisperx"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'hipblas'
            cuda-major-version: ""
            cuda-minor-version: ""
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -58,11 +58,6 @@ on:
        required: false
        default: '2204'
        type: string
-      amdgpu-targets:
-        description: 'AMD GPU targets for ROCm/HIP builds'
-        required: false
-        default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
-        type: string
    secrets:
      dockerUsername:
        required: false
@@ -219,7 +214,6 @@ jobs:
            BASE_IMAGE=${{ inputs.base-image }}
            BACKEND=${{ inputs.backend }}
            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=gha
@@ -241,7 +235,6 @@ jobs:
            BASE_IMAGE=${{ inputs.base-image }}
            BACKEND=${{ inputs.backend }}
            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
          context: ${{ inputs.context }}
          file: ${{ inputs.dockerfile }}
          cache-from: type=gha
--- a/.github/workflows/gallery-agent.yaml
+++ b/.github/workflows/gallery-agent.yaml
@@ -54,41 +54,24 @@ jobs:
          REPO: ${{ github.repository }}
          SEARCH: 'gallery agent in:title'
        run: |
-          # Walk gallery-agent PRs and act on maintainer comments:
+          # Walk open gallery-agent PRs and act on maintainer comments:
          #   /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
          #   /gallery-agent recreate  → close without label (next run may repropose)
          # Only comments from OWNER / MEMBER / COLLABORATOR are honored so
          # random users can't drive the bot.
-          #
-          # We scan both open PRs AND recently-closed PRs that don't already
-          # carry the blacklist label. This covers the common flow where a
-          # maintainer writes /gallery-agent blacklist and immediately clicks
-          # Close — without this, the next scheduled run wouldn't see the
-          # command (PR is already closed) and would repropose the model.
          gh label create gallery-agent/blacklisted \
            --repo "$REPO" --color ededed \
            --description "gallery-agent must not repropose this model" 2>/dev/null || true

-          prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
-            --json number --jq '.[].number')
-          # Closed PRs from the last 14 days that don't yet have the blacklist label.
-          # Bounded window keeps the scan cheap while covering late-applied commands.
-          since=$(date -u -d '14 days ago' +%Y-%m-%d)
-          prs_closed=$(gh pr list --repo "$REPO" --state closed \
-            --search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
-            --json number --jq '.[].number')
-          prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
+          prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
          for pr in $prs; do
-            state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
            cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
              --jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
            if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
-              echo "PR #$pr: blacklist command found (state=$state)"
+              echo "PR #$pr: blacklist command found"
              gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
-              if [ "$state" = "OPEN" ]; then
-                gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
-              fi
-            elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
+              gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
+            elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
              echo "PR #$pr: recreate command found"
              gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
            fi
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,23 +1,11 @@
 # LocalAI Agent Instructions

-This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
-
-Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
-
-## Policy for AI-Assisted Contributions
-
-LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
-
-- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
-- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
-- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
-- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
+This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.

 ## Topics

 | File | When to read |
 |------|-------------|
-| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
 | [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
 | [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
 | [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -13,7 +13,6 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
  - [Development Workflow](#development-workflow)
  - [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
 - [Coding Guidelines](#coding-guidelines)
-- [AI Coding Assistants](#ai-coding-assistants)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Community and Communication](#community-and-communication)
@@ -186,7 +185,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to

 This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.

-For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
+For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.

 ### General Principles

@@ -212,26 +211,6 @@ For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`C
 - Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
 - Be responsive to review feedback and keep discussions constructive.

-## AI Coding Assistants
-
-LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
-
-The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
-
-- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
-- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
-- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
-
-  ```
-  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
-  ```
-
-  Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
-
-  Basic development tools (git, go, make, editors) should not be listed.
-- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
-- Contributions must remain compatible with LocalAI's **MIT License**.
-
 ## Testing

 All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
+IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
+LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/turboquant/Makefile
+++ b/backend/cpp/turboquant/Makefile
@@ -1,7 +1,7 @@

 # Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
+TURBOQUANT_VERSION?=45f8a066ed5f5bb38c695cec532f6cef9f4efa9d
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant

 CMAKE_ARGS?=
--- a/backend/cpp/turboquant/patch-grpc-server.sh
+++ b/backend/cpp/turboquant/patch-grpc-server.sh
@@ -1,22 +1,13 @@
 #!/bin/bash
-# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
-# turboquant build to account for two gaps between upstream and the fork:
+# Augment the shared backend/cpp/llama-cpp/grpc-server.cpp allow-list of KV-cache
+# types so the gRPC `LoadModel` call accepts the TurboQuant-specific
+# `turbo2` / `turbo3` / `turbo4` cache types.
 #
-#   1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
-#      fork-specific `turbo2` / `turbo3` / `turbo4` cache types.
-#   2. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
-#      server-side random per-instance marker) with the legacy "<__media__>"
-#      literal. The fork branched before that PR, so server-common.cpp has no
-#      get_media_marker symbol. The fork's mtmd_default_marker() still returns
-#      "<__media__>", and Go-side tooling falls back to that sentinel when the
-#      backend does not expose media_marker, so substituting the literal keeps
-#      behavior identical on the turboquant path.
+# We do this on the *copy* sitting in turboquant-<flavor>-build/, never on the
+# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
+# compiling against vanilla upstream which does not know about GGML_TYPE_TURBO*.
 #
-# We patch the *copy* sitting in turboquant-<flavor>-build/, never the original
-# under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps compiling
-# against vanilla upstream.
-#
-# Idempotent: skips each insertion if its marker is already present (so re-runs
+# Idempotent: skips the insertion if the marker is already present (so re-runs
 # of the same build dir don't double-insert).

 set -euo pipefail
@@ -34,47 +25,33 @@ if [[ ! -f "$SRC" ]]; then
 fi

 if grep -q 'GGML_TYPE_TURBO2_0' "$SRC"; then
-    echo "==> $SRC already has TurboQuant cache types, skipping KV allow-list patch"
-else
-    echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
-
-    # Insert the three TURBO entries right after the first `    GGML_TYPE_Q5_1,`
-    # line (the kv_cache_types[] allow-list). Using awk because the builder image
-    # does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
-    awk '
-        /^    GGML_TYPE_Q5_1,$/ && !done {
-            print
-            print "    // turboquant fork extras — added by patch-grpc-server.sh"
-            print "    GGML_TYPE_TURBO2_0,"
-            print "    GGML_TYPE_TURBO3_0,"
-            print "    GGML_TYPE_TURBO4_0,"
-            done = 1
-            next
-        }
-        { print }
-        END {
-            if (!done) {
-                print "patch-grpc-server.sh: anchor `    GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
-                exit 1
-            }
-        }
-    ' "$SRC" > "$SRC.tmp"
-    mv "$SRC.tmp" "$SRC"
-
-    echo "==> KV allow-list patch OK"
+    echo "==> $SRC already has TurboQuant cache types, skipping"
+    exit 0
 fi

-if grep -q 'get_media_marker()' "$SRC"; then
-    echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
-    # Only one call site today (ModelMetadata), but replace all occurrences to
-    # stay robust if upstream adds more. Use a temp file to avoid relying on
-    # sed -i portability (the builder image uses GNU sed, but keeping this
-    # consistent with the awk block above).
-    sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
-    mv "$SRC.tmp" "$SRC"
-    echo "==> get_media_marker() substitution OK"
-else
-    echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
-fi
+echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"

-echo "==> all patches applied"
+# Insert the three TURBO entries right after the first `    GGML_TYPE_Q5_1,`
+# line (the kv_cache_types[] allow-list). Using awk because the builder image
+# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
+awk '
+    /^    GGML_TYPE_Q5_1,$/ && !done {
+        print
+        print "    // turboquant fork extras — added by patch-grpc-server.sh"
+        print "    GGML_TYPE_TURBO2_0,"
+        print "    GGML_TYPE_TURBO3_0,"
+        print "    GGML_TYPE_TURBO4_0,"
+        done = 1
+        next
+    }
+    { print }
+    END {
+        if (!done) {
+            print "patch-grpc-server.sh: anchor `    GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
+            exit 1
+        }
+    }
+' "$SRC" > "$SRC.tmp"
+mv "$SRC.tmp" "$SRC"
+
+echo "==> patched OK"
--- a/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
+++ b/backend/cpp/turboquant/patches/0001-ggml-hip-add-f16-turbo-vec-instances.patch
@@ -1,47 +0,0 @@
-From: LocalAI turboquant backend maintainers <noreply@localai.io>
-Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances
-
-Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
-in fattn.cu") added three new template instance files under ggml-cuda/:
-
-  - fattn-vec-instance-f16-turbo2_0.cu
-  - fattn-vec-instance-f16-turbo3_0.cu
-  - fattn-vec-instance-f16-turbo4_0.cu
-
-and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
-dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
-were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
-build path via hipify.
-
-However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
-template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
-is the default) and was never updated for the new F16-K + TURBO-V combos.
-The HIP build therefore compiles the dispatch cases (which reference
-ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
-the matching template instantiations, causing a link-time failure in the
-gpu-rocm-hipblas-turboquant CI job.
-
-Add the three new template instance files to ggml-hip's list so the HIP
-build links cleanly. Drop this patch once the fork picks up the
-corresponding upstream sync in ggml-hip/CMakeLists.txt.
-
--- a/ggml/src/ggml-hip/CMakeLists.txt
-+++ b/ggml/src/ggml-hip/CMakeLists.txt
-@@ -85,14 +85,17 @@ else()
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
-+        ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
-+        ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
-+        ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
-         ../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu
--- a/backend/cpp/turboquant/patches/0001-server-respect-the-ignore-eos-flag.patch
+++ b/backend/cpp/turboquant/patches/0001-server-respect-the-ignore-eos-flag.patch
@@ -0,0 +1,83 @@
+From 660600081fb7b9b769ded5c805a2d39a419f0a0d Mon Sep 17 00:00:00 2001
+From: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
+Date: Wed, 8 Apr 2026 11:12:15 -0400
+Subject: [PATCH] server: respect the ignore eos flag (#21203)
+
+---
+ tools/server/server-context.cpp | 3 +++
+ tools/server/server-context.h   | 3 +++
+ tools/server/server-task.cpp    | 3 ++-
+ tools/server/server-task.h      | 1 +
+ 4 files changed, 9 insertions(+), 1 deletion(-)
+
+diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
+index 9d3ac538..b31981c5 100644
+--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
+@@ -3033,6 +3033,8 @@ server_context_meta server_context::get_meta() const {
+         /* fim_rep_token          */ llama_vocab_fim_rep(impl->vocab),
+         /* fim_sep_token          */ llama_vocab_fim_sep(impl->vocab),
+ 
++        /* logit_bias_eog         */ impl->params_base.sampling.logit_bias_eog,
++
+         /* model_vocab_type       */ llama_vocab_type(impl->vocab),
+         /* model_vocab_n_tokens   */ llama_vocab_n_tokens(impl->vocab),
+         /* model_n_ctx_train      */ llama_model_n_ctx_train(impl->model),
+@@ -3117,6 +3119,7 @@ std::unique_ptr<server_res_generator> server_routes::handle_completions_impl(
+                     ctx_server.vocab,
+                     params,
+                     meta->slot_n_ctx,
++                    meta->logit_bias_eog,
+                     data);
+             task.id_slot = json_value(data, "id_slot", -1);
+ 
+diff --git a/tools/server/server-context.h b/tools/server/server-context.h
+index d7ce8735..6ea9afc0 100644
+--- a/tools/server/server-context.h
+++ b/tools/server/server-context.h
+@@ -39,6 +39,9 @@ struct server_context_meta {
+     llama_token fim_rep_token;
+     llama_token fim_sep_token;
+ 
++    // sampling
++    std::vector<llama_logit_bias> logit_bias_eog;
++
+     // model meta
+     enum llama_vocab_type model_vocab_type;
+     int32_t model_vocab_n_tokens;
+diff --git a/tools/server/server-task.cpp b/tools/server/server-task.cpp
+index 4cc87bc5..856b3f0e 100644
+--- a/tools/server/server-task.cpp
+++ b/tools/server/server-task.cpp
+@@ -239,6 +239,7 @@ task_params server_task::params_from_json_cmpl(
+         const llama_vocab * vocab,
+         const common_params & params_base,
+         const int n_ctx_slot,
++        const std::vector<llama_logit_bias> & logit_bias_eog,
+         const json & data) {
+     task_params params;
+ 
+@@ -562,7 +563,7 @@ task_params server_task::params_from_json_cmpl(
+         if (params.sampling.ignore_eos) {
+             params.sampling.logit_bias.insert(
+                     params.sampling.logit_bias.end(),
+-                    defaults.sampling.logit_bias_eog.begin(), defaults.sampling.logit_bias_eog.end());
++                    logit_bias_eog.begin(), logit_bias_eog.end());
+         }
+     }
+ 
+diff --git a/tools/server/server-task.h b/tools/server/server-task.h
+index d855bf08..243e47a8 100644
+--- a/tools/server/server-task.h
+++ b/tools/server/server-task.h
+@@ -209,6 +209,7 @@ struct server_task {
+         const llama_vocab * vocab,
+         const common_params & params_base,
+         const int n_ctx_slot,
++        const std::vector<llama_logit_bias> & logit_bias_eog,
+         const json & data);
+ 
+     // utility function
+-- 
+2.43.0
+
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78
+STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/stablediffusion-ggml/gosd.cpp
+++ b/backend/go/stablediffusion-ggml/gosd.cpp
@@ -1106,11 +1106,6 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
            const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
            const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
            const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
-            // Force MP4 container. Distributed LocalAI hands us a staging
-            // path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
-            // extension; relying on filename suffix makes ffmpeg bail with
-            // "Unable to choose an output format".
-            const_cast<char*>("-f"), const_cast<char*>("mp4"),
            const_cast<char*>(dst),
            nullptr
        };
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
+WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -587,6 +587,7 @@
  alias: "whisperx"
  capabilities:
    nvidia: "cuda12-whisperx"
+    amd: "rocm-whisperx"
    metal: "metal-whisperx"
    default: "cpu-whisperx"
    nvidia-cuda-13: "cuda13-whisperx"
@@ -1007,20 +1008,6 @@
    nvidia-cuda-12: "cuda12-turboquant-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-turboquant-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-turboquant-development"
-- !!merge <<: *stablediffusionggml
-  name: "stablediffusion-ggml-development"
-  capabilities:
-    default: "cpu-stablediffusion-ggml-development"
-    nvidia: "cuda12-stablediffusion-ggml-development"
-    intel: "intel-sycl-f16-stablediffusion-ggml-development"
-    # amd: "rocm-stablediffusion-ggml-development"
-    vulkan: "vulkan-stablediffusion-ggml-development"
-    nvidia-l4t: "nvidia-l4t-arm64-stablediffusion-ggml-development"
-    metal: "metal-stablediffusion-ggml-development"
-    nvidia-cuda-13: "cuda13-stablediffusion-ggml-development"
-    nvidia-cuda-12: "cuda12-stablediffusion-ggml-development"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-stablediffusion-ggml-development"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-stablediffusion-ggml-development"
 - !!merge <<: *neutts
  name: "cpu-neutts"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-neutts"
@@ -2744,6 +2731,7 @@
  name: "whisperx-development"
  capabilities:
    nvidia: "cuda12-whisperx-development"
+    amd: "rocm-whisperx-development"
    metal: "metal-whisperx-development"
    default: "cpu-whisperx-development"
    nvidia-cuda-13: "cuda13-whisperx-development"
@@ -2769,6 +2757,16 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
+- !!merge <<: *whisperx
+  name: "rocm-whisperx"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
+  mirrors:
+    - localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
+- !!merge <<: *whisperx
+  name: "rocm-whisperx-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
+  mirrors:
+    - localai/localai-backends:master-gpu-rocm-hipblas-whisperx
 - !!merge <<: *whisperx
  name: "cuda13-whisperx"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
--- a/backend/python/whisperx/requirements-hipblas.txt
+++ b/backend/python/whisperx/requirements-hipblas.txt
@@ -0,0 +1,6 @@
+# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
+# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
+# the rocm7.2.1 runtime via AMD's forward-compatibility window.
+--extra-index-url https://download.pytorch.org/whl/rocm6.4
+torch==2.8.0+rocm6.4
+whisperx @ git+https://github.com/m-bain/whisperX.git
--- a/backend/rust/kokoros/src/service.rs
+++ b/backend/rust/kokoros/src/service.rs
@@ -341,16 +341,6 @@ impl Backend for KokorosService {
        Err(Status::unimplemented("Not supported"))
    }

-    type AudioTranscriptionStreamStream =
-        ReceiverStream<Result<backend::TranscriptStreamResponse, Status>>;
-
-    async fn audio_transcription_stream(
-        &self,
-        _: Request<backend::TranscriptRequest>,
-    ) -> Result<Response<Self::AudioTranscriptionStreamStream>, Status> {
-        Err(Status::unimplemented("Not supported"))
-    }
-
    async fn sound_generation(
        &self,
        _: Request<backend::SoundGenerationRequest>,
--- a/core/backend/llm.go
+++ b/core/backend/llm.go
@@ -40,12 +40,6 @@ type TokenUsage struct {
 	ChatDeltas             []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
 }

-func needsThinkingProbe(c *config.ModelConfig) bool {
-	return c.TemplateConfig.UseTokenizerTemplate &&
-		(c.ReasoningConfig.DisableReasoning == nil ||
-			c.ReasoningConfig.DisableReasoningTagPrefill == nil)
-}
-
 // HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
 // Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
 func (t TokenUsage) HasChatDeltaContent() bool {
@@ -106,9 +100,11 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
 	// tokenizer template path is active) and the multimodal media marker (needed
 	// by custom chat templates so markers line up with what mtmd expects).
 	// We probe whenever any of those slots is still empty.
-	shouldProbeThinking := needsThinkingProbe(c)
+	needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
+		c.ReasoningConfig.DisableReasoning == nil &&
+		c.ReasoningConfig.DisableReasoningTagPrefill == nil
 	needsMarkerProbe := c.MediaMarker == ""
-	if shouldProbeThinking || needsMarkerProbe {
+	if needsThinkingProbe || needsMarkerProbe {
 		modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
 		config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
 		// Update the config in the loader so it persists for future requests
--- a/core/backend/llm_probe_test.go
+++ b/core/backend/llm_probe_test.go
@@ -1,29 +0,0 @@
-package backend
-
-import (
-	"github.com/mudler/LocalAI/core/config"
-
-	"github.com/gpustack/gguf-parser-go/util/ptr"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-var _ = Describe("thinking probe gating", func() {
-	It("probes tokenizer-template models when any reasoning default is still unset", func() {
-		cfg := &config.ModelConfig{
-			TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
-		}
-		Expect(needsThinkingProbe(cfg)).To(BeTrue())
-
-		cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
-		Expect(needsThinkingProbe(cfg)).To(BeTrue())
-
-		cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
-		Expect(needsThinkingProbe(cfg)).To(BeFalse())
-	})
-
-	It("does not probe when tokenizer templates are disabled", func() {
-		cfg := &config.ModelConfig{}
-		Expect(needsThinkingProbe(cfg)).To(BeFalse())
-	})
-})
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {

 	app, err := application.New(opts...)
 	if err != nil {
-		return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n  1. Check that your models directory exists and is accessible: %s\n  2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n  3. Check available disk space and file permissions\n  4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
+		return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
 	}

 	appHTTP, err := http.API(app)
--- a/core/cli/transcript.go
+++ b/core/cli/transcript.go
@@ -3,6 +3,7 @@ package cli
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"strings"

@@ -59,7 +60,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {

 	c, exists := cl.GetModelConfig(t.Model)
 	if !exists {
-		return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
+		return errors.New("model not found")
 	}

 	c.Threads = &t.Threads
--- a/core/cli/util.go
+++ b/core/cli/util.go
@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {

 func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
 	if len(u.Args) == 0 {
-		return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
+		return fmt.Errorf("no GGUF file provided")
 	}
 	// We try to guess only if we don't have a template defined already
 	f, err := gguf.ParseGGUFFile(u.Args[0])
--- a/core/cli/worker.go
+++ b/core/cli/worker.go
@@ -21,7 +21,6 @@ import (
 	"github.com/mudler/LocalAI/core/cli/workerregistry"
 	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/core/gallery"
-	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/messaging"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/storage"
@@ -598,20 +597,12 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
 	// Try to find the backend binary
 	backendPath := s.findBackend(req.Backend)
 	if backendPath == "" {
-		if req.URI != "" {
-			xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
-			if err := galleryop.InstallExternalBackend(
-				context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
-			); err != nil {
-				return "", fmt.Errorf("installing backend from gallery: %w", err)
-			}
-		} else {
-			xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
-			if err := gallery.InstallBackendFromGallery(
-				context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
-			); err != nil {
-				return "", fmt.Errorf("installing backend from gallery: %w", err)
-			}
+		// Backend not found locally — try auto-installing from gallery
+		xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
+		if err := gallery.InstallBackendFromGallery(
+			context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
+		); err != nil {
+			return "", fmt.Errorf("installing backend from gallery: %w", err)
 		}
 		// Re-register after install and retry
 		gallery.RegisterBackends(s.systemState, s.ml)
--- a/core/cli/worker/worker_p2p.go
+++ b/core/cli/worker/worker_p2p.go
@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
 	// Check if the token is set
 	// as we always need it.
 	if r.Token == "" {
-		return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
+		return fmt.Errorf("Token is required")
 	}

 	port, err := freeport.GetFreePort()
--- a/core/config/gguf.go
+++ b/core/config/gguf.go
@@ -125,7 +125,19 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 			return
 		}

-		applyDetectedThinkingConfig(cfg, metadata)
+		cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
+
+		// Use the rendered template to detect if thinking token is at the end
+		// This reuses the existing DetectThinkingStartToken function
+		if metadata.RenderedTemplate != "" {
+			thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
+			thinkingForcedOpen := thinkingStartToken != ""
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
+		} else {
+			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
+			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
+		}

 		// Extract tool format markers from autoparser analysis
 		if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -168,34 +180,3 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 		}
 	}
 }
-
-func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
-	if cfg == nil || metadata == nil {
-		return
-	}
-
-	// Respect explicit YAML/user config. Backend probing should only fill defaults
-	// when the reasoning mode has not already been set.
-	if cfg.ReasoningConfig.DisableReasoning == nil {
-		cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
-	}
-
-	// Respect explicit prefill config for the same reason. Only infer the
-	// default prefill behavior when the user did not set it.
-	if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
-		// Use the rendered template to detect if thinking token is at the end.
-		// This reuses the existing DetectThinkingStartToken function.
-		if metadata.RenderedTemplate != "" {
-			thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
-			thinkingForcedOpen := thinkingStartToken != ""
-			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
-			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
-		} else {
-			cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
-			xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
-		}
-		return
-	}
-
-	xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
-}
--- a/core/config/gguf_reasoning_test.go
+++ b/core/config/gguf_reasoning_test.go
@@ -1,101 +0,0 @@
-package config
-
-import (
-	pb "github.com/mudler/LocalAI/pkg/grpc/proto"
-	"github.com/mudler/LocalAI/pkg/reasoning"
-
-	"github.com/gpustack/gguf-parser-go/util/ptr"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-var _ = Describe("GGUF backend metadata reasoning defaults", func() {
-	It("fills reasoning defaults when unset", func() {
-		cfg := &ModelConfig{
-			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
-		}
-
-		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
-			SupportsThinking: true,
-			RenderedTemplate: "{{ bos_token }}<think>",
-		})
-
-		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
-		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
-	})
-
-	It("preserves fully explicit reasoning settings", func() {
-		cfg := &ModelConfig{
-			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
-			ReasoningConfig: reasoning.Config{
-				DisableReasoning:           ptr.To(true),
-				DisableReasoningTagPrefill: ptr.To(true),
-			},
-		}
-
-		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
-			SupportsThinking: true,
-			RenderedTemplate: "{{ bos_token }}<think>",
-		})
-
-		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
-		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
-	})
-
-	It("preserves explicit disable while still inferring missing prefill", func() {
-		cfg := &ModelConfig{
-			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
-			ReasoningConfig: reasoning.Config{
-				DisableReasoning: ptr.To(true),
-			},
-		}
-
-		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
-			SupportsThinking: true,
-			RenderedTemplate: "{{ bos_token }}<think>",
-		})
-
-		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
-		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
-	})
-
-	It("preserves explicit prefill while still inferring missing disable flag", func() {
-		cfg := &ModelConfig{
-			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
-			ReasoningConfig: reasoning.Config{
-				DisableReasoningTagPrefill: ptr.To(true),
-			},
-		}
-
-		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
-			SupportsThinking: true,
-			RenderedTemplate: "{{ bos_token }}<think>",
-		})
-
-		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
-		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
-	})
-
-	It("defaults to disabling reasoning when backend does not support thinking", func() {
-		cfg := &ModelConfig{
-			TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
-		}
-
-		applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
-			SupportsThinking: false,
-		})
-
-		Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
-		Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
-		Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
-	})
-})
--- a/core/config/model_config_loader.go
+++ b/core/config/model_config_loader.go
@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
 		bcl.configs[c.Name] = *c
 	} else {
 		if err != nil {
-			return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
+			return fmt.Errorf("config is not valid: %w", err)
 		}
-		return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
+		return fmt.Errorf("config is not valid")
 	}

 	return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
 		files = append(files, info)
 	}
 	for _, file := range files {
-		// Only load real YAML config files and ignore dotfiles or backup variants
-		ext := strings.ToLower(filepath.Ext(file.Name()))
-		if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
+		// Skip templates, YAML and .keep files
+		if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
+			strings.HasPrefix(file.Name(), ".") {
 			continue
 		}

--- a/core/config/model_test.go
+++ b/core/config/model_test.go
@@ -2,7 +2,6 @@ package config

 import (
 	"os"
-	"path/filepath"

 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
@@ -110,50 +109,5 @@ options:
 			Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))

 		})
-
-		It("Only loads files ending with yaml or yml", func() {
-			tmpdir, err := os.MkdirTemp("", "model-config-loader")
-			Expect(err).ToNot(HaveOccurred())
-			defer os.RemoveAll(tmpdir)
-
-			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
-				`name: "foo-model"
-description: "formal config"
-backend: "llama-cpp"
-parameters:
-  model: "foo.gguf"
-`), 0644)
-			Expect(err).ToNot(HaveOccurred())
-
-			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
-				`name: "foo-model"
-description: "backup config"
-backend: "llama-cpp"
-parameters:
-  model: "foo-backup.gguf"
-`), 0644)
-			Expect(err).ToNot(HaveOccurred())
-
-			err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
-				`name: "foo-backup-only"
-description: "timestamped backup config"
-backend: "llama-cpp"
-parameters:
-  model: "foo-timestamped.gguf"
-`), 0644)
-			Expect(err).ToNot(HaveOccurred())
-
-			bcl := NewModelConfigLoader(tmpdir)
-			err = bcl.LoadModelConfigsFromPath(tmpdir)
-			Expect(err).ToNot(HaveOccurred())
-
-			configs := bcl.GetAllModelsConfigs()
-			Expect(configs).To(HaveLen(1))
-			Expect(configs[0].Name).To(Equal("foo-model"))
-			Expect(configs[0].Description).To(Equal("formal config"))
-
-			_, exists := bcl.GetModelConfig("foo-backup-only")
-			Expect(exists).To(BeFalse())
-		})
 	})
 })
--- a/core/gallery/backends.go
+++ b/core/gallery/backends.go
@@ -110,13 +110,7 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
 		if err != nil {
 			return err
 		}
-		// Only short-circuit if the install is *actually usable*. An orphaned
-		// meta entry whose concrete was removed still shows up in
-		// ListSystemBackends with a RunFile pointing at a path that no longer
-		// exists; returning early there leaves the caller with a broken
-		// alias and the worker fails with "backend not found after install
-		// attempt" on every retry. Re-install in that case.
-		if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
+		if backends.Exists(name) {
 			return nil
 		}
 	}
@@ -381,44 +375,17 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
 	}

 	if metadata != nil && metadata.MetaBackendFor != "" {
-		concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
-		xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
-		// If the concrete the meta points to is already gone (earlier delete,
-		// partial install, or manual cleanup), keep going and remove the
-		// orphaned meta dir. Previously we returned an error here, which made
-		// the orphaned meta impossible to uninstall from the UI — the delete
-		// kept failing and every subsequent install short-circuited because
-		// the stale meta metadata made ListSystemBackends.Exists(name) true.
-		if _, statErr := os.Stat(concreteDirectory); statErr == nil {
-			os.RemoveAll(concreteDirectory)
-		} else if os.IsNotExist(statErr) {
-			xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
-				"meta", name, "concrete", metadata.MetaBackendFor)
-		} else {
-			return statErr
+		metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
+		xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
+		if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
+			return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
 		}
+		os.RemoveAll(metaBackendDirectory)
 	}

 	return os.RemoveAll(backendDirectory)
 }

-// isBackendRunnable reports whether the given backend entry can actually be
-// invoked. A meta backend is runnable only if its concrete's run.sh still
-// exists on disk; concrete backends are considered runnable as long as their
-// RunFile is set (ListSystemBackends only emits them when the runfile is
-// present). Used to guard the "already installed" short-circuit so an
-// orphaned meta pointing at a missing concrete triggers a real reinstall
-// rather than being silently skipped.
-func isBackendRunnable(b SystemBackend) bool {
-	if b.RunFile == "" {
-		return false
-	}
-	if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
-		return false
-	}
-	return true
-}
-
 type SystemBackend struct {
 	Name             string
 	RunFile          string
--- a/core/gallery/backends_test.go
+++ b/core/gallery/backends_test.go
@@ -952,58 +952,6 @@ var _ = Describe("Gallery Backends", func() {
 			err = DeleteBackendFromSystem(systemState, "non-existent")
 			Expect(err).To(HaveOccurred())
 		})
-
-		It("removes an orphaned meta backend whose concrete is missing", func() {
-			// Real scenario from the dev cluster: the concrete got wiped
-			// (partial install, manual cleanup, previous crash) but the meta
-			// directory + metadata.json still points at it. The old code
-			// errored with "meta backend X not found" and left the orphan in
-			// place, making the backend impossible to uninstall.
-			metaName := "meta-backend"
-			concreteName := "concrete-backend-that-vanished"
-			metaPath := filepath.Join(tempDir, metaName)
-			Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
-
-			meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
-			data, err := json.MarshalIndent(meta, "", "  ")
-			Expect(err).NotTo(HaveOccurred())
-			Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
-
-			// Concrete directory intentionally absent.
-			systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
-			Expect(err).NotTo(HaveOccurred())
-
-			Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
-			Expect(metaPath).NotTo(BeADirectory())
-		})
-	})
-
-	Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
-		It("re-runs install when the meta's concrete is missing", func() {
-			// Seed state: meta dir exists with metadata pointing at a
-			// concrete that was removed from disk. ListSystemBackends still
-			// surfaces the meta via its metadata.Name → the old short-circuit
-			// at `if backends.Exists(name) { return nil }` returned silently,
-			// leaving the worker's findBackend() with a dead alias forever.
-			// The fix: require the backend to be runnable before we skip.
-			metaName := "meta-orphan"
-			concreteName := "concrete-gone"
-			metaPath := filepath.Join(tempDir, metaName)
-			Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
-			meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
-			data, err := json.MarshalIndent(meta, "", "  ")
-			Expect(err).NotTo(HaveOccurred())
-			Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
-
-			systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
-			Expect(err).NotTo(HaveOccurred())
-
-			listed, err := ListSystemBackends(systemState)
-			Expect(err).NotTo(HaveOccurred())
-			b, ok := listed.Get(metaName)
-			Expect(ok).To(BeTrue())
-			Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
-		})
 	})

 	Describe("ListSystemBackends", func() {
--- a/core/http/endpoints/localai/backend_monitor.go
+++ b/core/http/endpoints/localai/backend_monitor.go
@@ -9,26 +9,19 @@ import (
 // BackendMonitorEndpoint returns the status of the specified backend
 // @Summary Backend monitor endpoint
 // @Tags monitoring
-// @Param model query string true "Name of the model to monitor"
+// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
 // @Success 200 {object} proto.StatusResponse "Response"
 // @Router /backend/monitor [get]
 func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
 	return func(c echo.Context) error {
-		model := c.QueryParam("model")
-		// Fall back to binding the request body so pre-existing clients that
-		// sent `{"model": "..."}` with GET keep working.
-		if model == "" {
-			input := new(schema.BackendMonitorRequest)
-			if err := c.Bind(input); err != nil {
-				return err
-			}
-			model = input.Model
-		}
-		if model == "" {
-			return echo.NewHTTPError(400, "model query parameter is required")
+
+		input := new(schema.BackendMonitorRequest)
+		// Get input data from the request body
+		if err := c.Bind(input); err != nil {
+			return err
 		}

-		resp, err := bm.CheckAndSample(model)
+		resp, err := bm.CheckAndSample(input.Model)
 		if err != nil {
 			return err
 		}
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
 		if err := c.Bind(&req); err != nil || req.Backend == "" {
 			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
 		}
-		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
+		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
 		if err != nil {
 			xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
 			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
--- a/core/http/endpoints/localai/settings.go
+++ b/core/http/endpoints/localai/settings.go
@@ -110,27 +110,6 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			})
 		}

-		// The UI reads ApiKeys from GET /api/settings, which already returns the
-		// merged env+runtime list. When the user clicks Save, the same merged
-		// list comes back in the POST body. Strip the env-supplied keys from
-		// the incoming list before we persist or re-merge, otherwise each save
-		// duplicates the env keys on top of the previous merge (#9071).
-		if settings.ApiKeys != nil {
-			envKeys := startupConfig.ApiKeys
-			envSet := make(map[string]struct{}, len(envKeys))
-			for _, k := range envKeys {
-				envSet[k] = struct{}{}
-			}
-			runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
-			for _, k := range *settings.ApiKeys {
-				if _, fromEnv := envSet[k]; fromEnv {
-					continue
-				}
-				runtimeOnly = append(runtimeOnly, k)
-			}
-			settings.ApiKeys = &runtimeOnly
-		}
-
 		settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
 		settingsJSON, err := json.MarshalIndent(settings, "", "  ")
 		if err != nil {
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -147,7 +147,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 		result := ""
 		lastEmittedCount := 0
 		sentInitialRole := false
-		sentReasoning := false
 		hasChatDeltaToolCalls := false
 		hasChatDeltaContent := false

@@ -191,7 +190,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 					}},
 					Object: "chat.completion.chunk",
 				}
-				sentReasoning = true
 			}

 			// Stream content deltas (cleaned of reasoning tags) while no tool calls
@@ -365,12 +363,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 			functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
 		}
 		xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
-		// noAction is a sentinel "just answer" pseudo-function — not a real
-		// tool call. Scan the whole slice rather than only index 0 so we
-		// don't drop a real tool call that happens to follow a noAction
-		// entry, and so the default branch isn't entered with only noAction
-		// entries to emit as tool_calls.
-		noActionToRun := !hasRealCall(functionResults, noAction)
+		noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0

 		switch {
 		case noActionToRun:
@@ -384,31 +377,108 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 				usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
 			}

-			var result string
-			if !sentInitialRole {
-				var hqErr error
-				result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
-				if hqErr != nil {
-					xlog.Error("error handling question", "error", hqErr)
-					return hqErr
+			if sentInitialRole {
+				// Content was already streamed during the callback — just emit usage.
+				delta := &schema.Message{}
+				if reasoning != "" && extractor.Reasoning() == "" {
+					delta.Reasoning = &reasoning
+				}
+				responses <- schema.OpenAIResponse{
+					ID: id, Created: created, Model: req.Model,
+					Choices: []schema.Choice{{Delta: delta, Index: 0}},
+					Object:  "chat.completion.chunk",
+					Usage:   usage,
+				}
+			} else {
+				// Content was NOT streamed — send everything at once (fallback).
+				responses <- schema.OpenAIResponse{
+					ID: id, Created: created, Model: req.Model,
+					Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
+					Object:  "chat.completion.chunk",
+				}
+
+				result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
+				if err != nil {
+					xlog.Error("error handling question", "error", err)
+					return err
+				}
+
+				delta := &schema.Message{Content: &result}
+				if reasoning != "" {
+					delta.Reasoning = &reasoning
+				}
+				responses <- schema.OpenAIResponse{
+					ID: id, Created: created, Model: req.Model,
+					Choices: []schema.Choice{{Delta: delta, Index: 0}},
+					Object:  "chat.completion.chunk",
+					Usage:   usage,
 				}
-			}
-			for _, chunk := range buildNoActionFinalChunks(
-				id, req.Model, created,
-				sentInitialRole, sentReasoning,
-				result, reasoning, usage,
-			) {
-				responses <- chunk
 			}

 		default:
-			for _, chunk := range buildDeferredToolCallChunks(
-				id, req.Model, created,
-				functionResults, lastEmittedCount,
-				sentInitialRole, *textContentToReturn,
-				sentReasoning, reasoning,
-			) {
-				responses <- chunk
+			for i, ss := range functionResults {
+				name, args := ss.Name, ss.Arguments
+				toolCallID := ss.ID
+				if toolCallID == "" {
+					toolCallID = id
+				}
+
+				if i < lastEmittedCount {
+					// Already emitted during streaming by the incremental
+					// JSON/XML parser — skip to avoid duplicate tool calls.
+					continue
+				}
+
+				// Tool call not yet emitted — send name + args (two chunks).
+				initialMessage := schema.OpenAIResponse{
+					ID:      id,
+					Created: created,
+					Model:   req.Model,
+					Choices: []schema.Choice{{
+						Delta: &schema.Message{
+							Role: "assistant",
+							ToolCalls: []schema.ToolCall{
+								{
+									Index: i,
+									ID:    toolCallID,
+									Type:  "function",
+									FunctionCall: schema.FunctionCall{
+										Name: name,
+									},
+								},
+							},
+						},
+						Index:        0,
+						FinishReason: nil,
+					}},
+					Object: "chat.completion.chunk",
+				}
+				responses <- initialMessage
+
+				responses <- schema.OpenAIResponse{
+					ID:      id,
+					Created: created,
+					Model:   req.Model,
+					Choices: []schema.Choice{{
+						Delta: &schema.Message{
+							Role:    "assistant",
+							Content: textContentToReturn,
+							ToolCalls: []schema.ToolCall{
+								{
+									Index: i,
+									ID:    toolCallID,
+									Type:  "function",
+									FunctionCall: schema.FunctionCall{
+										Arguments: args,
+									},
+								},
+							},
+						},
+						Index:        0,
+						FinishReason: nil,
+					}},
+					Object: "chat.completion.chunk",
+				}
 			}
 		}

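In the `+` loop of this hunk, an empty `ss.ID` falls back to the request `id`. As a result, several ID-less calls in one response all share a single `tool_call_id`. The helper deleted in `chat_emit.go` used an index suffix (`fmt.Sprintf("%s-%d", id, i)`) instead, and its comment notes that clients match tool results back by `tool_call_id`. The sketch below contrasts the two fallbacks. The function names `fallbackShared`, `fallbackPerIndex` and `distinct` are hypothetical, and they operate on plain ID strings rather than the real `functions.FuncCallResults`.

```go
package main

import "fmt"

// fallbackShared mirrors the `+` side: every empty ID becomes the request id.
func fallbackShared(reqID string, ids []string) []string {
	out := make([]string, len(ids))
	for i, id := range ids {
		if id == "" {
			id = reqID
		}
		out[i] = id
	}
	return out
}

// fallbackPerIndex mirrors the removed helper: an empty ID becomes
// "<reqID>-<index>", so each call keeps its own identifier.
func fallbackPerIndex(reqID string, ids []string) []string {
	out := make([]string, len(ids))
	for i, id := range ids {
		if id == "" {
			id = fmt.Sprintf("%s-%d", reqID, i)
		}
		out[i] = id
	}
	return out
}

// distinct counts how many different IDs appear in ids.
func distinct(ids []string) int {
	seen := map[string]struct{}{}
	for _, id := range ids {
		seen[id] = struct{}{}
	}
	return len(seen)
}

func main() {
	in := []string{"", "", "abc123"}
	fmt.Println(fallbackShared("req", in))   // [req req abc123]
	fmt.Println(fallbackPerIndex("req", in)) // [req-0 req-1 abc123]
}
```

The input has three calls, and only the last one carries an explicit ID. The shared fallback produces two distinct IDs for those three calls, while the per-index variant keeps all three distinct and leaves the explicit ID untouched.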
--- a/core/http/endpoints/openai/chat_emit.go
+++ b/core/http/endpoints/openai/chat_emit.go
@@ -1,233 +0,0 @@
-package openai
-
-import (
-	"fmt"
-
-	"github.com/mudler/LocalAI/core/schema"
-	"github.com/mudler/LocalAI/pkg/functions"
-)
-
-// hasRealCall reports whether functionResults contains at least one
-// entry whose Name is something other than the noAction sentinel.
-// Used by processTools to decide between the "answer the question"
-// path and the real tool-call flush.
-func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
-	for _, fc := range functionResults {
-		if fc.Name != noAction {
-			return true
-		}
-	}
-	return false
-}
-
-// buildNoActionFinalChunks produces the closing SSE chunks for the
-// noActionToRun branch of processTools (i.e. the model chose the "answer"
-// pseudo-function or emitted no tool calls at all).
-//
-// When content was already streamed (contentAlreadyStreamed=true) the
-// helper emits a single trailing usage chunk, optionally carrying
-// reasoning that was produced but not streamed incrementally. When
-// content was not streamed it emits a role chunk followed by a
-// content+reasoning+usage chunk — the "send everything at once" fallback.
-//
-// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
-// probing the extractor's Go-side state: the C++ autoparser delivers
-// reasoning through ProcessChatDeltaReasoning which populates a
-// separate accumulator that extractor.Reasoning() does not expose.
-// Without this guard the callback would stream reasoning incrementally
-// and the final chunk would duplicate it.
-func buildNoActionFinalChunks(
-	id, model string,
-	created int,
-	contentAlreadyStreamed bool,
-	reasoningAlreadyStreamed bool,
-	content string,
-	reasoning string,
-	usage schema.OpenAIUsage,
-) []schema.OpenAIResponse {
-	var out []schema.OpenAIResponse
-
-	if contentAlreadyStreamed {
-		delta := &schema.Message{}
-		if reasoning != "" && !reasoningAlreadyStreamed {
-			r := reasoning
-			delta.Reasoning = &r
-		}
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{Delta: delta, Index: 0}},
-			Object:  "chat.completion.chunk",
-			Usage:   usage,
-		})
-		return out
-	}
-
-	// Content was not streamed — send role, then content (+reasoning) + usage.
-	out = append(out, schema.OpenAIResponse{
-		ID: id, Created: created, Model: model,
-		Choices: []schema.Choice{{
-			Delta: &schema.Message{Role: "assistant"},
-			Index: 0,
-		}},
-		Object: "chat.completion.chunk",
-	})
-
-	c := content
-	delta := &schema.Message{Content: &c}
-	if reasoning != "" && !reasoningAlreadyStreamed {
-		r := reasoning
-		delta.Reasoning = &r
-	}
-	out = append(out, schema.OpenAIResponse{
-		ID: id, Created: created, Model: model,
-		Choices: []schema.Choice{{Delta: delta, Index: 0}},
-		Object:  "chat.completion.chunk",
-		Usage:   usage,
-	})
-	return out
-}
-
-// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
-// were discovered only during final parsing (i.e. after the streaming
-// callback finished). The caller forwards every returned chunk to the
-// responses channel.
-//
-// Guarantees:
-//   - tool calls with i < lastEmittedCount are skipped (already streamed)
-//   - each emitted call yields two chunks: name-only, then args-only
-//   - no chunk ever carries both non-empty Content and non-empty ToolCalls
-//   - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
-//   - if !reasoningAlreadyStreamed && reasoningContent != "",
-//     a reasoning chunk is emitted first
-//   - if !contentAlreadyStreamed && textContent != "",
-//     a role chunk followed by a content chunk is emitted (after reasoning)
-//   - chunks order: [reasoning?] [role+content?] (name, args)+
-//   - fallback IDs for empty ss.ID are unique per index so a client can
-//     match tool_result messages back to the right call
-func buildDeferredToolCallChunks(
-	id, model string,
-	created int,
-	functionResults []functions.FuncCallResults,
-	lastEmittedCount int,
-	contentAlreadyStreamed bool,
-	textContent string,
-	reasoningAlreadyStreamed bool,
-	reasoningContent string,
-) []schema.OpenAIResponse {
-	// If every call was already emitted incrementally there's nothing to
-	// flush — and no reason to emit a standalone reasoning/content chunk.
-	hasDeferred := false
-	for i := range functionResults {
-		if i >= lastEmittedCount {
-			hasDeferred = true
-			break
-		}
-	}
-	if !hasDeferred {
-		return nil
-	}
-
-	var out []schema.OpenAIResponse
-
-	// Reasoning first — the callback path at processTools emits reasoning
-	// incrementally in its own chunks, but when the C++ autoparser only
-	// surfaces reasoning as a final aggregate the callback never sees it.
-	// Recover it here (no duplication: contentAlreadyStreamed and
-	// reasoningAlreadyStreamed track what the callback already sent).
-	if !reasoningAlreadyStreamed && reasoningContent != "" {
-		r := reasoningContent
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{
-				Delta: &schema.Message{Reasoning: &r},
-				Index: 0,
-			}},
-			Object: "chat.completion.chunk",
-		})
-	}
-
-	// Then content, when it wasn't streamed via the callback. Emit role
-	// and content in separate deltas — the OpenAI streaming contract
-	// forbids bundling content alongside tool_calls in one delta.
-	if !contentAlreadyStreamed && textContent != "" {
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{
-				Delta: &schema.Message{Role: "assistant"},
-				Index: 0,
-			}},
-			Object: "chat.completion.chunk",
-		})
-		c := textContent
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{
-				Delta: &schema.Message{Content: &c},
-				Index: 0,
-			}},
-			Object: "chat.completion.chunk",
-		})
-	}
-
-	for i, ss := range functionResults {
-		if i < lastEmittedCount {
-			// Already streamed by the incremental JSON/XML parser during
-			// the token callback — skip to avoid a duplicate emission.
-			continue
-		}
-
-		toolCallID := ss.ID
-		if toolCallID == "" {
-			// Unique per-index fallback so multiple empty-ID calls don't
-			// collide on the same request ID (clients match tool results
-			// back by tool_call_id).
-			toolCallID = fmt.Sprintf("%s-%d", id, i)
-		}
-
-		// Name chunk.
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{
-				Delta: &schema.Message{
-					Role: "assistant",
-					ToolCalls: []schema.ToolCall{{
-						Index: i,
-						ID:    toolCallID,
-						Type:  "function",
-						FunctionCall: schema.FunctionCall{
-							Name: ss.Name,
-						},
-					}},
-				},
-				Index:        0,
-				FinishReason: nil,
-			}},
-			Object: "chat.completion.chunk",
-		})
-
-		// Args chunk — no Content here. Either it was streamed through
-		// the token callback earlier, or the role+content pair above
-		// already delivered it.
-		out = append(out, schema.OpenAIResponse{
-			ID: id, Created: created, Model: model,
-			Choices: []schema.Choice{{
-				Delta: &schema.Message{
-					Role: "assistant",
-					ToolCalls: []schema.ToolCall{{
-						Index: i,
-						ID:    toolCallID,
-						Type:  "function",
-						FunctionCall: schema.FunctionCall{
-							Arguments: ss.Arguments,
-						},
-					}},
-				},
-				Index:        0,
-				FinishReason: nil,
-			}},
-			Object: "chat.completion.chunk",
-		})
-	}
-
-	return out
-}
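The deleted `hasRealCall` asks whether *any* entry is something other than the `noAction` sentinel. The regression test in `chat_emit_test.go` describes an older check that inspected only `functionResults[0].Name`. The sketch below assumes that the older check also treated an empty result list as "no action". The `funcCall` type and `noActionOld` are hypothetical stand-ins for `functions.FuncCallResults` and that check.

```go
package main

import "fmt"

// funcCall is a hypothetical stand-in for functions.FuncCallResults.
type funcCall struct {
	Name      string
	Arguments string
}

// noActionOld models the older, first-entry-only check (assumed shape):
// the response counts as "no action" when it is empty or its first
// entry is the sentinel, even if a real call follows.
func noActionOld(calls []funcCall, noAction string) bool {
	return len(calls) == 0 || calls[0].Name == noAction
}

// hasRealCall mirrors the removed helper: true if any entry is not the sentinel.
func hasRealCall(calls []funcCall, noAction string) bool {
	for _, fc := range calls {
		if fc.Name != noAction {
			return true
		}
	}
	return false
}

func main() {
	mixed := []funcCall{
		{Name: "answer", Arguments: `{"message":"hi"}`},
		{Name: "search", Arguments: "{}"},
	}
	fmt.Println(noActionOld(mixed, "answer"))  // true: the search call would be dropped
	fmt.Println(!hasRealCall(mixed, "answer")) // false: the search call is kept
}
```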
--- a/core/http/endpoints/openai/chat_emit_test.go
+++ b/core/http/endpoints/openai/chat_emit_test.go
@@ -1,717 +0,0 @@
-package openai
-
-import (
-	"fmt"
-
-	"github.com/mudler/LocalAI/core/schema"
-	"github.com/mudler/LocalAI/pkg/functions"
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-)
-
-// contentOf extracts the string payload from a chunk's delta.Content,
-// transparently handling both *string and string underlying types so
-// assertions don't have to care which one the helper produced.
-func contentOf(ch schema.OpenAIResponse) string {
-	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
-		return ""
-	}
-	switch v := ch.Choices[0].Delta.Content.(type) {
-	case *string:
-		if v == nil {
-			return ""
-		}
-		return *v
-	case string:
-		return v
-	default:
-		return ""
-	}
-}
-
-// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
-// *string on schema.Message.
-func reasoningOf(ch schema.OpenAIResponse) string {
-	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
-		return ""
-	}
-	r := ch.Choices[0].Delta.Reasoning
-	if r == nil {
-		return ""
-	}
-	return *r
-}
-
-// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
-func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
-	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
-		return nil
-	}
-	return ch.Choices[0].Delta.ToolCalls
-}
-
-// expectSpecCompliant enforces the invariants on every chunk:
-//   - Object == "chat.completion.chunk"
-//   - Exactly one Choice with Index==0
-//   - No delta ever carries both non-empty Content and non-empty ToolCalls
-//   - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
-func expectSpecCompliant(chunks []schema.OpenAIResponse) {
-	for i, ch := range chunks {
-		Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
-		Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
-		Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)
-
-		hasContent := contentOf(ch) != ""
-		hasReasoning := reasoningOf(ch) != ""
-		hasToolCalls := len(toolCallsOf(ch)) > 0
-
-		if hasContent && hasToolCalls {
-			Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
-		}
-		if hasReasoning && hasToolCalls {
-			Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
-		}
-	}
-}
-
-// expectMetadata asserts every chunk carries the same id/model/created.
-func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
-	for i, ch := range chunks {
-		Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
-		Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
-		Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
-	}
-}
-
-var _ = Describe("buildDeferredToolCallChunks", func() {
-	const (
-		testID      = "req"
-		testModel   = "test-model"
-		testCreated = 1700000000
-	)
-
-	Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
-		It("emits only the tool_call chunks, no Content anywhere", func() {
-			results := []functions.FuncCallResults{
-				{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "Let me search…",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2), "two chunks: name, args")
-
-			// Name chunk
-			tc0 := toolCallsOf(chunks[0])
-			Expect(tc0).To(HaveLen(1))
-			Expect(tc0[0].Index).To(Equal(0))
-			Expect(tc0[0].ID).To(Equal("tc1"))
-			Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
-			Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-
-			// Args chunk — MUST NOT carry Content
-			tc1 := toolCallsOf(chunks[1])
-			Expect(tc1).To(HaveLen(1))
-			Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
-			Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
-			Expect(contentOf(chunks[1])).To(BeEmpty(),
-				"args chunk must not duplicate already-streamed content")
-		})
-	})
-
-	Describe("Case B — autoparser / content not streamed", func() {
-		It("emits role, content, then name+args", func() {
-			results := []functions.FuncCallResults{
-				{Name: "do", Arguments: "{}", ID: "tc1"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				false, "Here is my plan…",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(4), "role, content, name, args")
-
-			// Role chunk
-			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-			Expect(toolCallsOf(chunks[0])).To(BeEmpty())
-
-			// Content chunk
-			Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
-			Expect(toolCallsOf(chunks[1])).To(BeEmpty())
-
-			// Name + args chunks
-			Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
-			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
-			Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
-			Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
-		})
-	})
-
-	Describe("Case C — multiple deferred calls, content already streamed", func() {
-		It("emits (name, args) × 3 with no Content anywhere", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tcA"},
-				{Name: "b", Arguments: "{}", ID: "tcB"},
-				{Name: "c", Arguments: "{}", ID: "tcC"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "some narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(6))
-
-			for i := 0; i < 3; i++ {
-				Expect(contentOf(chunks[2*i])).To(BeEmpty(),
-					"call #%d name chunk must not carry Content", i)
-				Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
-					"call #%d args chunk must not carry Content", i)
-				Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
-				Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
-			}
-			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
-			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
-			Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
-		})
-	})
-
-	Describe("Case D — partial incremental emission", func() {
-		It("emits only the deferred tail (call #1), skipping #0", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc0"},
-				{Name: "b", Arguments: "{}", ID: "tc1"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 1,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2))
-			Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
-			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
-			Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
-			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
-		})
-	})
-
-	Describe("Case E — all calls already emitted incrementally", func() {
-		It("emits nothing", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc0"},
-				{Name: "b", Arguments: "{}", ID: "tc1"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 2,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(BeEmpty())
-		})
-	})
-
-	Describe("Case F — content not streamed but textContent empty", func() {
-		It("emits only the tool call chunks, no leading role/content", func() {
-			results := []functions.FuncCallResults{
-				{Name: "x", Arguments: "{}", ID: "tcX"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				false, "",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2))
-			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
-			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
-		})
-	})
-
-	Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
-		It("emits a deterministic per-index fallback", func() {
-			results := []functions.FuncCallResults{
-				{Name: "x", Arguments: "{}", ID: ""},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2))
-			expectedID := fmt.Sprintf("%s-%d", testID, 0)
-			Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
-			Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
-		})
-	})
-
-	Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
-		It("avoids the collision bug where every empty-ID call shared the request id", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: ""},
-				{Name: "b", Arguments: "{}", ID: ""},
-				{Name: "c", Arguments: "{}", ID: ""},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(6))
-
-			ids := map[string]int{}
-			for _, ch := range chunks {
-				for _, tc := range toolCallsOf(ch) {
-					ids[tc.ID]++
-				}
-			}
-			// Each call yields a name chunk + args chunk → each distinct ID
-			// should appear in exactly two chunks. Three distinct IDs
-			// overall.
-			Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
-			for id, n := range ids {
-				Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
-			}
-		})
-	})
-
-	Describe("Case H — indices preserved across skip with multiple calls", func() {
-		It("emits Index fields matching functionResults positions", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc0"},
-				{Name: "b", Arguments: "{}", ID: "tc1"},
-				{Name: "c", Arguments: "{}", ID: "tc2"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 1,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(4))
-
-			Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
-			Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
-			Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
-			Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
-		})
-	})
-
-	Describe("Case I — explicit non-empty ID is preserved", func() {
-		It("does not touch ss.ID when it's already set", func() {
-			results := []functions.FuncCallResults{
-				{Name: "x", Arguments: "{}", ID: "abc123"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2))
-			Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
-			Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
-		})
-	})
-
-	Describe("Case J — chunk-shape sanity", func() {
-		It("splits Name into the first chunk and Arguments into the second", func() {
-			results := []functions.FuncCallResults{
-				{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "narration",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2))
-
-			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
-			Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
-
-			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
-			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
-		})
-	})
-
-	Describe("Case K — metadata propagation", func() {
-		It("stamps every chunk with the same id/model/created", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tcA"},
-				{Name: "b", Arguments: "{}", ID: "tcB"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				false, "hello",
-				true, "",
-			)
-
-			expectSpecCompliant(chunks)
-			expectMetadata(chunks, testID, testModel, testCreated)
-		})
-	})
-
-	Describe("Case L — Choices[0].Index == 0 invariant", func() {
-		It("is upheld across every branch the helper can take", func() {
-			scenarios := []struct {
-				name                  string
-				functionResults       []functions.FuncCallResults
-				lastEmittedCount      int
-				contentStreamed       bool
-				text                  string
-				reasoningStreamed     bool
-				reasoning             string
-			}{
-				{"streamed-content-deferred-call",
-					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
-					0, true, "hi", true, ""},
-				{"unstreamed-content-deferred-call",
-					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
-					0, false, "hello", true, ""},
-				{"unstreamed-reasoning-and-content",
-					[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
-					0, false, "hello", false, "thinking…"},
-				{"partial-incremental",
-					[]functions.FuncCallResults{
-						{Name: "a", Arguments: "{}"},
-						{Name: "b", Arguments: "{}"}},
-					1, true, "hi", true, ""},
-			}
-			for _, sc := range scenarios {
-				chunks := buildDeferredToolCallChunks(
-					testID, testModel, testCreated,
-					sc.functionResults, sc.lastEmittedCount,
-					sc.contentStreamed, sc.text,
-					sc.reasoningStreamed, sc.reasoning,
-				)
-				for i, ch := range chunks {
-					Expect(ch.Choices[0].Index).To(Equal(0),
-						"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
-				}
-			}
-		})
-	})
-
-	Describe("Case M — spec compliance across every scenario", func() {
-		It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
-			scenarios := []struct {
-				name                  string
-				functionResults       []functions.FuncCallResults
-				lastEmittedCount      int
-				contentStreamed       bool
-				text                  string
-				reasoningStreamed     bool
-				reasoning             string
-			}{
-				{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
-					0, true, "already-streamed", true, ""},
-				{"C", []functions.FuncCallResults{
-					{Name: "a", Arguments: "{}", ID: "tc0"},
-					{Name: "b", Arguments: "{}", ID: "tc1"}},
-					0, true, "already-streamed", true, ""},
-				{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
-					0, false, "plan", true, ""},
-				{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
-					0, false, "plan", false, "thinking…"},
-			}
-			for _, sc := range scenarios {
-				chunks := buildDeferredToolCallChunks(
-					testID, testModel, testCreated,
-					sc.functionResults, sc.lastEmittedCount,
-					sc.contentStreamed, sc.text,
-					sc.reasoningStreamed, sc.reasoning,
-				)
-				for i, ch := range chunks {
-					hasContent := contentOf(ch) != ""
-					hasReasoning := reasoningOf(ch) != ""
-					hasToolCalls := len(toolCallsOf(ch)) > 0
-					Expect(hasContent && hasToolCalls).To(BeFalse(),
-						"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
-					Expect(hasReasoning && hasToolCalls).To(BeFalse(),
-						"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
-				}
-			}
-		})
-	})
-
-	Describe("Case N — empty functionResults", func() {
-		It("emits nothing, including no leading role/content/reasoning", func() {
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				nil, 0,
-				false, "ignored",
-				false, "ignored",
-			)
-			Expect(chunks).To(BeEmpty())
-		})
-	})
-
-	Describe("Case O — content not streamed but all calls already emitted", func() {
-		It("emits nothing, not even a standalone content chunk", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc0"},
-				{Name: "b", Arguments: "{}", ID: "tc1"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 2,
-				false, "narration",
-				false, "thinking…",
-			)
-			Expect(chunks).To(BeEmpty(),
-				"no tool_calls to trigger on, so no leading role/content/reasoning either")
-		})
-	})
-
-	Describe("Reasoning — autoparser delivered reasoning only at end", func() {
-		It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "streamed content",
-				false, "model's private thoughts",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(3), "reasoning, name, args")
-
-			Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-			Expect(toolCallsOf(chunks[0])).To(BeEmpty())
-
-			// The following two are the tool_call name + args chunks.
-			Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
-			Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
-		})
-
-		It("emits reasoning before role+content when neither was streamed", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				false, "final plan",
-				false, "private thoughts",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
-
-			Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
-			Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
-			Expect(contentOf(chunks[2])).To(Equal("final plan"))
-			Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
-			Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
-		})
-
-		It("does not re-emit reasoning that was already streamed", func() {
-			results := []functions.FuncCallResults{
-				{Name: "a", Arguments: "{}", ID: "tc"},
-			}
-			chunks := buildDeferredToolCallChunks(
-				testID, testModel, testCreated,
-				results, 0,
-				true, "streamed",
-				true, "already-sent reasoning",
-			)
-
-			expectSpecCompliant(chunks)
-			Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
-			for _, ch := range chunks {
-				Expect(reasoningOf(ch)).To(BeEmpty())
-			}
-		})
-	})
-})
-
-var _ = Describe("hasRealCall", func() {
-	const noAction = "answer"
-
-	It("returns false for nil and empty slices", func() {
-		Expect(hasRealCall(nil, noAction)).To(BeFalse())
-		Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
-	})
-
-	It("returns false when every entry is the noAction sentinel", func() {
-		results := []functions.FuncCallResults{
-			{Name: noAction, Arguments: `{"message":"hi"}`},
-			{Name: noAction, Arguments: `{"message":"hello"}`},
-		}
-		Expect(hasRealCall(results, noAction)).To(BeFalse())
-	})
-
-	It("returns true when only one entry is a real call", func() {
-		results := []functions.FuncCallResults{
-			{Name: "search", Arguments: "{}"},
-		}
-		Expect(hasRealCall(results, noAction)).To(BeTrue())
-	})
-
-	It("returns true when a real call follows a noAction entry", func() {
-		// This is the regression the follow-up fixes: the old
-		// functionResults[0].Name == noAction check would declare this
-		// noActionToRun and drop the real call entirely.
-		results := []functions.FuncCallResults{
-			{Name: noAction, Arguments: `{"message":"hi"}`},
-			{Name: "search", Arguments: "{}"},
-		}
-		Expect(hasRealCall(results, noAction)).To(BeTrue())
-	})
-
-	It("returns true when a real call precedes a noAction entry", func() {
-		results := []functions.FuncCallResults{
-			{Name: "search", Arguments: "{}"},
-			{Name: noAction, Arguments: `{"message":"hi"}`},
-		}
-		Expect(hasRealCall(results, noAction)).To(BeTrue())
-	})
-})
-
-var _ = Describe("buildNoActionFinalChunks", func() {
-	const (
-		testID      = "req"
-		testModel   = "test-model"
-		testCreated = 1700000000
-	)
-	usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
-
-	Describe("Content streamed — trailing usage chunk", func() {
-		It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				true, true,
-				"", "already-streamed-reasoning", usage,
-			)
-
-			Expect(chunks).To(HaveLen(1))
-			Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-			Expect(reasoningOf(chunks[0])).To(BeEmpty(),
-				"reasoning must not be re-emitted once it was streamed via the callback")
-		})
-
-		It("emits a trailing reasoning delivery when reasoning came only at end", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				true, false,
-				"", "autoparser final reasoning", usage,
-			)
-
-			Expect(chunks).To(HaveLen(1))
-			Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-			Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
-		})
-
-		It("omits reasoning when it's empty regardless of streamed flag", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				true, false,
-				"", "", usage,
-			)
-
-			Expect(chunks).To(HaveLen(1))
-			Expect(reasoningOf(chunks[0])).To(BeEmpty())
-		})
-	})
-
-	Describe("Content not streamed — role, then content+usage", func() {
-		It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				false, true,
-				"the answer", "already-streamed-reasoning", usage,
-			)
-
-			Expect(chunks).To(HaveLen(2))
-			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
-			Expect(contentOf(chunks[0])).To(BeEmpty())
-
-			Expect(contentOf(chunks[1])).To(Equal("the answer"))
-			Expect(reasoningOf(chunks[1])).To(BeEmpty(),
-				"reasoning must not be re-emitted if it was streamed earlier")
-			Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
-		})
-
-		It("emits role, then content+reasoning when reasoning was not streamed", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				false, false,
-				"the answer", "autoparser final reasoning", usage,
-			)
-
-			Expect(chunks).To(HaveLen(2))
-			Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
-
-			Expect(contentOf(chunks[1])).To(Equal("the answer"))
-			Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
-			Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
-		})
-
-		It("still emits content even when reasoning is empty", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				false, false,
-				"just an answer", "", usage,
-			)
-
-			Expect(chunks).To(HaveLen(2))
-			Expect(contentOf(chunks[1])).To(Equal("just an answer"))
-			Expect(reasoningOf(chunks[1])).To(BeEmpty())
-		})
-	})
-
-	Describe("Metadata and shape invariants", func() {
-		It("stamps every chunk with the same id/model/created and object", func() {
-			chunks := buildNoActionFinalChunks(
-				testID, testModel, testCreated,
-				false, false,
-				"hi", "reasoning", usage,
-			)
-			for i, ch := range chunks {
-				Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
-				Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
-				Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
-				Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
-				Expect(ch.Choices).To(HaveLen(1))
-				Expect(ch.Choices[0].Index).To(Equal(0))
-			}
-		})
-	})
-})
--- a/core/http/middleware/trace.go
+++ b/core/http/middleware/trace.go
@@ -3,7 +3,6 @@ package middleware
 import (
 	"bytes"
 	"io"
-	"mime"
 	"net/http"
 	"slices"
 	"sync"
@@ -95,8 +94,7 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {

 			initializeTracing(app.ApplicationConfig().TracingMaxItems)

-			ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
-			if ct != "application/json" {
+			if c.Request().Header.Get("Content-Type") != "application/json" {
 				return next(c)
 			}

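This hunk replaces `mime.ParseMediaType` with an exact string comparison. As a result, a request whose header is `application/json; charset=utf-8`, or differs only in case, now skips tracing. The sketch below contrasts the two checks using only the Go standard library. The helper names are illustrative.

```go
package main

import (
	"fmt"
	"mime"
)

// matchesExact mirrors the check added in this hunk: a byte-for-byte comparison.
func matchesExact(contentType string) bool {
	return contentType == "application/json"
}

// matchesParsed mirrors the removed check: mime.ParseMediaType drops
// parameters such as charset and lowercases the media type. Like the
// original code, it ignores the parse error (an empty header yields "").
func matchesParsed(contentType string) bool {
	mt, _, _ := mime.ParseMediaType(contentType)
	return mt == "application/json"
}

func main() {
	for _, ct := range []string{
		"application/json",
		"application/json; charset=utf-8",
		"Application/JSON",
	} {
		fmt.Printf("%q exact=%v parsed=%v\n", ct, matchesExact(ct), matchesParsed(ct))
	}
}
```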
--- a/core/http/routes/ui_api.go
+++ b/core/http/routes/ui_api.go
@@ -23,6 +23,7 @@ import (
 	"github.com/mudler/LocalAI/core/gallery"
 	"github.com/mudler/LocalAI/core/http/auth"
 	"github.com/mudler/LocalAI/core/http/endpoints/localai"
+	"github.com/mudler/LocalAI/core/http/middleware"
 	"github.com/mudler/LocalAI/core/p2p"
 	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/pkg/model"
@@ -1457,5 +1458,24 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
 		app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
 	}

+	// Traces API (admin only)
+	app.GET("/api/traces", func(c echo.Context) error {
+		if !appConfig.EnableTracing {
+			return c.JSON(503, map[string]any{
+				"error": "Tracing disabled",
+			})
+		}
+		traces := middleware.GetTraces()
+		return c.JSON(200, map[string]any{
+			"traces": traces,
+		})
+	}, adminMiddleware)
+
+	app.POST("/api/traces/clear", func(c echo.Context) error {
+		middleware.ClearTraces()
+		return c.JSON(200, map[string]any{
+			"message": "Traces cleared",
+		})
+	}, adminMiddleware)
 }

--- a/core/services/messaging/subjects.go
+++ b/core/services/messaging/subjects.go
@@ -124,13 +124,8 @@ func SubjectNodeBackendInstall(nodeID string) string {
 // BackendInstallRequest is the payload for a backend.install NATS request.
 type BackendInstallRequest struct {
 	Backend          string `json:"backend"`
-	ModelID          string `json:"model_id,omitempty"`
+	ModelID          string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
 	BackendGalleries string `json:"backend_galleries,omitempty"`
-	// URI is set for external installs (OCI image, URL, or path). When non-empty
-	// the worker routes to InstallExternalBackend instead of the gallery lookup.
-	URI   string `json:"uri,omitempty"`
-	Name  string `json:"name,omitempty"`
-	Alias string `json:"alias,omitempty"`
 }

 // BackendInstallReply is the response from a backend.install NATS request.
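With `URI`, `Name` and `Alias` removed, the request carries only the backend, an optional model ID and the galleries JSON. `encoding/json` ignores unknown fields by default. A worker decoding into this struct would therefore silently drop those fields if a sender still set them. The external-install route described by the removed comment is lost without any error. The sketch uses a local copy of the struct, and `encode` and `decode` are illustrative helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copy of BackendInstallRequest as it reads after this change.
type BackendInstallRequest struct {
	Backend          string `json:"backend"`
	ModelID          string `json:"model_id,omitempty"`
	BackendGalleries string `json:"backend_galleries,omitempty"`
}

// encode marshals the request; omitempty drops unset optional fields.
func encode(r BackendInstallRequest) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

// decode relies on json.Unmarshal's default behavior, which silently
// ignores fields the struct does not declare (here: uri, name, alias).
func decode(payload string) (BackendInstallRequest, error) {
	var r BackendInstallRequest
	err := json.Unmarshal([]byte(payload), &r)
	return r, err
}

func main() {
	s, _ := encode(BackendInstallRequest{Backend: "llama-cpp"})
	fmt.Println(s) // {"backend":"llama-cpp"}

	// A payload from a sender that still sets the removed external-install fields.
	r, err := decode(`{"backend":"my-backend","uri":"oci://example.invalid/img","name":"n","alias":"a"}`)
	fmt.Println(r.Backend, err) // my-backend <nil>
}
```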
--- a/core/services/nodes/managers_distributed.go
+++ b/core/services/nodes/managers_distributed.go
@@ -293,7 +293,7 @@ func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *gall
 	backendName := op.GalleryElementName

 	_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
-		reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
+		reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
 		if err != nil {
 			return err
 		}
@@ -311,7 +311,7 @@ func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name str
 	galleriesJSON, _ := json.Marshal(d.backendGalleries)

 	_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
-		reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
+		reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
 		if err != nil {
 			return err
 		}
--- a/core/services/nodes/reconciler.go
+++ b/core/services/nodes/reconciler.go
@@ -188,7 +188,7 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
 		case OpBackendDelete:
 			_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
 		case OpBackendInstall, OpBackendUpgrade:
-			reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
+			reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries))
 			if err != nil {
 				applyErr = err
 			} else if !reply.Success {
--- a/core/services/nodes/router.go
+++ b/core/services/nodes/router.go
@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
 		return "", fmt.Errorf("no NATS connection for backend installation")
 	}

-	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
+	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
 	if err != nil {
 		return "", err
 	}
--- a/core/services/nodes/router_test.go
+++ b/core/services/nodes/router_test.go
@@ -244,7 +244,7 @@ type fakeUnloader struct {
 	unloadErr    error
 }

-func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
+func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
 	return f.installReply, f.installErr
 }

--- a/core/services/nodes/unloader.go
+++ b/core/services/nodes/unloader.go
@@ -17,7 +17,7 @@ type backendStopRequest struct {
 // NodeCommandSender abstracts NATS-based commands to worker nodes.
 // Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
 type NodeCommandSender interface {
-	InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
+	InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
 	DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
 	ListBackends(nodeID string) (*messaging.BackendListReply, error)
 	StopBackend(nodeID, backend string) error
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
 // The worker installs the backend from gallery (if not already installed),
 // starts the gRPC process, and replies when ready.
 // Timeout: 5 minutes (gallery install can take a while).
-func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
+func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
 	subject := messaging.SubjectNodeBackendInstall(nodeID)
 	xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)

@@ -80,9 +80,6 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
 		Backend:          backendType,
 		ModelID:          modelID,
 		BackendGalleries: galleriesJSON,
-		URI:              uri,
-		Name:             name,
-		Alias:            alias,
 	}, 5*time.Minute)
 }

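Every implementer of `NodeCommandSender.InstallBackend` had to change in lockstep here: the adapter, the `fakeUnloader` in `router_test.go`, and every call site. A common Go idiom makes such signature drift fail at compile time rather than at a distant call site: a blank-identifier interface assertion. The sketch below uses a trimmed, hypothetical `BackendInstaller` interface and reply type rather than the real `messaging` package.

```go
package main

import "fmt"

// Hypothetical stand-in for messaging.BackendInstallReply.
type BackendInstallReply struct {
	Success bool
	Error   string
}

// Trimmed, hypothetical copy of the NodeCommandSender method touched here.
type BackendInstaller interface {
	InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*BackendInstallReply, error)
}

// fakeInstaller records calls and returns a canned reply, like a test double.
type fakeInstaller struct {
	calls []string
	reply *BackendInstallReply
}

func (f *fakeInstaller) InstallBackend(nodeID, backendType, _, _ string) (*BackendInstallReply, error) {
	f.calls = append(f.calls, nodeID+"/"+backendType)
	return f.reply, nil
}

// Compile-time check: if the interface gains or loses a parameter (as it
// does in this diff, from seven strings to four), this line stops compiling.
var _ BackendInstaller = (*fakeInstaller)(nil)

func main() {
	f := &fakeInstaller{reply: &BackendInstallReply{Success: true}}
	var s BackendInstaller = f
	r, _ := s.InstallBackend("node-1", "llama-cpp", "", "[]")
	fmt.Println(r.Success, f.calls) // true [node-1/llama-cpp]
}
```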
--- a/docs/content/features/backend-monitor.md
+++ b/docs/content/features/backend-monitor.md
@@ -14,13 +14,11 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend

 ### Request

-The model to monitor is passed as a query parameter:
+The request body is JSON:

-| Parameter | Type     | Required | Location | Description                    |
-|-----------|----------|----------|----------|--------------------------------|
-| `model`   | `string` | Yes      | query    | Name of the model to monitor   |
-
-For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.
+| Parameter | Type     | Required | Description                    |
+|-----------|----------|----------|--------------------------------|
+| `model`   | `string` | Yes      | Name of the model to monitor   |

 ### Response

@@ -44,7 +42,9 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
 ### Usage

 ```bash
-curl "http://localhost:8080/backend/monitor?model=my-model"
+curl http://localhost:8080/backend/monitor \
+  -H "Content-Type: application/json" \
+  -d '{"model": "my-model"}'
 ```

 ### Example response
--- a/docs/content/reference/_index.md
+++ b/docs/content/reference/_index.md
@@ -130,19 +130,6 @@ Reference for system information commands and diagnostics.

 ---

-### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
-Policy for AI-assisted contributions — licensing, DCO, and attribution.
-
-**Key topics:**
-- Aligned with the Linux kernel's AI assistants policy
-- Signed-off-by and DCO rules
-- `Assisted-by` commit trailer format
-- Scope and responsibility of the human submitter
-
-**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)
-
----
-
 ## Quick Links

 | Task | Documentation |
@@ -151,7 +138,6 @@ Policy for AI-assisted contributions — licensing, DCO, and attribution.
 | CLI commands | [CLI Reference](cli-reference.md) |
 | Check compatibility | [Compatibility Table](compatibility-table.md) |
 | System diagnostics | [System Info](system-info.md) |
-| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |

 ---

--- a/docs/content/reference/ai-coding-assistants.md
+++ b/docs/content/reference/ai-coding-assistants.md
@@ -1,79 +0,0 @@
-
-+++
-disableToc = false
-title = "AI Coding Assistants"
-weight = 28
-+++
-
-This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.
-
-**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.
-
-AI tools helping with LocalAI development should follow the standard project development process:
-
-- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
-- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
-- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth
-
-## Licensing and Legal Requirements
-
-All contributions must comply with LocalAI's licensing requirements:
-
-- LocalAI is licensed under the **MIT License**
-- New source files should use the SPDX license identifier `MIT` where applicable to the file type
-- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers
-
-## Signed-off-by and Developer Certificate of Origin
-
-**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:
-
-- Reviewing all AI-generated code
-- Ensuring compliance with licensing requirements
-- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
-- Taking full responsibility for the contribution
-
-AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).
-
-## Attribution
-
-When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:
-
-```
-Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
-```
-
-Where:
-
-- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
-- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
-- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
-
-Basic development tools (git, go, make, editors) should **not** be listed.
-
-### Example
-
-```
-fix(llama-cpp): handle empty tool call arguments
-
-Previously the parser panicked when the model returned a tool call with
-an empty arguments object. Fall back to an empty JSON object in that
-case so downstream consumers receive a valid payload.
-
-Assisted-by: Claude:claude-opus-4-7 golangci-lint
-Signed-off-by: Jane Developer <jane@example.com>
-```
-
-## Scope and Responsibility
-
-Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:
-
-- Understand every line that lands in the PR
-- Verify that generated code compiles, passes tests, and follows the project style
-- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
-- Not submit AI output verbatim without review
-
-Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.
-
-{{% notice note %}}
-This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
-{{% /notice %}}
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 |---------|-------------|-------------|
 | [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
 | [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
-| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
+| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
 | [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
 | [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
 | [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1,206 +1,4 @@
 ---
-- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
-  description: |
-    # 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
-
-    A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.
-
-    The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.
-
-      - **Developed by:** @hesamation
-      - **Base model:** `Qwen/Qwen3.6-35B-A3B`
-      - **License:** apache-2.0
-
-    This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.
-
-    [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)
-
-    ## Benchmark Results
-
-    The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.
-
-    ...
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qwen
-    - reasoning
-  icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    options:
-      - use_jinja:true
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
-      presence_penalty: 1.5
-      repeat_penalty: 1
-      temperature: 0.7
-      top_k: 20
-      top_p: 0.8
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
-      sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
-      uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
-- name: "qwen3.5-9b-glm5.1-distill-v1"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
-  description: |
-    # 🪐 Qwen3.5-9B-GLM5.1-Distill-v1
-
-    ## 📌 Model Overview
-
-    **Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
-    **Base Model:** Qwen3.5-9B
-    **Training Type:** Supervised Fine-Tuning (SFT, Distillation)
-    **Parameter Scale:** 9B
-    **Training Framework:** Unsloth
-
-    This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.
-
-    The primary goals are to:
-
-      - Improve **structured reasoning ability**
-      - Enhance **instruction-following consistency**
-      - Activate **latent knowledge via better reasoning structure**
-
-    ## 📊 Training Data
-
-    ### Main Dataset
-
-      - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
-      - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
-      - Generated from a **GLM-5.1 teacher model**
-      - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
-      - Training used a **filtered subset**, not the full source dataset.
-
-    ### Auxiliary Dataset
-
-      - `Jackrong/Qwen3.5-reasoning-700x`
-
-    ...
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - qwen
-    - instruction-tuned
-    - reasoning
-  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
-    options:
-      - use_jinja:true
-    parameters:
-      min_p: 0
-      model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
-      presence_penalty: 1.5
-      repeat_penalty: 1
-      temperature: 0.7
-      top_k: 20
-      top_p: 0.8
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
-      sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
-      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
-    - filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
-      sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
-      uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
-- name: "supergemma4-26b-uncensored-v2"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
-  description: |
-    Hugging Face |
-    GitHub |
-    Launch Blog |
-    Documentation
-
-    License: Apache 2.0 | Authors: Google DeepMind
-
-    Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
-
-    Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
-
-    Gemma 4 introduces key **capability and architectural advancements**:
-
-    * **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
-
-    ...
-  license: "gemma"
-  tags:
-    - llm
-    - gguf
-  icon: https://ai.google.dev/gemma/images/gemma4_banner.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    options:
-      - use_jinja:true
-    parameters:
-      model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
-      sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
-      uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
-- name: "qwopus-glm-18b-merged"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
-  description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n  - Improve **structured reasoning ability**\n  - Enhance **instruction-following consistency**\n  - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n  - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n  - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n  - Generated from a **GLM-5.1 teacher model**\n  - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n  - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n  - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
-  license: "apache-2.0"
-  tags:
-    - llm
-    - gguf
-    - reasoning
-  icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
-  overrides:
-    backend: llama-cpp
-    function:
-      automatic_tool_parsing_fallback: true
-      grammar:
-        disable: true
-    known_usecases:
-      - chat
-    options:
-      - use_jinja:true
-    parameters:
-      model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
-    template:
-      use_tokenizer_template: true
-  files:
-    - filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
-      sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
-      uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
 - name: "qwen3.6-35b-a3b-apex"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
@@ -1089,8 +887,6 @@
    - gpu
  overrides:
    backend: neutts
-    parameters:
-      model: neuphonic/neutts-air
    known_usecases:
      - tts
 - name: vllm-omni-z-image-turbo
@@ -15390,16 +15186,14 @@
    - gpu
  overrides:
    parameters:
-      model: wan2.1_t2v_1.3b-q8_0.gguf
+      model: wan2.1-t2v-1.3B-Q8_0.gguf
  files:
-    - filename: "wan2.1_t2v_1.3b-q8_0.gguf"
-      sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
-      uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
+    - filename: "wan2.1-t2v-1.3B-Q8_0.gguf"
+      uri: "huggingface://calcuis/wan-gguf/wan2.1-t2v-1.3B-Q8_0.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
-      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
 - name: wan-2.1-i2v-14b-480p-ggml
  license: apache-2.0
  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
@@ -15420,103 +15214,11 @@
      model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
    options:
      - "clip_vision_path:clip_vision_h.safetensors"
-      - "diffusion_model"
-      - "vae_decode_only:false"
-      - "sampler:euler"
-      - "flow_shift:3.0"
-      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
-      - "vae_path:wan_2.1_vae.safetensors"
  files:
    - filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
-      sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
      uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
-    - filename: "umt5-xxl-encoder-Q8_0.gguf"
-      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
-      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
-    - filename: "clip_vision_h.safetensors"
-      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
-- name: wan-2.1-flf2v-14b-720p-ggml
-  license: apache-2.0
-  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
-  description: |
-    Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
-    Takes a start and end reference image and interpolates a 33-frame clip
-    between them. Unlike the plain I2V variant this model feeds the end
-    frame through clip_vision as well, so it conditions semantically (not
-    just in pixel-space) on both endpoints. That makes it the right choice
-    for seamless loops (start_image == end_image) and clean narrative cuts.
-    Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
-    text encoder, and clip_vision_h as I2V 14B.
-  urls:
-    - https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
-  tags:
-    - image-to-video
-    - first-last-frame-to-video
-    - wan
-    - video-generation
-    - cpu
-    - gpu
-  overrides:
-    parameters:
-      model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
-    options:
-      - "clip_vision_path:clip_vision_h.safetensors"
-      - "diffusion_model"
-      - "vae_decode_only:false"
-      - "sampler:euler"
-      - "flow_shift:3.0"
-      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
-      - "vae_path:wan_2.1_vae.safetensors"
-  files:
-    - filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
-      sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
-      uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
-    - filename: "wan_2.1_vae.safetensors"
-      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
-    - filename: "umt5-xxl-encoder-Q8_0.gguf"
-      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
-      sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
-    - filename: "clip_vision_h.safetensors"
-      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
-- name: wan-2.1-i2v-14b-720p-ggml
-  license: apache-2.0
-  url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
-  description: |
-    Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
-    Native 720p sibling of the 480p I2V model: animates a single
-    reference image into a 33-frame clip at up to 1280x720. Trained
-    purely as image-to-video (no first-last-frame interpolation path),
-    so motion is freer and better-suited to single-anchor animation
-    than repurposing the FLF2V 720P variant for i2v. Shares the same
-    VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
-    and FLF2V 14B 720P entries.
-  urls:
-    - https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
-  tags:
-    - image-to-video
-    - wan
-    - video-generation
-    - cpu
-    - gpu
-  overrides:
-    parameters:
-      model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
-    options:
-      - "clip_vision_path:clip_vision_h.safetensors"
-      - "diffusion_model"
-      - "vae_decode_only:false"
-      - "sampler:euler"
-      - "flow_shift:3.0"
-      - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
-      - "vae_path:wan_2.1_vae.safetensors"
-  files:
-    - filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
-      sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
-      uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
-    - filename: "wan_2.1_vae.safetensors"
-      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
      uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
    - filename: "clip_vision_h.safetensors"
--- a/gallery/wan-ggml.yaml
+++ b/gallery/wan-ggml.yaml
@@ -9,6 +9,11 @@ config_file: |
    - "diffusion_model"
    - "vae_decode_only:false"
    - "sampler:euler"
+    - "scheduler:discrete"
    - "flow_shift:3.0"
+    - "diffusion_flash_attn:true"
+    - "offload_params_to_cpu:true"
+    - "keep_vae_on_cpu:true"
+    - "keep_clip_on_cpu:true"
    - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
    - "vae_path:wan_2.1_vae.safetensors"
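The `wan-ggml.yaml` template passes options as a flat list that mixes `key:value` entries with bare flags such as `diffusion_model`. The following is a minimal sketch of reading such a list. It assumes each entry is split on its first colon and that a bare entry means "enabled". `parseOptions` is hypothetical and is not the stable-diffusion backend's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions splits each "key:value" entry on its first colon, so values
// that themselves contain colons are kept intact. Entries without a colon
// are treated as bare flags set to "true" (an assumption for illustration).
func parseOptions(opts []string) map[string]string {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		k, v, found := strings.Cut(o, ":")
		if !found {
			v = "true"
		}
		out[k] = v
	}
	return out
}

func main() {
	opts := parseOptions([]string{
		"diffusion_model",
		"sampler:euler",
		"scheduler:discrete",
		"flow_shift:3.0",
		"keep_clip_on_cpu:true",
		"t5xxl_path:umt5-xxl-encoder-Q8_0.gguf",
	})
	fmt.Println(opts["sampler"], opts["flow_shift"], opts["diffusion_model"]) // euler 3.0 true
}
```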
--- a/go.mod
+++ b/go.mod
@@ -8,13 +8,13 @@ require (
 	github.com/Masterminds/sprig/v3 v3.3.0
 	github.com/alecthomas/kong v1.14.0
 	github.com/anthropics/anthropic-sdk-go v1.27.0
-	github.com/aws/aws-sdk-go-v2 v1.41.6
-	github.com/aws/aws-sdk-go-v2/config v1.32.16
-	github.com/aws/aws-sdk-go-v2/credentials v1.19.15
-	github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
+	github.com/aws/aws-sdk-go-v2 v1.41.5
+	github.com/aws/aws-sdk-go-v2/config v1.32.14
+	github.com/aws/aws-sdk-go-v2/credentials v1.19.14
+	github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
 	github.com/charmbracelet/glamour v1.0.0
-	github.com/containerd/containerd v1.7.31
-	github.com/coreos/go-oidc/v3 v3.18.0
+	github.com/containerd/containerd v1.7.30
+	github.com/coreos/go-oidc/v3 v3.17.0
 	github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
 	github.com/ebitengine/purego v0.10.0
 	github.com/emirpasic/gods/v2 v2.0.0-alpha
@@ -35,7 +35,7 @@ require (
 	github.com/lithammer/fuzzysearch v1.1.8
 	github.com/mholt/archiver/v3 v3.5.1
 	github.com/microcosm-cc/bluemonday v1.0.27
-	github.com/modelcontextprotocol/go-sdk v1.5.0
+	github.com/modelcontextprotocol/go-sdk v1.4.1
 	github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
 	github.com/mudler/edgevpn v0.31.1
 	github.com/mudler/go-processmanager v0.1.0
@@ -75,23 +75,24 @@ require (
 )

 require (
-	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
-	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
-	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
-	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
-	github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
-	github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
-	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
-	github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
-	github.com/aws/smithy-go v1.25.0 // indirect
+	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
+	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
+	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
+	github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
+	github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
+	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
+	github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
+	github.com/aws/smithy-go v1.24.2 // indirect
 	github.com/bahlo/generic-list-go v0.2.0 // indirect
 	github.com/buger/jsonparser v1.1.1 // indirect
-	github.com/go-jose/go-jose/v4 v4.1.4 // indirect
+	github.com/go-jose/go-jose/v4 v4.1.3 // indirect
 	github.com/jinzhu/inflection v1.0.0 // indirect
 	github.com/jinzhu/now v1.1.5 // indirect
 	github.com/mattn/go-sqlite3 v1.14.24 // indirect
--- a/go.sum
+++ b/go.sum
@@ -70,42 +70,44 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
 github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
 github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
 github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
-github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
-github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
-github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
-github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
-github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
-github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
-github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
-github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
-github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
-github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
-github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
-github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
-github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
-github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
-github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
-github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
-github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
-github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
-github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
-github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
-github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
-github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
-github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
-github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
-github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
-github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
-github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
-github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
-github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
-github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
-github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
-github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
+github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
+github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
+github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
+github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
+github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
+github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
+github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
+github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
+github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
+github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
+github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
+github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
+github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
+github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
+github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
+github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
+github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
+github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
+github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
+github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
+github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
+github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
+github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
+github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
+github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
+github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
+github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
+github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
+github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
+github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
+github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
+github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
+github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
+github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
 github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -196,8 +198,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
 github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
 github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
 github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
-github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
-github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
+github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
+github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
 github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
 github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
 github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
@@ -210,8 +212,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
 github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
 github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
 github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
-github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
-github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
+github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
+github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
 github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
 github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
 github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
@@ -334,8 +336,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
 github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
 github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
 github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
-github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
-github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
+github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
+github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
 github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
 github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
 github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
@@ -383,8 +385,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
 github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
 github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
 github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
-github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
-github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
+github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
+github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
 github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
 github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
 github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -689,8 +691,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
 github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
 github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
 github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
-github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
-github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
+github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
+github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
 github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
 github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
--- a/pkg/system/capabilities_test.go
+++ b/pkg/system/capabilities_test.go
@@ -159,6 +159,7 @@ var _ = Describe("CapabilityFilterDisabled", func() {
 		os.Setenv(capabilityEnv, "disable")
 		s := &SystemState{}
 		Expect(s.IsBackendCompatible("cuda12-whisperx", "quay.io/nvidia-cuda-12")).To(BeTrue())
+		Expect(s.IsBackendCompatible("rocm-whisperx", "quay.io/rocm")).To(BeTrue())
 		Expect(s.IsBackendCompatible("metal-whisperx", "quay.io/metal-darwin")).To(BeTrue())
 		Expect(s.IsBackendCompatible("intel-whisperx", "quay.io/intel-sycl")).To(BeTrue())
 		Expect(s.IsBackendCompatible("cpu-whisperx", "quay.io/cpu")).To(BeTrue())
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -985,11 +985,13 @@ const docTemplate = `{
                "summary": "Backend monitor endpoint",
                "parameters": [
                    {
-                        "type": "string",
-                        "description": "Name of the model to monitor",
-                        "name": "model",
-                        "in": "query",
-                        "required": true
+                        "description": "Backend statistics request",
+                        "name": "request",
+                        "in": "body",
+                        "required": true,
+                        "schema": {
+                            "$ref": "#/definitions/schema.BackendMonitorRequest"
+                        }
                    }
                ],
                "responses": {
@@ -2406,23 +2408,6 @@ const docTemplate = `{
                }
            }
        },
-        "gallery.NodeDriftInfo": {
-            "type": "object",
-            "properties": {
-                "digest": {
-                    "type": "string"
-                },
-                "node_id": {
-                    "type": "string"
-                },
-                "node_name": {
-                    "type": "string"
-                },
-                "version": {
-                    "type": "string"
-                }
-            }
-        },
        "gallery.UpgradeInfo": {
            "type": "object",
            "properties": {
@@ -2440,13 +2425,6 @@ const docTemplate = `{
                },
                "installed_version": {
                    "type": "string"
-                },
-                "node_drift": {
-                    "description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
-                    "type": "array",
-                    "items": {
-                        "$ref": "#/definitions/gallery.NodeDriftInfo"
-                    }
                }
            }
        },
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -982,11 +982,13 @@
                "summary": "Backend monitor endpoint",
                "parameters": [
                    {
-                        "type": "string",
-                        "description": "Name of the model to monitor",
-                        "name": "model",
-                        "in": "query",
-                        "required": true
+                        "description": "Backend statistics request",
+                        "name": "request",
+                        "in": "body",
+                        "required": true,
+                        "schema": {
+                            "$ref": "#/definitions/schema.BackendMonitorRequest"
+                        }
                    }
                ],
                "responses": {
@@ -2403,23 +2405,6 @@
                }
            }
        },
-        "gallery.NodeDriftInfo": {
-            "type": "object",
-            "properties": {
-                "digest": {
-                    "type": "string"
-                },
-                "node_id": {
-                    "type": "string"
-                },
-                "node_name": {
-                    "type": "string"
-                },
-                "version": {
-                    "type": "string"
-                }
-            }
-        },
        "gallery.UpgradeInfo": {
            "type": "object",
            "properties": {
@@ -2437,13 +2422,6 @@
                },
                "installed_version": {
                    "type": "string"
-                },
-                "node_drift": {
-                    "description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
-                    "type": "array",
-                    "items": {
-                        "$ref": "#/definitions/gallery.NodeDriftInfo"
-                    }
                }
            }
        },
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -157,17 +157,6 @@ definitions:
          type: string
        type: array
    type: object
-  gallery.NodeDriftInfo:
-    properties:
-      digest:
-        type: string
-      node_id:
-        type: string
-      node_name:
-        type: string
-      version:
-        type: string
-    type: object
  gallery.UpgradeInfo:
    properties:
      available_digest:
@@ -180,14 +169,6 @@ definitions:
        type: string
      installed_version:
        type: string
-      node_drift:
-        description: |-
-          NodeDrift lists nodes whose installed version or digest differs from
-          the cluster majority. Non-empty means the cluster has diverged and an
-          upgrade will realign it. Empty in single-node mode.
-        items:
-          $ref: '#/definitions/gallery.NodeDriftInfo'
-        type: array
    type: object
  galleryop.OpStatus:
    properties:
@@ -2382,11 +2363,12 @@ paths:
  /backend/monitor:
    get:
      parameters:
-      - description: Name of the model to monitor
-        in: query
-        name: model
+      - description: Backend statistics request
+        in: body
+        name: request
        required: true
-        type: string
+        schema:
+          $ref: '#/definitions/schema.BackendMonitorRequest'
      responses:
        "200":
          description: Response
--- a/tests/e2e/distributed/node_lifecycle_test.go
+++ b/tests/e2e/distributed/node_lifecycle_test.go
@@ -57,7 +57,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
 			FlushNATS(infra.NC)

 			adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
-			installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "", "", "", "")
+			installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "")
 			Expect(err).ToNot(HaveOccurred())
 			Expect(installReply.Success).To(BeTrue())
 		})
@@ -78,7 +78,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
 			FlushNATS(infra.NC)

 			adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
-			installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "", "", "", "")
+			installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "")
 			Expect(err).ToNot(HaveOccurred())
 			Expect(installReply.Success).To(BeFalse())
 			Expect(installReply.Error).To(ContainSubstring("backend not found"))
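For reference, the `node_drift` description dropped from the swagger files above defines drift as nodes whose installed version or digest differs from the cluster majority, and the NodeDistributionChip entry in the log below mentions truncating digests to the docker-style 12-character form. The Go sketch below illustrates both rules under stated assumptions: `NodeBackend`, `driftedNodes`, and `shortDigest` are hypothetical names (this is not the project's `summarizeNodeDrift`), and comparing nodes by their (version, digest) pair is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeBackend is a hypothetical per-node record of one installed backend.
type NodeBackend struct {
	NodeID  string
	Version string
	Digest  string
}

// shortDigest trims an optional "sha256:" prefix and keeps the first 12 hex
// characters, the docker-style short form.
func shortDigest(d string) string {
	d = strings.TrimPrefix(d, "sha256:")
	if len(d) > 12 {
		return d[:12]
	}
	return d
}

// driftedNodes returns the IDs of nodes whose (version, digest) pair differs
// from the most common pair. With fewer than two nodes there is no majority
// to drift from, so it returns nil. On a tie, the pair that reached the top
// count first is treated as the majority.
func driftedNodes(nodes []NodeBackend) []string {
	if len(nodes) < 2 {
		return nil
	}
	type pair struct{ version, digest string }
	counts := make(map[pair]int)
	var majority pair
	best := 0
	for _, n := range nodes {
		p := pair{n.Version, n.Digest}
		counts[p]++
		if counts[p] > best {
			best, majority = counts[p], p
		}
	}
	var drifted []string
	for _, n := range nodes {
		if (pair{n.Version, n.Digest}) != majority {
			drifted = append(drifted, n.NodeID)
		}
	}
	return drifted
}

func main() {
	nodes := []NodeBackend{
		{"n1", "v1.2.0", "sha256:0123456789abcdef"},
		{"n2", "v1.2.0", "sha256:0123456789abcdef"},
		{"n3", "v1.1.0", "sha256:fedcba9876543210"},
	}
	fmt.Println(driftedNodes(nodes), shortDigest(nodes[2].Digest)) // [n3] fedcba987654
}
```

A single node never reports drift in this sketch, which matches the removed description's "Empty in single-node mode".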
Author	SHA1	Message	Date
Ettore Di Giacinto	44e7d9806b	fix(distributed): stop queue loops on agent nodes + dead-letter cap pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct — they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without staying infinite. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.	2026-04-19 21:27:05 +00:00
Ettore Di Giacinto	7a9d89fa54	feat(ui): shared FilterBar across the System page tabs The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.	2026-04-19 08:46:22 +00:00
Ettore Di Giacinto	ee34a52c5d	feat(ui): NodeDistributionChip — shared per-node attribution component Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. This removes ~40 lines of ad-hoc logic.	2026-04-19 08:39:59 +00:00
Ettore Di Giacinto	92b9e22dc9	feat(ui): show cluster distribution of models in the System page When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle.	2026-04-19 08:37:45 +00:00
Ettore Di Giacinto	f0ab68e352	feat(distributed): durable backend fan-out + state reconciliation Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker that OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions. The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: gRPC-HealthChecks addresses that the DB thinks are loaded but that haven't been touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate, the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter.	2026-04-19 08:34:57 +00:00
Ettore Di Giacinto	9373de9f9b	feat(ui): polish the Nodes page so it reads like a product The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker.	2026-04-19 08:20:52 +00:00
Ettore Di Giacinto	1b3c951c85	feat(ui): surface backend upgrades in the System page The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and an "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. - Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this, the /api/backends/upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives (.upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost) all live in App.css so other pages can adopt them without duplicating styles.	2026-04-19 08:14:49 +00:00
Ettore Di Giacinto	1f43762655	fix(distributed): detect backend upgrades across worker nodes Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see why it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed.	2026-04-19 08:03:20 +00:00
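The following sketch illustrates the cluster drift policy. It assumes a hypothetical `NodeRef` / `UpgradeInfo` shape and a `checkBackend` helper; it is not the real `gallery.CheckUpgradesAgainst`. The commit does not say how outliers are chosen. Here, the most common (version, digest) tuple is treated as the baseline, with ties going to the tuple seen first:

```go
package main

import "fmt"

// NodeRef is one worker's view of an installed backend (hypothetical shape).
type NodeRef struct {
	NodeID, Version, Digest string
}

// UpgradeInfo is a hypothetical result; NodeDrift lists the outlier nodes.
type UpgradeInfo struct {
	Upgradeable bool
	NodeDrift   []string
}

type versionKey struct{ version, digest string }

// checkBackend flags drift whenever nodes disagree on (version, digest);
// otherwise it compares the single shared tuple against the gallery.
func checkBackend(refs []NodeRef, galleryVersion, galleryDigest string) UpgradeInfo {
	counts := map[versionKey]int{}
	var order []versionKey // first-seen order, for deterministic tie-breaking
	for _, r := range refs {
		k := versionKey{r.Version, r.Digest}
		if counts[k] == 0 {
			order = append(order, k)
		}
		counts[k]++
	}
	switch {
	case len(order) == 0:
		return UpgradeInfo{} // not installed on any node
	case len(order) > 1:
		// Drift: upgradeable regardless of what the gallery says.
		majority := order[0]
		for _, k := range order[1:] {
			if counts[k] > counts[majority] {
				majority = k
			}
		}
		var drift []string
		for _, r := range refs {
			if (versionKey{r.Version, r.Digest}) != majority {
				drift = append(drift, r.NodeID)
			}
		}
		return UpgradeInfo{Upgradeable: true, NodeDrift: drift}
	}
	// Unanimous cluster. An empty installed version counts as differing
	// whenever the gallery advertises one.
	k := order[0]
	versionDiffers := galleryVersion != "" && k.version != galleryVersion
	digestDiffers := galleryDigest != "" && k.digest != galleryDigest
	return UpgradeInfo{Upgradeable: versionDiffers || digestDiffers}
}

func main() {
	refs := []NodeRef{
		{"gpu-1", "v1.2", "sha256:aa"},
		{"gpu-2", "v1.2", "sha256:aa"},
		{"gpu-3", "v1.1", "sha256:bb"},
	}
	info := checkBackend(refs, "v1.2", "sha256:aa")
	fmt.Println(info.Upgradeable, info.NodeDrift) // true [gpu-3]
}
```

In this example, two nodes already match the gallery. The backend is still flagged because gpu-3 disagrees, so an upgrade-all realigns the cluster.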