fix(react-ui): unify backend-logs entry point for distributed mode

In distributed mode the local /api/backend-logs WebSocket has nothing
behind it (inference runs on workers), so the "View backend logs" link
in Traces (and the action in Manage, before it was hidden) dead-ended on
/app/backend-logs/<modelId>. Manage worked around it by hiding the
action; Traces still rendered the link.

Make /app/backend-logs/:modelId the single, mode-aware entry point. A
new BackendLogsRouter probes useDistributedMode and forks:

- standalone: existing local WebSocket view (BackendLogsDetail).
- distributed: DistributedBackendLogsResolver fans out to each node via
  nodesApi.getModels, filters by model_name, and routes:
  * 0 hits -> empty state with a link to the Nodes page.
  * 1 hit -> <Navigate replace> to
    /app/node-backend-logs/<nodeId>/<modelId>, preserving the ?from=
    deep-link timestamp.
  * N hits -> picker listing each hosting worker (node id, replica
    index, load state) so the operator can choose which worker's logs
    to view.

Bare modelId in the redirect target intentionally aggregates that node's
replicas via the worker's BackendLogStore, matching the existing
per-node link pattern in Nodes.jsx.

Revert the per-caller distributed checks now that routing is
centralised: drop the hidden:distributedMode guard on Manage's Backend
logs action, and remove the prop threading in Traces so the link is
unconditional. Any future view that wants to link to backend logs uses
the same URL and gets correct behaviour in both modes.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
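
The 0 / 1 / N routing rule from the commit message can be modelled roughly as below. This is an illustrative Python sketch of the decision only, not the React component: `Hit`, `resolve_backend_logs_target`, and the returned dictionary shape are hypothetical names introduced for the example, and the real query-string handling may differ.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, urlencode


@dataclass(frozen=True)
class Hit:
    """One worker hosting the model (hypothetical record shape)."""
    node_id: str
    replica_index: int
    state: str


def resolve_backend_logs_target(model_id: str, hits: list, from_ts: Optional[str] = None) -> dict:
    """Decide what the distributed resolver should render for a model."""
    if not hits:
        # 0 hits: empty state (the real UI also links to the Nodes page).
        return {"kind": "empty"}
    if len(hits) == 1:
        # 1 hit: redirect to the per-node view, keeping the ?from= timestamp.
        path = "/app/node-backend-logs/{}/{}".format(
            quote(hits[0].node_id, safe=""), quote(model_id, safe="")
        )
        if from_ts is not None:
            path += "?" + urlencode({"from": from_ts})
        return {"kind": "redirect", "to": path}
    # N hits: let the operator choose which worker's logs to open.
    choices = [(h.node_id, h.replica_index, h.state) for h in hits]
    return {"kind": "picker", "choices": choices}
```

Keeping the decision in one function mirrors the point of the change: callers link to a single URL and the mode-specific branching happens in one place.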
fix(traces): cap captured body size to keep admin Traces UI responsive (#9946)
2026-05-22 15:50:31 -04:00 · 2026-05-22 19:36:45 +00:00 · 2026-05-22 15:29:24 +02:00 · 2026-05-22 10:13:41 +02:00 · 2026-05-22 09:49:33 +02:00 · 2026-05-22 08:31:49 +02:00
125 changed files with 5872 additions and 771 deletions
--- a/.agents/adding-backends.md
+++ b/.agents/adding-backends.md
@@ -112,6 +112,8 @@ Add a YAML anchor definition in the `## metas` section (around line 2-300). Look

 Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.

+**Note on integrity:** OCI backends installed from a gallery whose `verification:` block is set are verified against a keyless-cosign policy before extraction; tarball/HTTP backends use the optional `sha256:` field. New backends do not need any extra YAML — the gallery-level `verification:` block covers every entry. See [.agents/backend-signing.md](backend-signing.md) for the producer-side CI step.
+
 ## 4. Update the Makefile

 The Makefile needs to be updated in several places to support building and testing the new backend:
--- a/.agents/backend-signing.md
+++ b/.agents/backend-signing.md
@@ -0,0 +1,120 @@
+# Backend image signing & verification
+
+LocalAI verifies backend OCI images against a per-gallery keyless-cosign
+policy. This page documents the trust model, the producer side
+(`.github/workflows/backend_merge.yml` in this repo), and the consumer
+side (`pkg/oci/cosignverify` plus the gallery YAML).
+
+## Trust model
+
+- **Producer:** `.github/workflows/backend_merge.yml` signs each pushed
+  manifest list with `cosign sign --recursive` in keyless mode after
+  `docker buildx imagetools create`. The signing cert is issued by
+  Fulcio bound to the workflow's OIDC identity. There is no long-lived
+  signing key. `--recursive` signs both the manifest list and every
+  per-arch entry — needed because our consumer resolves a tag to a
+  per-arch manifest before checking signatures.
+- **Storage:** Signatures are written as OCI 1.1 referrers
+  (`--registry-referrers-mode=oci-1-1`) in the new Sigstore bundle format
+  (`--new-bundle-format`). No `:sha256-<hex>.sig` tag clutter.
+- **Consumer:** `pkg/oci/cosignverify` discovers the bundle via the
+  referrers API, hands it to `sigstore-go`, and verifies it against the
+  policy declared in the gallery YAML (`Gallery.Verification`).
+- **Revocation:** Keyless cosign certs are ephemeral (10-minute Fulcio
+  validity), so revocation is policy-side, not CA-side. The gallery's
+  `verification.not_before` (RFC3339) is the kill-switch — advance it to
+  invalidate every signature produced before a known compromise window.
+
+## Producer setup
+
+`backend_merge.yml` is the workflow that joins per-arch digests into the
+multi-arch manifest list users actually pull, so it's also the right place
+to sign. The job needs:
+
+- `permissions: { id-token: write, contents: read }` at the job level so
+  the runner can exchange its GitHub OIDC token for a Fulcio cert.
+- `sigstore/cosign-installer@v3` step (cosign ≥ 2.2 for
+  `--new-bundle-format`).
+- After each `docker buildx imagetools create`, resolve the resulting
+  list digest with `docker buildx imagetools inspect <tag> --format
+  '{{.Manifest.Digest}}'` and sign:
+
+```sh
+cosign sign --yes --recursive \
+  --new-bundle-format \
+  --registry-referrers-mode=oci-1-1 \
+  "${REGISTRY_REPO}@${DIGEST}"
+```
+
+Sign by digest, never by tag — signing by tag binds the signature to
+whatever the tag points at *now*, and a subsequent tag push orphans it.
+
+`backend_build_darwin.yml` builds and pushes single-arch darwin images
+that bypass the manifest-list merge. If/when those entries get a gallery
+`verification:` policy, the equivalent cosign step has to land there
+too.
+
+## Consumer setup (in `mudler/LocalAI` gallery YAML)
+
+Once CI is signing, add a `verification:` block to the backend gallery
+entry (`backend/index.yaml`):
+
+```yaml
+- name: localai
+  url: github:mudler/LocalAI/backend/index.yaml@master
+  verification:
+    issuer: "https://token.actions.githubusercontent.com"
+    identity_regex: "^https://github\\.com/mudler/LocalAI/\\.github/workflows/backend_merge\\.yml@refs/heads/master$"
+    # Optional revocation cutoff; advance during incident response.
+    # not_before: "2026-06-01T00:00:00Z"
+```
+
+Identity matching pins the OIDC subject Fulcio issued the signing cert
+to. Without this, any image signed by *anyone* with a Fulcio cert would
+pass — the regex is what makes a signature mean "produced by our CI".
+
+## Strict mode
+
+Default behaviour: OCI backends without a `verification:` block install
+with a warning (logs include `installing OCI backend without signature
+verification`). Tarball/HTTP backends without a `sha256` field log a
+similar warning.
+
+For production, set `LOCALAI_REQUIRE_BACKEND_INTEGRITY=1` (or pass
+`--require-backend-integrity` to `local-ai run` / `local-ai backends
+install` / `local-ai models install`). The warning becomes a hard error
+and unverifiable backends refuse to install.
+
+## Revocation playbook
+
+If `backend_merge.yml` (or any workflow with `id-token: write`) is
+compromised and we've shipped malicious signed images:
+
+1. **Identify the compromise window.** Find the earliest IntegratedTime
+   from the bad signatures (Rekor search by `subject` filter).
+2. **Set `verification.not_before`** in `backend/index.yaml` to a
+   timestamp just *after* that window's start.
+3. **Push the YAML.** Deployed LocalAI instances pick it up on next
+   gallery refresh (1-hour cache in `core/gallery/gallery.go`).
+4. **Fix the underlying compromise** in the workflow and re-sign images
+   with the new build, which will have IntegratedTime > `not_before`.
+5. **Optional:** for absolute decisiveness, also rotate to a new
+   workflow path (`backend_merge_v2.yml`) and update `identity_regex`.
+
+## Where the code lives
+
+- `pkg/oci/cosignverify/` — verifier, policy, OCI referrer fetch, NotBefore enforcement.
+- `pkg/downloader/uri.go` — `WithImageVerifier` option threaded through `DownloadFileWithContext`.
+- `core/gallery/backends.go` — `backendDownloadOptions` builds the verifier from the gallery's policy.
+- `core/config/gallery.go` — `Gallery.Verification` YAML schema.
+- `core/cli/run.go`, `core/cli/backends.go`, `core/cli/models.go` — `--require-backend-integrity` flag propagation.
+- `.github/workflows/backend_merge.yml` — producer-side `cosign sign --recursive` after each multi-arch manifest list push.
+
+## Out of scope (follow-ups)
+
+- **Signing the gallery YAML itself.** The index is fetched over HTTPS
+  from GitHub; we trust the host. A cosign blob signature on the YAML
+  would close that gap but adds key-management overhead. Revisit this
+  page if/when added.
+- **Tarball/HTTP backend signing.** Cosign can sign arbitrary blobs, but
+  for now non-OCI backends keep using the `sha256:` field in YAML.
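
The policy fields documented in `backend-signing.md` (issuer, `identity_regex`, optional `not_before`) combine roughly as in the following sketch. It is a simplified Python model of the acceptance rule, not the `pkg/oci/cosignverify` implementation: `policy_accepts` and `parse_rfc3339` are hypothetical helpers, cryptographic verification of the bundle is out of scope here, and treating a signature stamped exactly at `not_before` as accepted is an assumption.

```python
import re
from datetime import datetime, timezone

ISSUER = "https://token.actions.githubusercontent.com"
# Same pattern as the gallery YAML, with the YAML-level escaping removed.
IDENTITY_REGEX = (
    r"^https://github\.com/mudler/LocalAI/\.github/workflows/"
    r"backend_merge\.yml@refs/heads/master$"
)


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" on newer Pythons,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def policy_accepts(issuer, identity, integrated_time, not_before=None) -> bool:
    """Return True if a (already cryptographically valid) signature meets the policy."""
    if issuer != ISSUER:
        return False
    if re.search(IDENTITY_REGEX, identity) is None:
        return False  # signed by some other workflow, branch, or repository
    if not_before is not None and integrated_time < parse_rfc3339(not_before):
        return False  # revoked: produced before the cutoff
    return True
```

The anchors (`^` and `$`) in the identity pattern matter: without them a workflow path that merely contains the expected string would also match.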
--- a/.github/workflows/backend_merge.yml
+++ b/.github/workflows/backend_merge.yml
@@ -31,6 +31,13 @@ on:
 jobs:
  merge:
    runs-on: ubuntu-latest
+    # id-token: write is required for keyless cosign — the workflow
+    # exchanges the GitHub OIDC token for a short-lived Fulcio cert that
+    # signs each pushed manifest. Without this permission the runner
+    # cannot mint the token, and `cosign sign` fails with "no token".
+    permissions:
+      contents: read
+      id-token: write
    env:
      quay_username: ${{ secrets.quayUsername }}
    steps:
@@ -57,6 +64,15 @@ jobs:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@master

+      # cosign signs each pushed manifest list with --recursive so the
+      # index and every per-arch entry get an attached Sigstore bundle.
+      # 2.2+ is required for --new-bundle-format.
+      - name: Install cosign
+        if: github.event_name != 'pull_request'
+        uses: sigstore/cosign-installer@v3
+        with:
+          cosign-release: 'v2.4.1'
+
      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v4
@@ -120,11 +136,26 @@ jobs:
          ' <<< "$DOCKER_METADATA_OUTPUT_JSON")
          if [ -z "$tags" ]; then
            echo "No quay.io tags from docker/metadata-action; skipping quay merge"
-          else
-            # shellcheck disable=SC2086
-            docker buildx imagetools create $tags \
-              $(printf 'quay.io/go-skynet/ci-cache@sha256:%s ' *)
+            exit 0
          fi
+          # shellcheck disable=SC2086
+          docker buildx imagetools create $tags \
+            $(printf 'quay.io/go-skynet/ci-cache@sha256:%s ' *)
+          # Resolve the manifest-list digest (any tag points at it) so
+          # cosign can sign by digest. Signing by tag would leave the
+          # signature orphaned the next time the tag moves.
+          first_tag=$(jq -cr '
+            .tags | map(select(startswith("quay.io/"))) | .[0]
+          ' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+          digest=$(docker buildx imagetools inspect "$first_tag" --format '{{.Manifest.Digest}}')
+          # --recursive walks the list and signs every per-arch entry
+          # too — clients that resolve a tag to a platform-specific
+          # manifest before checking signatures need the per-arch
+          # signatures, not just the list-level one.
+          cosign sign --yes --recursive \
+            --new-bundle-format \
+            --registry-referrers-mode=oci-1-1 \
+            "quay.io/go-skynet/local-ai-backends@${digest}"

      - name: Create manifest list and push (dockerhub)
        if: github.event_name != 'pull_request'
@@ -139,11 +170,19 @@ jobs:
          ' <<< "$DOCKER_METADATA_OUTPUT_JSON")
          if [ -z "$tags" ]; then
            echo "No dockerhub tags from docker/metadata-action; skipping dockerhub merge"
-          else
-            # shellcheck disable=SC2086
-            docker buildx imagetools create $tags \
-              $(printf 'localai/localai-backends@sha256:%s ' *)
+            exit 0
          fi
+          # shellcheck disable=SC2086
+          docker buildx imagetools create $tags \
+            $(printf 'localai/localai-backends@sha256:%s ' *)
+          first_tag=$(jq -cr '
+            .tags | map(select(startswith("localai/"))) | .[0]
+          ' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+          digest=$(docker buildx imagetools inspect "$first_tag" --format '{{.Manifest.Digest}}')
+          cosign sign --yes --recursive \
+            --new-bundle-format \
+            --registry-referrers-mode=oci-1-1 \
+            "localai/localai-backends@${digest}"

      - name: Inspect manifest
        if: github.event_name != 'pull_request'
--- a/.github/workflows/image_build.yml
+++ b/.github/workflows/image_build.yml
@@ -106,6 +106,7 @@ jobs:
            type=ref,event=branch
            type=semver,pattern={{raw}}
            type=sha
+            type=raw,value={{branch}}-{{date 'X'}}-{{sha}},enable={{is_default_branch}}
          flavor: |
            latest=${{ inputs.tag-latest }}
            suffix=${{ inputs.tag-suffix }},onlatest=true
--- a/.github/workflows/image_merge.yml
+++ b/.github/workflows/image_merge.yml
@@ -80,6 +80,7 @@ jobs:
            type=ref,event=branch
            type=semver,pattern={{raw}}
            type=sha
+            type=raw,value={{branch}}-{{date 'X'}}-{{sha}},enable={{is_default_branch}}
          flavor: |
            latest=${{ inputs.tag-latest }}
            suffix=${{ inputs.tag-suffix }},onlatest=true
--- a/.gitignore
+++ b/.gitignore
@@ -77,3 +77,6 @@ local-backends/
 tests/e2e-ui/ui-test-server
 core/http/react-ui/playwright-report/
 core/http/react-ui/test-results/
+
+# Local worktrees
+.worktrees/
--- a/.golangci.yml
+++ b/.golangci.yml
@@ -46,8 +46,52 @@ linters:
          msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.Fail. See .agents/coding-style.md.'
        - pattern: '^t\.FailNow$'
          msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.FailNow. See .agents/coding-style.md.'
+        # In-process config should flow through ApplicationConfig / kong-bound
+        # CLI flags, not via os.Getenv. The CLI layer is the legitimate
+        # env→struct boundary (kong's `env:"..."` tag); anything deeper that
+        # reads env directly leaks process state into business logic and
+        # makes flags impossible to test or override per-request. Backend
+        # subprocesses, the system/capabilities probe, and a few places that
+        # read non-LocalAI env vars (HOME, PATH, AUTH_TOKEN passed by parent)
+        # are exempt — see linters.exclusions.rules below.
+        - pattern: '^os\.(Getenv|LookupEnv|Environ)$'
+          msg: 'Plumb config through ApplicationConfig (or the relevant CLI struct) instead of reading env directly. CLI entry points (core/cli/) bind env vars via kong''s `env:` tag — that is the only sanctioned env→struct boundary. See .agents/coding-style.md.'
  exclusions:
    paths:
      # Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
      - 'backend/go/whisper/sources'
      - 'docs/'
+    rules:
+      # CLI entry points: kong's `env:"..."` tag is the legitimate env→struct
+      # boundary, and a handful of subcommands legitimately propagate values
+      # to spawned subprocesses (LLAMACPP_GRPC_SERVERS, MLX hostfile, ...).
+      - path: ^core/cli/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      # Backend subprocesses are independent binaries with their own env
+      # surface; they're not "in-process config" of the LocalAI server.
+      - path: ^backend/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      # System capability probe reads HOME, PATH-style vars to discover
+      # GPUs, default paths, etc. — not LocalAI config.
+      - path: ^pkg/system/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      # gRPC server reads AUTH_TOKEN passed in by the parent process at spawn
+      # time; model.Loader sets/inherits env to communicate with subprocesses.
+      - path: ^pkg/grpc/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      - path: ^pkg/model/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      # Top-level main binaries (local-ai, launcher) are entry points.
+      - path: ^cmd/
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
+      # Tests legitimately read $HOME, $TMPDIR, and gating env vars
+      # (LOCALAI_COSIGN_LIVE, etc.) to skip live-network specs.
+      - path: _test\.go$
+        text: 'os\.(Getenv|LookupEnv|Environ)'
+        linters: [forbidigo]
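
The combined effect of the new `forbidigo` pattern and its path exclusions can be approximated as follows. This is a deliberately simplified Python model with a hypothetical `is_flagged` helper: golangci-lint matches an exclusion rule's `text` against the reported issue message rather than against the call name, so the sketch only illustrates which paths end up exempt.

```python
import re

# The forbidden identifiers, as written in the forbidigo pattern.
FORBIDDEN = re.compile(r"^os\.(Getenv|LookupEnv|Environ)$")

# Path regexes taken from the exclusion rules in the diff.
EXEMPT_PATHS = [
    r"^core/cli/",
    r"^backend/",
    r"^pkg/system/",
    r"^pkg/grpc/",
    r"^pkg/model/",
    r"^cmd/",
    r"_test\.go$",
]


def is_flagged(path: str, call: str) -> bool:
    """Would a call such as 'os.Getenv' in `path` be reported under this model?"""
    if FORBIDDEN.search(call) is None:
        return False
    return not any(re.search(pattern, path) for pattern in EXEMPT_PATHS)
```

For instance, under this model `is_flagged("core/gallery/backends.go", "os.Getenv")` is true, while the same call in `core/cli/run.go` or in any `_test.go` file is not reported.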
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -31,6 +31,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
 | [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
 | [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
 | [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |
+| [.agents/backend-signing.md](.agents/backend-signing.md) | Backend OCI image signing (keyless cosign + sigstore-go) — producer-side CI setup, consumer-side gallery `verification:` block, strict mode (`LOCALAI_REQUIRE_BACKEND_INTEGRITY`), revocation via `not_before` |

 ## Quick Reference

--- a/backend/cpp/ds4/Makefile
+++ b/backend/cpp/ds4/Makefile
@@ -1,10 +1,10 @@
 # ds4 backend Makefile.
 #
-# Upstream pin lives below as DS4_VERSION?=ef0a4905d05263df8e63689f2dd1efac618a752c
+# Upstream pin lives below as DS4_VERSION?=8d576642c39b9a2d782a80159ba84ef5a81c0b81
 # (.github/bump_deps.sh) can find and update it - matches the
 # llama-cpp / ik-llama-cpp / turboquant convention.

-DS4_VERSION?=ef0a4905d05263df8e63689f2dd1efac618a752c
+DS4_VERSION?=8d576642c39b9a2d782a80159ba84ef5a81c0b81
 DS4_REPO?=https://github.com/antirez/ds4

 CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=3e573cfea6e0a332eff822ffbdb1dd3b112e9051
+IK_LLAMA_VERSION?=48a55f74e4c6e2aeda363dd386c1ac9170a0af71
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=0253fb21f595246f54c192fe8332f34173be251b
+LLAMA_VERSION?=bb28c1fe246b72276ee1d00ce89306be7b865766
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/grpc-server.cpp
+++ b/backend/cpp/llama-cpp/grpc-server.cpp
@@ -517,16 +517,27 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
    params.warmup = true;
    // no_op_offload: disable host tensor op offload (default: false)
    params.no_op_offload = false;
-    // kv_unified: enable unified KV cache (default: false)
-    params.kv_unified = false;
-    // n_ctx_checkpoints: max context checkpoints per slot (default: 8)
-    params.n_ctx_checkpoints = 8;
-
-    // llama memory fit fails if we don't provide a buffer for tensor overrides
-    const size_t ntbo = llama_max_tensor_buft_overrides();
-    while (params.tensor_buft_overrides.size() < ntbo) {
-        params.tensor_buft_overrides.push_back({nullptr, nullptr});
-    }
+    // kv_unified: enable unified KV cache. Upstream's server auto-enables this
+    // when the slot count is auto (-np <0), bumping n_parallel to 4 alongside.
+    // LocalAI keeps n_parallel=1 by default, which would skip that auto path
+    // and leave kv_unified=false. We flip the default to true here so the
+    // server-side prompt cache (cache_idle_slots) is actually usable on the
+    // single-slot path that LocalAI ships with: without it, idle slots are
+    // never persisted across requests and the prompt cache is dead weight.
+    // Users can opt out with `options: [ "kv_unified:false" ]`.
+    params.kv_unified = true;
+    // n_ctx_checkpoints: max context checkpoints per slot. Match upstream's
+    // default (32); the previous LocalAI-specific 8 was unnecessarily tight
+    // and limits partial-prefix recovery without a clear memory rationale.
+    params.n_ctx_checkpoints = 32;
+    // cache_idle_slots: save and clear idle slot KV to the prompt cache on
+    // task switch. Upstream default is true; the server auto-disables it if
+    // kv_unified=false or cache_ram_mib=0, so flipping kv_unified above is
+    // what actually unlocks it.
+    params.cache_idle_slots = true;
+    // checkpoint_every_nt: create a context checkpoint every N tokens during
+    // prefill (-1 disables). Match upstream's default (8192).
+    params.checkpoint_every_nt = 8192;

     // decode options. Options are in form optname:optvale, or if booleans only optname.
    for (int i = 0; i < request->options_size(); i++) {
@@ -685,7 +696,29 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                try {
                    params.n_ctx_checkpoints = std::stoi(optval_str);
                } catch (const std::exception& e) {
-                    // If conversion fails, keep default value (8)
+                    // If conversion fails, keep default value (32)
+                }
+            }
+
+        // --- server-side idle-slot prompt cache toggle (upstream --cache-idle-slots) ---
+        // Saves the slot's KV state into the host-side prompt cache on task
+        // switch so a later request with the same prefix can warm-load it.
+        // Auto-disabled by the server if kv_unified=false or cache_ram=0.
+        } else if (!strcmp(optname, "cache_idle_slots") || !strcmp(optname, "idle_slots_cache")) {
+            if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
+                params.cache_idle_slots = true;
+            } else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
+                params.cache_idle_slots = false;
+            }
+
+        // --- prefill checkpoint cadence (upstream -cpent / --checkpoint-every-n-tokens) ---
+        // -1 disables checkpointing during prefill.
+        } else if (!strcmp(optname, "checkpoint_every_nt") || !strcmp(optname, "checkpoint_every_n_tokens")) {
+            if (optval != NULL) {
+                try {
+                    params.checkpoint_every_nt = std::stoi(optval_str);
+                } catch (const std::exception& e) {
+                    // If conversion fails, keep default value (8192)
                }
            }

@@ -1081,6 +1114,20 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
        params.kv_overrides.back().key[0] = 0;
    }

+    // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
+    // Real entries are pushed during option parsing; here we pad/terminate so the
+    // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
+    // and so llama_params_fit has the placeholder slots it requires.
+    {
+        const size_t ntbo = llama_max_tensor_buft_overrides();
+        while (params.tensor_buft_overrides.size() < ntbo) {
+            params.tensor_buft_overrides.push_back({nullptr, nullptr});
+        }
+    }
+    if (!params.speculative.draft.tensor_buft_overrides.empty()) {
+        params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
+    }
+
    // TODO: Add yarn

    if (!request->tensorsplit().empty()) {
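
The new option handling in `params_parse` (defaults, aliases, and "keep the default on bad input") behaves roughly like the sketch below. It is a Python approximation with a hypothetical `parse_options` helper, not the C++ code: in particular, `std::stoi` accepts a leading integer such as `12abc`, whereas Python's `int()` rejects it, and the way the real code splits `optname:optval` is assumed here to be a split on the first colon.

```python
TRUE_WORDS = {"true", "1", "yes", "on", "enabled"}
FALSE_WORDS = {"false", "0", "no", "off", "disabled"}


def parse_options(options):
    """Apply 'optname:optval' strings on top of the defaults set in the diff."""
    params = {
        "n_ctx_checkpoints": 32,
        "cache_idle_slots": True,
        "checkpoint_every_nt": 8192,
    }
    for opt in options:
        name, _, value = opt.partition(":")
        if name in ("cache_idle_slots", "idle_slots_cache"):
            # Unrecognised spellings leave the current value untouched.
            if value in TRUE_WORDS:
                params["cache_idle_slots"] = True
            elif value in FALSE_WORDS:
                params["cache_idle_slots"] = False
        elif name in ("checkpoint_every_nt", "checkpoint_every_n_tokens"):
            try:
                params["checkpoint_every_nt"] = int(value)  # -1 disables
            except ValueError:
                pass  # keep the default (8192)
        elif name == "n_ctx_checkpoints":
            try:
                params["n_ctx_checkpoints"] = int(value)
            except ValueError:
                pass  # keep the default (32)
    return params
```

A model config passing `options: ["idle_slots_cache:off", "checkpoint_every_n_tokens:-1"]` would therefore disable both the idle-slot cache and prefill checkpoints while leaving `n_ctx_checkpoints` at 32.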
--- a/backend/go/acestep-cpp/Makefile
+++ b/backend/go/acestep-cpp/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # acestep.cpp version
 ACESTEP_REPO?=https://github.com/ace-step/acestep.cpp
-ACESTEP_CPP_VERSION?=e0c8d75a672fca5684c88c68dbf6d12f58754258
+ACESTEP_CPP_VERSION?=ed53caf164e4492a5620b2e3f2264629cf66da24
 SO_TARGET?=libgoacestepcpp.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/go/acestep-cpp/cpp/goacestepcpp.cpp
+++ b/backend/go/acestep-cpp/cpp/goacestepcpp.cpp
@@ -22,12 +22,11 @@
 #include <vector>

 // Global model contexts (loaded once, reused across requests)
-static DiTGGML       g_dit       = {};
-static DiTGGMLConfig g_dit_cfg;
-static VAEGGML       g_vae       = {};
-static bool          g_dit_loaded = false;
-static bool          g_vae_loaded = false;
-static bool          g_is_turbo   = false;
+static DiTGGML g_dit        = {};
+static VAEGGML g_vae        = {};
+static bool    g_dit_loaded = false;
+static bool    g_vae_loaded = false;
+static bool    g_is_turbo   = false;

 // Silence latent [15000, 64] — read once from DiT GGUF
 static std::vector<float> g_silence_full;
@@ -72,10 +71,9 @@ int load_model(const char * lm_model_path, const char * text_encoder_path,
    g_text_enc_path = text_encoder_path;
    g_dit_path      = dit_model_path;

-    // Load DiT model
+    // Load DiT model (backend init + config are handled inside dit_ggml_load)
    fprintf(stderr, "[acestep-cpp] Loading DiT from %s\n", dit_model_path);
-    dit_ggml_init_backend(&g_dit);
-    if (!dit_ggml_load(&g_dit, dit_model_path, g_dit_cfg, nullptr, 0.0f)) {
+    if (!dit_ggml_load(&g_dit, dit_model_path)) {
        fprintf(stderr, "[acestep-cpp] FATAL: failed to load DiT from %s\n", dit_model_path);
        return 1;
    }
@@ -149,16 +147,16 @@ int generate_music(const char * caption, const char * lyrics, int bpm,

    // Compute T (latent frames at 25Hz)
    int T = (int)(duration * FRAMES_PER_SECOND);
-    T     = ((T + g_dit_cfg.patch_size - 1) / g_dit_cfg.patch_size) * g_dit_cfg.patch_size;
-    int S = T / g_dit_cfg.patch_size;
+    T     = ((T + g_dit.cfg.patch_size - 1) / g_dit.cfg.patch_size) * g_dit.cfg.patch_size;
+    int S = T / g_dit.cfg.patch_size;

    if (T > 15000) {
        fprintf(stderr, "[acestep-cpp] ERROR: T=%d exceeds max 15000\n", T);
        return 2;
    }

-    int Oc     = g_dit_cfg.out_channels;      // 64
-    int ctx_ch = g_dit_cfg.in_channels - Oc;  // 128
+    int Oc     = g_dit.cfg.out_channels;      // 64
+    int ctx_ch = g_dit.cfg.in_channels - Oc;  // 128

    fprintf(stderr, "[acestep-cpp] T=%d, S=%d, duration=%.1fs, seed=%d\n", T, S, duration, seed);

@@ -191,9 +189,8 @@ int generate_music(const char * caption, const char * lyrics, int bpm,

    fprintf(stderr, "[acestep-cpp] caption: %d tokens, lyrics: %d tokens\n", S_text, S_lyric);

-    // 4. Text encoder forward
+    // 4. Text encoder forward (backend init handled inside qwen3_load_text_encoder)
    Qwen3GGML text_enc = {};
-    qwen3_init_backend(&text_enc);
    if (!qwen3_load_text_encoder(&text_enc, g_text_enc_path.c_str())) {
        fprintf(stderr, "[acestep-cpp] FATAL: failed to load text encoder\n");
        return 4;
@@ -209,9 +206,8 @@ int generate_music(const char * caption, const char * lyrics, int bpm,
    std::vector<float> lyric_embed(H_text * S_lyric);
    qwen3_embed_lookup(&text_enc, lyric_ids.data(), S_lyric, lyric_embed.data());

-    // 6. Condition encoder
+    // 6. Condition encoder (backend init handled inside cond_ggml_load)
    CondGGML cond = {};
-    cond_ggml_init_backend(&cond);
    if (!cond_ggml_load(&cond, g_dit_path.c_str())) {
        fprintf(stderr, "[acestep-cpp] FATAL: failed to load condition encoder\n");
        qwen3_free(&text_enc);
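
The frame-count arithmetic in `generate_music`, which now reads `patch_size` from `g_dit.cfg`, rounds the latent length up to a multiple of the patch size. A minimal Python sketch of that calculation follows; `latent_frames` is a hypothetical helper, the value 25 for `FRAMES_PER_SECOND` is inferred from the "25Hz" comment, and the patch size of 2 used in the example is an assumption, since the real value comes from the model config.

```python
FRAMES_PER_SECOND = 25  # assumed from the "latent frames at 25Hz" comment
MAX_FRAMES = 15000      # upper bound checked in the C++ code


def latent_frames(duration_s: float, patch_size: int):
    """Return (T, S): padded latent frame count and number of patches."""
    t = int(duration_s * FRAMES_PER_SECOND)
    # Integer ceiling to the next multiple of patch_size.
    t = ((t + patch_size - 1) // patch_size) * patch_size
    if t > MAX_FRAMES:
        raise ValueError("T=%d exceeds max %d" % (t, MAX_FRAMES))
    return t, t // patch_size
```

With a patch size of 2, a one-second clip gives 25 raw frames, padded to `T = 26` and `S = 13`, while a four-second clip is already aligned at `T = 100`, `S = 50`.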
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=bd17f53b7386fb5f60e8587b75e73c4b2fed3426
+STABLEDIFFUSION_GGML_VERSION?=3a8788cb7d74f185d6b18688e9563015524ecaf5

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/stablediffusion-ggml/cpp/gosd.cpp
+++ b/backend/go/stablediffusion-ggml/cpp/gosd.cpp
@@ -1188,6 +1188,9 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int
    p->high_noise_sample_params.scheduler                = scheduler;
    p->high_noise_sample_params.flow_shift               = flow_shift;

+    // Pin output fps in params; upstream uses it for audio sync (and we also mux at this rate).
+    p->fps = fps;
+
    // Load init/end reference images if provided (resized to output dims).
    uint8_t* init_buf = nullptr;
    uint8_t* end_buf  = nullptr;
@@ -1206,11 +1209,14 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int

    // Generate
    int num_frames_out = 0;
-    sd_image_t* frames = generate_video(sd_c, p, &num_frames_out);
+    sd_image_t* frames = nullptr;
+    sd_audio_t* audio = nullptr;
+    bool ok = generate_video(sd_c, p, &frames, &num_frames_out, &audio);
    std::free(p);

-    if (!frames || num_frames_out == 0) {
+    if (!ok || !frames || num_frames_out == 0) {
        fprintf(stderr, "generate_video produced no frames\n");
+        if (audio) free_sd_audio(audio);
        if (init_buf) free(init_buf);
        if (end_buf) free(end_buf);
        return 1;
@@ -1224,6 +1230,7 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int
        if (frames[i].data) free(frames[i].data);
    }
    free(frames);
+    if (audio) free_sd_audio(audio);
    if (init_buf) free(init_buf);
    if (end_buf) free(end_buf);

--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=968eebe77225d25e57a3f981da7c696310f0e881
+WHISPER_CPP_VERSION?=8443cf05e3fa8ce1b32348e1bcbcf8fc31f7f3ae
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -2,9 +2,9 @@ torch==2.7.1
 llvmlite==0.43.0
 numba==0.60.0
 accelerate
-transformers>=5.8.0
+transformers>=5.8.1
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -2,9 +2,9 @@ torch==2.7.1
 accelerate
 llvmlite==0.43.0
 numba==0.60.0
-transformers>=5.8.0
+transformers>=5.8.1
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-cublas13.txt
+++ b/backend/python/transformers/requirements-cublas13.txt
@@ -2,9 +2,9 @@
 torch==2.9.0
 llvmlite==0.43.0
 numba==0.60.0
-transformers>=5.8.0
+transformers>=5.8.1
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -1,11 +1,11 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
 torch==2.10.0+rocm7.0
 accelerate
-transformers>=5.8.0
+transformers>=5.8.1
 llvmlite==0.43.0
 numba==0.60.0
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -3,9 +3,9 @@ torch
 optimum[openvino]
 llvmlite==0.43.0
 numba==0.60.0
-transformers>=5.8.0
+transformers>=5.8.1
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-mps.txt
+++ b/backend/python/transformers/requirements-mps.txt
@@ -2,9 +2,9 @@ torch==2.7.1
 llvmlite==0.43.0
 numba==0.60.0
 accelerate
-transformers>=5.8.0
+transformers>=5.8.1
 bitsandbytes
-sentence-transformers==5.4.0
+sentence-transformers==5.5.0
 diffusers
 soundfile
 protobuf==6.33.5
--- a/core/application/startup.go
+++ b/core/application/startup.go
@@ -212,12 +212,12 @@ func New(opts ...config.AppOption) (*Application, error) {
 		}
 	}

-	if err := coreStartup.InstallModels(options.Context, application.GalleryService(), options.Galleries, options.BackendGalleries, options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, nil, options.ModelsURL...); err != nil {
+	if err := coreStartup.InstallModels(options.Context, application.GalleryService(), options.Galleries, options.BackendGalleries, options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.RequireBackendIntegrity, nil, options.ModelsURL...); err != nil {
 		xlog.Error("error installing models", "error", err)
 	}

 	for _, backend := range options.ExternalBackends {
-		if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", ""); err != nil {
+		if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", "", options.RequireBackendIntegrity); err != nil {
 			xlog.Error("error installing external backend", "error", err)
 		}
 	}
@@ -267,13 +267,13 @@ func New(opts ...config.AppOption) (*Application, error) {
 	}

 	if options.PreloadJSONModels != "" {
-		if err := galleryop.ApplyGalleryFromString(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadJSONModels); err != nil {
+		if err := galleryop.ApplyGalleryFromString(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadJSONModels, options.RequireBackendIntegrity); err != nil {
 			return nil, err
 		}
 	}

 	if options.PreloadModelsFromPath != "" {
-		if err := galleryop.ApplyGalleryFromFile(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadModelsFromPath); err != nil {
+		if err := galleryop.ApplyGalleryFromFile(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadModelsFromPath, options.RequireBackendIntegrity); err != nil {
 			return nil, err
 		}
 	}
@@ -552,6 +552,13 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
 			options.TracingMaxItems = *settings.TracingMaxItems
 		}
 	}
+	if settings.TracingMaxBodyBytes != nil {
+		// Allow the on-disk setting to override the CLI/env default. The
+		// startup default is non-zero (see NewApplicationConfig), so a plain
+		// `== 0` guard like the others would never trigger; we instead respect
+		// any value the file specifies. 0 in the file means "uncapped".
+		options.TracingMaxBodyBytes = *settings.TracingMaxBodyBytes
+	}

 	// Branding / whitelabeling. There are no env vars for these — the file is
 	// the only source — so apply unconditionally. Without this block a server
--- a/core/application/upgrade_checker.go
+++ b/core/application/upgrade_checker.go
@@ -217,7 +217,7 @@ func (uc *UpgradeChecker) runCheck(ctx context.Context) {
 				err = bm.UpgradeBackend(ctx, name, nil)
 			} else {
 				err = gallery.UpgradeBackend(ctx, uc.systemState, uc.modelLoader,
-					uc.galleries, name, nil)
+					uc.galleries, name, nil, uc.appConfig.RequireBackendIntegrity)
 			}
 			if err != nil {
 				xlog.Error("Failed to auto-upgrade backend",
--- a/core/backend/llm.go
+++ b/core/backend/llm.go
@@ -86,7 +86,7 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
 		if !slices.Contains(modelNames, modelName) {
 			utils.ResetDownloadTimers()
 			// if we failed to load the model, we try to download it
-			err := gallery.InstallModelFromGallery(ctx, o.Galleries, o.BackendGalleries, o.SystemState, loader, modelName, gallery.GalleryModel{}, utils.DisplayDownloadFunction, o.EnforcePredownloadScans, o.AutoloadBackendGalleries)
+			err := gallery.InstallModelFromGallery(ctx, o.Galleries, o.BackendGalleries, o.SystemState, loader, modelName, gallery.GalleryModel{}, utils.DisplayDownloadFunction, o.EnforcePredownloadScans, o.AutoloadBackendGalleries, o.RequireBackendIntegrity)
 			if err != nil {
 				xlog.Error("failed to install model from gallery", "error", err, "model", modelFile)
 				//return nil, err
--- a/core/cli/backends.go
+++ b/core/cli/backends.go
@@ -17,9 +17,10 @@ import (
 )

 type BackendsCMDFlags struct {
-	BackendGalleries   string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
-	BackendsPath       string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"storage"`
-	BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
+	BackendGalleries        string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
+	BackendsPath            string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"storage"`
+	BackendsSystemPath      string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
+	RequireBackendIntegrity bool   `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
 }

 type BackendsList struct {
@@ -126,7 +127,7 @@ func (bi *BackendsInstall) Run(ctx *cliContext.Context) error {
 	}

 	modelLoader := model.NewModelLoader(systemState)
-	err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias)
+	err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias, bi.RequireBackendIntegrity)
 	if err != nil {
 		return err
 	}
@@ -197,7 +198,7 @@ func (bu *BackendsUpgrade) Run(ctx *cliContext.Context) error {
 			}
 		}

-		if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback); err != nil {
+		if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback, bu.RequireBackendIntegrity); err != nil {
 			fmt.Printf("Failed to upgrade %s: %v\n", name, err)
 		} else {
 			fmt.Printf("Backend %s upgraded successfully\n", name)
--- a/core/cli/models.go
+++ b/core/cli/models.go
@@ -32,6 +32,7 @@ type ModelsList struct {

 type ModelsInstall struct {
 	DisablePredownloadScan   bool     `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"`
+	RequireBackendIntegrity  bool     `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
 	AutoloadBackendGalleries bool     `env:"LOCALAI_AUTOLOAD_BACKEND_GALLERIES" help:"If true, automatically loads backend galleries" group:"backends" default:"true"`
 	ModelArgs                []string `arg:"" optional:"" name:"models" help:"Model configuration URLs to load"`

@@ -71,7 +72,6 @@ func (ml *ModelsList) Run(ctx *cliContext.Context) error {
 }

 func (mi *ModelsInstall) Run(ctx *cliContext.Context) error {
-
 	systemState, err := system.GetSystemState(
 		system.WithModelPath(mi.ModelsPath),
 		system.WithBackendPath(mi.BackendsPath),
@@ -135,7 +135,7 @@ func (mi *ModelsInstall) Run(ctx *cliContext.Context) error {
 		}

 		modelLoader := model.NewModelLoader(systemState)
-		err = startup.InstallModels(context.Background(), galleryService, galleries, backendGalleries, systemState, modelLoader, !mi.DisablePredownloadScan, mi.AutoloadBackendGalleries, progressCallback, modelName)
+		err = startup.InstallModels(context.Background(), galleryService, galleries, backendGalleries, systemState, modelLoader, !mi.DisablePredownloadScan, mi.AutoloadBackendGalleries, mi.RequireBackendIntegrity, progressCallback, modelName)
 		if err != nil {
 			return err
 		}
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -67,6 +67,7 @@ type RunCMD struct {
 	OllamaAPIRootEndpoint              bool     `env:"LOCALAI_OLLAMA_API_ROOT_ENDPOINT" default:"false" help:"Register Ollama-compatible health check on / (replaces web UI on root path). The /api/* Ollama endpoints are always available regardless of this flag" group:"api"`
 	DisableRuntimeSettings             bool     `env:"LOCALAI_DISABLE_RUNTIME_SETTINGS,DISABLE_RUNTIME_SETTINGS" default:"false" help:"Disables the runtime settings. When set to true, the server will not load the runtime settings from the runtime_settings.json file" group:"api"`
 	DisablePredownloadScan             bool     `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"`
+	RequireBackendIntegrity            bool     `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, backend installs without a configured signature verification policy (for OCI URIs) or SHA256 (for tarball/HTTP URIs) are rejected. Default is to warn and install. Set this in production once your gallery's verification: block is populated." group:"hardening" default:"false"`
 	OpaqueErrors                       bool     `env:"LOCALAI_OPAQUE_ERRORS" default:"false" help:"If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended." group:"hardening"`
 	UseSubtleKeyComparison             bool     `env:"LOCALAI_SUBTLE_KEY_COMPARISON" default:"false" help:"If true, API Key validation comparisons will be performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resiliancy against timing attacks." group:"hardening"`
 	DisableApiKeyRequirementForHttpGet bool     `env:"LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET" default:"false" help:"If true, a valid API key is not required to issue GET requests to portions of the web ui. This should only be enabled in secure testing environments" group:"hardening"`
@@ -99,6 +100,7 @@ type RunCMD struct {
 	LoadToMemory                       []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
 	EnableTracing                      bool     `env:"LOCALAI_ENABLE_TRACING,ENABLE_TRACING" help:"Enable API tracing" group:"api"`
 	TracingMaxItems                    int      `env:"LOCALAI_TRACING_MAX_ITEMS" default:"1024" help:"Maximum number of traces to keep" group:"api"`
+	TracingMaxBodyBytes                int      `env:"LOCALAI_TRACING_MAX_BODY_BYTES" default:"65536" help:"Maximum bytes captured per request/response body in the trace buffer (0 = uncapped). Caps memory growth from chatty endpoints like /embeddings." group:"api"`
 	AgentJobRetentionDays              int      `env:"LOCALAI_AGENT_JOB_RETENTION_DAYS,AGENT_JOB_RETENTION_DAYS" default:"30" help:"Number of days to keep agent job history (default: 30)" group:"api"`
 	OpenResponsesStoreTTL              string   `env:"LOCALAI_OPEN_RESPONSES_STORE_TTL,OPEN_RESPONSES_STORE_TTL" default:"0" help:"TTL for Open Responses store (e.g., 1h, 30m, 0 = no expiration)" group:"api"`

@@ -272,6 +274,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 		opts = append(opts, config.EnableTracing)
 	}
 	opts = append(opts, config.WithTracingMaxItems(r.TracingMaxItems))
+	opts = append(opts, config.WithTracingMaxBodyBytes(r.TracingMaxBodyBytes))

 	token := ""
 	if r.Peer2Peer || r.Peer2PeerToken != "" {
@@ -503,6 +506,10 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 		opts = append(opts, config.WithAutoUpgradeBackends(r.AutoUpgradeBackends))
 	}

+	if r.RequireBackendIntegrity {
+		opts = append(opts, config.WithRequireBackendIntegrity(r.RequireBackendIntegrity))
+	}
+
 	if r.PreferDevelopmentBackends {
 		opts = append(opts, config.WithPreferDevelopmentBackends(r.PreferDevelopmentBackends))
 	}
--- a/core/cli/worker/worker.go
+++ b/core/cli/worker/worker.go
@@ -1,10 +1,11 @@
 package worker

 type WorkerFlags struct {
-	BackendsPath       string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"backends"`
-	BackendGalleries   string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
-	BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
-	ExtraLLamaCPPArgs  string `name:"llama-cpp-args" env:"LOCALAI_EXTRA_LLAMA_CPP_ARGS,EXTRA_LLAMA_CPP_ARGS" help:"Extra arguments to pass to llama-cpp-rpc-server"`
+	BackendsPath            string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"backends"`
+	BackendGalleries        string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
+	BackendsSystemPath      string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
+	RequireBackendIntegrity bool   `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
+	ExtraLLamaCPPArgs       string `name:"llama-cpp-args" env:"LOCALAI_EXTRA_LLAMA_CPP_ARGS,EXTRA_LLAMA_CPP_ARGS" help:"Extra arguments to pass to llama-cpp-rpc-server"`
 }

 type Worker struct {
--- a/core/cli/worker/worker_backend_common.go
+++ b/core/cli/worker/worker_backend_common.go
@@ -18,7 +18,7 @@ import (
 // installing the backend from the gallery if it isn't present.
 // `name` is the gallery entry name (for vLLM the meta entry "vllm"
 // resolves to a platform-specific package via capability lookup).
-func findBackendPath(name, galleries string, systemState *system.SystemState) (string, error) {
+func findBackendPath(name, galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
 	backends, err := gallery.ListSystemBackends(systemState)
 	if err != nil {
 		return "", err
@@ -33,7 +33,7 @@ func findBackendPath(name, galleries string, systemState *system.SystemState) (s
 		xlog.Error("failed loading galleries", "error", err)
 		return "", err
 	}
-	if err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, name, nil, true); err != nil {
+	if err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, name, nil, true, requireIntegrity); err != nil {
 		xlog.Error("backend not found, failed to install it", "name", name, "error", err)
 		return "", err
 	}
--- a/core/cli/worker/worker_llamacpp.go
+++ b/core/cli/worker/worker_llamacpp.go
@@ -27,7 +27,7 @@ const (
 	llamaCPPGalleryName   = "llama-cpp"
 )

-func findLLamaCPPBackend(galleries string, systemState *system.SystemState) (string, error) {
+func findLLamaCPPBackend(galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
 	backends, err := gallery.ListSystemBackends(systemState)
 	if err != nil {
 		xlog.Warn("Failed listing system backends", "error", err)
@@ -43,7 +43,7 @@ func findLLamaCPPBackend(galleries string, systemState *system.SystemState) (str
 			xlog.Error("failed loading galleries", "error", err)
 			return "", err
 		}
-		err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, llamaCPPGalleryName, nil, true)
+		err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, llamaCPPGalleryName, nil, true, requireIntegrity)
 		if err != nil {
 			xlog.Error("llama-cpp backend not found, failed to install it", "error", err)
 			return "", err
@@ -76,7 +76,7 @@ func (r *LLamaCPP) Run(ctx *cliContext.Context) error {
 	if err != nil {
 		return err
 	}
-	grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState)
+	grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
 	if err != nil {
 		return err
 	}
--- a/core/cli/worker/worker_mlx_common.go
+++ b/core/cli/worker/worker_mlx_common.go
@@ -9,8 +9,8 @@ import (

 const mlxDistributedGalleryName = "mlx-distributed"

-func findMLXDistributedBackendPath(galleries string, systemState *system.SystemState) (string, error) {
-	return findBackendPath(mlxDistributedGalleryName, galleries, systemState)
+func findMLXDistributedBackendPath(galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
+	return findBackendPath(mlxDistributedGalleryName, galleries, systemState, requireIntegrity)
 }

 // buildMLXCommand builds the exec.Cmd to launch the mlx-distributed backend.
--- a/core/cli/worker/worker_mlx_distributed.go
+++ b/core/cli/worker/worker_mlx_distributed.go
@@ -28,7 +28,7 @@ func (r *MLXDistributed) Run(ctx *cliContext.Context) error {
 		return err
 	}

-	backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState)
+	backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
 	if err != nil {
 		return fmt.Errorf("cannot find mlx-distributed backend: %w", err)
 	}
--- a/core/cli/worker/worker_p2p.go
+++ b/core/cli/worker/worker_p2p.go
@@ -73,7 +73,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
 			for {
 				xlog.Info("Starting llama-cpp-rpc-server", "address", address, "port", port)

-				grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState)
+				grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
 				if err != nil {
 					xlog.Error("Failed to find llama-cpp-rpc-server", "error", err)
 					return
--- a/core/cli/worker/worker_p2p_mlx.go
+++ b/core/cli/worker/worker_p2p_mlx.go
@@ -48,7 +48,7 @@ func (r *P2PMLX) Run(ctx *cliContext.Context) error {
 	c, cancel := context.WithCancel(context.Background())
 	defer cancel()

-	backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState)
+	backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
 	if err != nil {
 		xlog.Warn("Could not find mlx-distributed backend from gallery, will try backend.py directly", "error", err)
 	}
--- a/core/cli/worker/worker_vllm.go
+++ b/core/cli/worker/worker_vllm.go
@@ -77,7 +77,7 @@ func (r *VLLMDistributed) Run(ctx *cliContext.Context) error {
 		return fmt.Errorf("getting system state: %w", err)
 	}

-	backendPath, err := findBackendPath("vllm", r.BackendGalleries, systemState)
+	backendPath, err := findBackendPath("vllm", r.BackendGalleries, systemState, r.RequireBackendIntegrity)
 	if err != nil {
 		return fmt.Errorf("cannot find vllm backend: %w", err)
 	}
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@@ -21,6 +21,7 @@ type ApplicationConfig struct {
 	Debug                               bool
 	EnableTracing                       bool
 	TracingMaxItems                     int
+	TracingMaxBodyBytes                 int // Per-body cap for captured request/response bodies; 0 disables the cap
 	EnableBackendLogging                bool
 	GeneratedContentDir                 string

@@ -60,6 +61,13 @@ type ApplicationConfig struct {
 	AutoUpgradeBackends                         bool
 	PreferDevelopmentBackends                   bool

+	// RequireBackendIntegrity promotes a missing SHA256 (tarball/HTTP URIs)
+	// or missing verification policy (OCI URIs) from a warning to a hard
+	// failure during backend install/upgrade. Off by default to keep
+	// upgrades non-breaking; operators opt in explicitly via
+	// --require-backend-integrity / LOCALAI_REQUIRE_BACKEND_INTEGRITY.
+	RequireBackendIntegrity bool
+
 	SingleBackend           bool // Deprecated: use MaxActiveBackends = 1 instead
 	MaxActiveBackends       int  // Maximum number of active backends (0 = unlimited, 1 = single backend mode)
 	WatchDogIdle bool
@@ -180,6 +188,7 @@ func NewApplicationConfig(o ...AppOption) *ApplicationConfig {
 		LRUEvictionRetryInterval: 1 * time.Second,        // Default: 1 second
 		WatchDogInterval:         500 * time.Millisecond, // Default: 500ms
 		TracingMaxItems:          1024,
+		TracingMaxBodyBytes:      64 * 1024, // 64 KiB - caps each request/response body in the trace buffer
 		AgentPool: AgentPoolConfig{
 			Enabled:         true,
 			Timeout:         "5m",
@@ -436,6 +445,10 @@ func WithAutoUpgradeBackends(v bool) AppOption {
 	return func(o *ApplicationConfig) { o.AutoUpgradeBackends = v }
 }

+func WithRequireBackendIntegrity(v bool) AppOption {
+	return func(o *ApplicationConfig) { o.RequireBackendIntegrity = v }
+}
+
 func WithPreferDevelopmentBackends(v bool) AppOption {
 	return func(o *ApplicationConfig) { o.PreferDevelopmentBackends = v }
 }
@@ -567,6 +580,12 @@ func WithTracingMaxItems(items int) AppOption {
 	}
 }

+func WithTracingMaxBodyBytes(bytes int) AppOption {
+	return func(o *ApplicationConfig) {
+		o.TracingMaxBodyBytes = bytes
+	}
+}
+
 func WithGeneratedContentDir(generatedContentDir string) AppOption {
 	return func(o *ApplicationConfig) {
 		o.GeneratedContentDir = generatedContentDir
@@ -909,6 +928,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
 	f16 := o.F16
 	debug := o.Debug
 	tracingMaxItems := o.TracingMaxItems
+	tracingMaxBodyBytes := o.TracingMaxBodyBytes
 	enableTracing := o.EnableTracing
 	enableBackendLogging := o.EnableBackendLogging
 	cors := o.CORS
@@ -997,6 +1017,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
 		F16:                       &f16,
 		Debug:                     &debug,
 		TracingMaxItems:           &tracingMaxItems,
+		TracingMaxBodyBytes:       &tracingMaxBodyBytes,
 		EnableTracing:             &enableTracing,
 		EnableBackendLogging:      &enableBackendLogging,
 		CORS:                      &cors,
@@ -1135,6 +1156,9 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
 	if settings.TracingMaxItems != nil {
 		o.TracingMaxItems = *settings.TracingMaxItems
 	}
+	if settings.TracingMaxBodyBytes != nil {
+		o.TracingMaxBodyBytes = *settings.TracingMaxBodyBytes
+	}
 	if settings.EnableBackendLogging != nil {
 		o.EnableBackendLogging = *settings.EnableBackendLogging
 	}
--- a/core/config/gallery.go
+++ b/core/config/gallery.go
@@ -1,6 +1,37 @@
 package config

-type Gallery struct {
-	URL  string `json:"url" yaml:"url"`
-	Name string `json:"name" yaml:"name"`
+// GalleryVerification declares the keyless-cosign signature policy that
+// every OCI backend image fetched from this gallery must satisfy.
+//
+// Verification is opt-in: galleries without a Verification block install
+// backends with no signature check (the downloader logs a warning when
+// LOCALAI_REQUIRE_BACKEND_INTEGRITY is unset; that flag turns the warning
+// into a hard error).
+//
+// Identity matching: set Issuer (exact) or IssuerRegex, AND Identity
+// (exact) or IdentityRegex. For GitHub Actions keyless signing the
+// typical shape is:
+//
+//	verification:
+//	  issuer: "https://token.actions.githubusercontent.com"
+//	  identity_regex: "^https://github\\.com/mudler/local-ai-backends/\\.github/workflows/build\\.yaml@refs/heads/master$"
+//	  not_before: "2026-05-01T00:00:00Z"
+//
+// NotBefore is the revocation lever: advance it to invalidate every
+// signature produced before a known compromise window. Keyless cosign
+// certs are ephemeral so there is no CA-side revocation.
+type GalleryVerification struct {
+	Issuer        string `json:"issuer,omitempty" yaml:"issuer,omitempty"`
+	IssuerRegex   string `json:"issuer_regex,omitempty" yaml:"issuer_regex,omitempty"`
+	Identity      string `json:"identity,omitempty" yaml:"identity,omitempty"`
+	IdentityRegex string `json:"identity_regex,omitempty" yaml:"identity_regex,omitempty"`
+
+	// NotBefore is an RFC3339 timestamp. Empty disables the time check.
+	NotBefore string `json:"not_before,omitempty" yaml:"not_before,omitempty"`
+}
+
+type Gallery struct {
+	URL          string               `json:"url" yaml:"url"`
+	Name         string               `json:"name" yaml:"name"`
+	Verification *GalleryVerification `json:"verification,omitempty" yaml:"verification,omitempty"`
 }
--- a/core/config/runtime_settings.go
+++ b/core/config/runtime_settings.go
@@ -38,6 +38,7 @@ type RuntimeSettings struct {
 	Debug                *bool `json:"debug,omitempty"`
 	EnableTracing        *bool `json:"enable_tracing,omitempty"`
 	TracingMaxItems      *int  `json:"tracing_max_items,omitempty"`
+	TracingMaxBodyBytes  *int  `json:"tracing_max_body_bytes,omitempty"` // Per-body cap in bytes; 0 disables the cap
 	EnableBackendLogging *bool `json:"enable_backend_logging,omitempty"`

 	// Security/CORS settings
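A minimal sketch of the per-body cap semantics described for `TracingMaxBodyBytes` (a positive value limits each captured body, 0 disables the cap). `capBody` is a hypothetical helper, not the function used by the tracing code in this change, and treating negative values as "uncapped" is an assumption made here only to keep the slice expression safe.

```go
package main

import "fmt"

// capBody returns at most maxBytes bytes of body and reports whether
// anything was cut off. maxBytes == 0 means "uncapped"; negative values
// are also treated as uncapped (an assumption of this sketch).
func capBody(body []byte, maxBytes int) (captured []byte, truncated bool) {
	if maxBytes <= 0 || len(body) <= maxBytes {
		return body, false
	}
	return body[:maxBytes], true
}

func main() {
	body := []byte("0123456789")
	// 64 * 1024 matches the default set in NewApplicationConfig.
	for _, limit := range []int{0, 4, 64 * 1024} {
		captured, truncated := capBody(body, limit)
		fmt.Printf("limit=%d captured=%d truncated=%v\n", limit, len(captured), truncated)
	}
}
```

The returned slice shares memory with the input, so a real trace buffer that outlives the request would likely need to copy the capped bytes before storing them.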
--- a/core/gallery/backends.go
+++ b/core/gallery/backends.go
@@ -16,6 +16,7 @@ import (
 	"github.com/mudler/LocalAI/pkg/downloader"
 	"github.com/mudler/LocalAI/pkg/model"
 	"github.com/mudler/LocalAI/pkg/oci"
+	"github.com/mudler/LocalAI/pkg/oci/cosignverify"
 	"github.com/mudler/LocalAI/pkg/system"
 	"github.com/mudler/xlog"
 	cp "github.com/otiai10/copy"
@@ -102,8 +103,81 @@ func writeBackendMetadata(backendPath string, metadata *BackendMetadata) error {
 	return nil
 }

+// backendDownloadOptions translates the gallery's verification policy into
+// downloader options, and gates the call on strict-integrity mode. Both
+// InstallBackend and UpgradeBackend MUST route their download through these
+// options — without them, the corresponding code path silently downloads
+// and activates unverified backend bytes even when the gallery has a
+// verification: policy configured.
+//
+// For OCI URIs with a verification policy, returns a slice containing
+// downloader.WithImageVerifier(v) — the downloader will then run cosign
+// signature verification between fetching the manifest and extracting
+// layers (see pkg/downloader/uri.go OCI branch).
+//
+// For OCI URIs without a verification policy, or non-OCI URIs without a
+// SHA256, the function either lets the install proceed with a warning
+// (requireIntegrity false) or returns an error (requireIntegrity true).
+func backendDownloadOptions(config *GalleryBackend, requireIntegrity bool) ([]downloader.DownloadOption, error) {
+	uri := downloader.URI(config.URI)
+	hasVerification := config.Gallery.Verification != nil
+	hasSHA := config.SHA256 != ""
+
+	switch {
+	case uri.LooksLikeOCI():
+		if !hasVerification {
+			if requireIntegrity {
+				return nil, fmt.Errorf("strict integrity: gallery %q has no verification policy for OCI backend %q (set verification: in the gallery YAML or disable --require-backend-integrity)",
+					config.Gallery.Name, config.Name)
+			}
+			xlog.Warn("installing OCI backend without signature verification",
+				"backend", config.Name, "gallery", config.Gallery.Name, "uri", config.URI)
+			return nil, nil
+		}
+		v, err := newGalleryVerifier(config.Gallery.Verification)
+		if err != nil {
+			return nil, fmt.Errorf("gallery %q verification policy: %w", config.Gallery.Name, err)
+		}
+		return []downloader.DownloadOption{downloader.WithImageVerifier(v)}, nil
+
+	case uri.LooksLikeDir():
+		// Local directory — out of scope for integrity checks.
+		return nil, nil
+
+	default:
+		if !hasSHA && requireIntegrity {
+			return nil, fmt.Errorf("strict integrity: backend %q has no SHA256 (gallery %q)",
+				config.Name, config.Gallery.Name)
+		}
+		// Non-strict: pkg/downloader already emits a warning when sha is empty.
+		return nil, nil
+	}
+}
+
+// newGalleryVerifier constructs a cosignverify.Verifier from the gallery
+// policy. Parses NotBefore (RFC3339) here so YAML errors surface at install
+// time rather than during signature verification.
+func newGalleryVerifier(p *config.GalleryVerification) (*cosignverify.Verifier, error) {
+	pol := cosignverify.Policy{
+		Issuer:        p.Issuer,
+		IssuerRegex:   p.IssuerRegex,
+		Identity:      p.Identity,
+		IdentityRegex: p.IdentityRegex,
+	}
+	if p.NotBefore != "" {
+		t, err := time.Parse(time.RFC3339, p.NotBefore)
+		if err != nil {
+			return nil, fmt.Errorf("not_before %q: %w", p.NotBefore, err)
+		}
+		pol.NotBefore = t
+	}
+	return cosignverify.NewVerifier(pol, nil, nil)
+}
+
 // InstallBackendFromGallery installs a backend from the gallery.
-func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, name string, downloadStatus func(string, string, string, float64), force bool) error {
+// requireIntegrity escalates a missing SHA256 / verification policy from a
+// warning to a hard failure (see backendDownloadOptions).
+func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, name string, downloadStatus func(string, string, string, float64), force, requireIntegrity bool) error {
 	if !force {
 		// check if we already have the backend installed
 		backends, err := ListSystemBackends(systemState)
@@ -149,7 +223,7 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
 		xlog.Debug("Installing backend from meta backend", "name", name, "bestBackend", bestBackend.Name)

 		// Then, let's install the best backend
-		if err := InstallBackend(ctx, systemState, modelLoader, bestBackend, downloadStatus); err != nil {
+		if err := InstallBackend(ctx, systemState, modelLoader, bestBackend, downloadStatus, requireIntegrity); err != nil {
 			return err
 		}

@@ -175,10 +249,10 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
 		return nil
 	}

-	return InstallBackend(ctx, systemState, modelLoader, backend, downloadStatus)
+	return InstallBackend(ctx, systemState, modelLoader, backend, downloadStatus, requireIntegrity)
 }

-func InstallBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, config *GalleryBackend, downloadStatus func(string, string, string, float64)) error {
+func InstallBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, config *GalleryBackend, downloadStatus func(string, string, string, float64), requireIntegrity bool) error {
 	// Get configurable fallback tag values from SystemState
 	latestTag, masterTag, devSuffix := getFallbackTagValues(systemState)

@@ -213,6 +287,14 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
 		return fmt.Errorf("failed to create base path: %v", err)
 	}

+	// Build the download options once and reuse for every retry path —
+	// mirrors and tag fallbacks must verify against the same gallery
+	// policy or we open a hole where a non-default URI bypasses the check.
+	downloadOpts, optsErr := backendDownloadOptions(config, requireIntegrity)
+	if optsErr != nil {
+		return fmt.Errorf("backend %q: %w", config.Name, optsErr)
+	}
+
 	uri := downloader.URI(config.URI)
 	// Check if it is a directory
 	if uri.LooksLikeDir() {
@@ -222,7 +304,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
 		}
 	} else {
 		xlog.Debug("Downloading backend", "uri", config.URI, "backendPath", backendPath)
-		if err := uri.DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err != nil {
+		if err := uri.DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err != nil {
 			xlog.Debug("Backend download failed, trying fallback", "backendPath", backendPath, "error", err)

 			// resetBackendPath cleans up partial state from a failed OCI extraction
@@ -243,7 +325,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
 				default:
 				}
 				resetBackendPath()
-				if err := downloader.URI(mirror).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
+				if err := downloader.URI(mirror).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
 					success = true
 					xlog.Debug("Downloaded backend from mirror", "uri", config.URI, "backendPath", backendPath)
 					break
@@ -256,7 +338,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
 				if fallbackURI != string(config.URI) {
 					resetBackendPath()
 					xlog.Info("Trying fallback URI", "original", config.URI, "fallback", fallbackURI)
-					if err := downloader.URI(fallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
+					if err := downloader.URI(fallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
 						xlog.Info("Downloaded backend using fallback URI", "uri", fallbackURI, "backendPath", backendPath)
 						success = true
 					} else {
@@ -265,7 +347,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
 							resetBackendPath()
 							devFallbackURI := fallbackURI + "-" + devSuffix
 							xlog.Info("Trying development fallback URI", "fallback", devFallbackURI)
-							if err := downloader.URI(devFallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
+							if err := downloader.URI(devFallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
 								xlog.Info("Downloaded backend using development fallback URI", "uri", devFallbackURI, "backendPath", backendPath)
 								success = true
 							} else {
--- a/core/gallery/backends_test.go
+++ b/core/gallery/backends_test.go
@@ -117,13 +117,13 @@ var _ = Describe("Gallery Backends", func() {

 	Describe("InstallBackendFromGallery", func() {
 		It("should return error when backend is not found", func() {
-			err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "non-existent", nil, true)
+			err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "non-existent", nil, true, false)
 			Expect(err).To(HaveOccurred())
 			Expect(err.Error()).To(ContainSubstring("no backend found with name \"non-existent\""))
 		})

 		It("should install backend from gallery", func() {
-			err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "test-backend", nil, true)
+			err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "test-backend", nil, true, false)
 			Expect(err).ToNot(HaveOccurred())
 			Expect(filepath.Join(tempDir, "test-backend", "run.sh")).To(BeARegularFile())
 		})
@@ -545,7 +545,7 @@ var _ = Describe("Gallery Backends", func() {
 				VRAM:      1000000000000,
 				Backend:   system.Backend{BackendsPath: tempDir},
 			}
-			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
+			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
 			Expect(err).NotTo(HaveOccurred())

 			metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -625,7 +625,7 @@ var _ = Describe("Gallery Backends", func() {
 				VRAM:      1000000000000,
 				Backend:   system.Backend{BackendsPath: tempDir},
 			}
-			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
+			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
 			Expect(err).NotTo(HaveOccurred())

 			metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -709,7 +709,7 @@ var _ = Describe("Gallery Backends", func() {
 				VRAM:      1000000000000,
 				Backend:   system.Backend{BackendsPath: tempDir},
 			}
-			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
+			err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
 			Expect(err).NotTo(HaveOccurred())

 			metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -808,7 +808,7 @@ var _ = Describe("Gallery Backends", func() {
 				system.WithBackendPath(newPath),
 			)
 			Expect(err).NotTo(HaveOccurred())
-			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
+			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
 			Expect(newPath).To(BeADirectory())
 			Expect(err).To(HaveOccurred()) // Will fail due to invalid URI, but path should be created
 		})
@@ -840,7 +840,7 @@ var _ = Describe("Gallery Backends", func() {
 				system.WithBackendPath(tempDir),
 			)
 			Expect(err).NotTo(HaveOccurred())
-			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
+			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
 			Expect(err).ToNot(HaveOccurred())
 			Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())
 			dat, err := os.ReadFile(filepath.Join(tempDir, "test-backend", "metadata.json"))
@@ -873,7 +873,7 @@ var _ = Describe("Gallery Backends", func() {

 			Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).ToNot(BeARegularFile())

-			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
+			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
 			Expect(err).ToNot(HaveOccurred())
 			Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())
 		})
@@ -894,7 +894,7 @@ var _ = Describe("Gallery Backends", func() {
 				system.WithBackendPath(tempDir),
 			)
 			Expect(err).NotTo(HaveOccurred())
-			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
+			err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
 			Expect(err).ToNot(HaveOccurred())
 			Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())

--- a/core/gallery/backends_version_test.go
+++ b/core/gallery/backends_version_test.go
@@ -47,7 +47,7 @@ var _ = Describe("Backend versioning", func() {
 		backend.URI = srcDir
 		backend.Version = "1.2.3"

-		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
+		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
 		Expect(err).NotTo(HaveOccurred())

 		// Read the metadata file and check version
@@ -74,7 +74,7 @@ var _ = Describe("Backend versioning", func() {
 		backend.URI = srcDir
 		backend.Version = "2.0.0"

-		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
+		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
 		Expect(err).NotTo(HaveOccurred())

 		metadataPath := filepath.Join(tempDir, "test-backend-uri", "metadata.json")
@@ -100,7 +100,7 @@ var _ = Describe("Backend versioning", func() {
 		backend.URI = srcDir
 		// Version intentionally left empty

-		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
+		err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
 		Expect(err).NotTo(HaveOccurred())

 		metadataPath := filepath.Join(tempDir, "test-backend-noversion", "metadata.json")
--- a/core/gallery/models.go
+++ b/core/gallery/models.go
@@ -77,7 +77,7 @@ func InstallModelFromGallery(
 	modelGalleries, backendGalleries []lconfig.Gallery,
 	systemState *system.SystemState,
 	modelLoader *model.ModelLoader,
-	name string, req GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool) error {
+	name string, req GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend, requireBackendIntegrity bool) error {

 	applyModel := func(model *GalleryModel) error {
 		name = strings.ReplaceAll(name, string(os.PathSeparator), "__")
@@ -137,7 +137,7 @@ func InstallModelFromGallery(
 		if automaticallyInstallBackend && installedModel.Backend != "" {
 			xlog.Debug("Installing backend", "backend", installedModel.Backend)

-			if err := InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false); err != nil {
+			if err := InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false, requireBackendIntegrity); err != nil {
 				return err
 			}
 		}
--- a/core/gallery/models_test.go
+++ b/core/gallery/models_test.go
@@ -89,7 +89,7 @@ var _ = Describe("Model test", func() {
 			Expect(models[0].URL).To(Equal(bertEmbeddingsURL))
 			Expect(models[0].Installed).To(BeFalse())

-			err = InstallModelFromGallery(context.TODO(), galleries, []config.Gallery{}, systemState, nil, "test@bert", GalleryModel{}, func(s1, s2, s3 string, f float64) {}, true, true)
+			err = InstallModelFromGallery(context.TODO(), galleries, []config.Gallery{}, systemState, nil, "test@bert", GalleryModel{}, func(s1, s2, s3 string, f float64) {}, true, true, false)
 			Expect(err).ToNot(HaveOccurred())

 			dat, err := os.ReadFile(filepath.Join(tempdir, "bert.yaml"))
--- a/core/gallery/upgrade.go
+++ b/core/gallery/upgrade.go
@@ -232,7 +232,7 @@ func summarizeNodeDrift(nodes []NodeBackendRef) (majority struct{ version, diges

 // UpgradeBackend upgrades a single backend to the latest gallery version using
 // an atomic swap with backup-based rollback on failure.
-func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, galleries []config.Gallery, backendName string, downloadStatus func(string, string, string, float64)) error {
+func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, galleries []config.Gallery, backendName string, downloadStatus func(string, string, string, float64), requireIntegrity bool) error {
 	// Look up the installed backend
 	installedBackends, err := ListSystemBackends(systemState)
 	if err != nil {
@@ -251,7 +251,7 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
 	// If this is a meta backend, recursively upgrade the concrete backend it points to
 	if installed.Metadata != nil && installed.Metadata.MetaBackendFor != "" {
 		xlog.Info("Meta backend detected, upgrading concrete backend", "meta", backendName, "concrete", installed.Metadata.MetaBackendFor)
-		return UpgradeBackend(ctx, systemState, modelLoader, galleries, installed.Metadata.MetaBackendFor, downloadStatus)
+		return UpgradeBackend(ctx, systemState, modelLoader, galleries, installed.Metadata.MetaBackendFor, downloadStatus, requireIntegrity)
 	}

 	// Find the gallery entry
@@ -265,6 +265,16 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
 		return fmt.Errorf("no gallery entry found for backend %q", backendName)
 	}

+	// Resolve integrity options (cosign verifier for OCI URIs, strict-mode
+	// gate for missing SHA256/policy) BEFORE writing anything to disk.
+	// Without this, the upgrade path would atomically swap in an
+	// unverified backend even when the gallery has a verification policy
+	// — see backendDownloadOptions in backends.go.
+	downloadOpts, err := backendDownloadOptions(galleryEntry, requireIntegrity)
+	if err != nil {
+		return fmt.Errorf("upgrade %q: %w", backendName, err)
+	}
+
 	backendPath := filepath.Join(systemState.Backend.BackendsPath, backendName)
 	tmpPath := backendPath + ".upgrade-tmp"
 	backupPath := backendPath + ".backup"
@@ -285,7 +295,7 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
 			return fmt.Errorf("failed to copy backend from directory: %w", err)
 		}
 	} else {
-		if err := uri.DownloadFileWithContext(ctx, tmpPath, "", 1, 1, downloadStatus); err != nil {
+		if err := uri.DownloadFileWithContext(ctx, tmpPath, galleryEntry.SHA256, 1, 1, downloadStatus, downloadOpts...); err != nil {
 			os.RemoveAll(tmpPath)
 			return fmt.Errorf("failed to download backend: %w", err)
 		}
--- a/core/gallery/upgrade_test.go
+++ b/core/gallery/upgrade_test.go
@@ -383,7 +383,7 @@ var _ = Describe("Upgrade Detection and Execution", func() {
 			})

 			ml := model.NewModelLoader(systemState)
-			err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil)
+			err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, false)
 			Expect(err).NotTo(HaveOccurred())

 			// Verify run.sh was updated
@@ -417,7 +417,7 @@ var _ = Describe("Upgrade Detection and Execution", func() {
 			})

 			ml := model.NewModelLoader(systemState)
-			err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil)
+			err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, false)
 			Expect(err).To(HaveOccurred())

 			// Verify v1 is still intact
@@ -432,5 +432,41 @@ var _ = Describe("Upgrade Detection and Execution", func() {
 			Expect(json.Unmarshal(metaData, &meta)).To(Succeed())
 			Expect(meta.Version).To(Equal("1.0.0"))
 		})
+
+		// Regression: an earlier version of UpgradeBackend wrote the
+		// downloaded bytes to disk without going through
+		// backendDownloadOptions, so the gallery's verification policy
+		// (and strict-integrity gate) didn't apply on upgrade. This test
+		// pins the upgrade path to the same integrity gate as installs:
+		// strict mode + an OCI URI without a verification: block must
+		// hard-fail *before* anything is downloaded or swapped in.
+		It("should refuse to upgrade an OCI backend that bypasses integrity in strict mode", func() {
+			installBackendWithVersion("my-backend", "1.0.0", "#!/bin/sh\necho v1")
+
+			// OCI URI, no Gallery.Verification → backendDownloadOptions
+			// returns a strict-integrity error before any network call.
+			writeGalleryYAML([]GalleryBackend{
+				{
+					Metadata: Metadata{
+						Name: "my-backend",
+					},
+					URI:     "oci://example.invalid/missing:never-fetched",
+					Version: "2.0.0",
+				},
+			})
+
+			ml := model.NewModelLoader(systemState)
+			err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, true)
+			Expect(err).To(HaveOccurred())
+			Expect(err.Error()).To(ContainSubstring("strict integrity"))
+
+			// The installed v1 must be untouched — the upgrade should
+			// have aborted before writing anything.
+			content, err := os.ReadFile(filepath.Join(backendsPath, "my-backend", "run.sh"))
+			Expect(err).NotTo(HaveOccurred())
+			Expect(string(content)).To(Equal("#!/bin/sh\necho v1"))
+			Expect(filepath.Join(backendsPath, "my-backend.upgrade-tmp")).NotTo(BeAnExistingFile())
+			Expect(filepath.Join(backendsPath, "my-backend.backup")).NotTo(BeAnExistingFile())
+		})
 	})
 })
--- a/core/http/app.go
+++ b/core/http/app.go
@@ -28,6 +28,7 @@ import (
 	"github.com/mudler/LocalAI/core/services/monitoring"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/quantization"
+	"github.com/mudler/LocalAI/pkg/signals"

 	"github.com/mudler/xlog"
 )
@@ -267,9 +268,12 @@ func API(application *application.Application) (*echo.Echo, error) {
 		e.Static("/generated-videos", videoPath)
 	}

-	// Initialize usage recording when auth DB is available
+	// Initialize usage recording when auth DB is available, and ensure the
+	// batcher drains its in-memory queue on graceful shutdown so the last
+	// few seconds of usage don't disappear when the process exits.
 	if application.AuthDB() != nil {
 		httpMiddleware.InitUsageRecorder(application.AuthDB())
+		signals.RegisterGracefulTerminationHandler(httpMiddleware.ShutdownUsageRecorder)
 	}

 	// Auth is applied to _all_ endpoints. Filtering out endpoints to bypass is
@@ -403,7 +407,7 @@ func API(application *application.Application) (*echo.Echo, error) {
 		}
 	}
 	routes.RegisterNodeSelfServiceRoutes(e, registry, distCfg.RegistrationToken, distCfg.AutoApproveNodes, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret)
-	routes.RegisterNodeAdminRoutes(e, registry, remoteUnloader, adminMiddleware, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret, application.ApplicationConfig().Distributed.RegistrationToken)
+	routes.RegisterNodeAdminRoutes(e, registry, remoteUnloader, application.GalleryService(), opcache, application.ApplicationConfig(), adminMiddleware, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret, application.ApplicationConfig().Distributed.RegistrationToken)

 	// Distributed SSE routes (job progress + agent events via NATS)
 	if d := application.Distributed(); d != nil {
--- a/core/http/auth/db.go
+++ b/core/http/auth/db.go
@@ -38,9 +38,15 @@ func InitDB(databaseURL string) (*gorm.DB, error) {
 	}

 	// Backfill: users created before the provider column existed have an empty
-	// provider — treat them as local accounts so the UI can identify them.
+	// provider - treat them as local accounts so the UI can identify them.
 	db.Exec("UPDATE users SET provider = ? WHERE provider = '' OR provider IS NULL", ProviderLocal)

+	// Backfill: usage_records rows written before the source column existed have
+	// an empty source. Classify them so the new per-source aggregators include them.
+	if err := BackfillUsageSource(db); err != nil {
+		return nil, fmt.Errorf("failed to backfill usage source: %w", err)
+	}
+
 	// Create composite index on users(provider, subject) for fast OAuth lookups
 	if err := db.Exec("CREATE INDEX IF NOT EXISTS idx_users_provider_subject ON users(provider, subject)").Error; err != nil {
 		// Ignore error on postgres if index already exists
--- a/core/http/auth/middleware.go
+++ b/core/http/auth/middleware.go
@@ -16,8 +16,10 @@ import (
 )

 const (
-	contextKeyUser = "auth_user"
-	contextKeyRole = "auth_role"
+	contextKeyUser   = "auth_user"
+	contextKeyRole   = "auth_role"
+	contextKeyAPIKey = "auth_apikey"
+	contextKeySource = "auth_source"
 )

 // Middleware returns an Echo middleware that handles authentication.
@@ -75,6 +77,7 @@ func Middleware(db *gorm.DB, appConfig *config.ApplicationConfig) echo.Middlewar
 					}
 					c.Set(contextKeyUser, syntheticUser)
 					c.Set(contextKeyRole, RoleAdmin)
+					c.Set(contextKeySource, UsageSourceLegacy)
 					authenticated = true
 				}
 			}
@@ -213,6 +216,20 @@ func GetUserRole(c echo.Context) string {
 	return role
 }

+// GetAPIKey returns the resolved API key from the echo context, or nil.
+// It is nil for session authentication (cookie or Bearer session token) and for legacy-env-key authentication.
+func GetAPIKey(c echo.Context) *UserAPIKey {
+	k, _ := c.Get(contextKeyAPIKey).(*UserAPIKey)
+	return k
+}
+
+// GetSource returns the request's authentication source: UsageSourceAPIKey,
+// UsageSourceWeb, UsageSourceLegacy, or empty if no authentication was performed.
+func GetSource(c echo.Context) string {
+	s, _ := c.Get(contextKeySource).(string)
+	return s
+}
+
 // RequireRouteFeature returns a global middleware that checks the user has access
 // to the feature required by the matched route. It uses the RouteFeatureRegistry
 // to look up the required feature for each route pattern + HTTP method.
@@ -421,47 +438,67 @@ func RequireQuota(db *gorm.DB) echo.MiddlewareFunc {
 }

 // tryAuthenticate attempts to authenticate the request using the database.
+//
+// On success it returns the user and, as a side effect, sets the following
+// values on the Echo context:
+//   - contextKeySource ("auth_source"): always set, one of UsageSourceWeb /
+//     UsageSourceAPIKey. UsageSourceLegacy is set elsewhere by the parent
+//     Middleware when a legacy env key matches.
+//   - contextKeyAPIKey ("auth_apikey"): set to the resolved *UserAPIKey for
+//     named-key branches (Bearer, x-api-key, xi-api-key, token cookie).
+//   - "_auth_session": session record, used by Middleware to drive cookie
+//     rotation. Only set on the session-cookie branch.
+//
+// contextKeyUser and contextKeyRole are populated by the parent Middleware
+// after this function returns.
 func tryAuthenticate(c echo.Context, db *gorm.DB, appConfig *config.ApplicationConfig) *User {
 	hmacSecret := appConfig.Auth.APIKeyHMACSecret

-	// a. Session cookie
+	// a. Session cookie -> web UI
 	if cookie, err := c.Cookie(sessionCookie); err == nil && cookie.Value != "" {
 		if user, session := ValidateSession(db, cookie.Value, hmacSecret); user != nil {
 			// Store session for rotation check in middleware
 			c.Set("_auth_session", session)
+			c.Set(contextKeySource, UsageSourceWeb)
 			return user
 		}
 	}

-	// b. Authorization: Bearer token
+	// b. Authorization: Bearer
 	authHeader := c.Request().Header.Get("Authorization")
 	if strings.HasPrefix(authHeader, "Bearer ") {
 		token := strings.TrimPrefix(authHeader, "Bearer ")

-		// Try as session ID first
+		// b1. Session token via Bearer -> still web UI
 		if user, _ := ValidateSession(db, token, hmacSecret); user != nil {
+			c.Set(contextKeySource, UsageSourceWeb)
 			return user
 		}

-		// Try as user API key
+		// b2. Named API key
 		if key, err := ValidateAPIKey(db, token, hmacSecret); err == nil {
+			c.Set(contextKeySource, UsageSourceAPIKey)
+			c.Set(contextKeyAPIKey, key)
 			return &key.User
 		}
 	}

-	// c. x-api-key / xi-api-key headers
+	// c. x-api-key / xi-api-key -> named API key
 	for _, header := range []string{"x-api-key", "xi-api-key"} {
-		if key := c.Request().Header.Get(header); key != "" {
-			if apiKey, err := ValidateAPIKey(db, key, hmacSecret); err == nil {
+		if k := c.Request().Header.Get(header); k != "" {
+			if apiKey, err := ValidateAPIKey(db, k, hmacSecret); err == nil {
+				c.Set(contextKeySource, UsageSourceAPIKey)
+				c.Set(contextKeyAPIKey, apiKey)
 				return &apiKey.User
 			}
 		}
 	}

-	// d. token cookie (legacy)
+	// d. token cookie -> named API key
 	if cookie, err := c.Cookie("token"); err == nil && cookie.Value != "" {
-		// Try as user API key
 		if key, err := ValidateAPIKey(db, cookie.Value, hmacSecret); err == nil {
+			c.Set(contextKeySource, UsageSourceAPIKey)
+			c.Set(contextKeyAPIKey, key)
 			return &key.User
 		}
 	}
--- a/core/http/auth/middleware_test.go
+++ b/core/http/auth/middleware_test.go
@@ -303,4 +303,122 @@ var _ = Describe("Auth Middleware", func() {
 			}
 		})
 	})
+
+	Describe("auth context plumbing for usage source", func() {
+		// probeApp builds a minimal echo app with the auth middleware and a single
+		// "/probe" route that captures the user, source, and apikey from context.
+		type probe struct {
+			user   *auth.User
+			source string
+			key    *auth.UserAPIKey
+		}
+		probeApp := func(db *gorm.DB, appConfig *config.ApplicationConfig, p *probe) *echo.Echo {
+			e := echo.New()
+			e.Use(auth.Middleware(db, appConfig))
+			e.GET("/probe", func(c echo.Context) error {
+				p.user = auth.GetUser(c)
+				p.source = auth.GetSource(c)
+				p.key = auth.GetAPIKey(c)
+				return c.NoContent(http.StatusOK)
+			})
+			return e
+		}
+
+		It("session cookie sets source=web, apikey=nil", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
+			token := createTestSession(db, user.ID)
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withSessionCookie(token))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.user).ToNot(BeNil())
+			Expect(p.user.ID).To(Equal(user.ID))
+			Expect(p.source).To(Equal(auth.UsageSourceWeb))
+			Expect(p.key).To(BeNil())
+		})
+
+		It("Bearer session token sets source=web, apikey=nil", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
+			token := createTestSession(db, user.ID)
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withBearerToken(token))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.user).ToNot(BeNil())
+			Expect(p.user.ID).To(Equal(user.ID))
+			Expect(p.source).To(Equal(auth.UsageSourceWeb))
+			Expect(p.key).To(BeNil())
+		})
+
+		It("Bearer API key sets source=apikey and exposes the resolved *UserAPIKey", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
+			plaintext, key, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
+			Expect(err).ToNot(HaveOccurred())
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withBearerToken(plaintext))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
+			Expect(p.key).ToNot(BeNil())
+			Expect(p.key.ID).To(Equal(key.ID))
+		})
+
+		It("x-api-key header sets source=apikey", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
+			plaintext, _, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
+			Expect(err).ToNot(HaveOccurred())
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withXApiKey(plaintext))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
+			Expect(p.key).ToNot(BeNil())
+		})
+
+		It("token cookie sets source=apikey", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
+			plaintext, _, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
+			Expect(err).ToNot(HaveOccurred())
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withTokenCookie(plaintext))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
+			Expect(p.key).ToNot(BeNil())
+		})
+
+		It("legacy env key sets source=legacy, apikey=nil", func() {
+			db := testDB()
+			appConfig := config.NewApplicationConfig()
+			appConfig.ApiKeys = []string{"legacy-secret"}
+
+			var p probe
+			app := probeApp(db, appConfig, &p)
+			rec := doRequest(app, http.MethodGet, "/probe", withBearerToken("legacy-secret"))
+
+			Expect(rec.Code).To(Equal(http.StatusOK))
+			Expect(p.source).To(Equal(auth.UsageSourceLegacy))
+			Expect(p.key).To(BeNil())
+		})
+	})
 })
--- a/core/http/auth/usage.go
+++ b/core/http/auth/usage.go
@@ -5,14 +5,31 @@ import (
 	"strings"
 	"time"

+	"github.com/mudler/xlog"
 	"gorm.io/gorm"
 )

+// Source classification for a UsageRecord.
+const (
+	UsageSourceAPIKey = "apikey" // request authenticated with a named UserAPIKey
+	UsageSourceWeb    = "web"    // request authenticated with a web UI session (session cookie or Bearer session token)
+	UsageSourceLegacy = "legacy" // request authenticated with an env-configured legacy key
+)
+
 // UsageRecord represents a single API request's token usage.
 type UsageRecord struct {
-	ID               uint   `gorm:"primaryKey;autoIncrement"`
-	UserID           string `gorm:"size:36;index:idx_usage_user_time"`
-	UserName         string `gorm:"size:255"`
+	ID       uint   `gorm:"primaryKey;autoIncrement"`
+	UserID   string `gorm:"size:36;index:idx_usage_user_time"`
+	UserName string `gorm:"size:255"`
+
+	// Source classifies how the request authenticated. One of UsageSource* constants.
+	// Empty for pre-feature rows until the InitDB backfill runs.
+	Source string `gorm:"size:16;index:idx_usage_source"`
+	// APIKeyID is the UserAPIKey.ID when Source == UsageSourceAPIKey. Nil otherwise.
+	APIKeyID *string `gorm:"size:36;index:idx_usage_apikey"`
+	// APIKeyName is a snapshot of UserAPIKey.Name at write time. Survives key deletion.
+	APIKeyName string `gorm:"size:255"`
+
 	Model            string `gorm:"size:255;index"`
 	Endpoint         string `gorm:"size:255"`
 	PromptTokens     int64
@@ -30,9 +47,12 @@ func RecordUsage(db *gorm.DB, record *UsageRecord) error {
 // UsageBucket is an aggregated time bucket for the dashboard.
 type UsageBucket struct {
 	Bucket           string `json:"bucket"`
-	Model            string `json:"model"`
+	Model            string `json:"model,omitempty"`
 	UserID           string `json:"user_id,omitempty"`
 	UserName         string `json:"user_name,omitempty"`
+	Source           string `json:"source,omitempty"`
+	APIKeyID         string `json:"api_key_id,omitempty"`
+	APIKeyName       string `json:"api_key_name,omitempty"`
 	PromptTokens     int64  `json:"prompt_tokens"`
 	CompletionTokens int64  `json:"completion_tokens"`
 	TotalTokens      int64  `json:"total_tokens"`
@@ -119,6 +139,28 @@ func GetUserUsage(db *gorm.DB, userID, period string) ([]UsageBucket, error) {
 	return buckets, nil
 }

+// BackfillUsageSource sets the Source column on pre-feature usage rows.
+// Idempotent: only touches rows where source is NULL or empty.
+//   - rows whose user_id == "legacy-api-key" -> UsageSourceLegacy
+//   - everything else                        -> UsageSourceWeb
+func BackfillUsageSource(db *gorm.DB) error {
+	// Legacy first (more specific predicate)
+	if err := db.Exec(
+		`UPDATE usage_records SET source = ? WHERE (source IS NULL OR source = '') AND user_id = ?`,
+		UsageSourceLegacy, "legacy-api-key",
+	).Error; err != nil {
+		return fmt.Errorf("backfill legacy usage source: %w", err)
+	}
+	// Everything else -> web
+	if err := db.Exec(
+		`UPDATE usage_records SET source = ? WHERE (source IS NULL OR source = '')`,
+		UsageSourceWeb,
+	).Error; err != nil {
+		return fmt.Errorf("backfill web usage source: %w", err)
+	}
+	return nil
+}
+
 // GetAllUsage returns aggregated usage for all users (admin). Optional userID filter.
 func GetAllUsage(db *gorm.DB, period, userID string) ([]UsageBucket, error) {
 	sqlite := isSQLiteDB(db)
@@ -149,3 +191,257 @@ func GetAllUsage(db *gorm.DB, period, userID string) ([]UsageBucket, error) {
 	}
 	return buckets, nil
 }
+
+// TotalsEntry is a token+request roll-up.
+type TotalsEntry struct {
+	Tokens   int64 `json:"tokens"`
+	Requests int64 `json:"requests"`
+}
+
+// KeyTotal is the per-key roll-up returned by sources endpoints. UserID and
+// UserName are snapshotted from the UsageRecord so revoked-and-deleted keys
+// still carry their owner attribution in admin views.
+type KeyTotal struct {
+	APIKeyID   string    `json:"api_key_id"`
+	APIKeyName string    `json:"api_key_name"`
+	UserID     string    `json:"user_id"`
+	UserName   string    `json:"user_name"`
+	Tokens     int64     `json:"tokens"`
+	Requests   int64     `json:"requests"`
+	LastUsed   time.Time `json:"last_used"`
+}
+
+// UserSourceTotal is a per-(user, source) roll-up for sources that don't carry
+// a named API key identity (web, legacy). It exists so admin views can show
+// which user generated each block of Web UI / legacy traffic; the per-apikey
+// breakdown for source=apikey already lives in KeyTotal.
+type UserSourceTotal struct {
+	Source   string `json:"source"`
+	UserID   string `json:"user_id"`
+	UserName string `json:"user_name"`
+	Tokens   int64  `json:"tokens"`
+	Requests int64  `json:"requests"`
+}
+
+// SourceTotals summarises a per-source breakdown.
+type SourceTotals struct {
+	BySource     map[string]TotalsEntry `json:"by_source"`
+	ByKey        []KeyTotal             `json:"by_key"`                   // server-sorted desc by tokens, capped
+	ByUserSource []UserSourceTotal      `json:"by_user_source,omitempty"` // populated only when includeLegacy=true
+	GrandTotal   TotalsEntry            `json:"grand_total"`
+}
+
+const maxKeyTotals = 200
+
+// GetUserUsageBySource returns per-source aggregated usage for one user. Legacy
+// is excluded by design (visible to admins only via the admin variant).
+func GetUserUsageBySource(db *gorm.DB, userID, period string) ([]UsageBucket, SourceTotals, error) {
+	sqlite := isSQLiteDB(db)
+	since, dateFmt := periodToWindow(period, sqlite)
+	bucketExpr := fmt.Sprintf("%s as bucket", dateFmt)
+
+	query := db.Model(&UsageRecord{}).
+		Select(bucketExpr+", source, COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
+			"SUM(prompt_tokens) as prompt_tokens, "+
+			"SUM(completion_tokens) as completion_tokens, "+
+			"SUM(total_tokens) as total_tokens, "+
+			"COUNT(*) as request_count").
+		Where("user_id = ?", userID).
+		Where("source <> ?", UsageSourceLegacy).
+		Group("bucket, source, api_key_id, api_key_name").
+		Order("bucket ASC")
+
+	if !since.IsZero() {
+		query = query.Where("created_at >= ?", since)
+	}
+
+	var buckets []UsageBucket
+	if err := query.Find(&buckets).Error; err != nil {
+		return nil, SourceTotals{}, err
+	}
+
+	totals := computeSourceTotals(db, userID, "", since, false)
+	return buckets, totals, nil
+}
+
+// computeSourceTotals rolls up by_source / by_key / by_user_source / grand_total.
+// userID/apiKeyID are optional filters. includeLegacy (admin-only) exposes the
+// legacy bucket and populates by_user_source. Query errors are logged, not returned.
+func computeSourceTotals(db *gorm.DB, userID, apiKeyID string, since time.Time, includeLegacy bool) SourceTotals {
+	totals := SourceTotals{BySource: map[string]TotalsEntry{}}
+
+	bySourceQ := db.Model(&UsageRecord{}).
+		Select("source, SUM(total_tokens) as tokens, COUNT(*) as requests").
+		Group("source")
+	bySourceQ = applyFilters(bySourceQ, userID, apiKeyID, since, includeLegacy)
+
+	var bySourceRows []struct {
+		Source   string
+		Tokens   int64
+		Requests int64
+	}
+	if err := bySourceQ.Scan(&bySourceRows).Error; err != nil {
+		xlog.Warn("computeSourceTotals: by-source Scan failed", "error", err)
+		return totals
+	}
+	for _, r := range bySourceRows {
+		totals.BySource[r.Source] = TotalsEntry{Tokens: r.Tokens, Requests: r.Requests}
+		totals.GrandTotal.Tokens += r.Tokens
+		totals.GrandTotal.Requests += r.Requests
+	}
+
+	byKeyQ := db.Model(&UsageRecord{}).
+		Select("COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
+			"user_id, user_name, "+
+			"SUM(total_tokens) as tokens, COUNT(*) as requests, MAX(created_at) as last_used").
+		Where("api_key_id IS NOT NULL AND api_key_id <> ''").
+		Group("api_key_id, api_key_name, user_id, user_name").
+		Order("tokens DESC").
+		Limit(maxKeyTotals)
+	byKeyQ = applyFilters(byKeyQ, userID, apiKeyID, since, includeLegacy)
+
+	// Iterate Rows() manually because the SQLite driver returns
+	// MAX(created_at) as a string, which database/sql refuses to scan into
+	// *time.Time, while Postgres returns a proper timestamp. Scanning the
+	// column into a string works for both; parseLastUsedString then parses it.
+	rows, err := byKeyQ.Rows()
+	if err != nil {
+		xlog.Warn("computeSourceTotals: by-key Rows() failed", "error", err)
+	} else {
+		defer func() { _ = rows.Close() }()
+		out := make([]KeyTotal, 0)
+		for rows.Next() {
+			var (
+				apiKeyID, apiKeyName, userIDCol, userName, lastUsedRaw string
+				tokens, requests                                       int64
+			)
+			if scanErr := rows.Scan(&apiKeyID, &apiKeyName, &userIDCol, &userName, &tokens, &requests, &lastUsedRaw); scanErr != nil {
+				continue
+			}
+			out = append(out, KeyTotal{
+				APIKeyID:   apiKeyID,
+				APIKeyName: apiKeyName,
+				UserID:     userIDCol,
+				UserName:   userName,
+				Tokens:     tokens,
+				Requests:   requests,
+				LastUsed:   parseLastUsedString(lastUsedRaw),
+			})
+		}
+		if rerr := rows.Err(); rerr != nil {
+			xlog.Warn("computeSourceTotals: by-key rows iteration failed", "error", rerr)
+		}
+		totals.ByKey = out
+	}
+
+	// by_user_source: only populated for admin callers (includeLegacy=true) so
+	// they can attribute Web UI / legacy traffic to specific users. Per-apikey
+	// rows already carry user info via KeyTotal above, so this query only
+	// covers source != apikey.
+	if includeLegacy {
+		byUserSourceQ := db.Model(&UsageRecord{}).
+			Select("source, user_id, user_name, "+
+				"SUM(total_tokens) as tokens, COUNT(*) as requests").
+			Where("source <> ?", UsageSourceAPIKey).
+			Group("source, user_id, user_name").
+			Order("tokens DESC")
+		byUserSourceQ = applyFilters(byUserSourceQ, userID, apiKeyID, since, includeLegacy)
+
+		var byUserSourceRows []UserSourceTotal
+		if scanErr := byUserSourceQ.Scan(&byUserSourceRows).Error; scanErr != nil {
+			xlog.Warn("computeSourceTotals: by-user-source Scan failed", "error", scanErr)
+		} else {
+			totals.ByUserSource = byUserSourceRows
+		}
+	}
+
+	return totals
+}
+
+// parseLastUsedString converts the textual MAX(created_at) value returned by
+// SQLite (or any driver that surfaces the timestamp as a string) into a
+// time.Time. Returns the zero time on parse failure.
+func parseLastUsedString(s string) time.Time {
+	if s == "" {
+		return time.Time{}
+	}
+	// GORM's SQLite driver emits Go's default time formatting. Try the formats
+	// it commonly produces, then fall back to the RFC 3339 layouts.
+	layouts := []string{
+		"2006-01-02 15:04:05.999999999 -0700 MST",
+		"2006-01-02 15:04:05.999999999-07:00",
+		"2006-01-02 15:04:05.999999999",
+		"2006-01-02 15:04:05",
+		time.RFC3339Nano,
+		time.RFC3339,
+	}
+	for _, layout := range layouts {
+		if t, err := time.Parse(layout, s); err == nil {
+			return t
+		}
+	}
+	xlog.Warn("parseLastUsedString: unrecognised format", "value", s)
+	return time.Time{}
+}
+
+// GetAllUsageBySource is the admin variant of GetUserUsageBySource.
+// Optional filters: userID and apiKeyID. Legacy is included.
+// truncated == true iff the per-key roll-up was capped at maxKeyTotals.
+func GetAllUsageBySource(db *gorm.DB, period, userID, apiKeyID string) ([]UsageBucket, SourceTotals, bool, error) {
+	sqlite := isSQLiteDB(db)
+	since, dateFmt := periodToWindow(period, sqlite)
+	bucketExpr := fmt.Sprintf("%s as bucket", dateFmt)
+
+	query := db.Model(&UsageRecord{}).
+		Select(bucketExpr+", source, COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
+			"user_id, user_name, "+
+			"SUM(prompt_tokens) as prompt_tokens, "+
+			"SUM(completion_tokens) as completion_tokens, "+
+			"SUM(total_tokens) as total_tokens, "+
+			"COUNT(*) as request_count").
+		Group("bucket, source, api_key_id, api_key_name, user_id, user_name").
+		Order("bucket ASC")
+
+	query = applyFilters(query, userID, apiKeyID, since, true)
+
+	var buckets []UsageBucket
+	if err := query.Find(&buckets).Error; err != nil {
+		return nil, SourceTotals{}, false, err
+	}
+
+	totals := computeSourceTotals(db, userID, apiKeyID, since, true)
+
+	// Count distinct api_key_ids matching the filters. If > maxKeyTotals,
+	// the by_key slice was capped and we signal truncation to the caller.
+	truncated := false
+	var distinct int64
+	countQ := applyFilters(
+		db.Model(&UsageRecord{}).
+			Distinct("api_key_id").
+			Where("api_key_id IS NOT NULL AND api_key_id <> ''"),
+		userID, apiKeyID, since, true,
+	)
+	if err := countQ.Count(&distinct).Error; err != nil {
+		xlog.Warn("GetAllUsageBySource: distinct api_key_id count failed", "error", err)
+	} else {
+		truncated = distinct > maxKeyTotals
+	}
+
+	return buckets, totals, truncated, nil
+}
+
+func applyFilters(q *gorm.DB, userID, apiKeyID string, since time.Time, includeLegacy bool) *gorm.DB {
+	if userID != "" {
+		q = q.Where("user_id = ?", userID)
+	}
+	if apiKeyID != "" {
+		q = q.Where("api_key_id = ?", apiKeyID)
+	}
+	if !since.IsZero() {
+		q = q.Where("created_at >= ?", since)
+	}
+	if !includeLegacy {
+		q = q.Where("source <> ?", UsageSourceLegacy)
+	}
+	return q
+}
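Review note: the layout-fallback approach used by parseLastUsedString can be tried in isolation. The following is a minimal sketch, not the code from this change: `parseTimestamp` is a hypothetical stand-in, the layout list is shortened, and the sample inputs are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// parseTimestamp is a hypothetical stand-in for parseLastUsedString. It tries
// each layout in order and returns the zero time when none of them matches.
func parseTimestamp(s string) time.Time {
	if s == "" {
		return time.Time{}
	}
	layouts := []string{
		"2006-01-02 15:04:05.999999999-07:00", // text timestamp with UTC offset
		"2006-01-02 15:04:05.999999999",       // text timestamp, no zone (parsed as UTC)
		time.RFC3339Nano,                      // also accepts values without fractional seconds
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t
		}
	}
	return time.Time{}
}

func main() {
	// Illustrative inputs only; real driver output may differ.
	inputs := []string{
		"2026-05-22 10:13:41.5+02:00",
		"2026-05-22 10:13:41",
		"2026-05-22T08:13:41Z",
		"not a timestamp",
	}
	for _, in := range inputs {
		t := parseTimestamp(in)
		fmt.Printf("%q parsed=%v utc=%s\n", in, !t.IsZero(), t.UTC().Format(time.RFC3339))
	}
}
```

A caller cannot distinguish "no timestamp" from "unparseable timestamp" with this shape, which is presumably why the version in the diff logs a warning before returning the zero time.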
--- a/core/http/auth/usage_test.go
+++ b/core/http/auth/usage_test.go
@@ -3,11 +3,13 @@
 package auth_test

 import (
+	"fmt"
 	"time"

 	"github.com/mudler/LocalAI/core/http/auth"
 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
+	"gorm.io/gorm"
 )

 var _ = Describe("Usage", func() {
@@ -158,4 +160,275 @@ var _ = Describe("Usage", func() {
 			}
 		})
 	})
+
+	Describe("Usage source backfill", func() {
+		It("backfills 'web' for pre-feature rows", func() {
+			db := testDB()
+
+			rawDB, err := db.DB()
+			Expect(err).ToNot(HaveOccurred())
+			_, err = rawDB.Exec(
+				`INSERT INTO usage_records (user_id, source, model, created_at, total_tokens, prompt_tokens, completion_tokens, duration) VALUES (?, '', ?, ?, 0, 0, 0, 0)`,
+				"user-x", "gpt-4", time.Now())
+			Expect(err).ToNot(HaveOccurred())
+
+			Expect(auth.BackfillUsageSource(db)).To(Succeed())
+
+			var loaded auth.UsageRecord
+			Expect(db.Where("user_id = ?", "user-x").First(&loaded).Error).To(Succeed())
+			Expect(loaded.Source).To(Equal(auth.UsageSourceWeb))
+		})
+
+		It("backfills 'legacy' for pre-feature rows with legacy-api-key user_id", func() {
+			db := testDB()
+
+			rawDB, err := db.DB()
+			Expect(err).ToNot(HaveOccurred())
+			_, err = rawDB.Exec(
+				`INSERT INTO usage_records (user_id, source, model, created_at, total_tokens, prompt_tokens, completion_tokens, duration) VALUES (?, '', ?, ?, 0, 0, 0, 0)`,
+				"legacy-api-key", "gpt-4", time.Now())
+			Expect(err).ToNot(HaveOccurred())
+
+			Expect(auth.BackfillUsageSource(db)).To(Succeed())
+
+			var loaded auth.UsageRecord
+			Expect(db.Where("user_id = ?", "legacy-api-key").First(&loaded).Error).To(Succeed())
+			Expect(loaded.Source).To(Equal(auth.UsageSourceLegacy))
+		})
+
+		It("is idempotent on re-run", func() {
+			db := testDB()
+			Expect(auth.BackfillUsageSource(db)).To(Succeed())
+			Expect(auth.BackfillUsageSource(db)).To(Succeed())
+		})
+	})
+
+	Describe("UsageRecord with source fields", func() {
+		It("persists Source, APIKeyID, APIKeyName", func() {
+			db := testDB()
+			keyID := "key-uuid-1"
+			record := &auth.UsageRecord{
+				UserID:      "user-1",
+				UserName:    "Test User",
+				Source:      auth.UsageSourceAPIKey,
+				APIKeyID:    &keyID,
+				APIKeyName:  "ci-runner",
+				Model:       "gpt-4",
+				Endpoint:    "/v1/chat/completions",
+				TotalTokens: 150,
+				CreatedAt:   time.Now(),
+			}
+			Expect(auth.RecordUsage(db, record)).To(Succeed())
+
+			var loaded auth.UsageRecord
+			Expect(db.First(&loaded, record.ID).Error).To(Succeed())
+			Expect(loaded.Source).To(Equal(auth.UsageSourceAPIKey))
+			Expect(loaded.APIKeyID).ToNot(BeNil())
+			Expect(*loaded.APIKeyID).To(Equal("key-uuid-1"))
+			Expect(loaded.APIKeyName).To(Equal("ci-runner"))
+		})
+
+		It("allows nil APIKeyID for web/legacy sources", func() {
+			db := testDB()
+			record := &auth.UsageRecord{
+				UserID:    "user-1",
+				Source:    auth.UsageSourceWeb,
+				Model:     "gpt-4",
+				CreatedAt: time.Now(),
+			}
+			Expect(auth.RecordUsage(db, record)).To(Succeed())
+
+			var loaded auth.UsageRecord
+			Expect(db.First(&loaded, record.ID).Error).To(Succeed())
+			Expect(loaded.Source).To(Equal(auth.UsageSourceWeb))
+			Expect(loaded.APIKeyID).To(BeNil())
+			Expect(loaded.APIKeyName).To(BeEmpty())
+		})
+	})
+
+	Describe("GetUserUsageBySource", func() {
+		insert := func(db *gorm.DB, userID, source, keyID, keyName string, tokens int64, when time.Time) {
+			rec := &auth.UsageRecord{
+				UserID:      userID,
+				Source:      source,
+				Model:       "gpt-4",
+				TotalTokens: tokens,
+				CreatedAt:   when,
+			}
+			if keyID != "" {
+				rec.APIKeyID = &keyID
+				rec.APIKeyName = keyName
+			}
+			Expect(auth.RecordUsage(db, rec)).To(Succeed())
+		}
+
+		It("returns only the caller's rows, never legacy", func() {
+			db := testDB()
+			now := time.Now()
+			insert(db, "alice", auth.UsageSourceAPIKey, "k1", "ci", 100, now)
+			insert(db, "alice", auth.UsageSourceWeb, "", "", 50, now)
+			insert(db, "alice", auth.UsageSourceLegacy, "", "", 30, now)
+			insert(db, "bob", auth.UsageSourceAPIKey, "k2", "bobk", 90, now)
+
+			buckets, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
+			Expect(err).ToNot(HaveOccurred())
+
+			for _, b := range buckets {
+				Expect(b.UserID).To(Or(BeEmpty(), Equal("alice")))
+				Expect(b.Source).ToNot(Equal(auth.UsageSourceLegacy))
+			}
+
+			Expect(totals.GrandTotal.Tokens).To(Equal(int64(150)))
+			Expect(totals.BySource[auth.UsageSourceAPIKey].Tokens).To(Equal(int64(100)))
+			Expect(totals.BySource[auth.UsageSourceWeb].Tokens).To(Equal(int64(50)))
+			_, hasLegacy := totals.BySource[auth.UsageSourceLegacy]
+			Expect(hasLegacy).To(BeFalse())
+		})
+
+		It("snapshots survive key deletion", func() {
+			db := testDB()
+			now := time.Now()
+			insert(db, "alice", auth.UsageSourceAPIKey, "deleted-key", "old-name", 42, now)
+			_, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(totals.ByKey).To(HaveLen(1))
+			Expect(totals.ByKey[0].APIKeyName).To(Equal("old-name"))
+			Expect(totals.ByKey[0].APIKeyID).To(Equal("deleted-key"))
+			Expect(totals.ByKey[0].LastUsed).ToNot(BeZero())
+			Expect(totals.ByKey[0].LastUsed).To(BeTemporally("~", now, 2*time.Second))
+		})
+	})
+
+	Describe("GetAllUsageBySource", func() {
+		insert := func(db *gorm.DB, userID, source, keyID string, tokens int64) {
+			rec := &auth.UsageRecord{
+				UserID:      userID,
+				Source:      source,
+				Model:       "gpt-4",
+				TotalTokens: tokens,
+				CreatedAt:   time.Now(),
+			}
+			if keyID != "" {
+				rec.APIKeyID = &keyID
+				rec.APIKeyName = "name-" + keyID
+			}
+			Expect(auth.RecordUsage(db, rec)).To(Succeed())
+		}
+
+		It("includes legacy for admins", func() {
+			db := testDB()
+			insert(db, "alice", auth.UsageSourceAPIKey, "k1", 10)
+			insert(db, "legacy-api-key", auth.UsageSourceLegacy, "", 5)
+
+			_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(totals.BySource).To(HaveKey(auth.UsageSourceLegacy))
+			Expect(totals.BySource[auth.UsageSourceLegacy].Tokens).To(Equal(int64(5)))
+		})
+
+		It("filters by user_id AND api_key_id", func() {
+			db := testDB()
+			insert(db, "alice", auth.UsageSourceAPIKey, "k1", 10)
+			insert(db, "alice", auth.UsageSourceAPIKey, "k2", 20)
+			insert(db, "bob", auth.UsageSourceAPIKey, "k3", 30)
+
+			_, totals, _, err := auth.GetAllUsageBySource(db, "month", "alice", "k2")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(totals.GrandTotal.Tokens).To(Equal(int64(20)))
+		})
+
+		It("sets truncated=true when by_key exceeds the cap", func() {
+			db := testDB()
+			for i := 0; i < 210; i++ {
+				insert(db, "alice", auth.UsageSourceAPIKey, fmt.Sprintf("key-%03d", i), int64(210-i))
+			}
+
+			_, totals, truncated, err := auth.GetAllUsageBySource(db, "month", "", "")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(truncated).To(BeTrue())
+			Expect(totals.ByKey).To(HaveLen(200))
+			Expect(totals.ByKey[0].Tokens).To(BeNumerically(">", totals.ByKey[199].Tokens))
+		})
+
+		// insertNamed records a row with explicit user_id, user_name, source,
+		// and optional API key snapshot. The user-attribution tests below need
+		// it because the older insert helper cannot set user_name or a key name.
+		insertNamed := func(db *gorm.DB, userID, userName, source, keyID, keyName string, tokens int64) {
+			rec := &auth.UsageRecord{
+				UserID:      userID,
+				UserName:    userName,
+				Source:      source,
+				Model:       "gpt-4",
+				TotalTokens: tokens,
+				CreatedAt:   time.Now(),
+			}
+			if keyID != "" {
+				rec.APIKeyID = &keyID
+				rec.APIKeyName = keyName
+			}
+			Expect(auth.RecordUsage(db, rec)).To(Succeed())
+		}
+
+		It("attributes each KeyTotal to its owner user", func() {
+			db := testDB()
+			insertNamed(db, "alice", "Alice", auth.UsageSourceAPIKey, "k1", "ci-runner", 100)
+			insertNamed(db, "bob", "Bob", auth.UsageSourceAPIKey, "k2", "lap", 50)
+
+			_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(totals.ByKey).To(HaveLen(2))
+
+			byID := map[string]auth.KeyTotal{}
+			for _, k := range totals.ByKey {
+				byID[k.APIKeyID] = k
+			}
+			Expect(byID["k1"].UserID).To(Equal("alice"))
+			Expect(byID["k1"].UserName).To(Equal("Alice"))
+			Expect(byID["k2"].UserID).To(Equal("bob"))
+			Expect(byID["k2"].UserName).To(Equal("Bob"))
+		})
+
+		It("breaks Web UI and legacy traffic out per user in by_user_source for admin", func() {
+			db := testDB()
+			// Alice and Bob both have Web UI traffic; a synthetic legacy user
+			// also contributes. ByUserSource should expose one row per
+			// (source, user) pair, never for source=apikey.
+			insertNamed(db, "alice", "Alice", auth.UsageSourceWeb, "", "", 30)
+			insertNamed(db, "bob", "Bob", auth.UsageSourceWeb, "", "", 70)
+			insertNamed(db, "legacy-api-key", "API Key User", auth.UsageSourceLegacy, "", "", 10)
+			insertNamed(db, "alice", "Alice", auth.UsageSourceAPIKey, "k1", "ci-runner", 5)
+
+			_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
+			Expect(err).ToNot(HaveOccurred())
+			Expect(totals.ByUserSource).ToNot(BeEmpty())
+
+			for _, r := range totals.ByUserSource {
+				Expect(r.Source).ToNot(Equal(auth.UsageSourceAPIKey))
+			}
+
+			webByUser := map[string]int64{}
+			legacyByUser := map[string]int64{}
+			for _, r := range totals.ByUserSource {
+				switch r.Source {
+				case auth.UsageSourceWeb:
+					webByUser[r.UserID] = r.Tokens
+				case auth.UsageSourceLegacy:
+					legacyByUser[r.UserID] = r.Tokens
+				}
+			}
+			Expect(webByUser["alice"]).To(Equal(int64(30)))
+			Expect(webByUser["bob"]).To(Equal(int64(70)))
+			Expect(legacyByUser["legacy-api-key"]).To(Equal(int64(10)))
+		})
+
+		It("does NOT populate by_user_source in the non-admin path", func() {
+			db := testDB()
+			insertNamed(db, "alice", "Alice", auth.UsageSourceWeb, "", "", 30)
+
+			_, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
+			Expect(err).ToNot(HaveOccurred())
+			// Non-admin path uses includeLegacy=false, so by_user_source stays nil.
+			Expect(totals.ByUserSource).To(BeNil())
+		})
+	})
 })
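Review note: the truncation contract checked by the "sets truncated=true when by_key exceeds the cap" spec can be modelled without a database. This is a rough in-memory sketch under assumptions: `keyTotal` and `capKeyTotals` are hypothetical names, and the real code does the sort and limit in SQL and derives the flag from a separate distinct count.

```go
package main

import (
	"fmt"
	"sort"
)

// keyTotal is a trimmed, hypothetical version of the per-key roll-up.
type keyTotal struct {
	APIKeyID string
	Tokens   int64
}

// capKeyTotals sorts a copy of the input descending by tokens, keeps at most
// limit entries, and reports whether any entries were dropped.
func capKeyTotals(all []keyTotal, limit int) ([]keyTotal, bool) {
	sorted := make([]keyTotal, len(all))
	copy(sorted, all)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Tokens > sorted[j].Tokens
	})
	if len(sorted) <= limit {
		return sorted, false
	}
	return sorted[:limit], true
}

func main() {
	// Same data shape as the spec: 210 keys with strictly decreasing tokens.
	var all []keyTotal
	for i := 0; i < 210; i++ {
		all = append(all, keyTotal{APIKeyID: fmt.Sprintf("key-%03d", i), Tokens: int64(210 - i)})
	}
	top, truncated := capKeyTotals(all, 200)
	fmt.Println(len(top), truncated, top[0].Tokens, top[199].Tokens)
	// prints: 200 true 210 11
}
```

Returning the flag alongside the capped slice lets a client show a "showing top N" hint instead of silently presenting an incomplete list.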
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -16,8 +16,11 @@ import (
 	"github.com/google/uuid"
 	"github.com/gorilla/websocket"
 	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/gallery"
 	"github.com/mudler/LocalAI/core/http/auth"
 	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/xlog"
 	"gorm.io/gorm"
@@ -381,14 +384,24 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	}
 }

-// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
+// InstallBackendOnNodeEndpoint triggers backend installation on a worker node.
+// Async: enqueues a ManagementOp on the gallery service channel and returns a
+// jobID immediately. The gallery service worker goroutine drives the actual
+// install via DistributedBackendManager.InstallBackend, which honors the op's
+// TargetNodeID to scope the fan-out to one node. The UI polls /api/backends/job/:uid
+// for progress, mirroring /api/backends/install/:id.
+//
 // Backend can be either a gallery ID (resolved against BackendGalleries) or a
-// direct URI install (URI + Name + optional Alias) — same shape as the
+// direct URI install (URI + Name + optional Alias) - same shape as the
 // standalone /api/backends/install-external path, just scoped to one node.
-func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
+//
+// The legacy unloader argument is retained for signature symmetry with
+// DeleteBackendOnNodeEndpoint / ListBackendsOnNodeEndpoint but is no longer
+// used here - the async path goes through galleryService.
+func InstallBackendOnNodeEndpoint(_ nodes.NodeCommandSender, galleryService *galleryop.GalleryService, opcache *galleryop.OpCache, appConfig *config.ApplicationConfig) echo.HandlerFunc {
 	return func(c echo.Context) error {
-		if unloader == nil {
-			return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "NATS not configured"))
+		if galleryService == nil {
+			return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "gallery service not configured"))
 		}
 		nodeID := c.Param("id")
 		var req struct {
@@ -401,25 +414,65 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
 		if err := c.Bind(&req); err != nil {
 			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
 		}
-		// Either a gallery backend name or a direct URI must be supplied.
 		if req.Backend == "" && req.URI == "" {
 			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
 		}
-		// Admin-driven backend install: not tied to a specific replica slot
-		// (no model is being loaded). Pass replica 0 to match the worker's
-		// admin process-key convention (`backend#0`). The worker's fast path
-		// takes over if the backend is already running — upgrades go through
-		// the dedicated /api/backends/upgrade path on backend.upgrade.
-		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
+
+		jobUUID, err := uuid.NewUUID()
 		if err != nil {
-			xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
-			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
+			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to generate job id"))
 		}
-		if !reply.Success {
-			xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
-			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
+		jobID := jobUUID.String()
+
+		// Cache key: for gallery installs, use the backend slug; for URI
+		// installs prefer the provided Name (falling back to URI). All keys
+		// are node-scoped so concurrent installs of the same backend on
+		// different nodes do not stomp each other in opcache.
+		backendKey := req.Backend
+		if backendKey == "" {
+			backendKey = req.Name
+			if backendKey == "" {
+				backendKey = req.URI
+			}
 		}
-		return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
+		cacheKey := galleryop.NodeScopedKey(nodeID, backendKey)
+		opcache.SetBackend(cacheKey, jobID)
+
+		// Optional caller-supplied galleries override. Mirrors the standalone
+		// install path so an admin can point at a private gallery.
+		galleries := appConfig.BackendGalleries
+		if req.BackendGalleries != "" {
+			var custom []config.Gallery
+			if err := json.Unmarshal([]byte(req.BackendGalleries), &custom); err != nil {
+				xlog.Warn("Ignoring malformed backend_galleries override; falling back to configured galleries", "error", err, "nodeID", nodeID)
+			} else if len(custom) > 0 {
+				galleries = custom
+			}
+		}
+
+		ctx, cancelFunc := context.WithCancel(context.Background())
+		op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
+			ID:                 jobID,
+			GalleryElementName: req.Backend,
+			Galleries:          galleries,
+			TargetNodeID:       nodeID,
+			ExternalURI:        req.URI,
+			ExternalName:       req.Name,
+			ExternalAlias:      req.Alias,
+			Context:            ctx,
+			CancelFunc:         cancelFunc,
+		}
+		galleryService.StoreCancellation(jobID, cancelFunc)
+		go func() {
+			galleryService.BackendGalleryChannel <- op
+		}()
+
+		xlog.Info("Node-scoped backend install dispatched", "node", nodeID, "backend", req.Backend, "uri", req.URI, "jobID", jobID)
+		return c.JSON(http.StatusAccepted, map[string]string{
+			"jobID":     jobID,
+			"statusUrl": "/api/backends/job/" + jobID,
+			"message":   "backend installation started",
+		})
 	}
 }

--- a/core/http/endpoints/localai/nodes_install_async_test.go
+++ b/core/http/endpoints/localai/nodes_install_async_test.go
@@ -0,0 +1,123 @@
+package localai_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+
+	"github.com/labstack/echo/v4"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/gallery"
+	"github.com/mudler/LocalAI/core/http/endpoints/localai"
+	"github.com/mudler/LocalAI/core/services/galleryop"
+)
+
+// InstallBackendOnNodeEndpoint became async to stop blocking the browser on
+// the 3-minute NATS reply timeout. These specs lock in the new contract:
+// HTTP 202 with a jobID, a ManagementOp enqueued on the gallery channel, and
+// an opcache entry keyed by NodeScopedKey so concurrent installs of the same
+// backend on different nodes do not stomp each other.
+var _ = Describe("InstallBackendOnNodeEndpoint async behavior", func() {
+	var (
+		e              *echo.Echo
+		galleryService *galleryop.GalleryService
+		opcache        *galleryop.OpCache
+		appCfg         *config.ApplicationConfig
+		dispatched     chan galleryop.ManagementOp[gallery.GalleryBackend, any]
+		done           chan struct{}
+		drainExited    chan struct{}
+	)
+
+	BeforeEach(func() {
+		e = echo.New()
+		appCfg = &config.ApplicationConfig{
+			BackendGalleries: []config.Gallery{{Name: "test-gallery", URL: "http://example.com"}},
+		}
+		galleryService = galleryop.NewGalleryService(appCfg, nil)
+		opcache = galleryop.NewOpCache(galleryService)
+		// Drain the gallery channel into a buffered side channel so the
+		// handler's `go func() { ch <- op }()` send does not block waiting
+		// for the real worker (which is not running in this unit test).
+		dispatched = make(chan galleryop.ManagementOp[gallery.GalleryBackend, any], 4)
+		done = make(chan struct{})
+		drainExited = make(chan struct{})
+		go func() {
+			defer close(drainExited)
+			for {
+				select {
+				case op := <-galleryService.BackendGalleryChannel:
+					dispatched <- op
+				case <-done:
+					return
+				}
+			}
+		}()
+	})
+
+	AfterEach(func() {
+		// Signal the drain goroutine to exit. We do NOT close
+		// BackendGalleryChannel: the handler's dispatch goroutine may still
+		// be pending (specs that don't Eventually-Receive), and a send on a
+		// closed channel panics. Signalling via `done` lets the drain
+		// goroutine return without touching the gallery channel.
+		close(done)
+		Eventually(drainExited, "2s").Should(BeClosed())
+	})
+
+	It("returns 202 with a jobID and dispatches a TargetNodeID-scoped op", func() {
+		body := `{"backend": "llama-cpp"}`
+		req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(body))
+		req.Header.Set("Content-Type", "application/json")
+		rec := httptest.NewRecorder()
+		c := e.NewContext(req, rec)
+		c.SetParamNames("id")
+		c.SetParamValues("node-xyz")
+
+		handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
+		Expect(handler(c)).To(Succeed())
+		Expect(rec.Code).To(Equal(http.StatusAccepted))
+
+		var resp map[string]any
+		Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
+		Expect(resp["jobID"]).To(BeAssignableToTypeOf(""))
+		Expect(resp["jobID"].(string)).ToNot(BeEmpty())
+		Expect(resp["message"]).To(Equal("backend installation started"))
+
+		Eventually(dispatched, "2s").Should(Receive())
+		Expect(opcache.Exists(galleryop.NodeScopedKey("node-xyz", "llama-cpp"))).To(BeTrue())
+		Expect(opcache.IsBackendOp(galleryop.NodeScopedKey("node-xyz", "llama-cpp"))).To(BeTrue())
+	})
+
+	It("returns 400 when neither backend nor uri is supplied", func() {
+		req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(`{}`))
+		req.Header.Set("Content-Type", "application/json")
+		rec := httptest.NewRecorder()
+		c := e.NewContext(req, rec)
+		c.SetParamNames("id")
+		c.SetParamValues("node-xyz")
+
+		handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
+		Expect(handler(c)).To(Succeed())
+		Expect(rec.Code).To(Equal(http.StatusBadRequest))
+	})
+
+	It("accepts a direct URI install and uses the name as the cache key", func() {
+		body := `{"uri": "oci://example.com/custom-backend:v1", "name": "custom"}`
+		req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(body))
+		req.Header.Set("Content-Type", "application/json")
+		rec := httptest.NewRecorder()
+		c := e.NewContext(req, rec)
+		c.SetParamNames("id")
+		c.SetParamValues("node-xyz")
+
+		handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
+		Expect(handler(c)).To(Succeed())
+		Expect(rec.Code).To(Equal(http.StatusAccepted))
+
+		Expect(opcache.Exists(galleryop.NodeScopedKey("node-xyz", "custom"))).To(BeTrue())
+	})
+})
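Review note: the cache-key derivation exercised by these specs (gallery slug first, then the supplied name, then the URI, all scoped to a node) can be sketched as follows. Everything here is an assumption for illustration: `installRequest`, `backendKey`, and `nodeScopedKey` are hypothetical, and the separator is invented because the real format produced by galleryop.NodeScopedKey is not shown in this diff.

```go
package main

import "fmt"

// installRequest mirrors only the request fields the key derivation reads.
type installRequest struct {
	Backend string // gallery backend slug
	Name    string // name supplied with a direct URI install
	URI     string // direct install URI
}

// backendKey picks the gallery slug when present, otherwise the supplied
// name, otherwise the URI.
func backendKey(r installRequest) string {
	switch {
	case r.Backend != "":
		return r.Backend
	case r.Name != "":
		return r.Name
	default:
		return r.URI
	}
}

// nodeScopedKey is hypothetical: it joins node ID and backend key with "/"
// purely so that keys for different nodes differ in this sketch.
func nodeScopedKey(nodeID, key string) string {
	return nodeID + "/" + key
}

func main() {
	gallery := installRequest{Backend: "llama-cpp"}
	direct := installRequest{URI: "oci://example.com/custom-backend:v1", Name: "custom"}

	fmt.Println(nodeScopedKey("node-a", backendKey(gallery)))
	fmt.Println(nodeScopedKey("node-b", backendKey(gallery)))
	fmt.Println(nodeScopedKey("node-a", backendKey(direct)))
}
```

Because the node ID is part of the key, installing the same backend on two nodes at once yields two distinct cache entries, which is the property the first spec asserts through opcache.Exists.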
--- a/core/http/endpoints/openai/chat.go
+++ b/core/http/endpoints/openai/chat.go
@@ -73,363 +73,6 @@ func mergeToolCallDeltas(existing []schema.ToolCall, deltas []schema.ToolCall) [
 // @Success 200 {object} schema.OpenAIResponse "Response"
 // @Router /v1/chat/completions [post]
 func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, startupOptions *config.ApplicationConfig, natsClient mcpTools.MCPNATSClient, assistantHolder *mcpTools.LocalAIAssistantHolder) echo.HandlerFunc {
-	process := func(s string, req *schema.OpenAIRequest, config *config.ModelConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool, id string, created int) error {
-		initialMessage := schema.OpenAIResponse{
-			ID:      id,
-			Created: created,
-			Model:   req.Model, // we have to return what the user sent here, due to OpenAI spec.
-			Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0, FinishReason: nil}},
-			Object:  "chat.completion.chunk",
-		}
-		responses <- initialMessage
-
-		// Detect if thinking token is already in prompt or template
-		// When UseTokenizerTemplate is enabled, predInput is empty, so we check the template
-		var template string
-		if config.TemplateConfig.UseTokenizerTemplate {
-			template = config.GetModelTemplate()
-		} else {
-			template = s
-		}
-		thinkingStartToken := reason.DetectThinkingStartToken(template, &config.ReasoningConfig)
-		extractor := reason.NewReasoningExtractor(thinkingStartToken, config.ReasoningConfig)
-
-		_, _, _, err := ComputeChoices(req, s, config, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
-			var reasoningDelta, contentDelta string
-
-			// Always keep the Go-side extractor in sync with raw tokens so it
-			// can serve as fallback for backends without an autoparser (e.g. vLLM).
-			goReasoning, goContent := extractor.ProcessToken(s)
-
-			// When C++ autoparser chat deltas are available, prefer them — they
-			// handle model-specific formats (Gemma 4, etc.) without Go-side tags.
-			// Otherwise fall back to Go-side extraction.
-			if tokenUsage.HasChatDeltaContent() {
-				rawReasoning, cd := tokenUsage.ChatDeltaReasoningAndContent()
-				contentDelta = cd
-				reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
-			} else {
-				reasoningDelta = goReasoning
-				contentDelta = goContent
-			}
-
-			usage := schema.OpenAIUsage{
-				PromptTokens:     tokenUsage.Prompt,
-				CompletionTokens: tokenUsage.Completion,
-				TotalTokens:      tokenUsage.Prompt + tokenUsage.Completion,
-			}
-			if extraUsage {
-				usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
-				usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
-			}
-
-			delta := &schema.Message{}
-			if contentDelta != "" {
-				delta.Content = &contentDelta
-			}
-			if reasoningDelta != "" {
-				delta.Reasoning = &reasoningDelta
-			}
-
-			// Usage rides as a struct field for the consumer to track the
-			// running cumulative — it is stripped before JSON marshal so the
-			// wire chunk stays spec-compliant (no `usage` on intermediate
-			// chunks). The dedicated trailer chunk (when include_usage=true)
-			// carries the final totals.
-			usageForChunk := usage
-			resp := schema.OpenAIResponse{
-				ID:      id,
-				Created: created,
-				Model:   req.Model, // we have to return what the user sent here, due to OpenAI spec.
-				Choices: []schema.Choice{{Delta: delta, Index: 0, FinishReason: nil}},
-				Object:  "chat.completion.chunk",
-				Usage:   &usageForChunk,
-			}
-
-			responses <- resp
-			return true
-		})
-		close(responses)
-		return err
-	}
-	processTools := func(noAction string, prompt string, req *schema.OpenAIRequest, config *config.ModelConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool, id string, created int, textContentToReturn *string) error {
-		// Detect if thinking token is already in prompt or template
-		var template string
-		if config.TemplateConfig.UseTokenizerTemplate {
-			template = config.GetModelTemplate()
-		} else {
-			template = prompt
-		}
-		thinkingStartToken := reason.DetectThinkingStartToken(template, &config.ReasoningConfig)
-		extractor := reason.NewReasoningExtractor(thinkingStartToken, config.ReasoningConfig)
-
-		result := ""
-		lastEmittedCount := 0
-		sentInitialRole := false
-		sentReasoning := false
-		hasChatDeltaToolCalls := false
-		hasChatDeltaContent := false
-
-		_, _, chatDeltas, err := ComputeChoices(req, prompt, config, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
-			result += s
-
-			// Track whether ChatDeltas from the C++ autoparser contain
-			// tool calls or content, so the retry decision can account for them.
-			for _, d := range usage.ChatDeltas {
-				if len(d.ToolCalls) > 0 {
-					hasChatDeltaToolCalls = true
-				}
-				if d.Content != "" {
-					hasChatDeltaContent = true
-				}
-			}
-
-			var reasoningDelta, contentDelta string
-
-			goReasoning, goContent := extractor.ProcessToken(s)
-
-			if usage.HasChatDeltaContent() {
-				rawReasoning, cd := usage.ChatDeltaReasoningAndContent()
-				contentDelta = cd
-				reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
-			} else {
-				reasoningDelta = goReasoning
-				contentDelta = goContent
-			}
-
-			// Emit reasoning deltas in their own SSE chunks before any tool-call chunks
-			// (OpenAI spec: reasoning and tool_calls never share a delta)
-			if reasoningDelta != "" {
-				responses <- schema.OpenAIResponse{
-					ID:      id,
-					Created: created,
-					Model:   req.Model,
-					Choices: []schema.Choice{{
-						Delta: &schema.Message{Reasoning: &reasoningDelta},
-						Index: 0,
-					}},
-					Object: "chat.completion.chunk",
-				}
-				sentReasoning = true
-			}
-
-			// Stream content deltas (cleaned of reasoning tags) while no tool calls
-			// have been detected. Once the incremental parser finds tool calls,
-			// content stops — per OpenAI spec, content and tool_calls don't mix.
-			if lastEmittedCount == 0 && contentDelta != "" {
-				if !sentInitialRole {
-					responses <- schema.OpenAIResponse{
-						ID: id, Created: created, Model: req.Model,
-						Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
-						Object:  "chat.completion.chunk",
-					}
-					sentInitialRole = true
-				}
-				responses <- schema.OpenAIResponse{
-					ID: id, Created: created, Model: req.Model,
-					Choices: []schema.Choice{{
-						Delta: &schema.Message{Content: &contentDelta},
-						Index: 0,
-					}},
-					Object: "chat.completion.chunk",
-				}
-			}
-
-			// Try incremental XML parsing for streaming support using iterative parser
-			// This allows emitting partial tool calls as they're being generated
-			cleanedResult := functions.CleanupLLMResult(result, config.FunctionsConfig)
-
-			// Determine XML format from config
-			var xmlFormat *functions.XMLToolCallFormat
-			if config.FunctionsConfig.XMLFormat != nil {
-				xmlFormat = config.FunctionsConfig.XMLFormat
-			} else if config.FunctionsConfig.XMLFormatPreset != "" {
-				xmlFormat = functions.GetXMLFormatPreset(config.FunctionsConfig.XMLFormatPreset)
-			}
-
-			// Use iterative parser for streaming (partial parsing enabled)
-			// Try XML parsing first
-			partialResults, parseErr := functions.ParseXMLIterative(cleanedResult, xmlFormat, true)
-			if parseErr == nil && len(partialResults) > 0 {
-				// Emit new XML tool calls that weren't emitted before
-				if len(partialResults) > lastEmittedCount {
-					for i := lastEmittedCount; i < len(partialResults); i++ {
-						toolCall := partialResults[i]
-						initialMessage := schema.OpenAIResponse{
-							ID:      id,
-							Created: created,
-							Model:   req.Model,
-							Choices: []schema.Choice{{
-								Delta: &schema.Message{
-									Role: "assistant",
-									ToolCalls: []schema.ToolCall{
-										{
-											Index: i,
-											ID:    id,
-											Type:  "function",
-											FunctionCall: schema.FunctionCall{
-												Name: toolCall.Name,
-											},
-										},
-									},
-								},
-								Index:        0,
-								FinishReason: nil,
-							}},
-							Object: "chat.completion.chunk",
-						}
-						select {
-						case responses <- initialMessage:
-						default:
-						}
-					}
-					lastEmittedCount = len(partialResults)
-				}
-			} else {
-				// Try JSON tool call parsing for streaming.
-				// Only emit NEW tool calls (same guard as XML parser above).
-				jsonResults, jsonErr := functions.ParseJSONIterative(cleanedResult, true)
-				if jsonErr == nil && len(jsonResults) > lastEmittedCount {
-					for i := lastEmittedCount; i < len(jsonResults); i++ {
-						jsonObj := jsonResults[i]
-						name, ok := jsonObj["name"].(string)
-						if !ok || name == "" {
-							continue
-						}
-						args := "{}"
-						if argsVal, ok := jsonObj["arguments"]; ok {
-							if argsStr, ok := argsVal.(string); ok {
-								args = argsStr
-							} else {
-								argsBytes, _ := json.Marshal(argsVal)
-								args = string(argsBytes)
-							}
-						}
-						initialMessage := schema.OpenAIResponse{
-							ID:      id,
-							Created: created,
-							Model:   req.Model,
-							Choices: []schema.Choice{{
-								Delta: &schema.Message{
-									Role: "assistant",
-									ToolCalls: []schema.ToolCall{
-										{
-											Index: i,
-											ID:    id,
-											Type:  "function",
-											FunctionCall: schema.FunctionCall{
-												Name:      name,
-												Arguments: args,
-											},
-										},
-									},
-								},
-								Index:        0,
-								FinishReason: nil,
-							}},
-							Object: "chat.completion.chunk",
-						}
-						responses <- initialMessage
-					}
-					lastEmittedCount = len(jsonResults)
-				}
-			}
-			return true
-		},
-			func(attempt int) bool {
-				// After streaming completes: check if we got actionable content
-				cleaned := extractor.CleanedContent()
-				// Check for tool calls from chat deltas (will be re-checked after ComputeChoices,
-				// but we need to know here whether to retry).
-				// Also check ChatDelta flags — when the C++ autoparser is active,
-				// tool calls and content are delivered via ChatDeltas while the
-				// raw message is cleared. Without this check, we'd retry
-				// unnecessarily, losing valid results and concatenating output.
-				hasToolCalls := lastEmittedCount > 0 || hasChatDeltaToolCalls
-				hasContent := cleaned != "" || hasChatDeltaContent
-				if !hasContent && !hasToolCalls {
-					xlog.Warn("Streaming: backend produced only reasoning, retrying",
-						"reasoning_len", len(extractor.Reasoning()), "attempt", attempt+1)
-					extractor.ResetAndSuppressReasoning()
-					result = ""
-					lastEmittedCount = 0
-					sentInitialRole = false
-					hasChatDeltaToolCalls = false
-					hasChatDeltaContent = false
-					return true
-				}
-				return false
-			},
-		)
-		if err != nil {
-			return err
-		}
-		// Try using pre-parsed tool calls from C++ autoparser (chat deltas)
-		var functionResults []functions.FuncCallResults
-		var reasoning string
-
-		if deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas); len(deltaToolCalls) > 0 {
-			xlog.Debug("[ChatDeltas] Using pre-parsed tool calls from C++ autoparser", "count", len(deltaToolCalls))
-			functionResults = deltaToolCalls
-			// Use content/reasoning from deltas too
-			*textContentToReturn = functions.ContentFromChatDeltas(chatDeltas)
-			reasoning = functions.ReasoningFromChatDeltas(chatDeltas)
-		} else {
-			// Fallback: parse tool calls from raw text (no chat deltas from backend)
-			xlog.Debug("[ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing")
-			reasoning = extractor.Reasoning()
-			cleanedResult := extractor.CleanedContent()
-			*textContentToReturn = functions.ParseTextContent(cleanedResult, config.FunctionsConfig)
-			cleanedResult = functions.CleanupLLMResult(cleanedResult, config.FunctionsConfig)
-			functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
-		}
-		xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
-		// noAction is a sentinel "just answer" pseudo-function — not a real
-		// tool call. Scan the whole slice rather than only index 0 so we
-		// don't drop a real tool call that happens to follow a noAction
-		// entry, and so the default branch isn't entered with only noAction
-		// entries to emit as tool_calls.
-		noActionToRun := !hasRealCall(functionResults, noAction)
-
-		switch {
-		case noActionToRun:
-			// Token-cumulative usage is communicated to the streaming
-			// consumer via the per-token callback's chunk struct (stripped
-			// before wire marshal). The final usage trailer — when the
-			// caller opted in with stream_options.include_usage — is built
-			// by the outer streaming loop, not here.
-			var result string
-			if !sentInitialRole {
-				var hqErr error
-				result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
-				if hqErr != nil {
-					xlog.Error("error handling question", "error", hqErr)
-					return hqErr
-				}
-			}
-			for _, chunk := range buildNoActionFinalChunks(
-				id, req.Model, created,
-				sentInitialRole, sentReasoning,
-				result, reasoning,
-			) {
-				responses <- chunk
-			}
-
-		default:
-			for _, chunk := range buildDeferredToolCallChunks(
-				id, req.Model, created,
-				functionResults, lastEmittedCount,
-				sentInitialRole, *textContentToReturn,
-				sentReasoning, reasoning,
-			) {
-				responses <- chunk
-			}
-		}
-
-		close(responses)
-		return err
-	}
-
 	return func(c echo.Context) error {
 		var textContentToReturn string
 		id := uuid.New().String()
@@ -697,17 +340,19 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 				}

 				responses := make(chan schema.OpenAIResponse)
-				ended := make(chan error, 1)
+				ended := make(chan streamWorkerResult, 1)

 				go func() {
 					if !shouldUseFn {
-						ended <- process(predInput, input, config, ml, responses, extraUsage, id, created)
+						u, err := processStream(predInput, input, config, cl, startupOptions, ml, responses, id, created)
+						ended <- streamWorkerResult{usage: u, err: err}
 					} else {
-						ended <- processTools(noActionName, predInput, input, config, ml, responses, extraUsage, id, created, &textContentToReturn)
+						u, err := processStreamWithTools(noActionName, predInput, input, config, cl, startupOptions, ml, responses, id, created, &textContentToReturn)
+						ended <- streamWorkerResult{usage: u, err: err}
 					}
 				}()

-				usage := &schema.OpenAIUsage{}
+				var finalUsage backend.TokenUsage
 				toolsCalled := false
 				var collectedToolCalls []schema.ToolCall
 				var collectedContent string
@@ -725,13 +370,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 							xlog.Debug("No choices in the response, skipping")
 							continue
 						}
-						// Capture the running cumulative usage from this chunk
-						// (when present) so the include_usage trailer can carry
-						// the final totals. Usage is stripped before marshal
-						// below so the wire chunk stays spec-compliant.
-						if ev.Usage != nil {
-							usage = ev.Usage
-						}
 						if len(ev.Choices[0].Delta.ToolCalls) > 0 {
 							toolsCalled = true
 							// Collect and merge tool call deltas for MCP execution
@@ -747,11 +385,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 								collectedContent += *sp
 							}
 						}
-						// OpenAI streaming spec: intermediate chunks must NOT
-						// carry a `usage` field. Strip the tracking copy
-						// before marshalling — usage is delivered via the
-						// dedicated trailer chunk when include_usage=true.
-						ev.Usage = nil
 						respData, err := json.Marshal(ev)
 						if err != nil {
 							xlog.Debug("Failed to marshal response", "error", err)
@@ -766,15 +399,16 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 							return err
 						}
 						c.Response().Flush()
-					case err := <-ended:
-						if err == nil {
+					case res := <-ended:
+						if res.err == nil {
+							finalUsage = res.usage
 							break LOOP
 						}
-						xlog.Error("Stream ended with error", "error", err)
+						xlog.Error("Stream ended with error", "error", res.err)

 						errorResp := schema.ErrorResponse{
 							Error: &schema.APIError{
-								Message: err.Error(),
+								Message: res.err.Error(),
 								Type:    "server_error",
 								Code:    "server_error",
 							},
@@ -797,7 +431,10 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 				// still trying to send (e.g., after client disconnect). The goroutine
 				// calls close(responses) when done, which terminates the drain.
 				if input.Context.Err() != nil {
-					go func() { for range responses {} }()
+					go func() {
+						for range responses {
+						}
+					}()
 					<-ended
 				}

@@ -921,8 +558,16 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
 				// Trailing usage chunk per OpenAI spec: emit only when the
 				// caller opted in via stream_options.include_usage. Shape:
 				// {"choices":[],"usage":{...},"object":"chat.completion.chunk",...}
-				if input.StreamOptions != nil && input.StreamOptions.IncludeUsage && usage != nil {
-					trailer := streamUsageTrailerJSON(id, input.Model, created, *usage)
+				//
+				// finalUsage is the authoritative TokenUsage returned by the
+				// worker function (processStream / processStreamWithTools) via
+				// the `ended` channel. The worker reads it from ComputeChoices'
+				// return value, which is the cumulative count produced by the
+				// backend over the whole prediction. Issue #9927 was caused by
+				// the tools-path worker not surfacing this value at all.
+				if input.StreamOptions != nil && input.StreamOptions.IncludeUsage {
+					trailerUsage := streamUsageFromTokenUsage(finalUsage, extraUsage)
+					trailer := streamUsageTrailerJSON(id, input.Model, created, trailerUsage)
 					_, _ = fmt.Fprintf(c.Response().Writer, "data: %s\n\n", trailer)
 				}

--- a/core/http/endpoints/openai/chat_emit.go
+++ b/core/http/endpoints/openai/chat_emit.go
@@ -4,10 +4,39 @@ import (
 	"encoding/json"
 	"fmt"

+	"github.com/mudler/LocalAI/core/backend"
 	"github.com/mudler/LocalAI/core/schema"
 	"github.com/mudler/LocalAI/pkg/functions"
 )

+// streamWorkerResult is what processStream / processStreamWithTools hand
+// back to the outer ChatEndpoint loop through the `ended` channel.
+// Threading the final TokenUsage here, instead of piggy-backing it on the
+// `responses` SSE channel, keeps the SSE channel single-purpose (wire chunks)
+// and gives the trailer emitter a plain Go value to read after LOOP exits.
+// Fix for issue #9927: the previous tools-path worker never surfaced the
+// cumulative token counts at all, so the include_usage trailer reported zeros.
+type streamWorkerResult struct {
+	usage backend.TokenUsage
+	err   error
+}
+
+// streamUsageFromTokenUsage converts the backend's cumulative TokenUsage into
+// the OpenAI-spec OpenAIUsage shape used on the wire. `extraUsage` controls
+// whether the non-standard timing fields are forwarded.
+func streamUsageFromTokenUsage(usage backend.TokenUsage, extraUsage bool) schema.OpenAIUsage {
+	out := schema.OpenAIUsage{
+		PromptTokens:     usage.Prompt,
+		CompletionTokens: usage.Completion,
+		TotalTokens:      usage.Prompt + usage.Completion,
+	}
+	if extraUsage {
+		out.TimingTokenGeneration = usage.TimingTokenGeneration
+		out.TimingPromptProcessing = usage.TimingPromptProcessing
+	}
+	return out
+}
+
 // streamUsageTrailerJSON returns the bytes of the OpenAI-spec trailing usage
 // chunk emitted in streaming completions when the request opts in via
 // `stream_options.include_usage: true`. The shape is:
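Reviewer note: the `streamWorkerResult` hand-off above can be illustrated in isolation. The following is a minimal sketch, not the LocalAI implementation: `tokenUsage`, `workerResult`, `runWorker`, and `consume` are hypothetical stand-ins for `backend.TokenUsage`, `streamWorkerResult`, the streaming workers, and the `ChatEndpoint` loop. It assumes only what the diff shows: an unbuffered chunk channel that the worker closes, and a one-slot `ended` channel that carries the final usage and error.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenUsage is a hypothetical stand-in for backend.TokenUsage.
type tokenUsage struct {
	Prompt     int
	Completion int
}

// workerResult mirrors the shape of streamWorkerResult in the diff.
type workerResult struct {
	usage tokenUsage
	err   error
}

// runWorker is a hypothetical worker: it streams chunks, closes the chunk
// channel on return, and hands back the final usage as a plain return value.
func runWorker(chunks chan<- string, fail bool) (tokenUsage, error) {
	defer close(chunks)
	for _, c := range []string{"Hel", "lo"} {
		chunks <- c
	}
	if fail {
		return tokenUsage{}, errors.New("backend failed")
	}
	return tokenUsage{Prompt: 18, Completion: 213}, nil
}

// consume drains chunks until the worker reports completion, then returns
// the concatenated text together with the worker's result.
func consume(fail bool) (string, workerResult) {
	chunks := make(chan string)         // unbuffered, like `responses`
	ended := make(chan workerResult, 1) // one slot, so the worker never blocks on it

	go func() {
		u, err := runWorker(chunks, fail)
		ended <- workerResult{usage: u, err: err}
	}()

	var text string
	for {
		select {
		case c, ok := <-chunks:
			if !ok {
				chunks = nil // closed: a nil channel is never selected again
				continue
			}
			text += c
		case res := <-ended:
			return text, res
		}
	}
}

func main() {
	text, res := consume(false)
	fmt.Println(text, res.usage.Prompt+res.usage.Completion, res.err)
}
```

Because the usage travels on `ended` rather than on the chunk channel, the consumer never has to inspect or strip per-chunk bookkeeping before marshalling, which is the design choice the comment on `streamWorkerResult` describes.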
--- a/core/http/endpoints/openai/chat_stream_usage_test.go
+++ b/core/http/endpoints/openai/chat_stream_usage_test.go
@@ -1,10 +1,14 @@
 package openai

 import (
+	"context"
 	"encoding/json"

+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/core/schema"
 	"github.com/mudler/LocalAI/pkg/functions"
+	"github.com/mudler/LocalAI/pkg/model"
 	. "github.com/onsi/ginkgo/v2"
 	. "github.com/onsi/gomega"
 )
@@ -152,6 +156,28 @@ var _ = Describe("streaming usage spec compliance", func() {
 		})
 	})

+	Describe("streamUsageFromTokenUsage", func() {
+		It("converts backend TokenUsage to schema OpenAIUsage", func() {
+			tu := backend.TokenUsage{Prompt: 18, Completion: 213}
+			u := streamUsageFromTokenUsage(tu, false)
+			Expect(u.PromptTokens).To(Equal(18))
+			Expect(u.CompletionTokens).To(Equal(213))
+			Expect(u.TotalTokens).To(Equal(231))
+			Expect(u.TimingTokenGeneration).To(BeZero())
+			Expect(u.TimingPromptProcessing).To(BeZero())
+		})
+		It("includes timings when extraUsage is true", func() {
+			tu := backend.TokenUsage{
+				Prompt: 10, Completion: 20,
+				TimingPromptProcessing: 0.5,
+				TimingTokenGeneration:  1.5,
+			}
+			u := streamUsageFromTokenUsage(tu, true)
+			Expect(u.TimingPromptProcessing).To(Equal(0.5))
+			Expect(u.TimingTokenGeneration).To(Equal(1.5))
+		})
+	})
+
 	Describe("OpenAIRequest.StreamOptions", func() {
 		It("parses stream_options.include_usage=true", func() {
 			body := []byte(`{
@@ -177,3 +203,160 @@ var _ = Describe("streaming usage spec compliance", func() {
 		})
 	})
 })
+
+// Functional regression coverage for issue #9927: the streaming workers
+// must surface the cumulative TokenUsage returned by ComputeChoices to
+// their caller. The earlier tools-path implementation discarded that value
+// (`_, _, chatDeltas, err := ComputeChoices(...)`), dropping the counts
+// on the floor, so the include_usage trailer always reported zeros
+// when tools were enabled.
+//
+// These tests stub backend.ModelInferenceFunc so the worker exercises the
+// real ComputeChoices → predFunc → LLMResponse pipeline. If a future change
+// drops the TokenUsage somewhere along that path, the assertions on the
+// returned value fail with a concrete count mismatch (e.g. 0 vs 213),
+// not with a "function undefined" compile error.
+var _ = Describe("streaming workers surface final TokenUsage (issue #9927)", func() {
+	var (
+		origInference modelInferenceFunc
+		appCfg        *config.ApplicationConfig
+	)
+
+	BeforeEach(func() {
+		origInference = backend.ModelInferenceFunc
+		appCfg = config.NewApplicationConfig()
+	})
+
+	AfterEach(func() {
+		backend.ModelInferenceFunc = origInference
+	})
+
+	// mockBackendUsage installs a stub backend that yields one LLMResponse
+	// carrying the supplied TokenUsage. ComputeChoices' single-attempt path
+	// copies these counts into the value it returns to the worker.
+	mockBackendUsage := func(usage backend.TokenUsage, response string) {
+		backend.ModelInferenceFunc = func(
+			ctx context.Context, s string, messages schema.Messages,
+			images, videos, audios []string,
+			loader *model.ModelLoader, c *config.ModelConfig, cl *config.ModelConfigLoader,
+			o *config.ApplicationConfig,
+			tokenCallback func(string, backend.TokenUsage) bool,
+			tools, toolChoice string,
+			logprobs, topLogprobs *int,
+			logitBias map[string]float64,
+			metadata map[string]string,
+		) (func() (backend.LLMResponse, error), error) {
+			return func() (backend.LLMResponse, error) {
+				return backend.LLMResponse{
+					Response: response,
+					Usage:    usage,
+				}, nil
+			}, nil
+		}
+	}
+
+	makeReq := func() *schema.OpenAIRequest {
+		ctx, cancel := context.WithCancel(context.Background())
+		req := &schema.OpenAIRequest{
+			Context: ctx,
+			Cancel:  cancel,
+		}
+		req.Model = "test-model" // promoted from BasicModelRequest
+		return req
+	}
+
+	// drainResponses consumes everything the worker pushes onto the channel
+	// so the worker is never blocked on its send. The channel is unbuffered
+	// (matching production), so the drain goroutine must be running before
+	// the worker is called.
+	drainResponses := func(ch <-chan schema.OpenAIResponse) <-chan struct{} {
+		done := make(chan struct{})
+		go func() {
+			for range ch {
+			}
+			close(done)
+		}()
+		return done
+	}
+
+	Describe("processStream (no-tools path)", func() {
+		It("returns the cumulative TokenUsage produced by the backend", func() {
+			mockBackendUsage(backend.TokenUsage{Prompt: 18, Completion: 213}, "Hello there")
+
+			req := makeReq()
+			cfg := &config.ModelConfig{}
+			responses := make(chan schema.OpenAIResponse)
+			done := drainResponses(responses)
+
+			actual, err := processStream("prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0)
+			<-done
+
+			Expect(err).ToNot(HaveOccurred())
+			Expect(actual.Prompt).To(Equal(18),
+				"prompt tokens must round-trip from backend through processStream")
+			Expect(actual.Completion).To(Equal(213),
+				"completion tokens must round-trip from backend through processStream")
+		})
+
+		It("returns zero TokenUsage when the backend reports zero (negative control)", func() {
+			mockBackendUsage(backend.TokenUsage{}, "x")
+
+			req := makeReq()
+			cfg := &config.ModelConfig{}
+			responses := make(chan schema.OpenAIResponse)
+			done := drainResponses(responses)
+
+			actual, err := processStream("prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0)
+			<-done
+
+			Expect(err).ToNot(HaveOccurred())
+			Expect(actual.Prompt).To(BeZero())
+			Expect(actual.Completion).To(BeZero())
+		})
+	})
+
+	Describe("processStreamWithTools (tools path)", func() {
+		It("returns the cumulative TokenUsage produced by the backend", func() {
+			// This is the direct regression check for issue #9927: with tools
+			// enabled, the trailer was reporting {0,0,0} because the worker
+			// discarded ComputeChoices' second return value.
+			mockBackendUsage(backend.TokenUsage{Prompt: 18, Completion: 213}, "answer")
+
+			req := makeReq()
+			cfg := &config.ModelConfig{}
+			responses := make(chan schema.OpenAIResponse)
+			done := drainResponses(responses)
+			var textContent string
+
+			actual, err := processStreamWithTools("none", "prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0, &textContent)
+			<-done
+
+			Expect(err).ToNot(HaveOccurred())
+			Expect(actual.Prompt).To(Equal(18),
+				"prompt tokens must round-trip from backend through processStreamWithTools (issue #9927)")
+			Expect(actual.Completion).To(Equal(213),
+				"completion tokens must round-trip from backend through processStreamWithTools (issue #9927)")
+		})
+
+		It("forwards timing fields when the backend supplies them", func() {
+			mockBackendUsage(backend.TokenUsage{
+				Prompt: 10, Completion: 20,
+				TimingPromptProcessing: 0.5,
+				TimingTokenGeneration:  1.5,
+			}, "answer")
+
+			req := makeReq()
+			cfg := &config.ModelConfig{}
+			responses := make(chan schema.OpenAIResponse)
+			done := drainResponses(responses)
+			var textContent string
+
+			actual, err := processStreamWithTools("none", "prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0, &textContent)
+			<-done
+
+			Expect(err).ToNot(HaveOccurred())
+			Expect(actual.TimingPromptProcessing).To(Equal(0.5))
+			Expect(actual.TimingTokenGeneration).To(Equal(1.5))
+		})
+	})
+})
--- a/core/http/endpoints/openai/chat_stream_workers.go
+++ b/core/http/endpoints/openai/chat_stream_workers.go
@@ -0,0 +1,390 @@
+package openai
+
+import (
+	"encoding/json"
+
+	"github.com/mudler/LocalAI/core/backend"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/pkg/functions"
+	"github.com/mudler/LocalAI/pkg/model"
+	reason "github.com/mudler/LocalAI/pkg/reasoning"
+	"github.com/mudler/xlog"
+)
+
+// processStream is the streaming worker for chat completions with no
+// tool/function calling involved. It pushes SSE-shaped chunks onto
+// `responses` and returns the authoritative cumulative TokenUsage from
+// the prediction so the caller can populate the include_usage trailer
+// without having to peek inside the chunks.
+//
+// The caller owns the `responses` channel and is expected to read from
+// it while this function runs; processStream closes the channel before
+// returning.
+func processStream(
+	s string,
+	req *schema.OpenAIRequest,
+	cfg *config.ModelConfig,
+	cl *config.ModelConfigLoader,
+	startupOptions *config.ApplicationConfig,
+	loader *model.ModelLoader,
+	responses chan schema.OpenAIResponse,
+	id string,
+	created int,
+) (backend.TokenUsage, error) {
+	responses <- schema.OpenAIResponse{
+		ID:      id,
+		Created: created,
+		Model:   req.Model, // we have to return what the user sent here, due to OpenAI spec.
+		Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0, FinishReason: nil}},
+		Object:  "chat.completion.chunk",
+	}
+
+	// Detect if thinking token is already in prompt or template
+	// When UseTokenizerTemplate is enabled, predInput is empty, so we check the template
+	var template string
+	if cfg.TemplateConfig.UseTokenizerTemplate {
+		template = cfg.GetModelTemplate()
+	} else {
+		template = s
+	}
+	thinkingStartToken := reason.DetectThinkingStartToken(template, &cfg.ReasoningConfig)
+	extractor := reason.NewReasoningExtractor(thinkingStartToken, cfg.ReasoningConfig)
+
+	_, finalUsage, _, err := ComputeChoices(req, s, cfg, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
+		var reasoningDelta, contentDelta string
+
+		// Always keep the Go-side extractor in sync with raw tokens so it
+		// can serve as fallback for backends without an autoparser (e.g. vLLM).
+		goReasoning, goContent := extractor.ProcessToken(s)
+
+		// When C++ autoparser chat deltas are available, prefer them: they
+		// handle model-specific formats (Gemma 4, etc.) without Go-side tags.
+		// Otherwise fall back to Go-side extraction.
+		if tokenUsage.HasChatDeltaContent() {
+			rawReasoning, cd := tokenUsage.ChatDeltaReasoningAndContent()
+			contentDelta = cd
+			reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
+		} else {
+			reasoningDelta = goReasoning
+			contentDelta = goContent
+		}
+
+		delta := &schema.Message{}
+		if contentDelta != "" {
+			delta.Content = &contentDelta
+		}
+		if reasoningDelta != "" {
+			delta.Reasoning = &reasoningDelta
+		}
+
+		responses <- schema.OpenAIResponse{
+			ID:      id,
+			Created: created,
+			Model:   req.Model, // we have to return what the user sent here, due to OpenAI spec.
+			Choices: []schema.Choice{{Delta: delta, Index: 0, FinishReason: nil}},
+			Object:  "chat.completion.chunk",
+		}
+		return true
+	})
+	close(responses)
+	return finalUsage, err
+}
+
+// processStreamWithTools is the streaming worker for chat completions
+// with tools / function calling. Same contract as processStream: pushes
+// chunks onto `responses`, closes the channel, returns the cumulative
+// TokenUsage.
+//
+// Returning the TokenUsage as a normal Go value (rather than smuggling
+// it on a sentinel chunk) is the fix for issue #9927: the previous
+// implementation discarded the value from ComputeChoices, so the
+// include_usage trailer reported zeros whenever `tools` was in play.
+func processStreamWithTools(
+	noAction string,
+	prompt string,
+	req *schema.OpenAIRequest,
+	cfg *config.ModelConfig,
+	cl *config.ModelConfigLoader,
+	startupOptions *config.ApplicationConfig,
+	loader *model.ModelLoader,
+	responses chan schema.OpenAIResponse,
+	id string,
+	created int,
+	textContentToReturn *string,
+) (backend.TokenUsage, error) {
+	// Detect if thinking token is already in prompt or template
+	var template string
+	if cfg.TemplateConfig.UseTokenizerTemplate {
+		template = cfg.GetModelTemplate()
+	} else {
+		template = prompt
+	}
+	thinkingStartToken := reason.DetectThinkingStartToken(template, &cfg.ReasoningConfig)
+	extractor := reason.NewReasoningExtractor(thinkingStartToken, cfg.ReasoningConfig)
+
+	result := ""
+	lastEmittedCount := 0
+	sentInitialRole := false
+	sentReasoning := false
+	hasChatDeltaToolCalls := false
+	hasChatDeltaContent := false
+
+	_, finalUsage, chatDeltas, err := ComputeChoices(req, prompt, cfg, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
+		result += s
+
+		// Track whether ChatDeltas from the C++ autoparser contain
+		// tool calls or content, so the retry decision can account for them.
+		for _, d := range usage.ChatDeltas {
+			if len(d.ToolCalls) > 0 {
+				hasChatDeltaToolCalls = true
+			}
+			if d.Content != "" {
+				hasChatDeltaContent = true
+			}
+		}
+
+		var reasoningDelta, contentDelta string
+
+		goReasoning, goContent := extractor.ProcessToken(s)
+
+		if usage.HasChatDeltaContent() {
+			rawReasoning, cd := usage.ChatDeltaReasoningAndContent()
+			contentDelta = cd
+			reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
+		} else {
+			reasoningDelta = goReasoning
+			contentDelta = goContent
+		}
+
+		// Emit reasoning deltas in their own SSE chunks before any tool-call chunks
+		// (OpenAI spec: reasoning and tool_calls never share a delta)
+		if reasoningDelta != "" {
+			responses <- schema.OpenAIResponse{
+				ID:      id,
+				Created: created,
+				Model:   req.Model,
+				Choices: []schema.Choice{{
+					Delta: &schema.Message{Reasoning: &reasoningDelta},
+					Index: 0,
+				}},
+				Object: "chat.completion.chunk",
+			}
+			sentReasoning = true
+		}
+
+		// Stream content deltas (cleaned of reasoning tags) while no tool calls
+		// have been detected. Once the incremental parser finds tool calls,
+		// content stops: per OpenAI spec, content and tool_calls don't mix.
+		if lastEmittedCount == 0 && contentDelta != "" {
+			if !sentInitialRole {
+				responses <- schema.OpenAIResponse{
+					ID: id, Created: created, Model: req.Model,
+					Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
+					Object:  "chat.completion.chunk",
+				}
+				sentInitialRole = true
+			}
+			responses <- schema.OpenAIResponse{
+				ID: id, Created: created, Model: req.Model,
+				Choices: []schema.Choice{{
+					Delta: &schema.Message{Content: &contentDelta},
+					Index: 0,
+				}},
+				Object: "chat.completion.chunk",
+			}
+		}
+
+		// Try incremental XML parsing with the iterative parser so partial
+		// tool calls can be emitted while they are still being generated.
+		cleanedResult := functions.CleanupLLMResult(result, cfg.FunctionsConfig)
+
+		// Determine XML format from config
+		var xmlFormat *functions.XMLToolCallFormat
+		if cfg.FunctionsConfig.XMLFormat != nil {
+			xmlFormat = cfg.FunctionsConfig.XMLFormat
+		} else if cfg.FunctionsConfig.XMLFormatPreset != "" {
+			xmlFormat = functions.GetXMLFormatPreset(cfg.FunctionsConfig.XMLFormatPreset)
+		}
+
+		// Use iterative parser for streaming (partial parsing enabled)
+		// Try XML parsing first
+		partialResults, parseErr := functions.ParseXMLIterative(cleanedResult, xmlFormat, true)
+		if parseErr == nil && len(partialResults) > 0 {
+			// Emit new XML tool calls that weren't emitted before
+			if len(partialResults) > lastEmittedCount {
+				for i := lastEmittedCount; i < len(partialResults); i++ {
+					toolCall := partialResults[i]
+					initialMessage := schema.OpenAIResponse{
+						ID:      id,
+						Created: created,
+						Model:   req.Model,
+						Choices: []schema.Choice{{
+							Delta: &schema.Message{
+								Role: "assistant",
+								ToolCalls: []schema.ToolCall{
+									{
+										Index: i,
+										ID:    id,
+										Type:  "function",
+										FunctionCall: schema.FunctionCall{
+											Name: toolCall.Name,
+										},
+									},
+								},
+							},
+							Index:        0,
+							FinishReason: nil,
+						}},
+						Object: "chat.completion.chunk",
+					}
+					select {
+					case responses <- initialMessage:
+					default:
+					}
+				}
+				lastEmittedCount = len(partialResults)
+			}
+		} else {
+			// Try JSON tool call parsing for streaming.
+			// Only emit NEW tool calls (same guard as XML parser above).
+			jsonResults, jsonErr := functions.ParseJSONIterative(cleanedResult, true)
+			if jsonErr == nil && len(jsonResults) > lastEmittedCount {
+				for i := lastEmittedCount; i < len(jsonResults); i++ {
+					jsonObj := jsonResults[i]
+					name, ok := jsonObj["name"].(string)
+					if !ok || name == "" {
+						continue
+					}
+					args := "{}"
+					if argsVal, ok := jsonObj["arguments"]; ok {
+						if argsStr, ok := argsVal.(string); ok {
+							args = argsStr
+						} else {
+							argsBytes, _ := json.Marshal(argsVal)
+							args = string(argsBytes)
+						}
+					}
+					initialMessage := schema.OpenAIResponse{
+						ID:      id,
+						Created: created,
+						Model:   req.Model,
+						Choices: []schema.Choice{{
+							Delta: &schema.Message{
+								Role: "assistant",
+								ToolCalls: []schema.ToolCall{
+									{
+										Index: i,
+										ID:    id,
+										Type:  "function",
+										FunctionCall: schema.FunctionCall{
+											Name:      name,
+											Arguments: args,
+										},
+									},
+								},
+							},
+							Index:        0,
+							FinishReason: nil,
+						}},
+						Object: "chat.completion.chunk",
+					}
+					responses <- initialMessage
+				}
+				lastEmittedCount = len(jsonResults)
+			}
+		}
+		return true
+	},
+		func(attempt int) bool {
+			// After streaming completes: check if we got actionable content
+			cleaned := extractor.CleanedContent()
+			// Check for tool calls from chat deltas (will be re-checked after ComputeChoices,
+			// but we need to know here whether to retry).
+			// Also check ChatDelta flags: when the C++ autoparser is active,
+			// tool calls and content are delivered via ChatDeltas while the
+			// raw message is cleared. Without this check, we'd retry
+			// unnecessarily, losing valid results and concatenating output.
+			hasToolCalls := lastEmittedCount > 0 || hasChatDeltaToolCalls
+			hasContent := cleaned != "" || hasChatDeltaContent
+			if !hasContent && !hasToolCalls {
+				xlog.Warn("Streaming: backend produced only reasoning, retrying",
+					"reasoning_len", len(extractor.Reasoning()), "attempt", attempt+1)
+				extractor.ResetAndSuppressReasoning()
+				result = ""
+				lastEmittedCount = 0
+				sentInitialRole = false
+				hasChatDeltaToolCalls = false
+				hasChatDeltaContent = false
+				return true
+			}
+			return false
+		},
+	)
+	if err != nil {
+		return finalUsage, err
+	}
+	// Try using pre-parsed tool calls from C++ autoparser (chat deltas)
+	var functionResults []functions.FuncCallResults
+	var reasoning string
+
+	if deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas); len(deltaToolCalls) > 0 {
+		xlog.Debug("[ChatDeltas] Using pre-parsed tool calls from C++ autoparser", "count", len(deltaToolCalls))
+		functionResults = deltaToolCalls
+		// Use content/reasoning from deltas too
+		*textContentToReturn = functions.ContentFromChatDeltas(chatDeltas)
+		reasoning = functions.ReasoningFromChatDeltas(chatDeltas)
+	} else {
+		// Fallback: parse tool calls from raw text (no chat deltas from backend)
+		xlog.Debug("[ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing")
+		reasoning = extractor.Reasoning()
+		cleanedResult := extractor.CleanedContent()
+		*textContentToReturn = functions.ParseTextContent(cleanedResult, cfg.FunctionsConfig)
+		cleanedResult = functions.CleanupLLMResult(cleanedResult, cfg.FunctionsConfig)
+		functionResults = functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
+	}
+	xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
+	// noAction is a sentinel "just answer" pseudo-function: not a real
+	// tool call. Scan the whole slice rather than only index 0 so we
+	// don't drop a real tool call that happens to follow a noAction
+	// entry, and so the default branch isn't entered with only noAction
+	// entries to emit as tool_calls.
+	noActionToRun := !hasRealCall(functionResults, noAction)
+
+	switch {
+	case noActionToRun:
+		// The final usage trailer (when the caller opted in with
+		// stream_options.include_usage) is built by the outer streaming
+		// loop from the TokenUsage this function returns, not from any
+		// chunk on the responses channel.
+		var result string
+		if !sentInitialRole {
+			var hqErr error
+			result, hqErr = handleQuestion(cfg, functionResults, extractor.CleanedContent(), prompt)
+			if hqErr != nil {
+				xlog.Error("error handling question", "error", hqErr)
+				return finalUsage, hqErr
+			}
+		}
+		for _, chunk := range buildNoActionFinalChunks(
+			id, req.Model, created,
+			sentInitialRole, sentReasoning,
+			result, reasoning,
+		) {
+			responses <- chunk
+		}
+
+	default:
+		for _, chunk := range buildDeferredToolCallChunks(
+			id, req.Model, created,
+			functionResults, lastEmittedCount,
+			sentInitialRole, *textContentToReturn,
+			sentReasoning, reasoning,
+		) {
+			responses <- chunk
+		}
+	}
+
+	close(responses)
+	return finalUsage, err
+}
--- a/core/http/middleware/trace.go
+++ b/core/http/middleware/trace.go
@@ -17,16 +17,20 @@ import (
 )

 type APIExchangeRequest struct {
-	Method  string       `json:"method"`
-	Path    string       `json:"path"`
-	Headers *http.Header `json:"headers"`
-	Body    *[]byte      `json:"body"`
+	Method        string       `json:"method"`
+	Path          string       `json:"path"`
+	Headers       *http.Header `json:"headers"`
+	Body          *[]byte      `json:"body"`
+	BodyTruncated bool         `json:"body_truncated,omitempty"`
+	BodyBytes     int          `json:"body_bytes,omitempty"` // original size before truncation
 }

 type APIExchangeResponse struct {
-	Status  int          `json:"status"`
-	Headers *http.Header `json:"headers"`
-	Body    *[]byte      `json:"body"`
+	Status        int          `json:"status"`
+	Headers       *http.Header `json:"headers"`
+	Body          *[]byte      `json:"body"`
+	BodyTruncated bool         `json:"body_truncated,omitempty"`
+	BodyBytes     int          `json:"body_bytes,omitempty"` // original size before truncation
 }

 type APIExchange struct {
@@ -66,11 +70,29 @@ var doInitializeTracing = sync.OnceFunc(func() {

 type bodyWriter struct {
 	http.ResponseWriter
-	body *bytes.Buffer
+	body       *bytes.Buffer
+	maxBytes   int // 0 = unlimited capture
+	truncated  bool
+	totalBytes int // bytes the upstream handler wrote, even past the cap
 }

 func (w *bodyWriter) Write(b []byte) (int, error) {
-	w.body.Write(b)
+	// Capture into the trace buffer up to maxBytes, then drop the overflow
+	// so a chatty endpoint can't grow the buffer without bound. The full
+	// payload still flows through to the real client below.
+	w.totalBytes += len(b)
+	if w.maxBytes <= 0 {
+		w.body.Write(b)
+	} else if remain := w.maxBytes - w.body.Len(); remain > 0 {
+		if remain >= len(b) {
+			w.body.Write(b)
+		} else {
+			w.body.Write(b[:remain])
+			w.truncated = true
+		}
+	} else {
+		w.truncated = true
+	}
 	return w.ResponseWriter.Write(b)
 }

@@ -80,6 +102,20 @@ func (w *bodyWriter) Flush() {
 	}
 }

+// truncateForTrace returns a defensive copy of body capped at maxBytes,
+// and a flag indicating whether the cap forced truncation. maxBytes <= 0
+// disables the cap.
+func truncateForTrace(body []byte, maxBytes int) ([]byte, bool) {
+	if maxBytes <= 0 || len(body) <= maxBytes {
+		out := make([]byte, len(body))
+		copy(out, body)
+		return out, false
+	}
+	out := make([]byte, maxBytes)
+	copy(out, body[:maxBytes])
+	return out, true
+}
+
 func initializeTracing(maxItems int) {
 	tracingMaxItems = maxItems
 	doInitializeTracing()
@@ -134,11 +170,18 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {

 			startTime := time.Now()

+			// Cap captured payload size. Without this, /embeddings and
+			// streaming /chat/completions blow the in-memory buffer into the
+			// tens of MB, which then locks the admin Traces UI: fetching the
+			// JSON dump takes longer than the 5s auto-refresh interval.
+			maxBodyBytes := app.ApplicationConfig().TracingMaxBodyBytes
+
 			// Wrap response writer to capture body
 			resBody := new(bytes.Buffer)
 			mw := &bodyWriter{
 				ResponseWriter: c.Response().Writer,
 				body:           resBody,
+				maxBytes:       maxBodyBytes,
 			}
 			c.Response().Writer = mw

@@ -159,8 +202,7 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
 			// via any heap-dump-style introspection, and tokens shouldn't
 			// outlive the request that carried them.
 			requestHeaders := redactSensitiveHeaders(c.Request().Header)
-			requestBody := make([]byte, len(body))
-			copy(requestBody, body)
+			requestBody, requestTruncated := truncateForTrace(body, maxBodyBytes)
 			responseHeaders := redactSensitiveHeaders(c.Response().Header())
 			responseBody := make([]byte, resBody.Len())
 			copy(responseBody, resBody.Bytes())
@@ -168,15 +210,19 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
 				Timestamp: startTime,
 				Duration:  time.Since(startTime),
 				Request: APIExchangeRequest{
-					Method:  c.Request().Method,
-					Path:    c.Path(),
-					Headers: &requestHeaders,
-					Body:    &requestBody,
+					Method:        c.Request().Method,
+					Path:          c.Path(),
+					Headers:       &requestHeaders,
+					Body:          &requestBody,
+					BodyTruncated: requestTruncated,
+					BodyBytes:     len(body),
 				},
 				Response: APIExchangeResponse{
-					Status:  status,
-					Headers: &responseHeaders,
-					Body:    &responseBody,
+					Status:        status,
+					Headers:       &responseHeaders,
+					Body:          &responseBody,
+					BodyTruncated: mw.truncated,
+					BodyBytes:     mw.totalBytes,
 				},
 			}
 			if handlerErr != nil {
--- a/core/http/middleware/trace_body_cap_test.go
+++ b/core/http/middleware/trace_body_cap_test.go
@@ -0,0 +1,116 @@
+package middleware
+
+import (
+	"bytes"
+	"net/http/httptest"
+	"strings"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// The trace middleware copies request and response bodies into an in-memory
+// buffer that backs the admin /api/traces endpoint. With no upper bound a
+// chatty workload (embeddings, large completions) trivially produces a
+// multi-MB response that locks the Traces UI in a loading state — fetching
+// and parsing the payload outruns the 5-second auto-refresh. These specs
+// pin the capping contract so future refactors keep both the cap and the
+// passthrough to the real client intact.
+
+var _ = Describe("bodyWriter capping", func() {
+	It("captures the full body when maxBytes is 0 (unlimited)", func() {
+		downstream := httptest.NewRecorder()
+		buf := &bytes.Buffer{}
+		bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 0}
+
+		payload := []byte(strings.Repeat("x", 4096))
+		n, err := bw.Write(payload)
+
+		Expect(err).ToNot(HaveOccurred())
+		Expect(n).To(Equal(len(payload)))
+		Expect(buf.Len()).To(Equal(len(payload)))
+		Expect(downstream.Body.Len()).To(Equal(len(payload)))
+		Expect(bw.truncated).To(BeFalse())
+	})
+
+	It("stops appending to the trace buffer once maxBytes is reached but still forwards to the client", func() {
+		downstream := httptest.NewRecorder()
+		buf := &bytes.Buffer{}
+		bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 100}
+
+		payload := []byte(strings.Repeat("a", 250))
+		n, err := bw.Write(payload)
+
+		Expect(err).ToNot(HaveOccurred())
+		Expect(n).To(Equal(len(payload)), "Write must return the full byte count so callers see no short write")
+		Expect(buf.Len()).To(Equal(100), "trace buffer should hold exactly maxBytes")
+		Expect(downstream.Body.Len()).To(Equal(len(payload)), "client must still receive every byte")
+		Expect(bw.truncated).To(BeTrue())
+	})
+
+	It("handles a write that straddles the cap by keeping only the leading slice", func() {
+		downstream := httptest.NewRecorder()
+		buf := &bytes.Buffer{}
+		bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 10}
+
+		_, err := bw.Write([]byte("12345"))
+		Expect(err).ToNot(HaveOccurred())
+		Expect(bw.truncated).To(BeFalse())
+
+		_, err = bw.Write([]byte("67890ABCDE"))
+		Expect(err).ToNot(HaveOccurred())
+
+		Expect(buf.String()).To(Equal("1234567890"))
+		Expect(downstream.Body.String()).To(Equal("1234567890ABCDE"))
+		Expect(bw.truncated).To(BeTrue())
+	})
+
+	It("ignores further writes after the cap was already hit", func() {
+		downstream := httptest.NewRecorder()
+		buf := &bytes.Buffer{}
+		bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 4}
+
+		_, _ = bw.Write([]byte("AAAA"))
+		_, _ = bw.Write([]byte("BBBB"))
+		_, _ = bw.Write([]byte("CCCC"))
+
+		Expect(buf.String()).To(Equal("AAAA"))
+		Expect(downstream.Body.String()).To(Equal("AAAABBBBCCCC"))
+		Expect(bw.truncated).To(BeTrue())
+	})
+})
+
+var _ = Describe("truncateForTrace", func() {
+	It("returns the input unchanged when below the cap", func() {
+		in := []byte("hello")
+		out, truncated := truncateForTrace(in, 1024)
+		Expect(truncated).To(BeFalse())
+		Expect(out).To(Equal(in))
+	})
+
+	It("truncates when the input exceeds the cap and signals truncation", func() {
+		in := []byte(strings.Repeat("z", 200))
+		out, truncated := truncateForTrace(in, 64)
+		Expect(truncated).To(BeTrue())
+		Expect(out).To(HaveLen(64))
+		Expect(string(out)).To(Equal(strings.Repeat("z", 64)))
+	})
+
+	It("treats maxBytes <= 0 as unlimited (back-compat with current default)", func() {
+		in := []byte(strings.Repeat("q", 10_000))
+		out, truncated := truncateForTrace(in, 0)
+		Expect(truncated).To(BeFalse())
+		Expect(out).To(HaveLen(len(in)))
+	})
+
+	It("does not retain the caller's backing array (defensive copy)", func() {
+		in := []byte("abcdefghij")
+		out, truncated := truncateForTrace(in, 4)
+		Expect(truncated).To(BeTrue())
+		Expect(string(out)).To(Equal("abcd"))
+
+		// Mutating the source must not corrupt the trace copy.
+		in[0] = 'Z'
+		Expect(string(out)).To(Equal("abcd"))
+	})
+})
--- a/core/http/middleware/usage.go
+++ b/core/http/middleware/usage.go
@@ -4,6 +4,7 @@ import (
 	"bytes"
 	"encoding/json"
 	"sync"
+	"sync/atomic"
 	"time"

 	"github.com/labstack/echo/v4"
@@ -14,18 +15,37 @@ import (

 const (
 	usageFlushInterval = 5 * time.Second
-	usageMaxPending    = 5000
+	// usageMaxPending bounds the in-memory queue. Sized for bursty inference
+	// traffic on a self-hosted instance with a slow or unavailable DB.
+	usageMaxPending = 50000
 )

 // usageBatcher accumulates usage records and flushes them to the DB periodically.
 type usageBatcher struct {
-	mu      sync.Mutex
-	pending []*auth.UsageRecord
-	db      *gorm.DB
+	mu       sync.Mutex
+	pending  []*auth.UsageRecord
+	db       *gorm.DB
+	stop     chan struct{}
+	done     chan struct{}
+	stopOnce sync.Once
 }

+// droppedRecords counts records discarded because the in-memory queue was full.
+// Used to rate-limit the warn log so a sustained outage doesn't flood it.
+var droppedRecords atomic.Uint64
+
 func (b *usageBatcher) add(r *auth.UsageRecord) {
 	b.mu.Lock()
+	if len(b.pending) >= usageMaxPending {
+		b.mu.Unlock()
+		// Rate-limit: one warn per 1024 drops keeps the log readable.
+		n := droppedRecords.Add(1)
+		if n&1023 == 1 {
+			xlog.Warn("usage batcher full, dropping record",
+				"cap", usageMaxPending, "total_dropped", n)
+		}
+		return
+	}
 	b.pending = append(b.pending, r)
 	b.mu.Unlock()
 }
@@ -42,31 +62,102 @@ func (b *usageBatcher) flush() {

 	if err := b.db.Create(&batch).Error; err != nil {
 		xlog.Error("Failed to flush usage batch", "count", len(batch), "error", err)
-		// Re-queue failed records with a cap to avoid unbounded growth
+		// Cap-aware re-queue: prepend as much of the failed batch as fits
+		// alongside any records added concurrently with the failed write.
 		b.mu.Lock()
-		if len(b.pending) < usageMaxPending {
-			b.pending = append(batch, b.pending...)
+		room := usageMaxPending - len(b.pending)
+		if room > 0 {
+			if room > len(batch) {
+				room = len(batch)
+			}
+			b.pending = append(batch[:room], b.pending...)
 		}
 		b.mu.Unlock()
 	}
 }

-var batcher *usageBatcher
+func (b *usageBatcher) run() {
+	defer close(b.done)
+	ticker := time.NewTicker(usageFlushInterval)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-ticker.C:
+			b.flush()
+		case <-b.stop:
+			b.flush() // final drain
+			return
+		}
+	}
+}
+
+func (b *usageBatcher) shutdown() {
+	b.stopOnce.Do(func() {
+		close(b.stop)
+		<-b.done
+	})
+}
+
+// The package-level batcher is guarded by batcherMu so Init / Shutdown cycles
+// (the test pattern) don't race against UsageMiddleware reads.
+var (
+	batcherMu sync.RWMutex
+	batcher   *usageBatcher
+)
+
+func currentBatcher() *usageBatcher {
+	batcherMu.RLock()
+	defer batcherMu.RUnlock()
+	return batcher
+}

 // InitUsageRecorder starts a background goroutine that periodically flushes
-// accumulated usage records to the database.
+// accumulated usage records to the database. Calling it more than once
+// shuts down the previous batcher first so its goroutine doesn't leak.
 func InitUsageRecorder(db *gorm.DB) {
 	if db == nil {
 		return
 	}
-	batcher = &usageBatcher{db: db}
-	go func() {
-		ticker := time.NewTicker(usageFlushInterval)
-		defer ticker.Stop()
-		for range ticker.C {
-			batcher.flush()
-		}
-	}()
+
+	batcherMu.Lock()
+	old := batcher
+	batcher = nil
+	batcherMu.Unlock()
+	if old != nil {
+		old.shutdown()
+	}
+
+	b := &usageBatcher{
+		db:   db,
+		stop: make(chan struct{}),
+		done: make(chan struct{}),
+	}
+	batcherMu.Lock()
+	batcher = b
+	batcherMu.Unlock()
+
+	go b.run()
+}
+
+// ShutdownUsageRecorder stops the background flusher and synchronously drains
+// pending records once. Safe to call multiple times. Not yet wired into the
+// application lifecycle; intended for graceful process exit and tests.
+func ShutdownUsageRecorder() {
+	batcherMu.Lock()
+	b := batcher
+	batcher = nil
+	batcherMu.Unlock()
+	if b != nil {
+		b.shutdown()
+	}
+}
+
+// FlushNow synchronously flushes any pending usage records. Intended for tests
+// that need deterministic behaviour without waiting for the ticker.
+func FlushNow() {
+	if b := currentBatcher(); b != nil {
+		b.flush()
+	}
 }

 // usageResponseBody is the minimal structure we need from the response JSON.
@@ -84,7 +175,8 @@ type usageResponseBody struct {
 func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
 	return func(next echo.HandlerFunc) echo.HandlerFunc {
 		return func(c echo.Context) error {
-			if db == nil || batcher == nil {
+			b := currentBatcher()
+			if db == nil || b == nil {
 				return next(c)
 			}

@@ -149,9 +241,17 @@ func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
 				return handlerErr
 			}

+			source := auth.GetSource(c)
+			if source == "" {
+				// Auth disabled or unrecognised path: classify as web so the row is still
+				// bucketable rather than silently dropped from per-source aggregates.
+				source = auth.UsageSourceWeb
+			}
+
 			record := &auth.UsageRecord{
 				UserID:           user.ID,
 				UserName:         user.Name,
+				Source:           source,
 				Model:            resp.Model,
 				Endpoint:         c.Request().URL.Path,
 				PromptTokens:     resp.Usage.PromptTokens,
@@ -161,7 +261,13 @@ func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
 				CreatedAt:        startTime,
 			}

-			batcher.add(record)
+			if key := auth.GetAPIKey(c); key != nil {
+				id := key.ID
+				record.APIKeyID = &id
+				record.APIKeyName = key.Name
+			}
+
+			b.add(record)

 			return handlerErr
 		}
--- a/core/http/middleware/usage_test.go
+++ b/core/http/middleware/usage_test.go
@@ -0,0 +1,140 @@
+//go:build auth
+
+package middleware_test
+
+import (
+	"bytes"
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/http/auth"
+	"github.com/mudler/LocalAI/core/http/middleware"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+	"gorm.io/gorm"
+)
+
+// testAuthDB returns a fresh in-memory SQLite auth DB.
+func testAuthDB() *gorm.DB {
+	db, err := auth.InitDB(":memory:")
+	if err != nil {
+		panic(err)
+	}
+	return db
+}
+
+var _ = Describe("UsageMiddleware", func() {
+	var (
+		e  *echo.Echo
+		db *gorm.DB
+	)
+
+	BeforeEach(func() {
+		db = testAuthDB()
+		e = echo.New()
+		middleware.InitUsageRecorder(db)
+	})
+
+	AfterEach(func() {
+		middleware.ShutdownUsageRecorder()
+	})
+
+	okHandler := func(c echo.Context) error {
+		body, _ := json.Marshal(map[string]any{
+			"model": "gpt-4",
+			"usage": map[string]int{
+				"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15,
+			},
+		})
+		c.Response().Header().Set("Content-Type", "application/json")
+		c.Response().WriteHeader(http.StatusOK)
+		_, _ = c.Response().Write(body)
+		return nil
+	}
+
+	// FlushNow drains pending records synchronously, replacing the 6s sleep
+	// that was previously needed to wait for the batcher's ticker.
+	flush := middleware.FlushNow
+
+	It("records source=web when auth_source is web", func() {
+		e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
+			return func(c echo.Context) error {
+				c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
+				c.Set("auth_source", auth.UsageSourceWeb)
+				return next(c)
+			}
+		}, middleware.UsageMiddleware(db))
+
+		req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
+		e.ServeHTTP(httptest.NewRecorder(), req)
+		flush()
+
+		var rec auth.UsageRecord
+		Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
+		Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
+		Expect(rec.APIKeyID).To(BeNil())
+		Expect(rec.APIKeyName).To(BeEmpty())
+	})
+
+	It("records source=apikey with snapshotted name when auth_apikey is set", func() {
+		e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
+			return func(c echo.Context) error {
+				c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
+				c.Set("auth_source", auth.UsageSourceAPIKey)
+				c.Set("auth_apikey", &auth.UserAPIKey{ID: "key-1", Name: "ci-runner"})
+				return next(c)
+			}
+		}, middleware.UsageMiddleware(db))
+
+		req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
+		e.ServeHTTP(httptest.NewRecorder(), req)
+		flush()
+
+		var rec auth.UsageRecord
+		Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
+		Expect(rec.Source).To(Equal(auth.UsageSourceAPIKey))
+		Expect(rec.APIKeyID).ToNot(BeNil())
+		Expect(*rec.APIKeyID).To(Equal("key-1"))
+		Expect(rec.APIKeyName).To(Equal("ci-runner"))
+	})
+
+	It("FlushNow drains pending records synchronously", func() {
+		e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
+			return func(c echo.Context) error {
+				c.Set("auth_user", &auth.User{ID: "carol", Name: "Carol"})
+				c.Set("auth_source", auth.UsageSourceWeb)
+				return next(c)
+			}
+		}, middleware.UsageMiddleware(db))
+
+		req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
+		e.ServeHTTP(httptest.NewRecorder(), req)
+
+		// No sleep: FlushNow should drain immediately.
+		middleware.FlushNow()
+
+		var rec auth.UsageRecord
+		Expect(db.Where("user_id = ?", "carol").First(&rec).Error).To(Succeed())
+		Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
+	})
+
+	It("falls back to source=web when auth_source is empty", func() {
+		e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
+			return func(c echo.Context) error {
+				c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
+				// no auth_source set
+				return next(c)
+			}
+		}, middleware.UsageMiddleware(db))
+
+		req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
+		e.ServeHTTP(httptest.NewRecorder(), req)
+		flush()
+
+		var rec auth.UsageRecord
+		Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
+		Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
+	})
+})
--- a/core/http/react-ui/e2e/chat-polling-selection.spec.js
+++ b/core/http/react-ui/e2e/chat-polling-selection.spec.js
@@ -0,0 +1,143 @@
+import { test, expect } from '@playwright/test'
+
+// Regression coverage for issue #9904:
+// - /api/operations was polled every 1s and *always* re-rendered the Chat
+//   page, even when the response was unchanged. The reconciliation would
+//   collapse any text selection inside an assistant message.
+// - The copy button next to each assistant message used navigator.clipboard
+//   without any fallback, which is undefined when the page is served over
+//   plain http (non-secure context) from a remote host.
+
+async function setupChatPage(page) {
+  await page.route('**/api/models/capabilities', (route) => {
+    route.fulfill({
+      contentType: 'application/json',
+      body: JSON.stringify({
+        data: [{ id: 'test-model', capabilities: ['FLAG_CHAT'] }],
+      }),
+    })
+  })
+
+  // Poll-tracking mock: count hits so the test can assert that the hook polls
+  // /api/operations repeatedly, and always return an empty list so its contents never change.
+  let operationsHits = 0
+  await page.route('**/api/operations', (route) => {
+    operationsHits++
+    route.fulfill({
+      contentType: 'application/json',
+      body: JSON.stringify({ operations: [] }),
+    })
+  })
+
+  await page.route('**/v1/chat/completions', (route) => {
+    // One short SSE stream so the chat finishes streaming quickly and we
+    // can interact with a stable assistant message.
+    const body = [
+      'data: {"choices":[{"delta":{"content":"Hello world this is a long assistant reply that we can try to select."},"index":0}]}\n\n',
+      'data: {"choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":1,"completion_tokens":1,"total_tokens":2}}\n\n',
+      'data: [DONE]\n\n',
+    ].join('')
+    route.fulfill({
+      status: 200,
+      headers: { 'Content-Type': 'text/event-stream' },
+      body,
+    })
+  })
+
+  return { getOperationsHits: () => operationsHits }
+}
+
+test.describe('Chat - /api/operations polling (#9904)', () => {
+  test('text selection inside an assistant message survives polling', async ({ page }) => {
+    const { getOperationsHits } = await setupChatPage(page)
+
+    await page.goto('/app/chat')
+    await expect(page.getByRole('button', { name: 'test-model' })).toBeVisible({ timeout: 10_000 })
+
+    await page.locator('.chat-input').fill('Hi')
+    await page.locator('.chat-send-btn').click()
+
+    const assistantContent = page.locator('.chat-message-assistant .chat-message-content').first()
+    await expect(assistantContent).toContainText('Hello world', { timeout: 10_000 })
+
+    // Sanity check: the polling we're regressing against is actually firing.
+    await page.waitForTimeout(2_500)
+    expect(getOperationsHits()).toBeGreaterThan(1)
+
+    // Core regression check: count how many times the assistant content
+    // node gets *touched* by React (childList / characterData mutations)
+    // over a 3-second window. Before the fix, every poll re-rendered
+    // Chat and re-set dangerouslySetInnerHTML, triggering a mutation
+    // cascade that collapsed the user's text selection. After the fix,
+    // polling with identical contents must not mutate the DOM at all,
+    // so the observer should record zero mutations.
+    const mutationCount = await assistantContent.evaluate((el) => new Promise((resolve) => {
+      let count = 0
+      const obs = new MutationObserver((records) => { count += records.length })
+      obs.observe(el, { childList: true, subtree: true, characterData: true })
+      setTimeout(() => { obs.disconnect(); resolve(count) }, 3_000)
+    }))
+    expect(mutationCount).toBe(0)
+
+    // The same check expressed as a user-observable property: a
+    // programmatically created selection survives the polling window.
+    await assistantContent.evaluate((el) => {
+      const range = document.createRange()
+      range.selectNodeContents(el)
+      const sel = window.getSelection()
+      sel.removeAllRanges()
+      sel.addRange(range)
+    })
+
+    const initialSelection = await page.evaluate(() => window.getSelection().toString())
+    expect(initialSelection).toContain('Hello world')
+
+    await page.waitForTimeout(2_500)
+
+    const selectionAfterPolling = await page.evaluate(() => window.getSelection().toString())
+    expect(selectionAfterPolling).toBe(initialSelection)
+  })
+})
+
+test.describe('Chat - copy button (#9904)', () => {
+  test('copy button works when navigator.clipboard is unavailable (plain http)', async ({ page }) => {
+    await setupChatPage(page)
+
+    // Simulate a non-secure context: hide navigator.clipboard before any of
+    // our app code touches it. This mirrors what browsers do over plain
+    // http from a remote host.
+    await page.addInitScript(() => {
+      Object.defineProperty(window, 'isSecureContext', { value: false, configurable: true })
+      try {
+        Object.defineProperty(navigator, 'clipboard', { value: undefined, configurable: true })
+      } catch { /* some browsers refuse — the secure-context flag is enough */ }
+    })
+
+    await page.goto('/app/chat')
+    await expect(page.getByRole('button', { name: 'test-model' })).toBeVisible({ timeout: 10_000 })
+
+    await page.locator('.chat-input').fill('Hi')
+    await page.locator('.chat-send-btn').click()
+
+    const assistantBubble = page.locator('.chat-message-assistant .chat-message-bubble').first()
+    await expect(assistantBubble).toContainText('Hello world', { timeout: 10_000 })
+
+    // Spy on document.execCommand so we can confirm the fallback path ran.
+    await page.evaluate(() => {
+      window.__execCommandCalls = []
+      const original = document.execCommand?.bind(document)
+      document.execCommand = (cmd, ...rest) => {
+        window.__execCommandCalls.push(cmd)
+        // execCommand('copy') in a headless browser may return false because
+        // there is no real clipboard, but the fact that we tried is what we
+        // care about for this regression.
+        return original ? original(cmd, ...rest) : false
+      }
+    })
+
+    await assistantBubble.locator('.chat-message-actions button').first().click()
+
+    const execCommandCalls = await page.evaluate(() => window.__execCommandCalls)
+    expect(execCommandCalls).toContain('copy')
+  })
+})
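
The copy-button test above only asserts that the fallback path calls `document.execCommand('copy')` when `navigator.clipboard` is missing. The `utils/clipboard` module itself is not part of this section, so the following is only a minimal sketch of a helper with that behaviour, not the project's implementation. The names `copyWithFallback`, `clipboard`, and `legacyCopy` are hypothetical, and the dependencies are passed in so the sketch can run outside a browser.

```javascript
// Hypothetical sketch, not the project's utils/clipboard implementation.
// `clipboard` stands in for navigator.clipboard (may be undefined over plain
// http); `legacyCopy` stands in for a textarea + document.execCommand('copy')
// routine and returns true when the copy succeeded.
async function copyWithFallback(text, { clipboard, legacyCopy }) {
  if (clipboard && typeof clipboard.writeText === 'function') {
    try {
      await clipboard.writeText(text)
      return true
    } catch {
      // Permission denied or insecure context: fall through to the legacy path.
    }
  }
  try {
    return Boolean(legacyCopy(text))
  } catch {
    return false
  }
}
```

Returning a boolean instead of throwing lets each caller choose between a success toast and a failure toast, which is how the call sites in this change consume `copyToClipboard`.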
--- a/core/http/react-ui/public/locales/de/chat.json
+++ b/core/http/react-ui/public/locales/de/chat.json
@@ -97,7 +97,8 @@
  },
  "toasts": {
    "selectModel": "Bitte wählen Sie ein Modell",
-    "copied": "In die Zwischenablage kopiert"
+    "copied": "In die Zwischenablage kopiert",
+    "copyFailed": "Kopieren in die Zwischenablage fehlgeschlagen"
  },
  "menu": {
    "trigger": "Chats",
--- a/core/http/react-ui/public/locales/en/admin.json
+++ b/core/http/react-ui/public/locales/en/admin.json
@@ -53,7 +53,30 @@
  },
  "usage": {
    "title": "Usage",
-    "subtitle": "API token usage statistics"
+    "subtitle": "API token usage statistics",
+    "sources": {
+      "tab": "Sources",
+      "mixTitle": "Source mix",
+      "ribbonAria": "{{apikey}}% API keys, {{web}}% Web UI, {{legacy}}% Legacy",
+      "topSources": "Top sources over time",
+      "searchPlaceholder": "Search by name or prefix",
+      "sortBy": "Sort",
+      "sortTokens": "Tokens",
+      "sortRequests": "Requests",
+      "sortLastUsed": "Last used",
+      "sortName": "Name",
+      "sortUser": "User",
+      "webUI": "Web UI",
+      "legacy": "Legacy",
+      "revoked": "revoked",
+      "filteredTo": "Filtered to: {{name}}",
+      "clearFilter": "Clear filter",
+      "other": "Other ({{count}})",
+      "noTrafficShort": "No requests in this period.",
+      "noKeysYet": "Once requests come in, you'll see them broken down here.",
+      "createKey": "Create your first API key",
+      "truncatedWarning": "Showing top 200 keys. Apply a filter to narrow further."
+    }
  },
  "explorer": {
    "title": "Explorer",
--- a/core/http/react-ui/public/locales/en/chat.json
+++ b/core/http/react-ui/public/locales/en/chat.json
@@ -97,7 +97,8 @@
  },
  "toasts": {
    "selectModel": "Please select a model",
-    "copied": "Copied to clipboard"
+    "copied": "Copied to clipboard",
+    "copyFailed": "Could not copy to clipboard"
  },
  "menu": {
    "trigger": "Chats",
--- a/core/http/react-ui/public/locales/es/chat.json
+++ b/core/http/react-ui/public/locales/es/chat.json
@@ -97,7 +97,8 @@
  },
  "toasts": {
    "selectModel": "Por favor selecciona un modelo",
-    "copied": "Copiado al portapapeles"
+    "copied": "Copiado al portapapeles",
+    "copyFailed": "No se pudo copiar al portapapeles"
  },
  "menu": {
    "trigger": "Chats",
--- a/core/http/react-ui/public/locales/it/chat.json
+++ b/core/http/react-ui/public/locales/it/chat.json
@@ -97,7 +97,8 @@
  },
  "toasts": {
    "selectModel": "Seleziona un modello",
-    "copied": "Copiato negli appunti"
+    "copied": "Copiato negli appunti",
+    "copyFailed": "Impossibile copiare negli appunti"
  },
  "menu": {
    "trigger": "Chat",
--- a/core/http/react-ui/public/locales/zh-CN/chat.json
+++ b/core/http/react-ui/public/locales/zh-CN/chat.json
@@ -97,7 +97,8 @@
  },
  "toasts": {
    "selectModel": "请选择一个模型",
-    "copied": "已复制到剪贴板"
+    "copied": "已复制到剪贴板",
+    "copyFailed": "无法复制到剪贴板"
  },
  "menu": {
    "trigger": "聊天",
--- a/core/http/react-ui/src/components/CanvasPanel.jsx
+++ b/core/http/react-ui/src/components/CanvasPanel.jsx
@@ -2,6 +2,7 @@ import { useState, useEffect, useRef } from 'react'
 import { renderMarkdown } from '../utils/markdown'
 import { getArtifactIcon } from '../utils/artifacts'
 import { safeHref } from '../utils/url'
+import { copyToClipboard } from '../utils/clipboard'
 import DOMPurify from 'dompurify'
 import hljs from 'highlight.js'

@@ -23,11 +24,13 @@ export default function CanvasPanel({ artifacts, selectedId, onSelect, onClose }
    }
  }, [current, showPreview])

-  const handleCopy = () => {
+  const handleCopy = async () => {
    const text = current.code || current.url || ''
-    navigator.clipboard.writeText(text)
-    setCopySuccess(true)
-    setTimeout(() => setCopySuccess(false), 2000)
+    const ok = await copyToClipboard(text)
+    if (ok) {
+      setCopySuccess(true)
+      setTimeout(() => setCopySuccess(false), 2000)
+    }
  }

  const handleDownload = () => {
--- a/core/http/react-ui/src/components/NodeInstallPicker.jsx
+++ b/core/http/react-ui/src/components/NodeInstallPicker.jsx
@@ -1,7 +1,7 @@
 import { useState, useMemo, useEffect, useRef } from 'react'
 import Modal from './Modal'
 import SearchableSelect from './SearchableSelect'
-import { nodesApi } from '../utils/api'
+import { nodesApi, backendsApi } from '../utils/api'

 // NodeInstallPicker is the single multi-node install surface used both from
 // the Backends gallery split-button and from the "Install on more nodes" `+`
@@ -240,6 +240,37 @@ export default function NodeInstallPicker({
  }
  const clearSelection = () => setSelected(new Set())

+  // pollJob resolves with { done: true, error?: string } once a single job
+  // completes, fails, or is cancelled. Bounded by a hard wall-clock cap so a
+  // stuck worker eventually surfaces in the UI as "Failed" instead of
+  // spinning forever.
+  const pollJob = (jobID) => new Promise((resolve) => {
+    const POLL_INTERVAL_MS = 1500
+    const HARD_CAP_MS = 6 * 60 * 1000 // 6 min - generous for a fresh worker download
+    const startedAt = Date.now()
+
+    const tick = async () => {
+      try {
+        const status = await backendsApi.getJob(jobID)
+        if (status?.completed) { resolve({ done: true }); return }
+        if (status?.error) { resolve({ done: true, error: status.error }); return }
+        if (status?.processed && !status?.completed) {
+          resolve({ done: true, error: status.error || 'install did not complete' })
+          return
+        }
+      } catch (err) {
+        resolve({ done: true, error: err?.message || 'polling failed' })
+        return
+      }
+      if (Date.now() - startedAt > HARD_CAP_MS) {
+        resolve({ done: true, error: 'timed out waiting for install to finish' })
+        return
+      }
+      setTimeout(tick, POLL_INTERVAL_MS)
+    }
+    tick()
+  })
+
  const submit = async () => {
    if (selected.size === 0 || submitting) return
    if (counts.overrides > 0 && !showMismatchConfirm) {
@@ -255,38 +286,68 @@ export default function NodeInstallPicker({
      return next
    })

-    const results = await Promise.allSettled(ids.map(id =>
+    // Phase 1: dispatch all installs in parallel. Each POST returns immediately
+    // with { jobID } now that the handler is async.
+    const dispatchResults = await Promise.allSettled(ids.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
-        .then(r => ({ id, ok: true, message: r?.message }))
-        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
+        .then(r => ({ id, ok: true, jobID: r?.jobID }))
+        .catch(err => ({ id, ok: false, error: err?.message || 'dispatch failed' }))
    ))

-    let successCount = 0, failCount = 0
-    setPerNode(prev => {
-      const next = { ...prev }
-      for (const r of results) {
-        if (r.status !== 'fulfilled') continue
-        const v = r.value
-        if (v.ok) {
-          next[v.id] = { status: 'done' }
-          successCount++
-        } else {
-          next[v.id] = { status: 'error', error: v.error }
-          failCount++
-        }
+    // Classify dispatch results synchronously OUTSIDE the setter. React may
+    // invoke a functional state updater more than once (StrictMode dev double
+    // invoke, concurrent rendering replay): building the jobs array inside
+    // the closure would duplicate entries and re-poll the same job.
+    const jobs = []
+    const dispatchPatch = {}
+    for (const r of dispatchResults) {
+      if (r.status !== 'fulfilled') continue
+      const v = r.value
+      if (v.ok && v.jobID) {
+        dispatchPatch[v.id] = { status: 'installing', jobID: v.jobID }
+        jobs.push({ nodeID: v.id, jobID: v.jobID })
+      } else {
+        dispatchPatch[v.id] = { status: 'error', error: v.error || 'dispatch failed' }
      }
-      return next
+    }
+    setPerNode(prev => ({ ...prev, ...dispatchPatch }))
+
+    // Phase 2: poll each job. Promise.all resolves when the last job settles;
+    // each job flips its own row through setPerNode as soon as its poll
+    // resolves, and reports success or failure back to the caller.
+    const outcomes = await Promise.all(jobs.map(async ({ nodeID, jobID }) => {
+      const result = await pollJob(jobID)
+      setPerNode(prev => {
+        const next = { ...prev }
+        if (result.error) {
+          next[nodeID] = { status: 'error', error: result.error, jobID }
+        } else {
+          next[nodeID] = { status: 'done', jobID }
+        }
+        return next
+      })
+      return !result.error
+    }))
+
+    // Phase 3: summary toast + onComplete. Tally from the local poll results
+    // rather than inside a functional setter: React may run an updater later
+    // (during render) or more than once, so counters mutated there can read
+    // as zero or double-count. Nodes whose dispatch failed never produced a
+    // job, so they count as failures.
+    const successCount = outcomes.filter(Boolean).length
+    const failCount = ids.length - successCount
+
+
    setSubmitting(false)

    if (successCount > 0 && onComplete) onComplete()

-    if (failCount === 0) {
+    if (failCount === 0 && successCount > 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
-    } else if (successCount === 0) {
+    } else if (successCount === 0 && failCount > 0) {
      addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
-    } else {
+    } else if (successCount > 0 && failCount > 0) {
      addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
    }
  }
@@ -297,32 +358,58 @@ export default function NodeInstallPicker({
      .map(([id]) => id)
    if (failedIds.length === 0) return
    setSelected(new Set(failedIds))
-    // Replace state for failed rows so they show "installing" again, not stale errors.
    setPerNode(prev => {
      const next = { ...prev }
      failedIds.forEach(id => { next[id] = { status: 'installing' } })
      return next
    })
    setSubmitting(true)
-    const results = await Promise.allSettled(failedIds.map(id =>
+
+    const dispatchResults = await Promise.allSettled(failedIds.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
-        .then(r => ({ id, ok: true, message: r?.message }))
-        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
+        .then(r => ({ id, ok: true, jobID: r?.jobID }))
+        .catch(err => ({ id, ok: false, error: err?.message || 'dispatch failed' }))
    ))
+
+    // Same precaution as in submit(): classify outside the functional setter
+    // so a replayed updater can't push duplicate jobs into the polling list.
+    const jobs = []
+    const dispatchPatch = {}
+    for (const r of dispatchResults) {
+      if (r.status !== 'fulfilled') continue
+      const v = r.value
+      if (v.ok && v.jobID) {
+        dispatchPatch[v.id] = { status: 'installing', jobID: v.jobID }
+        jobs.push({ nodeID: v.id, jobID: v.jobID })
+      } else {
+        dispatchPatch[v.id] = { status: 'error', error: v.error || 'dispatch failed' }
+      }
+    }
+    setPerNode(prev => ({ ...prev, ...dispatchPatch }))
+
+    const outcomes = await Promise.all(jobs.map(async ({ nodeID, jobID }) => {
+      const result = await pollJob(jobID)
+      setPerNode(prev => {
+        const next = { ...prev }
+        if (result.error) next[nodeID] = { status: 'error', error: result.error, jobID }
+        else next[nodeID] = { status: 'done', jobID }
+        return next
+      })
+      return !result.error
+    }))
+
+    setSubmitting(false)
+
+    // Tally from the local poll results rather than inside a functional
+    // setter: React may run an updater later (during render) or more than
+    // once, so counters mutated there can read as zero or double-count.
+    // Nodes whose retry dispatch failed never produced a job, so every
+    // retried node without a successful poll counts as a failure.
+    const successCount = outcomes.filter(Boolean).length
+    const failCount = failedIds.length - successCount
+
-    let successCount = 0, failCount = 0
-    setPerNode(prev => {
-      const next = { ...prev }
-      for (const r of results) {
-        if (r.status !== 'fulfilled') continue
-        const v = r.value
-        if (v.ok) { next[v.id] = { status: 'done' }; successCount++ }
-        else { next[v.id] = { status: 'error', error: v.error }; failCount++ }
-      }
-      return next
-    })
-    setSubmitting(false)
    if (successCount > 0 && onComplete) onComplete()
-    if (failCount === 0) {
+    if (failCount === 0 && successCount > 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
    }
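
The stop conditions inside `pollJob`'s `tick()` can be read as a single decision per poll. The sketch below restates them as a pure function so the ordering is easier to check; `classifyJobStatus` and its parameters are hypothetical names, while the `completed`, `error`, and `processed` fields and the error strings are taken from the hunk above.

```javascript
// Hypothetical helper mirroring the checks inside pollJob's tick().
// Returns null to keep polling, or { done: true, error? } to stop.
function classifyJobStatus(status, elapsedMs, hardCapMs) {
  if (status?.completed) return { done: true }
  if (status?.error) return { done: true, error: status.error }
  // Processed but not completed (and no explicit error): treat as a failure.
  if (status?.processed) return { done: true, error: 'install did not complete' }
  // The wall-clock cap is checked last, so a job that finishes on the final
  // poll is still reported as finished rather than timed out.
  if (elapsedMs > hardCapMs) {
    return { done: true, error: 'timed out waiting for install to finish' }
  }
  return null
}
```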
--- a/core/http/react-ui/src/hooks/useChat.js
+++ b/core/http/react-ui/src/hooks/useChat.js
@@ -218,9 +218,15 @@ export function useChat(initialModel = '') {
          })
          userFiles.push({ name: file.name, type: 'audio' })
        } else {
-          // Text/PDF files - append to content
-          userFiles.push({ name: file.name, type: 'file', content: file.textContent || '' })
-        }
+          // Text/PDF files - append to content
+          if (file.textContent) {
+            messageContent.push({
+              type: 'text',
+              text: `\n\n--- File: ${file.name} ---\n${file.textContent}\n--- End of ${file.name} ---`,
+            })
+          }
+          userFiles.push({ name: file.name, type: 'file', content: file.textContent || '' })
+        }
      }
    } else {
      messageContent = content
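
The hunk above inlines a text attachment into the message as an extra `text` part wrapped in file delimiters. A small sketch of that step in isolation follows; `fileTextPart` is a hypothetical name, and the delimiter format is copied from the hunk.

```javascript
// Hypothetical helper; the delimiter format is copied from the hunk above.
// Returns null when the file has no extracted text, so nothing is appended.
function fileTextPart(file) {
  if (!file.textContent) return null
  return {
    type: 'text',
    text: `\n\n--- File: ${file.name} ---\n${file.textContent}\n--- End of ${file.name} ---`,
  }
}

const part = fileTextPart({ name: 'notes.txt', textContent: 'line one' })
// part.text === '\n\n--- File: notes.txt ---\nline one\n--- End of notes.txt ---'
```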
--- a/core/http/react-ui/src/hooks/useOperations.js
+++ b/core/http/react-ui/src/hooks/useOperations.js
@@ -2,6 +2,14 @@ import { useState, useEffect, useCallback, useRef } from 'react'
 import { operationsApi } from '../utils/api'
 import { useAuth } from '../context/AuthContext'

+// Serialize ops into a stable comparison key. Each op is a flat map of
+// primitives, so JSON.stringify is good enough and stable as long as the
+// server emits keys in the same order (it does: the ops are built as an
+// explicit map[string]any, and Go's encoding/json sorts map keys).
+function serializeOps(ops) {
+  return JSON.stringify(ops)
+}
+
 export function useOperations(pollInterval = 1000) {
  const [operations, setOperations] = useState([])
  const [loading, setLoading] = useState(true)
@@ -11,16 +19,26 @@ export function useOperations(pollInterval = 1000) {

  const previousCountRef = useRef(0)
  const onAllCompleteRef = useRef(null)
+  // Track the last payload we wrote into state. Each poll otherwise produces
+  // a fresh array reference even when nothing changed, and that re-render
+  // ripples into the Chat page — wiping the user's text selection mid-read
+  // (#9904).
+  const lastSerializedRef = useRef('[]')

  const fetchOperations = useCallback(async () => {
    if (!isAdmin) {
-      setLoading(false)
+      setLoading((prev) => (prev ? false : prev))
      return
    }
    try {
      const data = await operationsApi.list()
      const ops = data?.operations || (Array.isArray(data) ? data : [])
-      setOperations(ops)
+
+      const serialized = serializeOps(ops)
+      if (serialized !== lastSerializedRef.current) {
+        lastSerializedRef.current = serialized
+        setOperations(ops)
+      }

      // Separate active (non-failed) operations from failed ones
      const activeOps = ops.filter(op => !op.error)
@@ -32,11 +50,11 @@ export function useOperations(pollInterval = 1000) {
      }
      previousCountRef.current = activeOps.length

-      setError(null)
+      setError((prev) => (prev === null ? prev : null))
    } catch (err) {
-      setError(err.message)
+      setError((prev) => (prev === err.message ? prev : err.message))
    } finally {
-      setLoading(false)
+      setLoading((prev) => (prev ? false : prev))
    }
  }, [isAdmin])

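
The change-detection idea in this hook, skipping the state write when the serialized payload is unchanged, can be sketched without React. `createChangeGate` and `hasChanged` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch of the "skip identical payloads" gate.
function createChangeGate(initial = []) {
  let lastSerialized = JSON.stringify(initial)
  // Returns true only when the payload differs from the last accepted one.
  return function hasChanged(payload) {
    const serialized = JSON.stringify(payload)
    if (serialized === lastSerialized) return false
    lastSerialized = serialized
    return true
  }
}

const hasChanged = createChangeGate()
const first = hasChanged([])                                // false: same as initial
const second = hasChanged([{ id: 'a', progress: 10 }])      // true: new payload
const third = hasChanged([{ id: 'a', progress: 10 }])       // false: identical contents
const reordered = hasChanged([{ progress: 10, id: 'a' }])   // true: key order differs
```

The last call shows why the comparison depends on the server emitting keys in a consistent order: the same data with reordered keys serializes to a different string and would trigger a re-render.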
--- a/core/http/react-ui/src/pages/AgentChat.jsx
+++ b/core/http/react-ui/src/pages/AgentChat.jsx
@@ -9,6 +9,7 @@ import ResourceCards from '../components/ResourceCards'
 import ConfirmDialog from '../components/ConfirmDialog'
 import { useAgentChat } from '../hooks/useAgentChat'
 import { relativeTime } from '../utils/format'
+import { copyToClipboard } from '../utils/clipboard'

 function getLastMessagePreview(conv) {
  if (!conv.messages || conv.messages.length === 0) return ''
@@ -390,9 +391,13 @@ export default function AgentChat() {
    }
  }

-  const copyMessage = (content) => {
-    navigator.clipboard.writeText(content)
-    addToast('Copied to clipboard', 'success', 2000)
+  const copyMessage = async (content) => {
+    const ok = await copyToClipboard(content)
+    addToast(
+      ok ? 'Copied to clipboard' : 'Could not copy to clipboard',
+      ok ? 'success' : 'error',
+      ok ? 2000 : 3000,
+    )
  }

  const senderToRole = (sender) => {
--- a/core/http/react-ui/src/pages/BackendLogs.jsx
+++ b/core/http/react-ui/src/pages/BackendLogs.jsx
@@ -1,9 +1,10 @@
 import { useState, useEffect, useCallback, useRef, useMemo } from 'react'
-import { useParams, useSearchParams, useOutletContext, Link } from 'react-router-dom'
-import { backendLogsApi } from '../utils/api'
+import { useParams, useSearchParams, useOutletContext, Link, Navigate } from 'react-router-dom'
+import { backendLogsApi, nodesApi } from '../utils/api'
 import { formatTimestamp } from '../utils/format'
 import { apiUrl } from '../utils/basePath'
 import LoadingSpinner from '../components/LoadingSpinner'
+import { useDistributedMode } from '../hooks/useDistributedMode'

 function wsUrl(path) {
  const proto = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
@@ -274,11 +275,158 @@ function BackendLogsDetail({ modelId }) {
  )
 }

+// DistributedBackendLogsResolver runs only in distributed mode. The local
+// /api/backend-logs WebSocket has no backend behind it here (inference lives
+// on workers), so we resolve modelId → hosting node(s) and forward to the
+// per-node logs page. One hit redirects automatically; multiple hits render
+// a picker so the operator can pick which worker's logs to inspect.
+function DistributedBackendLogsResolver({ modelId, fromTimestamp }) {
+  const [hits, setHits] = useState(null) // [{ node, model }] once resolved
+  const [error, setError] = useState(null)
+
+  useEffect(() => {
+    let cancelled = false
+    ;(async () => {
+      try {
+        const nodes = await nodesApi.list()
+        const nodeList = Array.isArray(nodes) ? nodes : []
+        // Fan out to each node and collect entries that match this model.
+        // Per-node failures are tolerated — a single offline worker shouldn't
+        // hide logs available on its peers.
+        const perNode = await Promise.all(nodeList.map(async (node) => {
+          try {
+            const models = await nodesApi.getModels(node.id)
+            const matches = (Array.isArray(models) ? models : []).filter(m => m.model_name === modelId)
+            return matches.map(m => ({ node, model: m }))
+          } catch {
+            return []
+          }
+        }))
+        if (cancelled) return
+        setHits(perNode.flat())
+      } catch (err) {
+        if (!cancelled) setError(err)
+      }
+    })()
+    return () => { cancelled = true }
+  }, [modelId])
+
+  if (error) {
+    return (
+      <div className="page page--wide">
+        <div className="empty-state">
+          <div className="empty-state-icon"><i className="fas fa-exclamation-triangle" /></div>
+          <h2 className="empty-state-title">Failed to resolve hosting nodes</h2>
+          <p className="empty-state-text">{error.message}</p>
+        </div>
+      </div>
+    )
+  }
+
+  if (hits === null) {
+    return (
+      <div style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
+        <LoadingSpinner size="lg" />
+      </div>
+    )
+  }
+
+  if (hits.length === 0) {
+    return (
+      <div className="page page--wide">
+        <div className="empty-state">
+          <div className="empty-state-icon"><i className="fas fa-terminal" /></div>
+          <h2 className="empty-state-title">Model not loaded on any worker</h2>
+          <p className="empty-state-text">
+            <span style={{ fontFamily: 'var(--font-mono)' }}>{modelId}</span> isn't currently loaded on any node in the cluster.
+            Check the <Link to="/app/nodes" style={{ color: 'var(--color-primary)' }}>Nodes page</Link> to see which models are running where.
+          </p>
+        </div>
+      </div>
+    )
+  }
+
+  // Bare model name aggregates this node's replicas via the worker's log
+  // store; preserve ?from= so the deep-link from a trace still scrolls to
+  // the right line on arrival.
+  const buildHref = (nodeId) => {
+    const base = `/app/node-backend-logs/${nodeId}/${encodeURIComponent(modelId)}`
+    return fromTimestamp ? `${base}?from=${encodeURIComponent(fromTimestamp)}` : base
+  }
+
+  if (hits.length === 1) {
+    return <Navigate to={buildHref(hits[0].node.id)} replace />
+  }
+
+  // Multiple workers host this model — let the operator pick.
+  return (
+    <div className="page page--wide">
+      <div className="page-header">
+        <div>
+          <h1 className="page-title" style={{ marginBottom: 0 }}>
+            <i className="fas fa-terminal" style={{ fontSize: '0.8em', marginRight: 'var(--spacing-sm)' }} />
+            {modelId}
+          </h1>
+          <p className="page-subtitle" style={{ marginTop: 'var(--spacing-xs)' }}>
+            Hosted on {hits.length} workers — pick one to view its logs.
+          </p>
+        </div>
+      </div>
+      <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-xs)' }}>
+        {hits.map(({ node, model }) => (
+          <Link
+            key={`${node.id}#${model.replica_index ?? 0}`}
+            to={buildHref(node.id)}
+            style={{
+              display: 'flex', alignItems: 'center', justifyContent: 'space-between',
+              padding: 'var(--spacing-sm) var(--spacing-md)',
+              background: 'var(--color-bg-primary)', border: '1px solid var(--color-border)',
+              borderRadius: 'var(--radius-md)', textDecoration: 'none', color: 'inherit',
+            }}
+          >
+            <div>
+              <div style={{ fontWeight: 500 }}>{node.name || node.id}</div>
+              <div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', fontFamily: 'var(--font-mono)' }}>
+                {node.id}{model.replica_index ? ` · replica ${model.replica_index}` : ''} · {model.state}
+              </div>
+            </div>
+            <i className="fas fa-chevron-right" style={{ color: 'var(--color-text-muted)' }} />
+          </Link>
+        ))}
+      </div>
+    </div>
+  )
+}
+
+// BackendLogsRouter picks between the local WebSocket view (standalone) and
+// the distributed resolver. The probe runs once via useDistributedMode so a
+// 503 from /api/nodes (the canonical "distributed disabled" signal) keeps the
+// existing standalone path intact.
+function BackendLogsRouter({ modelId }) {
+  const [searchParams] = useSearchParams()
+  const fromTimestamp = searchParams.get('from')
+  const { enabled: distributedMode, loading } = useDistributedMode()
+
+  if (loading) {
+    return (
+      <div style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
+        <LoadingSpinner size="lg" />
+      </div>
+    )
+  }
+
+  if (distributedMode) {
+    return <DistributedBackendLogsResolver modelId={modelId} fromTimestamp={fromTimestamp} />
+  }
+
+  return <BackendLogsDetail modelId={modelId} />
+}
+
 export default function BackendLogs() {
  const { modelId } = useParams()

  if (modelId) {
-    return <BackendLogsDetail modelId={decodeURIComponent(modelId)} />
+    return <BackendLogsRouter modelId={decodeURIComponent(modelId)} />
  }

  // No model specified — redirect to System page
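
The three outcomes of `DistributedBackendLogsResolver` (empty state, automatic redirect, picker) depend only on the number of hits. A pure-function sketch of that decision follows; `resolveLogsTarget` and the `kind` values are hypothetical, while the URL shape mirrors `buildHref` in the hunk above.

```javascript
// Hypothetical pure version of the routing decision made by
// DistributedBackendLogsResolver; `hits` is a list of { node, model }.
function resolveLogsTarget(modelId, hits, fromTimestamp) {
  const hrefFor = (nodeId) => {
    const base = `/app/node-backend-logs/${nodeId}/${encodeURIComponent(modelId)}`
    return fromTimestamp ? `${base}?from=${encodeURIComponent(fromTimestamp)}` : base
  }
  if (hits.length === 0) return { kind: 'empty' }
  if (hits.length === 1) return { kind: 'redirect', to: hrefFor(hits[0].node.id) }
  // Several workers host the model: return one link per hit for a picker.
  return { kind: 'picker', options: hits.map(({ node }) => hrefFor(node.id)) }
}

const single = resolveLogsTarget(
  'my model',
  [{ node: { id: 'n1' }, model: {} }],
  '2026-05-22T10:00:00Z',
)
// single.to === '/app/node-backend-logs/n1/my%20model?from=2026-05-22T10%3A00%3A00Z'
```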
--- a/core/http/react-ui/src/pages/Backends.jsx
+++ b/core/http/react-ui/src/pages/Backends.jsx
@@ -179,16 +179,19 @@ export default function Backends() {

  // Install a single gallery backend on a specific node, used in target-node
  // mode (the URL has ?target=<node-id> set from the Nodes page entry point).
+  // The handler is async - we dispatch and let the global Operations panel
+  // surface progress; no need to await completion here.
  const handleInstallOnTarget = async (id) => {
    if (!targetNode) return
    try {
      await nodesApi.installBackend(targetNode.id, id)
-      addToast(`Installing ${id} on ${targetNode.name}…`, 'info')
-      // Per-node install is request-reply, not part of the global jobs feed —
-      // refetch to reflect the new Nodes column state.
-      setTimeout(() => { fetchBackends(); refetchNodes() }, 600)
+      addToast(`Installing ${id} on ${targetNode.name}...`, 'info')
+      // The install runs async via the gallery job queue. Refetch shortly so
+      // the Nodes column reflects "installing" state; the Operations panel
+      // tracks the actual progress until completion.
+      setTimeout(() => { fetchBackends(); refetchNodes() }, 1200)
    } catch (err) {
-      addToast(`Install failed on ${targetNode.name}: ${err.message}`, 'error')
+      addToast(`Install dispatch failed on ${targetNode.name}: ${err.message}`, 'error')
    }
  }

--- a/core/http/react-ui/src/pages/Chat.jsx
+++ b/core/http/react-ui/src/pages/Chat.jsx
@@ -17,6 +17,7 @@ import ChatsMenu from '../components/ChatsMenu'
 import { useAuth } from '../context/AuthContext'
 import { useOperations } from '../hooks/useOperations'
 import { relativeTime } from '../utils/format'
+import { copyToClipboard } from '../utils/clipboard'

 function getLastMessagePreview(chat) {
  if (!chat.history || chat.history.length === 0) return ''
@@ -798,10 +799,14 @@ export default function Chat() {
    }
  }

-  const copyMessage = (content) => {
+  const copyMessage = async (content) => {
    const text = typeof content === 'string' ? content : content?.[0]?.text || ''
-    navigator.clipboard.writeText(text)
-    addToast(t('toasts.copied'), 'success', 2000)
+    const ok = await copyToClipboard(text)
+    if (ok) {
+      addToast(t('toasts.copied'), 'success', 2000)
+    } else {
+      addToast(t('toasts.copyFailed'), 'error', 3000)
+    }
  }

  const contextPercent = getContextUsagePercent()
--- a/core/http/react-ui/src/pages/Home.jsx
+++ b/core/http/react-ui/src/pages/Home.jsx
@@ -161,7 +161,11 @@ export default function Home() {
    const newFiles = []
    for (const file of fileList) {
      const base64 = await fileToBase64(file)
-      newFiles.push({ name: file.name, type: file.type, base64 })
+      const entry = { name: file.name, type: file.type, base64 }
+      if (!file.type.startsWith('image/') && !file.type.startsWith('audio/')) {
+        entry.textContent = await file.text().catch(() => '')
+      }
+      newFiles.push(entry)
    }
    setter(prev => [...prev, ...newFiles])
  }, [])
--- a/core/http/react-ui/src/pages/Manage.jsx
+++ b/core/http/react-ui/src/pages/Manage.jsx
@@ -660,8 +660,7 @@ export default function Manage() {
                            { key: 'edit', icon: 'fa-pen-to-square', label: 'Edit configuration',
                              onClick: () => navigate(`/app/model-editor/${encodeURIComponent(model.id)}`) },
                            { key: 'logs', icon: 'fa-terminal', label: 'Backend logs',
-                              onClick: () => navigate(`/app/backend-logs/${encodeURIComponent(model.id)}`),
-                              hidden: distributedMode },
+                              onClick: () => navigate(`/app/backend-logs/${encodeURIComponent(model.id)}`) },
                            { divider: true },
                            { key: 'delete', icon: 'fa-trash', label: 'Delete model', danger: true,
                              onClick: () => handleDeleteModel(model.id) },
--- a/core/http/react-ui/src/pages/Traces.jsx
+++ b/core/http/react-ui/src/pages/Traces.jsx
@@ -220,7 +220,10 @@ function BackendTraceDetail({ trace }) {
        </div>
      )}

-      {/* Backend logs link */}
+      {/* Backend logs link — /app/backend-logs/:modelId is the unified entry
+          point: in standalone mode it streams local logs, in distributed mode
+          it resolves the model to the host worker(s) and either redirects to
+          /app/node-backend-logs/<nodeId>/<modelId> or shows a node picker. */}
      {trace.model_name && (
        <div style={{ marginBottom: 'var(--spacing-md)' }}>
          <a
@@ -406,7 +409,15 @@ export default function Traces() {
        <button className="btn btn-secondary btn-sm" onClick={fetchTraces}><i className="fas fa-rotate" /> Refresh</button>
        <button className="btn btn-secondary btn-sm" onClick={handleExport} disabled={traces.length === 0}><i className="fas fa-download" /> Export</button>
        <div style={{ flex: 1 }} />
-        <button className="btn btn-danger btn-sm" onClick={handleClear} disabled={traces.length === 0}><i className="fas fa-trash" /> Clear</button>
+        <button
+          className="btn btn-danger btn-sm"
+          onClick={handleClear}
+          /* Stay enabled while loading: a massive in-memory trace buffer is
+             precisely the case where the user can't see the table yet and
+             needs Clear to recover. Clearing an already-empty server-side
+             buffer is a harmless no-op. */
+          disabled={!loading && traces.length === 0}
+        ><i className="fas fa-trash" /> Clear</button>
      </div>

      {settings && (() => {
--- a/core/http/react-ui/src/pages/Usage.jsx
+++ b/core/http/react-ui/src/pages/Usage.jsx
@@ -4,6 +4,7 @@ import { useTranslation } from 'react-i18next'
 import { useAuth } from '../context/AuthContext'
 import { apiUrl } from '../utils/basePath'
 import LoadingSpinner from '../components/LoadingSpinner'
+import SourcesTab from './Usage/SourcesTab'

 const PERIODS = [
  { key: 'day', label: 'Day' },
@@ -724,23 +725,27 @@ export default function Usage() {
            {p.label}
          </button>
        ))}
+        <div style={{ width: 1, height: 20, background: 'var(--color-border-subtle)', margin: '0 var(--spacing-xs)' }} />
+        <button
+          className={`btn btn-sm ${activeTab === 'models' ? 'btn-primary' : 'btn-secondary'}`}
+          onClick={() => setActiveTab('models')}
+        >
+          <i className="fas fa-cube" style={{ fontSize: '0.7rem' }} /> Models
+        </button>
        {isAdmin && (
-          <>
-            <div style={{ width: 1, height: 20, background: 'var(--color-border-subtle)', margin: '0 var(--spacing-xs)' }} />
-            <button
-              className={`btn btn-sm ${activeTab === 'models' ? 'btn-primary' : 'btn-secondary'}`}
-              onClick={() => setActiveTab('models')}
-            >
-              <i className="fas fa-cube" style={{ fontSize: '0.7rem' }} /> Models
-            </button>
-            <button
-              className={`btn btn-sm ${activeTab === 'users' ? 'btn-primary' : 'btn-secondary'}`}
-              onClick={() => setActiveTab('users')}
-            >
-              <i className="fas fa-users" style={{ fontSize: '0.7rem' }} /> Users
-            </button>
-          </>
+          <button
+            className={`btn btn-sm ${activeTab === 'users' ? 'btn-primary' : 'btn-secondary'}`}
+            onClick={() => setActiveTab('users')}
+          >
+            <i className="fas fa-users" style={{ fontSize: '0.7rem' }} /> Users
+          </button>
        )}
+        <button
+          className={`btn btn-sm ${activeTab === 'sources' ? 'btn-primary' : 'btn-secondary'}`}
+          onClick={() => setActiveTab('sources')}
+        >
+          <i className="fas fa-key" style={{ fontSize: '0.7rem' }} /> {t('usage.sources.tab')}
+        </button>
        <div style={{ flex: 1 }} />
        <button className="btn btn-secondary btn-sm" onClick={fetchUsage} disabled={loading} style={{ gap: 4 }}>
          <i className={`fas fa-rotate${loading ? ' fa-spin' : ''}`} /> Refresh
@@ -884,6 +889,10 @@ export default function Usage() {
              </div>
            )
          )}
+
+          {activeTab === 'sources' && (
+            <SourcesTab period={period} adminUserId={selectedUserId} />
+          )}
        </>
      )}
    </div>
--- a/core/http/react-ui/src/pages/Usage/SourceMixRibbon.jsx
+++ b/core/http/react-ui/src/pages/Usage/SourceMixRibbon.jsx
@@ -0,0 +1,83 @@
+import { useTranslation } from 'react-i18next'
+
+const SEGMENT_COLORS = {
+  apikey: 'var(--color-primary)',
+  web: 'var(--color-info, #3b82f6)',
+  legacy: 'var(--color-warning, #f59e0b)',
+}
+
+// SourceMixRibbon renders one segmented horizontal bar showing the share of
+// tokens by source class (apikey / web / legacy). Clicking a segment invokes
+// onSelectSourceClass with the segment key so the parent can filter the view.
+//
+// Props:
+//   bySource: { apikey?: {tokens, requests}, web?: {...}, legacy?: {...} }
+//   keyCount: number of distinct API keys in the dataset (for the legend)
+//   onSelectSourceClass: (cls: 'apikey'|'web'|'legacy') => void (optional)
+export default function SourceMixRibbon({ bySource = {}, keyCount = 0, onSelectSourceClass }) {
+  const { t } = useTranslation('admin')
+
+  const apikey = (bySource.apikey?.tokens) || 0
+  const web = (bySource.web?.tokens) || 0
+  const legacy = (bySource.legacy?.tokens) || 0
+  const total = apikey + web + legacy || 1
+
+  const pct = (n) => Math.round((n / total) * 100)
+  const apiPct = pct(apikey)
+  const webPct = pct(web)
+  const legacyPct = pct(legacy)
+
+  const segments = [
+    { key: 'apikey', label: `${apiPct}% API keys (${keyCount})`, pct: apiPct, color: SEGMENT_COLORS.apikey },
+    { key: 'web', label: `${webPct}% ${t('usage.sources.webUI')}`, pct: webPct, color: SEGMENT_COLORS.web },
+    { key: 'legacy', label: `${legacyPct}% ${t('usage.sources.legacy')}`, pct: legacyPct, color: SEGMENT_COLORS.legacy },
+  ].filter((s) => s.pct > 0)
+
+  return (
+    <div
+      role="group"
+      aria-label={t('usage.sources.ribbonAria', { apikey: apiPct, web: webPct, legacy: legacyPct })}
+      style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-xs)' }}
+    >
+      <div style={{ fontSize: '0.875rem', fontWeight: 600, color: 'var(--color-text-primary)' }}>
+        {t('usage.sources.mixTitle')}
+      </div>
+      <div
+        style={{
+          display: 'flex',
+          height: 12,
+          borderRadius: 'var(--radius-sm)',
+          overflow: 'hidden',
+          border: '1px solid var(--color-border-subtle)',
+        }}
+      >
+        {segments.map((s) => (
+          <button
+            key={s.key}
+            type="button"
+            onClick={() => onSelectSourceClass?.(s.key)}
+            aria-label={s.label}
+            style={{
+              width: `${s.pct}%`,
+              background: s.color,
+              border: 'none',
+              padding: 0,
+              cursor: onSelectSourceClass ? 'pointer' : 'default',
+            }}
+          />
+        ))}
+      </div>
+      <div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-sm)', fontSize: '0.75rem' }}>
+        {segments.map((s) => (
+          <span key={s.key} style={{ display: 'inline-flex', alignItems: 'center', gap: 6 }}>
+            <span
+              style={{ width: 10, height: 10, borderRadius: 2, background: s.color, display: 'inline-block' }}
+              aria-hidden
+            />
+            {s.label}
+          </span>
+        ))}
+      </div>
+    </div>
+  )
+}
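Note on the ribbon's percentage math: the sketch below isolates the share calculation from SourceMixRibbon.jsx in a plain function so its rounding behaviour is easier to see. The helper name `sourceMixPercentages` and the sample inputs are hypothetical and not part of the change; the arithmetic mirrors the component.

```javascript
// Hypothetical helper (not in the diff) mirroring SourceMixRibbon's math.
function sourceMixPercentages(bySource = {}) {
  const tokens = {
    apikey: bySource.apikey?.tokens || 0,
    web: bySource.web?.tokens || 0,
    legacy: bySource.legacy?.tokens || 0,
  }
  // Fall back to 1 so an empty dataset yields 0% everywhere instead of NaN.
  const total = tokens.apikey + tokens.web + tokens.legacy || 1
  // Each share is rounded independently, so the three values need not sum to 100.
  const pct = (n) => Math.round((n / total) * 100)
  return {
    apikey: pct(tokens.apikey),
    web: pct(tokens.web),
    legacy: pct(tokens.legacy),
  }
}

// Illustrative inputs only.
const evenSplit = sourceMixPercentages({
  apikey: { tokens: 1 },
  web: { tokens: 1 },
  legacy: { tokens: 1 },
})
const noTraffic = sourceMixPercentages({})
const tinyShare = sourceMixPercentages({
  apikey: { tokens: 1 },
  web: { tokens: 999 },
})
```

Because the component filters out segments whose rounded share is 0, a source with less than half a percent of the tokens (as in `tinyShare`) would not appear in the ribbon or its legend.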
--- a/core/http/react-ui/src/pages/Usage/SourceTimeChart.jsx
+++ b/core/http/react-ui/src/pages/Usage/SourceTimeChart.jsx
@@ -0,0 +1,147 @@
+import { useMemo } from 'react'
+import { useTranslation } from 'react-i18next'
+
+const TOP_N = 7
+// Distinct, accessible-ish series colors that read on both light and dark themes.
+const SERIES_COLORS = [
+  'var(--color-primary)',
+  'var(--color-success, #10b981)',
+  'var(--color-warning, #f59e0b)',
+  'var(--color-info, #3b82f6)',
+  'var(--color-danger, #ef4444)',
+  '#a855f7',
+  '#ec4899',
+]
+const OTHER_COLOR = 'var(--color-text-muted, #94a3b8)'
+
+function identityFor(bucket) {
+  return bucket.api_key_id || bucket.source || 'unknown'
+}
+
+// buckets: UsageBucket[] from /api/auth/usage/sources (server-sorted ASC by bucket)
+// selectedKey: 'web' | 'legacy' | api_key_id | null
+// totals: SourceTotals (for the "Other (count)" legend label)
+export default function SourceTimeChart({ buckets = [], selectedKey, totals }) {
+  const { t } = useTranslation('admin')
+
+  // Find the top-N identities by total tokens across the period.
+  const topIds = useMemo(() => {
+    const sums = new Map()
+    for (const b of buckets) {
+      const id = identityFor(b)
+      sums.set(id, (sums.get(id) || 0) + (b.total_tokens || 0))
+    }
+    return [...sums.entries()]
+      .sort((a, b) => b[1] - a[1])
+      .slice(0, TOP_N)
+      .map(([id]) => id)
+  }, [buckets])
+
+  const topSet = useMemo(() => new Set(topIds), [topIds])
+
+  // Resolve a display label for an identity (api_key_id -> snapshotted name, or source name).
+  const labelByIdentity = useMemo(() => {
+    const m = new Map()
+    for (const b of buckets) {
+      const id = identityFor(b)
+      if (m.has(id)) continue
+      if (b.source === 'web')    { m.set(id, t('usage.sources.webUI')); continue }
+      if (b.source === 'legacy') { m.set(id, t('usage.sources.legacy')); continue }
+      m.set(id, b.api_key_name || b.api_key_id || id)
+    }
+    return m
+  }, [buckets, t])
+
+  // Build a dense per-bucket row, splitting top-N vs Other.
+  const series = useMemo(() => {
+    const byBucket = new Map()
+    for (const b of buckets) {
+      const id = identityFor(b)
+      const seriesId = topSet.has(id) ? id : '__other__'
+      const row = byBucket.get(b.bucket) || { bucket: b.bucket, total: 0 }
+      row[seriesId] = (row[seriesId] || 0) + (b.total_tokens || 0)
+      row.total += b.total_tokens || 0
+      byBucket.set(b.bucket, row)
+    }
+    return [...byBucket.values()]
+  }, [buckets, topSet])
+
+  const max = useMemo(
+    () => series.reduce((m, r) => Math.max(m, r.total), 0) || 1,
+    [series]
+  )
+
+  const seriesIds = [...topIds, '__other__']
+  const colorOf = (id) =>
+    id === '__other__'
+      ? OTHER_COLOR
+      : SERIES_COLORS[topIds.indexOf(id) % SERIES_COLORS.length]
+
+  const labelOfId = (id) => {
+    if (id === '__other__') return null // computed inline (need count)
+    return labelByIdentity.get(id) || id
+  }
+
+  const otherCount = Math.max(0, (totals?.by_key?.length || 0) - TOP_N)
+
+  // SVG geometry: each bar gets a 24px slot (20px bar, 2px padding on each side), 100px tall; the viewBox widens with bar count.
+  const barWidth = 20
+  const barGap = 4
+  const slotWidth = barWidth + barGap
+  const height = 100
+  const width = Math.max(series.length * slotWidth, 200)
+
+  return (
+    <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-xs)' }}>
+      <div style={{ fontSize: '0.875rem', fontWeight: 600, color: 'var(--color-text-primary)' }}>
+        {t('usage.sources.topSources')}
+      </div>
+
+      <svg
+        viewBox={`0 0 ${width} ${height}`}
+        preserveAspectRatio="none"
+        style={{ width: '100%', height: 160, display: 'block' }}
+        aria-hidden
+      >
+        {series.map((row, i) => {
+          let y = height
+          return (
+            <g key={row.bucket} transform={`translate(${i * slotWidth}, 0)`}>
+              {seriesIds.map(id => {
+                const v = row[id] || 0
+                if (!v) return null
+                const h = (v / max) * height
+                y -= h
+                const dim = selectedKey && selectedKey !== id ? 0.25 : 1
+                const title = id === '__other__'
+                  ? t('usage.sources.other', { count: otherCount })
+                  : labelOfId(id)
+                return (
+                  <rect
+                    key={id}
+                    x={barGap / 2} y={y}
+                    width={barWidth} height={h}
+                    fill={colorOf(id)} opacity={dim}
+                  >
+                    <title>{`${row.bucket} - ${title}: ${v.toLocaleString()}`}</title>
+                  </rect>
+                )
+              })}
+            </g>
+          )
+        })}
+      </svg>
+
+      <div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-sm)', fontSize: '0.75rem' }}>
+        {seriesIds.map(id => (
+          <span key={id} style={{ display: 'inline-flex', alignItems: 'center', gap: 6 }}>
+            <span style={{ width: 10, height: 10, borderRadius: 2, background: colorOf(id), display: 'inline-block' }} aria-hidden />
+            {id === '__other__'
+              ? t('usage.sources.other', { count: otherCount })
+              : labelOfId(id)}
+          </span>
+        ))}
+      </div>
+    </div>
+  )
+}
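Note on the top-N / Other split: the following is a minimal sketch of the aggregation SourceTimeChart.jsx performs inside its `useMemo` hooks, pulled out into a plain function. The function name `stackByBucket` and the sample buckets are hypothetical; the field names (`bucket`, `api_key_id`, `source`, `total_tokens`) are the ones the component reads.

```javascript
// Same identity rule as the component: key id first, then source class.
function identityFor(bucket) {
  return bucket.api_key_id || bucket.source || 'unknown'
}

// Hypothetical helper (not in the diff): rank identities by total tokens,
// keep the top N as their own series, and fold the rest into '__other__'.
function stackByBucket(buckets, topN) {
  const sums = new Map()
  for (const b of buckets) {
    const id = identityFor(b)
    sums.set(id, (sums.get(id) || 0) + (b.total_tokens || 0))
  }
  const topIds = [...sums.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([id]) => id)
  const topSet = new Set(topIds)

  // One row per time bucket, with one numeric property per series id.
  const byBucket = new Map()
  for (const b of buckets) {
    const id = identityFor(b)
    const seriesId = topSet.has(id) ? id : '__other__'
    const row = byBucket.get(b.bucket) || { bucket: b.bucket, total: 0 }
    row[seriesId] = (row[seriesId] || 0) + (b.total_tokens || 0)
    row.total += b.total_tokens || 0
    byBucket.set(b.bucket, row)
  }
  return { topIds, rows: [...byBucket.values()] }
}

// Illustrative data only; the component uses TOP_N = 7.
const sampleBuckets = [
  { bucket: '2026-05-01', api_key_id: 'k1', total_tokens: 50 },
  { bucket: '2026-05-01', source: 'web', total_tokens: 30 },
  { bucket: '2026-05-01', api_key_id: 'k2', total_tokens: 5 },
  { bucket: '2026-05-02', api_key_id: 'k2', total_tokens: 10 },
]
const stacked = stackByBucket(sampleBuckets, 2)
```

Ranking is by total tokens over the whole period, so an identity that dominates a single bucket can still land in `__other__` if its period total is small.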
--- a/core/http/react-ui/src/pages/Usage/SourcesTab.jsx
+++ b/core/http/react-ui/src/pages/Usage/SourcesTab.jsx
@@ -0,0 +1,176 @@
+import { useEffect, useState } from 'react'
+import { useTranslation } from 'react-i18next'
+import { usageApi, apiKeysApi } from '../../utils/api'
+import { useAuth } from '../../context/AuthContext'
+import LoadingSpinner from '../../components/LoadingSpinner'
+import SourceMixRibbon from './SourceMixRibbon'
+import SourcesTable from './SourcesTable'
+import SourceTimeChart from './SourceTimeChart'
+
+const EMPTY_DATA = {
+  buckets: [],
+  totals: { by_source: {}, by_key: [], grand_total: { tokens: 0, requests: 0 } },
+  truncated: false,
+}
+
+// Resolve a human label for the currently selected key (web/legacy class or api_key_id).
+function labelForSelected(totals, selectedKey, t) {
+  if (!selectedKey) return ''
+  if (selectedKey === 'web')    return t('usage.sources.webUI')
+  if (selectedKey === 'legacy') return t('usage.sources.legacy')
+  const row = (totals?.by_key || []).find(k => k.api_key_id === selectedKey)
+  return row ? (row.api_key_name || selectedKey) : selectedKey
+}
+
+// SourcesTab fetches and renders the per-source / per-API-key usage breakdown:
+// the SourceMixRibbon, a drill-in filter chip for the current selection, the
+// SourceTimeChart, and the searchable SourcesTable.
+export default function SourcesTab({ period, adminUserId }) {
+  const { t } = useTranslation('admin')
+  const { isAdmin } = useAuth()
+
+  const [data, setData] = useState(EMPTY_DATA)
+  const [loading, setLoading] = useState(false)
+  const [error, setError] = useState(null)
+
+  const [selectedKey, setSelectedKey] = useState(null)
+  const [search, setSearch] = useState('')
+  const [sortKey, setSortKey] = useState('tokens')
+
+  // Pull the current set of API key ids so the table can mark unknown keys as
+  // revoked. null = "don't know yet" so the table won't dim live keys during
+  // the fetch or after a failure.
+  const [existingKeyIds, setExistingKeyIds] = useState(null)
+  useEffect(() => {
+    apiKeysApi
+      .list()
+      .then((resp) => {
+        const list = Array.isArray(resp) ? resp : (resp?.keys || [])
+        setExistingKeyIds(new Set(list.map((k) => k.id)))
+      })
+      .catch(() => { /* leave existingKeyIds null so revoked detection is skipped */ })
+  }, [])
+
+  useEffect(() => {
+    let cancelled = false
+    setLoading(true)
+    setError(null)
+    const p = isAdmin
+      ? usageApi.getAdminSources(period, adminUserId)
+      : usageApi.getMySources(period)
+    p
+      .then((d) => { if (!cancelled) setData(d || EMPTY_DATA) })
+      .catch((e) => { if (!cancelled) setError(e) })
+      .finally(() => { if (!cancelled) setLoading(false) })
+    return () => { cancelled = true }
+  }, [isAdmin, period, adminUserId])
+
+  const totals = data.totals || EMPTY_DATA.totals
+  const buckets = data.buckets || EMPTY_DATA.buckets
+  const grandT = totals.grand_total || { tokens: 0, requests: 0 }
+  const truncated = data.truncated || false
+
+  const isEmpty = !loading && (grandT.tokens || 0) === 0 && (grandT.requests || 0) === 0
+
+  if (loading) {
+    return (
+      <div style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
+        <LoadingSpinner size="lg" />
+      </div>
+    )
+  }
+
+  if (error) {
+    return (
+      <div className="empty-state">
+        <div className="empty-state-icon"><i className="fas fa-triangle-exclamation" /></div>
+        <h2 className="empty-state-title">Failed to load</h2>
+        <p className="empty-state-text">{String(error.message || error)}</p>
+      </div>
+    )
+  }
+
+  if (isEmpty) {
+    return (
+      <div className="empty-state">
+        <div className="empty-state-icon"><i className="fas fa-key" /></div>
+        <h2 className="empty-state-title">{t('usage.sources.noTrafficShort')}</h2>
+        <p className="empty-state-text">{t('usage.sources.noKeysYet')}</p>
+      </div>
+    )
+  }
+
+  return (
+    <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
+      <div className="card" style={{ padding: 'var(--spacing-md)' }}>
+        <SourceMixRibbon
+          bySource={totals.by_source}
+          keyCount={(totals.by_key || []).length}
+          onSelectSourceClass={(cls) => setSelectedKey(cls)}
+        />
+      </div>
+
+      {selectedKey && (
+        <div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-xs)' }}>
+          <span
+            style={{
+              display: 'inline-flex',
+              alignItems: 'center',
+              gap: 'var(--spacing-xs)',
+              padding: 'calc(var(--spacing-xs) / 2) var(--spacing-sm)',
+              background: 'var(--color-bg-secondary)',
+              color: 'var(--color-text-primary)',
+              fontSize: '0.75rem',
+              borderRadius: 'var(--radius-sm)',
+              border: '1px solid var(--color-border-subtle)',
+            }}
+          >
+            <i className="fas fa-filter" style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)' }} aria-hidden />
+            {t('usage.sources.filteredTo', { name: labelForSelected(totals, selectedKey, t) })}
+            <button
+              type="button"
+              onClick={() => setSelectedKey(null)}
+              aria-label={t('usage.sources.clearFilter')}
+              style={{
+                appearance: 'none',
+                background: 'transparent',
+                border: 'none',
+                color: 'var(--color-text-muted)',
+                cursor: 'pointer',
+                padding: 0,
+                fontSize: '0.875rem',
+                lineHeight: 1,
+              }}
+            >
+              <i className="fas fa-xmark" />
+            </button>
+          </span>
+        </div>
+      )}
+
+      <div className="card" style={{ padding: 'var(--spacing-md)' }}>
+        <SourceTimeChart buckets={buckets} selectedKey={selectedKey} totals={totals} />
+      </div>
+
+      <div className="card" style={{ padding: 'var(--spacing-md)' }}>
+        <SourcesTable
+          totals={totals}
+          selectedKey={selectedKey}
+          onSelectKey={setSelectedKey}
+          search={search}
+          setSearch={setSearch}
+          sortKey={sortKey}
+          setSortKey={setSortKey}
+          existingKeyIds={existingKeyIds}
+          showUserColumn={isAdmin}
+        />
+      </div>
+
+      {truncated && (
+        <div style={{ fontSize: '0.75rem', color: 'var(--color-warning)' }}>
+          {t('usage.sources.truncatedWarning')}
+        </div>
+      )}
+    </div>
+  )
+}
--- a/core/http/react-ui/src/pages/Usage/SourcesTable.jsx
+++ b/core/http/react-ui/src/pages/Usage/SourcesTable.jsx
@@ -0,0 +1,245 @@
+import { useMemo } from 'react'
+import { useTranslation } from 'react-i18next'
+
+const SORT_FNS = {
+  tokens: (a, b) => (b.tokens || 0) - (a.tokens || 0),
+  requests: (a, b) => (b.requests || 0) - (a.requests || 0),
+  last_used: (a, b) => new Date(b.last_used || 0).getTime() - new Date(a.last_used || 0).getTime(),
+  name: (a, b) => (a.name || '').localeCompare(b.name || ''),
+  user: (a, b) => (a.userName || '').localeCompare(b.userName || ''),
+}
+
+function formatTokens(n) {
+  if (!n) return '0'
+  if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M'
+  if (n >= 1_000) return (n / 1_000).toFixed(1) + 'k'
+  return String(n)
+}
+
+function formatRelative(iso) {
+  if (!iso) return '-'
+  const t = new Date(iso).getTime()
+  if (Number.isNaN(t) || t <= 0) return '-'
+  const diff = Date.now() - t
+  if (diff < 60_000) return 'just now'
+  if (diff < 3_600_000) return Math.round(diff / 60_000) + 'm ago'
+  if (diff < 86_400_000) return Math.round(diff / 3_600_000) + 'h ago'
+  return Math.round(diff / 86_400_000) + 'd ago'
+}
+
+// SourcesTable is the searchable, sortable list of key totals plus pseudo-rows
+// for the web UI and legacy (unkeyed) source classes. Clicking a row selects
+// it (clicking the selected row again clears it); the parent decides what to
+// do with the selection.
+//
+// Props:
+//   totals: SourceTotals payload (from /api/auth/usage/sources)
+//   selectedKey: currently-selected row id (api_key_id | 'web' | 'legacy' | null)
+//   onSelectKey: (id|null) => void
+//   search / setSearch: free-text filter state lifted to the parent
+//   sortKey / setSortKey: sort column state lifted to the parent
+//   existingKeyIds: Set<string> of current (non-revoked) api key ids, or null
+//     when the parent hasn't yet learned which keys exist. Null suppresses the
+//     revoked badge entirely so live keys aren't dimmed during the fetch or
+//     after a failure.
+//   showUserColumn: render the User column. Admin views set this true so the
+//     reader can attribute each key (and each Web UI row) to its owner.
+export default function SourcesTable({
+  totals,
+  selectedKey,
+  onSelectKey,
+  search,
+  setSearch,
+  sortKey,
+  setSortKey,
+  existingKeyIds = null,
+  showUserColumn = false,
+}) {
+  const { t } = useTranslation('admin')
+
+  const rows = useMemo(() => {
+    const named = (totals?.by_key || []).map((k) => ({
+      kind: 'apikey',
+      id: k.api_key_id,
+      name: k.api_key_name || k.api_key_id,
+      userID: k.user_id || '',
+      userName: k.user_name || '',
+      prefix: '',
+      tokens: k.tokens,
+      requests: k.requests,
+      last_used: k.last_used,
+      revoked: existingKeyIds != null && !existingKeyIds.has(k.api_key_id),
+    }))
+
+    // Pseudo-rows for sources that don't have a named key identity.
+    // In admin view (showUserColumn=true), prefer the per-user breakdown
+    // from totals.by_user_source so each user's Web UI / legacy traffic
+    // gets its own row. Otherwise fall back to the global by_source aggregate.
+    let unkeyed = []
+    if (showUserColumn && Array.isArray(totals?.by_user_source) && totals.by_user_source.length > 0) {
+      unkeyed = totals.by_user_source.map((r) => ({
+        kind: r.source,
+        id: r.source + ':' + (r.user_id || ''),
+        name: r.source === 'legacy' ? t('usage.sources.legacy') : t('usage.sources.webUI'),
+        userID: r.user_id || '',
+        userName: r.user_name || '',
+        prefix: '-',
+        tokens: r.tokens,
+        requests: r.requests,
+      }))
+    } else {
+      if (totals?.by_source?.web) {
+        unkeyed.push({
+          kind: 'web',
+          id: 'web',
+          name: t('usage.sources.webUI'),
+          userID: '',
+          userName: '',
+          prefix: '-',
+          tokens: totals.by_source.web.tokens,
+          requests: totals.by_source.web.requests,
+        })
+      }
+      if (totals?.by_source?.legacy) {
+        unkeyed.push({
+          kind: 'legacy',
+          id: 'legacy',
+          name: t('usage.sources.legacy'),
+          userID: '',
+          userName: '',
+          prefix: '-',
+          tokens: totals.by_source.legacy.tokens,
+          requests: totals.by_source.legacy.requests,
+        })
+      }
+    }
+
+    return [...named, ...unkeyed]
+  }, [totals, existingKeyIds, showUserColumn, t])
+
+  const filtered = useMemo(() => {
+    const q = (search || '').trim().toLowerCase()
+    const list = q
+      ? rows.filter((r) =>
+          (r.name || '').toLowerCase().includes(q) ||
+          (r.prefix || '').toLowerCase().includes(q) ||
+          (r.userName || '').toLowerCase().includes(q) ||
+          (r.userID || '').toLowerCase().includes(q)
+        )
+      : rows
+    return [...list].sort(SORT_FNS[sortKey] || SORT_FNS.tokens)
+  }, [rows, search, sortKey])
+
+  const iconFor = (kind) =>
+    kind === 'apikey' ? 'fas fa-key' : kind === 'web' ? 'fas fa-globe' : 'fas fa-gear'
+
+  return (
+    <div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-sm)' }}>
+      <div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
+        <input
+          type="search"
+          value={search}
+          onChange={(e) => setSearch(e.target.value)}
+          placeholder={t('usage.sources.searchPlaceholder')}
+          aria-label={t('usage.sources.searchPlaceholder')}
+          style={{
+            flex: '1 1 12rem',
+            minWidth: 160,
+            padding: 'var(--spacing-xs) var(--spacing-sm)',
+            border: '1px solid var(--color-border-subtle)',
+            borderRadius: 'var(--radius-sm)',
+            background: 'var(--color-bg-primary)',
+            color: 'var(--color-text-primary)',
+          }}
+        />
+        <label style={{ display: 'inline-flex', alignItems: 'center', gap: 6, fontSize: '0.75rem' }}>
+          {t('usage.sources.sortBy')}:
+          <select
+            value={sortKey}
+            onChange={(e) => setSortKey(e.target.value)}
+            style={{
+              padding: 'calc(var(--spacing-xs) / 2) var(--spacing-xs)',
+              border: '1px solid var(--color-border-subtle)',
+              borderRadius: 'var(--radius-sm)',
+              background: 'var(--color-bg-primary)',
+              color: 'var(--color-text-primary)',
+            }}
+          >
+            <option value="tokens">{t('usage.sources.sortTokens')}</option>
+            <option value="requests">{t('usage.sources.sortRequests')}</option>
+            <option value="last_used">{t('usage.sources.sortLastUsed')}</option>
+            <option value="name">{t('usage.sources.sortName')}</option>
+            {showUserColumn && <option value="user">{t('usage.sources.sortUser')}</option>}
+          </select>
+        </label>
+      </div>
+
+      <div className="table-container">
+        <table className="table">
+          <thead>
+            <tr>
+              <th>{t('usage.sources.sortName')}</th>
+              {showUserColumn && <th style={{ width: 180 }}>{t('usage.sources.sortUser')}</th>}
+              <th style={{ width: 110 }}>Prefix</th>
+              <th style={{ width: 100, textAlign: 'right' }}>{t('usage.sources.sortRequests')}</th>
+              <th style={{ width: 100, textAlign: 'right' }}>{t('usage.sources.sortTokens')}</th>
+              <th style={{ width: 120, textAlign: 'right' }}>{t('usage.sources.sortLastUsed')}</th>
+            </tr>
+          </thead>
+          <tbody>
+            {filtered.map((r) => {
+              const isSel = selectedKey === r.id
+              return (
+                <tr
+                  key={r.id}
+                  onClick={() => onSelectKey?.(isSel ? null : r.id)}
+                  style={{
+                    cursor: 'pointer',
+                    background: isSel ? 'var(--color-bg-secondary)' : undefined,
+                    opacity: r.revoked ? 0.5 : 1,
+                  }}
+                >
+                  <td>
+                    <span style={{ display: 'inline-flex', alignItems: 'center', gap: 8 }}>
+                      <i
+                        className={iconFor(r.kind)}
+                        style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}
+                      />
+                      <span>{r.name}</span>
+                      {r.revoked && (
+                        <span
+                          style={{
+                            fontSize: '0.6875rem',
+                            textTransform: 'uppercase',
+                            color: 'var(--color-text-muted)',
+                          }}
+                        >
+                          ({t('usage.sources.revoked')})
+                        </span>
+                      )}
+                    </span>
+                  </td>
+                  {showUserColumn && (
+                    <td style={{ color: 'var(--color-text-secondary)', fontSize: '0.8125rem' }}>
+                      {r.userName || r.userID || '-'}
+                    </td>
+                  )}
+                  <td style={{ color: 'var(--color-text-muted)', fontSize: '0.75rem' }}>{r.prefix || '-'}</td>
+                  <td style={{ textAlign: 'right', fontFamily: 'var(--font-mono)' }}>
+                    {Number(r.requests || 0).toLocaleString()}
+                  </td>
+                  <td style={{ textAlign: 'right', fontFamily: 'var(--font-mono)' }}>
+                    {formatTokens(r.tokens || 0)}
+                  </td>
+                  <td style={{ textAlign: 'right', fontSize: '0.75rem', color: 'var(--color-text-muted)' }}>
+                    {formatRelative(r.last_used)}
+                  </td>
+                </tr>
+              )
+            })}
+          </tbody>
+        </table>
+      </div>
+    </div>
+  )
+}
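Note on the revoked badge: SourcesTable.jsx treats `existingKeyIds` as three-state (null while unknown, otherwise a Set of live key ids). The sketch below restates that rule as a standalone predicate; the name `isRevoked` and the sample ids are hypothetical.

```javascript
// Hypothetical predicate (not in the diff) equivalent to the `revoked`
// expression used when building the table rows.
function isRevoked(existingKeyIds, apiKeyId) {
  // null/undefined means the key list has not loaded (or the fetch failed),
  // so nothing is flagged as revoked.
  return existingKeyIds != null && !existingKeyIds.has(apiKeyId)
}

// Illustrative ids only.
const liveKeys = new Set(['key-a'])
const whileLoading = isRevoked(null, 'key-a')
const liveKey = isRevoked(liveKeys, 'key-a')
const missingKey = isRevoked(liveKeys, 'key-b')
```

Defaulting to "not revoked" when the list is unknown avoids dimming every row during the initial fetch, at the cost of not flagging revoked keys if the key-list request fails.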
--- a/core/http/react-ui/src/utils/api.js
+++ b/core/http/react-ui/src/utils/api.js
@@ -422,6 +422,14 @@ export const usageApi = {
    if (userId) url += `&user_id=${encodeURIComponent(userId)}`
    return fetchJSON(url)
  },
+  getMySources: (period) =>
+    fetchJSON(`/api/auth/usage/sources?period=${period || 'month'}`),
+  getAdminSources: (period, userId, apiKeyId) => {
+    let url = `/api/auth/admin/usage/sources?period=${period || 'month'}`
+    if (userId) url += `&user_id=${encodeURIComponent(userId)}`
+    if (apiKeyId) url += `&api_key_id=${encodeURIComponent(apiKeyId)}`
+    return fetchJSON(url)
+  },
  getMyQuotas: () => fetchJSON('/api/auth/quota'),
 }
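Note on the admin sources URL: the sketch below reproduces the query-string construction from `getAdminSources` without the `fetchJSON` call, so the resulting URLs can be inspected directly. The function name `adminSourcesUrl` and the argument values are hypothetical.

```javascript
// Hypothetical helper (not in the diff): same URL building as
// usageApi.getAdminSources, minus the network request.
function adminSourcesUrl(period, userId, apiKeyId) {
  // period falls back to 'month', matching the server-side default.
  let url = `/api/auth/admin/usage/sources?period=${period || 'month'}`
  // Optional filters are appended only when provided, and are URL-encoded.
  if (userId) url += `&user_id=${encodeURIComponent(userId)}`
  if (apiKeyId) url += `&api_key_id=${encodeURIComponent(apiKeyId)}`
  return url
}

// Illustrative arguments only.
const defaultUrl = adminSourcesUrl()
const filteredUrl = adminSourcesUrl('day', 'u 1', 'k/2')
```

Note that `period` itself is interpolated without encoding, which is fine as long as callers only pass the fixed period keys used by the Usage page.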

--- a/core/http/react-ui/src/utils/clipboard.js
+++ b/core/http/react-ui/src/utils/clipboard.js
@@ -0,0 +1,81 @@
+// Clipboard helper that works in non-secure contexts.
+//
+// navigator.clipboard is only defined on https:// origins and on
+// http://localhost. When LocalAI is served over plain http from a remote
+// host (LXC + Docker is a common deployment), every page that called
+// `navigator.clipboard.writeText` silently failed (#9904). This helper
+// transparently falls back to a hidden-textarea + execCommand('copy')
+// trick that browsers still honour when the page is not a secure context.
+//
+// Returns true on success, false on failure. Callers should use the return
+// value to drive the success/failure toast — the old code always claimed
+// success regardless of what actually happened.
+export async function copyToClipboard(text) {
+  if (text == null) return false
+  const value = typeof text === 'string' ? text : String(text)
+
+  if (typeof navigator !== 'undefined' && navigator.clipboard?.writeText && window.isSecureContext) {
+    try {
+      await navigator.clipboard.writeText(value)
+      return true
+    } catch {
+      // Permissions denied, browser refused, etc. — try the fallback.
+    }
+  }
+
+  return legacyCopy(value)
+}
+
+function legacyCopy(value) {
+  if (typeof document === 'undefined') return false
+  const ta = document.createElement('textarea')
+  ta.value = value
+  // Keep the textarea out of the viewport and out of layout reads. Using
+  // `position: fixed` + a negative offset avoids scrolling the page when
+  // we call .select() below.
+  ta.setAttribute('readonly', '')
+  ta.style.position = 'fixed'
+  ta.style.top = '0'
+  ta.style.left = '-9999px'
+  ta.style.opacity = '0'
+  document.body.appendChild(ta)
+  // Preserve the current selection so triggering execCommand doesn't blow
+  // away whatever the user had highlighted on the page.
+  const previousSelection = saveSelection()
+  let ok = false
+  try {
+    ta.select()
+    ta.setSelectionRange(0, value.length)
+    ok = document.execCommand('copy')
+  } catch {
+    ok = false
+  } finally {
+    document.body.removeChild(ta)
+    restoreSelection(previousSelection)
+  }
+  return ok
+}
+
+function saveSelection() {
+  try {
+    const sel = window.getSelection()
+    if (!sel || sel.rangeCount === 0) return null
+    const ranges = []
+    for (let i = 0; i < sel.rangeCount; i++) ranges.push(sel.getRangeAt(i).cloneRange())
+    return ranges
+  } catch {
+    return null
+  }
+}
+
+function restoreSelection(ranges) {
+  if (!ranges) return
+  try {
+    const sel = window.getSelection()
+    if (!sel) return
+    sel.removeAllRanges()
+    for (const r of ranges) sel.addRange(r)
+  } catch {
+    // best-effort
+  }
+}
--- a/core/http/routes/auth.go
+++ b/core/http/routes/auth.go
@@ -789,6 +789,30 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
 		})
 	})

+	// GET /api/auth/usage/sources - caller's per-source breakdown (no legacy)
+	e.GET("/api/auth/usage/sources", func(c echo.Context) error {
+		user := auth.GetUser(c)
+		if user == nil {
+			return c.JSON(http.StatusUnauthorized, map[string]string{"error": "not authenticated"})
+		}
+
+		period := c.QueryParam("period")
+		if period == "" {
+			period = "month"
+		}
+
+		buckets, totals, err := auth.GetUserUsageBySource(db, user.ID, period)
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
+		}
+
+		return c.JSON(http.StatusOK, map[string]any{
+			"buckets":   buckets,
+			"totals":    totals,
+			"truncated": false,
+		})
+	})
+
 	// Admin endpoints
 	adminMw := auth.RequireAdmin()

@@ -1104,6 +1128,27 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
 		})
 	}, adminMw)

+	// GET /api/auth/admin/usage/sources - all users' per-source breakdown (admin only)
+	e.GET("/api/auth/admin/usage/sources", func(c echo.Context) error {
+		period := c.QueryParam("period")
+		if period == "" {
+			period = "month"
+		}
+		userID := c.QueryParam("user_id")
+		apiKeyID := c.QueryParam("api_key_id")
+
+		buckets, totals, truncated, err := auth.GetAllUsageBySource(db, period, userID, apiKeyID)
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
+		}
+
+		return c.JSON(http.StatusOK, map[string]any{
+			"buckets":   buckets,
+			"totals":    totals,
+			"truncated": truncated,
+		})
+	}, adminMw)
+
 	// --- Invite management endpoints ---

 	// POST /api/auth/admin/invites - create invite (admin only)
--- a/core/http/routes/auth_test.go
+++ b/core/http/routes/auth_test.go
@@ -286,6 +286,45 @@ func newTestAuthApp(db *gorm.DB, appConfig *config.ApplicationConfig) *echo.Echo
 		return c.JSON(http.StatusOK, map[string]string{"message": "user deleted"})
 	}, adminMw)

+	// Mirror of production handler in routes/auth.go GET /api/auth/usage/sources.
+	// Keep this body in sync with the real handler; this test app cannot call
+	// RegisterAuthRoutes because it needs a *application.Application.
+	e.GET("/api/auth/usage/sources", func(c echo.Context) error {
+		user := auth.GetUser(c)
+		if user == nil {
+			return c.JSON(http.StatusUnauthorized, map[string]string{"error": "not authenticated"})
+		}
+		period := c.QueryParam("period")
+		if period == "" {
+			period = "month"
+		}
+		buckets, totals, err := auth.GetUserUsageBySource(db, user.ID, period)
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
+		}
+		return c.JSON(http.StatusOK, map[string]any{
+			"buckets": buckets, "totals": totals, "truncated": false,
+		})
+	})
+
+	// Mirror of production handler in routes/auth.go GET /api/auth/admin/usage/sources.
+	// Keep this body in sync with the real handler.
+	e.GET("/api/auth/admin/usage/sources", func(c echo.Context) error {
+		period := c.QueryParam("period")
+		if period == "" {
+			period = "month"
+		}
+		userID := c.QueryParam("user_id")
+		apiKeyID := c.QueryParam("api_key_id")
+		buckets, totals, truncated, err := auth.GetAllUsageBySource(db, period, userID, apiKeyID)
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
+		}
+		return c.JSON(http.StatusOK, map[string]any{
+			"buckets": buckets, "totals": totals, "truncated": truncated,
+		})
+	}, adminMw)
+
 	// Regular API endpoint for testing
 	e.POST("/v1/chat/completions", func(c echo.Context) error {
 		return c.String(http.StatusOK, "ok")
@@ -931,4 +970,110 @@ var _ = Describe("Auth Routes", Label("auth"), func() {
 			Expect(providers).To(ContainElement(auth.ProviderGitHub))
 		})
 	})
+
+	Describe("GET /api/auth/usage/sources", func() {
+		It("returns only the caller's data, never legacy", func() {
+			app := newTestAuthApp(db, appConfig)
+
+			alice := createRouteTestUser(db, "alice@example.com", auth.RoleUser)
+			aliceToken, err := auth.CreateSession(db, alice.ID, "")
+			Expect(err).ToNot(HaveOccurred())
+
+			keyID := "k-alice"
+			now := time.Now()
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{
+				UserID: alice.ID, Source: auth.UsageSourceAPIKey,
+				APIKeyID: &keyID, APIKeyName: "alice-key",
+				Model: "gpt-4", TotalTokens: 100, CreatedAt: now,
+			})).To(Succeed())
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{
+				UserID: alice.ID, Source: auth.UsageSourceWeb,
+				Model: "gpt-4", TotalTokens: 50, CreatedAt: now,
+			})).To(Succeed())
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{
+				UserID: "legacy-api-key", Source: auth.UsageSourceLegacy,
+				Model: "gpt-4", TotalTokens: 30, CreatedAt: now,
+			})).To(Succeed())
+
+			rec := doAuthRequest(app, http.MethodGet, "/api/auth/usage/sources?period=month", nil, withSession(aliceToken))
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var resp struct {
+				Buckets   []auth.UsageBucket `json:"buckets"`
+				Totals    auth.SourceTotals  `json:"totals"`
+				Truncated bool               `json:"truncated"`
+			}
+			Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
+			_, hasLegacy := resp.Totals.BySource[auth.UsageSourceLegacy]
+			Expect(hasLegacy).To(BeFalse())
+			Expect(resp.Totals.GrandTotal.Tokens).To(Equal(int64(150)))
+			Expect(resp.Truncated).To(BeFalse())
+		})
+
+		It("returns 401 when unauthenticated", func() {
+			app := newTestAuthApp(db, appConfig)
+
+			// Without a session cookie or bearer token, the global auth middleware
+			// should refuse the request before our handler runs.
+			rec := doAuthRequest(app, http.MethodGet, "/api/auth/usage/sources?period=month", nil)
+			Expect(rec.Code).To(Equal(http.StatusUnauthorized))
+		})
+	})
+
+	Describe("GET /api/auth/admin/usage/sources", func() {
+		It("returns 403 for non-admin", func() {
+			app := newTestAuthApp(db, appConfig)
+
+			alice := createRouteTestUser(db, "alice@example.com", auth.RoleUser)
+			aliceToken, _ := auth.CreateSession(db, alice.ID, "")
+
+			rec := doAuthRequest(app, http.MethodGet, "/api/auth/admin/usage/sources?period=month", nil, withSession(aliceToken))
+			Expect(rec.Code).To(Equal(http.StatusForbidden))
+		})
+
+		It("applies the api_key_id filter for admin, excluding other keys and legacy usage", func() {
+			app := newTestAuthApp(db, appConfig)
+
+			admin := createRouteTestUser(db, "admin@example.com", auth.RoleAdmin)
+			adminToken, _ := auth.CreateSession(db, admin.ID, "")
+
+			k1 := "k1"
+			k2 := "k2"
+			now := time.Now()
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "alice", Source: auth.UsageSourceAPIKey, APIKeyID: &k1, APIKeyName: "ci", Model: "gpt-4", TotalTokens: 10, CreatedAt: now})).To(Succeed())
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "alice", Source: auth.UsageSourceAPIKey, APIKeyID: &k2, APIKeyName: "lap", Model: "gpt-4", TotalTokens: 20, CreatedAt: now})).To(Succeed())
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "legacy-api-key", Source: auth.UsageSourceLegacy, Model: "gpt-4", TotalTokens: 5, CreatedAt: now})).To(Succeed())
+
+			rec := doAuthRequest(app, http.MethodGet,
+				"/api/auth/admin/usage/sources?period=month&api_key_id=k2", nil, withSession(adminToken))
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var resp struct {
+				Totals    auth.SourceTotals `json:"totals"`
+				Truncated bool              `json:"truncated"`
+			}
+			Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
+			Expect(resp.Totals.GrandTotal.Tokens).To(Equal(int64(20)))
+		})
+
+		It("includes legacy in by_source for admin with no filter", func() {
+			app := newTestAuthApp(db, appConfig)
+
+			admin := createRouteTestUser(db, "admin@example.com", auth.RoleAdmin)
+			adminToken, _ := auth.CreateSession(db, admin.ID, "")
+
+			now := time.Now()
+			Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "legacy-api-key", Source: auth.UsageSourceLegacy, Model: "gpt-4", TotalTokens: 7, CreatedAt: now})).To(Succeed())
+
+			rec := doAuthRequest(app, http.MethodGet, "/api/auth/admin/usage/sources?period=month", nil, withSession(adminToken))
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var resp struct {
+				Totals auth.SourceTotals `json:"totals"`
+			}
+			Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
+			Expect(resp.Totals.BySource).To(HaveKey(auth.UsageSourceLegacy))
+			Expect(resp.Totals.BySource[auth.UsageSourceLegacy].Tokens).To(Equal(int64(7)))
+		})
+	})
 })
--- a/core/http/routes/nodes.go
+++ b/core/http/routes/nodes.go
@@ -6,7 +6,9 @@ import (
 	"strings"

 	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/core/http/endpoints/localai"
+	"github.com/mudler/LocalAI/core/services/galleryop"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"gorm.io/gorm"
 )
@@ -53,7 +55,12 @@ func RegisterNodeSelfServiceRoutes(e *echo.Echo, registry *nodes.NodeRegistry, r

 // RegisterNodeAdminRoutes registers /api/nodes/ endpoints used by admins
 // (list, get, get models, drain, delete, approve, backend management). Protected by admin middleware.
-func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloader nodes.NodeCommandSender, adminMw echo.MiddlewareFunc, authDB *gorm.DB, hmacSecret string, registrationToken string) {
+//
+// galleryService/opcache/appConfig are threaded in for the async node-scoped
+// backend install path (POST /:id/backends/install). That handler enqueues a
+// ManagementOp on the gallery channel rather than blocking on a NATS reply, so
+// the browser gets HTTP 202 + jobID immediately instead of waiting up to 3 minutes.
+func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloader nodes.NodeCommandSender, galleryService *galleryop.GalleryService, opcache *galleryop.OpCache, appConfig *config.ApplicationConfig, adminMw echo.MiddlewareFunc, authDB *gorm.DB, hmacSecret string, registrationToken string) {
 	if registry == nil {
 		return
 	}
@@ -78,7 +85,7 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade

 	// Backend management on workers
 	admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader))
-	admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader))
+	admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader, galleryService, opcache, appConfig))
 	admin.POST("/:id/backends/delete", localai.DeleteBackendOnNodeEndpoint(unloader))

 	// Model management on workers
--- a/core/http/routes/ui_api.go
+++ b/core/http/routes/ui_api.go
@@ -214,6 +214,17 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
 				}
 			}

+			// Node-scoped backend ops (from /api/nodes/:id/backends/install)
+			// carry the nodeID inside the opcache key as "node:<nodeID>:<backend>".
+			// Pull it back out so the operations panel can label which node the
+			// install is targeting, and so the display name is just the backend
+			// slug instead of the full prefixed key.
+			scopedNodeID := ""
+			if nodeID, backend, ok := galleryop.ParseNodeScopedKey(galleryID); ok {
+				scopedNodeID = nodeID
+				galleryID = backend
+			}
+
 			// Extract display name (remove repo prefix if exists)
 			displayName := galleryID
 			if strings.Contains(galleryID, "@") {
@@ -237,6 +248,12 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
 				"cancellable": isCancellable,
 				"message":     message,
 			}
+			// Only attach nodeID when this op was node-scoped: an empty string
+			// would mislead the UI into rendering a node attribution that never
+			// existed in the first place.
+			if scopedNodeID != "" {
+				opData["nodeID"] = scopedNodeID
+			}
 			if status != nil && status.Error != nil {
 				opData["error"] = status.Error.Error()
 			}
--- a/core/http/routes/ui_api_operations_test.go
+++ b/core/http/routes/ui_api_operations_test.go
@@ -0,0 +1,98 @@
+package routes_test
+
+import (
+	"encoding/json"
+	"net/http"
+	"net/http/httptest"
+
+	"github.com/labstack/echo/v4"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/mudler/LocalAI/core/application"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/routes"
+	"github.com/mudler/LocalAI/core/services/galleryop"
+)
+
+// These specs guard the contract between the opcache (which stores
+// node-scoped backend installs under a "node:<nodeID>:<backend>" key) and the
+// /api/operations response surface the React UI polls. Without nodeID
+// extraction the panel would show the raw prefixed key and have no way to
+// label which worker an install is targeting.
+var _ = Describe("/api/operations with node-scoped backend ops", func() {
+	// We pass a pointer to a zero-value application.Application: the handler's
+	// distributed-services branch is guarded by a nil check on the returned
+	// *DistributedServices, which is nil for a fresh Application{}.
+	noopMw := func(next echo.HandlerFunc) echo.HandlerFunc { return next }
+
+	It("emits nodeID and the un-prefixed backend name for keys built by NodeScopedKey", func() {
+		appCfg := &config.ApplicationConfig{}
+		galleryService := galleryop.NewGalleryService(appCfg, nil)
+		opcache := galleryop.NewOpCache(galleryService)
+
+		key := galleryop.NodeScopedKey("worker-7", "llama-cpp")
+		opcache.SetBackend(key, "job-uuid-123")
+
+		e := echo.New()
+		routes.RegisterUIAPIRoutes(e, nil, nil, appCfg, galleryService, opcache, &application.Application{}, noopMw)
+
+		req := httptest.NewRequest(http.MethodGet, "/api/operations", nil)
+		rec := httptest.NewRecorder()
+		e.ServeHTTP(rec, req)
+
+		Expect(rec.Code).To(Equal(http.StatusOK))
+
+		// The handler wraps operations in {"operations": [...]}.
+		var envelope struct {
+			Operations []map[string]any `json:"operations"`
+		}
+		Expect(json.Unmarshal(rec.Body.Bytes(), &envelope)).To(Succeed())
+
+		var found map[string]any
+		for _, op := range envelope.Operations {
+			if op["jobID"] == "job-uuid-123" {
+				found = op
+				break
+			}
+		}
+		Expect(found).ToNot(BeNil(), "node-scoped op should appear in /api/operations")
+		Expect(found["nodeID"]).To(Equal("worker-7"))
+		Expect(found["name"]).To(Equal("llama-cpp"))
+		Expect(found["isBackend"]).To(Equal(true))
+	})
+
+	It("does not emit nodeID for non-node-scoped backend ops", func() {
+		appCfg := &config.ApplicationConfig{}
+		galleryService := galleryop.NewGalleryService(appCfg, nil)
+		opcache := galleryop.NewOpCache(galleryService)
+
+		// Legacy/global install path: bare backend name as the opcache key.
+		opcache.SetBackend("llama-cpp", "job-uuid-456")
+
+		e := echo.New()
+		routes.RegisterUIAPIRoutes(e, nil, nil, appCfg, galleryService, opcache, &application.Application{}, noopMw)
+
+		req := httptest.NewRequest(http.MethodGet, "/api/operations", nil)
+		rec := httptest.NewRecorder()
+		e.ServeHTTP(rec, req)
+
+		Expect(rec.Code).To(Equal(http.StatusOK))
+		var envelope struct {
+			Operations []map[string]any `json:"operations"`
+		}
+		Expect(json.Unmarshal(rec.Body.Bytes(), &envelope)).To(Succeed())
+
+		var found map[string]any
+		for _, op := range envelope.Operations {
+			if op["jobID"] == "job-uuid-456" {
+				found = op
+				break
+			}
+		}
+		Expect(found).ToNot(BeNil())
+		// Critical: bare ops must NOT gain a misleading empty nodeID field.
+		Expect(found).ToNot(HaveKey("nodeID"), "non-node-scoped ops must NOT carry a nodeID field")
+		Expect(found["name"]).To(Equal("llama-cpp"))
+	})
+})
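Reviewer note: the two specs above pin down one rule, namely that `nodeID` appears in an operation entry only when the opcache key was node-scoped. A minimal sketch of that rule follows, assuming a simplified entry with just `name`, `jobID`, and `isBackend`. `buildOpData` and `splitNodeKey` are hypothetical helpers written for this note; the real handler builds a larger map and calls galleryop.ParseNodeScopedKey.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// splitNodeKey is a hypothetical stand-in for galleryop.ParseNodeScopedKey.
func splitNodeKey(key string) (nodeID, backend string, ok bool) {
	rest, found := strings.CutPrefix(key, "node:")
	if !found {
		return "", "", false
	}
	nodeID, backend, ok = strings.Cut(rest, ":")
	if !ok || nodeID == "" || backend == "" {
		return "", "", false
	}
	return nodeID, backend, true
}

// buildOpData (hypothetical) mirrors the shape of the handler logic: strip the
// node prefix for display, and attach nodeID only when one was present.
func buildOpData(key, jobID string) map[string]any {
	name := key
	scopedNodeID := ""
	if nodeID, backend, ok := splitNodeKey(key); ok {
		scopedNodeID = nodeID
		name = backend
	}
	opData := map[string]any{"name": name, "jobID": jobID, "isBackend": true}
	if scopedNodeID != "" {
		opData["nodeID"] = scopedNodeID // omitted entirely for bare keys
	}
	return opData
}

func main() {
	for _, key := range []string{"node:worker-7:llama-cpp", "llama-cpp"} {
		out, err := json.Marshal(buildOpData(key, "job-1"))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Omitting the key, rather than emitting an empty string, lets a client distinguish "no node attribution" with a simple presence check.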
--- a/core/services/galleryop/backends.go
+++ b/core/services/galleryop/backends.go
@@ -113,7 +113,7 @@ func (g *GalleryService) backendHandler(op *ManagementOp[gallery.GalleryBackend,
 // InstallExternalBackend installs a backend from an external source (OCI image, URL, or path).
 // This method contains the logic to detect the input type and call the appropriate installation function.
 // It can be used by both CLI and Web UI for installing backends from external sources.
-func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string) error {
+func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string, requireIntegrity bool) error {
 	uri := downloader.URI(backend)
 	switch {
 	case uri.LooksLikeDir():
@@ -127,7 +127,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
 			},
 			Alias: alias,
 			URI:   backend,
-		}, downloadStatus); err != nil {
+		}, downloadStatus, requireIntegrity); err != nil {
 			return fmt.Errorf("error installing backend %s: %w", backend, err)
 		}
 	case uri.LooksLikeOCI() && !uri.LooksLikeOCIFile():
@@ -141,7 +141,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
 			},
 			Alias: alias,
 			URI:   backend,
-		}, downloadStatus); err != nil {
+		}, downloadStatus, requireIntegrity); err != nil {
 			return fmt.Errorf("error installing backend %s: %w", backend, err)
 		}
 	case uri.LooksLikeOCIFile():
@@ -163,7 +163,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
 			},
 			Alias: alias,
 			URI:   backend,
-		}, downloadStatus); err != nil {
+		}, downloadStatus, requireIntegrity); err != nil {
 			return fmt.Errorf("error installing backend %s: %w", backend, err)
 		}
 	default:
@@ -171,7 +171,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
 		if name != "" || alias != "" {
 			return fmt.Errorf("specifying a name or alias is not supported for gallery backends")
 		}
-		err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, true)
+		err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, true, requireIntegrity)
 		if err != nil {
 			return fmt.Errorf("error installing backend %s: %w", backend, err)
 		}
--- a/core/services/galleryop/backends_test.go
+++ b/core/services/galleryop/backends_test.go
@@ -70,6 +70,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"test-backend", // gallery name
 				"custom-name",  // name should not be allowed
 				"",
+				false,
 			)
 			Expect(err).To(HaveOccurred())
 			Expect(err.Error()).To(ContainSubstring("specifying a name or alias is not supported for gallery backends"))
@@ -85,6 +86,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"non-existent-backend",
 				"",
 				"",
+				false,
 			)
 			Expect(err).To(HaveOccurred())
 		})
@@ -101,6 +103,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"oci://quay.io/mudler/tests:localai-backend-test",
 				"", // name is required for OCI images
 				"",
+				false,
 			)
 			Expect(err).To(HaveOccurred())
 			Expect(err.Error()).To(ContainSubstring("specifying a name is required for OCI images"))
@@ -133,6 +136,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"", // name should be inferred as "source-backend"
 				"",
+				false,
 			)
 			// The function should at least attempt to install with the inferred name
 			// Even if it fails for other reasons, it shouldn't fail due to missing name
@@ -151,6 +155,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"custom-backend-name",
 				"",
+				false,
 			)
 			// The function should use the provided name
 			if err != nil {
@@ -168,6 +173,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"custom-backend-name",
 				"custom-alias",
+				false,
 			)
 			// The function should accept alias for directory paths
 			if err != nil {
@@ -190,4 +196,60 @@ var _ = Describe("ManagementOp with External Backend", func() {
 		Expect(op.ExternalName).To(Equal("test-backend"))
 		Expect(op.ExternalAlias).To(Equal("test-alias"))
 	})
+
+	Context("TargetNodeID field", func() {
+		It("defaults to empty string", func() {
+			op := galleryop.ManagementOp[string, string]{
+				ExternalURI: "oci://example.com/backend:latest",
+			}
+			Expect(op.TargetNodeID).To(BeEmpty())
+		})
+
+		It("preserves TargetNodeID across a channel send", func() {
+			ch := make(chan galleryop.ManagementOp[string, string], 1)
+			ch <- galleryop.ManagementOp[string, string]{
+				GalleryElementName: "llama-cpp",
+				TargetNodeID:       "node-abc-123",
+			}
+			received := <-ch
+			Expect(received.TargetNodeID).To(Equal("node-abc-123"))
+			Expect(received.GalleryElementName).To(Equal("llama-cpp"))
+		})
+	})
+
+	Describe("NodeScopedKey", func() {
+		It("builds a unique key per (nodeID, backend) pair", func() {
+			Expect(galleryop.NodeScopedKey("node-a", "llama-cpp")).To(Equal("node:node-a:llama-cpp"))
+			Expect(galleryop.NodeScopedKey("node-b", "llama-cpp")).To(Equal("node:node-b:llama-cpp"))
+			Expect(galleryop.NodeScopedKey("node-a", "vllm")).To(Equal("node:node-a:vllm"))
+		})
+
+		It("handles backend names containing colons", func() {
+			// Gallery IDs sometimes look like "official@llama-cpp"; nodeIDs are UUIDs
+			// without colons, but the backend slug may contain anything. Splitting on
+			// the first colon after the prefix MUST yield the full backend back.
+			key := galleryop.NodeScopedKey("node-1", "official@llama-cpp:v2")
+			node, backend, ok := galleryop.ParseNodeScopedKey(key)
+			Expect(ok).To(BeTrue())
+			Expect(node).To(Equal("node-1"))
+			Expect(backend).To(Equal("official@llama-cpp:v2"))
+		})
+
+		It("rejects keys without the node prefix", func() {
+			_, _, ok := galleryop.ParseNodeScopedKey("llama-cpp")
+			Expect(ok).To(BeFalse())
+			_, _, ok = galleryop.ParseNodeScopedKey("official@llama-cpp")
+			Expect(ok).To(BeFalse())
+		})
+
+		It("rejects malformed node-prefixed keys", func() {
+			_, _, ok := galleryop.ParseNodeScopedKey("node:only-one-segment")
+			Expect(ok).To(BeFalse())
+		})
+
+		It("rejects keys with an empty nodeID segment", func() {
+			_, _, ok := galleryop.ParseNodeScopedKey("node::llama-cpp")
+			Expect(ok).To(BeFalse())
+		})
+	})
 })
--- a/core/services/galleryop/managers_local.go
+++ b/core/services/galleryop/managers_local.go
@@ -16,6 +16,7 @@ type LocalModelManager struct {
 	modelLoader                 *model.ModelLoader
 	enforcePredownloadScans     bool
 	automaticallyInstallBackend bool
+	requireBackendIntegrity     bool
 }

 // NewLocalModelManager creates a LocalModelManager from the application config.
@@ -25,6 +26,7 @@ func NewLocalModelManager(appConfig *config.ApplicationConfig, ml *model.ModelLo
 		modelLoader:                 ml,
 		enforcePredownloadScans:     appConfig.EnforcePredownloadScans,
 		automaticallyInstallBackend: appConfig.AutoloadBackendGalleries,
+		requireBackendIntegrity:     appConfig.RequireBackendIntegrity,
 	}
 }

@@ -53,32 +55,34 @@ func (m *LocalModelManager) InstallModel(ctx context.Context, op *ManagementOp[g
 		if m.automaticallyInstallBackend && installedModel.Backend != "" {
 			xlog.Debug("Installing backend", "backend", installedModel.Backend)
 			return gallery.InstallBackendFromGallery(ctx, op.BackendGalleries, m.systemState,
-				m.modelLoader, installedModel.Backend, progressCb, false)
+				m.modelLoader, installedModel.Backend, progressCb, false, m.requireBackendIntegrity)
 		}
 		return nil
 	case op.GalleryElementName != "":
 		return gallery.InstallModelFromGallery(ctx, op.Galleries, op.BackendGalleries,
 			m.systemState, m.modelLoader, op.GalleryElementName, op.Req, progressCb,
-			m.enforcePredownloadScans, m.automaticallyInstallBackend)
+			m.enforcePredownloadScans, m.automaticallyInstallBackend, m.requireBackendIntegrity)
 	default:
 		return installModelFromRemoteConfig(ctx, m.systemState, m.modelLoader, op.Req,
-			progressCb, m.enforcePredownloadScans, m.automaticallyInstallBackend, op.BackendGalleries)
+			progressCb, m.enforcePredownloadScans, m.automaticallyInstallBackend, op.BackendGalleries, m.requireBackendIntegrity)
 	}
 }

 // LocalBackendManager handles backend install/delete on the local instance.
 type LocalBackendManager struct {
-	systemState      *system.SystemState
-	modelLoader      *model.ModelLoader
-	backendGalleries []config.Gallery
+	systemState             *system.SystemState
+	modelLoader             *model.ModelLoader
+	backendGalleries        []config.Gallery
+	requireBackendIntegrity bool
 }

 // NewLocalBackendManager creates a LocalBackendManager from the application config.
 func NewLocalBackendManager(appConfig *config.ApplicationConfig, ml *model.ModelLoader) *LocalBackendManager {
 	return &LocalBackendManager{
-		systemState:      appConfig.SystemState,
-		modelLoader:      ml,
-		backendGalleries: appConfig.BackendGalleries,
+		systemState:             appConfig.SystemState,
+		modelLoader:             ml,
+		backendGalleries:        appConfig.BackendGalleries,
+		requireBackendIntegrity: appConfig.RequireBackendIntegrity,
 	}
 }

@@ -93,7 +97,7 @@ func (b *LocalBackendManager) ListBackends() (gallery.SystemBackends, error) {
 }

 func (b *LocalBackendManager) UpgradeBackend(ctx context.Context, name string, progressCb ProgressCallback) error {
-	return gallery.UpgradeBackend(ctx, b.systemState, b.modelLoader, b.backendGalleries, name, progressCb)
+	return gallery.UpgradeBackend(ctx, b.systemState, b.modelLoader, b.backendGalleries, name, progressCb, b.requireBackendIntegrity)
 }

 func (b *LocalBackendManager) CheckUpgrades(ctx context.Context) (map[string]gallery.UpgradeInfo, error) {
@@ -103,10 +107,10 @@ func (b *LocalBackendManager) CheckUpgrades(ctx context.Context) (map[string]gal
 func (b *LocalBackendManager) InstallBackend(ctx context.Context, op *ManagementOp[gallery.GalleryBackend, any], progressCb ProgressCallback) error {
 	if op.ExternalURI != "" {
 		return InstallExternalBackend(ctx, b.backendGalleries, b.systemState, b.modelLoader,
-			progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias)
+			progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias, b.requireBackendIntegrity)
 	}
 	return gallery.InstallBackendFromGallery(ctx, b.backendGalleries, b.systemState,
-		b.modelLoader, op.GalleryElementName, progressCb, true)
+		b.modelLoader, op.GalleryElementName, progressCb, true, b.requireBackendIntegrity)
 }

 func (b *LocalBackendManager) IsDistributed() bool { return false }
--- a/core/services/galleryop/models.go
+++ b/core/services/galleryop/models.go
@@ -123,7 +123,7 @@ func (g *GalleryService) modelHandler(op *ManagementOp[gallery.GalleryModel, gal
 	return nil
 }

-func installModelFromRemoteConfig(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, req gallery.GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool, backendGalleries []config.Gallery) error {
+func installModelFromRemoteConfig(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, req gallery.GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool, backendGalleries []config.Gallery, requireBackendIntegrity bool) error {
 	config, err := gallery.GetGalleryConfigFromURLWithContext[gallery.ModelConfig](ctx, req.URL, systemState.Model.ModelsPath)
 	if err != nil {
 		return err
@@ -137,7 +137,7 @@ func installModelFromRemoteConfig(ctx context.Context, systemState *system.Syste
 	}

 	if automaticallyInstallBackend && installedModel.Backend != "" {
-		if err := gallery.InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false); err != nil {
+		if err := gallery.InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false, requireBackendIntegrity); err != nil {
 			return err
 		}
 	}
@@ -150,23 +150,23 @@ type galleryModel struct {
 	ID                   string           `json:"id"`
 }

-func processRequests(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, requests []galleryModel) error {
+func processRequests(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, requests []galleryModel, requireBackendIntegrity bool) error {
 	ctx := context.Background()
 	var err error
 	for _, r := range requests {
 		utils.ResetDownloadTimers()
 		if r.ID == "" {
-			err = installModelFromRemoteConfig(ctx, systemState, modelLoader, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, backendGalleries)
+			err = installModelFromRemoteConfig(ctx, systemState, modelLoader, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, backendGalleries, requireBackendIntegrity)

 		} else {
 			err = gallery.InstallModelFromGallery(
-				ctx, galleries, backendGalleries, systemState, modelLoader, r.ID, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend)
+				ctx, galleries, backendGalleries, systemState, modelLoader, r.ID, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, requireBackendIntegrity)
 		}
 	}
 	return err
 }

-func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string) error {
+func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string, requireBackendIntegrity bool) error {
 	dat, err := os.ReadFile(s)
 	if err != nil {
 		return err
@@ -177,15 +177,15 @@ func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.Mo
 		return err
 	}

-	return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests)
+	return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests, requireBackendIntegrity)
 }

-func ApplyGalleryFromString(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string) error {
+func ApplyGalleryFromString(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string, requireBackendIntegrity bool) error {
 	var requests []galleryModel
 	err := json.Unmarshal([]byte(s), &requests)
 	if err != nil {
 		return err
 	}

-	return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests)
+	return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests, requireBackendIntegrity)
 }
--- a/core/services/galleryop/operation.go
+++ b/core/services/galleryop/operation.go
@@ -2,6 +2,7 @@ package galleryop

 import (
 	"context"
+	"strings"

 	"github.com/mudler/LocalAI/core/config"
 	"github.com/mudler/LocalAI/pkg/xsync"
@@ -30,6 +31,12 @@ type ManagementOp[T any, E any] struct {
 	ExternalName  string // Custom name for the backend
 	ExternalAlias string // Custom alias for the backend

+	// TargetNodeID scopes a backend install/upgrade to a single worker node.
+	// Empty means fan out to every healthy backend node (the previous behavior).
+	// Set by InstallBackendOnNodeEndpoint so an admin can install a hardware-specific
+	// build on one node without touching the rest of the cluster.
+	TargetNodeID string
+
 	// Upgrade is true if this is an upgrade operation (not a fresh install)
 	Upgrade bool
 }
@@ -115,3 +122,31 @@ func (m *OpCache) GetStatus() (map[string]string, map[string]string) {

 	return processingModelsData, taskTypes
 }
+
+// NodeScopedKeyPrefix is the opcache key prefix used by InstallBackendOnNodeEndpoint
+// so per-node installs do not collide on the bare backend name. Format:
+// "node:<nodeID>:<backend>". Read by /api/operations to extract nodeID for the UI.
+const NodeScopedKeyPrefix = "node:"
+
+// NodeScopedKey returns the opcache key for a node-scoped backend operation.
+// The prefix lets ParseNodeScopedKey recover the nodeID, so the operations
+// endpoint can surface it without storing nodeID separately.
+func NodeScopedKey(nodeID, backend string) string {
+	return NodeScopedKeyPrefix + nodeID + ":" + backend
+}
+
+// ParseNodeScopedKey extracts (nodeID, backend) from a key built by NodeScopedKey.
+// Returns ok=false for keys that lack the prefix or are missing the nodeID or
+// backend segment. Backend names containing colons are preserved because we
+// split on the first colon after the prefix only.
+func ParseNodeScopedKey(key string) (nodeID, backend string, ok bool) {
+	rest, hasPrefix := strings.CutPrefix(key, NodeScopedKeyPrefix)
+	if !hasPrefix {
+		return "", "", false
+	}
+	nodeID, backend, ok = strings.Cut(rest, ":")
+	if !ok || nodeID == "" || backend == "" {
+		return "", "", false
+	}
+	return nodeID, backend, true
+}
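Reviewer note: a standalone round-trip of the two helpers added above, copied into `package main` so it runs outside the repository. The last input is an assumption-driven edge case: the tests state that nodeIDs are UUIDs without colons, and the example shows what splitting on the first colon would do if that assumption were ever violated.

```go
package main

import (
	"fmt"
	"strings"
)

// Copied from the diff above so the example is self-contained.
const NodeScopedKeyPrefix = "node:"

func NodeScopedKey(nodeID, backend string) string {
	return NodeScopedKeyPrefix + nodeID + ":" + backend
}

func ParseNodeScopedKey(key string) (nodeID, backend string, ok bool) {
	rest, hasPrefix := strings.CutPrefix(key, NodeScopedKeyPrefix)
	if !hasPrefix {
		return "", "", false
	}
	nodeID, backend, ok = strings.Cut(rest, ":")
	if !ok || nodeID == "" || backend == "" {
		return "", "", false
	}
	return nodeID, backend, true
}

func main() {
	keys := []string{
		NodeScopedKey("node-1", "official@llama-cpp:v2"), // colon in backend survives
		"llama-cpp",                  // bare key: no prefix, rejected
		"node::llama-cpp",            // empty nodeID, rejected
		NodeScopedKey("a:b", "vllm"), // colon in nodeID: split lands too early
	}
	for _, key := range keys {
		nodeID, backend, ok := ParseNodeScopedKey(key)
		fmt.Printf("%q -> nodeID=%q backend=%q ok=%v\n", key, nodeID, backend, ok)
	}
}
```

The format is unambiguous only while nodeIDs never contain a colon; if that ever changes, the key would need an escaping scheme or a separate field.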
--- a/core/services/nodes/managers_distributed.go
+++ b/core/services/nodes/managers_distributed.go
@@ -331,13 +331,23 @@ func (d *DistributedBackendManager) ListBackends() (gallery.SystemBackends, erro
 // non-healthy nodes get retried when they come back instead of being silently
 // skipped. Reply success from the NATS round-trip deletes the queue row;
 // reply.Success==false is treated as an error so the row stays for retry.
+//
+// When op.TargetNodeID is set, only that node is visited - the same allowlist
+// path UpgradeBackend uses. Empty TargetNodeID preserves the original fan-out
+// behavior so the periodic reconciler and /api/backends/install/:id keep
+// working unchanged.
 func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *galleryop.ManagementOp[gallery.GalleryBackend, any], progressCb galleryop.ProgressCallback) error {
 	galleriesJSON, _ := json.Marshal(op.Galleries)
 	backendName := op.GalleryElementName

-	result, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, nil, func(node BackendNode) error {
+	var targetNodeIDs map[string]bool
+	if op.TargetNodeID != "" {
+		targetNodeIDs = map[string]bool{op.TargetNodeID: true}
+	}
+
+	result, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, targetNodeIDs, func(node BackendNode) error {
 		// Admin-driven backend install: not tied to a specific replica slot.
-		// Pass replica 0 — the worker's processKey is "backend#0" when no
+		// Pass replica 0 - the worker's processKey is "backend#0" when no
 		// modelID is supplied, matching pre-PR4 behavior.
 		reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias, 0)
 		if err != nil {
--- a/core/services/nodes/managers_distributed_test.go
+++ b/core/services/nodes/managers_distributed_test.go
@@ -311,6 +311,47 @@ var _ = Describe("DistributedBackendManager", func() {
 				Expect(mgr.InstallBackend(ctx, op("vllm-development"), nil)).To(Succeed())
 			})
 		})
+
+		Context("when op.TargetNodeID is set to a healthy node", func() {
+			It("installs only on that node, leaving the others untouched", func() {
+				target := registerHealthyBackend("worker-target", "10.0.0.1:50051")
+				other := registerHealthyBackend("worker-other", "10.0.0.2:50051")
+
+				mc.scriptReply(messaging.SubjectNodeBackendInstall(target.ID),
+					messaging.BackendInstallReply{Success: true, Address: "10.0.0.1:50100"})
+				// No reply scripted for `other`: if InstallBackend fans out
+				// to it, the fakeNoRespondersErr default would surface and
+				// the test would fail.
+
+				targetedOp := &galleryop.ManagementOp[gallery.GalleryBackend, any]{
+					GalleryElementName: "llama-cpp",
+					TargetNodeID:       target.ID,
+				}
+				Expect(mgr.InstallBackend(ctx, targetedOp, nil)).To(Succeed())
+
+				mc.mu.Lock()
+				defer mc.mu.Unlock()
+				Expect(mc.calls).To(HaveLen(1))
+				Expect(mc.calls[0].Subject).To(Equal(messaging.SubjectNodeBackendInstall(target.ID)))
+				Expect(mc.calls[0].Subject).ToNot(Equal(messaging.SubjectNodeBackendInstall(other.ID)))
+			})
+		})
+
+		Context("when op.TargetNodeID is set to a node that does not exist", func() {
+			It("returns nil without sending any NATS request", func() {
+				registerHealthyBackend("worker-a", "10.0.0.1:50051")
+
+				ghostOp := &galleryop.ManagementOp[gallery.GalleryBackend, any]{
+					GalleryElementName: "llama-cpp",
+					TargetNodeID:       "this-id-does-not-exist",
+				}
+				Expect(mgr.InstallBackend(ctx, ghostOp, nil)).To(Succeed())
+
+				mc.mu.Lock()
+				defer mc.mu.Unlock()
+				Expect(mc.calls).To(BeEmpty())
+			})
+		})
 	})

 	Describe("UpgradeBackend", func() {