fix(distributed): stop queue loops on agent nodes + dead-letter cap

pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct — they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without staying infinite. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.
feat(ui): shared FilterBar across the System page tabs
2026-07-04 05:16:42 -04:00 · 2026-04-19 21:27:05 +00:00 · 2026-04-19 08:46:22 +00:00 · 2026-04-19 08:39:59 +00:00 · 2026-04-19 08:37:45 +00:00 · 2026-04-19 08:34:57 +00:00
11 changed files with 220 additions and 85 deletions
--- a/backend/cpp/turboquant/Makefile
+++ b/backend/cpp/turboquant/Makefile
@@ -1,7 +1,7 @@

 # Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
 # Auto-bumped nightly by .github/workflows/bump_deps.yaml.
-TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+TURBOQUANT_VERSION?=45f8a066ed5f5bb38c695cec532f6cef9f4efa9d
 LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant

 CMAKE_ARGS?=
--- a/backend/cpp/turboquant/patch-grpc-server.sh
+++ b/backend/cpp/turboquant/patch-grpc-server.sh
@@ -1,22 +1,13 @@
 #!/bin/bash
-# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
-# turboquant build to account for two gaps between upstream and the fork:
+# Augment the shared backend/cpp/llama-cpp/grpc-server.cpp allow-list of KV-cache
+# types so the gRPC `LoadModel` call accepts the TurboQuant-specific
+# `turbo2` / `turbo3` / `turbo4` cache types.
 #
-#   1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
-#      fork-specific `turbo2` / `turbo3` / `turbo4` cache types.
-#   2. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
-#      server-side random per-instance marker) with the legacy "<__media__>"
-#      literal. The fork branched before that PR, so server-common.cpp has no
-#      get_media_marker symbol. The fork's mtmd_default_marker() still returns
-#      "<__media__>", and Go-side tooling falls back to that sentinel when the
-#      backend does not expose media_marker, so substituting the literal keeps
-#      behavior identical on the turboquant path.
+# We do this on the *copy* sitting in turboquant-<flavor>-build/, never on the
+# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
+# compiling against vanilla upstream which does not know about GGML_TYPE_TURBO*.
 #
-# We patch the *copy* sitting in turboquant-<flavor>-build/, never the original
-# under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps compiling
-# against vanilla upstream.
-#
-# Idempotent: skips each insertion if its marker is already present (so re-runs
+# Idempotent: skips the insertion if the marker is already present (so re-runs
 # of the same build dir don't double-insert).

 set -euo pipefail
@@ -34,47 +25,33 @@ if [[ ! -f "$SRC" ]]; then
 fi

 if grep -q 'GGML_TYPE_TURBO2_0' "$SRC"; then
-    echo "==> $SRC already has TurboQuant cache types, skipping KV allow-list patch"
-else
-    echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
-
-    # Insert the three TURBO entries right after the first `    GGML_TYPE_Q5_1,`
-    # line (the kv_cache_types[] allow-list). Using awk because the builder image
-    # does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
-    awk '
-        /^    GGML_TYPE_Q5_1,$/ && !done {
-            print
-            print "    // turboquant fork extras — added by patch-grpc-server.sh"
-            print "    GGML_TYPE_TURBO2_0,"
-            print "    GGML_TYPE_TURBO3_0,"
-            print "    GGML_TYPE_TURBO4_0,"
-            done = 1
-            next
-        }
-        { print }
-        END {
-            if (!done) {
-                print "patch-grpc-server.sh: anchor `    GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
-                exit 1
-            }
-        }
-    ' "$SRC" > "$SRC.tmp"
-    mv "$SRC.tmp" "$SRC"
-
-    echo "==> KV allow-list patch OK"
+    echo "==> $SRC already has TurboQuant cache types, skipping"
+    exit 0
 fi

-if grep -q 'get_media_marker()' "$SRC"; then
-    echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
-    # Only one call site today (ModelMetadata), but replace all occurrences to
-    # stay robust if upstream adds more. Use a temp file to avoid relying on
-    # sed -i portability (the builder image uses GNU sed, but keeping this
-    # consistent with the awk block above).
-    sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
-    mv "$SRC.tmp" "$SRC"
-    echo "==> get_media_marker() substitution OK"
-else
-    echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
-fi
+echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"

-echo "==> all patches applied"
+# Insert the three TURBO entries right after the first `    GGML_TYPE_Q5_1,`
+# line (the kv_cache_types[] allow-list). Using awk because the builder image
+# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
+awk '
+    /^    GGML_TYPE_Q5_1,$/ && !done {
+        print
+        print "    // turboquant fork extras — added by patch-grpc-server.sh"
+        print "    GGML_TYPE_TURBO2_0,"
+        print "    GGML_TYPE_TURBO3_0,"
+        print "    GGML_TYPE_TURBO4_0,"
+        done = 1
+        next
+    }
+    { print }
+    END {
+        if (!done) {
+            print "patch-grpc-server.sh: anchor `    GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
+            exit 1
+        }
+    }
+' "$SRC" > "$SRC.tmp"
+mv "$SRC.tmp" "$SRC"
+
+echo "==> patched OK"
--- a/backend/cpp/turboquant/patches/0001-server-respect-the-ignore-eos-flag.patch
+++ b/backend/cpp/turboquant/patches/0001-server-respect-the-ignore-eos-flag.patch
@@ -0,0 +1,83 @@
+From 660600081fb7b9b769ded5c805a2d39a419f0a0d Mon Sep 17 00:00:00 2001
+From: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
+Date: Wed, 8 Apr 2026 11:12:15 -0400
+Subject: [PATCH] server: respect the ignore eos flag (#21203)
+
+---
+ tools/server/server-context.cpp | 3 +++
+ tools/server/server-context.h   | 3 +++
+ tools/server/server-task.cpp    | 3 ++-
+ tools/server/server-task.h      | 1 +
+ 4 files changed, 9 insertions(+), 1 deletion(-)
+
+diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
+index 9d3ac538..b31981c5 100644
+--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
+@@ -3033,6 +3033,8 @@ server_context_meta server_context::get_meta() const {
+         /* fim_rep_token          */ llama_vocab_fim_rep(impl->vocab),
+         /* fim_sep_token          */ llama_vocab_fim_sep(impl->vocab),
+ 
+        /* logit_bias_eog         */ impl->params_base.sampling.logit_bias_eog,
+
+         /* model_vocab_type       */ llama_vocab_type(impl->vocab),
+         /* model_vocab_n_tokens   */ llama_vocab_n_tokens(impl->vocab),
+         /* model_n_ctx_train      */ llama_model_n_ctx_train(impl->model),
+@@ -3117,6 +3119,7 @@ std::unique_ptr<server_res_generator> server_routes::handle_completions_impl(
+                     ctx_server.vocab,
+                     params,
+                     meta->slot_n_ctx,
+                    meta->logit_bias_eog,
+                     data);
+             task.id_slot = json_value(data, "id_slot", -1);
+ 
+diff --git a/tools/server/server-context.h b/tools/server/server-context.h
+index d7ce8735..6ea9afc0 100644
+--- a/tools/server/server-context.h
+++ b/tools/server/server-context.h
+@@ -39,6 +39,9 @@ struct server_context_meta {
+     llama_token fim_rep_token;
+     llama_token fim_sep_token;
+ 
+    // sampling
+    std::vector<llama_logit_bias> logit_bias_eog;
+
+     // model meta
+     enum llama_vocab_type model_vocab_type;
+     int32_t model_vocab_n_tokens;
+diff --git a/tools/server/server-task.cpp b/tools/server/server-task.cpp
+index 4cc87bc5..856b3f0e 100644
+--- a/tools/server/server-task.cpp
+++ b/tools/server/server-task.cpp
+@@ -239,6 +239,7 @@ task_params server_task::params_from_json_cmpl(
+         const llama_vocab * vocab,
+         const common_params & params_base,
+         const int n_ctx_slot,
+        const std::vector<llama_logit_bias> & logit_bias_eog,
+         const json & data) {
+     task_params params;
+ 
+@@ -562,7 +563,7 @@ task_params server_task::params_from_json_cmpl(
+         if (params.sampling.ignore_eos) {
+             params.sampling.logit_bias.insert(
+                     params.sampling.logit_bias.end(),
+-                    defaults.sampling.logit_bias_eog.begin(), defaults.sampling.logit_bias_eog.end());
+                    logit_bias_eog.begin(), logit_bias_eog.end());
+         }
+     }
+ 
+diff --git a/tools/server/server-task.h b/tools/server/server-task.h
+index d855bf08..243e47a8 100644
+--- a/tools/server/server-task.h
+++ b/tools/server/server-task.h
+@@ -209,6 +209,7 @@ struct server_task {
+         const llama_vocab * vocab,
+         const common_params & params_base,
+         const int n_ctx_slot,
+        const std::vector<llama_logit_bias> & logit_bias_eog,
+         const json & data);
+ 
+     // utility function
+-- 
+2.43.0
+
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -1008,20 +1008,6 @@
    nvidia-cuda-12: "cuda12-turboquant-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-turboquant-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-turboquant-development"
- !!merge <<: *stablediffusionggml
-  name: "stablediffusion-ggml-development"
-  capabilities:
-    default: "cpu-stablediffusion-ggml-development"
-    nvidia: "cuda12-stablediffusion-ggml-development"
-    intel: "intel-sycl-f16-stablediffusion-ggml-development"
-    # amd: "rocm-stablediffusion-ggml-development"
-    vulkan: "vulkan-stablediffusion-ggml-development"
-    nvidia-l4t: "nvidia-l4t-arm64-stablediffusion-ggml-development"
-    metal: "metal-stablediffusion-ggml-development"
-    nvidia-cuda-13: "cuda13-stablediffusion-ggml-development"
-    nvidia-cuda-12: "cuda12-stablediffusion-ggml-development"
-    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-stablediffusion-ggml-development"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-stablediffusion-ggml-development"
 - !!merge <<: *neutts
  name: "cpu-neutts"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-neutts"
--- a/backend/rust/kokoros/src/service.rs
+++ b/backend/rust/kokoros/src/service.rs
@@ -341,16 +341,6 @@ impl Backend for KokorosService {
        Err(Status::unimplemented("Not supported"))
    }

-    type AudioTranscriptionStreamStream =
-        ReceiverStream<Result<backend::TranscriptStreamResponse, Status>>;
-
-    async fn audio_transcription_stream(
-        &self,
-        _: Request<backend::TranscriptRequest>,
-    ) -> Result<Response<Self::AudioTranscriptionStreamStream>, Status> {
-        Err(Status::unimplemented("Not supported"))
-    }
-
    async fn sound_generation(
        &self,
        _: Request<backend::SoundGenerationRequest>,
--- a/core/services/nodes/managers_distributed.go
+++ b/core/services/nodes/managers_distributed.go
@@ -106,6 +106,13 @@ func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context
 		if node.Status == StatusPending {
 			continue
 		}
+		// Backend lifecycle ops only make sense on backend-type workers.
+		// Agent workers don't subscribe to backend.install/delete/list, so
+		// enqueueing for them guarantees a forever-retrying row that the
+		// reconciler can never drain. Silently skip — they aren't consumers.
+		if node.NodeType != "" && node.NodeType != NodeTypeBackend {
+			continue
+		}
 		if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
 			xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
 			result.Nodes = append(result.Nodes, NodeOpStatus{
--- a/core/services/nodes/reconciler.go
+++ b/core/services/nodes/reconciler.go
@@ -3,12 +3,14 @@ package nodes
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"time"

 	"github.com/mudler/LocalAI/core/services/advisorylock"
 	grpcclient "github.com/mudler/LocalAI/pkg/grpc"
 	"github.com/mudler/xlog"
+	"github.com/nats-io/nats.go"
 	"gorm.io/gorm"
 )

@@ -206,12 +208,47 @@ func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
 			}
 			continue
 		}
+
+		// ErrNoResponders means the node has no active NATS subscription for
+		// this subject. Either its connection dropped, or it's the wrong
+		// node type entirely. Mark unhealthy so the health monitor's
+		// heartbeat-only pass doesn't immediately flip it back — and so
+		// ListDuePendingBackendOps (which filters by status=healthy) stops
+		// picking the row until the node genuinely recovers.
+		if errors.Is(applyErr, nats.ErrNoResponders) {
+			xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
+				"op", op.Op, "backend", op.Backend, "node", op.NodeID)
+			_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
+		}
+
+		// Dead-letter cap: after maxAttempts the row is the reconciler
+		// equivalent of a poison message. Delete it loudly so the queue
+		// doesn't churn NATS every tick forever — operators can re-issue
+		// the op from the UI if they still want it applied.
+		if op.Attempts+1 >= maxPendingBackendOpAttempts {
+			xlog.Error("Reconciler: abandoning pending backend op after max attempts",
+				"op", op.Op, "backend", op.Backend, "node", op.NodeID,
+				"attempts", op.Attempts+1, "last_error", applyErr)
+			if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
+				xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
+			}
+			continue
+		}
+
 		_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
 		xlog.Warn("Reconciler: pending backend op retry failed",
 			"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
 	}
 }

+// maxPendingBackendOpAttempts caps how many times the reconciler retries a
+// failing row before dead-lettering it. Ten attempts at exponential backoff
+// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
+// worker restart or network blip. Poisoned rows beyond that are almost
+// certainly structural (wrong node type, non-existent gallery entry) and no
+// amount of further retrying will help.
+const maxPendingBackendOpAttempts = 10
+
 // probeLoadedModels gRPC-health-checks model addresses that the DB says are
 // loaded. If a model's backend process is gone (OOM, crash, manual restart)
 // we remove the row so ghosts don't linger. Only probes rows older than
--- a/core/services/nodes/reconciler_test.go
+++ b/core/services/nodes/reconciler_test.go
@@ -373,4 +373,30 @@ var _ = Describe("ReplicaReconciler — state reconciliation", func() {
 			Expect(row.NextRetryAt).To(BeTemporally(">", before))
 		})
 	})
+
+	Describe("NewNodeRegistry malformed-row pruning", func() {
+		It("drops queue rows for agent nodes and non-existent nodes on startup", func() {
+			agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
+			Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
+			backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
+			Expect(registry.Register(context.Background(), backend, true)).To(Succeed())
+
+			// Three rows: one for a valid backend node (should survive),
+			// one for an agent node (pruned), one for an empty backend name
+			// on the valid node (pruned).
+			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
+			Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
+			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())
+
+			// Re-instantiating the registry runs the cleanup migration.
+			_, err := NewNodeRegistry(db)
+			Expect(err).ToNot(HaveOccurred())
+
+			var rows []PendingBackendOp
+			Expect(db.Find(&rows).Error).To(Succeed())
+			Expect(rows).To(HaveLen(1))
+			Expect(rows[0].NodeID).To(Equal(backend.ID))
+			Expect(rows[0].Backend).To(Equal("foo"))
+		})
+	})
 })
--- a/core/services/nodes/registry.go
+++ b/core/services/nodes/registry.go
@@ -148,6 +148,30 @@ func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
 	}); err != nil {
 		return nil, fmt.Errorf("migrating node tables: %w", err)
 	}
+
+	// One-shot cleanup of queue rows that can never drain: ops targeted at
+	// agent workers (wrong subscription set), at non-existent nodes, or with
+	// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
+	// new ones from being written, but rows persisted by earlier versions
+	// keep the reconciler busy retrying a permanently-failing NATS request
+	// every 30s. Guarded by the same migration advisory lock so only one
+	// frontend runs it.
+	_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
+		res := db.Exec(`
+			DELETE FROM pending_backend_ops
+			WHERE backend = ''
+			   OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
+		`, NodeTypeBackend)
+		if res.Error != nil {
+			xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
+			return res.Error
+		}
+		if res.RowsAffected > 0 {
+			xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
+		}
+		return nil
+	})
+
 	return &NodeRegistry{db: db}, nil
 }

--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -15186,10 +15186,10 @@
    - gpu
  overrides:
    parameters:
-      model: wan2.1_t2v_1.3b-q8_0.gguf
+      model: wan2.1-t2v-1.3B-Q8_0.gguf
  files:
-    - filename: "wan2.1_t2v_1.3b-q8_0.gguf"
-      uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
+    - filename: "wan2.1-t2v-1.3B-Q8_0.gguf"
+      uri: "huggingface://calcuis/wan-gguf/wan2.1-t2v-1.3B-Q8_0.gguf"
    - filename: "wan_2.1_vae.safetensors"
      uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
    - filename: "umt5-xxl-encoder-Q8_0.gguf"
--- a/gallery/wan-ggml.yaml
+++ b/gallery/wan-ggml.yaml
@@ -9,6 +9,11 @@ config_file: |
    - "diffusion_model"
    - "vae_decode_only:false"
    - "sampler:euler"
+    - "scheduler:discrete"
    - "flow_shift:3.0"
+    - "diffusion_flash_attn:true"
+    - "offload_params_to_cpu:true"
+    - "keep_vae_on_cpu:true"
+    - "keep_clip_on_cpu:true"
    - "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
    - "vae_path:wan_2.1_vae.safetensors"
Author	SHA1	Message	Date
Ettore Di Giacinto	44e7d9806b	fix(distributed): stop queue loops on agent nodes + dead-letter cap pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct — they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without staying infinite. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.	2026-04-19 21:27:05 +00:00
Ettore Di Giacinto	7a9d89fa54	feat(ui): shared FilterBar across the System page tabs The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.	2026-04-19 08:46:22 +00:00
Ettore Di Giacinto	ee34a52c5d	feat(ui): NodeDistributionChip — shared per-node attribution component Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. Delete code removes ~40 lines of ad-hoc logic.	2026-04-19 08:39:59 +00:00
Ettore Di Giacinto	92b9e22dc9	feat(ui): show cluster distribution of models in the System page When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle.	2026-04-19 08:37:45 +00:00
Ettore Di Giacinto	f0ab68e352	feat(distributed): durable backend fan-out + state reconciliation Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions. The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are loaded but hasn't seen touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate — the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter.	2026-04-19 08:34:57 +00:00
Ettore Di Giacinto	9373de9f9b	feat(ui): polish the Nodes page so it reads like a product The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker.	2026-04-19 08:20:52 +00:00
Ettore Di Giacinto	1b3c951c85	feat(ui): surface backend upgrades in the System page The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and a "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. - Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles.	2026-04-19 08:14:49 +00:00
Ettore Di Giacinto	1f43762655	fix(distributed): detect backend upgrades across worker nodes Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see why it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed.	2026-04-19 08:03:20 +00:00