Compare commits

..

45 Commits

Author SHA1 Message Date
Ettore Di Giacinto
3280b9a287 fix(distributed): per-replica backend logs (store aggregation + UI)
The multi-replica refactor (PR #9583) changed the worker's process key
from `modelID` to `modelID#replicaIndex`, but the BackendLogStore kept
the bare-modelID lookup. Result: every distributed deployment lost
backend logs in the Nodes UI — single-replica too, since even the
default capacity of 1 produces a `#0` suffix.

Two changes wired together:

* pkg/model: BackendLogStore.GetLines/Subscribe now treat a modelID
  without `#` as a model prefix and merge across all `modelID#N` replica
  buffers (timestamp-sorted for GetLines; fan-in for Subscribe). Calls
  with a full `modelID#N` key resolve exactly. ListModels strips
  replica suffixes and deduplicates so the listing surfaces one entry
  per loaded model.

* react-ui: per-replica log streams as the default. Loaded Models
  table disambiguates each row with a `rep N` pill (only when the node
  hosts >1 replica of a model). Each row's "View logs" link routes to
  the per-replica process key so operators see only that replica's
  output. The logs page renders the replica context as a chip in the
  title and surfaces a segmented control — `Replica 0 / 1 / … / All
  merged` — when the model has multiple replicas; the merged segment
  uses the bare-modelID URL (delegating to the store's prefix
  aggregation) for the side-by-side comparison case. Single-replica
  deployments see no extra UI.

Tests added first (TDD): the regression set in
backend_log_store_test.go reproduces the bug at the exact failure
point — GetLines/ListModels/Subscribe assertions all fail against the
broken code, all pass against the fix. TestSubscribe_PerReplicaFilter
pins the exact-key path so a future change can't silently break it.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:distill]
2026-04-27 20:55:24 +00:00
Ettore Di Giacinto
375bf1929d fix(ui): hide meta-dev backends in System → Backends Development toggle
The Manage view's flagsFor() short-circuited on b.IsMeta and returned
dev=false for every meta backend, so meta-dev entries
(e.g. llama-cpp-development, whisper-development, insightface-development)
leaked through the Development toggle in distributed mode and stayed
visible whether the toggle was on or off. The count chip even
under-reported because those rows were excluded from it.

Drop the IsMeta short-circuit and trust gallery enrichment for both
flags. Production metas (llama-cpp) are tagged isAlias=false /
isDevelopment=false in the gallery so they still pass both toggles;
meta-dev entries carry isDevelopment=true and now correctly hide
alongside concrete dev variants.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 20:38:20 +00:00
Ettore Di Giacinto
9a7f5e68bd ci(darwin): add native caches to backend_build_darwin
macOS runners can't use the registry-backed BuildKit cache (no Docker
daemon), so every darwin matrix run was paying full cost for brew
installs, Go module downloads, llama.cpp recompiles and Python wheel
resolution.

Wires actions/cache@v4 into the reusable workflow for four caches:

- Go modules + build cache (setup-go cache: true), shared across matrix
- Homebrew downloads + selected /opt/homebrew/Cellar entries, with
  HOMEBREW_NO_AUTO_UPDATE so restored Cellar paths stay stable
- ccache for the llama-cpp CMake variants, keyed on the pinned
  LLAMA_VERSION; CMAKE_*_COMPILER_LAUNCHER is exported via GITHUB_ENV
  so backend/cpp/llama-cpp/Makefile picks it up without script changes
- Python uv + pip wheel cache, keyed by backend + ISO week — same
  one-cold-rebuild-per-week cadence as the Linux DEPS_REFRESH

Read/write semantics match the existing BuildKit policy: every run
restores, only master/tag pushes save, so PRs can't pollute master's
warm cache.

Documents the new caches and the macOS-specific constraints in
.agents/ci-caching.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
2026-04-27 20:17:36 +00:00
Ettore Di Giacinto
6b63b47f61 feat(distributed): support multiple replicas of one model on the same node (#9583)
* feat(distributed): support multiple replicas of one model on the same node

The distributed scheduler implicitly assumed `(node_id, model_name)` was
unique, but the schema didn't enforce it and the worker keyed all gRPC
processes by model name alone. With `MinReplicas=2` against a single
worker, the reconciler "scaled up" every 30s but the registry never
advanced past 1 row — the worker re-loaded the model in-place every tick
until VRAM fragmented and the gRPC process died.

This change introduces multi-replica-per-node as a first-class concept,
with capacity-aware scheduling, a circuit breaker, and VRAM
soft-reservation. Operators can declare per-node capacity via the worker
flag `--max-replicas-per-model` (mirrored as auto-label
`node.replica-slots=N`) or override per-node from the UI.

* Schema: BackendNode gains MaxReplicasPerModel (default 1) and
  ReservedVRAM. NodeModel gains ReplicaIndex (composite with node_id +
  model_name). ModelSchedulingConfig gains UnsatisfiableUntil/Ticks for
  the reconciler circuit breaker.

* Registry: replica_index threaded through SetNodeModel, RemoveNodeModel,
  IncrementInFlight, DecrementInFlight, TouchNodeModel, GetNodeModel,
  SetNodeModelLoadInfo and the InFlightTrackingClient. New helpers:
  CountReplicasOnNode, NextFreeReplicaIndex (with ErrNoFreeSlot),
  RemoveAllNodeModelReplicas, FindNodesWithFreeSlot,
  ClusterCapacityForModel, ReserveVRAM/ReleaseVRAM (atomic UPDATE with
  ErrInsufficientVRAM), and the unsatisfiable-flag CRUD.

* Worker: processKey now `<modelID>#<replicaIndex>` so concurrent loads
  of the same model land on distinct ports. Adds CLI flag
  --max-replicas-per-model (env LOCALAI_MAX_REPLICAS_PER_MODEL, default 1)
  and emits the auto-label.

* Router: scheduleNewModel filters candidates by free slot, allocates the
  replica index, and soft-reserves VRAM before installing the backend.
  evictLRUAndFreeNode now deletes the targeted row by ID instead of all
  replicas of the model on the node — fixes a latent bug where evicting
  one replica orphaned its siblings.

* Reconciler: caps scale-up at ClusterCapacityForModel so a misconfig
  (MinReplicas > capacity) doesn't loop forever. After 3 consecutive
  ticks of capacity==0 it sets UnsatisfiableUntil for a 5m cooldown and
  emits a warning. ClearAllUnsatisfiable fires from Register,
  ApproveNode, SetNodeLabel(s), RemoveNodeLabel and
  UpdateMaxReplicasPerModel so a new node joining or label changes wake
  the reconciler immediately. scaleDownIdle removes highest-replica-index
  first to keep slots compact.

* Heartbeat resets reserved_vram to 0 — worker is the source of truth
  for actual free VRAM; the reservation is only for the in-tick race
  window between two scheduling decisions.

* Probe path (reconciler.probeLoadedModels and health.doCheckAll) now
  pass the row's replica_index to RemoveNodeModel so an unreachable
  replica doesn't orphan healthy siblings.

* Admin override: PUT /api/nodes/:id/max-replicas-per-model sets a
  sticky override (preserved across worker re-registration). DELETE
  clears the override so the worker's flag applies again on next
  register. Required because Kong defaults the worker flag to 1, so
  every worker restart would have silently reverted the UI value.

* React UI: always-visible slot badge on the node row (muted at default
  1, accented when >1); inline editor in the expanded drawer with
  pencil-to-edit, Save/Cancel, Esc/Enter, "(override)" indicator when
  the value is admin-set, and a "Reset" button to hand control back to
  the worker. Soft confirm when shrinking the cap below the count of
  loaded replicas. Scheduling rules table gets an "Unsatisfiable until
  HH:MM" status badge surfacing the cooldown.

* node.replica-slots filtered out of the labels strip on the row to
  avoid duplicating the slot badge.

23 new Ginkgo specs (registry, reconciler, inflight, health) cover:
multi-replica row independence, RemoveNodeModel of one replica
preserving siblings, NextFreeReplicaIndex slot allocation including
ErrNoFreeSlot, capacity-gated scale-up with circuit breaker tripping
and recovery on Register, scheduleDownIdle ordering, ClusterCapacity
math, ReserveVRAM admission gating, Heartbeat reset, override survival
across worker re-registration, and ResetMaxReplicasPerModel handing
control back. Plus 8 stdlib tests for the worker processKey / CLI /
auto-label.

Closes the flap reproduced on Qwen3.6-35B against the nvidia-thor
worker (single 128 GiB node, MinReplicas=2): the reconciler now caps
the scale-up at the cluster's actual capacity instead of looping.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Read] [Edit] [Bash] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:golang-testing]

* refactor(react-ui/nodes): tighten capacity editor copy + adopt ActionMenu for row actions

* Capacity editor hint trimmed from operator-doc-style ("Sourced from
  the worker's `--max-replicas-per-model` flag. Changing it here makes it
  a sticky admin override that survives worker restarts." → "Saved
  values stick across worker restarts.") and the override-state copy
  similarly compressed. The full mechanic is no longer needed in the UI
  — the override pill carries the meaning and the docs cover the rest.

* Node row actions migrated from an inline cluster of icon buttons
  (Drain / Resume / Trash) to the kebab ActionMenu used by /manage for
  per-row model actions, so dense Nodes tables stay clean. Approve
  stays as a prominent primary button — it's a stateful admission gate,
  not a routine action, and elevating it matches how /manage surfaces
  install-time decisions outside the menu.

* The expanded drawer's Labels section now filters node.replica-slots
  out of the editable label list. The label is owned by the Capacity
  editor above; surfacing it again as an editable label invited
  confusion (the Capacity save would clobber any direct edit).

Both backend and agent workers benefit — they share the row rendering
path, so the action menu and label filter apply to both.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:critique] [Skill:audit] [Skill:polish]

* fix(react-ui/nodes): suppress slot badge on agent workers

Agent workers don't load models, so the per-node replica capacity is
inapplicable to them. Showing "1× slots" on agent rows was a tiny
inconsistency from the unified rendering path — gate the badge on
node_type !== 'agent' so it only appears on backend workers.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp]

* refactor(react-ui/nodes): distill expanded drawer + restyle scheduling form

The expanded node drawer used to stack five panels — slot badge,
filled capacity box, Loaded Models h4+empty-state, Installed Backends
h4+empty-state, Labels h4+chips+form — making routine inspections feel
like a control panel. The scheduling rule form wrapped its mode toggle
as two 50%-width filled buttons that competed visually with the actual
primary action.

* Drawer: collapse three rarely-touched config zones (Capacity,
  Backends, Labels) into one `<details>` "Manage" disclosure (closed by
  default) with small uppercase eyebrow labels for each zone instead of
  parallel h4 sub-headings. Loaded Models stays as the at-a-glance
  headline with a single-line empty hint instead of a boxed empty state.
  CapacityEditor renders flat (no filled background) — the Manage
  disclosure provides framing.

* Scheduling form: replace the chunky 50%-width button-tabs with the
  project's existing `.segmented` control (icon + label, sized to
  content). Mode hint becomes a single tied line below. Fields stack
  vertically with helper text under inputs and a hairline divider above
  the right-aligned Save / Cancel.

The empty drawer collapses from ~5 stacked sections (~280px tall) to
two lines (~80px). The scheduling form now reads as a designed dialog
instead of raw building blocks. Both surfaces now match the typographic
density and weight of the rest of the admin pages.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:distill] [Skill:audit] [Skill:polish]

* feat(react-ui/nodes): replace scheduling form's model picker with searchable combobox

The native <select> made operators scroll through every gallery entry to
find a model name. The project already has SearchableModelSelect (used
in Studio/Talk/etc.) which combines free-text search with the gallery
list and accepts typed model names that aren't installed yet — useful
for pre-staging a scheduling rule before the node it'll run on has
finished bootstrapping.

Also drops the now-unused useModels import (the combobox manages the
gallery hook internally).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit]

* refactor(react-ui/nodes): consolidate key/value chip editor + add replica preset chips

The Nodes page was rendering the same key=value chip pattern in two
places with subtly different markup: the Labels editor in the expanded
drawer and (post-distill) the Node Selector input in the scheduling
form. The form's input was also a comma-separated string that operators
were getting wrong.

* Extract <KeyValueChips> as a fully controlled chip-builder. Parent
  owns the map and decides what onAdd/onRemove does — form state for the
  scheduling form, API calls for the live drawer Labels editor. Same
  visuals everywhere; one component to change when polish needs apply.

* Replace the comma-separated Node Selector text input with KeyValueChips.
  Operators were copying syntax from docs and missing commas; the chip
  vocabulary makes the key=value structure self-documenting.

* Add <ReplicaInput>: numeric input + quick-pick preset chips for Min/Max
  replicas. Picked over a slider because replica counts are exact specs
  derived from VRAM math (operator decision, not a fuzzy estimate). The
  chips give one-click access to common values (1/2/3/4 for Min,
  0=no-limit/2/4/8 for Max) without the slider's special-value problem
  (MaxReplicas=0 is categorical, not a position on a continuum).

* Drop the now-unused labelInputs state in the Nodes page (the inline
  label editor's per-node draft state lived there and is now owned by
  KeyValueChips).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:distill]

* test: fix CI fallout from multi-replica refactor (e2e/distributed + playwright)

Two breakages caught by CI that didn't surface in the local run:

* tests/e2e/distributed/*.go — multiple files used the pre-PR2 registry
  signatures for SetNodeModel / IncrementInFlight / DecrementInFlight /
  RemoveNodeModel / TouchNodeModel / GetNodeModel / SetNodeModelLoadInfo
  and one stale adapter.InstallBackend call in node_lifecycle_test.go.
  All updated to pass replicaIndex=0 — these tests don't exercise
  multi-replica behavior, they just need to compile against the new
  signatures. The chip-builder tests in core/services/nodes/ already
  cover the multi-replica logic.

* core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js — the
  drawer's distill refactor moved Backends inside a "Manage" <details>
  disclosure that's collapsed by default. The test helper expanded the
  node row but never opened Manage, so the per-node backend table was
  never in the DOM. Helper now clicks `.node-manage > summary` after
  expanding the row.

All 100 playwright tests pass locally; tests/e2e/distributed compiles
clean.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 21:20:05 +02:00
Ettore Di Giacinto
f4036fa83f ci(python-backends): add weekly DEPS_REFRESH cache-buster
The shared backend/Dockerfile.python ends in:
    RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.

DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.

Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.

Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 14:21:11 +00:00
Ettore Di Giacinto
3810fe1a1e fix(distributed): worker container healthcheck always unhealthy
The Dockerfile's HEALTHCHECK probes http://localhost:8080/readyz, which
is the OpenAI API server port. When the same image runs as a worker, it
listens on the gRPC base port (50051) and an HTTP file transfer server
on port-1 (50050) — nothing on 8080 — so docker always reports the
container as unhealthy.

Add unauthenticated /readyz and /healthz endpoints to the worker's HTTP
file transfer server, and override HEALTHCHECK_ENDPOINT for worker-1 in
the distributed compose file. Disable the healthcheck for agent-worker
since it is NATS-only and exposes no HTTP server.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 13:51:57 +00:00
Ettore Di Giacinto
bdfa5e934a ci: switch image/backend build cache to a dedicated registry image
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
  from the unused gha cache to type=registry pointing at
  quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
  ignore-error=true. Master/tag builds populate their own
  per-matrix-entry cache; PR builds read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
  Dockerfile stage that was removed by b1fc5acd in July 2025, has been
  failing every night since, and never populated the gha cache. The new
  registry-cache scheme is self-warming, so no separate populator is
  needed.
- Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS
  build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in
  the root Dockerfile (the root Dockerfile no longer compiles gRPC; the
  source build now lives in backend/Dockerfile.{llama-cpp,
  ik-llama-cpp, turboquant} only and uses its own ARG defaults).
- Drop the unused grpc-base-image input from image_build.yml plus the
  matrix passthroughs in image.yml / image-pr.yml.
- Drop the unused GRPC_VERSION env in test.yml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 13:13:04 +00:00
Richard Palethorpe
deca6dbdad feat: Log backend exit code (#9581)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-27 14:19:18 +02:00
Ettore Di Giacinto
60549a8a60 feat(react-ui): page-width archetype system + mobile/tablet nav polish
Replace the universal max-width:1200px cap on .page with a four-tier
archetype system (narrow 760, medium 1080, default 1600, wide unbounded)
selected per page based on what its UX actually wants. Data/table pages
fill ultrawide displays; forms cap at reading width; tabbed feature
surfaces breathe.

Mobile/tablet:
- New 640/1024 breakpoint split. Tablets (640-1023) get a persistent
  52px icon rail; below 640 keeps the slide-off drawer.
- Drawer polish: body-scroll lock, Escape to close, focus moves into
  the drawer on open and back to the hamburger on close, aria-hidden
  + inert on main while open.
- Mobile top bar carries hamburger + theme toggle + account avatar
  (44x44 touch targets) so theme/account aren't trapped in the drawer.
- Page-level reflow on phones: page-header column-stacks, filter chips
  scroll horizontally, tables go edge-to-edge, OperationsBar overflows
  rather than wrapping. Honors prefers-reduced-motion.

Manage > Models: drop the toggle column; Enable/Disable joins the
per-row Actions menu alongside Stop/Pin/Edit/Logs/Delete for
consistency with the other action verbs.

Page-width tokens live in theme.css so future tuning is one line.
Removes 7 inline maxWidth workarounds from page roots.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Edit] [Bash]
2026-04-27 11:51:29 +00:00
Ettore Di Giacinto
54728e292f feat(react-ui): split Manage backends toggle into Variants and Development
Meta backends are now always shown — they're the entries operators
configure against — and two independent toggles govern the noise around
them. "Variants" hides platform-specific concrete builds that a meta
backend aliases on the host (e.g. llama-cpp-cuda12-12.4). "Development"
hides pre-release `-development` builds. Each toggle shows the count of
items currently hidden in its category. The legacy `bm` URL flag is
honored on read so existing deep-links resolve to the same view they
used to.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 08:23:53 +00:00
Tai An
86fd62233f fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) (#9580)
The overrides.parameters.model field referenced 'Qwen3.-27B-Claude-...' (missing the '5'), so model loads failed because the configured filename did not match the file actually downloaded by the entry's files: list ('Qwen3.5-27B-Claude-...').

Aligns the override filename with the files: entries and with the upstream HF repo (mradermacher/Qwen3.5-27B-...).
2026-04-27 09:24:00 +02:00
Alex Brick
41ed8ced70 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) (#9578)
Update additional intel base images
2026-04-27 09:18:57 +02:00
LocalAI [bot]
05e94bd9e7 chore: ⬆️ Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 (#9577)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-27 08:57:24 +02:00
Ettore Di Giacinto
8d124d080f feat(gallery): add whisper-development umbrella stanza
Mirrors the whisper capabilities map with -development variants so
clients can pull the master-tagged whisper.cpp backend via a single
platform-resolved name, matching the existing faster-whisper-development
and whisperx-development entries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 23:04:27 +00:00
Ettore Di Giacinto
2da1a4d230 feat(distributed): per-node backend installation from the gallery
In distributed mode the Backends gallery used to fan every install out to
every worker — fine for auto-resolving (meta) backends like llama-cpp where
each node picks its own variant, but wrong for hardware-specific builds
like cpu-llama-cpp that would silently land on every GPU node.

Adds a node-targeted install path through the existing
POST /api/nodes/:id/backends/install plumbing, with two entry points:

- Backends gallery row gets a split-button in distributed mode. Auto-
  resolving keeps "Install on all nodes" as the primary; chevron menu
  opens the picker. Hardware-specific routes the primary directly to the
  picker — no fan-out path on the row.
- Nodes-page drawer gets a "+ Add backend" button that navigates to
  /app/backends?target=<node-id>; the gallery scopes itself to that node
  (banner, single per-row install button, Reinstall/Remove for already-
  installed). One gallery, two scopes — no second UI to maintain.

The picker (new NodeInstallPicker) shows a 3-state suitability column
(Compatible / Override / Installed), an auto-expanding variant override
disclosure that fires when selected nodes have no working GPU, parallel
per-node installs with inline status and Retry-failed-nodes, and a
mismatch confirm that names the consequence on the button itself.

A 409 fan-out guard on /api/backends/apply protects CLI/Terraform/script
users from the same footgun: hardware-specific installs in distributed
mode now return code "concrete_backend_requires_target" with a human-
readable error and a meta_alternative pointer.

The gallery list payload now surfaces capabilities, metaBackendFor and
per-row nodes (NodeBackendRef) so the picker and the new Nodes column
have everything they need without re-walking the gallery client-side.

GODEBUG=netdns=go is set on the compose services because the cgo DNS
resolver follows the container's nsswitch.conf to host systemd-resolved
(127.0.0.53), unreachable from inside the container; the pure-Go
resolver reads /etc/resolv.conf directly and uses Docker's embedded DNS.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7[1m] [Edit] [Bash] [Read] [Write]
2026-04-26 22:05:18 +00:00
Ettore Di Giacinto
988430c850 test(react-ui): drive Manage page Backend logs link via the new kebab menu
Manage page row actions moved into ActionMenu in b336d9c6, so the
inline `<a title="Backend logs">` the e2e specs were asserting on no
longer exists. Open the row's kebab and assert against the menuitem.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
2026-04-26 20:51:01 +00:00
Ettore Di Giacinto
b336d9c626 feat(react-ui): polish Manage page with kebab menus and gallery rows
Bring the System / Manage page up to the visual standard of the Install
gallery so installed models and backends stop reading like a debug dump.

- Unified ResourceRow anatomy (icon, name+description, badges, status,
  expandable detail) shared across both tabs.
- Gallery enrichment cross-references installed names against the gallery
  list endpoints to surface icons, descriptions, license, tags, and links
  with a graceful "no description" fallback for custom imports.
- Header summary with four StatCards (Models / Backends / Running /
  Updates) — clickable to switch tab + pre-set filter.
- Backends meta + development entries hidden by default; "Show meta &
  development" paired toggle in the FilterBar with hidden-count hint.
- Kebab (three-dot) ActionMenu replaces the inline button cluster on
  every row; restrained until hover, keyboard-navigable, danger items
  separated by a divider.
- Backend "Version" cell now falls back to short digest, OCI tag, or
  ocifile basename when no semver is set, instead of showing "—" for
  every OCI install. Detail panel exposes full Source URI + Digest.
- Drop redundant column headers ("Actions", "On") — kebabs and toggles
  carry their own affordance; screen readers still get a label.
- Inline System / User / Meta / Dev badges next to the backend name so
  the dedicated Type column doesn't reserve space for "USER" repeated.
- Tightened the spacing between the System Resources card and the
  StatCards so they no longer crowd the RAM bar.

Extracted StatCard and GalleryLoader from Nodes.jsx and Models.jsx into
shared components so the visual language is one source of truth.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
2026-04-26 20:33:49 +00:00
Ettore Di Giacinto
f384c64a91 fix(model-loader): also skip .ckpt, .zip, and .tag files when scanning models
The local model directory scan treats every non-skipped file as a model
config candidate. Sidecar artifacts that ship alongside checkpoints
(checkpoint blobs, downloaded archives, ggml-style tag files) were
slipping through and showing up as bogus models in the listing. Add
their extensions to the suffix-skip list.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:37:53 +00:00
Ettore Di Giacinto
e9d8e92988 fix(react-ui): don't yank chat scroll to bottom while user is reading
The chat and agent-chat pages auto-scrolled to the bottom on every
streamed token. If the user scrolled up to re-read part of a response,
the next chunk pulled them back down — making long replies unreadable
while streaming.

Track a stickToBottomRef on each scroll event: if the user is within
80px of the bottom we keep auto-scrolling, otherwise we leave them
where they are. On chat switch we snap back to the bottom and re-pin.

Same fix applied to both Chat.jsx and AgentChat.jsx since they share
the same streaming pattern.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
5b0196c7d0 fix(whisper): scrub invalid UTF-8 from segment text before protobuf marshal
whisper.cpp can emit bytes that are not valid UTF-8 — typically a
multibyte codepoint split across token boundaries. protobuf string
fields reject those at marshal time, which would surface as a transcribe
failure. Run strings.ToValidUTF8 on the segment text before it leaves
the cgo boundary so the bad byte gets replaced with U+FFFD.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
c8d63a1003 fix(react-ui): stop Manage page from blanking on auto-refresh; show real model use cases
- useModels.refetch now runs silently — distributed-mode 10s auto-refresh
  no longer flips loading=true and replaces the table with a spinner card.
- Manage Use Cases column derives badges from each model's actual
  capabilities (Chat / Image / TTS / Embeddings / etc.) instead of
  hardcoding a "Chat" link for every row.
- FilterBar right slot is right-aligned via margin-left:auto so the
  Update button lives at the end of the row, not next to the chips.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
LocalAI [bot]
d9cb0d6133 chore: ⬆️ Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 (#9572)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:35 +02:00
LocalAI [bot]
f5c268deac chore: ⬆️ Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a (#9571)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:25 +02:00
Tai An
8931a2ad31 fix(gallery): normalize inconsistent tag casing/plurals across gallery models (#9574)
- embeddings → embedding (6 models): aligns with the WebUI filter button
  defined in core/http/views/models.html ({ term: 'embedding', ... }), so
  models like nomic-embed-text-v1.5 now appear under the Embedding filter
- TTS → tts (5 models), ASR → asr (2 models): lowercase, per existing
  convention used by 161+ models
- CPU/Cpu → cpu (17 models), GPU → gpu (17 models): lowercase, per existing
  convention used by 666+ models
- dedupe duplicate tag entries on 3 models that already had repeated tags
  (gpt-oss-20b had gguf x2; arcee-ai/AFM-4.5B had gpu x2; one Qwen model
  had default x2)

Closes #9247
2026-04-26 08:33:38 +02:00
Ettore Di Giacinto
e16e758dff ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 (#9573)
Extend the existing CPU build matrix entries to produce a multi-arch
manifest (linux/amd64,linux/arm64) at the same image tags. arm64
Linux hosts without an NVIDIA GPU report the "default" capability,
which already maps to cpu-whisperx / cpu-faster-whisper in
backend/index.yaml -- so the manifest list lets Docker pull the right
variant without any gallery changes.

Both stacks install cleanly under aarch64: torch (2.4.1/2.8.0),
faster-whisper, ctranslate2, whisperx, opencv-python and the
remaining deps all ship manylinux2014_aarch64 wheels, so no source
builds run under QEMU emulation.

Follows the same pattern already used by cpu-llama-cpp-quantization.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 08:30:03 +02:00
LocalAI [bot]
1c45227346 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 (#9570)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 08:26:37 +02:00
Ettore Di Giacinto
fbe4f0a99b fix(docs): replace Docsy alert shortcode with Relearn notice
The docs site uses the hugo-theme-relearn theme, which provides
`notice` instead of Docsy's `alert`. The face-recognition,
voice-recognition, and stores feature pages used `{{% alert %}}`,
breaking `hugo build` with "template for shortcode \"alert\" not
found".

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 21:04:31 +00:00
Ettore Di Giacinto
d733c9cd13 fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds (#9568)
Blaizzy/mlx-vlm git HEAD bumped its constraint to mlx>=0.31.2, but
mlx-cuda-12 and mlx-cuda-13 are only published up to 0.31.1 on PyPI.
Since mlx[cudaXX]==0.31.2 forces a sibling wheel that doesn't exist,
pip backtracks through every older mlx[cudaXX], none of which satisfy
mlx>=0.31.2, producing ResolutionImpossible.

Pin all variants to the v0.4.4 tag (mlx>=0.30.0), which resolves
cleanly against mlx[cuda13]==0.31.1. cpu/mps weren't broken yet but
are pinned for consistency.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 22:06:01 +02:00
Ettore Di Giacinto
703b4fcae8 Change cron schedule to run every 12 hours
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-25 18:38:28 +02:00
Richard Palethorpe
73aacad2f9 fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

  ImportError: .../flash_attn_2_cuda...so: undefined symbol:
  _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-25 15:38:13 +00:00
LocalAI [bot]
806ea24ff4 chore: ⬆️ Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d (#9497)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:46 +02:00
LocalAI [bot]
385de3705e chore(model gallery): 🤖 add 1 new models via gallery agent (#9558)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:15 +02:00
Ettore Di Giacinto
21eace40ec feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.


Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 14:02:57 +02:00
Ettore Di Giacinto
24505e57f5 feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)
* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang

Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM
serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and
JetPack 7 boards:

- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130
  prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm
  0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master;
  cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only — uses the prebuilt sglang wheel from the
  jetson-ai-lab SBSA cu130 index, so no source build is needed.
  Cublas13 sglang on x86_64 is intentionally deferred.

CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni},
-nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml
gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13)
and latest/development merge entries.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

* fix(backends): use unsafe-best-match index strategy on l4t13 builds

The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.)
at limited versions / older Python ABIs. uv defaults to the first index
that contains a package and refuses to fall through to PyPI, so sglang
l4t13 build fails resolving decord. Mirror the existing cpu sglang
profile by setting --index-strategy=unsafe-best-match on l4t13 across
the three backends, and apply it to the explicit vllm install line in
vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]

* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0

The [all] extra brings in outlines→decord, and decord has no aarch64
cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37
tags). With unsafe-best-match enabled, uv backtracked through sglang
versions trying to satisfy decord and silently landed on
sglang==0.1.16, an ancient version with an entirely different dep
tree (cloudpickle/outlines 0.0.44, etc.).

Drop [all] so decord is no longer required, and floor sglang at 0.5.0
to prevent any future resolver misfire from degrading the version
again.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 12:26:29 +02:00
LocalAI [bot]
d09706dc60 chore(model gallery): 🤖 add 1 new models via gallery agent (#9555)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 09:00:37 +02:00
LocalAI [bot]
08e393f7db chore: ⬆️ Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 (#9549)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:59:48 +02:00
LocalAI [bot]
47cc3dc8d7 chore: ⬆️ Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 (#9548)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:58:27 +02:00
Ettore Di Giacinto
83b384de97 feat: surface distributed backend management errors (#9552)
* fix(distributed): surface per-node backend op errors to OpStatus

DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the
per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`.
When workers replied Success=false (e.g. an OCI image with no arm64
variant on a Jetson host), the per-node Error string was recorded in
result.Nodes[].Error but never reached the toplevel return value, so
OpStatus.Error stayed empty and the UI reported the install as
"completed" while the backend was nowhere on the cluster.

Add BackendOpResult.Err() that aggregates per-node Status=="error"
entries into a single error. Queued nodes (waiting for reconciler retry)
are deliberately not treated as failures. Wire the three callers and
DeleteBackendDetailed to call result.Err() so reply.Success=false
finally reaches OpStatus.Error → /api/backends/job/:uid → the UI.

The Delete closures had a related bug: they discarded the reply with
`_` and only checked the NATS round-trip error, so reply.Success=false
was a silent success even with the new aggregation. Check both.

Standalone mode (LocalBackendManager) already surfaces gallery errors
correctly through the same OpStatus.Error path; no change needed there.

Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct
errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]

* feat(react-ui): per-node backend delete + clearer upgrade affordance

The Nodes page exposed a per-node "reinstall" button (fa-sync-alt,
tooltip "Reinstall backend") but no per-node delete, even though the
Go side has had POST /api/nodes/:id/backends/delete →
RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up
for a while. Sync icons read as "refresh data" — the action is
functionally an upgrade (re-pulls the gallery image), so the affordance
was misleading.

Per-node backend row now renders two icon buttons:

- Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend
  on this node". Names both action and scope to differentiate from the
  cluster-wide upgrade on the Backends page.
- Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend
  from this node". Matches the node-level destructive style at the row
  action column rather than the solid btn-danger of primary destructive
  pages, since this is a secondary action inside a busy row.

Delete goes through the existing ConfirmDialog (danger=true) with copy
that names the backend and the node explicitly — it's a non-recoverable
op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which
already existed in the API client.

Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip),
delete button presence, confirm dialog flow with POST body assertion,
and cancel-doesn't-POST.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
2026-04-25 08:57:59 +02:00
Ettore Di Giacinto
487e3fd2a4 feat(react-ui): editorial refresh with Nord palette and polished primitives (#9550)
* feat(react-ui): editorial refresh with Nord palette and polished primitives

Replaces the cool gray-blue theme with a deep Nord-inspired palette:
frost-cyan accent (#88c0d0) on deep blue-black surfaces (#13171f /
#1a1f2a / #242a36), snow-storm text scale, aurora status colours.

- Typography: Geist Variable + Geist Mono Variable (Google Fonts) with
  ss01/ss03/cv11 stylistic alternates; strengthened h1-h6 hierarchy;
  editorial negative tracking.
- Primitives: buttons gain depth (inset highlight + hover lift +
  brightness filter); inputs become sunken wells with sage-swap-to-frost
  focus rings; cards hover-lift and gain an .card--accent left-rail
  variant; badges become mono caps rectangles with tabular-nums.
- Chrome: sidebar active state is now an inset left rail + tint
  (no border-left); modals get popIn animation and proper shadow lift;
  toasts carry an inset accent bar + slide-in instead of tinted fills;
  operations bar breathes on active installs.
- Empty states: editorial pattern (eyebrow rule, large mono title,
  52ch lede) that inherits gracefully even without page JSX edits.
- Chat: assistant bubbles drop the gray-nested-in-gray card for a
  transparent pull-quote with a left border; user bubbles soften from
  loud accent fill to a subtle frost tint.
- Motion: custom spring easing cubic-bezier(0.22,1,0.36,1), 180ms
  standard; breathing/pulse/popIn keyframes; global prefers-reduced-
  motion honoring.
- Radii tightened to 3/5/8/10px; warm-shadow tokens redone for cool
  depth; ::selection, :focus-visible, kbd globals added.
- Migrated hardcoded 'JetBrains Mono' CSS literals to var(--font-mono)
  so the Geist Mono swap lands everywhere.

Scope is intentionally tokens + primitives only. Page JSX and the
~1,800 inline style={{…}} instances are untouched and flagged as
follow-ups.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]

* feat(react-ui): complete-coverage pass — migrate inline styles to tokens

Follows up the editorial/Nord token refresh with a mechanical sweep of
page JSX and shared components so nothing bypasses the design system.

- Font family: replaced 80+ 'JetBrains Mono' / 'Space Grotesk' inline
  literals (and the string-CSS variants in CollectionDetails and
  AgentStatus) with var(--font-mono) / var(--font-sans). SVG <text>
  nodes that used the attribute form were switched to style={{ }} so
  the CSS variable resolves.
- Radii: every unquoted numeric borderRadius (2/3/4/10) is now a
  var(--radius-*) token; 50% and 999px kept as computed shapes.
- Spacing: clean-token gaps and margins (4/8/16px) moved to
  var(--spacing-xs/sm/md); padding: '4px 8px' and '8px 16px' lifted
  into token pairs. Micro-values (2/6/10/12px) left inline where no
  token maps cleanly.
- Colors: Talk.jsx button/canvas-surface hardcodes moved to
  var(--color-*); FineTune.jsx chart series colours now use the
  --color-data-* Nord palette (cyan/red/purple/orange instead of
  tailwind hex); AgentStatus tool-call icon and error tag hex swapped
  for var(--color-warning) / var(--color-text-inverse).
- CodeMirror editor (utils/cmTheme.js): both themes rebased on Nord —
  polar-night surfaces and aurora syntax highlighting (dark), snow-
  storm surfaces with darkened aurora (light). Caret/selection/active
  line/search now frost-cyan tinted instead of legacy indigo/purple.

Legitimately dynamic styles (computed widths, per-row colours, canvas
2D context fill/stroke for waveform and spectrogram drawing) remain
inline — they can't be expressed as CSS tokens.

29 files, +237/-237 — identity preserved, semantics re-anchored to
the token system.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]
2026-04-24 23:35:59 +02:00
dependabot[bot]
9ab3496de2 chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory (#9546)
chore(deps): bump rustls-webpki

Bumps the cargo group with 1 update in the /backend/rust/kokoros directory: [rustls-webpki](https://github.com/rustls/webpki).


Updates `rustls-webpki` from 0.103.10 to 0.103.13
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.10...v/0.103.13)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-version: 0.103.13
  dependency-type: indirect
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:58 +02:00
dependabot[bot]
c4511be33a chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9544)
chore(deps): bump postcss

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [postcss](https://github.com/postcss/postcss).


Updates `postcss` from 8.5.8 to 8.5.10
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.10)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.10
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:41 +02:00
Ettore Di Giacinto
551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
  (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
  manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
  `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
  `/sys/devices/soc0/soc_id`.

* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
  `/proc/device-tree/model` was missing. That's what happens for Thor inside a
  docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
  resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
  (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
  correctly.

* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
  usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
  with `waitid: no child processes` under containers without `--init`. Each
  such heartbeat overwrote the DB and made the UI flip to "fully used".
  `heartbeatBody` now omits `available_vram` in that case so the DB keeps its
  last good value.

Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]
2026-04-24 22:02:23 +02:00
Andreas Egli
1d0de757c3 fix: add hipblaslt library (#9541)
Signed-off-by: Andreas Egli <github@kharan.ch>
2026-04-24 18:50:03 +02:00
Alex Brick
e5337039b0 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (#9543)
* Use latest oneapi-basekit image for Intel images

The current `localai/localai:master-gpu-intel` images don't work with the intel arc pro b70. Updating the base_image to 2025.3.2 fixes it.

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>

* Update github workflow base image

---------

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>
2026-04-24 18:29:10 +02:00
LocalAI [bot]
1c9592c77f chore: ⬆️ Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a (#9521)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-24 15:15:15 +02:00
177 changed files with 8260 additions and 3046 deletions

View File

@@ -43,7 +43,7 @@ If you add a new language bucket, `scripts/changed-backends.js` also needs a bra
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
## 3. Add Backend Metadata to `backend/index.yaml`

View File

@@ -35,33 +35,19 @@ All contributions must comply with LocalAI's licensing requirements:
## Signed-off-by and Developer Certificate of Origin
Only humans can certify the Developer Certificate of Origin (DCO). AI
agents MUST NOT invent or guess a human identity for `Signed-off-by`
doing so forges the DCO certification.
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
certify the Developer Certificate of Origin (DCO). The human submitter
is responsible for:
However, when a human operator explicitly directs the AI to commit on
their behalf, the AI is acting as a typing tool — no different from an
editor macro or `git commit -s`. In that case the AI SHOULD add
`Signed-off-by:` using the **configured `user.name` / `user.email`** of
the current git repository (i.e. the operator's own identity). The
resulting trailer is the operator's signature; they take responsibility
for it by reviewing and pushing the commit. The AI MUST NOT use any
other identity and MUST NOT add its own name to the sign-off.
When running `git commit`, prefer `git commit --signoff` (or `-s`) so
the trailer is emitted by git itself from the configured identity,
rather than hand-writing it in a heredoc — this guarantees the sign-off
matches whatever identity the operator is currently using.
The human submitter remains responsible for:
- Reviewing all AI-generated code before it's pushed or merged
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO)
to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves. A human
reviewer owns the contribution; the AI's involvement is recorded via
`Assisted-by` (see below).
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
A human reviewer owns the contribution; the AI's involvement is recorded
via `Assisted-by` (see below).
## Attribution
@@ -98,12 +84,6 @@ Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
The `Signed-off-by` line uses Jane's own identity because Jane is the
submitter operating the AI. If Jane asks Claude to create the commit via
`git commit -s`, git emits that exact trailer from Jane's configured
identity — no separate human step is needed beyond Jane reviewing the
diff before pushing.
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility.

111
.agents/ci-caching.md Normal file
View File

@@ -0,0 +1,111 @@
# CI Build Caching
Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.
## Cache layout
- **Cache registry**: `quay.io/go-skynet/ci-cache`
- **One tag per matrix entry**, derived from the existing `tag-suffix`:
- Backend builds (`backend_build.yml`): `cache<tag-suffix>`
- e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
- Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
- e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
## Read/write semantics
| Trigger | `cache-from` | `cache-to` |
|---|---|---|
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
| `pull_request` | yes | **no** |
PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
## Self-warming, no separate populator
There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).
Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.
## The `DEPS_REFRESH` cache-buster (Python backends)
Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
```dockerfile
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
```
Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
`DEPS_REFRESH` defends against that:
- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
- Within a week, builds stay warm.
This applies only to `Dockerfile.python` because:
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.
### Adjusting the cadence
If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
## Manually evicting cache
To force a fully cold build for one backend or the whole image:
```bash
# Delete a single tag (requires quay credentials with admin on the repo)
curl -X DELETE \
-H "Authorization: Bearer ${QUAY_TOKEN}" \
https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm
# List all tags
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
"https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
```
Eviction is rarely needed in normal operation — `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and `mode=max` keeps the cache scoped per matrix entry so a stale tag never bleeds into a different build.
## What the cache **does not** cover
- The "Free Disk Space" / "Release space from worker" steps run on every job — these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere — PRs only build for verification.
- Darwin builds (see below) — macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.
## Darwin native caches
`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners — there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).
| Cache | Path(s) | Key | Scope |
|---|---|---|---|
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |
Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.
The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg — same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.
For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most TUs, so ccache dedupes object files across them.
### Cache budget on Darwin
GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
## Touching the cache pipeline
When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.

View File

@@ -141,7 +141,7 @@ jobs:
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-whisperx'
runs-on: 'ubuntu-latest'
@@ -154,7 +154,7 @@ jobs:
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-faster-whisper'
runs-on: 'ubuntu-latest'
@@ -399,19 +399,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-buun-llama-cpp'
runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
@@ -907,19 +894,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-buun-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -949,16 +923,29 @@ jobs:
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-buun-llama-cpp'
tag-suffix: '-gpu-nvidia-cuda-13-vllm'
runs-on: 'arc-runner-set'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
ubuntu-version: '2404'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
skip-drivers: 'false'
backend: "vllm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-vllm-omni'
runs-on: 'arc-runner-set'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "vllm-omni"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1115,6 +1102,45 @@ jobs:
backend: "diffusers"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "vllm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm-omni'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "vllm-omni"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-sglang'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "sglang"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1493,19 +1519,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-buun-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
@@ -1723,7 +1736,7 @@ jobs:
tag-latest: 'auto'
tag-suffix: '-gpu-intel-rerankers'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "rerankers"
dockerfile: "./backend/Dockerfile.python"
@@ -1736,7 +1749,7 @@ jobs:
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "llama-cpp"
dockerfile: "./backend/Dockerfile.llama-cpp"
@@ -1755,19 +1768,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f32'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-buun-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
@@ -1794,19 +1794,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'sycl_f16'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f16-buun-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: 'intel'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2212,19 +2199,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-buun-llama-cpp'
runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
@@ -2264,19 +2238,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2204'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'false'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-buun-llama-cpp'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2204'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
@@ -2303,19 +2264,6 @@ jobs:
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
ubuntu-version: '2404'
- build-type: 'vulkan'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-buun-llama-cpp'
runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "buun-llama-cpp"
dockerfile: "./backend/Dockerfile.buun-llama-cpp"
context: "./"
ubuntu-version: '2404'
# Stablediffusion-ggml
- build-type: ''
cuda-major-version: ""

View File

@@ -208,6 +208,15 @@ jobs:
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
# Weekly cache-buster for the per-backend `make` step. Most Python
# backends list unpinned deps (torch, transformers, vllm, ...), so a
# warm cache freezes upstream versions indefinitely. Rolling this
# weekly forces a re-resolve of the install layer at most once per
# week, picking up newer wheels without a full cold rebuild.
- name: Compute deps refresh key
id: deps_refresh
run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Build and push
uses: docker/build-push-action@v7
if: github.event_name != 'pull_request'
@@ -222,9 +231,11 @@ jobs:
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
@@ -244,9 +255,10 @@ jobs:
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
push: ${{ env.quay_username != '' }}
tags: ${{ steps.meta_pull_request.outputs.tags }}

View File

@@ -48,6 +48,13 @@ jobs:
strategy:
matrix:
go-version: ['${{ inputs.go-version }}']
env:
# Keep the brew Cellar stable across cache restores. Without these,
# `brew install` would auto-update brew itself and re-link formulas,
# mutating the very paths the cache just restored.
HOMEBREW_NO_AUTO_UPDATE: '1'
HOMEBREW_NO_INSTALL_CLEANUP: '1'
HOMEBREW_NO_ANALYTICS: '1'
steps:
- name: Clone
uses: actions/checkout@v6
@@ -58,21 +65,141 @@ jobs:
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
# Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
# Shared across every darwin matrix entry — first job in a run warms
# it, the rest hit warm.
cache: true
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
# ---- Homebrew cache ----
# macOS runners have no Docker daemon, so the BuildKit registry cache used
# for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
# We cache the brew downloads + Cellar entries for the formulas we install
# below. Read on every run, write only on master/tag pushes — same policy
# as the Linux registry cache.
- name: Restore Homebrew cache
id: brew-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
# ccache is always installed (used by the llama-cpp variant build) so
# the brew cache content stays stable across every backend in the
# matrix — they all share one cache key.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----
# Three CMake variants (fallback, grpc, rpc-server) compile the same
# llama.cpp source tree with overlapping flags — ccache dedupes object
# files across them. Key on the pinned LLAMA_VERSION so a pin bump
# invalidates cleanly; restore-keys fall back to the latest entry for the
# same pin so unchanged TUs stay warm even when the cache is fresh.
- name: Compute llama.cpp version
if: inputs.backend == 'llama-cpp'
id: llama-version
run: |
version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
echo "version=${version}" >> "$GITHUB_OUTPUT"
- name: Restore ccache
if: inputs.backend == 'llama-cpp'
id: ccache-cache
uses: actions/cache/restore@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
restore-keys: |
ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
- name: Configure ccache
if: inputs.backend == 'llama-cpp'
run: |
mkdir -p "$HOME/Library/Caches/ccache"
ccache -M 2G
ccache -z
# llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
{
echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
} >> "$GITHUB_ENV"
# ---- Python wheel cache (uv + pip) ----
# Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
# ISO-week segment of the cache key forces at most one cold rebuild per
# backend per week, automatically picking up newer wheels for unpinned
# deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
# recent build of the same backend so off-week PRs still hit warm.
- name: Compute weekly cache bucket
if: inputs.lang == 'python'
id: weekly
run: echo "bucket=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Restore Python wheel cache
if: inputs.lang == 'python'
id: pyenv-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
restore-keys: |
pyenv-darwin-${{ inputs.backend }}-
- name: Build ${{ inputs.backend }}-darwin
run: |
make protogen-go
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
- name: ccache stats
if: inputs.backend == 'llama-cpp'
run: ccache -s
- name: Save ccache
if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
uses: actions/cache/save@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
- name: Save Python wheel cache
if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
- name: Upload ${{ inputs.backend }}.tar
uses: actions/upload-artifact@v7
with:

View File

@@ -2,7 +2,7 @@ name: Gallery Agent
on:
schedule:
- cron: '0 */3 * * *' # Run every 4 hours
- cron: '0 */12 * * *' # Run every 4 hours
workflow_dispatch:
inputs:
search_term:

View File

@@ -1,96 +0,0 @@
name: 'generate and publish GRPC docker caches'
on:
workflow_dispatch:
schedule:
# daily at midnight
- cron: '0 0 * * *'
concurrency:
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
generate_caches:
if: github.repository == 'mudler/LocalAI'
strategy:
matrix:
include:
- grpc-base-image: ubuntu:24.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}
steps:
- name: Release space from worker
if: matrix.runs-on == 'ubuntu-latest'
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get remove -y microsoft-edge-stable || true
sudo apt-get remove -y firefox || true
sudo apt-get remove -y powershell || true
sudo apt-get remove -y r-base-core || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
sudo rm -rf /usr/share/dotnet || true
sudo rm -rf /opt/ghc || true
sudo rm -rf "/usr/local/share/boost" || true
sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
df -h
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v6
- name: Cache GRPC
uses: docker/build-push-action@v7
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
build-args: |
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
context: .
file: ./Dockerfile
cache-to: type=gha,ignore-error=true
cache-from: type=gha
target: grpc
platforms: ${{ matrix.platforms }}
push: false

View File

@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
- base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
runs-on: 'arc-runner-set'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}

View File

@@ -20,7 +20,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
@@ -60,15 +59,13 @@
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"

View File

@@ -25,7 +25,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
@@ -42,12 +41,11 @@
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
core-image-build:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
@@ -60,7 +58,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
@@ -121,8 +118,7 @@
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
@@ -141,7 +137,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}

View File

@@ -8,11 +8,6 @@ on:
description: 'Base image'
required: true
type: string
grpc-base-image:
description: 'GRPC Base image, must be a compatible image with base-image'
required: false
default: ''
type: string
build-type:
description: 'Build type'
default: ''
@@ -201,25 +196,19 @@ jobs:
if: github.event_name != 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
@@ -230,25 +219,18 @@ jobs:
if: github.event_name == 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
#push: true
tags: ${{ steps.meta_pull_request.outputs.tags }}

View File

@@ -32,7 +32,6 @@ jobs:
llama-cpp: ${{ steps.detect.outputs.llama-cpp }}
ik-llama-cpp: ${{ steps.detect.outputs.ik-llama-cpp }}
turboquant: ${{ steps.detect.outputs.turboquant }}
buun-llama-cpp: ${{ steps.detect.outputs['buun-llama-cpp'] }}
vllm: ${{ steps.detect.outputs.vllm }}
sglang: ${{ steps.detect.outputs.sglang }}
acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
@@ -614,30 +613,6 @@ jobs:
- name: Build turboquant backend image and run gRPC e2e tests
run: |
make test-extra-backend-turboquant
tests-buun-llama-cpp-grpc:
needs: detect-changes
if: needs.detect-changes.outputs['buun-llama-cpp'] == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
# Exercises the buun-llama-cpp (fork-of-a-fork) backend with the
# fork-specific TurboQuant/TCQ KV-cache types. BACKEND_TEST_CACHE_TYPE_V
# is set to turbo3 so the test round-trips through the fork's KV
# allow-list — picking a stock llama.cpp type would only re-test the
# shared code path. DFlash speculative decoding is not exercised here
# because the one known public target/drafter pair (Qwen3.5-27B) is too
# large for CI.
- name: Build buun-llama-cpp backend image and run gRPC e2e tests
run: |
make test-extra-backend-buun-llama-cpp
# tests-vllm-grpc is currently disabled in CI.
#
# The prebuilt vllm CPU wheel is compiled with AVX-512 VNNI/BF16

View File

@@ -9,9 +9,6 @@ on:
tags:
- '*'
env:
GRPC_VERSION: v1.65.0
concurrency:
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true

View File

@@ -19,6 +19,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |

View File

@@ -1,5 +1,4 @@
ARG BASE_IMAGE=ubuntu:24.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
ARG UBUNTU_CODENAME=noble
@@ -149,6 +148,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/buun-llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
GOCMD=go
GOTEST=$(GOCMD) test
@@ -545,19 +545,6 @@ test-extra-backend-turboquant: docker-build-turboquant
BACKEND_TEST_CACHE_TYPE_V=turbo3 \
$(MAKE) test-extra-backend
## buun-llama-cpp: exercises the fork-of-a-fork backend (spiritbuun/buun-llama-cpp)
## with the *TurboQuant/TCQ-specific* KV-cache types (turbo3 for V). Same rationale
## as turboquant above: picking a standard llama.cpp type would only re-test the
## shared code path. buun inherits turboquant's turbo2/turbo3/turbo4 and adds
## turbo2_tcq / turbo3_tcq on top. DFlash speculative decoding is not exercised
## here because no small DFlash drafter model exists (the known public pair is
## Qwen3.5-27B, ~54 GB).
test-extra-backend-buun-llama-cpp: docker-build-buun-llama-cpp
BACKEND_IMAGE=local-ai-backend:buun-llama-cpp \
BACKEND_TEST_CACHE_TYPE_K=q8_0 \
BACKEND_TEST_CACHE_TYPE_V=turbo3 \
$(MAKE) test-extra-backend
## Audio transcription wrapper for the llama-cpp backend.
## Drives the new AudioTranscription / AudioTranscriptionStream RPCs against
## ggml-org/Qwen3-ASR-0.6B-GGUF (a small ASR model that requires its mmproj
@@ -896,7 +883,7 @@ docker-cuda12:
docker-image-intel:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
@@ -962,11 +949,6 @@ BACKEND_IK_LLAMA_CPP = ik-llama-cpp|ik-llama-cpp|.|false|false
# turboquant is a llama.cpp fork with TurboQuant KV-cache quantization.
# Reuses backend/cpp/llama-cpp grpc-server sources via a thin wrapper Makefile.
BACKEND_TURBOQUANT = turboquant|turboquant|.|false|false
# buun-llama-cpp is a fork-of-a-fork (spiritbuun/buun-llama-cpp forks
# TheTom/llama-cpp-turboquant) that adds DFlash block-diffusion speculative
# decoding and extra TCQ KV-cache variants on top of TurboQuant. Same thin
# wrapper pattern as turboquant — reuses backend/cpp/llama-cpp grpc-server.
BACKEND_BUUN_LLAMA_CPP = buun-llama-cpp|buun-llama-cpp|.|false|false
# Golang backends
BACKEND_PIPER = piper|golang|.|false|true
@@ -1047,7 +1029,6 @@ endef
$(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_IK_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_TURBOQUANT)))
$(eval $(call generate-docker-build-target,$(BACKEND_BUUN_LLAMA_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
$(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
@@ -1099,7 +1080,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-buun-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
########################################################
### Mock Backend for E2E Tests

View File

@@ -1,290 +0,0 @@
ARG BASE_IMAGE=ubuntu:24.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
# The grpc target does one thing, it builds and installs GRPC. This is in it's own layer so that it can be effectively cached by CI.
# You probably don't need to change anything here, and if you do, make sure that CI is adjusted so that the cache continues to work.
FROM ${GRPC_BASE_IMAGE} AS grpc
# This is a bit of a hack, but it's required in order to be able to effectively cache this layer in CI
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.65.0
ARG CMAKE_FROM_SOURCE=false
# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
ARG CMAKE_VERSION=3.31.10
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
WORKDIR /build
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
build-essential curl libssl-dev \
git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# We install GRPC to a different prefix here so that we can copy in only the build artifacts later
# saves several hundred MB on the final docker image size vs copying in the entire GRPC source tree
# and running make install in the target container
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
mkdir -p /build/grpc/cmake/build && \
cd /build/grpc/cmake/build && \
sed -i "216i\ TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
make && \
make install && \
rm -rf /build
FROM ${BASE_IMAGE} AS builder
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.31.10
# We can target specific CUDA ARCHITECTURES like --build-arg CUDA_DOCKER_ARCH='75;86;89;120'
ARG CUDA_DOCKER_ARCH
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ARG CMAKE_ARGS
ENV CMAKE_ARGS=${CMAKE_ARGS}
ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.25.4
ARG UBUNTU_VERSION=2404
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
ccache git \
ca-certificates \
make \
pkg-config libcurl4-openssl-dev \
curl unzip \
libssl-dev wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
else
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
fi
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get install -y nvpl
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig && \
# Log which GPU architectures have rocBLAS kernel support
echo "rocBLAS library data architectures:" && \
(ls /opt/rocm*/lib/rocblas/library/Kernels* 2>/dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \
echo "WARNING: No rocBLAS kernel data found" \
; fi
RUN echo "TARGETARCH: $TARGETARCH"
# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below
# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
# here so that we can generate the grpc code for the stablediffusion build
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
COPY --from=grpc /opt/grpc /usr/local
COPY . /LocalAI
RUN <<'EOT' bash
set -euxo pipefail
if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
rm -rf /LocalAI/backend/cpp/buun-llama-cpp-*-build
fi
cd /LocalAI/backend/cpp/buun-llama-cpp
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
make buun-llama-cpp-fallback
make buun-llama-cpp-grpc
make buun-llama-cpp-rpc-server
else
make buun-llama-cpp-avx
make buun-llama-cpp-avx2
make buun-llama-cpp-avx512
make buun-llama-cpp-fallback
make buun-llama-cpp-grpc
make buun-llama-cpp-rpc-server
fi
EOT
# Copy libraries using a script to handle architecture differences
RUN make -BC /LocalAI/backend/cpp/buun-llama-cpp package
FROM scratch
# Copy all available binaries (the build process only creates the appropriate ones for the target architecture)
COPY --from=builder /LocalAI/backend/cpp/buun-llama-cpp/package/. ./

View File

@@ -147,6 +147,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -206,6 +206,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -162,6 +162,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
@@ -202,6 +203,13 @@ COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
ARG FROM_SOURCE=""
ENV FROM_SOURCE=${FROM_SOURCE}
# Cache-buster for the per-backend `make` step. Most Python backends list
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
# would otherwise freeze upstream versions indefinitely. CI passes a value
# that rolls weekly so the install layer is rebuilt at most once per week
# and picks up newer wheels from PyPI / nightly indexes.
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
# Package GPU libraries into the backend's lib directory
@@ -216,4 +224,4 @@ RUN if [ -f "/${BACKEND}/package.sh" ]; then \
FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /
COPY --from=builder /${BACKEND}/ /

View File

@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -1,85 +0,0 @@
# Pinned to the HEAD of master on https://github.com/spiritbuun/buun-llama-cpp.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
BUUN_LLAMA_VERSION?=22464d0848b87c5d56b52fdf6af2e5da46bf803e
LLAMA_REPO?=https://github.com/spiritbuun/buun-llama-cpp
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
ARCH?=$(shell uname -m)
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
LLAMA_CPP_DIR := $(CURRENT_MAKEFILE_DIR)/../llama-cpp
GREEN := \033[0;32m
RESET := \033[0m
# buun-llama-cpp is a llama.cpp fork-of-a-fork (spiritbuun/buun-llama-cpp forked
# TheTom/llama-cpp-turboquant, which itself forked ggml-org/llama.cpp). Rather
# than duplicating grpc-server.cpp / CMakeLists.txt / prepare.sh we reuse the
# ones in backend/cpp/llama-cpp, and only swap which repo+sha the fetch step
# pulls. Each flavor target copies ../llama-cpp into a sibling
# ../buun-llama-cpp-<flavor>-build directory, then invokes llama-cpp's own
# build-llama-cpp-grpc-server with LLAMA_REPO/LLAMA_VERSION overridden to point
# at the fork.
PATCHES_DIR := $(CURRENT_MAKEFILE_DIR)/patches
# Each flavor target:
# 1. copies backend/cpp/llama-cpp/ (grpc-server.cpp + prepare.sh + CMakeLists.txt + Makefile)
# into a sibling buun-llama-cpp-<flavor>-build directory;
# 2. clones the buun fork into buun-llama-cpp-<flavor>-build/llama.cpp via the
# copy's own `llama.cpp` target, overriding LLAMA_REPO/LLAMA_VERSION;
# 3. applies patches from backend/cpp/buun-llama-cpp/patches/ to the cloned
# fork sources (for backporting upstream commits the fork hasn't pulled);
# 4. runs the copy's `grpc-server` target, which produces the binary we copy
# up as buun-llama-cpp-<flavor>.
define buun-llama-cpp-build
rm -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build
cp -rf $(LLAMA_CPP_DIR) $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build purge
# Augment the copied grpc-server.cpp's KV-cache allow-list with the
# fork's turbo2/turbo3/turbo4/turbo2_tcq/turbo3_tcq types and wire up the
# DFlash-specific option handlers (tree_budget / draft_topk). We patch the
# *copy*, never the original under backend/cpp/llama-cpp/, so the stock
# llama-cpp build stays compiling against vanilla upstream.
bash $(CURRENT_MAKEFILE_DIR)/patch-grpc-server.sh $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/grpc-server.cpp
$(info $(GREEN)I buun-llama-cpp build info:$(1)$(RESET))
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(BUUN_LLAMA_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build llama.cpp
bash $(CURRENT_MAKEFILE_DIR)/apply-patches.sh $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/llama.cpp $(PATCHES_DIR)
CMAKE_ARGS="$(CMAKE_ARGS) $(2)" TARGET="$(3)" \
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(BUUN_LLAMA_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/grpc-server buun-llama-cpp-$(1)
endef
buun-llama-cpp-avx2:
$(call buun-llama-cpp-build,avx2,-DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on,--target grpc-server)
buun-llama-cpp-avx512:
$(call buun-llama-cpp-build,avx512,-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on,--target grpc-server)
buun-llama-cpp-avx:
$(call buun-llama-cpp-build,avx,-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)
buun-llama-cpp-fallback:
$(call buun-llama-cpp-build,fallback,-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)
buun-llama-cpp-grpc:
$(call buun-llama-cpp-build,grpc,-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server --target rpc-server)
buun-llama-cpp-rpc-server: buun-llama-cpp-grpc
cp -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-grpc-build/llama.cpp/build/bin/rpc-server buun-llama-cpp-rpc-server
package:
bash package.sh
purge:
rm -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-*-build
rm -rf buun-llama-cpp-* package
clean: purge

View File

@@ -1,50 +0,0 @@
#!/bin/bash
# Apply the buun-llama-cpp patch series to a cloned buun-llama-cpp checkout.
#
# buun-llama-cpp is a fork-of-a-fork that branched off upstream llama.cpp
# before some API changes the shared backend/cpp/llama-cpp/grpc-server.cpp
# depends on. We carry those upstream commits as patch files under
# backend/cpp/buun-llama-cpp/patches/ and apply them here so the reused
# grpc-server source compiles against the fork unmodified.
#
# Drop the corresponding patch from patches/ whenever the fork catches up with
# upstream — the build will fail fast if a patch stops applying, which is the
# signal to retire it.
set -euo pipefail
if [[ $# -ne 2 ]]; then
echo "usage: $0 <llama.cpp-src-dir> <patches-dir>" >&2
exit 2
fi
SRC_DIR=$1
PATCHES_DIR=$2
if [[ ! -d "$SRC_DIR" ]]; then
echo "source dir does not exist: $SRC_DIR" >&2
exit 2
fi
if [[ ! -d "$PATCHES_DIR" ]]; then
echo "no patches dir at $PATCHES_DIR, nothing to apply"
exit 0
fi
shopt -s nullglob
patches=("$PATCHES_DIR"/*.patch)
shopt -u nullglob
if [[ ${#patches[@]} -eq 0 ]]; then
echo "no .patch files in $PATCHES_DIR, nothing to apply"
exit 0
fi
cd "$SRC_DIR"
for patch in "${patches[@]}"; do
echo "==> applying $patch"
git apply --verbose "$patch"
done
echo "all buun-llama-cpp patches applied successfully"

View File

@@ -1,57 +0,0 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
# This script is used in the final stage of the Dockerfile
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avrf $CURDIR/buun-llama-cpp-* $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

View File

@@ -1,162 +0,0 @@
#!/bin/bash
# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
# buun-llama-cpp build to account for three gaps between upstream and the fork:
#
# 1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
# fork-specific `turbo2` / `turbo3` / `turbo4` cache types plus the buun
# additions `turbo2_tcq` / `turbo3_tcq`.
#
# 2. Wire up buun-exclusive speculative-decoding option handlers
# (tree_budget / draft_topk) alongside the existing spec_* handlers.
# These reference struct fields (common_params.speculative.tree_budget
# and .draft_topk) that only exist in buun's common/common.h — adding
# them to the shared backend/cpp/llama-cpp/grpc-server.cpp would break
# the stock llama-cpp build, so we inject them only into the buun copy.
#
# 3. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
# server-side random per-instance marker) with the legacy "<__media__>"
# literal. The fork branched before that PR, so server-common.cpp has no
# get_media_marker symbol. The fork's mtmd_default_marker() still returns
# "<__media__>", and Go-side tooling falls back to that sentinel when the
# backend does not expose media_marker, so substituting the literal keeps
# behavior identical on the buun path.
#
# We patch the *copy* sitting in buun-llama-cpp-<flavor>-build/, never the
# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
# compiling against vanilla upstream.
#
# Idempotent: skips each insertion if its marker is already present (so re-runs
# of the same build dir don't double-insert).
set -euo pipefail
if [[ $# -ne 1 ]]; then
echo "usage: $0 <grpc-server.cpp>" >&2
exit 2
fi
SRC=$1
if [[ ! -f "$SRC" ]]; then
echo "grpc-server.cpp not found at $SRC" >&2
exit 2
fi
if grep -q 'GGML_TYPE_TURBO2_TCQ' "$SRC"; then
echo "==> $SRC already has buun cache types, skipping KV allow-list patch"
else
echo "==> patching $SRC to allow turbo2/turbo3/turbo4/turbo2_tcq/turbo3_tcq KV-cache types"
# Insert the five TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder
# image does not ship python3, and GNU sed's multi-line `a\` quoting is
# awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // buun-llama-cpp fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
print " GGML_TYPE_TURBO2_TCQ,"
print " GGML_TYPE_TURBO3_TCQ,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> KV allow-list patch OK"
fi
if grep -q 'optname, "tree_budget"' "$SRC"; then
echo "==> $SRC already has DFlash option handlers, skipping"
else
echo "==> patching $SRC to add tree_budget / draft_topk option handlers"
# Insert two new `else if` handlers between the inner close-brace of the
# `spec_p_split` block and the next `} else if (…spec_ngram_size_n…)` line.
# Upstream writes each `} else if` as a single physical line, so we don't
# emit an outer `}` ourselves — the existing next line provides both the
# close of our `draft_topk` block and the open of `spec_ngram_size_n`.
# Anchor on the exact 3-line body of spec_p_split so we can't drift.
awk '
prev2 == " } else if (!strcmp(optname, \"spec_p_split\")) {" &&
prev1 ~ /^ +if \(optval != NULL\) \{$/ &&
$0 ~ /^ +try \{ params\.speculative\.p_split = std::stof\(optval_str\); \} catch \(\.\.\.\) \{\}$/ &&
!done {
print # print the try-line itself
getline inner_close # read " }" closing the inner if
print inner_close # print it — this closes spec_p_split body
print " // buun-llama-cpp DFlash options — added by patch-grpc-server.sh"
print " } else if (!strcmp(optname, \"tree_budget\")) {"
print " if (optval != NULL) {"
print " try { params.speculative.tree_budget = std::stoi(optval_str); } catch (...) {}"
print " }"
print " } else if (!strcmp(optname, \"draft_topk\")) {"
print " if (optval != NULL) {"
print " try { params.speculative.draft_topk = std::stoi(optval_str); } catch (...) {}"
print " }"
# The next source line (`} else if (…spec_ngram_size_n…) {`) closes
# our draft_topk block and continues the chain naturally; fall back
# into the main loop to emit it and everything after.
done = 1
prev2 = prev1
prev1 = inner_close
next
}
{ print; prev2 = prev1; prev1 = $0 }
END {
if (!done) {
print "patch-grpc-server.sh: spec_p_split anchor not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> DFlash option-handler patch OK"
fi
if grep -qE 'ctx_server\.get_meta\(\)\.logit_bias_eog|params_base\.sampling\.logit_bias_eog,' "$SRC"; then
echo "==> patching $SRC to drop the logit_bias_eog arg from params_from_json_cmpl() callsites (buun still uses the pre-refactor 4-arg signature)"
# Upstream llama.cpp refactored params_from_json_cmpl to take a precomputed
# logit_bias_eog vector after buun's 2026-04-05 fork-point — simultaneously
# adding server_context_meta::logit_bias_eog as the supplier. Buun carries
# neither change: its params_from_json_cmpl is still 4-arg, and internally
# derives logit_bias_eog from the common_params it's passed. So we just
# delete the argument line entirely — the remaining 4 args match buun's
# signature and the resulting behavior matches upstream bit-for-bit
# (upstream's 5th arg is the same data buun derives internally).
#
# Guard is broad so this works whether the line has been run through this
# block before (leaving params_base.sampling.logit_bias_eog,) or not
# (leaving the original ctx_server.get_meta().logit_bias_eog,).
sed -E '/^[[:space:]]+(ctx_server\.get_meta\(\)\.logit_bias_eog|params_base\.sampling\.logit_bias_eog),$/d' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> logit_bias_eog arg drop OK"
else
echo "==> $SRC has no logit_bias_eog arg line, skipping"
fi
if grep -q 'get_media_marker()' "$SRC"; then
echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
# Only one call site today (ModelMetadata), but replace all occurrences to
# stay robust if upstream adds more. Use a temp file to avoid relying on
# sed -i portability (the builder image uses GNU sed, but keeping this
# consistent with the awk block above).
sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> get_media_marker() substitution OK"
else
echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
fi
echo "==> all patches applied"

View File

@@ -1,46 +0,0 @@
Subject: [PATCH] ggml-cuda/fattn: provide atomicAdd(double*,double) shim for pre-sm_60
Buun's Q² calibration path in ggml_cuda_turbo_scale_q calls
atomicAdd(&d_q_channel_sq_fattn[threadIdx.x], (double)(val * val));
but native double atomicAdd is only available on compute capability 6.0
and newer. Compiling against a CUDA arch list that includes older
architectures (LocalAI's CUDA 12 Docker image builds for the full
published arch range) fails with:
fattn.cu(812): error: no instance of overloaded function "atomicAdd"
matches the argument list, argument types are: (double *, double)
Add the canonical CUDA-programming-guide shim at the top of fattn.cu so
pre-sm_60 codegen has a definition to call. On sm_60+ the native CUDA
intrinsic is used and the shim is elided via __CUDA_ARCH__.
--- a/ggml/src/ggml-cuda/fattn.cu
+++ b/ggml/src/ggml-cuda/fattn.cu
@@ -7,6 +7,27 @@
#include <atomic>
+// Pre-sm_60 double atomicAdd shim. Native double atomicAdd(double*,double)
+// is only available on CUDA compute capability 6.0+ (see CUDA C Programming
+// Guide, B.15 Atomic Functions). Buun's Q² calibration path below calls
+// atomicAdd with a double*; without this definition, nvcc fails to find a
+// matching overload whenever the compile target list includes pre-sm_60
+// architectures. The standard CAS loop implementation below matches the
+// semantics of the native intrinsic.
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
+static __device__ double atomicAdd(double * address, double val) {
+ unsigned long long int * address_as_ull = (unsigned long long int *)address;
+ unsigned long long int old = *address_as_ull;
+ unsigned long long int assumed;
+ do {
+ assumed = old;
+ old = atomicCAS(address_as_ull, assumed,
+ __double_as_longlong(val + __longlong_as_double(assumed)));
+ } while (assumed != old);
+ return __longlong_as_double(old);
+}
+#endif
+
// InnerQ: update the fattn-side inverse scale array from host (all devices)
void turbo_innerq_update_fattn_scales(const float * scale_inv) {
int cur_device;

View File

@@ -1,32 +0,0 @@
Subject: [PATCH] ggml-cuda/argmax: pass WARP_SIZE to the top-K __shfl_xor_sync calls
Two __shfl_xor_sync calls in the top-K intra-warp merge drop the `width`
argument and rely on the CUDA default (warpSize). Every other call in
the same file already passes WARP_SIZE explicitly, and the HIP/ROCm
compatibility shim at ggml/src/ggml-cuda/vendors/hip.h:33 is a 4-arg
function-like macro — so the 3-arg form fails to preprocess when
building with hipcc against ROCm:
argmax.cu:265: error: too few arguments provided to function-like
macro invocation
note: macro '__shfl_xor_sync' defined here:
#define __shfl_xor_sync(mask, var, laneMask, width) \
__shfl_xor(var, laneMask, width)
Align the two call sites with the rest of the file by passing WARP_SIZE
explicitly. On CUDA the generated code is unchanged (warpSize is the
default); on HIP it now matches the macro's arity.
--- a/ggml/src/ggml-cuda/argmax.cu
+++ b/ggml/src/ggml-cuda/argmax.cu
@@ -262,8 +262,8 @@
// Each step: lane gets partner's min element, if it beats our min, replace and re-heapify
for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
for (int i = 0; i < K; i++) {
- float partner_val = __shfl_xor_sync(0xFFFFFFFF, heap_val[i], offset);
- int partner_idx = __shfl_xor_sync(0xFFFFFFFF, heap_idx[i], offset);
+ float partner_val = __shfl_xor_sync(0xFFFFFFFF, heap_val[i], offset, WARP_SIZE);
+ int partner_idx = __shfl_xor_sync(0xFFFFFFFF, heap_idx[i], offset, WARP_SIZE);
if (partner_val > heap_val[0]) {
heap_val[0] = partner_val;
heap_idx[0] = partner_idx;

View File

@@ -1,24 +0,0 @@
Subject: [PATCH] ggml-cuda/vendors/hip: alias cudaMemcpy{To,From}Symbol to hip counterparts
Buun's Q² calibration + TCQ codebook upload paths in fattn.cu use
cudaMemcpyToSymbol / cudaMemcpyFromSymbol. The HIP-compat header in
ggml/src/ggml-cuda/vendors/hip.h already aliases the scalar cudaMemcpy
family (cudaMemcpy, cudaMemcpyAsync, cudaMemcpy2DAsync, …) but is
missing the symbol variants. Building with hipcc therefore fails with
15+ "use of undeclared identifier 'cudaMemcpyToSymbol'" errors.
Add the two missing aliases alongside the existing memcpy block. HIP
provides hipMemcpy{To,From}Symbol with the same signature as CUDA's
equivalents, so this is a straight name substitution.
--- a/ggml/src/ggml-cuda/vendors/hip.h
+++ b/ggml/src/ggml-cuda/vendors/hip.h
@@ -85,6 +85,8 @@
#define cudaMemcpyDeviceToDevice hipMemcpyDeviceToDevice
#define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
#define cudaMemcpyHostToDevice hipMemcpyHostToDevice
+#define cudaMemcpyToSymbol hipMemcpyToSymbol
+#define cudaMemcpyFromSymbol hipMemcpyFromSymbol
#define cudaMemcpyKind hipMemcpyKind
#define cudaMemset hipMemset
#define cudaMemsetAsync hipMemsetAsync

View File

@@ -1,36 +0,0 @@
Subject: [PATCH] ggml-cuda/fattn: pass WARP_SIZE to fwht128 __shfl_xor_sync calls
Same issue as the argmax top-K fix: two __shfl_xor_sync call sites in
the FWHT-128 butterfly kernels (ggml_cuda_fwht128 and fwht128_store_half)
use the 3-arg CUDA form and omit the `width` argument that the HIP
function-like macro in vendors/hip.h:33 requires. Hipcc fails with:
fattn.cu:512: too few arguments provided to function-like macro
invocation
note: macro '__shfl_xor_sync' defined here:
#define __shfl_xor_sync(mask, var, laneMask, width) \
__shfl_xor(var, laneMask, width)
Add WARP_SIZE to both calls. CUDA codegen is unchanged (warpSize is the
default); HIP now matches the macro arity.
--- a/ggml/src/ggml-cuda/fattn.cu
+++ b/ggml/src/ggml-cuda/fattn.cu
@@ -509,7 +509,7 @@
// Intra-warp passes: shuffle xor with stride h, no smem, no sync.
#pragma unroll
for (int h = 1; h <= 16; h *= 2) {
- const float other = __shfl_xor_sync(0xFFFFFFFF, val, h);
+ const float other = __shfl_xor_sync(0xFFFFFFFF, val, h, WARP_SIZE);
val = (tid & h) ? (other - val) : (val + other);
}
@@ -533,7 +533,7 @@
static __device__ __forceinline__ void fwht128_store_half(
float val, half * dst_base) {
const int tid = threadIdx.x;
- const float neighbor = __shfl_xor_sync(0xFFFFFFFF, val, 1);
+ const float neighbor = __shfl_xor_sync(0xFFFFFFFF, val, 1, WARP_SIZE);
if ((tid & 1) == 0) {
const half2 packed = __floats2half2_rn(val, neighbor);
*((half2 *)(dst_base + tid)) = packed;

View File

@@ -1,65 +0,0 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
BINARY=buun-llama-cpp-fallback
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/buun-llama-cpp-avx ]; then
BINARY=buun-llama-cpp-avx
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/buun-llama-cpp-avx2 ]; then
BINARY=buun-llama-cpp-avx2
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/buun-llama-cpp-avx512 ]; then
BINARY=buun-llama-cpp-avx512
fi
fi
if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e $CURDIR/buun-llama-cpp-grpc ]; then
BINARY=buun-llama-cpp-grpc
fi
fi
# Extend ld library path with the dir where this script is located/lib
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
fi
fi
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
fi
echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"
# We should never reach this point, however just in case we do, run fallback
exec $CURDIR/buun-llama-cpp-fallback "$@"

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=16996aeab772c69b6473597038b2ef0b85297e8b
IK_LLAMA_VERSION?=3a945af45d45936341a45bbf7deda56776a4af26
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=187a45637054881ecacf17f8e2f6f8f2ba7df1c7
LLAMA_VERSION?=f53577432541bb9edc1588c4ef45c66bf07e4468
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -642,6 +642,21 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
params.no_op_offload = false;
}
} else if (!strcmp(optname, "split_mode") || !strcmp(optname, "sm")) {
// Accepts: none | layer | row | tensor (the latter requires a llama.cpp build
// that includes ggml-org/llama.cpp#19378, FlashAttention enabled, and KV-cache
// quantization disabled).
if (optval != NULL) {
if (optval_str == "none") {
params.split_mode = LLAMA_SPLIT_MODE_NONE;
} else if (optval_str == "layer") {
params.split_mode = LLAMA_SPLIT_MODE_LAYER;
} else if (optval_str == "row") {
params.split_mode = LLAMA_SPLIT_MODE_ROW;
} else if (optval_str == "tensor") {
params.split_mode = LLAMA_SPLIT_MODE_TENSOR;
}
}
} else if (!strcmp(optname, "kv_unified") || !strcmp(optname, "unified_kv")) {
if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
params.kv_unified = true;

View File

@@ -1,7 +1,7 @@
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
TURBOQUANT_VERSION?=11a241d0db78a68e0a5b99fe6f36de6683100f6a
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=c97702e1057c2fe13a7074cd9069cb9dd6edc1bf
STABLEDIFFUSION_GGML_VERSION?=b8bdffc19962be7e5a84bfefeb2e31bd885b571a
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -139,7 +139,10 @@ func (w *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptR
// segment start/end conversion factor taken from https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L895
s := CppGetSegmentStart(i) * (10000000)
t := CppGetSegmentEnd(i) * (10000000)
txt := strings.Clone(CppGetSegmentText(i))
// whisper.cpp can emit bytes that aren't valid UTF-8 (e.g. a multibyte
// codepoint split across token boundaries); protobuf string fields
// reject those at marshal time. Scrub before the value escapes cgo.
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
tokens := make([]int32, CppNTokens(i))
if opts.Diarize && CppGetSegmentSpeakerTurnNext(i) {

View File

@@ -263,6 +263,8 @@
amd: "rocm-vllm"
intel: "intel-vllm"
nvidia-cuda-12: "cuda12-vllm"
nvidia-cuda-13: "cuda13-vllm"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
cpu: "cpu-vllm"
- &sglang
name: "sglang"
@@ -285,6 +287,7 @@
amd: "rocm-sglang"
intel: "intel-sglang"
nvidia-cuda-12: "cuda12-sglang"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang"
cpu: "cpu-sglang"
- &vllm-omni
name: "vllm-omni"
@@ -311,6 +314,8 @@
nvidia: "cuda12-vllm-omni"
amd: "rocm-vllm-omni"
nvidia-cuda-12: "cuda12-vllm-omni"
nvidia-cuda-13: "cuda13-vllm-omni"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni"
- &mlx
name: "mlx"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
@@ -1608,6 +1613,20 @@
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant
## whisper
- !!merge <<: *whispercpp
name: "whisper-development"
capabilities:
default: "cpu-whisper-development"
nvidia: "cuda12-whisper-development"
intel: "intel-sycl-f16-whisper-development"
metal: "metal-whisper-development"
amd: "rocm-whisper-development"
vulkan: "vulkan-whisper-development"
nvidia-l4t: "nvidia-l4t-arm64-whisper-development"
nvidia-cuda-13: "cuda13-whisper-development"
nvidia-cuda-12: "cuda12-whisper-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-whisper-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-whisper-development"
- !!merge <<: *whispercpp
name: "nvidia-l4t-arm64-whisper"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-whisper"
@@ -1814,12 +1833,25 @@
nvidia: "cuda12-vllm-development"
amd: "rocm-vllm-development"
intel: "intel-vllm-development"
nvidia-cuda-12: "cuda12-vllm-development"
nvidia-cuda-13: "cuda13-vllm-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
cpu: "cpu-vllm-development"
- !!merge <<: *vllm
name: "cuda12-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm
- !!merge <<: *vllm
name: "cuda13-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm
- !!merge <<: *vllm
name: "cuda13-nvidia-l4t-arm64-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
@@ -1840,6 +1872,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm
- !!merge <<: *vllm
name: "cuda13-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm
- !!merge <<: *vllm
name: "cuda13-nvidia-l4t-arm64-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
@@ -1862,12 +1904,19 @@
nvidia: "cuda12-sglang-development"
amd: "rocm-sglang-development"
intel: "intel-sglang-development"
nvidia-cuda-12: "cuda12-sglang-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang-development"
cpu: "cpu-sglang-development"
- !!merge <<: *sglang
name: "cuda12-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sglang
- !!merge <<: *sglang
name: "cuda13-nvidia-l4t-arm64-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-sglang"
@@ -1888,6 +1937,11 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-sglang
- !!merge <<: *sglang
name: "cuda13-nvidia-l4t-arm64-sglang-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-sglang"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-sglang"
@@ -1910,11 +1964,23 @@
nvidia: "cuda12-vllm-omni-development"
amd: "rocm-vllm-omni-development"
nvidia-cuda-12: "cuda12-vllm-omni-development"
nvidia-cuda-13: "cuda13-vllm-omni-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
- !!merge <<: *vllm-omni
name: "cuda12-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm-omni"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-nvidia-l4t-arm64-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm-omni"
@@ -1925,6 +1991,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm-omni"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm-omni"

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cpu]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]

View File

@@ -1 +1 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4

View File

@@ -23,6 +23,19 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
# wheel resolves cleanly. unsafe-best-match is required because the
# jetson-ai-lab index lists transitive deps (e.g. decord) at older
# versions only — without it uv refuses to fall through to PyPI for a
# compatible wheel and resolution fails.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.

View File

@@ -0,0 +1,12 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
# Drop the [all] extra: it pulls outlines/decord, and decord has no
# aarch64 cp312 wheel anywhere (PyPI nor the jetson-ai-lab index ships
# only legacy cp35-cp37). With [all] uv backtracks through versions
# trying to satisfy decord and lands on sglang==0.1.16. Floor at 0.5.0
# so uv can't silently downgrade if a future resolution misfires.
sglang>=0.5.0

View File

@@ -12,11 +12,15 @@ else
source $backend_dir/../common/libbackend.sh
fi
# Handle l4t build profiles (Python 3.12, pip fallback) if needed
# Handle l4t build profiles (Python 3.12, pip fallback) if needed.
# unsafe-best-match is required on l4t13 because the jetson-ai-lab index
# lists transitive deps at limited versions — without it uv pins to the
# first matching index and fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match"
fi
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
@@ -26,7 +30,11 @@ fi
# Install base requirements first
installRequirements
# Install vllm based on build type
# Install vllm based on build type. vllm-omni tracks vllm master from
# source (cloned below) so we leave the upstream vllm dependency unpinned
# — vllm 0.19+ ships cu130 wheels by default, which is what we want for
# cublas13. Older cuda12/rocm/cpu paths still resolve a compatible wheel
# from the relevant channel.
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
# ROCm
if [ "x${USE_PIP}" == "xtrue" ]; then
@@ -34,8 +42,26 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
else
uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
fi
elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
# JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel
# at jetson-ai-lab. Version is unpinned: the index ships whatever build
# matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through
# to PyPI for transitive deps not present on the jetson-ai-lab index.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
else
uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
fi
elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
# vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm --torch-backend=auto
else
uv pip install vllm --torch-backend=auto
fi
elif [ "x${BUILD_TYPE}" == "xcublas" ] || [ "x${BUILD_TYPE}" == "x" ]; then
# CUDA (default) or CPU
# cuda12 / CPU — keep the 0.14.0 pin for compatibility with the existing
# cuda12 vllm-omni image; bumping should be its own change.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm==0.14.0 --torch-backend=auto
else

View File

@@ -0,0 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,13 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
bitsandbytes
flash-attn
diffusers
librosa
soundfile
pillow
numpy

View File

@@ -32,6 +32,22 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
# accordingly. JetPack 6 keeps cp310 + USE_PIP=true. unsafe-best-match
# is required because the jetson-ai-lab index lists transitive deps at
# limited versions — without it uv pins to the first matching index and
# fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
# requirements-cpu-after.txt and compiles vllm locally against the host's
# actual CPU. Not used by default because it takes ~30-40 minutes, but

View File

@@ -1,2 +1,9 @@
https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
# flash-attn wheels are ABI-tied to a specific torch version. vllm forces
# torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
# prebuilt wheels up to torch 2.8 — any wheel we pin here gets silently
# broken when vllm upgrades torch during install, producing an undefined
# libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
# attention, and rotary_embedding/common.py guards the flash_attn import
# with find_spec(), so skipping flash-attn is safe and the only stable
# choice until upstream ships a torch-2.10 wheel.
vllm

View File

@@ -1,4 +1,4 @@
accelerate
torch==2.7.0
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,2 @@
--extra-index-url https://download.pytorch.org/whl/cu130
vllm

View File

@@ -0,0 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,2 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
vllm

View File

@@ -0,0 +1,8 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
bitsandbytes
flash-attn

View File

@@ -1867,9 +1867,9 @@ dependencies = [
[[package]]
name = "rustls-webpki"
version = "0.103.10"
version = "0.103.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
dependencies = [
"ring",
"rustls-pki-types",

View File

@@ -90,6 +90,14 @@ type WorkerCMD struct {
RegistrationToken string `env:"LOCALAI_REGISTRATION_TOKEN" help:"Token for authenticating with the frontend" group:"registration"`
HeartbeatInterval string `env:"LOCALAI_HEARTBEAT_INTERVAL" default:"10s" help:"Interval between heartbeats" group:"registration"`
NodeLabels string `env:"LOCALAI_NODE_LABELS" help:"Comma-separated key=value labels for this node (e.g. tier=fast,gpu=a100)" group:"registration"`
// MaxReplicasPerModel caps how many replicas of any one model can run on
// this worker concurrently. Default 1 = historical single-replica
// behavior. Set higher when a node has enough VRAM to host multiple
// copies of the same model (e.g. a fat 128 GiB box running 4× of a
// 24 GiB model for throughput). The auto-label `node.replica-slots=N`
// is published so model schedulers can target high-capacity nodes via
// the existing label selector.
MaxReplicasPerModel int `env:"LOCALAI_MAX_REPLICAS_PER_MODEL" default:"1" help:"Max replicas of any single model on this worker. Default 1 preserves single-replica behavior; set higher to allow stacking replicas on a fat node." group:"registration"`
// NATS (required)
NatsURL string `env:"LOCALAI_NATS_URL" required:"" help:"NATS server URL" group:"distributed"`
@@ -567,22 +575,35 @@ func (s *backendSupervisor) getAddr(backend string) string {
return ""
}
// buildProcessKey is the supervisor's stable identifier for a backend gRPC
// process. It includes the replica index so the same model can run multiple
// processes on a worker simultaneously without colliding on the same map slot
// or port. The "#N" suffix is purely internal — the controller never reads it.
func buildProcessKey(modelID, backend string, replicaIndex int) string {
base := modelID
if base == "" {
base = backend
}
return fmt.Sprintf("%s#%d", base, replicaIndex)
}
// installBackend handles the backend.install flow:
// 1. If already running for this model, return existing address
// 1. If already running for this (model, replica) slot, return existing address
// 2. Install backend from gallery (if not already installed)
// 3. Find backend binary
// 4. Start gRPC process on a new port
// Returns the gRPC address of the backend process.
//
// ProcessKey includes the replica index so a worker with MaxReplicasPerModel>1
// can host multiple processes for the same model on distinct ports. Old
// controllers (no replica_index in the request) implicitly target replica 0,
// which preserves single-replica behavior.
func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest) (string, error) {
// Process key: use ModelID if provided (per-model process), else backend name
processKey := req.ModelID
if processKey == "" {
processKey = req.Backend
}
processKey := buildProcessKey(req.ModelID, req.Backend, int(req.ReplicaIndex))
// If already running for this model, return its address
// If already running for this model+replica, return its address
if addr := s.getAddr(processKey); addr != "" {
xlog.Info("Backend already running for model", "backend", req.Backend, "model", req.ModelID, "addr", addr)
xlog.Info("Backend already running for model replica", "backend", req.Backend, "model", req.ModelID, "replica", req.ReplicaIndex, "addr", addr)
return addr, nil
}
@@ -886,13 +907,18 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
totalVRAM, _ := xsysinfo.TotalAvailableVRAM()
gpuVendor, _ := xsysinfo.DetectGPUVendor()
maxReplicas := cmd.MaxReplicasPerModel
if maxReplicas < 1 {
maxReplicas = 1
}
body := map[string]any{
"name": nodeName,
"address": cmd.advertiseAddr(),
"http_address": cmd.advertiseHTTPAddr(),
"total_vram": totalVRAM,
"available_vram": totalVRAM, // initially all VRAM is available
"gpu_vendor": gpuVendor,
"name": nodeName,
"address": cmd.advertiseAddr(),
"http_address": cmd.advertiseHTTPAddr(),
"total_vram": totalVRAM,
"available_vram": totalVRAM, // initially all VRAM is available
"gpu_vendor": gpuVendor,
"max_replicas_per_model": maxReplicas,
}
// If no GPU detected, report system RAM so the scheduler/UI has capacity info
@@ -906,39 +932,40 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
body["token"] = cmd.RegistrationToken
}
// Parse and add static node labels
// Parse and add static node labels. Always include the auto-label
// `node.replica-slots=N` so AND-selectors in ModelSchedulingConfig can
// target high-capacity nodes (e.g. {"node.replica-slots":"4"}).
labels := make(map[string]string)
if cmd.NodeLabels != "" {
labels := make(map[string]string)
for _, pair := range strings.Split(cmd.NodeLabels, ",") {
pair = strings.TrimSpace(pair)
if k, v, ok := strings.Cut(pair, "="); ok {
labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
}
}
if len(labels) > 0 {
body["labels"] = labels
}
}
labels["node.replica-slots"] = strconv.Itoa(maxReplicas)
body["labels"] = labels
return body
}
// heartbeatBody returns the current VRAM/RAM stats for heartbeat payloads.
//
// When aggregate VRAM usage is unknown (no GPU, or temporary detection
// failure), we deliberately OMIT available_vram so the frontend keeps its
// last good value — overwriting with 0 makes the UI show the node as "fully
// used", while reporting total-as-available lies to the scheduler about
// free capacity.
func (cmd *WorkerCMD) heartbeatBody() map[string]any {
var availVRAM uint64
body := map[string]any{}
aggregate := xsysinfo.GetGPUAggregateInfo()
if aggregate.TotalVRAM > 0 {
availVRAM = aggregate.FreeVRAM
} else {
// Fallback: report total as available (no usage tracking possible)
availVRAM, _ = xsysinfo.TotalAvailableVRAM()
body["available_vram"] = aggregate.FreeVRAM
}
body := map[string]any{
"available_vram": availVRAM,
}
// If no GPU, report system RAM usage instead
// CPU-only workers (or workers that lost GPU visibility momentarily):
// report system RAM so the scheduler still has capacity info.
if aggregate.TotalVRAM == 0 {
if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
body["available_ram"] = ramInfo.Available

View File

@@ -0,0 +1,70 @@
package cli
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("Worker per-replica process keying", func() {
Describe("buildProcessKey", func() {
// Pin the supervisor's keying contract: distinct replica indexes for
// the same modelID produce distinct process keys, so the supervisor
// map can hold multiple processes for one model. Dropping the suffix
// would re-introduce the original flap (one model, one slot, churn).
DescribeTable("produces stable, distinct keys",
func(modelID, backend string, replica int, want string) {
Expect(buildProcessKey(modelID, backend, replica)).To(Equal(want))
},
Entry("modelID present, replica 0", "Qwen3-35B", "llama-cpp", 0, "Qwen3-35B#0"),
Entry("modelID present, replica 1", "Qwen3-35B", "llama-cpp", 1, "Qwen3-35B#1"),
Entry("falls back to backend when modelID empty", "", "llama-cpp", 0, "llama-cpp#0"),
Entry("backend fallback with replica 2", "", "llama-cpp", 2, "llama-cpp#2"),
)
It("makes replicas distinguishable", func() {
r0 := buildProcessKey("model-a", "llama-cpp", 0)
r1 := buildProcessKey("model-a", "llama-cpp", 1)
Expect(r0).ToNot(Equal(r1), "replicas of the same model must produce distinct keys")
})
})
Describe("registrationBody", func() {
It("includes max_replicas_per_model and the auto-label", func() {
cmd := &WorkerCMD{
Addr: "worker.example.com:50051",
MaxReplicasPerModel: 4,
}
body := cmd.registrationBody()
Expect(body).To(HaveKey("max_replicas_per_model"))
Expect(body["max_replicas_per_model"]).To(Equal(4))
labels, ok := body["labels"].(map[string]string)
Expect(ok).To(BeTrue(), "labels must be present so selectors can target the slot count")
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "4"))
})
It("coerces zero/unset MaxReplicasPerModel to 1", func() {
cmd := &WorkerCMD{Addr: "worker.example.com:50051"}
body := cmd.registrationBody()
Expect(body["max_replicas_per_model"]).To(Equal(1),
"unset must default to single-replica behavior, not capacity 0")
labels := body["labels"].(map[string]string)
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "1"))
})
It("preserves user-provided labels alongside the auto-label", func() {
cmd := &WorkerCMD{
Addr: "worker.example.com:50051",
MaxReplicasPerModel: 2,
NodeLabels: "tier=fast,gpu=a100",
}
body := cmd.registrationBody()
labels := body["labels"].(map[string]string)
Expect(labels).To(HaveKeyWithValue("tier", "fast"))
Expect(labels).To(HaveKeyWithValue("gpu", "a100"))
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "2"))
})
})
})

View File

@@ -37,14 +37,6 @@ var CacheTypeOptions = []FieldOption{
{Value: "q4_1", Label: "Q4_1"},
{Value: "q5_0", Label: "Q5_0"},
{Value: "q5_1", Label: "Q5_1"},
// TurboQuant KV-cache types — accepted by the turboquant and
// buun-llama-cpp fork backends; stock llama-cpp will reject them at load.
{Value: "turbo2", Label: "Turbo2 (TurboQuant)"},
{Value: "turbo3", Label: "Turbo3 (TurboQuant)"},
{Value: "turbo4", Label: "Turbo4 (TurboQuant)"},
// Trellis-Coded Quantization variants — buun-llama-cpp only.
{Value: "turbo2_tcq", Label: "Turbo2 TCQ (buun-llama-cpp)"},
{Value: "turbo3_tcq", Label: "Turbo3 TCQ (buun-llama-cpp)"},
}
var DiffusersPipelineOptions = []FieldOption{

View File

@@ -34,7 +34,6 @@ func (i *LlamaCPPImporter) AdditionalBackends() []KnownBackendEntry {
return []KnownBackendEntry{
{Name: "ik-llama-cpp", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with ik-quants"},
{Name: "turboquant", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with TurboQuant optimizations"},
{Name: "buun-llama-cpp", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with DFlash speculative decoding and TurboQuant/TCQ KV-cache quantization"},
}
}
@@ -128,7 +127,7 @@ func (i *LlamaCPPImporter) Import(details Details) (gallery.ModelConfig, error)
backend := "llama-cpp"
if b, ok := preferencesMap["backend"].(string); ok {
switch b {
case "ik-llama-cpp", "turboquant", "buun-llama-cpp":
case "ik-llama-cpp", "turboquant":
backend = b
}
}

View File

@@ -181,23 +181,6 @@ var _ = Describe("LlamaCPPImporter", func() {
Expect(modelConfig.Files[0].Filename).To(Equal("my-model.gguf"))
})
It("swaps the emitted backend to buun-llama-cpp when preferred", func() {
preferences := json.RawMessage(`{"backend": "buun-llama-cpp"}`)
details := Details{
URI: "https://example.com/my-model.gguf",
Preferences: preferences,
}
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("backend: buun-llama-cpp"), fmt.Sprintf("Model config: %+v", modelConfig))
Expect(modelConfig.ConfigFile).NotTo(ContainSubstring("backend: llama-cpp\n"), fmt.Sprintf("Model config: %+v", modelConfig))
Expect(modelConfig.ConfigFile).To(ContainSubstring("model: my-model.gguf"), fmt.Sprintf("Model config: %+v", modelConfig))
Expect(len(modelConfig.Files)).To(Equal(1))
Expect(modelConfig.Files[0].Filename).To(Equal("my-model.gguf"))
})
It("keeps backend: llama-cpp for unknown backend preferences", func() {
// Unknown backend values must not leak into the emitted YAML —
// we only honour the two curated drop-in replacements.
@@ -392,7 +375,7 @@ var _ = Describe("LlamaCPPImporter", func() {
})
Context("AdditionalBackends", func() {
It("advertises ik-llama-cpp, turboquant, and buun-llama-cpp as drop-in replacements", func() {
It("advertises ik-llama-cpp and turboquant as drop-in replacements", func() {
entries := importer.AdditionalBackends()
names := make([]string, 0, len(entries))
@@ -401,7 +384,7 @@ var _ = Describe("LlamaCPPImporter", func() {
names = append(names, e.Name)
byName[e.Name] = e
}
Expect(names).To(ConsistOf("ik-llama-cpp", "turboquant", "buun-llama-cpp"))
Expect(names).To(ConsistOf("ik-llama-cpp", "turboquant"))
ik := byName["ik-llama-cpp"]
Expect(ik.Modality).To(Equal("text"))
@@ -410,10 +393,6 @@ var _ = Describe("LlamaCPPImporter", func() {
tq := byName["turboquant"]
Expect(tq.Modality).To(Equal("text"))
Expect(tq.Description).NotTo(BeEmpty())
bn := byName["buun-llama-cpp"]
Expect(bn.Modality).To(Equal("text"))
Expect(bn.Description).NotTo(BeEmpty())
})
})
})

View File

@@ -98,7 +98,7 @@ func (mgs *BackendEndpointService) GetAllStatusEndpoint() echo.HandlerFunc {
// @Param request body GalleryBackend true "query params"
// @Success 200 {object} schema.BackendResponse "Response"
// @Router /backends/apply [post]
func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
func (mgs *BackendEndpointService) ApplyBackendEndpoint(systemState *system.SystemState) echo.HandlerFunc {
return func(c echo.Context) error {
input := new(GalleryBackend)
// Get input data from the request body
@@ -106,6 +106,18 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
return err
}
// In distributed mode, refuse to fan out a hardware-specific build to
// every node — a CPU build landing on a GPU cluster is almost always
// wrong, and the silent footgun is exactly what this guard exists for.
// Auto-resolving (meta) backends are fine because each node picks its
// own variant. Tooling can recover by hitting
// POST /api/nodes/{id}/backends/install per target node.
if mgs.backendApplier.BackendManager().IsDistributed() && input.ID != "" {
if guard := concreteFanOutGuard(c, mgs.galleries, systemState, input.ID); guard != nil {
return guard
}
}
uuid, err := uuid.NewUUID()
if err != nil {
return err
@@ -120,6 +132,66 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
}
}
// concreteFanOutGuard returns a 409 response if the requested backend is a
// hardware-specific build (not auto-resolving / meta) and we are in
// distributed mode. It looks up the backend in the configured galleries; if
// the lookup itself fails (gallery unreachable, name not found), the guard
// stays out of the way and lets the install enqueue normally — a missing
// name will surface from the worker as a clearer error than the guard could
// produce here. The response body deliberately speaks human, with `code` and
// `meta_alternative` as the programmatic contract for tooling.
func concreteFanOutGuard(c echo.Context, galleries []config.Gallery, systemState *system.SystemState, backendID string) error {
// Use the unfiltered listing because in distributed mode the frontend's
// hardware is irrelevant — the install targets workers, not us — and the
// filtered list would hide variants that don't match the frontend host
// (e.g. a CUDA build on a CPU-only frontend), preventing the guard from
// firing for exactly the cases it's meant to protect against.
available, err := gallery.AvailableBackendsUnfiltered(galleries, systemState)
if err != nil {
return nil
}
requested := available.FindByName(backendID)
if requested == nil || requested.IsMeta() {
return nil
}
// Try to find an auto-resolving (meta) backend that has this concrete
// variant in its CapabilitiesMap, so we can suggest it as a one-shot
// alternative. Optional — empty string is fine if no parent exists.
metaAlternative := ""
for _, b := range available {
if !b.IsMeta() {
continue
}
for _, concrete := range b.CapabilitiesMap {
if concrete == backendID {
metaAlternative = b.Name
break
}
}
if metaAlternative != "" {
break
}
}
msg := fmt.Sprintf(
"Backend %q is a hardware-specific build and won't run correctly on every node in this cluster. In distributed mode, install it on specific nodes:\n\n POST /api/nodes/{node_id}/backends/install\n {\"backend\": %q}",
backendID, backendID,
)
if metaAlternative != "" {
msg += fmt.Sprintf(
"\n\nTo install across all nodes, use the auto-resolving backend %q — each node picks its own variant based on its hardware.",
metaAlternative,
)
}
return c.JSON(409, map[string]any{
"error": msg,
"code": "concrete_backend_requires_target",
"meta_alternative": metaAlternative,
})
}
// DeleteBackendEndpoint lets delete backends from a LocalAI instance
// @Summary delete backends from LocalAI.
// @Tags backends

View File

@@ -73,6 +73,10 @@ type RegisterNodeRequest struct {
AvailableRAM uint64 `json:"available_ram,omitempty"`
GPUVendor string `json:"gpu_vendor,omitempty"`
Labels map[string]string `json:"labels,omitempty"`
// MaxReplicasPerModel is the per-node cap on replicas of any single model.
// Workers older than this field omit it; we coerce 0 → 1 below to preserve
// historical single-replica behavior.
MaxReplicasPerModel int `json:"max_replicas_per_model,omitempty"`
}
// RegisterNodeEndpoint registers a new backend node.
@@ -131,17 +135,26 @@ func RegisterNodeEndpoint(registry *nodes.NodeRegistry, expectedToken string, au
tokenHash = hex.EncodeToString(h[:])
}
// Coerce 0 → 1 for backward compat with workers that don't send the field.
// GORM's `default:1` only fires for a missing column; once Go zero-values
// reach the struct field they're written as 0 unless explicitly set here.
maxReplicasPerModel := req.MaxReplicasPerModel
if maxReplicasPerModel < 1 {
maxReplicasPerModel = 1
}
node := &nodes.BackendNode{
Name: req.Name,
NodeType: nodeType,
Address: req.Address,
HTTPAddress: req.HTTPAddress,
TokenHash: tokenHash,
TotalVRAM: req.TotalVRAM,
AvailableVRAM: req.AvailableVRAM,
TotalRAM: req.TotalRAM,
AvailableRAM: req.AvailableRAM,
GPUVendor: req.GPUVendor,
Name: req.Name,
NodeType: nodeType,
Address: req.Address,
HTTPAddress: req.HTTPAddress,
TokenHash: tokenHash,
TotalVRAM: req.TotalVRAM,
AvailableVRAM: req.AvailableVRAM,
TotalRAM: req.TotalRAM,
AvailableRAM: req.AvailableRAM,
GPUVendor: req.GPUVendor,
MaxReplicasPerModel: maxReplicasPerModel,
}
ctx := c.Request().Context()
@@ -363,6 +376,9 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}
// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
// Backend can be either a gallery ID (resolved against BackendGalleries) or a
// direct URI install (URI + Name + optional Alias) — same shape as the
// standalone /api/backends/install-external path, just scoped to one node.
func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
return func(c echo.Context) error {
if unloader == nil {
@@ -372,17 +388,27 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
var req struct {
Backend string `json:"backend"`
BackendGalleries string `json:"backend_galleries,omitempty"`
URI string `json:"uri,omitempty"`
Name string `json:"name,omitempty"`
Alias string `json:"alias,omitempty"`
}
if err := c.Bind(&req); err != nil || req.Backend == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
}
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
// Either a gallery backend name or a direct URI must be supplied.
if req.Backend == "" && req.URI == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
}
// Admin-driven backend install: not tied to a specific replica slot
// (no model is being loaded). Pass replica 0 to match the worker's
// admin process-key convention (`backend#0`).
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
if err != nil {
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
}
if !reply.Success {
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "error", reply.Error)
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
}
return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
@@ -457,8 +483,8 @@ func UnloadModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
xlog.Error("Failed to stop backend after model unload", "node", nodeID, "model", req.ModelName, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "model unloaded but backend stop failed"))
}
// Remove from registry
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
// Remove every replica of this model on the node from the registry.
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
return c.JSON(http.StatusOK, map[string]string{"message": "model unloaded"})
}
}
@@ -484,7 +510,7 @@ func DeleteModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
// Non-fatal — backend process may not be running
xlog.Warn("StopBackend failed during model deletion (non-fatal)", "node", nodeID, "model", req.ModelName, "error", err)
}
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
return c.JSON(http.StatusOK, map[string]string{"message": "model deleted from node"})
}
}
@@ -659,6 +685,78 @@ func GetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}
}
// UpdateMaxReplicasPerModelRequest is the body for the per-node replica cap endpoint.
type UpdateMaxReplicasPerModelRequest struct {
// Value is the new per-model replica cap on this node. Must be >= 1.
Value int `json:"value"`
}
// UpdateMaxReplicasPerModelEndpoint sets the per-node cap on how many replicas
// of any one model can be loaded concurrently. The corresponding
// `node.replica-slots` auto-label is refreshed so existing AND-selectors keep
// matching, and any unsatisfiable scheduling cooldowns are cleared so the
// reconciler retries on the next tick.
//
// This is a transient admin override — a worker re-registration restores the
// value the worker was started with (--max-replicas-per-model). For permanent
// fleet changes, change the worker flag.
//
// @Summary Update a node's max replicas per model
// @Tags Nodes
// @Param id path string true "Node ID"
// @Param request body UpdateMaxReplicasPerModelRequest true "New value"
// @Success 200 {object} map[string]int
// @Failure 400 {object} map[string]any "value must be >= 1"
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [put]
func UpdateMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {
ctx := c.Request().Context()
nodeID := c.Param("id")
if _, err := registry.Get(ctx, nodeID); err != nil {
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
}
var req UpdateMaxReplicasPerModelRequest
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
}
if req.Value < 1 {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "value must be >= 1"))
}
if err := registry.UpdateMaxReplicasPerModel(ctx, nodeID, req.Value); err != nil {
xlog.Error("Failed to update max_replicas_per_model", "node", nodeID, "value", req.Value, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to update max replicas per model"))
}
return c.JSON(http.StatusOK, map[string]int{"max_replicas_per_model": req.Value})
}
}
// ResetMaxReplicasPerModelEndpoint clears the admin override on a node, so
// the next worker re-registration is allowed to update the value from its
// CLI flag again. The current value is left in place until the worker calls
// register.
//
// @Summary Reset a node's max replicas per model to the worker default
// @Tags Nodes
// @Param id path string true "Node ID"
// @Success 200 {object} map[string]bool
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [delete]
func ResetMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {
ctx := c.Request().Context()
nodeID := c.Param("id")
if _, err := registry.Get(ctx, nodeID); err != nil {
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
}
if err := registry.ResetMaxReplicasPerModel(ctx, nodeID); err != nil {
xlog.Error("Failed to reset max_replicas_per_model override", "node", nodeID, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to reset override"))
}
return c.JSON(http.StatusOK, map[string]bool{"reset": true})
}
}
// SetNodeLabelsEndpoint replaces all labels for a node.
func SetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {

View File

@@ -1,29 +1,32 @@
import { test, expect } from '@playwright/test'
test.describe('Manage Page - Backend Logs Link', () => {
test('models table shows terminal icon for logs', async ({ page }) => {
test('row action menu exposes Backend logs entry with terminal icon', async ({ page }) => {
await page.goto('/app/manage')
// Wait for models to load
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
// Check for terminal icon (backend logs link)
const terminalIcon = page.locator('a[title="Backend logs"] i.fa-terminal')
await expect(terminalIcon.first()).toBeVisible()
// Row actions live behind the kebab (ActionMenu) — open the first row's menu.
const trigger = page.locator('button.action-menu__trigger').first()
await expect(trigger).toBeVisible()
await trigger.click()
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
await expect(logsItem).toBeVisible()
await expect(logsItem.locator('i.fa-terminal')).toBeVisible()
})
test('terminal icon links to backend-logs page', async ({ page }) => {
test('Backend logs menu item navigates to backend-logs page', async ({ page }) => {
await page.goto('/app/manage')
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
const logsLink = page.locator('a[title="Backend logs"]').first()
await expect(logsLink).toBeVisible()
const trigger = page.locator('button.action-menu__trigger').first()
await expect(trigger).toBeVisible()
await trigger.click()
// Link uses href="#" with onClick for navigation
const href = await logsLink.getAttribute('href')
expect(href).toBe('#')
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
await expect(logsItem).toBeVisible()
await logsItem.click()
// Click and verify navigation
await logsLink.click()
await expect(page).toHaveURL(/\/app\/backend-logs\//)
})
})

View File

@@ -0,0 +1,166 @@
import { test, expect } from '@playwright/test'
// These specs cover the per-node backend row in the Nodes page:
// - the upgrade affordance is self-explanatory (icon + tooltip)
// - a delete affordance is present and goes through ConfirmDialog
//
// We mock the distributed-mode API so the tests can run against the
// standalone ui-test-server without spinning up workers/NATS.
const NODE_ID = 'test-node-1'
const NODE_NAME = 'worker-test'
const BACKEND_NAME = 'cuda12-vllm-development'
async function mockDistributedNodes(page, { onDelete } = {}) {
await page.route('**/api/nodes', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify([
{
id: NODE_ID,
name: NODE_NAME,
node_type: 'backend',
address: '10.0.0.1:50051',
http_address: '10.0.0.1:8090',
status: 'healthy',
total_vram: 0,
available_vram: 0,
total_ram: 8_000_000_000,
available_ram: 4_000_000_000,
gpu_vendor: '',
last_heartbeat: new Date().toISOString(),
created_at: new Date().toISOString(),
updated_at: new Date().toISOString(),
},
]),
})
})
await page.route('**/api/nodes/scheduling', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: '[]',
})
})
await page.route(`**/api/nodes/${NODE_ID}/models`, (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: '[]',
})
})
await page.route(`**/api/nodes/${NODE_ID}/backends`, (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify([
{
name: BACKEND_NAME,
is_system: false,
is_meta: false,
installed_at: new Date().toISOString(),
},
]),
})
})
await page.route(`**/api/nodes/${NODE_ID}/backends/delete`, async (route) => {
if (onDelete) {
await onDelete(route)
}
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ message: 'backend deleted' }),
})
})
}
async function expandNodeAndWaitForBackends(page) {
await page.goto('/app/nodes')
// Click the row to expand it. The chevron toggle and the row both work,
// but clicking the name cell is the most user-like.
await page.getByText(NODE_NAME).first().click()
// Backends, Capacity and Labels live behind a "Manage" <details>
// disclosure (the drawer was distilled to keep at-a-glance content
// lean — see distill refactor in the multi-replica branch). Open it
// by clicking the summary inside the .node-manage scope so the
// per-node backend table is in the DOM before assertions run.
await page.locator('.node-manage > summary').first().click()
await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
}
test.describe('Nodes page — per-node backend actions', () => {
test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
await mockDistributedNodes(page)
await expandNodeAndWaitForBackends(page)
// Negative: the old, ambiguous wording must not be used.
await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
await expect(page.locator('button[title="Reinstall backend"] i.fa-sync-alt')).toHaveCount(0)
// Positive: a self-explanatory upgrade affordance is rendered next to the
// backend row. We accept either an arrow-up or arrows-rotate glyph; both
// map to "upgrade" semantics in FontAwesome 6 unambiguously.
const upgradeBtn = page.locator('button[title="Upgrade backend on this node"]')
await expect(upgradeBtn).toBeVisible()
const iconClass = await upgradeBtn.locator('i').getAttribute('class')
expect(iconClass).toMatch(/fa-(arrow-up|arrows-rotate|up-long)/)
})
test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
await mockDistributedNodes(page)
await expandNodeAndWaitForBackends(page)
const deleteBtn = page.locator('button[title="Delete backend from this node"]')
await expect(deleteBtn).toBeVisible()
await expect(deleteBtn.locator('i.fa-trash')).toBeVisible()
})
test('clicking delete opens the confirm dialog and POSTs to the per-node delete endpoint', async ({ page }) => {
let postedBody = null
await mockDistributedNodes(page, {
onDelete: async (route) => {
postedBody = route.request().postDataJSON()
},
})
await expandNodeAndWaitForBackends(page)
await page.locator('button[title="Delete backend from this node"]').click()
// ConfirmDialog uses role="alertdialog" and a danger confirm button.
const dialog = page.getByRole('alertdialog')
await expect(dialog).toBeVisible()
const confirmBtn = dialog.locator('button.btn-danger')
await expect(confirmBtn).toBeVisible()
await confirmBtn.click()
// Wait until the POST landed.
await expect.poll(() => postedBody, { timeout: 5_000 }).toEqual({ backend: BACKEND_NAME })
})
test('clicking delete and cancelling does not POST', async ({ page }) => {
let deleteCalls = 0
await mockDistributedNodes(page, {
onDelete: () => {
deleteCalls += 1
},
})
await expandNodeAndWaitForBackends(page)
await page.locator('button[title="Delete backend from this node"]').click()
const dialog = page.getByRole('alertdialog')
await expect(dialog).toBeVisible()
await dialog.getByRole('button', { name: /cancel/i }).click()
await expect(dialog).toBeHidden()
// Give any errant request a moment to fire so a regression would be caught.
await page.waitForTimeout(500)
expect(deleteCalls).toBe(0)
})
})

View File

@@ -7,7 +7,7 @@
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&display=swap" rel="stylesheet" />
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@300..700&family=Geist+Mono:wght@300..700&display=swap" rel="stylesheet" />
</head>
<body>
<div id="root"></div>

View File

@@ -3258,9 +3258,9 @@
}
},
"node_modules/postcss": {
"version": "8.5.8",
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz",
"integrity": "sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg==",
"version": "8.5.10",
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.10.tgz",
"integrity": "sha512-pMMHxBOZKFU6HgAZ4eyGnwXF/EvPGGqUr0MnZ5+99485wwW41kW91A4LOGxSHhgugZmSChL5AlElNdwlNgcnLQ==",
"dev": true,
"funding": [
{
@@ -3276,6 +3276,7 @@
"url": "https://github.com/sponsors/ai"
}
],
"license": "MIT",
"dependencies": {
"nanoid": "^3.3.11",
"picocolors": "^1.1.1",

View File

File diff suppressed because it is too large Load Diff

View File

@@ -1,9 +1,11 @@
import { useState, useEffect } from 'react'
import { Outlet, useLocation } from 'react-router-dom'
import { useState, useEffect, useRef } from 'react'
import { Outlet, useLocation, useNavigate } from 'react-router-dom'
import Sidebar from './components/Sidebar'
import OperationsBar from './components/OperationsBar'
import { ToastContainer, useToast } from './components/Toast'
import { systemApi } from './utils/api'
import { useTheme } from './contexts/ThemeContext'
import { useAuth } from './context/AuthContext'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
@@ -15,6 +17,10 @@ export default function App() {
const { toasts, addToast, removeToast } = useToast()
const [version, setVersion] = useState('')
const location = useLocation()
const navigate = useNavigate()
const { theme, toggleTheme } = useTheme()
const { authEnabled, user } = useAuth()
const hamburgerRef = useRef(null)
const isChatRoute = location.pathname.match(/\/chat(\/|$)/) || location.pathname.match(/\/agents\/[^/]+\/chat/)
useEffect(() => {
@@ -34,26 +40,80 @@ export default function App() {
window.scrollTo(0, 0)
}, [location.pathname])
// Drawer polish: lock body scroll, close on Escape, return focus to the
// hamburger when the drawer closes. Only engages when the drawer is open;
// desktop and tablet rail mode are unaffected.
useEffect(() => {
if (!sidebarOpen) return
const prevOverflow = document.body.style.overflow
document.body.style.overflow = 'hidden'
const onKey = (e) => { if (e.key === 'Escape') setSidebarOpen(false) }
window.addEventListener('keydown', onKey)
return () => {
document.body.style.overflow = prevOverflow
window.removeEventListener('keydown', onKey)
// Restore focus to the trigger so keyboard users land back where
// they invoked the drawer from.
hamburgerRef.current?.focus()
}
}, [sidebarOpen])
const layoutClasses = [
'app-layout',
isChatRoute ? 'app-layout-chat' : '',
sidebarCollapsed ? 'sidebar-is-collapsed' : '',
].filter(Boolean).join(' ')
const showAvatar = authEnabled && user
const accountLabel = user?.name || user?.email || 'Account'
return (
<div className={layoutClasses}>
<Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
<main className="main-content">
<main className="main-content" {...(sidebarOpen ? { 'aria-hidden': 'true', inert: '' } : {})}>
<OperationsBar />
{/* Mobile header */}
{/* Mobile header — primary actions reachable without opening the
drawer. Hamburger is the only way to expand the nav on phones;
theme toggle and account avatar are mirrored from the sidebar
footer so they remain one tap away. */}
<header className="mobile-header">
<button
ref={hamburgerRef}
className="hamburger-btn"
onClick={() => setSidebarOpen(true)}
aria-label="Open menu"
aria-expanded={sidebarOpen}
aria-controls="app-sidebar"
>
<i className="fas fa-bars" />
<i className="fas fa-bars" aria-hidden="true" />
</button>
<span className="mobile-title">LocalAI</span>
<div className="mobile-header-actions">
<button
type="button"
className="mobile-header-btn"
onClick={toggleTheme}
aria-label={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
title={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
>
<i className={`fas ${theme === 'dark' ? 'fa-sun' : 'fa-moon'}`} aria-hidden="true" />
</button>
{showAvatar && (
<button
type="button"
className="mobile-header-btn mobile-header-avatar"
onClick={() => navigate('/app/account')}
aria-label={`Account: ${accountLabel}`}
title={accountLabel}
>
{user.avatarUrl ? (
<img src={user.avatarUrl} alt="" />
) : (
<i className="fas fa-user-circle" aria-hidden="true" />
)}
</button>
)}
</div>
</header>
<div className="main-content-inner">
<div className="page-transition" key={location.pathname}>

View File

@@ -0,0 +1,141 @@
import { useRef, useState, useEffect, useCallback } from 'react'
import Popover from './Popover'
// ActionMenu renders a kebab (three-dot) button that opens a popover with a
// list of row actions. Replaces the inline cluster of icon buttons that made
// dense tables feel like a control panel — actions stay out of the way until
// the user reaches for them, the way Linear/Vercel/Notion handle row menus.
//
// Items shape:
// { key, icon?, label, onClick, danger?, disabled?, hidden?, shortcut? }
// { divider: true } // visual separator
// { type: 'badge', icon?, label } // non-interactive badge row
//
// Hidden items are filtered out so callers can write conditional menus
// inline (`{ key: 'stop', visible: isRunning, ... }` style) without ternaries.
//
// Keyboard:
// ArrowUp / ArrowDown — move highlight (skipping dividers + badges)
// Enter / Space — activate
// Escape — close, return focus to trigger
export default function ActionMenu({ items, ariaLabel = 'Actions', triggerLabel, compact = false }) {
const triggerRef = useRef(null)
const [open, setOpen] = useState(false)
const [activeIdx, setActiveIdx] = useState(-1)
const interactive = (Array.isArray(items) ? items : []).filter(it => it && !it.divider && it.type !== 'badge' && !it.hidden)
const visible = (Array.isArray(items) ? items : []).filter(it => it && !it.hidden)
const close = useCallback(() => {
setOpen(false)
setActiveIdx(-1)
}, [])
// Move highlight to the first interactive item when opening, so keyboard
// users land somewhere meaningful instead of having to arrow into the menu.
useEffect(() => {
if (open && activeIdx === -1 && interactive.length > 0) {
setActiveIdx(0)
}
}, [open, activeIdx, interactive.length])
const handleTriggerKeyDown = (e) => {
if (e.key === 'ArrowDown' || e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
e.stopPropagation()
setOpen(true)
}
}
const handleMenuKeyDown = (e) => {
if (e.key === 'ArrowDown') {
e.preventDefault()
setActiveIdx(i => Math.min(interactive.length - 1, (i < 0 ? -1 : i) + 1))
} else if (e.key === 'ArrowUp') {
e.preventDefault()
setActiveIdx(i => Math.max(0, (i < 0 ? interactive.length : i) - 1))
} else if (e.key === 'Home') {
e.preventDefault()
setActiveIdx(0)
} else if (e.key === 'End') {
e.preventDefault()
setActiveIdx(interactive.length - 1)
} else if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
const item = interactive[activeIdx]
if (item && !item.disabled) {
close()
item.onClick?.()
}
}
}
if (interactive.length === 0 && !visible.some(it => it.type === 'badge')) {
return null
}
return (
<>
<button
ref={triggerRef}
type="button"
className={`action-menu__trigger${compact ? ' action-menu__trigger--compact' : ''}${open ? ' is-open' : ''}`}
aria-haspopup="menu"
aria-expanded={open}
aria-label={triggerLabel || ariaLabel}
onClick={(e) => { e.stopPropagation(); setOpen(v => !v) }}
onKeyDown={handleTriggerKeyDown}
>
<i className="fas fa-ellipsis-vertical" />
</button>
<Popover anchor={triggerRef} open={open} onClose={close} ariaLabel={ariaLabel}>
<div
role="menu"
aria-label={ariaLabel}
className="action-menu"
onKeyDown={handleMenuKeyDown}
// Capture focus when the menu opens so arrow keys work without the
// user clicking inside first.
tabIndex={-1}
ref={el => { if (el && open) el.focus() }}
>
{visible.map((item, i) => {
if (item.divider) {
return <div key={`d-${i}`} className="action-menu__divider" role="separator" />
}
if (item.type === 'badge') {
return (
<div key={item.key || `b-${i}`} className="action-menu__badge" role="presentation">
{item.icon && <i className={`fas ${item.icon}`} aria-hidden="true" />}
<span>{item.label}</span>
</div>
)
}
const idx = interactive.indexOf(item)
const active = idx === activeIdx
return (
<button
key={item.key}
type="button"
role="menuitem"
disabled={item.disabled}
className={`action-menu__item${item.danger ? ' is-danger' : ''}${active ? ' is-active' : ''}`}
onMouseEnter={() => setActiveIdx(idx)}
onClick={(e) => {
e.stopPropagation()
if (item.disabled) return
close()
item.onClick?.()
}}
>
{item.icon && <i className={`fas ${item.icon} action-menu__icon`} aria-hidden="true" />}
<span className="action-menu__label">{item.label}</span>
{item.shortcut && <span className="action-menu__shortcut">{item.shortcut}</span>}
</button>
)
})}
</div>
</Popover>
</>
)
}

View File

@@ -80,7 +80,7 @@ export default function ClientMCPDropdown({
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="text"
@@ -88,7 +88,7 @@ export default function ClientMCPDropdown({
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="password"
@@ -96,13 +96,13 @@ export default function ClientMCPDropdown({
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button type="button" className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
</div>

View File

@@ -135,7 +135,7 @@ function JsonEditor({ value, onChange }) {
className="input"
value={text}
onChange={e => handleChange(e.target.value)}
style={{ width: '100%', minHeight: 80, fontFamily: 'monospace', fontSize: '0.8125rem', resize: 'vertical' }}
style={{ width: '100%', minHeight: 80, fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', resize: 'vertical' }}
/>
{parseError && <div style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 2 }}>{parseError}</div>}
</div>

View File

@@ -158,7 +158,7 @@ export default function FieldBrowser({ fields, activeFieldPaths, onAddField }) {
{field.description}
</div>
)}
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'monospace' }}>
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'var(--font-mono)' }}>
{field.path}
</div>
</div>

View File

@@ -0,0 +1,79 @@
import { useState, useEffect } from 'react'
const LOADING_PHRASES = [
{ text: 'Loading models...', icon: 'fa-brain' },
{ text: 'Fetching gallery...', icon: 'fa-download' },
{ text: 'Checking availability...', icon: 'fa-circle-check' },
{ text: 'Almost ready...', icon: 'fa-hourglass-half' },
{ text: 'Preparing gallery...', icon: 'fa-store' },
]
// GalleryLoader is the animated skeleton used while the gallery list loads.
// Used by Models, Backends, and (now) the Manage page so an empty fetch state
// reads the same everywhere instead of one tab showing pulsing dots and the
// other showing "Loading...".
export default function GalleryLoader() {
const [idx, setIdx] = useState(() => Math.floor(Math.random() * LOADING_PHRASES.length))
const [fade, setFade] = useState(true)
useEffect(() => {
const interval = setInterval(() => {
setFade(false)
setTimeout(() => {
setIdx(prev => (prev + 1) % LOADING_PHRASES.length)
setFade(true)
}, 300)
}, 2800)
return () => clearInterval(interval)
}, [])
const phrase = LOADING_PHRASES[idx]
return (
<div style={{
display: 'flex', flexDirection: 'column', alignItems: 'center',
justifyContent: 'center', padding: 'var(--spacing-xl) var(--spacing-md)',
minHeight: '280px', gap: 'var(--spacing-lg)',
}}>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)' }}>
{[0, 1, 2, 3, 4].map(i => (
<div key={i} style={{
width: 10, height: 10, borderRadius: '50%',
background: 'var(--color-primary)',
animation: `galleryDot 1.4s ease-in-out ${i * 0.15}s infinite`,
}} />
))}
</div>
<div style={{
display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)',
opacity: fade ? 1 : 0,
transition: 'opacity 300ms ease',
color: 'var(--color-text-secondary)',
fontSize: '0.9375rem',
fontWeight: 500,
}}>
<i className={`fas ${phrase.icon}`} style={{ color: 'var(--color-accent)', fontSize: '1.125rem' }} />
{phrase.text}
</div>
<div style={{ width: '100%', maxWidth: '700px', display: 'flex', flexDirection: 'column', gap: '12px' }}>
{[0.9, 0.7, 0.5].map((opacity, i) => (
<div key={i} style={{
height: '48px', borderRadius: 'var(--radius-md)',
background: 'var(--color-bg-tertiary)', opacity,
animation: `galleryShimmer 1.8s ease-in-out ${i * 0.2}s infinite`,
}} />
))}
</div>
<style>{`
@keyframes galleryDot {
0%, 80%, 100% { transform: scale(0.4); opacity: 0.3; }
40% { transform: scale(1); opacity: 1; }
}
@keyframes galleryShimmer {
0%, 100% { opacity: var(--shimmer-base, 0.15); }
50% { opacity: var(--shimmer-peak, 0.3); }
}
`}</style>
</div>
)
}

View File

@@ -0,0 +1,47 @@
import StatCard from './StatCard'
// ManageSummary anchors the Manage page with the same StatCard pattern the
// Nodes dashboard uses, so the page reads as a real overview rather than
// "two tabs in a hat". Counts are derived in-memory by the parent — this
// component is purely presentational. Cards are clickable and route the
// user to the relevant tab + filter.
export default function ManageSummary({
modelsCount,
backendsCount,
runningCount,
updatesCount,
onCardClick,
}) {
const click = (tab, filter) => onCardClick && onCardClick(tab, filter)
return (
<div className="stat-grid manage-summary">
<StatCard
icon="fas fa-brain"
label="Models Installed"
value={modelsCount}
onClick={() => click('models', 'all')}
/>
<StatCard
icon="fas fa-server"
label="Backends Installed"
value={backendsCount}
onClick={() => click('backends', 'all')}
/>
<StatCard
icon="fas fa-circle-play"
label="Currently Running"
value={runningCount}
accentVar={runningCount > 0 ? '--color-success' : undefined}
onClick={() => click('models', 'running')}
/>
<StatCard
icon="fas fa-arrow-up"
label="Updates Available"
value={updatesCount}
accentVar={updatesCount > 0 ? '--color-warning' : undefined}
onClick={() => click('backends', updatesCount > 0 ? 'upgradable' : 'all')}
/>
</div>
)
}

View File

@@ -0,0 +1,30 @@
// MetaBadgeRow renders the System / User / Meta / Dev badge cluster the same
// way everywhere — Manage tabs and (in future) Install gallery. The badges
// already exist as classes; this component locks down the icons + labels so
// the same backend type doesn't read "User" in one tab and "downloaded" in
// another.
export default function MetaBadgeRow({ isSystem, isMeta, isDevelopment }) {
return (
<div className="badge-row">
{isSystem ? (
<span className="badge badge-info" title="Bundled with the LocalAI runtime">
<i className="fas fa-shield-alt" /> System
</span>
) : (
<span className="badge badge-success" title="Installed from the gallery or external source">
<i className="fas fa-download" /> User
</span>
)}
{isMeta && (
<span className="badge badge-accent" title="Meta backend — selects a concrete variant per node">
<i className="fas fa-layer-group" /> Meta
</span>
)}
{isDevelopment && (
<span className="badge badge-warning" title="Marked as development / pre-release by the gallery">
<i className="fas fa-flask" /> Dev
</span>
)}
</div>
)
}

View File

@@ -0,0 +1,668 @@
import { useState, useMemo, useEffect, useRef } from 'react'
import Modal from './Modal'
import SearchableSelect from './SearchableSelect'
import { nodesApi } from '../utils/api'
// NodeInstallPicker is the single multi-node install surface used both from
// the Backends gallery split-button and from the "Install on more nodes" `+`
// affordance in the Nodes column. Submit fires N parallel per-node install
// calls; rows transition inline so the user sees per-node success/failure
// without leaving the modal.
//
// Props:
// open — controls visibility
// onClose — close handler (header X / Cancel / Esc / backdrop)
// onComplete — fired after at least one node install succeeded;
// gallery uses this to refetch and update the Nodes
// column without a manual reload
// backend — { name, isMeta, capabilities, metaBackendFor }
// nodes — BackendNode[] from /api/nodes
// installedNodeIds — Set/array of node IDs that already have this backend
// initialSelection — optional pre-selected node IDs (e.g. "missing nodes"
// when opened from the Nodes column `+` affordance)
const STATUS_LABELS = { healthy: 'Healthy', draining: 'Draining', unhealthy: 'Unhealthy', offline: 'Offline' }
function formatVRAM(bytes) {
if (!bytes || bytes === 0) return null
const gb = bytes / (1024 * 1024 * 1024)
return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
}
function gpuVendorLabel(vendor) {
const labels = { nvidia: 'NVIDIA', amd: 'AMD', intel: 'Intel', vulkan: 'Vulkan' }
return labels[vendor] || null
}
// hardwareTargetOf parses the capability key that points to a concrete
// variant in the parent meta's CapabilitiesMap. e.g. cpu-llama-cpp comes
// from {"cpu": "cpu-llama-cpp"} → "cpu". Falls back to "" when the parent
// is unknown (the gallery list payload still gives us metaBackendFor).
function hardwareTargetOf(backend, allBackends) {
if (!backend || !backend.name || backend.isMeta) return ''
const parentName = backend.metaBackendFor
if (!parentName) return ''
const parent = (allBackends || []).find(b => b.name === parentName || b.id === parentName)
if (!parent || !parent.capabilities) return ''
for (const [cap, concreteName] of Object.entries(parent.capabilities)) {
if (concreteName === backend.name) return cap
}
return ''
}
// humanTargetLabel turns a capability key into a user-facing phrase used in
// the picker header note: "CPU build", "CUDA 12 build", etc. Keep it
// concrete and product-recognisable, not the raw token from the gallery.
function humanTargetLabel(target) {
if (!target) return 'hardware-specific build'
const t = target.toLowerCase()
if (t.startsWith('cpu') || t === 'default') return 'CPU build'
if (t.includes('cuda-13') || t.includes('cuda13')) return 'CUDA 13 build'
if (t.includes('cuda-12') || t.includes('cuda12')) return 'CUDA 12 build'
if (t.includes('cuda')) return 'NVIDIA CUDA build'
if (t.includes('l4t')) return 'NVIDIA Jetson (L4T) build'
if (t.includes('nvidia')) return 'NVIDIA build'
if (t.includes('rocm') || t.includes('amd')) return 'AMD ROCm build'
if (t.includes('metal')) return 'Apple Metal build'
if (t.includes('sycl') || t.includes('intel')) return 'Intel SYCL build'
if (t.includes('vulkan')) return 'Vulkan build'
if (t.includes('darwin-x86')) return 'macOS x86 build'
return 'hardware-specific build'
}
// suitabilityFor returns the picker's per-row suitability state for the
// requested backend. Already-installed wins over compatible/override so
// the user sees a single signal per row.
function suitabilityFor({ node, backend, hardwareTarget, alreadyInstalled }) {
if (alreadyInstalled) return 'installed'
// backend can be null on the first render before pickerBackend is set —
// this function is invoked from useMemo, which runs regardless of the
// outer open guard. Treat missing data as "compatible" so the placeholder
// render doesn't blow up; the picker won't actually paint anything until
// the early-return below the hooks fires.
if (!backend || backend.isMeta || !hardwareTarget) return 'compatible'
const vendor = (node.gpu_vendor || '').toLowerCase()
const t = hardwareTarget.toLowerCase()
if (t.startsWith('cpu') || t === 'default') {
// CPU builds always run; they're never marked Override (running CPU on a
// GPU node is the headline use case the user is choosing intentionally).
return 'compatible'
}
if (t.includes('nvidia') || t.includes('cuda') || t.includes('l4t')) {
return vendor === 'nvidia' ? 'compatible' : 'override'
}
if (t.includes('amd') || t.includes('rocm') || t.includes('hip')) {
return vendor === 'amd' ? 'compatible' : 'override'
}
if (t.includes('intel') || t.includes('sycl')) {
return vendor === 'intel' ? 'compatible' : 'override'
}
if (t.includes('metal') || t.includes('darwin')) {
// No vendor reporting for Metal; trust the user.
return 'compatible'
}
return 'compatible'
}
export default function NodeInstallPicker({
open, onClose, onComplete,
backend,
nodes = [],
allBackends = [],
installedNodeIds = [],
initialSelection,
addToast,
}) {
const [search, setSearch] = useState('')
const [showHealthy, setShowHealthy] = useState(true)
const [showDraining, setShowDraining] = useState(false)
const [selected, setSelected] = useState(() => new Set())
const [overrideVariant, setOverrideVariant] = useState('') // chosen concrete name
const [overrideExpanded, setOverrideExpanded] = useState(false)
const [submitting, setSubmitting] = useState(false)
const [showMismatchConfirm, setShowMismatchConfirm] = useState(false)
// Per-node submission state: { [nodeId]: { status: 'pending'|'installing'|'done'|'error', error? , version? } }
const [perNode, setPerNode] = useState({})
const headerInputRef = useRef(null)
// Backend-derived metadata used throughout the picker.
const hardwareTarget = useMemo(() => hardwareTargetOf(backend, allBackends), [backend, allBackends])
const targetLabel = humanTargetLabel(hardwareTarget)
const concreteVariants = useMemo(() => {
if (!backend?.isMeta || !backend.capabilities) return []
return Object.entries(backend.capabilities).map(([cap, concrete]) => ({
value: concrete,
label: `${concrete} · ${cap}`,
}))
}, [backend])
// Pending nodes are surgically removed from the list — they can't accept
// installs until approved. Surface the count instead of dead-disabled rows.
const pendingCount = nodes.filter(n => n.status === 'pending').length
const backendNodes = nodes.filter(n =>
(!n.node_type || n.node_type === 'backend') && n.status !== 'pending'
)
const installedSet = useMemo(() => {
const s = new Set()
if (Array.isArray(installedNodeIds)) installedNodeIds.forEach(id => s.add(id))
else if (installedNodeIds && typeof installedNodeIds.has === 'function') {
installedNodeIds.forEach(id => s.add(id))
}
return s
}, [installedNodeIds])
const filteredNodes = useMemo(() => {
let list = backendNodes
if (!showHealthy) list = list.filter(n => n.status !== 'healthy')
if (!showDraining) list = list.filter(n => n.status !== 'draining')
if (search.trim()) {
const q = search.toLowerCase()
list = list.filter(n =>
(n.name || '').toLowerCase().includes(q) ||
Object.entries(n.labels || {}).some(([k, v]) => `${k}=${v}`.toLowerCase().includes(q))
)
}
return list
}, [backendNodes, showHealthy, showDraining, search])
// Pre-seed selection on open. Reset all transient state so reopening
// doesn't surface ghost progress from the prior submit.
useEffect(() => {
if (!open) return
const initial = new Set()
if (Array.isArray(initialSelection)) initialSelection.forEach(id => initial.add(id))
setSelected(initial)
setSearch('')
setOverrideVariant('')
setOverrideExpanded(false)
setPerNode({})
setSubmitting(false)
setShowMismatchConfirm(false)
}, [open, initialSelection])
// Auto-expand the variant override disclosure when at least one selected
// node lacks a working GPU. This is the headline use case the feature
// exists for; surfacing it instead of hiding behind a click.
useEffect(() => {
if (!backend?.isMeta) return
const someGPUMissing = Array.from(selected).some(id => {
const n = backendNodes.find(x => x.id === id)
return n && (!n.gpu_vendor || n.gpu_vendor === '' || n.gpu_vendor === 'unknown')
})
if (someGPUMissing && !overrideExpanded) setOverrideExpanded(true)
}, [selected, backend, backendNodes]) // eslint-disable-line react-hooks/exhaustive-deps
// The effective backend that gets installed on each node. For
// hardware-specific backends this is just backend.name. For meta backends
// with no override, the worker picks per-node — we pass backend.name and
// the worker resolves. With an override set, the picker installs that
// exact concrete variant on every selected node.
const effectiveBackendName = overrideVariant || backend?.name
const counts = useMemo(() => {
let already = 0, overrides = 0
selected.forEach(id => {
const n = backendNodes.find(x => x.id === id)
if (!n) return
if (installedSet.has(id)) { already++; return }
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
const s = suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false })
if (s === 'override') overrides++
})
return { already, overrides, selected: selected.size }
}, [selected, backendNodes, installedSet, overrideVariant, backend, hardwareTarget, allBackends])
const toggle = (nodeId) => {
setSelected(prev => {
const next = new Set(prev)
next.has(nodeId) ? next.delete(nodeId) : next.add(nodeId)
return next
})
}
const selectAllHealthy = () => {
setSelected(new Set(filteredNodes.filter(n => n.status === 'healthy').map(n => n.id)))
}
const selectCompatible = () => {
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
setSelected(new Set(
filteredNodes
.filter(n => suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false }) === 'compatible')
.map(n => n.id)
))
}
const clearSelection = () => setSelected(new Set())
const submit = async () => {
if (selected.size === 0 || submitting) return
if (counts.overrides > 0 && !showMismatchConfirm) {
setShowMismatchConfirm(true)
return
}
setShowMismatchConfirm(false)
setSubmitting(true)
const ids = Array.from(selected)
setPerNode(prev => {
const next = { ...prev }
ids.forEach(id => { next[id] = { status: 'installing' } })
return next
})
const results = await Promise.allSettled(ids.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
))
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) {
next[v.id] = { status: 'done' }
successCount++
} else {
next[v.id] = { status: 'error', error: v.error }
failCount++
}
}
return next
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
} else if (successCount === 0) {
addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
} else {
addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
}
}
const retryFailed = async () => {
const failedIds = Object.entries(perNode)
.filter(([, v]) => v.status === 'error')
.map(([id]) => id)
if (failedIds.length === 0) return
setSelected(new Set(failedIds))
// Replace state for failed rows so they show "installing" again, not stale errors.
setPerNode(prev => {
const next = { ...prev }
failedIds.forEach(id => { next[id] = { status: 'installing' } })
return next
})
setSubmitting(true)
const results = await Promise.allSettled(failedIds.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
))
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) { next[v.id] = { status: 'done' }; successCount++ }
else { next[v.id] = { status: 'error', error: v.error }; failCount++ }
}
return next
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
}
}
const doneCount = Object.values(perNode).filter(v => v.status === 'done').length
const errorCount = Object.values(perNode).filter(v => v.status === 'error').length
const totalAttempted = Object.keys(perNode).length
if (!open || !backend) return null
const noNodes = backendNodes.length === 0
return (
<Modal onClose={onClose} maxWidth="780px">
<div style={{
padding: 'var(--spacing-md) var(--spacing-lg)',
borderBottom: '1px solid var(--color-border-subtle)',
display: 'flex',
alignItems: 'center',
justifyContent: 'space-between',
gap: 'var(--spacing-sm)',
}}>
<h2 style={{ margin: 0, fontSize: '1rem', display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
<i className="fas fa-cog" style={{ color: 'var(--color-primary)' }} />
Install <span style={{ fontFamily: 'var(--font-mono)' }}>{backend.name}</span>
{backend.isMeta ? (
<span className="badge badge-info" style={{ fontSize: '0.6875rem' }}>Auto-resolving</span>
) : (
<span className="badge badge-warning" style={{ fontSize: '0.6875rem' }}>Hardware-specific</span>
)}
</h2>
<button
type="button"
className="btn btn-ghost btn-sm"
onClick={onClose}
aria-label="Close"
style={{ fontSize: '1.125rem', lineHeight: 1, padding: '4px 10px' }}
>×</button>
</div>
<div style={{ padding: 'var(--spacing-md) var(--spacing-lg)' }}>
{!backend.isMeta && (
<div className="card" style={{
marginBottom: 'var(--spacing-md)',
padding: 'var(--spacing-sm) var(--spacing-md)',
background: 'var(--color-warning-light)',
border: '1px solid var(--color-warning-border)',
borderRadius: 'var(--radius-md)',
display: 'flex',
alignItems: 'center',
gap: 'var(--spacing-sm)',
}}>
<i className="fas fa-microchip" style={{ color: 'var(--color-warning)' }} />
<span style={{ color: 'var(--color-warning)', fontSize: '0.8125rem' }}>
{targetLabel}. Install only on nodes where you want this build to run.
{hardwareTarget && ` Targets: ${humanTargetLabel(hardwareTarget).replace(' build', '')}.`}
</span>
</div>
)}
{noNodes ? (
<div className="empty-state" style={{ padding: 'var(--spacing-xl) 0' }}>
<div className="empty-state-icon"><i className="fas fa-server" /></div>
<h3 className="empty-state-title">No backend nodes available</h3>
<p className="empty-state-text">
Approve pending workers or register new ones.
{pendingCount > 0 && ` (${pendingCount} awaiting approval.)`}
</p>
<a className="btn btn-secondary btn-sm" href="/app/nodes">
<i className="fas fa-network-wired" /> Manage nodes
</a>
</div>
) : (
<>
{/* Filter row */}
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', alignItems: 'center', marginBottom: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
<div className="search-bar" style={{ flex: 1, minWidth: 180 }}>
<i className="fas fa-search search-icon" />
<input
ref={headerInputRef}
className="input"
placeholder="Filter nodes by name or label..."
value={search}
onChange={e => setSearch(e.target.value)}
/>
</div>
<button className="btn btn-secondary btn-sm" onClick={selectAllHealthy} type="button">
Select all healthy
</button>
<button className="btn btn-secondary btn-sm" onClick={selectCompatible} type="button">
Select compatible nodes
</button>
{selected.size > 0 && (
<button className="btn btn-ghost btn-sm" onClick={clearSelection} type="button">
Clear
</button>
)}
</div>
{/* Variant override (auto-resolving only) */}
{backend.isMeta && concreteVariants.length > 0 && (
<div style={{ marginBottom: 'var(--spacing-sm)' }}>
<button
type="button"
className="btn btn-ghost btn-sm"
onClick={() => setOverrideExpanded(v => !v)}
aria-expanded={overrideExpanded}
style={{ padding: '4px 8px' }}
>
<i className={`fas fa-chevron-${overrideExpanded ? 'down' : 'right'}`} style={{ marginRight: 4, fontSize: '0.625rem' }} />
Override variant for selected nodes
</button>
{overrideExpanded && (
<div className="card" style={{ marginTop: 4, padding: 'var(--spacing-sm) var(--spacing-md)' }}>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 0, marginBottom: 'var(--spacing-xs)' }}>
By default each node picks its own variant. Override to install one specific variant on every selected node useful when GPU detection fails on a node and you want the CPU build there instead.
</p>
<SearchableSelect
value={overrideVariant}
onChange={setOverrideVariant}
options={concreteVariants}
placeholder="Per-node auto-resolve (default)"
allOption={{ value: '', label: 'Per-node auto-resolve (default)' }}
/>
</div>
)}
</div>
)}
{/* Node table */}
<div className="table-container" style={{ marginBottom: 'var(--spacing-sm)', maxHeight: '40vh', overflowY: 'auto' }}>
<table className="table" style={{ margin: 0 }}>
<thead>
<tr>
<th style={{ width: 28 }}>
<input
type="checkbox"
aria-label="Select all visible"
checked={filteredNodes.length > 0 && filteredNodes.every(n => selected.has(n.id))}
onChange={(e) => {
setSelected(prev => {
const next = new Set(prev)
if (e.target.checked) filteredNodes.forEach(n => next.add(n.id))
else filteredNodes.forEach(n => next.delete(n.id))
return next
})
}}
/>
</th>
<th>Node</th>
<th>Status</th>
<th>Hardware</th>
<th>Suitability</th>
</tr>
</thead>
<tbody>
{filteredNodes.map(node => {
const installed = installedSet.has(node.id)
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
const suit = suitabilityFor({ node, backend: eff, hardwareTarget: target, alreadyInstalled: installed })
const isSel = selected.has(node.id)
const rowState = perNode[node.id]
const vendor = gpuVendorLabel(node.gpu_vendor)
const totalVRAM = formatVRAM(node.total_vram)
const totalRAM = formatVRAM(node.total_ram)
return (
<tr key={node.id}>
<td>
<input
type="checkbox"
aria-label={`Select ${node.name}`}
aria-disabled={rowState?.status === 'installing'}
checked={isSel}
onChange={() => toggle(node.id)}
/>
</td>
<td>
<div style={{ display: 'flex', flexDirection: 'column', gap: 2 }}>
<span style={{ fontWeight: 500, fontSize: '0.875rem' }}>{node.name}</span>
{node.labels && Object.keys(node.labels).length > 0 && (
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 3 }}>
{Object.entries(node.labels).slice(0, 3).map(([k, v]) => (
<span key={k} className="cell-mono" style={{
padding: '1px 5px', borderRadius: 'var(--radius-sm)', fontSize: '0.6875rem',
background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
}}>{k}={v}</span>
))}
{Object.keys(node.labels).length > 3 && (
<span className="cell-muted" style={{ fontSize: '0.6875rem' }}>
+{Object.keys(node.labels).length - 3}
</span>
)}
</div>
)}
</div>
</td>
<td>
<span style={{ fontSize: '0.8125rem' }}>
{STATUS_LABELS[node.status] || node.status}
</span>
</td>
<td style={{ fontSize: '0.8125rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)' }}>
{totalVRAM ? (
<>{vendor && <span style={{ marginRight: 4 }}>{vendor}</span>}{totalVRAM}</>
) : totalRAM ? (
<span>CPU · {totalRAM}</span>
) : <span className="cell-muted"></span>}
</td>
<td>
{rowState?.status === 'installing' ? (
<span className="badge badge-info">
<i className="fas fa-spinner fa-spin" style={{ marginRight: 4 }} />Installing
</span>
) : rowState?.status === 'done' ? (
<span className="badge badge-success">
<i className="fas fa-check" style={{ marginRight: 4 }} />Installed
</span>
) : rowState?.status === 'error' ? (
<button
type="button"
className="badge badge-error"
title={rowState.error}
aria-describedby={`err-${node.id}`}
style={{ border: 'none', cursor: 'help' }}
>
<i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />Failed
<span id={`err-${node.id}`} style={{ position: 'absolute', left: -9999 }}>{rowState.error}</span>
</button>
) : suit === 'installed' ? (
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
Installed
</span>
) : suit === 'override' ? (
<span className="badge badge-warning">
<i className="fas fa-exclamation-circle" style={{ marginRight: 4 }} />Override
</span>
) : (
<span className="badge badge-success" style={{ background: 'var(--color-success-light)', color: 'var(--color-success)' }}>
Compatible
</span>
)}
</td>
</tr>
)
})}
{filteredNodes.length === 0 && (
<tr>
<td colSpan={5} style={{ textAlign: 'center', padding: 'var(--spacing-md)', color: 'var(--color-text-muted)' }}>
No nodes match the current filters.
</td>
</tr>
)}
</tbody>
</table>
</div>
{pendingCount > 0 && (
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 0, marginBottom: 'var(--spacing-sm)' }}>
+{pendingCount} awaiting approval <a href="/app/nodes" style={{ color: 'var(--color-primary)' }}>approve from Nodes</a>.
</p>
)}
{/* Mismatch confirm */}
{showMismatchConfirm && (
<div className="card" style={{
marginBottom: 'var(--spacing-sm)',
padding: 'var(--spacing-md)',
background: 'var(--color-warning-light)',
border: '1px solid var(--color-warning-border)',
borderRadius: 'var(--radius-md)',
}}>
<p style={{ marginTop: 0, marginBottom: 'var(--spacing-sm)', color: 'var(--color-warning)', fontSize: '0.875rem' }}>
Installing {targetLabel.toLowerCase()} on {counts.overrides} node{counts.overrides === 1 ? '' : 's'} that don't match. Those nodes will run inference on the chosen build, not their native GPU. Continue?
</p>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end' }}>
<button className="btn btn-secondary btn-sm" type="button" onClick={() => setShowMismatchConfirm(false)}>
Cancel
</button>
<button className="btn btn-primary btn-sm" type="button" onClick={submit}
style={{ background: 'var(--color-warning)', borderColor: 'var(--color-warning)' }}>
Install on {targetLabel.replace(' build', '')}
</button>
</div>
</div>
)}
</>
)}
</div>
{!noNodes && (
<div style={{
padding: 'var(--spacing-md) var(--spacing-lg)',
borderTop: '1px solid var(--color-border-subtle)',
display: 'flex',
alignItems: 'center',
gap: 'var(--spacing-sm)',
flexWrap: 'wrap',
}}>
<div style={{ flex: 1, fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>
{totalAttempted > 0 ? (
<>
{doneCount} of {totalAttempted} done
{errorCount > 0 && (
<> · <span className="badge badge-error" style={{ fontSize: '0.6875rem' }}>{errorCount} failed</span></>
)}
</>
) : (
<>
{counts.selected} {counts.selected === 1 ? 'node' : 'nodes'} selected
{counts.already > 0 && <> · {counts.already} already installed</>}
{counts.overrides > 0 && <> · {counts.overrides} override{counts.overrides === 1 ? '' : 's'}</>}
</>
)}
</div>
{errorCount > 0 && !submitting && (
<button className="btn btn-secondary btn-sm" type="button" onClick={retryFailed}>
<i className="fas fa-redo" /> Retry failed nodes
</button>
)}
<button className="btn btn-secondary btn-sm" type="button" onClick={onClose} disabled={submitting}>
{totalAttempted > 0 && doneCount > 0 ? 'Close' : 'Cancel'}
</button>
<button
className="btn btn-primary btn-sm"
type="button"
onClick={submit}
disabled={submitting || counts.selected === 0 || showMismatchConfirm}
>
{submitting ? (
<><i className="fas fa-spinner fa-spin" /> Installing…</>
) : (
<>Install on {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'}</>
)}
</button>
</div>
)}
</Modal>
)
}

View File

@@ -0,0 +1,29 @@
// ResourceActions groups row-level buttons into a lifecycle cluster (start,
// stop, pin, reinstall, upgrade) and a destructive cluster (delete) with a
// thin divider between them, so a destructive intent visually separates from
// a routine one. Replaces the old 4px-gap row of buttons in the Manage page
// where Stop / Pin / Delete sat shoulder-to-shoulder with no visual cue
// telling apart "click to fiddle" from "click to throw away".
//
// `lifecycle` and `destructive` accept any ReactNode — typically one or more
// <button>s. The wrapping div stops click propagation so action clicks don't
// also expand the row.
export default function ResourceActions({ lifecycle, destructive }) {
const hasLifecycle = !!lifecycle
const hasDestructive = !!destructive
if (!hasLifecycle && !hasDestructive) return null
return (
<div className="resource-actions" onClick={e => e.stopPropagation()}>
{hasLifecycle && (
<div className="resource-actions__group">{lifecycle}</div>
)}
{hasLifecycle && hasDestructive && (
<span className="resource-actions__divider" aria-hidden="true" />
)}
{hasDestructive && (
<div className="resource-actions__group">{destructive}</div>
)}
</div>
)
}

View File

@@ -51,7 +51,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${pct}%`, background: color }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color, minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color, minWidth: '3em', textAlign: 'right' }}>
{pct.toFixed(0)}%
</span>
</div>
@@ -76,7 +76,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${ram.usage_percent || 0}%`, background: percentColor(ram.usage_percent || 0) }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
{(ram.usage_percent || 0).toFixed(0)}%
</span>
</div>
@@ -91,7 +91,7 @@ export default function ResourceMonitor() {
{isGpu && aggregate.gpu_count > 1 && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Total VRAM</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace" }}>
<span style={{ fontFamily: 'var(--font-mono)' }}>
{formatBytes(aggregate.used_memory)} / {formatBytes(aggregate.total_memory)} ({aggregate.usage_percent?.toFixed(1)}%)
</span>
</div>
@@ -101,7 +101,7 @@ export default function ResourceMonitor() {
{resources.storage_size != null && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Models storage</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-primary)' }}>
<span style={{ fontFamily: 'var(--font-mono)', color: 'var(--color-text-primary)' }}>
{formatBytes(resources.storage_size)}
</span>
</div>

View File

@@ -0,0 +1,81 @@
import { Fragment } from 'react'
// ResourceRow renders the visible row + its conditional detail row as a pair
// of <tr>s, so the existing .table styling keeps applying and the Manage page
// can re-use the gallery's expand-to-detail interaction without inventing a
// new table system. The consumer owns the cells (which pass through as
// children) — this component only manages the click-to-expand handler, the
// dimmed state for disabled rows, and the colSpan'd detail row beneath.
//
// `onToggleExpand` fires on row click only. Buttons / toggles inside cells
// must call e.stopPropagation() (or be wrapped in an .actions-stop wrapper)
// to avoid double-triggering the expand.
export default function ResourceRow({
expanded,
onToggleExpand,
detail,
colSpan,
dimmed,
className = '',
children,
}) {
return (
<Fragment>
<tr
className={`resource-row${dimmed ? ' is-dimmed' : ''}${expanded ? ' is-expanded' : ''} ${className}`.trim()}
onClick={onToggleExpand}
style={{ cursor: onToggleExpand ? 'pointer' : 'default' }}
>
{children}
</tr>
{expanded && detail && (
<tr className="resource-row__detail-row">
<td colSpan={colSpan} className="resource-row__detail-cell">
{detail}
</td>
</tr>
)}
</Fragment>
)
}
// ChevronCell is the small rotating chevron used as the leftmost cell of an
// expandable row. Mirrors the Nodes/Models/Backends gallery affordance so
// users see the same "click to expand" cue everywhere.
export function ChevronCell({ expanded }) {
return (
<td className="resource-row__chevron-cell">
<span className={`row-chevron${expanded ? ' is-expanded' : ''}`} aria-hidden="true">
<i className="fas fa-chevron-right" />
</span>
</td>
)
}
// IconCell renders the 48px brand icon shell — the same one the Install
// gallery uses. `icon` is the image URL (from gallery metadata); when absent
// or broken we fall back to a FontAwesome glyph so custom-imported items
// still get a placeholder instead of an empty square.
export function IconCell({ icon, fallback = 'fa-cube', alt = '' }) {
return (
<td className="resource-row__icon-cell">
<div className="resource-row__icon">
{icon ? (
<img src={icon} alt={alt} loading="lazy" />
) : (
<i className={`fas ${fallback}`} />
)}
</div>
</td>
)
}
// StopPropagationCell wraps cell contents that contain interactive controls
// (Toggle, action buttons) so a click on them doesn't also expand the row.
export function StopPropagationCell({ children, ...props }) {
return (
<td {...props} onClick={e => e.stopPropagation()}>
{children}
</td>
)
}

View File

@@ -116,7 +116,7 @@ export default function SearchableSelect({
aria-expanded={open}
onClick={() => { if (!disabled) { setOpen(!open); setQuery(''); setFocusIndex(-1) } }}
style={{
width: '100%', padding: '4px 8px', fontSize: '0.8125rem',
width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem',
cursor: disabled ? 'not-allowed' : 'pointer',
display: 'flex', alignItems: 'center', gap: '6px',
background: 'var(--color-bg-primary)', border: '1px solid var(--color-border)',
@@ -145,7 +145,7 @@ export default function SearchableSelect({
value={query}
onChange={(e) => { setQuery(e.target.value); setFocusIndex(-1) }}
onKeyDown={handleKeyDown}
style={{ width: '100%', padding: '4px 8px', fontSize: '0.8125rem' }}
style={{ width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem' }}
/>
</div>
<div ref={listRef} role="listbox" style={{ overflowY: 'auto', maxHeight: 'min(200px, 50vh)' }}>

View File

@@ -1,4 +1,4 @@
import { useState, useEffect } from 'react'
import { useState, useEffect, useRef } from 'react'
import { NavLink, useNavigate, useLocation } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
import { useAuth } from '../context/AuthContext'
@@ -107,11 +107,22 @@ export default function Sidebar({ isOpen, onClose }) {
const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
const navigate = useNavigate()
const location = useLocation()
const closeBtnRef = useRef(null)
useEffect(() => {
fetch(apiUrl('/api/features')).then(r => r.json()).then(setFeatures).catch(() => {})
}, [])
// Move focus into the drawer when opened on mobile/tablet so keyboard
// and screen-reader users land inside the dialog. Targeting the close
// button avoids hijacking the visual focus to a nav item the user may
// not have meant to activate.
useEffect(() => {
if (!isOpen) return
const id = window.requestAnimationFrame(() => closeBtnRef.current?.focus())
return () => window.cancelAnimationFrame(id)
}, [isOpen])
// Auto-expand section containing the active route
useEffect(() => {
for (const section of sections) {
@@ -168,7 +179,11 @@ export default function Sidebar({ isOpen, onClose }) {
<>
{isOpen && <div className="sidebar-overlay" onClick={onClose} />}
<aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
<aside
id="app-sidebar"
className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}
aria-label="Primary navigation"
>
{/* Logo */}
<div className="sidebar-header">
<a href="./" className="sidebar-logo-link">
@@ -177,8 +192,13 @@ export default function Sidebar({ isOpen, onClose }) {
<a href="./" className="sidebar-logo-icon" title="LocalAI">
<img src={apiUrl('/static/logo.png')} alt="LocalAI" className="sidebar-logo-icon-img" />
</a>
<button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
<i className="fas fa-times" />
<button
ref={closeBtnRef}
className="sidebar-close-btn"
onClick={onClose}
aria-label="Close menu"
>
<i className="fas fa-times" aria-hidden="true" />
</button>
</div>

View File

@@ -0,0 +1,39 @@
// StatCard renders a single cluster/dashboard metric card. The left accent
// bar + icon chip color is driven by `accentVar` (a CSS custom property name,
// e.g. "--color-success") so the card reads as semantic without the caller
// having to reach into colors directly. `onClick` upgrades the card to a
// keyboard-focusable button — used by the Manage page so cards double as
// shortcuts to the relevant tab + filter.
export default function StatCard({ icon, label, value, color, accentVar, onClick }) {
const accent = color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
const interactive = typeof onClick === 'function'
const handleKeyDown = interactive
? (e) => {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
onClick(e)
}
}
: undefined
return (
<div
className="stat-card"
data-clickable={interactive ? 'true' : undefined}
role={interactive ? 'button' : undefined}
tabIndex={interactive ? 0 : undefined}
onClick={interactive ? onClick : undefined}
onKeyDown={handleKeyDown}
style={accentVar ? { ['--stat-accent']: `var(${accentVar})` } : undefined}
>
<div className="stat-card__body">
<div className="stat-card__label">{label}</div>
<div className="stat-card__value" style={{ color: accent }}>{value}</div>
</div>
<div className="stat-card__icon" style={accentVar ? { color: accent } : undefined}>
<i className={icon} />
</div>
</div>
)
}

View File

@@ -24,7 +24,7 @@ export default function TemplateSelector({ onSelect }) {
<p style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)', lineHeight: 1.5, margin: 0 }}>
{t.description}
</p>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: '4px', marginTop: 'var(--spacing-xs)' }}>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)', marginTop: 'var(--spacing-xs)' }}>
{Object.keys(t.fields).filter(k => k !== 'name').map(k => (
<span key={k} className="badge" style={{
fontSize: '0.6875rem', background: 'var(--color-bg-tertiary)',

View File

@@ -187,7 +187,7 @@ export default function UnifiedMCPDropdown({
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="text"
@@ -195,7 +195,7 @@ export default function UnifiedMCPDropdown({
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="password"
@@ -203,13 +203,13 @@ export default function UnifiedMCPDropdown({
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button type="button" className="btn btn-sm btn-primary" onClick={handleAddClient} disabled={!url.trim()}>Add</button>
</div>

View File

@@ -0,0 +1,40 @@
import { useState, useEffect, useCallback } from 'react'
import { nodesApi } from '../utils/api'
// useDistributedMode probes /api/nodes to decide whether the running LocalAI
// is in distributed mode. The endpoint returns 503 when distributed mode is
// disabled — we treat any failure as standalone, mirroring the detection
// pattern in pages/Nodes.jsx so UI behaviour matches the Nodes page.
//
// Returns:
// enabled — true when the cluster API answered OK at least once
// nodes — the most recent /api/nodes response (array; possibly empty)
// loading — true until the first probe completes
// refetch — manual trigger; the picker calls this after install/delete
//
// Components that need a live nodes list (e.g. install picker) re-call
// refetch after operations complete. The hook does not poll on its own —
// the Nodes page handles its own 5s polling and the Backends gallery only
// needs a one-shot read on mount.
export function useDistributedMode() {
const [enabled, setEnabled] = useState(false)
const [nodes, setNodes] = useState([])
const [loading, setLoading] = useState(true)
const probe = useCallback(async () => {
try {
const data = await nodesApi.list()
setNodes(Array.isArray(data) ? data : [])
setEnabled(true)
} catch {
setEnabled(false)
setNodes([])
} finally {
setLoading(false)
}
}, [])
useEffect(() => { probe() }, [probe])
return { enabled, nodes, loading, refetch: probe }
}

View File

@@ -0,0 +1,53 @@
import { useState, useEffect, useCallback } from 'react'
import { modelsApi, backendsApi } from '../utils/api'
// useGalleryEnrichment fetches the full model + backend gallery once and
// returns lookup helpers used by the Manage page. The Manage list APIs only
// know name/version/alias — descriptions, icons, licenses, tags, and links
// live on the gallery side. Cross-referencing here lets us light up the
// installed lists with the same metadata the Install pages show, instead of
// rendering them as bare names.
//
// Items not present in the gallery (custom imports, external OCI installs)
// resolve to `null` — callers fall back to a neutral icon + "no description".
export function useGalleryEnrichment() {
const [modelMap, setModelMap] = useState(() => new Map())
const [backendMap, setBackendMap] = useState(() => new Map())
const [loaded, setLoaded] = useState(false)
useEffect(() => {
let cancelled = false
Promise.allSettled([
modelsApi.list({ items: 9999, page: 1 }),
backendsApi.list({ items: 9999, page: 1 }),
]).then(([m, b]) => {
if (cancelled) return
const mm = new Map()
if (m.status === 'fulfilled') {
const list = m.value?.models || []
for (const x of list) {
const key = x.name || x.id
if (key) mm.set(key, x)
}
}
const bm = new Map()
if (b.status === 'fulfilled') {
const raw = b.value
const list = Array.isArray(raw?.backends) ? raw.backends : Array.isArray(raw) ? raw : []
for (const x of list) {
const key = x.name || x.id
if (key) bm.set(key, x)
}
}
setModelMap(mm)
setBackendMap(bm)
setLoaded(true)
})
return () => { cancelled = true }
}, [])
const enrichModel = useCallback((name) => (name ? modelMap.get(name) || null : null), [modelMap])
const enrichBackend = useCallback((name) => (name ? backendMap.get(name) || null : null), [backendMap])
return { enrichModel, enrichBackend, loaded }
}

View File

@@ -6,9 +6,9 @@ export function useModels(capability) {
const [loading, setLoading] = useState(true)
const [error, setError] = useState(null)
const fetchModels = useCallback(async () => {
const fetchModels = useCallback(async ({ silent = false } = {}) => {
try {
setLoading(true)
if (!silent) setLoading(true)
const data = await modelsApi.listCapabilities()
let items = data?.data || []
if (capability) {
@@ -30,15 +30,19 @@ export function useModels(capability) {
setError(err.message)
}
} finally {
setLoading(false)
if (!silent) setLoading(false)
}
}, [capability])
// Subsequent refetches stay silent so consumers don't blank their tables
// (e.g. the Manage page auto-refreshes every 10s in distributed mode).
const refetch = useCallback(() => fetchModels({ silent: true }), [fetchModels])
useEffect(() => {
fetchModels()
}, [fetchModels])
return { models, loading, error, refetch: fetchModels }
return { models, loading, error, refetch }
}
export function useGalleryModels(params = {}) {

View File

@@ -12,14 +12,17 @@ html {
}
body {
font-family: 'Space Grotesk', -apple-system, BlinkMacSystemFont, sans-serif;
font-family: var(--font-sans);
font-size: var(--text-base);
font-weight: var(--font-weight-regular);
line-height: var(--leading-normal);
font-feature-settings: "ss01", "ss03", "cv11";
letter-spacing: -0.005em;
min-height: 100%;
background-color: var(--color-bg-primary);
color: var(--color-text-primary);
transition: background-color 200ms ease, color 200ms ease;
transition: background-color var(--duration-normal) var(--ease-default),
color var(--duration-normal) var(--ease-default);
}
#root {
@@ -27,36 +30,73 @@ body {
min-height: 100dvh;
}
/* Scrollbar */
::-webkit-scrollbar { width: 6px; height: 6px; }
::-webkit-scrollbar-track { background: var(--color-bg-primary); }
::-webkit-scrollbar-thumb { background: var(--color-bg-secondary); border-radius: 3px; }
::-webkit-scrollbar-thumb:hover { background: var(--color-primary); }
* { scrollbar-width: thin; scrollbar-color: var(--color-bg-secondary) var(--color-bg-primary); }
/* Global selection + focus */
::selection {
background: var(--color-primary-light);
color: var(--color-text-primary);
}
/* Typography */
/* Scrollbar — slightly wider, warmer thumb */
::-webkit-scrollbar { width: 10px; height: 10px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb {
background: var(--color-border-default);
border-radius: var(--radius-sm);
border: 2px solid var(--color-bg-primary);
}
::-webkit-scrollbar-thumb:hover { background: var(--color-border-strong); }
* { scrollbar-width: thin; scrollbar-color: var(--color-border-default) transparent; }
/* Typography — editorial hierarchy */
h1, h2, h3, h4, h5, h6 {
font-family: 'Space Grotesk', sans-serif;
font-family: var(--font-sans);
color: var(--color-text-primary);
line-height: var(--leading-tight);
letter-spacing: -0.01em;
}
h1 { font-size: var(--text-2xl); font-weight: var(--font-weight-semibold); }
h2 { font-size: var(--text-xl); font-weight: var(--font-weight-semibold); }
h3 { font-size: var(--text-lg); font-weight: var(--font-weight-semibold); }
h4 { font-size: var(--text-base); font-weight: var(--font-weight-semibold); }
h5, h6 { font-size: var(--text-sm); font-weight: var(--font-weight-semibold); }
h1 { font-size: var(--text-3xl); font-weight: var(--font-weight-medium); letter-spacing: -0.02em; }
h2 { font-size: var(--text-2xl); font-weight: var(--font-weight-medium); letter-spacing: -0.015em; }
h3 { font-size: var(--text-xl); font-weight: var(--font-weight-medium); letter-spacing: -0.01em; }
h4 { font-size: var(--text-lg); font-weight: var(--font-weight-medium); letter-spacing: -0.005em; }
h5 { font-size: var(--text-base);font-weight: var(--font-weight-semibold); }
h6 {
font-size: var(--text-xs); font-weight: var(--font-weight-semibold);
text-transform: uppercase; letter-spacing: 0.12em;
color: var(--color-text-muted);
}
code, pre {
font-family: 'JetBrains Mono', monospace;
code, pre, kbd, .mono {
font-family: var(--font-mono);
}
kbd {
display: inline-block;
padding: 1px 5px;
font-size: 0.75em;
font-weight: var(--font-weight-medium);
background: var(--color-bg-tertiary);
border: 1px solid var(--color-border-default);
border-radius: var(--radius-sm);
color: var(--color-text-secondary);
line-height: 1.4;
}
a {
color: var(--color-primary);
text-decoration: none;
transition: color var(--duration-fast) var(--ease-default);
}
a:hover {
color: var(--color-primary-hover);
}
/* Honor prefers-reduced-motion globally */
@media (prefers-reduced-motion: reduce) {
*, *::before, *::after {
animation-duration: 0.01ms !important;
animation-iteration-count: 1 !important;
transition-duration: 0.01ms !important;
scroll-behavior: auto !important;
}
}
/* Utility classes */

View File

@@ -403,7 +403,7 @@ export default function Account() {
if (!authEnabled) {
return (
<div className="page">
<div className="page page--narrow">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-user-gear" /></div>
<h2 className="empty-state-title">Account unavailable</h2>
@@ -418,7 +418,7 @@ export default function Account() {
const visibleTabs = isLocal ? TABS : TABS.filter(t => t.id !== 'security')
return (
<div className="page account-page">
<div className="page page--narrow account-page">
{/* Header */}
<div className="page-header">
<h1 className="page-title">Account</h1>

View File

@@ -101,6 +101,7 @@ export default function AgentChat() {
const messagesEndRef = useRef(null)
const messagesRef = useRef(null)
const textareaRef = useRef(null)
const stickToBottomRef = useRef(true)
const eventSourceRef = useRef(null)
const messageIdCounter = useRef(0)
const addMessageRef = useRef(addMessage)
@@ -260,11 +261,31 @@ export default function AgentChat() {
}
}, [name, userId, addToast, nextId])
// Auto-scroll to bottom
// Track whether the user is pinned to the bottom. If they scroll up
// while a response is streaming, stop forcing them back down.
useEffect(() => {
const el = messagesRef.current
if (!el) return
const onScroll = () => {
const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
stickToBottomRef.current = distanceFromBottom < 80
}
el.addEventListener('scroll', onScroll, { passive: true })
return () => el.removeEventListener('scroll', onScroll)
}, [])
// Auto-scroll only when the user hasn't scrolled away from the bottom.
useEffect(() => {
if (!stickToBottomRef.current) return
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
}, [messages, streamContent, streamReasoning, streamToolCalls])
// When switching conversations, snap to bottom and re-pin.
useEffect(() => {
stickToBottomRef.current = true
messagesEndRef.current?.scrollIntoView({ behavior: 'auto' })
}, [activeId])
// Highlight code blocks
useEffect(() => {
if (messagesRef.current) highlightAll(messagesRef.current)

View File

@@ -97,7 +97,7 @@ function FormField({ field, value, onChange, disabled }) {
rows={5}
disabled={disabled}
style={field.name.includes('prompt') || field.name.includes('template') || field.name.includes('script')
? { fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' } : undefined}
? { fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' } : undefined}
/>
</div>
)
@@ -624,7 +624,7 @@ export default function AgentCreate() {
value={mcpRawJson}
onChange={(e) => setMcpRawJson(e.target.value)}
rows={16}
style={{ fontFamily: 'monospace', fontSize: '0.85rem', whiteSpace: 'pre' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.85rem', whiteSpace: 'pre' }}
placeholder={'{\n "mcpServers": {\n "my-server": {\n "command": "npx",\n "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],\n "env": {}\n }\n }\n}'}
/>
</div>
@@ -812,14 +812,14 @@ export default function AgentCreate() {
if (loading) {
return (
<div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
<div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
<i className="fas fa-spinner fa-spin" style={{ fontSize: '2rem', color: 'var(--color-primary)' }} />
</div>
)
}
return (
<div className="page">
<div className="page page--narrow">
<style>{`
.agent-form-container {
display: flex;

View File

@@ -43,7 +43,7 @@ function TraceCard({ trace, index }) {
{trace.type || 'unknown'}
</span>
{trace.tool_name && (
<span style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem', color: 'var(--color-text-secondary)' }}>
<span style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem', color: 'var(--color-text-secondary)' }}>
{trace.tool_name}
</span>
)}
@@ -60,7 +60,7 @@ function TraceCard({ trace, index }) {
{trace.content && (
<pre style={{
whiteSpace: 'pre-wrap', wordBreak: 'break-word', margin: 0,
fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem',
fontFamily: 'var(--font-mono)', fontSize: '0.75rem',
color: 'var(--color-text-secondary)', lineHeight: 1.6,
}}>
{trace.content}
@@ -71,7 +71,7 @@ function TraceCard({ trace, index }) {
<span style={{ fontSize: '0.6875rem', fontWeight: 600, color: 'var(--color-text-muted)' }}>Arguments:</span>
<pre style={{
whiteSpace: 'pre-wrap', wordBreak: 'break-word', margin: '4px 0 0',
fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem',
fontFamily: 'var(--font-mono)', fontSize: '0.75rem',
color: 'var(--color-text-secondary)', lineHeight: 1.5,
}}>
{typeof trace.arguments === 'string' ? trace.arguments : JSON.stringify(trace.arguments, null, 2)}
@@ -162,9 +162,9 @@ export default function AgentJobDetails() {
return rendered
}
if (loading) return <div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (loading) return <div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (!job) return (
<div className="page">
<div className="page page--narrow">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-search" /></div>
<h2 className="empty-state-title">Job not found</h2>
@@ -177,7 +177,7 @@ export default function AgentJobDetails() {
const traces = Array.isArray(job.traces) ? job.traces : []
return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">Job Details</h1>
@@ -207,7 +207,7 @@ export default function AgentJobDetails() {
<div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 'var(--spacing-md)' }}>
<div>
<span className="form-label">Job ID</span>
<p style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem', wordBreak: 'break-all' }}>{job.id}</p>
<p style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', wordBreak: 'break-all' }}>{job.id}</p>
</div>
<div>
<span className="form-label">Task</span>
@@ -264,7 +264,7 @@ export default function AgentJobDetails() {
</h3>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)' }}>
{Object.entries(job.cron_parameters).map(([k, v]) => (
<span key={k} className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem' }}>
<span key={k} className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem' }}>
{k}={v}
</span>
))}
@@ -281,7 +281,7 @@ export default function AgentJobDetails() {
</h3>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)' }}>
{Object.entries(job.parameters).map(([k, v]) => (
<span key={k} className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem' }}>
<span key={k} className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem' }}>
{k}={v}
</span>
))}

View File

@@ -213,7 +213,7 @@ export default function AgentJobs() {
// Wizard: no models installed
if (!loading && models.length === 0) {
return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<h1 className="page-title">Agent Jobs</h1>
<p className="page-subtitle">Manage agent tasks and automated workflows</p>
@@ -240,7 +240,7 @@ export default function AgentJobs() {
// Wizard: models but no MCP
if (!loading && models.length > 0 && !hasMCPModels && tasks.length === 0) {
return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<h1 className="page-title">Agent Jobs</h1>
<p className="page-subtitle">Manage agent tasks and automated workflows</p>
@@ -253,7 +253,7 @@ export default function AgentJobs() {
</p>
<div style={{ background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-md)', padding: 'var(--spacing-md)', maxWidth: 500, margin: '0 auto var(--spacing-md)', textAlign: 'left' }}>
<p style={{ fontSize: '0.8125rem', fontWeight: 600, marginBottom: 'var(--spacing-xs)' }}>Example MCP configuration (YAML):</p>
<pre style={{ fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-secondary)', whiteSpace: 'pre-wrap' }}>{`mcp:
<pre style={{ fontSize: '0.75rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)', whiteSpace: 'pre-wrap' }}>{`mcp:
stdio:
- name: my-tool
command: /path/to/tool
@@ -273,7 +273,7 @@ export default function AgentJobs() {
}
return (
<div className="page">
<div className="page page--wide">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">Agent Jobs</h1>
@@ -345,7 +345,7 @@ export default function AgentJobs() {
</td>
<td>
{task.cron ? (
<span className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.6875rem' }}>
<span className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.6875rem' }}>
{task.cron}
</span>
) : '-'}
@@ -426,7 +426,7 @@ export default function AgentJobs() {
{filteredJobs.map(job => (
<tr key={job.id}>
<td>
<a onClick={() => navigate(`/app/agent-jobs/jobs/${job.id}`)} style={{ cursor: 'pointer', color: 'var(--color-primary)', fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>
<a onClick={() => navigate(`/app/agent-jobs/jobs/${job.id}`)} style={{ cursor: 'pointer', color: 'var(--color-primary)', fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
{job.id?.slice(0, 12)}...
</a>
</td>
@@ -510,7 +510,7 @@ export default function AgentJobs() {
<tbody>
{(items || []).map(job => (
<tr key={job.id}>
<td style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>{job.id?.slice(0, 12)}...</td>
<td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>{job.id?.slice(0, 12)}...</td>
<td>{job.task_id || '-'}</td>
<td>{statusBadge(job.status)}</td>
<td style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>{formatDate(job.created_at)}</td>
@@ -566,7 +566,7 @@ export default function AgentJobs() {
onChange={(e) => setExecuteParams(e.target.value)}
rows={5}
placeholder={`topic=AI trends\nformat=markdown`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
These will be available as {'{{.parameter_name}}'} in the prompt template.
@@ -590,7 +590,7 @@ export default function AgentJobs() {
{executeMultimedia[type].map((item, i) => (
<div key={i} style={{
display: 'flex', alignItems: 'center', justifyContent: 'space-between',
background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-sm)', padding: '4px 8px', fontSize: '0.75rem',
background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-sm)', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.75rem',
}}>
<span style={{ overflow: 'hidden', textOverflow: 'ellipsis', whiteSpace: 'nowrap' }}>{item.name || item.url?.slice(0, 40)}</span>
<button onClick={() => removeMultimedia(type, i)} style={{ background: 'none', border: 'none', color: 'var(--color-error)', cursor: 'pointer', padding: '2px 4px' }}>

View File

@@ -260,7 +260,7 @@ export default function AgentStatus() {
const tree = buildTree(observables)
return (
<div className="page">
<div className="page page--wide">
<style>{`
.as-card {
background: var(--color-bg-secondary);
@@ -294,7 +294,7 @@ export default function AgentStatus() {
.as-id {
font-size: 0.6875rem;
color: var(--color-text-muted);
font-family: 'JetBrains Mono', monospace;
font-family: var(--font-mono);
}
.as-summary-item {
display: flex; align-items: center; gap: 6px;
@@ -303,7 +303,7 @@ export default function AgentStatus() {
}
.as-summary-item i { font-size: 0.625rem; flex-shrink: 0; }
.as-summary-creation i { color: var(--color-primary); }
.as-summary-tool-call i { color: #f59e0b; }
.as-summary-tool-call i { color: var(--color-warning); }
.as-summary-completion i { color: var(--color-success); }
.as-summary-error i { color: var(--color-error); }
.as-card-body {
@@ -327,13 +327,13 @@ export default function AgentStatus() {
background: var(--color-bg-tertiary); color: var(--color-text-muted);
margin-right: 4px; vertical-align: middle;
}
.as-tag-error { background: var(--color-error); color: #fff; }
.as-tag-error { background: var(--color-error); color: var(--color-text-inverse); }
.as-error-text { color: var(--color-error); }
.as-raw { margin-top: var(--spacing-sm); }
.as-raw summary { font-size: 0.75rem; color: var(--color-text-muted); cursor: pointer; }
.as-json {
background: var(--color-bg-tertiary); border-radius: var(--radius-sm);
padding: var(--spacing-sm); font-family: 'JetBrains Mono', monospace;
padding: var(--spacing-sm); font-family: var(--font-mono);
font-size: 0.75rem; overflow-x: auto; white-space: pre-wrap;
word-break: break-word; max-height: 300px; overflow-y: auto;
}

View File

@@ -159,12 +159,12 @@ export default function AgentTaskDetails() {
const formatDate = (d) => d ? new Date(d).toLocaleString() : '-'
if (loading) return <div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (loading) return <div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
// View mode
if (!isNew && !isEdit) {
return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">{task.name || 'Task Details'}</h1>
@@ -198,7 +198,7 @@ export default function AgentTaskDetails() {
{task.cron && (
<div>
<span className="form-label">Cron Schedule</span>
<p style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>{task.cron}</p>
<p style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>{task.cron}</p>
</div>
)}
</div>
@@ -229,13 +229,13 @@ export default function AgentTaskDetails() {
<div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
<div>
<span className="form-label">Execute by name</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl -X POST ${window.location.origin}${basePath}/api/agent/tasks/${encodeURIComponent(task.name)}/execute`}
</pre>
</div>
<div>
<span className="form-label">Execute with multimedia</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl -X POST ${window.location.origin}${basePath}/api/agent/tasks/${encodeURIComponent(task.name)}/execute \\
-H "Content-Type: application/json" \\
-d '{"multimedia": {"images": [{"url": "https://example.com/image.jpg"}]}}'`}
@@ -243,7 +243,7 @@ export default function AgentTaskDetails() {
</div>
<div>
<span className="form-label">Check job status</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl ${window.location.origin}${basePath}/api/agent/jobs/<job-id>`}
</pre>
</div>
@@ -261,7 +261,7 @@ export default function AgentTaskDetails() {
<div key={i} style={{ background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-md)', padding: 'var(--spacing-sm)', marginBottom: 'var(--spacing-sm)' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', fontSize: '0.8125rem' }}>
<span className="badge badge-info">{wh.method || 'POST'}</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace" }}>{wh.url}</span>
<span style={{ fontFamily: 'var(--font-mono)' }}>{wh.url}</span>
</div>
</div>
))}
@@ -283,7 +283,7 @@ export default function AgentTaskDetails() {
<tbody>
{jobHistory.map(job => (
<tr key={job.id}>
<td style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>
<td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
{job.id?.slice(0, 12)}...
</td>
<td>{statusBadge(job.status)}</td>
@@ -306,7 +306,7 @@ export default function AgentTaskDetails() {
// Edit/Create form
return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<h1 className="page-title">{isNew ? 'Create Task' : 'Edit Task'}</h1>
<button className="btn btn-secondary btn-sm" onClick={() => navigate('/app/agent-jobs')}>
@@ -351,7 +351,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateField('prompt', e.target.value)}
rows={8}
placeholder={`Write a summary about {{.topic}} in {{.format}} format.`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
Use {'{{.parameter_name}}'} for dynamic parameters. Parameters are provided when executing the task.
@@ -376,7 +376,7 @@ export default function AgentTaskDetails() {
value={task.cron}
onChange={(e) => { updateField('cron', e.target.value); validateCron(e.target.value) }}
placeholder="0 */6 * * *"
style={{ fontFamily: "'JetBrains Mono', monospace" }}
style={{ fontFamily: 'var(--font-mono)' }}
/>
{cronError && <p style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 4 }}>{cronError}</p>}
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
@@ -392,7 +392,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateField('cron_parameters', e.target.value)}
rows={3}
placeholder={`topic=daily news\nformat=bullet points`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
Default parameters used when the cron triggers the task.
@@ -437,7 +437,7 @@ export default function AgentTaskDetails() {
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Headers (JSON)</label>
<input className="input" value={ms.headers} onChange={(e) => updateMultimediaSource(i, 'headers', e.target.value)} placeholder='{"Authorization": "Bearer ..."}' style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }} />
<input className="input" value={ms.headers} onChange={(e) => updateMultimediaSource(i, 'headers', e.target.value)} placeholder='{"Authorization": "Bearer ..."}' style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }} />
</div>
</div>
))
@@ -479,7 +479,7 @@ export default function AgentTaskDetails() {
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Headers (JSON)</label>
<input className="input" value={wh.headers} onChange={(e) => updateWebhook(i, 'headers', e.target.value)} placeholder='{"Content-Type": "application/json"}' style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }} />
<input className="input" value={wh.headers} onChange={(e) => updateWebhook(i, 'headers', e.target.value)} placeholder='{"Content-Type": "application/json"}' style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }} />
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Payload Template (Go template syntax)</label>
@@ -489,7 +489,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateWebhook(i, 'payload_template', e.target.value)}
rows={3}
placeholder={`{"text": "Job {{.Status}}: {{if .Error}}Error: {{.Error}}{{else}}{{.Result}}{{end}}"}`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 2 }}>
Available: {'{{.Job}}'} {'{{.Task}}'} {'{{.Result}}'} {'{{.Error}}'} {'{{.Status}}'}

View File

@@ -136,7 +136,7 @@ export default function Agents() {
}
return (
<div className="page">
<div className="page page--wide">
<style>{`
.agents-import-input { display: none; }
.agents-toolbar {

View File

@@ -149,7 +149,7 @@ function BackendLogsDetail({ modelId }) {
}
return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<div>
<h1 className="page-title" style={{ marginBottom: 0 }}>
@@ -229,7 +229,7 @@ function BackendLogsDetail({ modelId }) {
borderRadius: 'var(--radius-md)',
overflow: 'auto',
maxHeight: 'calc(100vh - 280px)',
fontFamily: 'JetBrains Mono, Consolas, monospace',
fontFamily: 'var(--font-mono)',
fontSize: '0.75rem',
lineHeight: '1.5',
}}
@@ -283,7 +283,7 @@ export default function BackendLogs() {
// No model specified — redirect to System page
return (
<div className="page">
<div className="page page--wide">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-terminal" /></div>
<h2 className="empty-state-title">No model selected</h2>

Some files were not shown because too many files have changed in this diff Show More