Compare commits

...

53 Commits

Author SHA1 Message Date
Ettore Di Giacinto
3280b9a287 fix(distributed): per-replica backend logs (store aggregation + UI)
The multi-replica refactor (PR #9583) changed the worker's process key
from `modelID` to `modelID#replicaIndex`, but the BackendLogStore kept
the bare-modelID lookup. Result: every distributed deployment lost
backend logs in the Nodes UI — single-replica too, since even the
default capacity of 1 produces a `#0` suffix.

Two changes wired together:

* pkg/model: BackendLogStore.GetLines/Subscribe now treat a modelID
  without `#` as a model prefix and merge across all `modelID#N` replica
  buffers (timestamp-sorted for GetLines; fan-in for Subscribe). Calls
  with a full `modelID#N` key resolve exactly. ListModels strips
  replica suffixes and deduplicates so the listing surfaces one entry
  per loaded model.

* react-ui: per-replica log streams as the default. Loaded Models
  table disambiguates each row with a `rep N` pill (only when the node
  hosts >1 replica of a model). Each row's "View logs" link routes to
  the per-replica process key so operators see only that replica's
  output. The logs page renders the replica context as a chip in the
  title and surfaces a segmented control — `Replica 0 / 1 / … / All
  merged` — when the model has multiple replicas; the merged segment
  uses the bare-modelID URL (delegating to the store's prefix
  aggregation) for the side-by-side comparison case. Single-replica
  deployments see no extra UI.

Tests added first (TDD): the regression set in
backend_log_store_test.go reproduces the bug at the exact failure
point — GetLines/ListModels/Subscribe assertions all fail against the
broken code, all pass against the fix. TestSubscribe_PerReplicaFilter
pins the exact-key path so a future change can't silently break it.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:distill]
2026-04-27 20:55:24 +00:00
Ettore Di Giacinto
375bf1929d fix(ui): hide meta-dev backends in System → Backends Development toggle
The Manage view's flagsFor() short-circuited on b.IsMeta and returned
dev=false for every meta backend, so meta-dev entries
(e.g. llama-cpp-development, whisper-development, insightface-development)
leaked through the Development toggle in distributed mode and stayed
visible whether the toggle was on or off. The count chip even
under-reported because those rows were excluded from it.

Drop the IsMeta short-circuit and trust gallery enrichment for both
flags. Production metas (llama-cpp) are tagged isAlias=false /
isDevelopment=false in the gallery so they still pass both toggles;
meta-dev entries carry isDevelopment=true and now correctly hide
alongside concrete dev variants.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 20:38:20 +00:00
Ettore Di Giacinto
9a7f5e68bd ci(darwin): add native caches to backend_build_darwin
macOS runners can't use the registry-backed BuildKit cache (no Docker
daemon), so every darwin matrix run was paying full cost for brew
installs, Go module downloads, llama.cpp recompiles and Python wheel
resolution.

Wires actions/cache@v4 into the reusable workflow for four caches:

- Go modules + build cache (setup-go cache: true), shared across matrix
- Homebrew downloads + selected /opt/homebrew/Cellar entries, with
  HOMEBREW_NO_AUTO_UPDATE so restored Cellar paths stay stable
- ccache for the llama-cpp CMake variants, keyed on the pinned
  LLAMA_VERSION; CMAKE_*_COMPILER_LAUNCHER is exported via GITHUB_ENV
  so backend/cpp/llama-cpp/Makefile picks it up without script changes
- Python uv + pip wheel cache, keyed by backend + ISO week — same
  one-cold-rebuild-per-week cadence as the Linux DEPS_REFRESH

Read/write semantics match the existing BuildKit policy: every run
restores, only master/tag pushes save, so PRs can't pollute master's
warm cache.

Documents the new caches and the macOS-specific constraints in
.agents/ci-caching.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
2026-04-27 20:17:36 +00:00
Ettore Di Giacinto
6b63b47f61 feat(distributed): support multiple replicas of one model on the same node (#9583)
* feat(distributed): support multiple replicas of one model on the same node

The distributed scheduler implicitly assumed `(node_id, model_name)` was
unique, but the schema didn't enforce it and the worker keyed all gRPC
processes by model name alone. With `MinReplicas=2` against a single
worker, the reconciler "scaled up" every 30s but the registry never
advanced past 1 row — the worker re-loaded the model in-place every tick
until VRAM fragmented and the gRPC process died.

This change introduces multi-replica-per-node as a first-class concept,
with capacity-aware scheduling, a circuit breaker, and VRAM
soft-reservation. Operators can declare per-node capacity via the worker
flag `--max-replicas-per-model` (mirrored as auto-label
`node.replica-slots=N`) or override per-node from the UI.

* Schema: BackendNode gains MaxReplicasPerModel (default 1) and
  ReservedVRAM. NodeModel gains ReplicaIndex (composite with node_id +
  model_name). ModelSchedulingConfig gains UnsatisfiableUntil/Ticks for
  the reconciler circuit breaker.

* Registry: replica_index threaded through SetNodeModel, RemoveNodeModel,
  IncrementInFlight, DecrementInFlight, TouchNodeModel, GetNodeModel,
  SetNodeModelLoadInfo and the InFlightTrackingClient. New helpers:
  CountReplicasOnNode, NextFreeReplicaIndex (with ErrNoFreeSlot),
  RemoveAllNodeModelReplicas, FindNodesWithFreeSlot,
  ClusterCapacityForModel, ReserveVRAM/ReleaseVRAM (atomic UPDATE with
  ErrInsufficientVRAM), and the unsatisfiable-flag CRUD.

* Worker: processKey now `<modelID>#<replicaIndex>` so concurrent loads
  of the same model land on distinct ports. Adds CLI flag
  --max-replicas-per-model (env LOCALAI_MAX_REPLICAS_PER_MODEL, default 1)
  and emits the auto-label.

* Router: scheduleNewModel filters candidates by free slot, allocates the
  replica index, and soft-reserves VRAM before installing the backend.
  evictLRUAndFreeNode now deletes the targeted row by ID instead of all
  replicas of the model on the node — fixes a latent bug where evicting
  one replica orphaned its siblings.

* Reconciler: caps scale-up at ClusterCapacityForModel so a misconfig
  (MinReplicas > capacity) doesn't loop forever. After 3 consecutive
  ticks of capacity==0 it sets UnsatisfiableUntil for a 5m cooldown and
  emits a warning. ClearAllUnsatisfiable fires from Register,
  ApproveNode, SetNodeLabel(s), RemoveNodeLabel and
  UpdateMaxReplicasPerModel so a new node joining or label changes wake
  the reconciler immediately. scaleDownIdle removes highest-replica-index
  first to keep slots compact.

* Heartbeat resets reserved_vram to 0 — worker is the source of truth
  for actual free VRAM; the reservation is only for the in-tick race
  window between two scheduling decisions.

* Probe path (reconciler.probeLoadedModels and health.doCheckAll) now
  pass the row's replica_index to RemoveNodeModel so an unreachable
  replica doesn't orphan healthy siblings.

* Admin override: PUT /api/nodes/:id/max-replicas-per-model sets a
  sticky override (preserved across worker re-registration). DELETE
  clears the override so the worker's flag applies again on next
  register. Required because Kong defaults the worker flag to 1, so
  every worker restart would have silently reverted the UI value.

* React UI: always-visible slot badge on the node row (muted at default
  1, accented when >1); inline editor in the expanded drawer with
  pencil-to-edit, Save/Cancel, Esc/Enter, "(override)" indicator when
  the value is admin-set, and a "Reset" button to hand control back to
  the worker. Soft confirm when shrinking the cap below the count of
  loaded replicas. Scheduling rules table gets an "Unsatisfiable until
  HH:MM" status badge surfacing the cooldown.

* node.replica-slots filtered out of the labels strip on the row to
  avoid duplicating the slot badge.

23 new Ginkgo specs (registry, reconciler, inflight, health) cover:
multi-replica row independence, RemoveNodeModel of one replica
preserving siblings, NextFreeReplicaIndex slot allocation including
ErrNoFreeSlot, capacity-gated scale-up with circuit breaker tripping
and recovery on Register, scheduleDownIdle ordering, ClusterCapacity
math, ReserveVRAM admission gating, Heartbeat reset, override survival
across worker re-registration, and ResetMaxReplicasPerModel handing
control back. Plus 8 stdlib tests for the worker processKey / CLI /
auto-label.

Closes the flap reproduced on Qwen3.6-35B against the nvidia-thor
worker (single 128 GiB node, MinReplicas=2): the reconciler now caps
the scale-up at the cluster's actual capacity instead of looping.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Read] [Edit] [Bash] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:golang-testing]

* refactor(react-ui/nodes): tighten capacity editor copy + adopt ActionMenu for row actions

* Capacity editor hint trimmed from operator-doc-style ("Sourced from
  the worker's `--max-replicas-per-model` flag. Changing it here makes it
  a sticky admin override that survives worker restarts." → "Saved
  values stick across worker restarts.") and the override-state copy
  similarly compressed. The full mechanic is no longer needed in the UI
  — the override pill carries the meaning and the docs cover the rest.

* Node row actions migrated from an inline cluster of icon buttons
  (Drain / Resume / Trash) to the kebab ActionMenu used by /manage for
  per-row model actions, so dense Nodes tables stay clean. Approve
  stays as a prominent primary button — it's a stateful admission gate,
  not a routine action, and elevating it matches how /manage surfaces
  install-time decisions outside the menu.

* The expanded drawer's Labels section now filters node.replica-slots
  out of the editable label list. The label is owned by the Capacity
  editor above; surfacing it again as an editable label invited
  confusion (the Capacity save would clobber any direct edit).

Both backend and agent workers benefit — they share the row rendering
path, so the action menu and label filter apply to both.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:critique] [Skill:audit] [Skill:polish]

* fix(react-ui/nodes): suppress slot badge on agent workers

Agent workers don't load models, so the per-node replica capacity is
inapplicable to them. Showing "1× slots" on agent rows was a tiny
inconsistency from the unified rendering path — gate the badge on
node_type !== 'agent' so it only appears on backend workers.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp]

* refactor(react-ui/nodes): distill expanded drawer + restyle scheduling form

The expanded node drawer used to stack five panels — slot badge,
filled capacity box, Loaded Models h4+empty-state, Installed Backends
h4+empty-state, Labels h4+chips+form — making routine inspections feel
like a control panel. The scheduling rule form wrapped its mode toggle
as two 50%-width filled buttons that competed visually with the actual
primary action.

* Drawer: collapse three rarely-touched config zones (Capacity,
  Backends, Labels) into one `<details>` "Manage" disclosure (closed by
  default) with small uppercase eyebrow labels for each zone instead of
  parallel h4 sub-headings. Loaded Models stays as the at-a-glance
  headline with a single-line empty hint instead of a boxed empty state.
  CapacityEditor renders flat (no filled background) — the Manage
  disclosure provides framing.

* Scheduling form: replace the chunky 50%-width button-tabs with the
  project's existing `.segmented` control (icon + label, sized to
  content). Mode hint becomes a single tied line below. Fields stack
  vertically with helper text under inputs and a hairline divider above
  the right-aligned Save / Cancel.

The empty drawer collapses from ~5 stacked sections (~280px tall) to
two lines (~80px). The scheduling form now reads as a designed dialog
instead of raw building blocks. Both surfaces now match the typographic
density and weight of the rest of the admin pages.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:distill] [Skill:audit] [Skill:polish]

* feat(react-ui/nodes): replace scheduling form's model picker with searchable combobox

The native <select> made operators scroll through every gallery entry to
find a model name. The project already has SearchableModelSelect (used
in Studio/Talk/etc.) which combines free-text search with the gallery
list and accepts typed model names that aren't installed yet — useful
for pre-staging a scheduling rule before the node it'll run on has
finished bootstrapping.

Also drops the now-unused useModels import (the combobox manages the
gallery hook internally).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit]

* refactor(react-ui/nodes): consolidate key/value chip editor + add replica preset chips

The Nodes page was rendering the same key=value chip pattern in two
places with subtly different markup: the Labels editor in the expanded
drawer and (post-distill) the Node Selector input in the scheduling
form. The form's input was also a comma-separated string that operators
were getting wrong.

* Extract <KeyValueChips> as a fully controlled chip-builder. Parent
  owns the map and decides what onAdd/onRemove does — form state for the
  scheduling form, API calls for the live drawer Labels editor. Same
  visuals everywhere; one component to change when polish needs apply.

* Replace the comma-separated Node Selector text input with KeyValueChips.
  Operators were copying syntax from docs and missing commas; the chip
  vocabulary makes the key=value structure self-documenting.

* Add <ReplicaInput>: numeric input + quick-pick preset chips for Min/Max
  replicas. Picked over a slider because replica counts are exact specs
  derived from VRAM math (operator decision, not a fuzzy estimate). The
  chips give one-click access to common values (1/2/3/4 for Min,
  0=no-limit/2/4/8 for Max) without the slider's special-value problem
  (MaxReplicas=0 is categorical, not a position on a continuum).

* Drop the now-unused labelInputs state in the Nodes page (the inline
  label editor's per-node draft state lived there and is now owned by
  KeyValueChips).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:distill]

* test: fix CI fallout from multi-replica refactor (e2e/distributed + playwright)

Two breakages caught by CI that didn't surface in the local run:

* tests/e2e/distributed/*.go — multiple files used the pre-PR2 registry
  signatures for SetNodeModel / IncrementInFlight / DecrementInFlight /
  RemoveNodeModel / TouchNodeModel / GetNodeModel / SetNodeModelLoadInfo
  and one stale adapter.InstallBackend call in node_lifecycle_test.go.
  All updated to pass replicaIndex=0 — these tests don't exercise
  multi-replica behavior, they just need to compile against the new
  signatures. The chip-builder tests in core/services/nodes/ already
  cover the multi-replica logic.

* core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js — the
  drawer's distill refactor moved Backends inside a "Manage" <details>
  disclosure that's collapsed by default. The test helper expanded the
  node row but never opened Manage, so the per-node backend table was
  never in the DOM. Helper now clicks `.node-manage > summary` after
  expanding the row.

All 100 playwright tests pass locally; tests/e2e/distributed compiles
clean.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 21:20:05 +02:00
Ettore Di Giacinto
f4036fa83f ci(python-backends): add weekly DEPS_REFRESH cache-buster
The shared backend/Dockerfile.python ends in:
    RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.

DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.

Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.

Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 14:21:11 +00:00
Ettore Di Giacinto
3810fe1a1e fix(distributed): worker container healthcheck always unhealthy
The Dockerfile's HEALTHCHECK probes http://localhost:8080/readyz, which
is the OpenAI API server port. When the same image runs as a worker, it
listens on the gRPC base port (50051) and an HTTP file transfer server
on port-1 (50050) — nothing on 8080 — so docker always reports the
container as unhealthy.

Add unauthenticated /readyz and /healthz endpoints to the worker's HTTP
file transfer server, and override HEALTHCHECK_ENDPOINT for worker-1 in
the distributed compose file. Disable the healthcheck for agent-worker
since it is NATS-only and exposes no HTTP server.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 13:51:57 +00:00
Ettore Di Giacinto
bdfa5e934a ci: switch image/backend build cache to a dedicated registry image
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
  from the unused gha cache to type=registry pointing at
  quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
  ignore-error=true. Master/tag builds populate their own
  per-matrix-entry cache; PR builds read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
  Dockerfile stage that was removed by b1fc5acd in July 2025, has been
  failing every night since, and never populated the gha cache. The new
  registry-cache scheme is self-warming, so no separate populator is
  needed.
- Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS
  build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in
  the root Dockerfile (the root Dockerfile no longer compiles gRPC; the
  source build now lives in backend/Dockerfile.{llama-cpp,
  ik-llama-cpp, turboquant} only and uses its own ARG defaults).
- Drop the unused grpc-base-image input from image_build.yml plus the
  matrix passthroughs in image.yml / image-pr.yml.
- Drop the unused GRPC_VERSION env in test.yml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 13:13:04 +00:00
Richard Palethorpe
deca6dbdad feat: Log backend exit code (#9581)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-27 14:19:18 +02:00
Ettore Di Giacinto
60549a8a60 feat(react-ui): page-width archetype system + mobile/tablet nav polish
Replace the universal max-width:1200px cap on .page with a four-tier
archetype system (narrow 760, medium 1080, default 1600, wide unbounded)
selected per page based on what its UX actually wants. Data/table pages
fill ultrawide displays; forms cap at reading width; tabbed feature
surfaces breathe.

Mobile/tablet:
- New 640/1024 breakpoint split. Tablets (640-1023) get a persistent
  52px icon rail; below 640 keeps the slide-off drawer.
- Drawer polish: body-scroll lock, Escape to close, focus moves into
  the drawer on open and back to the hamburger on close, aria-hidden
  + inert on main while open.
- Mobile top bar carries hamburger + theme toggle + account avatar
  (44x44 touch targets) so theme/account aren't trapped in the drawer.
- Page-level reflow on phones: page-header column-stacks, filter chips
  scroll horizontally, tables go edge-to-edge, OperationsBar overflows
  rather than wrapping. Honors prefers-reduced-motion.

Manage > Models: drop the toggle column; Enable/Disable joins the
per-row Actions menu alongside Stop/Pin/Edit/Logs/Delete for
consistency with the other action verbs.

Page-width tokens live in theme.css so future tuning is one line.
Removes 7 inline maxWidth workarounds from page roots.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Edit] [Bash]
2026-04-27 11:51:29 +00:00
Ettore Di Giacinto
54728e292f feat(react-ui): split Manage backends toggle into Variants and Development
Meta backends are now always shown — they're the entries operators
configure against — and two independent toggles govern the noise around
them. "Variants" hides platform-specific concrete builds that a meta
backend aliases on the host (e.g. llama-cpp-cuda12-12.4). "Development"
hides pre-release `-development` builds. Each toggle shows the count of
items currently hidden in its category. The legacy `bm` URL flag is
honored on read so existing deep-links resolve to the same view they
used to.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 08:23:53 +00:00
Tai An
86fd62233f fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) (#9580)
The overrides.parameters.model field referenced 'Qwen3.-27B-Claude-...' (missing the '5'), so model loads failed because the configured filename did not match the file actually downloaded by the entry's files: list ('Qwen3.5-27B-Claude-...').

Aligns the override filename with the files: entries and with the upstream HF repo (mradermacher/Qwen3.5-27B-...).
2026-04-27 09:24:00 +02:00
Alex Brick
41ed8ced70 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) (#9578)
Update additional intel base images
2026-04-27 09:18:57 +02:00
LocalAI [bot]
05e94bd9e7 chore: ⬆️ Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 (#9577)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-27 08:57:24 +02:00
Ettore Di Giacinto
8d124d080f feat(gallery): add whisper-development umbrella stanza
Mirrors the whisper capabilities map with -development variants so
clients can pull the master-tagged whisper.cpp backend via a single
platform-resolved name, matching the existing faster-whisper-development
and whisperx-development entries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 23:04:27 +00:00
Ettore Di Giacinto
2da1a4d230 feat(distributed): per-node backend installation from the gallery
In distributed mode the Backends gallery used to fan every install out to
every worker — fine for auto-resolving (meta) backends like llama-cpp where
each node picks its own variant, but wrong for hardware-specific builds
like cpu-llama-cpp that would silently land on every GPU node.

Adds a node-targeted install path through the existing
POST /api/nodes/:id/backends/install plumbing, with two entry points:

- Backends gallery row gets a split-button in distributed mode. Auto-
  resolving keeps "Install on all nodes" as the primary; chevron menu
  opens the picker. Hardware-specific routes the primary directly to the
  picker — no fan-out path on the row.
- Nodes-page drawer gets a "+ Add backend" button that navigates to
  /app/backends?target=<node-id>; the gallery scopes itself to that node
  (banner, single per-row install button, Reinstall/Remove for already-
  installed). One gallery, two scopes — no second UI to maintain.

The picker (new NodeInstallPicker) shows a 3-state suitability column
(Compatible / Override / Installed), an auto-expanding variant override
disclosure that fires when selected nodes have no working GPU, parallel
per-node installs with inline status and Retry-failed-nodes, and a
mismatch confirm that names the consequence on the button itself.

A 409 fan-out guard on /api/backends/apply protects CLI/Terraform/script
users from the same footgun: hardware-specific installs in distributed
mode now return code "concrete_backend_requires_target" with a human-
readable error and a meta_alternative pointer.

The gallery list payload now surfaces capabilities, metaBackendFor and
per-row nodes (NodeBackendRef) so the picker and the new Nodes column
have everything they need without re-walking the gallery client-side.

GODEBUG=netdns=go is set on the compose services because the cgo DNS
resolver follows the container's nsswitch.conf to host systemd-resolved
(127.0.0.53), unreachable from inside the container; the pure-Go
resolver reads /etc/resolv.conf directly and uses Docker's embedded DNS.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7[1m] [Edit] [Bash] [Read] [Write]
2026-04-26 22:05:18 +00:00
Ettore Di Giacinto
988430c850 test(react-ui): drive Manage page Backend logs link via the new kebab menu
Manage page row actions moved into ActionMenu in b336d9c6, so the
inline `<a title="Backend logs">` the e2e specs were asserting on no
longer exists. Open the row's kebab and assert against the menuitem.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
2026-04-26 20:51:01 +00:00
Ettore Di Giacinto
b336d9c626 feat(react-ui): polish Manage page with kebab menus and gallery rows
Bring the System / Manage page up to the visual standard of the Install
gallery so installed models and backends stop reading like a debug dump.

- Unified ResourceRow anatomy (icon, name+description, badges, status,
  expandable detail) shared across both tabs.
- Gallery enrichment cross-references installed names against the gallery
  list endpoints to surface icons, descriptions, license, tags, and links
  with a graceful "no description" fallback for custom imports.
- Header summary with four StatCards (Models / Backends / Running /
  Updates) — clickable to switch tab + pre-set filter.
- Backends meta + development entries hidden by default; "Show meta &
  development" paired toggle in the FilterBar with hidden-count hint.
- Kebab (three-dot) ActionMenu replaces the inline button cluster on
  every row; restrained until hover, keyboard-navigable, danger items
  separated by a divider.
- Backend "Version" cell now falls back to short digest, OCI tag, or
  ocifile basename when no semver is set, instead of showing "—" for
  every OCI install. Detail panel exposes full Source URI + Digest.
- Drop redundant column headers ("Actions", "On") — kebabs and toggles
  carry their own affordance; screen readers still get a label.
- Inline System / User / Meta / Dev badges next to the backend name so
  the dedicated Type column doesn't reserve space for "USER" repeated.
- Tightened the spacing between the System Resources card and the
  StatCards so they no longer crowd the RAM bar.

Extracted StatCard and GalleryLoader from Nodes.jsx and Models.jsx into
shared components so the visual language is one source of truth.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
2026-04-26 20:33:49 +00:00
Ettore Di Giacinto
f384c64a91 fix(model-loader): also skip .ckpt, .zip, and .tag files when scanning models
The local model directory scan treats every non-skipped file as a model
config candidate. Sidecar artifacts that ship alongside checkpoints
(checkpoint blobs, downloaded archives, ggml-style tag files) were
slipping through and showing up as bogus models in the listing. Add
their extensions to the suffix-skip list.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:37:53 +00:00
Ettore Di Giacinto
e9d8e92988 fix(react-ui): don't yank chat scroll to bottom while user is reading
The chat and agent-chat pages auto-scrolled to the bottom on every
streamed token. If the user scrolled up to re-read part of a response,
the next chunk pulled them back down — making long replies unreadable
while streaming.

Track a stickToBottomRef on each scroll event: if the user is within
80px of the bottom we keep auto-scrolling, otherwise we leave them
where they are. On chat switch we snap back to the bottom and re-pin.

Same fix applied to both Chat.jsx and AgentChat.jsx since they share
the same streaming pattern.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
5b0196c7d0 fix(whisper): scrub invalid UTF-8 from segment text before protobuf marshal
whisper.cpp can emit bytes that are not valid UTF-8 — typically a
multibyte codepoint split across token boundaries. protobuf string
fields reject those at marshal time, which would surface as a transcribe
failure. Run strings.ToValidUTF8 on the segment text before it leaves
the cgo boundary so the bad byte gets replaced with U+FFFD.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
c8d63a1003 fix(react-ui): stop Manage page from blanking on auto-refresh; show real model use cases
- useModels.refetch now runs silently — distributed-mode 10s auto-refresh
  no longer flips loading=true and replaces the table with a spinner card.
- Manage Use Cases column derives badges from each model's actual
  capabilities (Chat / Image / TTS / Embeddings / etc.) instead of
  hardcoding a "Chat" link for every row.
- FilterBar right slot is right-aligned via margin-left:auto so the
  Update button lives at the end of the row, not next to the chips.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
LocalAI [bot]
d9cb0d6133 chore: ⬆️ Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 (#9572)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:35 +02:00
LocalAI [bot]
f5c268deac chore: ⬆️ Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a (#9571)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:25 +02:00
Tai An
8931a2ad31 fix(gallery): normalize inconsistent tag casing/plurals across gallery models (#9574)
- embeddings → embedding (6 models): aligns with the WebUI filter button
  defined in core/http/views/models.html ({ term: 'embedding', ... }), so
  models like nomic-embed-text-v1.5 now appear under the Embedding filter
- TTS → tts (5 models), ASR → asr (2 models): lowercase, per existing
  convention used by 161+ models
- CPU/Cpu → cpu (17 models), GPU → gpu (17 models): lowercase, per existing
  convention used by 666+ models
- dedupe duplicate tag entries on 3 models that already had repeated tags
  (gpt-oss-20b had gguf x2; arcee-ai/AFM-4.5B had gpu x2; one Qwen model
  had default x2)

Closes #9247
2026-04-26 08:33:38 +02:00
Ettore Di Giacinto
e16e758dff ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 (#9573)
Extend the existing CPU build matrix entries to produce a multi-arch
manifest (linux/amd64,linux/arm64) at the same image tags. arm64
Linux hosts without an NVIDIA GPU report the "default" capability,
which already maps to cpu-whisperx / cpu-faster-whisper in
backend/index.yaml -- so the manifest list lets Docker pull the right
variant without any gallery changes.

Both stacks install cleanly under aarch64: torch (2.4.1/2.8.0),
faster-whisper, ctranslate2, whisperx, opencv-python and the
remaining deps all ship manylinux2014_aarch64 wheels, so no source
builds run under QEMU emulation.

Follows the same pattern already used by cpu-llama-cpp-quantization.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 08:30:03 +02:00
LocalAI [bot]
1c45227346 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 (#9570)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 08:26:37 +02:00
Ettore Di Giacinto
fbe4f0a99b fix(docs): replace Docsy alert shortcode with Relearn notice
The docs site uses the hugo-theme-relearn theme, which provides
`notice` instead of Docsy's `alert`. The face-recognition,
voice-recognition, and stores feature pages used `{{% alert %}}`,
breaking `hugo build` with "template for shortcode \"alert\" not
found".

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 21:04:31 +00:00
Ettore Di Giacinto
d733c9cd13 fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds (#9568)
Blaizzy/mlx-vlm git HEAD bumped its constraint to mlx>=0.31.2, but
mlx-cuda-12 and mlx-cuda-13 are only published up to 0.31.1 on PyPI.
Since mlx[cudaXX]==0.31.2 forces a sibling wheel that doesn't exist,
pip backtracks through every older mlx[cudaXX], none of which satisfy
mlx>=0.31.2, producing ResolutionImpossible.

Pin all variants to the v0.4.4 tag (mlx>=0.30.0), which resolves
cleanly against mlx[cuda13]==0.31.1. cpu/mps weren't broken yet but
are pinned for consistency.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 22:06:01 +02:00
Ettore Di Giacinto
703b4fcae8 Change cron schedule to run every 12 hours
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-25 18:38:28 +02:00
Richard Palethorpe
73aacad2f9 fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

  ImportError: .../flash_attn_2_cuda...so: undefined symbol:
  _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-25 15:38:13 +00:00
LocalAI [bot]
806ea24ff4 chore: ⬆️ Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d (#9497)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:46 +02:00
LocalAI [bot]
385de3705e chore(model gallery): 🤖 add 1 new models via gallery agent (#9558)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:15 +02:00
Ettore Di Giacinto
21eace40ec feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.


Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 14:02:57 +02:00
Ettore Di Giacinto
24505e57f5 feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)
* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang

Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM
serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and
JetPack 7 boards:

- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130
  prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm
  0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master;
  cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only — uses the prebuilt sglang wheel from the
  jetson-ai-lab SBSA cu130 index, so no source build is needed.
  Cublas13 sglang on x86_64 is intentionally deferred.

CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni},
-nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml
gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13)
and latest/development merge entries.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

* fix(backends): use unsafe-best-match index strategy on l4t13 builds

The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.)
at limited versions / older Python ABIs. uv defaults to the first index
that contains a package and refuses to fall through to PyPI, so sglang
l4t13 build fails resolving decord. Mirror the existing cpu sglang
profile by setting --index-strategy=unsafe-best-match on l4t13 across
the three backends, and apply it to the explicit vllm install line in
vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]

* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0

The [all] extra brings in outlines→decord, and decord has no aarch64
cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37
tags). With unsafe-best-match enabled, uv backtracked through sglang
versions trying to satisfy decord and silently landed on
sglang==0.1.16, an ancient version with an entirely different dep
tree (cloudpickle/outlines 0.0.44, etc.).

Drop [all] so decord is no longer required, and floor sglang at 0.5.0
to prevent any future resolver misfire from degrading the version
again.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 12:26:29 +02:00
LocalAI [bot]
d09706dc60 chore(model gallery): 🤖 add 1 new models via gallery agent (#9555)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 09:00:37 +02:00
LocalAI [bot]
08e393f7db chore: ⬆️ Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 (#9549)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:59:48 +02:00
LocalAI [bot]
47cc3dc8d7 chore: ⬆️ Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 (#9548)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:58:27 +02:00
Ettore Di Giacinto
83b384de97 feat: surface distributed backend management errors (#9552)
* fix(distributed): surface per-node backend op errors to OpStatus

DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the
per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`.
When workers replied Success=false (e.g. an OCI image with no arm64
variant on a Jetson host), the per-node Error string was recorded in
result.Nodes[].Error but never reached the toplevel return value, so
OpStatus.Error stayed empty and the UI reported the install as
"completed" while the backend was nowhere on the cluster.

Add BackendOpResult.Err() that aggregates per-node Status=="error"
entries into a single error. Queued nodes (waiting for reconciler retry)
are deliberately not treated as failures. Wire the three callers and
DeleteBackendDetailed to call result.Err() so reply.Success=false
finally reaches OpStatus.Error → /api/backends/job/:uid → the UI.

The Delete closures had a related bug: they discarded the reply with
`_` and only checked the NATS round-trip error, so reply.Success=false
was a silent success even with the new aggregation. Check both.

Standalone mode (LocalBackendManager) already surfaces gallery errors
correctly through the same OpStatus.Error path; no change needed there.

Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct
errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]

* feat(react-ui): per-node backend delete + clearer upgrade affordance

The Nodes page exposed a per-node "reinstall" button (fa-sync-alt,
tooltip "Reinstall backend") but no per-node delete, even though the
Go side has had POST /api/nodes/:id/backends/delete →
RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up
for a while. Sync icons read as "refresh data" — the action is
functionally an upgrade (re-pulls the gallery image), so the affordance
was misleading.

Per-node backend row now renders two icon buttons:

- Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend
  on this node". Names both action and scope to differentiate from the
  cluster-wide upgrade on the Backends page.
- Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend
  from this node". Matches the node-level destructive style at the row
  action column rather than the solid btn-danger of primary destructive
  pages, since this is a secondary action inside a busy row.

Delete goes through the existing ConfirmDialog (danger=true) with copy
that names the backend and the node explicitly — it's a non-recoverable
op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which
already existed in the API client.

Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip),
delete button presence, confirm dialog flow with POST body assertion,
and cancel-doesn't-POST.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
2026-04-25 08:57:59 +02:00
Ettore Di Giacinto
487e3fd2a4 feat(react-ui): editorial refresh with Nord palette and polished primitives (#9550)
* feat(react-ui): editorial refresh with Nord palette and polished primitives

Replaces the cool gray-blue theme with a deep Nord-inspired palette:
frost-cyan accent (#88c0d0) on deep blue-black surfaces (#13171f /
#1a1f2a / #242a36), snow-storm text scale, aurora status colours.

- Typography: Geist Variable + Geist Mono Variable (Google Fonts) with
  ss01/ss03/cv11 stylistic alternates; strengthened h1-h6 hierarchy;
  editorial negative tracking.
- Primitives: buttons gain depth (inset highlight + hover lift +
  brightness filter); inputs become sunken wells with sage-swap-to-frost
  focus rings; cards hover-lift and gain an .card--accent left-rail
  variant; badges become mono caps rectangles with tabular-nums.
- Chrome: sidebar active state is now an inset left rail + tint
  (no border-left); modals get popIn animation and proper shadow lift;
  toasts carry an inset accent bar + slide-in instead of tinted fills;
  operations bar breathes on active installs.
- Empty states: editorial pattern (eyebrow rule, large mono title,
  52ch lede) that inherits gracefully even without page JSX edits.
- Chat: assistant bubbles drop the gray-nested-in-gray card for a
  transparent pull-quote with a left border; user bubbles soften from
  loud accent fill to a subtle frost tint.
- Motion: custom spring easing cubic-bezier(0.22,1,0.36,1), 180ms
  standard; breathing/pulse/popIn keyframes; global prefers-reduced-
  motion honoring.
- Radii tightened to 3/5/8/10px; warm-shadow tokens redone for cool
  depth; ::selection, :focus-visible, kbd globals added.
- Migrated hardcoded 'JetBrains Mono' CSS literals to var(--font-mono)
  so the Geist Mono swap lands everywhere.

Scope is intentionally tokens + primitives only. Page JSX and the
~1,800 inline style={{…}} instances are untouched and flagged as
follow-ups.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]

* feat(react-ui): complete-coverage pass — migrate inline styles to tokens

Follows up the editorial/Nord token refresh with a mechanical sweep of
page JSX and shared components so nothing bypasses the design system.

- Font family: replaced 80+ 'JetBrains Mono' / 'Space Grotesk' inline
  literals (and the string-CSS variants in CollectionDetails and
  AgentStatus) with var(--font-mono) / var(--font-sans). SVG <text>
  nodes that used the attribute form were switched to style={{ }} so
  the CSS variable resolves.
- Radii: every unquoted numeric borderRadius (2/3/4/10) is now a
  var(--radius-*) token; 50% and 999px kept as computed shapes.
- Spacing: clean-token gaps and margins (4/8/16px) moved to
  var(--spacing-xs/sm/md); padding: '4px 8px' and '8px 16px' lifted
  into token pairs. Micro-values (2/6/10/12px) left inline where no
  token maps cleanly.
- Colors: Talk.jsx button/canvas-surface hardcodes moved to
  var(--color-*); FineTune.jsx chart series colours now use the
  --color-data-* Nord palette (cyan/red/purple/orange instead of
  tailwind hex); AgentStatus tool-call icon and error tag hex swapped
  for var(--color-warning) / var(--color-text-inverse).
- CodeMirror editor (utils/cmTheme.js): both themes rebased on Nord —
  polar-night surfaces and aurora syntax highlighting (dark), snow-
  storm surfaces with darkened aurora (light). Caret/selection/active
  line/search now frost-cyan tinted instead of legacy indigo/purple.

Legitimately dynamic styles (computed widths, per-row colours, canvas
2D context fill/stroke for waveform and spectrogram drawing) remain
inline — they can't be expressed as CSS tokens.

29 files, +237/-237 — identity preserved, semantics re-anchored to
the token system.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]
2026-04-24 23:35:59 +02:00
dependabot[bot]
9ab3496de2 chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory (#9546)
chore(deps): bump rustls-webpki

Bumps the cargo group with 1 update in the /backend/rust/kokoros directory: [rustls-webpki](https://github.com/rustls/webpki).


Updates `rustls-webpki` from 0.103.10 to 0.103.13
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.10...v/0.103.13)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-version: 0.103.13
  dependency-type: indirect
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:58 +02:00
dependabot[bot]
c4511be33a chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9544)
chore(deps): bump postcss

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [postcss](https://github.com/postcss/postcss).


Updates `postcss` from 8.5.8 to 8.5.10
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.10)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.10
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:41 +02:00
Ettore Di Giacinto
551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
  (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
  manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
  `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
  `/sys/devices/soc0/soc_id`.

* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
  `/proc/device-tree/model` was missing. That's what happens for Thor inside a
  docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
  resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
  (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
  correctly.

* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
  usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
  with `waitid: no child processes` under containers without `--init`. Each
  such heartbeat overwrote the DB and made the UI flip to "fully used".
  `heartbeatBody` now omits `available_vram` in that case so the DB keeps its
  last good value.

Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]
2026-04-24 22:02:23 +02:00
Andreas Egli
1d0de757c3 fix: add hipblaslt library (#9541)
Signed-off-by: Andreas Egli <github@kharan.ch>
2026-04-24 18:50:03 +02:00
Alex Brick
e5337039b0 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (#9543)
* Use latest oneapi-basekit image for Intel images

The current `localai/localai:master-gpu-intel` images don't work with the intel arc pro b70. Updating the base_image to 2025.3.2 fixes it.

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>

* Update github workflow base image

---------

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>
2026-04-24 18:29:10 +02:00
LocalAI [bot]
1c9592c77f chore: ⬆️ Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a (#9521)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-24 15:15:15 +02:00
Richard Palethorpe
3db60b57e6 fix(realtime): consume ChatDeltas when C++ autoparser clears Response (#9538)
The llama.cpp C++-side chat autoparser clears Reply.Message and delivers
parsed content/reasoning/tool-calls via Reply.chat_deltas. chat.go handles
this (non-SSE path uses ToolCallsFromChatDeltas/ContentFromChatDeltas/
ReasoningFromChatDeltas), but realtime.go only read pred.Response, so any
model routed through the autoparser (Qwen2.5/3 and friends) produced a
silent reply: backend emitted N tokens, the session surface saw zero.

Mirror the non-SSE chat path in realtime's triggerResponse: when deltas
carry tool calls or content, use them directly; otherwise fall back to
the existing raw-text parsing.

Assisted-by: claude-opus-4-7-1M [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-24 14:41:38 +02:00
Richard Palethorpe
13734ae9fa feat: Add Sherpa ONNX backend for ASR and TTS (#8523)
feat(backend): Add Sherpa ONNX backend and Omnilingual ASR

Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same
approach as opus/stablediffusion-ggml/whisper — a thin C shim
(csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego
can't reach directly: nested struct config writes, result-struct field
reads, and the streaming TTS callback trampoline. The Go side uses
opaque uintptr handles and purego.NewCallback for the TTS callback.

Supports:
- VAD via sherpa-onnx's Silero VAD
- Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC
- Online/streaming ASR: zipformer transducer with endpoint detection
  (AudioTranscriptionStream emits delta events during decode)
- Offline TTS: VITS (LJS, etc.)
- Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel,
  prefixed by a streaming WAV header

Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline
ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR),
silero-vad-sherpa, vits-ljs-sherpa.

E2E coverage: tests/e2e-backends for offline + streaming ASR,
tests/e2e for the full realtime pipeline (VAD + STT + TTS).

Assisted-by: claude-opus-4-7-1M [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-24 14:40:06 +02:00
Ettore Di Giacinto
c0920f3273 fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature (#9531)
Bumps ik_llama.cpp pin to 16996aeab7. Upstream 286ce32...16996ae adds a
trailing `const struct quantize_user_data *` parameter to
`ggml_quantize_chunk` (PR ikawrakow/ik_llama.cpp#1677) but leaves
`examples/llava/clip.cpp` unchanged because their build has moved to
`examples/mtmd/`. LocalAI's prepare.sh still copies from
`examples/llava/`, so the dead 7-arg call reaches the grpc-server
compile and fails. Patch the call site to pass `nullptr` for the new
param.

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]
2026-04-24 13:07:26 +02:00
LocalAI [bot]
7c1934b183 chore: ⬆️ Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 (#9520)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-24 09:17:06 +02:00
Tai An
5e062b4d1f fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) (#9526)
* fix(anthropic): use SetFunctionCallNameString for specific tool forcing

* fix(openai/realtime): use SetFunctionCallNameString for specific tool forcing

* fix(openresponses): use SetFunctionCallNameString for specific tool forcing
2026-04-24 09:06:42 +02:00
Ettore Di Giacinto
4906cbad04 feat: add biometrics UI (#9524)
* feat(react-ui): add Face & Voice Recognition pages

Expose the face and voice biometrics endpoints
(/v1/face/*, /v1/voice/*) through the React UI. Each page has four
tabs driving the six endpoints per modality: Analyze (demographics
with bounding boxes / waveform segments), Compare (verify with a
match gauge and live threshold slider), Enrollment (register /
identify / forget with a top-K matches view), Embedding (raw
vector inspector with sparkline + copy).

MediaInput supports file upload plus live capture: webcam
snap-to-canvas for face, MediaRecorder -> AudioContext ->
16-bit PCM mono WAV transcode for voice (libsndfile on the
backend only handles WAV/FLAC/OGG natively).

Sidebar gets a new Biometrics section feature-gated on
face_recognition / voice_recognition; routes are wrapped in
<RequireFeature>. No new dependencies -- Font Awesome icons
picked from the Free set.

Assisted-by: Claude:Opus 4.7

* fix(localai): accept data URI prefixes with codec/charset params

Browser MediaRecorder produces data URIs like
  data:audio/webm;codecs=opus;base64,...
so the pre-';base64,' section can carry multiple parameter
segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go
and core/http/endpoints/localai/audio.go only matched exactly one
segment, so recordings straight from the React UI's live-capture
tab failed the strip and then tripped the base64 decoder on the
leading 'data:' literal, surfacing as
  "invalid audio base64: illegal base64 data at input byte 4"

Widened both regexes to `^data:[^,]+?;base64,` so any number of
';param=value' segments between the mime type and ';base64,' are
tolerated. Added a regression test covering the MediaRecorder
shape.

Assisted-by: Claude:Opus 4.7

* fix(insightface): scope pack ONNX loading to known manifests

LocalAI's gallery extracts buffalo_* zips flat into the models
directory, which inevitably mixes with ONNX files from other
backends (opencv face engine, MiniFASNet antispoof, WeSpeaker
voice embedding) and older buffalo pack installs. Feeding those
foreign files into insightface's model_zoo.get_model() blows up
inside the router -- it assumes a 4-D NCHW input and indexes
`input_shape[2]` on tensors that aren't shaped like a face model,
raising IndexError mid-load and leaving the backend unusable.

The router's dispatch isn't amenable to per-file try/except alone
(first-file-wins picks det_10g.onnx from buffalo_l even when the
user asked for buffalo_sc -- alphabetical order happens to favour
the wrong pack). Instead, ship an explicit manifest of the
upstream v0.7 pack contents and scope the glob to that when the
requested pack is known. The manifest is small and stable; future
packs can be added alongside or fall through to the tolerance
loop, which also swallows any remaining IndexError / ValueError
from foreign files with a clear `[insightface] skipped` stderr
line for diagnostics.

Assisted-by: Claude:Opus 4.7

* fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders

Pre-exported speaker-encoder ONNX graphs come in two shapes:

  rank-2  [batch, samples]           -- some 3D-Speaker exports,
                                        take raw waveform directly.
  rank-3  [batch, frames, n_mels]    -- WeSpeaker and most Kaldi-
                                        lineage encoders, expect
                                        pre-computed Kaldi FBank.

OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` --
correct for rank-2, IndexError-on-input_shape[3] on rank-3, which
surfaced to the UI as
  "Invalid rank for input: feats Got: 2 Expected: 3"

Detect the input rank at session init and run Kaldi FBank
(80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before
the forward pass when rank>=3. All knobs are configurable via
backend options for encoders that deviate from defaults.

torchaudio.compliance.kaldi is already in the backend's
requirements (SpeechBrain pulls torchaudio in), so no new
dependency.

Assisted-by: Claude:Opus 4.7

* fix(biometrics): isolate face and voice vector stores

Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker
256-D) biometric embeddings were colliding inside a single
in-memory local-store instance. Enrolling one after the other
failed with
  "Try to add key with length N when existing length is M"
because local-store correctly refuses to mix dimensions in one
keyspace.

The registries were constructed with `storeName=""`, which in
StoreBackend() is just a WithModel() call. But ModelLoader's
cache is keyed on `modelID`, not `model` -- so both registries
collapsed to the same `modelID=""` slot and reused the same
backend process despite looking isolated on paper.

Three complementary fixes:

  1. application.go -- give each registry a distinct default
     namespace ("localai-face-biometrics" /
     "localai-voice-biometrics"). The comment claimed
     isolation, now it's actually enforced.

  2. stores.go -- pass the storeName as both WithModelID and
     WithModel so the ModelLoader cache key separates
     namespaces and the loader spawns distinct processes.

  3. local-store/store.go -- drop the Load() `opts.Model != ""`
     guard. It was there to prevent generic model-loading loops
     from picking up local-store by accident, but that auto-load
     path is being retired; the guard now just blocks legitimate
     namespace isolation. opts.Model is treated as a tag; the
     per-tuple process isolation upstream handles discrimination.

Assisted-by: Claude:Opus 4.7

* fix(gallery): stale-file cleanup and upgrade-tmp directory safety

Two related robustness fixes for backend install/upgrade:

pkg/downloader/uri.go
  OCI downloads passed through
      if filepath.Ext(filePath) != "" ...
          filePath = filepath.Dir(filePath)
  which was intended to redirect file-shaped download targets
  into their parent directory for OCI extraction. The heuristic
  misfires on directory-shaped paths with a dot-suffix --
  gallery.UpgradeBackend uses
      tmpPath = "<backendsPath>/<name>.upgrade-tmp"
  and Go's filepath.Ext treats ".upgrade-tmp" as an extension.
  The rewrite landed the extraction at "<backendsPath>/", which
  then **overwrote the real install** (backends/<name>/) with a
  flat-layout file and left a stray run.sh at the top level. The
  tmp dir itself stayed empty, so the validation step that
  checked "<tmpPath>/run.sh" predictably failed with
      "upgrade validation failed: run.sh not found in new backend"
  Every manual upgrade silently corrupted the backends tree this
  way. Guard the rewrite behind "target isn't already an existing
  directory" -- InstallBackend / UpgradeBackend both pre-create
  the target as a directory, so they get the correct behaviour;
  existing file-path callers with a genuine dot-extension still
  get the parent redirect.

core/gallery/backends.go
  InstallBackend's MkdirAll returned ENOTDIR when something at
  the target path was already a file (legacy dev builds dropped
  golang backend binaries directly at `<backendsPath>/<name>`
  instead of nesting them under their own subdir). That
  permanently blocked reinstall and upgrade for anyone carrying
  that state, since every retry hit the same error. Detect a
  pre-existing non-directory, warn, and remove it before the
  MkdirAll so the fresh install can write the correct nested
  layout with metadata.json + run.sh.

Assisted-by: Claude:Opus 4.7

* fix(galleryop): refresh upgrade cache after backend ops

UpgradeChecker caches the last upgrade-check result and only
refreshes on the 6-hour tick or after an auto-upgrade cycle.
Manual upgrades (POST /api/backends/upgrade/:name) go through
the async galleryop worker, which completes the upgrade
correctly but never tells UpgradeChecker to re-check -- so
/api/backends/upgrades continued to list a just-upgraded backend
as upgradeable, indistinguishable from a failed upgrade, for up
to six hours.

Add an optional `OnBackendOpCompleted func()` hook on
GalleryService that fires after every successful install /
upgrade / delete on the backend channel (async, so a slow
callback doesn't stall the queue). startup.go wires it to
UpgradeChecker.TriggerCheck after both services exist. Result:
the upgrade banner clears within milliseconds of the worker
finishing.

Assisted-by: Claude:Opus 4.7

* build: prepend GOPATH/bin to PATH for protogen-go

install-go-tools runs `go install` for protoc-gen-go and
protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin.
That directory isn't on every dev's PATH, and protoc resolves
its code-gen plugins via PATH, so the immediately-following
protoc invocation fails with
  "protoc-gen-go: program not found"
which in turn blocks `make build` and any
`make backends/%` target that depends on build.

Prepend `go env GOPATH`/bin to PATH for the protoc invocation
so the freshly-installed plugins are found without requiring a
shell-profile change.

Assisted-by: Claude:Opus 4.7

* refactor(ui-api): non-blocking backend upgrade handler with opcache

POST /api/backends/upgrade/:name used to send the ManagementOp
directly onto the unbuffered BackendGalleryChannel, which blocked
the HTTP request whenever the galleryop worker was busy with a
prior operation. The op also didn't show up in /api/operations,
so the Backends UI couldn't reflect upgrade progress on the
affected row.

Register the op in opcache immediately, wrap it in a cancellable
context, store the cancellation function on the GalleryService,
and push onto the channel from a goroutine so the handler
returns right away. Response gains a `jobID` field and a
`message` string so clients have a consistent handle regardless
of whether the op is queued or running.

Pairs with the OnBackendOpCompleted hook added in the galleryop
commit — together the UI sees the upgrade start, watches
progress via /api/operations, and drops the "upgradeable" flag
the moment the worker finishes.

Assisted-by: Claude:Opus 4.7
2026-04-24 08:50:34 +02:00
LocalAI [bot]
c755cd5ab5 feat(swagger): update swagger (#9518)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 23:26:50 +02:00
LocalAI [bot]
0fb04f7ac3 chore(model-gallery): ⬆️ update checksum (#9522)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 23:26:27 +02:00
213 changed files with 14900 additions and 2047 deletions

View File

@@ -43,7 +43,7 @@ If you add a new language bucket, `scripts/changed-backends.js` also needs a bra
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
## 3. Add Backend Metadata to `backend/index.yaml`

111
.agents/ci-caching.md Normal file
View File

@@ -0,0 +1,111 @@
# CI Build Caching
Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.
## Cache layout
- **Cache registry**: `quay.io/go-skynet/ci-cache`
- **One tag per matrix entry**, derived from the existing `tag-suffix`:
- Backend builds (`backend_build.yml`): `cache<tag-suffix>`
- e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
- Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
- e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
## Read/write semantics
| Trigger | `cache-from` | `cache-to` |
|---|---|---|
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
| `pull_request` | yes | **no** |
PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
## Self-warming, no separate populator
There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).
Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.
## The `DEPS_REFRESH` cache-buster (Python backends)
Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
```dockerfile
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
```
Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
`DEPS_REFRESH` defends against that:
- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
- Within a week, builds stay warm.
This applies only to `Dockerfile.python` because:
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.
### Adjusting the cadence
If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
## Manually evicting cache
To force a fully cold build for one backend or the whole image:
```bash
# Delete a single tag (requires quay credentials with admin on the repo)
curl -X DELETE \
-H "Authorization: Bearer ${QUAY_TOKEN}" \
https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm
# List all tags
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
"https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
```
Eviction is rarely needed in normal operation — `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and `mode=max` keeps the cache scoped per matrix entry so a stale tag never bleeds into a different build.
## What the cache **does not** cover
- The "Free Disk Space" / "Release space from worker" steps run on every job — these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere — PRs only build for verification.
- Darwin builds (see below) — macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.
## Darwin native caches
`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners — there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).
| Cache | Path(s) | Key | Scope |
|---|---|---|---|
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |
Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.
The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg — same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.
For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most TUs, so ccache dedupes object files across them.
### Cache budget on Darwin
GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
## Touching the cache pipeline
When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.

View File

@@ -141,7 +141,7 @@ jobs:
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-whisperx'
runs-on: 'ubuntu-latest'
@@ -154,7 +154,7 @@ jobs:
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-faster-whisper'
runs-on: 'ubuntu-latest'
@@ -920,6 +920,32 @@ jobs:
backend: "turboquant"
dockerfile: "./backend/Dockerfile.turboquant"
context: "./"
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-vllm'
runs-on: 'arc-runner-set'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "vllm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-vllm-omni'
runs-on: 'arc-runner-set'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "vllm-omni"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1076,6 +1102,45 @@ jobs:
backend: "diffusers"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "vllm"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm-omni'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "vllm-omni"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-cuda-13-arm64-sglang'
runs-on: 'ubuntu-24.04-arm'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
ubuntu-version: '2404'
backend: "sglang"
dockerfile: "./backend/Dockerfile.python"
context: "./"
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
@@ -1671,7 +1736,7 @@ jobs:
tag-latest: 'auto'
tag-suffix: '-gpu-intel-rerankers'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "rerankers"
dockerfile: "./backend/Dockerfile.python"
@@ -1684,7 +1749,7 @@ jobs:
tag-latest: 'auto'
tag-suffix: '-gpu-intel-sycl-f32-llama-cpp'
runs-on: 'ubuntu-latest'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
skip-drivers: 'false'
backend: "llama-cpp"
dockerfile: "./backend/Dockerfile.llama-cpp"
@@ -2877,6 +2942,49 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
# sherpa-onnx CPU
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-sherpa-onnx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "sherpa-onnx"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# sherpa-onnx CUDA 12
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-sherpa-onnx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "sherpa-onnx"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# sherpa-onnx CUDA 13 — requires onnxruntime 1.24.x+ for the
# gpu_cuda13 tarball; sherpa-onnx SHERPA_COMMIT pins to v1.12.39.
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-sherpa-onnx'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "sherpa-onnx"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
backend-jobs-darwin:
uses: ./.github/workflows/backend_build_darwin.yml
strategy:

View File

@@ -208,6 +208,15 @@ jobs:
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
# Weekly cache-buster for the per-backend `make` step. Most Python
# backends list unpinned deps (torch, transformers, vllm, ...), so a
# warm cache freezes upstream versions indefinitely. Rolling this
# weekly forces a re-resolve of the install layer at most once per
# week, picking up newer wheels without a full cold rebuild.
- name: Compute deps refresh key
id: deps_refresh
run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Build and push
uses: docker/build-push-action@v7
if: github.event_name != 'pull_request'
@@ -222,9 +231,11 @@ jobs:
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
@@ -244,9 +255,10 @@ jobs:
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
push: ${{ env.quay_username != '' }}
tags: ${{ steps.meta_pull_request.outputs.tags }}

View File

@@ -48,6 +48,13 @@ jobs:
strategy:
matrix:
go-version: ['${{ inputs.go-version }}']
env:
# Keep the brew Cellar stable across cache restores. Without these,
# `brew install` would auto-update brew itself and re-link formulas,
# mutating the very paths the cache just restored.
HOMEBREW_NO_AUTO_UPDATE: '1'
HOMEBREW_NO_INSTALL_CLEANUP: '1'
HOMEBREW_NO_ANALYTICS: '1'
steps:
- name: Clone
uses: actions/checkout@v6
@@ -58,21 +65,141 @@ jobs:
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
# Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
# Shared across every darwin matrix entry — first job in a run warms
# it, the rest hit warm.
cache: true
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
# ---- Homebrew cache ----
# macOS runners have no Docker daemon, so the BuildKit registry cache used
# for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
# We cache the brew downloads + Cellar entries for the formulas we install
# below. Read on every run, write only on master/tag pushes — same policy
# as the Linux registry cache.
- name: Restore Homebrew cache
id: brew-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
# ccache is always installed (used by the llama-cpp variant build) so
# the brew cache content stays stable across every backend in the
# matrix — they all share one cache key.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----
# Three CMake variants (fallback, grpc, rpc-server) compile the same
# llama.cpp source tree with overlapping flags — ccache dedupes object
# files across them. Key on the pinned LLAMA_VERSION so a pin bump
# invalidates cleanly; restore-keys fall back to the latest entry for the
# same pin so unchanged TUs stay warm even when the cache is fresh.
- name: Compute llama.cpp version
if: inputs.backend == 'llama-cpp'
id: llama-version
run: |
version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
echo "version=${version}" >> "$GITHUB_OUTPUT"
- name: Restore ccache
if: inputs.backend == 'llama-cpp'
id: ccache-cache
uses: actions/cache/restore@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
restore-keys: |
ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
- name: Configure ccache
if: inputs.backend == 'llama-cpp'
run: |
mkdir -p "$HOME/Library/Caches/ccache"
ccache -M 2G
ccache -z
# llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
{
echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
} >> "$GITHUB_ENV"
# ---- Python wheel cache (uv + pip) ----
# Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
# ISO-week segment of the cache key forces at most one cold rebuild per
# backend per week, automatically picking up newer wheels for unpinned
# deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
# recent build of the same backend so off-week PRs still hit warm.
- name: Compute weekly cache bucket
if: inputs.lang == 'python'
id: weekly
run: echo "bucket=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Restore Python wheel cache
if: inputs.lang == 'python'
id: pyenv-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
restore-keys: |
pyenv-darwin-${{ inputs.backend }}-
- name: Build ${{ inputs.backend }}-darwin
run: |
make protogen-go
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
- name: ccache stats
if: inputs.backend == 'llama-cpp'
run: ccache -s
- name: Save ccache
if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
uses: actions/cache/save@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
- name: Save Python wheel cache
if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
- name: Upload ${{ inputs.backend }}.tar
uses: actions/upload-artifact@v7
with:

View File

@@ -2,7 +2,7 @@ name: Gallery Agent
on:
schedule:
- cron: '0 */3 * * *' # Run every 4 hours
- cron: '0 */12 * * *' # Run every 4 hours
workflow_dispatch:
inputs:
search_term:

View File

@@ -1,96 +0,0 @@
name: 'generate and publish GRPC docker caches'
on:
workflow_dispatch:
schedule:
# daily at midnight
- cron: '0 0 * * *'
concurrency:
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
generate_caches:
if: github.repository == 'mudler/LocalAI'
strategy:
matrix:
include:
- grpc-base-image: ubuntu:24.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}
steps:
- name: Release space from worker
if: matrix.runs-on == 'ubuntu-latest'
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get remove -y microsoft-edge-stable || true
sudo apt-get remove -y firefox || true
sudo apt-get remove -y powershell || true
sudo apt-get remove -y r-base-core || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
sudo rm -rf /usr/share/dotnet || true
sudo rm -rf /opt/ghc || true
sudo rm -rf "/usr/local/share/boost" || true
sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
df -h
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v6
- name: Cache GRPC
uses: docker/build-push-action@v7
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
build-args: |
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
context: .
file: ./Dockerfile
cache-to: type=gha,ignore-error=true
cache-from: type=gha
target: grpc
platforms: ${{ matrix.platforms }}
push: false

View File

@@ -16,7 +16,7 @@ jobs:
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
- base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
runs-on: 'arc-runner-set'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}

View File

@@ -20,7 +20,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
@@ -60,15 +59,13 @@
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"

View File

@@ -25,7 +25,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
@@ -42,12 +41,11 @@
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
grpc-base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
core-image-build:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
@@ -60,7 +58,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
@@ -121,8 +118,7 @@
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
grpc-base-image: "ubuntu:24.04"
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
@@ -141,7 +137,6 @@
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}

View File

@@ -8,11 +8,6 @@ on:
description: 'Base image'
required: true
type: string
grpc-base-image:
description: 'GRPC Base image, must be a compatible image with base-image'
required: false
default: ''
type: string
build-type:
description: 'Build type'
default: ''
@@ -201,25 +196,19 @@ jobs:
if: github.event_name != 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
@@ -230,25 +219,18 @@ jobs:
if: github.event_name == 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
#push: true
tags: ${{ steps.meta_pull_request.outputs.tags }}

View File

@@ -40,6 +40,7 @@ jobs:
kokoros: ${{ steps.detect.outputs.kokoros }}
insightface: ${{ steps.detect.outputs.insightface }}
speaker-recognition: ${{ steps.detect.outputs.speaker-recognition }}
sherpa-onnx: ${{ steps.detect.outputs.sherpa-onnx }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
@@ -506,6 +507,72 @@ jobs:
- name: Build llama-cpp backend image and run audio transcription gRPC e2e tests
run: |
make test-extra-backend-llama-cpp-transcription
# Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked LLM.
# Builds the sherpa-onnx Docker image, extracts the rootfs so the e2e suite
# can discover the backend binary + shared libs, downloads the three model
# bundles (silero-vad, omnilingual-asr, vits-ljs) and drives the realtime
# websocket spec end-to-end.
tests-sherpa-onnx-realtime:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Build sherpa-onnx backend image and run realtime e2e tests
run: |
make test-extra-e2e-realtime-sherpa
# Streaming ASR via the sherpa-onnx online recognizer (zipformer
# transducer). Exercises both AudioTranscription (buffered) and
# AudioTranscriptionStream (real-time deltas) on the e2e-backends
# harness.
tests-sherpa-onnx-grpc-transcription:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build sherpa-onnx backend image and run streaming ASR gRPC e2e tests
run: |
make test-extra-backend-sherpa-onnx-transcription
# VITS TTS via the sherpa-onnx backend. Drives both TTS (file write) and
# TTSStream (PCM chunks) on the e2e-backends harness.
tests-sherpa-onnx-grpc-tts:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build sherpa-onnx backend image and run TTS gRPC e2e tests
run: |
make test-extra-backend-sherpa-onnx-tts
tests-ik-llama-cpp-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.ik-llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'

View File

@@ -9,9 +9,6 @@ on:
tags:
- '*'
env:
GRPC_VERSION: v1.65.0
concurrency:
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
@@ -195,7 +192,7 @@ jobs:
run: go version
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus ffmpeg
pip install --user --no-cache-dir grpcio-tools grpcio
- name: Setup Node.js
uses: actions/setup-node@v6

View File

@@ -19,6 +19,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |

View File

@@ -1,5 +1,4 @@
ARG BASE_IMAGE=ubuntu:24.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
ARG UBUNTU_CODENAME=noble
@@ -149,6 +148,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
GOCMD=go
GOTEST=$(GOCMD) test
@@ -394,7 +394,13 @@ protoc:
.PHONY: protogen-go
protogen-go: protoc install-go-tools
mkdir -p pkg/grpc/proto
./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
# install-go-tools writes protoc-gen-go and protoc-gen-go-grpc into
# $(shell go env GOPATH)/bin, which isn't on every dev's PATH. protoc
# resolves its code-gen plugins via PATH, so without this prefix the
# generate step fails with "protoc-gen-go: program not found". Prepend
# GOPATH/bin so the freshly-installed plugins win without requiring a
# shell-profile change.
PATH="$$(go env GOPATH)/bin:$$PATH" ./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
backend/backend.proto
core/config/inference_defaults.json: ## Fetch inference defaults from unsloth (only if missing)
@@ -780,6 +786,44 @@ test-extra-backend-speaker-recognition-ecapa: docker-build-speaker-recognition
test-extra-backend-speaker-recognition-all: \
test-extra-backend-speaker-recognition-ecapa
## Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked
## LLM. Extracts the sherpa-onnx Docker image rootfs, downloads the three
## gallery-referenced model bundles (silero-vad, omnilingual-asr, vits-ljs),
## writes the corresponding model config YAMLs, and runs the realtime
## websocket spec in tests/e2e with REALTIME_* env vars wiring the sherpa
## slots into the pipeline. The LLM slot stays on the in-repo mock-backend
## registered unconditionally by tests/e2e/e2e_suite_test.go. See
## tests/e2e/run-realtime-sherpa.sh for the full orchestration.
test-extra-e2e-realtime-sherpa: build-mock-backend docker-build-sherpa-onnx protogen-go react-ui
bash tests/e2e/run-realtime-sherpa.sh
## Streaming ASR via the sherpa-onnx online recognizer. Uses the streaming
## zipformer English model (encoder/decoder/joiner int8 + tokens) from the
## sherpa-onnx gallery entry. Drives both AudioTranscription and
## AudioTranscriptionStream via the e2e-backends gRPC harness; streaming
## emits real partial deltas during decode. Each file is renamed on download
## to the shape sherpa-onnx's online loader expects (encoder.int8.onnx etc.).
test-extra-backend-sherpa-onnx-transcription: docker-build-sherpa-onnx
BACKEND_IMAGE=local-ai-backend:sherpa-onnx \
BACKEND_TEST_MODEL_URL='https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx#encoder.int8.onnx' \
BACKEND_TEST_EXTRA_FILES='https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx#decoder.int8.onnx|https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx#joiner.int8.onnx|https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/tokens.txt' \
BACKEND_TEST_AUDIO_URL=https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.wav \
BACKEND_TEST_CAPS=health,load,transcription \
BACKEND_TEST_OPTIONS=subtype=online \
$(MAKE) test-extra-backend
## VITS TTS via the sherpa-onnx backend. Pulls the individual files from
## HuggingFace (the vits-ljs release tarball lives on the k2-fsa github
## but is also mirrored as discrete files on HF). Exercises both
## TTS (write-to-file) and TTSStream (PCM chunks + WAV header) via the
## e2e-backends gRPC harness.
test-extra-backend-sherpa-onnx-tts: docker-build-sherpa-onnx
BACKEND_IMAGE=local-ai-backend:sherpa-onnx \
BACKEND_TEST_MODEL_URL='https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx#vits-ljs.onnx' \
BACKEND_TEST_EXTRA_FILES='https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt|https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt' \
BACKEND_TEST_CAPS=health,load,tts \
$(MAKE) test-extra-backend
## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
## tool-call extraction via sglang's native qwen parser. CPU builds use
## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
@@ -839,7 +883,7 @@ docker-cuda12:
docker-image-intel:
docker build \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04 \
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
--build-arg GO_TAGS="$(GO_TAGS)" \
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
@@ -917,6 +961,7 @@ BACKEND_VOXTRAL = voxtral|golang|.|false|true
BACKEND_ACESTEP_CPP = acestep-cpp|golang|.|false|true
BACKEND_QWEN3_TTS_CPP = qwen3-tts-cpp|golang|.|false|true
BACKEND_OPUS = opus|golang|.|false|true
BACKEND_SHERPA_ONNX = sherpa-onnx|golang|.|false|true
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
@@ -1029,12 +1074,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP_QUANTIZATION)))
$(eval $(call generate-docker-build-target,$(BACKEND_TINYGRAD)))
$(eval $(call generate-docker-build-target,$(BACKEND_KOKOROS)))
$(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
########################################################
### Mock Backend for E2E Tests

View File

@@ -147,6 +147,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -206,6 +206,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -162,6 +162,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
@@ -202,6 +203,13 @@ COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
ARG FROM_SOURCE=""
ENV FROM_SOURCE=${FROM_SOURCE}
# Cache-buster for the per-backend `make` step. Most Python backends list
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
# would otherwise freeze upstream versions indefinitely. CI passes a value
# that rolls weekly so the install layer is rebuilt at most once per week
# and picks up newer wheels from PyPI / nightly indexes.
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
# Package GPU libraries into the backend's lib directory
@@ -216,4 +224,4 @@ RUN if [ -f "/${BACKEND}/package.sh" ]; then \
FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /
COPY --from=builder /${BACKEND}/ /

View File

@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=286ce324baed17c95faec77792eaa6bdb1c7a5f5
IK_LLAMA_VERSION?=3a945af45d45936341a45bbf7deda56776a4af26
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -0,0 +1,11 @@
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2494,7 +2494,7 @@
}
new_data = work.data();
- new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr);
+ new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr, nullptr);
} else {
new_type = cur->type;
new_data = cur->data;

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=0d0764dfd257c0ae862525c05778207f87b99b1c
LLAMA_VERSION?=f53577432541bb9edc1588c4ef45c66bf07e4468
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -642,6 +642,21 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
params.no_op_offload = false;
}
} else if (!strcmp(optname, "split_mode") || !strcmp(optname, "sm")) {
// Accepts: none | layer | row | tensor (the latter requires a llama.cpp build
// that includes ggml-org/llama.cpp#19378, FlashAttention enabled, and KV-cache
// quantization disabled).
if (optval != NULL) {
if (optval_str == "none") {
params.split_mode = LLAMA_SPLIT_MODE_NONE;
} else if (optval_str == "layer") {
params.split_mode = LLAMA_SPLIT_MODE_LAYER;
} else if (optval_str == "row") {
params.split_mode = LLAMA_SPLIT_MODE_ROW;
} else if (optval_str == "tensor") {
params.split_mode = LLAMA_SPLIT_MODE_TENSOR;
}
}
} else if (!strcmp(optname, "kv_unified") || !strcmp(optname, "unified_kv")) {
if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
params.kv_unified = true;

View File

@@ -1,7 +1,7 @@
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
TURBOQUANT_VERSION?=11a241d0db78a68e0a5b99fe6f36de6683100f6a
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
CMAKE_ARGS?=

View File

@@ -4,7 +4,6 @@ package main
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
import (
"container/heap"
"errors"
"fmt"
"math"
"slices"
@@ -100,9 +99,16 @@ func sortIntoKeySlicese(keys []*pb.StoresKey) [][]float32 {
}
func (s *Store) Load(opts *pb.ModelOptions) error {
if opts.Model != "" {
return errors.New("not implemented")
}
// local-store is an in-memory vector store with no on-disk artefact to
// load — opts.Model is just a namespace identifier. The old `!= ""` guard
// rejected any non-empty model name with "not implemented", which broke
// callers that pass a namespace to isolate embedding spaces (face vs.
// voice biometrics both go through local-store but need distinct stores
// so ArcFace 512-D and ECAPA-TDNN 192-D don't collide). Namespace
// isolation is already handled upstream: ModelLoader spawns a fresh
// local-store process per (backend, model) tuple, so each namespace is
// its own Store{} instance. Nothing to do here beyond accepting the load.
_ = opts
return nil
}

11
backend/go/sherpa-onnx/.gitignore vendored Normal file
View File

@@ -0,0 +1,11 @@
.cache/
sources/
build*/
package/
backend-assets/
sherpa-onnx
*.so
compile_commands.json
sherpa-onnx-whisper-*
vits-ljs/
streaming-zipformer-en/

View File

@@ -0,0 +1,120 @@
CURRENT_DIR=$(abspath ./)
GOCMD=go
ONNX_VERSION?=1.24.4
# v1.12.39 — includes upstream's onnxruntime 1.24.4 bump (#3501). Earlier
# pinned commits only support onnxruntime 1.23.2, which has no CUDA 13
# pre-built tarball, blocking the -gpu-nvidia-cuda-13 build matrix entry.
SHERPA_COMMIT?=7288d15e3e31a7bd589b2ba88828d521e7a6b140
ONNX_ARCH?=x64
ONNX_OS?=linux
ifneq (,$(findstring aarch64,$(shell uname -m)))
ONNX_ARCH=aarch64
endif
ifeq ($(OS),Darwin)
ONNX_OS=osx
ifneq (,$(findstring aarch64,$(shell uname -m)))
ONNX_ARCH=arm64
else ifneq (,$(findstring arm64,$(shell uname -m)))
ONNX_ARCH=arm64
else
ONNX_ARCH=x86_64
endif
endif
# Upstream onnxruntime ships CUDA 12 and CUDA 13 variants under different
# names: -gpu-<ver>.tgz for CUDA 12, -gpu_cuda13-<ver>.tgz for CUDA 13
# (note underscore vs dash). CUDA 13 tarballs only exist from 1.24.x onward.
ifeq ($(BUILD_TYPE),cublas)
SHERPA_GPU=ON
ONNX_PROVIDER=cuda
ifeq ($(CUDA_MAJOR_VERSION),13)
ONNX_VARIANT=-gpu_cuda13
else
ONNX_VARIANT=-gpu
endif
else
ONNX_VARIANT=
SHERPA_GPU=OFF
ONNX_PROVIDER=cpu
endif
JOBS?=$(shell nproc --ignore=1 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
sources/onnxruntime:
mkdir -p sources/onnxruntime
curl -L https://github.com/microsoft/onnxruntime/releases/download/v$(ONNX_VERSION)/onnxruntime-$(ONNX_OS)-$(ONNX_ARCH)$(ONNX_VARIANT)-$(ONNX_VERSION).tgz \
-o sources/onnxruntime/onnxruntime.tgz
cd sources/onnxruntime && tar -xf onnxruntime.tgz --strip-components=1 && rm onnxruntime.tgz
sources/sherpa-onnx: sources/onnxruntime
git clone https://github.com/k2-fsa/sherpa-onnx.git sources/sherpa-onnx
cd sources/sherpa-onnx && git checkout $(SHERPA_COMMIT)
mkdir -p sources/sherpa-onnx/build
# sherpa-onnx's cmake detects a pre-installed onnxruntime via the
# SHERPA_ONNXRUNTIME_{INCLUDE,LIB}_DIR env vars (not via -D flags).
# Point them at our locally-downloaded Microsoft tarball — without
# this, sherpa-onnx falls through to download_onnxruntime() which
# fetches from csukuangfj/onnxruntime-libs. For the GPU 1.24.4
# build that release mirror publishes `-patched.zip` instead of the
# expected `.tgz`, so the download 404s and the build fails.
cd sources/sherpa-onnx/build && \
SHERPA_ONNXRUNTIME_INCLUDE_DIR=$(CURRENT_DIR)/sources/onnxruntime/include \
SHERPA_ONNXRUNTIME_LIB_DIR=$(CURRENT_DIR)/sources/onnxruntime/lib \
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_FLAGS="-Wno-error=format-security" \
-DCMAKE_CXX_FLAGS="-Wno-error=format-security" \
-DSHERPA_ONNX_ENABLE_GPU=$(SHERPA_GPU) \
-DSHERPA_ONNX_ENABLE_TTS=ON \
-DSHERPA_ONNX_ENABLE_BINARY=OFF \
-DSHERPA_ONNX_ENABLE_PYTHON=OFF \
-DSHERPA_ONNX_ENABLE_TESTS=OFF \
-DSHERPA_ONNX_ENABLE_C_API=ON \
-DBUILD_SHARED_LIBS=ON \
-DSHERPA_ONNX_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE=ON \
..
cd sources/sherpa-onnx/build && make -j$(JOBS)
backend-assets/lib: sources/sherpa-onnx sources/onnxruntime
mkdir -p backend-assets/lib
cp -rfLv sources/onnxruntime/lib/* backend-assets/lib/
cp -rfLv sources/sherpa-onnx/build/lib/*.so* backend-assets/lib/ 2>/dev/null || true
cp -rfLv sources/sherpa-onnx/build/lib/*.dylib backend-assets/lib/ 2>/dev/null || true
# libsherpa-shim wraps sherpa-onnx's nested config structs and TTS
# callback plumbing behind a purego-friendly API: opaque handles plus
# fixed-signature setters/getters/trampoline. Plain C compile — no cgo.
SHIM_EXT=so
ifeq ($(OS),Darwin)
SHIM_EXT=dylib
endif
backend-assets/lib/libsherpa-shim.$(SHIM_EXT): csrc/shim.c csrc/shim.h backend-assets/lib
$(CC) -shared -fPIC -O2 \
-I$(CURRENT_DIR)/sources/sherpa-onnx/sherpa-onnx/c-api \
-o $@ csrc/shim.c \
-L$(CURRENT_DIR)/backend-assets/lib \
-lsherpa-onnx-c-api \
-Wl,-rpath,'$$ORIGIN'
sherpa-onnx: backend-assets/lib backend-assets/lib/libsherpa-shim.$(SHIM_EXT)
CGO_ENABLED=0 $(GOCMD) build \
-ldflags "$(LD_FLAGS) -X main.onnxProvider=$(ONNX_PROVIDER)" \
-tags "$(GO_TAGS)" -o sherpa-onnx ./
package:
bash package.sh
build: sherpa-onnx package
clean:
rm -rf sherpa-onnx sources/ backend-assets/ package/ vits-ljs/ sherpa-onnx-whisper-*/
test: sherpa-onnx
LD_LIBRARY_PATH=$(CURRENT_DIR)/backend-assets/lib \
bash test.sh
.PHONY: build package clean test

View File

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,169 @@
package main
import (
"os"
"path/filepath"
"testing"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestSherpaBackend(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Sherpa-ONNX Backend Suite")
}
// Load libsherpa-shim + libsherpa-onnx-c-api via purego before any spec
// runs — otherwise any Load/TTS/VAD/AudioTranscription call hits a nil
// function pointer. LD_LIBRARY_PATH must contain the directory holding
// both .so files; test.sh sets this.
var _ = BeforeSuite(func() {
Expect(loadSherpaLibs()).To(Succeed())
})
var _ = Describe("Sherpa-ONNX", func() {
Context("lifecycle", func() {
It("is locking (C API is not thread safe)", func() {
Expect((&SherpaBackend{}).Locking()).To(BeTrue())
})
It("errors loading a non-existent model", func() {
tmpDir, err := os.MkdirTemp("", "sherpa-test-nonexistent")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpDir)
err = (&SherpaBackend{}).Load(&pb.ModelOptions{
ModelFile: filepath.Join(tmpDir, "non-existent-model.onnx"),
})
Expect(err).To(HaveOccurred())
})
It("errors loading a non-existent ASR model", func() {
tmpDir, err := os.MkdirTemp("", "sherpa-test-asr")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpDir)
err = (&SherpaBackend{}).Load(&pb.ModelOptions{
ModelFile: filepath.Join(tmpDir, "model.onnx"),
Type: "asr",
})
Expect(err).To(HaveOccurred())
})
It("dispatches Load by Type", func() {
tmpDir, err := os.MkdirTemp("", "sherpa-test-dispatch")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpDir)
modelFile := filepath.Join(tmpDir, "model.onnx")
for _, typ := range []string{"", "asr", "vad"} {
err := (&SherpaBackend{}).Load(&pb.ModelOptions{ModelFile: modelFile, Type: typ})
Expect(err).To(HaveOccurred(), "Type=%q", typ)
}
})
})
Context("method errors without loaded model", func() {
It("rejects TTS", func() {
tmpDir, err := os.MkdirTemp("", "sherpa-test-tts")
Expect(err).ToNot(HaveOccurred())
defer os.RemoveAll(tmpDir)
err = (&SherpaBackend{}).TTS(&pb.TTSRequest{
Text: "should fail — no model loaded",
Dst: filepath.Join(tmpDir, "output.wav"),
})
Expect(err).To(HaveOccurred())
})
It("rejects AudioTranscription", func() {
_, err := (&SherpaBackend{}).AudioTranscription(&pb.TranscriptRequest{
Dst: "/tmp/nonexistent.wav",
})
Expect(err).To(HaveOccurred())
})
It("rejects VAD", func() {
_, err := (&SherpaBackend{}).VAD(&pb.VADRequest{
Audio: []float32{0.1, 0.2, 0.3},
})
Expect(err).To(HaveOccurred())
})
})
Context("type detection", func() {
DescribeTable("isASRType",
func(input string, want bool) {
Expect(isASRType(input)).To(Equal(want))
},
Entry("asr", "asr", true),
Entry("ASR", "ASR", true),
Entry("Asr", "Asr", true),
Entry("transcription", "transcription", true),
Entry("Transcription", "Transcription", true),
Entry("transcribe", "transcribe", true),
Entry("Transcribe", "Transcribe", true),
Entry("tts", "tts", false),
Entry("empty", "", false),
Entry("other", "other", false),
Entry("vad", "vad", false),
)
DescribeTable("isVADType",
func(input string, want bool) {
Expect(isVADType(input)).To(Equal(want))
},
Entry("vad", "vad", true),
Entry("VAD", "VAD", true),
Entry("Vad", "Vad", true),
Entry("asr", "asr", false),
Entry("tts", "tts", false),
Entry("empty", "", false),
Entry("other", "other", false),
)
})
Context("option parsing", func() {
It("parses float options with fallback on bad input", func() {
opts := &pb.ModelOptions{Options: []string{
"vad.threshold=0.3",
"tts.length_scale=1.25",
"bad.number=not-a-float",
}}
Expect(findOptionFloat(opts, "vad.threshold=", 0.5)).To(BeNumerically("~", 0.3, 1e-6))
Expect(findOptionFloat(opts, "tts.length_scale=", 1.0)).To(BeNumerically("~", 1.25, 1e-6))
Expect(findOptionFloat(opts, "missing.key=", 0.7)).To(BeNumerically("~", 0.7, 1e-6))
Expect(findOptionFloat(opts, "bad.number=", 9.9)).To(BeNumerically("~", 9.9, 1e-6))
})
It("parses int options with fallback on bad input", func() {
opts := &pb.ModelOptions{Options: []string{
"asr.sample_rate=22050",
"online.chunk_samples=800",
"bad.int=4.2",
}}
Expect(findOptionInt(opts, "asr.sample_rate=", 16000)).To(Equal(int32(22050)))
Expect(findOptionInt(opts, "online.chunk_samples=", 1600)).To(Equal(int32(800)))
Expect(findOptionInt(opts, "missing.key=", 42)).To(Equal(int32(42)))
Expect(findOptionInt(opts, "bad.int=", 100)).To(Equal(int32(100)))
})
It("parses bool options (0/1, true/false, yes/no, on/off)", func() {
opts := &pb.ModelOptions{Options: []string{
"online.enable_endpoint=0",
"asr.sense_voice.use_itn=True",
"feature.on=yes",
"feature.off=Off",
"feature.bad=maybe",
}}
Expect(findOptionBool(opts, "online.enable_endpoint=", 1)).To(Equal(int32(0)))
Expect(findOptionBool(opts, "asr.sense_voice.use_itn=", 0)).To(Equal(int32(1)))
Expect(findOptionBool(opts, "feature.on=", 0)).To(Equal(int32(1)))
Expect(findOptionBool(opts, "feature.off=", 1)).To(Equal(int32(0)))
Expect(findOptionBool(opts, "feature.bad=", 1)).To(Equal(int32(1)))
Expect(findOptionBool(opts, "missing.key=", 1)).To(Equal(int32(1)))
})
})
})

View File

@@ -0,0 +1,325 @@
#include "shim.h"
#include "c-api.h"
#include <stdlib.h>
#include <string.h>
// Replace the char* field pointed to by `slot` with a strdup of `s`
// (or NULL if s is NULL). Frees any prior value. Silently no-ops when
// strdup fails — the caller will see a Create* failure downstream.
static void shim_set_str(const char **slot, const char *s) {
free((char *)*slot);
*slot = s ? strdup(s) : NULL;
}
// ==================================================================
// VAD config
// ==================================================================
void *sherpa_shim_vad_config_new(void) {
return calloc(1, sizeof(SherpaOnnxVadModelConfig));
}
void sherpa_shim_vad_config_free(void *h) {
if (!h) return;
SherpaOnnxVadModelConfig *c = (SherpaOnnxVadModelConfig *)h;
free((char *)c->silero_vad.model);
free((char *)c->provider);
free(c);
}
void sherpa_shim_vad_config_set_silero_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxVadModelConfig *)h)->silero_vad.model, v);
}
void sherpa_shim_vad_config_set_silero_threshold(void *h, float v) {
((SherpaOnnxVadModelConfig *)h)->silero_vad.threshold = v;
}
void sherpa_shim_vad_config_set_silero_min_silence_duration(void *h, float v) {
((SherpaOnnxVadModelConfig *)h)->silero_vad.min_silence_duration = v;
}
void sherpa_shim_vad_config_set_silero_min_speech_duration(void *h, float v) {
((SherpaOnnxVadModelConfig *)h)->silero_vad.min_speech_duration = v;
}
void sherpa_shim_vad_config_set_silero_window_size(void *h, int32_t v) {
((SherpaOnnxVadModelConfig *)h)->silero_vad.window_size = v;
}
void sherpa_shim_vad_config_set_silero_max_speech_duration(void *h, float v) {
((SherpaOnnxVadModelConfig *)h)->silero_vad.max_speech_duration = v;
}
void sherpa_shim_vad_config_set_sample_rate(void *h, int32_t v) {
((SherpaOnnxVadModelConfig *)h)->sample_rate = v;
}
void sherpa_shim_vad_config_set_num_threads(void *h, int32_t v) {
((SherpaOnnxVadModelConfig *)h)->num_threads = v;
}
void sherpa_shim_vad_config_set_provider(void *h, const char *v) {
shim_set_str(&((SherpaOnnxVadModelConfig *)h)->provider, v);
}
void sherpa_shim_vad_config_set_debug(void *h, int32_t v) {
((SherpaOnnxVadModelConfig *)h)->debug = v;
}
void *sherpa_shim_create_vad(void *h, float buffer_size_seconds) {
return (void *)SherpaOnnxCreateVoiceActivityDetector(
(const SherpaOnnxVadModelConfig *)h, buffer_size_seconds);
}
// ==================================================================
// Offline TTS config (VITS)
// ==================================================================
void *sherpa_shim_tts_config_new(void) {
return calloc(1, sizeof(SherpaOnnxOfflineTtsConfig));
}
void sherpa_shim_tts_config_free(void *h) {
if (!h) return;
SherpaOnnxOfflineTtsConfig *c = (SherpaOnnxOfflineTtsConfig *)h;
free((char *)c->model.vits.model);
free((char *)c->model.vits.tokens);
free((char *)c->model.vits.lexicon);
free((char *)c->model.vits.data_dir);
free((char *)c->model.provider);
free(c);
}
void sherpa_shim_tts_config_set_vits_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.model, v);
}
void sherpa_shim_tts_config_set_vits_tokens(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.tokens, v);
}
void sherpa_shim_tts_config_set_vits_lexicon(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.lexicon, v);
}
void sherpa_shim_tts_config_set_vits_data_dir(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.data_dir, v);
}
void sherpa_shim_tts_config_set_vits_noise_scale(void *h, float v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.noise_scale = v;
}
void sherpa_shim_tts_config_set_vits_noise_scale_w(void *h, float v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.noise_scale_w = v;
}
void sherpa_shim_tts_config_set_vits_length_scale(void *h, float v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.length_scale = v;
}
void sherpa_shim_tts_config_set_num_threads(void *h, int32_t v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.num_threads = v;
}
void sherpa_shim_tts_config_set_debug(void *h, int32_t v) {
((SherpaOnnxOfflineTtsConfig *)h)->model.debug = v;
}
void sherpa_shim_tts_config_set_provider(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.provider, v);
}
void sherpa_shim_tts_config_set_max_num_sentences(void *h, int32_t v) {
((SherpaOnnxOfflineTtsConfig *)h)->max_num_sentences = v;
}
void *sherpa_shim_create_offline_tts(void *h) {
return (void *)SherpaOnnxCreateOfflineTts(
(const SherpaOnnxOfflineTtsConfig *)h);
}
// ==================================================================
// Offline recognizer config
// ==================================================================
void *sherpa_shim_offline_recog_config_new(void) {
return calloc(1, sizeof(SherpaOnnxOfflineRecognizerConfig));
}
void sherpa_shim_offline_recog_config_free(void *h) {
if (!h) return;
SherpaOnnxOfflineRecognizerConfig *c = (SherpaOnnxOfflineRecognizerConfig *)h;
free((char *)c->model_config.provider);
free((char *)c->model_config.tokens);
free((char *)c->model_config.whisper.encoder);
free((char *)c->model_config.whisper.decoder);
free((char *)c->model_config.whisper.language);
free((char *)c->model_config.whisper.task);
free((char *)c->model_config.paraformer.model);
free((char *)c->model_config.sense_voice.model);
free((char *)c->model_config.sense_voice.language);
free((char *)c->model_config.omnilingual.model);
free((char *)c->decoding_method);
free(c);
}
void sherpa_shim_offline_recog_config_set_num_threads(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.num_threads = v;
}
void sherpa_shim_offline_recog_config_set_debug(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.debug = v;
}
void sherpa_shim_offline_recog_config_set_provider(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.provider, v);
}
void sherpa_shim_offline_recog_config_set_tokens(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.tokens, v);
}
void sherpa_shim_offline_recog_config_set_feat_sample_rate(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->feat_config.sample_rate = v;
}
void sherpa_shim_offline_recog_config_set_feat_feature_dim(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->feat_config.feature_dim = v;
}
void sherpa_shim_offline_recog_config_set_decoding_method(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->decoding_method, v);
}
void sherpa_shim_offline_recog_config_set_whisper_encoder(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.encoder, v);
}
void sherpa_shim_offline_recog_config_set_whisper_decoder(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.decoder, v);
}
void sherpa_shim_offline_recog_config_set_whisper_language(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.language, v);
}
void sherpa_shim_offline_recog_config_set_whisper_task(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.task, v);
}
void sherpa_shim_offline_recog_config_set_whisper_tail_paddings(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.tail_paddings = v;
}
void sherpa_shim_offline_recog_config_set_paraformer_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.paraformer.model, v);
}
void sherpa_shim_offline_recog_config_set_sense_voice_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.model, v);
}
void sherpa_shim_offline_recog_config_set_sense_voice_language(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.language, v);
}
void sherpa_shim_offline_recog_config_set_sense_voice_use_itn(void *h, int32_t v) {
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.use_itn = v;
}
void sherpa_shim_offline_recog_config_set_omnilingual_model(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.omnilingual.model, v);
}
void *sherpa_shim_create_offline_recognizer(void *h) {
return (void *)SherpaOnnxCreateOfflineRecognizer(
(const SherpaOnnxOfflineRecognizerConfig *)h);
}
// ==================================================================
// Online recognizer config
// ==================================================================
void *sherpa_shim_online_recog_config_new(void) {
return calloc(1, sizeof(SherpaOnnxOnlineRecognizerConfig));
}
void sherpa_shim_online_recog_config_free(void *h) {
if (!h) return;
SherpaOnnxOnlineRecognizerConfig *c = (SherpaOnnxOnlineRecognizerConfig *)h;
free((char *)c->model_config.transducer.encoder);
free((char *)c->model_config.transducer.decoder);
free((char *)c->model_config.transducer.joiner);
free((char *)c->model_config.tokens);
free((char *)c->model_config.provider);
free((char *)c->decoding_method);
free(c);
}
void sherpa_shim_online_recog_config_set_transducer_encoder(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.encoder, v);
}
void sherpa_shim_online_recog_config_set_transducer_decoder(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.decoder, v);
}
void sherpa_shim_online_recog_config_set_transducer_joiner(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.joiner, v);
}
void sherpa_shim_online_recog_config_set_tokens(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.tokens, v);
}
void sherpa_shim_online_recog_config_set_num_threads(void *h, int32_t v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.num_threads = v;
}
void sherpa_shim_online_recog_config_set_debug(void *h, int32_t v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.debug = v;
}
void sherpa_shim_online_recog_config_set_provider(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.provider, v);
}
void sherpa_shim_online_recog_config_set_feat_sample_rate(void *h, int32_t v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->feat_config.sample_rate = v;
}
void sherpa_shim_online_recog_config_set_feat_feature_dim(void *h, int32_t v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->feat_config.feature_dim = v;
}
void sherpa_shim_online_recog_config_set_decoding_method(void *h, const char *v) {
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->decoding_method, v);
}
void sherpa_shim_online_recog_config_set_enable_endpoint(void *h, int32_t v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->enable_endpoint = v;
}
void sherpa_shim_online_recog_config_set_rule1_min_trailing_silence(void *h, float v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->rule1_min_trailing_silence = v;
}
void sherpa_shim_online_recog_config_set_rule2_min_trailing_silence(void *h, float v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->rule2_min_trailing_silence = v;
}
void sherpa_shim_online_recog_config_set_rule3_min_utterance_length(void *h, float v) {
((SherpaOnnxOnlineRecognizerConfig *)h)->rule3_min_utterance_length = v;
}
void *sherpa_shim_create_online_recognizer(void *h) {
return (void *)SherpaOnnxCreateOnlineRecognizer(
(const SherpaOnnxOnlineRecognizerConfig *)h);
}
// ==================================================================
// Result-struct accessors
// ==================================================================
int32_t sherpa_shim_wave_sample_rate(const void *h) {
return ((const SherpaOnnxWave *)h)->sample_rate;
}
int32_t sherpa_shim_wave_num_samples(const void *h) {
return ((const SherpaOnnxWave *)h)->num_samples;
}
const float *sherpa_shim_wave_samples(const void *h) {
return ((const SherpaOnnxWave *)h)->samples;
}
const char *sherpa_shim_offline_result_text(const void *h) {
return ((const SherpaOnnxOfflineRecognizerResult *)h)->text;
}
const char *sherpa_shim_online_result_text(const void *h) {
return ((const SherpaOnnxOnlineRecognizerResult *)h)->text;
}
int32_t sherpa_shim_generated_audio_sample_rate(const void *h) {
return ((const SherpaOnnxGeneratedAudio *)h)->sample_rate;
}
int32_t sherpa_shim_generated_audio_n(const void *h) {
return ((const SherpaOnnxGeneratedAudio *)h)->n;
}
const float *sherpa_shim_generated_audio_samples(const void *h) {
return ((const SherpaOnnxGeneratedAudio *)h)->samples;
}
int32_t sherpa_shim_speech_segment_start(const void *h) {
return ((const SherpaOnnxSpeechSegment *)h)->start;
}
int32_t sherpa_shim_speech_segment_n(const void *h) {
return ((const SherpaOnnxSpeechSegment *)h)->n;
}
// ==================================================================
// TTS streaming callback trampoline
// ==================================================================
void *sherpa_shim_tts_generate_with_callback(
void *tts, const char *text, int32_t sid, float speed,
uintptr_t callback_ptr, uintptr_t user_data) {
SherpaOnnxGeneratedAudioCallbackWithArg cb =
(SherpaOnnxGeneratedAudioCallbackWithArg)callback_ptr;
return (void *)SherpaOnnxOfflineTtsGenerateWithCallbackWithArg(
(const SherpaOnnxOfflineTts *)tts, text, sid, speed, cb,
(void *)user_data);
}

View File

@@ -0,0 +1,129 @@
#ifndef LOCALAI_SHERPA_ONNX_SHIM_H
#define LOCALAI_SHERPA_ONNX_SHIM_H
#include <stdint.h>
// libsherpa-shim: purego-friendly wrapper around sherpa-onnx's C API.
// Purego can't access C struct fields and can't route C callbacks to Go
// funcs directly. Every function here is a fixed-signature trampoline
// that replaces one field read/write or callback handoff that the Go
// backend would otherwise have to do through cgo.
//
// String lifetime: setters strdup; _free walks every owned string and
// frees it. Callers may discard their input buffers the moment a setter
// returns.
//
// Opaque handles are `void *` in both directions. Nothing here holds a
// reference across calls except config handles (freed via _free) and
// sherpa-allocated results (freed via sherpa's own Destroy* entry
// points, which Go calls through purego pass-through).
#ifdef __cplusplus
extern "C" {
#endif
// --- VAD config -----------------------------------------------------
void *sherpa_shim_vad_config_new(void);
void sherpa_shim_vad_config_free(void *cfg);
void sherpa_shim_vad_config_set_silero_model(void *cfg, const char *path);
void sherpa_shim_vad_config_set_silero_threshold(void *cfg, float v);
void sherpa_shim_vad_config_set_silero_min_silence_duration(void *cfg, float v);
void sherpa_shim_vad_config_set_silero_min_speech_duration(void *cfg, float v);
void sherpa_shim_vad_config_set_silero_window_size(void *cfg, int32_t v);
void sherpa_shim_vad_config_set_silero_max_speech_duration(void *cfg, float v);
void sherpa_shim_vad_config_set_sample_rate(void *cfg, int32_t v);
void sherpa_shim_vad_config_set_num_threads(void *cfg, int32_t v);
void sherpa_shim_vad_config_set_provider(void *cfg, const char *v);
void sherpa_shim_vad_config_set_debug(void *cfg, int32_t v);
void *sherpa_shim_create_vad(void *cfg, float buffer_size_seconds);
// --- Offline TTS config (VITS path — the only TTS family the backend uses) ---
void *sherpa_shim_tts_config_new(void);
void sherpa_shim_tts_config_free(void *cfg);
void sherpa_shim_tts_config_set_vits_model(void *cfg, const char *v);
void sherpa_shim_tts_config_set_vits_tokens(void *cfg, const char *v);
void sherpa_shim_tts_config_set_vits_lexicon(void *cfg, const char *v);
void sherpa_shim_tts_config_set_vits_data_dir(void *cfg, const char *v);
void sherpa_shim_tts_config_set_vits_noise_scale(void *cfg, float v);
void sherpa_shim_tts_config_set_vits_noise_scale_w(void *cfg, float v);
void sherpa_shim_tts_config_set_vits_length_scale(void *cfg, float v);
void sherpa_shim_tts_config_set_num_threads(void *cfg, int32_t v);
void sherpa_shim_tts_config_set_debug(void *cfg, int32_t v);
void sherpa_shim_tts_config_set_provider(void *cfg, const char *v);
void sherpa_shim_tts_config_set_max_num_sentences(void *cfg, int32_t v);
void *sherpa_shim_create_offline_tts(void *cfg);
// --- Offline recognizer config (Whisper / Paraformer / SenseVoice / Omnilingual) ---
void *sherpa_shim_offline_recog_config_new(void);
void sherpa_shim_offline_recog_config_free(void *cfg);
void sherpa_shim_offline_recog_config_set_num_threads(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_debug(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_provider(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_tokens(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_feat_sample_rate(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_feat_feature_dim(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_decoding_method(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_whisper_encoder(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_whisper_decoder(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_whisper_language(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_whisper_task(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_whisper_tail_paddings(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_paraformer_model(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_sense_voice_model(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_sense_voice_language(void *cfg, const char *v);
void sherpa_shim_offline_recog_config_set_sense_voice_use_itn(void *cfg, int32_t v);
void sherpa_shim_offline_recog_config_set_omnilingual_model(void *cfg, const char *v);
void *sherpa_shim_create_offline_recognizer(void *cfg);
// --- Online recognizer config (streaming zipformer transducer) ---
void *sherpa_shim_online_recog_config_new(void);
void sherpa_shim_online_recog_config_free(void *cfg);
void sherpa_shim_online_recog_config_set_transducer_encoder(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_transducer_decoder(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_transducer_joiner(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_tokens(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_num_threads(void *cfg, int32_t v);
void sherpa_shim_online_recog_config_set_debug(void *cfg, int32_t v);
void sherpa_shim_online_recog_config_set_provider(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_feat_sample_rate(void *cfg, int32_t v);
void sherpa_shim_online_recog_config_set_feat_feature_dim(void *cfg, int32_t v);
void sherpa_shim_online_recog_config_set_decoding_method(void *cfg, const char *v);
void sherpa_shim_online_recog_config_set_enable_endpoint(void *cfg, int32_t v);
void sherpa_shim_online_recog_config_set_rule1_min_trailing_silence(void *cfg, float v);
void sherpa_shim_online_recog_config_set_rule2_min_trailing_silence(void *cfg, float v);
void sherpa_shim_online_recog_config_set_rule3_min_utterance_length(void *cfg, float v);
void *sherpa_shim_create_online_recognizer(void *cfg);
// --- Result accessors (sherpa-allocated; caller destroys via sherpa's own Destroy*) ---
int32_t sherpa_shim_wave_sample_rate(const void *wave);
int32_t sherpa_shim_wave_num_samples(const void *wave);
const float *sherpa_shim_wave_samples(const void *wave);
const char *sherpa_shim_offline_result_text(const void *result);
const char *sherpa_shim_online_result_text(const void *result);
int32_t sherpa_shim_generated_audio_sample_rate(const void *audio);
int32_t sherpa_shim_generated_audio_n(const void *audio);
const float *sherpa_shim_generated_audio_samples(const void *audio);
int32_t sherpa_shim_speech_segment_start(const void *seg);
int32_t sherpa_shim_speech_segment_n(const void *seg);
// --- TTS streaming callback trampoline -----------------------------
// Replaces the //export sherpaTtsGoCallback + callbacks.c bridge pattern.
// `callback_ptr` is the C-callable function pointer returned by
// purego.NewCallback. `user_data` is an integer the Go side uses to
// look up its state (sync.Map keyed by uint64).
//
// Returns the sherpa-allocated SherpaOnnxGeneratedAudio. Destroy with
// SherpaOnnxDestroyOfflineTtsGeneratedAudio (callable directly from
// Go via purego).
void *sherpa_shim_tts_generate_with_callback(
void *tts, const char *text, int32_t sid, float speed,
uintptr_t callback_ptr, uintptr_t user_data);
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -0,0 +1,23 @@
package main
import (
"flag"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
func main() {
flag.Parse()
if err := loadSherpaLibs(); err != nil {
panic(err)
}
if err := grpc.StartServer(*addr, &SherpaBackend{}); err != nil {
panic(err)
}
}

View File

@@ -0,0 +1,51 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/sherpa-onnx $CURDIR/package/
cp -avf $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1
fi
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

13
backend/go/sherpa-onnx/run.sh Executable file
View File

@@ -0,0 +1,13 @@
#!/bin/bash
set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/sherpa-onnx "$@"
fi
exec $CURDIR/sherpa-onnx "$@"

12
backend/go/sherpa-onnx/test.sh Executable file
View File

@@ -0,0 +1,12 @@
#!/bin/bash
# Unit tests for the sherpa-onnx backend. Exercises error-path and
# dispatch logic via SherpaBackend directly (no gRPC). Integration
# coverage (gRPC TTS / streaming ASR / realtime pipeline) lives in
# tests/e2e-backends and tests/e2e and runs against the Docker image.
set -e
CURDIR=$(dirname "$(realpath $0)")
cd "$CURDIR"
PACKAGES=$(go list ./... | grep -v /sources/)
go test -v -timeout 60s $PACKAGES

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=c97702e1057c2fe13a7074cd9069cb9dd6edc1bf
STABLEDIFFUSION_GGML_VERSION?=b8bdffc19962be7e5a84bfefeb2e31bd885b571a
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -139,7 +139,10 @@ func (w *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptR
// segment start/end conversion factor taken from https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L895
s := CppGetSegmentStart(i) * (10000000)
t := CppGetSegmentEnd(i) * (10000000)
txt := strings.Clone(CppGetSegmentText(i))
// whisper.cpp can emit bytes that aren't valid UTF-8 (e.g. a multibyte
// codepoint split across token boundaries); protobuf string fields
// reject those at marshal time. Scrub before the value escapes cgo.
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
tokens := make([]int32, CppNTokens(i))
if opts.Diarize && CppGetSegmentSpeakerTurnNext(i) {

View File

@@ -263,6 +263,8 @@
amd: "rocm-vllm"
intel: "intel-vllm"
nvidia-cuda-12: "cuda12-vllm"
nvidia-cuda-13: "cuda13-vllm"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
cpu: "cpu-vllm"
- &sglang
name: "sglang"
@@ -285,6 +287,7 @@
amd: "rocm-sglang"
intel: "intel-sglang"
nvidia-cuda-12: "cuda12-sglang"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang"
cpu: "cpu-sglang"
- &vllm-omni
name: "vllm-omni"
@@ -311,6 +314,8 @@
nvidia: "cuda12-vllm-omni"
amd: "rocm-vllm-omni"
nvidia-cuda-12: "cuda12-vllm-omni"
nvidia-cuda-13: "cuda13-vllm-omni"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni"
- &mlx
name: "mlx"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
@@ -1006,6 +1011,23 @@
nvidia: "cuda12-neutts"
amd: "rocm-neutts"
nvidia-cuda-12: "cuda12-neutts"
- &sherpa-onnx
name: "sherpa-onnx"
alias: "sherpa-onnx"
urls:
- https://k2-fsa.github.io/sherpa/onnx/
description: |
Sherpa-ONNX backend for text-to-speech (VITS, Matcha, Kokoro), speech-to-text (Whisper, Paraformer, SenseVoice, Omnilingual ASR CTC), and voice activity detection via ONNX Runtime.
Supports multi-speaker voices, 1600+ language ASR, and GPU acceleration.
tags:
- text-to-speech
- TTS
- speech-to-text
- ASR
capabilities:
default: "cpu-sherpa-onnx"
nvidia: "cuda12-sherpa-onnx"
nvidia-cuda-12: "cuda12-sherpa-onnx"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
@@ -1591,6 +1613,20 @@
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant
## whisper
- !!merge <<: *whispercpp
name: "whisper-development"
capabilities:
default: "cpu-whisper-development"
nvidia: "cuda12-whisper-development"
intel: "intel-sycl-f16-whisper-development"
metal: "metal-whisper-development"
amd: "rocm-whisper-development"
vulkan: "vulkan-whisper-development"
nvidia-l4t: "nvidia-l4t-arm64-whisper-development"
nvidia-cuda-13: "cuda13-whisper-development"
nvidia-cuda-12: "cuda12-whisper-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-whisper-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-whisper-development"
- !!merge <<: *whispercpp
name: "nvidia-l4t-arm64-whisper"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-whisper"
@@ -1797,12 +1833,25 @@
nvidia: "cuda12-vllm-development"
amd: "rocm-vllm-development"
intel: "intel-vllm-development"
nvidia-cuda-12: "cuda12-vllm-development"
nvidia-cuda-13: "cuda13-vllm-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
cpu: "cpu-vllm-development"
- !!merge <<: *vllm
name: "cuda12-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm
- !!merge <<: *vllm
name: "cuda13-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm
- !!merge <<: *vllm
name: "cuda13-nvidia-l4t-arm64-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
@@ -1823,6 +1872,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm
- !!merge <<: *vllm
name: "cuda13-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm
- !!merge <<: *vllm
name: "cuda13-nvidia-l4t-arm64-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
@@ -1845,12 +1904,19 @@
nvidia: "cuda12-sglang-development"
amd: "rocm-sglang-development"
intel: "intel-sglang-development"
nvidia-cuda-12: "cuda12-sglang-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang-development"
cpu: "cpu-sglang-development"
- !!merge <<: *sglang
name: "cuda12-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sglang
- !!merge <<: *sglang
name: "cuda13-nvidia-l4t-arm64-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-sglang"
@@ -1871,6 +1937,11 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-sglang
- !!merge <<: *sglang
name: "cuda13-nvidia-l4t-arm64-sglang-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-sglang"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-sglang"
@@ -1893,11 +1964,23 @@
nvidia: "cuda12-vllm-omni-development"
amd: "rocm-vllm-omni-development"
nvidia-cuda-12: "cuda12-vllm-omni-development"
nvidia-cuda-13: "cuda13-vllm-omni-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
- !!merge <<: *vllm-omni
name: "cuda12-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm-omni"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-nvidia-l4t-arm64-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm-omni"
@@ -1908,6 +1991,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm-omni"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm-omni
- !!merge <<: *vllm-omni
name: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm-omni"
@@ -3834,3 +3927,30 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-speaker-recognition"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-speaker-recognition
## sherpa-onnx
- !!merge <<: *sherpa-onnx
name: "sherpa-onnx-development"
capabilities:
default: "cpu-sherpa-onnx-development"
nvidia: "cuda12-sherpa-onnx-development"
nvidia-cuda-12: "cuda12-sherpa-onnx-development"
- !!merge <<: *sherpa-onnx
name: "cpu-sherpa-onnx"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-sherpa-onnx"
mirrors:
- localai/localai-backends:latest-cpu-sherpa-onnx
- !!merge <<: *sherpa-onnx
name: "cpu-sherpa-onnx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-sherpa-onnx"
mirrors:
- localai/localai-backends:master-cpu-sherpa-onnx
- !!merge <<: *sherpa-onnx
name: "cuda12-sherpa-onnx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sherpa-onnx"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sherpa-onnx
- !!merge <<: *sherpa-onnx
name: "cuda12-sherpa-onnx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sherpa-onnx"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-sherpa-onnx

View File

@@ -173,6 +173,30 @@ def _build_antispoofer(options: dict[str, str], model_dir: str | None) -> Antisp
# ─── InsightFaceEngine ────────────────────────────────────────────────
# Canonical ONNX manifest for each upstream insightface pack (v0.7 release
# at github.com/deepinsight/insightface/releases). LocalAI's gallery extracts
# these zips flat into the models directory, so when multiple packs or other
# backends drop their own ONNX files alongside, the glob-the-directory
# approach picks up foreign files and insightface's model_zoo.get_model()
# raises IndexError trying to index `input_shape[2]` on a tensor that isn't
# shaped like a face model. The manifest lets us pre-filter to only the
# files that actually belong to the requested pack — deterministic, correct
# pack choice, no crashes on neighbour ONNX files.
_KNOWN_PACK_MANIFESTS: dict[str, frozenset[str]] = {
"buffalo_l": frozenset({
"det_10g.onnx",
"w600k_r50.onnx",
"genderage.onnx",
"2d106det.onnx",
"1k3d68.onnx",
}),
"buffalo_sc": frozenset({
"det_500m.onnx",
"w600k_mbf.onnx",
}),
}
class InsightFaceEngine:
"""Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
@@ -222,6 +246,21 @@ class InsightFaceEngine:
)
onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
# When the pack extracts flat into a shared models directory it
# mixes with ONNX files from other backends (opencv face engine,
# MiniFASNet antispoof, WeSpeaker voice embedding, other buffalo
# packs installed earlier). Feeding those into model_zoo.get_model()
# blows up inside insightface's router — it assumes a 4-D NCHW
# input and indexes `input_shape[2]` on tensors that aren't shaped
# like a face model, raising IndexError. For the upstream packs we
# know the exact ONNX manifest; scoping to it makes the load
# deterministic (without it, det_10g.onnx from buffalo_l sorts
# before det_500m.onnx from buffalo_sc and silently wins).
manifest = _KNOWN_PACK_MANIFESTS.get(self.model_pack)
if manifest is not None:
scoped = [f for f in onnx_files if os.path.basename(f) in manifest]
if scoped:
onnx_files = scoped
if not onnx_files:
raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
@@ -231,14 +270,31 @@ class InsightFaceEngine:
self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
self.models = {}
skipped: list[tuple[str, str]] = []
for onnx_file in onnx_files:
m = model_zoo.get_model(onnx_file, providers=self._providers)
try:
m = model_zoo.get_model(onnx_file, providers=self._providers)
except Exception as err:
# Foreign ONNX (wrong rank/shape, non-insightface model) —
# older insightface versions raise IndexError / ValueError
# instead of returning None. Keep loading the rest.
skipped.append((os.path.basename(onnx_file), str(err)))
continue
if m is None:
skipped.append((os.path.basename(onnx_file), "unknown taskname"))
continue
# First occurrence of each taskname wins (matches FaceAnalysis).
if m.taskname not in self.models:
self.models[m.taskname] = m
if skipped:
import sys
print(
f"[insightface] skipped {len(skipped)} non-pack ONNX file(s) in {pack_dir}: "
+ ", ".join(f"{n} ({why})" for n, why in skipped),
file=sys.stderr,
)
if "detection" not in self.models:
raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
self.det_model = self.models["detection"]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cpu]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]

View File

@@ -1,2 +1,2 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]

View File

@@ -1 +1 @@
git+https://github.com/Blaizzy/mlx-vlm
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4

View File

@@ -23,6 +23,19 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
# wheel resolves cleanly. unsafe-best-match is required because the
# jetson-ai-lab index lists transitive deps (e.g. decord) at older
# versions only — without it uv refuses to fall through to PyPI for a
# compatible wheel and resolution fails.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.

View File

@@ -0,0 +1,12 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
# Drop the [all] extra: it pulls outlines/decord, and decord has no
# aarch64 cp312 wheel anywhere (PyPI nor the jetson-ai-lab index ships
# only legacy cp35-cp37). With [all] uv backtracks through versions
# trying to satisfy decord and lands on sglang==0.1.16. Floor at 0.5.0
# so uv can't silently downgrade if a future resolution misfires.
sglang>=0.5.0

View File

@@ -317,8 +317,23 @@ class OnnxDirectEngine:
else:
provider_list = ["CPUExecutionProvider"]
self._session = ort.InferenceSession(onnx_path, providers=provider_list)
self._input_name = self._session.get_inputs()[0].name
input_meta = self._session.get_inputs()[0]
self._input_name = input_meta.name
# Pre-exported speaker encoders come in two shapes:
# rank-2 [batch, samples] — some 3D-Speaker exports feed raw waveform.
# rank-3 [batch, frames, n_mels] — WeSpeaker and most Kaldi-lineage encoders
# expect pre-computed Kaldi FBank features.
# We detect this at load time and branch in embed(), because feeding raw audio
# into a rank-3 graph is exactly what triggered
# "Invalid rank for input: feats Got: 2 Expected: 3".
self._input_rank = len(input_meta.shape) if input_meta.shape is not None else 2
self._expected_sr = int(options.get("sample_rate", "16000"))
self._fbank_mels = int(options.get("fbank_num_mel_bins", "80"))
self._fbank_frame_length_ms = float(options.get("fbank_frame_length_ms", "25"))
self._fbank_frame_shift_ms = float(options.get("fbank_frame_shift_ms", "10"))
# Per-utterance cepstral mean normalisation — on for WeSpeaker by default,
# toggleable for encoders that expect raw FBank.
self._fbank_cmn = options.get("fbank_cmn", "true").lower() in ("1", "true", "yes")
self._analysis = AnalysisHead(options)
def _load_waveform(self, path: str):
@@ -344,11 +359,37 @@ class OnnxDirectEngine:
import numpy as np
audio = self._load_waveform(audio_path)
feed = audio.reshape(1, -1)
if self._input_rank >= 3:
feats = self._extract_fbank(audio) # [frames, n_mels]
feed = feats[np.newaxis, :, :] # [1, frames, n_mels]
else:
feed = audio.reshape(1, -1) # [1, samples]
out = self._session.run(None, {self._input_name: feed})
vec = np.asarray(out[0]).reshape(-1)
return [float(x) for x in vec]
def _extract_fbank(self, audio):
"""Compute Kaldi-style 80-dim FBank features for speaker encoders that
expect pre-featurised input (WeSpeaker, most 3D-Speaker exports).
torchaudio is already a backend dependency for SpeechBrain — no new
package required."""
import numpy as np
import torch # type: ignore
import torchaudio.compliance.kaldi as kaldi # type: ignore
tensor = torch.from_numpy(audio).unsqueeze(0) # [1, samples]
feats = kaldi.fbank(
tensor,
sample_frequency=self._expected_sr,
num_mel_bins=self._fbank_mels,
frame_length=self._fbank_frame_length_ms,
frame_shift=self._fbank_frame_shift_ms,
dither=0.0,
) # [frames, n_mels]
if self._fbank_cmn:
feats = feats - feats.mean(dim=0, keepdim=True)
return feats.numpy().astype(np.float32)
def compare(self, audio1: str, audio2: str) -> float:
return _cosine_distance(self.embed(audio1), self.embed(audio2))

View File

@@ -12,11 +12,15 @@ else
source $backend_dir/../common/libbackend.sh
fi
# Handle l4t build profiles (Python 3.12, pip fallback) if needed
# Handle l4t build profiles (Python 3.12, pip fallback) if needed.
# unsafe-best-match is required on l4t13 because the jetson-ai-lab index
# lists transitive deps at limited versions — without it uv pins to the
# first matching index and fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match"
fi
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
@@ -26,7 +30,11 @@ fi
# Install base requirements first
installRequirements
# Install vllm based on build type
# Install vllm based on build type. vllm-omni tracks vllm master from
# source (cloned below) so we leave the upstream vllm dependency unpinned
# — vllm 0.19+ ships cu130 wheels by default, which is what we want for
# cublas13. Older cuda12/rocm/cpu paths still resolve a compatible wheel
# from the relevant channel.
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
# ROCm
if [ "x${USE_PIP}" == "xtrue" ]; then
@@ -34,8 +42,26 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
else
uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
fi
elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
# JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel
# at jetson-ai-lab. Version is unpinned: the index ships whatever build
# matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through
# to PyPI for transitive deps not present on the jetson-ai-lab index.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
else
uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
fi
elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
# vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm --torch-backend=auto
else
uv pip install vllm --torch-backend=auto
fi
elif [ "x${BUILD_TYPE}" == "xcublas" ] || [ "x${BUILD_TYPE}" == "x" ]; then
# CUDA (default) or CPU
# cuda12 / CPU — keep the 0.14.0 pin for compatibility with the existing
# cuda12 vllm-omni image; bumping should be its own change.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm==0.14.0 --torch-backend=auto
else

View File

@@ -0,0 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,13 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
bitsandbytes
flash-attn
diffusers
librosa
soundfile
pillow
numpy

View File

@@ -32,6 +32,22 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
# accordingly. JetPack 6 keeps cp310 + USE_PIP=true. unsafe-best-match
# is required because the jetson-ai-lab index lists transitive deps at
# limited versions — without it uv pins to the first matching index and
# fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
USE_PIP=true
fi
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
# requirements-cpu-after.txt and compiles vllm locally against the host's
# actual CPU. Not used by default because it takes ~30-40 minutes, but

View File

@@ -1,2 +1,9 @@
https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
# flash-attn wheels are ABI-tied to a specific torch version. vllm forces
# torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
# prebuilt wheels up to torch 2.8 — any wheel we pin here gets silently
# broken when vllm upgrades torch during install, producing an undefined
# libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
# attention, and rotary_embedding/common.py guards the flash_attn import
# with find_spec(), so skipping flash-attn is safe and the only stable
# choice until upstream ships a torch-2.10 wheel.
vllm

View File

@@ -1,4 +1,4 @@
accelerate
torch==2.7.0
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,2 @@
--extra-index-url https://download.pytorch.org/whl/cu130
vllm

View File

@@ -0,0 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
transformers
bitsandbytes

View File

@@ -0,0 +1,2 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
vllm

View File

@@ -0,0 +1,8 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
bitsandbytes
flash-attn

View File

@@ -1867,9 +1867,9 @@ dependencies = [
[[package]]
name = "rustls-webpki"
version = "0.103.10"
version = "0.103.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
dependencies = [
"ring",
"rustls-pki-types",

View File

@@ -81,18 +81,30 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
// The resolver closes over the ModelLoader so the Registry stays
// decoupled from loader plumbing; swapping in a postgres-backed
// implementation later is a single construction change here.
//
// `faceStoreName` is the default namespace passed to StoreBackend when
// the request doesn't override it. Face and voice MUST use distinct
// namespaces — the local-store gRPC surface rejects mixed dimensions
// inside one namespace ("Try to add key with length N when existing
// length is M"). ArcFace buffalo_l produces 512-dim embeddings while
// ECAPA-TDNN produces 192-dim; enrolling one after the other into a
// shared namespace is exactly how we hit that error.
const (
faceStoreName = "localai-face-biometrics"
voiceStoreName = "localai-voice-biometrics"
)
faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
return corebackend.StoreBackend(ml, appConfig, storeName, "")
}
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, faceStoreName, faceEmbeddingDim)
// Voice (speaker) recognition registry — same plumbing, separate
// registry so embedding spaces stay isolated (a face vector and a
// speaker vector are not comparable).
// namespace so embedding spaces stay isolated (a face vector and a
// speaker vector are not comparable and differ in dimensionality).
voiceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
return corebackend.StoreBackend(ml, appConfig, storeName, "")
}
app.voiceRegistry = voicerecognition.NewStoreRegistry(voiceStoreResolver, "", voiceEmbeddingDim)
app.voiceRegistry = voicerecognition.NewStoreRegistry(voiceStoreResolver, voiceStoreName, voiceEmbeddingDim)
return app
}

View File

@@ -242,6 +242,12 @@ func New(opts ...config.AppOption) (*Application, error) {
bmFn := func() galleryop.BackendManager { return application.GalleryService().BackendManager() }
uc := NewUpgradeChecker(options, application.ModelLoader(), application.distributedDB(), bmFn)
application.upgradeChecker = uc
// Refresh the upgrade cache the moment a backend op finishes — otherwise
// the UI keeps showing a just-upgraded backend as upgradeable until the
// next 6-hour tick. TriggerCheck is non-blocking.
if gs := application.GalleryService(); gs != nil {
gs.OnBackendOpCompleted = uc.TriggerCheck
}
go uc.Run(options.Context)
}

View File

@@ -11,8 +11,17 @@ func StoreBackend(sl *model.ModelLoader, appConfig *config.ApplicationConfig, st
if backend == "" {
backend = model.LocalStoreBackend
}
// ModelLoader caches backend processes by `modelID`, not by the `model`
// passed via WithModel. Without a distinct modelID, every StoreBackend
// call collapses to the same `modelID=""` cache slot — face (512-D) and
// voice (192-D) biometrics would then share the same local-store process
// and the second enrollment would fail with
// Try to add key with length N when existing length is M
// Use the store namespace as modelID so each namespace gets its own
// process instance and its own in-memory Store{}.
sc := []model.Option{
model.WithBackendString(backend),
model.WithModelID(storeName),
model.WithModel(storeName),
}

View File

@@ -90,6 +90,14 @@ type WorkerCMD struct {
RegistrationToken string `env:"LOCALAI_REGISTRATION_TOKEN" help:"Token for authenticating with the frontend" group:"registration"`
HeartbeatInterval string `env:"LOCALAI_HEARTBEAT_INTERVAL" default:"10s" help:"Interval between heartbeats" group:"registration"`
NodeLabels string `env:"LOCALAI_NODE_LABELS" help:"Comma-separated key=value labels for this node (e.g. tier=fast,gpu=a100)" group:"registration"`
// MaxReplicasPerModel caps how many replicas of any one model can run on
// this worker concurrently. Default 1 = historical single-replica
// behavior. Set higher when a node has enough VRAM to host multiple
// copies of the same model (e.g. a fat 128 GiB box running 4× of a
// 24 GiB model for throughput). The auto-label `node.replica-slots=N`
// is published so model schedulers can target high-capacity nodes via
// the existing label selector.
MaxReplicasPerModel int `env:"LOCALAI_MAX_REPLICAS_PER_MODEL" default:"1" help:"Max replicas of any single model on this worker. Default 1 preserves single-replica behavior; set higher to allow stacking replicas on a fat node." group:"registration"`
// NATS (required)
NatsURL string `env:"LOCALAI_NATS_URL" required:"" help:"NATS server URL" group:"distributed"`
@@ -567,22 +575,35 @@ func (s *backendSupervisor) getAddr(backend string) string {
return ""
}
// buildProcessKey is the supervisor's stable identifier for a backend gRPC
// process. It includes the replica index so the same model can run multiple
// processes on a worker simultaneously without colliding on the same map slot
// or port. The "#N" suffix is purely internal — the controller never reads it.
func buildProcessKey(modelID, backend string, replicaIndex int) string {
base := modelID
if base == "" {
base = backend
}
return fmt.Sprintf("%s#%d", base, replicaIndex)
}
// installBackend handles the backend.install flow:
// 1. If already running for this model, return existing address
// 1. If already running for this (model, replica) slot, return existing address
// 2. Install backend from gallery (if not already installed)
// 3. Find backend binary
// 4. Start gRPC process on a new port
// Returns the gRPC address of the backend process.
//
// ProcessKey includes the replica index so a worker with MaxReplicasPerModel>1
// can host multiple processes for the same model on distinct ports. Old
// controllers (no replica_index in the request) implicitly target replica 0,
// which preserves single-replica behavior.
func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest) (string, error) {
// Process key: use ModelID if provided (per-model process), else backend name
processKey := req.ModelID
if processKey == "" {
processKey = req.Backend
}
processKey := buildProcessKey(req.ModelID, req.Backend, int(req.ReplicaIndex))
// If already running for this model, return its address
// If already running for this model+replica, return its address
if addr := s.getAddr(processKey); addr != "" {
xlog.Info("Backend already running for model", "backend", req.Backend, "model", req.ModelID, "addr", addr)
xlog.Info("Backend already running for model replica", "backend", req.Backend, "model", req.ModelID, "replica", req.ReplicaIndex, "addr", addr)
return addr, nil
}
@@ -886,13 +907,18 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
totalVRAM, _ := xsysinfo.TotalAvailableVRAM()
gpuVendor, _ := xsysinfo.DetectGPUVendor()
maxReplicas := cmd.MaxReplicasPerModel
if maxReplicas < 1 {
maxReplicas = 1
}
body := map[string]any{
"name": nodeName,
"address": cmd.advertiseAddr(),
"http_address": cmd.advertiseHTTPAddr(),
"total_vram": totalVRAM,
"available_vram": totalVRAM, // initially all VRAM is available
"gpu_vendor": gpuVendor,
"name": nodeName,
"address": cmd.advertiseAddr(),
"http_address": cmd.advertiseHTTPAddr(),
"total_vram": totalVRAM,
"available_vram": totalVRAM, // initially all VRAM is available
"gpu_vendor": gpuVendor,
"max_replicas_per_model": maxReplicas,
}
// If no GPU detected, report system RAM so the scheduler/UI has capacity info
@@ -906,39 +932,40 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
body["token"] = cmd.RegistrationToken
}
// Parse and add static node labels
// Parse and add static node labels. Always include the auto-label
// `node.replica-slots=N` so AND-selectors in ModelSchedulingConfig can
// target high-capacity nodes (e.g. {"node.replica-slots":"4"}).
labels := make(map[string]string)
if cmd.NodeLabels != "" {
labels := make(map[string]string)
for _, pair := range strings.Split(cmd.NodeLabels, ",") {
pair = strings.TrimSpace(pair)
if k, v, ok := strings.Cut(pair, "="); ok {
labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
}
}
if len(labels) > 0 {
body["labels"] = labels
}
}
labels["node.replica-slots"] = strconv.Itoa(maxReplicas)
body["labels"] = labels
return body
}
// heartbeatBody returns the current VRAM/RAM stats for heartbeat payloads.
//
// When aggregate VRAM usage is unknown (no GPU, or temporary detection
// failure), we deliberately OMIT available_vram so the frontend keeps its
// last good value — overwriting with 0 makes the UI show the node as "fully
// used", while reporting total-as-available lies to the scheduler about
// free capacity.
func (cmd *WorkerCMD) heartbeatBody() map[string]any {
var availVRAM uint64
body := map[string]any{}
aggregate := xsysinfo.GetGPUAggregateInfo()
if aggregate.TotalVRAM > 0 {
availVRAM = aggregate.FreeVRAM
} else {
// Fallback: report total as available (no usage tracking possible)
availVRAM, _ = xsysinfo.TotalAvailableVRAM()
body["available_vram"] = aggregate.FreeVRAM
}
body := map[string]any{
"available_vram": availVRAM,
}
// If no GPU, report system RAM usage instead
// CPU-only workers (or workers that lost GPU visibility momentarily):
// report system RAM so the scheduler still has capacity info.
if aggregate.TotalVRAM == 0 {
if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
body["available_ram"] = ramInfo.Available

View File

@@ -0,0 +1,70 @@
package cli
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("Worker per-replica process keying", func() {
Describe("buildProcessKey", func() {
// Pin the supervisor's keying contract: distinct replica indexes for
// the same modelID produce distinct process keys, so the supervisor
// map can hold multiple processes for one model. Dropping the suffix
// would re-introduce the original flap (one model, one slot, churn).
DescribeTable("produces stable, distinct keys",
func(modelID, backend string, replica int, want string) {
Expect(buildProcessKey(modelID, backend, replica)).To(Equal(want))
},
Entry("modelID present, replica 0", "Qwen3-35B", "llama-cpp", 0, "Qwen3-35B#0"),
Entry("modelID present, replica 1", "Qwen3-35B", "llama-cpp", 1, "Qwen3-35B#1"),
Entry("falls back to backend when modelID empty", "", "llama-cpp", 0, "llama-cpp#0"),
Entry("backend fallback with replica 2", "", "llama-cpp", 2, "llama-cpp#2"),
)
It("makes replicas distinguishable", func() {
r0 := buildProcessKey("model-a", "llama-cpp", 0)
r1 := buildProcessKey("model-a", "llama-cpp", 1)
Expect(r0).ToNot(Equal(r1), "replicas of the same model must produce distinct keys")
})
})
Describe("registrationBody", func() {
It("includes max_replicas_per_model and the auto-label", func() {
cmd := &WorkerCMD{
Addr: "worker.example.com:50051",
MaxReplicasPerModel: 4,
}
body := cmd.registrationBody()
Expect(body).To(HaveKey("max_replicas_per_model"))
Expect(body["max_replicas_per_model"]).To(Equal(4))
labels, ok := body["labels"].(map[string]string)
Expect(ok).To(BeTrue(), "labels must be present so selectors can target the slot count")
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "4"))
})
It("coerces zero/unset MaxReplicasPerModel to 1", func() {
cmd := &WorkerCMD{Addr: "worker.example.com:50051"}
body := cmd.registrationBody()
Expect(body["max_replicas_per_model"]).To(Equal(1),
"unset must default to single-replica behavior, not capacity 0")
labels := body["labels"].(map[string]string)
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "1"))
})
It("preserves user-provided labels alongside the auto-label", func() {
cmd := &WorkerCMD{
Addr: "worker.example.com:50051",
MaxReplicasPerModel: 2,
NodeLabels: "tier=fast,gpu=a100",
}
body := cmd.registrationBody()
labels := body["labels"].(map[string]string)
Expect(labels).To(HaveKeyWithValue("tier", "fast"))
Expect(labels).To(HaveKeyWithValue("gpu", "a100"))
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "2"))
})
})
})

View File

@@ -767,7 +767,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
}
if (u & FLAG_VAD) == FLAG_VAD {
if c.Backend != "silero-vad" && !(c.Backend == "whisper" && slices.Contains(c.Options, "vad_only")) {
if c.Backend != "silero-vad" && c.Backend != "sherpa-onnx" && !(c.Backend == "whisper" && slices.Contains(c.Options, "vad_only")) {
return false
}
}

View File

@@ -194,6 +194,20 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
name := config.Name
backendPath := filepath.Join(systemState.Backend.BackendsPath, name)
// Clean up legacy flat-layout artefacts: earlier dev builds of the
// golang backends dropped the compiled binary directly at
// `<backendsPath>/<name>` (a plain file) instead of
// `<backendsPath>/<name>/<name>` (the nested layout the current code
// expects). MkdirAll below returns ENOTDIR when such a stale file
// exists, permanently blocking any reinstall or upgrade. Remove the
// file first so the install can proceed; the new install will write
// the correct nested layout, including metadata.json + run.sh.
if fi, statErr := os.Lstat(backendPath); statErr == nil && !fi.IsDir() {
xlog.Warn("removing stale non-directory backend artefact to make room for fresh install", "path", backendPath)
if rmErr := os.Remove(backendPath); rmErr != nil {
return fmt.Errorf("failed to remove stale backend artefact at %s: %w", backendPath, rmErr)
}
}
err = os.MkdirAll(backendPath, 0750)
if err != nil {
return fmt.Errorf("failed to create base path: %v", err)

View File

@@ -880,7 +880,7 @@ func convertAnthropicTools(input *schema.AnthropicRequest, cfg *config.ModelConf
if tcType, ok := tc["type"].(string); ok && tcType == "tool" {
if name, ok := tc["name"].(string); ok {
// Force specific tool
cfg.SetFunctionCallString(name)
cfg.SetFunctionCallNameString(name)
}
}
}

View File

@@ -14,7 +14,13 @@ import (
"github.com/mudler/LocalAI/pkg/utils"
)
var audioDataURIPattern = regexp.MustCompile(`^data:([^;]+);base64,`)
// Match `data:<mime>[;param=value...];base64,` — MediaRecorder in the browser
// produces data URIs like `data:audio/webm;codecs=opus;base64,...`, so the
// pre-`;base64,` section can contain zero or more parameter segments. The
// old `([^;]+)` form only matched exactly one segment and left recordings
// from the React UI's live-capture tab unparsed, which then failed base64
// decoding on the leading `data:` bytes.
var audioDataURIPattern = regexp.MustCompile(`^data:[^,]+?;base64,`)
var audioDownloadClient = http.Client{Timeout: 30 * time.Second}

View File

@@ -98,7 +98,7 @@ func (mgs *BackendEndpointService) GetAllStatusEndpoint() echo.HandlerFunc {
// @Param request body GalleryBackend true "query params"
// @Success 200 {object} schema.BackendResponse "Response"
// @Router /backends/apply [post]
func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
func (mgs *BackendEndpointService) ApplyBackendEndpoint(systemState *system.SystemState) echo.HandlerFunc {
return func(c echo.Context) error {
input := new(GalleryBackend)
// Get input data from the request body
@@ -106,6 +106,18 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
return err
}
// In distributed mode, refuse to fan out a hardware-specific build to
// every node — a CPU build landing on a GPU cluster is almost always
// wrong, and the silent footgun is exactly what this guard exists for.
// Auto-resolving (meta) backends are fine because each node picks its
// own variant. Tooling can recover by hitting
// POST /api/nodes/{id}/backends/install per target node.
if mgs.backendApplier.BackendManager().IsDistributed() && input.ID != "" {
if guard := concreteFanOutGuard(c, mgs.galleries, systemState, input.ID); guard != nil {
return guard
}
}
uuid, err := uuid.NewUUID()
if err != nil {
return err
@@ -120,6 +132,66 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
}
}
// concreteFanOutGuard returns a 409 response if the requested backend is a
// hardware-specific build (not auto-resolving / meta) and we are in
// distributed mode. It looks up the backend in the configured galleries; if
// the lookup itself fails (gallery unreachable, name not found), the guard
// stays out of the way and lets the install enqueue normally — a missing
// name will surface from the worker as a clearer error than the guard could
// produce here. The response body deliberately speaks human, with `code` and
// `meta_alternative` as the programmatic contract for tooling.
func concreteFanOutGuard(c echo.Context, galleries []config.Gallery, systemState *system.SystemState, backendID string) error {
// Use the unfiltered listing because in distributed mode the frontend's
// hardware is irrelevant — the install targets workers, not us — and the
// filtered list would hide variants that don't match the frontend host
// (e.g. a CUDA build on a CPU-only frontend), preventing the guard from
// firing for exactly the cases it's meant to protect against.
available, err := gallery.AvailableBackendsUnfiltered(galleries, systemState)
if err != nil {
return nil
}
requested := available.FindByName(backendID)
if requested == nil || requested.IsMeta() {
return nil
}
// Try to find an auto-resolving (meta) backend that has this concrete
// variant in its CapabilitiesMap, so we can suggest it as a one-shot
// alternative. Optional — empty string is fine if no parent exists.
metaAlternative := ""
for _, b := range available {
if !b.IsMeta() {
continue
}
for _, concrete := range b.CapabilitiesMap {
if concrete == backendID {
metaAlternative = b.Name
break
}
}
if metaAlternative != "" {
break
}
}
msg := fmt.Sprintf(
"Backend %q is a hardware-specific build and won't run correctly on every node in this cluster. In distributed mode, install it on specific nodes:\n\n POST /api/nodes/{node_id}/backends/install\n {\"backend\": %q}",
backendID, backendID,
)
if metaAlternative != "" {
msg += fmt.Sprintf(
"\n\nTo install across all nodes, use the auto-resolving backend %q — each node picks its own variant based on its hardware.",
metaAlternative,
)
}
return c.JSON(409, map[string]any{
"error": msg,
"code": "concrete_backend_requires_target",
"meta_alternative": metaAlternative,
})
}
// DeleteBackendEndpoint lets delete backends from a LocalAI instance
// @Summary delete backends from LocalAI.
// @Tags backends

View File

@@ -73,6 +73,10 @@ type RegisterNodeRequest struct {
AvailableRAM uint64 `json:"available_ram,omitempty"`
GPUVendor string `json:"gpu_vendor,omitempty"`
Labels map[string]string `json:"labels,omitempty"`
// MaxReplicasPerModel is the per-node cap on replicas of any single model.
// Workers older than this field omit it; we coerce 0 → 1 below to preserve
// historical single-replica behavior.
MaxReplicasPerModel int `json:"max_replicas_per_model,omitempty"`
}
// RegisterNodeEndpoint registers a new backend node.
@@ -131,17 +135,26 @@ func RegisterNodeEndpoint(registry *nodes.NodeRegistry, expectedToken string, au
tokenHash = hex.EncodeToString(h[:])
}
// Coerce 0 → 1 for backward compat with workers that don't send the field.
// GORM's `default:1` only fires for a missing column; once Go zero-values
// reach the struct field they're written as 0 unless explicitly set here.
maxReplicasPerModel := req.MaxReplicasPerModel
if maxReplicasPerModel < 1 {
maxReplicasPerModel = 1
}
node := &nodes.BackendNode{
Name: req.Name,
NodeType: nodeType,
Address: req.Address,
HTTPAddress: req.HTTPAddress,
TokenHash: tokenHash,
TotalVRAM: req.TotalVRAM,
AvailableVRAM: req.AvailableVRAM,
TotalRAM: req.TotalRAM,
AvailableRAM: req.AvailableRAM,
GPUVendor: req.GPUVendor,
Name: req.Name,
NodeType: nodeType,
Address: req.Address,
HTTPAddress: req.HTTPAddress,
TokenHash: tokenHash,
TotalVRAM: req.TotalVRAM,
AvailableVRAM: req.AvailableVRAM,
TotalRAM: req.TotalRAM,
AvailableRAM: req.AvailableRAM,
GPUVendor: req.GPUVendor,
MaxReplicasPerModel: maxReplicasPerModel,
}
ctx := c.Request().Context()
@@ -363,6 +376,9 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}
// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
// Backend can be either a gallery ID (resolved against BackendGalleries) or a
// direct URI install (URI + Name + optional Alias) — same shape as the
// standalone /api/backends/install-external path, just scoped to one node.
func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
return func(c echo.Context) error {
if unloader == nil {
@@ -372,17 +388,27 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
var req struct {
Backend string `json:"backend"`
BackendGalleries string `json:"backend_galleries,omitempty"`
URI string `json:"uri,omitempty"`
Name string `json:"name,omitempty"`
Alias string `json:"alias,omitempty"`
}
if err := c.Bind(&req); err != nil || req.Backend == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
}
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
// Either a gallery backend name or a direct URI must be supplied.
if req.Backend == "" && req.URI == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
}
// Admin-driven backend install: not tied to a specific replica slot
// (no model is being loaded). Pass replica 0 to match the worker's
// admin process-key convention (`backend#0`).
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
if err != nil {
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
}
if !reply.Success {
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "error", reply.Error)
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
}
return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
@@ -457,8 +483,8 @@ func UnloadModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
xlog.Error("Failed to stop backend after model unload", "node", nodeID, "model", req.ModelName, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "model unloaded but backend stop failed"))
}
// Remove from registry
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
// Remove every replica of this model on the node from the registry.
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
return c.JSON(http.StatusOK, map[string]string{"message": "model unloaded"})
}
}
@@ -484,7 +510,7 @@ func DeleteModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
// Non-fatal — backend process may not be running
xlog.Warn("StopBackend failed during model deletion (non-fatal)", "node", nodeID, "model", req.ModelName, "error", err)
}
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
return c.JSON(http.StatusOK, map[string]string{"message": "model deleted from node"})
}
}
@@ -659,6 +685,78 @@ func GetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}
}
// UpdateMaxReplicasPerModelRequest is the body for the per-node replica cap endpoint.
type UpdateMaxReplicasPerModelRequest struct {
// Value is the new per-model replica cap on this node. Must be >= 1.
Value int `json:"value"`
}
// UpdateMaxReplicasPerModelEndpoint sets the per-node cap on how many replicas
// of any one model can be loaded concurrently. The corresponding
// `node.replica-slots` auto-label is refreshed so existing AND-selectors keep
// matching, and any unsatisfiable scheduling cooldowns are cleared so the
// reconciler retries on the next tick.
//
// This is a transient admin override — a worker re-registration restores the
// value the worker was started with (--max-replicas-per-model). For permanent
// fleet changes, change the worker flag.
//
// @Summary Update a node's max replicas per model
// @Tags Nodes
// @Param id path string true "Node ID"
// @Param request body UpdateMaxReplicasPerModelRequest true "New value"
// @Success 200 {object} map[string]int
// @Failure 400 {object} map[string]any "value must be >= 1"
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [put]
func UpdateMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {
ctx := c.Request().Context()
nodeID := c.Param("id")
if _, err := registry.Get(ctx, nodeID); err != nil {
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
}
var req UpdateMaxReplicasPerModelRequest
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
}
if req.Value < 1 {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "value must be >= 1"))
}
if err := registry.UpdateMaxReplicasPerModel(ctx, nodeID, req.Value); err != nil {
xlog.Error("Failed to update max_replicas_per_model", "node", nodeID, "value", req.Value, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to update max replicas per model"))
}
return c.JSON(http.StatusOK, map[string]int{"max_replicas_per_model": req.Value})
}
}
// ResetMaxReplicasPerModelEndpoint clears the admin override on a node, so
// the next worker re-registration is allowed to update the value from its
// CLI flag again. The current value is left in place until the worker calls
// register.
//
// @Summary Reset a node's max replicas per model to the worker default
// @Tags Nodes
// @Param id path string true "Node ID"
// @Success 200 {object} map[string]bool
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [delete]
func ResetMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {
ctx := c.Request().Context()
nodeID := c.Param("id")
if _, err := registry.Get(ctx, nodeID); err != nil {
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
}
if err := registry.ResetMaxReplicasPerModel(ctx, nodeID); err != nil {
xlog.Error("Failed to reset max_replicas_per_model override", "node", nodeID, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to reset override"))
}
return c.JSON(http.StatusOK, map[string]bool{"reset": true})
}
}
// SetNodeLabelsEndpoint replaces all labels for a node.
func SetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {

View File

@@ -1315,13 +1315,35 @@ func triggerResponse(ctx context.Context, session *Session, conv *Conversation,
}
thinkingStartToken := reasoning.DetectThinkingStartToken(template, &config.ReasoningConfig)
reasoningText, responseWithoutReasoning := reasoning.ExtractReasoningWithConfig(rawResponse, thinkingStartToken, config.ReasoningConfig)
// When the C++ autoparser emitted ChatDeltas with actionable data,
// prefer them — the backend clears Reply.Message in that path and
// delivers parsed content/reasoning/tool-calls via the delta stream
// (see pkg/functions/chat_deltas.go, mirrored from chat.go's non-SSE
// handling). Without this, Response is empty and realtime would
// synthesize silence for replies that actually produced tokens.
var reasoningText, responseWithoutReasoning, textContent, cleanedResponse string
var toolCalls []functions.FuncCallResults
deltaToolCalls := functions.ToolCallsFromChatDeltas(pred.ChatDeltas)
deltaContent := functions.ContentFromChatDeltas(pred.ChatDeltas)
deltaReasoning := functions.ReasoningFromChatDeltas(pred.ChatDeltas)
if len(deltaToolCalls) > 0 || deltaContent != "" {
xlog.Debug("[ChatDeltas] realtime: using C++ autoparser deltas",
"tool_calls", len(deltaToolCalls),
"content_len", len(deltaContent),
"reasoning_len", len(deltaReasoning))
reasoningText = deltaReasoning
responseWithoutReasoning = deltaContent
textContent = deltaContent
cleanedResponse = deltaContent
toolCalls = deltaToolCalls
} else {
reasoningText, responseWithoutReasoning = reasoning.ExtractReasoningWithConfig(rawResponse, thinkingStartToken, config.ReasoningConfig)
textContent = functions.ParseTextContent(responseWithoutReasoning, config.FunctionsConfig)
cleanedResponse = functions.CleanupLLMResult(responseWithoutReasoning, config.FunctionsConfig)
toolCalls = functions.ParseFunctionCall(cleanedResponse, config.FunctionsConfig)
}
xlog.Debug("LLM Response", "reasoning", reasoningText, "response_without_reasoning", responseWithoutReasoning)
textContent := functions.ParseTextContent(responseWithoutReasoning, config.FunctionsConfig)
cleanedResponse := functions.CleanupLLMResult(responseWithoutReasoning, config.FunctionsConfig)
toolCalls := functions.ParseFunctionCall(cleanedResponse, config.FunctionsConfig)
xlog.Debug("Function call parsing", "textContent", textContent, "cleanedResponse", cleanedResponse, "toolCallsCount", len(toolCalls))
noActionName := "answer"

View File

@@ -168,7 +168,7 @@ func (m *wrappedModel) Predict(ctx context.Context, messages schema.Messages, im
}
} else if toolChoice.Function != nil {
// Specific function specified
m.LLMConfig.SetFunctionCallString(toolChoice.Function.Name)
m.LLMConfig.SetFunctionCallNameString(toolChoice.Function.Name)
}
}

View File

@@ -773,7 +773,7 @@ func convertORToolsToFunctions(input *schema.OpenResponsesRequest, cfg *config.M
case map[string]any:
if tcType, ok := tc["type"].(string); ok && tcType == "function" {
if name, ok := tc["name"].(string); ok {
cfg.SetFunctionCallString(name)
cfg.SetFunctionCallNameString(name)
}
}
}

View File

@@ -1,29 +1,32 @@
import { test, expect } from '@playwright/test'
test.describe('Manage Page - Backend Logs Link', () => {
test('models table shows terminal icon for logs', async ({ page }) => {
test('row action menu exposes Backend logs entry with terminal icon', async ({ page }) => {
await page.goto('/app/manage')
// Wait for models to load
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
// Check for terminal icon (backend logs link)
const terminalIcon = page.locator('a[title="Backend logs"] i.fa-terminal')
await expect(terminalIcon.first()).toBeVisible()
// Row actions live behind the kebab (ActionMenu) — open the first row's menu.
const trigger = page.locator('button.action-menu__trigger').first()
await expect(trigger).toBeVisible()
await trigger.click()
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
await expect(logsItem).toBeVisible()
await expect(logsItem.locator('i.fa-terminal')).toBeVisible()
})
test('terminal icon links to backend-logs page', async ({ page }) => {
test('Backend logs menu item navigates to backend-logs page', async ({ page }) => {
await page.goto('/app/manage')
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
const logsLink = page.locator('a[title="Backend logs"]').first()
await expect(logsLink).toBeVisible()
const trigger = page.locator('button.action-menu__trigger').first()
await expect(trigger).toBeVisible()
await trigger.click()
// Link uses href="#" with onClick for navigation
const href = await logsLink.getAttribute('href')
expect(href).toBe('#')
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
await expect(logsItem).toBeVisible()
await logsItem.click()
// Click and verify navigation
await logsLink.click()
await expect(page).toHaveURL(/\/app\/backend-logs\//)
})
})

View File

@@ -0,0 +1,166 @@
import { test, expect } from '@playwright/test'
// These specs cover the per-node backend row in the Nodes page:
// - the upgrade affordance is self-explanatory (icon + tooltip)
// - a delete affordance is present and goes through ConfirmDialog
//
// We mock the distributed-mode API so the tests can run against the
// standalone ui-test-server without spinning up workers/NATS.
const NODE_ID = 'test-node-1'
const NODE_NAME = 'worker-test'
const BACKEND_NAME = 'cuda12-vllm-development'
async function mockDistributedNodes(page, { onDelete } = {}) {
await page.route('**/api/nodes', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify([
{
id: NODE_ID,
name: NODE_NAME,
node_type: 'backend',
address: '10.0.0.1:50051',
http_address: '10.0.0.1:8090',
status: 'healthy',
total_vram: 0,
available_vram: 0,
total_ram: 8_000_000_000,
available_ram: 4_000_000_000,
gpu_vendor: '',
last_heartbeat: new Date().toISOString(),
created_at: new Date().toISOString(),
updated_at: new Date().toISOString(),
},
]),
})
})
await page.route('**/api/nodes/scheduling', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: '[]',
})
})
await page.route(`**/api/nodes/${NODE_ID}/models`, (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: '[]',
})
})
await page.route(`**/api/nodes/${NODE_ID}/backends`, (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify([
{
name: BACKEND_NAME,
is_system: false,
is_meta: false,
installed_at: new Date().toISOString(),
},
]),
})
})
await page.route(`**/api/nodes/${NODE_ID}/backends/delete`, async (route) => {
if (onDelete) {
await onDelete(route)
}
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ message: 'backend deleted' }),
})
})
}
async function expandNodeAndWaitForBackends(page) {
await page.goto('/app/nodes')
// Click the row to expand it. The chevron toggle and the row both work,
// but clicking the name cell is the most user-like.
await page.getByText(NODE_NAME).first().click()
// Backends, Capacity and Labels live behind a "Manage" <details>
// disclosure (the drawer was distilled to keep at-a-glance content
// lean — see distill refactor in the multi-replica branch). Open it
// by clicking the summary inside the .node-manage scope so the
// per-node backend table is in the DOM before assertions run.
await page.locator('.node-manage > summary').first().click()
await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
}
test.describe('Nodes page — per-node backend actions', () => {
test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
await mockDistributedNodes(page)
await expandNodeAndWaitForBackends(page)
// Negative: the old, ambiguous wording must not be used.
await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
await expect(page.locator('button[title="Reinstall backend"] i.fa-sync-alt')).toHaveCount(0)
// Positive: a self-explanatory upgrade affordance is rendered next to the
// backend row. We accept either an arrow-up or arrows-rotate glyph; both
// map to "upgrade" semantics in FontAwesome 6 unambiguously.
const upgradeBtn = page.locator('button[title="Upgrade backend on this node"]')
await expect(upgradeBtn).toBeVisible()
const iconClass = await upgradeBtn.locator('i').getAttribute('class')
expect(iconClass).toMatch(/fa-(arrow-up|arrows-rotate|up-long)/)
})
test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
await mockDistributedNodes(page)
await expandNodeAndWaitForBackends(page)
const deleteBtn = page.locator('button[title="Delete backend from this node"]')
await expect(deleteBtn).toBeVisible()
await expect(deleteBtn.locator('i.fa-trash')).toBeVisible()
})
test('clicking delete opens the confirm dialog and POSTs to the per-node delete endpoint', async ({ page }) => {
let postedBody = null
await mockDistributedNodes(page, {
onDelete: async (route) => {
postedBody = route.request().postDataJSON()
},
})
await expandNodeAndWaitForBackends(page)
await page.locator('button[title="Delete backend from this node"]').click()
// ConfirmDialog uses role="alertdialog" and a danger confirm button.
const dialog = page.getByRole('alertdialog')
await expect(dialog).toBeVisible()
const confirmBtn = dialog.locator('button.btn-danger')
await expect(confirmBtn).toBeVisible()
await confirmBtn.click()
// Wait until the POST landed.
await expect.poll(() => postedBody, { timeout: 5_000 }).toEqual({ backend: BACKEND_NAME })
})
test('clicking delete and cancelling does not POST', async ({ page }) => {
let deleteCalls = 0
await mockDistributedNodes(page, {
onDelete: () => {
deleteCalls += 1
},
})
await expandNodeAndWaitForBackends(page)
await page.locator('button[title="Delete backend from this node"]').click()
const dialog = page.getByRole('alertdialog')
await expect(dialog).toBeVisible()
await dialog.getByRole('button', { name: /cancel/i }).click()
await expect(dialog).toBeHidden()
// Give any errant request a moment to fire so a regression would be caught.
await page.waitForTimeout(500)
expect(deleteCalls).toBe(0)
})
})

View File

@@ -7,7 +7,7 @@
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&display=swap" rel="stylesheet" />
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@300..700&family=Geist+Mono:wght@300..700&display=swap" rel="stylesheet" />
</head>
<body>
<div id="root"></div>

View File

@@ -3258,9 +3258,9 @@
}
},
"node_modules/postcss": {
"version": "8.5.8",
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz",
"integrity": "sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg==",
"version": "8.5.10",
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.10.tgz",
"integrity": "sha512-pMMHxBOZKFU6HgAZ4eyGnwXF/EvPGGqUr0MnZ5+99485wwW41kW91A4LOGxSHhgugZmSChL5AlElNdwlNgcnLQ==",
"dev": true,
"funding": [
{
@@ -3276,6 +3276,7 @@
"url": "https://github.com/sponsors/ai"
}
],
"license": "MIT",
"dependencies": {
"nanoid": "^3.3.11",
"picocolors": "^1.1.1",

View File

File diff suppressed because it is too large Load Diff

View File

@@ -1,9 +1,11 @@
import { useState, useEffect } from 'react'
import { Outlet, useLocation } from 'react-router-dom'
import { useState, useEffect, useRef } from 'react'
import { Outlet, useLocation, useNavigate } from 'react-router-dom'
import Sidebar from './components/Sidebar'
import OperationsBar from './components/OperationsBar'
import { ToastContainer, useToast } from './components/Toast'
import { systemApi } from './utils/api'
import { useTheme } from './contexts/ThemeContext'
import { useAuth } from './context/AuthContext'
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
@@ -15,6 +17,10 @@ export default function App() {
const { toasts, addToast, removeToast } = useToast()
const [version, setVersion] = useState('')
const location = useLocation()
const navigate = useNavigate()
const { theme, toggleTheme } = useTheme()
const { authEnabled, user } = useAuth()
const hamburgerRef = useRef(null)
const isChatRoute = location.pathname.match(/\/chat(\/|$)/) || location.pathname.match(/\/agents\/[^/]+\/chat/)
useEffect(() => {
@@ -34,26 +40,80 @@ export default function App() {
window.scrollTo(0, 0)
}, [location.pathname])
// Drawer polish: lock body scroll, close on Escape, return focus to the
// hamburger when the drawer closes. Only engages when the drawer is open;
// desktop and tablet rail mode are unaffected.
useEffect(() => {
if (!sidebarOpen) return
const prevOverflow = document.body.style.overflow
document.body.style.overflow = 'hidden'
const onKey = (e) => { if (e.key === 'Escape') setSidebarOpen(false) }
window.addEventListener('keydown', onKey)
return () => {
document.body.style.overflow = prevOverflow
window.removeEventListener('keydown', onKey)
// Restore focus to the trigger so keyboard users land back where
// they invoked the drawer from.
hamburgerRef.current?.focus()
}
}, [sidebarOpen])
const layoutClasses = [
'app-layout',
isChatRoute ? 'app-layout-chat' : '',
sidebarCollapsed ? 'sidebar-is-collapsed' : '',
].filter(Boolean).join(' ')
const showAvatar = authEnabled && user
const accountLabel = user?.name || user?.email || 'Account'
return (
<div className={layoutClasses}>
<Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
<main className="main-content">
<main className="main-content" {...(sidebarOpen ? { 'aria-hidden': 'true', inert: '' } : {})}>
<OperationsBar />
{/* Mobile header */}
{/* Mobile header — primary actions reachable without opening the
drawer. Hamburger is the only way to expand the nav on phones;
theme toggle and account avatar are mirrored from the sidebar
footer so they remain one tap away. */}
<header className="mobile-header">
<button
ref={hamburgerRef}
className="hamburger-btn"
onClick={() => setSidebarOpen(true)}
aria-label="Open menu"
aria-expanded={sidebarOpen}
aria-controls="app-sidebar"
>
<i className="fas fa-bars" />
<i className="fas fa-bars" aria-hidden="true" />
</button>
<span className="mobile-title">LocalAI</span>
<div className="mobile-header-actions">
<button
type="button"
className="mobile-header-btn"
onClick={toggleTheme}
aria-label={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
title={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
>
<i className={`fas ${theme === 'dark' ? 'fa-sun' : 'fa-moon'}`} aria-hidden="true" />
</button>
{showAvatar && (
<button
type="button"
className="mobile-header-btn mobile-header-avatar"
onClick={() => navigate('/app/account')}
aria-label={`Account: ${accountLabel}`}
title={accountLabel}
>
{user.avatarUrl ? (
<img src={user.avatarUrl} alt="" />
) : (
<i className="fas fa-user-circle" aria-hidden="true" />
)}
</button>
)}
</div>
</header>
<div className="main-content-inner">
<div className="page-transition" key={location.pathname}>

View File

@@ -0,0 +1,141 @@
import { useRef, useState, useEffect, useCallback } from 'react'
import Popover from './Popover'
// ActionMenu renders a kebab (three-dot) button that opens a popover with a
// list of row actions. Replaces the inline cluster of icon buttons that made
// dense tables feel like a control panel — actions stay out of the way until
// the user reaches for them, the way Linear/Vercel/Notion handle row menus.
//
// Items shape:
// { key, icon?, label, onClick, danger?, disabled?, hidden?, shortcut? }
// { divider: true } // visual separator
// { type: 'badge', icon?, label } // non-interactive badge row
//
// Hidden items are filtered out so callers can write conditional menus
// inline (`{ key: 'stop', visible: isRunning, ... }` style) without ternaries.
//
// Keyboard:
// ArrowUp / ArrowDown — move highlight (skipping dividers + badges)
// Enter / Space — activate
// Escape — close, return focus to trigger
export default function ActionMenu({ items, ariaLabel = 'Actions', triggerLabel, compact = false }) {
const triggerRef = useRef(null)
const [open, setOpen] = useState(false)
const [activeIdx, setActiveIdx] = useState(-1)
const interactive = (Array.isArray(items) ? items : []).filter(it => it && !it.divider && it.type !== 'badge' && !it.hidden)
const visible = (Array.isArray(items) ? items : []).filter(it => it && !it.hidden)
const close = useCallback(() => {
setOpen(false)
setActiveIdx(-1)
}, [])
// Move highlight to the first interactive item when opening, so keyboard
// users land somewhere meaningful instead of having to arrow into the menu.
useEffect(() => {
if (open && activeIdx === -1 && interactive.length > 0) {
setActiveIdx(0)
}
}, [open, activeIdx, interactive.length])
const handleTriggerKeyDown = (e) => {
if (e.key === 'ArrowDown' || e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
e.stopPropagation()
setOpen(true)
}
}
const handleMenuKeyDown = (e) => {
if (e.key === 'ArrowDown') {
e.preventDefault()
setActiveIdx(i => Math.min(interactive.length - 1, (i < 0 ? -1 : i) + 1))
} else if (e.key === 'ArrowUp') {
e.preventDefault()
setActiveIdx(i => Math.max(0, (i < 0 ? interactive.length : i) - 1))
} else if (e.key === 'Home') {
e.preventDefault()
setActiveIdx(0)
} else if (e.key === 'End') {
e.preventDefault()
setActiveIdx(interactive.length - 1)
} else if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
const item = interactive[activeIdx]
if (item && !item.disabled) {
close()
item.onClick?.()
}
}
}
if (interactive.length === 0 && !visible.some(it => it.type === 'badge')) {
return null
}
return (
<>
<button
ref={triggerRef}
type="button"
className={`action-menu__trigger${compact ? ' action-menu__trigger--compact' : ''}${open ? ' is-open' : ''}`}
aria-haspopup="menu"
aria-expanded={open}
aria-label={triggerLabel || ariaLabel}
onClick={(e) => { e.stopPropagation(); setOpen(v => !v) }}
onKeyDown={handleTriggerKeyDown}
>
<i className="fas fa-ellipsis-vertical" />
</button>
<Popover anchor={triggerRef} open={open} onClose={close} ariaLabel={ariaLabel}>
<div
role="menu"
aria-label={ariaLabel}
className="action-menu"
onKeyDown={handleMenuKeyDown}
// Capture focus when the menu opens so arrow keys work without the
// user clicking inside first.
tabIndex={-1}
ref={el => { if (el && open) el.focus() }}
>
{visible.map((item, i) => {
if (item.divider) {
return <div key={`d-${i}`} className="action-menu__divider" role="separator" />
}
if (item.type === 'badge') {
return (
<div key={item.key || `b-${i}`} className="action-menu__badge" role="presentation">
{item.icon && <i className={`fas ${item.icon}`} aria-hidden="true" />}
<span>{item.label}</span>
</div>
)
}
const idx = interactive.indexOf(item)
const active = idx === activeIdx
return (
<button
key={item.key}
type="button"
role="menuitem"
disabled={item.disabled}
className={`action-menu__item${item.danger ? ' is-danger' : ''}${active ? ' is-active' : ''}`}
onMouseEnter={() => setActiveIdx(idx)}
onClick={(e) => {
e.stopPropagation()
if (item.disabled) return
close()
item.onClick?.()
}}
>
{item.icon && <i className={`fas ${item.icon} action-menu__icon`} aria-hidden="true" />}
<span className="action-menu__label">{item.label}</span>
{item.shortcut && <span className="action-menu__shortcut">{item.shortcut}</span>}
</button>
)
})}
</div>
</Popover>
</>
)
}

View File

@@ -80,7 +80,7 @@ export default function ClientMCPDropdown({
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="text"
@@ -88,7 +88,7 @@ export default function ClientMCPDropdown({
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="password"
@@ -96,13 +96,13 @@ export default function ClientMCPDropdown({
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button type="button" className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
</div>

View File

@@ -135,7 +135,7 @@ function JsonEditor({ value, onChange }) {
className="input"
value={text}
onChange={e => handleChange(e.target.value)}
style={{ width: '100%', minHeight: 80, fontFamily: 'monospace', fontSize: '0.8125rem', resize: 'vertical' }}
style={{ width: '100%', minHeight: 80, fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', resize: 'vertical' }}
/>
{parseError && <div style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 2 }}>{parseError}</div>}
</div>

View File

@@ -158,7 +158,7 @@ export default function FieldBrowser({ fields, activeFieldPaths, onAddField }) {
{field.description}
</div>
)}
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'monospace' }}>
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'var(--font-mono)' }}>
{field.path}
</div>
</div>

View File

@@ -0,0 +1,79 @@
import { useState, useEffect } from 'react'
const LOADING_PHRASES = [
{ text: 'Loading models...', icon: 'fa-brain' },
{ text: 'Fetching gallery...', icon: 'fa-download' },
{ text: 'Checking availability...', icon: 'fa-circle-check' },
{ text: 'Almost ready...', icon: 'fa-hourglass-half' },
{ text: 'Preparing gallery...', icon: 'fa-store' },
]
// GalleryLoader is the animated skeleton used while the gallery list loads.
// Used by Models, Backends, and (now) the Manage page so an empty fetch state
// reads the same everywhere instead of one tab showing pulsing dots and the
// other showing "Loading...".
export default function GalleryLoader() {
const [idx, setIdx] = useState(() => Math.floor(Math.random() * LOADING_PHRASES.length))
const [fade, setFade] = useState(true)
useEffect(() => {
const interval = setInterval(() => {
setFade(false)
setTimeout(() => {
setIdx(prev => (prev + 1) % LOADING_PHRASES.length)
setFade(true)
}, 300)
}, 2800)
return () => clearInterval(interval)
}, [])
const phrase = LOADING_PHRASES[idx]
return (
<div style={{
display: 'flex', flexDirection: 'column', alignItems: 'center',
justifyContent: 'center', padding: 'var(--spacing-xl) var(--spacing-md)',
minHeight: '280px', gap: 'var(--spacing-lg)',
}}>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)' }}>
{[0, 1, 2, 3, 4].map(i => (
<div key={i} style={{
width: 10, height: 10, borderRadius: '50%',
background: 'var(--color-primary)',
animation: `galleryDot 1.4s ease-in-out ${i * 0.15}s infinite`,
}} />
))}
</div>
<div style={{
display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)',
opacity: fade ? 1 : 0,
transition: 'opacity 300ms ease',
color: 'var(--color-text-secondary)',
fontSize: '0.9375rem',
fontWeight: 500,
}}>
<i className={`fas ${phrase.icon}`} style={{ color: 'var(--color-accent)', fontSize: '1.125rem' }} />
{phrase.text}
</div>
<div style={{ width: '100%', maxWidth: '700px', display: 'flex', flexDirection: 'column', gap: '12px' }}>
{[0.9, 0.7, 0.5].map((opacity, i) => (
<div key={i} style={{
height: '48px', borderRadius: 'var(--radius-md)',
background: 'var(--color-bg-tertiary)', opacity,
animation: `galleryShimmer 1.8s ease-in-out ${i * 0.2}s infinite`,
}} />
))}
</div>
<style>{`
@keyframes galleryDot {
0%, 80%, 100% { transform: scale(0.4); opacity: 0.3; }
40% { transform: scale(1); opacity: 1; }
}
@keyframes galleryShimmer {
0%, 100% { opacity: var(--shimmer-base, 0.15); }
50% { opacity: var(--shimmer-peak, 0.3); }
}
`}</style>
</div>
)
}

View File

@@ -0,0 +1,47 @@
import StatCard from './StatCard'
// ManageSummary anchors the Manage page with the same StatCard pattern the
// Nodes dashboard uses, so the page reads as a real overview rather than
// "two tabs in a hat". Counts are derived in-memory by the parent — this
// component is purely presentational. Cards are clickable and route the
// user to the relevant tab + filter.
export default function ManageSummary({
modelsCount,
backendsCount,
runningCount,
updatesCount,
onCardClick,
}) {
const click = (tab, filter) => onCardClick && onCardClick(tab, filter)
return (
<div className="stat-grid manage-summary">
<StatCard
icon="fas fa-brain"
label="Models Installed"
value={modelsCount}
onClick={() => click('models', 'all')}
/>
<StatCard
icon="fas fa-server"
label="Backends Installed"
value={backendsCount}
onClick={() => click('backends', 'all')}
/>
<StatCard
icon="fas fa-circle-play"
label="Currently Running"
value={runningCount}
accentVar={runningCount > 0 ? '--color-success' : undefined}
onClick={() => click('models', 'running')}
/>
<StatCard
icon="fas fa-arrow-up"
label="Updates Available"
value={updatesCount}
accentVar={updatesCount > 0 ? '--color-warning' : undefined}
onClick={() => click('backends', updatesCount > 0 ? 'upgradable' : 'all')}
/>
</div>
)
}

View File

@@ -0,0 +1,30 @@
// MetaBadgeRow renders the System / User / Meta / Dev badge cluster the same
// way everywhere — Manage tabs and (in future) Install gallery. The badges
// already exist as classes; this component locks down the icons + labels so
// the same backend type doesn't read "User" in one tab and "downloaded" in
// another.
export default function MetaBadgeRow({ isSystem, isMeta, isDevelopment }) {
return (
<div className="badge-row">
{isSystem ? (
<span className="badge badge-info" title="Bundled with the LocalAI runtime">
<i className="fas fa-shield-alt" /> System
</span>
) : (
<span className="badge badge-success" title="Installed from the gallery or external source">
<i className="fas fa-download" /> User
</span>
)}
{isMeta && (
<span className="badge badge-accent" title="Meta backend — selects a concrete variant per node">
<i className="fas fa-layer-group" /> Meta
</span>
)}
{isDevelopment && (
<span className="badge badge-warning" title="Marked as development / pre-release by the gallery">
<i className="fas fa-flask" /> Dev
</span>
)}
</div>
)
}

View File

@@ -0,0 +1,668 @@
import { useState, useMemo, useEffect, useRef } from 'react'
import Modal from './Modal'
import SearchableSelect from './SearchableSelect'
import { nodesApi } from '../utils/api'
// NodeInstallPicker is the single multi-node install surface used both from
// the Backends gallery split-button and from the "Install on more nodes" `+`
// affordance in the Nodes column. Submit fires N parallel per-node install
// calls; rows transition inline so the user sees per-node success/failure
// without leaving the modal.
//
// Props:
// open — controls visibility
// onClose — close handler (header X / Cancel / Esc / backdrop)
// onComplete — fired after at least one node install succeeded;
// gallery uses this to refetch and update the Nodes
// column without a manual reload
// backend — { name, isMeta, capabilities, metaBackendFor }
// nodes — BackendNode[] from /api/nodes
// installedNodeIds — Set/array of node IDs that already have this backend
// initialSelection — optional pre-selected node IDs (e.g. "missing nodes"
// when opened from the Nodes column `+` affordance)
const STATUS_LABELS = { healthy: 'Healthy', draining: 'Draining', unhealthy: 'Unhealthy', offline: 'Offline' }
function formatVRAM(bytes) {
if (!bytes || bytes === 0) return null
const gb = bytes / (1024 * 1024 * 1024)
return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
}
function gpuVendorLabel(vendor) {
const labels = { nvidia: 'NVIDIA', amd: 'AMD', intel: 'Intel', vulkan: 'Vulkan' }
return labels[vendor] || null
}
// hardwareTargetOf parses the capability key that points to a concrete
// variant in the parent meta's CapabilitiesMap. e.g. cpu-llama-cpp comes
// from {"cpu": "cpu-llama-cpp"} → "cpu". Falls back to "" when the parent
// is unknown (the gallery list payload still gives us metaBackendFor).
function hardwareTargetOf(backend, allBackends) {
if (!backend || !backend.name || backend.isMeta) return ''
const parentName = backend.metaBackendFor
if (!parentName) return ''
const parent = (allBackends || []).find(b => b.name === parentName || b.id === parentName)
if (!parent || !parent.capabilities) return ''
for (const [cap, concreteName] of Object.entries(parent.capabilities)) {
if (concreteName === backend.name) return cap
}
return ''
}
// humanTargetLabel turns a capability key into a user-facing phrase used in
// the picker header note: "CPU build", "CUDA 12 build", etc. Keep it
// concrete and product-recognisable, not the raw token from the gallery.
function humanTargetLabel(target) {
if (!target) return 'hardware-specific build'
const t = target.toLowerCase()
if (t.startsWith('cpu') || t === 'default') return 'CPU build'
if (t.includes('cuda-13') || t.includes('cuda13')) return 'CUDA 13 build'
if (t.includes('cuda-12') || t.includes('cuda12')) return 'CUDA 12 build'
if (t.includes('cuda')) return 'NVIDIA CUDA build'
if (t.includes('l4t')) return 'NVIDIA Jetson (L4T) build'
if (t.includes('nvidia')) return 'NVIDIA build'
if (t.includes('rocm') || t.includes('amd')) return 'AMD ROCm build'
if (t.includes('metal')) return 'Apple Metal build'
if (t.includes('sycl') || t.includes('intel')) return 'Intel SYCL build'
if (t.includes('vulkan')) return 'Vulkan build'
if (t.includes('darwin-x86')) return 'macOS x86 build'
return 'hardware-specific build'
}
// suitabilityFor returns the picker's per-row suitability state for the
// requested backend. Already-installed wins over compatible/override so
// the user sees a single signal per row.
function suitabilityFor({ node, backend, hardwareTarget, alreadyInstalled }) {
if (alreadyInstalled) return 'installed'
// backend can be null on the first render before pickerBackend is set —
// this function is invoked from useMemo, which runs regardless of the
// outer open guard. Treat missing data as "compatible" so the placeholder
// render doesn't blow up; the picker won't actually paint anything until
// the early-return below the hooks fires.
if (!backend || backend.isMeta || !hardwareTarget) return 'compatible'
const vendor = (node.gpu_vendor || '').toLowerCase()
const t = hardwareTarget.toLowerCase()
if (t.startsWith('cpu') || t === 'default') {
// CPU builds always run; they're never marked Override (running CPU on a
// GPU node is the headline use case the user is choosing intentionally).
return 'compatible'
}
if (t.includes('nvidia') || t.includes('cuda') || t.includes('l4t')) {
return vendor === 'nvidia' ? 'compatible' : 'override'
}
if (t.includes('amd') || t.includes('rocm') || t.includes('hip')) {
return vendor === 'amd' ? 'compatible' : 'override'
}
if (t.includes('intel') || t.includes('sycl')) {
return vendor === 'intel' ? 'compatible' : 'override'
}
if (t.includes('metal') || t.includes('darwin')) {
// No vendor reporting for Metal; trust the user.
return 'compatible'
}
return 'compatible'
}
export default function NodeInstallPicker({
open, onClose, onComplete,
backend,
nodes = [],
allBackends = [],
installedNodeIds = [],
initialSelection,
addToast,
}) {
const [search, setSearch] = useState('')
const [showHealthy, setShowHealthy] = useState(true)
const [showDraining, setShowDraining] = useState(false)
const [selected, setSelected] = useState(() => new Set())
const [overrideVariant, setOverrideVariant] = useState('') // chosen concrete name
const [overrideExpanded, setOverrideExpanded] = useState(false)
const [submitting, setSubmitting] = useState(false)
const [showMismatchConfirm, setShowMismatchConfirm] = useState(false)
// Per-node submission state: { [nodeId]: { status: 'pending'|'installing'|'done'|'error', error? , version? } }
const [perNode, setPerNode] = useState({})
const headerInputRef = useRef(null)
// Backend-derived metadata used throughout the picker.
const hardwareTarget = useMemo(() => hardwareTargetOf(backend, allBackends), [backend, allBackends])
const targetLabel = humanTargetLabel(hardwareTarget)
const concreteVariants = useMemo(() => {
if (!backend?.isMeta || !backend.capabilities) return []
return Object.entries(backend.capabilities).map(([cap, concrete]) => ({
value: concrete,
label: `${concrete} · ${cap}`,
}))
}, [backend])
// Pending nodes are surgically removed from the list — they can't accept
// installs until approved. Surface the count instead of dead-disabled rows.
const pendingCount = nodes.filter(n => n.status === 'pending').length
const backendNodes = nodes.filter(n =>
(!n.node_type || n.node_type === 'backend') && n.status !== 'pending'
)
const installedSet = useMemo(() => {
const s = new Set()
if (Array.isArray(installedNodeIds)) installedNodeIds.forEach(id => s.add(id))
else if (installedNodeIds && typeof installedNodeIds.has === 'function') {
installedNodeIds.forEach(id => s.add(id))
}
return s
}, [installedNodeIds])
const filteredNodes = useMemo(() => {
let list = backendNodes
if (!showHealthy) list = list.filter(n => n.status !== 'healthy')
if (!showDraining) list = list.filter(n => n.status !== 'draining')
if (search.trim()) {
const q = search.toLowerCase()
list = list.filter(n =>
(n.name || '').toLowerCase().includes(q) ||
Object.entries(n.labels || {}).some(([k, v]) => `${k}=${v}`.toLowerCase().includes(q))
)
}
return list
}, [backendNodes, showHealthy, showDraining, search])
// Pre-seed selection on open. Reset all transient state so reopening
// doesn't surface ghost progress from the prior submit.
useEffect(() => {
if (!open) return
const initial = new Set()
if (Array.isArray(initialSelection)) initialSelection.forEach(id => initial.add(id))
setSelected(initial)
setSearch('')
setOverrideVariant('')
setOverrideExpanded(false)
setPerNode({})
setSubmitting(false)
setShowMismatchConfirm(false)
}, [open, initialSelection])
// Auto-expand the variant override disclosure when at least one selected
// node lacks a working GPU. This is the headline use case the feature
// exists for; surfacing it instead of hiding behind a click.
useEffect(() => {
if (!backend?.isMeta) return
const someGPUMissing = Array.from(selected).some(id => {
const n = backendNodes.find(x => x.id === id)
return n && (!n.gpu_vendor || n.gpu_vendor === '' || n.gpu_vendor === 'unknown')
})
if (someGPUMissing && !overrideExpanded) setOverrideExpanded(true)
}, [selected, backend, backendNodes]) // eslint-disable-line react-hooks/exhaustive-deps
// The effective backend that gets installed on each node. For
// hardware-specific backends this is just backend.name. For meta backends
// with no override, the worker picks per-node — we pass backend.name and
// the worker resolves. With an override set, the picker installs that
// exact concrete variant on every selected node.
const effectiveBackendName = overrideVariant || backend?.name
const counts = useMemo(() => {
let already = 0, overrides = 0
selected.forEach(id => {
const n = backendNodes.find(x => x.id === id)
if (!n) return
if (installedSet.has(id)) { already++; return }
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
const s = suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false })
if (s === 'override') overrides++
})
return { already, overrides, selected: selected.size }
}, [selected, backendNodes, installedSet, overrideVariant, backend, hardwareTarget, allBackends])
const toggle = (nodeId) => {
setSelected(prev => {
const next = new Set(prev)
next.has(nodeId) ? next.delete(nodeId) : next.add(nodeId)
return next
})
}
const selectAllHealthy = () => {
setSelected(new Set(filteredNodes.filter(n => n.status === 'healthy').map(n => n.id)))
}
const selectCompatible = () => {
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
setSelected(new Set(
filteredNodes
.filter(n => suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false }) === 'compatible')
.map(n => n.id)
))
}
const clearSelection = () => setSelected(new Set())
const submit = async () => {
if (selected.size === 0 || submitting) return
if (counts.overrides > 0 && !showMismatchConfirm) {
setShowMismatchConfirm(true)
return
}
setShowMismatchConfirm(false)
setSubmitting(true)
const ids = Array.from(selected)
setPerNode(prev => {
const next = { ...prev }
ids.forEach(id => { next[id] = { status: 'installing' } })
return next
})
const results = await Promise.allSettled(ids.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
))
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) {
next[v.id] = { status: 'done' }
successCount++
} else {
next[v.id] = { status: 'error', error: v.error }
failCount++
}
}
return next
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
} else if (successCount === 0) {
addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
} else {
addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
}
}
const retryFailed = async () => {
const failedIds = Object.entries(perNode)
.filter(([, v]) => v.status === 'error')
.map(([id]) => id)
if (failedIds.length === 0) return
setSelected(new Set(failedIds))
// Replace state for failed rows so they show "installing" again, not stale errors.
setPerNode(prev => {
const next = { ...prev }
failedIds.forEach(id => { next[id] = { status: 'installing' } })
return next
})
setSubmitting(true)
const results = await Promise.allSettled(failedIds.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
))
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) { next[v.id] = { status: 'done' }; successCount++ }
else { next[v.id] = { status: 'error', error: v.error }; failCount++ }
}
return next
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
}
}
const doneCount = Object.values(perNode).filter(v => v.status === 'done').length
const errorCount = Object.values(perNode).filter(v => v.status === 'error').length
const totalAttempted = Object.keys(perNode).length
if (!open || !backend) return null
const noNodes = backendNodes.length === 0
return (
<Modal onClose={onClose} maxWidth="780px">
<div style={{
padding: 'var(--spacing-md) var(--spacing-lg)',
borderBottom: '1px solid var(--color-border-subtle)',
display: 'flex',
alignItems: 'center',
justifyContent: 'space-between',
gap: 'var(--spacing-sm)',
}}>
<h2 style={{ margin: 0, fontSize: '1rem', display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
<i className="fas fa-cog" style={{ color: 'var(--color-primary)' }} />
Install <span style={{ fontFamily: 'var(--font-mono)' }}>{backend.name}</span>
{backend.isMeta ? (
<span className="badge badge-info" style={{ fontSize: '0.6875rem' }}>Auto-resolving</span>
) : (
<span className="badge badge-warning" style={{ fontSize: '0.6875rem' }}>Hardware-specific</span>
)}
</h2>
<button
type="button"
className="btn btn-ghost btn-sm"
onClick={onClose}
aria-label="Close"
style={{ fontSize: '1.125rem', lineHeight: 1, padding: '4px 10px' }}
>×</button>
</div>
<div style={{ padding: 'var(--spacing-md) var(--spacing-lg)' }}>
{!backend.isMeta && (
<div className="card" style={{
marginBottom: 'var(--spacing-md)',
padding: 'var(--spacing-sm) var(--spacing-md)',
background: 'var(--color-warning-light)',
border: '1px solid var(--color-warning-border)',
borderRadius: 'var(--radius-md)',
display: 'flex',
alignItems: 'center',
gap: 'var(--spacing-sm)',
}}>
<i className="fas fa-microchip" style={{ color: 'var(--color-warning)' }} />
<span style={{ color: 'var(--color-warning)', fontSize: '0.8125rem' }}>
{targetLabel}. Install only on nodes where you want this build to run.
{hardwareTarget && ` Targets: ${humanTargetLabel(hardwareTarget).replace(' build', '')}.`}
</span>
</div>
)}
{noNodes ? (
<div className="empty-state" style={{ padding: 'var(--spacing-xl) 0' }}>
<div className="empty-state-icon"><i className="fas fa-server" /></div>
<h3 className="empty-state-title">No backend nodes available</h3>
<p className="empty-state-text">
Approve pending workers or register new ones.
{pendingCount > 0 && ` (${pendingCount} awaiting approval.)`}
</p>
<a className="btn btn-secondary btn-sm" href="/app/nodes">
<i className="fas fa-network-wired" /> Manage nodes
</a>
</div>
) : (
<>
{/* Filter row */}
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', alignItems: 'center', marginBottom: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
<div className="search-bar" style={{ flex: 1, minWidth: 180 }}>
<i className="fas fa-search search-icon" />
<input
ref={headerInputRef}
className="input"
placeholder="Filter nodes by name or label..."
value={search}
onChange={e => setSearch(e.target.value)}
/>
</div>
<button className="btn btn-secondary btn-sm" onClick={selectAllHealthy} type="button">
Select all healthy
</button>
<button className="btn btn-secondary btn-sm" onClick={selectCompatible} type="button">
Select compatible nodes
</button>
{selected.size > 0 && (
<button className="btn btn-ghost btn-sm" onClick={clearSelection} type="button">
Clear
</button>
)}
</div>
{/* Variant override (auto-resolving only) */}
{backend.isMeta && concreteVariants.length > 0 && (
<div style={{ marginBottom: 'var(--spacing-sm)' }}>
<button
type="button"
className="btn btn-ghost btn-sm"
onClick={() => setOverrideExpanded(v => !v)}
aria-expanded={overrideExpanded}
style={{ padding: '4px 8px' }}
>
<i className={`fas fa-chevron-${overrideExpanded ? 'down' : 'right'}`} style={{ marginRight: 4, fontSize: '0.625rem' }} />
Override variant for selected nodes
</button>
{overrideExpanded && (
<div className="card" style={{ marginTop: 4, padding: 'var(--spacing-sm) var(--spacing-md)' }}>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 0, marginBottom: 'var(--spacing-xs)' }}>
By default each node picks its own variant. Override to install one specific variant on every selected node useful when GPU detection fails on a node and you want the CPU build there instead.
</p>
<SearchableSelect
value={overrideVariant}
onChange={setOverrideVariant}
options={concreteVariants}
placeholder="Per-node auto-resolve (default)"
allOption={{ value: '', label: 'Per-node auto-resolve (default)' }}
/>
</div>
)}
</div>
)}
{/* Node table */}
<div className="table-container" style={{ marginBottom: 'var(--spacing-sm)', maxHeight: '40vh', overflowY: 'auto' }}>
<table className="table" style={{ margin: 0 }}>
<thead>
<tr>
<th style={{ width: 28 }}>
<input
type="checkbox"
aria-label="Select all visible"
checked={filteredNodes.length > 0 && filteredNodes.every(n => selected.has(n.id))}
onChange={(e) => {
setSelected(prev => {
const next = new Set(prev)
if (e.target.checked) filteredNodes.forEach(n => next.add(n.id))
else filteredNodes.forEach(n => next.delete(n.id))
return next
})
}}
/>
</th>
<th>Node</th>
<th>Status</th>
<th>Hardware</th>
<th>Suitability</th>
</tr>
</thead>
<tbody>
{filteredNodes.map(node => {
const installed = installedSet.has(node.id)
const eff = overrideVariant
? { name: overrideVariant, isMeta: false, metaBackendFor: backend.name }
: backend
const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
const suit = suitabilityFor({ node, backend: eff, hardwareTarget: target, alreadyInstalled: installed })
const isSel = selected.has(node.id)
const rowState = perNode[node.id]
const vendor = gpuVendorLabel(node.gpu_vendor)
const totalVRAM = formatVRAM(node.total_vram)
const totalRAM = formatVRAM(node.total_ram)
return (
<tr key={node.id}>
<td>
<input
type="checkbox"
aria-label={`Select ${node.name}`}
aria-disabled={rowState?.status === 'installing'}
checked={isSel}
onChange={() => toggle(node.id)}
/>
</td>
<td>
<div style={{ display: 'flex', flexDirection: 'column', gap: 2 }}>
<span style={{ fontWeight: 500, fontSize: '0.875rem' }}>{node.name}</span>
{node.labels && Object.keys(node.labels).length > 0 && (
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 3 }}>
{Object.entries(node.labels).slice(0, 3).map(([k, v]) => (
<span key={k} className="cell-mono" style={{
padding: '1px 5px', borderRadius: 'var(--radius-sm)', fontSize: '0.6875rem',
background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
}}>{k}={v}</span>
))}
{Object.keys(node.labels).length > 3 && (
<span className="cell-muted" style={{ fontSize: '0.6875rem' }}>
+{Object.keys(node.labels).length - 3}
</span>
)}
</div>
)}
</div>
</td>
<td>
<span style={{ fontSize: '0.8125rem' }}>
{STATUS_LABELS[node.status] || node.status}
</span>
</td>
<td style={{ fontSize: '0.8125rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)' }}>
{totalVRAM ? (
<>{vendor && <span style={{ marginRight: 4 }}>{vendor}</span>}{totalVRAM}</>
) : totalRAM ? (
<span>CPU · {totalRAM}</span>
) : <span className="cell-muted"></span>}
</td>
<td>
{rowState?.status === 'installing' ? (
<span className="badge badge-info">
<i className="fas fa-spinner fa-spin" style={{ marginRight: 4 }} />Installing
</span>
) : rowState?.status === 'done' ? (
<span className="badge badge-success">
<i className="fas fa-check" style={{ marginRight: 4 }} />Installed
</span>
) : rowState?.status === 'error' ? (
<button
type="button"
className="badge badge-error"
title={rowState.error}
aria-describedby={`err-${node.id}`}
style={{ border: 'none', cursor: 'help' }}
>
<i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />Failed
<span id={`err-${node.id}`} style={{ position: 'absolute', left: -9999 }}>{rowState.error}</span>
</button>
) : suit === 'installed' ? (
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
Installed
</span>
) : suit === 'override' ? (
<span className="badge badge-warning">
<i className="fas fa-exclamation-circle" style={{ marginRight: 4 }} />Override
</span>
) : (
<span className="badge badge-success" style={{ background: 'var(--color-success-light)', color: 'var(--color-success)' }}>
Compatible
</span>
)}
</td>
</tr>
)
})}
{filteredNodes.length === 0 && (
<tr>
<td colSpan={5} style={{ textAlign: 'center', padding: 'var(--spacing-md)', color: 'var(--color-text-muted)' }}>
No nodes match the current filters.
</td>
</tr>
)}
</tbody>
</table>
</div>
{pendingCount > 0 && (
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 0, marginBottom: 'var(--spacing-sm)' }}>
+{pendingCount} awaiting approval <a href="/app/nodes" style={{ color: 'var(--color-primary)' }}>approve from Nodes</a>.
</p>
)}
{/* Mismatch confirm */}
{showMismatchConfirm && (
<div className="card" style={{
marginBottom: 'var(--spacing-sm)',
padding: 'var(--spacing-md)',
background: 'var(--color-warning-light)',
border: '1px solid var(--color-warning-border)',
borderRadius: 'var(--radius-md)',
}}>
<p style={{ marginTop: 0, marginBottom: 'var(--spacing-sm)', color: 'var(--color-warning)', fontSize: '0.875rem' }}>
Installing {targetLabel.toLowerCase()} on {counts.overrides} node{counts.overrides === 1 ? '' : 's'} that don't match. Those nodes will run inference on the chosen build, not their native GPU. Continue?
</p>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end' }}>
<button className="btn btn-secondary btn-sm" type="button" onClick={() => setShowMismatchConfirm(false)}>
Cancel
</button>
<button className="btn btn-primary btn-sm" type="button" onClick={submit}
style={{ background: 'var(--color-warning)', borderColor: 'var(--color-warning)' }}>
Install on {targetLabel.replace(' build', '')}
</button>
</div>
</div>
)}
</>
)}
</div>
{!noNodes && (
<div style={{
padding: 'var(--spacing-md) var(--spacing-lg)',
borderTop: '1px solid var(--color-border-subtle)',
display: 'flex',
alignItems: 'center',
gap: 'var(--spacing-sm)',
flexWrap: 'wrap',
}}>
<div style={{ flex: 1, fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>
{totalAttempted > 0 ? (
<>
{doneCount} of {totalAttempted} done
{errorCount > 0 && (
<> · <span className="badge badge-error" style={{ fontSize: '0.6875rem' }}>{errorCount} failed</span></>
)}
</>
) : (
<>
{counts.selected} {counts.selected === 1 ? 'node' : 'nodes'} selected
{counts.already > 0 && <> · {counts.already} already installed</>}
{counts.overrides > 0 && <> · {counts.overrides} override{counts.overrides === 1 ? '' : 's'}</>}
</>
)}
</div>
{errorCount > 0 && !submitting && (
<button className="btn btn-secondary btn-sm" type="button" onClick={retryFailed}>
<i className="fas fa-redo" /> Retry failed nodes
</button>
)}
<button className="btn btn-secondary btn-sm" type="button" onClick={onClose} disabled={submitting}>
{totalAttempted > 0 && doneCount > 0 ? 'Close' : 'Cancel'}
</button>
<button
className="btn btn-primary btn-sm"
type="button"
onClick={submit}
disabled={submitting || counts.selected === 0 || showMismatchConfirm}
>
{submitting ? (
<><i className="fas fa-spinner fa-spin" /> Installing…</>
) : (
<>Install on {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'}</>
)}
</button>
</div>
)}
</Modal>
)
}

View File

@@ -0,0 +1,29 @@
// ResourceActions groups row-level buttons into a lifecycle cluster (start,
// stop, pin, reinstall, upgrade) and a destructive cluster (delete) with a
// thin divider between them, so a destructive intent visually separates from
// a routine one. Replaces the old 4px-gap row of buttons in the Manage page
// where Stop / Pin / Delete sat shoulder-to-shoulder with no visual cue
// telling apart "click to fiddle" from "click to throw away".
//
// `lifecycle` and `destructive` accept any ReactNode — typically one or more
// <button>s. The wrapping div stops click propagation so action clicks don't
// also expand the row.
export default function ResourceActions({ lifecycle, destructive }) {
const hasLifecycle = !!lifecycle
const hasDestructive = !!destructive
if (!hasLifecycle && !hasDestructive) return null
return (
<div className="resource-actions" onClick={e => e.stopPropagation()}>
{hasLifecycle && (
<div className="resource-actions__group">{lifecycle}</div>
)}
{hasLifecycle && hasDestructive && (
<span className="resource-actions__divider" aria-hidden="true" />
)}
{hasDestructive && (
<div className="resource-actions__group">{destructive}</div>
)}
</div>
)
}

View File

@@ -51,7 +51,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${pct}%`, background: color }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color, minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color, minWidth: '3em', textAlign: 'right' }}>
{pct.toFixed(0)}%
</span>
</div>
@@ -76,7 +76,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${ram.usage_percent || 0}%`, background: percentColor(ram.usage_percent || 0) }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
{(ram.usage_percent || 0).toFixed(0)}%
</span>
</div>
@@ -91,7 +91,7 @@ export default function ResourceMonitor() {
{isGpu && aggregate.gpu_count > 1 && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Total VRAM</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace" }}>
<span style={{ fontFamily: 'var(--font-mono)' }}>
{formatBytes(aggregate.used_memory)} / {formatBytes(aggregate.total_memory)} ({aggregate.usage_percent?.toFixed(1)}%)
</span>
</div>
@@ -101,7 +101,7 @@ export default function ResourceMonitor() {
{resources.storage_size != null && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Models storage</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-primary)' }}>
<span style={{ fontFamily: 'var(--font-mono)', color: 'var(--color-text-primary)' }}>
{formatBytes(resources.storage_size)}
</span>
</div>

View File

@@ -0,0 +1,81 @@
import { Fragment } from 'react'
// ResourceRow renders the visible row + its conditional detail row as a pair
// of <tr>s, so the existing .table styling keeps applying and the Manage page
// can re-use the gallery's expand-to-detail interaction without inventing a
// new table system. The consumer owns the cells (which pass through as
// children) — this component only manages the click-to-expand handler, the
// dimmed state for disabled rows, and the colSpan'd detail row beneath.
//
// `onToggleExpand` fires on row click only. Buttons / toggles inside cells
// must call e.stopPropagation() (or be wrapped in an .actions-stop wrapper)
// to avoid double-triggering the expand.
export default function ResourceRow({
expanded,
onToggleExpand,
detail,
colSpan,
dimmed,
className = '',
children,
}) {
return (
<Fragment>
<tr
className={`resource-row${dimmed ? ' is-dimmed' : ''}${expanded ? ' is-expanded' : ''} ${className}`.trim()}
onClick={onToggleExpand}
style={{ cursor: onToggleExpand ? 'pointer' : 'default' }}
>
{children}
</tr>
{expanded && detail && (
<tr className="resource-row__detail-row">
<td colSpan={colSpan} className="resource-row__detail-cell">
{detail}
</td>
</tr>
)}
</Fragment>
)
}
// ChevronCell is the small rotating chevron used as the leftmost cell of an
// expandable row. Mirrors the Nodes/Models/Backends gallery affordance so
// users see the same "click to expand" cue everywhere.
export function ChevronCell({ expanded }) {
return (
<td className="resource-row__chevron-cell">
<span className={`row-chevron${expanded ? ' is-expanded' : ''}`} aria-hidden="true">
<i className="fas fa-chevron-right" />
</span>
</td>
)
}
// IconCell renders the 48px brand icon shell — the same one the Install
// gallery uses. `icon` is the image URL (from gallery metadata); when absent
// or broken we fall back to a FontAwesome glyph so custom-imported items
// still get a placeholder instead of an empty square.
export function IconCell({ icon, fallback = 'fa-cube', alt = '' }) {
return (
<td className="resource-row__icon-cell">
<div className="resource-row__icon">
{icon ? (
<img src={icon} alt={alt} loading="lazy" />
) : (
<i className={`fas ${fallback}`} />
)}
</div>
</td>
)
}
// StopPropagationCell wraps cell contents that contain interactive controls
// (Toggle, action buttons) so a click on them doesn't also expand the row.
export function StopPropagationCell({ children, ...props }) {
return (
<td {...props} onClick={e => e.stopPropagation()}>
{children}
</td>
)
}

View File

@@ -116,7 +116,7 @@ export default function SearchableSelect({
aria-expanded={open}
onClick={() => { if (!disabled) { setOpen(!open); setQuery(''); setFocusIndex(-1) } }}
style={{
width: '100%', padding: '4px 8px', fontSize: '0.8125rem',
width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem',
cursor: disabled ? 'not-allowed' : 'pointer',
display: 'flex', alignItems: 'center', gap: '6px',
background: 'var(--color-bg-primary)', border: '1px solid var(--color-border)',
@@ -145,7 +145,7 @@ export default function SearchableSelect({
value={query}
onChange={(e) => { setQuery(e.target.value); setFocusIndex(-1) }}
onKeyDown={handleKeyDown}
style={{ width: '100%', padding: '4px 8px', fontSize: '0.8125rem' }}
style={{ width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem' }}
/>
</div>
<div ref={listRef} role="listbox" style={{ overflowY: 'auto', maxHeight: 'min(200px, 50vh)' }}>

View File

@@ -1,4 +1,4 @@
import { useState, useEffect } from 'react'
import { useState, useEffect, useRef } from 'react'
import { NavLink, useNavigate, useLocation } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
import { useAuth } from '../context/AuthContext'
@@ -24,6 +24,18 @@ const sections = [
{ path: '/app/quantize', icon: 'fas fa-compress', label: 'Quantize (Experimental)', feature: 'quantization' },
],
},
{
id: 'biometrics',
title: 'Biometrics',
featureMap: {
'/app/face': 'face_recognition',
'/app/voice': 'voice_recognition',
},
items: [
{ path: '/app/face', icon: 'fas fa-face-smile', label: 'Face Recognition', feature: 'face_recognition' },
{ path: '/app/voice', icon: 'fas fa-microphone-lines', label: 'Voice Recognition', feature: 'voice_recognition' },
],
},
{
id: 'agents',
title: 'Agents',
@@ -95,11 +107,22 @@ export default function Sidebar({ isOpen, onClose }) {
const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
const navigate = useNavigate()
const location = useLocation()
const closeBtnRef = useRef(null)
useEffect(() => {
fetch(apiUrl('/api/features')).then(r => r.json()).then(setFeatures).catch(() => {})
}, [])
// Move focus into the drawer when opened on mobile/tablet so keyboard
// and screen-reader users land inside the dialog. Targeting the close
// button avoids hijacking the visual focus to a nav item the user may
// not have meant to activate.
useEffect(() => {
if (!isOpen) return
const id = window.requestAnimationFrame(() => closeBtnRef.current?.focus())
return () => window.cancelAnimationFrame(id)
}, [isOpen])
// Auto-expand section containing the active route
useEffect(() => {
for (const section of sections) {
@@ -156,7 +179,11 @@ export default function Sidebar({ isOpen, onClose }) {
<>
{isOpen && <div className="sidebar-overlay" onClick={onClose} />}
<aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
<aside
id="app-sidebar"
className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}
aria-label="Primary navigation"
>
{/* Logo */}
<div className="sidebar-header">
<a href="./" className="sidebar-logo-link">
@@ -165,8 +192,13 @@ export default function Sidebar({ isOpen, onClose }) {
<a href="./" className="sidebar-logo-icon" title="LocalAI">
<img src={apiUrl('/static/logo.png')} alt="LocalAI" className="sidebar-logo-icon-img" />
</a>
<button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
<i className="fas fa-times" />
<button
ref={closeBtnRef}
className="sidebar-close-btn"
onClick={onClose}
aria-label="Close menu"
>
<i className="fas fa-times" aria-hidden="true" />
</button>
</div>

View File

@@ -0,0 +1,39 @@
// StatCard renders a single cluster/dashboard metric card. The left accent
// bar + icon chip color is driven by `accentVar` (a CSS custom property name,
// e.g. "--color-success") so the card reads as semantic without the caller
// having to reach into colors directly. `onClick` upgrades the card to a
// keyboard-focusable button — used by the Manage page so cards double as
// shortcuts to the relevant tab + filter.
export default function StatCard({ icon, label, value, color, accentVar, onClick }) {
const accent = color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
const interactive = typeof onClick === 'function'
const handleKeyDown = interactive
? (e) => {
if (e.key === 'Enter' || e.key === ' ') {
e.preventDefault()
onClick(e)
}
}
: undefined
return (
<div
className="stat-card"
data-clickable={interactive ? 'true' : undefined}
role={interactive ? 'button' : undefined}
tabIndex={interactive ? 0 : undefined}
onClick={interactive ? onClick : undefined}
onKeyDown={handleKeyDown}
style={accentVar ? { ['--stat-accent']: `var(${accentVar})` } : undefined}
>
<div className="stat-card__body">
<div className="stat-card__label">{label}</div>
<div className="stat-card__value" style={{ color: accent }}>{value}</div>
</div>
<div className="stat-card__icon" style={accentVar ? { color: accent } : undefined}>
<i className={icon} />
</div>
</div>
)
}

View File

@@ -24,7 +24,7 @@ export default function TemplateSelector({ onSelect }) {
<p style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)', lineHeight: 1.5, margin: 0 }}>
{t.description}
</p>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: '4px', marginTop: 'var(--spacing-xs)' }}>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)', marginTop: 'var(--spacing-xs)' }}>
{Object.keys(t.fields).filter(k => k !== 'name').map(k => (
<span key={k} className="badge" style={{
fontSize: '0.6875rem', background: 'var(--color-bg-tertiary)',

View File

@@ -187,7 +187,7 @@ export default function UnifiedMCPDropdown({
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="text"
@@ -195,7 +195,7 @@ export default function UnifiedMCPDropdown({
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="password"
@@ -203,13 +203,13 @@ export default function UnifiedMCPDropdown({
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button type="button" className="btn btn-sm btn-primary" onClick={handleAddClient} disabled={!url.trim()}>Add</button>
</div>

View File

@@ -0,0 +1,63 @@
import { useEffect, useRef, useState } from 'react'
// BoundingBoxCanvas — overlay face-detection rectangles on the user-supplied image.
// boxes: [{ x, y, w, h, label?, sublabel?, tone? }]
// tone: 'default' | 'success' | 'warning' | 'error' | 'accent'
export default function BoundingBoxCanvas({ src, boxes = [], alt = '' }) {
const wrapRef = useRef(null)
const imgRef = useRef(null)
const [dims, setDims] = useState({ w: 0, h: 0, natW: 0, natH: 0 })
useEffect(() => {
const update = () => {
if (!wrapRef.current || !imgRef.current) return
const rect = imgRef.current.getBoundingClientRect()
setDims({
w: rect.width,
h: rect.height,
natW: imgRef.current.naturalWidth || 1,
natH: imgRef.current.naturalHeight || 1,
})
}
update()
const ro = new ResizeObserver(update)
if (imgRef.current) ro.observe(imgRef.current)
window.addEventListener('resize', update)
return () => {
ro.disconnect()
window.removeEventListener('resize', update)
}
}, [src])
const sx = dims.natW ? dims.w / dims.natW : 1
const sy = dims.natH ? dims.h / dims.natH : 1
return (
<div ref={wrapRef} className="biometrics-bbox">
{src && <img ref={imgRef} src={src} alt={alt} onLoad={(e) => {
setDims({
w: e.target.getBoundingClientRect().width,
h: e.target.getBoundingClientRect().height,
natW: e.target.naturalWidth,
natH: e.target.naturalHeight,
})
}} />}
{boxes.map((b, i) => (
<div key={i} className={`biometrics-bbox__box tone-${b.tone || 'accent'}`}
style={{
left: `${b.x * sx}px`,
top: `${b.y * sy}px`,
width: `${b.w * sx}px`,
height: `${b.h * sy}px`,
}}>
{(b.label || b.sublabel) && (
<div className="biometrics-bbox__tag">
{b.label && <strong>{b.label}</strong>}
{b.sublabel && <span>{b.sublabel}</span>}
</div>
)}
</div>
))}
</div>
)
}

View File

@@ -0,0 +1,33 @@
// DistributionBars — one horizontal bar per label, width proportional to value.
// distribution: Record<string, number> (values are probabilities 0..1 or any positive scale).
// dominant: string — highlighted row.
export default function DistributionBars({ title, distribution, dominant, icon }) {
if (!distribution || Object.keys(distribution).length === 0) return null
const entries = Object.entries(distribution).sort((a, b) => b[1] - a[1])
const max = entries.reduce((m, [, v]) => Math.max(m, v), 0) || 1
return (
<div className="biometrics-dist card">
<div className="biometrics-dist__head">
{icon && <i className={icon} aria-hidden="true" />}
<h3>{title}</h3>
{dominant && <span className="biometrics-dist__dominant">{dominant}</span>}
</div>
<ul className="biometrics-dist__rows">
{entries.map(([label, value]) => {
const pct = (value / max) * 100
const isDominant = label === dominant
return (
<li key={label} className={`biometrics-dist__row ${isDominant ? 'dominant' : ''}`}>
<span className="biometrics-dist__label">{label}</span>
<div className="biometrics-dist__bar-wrap" aria-hidden="true">
<div className="biometrics-dist__bar" style={{ width: `${pct}%` }} />
</div>
<span className="biometrics-dist__value">{(value * 100).toFixed(1)}%</span>
</li>
)
})}
</ul>
</div>
)
}

View File

@@ -0,0 +1,89 @@
import { useMemo, useRef, useEffect, useState } from 'react'
// EmbeddingInspector — compact visualization of a raw vector returned by /v1/face|voice/embed.
// embedding: number[] (can be large). dim: int. model: string.
export default function EmbeddingInspector({ embedding, dim, model, elapsedMs }) {
const canvasRef = useRef(null)
const [copied, setCopied] = useState(false)
const summary = useMemo(() => {
if (!embedding || !embedding.length) return null
let sum = 0, sumSq = 0, min = Infinity, max = -Infinity
for (const v of embedding) {
sum += v
sumSq += v * v
if (v < min) min = v
if (v > max) max = v
}
const mean = sum / embedding.length
const norm = Math.sqrt(sumSq)
return { mean, norm, min, max }
}, [embedding])
useEffect(() => {
if (!canvasRef.current || !embedding?.length) return
const canvas = canvasRef.current
const dpr = window.devicePixelRatio || 1
const cssW = canvas.clientWidth
const cssH = 60
canvas.width = Math.floor(cssW * dpr)
canvas.height = Math.floor(cssH * dpr)
const ctx = canvas.getContext('2d')
ctx.scale(dpr, dpr)
ctx.clearRect(0, 0, cssW, cssH)
const COUNT = Math.min(embedding.length, 128)
const values = embedding.slice(0, COUNT)
const max = Math.max(...values.map(Math.abs)) || 1
const mid = cssH / 2
const barW = cssW / COUNT
const accent = getComputedStyle(canvas).getPropertyValue('--color-accent').trim() || '#e8a87c'
const accentMuted = getComputedStyle(canvas).getPropertyValue('--color-text-muted').trim() || '#6c7084'
ctx.strokeStyle = accentMuted
ctx.beginPath()
ctx.moveTo(0, mid + 0.5)
ctx.lineTo(cssW, mid + 0.5)
ctx.stroke()
ctx.fillStyle = accent
for (let i = 0; i < COUNT; i++) {
const v = values[i]
const h = (Math.abs(v) / max) * (cssH * 0.45)
if (v >= 0) ctx.fillRect(i * barW, mid - h, Math.max(0.5, barW - 0.5), h)
else ctx.fillRect(i * barW, mid, Math.max(0.5, barW - 0.5), h)
}
}, [embedding])
if (!embedding) return null
const copy = async () => {
try {
await navigator.clipboard.writeText(JSON.stringify(embedding))
setCopied(true)
setTimeout(() => setCopied(false), 1500)
} catch (_) {
/* clipboard gated */
}
}
return (
<div className="biometrics-embed card">
<div className="biometrics-embed__head">
<div>
<div className="biometrics-embed__title">Embedding vector</div>
<div className="biometrics-embed__meta">
{dim != null && <span><strong>{dim}</strong> dims</span>}
{summary && <span>L2 <strong>{summary.norm.toFixed(3)}</strong></span>}
{summary && <span>range <strong>[{summary.min.toFixed(3)}, {summary.max.toFixed(3)}]</strong></span>}
{model && <span>model <code>{model}</code></span>}
{elapsedMs != null && <span>{elapsedMs.toFixed(0)} ms</span>}
</div>
</div>
<button type="button" className="btn btn-secondary btn-sm" onClick={copy}>
<i className={`fas ${copied ? 'fa-check' : 'fa-copy'}`} aria-hidden="true" />
{copied ? ' Copied' : ' Copy JSON'}
</button>
</div>
<canvas ref={canvasRef} style={{ width: '100%', height: 60 }} aria-label="Embedding sparkline (first 128 dimensions)" />
</div>
)
}

Some files were not shown because too many files have changed in this diff Show More