The shared backend/Dockerfile.python ends in:
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.
DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.
Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.
Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
The Dockerfile's HEALTHCHECK probes http://localhost:8080/readyz, which
is the OpenAI API server port. When the same image runs as a worker, it
listens on the gRPC base port (50051) and an HTTP file transfer server
on port-1 (50050) — nothing on 8080 — so docker always reports the
container as unhealthy.
Add unauthenticated /readyz and /healthz endpoints to the worker's HTTP
file transfer server, and override HEALTHCHECK_ENDPOINT for worker-1 in
the distributed compose file. Disable the healthcheck for agent-worker
since it is NATS-only and exposes no HTTP server.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
from the unused gha cache to type=registry pointing at
quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
ignore-error=true. Master/tag builds populate their own
per-matrix-entry cache; PR builds are read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
Dockerfile stage that was removed by b1fc5acd in July 2025, has been
failing every night since, and never populated the gha cache. The new
registry-cache scheme is self-warming, so no separate populator is
needed.
- Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS
build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in
the root Dockerfile (the root Dockerfile no longer compiles gRPC; the
source build now lives in backend/Dockerfile.{llama-cpp,
ik-llama-cpp, turboquant} only and uses its own ARG defaults).
- Drop the unused grpc-base-image input from image_build.yml plus the
matrix passthroughs in image.yml / image-pr.yml.
- Drop the unused GRPC_VERSION env in test.yml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
Replace the universal max-width:1200px cap on .page with a four-tier
archetype system (narrow 760, medium 1080, default 1600, wide unbounded)
selected per page based on what its UX actually wants. Data/table pages
fill ultrawide displays; forms cap at reading width; tabbed feature
surfaces breathe.
Mobile/tablet:
- New 640/1024 breakpoint split. Tablets (640-1023) get a persistent
52px icon rail; below 640 keeps the slide-off drawer.
- Drawer polish: body-scroll lock, Escape to close, focus moves into
the drawer on open and back to the hamburger on close, aria-hidden
+ inert on main while open.
- Mobile top bar carries hamburger + theme toggle + account avatar
(44x44 touch targets) so theme/account aren't trapped in the drawer.
- Page-level reflow on phones: page-header column-stacks, filter chips
scroll horizontally, tables go edge-to-edge, OperationsBar overflows
rather than wrapping. Honors prefers-reduced-motion.
Manage > Models: drop the toggle column; Enable/Disable joins the
per-row Actions menu alongside Stop/Pin/Edit/Logs/Delete for
consistency with the other action verbs.
Page-width tokens live in theme.css so future tuning is one line.
Removes 7 inline maxWidth workarounds from page roots.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Edit] [Bash]
Meta backends are now always shown — they're the entries operators
configure against — and two independent toggles govern the noise around
them. "Variants" hides platform-specific concrete builds that a meta
backend aliases on the host (e.g. llama-cpp-cuda12-12.4). "Development"
hides pre-release `-development` builds. Each toggle shows the count of
items currently hidden in its category. The legacy `bm` URL flag is
honored on read so existing deep-links resolve to the same view they
used to.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The overrides.parameters.model field referenced 'Qwen3.-27B-Claude-...' (missing the '5'), so model loads failed because the configured filename did not match the file actually downloaded by the entry's files: list ('Qwen3.5-27B-Claude-...').
Aligns the override filename with the files: entries and with the upstream HF repo (mradermacher/Qwen3.5-27B-...).
Mirrors the whisper capabilities map with -development variants so
clients can pull the master-tagged whisper.cpp backend via a single
platform-resolved name, matching the existing faster-whisper-development
and whisperx-development entries.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
In distributed mode the Backends gallery used to fan every install out to
every worker — fine for auto-resolving (meta) backends like llama-cpp where
each node picks its own variant, but wrong for hardware-specific builds
like cpu-llama-cpp that would silently land on every GPU node.
Adds a node-targeted install path through the existing
POST /api/nodes/:id/backends/install plumbing, with two entry points:
- Backends gallery row gets a split-button in distributed mode. Auto-
resolving keeps "Install on all nodes" as the primary; chevron menu
opens the picker. Hardware-specific routes the primary directly to the
picker — no fan-out path on the row.
- Nodes-page drawer gets a "+ Add backend" button that navigates to
/app/backends?target=<node-id>; the gallery scopes itself to that node
(banner, single per-row install button, Reinstall/Remove for already-
installed). One gallery, two scopes — no second UI to maintain.
The picker (new NodeInstallPicker) shows a 3-state suitability column
(Compatible / Override / Installed), an auto-expanding variant override
disclosure that fires when selected nodes have no working GPU, parallel
per-node installs with inline status and Retry-failed-nodes, and a
mismatch confirm that names the consequence on the button itself.
A 409 fan-out guard on /api/backends/apply protects CLI/Terraform/script
users from the same footgun: hardware-specific installs in distributed
mode now return code "concrete_backend_requires_target" with a human-
readable error and a meta_alternative pointer.
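
The guard can be pictured roughly as follows. This is a hedged sketch:
the struct, the "error" JSON key, the message text, and the function
signature are assumptions; only the 409 status, the code string, and
the meta_alternative field are taken from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// applyConflict is an assumed response shape for the 409 reply.
type applyConflict struct {
	Code            string `json:"code"`
	Error           string `json:"error"`
	MetaAlternative string `json:"meta_alternative,omitempty"`
}

// guardFanOut returns a 409 payload when a hardware-specific backend would
// be fanned out to every node. A status of 0 means the install may proceed.
func guardFanOut(distributed, hardwareSpecific bool, backend, metaAlt string) (int, *applyConflict) {
	if !distributed || !hardwareSpecific {
		return 0, nil
	}
	return http.StatusConflict, &applyConflict{
		Code:            "concrete_backend_requires_target",
		Error:           fmt.Sprintf("backend %q is hardware-specific; pick a target node", backend),
		MetaAlternative: metaAlt,
	}
}

func main() {
	status, body := guardFanOut(true, true, "cpu-llama-cpp", "llama-cpp")
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(status, string(out))
}
```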
The gallery list payload now surfaces capabilities, metaBackendFor and
per-row nodes (NodeBackendRef) so the picker and the new Nodes column
have everything they need without re-walking the gallery client-side.
GODEBUG=netdns=go is set on the compose services because the cgo DNS
resolver follows the container's nsswitch.conf to host systemd-resolved
(127.0.0.53), unreachable from inside the container; the pure-Go
resolver reads /etc/resolv.conf directly and uses Docker's embedded DNS.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7[1m] [Edit] [Bash] [Read] [Write]
Manage page row actions moved into ActionMenu in b336d9c6, so the
inline `<a title="Backend logs">` the e2e specs were asserting on no
longer exists. Open the row's kebab and assert against the menuitem.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
Bring the System / Manage page up to the visual standard of the Install
gallery so installed models and backends stop reading like a debug dump.
- Unified ResourceRow anatomy (icon, name+description, badges, status,
expandable detail) shared across both tabs.
- Gallery enrichment cross-references installed names against the gallery
list endpoints to surface icons, descriptions, license, tags, and links
with a graceful "no description" fallback for custom imports.
- Header summary with four StatCards (Models / Backends / Running /
Updates) — clickable to switch tab + pre-set filter.
- Backends meta + development entries hidden by default; "Show meta &
development" paired toggle in the FilterBar with hidden-count hint.
- Kebab (three-dot) ActionMenu replaces the inline button cluster on
every row; restrained until hover, keyboard-navigable, danger items
separated by a divider.
- Backend "Version" cell now falls back to short digest, OCI tag, or
ocifile basename when no semver is set, instead of showing "—" for
every OCI install. Detail panel exposes full Source URI + Digest.
- Drop redundant column headers ("Actions", "On") — kebabs and toggles
carry their own affordance; screen readers still get a label.
- Inline System / User / Meta / Dev badges next to the backend name so
the dedicated Type column doesn't reserve space for "USER" repeated.
- Tightened the spacing between the System Resources card and the
StatCards so they no longer crowd the RAM bar.
Extracted StatCard and GalleryLoader from Nodes.jsx and Models.jsx into
shared components so the visual language is one source of truth.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
The local model directory scan treats every non-skipped file as a model
config candidate. Sidecar artifacts that ship alongside checkpoints
(checkpoint blobs, downloaded archives, ggml-style tag files) were
slipping through and showing up as bogus models in the listing. Add
their extensions to the suffix-skip list.
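
A suffix-skip check of this kind might look like the sketch below. The
extensions listed are hypothetical placeholders, since the message
does not name the exact suffixes, and the function name is likewise an
assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// skipSuffixes is a hypothetical list used only for illustration.
var skipSuffixes = []string{".ckpt", ".zip", ".tag"}

// isModelConfigCandidate reports whether a file name survives the
// suffix-skip filter and should be treated as a model config candidate.
func isModelConfigCandidate(name string) bool {
	lower := strings.ToLower(name)
	for _, suffix := range skipSuffixes {
		if strings.HasSuffix(lower, suffix) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"phi.yaml", "weights.ckpt", "bundle.ZIP"} {
		fmt.Println(name, isModelConfigCandidate(name))
	}
}
```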
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
The chat and agent-chat pages auto-scrolled to the bottom on every
streamed token. If the user scrolled up to re-read part of a response,
the next chunk pulled them back down — making long replies unreadable
while streaming.
Track a stickToBottomRef on each scroll event: if the user is within
80px of the bottom we keep auto-scrolling, otherwise we leave them
where they are. On chat switch we snap back to the bottom and re-pin.
Same fix applied to both Chat.jsx and AgentChat.jsx since they share
the same streaming pattern.
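
The pinning rule itself is plain arithmetic. The real code lives in
the React components, so the Go function below is only a
language-neutral sketch. The parameter names follow the DOM properties
scrollTop, clientHeight, and scrollHeight, and the inclusive
comparison at exactly 80px is an assumption.

```go
package main

import "fmt"

// shouldStick reports whether the viewport is within threshold pixels of
// the bottom of the scrollable content, in which case auto-scroll stays on.
func shouldStick(scrollTop, clientHeight, scrollHeight, threshold float64) bool {
	distanceFromBottom := scrollHeight - (scrollTop + clientHeight)
	return distanceFromBottom <= threshold
}

func main() {
	// 50px above the bottom: keep following the stream.
	fmt.Println(shouldStick(950, 500, 1500, 80))
	// 300px above the bottom: the user is re-reading, leave them alone.
	fmt.Println(shouldStick(700, 500, 1500, 80))
}
```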
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
whisper.cpp can emit bytes that are not valid UTF-8 — typically a
multibyte codepoint split across token boundaries. protobuf string
fields reject those at marshal time, which would surface as a transcribe
failure. Run strings.ToValidUTF8 on the segment text before it leaves
the cgo boundary so the bad byte gets replaced with U+FFFD.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
- useModels.refetch now runs silently — distributed-mode 10s auto-refresh
no longer flips loading=true and replaces the table with a spinner card.
- Manage Use Cases column derives badges from each model's actual
capabilities (Chat / Image / TTS / Embeddings / etc.) instead of
hardcoding a "Chat" link for every row.
- FilterBar right slot is right-aligned via margin-left:auto so the
Update button lives at the end of the row, not next to the chips.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
- embeddings → embedding (6 models): aligns with the WebUI filter button
defined in core/http/views/models.html ({ term: 'embedding', ... }), so
models like nomic-embed-text-v1.5 now appear under the Embedding filter
- TTS → tts (5 models), ASR → asr (2 models): lowercase, per existing
convention used by 161+ models
- CPU/Cpu → cpu (17 models), GPU → gpu (17 models): lowercase, per existing
convention used by 666+ models
- dedupe duplicate tag entries on 3 models that already had repeated tags
(gpt-oss-20b had gguf x2; arcee-ai/AFM-4.5B had gpu x2; one Qwen model
had default x2)
Closes #9247
Extend the existing CPU build matrix entries to produce a multi-arch
manifest (linux/amd64,linux/arm64) at the same image tags. arm64
Linux hosts without an NVIDIA GPU report the "default" capability,
which already maps to cpu-whisperx / cpu-faster-whisper in
backend/index.yaml -- so the manifest list lets Docker pull the right
variant without any gallery changes.
Both stacks install cleanly under aarch64: torch (2.4.1/2.8.0),
faster-whisper, ctranslate2, whisperx, opencv-python and the
remaining deps all ship manylinux2014_aarch64 wheels, so no source
builds run under QEMU emulation.
Follows the same pattern already used by cpu-llama-cpp-quantization.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The docs site uses the hugo-theme-relearn theme, which provides
`notice` instead of Docsy's `alert`. The face-recognition,
voice-recognition, and stores feature pages used `{{% alert %}}`,
breaking `hugo build` with "template for shortcode \"alert\" not
found".
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Blaizzy/mlx-vlm git HEAD bumped its constraint to mlx>=0.31.2, but
mlx-cuda-12 and mlx-cuda-13 are only published up to 0.31.1 on PyPI.
Since mlx[cudaXX]==0.31.2 forces a sibling wheel that doesn't exist,
pip backtracks through every older mlx[cudaXX], none of which satisfy
mlx>=0.31.2, producing ResolutionImpossible.
Pin all variants to the v0.4.4 tag (mlx>=0.30.0), which resolves
cleanly against mlx[cuda13]==0.31.1. cpu/mps weren't broken yet but
are pinned for consistency.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:
ImportError: .../flash_attn_2_cuda...so: undefined symbol:
_ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib
That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.
vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.
Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang
Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM
serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and
JetPack 7 boards:
- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130
prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm
0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master;
cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only — uses the prebuilt sglang wheel from the
jetson-ai-lab SBSA cu130 index, so no source build is needed.
Cublas13 sglang on x86_64 is intentionally deferred.
CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni},
-nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml
gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13)
and latest/development merge entries.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
* fix(backends): use unsafe-best-match index strategy on l4t13 builds
The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.)
at limited versions / older Python ABIs. uv defaults to the first index
that contains a package and refuses to fall through to PyPI, so the sglang
l4t13 build fails to resolve decord. Mirror the existing cpu sglang
profile by setting --index-strategy=unsafe-best-match on l4t13 across
the three backends, and apply it to the explicit vllm install line in
vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0
The [all] extra brings in outlines→decord, and decord has no aarch64
cp312 wheel on PyPI or on the jetson-ai-lab index (only legacy cp35-cp37
tags). With unsafe-best-match enabled, uv backtracked through sglang
versions trying to satisfy decord and silently landed on
sglang==0.1.16, an ancient version with an entirely different dep
tree (cloudpickle/outlines 0.0.44, etc.).
Drop [all] so decord is no longer required, and floor sglang at 0.5.0
to prevent any future resolver misfire from degrading the version
again.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed): surface per-node backend op errors to OpStatus
DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the
per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`.
When workers replied Success=false (e.g. an OCI image with no arm64
variant on a Jetson host), the per-node Error string was recorded in
result.Nodes[].Error but never reached the top-level return value, so
OpStatus.Error stayed empty and the UI reported the install as
"completed" while the backend was nowhere on the cluster.
Add BackendOpResult.Err() that aggregates per-node Status=="error"
entries into a single error. Queued nodes (waiting for reconciler retry)
are deliberately not treated as failures. Wire the three callers and
DeleteBackendDetailed to call result.Err() so reply.Success=false
finally reaches OpStatus.Error → /api/backends/job/:uid → the UI.
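
A simplified sketch of the aggregation, assuming stand-in types. The
NodeName field and the "success" and "queued" status strings are
assumptions; the "error" status and the per-node Error string come
from the description above. errors.Join needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeResult and BackendOpResult are simplified stand-ins for the real types.
type NodeResult struct {
	NodeName string
	Status   string // assumed values: "success", "queued", "error"
	Error    string
}

type BackendOpResult struct {
	Nodes []NodeResult
}

// Err folds every node with Status == "error" into a single error.
// Queued nodes are not failures, so they are skipped. It returns nil
// when no node reported an error.
func (r BackendOpResult) Err() error {
	var errs []error
	for _, n := range r.Nodes {
		if n.Status == "error" {
			errs = append(errs, fmt.Errorf("node %s: %s", n.NodeName, n.Error))
		}
	}
	return errors.Join(errs...)
}

func main() {
	res := BackendOpResult{Nodes: []NodeResult{
		{NodeName: "a", Status: "success"},
		{NodeName: "b", Status: "error", Error: "no arm64 variant"},
		{NodeName: "c", Status: "queued"},
	}}
	fmt.Println(res.Err())
}
```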
The Delete closures had a related bug: they discarded the reply with
`_` and only checked the NATS round-trip error, so reply.Success=false
was a silent success even with the new aggregation. Check both.
Standalone mode (LocalBackendManager) already surfaces gallery errors
correctly through the same OpStatus.Error path; no change needed there.
Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct
errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete.
Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
* feat(react-ui): per-node backend delete + clearer upgrade affordance
The Nodes page exposed a per-node "reinstall" button (fa-sync-alt,
tooltip "Reinstall backend") but no per-node delete, even though the
Go side has had POST /api/nodes/:id/backends/delete →
RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up
for a while. Sync icons read as "refresh data" — the action is
functionally an upgrade (re-pulls the gallery image), so the affordance
was misleading.
Per-node backend row now renders two icon buttons:
- Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend
on this node". Names both action and scope to differentiate from the
cluster-wide upgrade on the Backends page.
- Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend
from this node". Matches the node-level destructive style at the row
action column rather than the solid btn-danger of primary destructive
pages, since this is a secondary action inside a busy row.
Delete goes through the existing ConfirmDialog (danger=true) with copy
that names the backend and the node explicitly — it's a non-recoverable
op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which
already existed in the API client.
Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip),
delete button presence, confirm dialog flow with POST body assertion,
and cancel-doesn't-POST.
Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
* feat(react-ui): editorial refresh with Nord palette and polished primitives
Replaces the cool gray-blue theme with a deep Nord-inspired palette:
frost-cyan accent (#88c0d0) on deep blue-black surfaces (#13171f /
#1a1f2a / #242a36), snow-storm text scale, aurora status colours.
- Typography: Geist Variable + Geist Mono Variable (Google Fonts) with
ss01/ss03/cv11 stylistic alternates; strengthened h1-h6 hierarchy;
editorial negative tracking.
- Primitives: buttons gain depth (inset highlight + hover lift +
brightness filter); inputs become sunken wells with sage-swap-to-frost
focus rings; cards hover-lift and gain an .card--accent left-rail
variant; badges become mono caps rectangles with tabular-nums.
- Chrome: sidebar active state is now an inset left rail + tint
(no border-left); modals get popIn animation and proper shadow lift;
toasts carry an inset accent bar + slide-in instead of tinted fills;
operations bar breathes on active installs.
- Empty states: editorial pattern (eyebrow rule, large mono title,
52ch lede) that inherits gracefully even without page JSX edits.
- Chat: assistant bubbles drop the gray-nested-in-gray card for a
transparent pull-quote with a left border; user bubbles soften from
loud accent fill to a subtle frost tint.
- Motion: custom spring easing cubic-bezier(0.22,1,0.36,1), 180ms
standard; breathing/pulse/popIn keyframes; global prefers-reduced-
motion honoring.
- Radii tightened to 3/5/8/10px; warm-shadow tokens redone for cool
depth; ::selection, :focus-visible, kbd globals added.
- Migrated hardcoded 'JetBrains Mono' CSS literals to var(--font-mono)
so the Geist Mono swap lands everywhere.
Scope is intentionally tokens + primitives only. Page JSX and the
~1,800 inline style={{…}} instances are untouched and flagged as
follow-ups.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]
* feat(react-ui): complete-coverage pass — migrate inline styles to tokens
Follows up the editorial/Nord token refresh with a mechanical sweep of
page JSX and shared components so nothing bypasses the design system.
- Font family: replaced 80+ 'JetBrains Mono' / 'Space Grotesk' inline
literals (and the string-CSS variants in CollectionDetails and
AgentStatus) with var(--font-mono) / var(--font-sans). SVG <text>
nodes that used the attribute form were switched to style={{ }} so
the CSS variable resolves.
- Radii: every unquoted numeric borderRadius (2/3/4/10) is now a
var(--radius-*) token; 50% and 999px kept as computed shapes.
- Spacing: clean-token gaps and margins (4/8/16px) moved to
var(--spacing-xs/sm/md); padding: '4px 8px' and '8px 16px' lifted
into token pairs. Micro-values (2/6/10/12px) left inline where no
token maps cleanly.
- Colors: Talk.jsx button/canvas-surface hardcodes moved to
var(--color-*); FineTune.jsx chart series colours now use the
--color-data-* Nord palette (cyan/red/purple/orange instead of
tailwind hex); AgentStatus tool-call icon and error tag hex swapped
for var(--color-warning) / var(--color-text-inverse).
- CodeMirror editor (utils/cmTheme.js): both themes rebased on Nord —
polar-night surfaces and aurora syntax highlighting (dark), snow-
storm surfaces with darkened aurora (light). Caret/selection/active
line/search now frost-cyan tinted instead of legacy indigo/purple.
Legitimately dynamic styles (computed widths, per-row colours, canvas
2D context fill/stroke for waveform and spectrogram drawing) remain
inline — they can't be expressed as CSS tokens.
29 files, +237/-237 — identity preserved, semantics re-anchored to
the token system.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.
Three causes addressed:
* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
(SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
`isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
`/sys/devices/soc0/soc_id`.
* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
`/proc/device-tree/model` was missing. That's what happens for Thor inside a
docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
(`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
correctly.
* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
with `waitid: no child processes` under containers without `--init`. Each
such heartbeat overwrote the DB and made the UI flip to "fully used".
`heartbeatBody` now omits `available_vram` in that case so the DB keeps its
last good value.
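
One common way to omit a field in Go is a pointer with `omitempty`.
The sketch below shows that approach under stated assumptions: the
struct, the total_vram key, and the helper names are hypothetical, and
the real `heartbeatBody` may build its payload differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// heartbeat is an illustrative payload. A nil AvailableVRAM is dropped
// from the JSON, so the receiver keeps its last stored value.
type heartbeat struct {
	TotalVRAM     uint64  `json:"total_vram"`
	AvailableVRAM *uint64 `json:"available_vram,omitempty"`
}

// buildHeartbeat sets available_vram only when usage is actually known.
func buildHeartbeat(total, available uint64, known bool) heartbeat {
	hb := heartbeat{TotalVRAM: total}
	if known {
		hb.AvailableVRAM = &available
	}
	return hb
}

// mustEncode marshals the heartbeat and panics on the (unexpected) error.
func mustEncode(hb heartbeat) string {
	b, err := json.Marshal(hb)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustEncode(buildHeartbeat(100, 0, false)))
	fmt.Println(mustEncode(buildHeartbeat(100, 40, true)))
}
```

A pointer is used rather than a plain integer because a known value of
0 (memory genuinely exhausted) must still be sent.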
Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.
Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.
Assisted-by: Claude:opus-4.7 [Claude Code]
* Use latest oneapi-basekit image for Intel images
The current `localai/localai:master-gpu-intel` images don't work with the Intel Arc Pro B70. Updating the base_image to 2025.3.2 fixes it.
Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>
* Update github workflow base image
---------
Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>
The llama.cpp C++-side chat autoparser clears Reply.Message and delivers
parsed content/reasoning/tool-calls via Reply.chat_deltas. chat.go handles
this (non-SSE path uses ToolCallsFromChatDeltas/ContentFromChatDeltas/
ReasoningFromChatDeltas), but realtime.go only read pred.Response, so any
model routed through the autoparser (Qwen2.5/3 and friends) produced a
silent reply: backend emitted N tokens, the session surface saw zero.
Mirror the non-SSE chat path in realtime's triggerResponse: when deltas
carry tool calls or content, use them directly; otherwise fall back to
the existing raw-text parsing.
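
The selection logic can be sketched as below. The ChatDelta type and
resolveReply are simplified, hypothetical stand-ins; the real code
uses the helpers named above, and its fallback parses the raw text
rather than returning it verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatDelta is a simplified stand-in for the backend's chat delta message.
type ChatDelta struct {
	Content   string
	ToolCalls []string // tool-call names only, for brevity
}

// resolveReply prefers autoparser deltas when they carry content or tool
// calls, and otherwise falls back to the raw response text.
func resolveReply(deltas []ChatDelta, raw string) (content string, tools []string, fromDeltas bool) {
	var sb strings.Builder
	for _, d := range deltas {
		sb.WriteString(d.Content)
		tools = append(tools, d.ToolCalls...)
	}
	if sb.Len() > 0 || len(tools) > 0 {
		return sb.String(), tools, true
	}
	// Placeholder for the existing raw-text parsing path.
	return raw, nil, false
}

func main() {
	content, tools, fromDeltas := resolveReply(
		[]ChatDelta{{Content: "Hel"}, {Content: "lo", ToolCalls: []string{"get_weather"}}}, "")
	fmt.Println(content, tools, fromDeltas)
}
```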
Assisted-by: claude-opus-4-7-1M [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
feat(backend): Add Sherpa ONNX backend and Omnilingual ASR
Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same
approach as opus/stablediffusion-ggml/whisper — a thin C shim
(csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego
can't reach directly: nested struct config writes, result-struct field
reads, and the streaming TTS callback trampoline. The Go side uses
opaque uintptr handles and purego.NewCallback for the TTS callback.
Supports:
- VAD via sherpa-onnx's Silero VAD
- Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC
- Online/streaming ASR: zipformer transducer with endpoint detection
(AudioTranscriptionStream emits delta events during decode)
- Offline TTS: VITS (LJS, etc.)
- Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel,
prefixed by a streaming WAV header
Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline
ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR),
silero-vad-sherpa, vits-ljs-sherpa.
E2E coverage: tests/e2e-backends for offline + streaming ASR,
tests/e2e for the full realtime pipeline (VAD + STT + TTS).
Assisted-by: claude-opus-4-7-1M [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Bumps ik_llama.cpp pin to 16996aeab7. Upstream 286ce32...16996ae adds a
trailing `const struct quantize_user_data *` parameter to
`ggml_quantize_chunk` (PR ikawrakow/ik_llama.cpp#1677) but leaves
`examples/llava/clip.cpp` unchanged because their build has moved to
`examples/mtmd/`. LocalAI's prepare.sh still copies from
`examples/llava/`, so the dead 7-arg call reaches the grpc-server
compile and fails. Patch the call site to pass `nullptr` for the new
param.
Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]
* fix(anthropic): use SetFunctionCallNameString for specific tool forcing
* fix(openai/realtime): use SetFunctionCallNameString for specific tool forcing
* fix(openresponses): use SetFunctionCallNameString for specific tool forcing
* feat(react-ui): add Face & Voice Recognition pages
Expose the face and voice biometrics endpoints
(/v1/face/*, /v1/voice/*) through the React UI. Each page has four
tabs driving the six endpoints per modality: Analyze (demographics
with bounding boxes / waveform segments), Compare (verify with a
match gauge and live threshold slider), Enrollment (register /
identify / forget with a top-K matches view), Embedding (raw
vector inspector with sparkline + copy).
MediaInput supports file upload plus live capture: webcam
snap-to-canvas for face, MediaRecorder -> AudioContext ->
16-bit PCM mono WAV transcode for voice (libsndfile on the
backend only handles WAV/FLAC/OGG natively).
Sidebar gets a new Biometrics section feature-gated on
face_recognition / voice_recognition; routes are wrapped in
<RequireFeature>. No new dependencies -- Font Awesome icons
picked from the Free set.
Assisted-by: Claude:Opus 4.7
* fix(localai): accept data URI prefixes with codec/charset params
Browser MediaRecorder produces data URIs like
data:audio/webm;codecs=opus;base64,...
so the pre-';base64,' section can carry multiple parameter
segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go
and core/http/endpoints/localai/audio.go only matched exactly one
segment, so recordings straight from the React UI's live-capture
tab failed the strip and then tripped the base64 decoder on the
leading 'data:' literal, surfacing as
"invalid audio base64: illegal base64 data at input byte 4"
Widened both regexes to `^data:[^,]+?;base64,` so any number of
';param=value' segments between the mime type and ';base64,' are
tolerated. Added a regression test covering the MediaRecorder
shape.
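
The difference between the two patterns can be checked in isolation.
In this sketch the variable and function names are hypothetical; the
two regular expressions are the ones quoted above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// oldPrefix allows exactly one segment between "data:" and ";base64,".
	oldPrefix = regexp.MustCompile(`^data:([^;]+);base64,`)
	// newPrefix tolerates any number of ";param=value" segments.
	newPrefix = regexp.MustCompile(`^data:[^,]+?;base64,`)
)

// stripDataURI removes a leading data-URI header, if present, and returns
// the bare base64 payload.
func stripDataURI(s string) string {
	return newPrefix.ReplaceAllString(s, "")
}

func main() {
	recorded := "data:audio/webm;codecs=opus;base64,AAAA"
	fmt.Println(oldPrefix.MatchString(recorded)) // old pattern misses it
	fmt.Println(stripDataURI(recorded))
}
```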
Assisted-by: Claude:Opus 4.7
* fix(insightface): scope pack ONNX loading to known manifests
LocalAI's gallery extracts buffalo_* zips flat into the models
directory, which inevitably mixes with ONNX files from other
backends (opencv face engine, MiniFASNet antispoof, WeSpeaker
voice embedding) and older buffalo pack installs. Feeding those
foreign files into insightface's model_zoo.get_model() blows up
inside the router -- it assumes a 4-D NCHW input and indexes
`input_shape[2]` on tensors that aren't shaped like a face model,
raising IndexError mid-load and leaving the backend unusable.
The router's dispatch isn't amenable to per-file try/except alone
(first-file-wins picks det_10g.onnx from buffalo_l even when the
user asked for buffalo_sc -- alphabetical order happens to favour
the wrong pack). Instead, ship an explicit manifest of the
upstream v0.7 pack contents and scope the glob to that when the
requested pack is known. The manifest is small and stable; future
packs can be added alongside or fall through to the tolerance
loop, which also swallows any remaining IndexError / ValueError
from foreign files with a clear `[insightface] skipped` stderr
line for diagnostics.
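The scoping idea can be sketched as follows. This is an illustration
in Go rather than the backend's own Python, and the manifest file
names are an assumption about the upstream packs, not copied from the
backend; scopeToPack is a hypothetical name.

```go
package main

import "fmt"

// packManifests is illustrative only: the file names are assumed, not
// taken from the backend's real manifest.
var packManifests = map[string][]string{
	"buffalo_l":  {"1k3d68.onnx", "2d106det.onnx", "det_10g.onnx", "genderage.onnx", "w600k_r50.onnx"},
	"buffalo_sc": {"det_500m.onnx", "w600k_mbf.onnx"},
}

// scopeToPack keeps only the files that belong to the requested pack.
// For a pack without a manifest it returns the listing unchanged and
// known=false, so the caller can fall back to tolerant per-file loading.
func scopeToPack(pack string, files []string) (scoped []string, known bool) {
	manifest, ok := packManifests[pack]
	if !ok {
		return files, false
	}
	present := make(map[string]bool, len(files))
	for _, f := range files {
		present[f] = true
	}
	for _, want := range manifest {
		if present[want] {
			scoped = append(scoped, want)
		}
	}
	return scoped, true
}

func main() {
	// A flat models directory mixing two packs and a foreign ONNX file.
	files := []string{"det_10g.onnx", "det_500m.onnx", "foreign_model.onnx", "w600k_mbf.onnx", "w600k_r50.onnx"}
	scoped, known := scopeToPack("buffalo_sc", files)
	fmt.Println(scoped, known)
}
```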
Assisted-by: Claude:Opus 4.7
* fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders
Pre-exported speaker-encoder ONNX graphs come in two shapes:
  rank-2 [batch, samples]
      some 3D-Speaker exports; take the raw waveform directly.
  rank-3 [batch, frames, n_mels]
      WeSpeaker and most Kaldi-lineage encoders; expect
      pre-computed Kaldi FBank.
OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` --
correct for rank-2, but a rank mismatch on rank-3, which
surfaced to the UI as
"Invalid rank for input: feats Got: 2 Expected: 3"
Detect the input rank at session init and run Kaldi FBank
(80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before
the forward pass when rank>=3. All knobs are configurable via
backend options for encoders that deviate from defaults.
torchaudio.compliance.kaldi is already in the backend's
requirements (SpeechBrain pulls torchaudio in), so no new
dependency.
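A rough sketch of the rank dispatch, written in Go for illustration
(the backend itself is Python). The 16 kHz sample rate, the
frame-count formula (frames that fit fully inside the signal, no edge
padding) and the function names are assumptions; only the 80-dim and
25ms/10ms values come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	sampleRate   = 16000 // assumed; not stated in the commit message
	frameLenMs   = 25
	frameShiftMs = 10
	numMels      = 80
)

// numFrames counts the frames that fit entirely inside the signal.
func numFrames(numSamples int) int {
	frameLen := sampleRate * frameLenMs / 1000     // 400 samples
	frameShift := sampleRate * frameShiftMs / 1000 // 160 samples
	if numSamples < frameLen {
		return 0
	}
	return 1 + (numSamples-frameLen)/frameShift
}

// feedShape returns the tensor shape to feed for the two encoder
// layouts listed above: raw waveform for rank-2, FBank for rank-3.
func feedShape(inputRank, numSamples int) ([]int, error) {
	switch inputRank {
	case 2:
		return []int{1, numSamples}, nil
	case 3:
		return []int{1, numFrames(numSamples), numMels}, nil
	default:
		return nil, errors.New("unsupported encoder input rank")
	}
}

func main() {
	for _, rank := range []int{2, 3} {
		shape, err := feedShape(rank, sampleRate) // one second of audio
		fmt.Println(rank, shape, err)
	}
}
```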
Assisted-by: Claude:Opus 4.7
* fix(biometrics): isolate face and voice vector stores
Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker
256-D) biometric embeddings were colliding inside a single
in-memory local-store instance. Enrolling one after the other
failed with
"Try to add key with length N when existing length is M"
because local-store correctly refuses to mix dimensions in one
keyspace.
The registries were constructed with `storeName=""`, which in
StoreBackend() is just a WithModel() call. But ModelLoader's
cache is keyed on `modelID`, not `model` -- so both registries
collapsed to the same `modelID=""` slot and reused the same
backend process despite looking isolated on paper.
Three complementary fixes:
1. application.go -- give each registry a distinct default
namespace ("localai-face-biometrics" /
"localai-voice-biometrics"). The comment claimed
isolation; now it is actually enforced.
2. stores.go -- pass the storeName as both WithModelID and
WithModel so the ModelLoader cache key separates
namespaces and the loader spawns distinct processes.
3. local-store/store.go -- drop the Load() `opts.Model != ""`
guard. It was there to prevent generic model-loading loops
from picking up local-store by accident, but that auto-load
path is being retired; the guard now just blocks legitimate
namespace isolation. opts.Model is treated as a tag; the
per-tuple process isolation upstream handles discrimination.
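The collision can be reproduced with a toy model of the loader cache.
The store and loader types below are simplified stand-ins, not the
real local-store or ModelLoader; only the "cache keyed on modelID"
behaviour and the two namespace names are taken from the text above.

```go
package main

import "fmt"

// store is a toy stand-in for local-store: the first key fixes the
// dimension and later keys of another length are rejected.
type store struct{ dim int }

func (s *store) add(key []float32) error {
	if s.dim == 0 {
		s.dim = len(key)
	}
	if len(key) != s.dim {
		return fmt.Errorf("key length %d does not match existing length %d", len(key), s.dim)
	}
	return nil
}

// loader is a toy stand-in for ModelLoader: instances are cached by
// modelID only, so equal modelIDs share one store.
type loader struct{ byModelID map[string]*store }

func (l *loader) load(modelID string) *store {
	if s, ok := l.byModelID[modelID]; ok {
		return s
	}
	s := &store{}
	l.byModelID[modelID] = s
	return s
}

func main() {
	l := &loader{byModelID: map[string]*store{}}

	// Before: both registries resolve to the modelID "" slot.
	face, voice := l.load(""), l.load("")
	fmt.Println(face.add(make([]float32, 512)))  // first key sets the dimension
	fmt.Println(voice.add(make([]float32, 192))) // rejected: same store

	// After: distinct namespaces give distinct stores.
	face, voice = l.load("localai-face-biometrics"), l.load("localai-voice-biometrics")
	fmt.Println(face.add(make([]float32, 512)), voice.add(make([]float32, 192)))
}
```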
Assisted-by: Claude:Opus 4.7
* fix(gallery): stale-file cleanup and upgrade-tmp directory safety
Two related robustness fixes for backend install/upgrade:
pkg/downloader/uri.go
OCI downloads passed through
if filepath.Ext(filePath) != "" ...
filePath = filepath.Dir(filePath)
which was intended to redirect file-shaped download targets
into their parent directory for OCI extraction. The heuristic
misfires on directory-shaped paths with a dot-suffix --
gallery.UpgradeBackend uses
tmpPath = "<backendsPath>/<name>.upgrade-tmp"
and Go's filepath.Ext treats ".upgrade-tmp" as an extension.
The rewrite landed the extraction at "<backendsPath>/", which
then **overwrote the real install** (backends/<name>/) with a
flat-layout file and left a stray run.sh at the top level. The
tmp dir itself stayed empty, so the validation step that
checked "<tmpPath>/run.sh" predictably failed with
"upgrade validation failed: run.sh not found in new backend"
Every manual upgrade silently corrupted the backends tree this
way. Guard the rewrite behind "target isn't already an existing
directory" -- InstallBackend / UpgradeBackend both pre-create
the target as a directory, so they get the correct behaviour;
existing file-path callers with a genuine dot-extension still
get the parent redirect.
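A small sketch of the guarded heuristic. extractionDir and demo are
hypothetical names and "example" is a placeholder backend name; the
real change lives in pkg/downloader/uri.go and may differ in detail.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// extractionDir uses a target that already exists as a directory
// as-is; otherwise a dot-suffixed target is treated as a file and
// redirected to its parent directory.
func extractionDir(target string) string {
	if info, err := os.Stat(target); err == nil && info.IsDir() {
		return target
	}
	if filepath.Ext(target) != "" {
		return filepath.Dir(target)
	}
	return target
}

// demo resolves "<root>/example.upgrade-tmp" before and after the
// directory is created, inside a scratch root that is removed again.
func demo() (ext string, redirectedBefore, keptAfter bool, err error) {
	root, err := os.MkdirTemp("", "backends")
	if err != nil {
		return "", false, false, err
	}
	defer os.RemoveAll(root)

	tmp := filepath.Join(root, "example.upgrade-tmp")
	ext = filepath.Ext(tmp) // ".upgrade-tmp": looks like a file extension
	redirectedBefore = extractionDir(tmp) == filepath.Dir(tmp)
	if err := os.Mkdir(tmp, 0o755); err != nil {
		return ext, redirectedBefore, false, err
	}
	keptAfter = extractionDir(tmp) == tmp
	return ext, redirectedBefore, keptAfter, nil
}

func main() {
	fmt.Println(demo())
}
```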
core/gallery/backends.go
InstallBackend's MkdirAll returned ENOTDIR when something at
the target path was already a file (legacy dev builds dropped
golang backend binaries directly at `<backendsPath>/<name>`
instead of nesting them under their own subdir). That
permanently blocked reinstall and upgrade for anyone carrying
that state, since every retry hit the same error. Detect a
pre-existing non-directory, warn, and remove it before the
MkdirAll so the fresh install can write the correct nested
layout with metadata.json + run.sh.
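The detect-warn-remove step could look roughly like this.
ensureBackendDir and demo are hypothetical names, and the permission
bits are arbitrary; this is a sketch of the behaviour described above,
not the code in core/gallery/backends.go.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

// ensureBackendDir removes a stale non-directory at path, if any,
// and then creates the directory.
func ensureBackendDir(path string) error {
	info, err := os.Lstat(path)
	switch {
	case err == nil && !info.IsDir():
		log.Printf("warning: removing stale non-directory at %s", path)
		if err := os.Remove(path); err != nil {
			return err
		}
	case err != nil && !errors.Is(err, fs.ErrNotExist):
		return err
	}
	return os.MkdirAll(path, 0o750)
}

// demo plants a plain file where the backend directory should be and
// reports whether MkdirAll alone is blocked and whether the guarded
// version repairs the layout.
func demo() (blocked, repaired bool, err error) {
	root, err := os.MkdirTemp("", "backends")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	target := filepath.Join(root, "example") // placeholder backend name
	if err := os.WriteFile(target, []byte("legacy binary"), 0o755); err != nil {
		return false, false, err
	}
	blocked = errors.Is(os.MkdirAll(target, 0o750), syscall.ENOTDIR)
	if err := ensureBackendDir(target); err != nil {
		return blocked, false, err
	}
	info, err := os.Stat(target)
	if err != nil {
		return blocked, false, err
	}
	return blocked, info.IsDir(), nil
}

func main() {
	fmt.Println(demo())
}
```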
Assisted-by: Claude:Opus 4.7
* fix(galleryop): refresh upgrade cache after backend ops
UpgradeChecker caches the last upgrade-check result and only
refreshes on the 6-hour tick or after an auto-upgrade cycle.
Manual upgrades (POST /api/backends/upgrade/:name) go through
the async galleryop worker, which completes the upgrade
correctly but never tells UpgradeChecker to re-check -- so
/api/backends/upgrades continued to list a just-upgraded backend
as upgradeable, indistinguishable from a failed upgrade, for up
to six hours.
Add an optional `OnBackendOpCompleted func()` hook on
GalleryService that fires after every successful install /
upgrade / delete on the backend channel (async, so a slow
callback doesn't stall the queue). startup.go wires it to
UpgradeChecker.TriggerCheck after both services exist. Result:
the upgrade banner clears within milliseconds of the worker
finishing.
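A minimal sketch of the hook shape, assuming a simplified service
type. Only the OnBackendOpCompleted field name comes from the text
above; galleryService, finishBackendOp and errOpFailed are
illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

var errOpFailed = errors.New("backend op failed")

// galleryService is a toy stand-in for GalleryService.
type galleryService struct {
	// OnBackendOpCompleted is optional; nil disables the notification.
	OnBackendOpCompleted func()
}

// finishBackendOp models the end of one worker iteration: the hook
// runs only after a successful op, and in its own goroutine so a slow
// callback cannot stall the queue.
func (g *galleryService) finishBackendOp(opErr error) {
	if opErr != nil || g.OnBackendOpCompleted == nil {
		return
	}
	go g.OnBackendOpCompleted()
}

func main() {
	checked := make(chan struct{})
	svc := &galleryService{OnBackendOpCompleted: func() { close(checked) }}

	svc.finishBackendOp(errOpFailed) // failed op: hook is not fired
	svc.finishBackendOp(nil)         // successful op: hook fires in the background
	<-checked
	fmt.Println("upgrade check triggered")
}
```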
Assisted-by: Claude:Opus 4.7
* build: prepend GOPATH/bin to PATH for protogen-go
install-go-tools runs `go install` for protoc-gen-go and
protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin.
That directory isn't on every dev's PATH, and protoc resolves
its code-gen plugins via PATH, so the immediately-following
protoc invocation fails with
"protoc-gen-go: program not found"
which in turn blocks `make build` and any
`make backends/%` target that depends on build.
Prepend `go env GOPATH`/bin to PATH for the protoc invocation
so the freshly-installed plugins are found without requiring a
shell-profile change.
Assisted-by: Claude:Opus 4.7
* refactor(ui-api): non-blocking backend upgrade handler with opcache
POST /api/backends/upgrade/:name used to send the ManagementOp
directly onto the unbuffered BackendGalleryChannel, which blocked
the HTTP request whenever the galleryop worker was busy with a
prior operation. The op also didn't show up in /api/operations,
so the Backends UI couldn't reflect upgrade progress on the
affected row.
Register the op in opcache immediately, wrap it in a cancellable
context, store the cancellation function on the GalleryService,
and push onto the channel from a goroutine so the handler
returns right away. Response gains a `jobID` field and a
`message` string so clients have a consistent handle regardless
of whether the op is queued or running.
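The handler flow described above can be sketched like this. All type
and function names are illustrative, the map-based opcache is a toy,
and the select on ctx.Done() is an assumption about how a cancelled
queued op might be dropped, not something the commit states.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type managementOp struct {
	JobID   string
	Backend string
	Ctx     context.Context
}

type galleryService struct {
	backendCh chan managementOp // unbuffered: a send blocks until the worker receives

	mu      sync.Mutex
	status  map[string]string             // toy opcache: jobID -> state
	cancels map[string]context.CancelFunc // jobID -> cancellation function
}

func newGalleryService() *galleryService {
	return &galleryService{
		backendCh: make(chan managementOp),
		status:    map[string]string{},
		cancels:   map[string]context.CancelFunc{},
	}
}

// startUpgrade registers the op, stores its cancel function, and
// hands the blocking channel send to a goroutine so the caller
// returns immediately.
func (g *galleryService) startUpgrade(jobID, backend string) map[string]string {
	ctx, cancel := context.WithCancel(context.Background())

	g.mu.Lock()
	g.status[jobID] = "queued"
	g.cancels[jobID] = cancel
	g.mu.Unlock()

	go func() {
		select {
		case g.backendCh <- managementOp{JobID: jobID, Backend: backend, Ctx: ctx}:
		case <-ctx.Done(): // cancelled while still waiting for the worker
		}
	}()
	return map[string]string{"jobID": jobID, "message": "backend upgrade queued"}
}

func main() {
	svc := newGalleryService()
	resp := svc.startUpgrade("job-1", "example-backend") // returns with no worker running
	fmt.Println(resp["jobID"], resp["message"])

	op := <-svc.backendCh // the worker picks the op up later
	fmt.Println("worker picked up", op.Backend)
}
```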
Pairs with the OnBackendOpCompleted hook added in the galleryop
commit -- together the UI sees the upgrade start, watches
progress via /api/operations, and drops the "upgradeable" flag
the moment the worker finishes.
Assisted-by: Claude:Opus 4.7