6067 Commits

Author SHA1 Message Date
Ettore Di Giacinto
f0ab68e352 feat(distributed): durable backend fan-out + state reconciliation
Two connected problems handled together:

1) Backend delete/install/upgrade used to silently skip non-healthy nodes,
   so a delete during an outage left a zombie on the offline node once it
   returned. The fan-out now records intent in a new pending_backend_ops
   table before attempting the NATS round-trip. Currently-healthy nodes
   get an immediate attempt; everyone else is queued. Unique index on
   (node_id, backend, op) means reissuing the same operation refreshes
   next_retry_at instead of stacking duplicates.

2) Loaded-model state could drift from reality: a worker that OOM'd, got
   killed, or restarted a backend process would leave a node_models row
   claiming the model was still loaded, feeding ghost entries into the
   /api/nodes/models listing and the router's scheduling decisions.
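
The upsert behaviour from item 1 can be sketched as follows. This is a minimal illustration, not the project's code: it uses an in-memory SQLite database as a stand-in for the real store, and the column types, the index name, and the `record_intent` helper are assumptions. Only the table name, the `(node_id, backend, op)` unique index, and `next_retry_at` come from the commit message.

```python
import sqlite3

# Hypothetical schema: column types and the index name are guesses.
SCHEMA = """
CREATE TABLE pending_backend_ops (
    node_id       TEXT NOT NULL,
    backend       TEXT NOT NULL,
    op            TEXT NOT NULL,
    attempts      INTEGER NOT NULL DEFAULT 0,
    next_retry_at INTEGER NOT NULL
);
CREATE UNIQUE INDEX idx_pending_op
    ON pending_backend_ops (node_id, backend, op);
"""

# Reissuing the same (node_id, backend, op) hits the unique index and
# refreshes next_retry_at instead of inserting a second row.
UPSERT = """
INSERT INTO pending_backend_ops (node_id, backend, op, next_retry_at)
VALUES (?, ?, ?, ?)
ON CONFLICT (node_id, backend, op)
DO UPDATE SET next_retry_at = excluded.next_retry_at
"""


def record_intent(conn, node_id, backend, op, next_retry_at):
    """Queue an operation (hypothetical helper name)."""
    conn.execute(UPSERT, (node_id, backend, op, next_retry_at))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_intent(conn, "node-a", "llama-cpp", "delete", 100)
    record_intent(conn, "node-a", "llama-cpp", "delete", 250)  # same op again
    record_intent(conn, "node-b", "llama-cpp", "delete", 100)  # other node
    return conn.execute(
        "SELECT node_id, next_retry_at FROM pending_backend_ops "
        "ORDER BY node_id"
    ).fetchall()
```

Calling `demo()` leaves one row per node, with the `node-a` row carrying the later retry time.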

The existing ReplicaReconciler gains two new passes that run under a
fresh KeyStateReconciler advisory lock (non-blocking, so one wedged
frontend doesn't freeze the cluster):

  - drainPendingBackendOps: retries queued ops whose next_retry_at has
    passed on currently-healthy nodes. Success deletes the row; failure
    bumps attempts and pushes next_retry_at out with exponential backoff
    (30s → 15m cap). ErrNoResponders also marks the node unhealthy.

  - probeLoadedModels: runs a gRPC health check against addresses the DB
    believes are loaded but that have not been touched in the last
    probeStaleAfter (2m).
    Unreachable addresses are removed from the registry. A pluggable
    ModelProber lets tests substitute a fake without standing up gRPC.
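
The backoff used by drainPendingBackendOps could look roughly like the sketch below. The 30s starting delay and the 15m cap are from the commit message; doubling on each failed attempt and the `retry_delay` name are assumptions, since the message only says "exponential".

```python
BASE_SECONDS = 30        # first retry delay (from the commit message)
CAP_SECONDS = 15 * 60    # upper bound (from the commit message)


def retry_delay(attempts: int) -> int:
    """Seconds to wait before the next retry after `attempts` failures.

    Doubling per attempt is an assumption about the growth factor.
    """
    if attempts < 1:
        return BASE_SECONDS
    return min(BASE_SECONDS * 2 ** (attempts - 1), CAP_SECONDS)
```

Under these assumptions the delay reaches the cap on the sixth failed attempt and stays there.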

DistributedBackendManager exposes DeleteBackendDetailed so the HTTP
handler can surface per-node outcomes ("2 succeeded, 1 queued") to the
UI in a follow-up commit; the existing DeleteBackend still returns
error-only for callers that don't care about node breakdown.

Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx
on a new key so N frontends coordinate — the same pattern the health
monitor and replica reconciler already rely on. Single-node mode runs
both passes inline (adapter is nil, state drain is a no-op).

Tests cover the upsert semantics, backoff math, the probe removing an
unreachable model but keeping a reachable one, and filtering by
probeStaleAfter.
2026-04-19 08:34:57 +00:00
Ettore Di Giacinto
9373de9f9b feat(ui): polish the Nodes page so it reads like a product
The Nodes page was the biggest visual liability in distributed mode.
Rework the main dashboard surfaces in place without changing behavior:

StatCards: uniform height (96px min), left accent bar colored by the
metric's semantic (success/warning/error/primary), icon lives in a
36x36 soft-tinted chip top-right, value is left-aligned and large.
Grid auto-fills so the row doesn't collapse on narrow viewports. This
replaces the previous thin-bordered boxes with inconsistent heights.

Table rows: expandable rows now show a chevron cue on the left (rotates
on expand) so users know rows open. Status cell became a dedicated chip
with an LED-style halo dot instead of a bare bullet. Action buttons gained
labels — "Approve", "Resume", "Drain" — so the icons aren't doing all
the semantic work; the destructive remove action uses the softer
btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog
still owning the real "are you sure". Applied cell-mono/cell-muted
utility classes so label chips and addresses share one spacing/font
grammar instead of re-declaring inline styles everywhere.

Expanded drawer: empty states for Loaded Models and Installed Backends
now render as a proper drawer-empty card (dashed border, icon, one-line
hint) instead of a plain muted string that read like broken formatting.

Tabs: three inline-styled buttons became the shared .tab class so they
inherit focus ring, hover state, and the rest of the design system —
matches the System page.

"Add more workers" toggle turned into a .nodes-add-worker dashed-border
button labelled "Register a new worker" (action voice) instead of a
chevron + muted link that operators kept mistaking for broken text.

New shared CSS primitives carry over to other pages:
.stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty,
.nodes-add-worker.
2026-04-19 08:20:52 +00:00
Ettore Di Giacinto
1b3c951c85 feat(ui): surface backend upgrades in the System page
The System page (Manage.jsx) only showed updates as a tiny inline arrow,
so operators routinely missed them. Port the Backend Gallery's upgrade UX
so System speaks the same visual language:

- Yellow banner at the top of the Backends tab when upgrades are pending,
  with an "Upgrade all" button (serial fan-out, matches the gallery) and a
  "Updates only" filter toggle.
- Warning pill (↑ N) next to the tab label so the count is glanceable even
  when the banner is scrolled out of view.
- Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button
  that silently flipped semantics between Reinstall and Upgrade), plus an
  "Update available" badge in the new Version column.
- New columns: Version (with upgrade + drift chips), Nodes (per-node
  attribution badges for distributed mode, degrading to a compact
  "on N nodes · M offline" chip above three nodes), Installed (relative
  time).
- System backends render a "Protected" chip instead of a bare "—" so rows
  still align and the reason is obvious.
- Delete uses the softer btn-danger-ghost so rows don't scream red; the
  ConfirmDialog still owns the "are you sure".

The upgrade checker also needed the same per-worker fix as the previous
commit: NewUpgradeChecker now takes a BackendManager getter so its
periodic runs call the distributed CheckUpgrades (which asks workers)
instead of the empty frontend filesystem. Without this the /api/backends/
upgrades endpoint stayed empty in distributed mode even with the protocol
change in place.

New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack,
.cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in
App.css so other pages can adopt them without duplicating styles.
2026-04-19 08:14:49 +00:00
Ettore Di Giacinto
1f43762655 fix(distributed): detect backend upgrades across worker nodes
Before this change `DistributedBackendManager.CheckUpgrades` delegated to the
local manager, which read backends from the frontend filesystem. In
distributed deployments the frontend has no backends installed locally —
they live on workers — so the upgrade-detection loop never ran and the UI
silently never surfaced upgrades even when the gallery advertised newer
versions or digests.

Worker-side: NATS backend.list reply now carries Version, URI and Digest
for each installed backend (read from metadata.json).

Frontend-side: DistributedBackendManager.ListBackends aggregates per-node
refs (name, status, version, digest) instead of deduping, and CheckUpgrades
feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint
factored out of CheckBackendUpgrades so both paths share the same core
logic.

Cluster drift policy: when per-node version/digest tuples disagree, the
backend is flagged upgradeable regardless of whether any single node
matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so
operators can see *why* it is out of sync. The next upgrade-all realigns
the cluster.
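
A rough sketch of the drift policy, under stated assumptions: `NodeRef` and `check_drift` are hypothetical names, and treating nodes that differ from the most common version/digest tuple as the "outliers" is a guess at how NodeDrift is populated. Only the rule itself (disagreement between nodes flags the backend as upgradeable, regardless of the gallery) is from the commit message.

```python
from collections import Counter
from typing import NamedTuple


class NodeRef(NamedTuple):
    node: str
    version: str
    digest: str


def check_drift(refs, gallery_version, gallery_digest):
    """Return (upgradeable, outlier_nodes) for one backend."""
    if not refs:
        return False, []
    tuples = Counter((r.version, r.digest) for r in refs)
    if len(tuples) > 1:
        # Nodes disagree: upgradeable no matter what the gallery says.
        majority, _ = tuples.most_common(1)[0]
        outliers = [r.node for r in refs
                    if (r.version, r.digest) != majority]
        return True, outliers
    # Unanimous: upgradeable only if the cluster differs from the gallery.
    installed = next(iter(tuples))
    return installed != (gallery_version, gallery_digest), []
```

With three nodes where one lags behind, the lagging node is reported as the outlier even though the other two already match the gallery.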

Tests cover: drift detection, unanimous-match (no upgrade), and the
empty-installed-version path that the old distributed code silently
missed.
2026-04-19 08:03:20 +00:00
LocalAI [bot]
884bfb84c9 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe (#9418)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:27:11 +02:00
LocalAI [bot]
e94a9a8f10 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d (#9417)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-19 09:26:58 +02:00
Ettore Di Giacinto
054c4b4b45 feat(stable-diffusion.ggml): add support for video generation (#9420)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 09:26:33 +02:00
LocalAI [bot]
6e49dba27c chore: ⬆️ Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb (#9415)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:26:05 +02:00
Ettore Di Giacinto
e463820566 fix(ui): fix dark-theme colors in chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-18 23:01:01 +00:00
Keith Mattix II
8839a71c87 fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg (#9410)
Add gfx1151 (AMD Strix Halo / Ryzen AI MAX) to the default AMDGPU_TARGETS
list in the llama-cpp backend Makefile. ROCm 7.2.1 ships with gfx1151
Tensile libraries, so this architecture should be included in default builds.

Also expose AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so that
users building for non-default GPU architectures can override the target
list via --build-arg AMDGPU_TARGETS=<arch>. Previously, passing
-DAMDGPU_TARGETS=<arch> through CMAKE_ARGS was silently overridden by
the Makefile's own append of the default target list.

Fixes #9374

Signed-off-by: Keith Mattix <keithmattix2@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-18 20:39:40 +02:00
Ettore Di Giacinto
117f6430b8 fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)
The shared grpc-server CMakeLists hardcoded `llama-common`, the post-rename
target name in upstream llama.cpp. The turboquant fork branched before that
rename and still exposes the helpers library as `common`, so the name
silently degraded to a plain `-llama-common` link flag, the PUBLIC include
directory was never propagated, and tools/server/server-task.h failed to
find common.h during turboquant-<flavor> builds.
2026-04-18 20:30:28 +02:00
Ettore Di Giacinto
7809c5f5d0 fix(vision): propagate mtmd media marker from backend via ModelMetadata (#9412)
Upstream llama.cpp (PR #21962) switched the server-side mtmd media
marker to a random per-server string and removed the legacy
"<__media__>" backward-compat replacement in mtmd_tokenizer. The
Go layer still emitted the hardcoded "<__media__>", so on the
non-tokenizer-template path the prompt arrived with a marker mtmd
did not recognize and tokenization failed with "number of bitmaps
(1) does not match number of markers (0)".

Report the active media marker via ModelMetadataResponse.media_marker
and substitute the sentinel "<__media__>" with it right before the
gRPC call, after the backend has been loaded and probed. Also skip
the Go-side multimodal templating entirely when UseTokenizerTemplate
is true — llama.cpp's oaicompat_chat_params_parse already injects its
own marker and StringContent is unused in that path. Backends that do
not expose the field keep the legacy "<__media__>" behavior.
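
The substitution step can be illustrated with a small sketch. The function name is hypothetical and the real change lives in the Go layer; the sentinel string and the fallback for backends that report no marker are taken from the description above.

```python
from typing import Optional

LEGACY_MARKER = "<__media__>"  # sentinel emitted by the templating layer


def apply_media_marker(prompt: str, reported_marker: Optional[str]) -> str:
    """Swap the sentinel for the marker the backend reported.

    Backends that do not expose a marker (None or empty string) keep the
    legacy sentinel untouched.
    """
    if not reported_marker:
        return prompt
    return prompt.replace(LEGACY_MARKER, reported_marker)
```

For example, `apply_media_marker("describe <__media__>", "<__m_ab12__>")` returns the prompt with the per-server marker in place, where `<__m_ab12__>` is a made-up value.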
2026-04-18 20:30:13 +02:00
LocalAI [bot]
ad742738cb chore: ⬆️ Update ikawrakow/ik_llama.cpp to 52efa12fdae390d1dca6ecd7ca00010fe51f651e (#9404)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-18 09:21:32 +02:00
LocalAI [bot]
86c673fd94 chore: ⬆️ Update ggml-org/whisper.cpp to 166c20b473d5f4d04052e699f992f625ea2a2fdd (#9403)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-18 00:42:32 +02:00
Ettore Di Giacinto
c49feb546f fix(llama-cpp): rename linked target common -> llama-common (#9408)
Upstream llama.cpp (45cac7ca) renamed the CMake library target
`common` to `llama-common`. Linking the old name caused
`target_include_directories(... PUBLIC .)` from the common/ dir
to not propagate, so `#include "common.h"` failed when building
grpc-server.
2026-04-18 00:42:05 +02:00
LocalAI [bot]
844b0b760b chore(model gallery): 🤖 add 1 new models via gallery agent (#9400)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 17:56:41 +02:00
LocalAI [bot]
55c05211d3 chore(model gallery): 🤖 add 1 new models via gallery agent (#9399)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 16:10:02 +02:00
Ettore Di Giacinto
a90a8cf1d0 fix(ci): switch gallery-agent to sigs.k8s.io/yaml (#9397)
The gallery-agent lives under .github/, which Go tooling treats as a
hidden directory and excludes from './...' expansion. That means 'go
mod tidy' (run on every dependabot dependency bump) repeatedly strips
github.com/ghodss/yaml from go.mod/go.sum, breaking 'go run
./.github/gallery-agent' with a missing go.sum entry error.

Switch to sigs.k8s.io/yaml — API-compatible with ghodss/yaml and
already pulled in as a transitive dependency via non-hidden packages,
so tidy can no longer remove it.
2026-04-17 10:10:42 +02:00
dependabot[bot]
12b069f9bd chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9376)
chore(deps): bump dompurify

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [dompurify](https://github.com/cure53/DOMPurify).


Updates `dompurify` from 3.3.2 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.2...3.4.0)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 09:06:32 +02:00
github-actions[bot]
48e87db400 chore: bump inference defaults from unsloth (#9396)
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 09:05:55 +02:00
LocalAI [bot]
7dbd9c056a chore: ⬆️ Update ggml-org/llama.cpp to 4fbdabdc61c04d1262b581e1b8c0c3b119f688ff (#9381)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:13:04 +02:00
Ettore Di Giacinto
7c5d6162f7 fix(ui): rename model config files on save to prevent duplicates (#9388)
Editing a model's YAML and changing the `name:` field previously wrote
the new body to the original `<oldName>.yaml`. On reload the config
loader indexed that file under the new name while the old key
lingered in memory, producing two entries in the system UI that
shared a single underlying file — deleting either removed both.

Detect the rename in EditModelEndpoint and rename the on-disk
`<name>.yaml` and `._gallery_<name>.yaml` to match, drop the stale
in-memory key before the reload, and redirect the editor URL in the
React UI so it tracks the new name. Reject conflicts (409) and names
containing path separators (400).
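
A minimal sketch of the rename checks, assuming a hypothetical `plan_rename` helper that returns an error status (or None) plus the file renames to perform. The 409 and 400 codes and the two file name patterns come from the commit message; the order of the checks is an assumption.

```python
def plan_rename(old_name, new_name, existing_names):
    """Return (error_status, renames) for a model config save.

    error_status is None when the save may proceed; renames is a list of
    (old_file, new_file) pairs, empty when the name did not change.
    """
    if new_name == old_name:
        return None, []
    if "/" in new_name or "\\" in new_name:
        return 400, []  # names containing path separators are rejected
    if new_name in existing_names:
        return 409, []  # another model already uses this name
    return None, [
        (f"{old_name}.yaml", f"{new_name}.yaml"),
        (f"._gallery_{old_name}.yaml", f"._gallery_{new_name}.yaml"),
    ]
```

The stale in-memory key would then be dropped before the reload, which this sketch leaves out.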

Fixes #9294
2026-04-17 08:12:48 +02:00
Ettore Di Giacinto
5837b14888 chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d` (#9385)
chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d`

Drop 0002-ggml-rpc-bump-op-count-to-97.patch; the fork now has
GGML_OP_COUNT == 97 and RPC_PROTO_PATCH_VERSION 2 upstream.

Fetch all tags in backend/cpp/llama-cpp/Makefile so tag-only commits
(the new turboquant pin is reachable only through the tag
feature-turboquant-kv-cache-b8821-45f8a06) can be checked out.
2026-04-17 08:12:21 +02:00
LocalAI [bot]
b6a68e5df4 chore: ⬆️ Update leejet/stable-diffusion.cpp to a564fdf642780d1df123f1c413b19961375b8346 (#9383)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:11:55 +02:00
LocalAI [bot]
c6dfb4acaf chore: ⬆️ Update ikawrakow/ik_llama.cpp to eaf83865a132f66e8f49efe0e78491625942f068 (#9382)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:11:41 +02:00
LocalAI [bot]
ec5935421c chore(model-gallery): ⬆️ update checksum (#9384)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 22:41:52 +02:00
Ettore Di Giacinto
a0cbc46be9 refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer (#9380)
Drop the 295-line vendor/llama.py fork in favor of `tinygrad.apps.llm`,
which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8
quantization), KV-cache and generate loop we were maintaining ourselves.

What changed:
- New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict
  keymap, Transformer kwargs builder, `_embed_hidden` helper, and a hard
  rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the
  apps.llm Transformer ties `bias=False` on Q/K/V projections).
- backend.py routes both safetensors and GGUF paths through
  apps.llm.Transformer. Generation now delegates to its (greedy-only)
  `generate()`; Temperature / TopK / TopP / RepetitionPenalty are still
  accepted on the wire but ignored — documented in the module docstring.
- Jinja chat render now passes `enable_thinking=False` so Qwen3's
  reasoning preamble doesn't eat the tool-call token budget on small
  models.
- Embedding path uses `_embed_hidden` (block stack + output_norm) rather
  than the custom `embed()` method we were carrying on the vendored
  Transformer.
- test.py gains TestAppsLLMAdapter covering the keymap rename, tied
  embedding fallback, unknown-key skipping, and qkv_bias rejection.
- Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B
  (apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the
  HF chat template emits hermes-style JSON tool calls).

Verified with the docker-backed targets:
  test-extra-backend-tinygrad             5/5 PASS
  test-extra-backend-tinygrad-embeddings  3/3 PASS
  test-extra-backend-tinygrad-whisper     4/4 PASS
  test-extra-backend-tinygrad-sd          3/3 PASS
2026-04-16 22:41:18 +02:00
Ettore Di Giacinto
b4e30692a2 feat(backends): add sglang (#9359)
* feat(backends): add sglang

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job

sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally;
-march=native fails on CI runners without AVX-512 in /proc/cpuinfo.
Force -march=sapphirerapids so the build always succeeds, matching
sglang upstream's docker/xeon.Dockerfile recipe.

The resulting binary still requires an AVX-512 capable CPU at runtime,
so disable tests-sglang-grpc in test-extra.yml for the same reason
tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang
still work on hosts with the right SIMD baseline.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512

CXXFLAGS with -march=sapphirerapids was being overridden by
add_compile_options(-march=native) in sglang's CPU CMakeLists.txt,
since CMake appends those flags after CXXFLAGS. Sed-patch the
CMakeLists.txt directly after cloning to replace -march=native.

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 22:40:56 +02:00
Ettore Di Giacinto
61d34ccb11 fix(ui): show also concrete backends in the backend list
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 17:44:25 +00:00
LocalAI [bot]
7f88a3ba30 chore: ⬆️ Update leejet/stable-diffusion.cpp to c41c5ded7af85e01b7fe442ff7950c720706d53a (#9366)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 09:04:33 +02:00
Matt Van Horn
c4f309388e fix(gallery): correct gemma-4 model URIs returning 404 (#9379)
The gemma-4-26b-a4b-it, gemma-4-e2b-it, and gemma-4-e4b-it gallery
entries pointed at files that do not exist on HuggingFace, so LocalAI
fails with 404 when users try to install them.

Two issues per entry:
- mmproj filename uses the 'f16' quantization suffix, but ggml-org
  publishes the mmproj projectors as 'bf16'.
- The e2b and e4b URIs hardcode lowercase 'e2b'/'e4b' in the filename
  component. HuggingFace file paths are case-sensitive and the real
  files use uppercase 'E2B'/'E4B'.

Updated filename, uri, sha256, and the top-level 'mmproj' and
'parameters.model' references so every entry points at a real file
and the declared hashes match the content.

Verified each URI resolves (HTTP 302) and each sha256 matches the
'x-linked-etag' header returned by HuggingFace.

Signed-off-by: Matt Van Horn <mvanhorn@gmail.com>
2026-04-16 08:51:20 +02:00
dependabot[bot]
ab326a9c61 chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates (#9373)
Bumps the npm_and_yarn group with 6 updates in the /core/http/react-ui directory:

| Package | From | To |
| --- | --- | --- |
| [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) | `6.4.1` | `6.4.2` |
| [@hono/node-server](https://github.com/honojs/node-server) | `1.19.11` | `1.19.14` |
| [flatted](https://github.com/WebReflection/flatted) | `3.3.4` | `3.4.2` |
| [hono](https://github.com/honojs/hono) | `4.12.7` | `4.12.14` |
| [path-to-regexp](https://github.com/pillarjs/path-to-regexp) | `8.3.0` | `8.4.2` |
| [picomatch](https://github.com/micromatch/picomatch) | `4.0.3` | `4.0.4` |



Updates `vite` from 6.4.1 to 6.4.2
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite)

Updates `@hono/node-server` from 1.19.11 to 1.19.14
- [Release notes](https://github.com/honojs/node-server/releases)
- [Commits](https://github.com/honojs/node-server/compare/v1.19.11...v1.19.14)

Updates `flatted` from 3.3.4 to 3.4.2
- [Commits](https://github.com/WebReflection/flatted/compare/v3.3.4...v3.4.2)

Updates `hono` from 4.12.7 to 4.12.14
- [Release notes](https://github.com/honojs/hono/releases)
- [Commits](https://github.com/honojs/hono/compare/v4.12.7...v4.12.14)

Updates `path-to-regexp` from 8.3.0 to 8.4.2
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v8.3.0...v8.4.2)

Updates `picomatch` from 4.0.3 to 4.0.4
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/4.0.3...4.0.4)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.4.2
  dependency-type: direct:development
  dependency-group: npm_and_yarn
- dependency-name: "@hono/node-server"
  dependency-version: 1.19.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: flatted
  dependency-version: 3.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: hono
  dependency-version: 4.12.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: path-to-regexp
  dependency-version: 8.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 4.0.4
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-16 08:23:03 +02:00
LocalAI [bot]
df2d25cee5 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 1163af96cf6bb4a4b819f998f84c153a49768b99 (#9368)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:13:08 +02:00
LocalAI [bot]
96cd561d9d chore: ⬆️ Update ggml-org/llama.cpp to b3d758750a268bf93f084ccfa3060fb9a203192a (#9370)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:12:39 +02:00
LocalAI [bot]
08445b1b89 chore(model-gallery): ⬆️ update checksum (#9369)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:12:01 +02:00
Ettore Di Giacinto
ad3c8c4832 fix(agents): handle embedding model dim changes on collection upload (#9365)
Bumps LocalAGI to pick up the LocalRecall postgres backend fix that
resizes the pgvector column when the configured embedding model
returns vectors of a different dimensionality than the existing
collection. Switching the agent pool's embedding model now triggers
a transparent re-embed at startup instead of failing every subsequent
upload with 'expected N dimensions, not M' (SQLSTATE 22000).

Also surfaces a 409 with an actionable message in
UploadToCollectionEndpoint as a safety net for the rare cases the
upstream migration path doesn't cover (e.g. a model swapped at
runtime), instead of the previous opaque 500.
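
One way the safety net could map the database error to a 409 is sketched below. How the real endpoint detects the condition is not stated in the commit message, so matching on the error text, the helper name, and the wording of the returned message are all assumptions.

```python
import re

# Matches the error text quoted in the commit message.
_DIM_ERROR = re.compile(r"expected (\d+) dimensions, not (\d+)")


def status_for_upload_error(message: str):
    """Return (http_status, user_message) for a failed upload."""
    match = _DIM_ERROR.search(message)
    if match is None:
        return 500, message
    expected, got = match.groups()
    return 409, (
        f"collection stores {expected}-dimensional vectors but the "
        f"embedding model returned {got}; re-embed the collection or "
        f"restore the previous embedding model"
    )
```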
2026-04-15 20:05:28 +02:00
Ettore Di Giacinto
6f0051301b feat(backend): add tinygrad multimodal backend (experimental) (#9364)
* feat(backend): add tinygrad multimodal backend

Wire tinygrad as a new Python backend covering LLM text generation with
native tool-call extraction, embeddings, Stable Diffusion 1.x image
generation, and Whisper speech-to-text from a single self-contained
container.

Backend (`backend/python/tinygrad/`):
- `backend.py` gRPC servicer with LLM Predict/PredictStream (auto-detects
  Llama / Qwen2 / Mistral architecture from `config.json`, supports
  safetensors and GGUF), Embedding via mean-pooled last hidden state,
  GenerateImage via the vendored SD1.x pipeline, AudioTranscription +
  AudioTranscriptionStream via the vendored Whisper inference loop, plus
  Tokenize / ModelMetadata / Status / Free.
- Vendored upstream model code under `vendor/` (MIT, headers preserved):
  llama.py with an added `qkv_bias` flag for Qwen2-family bias support
  and an `embed()` method that returns the last hidden state, plus
  clip.py, unet.py, stable_diffusion.py (trimmed to drop the MLPerf
  training branch that pulls `mlperf.initializers`), audio_helpers.py
  and whisper.py (trimmed to drop the pyaudio listener).
- Pluggable tool-call parsers under `tool_parsers/`: hermes (Qwen2.5 /
  Hermes), llama3_json (Llama 3.1+), qwen3_xml (Qwen 3), mistral
  (Mistral / Mixtral). Auto-selected from model architecture or `Options`.
- `install.sh` pins Python 3.11.14 (tinygrad >=0.12 needs >=3.11; the
  default portable python is 3.10).
- `package.sh` bundles libLLVM.so.1 + libedit/libtinfo/libgomp/libsndfile
  into the scratch image. `run.sh` sets `CPU_LLVM=1` and `LLVM_PATH` so
  tinygrad's CPU device uses the in-process libLLVM JIT instead of
  shelling out to the missing `clang` binary.
- Local unit tests for Health and the four parsers in `test.py`.

Build wiring:
- Root `Makefile`: `.NOTPARALLEL`, `prepare-test-extra`, `test-extra`,
  `BACKEND_TINYGRAD = tinygrad|python|.|false|true`,
  docker-build-target eval, and `docker-build-backends` aggregator.
- `.github/workflows/backend.yml`: cpu / cuda12 / cuda13 build matrix
  entries (mirrors the transformers backend placement).
- `backend/index.yaml`: `&tinygrad` meta + cpu/cuda12/cuda13 image
  entries (latest + development).

E2E test wiring:
- `tests/e2e-backends/backend_test.go` gains an `image` capability that
  exercises GenerateImage and asserts a non-empty PNG is written to
  `dst`. New `BACKEND_TEST_IMAGE_PROMPT` / `BACKEND_TEST_IMAGE_STEPS`
  knobs.
- Five new make targets next to `test-extra-backend-vllm`:
  - `test-extra-backend-tinygrad` — Qwen2.5-0.5B-Instruct + hermes,
    mirrors the vllm target 1:1 (5/9 specs in ~57s).
  - `test-extra-backend-tinygrad-embeddings` — same model, embeddings
    via LLM hidden state (3/9 in ~10s).
  - `test-extra-backend-tinygrad-sd` — stable-diffusion-v1-5 mirror,
    health/load/image (3/9 in ~10min, 4 diffusion steps on CPU).
  - `test-extra-backend-tinygrad-whisper` — openai/whisper-tiny.en
    against jfk.wav from whisper.cpp samples (4/9 in ~49s).
  - `test-extra-backend-tinygrad-all` aggregate.

All four targets land green on the first MVP pass: 15 specs total, 0
failures across LLM+tools, embeddings, image generation, and speech
transcription.

* refactor(tinygrad): collapse to a single backend image

tinygrad generates its own GPU kernels (PTX renderer for CUDA, the
autogen ctypes wrappers for HIP / Metal / WebGPU) and never links
against cuDNN, cuBLAS, or any toolkit-version-tied library. The only
runtime dependency that varies across hosts is the driver's libcuda.so.1
/ libamdhip64.so, which are injected into the container at run time by
the nvidia-container / rocm runtimes. So unlike torch- or vLLM-based
backends, there is no reason to ship per-CUDA-version images.

- Drop the cuda12-tinygrad and cuda13-tinygrad build-matrix entries
  from .github/workflows/backend.yml. The sole remaining entry is
  renamed to -tinygrad (from -cpu-tinygrad) since it is no longer
  CPU-only.
- Collapse backend/index.yaml to a single meta + development pair.
  The meta anchor carries the latest uri directly; the development
  entry points at the master tag.
- run.sh picks the tinygrad device at launch time by probing
  /usr/lib/... for libcuda.so.1 / libamdhip64.so. When libcuda is
  visible we set CUDA=1 + CUDA_PTX=1 so tinygrad uses its own PTX
  renderer (avoids any nvrtc/toolkit dependency); otherwise we fall
  back to HIP or CLANG. CPU_LLVM=1 + LLVM_PATH keep the in-process
  libLLVM JIT for the CLANG path.
- backend.py's _select_tinygrad_device() is trimmed to a CLANG-only
  fallback since production device selection happens in run.sh.
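
The launch-time device choice can be sketched as a pure function. The real logic is a shell script (run.sh) that probes the filesystem; here the set of visible libraries is passed in so the decision is easy to follow. `CUDA=1`, `CUDA_PTX=1` and `CPU_LLVM=1` are named in the commit message, while the exact variable names used for the HIP and CLANG branches are assumptions, and `LLVM_PATH` is omitted because its value is not given.

```python
def select_device_env(visible_libs):
    """Return the environment variables to set for tinygrad.

    visible_libs: names of driver libraries found on the host.
    """
    libs = set(visible_libs)
    if "libcuda.so.1" in libs:
        # tinygrad's own PTX renderer, no nvrtc/toolkit dependency
        return {"CUDA": "1", "CUDA_PTX": "1"}
    if "libamdhip64.so" in libs:
        return {"HIP": "1"}  # assumed variable name for the HIP branch
    # CPU fallback through the in-process libLLVM JIT
    return {"CLANG": "1", "CPU_LLVM": "1"}  # "CLANG" name is assumed
```

CUDA is checked first, matching the order in which the commit message lists the fallbacks.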

Re-ran test-extra-backend-tinygrad after the change:
  Ran 5 of 9 Specs in 56.541 seconds — 5 Passed, 0 Failed
2026-04-15 19:48:23 +02:00
LocalAI [bot]
8487058673 chore(model-gallery): ⬆️ update checksum (#9358)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:59 +02:00
LocalAI [bot]
62862ca06b chore: ⬆️ Update ggml-org/llama.cpp to fae3a28070fe4026f87bd6a544aba1b2d1896566 (#9357)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:41 +02:00
LocalAI [bot]
07e244d869 feat(swagger): update swagger (#9356)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:24 +02:00
Ettore Di Giacinto
95efb8a562 feat(backend): add turboquant llama.cpp-fork backend (#9355)
* feat(backend): add turboquant llama.cpp-fork backend

turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch
feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme.
It ships as a first-class backend reusing backend/cpp/llama-cpp sources
via a thin wrapper Makefile: each variant target copies ../llama-cpp
into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server
with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No
duplication of grpc-server.cpp — upstream fixes flow through automatically.

Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL
f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml,
adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0
to exercise the KV-cache config path (backend_test.go gains dedicated env
vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement
usable by any llama.cpp-family backend), and registers a nightly auto-bump
PR in bump_deps.yaml tracking feature/turboquant-kv-cache.

scripts/changed-backends.js gets a special-case so edits to
backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since
the wrapper reuses those sources.

* feat(turboquant): carry upstream patches against fork API drift

turboquant branched from llama.cpp before upstream commit 66060008
("server: respect the ignore eos flag", #21203) which added the
`logit_bias_eog` field to `server_context_meta` and a matching
parameter to `server_task::params_from_json_cmpl`. The shared
backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so
building it against the fork unmodified fails.

Cherry-pick that commit as a patch file under
backend/cpp/turboquant/patches/ and apply it to the cloned fork
sources via a new apply-patches.sh hook called from the wrapper
Makefile. This also simplifies the build flow: instead of hopping through
llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now
drives the copied Makefile directly (clone -> patch -> build).

Drop the corresponding patch whenever the fork catches up with
upstream — the build fails fast if a patch stops applying, which
is the signal to retire it.
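
The clone -> patch -> build flow with its fail-fast behaviour can be sketched like this. It is a hedged Python illustration of what a hook such as apply-patches.sh might do, not the script itself: the function name, the `*.patch` naming, the sorted ordering and the use of `git apply` are all assumptions.

```python
import subprocess
from pathlib import Path


def apply_patches(patch_dir, source_dir, run=subprocess.run):
    """Apply every *.patch file in patch_dir to source_dir, in sorted order.

    Raises subprocess.CalledProcessError on the first patch that no longer
    applies, which aborts the build instead of compiling a half-patched tree.
    """
    applied = []
    for patch in sorted(Path(patch_dir).glob("*.patch")):
        # check=True turns a non-zero exit status from `git apply` into an
        # exception, so later patches are never attempted after a failure.
        run(["git", "apply", str(patch.resolve())], cwd=source_dir, check=True)
        applied.append(patch.name)
    return applied
```

The `run` parameter exists only so the function can be exercised without a real checkout; a failing patch surfaces as an exception, which is the retirement signal the message describes.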

* docs: add turboquant backend section + clarify cache_type_k/v

Document the new turboquant (llama.cpp fork with TurboQuant KV-cache)
backend alongside the existing llama-cpp / ik-llama-cpp sections in
features/text-generation.md: when to pick it, how to install it from
the gallery, and a YAML example showing backend: turboquant together
with cache_type_k / cache_type_v.

Also expand the cache_type_k / cache_type_v table rows in
advanced/model-configuration.md to spell out the accepted llama.cpp
quantization values and note that these fields apply to all
llama.cpp-family backends, not just vLLM.

* feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion

The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but
ggml/include/ggml-rpc.h static-asserts it equals 96, breaking
the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server).
Carry a one-line patch that updates the expected count so the
assertion holds. Drop this patch whenever the fork fixes it upstream.

* feat(turboquant): allow turbo* KV-cache types and exercise them in e2e

The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own
allow-list of accepted KV-cache types (kv_cache_types[]) and rejects
anything outside it before the value reaches llama.cpp's parser. That
list only contains the standard llama.cpp types, so turbo2/turbo3/turbo4
would throw "Unsupported cache type" at LoadModel time. In other words,
no cache type the LocalAI gRPC layer accepted was actually fork-specific.

Add a build-time augmentation step (patch-grpc-server.sh, called from
the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0
into the allow-list of the *copied* grpc-server.cpp under
turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/
is never touched, so the stock llama-cpp build keeps compiling against
vanilla upstream which has no notion of those enum values.
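
The augmentation step amounts to a small text transformation on the copied file. The sketch below shows the idea in Python purely for illustration (the real step is a shell script run from the wrapper Makefile). The function name is hypothetical, and it assumes the allow-list opens on a line that contains `kv_cache_types` and ends with `{`; only the three enum names come from the message above.

```python
TURBO_TYPES = ("GGML_TYPE_TURBO2_0", "GGML_TYPE_TURBO3_0", "GGML_TYPE_TURBO4_0")


def add_turbo_cache_types(source, marker="kv_cache_types", extra=TURBO_TYPES):
    """Return source with the extra enum values added to the allow-list.

    The entries go right after the line that opens the list. The function is
    idempotent, and it raises if the list cannot be found so that a silent
    no-op never reaches the build.
    """
    if all(name in source for name in extra):
        return source  # already augmented, nothing to do
    out = []
    inserted = False
    for line in source.splitlines(keepends=True):
        out.append(line)
        if not inserted and marker in line and line.rstrip().endswith("{"):
            # Add one entry per line directly below the opening brace.
            out.extend(f"    {name},\n" for name in extra)
            inserted = True
    if not inserted:
        raise ValueError(f"could not find the opening line of {marker}")
    return "".join(out)
```

Operating on a string keeps the transformation separate from file handling: a caller would read the copied grpc-server.cpp, pass its contents through the function and write the result back, leaving the original under backend/cpp/llama-cpp/ untouched.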

Switch test-extra-backend-turboquant to set
BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite
actually runs the fork's TurboQuant KV-cache code paths (turbo3 also
auto-enables flash_attention in the fork). Picking q8_0 here would
only re-test the standard llama.cpp path that the upstream llama-cpp
backend already covers.

Refresh the docs (text-generation.md + model-configuration.md) to
list turbo2/turbo3/turbo4 explicitly and call out that you only get
the TurboQuant code path with this backend + a turbo* cache type.

* fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3

The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant)
does not install python3, so the python-based augmentation step
errored with `python3: command not found` at make time. Switch to
awk, a POSIX-standard utility that is already available everywhere
the rest of the wrapper Makefile runs.

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-15 01:25:04 +02:00
Ettore Di Giacinto
410d100cc3 chore(ui): improve visibility of forms, color palette
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 21:53:03 +00:00
Ettore Di Giacinto
833b7e8557 chore(docs): update transcription endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 14:14:54 +00:00
Ettore Di Giacinto
87e6de1989 feat: wire transcription for llama.cpp, add streaming support (#9353)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 16:13:40 +02:00
Ettore Di Giacinto
b361d2ddd6 chore(gallery): add new llama.cpp supported models (qwen-asr, ocr)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 10:04:50 +00:00
Ettore Di Giacinto
1e4c4577bb fix(ci): small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 09:27:27 +00:00
dependabot[bot]
98fd9d5cc6 chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 (#9340)
Bumps [github.com/charmbracelet/glamour](https://github.com/charmbracelet/glamour) from 0.10.0 to 1.0.0.
- [Release notes](https://github.com/charmbracelet/glamour/releases)
- [Commits](https://github.com/charmbracelet/glamour/compare/v0.10.0...v1.0.0)

---
updated-dependencies:
- dependency-name: github.com/charmbracelet/glamour
  dependency-version: 1.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:17:05 +02:00
dependabot[bot]
0c725f5702 chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 (#9344)
Bumps [github.com/swaggo/echo-swagger](https://github.com/swaggo/echo-swagger) from 1.4.1 to 1.5.2.
- [Release notes](https://github.com/swaggo/echo-swagger/releases)
- [Commits](https://github.com/swaggo/echo-swagger/compare/v1.4.1...v1.5.2)

---
updated-dependencies:
- dependency-name: github.com/swaggo/echo-swagger
  dependency-version: 1.5.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:37 +02:00
dependabot[bot]
7661a4ffa5 chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 (#9341)
chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats

Bumps [github.com/testcontainers/testcontainers-go/modules/nats](https://github.com/testcontainers/testcontainers-go) from 0.41.0 to 0.42.0.
- [Release notes](https://github.com/testcontainers/testcontainers-go/releases)
- [Commits](https://github.com/testcontainers/testcontainers-go/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: github.com/testcontainers/testcontainers-go/modules/nats
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:26 +02:00
dependabot[bot]
24ad6e4be1 chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 (#9343)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.3 to 0.21.5.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.21.3...v0.21.5)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:09 +02:00