Compare commits

...

1236 Commits

Author SHA1 Message Date
Richard Palethorpe
9d42a16c20 ci: publish base images to ci-cache instead of localai-base
The previous tag scheme pushed to quay.io/go-skynet/localai-base, which
required a separate quay repo + a write-permission grant for the CI
robot. PR #9672 hit a 401 on push because that grant was missing — the
robot can log in but not write to localai-base.

ci-cache already exists, the robot already has write access (it writes
the buildkit cache there on every backend build), and OCI tags namespace
cleanly within a repo. So publish base images to
quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]. The `base-image-`
prefix doesn't collide with the existing tag prefixes:
  - cache<tag-suffix>           per-backend buildkit cache
  - cache-localai<tag-suffix>   root image buildkit cache
  - base-<stem>                 base image's own buildkit cache
  - base-image-<stem>           the published OCI image (new)

base_images.yml's compute_ref step and prebuiltRef() in
scripts/changed-backends.js are kept in lock-step. Local Makefile tags
are unchanged (they're just local docker labels with no remote
correlation).

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-06 16:13:06 +01:00
Richard Palethorpe
9c1f8b344c ci: extend layered base images to golang, cpp, and rust matrices
Python's tier-1+2 base image (apt + GPU SDK + lang toolchain) was the only
lang previously factored. The remaining 82 matrix entries (62 golang +
9 llama-cpp + 9 turboquant + 1 ik-llama-cpp + 1 rust) still inlined the
same bootstrap into per-backend cache tags.

Add .docker/bases/Dockerfile.{golang,cpp,rust} mirroring Dockerfile.python's
GPU stack, with the lang-specific tail at the bottom (Go + protoc + grpc
tooling; protoc + cmake + GRPC; rustup + audio dev libs respectively).
Slim the five consumer Dockerfiles to FROM ${BASE_IMAGE_PREBUILT} + the
per-backend COPY/make.

The C++ trio (llama-cpp, ik-llama-cpp, turboquant) only differ in their
make targets, so langOf() in scripts/changed-backends.js remaps all three
Dockerfile suffixes to the shared 'cpp' base. That collapses 17 would-be
distinct bases to 8. langTriggerSelector and baseTriggerFiles are
extended so PRs touching the new recipes fan out canaries; the
.docker/bases/ auto-detection picks up the new langs without further
script changes.

Makefile: add docker-build-{python,golang,cpp,rust}-base targets and a
local-base-tag/local-base-target macro pair so each backend's
docker-build-X chains through the right base. The previous python-only
prereq is now a generic per-lang dispatch.

Total distinct bases for the full 234-entry matrix: 29 (was 9 with only
python factored). The C++ base also absorbs the previously per-consumer
GRPC build stage, removing the dominant cost from the llama-cpp /
ik-llama-cpp / turboquant rebuild paths.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-06 16:13:06 +01:00
Richard Palethorpe
a3b7c3a819 ci: layered Python base images for cross-matrix dedup
The 234-entry backend matrix runs the same apt-update + GPU SDK install +
Python toolchain bootstrap into N independent registry-cache tags. Factor
that shared work out into a tier-1+2 base image (lang × accel × ubuntu ×
cuda) built once per workflow run, then consumed by every backend that
matches its tuple via BASE_IMAGE_PREBUILT.

The matrix data moves to .github/backend-matrix.yaml so backend.yml can
switch to fromJSON without duplicating the matrix. scripts/changed-backends.js
reads the data file, derives the deduplicated bases-matrix, annotates each
Python entry with the right base-image-prebuilt ref, and runs a collision
check that fails loudly if a future matrix change makes two consumers want
incompatible bases under the same tag-stem.

PR builds tag with -pr<N> so end-to-end validation lives within one PR;
master builds tag without the suffix. The base-images registry cache
parallels the existing per-matrix-entry caches.

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding
a new language tier is a Dockerfile.<lang> recipe + a slim of the
consumer Dockerfile (script auto-detects via .docker/bases/).

10 distinct bases derive from the current 234 entries, replacing the
inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-06 16:10:49 +01:00
LocalAI [bot]
4e154b59e5 fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 (#9688)
Two unrelated CI breakages bundled together since both are one-liners:

- rerankers: bump torch 2.4.1 -> 2.7.1 on cpu/cublas12. The unpinned
  transformers resolves to 5.x, whose moe.py registers a custom_op with
  string-typed `'torch.Tensor'` annotations that torch 2.4.1's
  infer_schema rejects, blocking the gRPC server from starting and
  failing all 5 backend tests with "Connection refused" on :50051.
  Matches the version used by the transformers backend.

- vllm-omni: strip fa3-fwd from the upstream requirements/cuda.txt
  before resolving on aarch64. fa3-fwd 0.0.3 ships only an
  x86_64 wheel and has no sdist, making the cuda profile unsatisfiable
  on Jetson/SBSA. fa3-fwd is a soft runtime dep — vllm-omni's
  attention backends fall back to FA2 then SDPA when it's missing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 17:07:24 +02:00
Richard Palethorpe
969005b2a1 feat(gallery): Speed up load times and clean gallery entries (#9211)
* feat: Rework VRAM estimation and use known_usecases in gallery

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]

* chore(gallery): regenerate gallery index and add known_usecases to model entries

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-06 14:51:38 +02:00
LocalAI [bot]
6d56bf98fe feat(importers): add vibevoice-cpp importer for GGUF bundles (#9685)
Routes mudler/vibevoice.cpp-models and similar repos to the vibevoice-cpp
backend. Detects via repo name ("vibevoice.cpp"/"vibevoice-cpp"), file
listing (vibevoice-*.gguf + tokenizer.gguf), or preferences.backend
override. Defaults to the realtime TTS model; preferences.usecase=asr
selects the ASR/diarization variant. Bundles the required tokenizer.gguf
and (for TTS) a voice prompt, emitting the Options[] entries the backend
expects. Registered ahead of VibeVoiceImporter so the C++ bundles aren't
swallowed by the older Python-backend substring match.


Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 13:33:10 +02:00
LocalAI [bot]
a8d7d37a3c fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) (#9682)
* fix(docs): correct broken Hugo relrefs

The Hugo build has been failing on master since the relevant pages
landed:

- text-generation.md:720 referenced `/docs/features/distributed-mode`,
  but Hugo `relref` paths are relative to the content root, not the
  rendered URL. Drop the `/docs/` prefix so the lookup matches the
  existing `features/...` form used elsewhere in the file.
- audio-transform.md:144 referenced `tts.md`; the actual page is
  `text-to-audio.md`.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(kokoros): stub Diarize and AudioTransform Backend trait methods

The recent backend.proto additions (Diarize, AudioTransform,
AudioTransformStream) extended the gRPC Backend trait, breaking
kokoros-grpc compilation with E0046 because the Rust implementation
hadn't picked up the new methods. Add Unimplemented stubs matching the
existing pattern for non-applicable RPCs in this TTS-only backend.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(vibevoice-cpp): track upstream ABI + wire 1.5B voice cloning

Two recent commits in mudler/vibevoice.cpp reshaped the vv_capi_tts
signature without a corresponding bump on the LocalAI side:

  3bd759c "1.5b: unify into a single tts entry point" inserted a
          ref_audio_path parameter between voice_path and dst_wav_path.
  ad856bd "1.5b: multi-speaker dialog support" promoted that to a
          (const char* const* ref_audio_paths, int n_ref_audio_paths)
          pair for per-speaker conditioning.

Because purego resolves symbols by name and not by signature, the
build kept linking; at runtime the misaligned arguments turned the
TTS->ASR closed-loop test into a SIGSEGV inside cgo. Track HEAD
explicitly and bring the bridge in line with it:

  * Update the CppTTS purego binding to the 9-arg form. purego
    marshals []*byte as a **char by handing the C side the underlying
    array address; nil/empty maps to NULL, which matches the C
    contract for "no reference audio" on the realtime-0.5B path.
  * Add a `ref_audio` gallery option (comma-separated, repeatable)
    that the 1.5B path consumes for runtime voice cloning. Multiple
    entries are interpreted as one WAV per speaker (Speaker 0..n-1).
  * TTSRequest.Voice now routes by extension/shape: `.wav` or a
    comma-separated list goes to ref_audio_paths; anything else stays
    on voice_path (realtime-0.5B's pre-baked voice gguf).
  * Pin VIBEVOICE_CPP_VERSION to ad856bd and wire the Makefile into
    the existing bump_deps matrix so future upstream rolls land as
    reviewable PRs instead of a silent CI break.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(vibevoice-cpp): use ModelOptions.AudioPath for 1.5B ref audio

Use the existing audio_path field from ModelOptions (already plumbed
through config_file's `audio_path:` YAML and consumed by other audio
backends like kokoros) instead of inventing a custom `ref_audio:`
Options[] string. Multi-speaker setups stay on a single comma-
separated value.

No behavior change beyond the gallery key name; per-call routing via
TTSRequest.Voice is unchanged.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 10:36:59 +02:00
LocalAI [bot]
06a1524155 chore(model gallery): 🤖 add 1 new models via gallery agent (#9681)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-06 08:47:40 +02:00
LocalAI [bot]
70cf8ac546 fix(backend): resolve relative draft_model paths against the models dir (#9680)
* fix(backend): resolve relative draft_model paths against the models dir

The main model file and mmproj are joined with the configured models
directory before reaching the backend, but draft_model was sent
verbatim. With a relative draft_model in the YAML config, llama.cpp
opens the path from the backend process's CWD and fails with "No such
file or directory", forcing users to hard-code an absolute path.

Mirror the existing mmproj resolution: if draft_model is relative,
join it with modelPath. Absolute paths are passed through unchanged.

Adds an e2e regression test against the mock backend that asserts the
main model file, mmproj, and draft_model all arrive at the backend
resolved to absolute paths.

Closes #9675

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7-1m [Read] [Edit] [Bash] [Write]

* fix(backend): always join draft_model with models dir (drop IsAbs shortcut)

The previous commit kept absolute draft_model paths intact via an
IsAbs check. That left a path-traversal vector open: a user-supplied
YAML config could set draft_model to /etc/passwd (or any other host
file the backend process can read) and the path would be sent through
unchanged.

filepath.Join cleans the leading slash from absolute components, so
joining unconditionally — the way mmproj already does — keeps the
result rooted at the configured models directory regardless of input.

Adds a second e2e spec that feeds an absolute draft_model into the
mock backend and asserts the path is clamped under modelsPath.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7-1m [Read] [Edit] [Bash]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 00:58:38 +02:00
LocalAI [bot]
7fab5e3d21 chore: ⬆️ Update ggml-org/whisper.cpp to 4bf733672b2871d4153158af4f621a6dd9104f4a (#9636)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-06 00:34:16 +02:00
Andreas Egli
af83518532 feat: support word-level timestamps for faster-whisper (#9621)
Signed-off-by: Andreas Egli <github@kharan.ch>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-05-06 00:32:52 +02:00
LocalAI [bot]
a315c321c1 chore: ⬆️ Update TheTom/llama-cpp-turboquant to 69d8e4be47243e83b3d0d71e932bc7aa61c644dc (#9638)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-06 00:29:05 +02:00
Ettore Di Giacinto
75fba9e03f fix(distributed): scope Upgrade All to nodes that have the backend installed (#9678)
In distributed mode the React UI's "Upgrade All" button fanned every
detected outdated backend out to every healthy backend node, including
nodes that never had that backend installed. On heterogeneous clusters
this surfaced as platform errors (e.g. mac-mini-m4 asked to upgrade
cpu-insightface-development, which has no darwin/arm64 variant) and left
forever-retrying pending_backend_ops rows.

DistributedBackendManager.UpgradeBackend now queries ListBackends()
first, builds the target node-ID set from SystemBackend.Nodes, and only
fans out to those nodes — every per-node primitive
(adapter.InstallBackend, the pending-ops queue, BackendOpResult) is
unchanged. enqueueAndDrainBackendOp gains an optional targetNodeIDs
allowlist; Install/Delete keep their fan-to-everyone semantics by
passing nil. If no node reports the backend installed, UpgradeBackend
now returns a clear "not installed on any node" error instead of
producing a stuck queue.

Adds Ginkgo coverage for the smart fan-out: backend on a subset of
nodes goes only to those nodes; backend on no node returns the new
error and never sends a NATS install request.


Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 00:28:41 +02:00
Richard Palethorpe
16b2d4c807 fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)
Two related runtime fixes for Python backends that JIT-compile CUDA
kernels at first model load (FlashInfer, PyTorch inductor, triton):

1. libbackend.sh: replace `source ${EDIR}/venv/bin/activate` with a
   minimal manual setup (_activateVenv: export VIRTUAL_ENV, prepend
   PATH, unset PYTHONHOME) computed from $EDIR at runtime. `uv venv`
   and `python -m venv` both bake the create-time absolute path into
   bin/activate (e.g. VIRTUAL_ENV='/vllm/venv' from the Docker build
   stage), so sourcing activate on a relocated venv — copied out of
   the build container and unpacked at an arbitrary backend dir —
   prepends a stale, non-existent path to $PATH. Pip-installed CLI
   tools (e.g. ninja, used by FlashInfer's NVFP4 GEMM JIT) are then
   never found and the load aborts with FileNotFoundError. Doing the
   env setup ourselves matches what `uv run` does internally and
   sidesteps the relocation problem entirely. Generic — every Python
   backend benefits.

2. vllm/run.sh: replace ninja's default -j$(nproc)+2 with an adaptive
   MAX_JOBS = min(nproc, (MemAvailable-4)/4). Each concurrent
   nvcc/cudafe++ peaks at multiple GiB; the default OOM-kills on
   memory-tight hosts (e.g. a 16 GiB desktop loading a 27B NVFP4
   model) but underutilises 100-core / 1 TB boxes. User-set MAX_JOBS
   still wins. Also pin NVCC_THREADS=2 unless overridden.

Refs: https://github.com/vllm-project/vllm/issues/20079

Assisted-by: Claude:claude-opus-4-7 [Edit] [Bash]
2026-05-06 00:28:01 +02:00
Richard Palethorpe
8e43842175 feat(vllm, distributed): tensor parallel distributed workers (#9612)
* feat(vllm): build vllm from source for Intel XPU

Upstream publishes no XPU wheels for vllm. The Intel profile was
silently picking up a non-XPU wheel that imported but errored at
engine init, and several runtime deps (pillow, charset-normalizer,
chardet) were missing on Intel -- backend.py crashed at import time
before the gRPC server came up.

Switch the Intel profile to upstream's documented from-source
procedure (docs/getting_started/installation/gpu.xpu.inc.md in
vllm-project/vllm):

  - Bump portable Python to 3.12 -- vllm-xpu-kernels ships only a
    cp312 wheel.
  - Source /opt/intel/oneapi/setvars.sh so vllm's CMake build sees
    the dpcpp/sycl compiler from the oneapi-basekit base image.
  - Hide requirements-intel-after.txt during installRequirements
    (it used to 'pip install vllm'); install vllm's deps from a
    fresh git clone of vllm via 'uv pip install -r
    requirements/xpu.txt', swap stock triton for
    triton-xpu==3.7.0, then 'VLLM_TARGET_DEVICE=xpu uv pip install
    --no-deps .'.
  - requirements-intel.txt trimmed to LocalAI's direct deps
    (accelerate / transformers / bitsandbytes); torch-xpu, vllm,
    vllm_xpu_kernels and the rest come from upstream's xpu.txt
    during the source build.
  - requirements.txt: add pillow + charset-normalizer + chardet --
    used by backend.py and missing on the Intel install profile.
  - run.sh: 'set -x' so backend startup is visible in container
    logs (the gRPC startup error path was previously opaque).

Also adds a one-line docs example for engine_args.attention_backend
under the vLLM section, since older XE-HPG GPUs (e.g. Arc A770)
need TRITON_ATTN to bypass the cutlass path in vllm_xpu_kernels.

Tested end-to-end on an Intel Arc A770 with Qwen2.5-0.5B-Instruct
via LocalAI's /v1/chat/completions.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(vllm): add multi-node data-parallel follower worker

vLLM v1's multi-node story is one process per node sharing a DP
coordinator over ZMQ -- the head runs the API server with
data_parallel_size > 1 and followers run `vllm serve --headless ...`
with matching topology. Today LocalAI can already configure DP on the
head via the engine_args YAML map, but there's no way to bring up the
follower nodes -- so the head sits waiting for ranks that never
handshake.

Add `local-ai p2p-worker vllm`, mirroring MLXDistributed's structural
precedent (operator-launched, static config, no NATS placement). The
worker:

  - Optionally self-registers with the frontend as an agent-type node
    tagged `node.role=vllm-follower` so it's visible in the admin UI
    and operators can scope ordinary models away via inverse
    selectors.
  - Resolves the platform-specific vllm backend via the gallery's
    "vllm" meta-entry (cuda*, intel-vllm, rocm-vllm, ...).
  - Runs vLLM as a child process so the heartbeat goroutine survives
    until vLLM exits; forwards SIGINT/SIGTERM so vLLM can clean up its
    ZMQ sockets before we tear down.
  - Validates --headless + --start-rank 0 is rejected (rank 0 is the
    head and must serve the API).

Backend run.sh dispatches `serve` as the first arg to vllm's own CLI
instead of LocalAI's backend.py gRPC server -- the follower speaks
ZMQ directly to the head, there is no LocalAI gRPC on the follower
side. Single-node usage is unchanged.

Generalises the gallery resolution helper into findBackendPath()
shared by MLX and vLLM workers; extracts ParseNodeLabels for the
comma-separated label parsing both use.

Ships with two compose recipes (`docker-compose.vllm-multinode.yaml`
for NVIDIA, `docker-compose.vllm-multinode.intel.yaml` for Intel
XPU/xccl) plus `tests/e2e/vllm-multinode/smoke.sh`. Both vendors are
supported (NCCL for CUDA/ROCm, xccl for XPU) but mixed-vendor DP is
not -- PyTorch's process group requires every rank to use the same
collective backend, and NCCL/xccl/gloo don't interoperate.

Out of scope (deferred): SmartRouter-driven placement of follower
ranks via NATS backend.install events, follower log streaming through
/api/backend-logs, tensor-parallel across nodes, disaggregated
prefill via KVTransferConfig.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* test(vllm): CPU-only end-to-end test for multi-node DP

Adds tests/e2e/vllm-multinode/, a Ginkgo + testcontainers-go suite
that brings up a head + headless follower from the locally-built
local-ai:tests image, bind-mounts the cpu-vllm backend extracted by
make extract-backend-vllm so it's seen as a system backend (no gallery
fetch, no registry server), and asserts a chat completion across both
DP ranks. New `make test-e2e-vllm-multinode` target wires the docker
build, backend extract, and ginkgo run together; BuildKit caches both
images so re-runs only rebuild what changed. Tagged Label("VLLMMultinode")
so the existing distributed suite isn't pulled along.

Two pre-existing bugs surfaced by the test:

1. extract-backend-% (Makefile) failed for every backend, because all
   backend images end with `FROM scratch` and `docker create` rejects
   an image with no CMD/ENTRYPOINT. Fixed by passing
   --entrypoint=/run.sh -- the container is never started, only
   docker-cp'd, so the path doesn't have to exist; we just need
   anything that satisfies the daemon's create-time validation.

2. backend/python/vllm/run.sh's `serve` shortcut for the multi-node DP
   follower exec'd ${EDIR}/venv/bin/vllm directly, but uv bakes an
   absolute build-time shebang (`#!/vllm/venv/bin/python3`) that no
   longer resolves once the backend is relocated to BackendsPath.
   _makeVenvPortable's shebang rewriter only matches paths that
   already point at ${EDIR}, so the original shebang slips through
   unchanged. Fixed by exec-ing ${EDIR}/venv/bin/python with the script
   as an argument -- Python ignores the script's shebang in that case.

The test fixture caps memory aggressively (max_model_len=512,
VLLM_CPU_KVCACHE_SPACE=1, TORCH_COMPILE_DISABLE=1) so two CPU engines
fit on a 32 GB box. TORCH_COMPILE_DISABLE is currently mandatory for
cpu-vllm: torch._inductor's CPU-ISA probe runs even with
enforce_eager=True and needs g++ on PATH, which the LocalAI runtime
image doesn't ship -- to be addressed in a follow-up that bundles a
toolchain in the cpu-vllm backend.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(vllm): bundle a g++ toolchain in the cpu-vllm backend image

torch._inductor's CPU-ISA probe (`cpu_model_runner.py:65 "Warming up
model for the compilation"`) shells out to `g++` at vllm engine
startup, regardless of `enforce_eager=True` -- the eager flag only
disables CUDA graphs, not inductor's first-batch warmup. The LocalAI
CPU runtime image (Dockerfile, unconditional apt list) does not ship
build-essential, and the cpu-vllm backend image is `FROM scratch`,
so any non-trivial inference on cpu-vllm crashes with:

  torch._inductor.exc.InductorError:
    InvalidCxxCompiler: No working C++ compiler found in
    torch._inductor.config.cpp.cxx: (None, 'g++')

Bundling the toolchain in the CPU runtime image would bloat every
non-vllm-CPU deployment and force a single GCC version on backends
that may want clang or a different version. So this lives in the
backend, gated to BUILD_TYPE=='' (the CPU profile).

`package.sh` snapshots g++ + binutils + cc1plus + libstdc++ + libc6
(runtime + dev) + the math libs cc1plus links (libisl/libmpc/libmpfr/
libjansson) into ${BACKEND}/toolchain/, mirroring /usr/... layout. The
unversioned binaries on Debian/Ubuntu are symlink chains pointing into
multiarch packages (`g++` -> `g++-13` -> `x86_64-linux-gnu-g++-13`,
the latter in `g++-13-x86-64-linux-gnu`), so the package list resolves
both the version and the arch-triplet variant. Symlinks /lib ->
usr/lib and /lib64 -> usr/lib64 are recreated under the toolchain
root because Ubuntu's UsrMerge keeps them at /, and ld scripts
(`libc.so`, `libm.so`) hardcode `/lib/...` paths that --sysroot
re-roots into the toolchain.

The unversioned `g++`/`gcc`/`cpp` symlinks are replaced with wrapper
shell scripts that resolve their own location at runtime and pass
`--sysroot=<toolchain>` and `-B <toolchain>/usr/lib/gcc/<triplet>/<ver>/`
to the underlying versioned binary. That's how torch's bare `g++ foo.cpp
-o foo` invocation finds cc1plus (-B), system headers (--sysroot), and
the bundled libstdc++ (--sysroot, --sysroot is recursive into linker).

`run.sh` adds the toolchain bin dir to PATH and the toolchain's
shared-lib dir to LD_LIBRARY_PATH -- everything else (header search,
linker search, executable search) is encapsulated in the wrappers.
No-op for non-CPU builds, the dir doesn't exist there.

The cpu-vllm image grows by ~217 MB. Tradeoff is acceptable -- cpu-vllm
is already a niche profile (few users compared to GPU vllm) and the
alternative is a backend that crashes at first inference unless the
operator manually sets TORCH_COMPILE_DISABLE=1, which silently disables
all torch.compile optimizations.

Drops `TORCH_COMPILE_DISABLE=1` from tests/e2e/vllm-multinode -- the
smoke now exercises the real compile path through the bundled toolchain.
Test runtime is +20s for the warmup compile, still <90s end to end.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(vllm): scope jetson-ai-lab index to L4T-specific wheels via pyproject.toml

The L4T arm64 build resolves dependencies through pypi.jetson-ai-lab.io,
which hosts the L4T-specific torch / vllm / flash-attn wheels but also
transparently proxies the rest of PyPI through `/+f/<sha>/<filename>`
URLs. With `--extra-index-url` + `--index-strategy=unsafe-best-match`
uv would pick those proxy URLs for ordinary PyPI packages —
anthropic/openai/propcache/annotated-types — and fail when the proxy
503s. Master is hitting the same bug on its own l4t-vllm matrix entry.

Switch the l4t13 install path to a pyproject.toml that marks the
jetson-ai-lab index `explicit = true` and pins only torch, torchvision,
torchaudio, flash-attn, and vllm to it via [tool.uv.sources]. uv won't
consult the L4T mirror for anything else, so transitive deps fall back
to PyPI as the default index — no exposure to the proxy 503s.

`uv pip install -r requirements.txt` ignores [tool.uv.sources], so the
l4t13 branch in install.sh now invokes `uv pip install --requirement
pyproject.toml` directly, replacing the old requirements-l4t13*.txt
files. Other BUILD_PROFILEs continue using libbackend.sh's
installRequirements and never read pyproject.toml.

Local resolution test (x86_64, dry-run) confirms uv hits the L4T
index for torch and falls through to PyPI for everything else.

Assisted-by: claude-code:claude-opus-4-7-1m [Read] [Edit] [Bash] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-06 00:22:50 +02:00
Arkadiusz Tymiński
503904d311 fix(faster-whisper): cast segment timestamps to int after multiplication (#9674)
`int(x) * 1e9` returns a float because `1e9` is a float literal, but
TranscriptSegment.start/end are integer protobuf fields. This caused
every transcription request to fail with:

  TypeError: 'float' object cannot be interpreted as an integer

Multiply first, then cast — `int(x * 1e9)` — to get an int as required.
2026-05-05 23:46:39 +02:00
LocalAI [bot]
d5ce823b83 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8b56d813a9ed04fa7b7fe2588fddd845cf64eccb (#9677)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 23:46:09 +02:00
LocalAI [bot]
c9141098b6 chore: ⬆️ Update ggml-org/llama.cpp to bbeb89d76c41bc250f16e4a6fefcc9b530d6e3f3 (#9676)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 23:45:54 +02:00
dependabot[bot]
1caab1de10 chore(deps): bump actions/checkout from 4 to 6 (#9663)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-05 15:37:05 +02:00
Ettore Di Giacinto
e86ade54a6 feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654)
* feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp

Closes #1648.

OpenAI-style multipart endpoint that returns "who spoke when". Single
endpoint instead of the issue's three-endpoint sketch (refactor /vad,
/vad/embedding, /diarization) — the typical client wants one call, and
embeddings can land later as a sibling without breaking this surface.

Response shape borrows from Pyannote/Deepgram: segments carry a
normalised SPEAKER_NN id (zero-padded, stable across the response) plus
the raw backend label, optional per-segment text when the backend bundles
ASR, and a speakers summary in verbose_json. response_format also accepts
rttm so consumers can pipe straight into pyannote.metrics / dscore.

Backends:

* vibevoice-cpp — Diarize() reuses the existing vv_capi_asr pass.
  vibevoice's ASR prompt asks the model to emit
  [{Start,End,Speaker,Content}] natively, so diarization is a by-product
  of the same pass; include_text=true preserves the transcript per
  segment, otherwise we drop it.

* sherpa-onnx — wraps the upstream SherpaOnnxOfflineSpeakerDiarization
  C API (pyannote segmentation + speaker-embedding extractor + fast
  clustering). libsherpa-shim grew config builders, a SetClustering
  wrapper for per-call num_clusters/threshold overrides, and a
  segment_at accessor (purego can't read field arrays out of
  SherpaOnnxOfflineSpeakerDiarizationSegment[] directly).

Plumbing: new Diarize gRPC RPC + DiarizeRequest / DiarizeSegment /
DiarizeResponse messages, threaded through interface.go, base, server,
client, embed. Default Base impl returns unimplemented.

Capability surfaces all updated: FLAG_DIARIZATION usecase,
FeatureAudioDiarization permission (default-on), RouteFeatureRegistry
entries for /v1/audio/diarization and /audio/diarization, audio
instruction-def description widened, CAP_DIARIZATION JS symbol,
swagger regenerated, /api/instructions discovery map updated.

Tests:

* core/backend: speaker-label normalisation (first-seen → SPEAKER_NN,
  per-speaker totals, nil-safety, fallback to backend NumSpeakers when
  no segments).

* core/http/endpoints/openai: RTTM rendering (file-id basename, negative
  duration clamping, fallback id).

* tests/e2e: mock-backend grew a deterministic Diarize that emits
  raw labels "5","2","5" so the e2e suite verifies SPEAKER_NN
  remapping, verbose_json speakers summary + transcript pass-through
  (gated by include_text), RTTM bytes content-type, and rejection of
  unknown response_format. mock-diarize model config registered with
  known_usecases=[FLAG_DIARIZATION] to bypass the backend-name guard.

Docs: new features/audio-diarization.md (request/response, RTTM example,
sherpa-onnx + vibevoice setup), cross-link from audio-to-text.md, entry
in whats-new.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* fix(diarization): correct sherpa-onnx symbol name + lint cleanup

CI failures on #9654:

* sherpa-onnx-grpc-{tts,transcription} and sherpa-onnx-realtime panicked
  at backend startup with `undefined symbol: SherpaOnnxDestroyOfflineSpeakerDiarizationResult`.
  Upstream's actual symbol is SherpaOnnxOfflineSpeakerDiarizationDestroyResult
  (Destroy in the middle, not the prefix); the rest of the diarization
  surface follows the same naming pattern. The mismatched name made
  purego.RegisterLibFunc fail at dlopen time and crashed the gRPC server
  before the BeforeAll could probe Health, taking down every sherpa-onnx
  test job — not just the diarization-related ones.

* golangci-lint flagged 5 errcheck violations on new defer cleanups
  (os.RemoveAll / Close / conn.Close); wrap each in a `defer func() { _ = X() }()`
  closure (matches the pattern other LocalAI files use for new code, since
  pre-existing bare defers are grandfathered in via new-from-merge-base).

* golangci-lint also flagged forbidigo violations: the new
  diarization_test.go files used testing.T-style `t.Errorf` / `t.Fatalf`,
  which are forbidden by the project's coding-style policy
  (.agents/coding-style.md). Convert both files to Ginkgo/Gomega
  Describe/It with Expect(...) — they get picked up by the existing
  TestBackend / TestOpenAI suites, no new suite plumbing needed.

* modernize linter: tightened the diarization segment loop to
  `for i := range int(numSegments)` (Go 1.22+ idiom).

Verified locally: golangci-lint with new-from-merge-base=origin/master
reports 0 issues across all touched packages, and the four mocked
diarization e2e specs in tests/e2e/mock_backend_test.go still pass.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* fix(vibevoice-cpp): convert non-WAV input via ffmpeg + raise ASR token budget

Confirmed end-to-end against a real LocalAI instance with vibevoice-asr-q4_k
loaded and the multi-speaker MP3 sample at vibevoice.cpp/samples/2p_argument.mp3:
both /v1/audio/transcriptions and /v1/audio/diarization now succeed and
return correctly attributed speaker turns for the full clip.

Two latent issues surfaced once the diarization endpoint actually exercised
the backend with a non-trivial input:

1. vv_capi_asr only accepts WAV via load_wav_24k_mono. The previous code
   passed the uploaded path straight through, so anything that wasn't
   already a 24 kHz mono s16le WAV failed at the C side with rc=-8 and
   the very unhelpful "vv_capi_asr failed". prepareWavInput shells out
   to ffmpeg ("-ar 24000 -ac 1 -acodec pcm_s16le") in a per-call temp
   dir, matching the rate the model was trained on; both AudioTranscription
   and Diarize now route through it. This is the same shape sherpa-onnx
   uses (utils.AudioToWav), but vibevoice needs 24 kHz rather than 16 kHz
   so we don't reuse that helper.

2. The C ABI's max_new_tokens defaults to 256 when 0 is passed. That's
   fine for a five-second clip but not for anything past ~10 s — vibevoice
   stops mid-JSON, the parse fails, and the caller sees a hard error.
   Pass a much larger budget (16 384 ≈ ~9 minutes of speech at the
   model's ~30 tok/s rate); generation stops at EOS so this is a cap
   rather than a target.

3. As a defensive belt-and-braces, mirror AudioTranscription's existing
   "fall back to a single segment if the model emits non-JSON text"
   pattern in Diarize, so partial / unusual model output never produces
   a 500. This kept the endpoint usable while diagnosing (1) and (2),
   and is the right behaviour to keep.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* fix(vibevoice-cpp): pass valid WAVs through directly so ffmpeg is not required at runtime

Spotted by tests-e2e-backend (1.25.x): the previous fix forced every
incoming audio file through `ffmpeg -ar 24000 ...`, which meant the
backend container — which does not ship ffmpeg — failed even for the
existing happy path where the caller already uploads a WAV. The
container-side error was:

    rpc error: code = Unknown desc = vibevoice-cpp: ffmpeg convert to
    24k mono wav: exec: "ffmpeg": executable file not found in $PATH

Reading vibevoice.cpp's audio_io.cpp, `load_wav_24k_mono` uses drwav and
already accepts any PCM/IEEE-float WAV at any sample rate, downmixes
multi-channel input to mono, and resamples to 24 kHz internally. So the
only inputs that genuinely need an external converter are non-WAV
formats (MP3, OGG, FLAC, ...).

Detect WAVs by RIFF/WAVE magic at bytes 0..3 / 8..11 and pass them
straight through with a no-op cleanup; everything else still goes
through ffmpeg with the same 24 kHz mono s16le target. The result:

* Container builds without ffmpeg keep working for WAV uploads
  (the e2e-backends fixture is jfk.wav at 16 kHz mono s16le).
* MP3 and other non-WAV inputs still get the new ffmpeg conversion
  path so the diarization endpoint stays useful.
* If the caller uploads a non-WAV but ffmpeg isn't on PATH, the
  surfaced error is still descriptive enough to act on.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* fix(ci): make gcc-14 install in Dockerfile.golang best-effort for jammy bases

The LocalVQE PR (bb033b16) made `gcc-14 g++-14` an unconditional apt
install in backend/Dockerfile.golang and pointed update-alternatives at
them. That works on the default `BASE_IMAGE=ubuntu:24.04` (noble has
gcc-14 in main), but every Go backend that builds on
`nvcr.io/nvidia/l4t-jetpack:r36.4.0` — jammy under the hood — now fails
at the apt step:

    E: Unable to locate package gcc-14

This blocked unrelated jobs:
backend-jobs(*-nvidia-l4t-arm64-{stablediffusion-ggml, sam3-cpp, whisper,
acestep-cpp, qwen3-tts-cpp, vibevoice-cpp}). LocalVQE itself is only
matrix-built on ubuntu:24.04 (CPU + Vulkan), so it doesn't actually
need gcc-14 anywhere else.

Make the gcc-14 install conditional on the package being available in
the configured apt repos. On noble: identical behaviour to today (gcc-14
installed, update-alternatives points at it). On jammy: skip the
gcc-14 stanza entirely and let build-essential's default gcc take over,
which is what the other Go backends compile with anyway.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-05 15:10:13 +02:00
LocalAI [bot]
1634eece6b chore: ⬆️ Update ikawrakow/ik_llama.cpp to 45dfd80371785731bc2ed05a76252497a4e7a282 (#9644)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 15:09:40 +02:00
LocalAI [bot]
b88ddce0f3 chore: ⬆️ Update ggml-org/llama.cpp to eff06702b2a52e1020ea009ebd86cb9f5acabab5 (#9637)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 09:52:28 +02:00
Ettore Di Giacinto
bbcaebc1ef feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)
* feat(concurrency-groups): per-model exclusive groups for backend loading

Adds `concurrency_groups: [...]` to model YAML configs. Two models that share
a group cannot be loaded concurrently on the same node — loading one evicts
the others, reusing the existing pinned/busy/retry policy from LRU eviction.

Layered design:
- Watchdog (pkg/model): per-node correctness floor — on every Load(), evict
  any loaded model that shares a group with the requested one. Pinned skips
  surface NeedMore so the loader retries (and ultimately logs a clear
  warning), instead of silently allowing the rule to be violated.
- Distributed scheduler (core/services/nodes): soft anti-affinity hint —
  scheduleNewModel prefers nodes that don't already host a same-group
  model, falling back to eviction only if every candidate has a conflict.
  Composes with NodeSelector at the same point in the candidate pipeline.

Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy
models running on different nodes is fine. The ConfigLoader is wired into
SmartRouter via a small ConcurrencyConflictResolver interface so the nodes
package keeps a narrow surface on core/config.

Refactors the inner LRU eviction body into a shared collectEvictionsLocked
helper and the loader retry loop into retryEnforce(fn, maxRetries, interval),
so both LRU and group enforcement share busy/pinned/retry semantics.

Closes #9659.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(watchdog): sync pinned + concurrency_groups at startup

The startup-time watchdog setup lives in initializeWatchdog (startup.go),
not in startWatchdog (watchdog.go). The latter is only invoked from the
runtime-settings RestartWatchdog path. As a result, neither
SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot,
so `pinned: true` and `concurrency_groups: [...]` only became effective
after a settings-driven watchdog restart.

Fix by adding both sync calls to initializeWatchdog. Confirmed end-to-end:
loading model A in group "heavy", then C with no group (coexists),
then B in group "heavy" now correctly evicts A and leaves [B, C].

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec

CI lint runs new-from-merge-base, so the existing pre-existing
`defer os.Remove(tmp.Name())` lines are baseline-grandfathered but the
one introduced by the concurrency_groups YAML round-trip test is held
to errcheck. Wrap the remove in a closure that discards the error.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-05 08:42:50 +02:00
dependabot[bot]
22ae415695 chore(deps): bump docs/themes/hugo-theme-relearn from f69a085 to 8bb66fa (#9665)
chore(deps): bump docs/themes/hugo-theme-relearn

Bumps [docs/themes/hugo-theme-relearn](https://github.com/McShelby/hugo-theme-relearn) from `f69a085` to `8bb66fa`.
- [Release notes](https://github.com/McShelby/hugo-theme-relearn/releases)
- [Commits](f69a085322...8bb66fa674)

---
updated-dependencies:
- dependency-name: docs/themes/hugo-theme-relearn
  dependency-version: 8bb66fa674351f3a0b0917a7552caac686eca920
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-05 08:42:27 +02:00
LocalAI [bot]
3a0164670e chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.20.1 (#9649)
⬆️ Update vllm-project/vllm cu130 wheel

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 08:41:55 +02:00
LocalAI [bot]
a91b05907c feat(swagger): update swagger (#9660)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 01:50:17 +02:00
LocalAI [bot]
4ef45bbccd chore(model-gallery): ⬆️ update checksum (#9661)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-05 00:12:22 +02:00
Beshoy Girgis
b224a3d931 deps: update quic-go to v0.59.0 (fix session ticket panic) (#9655)
Update quic-go from v0.54.1 to v0.59.0 to fix the crypto/tls session
ticket panic described in quic-go/quic-go#5572.

Co-dependency go-libp2p upgraded from v0.43.0 to v0.48.0 (required for
quic-go v0.59.0 compatibility).

Signed-off-by: Beshoy Girgis <shoy@1ds.us>
2026-05-04 22:07:42 +02:00
Richard Palethorpe
bb033b16a9 feat: add LocalVQE backend and audio transformations UI (#9640)
feat(audio-transform): add LocalVQE backend, bidi gRPC RPC, Studio UI

Introduce a generic "audio transform" capability for any audio-in / audio-out
operation (echo cancellation, noise suppression, dereverberation, voice
conversion, etc.) and ship LocalVQE as the first backend implementation.

Backend protocol:
- Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and
  bidirectional AudioTransformStream for low-latency frame-by-frame use.
  This is the first bidi stream in the proto; per-frame unary at LocalVQE's
  16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server,
  embed,interface,base} with paired-channel ergonomics.

LocalVQE backend (backend/go/localvqe/):
- Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream
  shared lib + its libggml-cpu-*.so runtime variants directly — no MODULE
  wrapper needed because LocalVQE handles CPU feature selection internally
  via GGML_BACKEND_DL.
- Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it
  LocalVQE runs single-threaded at ~1× realtime instead of the documented
  ~9.6×.
- Reference-length policy: zero-pad short refs, truncate long ones (the
  trailing portion can't have leaked into a mic that wasn't recording).
- Ginkgo test suite (9 always-on specs + 2 model-gated).

HTTP layer:
- POST /audio/transformations (alias /audio/transform): multipart batch
  endpoint, accepts audio + optional reference + params[*]=v form fields.
  Persists inputs alongside the output in GeneratedContentDir/audio so the
  React UI history can replay past (audio, reference, output) triples.
- GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames
  (interleaved stereo mic+ref in, mono out). JSON session.update envelope
  for config; constants hoisted in core/schema/audio_transform.go.
- ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing
  utils.AudioToWav (with passthrough fast-path), so the user can upload any
  format / rate without seeing the model's strict 16 kHz constraint.
- BackendTraceAudioTransform integration so /api/backend-traces and the
  Traces UI light up with audio_snippet base64 and timing.
- Routes registered under routes/localai.go (LocalAI extension; OpenAI has
  no /audio/transformations endpoint), traced via TraceMiddleware.

Auth + capability + importer:
- FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on,
  in APIFeatures), three RouteFeatureRegistry rows.
- localvqe added to knownPrefOnlyBackends with modality "audio-transform".
- Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on
  huggingface.co/LocalAI-io/LocalVQE).

React UI:
- New /app/transform page surfaced via a dedicated "Enhance" sidebar
  section (sibling of Tools / Biometrics) — the page is enhancement, not
  generation, so it lives outside Studio. Two AudioInput components
  (Upload + Record tabs, drag-drop, mic capture).
- Echo-test button: records mic while playing the loaded reference through
  the speakers — the mic naturally picks up speaker bleed, giving a real
  (mic, ref) pair for AEC testing without leaving the UI.
- Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls)
  and useAudioPeaks hook (shared module-scoped AudioContext to avoid
  hitting browser context limits with three players on one page); migrated
  TTS, Sound, Traces audio blocks to use it.
- Past runs saved in localStorage via useMediaHistory('audio-transform') —
  the history entry stores all three URLs so clicking re-renders the full
  triple, not just the output.

Build + e2e:
- 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm,
  SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those
  two and let GPU-class hardware route through Vulkan in the gallery
  capabilities map.
- tests-localvqe-grpc-transform job in test-extra.yml (gated on
  detect-changes.outputs.localvqe).
- New audio_transform capability + 4 specs in tests/e2e-backends.
- Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js
  (8 specs covering tabs, file upload, multipart shape, history, errors).

Docs:
- New docs/content/features/audio-transform.md covering the (audio,
  reference) mental model, batch + WebSocket wire formats, LocalVQE param
  keys, and a YAML config example. Cross-links from text-to-audio and
  audio-to-text feature pages.

Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-04 22:07:11 +02:00
LocalAI [bot]
de83b72bb7 fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam (#9657)
* fix(nodes/health): skip stale-marking already-offline nodes

The health monitor re-emitted "Node heartbeat stale" + "Marking stale
node offline" + MarkOffline on every cycle for nodes that were already
in the offline (or unhealthy) state. For an operator-stopped node this
flooded the logs with the same WARN+INFO pair every check interval.

Skip the staleness branch when the node is already StatusOffline /
StatusUnhealthy — the state is already what we'd write, so neither the
log lines nor the DB update carry information.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(worker): wait for backend gRPC bind before replying to backend.install

The backend supervisor used to wait up to 4s (20 × 200ms) for the
backend's gRPC server to answer a HealthCheck, then log a warning and
reply Success with the bind address anyway. On slower nodes (a Jetson
Orin doing first-boot CUDA init, large CGO library load) the gRPC
listener wasn't up yet, so the frontend's first LoadModel dial returned
"connect: connection refused" and the operator chased a phantom network
issue instead of a startup-timing one.

Two changes:

  - Bump the readiness window to 30s. CUDA init on Orin/Thor first boot
    measures in seconds, not milliseconds.
  - On deadline-exceeded, stop the half-started process, recycle the
    port, and return an error with the backend's stderr tail. The
    frontend now gets a real failure with diagnostic context instead of
    a misleading ECONNREFUSED on a downstream dial.

Process death during the wait window keeps its existing fast-fail path.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed): route auto-upgrade through BackendManager + bump LocalAGI/LocalRecall

Two distributed-mode bugs that surfaced together in the orchestrator
logs:

1. Auto-upgrade always failed with "backend not found".

   UpgradeChecker correctly routed CheckUpgrades through the active
   BackendManager (so the frontend aggregates worker state), but the
   auto-upgrade branch right below called gallery.UpgradeBackend
   directly with the frontend's SystemState. In distributed mode the
   frontend has no backends installed locally, so ListSystemBackends
   returned empty and Get(name) failed for every reported upgrade.
   Auto-upgrade now also goes through BackendManager.UpgradeBackend,
   which fans out to workers via NATS.

2. Embedding-load failure on a remote node crashed the orchestrator.

   When RAG init lazily called NewPersistentPostgresCollection and the
   remote embedding worker was unreachable, LocalRecall called
   os.Exit(1) inside the constructor, killing the orchestrator pod.
   LocalRecall now returns errors instead, LocalAGI surfaces them as a
   nil collection, and the existing RAGProviderFromState path returns
   (nil, nil, false) — the same code path the agent pool already takes
   when no RAG is configured. The orchestrator stays up; chat requests
   degrade to "no RAG available" until the embedding worker recovers.

Bumps:
  github.com/mudler/LocalAGI    → e83bf515d010
  github.com/mudler/localrecall → 6138c1f535ab

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-04 19:09:16 +02:00
LocalAI [bot]
1aeb4d7e73 chore(model gallery): 🤖 add 1 new models via gallery agent (#9653)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-04 15:42:08 +02:00
Ettore Di Giacinto
a271c72931 fix(react-ui/e2e): scope backendTrigger to <main> so it skips LanguageSwitcher
The LanguageSwitcher added in the i18n PR (#9642) lives in the sidebar
and also uses aria-haspopup="listbox" — same attribute the import-form
SearchableSelect uses. The Batch D / E tests' helper resolved the trigger
with `page.locator('button[aria-haspopup="listbox"]').first()`, which now
returns the language switcher (rendered first in DOM order, in the
sidebar) instead of the backend dropdown.

After clicking the wrong button, getByRole('option', { name: 'llama-cpp' })
naturally never resolves — language options aren't backend names — and
the test times out at 30s.

Scope the locator to the <main className="main-content"> wrapper so only
buttons inside the route's main content area match. The page layout has
the Sidebar outside <main>, so this cleanly excludes it.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-04 08:58:25 +00:00
Ettore Di Giacinto
ade5fd4b97 fix(react-ui): reflect disabled state on SearchableSelect button
The Backend dropdown is disabled while /backends/known is in flight
(disabled={isSubmitting || backendsLoading} in ImportModel.jsx). Until
now the disabled prop only guarded the internal onClick handler — there
was no `disabled` HTML attribute on the <button>, so the element
remained "actionable" from the outside.

That regressed the import-form-ux Batch D / E Playwright tests after
the i18next-suspense PR (#9642): suspending on the importModel
namespace defers the useEffect that fetches /backends/known, so when
the test calls backendTrigger.click() the button is rendered but
backendsLoading is still true. The click hits the no-op branch,
the dropdown stays closed, and `getByRole('option', { name: 'llama-cpp' })`
times out at 30s.

Surfacing the disabled state on the actual <button> makes Playwright
auto-wait until the dropdown is ready, fixes a11y (screen readers now
announce "disabled"), and removes the button from the tab order while
loading.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-04 08:21:03 +00:00
LocalAI [bot]
170d55c67d fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups (#9652)
* fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups

Two distinct bugs were causing tight retry loops in the distributed scheduler:

1. FindAndLockNodeWithModel ignored the model's NodeSelector. When a model
   was loaded on multiple nodes and only some matched the current selector,
   the function returned the lowest-in_flight node — even one the selector
   excluded. Route()'s post-check then fell through to scheduleNewModel,
   which targeted the matching node where the model was already at
   MaxReplicasPerModel capacity. Eviction couldn't help (the only loaded
   model on that node was the one being requested, and it was busy), so
   every request looped through "evicting LRU" → "all models busy".

   Fix: thread an optional candidateNodeIDs filter through
   FindAndLockNodeWithModel. Route() resolves the selector once via a new
   resolveSelectorCandidates helper and passes the matching IDs to both
   the cached-replica lookup and scheduleNewModel. The same helper
   replaces the inline selector block in scheduleNewModel.

2. ScheduleAndLoadModel (reconciler scale-up path) fell back to
   scheduleNewModel with backendType="" when no replica had ever been
   loaded for a model. The worker rejected the resulting backend.install
   ("backend name is empty") on every reconciler tick (~30s).

   Fix: remove the broken fallback. When GetModelLoadInfo has nothing
   stored, return a clear error instead of firing a doomed NATS install.
   The reconciler's existing scale-up failure log surfaces it once per
   tick; the model auto-replicates as soon as Route() serves it once and
   stores load info.

Also downgrade the post-LoadModel-failure StopGRPC error to Debug — that
cleanup attempt usually hits "model not found" because LoadModel failed
before registering the process, and the outer "Failed to load model"
error already carries the real reason.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]

* test(distributed): cover selector-aware FindAndLockNodeWithModel and reconciler scaleup guard

Two regression tests for the bugs fixed in the previous commit:

1. FindAndLockNodeWithModel — registry-level integration tests verify the
   candidateNodeIDs filter:
   - Returns the included node even when an excluded node has lower
     in_flight (the original selector-mismatch loop scenario).
   - Returns not-found when the model is loaded only on excluded nodes,
     forcing Route() to fall through to a fresh schedule instead of
     reusing the excluded replica.

2. ScheduleAndLoadModel — mock-based test verifies the reconciler scale-up
   path returns an error and does NOT fire backend.install when no replica
   has been loaded yet. fakeUnloader gains an installCalls slice so this
   negative assertion is direct.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-04 09:42:14 +02:00
Ettore Di Giacinto
28b4857bd6 fix(ci): leave ports.ubuntu.com upstream on self-hosted runners
mirrors.edge.kernel.org carries /ubuntu/ (amd64 archive) but does NOT
carry /ubuntu-ports/. With the previous default both archive and ports
pointed at kernel.org, so multi-arch builds (linux/amd64,linux/arm64)
on bigger-runner / arc-runner-set 404'd on the arm64 leg:

  Err:5 http://mirrors.edge.kernel.org/ubuntu-ports noble Release
    404  Not Found [IP: 213.196.21.55 80]

The original outage was on archive.ubuntu.com, not ports.ubuntu.com, so
default the self-hosted-ports-mirror to '' (= keep ports.ubuntu.com
upstream). apt-mirror.sh and the runner-side rewrite both already
no-op when the env var is empty.

Self-hosted amd64 still uses kernel.org for the main archive, which
worked fine in this run before the arm64 leg failed.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-04 07:28:43 +00:00
Ettore Di Giacinto
5503be1fb3 fix(ci): use http for the kernel.org mirror — bare ubuntu image has no CA bundle
The Docker build runs on the minimal ubuntu:24.04 base image, which
ships *without* ca-certificates. The very first apt-get update over
HTTPS therefore fails the TLS handshake ("No system certificates
available. Try installing ca-certificates."), and apt can't reach
ca-certificates itself to fix the situation — chicken and egg.

Apt validates package integrity via GPG-signed Release files, so plain
HTTP is safe for the archive. archive.ubuntu.com / azure.archive are
already accessed over HTTP for the same reason. Switch the kernel.org
defaults from https://mirrors.edge.kernel.org to
http://mirrors.edge.kernel.org so the in-Dockerfile rewrite works on
self-hosted runners too.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-03 23:29:53 +00:00
Ettore Di Giacinto
50580a84ae fix(ci): switch apt mirror per runner — azure on github-hosted, kernel.org on self-hosted
Self-hosted runners (arc-runner-set, bigger-runner) cannot reach
azure.archive.ubuntu.com — they live in different networks (e.g. our
arc-runner-set Kubernetes cluster) where Azure's mirror IP is not
routable. Symptom: "Connection failed [IP: 51.11.236.225 80]" with each
Ign:/Err: cycle taking 60s, hanging the build for ~16 minutes before
exit 100.

Pick the mirror based on `runner.environment`:

  * github-hosted (ubuntu-latest, ubuntu-24.04-arm) → Azure
    (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
    — same VPC as the runner.
  * self-hosted (arc-runner-set, bigger-runner)    → kernel.org
    (https://mirrors.edge.kernel.org for both archive and ports)
    — publicly reachable from any network.

The choice now lives in one place: the .github/actions/configure-apt-mirror
composite action exposes `effective-mirror` / `effective-ports-mirror`
outputs so the reusable workflows can forward the same value as Docker
build-args without duplicating the per-runner-environment branch.

The now-redundant `apt-mirror` / `apt-ports-mirror` workflow inputs on
image_build.yml and backend_build.yml are dropped — defaults live in the
composite action and are visible there.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-03 22:59:26 +00:00
Ettore Di Giacinto
8edac61e57 feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650)
* feat(ci): allow routing apt traffic through an alternate Ubuntu mirror

Adds opt-in APT_MIRROR / APT_PORTS_MIRROR knobs to all Dockerfiles, the
Makefile, and CI workflows so we can fail over to a non-canonical Ubuntu
mirror when archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com
are degraded (recently observed: multi-day DDoS against the default pool).

Defaults are empty everywhere — behavior is unchanged unless a mirror is
configured. To enable in CI, set the repo-level GitHub Actions variables
APT_MIRROR (and APT_PORTS_MIRROR for arm64 builds). Locally:
    make docker APT_MIRROR=http://azure.archive.ubuntu.com

A small POSIX-sh helper in .docker/apt-mirror.sh rewrites both DEB822
(/etc/apt/sources.list.d/ubuntu.sources, Ubuntu 24.04+) and the legacy
/etc/apt/sources.list before the first apt-get update. Dockerfile stages
load it via RUN --mount=type=bind, so there is no extra layer and no
cache invalidation when the script is unchanged. Reusable workflows also
rewrite the runner's own /etc/apt sources before any sudo apt-get call.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(apt-mirror): default to the Azure mirror, visible in the workflow source

Bakes Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com)
in as the default for both Docker builds and runner-side apt — rather than
hiding the URL behind a GitHub Actions repo variable that's not visible
from the source tree.

A new composite action at .github/actions/configure-apt-mirror is the
single source of truth for runner-side rewrites. Five standalone
workflows (build-test, release, tests-e2e, tests-ui-e2e, update_swagger)
just `uses: ./.github/actions/configure-apt-mirror`.

Three workflows (image_build, backend_build, checksum_checker) keep an
inline bash rewrite, because they install/upgrade git via apt *before*
the checkout step (so the local composite action isn't loadable yet).
The Azure URL is visible in those files too.

The `apt-mirror` / `apt-ports-mirror` inputs of the reusable workflows
keep their now-Azure defaults — they still feed the Docker build-args
block in addition to the inline runner-side rewrite. Callers (image.yml,
image-pr.yml, backend.yml, backend_pr.yml) drop the previous
`vars.APT_MIRROR` plumbing and rely on those defaults.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(apt-mirror): drop Force Install GIT, consolidate on the composite action

The PPA git upgrade ran add-apt-repository ppa:git-core/ppa, which talks
to api.launchpad.net — also part of Canonical's infrastructure and
currently returning HTTP 504. The Azure mirror only covers
archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com, not PPAs.

The system git that ubuntu-latest already ships is sufficient for
actions/checkout and the build pipeline, so just drop the upgrade. With
that gone, the apt-before-checkout constraint disappears too — all three
holdouts (image_build, backend_build, checksum_checker) can now switch
to ./.github/actions/configure-apt-mirror like the other five.

Net: 0 inline apt-mirror blocks, all 8 workflows route through the
composite action.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-03 23:50:13 +02:00
Tai An
0b024f0886 chore(model gallery): add chroma1-hd diffusers model (#9646)
Resolves https://github.com/mudler/LocalAI/issues/9604

Adds Chroma1-HD (lodestones/Chroma1-HD), an 8.9B-parameter
text-to-image model derived from FLUX.1-schnell, served via the
upstream-diffusers ChromaPipeline. Inference defaults follow the
model card recommendations: 40 steps, CFG 3.0, bfloat16.

Assisted-by: claude-code:opus-4.7
2026-05-03 09:06:31 +02:00
Ettore Di Giacinto
a6121e240e docs: credit the LocalAI maintainers team
Update README and docs to attribute maintenance to the LocalAI team
(Ettore Di Giacinto and Richard Palethorpe) and drop the autonomous
AI dev team section.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Edit] [Bash]
2026-05-02 23:37:04 +00:00
Ettore Di Giacinto
87cf736068 feat(react-ui): add multilingual (i18n) support (#9642)
Adds end-to-end internationalization to the React UI with five seed
languages (English, Italian, Spanish, German, Simplified Chinese) and
a sidebar-footer language switcher next to the existing theme toggle.

Library: react-i18next + i18next + i18next-http-backend +
i18next-browser-languagedetector. The detector caches the user's
choice in localStorage (key `localai-language`, mirroring the existing
`localai-theme` convention) and updates the `<html lang>` attribute on
change. fallbackLng is `en`, so any missing translation in another
locale falls back transparently.

Translation files live under `public/locales/<lng>/<ns>.json`. They
ride along with the existing `//go:embed react-ui/dist/*` directive,
but the previous SPA route in core/http/app.go only exposed
`/assets/*` from the embedded React build. This commit generalizes
the asset handler into a `serveReactSubdir(subdir)` helper and adds a
matching `/locales/*` route so i18next-http-backend can fetch the
JSONs at runtime. The http-backend `loadPath` is built via the
existing `apiUrl()` helper so instances served under a sub-path (e.g.
`<base href="/ui/">`) resolve correctly.

Namespaces (13): common, nav, errors, auth, home, models, importModel,
chat, agents, skills, collections, media, admin. Translated UI surfaces
include the sidebar/header/footer chrome, login + account flows, the
Home dashboard (incl. the manage-by-chat assistant CTA), the model
gallery + import flow, the chat experience (Chat.jsx + ChatsMenu),
agents/skills/collections list pages, the studio media tabs (Image,
Video, TTS), and the admin page-headers (Settings incl. its section
nav, Manage, Backends, Traces, Nodes, P2P, Users, Usage). Shared
components (ConfirmDialog, Toast) take their default labels from the
common namespace so callers don't need to pass strings explicitly.

Tooling for incremental adoption is included:
  - `i18next-parser.config.js` + `npm run i18n:extract` to sweep `t()`
    keys into the JSON skeletons.
  - `scripts/translate-locales.mjs` (one-off helper) to bootstrap
    non-English locales from English source via OpenAI or Anthropic
    APIs, with --copy mode as a placeholder fallback. Idempotent;
    preserves existing translations unless --overwrite is passed.

Larger config-driven pages (ModelEditor, Settings deep field forms,
AgentChat/AgentCreate, SkillEdit, CollectionDetails, Talk, Sound,
biometrics, FineTune/Quantize, Users modals, Nodes/P2P install
pickers, BackendLogs, Traces deep filters, Explorer) intentionally
keep their inner content untranslated for now — they fall back to
English via fallbackLng so functionality is unaffected, and the
extracted-strings pattern + the bootstrap script make follow-up
extraction straightforward.

The initial Suspense fallback at the root in main.jsx covers the
first JSON fetch on cold load. A simple `.app-boot-spinner` styled
in App.css provides a non-empty paint while the first namespace
loads.

Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-02 22:42:08 +02:00
LocalAI [bot]
1ad5b5907d feat(swagger): update swagger (#9643)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-02 22:33:47 +02:00
Russell Sim
18e039f305 fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds (#9626)
* fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds

399c1dec wired amdgpu-targets through the backend_build workflow_call
interface, intending the input's default value to cover matrix entries
that don't specify targets. However, GitHub Actions only applies a
workflow_call input default when the caller omits the input entirely.
When backend.yml passes `amdgpu-targets: ${{ matrix.amdgpu-targets }}`
and the matrix entry has no amdgpu-targets key, the expression evaluates
to an empty string, which is treated as an explicit value — bypassing
the default. The result is Docker receiving AMDGPU_TARGETS="" which in
turn causes Make's ?= default to be skipped (since the variable is
already set in the environment, even to empty), and cmake gets
-DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an
indeterminate target rather than the intended GPU list.

Fix this at two levels:

1. backend.yml: use a || fallback in the expression so that an undefined
   matrix.amdgpu-targets never reaches the reusable workflow as an empty
   string. The target list is the canonical default and lives here.

2. backend_build.yml: remove the now-misleading default value from the
   input declaration. The default never fired due to the above bug, so
   keeping it implied a guarantee that didn't exist.

3. backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard
   after the ?= assignment so that if AMDGPU_TARGETS is empty (whether
   from environment or any future CI wiring mistake) the build fails
   immediately with a clear message rather than silently producing a
   binary compiled for an unknown GPU target.

Assisted-by: Claude Code:claude-sonnet-4-6
Signed-off-by: Russell Sim <rsl@simopolis.xyz>

* fix(build): plumb AMDGPU_TARGETS through to Docker builds

The docker-build-backend Makefile macro and Dockerfile.golang did not
pass AMDGPU_TARGETS to the inner make invocation, so hipblas builds
always used the backend Makefile's hardcoded default GPU targets
regardless of what was specified via environment or CI inputs.

Signed-off-by: Russell Sim <rsl@simopolis.xyz>

---------

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-05-02 15:53:14 +02:00
Ettore Di Giacinto
b1a99436c7 feat(branding): admin-configurable instance name, tagline, and assets (#9635)
Adds a whitelabeling feature so an operator can replace the LocalAI
instance name, tagline, square logo, horizontal logo, and favicon from
the admin Settings page. Defaults fall back to the bundled assets so
existing installs are unaffected.

The public GET /api/branding endpoint is reachable pre-auth so the
login screen can render the configured branding before sign-in.
Mutating routes (POST/DELETE /api/branding/asset/:kind) remain
admin-only. Text fields (instance_name, instance_tagline) ride the
existing /api/settings flow; binary assets get a dedicated multipart
upload route that persists files under DynamicConfigsDir/branding/.

To prevent the Settings page's stale local state from clobbering an
upload on save, UpdateSettingsEndpoint preserves whatever the on-disk
asset filename fields are regardless of the body — /api/branding/asset/*
are the sole writers for those fields.

The MCP catalog gains get_branding and set_branding tools (text fields
only; file upload stays UI-only) plus a configure_branding skill prompt.

While wiring this up, the same restart-loss class of bug surfaced for
several existing fields whose RuntimeSettings entries were never read
by the startup loader. Fix loadRuntimeSettingsFromFile() to load:

  - branding (instance_name, instance_tagline, *_file basenames)
  - auto_upgrade_backends, prefer_development_backends
  - localai_assistant_enabled
  - open_responses_store_ttl
  - the 7 existing AgentPool fields (enabled, default/embedding model,
    chunking sizes, enable_logs, collection_db_path)

Also exposes 3 new AgentPool runtime settings (vector_engine,
database_url, agent_hub_url) via /api/settings + the Settings UI, with
the same load-on-startup wiring. The file watcher's manual-edit path
is intentionally not changed — the in-process API endpoints already
update appConfig directly, so the watcher is redundant for supported
flows and a separate refactor for everything else.

15 TDD specs cover the loader behaviour (1 branding + 11 adjacent + 3
new agent-pool); 2 specs cover the persistence helpers and the
clobber-prevention contract.


Assisted-by: claude-code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-02 15:51:36 +02:00
Ettore Di Giacinto
7325046650 fix(diffusers): drop compel from requirements to unblock pip resolver (#9632)
compel 2.3.1 (latest, Nov 2025) declares transformers~=4.25 in its
metadata, i.e. >=4.25,<5.0. After transformers 5.0 (2026-01-26) and
huggingface-hub 1.0 (2025-10-27) shipped, the weekly DEPS_REFRESH
cache rotation in CI started seeing the new majors and pip's resolver
went into multi-hour backtracking storms walking every transformers
4.x candidate against every accelerate/hf-hub/tokenizers combination
to find a set compel would accept. The 2026-04-29 backend-build for
the diffusers backend (darwin-mps + l4t + cublas13-turboquant matrix
cells) hit the GitHub Actions 6h job timeout still inside pip
install — the build itself never started.

compel is the only hard upper bound on transformers in this stack
(diffusers, accelerate, peft, optimum-quanto are all flexible), and
upstream support for transformers 5 is still in flight: damian0815/
compel#129 ("Modernize Compel for Transformers 5") and #128 ("Bump
transformers version to >5.0") are both open as of today.

backend.py only constructs Compel() when COMPEL=1 is set in the env
(default off), so make compel a true optional extra:

  - Wrap the top-level `from compel import ...` in try/except
    ImportError, mirroring the existing sd_embed pattern.
  - Auto-disable COMPEL with a warning when the module isn't
    installed, instead of crashing on module load.
  - Drop compel from all eight requirements-*.txt variants so the
    resolver no longer has to satisfy its transformers cap.
  - Leave a TODO in backend.py and in each requirements file
    pointing at the upstream PR/issue, so the dependency can be
    reinstated once compel supports transformers >= 5.

Users who rely on weighted-prompt embeddings can opt in with a
manual `pip install compel` alongside COMPEL=1; the warning emitted
on startup tells them how.

Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit WebFetch]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-01 14:45:14 +02:00
Ettore Di Giacinto
8452068f43 feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models (#9630)
The WhisperImporter's Import() switch ordered LooksLikeURL ahead of the
HuggingFace branch, so any https://huggingface.co/<owner>/<repo> URI
(e.g. LocalAI-io/whisper-large-v3-it-yodas-only-ggml) hijacked the URL
path. FilenameFromUrl returned the repo slug, the gallery entry pointed
at the HTML repo page, the SHA256 was empty, and the HF file listing
was effectively dead code for HTTPS imports. The HF branch only fired
for huggingface://owner/repo and hf://owner/repo references.

Gate the URL case on a "ggml-*.bin" basename signal — mirroring how
the llama-cpp importer gates on ".gguf" — so direct file URLs still
take the URL path while HF repo URLs fall through to the HF branch.
There the file listing is actually consulted: every ggml-*.bin entry
is collected and one is picked by the new preferences.quantizations
preference (default q5_0; comma-separated for fallback ordering).

Pin the chosen file under whisper/models/<name>/<file> so a single
repo can ship q4_0/q5_0/q8_0 side-by-side without colliding on disk,
matching the llama-cpp/models/<name>/ layout. The fallback when no
preference matches is the last available ggml file, mirroring
llama-cpp's pickPreferredGroup behaviour.

Tests: replace the previous probe spec with positive assertions
against LocalAI-io/whisper-large-v3-it-yodas-only-ggml (default →
ggml-model-q5_0.bin, quantizations=q4_0 → ggml-model-q4_0.bin) plus
two offline specs that build a fake hfapi.ModelDetails to cover the
fallback rule and non-ggml filtering without touching the network.


Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit WebFetch]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-01 12:03:07 +02:00
ER-EPR
0b0078047f Add tags to qwen3-vl-reranker and Qwen3-VL-Embedding to the gallery (#9628)
* Add tags to Qwen3-VL-Reranker models

Added tags for reranker models in index.yaml.

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Add Qwen3-VL-Embedding models to gallery

Added Qwen3-VL-Embedding-8B and Qwen3-VL-Embedding-2B models with detailed descriptions and file references.

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

---------

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>
2026-05-01 10:56:58 +02:00
Tai An
80961d2da6 feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629)
Closes #9601

Makes the temporary scratch paths in vllm, vllm-omni, tinygrad, and pocket-tts
backends configurable via the standard TMPDIR env var, instead of always writing
to /tmp. This is a one-line change per call site that calls tempfile.gettempdir()
for the directory and keeps the same filename suffix.

Users who run on systems with a small root partition (or want to relocate scratch
files to a larger volume) can now redirect these by setting TMPDIR
(e.g. TMPDIR=/data/tmp), without affecting the existing LOCALAI_GENERATED_CONTENT_PATH
or LOCALAI_UPLOAD_PATH options that already cover other temp paths.

Files touched:
- backend/python/vllm/backend.py        (1 site: video base64 scratch)
- backend/python/tinygrad/backend.py    (1 site: image fallback dst)
- backend/python/pocket-tts/backend.py  (1 site: tts wav fallback dst)
- backend/python/vllm-omni/backend.py   (2 sites: video + audio scratch)
2026-05-01 10:56:24 +02:00
LocalAI [bot]
9c4c3f9d8f chore: ⬆️ Update ggml-org/llama.cpp to beb42fffa45eded44804a1fd4916146222371581 (#9624)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-01 02:02:56 +02:00
LocalAI [bot]
273416f54b chore: ⬆️ Update ikawrakow/ik_llama.cpp to a8aecbf15933295af96504f9a693998322185b5c (#9625)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-01 02:02:29 +02:00
Ettore Di Giacinto
c02a50f2ab feat(llama-cpp): bump to d775992 and adapt to spec params refactor (#9618)
Bumps backend/cpp/llama-cpp/Makefile LLAMA_VERSION from 665abc6 to
d775992, picking up upstream PR ggml-org/llama.cpp#22397 which splits
common_params_speculative into nested draft / ngram_simple / ngram_mod
sub-structs. Renames every grpc-server.cpp reference to match:

  speculative.mparams_dft.path  -> speculative.draft.mparams.path
  speculative.{n_max,n_min}     -> speculative.draft.{n_max,n_min}
  speculative.{p_min,p_split}   -> speculative.draft.{p_min,p_split}
  speculative.{n_gpu_layers,n_ctx} -> speculative.draft.{n_gpu_layers,n_ctx}
  speculative.ngram_size_n      -> speculative.ngram_simple.size_n
  speculative.ngram_size_m      -> speculative.ngram_simple.size_m
  speculative.ngram_min_hits    -> speculative.ngram_simple.min_hits

The "speculative.n_max" JSON key sent to the upstream server stays
unchanged — server-task.cpp still reads it and routes the value into
draft.n_max internally.

The turboquant fork (TheTom/llama-cpp-turboquant @ 11a241d) branched
before #22397 and still exposes the flat layout. Since turboquant
reuses the shared backend/cpp/llama-cpp/grpc-server.cpp, extend
patch-grpc-server.sh with an idempotent sed block that reverts the
ten field references back to the legacy flat names on the build copy
only — the original under backend/cpp/llama-cpp/ stays compiling
against vanilla upstream. Drop the block once the fork rebases.

ik-llama-cpp has its own grpc-server.cpp with no speculative refs
(0/2661 lines), so it is unaffected.

Validated locally with `make docker-build-llama-cpp` (avx, avx2,
avx512, fallback, grpc + rpc-server all built; image exported).


Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-30 08:44:43 +02:00
LocalAI [bot]
76971fb2aa chore: ⬆️ Update leejet/stable-diffusion.cpp to 3d6064b37ef4607917f8acf2ca8c8906d5087413 (#9617)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-30 08:43:42 +02:00
LocalAI [bot]
ebd9fcbe20 chore(model gallery): 🤖 add 1 new models via gallery agent (#9615)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-29 22:33:41 +02:00
Ettore Di Giacinto
091eda8d70 feat: react chat redesign (#9616)
* feat(react-ui): redesign chat — popover history, focus on send, density pass

Replace the persistent 260px conversation sidebar with a Cmd/Ctrl+K
popover (ChatsMenu) so the conversation owns the page. Once a chat has
at least one message we auto-collapse the global app rail and fade
non-essential header chrome; Esc gives the user back the full chrome
for the rest of the session.

Move Canvas mode and the MCP dropdown into the input wrapper as mode
chips — they describe what's armed for the next message and now live
where the user composes. The chat header drops to Chats · title ·
ModelSelector · overflow · settings, and an overflow menu carries
admin-only Manage mode along with Info / Edit / Export / Clear.

Density pass: tighter header (40px), smaller avatars with the assistant
left-border accent doing the work, 88% bubble width, modern
field-sizing on the textarea, 32px send/stop buttons.

Empty state now surfaces a Recent strip (top 4 non-empty chats) and a
Cmd+K hint, replacing the discoverability the persistent sidebar used
to provide.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

* feat(react-ui): chat input chips, slimmer menu, focus mode polish

Move Canvas mode and the MCP dropdown into the input wrapper as compact
mode chips — they describe what's armed for the next message and now
sit where the user composes. The MCP popover flips upward when anchored
to the input row so it stays on-screen.

Eliminate the chat header overflow ("…") menu entirely; relocate each
item to its semantic home so users don't have to remember a
miscellany drawer:

- Manage mode toggle → top of the Settings drawer, alongside the
  other sticky chat knobs. The shield next to the title still
  signals state at a glance.
- Model info / Edit config → small admin-only "ⓘ" button next to the
  ModelSelector; the existing model-info panel now hosts the Edit
  config link.
- Export as Markdown → per-row hover action in ChatsMenu, so it works
  for any chat (not just the active one).
- Clear chat history → destructive button at the bottom of the
  Settings drawer.

Make the Sidebar listen to its own `sidebar-collapse` event so the
chat's focus mode actually shrinks the rail (it previously only
flipped the layout class, leaving the sidebar element at full width
and overlapping the chat). Drop the focus-mode toast — the visual
shift is enough; the toast was noise.

Define `--color-text-tertiary` in both themes; without it metadata
text (recent strip timestamps and a few other sites) was inheriting
the platform default, which read as black on the dark surface.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

* fix(model/log-store): close merged channel exactly once; clean up Remove

Two latent races in BackendLogStore.Subscribe could panic under load
(distributed e2e test triggered "send on closed channel" at
backend_log_store.go:288):

1. The aggregated path closed the merged channel `ch` from two
   places — the fan-in waiter goroutine (after all source channels
   drained) and unsubscribe(). When unsubscribe ran while a fan-in
   goroutine was mid-flight on `ch <- line`, the close beat the send
   and the runtime panicked. Now `ch` is closed by exactly one
   goroutine: the waiter that observes all fan-in goroutines finish.
   unsubscribe() only closes the per-buffer source channels — the
   for-range in each fan-in goroutine then exits naturally and the
   waiter takes care of the merged close.

2. Remove() closed every subscriber channel but didn't delete the
   entries from the subscribers map, so a concurrent unsubscribe()
   would call close() again on the already-closed channel
   ("close of closed channel"). Clear the map entry while closing.

Add a regression test that hammers AppendLine concurrently with
Subscribe + unsubscribe + Remove; the race detector catches both
classes of regression.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

* test(model/log-store): port backend log store tests to ginkgo

Bring backend_log_store_test.go in line with the rest of pkg/model
(loader_test, watchdog_test, store_test): same external test package
(`model_test`), same ginkgo + gomega imports, same Describe/It
nesting around the public API. Behaviour is unchanged — the four
existing scenarios plus the unsubscribe race regression all run as
specs under the existing `TestModel` suite.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-29 22:33:26 +02:00
Ettore Di Giacinto
fe6eb57082 feat(vibevoice-cpp): add purego TTS+ASR backend (#9610)
* feat(vibevoice-cpp): add purego TTS+ASR backend

Wire up Microsoft VibeVoice via the vibevoice.cpp C ABI as a new
purego-based Go backend that serves both Backend.TTS and
Backend.AudioTranscription from a single gRPC binary. Mirrors the
qwen3-tts-cpp / sherpa-onnx pattern so the variant matrix
(cpu/cuda12/cuda13/metal/rocm/sycl-f16/f32/vulkan/l4t) and the
e2e-backends gRPC harness reuse existing infrastructure.

- backend/go/vibevoice-cpp/ - Makefile, CMakeLists, purego shim, gRPC
  Backend with model-dir auto-detection, closed-loop TTS->ASR smoke test
- backend/index.yaml - &vibevoicecpp meta + 18 image entries
- Makefile - .NOTPARALLEL, BACKEND_VIBEVOICE_CPP, docker-build wiring,
  test-extra-backend-vibevoice-cpp-{tts,transcription} e2e wrappers
- .github/workflows/backend.yml - matrix entries for all variants
- .github/workflows/test-extra.yml - per-backend smoke + 2 gRPC e2e jobs

* feat(vibevoice-cpp): drop hardcoded glob detection, add gallery entries

Refactor backend Load() to follow the standard Options[] convention
used by sherpa-onnx and the rest of the multi-role backends:
ModelFile is the primary gguf, supplementary paths come through
opts.Options[] as key=value (or key:value for Make-target compat),
resolved against opts.ModelPath. type=asr/tts decides the role of
ModelFile when neither tts_model nor asr_model is set explicitly.

Add gallery/index.yaml entries:
- vibevoice-cpp     - realtime 0.5B Q8_0 TTS + tokenizer + Carter voice
- vibevoice-cpp-asr - long-form ASR Q8_0 + tokenizer

Both pull from huggingface://mudler/vibevoice.cpp-models with sha256
verification. parameters.model + Options[] paths are siblings under
{models_dir} per the qwen3-tts-cpp convention.

Update Makefile e2e wrappers to pass BACKEND_TEST_OPTIONS comma+colon
style, and tighten the per-backend Go closed-loop test to use the
explicit Options API.

* fix(vibevoice-cpp): force whole-archive link so vv_capi_* exports survive

libvibevoice is a STATIC archive linked into the MODULE library.
Without --whole-archive (or -force_load on Apple, /WHOLEARCHIVE on
MSVC), the linker garbage-collects symbols not referenced from this
translation unit - which means dlopen+RegisterLibFunc panics with
'undefined symbol: vv_capi_load' at backend startup, since purego
looks them up by name and our cpp/govibevoicecpp.cpp doesn't call
them directly.

* test(vibevoice-cpp): rewrite suite with Ginkgo v2

Match the convention used by backend/go/sherpa-onnx/backend_test.go.
The suite now covers backend semantics that don't need purego (Locking,
empty-ModelFile rejection, TTS/ASR-without-loaded-model errors) on top
of the gRPC lifecycle specs (Health, Load, closed-loop TTS->ASR).
Model-dependent specs Skip() when VIBEVOICE_MODEL_DIR is unset, so
`go test ./backend/go/vibevoice-cpp/` is green on a clean checkout
and runs the heavyweight closed-loop spec when test.sh has staged
the bundle.

* fix(vibevoice-cpp): implement TTSStream + AudioTranscriptionStream

The gRPC server's stream handlers (pkg/grpc/server.go) spawn a
goroutine that ranges over a chan; the only thing closing that chan
is the backend's own *Stream method. With the default Base stub
returning 'unimplemented' and never touching the chan, the server
goroutine hangs forever and the client hits DeadlineExceeded - which
is exactly what the e2e harness saw in the test-extra-backend-vibevoice-cpp-tts
matrix run.

TTSStream synthesizes via vv_capi_tts to a tempfile, then emits a
streaming WAV header (chunk sizes 0xFFFFFFFF so HTTP clients can
start playback before the full PCM lands) followed by the PCM body
in 64 KB slices. The header + >=2 PCM frames satisfy the harness's
'expected >=2 chunks' assertion and give a real progressive stream.

AudioTranscriptionStream runs the offline transcription, emits each
segment as a delta, and closes with a final_result whose Text equals
the concatenated deltas (the harness asserts those match).

Two new Ginkgo specs guard the close-channel-on-error path so the
deadline-exceeded regression can't come back silently.

* fix(vibevoice-cpp): silence errcheck on cleanup paths

Lint flagged six unchecked Close()/Remove()/RemoveAll() calls along
purely-cleanup deferred paths. Wrap each in '_ = ...' (or a closure
for defers that take args) - matches what the rest of the LocalAI
backend/go/* tree already does for these callsites.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(vibevoice-cpp): closed-loop slot fill + modelRoot-relative path resolution

Two bugs the test-extra-backend-vibevoice-cpp-* CI matrix surfaced:

1. Closed-loop Load with ModelFile=tts.gguf + Options[asr_model=...] left
   v.ttsModel empty, because the default-fill block only ran when BOTH
   slots were empty. vv_capi_load then got tts="" + a voice and the
   C side rejected it with rc=-3 'TTS model required to load a voice'.
   Fix: ModelFile fills the *primary* role-slot (decided by 'type=' in
   Options, defaulting to tts) independently of the secondary, so
   ModelFile + asr_model resolves to both.

2. resolvePath stat'd CWD before falling back to relTo. With LocalAI
   launched from a directory that happens to contain a same-named
   file, supplementary Options[] paths could leak away from the
   models dir. Drop the CWD probe entirely - relative paths now
   *always* join onto opts.ModelPath (the gallery convention).

New Ginkgo coverage:
  * 'ModelFile slot resolution' (4 specs) - asr_model+ModelFile, type=asr,
    explicit tts_model override, key:value variant.
  * 'resolvePath (relative-to-modelRoot)' (5 specs) - join, abs passthrough,
    empty input, empty relTo, and the CWD-trap regression test.
  * 'Load resolves relative Options paths against opts.ModelPath' - end-
    to-end gallery layout round-trip.

Verified locally: 19/19 specs pass (with model bundle, including the
closed-loop TTS->ASR; without bundle, 17 pass + 2 model-dependent skip).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(vibevoice-cpp): use gallery convention in closed-loop spec

The 'loads the realtime TTS model' / closed-loop specs were passing
already-prefixed paths into Options[]:

    Options: ['tokenizer=' + filepath.Join(modelDir, 'tokenizer.gguf')]

Combined with no ModelPath set on the request, the backend's
modelRoot fell back to filepath.Dir(ModelFile) = modelDir, then
resolvePath joined the prefixed Options path on top of it -
producing 'vibevoice-models/vibevoice-models/tokenizer.gguf' when
the CI's VIBEVOICE_MODEL_DIR is the relative './vibevoice-models'.

The fix is to mirror the gallery contract LocalAI core actually
sends in production: ModelPath is the models root (absolute),
ModelFile is a name *under* it, every Options[] path is relative
to ModelPath. Uses filepath.Base() to get bare filenames.

Verified locally with both VIBEVOICE_MODEL_DIR=/tmp/vv-bundle (abs)
and VIBEVOICE_MODEL_DIR=vibevoice-models (the relative shape that
broke CI). Both: 19/19 specs pass, ~55-60s.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(vibevoice-cpp): switch ASR to Q4_K + bump transcription timeout

The Q8_0 ASR gguf is ~14 GB - too big to fit alongside the runner
image, the docker build cache, and the test artifacts on a free
ubuntu-latest GHA runner; 'test-extra-backend-vibevoice-cpp-transcription'
was getting SIGTERM'd at 90 min before the model could finish loading.

Switch to Q4_K (~10 GB on disk, slightly faster CPU decode) for:
  * the e2e harness Make target
  * the gallery 'vibevoice-cpp-asr' entry (parameters + files block)
  * the per-backend test.sh auto-download list

Bump tests-vibevoice-cpp-grpc-transcription's timeout-minutes from
90 to 150 - even with Q4_K, the 30 s JFK clip on a CPU runner needs
runway above the previous 90 min cap.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(vibevoice-cpp): drop transcription gRPC e2e job - too heavy for free runners

The vibevoice ASR is a 7B-parameter model. Even on Q4_K (~10 GB on
disk) a single 30 s transcription saturates the per-test 30 min
timeout in the e2e-backends harness on a 4-core ubuntu-latest, and
the 10 GB download + Docker layer + working space leaves no headroom
on the runner's free disk. Two attempts in CI got SIGTERM'd at the
LoadModel boundary - the bottleneck isn't tunable from the workflow
side without a paid-tier runner.

The per-backend tests-vibevoice-cpp job already runs the same
AudioTranscription path via a closed-loop TTS->ASR Ginkgo spec - same
gRPC contract, same model, single process - so the standalone
tests-vibevoice-cpp-grpc-transcription job was redundant on top of
the disk/CPU pressure.

The Makefile target test-extra-backend-vibevoice-cpp-transcription
stays for local invocation on workstations that can afford it -
useful when developing the streaming codepaths.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(vibevoice-cpp): restore transcription gRPC e2e on bigger-runner

Switch tests-vibevoice-cpp-grpc-transcription from ubuntu-latest to
the self-hosted 'bigger-runner' label that GPU image builds in
backend.yml use, plus the documented Free-disk-space prep step (purge
dotnet / ghc / android / CodeQL caches) the disabled vllm/sglang
entries in this file describe. That gives the 7B-param Q4_K ASR
model the disk + CPU runway it needs.

Keep timeout-minutes: 150 - even on a beefier runner the 30 s JFK
decode plus 10 GB download has to fit comfortably.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(vibevoice-cpp): apt-get install make on bigger-runner before transcription e2e

bigger-runner is a self-hosted bare runner without the standard
ubuntu image's preinstalled build tools, so the previous job died at
the very first command with 'make: command not found' (exit 127).
Add the Dependencies step that the disabled vllm/sglang entries in
this file already document - apt-get installs make + build-essential
+ curl + unzip + ca-certificates + git + tar before the make target
runs. Mirrors how every other 'runs-on: bigger-runner' entry in
backend.yml prepares the runner.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-29 22:22:14 +02:00
LocalAI [bot]
13fe37df89 chore(model gallery): 🤖 add 1 new models via gallery agent (#9611)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-29 09:06:22 +02:00
Richard Palethorpe
4916f8c880 feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)
* feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map

LocalAI's vLLM backend wraps a small typed subset of vLLM's
AsyncEngineArgs (quantization, tensor_parallel_size, dtype, etc.).
Anything outside that subset -- pipeline/data/expert parallelism,
speculative_config, kv_transfer_config, all2all_backend, prefix
caching, chunked prefill, etc. -- requires a new protobuf field, a
Go struct field, an options.go line, and a backend.py mapping per
feature. That cadence is the bottleneck on shipping vLLM's
production feature set.

Add a generic `engine_args:` map on the model YAML that is
JSON-serialised into a new ModelOptions.EngineArgs proto field and
applied verbatim to AsyncEngineArgs at LoadModel time. Validation
is done by the Python backend via dataclasses.fields(); unknown
keys fail with the closest valid name as a hint.
dataclasses.replace() is used so vLLM's __post_init__ re-runs and
auto-converts dict values into nested config dataclasses
(CompilationConfig, AttentionConfig, ...). speculative_config and
kv_transfer_config flow through as dicts; vLLM converts them at
engine init.

Operators can now write:

  engine_args:
    data_parallel_size: 8
    enable_expert_parallel: true
    all2all_backend: deepep_low_latency
    speculative_config:
      method: deepseek_mtp
      num_speculative_tokens: 3
    kv_cache_dtype: fp8

without further proto/Go/Python plumbing per field.

Production defaults seeded by hooks_vllm.go: enable_prefix_caching
and enable_chunked_prefill default to true unless explicitly set.

Existing typed YAML fields (gpu_memory_utilization,
tensor_parallel_size, etc.) remain for back-compat; engine_args
overrides them when both are set.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(vllm): pin cublas13 to vLLM 0.20.0 cu130 wheel

vLLM's PyPI wheel is built against CUDA 12 (libcudart.so.12) and won't
load on a cu130 host. Switch the cublas13 build to vLLM's per-tag cu130
simple-index (https://wheels.vllm.ai/0.20.0/cu130/) and pin
vllm==0.20.0. The cu130-flavoured wheel ships libcudart.so.13 and
includes the DFlash speculative-decoding method that landed in 0.20.0.

cublas13 install gets --index-strategy=unsafe-best-match so uv consults
both the cu130 index and PyPI when resolving — PyPI also publishes
vllm==0.20.0, but with cu12 binaries that error at import time.

Verified: Qwen3.5-4B + z-lab/Qwen3.5-4B-DFlash loads and serves chat
completions on RTX 5070 Ti (sm_120, cu130).

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* ci(vllm): bot job to bump cublas13 vLLM wheel pin

vLLM's cu130 wheel index URL is itself version-locked
(wheels.vllm.ai/<TAG>/cu130/, no /latest/ alias upstream), so a vLLM
bump means rewriting two values atomically — the URL segment and the
version constraint. bump_deps.sh handles git-sha-in-Makefile only;
add a sibling bump_vllm_wheel.sh and a matching workflow job that
mirrors the existing matrix's PR-creation pattern.

The bumper queries /releases/latest (which excludes prereleases),
strips the leading 'v', and seds both lines unconditionally. When the
file is already on the latest tag the rewrite is a no-op and
peter-evans/create-pull-request opens no PR.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* docs(vllm): document engine_args and speculative decoding

The new engine_args: map plumbs arbitrary AsyncEngineArgs through to
vLLM, but the public docs only covered the basic typed fields. Add a
short subsection in the vLLM section explaining the typed/generic
split and showing a worked DFlash speculative-decoding config, with
pointers to vLLM's SpeculativeConfig reference and z-lab's drafter
collection.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-29 00:49:28 +02:00
LocalAI [bot]
55afda22e3 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 453a027c17e4d63a7f16b871197a396240a65138 (#9608)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-29 00:18:19 +02:00
LocalAI [bot]
1fe3558ec6 feat(swagger): update swagger (#9607)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-29 00:18:02 +02:00
Ettore Di Giacinto
e370318bd7 fix(vllm): seed pybind11 for fastsafetensors build under --no-build-isolation
fastsafetensors==0.3 (transitive dep of vllm) imports pybind11 in
setup.py without declaring it in build-system.requires. With
--no-build-isolation it has to already exist in the venv, otherwise the
wheel build fails with ModuleNotFoundError on arm64 L4T CUDA 13 (and
any other profile that picks up vllm 0.20.0).

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-28 20:08:26 +00:00
Richard Palethorpe
4443250756 chore: add golangci-lint with new-from-merge-base baseline (#9603)
* chore: add golangci-lint with new-from-merge-base baseline

Configure golangci-lint v2 with the standard linter set (errcheck, govet,
ineffassign, unused) plus forbidigo, which enforces the Ginkgo/Gomega-only
test convention from .agents/coding-style.md by rejecting stdlib testing
calls (t.Errorf, t.Fatalf, t.Run, ...). staticcheck is disabled — the
codebase has many pre-existing QF-style suggestions not worth gating on.

issues.new-from-merge-base = master makes the lint job a gate for new
issues only; the ~1300 pre-existing baseline stays visible via
'make lint-all' for incremental cleanup. CI runs 'make lint'.

Backends needing C/C++ headers we don't install in the lint runner are
excluded via a deny list in the Makefile (backend/go/{piper,silero-vad,
llm}, cmd/launcher). Discovery still flows through 'go list ./...', so
new packages are scanned automatically.

To make backend/go/{sam3-cpp,stablediffusion-ggml,whisper} typecheckable,
move their .cpp/.h sources into cpp/ subdirs (matching qwen3-tts-cpp /
acestep-cpp). Without this 'go list' rejects the package because Go does
not allow .cpp alongside .go without cgo.

Fix two real bugs found by lint in tests/integration/ (run only via
'make test-stores', not default CI): a stale zerolog reference left over
from the slog migration (c37785b7) and an unused 'os' import.

Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* ci(lint): generate proto sources and fetch full history

The lint job was failing for two reasons:

- pkg/grpc/proto/*.go is generated, not checked in. Several packages
  import it, so without 'make protogen-go' typecheck fails project-wide
  with "no required module provides package github.com/mudler/LocalAI/
  pkg/grpc/proto".

- golangci-lint's new-from-merge-base needs to git-merge-base the PR
  against master, but actions/checkout's default shallow clone doesn't
  fetch master. fetch-depth: 0 brings full history; the config now
  references origin/master (the remote-tracking branch that survives
  the shallow checkout) instead of bare master (which doesn't exist
  locally after checkout).

Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* ci(lint): stub react-ui/dist for go:embed glob

core/http/app.go has //go:embed react-ui/dist/*. The glob must match at
least one non-hidden entry or typecheck fails the whole core/http
package. We don't need the real React bundle to lint Go code, so just
touch an empty index.html to satisfy the embed.

Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-28 22:07:44 +02:00
Ettore Di Giacinto
bcef72b9c1 feat: localai assistant chat modality (#9602)
* fix(tests): inline model_test fixtures after tests/models_fixtures removal

The previous reorg removed tests/models_fixtures/ but core/config/model_test.go
still read CONFIG_FILE/MODELS_PATH env vars pointing into that directory, so
`make test` failed with "open : no such file or directory" on the readConfigFile
spec (the suite ran with --fail-fast and bailed before openresponses_test).

Inline the YAMLs (config/embeddings/grpc/rwkv/whisper) directly into the test
file, materialise them into a per-test tmpdir via BeforeEach, and drop the
env-var lookups. The test no longer depends on Makefile plumbing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]

* refactor(modeladmin): extract model-admin helpers into a service package

Lift the bodies of EditModelEndpoint, PatchConfigEndpoint,
ToggleStateModelEndpoint, TogglePinnedModelEndpoint and
VRAMEstimateEndpoint into core/services/modeladmin so the same logic can
be called by non-HTTP clients (notably the in-process MCP server that
backs the LocalAI Assistant chat modality, landing in a follow-up commit).

The HTTP handlers shrink to thin shells that parse echo inputs, call the
matching helper, map typed errors (ErrNotFound, ErrConflict,
ErrPathNotTrusted, ErrBadAction, ...) to the existing HTTP status codes,
and render the existing response shapes. No REST-surface behaviour change;
the existing localai endpoint tests cover the regression net.

Adds focused unit tests for each helper against tmp-dir-backed
ModelConfigLoader fixtures (deep-merge patch, rename + conflict, path
separator guard, toggle/pin enable/disable, sync callback).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(assistant): LocalAI Assistant chat modality with in-memory MCP server

Adds a chat modality, admin-only, that wires the chat session to an
in-memory MCP server exposing LocalAI's own admin/management surface as
tools. An admin can install models, manage backends, edit configs and
check status by chatting; the LLM calls tools like gallery_search,
install_model, import_model_uri, list_installed_models, edit_model_config
and surfaces the results.

Same Go package powers two modes:

  pkg/mcp/localaitools/

    NewServer(client, opts) builds an MCP server that registers the
    19-tool admin catalog. The LocalAIClient interface has two impls:

    - inproc.Client — calls services directly (no HTTP loopback,
      no synthetic admin API key). Used in-process by the chat handler.
    - httpapi.Client — calls the LocalAI REST API. Used by the new
      `local-ai mcp-server --target=…` subcommand to control a remote
      LocalAI from a stdio MCP host.

    Tools and their embedded skill prompts are agnostic to which client
    backs them. Skill prompts are markdown files under prompts/, embedded
    via go:embed and assembled into the system prompt at server init.

Wiring:

  - core/http/endpoints/mcp/localai_assistant.go — process-wide holder
    that spins up the in-memory MCP server once at Application start
    using paired net.Pipe transports, then reuses LocalToolExecutor
    (no fork) for every chat request that opts in.

  - core/http/endpoints/openai/chat.go — small branch ahead of the
    existing MCP block: when metadata.localai_assistant=true,
    defense-in-depth admin check + executor swap + system-prompt
    injection. All downstream tool dispatch is unchanged.

  - core/http/auth/{permissions,features}.go — adds
    FeatureLocalAIAssistant; gating happens at the chat handler entry
    plus admin-only `/api/settings`.

  - core/cli/{run.go,cli.go,mcp_server.go} —
    LOCALAI_DISABLE_ASSISTANT flag (runtime-toggleable via Settings, no
    restart), plus `local-ai mcp-server` stdio subcommand.

  - core/config/runtime_settings.go — `localai_assistant_enabled`
    runtime setting; the chat handler reads `DisableLocalAIAssistant`
    live at request entry.

UI:

  - Home.jsx — prominent self-explanatory CTA card on first run
    ("Manage LocalAI by chatting"); collapses to a compact
    "Manage by chat" button in the quick-links row once used,
    persisted via localStorage.
  - Chat.jsx — admin-only "Manage" toggle in the chat header,
    "Manage mode" badge, dedicated empty-state copy, starter chips.
  - Settings.jsx — "LocalAI Assistant" section with the runtime
    enable toggle.
  - useChat.js — `localaiAssistant` flag on the chat schema; injects
    `metadata.localai_assistant=true` on requests when active.

Distributed mode: the in-memory MCP server lives only on the head node;
inproc.Client wraps already-distributed-aware services so installs
propagate to workers via the existing GalleryService machinery.

Documentation: `.agents/localai-assistant-mcp.md` is the contributor
contract — when adding an admin REST endpoint, also add a LocalAIClient
method, an inproc + httpapi impl, a tool registration, and a skill
prompt update; the AGENTS.md index links to it.

Out of scope (follow-ups): per-tool RBAC granularity for non-admin
read-only access; streaming mcp_tool_progress for long installs;
React Vitest rig for the UI changes.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(assistant): extract tool/capability/MiB/server-name constants

The MCP tool surface, capability tag set, server-name default, and the
chat-handler metadata key were repeated as bare string literals across
seven files. Renaming any one required hand-editing every call site and
risked code/test/prompt drift.

This pulls them into typed constants:

- pkg/mcp/localaitools/tools.go — Tool* constants for the 19 MCP tools,
  plus DefaultServerName.
- pkg/mcp/localaitools/capability.go — typed Capability + constants for
  the capability tag set the LLM passes to list_installed_models. The
  type rides through LocalAIClient.ListInstalledModels and replaces the
  triplet of "embed"/"embedding"/"embeddings" with the single
  CapabilityEmbeddings.
- pkg/mcp/localaitools/inproc/client.go — bytesPerMiB constant for the
  VRAMEstimate byte→MB conversion.
- core/http/endpoints/mcp/tools.go — MetadataKeyLocalAIAssistant for the
  "localai_assistant" request-metadata key consumed by the chat handler.

Tool registrations, the test catalog, the dispatch table, the validation
fixtures, and the fake/stub clients all reference the constants. The
embedded skill prompts under prompts/ keep their bare strings (go:embed
markdown can't import Go constants); the existing TestPromptsContain
SafetyAnchors guards the alignment.

No behaviour change. All tests pass with -race.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(modeladmin): typed Action for ToggleState/TogglePinned

The toggle/pin verbs were bare strings everywhere — handler signatures,
service implementations, MCP tool args, the fake/stub clients, the
inproc and httpapi LocalAIClient impls, plus 4 test files. A typo in
any caller silently fell through to the runtime "must be 'enable' or
'disable'" check.

Introduce core/services/modeladmin.Action (string alias) with
ActionEnable, ActionDisable, ActionPin, ActionUnpin and a small Valid
helper. The compiler now catches mismatches at every boundary; renames
ripple through one source of truth.

LocalAIClient.ToggleModelState/Pinned signatures change to take
modeladmin.Action. The package is brand-new and unreleased so this is
a free public-API tightening.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(assistant): respect ctx cancellation on gallery channel sends

InstallModel, DeleteModel, ImportModelURI, InstallBackend and
UpgradeBackend all pushed onto galleryop channels with bare sends. If the
worker was paused or the buffer full, the chat-handler goroutine blocked
forever — the LLM kept polling and the request leaked.

Wrap the five sends in a sendModelOp/sendBackendOp helper that selects
on ctx.Done() so a cancelled chat completion surfaces context.Canceled
back to the LLM instead of hanging.

Adds inproc/client_test.go with a pre-cancelled-ctx regression test on
InstallModel; the helpers are shared so the same guarantee covers the
other four call sites.

Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(assistant): graceful shutdown for in-memory holder and stdio CLI

Two related leaks:

- Application.start() built the LocalAIAssistantHolder but never wired
  Close() into the graceful-termination chain — the in-memory MCP
  transport pair stayed alive until process exit, and the goroutines
  behind net.Pipe() didn't drain. Hook into the existing
  signals.RegisterGracefulTerminationHandler chain (same pattern as
  core/http/endpoints/mcp/tools.go:770).

- core/cli/mcp_server.go ran srv.Run with context.Background(); a
  Ctrl-C from the host (Claude Desktop, mcphost, npx inspector) or a
  SIGTERM from process supervision left the stdio loop reading from a
  closed pipe. Switch to signal.NotifyContext to surface the signal
  through ctx and let srv.Run drain.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(assistant): typed HTTPError + propagate prompt walk error

The httpapi client detected "no such job" by substring-matching on the
error string ("404", "could not find") — brittle to status-code
formatting changes and to LocalAI fixing /models/jobs/:uuid to return a
proper 404. Replace with a typed *HTTPError whose Is() method honours
errors.Is(err, ErrHTTPNotFound). The 500-with-"could not find" branch
stays as a transitional fallback documented in Is().

Same change covers ListNodes' 404 fallback for the /api/nodes endpoint.

Adds httptest tests for both 404 and the legacy 500 path, plus a
direct errors.Is exposure test so external callers (the standalone
stdio CLI host) can match without re-string-parsing.

Also tightens prompts.SystemPrompt: panic when fs.WalkDir on the
embedded FS fails. The only realistic cause is a build-time //go:embed
misconfiguration; serving an empty system prompt to the LLM is much
worse than crashing init. TestSystemPromptIncludesAllEmbeddedFiles
catches regressions in CI.

Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(modeladmin): atomic writes for model config files

The five sites that wrote model YAML used os.WriteFile, which opens
with O_TRUNC|O_WRONLY|O_CREATE. A crash mid-write left the destination
truncated and the model unloadable until manual repair. Pre-existing
behaviour inherited from the original endpoint handlers — fix once now
that there's a single helper.

Adds writeFileAtomic: writes to a sibling temp file, chmods, syncs via
Close(), then os.Rename. Same-directory temp keeps the rename atomic on
the same filesystem; cleanup runs on every error path so stray temps
don't accumulate. No new dependency.

Applied to:
- ConfigService.PatchConfig
- ConfigService.EditYAML (both rename and in-place branches)
- mutateYAMLBoolFlag (drives ToggleState + TogglePinned)

atomic_test.go covers the happy path plus a read-only-dir failure case
that asserts the original file is preserved (skipped on Windows where
the chmod trick is POSIX-specific).

Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(assistant): prune dead code, mark stub, document conventions

Three small cleanups landing together:

- Drop the unused errNotImplemented sentinel from inproc/client.go.
  All five methods that used to return it are wired to modeladmin
  helpers since the Phase B commit; the package var is dead.

- Annotate httpapi.Client.GetModelConfig as a known stub. LocalAI's
  /models/edit/:name returns rendered HTML, not JSON, so the standalone
  CLI's get_model_config tool surfaces a clear error to the LLM. A
  future JSON-only /api/models/config-yaml/:name endpoint is tracked in
  the agent contract; FIXME points at it.

- Extend `.agents/localai-assistant-mcp.md` with a "Code conventions"
  section that documents the audit-driven rules: tool/Capability/Action
  constants, errors.Is over substring matching, ctx-aware channel
  sends, atomic writes, and graceful shutdown. Refresh the file map so
  it lists tools.go and capability.go and drops the removed
  tools_bootstrap.go.

The tools_models.go diff is a comment-only change explaining why the
ModelName empty-string check stays at the tool layer (consistency
across LocalAIClient implementations, since the SDK schema validator
only enforces presence, not non-empty).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(assistant): convert test files to ginkgo + gomega

The repo convention (per core/http/endpoints/localai/*_test.go,
core/gallery/**, etc.) is Ginkgo v2 with Gomega assertions. The tests I
introduced for the assistant feature used vanilla testing.T, which made
them stand out and stripped the BDD structure the rest of the suite
relies on.

Convert every test file in the assistant scope to Ginkgo:

  pkg/mcp/localaitools/
    dto_test.go            — Describe("DTOs round-trip through JSON")
    prompts_test.go        — Describe("SystemPrompt assembler")
    server_test.go         — Describe("Server tool catalog"),
                              Describe("Tool dispatch"),
                              Describe("Tool error surfacing"),
                              Describe("Argument validation"),
                              Describe("Concurrent tool calls")
    parity_test.go         — Describe("LocalAIClient parity"),
                              hosts the suite's single RunSpecs (the file
                              is package localaitools_test so it can
                              import httpapi without an import cycle;
                              Ginkgo aggregates Describes from both the
                              internal and external test packages into
                              one run).
    httpapi/client_test.go — Describe("httpapi.Client against the
                              LocalAI admin REST surface"),
                              Describe("ErrHTTPNotFound"),
                              Describe("Bearer token")
    inproc/client_test.go  — Describe("inproc.Client cancellation")

  core/services/modeladmin/
    config_test.go         — Describe("ConfigService") with sub-Describes
                              for GetConfig, PatchConfig, EditYAML
    state_test.go          — Describe("ConfigService.ToggleState")
    pinned_test.go         — Describe("ConfigService.TogglePinned")
    atomic_test.go         — Describe("writeFileAtomic")

  core/http/endpoints/mcp/
    localai_assistant_test.go — Describe("LocalAIAssistantHolder")

Each package gets a `*_suite_test.go` with the standard
`RegisterFailHandler(Fail) + RunSpecs(t, "...")` boilerplate. Helpers
that previously took *testing.T (newTestService, writeModelYAML,
readMap, sortedStrings, sortGalleries, etc.) drop the *T receiver and
use Gomega Expectations directly. tmp dirs come from GinkgoT().TempDir().

No semantic change to test coverage — every original assertion has a
direct Gomega counterpart. All suites pass with -race.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test+docs(assistant): drift detector for Tool ↔ REST route mapping

Honest gap from the audit: the parity_test.go suite only checks four
methods, and uses the same httpapi.Client for both sides — it asserts
stability of the DTO shapes, not equivalence between in-process and
HTTP. If a contributor adds an admin REST endpoint without an MCP tool,
or a tool without a matching httpapi route, both surfaces silently
diverge.

Add a coverage test plus stronger docs:

- pkg/mcp/localaitools/coverage_test.go introduces a hand-maintained
  toolToHTTPRoute map: every Tool* constant must list the REST endpoint
  the httpapi.Client hits (or "(none)" with a documented reason). Two
  Ginkgo specs assert the map and the published catalog stay in sync —
  one fails when a Tool is added without a route entry, the other fails
  when a route entry references a tool that no longer exists. Verified
  by removing the ToolDeleteModel entry locally; the test fired with a
  clear message pointing the contributor at the file.

  Deliberate non-test: we don't enumerate live admin REST routes from
  here. Walking the route registry requires booting Application;
  parsing core/http/routes/localai.go is brittle. The "new admin REST
  endpoint → MCP tool" direction stays a PR checklist item — see below.

- AGENTS.md gets a new Quick Reference bullet that calls out the rule
  and points at the test by name.

- .agents/api-endpoints-and-auth.md tightens the existing "Companion:
  MCP admin tool surface" subsection from "if useful, consider..." to
  "MUST be considered, with three concrete outcomes (tool added,
  deliberately skipped with documented reason, or forgot — which
  breaks the contract)". Adds a checklist item at the bottom of the
  file's authoritative checklist.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(assistant): drop duplicate DTOs, surface canonical types

Audit feedback: localaitools/dto.go reinvented several types that already
existed in the codebase. Replace the duplicates with the canonical types
so the LLM-visible wire format stays aligned with the rest of LocalAI by
construction (no parallel structs to keep in sync).

Removed (and the canonical type now used by the LocalAIClient interface):

  localaitools.Gallery          → config.Gallery
  localaitools.GalleryModelHit  → gallery.Metadata
  localaitools.VRAMEstimate     → vram.EstimateResult

Tightened scope:

  localaitools.Backend          → kept, but reduced to {Name, Installed}.
                                  ListKnownBackends now returns
                                  []schema.KnownBackend (the canonical
                                  type already used by REST /backends/known).

Kept with documented rationale:

  localaitools.JobStatus       — galleryop.OpStatus has Error error which
                                 marshals to "{}". JobStatus is the
                                 JSON-friendly mirror.
  localaitools.Node            — nodes.BackendNode carries gorm internals
                                 + token hash; we expose only the
                                 LLM-relevant fields.
  ImportModelURIRequest/Response — schema.ImportModelRequest and
                                   GalleryResponse are wire-shaped, mine
                                   are LLM-shaped (BackendPreference flat,
                                   AmbiguousBackend exposed).

Side wins:

  - Drop bytesPerMiB; vram.EstimateResult already carries human-readable
    display strings (size_display, vram_display) the LLM uses directly.
  - Drop the handler-private vramEstimateRequest in
    core/http/endpoints/localai/vram.go and bind directly into
    modeladmin.VRAMRequest (now JSON-tagged).

Both clients pass through these types now where possible (e.g.
ListGalleries in inproc.Client is a one-liner returning
AppConfig.Galleries; httpapi.Client.GallerySearch decodes straight into
[]gallery.Metadata).

All tests green with -race.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(assistant): extract REST route paths into named constants

httpapi.Client had 18 bare-string path sites scattered across methods.
Pull them into pkg/mcp/localaitools/httpapi/routes.go: static paths as
package-private constants, dynamic paths as small builders that handle
url.PathEscape on segment values.

No behaviour change. Drops the now-unused net/url import from client.go
since path escaping moved into routes.go alongside the path it applies to.

Local-only by design: the server-side registrations in
core/http/routes/localai.go remain bare strings. Sharing constants across
the pkg/ ↔ core/ boundary would invert the layering today; the existing
Tool↔REST drift-detector in coverage_test.go is the safety net for that
direction.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* docs(assistant): align with shipped UI and dropped bootstrap env vars

The LocalAI Assistant doc still described the older iteration:

- The in-chat toggle was renamed from "Admin" to "Manage" (the badge is
  now "Manage mode" and the home page exposes a "Manage by chat" CTA).
- LOCALAI_ASSISTANT_BOOTSTRAP_MODEL / --localai-assistant-bootstrap-model
  and the bootstrap_default_model tool were removed — admins pick a model
  from the existing selector instead, no env-var configuration required.
- The shipped tool catalog includes import_model_uri but didn't appear in
  the doc; bootstrap_default_model appeared but no longer exists.
- The Settings → LocalAI Assistant runtime toggle wasn't mentioned as the
  preferred way to disable without restart.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-28 19:29:27 +02:00
Ettore Di Giacinto
142919fc79 fix(tests): inline model_test fixtures after tests/models_fixtures removal
The previous reorg removed tests/models_fixtures/ but core/config/model_test.go
still read CONFIG_FILE/MODELS_PATH env vars pointing into that directory, so
`make test` failed with "open : no such file or directory" on the readConfigFile
spec (the suite ran with --fail-fast and bailed before openresponses_test).

Inline the YAMLs (config/embeddings/grpc/rwkv/whisper) directly into the test
file, materialise them into a per-test tmpdir via BeforeEach, and drop the
env-var lookups. The test no longer depends on Makefile plumbing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]
2026-04-28 12:58:49 +00:00
dependabot[bot]
439471baec chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui (#9594)
Bumps [packaging](https://github.com/pypa/packaging) from 24.1 to 26.2.
- [Release notes](https://github.com/pypa/packaging/releases)
- [Changelog](https://github.com/pypa/packaging/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pypa/packaging/compare/24.1...26.2)

---
updated-dependencies:
- dependency-name: packaging
  dependency-version: '26.2'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-28 08:44:53 +02:00
dependabot[bot]
eff4be6794 chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.1 to 2.28.2 (#9593)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.28.1 to 2.28.2.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.28.1...v2.28.2)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-28 08:44:41 +02:00
dependabot[bot]
f1ec30d646 chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres from 0.41.0 to 0.42.0 (#9591)
chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres

Bumps [github.com/testcontainers/testcontainers-go/modules/postgres](https://github.com/testcontainers/testcontainers-go) from 0.41.0 to 0.42.0.
- [Release notes](https://github.com/testcontainers/testcontainers-go/releases)
- [Commits](https://github.com/testcontainers/testcontainers-go/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: github.com/testcontainers/testcontainers-go/modules/postgres
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-28 08:44:27 +02:00
LocalAI [bot]
f3500223d7 chore: ⬆️ Update leejet/stable-diffusion.cpp to a81677f59c92d90343aebca51dfed7decf0a0cb0 (#9586)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-28 08:44:10 +02:00
LocalAI [bot]
b69bacfcdc chore: ⬆️ Update ikawrakow/ik_llama.cpp to d6f3e4e28fbf75e6181e6ea32e734de9ce9304fd (#9585)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-28 08:43:51 +02:00
LocalAI [bot]
8e50066fa2 chore: ⬆️ Update ggml-org/llama.cpp to 665abc609740d397d30c0d8ef4157dbf900bd1a3 (#9584)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-28 08:43:33 +02:00
Ettore Di Giacinto
a0317d9926 refactor(tests): split app_test.go, move real-backend coverage to e2e-backends
core/http/app_test.go had grown to 1495 lines exercising three concerns at
once: HTTP-layer integration, real-backend inference (llama-gguf, tts,
stablediffusion, transformers embeddings, whisper), and service logic that
already has unit-level coverage. Each PR paid for 6 backend builds plus
real-model downloads to satisfy a single suite.

Reorg per layer:

- app_test.go (1495 -> 1003 lines) drives the mock-backend binary only.
  Kept: auth, routing, gallery API, file:// import, /system, agent-jobs
  HTTP plumbing, config-file model loading. Deleted real-inference specs
  (llama-gguf chat, ggml completions/streaming, logprobs, logit_bias,
  transcription, embeddings, External-gRPC, Stores duplicate, Model gallery
  Context). Lifted Agent Jobs out of the deleted Stores Context.
- tests/e2e-backends/backend_test.go gains logprobs, logit_bias, and
  no-first-token-dup specs (the latter folded into PredictStream). Two
  new caps gate them so non-LLM backends opt out.
- tests/e2e-aio/e2e_test.go gains a streaming smoke under Context("text")
  to catch container-level streaming regressions.
- tests/models_fixtures/ removed; all fixtures referenced testmodel.ggml.
  app_test.go now writes per-Context inline mock-model YAMLs.

CI:

- test.yml + tests-e2e.yml gain paths-ignore (docs/, examples/, *.md,
  backend/) so docs and backend-only PRs skip them. test.yml drops the
  6-backend Build step plus TRANSFORMER_BACKEND/GO_TAGS=tts; tests-apple
  drops the llama-cpp-darwin build.
- New tests-aio.yml runs the AIO container nightly + on workflow_dispatch
  + master/tags. The tests-e2e-container job moved out of test.yml so PRs
  no longer pay AIO cost.
- New tests-llama-cpp-smoke job in test-extra.yml runs on every PR with
  no detect-changes gate; pulls quay.io/go-skynet/local-ai-backends:
  master-cpu-llama-cpp (no build on PR) and exercises predict/stream/
  logprobs/logit_bias against Qwen3-0.6B. This is the PR-acceptance
  real-backend gate after AIO moved to nightly. The path-gated heavy
  test-extra-backend-llama-cpp wrapper appends the same caps so it
  exercises the moved specs when the backend actually changes.

Makefile:

- Deleted test-models/testmodel.ggml (the wget chain), test-llama-gguf,
  test-tts, test-stablediffusion, test-realtime-models. test target
  drops --label-filter, HUGGINGFACE_GRPC, TRANSFORMER_BACKEND, TEST_DIR,
  FIXTURES, CONFIG_FILE, MODELS_PATH, BACKENDS_PATH; depends on
  build-mock-backend. test-stores keeps a focused entry point and depends
  on backends/local-store. clean-tests also clears the mock-backend
  binary.

Net per typical Go-side PR: ~25min (6 backend builds + tests + AIO) +
~8min e2e drops to ~5min mock-backend test + ~8min e2e + ~5-10min
llama-cpp-smoke (image pulled). Docs and backend-only PRs skip the
always-on workflows entirely.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]
2026-04-27 23:09:20 +00:00
Ettore Di Giacinto
3948b580d2 fix(distributed): worker stopBackend/isRunning resolve bare modelID to replica keys
PR #9583 changed the supervisor's process map key from `modelID` to
`modelID#replicaIndex`, but the NATS lifecycle handlers kept passing
the bare modelID:

* `backend.stop` (subscribeLifecycleEvents): `s.stopBackend(req.Backend)`
  → `s.processes["Qwen3.6-..."]` missed (actual key is "...#0") →
  silent no-op. Admin "Unload model" clicks released VRAM via
  model.unload but left the gRPC process alive on its old port.
  Subsequent chats hit installBackend, found the leftover process,
  reused its address — and the UI reported "no models loaded" while
  the model kept responding.

* `backend.delete` (subscribeLifecycleEvents): same map miss in
  `isRunning(req.Backend)` and `s.stopBackend(req.Backend)` — admin
  "Delete backend" deleted the binary while the process was still
  serving traffic.

Add `resolveProcessKeys(id)`: exact match if `id` is a full processKey
(stopAllBackends iterates the map and passes its own keys);
prefix-match if `id` is bare (NATS handlers); empty if `id` contains
`#` but doesn't match (no spurious fallback when the caller was
explicit). stopBackend and isRunning now call it; stopBackend gets a
new stopBackendExact helper for per-key cleanup.

TDD: regression test fails without the fix (resolveProcessKeys
doesn't exist; map lookup by bare name returns nothing). Tests pass
post-fix.

Reproduced live: registry row count was 0 for the model the user
"Unloaded", chat still served by the leftover worker process.
SmartRouter behavior is correct in itself — it falls through to
scheduleAndLoad when no row exists; the bug was that the leftover
process corrupted the install path.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]
2026-04-27 21:43:15 +00:00
LocalAI [bot]
5efbe8405f feat(swagger): update swagger (#9587)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-27 23:28:03 +02:00
Ettore Di Giacinto
ea1df8945b fix(distributed): preserve UI-added node labels across worker re-register
The register endpoint called SetNodeLabels(req.Labels) — replace-all
semantics — so every worker re-register wiped every label not in the
worker's body. The bug existed since labels were introduced in
PR #9186 (Mar 31), but only triggered for workers that supplied
labels via --node-labels.

PR #9583 (the multi-replica refactor) added an auto-mirrored
`node.replica-slots` label to every worker's registration body, which
made `len(req.Labels) > 0` always true — turning a latent edge-case
bug into a universal one. Operators reported "labels assigned to
node do not persist": labels survived until the next worker restart,
then disappeared.

Fix: iterate req.Labels and call SetNodeLabel (upsert) for each
instead of SetNodeLabels (delete-then-recreate). Worker-managed
labels still refresh on re-register; UI-added labels survive.

Trade-off: an operator who removes a label from --node-labels won't
have it auto-removed from the DB on next register — they can clean it
via the UI. Acceptable, since the alternative (current behavior)
silently destroys operator state.

Regression test added first (TDD): RegisterNodeEndpoint registers a
node, the test simulates a UI add via SetNodeLabel, then re-registers
with a different worker label set; assertion that the UI-added label
survives. Test fails against the broken code, passes against the fix.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]
2026-04-27 21:24:50 +00:00
Ettore Di Giacinto
3280b9a287 fix(distributed): per-replica backend logs (store aggregation + UI)
The multi-replica refactor (PR #9583) changed the worker's process key
from `modelID` to `modelID#replicaIndex`, but the BackendLogStore kept
the bare-modelID lookup. Result: every distributed deployment lost
backend logs in the Nodes UI — single-replica too, since even the
default capacity of 1 produces a `#0` suffix.

Two changes wired together:

* pkg/model: BackendLogStore.GetLines/Subscribe now treat a modelID
  without `#` as a model prefix and merge across all `modelID#N` replica
  buffers (timestamp-sorted for GetLines; fan-in for Subscribe). Calls
  with a full `modelID#N` key resolve exactly. ListModels strips
  replica suffixes and deduplicates so the listing surfaces one entry
  per loaded model.

* react-ui: per-replica log streams as the default. Loaded Models
  table disambiguates each row with a `rep N` pill (only when the node
  hosts >1 replica of a model). Each row's "View logs" link routes to
  the per-replica process key so operators see only that replica's
  output. The logs page renders the replica context as a chip in the
  title and surfaces a segmented control — `Replica 0 / 1 / … / All
  merged` — when the model has multiple replicas; the merged segment
  uses the bare-modelID URL (delegating to the store's prefix
  aggregation) for the side-by-side comparison case. Single-replica
  deployments see no extra UI.

Tests added first (TDD): the regression set in
backend_log_store_test.go reproduces the bug at the exact failure
point — GetLines/ListModels/Subscribe assertions all fail against the
broken code, all pass against the fix. TestSubscribe_PerReplicaFilter
pins the exact-key path so a future change can't silently break it.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:distill]
2026-04-27 20:55:24 +00:00
Ettore Di Giacinto
375bf1929d fix(ui): hide meta-dev backends in System → Backends Development toggle
The Manage view's flagsFor() short-circuited on b.IsMeta and returned
dev=false for every meta backend, so meta-dev entries
(e.g. llama-cpp-development, whisper-development, insightface-development)
leaked through the Development toggle in distributed mode and stayed
visible whether the toggle was on or off. The count chip even
under-reported because those rows were excluded from it.

Drop the IsMeta short-circuit and trust gallery enrichment for both
flags. Production metas (llama-cpp) are tagged isAlias=false /
isDevelopment=false in the gallery so they still pass both toggles;
meta-dev entries carry isDevelopment=true and now correctly hide
alongside concrete dev variants.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 20:38:20 +00:00
Ettore Di Giacinto
9a7f5e68bd ci(darwin): add native caches to backend_build_darwin
macOS runners can't use the registry-backed BuildKit cache (no Docker
daemon), so every darwin matrix run was paying full cost for brew
installs, Go module downloads, llama.cpp recompiles and Python wheel
resolution.

Wires actions/cache@v4 into the reusable workflow for four caches:

- Go modules + build cache (setup-go cache: true), shared across matrix
- Homebrew downloads + selected /opt/homebrew/Cellar entries, with
  HOMEBREW_NO_AUTO_UPDATE so restored Cellar paths stay stable
- ccache for the llama-cpp CMake variants, keyed on the pinned
  LLAMA_VERSION; CMAKE_*_COMPILER_LAUNCHER is exported via GITHUB_ENV
  so backend/cpp/llama-cpp/Makefile picks it up without script changes
- Python uv + pip wheel cache, keyed by backend + ISO week — same
  one-cold-rebuild-per-week cadence as the Linux DEPS_REFRESH

Read/write semantics match the existing BuildKit policy: every run
restores, only master/tag pushes save, so PRs can't pollute master's
warm cache.

Documents the new caches and the macOS-specific constraints in
.agents/ci-caching.md.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
2026-04-27 20:17:36 +00:00
Ettore Di Giacinto
6b63b47f61 feat(distributed): support multiple replicas of one model on the same node (#9583)
* feat(distributed): support multiple replicas of one model on the same node

The distributed scheduler implicitly assumed `(node_id, model_name)` was
unique, but the schema didn't enforce it and the worker keyed all gRPC
processes by model name alone. With `MinReplicas=2` against a single
worker, the reconciler "scaled up" every 30s but the registry never
advanced past 1 row — the worker re-loaded the model in-place every tick
until VRAM fragmented and the gRPC process died.

This change introduces multi-replica-per-node as a first-class concept,
with capacity-aware scheduling, a circuit breaker, and VRAM
soft-reservation. Operators can declare per-node capacity via the worker
flag `--max-replicas-per-model` (mirrored as auto-label
`node.replica-slots=N`) or override per-node from the UI.

* Schema: BackendNode gains MaxReplicasPerModel (default 1) and
  ReservedVRAM. NodeModel gains ReplicaIndex (composite with node_id +
  model_name). ModelSchedulingConfig gains UnsatisfiableUntil/Ticks for
  the reconciler circuit breaker.

* Registry: replica_index threaded through SetNodeModel, RemoveNodeModel,
  IncrementInFlight, DecrementInFlight, TouchNodeModel, GetNodeModel,
  SetNodeModelLoadInfo and the InFlightTrackingClient. New helpers:
  CountReplicasOnNode, NextFreeReplicaIndex (with ErrNoFreeSlot),
  RemoveAllNodeModelReplicas, FindNodesWithFreeSlot,
  ClusterCapacityForModel, ReserveVRAM/ReleaseVRAM (atomic UPDATE with
  ErrInsufficientVRAM), and the unsatisfiable-flag CRUD.

* Worker: processKey now `<modelID>#<replicaIndex>` so concurrent loads
  of the same model land on distinct ports. Adds CLI flag
  --max-replicas-per-model (env LOCALAI_MAX_REPLICAS_PER_MODEL, default 1)
  and emits the auto-label.

* Router: scheduleNewModel filters candidates by free slot, allocates the
  replica index, and soft-reserves VRAM before installing the backend.
  evictLRUAndFreeNode now deletes the targeted row by ID instead of all
  replicas of the model on the node — fixes a latent bug where evicting
  one replica orphaned its siblings.

* Reconciler: caps scale-up at ClusterCapacityForModel so a misconfig
  (MinReplicas > capacity) doesn't loop forever. After 3 consecutive
  ticks of capacity==0 it sets UnsatisfiableUntil for a 5m cooldown and
  emits a warning. ClearAllUnsatisfiable fires from Register,
  ApproveNode, SetNodeLabel(s), RemoveNodeLabel and
  UpdateMaxReplicasPerModel so a new node joining or label changes wake
  the reconciler immediately. scaleDownIdle removes highest-replica-index
  first to keep slots compact.

* Heartbeat resets reserved_vram to 0 — worker is the source of truth
  for actual free VRAM; the reservation is only for the in-tick race
  window between two scheduling decisions.

* Probe path (reconciler.probeLoadedModels and health.doCheckAll) now
  pass the row's replica_index to RemoveNodeModel so an unreachable
  replica doesn't orphan healthy siblings.

* Admin override: PUT /api/nodes/:id/max-replicas-per-model sets a
  sticky override (preserved across worker re-registration). DELETE
  clears the override so the worker's flag applies again on next
  register. Required because Kong defaults the worker flag to 1, so
  every worker restart would have silently reverted the UI value.

* React UI: always-visible slot badge on the node row (muted at default
  1, accented when >1); inline editor in the expanded drawer with
  pencil-to-edit, Save/Cancel, Esc/Enter, "(override)" indicator when
  the value is admin-set, and a "Reset" button to hand control back to
  the worker. Soft confirm when shrinking the cap below the count of
  loaded replicas. Scheduling rules table gets an "Unsatisfiable until
  HH:MM" status badge surfacing the cooldown.

* node.replica-slots filtered out of the labels strip on the row to
  avoid duplicating the slot badge.

23 new Ginkgo specs (registry, reconciler, inflight, health) cover:
multi-replica row independence, RemoveNodeModel of one replica
preserving siblings, NextFreeReplicaIndex slot allocation including
ErrNoFreeSlot, capacity-gated scale-up with circuit breaker tripping
and recovery on Register, scheduleDownIdle ordering, ClusterCapacity
math, ReserveVRAM admission gating, Heartbeat reset, override survival
across worker re-registration, and ResetMaxReplicasPerModel handing
control back. Plus 8 stdlib tests for the worker processKey / CLI /
auto-label.

Closes the flap reproduced on Qwen3.6-35B against the nvidia-thor
worker (single 128 GiB node, MinReplicas=2): the reconciler now caps
the scale-up at the cluster's actual capacity instead of looping.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Read] [Edit] [Bash] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:golang-testing]

* refactor(react-ui/nodes): tighten capacity editor copy + adopt ActionMenu for row actions

* Capacity editor hint trimmed from operator-doc-style ("Sourced from
  the worker's `--max-replicas-per-model` flag. Changing it here makes it
  a sticky admin override that survives worker restarts." → "Saved
  values stick across worker restarts.") and the override-state copy
  similarly compressed. The full mechanic is no longer needed in the UI
  — the override pill carries the meaning and the docs cover the rest.

* Node row actions migrated from an inline cluster of icon buttons
  (Drain / Resume / Trash) to the kebab ActionMenu used by /manage for
  per-row model actions, so dense Nodes tables stay clean. Approve
  stays as a prominent primary button — it's a stateful admission gate,
  not a routine action, and elevating it matches how /manage surfaces
  install-time decisions outside the menu.

* The expanded drawer's Labels section now filters node.replica-slots
  out of the editable label list. The label is owned by the Capacity
  editor above; surfacing it again as an editable label invited
  confusion (the Capacity save would clobber any direct edit).

Both backend and agent workers benefit — they share the row rendering
path, so the action menu and label filter apply to both.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:critique] [Skill:audit] [Skill:polish]

* fix(react-ui/nodes): suppress slot badge on agent workers

Agent workers don't load models, so the per-node replica capacity is
inapplicable to them. Showing "1× slots" on agent rows was a tiny
inconsistency from the unified rendering path — gate the badge on
node_type !== 'agent' so it only appears on backend workers.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp]

* refactor(react-ui/nodes): distill expanded drawer + restyle scheduling form

The expanded node drawer used to stack five panels — slot badge,
filled capacity box, Loaded Models h4+empty-state, Installed Backends
h4+empty-state, Labels h4+chips+form — making routine inspections feel
like a control panel. The scheduling rule form wrapped its mode toggle
as two 50%-width filled buttons that competed visually with the actual
primary action.

* Drawer: collapse three rarely-touched config zones (Capacity,
  Backends, Labels) into one `<details>` "Manage" disclosure (closed by
  default) with small uppercase eyebrow labels for each zone instead of
  parallel h4 sub-headings. Loaded Models stays as the at-a-glance
  headline with a single-line empty hint instead of a boxed empty state.
  CapacityEditor renders flat (no filled background) — the Manage
  disclosure provides framing.

* Scheduling form: replace the chunky 50%-width button-tabs with the
  project's existing `.segmented` control (icon + label, sized to
  content). Mode hint becomes a single tied line below. Fields stack
  vertically with helper text under inputs and a hairline divider above
  the right-aligned Save / Cancel.

The empty drawer collapses from ~5 stacked sections (~280px tall) to
two lines (~80px). The scheduling form now reads as a designed dialog
instead of raw building blocks. Both surfaces now match the typographic
density and weight of the rest of the admin pages.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [chrome-devtools-mcp] [Skill:distill] [Skill:audit] [Skill:polish]

* feat(react-ui/nodes): replace scheduling form's model picker with searchable combobox

The native <select> made operators scroll through every gallery entry to
find a model name. The project already has SearchableModelSelect (used
in Studio/Talk/etc.) which combines free-text search with the gallery
list and accepts typed model names that aren't installed yet — useful
for pre-staging a scheduling rule before the node it'll run on has
finished bootstrapping.

Also drops the now-unused useModels import (the combobox manages the
gallery hook internally).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit]

* refactor(react-ui/nodes): consolidate key/value chip editor + add replica preset chips

The Nodes page was rendering the same key=value chip pattern in two
places with subtly different markup: the Labels editor in the expanded
drawer and (post-distill) the Node Selector input in the scheduling
form. The form's input was also a comma-separated string that operators
were getting wrong.

* Extract <KeyValueChips> as a fully controlled chip-builder. Parent
  owns the map and decides what onAdd/onRemove does — form state for the
  scheduling form, API calls for the live drawer Labels editor. Same
  visuals everywhere; one component to change when polish needs apply.

* Replace the comma-separated Node Selector text input with KeyValueChips.
  Operators were copying syntax from docs and missing commas; the chip
  vocabulary makes the key=value structure self-documenting.

* Add <ReplicaInput>: numeric input + quick-pick preset chips for Min/Max
  replicas. Picked over a slider because replica counts are exact specs
  derived from VRAM math (operator decision, not a fuzzy estimate). The
  chips give one-click access to common values (1/2/3/4 for Min,
  0=no-limit/2/4/8 for Max) without the slider's special-value problem
  (MaxReplicas=0 is categorical, not a position on a continuum).

* Drop the now-unused labelInputs state in the Nodes page (the inline
  label editor's per-node draft state lived there and is now owned by
  KeyValueChips).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Skill:distill]

* test: fix CI fallout from multi-replica refactor (e2e/distributed + playwright)

Two breakages caught by CI that didn't surface in the local run:

* tests/e2e/distributed/*.go — multiple files used the pre-PR2 registry
  signatures for SetNodeModel / IncrementInFlight / DecrementInFlight /
  RemoveNodeModel / TouchNodeModel / GetNodeModel / SetNodeModelLoadInfo
  and one stale adapter.InstallBackend call in node_lifecycle_test.go.
  All updated to pass replicaIndex=0 — these tests don't exercise
  multi-replica behavior, they just need to compile against the new
  signatures. The chip-builder tests in core/services/nodes/ already
  cover the multi-replica logic.

* core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js — the
  drawer's distill refactor moved Backends inside a "Manage" <details>
  disclosure that's collapsed by default. The test helper expanded the
  node row but never opened Manage, so the per-node backend table was
  never in the DOM. Helper now clicks `.node-manage > summary` after
  expanding the row.

All 100 playwright tests pass locally; tests/e2e/distributed compiles
clean.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 21:20:05 +02:00
Ettore Di Giacinto
f4036fa83f ci(python-backends): add weekly DEPS_REFRESH cache-buster
The shared backend/Dockerfile.python ends in:
    RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
which `pip install`s each backend's requirements*.txt. A scan of all 34
Python backends shows every single one ships at least some unpinned deps
(torch, transformers, vllm, diffusers, ...). With the registry cache now
enabled, that `make` layer's BuildKit hash depends only on Dockerfile
instructions + COPYed source — not on what pip resolves at runtime — so
a warm cache would freeze upstream versions indefinitely.

DEPS_REFRESH is an ARG declared right before that RUN. backend_build.yml
computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) and passes it as
a build-arg, so the install layer invalidates at most once per week and
re-resolves PyPI / nightly indexes. Within a week, builds stay warm.

Only Dockerfile.python is affected: Go (go.sum) and Rust (Cargo.lock)
already lock their deps, and the C++ backends pull gRPC at a pinned tag
and llama.cpp at a pinned commit.

Add .agents/ci-caching.md documenting the cache layout
(quay.io/go-skynet/ci-cache:cache<tag-suffix>), read/write semantics
(master writes, PRs read-only), DEPS_REFRESH semantics, and how to
manually evict tags. Index it from AGENTS.md (CLAUDE.md is a symlink).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 14:21:11 +00:00
Ettore Di Giacinto
3810fe1a1e fix(distributed): worker container healthcheck always unhealthy
The Dockerfile's HEALTHCHECK probes http://localhost:8080/readyz, which
is the OpenAI API server port. When the same image runs as a worker, it
listens on the gRPC base port (50051) and an HTTP file transfer server
on port-1 (50050) — nothing on 8080 — so docker always reports the
container as unhealthy.

Add unauthenticated /readyz and /healthz endpoints to the worker's HTTP
file transfer server, and override HEALTHCHECK_ENDPOINT for worker-1 in
the distributed compose file. Disable the healthcheck for agent-worker
since it is NATS-only and exposes no HTTP server.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 13:51:57 +00:00
Ettore Di Giacinto
bdfa5e934a ci: switch image/backend build cache to a dedicated registry image
- Switch cache-from/cache-to in backend_build.yml and image_build.yml
  from the unused gha cache to type=registry pointing at
  quay.io/go-skynet/ci-cache:cache<tag-suffix>, mode=max with
  ignore-error=true. Master/tag builds populate their own
  per-matrix-entry cache; PR builds read-only.
- Drop the broken generate_grpc_cache.yaml cron. It targeted a `grpc`
  Dockerfile stage that was removed by b1fc5acd in July 2025, has been
  failing every night since, and never populated the gha cache. The new
  registry-cache scheme is self-warming, so no separate populator is
  needed.
- Remove the dead GRPC_VERSION / GRPC_BASE_IMAGE / GRPC_MAKEFLAGS
  build-args from image_build.yml and the orphan ARG GRPC_BASE_IMAGE in
  the root Dockerfile (the root Dockerfile no longer compiles gRPC; the
  source build now lives in backend/Dockerfile.{llama-cpp,
  ik-llama-cpp, turboquant} only and uses its own ARG defaults).
- Drop the unused grpc-base-image input from image_build.yml plus the
  matrix passthroughs in image.yml / image-pr.yml.
- Drop the unused GRPC_VERSION env in test.yml.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7-1m
2026-04-27 13:13:04 +00:00
Richard Palethorpe
deca6dbdad feat: Log backend exit code (#9581)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-27 14:19:18 +02:00
Ettore Di Giacinto
60549a8a60 feat(react-ui): page-width archetype system + mobile/tablet nav polish
Replace the universal max-width:1200px cap on .page with a four-tier
archetype system (narrow 760, medium 1080, default 1600, wide unbounded)
selected per page based on what its UX actually wants. Data/table pages
fill ultrawide displays; forms cap at reading width; tabbed feature
surfaces breathe.

Mobile/tablet:
- New 640/1024 breakpoint split. Tablets (640-1023) get a persistent
  52px icon rail; below 640 keeps the slide-off drawer.
- Drawer polish: body-scroll lock, Escape to close, focus moves into
  the drawer on open and back to the hamburger on close, aria-hidden
  + inert on main while open.
- Mobile top bar carries hamburger + theme toggle + account avatar
  (44x44 touch targets) so theme/account aren't trapped in the drawer.
- Page-level reflow on phones: page-header column-stacks, filter chips
  scroll horizontally, tables go edge-to-edge, OperationsBar overflows
  rather than wrapping. Honors prefers-reduced-motion.

Manage > Models: drop the toggle column; Enable/Disable joins the
per-row Actions menu alongside Stop/Pin/Edit/Logs/Delete for
consistency with the other action verbs.

Page-width tokens live in theme.css so future tuning is one line.
Removes 7 inline maxWidth workarounds from page roots.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Edit] [Bash]
2026-04-27 11:51:29 +00:00
Ettore Di Giacinto
54728e292f feat(react-ui): split Manage backends toggle into Variants and Development
Meta backends are now always shown — they're the entries operators
configure against — and two independent toggles govern the noise around
them. "Variants" hides platform-specific concrete builds that a meta
backend aliases on the host (e.g. llama-cpp-cuda12-12.4). "Development"
hides pre-release `-development` builds. Each toggle shows the count of
items currently hidden in its category. The legacy `bm` URL flag is
honored on read so existing deep-links resolve to the same view they
used to.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-27 08:23:53 +00:00
Tai An
86fd62233f fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) (#9580)
The overrides.parameters.model field referenced 'Qwen3.-27B-Claude-...' (missing the '5'), so model loads failed because the configured filename did not match the file actually downloaded by the entry's files: list ('Qwen3.5-27B-Claude-...').

Aligns the override filename with the files: entries and with the upstream HF repo (mradermacher/Qwen3.5-27B-...).
2026-04-27 09:24:00 +02:00
Alex Brick
41ed8ced70 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) (#9578)
Update additional intel base images
2026-04-27 09:18:57 +02:00
LocalAI [bot]
05e94bd9e7 chore: ⬆️ Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 (#9577)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-27 08:57:24 +02:00
Ettore Di Giacinto
8d124d080f feat(gallery): add whisper-development umbrella stanza
Mirrors the whisper capabilities map with -development variants so
clients can pull the master-tagged whisper.cpp backend via a single
platform-resolved name, matching the existing faster-whisper-development
and whisperx-development entries.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 23:04:27 +00:00
Ettore Di Giacinto
2da1a4d230 feat(distributed): per-node backend installation from the gallery
In distributed mode the Backends gallery used to fan every install out to
every worker — fine for auto-resolving (meta) backends like llama-cpp where
each node picks its own variant, but wrong for hardware-specific builds
like cpu-llama-cpp that would silently land on every GPU node.

Adds a node-targeted install path through the existing
POST /api/nodes/:id/backends/install plumbing, with two entry points:

- Backends gallery row gets a split-button in distributed mode. Auto-
  resolving keeps "Install on all nodes" as the primary; chevron menu
  opens the picker. Hardware-specific routes the primary directly to the
  picker — no fan-out path on the row.
- Nodes-page drawer gets a "+ Add backend" button that navigates to
  /app/backends?target=<node-id>; the gallery scopes itself to that node
  (banner, single per-row install button, Reinstall/Remove for already-
  installed). One gallery, two scopes — no second UI to maintain.

The picker (new NodeInstallPicker) shows a 3-state suitability column
(Compatible / Override / Installed), an auto-expanding variant override
disclosure that fires when selected nodes have no working GPU, parallel
per-node installs with inline status and Retry-failed-nodes, and a
mismatch confirm that names the consequence on the button itself.

A 409 fan-out guard on /api/backends/apply protects CLI/Terraform/script
users from the same footgun: hardware-specific installs in distributed
mode now return code "concrete_backend_requires_target" with a human-
readable error and a meta_alternative pointer.

The gallery list payload now surfaces capabilities, metaBackendFor and
per-row nodes (NodeBackendRef) so the picker and the new Nodes column
have everything they need without re-walking the gallery client-side.

GODEBUG=netdns=go is set on the compose services because the cgo DNS
resolver follows the container's nsswitch.conf to host systemd-resolved
(127.0.0.53), unreachable from inside the container; the pure-Go
resolver reads /etc/resolv.conf directly and uses Docker's embedded DNS.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7[1m] [Edit] [Bash] [Read] [Write]
2026-04-26 22:05:18 +00:00
Ettore Di Giacinto
988430c850 test(react-ui): drive Manage page Backend logs link via the new kebab menu
Manage page row actions moved into ActionMenu in b336d9c6, so the
inline `<a title="Backend logs">` the e2e specs were asserting on no
longer exists. Open the row's kebab and assert against the menuitem.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7
2026-04-26 20:51:01 +00:00
Ettore Di Giacinto
b336d9c626 feat(react-ui): polish Manage page with kebab menus and gallery rows
Bring the System / Manage page up to the visual standard of the Install
gallery so installed models and backends stop reading like a debug dump.

- Unified ResourceRow anatomy (icon, name+description, badges, status,
  expandable detail) shared across both tabs.
- Gallery enrichment cross-references installed names against the gallery
  list endpoints to surface icons, descriptions, license, tags, and links
  with a graceful "no description" fallback for custom imports.
- Header summary with four StatCards (Models / Backends / Running /
  Updates) — clickable to switch tab + pre-set filter.
- Backends meta + development entries hidden by default; "Show meta &
  development" paired toggle in the FilterBar with hidden-count hint.
- Kebab (three-dot) ActionMenu replaces the inline button cluster on
  every row; restrained until hover, keyboard-navigable, danger items
  separated by a divider.
- Backend "Version" cell now falls back to short digest, OCI tag, or
  ocifile basename when no semver is set, instead of showing "—" for
  every OCI install. Detail panel exposes full Source URI + Digest.
- Drop redundant column headers ("Actions", "On") — kebabs and toggles
  carry their own affordance; screen readers still get a label.
- Inline System / User / Meta / Dev badges next to the backend name so
  the dedicated Type column doesn't reserve space for "USER" repeated.
- Tightened the spacing between the System Resources card and the
  StatCards so they no longer crowd the RAM bar.

Extracted StatCard and GalleryLoader from Nodes.jsx and Models.jsx into
shared components so the visual language is one source of truth.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
2026-04-26 20:33:49 +00:00
Ettore Di Giacinto
f384c64a91 fix(model-loader): also skip .ckpt, .zip, and .tag files when scanning models
The local model directory scan treats every non-skipped file as a model
config candidate. Sidecar artifacts that ship alongside checkpoints
(checkpoint blobs, downloaded archives, ggml-style tag files) were
slipping through and showing up as bogus models in the listing. Add
their extensions to the suffix-skip list.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:37:53 +00:00
Ettore Di Giacinto
e9d8e92988 fix(react-ui): don't yank chat scroll to bottom while user is reading
The chat and agent-chat pages auto-scrolled to the bottom on every
streamed token. If the user scrolled up to re-read part of a response,
the next chunk pulled them back down — making long replies unreadable
while streaming.

Track a stickToBottomRef on each scroll event: if the user is within
80px of the bottom we keep auto-scrolling, otherwise we leave them
where they are. On chat switch we snap back to the bottom and re-pin.

Same fix applied to both Chat.jsx and AgentChat.jsx since they share
the same streaming pattern.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
5b0196c7d0 fix(whisper): scrub invalid UTF-8 from segment text before protobuf marshal
whisper.cpp can emit bytes that are not valid UTF-8 — typically a
multibyte codepoint split across token boundaries. protobuf string
fields reject those at marshal time, which would surface as a transcribe
failure. Run strings.ToValidUTF8 on the segment text before it leaves
the cgo boundary so the bad byte gets replaced with U+FFFD.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
Ettore Di Giacinto
c8d63a1003 fix(react-ui): stop Manage page from blanking on auto-refresh; show real model use cases
- useModels.refetch now runs silently — distributed-mode 10s auto-refresh
  no longer flips loading=true and replaces the table with a spinner card.
- Manage Use Cases column derives badges from each model's actual
  capabilities (Chat / Image / TTS / Embeddings / etc.) instead of
  hardcoding a "Chat" link for every row.
- FilterBar right slot is right-aligned via margin-left:auto so the
  Update button lives at the end of the row, not next to the chips.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-26 19:35:39 +00:00
LocalAI [bot]
d9cb0d6133 chore: ⬆️ Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 (#9572)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:35 +02:00
LocalAI [bot]
f5c268deac chore: ⬆️ Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a (#9571)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 12:38:25 +02:00
Tai An
8931a2ad31 fix(gallery): normalize inconsistent tag casing/plurals across gallery models (#9574)
- embeddings → embedding (6 models): aligns with the WebUI filter button
  defined in core/http/views/models.html ({ term: 'embedding', ... }), so
  models like nomic-embed-text-v1.5 now appear under the Embedding filter
- TTS → tts (5 models), ASR → asr (2 models): lowercase, per existing
  convention used by 161+ models
- CPU/Cpu → cpu (17 models), GPU → gpu (17 models): lowercase, per existing
  convention used by 666+ models
- dedupe duplicate tag entries on 3 models that already had repeated tags
  (gpt-oss-20b had gguf x2; arcee-ai/AFM-4.5B had gpu x2; one Qwen model
  had default x2)

Closes #9247
2026-04-26 08:33:38 +02:00
Ettore Di Giacinto
e16e758dff ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 (#9573)
Extend the existing CPU build matrix entries to produce a multi-arch
manifest (linux/amd64,linux/arm64) at the same image tags. arm64
Linux hosts without an NVIDIA GPU report the "default" capability,
which already maps to cpu-whisperx / cpu-faster-whisper in
backend/index.yaml -- so the manifest list lets Docker pull the right
variant without any gallery changes.

Both stacks install cleanly under aarch64: torch (2.4.1/2.8.0),
faster-whisper, ctranslate2, whisperx, opencv-python and the
remaining deps all ship manylinux2014_aarch64 wheels, so no source
builds run under QEMU emulation.

Follows the same pattern already used by cpu-llama-cpp-quantization.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-26 08:30:03 +02:00
LocalAI [bot]
1c45227346 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 (#9570)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-26 08:26:37 +02:00
Ettore Di Giacinto
fbe4f0a99b fix(docs): replace Docsy alert shortcode with Relearn notice
The docs site uses the hugo-theme-relearn theme, which provides
`notice` instead of Docsy's `alert`. The face-recognition,
voice-recognition, and stores feature pages used `{{% alert %}}`,
breaking `hugo build` with "template for shortcode \"alert\" not
found".

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 21:04:31 +00:00
Ettore Di Giacinto
d733c9cd13 fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds (#9568)
Blaizzy/mlx-vlm git HEAD bumped its constraint to mlx>=0.31.2, but
mlx-cuda-12 and mlx-cuda-13 are only published up to 0.31.1 on PyPI.
Since mlx[cudaXX]==0.31.2 forces a sibling wheel that doesn't exist,
pip backtracks through every older mlx[cudaXX], none of which satisfy
mlx>=0.31.2, producing ResolutionImpossible.

Pin all variants to the v0.4.4 tag (mlx>=0.30.0), which resolves
cleanly against mlx[cuda13]==0.31.1. cpu/mps weren't broken yet but
are pinned for consistency.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 22:06:01 +02:00
Ettore Di Giacinto
703b4fcae8 Change cron schedule to run every 12 hours
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-25 18:38:28 +02:00
Richard Palethorpe
73aacad2f9 fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

  ImportError: .../flash_attn_2_cuda...so: undefined symbol:
  _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-25 15:38:13 +00:00
LocalAI [bot]
806ea24ff4 chore: ⬆️ Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d (#9497)
⬆️ Update TheTom/llama-cpp-turboquant

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:46 +02:00
LocalAI [bot]
385de3705e chore(model gallery): 🤖 add 1 new models via gallery agent (#9558)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 14:03:15 +02:00
Ettore Di Giacinto
21eace40ec feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.


Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 14:02:57 +02:00
Ettore Di Giacinto
24505e57f5 feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)
* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang

Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM
serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and
JetPack 7 boards:

- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130
  prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm
  0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master;
  cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only — uses the prebuilt sglang wheel from the
  jetson-ai-lab SBSA cu130 index, so no source build is needed.
  Cublas13 sglang on x86_64 is intentionally deferred.

CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni},
-nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml
gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13)
and latest/development merge entries.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

* fix(backends): use unsafe-best-match index strategy on l4t13 builds

The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.)
at limited versions / older Python ABIs. uv defaults to the first index
that contains a package and refuses to fall through to PyPI, so sglang
l4t13 build fails resolving decord. Mirror the existing cpu sglang
profile by setting --index-strategy=unsafe-best-match on l4t13 across
the three backends, and apply it to the explicit vllm install line in
vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]

* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0

The [all] extra brings in outlines→decord, and decord has no aarch64
cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37
tags). With unsafe-best-match enabled, uv backtracked through sglang
versions trying to satisfy decord and silently landed on
sglang==0.1.16, an ancient version with an entirely different dep
tree (cloudpickle/outlines 0.0.44, etc.).

Drop [all] so decord is no longer required, and floor sglang at 0.5.0
to prevent any future resolver misfire from degrading the version
again.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 12:26:29 +02:00
LocalAI [bot]
d09706dc60 chore(model gallery): 🤖 add 1 new models via gallery agent (#9555)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 09:00:37 +02:00
LocalAI [bot]
08e393f7db chore: ⬆️ Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 (#9549)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:59:48 +02:00
LocalAI [bot]
47cc3dc8d7 chore: ⬆️ Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 (#9548)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-25 08:58:27 +02:00
Ettore Di Giacinto
83b384de97 feat: surface distributed backend management errors (#9552)
* fix(distributed): surface per-node backend op errors to OpStatus

DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the
per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`.
When workers replied Success=false (e.g. an OCI image with no arm64
variant on a Jetson host), the per-node Error string was recorded in
result.Nodes[].Error but never reached the toplevel return value, so
OpStatus.Error stayed empty and the UI reported the install as
"completed" while the backend was nowhere on the cluster.

Add BackendOpResult.Err() that aggregates per-node Status=="error"
entries into a single error. Queued nodes (waiting for reconciler retry)
are deliberately not treated as failures. Wire the three callers and
DeleteBackendDetailed to call result.Err() so reply.Success=false
finally reaches OpStatus.Error → /api/backends/job/:uid → the UI.

The Delete closures had a related bug: they discarded the reply with
`_` and only checked the NATS round-trip error, so reply.Success=false
was a silent success even with the new aggregation. Check both.

Standalone mode (LocalBackendManager) already surfaces gallery errors
correctly through the same OpStatus.Error path; no change needed there.

Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct
errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]

* feat(react-ui): per-node backend delete + clearer upgrade affordance

The Nodes page exposed a per-node "reinstall" button (fa-sync-alt,
tooltip "Reinstall backend") but no per-node delete, even though the
Go side has had POST /api/nodes/:id/backends/delete →
RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up
for a while. Sync icons read as "refresh data" — the action is
functionally an upgrade (re-pulls the gallery image), so the affordance
was misleading.

Per-node backend row now renders two icon buttons:

- Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend
  on this node". Names both action and scope to differentiate from the
  cluster-wide upgrade on the Backends page.
- Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend
  from this node". Matches the node-level destructive style at the row
  action column rather than the solid btn-danger of primary destructive
  pages, since this is a secondary action inside a busy row.

Delete goes through the existing ConfirmDialog (danger=true) with copy
that names the backend and the node explicitly — it's a non-recoverable
op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which
already existed in the API client.

Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip),
delete button presence, confirm dialog flow with POST body assertion,
and cancel-doesn't-POST.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
2026-04-25 08:57:59 +02:00
Ettore Di Giacinto
487e3fd2a4 feat(react-ui): editorial refresh with Nord palette and polished primitives (#9550)
* feat(react-ui): editorial refresh with Nord palette and polished primitives

Replaces the cool gray-blue theme with a deep Nord-inspired palette:
frost-cyan accent (#88c0d0) on deep blue-black surfaces (#13171f /
#1a1f2a / #242a36), snow-storm text scale, aurora status colours.

- Typography: Geist Variable + Geist Mono Variable (Google Fonts) with
  ss01/ss03/cv11 stylistic alternates; strengthened h1-h6 hierarchy;
  editorial negative tracking.
- Primitives: buttons gain depth (inset highlight + hover lift +
  brightness filter); inputs become sunken wells with sage-swap-to-frost
  focus rings; cards hover-lift and gain an .card--accent left-rail
  variant; badges become mono caps rectangles with tabular-nums.
- Chrome: sidebar active state is now an inset left rail + tint
  (no border-left); modals get popIn animation and proper shadow lift;
  toasts carry an inset accent bar + slide-in instead of tinted fills;
  operations bar breathes on active installs.
- Empty states: editorial pattern (eyebrow rule, large mono title,
  52ch lede) that inherits gracefully even without page JSX edits.
- Chat: assistant bubbles drop the gray-nested-in-gray card for a
  transparent pull-quote with a left border; user bubbles soften from
  loud accent fill to a subtle frost tint.
- Motion: custom spring easing cubic-bezier(0.22,1,0.36,1), 180ms
  standard; breathing/pulse/popIn keyframes; global prefers-reduced-
  motion honoring.
- Radii tightened to 3/5/8/10px; warm-shadow tokens redone for cool
  depth; ::selection, :focus-visible, kbd globals added.
- Migrated hardcoded 'JetBrains Mono' CSS literals to var(--font-mono)
  so the Geist Mono swap lands everywhere.

Scope is intentionally tokens + primitives only. Page JSX and the
~1,800 inline style={{…}} instances are untouched and flagged as
follow-ups.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]

* feat(react-ui): complete-coverage pass — migrate inline styles to tokens

Follows up the editorial/Nord token refresh with a mechanical sweep of
page JSX and shared components so nothing bypasses the design system.

- Font family: replaced 80+ 'JetBrains Mono' / 'Space Grotesk' inline
  literals (and the string-CSS variants in CollectionDetails and
  AgentStatus) with var(--font-mono) / var(--font-sans). SVG <text>
  nodes that used the attribute form were switched to style={{ }} so
  the CSS variable resolves.
- Radii: every unquoted numeric borderRadius (2/3/4/10) is now a
  var(--radius-*) token; 50% and 999px kept as computed shapes.
- Spacing: clean-token gaps and margins (4/8/16px) moved to
  var(--spacing-xs/sm/md); padding: '4px 8px' and '8px 16px' lifted
  into token pairs. Micro-values (2/6/10/12px) left inline where no
  token maps cleanly.
- Colors: Talk.jsx button/canvas-surface hardcodes moved to
  var(--color-*); FineTune.jsx chart series colours now use the
  --color-data-* Nord palette (cyan/red/purple/orange instead of
  tailwind hex); AgentStatus tool-call icon and error tag hex swapped
  for var(--color-warning) / var(--color-text-inverse).
- CodeMirror editor (utils/cmTheme.js): both themes rebased on Nord —
  polar-night surfaces and aurora syntax highlighting (dark), snow-
  storm surfaces with darkened aurora (light). Caret/selection/active
  line/search now frost-cyan tinted instead of legacy indigo/purple.

Legitimately dynamic styles (computed widths, per-row colours, canvas
2D context fill/stroke for waveform and spectrogram drawing) remain
inline — they can't be expressed as CSS tokens.

29 files, +237/-237 — identity preserved, semantics re-anchored to
the token system.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write]
2026-04-24 23:35:59 +02:00
dependabot[bot]
9ab3496de2 chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory (#9546)
chore(deps): bump rustls-webpki

Bumps the cargo group with 1 update in the /backend/rust/kokoros directory: [rustls-webpki](https://github.com/rustls/webpki).


Updates `rustls-webpki` from 0.103.10 to 0.103.13
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.10...v/0.103.13)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-version: 0.103.13
  dependency-type: indirect
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:58 +02:00
dependabot[bot]
c4511be33a chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9544)
chore(deps): bump postcss

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [postcss](https://github.com/postcss/postcss).


Updates `postcss` from 8.5.8 to 8.5.10
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.10)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.10
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-24 22:02:41 +02:00
Ettore Di Giacinto
551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
  (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
  manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
  `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
  `/sys/devices/soc0/soc_id`.

* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
  `/proc/device-tree/model` was missing. That's what happens for Thor inside a
  docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
  resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
  (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
  correctly.

* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
  usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
  with `waitid: no child processes` under containers without `--init`. Each
  such heartbeat overwrote the DB and made the UI flip to "fully used".
  `heartbeatBody` now omits `available_vram` in that case so the DB keeps its
  last good value.

Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]
2026-04-24 22:02:23 +02:00
Andreas Egli
1d0de757c3 fix: add hipblaslt library (#9541)
Signed-off-by: Andreas Egli <github@kharan.ch>
2026-04-24 18:50:03 +02:00
Alex Brick
e5337039b0 [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (#9543)
* Use latest oneapi-basekit image for Intel images

The current `localai/localai:master-gpu-intel` images don't work with the intel arc pro b70. Updating the base_image to 2025.3.2 fixes it.

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>

* Update github workflow base image

---------

Signed-off-by: Alex Brick <3220905+arbrick@users.noreply.github.com>
2026-04-24 18:29:10 +02:00
LocalAI [bot]
1c9592c77f chore: ⬆️ Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a (#9521)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-24 15:15:15 +02:00
Richard Palethorpe
3db60b57e6 fix(realtime): consume ChatDeltas when C++ autoparser clears Response (#9538)
The llama.cpp C++-side chat autoparser clears Reply.Message and delivers
parsed content/reasoning/tool-calls via Reply.chat_deltas. chat.go handles
this (non-SSE path uses ToolCallsFromChatDeltas/ContentFromChatDeltas/
ReasoningFromChatDeltas), but realtime.go only read pred.Response, so any
model routed through the autoparser (Qwen2.5/3 and friends) produced a
silent reply: backend emitted N tokens, the session surface saw zero.

Mirror the non-SSE chat path in realtime's triggerResponse: when deltas
carry tool calls or content, use them directly; otherwise fall back to
the existing raw-text parsing.

Assisted-by: claude-opus-4-7-1M [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-24 14:41:38 +02:00
Richard Palethorpe
13734ae9fa feat: Add Sherpa ONNX backend for ASR and TTS (#8523)
feat(backend): Add Sherpa ONNX backend and Omnilingual ASR

Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same
approach as opus/stablediffusion-ggml/whisper — a thin C shim
(csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego
can't reach directly: nested struct config writes, result-struct field
reads, and the streaming TTS callback trampoline. The Go side uses
opaque uintptr handles and purego.NewCallback for the TTS callback.

Supports:
- VAD via sherpa-onnx's Silero VAD
- Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC
- Online/streaming ASR: zipformer transducer with endpoint detection
  (AudioTranscriptionStream emits delta events during decode)
- Offline TTS: VITS (LJS, etc.)
- Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel,
  prefixed by a streaming WAV header

Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline
ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR),
silero-vad-sherpa, vits-ljs-sherpa.

E2E coverage: tests/e2e-backends for offline + streaming ASR,
tests/e2e for the full realtime pipeline (VAD + STT + TTS).

Assisted-by: claude-opus-4-7-1M [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-24 14:40:06 +02:00
Ettore Di Giacinto
c0920f3273 fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature (#9531)
Bumps ik_llama.cpp pin to 16996aeab7. Upstream 286ce32...16996ae adds a
trailing `const struct quantize_user_data *` parameter to
`ggml_quantize_chunk` (PR ikawrakow/ik_llama.cpp#1677) but leaves
`examples/llava/clip.cpp` unchanged because their build has moved to
`examples/mtmd/`. LocalAI's prepare.sh still copies from
`examples/llava/`, so the dead 7-arg call reaches the grpc-server
compile and fails. Patch the call site to pass `nullptr` for the new
param.

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]
2026-04-24 13:07:26 +02:00
LocalAI [bot]
7c1934b183 chore: ⬆️ Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 (#9520)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-24 09:17:06 +02:00
Tai An
5e062b4d1f fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) (#9526)
* fix(anthropic): use SetFunctionCallNameString for specific tool forcing

* fix(openai/realtime): use SetFunctionCallNameString for specific tool forcing

* fix(openresponses): use SetFunctionCallNameString for specific tool forcing
2026-04-24 09:06:42 +02:00
Ettore Di Giacinto
4906cbad04 feat: add biometrics UI (#9524)
* feat(react-ui): add Face & Voice Recognition pages

Expose the face and voice biometrics endpoints
(/v1/face/*, /v1/voice/*) through the React UI. Each page has four
tabs driving the six endpoints per modality: Analyze (demographics
with bounding boxes / waveform segments), Compare (verify with a
match gauge and live threshold slider), Enrollment (register /
identify / forget with a top-K matches view), Embedding (raw
vector inspector with sparkline + copy).

MediaInput supports file upload plus live capture: webcam
snap-to-canvas for face, MediaRecorder -> AudioContext ->
16-bit PCM mono WAV transcode for voice (libsndfile on the
backend only handles WAV/FLAC/OGG natively).

Sidebar gets a new Biometrics section feature-gated on
face_recognition / voice_recognition; routes are wrapped in
<RequireFeature>. No new dependencies -- Font Awesome icons
picked from the Free set.

Assisted-by: Claude:Opus 4.7

* fix(localai): accept data URI prefixes with codec/charset params

Browser MediaRecorder produces data URIs like
  data:audio/webm;codecs=opus;base64,...
so the pre-';base64,' section can carry multiple parameter
segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go
and core/http/endpoints/localai/audio.go only matched exactly one
segment, so recordings straight from the React UI's live-capture
tab failed the strip and then tripped the base64 decoder on the
leading 'data:' literal, surfacing as
  "invalid audio base64: illegal base64 data at input byte 4"

Widened both regexes to `^data:[^,]+?;base64,` so any number of
';param=value' segments between the mime type and ';base64,' are
tolerated. Added a regression test covering the MediaRecorder
shape.

Assisted-by: Claude:Opus 4.7

* fix(insightface): scope pack ONNX loading to known manifests

LocalAI's gallery extracts buffalo_* zips flat into the models
directory, which inevitably mixes with ONNX files from other
backends (opencv face engine, MiniFASNet antispoof, WeSpeaker
voice embedding) and older buffalo pack installs. Feeding those
foreign files into insightface's model_zoo.get_model() blows up
inside the router -- it assumes a 4-D NCHW input and indexes
`input_shape[2]` on tensors that aren't shaped like a face model,
raising IndexError mid-load and leaving the backend unusable.

The router's dispatch isn't amenable to per-file try/except alone
(first-file-wins picks det_10g.onnx from buffalo_l even when the
user asked for buffalo_sc -- alphabetical order happens to favour
the wrong pack). Instead, ship an explicit manifest of the
upstream v0.7 pack contents and scope the glob to that when the
requested pack is known. The manifest is small and stable; future
packs can be added alongside or fall through to the tolerance
loop, which also swallows any remaining IndexError / ValueError
from foreign files with a clear `[insightface] skipped` stderr
line for diagnostics.

Assisted-by: Claude:Opus 4.7

* fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders

Pre-exported speaker-encoder ONNX graphs come in two shapes:

  rank-2  [batch, samples]           -- some 3D-Speaker exports,
                                        take raw waveform directly.
  rank-3  [batch, frames, n_mels]    -- WeSpeaker and most Kaldi-
                                        lineage encoders, expect
                                        pre-computed Kaldi FBank.

OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` --
correct for rank-2, IndexError-on-input_shape[3] on rank-3, which
surfaced to the UI as
  "Invalid rank for input: feats Got: 2 Expected: 3"

Detect the input rank at session init and run Kaldi FBank
(80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before
the forward pass when rank>=3. All knobs are configurable via
backend options for encoders that deviate from defaults.

torchaudio.compliance.kaldi is already in the backend's
requirements (SpeechBrain pulls torchaudio in), so no new
dependency.

Assisted-by: Claude:Opus 4.7

* fix(biometrics): isolate face and voice vector stores

Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker
256-D) biometric embeddings were colliding inside a single
in-memory local-store instance. Enrolling one after the other
failed with
  "Try to add key with length N when existing length is M"
because local-store correctly refuses to mix dimensions in one
keyspace.

The registries were constructed with `storeName=""`, which in
StoreBackend() is just a WithModel() call. But ModelLoader's
cache is keyed on `modelID`, not `model` -- so both registries
collapsed to the same `modelID=""` slot and reused the same
backend process despite looking isolated on paper.

Three complementary fixes:

  1. application.go -- give each registry a distinct default
     namespace ("localai-face-biometrics" /
     "localai-voice-biometrics"). The comment claimed
     isolation, now it's actually enforced.

  2. stores.go -- pass the storeName as both WithModelID and
     WithModel so the ModelLoader cache key separates
     namespaces and the loader spawns distinct processes.

  3. local-store/store.go -- drop the Load() `opts.Model != ""`
     guard. It was there to prevent generic model-loading loops
     from picking up local-store by accident, but that auto-load
     path is being retired; the guard now just blocks legitimate
     namespace isolation. opts.Model is treated as a tag; the
     per-tuple process isolation upstream handles discrimination.

Assisted-by: Claude:Opus 4.7

* fix(gallery): stale-file cleanup and upgrade-tmp directory safety

Two related robustness fixes for backend install/upgrade:

pkg/downloader/uri.go
  OCI downloads passed through
      if filepath.Ext(filePath) != "" ...
          filePath = filepath.Dir(filePath)
  which was intended to redirect file-shaped download targets
  into their parent directory for OCI extraction. The heuristic
  misfires on directory-shaped paths with a dot-suffix --
  gallery.UpgradeBackend uses
      tmpPath = "<backendsPath>/<name>.upgrade-tmp"
  and Go's filepath.Ext treats ".upgrade-tmp" as an extension.
  The rewrite landed the extraction at "<backendsPath>/", which
  then **overwrote the real install** (backends/<name>/) with a
  flat-layout file and left a stray run.sh at the top level. The
  tmp dir itself stayed empty, so the validation step that
  checked "<tmpPath>/run.sh" predictably failed with
      "upgrade validation failed: run.sh not found in new backend"
  Every manual upgrade silently corrupted the backends tree this
  way. Guard the rewrite behind "target isn't already an existing
  directory" -- InstallBackend / UpgradeBackend both pre-create
  the target as a directory, so they get the correct behaviour;
  existing file-path callers with a genuine dot-extension still
  get the parent redirect.

core/gallery/backends.go
  InstallBackend's MkdirAll returned ENOTDIR when something at
  the target path was already a file (legacy dev builds dropped
  golang backend binaries directly at `<backendsPath>/<name>`
  instead of nesting them under their own subdir). That
  permanently blocked reinstall and upgrade for anyone carrying
  that state, since every retry hit the same error. Detect a
  pre-existing non-directory, warn, and remove it before the
  MkdirAll so the fresh install can write the correct nested
  layout with metadata.json + run.sh.

Assisted-by: Claude:Opus 4.7

* fix(galleryop): refresh upgrade cache after backend ops

UpgradeChecker caches the last upgrade-check result and only
refreshes on the 6-hour tick or after an auto-upgrade cycle.
Manual upgrades (POST /api/backends/upgrade/:name) go through
the async galleryop worker, which completes the upgrade
correctly but never tells UpgradeChecker to re-check -- so
/api/backends/upgrades continued to list a just-upgraded backend
as upgradeable, indistinguishable from a failed upgrade, for up
to six hours.

Add an optional `OnBackendOpCompleted func()` hook on
GalleryService that fires after every successful install /
upgrade / delete on the backend channel (async, so a slow
callback doesn't stall the queue). startup.go wires it to
UpgradeChecker.TriggerCheck after both services exist. Result:
the upgrade banner clears within milliseconds of the worker
finishing.

Assisted-by: Claude:Opus 4.7

* build: prepend GOPATH/bin to PATH for protogen-go

install-go-tools runs `go install` for protoc-gen-go and
protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin.
That directory isn't on every dev's PATH, and protoc resolves
its code-gen plugins via PATH, so the immediately-following
protoc invocation fails with
  "protoc-gen-go: program not found"
which in turn blocks `make build` and any
`make backends/%` target that depends on build.

Prepend `go env GOPATH`/bin to PATH for the protoc invocation
so the freshly-installed plugins are found without requiring a
shell-profile change.

Assisted-by: Claude:Opus 4.7

* refactor(ui-api): non-blocking backend upgrade handler with opcache

POST /api/backends/upgrade/:name used to send the ManagementOp
directly onto the unbuffered BackendGalleryChannel, which blocked
the HTTP request whenever the galleryop worker was busy with a
prior operation. The op also didn't show up in /api/operations,
so the Backends UI couldn't reflect upgrade progress on the
affected row.

Register the op in opcache immediately, wrap it in a cancellable
context, store the cancellation function on the GalleryService,
and push onto the channel from a goroutine so the handler
returns right away. Response gains a `jobID` field and a
`message` string so clients have a consistent handle regardless
of whether the op is queued or running.

Pairs with the OnBackendOpCompleted hook added in the galleryop
commit — together the UI sees the upgrade start, watches
progress via /api/operations, and drops the "upgradeable" flag
the moment the worker finishes.

Assisted-by: Claude:Opus 4.7
2026-04-24 08:50:34 +02:00
LocalAI [bot]
c755cd5ab5 feat(swagger): update swagger (#9518)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 23:26:50 +02:00
LocalAI [bot]
0fb04f7ac3 chore(model-gallery): ⬆️ update checksum (#9522)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 23:26:27 +02:00
Ettore Di Giacinto
d9d7b5c29b docs(readme): add April 2026 highlights to Latest News
Assisted-by: Claude-Code:claude-opus-4-7
2026-04-23 20:47:06 +00:00
walcz-de
f877942d97 fix(openresponses): parse OpenAI-spec nested tool_choice + use correct setter (#9509)
Two bugs in MergeOpenResponsesConfig (/v1/responses + WebSocket, *not*
/v1/chat/completions — that has a separate, working path via Tool
unmarshal + SetFunctionCallNameString):

1. **Shape mismatch.** OpenAI's specific-function tool_choice nests the
   name under "function":
       {"type": "function", "function": {"name": "my_function"}}
   The legacy flat shape was:
       {"type": "function", "name": "my_function"}
   Only the flat shape was handled. OpenAI-compliant clients that reach
   /v1/responses (openai-python with the Responses API, Stainless-generated
   SDKs, …) silently failed to force the function.

2. **Wrong setter.** The code called SetFunctionCallString(name), which
   writes the mode field (functionCallString: "none"/"auto"/"required").
   The specific-function name lives in a separate field
   (functionCallNameString), read by ShouldCallSpecificFunction and
   FunctionToCall. Net effect: a correctly-formed tool_choice never
   engaged grammar-based forcing.

The fix preserves backward compatibility by accepting both shapes
(nested preferred, flat as fallback) and routes to the correct setter.

Note: The same "wrong setter" pattern appears at three other sites —
anthropic/messages.go:883, openai/realtime_model.go:171, and
openresponses/responses.go:776 — and /v1/chat/completions has its own
issue parsing tool_choice="required" as a string (json.Unmarshal on a
raw string fails silently). Those are filed as a tracking issue rather
than bundled here to keep this PR focused.

## Test plan
9 new Ginkgo specs under "MergeOpenResponsesConfig tool_choice parsing":
  - string modes: "required" / "auto" / "none"
  - OpenAI-spec nested shape: {type:function, function:{name}}
  - Legacy Anthropic-compat flat shape: {type:function, name}
  - Shape-preference: nested wins over flat when both present
  - Malformed: missing type, wrong type, missing name, empty name, nil

$ go test ./core/http/middleware/ -count=1 -run TestMiddleware
  Ran 28 of 28 Specs in 0.003 seconds -- PASS

## Repro (against /v1/responses)

    curl -N http://localai/v1/responses \
         -H 'Content-Type: application/json' \
         -d '{"model":"qwen3.6-35b-a3b-apex",
              "input":"Weather in Berlin?",
              "tools":[{"type":"function","name":"get_weather",
                        "parameters":{"type":"object",
                          "properties":{"city":{"type":"string"}},
                          "required":["city"]}}],
              "tool_choice":{"type":"function",
                             "function":{"name":"get_weather"}}}'

Before: grammar-based forcing silently inactive; model free-texts.
After : grammar forces get_weather invocation; output contains
        tool_calls with function:{name:"get_weather", arguments:{...}}.
2026-04-23 18:30:05 +02:00
Ettore Di Giacinto
f5eb13d3c2 feat(insightface): add antispoofing (liveness) detection (#9515)
* feat(insightface): add antispoofing (liveness) detection

Light up the anti_spoofing flag that was parked during the first pass.
Both FaceVerify and FaceAnalyze now run the Silent-Face MiniFASNetV2 +
MiniFASNetV1SE ensemble (~4 MB, Apache 2.0, CPU <10ms) when the flag is
set. Failed liveness on either image vetoes FaceVerify regardless of
embedding similarity. Every insightface* gallery entry now ships the
MiniFASNet ONNX weights so existing packs light up after reinstall.

Setting the flag against a model without the MiniFASNet files returns
FAILED_PRECONDITION (HTTP 412) with a clear install message — no
silent is_real=false.

FaceVerifyResponse gained per-image img{1,2}_is_real and
img{1,2}_antispoof_score (proto 9-12); FaceAnalysis's existing
is_real/antispoof_score fields are now populated. Schema fields are
pointers so they are fully absent from the JSON response when
anti_spoofing was not requested — avoids collapsing "not checked" with
"checked and fake" under Go's omitempty on bool.

Validated end-to-end over HTTP against a local install:
- verify + anti_spoofing, both real -> verified=true, score ~0.76
- verify + anti_spoofing, img2 spoof -> verified=false, img2_is_real=false
- analyze + anti_spoofing -> is_real and score per face
- flag against model without MiniFASNet -> HTTP 412 fail-loud

Assisted-by: Claude:claude-opus-4-7 go vet

* test(insightface): wire test target into test-extra

The root Makefile's `test-extra` already runs
`$(MAKE) -C backend/python/insightface test`, but the backend's
Makefile never defined the target — so the command silently errored
and the suite was never executed in CI. Adding the two-line target
(matching ace-step/Makefile) hooks `test.sh` → `runUnittests` →
`python -m unittest test.py`, which discovers both the pre-existing
engine classes (InsightFaceEngineTest, OnnxDirectEngineTest) and the
new AntispoofingTest. Each class skips gracefully when its weights
can't be downloaded from a network-restricted runner.

Assisted-by: Claude:claude-opus-4-7

* test(insightface): exercise antispoofing in e2e-backends (both paths)

Add a `face_antispoof` capability to the Ginkgo e2e suite and extend
the existing FaceVerify + FaceAnalyze specs with liveness assertions
covering BOTH paths:

  real fixture -> is_real=true, score>0, verified stays true
  spoof fixture -> is_real=false, verified vetoed to false

The spoof fixture is upstream's own `image_F2.jpg` (via the yakhyo
mirror) — verified locally against the MiniFASNetV2+V1SE ensemble to
classify as is_real=false with score ~0.013. That makes the assertion
deterministic across CI runs; synthetic/derived spoofs fool the model
unpredictably and would be flaky.

Makefile wires it up end-to-end:
- New INSIGHTFACE_ANTISPOOF_* cache dir + two ONNX downloads with
  pinned SHAs, matching the gallery entries.
- insightface-antispoof-models target shared by both backend configs.
- FACE_SPOOF_IMAGE_URL passed via BACKEND_TEST_FACE_SPOOF_IMAGE_URL.
- Both e2e targets (buffalo-sc + opencv) now:
  * depend on insightface-antispoof-models
  * pass antispoof_v2_onnx / antispoof_v1se_onnx in BACKEND_TEST_OPTIONS
  * include face_antispoof in BACKEND_TEST_CAPS

backend_test.go adds the new capability constant and a faceSpoofFile
fixture resolved the same way as faceFile1/2/3. Spoof assertions are
gated on both capFaceAntispoof AND faceSpoofFile being set, so a test
config that omits the spoof fixture degrades gracefully to "real path
only" instead of failing.

Assisted-by: Claude:claude-opus-4-7 go vet
2026-04-23 18:28:15 +02:00
Ettore Di Giacinto
c1f923b2bc fix(importer): emit all shards for multi-part GGUF models (#9513)
The llama-cpp HuggingFace importer iterated files one at a time and
kept overwriting `lastGGUFFile`, so sharded repos such as
`unsloth/Kimi-K2.6-GGUF` (14 `Q8_K_XL` parts) produced a gallery entry
pointing only at the final shard — useless to llama.cpp's split loader,
which needs shard 1 to discover the set.

Group shards up front via new helpers in `pkg/huggingface-api`
(`SplitShardSuffix`, `ShardGroup`, `GroupShards`). The llama-cpp
importer now picks a group (preferred quant, then last-group fallback)
and emits every shard, with `Model:` pointing at shard 1.
`FindPreferredModelFile` returns shard 1 of the first matching group so
the gallery agent's preview stays coherent for sharded repos.

Adds unit coverage for the HuggingFace branch of the importer (which
had none), plus shard-detection tests in the hfapi package.

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]
2026-04-23 15:00:02 +02:00
Ettore Di Giacinto
ed648b3b4e fix(llama-cpp): include server-chat.cpp in grpc-server translation unit (#9511)
* fix(llama-cpp): include server-chat.cpp in grpc-server translation unit

Upstream llama.cpp refactor (ggml-org/llama.cpp#20690) moved the
OAI/Anthropic/Responses and transcription conversion helpers out of
server-common.cpp into a new server-chat.cpp, and server-task.cpp and
server-context.cpp now call those symbols (convert_transcriptions_to_chatcmpl,
server_chat_convert_responses_to_chatcmpl, server_chat_convert_anthropic_to_oai,
server_chat_msg_diff_to_json_oaicompat) via server-chat.h.

grpc-server.cpp builds as a single translation unit by #include-ing the
upstream .cpp files directly. Without including server-chat.cpp, the
declarations are satisfied at compile time via server-chat.h but the
link step fails with undefined references once LLAMA_VERSION crosses
the refactor commit (134d6e54).

Guard the include with __has_include so the same source stays buildable
on older LLAMA_VERSION pins that predate the refactor (where prepare.sh
won't copy server-chat.cpp into tools/grpc-server/).

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(llama-cpp): bump LLAMA_VERSION to 0d0764dfd

Bump to ggml-org/llama.cpp@0d0764dfd2.
Paired with the preceding grpc-server server-chat.cpp include so the
refactor at 134d6e54 links cleanly. Supersedes PR #9494.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-23 14:59:39 +02:00
Ettore Di Giacinto
3ce5248126 Update expected length of instructions in test
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-23 14:58:57 +02:00
Ettore Di Giacinto
04f1a0285d fix(ik-llama-cpp): adapt to common_grammar struct in sampling.h (#9512)
Upstream ik_llama.cpp commit e0596bf6 ("Autoparser") changed
common_params_sampling::grammar from std::string to a common_grammar
struct (type + grammar), which broke our two direct accesses:

 - JSON ingest fed the field through json_value<common_grammar>(...),
   for which nlohmann has no from_json adapter.
 - JSON export emitted the struct directly, for which nlohmann has no
   to_json adapter.

Wrap the incoming JSON string in common_grammar{COMMON_GRAMMAR_TYPE_USER, ...}
and serialize via the inner .grammar member, mirroring upstream's
examples/server/server-context.cpp.

Also bump IK_LLAMA_VERSION to 286ce324baed17c95faec77792eaa6bdb1c7a5f5
so the local-ai side lines up with the dependency bump in #9496.

Assisted-by: Claude-Code:claude-opus-4-7
2026-04-23 13:45:06 +02:00
Ettore Di Giacinto
181ebb6df4 feat: voice recognition (#9500)
* feat(voice-recognition): add /v1/voice/{verify,analyze,embed} + speaker-recognition backend

Audio analog to face recognition. Adds three gRPC RPCs
(VoiceVerify / VoiceAnalyze / VoiceEmbed), their Go service and HTTP
layers, a new FLAG_SPEAKER_RECOGNITION capability flag, and a Python
backend scaffold under backend/python/speaker-recognition/ wrapping
SpeechBrain ECAPA-TDNN with a parallel OnnxDirectEngine for
WeSpeaker / 3D-Speaker ONNX exports.

The kokoros Rust backend gets matching unimplemented trait stubs —
tonic's async_trait has no defaults, so adding an RPC without Rust
stubs breaks the build (same regression fixed by eb01c772 for face).

Swagger, /api/instructions, and the auth RouteFeatureRegistry /
APIFeatures list are updated so the endpoints surface everywhere a
client or admin UI looks.

Assisted-by: Claude:claude-opus-4-7

* feat(voice-recognition): add 1:N identify + register/forget endpoints

Mirrors the face-recognition register/identify/forget surface. New
package core/services/voicerecognition/ carries a Registry interface
and a local-store-backed implementation (same in-memory vector-store
plumbing facerecognition uses, separate instance so the embedding
spaces stay isolated).

Handlers under /v1/voice/{register,identify,forget} reuse
backend.VoiceEmbed to compute the probe vector, then delegate the
nearest-neighbour search to the registry. Default cosine-distance
threshold is tuned for ECAPA-TDNN on VoxCeleb (0.25, EER ~1.9%).

As with the face registry, the current backing is in-memory only — a
pgvector implementation is a future constructor-level swap.

Assisted-by: Claude:claude-opus-4-7

* feat(voice-recognition): gallery, docs, CI and e2e coverage

- backend/index.yaml: speaker-recognition backend entry + CPU and
  CUDA-12 image variants (plus matching development variants).
- gallery/index.yaml: speechbrain-ecapa-tdnn (default) and
  wespeaker-resnet34 model entries. The WeSpeaker SHA-256 is a
  deliberate placeholder — the HF URI must be curl'd and its hash
  filled in before the entry installs.
- docs/content/features/voice-recognition.md: API reference + quickstart,
  mirrors the face-recognition docs.
- React UI: CAP_SPEAKER_RECOGNITION flag export (consumers follow face's
  precedent — no dedicated tab yet).
- tests/e2e-backends: voice_embed / voice_verify / voice_analyze specs.
  Helper resolveFaceFixture is reused as-is — the only thing face/voice
  share is "download a file into workDir", so no need for a new helper.
- Makefile: docker-build-speaker-recognition + test-extra-backend-
  speaker-recognition-{ecapa,all} targets. Audio fixtures default to
  VCTK p225/p226 samples from HuggingFace.
- CI: test-extra.yml grows a tests-speaker-recognition-grpc job
  mirroring insightface. backend.yml matrix gains CPU + CUDA-12 image
  build entries — scripts/changed-backends.js auto-picks these up.

Assisted-by: Claude:claude-opus-4-7

* feat(voice-recognition): wire a working /v1/voice/analyze head

Adds AnalysisHead: a lazy-loading age / gender / emotion inference
wrapper that plugs into both SpeechBrainEngine and OnnxDirectEngine.

Defaults to two open-licence HuggingFace checkpoints:
  - audeering/wav2vec2-large-robust-24-ft-age-gender (Apache 2.0) —
    age regression + 3-way gender (female / male / child).
  - superb/wav2vec2-base-superb-er (Apache 2.0) — 4-way emotion.

Both are optional and degrade gracefully when transformers or the
model can't be loaded — the engine raises NotImplementedError so the
gRPC layer returns 501 instead of a generic 500.

Emotion classes pass through from the model (neutral/happy/angry/sad
on the default checkpoint); the e2e test now accepts any non-empty
dominant gender so custom age_gender_model overrides don't fail it.

Adds transformers to the backend's CPU and CUDA-12 requirements.

Assisted-by: Claude:claude-opus-4-7

* fix(voice-recognition): pin real WeSpeaker ResNet34 ONNX SHA-256

Replaces the placeholder hash in gallery/index.yaml with the actual
SHA-256 (7bb2f06e…) of the upstream
Wespeaker/wespeaker-voxceleb-resnet34-LM ONNX at ~25MB. `local-ai
models install wespeaker-resnet34` now succeeds.

Assisted-by: Claude:claude-opus-4-7

* fix(voice-recognition): soundfile loader + honest analyze default

Two issues surfaced on first end-to-end smoke with the actual backend
image:

1. torchaudio.load in torchaudio 2.8+ requires the torchcodec package
   for audio decoding. Switch SpeechBrainEngine._load_waveform to the
   already-present soundfile (listed in requirements.txt) plus a numpy
   linear resample to 16kHz. Drops a heavy ffmpeg-linked dep and the
   codepath we never exercise (torchaudio's ffmpeg backend).

2. The AnalysisHead was defaulting to audeering/wav2vec2-large-robust-
   24-ft-age-gender, but AutoModelForAudioClassification silently
   mangles that checkpoint — it reports the age head weights as
   UNEXPECTED and re-initialises the classifier head with random
   values, so the "gender" output is noise and there is no age output
   at all. Make age/gender opt-in instead (empty default; users wire
   a cleanly-loadable Wav2Vec2ForSequenceClassification checkpoint via
   age_gender_model: option). Emotion keeps its working Superb default.
   Also broaden _infer_age_gender's tensor-shape handling and catch
   runtime exceptions so a dodgy age/gender head never takes down the
   whole analyze call.

Docs and README updated to match the new policy.

Verified with the branch-scoped gallery on localhost:
- voice/embed    → 192-d ECAPA-TDNN vector
- voice/verify   → same-clip dist≈6e-08 verified=true; cross-speaker
                   dist 0.76–0.99 verified=false (as expected)
- voice/register/identify/forget → round-trip works, 404 on unknown id
- voice/analyze  → emotion populated, age/gender omitted (opt-in)

Assisted-by: Claude:claude-opus-4-7

* fix(voice-recognition): real CI audio fixtures + fixture-agnostic verify spec

Two issues surfaced after CI actually ran the speaker-recognition e2e
target (I'd curl-tested against a running server but hadn't run the
make target locally):

1. The default BACKEND_TEST_VOICE_AUDIO_* URLs pointed at
   huggingface.co/datasets/CSTR-Edinburgh/vctk paths that return 404
   (the dataset is gated). Swap them for the speechbrain test samples
   served from github.com/speechbrain/speechbrain/raw/develop/ —
   public, no auth, correct 16kHz mono format.

2. The VoiceVerify spec required d(file1,file2) < 0.4, assuming
   file1/file2 were same-speaker. The speechbrain samples are three
   different speakers (example1/2/5), and there is no easy un-gated
   source of true same-speaker audio pairs (VoxCeleb/VCTK/LibriSpeech
   are all license- or size-gated for CI use). Replace the ceiling
   check with a relative-ordering assertion: d(pair) > d(same-clip)
   for both file2 and file3 — that's enough to prove the embeddings
   encode speaker info, and it works with any three non-identical
   clips. Actual speaker ordering d(1,2) vs d(1,3) is logged but not
   asserted.

Local run: 4/4 voice specs pass (Health, LoadModel, VoiceEmbed,
VoiceVerify) on the built backend image. 12 non-voice specs skipped
as expected.

Assisted-by: Claude:claude-opus-4-7

* fix(ci): checkout with submodules in the reusable backend_build workflow

The kokoros Rust backend build fails with

    failed to read .../sources/Kokoros/kokoros/Cargo.toml: No such file

because the reusable backend_build.yml workflow's actions/checkout
step was missing `submodules: true`. Dockerfile.rust does `COPY .
/LocalAI`, and without the submodule files the subsequent `cargo
build` can't find the vendored Kokoros crate.

The bug pre-dates this PR — scripts/changed-backends.js only triggers
the kokoros image job when something under backend/rust/kokoros or
the shared proto changes, so master had been coasting past it. The
voice-recognition proto addition re-broke it.

Other checkouts in backend.yml (llama-cpp-darwin) and test-extra.yml
(insightface, kokoros, speaker-recognition) already pass
`submodules: true`; this brings the shared backend image builder in
line.

Assisted-by: Claude:claude-opus-4-7
2026-04-23 12:07:14 +02:00
LocalAI [bot]
1c59165d63 chore(model gallery): 🤖 add 1 new models via gallery agent (#9505)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 09:32:44 +02:00
LocalAI [bot]
eb00d9b178 chore: ⬆️ Update leejet/stable-diffusion.cpp to c97702e1057c2fe13a7074cd9069cb9dd6edc1bf (#9495)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-23 09:32:21 +02:00
LocalAI [bot]
2068b6f43c feat(swagger): update swagger (#9498)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 22:51:39 +02:00
Ettore Di Giacinto
eb01c77214 fix(kokoros): implement face_verify and face_analyze trait stubs (#9499)
The backend.proto was updated to add FaceVerify and FaceAnalyze RPCs
(face detection support), but the Rust KokorosService was never updated
to match the regenerated tonic trait, breaking compilation with E0046:

    not all trait items implemented, missing: `face_verify`, `face_analyze`

Stubs both methods as unimplemented, matching the pattern used for the
other RPCs Kokoros does not support.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
2026-04-22 22:51:18 +02:00
Richard Palethorpe
bb4fda6f0e chore(agents): Update the backend creation instructions to include Rust and extra tests (#9490)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-22 22:43:01 +02:00
Ettore Di Giacinto
f0c92610a1 feat(importer): expand importer flow to almost all backends (#9466)
* docs(agents): require importer integration when adding backends

Document the importer registry workflow so contributors know that adding
a new backend also requires updating the /import-model dropdown source:
either a new importer in core/gallery/importers/, extending an existing
one for drop-in replacements, or the pref-only slice for backends with
no reliable auto-detect signal. Always covered by a table-driven test.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for Batch 0 primitives

Introduce failing tests that drive Batch 0 of the importer expansion:

- pkg/huggingface-api: assert GetModelDetails populates PipelineTag and
  LibraryName from /api/models/{repo}, and that a failing metadata
  endpoint still returns file details (best-effort fetch).
- core/gallery/importers/helpers_test.go: new table-driven coverage for
  HasFile, HasExtension, HasONNX, HasONNXConfigPair, HasGGMLFile.
- core/gallery/importers/importers_test.go: assert ErrAmbiguousImport
  sentinel exists and round-trips through errors.Is.
- core/gallery/importers/local_test.go: extend with detection cases for
  ggml-*.bin (whisper), silero_vad.onnx (silero-vad), and the piper
  .onnx + .onnx.json pair.
- core/http/endpoints/localai/import_model_test.go: assert
  ImportModelURIEndpoint returns HTTP 400 with a structured
  {error, detail, hint} body when ErrAmbiguousImport surfaces.

All tests fail in the expected places (missing fields, missing
helpers, missing sentinel, endpoint still wraps as 500).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): Batch 0 foundation — helpers, sentinel, local detection

Implements the Batch 0 primitives that subsequent importer batches build on:

- pkg/huggingface-api: ModelDetails gains PipelineTag and LibraryName.
  GetModelDetails now layers a best-effort GET /api/models/{repo} fetch
  on top of ListFiles — a metadata outage leaves the fields empty but
  still returns full file details. Uses a dedicated response struct
  because the single-model endpoint uses snake_case keys while the list
  endpoint historically returned camelCase.

- core/gallery/importers/helpers.go: generic HasFile, HasExtension,
  HasONNX, HasONNXConfigPair, HasGGMLFile helpers working on
  []hfapi.ModelFile so per-backend importers can detect artefact
  patterns without duplicating string wrangling.

- core/gallery/importers/importers.go: adds the ErrAmbiguousImport
  sentinel. DiscoverModelConfig now returns it (wrapped with
  fmt.Errorf("%w: ...")) when no importer matched AND the HF
  pipeline_tag falls in a whitelist of narrow modalities (ASR, TTS,
  sentence-similarity, text-classification, object-detection). The
  whitelist is intentionally narrow — unknown tags keep the previous
  "no importer matched" behaviour to avoid blocking rare repos.

- core/gallery/importers/local.go: three new local-path detections,
  inserted before the existing merged-transformers branch:
    * ggml-*.bin → whisper
    * silero*.onnx → silero-vad
    * *.onnx + *.onnx.json pair → piper

- core/http/endpoints/localai/import_model.go: ImportModelURIEndpoint
  surfaces ErrAmbiguousImport as HTTP 400 with
  {error, detail, hint} JSON, preserving existing behaviour for
  unrelated errors.

Green tests:
  go test ./core/gallery/importers/... ./pkg/huggingface-api/... \
          ./core/http/endpoints/localai/...

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(importers): red tests for KnownBackend endpoint and importer metadata

Add failing tests that drive Batch UI-Dropdown:

- importers_test.go: assert importers expose Name/Modality/AutoDetects
  and that LlamaCPPImporter advertises drop-in replacements via a new
  AdditionalBackendsProvider interface. A Registry() accessor is also
  expected.

- backend_test.go (new): assert GET /backends/known returns
  []schema.KnownBackend, covers every importer, exposes drop-in
  llama-cpp replacements, includes curated pref-only backends, has no
  duplicates, and is sorted by Modality+Name.

These tests fail at compile time against master; they are intentionally
red so the follow-up green commit is reviewable.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery): add /backends/known endpoint for importer-aware backend list

Extend the Importer interface with Name/Modality/AutoDetects so the
import system can self-describe its registry, and introduce the
AdditionalBackendsProvider interface so importers can advertise drop-in
replacements (llama-cpp advertises ik-llama-cpp and turboquant).

Expose the new GET /backends/known endpoint that merges:

- the importer registry (auto-detect supported),
- drop-in replacements hosted by importers (preference-only),
- a curated knownPrefOnlyBackends slice for backends with no dedicated
  importer (sglang, tinygrad, trl, mlx-vlm, whisperx, kokoros, Qwen TTS
  variants, sam3-cpp) — kept at the top of backend.go so contributors
  adding a new pref-only backend have one obvious place to edit,
- backends installed on disk but unknown to the importer (marked
  AutoDetect=false, empty Modality).

The endpoint deliberately does NOT filter by gallery membership or host
capability (unlike /backends/available): LocalAI may auto-install a
backend that is not yet present, so the import form dropdown must show
everything the importer knows about.

Response is deduplicated (importer wins over pref-only) and sorted by
Modality+Name for deterministic output.

Registered in core/http/routes/localai.go next to /backends/available
under the same admin middleware.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui): source import form backend dropdown from /backends/known

Replace the hard-coded BACKENDS constant in ImportModel.jsx with a
live fetch of /backends/known on mount. Users now see every backend
the importer layer knows about (including preference-only entries)
grouped by modality, not a stale subset.

Changes:

- config.js: add backendsKnown endpoint constant next to
  backendsAvailable.
- api.js: add backendsApi.listKnown() wrapper.
- ImportModel.jsx: remove BACKENDS constant, fetch the list via
  useEffect, and derive grouped options via buildBackendOptions.
  Preference-only entries render with a " (preference-only)" suffix.
  Loading state disables the dropdown with a "Loading backends…"
  placeholder; on fetch failure the form falls back to auto-detect
  only and surfaces a non-blocking toast.
- SearchableSelect.jsx: accept items flagged isHeader=true and render
  them as non-selectable section dividers. Keyboard navigation skips
  headers and search queries hide them so filtered output stays
  relevant.

Vitest is not set up in this project (devDependencies ship Playwright
only). Per the brief's guard-rail, no frontend test framework is
introduced; coverage is provided by the Go handler tests that assert
the /backends/known contract consumed by the React form.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for whisper importer

Asserts detection on ggerganov/whisper.cpp (via ggml-*.bin filename),
the preferences.backend=whisper override path for arbitrary URIs,
and the Importer interface metadata (name/modality/autodetect).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add whisper importer

Recognises whisper.cpp GGML models by the "ggml-*.bin" filename
convention (direct URL or HF repo member) and by the explicit
preferences.backend="whisper" override. Emits backend: whisper with
the transcript use-case. Registered before llama-cpp so the narrow
filename signal wins before any generic GGUF match is attempted.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for moonshine importer

Asserts detection on UsefulSensors/moonshine-tiny via owner + ONNX
files, the preferences.backend=moonshine override for arbitrary URIs,
and the Importer interface metadata (name/modality/autodetect).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add moonshine importer

Matches UsefulSensors-owned HF repos whose artefacts or metadata
identify them as ASR: on-disk .onnx files (the canonical Moonshine
packaging) OR pipeline_tag=automatic-speech-recognition (covers
transformers/safetensors-only sibling repos). preferences.backend=
moonshine overrides detection. Test uses the live moonshine-tiny
repo because the canonical UsefulSensors/moonshine repo currently
hits a recursive-subfolder bug in pkg/huggingface-api ListFiles.

Registered after WhisperImporter but before LlamaCPPImporter and
TransformersImporter so the narrower owner+ASR signal wins before
the generic tokenizer.json check routes the repo to transformers.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for nemo importer

Asserts detection on nvidia/parakeet-tdt-0.6b-v3 via owner + .nemo
file, the preferences.backend=nemo override for arbitrary URIs, and
the Importer interface metadata (name/modality/autodetect).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add nemo importer

Matches nvidia-owned HF repos that ship a .nemo checkpoint archive,
the canonical NeMo ASR packaging. preferences.backend=nemo forces
detection. Registered between moonshine and llama-cpp so the narrow
owner + extension signal wins before any downstream generic matcher.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for faster-whisper importer

Asserts detection on Systran/faster-whisper-large-v3 (owner +
model.bin + config.json + ASR pipeline), the preferences.backend=
faster-whisper override for arbitrary URIs, and the Importer
interface metadata.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add faster-whisper importer

Recognises CTranslate2-packaged whisper checkpoints distributed for
the faster-whisper runtime: model.bin + config.json + ASR
pipeline_tag, narrowed to Systran-owned repos or repo names
containing "faster-whisper" to avoid falsely claiming vanilla
OpenAI whisper HF repos. preferences.backend=faster-whisper
overrides detection. Registered before llama-cpp and transformers
so the narrow signal wins before tokenizer.json routes the repo to
the generic transformers importer.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for qwen-asr importer

Asserts detection on Qwen/Qwen3-ASR-1.7B via owner + ASR substring
in the repo name, the preferences.backend=qwen-asr override for
arbitrary URIs, and the Importer interface metadata.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add qwen-asr importer

Matches Qwen-owned HF repos whose name contains "ASR"
(case-insensitive), routing them to the qwen-asr backend rather
than the generic transformers/vllm path. The substring check scans
the repo portion only so the owner field cannot leak a false match.
preferences.backend=qwen-asr forces detection. Registered before
llama-cpp and transformers so the narrow owner+name signal wins.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): ASR ambiguity surfaces ErrAmbiguousImport

Locks in the behaviour added in Batch 0: an HF repo whose pipeline_tag
marks it as automatic-speech-recognition but whose artefacts match no
ASR importer (and no generic importer) must fail with
ErrAmbiguousImport so callers know to pass preferences.backend rather
than silently guess. pyannote/voice-activity-detection is the fixture
— its file list is only config.yaml + README, leaving every importer's
artefact check negative.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for piper importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add piper importer

Detects piper TTS voices by the canonical <voice>.onnx + <voice>.onnx.json
pair packaging (via HasONNXConfigPair). Narrow enough to skip generic
ONNX repos used by other backends (Moonshine ASR, sentence-transformers).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for bark importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add bark importer

Detects Suno's Bark TTS checkpoints by HF owner "suno" + repo name
prefix "bark". Adds HFOwnerRepoFromURI() helper so importers can fall
back to URI parsing when pkg/huggingface-api's recursive tree listing
errors on repos with nested subdirectories (suno/bark ships a
speaker_embeddings/v2 subtree that trips a pre-existing path-doubling
bug in the listFilesInPath recursion).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for fish-speech importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add fish-speech importer

Detects Fish Audio TTS releases by HF owner "fishaudio" with a URI-based
fallback for repos whose tree recursion trips the pre-existing hfapi
path-doubling bug.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for outetts importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add outetts importer

Detects OuteAI's OuteTTS releases by HF owner "OuteAI" or a case-
insensitive "OuteTTS" substring in the repo name, with a URI-based
fallback for recursion-bugged repos.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for voxcpm importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add voxcpm importer

Detects OpenBMB's VoxCPM TTS family by repo-name substring (community
mirrors re-host the weights under many owners — mlx-community,
bluryar, callgg, etc).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for kokoro importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add kokoro importer

Detects hexgrad's Kokoro TTS by the "Kokoro" repo-name substring paired
with a PyTorch .pth/.pt checkpoint — the pairing excludes ONNX-only
mirrors (handled by the pref-only `kokoros` Rust runtime) and GGUF
mirrors (handled by llama-cpp).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for kitten-tts importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add kitten-tts importer

Detects KittenML's kitten-tts releases by owner or "kitten-tts" repo-name
substring, with URI-parsing fallback.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for neutts importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add neutts importer

Detects Neuphonic's NeuTTS releases by owner "neuphonic" or "neutts"
repo-name substring, with URI-parsing fallback.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for chatterbox importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add chatterbox importer

Detects Resemble AI's Chatterbox TTS by owner "ResembleAI" or
"chatterbox" repo-name substring, with URI-parsing fallback.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for vibevoice importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add vibevoice importer

Detects Microsoft's VibeVoice TTS by "vibevoice" repo-name substring
(case-insensitive) so community mirrors still route here.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for coqui importer

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add coqui importer

Detects Coqui AI's TTS releases (XTTS-v2, YourTTS, …) by the
authoritative `coqui` HF owner, with URI-parsing fallback.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): TTS ambiguity surfaces ErrAmbiguousImport

Adds a Ginkgo spec that imports nari-labs/Dia-1.6B — a real HF repo
carrying pipeline_tag="text-to-speech" whose artefacts (*.pth, one
safetensors shard, preprocessor_config.json, config.json) match none of
the Batch-2 TTS importers nor the generic text/image importers — and
asserts DiscoverModelConfig wraps ErrAmbiguousImport via errors.Is.

Also pivots the endpoint-level ambiguity fixture from hexgrad/Kokoro-82M
to nari-labs/Dia-1.6B. Batch 2 added a dedicated kokoro importer that
now claims the original fixture; Dia remains genuinely unclaimed and
so exercises the same ambiguity code path at the HTTP layer.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for stablediffusion-ggml importer

Covers HF repo detection (city96/FLUX.1-dev-gguf), raw .gguf URL matching on
filename arch tokens, preference override, and Importer interface metadata.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add stablediffusion-ggml importer

Detects GGUF-packed Stable Diffusion and FLUX checkpoints (leejet owner,
city96 FLUX mirrors, second-state SD dumps, raw .gguf URLs with arch
tokens) and routes them to the stablediffusion-ggml backend. Registered
BEFORE LlamaCPPImporter so .gguf image checkpoints are not stolen by
llama-cpp's generic .gguf match. Reuses HFOwnerRepoFromURI for the
hfapi-recursion-bug fallback. preferences.backend overrides detection.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for ace-step importer

Covers HF repo-name detection (ACE-Step/ACE-Step-v1-3.5B), preference
override, and Importer interface metadata.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add ace-step importer

Routes ACE-Step music generation checkpoints (ACE-Step/ACE-Step-v1-3.5B,
ACE-Step/Ace-Step1.5, community mirrors) to the ace-step backend.
Matching is case-insensitive on the "ace-step" repo-name substring and
owner, with an HFOwnerRepoFromURI fallback for the hfapi recursion bug.
KnownUsecaseStrings mirrors the gallery's ace-step-turbo entry
(sound_generation, tts). preferences.backend overrides.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): surface ErrAmbiguousImport on text-to-image misses

Adds text-to-image to ambiguousModalities whitelist and covers the
h94/IP-Adapter-FaceID case — pipeline_tag=text-to-image but ships only
.bin/.safetensors so diffusers, stablediffusion-ggml, llama-cpp,
transformers, vllm, mlx, and ace-step all miss. DiscoverModelConfig now
surfaces ErrAmbiguousImport for that shape instead of the opaque
"no importer matched" error.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for vllm-omni importer

Introduces the test surface for the forthcoming VLLMOmniImporter:
detection via preferences.backend, Qwen owner + Omni repo token,
URI-only fallback, negative cases (plain Qwen, random OmniX repo), and
Import() emitting backend: vllm-omni with chat + multimodal usecases.

Includes a registration-order assertion via DiscoverModelConfig to pin
the requirement that vllm-omni wins over vllm for Qwen Omni repos
(tokenizer files are usually present too).

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add vllm-omni importer

Adds VLLMOmniImporter for Qwen Omni-style multimodal checkpoints
(Qwen3-Omni, Qwen2.5-Omni, …). Detection is narrow: HF owner "Qwen"
combined with "omni" in the repo name, or a repo name matching the
-Omni-/Omni- naming pattern. preferences.backend="vllm-omni" always
wins; HFOwnerRepoFromURI provides a URI-only fallback for the hfapi
recursion-bug edge case.

Emitted YAML sets backend: vllm-omni and known_usecases: [chat,
multimodal], matching the gallery/index.yaml vllm-omni entries. The
importer is registered ahead of VLLMImporter so Qwen Omni repos —
which also carry tokenizer files — route to vllm-omni rather than the
plain vllm backend.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for llama-cpp drop-in preferences

Pins the expected drop-in replacement behaviour: preferences.backend
of ik-llama-cpp or turboquant must swap the emitted YAML backend
field while keeping the llama-cpp file layout identical. Also covers
the unknown-backend case (must stay llama-cpp) and re-asserts
AdditionalBackends() returns the two curated entries with non-empty
descriptions.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): llama-cpp honours ik-llama-cpp and turboquant drop-in preferences

preferences.backend set to ik-llama-cpp or turboquant now swaps the
emitted YAML backend field while leaving the file layout, model path,
mmproj handling and everything else in the llama-cpp Import pipeline
untouched. Unknown values are ignored and fall back to backend:
llama-cpp so arbitrary input can't leak into the config.

Aligns the AdditionalBackends() descriptions with the user-facing
naming conventions surfaced via /backends/known. No changes to the
pref-only curated list in endpoints/localai/backend.go: the two
drop-in names have always lived on the importer side via
AdditionalBackends.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for silero-vad importer

Add the SileroVADImporter test fixtures covering metadata, preference
overrides, snakers4 + onnx detection, silero_vad.onnx canonical filename,
URI fallback, and live HF discovery. Implementation follows in the next
commit.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add silero-vad importer

Recognise the Silero VAD ONNX packaging: the canonical silero_vad.onnx
filename or any ONNX file under the snakers4 owner. Emits a
backend: silero-vad config with the vad known_usecase, and attaches the
canonical file entry when present so the weights download on import.

Registered before the generic importers so the unique-filename signal
takes precedence over any downstream tokenizer-based matcher.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for rerankers importer

Cover the RerankersImporter contract: interface metadata, preference
override, cross-encoder owner detection, case-insensitive 'reranker'
substring match (BAAI/bge-reranker, Alibaba-NLP/gte-reranker), URI
fallback, and the full-discovery ordering check that a BAAI reranker
repo must route to the rerankers importer rather than transformers.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add rerankers importer

Recognise reranker repositories — cross-encoder owner or any repo whose
name contains 'reranker' (case-insensitive). Emits backend: rerankers
with reranking: true and the rerank known_usecase.

Registered ahead of sentencetransformers and transformers so reranker
repos that happen to ship tokenizer.json or modules.json still route
here.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for sentencetransformers importer

Cover the SentenceTransformersImporter contract: interface metadata,
preference override, modules.json marker file, sentence_bert_config.json
marker file, sentence-transformers owner, URI fallback, and the
full-discovery ordering check that ensures a sentence-transformers HF
URI routes here rather than transformers.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add sentencetransformers importer

Recognise sentence-transformers embedding repos by modules.json,
sentence_bert_config.json, or the sentence-transformers owner. Emits
backend: sentencetransformers with embeddings: true and the embeddings
known_usecase.

Registered ahead of transformers so ST repos that carry tokenizer.json
still route here.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): add failing tests for rfdetr importer

Cover the RFDetrImporter contract: interface metadata, preference
override, case-insensitive rf-detr and rfdetr substring matches, URI
fallback, and negative cases. Implementation follows in the next
commit.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(gallery/importers): add rfdetr importer

Recognise RF-DETR object-detection repositories by a case-insensitive
'rf-detr' / 'rfdetr' substring in the repo name. Emits backend: rfdetr
with the detection known_usecase.

Registered ahead of transformers so RF-DETR repos with tokenizer
artefacts still route here.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(gallery/importers): surface ErrAmbiguousImport on sentence-similarity misses

Add an ambiguity fixture covering the embeddings/rerankers modality.
Qdrant/bm25 carries pipeline_tag=sentence-similarity but ships only
config.json + stopword .txt files — none of the Batch 5 importers
(silero-vad, rerankers, sentencetransformers, rfdetr) or the generic
vllm/transformers/llama-cpp/mlx/diffusers importers match. Because the
modality is in the ambiguous whitelist, DiscoverModelConfig must
surface ErrAmbiguousImport.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(localai/backend): red tests for KnownBackend.Installed flag

Extend the /backends/known suite with three failing cases that pin down
the forthcoming Installed field: JSON field presence on every entry,
flipping to true when an importer-registered backend is also present on
disk (and staying false for non-installed pref-only entries), and
surfacing system-only backends with empty modality and AutoDetect=false.

A small writeFakeSystemBackend helper plants a run.sh under the backends
dir so gallery.ListSystemBackends recognises the fixture.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(schema,localai/backend): add Installed flag to KnownBackend

Add an Installed bool to schema.KnownBackend and populate it from the
/backends/known handler so the React import form can warn users that
picking a not-yet-installed backend will trigger an automatic download
on submit.

Computation: after merging the importer registry, additional backends
provider entries and the curated pref-only slice, the handler walks
gallery.ListSystemBackends(systemState) and either flips the existing
map entry's Installed flag to true (preserving modality / autodetect /
description metadata) or inserts a bare {Installed:true} entry for
system-only backends the importer layer doesn't know about.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(localai/import_model): structured ambiguous-import response

Add red tests covering the extended ambiguity shape the React import
form needs:

- ImportModelURIEndpoint must return an HTTP 400 body that exposes the
  detected `modality` (normalised to the importer modality key, e.g.
  "tts" for pipeline_tag=text-to-speech) and a list of `candidates`
  (backend names filtered by modality, excluding text-LLM backends).
- The importers package must surface a typed AmbiguousImportError so
  HTTP consumers can read Modality + Candidates without parsing the
  error string. errors.Is against the existing sentinel keeps working.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(localai/import_model): structured ambiguity response with modality + candidates

DiscoverModelConfig now returns a typed AmbiguousImportError that
carries the importer modality key, candidate backend names, the
original URI, and the raw HF pipeline_tag. Its Is() preserves
errors.Is(err, ErrAmbiguousImport) for legacy callers.

The importer modality is pre-mapped from the HF pipeline_tag
(automatic-speech-recognition → asr, text-to-speech → tts, etc) via
PipelineTagToModality — surfaced as an exported helper so downstream
consumers can avoid duplicating the table. CandidatesForModality
filters the default importer registry plus AdditionalBackendsProvider
drop-ins by modality, sorts deterministically, and is the single
source of truth used by ImportModelURIEndpoint.

ImportModelURIEndpoint now returns HTTP 400 with
  { error, detail, modality, candidates, hint }
when ambiguity fires, letting the React form render a modality-scoped
picker inline instead of a generic toast.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): manual pick badge + tooltip

Red Playwright coverage for the preference-only → manual pick rename:

- The Backend dropdown renders a "manual pick" badge on every option
  whose KnownBackend.auto_detect is false.
- The badge carries a title attribute with hover-tooltip copy that
  explains auto-detect won't route to this backend.
- Auto-detectable backends must NOT carry the badge.
- The legacy " (preference-only)" suffix is gone from every label.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(import): replace preference-only suffix with manual pick badge

SearchableSelect option rows now support an optional badge field — a
muted pill rendered to the right of the label with an optional title
attribute for native hover tooltips. Plain text so screen readers read
it alongside the option name.

buildBackendOptions in ImportModel stops appending " (preference-only)"
to the label and instead sets badge="manual pick" plus a descriptive
tooltip on every option whose auto_detect is false. The Backend help
text explains what "manual pick" means so users aren't left wondering
about the badge.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): inline ambiguity picker

Red Playwright coverage for Batch A2 — when the server returns a 400
ambiguity body, the form must render an inline alert instead of a
toast, expose one clickable chip per candidate backend, and support
both auto-resubmit on pick and silent dismiss.

- Mocks /api/models/import-uri with the structured ambiguity body
  (error, detail, modality, candidates, hint).
- On first click of Import, the alert is visible, carries
  modality-specific copy, and shows a chip per candidate.
- Clicking a chip clears the alert, sets the Backend dropdown, and
  triggers a second POST to /api/models/import-uri.
- Dismissing the alert leaves the Backend dropdown on Auto-detect —
  no implicit backend assignment.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): inline ambiguity alert with candidate chips

Adds AmbiguityAlert — a soft, info-coloured card rendered above the URI
input when the server returns a structured 400 with { modality,
candidates }. Message is modality-aware (tts/asr/embeddings/image/
reranker/detection get purpose-written copy, everything else falls back
to a generic template). Each candidate is a clickable chip that shows a
download icon when /backends/known marks the backend as not yet
installed, so users aren't surprised by an implicit install.

ImportModel wires the alert to handleSimpleImport's error path:
- api.handleResponse now attaches { status, body } to the thrown Error
  so pages can pattern-match on structured responses instead of string
  error messages.
- handleSimpleImport detects `status === 400 && body.error === 'ambiguous
  import'` and flips into the inline-picker mode instead of toasting.
- Clicking a chip sets prefs.backend and auto-resubmits (passing the
  picked backend as an override so setPrefs's asynchrony doesn't leak
  a stale value).
- Dismissing clears the alert; changing the URI or the backend also
  clears it so a stale alert never sticks around.

Test fixtures mock GET /backends/known + POST /models/import-uri so the
Playwright specs don't depend on real network reachability.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): auto-install warning

Red Playwright coverage for Batch A3 — when the user picks a backend
whose KnownBackend.installed is false, the form must render a muted
inline note under the Backend dropdown warning that submitting will
download the backend first. Picking an installed backend or leaving
Auto-detect selected must keep the note hidden.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): auto-install warning under backend dropdown

When the user picks a backend whose KnownBackend.installed is false,
render a muted inline note under the Backend dropdown's help text
warning that submitting will download the backend first. The note
lives inside the same form-group so it lines up with the existing
hint text; it's hidden when Auto-detect is selected (the selected
backend is unknowable at that point) or when the chosen backend is
already on disk.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(import): drop redundant section header, adjust icons, rename HF shortcut

- Remove the "Import from URI" card-level <h2> — the page title already
  says "Import New Model" one row up, so the secondary header was
  duplicating information.
- Swap the fa-star on "Common Preferences" for fa-sliders (stars imply
  favourites/ratings; this is just a preferences block) and move the
  Custom Preferences fa-sliders-h to fa-plus-circle so the two blocks
  read as distinct rather than as two sliders.
- Rename the HF shortcut from "Search GGUF on HF" → "Browse models on
  HF" and drop the `search=gguf` filter on the linked URL. The import
  form now supports ~40 backends; hard-coding GGUF in the copy no
  longer matches the form's actual reach.
- Pure polish — no behaviour change, covered by the existing Batch A
  Playwright suite.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): batch B — simple/power switch, options, tabs, dialog

Adds a failing Playwright suite covering the full Batch B surface ahead
of implementation:

- B1: SimplePowerSwitch segmented control renders, toggles, persists to
  localStorage across reloads.
- B2: Simple-mode Options disclosure is collapsed by default; expanding
  exposes only Backend, Model Name, Description (no quantizations,
  mmproj, model type, or custom prefs).
- B3: Power mode has Preferences and YAML tabs with a persistent
  selection across reloads; URI/name/description typed in Simple carry
  over to Power; YAML tab swaps the primary action to Create.
- B4: Switching Power -> Simple with a custom preference set triggers
  the 3-button confirmation dialog (Keep / Discard / Cancel) with the
  documented semantics.

Tests fail against master — implementation lands in the following
commits.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): add SimplePowerSwitch segmented control

Replaces the previous "Advanced Mode / Simple Mode" toggle button in the
page header with a two-segment control that flips between Simple and
Power. The control reuses the existing .segmented CSS shared with the
Sound page for visual consistency.

Mode state is persisted to localStorage under `import-form-mode` so
reloads land on the same view (default: simple). The boolean alias
`isAdvancedMode` is retained internally to minimise diff — subsequent
commits reshape the Simple and Power surfaces independently.

Closes B1 from the Batch B Playwright suite.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): simple mode collapsible options, power tabs, switch dialog

Completes the Batch B surface in a single structural pass so Simple and
Power mode can evolve independently:

Simple mode
  - URI input + Ambiguity alert + Import button, plus a collapsible
    "Options" disclosure that exposes ONLY Backend, Model Name,
    Description. Quantizations / MMProj / Model Type / Diffusers fields
    / Custom Preferences are no longer rendered in Simple mode.

Power mode
  - In-page segmented "Preferences · YAML" tab strip. Active tab
    persists to localStorage under `import-form-power-tab`.
  - Preferences tab = the full existing preferences + custom prefs
    panel (no progressive disclosure yet — that's Batch D).
  - YAML tab = the existing CodeEditor. Primary button reads "Create"
    here, "Import Model" everywhere else.

Switch dialog
  - Power -> Simple with non-default prefs (advanced pref keys set,
    any custom-pref key non-empty, or YAML edited away from the
    template) opens a 3-button dialog: Keep & switch / Discard &
    switch / Cancel.
  - Keep preserves all state. Discard resets prefs + customPrefs + YAML
    to defaults. Cancel leaves the user in Power mode.

Page subtitle reflects the current surface (Simple, Power/Preferences,
Power/YAML). Estimate banner renders everywhere except Power/YAML.

Closes B2/B3/B4 from the Batch B Playwright suite.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): expand Options disclosure in Batch A tests

Batch B hid the Backend dropdown behind a collapsible Options disclosure
in Simple mode. The Batch A tests that exercise the dropdown directly
(manual-pick badge, ambiguity chip sets the selected backend, auto-
install warning) now click the disclosure toggle before asserting on
dropdown contents. Test intent is unchanged.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(import): strip decorative icons from field labels

The preference panel had 12 Font Awesome icons decorating field labels
(Backend, Model Name, Description, Quantizations, MMProj Quantizations,
Model Type, Pipeline Type, Scheduler Type, Enable Parameters, Embeddings,
CUDA, plus fa-link on Model URI). Every label screamed equally, flattening
the visual hierarchy.

Remove them. Keep icons where they carry meaning: page-level section
headers, URI format guide entries, primary buttons, the Simple-mode
Options disclosure, the ambiguity alert's fa-lightbulb, the auto-install
note's fa-download, and the Estimated-requirements banner's
fa-memory / fa-microchip / fa-download.

No new behaviour, no layout / spacing changes beyond removing the
orphaned icon margin. Playwright suite green.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): progressive disclosure of preference fields

Cover the Batch D visibility matrix for Power > Preferences: Quantizations,
MMProj Quantizations, and Model Type each render only for the backends that
can consume them, stay visible when the backend is unset, and preserve any
value the user already typed when toggled off and back on. Also pin the
shrunk Description textarea at rows=2.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): progressive disclosure + shorter description textarea

Gate Quantizations, MMProj Quantizations, and Model Type in the Power >
Preferences tab so each field only renders for the backends that can
actually consume it. Backend unset keeps everything visible. Hidden
fields' state is preserved (the JSX wrapper is guarded, not the
underlying prefs state) so users flipping backends back and forth don't
lose input.

Also shrink the Description textarea from rows=3 to rows=2 — it's
shared between Simple Options and Power Preferences so the change
applies to both.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): enter-to-submit in Simple mode

Red test for Batch F3 — pressing Enter in the URI input must POST
/models/import-uri, and Enter in the Description textarea must insert
a newline without submitting the form.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): enter-to-submit in Simple mode

Wrap the Simple-mode URI input + ambiguity alert + Options disclosure
in a <form> whose onSubmit calls handleSimpleImport. Pressing Enter in
the URI input (or any Simple-mode text input) now submits the import
without having to move the mouse to the header button. The Description
textarea keeps its native behaviour — Enter inserts a newline.

A hidden submit button is included because the visible Import button
lives outside the form in the page header; some browsers only fire
implicit Enter-submit when the form contains a submit-capable element.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(import,SearchableSelect,components): aria-hidden on decorative icons

Every Font Awesome icon in the import form is decorative — its meaning
is already conveyed by adjacent visible text. Adding aria-hidden="true"
prevents screen readers from announcing the unicode glyph point as
content. Covers ImportModel.jsx (all remaining <i> glyphs) and
SearchableSelect.jsx (the trigger chevron).

AmbiguityAlert and SimplePowerSwitch already set aria-hidden on their
icons when the components landed in Batches A and B — no change needed
there.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(SearchableSelect): responsive dropdown maxHeight + hover focus guard

F2 — replace fixed pixel heights with min(pixel, vh) so the dropdown
and its inner scroll region don't overflow short viewports. Outer
container: 260px -> min(260px, 60vh); inner listbox: 200px ->
min(200px, 50vh). Tall viewports still get the original pixel caps.

F5 — short-circuit onMouseEnter when the hovered row is already the
focused row. Avoids queueing a setFocusIndex call (and a render) for
every mousemove inside the same item — the state would be identical.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* ui(import): aria-label on custom preference rows

The Key / Value inputs and trash button in each Custom Preferences row
previously relied on placeholder text alone. Placeholders are not
accessible names — they vanish on input and screen readers do not
announce them consistently. Add row-indexed aria-labels so assistive
tech can distinguish "Preference key for row 1" from "row 2", and give
the trash button an explicit "Remove this preference" label.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* test(ui/import): modality chip row

Red tests for Batch E — a horizontal modality chip row that filters the
Backend dropdown by modality. Covers visibility in Simple-mode Options
and Power/Preferences (and absence in Power/YAML), filter behaviour,
mismatched-backend clearing with toast, ambiguity-alert auto-selection,
and radiogroup keyboard navigation.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* feat(ui/import): add ModalityChips component + filter integration

Horizontal chip row (Any, Text, Speech, TTS, Image, Embeddings,
Rerankers, Detection, VAD) filters the Backend dropdown options to the
selected modality. Default is Any — no filter, current behaviour.

- New ModalityChips component (radiogroup pattern, roving tabindex,
  arrow-key navigation, Home/End).
- buildBackendOptions now accepts an optional modalityFilter so grouped
  output is narrowed before rendering.
- Chips render inside Simple-mode Options disclosure and Power >
  Preferences tab. Power > YAML stays unaffected.
- Switching the filter drops a mismatched backend selection and
  surfaces a toast so the auto-clear is visible.
- Ambiguity alerts auto-activate the matching chip so users see only
  relevant backends even if they dismiss the alert.

Tightens the Batch E tests' option-matching to the label <span> so the
"↵" keybind hint on the focused row doesn't break accessible-name
lookups.

Assisted-by: Claude:claude-opus-4-7[1m] [Agent]

* fix(ui/import): rename Power to Advanced + stop URI-formats toggle from submitting form

The "Supported URI Formats" disclosure button inside the Simple-mode form
lacked an explicit type attribute, so it defaulted to type="submit". Every
click triggered the form's onSubmit and surfaced the empty-URI validation
toast ("Please enter a model URI"). Marking it type="button" lets it
behave as a pure toggle.

While here, rename the user-visible "Power" label to "Advanced" in the
mode switch (button text + tooltip) and the Power-mode tab's aria-label,
matching the term users actually expect. The internal mode key stays
'power' so tests, localStorage, and data-testid selectors are untouched.

Assisted-by: Claude:claude-opus-4-7

* fix(system): fall back to cpu when meta backend lacks default capability

Meta backends like vllm and sglang enumerate concrete variants for
nvidia/amd/intel/cpu but omit a default: catch-all entry. On a no-GPU
host the reported capability is "default", so the previous Capability()
returned "default" unconditionally on a miss — IsCompatibleWith then saw
no "default" key and filtered the meta out of AvailableBackends. The
import flow's auto-install step then failed with "no backend found with
name <meta>", contradicting the UI's promise that the backend would be
downloaded on demand.

Try the explicit "default" key first, then fall back to "cpu" before
giving up. vllm now resolves to cpu-vllm on CPU-only Linux without
touching the gallery YAML.

Assisted-by: Claude:claude-opus-4-7
2026-04-22 22:42:37 +02:00
orbisai0security
bbeacf140d fix: remove unsafe sprintf() in grpc-server.cpp (#9486)
fix: V-001 security vulnerability

Automated security fix generated by Orbis Security AI
2026-04-22 21:57:29 +02:00
LocalAI [bot]
6820ec468f chore(model gallery): 🤖 add 1 new models via gallery agent (#9491)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 21:56:11 +02:00
Ettore Di Giacinto
20baec77ab feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)
* feat(face-recognition): add insightface backend for 1:1 verify, 1:N identify, embedding, detection, analysis

Adds face recognition as a new first-class capability in LocalAI via the
`insightface` Python backend, with a pluggable two-engine design so
non-commercial (insightface model packs) and commercial-safe
(OpenCV Zoo YuNet + SFace) models share the same gRPC/HTTP surface.

New gRPC RPCs (backend/backend.proto):
  * FaceVerify(FaceVerifyRequest) returns FaceVerifyResponse
  * FaceAnalyze(FaceAnalyzeRequest) returns FaceAnalyzeResponse

Existing Embedding and Detect RPCs are reused (face image in
PredictOptions.Images / DetectOptions.src) for face embedding and
face detection respectively.

New HTTP endpoints under /v1/face/:
  * verify     — 1:1 image pair same-person decision
  * analyze    — per-face age + gender (emotion/race reserved)
  * register   — 1:N enrollment; stores embedding in vector store
  * identify   — 1:N recognition; detect → embed → StoresFind
  * forget     — remove a registered face by opaque ID

Service layer (core/services/facerecognition/) introduces a
`Registry` interface with one in-memory `storeRegistry` impl backed
by LocalAI's existing local-store gRPC vector backend. HTTP handlers
depend on the interface, not on StoresSet/StoresFind directly, so a
persistent PostgreSQL/pgvector implementation can be slotted in via a
single constructor change in core/application (TODO marker in the
package doc).

New usecase flag FLAG_FACE_RECOGNITION; insightface is also wired
into FLAG_DETECTION so /v1/detection works for face bounding boxes.

Gallery (backend/index.yaml) ships three entries:
  * insightface-buffalo-l   — SCRFD-10GF + ArcFace R50 + genderage
                              (~326MB pre-baked; non-commercial research use only)
  * insightface-opencv      — YuNet + SFace (~40MB pre-baked; Apache 2.0)
  * insightface-buffalo-s   — SCRFD-500MF + MBF (runtime download; non-commercial)

Python backend (backend/python/insightface/):
  * engines.py — FaceEngine protocol with InsightFaceEngine and
    OnnxDirectEngine; resolves model paths relative to the backend
    directory so the same gallery config works in docker-scratch and
    in the e2e-backends rootfs-extraction harness.
  * backend.py — gRPC servicer implementing Health, LoadModel, Status,
    Embedding, Detect, FaceVerify, FaceAnalyze.
  * install.sh — pre-bakes buffalo_l + OpenCV YuNet/SFace inside the
    backend directory so first-run is offline-clean (the final scratch
    image only preserves files under /<backend>/).
  * test.py — parametrized unit tests over both engines.

Tests:
  * Registry unit tests (go test -race ./core/services/facerecognition/...)
    — in-memory fake grpc.Backend, table-driven, covers register/
    identify/forget/error paths + concurrent access.
  * tests/e2e-backends/backend_test.go extended with face caps
    (face_detect, face_embed, face_verify, face_analyze); relative
    ordering + configurable verifyCeiling per engine.
  * Makefile targets: test-extra-backend-insightface-buffalo-l,
    -opencv, and the -all aggregate.
  * CI: .github/workflows/test-extra.yml gains tests-insightface-grpc,
    auto-triggered by changes under backend/python/insightface/.

Docs:
  * docs/content/features/face-recognition.md — feature page with
    license table, quickstart (defaults to the commercial-safe model),
    models matrix, API reference, 1:N workflow, storage caveats.
  * Cross-refs in object-detection.md, stores.md, embeddings.md, and
    whats-new.md.
  * Contributor README at backend/python/insightface/README.md.

Verified end-to-end:
  * buffalo_l: 6/6 specs (health, load, face_detect, face_embed,
    face_verify, face_analyze).
  * opencv: 5/5 specs (same minus face_analyze — SFace has no
    demographic head; correctly skipped via BACKEND_TEST_CAPS).

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): move engine selection to model gallery, collapse backend entries

The previous commit put engine/model_pack options on backend gallery
entries (`backend/index.yaml`). That was wrong — `GalleryBackend`
(core/gallery/backend_types.go:32) has no `options` field, so the
YAML decoder silently dropped those keys and all three "different
insightface-*" backend entries resolved to the same container image
with no distinguishing configuration.

Correct split:

  * `backend/index.yaml` now has ONE `insightface` backend entry
    shipping the CPU + CUDA 12 container images. The Python backend
    bundles both the non-commercial insightface model packs
    (buffalo_l / buffalo_s) and the commercial-safe OpenCV Zoo
    weights (YuNet + SFace); the active engine is selected at
    LoadModel time via `options: ["engine:..."]`.

  * `gallery/index.yaml` gains three model entries —
    `insightface-buffalo-l`, `insightface-opencv`,
    `insightface-buffalo-s` — each setting the appropriate
    `overrides.backend` + `overrides.options` so installing one
    actually gives the user the intended engine. This matches how
    `rfdetr-base` lives in the model gallery against the `rfdetr`
    backend.

The earlier e2e tests passed despite this bug because the Makefile
targets pass `BACKEND_TEST_OPTIONS` directly to LoadModel via gRPC,
bypassing any gallery resolution entirely. No code changes needed.

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): cover all supported models in the gallery + drop weight baking

Follows up on the model-gallery split: adds entries for every model
configuration either engine actually supports, and switches weight
delivery from image-baked to LocalAI's standard gallery mechanism.

Gallery now has seven `insightface-*` model entries (gallery/index.yaml):

  insightface (family)  — non-commercial research use
    • buffalo-l   (326MB)  — SCRFD-10GF + ResNet50 + genderage, default
    • buffalo-m   (313MB)  — SCRFD-2.5GF + ResNet50 + genderage
    • buffalo-s   (159MB)  — SCRFD-500MF + MBF + genderage
    • buffalo-sc  (16MB)   — SCRFD-500MF + MBF, recognition only
                             (no landmarks, no demographics — analyze
                             returns empty attributes)
    • antelopev2  (407MB)  — SCRFD-10GF + ResNet100@Glint360K + genderage

  OpenCV Zoo family — Apache 2.0 commercial-safe
    • opencv       — YuNet + SFace fp32 (~40MB)
    • opencv-int8  — YuNet + SFace int8 (~12MB, ~3x smaller, faster on CPU)

Model weights are no longer baked into the backend image. The image
now ships only the Python runtime + libraries (~275MB content size,
~1.18GB disk vs ~1.21GB when weights were baked). Weights flow through
LocalAI's gallery mechanism:

  * OpenCV variants list `files:` with ONNX URIs + SHA-256, so
    `local-ai models install insightface-opencv` pulls them into the
    models directory exactly like any other gallery-managed model.

  * insightface packs (upstream distributes .zip archives only, not
    individual ONNX files) auto-download on first LoadModel via
    FaceAnalysis' built-in machinery, rooted at the LocalAI models
    directory so they live alongside everything else — same pattern
    `rfdetr` uses with `inference.get_model()`.

Backend changes (backend/python/insightface/):

  * backend.py — LoadModel propagates `ModelOptions.ModelPath` (the
    LocalAI models directory) to engines via a `_model_dir` hint.
    This replaces the earlier ModelFile-dirname approach; ModelPath
    is the canonical "models directory" variable set by the Go loader
    (pkg/model/initializers.go:144) and is always populated.

  * engines.py::_resolve_model_path — picks up `model_dir` and searches
    it (plus basename-in-model-dir) before falling back to the dev
    script-dir. This is how OnnxDirectEngine finds gallery-downloaded
    YuNet/SFace files by filename only.

  * engines.py::_flatten_insightface_pack — new helper that works
    around an upstream packaging inconsistency: buffalo_l/s/sc zips
    expand flat, but buffalo_m and antelopev2 zips wrap their ONNX
    files in a redundant `<name>/` directory. insightface's own
    loader looks one level too shallow and fails. We call
    `ensure_available()` explicitly, flatten if nested, then hand to
    FaceAnalysis.

  * engines.py::InsightFaceEngine.prepare — root-resolution order now
    includes the `_model_dir` hint so packs download into the LocalAI
    models directory by default.

  * install.sh — no longer pre-downloads any weights. Everything is
    gallery-managed now.

  * smoke.py (new) — parametrized smoke test that iterates over every
    gallery configuration, simulating the LocalAI install flow
    (creates a models dir, fetches OpenCV files with checksum
    verification, lets insightface auto-download its packs), then
    runs detect + embed + verify (+ analyze where supported) through
    the in-process BackendServicer.

  * test.py — OnnxDirectEngineTest no longer hardcodes `/models/opencv/`
    paths; downloads ONNX files to a temp dir at setUpClass time and
    passes ModelPath accordingly.

Registry change (core/services/facerecognition/store_registry.go):

  * `dim=0` in NewStoreRegistry now means "accept whatever dimension
    arrives" — needed because the backend supports 512-d ArcFace/MBF
    and 128-d SFace via the same Registry. A non-zero dim still fails
    fast with ErrDimensionMismatch.

  * core/application plumbs `faceEmbeddingDim = 0`, explaining the
    rationale in the comment.

Backend gallery description updated to reflect that the image carries
no weights — it's just Python + engines.

Smoke-tested all 7 configurations against the rebuilt image (with the
flatten fix applied), exit 0:

    PASS: insightface-buffalo-l    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-sc   faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-s    faces=6 dim=512 same-dist=0.000
    PASS: insightface-buffalo-m    faces=6 dim=512 same-dist=0.000
    PASS: insightface-antelopev2   faces=6 dim=512 same-dist=0.000
    PASS: insightface-opencv       faces=6 dim=128 same-dist=0.000
    PASS: insightface-opencv-int8  faces=6 dim=128 same-dist=0.000
    7/7 passed

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): pre-fetch OpenCV ONNX for e2e target; drop stale pre-baked claim

CI regression from the previous commit: I moved OpenCV Zoo weight
delivery to LocalAI's gallery `files:` mechanism, but the
test-extra-backend-insightface-opencv target was still passing
relative paths `detector_onnx:models/opencv/yunet.onnx` in
BACKEND_TEST_OPTIONS. The e2e suite drives LoadModel directly over
gRPC without going through the gallery, so those relative paths
resolved to nothing and OpenCV's ONNXImporter failed:

    LoadModel failed: Failed to load face engine:
    OpenCV(4.13.0) ... Can't read ONNX file: models/opencv/yunet.onnx

Fix: add an `insightface-opencv-models` prerequisite target that
fetches the two ONNX files (YuNet + SFace) to a deterministic host
cache at /tmp/localai-insightface-opencv-cache/, verifies SHA-256,
and skips the download on re-runs. The opencv test target depends on
it and passes absolute paths in BACKEND_TEST_OPTIONS, so the backend
finds the files via its normal absolute-path resolution branch.

Also refresh the buffalo_l comment: it no longer says "pre-baked"
(nothing is — the pack auto-downloads from upstream's GitHub release
on first LoadModel, same as in CI).

Locally verified: `make test-extra-backend-insightface-opencv` passes
5/5 specs (health, load, face_detect, face_embed, face_verify).

Assisted-by: Claude:claude-opus-4-7

* feat(face-recognition): add POST /v1/face/embed + correct /v1/embeddings docs

The docs promised that /v1/embeddings returns face vectors when you
send an image data-URI. That was never true: /v1/embeddings is
OpenAI-compatible and text-only by contract — its handler goes
through `core/backend/embeddings.go::ModelEmbedding`, which sets
`predictOptions.Embeddings = s` (a string of TEXT to embed) and never
populates `predictOptions.Images[]`. The Python backend's Embedding
gRPC method does handle Images[] (that's how /v1/face/register reaches
it internally via `backend.FaceEmbed`), but the HTTP embeddings
endpoint wasn't wired to populate it.

Rather than overload /v1/embeddings with image-vs-text detection —
messy, and the endpoint is OpenAI-compatible by design — add a
dedicated /v1/face/embed endpoint that wraps `backend.FaceEmbed`
(already used internally by /v1/face/register and /v1/face/identify).

Matches LocalAI's convention of a dedicated path per non-standard flow
(/v1/rerank, /v1/detection, /v1/face/verify etc.).

Response:

    {
      "embedding": [<dim> floats, L2-normed],
      "dim": int,           // 512 for ArcFace R50 / MBF, 128 for SFace
      "model": "<name>"
    }

Live-tested on the opencv engine: returns a 128-d L2-normalized vector
(sum(x^2) = 1.0000). Sentinel in docs updated to note /v1/embeddings
is text-only and point image users at /v1/face/embed instead.

Assisted-by: Claude:claude-opus-4-7

* fix(http): map malformed image input + gRPC status codes to proper 4xx

Image-input failures on LocalAI's single-image endpoints (/v1/detection,
/v1/face/{verify,analyze,embed,register,identify}) have historically
returned 500 — even when the client was the one who sent garbage.
Classic example: you POST an "image" that isn't a URL, isn't a
data-URI, and isn't a valid JPEG/PNG — the server shouldn't claim
that's its fault.

Two helpers land in core/http/endpoints/localai/images.go and every
single-image handler is switched over:

  * decodeImageInput(s)
      Wraps utils.GetContentURIAsBase64 and turns any failure
      (invalid URL, not a data-URI, download error, etc.) into
      echo.NewHTTPError(400, "invalid image input: ...").

  * mapBackendError(err)
      Inspects the gRPC status on a backend call error and maps:
        INVALID_ARGUMENT     → 400 Bad Request
        NOT_FOUND            → 404 Not Found
        FAILED_PRECONDITION  → 412 Precondition Failed
        Unimplemented        → 501 Not Implemented
      All other codes fall through unchanged (still 500).

Before, my 1×1 PNG error-path test returned:
    HTTP 500 "rpc error: code = InvalidArgument desc = failed to decode one or both images"
After:
    HTTP 400 "failed to decode one or both images"

Scope-limited to the LocalAI single-image endpoints. The multi-modal
paths (middleware/request.go, openresponses/responses.go,
openai/realtime.go) intentionally log-and-skip individual media parts
when decoding fails — different design intent (graceful degradation
of a multi-part message), not a 400-worthy failure. Left untouched.

Live-verified: every error case in /tmp/face_errors.py now returns
4xx with a meaningful message; the "image with no face (1x1 PNG)"
case specifically went from 500 → 400.

Assisted-by: Claude:claude-opus-4-7

* refactor(face-recognition): insightface packs go through gallery files:, drop FaceAnalysis

Follows up on the discovery that LocalAI's gallery `files:` mechanism
handles archives (zip, tar.gz, …) via mholt/archiver/v3 — the rhasspy
piper voices use exactly this pattern. Insightface packs are zip
archives, so we can now deliver them the same way every other
gallery-managed model gets delivered: declaratively, checksum-verified,
through LocalAI's standard download+extract pipeline.

Two changes:

1. Gallery (gallery/index.yaml) — every insightface-* entry gains a
   `files:` list with the pack zip's URI + SHA-256. `local-ai models
   install insightface-buffalo-l` now fetches the zip, verifies the
   hash, and extracts it into the models directory. No more reliance
   on insightface's library-internal `ensure_available()` auto-download
   or its hardcoded `BASE_REPO_URL`.

2. InsightFaceEngine (backend/python/insightface/engines.py) — drops
   the FaceAnalysis wrapper and drives insightface's `model_zoo`
   directly. The ~50 lines FaceAnalysis provides — glob ONNX files,
   route each through `model_zoo.get_model()`, build a
   `{taskname: model}` dict, loop per-face at inference — are
   reimplemented in `InsightFaceEngine`. The actual inference classes
   (RetinaFace, ArcFaceONNX, Attribute, Landmark) are still
   insightface's — we only replicate the glue, so drift risk against
   upstream is minimal.

   Why drop FaceAnalysis: it hard-codes a `<root>/models/<name>/*.onnx`
   layout that doesn't match what LocalAI's zip extraction produces.
   LocalAI unpacks archives flat into `<models_dir>`. Upstream packs
   are inconsistent — buffalo_l/s/sc ship ONNX at the zip root (lands
   at `<models_dir>/*.onnx`), buffalo_m/antelopev2 wrap in a redundant
   `<name>/` dir (lands at `<models_dir>/<name>/*.onnx`). The new
   `_locate_insightface_pack` helper searches both locations plus
   legacy paths and returns whichever has ONNX files. Replaces the
   earlier `_flatten_insightface_pack` helper (which tried to fight
   FaceAnalysis's layout expectations; now we just find the files
   wherever they are).

Net effect for users: install once via LocalAI's managed flow,
weights live alongside every other model, progress shows in the
jobs endpoint, no first-load network call. Same API surface,
cleaner plumbing.

Assisted-by: Claude:claude-opus-4-7

* fix(face-recognition): CI's insightface e2e path needs the pack pre-fetched

The e2e suite drives LoadModel over gRPC without going through LocalAI's
gallery flow, so the engine's `_model_dir` option (normally populated
from ModelPath) is empty. Previously the insightface target relied on
FaceAnalysis auto-download to paper over this, but we dropped
FaceAnalysis in favor of direct model_zoo calls — so the buffalo_l
target started failing at LoadModel with "no insightface pack found".

Mirror the opencv target's pre-fetch pattern: download buffalo_sc.zip
(same SHA as the gallery entry), extract it on the host, and pass
`root:<dir>` so the engine locates the pack without needing
ModelPath. Switched to buffalo_sc (smallest pack, ~16MB) to keep CI
fast; it covers the same insightface engine code path as buffalo_l.

Face analyze cap dropped since buffalo_sc has no age/gender head.

Assisted-by: Claude:claude-opus-4-7[1m]

* feat(face-recognition): surface face-recognition in advertised feature maps

The six /v1/face/* endpoints were missing from every place LocalAI
advertises its feature surface to clients:

  * api_instructions — the machine-readable capability index at
    GET /api/instructions. Added `face-recognition` as a dedicated
    instruction area with an intro that calls out the in-memory
    registry caveat and the /v1/face/embed vs /v1/embeddings split.
  * auth/permissions — added FeatureFaceRecognition constant, routed
    all six face endpoints through it so admins can gate them per-user
    like any other API feature. Default ON (matches the other API
    features).
  * React UI capabilities — CAP_FACE_RECOGNITION symbol mapped to
    FLAG_FACE_RECOGNITION. Declared only for now; the Face page is a
    follow-up (noted in the plan).

Instruction count bumped 9 → 10; test updated.

Assisted-by: Claude:claude-opus-4-7[1m]

* docs(agents): capture advertising-surface steps in the endpoint guide

Before this change, adding a new /v1/* endpoint reliably missed one or
more of: the swagger @Tags annotation, the /api/instructions registry,
the auth RouteFeatureRegistry, and the React UI CAP_* symbol. The
endpoint would work but be invisible to API consumers, admins, and the
UI — and nothing in the existing docs said to look in those places.

Extend .agents/api-endpoints-and-auth.md with a new "Advertising
surfaces" section covering all four surfaces (swagger tags, /api/
instructions, capabilities.js, docs/), and expand the closing checklist
so it's impossible to ship a feature without visiting each one. Hoist a
one-liner reminder into AGENTS.md's Quick Reference so agents skim it
before diving in.

Assisted-by: Claude:claude-opus-4-7[1m]
2026-04-22 21:55:41 +02:00
Richard Palethorpe
d16f19f1eb fix(kokoros): Build and publish the backend images from CI/CD (#9487)
* fix(kokoros): Build and publish the backend images from CI/CD

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Delete .claude/agents

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/commands

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/settings.json

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Delete .claude/skills

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-22 13:19:55 +02:00
LocalAI [bot]
cd7b035716 chore: ⬆️ Update ggml-org/llama.cpp to 5a4cd6741fc33227cdacb329f355ab21f8481de2 (#9479)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 08:58:19 +02:00
LocalAI [bot]
0f3bb2d647 chore(model gallery): 🤖 add 1 new models via gallery agent (#9481)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-22 08:22:05 +02:00
Adira
607efe5a4c fix(backend-monitor): accept model as a query parameter (#9411)
The /backend/monitor endpoint is routed as GET but its handler bound the
model name from a request body, which is invalid per REST and breaks
Swagger UI and OpenAPI codegen tools that refuse to send bodies with GET.

Switch to reading ?model=<name> as a query parameter and update the
Swagger annotation, regenerated spec files, and documentation. The
handler still falls back to body binding when the query parameter is
absent, so existing clients sending {"model": "..."} continue to work.

Fixes #9207

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-04-21 22:06:35 +02:00
Ettore Di Giacinto
7d8c1d5e45 fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush (#9470)
* fix(streaming): dedupe content, recover reasoning, unique tool IDs

When tool calls are discovered only during final parsing (after the
streaming token callback returns), processTools' default switch branch
used to emit the full accumulated content alongside the tool_call args
chunk. Clients that accumulate delta.content per the OpenAI streaming
contract end up showing every narration line twice. Three related bugs
in the same flush path:

1. Content duplication: the args chunk carried Content:textContentToReturn
   even though the text had already been streamed token-by-token via
   the token callback, so delta.content was both the running total and
   bundled with tool_calls in one delta (two spec violations).
2. Reasoning drop: when the C++ autoparser surfaces reasoning only as
   a final aggregate (no incremental tokens), the callback never emits
   it and the flush branch didn't either, silently losing it.
3. tool_call ID collision: empty ss.ID fell back to the request id, so
   multiple empty-ID calls in the same turn all shared the same id,
   breaking tool_result matching by tool_call_id.

Extracted the block into buildDeferredToolCallChunks (pure function,
unit-testable) and added 19 Ginkgo specs covering streamed vs.
not-streamed content/reasoning, single vs. multi call, and
incremental-vs-deferred emission. Every case asserts the invariant
that no delta carries both non-empty Content/Reasoning and non-empty
ToolCalls.

Fix summary:
- emit reasoning in its own leading chunk when !reasoningAlreadyStreamed
- emit role+content in their own chunks when !contentAlreadyStreamed
- drop Content from the tool_call args chunk
- fallback to fmt.Sprintf("%s-%d", id, i) for empty ss.ID so calls stay
  uniquely addressable

Reproduced live against qwen3.6-35b-a3b-apex served by LocalAI with
the C++ autoparser; the full-content replay chunk that preceded each
tool_calls block is gone after the fix.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): dedupe reasoning in the noActionToRun final chunk

extractor.Reasoning() returns only the Go-side extractor's lastReasoning
accumulator (pkg/reasoning/extractor.go:129). ChatDelta reasoning
coming through ProcessChatDeltaReasoning lives in a separate
accumulator (cdLastStrippedReasoning) that Reasoning() does not
expose. The "reasoning != \"\" && extractor.Reasoning() == \"\"" guard
therefore fires exactly when the autoparser streamed reasoning
incrementally via the callback — producing a duplicate final delivery.

Replace both guard sites in the noActionToRun branch with the
sentReasoning flag introduced in the previous commit. Extract the
closing-chunk logic into buildNoActionFinalChunks so the refactor is
testable; the helper mirrors buildDeferredToolCallChunks.

Add Ginkgo coverage for both the content-streamed and
content-not-streamed paths: reasoning is dropped when it was streamed,
delivered once when it arrived only as a final aggregate, and omitted
when empty. Metadata invariants carried over from the sibling helper.

Assisted-by: Claude:claude-opus-4-7 go vet

* fix(streaming): detect noActionToRun anywhere in functionResults

The previous condition only looked at functionResults[0].Name, which
misbehaved when a real tool call followed a noAction sentinel — the
noAction shadowed the real call and the whole turn was treated as a
question to answer, silently dropping the tool call. The mirror case,
[realCall, noActionCall], fell into the default branch and emitted the
noAction entry as if it were a real tool_call.

Replace with hasRealCall, which scans the slice and returns true as
soon as it finds a non-noAction entry. noActionToRun now matches the
semantic intent: "every entry is the noAction sentinel (or the slice
is empty)".

Note: this does not change incremental emission, where noAction
entries may still be forwarded as tool_call chunks by the XML/JSON
iterative parsers. That is a separate layer (functions.Parse*) and
addressing it requires threading noAction through the parser APIs —
out of scope for this change.

Assisted-by: Claude:claude-opus-4-7 go vet
2026-04-21 21:59:33 +02:00
leinasi2014
d18d434bb2 Respect explicit reasoning config during GGUF thinking probe (#9463)
Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-21 21:53:10 +02:00
Ettore Di Giacinto
39573ecd2a chore(whisperx): drop ROCm/hipblas build target (#9474)
whisperx has no upstream AMD GPU support and its core transcription path
(faster-whisper -> ctranslate2) falls back to CPU on AMD since the PyPI
ctranslate2 is CUDA-only. The torch rocm wheels would accelerate only the
alignment/diarization stages, producing a misleadingly half-working image.

Drop the hipblas variant rather than shipping a partially accelerated build
users can't distinguish from the real thing. AMD hosts now fall through
the capability map to cpu-whisperx / cpu-whisperx-development.

Also removes the now-dangling rocm-whisperx assertion from
pkg/system/capabilities_test.go and the ROCm mention from the whisperx
row in docs/content/reference/compatibility-table.md.

Assisted-by: Claude Code:claude-opus-4-7
2026-04-21 21:50:18 +02:00
Ettore Di Giacinto
a7dbb2a83d fix(gallery-agent): process blacklist command on recently-closed PRs (#9473)
The command-processing step only walked open PRs, so when a maintainer
wrote `/gallery-agent blacklist` and immediately closed the PR, the
next scheduled run missed the command, the `gallery-agent/blacklisted`
label was never applied, and the skip-URL step (which only pulls URLs
from closed PRs carrying that label) re-proposed the model on the next
cron.

Also scan closed gallery-agent PRs from the last 14 days that don't
already carry the blacklist label, and apply the label retroactively
when the command is present. Close/recreate actions still only run on
open PRs.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 16:29:13 +02:00
dependabot[bot]
3ad9b16c29 chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 (#9455)
Bumps [github.com/coreos/go-oidc/v3](https://github.com/coreos/go-oidc) from 3.17.0 to 3.18.0.
- [Release notes](https://github.com/coreos/go-oidc/releases)
- [Commits](https://github.com/coreos/go-oidc/compare/v3.17.0...v3.18.0)

---
updated-dependencies:
- dependency-name: github.com/coreos/go-oidc/v3
  dependency-version: 3.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:31:02 +02:00
dependabot[bot]
c806d5ab73 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 (#9456)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.14 to 1.32.16.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.14...config/v1.32.16)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.16
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 15:30:22 +02:00
LocalAI [bot]
47efaf5b43 Fix: Add model parameter to neutts-air gallery definition (#8793)
fix: Add model parameter to neutts-air gallery definition

The neutts-air model entry was missing the 'model' parameter in its
configuration, which caused LocalAI to fail with an 'Unrecognized model'
error when trying to use it. This change adds the required model parameter
pointing to the HuggingFace repository (neuphonic/neutts-air) so the backend
can properly load the model.

Fixes #8792

Signed-off-by: localai-bot <localai-bot@example.com>
Co-authored-by: localai-bot <localai-bot@example.com>
2026-04-21 11:56:00 +02:00
LocalAI [bot]
315b634a91 feat: improve CLI error messages with actionable guidance (#8880)
- transcript.go: Model not found error now suggests available models commands
- util.go: GGUF error explains format and how to get models
- worker_p2p.go: Token error explains purpose and how to obtain one
- run.go: Startup failure includes troubleshooting steps and docs link
- model_config_loader.go: Config validation errors include file path and guidance

Refs: H2 - UX Review Issue

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 11:53:26 +02:00
dependabot[bot]
6b245299d7 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 (#9454)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.4.1 to 1.5.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.4.1...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:43:00 +02:00
dependabot[bot]
677c0315c1 chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 (#9453)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.30 to 1.7.31.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.30...v1.7.31)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-version: 1.7.31
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:43 +02:00
dependabot[bot]
478522ce4d chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 (#9452)
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.97.1 to 1.99.1.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.97.1...service/s3/v1.99.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-version: 1.99.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-21 11:42:27 +02:00
Ettore Di Giacinto
c54897ad44 fix(tests): update InstallBackend call sites for new URI/Name/Alias params (#9467)
Commit 02bb715c (#9446) added uri, name, alias parameters to
RemoteUnloaderAdapter.InstallBackend but missed the e2e test call
sites, breaking the distributed test build. Pass empty strings to
match the pattern used by the other non-URI call sites.

Assisted-by: Claude Code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-21 11:41:38 +02:00
LocalAI [bot]
8bb1e8f21f chore: ⬆️ Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 (#9448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:15:45 +02:00
LocalAI [bot]
cd94a0b61a chore: ⬆️ Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe (#9450)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:09:05 +02:00
LocalAI [bot]
047bc48fa9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9464)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 11:07:07 +02:00
sec171
01bd8ae5d0 [gallery] Fix duplicate sha256 keys in Wan models (#9461)
Fix duplicate sha256 keys in wan models gallery

The wan models previously defined the `sha256` key twice in their files lists,
which triggered strict mapping key checks in the YAML parser and resulted
in unmarshal errors that crashed the `/api/models` loading. This removes
the redundant trailing `sha256` keys from the Wan model definitions.

Assisted-by: Antigravity:Gemini-3.1-Pro-High [multi_replace_file_content, run_command]

Signed-off-by: Alex <codecrusher24@gmail.com>
2026-04-21 11:06:36 +02:00
LocalAI [bot]
d9808769be chore(model-gallery): ⬆️ update checksum (#9451)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:58 +02:00
LocalAI [bot]
5973c0a9df chore: ⬆️ Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e (#9447)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-21 00:07:19 +02:00
leinasi2014
486b5e25a3 fix(config): ignore yaml backup files in model loader (#9443)
Only load files whose real extension is .yaml or .yml so backup files like model.yaml.bak do not override active configs. Add a regression test covering plain and timestamped backup files.

Assisted-by: Codex:gpt-5.4 docker

Signed-off-by: leinasi2014 <leinasi2014@gmail.com>
2026-04-20 23:41:39 +02:00
Russell Sim
c66c41e8d7 fix(ci): wire AMDGPU_TARGETS through backend build workflow (#9445)
Commit 8839a71c exposed AMDGPU_TARGETS as an ARG/ENV in
Dockerfile.llama-cpp so GPU targets could be overridden, but never
wired the value through the CI workflow inputs. Without it, Docker
receives AMDGPU_TARGETS="" which overrides the Makefile's ?= default,
causing all hipblas builds to compile only for gfx906 regardless of
the target list in the Makefile.

Add amdgpu-targets as a workflow_call input with the same default list
as the Makefile, and pass it as AMDGPU_TARGETS in the build-args of
both the push and PR build steps.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:41:19 +02:00
Russell Sim
02bb715c0a fix(distributed): pass ExternalURI through NATS backend install (#9446)
When installing a backend with a custom OCI URI in distributed mode,
the URI was captured in ManagementOp.ExternalURI by the HTTP handler
but never forwarded to workers. BackendInstallRequest had no URI field,
so workers fell through to the gallery lookup and failed with
"no backend found with name <custom-name>".

Add URI/Name/Alias fields to BackendInstallRequest and thread them from
ManagementOp through DistributedBackendManager.InstallBackend() and the
RemoteUnloaderAdapter. On the worker side, route to InstallExternalBackend
when URI is set instead of InstallBackendFromGallery. Update all
remaining InstallBackend call sites (UpgradeBackend, reconciler
pending-op drain, router auto-install) to pass empty strings for the
new params.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed-off-by: Russell Sim <rsl@simopolis.xyz>
2026-04-20 23:39:35 +02:00
Ettore Di Giacinto
8ab56e2ad3 feat(gallery): add wan i2v 720p (#9457)
feat(gallery): add Wan 2.1 I2V 14B 720P + pin all wan ggufs by sha256

Adds a new entry for the native-720p image-to-video sibling of the
480p I2V model (wan-2.1-i2v-14b-480p-ggml). The 720p I2V model is
trained purely as image-to-video — no first-last-frame interpolation
path — so motion is freer than repurposing the FLF2V 720P variant as
an i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h
auxiliary files as the existing 480p I2V and 720p FLF2V entries, so
no new aux downloads are introduced.

Also pins the main diffusion gguf by sha256 for the new entry and for
the three existing wan entries that were previously missing a hash
(wan-2.1-t2v-1.3b-ggml, wan-2.1-i2v-14b-480p-ggml,
wan-2.1-flf2v-14b-720p-ggml). Hashes were fetched from HuggingFace's
x-linked-etag header per .agents/adding-gallery-models.md.

Assisted-by: Claude:claude-opus-4-7
2026-04-20 23:34:11 +02:00
pjbrzozowski
ecf85fde9e fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427)
The API Traces tab in /app/traces always showed (0) traces despite requests
being recorded.

The /api/traces endpoint was registered in both localai.go and ui_api.go.
The ui_api.go version wrapped the response as {"traces": [...]} instead of
the flat []APIExchange array that both the React UI (Traces.jsx) and the
legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go
handler, Array.isArray(apiData) always returned false, making the API Traces
tab permanently empty.

Remove the duplicate endpoints from ui_api.go so only the correct flat-array
version in localai.go is served.

Also use mime.ParseMediaType for the Content-Type check in the trace
middleware so requests with parameters (e.g. application/json; charset=utf-8)
are still traced.

Signed-off-by: Pawel Brzozowski <paul@ontux.net>
Co-authored-by: Pawel Brzozowski <paul@ontux.net>
2026-04-20 18:44:49 +02:00
Sai Asish Y
6480715a16 fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438)
GET /api/settings returns settings.ApiKeys as the merged env+runtime list
via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and
round-trips it back on POST /api/settings unchanged.

UpdateSettingsEndpoint was then doing:

    appConfig.ApiKeys = append(envKeys, runtimeKeys...)

where runtimeKeys already contained envKeys (because the UI got them from
the merged GET). Every save therefore duplicated the env keys on top of
the previous merge, and also wrote the duplicates to runtime_settings.json
so the duplication survived restarts and compounded with each save. This
is the user-visible behaviour in #9071: the Web UI shows the keys
twice / three times after consecutive saves.

Before we marshal the settings to disk or call ApplyRuntimeSettings, drop
any incoming key that already appears in startupConfig.ApiKeys. The file
on disk now stores only the genuinely runtime-added keys; the subsequent
append(envKeys, runtimeKeys...) produces one copy of each env key, as
intended. Behaviour is unchanged for users who never had env keys set.

Fixes #9071

Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>
2026-04-20 10:36:54 +02:00
Ettore Di Giacinto
f683231811 feat(gallery): add Wan 2.1 FLF2V 14B 720P (#9440)
First-last-frame-to-video variant of the 14B Wan family. Accepts a
start and end reference image and — unlike the pure i2v path — runs
both through clip_vision, so the final frame lands on the end image
both in pixel and semantic space. Right pick for seamless loops
(start_image == end_image) and narrative A→B cuts.

Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the
I2V 14B entry. Options block mirrors i2v's full-list-in-override
style so the template merge doesn't drop fields.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 10:34:36 +02:00
LocalAI [bot]
960757f0e8 chore(model gallery): 🤖 add 1 new models via gallery agent (#9436)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 08:48:47 +02:00
Ettore Di Giacinto
865fd552f5 docs(agents): adopt kernel's AI coding assistants policy
Align LocalAI with the Linux kernel project's policy for AI-assisted
contributions (https://docs.kernel.org/process/coding-assistants.html).

- Add .agents/ai-coding-assistants.md with the full policy adapted to
  LocalAI's MIT license: no Signed-off-by or Co-Authored-By from AI,
  attribute AI involvement via an Assisted-by: trailer, human submitter
  owns the contribution.
- Surface the rules at the entry points: AGENTS.md (and its CLAUDE.md
  symlink) and CONTRIBUTING.md.
- Publish a user-facing reference page at
  docs/content/reference/ai-coding-assistants.md and link it from the
  references index.

Assisted-by: Claude:claude-opus-4-7
2026-04-19 22:50:54 +00:00
LocalAI [bot]
cb77a5a4b9 chore(model gallery): 🤖 add 1 new models via gallery agent (#9425)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:42:44 +02:00
Ettore Di Giacinto
60633c4dd5 fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435)
gen_video's ffmpeg subprocess was relying on the filename extension to
choose the output container. Distributed LocalAI hands the backend a
staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to
.mp4 only after the backend returns, so ffmpeg saw a .tmp extension and
bailed with "Unable to choose an output format". Inference had already
completed and the frames were piped in, producing the cryptic
"video inference failed (code 1)" at the API layer.

Pass -f mp4 explicitly so the container is selected by flag instead of
by filename suffix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 00:41:54 +02:00
Ettore Di Giacinto
9e44944cc1 fix(i2v): Add new options to the model configuration
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-20 00:27:05 +02:00
Ettore Di Giacinto
372eb08dcf fix(gallery): allow uninstalling orphaned meta backends + force reinstall (#9434)
Two interrelated bugs that combined to make a meta backend impossible
to uninstall once its concrete had been removed from disk (partial
install, earlier crash, manual cleanup).

1. DeleteBackendFromSystem returned "meta backend %q not found" and
   bailed out early when the concrete directory didn't exist,
   preventing the orphaned meta dir from ever being removed. Treat a
   missing concrete as idempotent success — log a warning and continue
   to remove the orphan meta.

2. InstallBackendFromGallery's "already installed, skip" short-circuit
   only checked that the name was known (`backends.Exists(name)`); an
   orphaned meta whose RunFile points at a missing concrete still
   satisfies that check, so every reinstall returned nil without doing
   anything. Afterwards the worker's findBackend returned empty and we
   kept looping with "backend %q not found after install attempt".
   Require the entry to be actually runnable (run.sh stat-able, not a
   directory) before skipping.

New helper isBackendRunnable centralises the runnability test so both
the install guard and future callers stay in sync. Tests cover the
orphaned-meta delete path and the non-runnable short-circuit case.
2026-04-20 00:10:19 +02:00
LocalAI [bot]
28091d626e chore: ⬆️ Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 (#9430)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-20 00:01:48 +02:00
LocalAI [bot]
cae79d9107 feat(swagger): update swagger (#9431)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:50 +02:00
LocalAI [bot]
babbbc6ec8 chore: ⬆️ Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad (#9429)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:19 +02:00
LocalAI [bot]
3804497186 chore: ⬆️ Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 (#9428)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 23:39:07 +02:00
Ettore Di Giacinto
fda1c553a1 fix(distributed): stop queue loops on agent nodes + dead-letter cap (#9433)
pending_backend_ops rows targeting agent-type workers looped forever:
the reconciler fan-out hit a NATS subject the worker doesn't subscribe
to, returned ErrNoResponders, we marked the node unhealthy, and the
health monitor flipped it back to healthy on the next heartbeat. Next
tick, same row, same failure.

Three related fixes:

1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend.
   Agent workers handle agent NATS subjects, not backend.install /
   delete / list, so enqueueing for them guarantees an infinite retry
   loop. Silent skip is correct — they aren't consumers of these ops.

2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on
   nats.ErrNoResponders: mark the node unhealthy before recording the
   failure, so subsequent ListDuePendingBackendOps (filters by
   status=healthy) stops picking the row until the node actually
   recovers. Matches the synchronous fan-out path.

3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of
   exponential backoff the row is a poison message; further retries
   just thrash NATS. Row is deleted and logged at ERROR so it stays
   visible without staying infinite.

Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows
that target agent-type nodes, non-existent nodes, or carry an empty
backend name. Guarded by the same schema-migration advisory lock so
only one instance performs it. The guards above prevent new rows of
this shape; this closes the migration gap for existing ones.

Tests: the prune migration (valid row stays, agent + empty-name rows
drop) on top of existing upsert / backoff coverage.
2026-04-19 23:38:43 +02:00
Ettore Di Giacinto
b27de08fff chore(gallery): fixup wan
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 21:31:22 +00:00
Ettore Di Giacinto
510f791ccc feat(gallery): add stablediffusion-ggml-development meta backend 2026-04-19 20:16:33 +00:00
Ettore Di Giacinto
369c50a41c fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)
* fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc

The upstream PR #21203 (server: respect the ignore_eos flag) has been
merged into the TheTom/llama-cpp-turboquant feature/turboquant-kv-cache
branch. With the fix now in-tree, 0001-server-respect-the-ignore-eos-flag.patch
no longer applies (git apply sees its additions already present) and the
nightly turboquant bump fails.

Retire the patch and bump the pin to the first fork revision that carries
the merged fix (tag feature-turboquant-kv-cache-b8967-627ebbc). This matches
the contract in apply-patches.sh: drop patches once the fork catches up.

* fix(turboquant): patch out get_media_marker() call in grpc-server copy

CI turboquant docker build was failing with:

  grpc-server.cpp:2825:40: error: use of undeclared identifier
  'get_media_marker'

The call was added by 7809c5f5 (PR #9412) to propagate the mtmd random
per-server media marker upstream landed in ggml-org/llama.cpp#21962. The
TheTom/llama-cpp-turboquant fork branched before that PR, so its
server-common.cpp has no such symbol.

Extend patch-grpc-server.sh to substitute get_media_marker() with the
legacy "<__media__>" literal in the build-time grpc-server.cpp copy
under turboquant-<flavor>-build/. The fork's mtmd_default_marker()
returns exactly that string, and the Go layer falls back to the same
sentinel when media_marker is empty, so behavior on the turboquant path
is unchanged. Patched copy only — the shared source under
backend/cpp/llama-cpp/ keeps compiling against vanilla upstream.

Verified by running `make docker-build-turboquant` locally end-to-end:
all five flavors (avx, avx2, avx512, fallback, grpc+rpc-server) now
compile past the previous failure and the image tags successfully.
2026-04-19 21:05:21 +02:00
Ettore Di Giacinto
75a63f87d8 feat(distributed): sync state with frontends, better backend management reporting (#9426)
* fix(distributed): detect backend upgrades across worker nodes

Before this change `DistributedBackendManager.CheckUpgrades` delegated to the
local manager, which read backends from the frontend filesystem. In
distributed deployments the frontend has no backends installed locally —
they live on workers — so the upgrade-detection loop never ran and the UI
silently never surfaced upgrades even when the gallery advertised newer
versions or digests.

Worker-side: NATS backend.list reply now carries Version, URI and Digest
for each installed backend (read from metadata.json).

Frontend-side: DistributedBackendManager.ListBackends aggregates per-node
refs (name, status, version, digest) instead of deduping, and CheckUpgrades
feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint
factored out of CheckBackendUpgrades so both paths share the same core
logic.

Cluster drift policy: when per-node version/digest tuples disagree, the
backend is flagged upgradeable regardless of whether any single node
matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so
operators can see *why* it is out of sync. The next upgrade-all realigns
the cluster.

Tests cover: drift detection, unanimous-match (no upgrade), and the
empty-installed-version path that the old distributed code silently
missed.

* feat(ui): surface backend upgrades in the System page

The System page (Manage.jsx) only showed updates as a tiny inline arrow,
so operators routinely missed them. Port the Backend Gallery's upgrade UX
so System speaks the same visual language:

- Yellow banner at the top of the Backends tab when upgrades are pending,
  with an "Upgrade all" button (serial fan-out, matches the gallery) and a
  "Updates only" filter toggle.
- Warning pill (↑ N) next to the tab label so the count is glanceable even
  when the banner is scrolled out of view.
- Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button
  that silently flipped semantics between Reinstall and Upgrade), plus an
  "Update available" badge in the new Version column.
- New columns: Version (with upgrade + drift chips), Nodes (per-node
  attribution badges for distributed mode, degrading to a compact
  "on N nodes · M offline" chip above three nodes), Installed (relative
  time).
- System backends render a "Protected" chip instead of a bare "—" so rows
  still align and the reason is obvious.
- Delete uses the softer btn-danger-ghost so rows don't scream red; the
  ConfirmDialog still owns the "are you sure".

The upgrade checker also needed the same per-worker fix as the previous
commit: NewUpgradeChecker now takes a BackendManager getter so its
periodic runs call the distributed CheckUpgrades (which asks workers)
instead of the empty frontend filesystem. Without this the /api/backends/
upgrades endpoint stayed empty in distributed mode even with the protocol
change in place.

New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack,
.cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in
App.css so other pages can adopt them without duplicating styles.

* feat(ui): polish the Nodes page so it reads like a product

The Nodes page was the biggest visual liability in distributed mode.
Rework the main dashboard surfaces in place without changing behavior:

StatCards: uniform height (96px min), left accent bar colored by the
metric's semantic (success/warning/error/primary), icon lives in a
36x36 soft-tinted chip top-right, value is left-aligned and large.
Grid auto-fills so the row doesn't collapse on narrow viewports. This
replaces the previous thin-bordered boxes with inconsistent heights.

Table rows: expandable rows now show a chevron cue on the left (rotates
on expand) so users know rows open. Status cell became a dedicated chip
with an LED-style halo dot instead of a bare bullet. Action buttons gained
labels — "Approve", "Resume", "Drain" — so the icons aren't doing all
the semantic work; the destructive remove action uses the softer
btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog
still owning the real "are you sure". Applied cell-mono/cell-muted
utility classes so label chips and addresses share one spacing/font
grammar instead of re-declaring inline styles everywhere.

Expanded drawer: empty states for Loaded Models and Installed Backends
now render as a proper drawer-empty card (dashed border, icon, one-line
hint) instead of a plain muted string that read like broken formatting.

Tabs: three inline-styled buttons became the shared .tab class so they
inherit focus ring, hover state, and the rest of the design system —
matches the System page.

"Add more workers" toggle turned into a .nodes-add-worker dashed-border
button labelled "Register a new worker" (action voice) instead of a
chevron + muted link that operators kept mistaking for broken text.

New shared CSS primitives carry over to other pages:
.stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty,
.nodes-add-worker.

* feat(distributed): durable backend fan-out + state reconciliation

Two connected problems handled together:

1) Backend delete/install/upgrade used to silently skip non-healthy nodes,
   so a delete during an outage left a zombie on the offline node once it
   returned. The fan-out now records intent in a new pending_backend_ops
   table before attempting the NATS round-trip. Currently-healthy nodes
   get an immediate attempt; everyone else is queued. Unique index on
   (node_id, backend, op) means reissuing the same operation refreshes
   next_retry_at instead of stacking duplicates.

2) Loaded-model state could drift from reality: a worker OOM'd, got
   killed, or restarted a backend process would leave a node_models row
   claiming the model was still loaded, feeding ghost entries into the
   /api/nodes/models listing and the router's scheduling decisions.

The existing ReplicaReconciler gains two new passes that run under a
fresh KeyStateReconciler advisory lock (non-blocking, so one wedged
frontend doesn't freeze the cluster):

  - drainPendingBackendOps: retries queued ops whose next_retry_at has
    passed on currently-healthy nodes. Success deletes the row; failure
    bumps attempts and pushes next_retry_at out with exponential backoff
    (30s → 15m cap). ErrNoResponders also marks the node unhealthy.

  - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are
    loaded but hasn't seen touched in the last probeStaleAfter (2m).
    Unreachable addresses are removed from the registry. A pluggable
    ModelProber lets tests substitute a fake without standing up gRPC.

DistributedBackendManager exposes DeleteBackendDetailed so the HTTP
handler can surface per-node outcomes ("2 succeeded, 1 queued") to the
UI in a follow-up commit; the existing DeleteBackend still returns
error-only for callers that don't care about node breakdown.

Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx
on a new key so N frontends coordinate — the same pattern the health
monitor and replica reconciler already rely on. Single-node mode runs
both passes inline (adapter is nil, state drain is a no-op).

Tests cover the upsert semantics, backoff math, the probe removing an
unreachable model but keeping a reachable one, and filtering by
probeStaleAfter.

* feat(ui): show cluster distribution of models in the System page

When a frontend restarted in distributed mode, models that workers had
already loaded weren't visible until the operator clicked into each node
manually — the /api/models/capabilities endpoint only knew about
configs on the frontend's filesystem, not the registry-backed truth.

/api/models/capabilities now joins in ListAllLoadedModels() when the
registry is active, returning loaded_on[] with node id/name/state/status
for each model. Models that live in the registry but lack a local config
(the actual ghosts, not recovered from the frontend's file cache) still
surface with source="registry-only" so operators can see and persist
them; without that emission they'd be invisible to this frontend.

Manage → Models replaces the old Running/Idle pill with a distribution
cell that lists the first three nodes the model is loaded on as chips
colored by state (green loaded, blue loading, amber anything else). On
wider clusters the remaining count collapses into a +N chip with a
title-attribute breakdown. Disabled / single-node behavior unchanged.

Adopted models get an extra "Adopted" ghost-icon chip with hover copy
explaining what it means and how to make it permanent.

Distributed mode also enables a 10s auto-refresh and a "Last synced Xs
ago" indicator next to the Update button so ghost rows drop off within
one reconcile tick after their owning process dies. Non-distributed
mode is untouched — no polling, no cell-stack, same old Running/Idle.

* feat(ui): NodeDistributionChip — shared per-node attribution component

Large clusters were going to break the Manage → Backends Nodes column:
the old inline logic rendered every node as a badge and would shred the
layout at >10 workers, plus the Manage → Models distribution cell had
copy-pasted its own slightly-different version.

NodeDistributionChip handles any cluster size with two render modes:
  - small (≤3 nodes): inline chips of node names, colored by health.
  - large: a single "on N nodes · M offline · K drift" summary chip;
    clicking opens a Popover with a per-node table (name, status,
    version, digest for backends; name, status, state for models).

Drift counting mirrors the backend's summarizeNodeDrift so the UI
number matches UpgradeInfo.NodeDrift. Digests are truncated to the
docker-style 12-char form with the full value preserved in the title.

Popover is a new general-purpose primitive: fixed positioning anchored
to the trigger, flips above when there's no room below, closes on
outside-click or Escape, returns focus to the trigger. Uses .card as
its surface so theming is inherited. Also useful for a future
labels-editor popup and the user menu.

Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell
and uses the shared chip with context="backends" / "models"
respectively. Delete code removes ~40 lines of ad-hoc logic.

* feat(ui): shared FilterBar across the System page tabs

The Backends gallery had a nice search + chip + toggle strip; the System
page had nothing, so the two surfaces felt like different apps. Lift the
pattern into a reusable FilterBar and wire both System tabs through it.

New component core/http/react-ui/src/components/FilterBar.jsx renders a
search input, a role="tablist" chip row (aria-selected for a11y), and
optional toggles / right slot. Chips support an optional `count` which
the System page uses to show "User 3", "Updates 1" etc.

System Models tab: search by id or backend; chips for
All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in
distributed mode. "Last synced" + Update button live in the right slot.

System Backends tab: search by name/alias/meta-backend-for; chips for
All/User/System/Meta plus conditional Updates / Offline-nodes chips
when relevant. The old ad-hoc "Updates only" toggle from the upgrade
banner folded into the Updates chip — one source of truth for that
filter. Offline chip only appears in distributed mode when at least
one backend has an unhealthy node, so the chip row stays quiet on
healthy clusters.

Filter state persists in URL query params (mq/mf/bq/bf) so deep links
and tab switches keep the operator's filter context instead of
resetting every time.

Also adds an "Adopted" distribution path: when a model in
/api/models/capabilities carries source="registry-only" (discovered on
a worker but not configured locally), the Models tab shows a ghost chip
labelled "Adopted" with hover copy explaining how to persist it — this
is what closes the loop on the ghost-model story end-to-end.
2026-04-19 17:55:53 +02:00
Ettore Di Giacinto
9cd8d7951f fix(kokoros): implement audio_transcription_stream trait stub (#9422)
The backend.proto was updated to add AudioTranscriptionStream RPC, but
the Rust KokorosService was never updated to match the regenerated
tonic trait, breaking compilation with E0046.

Stubs the new streaming method as unimplemented, matching the pattern
used for the other streaming RPCs Kokoros does not support.
2026-04-19 13:29:58 +02:00
LocalAI [bot]
884bfb84c9 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe (#9418)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:27:11 +02:00
LocalAI [bot]
e94a9a8f10 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d (#9417)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-19 09:26:58 +02:00
Ettore Di Giacinto
054c4b4b45 feat(stable-diffusion.ggml): add support for video generation (#9420)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-19 09:26:33 +02:00
LocalAI [bot]
6e49dba27c chore: ⬆️ Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb (#9415)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-19 09:26:05 +02:00
Ettore Di Giacinto
e463820566 fix(ui): fix dark-theme colors in chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-18 23:01:01 +00:00
Keith Mattix II
8839a71c87 fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg (#9410)
Add gfx1151 (AMD Strix Halo / Ryzen AI MAX) to the default AMDGPU_TARGETS
list in the llama-cpp backend Makefile. ROCm 7.2.1 ships with gfx1151
Tensile libraries, so this architecture should be included in default builds.

Also expose AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so that
users building for non-default GPU architectures can override the target
list via --build-arg AMDGPU_TARGETS=<arch>. Previously, passing
-DAMDGPU_TARGETS=<arch> through CMAKE_ARGS was silently overridden by
the Makefile's own append of the default target list.

Fixes #9374

Signed-off-by: Keith Mattix <keithmattix2@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-18 20:39:40 +02:00
Ettore Di Giacinto
117f6430b8 fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)
The shared grpc-server CMakeLists hardcoded `llama-common`, the post-rename
target name in upstream llama.cpp. The turboquant fork branched before that
rename and still exposes the helpers library as `common`, so the name
silently degraded to a plain `-llama-common` link flag, the PUBLIC include
directory was never propagated, and tools/server/server-task.h failed to
find common.h during turboquant-<flavor> builds.
2026-04-18 20:30:28 +02:00
Ettore Di Giacinto
7809c5f5d0 fix(vision): propagate mtmd media marker from backend via ModelMetadata (#9412)
Upstream llama.cpp (PR #21962) switched the server-side mtmd media
marker to a random per-server string and removed the legacy
"<__media__>" backward-compat replacement in mtmd_tokenizer. The
Go layer still emitted the hardcoded "<__media__>", so on the
non-tokenizer-template path the prompt arrived with a marker mtmd
did not recognize and tokenization failed with "number of bitmaps
(1) does not match number of markers (0)".

Report the active media marker via ModelMetadataResponse.media_marker
and substitute the sentinel "<__media__>" with it right before the
gRPC call, after the backend has been loaded and probed. Also skip
the Go-side multimodal templating entirely when UseTokenizerTemplate
is true — llama.cpp's oaicompat_chat_params_parse already injects its
own marker and StringContent is unused in that path. Backends that do
not expose the field keep the legacy "<__media__>" behavior.
2026-04-18 20:30:13 +02:00
LocalAI [bot]
ad742738cb chore: ⬆️ Update ikawrakow/ik_llama.cpp to 52efa12fdae390d1dca6ecd7ca00010fe51f651e (#9404)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-18 09:21:32 +02:00
LocalAI [bot]
86c673fd94 chore: ⬆️ Update ggml-org/whisper.cpp to 166c20b473d5f4d04052e699f992f625ea2a2fdd (#9403)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-18 00:42:32 +02:00
Ettore Di Giacinto
c49feb546f fix(llama-cpp): rename linked target common -> llama-common (#9408)
Upstream llama.cpp (45cac7ca) renamed the CMake library target
`common` to `llama-common`. Linking the old name caused
`target_include_directories(... PUBLIC .)` from the common/ dir
to not propagate, so `#include "common.h"` failed when building
grpc-server.
2026-04-18 00:42:05 +02:00
LocalAI [bot]
844b0b760b chore(model gallery): 🤖 add 1 new models via gallery agent (#9400)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 17:56:41 +02:00
LocalAI [bot]
55c05211d3 chore(model gallery): 🤖 add 1 new models via gallery agent (#9399)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 16:10:02 +02:00
Ettore Di Giacinto
a90a8cf1d0 fix(ci): switch gallery-agent to sigs.k8s.io/yaml (#9397)
The gallery-agent lives under .github/, which Go tooling treats as a
hidden directory and excludes from './...' expansion. That means 'go
mod tidy' (run on every dependabot dependency bump) repeatedly strips
github.com/ghodss/yaml from go.mod/go.sum, breaking 'go run
./.github/gallery-agent' with a missing go.sum entry error.

Switch to sigs.k8s.io/yaml — API-compatible with ghodss/yaml and
already pulled in as a transitive dependency via non-hidden packages,
so tidy can no longer remove it.
2026-04-17 10:10:42 +02:00
dependabot[bot]
12b069f9bd chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9376)
chore(deps): bump dompurify

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [dompurify](https://github.com/cure53/DOMPurify).


Updates `dompurify` from 3.3.2 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.2...3.4.0)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 09:06:32 +02:00
github-actions[bot]
48e87db400 chore: bump inference defaults from unsloth (#9396)
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 09:05:55 +02:00
LocalAI [bot]
7dbd9c056a chore: ⬆️ Update ggml-org/llama.cpp to 4fbdabdc61c04d1262b581e1b8c0c3b119f688ff (#9381)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:13:04 +02:00
Ettore Di Giacinto
7c5d6162f7 fix(ui): rename model config files on save to prevent duplicates (#9388)
Editing a model's YAML and changing the `name:` field previously wrote
the new body to the original `<oldName>.yaml`. On reload the config
loader indexed that file under the new name while the old key
lingered in memory, producing two entries in the system UI that
shared a single underlying file — deleting either removed both.

Detect the rename in EditModelEndpoint and rename the on-disk
`<name>.yaml` and `._gallery_<name>.yaml` to match, drop the stale
in-memory key before the reload, and redirect the editor URL in the
React UI so it tracks the new name. Reject conflicts (409) and names
containing path separators (400).

Fixes #9294
2026-04-17 08:12:48 +02:00
Ettore Di Giacinto
5837b14888 chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d' (#9385)
chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d`

Drop 0002-ggml-rpc-bump-op-count-to-97.patch; the fork now has
GGML_OP_COUNT == 97 and RPC_PROTO_PATCH_VERSION 2 upstream.

Fetch all tags in backend/cpp/llama-cpp/Makefile so tag-only commits
(the new turboquant pin is reachable only through the tag
feature-turboquant-kv-cache-b8821-45f8a06) can be checked out.
2026-04-17 08:12:21 +02:00
LocalAI [bot]
b6a68e5df4 chore: ⬆️ Update leejet/stable-diffusion.cpp to a564fdf642780d1df123f1c413b19961375b8346 (#9383)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:11:55 +02:00
LocalAI [bot]
c6dfb4acaf chore: ⬆️ Update ikawrakow/ik_llama.cpp to eaf83865a132f66e8f49efe0e78491625942f068 (#9382)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-17 08:11:41 +02:00
LocalAI [bot]
ec5935421c chore(model-gallery): ⬆️ update checksum (#9384)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 22:41:52 +02:00
Ettore Di Giacinto
a0cbc46be9 refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer (#9380)
Drop the 295-line vendor/llama.py fork in favor of `tinygrad.apps.llm`,
which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8
quantization), KV-cache and generate loop we were maintaining ourselves.

What changed:
- New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict
  keymap, Transformer kwargs builder, `_embed_hidden` helper, and a hard
  rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the
  apps.llm Transformer ties `bias=False` on Q/K/V projections).
- backend.py routes both safetensors and GGUF paths through
  apps.llm.Transformer. Generation now delegates to its (greedy-only)
  `generate()`; Temperature / TopK / TopP / RepetitionPenalty are still
  accepted on the wire but ignored — documented in the module docstring.
- Jinja chat render now passes `enable_thinking=False` so Qwen3's
  reasoning preamble doesn't eat the tool-call token budget on small
  models.
- Embedding path uses `_embed_hidden` (block stack + output_norm) rather
  than the custom `embed()` method we were carrying on the vendored
  Transformer.
- test.py gains TestAppsLLMAdapter covering the keymap rename, tied
  embedding fallback, unknown-key skipping, and qkv_bias rejection.
- Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B
  (apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the
  HF chat template emits hermes-style JSON tool calls).

Verified with the docker-backed targets:
  test-extra-backend-tinygrad             5/5 PASS
  test-extra-backend-tinygrad-embeddings  3/3 PASS
  test-extra-backend-tinygrad-whisper     4/4 PASS
  test-extra-backend-tinygrad-sd          3/3 PASS
2026-04-16 22:41:18 +02:00
Ettore Di Giacinto
b4e30692a2 feat(backends): add sglang (#9359)
* feat(backends): add sglang

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job

sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally;
-march=native fails on CI runners without AVX-512 in /proc/cpuinfo.
Force -march=sapphirerapids so the build always succeeds, matching
sglang upstream's docker/xeon.Dockerfile recipe.

The resulting binary still requires an AVX-512 capable CPU at runtime,
so disable tests-sglang-grpc in test-extra.yml for the same reason
tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang
still work on hosts with the right SIMD baseline.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512

CXXFLAGS with -march=sapphirerapids was being overridden by
add_compile_options(-march=native) in sglang's CPU CMakeLists.txt,
since CMake appends those flags after CXXFLAGS. Sed-patch the
CMakeLists.txt directly after cloning to replace -march=native.

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 22:40:56 +02:00
Ettore Di Giacinto
61d34ccb11 fix(ui): show also concrete backends in the backend list
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 17:44:25 +00:00
LocalAI [bot]
7f88a3ba30 chore: ⬆️ Update leejet/stable-diffusion.cpp to c41c5ded7af85e01b7fe442ff7950c720706d53a (#9366)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 09:04:33 +02:00
Matt Van Horn
c4f309388e fix(gallery): correct gemma-4 model URIs returning 404 (#9379)
The gemma-4-26b-a4b-it, gemma-4-e2b-it, and gemma-4-e4b-it gallery
entries pointed at files that do not exist on HuggingFace, so LocalAI
fails with 404 when users try to install them.

Two issues per entry:
- mmproj filename uses the 'f16' quantization suffix, but ggml-org
  publishes the mmproj projectors as 'bf16'.
- The e2b and e4b URIs hardcode lowercase 'e2b'/'e4b' in the filename
  component. HuggingFace file paths are case-sensitive and the real
  files use uppercase 'E2B'/'E4B'.

Updated filename, uri, sha256, and the top-level 'mmproj' and
'parameters.model' references so every entry points at a real file
and the declared hashes match the content.

Verified each URI resolves (HTTP 302) and each sha256 matches the
'x-linked-etag' header returned by HuggingFace.

Signed-off-by: Matt Van Horn <mvanhorn@gmail.com>
2026-04-16 08:51:20 +02:00
dependabot[bot]
ab326a9c61 chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates (#9373)
Bumps the npm_and_yarn group with 6 updates in the /core/http/react-ui directory:

| Package | From | To |
| --- | --- | --- |
| [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) | `6.4.1` | `6.4.2` |
| [@hono/node-server](https://github.com/honojs/node-server) | `1.19.11` | `1.19.14` |
| [flatted](https://github.com/WebReflection/flatted) | `3.3.4` | `3.4.2` |
| [hono](https://github.com/honojs/hono) | `4.12.7` | `4.12.14` |
| [path-to-regexp](https://github.com/pillarjs/path-to-regexp) | `8.3.0` | `8.4.2` |
| [picomatch](https://github.com/micromatch/picomatch) | `4.0.3` | `4.0.4` |



Updates `vite` from 6.4.1 to 6.4.2
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite)

Updates `@hono/node-server` from 1.19.11 to 1.19.14
- [Release notes](https://github.com/honojs/node-server/releases)
- [Commits](https://github.com/honojs/node-server/compare/v1.19.11...v1.19.14)

Updates `flatted` from 3.3.4 to 3.4.2
- [Commits](https://github.com/WebReflection/flatted/compare/v3.3.4...v3.4.2)

Updates `hono` from 4.12.7 to 4.12.14
- [Release notes](https://github.com/honojs/hono/releases)
- [Commits](https://github.com/honojs/hono/compare/v4.12.7...v4.12.14)

Updates `path-to-regexp` from 8.3.0 to 8.4.2
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](https://github.com/pillarjs/path-to-regexp/compare/v8.3.0...v8.4.2)

Updates `picomatch` from 4.0.3 to 4.0.4
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/4.0.3...4.0.4)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.4.2
  dependency-type: direct:development
  dependency-group: npm_and_yarn
- dependency-name: "@hono/node-server"
  dependency-version: 1.19.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: flatted
  dependency-version: 3.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: hono
  dependency-version: 4.12.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: path-to-regexp
  dependency-version: 8.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 4.0.4
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-16 08:23:03 +02:00
LocalAI [bot]
df2d25cee5 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 1163af96cf6bb4a4b819f998f84c153a49768b99 (#9368)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:13:08 +02:00
LocalAI [bot]
96cd561d9d chore: ⬆️ Update ggml-org/llama.cpp to b3d758750a268bf93f084ccfa3060fb9a203192a (#9370)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:12:39 +02:00
LocalAI [bot]
08445b1b89 chore(model-gallery): ⬆️ update checksum (#9369)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-16 01:12:01 +02:00
Ettore Di Giacinto
ad3c8c4832 fix(agents): handle embedding model dim changes on collection upload (#9365)
Bumps LocalAGI to pick up the LocalRecall postgres backend fix that
resizes the pgvector column when the configured embedding model
returns vectors of a different dimensionality than the existing
collection. Switching the agent pool's embedding model now triggers
a transparent re-embed at startup instead of failing every subsequent
upload with 'expected N dimensions, not M' (SQLSTATE 22000).

Also surfaces a 409 with an actionable message in
UploadToCollectionEndpoint as a safety net for the rare cases the
upstream migration path doesn't cover (e.g. a model swapped at
runtime), instead of the previous opaque 500.
2026-04-15 20:05:28 +02:00
Ettore Di Giacinto
6f0051301b feat(backend): add tinygrad multimodal backend (experimental) (#9364)
* feat(backend): add tinygrad multimodal backend

Wire tinygrad as a new Python backend covering LLM text generation with
native tool-call extraction, embeddings, Stable Diffusion 1.x image
generation, and Whisper speech-to-text from a single self-contained
container.

Backend (`backend/python/tinygrad/`):
- `backend.py` gRPC servicer with LLM Predict/PredictStream (auto-detects
  Llama / Qwen2 / Mistral architecture from `config.json`, supports
  safetensors and GGUF), Embedding via mean-pooled last hidden state,
  GenerateImage via the vendored SD1.x pipeline, AudioTranscription +
  AudioTranscriptionStream via the vendored Whisper inference loop, plus
  Tokenize / ModelMetadata / Status / Free.
- Vendored upstream model code under `vendor/` (MIT, headers preserved):
  llama.py with an added `qkv_bias` flag for Qwen2-family bias support
  and an `embed()` method that returns the last hidden state, plus
  clip.py, unet.py, stable_diffusion.py (trimmed to drop the MLPerf
  training branch that pulls `mlperf.initializers`), audio_helpers.py
  and whisper.py (trimmed to drop the pyaudio listener).
- Pluggable tool-call parsers under `tool_parsers/`: hermes (Qwen2.5 /
  Hermes), llama3_json (Llama 3.1+), qwen3_xml (Qwen 3), mistral
  (Mistral / Mixtral). Auto-selected from model architecture or `Options`.
- `install.sh` pins Python 3.11.14 (tinygrad >=0.12 needs >=3.11; the
  default portable python is 3.10).
- `package.sh` bundles libLLVM.so.1 + libedit/libtinfo/libgomp/libsndfile
  into the scratch image. `run.sh` sets `CPU_LLVM=1` and `LLVM_PATH` so
  tinygrad's CPU device uses the in-process libLLVM JIT instead of
  shelling out to the missing `clang` binary.
- Local unit tests for Health and the four parsers in `test.py`.

Build wiring:
- Root `Makefile`: `.NOTPARALLEL`, `prepare-test-extra`, `test-extra`,
  `BACKEND_TINYGRAD = tinygrad|python|.|false|true`,
  docker-build-target eval, and `docker-build-backends` aggregator.
- `.github/workflows/backend.yml`: cpu / cuda12 / cuda13 build matrix
  entries (mirrors the transformers backend placement).
- `backend/index.yaml`: `&tinygrad` meta + cpu/cuda12/cuda13 image
  entries (latest + development).

E2E test wiring:
- `tests/e2e-backends/backend_test.go` gains an `image` capability that
  exercises GenerateImage and asserts a non-empty PNG is written to
  `dst`. New `BACKEND_TEST_IMAGE_PROMPT` / `BACKEND_TEST_IMAGE_STEPS`
  knobs.
- Five new make targets next to `test-extra-backend-vllm`:
  - `test-extra-backend-tinygrad` — Qwen2.5-0.5B-Instruct + hermes,
    mirrors the vllm target 1:1 (5/9 specs in ~57s).
  - `test-extra-backend-tinygrad-embeddings` — same model, embeddings
    via LLM hidden state (3/9 in ~10s).
  - `test-extra-backend-tinygrad-sd` — stable-diffusion-v1-5 mirror,
    health/load/image (3/9 in ~10min, 4 diffusion steps on CPU).
  - `test-extra-backend-tinygrad-whisper` — openai/whisper-tiny.en
    against jfk.wav from whisper.cpp samples (4/9 in ~49s).
  - `test-extra-backend-tinygrad-all` aggregate.

All four targets land green on the first MVP pass: 15 specs total, 0
failures across LLM+tools, embeddings, image generation, and speech
transcription.

* refactor(tinygrad): collapse to a single backend image

tinygrad generates its own GPU kernels (PTX renderer for CUDA, the
autogen ctypes wrappers for HIP / Metal / WebGPU) and never links
against cuDNN, cuBLAS, or any toolkit-version-tied library. The only
runtime dependency that varies across hosts is the driver's libcuda.so.1
/ libamdhip64.so, which are injected into the container at run time by
the nvidia-container / rocm runtimes. So unlike torch- or vLLM-based
backends, there is no reason to ship per-CUDA-version images.

- Drop the cuda12-tinygrad and cuda13-tinygrad build-matrix entries
  from .github/workflows/backend.yml. The sole remaining entry is
  renamed to -tinygrad (from -cpu-tinygrad) since it is no longer
  CPU-only.
- Collapse backend/index.yaml to a single meta + development pair.
  The meta anchor carries the latest uri directly; the development
  entry points at the master tag.
- run.sh picks the tinygrad device at launch time by probing
  /usr/lib/... for libcuda.so.1 / libamdhip64.so. When libcuda is
  visible we set CUDA=1 + CUDA_PTX=1 so tinygrad uses its own PTX
  renderer (avoids any nvrtc/toolkit dependency); otherwise we fall
  back to HIP or CLANG. CPU_LLVM=1 + LLVM_PATH keep the in-process
  libLLVM JIT for the CLANG path.
- backend.py's _select_tinygrad_device() is trimmed to a CLANG-only
  fallback since production device selection happens in run.sh.

Re-ran test-extra-backend-tinygrad after the change:
  Ran 5 of 9 Specs in 56.541 seconds — 5 Passed, 0 Failed
2026-04-15 19:48:23 +02:00
LocalAI [bot]
8487058673 chore(model-gallery): ⬆️ update checksum (#9358)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:59 +02:00
LocalAI [bot]
62862ca06b chore: ⬆️ Update ggml-org/llama.cpp to fae3a28070fe4026f87bd6a544aba1b2d1896566 (#9357)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:41 +02:00
LocalAI [bot]
07e244d869 feat(swagger): update swagger (#9356)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-15 01:25:24 +02:00
Ettore Di Giacinto
95efb8a562 feat(backend): add turboquant llama.cpp-fork backend (#9355)
* feat(backend): add turboquant llama.cpp-fork backend

turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch
feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme.
It ships as a first-class backend reusing backend/cpp/llama-cpp sources
via a thin wrapper Makefile: each variant target copies ../llama-cpp
into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server
with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No
duplication of grpc-server.cpp — upstream fixes flow through automatically.

Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL
f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml,
adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0
to exercise the KV-cache config path (backend_test.go gains dedicated env
vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement
usable by any llama.cpp-family backend), and registers a nightly auto-bump
PR in bump_deps.yaml tracking feature/turboquant-kv-cache.

scripts/changed-backends.js gets a special-case so edits to
backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since
the wrapper reuses those sources.

* feat(turboquant): carry upstream patches against fork API drift

turboquant branched from llama.cpp before upstream commit 66060008
("server: respect the ignore eos flag", #21203) which added the
`logit_bias_eog` field to `server_context_meta` and a matching
parameter to `server_task::params_from_json_cmpl`. The shared
backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so
building it against the fork unmodified fails.

Cherry-pick that commit as a patch file under
backend/cpp/turboquant/patches/ and apply it to the cloned fork
sources via a new apply-patches.sh hook called from the wrapper
Makefile. Simplifies the build flow too: instead of hopping through
llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now
drives the copied Makefile directly (clone -> patch -> build).

Drop the corresponding patch whenever the fork catches up with
upstream — the build fails fast if a patch stops applying, which
is the signal to retire it.

* docs: add turboquant backend section + clarify cache_type_k/v

Document the new turboquant (llama.cpp fork with TurboQuant KV-cache)
backend alongside the existing llama-cpp / ik-llama-cpp sections in
features/text-generation.md: when to pick it, how to install it from
the gallery, and a YAML example showing backend: turboquant together
with cache_type_k / cache_type_v.

Also expand the cache_type_k / cache_type_v table rows in
advanced/model-configuration.md to spell out the accepted llama.cpp
quantization values and note that these fields apply to all
llama.cpp-family backends, not just vLLM.

* feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion

The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but
ggml/include/ggml-rpc.h static-asserts it equals 96, breaking
the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server).
Carry a one-line patch that updates the expected count so the
assertion holds. Drop this patch whenever the fork fixes it upstream.

* feat(turboquant): allow turbo* KV-cache types and exercise them in e2e

The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own
allow-list of accepted KV-cache types (kv_cache_types[]) and rejects
anything outside it before the value reaches llama.cpp's parser. That
list only contains the standard llama.cpp types — turbo2/turbo3/turbo4
would throw "Unsupported cache type" at LoadModel time, meaning
nothing the LocalAI gRPC layer accepted was actually fork-specific.

Add a build-time augmentation step (patch-grpc-server.sh, called from
the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0
into the allow-list of the *copied* grpc-server.cpp under
turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/
is never touched, so the stock llama-cpp build keeps compiling against
vanilla upstream which has no notion of those enum values.

Switch test-extra-backend-turboquant to set
BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite
actually runs the fork's TurboQuant KV-cache code paths (turbo3 also
auto-enables flash_attention in the fork). Picking q8_0 here would
only re-test the standard llama.cpp path that the upstream llama-cpp
backend already covers.

Refresh the docs (text-generation.md + model-configuration.md) to
list turbo2/turbo3/turbo4 explicitly and call out that you only get
the TurboQuant code path with this backend + a turbo* cache type.

* fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3

The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant)
does not install python3, so the python-based augmentation step
errored with `python3: command not found` at make time. Switch to
awk, which ships in coreutils and is already available everywhere
the rest of the wrapper Makefile runs.

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-15 01:25:04 +02:00
Ettore Di Giacinto
410d100cc3 chore(ui): improve visibility of forms, color palette
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 21:53:03 +00:00
Ettore Di Giacinto
833b7e8557 chore(docs): update transcription endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 14:14:54 +00:00
Ettore Di Giacinto
87e6de1989 feat: wire transcription for llama.cpp, add streaming support (#9353)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 16:13:40 +02:00
Ettore Di Giacinto
b361d2ddd6 chore(gallery): add new llama.cpp supported models (qwen-asr, ocr)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 10:04:50 +00:00
Ettore Di Giacinto
1e4c4577bb fix(ci): small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 09:27:27 +00:00
dependabot[bot]
98fd9d5cc6 chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 (#9340)
Bumps [github.com/charmbracelet/glamour](https://github.com/charmbracelet/glamour) from 0.10.0 to 1.0.0.
- [Release notes](https://github.com/charmbracelet/glamour/releases)
- [Commits](https://github.com/charmbracelet/glamour/compare/v0.10.0...v1.0.0)

---
updated-dependencies:
- dependency-name: github.com/charmbracelet/glamour
  dependency-version: 1.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:17:05 +02:00
dependabot[bot]
0c725f5702 chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 (#9344)
Bumps [github.com/swaggo/echo-swagger](https://github.com/swaggo/echo-swagger) from 1.4.1 to 1.5.2.
- [Release notes](https://github.com/swaggo/echo-swagger/releases)
- [Commits](https://github.com/swaggo/echo-swagger/compare/v1.4.1...v1.5.2)

---
updated-dependencies:
- dependency-name: github.com/swaggo/echo-swagger
  dependency-version: 1.5.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:37 +02:00
dependabot[bot]
7661a4ffa5 chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 (#9341)
chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats

Bumps [github.com/testcontainers/testcontainers-go/modules/nats](https://github.com/testcontainers/testcontainers-go) from 0.41.0 to 0.42.0.
- [Release notes](https://github.com/testcontainers/testcontainers-go/releases)
- [Commits](https://github.com/testcontainers/testcontainers-go/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: github.com/testcontainers/testcontainers-go/modules/nats
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:26 +02:00
dependabot[bot]
24ad6e4be1 chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 (#9343)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.3 to 0.21.5.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.21.3...v0.21.5)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 11:15:09 +02:00
LocalAI [bot]
c0648b8836 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 55d3c05bf7b377deaa5dc84d255d9740a345a206 (#9348)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-14 08:56:25 +02:00
Ettore Di Giacinto
a05c7def59 fix(e2e): update to new testcontainers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-14 06:56:04 +00:00
LocalAI [bot]
906acba8db chore: ⬆️ Update ggml-org/llama.cpp to e97492369888f5311e4d1f3beb325a36bbed70e9 (#9347)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-14 08:54:25 +02:00
dependabot[bot]
4226ca4aee chore(deps): bump sentence-transformers from 5.2.3 to 5.4.0 in /backend/python/transformers (#9342)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.3 to 5.4.0.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.3...v5.4.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-14 00:30:27 +02:00
LocalAI [bot]
c6d5dc3374 chore(model-gallery): ⬆️ update checksum (#9346)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-13 23:00:13 +02:00
Ettore Di Giacinto
7ce675af21 chore(gallery-agent): extract readme
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-13 20:31:49 +00:00
Ettore Di Giacinto
be1b8d56c9 fix(gallery): override parameters for flux kontext
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-13 22:29:17 +02:00
dependabot[bot]
97f087ed31 chore(deps): bump github.com/testcontainers/testcontainers-go from 0.41.0 to 0.42.0 (#9338)
chore(deps): bump github.com/testcontainers/testcontainers-go

Bumps [github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go) from 0.41.0 to 0.42.0.
- [Release notes](https://github.com/testcontainers/testcontainers-go/releases)
- [Commits](https://github.com/testcontainers/testcontainers-go/compare/v0.41.0...v0.42.0)

---
updated-dependencies:
- dependency-name: github.com/testcontainers/testcontainers-go
  dependency-version: 0.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 21:54:02 +02:00
dependabot[bot]
8691bbe663 chore(deps): bump actions/upload-pages-artifact from 4 to 5 (#9337)
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 4 to 5.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](https://github.com/actions/upload-pages-artifact/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/upload-pages-artifact
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 21:53:47 +02:00
dependabot[bot]
7998f96f11 chore(deps): bump softprops/action-gh-release from 2 to 3 (#9336)
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2 to 3.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](https://github.com/softprops/action-gh-release/compare/v2...v3)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-version: '3'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-13 21:53:28 +02:00
Ettore Di Giacinto
cada97ee46 chore(gallery-agent): control bot via PR
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-13 19:52:48 +00:00
Ettore Di Giacinto
3375ea1a2c chore(gallery-agent): simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-13 19:50:31 +00:00
Ettore Di Giacinto
0e7c0adee4 docs: document tool calling on vLLM and MLX backends
openai-functions.md used to claim LocalAI tool calling worked only on
llama.cpp-compatible models. That was true when it was written; it's
not true now — vLLM (since PR #9328) and MLX/MLX-VLM both extract
structured tool calls from model output.

- openai-functions.md: new 'Supported backends' matrix covering
  llama.cpp, vllm, vllm-omni, mlx, mlx-vlm, with the key distinction
  that vllm needs an explicit tool_parser: option while mlx auto-
  detects from the chat template. Reasoning content (think tags) is
  extracted on the same set of backends. Added setup snippets for
  both the vllm and mlx paths, and noted the gallery importer
  pre-fills tool_parser:/reasoning_parser: for known families.
- compatibility-table.md: fix the stale 'Streaming: no' for vllm,
  vllm-omni, mlx, mlx-vlm (all four support streaming now). Add
  'Functions' to their capabilities. Also widen the MLX Acceleration
  column to reflect the CPU/CUDA/Jetson L4T backends that already
  exist in backend/index.yaml — 'Metal' on its own was misleading.
2026-04-13 16:58:55 +00:00
Ettore Di Giacinto
016da02845 feat: refactor shared helpers and enhance MLX backend functionality (#9335)
* refactor(backends): extract python_utils + add mlx_utils shared helpers

Move parse_options() and messages_to_dicts() out of vllm_utils.py into a
new framework-agnostic python_utils.py, and re-export them from vllm_utils
so existing vllm / vllm-omni imports keep working.

Add mlx_utils.py with split_reasoning() and parse_tool_calls() — ported
from mlx_vlm/server.py's process_tool_calls. These work with any
mlx-lm / mlx-vlm tool module (anything exposing tool_call_start,
tool_call_end, parse_tool_call). Used by the mlx and mlx-vlm backends in
later commits to emit structured ChatDelta.tool_calls without
reimplementing per-model parsing.

Shared smoke tests confirm:
- parse_options round-trips bool/int/float/string
- vllm_utils re-exports are identity-equal to python_utils originals
- mlx_utils parse_tool_calls handles <tool_call>...</tool_call> with a
  shim module and produces a correctly-indexed list with JSON arguments
- mlx_utils split_reasoning extracts <think> blocks and leaves clean
  content

* feat(mlx): wire native tool parsers + ChatDelta + token usage + logprobs

Bring the MLX backend up to the same structured-output contract as vLLM
and llama.cpp: emit Reply.chat_deltas so the OpenAI HTTP layer sees
tool_calls and reasoning_content, not just raw text.

Key insight: mlx_lm.load() returns a TokenizerWrapper that already auto-
detects the right tool parser from the model's chat template
(_infer_tool_parser in mlx_lm/tokenizer_utils.py). The wrapper exposes
has_tool_calling, has_thinking, tool_parser, tool_call_start,
tool_call_end, think_start, think_end — no user configuration needed,
unlike vLLM.

Changes in backend/python/mlx/backend.py:

- Imports: replace inline parse_options / messages_to_dicts with the
  shared helpers from python_utils. Pull split_reasoning / parse_tool_calls
  from the new mlx_utils shared module.
- LoadModel: log the auto-detected has_tool_calling / has_thinking /
  tool_parser_type for observability. Drop the local is_float / is_int
  duplicates.
- _prepare_prompt: run request.Messages through messages_to_dicts so
  tool_call_id / tool_calls / reasoning_content survive the conversion,
  and pass tools=json.loads(request.Tools) + enable_thinking=True (when
  request.Metadata says so) to apply_chat_template. Falls back on
  TypeError for tokenizers whose template doesn't accept those kwargs.
- _build_generation_params: return an additional (logits_params,
  stop_words) pair. Maps RepetitionPenalty / PresencePenalty /
  FrequencyPenalty to mlx_lm.sample_utils.make_logits_processors and
  threads StopPrompts through to post-decode truncation.
- New _tool_module_from_tokenizer / _finalize_output / _truncate_at_stop
  helpers. _finalize_output runs split_reasoning when has_thinking is
  true and parse_tool_calls (using a SimpleNamespace shim around the
  wrapper's tool_parser callable) when has_tool_calling is true, then
  extracts prompt_tokens, generation_tokens and (best-effort) logprobs
  from the last GenerationResponse chunk.
- Predict: use make_logits_processors, accumulate text + last_response,
  finalize into a structured Reply carrying chat_deltas,
  prompt_tokens, tokens, logprobs. Early-stops on user stop sequences.
- PredictStream: per-chunk Reply still carries raw message bytes for
  back-compat but now also emits chat_deltas=[ChatDelta(content=delta)].
  On loop exit, emit a terminal Reply with structured
  reasoning_content / tool_calls / token counts / logprobs — so the Go
  side sees tool calls without needing the regex fallback.
- TokenizeString RPC: uses the TokenizerWrapper's encode(); returns
  length + tokens or FAILED_PRECONDITION if the model isn't loaded.
- Free RPC: drops model / tokenizer / lru_cache, runs gc.collect(),
  calls mx.metal.clear_cache() when available, and best-effort clears
  torch.cuda as a belt-and-suspenders.

* feat(mlx-vlm): mirror MLX parity (tool parsers + ChatDelta + samplers)

Same treatment as the MLX backend: emit structured Reply.chat_deltas,
tool_calls, reasoning_content, token counts and logprobs, and extend
sampling parameter coverage beyond the temp/top_p pair the backend
used to handle.

- Imports: drop the inline is_float/is_int helpers, pull parse_options /
  messages_to_dicts from python_utils and split_reasoning /
  parse_tool_calls from mlx_utils. Also import make_sampler and
  make_logits_processors from mlx_lm.sample_utils — mlx-vlm re-uses them.
- LoadModel: use parse_options; call mlx_vlm.tool_parsers._infer_tool_parser
  / load_tool_module to auto-detect a tool module from the processor's
  chat_template. Stash think_start / think_end / has_thinking so later
  finalisation can split reasoning blocks without duck-typing on each
  call. Logs the detected parser type.
- _prepare_prompt: convert proto Messages via messages_to_dicts (so
  tool_call_id / tool_calls survive), pass tools=json.loads(request.Tools)
  and enable_thinking=True to apply_chat_template when present, fall
  back on TypeError for older mlx-vlm versions. Also handle the
  prompt-only + media and empty-prompt + media paths consistently.
- _build_generation_params: return (max_tokens, sampler_params,
  logits_params, stop_words). Maps repetition_penalty / presence_penalty /
  frequency_penalty and passes them through make_logits_processors.
- _finalize_output / _truncate_at_stop: common helper used by Predict
  and PredictStream to split reasoning, run parse_tool_calls against the
  auto-detected tool module, build ToolCallDelta list, and extract token
  counts + logprobs from the last GenerationResult.
- Predict / PredictStream: switch from mlx_vlm.generate to mlx_vlm.stream_generate
  in both paths, accumulate text + last_response, pass sampler and
  logits_processors through, emit content-only ChatDelta per streaming
  chunk followed by a terminal Reply carrying reasoning_content,
  tool_calls, prompt_tokens, tokens and logprobs. Non-streaming Predict
  returns the same structured Reply shape.
- New helper _collect_media extracted from the duplicated base64 image /
  audio decode loop.
- New TokenizeString RPC using the processor's tokenizer.encode and
  Free RPC that drops model/processor/config, runs gc + Metal cache
  clear + best-effort torch.cuda cache clear.

* feat(importer/mlx): auto-set tool_parser/reasoning_parser on import

Mirror what core/gallery/importers/vllm.go does: after applying the
shared inference defaults, look up the model URI in parser_defaults.json
and append matching tool_parser:/reasoning_parser: entries to Options.

The MLX backends auto-detect tool parsers from the chat template at
runtime so they don't actually consume these options — but surfacing
them in the generated YAML:
  - keeps the import experience consistent with vllm
  - gives users a single visible place to override
  - documents the intended parser for a given model family

* test(mlx): add helper unit tests + TokenizeString/Free + e2e make targets

- backend/python/mlx/test.py: add TestSharedHelpers with server-less
  unit tests for parse_options, messages_to_dicts, split_reasoning and
  parse_tool_calls (using a SimpleNamespace shim to fake a tool module
  without requiring a model). Plus test_tokenize_string and test_free
  RPC tests that load a tiny MLX-quantized Llama and exercise the new
  RPCs end-to-end.

- backend/python/mlx-vlm/test.py: same helper unit tests + cleanup of
  the duplicated import block at the top of the file.

- Makefile: register BACKEND_MLX and BACKEND_MLX_VLM (they were missing
  from the docker-build-target eval list — only mlx-distributed had a
  generated target before). Add test-extra-backend-mlx and
  test-extra-backend-mlx-vlm convenience targets that build the
  respective image and run tests/e2e-backends with the tools capability
  against mlx-community/Qwen2.5-0.5B-Instruct-4bit. The MLX backend
  auto-detects the tool parser from the chat template so no
  BACKEND_TEST_OPTIONS is needed (unlike vllm).

* fix(libbackend): don't pass --copies to venv unless PORTABLE_PYTHON=true

backend/python/common/libbackend.sh:ensureVenv() always invoked
'python -m venv --copies', but macOS system python (and some other
builds) refuses with:

    Error: This build of python cannot create venvs without using symlinks

--copies only matters when _makeVenvPortable later relocates the venv,
which only happens when PORTABLE_PYTHON=true. Make --copies conditional
on that flag and fall back to default (symlinked) venv otherwise.

Caught while bringing up the mlx backend on Apple Silicon — the same
build path is used by every Python backend with USE_PIP=true.

* fix(mlx): support mlx-lm 0.29.x tool calling + drop deprecated clear_cache

The released mlx-lm 0.29.x ships a much simpler tool-calling API than
HEAD: TokenizerWrapper detects the <tool_call>...</tool_call> markers
from the tokenizer vocab and exposes has_tool_calling /
tool_call_start / tool_call_end, but does NOT expose a tool_parser
callable on the wrapper and does NOT ship a mlx_lm.tool_parsers
subpackage at all (those only exist on main).

Caught while running the smoke test on Apple Silicon with the
released mlx-lm 0.29.1: tokenizer.tool_parser raised AttributeError
(falling through to the underlying HF tokenizer), so
_tool_module_from_tokenizer always returned None and tool calls slipped
through as raw <tool_call>...</tool_call> text in Reply.message instead
of being parsed into ChatDelta.tool_calls.

Fix: when has_tool_calling is True but tokenizer.tool_parser is missing,
default the parse_tool_call callable to json.loads(body.strip()) — that's
exactly what mlx_lm.tool_parsers.json_tools.parse_tool_call does on HEAD
and covers the only format 0.29 detects (<tool_call>JSON</tool_call>).
Future mlx-lm releases that ship more parsers will be picked up
automatically via the tokenizer.tool_parser attribute when present.

Also tighten the LoadModel logging — the old log line read
init_kwargs.get('tool_parser_type') which doesn't exist on 0.29 and
showed None even when has_tool_calling was True. Log the actual
tool_call_start / tool_call_end markers instead.

While here, switch Free()'s Metal cache clear from the deprecated
mx.metal.clear_cache to mx.clear_cache (mlx >= 0.30), with a
fallback for older releases. Mirrored to the mlx-vlm backend.

* feat(mlx-distributed): mirror MLX parity (tool calls + ChatDelta + sampler)

Same treatment as the mlx and mlx-vlm backends: emit Reply.chat_deltas
with structured tool_calls / reasoning_content / token counts /
logprobs, expand sampling parameter coverage beyond temp+top_p, and
add the missing TokenizeString and Free RPCs.

Notes specific to mlx-distributed:

- Rank 0 is the only rank that owns a sampler — workers participate in
  the pipeline-parallel forward pass via mx.distributed and don't
  re-implement sampling. So the new logits_params (repetition_penalty,
  presence_penalty, frequency_penalty) and stop_words apply on rank 0
  only; we don't need to extend coordinator.broadcast_generation_params,
  which still ships only max_tokens / temperature / top_p to workers
  (everything else is a rank-0 concern).
- Free() now broadcasts CMD_SHUTDOWN to workers when a coordinator is
  active, so they release the model on their end too. The constant is
  already defined and handled by the existing worker loop in
  backend.py:633 (CMD_SHUTDOWN = -1).
- Drop the locally-defined is_float / is_int / parse_options trio in
  favor of python_utils.parse_options, re-exported under the module
  name for back-compat with anything that imported it directly.
- _prepare_prompt: route through messages_to_dicts so tool_call_id /
  tool_calls / reasoning_content survive, pass tools=json.loads(
  request.Tools) and enable_thinking=True to apply_chat_template, fall
  back on TypeError for templates that don't accept those kwargs.
- New _tool_module_from_tokenizer (with the json.loads fallback for
  mlx-lm 0.29.x), _finalize_output, _truncate_at_stop helpers — same
  contract as the mlx backend.
- LoadModel logs the auto-detected has_tool_calling / has_thinking /
  tool_call_start / tool_call_end so users can see what the wrapper
  picked up for the loaded model.
- backend/python/mlx-distributed/test.py: add the same TestSharedHelpers
  unit tests (parse_options, messages_to_dicts, split_reasoning,
  parse_tool_calls) that exist for mlx and mlx-vlm.
2026-04-13 18:44:03 +02:00
Ettore Di Giacinto
daa0272f2e docs(agents): capture vllm backend lessons + runtime lib packaging (#9333)
New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:

- Use vLLM's native ToolParserManager / ReasoningParserManager — do
  not write regex-based parsers. Selection is explicit via Options[],
  defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
  base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
  won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
  Newer wheels declare torch==2.10.0+cpu which only exists on the
  PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
  symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
  silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
  '*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
  when adding fields to schema.Message.

Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract image, boot in fresh ubuntu container).

Index in AGENTS.md updated.
2026-04-13 11:09:57 +02:00
Ettore Di Giacinto
d67623230f feat(vllm): parity with llama.cpp backend (#9328)
* fix(schema): serialize ToolCallID and Reasoning in Messages.ToProto

The ToProto conversion was dropping tool_call_id and reasoning_content
even though both proto and Go fields existed, breaking multi-turn tool
calling and reasoning passthrough to backends.

* refactor(config): introduce backend hook system and migrate llama-cpp defaults

Adds RegisterBackendHook/runBackendHooks so each backend can register
default-filling functions that run during ModelConfig.SetDefaults().

Migrates the existing GGUF guessing logic into hooks_llamacpp.go,
registered for both 'llama-cpp' and the empty backend (auto-detect).
Removes the old guesser.go shim.

* feat(config): add vLLM parser defaults hook and importer auto-detection

Introduces parser_defaults.json mapping model families to vLLM
tool_parser/reasoning_parser names, with longest-pattern-first matching.

The vllmDefaults hook auto-fills tool_parser and reasoning_parser
options at load time for known families, while the VLLMImporter writes
the same values into generated YAML so users can review and edit them.

Adds tests covering MatchParserDefaults, hook registration via
SetDefaults, and the user-override behavior.

* feat(vllm): wire native tool/reasoning parsers + chat deltas + logprobs

- Use vLLM's ToolParserManager/ReasoningParserManager to extract structured
  output (tool calls, reasoning content) instead of reimplementing parsing
- Convert proto Messages to dicts and pass tools to apply_chat_template
- Emit ChatDelta with content/reasoning_content/tool_calls in Reply
- Extract prompt_tokens, completion_tokens, and logprobs from output
- Replace boolean GuidedDecoding with proper GuidedDecodingParams from Grammar
- Add TokenizeString and Free RPC methods
- Fix missing `time` import used by load_video()

* feat(vllm): CPU support + shared utils + vllm-omni feature parity

- Split vllm install per acceleration: move generic `vllm` out of
  requirements-after.txt into per-profile after files (cublas12, hipblas,
  intel) and add CPU wheel URL for cpu-after.txt
- requirements-cpu.txt now pulls torch==2.7.0+cpu from PyTorch CPU index
- backend/index.yaml: register cpu-vllm / cpu-vllm-development variants
- New backend/python/common/vllm_utils.py: shared parse_options,
  messages_to_dicts, setup_parsers helpers (used by both vllm backends)
- vllm-omni: replace hardcoded chat template with tokenizer.apply_chat_template,
  wire native parsers via shared utils, emit ChatDelta with token counts,
  add TokenizeString and Free RPCs, detect CPU and set VLLM_TARGET_DEVICE
- Add test_cpu_inference.py: standalone script to validate CPU build with
  a small model (Qwen2.5-0.5B-Instruct)

* fix(vllm): CPU build compatibility with vllm 0.14.1

Validated end-to-end on CPU with Qwen2.5-0.5B-Instruct (LoadModel, Predict,
TokenizeString, Free all working).

- requirements-cpu-after.txt: pin vllm to 0.14.1+cpu (pre-built wheel from
  GitHub releases) for x86_64 and aarch64. vllm 0.14.1 is the newest CPU
  wheel whose torch dependency resolves against published PyTorch builds
  (torch==2.9.1+cpu). Later vllm CPU wheels currently require
  torch==2.10.0+cpu which is only available on the PyTorch test channel
  with incompatible torchvision.
- requirements-cpu.txt: bump torch to 2.9.1+cpu, add torchvision/torchaudio
  so uv resolves them consistently from the PyTorch CPU index.
- install.sh: add --index-strategy=unsafe-best-match for CPU builds so uv
  can mix the PyTorch index and PyPI for transitive deps (matches the
  existing intel profile behaviour).
- backend.py LoadModel: vllm >= 0.14 removed AsyncLLMEngine.get_model_config
  so the old code path errored out with AttributeError on model load.
  Switch to the new get_tokenizer()/tokenizer accessor with a fallback
  to building the tokenizer directly from request.Model.

* fix(vllm): tool parser constructor compat + e2e tool calling test

Concrete vLLM tool parsers override the abstract base's __init__ and
drop the tools kwarg (e.g. Hermes2ProToolParser only takes tokenizer).
Instantiating with tools= raised TypeError which was silently caught,
leaving chat_deltas.tool_calls empty.

Retry the constructor without the tools kwarg on TypeError — tools
aren't required by these parsers since extract_tool_calls finds tool
syntax in the raw model output directly.

Validated with Qwen/Qwen2.5-0.5B-Instruct + hermes parser on CPU:
the backend correctly returns ToolCallDelta{name='get_weather',
arguments='{"location": "Paris, France"}'} in ChatDelta.

test_tool_calls.py is a standalone smoke test that spawns the gRPC
backend, sends a chat completion with tools, and asserts the response
contains a structured tool call.

* ci(backend): build cpu-vllm container image

Add the cpu-vllm variant to the backend container build matrix so the
image registered in backend/index.yaml (cpu-vllm / cpu-vllm-development)
is actually produced by CI.

Follows the same pattern as the other CPU python backends
(cpu-diffusers, cpu-chatterbox, etc.) with build-type='' and no CUDA.
backend_pr.yml auto-picks this up via its matrix filter from backend.yml.

* test(e2e-backends): add tools capability + HF model name support

Extends tests/e2e-backends to cover backends that:
- Resolve HuggingFace model ids natively (vllm, vllm-omni) instead of
  loading a local file: BACKEND_TEST_MODEL_NAME is passed verbatim as
  ModelOptions.Model with no download/ModelFile.
- Parse tool calls into ChatDelta.tool_calls: new "tools" capability
  sends a Predict with a get_weather function definition and asserts
  the Reply contains a matching ToolCallDelta. Uses UseTokenizerTemplate
  with OpenAI-style Messages so the backend can wire tools into the
  model's chat template.
- Need backend-specific Options[]: BACKEND_TEST_OPTIONS lets a test set
  e.g. "tool_parser:hermes,reasoning_parser:qwen3" at LoadModel time.

Adds make target test-extra-backend-vllm that:
- docker-build-vllm
- loads Qwen/Qwen2.5-0.5B-Instruct
- runs health,load,predict,stream,tools with tool_parser:hermes

Drops backend/python/vllm/test_{cpu_inference,tool_calls}.py — those
standalone scripts were scaffolding used while bringing up the Python
backend; the e2e-backends harness now covers the same ground uniformly
alongside llama-cpp and ik-llama-cpp.

* ci(test-extra): run vllm e2e tests on CPU

Adds tests-vllm-grpc to the test-extra workflow, mirroring the
llama-cpp and ik-llama-cpp gRPC jobs. Triggers when files under
backend/python/vllm/ change (or on run-all), builds the local-ai
vllm container image, and runs the tests/e2e-backends harness with
BACKEND_TEST_MODEL_NAME=Qwen/Qwen2.5-0.5B-Instruct, tool_parser:hermes,
and the tools capability enabled.

Uses ubuntu-latest (no GPU) — vllm runs on CPU via the cpu-vllm
wheel we pinned in requirements-cpu-after.txt. Frees disk space
before the build since the docker image + torch + vllm wheel is
sizeable.

* fix(vllm): build from source on CI to avoid SIGILL on prebuilt wheel

The prebuilt vllm 0.14.1+cpu wheel from GitHub releases is compiled with
SIMD instructions (AVX-512 VNNI/BF16 or AMX-BF16) that not every CPU
supports. GitHub Actions ubuntu-latest runners SIGILL when vllm spawns
the model_executor.models.registry subprocess for introspection, so
LoadModel never reaches the actual inference path.

- install.sh: when FROM_SOURCE=true on a CPU build, temporarily hide
  requirements-cpu-after.txt so installRequirements installs the base
  deps + torch CPU without pulling the prebuilt wheel, then clone vllm
  and compile it with VLLM_TARGET_DEVICE=cpu. The resulting binaries
  target the host's actual CPU.
- backend/Dockerfile.python: accept a FROM_SOURCE build-arg and expose
  it as an ENV so install.sh sees it during `make`.
- Makefile docker-build-backend: forward FROM_SOURCE as --build-arg
  when set, so backends that need source builds can opt in.
- Makefile test-extra-backend-vllm: call docker-build-vllm via a
  recursive $(MAKE) invocation so FROM_SOURCE flows through.
- .github/workflows/test-extra.yml: set FROM_SOURCE=true on the
  tests-vllm-grpc job. Slower but reliable — the prebuilt wheel only
  works on hosts that share the build-time SIMD baseline.

Answers 'did you test locally?': yes, end-to-end on my local machine
with the prebuilt wheel (CPU supports AVX-512 VNNI). The CI runner CPU
gap was not covered locally — this commit plugs that gap.

* ci(vllm): use bigger-runner instead of source build

The prebuilt vllm 0.14.1+cpu wheel requires SIMD instructions (AVX-512
VNNI/BF16) that stock ubuntu-latest GitHub runners don't support —
vllm.model_executor.models.registry SIGILLs on import during LoadModel.

Source compilation works but takes 30-40 minutes per CI run, which is
too slow for an e2e smoke test. Instead, switch tests-vllm-grpc to the
bigger-runner self-hosted label (already used by backend.yml for the
llama-cpp CUDA build) — that hardware has the required SIMD baseline
and the prebuilt wheel runs cleanly.

FROM_SOURCE=true is kept as an opt-in escape hatch:
- install.sh still has the CPU source-build path for hosts that need it
- backend/Dockerfile.python still declares the ARG + ENV
- Makefile docker-build-backend still forwards the build-arg when set
Default CI path uses the fast prebuilt wheel; source build can be
re-enabled by exporting FROM_SOURCE=true in the environment.

* ci(vllm): install make + build deps on bigger-runner

bigger-runner is a bare self-hosted runner used by backend.yml for
docker image builds — it has docker but not the usual ubuntu-latest
toolchain. The make-based test target needs make, build-essential
(cgo in 'go test'), and curl/unzip (the Makefile protoc target
downloads protoc from github releases).

protoc-gen-go and protoc-gen-go-grpc come via 'go install' in the
install-go-tools target, which setup-go makes possible.

* ci(vllm): install libnuma1 + libgomp1 on bigger-runner

The vllm 0.14.1+cpu wheel ships a _C C++ extension that dlopens
libnuma.so.1 at import time. When the runner host doesn't have it,
the extension silently fails to register its torch ops, so
EngineCore crashes on init_device with:

  AttributeError: '_OpNamespace' '_C_utils' object has no attribute
    'init_cpu_threads_env'

Also add libgomp1 (OpenMP runtime, used by torch CPU kernels) to be
safe on stripped-down runners.

* feat(vllm): bundle libnuma/libgomp via package.sh

The vllm CPU wheel ships a _C extension that dlopens libnuma.so.1 at
import time; torch's CPU kernels in turn use libgomp.so.1 (OpenMP).
Without these on the host, vllm._C silently fails to register its
torch ops and EngineCore crashes with:

  AttributeError: '_OpNamespace' '_C_utils' object has no attribute
    'init_cpu_threads_env'

Rather than asking every user to install libnuma1/libgomp1 on their
host (or every LocalAI base image to ship them), bundle them into
the backend image itself — same pattern fish-speech and the GPU libs
already use. libbackend.sh adds ${EDIR}/lib to LD_LIBRARY_PATH at
run time so the bundled copies are picked up automatically.

- backend/python/vllm/package.sh (new): copies libnuma.so.1 and
  libgomp.so.1 from the builder's multilib paths into ${BACKEND}/lib,
  preserving soname symlinks. Runs during Dockerfile.python's
  'Run backend-specific packaging' step (which already invokes
  package.sh if present).
- backend/Dockerfile.python: install libnuma1 + libgomp1 in the
  builder stage so package.sh has something to copy (the Ubuntu
  base image otherwise only has libgomp in the gcc dep chain).
- test-extra.yml: drop the workaround that installed these libs on
  the runner host — with the backend image self-contained, the
  runner no longer needs them, and the test now exercises the
  packaging path end-to-end the way a production host would.

* ci(vllm): disable tests-vllm-grpc job (heterogeneous runners)

Both ubuntu-latest and bigger-runner have inconsistent CPU baselines:
some instances support the AVX-512 VNNI/BF16 instructions the prebuilt
vllm 0.14.1+cpu wheel was compiled with, others SIGILL on import of
vllm.model_executor.models.registry. The libnuma packaging fix doesn't
help when the wheel itself can't be loaded.

FROM_SOURCE=true compiles vllm against the actual host CPU and works
everywhere, but takes 30-50 minutes per run — too slow for a smoke
test on every PR.

Comment out the job for now. The test itself is intact and passes
locally; run it via 'make test-extra-backend-vllm' on a host with the
required SIMD baseline. Re-enable when:
  - we have a self-hosted runner label with guaranteed AVX-512 VNNI/BF16, or
  - vllm publishes a CPU wheel with a wider baseline, or
  - we set up a docker layer cache that makes FROM_SOURCE acceptable

The detect-changes vllm output, the test harness changes (tests/
e2e-backends + tools cap), the make target (test-extra-backend-vllm),
the package.sh and the Dockerfile/install.sh plumbing all stay in
place.
2026-04-13 11:00:29 +02:00
LocalAI [bot]
0f90d17aac feat(swagger): update swagger (#9329)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-13 09:42:36 +02:00
LocalAI [bot]
ea32b8953f chore: ⬆️ Update ggml-org/llama.cpp to 1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be (#9330)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-13 09:42:18 +02:00
Ettore Di Giacinto
bc7578bdb1 fix(hipblas): pin down rocm6.4 wheels on whisperx (7.x not supported)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-12 15:27:51 +00:00
Ettore Di Giacinto
9ca03cf9cc feat(backends): add ik-llama-cpp (#9326)
* feat(backends): add ik-llama-cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add grpc e2e suite, hook to CI, update README

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-12 13:51:28 +02:00
Ettore Di Giacinto
151ad271f2 feat(rocm): bump to 7.x (#9323)
feat(rocm): bump to 7.2.1

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-12 08:51:30 +02:00
Ettore Di Giacinto
2865f0f8d3 feat(ux): backend management enhancement (#9325)
* feat: add PreferDevelopmentBackends setting, expose isMeta/isDevelopment in API

- Add PreferDevelopmentBackends config field, CLI flag, runtime setting
- Add IsDevelopment() method to GalleryBackend
- Use AvailableBackendsUnfiltered in UI API to show all backends
- Expose isMeta, isDevelopment, preferDevelopmentBackends in backend API response

* feat: upgrade banner with Upgrade All button, detect pre-existing backends

- Add upgrade banner on Backends page showing count and Upgrade All button
- Fix upgrade detection for backends installed before version tracking:
  flag as upgradeable when gallery has a version but installed has none
- Fix OCI digest check to flag backends with no stored digest as upgradeable
2026-04-12 00:35:22 +02:00
LocalAI [bot]
6fbda277c5 chore: ⬆️ Update ggml-org/llama.cpp to ff5ef8278615a2462b79b50abdf3cc95cfb31c6f (#9319)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 23:15:23 +02:00
Ettore Di Giacinto
7a0e6ae6d2 feat(qwen3tts.cpp): add new backend (#9316)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-11 23:14:26 +02:00
LocalAI [bot]
e4bfc42a2d chore: ⬆️ Update leejet/stable-diffusion.cpp to 6b675a5ede9b0edf0a0f44191e8b79d7ef27615a (#9320)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 23:07:30 +02:00
LocalAI [bot]
7edd3ea96f chore(model-gallery): ⬆️ update checksum (#9321)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 22:53:48 +02:00
LocalAI [bot]
b20a2f1cea feat(swagger): update swagger (#9318)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 22:31:36 +02:00
Ettore Di Giacinto
8ab0744458 feat: backend versioning, upgrade detection and auto-upgrade (#9315)
* feat: add backend versioning data model foundation

Add Version, URI, and Digest fields to BackendMetadata for tracking
installed backend versions and enabling upgrade detection. Add Version
field to GalleryBackend. Add UpgradeAvailable/AvailableVersion fields
to SystemBackend. Implement GetImageDigest() for lightweight OCI digest
lookups via remote.Head. Record version, URI, and digest at install time
in InstallBackend() and propagate version through meta backends.

* feat: add backend upgrade detection and execution logic

Add CheckBackendUpgrades() to compare installed backend versions/digests
against gallery entries, and UpgradeBackend() to perform atomic upgrades
with backup-based rollback on failure. Includes Agent A's data model
changes (Version/URI/Digest fields, GetImageDigest).

* feat: add AutoUpgradeBackends config and runtime settings

Add configuration and runtime settings for backend auto-upgrade:
- RuntimeSettings field for dynamic config via API/JSON
- ApplicationConfig field, option func, and roundtrip conversion
- CLI flag with LOCALAI_AUTO_UPGRADE_BACKENDS env var
- Config file watcher support for runtime_settings.json
- Tests for ToRuntimeSettings, ApplyRuntimeSettings, and roundtrip

* feat(ui): add backend version display and upgrade support

- Add upgrade check/trigger API endpoints to config and api module
- Backends page: version badge, upgrade indicator, upgrade button
- Manage page: version in metadata, context-aware upgrade/reinstall button
- Settings page: auto-upgrade backends toggle

* feat: add upgrade checker service, API endpoints, and CLI command

- UpgradeChecker background service: checks every 6h, auto-upgrades when enabled
- API endpoints: GET /backends/upgrades, POST /backends/upgrades/check, POST /backends/upgrade/:name
- CLI: `localai backends upgrade` command, version display in `backends list`
- BackendManager interface: add UpgradeBackend and CheckUpgrades methods
- Wire upgrade op through GalleryService backend handler
- Distributed mode: fan-out upgrade to worker nodes via NATS

* fix: use advisory lock for upgrade checker in distributed mode

In distributed mode with multiple frontend instances, use PostgreSQL
advisory lock (KeyBackendUpgradeCheck) so only one instance runs
periodic upgrade checks and auto-upgrades. Prevents duplicate
upgrade operations across replicas.

Standalone mode is unchanged (simple ticker loop).

* test: add e2e tests for backend upgrade API

- Test GET /api/backends/upgrades returns 200 (even with no upgrade checker)
- Test POST /api/backends/upgrade/:name accepts request and returns job ID
- Test full upgrade flow: trigger upgrade via API, wait for job completion,
  verify run.sh updated to v2 and metadata.json has version 2.0.0
- Test POST /api/backends/upgrades/check returns 200
- Fix nil check for applicationInstance in upgrade API routes
2026-04-11 22:31:15 +02:00
thelittlefireman
7c1865b307 Fix load of z-image-turbo (#9264)
* Fix load of z-image and improve speed

Signed-off-by: thelittlefireman <5165783+thelittlefireman@users.noreply.github.com>

* Remove diffusion_flash_attn from z-image-ggml.yaml

Removed 'diffusion_flash_attn' parameter from configuration.

Signed-off-by: thelittlefireman <5165783+thelittlefireman@users.noreply.github.com>

---------

Signed-off-by: thelittlefireman <5165783+thelittlefireman@users.noreply.github.com>
2026-04-11 08:42:13 +02:00
LocalAI [bot]
62a674ce12 chore: ⬆️ Update ggml-org/llama.cpp to e62fa13c2497b2cd1958cb496e9489e86bbd5182 (#9312)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 08:39:10 +02:00
LocalAI [bot]
c39213443b feat(swagger): update swagger (#9310)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 08:38:55 +02:00
LocalAI [bot]
606f462da4 chore: ⬆️ Update PABannier/sam3.cpp to 01832ef85fcc8eb6488f1d01cd247f07e96ff5a9 (#9311)
⬆️ Update PABannier/sam3.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-11 08:38:30 +02:00
Ettore Di Giacinto
5c35e85fe2 feat: allow to pin models and skip from reaping (#9309)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-11 08:38:17 +02:00
Leigh Phillips
062e0d0d00 feat: Add toggle mechanism to enable/disable models from loading on demand (#9304)
* feat: add toggle mechanism to enable/disable models from loading on demand

Implements #9303 - Adds ability to disable models from being auto-loaded
while keeping them in the collection.

Backend changes:
- Add Disabled field to ModelConfig struct with IsDisabled() getter
- New ToggleModelEndpoint handler (PUT /models/toggle/:name/:action)
- Request middleware returns 403 when disabled model is requested
- Capabilities endpoint exposes disabled status

Frontend changes:
- Toggle switch in System > Models table Actions column
- Visual indicators: dimmed row, red Disabled badge, muted icons
- Tooltip describes toggle function on hover
- Loading state while API call is in progress

* fix: remove extra closing brace causing syntax error in request middleware

* refactor: reorder Actions column - Stop button before toggle switch

* refactor: migrate from toggle to toggle-state per PR review feedback
2026-04-10 18:17:41 +02:00
LocalAI [bot]
d4cd6c284f chore: ⬆️ Update ggml-org/llama.cpp to d132f22fc92f36848f7ccf2fc9987cd0b0120825 (#9302)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-10 08:46:45 +02:00
Ettore Di Giacinto
3bb8b65d31 chore(qwen3-asr): pass prompt as context to transcribe (#9301)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-10 08:45:59 +02:00
Ettore Di Giacinto
9748a1cbc6 fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299)
When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token
produces a JSON array with two elements: a role-init chunk and the
actual content chunk. The grpc-server loop called attach_chat_deltas
for both elements with the same raw_result pointer, stamping the first
token's ChatDelta.Content on both replies. The Go side accumulated both,
emitting the first content token twice to SSE clients.

Fix: in the array iteration loops in PredictStream, detect role-init
elements (delta has "role" key) and skip attach_chat_deltas for them.
Only content/reasoning elements get chat deltas attached.

Reasoning models are unaffected because their first token goes into
reasoning_content, not content.
2026-04-10 08:45:47 +02:00
LocalAI [bot]
6bc76dda6d feat(swagger): update swagger (#9300)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-10 01:05:53 +02:00
Ettore Di Giacinto
e1a6010874 fix(streaming): deduplicate tool call emissions during streaming (#9292)
The Go-side incremental JSON parser was emitting the same tool call on
every streaming token because it lacked the len > lastEmittedCount guard
that the XML parser had. On top of that, the post-streaming default:
case re-emitted all tool calls from index 0, duplicating everything.

This produced duplicate delta.tool_calls events causing clients to
accumulate arguments as "{args}{args}" — invalid JSON.

Fixes:
- JSON incremental parser: add len(jsonResults) > lastEmittedCount guard
  and loop from lastEmittedCount (matching the XML parser pattern)
- Post-streaming default: case: skip i < lastEmittedCount entries that
  were already emitted during streaming
- JSON parser: use blocking channel send (matching XML parser behavior)
2026-04-10 00:44:25 +02:00
Ettore Di Giacinto
706cf5d43c feat(sam.cpp): add sam.cpp detection backend (#9288)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-09 21:49:11 +02:00
Ettore Di Giacinto
13a6ed709c fix: thinking models with tools returning empty content (reasoning-only retry loop) (#9290)
When clients like Nextcloud or Home Assistant send requests with tools
to thinking models (e.g. Gemma 4 with <|channel>thought tags), the
response was empty despite the backend producing valid content.

Root cause: the C++ autoparser puts clean content in both the raw
Response and ChatDeltas. The Go-side PrependThinkingTokenIfNeeded
then prepends the thinking start token to the already-clean content,
causing ExtractReasoning to classify the entire response as unclosed
reasoning. This made cbRawResult empty, triggering a retry loop that
never succeeds.

Two fixes:
- inference.go: check ChatDeltas for content/tool_calls regardless of
  whether Response is empty, so skipCallerRetry fires correctly
- chat.go: when ChatDeltas have content but no tool calls, use that
  content directly instead of falling back to the empty cbRawResult
2026-04-09 18:30:31 +02:00
Ettore Di Giacinto
85be4ff03c feat(api): add ollama compatibility (#9284)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-09 14:15:14 +02:00
Ettore Di Giacinto
b0d9ce4905 Remove header from OpenAI Realtime API documentation
Removed the header from the Realtime API documentation.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-09 09:00:28 +02:00
LocalAI [bot]
7081b54c09 chore: ⬆️ Update leejet/stable-diffusion.cpp to e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef (#9281)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-09 08:12:23 +02:00
Ettore Di Giacinto
2b05420f95 chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' (#9282)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-09 08:12:05 +02:00
Ettore Di Giacinto
b64347b6aa chore: add gemma4 to the gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 23:44:16 +00:00
Ettore Di Giacinto
e00ce981f0 fix: try to add whisperx and faster-whisper for more variants (#9278)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 21:23:38 +02:00
Ettore Di Giacinto
285f7d4340 chore: add embeddingemma
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 17:40:55 +00:00
Richard Palethorpe
ea6e850809 feat: Add Kokoros backend (#9212)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-08 19:23:16 +02:00
Ettore Di Giacinto
b7247fc148 fix(whisperx): add alias
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 14:40:08 +00:00
Ettore Di Giacinto
39c6b3ed66 feat: track files being staged (#9275)
This changeset makes visible when files are being staged, so users are
aware that the model "isn't ready yet" for requests.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 14:33:58 +02:00
Ettore Di Giacinto
0e9d1a6588 chore(ci): drop unnecessary test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 12:19:54 +00:00
Ettore Di Giacinto
510d6759fe fix(nodes): better detection if nodes goes down or model is not available (#9274)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 12:11:02 +02:00
Ettore Di Giacinto
154fa000d3 fix(autoscaling): extract load model from Route() and use as well when doing autoscale (#9270)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-08 08:27:51 +02:00
LocalAI [bot]
0526e60f8d chore: ⬆️ Update ggml-org/llama.cpp to 66c4f9ded01b29d9120255be1ed8d5835bcbb51d (#9269)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-08 08:27:38 +02:00
LocalAI [bot]
db600fb5b2 docs: ⬆️ update docs version mudler/LocalAI (#9268)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-08 08:27:27 +02:00
Richard Palethorpe
9ac1bdc587 feat(ui): Interactive model config editor with autocomplete (#9149)
* feat(ui): Add dynamic model editor with autocomplete

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(docs): Add link to longformat installation video

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-07 14:42:23 +02:00
dependabot[bot]
fdc9f7bf35 chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.64.0 to 0.65.0 (#9254)
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus

Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.64.0 to 0.65.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.64.0...exporters/prometheus/v0.65.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-07 00:39:52 +02:00
LocalAI [bot]
8e59346091 chore: ⬆️ Update leejet/stable-diffusion.cpp to 8afbeb6ba9702c15d41a38296f2ab1fe5c829fa0 (#9262)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:39:38 +02:00
LocalAI [bot]
e6e4e19633 chore: ⬆️ Update ace-step/acestep.cpp to e0c8d75a672fca5684c88c68dbf6d12f58754258 (#9261)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:39:24 +02:00
Ettore Di Giacinto
505c417fa7 fix(gpu): better detection for MacOS and Thor (#9263)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-07 00:39:07 +02:00
LocalAI [bot]
17215f6fbc docs: ⬆️ update docs version mudler/LocalAI (#9260)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:38:50 +02:00
LocalAI [bot]
bccaba1f66 chore: ⬆️ Update ggml-org/llama.cpp to d0a6dfeb28a09831d904fc4d910ddb740da82834 (#9259)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:38:36 +02:00
Ettore Di Giacinto
0f9d516a6c fix(anthropic): do not emit empty tokens and fix SSE tool calls (#9258)
This fixes Claude Code compatibility

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-07 00:38:21 +02:00
dependabot[bot]
33b124c6f1 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.12 to 1.32.14 (#9256)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.12 to 1.32.14.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.12...config/v1.32.14)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.14
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:46:52 +02:00
dependabot[bot]
6b8007e88e chore(deps): bump github.com/jaypipes/ghw from 0.23.0 to 0.24.0 (#9250)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.24.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:46:18 +02:00
dependabot[bot]
b3837c2078 chore(deps): bump google.golang.org/grpc from 1.79.3 to 1.80.0 (#9253)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.79.3 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.79.3...v1.80.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:45:50 +02:00
Ettore Di Giacinto
92f99b1ec3 fix(token): login via legacy api keys (#9249)
We were not checking against the api keys when db == nil.

This commit also cleanups now unused middleware

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-06 21:45:09 +02:00
LocalAI [bot]
ad232fdb1a docs: ⬆️ update docs version mudler/LocalAI (#9241)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:53:07 +02:00
LocalAI [bot]
11637b5a1b chore: ⬆️ Update leejet/stable-diffusion.cpp to 7397ddaa86f4e8837d5261724678cde0f36d4d89 (#9242)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:52:51 +02:00
LocalAI [bot]
0dda4fe6f0 chore: ⬆️ Update ggml-org/llama.cpp to 761797ffdf2ce3f118e82c663b1ad7d935fbd656 (#9243)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:52:38 +02:00
Ettore Di Giacinto
773489eeb1 fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244)
* fix(chat): do not retry if we had chatdeltas or tooldeltas from backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use oai compat for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: apply to non-streaming path too

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* map also other fields

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-06 10:52:23 +02:00
Ettore Di Giacinto
06fbe48b3f feat(llama.cpp): wire speculative decoding settings (#9238)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-05 14:56:30 +02:00
Ettore Di Giacinto
232e324a68 fix(autoparser): correctly pass by logprobs (#9239)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-05 09:39:22 +02:00
ER-EPR
39c954764c Update index.yaml and add Qwen3.5 model files (#9237)
* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Add mmproj files for Qwen3.5 models

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update file paths for Qwen models in index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Refactor Qwen3-Reranker-0.6B entry in index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update qwen3.yaml configuration parameters

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

---------

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>
2026-04-05 09:21:21 +02:00
Ettore Di Giacinto
9b7d5513fc chore(gallery): add mmproj file for gemma4
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-05 02:02:52 +02:00
LocalAI [bot]
84cd8c0e7f chore: ⬆️ Update ggml-org/llama.cpp to b8635075ffe27b135c49afb9a8b5c434bd42c502 (#9231)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-04 23:02:58 +02:00
LocalAI [bot]
d990f2790c chore(model-gallery): ⬆️ update checksum (#9233)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-04 23:02:41 +02:00
Ettore Di Giacinto
53deeb1107 fix(reasoning): suppress partial tag tokens during autoparser warm-up
The C++ PEG parser needs a few tokens to identify the reasoning format
(e.g. "<|channel>thought\n" for Gemma 4). During this warm-up, the gRPC
layer was sending raw partial tag tokens to Go, which leaked into the
reasoning field.

- Clear reply.message in gRPC when autoparser is active but has no diffs
  yet, matching llama.cpp server behavior of only emitting classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming
  paths, falling back to Go-side extraction for backends without autoparser
  (e.g. vLLM)
- Override non-streaming no-tools result with chat delta content when available
- Guard PrependThinkingTokenIfNeeded against partial tag prefixes during
  streaming accumulation
- Reorder default thinking tokens so <|channel>thought is checked before
  <|think|> (Gemma 4 templates contain both)
2026-04-04 20:45:57 +00:00
Ettore Di Giacinto
c5a840f6af fix(reasoning): warm-up
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 20:25:24 +00:00
Ettore Di Giacinto
6d9d77d590 fix(reasoning): accumulate and strip reasoning tags from autoparser results (#9227)
fix(reasoning): acccumulate and strip reasoning tags from autoparser results

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 18:15:32 +02:00
Ettore Di Giacinto
6f304d1201 chore(refactor): use interface (#9226)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 17:29:37 +02:00
Richard Palethorpe
557d0f0f04 feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-04 15:14:35 +02:00
Ettore Di Giacinto
b7e3589875 fix(anthropic): show null index when not present, default to 0 (#9225)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 15:13:17 +02:00
Ettore Di Giacinto
716ddd697b feat(autoparser): prefer chat deltas from backends when emitted (#9224)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 12:12:08 +02:00
Ettore Di Giacinto
223deb908d fix(nats): improve error handling (#9222)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 12:11:54 +02:00
Ettore Di Giacinto
9f8821bba8 feat(gemma4): add thinking support (#9221)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 12:11:38 +02:00
Ettore Di Giacinto
84e51b68ef fix(ui): pass by staticApiKeyRequired to show login when only api key is configured (#9220)
This fixes #9213

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-04 12:11:22 +02:00
LocalAI [bot]
7962dd16f7 chore: ⬆️ Update ggml-org/llama.cpp to d006858316d4650bb4da0c6923294ccd741caefd (#9215)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-04 09:44:39 +02:00
LocalAI [bot]
a1466b305a docs: ⬆️ update docs version mudler/LocalAI (#9214)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-04 09:44:25 +02:00
github-actions[bot]
57c0026715 chore: bump inference defaults from unsloth (#9219)
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-04 09:44:12 +02:00
Ettore Di Giacinto
1ed6b9e5ed fix(llama.cpp): correctly parse grpc header for bearer token auth
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-03 21:38:41 +00:00
LocalAI [bot]
e4ee74354f chore(model gallery): 🤖 add 1 new models via gallery agent (#9210)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-03 16:23:17 +02:00
Ettore Di Giacinto
8577bdcebc Update asset links in README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-03 10:24:08 +02:00
Ettore Di Giacinto
0d489c7a0d Add guided tour and update screenshots section
Updated README to include a guided tour section with links to various assets and details about agents and usage metrics.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-04-03 10:23:03 +02:00
Ettore Di Giacinto
11dc54bda9 fix(docs): commit distribution.md
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-03 10:14:13 +02:00
Ettore Di Giacinto
7e0b73deaa fix(docs): fix broken references to distributed mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-03 09:46:06 +02:00
LocalAI [bot]
c0a023d13d chore: ⬆️ Update ggml-org/llama.cpp to a1cfb645307edc61a89e41557f290f441043d3c2 (#9203)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-03 08:30:15 +02:00
Loryan Strant
0d3ae1c295 docs: Update Home Assistant integrations list (#9206)
Update Home Assistant integrations list

Signed-off-by: Loryan Strant <51473494+loryanstrant@users.noreply.github.com>
2026-04-03 08:30:00 +02:00
LocalAI [bot]
e9f10f2f50 chore(model gallery): 🤖 add 1 new models via gallery agent (#9202)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-02 21:22:19 +02:00
Ettore Di Giacinto
b95b0b72ff chore(ci): fix gallery agent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-02 18:02:18 +00:00
LocalAI [bot]
26f1b94f4d chore: ⬆️ Update ggml-org/llama.cpp to 95a6ebabb277c4cc18247e7bc2a5502133caca63 (#9199)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-02 08:53:16 +02:00
LocalAI [bot]
2d40725ca2 chore: ⬆️ Update leejet/stable-diffusion.cpp to 87ecb95cbc65dc8e58e3d88f4f4a59a0939796f5 (#9200)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-02 08:53:04 +02:00
Ettore Di Giacinto
6c635e8353 feat: add resume endpoint to undrain nodes (#9197)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-01 18:21:43 +02:00
LocalAI [bot]
cc5f33ce95 chore: ⬆️ Update ggml-org/llama.cpp to 0fcb3760b2b9a3a496ef14621a7e4dad7a8df90f (#9196)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-01 00:48:40 +02:00
LocalAI [bot]
ba7cdd532a chore: ⬆️ Update leejet/stable-diffusion.cpp to 09b12d5f6d51d862749e8e0ee8baac8f012089e2 (#9195)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-01 00:48:25 +02:00
Ettore Di Giacinto
6b6c136210 fix(inflight): count inflight from load model, but release afterwards (#9194)
This should fix the count of 1 in flight always showing in the node list

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 23:24:45 +02:00
Ettore Di Giacinto
e587ecc485 chore(ui): allow to unload forcefully
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 17:20:53 +00:00
Ettore Di Giacinto
f259036a27 feat(gpu): add jetson/tegra detection
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 15:45:07 +00:00
Ettore Di Giacinto
221ff0f28f feat(ui): show cluster status in home in distributed mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 15:37:58 +00:00
Ettore Di Giacinto
16d5cb00bd chore: css cleanups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 16:37:38 +02:00
Richard Palethorpe
952635fba6 feat(distributed): Avoid resending models to backend nodes (#9193)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-31 16:28:13 +02:00
Ettore Di Giacinto
3cc05af2e5 chore(nodes): restore offline nodes too
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 14:22:18 +00:00
Copilot
87a63316c7 stablediffusion-ggml: replace hand-maintained enum string arrays with upstream API calls (#9192)
* Initial plan

* Remove hand-maintained enum string arrays in gosd.cpp, use upstream API functions

Agent-Logs-Url: https://github.com/mudler/LocalAI/sessions/561fb489-89ed-4588-8f1e-7b967d91ba37

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-31 14:53:38 +02:00
Richard Palethorpe
efdcbbe332 feat(api): Return 404 when model is not found except for model names in HF format (#9133)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-31 10:48:21 +02:00
Ettore Di Giacinto
b4fff9293d chore: small ui improvements in the node page
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 08:41:40 +00:00
dependabot[bot]
8180221b7e chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/common/template (#9176)
chore(deps): bump grpcio in /backend/python/common/template

Bumps [grpcio](https://github.com/grpc/grpc) from 1.78.1 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.78.1...v1.80.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 10:11:04 +02:00
dependabot[bot]
52a9755e08 chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/rerankers (#9181)
chore(deps): bump grpcio in /backend/python/rerankers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.78.1 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.78.1...v1.80.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 10:10:50 +02:00
dependabot[bot]
a2a1d919f9 chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/coqui (#9182)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.78.1 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.78.1...v1.80.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 10:10:35 +02:00
dependabot[bot]
a3d37931ec chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/vllm (#9177)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.78.1 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.78.1...v1.80.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 10:10:17 +02:00
dependabot[bot]
5b2e25ebb0 chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/transformers (#9180)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.78.1 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.78.1...v1.80.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-31 10:10:03 +02:00
LocalAI [bot]
b0b37a472f chore: ⬆️ Update ggml-org/llama.cpp to 08f21453aec846867b39878500d725a05bd32683 (#9190)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-31 09:27:08 +02:00
Ettore Di Giacinto
3db12eaa7a fix(oauth/invite): do not register user (prending approval) without correct invite (#9189)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 08:29:07 +02:00
Ettore Di Giacinto
8862e3ce60 feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186)
* always enable parallel requests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: move tests to ginkgo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(smart router): order by available vram

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-31 08:28:56 +02:00
LocalAI [bot]
80699a3f70 feat(swagger): update swagger (#9187)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-30 23:48:06 +02:00
dependabot[bot]
309a59f61e chore(deps): bump actions/upload-pages-artifact from 3 to 4 (#9179)
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 4.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](https://github.com/actions/upload-pages-artifact/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/upload-pages-artifact
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:16:23 +02:00
dependabot[bot]
65c9380389 chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.62.0 to 0.64.0 (#9178)
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus

Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.62.0 to 0.64.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.62.0...exporters/prometheus/v0.64.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.64.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:16:09 +02:00
dependabot[bot]
79963c56bf chore(deps): bump github.com/pion/webrtc/v4 from 4.2.9 to 4.2.11 (#9185)
Bumps [github.com/pion/webrtc/v4](https://github.com/pion/webrtc) from 4.2.9 to 4.2.11.
- [Release notes](https://github.com/pion/webrtc/releases)
- [Commits](https://github.com/pion/webrtc/compare/v4.2.9...v4.2.11)

---
updated-dependencies:
- dependency-name: github.com/pion/webrtc/v4
  dependency-version: 4.2.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:15:54 +02:00
dependabot[bot]
7004ce0b78 chore(deps): bump github.com/nats-io/nats.go from 1.49.0 to 1.50.0 (#9183)
Bumps [github.com/nats-io/nats.go](https://github.com/nats-io/nats.go) from 1.49.0 to 1.50.0.
- [Release notes](https://github.com/nats-io/nats.go/releases)
- [Commits](https://github.com/nats-io/nats.go/compare/v1.49.0...v1.50.0)

---
updated-dependencies:
- dependency-name: github.com/nats-io/nats.go
  dependency-version: 1.50.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:15:39 +02:00
dependabot[bot]
702d0e0e4d chore(deps): bump google.golang.org/grpc from 1.79.1 to 1.79.3 (#9175)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.79.1 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.79.1...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:15:25 +02:00
dependabot[bot]
d6de208d6c chore(deps): bump actions/configure-pages from 5 to 6 (#9174)
Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 5 to 6.
- [Release notes](https://github.com/actions/configure-pages/releases)
- [Commits](https://github.com/actions/configure-pages/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/configure-pages
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:15:10 +02:00
dependabot[bot]
7451145e0c chore(deps): bump actions/deploy-pages from 4 to 5 (#9172)
Bumps [actions/deploy-pages](https://github.com/actions/deploy-pages) from 4 to 5.
- [Release notes](https://github.com/actions/deploy-pages/releases)
- [Commits](https://github.com/actions/deploy-pages/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/deploy-pages
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:14:57 +02:00
dependabot[bot]
cfda3dd0df chore(deps): bump actions/checkout from 4 to 6 (#9173)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-30 23:14:43 +02:00
Richard Palethorpe
e0eb2fd734 chore(ci): Scope tests extras backend tests (#9170)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-30 17:46:07 +00:00
Ettore Di Giacinto
dd3376e0a9 chore(workers): improve logging, set header timeouts (#9171)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-30 17:26:55 +02:00
Richard Palethorpe
520e1ce3cd fix(kokoro): Download phonemization model during installation (#9165)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-30 15:08:48 +02:00
LocalAI [bot]
3d738164b7 chore: ⬆️ Update ggml-org/llama.cpp to 7c203670f8d746382247ed369fea7fbf10df8ae0 (#9160)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-30 08:27:26 +02:00
LocalAI [bot]
56db76599a chore: ⬆️ Update ggml-org/whisper.cpp to 95ea8f9bfb03a15db08a8989966fd1ae3361e20d (#9168)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-30 08:27:11 +02:00
LocalAI [bot]
ad57cdfefe chore: ⬆️ Update leejet/stable-diffusion.cpp to f16a110f8776398ef23a2a6b7b57522c2471637a (#9167)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-30 08:26:45 +02:00
Richard Palethorpe
c2f7d1c18b feat(ui): Add media history to studio pages (e.g. past images) (#9151)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-30 00:49:55 +02:00
ER-EPR
afe79568d6 fix: huggingface repo change the file name so Update index.yaml is needed (#9163)
* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Add mmproj files for Qwen3.5 models

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update file paths for Qwen models in index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

---------

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>
2026-03-30 00:48:17 +02:00
Ettore Di Giacinto
59108fbe32 feat: add distributed mode (#9124)
* feat: add distributed mode (experimental)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix data races, mutexes, transactions

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix events and tool stream in agent chat

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* use ginkgo

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(cron): compute correctly time boundaries avoiding re-triggering

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* enhancements, refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not flood of healthy checks

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not list obvious backends as text backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tests fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactoring and consolidation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop redundant healthcheck

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* enhancements, refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-30 00:47:27 +02:00
LocalAI [bot]
4c870288d9 chore: ⬆️ Update ggml-org/llama.cpp to 59d840209a5195c2f6e2e81b5f8339a0637b59d9 (#9144)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-28 18:18:06 +01:00
Ettore Di Giacinto
8da7212763 fix(ci): checkout submodules
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-28 00:33:31 +01:00
Ettore Di Giacinto
6e76052f9d ci: set gh-pages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-27 21:26:55 +00:00
Richard Palethorpe
cf84db36ec fix(voxcpm): Force using a recent voxcpm version to kick the dependency solver (#9150)
fix(voxcpm): Allow packages to be fetched from all indexes

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-27 15:38:51 +01:00
Richard Palethorpe
d3f629f183 feat: Merge repeated log lines in the terminal (#9141)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-26 22:16:13 +01:00
Richard Palethorpe
b1aa707a92 fix(coqui,nemo,voxcpm): Add dependencies to allow CI to progress (#9142)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-26 18:03:56 +01:00
LocalAI [bot]
731176ce3a feat(swagger): update swagger (#9136)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-26 07:58:11 +01:00
LocalAI [bot]
b86fa63f70 chore: ⬆️ Update ggml-org/llama.cpp to a970515bdb0b1d09519106847660b0d0c84d2472 (#9137)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-26 07:56:41 +01:00
walcz-de
00fcf6936c fix: implement encoding_format=base64 for embeddings endpoint (#9135)
The OpenAI Node.js SDK v4+ sends encoding_format=base64 by default.
LocalAI previously ignored this parameter and always returned a float
JSON array, causing a silent data corruption bug in any Node.js client
(AnythingLLM Desktop, LangChain.js, LlamaIndex.TS, …):

  // What the client does when it expects base64 but receives a float array:
  Buffer.from(floatArray, 'base64')

Node.js treats a non-string first argument as a byte array — each
float32 value is truncated to a single byte — and Float32Array then
reads those bytes as floats, yielding dims/4 values.  Vector databases
(Qdrant, pgvector, …) then create collections with the wrong dimension,
causing all similarity searches to fail silently.

  e.g. granite-embedding-107m (384 dims) → 96 stored in Qdrant
       jina-embeddings-v3      (1024 dims) → 256 stored in Qdrant

Changes:
- core/schema/prediction.go: add EncodingFormat string field to
  PredictionOptions so the request parameter is parsed and available
  throughout the request pipeline
- core/schema/openai.go: add EmbeddingBase64 string field to Item;
  add MarshalJSON so the "embedding" JSON key emits either []float32
  or a base64 string depending on which field is populated — all other
  Item consumers (image, video endpoints) are unaffected
- core/http/endpoints/openai/embeddings.go: add floatsToBase64()
  which packs a float32 slice as little-endian bytes and base64-encodes
  it; add embeddingItem() helper; both InputToken and InputStrings loops
  now honour encoding_format=base64

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 17:38:07 +01:00
Richard Palethorpe
26384c5c70 fix(docs): Use notice instead of alert (#9134)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-25 13:55:48 +01:00
LocalAI [bot]
7209457f53 chore: ⬆️ Update ace-step/acestep.cpp to 6f35c874ee11e86d511b860019b84976f5b52d3a (#9128)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-25 07:52:31 +01:00
LocalAI [bot]
9bc68b2721 chore: ⬆️ Update ggml-org/llama.cpp to 9f102a1407ed5d73b8c954f32edab50f8dfa3f58 (#9127)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-25 07:52:14 +01:00
Richard Palethorpe
7bdd198fd3 fix(downloader): Rewrite full https HF URI with HF_ENDPOINT (#9107)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-24 18:32:52 +01:00
dependabot[bot]
b296e3d94b chore(deps): bump github.com/mudler/skillserver from 0.0.5 to 0.0.6 (#9116)
Bumps [github.com/mudler/skillserver](https://github.com/mudler/skillserver) from 0.0.5 to 0.0.6.
- [Release notes](https://github.com/mudler/skillserver/releases)
- [Commits](https://github.com/mudler/skillserver/compare/v0.0.5...v0.0.6)

---
updated-dependencies:
- dependency-name: github.com/mudler/skillserver
  dependency-version: 0.0.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 08:51:02 +01:00
dependabot[bot]
c91855a9b2 chore(deps): bump peter-evans/create-pull-request from 7 to 8 (#9114)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 7 to 8.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](https://github.com/peter-evans/create-pull-request/compare/v7...v8)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 08:50:50 +01:00
dependabot[bot]
e8e445cd43 chore(deps): bump actions/checkout from 4 to 6 (#9110)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 08:50:36 +01:00
dependabot[bot]
735c426072 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.0 to 1.4.1 (#9118)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 00:36:23 +01:00
dependabot[bot]
0976b8a17b chore(deps): bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 (#9121)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.2 to 0.21.3.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.21.2...v0.21.3)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 00:35:55 +01:00
LocalAI [bot]
2ad8c149e0 chore: ⬆️ Update ggml-org/llama.cpp to 1772701f99dd3fc13f5783b282c2361eda8ca47c (#9123)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-24 00:35:40 +01:00
LocalAI [bot]
31fcb1425d chore: ⬆️ Update ggml-org/llama.cpp to 49bfddeca18e62fa3d39114a23e9fcbdf8a22388 (#9102)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-23 01:11:18 +01:00
LocalAI [bot]
470d5e506f feat(swagger): update swagger (#9103)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-22 21:28:20 +01:00
Ettore Di Giacinto
0ee49cf42e Fix formatting in LocalAI description
Updated the description formatting for LocalAI.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-22 21:28:07 +01:00
Ettore Di Giacinto
cecd8d6aa5 chore(docs): simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 20:24:44 +00:00
Ettore Di Giacinto
15935e9d5f fix(auth): do not allow to register in invite mode (#9101)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 20:44:03 +01:00
Ettore Di Giacinto
5d410e5a03 fix(download): do not remove dst dir until we try all fallbacks (#9100)
This actually caused fallbacks to be compeletely no-op as we were
removing the destination dir before calling containerd.Apply

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 10:29:57 +01:00
Ettore Di Giacinto
5df77d7e8c Remove how-tos section link from README
Removed outdated link to community curated how-tos section.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-22 10:05:13 +01:00
Ettore Di Giacinto
f891d60d26 fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)
chore(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:58:14 +01:00
Ettore Di Giacinto
be25217955 chore(transformers): bump to >5.0 and generically load models (#9097)
* chore(transformers): bump to >5.0

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: refactor to use generic model loading

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:57:54 +01:00
LocalAI [bot]
b74111feed chore: ⬆️ Update ggml-org/llama.cpp to 990e4d96980d0b016a2b07049cc9031642fb9903 (#9095)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-22 00:57:39 +01:00
LocalAI [bot]
bf92117259 chore: ⬆️ Update ggml-org/whisper.cpp to 76684141a5d059be71cbe23dc2f0ed552213ba2d (#9094)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-22 00:57:28 +01:00
Ettore Di Giacinto
031a36c995 feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)
* feat: wire min_p

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: inferencing defaults

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(refactor): re-use iterative parser

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: generate automatically inference defaults from unsloth

Instead of trying to re-invent the wheel and maintain here the inference
defaults, prefer to consume unsloth ones, and contribute there as
necessary.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: apply defaults also to models installed via gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: be consistent and apply fallback to all endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:57:15 +01:00
LocalAI [bot]
8036d22ec6 chore: ⬆️ Update ace-step/acestep.cpp to 7326a7bea0c2037982ec924f7364e998df70450c (#9086)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-22 00:56:52 +01:00
Ettore Di Giacinto
f7e8d9e791 feat(quantization): add quantization backend (#9096)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:56:34 +01:00
Ettore Di Giacinto
4b183b7bb6 feat: add quota system (#9090)
* feat: add quota system

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-21 10:09:49 +01:00
Ettore Di Giacinto
f38e91d80b feat(ui): add predictor for usage, user-breakdown statistics (#9091)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-21 10:09:36 +01:00
LocalAI [bot]
aa3e82976e chore: ⬆️ Update ggml-org/llama.cpp to 4cb7e0bd61e7e1101e8ab10db5dee70c5717a386 (#9087)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-21 09:41:11 +01:00
Ettore Di Giacinto
d9c1db2b87 feat: add (experimental) fine-tuning support with TRL (#9088)
* feat: add fine-tuning endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(experimental): add fine-tuning endpoint and TRL support

This changeset defines new GRPC signatues for Fine tuning backends, and
add TRL backend as initial fine-tuning engine. This implementation also
supports exporting to GGUF and automatically importing it to LocalAI
after fine-tuning.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* commit TRL backend, stop by killing process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* move fine-tune to generic features

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add evals, reorder menu

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-21 02:08:02 +01:00
LocalAI [bot]
f7e3aab4fc feat(swagger): update swagger (#9085)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-20 21:45:03 +01:00
Richard Palethorpe
73bdc3b50d fix(realtime): Set the alias for opus so the development backend can be selected (#9083)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-20 15:08:07 +01:00
Richard Palethorpe
cb63bdb9e4 feat(ui): Add model pipeline editor (#9070)
This creates a new model config page. Presently just allows configuring
pipelines, but can be extending the future to other types of models.
However pipelines are quite easy to create a form for and require
editing to create.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-20 15:07:34 +01:00
Richard Palethorpe
8cd3f9fc47 feat(ui, openai): Structured errors and link to traces in error toast (#9068)
First when sending errors over SSE we now clearly identify them as such
instead of just sending the error string as a chat completion message.

We use this in the UI to identify errors and link to them to the traces.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-20 15:06:07 +01:00
lif
e0ab1a8b43 fix: use exact tag matching for model gallery tag filtering (#9041)
The Search() method uses strings.Contains() on comma-joined tags,
causing substring false positives (e.g., "asr" matching "image-diffusers").

Add FilterByTag() method that checks each tag with strings.EqualFold()
for exact, case-insensitive matching. Add 'tag' query parameter to
/api/models and /api/backends endpoints. Update the React frontend to
send filter selections as 'tag' instead of 'term'.

Closes #8775

Signed-off-by: majiayu000 <1835304752@qq.com>
2026-03-20 08:37:45 +01:00
Ettore Di Giacinto
c3174f9543 chore(deps): bump llama-cpp to 'a0bbcdd9b6b83eeeda6f1216088f42c33d464e38' (#9079)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-20 08:12:21 +01:00
LocalAI [bot]
2b12875302 fix: Add tracing settings loading from runtime_settings.json (#9081)
Tracing settings (EnableTracing and TracingMaxItems) were not being
loaded from runtime_settings.json on startup, causing tracing settings
configured via WebUI to be lost after service restart.

This fix adds proper loading of tracing settings in
loadRuntimeSettingsFromFile function in core/application/startup.go.

Fixes #9072

Co-authored-by: localai-bot <localai-bot@localai.io>
2026-03-20 00:58:52 +01:00
Ettore Di Giacinto
9cdbd89c1f chore(agents.md): update with auth/feature gating instructions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-19 22:52:28 +00:00
LocalAI [bot]
7d81bf0aa3 chore: ⬆️ Update ggml-org/whisper.cpp to 9386f239401074690479731c1e41683fbbeac557 (#9077)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-19 23:27:35 +01:00
Tv
a6d0e29eba fix(openresponses): do not omit required field ORItemParam.Arguments (#9074)
See #9047
2026-03-19 22:04:45 +01:00
LocalAI [bot]
6054d2a91b feat(swagger): update swagger (#9075)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-19 21:49:40 +01:00
Ettore Di Giacinto
aea21951a2 feat: add users and authentication support (#9061)
* feat(ui): add users and authentication support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: allow the admin user to impersonificate users

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: ui improvements, disable 'Users' button in navbar when no auth is configured

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add OIDC support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: gate models

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: cache requests to optimize speed

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* small UI enhancements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ui): style improvements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: cover other paths by auth

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: separate local auth, refactor

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* security hardening, approval mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: fix tests and expectations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update localagi/localrecall

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-19 21:40:51 +01:00
LocalAI [bot]
bbe9067227 docs: Add troubleshooting guide for embedding models (fixes #9064) (#9065)
docs: Add troubleshooting guide for embedding models (#9064)

- Add section on using gallery models for embeddings
- Document common issues with embedding model configuration
- Add troubleshooting guide for Qwen3 embedding models
- Include correct configuration examples for Qwen3-Embedding-4B
- Document context size limits and dimension parameters
- Add table of Qwen3 embedding model specifications

Fixes #9064

Signed-off-by: localai-bot <localai-bot@localai.io>
Co-authored-by: localai-bot <localai-bot@localai.io>
2026-03-19 19:41:12 +01:00
LocalAI [bot]
9a9da062e1 chore: ⬆️ Update ggml-org/llama.cpp to 5744d7ec430e2f875a393770195fda530560773f (#9063)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-19 07:58:30 +01:00
LocalAI [bot]
dd1a8b174f chore: ⬆️ Update ggml-org/whisper.cpp to ef3463bb29ef90d25dfabfd1e75993111c52412d (#9062)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-19 07:58:11 +01:00
Richard Palethorpe
cfb7641eea feat(ui, gallery): Show model backends and add searchable model/backend selector (#9060)
* feat(ui, gallery): Display and filter by the backend models use

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(ui): Add searchable model backend/model selector and prevent delete models being selected

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-18 21:14:41 +01:00
Richard Palethorpe
e832efeb9e fix(ui): Refresh model list on deletion (#9059)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-18 14:07:45 +01:00
dependabot[bot]
a42548e9d1 chore(deps): bump playwright from 1.52.0 to 1.58.2 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9055)
chore(deps): bump playwright

Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [playwright](https://github.com/microsoft/playwright).


Updates `playwright` from 1.52.0 to 1.58.2
- [Release notes](https://github.com/microsoft/playwright/releases)
- [Commits](https://github.com/microsoft/playwright/compare/v1.52.0...v1.58.2)

---
updated-dependencies:
- dependency-name: playwright
  dependency-version: 1.58.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-18 14:05:59 +01:00
LocalAI [bot]
8615ce28a8 feat: Add standalone agent run mode inspired by LocalAGI (#9056)
- Add 'agent' subcommand with 'run' and 'list' sub-commands
- Support running agents by name from pool.json registry
- Support running agents from JSON config files
- Implement foreground mode with --prompt flag for single-turn interactions
- Reuse AgentPoolService for consistent agent initialization
- Add comprehensive unit tests for config loading and overrides

Fixes #8960

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-18 14:04:20 +01:00
Ettore Di Giacinto
8336efec41 fix(ui): correctly display backend if specified in the model config, re-order MCP buttons (#9053)
fix(ui): correctly display backend if specified in the model config

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-18 09:58:25 +01:00
LocalAI [bot]
8560a1e571 chore: ⬆️ Update ace-step/acestep.cpp to ab020a9aefcd364423e0665da12babc6b0c7b507 (#9046)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-18 08:54:15 +01:00
LocalAI [bot]
29c33e6a6a chore: ⬆️ Update ggml-org/whisper.cpp to dc9611662265870df22a7230b7586176a99c1955 (#9045)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-18 08:46:35 +01:00
LocalAI [bot]
a58475dbef chore: ⬆️ Update ggml-org/llama.cpp to ee4801e5a6ee7ee4063144ab44ab4e127f76fba8 (#9044)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-18 08:46:12 +01:00
Tv
8a0edd0809 Always populate ORItemParam.Summary (#9049)
* fix(openresponses): do not omit required fields summary and id

* fix(openresponses): ensure ORItemParam.Summary is never null

Normalize Summary to an empty slice at serialization chokepoints
(sendSSEEvent, bufferEvent, buildORResponse) so it always serializes
as [] instead of null.

Closes #9047
2026-03-18 08:45:46 +01:00
Richard Palethorpe
35d509d8e7 feat(ui): Per model backend logs and various fixes (#9028)
* feat(gallery): Switch to expandable box instead of pop-over and display model files

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(ui, backends): Add individual backend logging

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ui): Set the context settings from the model config

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-18 08:31:26 +01:00
Ettore Di Giacinto
eef808d921 fix: call .String() on AllowedTools
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-17 23:16:19 +00:00
Ettore Di Giacinto
9d9ea5c1a0 chore(deps): bump skillserver
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-17 21:04:52 +00:00
LocalAI [bot]
e21ad5cfaa chore: ⬆️ Update leejet/stable-diffusion.cpp to 545fac4f3fb0117a4e962b1a04cf933a7e635933 (#9036)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-17 18:07:30 +01:00
dependabot[bot]
05ab0c0aa2 chore(deps): bump github.com/google/go-containerregistry from 0.21.1 to 0.21.2 (#9033)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.1 to 0.21.2.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.21.1...v0.21.2)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-17 11:43:25 +01:00
dependabot[bot]
e2b6233570 chore(deps): bump actions/upload-artifact from 4 to 7 (#9030)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-17 11:42:49 +01:00
dependabot[bot]
19f995f38f chore(deps): bump github.com/ebitengine/purego from 0.9.1 to 0.10.0 (#9034)
Bumps [github.com/ebitengine/purego](https://github.com/ebitengine/purego) from 0.9.1 to 0.10.0.
- [Release notes](https://github.com/ebitengine/purego/releases)
- [Commits](https://github.com/ebitengine/purego/compare/v0.9.1...v0.10.0)

---
updated-dependencies:
- dependency-name: github.com/ebitengine/purego
  dependency-version: 0.10.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-17 11:42:06 +01:00
dependabot[bot]
ac168bbc60 chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.26.0 to 1.27.0 (#9035)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.26.0...v1.27.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.27.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-17 11:41:36 +01:00
LocalAI [bot]
5c5e537b31 chore: ⬆️ Update ace-step/acestep.cpp to 15740f4301b3ec3020875f1fb975a6cfdb2f6767 (#9038)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-17 10:22:53 +01:00
LocalAI [bot]
118bcee196 chore: ⬆️ Update ggml-org/llama.cpp to 9b342d0a9f2f4892daec065491583ec2be129685 (#9039)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-17 10:22:42 +01:00
LocalAI [bot]
3eabd6d1d0 chore: ⬆️ Update ggml-org/whisper.cpp to 79218f51d02ffe70575ef7fba3496dfc7adda027 (#9037)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-17 08:25:31 +01:00
Ettore Di Giacinto
ee96e5e08d chore: refactor endpoints to use same inferencing path, add automatic retrial mechanism in case of errors (#9029)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-16 21:31:02 +01:00
Richard Palethorpe
3d9ccd1ddc fix(ui): Add tracing inline settings back and create UI tests (#9027)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-16 17:51:06 +01:00
Ettore Di Giacinto
d8161bfe57 fix(api): unescape model names (#9024)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-16 00:56:37 +01:00
Ettore Di Giacinto
5fd42399d4 feat: support streaming mode for tool calls in agent mode, fix interleaved thinking stream (#9023)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-16 00:50:19 +01:00
LocalAI [bot]
b2030255ca chore: ⬆️ Update ggml-org/llama.cpp to 88915cb55c14769738fcab7f1c6eaa6dcc9c2b0c (#9020)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-16 00:10:11 +01:00
LocalAI [bot]
9f903ec06e chore: ⬆️ Update leejet/stable-diffusion.cpp to 862a6586cb6fcec037c14f9ed902329ecec7d990 (#9019)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-16 00:09:59 +01:00
Ettore Di Giacinto
4ea461c330 fix(ui): correctly map watchdog fields (#9022)
Fixes: https://github.com/mudler/LocalAI/issues/9018

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-15 22:12:24 +01:00
Ettore Di Giacinto
042a9b8ef6 Remove Table of Contents from README
Removed the Table of Contents section from the README.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-15 21:49:13 +01:00
Ettore Di Giacinto
65f1a4154a Revise README structure and content
Updated the README to include new sections and remove outdated content.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-15 21:47:08 +01:00
LocalAI [bot]
c6a51289b0 fix: Automatically disable mmap for Intel SYCL backends (#9012) (#9015)
* fix: Automatically disable mmap for Intel SYCL backends

Fixes issue #9012 where Qwen3.5 models fail to load on Intel Arc GPU
with RPC EOF error.

The Intel SYCL backend has a known issue where mmap enabled causes
the backend to hang. This change automatically disables mmap when
detecting Intel or SYCL backends.

References:
- https://github.com/mudler/LocalAI/issues/9012
- Documentation mentions: SYCL hangs when mmap: true is set

* feat: Add logging for mmap auto-disable on Intel SYCL backends

As requested in PR review, add xlog.Info call to log when mmap
is automatically disabled for Intel SYCL backends. This helps
with debugging and confirms the auto-disable logic is working.

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-15 21:06:35 +01:00
LocalAI [bot]
87525109f1 chore: ⬆️ Update ggml-org/llama.cpp to 3a6f059909ed5dab8587df5df4120315053d57a4 (#9009)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-15 09:46:45 +01:00
Ettore Di Giacinto
c596d8a5d9 fix: Change baseDir assignment to use ModelPath (#9010)
Fixes: https://github.com/mudler/LocalAI/issues/9005

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-15 09:45:58 +01:00
LocalAI [bot]
d79ad76e48 docs: ⬆️ update docs version mudler/LocalAI (#9008)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-15 08:39:49 +01:00
Ettore Di Giacinto
dde0353432 chore(api): add path to expose collection raw files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 22:31:45 +00:00
Ettore Di Giacinto
8e8b7df715 fix(ui): do not let from button to trigger
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 17:35:04 +00:00
Ettore Di Giacinto
5affb747a9 chore: drop AIO images (#9004)
AIO images are behind, and takes effort to maintain these. Wizard and
installation of models have been semplified massively, so AIO images
lost their purpose.

This allows us to be more laser focused on main images and reliefes
stress from CI.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 17:49:36 +01:00
Ettore Di Giacinto
0ac4ac5bdd chore: configure data volume
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 14:46:52 +00:00
Ettore Di Giacinto
0725b6f334 chore: bump dependencies
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 10:14:42 +00:00
Ettore Di Giacinto
0a856c9bae chore: bump dependencies
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 10:06:05 +00:00
LocalAI [bot]
977063c4ba chore: ⬆️ Update ggml-org/llama.cpp to e30f1fdf74ea9238ff562901aa974c75aab6619b (#8997)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-14 01:16:42 +01:00
LocalAI [bot]
0ec3ea4a46 fix(acestep-cpp): resolve relative model paths in options (#8993)
* fix(acestep-cpp): resolve relative model paths in options

The acestep-cpp backend was failing to load models because the model
paths in options (text_encoder_model, dit_model, vae_model) were being
passed to the C++ code without resolving their relative paths.

When a user configures acestep-cpp-turbo-4b, the model paths are specified
as relative paths like 'acestep-cpp/acestep-v15-turbo-Q8_0.gguf'. The
backend was passing these paths directly to the C++ code without joining
them with the model directory.

This fix:
1. Gets the base directory from the ModelFile path
2. Resolves all relative paths in options to be absolute paths
3. Adds debug logging to show resolved paths for troubleshooting

Fixes #8991

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* test: fix acestep tests to not join modeldir in options

According to code review feedback, the Options array in TestLoadModel
and TestSoundGeneration should contain just the model filenames without
filepath.Join with modelDir. The model paths are handled internally by
the backend.

* fix: change bpm parameter type to float32 to match C++ API signature

* test: fix TestLoadModel and TestSoundGeneration to use baseDir for model paths

- Modified TestLoadModel to compute baseDir from main model path and use it for relative model paths
- Modified TestSoundGeneration similarly to use baseDir for model paths
- Changed bpm parameter type from int32 to float32 to match C++ API

* Apply suggestions from code review

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-14 01:16:13 +01:00
Richard Palethorpe
87b3e10024 fix(flux.2-klein-9b): Use Qwen3-8b to avoid GGML assertion failure on tensor mismatch (#8995)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-13 21:39:31 +01:00
Richard Palethorpe
e4c1c39727 fix(conf): Don't overwrite env provided galleries with runtime conf (#8994)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-13 21:39:03 +01:00
Richard Palethorpe
ed2c6da4bf fix(ui): Move routes to /app to avoid conflict with API endpoints (#8978)
Also test for regressions in HTTP GET API key exempted endpoints because
this list can get out of sync with the UI routes.

Also fix support for proxying on a different prefix both server and
client side.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-13 21:38:18 +01:00
Richard Palethorpe
f9a850c02a feat(realtime): WebRTC support (#8790)
* feat(realtime): WebRTC support

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(tracing): Show full LLM opts and deltas

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-13 21:37:15 +01:00
Ettore Di Giacinto
4e3bf2752d fix(tests): Remove 'huggingface' substring expectation
Remove check for 'huggingface' in response body.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-13 13:12:04 +01:00
Ettore Di Giacinto
0b53bc5061 fix(ci): Delete leftovers huggingface backend from backend.yml
Removed huggingface backend configuration from the workflow.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-13 09:59:43 +01:00
LocalAI [bot]
ff21bc6cbb chore: ⬆️ Update ace-step/acestep.cpp to 5aa065445541094cba934299cd498bbb9fa5c434 (#8984)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-13 07:59:34 +01:00
LocalAI [bot]
46a8941a2c chore: ⬆️ Update ggml-org/llama.cpp to 57819b8d4b39d893408e51520dff3d47d1ebb757 (#8983)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-13 07:59:15 +01:00
LocalAI [bot]
c0351b8e6a Remove HuggingFace backend support (#8971)
* Remove HuggingFace backend support, restore other backends

- Removed backend/go/huggingface directory and all related files
- Removed pkg/langchain/huggingface.go
- Removed LCHuggingFaceBackend from pkg/model/initializers.go
- Removed huggingface backend entries from backend/index.yaml
- Updated backend/README.md to remove HuggingFace backend reference
- Restored kitten-tts, local-store, silero-vad, piper backends that were incorrectly removed

This change removes only HuggingFace backend support from LocalAI
as per the P0 priority request in issue #8963, while preserving
other backends (kitten-tts, local-store, silero-vad, piper).

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>

* Remove huggingface backend from test.yml build command

The tests-linux CI job was failing because it was trying to build the
huggingface backend which no longer exists after the backend removal.
This removes huggingface from the build command in test.yml.

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-13 01:09:30 +01:00
Ettore Di Giacinto
ec91c477dc Remove model descriptions from index.yaml
Removed description fields from multiple model entries in index.yaml.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 21:43:21 +01:00
LocalAI [bot]
3b9abffdc8 chore(model-gallery): ⬆️ update checksum (#8985)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-12 21:34:27 +01:00
Ettore Di Giacinto
6c11c54a3b fix: avoid race condition
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 18:46:22 +00:00
Ettore Di Giacinto
13bd0d9944 fix(collections): start agent pool after http server (#8981)
Otherwise if using collections with postgresql we create a deadlock, as
we need embeddings to be up

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 19:25:49 +01:00
Ettore Di Giacinto
a738f8b0e4 feat(backends): add ace-step.cpp (#8965)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 18:56:26 +01:00
Ettore Di Giacinto
8f3efaed15 Update model entry in index.yaml
Removed description and license fields from model entry.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 18:51:29 +01:00
LocalAI [bot]
c4cccb728e chore(model gallery): 🤖 add 1 new models via gallery agent (#8980)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-12 18:51:00 +01:00
Ettore Di Giacinto
b209947e81 Remove Qwen3.5-35B model and update model path
Removed deprecated Qwen3.5-35B-A3B model configuration and updated model path for Qwen3.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 18:19:55 +01:00
Ettore Di Giacinto
14e82d76f9 chore(ui): improve errors and reporting during model installation (#8979)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 18:19:06 +01:00
LocalAI [bot]
f73a158153 docs: Document GPU auto-fit mode limitations and trade-offs (closes #8562) (#8954)
* docs: Add documentation about GPU auto-fit mode limitations (closes #8562)

- Document the default gpu_layers behavior (9999999) that disables auto-fit
- Explain the trade-off between auto-fit and VRAM threshold unloading
- Add recommendations for users who want to enable gpu_layers: -1
- Note known issues with tensor_buft_override buffer errors
- Link to issue #8562 for future improvements

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 13:35:31 +01:00
Richard Palethorpe
b24ca51287 fix(llama-cpp): Set enable_thinking in the correct place (#8973)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-12 13:32:29 +01:00
Sertaç Özercan
45d18813bd fix: gate CUDA directory checks on GPU vendor to prevent false CUDA detection (#8942)
Container images that install CUDA runtime libraries (e.g., cuda-cudart-12-5
via apt) create /usr/local/cuda-12 directories as a side effect. The previous
code checked for these directories before checking whether a GPU was present,
causing CPU-only hosts to select a CUDA backend that crashes because
libcuda.so.1 is absent.

Reorder checks so CUDA directory existence only refines the capability when
an NVIDIA GPU is actually detected, consistent with the arm64 L4T code path.

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2026-03-12 07:53:39 +01:00
Ettore Di Giacinto
7dc691c171 feat: add fish-speech backend (#8962)
* feat: add fish-speech backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop portaudio

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 07:48:23 +01:00
Ettore Di Giacinto
17f36e73b5 Remove model name entries from index.yaml
Removed 'name' entries for various models in the index file.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 01:13:58 +01:00
Ettore Di Giacinto
031909d85a Clean up gallery index by removing obsolete models
Removed multiple models and their associated metadata from the gallery index.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-12 00:55:19 +01:00
LocalAI [bot]
996dd7652f feat(swagger): update swagger (#8961)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-11 22:01:55 +01:00
LocalAI [bot]
bf4f8da266 fix: include model name in mmproj file path to prevent model isolation (#8937) (#8940)
* fix: include model name in mmproj file path to prevent model isolation issues

This fix addresses issue #8937 where different models with mmproj files
having the same filename (e.g., mmproj-F32.gguf) would overwrite each other.

By including the model name in the path (llama-cpp/mmproj/<model-name>/<filename>),
each model's mmproj files are now stored in separate directories, preventing
the collision that caused conversations to fail when switching between models.

Fixes #8937

Signed-off-by: LocalAI Bot <localai-bot@example.com>

* test: update test expectations for model name in mmproj path

The test file had hardcoded expectations for the old mmproj path format.
Updated the test expectations to include the model name subdirectory
to match the new path structure introduced in the fix.

Fixes CI failures on tests-apple and tests-linux

* fix: add model name to model path for consistency with mmproj path

This change makes the model path consistent with the mmproj path by
including the model name subdirectory in both paths:
- mmproj: llama-cpp/mmproj/<model-name>/<filename>
- model: llama-cpp/models/<model-name>/<filename>

This addresses the reviewer's feedback that the model config generation
needs to correctly reference the mmproj file path.

Fixes the issue where the model path didn't include the model name
subdirectory while the mmproj path did.

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>

---------

Signed-off-by: LocalAI Bot <localai-bot@example.com>
Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
2026-03-11 10:28:37 +01:00
Attila Györffy
5a67b5d73c Fix image upload processing and img2img pipeline in diffusers backend (#8879)
* fix: add missing bufio.Flush in processImageFile

The processImageFile function writes decoded image data (from base64
or URL download) through a bufio.NewWriter but never calls Flush()
before closing the underlying file. Since bufio's default buffer is
4096 bytes, small images produce 0-byte files and large images are
truncated — causing PIL to fail with "cannot identify image file".

This breaks all image input paths: file, files, and ref_images
parameters in /v1/images/generations, making img2img, inpainting,
and reference image features non-functional.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* fix: merge options into kwargs in diffusers GenerateImage

The GenerateImage method builds a local `options` dict containing the
source image (PIL), negative_prompt, and num_inference_steps, but
never merges it into `kwargs` before calling self.pipe(**kwargs).
This causes img2img to fail with "Input is in incorrect format"
because the pipeline never receives the image parameter.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: add unit test for processImageFile base64 decoding

Verifies that a base64-encoded PNG survives the write path
(encode → decode → bufio.Write → Flush → file on disk) with
byte-for-byte fidelity. The test image is small enough to fit
entirely in bufio's 4096-byte buffer, which is the exact scenario
where the missing Flush() produced a 0-byte file.

Also tests that invalid base64 input is handled gracefully.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: verify GenerateImage merges options into pipeline kwargs

Mocks the diffusers pipeline and calls GenerateImage with a source
image and negative prompt. Asserts that the pipeline receives the
image, negative_prompt, and num_inference_steps via kwargs — the
exact parameters that were silently dropped before the fix.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* fix: move kwargs.update(options) earlier in GenerateImage

Move the options merge right after self.options merge (L742) so that
image, negative_prompt, and num_inference_steps are available to all
downstream code paths including img2vid and txt2vid.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: convert processImageFile tests to ginkgo

Replace standard testing with ginkgo/gomega to be consistent with
the rest of the test suites in the project.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

---------

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-11 08:05:50 +01:00
LocalAI [bot]
270eb956c7 chore: ⬆️ Update ggml-org/llama.cpp to 10e5b148b061569aaee8ae0cf72a703129df0eab (#8946)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-11 08:04:09 +01:00
Ettore Di Giacinto
8818452d85 feat(ui): MCP Apps, mcp streaming and client-side support (#8947)
* Revert "fix: Add timeout-based wait for model deletion completion (#8756)"

This reverts commit 9e1b0d0c82.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: add mcp prompts and resources

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add client-side MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): allow to authenticate MCP servers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP Apps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update AGENTS

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: allow to collapse navbar, save state in storage

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add MCP button also to home page

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(chat): populate string content

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-11 07:30:49 +01:00
LocalAI [bot]
79f90de935 chore(model-gallery): ⬆️ update checksum (#8945)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-10 21:43:52 +01:00
LocalAI [bot]
bda826d005 chore(model gallery): 🤖 add 1 new models via gallery agent (#8939)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-10 18:09:11 +01:00
Ettore Di Giacinto
89076bab92 fix(ui): wrap struct to pass metadata fields
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-10 08:48:25 +00:00
LocalAI [bot]
199fe89cfe feat: Expand section index pages with comprehensive navigation (M7) (#8929)
feat: expand section index pages with comprehensive navigation (M7)

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-10 07:34:44 +01:00
LocalAI [bot]
de55ff3725 fix: correct grammar in CONTRIBUTING.md documentation section (#8932)
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-10 07:33:11 +01:00
dependabot[bot]
6bdfefda96 chore(deps): bump node from 22-slim to 25-slim (#8922)
Bumps node from 22-slim to 25-slim.

---
updated-dependencies:
- dependency-name: node
  dependency-version: 25-slim
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-10 07:31:03 +01:00
LocalAI [bot]
b48920ecf6 chore: ⬆️ Update ggml-org/llama.cpp to 23fbfcb1ad6c6f76b230e8895254de785000be46 (#8921)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-10 07:30:43 +01:00
LocalAI [bot]
515cd968ae chore: ⬆️ Update leejet/stable-diffusion.cpp to d6dd6d7b555c233bb9bc9f20b4751eb8c9269743 (#8925)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-10 07:29:54 +01:00
Ettore Di Giacinto
85f3558d22 feat(ui): add canvas mode, support history in agent chat (#8927)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-09 23:42:47 +01:00
dependabot[bot]
01bd3d8212 chore(deps): bump docker/login-action from 3 to 4 (#8918)
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:30:11 +01:00
dependabot[bot]
7f11f66b44 chore(deps): bump docker/build-push-action from 6 to 7 (#8919)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:29:51 +01:00
dependabot[bot]
82fd1cada0 chore(deps): bump actions/setup-node from 4 to 6 (#8920)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:29:36 +01:00
dependabot[bot]
2a351e1f0c chore(deps): bump docker/metadata-action from 5 to 6 (#8917)
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6.
- [Release notes](https://github.com/docker/metadata-action/releases)
- [Commits](https://github.com/docker/metadata-action/compare/v5...v6)

---
updated-dependencies:
- dependency-name: docker/metadata-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:27:02 +01:00
dependabot[bot]
585d6248f2 chore(deps): bump github.com/labstack/echo/v4 from 4.15.0 to 4.15.1 (#8914)
Bumps [github.com/labstack/echo/v4](https://github.com/labstack/echo) from 4.15.0 to 4.15.1.
- [Release notes](https://github.com/labstack/echo/releases)
- [Changelog](https://github.com/labstack/echo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/labstack/echo/compare/v4.15.0...v4.15.1)

---
updated-dependencies:
- dependency-name: github.com/labstack/echo/v4
  dependency-version: 4.15.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:26:39 +01:00
dependabot[bot]
ab4d965218 chore(deps): bump go.yaml.in/yaml/v2 from 2.4.3 to 2.4.4 (#8913)
Bumps [go.yaml.in/yaml/v2](https://github.com/yaml/go-yaml) from 2.4.3 to 2.4.4.
- [Commits](https://github.com/yaml/go-yaml/compare/v2.4.3...v2.4.4)

---
updated-dependencies:
- dependency-name: go.yaml.in/yaml/v2
  dependency-version: 2.4.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:25:49 +01:00
dependabot[bot]
6dddab7114 chore(deps): bump github.com/openai/openai-go/v3 from 3.24.0 to 3.26.0 (#8916)
Bumps [github.com/openai/openai-go/v3](https://github.com/openai/openai-go) from 3.24.0 to 3.26.0.
- [Release notes](https://github.com/openai/openai-go/releases)
- [Changelog](https://github.com/openai/openai-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-go/compare/v3.24.0...v3.26.0)

---
updated-dependencies:
- dependency-name: github.com/openai/openai-go/v3
  dependency-version: 3.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:23:29 +01:00
dependabot[bot]
e0b8501b61 chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.40.0 to 1.42.0 (#8915)
chore(deps): bump go.opentelemetry.io/otel/sdk/metric

Bumps [go.opentelemetry.io/otel/sdk/metric](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.42.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.42.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk/metric
  dependency-version: 1.42.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-09 22:23:15 +01:00
LocalAI [bot]
dfaaaad0f1 chore: Standardize CLI flag naming to kebab-case (M12) (#8912)
feat: standardize CLI flag naming to kebab-case with backwards compatibility

- Rename --p2ptoken to --p2p-token for consistency
- Add deprecation alias for old --p2ptoken flag
- Fix broken name tag in config check command
- Add runtime deprecation warning system (core/cli/deprecations.go)
- Document kebab-case naming convention in code comments
- Maintain full backwards compatibility via kong aliases

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-09 22:15:39 +01:00
LocalAI [bot]
6a633f2533 feat(swagger): update swagger (#8923)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-09 22:14:17 +01:00
Ettore Di Giacinto
2c81852773 chore(ui): use same chat interface for agent
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-09 17:28:44 +00:00
Ettore Di Giacinto
75428d8d1f fix(ui): minor visual enhancements (#8909)
- Fixes thinking box overflowing in other pages
- Shows loading icon to the active chats

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-09 18:02:26 +01:00
Ettore Di Giacinto
05a3d00924 chore(size): display size of HF models and allow to specify it from the gallery (#8907)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-09 17:38:14 +01:00
LocalAI [bot]
74db732873 feat: Redesign explorer and models pages with react-ui theme (#8903)
feat: redesign explorer and models pages with react-ui theme

- Updated logo and branding to match LocalAI's current design
- Applied react-ui color scheme and CSS variables throughout
- Added grid/list view toggle for models page
- Implemented enhanced filter chips with active state highlighting
- Added sort options and improved pagination
- Redesigned explorer page cards and token display
- Modernized navbar styling with sticky positioning
- Improved modal design with inline actions
- Ensured mobile-responsive design maintained

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-09 17:32:32 +01:00
Ettore Di Giacinto
a026277ab9 feat(mlx-distributed): add new MLX-distributed backend (#8801)
* feat(mlx-distributed): add new MLX-distributed backend

Add new MLX distributed backend with support for both TCP and RDMA for
model sharding.

This implementation ties in the discovery implementation already in
place, and re-uses the same P2P mechanism for the TCP MLX-distributed
inferencing.

The Auto-parallel implementation is inspired by Exo's
ones (who have been added to acknowledgement for the great work!)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose a CLI to facilitate backend starting

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: make manual rank0 configurable via model configs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add missing features from mlx backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-09 17:29:32 +01:00
LocalAI [bot]
734b6d391f chore(model gallery): 🤖 add 1 new models via gallery agent (#8904)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-09 17:01:56 +01:00
LocalAI [bot]
66a7bd38d2 chore(model gallery): 🤖 add 1 new models via gallery agent (#8902)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-09 15:59:33 +01:00
LocalAI [bot]
564d7f49dd chore(model gallery): 🤖 add 1 new models via gallery agent (#8901)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-09 15:27:48 +01:00
LocalAI [bot]
d200401e86 feat: Add --data-path CLI flag for persistent data separation (#8888)
feat: add --data-path CLI flag for persistent data separation

- Add LOCALAI_DATA_PATH environment variable and --data-path CLI flag
- Default data path: /data (separate from configuration directory)
- Automatic migration on startup: moves agent_tasks.json, agent_jobs.json, collections/, and assets/ from old config dir to new data path
- Backward compatible: preserves old behavior if LOCALAI_DATA_PATH is not set
- Agent state and job directories now use DataPath with proper fallback chain
- Update documentation with new flag and docker-compose example

This separates mutable persistent data (collectiondb, agents, assets, skills) from configuration files, enabling better volume mounting and data persistence in containerized deployments.

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-09 14:11:15 +01:00
LocalAI [bot]
95aef32492 docs: make examples repository link more prominent (#8895)
docs: make examples repository link more prominent in README

Add a badge-style button link to the examples repository in the main
README and expand examples/README.md with example categories and
quick-start links to help new users discover the examples repo.

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 09:26:16 +01:00
LocalAI [bot]
316bacdff5 feat: Add tabs to System view for Models and Backends (#8885)
feat: add tabs to System view for Models and Backends

- Split System view into two tabs: Models and Backends
- Use URL search params and localStorage for tab state persistence
- Optimize API calls to only fetch data for active tab
- Add tab counts in labels showing number of items
- Use existing tab CSS patterns from the codebase
- Maintain all existing functionality with improved UX

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-09 09:25:27 +01:00
LocalAI [bot]
9da24cdf85 docs: Update model compatibility documentation with missing backends (#8889)
* docs: Update model compatibility documentation with missing backends

Added the following backends to README.md and compatibility-table.md:
- vllm-omni: Multimodal vLLM with vision and audio support
- nemo: NVIDIA NeMo framework for speech models
- outetts: OuteTTS with voice cloning capabilities
- faster-qwen3-tts: Faster Qwen3 TTS implementation
- qwen-asr: Qwen automatic speech recognition
- voxcpm: VoxCPM speech understanding model
- whisperx: Enhanced Whisper with word-level transcription

These backends exist in the codebase (backend/index.yaml) but were missing
from the documentation. This update ensures accurate reflection of currently
supported backends in LocalAI.

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@example.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-09 09:22:34 +01:00
LocalAI [bot]
f06c02d10e chore: ⬆️ Update ggml-org/llama.cpp to 35bee031e17ed2b2e8e7278b284a6c8cd120d9f8 (#8872)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-08 22:25:04 +01:00
LocalAI [bot]
a29aaf9804 chore: Update README.md screenshot references and alt text (#8862)
docs: fix README screenshot references and clean up alt text

The Talk Interface was incorrectly using screenshot_tts.png (same as
Generate Audio) instead of screenshot_talk.png. Also standardized
alt text across all screenshot references for consistency.

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 22:24:37 +01:00
Ettore Di Giacinto
b2f81bfa2e feat(functions): add peg-based parsing and allow backends to return tool calls directly (#8838)
* feat(functions): add peg-based parsing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: support returning toolcalls directly from backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: do run PEG only if backend didn't send deltas

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-08 22:21:57 +01:00
LocalAI [bot]
b57a6e42f1 feat: Add documentation URLs to CLI help text (#8874)
feat: add documentation URLs to CLI help text

- Add link to main documentation (https://localai.io/)
- Add link to getting started guide
- Add link to GitHub issues for support
- Improves user experience by providing direct access to resources

Reference: UX Review Issue L5

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 22:13:01 +01:00
LocalAI [bot]
9297074caa docs: expand GPU acceleration guide with L4T, multi-GPU, monitoring, and troubleshooting (#8858)
- Expand multi-GPU section to cover llama.cpp (CUDA_VISIBLE_DEVICES,
  HIP_VISIBLE_DEVICES) in addition to diffusers
- Add NVIDIA L4T/Jetson section with quick start commands and cross-reference
  to the dedicated ARM64 page
- Add GPU monitoring section with vendor-specific tools (nvidia-smi, rocm-smi,
  intel_gpu_top)
- Add troubleshooting section covering common issues: GPU not detected, CPU
  fallback, OOM errors, unsupported ROCm targets, SYCL mmap hang
- Replace "under construction" warning with useful cross-references to related
  docs (container images, VRAM management)

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 21:59:57 +01:00
LocalAI [bot]
2133031b47 feat: Create comprehensive troubleshooting guide (M1 task) (#8856)
* feat: create comprehensive troubleshooting guide (M1 task)

- Consolidates troubleshooting information from scattered documentation
- Covers installation, model loading, GPU/memory, API, performance, Docker, and network issues
- Includes diagnostic commands and step-by-step solutions
- Organized by category for easy navigation

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-08 21:58:32 +01:00
LocalAI [bot]
e026b513b2 feat: add MIT license badge to README.md (#8871)
feat: add MIT license badge to README.md header

- Add Shields.io license badge showing 'License: MIT'
- Place badge in header section with other badges
- Link badge to LICENSE file
- Follows existing badge format (for-the-badge style)

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 21:17:36 +01:00
LocalAI [bot]
2334556a8f feat(cli): add configurable backend image fallback tags via CLI options (#8817)
* feat(cli): add configurable backend image fallback tags via CLI options

- Add three new CLI flags: --backend-images-release-tag, --backend-images-branch-tag, --backend-dev-suffix
- Add corresponding fields to SystemState for passing configuration
- Add WithBackendImagesReleaseTag, WithBackendImagesBranchTag, WithBackendDevSuffix options
- Modify getFallbackTagValues to use SystemState instead of environment variables
- Pass CLI options through to SystemState in run.go

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* fix: add missing os import in core/gallery/backends.go

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-08 21:16:37 +01:00
LocalAI [bot]
05b7cce633 feat: add Events column to Agents list page (#8870)
- Add 'Events' column header between 'Status' and 'Actions'
- Fetch observable counts for each agent using /api/agents/<name>/observables
- Display events count as clickable link navigating to agent status page
- Events count updates every 5 seconds with agent refresh interval
- Shows '0' if API call fails for an agent

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 21:15:29 +01:00
LocalAI [bot]
ce33b7e6f8 docs: add comprehensive development setup instructions to CONTRIBUTING.md (H7) (#8860)
* docs: add comprehensive development setup instructions to CONTRIBUTING.md

- Expand prerequisites with Go version requirements and installation links
- Add system dependencies for Ubuntu/Debian, CentOS/RHEL/Fedora, macOS, and Windows
- Document build commands with explanations and key build variables
- Add environment variables section with useful development env vars
- Include development workflow guidelines (branch naming, commit format, PR process)
- Enhance testing section with per-package and focused test instructions

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-08 18:27:00 +01:00
LocalAI [bot]
9090bca920 feat: Add documentation for undocumented API endpoints (#8852)
* feat: add documentation for undocumented API endpoints

Creates comprehensive documentation for 8 previously undocumented endpoints:
- Voice Activity Detection (/v1/vad)
- Video Generation (/video)
- Sound Generation (/v1/sound-generation)
- Backend Monitor (/backend/monitor, /backend/shutdown)
- Token Metrics (/tokenMetrics)
- P2P endpoints (/api/p2p/* - 5 sub-endpoints)
- System Info (/system, /version)

Each documentation file includes HTTP method, request/response schemas,
curl examples, sample JSON responses, and error codes.

* docs: remove token-metrics endpoint documentation per review feedback

The token-metrics endpoint is not wired into the HTTP router and
should not be documented per reviewer request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: move system-info documentation to reference section

Per review feedback, system-info endpoint docs are better suited
for the reference section rather than features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:59:33 +01:00
LocalAI [bot]
ec8f2d7683 fix: Correct Talk Interface screenshot reference in README.md (H6) (#8857)
fix: correct Talk Interface screenshot reference in README.md (H6)

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 17:58:59 +01:00
LocalAI [bot]
ff02e7ff5b docs: clarify SECURITY.md version support table with specific ranges and EOL dates (#8861)
* docs: clarify SECURITY.md version support table with specific ranges and EOL dates

- Add detailed version support table with 3.x (actively supported), 2.x (security fixes until Dec 31, 2026), and 1.x (EOL since Jan 1, 2024)
- Define what each support level means for users
- Add migration guidance for users on older versions
- Replace vague version ranges with specific, actionable information

Signed-off-by: localai-bot <localai-bot@noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-08 17:58:19 +01:00
LocalAI [bot]
85e4871d4d chore: ⬆️ Update leejet/stable-diffusion.cpp to c8fb3d245858d495be1f140efdcfaa0d49de41e5 (#8841)
* chore: ⬆️ update stable-diffusion.cpp to `c8fb3d245858d495be1f140efdcfaa0d49de41e5`

Update stablediffusion-ggml to include fix for SD1 Pix2Pix issue
(leejet/stable-diffusion.cpp#1329).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: localai-bot <localai-bot@noreply.github.com>

* fix: address CI failures in stablediffusion update

Signed-off-by: localai-bot <localai-bot@noreply.github.com>

* fix: resolve remaining CI failures in stablediffusion update

- Move flow_shift to global scope so gen_image() can access the value
  set during load_model() (was causing compilation error)
- Fix sd_type_str array: TQ1_0 should be at index 34, TQ2_0 at index 35
  to match upstream SD_TYPE_TQ1_0=34, SD_TYPE_TQ2_0=35 enum values

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 09:53:08 +01:00
LocalAI [bot]
364ad30a2f feat(downloader): add HF_MIRROR environment variable support (#8847)
- Added HF_MIRROR env var to configure HuggingFace mirror URLs
- HF_MIRROR takes precedence over HF_ENDPOINT for simpler mirror config
- Supports both full URLs (https://hf-mirror.com) and simple hostnames (hf-mirror.com)
- Auto-adds https:// if no scheme is provided
- Also supports HF env var as an alias for HF_MIRROR

Closes #8414

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-08 09:34:44 +01:00
Ettore Di Giacinto
d21369ad7b Update shell completion documentation URL
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-08 09:33:36 +01:00
LocalAI [bot]
efd402207c feat: Add shell completion support for bash, zsh, and fish (#8851)
feat: add shell completion support for bash, zsh, and fish

- Add core/cli/completion.go with dynamic completion script generation
- Add core/cli/completion_test.go with unit tests
- Modify cmd/local-ai/main.go to support completion command
- Modify core/cli/cli.go to add Completion subcommand
- Add docs/content/features/shell-completion.md with installation instructions

The completion scripts are generated dynamically from the Kong CLI model,
so they automatically include all commands, subcommands, and flags.

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 09:32:39 +01:00
LocalAI [bot]
6a928e70bc docs: add Table of Contents to README.md (#8846)
docs: add Table of Contents to README.md for easier navigation

- Add collapsible TOC with anchor links to all major sections
- Include H2 sections and important H3 subsections
- Place TOC after main description, before Local Stack Family
- Use proper markdown anchor link format

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 09:32:26 +01:00
LocalAI [bot]
36c184175a docs: Add comprehensive API error reference documentation (#8848)
docs: add comprehensive API error reference documentation

Document all error response formats (OpenAI, Anthropic, Open Responses),
HTTP status codes, per-endpoint error scenarios, and client error handling
examples based on actual error handling code in the codebase.

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 08:54:51 +01:00
LocalAI [bot]
970079e68a fix: Remove debug print statement from soundgeneration.go (C2) (#8843)
fix: remove debug fmt.Printf statement from soundgeneration.go (#C2)

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 08:49:29 +01:00
Weathercold
f347495de9 fix(qwen-tts): duplicate instruct argument in voice design mode (#8842)
Don't pass instruct because it is added to kwargs

Fixes the error `qwen_tts.inference.qwen3_tts_model.Qwen3TTSModel.generate_voice_design() got multiple values for keyword argument 'instruct'`

Signed-off-by: Weathercold <weathercold.scr@proton.me>
2026-03-08 08:48:22 +01:00
LocalAI [bot]
23aa7aefed chore(docs): Populate coding guidelines in CONTRIBUTING.md (#8840)
docs: populate coding guidelines in CONTRIBUTING.md

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
2026-03-08 08:20:39 +01:00
LocalAI [bot]
1296167f84 chore: ⬆️ Update ggml-org/llama.cpp to c5a778891ba0ddbd4cbb507c823f970595b1adc2 (#8837)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-07 23:28:06 +01:00
Ettore Di Giacinto
326f4bf4bc chore(ci): drop voxcpm-cpu on aarm64 (torchcodec is not supported)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-07 22:24:56 +00:00
LocalAI [bot]
06125c338a feat: update descriptions for first 9 models in gallery/index.yaml (#8831)
feat: update descriptions for first 9 models in gallery/index.yaml from HuggingFace model cards

- Updated qwen3.5-27b-claude-4.6-opus-reasoning-distilled-i1 with reasoning capabilities
- Updated qwen3.5-4b-claude-4.6-opus-reasoning-distilled with reasoning capabilities
- Updated q3.5-bluestar-27b with fine-tuned variant description
- Updated qwen3.5-9b with multimodal capabilities
- Updated qwen3.5-397b-a17b with large-scale model description
- Updated qwen3.5-27b with performance-efficiency balance
- Updated qwen3.5-122b-a10b with MoE architecture description
- Updated qwen3.5-35b-a3b with MoE architecture description
- Updated qwen3-next-80b-a3b-thinking with next-gen model description

Descriptions sourced from HuggingFace model API metadata.

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-07 12:34:29 +01:00
LocalAI [bot]
73158600c8 chore(model gallery): 🤖 add 1 new models via gallery agent (#8830)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-07 09:17:30 +01:00
LocalAI [bot]
40b8f6270e chore(model gallery): 🤖 add 1 new models via gallery agent (#8828)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-07 08:46:57 +01:00
Ettore Di Giacinto
ac48867b7d feat: add agentic management (#8820)
* feat: add standalone and agentic functionalities

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose agents via responses api

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-07 00:03:08 +01:00
LocalAI [bot]
e1df6807dc chore: ⬆️ Update ggml-org/llama.cpp to 566059a26b0ce8faec4ea053605719d399c64cc5 (#8822)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-06 23:53:23 +01:00
LocalAI [bot]
ab315f2725 feat: Add LOCALAI_DISABLE_MCP environment variable to disable MCP support (#8816)
* feat: Add LOCALAI_DISABLE_MCP environment variable to disable MCP support

- Added DisableMCP field to RunCMD struct in core/cli/run.go
- Added LOCALAI_DISABLE_MCP environment variable support
- Added DisableMCP field to ApplicationConfig struct
- Added DisableMCP AppOption function
- Updated MCP endpoint routing to check appConfig.DisableMCP
- When LOCALAI_DISABLE_MCP is set to true/1/yes, MCP endpoints are not registered

When set, all MCP functionality is disabled and appropriate error messages
are returned to users.

Use Cases:
- Security-conscious deployments where MCP is not needed
- Reducing attack surface
- Compliance requirements that prohibit certain protocol support

Environment variable: LOCALAI_DISABLE_MCP=true

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: Add documentation for LOCALAI_DISABLE_MCP environment variable

- Add section explaining how to disable MCP support using environment variable
- Document use cases for disabling MCP
- Provide examples for CLI and Docker usage

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-06 20:44:03 +01:00
BitToby
96efa4fce0 feat: add WebSocket mode support for the response api (#8676)
* feat: add WebSocket mode support for the response api

Signed-off-by: bittoby <218712309+bittoby@users.noreply.github.com>

* test: add e2e tests for WebSocket Responses API

Signed-off-by: bittoby <218712309+bittoby@users.noreply.github.com>

---------

Signed-off-by: bittoby <218712309+bittoby@users.noreply.github.com>
2026-03-06 10:36:59 +00:00
Ettore Di Giacinto
e82b861961 fix(ui): do not lock all components during load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-06 09:35:01 +01:00
LocalAI [bot]
6a15be377a chore: Add LTX-2.3 model to gallery (#8805)
feat: Add LTX-2.3 model to gallery

- Add new entry for LTX-2.3 from Lightricks
- Follows the same structure as existing LTX-2 entry
- References: https://huggingface.co/Lightricks/LTX-2.3

Co-authored-by: localai-bot <localai-bot@example.com>
2026-03-06 09:22:31 +01:00
LocalAI [bot]
9e1b0d0c82 fix: Add timeout-based wait for model deletion completion (#8756)
* fix: Add timeout-based wait for model deletion completion

- Replace simple polling loop with context-based timeout (5 minutes)
- Use select statement for cleaner timeout handling
- Added proper logging for timeout case
- This addresses the code review comment about using context with timeout instead of dangerous polling approach

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* fix: replace goto statements with break in model deletion loop (fixes CI compilation error)

Signed-off-by: LocalAI [bot] <localai-bot@noreply.github.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: LocalAI [bot] <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: LocalAI [bot] <localai-bot@noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-06 01:07:15 +01:00
Ettore Di Giacinto
580517f9db feat: pass-by metadata to predict options (#8795)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-05 22:50:10 +01:00
LocalAI [bot]
0cf7c18177 chore: ⬆️ Update ggml-org/llama.cpp to a0ed91a442ea6b013bd42ebc3887a81792eaefa1 (#8797)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-05 22:49:45 +01:00
LocalAI [bot]
ac91413eb2 chore: ⬆️ Update ggml-org/whisper.cpp to 30c5194c9691e4e9a98b3dea9f19727397d3f46e (#8796)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-05 22:49:32 +01:00
Ettore Di Giacinto
86680ff8bc fix(ui): fix /app redirect
Do not handle redirect individually, but serve the app directly in /

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-05 21:43:46 +00:00
Ettore Di Giacinto
09ddaf94b2 feat(ui): move to React for frontend (#8772)
* feat(ui): move to React

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add import model

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* syntax highlight

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Minor fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-05 21:47:12 +01:00
LocalAI [bot]
61c139fa7d feat: Rename 'Whisper' model type to 'STT' in UI (#8785)
* feat: Rename 'Whisper' model type to 'STT' in UI

- Updated models.html: Changed 'Whisper' filter button to 'STT'
- Updated talk.html: Changed 'Whisper Model' to 'STT Model'
- Updated backends.html: Changed 'Whisper' to 'STT'
- Updated talk.js: Renamed getWhisperModel() to getSTTModel(),
  sendAudioToWhisper() to sendAudioToSTT(), and whisperModelSelect to sttModelSelect

This change makes the UI more consistent with the model category naming,
where all speech-to-text models (including Whisper, Parakeet, Moonshine,
WhisperX, etc.) are grouped under the 'STT' (Speech-to-Text) category.

Fixes #8776

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>

* Rename whisperModelSelect to sttModelSelect in talk.html

As requested by maintainer mudler in PR review, replacing all
whisperModelSelect occurrences with sttModelSelect since the
model type was renamed from Whisper to STT.

Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>

---------

Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
2026-03-05 09:51:47 +01:00
LocalAI [bot]
9fc77909e0 fix: Add vllm-omni backend to video generation model detection (#8659) (#8781)
fix: Add vllm-omni backend to video generation model detection

- Include vllm-omni in the list of backends that support FLAG_VIDEO
- This allows models like vllm-omni-wan2.2-t2v to appear in the video model selector UI
- Fixes issue #8659 where video generation models using vllm-omni backend were not showing in the dropdown

Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
2026-03-05 01:04:47 +01:00
LocalAI [bot]
3dce20b026 docs: add autonomous development team section to README (#8780)
* docs: add autonomous development team section to README

- Add blog post link to Media, Blogs, Social section
- Add new section about autonomous AI agent maintenance team
- Include links to reports.localai.io and project board
- Reference the experiment blog post

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 23:52:36 +01:00
LocalAI [bot]
f25e450414 chore: ⬆️ Update ggml-org/llama.cpp to 24d2ee052795063afffc9732465ca1b1c65f4a28 (#8777)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-04 23:25:48 +01:00
LocalAI [bot]
ee06892cd5 chore(model-gallery): ⬆️ update checksum (#8778)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-04 22:02:14 +01:00
Ettore Di Giacinto
c25dfcc9b4 Update model and OPENAI_MODE in gallery-agent.yaml
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 18:28:53 +01:00
Ettore Di Giacinto
016738a787 Remove descriptions from model entries in index.yaml
Removed model descriptions for several entries in the gallery.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 18:26:45 +01:00
LocalAI [bot]
2938fe5cad chore(model gallery): 🤖 add 1 new models via gallery agent (#8770)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-04 15:55:08 +01:00
Andres
454d8adc76 feat(qwen-tts): Support using multiple voices (#8757)
* Add support for multiple voice clones in Qwen TTS

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Add voice prompt caching and generation logs to see generation time

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 09:47:21 +01:00
LocalAI [bot]
6002c940a9 chore: ⬆️ Update ggml-org/llama.cpp to ecd99d6a9acbc436bad085783bcd5d0b9ae9e9e9 (#8762)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 08:08:37 +01:00
Ettore Di Giacinto
8e6fe4531e chore(ci): update environment variable for external backend
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 22:12:37 +01:00
Ettore Di Giacinto
5203fb37a6 fix(ci): remove erroneus abspath call
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-03 19:07:39 +01:00
LocalAI [bot]
eb2a656575 fix: return full embedding dimensions instead of truncating trailing zeros (#8721) (#8755)
fix: return full embedding dimensions instead of truncating trailing zeros

- Remove the logic that strips trailing zeros from embeddings
- Trailing zeros may be valid values in some embedding models
- This fixes the issue where embeddings like jina-v3 returned
  only 1/4 of their native dimensions (256 instead of 1024)
- The truncation was causing vector database dimension mismatch errors
- Fixes issue #8721

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-03 17:08:16 +01:00
LocalAI [bot]
6e5a58ca70 feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes issue where VRAM was not freed when stopping models, which
could lead to memory exhaustion when running multiple models
sequentially.

* feat: Add Free RPC to backend.proto for VRAM cleanup\n\n- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto\n- This RPC is required to properly expose the Free() method\n  through the gRPC interface for VRAM resource cleanup\n\nRefs: PR #8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 12:39:06 +01:00
Ettore Di Giacinto
1c8db3846d chore(faster-qwen3-tts): Add anyio to requirements.txt
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 09:43:29 +01:00
LocalAI [bot]
5139719d59 chore(model gallery): 🤖 add 1 new models via gallery agent (#8743)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-03 09:33:11 +01:00
LocalAI [bot]
d846ad3a84 chore: ⬆️ Update ggml-org/llama.cpp to 4d828bd1ab52773ba9570cc008cf209eb4a8b2f5 (#8727)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-02 23:22:28 +01:00
dependabot[bot]
50cf3ff37f chore(deps): bump github.com/google/go-containerregistry from 0.20.7 to 0.21.1 (#8736)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.20.7 to 0.21.1.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.20.7...v0.21.1)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:44:40 +01:00
dependabot[bot]
74dff72551 chore(deps): bump go.opentelemetry.io/otel/metric from 1.40.0 to 1.41.0 (#8735)
Bumps [go.opentelemetry.io/otel/metric](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.41.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/metric
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:44:28 +01:00
dependabot[bot]
c4eeab0f7c chore(deps): bump go.opentelemetry.io/otel from 1.40.0 to 1.41.0 (#8734)
Bumps [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.41.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.41.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.41.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:44:15 +01:00
dependabot[bot]
432e182af0 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.3.0 to 1.4.0 (#8733)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:44:03 +01:00
dependabot[bot]
d1fcd19cd3 chore(deps): bump github.com/openai/openai-go/v3 from 3.19.0 to 3.24.0 (#8732)
Bumps [github.com/openai/openai-go/v3](https://github.com/openai/openai-go) from 3.19.0 to 3.24.0.
- [Release notes](https://github.com/openai/openai-go/releases)
- [Changelog](https://github.com/openai/openai-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-go/compare/v3.19.0...v3.24.0)

---
updated-dependencies:
- dependency-name: github.com/openai/openai-go/v3
  dependency-version: 3.24.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:43:53 +01:00
dependabot[bot]
dc550395cb chore(deps): bump actions/upload-artifact from 6 to 7 (#8730)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v6...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:43:39 +01:00
dependabot[bot]
00ecc60372 chore(deps): bump actions/download-artifact from 7 to 8 (#8729)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v7...v8)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 21:43:27 +01:00
LocalAI [bot]
11443dc299 chore(model-gallery): ⬆️ update checksum (#8728)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-02 21:43:12 +01:00
LocalAI [bot]
6d182281cf fix: allow reranking models configured with known_usecases (#8681)
When a model is configured with 'known_usecases: [rerank]' in the YAML
config, the reranking endpoint was not being matched because:
1. The GuessUsecases function only checked for backend == 'rerankers'
2. The syncKnownUsecasesFromString() was not being called when loading
   configs via yaml.Unmarshal in readModelConfigsFromFile

This fix:
1. Updates GuessUsecases to also check if Reranking is explicitly set to
   true in the model config (in addition to checking backend type)
2. Adds syncKnownUsecasesFromString() calls after yaml.Unmarshal in
   readModelConfigsFromFile to ensure known_usecases are properly parsed

Fixes #8658

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-02 19:00:18 +01:00
Ettore Di Giacinto
8f0c3cec39 chore(ci): disable CI actions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-02 14:48:00 +01:00
LocalAI [bot]
2dd4e7cdc3 fix(qwen-tts): ensure all requirements files end with newline (#8724)
- Add trailing newline to all requirements*.txt files in qwen-tts backend
- This ensures proper file formatting and prevents potential issues with
  package installation tools that expect newline-terminated files
2026-03-02 13:56:11 +01:00
LocalAI [bot]
eca2c6e01c fix: Implement responsive line wrapping for model names (#8209) (#8720)
fix: Implement responsive line wrapping for model names on home page

- Changed model name display from truncate to break-words
- Increased max-width from 100px to 200px to allow more text
- This fixes issue #8209 for responsive text wrapping on smaller screens

Fixes: #8209

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-02 13:54:58 +01:00
LocalAI [bot]
b61536c0f4 chore: ⬆️ Update ggml-org/llama.cpp to 319146247e643695f94a558e8ae686277dd4f8da (#8707)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-02 10:08:51 +01:00
LocalAI [bot]
02d287a297 fix(ci): correct transformer backend path typo (#8712)
* fix(ci): correct transformer backend path typo

- Fix typo: 'transformer' -> 'transformers' in .github/workflows/test.yml
- The original PR #8710 had a typo where 'transformers' was written as 'transformer'
- This caused the build to fail as the directory is actually named 'transformers'
- References: https://github.com/mudler/LocalAI/pull/8710

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-02 10:08:28 +01:00
LocalAI [bot]
8b430c577b feat: Add debug logging for pocket-tts voice issue #8244 (#8715)
Adding debug logging to help investigate the pocket-tts custom voice
finding issue (Issue #8244). This is a first step to understand how
voices are being loaded and where the failure occurs.

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-02 09:24:59 +01:00
LocalAI [bot]
0063e5d68f feat(swagger): update swagger (#8706)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 21:33:19 +01:00
Ettore Di Giacinto
c7c4a20a9e fix: retry when LLM returns empty messages (#8704)
* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* retry instead of re-computing a response

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-01 21:32:38 +01:00
LocalAI [bot]
94539f3992 chore(model gallery): 🤖 add 1 new models via gallery agent (#8698)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 16:54:01 +01:00
LocalAI [bot]
525278658d chore(model gallery): 🤖 add 1 new models via gallery agent (#8696)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 16:19:38 +01:00
LocalAI [bot]
919f801e25 chore(model gallery): 🤖 add 1 new models via gallery agent (#8695)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 15:58:02 +01:00
LocalAI [bot]
362eb261c5 chore(model gallery): 🤖 add 1 new models via gallery agent (#8694)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 15:40:43 +01:00
LocalAI [bot]
d407f4ead5 chore(model gallery): 🤖 add 1 new models via gallery agent (#8693)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 15:25:08 +01:00
Ettore Di Giacinto
1fc8ad854f fix(toolcall): consider also literal \n between tags
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-01 11:20:46 +01:00
Loryan Strant
f49a8edd87 docs: Update Home Assistant links in README.md (#8688)
Update Home Assistant links in README.md

Signed-off-by: Loryan Strant <51473494+loryanstrant@users.noreply.github.com>
2026-03-01 08:28:58 +01:00
Ettore Di Giacinto
510b830d2b fix: simplify CI steps, fix gallery agent (#8685)
chore: simplify CI steps, fix gallery agent

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-01 01:00:30 +01:00
LocalAI [bot]
ddb36468ed chore: ⬆️ Update ggml-org/llama.cpp to 05728db18eea59de81ee3a7699739daaf015206b (#8683)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-03-01 00:48:26 +01:00
Ettore Di Giacinto
983db7bedc feat(ui): add model size estimation (#8684)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-28 23:03:47 +01:00
LocalAI [bot]
b260378694 docs: add TLS reverse proxy configuration guide (#8673)
* docs: add TLS reverse proxy configuration guide

Add documentation explaining how to use LocalAI behind a TLS
termination reverse proxy (HAProxy, Apache, Nginx).

The documentation covers:
- How LocalAI detects HTTPS via X-Forwarded-Proto header
- Required headers that must be forwarded
- Configuration examples for HAProxy, Apache, and Nginx
- Sub-path serving configuration
- Testing and troubleshooting guide

Fixes: Issue #7176 - Web UI broken behind TLS reverse proxy

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: remove non-existent --base-url option from sub-path section

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 23:02:17 +01:00
LocalAI [bot]
b10443ab5a feat(models): add model storage size display and RAM warning (#8675)
Add model storage size display and RAM warning in Models tab

- Backend (ui_api.go):
  - Added getDirectorySize() helper function to calculate total size of model files
  - Added storageSize, ramTotal, ramUsed, ramUsagePercent to /api/models endpoint response
  - Uses xsysinfo.GetSystemRAMInfo() for RAM information

- Frontend (models.html):
  - Added storageSize, ramTotal, ramUsed, ramUsagePercent to Alpine.js data object
  - Added formatBytes() helper for human-readable byte formatting
  - Display storage size in hero header with blue indicator
  - Show warning banner when storage exceeds RAM (model too large for system)

Addresses: https://github.com/mudler/LocalAI/issues/6251

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 22:05:01 +01:00
LocalAI [bot]
b647b6caf1 fix: properly sync model selection dropdown in video generation UI (#8680)
fix(video): initialize model selection dropdown with current model value

The Alpine.js link variable was starting empty, causing the dropdown
selection to not reflect the currently selected model. This fix initializes
the link variable with the current model value from the template (e.g.,
video/{{.Model}}), following the same pattern used in image.html.

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 13:11:33 +01:00
LocalAI [bot]
c187b160e7 fix(gallery): clean up partially downloaded backend on installation failure (#8679)
When a backend download fails (e.g., on Mac OS with port conflicts causing
connection issues), the backend directory is left with partial files.
This causes subsequent installation attempts to fail with 'run file not
found' because the sanity check runs on an empty/partial directory.

This fix cleans up the backend directory when the initial download fails
before attempting fallback URIs or mirrors. This ensures a clean state
for retry attempts.

Fixes: #8016

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 13:10:53 +01:00
LocalAI [bot]
42e580bed0 fix: whisper breaking on cuda-13 (use absolute path for CUDA directory detection) (#8678)
fix: use absolute path for CUDA directory detection

The capability detection was using a relative path 'usr/local/cuda-13'
which doesn't work when LocalAI is run from a different working directory.
This caused whisper (and other backends) to fail on CUDA-13 containers
because the system incorrectly detected 'nvidia' capability instead of
'nvidia-cuda-13', leading to wrong backend selection (cuda12-whisper
instead of cuda13-whisper).

Fixes: https://github.com/mudler/LocalAI/issues/8033

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 09:10:40 +01:00
LocalAI [bot]
5e13193d84 docs: add CDI driver config for NVIDIA GPU in containers (fix #8108) (#8677)
This addresses issue #8108 where the legacy nvidia driver configuration
causes container startup failures with newer NVIDIA Container Toolkit versions.

Changes:
- Update docker-compose example to show both CDI (recommended) and legacy
  nvidia driver options
- Add troubleshooting section for 'Auto-detected mode as legacy' error
- Document the fix for nvidia-container-cli 'invalid expression' errors

The root cause is a Docker/NVIDIA Container Toolkit configuration issue,
not a LocalAI code bug. The error occurs during the container runtime's
prestart hook before LocalAI starts.

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 08:42:53 +01:00
Ettore Di Giacinto
1c5dc83232 chore(deps): bump llama.cpp to 'ecbcb7ea9d3303097519723b264a8b5f1e977028' (#8672)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-28 00:33:56 +01:00
LocalAI [bot]
73b997686a chore: ⬆️ Update ggml-org/whisper.cpp to 9453b4b9be9b73adfc35051083f37cefa039acee (#8671)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-27 21:28:48 +00:00
Ettore Di Giacinto
00abf1be1f fix(qwen3.5): add qwen3.5 preset and mimick llama.cpp's PEG (#8668)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-27 12:15:00 +01:00
LocalAI [bot]
959458f0db fix(gallery): add fallback URI resolution for backend installation (#8663)
* fix(gallery): add fallback URI resolution for backend installation

When a backend installation fails (e.g., due to missing 'latest-' tag),
try fallback URIs in order:
1. Replace 'latest-' with 'master-' in the URI
2. If that fails, append '-development' to the backend name

This fixes the issue where backend index entries don't match the
repository tags. For example, installing 'ace-step' tries to download
'latest-gpu-nvidia-cuda-13-ace-step' but only 'master-gpu-nvidia-cuda-13-ace-step'
exists in the quay.io registry.

Fixes: #8437
Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>

* chore(gallery): make fallback URI patterns configurable via env vars

---------

Signed-off-by: localai-bot <139863280+localai-bot@users.noreply.github.com>
2026-02-27 10:56:33 +01:00
LocalAI [bot]
dfc6efb88d feat(backends): add faster-qwen3-tts (#8664)
* feat(backends): add faster-qwen3-tts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: this backend is CUDA only

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add requirements-install.txt with setuptools for build isolation

The faster-qwen3-tts backend requires setuptools to build packages
like sox that have setuptools as a build dependency. This ensures
the build completes successfully in CI.

Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-27 08:16:51 +01:00
LocalAI [bot]
65082b3a6f fix: Add named volumes for Windows Docker compatibility (#8661)
- Added named volumes (models, images) to docker-compose.yaml
- Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml
- Changed bind mounts to named volumes for Windows compatibility

Fixes #8455

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-26 23:18:53 +01:00
Ettore Di Giacinto
0483d47674 Change condition for dependabot job in workflow
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-26 23:17:33 +01:00
LocalAI [bot]
8ad40091a6 chore: ⬆️ Update ggml-org/llama.cpp to 723c71064da0908c19683f8c344715fbf6d986fd (#8660)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 21:34:47 +00:00
LocalAI [bot]
8bfe458fbc fix: change file permissions from 0600 to 0644 in InstallModel (#8657)
Closes #8119

When installing models from the gallery, files are created with 0600
permissions (owner read/write only), making them unreadable by the
LocalAI server when running as a different user.

This fix changes the permissions to 0644 (owner read/write, group/others
read), allowing the server to read model files regardless of the user
it runs as.

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-26 09:38:54 +01:00
Ettore Di Giacinto
657ba8cdad fix(chat): do not send thinking/reasoning messages to the LLM (#8656)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-26 00:06:35 +01:00
LocalAI [bot]
fb86f6461d chore: ⬆️ Update ggml-org/llama.cpp to 3769fe6eb70b0a0fbb30b80917f1caae68c902f7 (#8655)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-26 00:05:03 +01:00
LocalAI [bot]
1027c487a6 fix: reload model after editing YAML config (issue #8647) (#8652)
fix: reload model configuration after editing (issue #8647)

- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-25 22:18:42 +01:00
LocalAI [bot]
bb226d1eaa feat(swagger): update swagger (#8654)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 21:52:35 +01:00
Ettore Di Giacinto
b032cf489b fix(chatterbox): add support for cuda13/aarch64 (#8653)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-25 21:51:44 +01:00
Copilot
3ac7301f31 Add sample_rate support to TTS API via post-processing resampling (#8650)
* Initial plan

* Add TTS sample_rate support via AudioResample post-processing

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 16:36:27 +01:00
dependabot[bot]
c4783a0a05 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm (#8635)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:17:32 +01:00
dependabot[bot]
c44f03b882 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers (#8636)
chore(deps): bump grpcio in /backend/python/rerankers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:57 +01:00
dependabot[bot]
eeec92af78 chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers (#8638)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.2...v5.2.3)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:41 +01:00
dependabot[bot]
842033b8b5 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers (#8640)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:55 +01:00
dependabot[bot]
a2941228a7 chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template (#8641)
chore(deps): bump grpcio in /backend/python/common/template

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:43 +01:00
dependabot[bot]
791e6b84ee chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui (#8642)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:30 +01:00
LocalAI [bot]
d845c39963 docs: add Podman installation documentation (#8646)
* docs: add Podman installation documentation

- Add new podman.md with comprehensive installation and usage guide
- Cover installation on multiple platforms (Ubuntu, Fedora, Arch, macOS, Windows)
- Document GPU support (NVIDIA CUDA, AMD ROCm, Intel, Vulkan)
- Include rootless container configuration
- Document Docker Compose with podman-compose
- Add troubleshooting section for common issues
- Link to Podman documentation in installation index
- Update image references to use Docker Hub and link to docker docs
- Change YAML heredoc to EOF in compose.yaml example
- Add curly brackets to notice shortcode and fix link

Closes #8645

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: merge Docker and Podman docs into unified Containers guide

Following the review comment, we have merged the Docker and Podman
documentation into a single 'Containers' page that covers both container
engines. The Docker and Podman pages now redirect to this unified guide.

Changes:
- Added new docs/content/installation/containers.md with combined Docker/Podman guide
- Updated docs/content/installation/docker.md to redirect to containers
- Updated docs/content/installation/podman.md to redirect to containers
- Updated docs/content/installation/_index.en.md to link to containers

Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

* docs: remove podman.md as docs are merged into containers.md

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-25 08:13:55 +01:00
LocalAI [bot]
1331e23b67 chore: ⬆️ Update ggml-org/llama.cpp to 418dea39cea85d3496c8b04a118c3b17f3940ad8 (#8649)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-25 00:04:48 +00:00
LocalAI [bot]
36ff2a0138 fix(webui): use different icon for System nav item (#8648)
Change the System nav item icon from fas fa-server to fas fa-desktop
to distinguish it from the Backends nav item which still uses fa-server.

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-24 17:10:58 +01:00
LocalAI [bot]
db6ba4ef07 chore: remove install.sh script and documentation references (#8643)
* chore: remove install.sh script and documentation references

- Delete docs/static/install.sh (broken installer causing issues)
- Remove One-Line Installer section from linux.md documentation
- Remove install.sh references from installation/_index.en.md
- Remove install.sh warning and commands from README.md

Closes #8032

* fix: add missing closing braces to notice shortcode
2026-02-24 08:36:25 +01:00
dependabot[bot]
d19dcac863 chore(deps): bump github.com/mudler/cogito from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1 (#8632)
chore(deps): bump github.com/mudler/cogito

Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/commits/v0.9.1)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.9.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-24 01:29:47 +00:00
dependabot[bot]
fd42675bec chore(deps): bump goreleaser/goreleaser-action from 6 to 7 (#8634)
Bumps [goreleaser/goreleaser-action](https://github.com/goreleaser/goreleaser-action) from 6 to 7.
- [Release notes](https://github.com/goreleaser/goreleaser-action/releases)
- [Commits](https://github.com/goreleaser/goreleaser-action/compare/v6...v7)

---
updated-dependencies:
- dependency-name: goreleaser/goreleaser-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:27:49 +01:00
dependabot[bot]
3391538806 chore(deps): bump actions/stale from 10.1.1 to 10.2.0 (#8633)
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.1 to 10.2.0.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](997185467f...b5d41d4e1d)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: 10.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:27:20 +01:00
dependabot[bot]
c4f879c4ea chore(deps): bump github.com/gpustack/gguf-parser-go from 0.23.1 to 0.24.0 (#8631)
chore(deps): bump github.com/gpustack/gguf-parser-go

Bumps [github.com/gpustack/gguf-parser-go](https://github.com/gpustack/gguf-parser-go) from 0.23.1 to 0.24.0.
- [Release notes](https://github.com/gpustack/gguf-parser-go/releases)
- [Commits](https://github.com/gpustack/gguf-parser-go/compare/v0.23.1...v0.24.0)

---
updated-dependencies:
- dependency-name: github.com/gpustack/gguf-parser-go
  dependency-version: 0.24.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:26:41 +01:00
dependabot[bot]
b7e0de54fe chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.22.0 to 1.26.0 (#8630)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:26:15 +01:00
dependabot[bot]
f0868acdf3 chore(deps): bump fyne.io/fyne/v2 from 2.7.2 to 2.7.3 (#8629)
Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.2 to 2.7.3.
- [Release notes](https://github.com/fyne-io/fyne/releases)
- [Changelog](https://github.com/fyne-io/fyne/blob/v2.7.3/CHANGELOG.md)
- [Commits](https://github.com/fyne-io/fyne/compare/v2.7.2...v2.7.3)

---
updated-dependencies:
- dependency-name: fyne.io/fyne/v2
  dependency-version: 2.7.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 23:25:46 +01:00
LocalAI [bot]
9a5b5ee8a9 chore: ⬆️ Update ggml-org/llama.cpp to b68a83e641b3ebe6465970b34e99f3f0e0a0b21a (#8628)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 22:02:40 +00:00
Lukas Schaefer
ed0bfb8732 fix: rename json_verbose to verbose_json (#8627)
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
2026-02-23 17:57:06 +00:00
Richard Palethorpe
be84b1d258 feat(traces): Use accordian instead of pop-ups (#8626)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-23 13:07:41 +01:00
Andres
cbedcc9091 fix(api): Downgrade health/readiness check to debug (#8625)
Downgrade health/readiness check to debug

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-23 11:58:04 +01:00
Andres
e45d63c86e fix(cli): Fix watchdog running constantly and spamming logs (#8624)
* Fix watchdog running constantly and spamming logs

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Update docs

Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-23 11:57:28 +01:00
LocalAI [bot]
f40c8dd0ce chore: ⬆️ Update ggml-org/llama.cpp to 2b6dfe824de8600c061ef91ce5cc5c307f97112c (#8622)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-23 09:30:58 +00:00
LocalAI [bot]
559ab99890 docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration (#8621)
* docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration

* chore: revert backend/python/diffusers/README.md to original content

---------

Co-authored-by: Your Name <you@example.com>
2026-02-22 18:17:23 +01:00
LocalAI [bot]
91f2dd5820 chore: ⬆️ Update ggml-org/llama.cpp to f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5 (#8619)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-22 13:20:08 +01:00
LocalAI [bot]
8250815763 docs: ⬆️ update docs version mudler/LocalAI (#8618)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-21 21:18:40 +00:00
Richard Palethorpe
b1b67b973e fix(realtime): Add functions to conversation history (#8616)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-21 19:03:49 +01:00
LocalAI [bot]
fcecc12e57 chore: ⬆️ Update ggml-org/llama.cpp to ba3b9c8844aca35ecb40d31886686326f22d2214 (#8613)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-21 09:57:04 +01:00
Ettore Di Giacinto
51902df7ba fix: merge openresponses messages (#8615)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-21 09:56:43 +01:00
Ettore Di Giacinto
05f3ae31de chore: drop bark.cpp leftovers from pipelines (#8614)
Update bump_deps.yaml

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-21 09:24:09 +01:00
LocalAI [bot]
bb0924dff1 chore: ⬆️ Update ggml-org/llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#8612)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-20 23:47:47 +01:00
Richard Palethorpe
51eec4e6b8 feat(traces): Add backend traces (#8609)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-20 23:47:33 +01:00
LocalAI [bot]
462c82fad2 docs: ⬆️ update docs version mudler/LocalAI (#8611)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-20 21:19:51 +00:00
Ettore Di Giacinto
352b8aaa1b fix(ui): pass by needed values to unbreak model editor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-20 09:06:17 +01:00
Ettore Di Giacinto
df792d6243 chore(ui): improve navigation and buttons placement (#8608)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 23:41:05 +01:00
LocalAI [bot]
b1c434f0fc chore: ⬆️ Update ggml-org/llama.cpp to 11c325c6e0666a30590cde390d5746a405e536b9 (#8607)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:35 +01:00
LocalAI [bot]
bb42b342de chore: ⬆️ Update ggml-org/whisper.cpp to 21411d81ea736ed5d9cdea4df360d3c4b60a4adb (#8606)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-19 23:32:21 +01:00
LocalAI [bot]
e555057f8b fix: multi-GPU support for Diffusers (Issue #8575) (#8605)
* chore: init

* feat: implement multi-GPU support for Diffusers backend (fixes #8575)

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-19 21:35:58 +01:00
Ettore Di Giacinto
76fba02e56 fix: do not keep track model if not existing (#8603)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 17:18:38 +01:00
Ettore Di Giacinto
dadc7158fb fix(diffusers): sd_embed is not always available (#8602)
Seems sd_embed doesn't play well with MPS and L4T. Making it optional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 10:45:17 +01:00
LocalAI [bot]
68c7077491 chore: ⬆️ Update ggml-org/llama.cpp to b55dcdef5dcd74dc75c4921090e928d43453c157 (#8599)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-18 22:33:25 +01:00
Ettore Di Giacinto
b471619ad9 chore(deps): bump cogito and add new options to the agent config (#8601)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 22:10:26 +01:00
LocalAI [bot]
a0476d5567 chore(model-gallery): ⬆️ update checksum (#8600)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-18 21:52:28 +01:00
Ettore Di Giacinto
a2228f1418 fix(ui): improve view on mobile (#8598)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 19:50:59 +01:00
Ettore Di Giacinto
7dd9a155a3 fix(ui): drop duplicated footer import
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 14:38:53 +01:00
Richard Palethorpe
4fe830ff58 fix(realtime): Limit buffer sizes to prevent DoS (#8596)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-18 14:36:43 +01:00
Richard Palethorpe
86b3bc9313 fix(realtime): Better support for thinking models and setting model parameters (#8595)
* fix(realtime): Wrap functions in OpenAI chat completions format

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): Set max tokens from session object

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Find thinking start tag for thinking extraction

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Don't send buffer cleared message when we automatically drop it

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-18 14:36:16 +01:00
Ettore Di Giacinto
2fabdc08e6 feat(ui): left navbar, dark/light theme (#8594)
* feat(ui): left navbar, dark/light theme

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* darker background

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-18 00:14:39 +01:00
LocalAI [bot]
ed832cf0e0 chore: ⬆️ Update ggml-org/llama.cpp to 2b089c77580d347767f440205103e4da8ec33d89 (#8592)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-17 22:35:07 +00:00
LocalAI [bot]
95db1da309 chore(model-gallery): ⬆️ update checksum (#8593)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-17 21:31:35 +01:00
Richard Palethorpe
9e692967c3 fix(llama-cpp): Pass parameters when using embedded template (#8590)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-17 18:50:05 +01:00
Ettore Di Giacinto
ecba23d44e fix: improve watchdown logics (#8591)
* fix: ensure proper watchdog shutdown and state passing between restarts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add missing watchdog settings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: untrack model if we shut it down successfully

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-17 18:49:22 +01:00
LocalAI [bot]
067a255435 chore: ⬆️ Update ggml-org/llama.cpp to d612901116ab2066c7923372d4827032ff296bc4 (#8588)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-17 00:57:32 +01:00
dependabot[bot]
637ecba382 chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.2.0 to 1.3.0 (#8585)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.2.0 to 1.3.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.2.0...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 23:04:19 +00:00
dependabot[bot]
46c64e59f5 chore(deps): bump github.com/jaypipes/ghw from 0.22.0 to 0.23.0 (#8587)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.22.0 to 0.23.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.22.0...v0.23.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 21:49:41 +00:00
dependabot[bot]
f806838c37 chore(deps): bump google.golang.org/grpc from 1.78.0 to 1.79.1 (#8583)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.78.0 to 1.79.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.78.0...v1.79.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-16 20:20:30 +00:00
Richard Palethorpe
074a982853 fix(gallery): Use YAML v3 to avoid merging maps with incompatible keys (#8580)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-16 14:10:19 +01:00
LocalAI [bot]
109f29cc24 chore: ⬆️ Update ggml-org/llama.cpp to 27b93cbd157fc4ad94573a1fbc226d3e18ea1bb4 (#8577)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:36 +01:00
LocalAI [bot]
587e4a21b3 chore: ⬆️ Update antirez/voxtral.c to 134d366c24d20c64b614a3dcc8bda2a6922d077d (#8578)
⬆️ Update antirez/voxtral.c

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 23:42:11 +01:00
LocalAI [bot]
3f1f58b2ab chore: ⬆️ Update ggml-org/whisper.cpp to 364c77f4ca2737e3287652e0e8a8c6dce3231bba (#8576)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-15 21:20:04 +00:00
Ettore Di Giacinto
01eb70caff Fix formatting in README.md for Audio to Text section
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-15 09:57:50 +01:00
LocalAI [bot]
d784851337 chore: ⬆️ Update ggml-org/llama.cpp to 01d8eaa28d57bfc6d06e30072085ed0ef12e06c5 (#8567)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-14 22:52:32 +01:00
Ettore Di Giacinto
1c4e5aa5c0 chore: bump cogito (#8568)
Adapt to new API and drop call to Ask()

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 22:52:22 +01:00
LocalAI [bot]
94df096fb9 fix: pin neutts-air to known working commit (#8566)
* chore: init

* fix: pin neutts-air to known working commit

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-14 21:16:37 +01:00
Ettore Di Giacinto
820bd7dd01 fix(ci): try to fix deps for l4t13 on qwen-*
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 10:21:23 +01:00
Austen
42cb7bda19 fix(llama-cpp): populate tensor_buft_override buffer so llama-cpp properly performs fit calculations (#8560)
fix auto-fit for llama-cpp
2026-02-14 10:07:37 +01:00
Ettore Di Giacinto
2fb9940b8a fix(voxcpm): pin setuptools (#8556)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 23:44:35 +01:00
LocalAI [bot]
2ff0ad4190 chore: ⬆️ Update ggml-org/llama.cpp to 05a6f0e8946914918758db767f6eb04bc1e38507 (#8553)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-13 22:48:01 +01:00
Ettore Di Giacinto
bd12103ed4 chore: compute capabilities once (#8555)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 22:23:06 +01:00
LocalAI [bot]
2e17edd72a fix: prevent excessive logging in capability detection (#8552)
Closes #8527.

This PR fixes the excessive logging issue in capability detection by applying the existing capabilityLogged guard to the forced capability run file case.

## Changes
- Apply capabilityLogged flag to forced capability detection logging
- Prevents repeated log messages during backend discovery and gallery operations

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-13 20:00:29 +00:00
Richard Palethorpe
24aab68b3f feat(gallery): Add nanbeige4.1-3b (#8551)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-13 18:23:44 +01:00
Richard Palethorpe
5bdbb10593 fix(realtime): Send proper image data to backend (#8547)
* fix(realtime): Allow empty parameters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Just pass base64 string to backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-13 18:01:07 +01:00
Ettore Di Giacinto
2fd026e958 fix: update moonshine API, add setuptools to voxcpm requirements (#8541)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:22:37 +01:00
LocalAI [bot]
08718b656e chore: ⬆️ Update ggml-org/llama.cpp to 338085c69e486b7155e5b03d7b5087e02c0e2528 (#8538)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 23:21:53 +01:00
LocalAI [bot]
7121b189f7 chore(model-gallery): ⬆️ update checksum (#8540)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 21:54:33 +01:00
Richard Palethorpe
f6c80a6987 feat(realtime): Allow sending text, image and audio conversation items" (#8524)
feat(realtime): Allow sending text and image conversation items

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 19:33:46 +00:00
Ettore Di Giacinto
4a4d65f8e8 chore(model gallery): add vllm-omni models (#8536)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:27:20 +01:00
Ettore Di Giacinto
2858e71606 chore(model gallery): add neutts (#8535)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:17:03 +01:00
Ettore Di Giacinto
088205339c chore(model gallery): add voxcpm, whisperx, moonshine-tiny (#8534)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:13:03 +01:00
Ettore Di Giacinto
8616397d59 chore(model gallery): add nemo-asr (#8533)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:01:42 +01:00
Ettore Di Giacinto
1698f92bd0 Remove URL entry from gallery index
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:50:13 +01:00
Ettore Di Giacinto
02c95a57ae Add known use cases for audio processing
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:49:54 +01:00
rampa3
2ab6be1d0c chore(model gallery): Add npc-llm-3-8b (#8498)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-12 17:46:25 +01:00
Ettore Di Giacinto
9d78ec1bd8 chore(model gallery): add voxtral (which is only available in development) (#8532)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 17:44:24 +01:00
LocalAI [bot]
b10b85de52 chore: improve log levels verbosity (#8528)
* chore: init for PR

* feat: improve log verbosity per #8449 - demote /api/resources to DEBUG, elevate job events to INFO

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-12 16:24:46 +01:00
Richard Palethorpe
1479bee894 fix(realtime): Sampling and websocket locking (#8521)
* fix(realtime): Use locked websocket for concurrent access

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Use sample rate set in session

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(config): Allow pipelines to have no model parameters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-12 13:57:34 +01:00
Austen
cff972094c feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504)
* add experimental support for sd_embed-style prompt embedding

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* add doc equivalent to compel

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* need to use flux1 embedding function for flux model

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
2026-02-11 22:58:19 +01:00
LocalAI [bot]
79a25f7ae9 chore: ⬆️ Update ggml-org/llama.cpp to 4d3daf80f8834e0eb5148efc7610513f1e263653 (#8513)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-11 21:48:39 +00:00
Richard Palethorpe
7270a98ce5 fix(realtime): Use user provided voice and allow pipeline models to have no backend (#8415)
* fix(realtime): Use the voice provided by the user or none at all

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-11 14:18:05 +01:00
LocalAI [bot]
0ee92317ec chore: ⬆️ Update ggml-org/llama.cpp to 57487a64c88c152ac72f3aea09bd1cc491b2f61e (#8499)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 21:32:46 +00:00
LocalAI [bot]
743d2d1947 chore: ⬆️ Update ggml-org/whisper.cpp to 764482c3175d9c3bc6089c1ec84df7d1b9537d83 (#8478)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:59 +01:00
LocalAI [bot]
df04843f34 chore: ⬆️ Update ggml-org/llama.cpp to 262364e31d1da43596fe84244fba44e94a0de64e (#8479)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:33 +01:00
Kolega.dev
780877d1d0 security: validate URLs to prevent SSRF in content fetching endpoints (#8476)
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.

Co-authored-by: kolega.dev <faizan@kolega.ai>
2026-02-10 15:14:14 +01:00
dependabot[bot]
08eeed61f4 chore(deps): bump github.com/openai/openai-go/v3 from 3.17.0 to 3.19.0 (#8485)
Bumps [github.com/openai/openai-go/v3](https://github.com/openai/openai-go) from 3.17.0 to 3.19.0.
- [Release notes](https://github.com/openai/openai-go/releases)
- [Changelog](https://github.com/openai/openai-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-go/compare/v3.17.0...v3.19.0)

---
updated-dependencies:
- dependency-name: github.com/openai/openai-go/v3
  dependency-version: 3.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 05:41:15 +00:00
dependabot[bot]
5207ff84dc chore(deps): bump github.com/alecthomas/kong from 1.13.0 to 1.14.0 (#8481)
Bumps [github.com/alecthomas/kong](https://github.com/alecthomas/kong) from 1.13.0 to 1.14.0.
- [Commits](https://github.com/alecthomas/kong/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/kong
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 04:29:00 +00:00
dependabot[bot]
4ade2e61ab chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 (#8483)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.28.0 to 2.28.1.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.28.0...v2.28.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 03:15:46 +00:00
dependabot[bot]
818be98314 chore(deps): bump github.com/jaypipes/ghw from 0.21.2 to 0.22.0 (#8484)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.21.2 to 0.22.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.21.2...v0.22.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 02:02:38 +00:00
dependabot[bot]
056c438452 chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.20.0 to 1.22.0 (#8482)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.20.0 to 1.22.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.20.0...v1.22.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-09 23:34:36 +00:00
LocalAI [bot]
0c040beb59 chore: ⬆️ Update antirez/voxtral.c to c9e8773a2042d67c637fc492c8a655c485354080 (#8477)
⬆️ Update antirez/voxtral.c

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 22:20:03 +01:00
Ettore Di Giacinto
bf5a1dd840 feat(voxtral): add voxtral backend (#8451)
* feat(voxtral): add voxtral backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-09 09:12:05 +01:00
rampa3
f44200bec8 chore(model gallery): Add Ministral 3 family of models (aside from base versions) (#8467)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-09 09:10:37 +01:00
LocalAI [bot]
3b1b08efd6 chore: ⬆️ Update ggml-org/llama.cpp to e06088da0fa86aa444409f38dff274904931c507 (#8464)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:09:32 +01:00
LocalAI [bot]
3d8791067f chore: ⬆️ Update ggml-org/whisper.cpp to 4b23ff249e7f93137cb870b28fb27818e074c255 (#8463)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:08:55 +01:00
Austen
da8207b73b feat(stablediffusion-ggml): Improve legacy CPU support for stablediffusion-ggml backend (#8461)
* Port AVX logic from whisper to stablediffusion-ggml

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* disable BMI2 on AVX builds

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-08 23:11:33 +00:00
Varun Chawla
aa9ca401fa docs: update model gallery documentation to reference main repository (#8452)
Fixes #8212 - Updated the note about reporting broken models to
reference the main LocalAI repository instead of the outdated
separate gallery repository reference.
2026-02-08 22:14:23 +01:00
LocalAI [bot]
e43c0c3ffc docs: ⬆️ update docs version mudler/LocalAI (#8462)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-08 21:12:50 +00:00
LocalAI [bot]
944874d08b chore: ⬆️ Update ggml-org/llama.cpp to 8872ad2125336d209a9911a82101f80095a9831d (#8448)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-07 21:22:18 +00:00
Ettore Di Giacinto
3370d807c2 feat(nemo): add Nemo (only asr for now) backend (#8436)
* feat(nemo): add Nemo (only asr for now) backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(nemo): add Nemo backend without Python version pins (#8438)

* Initial plan

* Remove Python version pins from nemo backend install.sh

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Pin pyarrow to 20.0.0 in nemo requirements

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-07 08:19:37 +01:00
LocalAI [bot]
ae2689936a chore: ⬆️ Update ggml-org/llama.cpp to b83111815e9a79949257e9d4b087206b320a3063 (#8434)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-06 21:22:33 +00:00
Richard Palethorpe
15c12674b6 fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile (#8431)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 17:12:45 +01:00
Richard Palethorpe
7fbe1d2e72 chore(models): Add Qwen TTS 0.6b (#8428)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 10:30:36 +01:00
Richard Palethorpe
c1d0b10b14 chore(docs): Document using a local model gallery (#8426)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 10:28:41 +01:00
Andres
efd552f83e fix(api)!: Stop model prior to deletion (#8422)
* Unload model prior to deletion

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Fix LFM model in gallery

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Remove mistakenly added files

Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
2026-02-06 09:22:10 +01:00
LocalAI [bot]
bcd927da6e chore: ⬆️ Update ggml-org/llama.cpp to 22cae832188a1f08d18bd0a707a4ba5cd03c7349 (#8419)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-06 09:21:33 +01:00
LocalAI [bot]
682ac7e637 chore(model-gallery): ⬆️ update checksum (#8420)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-05 21:34:26 +01:00
LocalAI [bot]
c8d74d35df feat(swagger): update swagger (#8418)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-05 21:33:44 +01:00
Ettore Di Giacinto
a849f285a5 chore(tests): add audio/wav to expected wav file 2026-02-05 20:27:06 +00:00
Ettore Di Giacinto
697f6aa71c feat(audio): set audio content type (#8416)
* feat(audio): set audio content type

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 19:14:12 +01:00
Ettore Di Giacinto
218d0526cb fix(qwen-tts): add six dependency
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 18:05:31 +01:00
Ettore Di Giacinto
9bc5ab18fa fix(voxcpm): make sed call unix-compliant
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 17:15:58 +01:00
Ettore Di Giacinto
a9267f391c fix(huggingface): add clean target
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:54:41 +01:00
Ettore Di Giacinto
029ae3420d fix(package.sh): drop redundant -a and -R
-a implies already -R

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:39:38 +01:00
Ettore Di Giacinto
c0461f32a1 fix: add missing clean targets
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 16:38:16 +01:00
Ettore Di Giacinto
8989d2944e fix: add clean target to local-store
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 14:55:34 +01:00
Ettore Di Giacinto
7aea2add44 Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" (#8412)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/re…"

This reverts commit 55e43b3f92.
2026-02-05 14:17:33 +01:00
dependabot[bot]
55e43b3f92 chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory (#8407)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-05 12:37:52 +00:00
Ettore Di Giacinto
53276d28e7 feat(musicgen): add ace-step and UI interface (#8396)
* feat(musicgen): add ace-step and UI interface

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly handle model dir

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop auto-download

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to models, fixup UIs icons

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* l4t13 is incompatbile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* avoid pinning version for cuda12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop l4t12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 12:04:53 +01:00
Yaroslav98214
6dbcdb0b9e fix: filter GGUF and GGML files from model list (#8397)
Filter GGUF and GGML files from model list

Skip .gguf/.ggml loose files when listing models and add a test
for .gguf exclusion.

Closes #1077

Signed-off-by: Yaroslav98214 <diakovichyaroslav30@gmail.com>
2026-02-05 10:17:46 +01:00
LocalAI [bot]
c30866ba95 chore: ⬆️ Update ggml-org/llama.cpp to b536eb023368701fe3564210440e2df6151c3e65 (#8399)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-04 23:08:08 +01:00
LocalAI [bot]
b413beba2d chore: ⬆️ Update ggml-org/whisper.cpp to 941bdabbe4561bc6de68981aea01bc5ab05781c5 (#8398)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-04 21:20:59 +00:00
Ettore Di Giacinto
9db4df22f3 chore: update torch and torchaudio version specifications for qwen-tts in MPS
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-04 16:55:42 +01:00
Jonas Bernard
5ac50c9348 fix(docs): Promote DEBUG=false in production docker compose (#8390)
fix(docs): Use DEBUG=false in production docker compose

Signed-off-by: Jonas Bernard <public.jbernard@web.de>
2026-02-04 09:35:32 +01:00
Ettore Di Giacinto
5201b58d3e feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU (#8380)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 23:53:34 +01:00
LocalAI [bot]
8fa6737bdc chore(model gallery): 🤖 add 1 new models via gallery agent (#8381)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-03 22:40:22 +01:00
Ettore Di Giacinto
3039ced287 chore(ci): enlarge sleep startup time
Even if suboptimal as we should poll to wait for the service to be available, this should at least alleviate tests for now

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-03 22:07:07 +01:00
Ettore Di Giacinto
e7fc604dbc feat(metal): try to extend support to remaining backends (#8374)
* feat(metal): try to extend support to remaining backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* neutts doesn't work

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* split outetts out of transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove torch pin to whisperx

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 21:57:50 +01:00
Richard Palethorpe
5195062e12 fix(realtime): Include noAction function in prompt template and handle tool_choice (#8372)
The realtime endpoint was not passing the noAction "answer" function to the
model in the prompt template, causing the model to always call user-provided
tools even when a direct response was appropriate.

Root cause:
- User tools were added to the funcs list
- TemplateMessages() was called to generate the prompt
- noAction function was only added AFTER templating
- This meant the prompt didn't include the "answer" function, even though
  the grammar did

Fix:
- Move noAction function creation before TemplateMessages() call so it's
  included in both the prompt and grammar
- Add proper tool_choice parameter handling to support "auto", "required",
  "none", and specific function selection
- Match behavior of the standard chat endpoint

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-03 14:30:37 +01:00
dependabot[bot]
c86edf06f2 chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 (#8357)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.39.0 to 1.39.1.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.39.0...v1.39.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 09:00:59 +01:00
dependabot[bot]
5d14c2fe4d chore(deps): bump sentence-transformers from 5.2.0 to 5.2.2 in /backend/python/transformers (#8358)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.0 to 5.2.2.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.0...v5.2.2)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:37:05 +01:00
dependabot[bot]
4601143998 chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.61.0 to 0.62.0 (#8359)
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus

Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.61.0 to 0.62.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.61.0...exporters/prometheus/v0.62.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.62.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:36:34 +01:00
dependabot[bot]
7913ea2bfb chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.39.0 to 1.40.0 (#8354)
chore(deps): bump go.opentelemetry.io/otel/sdk/metric

Bumps [go.opentelemetry.io/otel/sdk/metric](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk/metric
  dependency-version: 1.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:36:10 +01:00
Ettore Di Giacinto
d6409bd2eb Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" (#8367)
Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vl…"

This reverts commit 4c0e70086d.
2026-02-03 08:34:54 +01:00
dependabot[bot]
98872791e5 chore(deps): bump protobuf from 6.33.4 to 6.33.5 in /backend/python/transformers (#8356)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.4 to 6.33.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:33:45 +01:00
dependabot[bot]
c6d47cb4e5 chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.19.0 to 1.20.0 (#8355)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.19.0 to 1.20.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.19.0...v1.20.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:33:24 +01:00
dependabot[bot]
f754a8edb1 chore(deps): bump go.opentelemetry.io/otel/metric from 1.39.0 to 1.40.0 (#8353)
Bumps [go.opentelemetry.io/otel/metric](https://github.com/open-telemetry/opentelemetry-go) from 1.39.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.39.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/metric
  dependency-version: 1.40.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:33:04 +01:00
dependabot[bot]
4c0e70086d chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory (#8360)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/vllm directory: torch.


Updates `torch` from 2.7.0 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 03:07:02 +00:00
dependabot[bot]
f8bd527dfe chore(deps): bump appleboy/ssh-action from 1.2.4 to 1.2.5 (#8352)
Bumps [appleboy/ssh-action](https://github.com/appleboy/ssh-action) from 1.2.4 to 1.2.5.
- [Release notes](https://github.com/appleboy/ssh-action/releases)
- [Commits](https://github.com/appleboy/ssh-action/compare/v1.2.4...v1.2.5)

---
updated-dependencies:
- dependency-name: appleboy/ssh-action
  dependency-version: 1.2.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 01:53:20 +00:00
Ettore Di Giacinto
08b2b8d755 fix(libbackend): do not inject --index-strategy unsafe-best-match to all
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-02 17:13:45 +01:00
Dream
10a1e6c74d feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299)
* feat(proto): add speaker field to TranscriptSegment for diarization

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx backend for transcription with diarization

Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): register whisperx backend in Makefile

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx meta and image entries to index.yaml

Signed-off-by: eureka928 <meobius123@gmail.com>

* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): unpin torch versions and use CPU index for cpu requirements

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch ROCm variant to fix CI build failure

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch CPU variant to fix uv resolution failure

Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure

uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure

PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

* revert: scope hipblas suffix fix to whisperx only

Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

---------

Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:33:12 +01:00
Alex O'Connell
b7585ca738 fix(api): Add missing field in initial OpenAI streaming response (#8341)
Add missing field in initial OpenAI streaming response

Signed-off-by: Alex O'Connell <35843486+acon96@users.noreply.github.com>
2026-02-02 08:30:04 +01:00
LocalAI [bot]
8cae99229c chore: ⬆️ Update ggml-org/llama.cpp to 2634ed207a17db1a54bd8df0555bd8499a6ab691 (#8336)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-01 21:23:57 +00:00
rampa3
04e0f444e1 chore(model gallery): Add Qwen 3 VL 8B thinking & instruct (#8329)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-01 17:34:31 +01:00
rampa3
6f410f4cbe chore(model gallery): Rename downloaded filename for Magistral Small mmproj (#8327)
Rename downloaded filename for Magistral Small mmproj

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-01 17:33:48 +01:00
Ettore Di Giacinto
800f749c7b fix: drop gguf VRAM estimation (now redundant) (#8325)
fix: drop gguf VRAM estimation

Cleanup. This is now handled directly in llama.cpp, no need to estimate from Go.

VRAM estimation in general is tricky, but llama.cpp ( 41ea26144e/src/llama.cpp (L168) ) lately has added an automatic "fitting" of models to VRAM, so we can drop backend-specific GGUF VRAM estimation from our code instead of trying to guess as we already enable it

 397f7f0862/backend/cpp/llama-cpp/grpc-server.cpp (L393)

Fixes: https://github.com/mudler/LocalAI/issues/8302
See: https://github.com/mudler/LocalAI/issues/8302#issuecomment-3830773472
2026-02-01 17:33:28 +01:00
Andres
b6459ddd57 feat(api): Add transcribe response format request parameter & adjust STT backends (#8318)
* WIP response format implementation for audio transcriptions

(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Rework transcript response_format and add more formats

(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Add test and replace go-openai package with official openai go client

(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Fix faster-whisper backend and refactor transcription formatting to also work on CLI

Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-01 17:33:17 +01:00
Ettore Di Giacinto
397f7f0862 fix(ui): take account of reasoning in token count calculation (#8324)
We were skipping reasoning traces when counting tokens, yielding to a
wrong sum count.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-01 10:48:31 +01:00
LocalAI [bot]
234072769c chore(model gallery): 🤖 add 1 new models via gallery agent (#8321)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-01 09:02:05 +01:00
LocalAI [bot]
3445415b3d chore: ⬆️ Update ggml-org/llama.cpp to 41ea26144e55d23f37bb765f88c07588d786567f (#8317)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 21:18:31 +00:00
LocalAI [bot]
b05e110aa6 chore: ⬆️ Update ggml-org/llama.cpp to 1488339138d609139c4400d1b80f8a5b1a9a203c (#8306)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 08:59:09 +01:00
LocalAI [bot]
e69cba2444 chore: ⬆️ Update ggml-org/whisper.cpp to aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075 (#8305)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-31 08:58:54 +01:00
LocalAI [bot]
f7903597ac chore(model-gallery): ⬆️ update checksum (#8307)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-30 21:27:48 +01:00
LocalAI [bot]
ee76a0cd1c feat(swagger): update swagger (#8304)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-30 21:27:10 +01:00
Ettore Di Giacinto
4ca5b737bf chore(cuda): target 12.8 for 12 to increase compatibility (#8297)
Some datacenter setups might be stuck with the 5.x kernel which doesn't
play well with CUDA >=12.9. To incrase compatibility with the CUDA 12.x
branch, downgrade to 12.8. For newer systems, it is still suggested to
use CUDA 13.x wherever compatible.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 12:58:44 +01:00
Ettore Di Giacinto
4077aaf978 chore: re-enable e2e tests, fixups anthropic API tools support (#8296)
* chore(tests): add mock backend e2e tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup anthropic tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* prepare e2e tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop repetitive tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop specific CI workflow

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixup anthropic issues, move all e2e tests to use mocked backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 12:41:50 +01:00
Ettore Di Giacinto
68dd9765a0 feat(tts): add support for streaming mode (#8291)
* feat(tts): add support for streaming mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Send first audio, make sure it's 16

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 11:58:01 +01:00
LocalAI [bot]
2c44b06a67 chore: ⬆️ Update ggml-org/llama.cpp to 4fdbc1e4dba428ce0cf9d2ac22232dc170bbca82 (#8283)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-29 23:43:29 +01:00
LocalAI [bot]
7cc90db3e5 chore(model-gallery): ⬆️ update checksum (#8285)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-29 21:51:18 +01:00
Ettore Di Giacinto
1e08e02598 feat(qwen-asr): add support to qwen-asr (#8281)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-29 21:50:35 +01:00
Richard Palethorpe
dd8e74a486 feat(realtime): Add audio conversations (#6245)
* feat(realtime): Add audio conversations

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(realtime): Vendor the updated API and modify for server side

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): Update to the GA realtime API

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore: Document realtime API and add docs to AGENTS.md

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat: Filter reasoning from spoken output

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Send delta and done events for tool calls and audio transcripts

Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.

💘 Generated with Crush

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Improve tool call handling and error reporting

- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
  instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function

💘 Generated with Crush

Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-29 08:44:53 +01:00
Ettore Di Giacinto
48e08772f3 chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' (#8275)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-29 00:22:25 +01:00
LocalAI [bot]
c28c0227c6 chore: ⬆️ Update leejet/stable-diffusion.cpp to e411520407663e1ddf8ff2e5ed4ff3a116fbbc97 (#8274)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-28 21:23:05 +00:00
Richard Palethorpe
856ca2d6b1 fix(qwen3): Be explicit with function calling format (#8265)
Qwen3 4b was using the wrong function format (i.e. using "function"
instead of "name") within the realtime API.

If we specify the function calling format explicitly then it stops it.

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-28 14:44:29 +01:00
Ettore Di Giacinto
9b973b79f6 feat: add VoxCPM tts backend (#8109)
* feat: add VoxCPM tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable voxcpm on arm64 cpu

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 14:44:04 +01:00
Ettore Di Giacinto
cba8ef4e38 chore: fix backend icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 09:09:00 +01:00
Ettore Di Giacinto
f729e300d6 chore: fix backend icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 09:08:03 +01:00
LocalAI [bot]
9916811a79 chore: ⬆️ Update ggml-org/llama.cpp to 2b4cbd2834e427024bc7f935a1f232aecac6679b (#8258)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-28 08:50:16 +01:00
Ettore Di Giacinto
2f7c595cd1 chore(model gallery): add z-image and z-image-turbo for diffusers (#8260)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-27 22:42:10 +01:00
rampa3
73decac746 chore(model gallery): Add mistral-community/pixtral-12b with mmproj (#8245)
Rebased branch add_pixtral on master

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 21:43:31 +01:00
Ettore Di Giacinto
ec1598868b feat(vibevoice): add ASR support (#8222)
* feat(vibevoice): add ASR support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): download voice files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run on bigger runner

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* CI can't hold vibevoice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-27 20:19:22 +01:00
rampa3
93d7e5d4b8 chore(model gallery): Add entry for Magistral Small 1.2 with mmproj (#8248)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 16:55:00 +01:00
rampa3
ff5a54b9d1 chore(model gallery): Add entry for Mistral Small 3.1 with mmproj (#8247)
* chore(model gallery): Add entry for Mistral Small 3.1 with mmproj

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* Use llama-cpp subfolder structure akin to Qwen 3 VL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-27 16:54:14 +01:00
LocalAI [bot]
3c1f823c47 chore: ⬆️ Update ggml-org/llama.cpp to 8f80d1b254aef70a0959e314be368d05debe7294 (#8229)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 21:19:43 +00:00
LocalAI [bot]
4024220d00 chore(model gallery): 🤖 add 1 new models via gallery agent (#8220)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 12:11:24 +01:00
LocalAI [bot]
f76958d761 chore: ⬆️ Update ggml-org/llama.cpp to 0440bfd1605333726ea0fb7a836942660bf2f9a6 (#8216)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 00:50:35 +01:00
LocalAI [bot]
2bd5ca45de chore: ⬆️ Update leejet/stable-diffusion.cpp to 43e829f21966abb96b08c712bccee872dc820914 (#8215)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-26 00:50:16 +01:00
Ettore Di Giacinto
6804ce1c39 chore(docs): change MEMORY_FILE_PATH to MEMORY_INDEX_PATH
Updated MEMORY_FILE_PATH to MEMORY_INDEX_PATH in memory configuration.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-25 22:14:11 +01:00
Dedy F. Setyawan
d499071bff fix(ui): correctly display selected image model (#8208)
Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-01-25 14:54:40 +01:00
Ettore Di Giacinto
26a374b717 chore: drop bark which is unmaintained (#8207)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-25 09:26:40 +01:00
rampa3
980de0e25b chore(model gallery): Add most of not yet present Piper voices from Hugging Face (#8202)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-25 08:56:53 +01:00
Ettore Di Giacinto
4767371aee chore(README): Add links
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:49:27 +01:00
Ettore Di Giacinto
131d247b78 chore(README): Update and simplify links
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:46:40 +01:00
Ettore Di Giacinto
b2a8a63899 feat(vllm-omni): add new backend (#8188)
* feat(vllm-omni: add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* default to py3.12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:23:30 +01:00
LocalAI [bot]
05a332cd5f chore: ⬆️ Update ggml-org/llama.cpp to bb02f74c612064947e51d23269a1cf810b67c9a7 (#8196)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 21:19:43 +00:00
Ettore Di Giacinto
05904c77f5 chore(exllama): drop backend now almost deprecated (#8186)
exllama2 development has stalled and only old architectures are
supported. exllamav3 is still in development, meanwhile cleaning up
exllama2 from the gallery.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 08:57:37 +01:00
LocalAI [bot]
17783fa7d9 chore: ⬆️ Update leejet/stable-diffusion.cpp to fa61ea744d1a87fa26a63f8a86e45587bc9534d6 (#8184)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 08:57:24 +01:00
LocalAI [bot]
4019094111 chore: ⬆️ Update ggml-org/llama.cpp to 557515be1e93ed8939dd8a7c7d08765fdbe8be31 (#8183)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-24 08:57:08 +01:00
Ettore Di Giacinto
ca65fc751a chore(model gallery): add qwen3-tts to model gallery (#8187)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 23:06:50 +01:00
LocalAI [bot]
a1e3acc590 docs: ⬆️ update docs version mudler/LocalAI (#8182)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-23 22:03:47 +01:00
Ettore Di Giacinto
a36960e069 fix(qwen-tts): change icon URL in index.yaml
Updated the icon URL for the project.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-23 22:00:14 +01:00
Ettore Di Giacinto
58bb6a29ed Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" (#8180)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/ba…"

This reverts commit 5881c82413.
2026-01-23 17:25:04 +01:00
dependabot[bot]
5881c82413 chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory (#8175)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/bark directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-23 15:32:15 +00:00
Ettore Di Giacinto
923ebbb344 feat(qwen-tts): add Qwen-tts backend (#8163)
* feat(qwen-tts): add Qwen-tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update intel deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop flash-attn for cuda13

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 15:18:41 +01:00
LocalAI [bot]
ea51567b89 chore(model gallery): 🤖 add 1 new models via gallery agent (#8170)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-23 08:19:39 +01:00
LocalAI [bot]
552c62a19c chore: ⬆️ Update leejet/stable-diffusion.cpp to 5e4579c11d0678f9765463582d024e58270faa9c (#8166)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-23 08:18:05 +01:00
Ettore Di Giacinto
c0b21a921b feat: detect thinking support from backend automatically if not explicitly set (#8167)
detect thinking support from backend automatically if not explicitly set

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 00:38:28 +01:00
LocalAI [bot]
b10045adc2 chore: ⬆️ Update ggml-org/llama.cpp to a5eaa1d6a3732bc0f460b02b61c95680bba5a012 (#8165)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-22 23:32:05 +00:00
Ettore Di Giacinto
61b5e3b629 chore: drop test file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 22:19:38 +00:00
Ettore Di Giacinto
e35d7cb3b3 chore: drop test file
the function now was removed

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 21:47:52 +00:00
Ettore Di Giacinto
0fa0ac4797 fix(videogen): drop incomplete endpoint, add GGUF support for LTX-2 (#8160)
* Debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop openai video endpoint (is not complete)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add download button

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 14:09:20 +01:00
LocalAI [bot]
be7ed85838 chore(model gallery): 🤖 add 1 new models via gallery agent (#8157)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 08:25:40 +01:00
LocalAI [bot]
c12b310028 chore: ⬆️ Update ggml-org/llama.cpp to c301172f660a1fe0b42023da990bf7385d69adb4 (#8151)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 00:51:22 +01:00
LocalAI [bot]
0447d5564d chore: ⬆️ Update leejet/stable-diffusion.cpp to 329571131d62d64a4f49e1acbef49ae02544fdcd (#8152)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-22 00:50:41 +01:00
Ettore Di Giacinto
22c0eb5421 chore(diffusers): add 'av' to requirements.txt (#8155)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-21 22:35:00 +01:00
LocalAI [bot]
a0a00fb937 chore(model-gallery): ⬆️ update checksum (#8153)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 21:45:11 +01:00
LocalAI [bot]
6dd44742ea feat(swagger): update swagger (#8150)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 21:44:44 +01:00
Richard Palethorpe
00c72e7d3e fix(tracing): Create trace buffer on first request to enable tracing at runtime (#8148)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-21 18:39:39 +01:00
LocalAI [bot]
d01c335cf6 chore: ⬆️ Update ggml-org/whisper.cpp to 7aa8818647303b567c3a21fe4220b2681988e220 (#8146)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 17:44:01 +01:00
LocalAI [bot]
5687df4535 chore: ⬆️ Update ggml-org/llama.cpp to ad8d85bd94cc86e89d23407bdebf98f2e6510c61 (#8145)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-21 15:41:36 +00:00
Ettore Di Giacinto
f5fade97e6 chore: drop noisy logs (#8142)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 09:52:20 +01:00
Ettore Di Giacinto
b88ae31e4e chore(model gallery): add flux 2 and flux 2 klein (#8141)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 09:46:33 +01:00
Ettore Di Giacinto
f6daaa7c35 chore(deps): Bump llama.cpp to '1c7cf94b22a9dc6b1d32422f72a627787a4783a3' (#8136)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 00:12:13 +01:00
Ettore Di Giacinto
c491c6ca90 feat(openresponses): Support reasoning blocks (#8133)
* feat(openresponses): support reasoning blocks

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* allow to disable reasoning, refactor common logic

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add option to only strip reasoning

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add configurations for custom reasoning tokens

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-21 00:11:45 +01:00
Ettore Di Giacinto
34e054f607 fix(reasoning): support models with reasoning without starting thinking tag (#8132)
* chore: extract reasoning to its own package

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* make sure we detect thinking tokens from template

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Allow to override via config, add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-20 21:07:59 +01:00
LocalAI [bot]
e886bb291a chore(model gallery): 🤖 add 1 new models via gallery agent (#8128)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-20 12:58:29 +01:00
Ettore Di Giacinto
4bf2f8bbd8 chore(docs): update docs with Anthropic API and openresponses
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-20 09:25:24 +01:00
LocalAI [bot]
d3525b7509 chore: ⬆️ Update ggml-org/llama.cpp to 959ecf7f234dc0bc0cd6829b25cb0ee1481aa78a (#8122)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 22:50:47 +01:00
LocalAI [bot]
c8aa821e0e chore: ⬆️ Update leejet/stable-diffusion.cpp to a48b4a3ade9972faf0adcad47e51c6fc03f0e46d (#8121)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 22:27:46 +01:00
dependabot[bot]
b3191927ae chore(deps): bump github.com/mudler/cogito from 0.7.2 to 0.8.1 (#8124)
Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.7.2 to 0.8.1.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/compare/v0.7.2...v0.8.1)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.8.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-19 22:26:26 +01:00
LocalAI [bot]
54c5a2d9ea docs: ⬆️ update docs version mudler/LocalAI (#8120)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-19 21:18:24 +00:00
Ettore Di Giacinto
0279591fec Enable reranking for Qwen3-VL-Reranker-8B
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-19 15:28:58 +01:00
LocalAI [bot]
8845186955 chore: ⬆️ Update leejet/stable-diffusion.cpp to 2efd19978dd4164e387bf226025c9666b6ef35e2 (#8099)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:40:35 +01:00
LocalAI [bot]
ab8ed24358 chore: ⬆️ Update ggml-org/llama.cpp to 287a33017b32600bfc0e81feeb0ad6e81e0dd484 (#8100)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:40:14 +01:00
LocalAI [bot]
a021df5a88 feat(swagger): update swagger (#8098)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-18 22:10:06 +01:00
Ettore Di Giacinto
5f403b1631 chore: drop neutts for l4t (#8101)
Builds exhausts CI currently, and there are better backends at this
point in time. We will probably deprecate it in the future.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-18 21:55:56 +01:00
rampa3
897ad1729e chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request (#8082)
* chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* added missing model config import URL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-18 09:23:07 +01:00
LocalAI [bot]
16a18a2e55 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9565c7f6bd5fcff124c589147b2621244f2c4aa1 (#8086)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 22:12:21 +01:00
Ettore Di Giacinto
3387bfaee0 feat(api): add support for open responses specification (#8063)
* feat: openresponses

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add ttl settings, fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: register cors middleware by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* satisfy schema

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Logitbias and logprobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add grammar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* SSE compliance

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tool JSON conversion

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* support background mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* swagger

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop code. This is handled in the handler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* background mode for MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 22:11:47 +01:00
LocalAI [bot]
1cd33047b4 chore: ⬆️ Update ggml-org/llama.cpp to 2fbde785bc106ae1c4102b0e82b9b41d9c466579 (#8087)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 21:10:18 +00:00
Ettore Di Giacinto
1de045311a chore(ui): add video generation link (#8079)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 09:49:47 +01:00
LocalAI [bot]
5fe9bf9f84 chore: ⬆️ Update ggml-org/whisper.cpp to f53dc74843e97f19f94a79241357f74ad5b691a6 (#8074)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 08:32:53 +01:00
LocalAI [bot]
d4fd0c0609 chore: ⬆️ Update ggml-org/llama.cpp to 388ce822415f24c60fcf164a321455f1e008cafb (#8073)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-16 21:22:33 +00:00
Ettore Di Giacinto
d16722ee13 Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" (#8072)
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/pyt…"

This reverts commit 1f10ab39a9.
2026-01-16 20:50:33 +01:00
dependabot[bot]
1f10ab39a9 chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory (#8066)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: [torch](https://github.com/pytorch/pytorch).


Updates `torch` from 2.3.1+cxx11.abi to 2.8.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.8.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.8.0
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-16 19:38:12 +00:00
Ettore Di Giacinto
4d36e393d1 fix(ci): use more beefy runner for expensive jobs (#8065)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-16 19:26:40 +01:00
LocalAI [bot]
cb8616c7d1 chore: ⬆️ Update ggml-org/llama.cpp to 785a71008573e2d84728fb0ba9e851d72d3f8fab (#8053)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 22:53:17 +01:00
LocalAI [bot]
ff31d50488 chore: ⬆️ Update ggml-org/whisper.cpp to 2eeeba56e9edd762b4b38467bab96c2517163158 (#8052)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 22:52:56 +01:00
Divyanshupandey007
1a50717e33 fix: reduce log verbosity for /api/operations polling (#8050)
* fix: reduce log verbosity for /api/operations polling

Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes #7989.

* fix: reduce log verbosity for /api/operations polling

Reduces log clutter by changing the log level from INFO to DEBUG for successful (200 OK) /api/operations requests. This endpoint is polled frequently by the Web UI, causing log spam. Fixes #7989.
2026-01-15 21:13:13 +01:00
LocalAI [bot]
49d6305509 chore: ⬆️ Update ggml-org/llama.cpp to d98b548120eecf98f0f6eaa1ba7e29b3afda9f2e (#8040)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-15 08:39:46 +01:00
Ettore Di Giacinto
d20a113aef fix(functions): do not duplicate function when valid JSON is inside XML tags (#8043)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 23:42:00 +01:00
LocalAI [bot]
cbaa793520 chore: ⬆️ Update ggml-org/whisper.cpp to 47af2fb70f7e4ee1ba40c8bed513760fdfe7a704 (#8039)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 22:12:32 +01:00
Ettore Di Giacinto
6fe3fc880f Update section headers in README.md for clarity
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-14 22:11:58 +01:00
Ettore Di Giacinto
752e641c48 Clarify Docker usage in README
Updated Docker section in README to clarify usage.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-14 22:10:59 +01:00
Ettore Di Giacinto
44d78b4d15 chore(doc): put alert on install.sh until is fixed (#8042)
See: https://github.com/mudler/LocalAI/issues/8032

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 22:08:48 +01:00
Ettore Di Giacinto
64d0a96ba3 feat(ui): add video gen UI (#8020)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 11:43:32 +01:00
Ettore Di Giacinto
b19afc9e64 feat(diffusers): add support to LTX-2 (#8019)
* feat(diffusers): add support to LTX-2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 09:07:30 +01:00
LocalAI [bot]
d6e698876b chore: ⬆️ Update ggml-org/llama.cpp to e4832e3ae4d58ac0ecbdbf4ae055424d6e628c9f (#8015)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:09:37 +01:00
LocalAI [bot]
8962205546 chore: ⬆️ Update ggml-org/whisper.cpp to a96310871a3b294f026c3bcad4e715d17b5905fe (#8014)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:09:00 +01:00
LocalAI [bot]
eddc460118 chore: ⬆️ Update leejet/stable-diffusion.cpp to 7010bb4dff7bd55b03d35ef9772142c21699eba9 (#8013)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-14 08:08:31 +01:00
Ettore Di Giacinto
a6ff354c86 feat(tts): add pocket-tts backend (#8018)
* feat(pocket-tts): add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-13 23:35:19 +01:00
dependabot[bot]
3a2be4df48 chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 (#8004)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.27.3 to 2.27.5.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.27.3...v2.27.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 09:06:20 +01:00
dependabot[bot]
4e1f448e86 chore(deps): bump fyne.io/fyne/v2 from 2.7.1 to 2.7.2 (#8003)
Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.1 to 2.7.2.
- [Release notes](https://github.com/fyne-io/fyne/releases)
- [Changelog](https://github.com/fyne-io/fyne/blob/master/CHANGELOG.md)
- [Commits](https://github.com/fyne-io/fyne/compare/v2.7.1...v2.7.2)

---
updated-dependencies:
- dependency-name: fyne.io/fyne/v2
  dependency-version: 2.7.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:58 +01:00
dependabot[bot]
3e0168360a chore(deps): bump github.com/gpustack/gguf-parser-go from 0.22.1 to 0.23.1 (#8001)
chore(deps): bump github.com/gpustack/gguf-parser-go

Bumps [github.com/gpustack/gguf-parser-go](https://github.com/gpustack/gguf-parser-go) from 0.22.1 to 0.23.1.
- [Release notes](https://github.com/gpustack/gguf-parser-go/releases)
- [Commits](https://github.com/gpustack/gguf-parser-go/compare/v0.22.1...v0.23.1)

---
updated-dependencies:
- dependency-name: github.com/gpustack/gguf-parser-go
  dependency-version: 0.23.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:35 +01:00
dependabot[bot]
ea4157887b chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 (#8000)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.38.3 to 1.39.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.38.3...v1.39.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:45:18 +01:00
dependabot[bot]
699c50be47 chore(deps): bump github.com/mudler/go-processmanager from 0.0.0-20240820160718-8b802d3ecf82 to 0.1.0 (#7992)
chore(deps): bump github.com/mudler/go-processmanager

Bumps [github.com/mudler/go-processmanager](https://github.com/mudler/go-processmanager) from 0.0.0-20240820160718-8b802d3ecf82 to 0.1.0.
- [Release notes](https://github.com/mudler/go-processmanager/releases)
- [Commits](https://github.com/mudler/go-processmanager/commits/v0.1.0)

---
updated-dependencies:
- dependency-name: github.com/mudler/go-processmanager
  dependency-version: 0.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 08:44:53 +01:00
dependabot[bot]
94eecc43a3 chore(deps): bump protobuf from 6.33.2 to 6.33.4 in /backend/python/transformers (#7993)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.2 to 6.33.4.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 23:46:32 +00:00
LocalAI [bot]
7e35ec6c4f chore: ⬆️ Update ggml-org/llama.cpp to bcf7546160982f56bc290d2e538544bbc0772f63 (#7991)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-12 21:14:33 +00:00
Ettore Di Giacinto
7891c33cb1 chore(vulkan): bump vulkan-sdk to 1.4.335.0 (#7981)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-12 07:51:26 +01:00
Ettore Di Giacinto
271cc79709 chore(backends): do not bundle cuda target directory (#7982)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-12 07:51:09 +01:00
LocalAI [bot]
3d12d5e70d chore: ⬆️ Update leejet/stable-diffusion.cpp to 885e62ea822e674c6837a8225d2d75f021b97a6a (#7979)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-11 22:44:11 +01:00
LocalAI [bot]
bc180c2638 chore: ⬆️ Update ggml-org/llama.cpp to 0c3b7a9efebc73d206421c99b7eb6b6716231322 (#7978)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-11 22:06:30 +01:00
Ettore Di Giacinto
2de30440fe fix(l4t-12): use pip to install python deps (#7967)
* fix: install only torch/torchvision from jetson index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use pip for l4t-12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Revert "fix: install only torch/torchvision from jetson index"

This reverts commit 2d2b020078

* chatterbox needs wheel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-11 00:21:32 +01:00
Copilot
673a80a578 feat: Filter backend gallery by system capabilities (#7950)
* Initial plan

* Add backend gallery filtering based on system capabilities

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Refactor L4T backend check to come before NVIDIA check

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Refactor: move capabilities business logic to capabilities.go and use constants

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* feat: display system capability in webui and refactor tests

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* chore: rename System/Capability

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor: use getSystemCapabilities in IsBackendCompatible for consistency

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* refactor: keep unused constants private in capabilities.go

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix: skip AMD/ROCm and Intel/SYCL tests on darwin

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-10 23:34:01 +01:00
Jon Roeber
2554e9fabe fix(model): do not assume success when deleting a model process (#7963)
* fix(model): do not assume success when deleting a model process

Signed-off-by: Jon Roeber <jon@roeber.dev>

* Update pkg/model/process.go

Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com>

---------

Signed-off-by: Jon Roeber <jon@roeber.dev>
Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-10 23:33:44 +01:00
LocalAI [bot]
5bfc3eebf8 chore: ⬆️ Update ggml-org/llama.cpp to b1377188784f9aea26b8abde56d4aee8c733eec7 (#7965)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 22:24:26 +01:00
LocalAI [bot]
ab893fe302 feat(swagger): update swagger (#7964)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 21:46:23 +01:00
Ettore Di Giacinto
c88074a19e feat(api): support 'reasoning' api field (#7959)
This PR adds support to support the 'reasoning' API field of the OpenAI
spec.

LocalAI now will extract automatically thinking tags in both SSE and
non-SSE mode. The changes are adapted as well to the Chat UI now that
will use the reasoning field to extract the thinking process and display
it in the chat.

This fixes https://github.com/mudler/LocalAI/issues/7944

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-10 19:06:12 +01:00
Copilot
5ca8f0aea0 feat: add tool/function calling support to Anthropic Messages API (#7956)
* Initial plan

* Add tool/function calling schema support to Anthropic Messages API

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add E2E tests for Anthropic tool calling

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Make tool calling tests require model to use tools

- First test now expects hasToolUse to be true with clear error message
- Third test now expects toolUseID to be non-empty (removed conditional)
- Both tests will now fail if model doesn't call the expected tools

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add E2E test for tool calling with streaming responses

- Tests that streaming events are properly emitted (content_block_start/delta/stop)
- Verifies tool_use blocks are accumulated correctly in streaming mode
- Ensures model calls tools and stop_reason is set to tool_use

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 18:44:22 +01:00
LocalAI [bot]
84234e531f chore(model gallery): 🤖 add 1 new models via gallery agent (#7954)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 12:34:23 +01:00
Copilot
4cbf9abfef feat: Add Anthropic Messages API support (#7948)
* Initial plan

* Add Anthropic Messages API support

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review comments: add error handling for JSON operations

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix test suite to use existing schema test runner

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add Anthropic e2e tests using anthropic-sdk-go for streaming and non-streaming

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-10 12:33:05 +01:00
LocalAI [bot]
fdc2c0737c chore: ⬆️ Update ggml-org/llama.cpp to 593da7fa49503b68f9f01700be9f508f1e528992 (#7946)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-09 21:13:04 +00:00
Ettore Di Giacinto
f4b0a304d7 chore(llama.cpp): propagate errors during model load (#7937)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-09 07:52:49 +01:00
Ettore Di Giacinto
d16ec7aa9e chore(deps): Bump llama.cpp to '480160d47297df43b43746294963476fc0a6e10f' (#7933)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-09 07:52:32 +01:00
Ettore Di Giacinto
d699b7ccdc Add backend configuration for Granite embedding model
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-09 00:44:10 +01:00
Ettore Di Giacinto
a4d224dd1b Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t" (#7936)
Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)"

This reverts commit f5dee90962.
2026-01-08 23:31:51 +01:00
Ettore Di Giacinto
917c7aa9f3 chore(ci): roll back l4t-cuda12 configurations (#7935)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 23:04:33 +01:00
LocalAI [bot]
5aa66842dd chore: ⬆️ Update leejet/stable-diffusion.cpp to 0e52afc6513cc2dea9a1a017afc4a008d5acf2b0 (#7930)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 22:48:46 +01:00
Ettore Di Giacinto
f5dee90962 chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)
This is because the main index might not contain all the dependencies
for torch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 22:48:03 +01:00
Copilot
06323df457 Optimize GPU library copying to preserve symlinks and avoid duplicates (#7931)
* Initial plan

* Optimize library copying to preserve symlinks and avoid duplicates

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: extract get_inode helper, use file type detection for sorting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify implementation by removing inode tracking

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add clarifying comment about basename deduplication

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 22:26:48 +01:00
Richard Palethorpe
98f28bf583 chore(docs): Add Crush and VoxInput to the integrations (#7924)
* chore(docs): Add Crush and VoxInput to the integrations

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-08 21:39:25 +01:00
Ettore Di Giacinto
383312b50e chore(l4t-12): do not use python 3.12 (wheels are only for 3.10) (#7928)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 19:00:07 +01:00
Ettore Di Giacinto
b736db4bbe chore(ci): use latest jetpack image for l4t (#7926)
This image is for HW prior Jetpack 7. Jetpack 7 broke compatibility with
older devices (which are still in use) such as AGX Orin or Jetsons.

While we do have l4t-cuda-13 images with sbsa support for new Nvidia
devices (Thor, DGX, etc). For older HW we are forced to keep old images
around as 24.04 does not seem to be supported.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 18:30:59 +01:00
LocalAI [bot]
09bc2e4a00 chore(model gallery): 🤖 add 1 new models via gallery agent (#7922)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-08 11:06:21 +01:00
LocalAI [bot]
c03e532a18 chore: ⬆️ Update ggml-org/llama.cpp to ae9f8df77882716b1702df2bed8919499e64cc28 (#7915)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 23:24:01 +01:00
Ettore Di Giacinto
fcb58ee243 fix(intel): Add ARG for Ubuntu codename in Dockerfile (#7917)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 21:55:18 +01:00
Copilot
b2ff1cea2a feat: enable Vulkan arm64 image builds (#7912)
* Initial plan

* Add arm64 support for Vulkan builds in Dockerfiles and workflows

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 21:49:50 +01:00
Ettore Di Giacinto
b964b3d53e feat(backends): add moonshine backend for faster transcription (#7833)
* feat(backends): add moonshine backend for faster transcription

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add backend to CI, update AGENTS.md from this exercise

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 21:44:35 +01:00
LocalAI [bot]
0b26669d0b chore(model gallery): 🤖 add 1 new models via gallery agent (#7916)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 21:43:39 +01:00
Ettore Di Giacinto
5a9698bc69 chore(Dockerfile): restore GPU vendor specific sections (#7911)
Until we figure out https://github.com/mudler/LocalAI/issues/7909

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:34:23 +01:00
Ettore Di Giacinto
1fe0e9f74f chore(ci): restore building of GPU vendor images (#7910)
Until we figure out https://github.com/mudler/LocalAI/issues/7909

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:32:22 +01:00
Ettore Di Giacinto
ffb2dc4666 chore(detection): detect GPU vendor from files present in the system (#7908)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:18:27 +01:00
Ettore Di Giacinto
cfc2225fc7 chore(dockerfile): drop driver-requirements section (#7907)
* chore(dockerfile): drop driver-requirements section

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(ci): drop other builds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 16:18:14 +01:00
Copilot
fd53978a7b feat: package GPU libraries inside backend containers for unified base image (#7891)
* Initial plan

* Add GPU library packaging for isolated backend environments

- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: fix variable consistency and quoting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review issues: improve glob handling and remove redundant variable

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify main Dockerfile and workflow to use unified base image

- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 15:48:51 +01:00
LocalAI [bot]
7abc0242bb chore(model gallery): 🤖 add 1 new models via gallery agent (#7903)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-07 09:46:36 +01:00
LocalAI [bot]
23df29fbd3 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9be0b91927dfa4007d053df72dea7302990226bb (#7895)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-06 22:18:53 +01:00
LocalAI [bot]
fb9879949c chore: ⬆️ Update ggml-org/llama.cpp to ccbc84a5374bab7a01f68b129411772ddd8e7c79 (#7894)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-06 22:18:35 +01:00
Manish Dewangan
1642b39cb8 [gallery] add JSON schema for gallery model specification (#7890)
Add JSON Schema for gallery model specification

Signed-off-by: devmanishofficial <devmanishofficial@gmail.com>
2026-01-06 22:10:43 +01:00
Richard Palethorpe
e6ba26c3e7 chore: Update to Ubuntu24.04 (cont #7423) (#7769)
* ci(workflows): bump GitHub Actions images to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04)

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): bump GitHub Actions CUDA support to 12.9

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): disable parallel backend builds to avoid race conditions

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): update backend Dockerfiles to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore: add local-ai-launcher to .gitignore

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix backends GitHub Actions workflows after rebase

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): use build-time UBUNTU_VERSION variable

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(docker): remove libquadmath0 from requirements-stage base image

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(docker): correct CUDA installation steps in backend Dockerfiles

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): update ROCm to 6.4 and align Python hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix failing GitHub Actions runners

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): rm all traces of CUDA 11

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): Add Ubuntu codename as an argument

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
2026-01-06 15:26:42 +01:00
Ettore Di Giacinto
26c4f80d1b chore(llama.cpp/flags): simplify conditionals (#7887)
If ggml handle conditionals correctly we don't need to handle it here.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-06 15:02:20 +01:00
coffeerunhobby
5add7b47f5 fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) (#7864)
* Fix BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge)

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>

* Address feedback from review

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-06 00:13:48 +00:00
Ettore Di Giacinto
3244ccc224 chore(image-ui): simplify interface (#7882)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-05 23:20:28 +01:00
LocalAI [bot]
4f7b6b0bff chore: ⬆️ Update ggml-org/llama.cpp to e443fbcfa51a8a27b15f949397ab94b5e87b2450 (#7881)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:55:40 +01:00
LocalAI [bot]
3a629cea2f chore: ⬆️ Update ggml-org/whisper.cpp to 679bdb53dbcbfb3e42685f50c7ff367949fd4d48 (#7879)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:55:16 +01:00
LocalAI [bot]
f917feda29 chore: ⬆️ Update leejet/stable-diffusion.cpp to c5602a676caff5fe5a9f3b76b2bc614faf5121a5 (#7880)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 22:54:56 +01:00
dependabot[bot]
e2018cdc8f chore(deps): bump github.com/labstack/echo/v4 from 4.14.0 to 4.15.0 (#7875)
Bumps [github.com/labstack/echo/v4](https://github.com/labstack/echo) from 4.14.0 to 4.15.0.
- [Release notes](https://github.com/labstack/echo/releases)
- [Changelog](https://github.com/labstack/echo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/labstack/echo/compare/v4.14.0...v4.15.0)

---
updated-dependencies:
- dependency-name: github.com/labstack/echo/v4
  dependency-version: 4.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 22:54:30 +01:00
Manish Dewangan
a3b8a94187 fix(ui): fix 404 on API menu link by pointing to index.html (#7878)
Signed-off-by: devmanishofficial <devmanishofficial@gmail.com>
2026-01-05 22:54:14 +01:00
dependabot[bot]
41de7d32ad chore(deps): bump dependabot/fetch-metadata from 2.4.0 to 2.5.0 (#7876)
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](https://github.com/dependabot/fetch-metadata/compare/v2.4.0...v2.5.0)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 20:10:07 +00:00
Richard Palethorpe
93364df0a8 chore(AGENTS.md): Add section to help with building backends (#7871)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-01-05 18:25:52 +01:00
Ettore Di Giacinto
21c84f432f feat(function): Add tool streaming, XML Tool Call Parsing Support (#7865)
* feat(function): Add XML Tool Call Parsing Support

Extend the function parsing system in LocalAI to support XML-style tool calls, similar to how JSON tool calls are currently parsed. This will allow models that return XML format (like <tool_call><function=name><parameter=key>value</parameter></function></tool_call>) to be properly parsed alongside text content.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* thinking before tool calls, more strict support for corner cases with no tools

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Support streaming tools

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Iterative JSON

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Iterative parsing

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Consume JSON marker

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixup

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix pending TODOs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Don't run other parsing with ParseRegex

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-05 18:25:40 +01:00
LocalAI [bot]
9d3da0bed5 chore: ⬆️ Update ggml-org/llama.cpp to 4974bf53cf14073c7b66e1151348156aabd42cb8 (#7861)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-05 00:10:18 +01:00
LocalAI [bot]
1b063b5595 chore: ⬆️ Update leejet/stable-diffusion.cpp to b90b1ee9cf84ea48b478c674dd2ec6a33fd504d6 (#7862)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-04 23:52:01 +01:00
Ettore Di Giacinto
560bf50299 chore(Makefile): refactor common make targets (#7858)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-04 21:12:50 +01:00
LocalAI [bot]
a7e155240b chore: ⬆️ Update ggml-org/llama.cpp to e57f52334b2e8436a94f7e332462dfc63a08f995 (#7848)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-04 10:27:45 +01:00
LocalAI [bot]
793e4907a2 feat(swagger): update swagger (#7847)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-03 22:09:39 +01:00
Ettore Di Giacinto
d38811560c chore(docs): add opencode, GHA, and realtime voice assistant examples
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 22:03:43 +01:00
Ettore Di Giacinto
33cc0b8e13 fix(chat/ui): record model name in history for consistency (#7845)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 18:05:33 +01:00
lif
4cd95b8a9d fix: Highly inconsistent agent response to cogito agent calling MCP server - Body "Invalid http method" (#7790)
* fix: resolve duplicate MCP route registration causing 50% failure rate

Fixes #7772

The issue was caused by duplicate registration of the MCP endpoint
/mcp/v1/chat/completions in both openai.go and localai.go, leading
to a race condition where requests would randomly hit different
handlers with incompatible behaviors.

Changes:
- Removed duplicate MCP route registration from openai.go
- Kept the localai.MCPStreamEndpoint as the canonical handler
- Added all three MCP route patterns for backward compatibility:
  * /v1/mcp/chat/completions
  * /mcp/v1/chat/completions
  * /mcp/chat/completions
- Added comments to clarify route ownership and prevent future conflicts
- Fixed formatting in ui_api.go

The localai.MCPStreamEndpoint handler is more feature-complete as it
supports both streaming and non-streaming modes, while the removed
openai.MCPCompletionEndpoint only supported synchronous requests.

This eliminates the ~50% failure rate where the cogito library would
receive "Invalid http method" errors when internal HTTP requests were
routed to the wrong handler.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

* Address feedback from review

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 15:43:23 +01:00
LocalAI [bot]
8c504113a2 chore(model gallery): 🤖 add 1 new models via gallery agent (#7840)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-03 08:42:05 +01:00
coffeerunhobby
666d110714 fix: Prevent BMI2 instruction crash on AVX-only CPUs (#7817)
* Fix: Prevent BMI2 instruction crash on AVX-only CPUs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: apply no-bmi flags on non-darwin

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: coffeerunhobby <coffeerunhobby@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-03 08:36:55 +01:00
LocalAI [bot]
641606ae93 chore: ⬆️ Update ggml-org/llama.cpp to 706e3f93a60109a40f1224eaf4af0d59caa7c3ae (#7836)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 21:26:37 +00:00
Ettore Di Giacinto
5f6c941399 fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832)
fix(mmproj): fix loading mmproj in nested sub-dirs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-02 20:17:30 +01:00
LocalAI [bot]
1639fc6309 chore(model gallery): 🤖 add 1 new models via gallery agent (#7831)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 15:10:00 +01:00
Ettore Di Giacinto
841e8f6d47 fix(image-gen): fix scrolling issues (#7829)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-02 09:05:49 +01:00
LocalAI [bot]
fd152c97c0 chore(model gallery): 🤖 add 1 new models via gallery agent (#7826)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 08:45:43 +01:00
LocalAI [bot]
949de04052 chore: ⬆️ Update ggml-org/llama.cpp to ced765be44ce173c374f295b3c6f4175f8fd109b (#7822)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-02 08:44:49 +01:00
Ettore Di Giacinto
76cfe1f367 feat(image-gen/UI): move controls to the left, make the page more compact (#7823)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-01 22:07:42 +01:00
LocalAI [bot]
5ee6c1810b feat(swagger): update swagger (#7820)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-01 21:16:38 +01:00
LocalAI [bot]
7db79aadfa chore(model-gallery): ⬆️ update checksum (#7821)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-01 21:16:11 +01:00
nold
dee48679b4 Fix(gallery): Updated checksums for qwen3-vl-30b instruct & thinking (#7819)
* Fix(gallery): SHA256 hashes for qwen3-vl-30b-instruct

Signed-off-by: nold <Nold360@users.noreply.github.com>

* Fix(gallery): SHA256 checksums for qwen3-vl-30b-thinking

Signed-off-by: nold <Nold360@users.noreply.github.com>

---------

Signed-off-by: nold <Nold360@users.noreply.github.com>
2026-01-01 20:33:55 +01:00
LocalAI [bot]
94b47a9310 chore(model gallery): 🤖 add 1 new models via gallery agent (#7816)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-01 19:20:26 +01:00
LocalAI [bot]
bc3e8793ed chore: ⬆️ Update ggml-org/llama.cpp to 13814eb370d2f0b70e1830cc577b6155b17aee47 (#7809)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 23:04:01 +01:00
LocalAI [bot]
91978bb3a5 chore: ⬆️ Update ggml-org/whisper.cpp to e9898ddfb908ffaa7026c66852a023889a5a7202 (#7810)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 22:59:05 +01:00
Ettore Di Giacinto
797f27f09f feat(UI): image generation improvements (#7804)
* chore: drop mode from image generation(unused)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(UI): improve image generation front-end

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(UI): only ref images. files is to be deprecated

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do not override default steps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-31 21:59:46 +01:00
LocalAI [bot]
3f1631aa87 chore(model gallery): 🤖 add 1 new models via gallery agent (#7807)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 19:29:59 +01:00
LocalAI [bot]
dad509637e chore(model gallery): 🤖 add 1 new models via gallery agent (#7801)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 09:18:35 +01:00
LocalAI [bot]
218f3a126a chore: ⬆️ Update ggml-org/llama.cpp to 0f89d2ecf14270f45f43c442e90ae433fd82dab1 (#7795)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-31 08:53:41 +01:00
Ettore Di Giacinto
be77a845fa fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:34:25 +00:00
Ettore Di Giacinto
ca32286022 fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:27:48 +00:00
Ettore Di Giacinto
1f592505dd fix(gallery agent): change model
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:22:45 +00:00
Ettore Di Giacinto
b3bc623eb3 fix(gallery agent): fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 22:18:02 +00:00
Ettore Di Giacinto
e56391cf14 Add individual sponsors acknowledgment in README
Added a section to acknowledge individual sponsors and their contributions.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-30 23:01:22 +01:00
Ettore Di Giacinto
ef3ffe4a4e fix(gallery agent): fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 21:56:54 +00:00
Ettore Di Giacinto
3cffde2cd5 fix(gallery agent): skip model selection if only one
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-30 21:53:37 +00:00
LocalAI [bot]
234bf7e2ad feat(swagger): update swagger (#7794)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-30 21:05:01 +00:00
lif
ba73d2e759 fix: Failed to download checksums.txt when using launch to install localai (#7788)
* fix: add retry logic and fallback for checksums.txt download

- Add HTTP client with 30s timeout to ReleaseManager
- Implement downloadFileWithRetry with 3 attempts and exponential backoff
- Allow manual checksum placement at ~/.localai/checksums/checksums-<version>.txt
- Continue installation with warning if checksum download/verification fails
- Add test for HTTPClient initialization
- Fix linter error in systray_manager.go

Fixes #7385

Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: add retry logic and improve checksums.txt download handling

This commit addresses issue #7385 by implementing:
- Retry logic (3 attempts) for checksum file downloads
- Fallback to manually placed checksum files
- Option to proceed with installation if checksums unavailable (with warnings)
- Fixed resource leaks in download retry loop
- Added configurable HTTP client with 30s timeout

The installation will now be more resilient to network issues while
maintaining security through checksum verification when available.

Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: check for existing checksum file before downloading

This commit addresses the review feedback from mudler on PR #7788.
The code now checks if there's already a checksum file (either manually
placed or previously downloaded) and honors that, skipping download
entirely in such case.

Changes:
- Check for existing checksum file at ~/.localai/checksums/checksums-<version>.txt first
- Check for existing downloaded checksum file at binary path
- Only attempt to download if no existing checksum file is found
- This prevents unnecessary network requests and honors user-placed checksums

Signed-off-by: majiayu000 <1835304752@qq.com>

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 18:33:44 +01:00
Ettore Di Giacinto
592697216b Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" (#7789)
Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7774)"

This reverts commit 0c16f55b45.
2025-12-30 09:58:13 +01:00
lif
8bd7143a44 fix: propagate validation errors (#7787)
fix: validate MCP configuration in model config

Fixes #7334

The Validate() function was not checking if MCP configuration
(mcp.stdio and mcp.remote) contains valid JSON. This caused
malformed JSON with missing commas to be silently accepted.

Changes:
- Add MCP configuration validation to ModelConfig.Validate()
- Properly report validation errors instead of discarding them
- Add test cases for valid and invalid MCP configurations

The fix ensures that malformed JSON in MCP config sections
will now be caught and reported during validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 09:54:27 +01:00
lif
0d0ef0121c fix: Usage for image generation is incorrect (and causes error in LiteLLM) (#7786)
* fix: Add usage fields to image generation response for OpenAI API compatibility

Fixes #7354

Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.

Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

* fix: Correct usage field types for image generation API compatibility

Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.

Fixes #7354

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 09:53:05 +01:00
lif
d7b2eee08f fix: add nil checks before mergo.Merge to prevent panic in gallery model installation (#7785)
Fixes #7420

Added nil checks before calling mergo.Merge in InstallModelFromGallery and InstallModel
functions to prevent panic when req.Overrides or configOverrides are nil. The panic was
occurring at models.go:248 during Qwen-Image-Edit gallery model download.

Changes:
- Added nil check for req.Overrides before merging in InstallModelFromGallery (line 126)
- Added nil check for configOverrides before merging in InstallModel (line 248)
- Added test case to verify nil configOverrides are handled without panic

Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-30 09:51:45 +01:00
LocalAI [bot]
bc8ec5cb39 chore: ⬆️ Update ggml-org/llama.cpp to c9a3b40d6578f2381a1373d10249403d58c3c5bd (#7778)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-30 08:27:16 +01:00
dependabot[bot]
3f38fecdfc chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.1.0 to 1.2.0 (#7776)
chore(deps): bump github.com/modelcontextprotocol/go-sdk

Bumps [github.com/modelcontextprotocol/go-sdk](https://github.com/modelcontextprotocol/go-sdk) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/modelcontextprotocol/go-sdk/releases)
- [Commits](https://github.com/modelcontextprotocol/go-sdk/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/modelcontextprotocol/go-sdk
  dependency-version: 1.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 22:15:29 +01:00
dependabot[bot]
20a4199229 chore(deps): bump github.com/schollz/progressbar/v3 from 3.18.0 to 3.19.0 (#7775)
chore(deps): bump github.com/schollz/progressbar/v3

Bumps [github.com/schollz/progressbar/v3](https://github.com/schollz/progressbar) from 3.18.0 to 3.19.0.
- [Release notes](https://github.com/schollz/progressbar/releases)
- [Commits](https://github.com/schollz/progressbar/compare/v3.18.0...v3.19.0)

---
updated-dependencies:
- dependency-name: github.com/schollz/progressbar/v3
  dependency-version: 3.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 22:15:11 +01:00
Ettore Di Giacinto
ded9955881 chore(ci): do not select models if we have only 1 result
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 22:14:14 +01:00
dependabot[bot]
cf78f9a2a8 chore(deps): bump google.golang.org/grpc from 1.77.0 to 1.78.0 (#7777)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.78.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.77.0...v1.78.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.78.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 21:03:57 +01:00
dependabot[bot]
0c16f55b45 chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7774)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.9 to 2.22.11.
- [Release notes](https://github.com/securego/gosec/releases)
- [Commits](https://github.com/securego/gosec/compare/v2.22.9...v2.22.11)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 19:18:29 +00:00
Richard Palethorpe
0b80167912 chore: ⬆️ Update leejet/stable-diffusion.cpp to 4ff2c8c74bd17c2cfffe3a01be77743fb3efba2f (#7771)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix: Add KL_OPTIMAL scheduler, pass sampler to default scheduler for LCM and fixup other refactorings from upstream

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* Delete backend/go/stablediffusion-ggml/compile_commands.json

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-29 19:06:35 +01:00
Richard Palethorpe
99b5c5f156 feat(api): Allow tracing of requests and responses (#7609)
* feat(api): Allow tracing of requests and responses

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(traces): Add traces UI

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-29 11:06:06 +01:00
Ettore Di Giacinto
9ab812a8e8 chore(ci): be more precise when detecting existing models (#7767)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 10:06:42 +01:00
Ettore Di Giacinto
185a685211 fix(amd-gpu): correctly show total and used vram (#7761)
An example output of `rocm-smi --showproductname --showmeminfo vram --showuniqueid --csv`:

```
device,Unique ID,VRAM Total Memory (B),VRAM Total Used Memory (B),Card Series,Card Model,Card Vendor,Card SKU,Subsystem ID,Device Rev,Node ID,GUID,GFX Version
card0,0x9246____________,17163091968,692142080,Navi 21 [Radeon RX 6800/6800 XT / 6900 XT],0x73bf,Advanced Micro Devices Inc. [AMD/ATI],001,0x2406,0xc1,1,45534,gfx1030
card1,N/A,67108864,26079232,Raphael,0x164e,Advanced Micro Devices Inc. [AMD/ATI],RAPHAEL,0x364e,0xc6,2,52156,gfx1036
```

Total memory is actually showed before the total used memory as can be seen in https://github.com/LostRuins/koboldcpp/issues/1104#issuecomment-2321143507.

This PR fixes https://github.com/mudler/LocalAI/issues/7724

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-29 07:57:07 +01:00
LocalAI [bot]
1a6fd0f7fc chore: ⬆️ Update ggml-org/llama.cpp to 4ffc47cb2001e7d523f9ff525335bbe34b1a2858 (#7760)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-28 21:10:39 +00:00
LocalAI [bot]
c95c482f36 chore: ⬆️ Update ggml-org/llama.cpp to a4bf35889eda36d3597cd0f8f333f5b8a2fcaefc (#7751)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-27 21:09:12 +00:00
Ettore Di Giacinto
21c464c34f fix(cli): import via CLI needs system state (#7746)
pass system state to application config to avoid nil pointer exception
during import.

Fixes: https://github.com/mudler/LocalAI/issues/7728

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-27 11:10:28 +01:00
LocalAI [bot]
ddf0281785 chore: ⬆️ Update ggml-org/llama.cpp to 7ac8902133da6eb390c4d8368a7d252279123942 (#7740)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-26 21:44:34 +00:00
LocalAI [bot]
86c68c9623 chore: ⬆️ Update ggml-org/llama.cpp to 85c40c9b02941ebf1add1469af75f1796d513ef4 (#7731)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 21:10:28 +00:00
Ettore Di Giacinto
c844b7ac58 feat: disable force eviction (#7725)
* feat: allow to set forcing backends eviction while requests are in flight

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: try to make the request sit and retry if eviction couldn't be done

Otherwise calls that in order to pass would need to shutdown other
backends would just fail.

In this way instead we make the request sit and retry eviction until it
succeeds. The thresholds can be configured by the user.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose settings to CLI

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 14:26:18 +01:00
Ettore Di Giacinto
bb459e671f fix(ui): correctly parse import errors (#7726)
errors are nested

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 10:43:12 +01:00
LocalAI [bot]
2fe6e278c8 chore: ⬆️ Update ggml-org/llama.cpp to c18428423018ed214c004e6ecaedb0cbdda06805 (#7718)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 10:00:40 +01:00
LocalAI [bot]
ae69921d77 chore: ⬆️ Update ggml-org/whisper.cpp to 6114e692136bea917dc88a5eb2e532c3d133d963 (#7717)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-25 10:00:24 +01:00
Ettore Di Giacinto
bf2f95c684 chore(docs): update docs with cuda 13 instructions and the new vibevoice backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-25 10:00:07 +01:00
LocalAI [bot]
94069f2751 docs: ⬆️ update docs version mudler/LocalAI (#7716)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-24 21:06:02 +00:00
LocalAI [bot]
aadec0b8cb chore(model gallery): 🤖 add 1 new models via gallery agent (#7712)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-24 13:00:54 +01:00
Ettore Di Giacinto
35d71cf25e fix: remove duplicate logging line
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 09:35:18 +01:00
Ettore Di Giacinto
39a5a84e64 fix: include virtual config
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 09:30:29 +01:00
Ettore Di Giacinto
83ed16f325 chore(logging): be consistent and do not emit logs from echo (#7710)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 09:22:27 +01:00
Ettore Di Giacinto
c8173f0f67 chore(gallery): cleanup old architectures
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 09:14:03 +01:00
LocalAI [bot]
6dc2dbc835 chore(model gallery): 🤖 add 1 new models via gallery agent (#7707)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-24 08:34:18 +01:00
Ettore Di Giacinto
0a168830ea chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params (#7706)
* chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: update AGENTS.md

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-24 00:28:27 +01:00
LocalAI [bot]
96d3f0ebc8 chore(model gallery): 🤖 add 1 new models via gallery agent (#7700)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-23 08:53:18 +01:00
Ettore Di Giacinto
b8aacb39e8 Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" (#7698)
Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7690)"

This reverts commit b698033ef9.
2025-12-22 23:58:42 +01:00
Ettore Di Giacinto
b36a7593fa chore(gallery): cleanup old (superseded) archs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-22 22:55:53 +00:00
Ettore Di Giacinto
1ab91edc08 chore(gallery): cleanup old (superseded) archs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-22 22:53:29 +00:00
Ettore Di Giacinto
31f4e0c46d chore(gallery agent): various fixups (#7697)
* chore(ci/agent): fix formatting issues

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: get icon from readme/hf and prepend to the gallery file

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-22 23:46:40 +01:00
dependabot[bot]
07c80fba88 chore(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.30 (#7692)
Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.29 to 1.7.30.
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.29...v1.7.30)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-version: 1.7.30
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 22:43:42 +01:00
dependabot[bot]
9256a21d2c chore(deps): bump github.com/jaypipes/ghw from 0.21.1 to 0.21.2 (#7694)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.21.1 to 0.21.2.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.21.1...v0.21.2)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.21.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 22:43:00 +01:00
dependabot[bot]
b3a81292c1 chore(deps): bump github.com/mudler/cogito from 0.7.1 to 0.7.2 (#7691)
Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.7.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 22:42:35 +01:00
dependabot[bot]
5fc0cafd86 chore(deps): bump github.com/mudler/xlog from 0.0.3 to 0.0.4 (#7695)
Bumps [github.com/mudler/xlog](https://github.com/mudler/xlog) from 0.0.3 to 0.0.4.
- [Release notes](https://github.com/mudler/xlog/releases)
- [Commits](https://github.com/mudler/xlog/compare/v0.0.3...v0.0.4)

---
updated-dependencies:
- dependency-name: github.com/mudler/xlog
  dependency-version: 0.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 22:42:08 +01:00
Richard Palethorpe
9783aeaef5 chore: Add AGENTS.md (#7688)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-22 22:41:33 +01:00
dependabot[bot]
b698033ef9 chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7690)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.9 to 2.22.11.
- [Release notes](https://github.com/securego/gosec/releases)
- [Commits](https://github.com/securego/gosec/compare/v2.22.9...v2.22.11)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-22 19:09:06 +00:00
Ettore Di Giacinto
fc6057a952 chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' (#7684)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-22 09:50:42 +01:00
Ettore Di Giacinto
8b3e0ebf8a chore: allow to set local-ai log format, default to custom one (#7679)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-21 21:21:59 +01:00
Mikhail Khludnev
53b0530275 docs: Add langchain-localai integration package to documentation (#7677)
Add `langchain-localai` integration package to documentation

Signed-off-by: Mikhail Khludnev <mkhludnev@users.noreply.github.com>
2025-12-21 21:02:14 +01:00
Ettore Di Giacinto
99d301fcf9 chore(deps): bump xlog to v0.0.3 (#7675)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-21 19:36:54 +01:00
Ettore Di Giacinto
c37785b78c chore(refactor): move logging to common package based on slog (#7668)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-21 19:33:13 +01:00
LocalAI [bot]
38cde81ff4 chore: ⬆️ Update ggml-org/llama.cpp to 52ab19df633f3de5d4db171a16f2d9edd2342fec (#7665)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-20 21:09:15 +00:00
Ettore Di Giacinto
8ba5d6e796 chore(cogito): respect application-level logging and propagate (#7656)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-19 23:02:08 +01:00
Ettore Di Giacinto
8b6f443cd5 chore(deps): bump cogito to latest and adapt API changes (#7655)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-19 22:50:18 +01:00
LocalAI [bot]
626057bcca chore: ⬆️ Update ggml-org/llama.cpp to ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787 (#7654)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-19 21:15:39 +00:00
LocalAI [bot]
aa0efeb0a8 chore: ⬆️ Update ggml-org/whisper.cpp to 6c22e792cb0ee155b6587ce71a8410c3aeb06949 (#7644)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-19 09:26:41 +01:00
LocalAI [bot]
f25ac00bca chore: ⬆️ Update ggml-org/llama.cpp to f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd (#7642)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 21:17:14 +00:00
Richard Palethorpe
c3494a0927 chore: ⬆️ Update leejet/stable-diffusion.cpp to bda7fab9f208dff4b67179a68f694b6ddec13326 (#7639)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): Don't set removed lora model dir

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 20:52:22 +01:00
Richard Palethorpe
716dba94b4 feat(whisper): Add prompt to condition transcription output (#7624)
* chore(makefile): Add buildargs for sd and cuda when building backend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(whisper): Add prompt to condition transcription output

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-18 14:40:45 +01:00
mintyleaf
247983265d fix(uri): consider subfolders when expanding huggingface URLs (#7634)
Update uri.go

Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
2025-12-18 09:12:16 +01:00
LocalAI [bot]
5515119a7e chore: ⬆️ Update ggml-org/llama.cpp to d37fc935059211454e9ad2e2a44e8ed78fd6d1ce (#7629)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 09:07:09 +01:00
LocalAI [bot]
4535e7dfc4 chore: ⬆️ Update ggml-org/whisper.cpp to 3e79e73eee32e924fbd34587f2f2ac5a45a26b61 (#7630)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-18 09:06:48 +01:00
Ettore Di Giacinto
d8ee02e607 chore(tests): simplify tests and run intensive ones only once
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-18 09:05:58 +01:00
Ettore Di Giacinto
2d2e8759bb fix(ci): remove specific version for grpcio packages (#7627)
Updated grpcio-tools and grpcio installation to latest version.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-17 19:18:07 +01:00
LocalAI [bot]
14bb65b57b chore: ⬆️ Update ggml-org/llama.cpp to ef83fb8601229ff650d952985be47e82d644bfaa (#7611)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-17 08:32:42 +01:00
Ettore Di Giacinto
3ca90876f1 chore(memory detection): do not use go-sigar as requires CGO on darwin (#7618)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 23:10:42 +01:00
Ettore Di Giacinto
f251bdee64 chore: fixup tests with defaults from constants
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 21:26:55 +00:00
Ettore Di Giacinto
61afe4ca60 chore: drop drawin-x86_64 support (#7616)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 21:22:15 +01:00
Ettore Di Giacinto
424c95edba fix: correctly propagate error during model load (#7610)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 18:26:54 +01:00
Ettore Di Giacinto
b348a99b03 chore: move defaults to constants
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 17:40:51 +01:00
Ettore Di Giacinto
f3c70a96ba chore(memory-reclaimer): use saner defaults
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 16:25:09 +01:00
Ettore Di Giacinto
e3e5f59965 fix(ram): do not read from cgroup (#7606)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 13:28:11 +01:00
blightbow
67baf66555 feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling (#7556)
* feat(mlx): add thread-safe LRU prompt cache

Port mlx-lm's LRUPromptCache to fix race condition where concurrent
requests corrupt shared KV cache state. The previous implementation
used a single prompt_cache instance shared across all requests.

Changes:
- Add backend/python/common/mlx_cache.py with ThreadSafeLRUPromptCache
- Modify backend.py to use per-request cache isolation via fetch/insert
- Add prefix matching for cache reuse across similar prompts
- Add LRU eviction (default 10 entries, configurable)
- Add concurrency and cache unit tests

The cache uses a trie-based structure for efficient prefix matching,
allowing prompts that share common prefixes to reuse cached KV states.
Thread safety is provided via threading.Lock.

New configuration options:
- max_cache_entries: Maximum LRU cache entries (default: 10)
- max_kv_size: Maximum KV cache size per entry (default: None)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* feat(mlx): add min_p and top_k sampler support

Add MinP field to proto (field 52) following the precedent set by
other non-OpenAI sampling parameters like TopK, TailFreeSamplingZ,
TypicalP, and Mirostat.

Changes:
- backend.proto: Add float MinP field for min-p sampling
- backend.py: Extract and pass min_p and top_k to mlx_lm sampler
  (top_k was in proto but not being passed)
- test.py: Fix test_sampling_params to use valid proto fields and
  switch to MLX-compatible model (mlx-community/Llama-3.2-1B-Instruct)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* refactor(mlx): move mlx_cache.py from common to mlx backend

The ThreadSafeLRUPromptCache is only used by the mlx backend. After
evaluating mlx-vlm, it was determined that the cache cannot be shared
because mlx-vlm's generate/stream_generate functions don't support
the prompt_cache parameter that mlx_lm provides.

- Move mlx_cache.py from backend/python/common/ to backend/python/mlx/
- Remove sys.path manipulation from backend.py and test.py
- Fix test assertion to expect "MLX model loaded successfully"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* test(mlx): add comprehensive cache tests and document upstream behavior

Added comprehensive unit tests (test_mlx_cache.py) covering all cache
operation modes:
- Exact match
- Shorter prefix match
- Longer prefix match with trimming
- No match scenarios
- LRU eviction and access order
- Reference counting and deep copy behavior
- Multi-model namespacing
- Thread safety with data integrity verification

Documents upstream mlx_lm/server.py behavior: single-token prefixes are
deliberately not matched (uses > 0, not >= 0) to allow longer cached
sequences to be preferred for trimming. This is acceptable because real
prompts with chat templates are always many tokens.

Removed weak unit tests from test.py that only verified "no exception
thrown" rather than correctness.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* chore(mlx): remove unused MinP proto field

The MinP field was added to PredictOptions but is not populated by the
Go frontend/API. The MLX backend uses getattr with a default value,
so it works without the proto field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

---------

Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 11:27:46 +01:00
Ettore Di Giacinto
878c9d46d5 fix: improve ram estimation (#7603)
* fix: default to 10seconds of watchdog if runtime setting is malformed

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use gosigar for RAM estimation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 10:18:36 +01:00
Ettore Di Giacinto
b841a495da Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" (#7602)
Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7588)"

This reverts commit 648dfc0389.
2025-12-16 09:48:46 +01:00
Ettore Di Giacinto
f75903d7f7 Update latest project news in README
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-16 09:16:42 +01:00
Ettore Di Giacinto
50f9c9a058 feat(watchdog): add Memory resource reclaimer (#7583)
* feat(watchdog): add GPU reclaimer

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Handle vram calculation for unified memory devices

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Support RAM eviction, set watchdog interval from runtime settings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-16 09:15:18 +01:00
dependabot[bot]
dbd25885c3 chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers (#7594)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.1.0 to 5.2.0.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.1.0...v5.2.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 09:12:57 +01:00
dependabot[bot]
3d55055126 chore(deps): bump github.com/jaypipes/ghw from 0.20.0 to 0.21.1 (#7591)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.20.0 to 0.21.1.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.20.0...v0.21.1)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.21.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 08:16:05 +01:00
dependabot[bot]
af7ba2e3de chore(deps): bump github.com/labstack/echo/v4 from 4.13.4 to 4.14.0 (#7589)
Bumps [github.com/labstack/echo/v4](https://github.com/labstack/echo) from 4.13.4 to 4.14.0.
- [Release notes](https://github.com/labstack/echo/releases)
- [Changelog](https://github.com/labstack/echo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/labstack/echo/compare/v4.13.4...v4.14.0)

---
updated-dependencies:
- dependency-name: github.com/labstack/echo/v4
  dependency-version: 4.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 08:15:41 +01:00
LocalAI [bot]
7a3b0bbfaa chore: ⬆️ Update leejet/stable-diffusion.cpp to 200cb6f2ca07e40fa83b610a4e595f4da06ec709 (#7597)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-16 08:15:15 +01:00
dependabot[bot]
648dfc0389 chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 (#7588)
Bumps [securego/gosec](https://github.com/securego/gosec) from 2.22.9 to 2.22.11.
- [Release notes](https://github.com/securego/gosec/releases)
- [Commits](https://github.com/securego/gosec/compare/v2.22.9...v2.22.11)

---
updated-dependencies:
- dependency-name: securego/gosec
  dependency-version: 2.22.11
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 01:49:11 +00:00
dependabot[bot]
b396413ad5 chore(deps): bump actions/download-artifact from 6 to 7 (#7587)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v6...v7)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 00:14:02 +01:00
dependabot[bot]
2ad928678c chore(deps): bump peter-evans/create-pull-request from 7 to 8 (#7586)
Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 7 to 8.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](https://github.com/peter-evans/create-pull-request/compare/v7...v8)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 00:13:42 +01:00
dependabot[bot]
9b27b53a50 chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 (#7590)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.27.2 to 2.27.3.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.27.2...v2.27.3)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.27.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 22:58:45 +01:00
Ettore Di Giacinto
2387b266d8 chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server (#7584)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-15 21:55:20 +01:00
dependabot[bot]
0f2df23c61 chore(deps): bump actions/upload-artifact from 5 to 6 (#7585)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5 to 6.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-15 19:33:48 +00:00
Ettore Di Giacinto
8ac7e8c299 fix(chat-ui): model selection toggle and new chat (#7574)
Fixes a minor glitch that happens when switching model in from the chat
pane where the header was not getting updated. Besides, it allows to
create new chat directly when clicking from the management pane to the
model.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-14 22:29:11 +01:00
LocalAI [bot]
0f5cc4c07b chore: ⬆️ Update ggml-org/llama.cpp to 5c8a717128cc98aa9e5b1c44652f5cf458fd426e (#7573)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-14 22:21:54 +01:00
LocalAI [bot]
3e4e6777d8 chore: ⬆️ Update ggml-org/llama.cpp to 5266379bcae74214af397f36aa81b2a08b15d545 (#7563)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-14 11:41:10 +01:00
Simon Redman
5de539ab07 fix(7355): Update llama-cpp grpc for v3 interface (#7566)
* fix(7355): Update llama-cpp grpc for v3 interface

Signed-off-by: Simon Redman <simon@ergotech.com>

* feat(llama-gprc): Trim whitespace from servers list

Signed-off-by: Simon Redman <simon@ergotech.com>

* Trim trailing spaces in grpc-server.cpp

Signed-off-by: Simon Redman <simon@ergotech.com>

---------

Signed-off-by: Simon Redman <simon@ergotech.com>
2025-12-14 11:40:33 +01:00
LocalAI [bot]
3013d1c7b5 chore: ⬆️ Update leejet/stable-diffusion.cpp to 43a70e819b9254dee0d017305d6992f6bb27f850 (#7562)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-13 22:52:20 +01:00
LocalAI [bot]
073b3855d9 chore: ⬆️ Update ggml-org/whisper.cpp to 2551e4ce98db69027d08bd99bcc3f1a4e2ad2cef (#7561)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-13 21:22:14 +00:00
Ettore Di Giacinto
e1874cdb54 feat(ui): add mask to install custom backends (#7559)
* feat: allow to install backends from URL in the WebUI and API

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* trace backends installations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-13 19:11:32 +01:00
Ettore Di Giacinto
7790a24682 Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory" (#7558)
Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend…"

This reverts commit 1b4aa6f1be.
2025-12-13 17:04:46 +01:00
dependabot[bot]
1b4aa6f1be chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory (#7549)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/diffusers directory: torch.


Updates `torch` from 2.5.1+cxx11.abi to 2.7.1+cpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+cpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-13 13:12:18 +00:00
Ettore Di Giacinto
504d954aea Add chardet to requirements-l4t13.txt
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-13 12:59:03 +01:00
Ettore Di Giacinto
1383ad6d6d Change runner from macOS-14 to macos-latest
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-13 10:11:27 +01:00
Ettore Di Giacinto
5e270ba5bd Change runner from macOS-14 to macos-latest
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-13 10:10:47 +01:00
Ettore Di Giacinto
6d2a535813 chore(l4t13): use pytorch index (#7546)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-13 10:04:57 +01:00
Ettore Di Giacinto
abfb0ff8fe feat(stablediffusion-ggml): add lora support (#7542)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-13 08:29:06 +01:00
LocalAI [bot]
2bd6faaff5 chore: ⬆️ Update leejet/stable-diffusion.cpp to 11ab095230b2b67210f5da4d901588d56c71fe3a (#7539)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-12 21:31:13 +00:00
Ettore Di Giacinto
1a9f5da1b7 Update Discord badge with dynamic member count
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-12 12:50:55 +01:00
Ettore Di Giacinto
7f823fce7c Update Discord badge in README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-12 12:34:57 +01:00
Ettore Di Giacinto
fc5b9ebfcc feat(loader): enhance single active backend to support LRU eviction (#7535)
* feat(loader): refactor single active backend support to LRU

This changeset introduces LRU management of loaded backends. Users can
set now a maximum number of models to be loaded concurrently, and, when
setting LocalAI in single active backend mode we set LRU to 1 for
backward compatibility.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-12 12:28:38 +01:00
LocalAI [bot]
c141a40e00 chore(model-gallery): ⬆️ update checksum (#7530)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-12 08:16:04 +01:00
Ettore Di Giacinto
0b130fb811 fix(llama.cpp): handle corner cases with tool array content (#7528)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-12 08:15:45 +01:00
LocalAI [bot]
0771a2d3ec chore: ⬆️ Update ggml-org/llama.cpp to a81a569577cc38b32558958b048228150be63eae (#7529)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-11 21:55:44 +00:00
Richard Palethorpe
9441eb509a chore(makefile): Add buildargs for sd and cuda when building backend (#7525)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-11 20:33:19 +01:00
Ettore Di Giacinto
8442f33712 chore(deps): bump stable-diffusion.cpp to '8823dc48bcc1598eb9671da7b69e45338d0cc5a5' (#7524)
* chore(deps): bump stable-diffusion.cpp to '8823dc48bcc1598eb9671da7b69e45338d0cc5a5'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(Dockerfile.golang): Make curl noisy to see when download fails

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Richard Palethorpe <io@richiejp.com>
2025-12-11 20:32:25 +01:00
Ettore Di Giacinto
5dde7e9ac6 fix: make sure to close on errors (#7521)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-11 14:03:20 +01:00
LocalAI [bot]
72621a1d1c chore: ⬆️ Update ggml-org/llama.cpp to 4dff236a522bd0ed949331d6cb1ee2a1b3615c35 (#7508)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-11 08:15:38 +01:00
Ettore Di Giacinto
3b5c2ea633 feat(ui): allow to order search results (#7507)
* feat(ui): improve table view and let items to be sorted

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: use constants

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-11 00:11:33 +01:00
LocalAI [bot]
e1d060d147 chore: ⬆️ Update ggml-org/whisper.cpp to 9f5ed26e43c680bece09df7bdc8c1b7835f0e537 (#7509)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-10 23:09:13 +01:00
Ettore Di Giacinto
32dcb58e89 feat(vibevoice): add new backend (#7494)
* feat(vibevoice): add backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add workflow and backend index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(gallery): add vibevoice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use self-hosted for intel builds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pin python version for l4t

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-10 21:14:21 +01:00
LocalAI [bot]
ef44ace73f chore: ⬆️ Update ggml-org/llama.cpp to 086a63e3a5d2dbbb7183a74db453459e544eb55a (#7496)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-10 12:05:13 +01:00
Ettore Di Giacinto
f51d3e380b fix(config): make syncKnownUsecasesFromString idempotent (#7493)
fix(config): correctly parse usecases from strings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-09 21:08:22 +01:00
Ettore Di Giacinto
6cc5cac7b0 fix(downloader): do not download model files if not necessary (#7492)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-09 19:08:10 +01:00
Ettore Di Giacinto
74ee1463fe chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' (#7484)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-09 15:41:37 +01:00
LocalAI [bot]
6c7b215687 chore: ⬆️ Update ggml-org/whisper.cpp to a8f45ab11d6731e591ae3d0230be3fec6c2efc91 (#7483)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-09 08:33:30 +01:00
dependabot[bot]
5e0bc37de3 chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 (#7475)
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.38.2 to 1.38.3.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.38.2...v1.38.3)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-version: 1.38.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 01:24:08 +00:00
dependabot[bot]
e28a00c952 chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.60.0 to 0.61.0 (#7477)
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus

Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.60.0 to 0.61.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.60.0...exporters/prometheus/v0.61.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.61.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 23:43:13 +00:00
dependabot[bot]
08f9a52594 chore(deps): bump github.com/mudler/cogito from 0.5.1 to 0.6.0 (#7474)
Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.5.1 to 0.6.0.
- [Release notes](https://github.com/mudler/cogito/releases)
- [Commits](https://github.com/mudler/cogito/compare/v0.5.1...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/mudler/cogito
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 22:40:33 +01:00
dependabot[bot]
bbce461f57 chore(deps): bump protobuf from 6.33.1 to 6.33.2 in /backend/python/transformers (#7481)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.1 to 6.33.2.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 22:13:18 +01:00
dependabot[bot]
22e13c362a chore(deps): bump actions/stale from 10.1.0 to 10.1.1 (#7473)
Bumps [actions/stale](https://github.com/actions/stale) from 10.1.0 to 10.1.1.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](5f858e3efb...997185467f)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-version: 10.1.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 21:15:37 +01:00
dependabot[bot]
6bd0442698 chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.38.0 to 1.39.0 (#7476)
chore(deps): bump go.opentelemetry.io/otel/sdk/metric

Bumps [go.opentelemetry.io/otel/sdk/metric](https://github.com/open-telemetry/opentelemetry-go) from 1.38.0 to 1.39.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.38.0...v1.39.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk/metric
  dependency-version: 1.39.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 19:30:21 +00:00
Ettore Di Giacinto
0380bfe006 Enhance README with video and screenshots
Added YouTube video link and screenshots section to README.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-08 17:08:15 +01:00
Ettore Di Giacinto
00a05208bc chore(docs): center video
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-08 16:59:11 +01:00
Ettore Di Giacinto
4a7cd256c9 Revise 'Screenshots' section to include video
Updated section title and added video link for LocalAI.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-08 16:56:34 +01:00
Ettore Di Giacinto
a27d0d151f Embed YouTube video in documentation
Added an embedded YouTube video to the documentation.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-08 16:53:20 +01:00
Ettore Di Giacinto
03a17a2986 fix(paths): remove trailing slash from requests (#7451)
This removes any ambiguity from how paths are handled, and at the same
time it uniforms the ui paths with the other paths that don't have a
trailing slash

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-07 21:45:09 +01:00
Ettore Di Giacinto
8ca98c90ea chore(importers/llama.cpp): add models to 'llama-cpp' subfolder (#7450)
This makes paths predictable, and avoids multiple model files to show in
the main view

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-07 21:44:57 +01:00
Ettore Di Giacinto
18b8956bd9 chore(gallery agent): strip thinking tags (#7464)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-07 19:25:41 +01:00
Ettore Di Giacinto
262afd28a0 chore(gallery agent): summary now is at root of the git repository (#7463)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-07 19:23:27 +01:00
LocalAI [bot]
5610384d8a chore: ⬆️ Update ggml-org/llama.cpp to db97837385edfbc772230debbd49e5efae843a71 (#7447)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-07 08:32:35 +01:00
rampa3
6aee29d18f fix(ui): Update few links in web UI from 'browse' to '/browse/' (#7445)
* Update few links in web UI from 'browse' to '/browse/'

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* Update core/http/views/404.html

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Update core/http/views/error.html

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* Update core/http/views/manage.html

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-06 22:40:26 +01:00
LocalAI [bot]
c3493e4917 chore: ⬆️ Update ggml-org/whisper.cpp to a88b93f85f08fc6045e5d8a8c3f94b7be0ac8bce (#7448)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-06 21:26:25 +00:00
LocalAI [bot]
edf7141b9b chore: ⬆️ Update ggml-org/llama.cpp to 8160b38a5fa8a25490ca33ffdd200cda51405688 (#7438)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-06 13:35:24 +01:00
Ettore Di Giacinto
446b686470 Update model version in gallery-agent workflow
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-05 22:08:16 +01:00
Ettore Di Giacinto
b287944f07 Add Proto Dependencies installation step
Added steps to install protobuf and Go dependencies in the GitHub Actions workflow.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-05 21:40:36 +01:00
LocalAI [bot]
f3ae358689 chore(model-gallery): ⬆️ update checksum (#7437)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-05 15:20:21 +01:00
Richard Palethorpe
c7aaeab683 fix(stablediffusion-ggml): Correct Z-Image model name (#7436)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-05 14:57:39 +01:00
Ettore Di Giacinto
024aa6a55b chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' (#7434)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-05 00:17:35 +01:00
Ettore Di Giacinto
7ce8a56e96 chore(ci/agent): correctly invoke go run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-04 23:12:04 +01:00
Ettore Di Giacinto
3e9ed48432 chore(ci/agent): support quantization
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-04 22:56:35 +01:00
Ettore Di Giacinto
963796ff51 Update localai-github-action to version 1.1
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 22:50:55 +01:00
Ettore Di Giacinto
6bd9a304bc Add local AI model to gallery agent workflow
Updated the GitHub Actions workflow to include the local AI model and modified environment variables for the gallery agent.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 22:43:31 +01:00
Ettore Di Giacinto
7990c7a401 chore(agent): update gallery agent to use importers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-04 22:23:43 +01:00
LocalAI [bot]
4bb93b1c4c chore(model-gallery): ⬆️ update checksum (#7433)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-04 21:23:26 +01:00
Copilot
1abbedd732 feat(diffusers): implement dynamic pipeline loader to remove per-pipeline conditionals (#7365)
* Initial plan

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add dynamic loader for diffusers pipelines and refactor backend.py

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix pipeline discovery error handling and test mock issue

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Address code review feedback: direct imports, better error handling, improved tests

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Address remaining code review feedback: specific exceptions, registry access, test imports

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add defensive fallback for DiffusionPipeline registry access

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Actually use dynamic pipeline loading for all pipelines in backend

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use dynamic loader consistently for all pipelines including AutoPipelineForText2Image

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move dynamic loader tests into test.py for CI compatibility

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Extend dynamic loader to discover any diffusers class type, not just DiffusionPipeline

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add AutoPipeline classes to pipeline registry for default model loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(python): set pyvenv python home

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do pyenv update during start

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Minor changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 19:02:06 +01:00
Ettore Di Giacinto
92ee8c2256 fix(ui): prevent box overflow in chat view (#7430)
Otherwise tool call and result might overflow the box

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-04 17:21:17 +01:00
Ettore Di Giacinto
78105e6b20 chore(ui): uniform buttons (#7429)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-04 17:18:51 +01:00
Richard Palethorpe
c2e4a1f29b feat(stablediffusion): Passthrough more parameters to support z-image and flux2 (#7419)
* feat(stablediffusion): Passthrough more parameters to support z-image and flux2

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(z-image): Add Z-Image-Turbo GGML to library

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stablediffusion-ggml): flush stderr and check errors when writing PNG

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stablediffusion-ggml): Re-allocate Go strings in C++

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stablediffusion-ggml): Try to avoid segfaults

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(stablediffusion-ggml): Init sample and easycache params

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 17:08:21 +01:00
Ettore Di Giacinto
100ebdfa2c chore(ci): do not overload the apple tests
Skip tests that are already run on other jobs and not really adding anything here. We have already functional tests that cover apple.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 14:15:15 +01:00
LocalAI [bot]
ca2e878aaf chore: ⬆️ Update ggml-org/llama.cpp to e9f9483464e6f01d843d7f0293bd9c7bc6b2221c (#7421)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 11:54:01 +01:00
Igor B. Poretsky
96e123d53a Messages output fix (#7424)
The internal echo command in sh does not support "-e" and "-E" options
and interprets backslash escape sequences by default. So we prefer the
external echo command when it is available.
2025-12-04 11:30:02 +01:00
LocalAI [bot]
7c5a0cde64 chore: ⬆️ Update leejet/stable-diffusion.cpp to 5865b5e7034801af1a288a9584631730b25272c6 (#7422)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 11:29:16 +01:00
Ettore Di Giacinto
edcbf82b31 chore(ci): add wget 2025-12-04 10:01:34 +01:00
Ettore Di Giacinto
6558caca85 chore(ci): adapt also golang-based backends docker images 2025-12-04 09:14:08 +01:00
Ettore Di Giacinto
b4172762d7 chore(ci): do override pip in 24.04
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 22:54:13 +01:00
Ettore Di Giacinto
dc6182bbb1 chore(ci): add wget to llama-cpp docker image builder
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 22:48:41 +01:00
Ettore Di Giacinto
1d1d52da59 chore(ci): small fixups to build arm64 images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 21:42:33 +01:00
Ettore Di Giacinto
46b1a1848f chore(ci): minor fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 16:47:31 +01:00
LocalAI [bot]
957eea3da3 chore: ⬆️ Update ggml-org/llama.cpp to 61bde8e21f4a1f9a98c9205831ca3e55457b4c78 (#7415)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-03 16:27:12 +01:00
Ettore Di Giacinto
ab4f2742a6 chore(ci): minor fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 16:26:33 +01:00
Ettore Di Giacinto
03f3bf2d94 chore(ci): only install runtime libs needed on arm64
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 15:13:21 +01:00
Ettore Di Giacinto
774ddc60db chore(ci): specify ubuntu version in pipelines
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 11:10:18 +01:00
Ettore Di Giacinto
0ca1322b43 chore(ci): correctly pass ubuntu-version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 09:58:10 +01:00
Ettore Di Giacinto
8dfeea2f55 fix: use ubuntu 24.04 for cuda13 l4t images (#7418)
* fix: use ubuntu 24.04 for cuda13 l4t images

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop openblas from containers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-03 09:47:03 +01:00
Ettore Di Giacinto
fea9018dc5 Revert "feat(stablediffusion): Passthrough more parameters to support z-image and flux2" (#7417)
Revert "feat(stablediffusion): Passthrough more parameters to support z-image…"

This reverts commit 4018e59b2a.
2025-12-02 22:14:28 +01:00
Ettore Di Giacinto
d8c7e90a69 Add Dockerfile for arm64 with nvpl installation (#7416)
Added installation of nvpl and updated apt-get commands for arm64 architecture.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-02 21:55:42 +01:00
Ettore Di Giacinto
c045b7a6bb Update Dockerfile to install cudss package
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-02 21:23:21 +01:00
Ettore Di Giacinto
7a5c61b057 fix: configure sbsa packages for arm64 (#7413)
* fix: configure sbsa packages for arm64

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-02 18:59:36 +01:00
Richard Palethorpe
4018e59b2a feat(stablediffusion): Passthrough more parameters to support z-image and flux2 (#7414)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2025-12-02 18:28:26 +01:00
Richard Palethorpe
aaece6685f chore(deps/stable-diffusion-ggml): update stablediffusion-ggml (#7411)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): fixup schedulers and samplers arrays, use default getters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-02 16:35:39 +01:00
Ettore Di Giacinto
f5df806f35 Fixup tags
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-02 15:15:41 +01:00
Ettore Di Giacinto
cfd95745ed feat: add cuda13 images (#7404)
* chore(ci): add cuda13 jobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to pipelines and to capabilities. Start to work on the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* capabilities: try to detect by looking at /usr/local

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* neutts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* backends.yaml

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add cuda13 l4t requirements.txt

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add cuda13 requirements.txt

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pin vllm

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Not all backends are compatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add vllm to requirements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* vllm is not pre-compiled for cuda 13

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-02 14:24:35 +01:00
dependabot[bot]
9872bdf455 chore(deps): bump appleboy/ssh-action from 1.2.3 to 1.2.4 (#7410)
Bumps [appleboy/ssh-action](https://github.com/appleboy/ssh-action) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/appleboy/ssh-action/releases)
- [Commits](https://github.com/appleboy/ssh-action/compare/v1.2.3...v1.2.4)

---
updated-dependencies:
- dependency-name: appleboy/ssh-action
  dependency-version: 1.2.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 08:00:16 +01:00
LocalAI [bot]
665441ca94 chore: ⬆️ Update ggml-org/llama.cpp to ec18edfcba94dacb166e6523612fc0129cead67a (#7406)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-12-02 07:59:52 +01:00
dependabot[bot]
60f50a356f chore(deps): bump github.com/google/go-containerregistry from 0.19.2 to 0.20.7 (#7409)
chore(deps): bump github.com/google/go-containerregistry

Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.19.2 to 0.20.7.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](https://github.com/google/go-containerregistry/compare/v0.19.2...v0.20.7)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.20.7
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-01 22:55:30 +00:00
Ettore Di Giacinto
045baf7fd2 fix(ui): navbar ordering and login icon (#7407)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-01 21:20:11 +01:00
Ettore Di Giacinto
8a54ffa668 fix: do not require auth for readyz/healthz endpoints (#7403)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-01 10:35:28 +01:00
Ettore Di Giacinto
e3bcba5c45 chore: ⬆️ Update ggml-org/llama.cpp to 7f8ef50cce40e3e7e4526a3696cb45658190e69a (#7402)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-01 07:50:40 +01:00
LocalAI [bot]
17d84c8556 feat(swagger): update swagger (#7400)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-30 21:39:29 +00:00
Ettore Di Giacinto
a3423f33e1 feat(agent-jobs): add multimedia support (#7398)
* feat(agent-jobs): add multimedia support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Refactoring

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-30 14:09:25 +01:00
Ettore Di Giacinto
45ee10ec50 feat(hf-api): return files in nested directories (#7396)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-30 09:06:54 +01:00
LocalAI [bot]
0824fd8efd chore: ⬆️ Update ggml-org/llama.cpp to 8c32d9d96d9ae345a0150cae8572859e9aafea0b (#7395)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-30 09:06:18 +01:00
LocalAI [bot]
a9b8869964 feat(swagger): update swagger (#7394)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-30 09:05:46 +01:00
Ettore Di Giacinto
54b5dfa8e1 chore: refactor css, restyle to be slightly minimalistic (#7397)
restyle

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-29 22:11:44 +01:00
Ettore Di Giacinto
468ac608f3 chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' (#7392)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-29 08:58:07 +01:00
Ettore Di Giacinto
53e5b2d6be feat: agent jobs panel (#7390)
* feat(agent): agent jobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Multiple webhooks, simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Do not use cron with seconds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Create separate pages for details

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Detect if no models have MCP configuration, show wizard

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Make services test to run

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-28 23:05:39 +01:00
Ettore Di Giacinto
4b5977f535 chore: drop pinning of python 3.12 (#7389)
Update install.sh

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-28 11:02:56 +01:00
Ettore Di Giacinto
0d877b1e71 Revert "chore(l4t): Update extra index URL for requirements-l4t.txt" (#7388)
Revert "chore(l4t): Update extra index URL for requirements-l4t.txt (#7383)"

This reverts commit 0d781e6b7e.
2025-11-28 11:02:11 +01:00
Ettore Di Giacinto
e27f1370eb chore(diffusers): Add PY_STANDALONE_TAG for l4t Python version (#7387)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-28 09:34:05 +01:00
LocalAI [bot]
1a53fd2b9b chore: ⬆️ Update ggml-org/llama.cpp to 4abef75f2cf2eee75eb5083b30a94cf981587394 (#7382)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-28 00:08:27 +01:00
Ettore Di Giacinto
e01d821314 chore: Add Python 3.12 support for l4t build profile (#7384)
Set Python version to 3.12 for l4t build profile.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 23:00:09 +01:00
Ettore Di Giacinto
0d781e6b7e chore(l4t): Update extra index URL for requirements-l4t.txt (#7383)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 22:02:06 +01:00
LocalAI [bot]
4c41f96157 docs: ⬆️ update docs version mudler/LocalAI (#7381)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-27 21:49:31 +01:00
Igor B. Poretsky
a8eb1c421b Clean data directory (#7378)
It seems to be no point to copy /etc/skel content to newly created data
directory.
2025-11-27 17:48:32 +01:00
Igor B. Poretsky
d27a281783 Correct user deletion with all its data (#7368)
Actually it is not necessary to remove particularly the local-ai data
directory before user deletion. It will be accomplished automatically by
the userdel command. But it is crucial to remove additional users from
the local-ai group to allow userdel command to delete the group itself.
2025-11-27 17:47:55 +01:00
Igor B. Poretsky
c411fe09fb Conventional way of adding extra apt repository (#7362) 2025-11-27 17:46:26 +01:00
Ettore Di Giacinto
7ccc383a8b chore(l4t/diffusers): bump nvidia l4t index for pytorch 2.9 (#7379)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 17:42:01 +01:00
Ettore Di Giacinto
2f8a2b1297 chore(deps): update diffusers dependency to use GitHub repo for l4t (#7369)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 16:02:48 +01:00
Igor B. Poretsky
acbcb44dbc Initialize sudo reference before its first actual use (#7367)
Unfortunately, in my previous pr I missed the fact that uninstall
procedure uses sudo as well. La colpa mia.
2025-11-27 15:20:46 +01:00
Igor B. Poretsky
ab022172a9 chore: switch from /usr/share to /var/lib for data storage (#7361)
* More appropriate place for data storing

The /usr/share subtree in Linux is used for data that generally are not
supposed to change. Conventional places for changeable data are usually
located under /var, so /var/lib seems to be a reasonable default here.

* Data paths consistency fix

* Directory name consistency fix
2025-11-27 09:18:28 +01:00
LocalAI [bot]
b5f4f4ac6d chore: ⬆️ Update ggml-org/llama.cpp to eec1e33a9ed71b79422e39cc489719cf4f8e0777 (#7363)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-27 09:17:25 +01:00
Igor B. Poretsky
c0d1d0211f fix: Initialize sudo reference before its first actual use (#7360) 2025-11-26 16:03:42 +01:00
Igor B. Poretsky
f617bec686 fix: double sudo invocation fix in the install script (#7359)
Double sudo invocation fix in the install script
2025-11-26 16:03:10 +01:00
Ettore Di Giacinto
7a94d237c4 chore(deps): bump llama.cpp to '583cb83416467e8abf9b37349dcf1f6a0083745a (#7358)
chore(deps): bump llama.cpp to '583cb83416467e8abf9b37349dcf1f6a0083745a'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-26 08:23:21 +01:00
LocalAI [bot]
304ac94d01 feat(swagger): update swagger (#7356)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-25 22:19:53 +01:00
Ettore Di Giacinto
f9f9b9d444 Update project news section in README.md
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-25 19:31:05 +01:00
dependabot[bot]
70d78b9fd4 chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 (#7343)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.76.0 to 1.77.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.76.0...v1.77.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.77.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 21:18:41 +01:00
dependabot[bot]
91248da09e chore(deps): bump actions/checkout from 5 to 6 (#7339)
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 21:18:15 +01:00
Gregory Mariani
745c31e013 feat(inpainting): add inpainting endpoint, wire ImageGenerationFunc and return generated image URL (#7328)
feat(inpainting): add inpainting endpoint with automatic model selection

Signed-off-by: Greg <marianigregory@pm.me>
2025-11-24 21:13:54 +01:00
dependabot[bot]
7e01aa8faa chore(deps): bump protobuf from 6.32.0 to 6.33.1 in /backend/python/transformers (#7340)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.32.0 to 6.33.1.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 20:12:17 +00:00
Ettore Di Giacinto
aceebf81d6 chore(ui): fix slider overflow
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-24 14:43:38 +01:00
Ettore Di Giacinto
71ed03102f feat(ui): add chat history (#7325)
* feat(chat): add history and management

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Display in progress chats

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fetch available context size as we switch chat

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add search

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Display MCP toggle correctly

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Re-ordering

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Re-style

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Stable ordering

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Display token/sec correctly

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Visual changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Display chat time

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-24 11:48:24 +01:00
LocalAI [bot]
f6d2a52cd5 chore: ⬆️ Update ggml-org/llama.cpp to 0c7220db56525d40177fcce3baa0d083448ec813 (#7337)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-24 09:11:38 +01:00
LocalAI [bot]
05a00b2399 chore: ⬆️ Update ggml-org/llama.cpp to 3f3a4fb9c3b907c68598363b204e6f58f4757c8c (#7336)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-22 21:53:40 +00:00
Ettore Di Giacinto
3a232446e0 Revert "chore(chatterbox): bump l4t index to support more recent pytorch" (#7333)
Revert "chore(chatterbox): bump l4t index to support more recent pytorch (#7332)"

This reverts commit 55607a5aac.
2025-11-22 10:10:27 +01:00
LocalAI [bot]
bdfe8431fa chore: ⬆️ Update ggml-org/llama.cpp to 23bc779a6e58762ea892eca1801b2ea1b9050c00 (#7331)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-22 08:44:01 +01:00
Ettore Di Giacinto
55607a5aac chore(chatterbox): bump l4t index to support more recent pytorch (#7332)
This should add support for devices like the DGX Spark

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 22:24:46 +01:00
Ettore Di Giacinto
ec492a4c56 fix(typo): environment variable name for max jobs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 18:37:22 +01:00
Ettore Di Giacinto
2defe98df8 fix(vllm): Update flash-attn to specific wheel URL
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 18:06:46 +01:00
Filipe Oliveira
b406b088a7 fix: Update Installer Options URL (#7330) 2025-11-21 17:29:36 +01:00
Ettore Di Giacinto
6261c87b1b Add NVCC_THREADS and MAX_JOB environment variables
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 16:14:13 +01:00
Ettore Di Giacinto
fa00aa0085 chore(ci): add OS check to skip test if not on Linux
Skip test on non-Linux operating systems.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 15:01:04 +01:00
Ettore Di Giacinto
0e53ce60b4 chore(ci): remove context size configuration from application
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 14:57:32 +01:00
Ettore Di Giacinto
8aba078439 chore(tests): add context size option to application initialization
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 09:50:05 +01:00
Ettore Di Giacinto
e88db7d142 fix(llama.cpp): handle corner cases with tool content (#7324)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-21 09:21:49 +01:00
LocalAI [bot]
b7b8a0a748 chore: ⬆️ Update ggml-org/llama.cpp to dd0f3219419b24740864b5343958a97e1b3e4b26 (#7322)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-21 08:11:47 +01:00
Ettore Di Giacinto
dd2828241c chore(docs): add documentation about import (#7315)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-20 23:07:36 +01:00
LocalAI [bot]
b8011f49f2 chore: ⬆️ Update ggml-org/whisper.cpp to 19ceec8eac980403b714d603e5ca31653cd42a3f (#7321)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-20 23:07:22 +01:00
Copilot
16e5689162 feat(importers): Add diffuser backend importer with ginkgo tests and UI support (#7316)
* Initial plan

* Add diffuser backend importer with ginkgo tests

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Finalize diffuser backend importer implementation

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Add diffuser preferences to model-editor import section

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Use gopkg.in/yaml.v3 for consistency in diffuser importer

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-20 22:38:30 +01:00
Ettore Di Giacinto
2dd42292dc feat(ui): runtime settings (#7320)
* feat(ui): add watchdog settings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Do not re-read env

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Some refactor, move other settings to runtime (p2p)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add API Keys handling

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Allow to disable runtime settings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Documentation

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* show MCP toggle in index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop context default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-20 22:37:20 +01:00
Ettore Di Giacinto
53d51671d7 Update Docker installation recommendation wording
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-20 17:27:48 +01:00
Ettore Di Giacinto
daf39e1efd chore(vllm/ci): set maximum number of jobs
Also added comments to clarify CPU usage during build.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-20 15:53:32 +01:00
Ettore Di Giacinto
382474e4a1 fix: do not delete files if used by other configured models (#7235)
* WIP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: prevent deletion of model files shared by multiple configurations (#7317)

* Initial plan

* fix: do not delete files if used by other configured models

- Fixed bug in DeleteModelFromSystem where OR was used instead of AND for file suffix check
- Fixed bug where model config filename comparison was incorrect
- Added comprehensive Ginkgo test to verify shared model files are not deleted

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix: prevent deletion of model files shared by multiple configurations

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-20 14:55:51 +01:00
Ettore Di Giacinto
5fed9c6596 chore(ci): move intel image builds to self-hosted
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-20 09:36:54 +01:00
LocalAI [bot]
bfa07df7cd chore: ⬆️ Update ggml-org/llama.cpp to 7d77f07325985c03a91fa371d0a68ef88a91ec7f (#7314)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-20 07:58:42 +01:00
dependabot[bot]
fbaa21b0e5 chore(deps): bump golang.org/x/crypto from 0.43.0 to 0.45.0 in the go_modules group across 1 directory (#7319)
chore(deps): bump golang.org/x/crypto

Bumps the go_modules group with 1 update in the / directory: [golang.org/x/crypto](https://github.com/golang/crypto).


Updates `golang.org/x/crypto` from 0.43.0 to 0.45.0
- [Commits](https://github.com/golang/crypto/compare/v0.43.0...v0.45.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.45.0
  dependency-type: indirect
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-20 04:19:22 +00:00
Ettore Di Giacinto
95b6c9bb5a Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-19 22:25:33 +01:00
Ettore Di Giacinto
2cc4809b0d feat: docs revamp (#7313)
* docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small enhancements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Enhancements

* Default to zen-dark

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-19 22:21:20 +01:00
Ettore Di Giacinto
77bbeed57e feat(importer): unify importing code with CLI (#7299)
* feat(importer): support ollama and OCI, unify code

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: support importing from local file

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* support also yaml config files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly handle local files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Extract importing errors

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add importer tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add integration tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(UX): improve and specify supported URI formats

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fail if backend does not have a runfile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Adapt tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): add cache for galleries

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): remove handler duplicate

File input handlers are now handled by Alpine.js @change handlers in chat.html.
Removed duplicate listeners to prevent files from being processed twice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): be consistent in attachments in the chat

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fail if no importer matches

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: propagate ops correctly

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-19 20:52:11 +01:00
Ettore Di Giacinto
3152611184 chore(deps): bump llama.cpp to '10e9780154365b191fb43ca4830659ef12def80f (#7311)
chore(deps): bump llama.cpp to '10e9780154365b191fb43ca4830659ef12def80f'

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-19 14:42:11 +01:00
Ettore Di Giacinto
30f992f241 feat(ui): add backend reinstall button (#7305)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-18 14:52:54 +01:00
ErixM
2709220b84 fix the tts model dropdown to show the currently selected model (#7306)
* fix the tts model dropdown to show the currently selected model

* Update core/config/model_config.go

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Erixhens Muka <erixhens.muka@bluetensor.ai>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-18 14:49:03 +01:00
LocalAI [bot]
4278506876 chore: ⬆️ Update ggml-org/llama.cpp to cb623de3fc61011e5062522b4d05721a22f2e916 (#7301)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-18 07:43:57 +01:00
LocalAI [bot]
1dd1d12da1 chore: ⬆️ Update ggml-org/whisper.cpp to b12abefa9be2abae39a73fa903322af135024a36 (#7300)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2025-11-18 07:43:33 +01:00
dependabot[bot]
3a5b3bb0a6 chore(deps): bump google.golang.org/protobuf from 1.36.8 to 1.36.10 (#7295)
Bumps google.golang.org/protobuf from 1.36.8 to 1.36.10.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-version: 1.36.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 22:25:55 +01:00
dependabot[bot]
94d9fc923f chore(deps): bump github.com/alecthomas/kong from 1.12.1 to 1.13.0 (#7296)
Bumps [github.com/alecthomas/kong](https://github.com/alecthomas/kong) from 1.12.1 to 1.13.0.
- [Commits](https://github.com/alecthomas/kong/compare/v1.12.1...v1.13.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/kong
  dependency-version: 1.13.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 20:39:14 +01:00
dependabot[bot]
6fcf2c50b6 chore(deps): bump go.yaml.in/yaml/v2 from 2.4.2 to 2.4.3 (#7294)
Bumps [go.yaml.in/yaml/v2](https://github.com/yaml/go-yaml) from 2.4.2 to 2.4.3.
- [Commits](https://github.com/yaml/go-yaml/compare/v2.4.2...v2.4.3)

---
updated-dependencies:
- dependency-name: go.yaml.in/yaml/v2
  dependency-version: 2.4.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 20:37:19 +01:00
dependabot[bot]
7cbd4a2f18 chore(deps): bump fyne.io/fyne/v2 from 2.7.0 to 2.7.1 (#7293)
Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.0 to 2.7.1.
- [Release notes](https://github.com/fyne-io/fyne/releases)
- [Changelog](https://github.com/fyne-io/fyne/blob/master/CHANGELOG.md)
- [Commits](https://github.com/fyne-io/fyne/compare/v2.7.0...v2.7.1)

---
updated-dependencies:
- dependency-name: fyne.io/fyne/v2
  dependency-version: 2.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 20:37:07 +01:00
1833 changed files with 548104 additions and 275431 deletions

215
.agents/adding-backends.md Normal file
View File

@@ -0,0 +1,215 @@
# Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
## 1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
- **Rust backends**: `backend/rust/<backend-name>/`
For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files
For Rust backends, you'll typically need (see `backend/rust/kokoros/` as a reference):
- `Cargo.toml` - Crate manifest; depend on the upstream project as a submodule under `sources/`
- `build.rs` - Invokes `tonic_build` to generate gRPC stubs from `backend/backend.proto` (use the `BACKEND_PROTO_PATH` env var so the Makefile can inject the canonical copy)
- `src/` - The gRPC server implementation (implement `Backend` via `tonic`)
- `Makefile` - Copies `backend.proto` into the crate, runs `cargo build --release`, then `package.sh`
- `package.sh` - Uses `ldd` to bundle the binary's dynamic deps and `ld.so` into `package/lib/`
- `run.sh` - Sets `LD_LIBRARY_PATH`/`SSL_CERT_DIR` and execs the binary via the bundled `lib/ld.so`
- `sources/<UpstreamProject>/` - Git submodule with the upstream Rust crate
## 2. Add Build Configurations to `.github/workflows/backend.yml`
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends for reference — `chatterbox`/`faster-whisper` for Python, `piper`/`silero-vad` for Go, `kokoros` for Rust.
**Without an entry here no image is ever built or pushed, and the gallery entry in `backend/index.yaml` will point at a tag that does not exist.** The `dockerfile:` field must point at `./backend/Dockerfile.<lang>` matching the language bucket from step 1 (e.g. `Dockerfile.python`, `Dockerfile.golang`, `Dockerfile.rust`). The `tag-suffix` must match the `uri:` in the corresponding `backend/index.yaml` image entry exactly.
If you add a new language bucket, `scripts/changed-backends.js` also needs a branch in `inferBackendPath` so PR change-detection routes file edits correctly.
**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
## 3. Add Backend Metadata to `backend/index.yaml`
**Step 3a: Add Meta Definition**
Add a YAML anchor definition in the `## metas` section (around line 2-300). Look for similar backends to use as a template such as `diffusers` or `chatterbox`
**Step 3b: Add Image Entries**
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
## 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
**Step 4a: Add to `.NOTPARALLEL`**
Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
```makefile
.NOTPARALLEL: ... backends/<backend-name>
```
**Step 4b: Add to `prepare-test-extra`**
Add the backend to the `prepare-test-extra` target to prepare it for testing. Use the path matching your language bucket (`backend/python/`, `backend/go/`, `backend/rust/`, …):
```makefile
prepare-test-extra: protogen-python
...
$(MAKE) -C backend/<lang>/<backend-name>
```
For Rust backends the target is usually the crate build target itself (e.g. `$(MAKE) -C backend/rust/<backend-name> <backend-name>-grpc`) so the binary is in place before `test` runs.
**Step 4c: Add to `test-extra`**
Add the backend to the `test-extra` target to run its tests — applies to Go and Rust backends too, not only Python:
```makefile
test-extra: prepare-test-extra
...
$(MAKE) -C backend/<lang>/<backend-name> test
```
Each backend's own `Makefile` should define a `test` target so this line works regardless of language. Integration tests that need large model downloads should be gated behind an env var (see `backend/rust/kokoros/`'s `KOKOROS_MODEL_PATH` pattern) so CI only runs unit tests.
**Step 4d: Add Backend Definition**
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
**For Python backends with root context** (like `faster-whisper`, `coqui`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
```
**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
```
**For Go backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
```
**For Rust backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|rust|.|false|true
```
The language field (`python`/`golang`/`rust`/…) must match a `backend/Dockerfile.<lang>` file.
**Step 4e: Generate Docker Build Target**
Add an eval call to generate the docker-build target (around line 480-501):
```makefile
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
```
**Step 4f: Add to `docker-build-backends`**
Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
```makefile
docker-build-backends: ... docker-build-<backend-name>
```
**Determining the Context:**
- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
## 5. Verification Checklist
After adding a new backend, verify:
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
## Bundling runtime shared libraries (`package.sh`)
The final `Dockerfile.python` stage is `FROM scratch` — there is no system `libc`, no `apt`, no fallback library path. Only files explicitly copied from the builder stage end up in the backend image. That means any runtime `dlopen` your backend (or its Python deps) needs **must** be packaged into `${BACKEND}/lib/`.
Pattern:
1. Make sure the library is installed in the builder stage of `backend/Dockerfile.python` (add it to the top-level `apt-get install`).
2. Drop a `package.sh` in your backend directory that copies the library — and its soname symlinks — into `$(dirname $0)/lib`. See `backend/python/vllm/package.sh` for a reference implementation that walks `/usr/lib/x86_64-linux-gnu`, `/usr/lib/aarch64-linux-gnu`, etc.
3. `Dockerfile.python` already runs `package.sh` automatically if it exists, after `package-gpu-libs.sh`.
4. `libbackend.sh` automatically prepends `${EDIR}/lib` to `LD_LIBRARY_PATH` at run time, so anything packaged this way is found by `dlopen`.
How to find missing libs: when a Python module silently fails to register torch ops or you see `AttributeError: '_OpNamespace' '...' object has no attribute '...'`, run the backend image's Python with `LD_DEBUG=libs` to see which `dlopen` failed. The filename in the error message (e.g. `libnuma.so.1`) is what you need to package.
To verify packaging works without trusting the host:
```bash
make docker-build-<backend>
CID=$(docker create --entrypoint=/run.sh local-ai-backend:<backend>)
docker cp $CID:/lib /tmp/check && docker rm $CID
ls /tmp/check # expect the bundled .so files + symlinks
```
Then boot it inside a fresh `ubuntu:24.04` (which intentionally does *not* have the lib installed) to confirm it actually loads from the backend dir.
## Importer integration
When you add a new backend, you MUST also make it importable via the model import form (`/import-model`). The import form dropdown is sourced dynamically from `GET /backends/known` — it reads the importer registry at `core/gallery/importers/importers.go`, so the steps below are the ONLY way to make your backend show up.
Required steps:
1. **If your backend has unambiguous detection signals** (unique file extension, HF `pipeline_tag`, unique repo name pattern, unique artefact like `modules.json`):
- Create an importer file at `core/gallery/importers/<backend>.go` following the Match/Import pattern in `llama-cpp.go`.
- Register it in `importers.go:defaultImporters` in **specificity order** — more specific detectors must appear BEFORE more generic ones (e.g. `sentencetransformers` before `transformers`, `stablediffusion-ggml` before `llama-cpp`, `vllm-omni` before `vllm`). First match wins.
2. **If your backend is a drop-in replacement** (same artefacts as another backend, e.g. `ik-llama-cpp` and `turboquant` both consume GGUF the same way `llama-cpp` does):
- Do NOT create a new importer. Extend the existing importer's `Import()` to swap the emitted `backend:` field when `preferences.backend` matches. See `llama-cpp.go` for the pattern.
3. **If your backend has no reliable auto-detect signal** (preference-only — e.g. `sglang`, `tinygrad`, `whisperx`):
- Do NOT create an importer. Instead add the backend name to the curated pref-only slice in `core/http/endpoints/localai/backend.go` that feeds `/backends/known`. A single line addition.
4. **Always** add a table-driven test in `core/gallery/importers/importers_test.go` (Ginkgo/Gomega):
- Use a real public HuggingFace repo URI as the test fixture (existing tests already hit the live HF API — follow that pattern).
- Cover detection (auto-match without preferences), preference-override (explicit `backend:` in preferences wins), and — if the backend's modality has a common `pipeline_tag` but ambiguous artefacts — an ambiguity test asserting `errors.Is(err, importers.ErrAmbiguousImport)`.
Rules of thumb:
- When in doubt, lean pref-only. A wrong auto-detect is worse than a forced preference.
- Never silently emit a modality mismatch (e.g. emit `llama-cpp` for a TTS repo because `.gguf` is present). Return `ErrAmbiguousImport` instead.
- Registration order is the single most common source of bugs. Check by running `go test ./core/gallery/importers/...` — the existing suite will fail if you've shadowed a pre-existing detector.
## 6. Example: Adding a Python Backend
For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 x latest/development)
- **Makefile updates**:
- Added to `.NOTPARALLEL` line
- Added to `prepare-test-extra` and `test-extra` targets
- Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
- Added eval for docker-build target generation
- Added `docker-build-moonshine` to `docker-build-backends`

View File

@@ -0,0 +1,111 @@
# Adding GGUF Models from HuggingFace to the Gallery
When adding a GGUF model from HuggingFace to the LocalAI model gallery, follow this guide.
## Gallery file
All models are defined in `gallery/index.yaml`. Find the appropriate section (embedding models near other embeddings, chat models near similar chat models) and add a new entry.
## Getting the SHA256
GGUF files on HuggingFace expose their SHA256 via the `x-linked-etag` HTTP header. Fetch it with:
```bash
curl -sI "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf" | grep -i x-linked-etag
```
The value (without quotes) is the SHA256 hash. Example:
```bash
curl -sI "https://huggingface.co/ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/resolve/main/embeddinggemma-300m-qat-Q8_0.gguf" | grep -i x-linked-etag
# x-linked-etag: "6fa0c02a9c302be6f977521d399b4de3a46310a4f2621ee0063747881b673f67"
```
**Important**: Pay attention to exact filename casing — HuggingFace filenames are case-sensitive (e.g., `Q8_0` vs `q8_0`). Check the repo's file listing to get the exact name.
## Entry format — Embedding models
Embedding models use `gallery/virtual.yaml` as the base config and set `embeddings: true`:
```yaml
- name: "model-name"
url: github:mudler/LocalAI/gallery/virtual.yaml@master
urls:
- https://huggingface.co/<original-model-org>/<original-model-name>
- https://huggingface.co/<gguf-org>/<gguf-repo-name>
description: |
Short description of the model, its size, and capabilities.
tags:
- embeddings
overrides:
backend: llama-cpp
embeddings: true
parameters:
model: <filename>.gguf
files:
- filename: <filename>.gguf
uri: huggingface://<gguf-org>/<gguf-repo-name>/<filename>.gguf
sha256: <sha256-hash>
```
## Entry format — Chat/LLM models
Chat models typically reference a template config (e.g., `gallery/gemma.yaml`, `gallery/chatml.yaml`) that defines the prompt format. Use YAML anchors (`&name` / `*name`) if adding multiple quantization variants of the same model:
```yaml
- &model-anchor
url: "github:mudler/LocalAI/gallery/<template>.yaml@master"
name: "model-name"
icon: https://example.com/icon.png
license: <license>
urls:
- https://huggingface.co/<org>/<model>
- https://huggingface.co/<gguf-org>/<gguf-repo>
description: |
Model description.
tags:
- llm
- gguf
- gpu
- cpu
overrides:
parameters:
model: <filename>-Q4_K_M.gguf
files:
- filename: <filename>-Q4_K_M.gguf
sha256: <sha256>
uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q4_K_M.gguf
```
To add a variant (e.g., different quantization), use YAML merge:
```yaml
- !!merge <<: *model-anchor
name: "model-name-q8"
overrides:
parameters:
model: <filename>-Q8_0.gguf
files:
- filename: <filename>-Q8_0.gguf
sha256: <sha256>
uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q8_0.gguf
```
## Available template configs
Look at existing `.yaml` files in `gallery/` to find the right prompt template for your model architecture:
- `gemma.yaml` — Gemma-family models (gemma, embeddinggemma, etc.)
- `chatml.yaml` — ChatML format (many Mistral/OpenHermes models)
- `deepseek.yaml` — DeepSeek models
- `virtual.yaml` — Minimal base (good for embedding models that don't need chat templates)
## Checklist
1. **Find the GGUF file** on HuggingFace — note exact filename (case-sensitive)
2. **Get the SHA256** using the `curl -sI` + `x-linked-etag` method above
3. **Choose the right template** config from `gallery/` based on model architecture
4. **Add the entry** to `gallery/index.yaml` near similar models
5. **Set `embeddings: true`** if it's an embedding model
6. **Include both URLs** — the original model page and the GGUF repo
7. **Write a description** — mention model size, capabilities, and quantization type

View File

@@ -0,0 +1,101 @@
# AI Coding Assistants
This document provides guidance for AI tools and developers using AI
assistance when contributing to LocalAI.
**LocalAI follows the same guidelines as the Linux kernel project for
AI-assisted contributions.** See the upstream policy here:
<https://docs.kernel.org/process/coding-assistants.html>
The rules below mirror that policy, adapted to LocalAI's license and
project layout. If anything is unclear, the kernel document is the
authoritative reference for intent.
AI tools helping with LocalAI development should follow the standard
project development process:
- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
conventions, and PR guidelines
- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
logging, and documentation conventions
- [.agents/building-and-testing.md](building-and-testing.md) — build and
test procedures
## Licensing and Legal Requirements
All contributions must comply with LocalAI's licensing requirements:
- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
file
- New source files should use the SPDX license identifier `MIT` where
applicable to the file type
- Contributions must be compatible with the MIT License and must not
introduce code under incompatible licenses (e.g., GPL) without an
explicit discussion with maintainers
## Signed-off-by and Developer Certificate of Origin
**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
certify the Developer Certificate of Origin (DCO). The human submitter
is responsible for:
- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO)
to certify the contribution
- Taking full responsibility for the contribution
AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
A human reviewer owns the contribution; the AI's involvement is recorded
via `Assisted-by` (see below).
## Attribution
When AI tools contribute to LocalAI development, proper attribution helps
track the evolving role of AI in the development process. Contributions
should include an `Assisted-by` tag in the commit message trailer in the
following format:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Where:
- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
`Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g.,
`claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
Basic development tools (git, go, make, editors) should **not** be listed.
### Example
```
fix(llama-cpp): handle empty tool call arguments
Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.
Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```
## Scope and Responsibility
Using an AI assistant does not reduce the contributor's responsibility.
The human submitter must:
- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the
project style
- Confirm that any referenced APIs, flags, or file paths actually exist
in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review
Reviewers may ask for clarification on any change regardless of how it
was produced. "An AI wrote it" is not an acceptable answer to a design
question.

View File

@@ -0,0 +1,345 @@
# API Endpoints and Authentication
This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
## Architecture overview
Authentication and authorization flow through three layers:
1. **Global auth middleware** (`core/http/auth/middleware.go``auth.Middleware`) — applied to every request in `core/http/app.go`. Handles session cookies, Bearer tokens, API keys, and legacy API keys. Populates `auth_user` and `auth_role` in the Echo context.
2. **Feature middleware** (`auth.RequireFeature`) — per-feature access control applied to route groups or individual routes. Checks if the authenticated user has the specific feature enabled.
3. **Admin middleware** (`auth.RequireAdmin`) — restricts endpoints to admin users only.
When auth is disabled (no auth DB, no legacy API keys), all middleware becomes pass-through (`auth.NoopMiddleware`).
## Adding a new API endpoint
### Step 1: Create the handler
Write the endpoint handler in the appropriate package under `core/http/endpoints/`. Follow existing patterns:
```go
// core/http/endpoints/localai/my_feature.go
func MyFeatureEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
// Use auth.GetUser(c) to get the authenticated user (may be nil if auth is disabled)
user := auth.GetUser(c)
// Your logic here
return c.JSON(http.StatusOK, result)
}
}
```
### Step 2: Register routes
Add routes in the appropriate file under `core/http/routes/`. The file you use depends on the endpoint category:
| File | Category |
|------|----------|
| `routes/openai.go` | OpenAI-compatible API endpoints (`/v1/...`) |
| `routes/localai.go` | LocalAI-specific endpoints (`/api/...`, `/models/...`, `/backends/...`) |
| `routes/agents.go` | Agent pool endpoints (`/api/agents/...`) |
| `routes/auth.go` | Auth endpoints (`/api/auth/...`) |
| `routes/ui_api.go` | UI backend API endpoints |
### Step 3: Apply the right middleware
Choose the appropriate protection level:
#### No auth required (public)
Exempt paths bypass auth entirely. Add to `isExemptPath()` in `middleware.go` or use the `/api/auth/` prefix (always exempt). Use sparingly — most endpoints should require auth.
#### Standard auth (any authenticated user)
The global middleware already handles this. API paths (`/api/`, `/v1/`, etc.) automatically require authentication when auth is enabled. You don't need to add any extra middleware.
```go
router.GET("/v1/my-endpoint", myHandler) // auth enforced by global middleware
```
#### Admin only
Pass `adminMiddleware` to the route. This is set up in `app.go` and passed to `Register*Routes` functions:
```go
// In the Register function signature, accept the middleware:
func RegisterMyRoutes(router *echo.Echo, app *application.Application, adminMiddleware echo.MiddlewareFunc) {
router.POST("/models/apply", myHandler, adminMiddleware)
}
```
#### Feature-gated
For endpoints that should be toggleable per-user, use feature middleware. There are two approaches:
**Approach A: Route-level middleware** (preferred for groups of related endpoints)
```go
// In app.go, create the feature middleware:
myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
// Pass it to the route registration function:
routes.RegisterMyRoutes(e, app, myFeatureMw)
// In the routes file, apply to a group:
g := e.Group("/api/my-feature", myFeatureMw)
g.GET("", listHandler)
g.POST("", createHandler)
```
**Approach B: RouteFeatureRegistry** (preferred for individual OpenAI-compatible endpoints)
Add an entry to `RouteFeatureRegistry` in `core/http/auth/features.go`. The `RequireRouteFeature` global middleware will automatically enforce it:
```go
var RouteFeatureRegistry = []RouteFeature{
// ... existing entries ...
{"POST", "/v1/my-endpoint", FeatureMyFeature},
}
```
## Adding a new feature
When you need a new toggleable feature (not just a new endpoint under an existing feature):
### 1. Define the feature constant
Add to `core/http/auth/permissions.go`:
```go
const (
// Add to the appropriate group:
// Agent features (default OFF for new users)
FeatureMyFeature = "my_feature"
// OR API features (default ON for new users)
FeatureMyFeature = "my_feature"
)
```
Then add it to the appropriate slice:
```go
// Default OFF — user must be explicitly granted access:
var AgentFeatures = []string{..., FeatureMyFeature}
// Default ON — user has access unless explicitly revoked:
var APIFeatures = []string{..., FeatureMyFeature}
```
### 2. Add feature metadata
In `core/http/auth/features.go`, add to the appropriate `FeatureMetas` function so the admin UI can display it:
```go
func AgentFeatureMetas() []FeatureMeta {
return []FeatureMeta{
// ... existing ...
{FeatureMyFeature, "My Feature", false}, // false = default OFF
}
}
```
### 3. Wire up the middleware
In `core/http/app.go`:
```go
myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
```
Then pass it to the route registration function.
### 4. Register route-feature mappings (if applicable)
If your feature gates standard API endpoints (like `/v1/...`), add entries to `RouteFeatureRegistry` in `features.go` instead of using per-route middleware.
## Accessing the authenticated user in handlers
```go
import "github.com/mudler/LocalAI/core/http/auth"
func MyHandler(c echo.Context) error {
// Get the user (nil when auth is disabled or unauthenticated)
user := auth.GetUser(c)
if user == nil {
// Handle unauthenticated — or let middleware handle it
}
// Check role
if user.Role == auth.RoleAdmin {
// admin-specific logic
}
// Check feature access programmatically (when you need conditional behavior, not full blocking)
if auth.HasFeatureAccess(db, user, auth.FeatureMyFeature) {
// feature-specific logic
}
// Check model access
if !auth.IsModelAllowed(db, user, modelName) {
return c.JSON(http.StatusForbidden, ...)
}
}
```
## Middleware composition patterns
Middleware can be composed at different levels. Here are the patterns used in the codebase:
### Group-level middleware (agents pattern)
```go
// All routes in the group share the middleware
g := e.Group("/api/agents", poolReadyMw, agentsMw)
g.GET("", listHandler)
g.POST("", createHandler)
```
### Per-route middleware (localai pattern)
```go
// Individual routes get middleware as extra arguments
router.POST("/models/apply", applyHandler, adminMiddleware)
router.GET("/metrics", metricsHandler, adminMiddleware)
```
### Middleware slice (openai pattern)
```go
// Build a middleware chain for a handler
chatMiddleware := []echo.MiddlewareFunc{
usageMiddleware,
traceMiddleware,
modelFilterMiddleware,
}
app.POST("/v1/chat/completions", chatHandler, chatMiddleware...)
```
## Error response format
Always use `schema.ErrorResponse` for auth/permission errors to stay consistent with the OpenAI-compatible API:
```go
return c.JSON(http.StatusForbidden, schema.ErrorResponse{
Error: &schema.APIError{
Message: "feature not enabled for your account",
Code: http.StatusForbidden,
Type: "authorization_error",
},
})
```
Use these HTTP status codes:
- `401 Unauthorized` — no valid credentials provided
- `403 Forbidden` — authenticated but lacking permission
- `429 Too Many Requests` — rate limited (auth endpoints)
## Usage tracking
If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
## Advertising surfaces — where to register a new capability
Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
### 1. Swagger `@Tags` annotation (mandatory)
Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
```go
// MyEndpoint does X.
// @Summary Do X.
// @Tags my-capability
// @Param request body schema.MyRequest true "payload"
// @Success 200 {object} schema.MyResponse "Response"
// @Router /v1/my-endpoint [post]
func MyEndpoint(...) echo.HandlerFunc { ... }
```
Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
After adding endpoints, regenerate the embedded spec so the runtime serves it:
```bash
make protogen-go # ensures gRPC codegen is fresh first
make swagger # regenerates swagger/swagger.json
```
### 2. `/api/instructions` registry (for new capability areas)
`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
Add an entry to `instructionDefs`:
```go
{
Name: "my-capability", // URL segment at /api/instructions/my-capability
Description: "Short sentence describing the capability",
Tags: []string{"my-capability"}, // must match swagger @Tags
Intro: "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
},
```
Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
```js
export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
```
React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
### 4. `docs/content/` (user-facing documentation)
A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
## Path protection rules
The global auth middleware classifies paths as API paths or non-API paths:
- **API paths** (always require auth when auth is enabled): `/api/`, `/v1/`, `/models/`, `/backends/`, `/backend/`, `/tts`, `/vad`, `/video`, `/stores/`, `/system`, `/ws/`, `/metrics`
- **Exempt paths** (never require auth): `/api/auth/` prefix, anything in `appConfig.PathWithoutAuth`
- **Non-API paths** (UI, static assets): pass through without auth — the React UI handles login redirects client-side
If you add endpoints under a new top-level path prefix, add it to `isAPIPath()` in `middleware.go` to ensure it requires authentication.
## Checklist
When adding a new endpoint:
**Routing & auth**
- [ ] Handler in `core/http/endpoints/`
- [ ] Route registered in appropriate `core/http/routes/` file
- [ ] Auth level chosen: public / standard / admin / feature-gated
- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
- [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
- [ ] If token-counting: `usageMiddleware` added to middleware chain
**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
**Quality**
- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
- [ ] Tests cover both authenticated and unauthenticated access
- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation
## Companion: MCP admin tool surface
**Required for admin endpoints.** Every new admin endpoint MUST be considered for the MCP admin tool surface — the REST API and the MCP tool catalog can drift silently otherwise, and both the LocalAI Assistant chat modality and the standalone `local-ai mcp-server` rely on `pkg/mcp/localaitools/` to mirror REST.
Two outcomes are acceptable; one is not:
- **Tool added.** The new endpoint is something an admin would manage conversationally (install, list, edit, toggle, upgrade). Follow the full checklist in [.agents/localai-assistant-mcp.md](localai-assistant-mcp.md): add a `LocalAIClient` interface method, implement it in both `inproc` and `httpapi`, register the tool with a `Tool*` constant, update the skill prompts, **and add the route to `toolToHTTPRoute` in `pkg/mcp/localaitools/coverage_test.go`**.
- **Tool deliberately skipped.** The endpoint is internal/diagnostic and adding a chat path would be misleading. Document the decision in the PR description; no code action.
- **Forgot.** This breaks the contract. The `TestToolHTTPRouteMappingComplete` test in `pkg/mcp/localaitools` is a partial guard (it checks every `Tool*` has a route mapping), but it does NOT detect new REST endpoints without a tool — that's still a process check on the PR author.
**Add to the bottom of the checklist below**:
- [ ] If admin: decided whether MCP coverage is needed; if yes, tool registered + map updated; if no, skip-reason in PR description.

View File

@@ -0,0 +1,16 @@
# Build and Testing
Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required it's usually best not to try building or testing the project unless the user requests it. If you must build the project then inspect the Makefile in the project root and the Makefiles of any backends that are effected by changes you are making. In addition the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker, if the user has not previously specified a preference then ask which they would like to use.
## Building a specified backend
Let's say the user wants to build a particular backend for a given platform. For example let's say they want to build coqui for ROCM/hipblas
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum we need to set the BUILD_TYPE, BASE_IMAGE build-args
- Use .github/workflows/backend.yml as a reference it lists the needed args in the `include` job strategy matrix
- l4t and cublas also requires the CUDA major and minor version
- You can pretty print a command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:7.2.1 make docker-build-coqui`
- Unless the user specifies that they want you to run the command, then just print it because not all agent frontends handle long running jobs well and the output may overflow your context
- The user may say they want to build AMD or ROCM instead of hipblas, or Intel instead of SYCL or NVIDIA insted of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.

280
.agents/ci-caching.md Normal file
View File

@@ -0,0 +1,280 @@
# CI Build Caching
Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.
## Cache layout
- **Cache registry**: `quay.io/go-skynet/ci-cache`
- **Tag prefixes**:
- Backend builds (`backend_build.yml`) buildkit cache: `cache<tag-suffix>`
- e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
- Root image builds (`image_build.yml`) buildkit cache: `cache-localai<tag-suffix>`
- e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
- Layered base builds (`base_images.yml`) buildkit cache: `base-<stem>`
- e.g. `base-python-cpu-2404`, `base-cpp-cublas-2404-cuda13.0`
- Layered base **images** (the OCI manifests consumers FROM): `base-image-<stem>[-pr<N>]`
- e.g. `base-image-python-cpu-2404`, `base-image-cpp-cublas-2404-cuda13.0-pr9672`
- The cache tags store multi-arch BuildKit cache manifests (`mode=max`); the `base-image-*` tags store ordinary OCI image manifests.
## Read/write semantics
| Trigger | `cache-from` | `cache-to` |
|---|---|---|
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
| `pull_request` | yes | **no** |
PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
## Self-warming, no separate populator
There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).
Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.
## The `DEPS_REFRESH` cache-buster (Python backends)
Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
```dockerfile
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
```
Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
`DEPS_REFRESH` defends against that:
- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
- Within a week, builds stay warm.
This applies only to `Dockerfile.python` because:
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.
### Adjusting the cadence
If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
## Manually evicting cache
To force a fully cold build for one backend or the whole image:
```bash
# Delete a single tag (requires quay credentials with admin on the repo)
curl -X DELETE \
-H "Authorization: Bearer ${QUAY_TOKEN}" \
https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm
# List all tags
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
"https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
```
Eviction is rarely needed in normal operation — `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and `mode=max` keeps the cache scoped per matrix entry so a stale tag never bleeds into a different build.
## What the cache **does not** cover
- The "Free Disk Space" / "Release space from worker" steps run on every job — these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere — PRs only build for verification.
- Darwin builds (see below) — macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.
## Darwin native caches
`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners — there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).
| Cache | Path(s) | Key | Scope |
|---|---|---|---|
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |
Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.
The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg — same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.
For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most TUs, so ccache dedupes object files across them.
### Cache budget on Darwin
GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
## Layered base images (`ci-cache:base-image-*`)
The registry-backed BuildKit cache deduplicates **within** a matrix entry's
cache tag, but each matrix entry has its own tag — so the same `apt-get`,
GPU SDK install, and language toolchain bootstrap runs into N different
cache tags across the backend matrix. The layered base images factor that
shared work out of the per-backend builds.
They live in the same `quay.io/go-skynet/ci-cache` repo as the buildkit
caches, under a distinct `base-image-` tag prefix so the OCI image
manifests coexist with `base-<stem>` (the cache for building the base),
`cache<tag-suffix>` (per-backend caches), and `cache-localai<tag-suffix>`
(root image caches). Reusing `ci-cache` means no new quay repo or robot
grant is needed — the same credentials that write the cache also write
the image.
### How it fits together
```
.github/backend-matrix.yaml # raw matrix data (linux + darwin)
backend.yml / backend_pr.yml
├── derive-bases / generate-matrix
│ scripts/changed-backends.js
│ reads .github/backend-matrix.yaml
│ (PR mode also reads changed files)
│ emits:
│ - matrix (annotated with base-image-prebuilt)
│ - matrix-darwin
│ - bases-matrix (deduplicated by tag-stem)
├── build-bases (matrix: bases-matrix)
│ uses base_images.yml
│ FROM .docker/bases/Dockerfile.<lang>
│ pushes quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]
└── backend-jobs (matrix: matrix; needs build-bases)
uses backend_build.yml
FROM ${BASE_IMAGE_PREBUILT}
i.e. quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]
only the backend source COPY + `make` remain.
```
The base image is **always** built before backends consume it, in the same
workflow run. There is no cross-workflow dependency, no chicken-and-egg
on first push, and no manual matrix to keep in sync — adding a backend
matrix entry is just an edit to `.github/backend-matrix.yaml`.
### Tag scheme
`<stem>` is computed by `tagStem()` in `scripts/changed-backends.js` from
the (lang, build-type, ubuntu, cuda, base-image) tuple. Arch is
intentionally NOT in the stem — bases are built multi-arch when any
consumer needs multi-arch, and single-arch otherwise (the `platforms`
field on each base entry is the union of its consumers' platforms).
| Build-type | Stem template |
|---|---|
| `''` (CPU) | `<lang>-cpu-<ubuntu>[-<base-image-slug>]` |
| `cublas` / `l4t` | `<lang>-<build-type>-<ubuntu>-cuda<major>.<minor>[-<base-image-slug>]` |
| anything else (vulkan, hipblas, intel, sycl_*) | `<lang>-<build-type>-<ubuntu>[-<base-image-slug>]` |
The base-image slug is empty for the default `ubuntu:24.04` and a short
parseable suffix otherwise (`jetpack-r36.4.0`, `rocm-7.2.1`,
`oneapi-2025.3.2`, etc.).
| Event | Pushed tag (in `quay.io/go-skynet/ci-cache`) |
|---|---|
| `push` (master/tag) | `:base-image-<stem>` |
| `pull_request` | `:base-image-<stem>-pr<PR_NUMBER>` |
The buildkit cache for the base build itself lives at
`quay.io/go-skynet/ci-cache:base-<stem>` (`mode=max,ignore-error=true`),
parallel to the per-matrix-entry caches. The `base-` (cache) and
`base-image-` (image) prefixes never collide.
The script also runs a collision check across consumers of each stem: if
two consumers map to the same stem but disagree on `base-image` or
`skip-drivers` (and skip-drivers is meaningful for that build-type), the
script fails loudly. Resolve by encoding the differing input in
`tagStem()` rather than letting the dedup silently pick a winner.
### PR testability
PRs run the same pipeline as master: derive bases → build bases (tagged
`-pr<N>`) → run filtered backend matrix consuming those `-pr<N>` tags.
End-to-end validation always lives within the PR.
For PRs that only change `.docker/bases/Dockerfile.<lang>` (no backend
source touched), `changed-backends.js` adds one canary backend matrix
entry per (lang × build-type × arch × cuda × ubuntu) tuple to the filtered
matrix so each base flavour gets exercised.
### Existing language tiers
| Tier (lang) | Recipe | Consumer Dockerfile(s) | Distinct stems |
|---|---|---|---|
| `python` | `.docker/bases/Dockerfile.python` | `backend/Dockerfile.python` | 9 |
| `golang` | `.docker/bases/Dockerfile.golang` | `backend/Dockerfile.golang` | 8 |
| `cpp` | `.docker/bases/Dockerfile.cpp` (apt + GPU + protoc + cmake + GRPC) | `backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` | 8 |
| `rust` | `.docker/bases/Dockerfile.rust` | `backend/Dockerfile.rust` | 1 |
The C++ trio share a single `cpp` base because they only differ in their
per-backend `make` targets. `langOf()` in `scripts/changed-backends.js`
remaps `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}``cpp` so dedup
works across the trio. If a future C++ consumer needs a *different* base
(e.g. without GRPC, or with a different protoc version), give it its own
`Dockerfile.<newlang>` recipe and remove it from the cpp remap.
### Adding a new (accel × arch × cuda × lang) flavour
Just add the matrix entry to `.github/backend-matrix.yaml` for the new
flavour. The bases matrix and the per-entry `base-image-prebuilt` are
derived automatically by `scripts/changed-backends.js`. Nothing else to
change.
### Adding a new language tier
1. Create `.docker/bases/Dockerfile.<lang>` mirroring an existing tier
(apt + accel install + lang-specific toolchain).
2. Slim `backend/Dockerfile.<lang>` to `FROM ${BASE_IMAGE_PREBUILT}` plus
the per-backend source COPY + build (no inline accel install).
3. Add the new recipe to `baseTriggerFiles` in
`scripts/changed-backends.js` so PRs touching it fan out to canaries.
4. Add `<lang>: (item) => item.dockerfile.endsWith("<lang>")` to
`langTriggerSelector` in the same file.
5. Add a `LOCAL_BASE_<LANG>_TAG`, a `docker-build-<lang>-base` target,
and a clause in `local-base-tag` / `local-base-target` in `Makefile`.
The `langsWithBase` set in `scripts/changed-backends.js` is auto-detected
from the `.docker/bases/` directory at script startup, so step 1 alone is
enough for the script to start emitting bases (and annotating matrix
entries with `base-image-prebuilt`) for that lang. Steps 35 plug it
into the canary fan-out and the local-build path.
### Why not just rely on `mode=max` cache?
`mode=max` deduplicates at the layer level, but each matrix entry has its
own cache tag (`cache<tag-suffix>`). A change that invalidates the GPU SDK
layer in one backend does not invalidate it in any other; each entry pays
the full cost on its next rebuild. The shared base image is built once per
(accel × arch × cuda × lang), then pulled by every backend that consumes
it — that's the actual cross-matrix dedup.
### Local builds
All `backend/Dockerfile.{python,golang,cpp,rust}` consumers require
`BASE_IMAGE_PREBUILT` (no inline fallback). The Makefile wires the right
`docker-build-<lang>-base` as a prerequisite for each backend's
`docker-build-<backend>` target, so:
```bash
# Build any backend; the matching base is built first if needed.
make docker-build-vllm BUILD_TYPE=cublas CUDA_MAJOR_VERSION=12 CUDA_MINOR_VERSION=8
make docker-build-llama-cpp BUILD_TYPE=cublas CUDA_MAJOR_VERSION=13 CUDA_MINOR_VERSION=0
make docker-build-rerankers # golang
make docker-build-kokoros # rust
```
Or build a base directly: `make docker-build-{python,golang,cpp,rust}-base
BUILD_TYPE=...`. Or pull a pre-built one from quay if it exists for your
target tuple.
## Touching the cache pipeline
When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.
5. **`tagStem()` in `scripts/changed-backends.js` is the single source of truth for base image tags.** The matrix entries are annotated with `base-image-prebuilt` in the same script run; backend-jobs reads the value as-is. There's no parallel YAML expression to keep in sync. Adding a new dimension to the stem (e.g. a slug for a new base-image variant) is a script change only.

60
.agents/coding-style.md Normal file
View File

@@ -0,0 +1,60 @@
# Coding Style
The project has the following .editorconfig:
```
root = true
[*]
indent_style = space
indent_size = 2
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.go]
indent_style = tab
[Makefile]
indent_style = tab
[*.proto]
indent_size = 2
[*.py]
indent_size = 4
[*.js]
indent_size = 2
[*.yaml]
indent_size = 2
[*.md]
trim_trailing_whitespace = false
```
- Use comments sparingly to explain why code does something, not what it does. Comments are there to add context that would be difficult to deduce from reading the code.
- Prefer modern Go e.g. use `any` not `interface{}`
## Logging
Use `github.com/mudler/xlog` for logging which has the same API as slog.
## Go tests
All Go tests — including backend tests — must use [Ginkgo](https://onsi.github.io/ginkgo/) (v2) with Gomega matchers, not the stdlib `testing` package with `t.Run` / `t.Errorf`. A test file should register a suite with `RegisterFailHandler(Fail)` in a `TestXxx(t *testing.T)` bootstrap and use `Describe`/`Context`/`It` blocks for the actual cases. Look at any existing `*_test.go` under `core/` or `pkg/` for a template.
Do not mix styles within a package. If you are extending tests in a package that already uses Ginkgo, keep using Ginkgo. If you find stdlib-style Go tests in the tree, treat them as tech debt to be migrated rather than as a pattern to follow.
This is enforced by `golangci-lint` via the `forbidigo` linter (see `.golangci.yml`); calls like `t.Errorf` / `t.Fatalf` / `t.Run` / `t.Skip` / `t.Logf` are flagged. Run `make lint` locally before submitting; the same check runs in CI (`.github/workflows/lint.yml`).
## Documentation
The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.
- **Shortcodes**: Use `{{% notice note %}}`, `{{% notice tip %}}`, or `{{% notice warning %}}` for callout boxes. Do **not** use `{{% alert %}}` — that shortcode does not exist in this project's Hugo theme and will break the docs build.

View File

@@ -0,0 +1,141 @@
# Debugging and Rebuilding Backends
When a backend fails at runtime (e.g. a gRPC method error, a Python import error, or a dependency conflict), use this guide to diagnose, fix, and rebuild.
## Architecture Overview
- **Source directory**: `backend/python/<name>/` (or `backend/go/<name>/`, `backend/cpp/<name>/`)
- **Installed directory**: `backends/<name>/` — this is what LocalAI actually runs. It is populated by `make backends/<name>` which builds a Docker image, exports it, and installs it via `local-ai backends install`.
- **Virtual environment**: `backends/<name>/venv/` — the installed Python venv (for Python backends). The Python binary is at `backends/<name>/venv/bin/python`.
Editing files in `backend/python/<name>/` does **not** affect the running backend until you rebuild with `make backends/<name>`.
## Diagnosing Failures
### 1. Check the logs
Backend gRPC processes log to LocalAI's stdout/stderr. Look for lines tagged with the backend's model ID:
```
GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
```
Common error patterns:
- **"Method not implemented"** — the backend is missing a gRPC method that the Go side calls. The model loader (`pkg/model/initializers.go`) always calls `LoadModel` after `Health`; fine-tuning backends must implement it even as a no-op stub.
- **Python import errors / `AttributeError`** — usually a dependency version mismatch (e.g. `pyarrow` removing `PyExtensionType`).
- **"failed to load backend"** — the gRPC process crashed or never started. Check stderr lines for the traceback.
### 2. Test the Python environment directly
You can run the installed venv's Python to check imports without starting the full server:
```bash
backends/<name>/venv/bin/python -c "import datasets; print(datasets.__version__)"
```
If `pip` is missing from the venv, bootstrap it:
```bash
backends/<name>/venv/bin/python -m ensurepip
```
Then use `backends/<name>/venv/bin/python -m pip install ...` to test fixes in the installed venv before committing them to the source requirements.
### 3. Check upstream dependency constraints
When you hit a dependency conflict, check what the main library expects. For example, TRL's upstream `requirements.txt`:
```
https://github.com/huggingface/trl/blob/main/requirements.txt
```
Pin minimum versions in the backend's requirements files to match upstream.
## Common Fixes
### Missing gRPC methods
If the Go side calls a method the backend doesn't implement (e.g. `LoadModel`), add a no-op stub in `backend.py`:
```python
def LoadModel(self, request, context):
"""No-op — actual loading happens elsewhere."""
return backend_pb2.Result(success=True, message="OK")
```
The gRPC contract requires `LoadModel` to succeed for the model loader to return a usable client, even if the backend doesn't need upfront model loading.
### Dependency version conflicts
Python backends often break when a transitive dependency releases a breaking change (e.g. `pyarrow` removing `PyExtensionType`). Steps:
1. Identify the broken import in the logs
2. Test in the installed venv: `backends/<name>/venv/bin/python -c "import <module>"`
3. Check upstream requirements for version constraints
4. Update **all** requirements files in `backend/python/<name>/`:
- `requirements.txt` — base deps (grpcio, protobuf)
- `requirements-cpu.txt` — CPU-specific (includes PyTorch CPU index)
- `requirements-cublas12.txt` — CUDA 12
- `requirements-cublas13.txt` — CUDA 13
5. Rebuild: `make backends/<name>`
### PyTorch index conflicts (uv resolver)
The Docker build uses `uv` for pip installs. When `--extra-index-url` points to the PyTorch wheel index, `uv` may refuse to fetch packages like `requests` from PyPI if it finds a different version on the PyTorch index first. Fix this by adding `--index-strategy=unsafe-first-match` to `install.sh`:
```bash
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
installRequirements
```
Most Python backends already do this — check `backend/python/transformers/install.sh` or similar for reference.
## Rebuilding
### Rebuild a single backend
```bash
make backends/<name>
```
This runs the Docker build (`Dockerfile.python`), exports the image to `backend-images/<name>.tar`, and installs it into `backends/<name>/`. It also rebuilds the `local-ai` Go binary (without extra tags).
**Important**: If you were previously running with `GO_TAGS=auth`, the `make backends/<name>` step will overwrite your binary without that tag. Rebuild the Go binary afterward:
```bash
GO_TAGS=auth make build
```
### Rebuild and restart
After rebuilding a backend, you must restart LocalAI for it to pick up the new backend files. The backend gRPC process is spawned on demand when the model is first loaded.
```bash
# Kill existing process
kill <pid>
# Restart
./local-ai run --debug [your flags]
```
### Quick iteration (skip Docker rebuild)
For fast iteration on a Python backend's `backend.py` without a full Docker rebuild, you can edit the installed copy directly:
```bash
# Edit the installed copy
vim backends/<name>/backend.py
# Restart LocalAI to respawn the gRPC process
```
This is useful for testing but **does not persist** — the next `make backends/<name>` will overwrite it. Always commit fixes to the source in `backend/python/<name>/`.
## Verification
After fixing and rebuilding:
1. Start LocalAI and confirm the backend registers: look for `Registering backend name="<name>"` in the logs
2. Trigger the operation that failed (e.g. start a fine-tuning job)
3. Watch the GRPC stderr/stdout lines for the backend's model ID
4. Confirm no errors in the traceback

View File

@@ -0,0 +1,77 @@
# llama.cpp Backend
The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
## Building and Testing
- Test llama.cpp backend compilation: `make backends/llama-cpp`
- The backend is built as part of the main build process
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
## Architecture
- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
## Common Issues When Updating llama.cpp
When fixing compilation errors after upstream changes:
1. Check how `server.cpp` (HTTP server) handles the same change
2. Look for new public APIs or getter methods
3. Store copies of needed data instead of accessing private members
4. Update function calls to match new signatures
5. Test with `make backends/llama-cpp`
## Key Differences from HTTP Server
- gRPC uses `BackendServiceImpl` class with gRPC service methods
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance
When working on JSON/XML tool call parsing functionality, always check llama.cpp for reference implementation and updates:
### Checking for XML Parsing Changes
1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
- `COMMON_CHAT_FORMAT_GLM_4_5`
- `COMMON_CHAT_FORMAT_MINIMAX_M2`
- `COMMON_CHAT_FORMAT_KIMI_K2`
- `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
- `COMMON_CHAT_FORMAT_APRIEL_1_5`
- `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
- Any new formats added
### Model Configuration Options
Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
- `ctx_shift` - Context shifting support
- `parallel_tool_calls` - Parallel tool calling
- `reasoning_format` - Reasoning format options
- Any new flags or parameters
### Implementation Guidelines
1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality
### Files to Monitor
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options

View File

@@ -0,0 +1,97 @@
# LocalAI Assistant — admin MCP server
This document is the contract for **anyone** (human or AI agent) touching LocalAI's admin REST surface, the in-process MCP server that wraps it, or the embedded skill prompts that teach the assistant how to use it. Read this before adding/removing/renaming admin endpoints, MCP tools, or skill recipes.
## What this feature is
`pkg/mcp/localaitools/` is a public Go package that exposes LocalAI's admin/management surface as an MCP server. It is used in two ways:
1. **In-process**: when an admin opens a chat with `metadata.localai_assistant=true`, the chat handler injects the in-memory MCP server (paired `net.Pipe()` transport, no HTTP loopback) so the LLM can install models, manage backends and edit configs by chatting.
2. **Standalone**: the `local-ai mcp-server --target=…` subcommand serves the same MCP server over stdio, talking HTTP to a remote LocalAI instance.
The two modes share **all** tool definitions and skill prompts. They differ only in their `LocalAIClient` implementation (`inproc/` calls services directly; `httpapi/` calls REST).
## The three things you must keep in sync
When you change LocalAI's admin surface, three layers must stay aligned:
1. **REST endpoint** in `core/http/endpoints/localai/*.go`.
2. **MCP tool registration** in `pkg/mcp/localaitools/tools_*.go`, plus a method on `LocalAIClient` (in `client.go`) and implementations in both `inproc/client.go` **and** `httpapi/client.go`.
3. **Skill prompt** under `pkg/mcp/localaitools/prompts/skills/*.md` — the markdown that teaches the LLM how to use the new tool. If the new tool fits an existing recipe, update that recipe; otherwise add a new file.
If you ship a REST endpoint without (2) and (3), conversational admins won't see the feature.
## Checklist for adding a new admin endpoint
- [ ] REST endpoint exists in `core/http/endpoints/localai/*.go` and is gated by `auth.RequireAdmin()` in `core/http/routes/localai.go`.
- [ ] `LocalAIClient` interface in `pkg/mcp/localaitools/client.go` has a method covering the new operation.
- [ ] DTOs added/updated in `pkg/mcp/localaitools/dto.go` (JSON-tagged; never expose raw service types).
- [ ] `inproc/client.go` implements the new method by calling the service directly (not via HTTP loopback).
- [ ] `httpapi/client.go` implements the new method by calling the REST endpoint.
- [ ] Tool registration added in the appropriate `pkg/mcp/localaitools/tools_*.go`. Mutating tools must reference safety rule 1 in the description.
- [ ] If the tool is mutating, ensure `Options{DisableMutating: true}` skips it (mirror the pattern in `tools_models.go`).
- [ ] Skill prompt added or updated under `pkg/mcp/localaitools/prompts/skills/`. The prompt must instruct the LLM when to call the tool, what to ask the user first, and what to do on error.
- [ ] Tests:
- `pkg/mcp/localaitools/server_test.go` adds the tool name to `expectedFullCatalog` and `expectedReadOnlyCatalog` (if read-only).
- Tool dispatch is added to `TestEachToolDispatchesToClient`.
- `pkg/mcp/localaitools/httpapi/client_test.go` covers the new HTTP path.
## Adding a new skill recipe (no new tool)
Sometimes you want to teach the LLM a new pattern that uses existing tools. Drop a markdown file under `pkg/mcp/localaitools/prompts/skills/<verb>_<noun>.md`. The file is automatically embedded by `//go:embed` and assembled into the system prompt in lexicographic order. No Go changes needed.
Conventions:
- Filename: `<verb>_<noun>.md` (e.g. `install_chat_model.md`, `upgrade_backend.md`).
- First line: `# Skill: <Title Case description>`.
- Number the steps. Reference exact tool names in backticks.
- If the skill mutates state, remind the LLM to confirm with the user.
## Code conventions
These rules guard against the magic-literal drift that surfaced in the first audit. Do not re-introduce bare strings.
- **Tool names** always come from the `Tool*` constants in `pkg/mcp/localaitools/tools.go`. Tool registrations, the test catalog (`server_test.go`'s `expectedFullCatalog` / `expectedReadOnlyCatalog`), and dispatch tables reference the constants. The embedded skill prompts under `prompts/` keep bare strings — that's the one allowed exception, and `TestPromptsContainSafetyAnchors` enforces alignment.
- **Toggle/pin actions** use the `modeladmin.Action` type (`pkg/mcp/localaitools` and `core/services/modeladmin`). Use `ActionEnable`/`ActionDisable`/`ActionPin`/`ActionUnpin`; never bare `"enable"`/`"pin"` strings.
- **Capability tags** for `list_installed_models` use the `localaitools.Capability` type (`capability.go`). The `LocalAIClient.ListInstalledModels` interface takes a typed `Capability`, and the `inproc` switch only accepts canonical values (`"embed"`/`"embedding"` are not aliases — only `CapabilityEmbeddings`).
- **HTTP error checks** in `httpapi.Client` use `errors.Is(err, ErrHTTPNotFound)`, not substring matches on `err.Error()`. The typed `*HTTPError` carries `StatusCode` and `Body`; add new sentinel errors as needed rather than re-introducing string matching.
- **Channel sends** to `GalleryService.ModelGalleryChannel` / `BackendGalleryChannel` from inproc clients MUST select on `ctx.Done()` so a cancelled chat completion releases the goroutine. See `inproc.sendModelOp` / `sendBackendOp`.
- **Disk writes** of model config YAML go through `modeladmin.writeFileAtomic` (temp file + `os.Rename`). `os.WriteFile` truncates on crash and corrupts the model.
- **MCP server lifecycle**: every initialised holder MUST register `Close()` with `signals.RegisterGracefulTerminationHandler`. The standalone `mcp-server` CLI uses `signal.NotifyContext` to honour SIGINT/SIGTERM.
## File map (where to look)
```
pkg/mcp/localaitools/
client.go # LocalAIClient interface + DTO registry
dto.go # JSON-tagged DTOs shared by both client impls
server.go # NewServer(client, opts) — registers tools
tools.go # Tool* name constants (single source of truth)
capability.go # Capability type + constants
tools_models.go # gallery_search, install_model, import_model_uri, ...
tools_backends.go
tools_config.go
tools_system.go
tools_state.go
prompts.go # //go:embed loader + SystemPrompt(opts)
prompts/00_role.md
prompts/10_safety.md # SAFETY RULES — change with care
prompts/20_tools.md # curated tool catalog with one-liners
prompts/skills/*.md
inproc/client.go # in-process LocalAIClient (services-direct)
httpapi/client.go # REST LocalAIClient (for standalone CLI / remote)
core/http/endpoints/mcp/
localai_assistant.go # process-wide holder + LocalToolExecutor
core/cli/mcp_server.go # local-ai mcp-server subcommand
```
## Why two clients
The in-process MCP server runs inside the same LocalAI binary that serves chat. Going over HTTP loopback would (a) require minting a synthetic admin API key for the server to authenticate against itself, (b) double-marshal every tool dispatch, and (c) lose access to in-process channels (e.g. `GalleryService.ModelGalleryChannel` for streaming install progress). So in-process uses `inproc.Client`. The standalone stdio CLI talks to a *remote* LocalAI; HTTP is the only option, so it uses `httpapi.Client`. Both implement the same `LocalAIClient` interface, and the parity test in `pkg/mcp/localaitools/parity_test.go` (when present) keeps their output equivalent.
## Why prompt-enforced confirmation, not code gates
The user chose KISS. Every mutating tool has a safety rule (`prompts/10_safety.md` rule 1) that requires the LLM to summarise the action and wait for explicit user confirmation before calling it. There is no `plan_*`/`apply_*` two-step in code. If you add a mutating tool, do **not** add per-tool confirmation logic in Go — instead, list the new tool name in `prompts/10_safety.md` so the LLM knows it falls under the confirmation rule.
## Distributed mode
The in-memory MCP server runs only on the head node (where the chat handler runs). `inproc.Client` wraps services that are already distributed-aware (`GalleryService` coordinates with workers; `ListNodes` reads the NATS-populated registry). No NATS routing of MCP tools — the admin surface lives on the head, period.

120
.agents/testing-mcp-apps.md Normal file
View File

@@ -0,0 +1,120 @@
# Testing MCP Apps (Interactive Tool UIs)
MCP Apps is an extension to MCP where tools declare interactive HTML UIs via `_meta.ui.resourceUri`. When the LLM calls such a tool, the UI renders the app in a sandboxed iframe inline in the chat. The app communicates bidirectionally with the host via `postMessage` (JSON-RPC) and can call server tools, send messages, and update model context.
Spec: https://modelcontextprotocol.io/extensions/apps/overview
## Quick Start: Run a Test MCP App Server
The `@modelcontextprotocol/server-basic-react` npm package is a ready-to-use test server that exposes a `get-time` tool with an interactive React clock UI. It requires Node >= 20, so run it in Docker:
```bash
docker run -d --name mcp-app-test -p 3001:3001 node:22-slim \
sh -c 'npx -y @modelcontextprotocol/server-basic-react'
```
Wait ~10 seconds for it to start, then verify:
```bash
# Check it's running
docker logs mcp-app-test
# Expected: "MCP server listening on http://localhost:3001/mcp"
# Verify MCP protocol works
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
# List tools — should show get-time with _meta.ui.resourceUri
curl -s -X POST http://localhost:3001/mcp \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
```
The `tools/list` response should contain:
```json
{
"name": "get-time",
"_meta": {
"ui": { "resourceUri": "ui://get-time/mcp-app.html" }
}
}
```
## Testing in LocalAI's UI
1. Make sure LocalAI is running (e.g. `http://localhost:8080`)
2. Build the React UI: `cd core/http/react-ui && npm install && npm run build`
3. Open the Chat page in your browser
4. Click **"Client MCP"** in the chat header
5. Add a new client MCP server:
- **URL**: `http://localhost:3001/mcp`
- **Use CORS proxy**: enabled (default) — required because the browser can't hit `localhost:3001` directly due to CORS; LocalAI's proxy at `/api/cors-proxy` handles it
6. The server should connect and discover the `get-time` tool
7. Select a model and send: **"What time is it?"**
8. The LLM should call the `get-time` tool
9. The tool result should render the interactive React clock app in an iframe as a standalone chat message (not inside the collapsed activity group)
## What to Verify
- [ ] Tool appears in the connected tools list (not filtered — `get-time` is callable by the LLM)
- [ ] The iframe renders as a standalone chat message with a puzzle-piece icon
- [ ] The app loads and is interactive (clock UI, buttons work)
- [ ] No "Reconnect to MCP server" overlay (connection is live)
- [ ] Console logs show bidirectional communication:
- `tools/call` messages from app to host (app calling server tools)
- `ui/message` notifications (app sending messages)
- [ ] After the app renders, the LLM continues and produces a text response with the time
- [ ] Non-UI tools continue to work normally (text-only results)
- [ ] Page reload shows the HTML statically with a reconnect overlay until you reconnect
## Console Log Patterns
Healthy bidirectional communication looks like:
```
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Bridge init
get-time result: { content: [...] } // Tool result received
Calling get-time tool... // App calls tool
Sending message { method: "tools/call", ... } // App -> host -> server
Parsed message { jsonrpc: "2.0", id: N, result: {...} } // Server response
Sending message text to Host: ... // App sends message
Sending message { method: "ui/message", ... } // Message notification
Message accepted // Host acknowledged
```
Benign warnings to ignore:
- `Source map error: ... about:srcdoc` — browser devtools can't find source maps for srcdoc iframes
- `Ignoring message from unknown source` — duplicate postMessage from iframe navigation
- `notifications/cancelled` — app cleaning up previous requests
## Architecture Notes
- **No server-side changes needed** — the MCP App protocol runs entirely in the browser
- `PostMessageTransport` wraps `window.postMessage` between host and `srcdoc` iframe
- `AppBridge` (from `@modelcontextprotocol/ext-apps`) auto-forwards `tools/call`, `resources/read`, `resources/list` from the app to the MCP server via the host's `Client`
- The iframe uses `sandbox="allow-scripts allow-forms"` (no `allow-same-origin`) — opaque origin, no access to host cookies/DOM/localStorage
- App-only tools (`_meta.ui.visibility: "app-only"`) are filtered from the LLM's tool list but remain callable by the app iframe
## Key Files
- `core/http/react-ui/src/components/MCPAppFrame.jsx` — iframe + AppBridge component
- `core/http/react-ui/src/hooks/useMCPClient.js` — MCP client hook with app UI helpers (`hasAppUI`, `getAppResource`, `getClientForTool`, `getToolDefinition`)
- `core/http/react-ui/src/hooks/useChat.js` — agentic loop, attaches `appUI` to tool_result messages
- `core/http/react-ui/src/pages/Chat.jsx` — renders MCPAppFrame as standalone chat messages
## Other Test Servers
The `@modelcontextprotocol/ext-apps` repo has many example servers:
- `@modelcontextprotocol/server-basic-react` — simple clock (React)
- More examples at https://github.com/modelcontextprotocol/ext-apps/tree/main/examples
All examples support both stdio and HTTP transport. Run without `--stdio` for HTTP mode on port 3001.
## Cleanup
```bash
docker rm -f mcp-app-test
```

115
.agents/vllm-backend.md Normal file
View File

@@ -0,0 +1,115 @@
# Working on the vLLM Backend
The vLLM backend lives at `backend/python/vllm/backend.py` (async gRPC) and the multimodal variant at `backend/python/vllm-omni/backend.py` (sync gRPC). Both wrap vLLM's `AsyncLLMEngine` / `Omni` and translate the LocalAI gRPC `PredictOptions` into vLLM `SamplingParams` + outputs into `Reply.chat_deltas`.
This file captures the non-obvious bits — most of the bring-up was a single PR (`feat/vllm-parity`) and the things below are easy to get wrong.
## Tool calling and reasoning use vLLM's *native* parsers
Do not write regex-based tool-call extractors for vLLM. vLLM ships:
- `vllm.tool_parsers.ToolParserManager` — 50+ registered parsers (`hermes`, `llama3_json`, `llama4_pythonic`, `mistral`, `qwen3_xml`, `deepseek_v3`, `granite4`, `openai`, `kimi_k2`, `glm45`, …)
- `vllm.reasoning.ReasoningParserManager` — 25+ registered parsers (`deepseek_r1`, `qwen3`, `mistral`, `gemma4`, …)
Both can be used standalone: instantiate with a tokenizer, call `extract_tool_calls(text, request=None)` / `extract_reasoning(text, request=None)`. The backend stores the parser *classes* on `self.tool_parser_cls` / `self.reasoning_parser_cls` at LoadModel time and instantiates them per request.
**Selection:** vLLM does *not* auto-detect parsers from model name — neither does the LocalAI backend. The user (or `core/config/hooks_vllm.go`) must pick one and pass it via `Options[]`:
```yaml
options:
- tool_parser:hermes
- reasoning_parser:qwen3
```
Auto-defaults for known model families live in `core/config/parser_defaults.json` and are applied:
- at gallery import time by `core/gallery/importers/vllm.go`
- at model load time by the `vllm` / `vllm-omni` backend hook in `core/config/hooks_vllm.go`
User-supplied `tool_parser:`/`reasoning_parser:` in the config wins over defaults — the hook checks for existing entries before appending.
**When to update `parser_defaults.json`:** any time vLLM ships a new tool or reasoning parser, or you onboard a new model family that LocalAI users will pull from HuggingFace. The file is keyed by *family pattern* matched against `normalizeModelID(cfg.Model)` (lowercase, org-prefix stripped, `_``-`). Patterns are checked **longest-first** — keep `qwen3.5` before `qwen3`, `llama-3.3` before `llama-3`, etc., or the wrong family wins. Add a covering test in `core/config/hooks_test.go`.
**Sister file — `core/config/inference_defaults.json`:** same pattern but for sampling parameters (temperature, top_p, top_k, min_p, repeat_penalty, presence_penalty). Loaded by `core/config/inference_defaults.go` and applied by `ApplyInferenceDefaults()`. The schema is `map[string]float64` only — *strings don't fit*, which is why parser defaults needed their own JSON file. The inference file is **auto-generated from unsloth** via `go generate ./core/config/` (see `core/config/gen_inference_defaults/`) — don't hand-edit it; instead update the upstream source or regenerate. Both files share `normalizeModelID()` and the longest-first pattern ordering.
**Constructor compatibility gotcha:** the abstract `ToolParser.__init__` accepts `tools=`, but several concrete parsers (Hermes2ProToolParser, etc.) override `__init__` and *only* accept `tokenizer`. Always:
```python
try:
tp = self.tool_parser_cls(self.tokenizer, tools=tools)
except TypeError:
tp = self.tool_parser_cls(self.tokenizer)
```
## ChatDelta is the streaming contract
The Go side (`core/backend/llm.go`, `pkg/functions/chat_deltas.go`) consumes `Reply.chat_deltas` to assemble the OpenAI response. For tool calls to surface in `chat/completions`, the Python backend **must** populate `Reply.chat_deltas[].tool_calls` with `ToolCallDelta{index, id, name, arguments}`. Returning the raw `<tool_call>...</tool_call>` text in `Reply.message` is *not* enough — the Go regex fallback exists for llama.cpp, not for vllm.
Same story for `reasoning_content` — emit it on `ChatDelta.reasoning_content`, not as part of `content`.
## Message conversion to chat templates
`tokenizer.apply_chat_template()` expects a list of dicts, not proto Messages. The shared helper in `backend/python/common/vllm_utils.py` (`messages_to_dicts`) handles the mapping including:
- `tool_call_id` and `name` for `role="tool"` messages
- `tool_calls` JSON-string field → parsed Python list for `role="assistant"`
- `reasoning_content` for thinking models
Pass `tools=json.loads(request.Tools)` and (when `request.Metadata.get("enable_thinking") == "true"`) `enable_thinking=True` to `apply_chat_template`. Wrap in `try/except TypeError` because not every tokenizer template accepts those kwargs.
## CPU support and the SIMD/library minefield
vLLM publishes prebuilt CPU wheels at `https://github.com/vllm-project/vllm/releases/...`. The pin lives in `backend/python/vllm/requirements-cpu-after.txt`.
**Version compatibility — important:** newer vllm CPU wheels (≥ 0.15) declare `torch==2.10.0+cpu` as a hard dep, but `torch==2.10.0` only exists on the PyTorch test channel and pulls in an incompatible `torchvision`. Stay on **`vllm 0.14.1+cpu` + `torch 2.9.1+cpu`** until both upstream catch up. Bumping requires verifying torchvision/torchaudio match.
`requirements-cpu.txt` uses `--extra-index-url https://download.pytorch.org/whl/cpu`. `install.sh` adds `--index-strategy=unsafe-best-match` for the `cpu` profile so uv resolves transformers/vllm from PyPI while pulling torch from the PyTorch index.
**SIMD baseline:** the prebuilt CPU wheel is compiled with AVX-512 VNNI/BF16. On a CPU without those instructions, importing `vllm.model_executor.models.registry` SIGILLs at `_run_in_subprocess` time during model inspection. There is no runtime flag to disable it. Workarounds:
1. **Run on a host with the right SIMD baseline** (default — fast)
2. **Build from source** with `FROM_SOURCE=true` env var. Plumbing exists end-to-end:
- `install.sh` hides `requirements-cpu-after.txt`, runs `installRequirements` for the base deps, then clones vllm and `VLLM_TARGET_DEVICE=cpu uv pip install --no-deps .`
- `backend/Dockerfile.python` declares `ARG FROM_SOURCE` + `ENV FROM_SOURCE`
- `Makefile` `docker-build-backend` macro forwards `--build-arg FROM_SOURCE=$(FROM_SOURCE)` when set
- Source build takes 3050 minutes — too slow for per-PR CI but fine for local.
**Runtime shared libraries:** vLLM's `vllm._C` extension `dlopen`s `libnuma.so.1` at import time. If missing, the C extension silently fails and `torch.ops._C_utils.init_cpu_threads_env` is never registered → `EngineCore` crashes on `init_device` with:
```
AttributeError: '_OpNamespace' '_C_utils' object has no attribute 'init_cpu_threads_env'
```
`backend/python/vllm/package.sh` bundles `libnuma.so.1` and `libgomp.so.1` into `${BACKEND}/lib/`, which `libbackend.sh` adds to `LD_LIBRARY_PATH` at run time. The builder stage in `backend/Dockerfile.python` installs `libnuma1`/`libgomp1` so package.sh has something to copy. Do *not* assume the production host has these — backend images are `FROM scratch`.
## Backend hook system (`core/config/backend_hooks.go`)
Per-backend defaults that used to be hardcoded in `ModelConfig.Prepare()` now live in `core/config/hooks_*.go` files and self-register via `init()`:
- `hooks_llamacpp.go` → GGUF metadata parsing, context size, GPU layers, jinja template
- `hooks_vllm.go` → tool/reasoning parser auto-selection from `parser_defaults.json`
Hook keys:
- `"llama-cpp"`, `"vllm"`, `"vllm-omni"`, … — backend-specific
- `""` — runs only when `cfg.Backend` is empty (auto-detect case)
- `"*"` — global catch-all, runs for every backend before specific hooks
Multiple hooks per key are supported and run in registration order. Adding a new backend default:
```go
// core/config/hooks_<backend>.go
func init() {
RegisterBackendHook("<backend>", myDefaults)
}
func myDefaults(cfg *ModelConfig, modelPath string) {
// only fill in fields the user didn't set
}
```
## The `Messages.ToProto()` fields you need to set
`core/schema/message.go:ToProto()` must serialize:
- `ToolCallID``proto.Message.ToolCallId` (for `role="tool"` messages — links result back to the call)
- `Reasoning``proto.Message.ReasoningContent`
- `ToolCalls``proto.Message.ToolCalls` (JSON-encoded string)
These were originally not serialized and tool-calling conversations broke silently — the C++ llama.cpp backend reads them but always got empty strings. Any new field added to `schema.Message` *and* `proto.Message` needs a matching line in `ToProto()`.

View File

@@ -10,7 +10,8 @@ services:
- 8080:8080
volumes:
- localai_workspace:/workspace
- ../models:/host-models
- models:/host-models
- backends:/host-backends
- ./customization:/devcontainer-customization
command: /bin/sh -c "while sleep 1000; do :; done"
cap_add:
@@ -39,6 +40,9 @@ services:
- GF_SECURITY_ADMIN_PASSWORD=grafana
volumes:
- ./grafana:/etc/grafana/provisioning/datasources
volumes:
prom_data:
localai_workspace:
localai_workspace:
models:
backends:

39
.docker/apt-mirror.sh Executable file
View File

@@ -0,0 +1,39 @@
#!/bin/sh
# Reconfigure Ubuntu apt sources to point at an alternate mirror.
#
# Used by Dockerfiles via `RUN --mount=type=bind,source=.docker/apt-mirror.sh,...`
# and by CI workflows on the runner to mitigate outages of the default
# archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com pool.
#
# Inputs (env):
# APT_MIRROR Replacement for archive.ubuntu.com and security.ubuntu.com
# (e.g. "http://azure.archive.ubuntu.com" or
# "https://mirrors.edge.kernel.org").
# Leave empty to keep upstream. The trailing "/ubuntu/..."
# path is preserved by the rewrite.
# APT_PORTS_MIRROR Replacement for ports.ubuntu.com (arm64/ppc64el/...).
# Leave empty to keep upstream.
#
# Both default to empty, in which case the script is a no-op.
set -e
if [ -z "${APT_MIRROR}" ] && [ -z "${APT_PORTS_MIRROR}" ]; then
exit 0
fi
# Ubuntu 24.04 (noble) ships DEB822 sources at /etc/apt/sources.list.d/ubuntu.sources;
# older releases use /etc/apt/sources.list. We rewrite whichever exists.
for f in /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list; do
[ -f "$f" ] || continue
if [ -n "${APT_MIRROR}" ]; then
# Use a comma delimiter so the alternation pipe in the regex
# is not interpreted as the s/// separator.
sed -i -E "s,https?://(archive\.ubuntu\.com|security\.ubuntu\.com),${APT_MIRROR},g" "$f"
fi
if [ -n "${APT_PORTS_MIRROR}" ]; then
sed -i -E "s,https?://ports\.ubuntu\.com,${APT_PORTS_MIRROR},g" "$f"
fi
done
echo "apt-mirror: rewrote sources (APT_MIRROR='${APT_MIRROR}', APT_PORTS_MIRROR='${APT_PORTS_MIRROR}')"

View File

@@ -0,0 +1,259 @@
# Shared C++ + accelerator base image for the llama-cpp / ik-llama-cpp /
# turboquant trio. They differ only in their Makefile targets at build
# time; the apt + GPU SDK + protoc + cmake + GRPC install is identical.
#
# Built once per (build-type, arch, ubuntu-version, cuda-version) combination
# by .github/workflows/base_images.yml and pushed to
# quay.io/go-skynet/ci-cache:base-image-<tag-stem>[-pr<N>]. Consumed by
# backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant} via the
# BASE_IMAGE_PREBUILT build-arg. See .agents/ci-caching.md.
ARG BASE_IMAGE=ubuntu:24.04
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
FROM ${BASE_IMAGE} AS grpc
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.65.0
ARG CMAKE_FROM_SOURCE=false
# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
ARG CMAKE_VERSION=3.31.10
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
WORKDIR /build
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
build-essential curl libssl-dev \
git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# Build GRPC into /opt/grpc so we can copy it into the final base without
# pulling in the full source tree. Mirrors the original two-stage layout in
# Dockerfile.llama-cpp; absorbing it here means consumers no longer pay the
# GRPC compile cost.
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
mkdir -p /build/grpc/cmake/build && \
cd /build/grpc/cmake/build && \
sed -i "216i\ TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
make && \
make install && \
rm -rf /build
FROM ${BASE_IMAGE}
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.31.10
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI"
LABEL org.opencontainers.image.description="LocalAI C++ (llama-cpp/ik-llama-cpp/turboquant) base image"
LABEL org.localai.base.lang="cpp"
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
ccache git \
ca-certificates \
make \
pkg-config libcurl4-openssl-dev \
curl unzip \
libssl-dev wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
else
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
fi
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get install -y nvpl
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
ldconfig && \
echo "rocBLAS library data architectures:" && \
(ls /opt/rocm*/lib/rocblas/library/Kernels* 2>/dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \
echo "WARNING: No rocBLAS kernel data found" \
; fi
# Install protoc (the version in 22.04 is too old, and grpc's bundled protoc
# would pull in a newer absl that breaks stablediffusion).
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
COPY --from=grpc /opt/grpc /usr/local

View File

@@ -0,0 +1,206 @@
# Shared Go + accelerator base image.
#
# Built once per (build-type, arch, ubuntu-version, cuda-version) combination
# by .github/workflows/base_images.yml and pushed to
# quay.io/go-skynet/ci-cache:base-image-<tag-stem>[-pr<N>]. Consumed by
# backend/Dockerfile.golang via the BASE_IMAGE_PREBUILT build-arg.
#
# Mirrors the GPU stack stanzas in Dockerfile.python; the language-specific
# tail at the bottom installs Go + grpc tooling. See .agents/ci-caching.md.
ARG BASE_IMAGE=ubuntu:24.04
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
FROM ${BASE_IMAGE}
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.25.4
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI"
LABEL org.opencontainers.image.description="LocalAI Go+accelerator base image"
LABEL org.localai.base.lang="golang"
# gcc-14 is the default on noble (ubuntu:24.04) but absent from jammy
# (the L4T jetpack r36.4.0 base). LocalVQE needs it; the other Go backends
# compile with the default gcc shipped via build-essential. Try gcc-14
# from the configured repos and fall back gracefully when it's missing.
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
git ccache \
ca-certificates \
make cmake wget libopenblas-dev \
curl unzip \
libssl-dev && \
if apt-cache show gcc-14 >/dev/null 2>&1 && apt-cache show g++-14 >/dev/null 2>&1; then \
apt-get install -y --no-install-recommends gcc-14 g++-14 && \
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100 \
--slave /usr/bin/g++ g++ /usr/bin/g++-14 \
--slave /usr/bin/gcov gcov /usr/bin/gcov-14; \
fi && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
else
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
fi
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get install -y nvpl
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
ldconfig \
; fi
# Install Go
RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin:/usr/local/bin
# Install grpc compilers
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
# Install protoc (the version in 22.04 is too old, and grpc's bundled protoc
# would pull in a newer absl that breaks stablediffusion).
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT

View File

@@ -0,0 +1,209 @@
# Shared Python + accelerator base image.
#
# Built once per (build-type, arch, ubuntu-version, cuda-version) combination
# by .github/workflows/base_images.yml and pushed to
# quay.io/go-skynet/ci-cache:base-image-<tag-stem>[-pr<N>]. Consumed by
# backend/Dockerfile.python via the BASE_IMAGE_PREBUILT build-arg.
# See .agents/ci-caching.md.
ARG BASE_IMAGE=ubuntu:24.04
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
FROM ${BASE_IMAGE}
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI"
LABEL org.opencontainers.image.description="LocalAI Python+accelerator base image"
LABEL org.localai.base.lang="python"
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
ccache \
ca-certificates \
espeak-ng \
curl \
libssl-dev \
git wget \
git-lfs \
unzip clang \
upx-ucl \
curl python3-pip \
python-is-python3 \
python3-dev llvm \
libnuma1 libgomp1 \
python3-venv make cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN <<EOT bash
if [ "${UBUNTU_VERSION}" = "2404" ]; then
pip install --break-system-packages --user --upgrade pip
else
pip install --upgrade pip
fi
EOT
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
else
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
fi
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get install -y nvpl
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
; fi
# Install uv as a system package
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
ENV PATH="/root/.cargo/bin:${PATH}"
# Increase timeout for uv installs behind slow networks
ENV UV_HTTP_TIMEOUT=180
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# Install grpcio-tools (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${UBUNTU_VERSION}" = "2404" ]; then
pip install --break-system-packages --user grpcio-tools==1.71.0 grpcio==1.71.0
else
pip install grpcio-tools==1.71.0 grpcio==1.71.0
fi
EOT

View File

@@ -0,0 +1,47 @@
# Shared Rust base image for the kokoros backend.
#
# Built once per (ubuntu-version) by .github/workflows/base_images.yml and
# pushed to quay.io/go-skynet/ci-cache:base-image-<tag-stem>[-pr<N>]. The
# current rust matrix is CPU-only, so this base skips the GPU SDK stanzas;
# if a future rust backend needs cublas/rocm/etc., promote this recipe to
# mirror Dockerfile.python's GPU stack. See .agents/ci-caching.md.
ARG BASE_IMAGE=ubuntu:24.04
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
FROM ${BASE_IMAGE}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI"
LABEL org.opencontainers.image.description="LocalAI Rust base image"
LABEL org.localai.base.lang="rust"
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
git ccache \
ca-certificates \
make cmake wget \
curl unzip \
clang \
pkg-config \
libssl-dev \
espeak-ng libespeak-ng-dev \
libsonic-dev libpcaudio-dev \
libopus-dev \
protobuf-compiler && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install Rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

12
.env
View File

@@ -26,21 +26,15 @@
## Disables COMPEL (Diffusers)
# COMPEL=0
## Disables SD_EMBED (Diffusers)
# SD_EMBED=0
## Enable/Disable single backend (useful if only one GPU is available)
# LOCALAI_SINGLE_ACTIVE_BACKEND=true
# Forces shutdown of the backends if busy (only if LOCALAI_SINGLE_ACTIVE_BACKEND is set)
# LOCALAI_FORCE_BACKEND_SHUTDOWN=true
## Specify a build type. Available: cublas, openblas, clblas.
## cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.
## OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.
## clBLAS: This is an open-source implementation of the BLAS library that uses OpenCL, a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. clBLAS is designed to take advantage of the parallel computing power of GPUs but can also run on any hardware that supports OpenCL. This includes hardware from different vendors like Nvidia, AMD, and Intel.
# BUILD_TYPE=openblas
## Uncomment and set to true to enable rebuilding from source
# REBUILD=true
## Path where to store generated images
# LOCALAI_IMAGE_PATH=/tmp/generated/images

View File

@@ -0,0 +1,100 @@
name: 'Configure apt mirror'
description: |
Reconfigure the GitHub Actions runner's Ubuntu apt sources to use an
alternate mirror, and emit the effective URLs as outputs so callers can
forward them as Docker build-args.
Two mirror profiles depending on where the runner lives, because the
best mirror differs by network:
* github-hosted runners run on Azure, so they default to the
Azure-hosted Ubuntu mirror (lowest latency, same VPC).
* self-hosted runners (arc-runner-set, bigger-runner, ...) typically
cannot route to azure.archive.ubuntu.com, so they default to the
kernel.org mirror, which is publicly reachable from anywhere.
Pass an empty string to either input to skip the rewrite for that
profile and keep upstream archive.ubuntu.com / ports.ubuntu.com.
inputs:
github-hosted-mirror:
description: 'archive/security mirror URL for github-hosted runners (empty = upstream)'
required: false
default: 'http://azure.archive.ubuntu.com'
github-hosted-ports-mirror:
description: 'ports.ubuntu.com mirror URL for github-hosted runners (empty = upstream)'
required: false
default: 'http://azure.ports.ubuntu.com'
self-hosted-mirror:
description: 'archive/security mirror URL for self-hosted runners (empty = upstream)'
required: false
# HTTP, not HTTPS: the bare ubuntu:24.04 builder image doesn't ship
# ca-certificates, so the very first apt-get update over TLS would
# fail with "No system certificates available" before it can install
# anything. apt validates package integrity via GPG signatures, so
# plain HTTP is safe for the archive itself.
default: 'http://mirrors.edge.kernel.org'
self-hosted-ports-mirror:
description: 'ports.ubuntu.com mirror URL for self-hosted runners (empty = upstream)'
required: false
# mirrors.edge.kernel.org does NOT carry /ubuntu-ports/ — only the
# main /ubuntu/ archive — so arm64 builds 404 there. Leave ports
# upstream by default. The original DDoS was on archive.ubuntu.com
# so ports.ubuntu.com remains the path of least surprise.
default: ''
outputs:
effective-mirror:
description: 'The mirror URL actually applied for this runner (or empty)'
value: ${{ steps.pick.outputs.mirror }}
effective-ports-mirror:
description: 'The ports mirror URL actually applied for this runner (or empty)'
value: ${{ steps.pick.outputs.ports-mirror }}
runs:
using: 'composite'
steps:
- name: Pick effective mirror for this runner
id: pick
shell: bash
env:
RUNNER_ENV: ${{ runner.environment }}
GH_MIRROR: ${{ inputs.github-hosted-mirror }}
GH_PORTS_MIRROR: ${{ inputs.github-hosted-ports-mirror }}
SH_MIRROR: ${{ inputs.self-hosted-mirror }}
SH_PORTS_MIRROR: ${{ inputs.self-hosted-ports-mirror }}
run: |
if [ "${RUNNER_ENV}" = "github-hosted" ]; then
MIRROR="${GH_MIRROR}"
PORTS_MIRROR="${GH_PORTS_MIRROR}"
else
MIRROR="${SH_MIRROR}"
PORTS_MIRROR="${SH_PORTS_MIRROR}"
fi
echo "configure-apt-mirror: runner=${RUNNER_ENV} mirror='${MIRROR}' ports-mirror='${PORTS_MIRROR}'"
echo "mirror=${MIRROR}" >> "$GITHUB_OUTPUT"
echo "ports-mirror=${PORTS_MIRROR}" >> "$GITHUB_OUTPUT"
- name: Rewrite apt sources
if: steps.pick.outputs.mirror != '' || steps.pick.outputs.ports-mirror != ''
shell: bash
env:
APT_MIRROR: ${{ steps.pick.outputs.mirror }}
APT_PORTS_MIRROR: ${{ steps.pick.outputs.ports-mirror }}
run: |
set -e
# Ubuntu 24.04 (noble) ships DEB822 sources at
# /etc/apt/sources.list.d/ubuntu.sources; older releases use
# /etc/apt/sources.list. Rewrite whichever exists.
for f in /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list; do
sudo test -f "$f" || continue
if [ -n "${APT_MIRROR}" ]; then
# Comma delimiter so the alternation pipe in the regex is not
# interpreted as the s/// separator.
sudo sed -i -E "s,https?://(archive\.ubuntu\.com|security\.ubuntu\.com),${APT_MIRROR},g" "$f"
fi
if [ -n "${APT_PORTS_MIRROR}" ]; then
sudo sed -i -E "s,https?://ports\.ubuntu\.com,${APT_PORTS_MIRROR},g" "$f"
fi
done
echo "Runner apt mirror configured (APT_MIRROR='${APT_MIRROR}', APT_PORTS_MIRROR='${APT_PORTS_MIRROR}')"

3164
.github/backend-matrix.yaml vendored Normal file
View File

File diff suppressed because it is too large Load Diff

45
.github/bump_vllm_wheel.sh vendored Executable file
View File

@@ -0,0 +1,45 @@
#!/bin/bash
# Bump the cublas13 vLLM wheel pin in requirements-cublas13-after.txt.
#
# vLLM's PyPI wheel is built against CUDA 12 so the cublas13 build pulls a
# cu130-flavoured wheel from vLLM's per-tag index at
# https://wheels.vllm.ai/<TAG>/cu130/. That URL segment is itself version-locked
# (no /latest/ alias upstream), so bumping vLLM means rewriting both the URL
# segment and the version constraint atomically. bump_deps.sh handles git-sha
# vars in Makefiles; this script handles the two-value rewrite specific to the
# vLLM requirements file.
set -xe
REPO=$1 # vllm-project/vllm
FILE=$2 # backend/python/vllm/requirements-cublas13-after.txt
VAR=$3 # VLLM_VERSION (used for output file names so the workflow can read them)
if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then
echo "usage: $0 <repo> <requirements-file> <var-name>" >&2
exit 1
fi
# /releases/latest returns the most recent non-prerelease tag.
LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \
"https://api.github.com/repos/$REPO/releases/latest" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])")
# Strip leading 'v' (vLLM tags are 'v0.20.0', the URL/version use '0.20.0').
NEW_VERSION="${LATEST_TAG#v}"
set +e
CURRENT_VERSION=$(grep -oE '^vllm==[0-9]+\.[0-9]+\.[0-9]+' "$FILE" | head -1 | cut -d= -f3)
set -e
# sed both lines unconditionally — peter-evans/create-pull-request opens no PR
# when the working tree is clean, so a no-op rewrite is safe.
sed -i "$FILE" \
-e "s|wheels\.vllm\.ai/[^/]*/cu130|wheels.vllm.ai/$NEW_VERSION/cu130|g" \
-e "s|^vllm==.*|vllm==$NEW_VERSION|"
if [ -z "$CURRENT_VERSION" ]; then
echo "Could not find vllm==X.Y.Z in $FILE."
exit 0
fi
echo "Changes: https://github.com/$REPO/compare/v${CURRENT_VERSION}...${LATEST_TAG}" >> "${VAR}_message.txt"
echo "${NEW_VERSION}" >> "${VAR}_commit.txt"

View File

@@ -1,288 +0,0 @@
package main
import (
"context"
"fmt"
"os"
"slices"
"strings"
hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
cogito "github.com/mudler/cogito"
"github.com/mudler/cogito/structures"
"github.com/sashabaranov/go-openai/jsonschema"
)
var (
openAIModel = os.Getenv("OPENAI_MODEL")
openAIKey = os.Getenv("OPENAI_KEY")
openAIBaseURL = os.Getenv("OPENAI_BASE_URL")
galleryIndexPath = os.Getenv("GALLERY_INDEX_PATH")
//defaultclient
llm = cogito.NewOpenAILLM(openAIModel, openAIKey, openAIBaseURL)
)
// cleanTextContent removes trailing spaces, tabs, and normalizes line endings
// to prevent YAML linting issues like trailing spaces and multiple empty lines
func cleanTextContent(text string) string {
lines := strings.Split(text, "\n")
var cleanedLines []string
var prevEmpty bool
for _, line := range lines {
// Remove all trailing whitespace (spaces, tabs, etc.)
trimmed := strings.TrimRight(line, " \t\r")
// Avoid multiple consecutive empty lines
if trimmed == "" {
if !prevEmpty {
cleanedLines = append(cleanedLines, "")
}
prevEmpty = true
} else {
cleanedLines = append(cleanedLines, trimmed)
prevEmpty = false
}
}
// Remove trailing empty lines from the result
result := strings.Join(cleanedLines, "\n")
return strings.TrimRight(result, "\n")
}
// isModelExisting checks if a specific model ID exists in the gallery using text search
func isModelExisting(modelID string) (bool, error) {
indexPath := getGalleryIndexPath()
content, err := os.ReadFile(indexPath)
if err != nil {
return false, fmt.Errorf("failed to read %s: %w", indexPath, err)
}
contentStr := string(content)
// Simple text search - if the model ID appears anywhere in the file, it exists
return strings.Contains(contentStr, modelID), nil
}
// filterExistingModels removes models that already exist in the gallery
func filterExistingModels(models []ProcessedModel) ([]ProcessedModel, error) {
var filteredModels []ProcessedModel
for _, model := range models {
exists, err := isModelExisting(model.ModelID)
if err != nil {
fmt.Printf("Error checking if model %s exists: %v, skipping\n", model.ModelID, err)
continue
}
if !exists {
filteredModels = append(filteredModels, model)
} else {
fmt.Printf("Skipping existing model: %s\n", model.ModelID)
}
}
fmt.Printf("Filtered out %d existing models, %d new models remaining\n",
len(models)-len(filteredModels), len(filteredModels))
return filteredModels, nil
}
// getGalleryIndexPath returns the gallery index file path, with a default fallback
func getGalleryIndexPath() string {
if galleryIndexPath != "" {
return galleryIndexPath
}
return "gallery/index.yaml"
}
func getRealReadme(ctx context.Context, repository string) (string, error) {
// Create a conversation fragment
fragment := cogito.NewEmptyFragment().
AddMessage("user",
`Your task is to get a clear description of a large language model from huggingface by using the provided tool. I will share with you a repository that might be quantized, and as such probably not by the original model author. We need to get the real description of the model, and not the one that might be quantized. You will have to call the tool to get the readme more than once by figuring out from the quantized readme which is the base model readme. This is the repository: `+repository)
// Execute with tools
result, err := cogito.ExecuteTools(llm, fragment,
cogito.WithIterations(3),
cogito.WithMaxAttempts(3),
cogito.WithTools(&HFReadmeTool{client: hfapi.NewClient()}))
if err != nil {
return "", err
}
result = result.AddMessage("user", "Describe the model in a clear and concise way that can be shared in a model gallery.")
// Get a response
newFragment, err := llm.Ask(ctx, result)
if err != nil {
return "", err
}
content := newFragment.LastMessage().Content
return cleanTextContent(content), nil
}
func selectMostInterestingModels(ctx context.Context, searchResult *SearchResult) ([]ProcessedModel, error) {
// Create a conversation fragment
fragment := cogito.NewEmptyFragment().
AddMessage("user",
`Your task is to analyze a list of AI models and select the most interesting ones for a model gallery. You will be given detailed information about multiple models including their metadata, file information, and README content.
Consider the following criteria when selecting models:
1. Model popularity (download count)
2. Model recency (last modified date)
3. Model completeness (has preferred model file, README, etc.)
4. Model uniqueness (not duplicates or very similar models)
5. Model quality (based on README content and description)
6. Model utility (practical applications)
You should select models that would be most valuable for users browsing a model gallery. Prioritize models that are:
- Well-documented with clear READMEs
- Recently updated
- Popular (high download count)
- Have the preferred quantization format available
- Offer unique capabilities or are from reputable authors
Return your analysis and selection reasoning.`)
// Add the search results as context
modelsInfo := fmt.Sprintf("Found %d models matching '%s' with quantization preference '%s':\n\n",
searchResult.TotalModelsFound, searchResult.SearchTerm, searchResult.Quantization)
for i, model := range searchResult.Models {
modelsInfo += fmt.Sprintf("Model %d:\n", i+1)
modelsInfo += fmt.Sprintf(" ID: %s\n", model.ModelID)
modelsInfo += fmt.Sprintf(" Author: %s\n", model.Author)
modelsInfo += fmt.Sprintf(" Downloads: %d\n", model.Downloads)
modelsInfo += fmt.Sprintf(" Last Modified: %s\n", model.LastModified)
modelsInfo += fmt.Sprintf(" Files: %d files\n", len(model.Files))
if model.PreferredModelFile != nil {
modelsInfo += fmt.Sprintf(" Preferred Model File: %s (%d bytes)\n",
model.PreferredModelFile.Path, model.PreferredModelFile.Size)
} else {
modelsInfo += " No preferred model file found\n"
}
if model.ReadmeContent != "" {
modelsInfo += fmt.Sprintf(" README: %s\n", model.ReadmeContent)
}
if model.ProcessingError != "" {
modelsInfo += fmt.Sprintf(" Processing Error: %s\n", model.ProcessingError)
}
modelsInfo += "\n"
}
fragment = fragment.AddMessage("user", modelsInfo)
fragment = fragment.AddMessage("user", "Based on your analysis, select the top 5 most interesting models and provide a brief explanation for each selection. Also, create a filtered SearchResult with only the selected models. Return just a list of repositories IDs, you will later be asked to output it as a JSON array with the json tool.")
// Get a response
newFragment, err := llm.Ask(ctx, fragment)
if err != nil {
return nil, err
}
fmt.Println(newFragment.LastMessage().Content)
repositories := struct {
Repositories []string `json:"repositories"`
}{}
s := structures.Structure{
Schema: jsonschema.Definition{
Type: jsonschema.Object,
AdditionalProperties: false,
Properties: map[string]jsonschema.Definition{
"repositories": {
Type: jsonschema.Array,
Items: &jsonschema.Definition{Type: jsonschema.String},
Description: "The trending repositories IDs",
},
},
Required: []string{"repositories"},
},
Object: &repositories,
}
err = newFragment.ExtractStructure(ctx, llm, s)
if err != nil {
return nil, err
}
filteredModels := []ProcessedModel{}
for _, m := range searchResult.Models {
if slices.Contains(repositories.Repositories, m.ModelID) {
filteredModels = append(filteredModels, m)
}
}
return filteredModels, nil
}
// ModelFamily represents a YAML anchor/family
type ModelFamily struct {
Anchor string `json:"anchor"`
Name string `json:"name"`
}
// selectModelFamily selects the appropriate model family/anchor for a given model
func selectModelFamily(ctx context.Context, model ProcessedModel, availableFamilies []ModelFamily) (string, error) {
// Create a conversation fragment
fragment := cogito.NewEmptyFragment().
AddMessage("user",
`Your task is to select the most appropriate model family/anchor for a given AI model. You will be provided with:
1. Information about the model (name, description, etc.)
2. A list of available model families/anchors
You need to select the family that best matches the model's architecture, capabilities, or characteristics. Consider:
- Model architecture (e.g., Llama, Qwen, Mistral, etc.)
- Model capabilities (e.g., vision, coding, chat, etc.)
- Model size/type (e.g., small, medium, large)
- Model purpose (e.g., general purpose, specialized, etc.)
Return the anchor name that best fits the model.`)
// Add model information
modelInfo := "Model Information:\n"
modelInfo += fmt.Sprintf(" ID: %s\n", model.ModelID)
modelInfo += fmt.Sprintf(" Author: %s\n", model.Author)
modelInfo += fmt.Sprintf(" Downloads: %d\n", model.Downloads)
modelInfo += fmt.Sprintf(" Description: %s\n", model.ReadmeContentPreview)
fragment = fragment.AddMessage("user", modelInfo)
// Add available families
familiesInfo := "Available Model Families:\n"
for _, family := range availableFamilies {
familiesInfo += fmt.Sprintf(" - %s (%s)\n", family.Anchor, family.Name)
}
fragment = fragment.AddMessage("user", familiesInfo)
fragment = fragment.AddMessage("user", "Select the most appropriate family anchor for this model. Return just the anchor name.")
// Get a response
newFragment, err := llm.Ask(ctx, fragment)
if err != nil {
return "", err
}
// Extract the selected family
selectedFamily := strings.TrimSpace(newFragment.LastMessage().Content)
// Validate that the selected family exists in our list
for _, family := range availableFamilies {
if family.Anchor == selectedFamily {
return selectedFamily, nil
}
}
// If no exact match, try to find a close match
for _, family := range availableFamilies {
if strings.Contains(strings.ToLower(family.Anchor), strings.ToLower(selectedFamily)) ||
strings.Contains(strings.ToLower(selectedFamily), strings.ToLower(family.Anchor)) {
return family.Anchor, nil
}
}
// Default fallback
return "llama3", nil
}

View File

@@ -2,13 +2,61 @@ package main
import (
"context"
"encoding/json"
"fmt"
"os"
"strings"
"github.com/mudler/LocalAI/core/gallery/importers"
"sigs.k8s.io/yaml"
)
func formatTextContent(text string) string {
return formatTextContentWithIndent(text, 4, 6)
}
// formatTextContentWithIndent formats text content with specified base and list item indentation
func formatTextContentWithIndent(text string, baseIndent int, listItemIndent int) string {
var formattedLines []string
lines := strings.Split(text, "\n")
for _, line := range lines {
trimmed := strings.TrimRight(line, " \t\r")
if trimmed == "" {
// Keep empty lines as empty (no indentation)
formattedLines = append(formattedLines, "")
} else {
// Preserve relative indentation from yaml.Marshal output
// Count existing leading spaces to preserve relative structure
leadingSpaces := len(trimmed) - len(strings.TrimLeft(trimmed, " \t"))
trimmedStripped := strings.TrimLeft(trimmed, " \t")
var totalIndent int
if strings.HasPrefix(trimmedStripped, "-") {
// List items: use listItemIndent (ignore existing leading spaces)
totalIndent = listItemIndent
} else {
// Regular lines: use baseIndent + preserve relative indentation
// This handles both top-level keys (leadingSpaces=0) and nested properties (leadingSpaces>0)
totalIndent = baseIndent + leadingSpaces
}
indentStr := strings.Repeat(" ", totalIndent)
formattedLines = append(formattedLines, indentStr+trimmedStripped)
}
}
formattedText := strings.Join(formattedLines, "\n")
// Remove any trailing spaces from the formatted description
formattedText = strings.TrimRight(formattedText, " \t")
return formattedText
}
// generateYAMLEntry generates a YAML entry for a model using the specified anchor
func generateYAMLEntry(model ProcessedModel, familyAnchor string) string {
func generateYAMLEntry(model ProcessedModel, quantization string) string {
modelConfig, err := importers.DiscoverModelConfig("https://huggingface.co/"+model.ModelID, json.RawMessage(`{ "quantization": "`+quantization+`"}`))
if err != nil {
panic(err)
}
// Extract model name from ModelID
parts := strings.Split(model.ModelID, "/")
modelName := model.ModelID
@@ -22,18 +70,6 @@ func generateYAMLEntry(model ProcessedModel, familyAnchor string) string {
modelName = strings.ReplaceAll(modelName, "-q3_k_m", "")
modelName = strings.ReplaceAll(modelName, "-q2_k", "")
fileName := ""
checksum := ""
if model.PreferredModelFile != nil {
fileParts := strings.Split(model.PreferredModelFile.Path, "/")
if len(fileParts) > 0 {
fileName = fileParts[len(fileParts)-1]
}
checksum = model.PreferredModelFile.SHA256
} else {
fileName = model.ModelID
}
description := model.ReadmeContent
if description == "" {
description = fmt.Sprintf("AI model: %s", modelName)
@@ -41,142 +77,101 @@ func generateYAMLEntry(model ProcessedModel, familyAnchor string) string {
// Clean up description to prevent YAML linting issues
description = cleanTextContent(description)
formattedDescription := formatTextContent(description)
// Format description for YAML (indent each line and ensure no trailing spaces)
lines := strings.Split(description, "\n")
var formattedLines []string
for _, line := range lines {
if strings.TrimSpace(line) == "" {
// Keep empty lines as empty (no indentation)
formattedLines = append(formattedLines, "")
} else {
// Add indentation to non-empty lines
formattedLines = append(formattedLines, " "+line)
// Strip name and description from config file since they are
// already present at the gallery entry level and should not
// appear under overrides.
configFileContent := modelConfig.ConfigFile
var cfgMap map[string]any
if err := yaml.Unmarshal([]byte(configFileContent), &cfgMap); err == nil {
delete(cfgMap, "name")
delete(cfgMap, "description")
if cleaned, err := yaml.Marshal(cfgMap); err == nil {
configFileContent = string(cleaned)
}
}
formattedDescription := strings.Join(formattedLines, "\n")
// Remove any trailing spaces from the formatted description
formattedDescription = strings.TrimRight(formattedDescription, " \t")
configFile := formatTextContent(configFileContent)
filesYAML, _ := yaml.Marshal(modelConfig.Files)
// Files section: list items need 4 spaces (not 6), since files: is at 2 spaces
files := formatTextContentWithIndent(string(filesYAML), 4, 4)
// Build metadata sections
var metadataSections []string
// Add license if present
if model.License != "" {
metadataSections = append(metadataSections, fmt.Sprintf(` license: "%s"`, model.License))
}
// Add tags if present
if len(model.Tags) > 0 {
tagsYAML, _ := yaml.Marshal(model.Tags)
tagsFormatted := formatTextContentWithIndent(string(tagsYAML), 4, 4)
tagsFormatted = strings.TrimRight(tagsFormatted, "\n")
metadataSections = append(metadataSections, fmt.Sprintf(" tags:\n%s", tagsFormatted))
}
// Add icon if present
if model.Icon != "" {
metadataSections = append(metadataSections, fmt.Sprintf(` icon: %s`, model.Icon))
}
// Build the metadata block
metadataBlock := ""
if len(metadataSections) > 0 {
metadataBlock = strings.Join(metadataSections, "\n") + "\n"
}
yamlTemplate := ""
if checksum != "" {
yamlTemplate = `- !!merge <<: *%s
name: "%s"
yamlTemplate = `- name: "%s"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/%s
description: |
%s
%s%s
overrides:
parameters:
model: %s
%s
files:
- filename: %s
sha256: %s
uri: huggingface://%s/%s`
return fmt.Sprintf(yamlTemplate,
familyAnchor,
modelName,
model.ModelID,
formattedDescription,
fileName,
fileName,
checksum,
model.ModelID,
fileName,
)
} else {
yamlTemplate = `- !!merge <<: *%s
name: "%s"
urls:
- https://huggingface.co/%s
description: |
%s
overrides:
parameters:
model: %s`
return fmt.Sprintf(yamlTemplate,
familyAnchor,
modelName,
model.ModelID,
formattedDescription,
fileName,
)
%s`
// Trim trailing newlines from formatted sections to prevent extra blank lines
formattedDescription = strings.TrimRight(formattedDescription, "\n")
configFile = strings.TrimRight(configFile, "\n")
files = strings.TrimRight(files, "\n")
// Add newline before metadata block if present
if metadataBlock != "" {
metadataBlock = "\n" + strings.TrimRight(metadataBlock, "\n")
}
}
// extractModelFamilies extracts all YAML anchors from the gallery index.yaml file
func extractModelFamilies() ([]ModelFamily, error) {
// Read the index.yaml file
indexPath := getGalleryIndexPath()
content, err := os.ReadFile(indexPath)
if err != nil {
return nil, fmt.Errorf("failed to read %s: %w", indexPath, err)
}
lines := strings.Split(string(content), "\n")
var families []ModelFamily
for _, line := range lines {
line = strings.TrimSpace(line)
// Look for YAML anchors (lines starting with "- &")
if strings.HasPrefix(line, "- &") {
// Extract the anchor name (everything after "- &")
anchor := strings.TrimPrefix(line, "- &")
// Remove any trailing colon or other characters
anchor = strings.Split(anchor, ":")[0]
anchor = strings.Split(anchor, " ")[0]
if anchor != "" {
families = append(families, ModelFamily{
Anchor: anchor,
Name: anchor, // Use anchor as name for now
})
}
}
}
return families, nil
return fmt.Sprintf(yamlTemplate,
modelName,
model.ModelID,
formattedDescription,
metadataBlock,
configFile,
files,
)
}
// generateYAMLForModels generates YAML entries for selected models and appends to index.yaml
func generateYAMLForModels(ctx context.Context, models []ProcessedModel) error {
// Extract available model families
families, err := extractModelFamilies()
if err != nil {
return fmt.Errorf("failed to extract model families: %w", err)
}
fmt.Printf("Found %d model families: %v\n", len(families),
func() []string {
var names []string
for _, f := range families {
names = append(names, f.Anchor)
}
return names
}())
func generateYAMLForModels(ctx context.Context, models []ProcessedModel, quantization string) error {
// Generate YAML entries for each model
var yamlEntries []string
for _, model := range models {
fmt.Printf("Selecting family for model: %s\n", model.ModelID)
// Select appropriate family for this model
familyAnchor, err := selectModelFamily(ctx, model, families)
if err != nil {
fmt.Printf("Error selecting family for %s: %v, using default\n", model.ModelID, err)
familyAnchor = "llama3" // Default fallback
}
fmt.Printf("Selected family '%s' for model %s\n", familyAnchor, model.ModelID)
fmt.Printf("Generating YAML entry for model: %s\n", model.ModelID)
// Generate YAML entry
yamlEntry := generateYAMLEntry(model, familyAnchor)
yamlEntry := generateYAMLEntry(model, quantization)
yamlEntries = append(yamlEntries, yamlEntry)
}
// Append to index.yaml
// Prepend to index.yaml (write at the top)
if len(yamlEntries) > 0 {
indexPath := getGalleryIndexPath()
fmt.Printf("Appending YAML entries to %s...\n", indexPath)
fmt.Printf("Prepending YAML entries to %s...\n", indexPath)
// Read current content
content, err := os.ReadFile(indexPath)
@@ -184,11 +179,26 @@ func generateYAMLForModels(ctx context.Context, models []ProcessedModel) error {
return fmt.Errorf("failed to read %s: %w", indexPath, err)
}
// Append new entries
// Remove trailing whitespace from existing content and join entries without extra newlines
existingContent := strings.TrimRight(string(content), " \t\n\r")
existingContent := string(content)
yamlBlock := strings.Join(yamlEntries, "\n")
newContent := existingContent + "\n" + yamlBlock + "\n"
// Check if file starts with "---"
var newContent string
if strings.HasPrefix(existingContent, "---\n") {
// File starts with "---", prepend new entries after it
restOfContent := strings.TrimPrefix(existingContent, "---\n")
// Ensure proper spacing: "---\n" + new entries + "\n" + rest of content
newContent = "---\n" + yamlBlock + "\n" + restOfContent
} else if strings.HasPrefix(existingContent, "---") {
// File starts with "---" but no newline after
restOfContent := strings.TrimPrefix(existingContent, "---")
newContent = "---\n" + yamlBlock + "\n" + strings.TrimPrefix(restOfContent, "\n")
} else {
// No "---" at start, prepend new entries at the very beginning
// Trim leading whitespace from existing content
existingContent = strings.TrimLeft(existingContent, " \t\n\r")
newContent = yamlBlock + "\n" + existingContent
}
// Write back to file
err = os.WriteFile(indexPath, []byte(newContent), 0644)
@@ -196,7 +206,7 @@ func generateYAMLForModels(ctx context.Context, models []ProcessedModel) error {
return fmt.Errorf("failed to write %s: %w", indexPath, err)
}
fmt.Printf("Successfully added %d models to %s\n", len(yamlEntries), indexPath)
fmt.Printf("Successfully prepended %d models to %s\n", len(yamlEntries), indexPath)
}
return nil

301
.github/gallery-agent/helpers.go vendored Normal file
View File

@@ -0,0 +1,301 @@
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"regexp"
"strings"
hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
"sigs.k8s.io/yaml"
)
var galleryIndexPath = os.Getenv("GALLERY_INDEX_PATH")
// getGalleryIndexPath returns the gallery index file path, with a default fallback
func getGalleryIndexPath() string {
if galleryIndexPath != "" {
return galleryIndexPath
}
return "gallery/index.yaml"
}
type galleryModel struct {
Name string `yaml:"name"`
Urls []string `yaml:"urls"`
}
// loadGalleryURLSet parses gallery/index.yaml once and returns the set of
// HuggingFace model URLs already present in the gallery.
func loadGalleryURLSet() (map[string]struct{}, error) {
indexPath := getGalleryIndexPath()
content, err := os.ReadFile(indexPath)
if err != nil {
return nil, fmt.Errorf("failed to read %s: %w", indexPath, err)
}
var galleryModels []galleryModel
if err := yaml.Unmarshal(content, &galleryModels); err != nil {
return nil, fmt.Errorf("failed to unmarshal %s: %w", indexPath, err)
}
set := make(map[string]struct{}, len(galleryModels))
for _, gm := range galleryModels {
for _, u := range gm.Urls {
set[u] = struct{}{}
}
}
// Also skip URLs already proposed in open (unmerged) gallery-agent PRs.
// The workflow injects these via EXTRA_SKIP_URLS so we don't keep
// re-proposing the same model every run while a PR is waiting to merge.
for _, line := range strings.FieldsFunc(os.Getenv("EXTRA_SKIP_URLS"), func(r rune) bool {
return r == '\n' || r == ',' || r == ' '
}) {
u := strings.TrimSpace(line)
if u != "" {
set[u] = struct{}{}
}
}
return set, nil
}
// modelAlreadyInGallery checks whether a HuggingFace model repo is already
// referenced in the gallery URL set.
func modelAlreadyInGallery(set map[string]struct{}, modelID string) bool {
_, ok := set["https://huggingface.co/"+modelID]
return ok
}
// baseModelFromTags returns the first `base_model:<repo>` value found in the
// tag list, or "" if none is present. HuggingFace surfaces the base model
// declared in the model card's YAML frontmatter as such a tag.
func baseModelFromTags(tags []string) string {
for _, t := range tags {
if strings.HasPrefix(t, "base_model:") {
return strings.TrimPrefix(t, "base_model:")
}
}
return ""
}
// licenseFromTags returns the `license:<id>` value from the tag list, or "".
func licenseFromTags(tags []string) string {
for _, t := range tags {
if strings.HasPrefix(t, "license:") {
return strings.TrimPrefix(t, "license:")
}
}
return ""
}
// curatedTags produces the gallery tag list from HuggingFace's raw tag set.
// Always includes llm + gguf, then adds whitelisted family / capability
// markers when they appear in the HF tag list.
func curatedTags(hfTags []string) []string {
whitelist := []string{
"gpu", "cpu",
"llama", "mistral", "mixtral", "qwen", "qwen2", "qwen3",
"gemma", "gemma2", "gemma3", "phi", "phi3", "phi4",
"deepseek", "yi", "falcon", "command-r",
"vision", "multimodal", "code", "chat",
"instruction-tuned", "reasoning", "thinking",
}
seen := map[string]struct{}{}
out := []string{"llm", "gguf"}
seen["llm"] = struct{}{}
seen["gguf"] = struct{}{}
hfSet := map[string]struct{}{}
for _, t := range hfTags {
hfSet[strings.ToLower(t)] = struct{}{}
}
for _, w := range whitelist {
if _, ok := hfSet[w]; ok {
if _, dup := seen[w]; !dup {
out = append(out, w)
seen[w] = struct{}{}
}
}
}
return out
}
// resolveReadme fetches a description-quality README for a (possibly
// quantized) repo: if a `base_model:` tag is present, fetch the base repo's
// README; otherwise fall back to the repo's own README.
func resolveReadme(client *hfapi.Client, modelID string, hfTags []string) (string, error) {
if base := baseModelFromTags(hfTags); base != "" && base != modelID {
if content, err := client.GetReadmeContent(base, "README.md"); err == nil && strings.TrimSpace(content) != "" {
return cleanTextContent(content), nil
}
}
content, err := client.GetReadmeContent(modelID, "README.md")
if err != nil {
return "", err
}
return cleanTextContent(content), nil
}
// extractDescription turns a raw HuggingFace README into a concise plain-text
// description suitable for embedding in gallery/index.yaml: strips YAML
// frontmatter, HTML tags/comments, markdown images, link URLs (keeping the
// link text), markdown tables, and then truncates at a paragraph boundary
// around ~1200 characters. Raw README should still be used for icon
// extraction — call this only for the `description:` field.
func extractDescription(readme string) string {
s := readme
// Strip leading YAML frontmatter: `---\n...\n---\n` at start of file.
if strings.HasPrefix(strings.TrimLeft(s, " \t\n"), "---") {
trimmed := strings.TrimLeft(s, " \t\n")
rest := strings.TrimPrefix(trimmed, "---")
if idx := strings.Index(rest, "\n---"); idx >= 0 {
after := rest[idx+len("\n---"):]
after = strings.TrimPrefix(after, "\n")
s = after
}
}
// Strip HTML comments and tags.
s = regexp.MustCompile(`(?s)<!--.*?-->`).ReplaceAllString(s, "")
s = regexp.MustCompile(`(?is)<[^>]+>`).ReplaceAllString(s, "")
// Strip markdown images entirely.
s = regexp.MustCompile(`!\[[^\]]*\]\([^)]*\)`).ReplaceAllString(s, "")
// Replace markdown links `[text](url)` with just `text`.
s = regexp.MustCompile(`\[([^\]]+)\]\([^)]+\)`).ReplaceAllString(s, "$1")
// Drop table lines and horizontal rules, and flatten all leading
// whitespace: generateYAMLEntry embeds this under a `description: |`
// literal block whose indentation is set by the first non-empty line.
// If any line has extra leading whitespace (e.g. from an indented
// `<p align="center">` block in the original README), YAML will pick
// that up as the block's indent and every later line at a smaller
// indent blows the block scalar. Stripping leading whitespace here
// guarantees uniform 4-space indentation after formatTextContent runs.
var kept []string
for _, line := range strings.Split(s, "\n") {
t := strings.TrimLeft(line, " \t")
ts := strings.TrimSpace(t)
if strings.HasPrefix(ts, "|") {
continue
}
if strings.HasPrefix(ts, ":--") || strings.HasPrefix(ts, "---") || strings.HasPrefix(ts, "===") {
continue
}
kept = append(kept, t)
}
s = strings.Join(kept, "\n")
// Normalise whitespace and drop any leading blank lines so the literal
// block in YAML doesn't start with a blank first line (which would
// break the indentation detector the same way).
s = cleanTextContent(s)
s = strings.TrimLeft(s, " \t\n")
// Truncate at a paragraph boundary around maxLen chars.
const maxLen = 1200
if len(s) > maxLen {
cut := strings.LastIndex(s[:maxLen], "\n\n")
if cut < maxLen/3 {
cut = maxLen
}
s = strings.TrimRight(s[:cut], " \t\n") + "\n\n..."
}
return s
}
// cleanTextContent removes trailing spaces/tabs and collapses multiple empty
// lines so README content embeds cleanly into YAML without lint noise.
func cleanTextContent(text string) string {
lines := strings.Split(text, "\n")
var cleaned []string
var prevEmpty bool
for _, line := range lines {
trimmed := strings.TrimRight(line, " \t\r")
if trimmed == "" {
if !prevEmpty {
cleaned = append(cleaned, "")
}
prevEmpty = true
} else {
cleaned = append(cleaned, trimmed)
prevEmpty = false
}
}
return strings.TrimRight(strings.Join(cleaned, "\n"), "\n")
}
// extractIconFromReadme scans README content for an image URL usable as a
// gallery entry icon.
func extractIconFromReadme(readmeContent string) string {
if readmeContent == "" {
return ""
}
markdownImageRegex := regexp.MustCompile(`(?i)!\[[^\]]*\]\(([^)]+\.(png|jpg|jpeg|svg|webp|gif))\)`)
htmlImageRegex := regexp.MustCompile(`(?i)<img[^>]+src=["']([^"']+\.(png|jpg|jpeg|svg|webp|gif))["']`)
plainImageRegex := regexp.MustCompile(`(?i)https?://[^\s<>"']+\.(png|jpg|jpeg|svg|webp|gif)`)
if m := markdownImageRegex.FindStringSubmatch(readmeContent); len(m) > 1 && strings.HasPrefix(strings.ToLower(m[1]), "http") {
return strings.TrimSpace(m[1])
}
if m := htmlImageRegex.FindStringSubmatch(readmeContent); len(m) > 1 && strings.HasPrefix(strings.ToLower(m[1]), "http") {
return strings.TrimSpace(m[1])
}
if m := plainImageRegex.FindStringSubmatch(readmeContent); len(m) > 0 && strings.HasPrefix(strings.ToLower(m[0]), "http") {
return strings.TrimSpace(m[0])
}
return ""
}
// getHuggingFaceAvatarURL returns the HF avatar URL for a user, or "".
func getHuggingFaceAvatarURL(author string) string {
if author == "" {
return ""
}
userURL := fmt.Sprintf("https://huggingface.co/api/users/%s/overview", author)
resp, err := http.Get(userURL)
if err != nil {
return ""
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return ""
}
body, err := io.ReadAll(resp.Body)
if err != nil {
return ""
}
var info map[string]any
if err := json.Unmarshal(body, &info); err != nil {
return ""
}
if v, ok := info["avatarUrl"].(string); ok && v != "" {
return v
}
if v, ok := info["avatar"].(string); ok && v != "" {
return v
}
return ""
}
// extractModelIcon extracts an icon URL from the README, falling back to the
// HuggingFace user avatar.
func extractModelIcon(model ProcessedModel) string {
if icon := extractIconFromReadme(model.ReadmeContent); icon != "" {
return icon
}
if model.Author != "" {
if avatar := getHuggingFaceAvatarURL(model.Author); avatar != "" {
return avatar
}
}
return ""
}

View File

@@ -6,7 +6,6 @@ import (
"fmt"
"os"
"strconv"
"strings"
"time"
hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
@@ -34,16 +33,9 @@ type ProcessedModel struct {
ReadmeContentPreview string `json:"readme_content_preview,omitempty"`
QuantizationPreferences []string `json:"quantization_preferences"`
ProcessingError string `json:"processing_error,omitempty"`
}
// SearchResult represents the complete result of searching and processing models
type SearchResult struct {
SearchTerm string `json:"search_term"`
Limit int `json:"limit"`
Quantization string `json:"quantization"`
TotalModelsFound int `json:"total_models_found"`
Models []ProcessedModel `json:"models"`
FormattedOutput string `json:"formatted_output"`
Tags []string `json:"tags,omitempty"`
License string `json:"license,omitempty"`
Icon string `json:"icon,omitempty"`
}
// AddedModelSummary represents a summary of models added to the gallery
@@ -60,19 +52,16 @@ type AddedModelSummary struct {
func main() {
startTime := time.Now()
// Check for synthetic mode
syntheticMode := os.Getenv("SYNTHETIC_MODE")
if syntheticMode == "true" || syntheticMode == "1" {
// Synthetic mode for local testing
if sm := os.Getenv("SYNTHETIC_MODE"); sm == "true" || sm == "1" {
fmt.Println("Running in SYNTHETIC MODE - generating random test data")
err := runSyntheticMode()
if err != nil {
if err := runSyntheticMode(); err != nil {
fmt.Fprintf(os.Stderr, "Error in synthetic mode: %v\n", err)
os.Exit(1)
}
return
}
// Get configuration from environment variables
searchTerm := os.Getenv("SEARCH_TERM")
if searchTerm == "" {
searchTerm = "GGUF"
@@ -80,7 +69,7 @@ func main() {
limitStr := os.Getenv("LIMIT")
if limitStr == "" {
limitStr = "5"
limitStr = "15"
}
limit, err := strconv.Atoi(limitStr)
if err != nil {
@@ -89,258 +78,197 @@ func main() {
}
quantization := os.Getenv("QUANTIZATION")
maxModels := os.Getenv("MAX_MODELS")
if maxModels == "" {
maxModels = "1"
if quantization == "" {
quantization = "Q4_K_M"
}
maxModelsInt, err := strconv.Atoi(maxModels)
maxModelsStr := os.Getenv("MAX_MODELS")
if maxModelsStr == "" {
maxModelsStr = "1"
}
maxModels, err := strconv.Atoi(maxModelsStr)
if err != nil {
fmt.Fprintf(os.Stderr, "Error parsing MAX_MODELS: %v\n", err)
os.Exit(1)
}
// Print configuration
fmt.Printf("Gallery Agent Configuration:\n")
fmt.Printf(" Search Term: %s\n", searchTerm)
fmt.Printf(" Limit: %d\n", limit)
fmt.Printf(" Quantization: %s\n", quantization)
fmt.Printf(" Max Models to Add: %d\n", maxModelsInt)
fmt.Printf(" Gallery Index Path: %s\n", os.Getenv("GALLERY_INDEX_PATH"))
fmt.Printf(" Max Models to Add: %d\n", maxModels)
fmt.Printf(" Gallery Index Path: %s\n", getGalleryIndexPath())
fmt.Println()
result, err := searchAndProcessModels(searchTerm, limit, quantization)
// Phase 1: load current gallery and query HuggingFace.
gallerySet, err := loadGalleryURLSet()
if err != nil {
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
fmt.Fprintf(os.Stderr, "Error loading gallery index: %v\n", err)
os.Exit(1)
}
fmt.Printf("Loaded %d existing gallery entries\n", len(gallerySet))
fmt.Println(result.FormattedOutput)
// Use AI agent to select the most interesting models
fmt.Println("Using AI agent to select the most interesting models...")
models, err := selectMostInterestingModels(context.Background(), result)
if err != nil {
fmt.Fprintf(os.Stderr, "Error in model selection: %v\n", err)
// Continue with original result if selection fails
models = result.Models
}
fmt.Print(models)
// Filter out models that already exist in the gallery
fmt.Println("Filtering out existing models...")
models, err = filterExistingModels(models)
if err != nil {
fmt.Fprintf(os.Stderr, "Error filtering existing models: %v\n", err)
os.Exit(1)
}
// Limit to maxModelsInt after filtering
if len(models) > maxModelsInt {
models = models[:maxModelsInt]
}
// Track added models for summary
var addedModelIDs []string
var addedModelURLs []string
// Generate YAML entries and append to gallery/index.yaml
if len(models) > 0 {
for _, model := range models {
addedModelIDs = append(addedModelIDs, model.ModelID)
// Generate Hugging Face URL for the model
modelURL := fmt.Sprintf("https://huggingface.co/%s", model.ModelID)
addedModelURLs = append(addedModelURLs, modelURL)
}
fmt.Println("Generating YAML entries for selected models...")
err = generateYAMLForModels(context.Background(), models)
if err != nil {
fmt.Fprintf(os.Stderr, "Error generating YAML entries: %v\n", err)
os.Exit(1)
}
} else {
fmt.Println("No new models to add to the gallery.")
}
// Create and write summary
processingTime := time.Since(startTime).String()
summary := AddedModelSummary{
SearchTerm: searchTerm,
TotalFound: result.TotalModelsFound,
ModelsAdded: len(addedModelIDs),
AddedModelIDs: addedModelIDs,
AddedModelURLs: addedModelURLs,
Quantization: quantization,
ProcessingTime: processingTime,
}
// Write summary to file
summaryData, err := json.MarshalIndent(summary, "", " ")
if err != nil {
fmt.Fprintf(os.Stderr, "Error marshaling summary: %v\n", err)
} else {
err = os.WriteFile("gallery-agent-summary.json", summaryData, 0644)
if err != nil {
fmt.Fprintf(os.Stderr, "Error writing summary file: %v\n", err)
} else {
fmt.Printf("Summary written to gallery-agent-summary.json\n")
}
}
}
func searchAndProcessModels(searchTerm string, limit int, quantization string) (*SearchResult, error) {
client := hfapi.NewClient()
var outputBuilder strings.Builder
fmt.Println("Searching for models...")
// Initialize the result struct
result := &SearchResult{
SearchTerm: searchTerm,
Limit: limit,
Quantization: quantization,
Models: []ProcessedModel{},
}
models, err := client.GetLatest(searchTerm, limit)
fmt.Println("Searching for trending models on HuggingFace...")
rawModels, err := client.GetTrending(searchTerm, limit)
if err != nil {
return nil, fmt.Errorf("failed to fetch models: %w", err)
fmt.Fprintf(os.Stderr, "Error fetching models: %v\n", err)
os.Exit(1)
}
fmt.Printf("Found %d trending models matching %q\n", len(rawModels), searchTerm)
totalFound := len(rawModels)
// Phase 2: drop anything already in the gallery *before* any expensive
// per-model work (GetModelDetails, README fetches, icon lookups).
fresh := rawModels[:0]
for _, m := range rawModels {
if modelAlreadyInGallery(gallerySet, m.ModelID) {
fmt.Printf("Skipping existing model: %s\n", m.ModelID)
continue
}
fresh = append(fresh, m)
}
fmt.Printf("%d candidates after gallery dedup\n", len(fresh))
// Phase 3: HuggingFace already returned these in trendingScore order —
// just cap to MAX_MODELS.
if len(fresh) > maxModels {
fresh = fresh[:maxModels]
}
if len(fresh) == 0 {
fmt.Println("No new models to add to the gallery.")
writeSummary(AddedModelSummary{
SearchTerm: searchTerm,
TotalFound: totalFound,
ModelsAdded: 0,
Quantization: quantization,
ProcessingTime: time.Since(startTime).String(),
})
return
}
fmt.Println("Models found:", len(models))
result.TotalModelsFound = len(models)
// Phase 4: fetch details and build ProcessedModel entries for survivors.
var processed []ProcessedModel
quantPrefs := []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K", "Q8_0"}
for _, m := range fresh {
fmt.Printf("Processing model: %s (downloads=%d)\n", m.ModelID, m.Downloads)
if len(models) == 0 {
outputBuilder.WriteString("No models found.\n")
result.FormattedOutput = outputBuilder.String()
return result, nil
}
outputBuilder.WriteString(fmt.Sprintf("Found %d models matching '%s':\n\n", len(models), searchTerm))
// Process each model
for i, model := range models {
outputBuilder.WriteString(fmt.Sprintf("%d. Processing Model: %s\n", i+1, model.ModelID))
outputBuilder.WriteString(fmt.Sprintf(" Author: %s\n", model.Author))
outputBuilder.WriteString(fmt.Sprintf(" Downloads: %d\n", model.Downloads))
outputBuilder.WriteString(fmt.Sprintf(" Last Modified: %s\n", model.LastModified))
// Initialize processed model struct
processedModel := ProcessedModel{
ModelID: model.ModelID,
Author: model.Author,
Downloads: model.Downloads,
LastModified: model.LastModified,
QuantizationPreferences: []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"},
pm := ProcessedModel{
ModelID: m.ModelID,
Author: m.Author,
Downloads: m.Downloads,
LastModified: m.LastModified,
QuantizationPreferences: quantPrefs,
}
// Get detailed model information
details, err := client.GetModelDetails(model.ModelID)
details, err := client.GetModelDetails(m.ModelID)
if err != nil {
errorMsg := fmt.Sprintf(" Error getting model details: %v\n", err)
outputBuilder.WriteString(errorMsg)
processedModel.ProcessingError = err.Error()
result.Models = append(result.Models, processedModel)
fmt.Printf(" Error getting model details: %v (skipping)\n", err)
continue
}
// Define quantization preferences (in order of preference)
quantizationPreferences := []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"}
preferred := hfapi.FindPreferredModelFile(details.Files, quantPrefs)
if preferred == nil {
fmt.Printf(" No GGUF file matching %v — skipping\n", quantPrefs)
continue
}
// Find preferred model file
preferredModelFile := hfapi.FindPreferredModelFile(details.Files, quantizationPreferences)
// Process files
processedFiles := make([]ProcessedModelFile, len(details.Files))
for j, file := range details.Files {
pm.Files = make([]ProcessedModelFile, len(details.Files))
for j, f := range details.Files {
fileType := "other"
if file.IsReadme {
if f.IsReadme {
fileType = "readme"
} else if preferredModelFile != nil && file.Path == preferredModelFile.Path {
} else if f.Path == preferred.Path {
fileType = "model"
}
processedFiles[j] = ProcessedModelFile{
Path: file.Path,
Size: file.Size,
SHA256: file.SHA256,
IsReadme: file.IsReadme,
pm.Files[j] = ProcessedModelFile{
Path: f.Path,
Size: f.Size,
SHA256: f.SHA256,
IsReadme: f.IsReadme,
FileType: fileType,
}
}
processedModel.Files = processedFiles
// Set preferred model file
if preferredModelFile != nil {
for _, file := range processedFiles {
if file.Path == preferredModelFile.Path {
processedModel.PreferredModelFile = &file
break
}
if f.Path == preferred.Path {
copyFile := pm.Files[j]
pm.PreferredModelFile = &copyFile
}
if f.IsReadme {
copyFile := pm.Files[j]
pm.ReadmeFile = &copyFile
}
}
// Print file information
outputBuilder.WriteString(fmt.Sprintf(" Files found: %d\n", len(details.Files)))
// Deterministic README resolution: follow base_model tag if set.
// Keep the raw (HTML-bearing) README around while we extract the
// icon, then strip it down to a plain-text description for the
// `description:` YAML field.
readme, err := resolveReadme(client, m.ModelID, m.Tags)
if err != nil {
fmt.Printf(" Warning: failed to fetch README: %v\n", err)
}
pm.ReadmeContent = readme
if preferredModelFile != nil {
outputBuilder.WriteString(fmt.Sprintf(" Preferred Model File: %s (SHA256: %s)\n",
preferredModelFile.Path,
preferredModelFile.SHA256))
} else {
outputBuilder.WriteString(fmt.Sprintf(" No model file found with quantization preferences: %v\n", quantizationPreferences))
pm.License = licenseFromTags(m.Tags)
pm.Tags = curatedTags(m.Tags)
pm.Icon = extractModelIcon(pm)
if pm.ReadmeContent != "" {
pm.ReadmeContent = extractDescription(pm.ReadmeContent)
pm.ReadmeContentPreview = truncateString(pm.ReadmeContent, 200)
}
if details.ReadmeFile != nil {
outputBuilder.WriteString(fmt.Sprintf(" README File: %s\n", details.ReadmeFile.Path))
// Find and set readme file
for _, file := range processedFiles {
if file.IsReadme {
processedModel.ReadmeFile = &file
break
}
}
fmt.Println("Getting real readme for", model.ModelID, "waiting...")
// Use agent to get the real readme and prepare the model description
readmeContent, err := getRealReadme(context.Background(), model.ModelID)
if err == nil {
processedModel.ReadmeContent = readmeContent
processedModel.ReadmeContentPreview = truncateString(readmeContent, 200)
outputBuilder.WriteString(fmt.Sprintf(" README Content Preview: %s\n",
processedModel.ReadmeContentPreview))
} else {
continue
}
fmt.Println("Real readme got", readmeContent)
// Get README content
// readmeContent, err := client.GetReadmeContent(model.ModelID, details.ReadmeFile.Path)
// if err == nil {
// processedModel.ReadmeContent = readmeContent
// processedModel.ReadmeContentPreview = truncateString(readmeContent, 200)
// outputBuilder.WriteString(fmt.Sprintf(" README Content Preview: %s\n",
// processedModel.ReadmeContentPreview))
// }
}
// Print all files with their checksums
outputBuilder.WriteString(" All Files:\n")
for _, file := range processedFiles {
outputBuilder.WriteString(fmt.Sprintf(" - %s (%s, %d bytes", file.Path, file.FileType, file.Size))
if file.SHA256 != "" {
outputBuilder.WriteString(fmt.Sprintf(", SHA256: %s", file.SHA256))
}
outputBuilder.WriteString(")\n")
}
outputBuilder.WriteString("\n")
result.Models = append(result.Models, processedModel)
fmt.Printf(" License: %s, Tags: %v, Icon: %s\n", pm.License, pm.Tags, pm.Icon)
processed = append(processed, pm)
}
result.FormattedOutput = outputBuilder.String()
return result, nil
if len(processed) == 0 {
fmt.Println("No processable models after detail fetch.")
writeSummary(AddedModelSummary{
SearchTerm: searchTerm,
TotalFound: totalFound,
ModelsAdded: 0,
Quantization: quantization,
ProcessingTime: time.Since(startTime).String(),
})
return
}
// Phase 5: write YAML entries.
var addedIDs, addedURLs []string
for _, pm := range processed {
addedIDs = append(addedIDs, pm.ModelID)
addedURLs = append(addedURLs, "https://huggingface.co/"+pm.ModelID)
}
fmt.Println("Generating YAML entries for selected models...")
if err := generateYAMLForModels(context.Background(), processed, quantization); err != nil {
fmt.Fprintf(os.Stderr, "Error generating YAML entries: %v\n", err)
os.Exit(1)
}
writeSummary(AddedModelSummary{
SearchTerm: searchTerm,
TotalFound: totalFound,
ModelsAdded: len(addedIDs),
AddedModelIDs: addedIDs,
AddedModelURLs: addedURLs,
Quantization: quantization,
ProcessingTime: time.Since(startTime).String(),
})
}
func writeSummary(summary AddedModelSummary) {
data, err := json.MarshalIndent(summary, "", " ")
if err != nil {
fmt.Fprintf(os.Stderr, "Error marshaling summary: %v\n", err)
return
}
if err := os.WriteFile("gallery-agent-summary.json", data, 0644); err != nil {
fmt.Fprintf(os.Stderr, "Error writing summary file: %v\n", err)
return
}
fmt.Println("Summary written to gallery-agent-summary.json")
}
func truncateString(s string, maxLen int) string {
@@ -349,3 +277,4 @@ func truncateString(s string, maxLen int) string {
}
return s[:maxLen] + "..."
}

View File

@@ -3,7 +3,7 @@ package main
import (
"context"
"fmt"
"math/rand"
"math/rand/v2"
"strings"
"time"
)
@@ -13,11 +13,11 @@ func runSyntheticMode() error {
generator := NewSyntheticDataGenerator()
// Generate a random number of synthetic models (1-3)
numModels := generator.rand.Intn(3) + 1
numModels := generator.rand.IntN(3) + 1
fmt.Printf("Generating %d synthetic models for testing...\n", numModels)
var models []ProcessedModel
for i := 0; i < numModels; i++ {
for range numModels {
model := generator.GenerateProcessedModel()
models = append(models, model)
fmt.Printf("Generated synthetic model: %s\n", model.ModelID)
@@ -25,7 +25,7 @@ func runSyntheticMode() error {
// Generate YAML entries and append to gallery/index.yaml
fmt.Println("Generating YAML entries for synthetic models...")
err := generateYAMLForModels(context.Background(), models)
err := generateYAMLForModels(context.Background(), models, "Q4_K_M")
if err != nil {
return fmt.Errorf("error generating YAML entries: %w", err)
}
@@ -42,14 +42,14 @@ type SyntheticDataGenerator struct {
// NewSyntheticDataGenerator creates a new synthetic data generator
func NewSyntheticDataGenerator() *SyntheticDataGenerator {
return &SyntheticDataGenerator{
rand: rand.New(rand.NewSource(time.Now().UnixNano())),
rand: rand.New(rand.NewPCG(uint64(time.Now().UnixNano()), 0)),
}
}
// GenerateProcessedModelFile creates a synthetic ProcessedModelFile
func (g *SyntheticDataGenerator) GenerateProcessedModelFile() ProcessedModelFile {
fileTypes := []string{"model", "readme", "other"}
fileType := fileTypes[g.rand.Intn(len(fileTypes))]
fileType := fileTypes[g.rand.IntN(len(fileTypes))]
var path string
var isReadme bool
@@ -68,7 +68,7 @@ func (g *SyntheticDataGenerator) GenerateProcessedModelFile() ProcessedModelFile
return ProcessedModelFile{
Path: path,
Size: int64(g.rand.Intn(1000000000) + 1000000), // 1MB to 1GB
Size: int64(g.rand.IntN(1000000000) + 1000000), // 1MB to 1GB
SHA256: g.randomSHA256(),
IsReadme: isReadme,
FileType: fileType,
@@ -80,19 +80,19 @@ func (g *SyntheticDataGenerator) GenerateProcessedModel() ProcessedModel {
authors := []string{"microsoft", "meta", "google", "openai", "anthropic", "mistralai", "huggingface"}
modelNames := []string{"llama", "gpt", "claude", "mistral", "gemma", "phi", "qwen", "codellama"}
author := authors[g.rand.Intn(len(authors))]
modelName := modelNames[g.rand.Intn(len(modelNames))]
author := authors[g.rand.IntN(len(authors))]
modelName := modelNames[g.rand.IntN(len(modelNames))]
modelID := fmt.Sprintf("%s/%s-%s", author, modelName, g.randomString(6))
// Generate files
numFiles := g.rand.Intn(5) + 2 // 2-6 files
numFiles := g.rand.IntN(5) + 2 // 2-6 files
files := make([]ProcessedModelFile, numFiles)
// Ensure at least one model file and one readme
hasModelFile := false
hasReadme := false
for i := 0; i < numFiles; i++ {
for i := range numFiles {
files[i] = g.GenerateProcessedModelFile()
if files[i].FileType == "model" {
hasModelFile = true
@@ -138,10 +138,29 @@ func (g *SyntheticDataGenerator) GenerateProcessedModel() ProcessedModel {
readmeContent := g.generateReadmeContent(modelName, author)
// Generate sample metadata
licenses := []string{"apache-2.0", "mit", "llama2", "gpl-3.0", "bsd", ""}
license := licenses[g.rand.IntN(len(licenses))]
sampleTags := []string{"llm", "gguf", "gpu", "cpu", "text-to-text", "chat", "instruction-tuned"}
numTags := g.rand.IntN(4) + 3 // 3-6 tags
tags := make([]string, numTags)
for i := range numTags {
tags[i] = sampleTags[g.rand.IntN(len(sampleTags))]
}
// Remove duplicates
tags = g.removeDuplicates(tags)
// Optionally include icon (50% chance)
icon := ""
if g.rand.IntN(2) == 0 {
icon = fmt.Sprintf("https://cdn-avatars.huggingface.co/v1/production/uploads/%s.png", g.randomString(24))
}
return ProcessedModel{
ModelID: modelID,
Author: author,
Downloads: g.rand.Intn(1000000) + 1000,
Downloads: g.rand.IntN(1000000) + 1000,
LastModified: g.randomDate(),
Files: files,
PreferredModelFile: preferredModelFile,
@@ -150,6 +169,9 @@ func (g *SyntheticDataGenerator) GenerateProcessedModel() ProcessedModel {
ReadmeContentPreview: truncateString(readmeContent, 200),
QuantizationPreferences: []string{"Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"},
ProcessingError: "",
Tags: tags,
License: license,
Icon: icon,
}
}
@@ -158,7 +180,7 @@ func (g *SyntheticDataGenerator) randomString(length int) string {
const charset = "abcdefghijklmnopqrstuvwxyz0123456789"
b := make([]byte, length)
for i := range b {
b[i] = charset[g.rand.Intn(len(charset))]
b[i] = charset[g.rand.IntN(len(charset))]
}
return string(b)
}
@@ -167,18 +189,30 @@ func (g *SyntheticDataGenerator) randomSHA256() string {
const charset = "0123456789abcdef"
b := make([]byte, 64)
for i := range b {
b[i] = charset[g.rand.Intn(len(charset))]
b[i] = charset[g.rand.IntN(len(charset))]
}
return string(b)
}
func (g *SyntheticDataGenerator) randomDate() string {
now := time.Now()
daysAgo := g.rand.Intn(365) // Random date within last year
daysAgo := g.rand.IntN(365) // Random date within last year
pastDate := now.AddDate(0, 0, -daysAgo)
return pastDate.Format("2006-01-02T15:04:05.000Z")
}
func (g *SyntheticDataGenerator) removeDuplicates(slice []string) []string {
keys := make(map[string]bool)
result := []string{}
for _, item := range slice {
if !keys[item] {
keys[item] = true
result = append(result, item)
}
}
return result
}
func (g *SyntheticDataGenerator) generateReadmeContent(modelName, author string) string {
templates := []string{
fmt.Sprintf("# %s Model\n\nThis is a %s model developed by %s. It's designed for various natural language processing tasks including text generation, question answering, and conversation.\n\n## Features\n\n- High-quality text generation\n- Efficient inference\n- Multiple quantization options\n- Easy to use with LocalAI\n\n## Usage\n\nUse this model with LocalAI for various AI tasks.", strings.Title(modelName), modelName, author),
@@ -186,5 +220,5 @@ func (g *SyntheticDataGenerator) generateReadmeContent(modelName, author string)
fmt.Sprintf("# %s Language Model\n\nDeveloped by %s, this model represents state-of-the-art performance in natural language understanding and generation.\n\n## Key Features\n\n- Multilingual support\n- Context-aware responses\n- Efficient memory usage\n- Fast inference speed\n\n## Applications\n\n- Chatbots and virtual assistants\n- Content generation\n- Code completion\n- Educational tools", strings.Title(modelName), author),
}
return templates[g.rand.Intn(len(templates))]
return templates[g.rand.IntN(len(templates))]
}

View File

@@ -1,46 +0,0 @@
package main
import (
"fmt"
hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
openai "github.com/sashabaranov/go-openai"
jsonschema "github.com/sashabaranov/go-openai/jsonschema"
)
// Get repository README from HF
type HFReadmeTool struct {
client *hfapi.Client
}
func (s *HFReadmeTool) Execute(args map[string]any) (string, error) {
q, ok := args["repository"].(string)
if !ok {
return "", fmt.Errorf("no query")
}
readme, err := s.client.GetReadmeContent(q, "README.md")
if err != nil {
return "", err
}
return readme, nil
}
func (s *HFReadmeTool) Tool() openai.Tool {
return openai.Tool{
Type: openai.ToolTypeFunction,
Function: &openai.FunctionDefinition{
Name: "hf_readme",
Description: "A tool to get the README content of a huggingface repository",
Parameters: jsonschema.Definition{
Type: jsonschema.Object,
Properties: map[string]jsonschema.Definition{
"repository": {
Type: jsonschema.String,
Description: "The huggingface repository to get the README content of",
},
},
Required: []string{"repository"},
},
},
}
}

View File

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,5 @@
---
name: 'build python backend container images (reusable)'
name: 'build backend container images (reusable)'
on:
workflow_call:
@@ -53,6 +53,26 @@ on:
description: 'Skip drivers'
default: 'false'
type: string
ubuntu-version:
description: 'Ubuntu version'
required: false
default: '2204'
type: string
amdgpu-targets:
description: 'AMD GPU targets for ROCm/HIP builds'
required: false
default: ''
type: string
base-image-prebuilt:
description: |
Optional reference to a prebuilt accel/lang base image
(quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]). When
set, the backend Dockerfile FROMs this image instead of running
an inline bootstrap. See .github/workflows/base_images.yml and
.agents/ci-caching.md.
required: false
default: ''
type: string
secrets:
dockerUsername:
required: false
@@ -70,6 +90,14 @@ jobs:
quay_username: ${{ secrets.quayUsername }}
steps:
- name: Checkout
uses: actions/checkout@v6
with:
submodules: true
- name: Configure apt mirror on runner
id: apt_mirror
uses: ./.github/actions/configure-apt-mirror
- name: Free Disk Space (Ubuntu)
if: inputs.runs-on == 'ubuntu-latest'
@@ -87,18 +115,6 @@ jobs:
docker-images: true
swap-storage: true
- name: Force Install GIT latest
run: |
sudo apt-get update \
&& sudo apt-get install -y software-properties-common \
&& sudo apt-get update \
&& sudo add-apt-repository -y ppa:git-core/ppa \
&& sudo apt-get update \
&& sudo apt-get install -y git
- name: Checkout
uses: actions/checkout@v5
- name: Release space from worker
if: inputs.runs-on == 'ubuntu-latest'
run: |
@@ -144,7 +160,7 @@ jobs:
- name: Docker meta
id: meta
if: github.event_name != 'pull_request'
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
quay.io/go-skynet/local-ai-backends
@@ -160,7 +176,7 @@ jobs:
- name: Docker meta for PR
id: meta_pull_request
if: github.event_name == 'pull_request'
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
quay.io/go-skynet/ci-tests
@@ -183,21 +199,30 @@ jobs:
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
username: ${{ secrets.dockerUsername }}
password: ${{ secrets.dockerPassword }}
- name: Login to Quay.io
if: ${{ env.quay_username != '' }}
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
# Weekly cache-buster for the per-backend `make` step. Most Python
# backends list unpinned deps (torch, transformers, vllm, ...), so a
# warm cache freezes upstream versions indefinitely. Rolling this
# weekly forces a re-resolve of the install layer at most once per
# week, picking up newer wheels without a full cold rebuild.
- name: Compute deps refresh key
id: deps_refresh
run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Build and push
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
if: github.event_name != 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
@@ -208,16 +233,23 @@ jobs:
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- name: Build and push (PR)
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
if: github.event_name == 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
@@ -228,9 +260,15 @@ jobs:
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
push: ${{ env.quay_username != '' }}
tags: ${{ steps.meta_pull_request.outputs.tags }}

View File

@@ -48,9 +48,16 @@ jobs:
strategy:
matrix:
go-version: ['${{ inputs.go-version }}']
env:
# Keep the brew Cellar stable across cache restores. Without these,
# `brew install` would auto-update brew itself and re-link formulas,
# mutating the very paths the cache just restored.
HOMEBREW_NO_AUTO_UPDATE: '1'
HOMEBREW_NO_INSTALL_CLEANUP: '1'
HOMEBREW_NO_ANALYTICS: '1'
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
@@ -58,23 +65,143 @@ jobs:
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
# Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
# Shared across every darwin matrix entry — first job in a run warms
# it, the rest hit warm.
cache: true
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
# ---- Homebrew cache ----
# macOS runners have no Docker daemon, so the BuildKit registry cache used
# for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
# We cache the brew downloads + Cellar entries for the formulas we install
# below. Read on every run, write only on master/tag pushes — same policy
# as the Linux registry cache.
- name: Restore Homebrew cache
id: brew-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
# ccache is always installed (used by the llama-cpp variant build) so
# the brew cache content stays stable across every backend in the
# matrix — they all share one cache key.
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache
- name: Save Homebrew cache
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/Homebrew/downloads
/opt/homebrew/Cellar/protobuf
/opt/homebrew/Cellar/grpc
/opt/homebrew/Cellar/protoc-gen-go
/opt/homebrew/Cellar/protoc-gen-go-grpc
/opt/homebrew/Cellar/libomp
/opt/homebrew/Cellar/llvm
/opt/homebrew/Cellar/ccache
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
# ---- ccache for llama.cpp CMake builds ----
# Three CMake variants (fallback, grpc, rpc-server) compile the same
# llama.cpp source tree with overlapping flags — ccache dedupes object
# files across them. Key on the pinned LLAMA_VERSION so a pin bump
# invalidates cleanly; restore-keys fall back to the latest entry for the
# same pin so unchanged TUs stay warm even when the cache is fresh.
- name: Compute llama.cpp version
if: inputs.backend == 'llama-cpp'
id: llama-version
run: |
version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
echo "version=${version}" >> "$GITHUB_OUTPUT"
- name: Restore ccache
if: inputs.backend == 'llama-cpp'
id: ccache-cache
uses: actions/cache/restore@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
restore-keys: |
ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
- name: Configure ccache
if: inputs.backend == 'llama-cpp'
run: |
mkdir -p "$HOME/Library/Caches/ccache"
ccache -M 2G
ccache -z
# llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
{
echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
} >> "$GITHUB_ENV"
# ---- Python wheel cache (uv + pip) ----
# Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
# ISO-week segment of the cache key forces at most one cold rebuild per
# backend per week, automatically picking up newer wheels for unpinned
# deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
# recent build of the same backend so off-week PRs still hit warm.
- name: Compute weekly cache bucket
if: inputs.lang == 'python'
id: weekly
run: echo "bucket=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
- name: Restore Python wheel cache
if: inputs.lang == 'python'
id: pyenv-cache
uses: actions/cache/restore@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
restore-keys: |
pyenv-darwin-${{ inputs.backend }}-
- name: Build ${{ inputs.backend }}-darwin
run: |
make protogen-go
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
- name: ccache stats
if: inputs.backend == 'llama-cpp'
run: ccache -s
- name: Save ccache
if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
uses: actions/cache/save@v4
with:
path: ~/Library/Caches/ccache
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
- name: Save Python wheel cache
if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
uses: actions/cache/save@v4
with:
path: |
~/Library/Caches/pip
~/Library/Caches/uv
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
- name: Upload ${{ inputs.backend }}.tar
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: ${{ inputs.backend }}-tar
path: backend-images/${{ inputs.backend }}.tar
@@ -85,7 +212,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download ${{ inputs.backend }}.tar
uses: actions/download-artifact@v6
uses: actions/download-artifact@v8
with:
name: ${{ inputs.backend }}-tar
path: .
@@ -105,7 +232,7 @@ jobs:
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
localai/localai-backends
@@ -119,7 +246,7 @@ jobs:
- name: Docker meta
id: quaymeta
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
quay.io/go-skynet/local-ai-backends

View File

@@ -13,11 +13,13 @@ jobs:
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
matrix-darwin: ${{ steps.set-matrix.outputs.matrix-darwin }}
bases-matrix: ${{ steps.set-matrix.outputs.bases-matrix }}
has-backends: ${{ steps.set-matrix.outputs.has-backends }}
has-backends-darwin: ${{ steps.set-matrix.outputs.has-backends-darwin }}
has-bases: ${{ steps.set-matrix.outputs.has-bases }}
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Setup Bun
uses: oven-sh/setup-bun@v2
@@ -27,7 +29,8 @@ jobs:
bun add js-yaml
bun add @octokit/core
# filters the matrix in backend.yml
# Filters the matrix from backend.yml against this PR's changed files
# AND derives the deduplicated bases-matrix consumed by build-bases.
- name: Filter matrix for changed backends
id: set-matrix
env:
@@ -35,10 +38,34 @@ jobs:
GITHUB_EVENT_PATH: ${{ github.event_path }}
run: bun run scripts/changed-backends.js
backend-jobs:
build-bases:
needs: generate-matrix
if: needs.generate-matrix.outputs.has-bases == 'true'
strategy:
fail-fast: false
matrix: ${{ fromJSON(needs.generate-matrix.outputs.bases-matrix) }}
uses: ./.github/workflows/base_images.yml
with:
lang: ${{ matrix.lang }}
base-image: ${{ matrix.base-image }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
ubuntu-version: ${{ matrix.ubuntu-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
tag-stem: ${{ matrix.tag-stem }}
skip-drivers: ${{ matrix.skip-drivers }}
secrets:
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
backend-jobs:
needs: [generate-matrix, build-bases]
uses: ./.github/workflows/backend_build.yml
if: needs.generate-matrix.outputs.has-backends == 'true'
if: |
always() && needs.generate-matrix.outputs.has-backends == 'true' &&
(needs.build-bases.result == 'success' || needs.build-bases.result == 'skipped')
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
@@ -52,12 +79,19 @@ jobs:
dockerfile: ${{ matrix.dockerfile }}
skip-drivers: ${{ matrix.skip-drivers }}
context: ${{ matrix.context }}
ubuntu-version: ${{ matrix.ubuntu-version }}
amdgpu-targets: ${{ matrix.amdgpu-targets || 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201' }}
# The script annotates each filtered Python entry with the prebuilt
# base ref it should consume; non-Python entries get '' and run their
# own inline bootstrap.
base-image-prebuilt: ${{ matrix.base-image-prebuilt || '' }}
secrets:
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
fail-fast: true
matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
backend-jobs-darwin:
needs: generate-matrix
uses: ./.github/workflows/backend_build_darwin.yml
@@ -69,7 +103,7 @@ jobs:
tag-suffix: ${{ matrix.tag-suffix }}
lang: ${{ matrix.lang || 'python' }}
use-pip: ${{ matrix.backend == 'diffusers' }}
runs-on: "macOS-14"
runs-on: "macos-latest"
secrets:
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}

152
.github/workflows/base_images.yml vendored Normal file
View File

@@ -0,0 +1,152 @@
---
name: 'build base image (reusable)'
# Builds and pushes one (lang, accel, arch, ubuntu, cuda) base image flavour
# to quay.io/go-skynet/ci-cache:base-image-<stem>[-pr<N>]. Consumed by
# backend builds via the BASE_IMAGE_PREBUILT build-arg. PR builds tag with
# `-pr${PR_NUMBER}` so the same PR's backend matrix can opt-in to the
# freshly-built base; master builds overwrite the unsuffixed tag for
# downstream consumption. The image lives in the same ci-cache repo as the
# buildkit cache (under a `base-image-` prefix that doesn't collide with
# the `base-<stem>` cache prefix), so no separate quay repo + grant is
# needed. See .agents/ci-caching.md for the full tagging scheme.
on:
workflow_call:
inputs:
lang:
description: 'Language toolchain (matches .docker/bases/Dockerfile.<lang>)'
required: true
type: string
base-image:
description: 'Upstream base image (ubuntu:24.04, rocm/dev-ubuntu-24.04:..., etc.)'
required: true
type: string
build-type:
description: 'BUILD_TYPE: empty for CPU, cublas, hipblas, vulkan, l4t, ...'
default: ''
type: string
cuda-major-version:
description: 'CUDA major version (only meaningful for cublas/l4t)'
default: '12'
type: string
cuda-minor-version:
description: 'CUDA minor version'
default: '9'
type: string
ubuntu-version:
description: 'Ubuntu version code (2204, 2404)'
default: '2404'
type: string
platforms:
description: 'Single platform per call (linux/amd64 or linux/arm64)'
required: true
type: string
runs-on:
description: 'Runner label'
required: true
type: string
tag-stem:
description: 'Stable portion of the image tag (e.g. python-cpu-amd64-2404)'
required: true
type: string
skip-drivers:
description: 'Pass-through to the base Dockerfile'
default: 'false'
type: string
secrets:
quayUsername:
required: false
quayPassword:
required: false
outputs:
image-ref:
description: 'Full image reference of the built base'
value: ${{ jobs.base-build.outputs.image-ref }}
jobs:
base-build:
runs-on: ${{ inputs.runs-on }}
env:
quay_username: ${{ secrets.quayUsername }}
outputs:
image-ref: ${{ steps.compute_ref.outputs.ref }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Configure apt mirror on runner
id: apt_mirror
uses: ./.github/actions/configure-apt-mirror
- name: Free Disk Space (Ubuntu)
if: inputs.runs-on == 'ubuntu-latest'
uses: jlumbroso/free-disk-space@main
with:
tool-cache: true
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: true
swap-storage: true
- name: Compute image ref
id: compute_ref
run: |
stem='${{ inputs.tag-stem }}'
if [ "${{ github.event_name }}" = "pull_request" ]; then
tag="${stem}-pr${{ github.event.number }}"
else
tag="${stem}"
fi
echo "tag=${tag}" >> "$GITHUB_OUTPUT"
# Published into the existing ci-cache repo (the CI robot already
# has write access there) under a distinct `base-image-` prefix so
# the OCI image tags coexist with the buildkit cache tags
# (`base-<stem>`, `cache<tag-suffix>`, `cache-localai<tag-suffix>`).
echo "ref=quay.io/go-skynet/ci-cache:base-image-${tag}" >> "$GITHUB_OUTPUT"
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Login to Quay.io
if: ${{ env.quay_username != '' }}
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
- name: Build and push base image
uses: docker/build-push-action@v7
with:
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./.docker/bases/Dockerfile.${{ inputs.lang }}
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
platforms: ${{ inputs.platforms }}
# Push on PRs as well (if creds present) so the PR's backend matrix
# can opt-in to the freshly-built base via -pr${N} tag.
push: ${{ env.quay_username != '' }}
tags: ${{ steps.compute_ref.outputs.ref }}
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }},mode=max,ignore-error=true
- name: job summary
run: |
echo "Built base image: ${{ steps.compute_ref.outputs.ref }}" >> "$GITHUB_STEP_SUMMARY"

View File

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -25,7 +25,7 @@ jobs:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -37,7 +37,7 @@ jobs:
make build-launcher-darwin
ls -liah dist
- name: Upload macOS launcher artifacts
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: launcher-macos
path: dist/
@@ -47,9 +47,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Set up Go
uses: actions/setup-go@v5
with:
@@ -60,7 +62,7 @@ jobs:
sudo apt-get install golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
make build-launcher-linux
- name: Upload Linux launcher artifacts
uses: actions/upload-artifact@v5
uses: actions/upload-artifact@v7
with:
name: launcher-linux
path: local-ai-launcher-linux.tar.xz

View File

@@ -0,0 +1,48 @@
name: Bump inference defaults
on:
schedule:
# Run daily at 06:00 UTC
- cron: '0 6 * * *'
workflow_dispatch: # Allow manual trigger
permissions:
contents: write
pull-requests: write
jobs:
bump:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
- name: Re-fetch inference defaults
run: make generate-force
- name: Check for changes
id: diff
run: |
if git diff --quiet core/config/inference_defaults.json; then
echo "changed=false" >> "$GITHUB_OUTPUT"
else
echo "changed=true" >> "$GITHUB_OUTPUT"
fi
- name: Create Pull Request
if: steps.diff.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v8
with:
commit-message: "chore: bump inference defaults from unsloth"
title: "chore: bump inference defaults from unsloth"
body: |
Auto-generated update of `core/config/inference_defaults.json` from
[unsloth's inference_defaults.json](https://github.com/unslothai/unsloth/blob/main/studio/backend/assets/configs/inference_defaults.json).
This PR was created automatically by the `bump-inference-defaults` workflow.
branch: chore/bump-inference-defaults
delete-branch: true
labels: automated

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
bump-backends:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
matrix:
@@ -13,14 +14,18 @@ jobs:
variable: "LLAMA_VERSION"
branch: "master"
file: "backend/cpp/llama-cpp/Makefile"
- repository: "ikawrakow/ik_llama.cpp"
variable: "IK_LLAMA_VERSION"
branch: "main"
file: "backend/cpp/ik-llama-cpp/Makefile"
- repository: "TheTom/llama-cpp-turboquant"
variable: "TURBOQUANT_VERSION"
branch: "feature/turboquant-kv-cache"
file: "backend/cpp/turboquant/Makefile"
- repository: "ggml-org/whisper.cpp"
variable: "WHISPER_CPP_VERSION"
branch: "master"
file: "backend/go/whisper/Makefile"
- repository: "PABannier/bark.cpp"
variable: "BARKCPP_VERSION"
branch: "main"
file: "Makefile"
- repository: "leejet/stable-diffusion.cpp"
variable: "STABLEDIFFUSION_GGML_VERSION"
branch: "master"
@@ -29,9 +34,29 @@ jobs:
variable: "PIPER_VERSION"
branch: "master"
file: "backend/go/piper/Makefile"
- repository: "antirez/voxtral.c"
variable: "VOXTRAL_VERSION"
branch: "main"
file: "backend/go/voxtral/Makefile"
- repository: "ace-step/acestep.cpp"
variable: "ACESTEP_CPP_VERSION"
branch: "master"
file: "backend/go/acestep-cpp/Makefile"
- repository: "PABannier/sam3.cpp"
variable: "SAM3_VERSION"
branch: "main"
file: "backend/go/sam3-cpp/Makefile"
- repository: "predict-woo/qwen3-tts.cpp"
variable: "QWEN3TTS_CPP_VERSION"
branch: "main"
file: "backend/go/qwen3-tts-cpp/Makefile"
- repository: "mudler/vibevoice.cpp"
variable: "VIBEVOICE_CPP_VERSION"
branch: "master"
file: "backend/go/vibevoice-cpp/Makefile"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Bump dependencies 🔧
id: bump
run: |
@@ -49,7 +74,7 @@ jobs:
rm -rfv ${{ matrix.variable }}_message.txt
rm -rfv ${{ matrix.variable }}_commit.txt
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
@@ -59,5 +84,37 @@ jobs:
body: ${{ steps.bump.outputs.message }}
signoff: true
bump-vllm-wheel:
# vLLM's cu130 wheel comes from a per-tag index URL (no /latest/ alias),
# so the cublas13 requirements file pins both a URL segment and a version
# constraint. bump_deps.sh handles git-sha-in-Makefile only — this job
# rewrites both values atomically when a new vLLM stable tag ships.
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Bump vLLM cu130 wheel pin 🔧
id: bump
run: |
bash .github/bump_vllm_wheel.sh vllm-project/vllm backend/python/vllm/requirements-cublas13-after.txt VLLM_VERSION
{
echo 'message<<EOF'
cat "VLLM_VERSION_message.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
{
echo 'commit<<EOF'
cat "VLLM_VERSION_commit.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
rm -rfv VLLM_VERSION_message.txt VLLM_VERSION_commit.txt
- name: Create Pull Request
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
commit-message: ':arrow_up: Update vllm-project/vllm cu130 wheel'
title: 'chore: :arrow_up: Update vllm-project/vllm cu130 wheel to `${{ steps.bump.outputs.commit }}`'
branch: "update/VLLM_VERSION"
body: ${{ steps.bump.outputs.message }}
signoff: true

View File

@@ -5,6 +5,7 @@ on:
workflow_dispatch:
jobs:
bump-docs:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
matrix:
@@ -12,12 +13,12 @@ jobs:
- repository: "mudler/LocalAI"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Bump dependencies 🔧
run: |
bash .github/bump_docs.sh ${{ matrix.repository }}
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI

View File

@@ -5,17 +5,12 @@ on:
workflow_dispatch:
jobs:
checksum_check:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Force Install GIT latest
run: |
sudo apt-get update \
&& sudo apt-get install -y software-properties-common \
&& sudo apt-get update \
&& sudo add-apt-repository -y ppa:git-core/ppa \
&& sudo apt-get update \
&& sudo apt-get install -y git
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Install dependencies
run: |
sudo apt-get update
@@ -35,7 +30,7 @@ jobs:
sudo chmod 777 /hf_cache
bash .github/checksum_checker.sh gallery/index.yaml
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI

View File

@@ -12,10 +12,11 @@ concurrency:
jobs:
build-linux:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- uses: actions/setup-go@v5
@@ -33,7 +34,7 @@ jobs:
run: |
CGO_ENABLED=0 make build
- name: rm
uses: appleboy/ssh-action@v1.2.3
uses: appleboy/ssh-action@v1.2.5
with:
host: ${{ secrets.EXPLORER_SSH_HOST }}
username: ${{ secrets.EXPLORER_SSH_USERNAME }}
@@ -53,7 +54,7 @@ jobs:
rm: true
target: ./local-ai
- name: restarting
uses: appleboy/ssh-action@v1.2.3
uses: appleboy/ssh-action@v1.2.5
with:
host: ${{ secrets.EXPLORER_SSH_HOST }}
username: ${{ secrets.EXPLORER_SSH_USERNAME }}

View File

@@ -9,18 +9,18 @@ permissions:
jobs:
dependabot:
if: github.repository == 'mudler/LocalAI' && github.actor == 'dependabot[bot]'
runs-on: ubuntu-latest
if: ${{ github.actor == 'dependabot[bot]' }}
steps:
- name: Dependabot metadata
id: metadata
uses: dependabot/fetch-metadata@v2.4.0
uses: dependabot/fetch-metadata@v2.5.0
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
skip-commit-verification: true
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Approve a PR if not already approved
run: |

View File

@@ -10,11 +10,11 @@ permissions:
actions: write # to dispatch publish workflow
jobs:
dependabot:
if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && contains(github.event.pull_request.title, 'chore:')
runs-on: ubuntu-latest
if: ${{ github.actor == 'localai-bot' && !contains(github.event.pull_request.title, 'chore(model gallery):') }}
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Approve a PR if not already approved
run: |

View File

@@ -10,12 +10,12 @@ permissions:
jobs:
notify-discord:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
env:
MODEL_NAME: gemma-3-12b-it-qat
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
with:
fetch-depth: 0 # needed to checkout all branches for this Action to work
ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes
@@ -90,12 +90,12 @@ jobs:
connect-timeout-seconds: 180
limit-access-to-actor: true
notify-twitter:
if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
env:
MODEL_NAME: gemma-3-12b-it-qat
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
with:
fetch-depth: 0 # needed to checkout all branches for this Action to work
ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes

View File

@@ -2,7 +2,7 @@ name: Gallery Agent
on:
schedule:
- cron: '0 */3 * * *' # Run every 4 hours
- cron: '0 */12 * * *' # Run every 4 hours
workflow_dispatch:
inputs:
search_term:
@@ -27,10 +27,11 @@ on:
type: string
jobs:
gallery-agent:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
token: ${{ secrets.GITHUB_TOKEN }}
@@ -38,20 +39,100 @@ jobs:
uses: actions/setup-go@v5
with:
go-version: '1.21'
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Process gallery-agent PR commands
env:
GH_TOKEN: ${{ secrets.UPDATE_BOT_TOKEN }}
REPO: ${{ github.repository }}
SEARCH: 'gallery agent in:title'
run: |
# Walk gallery-agent PRs and act on maintainer comments:
# /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
# /gallery-agent recreate → close without label (next run may repropose)
# Only comments from OWNER / MEMBER / COLLABORATOR are honored so
# random users can't drive the bot.
#
# We scan both open PRs AND recently-closed PRs that don't already
# carry the blacklist label. This covers the common flow where a
# maintainer writes /gallery-agent blacklist and immediately clicks
# Close — without this, the next scheduled run wouldn't see the
# command (PR is already closed) and would repropose the model.
gh label create gallery-agent/blacklisted \
--repo "$REPO" --color ededed \
--description "gallery-agent must not repropose this model" 2>/dev/null || true
prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
--json number --jq '.[].number')
# Closed PRs from the last 14 days that don't yet have the blacklist label.
# Bounded window keeps the scan cheap while covering late-applied commands.
since=$(date -u -d '14 days ago' +%Y-%m-%d)
prs_closed=$(gh pr list --repo "$REPO" --state closed \
--search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
--json number --jq '.[].number')
prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
for pr in $prs; do
state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
--jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
echo "PR #$pr: blacklist command found (state=$state)"
gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
if [ "$state" = "OPEN" ]; then
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
fi
elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
echo "PR #$pr: recreate command found"
gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
fi
done
- name: Collect skip URLs for the gallery agent
id: open_prs
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
REPO: ${{ github.repository }}
SEARCH: 'gallery agent in:title'
run: |
# Skip set =
# URLs from any open gallery-agent PR (avoid duplicate PRs for the same model while one is pending)
# + URLs from closed PRs carrying the `gallery-agent/blacklisted` label (hard blacklist)
# Plain-closed PRs without the label are ignored — closing a PR is
# not by itself a "never propose again" signal; maintainers must
# opt in via the /gallery-agent blacklist comment command.
urls_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
--json body --jq '[.[].body] | join("\n")' \
| grep -oE 'https://huggingface\.co/[^ )]+' || true)
urls_blacklist=$(gh pr list --repo "$REPO" --state closed --search "$SEARCH" \
--label gallery-agent/blacklisted \
--json body --jq '[.[].body] | join("\n")' \
| grep -oE 'https://huggingface\.co/[^ )]+' || true)
urls=$(printf '%s\n%s\n' "$urls_open" "$urls_blacklist" | sort -u | sed '/^$/d')
echo "Skip URLs:"
echo "$urls"
{
echo "urls<<EOF"
echo "$urls"
echo "EOF"
} >> "$GITHUB_OUTPUT"
- name: Run gallery agent
env:
OPENAI_MODEL: ${{ secrets.OPENAI_MODEL }}
OPENAI_KEY: ${{ secrets.OPENAI_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
SEARCH_TERM: ${{ github.event.inputs.search_term || 'GGUF' }}
LIMIT: ${{ github.event.inputs.limit || '15' }}
QUANTIZATION: ${{ github.event.inputs.quantization || 'Q4_K_M' }}
MAX_MODELS: ${{ github.event.inputs.max_models || '1' }}
EXTRA_SKIP_URLS: ${{ steps.open_prs.outputs.urls }}
run: |
export GALLERY_INDEX_PATH=$PWD/gallery/index.yaml
go run .github/gallery-agent
go run ./.github/gallery-agent
- name: Check for changes
id: check_changes
@@ -69,28 +150,28 @@ jobs:
id: read_summary
if: steps.check_changes.outputs.changes == 'true'
run: |
if [ -f ".github/gallery-agent/gallery-agent-summary.json" ]; then
if [ -f "./gallery-agent-summary.json" ]; then
echo "summary_exists=true" >> $GITHUB_OUTPUT
# Extract summary data using jq
echo "search_term=$(jq -r '.search_term' .github/gallery-agent/gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "total_found=$(jq -r '.total_found' .github/gallery-agent/gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "models_added=$(jq -r '.models_added' .github/gallery-agent/gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "quantization=$(jq -r '.quantization' .github/gallery-agent/gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "processing_time=$(jq -r '.processing_time' .github/gallery-agent/gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "search_term=$(jq -r '.search_term' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "total_found=$(jq -r '.total_found' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "models_added=$(jq -r '.models_added' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "quantization=$(jq -r '.quantization' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
echo "processing_time=$(jq -r '.processing_time' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
# Create a formatted list of added models with URLs
added_models=$(jq -r 'range(0; .added_model_ids | length) as $i | "- [\(.added_model_ids[$i])](\(.added_model_urls[$i]))"' .github/gallery-agent/gallery-agent-summary.json | tr '\n' '\n')
added_models=$(jq -r 'range(0; .added_model_ids | length) as $i | "- [\(.added_model_ids[$i])](\(.added_model_urls[$i]))"' ./gallery-agent-summary.json | tr '\n' '\n')
echo "added_models<<EOF" >> $GITHUB_OUTPUT
echo "$added_models" >> $GITHUB_OUTPUT
echo "EOF" >> $GITHUB_OUTPUT
rm -f .github/gallery-agent/gallery-agent-summary.json
rm -f ./gallery-agent-summary.json
else
echo "summary_exists=false" >> $GITHUB_OUTPUT
fi
- name: Create Pull Request
if: steps.check_changes.outputs.changes == 'true'
uses: peter-evans/create-pull-request@v7
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
@@ -110,7 +191,21 @@ jobs:
**Added Models:**
${{ steps.read_summary.outputs.added_models || '- No models added' }}
### Bot commands
Maintainers (owner / member / collaborator) can control this PR
by leaving a comment with one of:
- `/gallery-agent recreate` — close this PR; the next scheduled
run will propose this model again (useful if the entry needs
to be regenerated with fresh metadata).
- `/gallery-agent blacklist` — close this PR and permanently
prevent the gallery agent from ever reproposing this model.
Plain "Close" (without a command) is treated as a no-op: the
model may be reproposed by a future run.
**Workflow Details:**
- Triggered by: `${{ github.event_name }}`
- Run ID: `${{ github.run_id }}`

View File

@@ -1,95 +0,0 @@
name: 'generate and publish GRPC docker caches'
on:
workflow_dispatch:
schedule:
# daily at midnight
- cron: '0 0 * * *'
concurrency:
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
generate_caches:
strategy:
matrix:
include:
- grpc-base-image: ubuntu:22.04
runs-on: 'ubuntu-latest'
platforms: 'linux/amd64,linux/arm64'
runs-on: ${{matrix.runs-on}}
steps:
- name: Release space from worker
if: matrix.runs-on == 'ubuntu-latest'
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get remove -y microsoft-edge-stable || true
sudo apt-get remove -y firefox || true
sudo apt-get remove -y powershell || true
sudo apt-get remove -y r-base-core || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
sudo rm -rf /usr/share/dotnet || true
sudo rm -rf /opt/ghc || true
sudo rm -rf "/usr/local/share/boost" || true
sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
df -h
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v5
- name: Cache GRPC
uses: docker/build-push-action@v6
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
build-args: |
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
context: .
file: ./Dockerfile
cache-to: type=gha,ignore-error=true
cache-from: type=gha
target: grpc
platforms: ${{ matrix.platforms }}
push: false

View File

@@ -12,11 +12,12 @@ concurrency:
jobs:
generate_caches:
if: github.repository == 'mudler/LocalAI'
strategy:
matrix:
include:
- base-image: intel/oneapi-basekit:2025.2.0-0-devel-ubuntu22.04
runs-on: 'ubuntu-latest'
- base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
runs-on: 'arc-runner-set'
platforms: 'linux/amd64'
runs-on: ${{matrix.runs-on}}
steps:
@@ -26,14 +27,14 @@ jobs:
platforms: all
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Login to quay
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
@@ -43,17 +44,17 @@ jobs:
uses: docker/setup-buildx-action@master
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
- name: Cache Intel images
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
with:
builder: ${{ steps.buildx.outputs.name }}
build-args: |
BASE_IMAGE=${{ matrix.base-image }}
context: .
file: ./Dockerfile
tags: quay.io/go-skynet/intel-oneapi-base:latest
tags: quay.io/go-skynet/intel-oneapi-base:24.04
push: true
target: intel
platforms: ${{ matrix.platforms }}

75
.github/workflows/gh-pages.yml vendored Normal file
View File

@@ -0,0 +1,75 @@
name: Deploy docs to GitHub Pages
on:
push:
branches:
- master
paths:
- 'docs/**'
- 'gallery/**'
- 'images/**'
- '.github/ci/modelslist.go'
- '.github/workflows/gh-pages.yml'
workflow_dispatch:
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: pages
cancel-in-progress: false
jobs:
build:
runs-on: ubuntu-latest
env:
HUGO_VERSION: "0.146.3"
steps:
- name: Checkout
uses: actions/checkout@v6
with:
fetch-depth: 0 # needed for enableGitInfo
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.22'
cache: false
- name: Setup Hugo
uses: peaceiris/actions-hugo@v3
with:
hugo-version: ${{ env.HUGO_VERSION }}
extended: true
- name: Setup Pages
id: pages
uses: actions/configure-pages@v6
- name: Generate gallery
run: go run ./.github/ci/modelslist.go ./gallery/index.yaml > docs/static/gallery.html
- name: Build site
working-directory: docs
run: |
mkdir -p layouts/_default
hugo --minify --baseURL "${{ steps.pages.outputs.base_url }}/"
- name: Upload artifact
uses: actions/upload-pages-artifact@v5
with:
path: docs/public
deploy:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
needs: build
steps:
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v5

View File

@@ -1,68 +1,92 @@
---
name: 'build container images tests'
on:
pull_request:
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
# Pushing with all jobs in parallel
# eats the bandwidth of all the nodes
max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
fail-fast: false
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=3 --output-sync=target"
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-22.04:6.4.3"
grpc-base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
- build-type: 'vulkan'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-vulkan-core'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
name: 'build container images tests'
on:
pull_request:
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
# Pushing with all jobs in parallel
# eats the bandwidth of all the nodes
max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
fail-fast: false
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'false'
tag-suffix: '-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'sycl'
platforms: 'linux/amd64'
tag-latest: 'false'
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: 'sycl'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'vulkan'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'false'
tag-suffix: '-vulkan-core'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'false'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'

View File

@@ -1,154 +1,176 @@
---
name: 'build container images'
name: 'build container images'
on:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
hipblas-jobs:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
makeflags: ${{ matrix.makeflags }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
on:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
hipblas-jobs:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
aio: ${{ matrix.aio }}
makeflags: ${{ matrix.makeflags }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'hipblas'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-hipblas'
base-image: "rocm/dev-ubuntu-22.04:6.4.3"
grpc-base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-hipblas"
core-image-build:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
#max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
matrix:
include:
- build-type: ''
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: ''
base-image: "ubuntu:22.04"
runs-on: 'ubuntu-latest'
aio: "-aio-cpu"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
- build-type: 'cublas'
cuda-major-version: "11"
cuda-minor-version: "7"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-11'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
aio: "-aio-gpu-nvidia-cuda-11"
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-nvidia-cuda-12"
- build-type: 'vulkan'
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
aio: "-aio-gpu-vulkan"
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
grpc-base-image: "ubuntu:22.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
aio: "-aio-gpu-intel"
gh-runner:
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
aio: ${{ matrix.aio }}
base-image: ${{ matrix.base-image }}
grpc-base-image: ${{ matrix.grpc-base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'true'
core-image-build:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
#max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
matrix:
include:
- build-type: ''
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: ''
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-latest'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:22.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'vulkan'
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
makeflags: "--jobs=4 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
- build-type: 'intel'
platforms: 'linux/amd64'
tag-latest: 'auto'
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
tag-suffix: '-gpu-intel'
runs-on: 'ubuntu-latest'
makeflags: "--jobs=3 --output-sync=target"
ubuntu-version: '2404'
ubuntu-codename: 'noble'
gh-runner:
if: github.repository == 'mudler/LocalAI'
uses: ./.github/workflows/image_build.yml
with:
tag-latest: ${{ matrix.tag-latest }}
tag-suffix: ${{ matrix.tag-suffix }}
build-type: ${{ matrix.build-type }}
cuda-major-version: ${{ matrix.cuda-major-version }}
cuda-minor-version: ${{ matrix.cuda-minor-version }}
platforms: ${{ matrix.platforms }}
runs-on: ${{ matrix.runs-on }}
base-image: ${{ matrix.base-image }}
makeflags: ${{ matrix.makeflags }}
skip-drivers: ${{ matrix.skip-drivers }}
ubuntu-version: ${{ matrix.ubuntu-version }}
ubuntu-codename: ${{ matrix.ubuntu-codename }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
strategy:
matrix:
include:
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'true'
ubuntu-version: "2204"
ubuntu-codename: 'jammy'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/arm64'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-cuda-13'
base-image: "ubuntu:24.04"
runs-on: 'ubuntu-24.04-arm'
makeflags: "--jobs=4 --output-sync=target"
skip-drivers: 'false'
ubuntu-version: '2404'
ubuntu-codename: 'noble'

View File

@@ -8,11 +8,6 @@ on:
description: 'Base image'
required: true
type: string
grpc-base-image:
description: 'GRPC Base image, must be a compatible image with base-image'
required: false
default: ''
type: string
build-type:
description: 'Build type'
default: ''
@@ -23,7 +18,7 @@ on:
type: string
cuda-minor-version:
description: 'CUDA minor version'
default: "4"
default: "9"
type: string
platforms:
description: 'Platforms'
@@ -51,10 +46,15 @@ on:
required: false
default: '--jobs=4 --output-sync=target'
type: string
aio:
description: 'AIO Image Name'
ubuntu-version:
description: 'Ubuntu version'
required: false
default: ''
default: '2204'
type: string
ubuntu-codename:
description: 'Ubuntu codename'
required: false
default: 'noble'
type: string
secrets:
dockerUsername:
@@ -70,6 +70,13 @@ jobs:
runs-on: ${{ inputs.runs-on }}
steps:
- name: Checkout
uses: actions/checkout@v6
- name: Configure apt mirror on runner
id: apt_mirror
uses: ./.github/actions/configure-apt-mirror
- name: Free Disk Space (Ubuntu)
if: inputs.runs-on == 'ubuntu-latest'
uses: jlumbroso/free-disk-space@main
@@ -85,16 +92,6 @@ jobs:
large-packages: true
docker-images: true
swap-storage: true
- name: Force Install GIT latest
run: |
sudo apt-get update \
&& sudo apt-get install -y software-properties-common \
&& sudo apt-get update \
&& sudo add-apt-repository -y ppa:git-core/ppa \
&& sudo apt-get update \
&& sudo apt-get install -y git
- name: Checkout
uses: actions/checkout@v5
- name: Release space from worker
if: inputs.runs-on == 'ubuntu-latest'
@@ -141,7 +138,7 @@ jobs:
- name: Docker meta
id: meta
if: github.event_name != 'pull_request'
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
quay.io/go-skynet/local-ai
@@ -156,7 +153,7 @@ jobs:
- name: Docker meta for PR
id: meta_pull_request
if: github.event_name == 'pull_request'
uses: docker/metadata-action@v5
uses: docker/metadata-action@v6
with:
images: |
quay.io/go-skynet/ci-tests
@@ -167,34 +164,6 @@ jobs:
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.tag-suffix }}
- name: Docker meta AIO (quay.io)
if: inputs.aio != ''
id: meta_aio
uses: docker/metadata-action@v5
with:
images: |
quay.io/go-skynet/local-ai
tags: |
type=ref,event=branch
type=semver,pattern={{raw}}
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.aio }},onlatest=true
- name: Docker meta AIO (dockerhub)
if: inputs.aio != ''
id: meta_aio_dockerhub
uses: docker/metadata-action@v5
with:
images: |
localai/localai
tags: |
type=ref,event=branch
type=semver,pattern={{raw}}
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.aio }},onlatest=true
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
@@ -206,108 +175,68 @@ jobs:
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
username: ${{ secrets.dockerUsername }}
password: ${{ secrets.dockerPassword }}
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ secrets.quayUsername }}
password: ${{ secrets.quayPassword }}
- name: Build and push
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
if: github.event_name != 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
### Start testing image
- name: Build and push
uses: docker/build-push-action@v6
uses: docker/build-push-action@v7
if: github.event_name == 'pull_request'
with:
builder: ${{ steps.buildx.outputs.name }}
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
# This means that even the MAKEFLAGS have to be an EXACT match.
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
build-args: |
BUILD_TYPE=${{ inputs.build-type }}
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
BASE_IMAGE=${{ inputs.base-image }}
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
GRPC_VERSION=v1.65.0
MAKEFLAGS=${{ inputs.makeflags }}
SKIP_DRIVERS=${{ inputs.skip-drivers }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
context: .
file: ./Dockerfile
cache-from: type=gha
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
platforms: ${{ inputs.platforms }}
#push: true
tags: ${{ steps.meta_pull_request.outputs.tags }}
labels: ${{ steps.meta_pull_request.outputs.labels }}
## End testing image
- name: Build and push AIO image
if: inputs.aio != ''
uses: docker/build-push-action@v6
with:
builder: ${{ steps.buildx.outputs.name }}
build-args: |
BASE_IMAGE=quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
MAKEFLAGS=${{ inputs.makeflags }}
context: .
file: ./Dockerfile.aio
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta_aio.outputs.tags }}
labels: ${{ steps.meta_aio.outputs.labels }}
- name: Build and push AIO image (dockerhub)
if: inputs.aio != ''
uses: docker/build-push-action@v6
with:
builder: ${{ steps.buildx.outputs.name }}
build-args: |
BASE_IMAGE=localai/localai:${{ steps.meta.outputs.version }}
MAKEFLAGS=${{ inputs.makeflags }}
context: .
file: ./Dockerfile.aio
platforms: ${{ inputs.platforms }}
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta_aio_dockerhub.outputs.tags }}
labels: ${{ steps.meta_aio_dockerhub.outputs.labels }}
- name: job summary
run: |
echo "Built image: ${{ steps.meta.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
- name: job summary(AIO)
if: inputs.aio != ''
run: |
echo "Built image: ${{ steps.meta_aio.outputs.labels }}" >> $GITHUB_STEP_SUMMARY

48
.github/workflows/lint.yml vendored Normal file
View File

@@ -0,0 +1,48 @@
---
name: 'lint'
on:
pull_request:
paths-ignore:
- 'docs/**'
- 'examples/**'
- 'README.md'
- '**/*.md'
push:
branches:
- master
concurrency:
group: ci-lint-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
golangci-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
# Full history so golangci-lint's new-from-merge-base can reach
# origin/master and compute the diff against it.
fetch-depth: 0
- uses: actions/setup-go@v5
with:
go-version: '1.26.x'
cache: false
- name: install golangci-lint
run: |
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh \
| sh -s -- -b "$(go env GOPATH)/bin" v2.11.4
- name: generate grpc proto sources
# pkg/grpc/proto/*.go is generated, not checked in. Several packages
# import it, so without this step typecheck fails project-wide.
run: make protogen-go
- name: stub react-ui dist for go:embed
# core/http/app.go has //go:embed react-ui/dist/*; the glob needs at
# least one non-hidden entry to satisfy typecheck. We don't run
# `make react-ui` here because lint doesn't need the real bundle.
run: |
mkdir -p core/http/react-ui/dist
touch core/http/react-ui/dist/index.html
- name: lint
run: make lint

View File

@@ -6,6 +6,7 @@ on:
jobs:
notify-discord:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
env:
RELEASE_BODY: ${{ github.event.release.body }}

View File

@@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -18,7 +18,7 @@ jobs:
with:
go-version: 1.23
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v6
uses: goreleaser/goreleaser-action@v7
with:
version: v2.11.0
args: release --clean
@@ -28,7 +28,7 @@ jobs:
runs-on: macos-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Go
@@ -39,16 +39,18 @@ jobs:
run: |
make build-launcher-darwin
- name: Upload DMG to Release
uses: softprops/action-gh-release@v2
uses: softprops/action-gh-release@v3
with:
files: ./dist/LocalAI.dmg
launcher-build-linux:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Set up Go
uses: actions/setup-go@v5
with:
@@ -59,6 +61,6 @@ jobs:
sudo apt-get install golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
make build-launcher-linux
- name: Upload Linux launcher artifacts
uses: softprops/action-gh-release@v2
uses: softprops/action-gh-release@v3
with:
files: ./local-ai-launcher-linux.tar.xz

View File

@@ -14,7 +14,7 @@ jobs:
GO111MODULE: on
steps:
- name: Checkout Source
uses: actions/checkout@v5
uses: actions/checkout@v6
if: ${{ github.actor != 'dependabot[bot]' }}
- name: Run Gosec Security Scanner
if: ${{ github.actor != 'dependabot[bot]' }}

View File

@@ -8,9 +8,10 @@ on:
jobs:
stale:
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/stale@5f858e3efba33a5ca4407a664cc011ad407f2008 # v9
- uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v9
with:
stale-issue-message: 'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
stale-pr-message: 'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'

View File

@@ -14,12 +14,55 @@ concurrency:
cancel-in-progress: true
jobs:
detect-changes:
runs-on: ubuntu-latest
outputs:
run-all: ${{ steps.detect.outputs.run-all }}
transformers: ${{ steps.detect.outputs.transformers }}
rerankers: ${{ steps.detect.outputs.rerankers }}
diffusers: ${{ steps.detect.outputs.diffusers }}
coqui: ${{ steps.detect.outputs.coqui }}
moonshine: ${{ steps.detect.outputs.moonshine }}
pocket-tts: ${{ steps.detect.outputs.pocket-tts }}
qwen-tts: ${{ steps.detect.outputs.qwen-tts }}
qwen-asr: ${{ steps.detect.outputs.qwen-asr }}
nemo: ${{ steps.detect.outputs.nemo }}
voxcpm: ${{ steps.detect.outputs.voxcpm }}
llama-cpp-quantization: ${{ steps.detect.outputs.llama-cpp-quantization }}
llama-cpp: ${{ steps.detect.outputs.llama-cpp }}
ik-llama-cpp: ${{ steps.detect.outputs.ik-llama-cpp }}
turboquant: ${{ steps.detect.outputs.turboquant }}
vllm: ${{ steps.detect.outputs.vllm }}
sglang: ${{ steps.detect.outputs.sglang }}
acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
vibevoice-cpp: ${{ steps.detect.outputs.vibevoice-cpp }}
localvqe: ${{ steps.detect.outputs.localvqe }}
voxtral: ${{ steps.detect.outputs.voxtral }}
kokoros: ${{ steps.detect.outputs.kokoros }}
insightface: ${{ steps.detect.outputs.insightface }}
speaker-recognition: ${{ steps.detect.outputs.speaker-recognition }}
sherpa-onnx: ${{ steps.detect.outputs.sherpa-onnx }}
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Setup Bun
uses: oven-sh/setup-bun@v2
- name: Install dependencies
run: bun add js-yaml @octokit/core
- name: Detect changed backends
id: detect
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_EVENT_PATH: ${{ github.event_path }}
run: bun run scripts/changed-backends.js
# Requires CUDA
# tests-chatterbox-tts:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v5
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -37,10 +80,12 @@ jobs:
# make --jobs=5 --output-sync=target -C backend/python/chatterbox
# make --jobs=5 --output-sync=target -C backend/python/chatterbox test
tests-transformers:
needs: detect-changes
if: needs.detect-changes.outputs.transformers == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -58,10 +103,12 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/transformers
make --jobs=5 --output-sync=target -C backend/python/transformers test
tests-rerankers:
needs: detect-changes
if: needs.detect-changes.outputs.rerankers == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -80,10 +127,12 @@ jobs:
make --jobs=5 --output-sync=target -C backend/python/rerankers test
tests-diffusers:
needs: detect-changes
if: needs.detect-changes.outputs.diffusers == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
@@ -104,7 +153,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v5
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -124,7 +173,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v5
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -186,7 +235,7 @@ jobs:
# sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
# df -h
# - name: Clone
# uses: actions/checkout@v5
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -211,7 +260,7 @@ jobs:
# runs-on: ubuntu-latest
# steps:
# - name: Clone
# uses: actions/checkout@v5
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
@@ -229,16 +278,18 @@ jobs:
# make --jobs=5 --output-sync=target -C backend/python/vllm test
tests-coqui:
needs: detect-changes
if: needs.detect-changes.outputs.coqui == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -247,3 +298,717 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/coqui
make --jobs=5 --output-sync=target -C backend/python/coqui test
tests-moonshine:
needs: detect-changes
if: needs.detect-changes.outputs.moonshine == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test moonshine
run: |
make --jobs=5 --output-sync=target -C backend/python/moonshine
make --jobs=5 --output-sync=target -C backend/python/moonshine test
tests-pocket-tts:
needs: detect-changes
if: needs.detect-changes.outputs.pocket-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test pocket-tts
run: |
make --jobs=5 --output-sync=target -C backend/python/pocket-tts
make --jobs=5 --output-sync=target -C backend/python/pocket-tts test
tests-qwen-tts:
needs: detect-changes
if: needs.detect-changes.outputs.qwen-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test qwen-tts
run: |
make --jobs=5 --output-sync=target -C backend/python/qwen-tts
make --jobs=5 --output-sync=target -C backend/python/qwen-tts test
# TODO: s2-pro model is too large to load on CPU-only CI runners — re-enable
# when we have GPU runners or a smaller test model.
# tests-fish-speech:
# runs-on: ubuntu-latest
# timeout-minutes: 45
# steps:
# - name: Clone
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
# run: |
# sudo apt-get update
# sudo apt-get install -y build-essential ffmpeg portaudio19-dev
# sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# # Install UV
# curl -LsSf https://astral.sh/uv/install.sh | sh
# pip install --user --no-cache-dir grpcio-tools==1.64.1
# - name: Test fish-speech
# run: |
# make --jobs=5 --output-sync=target -C backend/python/fish-speech
# make --jobs=5 --output-sync=target -C backend/python/fish-speech test
tests-qwen-asr:
needs: detect-changes
if: needs.detect-changes.outputs.qwen-asr == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg sox
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test qwen-asr
run: |
make --jobs=5 --output-sync=target -C backend/python/qwen-asr
make --jobs=5 --output-sync=target -C backend/python/qwen-asr test
tests-nemo:
needs: detect-changes
if: needs.detect-changes.outputs.nemo == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential ffmpeg sox
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test nemo
run: |
make --jobs=5 --output-sync=target -C backend/python/nemo
make --jobs=5 --output-sync=target -C backend/python/nemo test
tests-voxcpm:
needs: detect-changes
if: needs.detect-changes.outputs.voxcpm == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ffmpeg
sudo apt-get install -y ca-certificates cmake curl patch python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Test voxcpm
run: |
make --jobs=5 --output-sync=target -C backend/python/voxcpm
make --jobs=5 --output-sync=target -C backend/python/voxcpm test
tests-llama-cpp-quantization:
needs: detect-changes
if: needs.detect-changes.outputs.llama-cpp-quantization == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl git python3-pip
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
pip install --user --no-cache-dir grpcio-tools==1.64.1
- name: Build llama-quantize from llama.cpp
run: |
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git /tmp/llama.cpp
cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_NATIVE=OFF
cmake --build /tmp/llama.cpp/build --target llama-quantize -j$(nproc)
sudo cp /tmp/llama.cpp/build/bin/llama-quantize /usr/local/bin/
- name: Install backend
run: |
make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization
- name: Test llama-cpp-quantization
run: |
make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization test
tests-llama-cpp-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build llama-cpp backend image and run gRPC e2e tests
run: |
make test-extra-backend-llama-cpp
tests-llama-cpp-grpc-transcription:
needs: detect-changes
if: needs.detect-changes.outputs.llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build llama-cpp backend image and run audio transcription gRPC e2e tests
run: |
make test-extra-backend-llama-cpp-transcription
# PR-acceptance smoke gate: always runs on every PR (no detect-changes gate, no
# paths filter). Pulls the pre-built master CPU llama-cpp image from quay
# instead of building from source, so the cost is a docker pull (~30s) plus the
# short Qwen3-0.6B model download. Exercises the full gRPC surface — health,
# load, predict, stream — plus the logprobs/logit_bias specs that moved out of
# core/http/app_test.go. Anything heavier or per-backend is gated to the
# detect-changes path-filter above.
tests-llama-cpp-smoke:
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Pull pre-built llama-cpp backend image
run: docker pull quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp
- name: Run e2e-backends smoke
env:
BACKEND_IMAGE: quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp
BACKEND_TEST_CAPS: health,load,predict,stream,logprobs,logit_bias
run: |
make test-extra-backend
# Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked LLM.
# Builds the sherpa-onnx Docker image, extracts the rootfs so the e2e suite
# can discover the backend binary + shared libs, downloads the three model
# bundles (silero-vad, omnilingual-asr, vits-ljs) and drives the realtime
# websocket spec end-to-end.
tests-sherpa-onnx-realtime:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Build sherpa-onnx backend image and run realtime e2e tests
run: |
make test-extra-e2e-realtime-sherpa
# Streaming ASR via the sherpa-onnx online recognizer (zipformer
# transducer). Exercises both AudioTranscription (buffered) and
# AudioTranscriptionStream (real-time deltas) on the e2e-backends
# harness.
tests-sherpa-onnx-grpc-transcription:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build sherpa-onnx backend image and run streaming ASR gRPC e2e tests
run: |
make test-extra-backend-sherpa-onnx-transcription
# VITS TTS via the sherpa-onnx backend. Drives both TTS (file write) and
# TTSStream (PCM chunks) on the e2e-backends harness.
tests-sherpa-onnx-grpc-tts:
needs: detect-changes
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build sherpa-onnx backend image and run TTS gRPC e2e tests
run: |
make test-extra-backend-sherpa-onnx-tts
tests-ik-llama-cpp-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.ik-llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build ik-llama-cpp backend image and run gRPC e2e tests
run: |
make test-extra-backend-ik-llama-cpp
tests-turboquant-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.turboquant == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
# Exercises the turboquant (llama.cpp fork) backend with KV-cache
# quantization enabled. The convenience target sets
# BACKEND_TEST_CACHE_TYPE_K / _V=q8_0, which are plumbed into the
# ModelOptions.CacheTypeKey/Value gRPC fields. LoadModel-success +
# backend stdout/stderr (captured by the Ginkgo suite) prove the
# cache-type config path reaches the fork's KV-cache init.
- name: Build turboquant backend image and run gRPC e2e tests
run: |
make test-extra-backend-turboquant
# tests-vllm-grpc is currently disabled in CI.
#
# The prebuilt vllm CPU wheel is compiled with AVX-512 VNNI/BF16
# instructions, and neither ubuntu-latest nor the bigger-runner pool
# offers a stable CPU baseline that supports them — runners come
# back with different hardware between runs and SIGILL on import of
# vllm.model_executor.models.registry. Compiling vllm from source
# via FROM_SOURCE=true works on any CPU but takes 30-50 minutes per
# run, which is too slow for a smoke test.
#
# The test itself (tests/e2e-backends + make test-extra-backend-vllm)
# is fully working and validated locally on a host with the right
# SIMD baseline. Run it manually with:
#
# make test-extra-backend-vllm
#
# Re-enable this job once we have a self-hosted runner label with
# guaranteed AVX-512 VNNI/BF16 support, or once the vllm project
# publishes a CPU wheel with a wider baseline.
#
# tests-vllm-grpc:
# needs: detect-changes
# if: needs.detect-changes.outputs.vllm == 'true' || needs.detect-changes.outputs.run-all == 'true'
# runs-on: bigger-runner
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
# run: |
# sudo apt-get update
# sudo apt-get install -y --no-install-recommends \
# make build-essential curl unzip ca-certificates git tar
# - name: Setup Go
# uses: actions/setup-go@v5
# with:
# go-version: '1.25.4'
# - name: Free disk space
# run: |
# sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
# df -h
# - name: Build vllm (cpu) backend image and run gRPC e2e tests
# run: |
# make test-extra-backend-vllm
# tests-sglang-grpc is currently disabled in CI for the same reason as
# tests-vllm-grpc: sglang's CPU kernel (sgl-kernel) uses __m512 AVX-512
# intrinsics unconditionally in shm.cpp, so the from-source build
# requires `-march=sapphirerapids` (already set in install.sh) and the
# resulting binary SIGILLs at import on CPUs without AVX-512 VNNI/BF16.
# The ubuntu-latest runner pool does not guarantee that ISA baseline.
#
# The test itself (tests/e2e-backends + make test-extra-backend-sglang)
# is fully working and validated locally on a host with the right
# SIMD baseline. Run it manually with:
#
# make test-extra-backend-sglang
#
# Re-enable this job once we have a self-hosted runner label with
# guaranteed AVX-512 VNNI/BF16 support.
#
# tests-sglang-grpc:
# needs: detect-changes
# if: needs.detect-changes.outputs.sglang == 'true' || needs.detect-changes.outputs.run-all == 'true'
# runs-on: bigger-runner
# timeout-minutes: 90
# steps:
# - name: Clone
# uses: actions/checkout@v6
# with:
# submodules: true
# - name: Dependencies
# run: |
# sudo apt-get update
# sudo apt-get install -y --no-install-recommends \
# make build-essential curl unzip ca-certificates git tar
# - name: Setup Go
# uses: actions/setup-go@v5
# with:
# go-version: '1.25.4'
# - name: Free disk space
# run: |
# sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
# df -h
# - name: Build sglang (cpu) backend image and run gRPC e2e tests
# run: |
# make test-extra-backend-sglang
tests-acestep-cpp:
needs: detect-changes
if: needs.detect-changes.outputs.acestep-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build acestep-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/acestep-cpp
- name: Test acestep-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/acestep-cpp test
tests-qwen3-tts-cpp:
needs: detect-changes
if: needs.detect-changes.outputs.qwen3-tts-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build qwen3-tts-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/qwen3-tts-cpp
- name: Test qwen3-tts-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/qwen3-tts-cpp test
# Per-backend smoke for vibevoice-cpp: builds the .so + Go binary and
# runs `make -C backend/go/vibevoice-cpp test`. test.sh auto-downloads
# the published mudler/vibevoice.cpp-models bundle (TTS Q8_0 + ASR Q4_K
# + tokenizer + voice) and runs the closed-loop TTS → ASR Go test.
tests-vibevoice-cpp:
needs: detect-changes
if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build vibevoice-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/vibevoice-cpp
- name: Test vibevoice-cpp
run: |
make --jobs=5 --output-sync=target -C backend/go/vibevoice-cpp test
# End-to-end TTS via the e2e-backends gRPC harness. Builds the
# vibevoice-cpp Docker image and drives Backend/TTS against it with a
# real LocalAI gRPC client.
tests-vibevoice-cpp-grpc-tts:
needs: detect-changes
if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build vibevoice-cpp backend image and run TTS gRPC e2e tests
run: |
make test-extra-backend-vibevoice-cpp-tts
# End-to-end transcription via the e2e-backends gRPC harness. The
# vibevoice ASR is a 7B-param model (Q4_K weights ~10 GB on disk)
# and the JFK 30 s decode is too heavy for a free 4-core
# ubuntu-latest pool runner - two CI attempts got SIGTERM'd during
# LoadModel, before the test could even progress. Use the
# self-hosted 'bigger-runner' label (same one the GPU image builds
# in backend.yml use) and the documented dotnet/ghc/android cache
# purge to clear ~10-20 GB of headroom for the model + Docker
# image + working dir.
tests-vibevoice-cpp-grpc-transcription:
needs: detect-changes
if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: bigger-runner
timeout-minutes: 150
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
make build-essential curl unzip ca-certificates git tar
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
df -h
- name: Build vibevoice-cpp backend image and run ASR gRPC e2e tests
run: |
make test-extra-backend-vibevoice-cpp-transcription
# End-to-end audio transform via the e2e-backends gRPC harness. The
# LocalVQE GGUF is small (~5 MB) and the model is real-time on CPU, so
# the default ubuntu-latest pool is plenty.
tests-localvqe-grpc-transform:
needs: detect-changes
if: needs.detect-changes.outputs.localvqe == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 60
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.25.4'
- name: Build localvqe backend image and run audio_transform gRPC e2e tests
run: |
make test-extra-backend-localvqe-transform
tests-voxtral:
needs: detect-changes
if: needs.detect-changes.outputs.voxtral == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral
- name: Test voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral test
tests-kokoros:
needs: detect-changes
if: needs.detect-changes.outputs.kokoros == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake pkg-config protobuf-compiler clang libclang-dev
sudo apt-get install -y espeak-ng libespeak-ng-dev libsonic-dev libpcaudio-dev libopus-dev libssl-dev
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
echo "$HOME/.cargo/bin" >> $GITHUB_PATH
- name: Build kokoros
run: |
make -C backend/rust/kokoros kokoros-grpc
- name: Test kokoros
run: |
make -C backend/rust/kokoros test
tests-insightface-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
make build-essential curl unzip ca-certificates git tar
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.26.0'
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
df -h
- name: Build insightface backend image and run both model configurations
run: |
make test-extra-backend-insightface-all
tests-speaker-recognition-grpc:
needs: detect-changes
if: needs.detect-changes.outputs.speaker-recognition == 'true' || needs.detect-changes.outputs.run-all == 'true'
runs-on: ubuntu-latest
timeout-minutes: 90
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
make build-essential curl ca-certificates git tar
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: '1.26.0'
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
df -h
- name: Build speaker-recognition backend image and run the ECAPA-TDNN configuration
run: |
make test-extra-backend-speaker-recognition-all

View File

@@ -3,15 +3,18 @@ name: 'tests'
on:
pull_request:
paths-ignore:
- 'docs/**'
- 'examples/**'
- 'README.md'
- '**/*.md'
- 'backend/**'
push:
branches:
- master
tags:
- '*'
env:
GRPC_VERSION: v1.65.0
concurrency:
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
@@ -21,7 +24,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.25.x']
go-version: ['1.26.x']
steps:
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
@@ -70,7 +73,7 @@ jobs:
sudo rm -rfv build || true
df -h
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
@@ -93,94 +96,16 @@ jobs:
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install build-essential ccache upx-ucl curl ffmpeg
sudo apt-get install -y libgmock-dev clang
# Install UV
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt-get install -y ca-certificates cmake patch python3-pip unzip
sudo apt-get install -y libopencv-dev
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
export CUDACXX=/usr/local/cuda/bin/nvcc
# The python3-grpc-tools package in 22.04 is too old
pip install --user grpcio-tools==1.71.0 grpcio==1.71.0
make -C backend/python/transformers
make backends/huggingface backends/llama-cpp backends/local-store backends/silero-vad backends/piper backends/whisper backends/stablediffusion-ggml
env:
CUDA_VERSION: 12-4
sudo apt-get install curl ffmpeg libopus-dev
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Build React UI
run: make react-ui
- name: Test
run: |
PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true
tests-aio-container:
runs-on: ubuntu-latest
steps:
- name: Release space from worker
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
df -h
- name: Clone
uses: actions/checkout@v5
with:
submodules: true
- name: Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Test
run: |
PATH="$PATH:$HOME/go/bin" make backends/local-store backends/silero-vad backends/llama-cpp backends/whisper backends/piper backends/stablediffusion-ggml docker-build-aio e2e-aio
PATH="$PATH:/root/go/bin" make --jobs 5 --output-sync=target test
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
@@ -190,13 +115,13 @@ jobs:
limit-access-to-actor: true
tests-apple:
runs-on: macOS-14
runs-on: macos-latest
strategy:
matrix:
go-version: ['1.25.x']
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
submodules: true
- name: Setup Go ${{ matrix.go-version }}
@@ -209,12 +134,14 @@ jobs:
run: go version
- name: Dependencies
run: |
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
pip install --user --no-cache-dir grpcio-tools==1.71.0 grpcio==1.71.0
- name: Build llama-cpp-darwin
run: |
make protogen-go
make backends/llama-cpp-darwin
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus ffmpeg
pip install --user --no-cache-dir grpcio-tools grpcio
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Build React UI
run: make react-ui
- name: Test
run: |
export C_INCLUDE_PATH=/usr/local/include

86
.github/workflows/tests-aio.yml vendored Normal file
View File

@@ -0,0 +1,86 @@
---
name: 'tests-aio'
# Runs the all-in-one (AIO) Docker image with real backends + real models.
# Heavy: builds llama-cpp/whisper/piper/silero-vad/stablediffusion-ggml/local-store
# and exercises end-to-end inference inside the container. Moved out of test.yml
# (which used to run on every PR) so PR CI no longer pays this cost.
#
# Triggers:
# - schedule (nightly @ 04:00 UTC) — catches packaging/image regressions within 24h
# - workflow_dispatch — manual run on-demand
# - push to master/tags — sanity check after merge / before release
on:
schedule:
- cron: '0 4 * * *'
workflow_dispatch:
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-tests-aio-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
tests-aio:
runs-on: ubuntu-latest
steps:
- name: Release space from worker
run: |
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
df -h
echo
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
sudo rm -rf /usr/local/lib/android
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
sudo rm -rf /usr/share/dotnet
sudo apt-get remove -y '^mono-.*' || true
sudo apt-get remove -y '^ghc-.*' || true
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
sudo apt-get remove -y 'php.*' || true
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
sudo apt-get remove -y '^google-.*' || true
sudo apt-get remove -y azure-cli || true
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
sudo apt-get remove -y '^gfortran-.*' || true
sudo apt-get autoremove -y
sudo apt-get clean
echo
echo "Listing top largest packages"
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
head -n 30 <<< "${pkgs}"
echo
sudo rm -rfv build || true
df -h
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Test
run: |
PATH="$PATH:$HOME/go/bin" make backends/local-store backends/silero-vad backends/llama-cpp backends/whisper backends/piper backends/stablediffusion-ggml docker-build-e2e e2e-aio
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

70
.github/workflows/tests-e2e.yml vendored Normal file
View File

@@ -0,0 +1,70 @@
---
name: 'E2E Backend Tests'
on:
pull_request:
paths-ignore:
- 'docs/**'
- 'examples/**'
- 'README.md'
- '**/*.md'
- 'backend/**'
push:
branches:
- master
tags:
- '*'
concurrency:
group: ci-tests-e2e-backend-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
tests-e2e-backend:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.25.x']
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential libopus-dev
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Build React UI
run: make react-ui
- name: Test Backend E2E
run: |
PATH="$PATH:$HOME/go/bin" make build-mock-backend test-e2e
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

74
.github/workflows/tests-ui-e2e.yml vendored Normal file
View File

@@ -0,0 +1,74 @@
---
name: 'UI E2E Tests'
on:
pull_request:
paths:
- 'core/http/**'
- 'tests/e2e-ui/**'
- 'tests/e2e/mock-backend/**'
push:
branches:
- master
concurrency:
group: ci-tests-ui-e2e-${{ github.head_ref || github.ref }}-${{ github.repository }}
cancel-in-progress: true
jobs:
tests-ui-e2e:
runs-on: ubuntu-latest
strategy:
matrix:
go-version: ['1.26.x']
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- name: Setup Go ${{ matrix.go-version }}
uses: actions/setup-go@v5
with:
go-version: ${{ matrix.go-version }}
cache: false
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: '22'
- name: Proto Dependencies
run: |
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
- name: System Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential libopus-dev
- name: Build UI test server
run: PATH="$PATH:$HOME/go/bin" make build-ui-test-server
- name: Install Playwright
working-directory: core/http/react-ui
run: |
npm install
npx playwright install --with-deps chromium
- name: Run Playwright tests
working-directory: core/http/react-ui
run: npx playwright test
- name: Upload Playwright report
if: ${{ failure() }}
uses: actions/upload-artifact@v7
with:
name: playwright-report
path: core/http/react-ui/playwright-report/
retention-days: 7
- name: Setup tmate session if tests fail
if: ${{ failure() }}
uses: mxschmitt/action-tmate@v3.23
with:
detached: true
connect-timeout-seconds: 180
limit-access-to-actor: true

View File

@@ -5,11 +5,14 @@ on:
workflow_dispatch:
jobs:
swagger:
if: github.repository == 'mudler/LocalAI'
strategy:
fail-fast: false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6
- name: Configure apt mirror on runner
uses: ./.github/actions/configure-apt-mirror
- uses: actions/setup-go@v5
with:
go-version: 'stable'
@@ -25,7 +28,7 @@ jobs:
run: |
make protogen-go swagger
- name: Create Pull Request
uses: peter-evans/create-pull-request@v7
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI

15
.gitignore vendored
View File

@@ -25,6 +25,7 @@ go-bert
# LocalAI build binary
LocalAI
/local-ai
/local-ai-launcher
# prevent above rules from omitting the helm chart
!charts/*
# prevent above rules from omitting the api/localai folder
@@ -35,6 +36,8 @@ LocalAI
models/*
test-models/
test-dir/
tests/e2e-aio/backends
mock-backend
release/
@@ -62,3 +65,15 @@ docs/static/gallery.html
# per-developer customization files for the development container
.devcontainer/customization/*
# React UI build artifacts (keep placeholder dist/index.html)
core/http/react-ui/node_modules/
core/http/react-ui/dist
# Extracted backend binaries for container-based testing
local-backends/
# UI E2E test artifacts
tests/e2e-ui/ui-test-server
core/http/react-ui/playwright-report/
core/http/react-ui/test-results/

6
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "docs/themes/hugo-theme-relearn"]
path = docs/themes/hugo-theme-relearn
url = https://github.com/McShelby/hugo-theme-relearn.git
[submodule "docs/themes/lotusdocs"]
path = docs/themes/lotusdocs
url = https://github.com/colinwilson/lotusdocs
[submodule "backend/rust/kokoros/sources/Kokoros"]
path = backend/rust/kokoros/sources/Kokoros
url = https://github.com/lucasjinreal/Kokoros

53
.golangci.yml Normal file
View File

@@ -0,0 +1,53 @@
version: "2"
# Only issues introduced relative to master are reported. Pre-existing issues
# in the codebase do not fail the lint job; they're treated as a baseline that
# can be cleaned up incrementally. New code (added lines on a branch) is held
# to the full linter set. Locally, `make lint-all` overrides this and reports
# every issue.
issues:
# origin/master because in shallow CI checkouts only the remote-tracking
# branch exists; a bare 'master' ref isn't reachable locally.
new-from-merge-base: origin/master
linters:
default: standard
# staticcheck is noisy on this codebase (mostly QF style suggestions like
# "could use tagged switch" or "unnecessary fmt.Sprintf"). Re-enable
# selectively if a high-signal subset is identified.
disable:
- staticcheck
enable:
- forbidigo
settings:
forbidigo:
forbid:
- pattern: '^t\.Errorf$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(...) instead of t.Errorf. See .agents/coding-style.md.'
- pattern: '^t\.Error$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(...) instead of t.Error. See .agents/coding-style.md.'
- pattern: '^t\.Fatalf$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(Succeed()) / Fail(...) instead of t.Fatalf. See .agents/coding-style.md.'
- pattern: '^t\.Fatal$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(Succeed()) / Fail(...) instead of t.Fatal. See .agents/coding-style.md.'
- pattern: '^t\.Run$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Describe/Context/It instead of t.Run. See .agents/coding-style.md.'
- pattern: '^t\.Skip$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.Skip. See .agents/coding-style.md.'
- pattern: '^t\.Skipf$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.Skipf. See .agents/coding-style.md.'
- pattern: '^t\.SkipNow$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.SkipNow. See .agents/coding-style.md.'
- pattern: '^t\.Logf$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use GinkgoWriter / fmt.Fprintf(GinkgoWriter, ...) instead of t.Logf. See .agents/coding-style.md.'
- pattern: '^t\.Log$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use GinkgoWriter / fmt.Fprintln(GinkgoWriter, ...) instead of t.Log. See .agents/coding-style.md.'
- pattern: '^t\.Fail$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.Fail. See .agents/coding-style.md.'
- pattern: '^t\.FailNow$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.FailNow. See .agents/coding-style.md.'
exclusions:
paths:
# Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
- 'backend/go/whisper/sources'
- 'docs/'

View File

@@ -2,6 +2,7 @@ version: 2
before:
hooks:
- make protogen-go
- make react-ui
- go mod tidy
dist: release
source:
@@ -22,6 +23,9 @@ builds:
goarch:
- amd64
- arm64
ignore:
- goos: darwin
goarch: amd64
archives:
- formats: [ 'binary' ] # this removes the tar of the archives, leaving the binaries alone
name_template: local-ai-{{ .Tag }}-{{ .Os }}-{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}

42
AGENTS.md Normal file
View File

@@ -0,0 +1,42 @@
# LocalAI Agent Instructions
This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
## Policy for AI-Assisted Contributions
LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
## Topics
| File | When to read |
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| [.agents/vllm-backend.md](.agents/vllm-backend.md) | Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks |
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
| [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |
## Quick Reference
- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI

1
CLAUDE.md Symbolic link
View File

@@ -0,0 +1 @@
AGENTS.md

View File

@@ -7,10 +7,13 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Setting up the Development Environment](#setting-up-the-development-environment)
- [Environment Variables](#environment-variables)
- [Contributing](#contributing)
- [Submitting an Issue](#submitting-an-issue)
- [Development Workflow](#development-workflow)
- [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
- [Coding Guidelines](#coding-guidelines)
- [AI Coding Assistants](#ai-coding-assistants)
- [Testing](#testing)
- [Documentation](#documentation)
- [Community and Communication](#community-and-communication)
@@ -19,18 +22,122 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
### Prerequisites
- Golang [1.21]
- Git
- macOS/Linux
- **Go 1.21+** (the project currently uses Go 1.26 in `go.mod`, but 1.21 is the minimum supported version)
- [Download Go](https://go.dev/dl/) or install via your package manager
- macOS: `brew install go`
- Ubuntu/Debian: follow the [official instructions](https://go.dev/doc/install) (the `apt` version is often outdated)
- Verify: `go version`
- **Git**
- **GNU Make**
- **GCC / C/C++ toolchain** (required for CGo and native backends)
- **Protocol Buffers compiler** (`protoc`) — needed for gRPC code generation
### Setting up the Development Environment and running localAI in the local environment
#### System dependencies by platform
1. Clone the repository: `git clone https://github.com/go-skynet/LocalAI.git`
2. Navigate to the project directory: `cd LocalAI`
3. Install the required dependencies ( see https://localai.io/basics/build/#build-localai-locally )
4. Build LocalAI: `make build`
5. Run LocalAI: `./local-ai`
6. To Build and live reload: `make build-dev`
<details>
<summary><strong>Ubuntu / Debian</strong></summary>
```bash
sudo apt-get update
sudo apt-get install -y build-essential gcc g++ cmake git wget \
protobuf-compiler libprotobuf-dev pkg-config \
libopencv-dev libgrpc-dev
```
</details>
<details>
<summary><strong>CentOS / RHEL / Fedora</strong></summary>
```bash
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y cmake git wget protobuf-compiler protobuf-devel \
opencv-devel grpc-devel
```
</details>
<details>
<summary><strong>macOS</strong></summary>
```bash
xcode-select --install
brew install cmake git protobuf grpc opencv wget
```
</details>
<details>
<summary><strong>Windows</strong></summary>
Use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) with an Ubuntu distribution, then follow the Ubuntu instructions above.
</details>
### Setting up the Development Environment
1. **Clone the repository:**
```bash
git clone https://github.com/mudler/LocalAI.git
cd LocalAI
```
2. **Build LocalAI:**
```bash
make build
```
This runs protobuf generation, installs Go tools, builds the React UI, and compiles the `local-ai` binary. Key build variables you can set:
| Variable | Description | Example |
|---|---|---|
| `BUILD_TYPE` | GPU/accelerator type (`cublas`, `hipblas`, `intel`, ``) | `BUILD_TYPE=cublas make build` |
| `GO_TAGS` | Additional Go build tags | `GO_TAGS=debug make build` |
| `CUDA_MAJOR_VERSION` | CUDA major version (default: `13`) | `CUDA_MAJOR_VERSION=12` |
3. **Run LocalAI:**
```bash
./local-ai
```
4. **Development mode with live reload:**
```bash
make build-dev
```
This installs [`air`](https://github.com/air-verse/air) automatically and watches for file changes, rebuilding and restarting the server on each save.
5. **Containerized build** (no local toolchain needed):
```bash
make docker
```
For GPU-specific Docker builds, see the `docker-build-*` targets in the Makefile and refer to [CLAUDE.md](CLAUDE.md) for detailed backend build instructions.
### Environment Variables
LocalAI is configured primarily through environment variables (or equivalent CLI flags). The most useful ones for development are:
| Variable | Description | Default |
|---|---|---|
| `LOCALAI_DEBUG` | Enable debug mode | `false` |
| `LOCALAI_LOG_LEVEL` | Log verbosity (`error`, `warn`, `info`, `debug`, `trace`) | — |
| `LOCALAI_LOG_FORMAT` | Log format (`default`, `text`, `json`) | `default` |
| `LOCALAI_MODELS_PATH` | Path to model files | `./models` |
| `LOCALAI_BACKENDS_PATH` | Path to backend binaries | `./backends` |
| `LOCALAI_CONFIG_DIR` | Directory for dynamic config files (API keys, external backends) | `./configuration` |
| `LOCALAI_THREADS` | Number of threads for inference | — |
| `LOCALAI_ADDRESS` | Bind address for the API server | `:8080` |
| `LOCALAI_API_KEY` | API key(s) for authentication | — |
| `LOCALAI_CORS` | Enable CORS | `false` |
| `LOCALAI_DISABLE_WEBUI` | Disable the web UI | `false` |
See `core/cli/run.go` for the full list of supported environment variables.
## Contributing
@@ -40,43 +147,162 @@ We welcome contributions from everyone! To get started, follow these steps:
If you find a bug, have a feature request, or encounter any issues, please check the [issue tracker](https://github.com/go-skynet/LocalAI/issues) to see if a similar issue has already been reported. If not, feel free to [create a new issue](https://github.com/go-skynet/LocalAI/issues/new) and provide as much detail as possible.
### Creating a Pull Request (PR)
### Development Workflow
#### Branch naming conventions
Use a descriptive branch name that indicates the type and scope of the change:
- `feature/<short-description>` — new functionality
- `fix/<short-description>` — bug fixes
- `docs/<short-description>` — documentation changes
- `refactor/<short-description>` — code refactoring
#### Commit messages
- Use a short, imperative subject line (e.g., "feat: add whisper backend support", not "Added whisper backend support")
- Keep the subject under 72 characters
- Use the body to explain **why** the change was made when the subject alone is not sufficient
- Use [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/)
#### Creating a Pull Request (PR)
Before jumping into a PR for a massive feature or big change, it is preferred to discuss it first via an issue.
1. Fork the repository.
2. Create a new branch with a descriptive name: `git checkout -b [branch name]`
3. Make your changes and commit them.
4. Push the changes to your fork: `git push origin [branch name]`
5. Create a new pull request from your branch to the main project's `main` or `master` branch.
6. Provide a clear description of your changes in the pull request.
7. Make any requested changes during the review process.
8. Once your PR is approved, it will be merged into the main project.
2. Create a new branch: `git checkout -b feature/my-change`
3. Make your changes, keeping commits focused and atomic.
4. Run tests locally before pushing (see [Testing](#testing) below).
5. Push to your fork: `git push origin feature/my-change`
6. Open a pull request against the `master` branch.
7. Fill in the PR description with:
- What the change does and why
- How it was tested
- Any breaking changes or migration steps
8. Respond to review feedback promptly. Push follow-up commits rather than force-pushing amended commits so reviewers can see incremental changes.
9. Once approved, a maintainer will merge your PR.
## Coding Guidelines
- No specific coding guidelines at the moment. Please make sure the code can be tested. The most popular lint tools like [`golangci-lint`](https://golangci-lint.run) can help you here.
This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
### General Principles
- Write code that can be tested. All new features and bug fixes should include test coverage.
- Use comments sparingly to explain **why** code does something, not **what** it does. Comments should add context that would be difficult to deduce from reading the code alone.
- Keep changes focused. Avoid unrelated refactors, formatting changes, or feature additions in the same PR.
### Go Code
- Prefer modern Go idioms — for example, use `any` instead of `interface{}`.
- Use [`golangci-lint`](https://golangci-lint.run) to catch common issues before submitting a PR.
- Use [`github.com/mudler/xlog`](https://github.com/mudler/xlog) for logging (same API as `slog`). Do not use `fmt.Println` or the standard `log` package for operational logging.
- Use tab indentation for Go files (as defined in `.editorconfig`).
### Python Code
- Use 4-space indentation (as defined in `.editorconfig`).
- Include a `requirements.txt` for any new dependencies.
### Code Review
- All contributions go through code review via pull requests.
- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
- Be responsive to review feedback and keep discussions constructive.
## AI Coding Assistants
LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```
Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
Basic development tools (git, go, make, editors) should not be listed.
- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
- Contributions must remain compatible with LocalAI's **MIT License**.
## Testing
`make test` cannot handle all the model now. Please be sure to add a test case for the new features or the part was changed.
All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
### Running AIO tests
### Running unit tests
All-In-One images has a set of tests that automatically verifies that most of the endpoints works correctly, a flow can be :
```bash
make test
```
This downloads test model fixtures, runs protobuf generation, and executes the full test suite including llama-gguf, TTS, and stable-diffusion tests. Note: some tests require model files to be downloaded, so the first run may take longer.
To run tests for a specific package:
```bash
go test ./core/config/...
go test ./pkg/model/...
```
To run a specific test by name using Ginkgo's `--focus` flag:
```bash
go run github.com/onsi/ginkgo/v2/ginkgo --focus="should load a model" -v -r ./core/
```
### Running end-to-end tests
The e2e tests run LocalAI in a Docker container and exercise the API:
```bash
make test-e2e
```
### Running E2E container tests
These tests build a standard LocalAI Docker image and run it with pre-configured model configs to verify that most endpoints work correctly:
```bash
# Build the LocalAI docker image
make DOCKER_IMAGE=local-ai docker
make docker-build-e2e
# Build the corresponding AIO image
BASE_IMAGE=local-ai DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio
# Run the e2e tests (uses model configs from tests/e2e-aio/models/)
make e2e-aio
```
# Run the AIO e2e tests
LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio make run-e2e-aio
### Testing backends
To prepare and test extra (Python) backends:
```bash
make prepare-test-extra # build Python backends for testing
make test-extra # run backend-specific tests
```
## Documentation
We are welcome the contribution of the documents, please open new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
We welcome contributions to the documentation. Please open a new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
### Gallery YAML Schema
LocalAI provides a JSON Schema for gallery model YAML files at:
`core/schema/gallery-model.schema.json`
This schema mirrors the internal gallery model configuration and can be used by editors (such as VS Code) to enable autocomplete, validation, and inline documentation when creating or modifying gallery files.
To use it with the YAML language server, add the following comment at the top of a gallery YAML file:
```yaml
# yaml-language-server: $schema=../core/schema/gallery-model.schema.json
```
## Community and Communication

View File

@@ -1,15 +1,23 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
ARG BASE_IMAGE=ubuntu:24.04
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
ARG UBUNTU_CODENAME=noble
# Optional alternate Ubuntu apt mirror(s). Empty = use upstream.
# See .docker/apt-mirror.sh for accepted values.
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""
FROM ${BASE_IMAGE} AS requirements
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates curl wget espeak-ng libgomp1 \
ffmpeg libopenblas-base libopenblas-dev && \
ffmpeg libopenblas0 libopenblas-dev libopus0 sox && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
@@ -23,6 +31,7 @@ ARG SKIP_DRIVERS=false
ARG TARGETARCH
ARG TARGETVARIANT
ENV BUILD_TYPE=${BUILD_TYPE}
ARG UBUNTU_VERSION=2404
RUN mkdir -p /run/localai
RUN echo "default" > /run/localai/capability
@@ -33,11 +42,45 @@ RUN <<EOT bash
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils mesa-vulkan-drivers
if [ "amd64" = "$TARGETARCH" ]; then
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
mkdir -p /opt/vulkan-sdk && \
mv 1.4.335.0 /opt/vulkan-sdk/ && \
cd /opt/vulkan-sdk/1.4.335.0 && \
./vulkansdk --no-deps --maxjobs \
vulkan-loader \
vulkan-validationlayers \
vulkan-extensionlayer \
vulkan-tools \
shaderc && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
rm -rf /opt/vulkan-sdk
fi
if [ "arm64" = "$TARGETARCH" ]; then
mkdir vulkan && cd vulkan && \
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
tar -xvf vulkan-sdk.tar.xz && \
rm vulkan-sdk.tar.xz && \
cd 1.4.335.0 && \
cp -rfv aarch64/bin/* /usr/bin/ && \
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
cp -rfv aarch64/include/* /usr/include/ && \
cp -rfv aarch64/share/* /usr/share/ && \
cd ../.. && \
rm -rf vulkan
fi
ldconfig && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
echo "vulkan" > /run/localai/capability
@@ -46,15 +89,19 @@ EOT
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
else
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
fi
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
@@ -65,26 +112,34 @@ RUN <<EOT bash
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
apt-get install -y --no-install-recommends \
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
fi
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
echo "nvidia" > /run/localai/capability
echo "nvidia-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
fi
EOT
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
echo "nvidia-l4t" > /run/localai/capability
echo "nvidia-l4t-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
fi
EOT
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu2204-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu2204-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu2204-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
apt-get update && apt-get install -y nvpl
fi
EOT
@@ -101,6 +156,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
@@ -128,13 +184,12 @@ ENV PATH=/opt/rocm/bin:${PATH}
# The requirements-core target is common to all images. It should not be placed in requirements-core unless every single build will use it.
FROM requirements-drivers AS build-requirements
ARG GO_VERSION=1.22.6
ARG CMAKE_VERSION=3.26.4
ARG GO_VERSION=1.26.0
ARG CMAKE_VERSION=3.31.10
ARG CMAKE_FROM_SOURCE=false
ARG TARGETARCH
ARG TARGETVARIANT
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
@@ -143,6 +198,7 @@ RUN apt-get update && \
curl libssl-dev \
git \
git-lfs \
libopus-dev pkg-config \
unzip upx-ucl python3 python-is-python3 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
@@ -171,14 +227,6 @@ RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
COPY --chmod=644 custom-ca-certs/* /usr/local/share/ca-certificates/
RUN update-ca-certificates
# OpenBLAS requirements and stable diffusion
RUN apt-get update && \
apt-get install -y --no-install-recommends \
libopenblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN test -n "$TARGETARCH" \
|| (echo 'warn: missing $TARGETARCH, either set this `ARG` manually, or run using `docker buildkit`')
@@ -199,10 +247,15 @@ WORKDIR /build
# https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/APT-Repository-not-working-signatures-invalid/m-p/1599436/highlight/true#M36143
# This is a temporary workaround until Intel fixes their repository
FROM ${INTEL_BASE_IMAGE} AS intel
ARG UBUNTU_CODENAME=noble
ARG APT_MIRROR
ARG APT_PORTS_MIRROR
RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
RUN apt-get update && \
RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${UBUNTU_CODENAME}/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
apt-get update && \
apt-get install -y --no-install-recommends \
intel-oneapi-runtime-libs && \
apt-get clean && \
@@ -215,7 +268,7 @@ RUN apt-get update && \
FROM build-requirements AS builder-base
ARG GO_TAGS=""
ARG GO_TAGS="auth"
ARG GRPC_BACKENDS
ARG MAKEFLAGS
ARG LD_FLAGS="-s -w"
@@ -251,6 +304,17 @@ EOT
###################################
###################################
# Build React UI
FROM node:25-slim AS react-ui-builder
WORKDIR /app
COPY core/http/react-ui/package*.json ./
RUN npm install
COPY core/http/react-ui/ ./
RUN npm run build
###################################
###################################
# Compile backends first in a separate stage
FROM builder-base AS builder-backends
ARG TARGETARCH
@@ -267,7 +331,6 @@ COPY ./.git ./.git
# Some of the Go backends use libs from the main src, we could further optimize the caching by building the CPP backends before here
COPY ./pkg/grpc ./pkg/grpc
COPY ./pkg/utils ./pkg/utils
COPY ./pkg/langchain ./pkg/langchain
RUN ls -l ./
RUN make protogen-go
@@ -280,6 +343,9 @@ WORKDIR /build
COPY . .
# Copy pre-built React UI
COPY --from=react-ui-builder /app/dist ./core/http/react-ui/dist
## Build the binary
## If we're on arm64 AND using cublas/hipblas, skip some of the llama-compat backends to save space
## Otherwise just run the normal build
@@ -324,14 +390,17 @@ COPY ./entrypoint.sh .
# Copy the binary
COPY --from=builder /build/local-ai ./
# Copy the opus shim if it was built
RUN --mount=from=builder,src=/build/,dst=/mnt/build \
if [ -f /mnt/build/libopusshim.so ]; then cp /mnt/build/libopusshim.so ./; fi
# Make sure the models directory exists
RUN mkdir -p /models /backends
RUN mkdir -p /models /backends /data
# Define the health check command
HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
CMD curl -f ${HEALTHCHECK_ENDPOINT} || exit 1
VOLUME /models /backends
VOLUME /models /backends /configuration /data
EXPOSE 8080
ENTRYPOINT [ "/entrypoint.sh" ]

View File

@@ -1,8 +0,0 @@
ARG BASE_IMAGE=ubuntu:22.04
FROM ${BASE_IMAGE}
RUN apt-get update && apt-get install -y pciutils && apt-get clean
COPY aio/ /aio
ENTRYPOINT [ "/aio/entrypoint.sh" ]

1147
Makefile
View File

File diff suppressed because it is too large Load Diff

388
README.md
View File

@@ -5,26 +5,14 @@
</h1>
<p align="center">
<a href="https://github.com/go-skynet/LocalAI/fork" target="blank">
<img src="https://img.shields.io/github/forks/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI forks"/>
</a>
<a href="https://github.com/go-skynet/LocalAI/stargazers" target="blank">
<img src="https://img.shields.io/github/stars/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI stars"/>
</a>
<a href="https://github.com/go-skynet/LocalAI/pulls" target="blank">
<img src="https://img.shields.io/github/issues-pr/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI pull-requests"/>
</a>
<a href='https://github.com/go-skynet/LocalAI/releases'>
<img src='https://img.shields.io/github/release/go-skynet/LocalAI?&label=Latest&style=for-the-badge'>
</a>
</p>
<p align="center">
<a href="https://hub.docker.com/r/localai/localai" target="blank">
<img src="https://img.shields.io/badge/dockerhub-images-important.svg?logo=Docker" alt="LocalAI Docker hub"/>
</a>
<a href="https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest" target="blank">
<img src="https://img.shields.io/badge/quay.io-images-important.svg?" alt="LocalAI Quay.io"/>
<a href="LICENSE" target="blank">
<img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" alt="LocalAI License"/>
</a>
</p>
@@ -33,7 +21,7 @@
<img src="https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white&label=LocalAI_API" alt="Follow LocalAI_API"/>
</a>
<a href="https://discord.gg/uJAeKSAGDy" target="blank">
<img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/>
<img src="https://img.shields.io/badge/dynamic/json?color=blue&label=Discord&style=for-the-badge&query=approximate_member_count&url=https%3A%2F%2Fdiscordapp.com%2Fapi%2Finvites%2FuJAeKSAGDy%3Fwith_counts%3Dtrue&logo=discord" alt="Join LocalAI Discord Community"/>
</a>
</p>
@@ -41,329 +29,186 @@
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>
> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
>
> [💻 Quickstart](https://localai.io/basics/getting_started/) [🖼️ Models](https://models.localai.io/) [🚀 Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [🛫 Examples](https://github.com/mudler/LocalAI-examples) Try on
[![Telegram](https://img.shields.io/badge/Telegram-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white)](https://t.me/localaiofficial_bot)
**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
- **Drop-in API compatibility** — OpenAI, Anthropic, ElevenLabs APIs
- **36+ backends** — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
- **Any hardware** — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
- **Multi-user ready** — API key auth, user quotas, role-based access
- **Built-in AI agents** — autonomous agents with tool use, RAG, MCP, and skills
- **Privacy-first** — your data never leaves your infrastructure
**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
Created by [Ettore Di Giacinto](https://github.com/mudler) and maintained by the [LocalAI team](#team).
> [:book: Documentation](https://localai.io/) | [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) | [💻 Quickstart](https://localai.io/basics/getting_started/) | [🖼️ Models](https://models.localai.io/) | [❓FAQ](https://localai.io/faq/)
## 📚🆕 Local Stack Family
## Guided tour
🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
<table>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalAGI">
<img src="https://raw.githubusercontent.com/mudler/LocalAGI/refs/heads/main/webui/react-ui/public/logo_2.png" width="300" alt="LocalAGI Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalAGI">LocalAGI</a></h3>
<p>A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.</p>
</td>
</tr>
<tr>
<td width="50%" valign="top">
<a href="https://github.com/mudler/LocalRecall">
<img src="https://raw.githubusercontent.com/mudler/LocalRecall/refs/heads/main/static/localrecall_horizontal.png" width="300" alt="LocalRecall Logo">
</a>
</td>
<td width="50%" valign="top">
<h3><a href="https://github.com/mudler/LocalRecall">LocalRecall</a></h3>
<p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
</td>
</tr>
</table>
<details>
## Screenshots
<summary>
Click to see more!
</summary>
#### User and auth
| Talk Interface | Generate Audio |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-36 LocalAI - Talk](./docs/assets/images/screenshots/screenshot_tts.png) | ![Screenshot 2025-03-31 at 12-01-29 LocalAI - Generate audio with voice-en-us-ryan-low](./docs/assets/images/screenshots/screenshot_tts.png) |
https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c
| Models Overview | Generate Images |
| --- | --- |
| ![Screenshot 2025-03-31 at 12-01-20 LocalAI - Models](./docs/assets/images/screenshots/screenshot_gallery.png) | ![Screenshot 2025-03-31 at 12-31-41 LocalAI - Generate images with flux 1-dev](./docs/assets/images/screenshots/screenshot_image.png) |
#### Agents
| Chat Interface | Home |
| --- | --- |
| ![Screenshot 2025-03-31 at 11-57-44 LocalAI - Chat with localai-functioncall-qwen2 5-7b-v0 5](./docs/assets/images/screenshots/screenshot_chat.png) | ![Screenshot 2025-03-31 at 11-57-23 LocalAI API - c2a39e3 (c2a39e3639227cfd94ffffe9f5691239acc275a8)](./docs/assets/images/screenshots/screenshot_home.png) |
https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
| Login | Swarm |
| --- | --- |
|![Screenshot 2025-03-31 at 12-09-59 ](./docs/assets/images/screenshots/screenshot_login.png) | ![Screenshot 2025-03-31 at 12-10-39 LocalAI - P2P dashboard](./docs/assets/images/screenshots/screenshot_p2p.png) |
#### Usage metrics per user
## 💻 Quickstart
https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f
Run the installer script:
#### Fine-tuning and Quantization
```bash
# Basic installation
curl https://localai.io/install.sh | sh
```
https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee
For more installation options, see [Installer Options](https://localai.io/docs/advanced/installer/).
#### WebRTC
### macOS Download:
https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b
</details>
## Quickstart
### macOS
<a href="https://github.com/mudler/LocalAI/releases/latest/download/LocalAI.dmg">
<img src="https://img.shields.io/badge/Download-macOS-blue?style=for-the-badge&logo=apple&logoColor=white" alt="Download LocalAI for macOS"/>
</a>
> Note: the DMGs are not signed by Apple as quarantined. See https://github.com/mudler/LocalAI/issues/6268 for a workaround, fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
> **Note:** The DMG is not signed by Apple. After installing, run: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`. See [#6268](https://github.com/mudler/LocalAI/issues/6268) for details.
Or run with docker:
### Containers (Docker, podman, ...)
> **💡 Docker Run vs Docker Start**
>
> - `docker run` creates and starts a new container. If a container with the same name already exists, this command will fail.
> - `docker start` starts an existing container that was previously created with `docker run`.
>
> If you've already run LocalAI before and want to start it again, use: `docker start -i local-ai`
> Already ran LocalAI before? Use `docker start -i local-ai` to restart an existing container.
### CPU only image:
#### CPU only:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
```
### NVIDIA GPU Images:
#### NVIDIA GPU:
```bash
# CUDA 12.0
# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CUDA 11.7
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11
# NVIDIA Jetson (L4T) ARM64
# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
```
### AMD GPU Images (ROCm):
#### AMD GPU (ROCm):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
```
### Intel GPU Images (oneAPI):
#### Intel GPU (oneAPI):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
```
### Vulkan GPU Images:
#### Vulkan GPU:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
```
### AIO Images (pre-downloaded models):
### Loading models
```bash
# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11
# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
```
For more information about the AIO images and pre-downloaded models, see [Container Documentation](https://localai.io/basics/container/).
To load models:
```bash
# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
```
> **Automatic Backend Detection**: When you install models from the gallery or YAML files, LocalAI automatically detects your system's GPU capabilities (NVIDIA, AMD, Intel) and downloads the appropriate backend. For advanced configuration options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/#automatic-backend-detection).
> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).
For more information, see [💻 Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).
## 📰 Latest project news
## Latest News
- October 2025: 🔌 [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support added for agentic capabilities with external tools
- September 2025: New Launcher application for MacOS and Linux, extended support to many backends for Mac and Nvidia L4T devices. Models: Added MLX-Audio, WAN 2.2. WebUI improvements and Python-based backends now ships portable python environments.
- August 2025: MLX, MLX-VLM, Diffusers and llama.cpp are now supported on Mac M1/M2/M3+ chips ( with `development` suffix in the gallery ): https://github.com/mudler/LocalAI/pull/6049 https://github.com/mudler/LocalAI/pull/6119 https://github.com/mudler/LocalAI/pull/6121 https://github.com/mudler/LocalAI/pull/6060
- July/August 2025: 🔍 [Object Detection](https://localai.io/features/object-detection/) added to the API featuring [rf-detr](https://github.com/roboflow/rf-detr)
- July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. [Read the release notes](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
- June 2025: [Backend management](https://github.com/mudler/LocalAI/pull/5607) has been added. Attention: extras images are going to be deprecated from the next release! Read [the backend management PR](https://github.com/mudler/LocalAI/pull/5607).
- May 2025: [Audio input](https://github.com/mudler/LocalAI/pull/5466) and [Reranking](https://github.com/mudler/LocalAI/pull/5396) in llama.cpp backend, [Realtime API](https://github.com/mudler/LocalAI/pull/5392), Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).
- May 2025: Important: image name changes [See release](https://github.com/mudler/LocalAI/releases/tag/v2.29.0)
- Apr 2025: Rebrand, WebUI enhancements
- Apr 2025: [LocalAGI](https://github.com/mudler/LocalAGI) and [LocalRecall](https://github.com/mudler/LocalRecall) join the LocalAI family stack.
- Apr 2025: WebUI overhaul, AIO images updates
- Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images
- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
- Aug 2024: 🆕 FLUX-1, [P2P Explorer](https://explorer.localai.io)
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- May 2024: 🔥🔥 Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/
- May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
- **April 2026**: [Voice recognition](https://github.com/mudler/LocalAI/pull/9500), [Face recognition, identification & liveness detection](https://github.com/mudler/LocalAI/pull/9480), [Ollama API compatibility](https://github.com/mudler/LocalAI/pull/9284), [Video generation in stable-diffusion.ggml](https://github.com/mudler/LocalAI/pull/9420), [Backend versioning with auto-upgrade](https://github.com/mudler/LocalAI/pull/9315), [Pin models & load-on-demand toggle](https://github.com/mudler/LocalAI/pull/9309), [Universal model importer](https://github.com/mudler/LocalAI/pull/9466), new backends: [sglang](https://github.com/mudler/LocalAI/pull/9359), [ik-llama-cpp](https://github.com/mudler/LocalAI/pull/9326), [TurboQuant](https://github.com/mudler/LocalAI/pull/9355), [sam.cpp](https://github.com/mudler/LocalAI/pull/9288), [Kokoros](https://github.com/mudler/LocalAI/pull/9212), [qwen3tts.cpp](https://github.com/mudler/LocalAI/pull/9316), [tinygrad multimodal](https://github.com/mudler/LocalAI/pull/9364)
- **March 2026**: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
- **February 2026**: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- **January 2026**: **LocalAI 3.10.0** — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0)
- **December 2025**: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic multi-GPU model fitting (llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
- **November 2025**: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245), [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
- **October 2025**: [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support for agentic capabilities
- **September 2025**: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
- **August 2025**: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
- **July 2025**: All backends migrated outside the main binary — [lightweight, modular architecture](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
For older news and full release notes, see [GitHub Releases](https://github.com/mudler/LocalAI/releases) and the [News page](https://localai.io/basics/news/).
## 🚀 [Features](https://localai.io/features/)
## Features
- 🧩 [Backend Gallery](https://localai.io/backends/): Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.
- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation](https://localai.io/features/image-generation)
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 🔍 [Object Detection](https://localai.io/features/object-detection/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- 🆕🔌 [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) - Agentic capabilities with external tools and [LocalAGI's Agentic capabilities](https://github.com/mudler/LocalAGI)
- 🔊 Voice activity detection (Silero-VAD support)
- 🌍 Integrated WebUI!
- [Text generation](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [and more](https://localai.io/model-compatibility/))
- [Text to Audio](https://localai.io/features/text-to-audio/)
- [Audio to Text](https://localai.io/features/audio-to-text/)
- [Image generation](https://localai.io/features/image-generation)
- [OpenAI-compatible tools API](https://localai.io/features/openai-functions/)
- [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
- [Embeddings generation](https://localai.io/features/embeddings/)
- [Constrained grammars](https://localai.io/features/constrained_grammars/)
- [Download models from Huggingface](https://localai.io/models/)
- [Vision API](https://localai.io/features/gpt-vision/)
- [Object Detection](https://localai.io/features/object-detection/)
- [Reranker API](https://localai.io/features/reranker/)
- [P2P Inferencing](https://localai.io/features/distribute/)
- [Distributed Mode](https://localai.io/features/distributed-mode/) — Horizontal scaling with PostgreSQL + NATS
- [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/)
- [Built-in Agents](https://localai.io/features/agents/) — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and [Agent Hub](https://agenthub.localai.io)
- [Backend Gallery](https://localai.io/backends/) — Install/remove backends on the fly via OCI images
- Voice Activity Detection (Silero-VAD)
- Integrated WebUI
## 🧩 Supported Backends & Acceleration
## Supported Backends & Acceleration
LocalAI supports a comprehensive range of AI backends with multiple acceleration options:
LocalAI supports **36+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
### Text Generation & Language Models
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12, ROCm, Intel |
| **transformers** | HuggingFace transformers framework | CUDA 11/12, ROCm, Intel, CPU |
| **exllama2** | GPTQ inference library | CUDA 12 |
| **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
| **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
### Audio & Speech Processing
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12, ROCm, Intel, CPU |
| **bark** | Text-to-audio generation | CUDA 12, ROCm, Intel |
| **bark-cpp** | C++ implementation of Bark | CUDA, Metal, CPU |
| **coqui** | Advanced TTS with 1100+ languages | CUDA 12, ROCm, Intel, CPU |
| **kokoro** | Lightweight TTS model | CUDA 12, ROCm, Intel, CPU |
| **chatterbox** | Production-grade TTS | CUDA 11/12, CPU |
| **piper** | Fast neural TTS system | CPU |
| **kitten-tts** | Kitten TTS models | CPU |
| **silero-vad** | Voice Activity Detection | CPU |
| **neutts** | Text-to-speech with voice cloning | CUDA 12, ROCm, CPU |
## Resources
### Image & Video Generation
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12, Intel SYCL, Vulkan, CPU |
| **diffusers** | HuggingFace diffusion models | CUDA 11/12, ROCm, Intel, Metal, CPU |
- [Documentation](https://localai.io/)
- [LLM fine-tuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [Build from source](https://localai.io/basics/build/)
- [Kubernetes installation](https://localai.io/basics/getting_started/#run-localai-in-kubernetes)
- [Integrations & community projects](https://localai.io/docs/integrations/)
- [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
- [Examples](https://github.com/mudler/LocalAI-examples)
### Specialized AI Tasks
| Backend | Description | Acceleration Support |
|---------|-------------|---------------------|
| **rfdetr** | Real-time object detection | CUDA 12, Intel, CPU |
| **rerankers** | Document reranking API | CUDA 11/12, ROCm, Intel, CPU |
| **local-store** | Vector database | CPU |
| **huggingface** | HuggingFace API integration | API-based |
## Team
### Hardware Acceleration Matrix
LocalAI is maintained by a small team of humans, together with the wider community of contributors.
| Acceleration Type | Supported Backends | Hardware Support |
|-------------------|-------------------|------------------|
| **NVIDIA CUDA 11** | llama.cpp, whisper, stablediffusion, diffusers, rerankers, bark, chatterbox | Nvidia hardware |
| **NVIDIA CUDA 12** | All CUDA-compatible backends | Nvidia hardware |
| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, bark, neutts | AMD Graphics |
| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark | Intel Arc, Intel iGPUs |
| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+ |
| **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
| **NVIDIA Jetson** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI |
| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |
- **[Ettore Di Giacinto](https://github.com/mudler)** — original author and project lead
- **[Richard Palethorpe](https://github.com/richiejp)** — maintainer
### 🔗 Community and integrations
Build and deploy custom containers:
- https://github.com/sozercan/aikit
WebUIs:
- https://github.com/Jirubizu/localai-admin
- https://github.com/go-skynet/LocalAI-frontend
- QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot
Agentic Libraries:
- https://github.com/mudler/cogito
MCPs:
- https://github.com/mudler/MCPs
Model galleries
- https://github.com/go-skynet/model-gallery
Voice:
- https://github.com/richiejp/VoxInput
Other:
- Helm chart https://github.com/go-skynet/helm-charts
- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
- Langchain: https://python.langchain.com/docs/integrations/providers/localai/
- Terminal utility https://github.com/djcopley/ShellOracle
- Local Smart assistant https://github.com/mudler/LocalAGI
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
- Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot
- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
- Another Telegram Bot https://github.com/JackBekket/Hellper
- Auto-documentation https://github.com/JackBekket/Reflexia
- Github bot which answer on issues, with code and documentation as context https://github.com/JackBekket/GitHelper
- Github Actions: https://github.com/marketplace/actions/start-localai
- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
### 🔗 Resources
- [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [How to build locally](https://localai.io/basics/build/index.html)
- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
- [Projects integrating LocalAI](https://localai.io/docs/integrations/)
- [How tos section](https://io.midori-ai.xyz/howtos/) (curated by our community)
## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
- [Run Visual studio code with LocalAI (SUSE)](https://www.suse.com/c/running-ai-locally/)
- 🆕 [Run LocalAI on Jetson Nano Devkit](https://mudler.pm/posts/local-ai-jetson-nano-devkit/)
- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/blog/low-code-llm-apps-with-local-ai-flowise-and-pulumi/)
- [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)
A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in [Discord](https://discord.gg/uJAeKSAGDy) — LocalAI is a community-driven project and wouldn't exist without you. See the full [contributors list](https://github.com/mudler/LocalAI/graphs/contributors).
## Citation
@@ -379,7 +224,7 @@ If you utilize this repository, data in a downstream project, please consider ci
howpublished = {\url{https://github.com/go-skynet/LocalAI}},
```
## ❤️ Sponsors
## Sponsors
> Do you find LocalAI useful?
@@ -396,17 +241,21 @@ A huge thank you to our generous sponsors who support this project covering CI e
</a>
</p>
## 🌟 Star history
### Individual sponsors
A special thanks to individual sponsors, a full list is on [GitHub](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler). Special shout out to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
## Star history
[![LocalAI Star history Chart](https://api.star-history.com/svg?repos=go-skynet/LocalAI&type=Date)](https://star-history.com/#go-skynet/LocalAI&Date)
## 📖 License
## License
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/) and maintained by the [LocalAI team](#team).
MIT - Author Ettore Di Giacinto <mudler@localai.io>
## 🙇 Acknowledgements
## Acknowledgements
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
@@ -417,10 +266,11 @@ LocalAI couldn't have been built without the help of great software already avai
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/ggerganov/whisper.cpp
- https://github.com/rhasspy/piper
- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation
## 🤗 Contributors
## Contributors
This is a community project, a special thanks to our contributors! 🤗
This is a community project, a special thanks to our contributors!
<a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
<img src="https://contrib.rocks/image?repo=go-skynet/LocalAI" />
</a>

View File

@@ -8,10 +8,24 @@ At LocalAI, we take the security of our software seriously. We understand the im
We provide support and updates for certain versions of our software. The following table outlines which versions are currently supported with security updates:
| Version | Supported |
| ------- | ------------------ |
| > 2.0 | :white_check_mark: |
| < 2.0 | :x: |
| Version Series | Support Level | Details |
| -------------- | ------------- | ------- |
| 3.x | :white_check_mark: Actively supported | Full security updates and bug fixes for the latest minor versions. |
| 2.x | :warning: Security fixes only | Critical security patches only, until **December 31, 2025**. |
| 1.x | :x: End-of-life (EOL) | No longer supported as of **January 1, 2024**. No security fixes will be provided. |
### What each support level means
- **Actively supported (3.x):** Receives all security updates, bug fixes, and new features. Users should stay on the latest 3.x minor release for the best protection.
- **Security fixes only (2.x):** Receives only critical security patches (e.g., remote code execution, authentication bypass, data exposure). No bug fixes or new features. Support ends December 31, 2025.
- **End-of-life (1.x):** No updates of any kind. Users on 1.x are strongly encouraged to upgrade immediately, as known vulnerabilities will not be patched.
### Migrating from older versions
If you are running an unsupported or soon-to-be-unsupported version, we recommend upgrading as soon as possible:
- **From 1.x to 3.x:** Version 1.x reached end-of-life on January 1, 2024. Review the [release notes](https://github.com/mudler/LocalAI/releases) for breaking changes across major versions, and upgrade directly to the latest 3.x release.
- **From 2.x to 3.x:** While 2.x still receives critical security patches until December 31, 2025, we recommend planning your migration to 3.x to benefit from ongoing improvements and full support.
Please ensure that you are using a supported version to receive the latest security updates.

View File

@@ -1,5 +0,0 @@
## AIO CPU size
Use this image with CPU-only.
Please keep using only C++ backends so the base image is as small as possible (without CUDA, cuDNN, python, etc).

View File

@@ -1,13 +0,0 @@
embeddings: true
name: text-embedding-ada-002
backend: llama-cpp
parameters:
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "Your text string goes here",
"model": "text-embedding-ada-002"
}'

View File

@@ -1,33 +0,0 @@
name: jina-reranker-v1-base-en
reranking: true
f16: true
parameters:
model: jina-reranker-v1-tiny-en.f16.gguf
backend: llama-cpp
download_files:
- filename: jina-reranker-v1-tiny-en.f16.gguf
sha256: 5f696cf0d0f3d347c4a279eee8270e5918554cdac0ed1f632f2619e4e8341407
uri: huggingface://mradermacher/jina-reranker-v1-tiny-en-GGUF/jina-reranker-v1-tiny-en.f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "jina-reranker-v1-base-en",
"query": "Organic skincare products for sensitive skin",
"documents": [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
],
"top_n": 3
}'

View File

@@ -1,18 +0,0 @@
name: whisper-1
backend: whisper
parameters:
model: ggml-whisper-base.bin
usage: |
## example audio file
wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
## Send the example audio file to the transcriptions endpoint
curl http://localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@$PWD/gb1.ogg" -F model="whisper-1"
download_files:
- filename: "ggml-whisper-base.bin"
sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"

View File

@@ -1,15 +0,0 @@
name: tts-1
download_files:
- filename: voice-en-us-amy-low.tar.gz
uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
backend: piper
parameters:
model: en-us-amy-low.onnx
usage: |
To test if this model works as expected, you can use the following curl command:
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"voice-en-us-amy-low",
"input": "Hi, this is a test."
}'

View File

@@ -1,8 +0,0 @@
backend: silero-vad
name: silero-vad
parameters:
model: silero-vad.onnx
download_files:
- filename: silero-vad.onnx
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808

View File

@@ -1,50 +0,0 @@
context_size: 4096
f16: true
backend: llama-cpp
mmap: true
mmproj: minicpm-v-4_5-mmproj-f16.gguf
name: gpt-4o
parameters:
model: minicpm-v-4_5-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
- <|endoftext|>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |
<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: minicpm-v-4_5-Q4_K_M.gguf
sha256: c1c3c33100b15b4caf7319acce4e23c0eb0ce1cbd12f70e8d24f05aa67b7512f
uri: huggingface://openbmb/MiniCPM-V-4_5-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-4_5-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-4_5-gguf/mmproj-model-f16.gguf
sha256: 7a7225a32e8d453aaa3d22d8c579b5bf833c253f784cdb05c99c9a76fd616df8

View File

@@ -1,138 +0,0 @@
#!/bin/bash
echo "===> LocalAI All-in-One (AIO) container starting..."
GPU_ACCELERATION=false
GPU_VENDOR=""
function check_intel() {
if lspci | grep -E 'VGA|3D' | grep -iq intel; then
echo "Intel GPU detected"
if [ -d /opt/intel ]; then
GPU_ACCELERATION=true
GPU_VENDOR=intel
else
echo "Intel GPU detected, but Intel GPU drivers are not installed. GPU acceleration will not be available."
fi
fi
}
function check_nvidia_wsl() {
if lspci | grep -E 'VGA|3D' | grep -iq "Microsoft Corporation Device 008e"; then
# We make the assumption this WSL2 cars is NVIDIA, then check for nvidia-smi
# Make sure the container was run with `--gpus all` as the only required parameter
echo "NVIDIA GPU detected via WSL2"
# nvidia-smi should be installed in the container
if nvidia-smi; then
GPU_ACCELERATION=true
GPU_VENDOR=nvidia
else
echo "NVIDIA GPU detected via WSL2, but nvidia-smi is not installed. GPU acceleration will not be available."
fi
fi
}
function check_amd() {
if lspci | grep -E 'VGA|3D' | grep -iq amd; then
echo "AMD GPU detected"
# Check if ROCm is installed
if [ -d /opt/rocm ]; then
GPU_ACCELERATION=true
GPU_VENDOR=amd
else
echo "AMD GPU detected, but ROCm is not installed. GPU acceleration will not be available."
fi
fi
}
function check_nvidia() {
if lspci | grep -E 'VGA|3D' | grep -iq nvidia; then
echo "NVIDIA GPU detected"
# nvidia-smi should be installed in the container
if nvidia-smi; then
GPU_ACCELERATION=true
GPU_VENDOR=nvidia
else
echo "NVIDIA GPU detected, but nvidia-smi is not installed. GPU acceleration will not be available."
fi
fi
}
function check_metal() {
if system_profiler SPDisplaysDataType | grep -iq 'Metal'; then
echo "Apple Metal supported GPU detected"
GPU_ACCELERATION=true
GPU_VENDOR=apple
fi
}
function detect_gpu() {
case "$(uname -s)" in
Linux)
check_nvidia
check_amd
check_intel
check_nvidia_wsl
;;
Darwin)
check_metal
;;
esac
}
function detect_gpu_size() {
# Attempting to find GPU memory size for NVIDIA GPUs
if [ "$GPU_ACCELERATION" = true ] && [ "$GPU_VENDOR" = "nvidia" ]; then
echo "NVIDIA GPU detected. Attempting to find memory size..."
# Using head -n 1 to get the total memory of the 1st NVIDIA GPU detected.
# If handling multiple GPUs is required in the future, this is the place to do it
nvidia_sm=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
if [ ! -z "$nvidia_sm" ]; then
echo "Total GPU Memory: $nvidia_sm MiB"
# if bigger than 8GB, use 16GB
#if [ "$nvidia_sm" -gt 8192 ]; then
# GPU_SIZE=gpu-16g
#else
GPU_SIZE=gpu-8g
#fi
else
echo "Unable to determine NVIDIA GPU memory size. Falling back to CPU."
GPU_SIZE=gpu-8g
fi
elif [ "$GPU_ACCELERATION" = true ] && [ "$GPU_VENDOR" = "intel" ]; then
GPU_SIZE=intel
# Default to a generic GPU size until we implement GPU size detection for non NVIDIA GPUs
elif [ "$GPU_ACCELERATION" = true ]; then
echo "Non-NVIDIA GPU detected. Specific GPU memory size detection is not implemented."
GPU_SIZE=gpu-8g
# default to cpu if GPU_SIZE is not set
else
echo "GPU acceleration is not enabled or supported. Defaulting to CPU."
GPU_SIZE=cpu
fi
}
function check_vars() {
if [ -z "$MODELS" ]; then
echo "MODELS environment variable is not set. Please set it to a comma-separated list of model YAML files to load."
exit 1
fi
if [ -z "$PROFILE" ]; then
echo "PROFILE environment variable is not set. Please set it to one of the following: cpu, gpu-8g, gpu-16g, apple"
exit 1
fi
}
detect_gpu
detect_gpu_size
PROFILE="${PROFILE:-$GPU_SIZE}" # default to cpu
export MODELS="${MODELS:-/aio/${PROFILE}/embeddings.yaml,/aio/${PROFILE}/rerank.yaml,/aio/${PROFILE}/text-to-speech.yaml,/aio/${PROFILE}/image-gen.yaml,/aio/${PROFILE}/text-to-text.yaml,/aio/${PROFILE}/speech-to-text.yaml,/aio/${PROFILE}/vad.yaml,/aio/${PROFILE}/vision.yaml}"
check_vars
echo "===> Starting LocalAI[$PROFILE] with the following models: $MODELS"
exec /entrypoint.sh "$@"

View File

@@ -1,13 +0,0 @@
embeddings: true
name: text-embedding-ada-002
backend: llama-cpp
parameters:
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "Your text string goes here",
"model": "text-embedding-ada-002"
}'

View File

@@ -1,25 +0,0 @@
name: stablediffusion
parameters:
model: DreamShaper_8_pruned.safetensors
backend: diffusers
step: 25
f16: true
diffusers:
pipeline_type: StableDiffusionPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps"
scheduler_type: "k_dpmpp_2m"
download_files:
- filename: DreamShaper_8_pruned.safetensors
uri: huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors
usage: |
curl http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"prompt": "<positive prompt>|<negative prompt>",
"step": 25,
"size": "512x512"
}'

View File

@@ -1,33 +0,0 @@
name: jina-reranker-v1-base-en
reranking: true
f16: true
parameters:
model: jina-reranker-v1-tiny-en.f16.gguf
backend: llama-cpp
download_files:
- filename: jina-reranker-v1-tiny-en.f16.gguf
sha256: 5f696cf0d0f3d347c4a279eee8270e5918554cdac0ed1f632f2619e4e8341407
uri: huggingface://mradermacher/jina-reranker-v1-tiny-en-GGUF/jina-reranker-v1-tiny-en.f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "jina-reranker-v1-base-en",
"query": "Organic skincare products for sensitive skin",
"documents": [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
],
"top_n": 3
}'

View File

@@ -1,18 +0,0 @@
name: whisper-1
backend: whisper
parameters:
model: ggml-whisper-base.bin
usage: |
## example audio file
wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
## Send the example audio file to the transcriptions endpoint
curl http://localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@$PWD/gb1.ogg" -F model="whisper-1"
download_files:
- filename: "ggml-whisper-base.bin"
sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"

View File

@@ -1,15 +0,0 @@
name: tts-1
download_files:
- filename: voice-en-us-amy-low.tar.gz
uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
backend: piper
parameters:
model: en-us-amy-low.onnx
usage: |
To test if this model works as expected, you can use the following curl command:
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"tts-1",
"input": "Hi, this is a test."
}'

View File

@@ -1,54 +0,0 @@
context_size: 4096
f16: true
backend: llama-cpp
function:
capture_llm_results:
- (?s)<Thought>(.*?)</Thought>
grammar:
properties_order: name,arguments
json_regex_match:
- (?s)<Output>(.*?)</Output>
replace_llm_results:
- key: (?s)<Thought>(.*?)</Thought>
value: ""
mmap: true
name: gpt-4
parameters:
model: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |
<|im_start|>system
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
sha256: 4e7b7fe1d54b881f1ef90799219dc6cc285d29db24f559c8998d1addb35713d4
uri: huggingface://mudler/LocalAI-functioncall-qwen2.5-7b-v0.5-Q4_K_M-GGUF/localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf

View File

@@ -1,8 +0,0 @@
backend: silero-vad
name: silero-vad
parameters:
model: silero-vad.onnx
download_files:
- filename: silero-vad.onnx
uri: https://huggingface.co/onnx-community/silero-vad/resolve/main/onnx/model.onnx
sha256: a4a068cd6cf1ea8355b84327595838ca748ec29a25bc91fc82e6c299ccdc5808

View File

@@ -1,50 +0,0 @@
context_size: 4096
backend: llama-cpp
f16: true
mmap: true
mmproj: minicpm-v-4_5-mmproj-f16.gguf
name: gpt-4o
parameters:
model: minicpm-v-4_5-Q4_K_M.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
- <|endoftext|>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |
<|im_start|>system
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: minicpm-v-4_5-Q4_K_M.gguf
sha256: c1c3c33100b15b4caf7319acce4e23c0eb0ce1cbd12f70e8d24f05aa67b7512f
uri: huggingface://openbmb/MiniCPM-V-4_5-gguf/ggml-model-Q4_K_M.gguf
- filename: minicpm-v-4_5-mmproj-f16.gguf
uri: huggingface://openbmb/MiniCPM-V-4_5-gguf/mmproj-model-f16.gguf
sha256: 7a7225a32e8d453aaa3d22d8c579b5bf833c253f784cdb05c99c9a76fd616df8

View File

@@ -1,13 +0,0 @@
embeddings: true
name: text-embedding-ada-002
backend: llama-cpp
parameters:
model: huggingface://bartowski/granite-embedding-107m-multilingual-GGUF/granite-embedding-107m-multilingual-f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
"input": "Your text string goes here",
"model": "text-embedding-ada-002"
}'

View File

@@ -1,20 +0,0 @@
name: stablediffusion
parameters:
model: Lykon/dreamshaper-8
backend: diffusers
step: 25
f16: true
diffusers:
pipeline_type: StableDiffusionPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps"
scheduler_type: "k_dpmpp_2m"
usage: |
curl http://localhost:8080/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"prompt": "<positive prompt>|<negative prompt>",
"step": 25,
"size": "512x512"
}'

View File

@@ -1,33 +0,0 @@
name: jina-reranker-v1-base-en
reranking: true
f16: true
parameters:
model: jina-reranker-v1-tiny-en.f16.gguf
backend: llama-cpp
download_files:
- filename: jina-reranker-v1-tiny-en.f16.gguf
sha256: 5f696cf0d0f3d347c4a279eee8270e5918554cdac0ed1f632f2619e4e8341407
uri: huggingface://mradermacher/jina-reranker-v1-tiny-en-GGUF/jina-reranker-v1-tiny-en.f16.gguf
usage: |
You can test this model with curl like this:
curl http://localhost:8080/v1/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "jina-reranker-v1-base-en",
"query": "Organic skincare products for sensitive skin",
"documents": [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
],
"top_n": 3
}'

View File

@@ -1,18 +0,0 @@
name: whisper-1
backend: whisper
parameters:
model: ggml-whisper-base.bin
usage: |
## example audio file
wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
## Send the example audio file to the transcriptions endpoint
curl http://localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F file="@$PWD/gb1.ogg" -F model="whisper-1"
download_files:
- filename: "ggml-whisper-base.bin"
sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"

View File

@@ -1,15 +0,0 @@
name: tts-1
download_files:
- filename: voice-en-us-amy-low.tar.gz
uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
backend: piper
parameters:
model: en-us-amy-low.onnx
usage: |
To test if this model works as expected, you can use the following curl command:
curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
"model":"tts-1",
"input": "Hi, this is a test."
}'

View File

@@ -1,54 +0,0 @@
context_size: 4096
f16: true
backend: llama-cpp
function:
capture_llm_results:
- (?s)<Thought>(.*?)</Thought>
grammar:
properties_order: name,arguments
json_regex_match:
- (?s)<Output>(.*?)</Output>
replace_llm_results:
- key: (?s)<Thought>(.*?)</Thought>
value: ""
mmap: true
name: gpt-4
parameters:
model: localai-functioncall-qwen2.5-7b-v0.5-q4_k_m.gguf
stopwords:
- <|im_end|>
- <dummy32000>
- </s>
template:
chat: |
{{.Input -}}
<|im_start|>assistant
chat_message: |
<|im_start|>{{ .RoleName }}
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}<|im_end|>
completion: |
{{.Input}}
function: |
<|im_start|>system
You are an AI assistant that executes function calls, and these are the tools at your disposal:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
<|im_end|>
{{.Input -}}
<|im_start|>assistant
download_files:
- filename: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
sha256: 23fee048ded2a6e2e1a7b6bbefa6cbf83068f194caa9552aecbaa00fec8a16d5
uri: huggingface://mudler/LocalAI-functioncall-phi-4-v0.3-Q4_K_M-GGUF/localai-functioncall-phi-4-v0.3-q4_k_m.gguf

View File

@@ -1,128 +1,42 @@
ARG BASE_IMAGE=ubuntu:22.04
# Builds a single Go backend on top of the shared
# .docker/bases/Dockerfile.golang base. The base bakes in apt + GPU SDK +
# Go toolchain + protoc + grpc tooling, so this stage only carries the
# per-backend opus-dev install + COPY + `make build`.
#
# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) builds
# the right base flavour automatically via scripts/changed-backends.js
# and passes BASE_IMAGE_PREBUILT here. For local builds, run:
# make backend-image-base LANG=golang BUILD_TYPE=<...>
# make backend-image BACKEND=<...> BUILD_TYPE=<...>
# See .agents/ci-caching.md.
ARG BASE_IMAGE_PREBUILT
FROM ${BASE_IMAGE_PREBUILT} AS builder
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.22.6
ARG AMDGPU_TARGETS
ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
git ccache \
ca-certificates \
make cmake \
curl unzip \
libssl-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# opus-dev is only needed for the opus backend; install on demand to keep
# every other golang backend's base image lean.
RUN if [ "${BACKEND}" = "opus" ]; then \
apt-get update && apt-get install -y --no-install-recommends libopus-dev pkg-config && \
apt-get clean && rm -rf /var/lib/apt/lists/*; \
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig \
; fi
# Install Go
RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin:/usr/local/bin
# Install grpc compilers
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
RUN echo "TARGETARCH: $TARGETARCH"
# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below
# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
# here so that we can generate the grpc code for the stablediffusion build
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT
COPY . /LocalAI
RUN git config --global --add safe.directory /LocalAI
RUN cd /LocalAI && make protogen-go && make -C /LocalAI/backend/go/${BACKEND} build
FROM scratch

View File

@@ -0,0 +1,54 @@
# Builds the ik-llama-cpp backend on top of the shared
# .docker/bases/Dockerfile.cpp base (shared with llama-cpp/turboquant).
# See backend/Dockerfile.llama-cpp for the rationale; this consumer differs
# only in the make targets at the end.
ARG BASE_IMAGE_PREBUILT
FROM ${BASE_IMAGE_PREBUILT} AS builder
ARG CUDA_DOCKER_ARCH
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ARG CMAKE_ARGS
ENV CMAKE_ARGS=${CMAKE_ARGS}
ARG BACKEND=ik-llama-cpp
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ARG TARGETARCH
ARG TARGETVARIANT
COPY . /LocalAI
RUN <<'EOT' bash
set -euxo pipefail
if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
rm -rf /LocalAI/backend/cpp/ik-llama-cpp-*-build
fi
cd /LocalAI/backend/cpp/ik-llama-cpp
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
# ARM64 / ROCm: build without x86 SIMD
make ik-llama-cpp-fallback
else
# ik_llama.cpp's IQK kernels require at least AVX2
make ik-llama-cpp-avx2
fi
EOT
RUN make -BC /LocalAI/backend/cpp/ik-llama-cpp package
FROM scratch
COPY --from=builder /LocalAI/backend/cpp/ik-llama-cpp/package/. ./

View File

@@ -1,198 +1,58 @@
ARG BASE_IMAGE=ubuntu:22.04
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
# Builds the llama-cpp backend on top of the shared
# .docker/bases/Dockerfile.cpp base. The base bakes in apt + GPU SDK +
# protoc + cmake + GRPC, so this stage only carries the COPY + `make`
# invocations and the final scratch-stage package.
#
# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) passes
# BASE_IMAGE_PREBUILT. See .agents/ci-caching.md.
ARG BASE_IMAGE_PREBUILT
# The grpc target does one thing, it builds and installs GRPC. This is in it's own layer so that it can be effectively cached by CI.
# You probably don't need to change anything here, and if you do, make sure that CI is adjusted so that the cache continues to work.
FROM ${GRPC_BASE_IMAGE} AS grpc
FROM ${BASE_IMAGE_PREBUILT} AS builder
# This is a bit of a hack, but it's required in order to be able to effectively cache this layer in CI
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG GRPC_VERSION=v1.65.0
ARG CMAKE_FROM_SOURCE=false
ARG CMAKE_VERSION=3.26.4
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
WORKDIR /build
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates \
build-essential curl libssl-dev \
git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# We install GRPC to a different prefix here so that we can copy in only the build artifacts later
# saves several hundred MB on the final docker image size vs copying in the entire GRPC source tree
# and running make install in the target container
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
mkdir -p /build/grpc/cmake/build && \
cd /build/grpc/cmake/build && \
sed -i "216i\ TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
make && \
make install && \
rm -rf /build
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
# We can target specific CUDA ARCHITECTURES like --build-arg CUDA_DOCKER_ARCH='75;86;89;120'
ARG CUDA_DOCKER_ARCH
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ARG CMAKE_ARGS
ENV CMAKE_ARGS=${CMAKE_ARGS}
ARG AMDGPU_TARGETS
ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
ARG BACKEND=llama-cpp
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
ARG GO_VERSION=1.22.6
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
ccache git \
ca-certificates \
make \
curl unzip \
libssl-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig \
; fi
RUN echo "TARGETARCH: $TARGETARCH"
# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below
# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
# here so that we can generate the grpc code for the stablediffusion build
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
COPY --from=grpc /opt/grpc /usr/local
COPY . /LocalAI
## Otherwise just run the normal build
RUN <<EOT bash
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-fallback && \
make llama-cpp-grpc && make llama-cpp-rpc-server; \
else \
cd /LocalAI/backend/cpp/llama-cpp && make llama-cpp-avx && \
make llama-cpp-avx2 && \
make llama-cpp-avx512 && \
make llama-cpp-fallback && \
make llama-cpp-grpc && \
make llama-cpp-rpc-server; \
fi
RUN <<'EOT' bash
set -euxo pipefail
if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
fi
if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
else
cd /LocalAI/backend/cpp/llama-cpp
make llama-cpp-avx
make llama-cpp-avx2
make llama-cpp-avx512
make llama-cpp-fallback
make llama-cpp-grpc
make llama-cpp-rpc-server
fi
EOT

View File

@@ -1,123 +1,57 @@
ARG BASE_IMAGE=ubuntu:22.04
# Builds a single Python backend on top of the shared
# .docker/bases/Dockerfile.python base. The base bakes in apt-update + GPU
# SDK install + python toolchain (uv, pip, rustup, grpcio-tools), so this
# stage only carries the per-backend source COPY + `make`.
#
# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) builds
# the right base flavour automatically via scripts/derive-build-matrix.js
# and passes BASE_IMAGE_PREBUILT here. For local builds, run:
# make backend-image-base BUILD_TYPE=<...> # build the base
# make backend-image BACKEND=<...> BUILD_TYPE=<...>
# See .agents/ci-caching.md.
ARG BASE_IMAGE_PREBUILT
FROM ${BASE_IMAGE_PREBUILT} AS builder
FROM ${BASE_IMAGE} AS builder
ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential \
ccache \
ca-certificates \
espeak-ng \
curl \
libssl-dev \
git \
git-lfs \
unzip clang \
upx-ucl \
curl python3-pip \
python-is-python3 \
python3-dev llvm \
python3-venv make cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
pip install --upgrade pip
COPY backend/python/${BACKEND} /${BACKEND}
COPY backend/backend.proto /${BACKEND}/backend.proto
COPY backend/python/common/ /${BACKEND}/common
COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
# Optional per-backend source build toggle (e.g. vllm on CPU can set
# FROM_SOURCE=true to compile against the build host SIMD instead of
# pulling a prebuilt wheel). Default empty — most backends ignore it.
ARG FROM_SOURCE=""
ENV FROM_SOURCE=${FROM_SOURCE}
# Cuda
ENV PATH=/usr/local/cuda/bin:${PATH}
# HipBLAS requirements
ENV PATH=/opt/rocm/bin:${PATH}
# Vulkan requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils wget gpg-agent && \
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
apt-get update && \
apt-get install -y \
vulkan-sdk && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# CuBLAS requirements
RUN <<EOT bash
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
software-properties-common pciutils
if [ "amd64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
fi
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm -f cuda-keyring_1.1-1_all.deb && \
apt-get update && \
apt-get install -y --no-install-recommends \
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig \
; fi
# Install uv as a system package
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
ENV PATH="/root/.cargo/bin:${PATH}"
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# Install grpcio-tools (the version in 22.04 is too old)
RUN pip install --user grpcio-tools==1.71.0 grpcio==1.71.0
COPY python/${BACKEND} /${BACKEND}
COPY backend.proto /${BACKEND}/backend.proto
COPY python/common/ /${BACKEND}/common
# Cache-buster for the per-backend `make` step. Most Python backends list
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
# would otherwise freeze upstream versions indefinitely. CI passes a value
# that rolls weekly so the install layer is rebuilt at most once per week
# and picks up newer wheels from PyPI / nightly indexes.
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
# Package GPU libraries into the backend's lib directory
RUN mkdir -p /${BACKEND}/lib && \
TARGET_LIB_DIR="/${BACKEND}/lib" BUILD_TYPE="${BUILD_TYPE}" CUDA_MAJOR_VERSION="${CUDA_MAJOR_VERSION}" \
bash /package-gpu-libs.sh "/${BACKEND}/lib"
# Run backend-specific packaging if a package.sh exists
RUN if [ -f "/${BACKEND}/package.sh" ]; then \
cd /${BACKEND} && bash package.sh; \
fi
FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /
COPY --from=builder /${BACKEND}/ /

Some files were not shown because too many files have changed in this diff Show More