LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-25 00:59:28 -04:00

Go to file

LocalAI [bot] 5c3d48ab50 feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496 )

* feat(ui): remember last-used model per capability

ModelSelector auto-selected the first option whenever the bound value was
empty or stale, so every visit to the Home chat box, Image, TTS or Talk
pages reset the choice to whatever sorted first. Persist the user's pick
in localStorage keyed by capability and prefer it on auto-select when the
model is still available, falling back to the first option otherwise.

Because every modality picker funnels through ModelSelector, this fixes
the friction everywhere at once. External-options callers pass no
capability and keep the previous first-item behaviour.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): add visibility-aware polling hook

The app had 26 hand-rolled setInterval polls, none of which paused when
the browser tab was hidden, so backgrounded dashboards kept hitting the
server every few seconds for data nobody was looking at.

Add usePolling: runs immediately, polls on a fixed interval, pauses while
document.hidden, fires a catch-up poll on return, and guards against
overlapping slow requests. Route useResources (the highest-frequency
shared poll) through it. Further callers can be migrated incrementally.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): hardware-aware starter models on empty home

A fresh install dropped admins straight into a 1000+ model gallery with
no guidance. Add a StarterModels widget to the empty-state wizard that
recommends a small, curated set tuned to the detected hardware:

- CPU-only machines (no GPU VRAM) are steered to genuinely small models
  (1-4B, Q4) that stay responsive without a GPU.
- GPU machines get suggestions scaled to available VRAM.

Curated names are real gallery entries, intersected against the live
gallery at render time so a trimmed/custom gallery degrades gracefully.
Install is one click via the existing model-install API.

Also routes Home's cluster and system-info polls through usePolling so a
backgrounded home page stops fetching.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* feat(ui): optional token-cost estimates on usage dashboard

The usage dashboard tracked tokens but had no monetary view. Multi-user
deployments that bill back or budget compute had to export and compute
cost elsewhere.

Add an opt-in pricing control: admins set $ per 1M prompt/completion
tokens (stored per-browser). When set, an estimated-cost summary card and
per-model / per-user cost columns appear, computed from recorded token
counts. The entire cost surface stays hidden until a price is entered, so
the default view is unchanged. Cost is clearly labelled an estimate -
LocalAI itself has no notion of price.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(ui): label icon-only send buttons for screen readers

The chat and agent-chat send buttons were a bare paper-plane icon with
no accessible name, so screen readers announced only "button". Add an
aria-label/title ("Send message") and mark the icon aria-hidden. An audit
of all icon-only buttons found these were the only two unlabeled controls;
the rest already carry visible text.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-24 23:30:08 +02:00

.agents

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

.devcontainer

fix: Add named volumes for Windows Docker compatibility (#8661 )

2026-02-26 23:18:53 +01:00

.devcontainer-scripts

feat: refactor build process, drop embedded backends (#5875 )

2025-07-22 16:31:04 +02:00

.docker

feat(vulkan): make Vulkan backends self-contained on the GPU (#10404 )

2026-06-19 17:16:33 +02:00

.githooks

test: add Go + React UI coverage gates and fill test gaps (#9989 )

2026-05-26 22:06:10 +02:00

.github

feat(backends): add darwin/metal build for liquid-audio (#10486 )

2026-06-24 23:16:27 +02:00

.vscode

feat: refactor build process, drop embedded backends (#5875 )

2025-07-22 16:31:04 +02:00

backend

feat(backends): add darwin/metal build for liquid-audio (#10486 )

2026-06-24 23:16:27 +02:00

cmd

fix(launcher): truncate download status labels to stop progress dialog blowout (#10357 )

2026-06-16 09:42:07 +02:00

configuration

refactor: move remaining api packages to core (#1731 )

2024-03-01 16:19:53 +01:00

core

feat(ui): usage & UX enhancements (last-used model, polling, starter models, usage cost, a11y) (#10496 )

2026-06-24 23:30:08 +02:00

custom-ca-certs

feat(certificates): add support for custom CA certificates (#880 )

2023-11-01 20:10:14 +01:00

docs

docs: ⬆️ update docs version mudler/LocalAI (#10491 )

2026-06-24 23:18:24 +02:00

examples

feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346 ) (#10351 )

2026-06-21 17:07:15 +02:00

gallery

chore(model-gallery): ⬆️ update checksum (#10495 )

2026-06-24 23:18:04 +02:00

internal

feat: cleanups, small enhancements

2023-07-04 18:58:19 +02:00

pkg

refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476 )

2026-06-24 11:08:29 +02:00

prompt-templates

Requested Changes from GPT4ALL to Luna-AI-Llama2 (#1092 )

2023-09-22 11:22:17 +02:00

scripts

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

swagger

feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447 )

2026-06-22 18:24:29 +02:00

tests

fix(test): update e2e UpdateProgress calls for new cancellable arg (#10460 )

2026-06-22 23:45:22 +02:00

.air.toml

feat(ui): chat stats, small visual enhancements (#7223 )

2025-11-10 18:12:07 +01:00

.dockerignore

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360 )

2026-06-18 11:45:22 +01:00

.editorconfig

feat(stores): Vector store backend (#1795 )

2024-03-22 21:14:04 +01:00

.env

feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504 )

2026-02-11 22:58:19 +01:00

.gitattributes

chore(linguist): add *.hpp files to linguist-vendored (#4154 )

2024-11-14 14:12:16 +01:00

.gitignore

feat(ui): restructure Cluster Nodes view (pulse + panel roster + detail page) (#10447 )

2026-06-22 18:24:29 +02:00

.gitmodules

feat: Add Kokoros backend (#9212 )

2026-04-08 19:23:16 +02:00

.golangci.yml

feat(supertonic): add Supertonic ONNX TTS backend (CPU) (#10342 )

2026-06-15 16:54:11 +02:00

.goreleaser.yaml

feat(ui): move to React for frontend (#8772 )

2026-03-05 21:47:12 +01:00

.yamllint

fix: yamlint warnings and errors (#2131 )

2024-04-25 17:25:56 +00:00

AGENTS.md

test(react-ui): add page render-smoke specs, reset the coverage gate (#10122 )

2026-06-01 14:24:36 +02:00

CLAUDE.md

fix(realtime): Add functions to conversation history (#8616 )

2026-02-21 19:03:49 +01:00

CONTRIBUTING.md

test(react-ui): add page render-smoke specs, reset the coverage gate (#10122 )

2026-06-01 14:24:36 +02:00

coverage-baseline.txt

test: add Go + React UI coverage gates and fill test gaps (#9989 )

2026-05-26 22:06:10 +02:00

docker-compose.distributed.yaml

fix(distributed): worker container healthcheck always unhealthy

2026-04-27 13:51:57 +00:00

docker-compose.yaml

fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545 )

2026-04-24 22:02:23 +02:00

Dockerfile

fix(cuda): install cuda-nvrtc-dev alongside the other CUDA dev packages (#10257 )

2026-06-11 23:57:00 +02:00

Entitlements.plist

Feat: OSX Local Codesigning (#1319 )

2023-11-23 15:22:54 +01:00

entrypoint.sh

feat: ⚠️ reduce images size and stop bundling sources (#5721 )

2025-06-26 18:41:38 +02:00

flake.lock

fix(nix flake): ensure nix flake builds successfully (#10399 )

2026-06-19 17:15:18 +02:00

flake.nix

fix(nix flake): ensure nix flake builds successfully (#10399 )

2026-06-19 17:15:18 +02:00

go.mod

chore: bump localrecall to fix PostgreSQL collection name with ':' (#10375 ) (#10387 )

2026-06-18 17:05:52 +02:00

go.sum

chore: bump localrecall to fix PostgreSQL collection name with ':' (#10375 ) (#10387 )

2026-06-18 17:05:52 +02:00

LICENSE

chore(docs): update license year

2025-02-15 18:17:15 +01:00

Makefile

fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier (#10401 )

2026-06-22 18:26:19 +02:00

README.md

feat(ced): sound-event classification backend (CED audio tagger) (#10425 )

2026-06-22 01:00:28 +02:00

renovate.json

ci: manually update deps

2023-05-04 15:01:29 +02:00

SECURITY.md

docs: clarify SECURITY.md version support table with specific ranges and EOL dates (#8861 )

2026-03-08 17:58:19 +01:00

webui_static.yaml

feat(ui): move to React for frontend (#8772 )

2026-03-05 21:47:12 +01:00

README.md

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

A small core, not a bundle. Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.

Composable by design: backends are separate and pulled on demand, so you install only what your model needs
Open and extensible: load any model, or build your own backend in any language against an open interface
Drop-in API compatibility: OpenAI, Anthropic, and ElevenLabs APIs across every backend
Any model, any modality: LLMs, vision, voice, image, and video behind one API
Any hardware: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
Multi-user ready: API key auth, user quotas, role-based access
Built-in AI agents: autonomous agents with tool use, RAG, MCP, and skills
Privacy-first: your data never leaves your infrastructure

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | ❓FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

Quickstart

macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, /models lists installed models and /model <name> switches between them.

# Terminal 1
local-ai run llama-3.2-1b-instruct:q4_k_m

# Terminal 2
local-ai chat --model llama-3.2-1b-instruct:q4_k_m

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

June 2026: New realtime voice assistant demo (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus streaming of the realtime LLM / TTS / transcription pipeline stages and configurable WebRTC ICE candidates.
June 2026: Big speech push: the parakeet.cpp ASR engine gains NeMo-faithful segment timestamps, a multilingual streaming Nemotron-3.5 model, dynamic batching for concurrent transcription and CUDA graphs; the new CrispASR backend adds multi-architecture ASR + TTS, and 60 Piper TTS voices across 42 languages land in the gallery (plus per-request TTS instructions and params).
June 2026: New backends and models: locate-anything.cpp for open-vocabulary object detection via ggml, Ideogram4 image generation in stablediffusion-ggml, llama.cpp video input, and the Gemma 4 QAT family with MTP speculative-decoding pairs. Plus an interactive CLI chat mode and RAG source citations in agent responses.
June 2026: Distributed mode hardening: prefix-cache-aware routing, a production-ready request router with auto-sized embedding/rerank batches, ds4 layer-split distributed inference, NATS JWT auth + TLS/mTLS, and resumable file uploads.
May 2026: LocalAI 4.3.0 - llama.cpp prompt cache on by default (repeated system prompts collapse from minutes to seconds), keyless cosign signing of backend OCI images, per-API-key + per-user usage attribution, Distributed v3 with per-request replica routing. Release notes
May 2026: LocalAI 4.2.0 - LocalAI sees and hears: voice recognition, face recognition + antispoofing liveness, speaker diarization. Plus drop-in Ollama API, video generation, redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. Release notes
April 2026: LocalAI 4.1.0 - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. Release notes
March 2026: LocalAI 4.0.0 - native agentic orchestration with the new Agenthub community hub, full React UI rewrite with Canvas mode, MCP Apps + client-side with tool streaming, WebRTC realtime audio, MLX-distributed. Release notes
February 2026: Realtime API for audio-to-audio with tool calling, ACE-Step 1.5 support
January 2026: LocalAI 3.10.0 — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. Release notes
December 2025: Dynamic Memory Resource reclaimer, Automatic multi-GPU model fitting (llama.cpp), Vibevoice backend
November 2025: Import models via URL, Multiple chats and history
October 2025: Model Context Protocol (MCP) support for agentic capabilities
September 2025: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
August 2025: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
July 2025: All backends migrated outside the main binary — lightweight, modular architecture

For older news and full release notes, see GitHub Releases and the News page.

Features

Text generation (llama.cpp, transformers, vllm ... and more)
Text to Audio
Audio to Text
Image generation
OpenAI-compatible tools API
Realtime API (Speech-to-speech)
Embeddings generation
Constrained grammars
Download models from Huggingface
Vision API
Object Detection
Reranker API
P2P Inferencing
Distributed Mode — Horizontal scaling with PostgreSQL + NATS
Model Context Protocol (MCP)
Built-in Agents — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and Agent Hub
Backend Gallery — Install/remove backends on the fly via OCI images
Voice Activity Detection (Silero-VAD)
Integrated WebUI

Supported Backends & Acceleration

LocalAI supports 60+ backends including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Backends built by us

Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:

Backend	What it does
parakeet.cpp	C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription
ced.cpp	C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition
voxtral.c	Voxtral Realtime 4B speech-to-text in pure C
vibevoice.cpp	Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization
rf-detr.cpp	Native RF-DETR object detection and instance segmentation
locate-anything.cpp	Open-vocabulary object detection and visual grounding (LocateAnything-3B)
depth-anything.cpp	Depth Anything 3 monocular metric depth + camera pose estimation
privacy-filter.cpp	Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier
LocalVQE	Joint acoustic echo cancellation, noise suppression, and dereverberation
local-store	Local-first vector database for embeddings (shipped in-tree)

We also maintain apex-quant, a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.

Resources

Documentation
LLM fine-tuning guide
Build from source
Kubernetes installation
Integrations & community projects
Installation video walkthrough
Media & blog posts
Examples — including the realtime voice assistant demo (Go client for the Realtime API with tool calling)

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

Ettore Di Giacinto — original author and project lead
Richard Palethorpe — maintainer

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

Star history

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

llama.cpp
https://github.com/tatsu-lab/stanford_alpaca
https://github.com/cornelk/llama-go for the initial ideas
https://github.com/antimatter15/alpaca.cpp
https://github.com/EdVince/Stable-Diffusion-NCNN
https://github.com/ggerganov/whisper.cpp
https://github.com/rhasspy/piper
exo for the MLX distributed auto-parallel sharding implementation

Contributors

This is a community project, a special thanks to our contributors!

Languages

Go 69.1%

JavaScript 12.1%

Python 5.9%

HTML 4.7%

C++ 3%

Other 5.2%

README.md

Guided tour

User and auth

Agents

Usage metrics per user

Fine-tuning and Quantization

WebRTC

Quickstart

macOS

Containers (Docker, podman, ...)

CPU only:

NVIDIA GPU:

AMD GPU (ROCm):

Intel GPU (oneAPI):

Vulkan GPU:

Loading models

Latest News

Features

Supported Backends & Acceleration

Backends built by us

Resources

Team

Citation

Sponsors

Individual sponsors

Star history

License

Acknowledgements

Contributors