mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-23 08:08:52 -04:00
* feat(ced): sketch sound-classification backend (CED audio tagger) Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry, footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend. SKETCH (backend skeleton real; core REST wiring + CI/gallery is a checklist in DESIGN.md): - backend/backend.proto: new SoundDetection rpc + SoundClass messages (run `make protogen-go` to regenerate pkg/grpc/proto). - backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h), goced.go (Ced gRPC backend: Load + SoundDetection), Makefile (clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh, package.sh, .gitignore. - DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability registration checklist), gallery/index + CI registration, and a scoping note for the realtime/websocket live-recognition path (sliding-window classify over the existing ws transport + voicegate; the ced C-API per-PCM entry point is already window-friendly). Backend code does not compile until protogen-go regenerates the pb types and a libced.so is built (Makefile clones+builds it). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): REST /v1/audio/classification endpoint + capability registration Wires the ced sound-event classification backend (AudioSet audio tagger) end to end through the REST surface, mirroring the transcription path. - Handler: core/http/endpoints/openai/sound_classification.go parses the multipart audio upload, temp-files it, resolves the model config and calls the SoundDetection RPC; returns {model, detections[]} JSON. - Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection) loads the model and normalizes the proto response into schema types. - Schema: core/schema/sound_classification.go (SoundClassificationResult). - gRPC layer: SoundDetection wired through the LocalAI wrapper (interface, Backend client, Client, embed, server, base default) so the loader-typed client exposes the RPC; proto regenerated via make protogen-go. - Route: POST /v1/audio/classification (+ /audio/classification alias) with the audio/multipart default-model middleware in routes/openai.go. - Capability surfaces: swagger @Tags/@Router on the handler; FLAG_SOUND_ CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap + GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase option; /api/instructions audio area updated; auth RouteFeatureRegistry + FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter + i18n; docs page features/audio-classification.md + whats-new + crosslink. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): realtime sound-event detection over the websocket API When a realtime pipeline configures a sound-classification model, each VAD-committed utterance (the same window the transcription path produces) is also run through the CED sound-event classifier and the scored AudioSet tags are emitted as a new server event. No new backend rpc is needed: the SoundDetection gRPC method already exists on this branch. - config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty) beside Transcription/VAD. - realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the ModelInterface; implement it on wrappedModel and transcriptOnlyModel by calling backend.ModelSoundDetection with the session's sound-classification model config (mirrors how Transcribe dispatches). Load the optional config in newModel / newTranscriptionOnlyModel; nil config keeps it additive. - types: add ConversationItemSoundDetectionEvent (item_id, content_index, detections[]{label,score,index}) with type conversation.item.sound_detection, its ServerEventType constant and MarshalJSON, mirroring the transcription completed event. - realtime: add emitSoundDetection (unary path: classify the committed window, build the event, t.SendEvent) and wire it at the utterance-commit hook right after emitTranscription; gated on session.SoundDetectionEnabled (resolved from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0). Its error is logged via xlog but never aborts the turn. - test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections, classifier error) plus a SoundDetection method on the fakeModel double. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): implement SoundDetection in nodes backend test doubles The SoundDetection method added to the grpc backend interface left two test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so core/services/nodes failed to compile under `go vet`/`go test` (go build missed it: the doubles live in _test.go). Add the method to both, mirroring their existing Detect mock. Repairs CI for the nodes package. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): decouple realtime sound detection from VAD (sound-only sessions) Sound-event detection must activate on sounds, not speech, so it no longer runs through the voice VAD/transcription path. A sound-detection-only pipeline (sound_detection set, no transcription/LLM) now: - is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline stage), - builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS loaded), and - defaults the session to turn_detection none (no VAD) with no transcription stage, so the client drives windowing via input_audio_buffer.commit (option A: client-side sliding window). The per-PCM C-API already supports arbitrary windows. commitUtterance gains a sound-only branch: it emits the conversation.item.sound_detection event (scored AudioSet tags) and stops - no transcription, no LLM response. generateResponse is now guarded on a transcription stage being present, so a sound-only turn never invokes the LLM. Existing transcription/VAD sessions are unchanged (additive). Added a commitUtterance sound-only Ginkgo spec asserting it emits the sound event and neither transcribes nor generates a response. go vet + golangci-lint (new-from-merge-base) clean; openai suite green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): register sound-classification backend in gallery + CI Mechanical backend-image registration for the ced sound-event classifier, mirroring the parakeet-cpp Go/purego backend everywhere it is wired up. - .github/backend-matrix.yml: add the ced build matrix, field-for-field copies of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64, l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan amd64/arm64, rocm hipblas, and the metal darwin entry), changing only backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang. - backend/index.yaml: add the &ced meta anchor (capabilities map per platform) plus ced-development and the per-arch image entries, each uri/mirror tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is intentionally deferred pending the HuggingFace publish (TODO note inline). - scripts/changed-backends.js: add an explicit item.backend === "ced" branch in inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as the parakeet-cpp branch (before the generic golang fallthrough). - .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in backend/go/ced/Makefile so the daily bot bumps the pin. - swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so the existing /v1/audio/classification annotations land in the generated spec. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): server-side windowing for realtime sound detection (option B) Adds an optional server-driven sliding-window classifier so a sound-only realtime client only has to stream audio (no input_audio_buffer.commit): - Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs. When both > 0 on a sound-only session, the server classifies the last window of streamed audio every hop and emits a conversation.item.sound_ detection event; the input buffer is trimmed to one window so a long stream stays bounded. When unset, the session stays client-driven (option A). Runs independent of VAD (sound events are not speech). - handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so it is unit-testable) + writeWindowWAV, which declares the true InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples correctly. Goroutine is started after toggleVAD and torn down with the session (close + wg.Wait). - Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta registry; the earlier realtime commit added pipeline.sound_detection without a registry entry, failing TestAllFieldsHaveRegistryEntries. This fixes that and covers the two new knobs. Tests: classifySoundWindow emits an event + trims the buffer to one window, no-ops on too-little audio; writeWindowWAV declares the given sample rate. go build/vet + golangci-lint (new-from-merge-base) clean; config + openai suites green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0) The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0, converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced + known_usecases: sound_classification) and two gallery/index.yaml entries (ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and removes the now-resolved TODO from backend/index.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ced): add tiny/mini/small GGUF model gallery entries Publishes the rest of the CED family (same architecture, metadata-driven port verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds their f16 + q8_0 gallery entries: ced-tiny (5.5M, edge/Pi-class) f16 11MB / q8_0 6MB ced-mini (9.6M) f16 19MB / q8_0 11MB ced-small (22M) f16 42MB / q8_0 23MB All sha256-pinned. ced-base remains the accuracy default. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8 gallery model entries' urls + file uris accordingly. sha256 and filenames are unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): bump CED_VERSION to the short-clip fix Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip shorter than target_length (~10.11s): time_pos_embed was added at its full 63-frame grid instead of being sliced to the clip's actual time grid, tripping ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s windows) and gated with a short-clip parity test upstream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive - README.md: add ced.cpp to the "native C/C++/GGML engines developed and maintained by the LocalAI project" table. - docs/content/features/backends.md: add a Sound Classification backend category (sound-event classification / audio tagging) listing ced.cpp. - .agents/adding-backends.md: add a "Documenting the backend" section and two verification-checklist items requiring new backends to be documented in the backends.md category list, and in-house native engines to be added to the README maintained-engines table. This directive was missing. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ced): repin CED_VERSION to the v0.1.0 release commit ced.cpp history was squashed into a single release commit (tagged v0.1.0), so the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the v0.1.0 release commit, so the backend builds against a commit that exists. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths - sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler. - goced.go: reading a NUL-terminated C string from a libced-owned buffer. #nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since the uintptr is a C-owned malloc'd buffer, not Go-GC memory. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
341 lines
20 KiB
Markdown
341 lines
20 KiB
Markdown
<h1 align="center">
|
|
<br>
|
|
<img width="300" src="./core/http/static/logo.png"> <br>
|
|
<br>
|
|
</h1>
|
|
|
|
<p align="center">
|
|
<a href="https://github.com/go-skynet/LocalAI/stargazers" target="blank">
|
|
<img src="https://img.shields.io/github/stars/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI stars"/>
|
|
</a>
|
|
<a href='https://github.com/go-skynet/LocalAI/releases'>
|
|
<img src='https://img.shields.io/github/release/go-skynet/LocalAI?&label=Latest&style=for-the-badge'>
|
|
</a>
|
|
<a href="LICENSE" target="blank">
|
|
<img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" alt="LocalAI License"/>
|
|
</a>
|
|
</p>
|
|
|
|
<p align="center">
|
|
<a href="https://twitter.com/LocalAI_API" target="blank">
|
|
<img src="https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white&label=LocalAI_API" alt="Follow LocalAI_API"/>
|
|
</a>
|
|
<a href="https://discord.gg/uJAeKSAGDy" target="blank">
|
|
<img src="https://img.shields.io/badge/dynamic/json?color=blue&label=Discord&style=for-the-badge&query=approximate_member_count&url=https%3A%2F%2Fdiscordapp.com%2Fapi%2Finvites%2FuJAeKSAGDy%3Fwith_counts%3Dtrue&logo=discord" alt="Join LocalAI Discord Community"/>
|
|
</a>
|
|
</p>
|
|
|
|
<p align="center">
|
|
<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
|
|
</p>
|
|
|
|
<!-- Keep these links, translations synced daily. -->
|
|
<p align="center">
|
|
<a href="https://zdoc.app/de/mudler/LocalAI">Deutsch</a> |
|
|
<a href="https://zdoc.app/es/mudler/LocalAI">Español</a> |
|
|
<a href="https://zdoc.app/fr/mudler/LocalAI">français</a> |
|
|
<a href="https://zdoc.app/ja/mudler/LocalAI">日本語</a> |
|
|
<a href="https://zdoc.app/ko/mudler/LocalAI">한국어</a> |
|
|
<a href="https://zdoc.app/pt/mudler/LocalAI">Português</a> |
|
|
<a href="https://zdoc.app/ru/mudler/LocalAI">Русский</a> |
|
|
<a href="https://zdoc.app/zh/mudler/LocalAI">中文</a>
|
|
</p>
|
|
|
|
**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
|
|
|
|
**A small core, not a bundle.** Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.
|
|
|
|
- **Composable by design**: backends are separate and pulled on demand, so you install only what your model needs
|
|
- **Open and extensible**: load any model, or build your own backend in any language against an open interface
|
|
- **Drop-in API compatibility**: OpenAI, Anthropic, and ElevenLabs APIs across every backend
|
|
- **Any model, any modality**: LLMs, vision, voice, image, and video behind one API
|
|
- **Any hardware**: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
|
|
- **Multi-user ready**: API key auth, user quotas, role-based access
|
|
- **Built-in AI agents**: autonomous agents with tool use, RAG, MCP, and skills
|
|
- **Privacy-first**: your data never leaves your infrastructure
|
|
|
|

|
|
|
|
Created by [Ettore Di Giacinto](https://github.com/mudler) and maintained by the [LocalAI team](#team).
|
|
|
|
> [:book: Documentation](https://localai.io/) | [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) | [💻 Quickstart](https://localai.io/basics/getting_started/) | [🖼️ Models](https://models.localai.io/) | [❓FAQ](https://localai.io/faq/)
|
|
|
|
## Guided tour
|
|
|
|
https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
|
|
|
|
<details>
|
|
|
|
<summary>
|
|
Click to see more!
|
|
</summary>
|
|
|
|
#### User and auth
|
|
|
|
https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c
|
|
|
|
#### Agents
|
|
|
|
https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
|
|
|
|
#### Usage metrics per user
|
|
|
|
https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f
|
|
|
|
#### Fine-tuning and Quantization
|
|
|
|
https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee
|
|
|
|
#### WebRTC
|
|
|
|
https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b
|
|
|
|
</details>
|
|
|
|
## Quickstart
|
|
|
|
### macOS
|
|
|
|
<a href="https://github.com/mudler/LocalAI/releases/latest/download/LocalAI.dmg">
|
|
<img src="https://img.shields.io/badge/Download-macOS-blue?style=for-the-badge&logo=apple&logoColor=white" alt="Download LocalAI for macOS"/>
|
|
</a>
|
|
|
|
> **Note:** The DMG is not signed by Apple. After installing, run: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`. See [#6268](https://github.com/mudler/LocalAI/issues/6268) for details.
|
|
|
|
### Containers (Docker, podman, ...)
|
|
|
|
> Already ran LocalAI before? Use `docker start -i local-ai` to restart an existing container.
|
|
|
|
#### CPU only:
|
|
|
|
```bash
|
|
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
|
|
```
|
|
|
|
#### NVIDIA GPU:
|
|
|
|
```bash
|
|
# CUDA 13
|
|
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
|
|
|
|
# CUDA 12
|
|
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
|
|
|
|
# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
|
|
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
|
|
|
|
# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
|
|
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
|
|
```
|
|
|
|
#### AMD GPU (ROCm):
|
|
|
|
```bash
|
|
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
|
|
```
|
|
|
|
#### Intel GPU (oneAPI):
|
|
|
|
```bash
|
|
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
|
|
```
|
|
|
|
#### Vulkan GPU:
|
|
|
|
```bash
|
|
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
|
|
```
|
|
|
|
### Loading models
|
|
|
|
```bash
|
|
# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
|
|
local-ai run llama-3.2-1b-instruct:q4_k_m
|
|
# From Huggingface
|
|
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
|
|
# From the Ollama OCI registry
|
|
local-ai run ollama://gemma:2b
|
|
# From a YAML config
|
|
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
|
|
# From a standard OCI registry (e.g., Docker Hub)
|
|
local-ai run oci://localai/phi-2:latest
|
|
```
|
|
|
|
To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, `/models` lists installed models and `/model <name>` switches between them.
|
|
|
|
```bash
|
|
# Terminal 1
|
|
local-ai run llama-3.2-1b-instruct:q4_k_m
|
|
|
|
# Terminal 2
|
|
local-ai chat --model llama-3.2-1b-instruct:q4_k_m
|
|
```
|
|
|
|
> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).
|
|
|
|
For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).
|
|
|
|
## Latest News
|
|
|
|
- **June 2026**: New [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus [streaming of the realtime LLM / TTS / transcription pipeline stages](https://github.com/mudler/LocalAI/pull/10176) and [configurable WebRTC ICE candidates](https://github.com/mudler/LocalAI/pull/10231).
|
|
- **June 2026**: Big speech push: the [parakeet.cpp](https://github.com/mudler/parakeet.cpp) ASR engine gains [NeMo-faithful segment timestamps](https://github.com/mudler/LocalAI/pull/10207), a [multilingual streaming Nemotron-3.5 model](https://github.com/mudler/LocalAI/pull/10199), [dynamic batching for concurrent transcription](https://github.com/mudler/LocalAI/pull/10112) and [CUDA graphs](https://github.com/mudler/LocalAI/pull/10273); the new [CrispASR backend](https://github.com/mudler/LocalAI/pull/10099) adds multi-architecture ASR + TTS, and [60 Piper TTS voices across 42 languages](https://github.com/mudler/LocalAI/pull/10296) land in the gallery (plus [per-request TTS instructions and params](https://github.com/mudler/LocalAI/pull/10172)).
|
|
- **June 2026**: New backends and models: [locate-anything.cpp](https://github.com/mudler/LocalAI/pull/10264) for open-vocabulary object detection via ggml, [Ideogram4 image generation](https://github.com/mudler/LocalAI/pull/10201) in stablediffusion-ggml, [llama.cpp video input](https://github.com/mudler/LocalAI/pull/10216), and the [Gemma 4 QAT family with MTP speculative-decoding pairs](https://github.com/mudler/LocalAI/pull/10215). Plus an [interactive CLI chat mode](https://github.com/mudler/LocalAI/pull/10226) and [RAG source citations in agent responses](https://github.com/mudler/LocalAI/pull/10228).
|
|
- **June 2026**: Distributed mode hardening: [prefix-cache-aware routing](https://github.com/mudler/LocalAI/pull/10071), a [production-ready request router with auto-sized embedding/rerank batches](https://github.com/mudler/LocalAI/pull/10104), [ds4 layer-split distributed inference](https://github.com/mudler/LocalAI/pull/10098), [NATS JWT auth + TLS/mTLS](https://github.com/mudler/LocalAI/pull/10159), and [resumable file uploads](https://github.com/mudler/LocalAI/pull/10109).
|
|
- **May 2026**: **LocalAI 4.3.0** - `llama.cpp` [prompt cache on by default](https://github.com/mudler/LocalAI/pull/9925) (repeated system prompts collapse from minutes to seconds), [keyless cosign signing of backend OCI images](https://github.com/mudler/LocalAI/pull/9823), [per-API-key + per-user usage attribution](https://github.com/mudler/LocalAI/pull/9920), Distributed v3 with [per-request replica routing](https://github.com/mudler/LocalAI/pull/9968). [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.3.0)
|
|
- **May 2026**: **LocalAI 4.2.0** - LocalAI sees and hears: [voice recognition](https://github.com/mudler/LocalAI/pull/9500), [face recognition + antispoofing liveness](https://github.com/mudler/LocalAI/pull/9480), speaker diarization. Plus [drop-in Ollama API](https://github.com/mudler/LocalAI/pull/9284), [video generation](https://github.com/mudler/LocalAI/pull/9420), redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.2.0)
|
|
- **April 2026**: **LocalAI 4.1.0** - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.1.0)
|
|
- **March 2026**: **LocalAI 4.0.0** - native agentic orchestration with the new [Agenthub](https://agenthub.localai.io) community hub, full React UI rewrite with Canvas mode, [MCP Apps + client-side](https://github.com/mudler/LocalAI/pull/8947) with tool streaming, [WebRTC realtime audio](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed](https://github.com/mudler/LocalAI/pull/8801). [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.0.0)
|
|
- **February 2026**: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
|
|
- **January 2026**: **LocalAI 3.10.0** — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0)
|
|
- **December 2025**: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic multi-GPU model fitting (llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
|
|
- **November 2025**: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245), [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
|
|
- **October 2025**: [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support for agentic capabilities
|
|
- **September 2025**: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
|
|
- **August 2025**: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
|
|
- **July 2025**: All backends migrated outside the main binary — [lightweight, modular architecture](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
|
|
|
|
For older news and full release notes, see [GitHub Releases](https://github.com/mudler/LocalAI/releases) and the [News page](https://localai.io/basics/news/).
|
|
|
|
## Features
|
|
|
|
- [Text generation](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [and more](https://localai.io/model-compatibility/))
|
|
- [Text to Audio](https://localai.io/features/text-to-audio/)
|
|
- [Audio to Text](https://localai.io/features/audio-to-text/)
|
|
- [Image generation](https://localai.io/features/image-generation)
|
|
- [OpenAI-compatible tools API](https://localai.io/features/openai-functions/)
|
|
- [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
|
|
- [Embeddings generation](https://localai.io/features/embeddings/)
|
|
- [Constrained grammars](https://localai.io/features/constrained_grammars/)
|
|
- [Download models from Huggingface](https://localai.io/models/)
|
|
- [Vision API](https://localai.io/features/gpt-vision/)
|
|
- [Object Detection](https://localai.io/features/object-detection/)
|
|
- [Reranker API](https://localai.io/features/reranker/)
|
|
- [P2P Inferencing](https://localai.io/features/distribute/)
|
|
- [Distributed Mode](https://localai.io/features/distributed-mode/) — Horizontal scaling with PostgreSQL + NATS
|
|
- [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/)
|
|
- [Built-in Agents](https://localai.io/features/agents/) — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and [Agent Hub](https://agenthub.localai.io)
|
|
- [Backend Gallery](https://localai.io/backends/) — Install/remove backends on the fly via OCI images
|
|
- Voice Activity Detection (Silero-VAD)
|
|
- Integrated WebUI
|
|
|
|
## Supported Backends & Acceleration
|
|
|
|
LocalAI supports **60+ backends** including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
|
|
|
|
See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
|
|
|
|
### Backends built by us
|
|
|
|
Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:
|
|
|
|
| Backend | What it does |
|
|
|---------|-------------|
|
|
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
|
|
| [ced.cpp](https://github.com/mudler/ced.cpp) | C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition |
|
|
| [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C |
|
|
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
|
|
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation |
|
|
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) |
|
|
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose estimation |
|
|
| [privacy-filter.cpp](https://github.com/localai-org/privacy-filter.cpp) | Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier |
|
|
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
|
|
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
|
|
|
|
We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
|
|
|
|
## Resources
|
|
|
|
- [Documentation](https://localai.io/)
|
|
- [LLM fine-tuning guide](https://localai.io/docs/advanced/fine-tuning/)
|
|
- [Build from source](https://localai.io/basics/build/)
|
|
- [Kubernetes installation](https://localai.io/basics/getting_started/#run-localai-in-kubernetes)
|
|
- [Integrations & community projects](https://localai.io/docs/integrations/)
|
|
- [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
|
|
- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
|
|
- [Examples](https://github.com/mudler/LocalAI-examples) — including the [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (Go client for the Realtime API with tool calling)
|
|
|
|
## Team
|
|
|
|
LocalAI is maintained by a small team of humans, together with the wider community of contributors.
|
|
|
|
- **[Ettore Di Giacinto](https://github.com/mudler)** — original author and project lead
|
|
- **[Richard Palethorpe](https://github.com/richiejp)** — maintainer
|
|
|
|
A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in [Discord](https://discord.gg/uJAeKSAGDy) — LocalAI is a community-driven project and wouldn't exist without you. See the full [contributors list](https://github.com/mudler/LocalAI/graphs/contributors).
|
|
|
|
## Citation
|
|
|
|
If you utilize this repository, data in a downstream project, please consider citing it with:
|
|
|
|
```
|
|
@misc{localai,
|
|
author = {Ettore Di Giacinto},
|
|
title = {LocalAI: The free, Open source OpenAI alternative},
|
|
year = {2023},
|
|
publisher = {GitHub},
|
|
journal = {GitHub repository},
|
|
howpublished = {\url{https://github.com/go-skynet/LocalAI}},
|
|
```
|
|
|
|
## Sponsors
|
|
|
|
> Do you find LocalAI useful?
|
|
|
|
Support the project by becoming [a backer or sponsor](https://github.com/sponsors/mudler). Your logo will show up here with a link to your website.
|
|
|
|
A huge thank you to our generous sponsors who support this project covering CI expenses, and our [Sponsor list](https://github.com/sponsors/mudler):
|
|
|
|
<p align="center">
|
|
<a href="https://www.spectrocloud.com/" target="blank">
|
|
<img height="200" src="https://github.com/user-attachments/assets/72eab1dd-8b93-4fc0-9ade-84db49f24962">
|
|
</a>
|
|
</p>
|
|
|
|
<details>
|
|
|
|
<summary>
|
|
Past sponsors
|
|
</summary>
|
|
|
|
<p align="center">
|
|
<a href="https://www.premai.io/" target="blank">
|
|
<img height="200" src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>
|
|
</a>
|
|
</p>
|
|
|
|
</details>
|
|
|
|
### Individual sponsors
|
|
|
|
A special thanks to individual sponsors, a full list is on [GitHub](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler). Special shout out to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
|
|
|
|
## Star history
|
|
|
|
[](https://star-history.com/#go-skynet/LocalAI&Date)
|
|
|
|
## License
|
|
|
|
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/) and maintained by the [LocalAI team](#team).
|
|
|
|
MIT - Author Ettore Di Giacinto <mudler@localai.io>
|
|
|
|
## Acknowledgements
|
|
|
|
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
|
|
|
|
- [llama.cpp](https://github.com/ggerganov/llama.cpp)
|
|
- https://github.com/tatsu-lab/stanford_alpaca
|
|
- https://github.com/cornelk/llama-go for the initial ideas
|
|
- https://github.com/antimatter15/alpaca.cpp
|
|
- https://github.com/EdVince/Stable-Diffusion-NCNN
|
|
- https://github.com/ggerganov/whisper.cpp
|
|
- https://github.com/rhasspy/piper
|
|
- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation
|
|
|
|
## Contributors
|
|
|
|
This is a community project, a special thanks to our contributors!
|
|
<a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
|
|
<img src="https://contrib.rocks/image?repo=go-skynet/LocalAI" />
|
|
</a>
|