diff --git a/README.md b/README.md
index dcc2da167..a93f160f5 100644
--- a/README.md
+++ b/README.md
@@ -5,35 +5,17 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
@@ -47,22 +29,20 @@
-> :bulb: Get help - [βFAQ](https://localai.io/faq/) [πDiscussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
->
-> [π» Quickstart](https://localai.io/basics/getting_started/) [πΌοΈ Models](https://models.localai.io/) [π Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [π« Examples](https://github.com/mudler/LocalAI-examples) Try on
-[](https://t.me/localaiofficial_bot)
+**LocalAI** is the open-source AI engine. Run any model – LLMs, vision, voice, image, video – on any hardware. No GPU required.
-[](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[](https://artifacthub.io/packages/search?repo=localai)
+- **Drop-in API compatibility** – OpenAI, Anthropic, ElevenLabs APIs
+- **35+ backends** – llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
+- **Any hardware** – NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
+- **Multi-user ready** – API key auth, user quotas, role-based access
+- **Built-in AI agents** – autonomous agents with tool use, RAG, MCP, and skills
+- **Privacy-first** – your data never leaves your infrastructure
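Because the API is a drop-in replacement, any OpenAI client works by changing the base URL. A minimal sketch with `curl` (assuming LocalAI on the default port 8080 and a chat model installed, e.g. `llama-3.2-1b-instruct:q4_k_m` from the gallery):

```shell
# Same request shape the OpenAI API accepts, pointed at a local LocalAI instance.
# Assumes LocalAI is listening on localhost:8080 with a chat model installed.
BASE_URL="http://localhost:8080/v1"
MODEL="llama-3.2-1b-instruct:q4_k_m"

curl -s "$BASE_URL/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}" \
  || echo "LocalAI is not reachable at $BASE_URL"
```

The official OpenAI SDKs work the same way: point their `base_url` at `http://localhost:8080/v1`.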
-
-
-
-
-
+Created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
-**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
+> [:book: Documentation](https://localai.io/) | [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) | [:computer: Quickstart](https://localai.io/basics/getting_started/) | [:framed_picture: Models](https://models.localai.io/) | [:question: FAQ](https://localai.io/faq/)
-## Screenshots / Video
+## Screenshots
### Chat, Model gallery
@@ -72,282 +52,137 @@ https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
-### Youtube video
+## Quickstart
-
-
-
-
-
-
-## π» Quickstart
-
-### macOS Download:
+### macOS
-> Note: the DMGs are not signed by Apple as quarantined. See https://github.com/mudler/LocalAI/issues/6268 for a workaround, fix is tracked here: https://github.com/mudler/LocalAI/issues/6244
-> Install the DMG and paste this code into terminal: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`
+> **Note:** The DMG is not signed by Apple. After installing, run: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`. See [#6268](https://github.com/mudler/LocalAI/issues/6268) for details.
### Containers (Docker, podman, ...)
-> **π‘ Docker Run vs Docker Start**
->
-> - `docker run` creates and starts a new container. If a container with the same name already exists, this command will fail.
-> - `docker start` starts an existing container that was previously created with `docker run`.
->
-> If you've already run LocalAI before and want to start it again, use: `docker start -i local-ai`
+> Already ran LocalAI before? Use `docker start -i local-ai` to restart an existing container.
-#### CPU only image:
+#### CPU only:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
```
-#### NVIDIA GPU Images:
+#### NVIDIA GPU:
```bash
-# CUDA 13.0
+# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
-# CUDA 12.0
+# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
-# NVIDIA Jetson (L4T) ARM64
-# CUDA 12 (for Nvidia AGX Orin and similar platforms)
+# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
-# CUDA 13 (for Nvidia DGX Spark)
+# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
```
-#### AMD GPU Images (ROCm):
+#### AMD GPU (ROCm):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
```
-#### Intel GPU Images (oneAPI):
+#### Intel GPU (oneAPI):
```bash
docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
```
-#### Vulkan GPU Images:
+#### Vulkan GPU:
```bash
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
```
-To load models:
+### Loading models
```bash
-# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
+# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
-# Start LocalAI with the phi-2 model directly from huggingface
+# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
-# Install and run a model from the Ollama OCI registry
+# From the Ollama OCI registry
local-ai run ollama://gemma:2b
-# Run a model from a configuration file
+# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
-# Install and run a model from a standard OCI registry (e.g., Docker Hub)
+# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
```
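The YAML config referenced above is a small declarative model definition. A minimal sketch (hypothetical values; field names follow the LocalAI model configuration format — check the docs for the full schema):

```yaml
# Hypothetical minimal model config (sketch): adjust name, backend,
# and model source to your setup.
name: phi-2
backend: llama-cpp
context_size: 2048
parameters:
  model: huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
  temperature: 0.7
```

Save it (e.g. as `phi-2.yaml`) and pass its path or URL to `local-ai run`.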
-> β‘ **Automatic Backend Detection**: When you install models from the gallery or YAML files, LocalAI automatically detects your system's GPU capabilities (NVIDIA, AMD, Intel) and downloads the appropriate backend. For advanced configuration options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/#automatic-backend-detection).
+> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).
-For more information, see [π» Getting started](https://localai.io/basics/getting_started/index.html), if you are interested in our roadmap items and future enhancements, you can see the [Issues labeled as Roadmap here](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
+For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).
-## π° Latest project news
-- March 2026: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790),[MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
-- February 2026: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
-- January 2026: **LocalAI 3.10.0** - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: **Moonshine** (ultra-fast transcription), **Pocket-TTS** (lightweight TTS). Vulkan arm64 builds now available. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0).
-- December 2025: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic fitting of models to multiple GPUS(llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Added Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
-- November 2025: Major improvements to the UX. Among these: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245) and [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
-- October 2025: π [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support added for agentic capabilities with external tools
-- September 2025: New Launcher application for MacOS and Linux, extended support to many backends for Mac and Nvidia L4T devices. Models: Added MLX-Audio, WAN 2.2. WebUI improvements and Python-based backends now ships portable python environments.
-- August 2025: MLX, MLX-VLM, Diffusers and llama.cpp are now supported on Mac M1/M2/M3+ chips ( with `development` suffix in the gallery ): https://github.com/mudler/LocalAI/pull/6049 https://github.com/mudler/LocalAI/pull/6119 https://github.com/mudler/LocalAI/pull/6121 https://github.com/mudler/LocalAI/pull/6060
-- July/August 2025: π [Object Detection](https://localai.io/features/object-detection/) added to the API featuring [rf-detr](https://github.com/roboflow/rf-detr)
-- July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. [Read the release notes](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
-- June 2025: [Backend management](https://github.com/mudler/LocalAI/pull/5607) has been added. Attention: extras images are going to be deprecated from the next release! Read [the backend management PR](https://github.com/mudler/LocalAI/pull/5607).
-- May 2025: [Audio input](https://github.com/mudler/LocalAI/pull/5466) and [Reranking](https://github.com/mudler/LocalAI/pull/5396) in llama.cpp backend, [Realtime API](https://github.com/mudler/LocalAI/pull/5392), Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).
-- May 2025: Important: image name changes [See release](https://github.com/mudler/LocalAI/releases/tag/v2.29.0)
-- Apr 2025: Rebrand, WebUI enhancements
-- Apr 2025: [LocalAGI](https://github.com/mudler/LocalAGI) and [LocalRecall](https://github.com/mudler/LocalRecall) join the LocalAI family stack.
-- Apr 2025: WebUI overhaul
-- Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images
-- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
-- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
-- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
-- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
-- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
-- Aug 2024: π FLUX-1, [P2P Explorer](https://explorer.localai.io)
-- July 2024: π₯π₯ π P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
-- May 2024: π₯π₯ Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) π Docs https://localai.io/features/distribute/
-- May 2024: π₯π₯ Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
-- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
+## Latest News
-Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
+- **March 2026**: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
+- **February 2026**: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
+- **January 2026**: **LocalAI 3.10.0** – Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0)
+- **December 2025**: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic multi-GPU model fitting (llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
+- **November 2025**: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245), [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
+- **October 2025**: [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support for agentic capabilities
+- **September 2025**: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
+- **August 2025**: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
+- **July 2025**: All backends migrated outside the main binary – [lightweight, modular architecture](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
-## π [Features](https://localai.io/features/)
+For older news and full release notes, see [GitHub Releases](https://github.com/mudler/LocalAI/releases) and the [News page](https://localai.io/basics/news/).
-- π§© [Backend Gallery](https://localai.io/backends/): Install/remove backends on the fly, powered by OCI images β fully customizable and API-driven.
-- π [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
-- π£ [Text to Audio](https://localai.io/features/text-to-audio/)
-- π [Audio to Text](https://localai.io/features/audio-to-text/)
-- π¨ [Image generation](https://localai.io/features/image-generation)
-- π₯ [OpenAI-alike tools API](https://localai.io/features/openai-functions/)
-- β‘ [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
-- π§ [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
-- βοΈ [Constrained grammars](https://localai.io/features/constrained_grammars/)
-- πΌοΈ [Download Models directly from Huggingface ](https://localai.io/models/)
-- π₯½ [Vision API](https://localai.io/features/gpt-vision/)
-- π [Object Detection](https://localai.io/features/object-detection/)
-- π [Reranker API](https://localai.io/features/reranker/)
-- ππ§ [P2P Inferencing](https://localai.io/features/distribute/)
-- ππ [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) - Agentic capabilities with external tools and [LocalAGI's Agentic capabilities](https://github.com/mudler/LocalAGI)
-- ππ€ [Built-in Agents](https://localai.io/features/agents/) - Autonomous AI agents with tool use, knowledge base (RAG), skills, SSE streaming, import/export, and [Agent Hub](https://agenthub.localai.io) β powered by [LocalAGI](https://github.com/mudler/LocalAGI)
-- π Voice activity detection (Silero-VAD support)
-- π Integrated WebUI!
+## Features
-## π§© Supported Backends & Acceleration
+- [Text generation](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [and more](https://localai.io/model-compatibility/))
+- [Text to Audio](https://localai.io/features/text-to-audio/)
+- [Audio to Text](https://localai.io/features/audio-to-text/)
+- [Image generation](https://localai.io/features/image-generation)
+- [OpenAI-compatible tools API](https://localai.io/features/openai-functions/)
+- [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
+- [Embeddings generation](https://localai.io/features/embeddings/)
+- [Constrained grammars](https://localai.io/features/constrained_grammars/)
+- [Download models from Huggingface](https://localai.io/models/)
+- [Vision API](https://localai.io/features/gpt-vision/)
+- [Object Detection](https://localai.io/features/object-detection/)
+- [Reranker API](https://localai.io/features/reranker/)
+- [P2P Inferencing](https://localai.io/features/distribute/)
+- [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/)
+- [Built-in Agents](https://localai.io/features/agents/) – Autonomous AI agents with tool use, RAG, skills, SSE streaming, and [Agent Hub](https://agenthub.localai.io)
+- [Backend Gallery](https://localai.io/backends/) – Install/remove backends on the fly via OCI images
+- Voice Activity Detection (Silero-VAD)
+- Integrated WebUI
-LocalAI supports a comprehensive range of AI backends with multiple acceleration options:
+## Supported Backends & Acceleration
-### Text Generation & Language Models
-| Backend | Description | Acceleration Support |
-|---------|-------------|---------------------|
-| **llama.cpp** | LLM inference in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
-| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12/13, ROCm, Intel |
-| **transformers** | HuggingFace transformers framework | CUDA 12/13, ROCm, Intel, CPU |
-| **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
-| **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
-| **vLLM Omni** | Multimodal vLLM with vision and audio | CUDA 12/13, ROCm, Intel |
+LocalAI supports **35+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on the fly from the [Backend Gallery](https://localai.io/backends/).
-### Audio & Speech Processing
-| Backend | Description | Acceleration Support |
-|---------|-------------|---------------------|
-| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
-| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, CPU |
-| **moonshine** | Ultra-fast transcription engine for low-end devices | CUDA 12/13, Metal, CPU |
-| **coqui** | Advanced TTS with 1100+ languages | CUDA 12/13, ROCm, Intel, CPU |
-| **kokoro** | Lightweight TTS model | CUDA 12/13, ROCm, Intel, CPU |
-| **chatterbox** | Production-grade TTS | CUDA 12/13, CPU |
-| **piper** | Fast neural TTS system | CPU |
-| **kitten-tts** | Kitten TTS models | CPU |
-| **silero-vad** | Voice Activity Detection | CPU |
-| **neutts** | Text-to-speech with voice cloning | CUDA 12/13, ROCm, CPU |
-| **vibevoice** | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU |
-| **pocket-tts** | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU |
-| **qwen-tts** | High-quality TTS with custom voice, voice design, and voice cloning | CUDA 12/13, ROCm, Intel, CPU |
-| **nemo** | NVIDIA NeMo framework for speech models | CUDA 12/13, ROCm, Intel, CPU |
-| **outetts** | OuteTTS with voice cloning | CUDA 12/13, CPU |
-| **faster-qwen3-tts** | Faster Qwen3 TTS | CUDA 12/13, ROCm, Intel, CPU |
-| **qwen-asr** | Qwen ASR speech recognition | CUDA 12/13, ROCm, Intel, CPU |
-| **voxcpm** | VoxCPM speech understanding | CUDA 12/13, Metal, CPU |
-| **whisperx** | Enhanced Whisper transcription | CUDA 12/13, ROCm, Intel, CPU |
-| **ace-step** | Music generation from text descriptions, lyrics, or audio samples | CUDA 12/13, ROCm, Intel, Metal, CPU |
+See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
-### Image & Video Generation
-| Backend | Description | Acceleration Support |
-|---------|-------------|---------------------|
-| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12/13, Intel SYCL, Vulkan, CPU |
-| **diffusers** | HuggingFace diffusion models | CUDA 12/13, ROCm, Intel, Metal, CPU |
+## Resources
-### Specialized AI Tasks
-| Backend | Description | Acceleration Support |
-|---------|-------------|---------------------|
-| **rfdetr** | Real-time object detection | CUDA 12/13, Intel, CPU |
-| **rerankers** | Document reranking API | CUDA 12/13, ROCm, Intel, CPU |
-| **local-store** | Vector database | CPU |
-| **huggingface** | HuggingFace API integration | API-based |
+- [Documentation](https://localai.io/)
+- [LLM fine-tuning guide](https://localai.io/docs/advanced/fine-tuning/)
+- [Build from source](https://localai.io/basics/build/)
+- [Kubernetes installation](https://localai.io/basics/getting_started/#run-localai-in-kubernetes)
+- [Integrations & community projects](https://localai.io/docs/integrations/)
+- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
+- [Examples](https://github.com/mudler/LocalAI-examples)
-### Hardware Acceleration Matrix
+## Autonomous Development Team
-| Acceleration Type | Supported Backends | Hardware Support |
-|-------------------|-------------------|------------------|
-| **NVIDIA CUDA 12** | All CUDA-compatible backends | Nvidia hardware |
-| **NVIDIA CUDA 13** | All CUDA-compatible backends | Nvidia hardware |
-| **AMD ROCm** | llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts, ace-step | AMD Graphics |
-| **Intel oneAPI** | llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts, ace-step | Intel Arc, Intel iGPUs |
-| **Apple Metal** | llama.cpp, whisper, diffusers, MLX, MLX-VLM, moonshine, ace-step | Apple M1/M2/M3+ |
-| **Vulkan** | llama.cpp, whisper, stablediffusion | Cross-platform GPUs |
-| **NVIDIA Jetson (CUDA 12)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr, ace-step | ARM64 embedded AI (AGX Orin, etc.) |
-| **NVIDIA Jetson (CUDA 13)** | llama.cpp, whisper, stablediffusion, diffusers, rfdetr | ARM64 embedded AI (DGX Spark) |
-| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support |
+LocalAI is also maintained, for small tasks, by a team of autonomous AI agents led by an AI Scrum Master.
-### π Community and integrations
-
-Build and deploy custom containers:
-- https://github.com/sozercan/aikit
-
-WebUIs:
-- https://github.com/Jirubizu/localai-admin
-- https://github.com/go-skynet/LocalAI-frontend
-- QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot
-
-Agentic Libraries:
-- https://github.com/mudler/cogito
-
-MCPs:
-- https://github.com/mudler/MCPs
-
-OS Assistant:
-
-- https://github.com/mudler/Keygeist - Keygeist is an AI-powered keyboard operator that listens for key combinations and responds with AI-generated text typed directly into your Linux box.
-
-Model galleries
-- https://github.com/go-skynet/model-gallery
-
-Voice:
-- https://github.com/richiejp/VoxInput
-
-Other:
-- Helm chart https://github.com/go-skynet/helm-charts
-- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
-- Langchain: https://python.langchain.com/docs/integrations/providers/localai/
-- Terminal utility https://github.com/djcopley/ShellOracle
-- Local Smart assistant https://github.com/mudler/LocalAGI
-- Home Assistant https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-llmvision / https://github.com/loryanstrant/HA-LocalAI-Monitor
-- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
-- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
-- Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot
-- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
-- Another Telegram Bot https://github.com/JackBekket/Hellper
-- Auto-documentation https://github.com/JackBekket/Reflexia
-- Github bot which answer on issues, with code and documentation as context https://github.com/JackBekket/GitHelper
-- Github Actions: https://github.com/marketplace/actions/start-localai
-- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
-
-
-### π Resources
-
-- [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
-- [How to build locally](https://localai.io/basics/build/index.html)
-- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
-- [Projects integrating LocalAI](https://localai.io/docs/integrations/)
-
-## :book: π₯ [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
-
-- π [LocalAI Autonomous Dev Team Blog Post](https://mudler.pm/posts/2026/02/28/a-call-to-open-source-maintainers-stop-babysitting-ai-how-i-built-a-100-local-autonomous-dev-team-to-maintain-localai-and-why-you-should-too/)
-- [Run Visual studio code with LocalAI (SUSE)](https://www.suse.com/c/running-ai-locally/)
-- π [Run LocalAI on Jetson Nano Devkit](https://mudler.pm/posts/local-ai-jetson-nano-devkit/)
-- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/blog/low-code-llm-apps-with-local-ai-flowise-and-pulumi/)
-- [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
-- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
-- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
-- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
-- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)
-
-## π€ Autonomous Development Team
-
-LocalAI is now helped being maintained (for small tasks!) by a full team of autonomous AI agents led by an AI Scrum Master! This experiment demonstrates how open source projects can leverage AI agents for sustainable, long-term maintenance.
-
-- **π Live Reports**: [Automatically generated reports](http://reports.localai.io)
-- **π Project Board**: [Agent task tracking](https://github.com/users/mudler/projects/6)
-- **π Blog Post**: [Learn about the autonomous dev team experiment](https://mudler.pm/posts/2026/02/28/a-call-to-open-source-maintainers-stop-babysitting-ai-how-i-built-a-100-local-autonomous-dev-team-to-maintain-localai-and-why-you-should-too/)
+- **Live Reports**: [reports.localai.io](http://reports.localai.io)
+- **Project Board**: [Agent task tracking](https://github.com/users/mudler/projects/6)
+- **Blog Post**: [Learn about the experiment](https://mudler.pm/posts/2026/02/28/a-call-to-open-source-maintainers-stop-babysitting-ai-how-i-built-a-100-local-autonomous-dev-team-to-maintain-localai-and-why-you-should-too/)
## Citation
@@ -363,7 +198,7 @@ If you utilize this repository, data in a downstream project, please consider ci
howpublished = {\url{https://github.com/go-skynet/LocalAI}},
```
-## β€οΈ Sponsors
+## Sponsors
> Do you find LocalAI useful?
@@ -382,19 +217,19 @@ A huge thank you to our generous sponsors who support this project covering CI e
### Individual sponsors
-A special thanks to individual sponsors that contributed to the project, a full list is in [Github](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler), a special shout out goes to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
+A special thanks to our individual sponsors; the full list is on [GitHub](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler). A shout-out to [drikster80](https://github.com/drikster80) for his generosity. Thank you everyone!
-## π Star history
+## Star history
[](https://star-history.com/#go-skynet/LocalAI&Date)
-## π License
+## License
LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
MIT - Author Ettore Di Giacinto
-## π Acknowledgements
+## Acknowledgements
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
@@ -407,9 +242,9 @@ LocalAI couldn't have been built without the help of great software already avai
- https://github.com/rhasspy/piper
- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation
-## π€ Contributors
+## Contributors
-This is a community project, a special thanks to our contributors! π€
+This is a community project, a special thanks to our contributors!
diff --git a/docs/content/integrations.md b/docs/content/integrations.md
index 4851267de..0f147002a 100644
--- a/docs/content/integrations.md
+++ b/docs/content/integrations.md
@@ -6,35 +6,94 @@ icon = "sync"
+++
-## Community integrations
+## Community Integrations
-List of projects that are using directly LocalAI behind the scenes can be found [here](https://github.com/mudler/LocalAI#-community-and-integrations).
+The lists below cover software and community projects that integrate with LocalAI.
-The list below is a list of software that integrates with LocalAI.
+Feel free to open a pull request (click "Edit page" below) to get your project added!
+
+### Build & Deploy
+
+- [aikit](https://github.com/sozercan/aikit) – Build and deploy custom LocalAI containers
+- [Helm chart](https://github.com/go-skynet/helm-charts) – Deploy LocalAI on Kubernetes
+- [GitHub Actions](https://github.com/marketplace/actions/start-localai) – Use LocalAI in CI/CD workflows
+
+### Web UIs
+
+- [localai-admin](https://github.com/Jirubizu/localai-admin)
+- [LocalAI-frontend](https://github.com/go-skynet/LocalAI-frontend)
+- [QA-Pilot](https://github.com/reid41/QA-Pilot) – Interactive chat for navigating GitHub code repositories
+- [Big AGI](https://github.com/enricoros/big-agi) – Powerful web interface running entirely in the browser
+
+### Agentic Libraries & Assistants
+
+- [cogito](https://github.com/mudler/cogito) – Agentic library for Go
+- [LocalAGI](https://github.com/mudler/LocalAGI) – Local smart assistant with autonomous agents
+
+### MCP Servers
+
+- [MCPs](https://github.com/mudler/MCPs) – Model Context Protocol servers
+
+### OS Assistants
+
+- [Keygeist](https://github.com/mudler/Keygeist) – AI-powered keyboard operator for Linux
+
+### Voice
+
+- [VoxInput](https://github.com/richiejp/VoxInput) – Use voice to control your desktop
+
+### IDE & Editor Plugins
+
+- [VSCode extension](https://github.com/badgooooor/localai-vscode-plugin)
+- [GPTLocalhost (Word Add-in)](https://gptlocalhost.com/demo#LocalAI) – Use LocalAI inside Microsoft Word, locally
+
+### Framework Integrations
+
+- [Langchain (Python)](https://python.langchain.com/docs/integrations/providers/localai/) – [pypi](https://pypi.org/project/langchain-localai/)
+- [langchain4j](https://github.com/langchain4j/langchain4j) – LangChain for Java
+- [lingoose](https://github.com/henomis/lingoose) – Go framework for LLM apps
+- [LLPhant](https://github.com/theodo-group/LLPhant) – PHP library for LLMs and vector databases
+- [FlowiseAI](https://github.com/FlowiseAI/Flowise) – Low-code LLM app builder
+- [LLMStack](https://github.com/trypromptly/LLMStack)
+- [Midori AI Subsystem Manager](https://io.midori-ai.xyz/subsystem/manager/)
+
+### Terminal Tools
+
+- [ShellOracle](https://github.com/djcopley/ShellOracle) – Terminal utility
+- [Shell-Pilot](https://github.com/reid41/shell-pilot) – Interact with LLMs via pure shell scripts
+- [Mods](https://github.com/charmbracelet/mods) – AI on the command line
+
+### Chat Bots
+
+- [Discord bot](https://github.com/mudler/LocalAGI/tree/main/examples/discord)
+- [Slack bot](https://github.com/mudler/LocalAGI/tree/main/examples/slack)
+- [Telegram bot](https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot)
+- [Hellper (Telegram)](https://github.com/JackBekket/Hellper)
+
+### Home Automation
+
+- [hass-openai-custom-conversation](https://github.com/drndos/hass-openai-custom-conversation) β Home Assistant integration
+- [ha-llmvision](https://github.com/valentinfrlch/ha-llmvision) β Home Assistant LLM Vision
+- [HA-LocalAI-Monitor](https://github.com/loryanstrant/HA-LocalAI-Monitor) β Home Assistant monitoring
+- Nextcloud [integration plugin](https://apps.nextcloud.com/apps/integration_openai) and [AI assistant](https://apps.nextcloud.com/apps/assistant)
+
+### Automation & DevOps
+
+- [Reflexia](https://github.com/JackBekket/Reflexia) β Auto-documentation
+- [GitHelper](https://github.com/JackBekket/GitHelper) β GitHub bot for issues with code and documentation context
+- [kairos](https://github.com/kairos-io/kairos) β Immutable Linux OS
+
+### Other Integrations
- [AnythingLLM](https://github.com/Mintplex-Labs/anything-llm)
-- [Logseq GPT3 OpenAI plugin](https://github.com/briansunter/logseq-plugin-gpt3-openai) allows to set a base URL, and works with LocalAI.
-- https://plugins.jetbrains.com/plugin/21056-codegpt allows for custom OpenAI compatible endpoints since 2.4.0
-- [Wave Terminal](https://docs.waveterm.dev/features/supportedLLMs/localai) has native support for LocalAI!
-- https://github.com/longy2k/obsidian-bmo-chatbot
-- https://github.com/FlowiseAI/Flowise
-- https://github.com/k8sgpt-ai/k8sgpt
-- https://github.com/kairos-io/kairos
-- https://github.com/langchain4j/langchain4j
-- https://github.com/henomis/lingoose
-- https://github.com/trypromptly/LLMStack
-- https://github.com/mattermost/openops
-- https://github.com/charmbracelet/mods
-- https://github.com/cedriking/spark
-- [Big AGI](https://github.com/enricoros/big-agi) is a powerful web interface entirely running in the browser, supporting LocalAI
-- [Midori AI Subsystem Manager](https://io.midori-ai.xyz/subsystem/manager/) is a powerful docker subsystem for running all types of AI programs
-- [LLPhant](https://github.com/theodo-group/LLPhant) is a PHP library for interacting with LLMs and Vector Databases
-- [GPTLocalhost (Word Add-in)](https://gptlocalhost.com/demo#LocalAI) - run LocalAI in Microsoft Word locally
-- use LocalAI from Nextcloud with the [integration plugin](https://apps.nextcloud.com/apps/integration_openai) and [AI assistant](https://apps.nextcloud.com/apps/assistant)
-- [Langchain](https://docs.langchain.com/oss/python/integrations/providers/localai) integration package [pypi](https://pypi.org/project/langchain-localai/)
-- [VoxInput](https://github.com/richiejp/VoxInput) - Use voice to control your desktop
-
-Feel free to open up a Pull request (by clicking at the "Edit page" below) to get a page for your project made or if you see a error on one of the pages!
+- [Logseq GPT3 OpenAI plugin](https://github.com/briansunter/logseq-plugin-gpt3-openai)
+- [CodeGPT (JetBrains)](https://plugins.jetbrains.com/plugin/21056-codegpt) β Custom OpenAI-compatible endpoints
+- [Wave Terminal](https://docs.waveterm.dev/features/supportedLLMs/localai) β Native LocalAI support
+- [Obsidian BMO Chatbot](https://github.com/longy2k/obsidian-bmo-chatbot)
+- [spark](https://github.com/cedriking/spark)
+- [openops (Mattermost)](https://github.com/mattermost/openops)
+- [Model Gallery](https://github.com/go-skynet/model-gallery)
+- [Examples](https://github.com/mudler/LocalAI/tree/master/examples/)
## Configuration Guides
diff --git a/docs/content/reference/compatibility-table.md b/docs/content/reference/compatibility-table.md
index fc3033aa9..80cf4e781 100644
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -16,55 +16,72 @@ LocalAI will attempt to automatically load models which are not explicitly confi
## Text Generation & Language Models
-| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
-|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
-| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
-| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 12/13, ROCm, Intel, CPU |
-| [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
-| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
-| [vllm-omni](https://github.com/vllm-project/vllm) | vLLM Omni multimodal | yes | Multimodal GPT | no | no | CUDA 12/13, ROCm, Intel |
-| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
+| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
+|---------|-------------|------------|------------|-----------|-------------|
+| [llama.cpp](https://github.com/ggerganov/llama.cpp) | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
+| [vLLM](https://github.com/vllm-project/vllm) | Fast LLM serving with PagedAttention | GPT | no | no | CUDA 12, ROCm, Intel |
+| [vLLM Omni](https://github.com/vllm-project/vllm) | Unified multimodal generation (text, image, video, audio) | Multimodal GPT | no | no | CUDA 12, ROCm |
+| [transformers](https://github.com/huggingface/transformers) | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
+| [MLX](https://github.com/ml-explore/mlx-lm) | Apple Silicon LLM inference | GPT | no | no | Metal |
+| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models on Apple Silicon | Multimodal GPT | no | no | Metal |
+| [MLX Distributed](https://github.com/ml-explore/mlx-lm) | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |
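
All of the text-generation backends above are served through the same OpenAI-compatible `/v1/chat/completions` endpoint. As a minimal sketch (assuming a LocalAI instance listening on `localhost:8080` and a placeholder model name `my-llama` already configured), a request needs nothing beyond the standard library:

```python
import json
import urllib.request

# Hypothetical request: "my-llama" is a placeholder model name; adjust the
# host/port to wherever your LocalAI instance listens.
payload = {
    "model": "my-llama",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
except OSError:
    print("could not reach LocalAI on localhost:8080")
```

Because the request shape is the standard OpenAI one, existing OpenAI client libraries can also be pointed at LocalAI by overriding their base URL.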
-## Audio & Speech Processing
+## Speech-to-Text
-| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
-|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
-| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU |
-| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 12/13, CPU |
-| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU |
-| [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
-| [neutts](https://github.com/neuphonic/neuttsair) | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
-| [vibevoice](https://github.com/microsoft/VibeVoice) | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [pocket-tts](https://github.com/kyutai-labs/pocket-tts) | Pocket TTS | no | Lightweight CPU-based text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [mlx-audio](https://github.com/Blaizzy/mlx-audio) | MLX | no | Text-tospeech | no | no | Metal (Apple Silicon) |
-| [nemo](https://github.com/NVIDIA/NeMo) | NeMo speech models | no | Speech models | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [outetts](https://github.com/edwengc/outetts) | OuteTTS | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, CPU |
-| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Faster Qwen3 TTS | no | Fast text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [qwen-asr](https://github.com/QwenLM/Qwen-ASR) | Qwen ASR | no | Automatic speech recognition | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [voxcpm](https://github.com/voxcpm/voxcpm) | VoxCPM | no | Speech understanding | no | no | CUDA 12/13, Metal, CPU |
-| [whisperx](https://github.com/m-bain/whisperX) | WhisperX | no | Enhanced transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
+| Backend | Description | Acceleration |
+|---------|-------------|-------------|
+| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
+| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
+| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
+| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
+| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
+| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [NeMo](https://github.com/NVIDIA/NeMo) | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |
+
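Speech-to-text backends are selected per model through a small YAML definition placed in the models directory. A minimal sketch for whisper.cpp (the model name and file name below are illustrative):

```yaml
name: whisper-1                  # name exposed on /v1/audio/transcriptions
backend: whisper                 # whisper.cpp backend from the table above
parameters:
  model: ggml-whisper-base.bin   # model file in the models directory
```
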
+## Text-to-Speech
+
+| Backend | Description | Acceleration |
+|---------|-------------|-------------|
+| [piper](https://github.com/rhasspy/piper) | Fast neural TTS | CPU |
+| [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
+| [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [Chatterbox](https://github.com/resemble-ai/chatterbox) | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
+| [VibeVoice](https://github.com/microsoft/VibeVoice) | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [fish-speech](https://github.com/fishaudio/fish-speech) | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
+| [OuteTTS](https://github.com/OuteAI/outetts) | TTS with custom speaker voices | CPU, CUDA 12 |
+| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
+| [NeuTTS Air](https://github.com/neuphonic/neutts-air) | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
+| [VoxCPM](https://github.com/ModelBest/VoxCPM) | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
+| [Kitten TTS](https://github.com/KittenML/KittenTTS) | Ultra-lightweight TTS for low-resource devices | CPU, Metal |
+| [MLX-Audio](https://github.com/Blaizzy/mlx-audio) | Audio models on Apple Silicon | Metal, CPU, CUDA 12/13, Jetson L4T |
+
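As with transcription, a TTS backend is wired up through a model YAML file, and the resulting voice is addressed by its `name` in TTS requests. A minimal sketch for piper (the voice file name is illustrative):

```yaml
name: voice-en-us                  # name to pass as "model" in TTS requests
backend: piper
parameters:
  model: en-us-kathleen-low.onnx   # piper voice file in the models directory
```
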
+## Music Generation
+
+| Backend | Description | Acceleration |
+|---------|-------------|-------------|
+| [ACE-Step](https://github.com/ace-step/ACE-Step-1.5) | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
+| [acestep.cpp](https://github.com/ace-step/acestep.cpp) | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
## Image & Video Generation
-| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
-|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
-| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 12/13, ROCm, Intel, Metal, CPU |
-| [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU |
+| Backend | Description | Acceleration |
+|---------|-------------|-------------|
+| [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
+| [diffusers](https://github.com/huggingface/diffusers) | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
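
Image backends follow the same configuration pattern and serve the OpenAI-style `/v1/images/generations` endpoint. A sketch for diffusers (model reference and pipeline options are illustrative, not a complete configuration):

```yaml
name: sd-1.5
backend: diffusers
parameters:
  model: runwayml/stable-diffusion-v1-5   # HuggingFace repo ID or local path
diffusers:
  pipeline_type: StableDiffusionPipeline  # pipeline class to instantiate
```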
-## Specialized AI Tasks
+## Specialized Tasks
-| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
-|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
-| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 12/13, ROCm, Intel, CPU |
-| [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU |
-| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
+| Backend | Description | Acceleration |
+|---------|-------------|-------------|
+| [RF-DETR](https://github.com/roboflow/rf-detr) | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
+| [rerankers](https://github.com/AnswerDotAI/rerankers) | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
+| [local-store](https://github.com/mudler/LocalAI) | Local vector database for embeddings | CPU, Metal |
+| [Silero VAD](https://github.com/snakers4/silero-vad) | Voice Activity Detection | CPU |
+| [TRL](https://github.com/huggingface/trl) | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
+| [llama.cpp quantization](https://github.com/ggml-org/llama.cpp) | HuggingFace β GGUF model conversion and quantization | CPU, Metal |
+| [Opus](https://opus-codec.org/) | Audio codec for WebRTC / Realtime API | CPU, Metal |
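
The rerankers backend exposes a Jina-style rerank API: you POST a query together with candidate documents and receive relevance-scored results, which is useful as a second-stage ranker in RAG pipelines. A minimal sketch (model name and host are placeholders), using only the standard library:

```python
import json
import urllib.request

# Hypothetical request; "my-reranker" must match a configured rerankers model.
payload = {
    "model": "my-reranker",
    "query": "What hardware does LocalAI support?",
    "documents": [
        "LocalAI runs on CPU-only machines.",
        "Bananas are rich in potassium.",
    ],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/rerank",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        # Jina-style response: each result carries the document index
        # and a relevance score.
        for item in json.load(resp)["results"]:
            print(item["index"], item["relevance_score"])
except OSError:
    print("could not reach LocalAI on localhost:8080")
```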
## Acceleration Support Summary
diff --git a/docs/content/whats-new.md b/docs/content/whats-new.md
index f3b57c178..05dc2e961 100644
--- a/docs/content/whats-new.md
+++ b/docs/content/whats-new.md
@@ -6,10 +6,20 @@ url = '/basics/news/'
icon = "newspaper"
+++
-Release notes have been now moved completely over Github releases.
+Release notes have moved entirely to GitHub releases.
You can see the release notes [here](https://github.com/mudler/LocalAI/releases).
+## 2024 Highlights
+
+- **April 2024**: [Reranker API](https://github.com/mudler/LocalAI/pull/2121)
+- **May 2024**: [Distributed inferencing](https://github.com/mudler/LocalAI/pull/2324), [Decentralized P2P llama.cpp](https://github.com/mudler/LocalAI/pull/2343) β [Docs](https://localai.io/features/distribute/)
+- **July/August 2024**: [P2P Dashboard, Federated mode and AI Swarms](https://github.com/mudler/LocalAI/pull/2723), [P2P Global community pools](https://github.com/mudler/LocalAI/issues/3113), FLUX-1 support, [P2P Explorer](https://explorer.localai.io)
+- **October 2024**: Examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
+- **November 2024**: [Voice Activity Detection (VAD)](https://github.com/mudler/LocalAI/pull/4204), [Bark.cpp backend](https://github.com/mudler/LocalAI/pull/4287)
+- **December 2024**: [stablediffusion.cpp backend (ggml)](https://github.com/mudler/LocalAI/pull/4289)
+
+---
## 04-12-2023: __v2.0.0__