docs: document the privacy-filter.cpp backend in README and compatibility table The privacy-filter.cpp backend (#10360) was registered in backend/index.yaml and referenced from the PII feature docs, but was missing from the backend catalog surfaces. Add it to the README "Backends built by us" table, the compatibility table (Utilities & Other, CPU/CUDA 13/Vulkan), and the backend type list in the backends feature doc. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
12 KiB
+++ disableToc = false title = "Model compatibility table" weight = 24 url = "/model-compatibility/" +++
Besides llama based models, LocalAI is compatible also with other architectures. The table below lists all the backends, compatible models families and the associated repository.
{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.
All backends listed here can be installed on demand from the [Backend Gallery]({{%relref "features/backends" %}}). The exact set of acceleration variants published for each backend is defined in backend/index.yaml.
{{% /notice %}}
Text Generation & Language Models
| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| ik_llama.cpp | Hard fork of llama.cpp optimized for CPU/hybrid CPU+GPU with IQK quants, custom quant mixes, and MLA for DeepSeek | GPT | yes | yes | CPU (AVX2+) |
| turboquant | llama.cpp fork adding the TurboQuant KV-cache quantization scheme | GPT | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| ds4 | DeepSeek V4 Flash single-model inference engine, optimized for Metal and CUDA | GPT | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| vLLM | Fast LLM serving with PagedAttention; GPTQ/AWQ/FP8 quantization | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| vLLM Omni | Unified multimodal generation (text, image, video, audio) on top of vLLM | Multimodal GPT, Functions | no | yes | CUDA 12/13, ROCm, Jetson L4T |
| SGLang | Fast serving framework for LLMs and vision-language models with speculative decoding | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| transformers | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CUDA 12/13, ROCm, Intel SYCL, Metal |
| MLX | Apple Silicon LLM inference | GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| MLX-VLM | Vision-Language Models on Apple Silicon | Multimodal GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| MLX Distributed | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | CPU, CUDA 12/13, Metal, Jetson L4T |
| tinygrad | Minimalist deep-learning framework with zero runtime dependencies | GPT, Embeddings, Multimodal | yes | yes | CPU |
Speech-to-Text
| Backend | Description | Acceleration |
|---|---|---|
| whisper.cpp | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| faster-whisper | Fast Whisper with CTranslate2 | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| WhisperX | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal, Jetson L4T |
| moonshine | Ultra-fast transcription for low-end devices (ONNX) | CPU, CUDA 12/13, Metal |
| parakeet.cpp | C++/GGML port of NVIDIA NeMo Parakeet (tdt/ctc/rnnt/hybrid), with cache-aware streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| CrispASR | Unified speech engine (whisper.cpp fork) supporting Parakeet, Canary, and many ASR architectures, plus TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| voxtral | Voxtral Realtime 4B speech-to-text in pure C | CPU, Metal |
| Qwen3-ASR | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| NeMo | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| sherpa-onnx | Sherpa-ONNX ASR (Whisper, Paraformer, SenseVoice) and TTS | CPU, CUDA 12, Metal |
Text-to-Speech
| Backend | Description | Acceleration |
|---|---|---|
| piper | Fast neural TTS | CPU, Metal |
| Coqui TTS | TTS with 1100+ languages and voice cloning | CUDA 12, ROCm, Intel SYCL, Metal |
| Kokoro | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| Kokoros | Pure Rust Kokoro TTS via ONNX | CPU |
| Chatterbox | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| VibeVoice | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| vibevoice.cpp | Native C++/GGML port of VibeVoice for TTS (voice cloning) and long-form ASR with diarization | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| Qwen3-TTS | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| qwentts.cpp | Native C++/GGML Qwen3-TTS with streaming, named speakers, and voice design | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| OmniVoice | Native C++/GGML TTS with voice cloning, voice design, and streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| fish-speech | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| Pocket TTS | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| OuteTTS | TTS with custom speaker voices | CPU, CUDA 12 |
| faster-qwen3-tts | Real-time Qwen3-TTS with CUDA graph capture | CPU, CUDA 12/13, Jetson L4T |
| NeuTTS Air | Instant voice cloning, on-device TTS | CPU, CUDA 12, ROCm |
| VoxCPM | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| Kitten TTS | Kitten TTS model | CPU, Metal |
| Supertonic | Lightning-fast on-device multilingual TTS via ONNX | CPU |
| MLX-Audio | Audio models on Apple Silicon | CPU, CUDA 12/13, Metal, Jetson L4T |
| liquid-audio | LFM2 end-to-end speech-to-speech, ASR, and TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
Music & Sound Generation
| Backend | Description | Acceleration |
|---|---|---|
| ACE-Step | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| acestep.cpp | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
Image & Video Generation
| Backend | Description | Acceleration |
|---|---|---|
| stable-diffusion.cpp | Stable Diffusion, Flux, PhotoMaker, Ideogram in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| diffusers | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| vLLM Omni | Multimodal generation including text-to-image and text-to-video | CUDA 12/13, ROCm, Jetson L4T |
Vision, Detection & Recognition
| Backend | Description | Acceleration |
|---|---|---|
| RF-DETR | Real-time transformer-based object detection (Python) | CPU, CUDA 12/13, Intel SYCL, Metal, Jetson L4T |
| rf-detr.cpp | Native RF-DETR object detection and instance segmentation in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| locate-anything.cpp | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| depth-anything.cpp | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| sam3.cpp | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| insightface | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
| speaker-recognition | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |
Audio Processing
| Backend | Description | Acceleration |
|---|---|---|
| Silero VAD | Voice Activity Detection | CPU, Metal |
| LocalVQE | Joint acoustic echo cancellation, noise suppression, and dereverberation in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| Opus | Audio codec for WebRTC / Realtime API | CPU, Metal |
Utilities & Other
| Backend | Description | Acceleration |
|---|---|---|
| rerankers | Document reranking for RAG | CUDA 12, ROCm, Intel SYCL, Metal |
| privacy-filter.cpp | Standalone GGML engine for the openai-privacy-filter PII/NER token-classification model family (powers LocalAI's PII redaction tier) | CPU, CUDA 13, Vulkan |
| local-store | Local-first vector database for embeddings | CPU, Metal |
| TRL | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| llama.cpp quantization | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
Acceleration Support Summary
GPU Acceleration
- NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)
Specialized Hardware
- NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support
CPU Optimization
- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support
Note: any backend name listed above can be used in the backend field of the model configuration file (See [the advanced section]({{%relref "advanced" %}})).
- * Only for CUDA and OpenVINO CPU/XPU acceleration.