+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list the available backends, the model families they support, and their capabilities.

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

{{% /notice %}}
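For example, a minimal model configuration that pins a model to a specific backend might look like the following (the model name and GGUF file name are illustrative, not real files):

```yaml
# models/my-model.yaml — illustrative example of pinning a backend
name: my-model
backend: llama.cpp
parameters:
  model: my-model.Q4_K_M.gguf
```

Any backend name from the tables below can be used as the value of the `backend` field.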

## Text Generation & Language Models

| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| vLLM | Fast LLM serving with PagedAttention | GPT | no | no | CUDA 12, ROCm, Intel |
| vLLM Omni | Unified multimodal generation (text, image, video, audio) | Multimodal GPT | no | no | CUDA 12, ROCm |
| transformers | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
| MLX | Apple Silicon LLM inference | GPT | no | no | Metal |
| MLX-VLM | Vision-Language Models on Apple Silicon | Multimodal GPT | no | no | Metal |
| MLX Distributed | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |

## Speech-to-Text

| Backend | Description | Acceleration |
|---|---|---|
| whisper.cpp | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| faster-whisper | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| WhisperX | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| moonshine | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| voxtral | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| Qwen3-ASR | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| NeMo | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |

## Text-to-Speech

| Backend | Description | Acceleration |
|---|---|---|
| piper | Fast neural TTS | CPU |
| Coqui TTS | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kokoro | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Chatterbox | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| VibeVoice | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Qwen3-TTS | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| fish-speech | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| Pocket TTS | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| OuteTTS | TTS with custom speaker voices | CPU, CUDA 12 |
| faster-qwen3-tts | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
| NeuTTS Air | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
| VoxCPM | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
| Kitten TTS | Kitten TTS model | CPU, Metal |
| MLX-Audio | Audio models on Apple Silicon | Metal, CPU, CUDA 12/13, Jetson L4T |

## Music Generation

| Backend | Description | Acceleration |
|---|---|---|
| ACE-Step | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
| acestep.cpp | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

## Image & Video Generation

| Backend | Description | Acceleration |
|---|---|---|
| stable-diffusion.cpp | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| diffusers | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

## Specialized Tasks

| Backend | Description | Acceleration |
|---|---|---|
| RF-DETR | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
| rerankers | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
| Silero VAD | Voice Activity Detection | CPU |
| local-store | Local vector database for embeddings | CPU, Metal |
| TRL | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| llama.cpp quantization | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
| Opus | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Acceleration Support Summary

### GPU Acceleration

- **NVIDIA CUDA**: CUDA 12.0 and CUDA 13.0 support across most backends
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- **Vulkan**: Cross-platform GPU acceleration
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)

### Specialized Hardware

- **NVIDIA Jetson (L4T CUDA 12)**: ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- **NVIDIA Jetson (L4T CUDA 13)**: ARM64 support for embedded AI (DGX Spark)
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
- **Darwin x86**: Intel Mac support

### CPU Optimization

- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
- **Quantization**: 4-bit, 5-bit, and 8-bit integer quantization support
- **Mixed Precision**: F16/F32 mixed precision support

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).

\* Streaming in the transformers backend is available only with CUDA and OpenVINO CPU/XPU acceleration.