
+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

Besides llama-based models, LocalAI is also compatible with many other architectures. The tables below list all the backends, the model families each one supports, and the available capabilities and acceleration options.

{{% notice note %}}

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

{{% /notice %}}

## Text Generation & Language Models

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 11/12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12/13, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12/13 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |

## Audio & Speech Processing

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text-to-speech | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12/13, ROCm, Intel |
| bark-cpp | bark | no | Audio generation | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12/13, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| neutts | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
| vibevoice | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| mlx-audio | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |

## Image & Video Generation

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image generation | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12/13, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |

## Specialized AI Tasks

| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 11/12/13, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |

## Acceleration Support Summary

### GPU Acceleration

- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0, CUDA 13.0 support across most backends
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- **Vulkan**: Cross-platform GPU acceleration
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)

### Specialized Hardware

- **NVIDIA Jetson (L4T CUDA 12)**: ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- **NVIDIA Jetson (L4T CUDA 13)**: ARM64 support for embedded AI (DGX Spark)
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
- **Darwin x86**: Intel Mac support

### CPU Optimization

- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
- **Mixed Precision**: F16/F32 mixed precision support

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).
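As a sketch, a model configuration file that pins a model to a specific backend might look like the following. The model name and file below are placeholders, and the backend name is only an example; use one of the backend names from the tables above:

```yaml
# my-model.yaml -- hypothetical model configuration
# "name" is the identifier the model is served under;
# "backend" selects one of the backends listed in the tables above.
name: my-model
backend: llama-cpp
parameters:
  # Placeholder model file; replace with a real model file
  # available in your models directory.
  model: my-model.Q4_K_M.gguf
```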

\* Token streaming in the transformers backend is available only with CUDA and OpenVINO CPU/XPU acceleration.