+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++
Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list all the backends, the compatible model families, and the associated repositories.
{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file (see the example below this note, and [the advanced section]({{%relref "advanced" %}}) for more details).
{{% /notice %}}
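For example, a minimal model configuration that pins a model to a specific backend might look like the following sketch (the model name and file name are placeholders, not real artifacts):

```yaml
# model.yaml - a minimal sketch; "my-model" and the GGUF file name are hypothetical
name: my-model
backend: llama.cpp
parameters:
  model: my-model.Q4_K_M.gguf
```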
## Text Generation & Language Models
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 12/13, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12/13 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
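As a sketch, an embeddings model served through the transformers backend could be configured along these lines (the Hugging Face repository id is an illustrative assumption, and exact fields may vary by backend version):

```yaml
# Hypothetical embeddings configuration using the transformers backend
name: text-embedding
backend: transformers
embeddings: true  # expose this model on the embeddings endpoint
parameters:
  model: sentence-transformers/all-MiniLM-L6-v2  # example repository id
```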
## Audio & Speech Processing
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12/13, ROCm, Intel |
| bark-cpp | bark | no | Audio generation | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 12/13, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| neutts | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
| vibevoice | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| mlx-audio | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |
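For instance, a piper text-to-voice model can be wired up with a configuration along these lines (the voice file name is illustrative; any piper ONNX voice model should work):

```yaml
# Sketch of a text-to-speech configuration using the piper backend
name: voice-en-us
backend: piper
parameters:
  model: en-us-kathleen-low.onnx  # placeholder piper ONNX voice
```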
## Image & Video Generation
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 12/13, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |
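As an illustrative sketch, an image-generation model using the diffusers backend might be declared as follows (the repository id and pipeline type are assumptions for the example, not defaults):

```yaml
# Hypothetical image generation configuration with the diffusers backend
name: sd-image
backend: diffusers
f16: true  # half precision on supported GPUs
parameters:
  model: stabilityai/stable-diffusion-2-1  # example repository id
diffusers:
  pipeline_type: StableDiffusionPipeline
```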
## Specialized AI Tasks
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 12/13, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
## Acceleration Support Summary

### GPU Acceleration
- NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)
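How much of a model runs on the GPU is configured per model. As a sketch for the llama.cpp backend, layers can be offloaded with something like the following (the layer count is a placeholder to tune against your available VRAM):

```yaml
# Sketch of GPU offloading options for the llama.cpp backend
name: my-gpu-model
backend: llama.cpp
f16: true        # use half precision where the hardware supports it
gpu_layers: 35   # number of layers to offload to the GPU (placeholder value)
parameters:
  model: my-model.Q4_K_M.gguf  # hypothetical model file
```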
### Specialized Hardware
- NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support
### CPU Optimization
- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}}) and the sketch below).

\* Token streaming is supported only with CUDA and OpenVINO CPU/XPU acceleration.
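Putting it together, the backend names from the tables drop straight into the `backend` field. For example, an audio-transcription model could be declared like this sketch (the model file name is a placeholder):

```yaml
# Hypothetical transcription model pinned to the whisper backend
name: whisper-1
backend: whisper
parameters:
  model: ggml-whisper-base.bin  # placeholder whisper model file
```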