+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++
Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list all the backends, the compatible model families, and the associated repositories.
{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file (see the example below this note, and [the advanced section]({{%relref "advanced" %}}) for more details).
{{% /notice %}}
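For example, a minimal model configuration that pins a model to a specific backend might look like the following sketch (the model name and file name are placeholders, not real artifacts):

```yaml
# model.yaml - a minimal sketch; "my-model" and the GGUF file name are hypothetical
name: my-model
backend: llama.cpp
parameters:
  model: my-model.Q4_K_M.gguf
```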
## Text Generation & Language Models
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes | yes | CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| vLLM | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
| transformers | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 12/13, ROCm, Intel, CPU |
| exllama2 | GPTQ | yes | GPT only | no | no | CUDA 12/13 |
| MLX | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| MLX-VLM | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
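As a sketch, an embeddings model served through the transformers backend could be configured along these lines (the Hugging Face repository id is an illustrative assumption, and exact fields may vary by backend version):

```yaml
# Hypothetical embeddings configuration using the transformers backend
name: text-embedding
backend: transformers
embeddings: true  # expose this model on the embeddings endpoint
parameters:
  model: sentence-transformers/all-MiniLM-L6-v2  # example repository id
```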
## Audio & Speech Processing
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| whisper.cpp | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| faster-whisper | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
| piper (binding) | Any piper onnx model | no | Text to voice | no | no | CPU |
| bark | bark | no | Audio generation | no | no | CUDA 12/13, ROCm, Intel |
| bark-cpp | bark | no | Audio generation | no | no | CUDA, Metal, CPU |
| coqui | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| kokoro | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| chatterbox | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 12/13, CPU |
| kitten-tts | Kitten TTS | no | Text-to-speech | no | no | CPU |
| silero-vad with Golang bindings | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| neutts | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
| vibevoice | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| mlx-audio | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |
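For instance, a piper text-to-voice model can be wired up with a configuration along these lines (the voice file name is illustrative; any piper ONNX voice model should work):

```yaml
# Sketch of a text-to-speech configuration using the piper backend
name: voice-en-us
backend: piper
parameters:
  model: en-us-kathleen-low.onnx  # placeholder piper ONNX voice
```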
## Image & Video Generation
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| stablediffusion.cpp | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| diffusers | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 12/13, ROCm, Intel, Metal, CPU |
| transformers-musicgen | MusicGen | no | Audio generation | no | no | CUDA, CPU |
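As an illustrative sketch, an image-generation model using the diffusers backend might be declared as follows (the repository id and pipeline type are assumptions for the example, not defaults):

```yaml
# Hypothetical image generation configuration with the diffusers backend
name: sd-image
backend: diffusers
f16: true  # half precision on supported GPUs
parameters:
  model: stabilityai/stable-diffusion-2-1  # example repository id
diffusers:
  pipeline_type: StableDiffusionPipeline
```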
## Specialized AI Tasks
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|---|---|---|---|---|---|---|
| rfdetr | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
| rerankers | Reranking API | no | Reranking | no | no | CUDA 12/13, ROCm, Intel, CPU |
| local-store | Vector database | no | Vector storage | yes | no | CPU |
| huggingface | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
## Acceleration Support Summary

### GPU Acceleration
- NVIDIA CUDA: CUDA 12.0, CUDA 13.0 support across most backends
- AMD ROCm: HIP-based acceleration for AMD GPUs
- Intel oneAPI: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- Vulkan: Cross-platform GPU acceleration
- Metal: Apple Silicon GPU acceleration (M1/M2/M3+)
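How much of a model runs on the GPU is configured per model. As a sketch for the llama.cpp backend, layers can be offloaded with something like the following (the layer count is a placeholder to tune against your available VRAM):

```yaml
# Sketch of GPU offloading options for the llama.cpp backend
name: my-gpu-model
backend: llama.cpp
f16: true        # use half precision where the hardware supports it
gpu_layers: 35   # number of layers to offload to the GPU (placeholder value)
parameters:
  model: my-model.Q4_K_M.gguf  # hypothetical model file
```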
### Specialized Hardware
- NVIDIA Jetson (L4T CUDA 12): ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- NVIDIA Jetson (L4T CUDA 13): ARM64 support for embedded AI (DGX Spark)
- Apple Silicon: Native Metal acceleration for Mac M1/M2/M3+
- Darwin x86: Intel Mac support
### CPU Optimization
- AVX/AVX2/AVX512: Advanced vector extensions for x86
- Quantization: 4-bit, 5-bit, 8-bit integer quantization support
- Mixed Precision: F16/F32 mixed precision support
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}}) and the sketch below).

\* Token streaming is supported only with CUDA and OpenVINO CPU/XPU acceleration.
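Putting it together, the backend names from the tables drop straight into the `backend` field. For example, an audio-transcription model could be declared like this sketch (the model file name is a placeholder):

```yaml
# Hypothetical transcription model pinned to the whisper backend
name: whisper-1
backend: whisper
parameters:
  model: ggml-whisper-base.bin  # placeholder whisper model file
```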