chore(docs): update list of supported backends (#6134)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
Ettore Di Giacinto
2025-08-24 20:09:19 +02:00
committed by GitHub
parent be132fe816
commit 195aa22e77
3 changed files with 119 additions and 17 deletions

View File

@@ -14,29 +14,77 @@ LocalAI will attempt to automatically load models which are not explicitly confi
{{% /alert %}}
## Text Generation & Language Models
{{< table "table-responsive" >}}
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal |
| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A |
| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12, ROCm, Intel |
| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU |
| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12 |
| [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
| `bark` | bark | no | Audio generation | no | no | yes |
| `autogptq` | GPTQ | yes | GPT | yes | no | N/A |
| `diffusers` | SD,... | no | Image generation | no | no | N/A |
| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
| `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
| `transformers-musicgen` | | no | Audio generation | no | no | N/A |
| stablediffusion | no | Image | no | no | N/A |
| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CPU/CUDA |
| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU |
| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | yes |
| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A |
{{< /table >}}
## Audio & Speech Processing
{{< table "table-responsive" >}}
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU |
| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel |
| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | CUDA, Metal, CPU |
| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU |
| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU |
| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU |
| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU |
| [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
{{< /table >}}
## Image & Video Generation
{{< table "table-responsive" >}}
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU |
| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... | no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU |
| [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU |
{{< /table >}}
## Specialized AI Tasks
{{< table "table-responsive" >}}
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU |
| [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU |
| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
{{< /table >}}
## Acceleration Support Summary
### GPU Acceleration
- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- **Vulkan**: Cross-platform GPU acceleration
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
### Specialized Hardware
- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
- **Darwin x86**: Intel Mac support
### CPU Optimization
- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
- **Mixed Precision**: F16/F32 mixed precision support
Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})).
- \* Only for CUDA and OpenVINO CPU/XPU acceleration.