chore(docs): update list of supported backends (#6134)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-04 13:27:04 -04:00 · 2025-08-24 20:09:19 +02:00
parent be132fe816
commit 195aa22e77
3 changed files with 119 additions and 17 deletions
--- a/docs/content/docs/reference/compatibility-table.md
+++ b/docs/content/docs/reference/compatibility-table.md
@@ -14,29 +14,77 @@ LocalAI will attempt to automatically load models which are not explicitly confi

 {{% /alert %}}

+## Text Generation & Language Models
+
 {{< table "table-responsive" >}}
 | Backend and Bindings                                                             | Compatible models     | Completion/Chat endpoint | Capability | Embeddings support                | Token stream support | Acceleration |
 |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
-| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}})        | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes                      | GPT and Functions                        | yes | yes                  | CUDA, openCL, cuBLAS, Metal |
-| [whisper](https://github.com/ggerganov/whisper.cpp)         | whisper               | no                       | Audio                 | no                                | no                   | N/A |
+| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}})        | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes                      | GPT and Functions                        | yes | yes                  | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU |
+| [vLLM](https://github.com/vllm-project/vllm)        | Various GPTs and quantization formats | yes                      | GPT             | no | no                  | CUDA 12, ROCm, Intel |
+| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats  | yes                      | GPT, embeddings, Audio generation            | yes | yes*                  | CUDA 11/12, ROCm, Intel, CPU |
+| [exllama2](https://github.com/turboderp-org/exllamav2)  | GPTQ                   | yes                       | GPT only                  | no                               | no                   | CUDA 12 |
+| [MLX](https://github.com/ml-explore/mlx-lm)        | Various LLMs               | yes                       | GPT                        | no                                | no                   | Metal (Apple Silicon) |
+| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm)        | Vision-Language Models               | yes                       | Multimodal GPT                        | no                                | no                   | Metal (Apple Silicon) |
 | [langchain-huggingface](https://github.com/tmc/langchaingo)                                                                    | Any text generators available on HuggingFace through API | yes                      | GPT                        | no                                | no                   | N/A |
-| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper))                                                                     | Any piper onnx model | no                      | Text to voice                        | no                                | no                   | N/A |
-| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT                   | no                       | Embeddings only                  | yes                               | no                   | N/A |
-| `bark`  | bark                   | no                       | Audio generation                  | no                               | no                   | yes |
-| `autogptq` | GPTQ                   | yes                       | GPT                  | yes                               | no                   | N/A |
-| `diffusers`  | SD,...                   | no                       | Image generation    | no                               | no                   | N/A |
-| `vllm` | Various GPTs and quantization formats | yes                      | GPT             | no | no                  | CPU/CUDA |
-| `exllama2`  | GPTQ                   | yes                       | GPT only                  | no                               | no                   | N/A |
-| `transformers-musicgen`  |                    | no                       | Audio generation                | no                               | no                   | N/A |
-| stablediffusion               | no                       | Image                 | no                                | no                   | N/A |
-| `coqui` | Coqui    | no                       | Audio generation and Voice cloning    | no                               | no                   | CPU/CUDA |
-| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API    | no                       | Reranking   | no                               | no                   | CPU/CUDA |
-| `transformers` | Various GPTs and quantization formats  | yes                      | GPT, embeddings, Audio generation            | yes | yes*                  | CPU/CUDA/XPU |
-| [bark-cpp](https://github.com/PABannier/bark.cpp)        | bark               | no                       | Audio-Only                 | no                                | no                   | yes |
-| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp)         | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker               | no                       | Image                 | no                                | no                   | N/A |
+{{< /table >}}
+
+## Audio & Speech Processing
+
+{{< table "table-responsive" >}}
+| Backend and Bindings                                                             | Compatible models     | Completion/Chat endpoint | Capability | Embeddings support                | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [whisper.cpp](https://github.com/ggml-org/whisper.cpp)         | whisper               | no                       | Audio transcription                 | no                                | no                   | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU |
+| [faster-whisper](https://github.com/SYSTRAN/faster-whisper)         | whisper               | no                       | Audio transcription                 | no                                | no                   | CUDA 12, ROCm, Intel, CPU |
+| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper))                                                                     | Any piper onnx model | no                      | Text to voice                        | no                                | no                   | CPU |
+| [bark](https://github.com/suno-ai/bark)  | bark                   | no                       | Audio generation                  | no                               | no                   | CUDA 12, ROCm, Intel |
+| [bark-cpp](https://github.com/PABannier/bark.cpp)        | bark               | no                       | Audio-Only                 | no                                | no                   | CUDA, Metal, CPU |
+| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS    | no                       | Audio generation and Voice cloning    | no                               | no                   | CUDA 12, ROCm, Intel, CPU |
+| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS    | no                       | Text-to-speech    | no                               | no                   | CUDA 12, ROCm, Intel, CPU |
+| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS    | no                       | Text-to-speech    | no                               | no                   | CUDA 11/12, CPU |
+| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS    | no                       | Text-to-speech    | no                               | no                   | CPU |
 | [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD    | no                       | Voice Activity Detection    | no                               | no                   | CPU |
 {{< /table >}}

+## Image & Video Generation
+
+{{< table "table-responsive" >}}
+| Backend and Bindings                                                             | Compatible models     | Completion/Chat endpoint | Capability | Embeddings support                | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)         | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker               | no                       | Image                 | no                                | no                   | CUDA 12, Intel SYCL, Vulkan, CPU |
+| [diffusers](https://github.com/huggingface/diffusers)  | SD, various diffusion models,...                   | no                       | Image/Video generation    | no                               | no                   | CUDA 11/12, ROCm, Intel, Metal, CPU |
+| [transformers-musicgen](https://github.com/huggingface/transformers)  | MusicGen                    | no                       | Audio generation                | no                               | no                   | CUDA, CPU |
+{{< /table >}}
+
+## Specialized AI Tasks
+
+{{< table "table-responsive" >}}
+| Backend and Bindings                                                             | Compatible models     | Completion/Chat endpoint | Capability | Embeddings support                | Token stream support | Acceleration |
+|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
+| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR    | no                       | Object Detection    | no                               | no                   | CUDA 12, Intel, CPU |
+| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API    | no                       | Reranking   | no                               | no                   | CUDA 11/12, ROCm, Intel, CPU |
+| [local-store](https://github.com/mudler/LocalAI) | Vector database    | no                       | Vector storage   | yes                               | no                   | CPU |
+| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models    | yes                       | Various AI tasks   | yes                               | yes                   | API-based |
+{{< /table >}}
+
+## Acceleration Support Summary
+
+### GPU Acceleration
+- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends
+- **AMD ROCm**: HIP-based acceleration for AMD GPUs
+- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
+- **Vulkan**: Cross-platform GPU acceleration
+- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
+
+### Specialized Hardware
+- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI
+- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
+- **Darwin x86**: Intel Mac support
+
+### CPU Optimization
+- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
+- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
+- **Mixed Precision**: F16/F32 mixed precision support
+
 Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})).

 - \* Only for CUDA and OpenVINO CPU/XPU acceleration.
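
The note above about the `backend` field can be illustrated with a small model configuration file. This is a minimal sketch and is not part of the commit: the file name, the `name` value, and the `parameters.model` value are placeholders chosen for illustration. The identifier a given LocalAI release accepts may also differ from the display name in the tables, so check it against the installed backends.

```yaml
# models/my-reranker.yaml (hypothetical file name)
# "name" is the model name clients pass in API requests (placeholder value).
name: my-reranker
# "backend" selects one of the backends listed in the tables above.
backend: rerankers
parameters:
  # Placeholder model reference; replace with the model you actually use.
  model: cross-encoder
```

Setting `backend` explicitly skips the automatic backend detection mentioned at the top of the page, which can help when a model could be loaded by more than one backend.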