From 195aa22e77e6d790fd057450bd75ddcbd8913802 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sun, 24 Aug 2025 20:09:19 +0200 Subject: [PATCH] chore(docs): update list of supported backends (#6134) Signed-off-by: Ettore Di Giacinto --- README.md | 54 +++++++++++++ backend/index.yaml | 2 +- .../docs/reference/compatibility-table.md | 80 +++++++++++++++---- 3 files changed, 119 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index e924af647..e5a47ff0f 100644 --- a/README.md +++ b/README.md @@ -233,6 +233,60 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A - 🔊 Voice activity detection (Silero-VAD support) - 🌍 Integrated WebUI! +## 🧩 Supported Backends & Acceleration + +LocalAI supports a comprehensive range of AI backends with multiple acceleration options: + +### Text Generation & Language Models +| Backend | Description | Acceleration Support | +|---------|-------------|---------------------| +| **llama.cpp** | LLM inference in C/C++ | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU | +| **vLLM** | Fast LLM inference with PagedAttention | CUDA 12, ROCm, Intel | +| **transformers** | HuggingFace transformers framework | CUDA 11/12, ROCm, Intel, CPU | +| **exllama2** | GPTQ inference library | CUDA 12 | +| **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) | +| **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) | + +### Audio & Speech Processing +| Backend | Description | Acceleration Support | +|---------|-------------|---------------------| +| **whisper.cpp** | OpenAI Whisper in C/C++ | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU | +| **faster-whisper** | Fast Whisper with CTranslate2 | CUDA 12, ROCm, Intel, CPU | +| **bark** | Text-to-audio generation | CUDA 12, ROCm, Intel | +| **bark-cpp** | C++ implementation of Bark | CUDA, Metal, CPU | +| **coqui** | Advanced TTS with 1100+ languages | CUDA 12, ROCm, Intel, CPU | +| **kokoro** | Lightweight TTS model | CUDA 12, ROCm, Intel, CPU 
| +| **chatterbox** | Production-grade TTS | CUDA 11/12, CPU | +| **piper** | Fast neural TTS system | CPU | +| **kitten-tts** | Kitten TTS models | CPU | +| **silero-vad** | Voice Activity Detection | CPU | + +### Image & Video Generation +| Backend | Description | Acceleration Support | +|---------|-------------|---------------------| +| **stablediffusion.cpp** | Stable Diffusion in C/C++ | CUDA 12, Intel SYCL, Vulkan, CPU | +| **diffusers** | HuggingFace diffusion models | CUDA 11/12, ROCm, Intel, Metal, CPU | + +### Specialized AI Tasks +| Backend | Description | Acceleration Support | +|---------|-------------|---------------------| +| **rfdetr** | Real-time object detection | CUDA 12, Intel, CPU | +| **rerankers** | Document reranking API | CUDA 11/12, ROCm, Intel, CPU | +| **local-store** | Vector database | CPU | +| **huggingface** | HuggingFace API integration | API-based | + +### Hardware Acceleration Matrix + +| Acceleration Type | Supported Backends | Hardware Support | +|-------------------|-------------------|------------------| +| **NVIDIA CUDA 11** | llama.cpp, whisper.cpp, stablediffusion.cpp, diffusers, rerankers, bark, chatterbox | NVIDIA hardware | +| **NVIDIA CUDA 12** | All CUDA-compatible backends | NVIDIA hardware | +| **AMD ROCm** | llama.cpp, whisper.cpp, vLLM, transformers, diffusers, rerankers, coqui, kokoro, bark | AMD GPUs | +| **Intel oneAPI** | llama.cpp, whisper.cpp, stablediffusion.cpp, vLLM, transformers, diffusers, rfdetr, rerankers, exllama2, coqui, kokoro, bark | Intel Arc, Intel iGPUs | +| **Apple Metal** | llama.cpp, whisper.cpp, diffusers, MLX, MLX-VLM, bark-cpp | Apple M1/M2/M3+ | +| **Vulkan** | llama.cpp, whisper.cpp, stablediffusion.cpp | Cross-platform GPUs | +| **NVIDIA Jetson** | llama.cpp, whisper.cpp, stablediffusion.cpp, diffusers, rfdetr | ARM64 embedded AI | +| **CPU Optimized** | All backends | AVX/AVX2/AVX512, quantization support | ### 🔗 Community and integrations diff --git a/backend/index.yaml b/backend/index.yaml index 8398fb0d4..960cf3aec
100644 --- a/backend/index.yaml +++ b/backend/index.yaml @@ -147,7 +147,7 @@ uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-vlm" icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4 urls: - - https://github.com/ml-explore/mlx-vlm + - https://github.com/Blaizzy/mlx-vlm mirrors: - localai/localai-backends:latest-metal-darwin-arm64-mlx-vlm license: MIT diff --git a/docs/content/docs/reference/compatibility-table.md b/docs/content/docs/reference/compatibility-table.md index b00aa5360..2b47d99c8 100644 --- a/docs/content/docs/reference/compatibility-table.md +++ b/docs/content/docs/reference/compatibility-table.md @@ -14,29 +14,77 @@ LocalAI will attempt to automatically load models which are not explicitly confi {{% /alert %}} +## Text Generation & Language Models + {{< table "table-responsive" >}} | Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | |----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| -| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA, openCL, cuBLAS, Metal | -| [whisper](https://github.com/ggerganov/whisper.cpp) | whisper | no | Audio | no | no | N/A | +| [llama.cpp]({{%relref "docs/features/text-generation#llama.cpp" %}}) | LLama, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12, ROCm, Intel SYCL, Vulkan, Metal, CPU | +| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | 
GPT | no | no | CUDA 12, ROCm, Intel | +| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12, ROCm, Intel, CPU | +| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12 | +| [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) | +| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) | | [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A | -| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A | -| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A | -| `bark` | bark | no | Audio generation | no | no | yes | -| `autogptq` | GPTQ | yes | GPT | yes | no | N/A | -| `diffusers` | SD,... 
| no | Image generation | no | no | N/A | -| `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA | -| `exllama2` | GPTQ | yes | GPT only | no | no | N/A | -| `transformers-musicgen` | | no | Audio generation | no | no | N/A | -| stablediffusion | no | Image | no | no | N/A | -| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA | -| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CPU/CUDA | -| `transformers` | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CPU/CUDA/XPU | -| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio-Only | no | no | yes | -| [stablediffusion-cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | N/A | +{{< /table >}} + +## Audio & Speech Processing + +{{< table "table-responsive" >}} +| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | +|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| +| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel SYCL, Vulkan, CPU | +| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12, ROCm, Intel, CPU | +| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU | +| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12, ROCm, Intel | +| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | 
Audio-Only | no | no | CUDA, Metal, CPU | +| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12, ROCm, Intel, CPU | +| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12, ROCm, Intel, CPU | +| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12, CPU | +| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU | | [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU | {{< /table >}} +## Image & Video Generation + +{{< table "table-responsive" >}} +| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | +|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| +| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12, Intel SYCL, Vulkan, CPU | +| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models,... 
| no | Image/Video generation | no | no | CUDA 11/12, ROCm, Intel, Metal, CPU | +| [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU | +{{< /table >}} + +## Specialized AI Tasks + +{{< table "table-responsive" >}} +| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration | +|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------| +| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12, Intel, CPU | +| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12, ROCm, Intel, CPU | +| [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU | +| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based | +{{< /table >}} + +## Acceleration Support Summary + +### GPU Acceleration +- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0 support across most backends +- **AMD ROCm**: HIP-based acceleration for AMD GPUs +- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision) +- **Vulkan**: Cross-platform GPU acceleration +- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+) + +### Specialized Hardware +- **NVIDIA Jetson (L4T)**: ARM64 support for embedded AI +- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+ +- **Darwin x86**: Intel Mac support + +### CPU Optimization +- **AVX/AVX2/AVX512**: Advanced vector extensions for x86 +- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support +- **Mixed Precision**: F16/F32 mixed precision support + Note: any backend name listed above can be used 
in the `backend` field of the model configuration file (See [the advanced section]({{%relref "docs/advanced" %}})). - \* Only for CUDA and OpenVINO CPU/XPU acceleration.
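
For illustration, the `backend` field is set in the model's YAML configuration file. A minimal sketch (the file name, model name, and model file below are hypothetical, and `llama-cpp` is assumed here as the identifier for the llama.cpp backend — substitute any backend name from the tables above):

```yaml
# models/my-model.yaml — hypothetical example
name: my-model                 # name clients use in API requests
backend: llama-cpp             # backend to load (assumed identifier for llama.cpp)
parameters:
  model: my-model-file.gguf    # model file, resolved under the models directory
context_size: 4096
```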