docs: Update model compatibility documentation with missing backends (#8889)

* docs: Update model compatibility documentation with missing backends

  Added the following backends to README.md and compatibility-table.md:

  - vllm-omni: Multimodal vLLM with vision and audio support
  - nemo: NVIDIA NeMo framework for speech models
  - outetts: OuteTTS with voice cloning capabilities
  - faster-qwen3-tts: Faster Qwen3 TTS implementation
  - qwen-asr: Qwen automatic speech recognition
  - voxcpm: VoxCPM speech understanding model
  - whisperx: Enhanced Whisper with word-level transcription

  These backends exist in the codebase (backend/index.yaml) but were missing from the documentation. This update ensures accurate reflection of currently supported backends in LocalAI.

* Apply suggestion from @mudler

  Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@example.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-07-11 08:48:07 -04:00 · 2026-03-09 09:22:34 +01:00
parent f06c02d10e
commit 9da24cdf85
2 changed files with 14 additions and 0 deletions
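
The commit message says these backends were present in backend/index.yaml but absent from the docs. A check of that kind could be sketched roughly as below. This is only an illustration, not tooling from the repository: the function names are hypothetical, the list of backend names is assumed to have been extracted from the index beforehand (its schema is not shown in this commit), and name matching assumes that lowercasing and replacing spaces with hyphens is enough to align entries such as `vLLM Omni` and `vllm-omni`.

```python
import re


def normalize(name: str) -> str:
    # Assumption: doc names and index names differ only by case and spaces.
    return name.strip().lower().replace(" ", "-")


def documented_backends(markdown: str) -> set[str]:
    """Collect backend names from the first cell of each Markdown table row."""
    names = set()
    for line in markdown.splitlines():
        # Tolerate diff-style input: drop a leading '+' or ' ' marker.
        row = line.lstrip("+ ").strip()
        if not row.startswith("|"):
            continue
        first_cell = row.strip("|").split("|")[0].strip()
        # README rows use **name**; the compatibility table uses [name](url).
        match = re.match(r"\*\*(.+?)\*\*|\[(.+?)\]\(", first_cell)
        if match:
            names.add(normalize(match.group(1) or match.group(2)))
    return names


def missing_backends(index_names: list[str], markdown: str) -> list[str]:
    """Return index entries that have no matching table row, sorted by name."""
    documented = documented_backends(markdown)
    return sorted(n for n in index_names if normalize(n) not in documented)
```

Run against README.md and compatibility-table.md separately, a helper like this would report the backends each file still lacks. Header and separator rows are skipped because their first cell is neither bold nor a link.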
--- a/README.md
+++ b/README.md
@@ -292,6 +292,7 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
 | **transformers** | HuggingFace transformers framework | CUDA 12/13, ROCm, Intel, CPU |
 | **MLX** | Apple Silicon LLM inference | Metal (M1/M2/M3+) |
 | **MLX-VLM** | Apple Silicon Vision-Language Models | Metal (M1/M2/M3+) |
+| **vLLM Omni** | Multimodal vLLM with vision and audio | CUDA 12/13, ROCm, Intel |

 ### Audio & Speech Processing
 | Backend | Description | Acceleration Support |
@@ -309,6 +310,12 @@ LocalAI supports a comprehensive range of AI backends with multiple acceleration
 | **vibevoice** | Real-time TTS with voice cloning | CUDA 12/13, ROCm, Intel, CPU |
 | **pocket-tts** | Lightweight CPU-based TTS | CUDA 12/13, ROCm, Intel, CPU |
 | **qwen-tts** | High-quality TTS with custom voice, voice design, and voice cloning | CUDA 12/13, ROCm, Intel, CPU |
+| **nemo** | NVIDIA NeMo framework for speech models | CUDA 12/13, ROCm, Intel, CPU |
+| **outetts** | OuteTTS with voice cloning | CUDA 12/13, CPU |
+| **faster-qwen3-tts** | Faster Qwen3 TTS | CUDA 12/13, ROCm, Intel, CPU |
+| **qwen-asr** | Qwen ASR speech recognition | CUDA 12/13, ROCm, Intel, CPU |
+| **voxcpm** | VoxCPM speech understanding | CUDA 12/13, Metal, CPU |
+| **whisperx** | Enhanced Whisper transcription | CUDA 12/13, ROCm, Intel, CPU |
 | **ace-step** | Music generation from text descriptions, lyrics, or audio samples | CUDA 12/13, ROCm, Intel, Metal, CPU |

 ### Image & Video Generation
--- a/docs/content/reference/compatibility-table.md
+++ b/docs/content/reference/compatibility-table.md
@@ -23,6 +23,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 | [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats  | yes                      | GPT, embeddings, Audio generation            | yes | yes*                  | CUDA 12/13, ROCm, Intel, CPU |
 | [MLX](https://github.com/ml-explore/mlx-lm)        | Various LLMs               | yes                       | GPT                        | no                                | no                   | Metal (Apple Silicon) |
 | [MLX-VLM](https://github.com/Blaizzy/mlx-vlm)        | Vision-Language Models               | yes                       | Multimodal GPT                        | no                                | no                   | Metal (Apple Silicon) |
+| [vllm-omni](https://github.com/vllm-project/vllm) | vLLM Omni multimodal | yes | Multimodal GPT | no | no | CUDA 12/13, ROCm, Intel |
 | [langchain-huggingface](https://github.com/tmc/langchaingo)                                                                    | Any text generators available on HuggingFace through API | yes                      | GPT                        | no                                | no                   | N/A |

 ## Audio & Speech Processing
@@ -41,6 +42,12 @@ LocalAI will attempt to automatically load models which are not explicitly confi
 | [vibevoice](https://github.com/microsoft/VibeVoice) | VibeVoice-Realtime    | no                       | Real-time text-to-speech with voice cloning    | no                               | no                   | CUDA 12/13, ROCm, Intel, CPU |
 | [pocket-tts](https://github.com/kyutai-labs/pocket-tts) | Pocket TTS    | no                       | Lightweight CPU-based text-to-speech with voice cloning    | no                               | no                   | CUDA 12/13, ROCm, Intel, CPU |
 | [mlx-audio](https://github.com/Blaizzy/mlx-audio) | MLX | no                       | Text-tospeech    | no                               | no                   | Metal (Apple Silicon) |
+| [nemo](https://github.com/NVIDIA/NeMo) | NeMo speech models | no | Speech models | no | no | CUDA 12/13, ROCm, Intel, CPU |
+| [outetts](https://github.com/edwengc/outetts) | OuteTTS | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, CPU |
+| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Faster Qwen3 TTS | no | Fast text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
+| [qwen-asr](https://github.com/QwenLM/Qwen-ASR) | Qwen ASR | no | Automatic speech recognition | no | no | CUDA 12/13, ROCm, Intel, CPU |
+| [voxcpm](https://github.com/voxcpm/voxcpm) | VoxCPM | no | Speech understanding | no | no | CUDA 12/13, Metal, CPU |
+| [whisperx](https://github.com/m-bain/whisperX) | WhisperX | no | Enhanced transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |

 ## Image & Video Generation