mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-18 13:49:09 -04:00
docs: document all available backends and add "built by us" list (#10376)
Bring the Backend & Model Compatibility Table up to the full set of backends published in backend/index.yaml (60+), organized by modality with per-backend acceleration targets. Add an "Available Backends" pointer and expand the backend-type list in the backends feature doc.

Update the README backend count to 60+ and add a "Backends built by us" section listing the native C/C++/GGML engines maintained by the LocalAI project (parakeet.cpp, voxtral.c, vibevoice.cpp, rf-detr.cpp, locate-anything.cpp, depth-anything.cpp, LocalVQE, local-store).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
17
README.md
@@ -220,10 +220,25 @@ For older news and full release notes, see [GitHub Releases](https://github.com/

## Supported Backends & Acceleration

LocalAI supports **36+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
LocalAI supports **60+ backends** including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).

See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).

### Backends built by us

Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:

| Backend | What it does |
|---------|-------------|
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription |
| [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C |
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization |
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation |
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) |
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose estimation |
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |

## Resources

- [Documentation](https://localai.io/)
@@ -8,6 +8,12 @@ url: "/backends/"

LocalAI supports a variety of backends that can be used to run different types of AI models. There are core Backends which are included, and there are containerized applications that provide the runtime environment for specific model types, such as LLMs, diffusion models, or text-to-speech models.

## Available Backends

LocalAI ships **60+ backends** covering text generation, speech-to-text, text-to-speech, music and sound generation, image and video generation, vision and object detection, audio processing, reranking, fine-tuning, and more. Each one is published as an on-demand OCI image with the appropriate acceleration variants (CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T).

For the complete list of backends, the model families they support, and their acceleration targets, see the [Backend & Model Compatibility Table]({{%relref "reference/compatibility-table" %}}). The authoritative source is [`backend/index.yaml`](https://github.com/mudler/LocalAI/blob/master/backend/index.yaml), and the same catalog is browsable in the web UI under the **Backends** section.

## Managing Backends in the UI

The LocalAI web interface provides an intuitive way to manage your backends:
@@ -118,8 +124,13 @@ For getting started, see the available backends in LocalAI here: https://github.

LocalAI supports various types of backends:

- **LLM Backends**: For running language models
- **Diffusion Backends**: For image generation
- **TTS Backends**: For text-to-speech conversion
- **Whisper Backends**: For speech-to-text conversion
- **Sound Generation Backends**: For music and audio generation (e.g., ACE-Step)
- **LLM Backends**: For running language models (e.g., llama.cpp, vLLM, SGLang, transformers, MLX)
- **Speech-to-Text Backends**: For transcription (e.g., whisper.cpp, parakeet.cpp, faster-whisper, NeMo)
- **Text-to-Speech Backends**: For speech synthesis (e.g., piper, Kokoro, VibeVoice, Qwen3-TTS)
- **Sound Generation Backends**: For music and audio generation (e.g., ACE-Step)
- **Image & Video Generation Backends**: For diffusion models (e.g., stable-diffusion.cpp, diffusers)
- **Vision & Detection Backends**: For object detection, segmentation, depth, and face/voice recognition (e.g., rf-detr.cpp, locate-anything.cpp, sam3.cpp, insightface)
- **Audio Processing Backends**: For voice activity detection and audio enhancement (e.g., Silero VAD, LocalVQE)
- **Utility Backends**: For reranking, fine-tuning, quantization, and vector storage (e.g., rerankers, TRL, local-store)

See the [Backend & Model Compatibility Table]({{%relref "reference/compatibility-table" %}}) for the full catalog.
@@ -12,6 +12,8 @@ Besides llama based models, LocalAI is compatible also with other architectures.

LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.

All backends listed here can be installed on demand from the [Backend Gallery]({{%relref "features/backends" %}}). The exact set of acceleration variants published for each backend is defined in [`backend/index.yaml`](https://github.com/mudler/LocalAI/blob/master/backend/index.yaml).
{{% /notice %}}
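The acceleration column in the tables on this page is a plain comma-separated list, so it can be filtered mechanically. The following is a minimal sketch, assuming rows are copied verbatim from the tables here; `parse_row` and `backends_with` are hypothetical helper names for illustration and are not part of LocalAI, and the authoritative data remains `backend/index.yaml`.

```python
import re

# Sample rows copied from the Speech-to-Text table on this page.
ROWS = """
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C | CPU, Metal |
| [sherpa-onnx](https://k2-fsa.github.io/sherpa/onnx/) | Sherpa-ONNX ASR (Whisper, Paraformer, SenseVoice) and TTS | CPU, CUDA 12, Metal |
"""

# Matches a Markdown link and captures its visible text.
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def parse_row(line):
    """Return (backend name, acceleration targets) for one table row.

    The acceleration list is assumed to be the last column, which holds
    for both the 3-column and the 6-column tables on this page.
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    match = LINK.match(cells[0])
    name = match.group(1) if match else cells[0]
    targets = [t.strip() for t in cells[-1].split(",")]
    return name, targets


def backends_with(target, rows=ROWS):
    """List the backends whose acceleration column contains `target` exactly."""
    result = []
    for line in rows.strip().splitlines():
        # Skip blank lines and separator rows such as |---|---|.
        if not line.strip() or set(line.strip()) <= set("|-: "):
            continue
        name, targets = parse_row(line)
        if target in targets:
            result.append(name)
    return result
```

With the three sample rows, `backends_with("Vulkan")` returns `["whisper.cpp"]`. Matching is exact per entry, so `"CUDA 12"` and `"CUDA 12/13"` are treated as different targets.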

## Text Generation & Language Models
@@ -20,70 +22,100 @@ LocalAI will attempt to automatically load models which are not explicitly confi
|---------|-------------|------------|------------|-----------|-------------|
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) | Hard fork of llama.cpp optimized for CPU/hybrid CPU+GPU with IQK quants, custom quant mixes, and MLA for DeepSeek | GPT | yes | yes | CPU (AVX2+) |
| [vLLM](https://github.com/vllm-project/vllm) | Fast LLM serving with PagedAttention | GPT, Functions | no | yes | CPU, CUDA 12, ROCm, Intel |
| [vLLM Omni](https://github.com/vllm-project/vllm) | Unified multimodal generation (text, image, video, audio) | Multimodal GPT, Functions | no | yes | CUDA 12, ROCm |
| [transformers](https://github.com/huggingface/transformers) | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [MLX](https://github.com/ml-explore/mlx-lm) | Apple Silicon LLM inference | GPT, Functions | no | yes | Metal, CPU, CUDA 12/13, Jetson L4T |
| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models on Apple Silicon | Multimodal GPT, Functions | no | yes | Metal, CPU, CUDA 12/13, Jetson L4T |
| [MLX Distributed](https://github.com/ml-explore/mlx-lm) | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |
| [turboquant](https://github.com/TheTom/llama-cpp-turboquant) | llama.cpp fork adding the TurboQuant KV-cache quantization scheme | GPT | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| [ds4](https://github.com/antirez/ds4) | DeepSeek V4 Flash single-model inference engine, optimized for Metal and CUDA | GPT | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| [vLLM](https://github.com/vllm-project/vllm) | Fast LLM serving with PagedAttention; GPTQ/AWQ/FP8 quantization | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| [vLLM Omni](https://github.com/vllm-project/vllm-omni) | Unified multimodal generation (text, image, video, audio) on top of vLLM | Multimodal GPT, Functions | no | yes | CUDA 12/13, ROCm, Jetson L4T |
| [SGLang](https://github.com/sgl-project/sglang) | Fast serving framework for LLMs and vision-language models with speculative decoding | GPT, Functions, Multimodal | no | yes | CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |
| [transformers](https://github.com/huggingface/transformers) | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CUDA 12/13, ROCm, Intel SYCL, Metal |
| [MLX](https://github.com/ml-explore/mlx-lm) | Apple Silicon LLM inference | GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models on Apple Silicon | Multimodal GPT, Functions | no | yes | CPU, CUDA 12/13, Metal, Jetson L4T |
| [MLX Distributed](https://github.com/ml-explore/mlx-lm) | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | CPU, CUDA 12/13, Metal, Jetson L4T |
| [tinygrad](https://github.com/tinygrad/tinygrad) | Minimalist deep-learning framework with zero runtime dependencies | GPT, Embeddings, Multimodal | yes | yes | CPU |

## Speech-to-Text

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [NeMo](https://github.com/NVIDIA/NeMo) | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal, Jetson L4T |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices (ONNX) | CPU, CUDA 12/13, Metal |
| [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet (tdt/ctc/rnnt/hybrid), with cache-aware streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [CrispASR](https://github.com/CrispStrobe/CrispASR) | Unified speech engine (whisper.cpp fork) supporting Parakeet, Canary, and many ASR architectures, plus TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [NeMo](https://github.com/NVIDIA/NeMo) | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| [sherpa-onnx](https://k2-fsa.github.io/sherpa/onnx/) | Sherpa-ONNX ASR (Whisper, Paraformer, SenseVoice) and TTS | CPU, CUDA 12, Metal |

## Text-to-Speech

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [piper](https://github.com/rhasspy/piper) | Fast neural TTS | CPU |
| [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [piper](https://github.com/rhasspy/piper) | Fast neural TTS | CPU, Metal |
| [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) | TTS with 1100+ languages and voice cloning | CUDA 12, ROCm, Intel SYCL, Metal |
| [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [Kokoros](https://huggingface.co/hexgrad/Kokoro-82M) | Pure Rust Kokoro TTS via ONNX | CPU |
| [Chatterbox](https://github.com/resemble-ai/chatterbox) | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| [VibeVoice](https://github.com/microsoft/VibeVoice) | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [fish-speech](https://github.com/fishaudio/fish-speech) | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [VibeVoice](https://github.com/microsoft/VibeVoice) | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native C++/GGML port of VibeVoice for TTS (voice cloning) and long-form ASR with diarization | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [qwentts.cpp](https://github.com/ServeurpersoCom/qwentts.cpp) | Native C++/GGML Qwen3-TTS with streaming, named speakers, and voice design | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [OmniVoice](https://github.com/ServeurpersoCom/omnivoice.cpp) | Native C++/GGML TTS with voice cloning, voice design, and streaming | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [fish-speech](https://github.com/fishaudio/fish-speech) | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [OuteTTS](https://github.com/OuteAI/outetts) | TTS with custom speaker voices | CPU, CUDA 12 |
| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
| [NeuTTS Air](https://github.com/neuphonic/neutts-air) | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
| [VoxCPM](https://github.com/ModelBest/VoxCPM) | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Real-time Qwen3-TTS with CUDA graph capture | CPU, CUDA 12/13, Jetson L4T |
| [NeuTTS Air](https://github.com/neuphonic/neutts-air) | Instant voice cloning, on-device TTS | CPU, CUDA 12, ROCm |
| [VoxCPM](https://github.com/ModelBest/VoxCPM) | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| [Kitten TTS](https://github.com/KittenML/KittenTTS) | Kitten TTS model | CPU, Metal |
| [MLX-Audio](https://github.com/Blaizzy/mlx-audio) | Audio models on Apple Silicon | Metal, CPU, CUDA 12/13, Jetson L4T |
| [OmniVoice](https://github.com/ServeurpersoCom/omnivoice.cpp) | Native C++/GGML TTS with voice cloning, voice design, and streaming | CPU, CUDA 12/13, ROCm, Intel, Metal, Vulkan, Jetson L4T |
| [Supertonic](https://github.com/supertone-inc/supertonic) | Lightning-fast on-device multilingual TTS via ONNX | CPU |
| [MLX-Audio](https://github.com/Blaizzy/mlx-audio) | Audio models on Apple Silicon | CPU, CUDA 12/13, Metal, Jetson L4T |
| [liquid-audio](https://github.com/Liquid4All/liquid-audio) | LFM2 end-to-end speech-to-speech, ASR, and TTS | CPU, CUDA 12/13, ROCm, Intel SYCL, Jetson L4T |

## Music Generation
## Music & Sound Generation

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [ACE-Step](https://github.com/ace-step/ACE-Step-1.5) | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [ACE-Step](https://github.com/ace-step/ACE-Step-1.5) | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal |
| [acestep.cpp](https://github.com/ace-step/acestep.cpp) | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

## Image & Video Generation

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [diffusers](https://github.com/huggingface/diffusers) | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | Stable Diffusion, Flux, PhotoMaker, Ideogram in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [diffusers](https://github.com/huggingface/diffusers) | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel SYCL, Metal, Jetson L4T |
| [vLLM Omni](https://github.com/vllm-project/vllm-omni) | Multimodal generation including text-to-image and text-to-video | CUDA 12/13, ROCm, Jetson L4T |

## Specialized Tasks
## Vision, Detection & Recognition

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [RF-DETR](https://github.com/roboflow/rf-detr) | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
| [local-store](https://github.com/mudler/LocalAI) | Local vector database for embeddings | CPU, Metal |
| [Silero VAD](https://github.com/snakers4/silero-vad) | Voice Activity Detection | CPU |
| [RF-DETR](https://github.com/roboflow/rf-detr) | Real-time transformer-based object detection (Python) | CPU, CUDA 12/13, Intel SYCL, Metal, Jetson L4T |
| [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [locate-anything.cpp](https://github.com/mudler/locate-anything.cpp) | Open-vocabulary object detection and visual grounding (LocateAnything-3B) in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [depth-anything.cpp](https://github.com/mudler/depth-anything.cpp) | Depth Anything 3 monocular metric depth + camera pose in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [sam3.cpp](https://github.com/PABannier/sam3.cpp) | Segment Anything (SAM 3/2/EdgeTAM) with text/point/box prompts in C/C++ using GGML | CPU, CUDA 12/13, Intel SYCL, Vulkan, Jetson L4T |
| [insightface](https://github.com/deepinsight/insightface) | Face verification, embedding, and anti-spoofing liveness (ONNX Runtime) | CPU, CUDA 12 |
| [speaker-recognition](https://speechbrain.github.io/) | Speaker (voice) recognition via SpeechBrain ECAPA-TDNN | CPU, CUDA 12, Metal |

## Audio Processing

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [Silero VAD](https://github.com/snakers4/silero-vad) | Voice Activity Detection | CPU, Metal |
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation in C/C++ using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Jetson L4T |
| [Opus](https://opus-codec.org/) | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Utilities & Other

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Document reranking for RAG | CUDA 12, ROCm, Intel SYCL, Metal |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings | CPU, Metal |
| [TRL](https://github.com/huggingface/trl) | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| [llama.cpp quantization](https://github.com/ggml-org/llama.cpp) | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
| [Opus](https://opus-codec.org/) | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Acceleration Support Summary