mirror of https://github.com/mudler/LocalAI.git synced 2026-06-21 23:29:04 -04:00

Files

Dream 10a1e6c74d feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299)

* feat(proto): add speaker field to TranscriptSegment for diarization

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx backend for transcription with diarization

Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): register whisperx backend in Makefile

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx meta and image entries to index.yaml

Signed-off-by: eureka928 <meobius123@gmail.com>

* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): unpin torch versions and use CPU index for cpu requirements

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch ROCm variant to fix CI build failure

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch CPU variant to fix uv resolution failure

Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure

uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure

PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

* revert: scope hipblas suffix fix to whisperx only

Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

---------

Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 16:33:12 +01:00

Name | Last commit | Date
chatterbox | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
common | feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299) | 2026-02-02 16:33:12 +01:00
coqui | chore: drop bark which is unmaintained (#8207) | 2026-01-25 09:26:40 +01:00
diffusers | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
faster-whisper | feat(api): Add transcribe response format request parameter & adjust STT backends (#8318) | 2026-02-01 17:33:17 +01:00
kitten-tts | feat(mlx): add mlx backend (#6049) | 2025-08-22 08:42:29 +02:00
kokoro | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
mlx | feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling (#7556) | 2025-12-16 11:27:46 +01:00
mlx-audio | fix(python): make option check uniform across backends (#6314) | 2025-09-19 19:56:08 +02:00
mlx-vlm | fix(python): make option check uniform across backends (#6314) | 2025-09-19 19:56:08 +02:00
moonshine | feat(backends): add moonshine backend for faster transcription (#7833) | 2026-01-07 21:44:35 +01:00
neutts | fix(l4t-12): use pip to install python deps (#7967) | 2026-01-11 00:21:32 +01:00
pocket-tts | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
qwen-asr | feat(qwen-asr): add support to qwen-asr (#8281) | 2026-01-29 21:50:35 +01:00
qwen-tts | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
rerankers | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
rfdetr | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
transformers | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
vibevoice | feat(vibevoice): add ASR support (#8222) | 2026-01-27 20:19:22 +01:00
vllm | feat(qwen-tts): add Qwen-tts backend (#8163) | 2026-01-23 15:18:41 +01:00
vllm-omni | feat(vllm-omni): add new backend (#8188) | 2026-01-24 22:23:30 +01:00
voxcpm | feat(tts): add support for streaming mode (#8291) | 2026-01-30 11:58:01 +01:00
whisperx | feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299) | 2026-02-02 16:33:12 +01:00
README.md | chore: drop bark which is unmaintained (#8207) | 2026-01-25 09:26:40 +01:00

README.md

Python Backends for LocalAI

This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.

Overview

The Python backends use a unified build system based on libbackend.sh that provides:

Automatic virtual environment management with support for both uv and pip
Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
Portable Python support for standalone deployments
Consistent backend execution across different environments

Available Backends

Core AI Models

transformers - Hugging Face Transformers framework (PyTorch-based)
vllm - High-performance LLM inference engine
mlx - Apple Silicon optimized ML framework

Audio & Speech

coqui - Coqui TTS models
faster-whisper - Fast Whisper speech recognition
kitten-tts - Lightweight TTS
mlx-audio - Apple Silicon audio processing
chatterbox - TTS model
kokoro - TTS models

Computer Vision

diffusers - Stable Diffusion and image generation
mlx-vlm - Vision-language models for Apple Silicon
rfdetr - Object detection models

Specialized

rerankers - Text reranking models

Quick Start

Prerequisites

Python 3.10+ (default: 3.10.18)
uv package manager (recommended) or pip
Appropriate hardware drivers for your target (CUDA, Intel, etc.)

Installation

Each backend can be installed individually:

# Navigate to a specific backend
cd backend/python/transformers

# Install dependencies
make transformers
# or
bash install.sh

# Run the backend
make run
# or
bash run.sh

Using the Unified Build System

The libbackend.sh script provides consistent commands across all backends:

# Source the library in your backend script
source "$(dirname "$0")/../common/libbackend.sh"

# Install requirements (automatically handles hardware detection)
installRequirements

# Start the backend server
startBackend "$@"

# Run tests
runUnittests

Hardware Targets

The build system automatically detects the available hardware and configures the build for one of the following targets:

CPU - Standard CPU-only builds
CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
Intel - Intel XPU/GPU optimization
MLX - Apple Silicon (M1/M2/M3) optimization
HIP - AMD GPU acceleration

Target-Specific Requirements

Backends can specify hardware-specific dependencies:

requirements.txt - Base requirements
requirements-cpu.txt - CPU-specific packages
requirements-cublas12.txt - CUDA 12 packages
requirements-cublas13.txt - CUDA 13 packages
requirements-intel.txt - Intel-optimized packages
requirements-mps.txt - Apple Silicon packages
requirements-hipblas.txt - AMD ROCm (HIP) packages
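
As a rough illustration of the naming scheme above, the sketch below maps a target name to the requirements files that would apply to it. The function name `requirements_for_target` is hypothetical, and the "base file first, then target file" order is an assumption made for the example; it is not taken from libbackend.sh, which may resolve files differently.

```shell
# Hypothetical helper: print the requirements files for a build target.
# Assumption: the base file is listed first, then the target-specific file.
requirements_for_target() {
    case "$1" in
        cpu|cublas12|cublas13|intel|mps) ;;   # suffixes documented in this section
        *)
            echo "unknown target: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "requirements.txt" "requirements-$1.txt"
}

# Prints requirements.txt, then requirements-cublas12.txt
requirements_for_target cublas12
```

Extend the `case` pattern if a backend ships requirements files for additional targets.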

Configuration Options

Environment Variables

PYTHON_VERSION - Python version (default: 3.10)
PYTHON_PATCH - Python patch version (default: 18)
BUILD_TYPE - Force specific build target
USE_PIP - Use pip instead of uv (default: false)
PORTABLE_PYTHON - Enable portable Python builds
LIMIT_TARGETS - Restrict backend to specific targets
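
A minimal sketch of how the documented defaults combine, assuming the variables are read with ordinary shell default expansion. The function name `describe_python_setup` and its output format are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: show the effective Python version and installer.
# Defaults follow the values documented above (3.10, patch 18, uv).
describe_python_setup() {
    version="${PYTHON_VERSION:-3.10}"
    patch="${PYTHON_PATCH:-18}"
    if [ "${USE_PIP:-false}" = "true" ]; then
        installer="pip"
    else
        installer="uv"
    fi
    echo "python ${version}.${patch} via ${installer}"
}

# With none of the variables set, this prints: python 3.10.18 via uv
describe_python_setup
```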

Example: CUDA 12 Only Backend

# In your backend script
LIMIT_TARGETS="cublas12"
source "$(dirname "$0")/../common/libbackend.sh"

Example: Intel-Optimized Backend

# In your backend script
LIMIT_TARGETS="intel"
source "$(dirname "$0")/../common/libbackend.sh"
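
To make the effect of LIMIT_TARGETS concrete, here is a sketch of the kind of check a build script could run before installing anything. The function name `target_allowed` is hypothetical, and two behaviors are assumptions rather than documented facts: that the list is space-separated, and that an empty list allows every target.

```shell
# Hypothetical check: succeed when a build type is permitted.
#   $1 = build type (e.g. the value of BUILD_TYPE)
#   $2 = allowed targets (e.g. the value of LIMIT_TARGETS)
# Assumptions: space-separated list; an empty list means "no restriction".
target_allowed() {
    build_type="$1"
    limit_targets="$2"
    if [ -z "$limit_targets" ]; then
        return 0
    fi
    for t in $limit_targets; do
        if [ "$t" = "$build_type" ]; then
            return 0
        fi
    done
    return 1
}

if target_allowed "cublas12" "cublas12"; then
    echo "cublas12 is allowed"
fi
```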

Development

Adding a New Backend

Create a new directory in backend/python/
Copy the template structure from common/template/
Implement your backend.py with the required gRPC interface
Add appropriate requirements files for your target hardware
Use libbackend.sh for consistent build and execution

Testing

# Run backend tests
make test
# or
bash test.sh

Building

# Install dependencies
make <backend-name>

# Clean build artifacts
make clean

Architecture

Each backend follows a consistent structure:

backend-name/
├── backend.py          # Main backend implementation
├── requirements.txt    # Base dependencies
├── requirements-*.txt  # Hardware-specific dependencies
├── install.sh         # Installation script
├── run.sh            # Execution script
├── test.sh           # Test script
├── Makefile          # Build targets
└── test.py           # Unit tests
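
When adding a backend, a quick layout check can catch a missing file early. The sketch below tests for the fixed file names in the tree above. `check_backend_layout` is a hypothetical helper, not part of libbackend.sh, and it treats every listed file as required, which is an assumption for the example.

```shell
# Hypothetical helper: report files from the standard layout that are missing.
# Returns 0 when all files exist, 1 otherwise.
check_backend_layout() {
    dir="$1"
    missing=0
    for f in backend.py requirements.txt install.sh run.sh test.sh Makefile test.py; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing: ${f}"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the path is illustrative):
#   check_backend_layout backend/python/transformers
```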

Troubleshooting

Common Issues

Missing dependencies: Ensure all requirements files are properly configured
Hardware detection: Check that BUILD_TYPE matches your system
Python version: Verify Python 3.10+ is available
Virtual environment: Use ensureVenv to create/activate environments
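
For the Python version check, comparing version strings as text is a common trap: "3.9" sorts after "3.10" lexically. The sketch below compares major and minor components numerically. `version_at_least` is a hypothetical helper and expects both arguments in at least `major.minor` form.

```shell
# Hypothetical helper: version_at_least MIN ACTUAL
# Succeeds when ACTUAL >= MIN, comparing major and minor numerically.
version_at_least() {
    min_major="${1%%.*}"
    min_minor="${1#*.}"
    min_minor="${min_minor%%.*}"
    act_major="${2%%.*}"
    act_minor="${2#*.}"
    act_minor="${act_minor%%.*}"   # drop any patch component, e.g. 10.18 -> 10
    if [ "$act_major" -gt "$min_major" ]; then
        return 0
    fi
    if [ "$act_major" -eq "$min_major" ] && [ "$act_minor" -ge "$min_minor" ]; then
        return 0
    fi
    return 1
}

# Example call against the local interpreter (requires python3 on PATH):
#   version_at_least 3.10 "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
```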

Contributing

When adding new backends or modifying existing ones:

Follow the established directory structure
Use libbackend.sh for consistent behavior
Include appropriate requirements files for all target hardware
Add comprehensive tests
Update this README if adding new backend types