mirror of
https://github.com/mudler/LocalAI.git
synced 2026-03-23 17:21:51 -04:00
* feat(proto): add speaker field to TranscriptSegment for diarization
Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx backend for transcription with diarization
Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): register whisperx backend in Makefile
Signed-off-by: eureka928 <meobius123@gmail.com>
* feat(whisperx): add whisperx meta and image entries to index.yaml
Signed-off-by: eureka928 <meobius123@gmail.com>
* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): unpin torch versions and use CPU index for cpu requirements
Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch ROCm variant to fix CI build failure
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): pin torch CPU variant to fix uv resolution failure
Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure
uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure
PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.
Signed-off-by: eureka928 <meobius123@gmail.com>
* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution
uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.
Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
* revert: scope hipblas suffix fix to whisperx only
Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>
---------
Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Python Backends for LocalAI
This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.
Overview
The Python backends use a unified build system based on libbackend.sh that provides:
- Automatic virtual environment management with support for both
uvandpip - Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
- Portable Python support for standalone deployments
- Consistent backend execution across different environments
Available Backends
Core AI Models
- transformers - Hugging Face Transformers framework (PyTorch-based)
- vllm - High-performance LLM inference engine
- mlx - Apple Silicon optimized ML framework
Audio & Speech
- coqui - Coqui TTS models
- faster-whisper - Fast Whisper speech recognition
- kitten-tts - Lightweight TTS
- mlx-audio - Apple Silicon audio processing
- chatterbox - TTS model
- kokoro - TTS models
Computer Vision
- diffusers - Stable Diffusion and image generation
- mlx-vlm - Vision-language models for Apple Silicon
- rfdetr - Object detection models
Specialized
- rerankers - Text reranking models
Quick Start
Prerequisites
- Python 3.10+ (default: 3.10.18)
uvpackage manager (recommended) orpip- Appropriate hardware drivers for your target (CUDA, Intel, etc.)
Installation
Each backend can be installed individually:
# Navigate to a specific backend
cd backend/python/transformers
# Install dependencies
make transformers
# or
bash install.sh
# Run the backend
make run
# or
bash run.sh
Using the Unified Build System
The libbackend.sh script provides consistent commands across all backends:
# Source the library in your backend script
source $(dirname $0)/../common/libbackend.sh
# Install requirements (automatically handles hardware detection)
installRequirements
# Start the backend server
startBackend $@
# Run tests
runUnittests
Hardware Targets
The build system automatically detects and configures for different hardware:
- CPU - Standard CPU-only builds
- CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
- Intel - Intel XPU/GPU optimization
- MLX - Apple Silicon (M1/M2/M3) optimization
- HIP - AMD GPU acceleration
Target-Specific Requirements
Backends can specify hardware-specific dependencies:
requirements.txt- Base requirementsrequirements-cpu.txt- CPU-specific packagesrequirements-cublas12.txt- CUDA 12 packagesrequirements-cublas13.txt- CUDA 13 packagesrequirements-intel.txt- Intel-optimized packagesrequirements-mps.txt- Apple Silicon packages
Configuration Options
Environment Variables
PYTHON_VERSION- Python version (default: 3.10)PYTHON_PATCH- Python patch version (default: 18)BUILD_TYPE- Force specific build targetUSE_PIP- Use pip instead of uv (default: false)PORTABLE_PYTHON- Enable portable Python buildsLIMIT_TARGETS- Restrict backend to specific targets
Example: CUDA 12 Only Backend
# In your backend script
LIMIT_TARGETS="cublas12"
source $(dirname $0)/../common/libbackend.sh
Example: Intel-Optimized Backend
# In your backend script
LIMIT_TARGETS="intel"
source $(dirname $0)/../common/libbackend.sh
Development
Adding a New Backend
- Create a new directory in
backend/python/ - Copy the template structure from
common/template/ - Implement your
backend.pywith the required gRPC interface - Add appropriate requirements files for your target hardware
- Use
libbackend.shfor consistent build and execution
Testing
# Run backend tests
make test
# or
bash test.sh
Building
# Install dependencies
make <backend-name>
# Clean build artifacts
make clean
Architecture
Each backend follows a consistent structure:
backend-name/
├── backend.py # Main backend implementation
├── requirements.txt # Base dependencies
├── requirements-*.txt # Hardware-specific dependencies
├── install.sh # Installation script
├── run.sh # Execution script
├── test.sh # Test script
├── Makefile # Build targets
└── test.py # Unit tests
Troubleshooting
Common Issues
- Missing dependencies: Ensure all requirements files are properly configured
- Hardware detection: Check that
BUILD_TYPEmatches your system - Python version: Verify Python 3.10+ is available
- Virtual environment: Use
ensureVenvto create/activate environments
Contributing
When adding new backends or modifying existing ones:
- Follow the established directory structure
- Use
libbackend.shfor consistent behavior - Include appropriate requirements files for all target hardware
- Add comprehensive tests
- Update this README if adding new backend types