mirror of https://github.com/mudler/LocalAI.git synced 2026-08-01 19:09:42 -04:00

Ettore Di Giacinto 329df11989 fix(vllm): build from source on CI to avoid SIGILL on prebuilt wheel

The prebuilt vllm 0.14.1+cpu wheel from GitHub releases is compiled with
SIMD instructions (AVX-512 VNNI/BF16 or AMX-BF16) that not every CPU
supports. GitHub Actions ubuntu-latest runners SIGILL when vllm spawns
the model_executor.models.registry subprocess for introspection, so
LoadModel never reaches the actual inference path.

- install.sh: when FROM_SOURCE=true on a CPU build, temporarily hide
  requirements-cpu-after.txt so installRequirements installs the base
  deps + torch CPU without pulling the prebuilt wheel, then clone vllm
  and compile it with VLLM_TARGET_DEVICE=cpu. The resulting binaries
  target the host's actual CPU.
- backend/Dockerfile.python: accept a FROM_SOURCE build-arg and expose
  it as an ENV so install.sh sees it during `make`.
- Makefile docker-build-backend: forward FROM_SOURCE as --build-arg
  when set, so backends that need source builds can opt in.
- Makefile test-extra-backend-vllm: call docker-build-vllm via a
  recursive $(MAKE) invocation so FROM_SOURCE flows through.
- .github/workflows/test-extra.yml: set FROM_SOURCE=true on the
  tests-vllm-grpc job. Slower but reliable — the prebuilt wheel only
  works on hosts that share the build-time SIMD baseline.

Answers 'did you test locally?': yes, end-to-end on my local machine
with the prebuilt wheel (CPU supports AVX-512 VNNI). The CI runner CPU
gap was not covered locally — this commit plugs that gap.

2026-04-12 15:14:42 +00:00

ace-step - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
chatterbox - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
common - feat(vllm): CPU support + shared utils + vllm-omni feature parity - 2026-04-12 14:48:28 +00:00
coqui - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
diffusers - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
faster-qwen3-tts - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
faster-whisper - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
fish-speech - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
kitten-tts - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
kokoro - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
llama-cpp-quantization - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
mlx - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
mlx-audio - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
mlx-distributed - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
mlx-vlm - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
moonshine - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
nemo - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
neutts - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
outetts - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
pocket-tts - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
qwen-asr - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
qwen-tts - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
rerankers - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
rfdetr - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
transformers - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
trl - feat: add distributed mode (#9124) - 2026-03-30 00:47:27 +02:00
vibevoice - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
vllm - fix(vllm): build from source on CI to avoid SIGILL on prebuilt wheel - 2026-04-12 15:14:42 +00:00
vllm-omni - feat(vllm): CPU support + shared utils + vllm-omni feature parity - 2026-04-12 14:48:28 +00:00
voxcpm - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
whisperx - feat(rocm): bump to 7.x (#9323) - 2026-04-12 08:51:30 +02:00
README.md - chore: drop bark which is unmaintained (#8207) - 2026-01-25 09:26:40 +01:00

README.md

Python Backends for LocalAI

This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.

Overview

The Python backends use a unified build system based on libbackend.sh that provides:

Automatic virtual environment management with support for both uv and pip
Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
Portable Python support for standalone deployments
Consistent backend execution across different environments

Available Backends

Core AI Models

transformers - Hugging Face Transformers framework (PyTorch-based)
vllm - High-performance LLM inference engine
mlx - Apple Silicon optimized ML framework

Audio & Speech

coqui - Coqui TTS models
faster-whisper - Fast Whisper speech recognition
kitten-tts - Lightweight TTS
mlx-audio - Apple Silicon audio processing
chatterbox - TTS model
kokoro - TTS models

Computer Vision

diffusers - Stable Diffusion and image generation
mlx-vlm - Vision-language models for Apple Silicon
rfdetr - Object detection models

Specialized

rerankers - Text reranking models

Quick Start

Prerequisites

Python 3.10+ (default: 3.10.18)
uv package manager (recommended) or pip
Appropriate hardware drivers for your target (CUDA, Intel, etc.)

Installation

Each backend can be installed individually:

# Navigate to a specific backend
cd backend/python/transformers

# Install dependencies
make transformers
# or
bash install.sh

# Run the backend
make run
# or
bash run.sh

Using the Unified Build System

The libbackend.sh script provides consistent commands across all backends:

# Source the library in your backend script
source "$(dirname "$0")/../common/libbackend.sh"

# Install requirements (automatically handles hardware detection)
installRequirements

# Start the backend server
startBackend "$@"

# Run tests
runUnittests

Hardware Targets

The build system automatically detects the available hardware and configures the build for one of the following targets:

CPU - Standard CPU-only builds
CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
Intel - Intel XPU/GPU optimization
MLX - Apple Silicon (M1/M2/M3) optimization
HIP - AMD GPU acceleration

Target-Specific Requirements

Backends can specify hardware-specific dependencies:

requirements.txt - Base requirements
requirements-cpu.txt - CPU-specific packages
requirements-cublas12.txt - CUDA 12 packages
requirements-cublas13.txt - CUDA 13 packages
requirements-intel.txt - Intel-optimized packages
requirements-mps.txt - Apple Silicon packages
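
To make the naming convention above concrete, here is a minimal sketch of how the files for one target could be selected. The function `requirements_for_target` is hypothetical and is not taken from libbackend.sh; it only assumes that the base file is applied first and the target-specific file second.

```shell
# Hypothetical helper (not part of libbackend.sh): list the requirements
# files that apply to one hardware target, base file first.
requirements_for_target() {
    local backend_dir="$1"   # directory holding the requirements*.txt files
    local target="$2"        # e.g. cpu, cublas12, cublas13, intel, mps
    local candidate
    for candidate in "requirements.txt" "requirements-${target}.txt"; do
        # Report only the files that actually exist for this backend.
        if [ -f "${backend_dir}/${candidate}" ]; then
            printf '%s\n' "${candidate}"
        fi
    done
}
```

For example, `requirements_for_target . cublas12` prints `requirements.txt` and, if present, `requirements-cublas12.txt`. A backend that ships no file for a given target simply gets the base requirements.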

Configuration Options

Environment Variables

PYTHON_VERSION - Python version (default: 3.10)
PYTHON_PATCH - Python patch version (default: 18)
BUILD_TYPE - Force specific build target
USE_PIP - Use pip instead of uv (default: false)
PORTABLE_PYTHON - Enable portable Python builds
LIMIT_TARGETS - Restrict backend to specific targets
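
As an illustration of how these defaults combine, the sketch below resolves the full Python version and the installer from the environment. Both function names are hypothetical, and the fallback logic is an assumption based only on the defaults listed above, not a copy of libbackend.sh.

```shell
# Illustrative sketch; defaults mirror the list above.
resolve_python_version() {
    local version="${PYTHON_VERSION:-3.10}"   # major.minor
    local patch="${PYTHON_PATCH:-18}"         # patch level
    printf '%s.%s\n' "${version}" "${patch}"
}

resolve_installer() {
    # USE_PIP=true selects pip; any other value falls back to uv.
    if [ "${USE_PIP:-false}" = "true" ]; then
        printf 'pip\n'
    else
        printf 'uv\n'
    fi
}
```

With nothing set, `resolve_python_version` yields the documented default of 3.10.18, and overriding a variable on the command line (for example `USE_PIP=true bash install.sh`) changes the result for that invocation only.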

Example: CUDA 12 Only Backend

# In your backend script
LIMIT_TARGETS="cublas12"
source "$(dirname "$0")/../common/libbackend.sh"

Example: Intel-Optimized Backend

# In your backend script
LIMIT_TARGETS="intel"
source "$(dirname "$0")/../common/libbackend.sh"
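
The effect of LIMIT_TARGETS can be pictured as a membership test on the detected target. The sketch below is hypothetical: `target_allowed` is not a libbackend.sh function, and it assumes that LIMIT_TARGETS is a space-separated list where an empty value allows every target.

```shell
# Hypothetical check: returns 0 when the given target may be built.
# Assumption: LIMIT_TARGETS is space-separated; empty means "no restriction".
target_allowed() {
    local target="$1"
    local allowed
    if [ -z "${LIMIT_TARGETS:-}" ]; then
        return 0
    fi
    # Word splitting on the unquoted variable is intentional here.
    for allowed in ${LIMIT_TARGETS}; do
        if [ "${allowed}" = "${target}" ]; then
            return 0
        fi
    done
    return 1
}
```

With `LIMIT_TARGETS="cublas12"`, `target_allowed cublas12` succeeds and `target_allowed intel` fails, which is the behaviour the two examples above rely on.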

Development

Adding a New Backend

Create a new directory in backend/python/
Copy the template structure from common/template/
Implement your backend.py with the required gRPC interface
Add appropriate requirements files for your target hardware
Use libbackend.sh for consistent build and execution
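
The first two steps can be scripted. The following is a minimal sketch under stated assumptions: `scaffold_backend` is a hypothetical helper, and it expects to be given the path to backend/python with common/template/ present inside it.

```shell
# Hypothetical scaffold helper: create a backend directory from the template.
scaffold_backend() {
    local python_dir="$1"    # path to backend/python
    local name="$2"          # name of the new backend
    local template="${python_dir}/common/template"
    local dest="${python_dir}/${name}"
    if [ ! -d "${template}" ]; then
        echo "template not found: ${template}" >&2
        return 1
    fi
    if [ -e "${dest}" ]; then
        echo "already exists: ${dest}" >&2
        return 1
    fi
    mkdir -p "${dest}"
    # Copy the template contents, including dotfiles, into the new directory.
    cp -R "${template}/." "${dest}/"
}
```

Running `scaffold_backend backend/python my-backend` from the repository root would leave a copy of the template in backend/python/my-backend, ready for the backend.py and requirements files described in the remaining steps. The helper refuses to overwrite an existing directory.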

Testing

# Run backend tests
make test
# or
bash test.sh

Building

# Install dependencies
make <backend-name>

# Clean build artifacts
make clean

Architecture

Each backend follows a consistent structure:

backend-name/
├── backend.py          # Main backend implementation
├── requirements.txt    # Base dependencies
├── requirements-*.txt  # Hardware-specific dependencies
├── install.sh          # Installation script
├── run.sh              # Execution script
├── test.sh             # Test script
├── Makefile            # Build targets
└── test.py             # Unit tests
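
A quick way to check a backend against this layout is to list whichever expected files are absent. The helper below is a hypothetical sketch, not an existing script in the repository; the file list is copied from the tree above, and the optional requirements-*.txt files are deliberately not checked.

```shell
# Hypothetical layout check: print each expected file that is missing
# from a backend directory. Prints nothing when the layout is complete.
missing_backend_files() {
    local backend_dir="$1"
    local name
    for name in backend.py requirements.txt install.sh run.sh test.sh Makefile test.py; do
        if [ ! -e "${backend_dir}/${name}" ]; then
            printf '%s\n' "${name}"
        fi
    done
}
```

For instance, `missing_backend_files backend/python/transformers` prints one file name per line for anything that is absent.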

Troubleshooting

Common Issues

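Missing dependencies: Ensure the requirements file for your target exists and lists every package the backend imports
Hardware detection: If the wrong target is selected, set BUILD_TYPE explicitly so that it matches your system
Python version: Verify that Python 3.10 or newer is available
Virtual environment: Use ensureVenv to create or activate the backend's virtual environment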
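
The Python version check can be automated before an install. This is a minimal sketch; `python_version_ok` is a hypothetical helper that takes a version string and succeeds only for 3.10 or newer.

```shell
# Hypothetical pre-flight check: succeeds when a version string such as
# "3.10.18" is at least 3.10.
python_version_ok() {
    local major minor
    major="${1%%.*}"        # text before the first dot
    minor="${1#*.}"         # text after the first dot
    minor="${minor%%.*}"    # keep only the minor component
    if [ "${major}" -gt 3 ]; then
        return 0
    fi
    [ "${major}" -eq 3 ] && [ "${minor}" -ge 10 ]
}
```

One way to feed it the interpreter's version is `python_version_ok "$(python3 -c 'import platform; print(platform.python_version())')"`, which exits non-zero when the interpreter on PATH is too old.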

Contributing

When adding new backends or modifying existing ones:

Follow the established directory structure
Use libbackend.sh for consistent behavior
Include appropriate requirements files for all target hardware
Add comprehensive tests
Update this README if adding new backend types