Files
LocalAI/backend/python
Ettore Di Giacinto 329df11989 fix(vllm): build from source on CI to avoid SIGILL on prebuilt wheel
The prebuilt vllm 0.14.1+cpu wheel from GitHub releases is compiled with
SIMD instructions (AVX-512 VNNI/BF16 or AMX-BF16) that not every CPU
supports. GitHub Actions ubuntu-latest runners SIGILL when vllm spawns
the model_executor.models.registry subprocess for introspection, so
LoadModel never reaches the actual inference path.

- install.sh: when FROM_SOURCE=true on a CPU build, temporarily hide
  requirements-cpu-after.txt so installRequirements installs the base
  deps + torch CPU without pulling the prebuilt wheel, then clone vllm
  and compile it with VLLM_TARGET_DEVICE=cpu. The resulting binaries
  target the host's actual CPU.
- backend/Dockerfile.python: accept a FROM_SOURCE build-arg and expose
  it as an ENV so install.sh sees it during `make`.
- Makefile docker-build-backend: forward FROM_SOURCE as --build-arg
  when set, so backends that need source builds can opt in.
- Makefile test-extra-backend-vllm: call docker-build-vllm via a
  recursive $(MAKE) invocation so FROM_SOURCE flows through.
- .github/workflows/test-extra.yml: set FROM_SOURCE=true on the
  tests-vllm-grpc job. Slower but reliable — the prebuilt wheel only
  works on hosts that share the build-time SIMD baseline.

Answers 'did you test locally?': yes, end-to-end on my local machine
with the prebuilt wheel (CPU supports AVX-512 VNNI). The CI runner CPU
gap was not covered locally — this commit plugs that gap.
2026-04-12 15:14:42 +00:00
..
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-03-30 00:47:27 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-03-30 00:47:27 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00
2026-04-12 08:51:30 +02:00

Python Backends for LocalAI

This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.

Overview

The Python backends use a unified build system based on libbackend.sh that provides:

  • Automatic virtual environment management with support for both uv and pip
  • Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
  • Portable Python support for standalone deployments
  • Consistent backend execution across different environments

Available Backends

Core AI Models

  • transformers - Hugging Face Transformers framework (PyTorch-based)
  • vllm - High-performance LLM inference engine
  • mlx - Apple Silicon optimized ML framework

Audio & Speech

  • coqui - Coqui TTS models
  • faster-whisper - Fast Whisper speech recognition
  • kitten-tts - Lightweight TTS
  • mlx-audio - Apple Silicon audio processing
  • chatterbox - TTS model
  • kokoro - TTS models

Computer Vision

  • diffusers - Stable Diffusion and image generation
  • mlx-vlm - Vision-language models for Apple Silicon
  • rfdetr - Object detection models

Specialized

  • rerankers - Text reranking models

Quick Start

Prerequisites

  • Python 3.10+ (default: 3.10.18)
  • uv package manager (recommended) or pip
  • Appropriate hardware drivers for your target (CUDA, Intel, etc.)

Installation

Each backend can be installed individually:

# Navigate to a specific backend
cd backend/python/transformers

# Install dependencies
make transformers
# or
bash install.sh

# Run the backend
make run
# or
bash run.sh

Using the Unified Build System

The libbackend.sh script provides consistent commands across all backends:

# Source the library in your backend script
source $(dirname $0)/../common/libbackend.sh

# Install requirements (automatically handles hardware detection)
installRequirements

# Start the backend server
startBackend $@

# Run tests
runUnittests

Hardware Targets

The build system automatically detects and configures for different hardware:

  • CPU - Standard CPU-only builds
  • CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
  • Intel - Intel XPU/GPU optimization
  • MLX - Apple Silicon (M1/M2/M3) optimization
  • HIP - AMD GPU acceleration

Target-Specific Requirements

Backends can specify hardware-specific dependencies:

  • requirements.txt - Base requirements
  • requirements-cpu.txt - CPU-specific packages
  • requirements-cublas12.txt - CUDA 12 packages
  • requirements-cublas13.txt - CUDA 13 packages
  • requirements-intel.txt - Intel-optimized packages
  • requirements-mps.txt - Apple Silicon packages

Configuration Options

Environment Variables

  • PYTHON_VERSION - Python version (default: 3.10)
  • PYTHON_PATCH - Python patch version (default: 18)
  • BUILD_TYPE - Force specific build target
  • USE_PIP - Use pip instead of uv (default: false)
  • PORTABLE_PYTHON - Enable portable Python builds
  • LIMIT_TARGETS - Restrict backend to specific targets

Example: CUDA 12 Only Backend

# In your backend script
LIMIT_TARGETS="cublas12"
source $(dirname $0)/../common/libbackend.sh

Example: Intel-Optimized Backend

# In your backend script
LIMIT_TARGETS="intel"
source $(dirname $0)/../common/libbackend.sh

Development

Adding a New Backend

  1. Create a new directory in backend/python/
  2. Copy the template structure from common/template/
  3. Implement your backend.py with the required gRPC interface
  4. Add appropriate requirements files for your target hardware
  5. Use libbackend.sh for consistent build and execution

Testing

# Run backend tests
make test
# or
bash test.sh

Building

# Install dependencies
make <backend-name>

# Clean build artifacts
make clean

Architecture

Each backend follows a consistent structure:

backend-name/
├── backend.py          # Main backend implementation
├── requirements.txt    # Base dependencies
├── requirements-*.txt  # Hardware-specific dependencies
├── install.sh         # Installation script
├── run.sh            # Execution script
├── test.sh           # Test script
├── Makefile          # Build targets
└── test.py           # Unit tests

Troubleshooting

Common Issues

  1. Missing dependencies: Ensure all requirements files are properly configured
  2. Hardware detection: Check that BUILD_TYPE matches your system
  3. Python version: Verify Python 3.10+ is available
  4. Virtual environment: Use ensureVenv to create/activate environments

Contributing

When adding new backends or modifying existing ones:

  1. Follow the established directory structure
  2. Use libbackend.sh for consistent behavior
  3. Include appropriate requirements files for all target hardware
  4. Add comprehensive tests
  5. Update this README if adding new backend types