mirror of https://github.com/mudler/LocalAI.git synced 2026-07-02 12:26:49 -04:00


Ettore Di Giacinto 94e3e06b8b fix(process): extend parent-death backstop to C++ and Python backends

The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435d5)
only protects backends that route through pkg/grpc. C++ and Python
backends don't, so the originally-reported case — the llama.cpp gRPC
worker surviving a non-graceful LocalAI death — was still uncovered.

Extend the same best-effort backstop to both languages, reusing the
exact mechanism and semantics:

- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting
  (getppid() != orig || == 1), portable across Linux/macOS, no-op on
  Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; the falsy
  values false/0/no/off disable it) and
  LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style
  durations such as 500ms/2s/1m)
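
The mechanism in the list above can be sketched in Python as follows. This is a minimal illustration, not the contents of parent_watch.py: the function names, the exit code, the case-insensitive matching of falsy values, and the reduced duration grammar (a single number plus ms/s/m/h) are all assumptions made for the example.

```python
import os
import re
import threading
import time

_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_FALSY = {"false", "0", "no", "off"}


def parse_duration(text, default=2.0):
    """Parse a simple Go-style duration ("500ms", "2s", "1m") into seconds.

    Only one number followed by one unit is handled (assumption); anything
    else falls back to the default.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)(ms|s|m|h)\s*", text or "")
    if not match:
        return default
    seconds = float(match.group(1)) * _UNITS[match.group(2)]
    return seconds if seconds > 0 else default


def watch_enabled(env):
    """The watch is on unless the variable holds one of the falsy values."""
    value = env.get("LOCALAI_BACKEND_PARENT_WATCH", "")
    return value.strip().lower() not in _FALSY


def is_reparented(original_ppid, current_ppid):
    """True once the parent changed or the process was adopted by PID 1."""
    return current_ppid != original_ppid or current_ppid == 1


def _exit_now():
    os._exit(1)  # exit code chosen for the sketch (assumption)


def start_parent_watch(env=os.environ, on_orphaned=_exit_now):
    """Start the polling thread; return it, or None when the watch is skipped."""
    if os.name == "nt" or not watch_enabled(env):
        return None  # no-op on Windows or when disabled
    original = os.getppid()
    if original <= 1:
        return None  # already orphaned at startup
    interval = parse_duration(env.get("LOCALAI_BACKEND_PARENT_WATCH_INTERVAL", ""))

    def poll():
        while not is_reparented(original, os.getppid()):
            time.sleep(interval)
        on_orphaned()

    thread = threading.Thread(target=poll, name="parent-watch", daemon=True)
    thread.start()
    return thread
```

A daemon thread is used here so the watcher never keeps the process alive on its own once the server shuts down normally.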

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++
backend) as a dependency-free header parent_watch.h, wired into
grpc-server.cpp's main() and copied at build time via prepare.sh. C++
backends have no shared server scaffolding, so other C++ backends
(ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would
each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed
from common/grpc_auth.py's get_auth_interceptors() — the single helper
every one of the 35 Python backends invokes while building its gRPC
server — so all Python backends (and future ones) are covered with no
per-backend edits and no duplicated implementation.
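
One way a shared helper can arm the watcher without per-backend edits is an arm-once guard, sketched below. The names arm_once and get_interceptors_sketch are hypothetical, and the assumption that the real helper guards against repeated calls this way is not confirmed by the commit.

```python
import threading

_lock = threading.Lock()
_armed = False


def arm_once(start_watch):
    """Call start_watch() on the first invocation only.

    Returns True if this call armed the watcher, False if it was already armed.
    """
    global _armed
    with _lock:
        if _armed:
            return False
        _armed = True
    start_watch()
    return True


def get_interceptors_sketch(start_watch):
    """Hypothetical stand-in for a shared helper that every backend calls."""
    arm_once(start_watch)
    return ()  # a real helper would return its gRPC interceptors here
```

Because the guard lives in the shared module, a backend that calls the helper more than once still ends up with a single watcher.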

Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-07-02 07:31:31 +00:00

ace-step

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

chatterbox

feat(tts): support per-request instructions and params (#10172)

2026-06-04 11:45:02 +02:00

common

fix(process): extend parent-death backstop to C++ and Python backends

2026-07-02 07:31:31 +00:00

coqui

chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui (#9594)

2026-04-28 08:44:53 +02:00

diffusers

fix(diffusers): pin diffusers and transformers to a known-good pair (#9979) (#10442)

2026-06-22 12:38:06 +02:00

faster-qwen3-tts

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

faster-whisper

test(ci): trigger faster-whisper rebuild to observe per-arch+merge

2026-05-08 22:09:46 +00:00

fish-speech

fix(fish-speech): allow invalid_reference_casting so tokenizers builds on darwin (#10573)

2026-06-28 19:10:27 +02:00

insightface

feat: add biometrics UI (#9524)

2026-04-24 08:50:34 +02:00

kitten-tts

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

kokoro

fix(kokoro): add explicit click dep so spacy CLI works on intel build (#10572)

2026-06-28 11:29:17 +02:00

liquid-audio

feat(backends): add darwin/metal build for liquid-audio (#10486)

2026-06-24 23:16:27 +02:00

llama-cpp-quantization

fix(backends): repair release CI build/test breaks (kokoros, fish-speech, llama-cpp-quantization, sglang) (#10547)

2026-06-27 09:42:22 +02:00

mlx

fix(mlx): route vision-language models to the mlx-vlm backend (#10274)

2026-06-12 23:12:42 +02:00

mlx-audio

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

mlx-distributed

feat: refactor shared helpers and enhance MLX backend functionality (#9335)

2026-04-13 18:44:03 +02:00

mlx-vlm

fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds (#9568)

2026-04-25 22:06:01 +02:00

moonshine

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

nemo

feat(nemo): enable word-level timestamps for ASR models (#10297)

2026-06-21 17:04:19 +02:00

neutts

fix(neutts): pin torchaudio to match torch (fixes undefined symbol) (#9798) (#10292)

2026-06-13 09:28:41 +02:00

outetts

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

pocket-tts

feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629)

2026-05-01 10:56:24 +02:00

qwen-asr

fix(qwen-asr): enable timestamp output when forced_aligner is configured (#10013)

2026-05-26 20:34:21 +00:00

qwen-tts

feat(tts): support per-request instructions and params (#10172)

2026-06-04 11:45:02 +02:00

rerankers

fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 (#9688)

2026-05-06 17:07:24 +02:00

rfdetr

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

sglang

fix(sglang): parse tool_call function arguments before applying the chat template (#10558)

2026-06-30 09:00:51 +02:00

speaker-recognition

fix(darwin): publish sherpa-onnx and speaker-recognition images for darwin/arm64 (#10275)

2026-06-12 22:32:42 +02:00

tinygrad

feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629)

2026-05-01 10:56:24 +02:00

transformers

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

trl

feat(backends): add darwin/metal (MPS) build for trl (#10487)

2026-06-25 08:09:36 +02:00

vibevoice

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

vllm

chore: ⬆️ Update vllm-metal (darwin) to v0.3.0.dev20260630095652 (#10616)

2026-07-01 21:56:59 +02:00

vllm-omni

fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950)

2026-05-22 23:01:22 +02:00

voxcpm

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

whisperx

fix(whisperx): use whisperx.diarize.DiarizationPipeline with token kwarg (#10389)

2026-06-18 18:50:37 +02:00

README.md

chore: drop bark which is unmaintained (#8207)

2026-01-25 09:26:40 +01:00

README.md

Python Backends for LocalAI

This directory contains Python-based AI backends for LocalAI, providing support for various AI models and hardware acceleration targets.

Overview

The Python backends use a unified build system based on libbackend.sh that provides:

Automatic virtual environment management with support for both uv and pip
Hardware-specific dependency installation (CPU, CUDA, Intel, MLX, etc.)
Portable Python support for standalone deployments
Consistent backend execution across different environments

Available Backends

Core AI Models

transformers - Hugging Face Transformers framework (PyTorch-based)
vllm - High-performance LLM inference engine
mlx - Apple Silicon optimized ML framework

Audio & Speech

coqui - Coqui TTS models
faster-whisper - Fast Whisper speech recognition
kitten-tts - Lightweight TTS
mlx-audio - Apple Silicon audio processing
chatterbox - TTS model
kokoro - TTS models

Computer Vision

diffusers - Stable Diffusion and image generation
mlx-vlm - Vision-language models for Apple Silicon
rfdetr - Object detection models

Specialized

rerankers - Text reranking models

Quick Start

Prerequisites

Python 3.10+ (default: 3.10.18)
uv package manager (recommended) or pip
Appropriate hardware drivers for your target (CUDA, Intel, etc.)

Installation

Each backend can be installed individually:

# Navigate to a specific backend
cd backend/python/transformers

# Install dependencies
make transformers
# or
bash install.sh

# Run the backend
make run
# or
bash run.sh

Using the Unified Build System

The libbackend.sh script provides consistent commands across all backends:

# Source the library in your backend script
source "$(dirname "$0")/../common/libbackend.sh"

# Install requirements (automatically handles hardware detection)
installRequirements

# Start the backend server
startBackend "$@"

# Run tests
runUnittests

Hardware Targets

The build system automatically detects and configures for different hardware:

CPU - Standard CPU-only builds
CUDA - NVIDIA GPU acceleration (supports CUDA 12/13)
Intel - Intel XPU/GPU optimization
MLX - Apple Silicon (M1/M2/M3) optimization
HIP - AMD GPU acceleration

Target-Specific Requirements

Backends can specify hardware-specific dependencies:

requirements.txt - Base requirements
requirements-cpu.txt - CPU-specific packages
requirements-cublas12.txt - CUDA 12 packages
requirements-cublas13.txt - CUDA 13 packages
requirements-intel.txt - Intel-optimized packages
requirements-mps.txt - Apple Silicon packages
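
The naming convention above can be illustrated with a small Python sketch that picks the requirement files for one target. This is not how libbackend.sh is implemented: the function name requirement_files is hypothetical, and the order shown (base file first, then the target-specific file) is an assumption.

```python
import os
import tempfile


def requirement_files(backend_dir, build_type):
    """Return the requirement files that exist for a target.

    The base file comes first, followed by requirements-<build_type>.txt
    when that file is present (ordering is an assumption of this sketch).
    """
    candidates = ["requirements.txt"]
    if build_type:
        candidates.append(f"requirements-{build_type}.txt")
    return [
        name
        for name in candidates
        if os.path.isfile(os.path.join(backend_dir, name))
    ]


if __name__ == "__main__":
    # Build a throwaway backend directory to demonstrate the lookup.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("requirements.txt", "requirements-cublas12.txt"):
            open(os.path.join(demo_dir, name), "w").close()
        print(requirement_files(demo_dir, "cublas12"))
        print(requirement_files(demo_dir, "intel"))
```

A target without its own file simply falls back to the base requirements, which is why the second call in the demo lists only requirements.txt.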

Configuration Options

Environment Variables

PYTHON_VERSION - Python version (default: 3.10)
PYTHON_PATCH - Python patch version (default: 18)
BUILD_TYPE - Force specific build target
USE_PIP - Use pip instead of uv (default: false)
PORTABLE_PYTHON - Enable portable Python builds
LIMIT_TARGETS - Restrict backend to specific targets

Example: CUDA 12 Only Backend

# In your backend script
LIMIT_TARGETS="cublas12"
source "$(dirname "$0")/../common/libbackend.sh"

Example: Intel-Optimized Backend

# In your backend script
LIMIT_TARGETS="intel"
source "$(dirname "$0")/../common/libbackend.sh"
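
The effect of LIMIT_TARGETS can be sketched as a simple membership check. The function name target_allowed is hypothetical, and three behaviours are assumptions of this sketch rather than documented facts: an empty value means no restriction, several targets are separated by whitespace, and target names must match exactly.

```python
def target_allowed(build_type, limit_targets):
    """Return True when build_type may be built under a LIMIT_TARGETS value."""
    allowed = limit_targets.split()  # whitespace-separated list (assumption)
    return not allowed or build_type in allowed
```

With LIMIT_TARGETS="cublas12", target_allowed("cublas12", "cublas12") is true while target_allowed("intel", "cublas12") is false.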

Development

Adding a New Backend

Create a new directory in backend/python/
Copy the template structure from common/template/
Implement your backend.py with the required gRPC interface
Add appropriate requirements files for your target hardware
Use libbackend.sh for consistent build and execution

Testing

# Run backend tests
make test
# or
bash test.sh

Building

# Install dependencies
make <backend-name>

# Clean build artifacts
make clean

Architecture

Each backend follows a consistent structure:

backend-name/
├── backend.py          # Main backend implementation
├── requirements.txt    # Base dependencies
├── requirements-*.txt  # Hardware-specific dependencies
├── install.sh          # Installation script
├── run.sh              # Execution script
├── test.sh             # Test script
├── Makefile            # Build targets
└── test.py             # Unit tests
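
A quick way to check a new backend against this layout is a script like the following sketch. The helper name missing_files is hypothetical and no such checker is known to exist in the repository. The optional requirements-*.txt files are left out of the check on purpose.

```python
import os
import tempfile

# Files named in the layout above; hardware-specific requirement files are optional.
EXPECTED_FILES = (
    "backend.py",
    "requirements.txt",
    "install.sh",
    "run.sh",
    "test.sh",
    "Makefile",
    "test.py",
)


def missing_files(backend_dir):
    """Return the expected files that are absent from backend_dir, in layout order."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(backend_dir, name))
    ]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that contains only backend.py.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "backend.py"), "w").close()
        print(missing_files(demo_dir))
```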

Troubleshooting

Common Issues

Missing dependencies: Ensure all requirements files are properly configured
Hardware detection: Check that BUILD_TYPE matches your system
Python version: Verify Python 3.10+ is available
Virtual environment: Use ensureVenv to create/activate environments

Contributing

When adding new backends or modifying existing ones:

Follow the established directory structure
Use libbackend.sh for consistent behavior
Include appropriate requirements files for all target hardware
Add comprehensive tests
Update this README if adding new backend types