mirror of https://github.com/mudler/LocalAI.git synced 2026-07-23 06:34:48 -04:00



LocalAI Backend Architecture

This directory contains the core backend infrastructure for LocalAI, including the gRPC protocol definition, multi-language Dockerfiles, and language-specific backend implementations.

Overview

LocalAI uses a unified gRPC-based architecture that allows different programming languages to implement AI backends while maintaining consistent interfaces and capabilities. The backend system supports multiple hardware acceleration targets and provides a standardized way to integrate various AI models and frameworks.

Architecture Components

1. Protocol Definition (`backend.proto`)

The backend.proto file defines the gRPC service interface that all backends must implement. This ensures consistency across different language implementations and provides a contract for communication between LocalAI core and backend services.

Core Services

Text Generation: Predict, PredictStream for LLM inference
Embeddings: Embedding for text vectorization
Image Generation: GenerateImage for Stable Diffusion and other image models
Audio Processing: AudioTranscription, TTS, SoundGeneration
Video Generation: GenerateVideo for video synthesis
Object Detection: Detect for computer vision tasks
Vector Storage: StoresSet, StoresGet, StoresFind for RAG operations
Reranking: Rerank for document relevance scoring
Voice Activity Detection: VAD for audio segmentation

Key Message Types

PredictOptions: Comprehensive configuration for text generation
ModelOptions: Model loading and configuration parameters
Result: Standardized response format
StatusResponse: Backend health and memory usage information

2. Multi-Language Dockerfiles

The backend system provides Dockerfiles that set up the build environment and dependencies for each implementation language:

Dockerfile.python
Dockerfile.golang
Dockerfile.llama-cpp

3. Language-Specific Implementations

Python Backends (`python/`)

transformers: Hugging Face Transformers framework
vllm: High-performance LLM inference
mlx: Apple Silicon optimization
diffusers: Stable Diffusion models
Audio: coqui, faster-whisper, kitten-tts
Vision: mlx-vlm, rfdetr
Specialized: rerankers, chatterbox, kokoro

Go Backends (`go/`)

whisper: OpenAI Whisper speech recognition in Go, backed by the GGML C++ implementation (whisper.cpp)
stablediffusion-ggml: Stable Diffusion in Go, backed by a GGML C++ implementation
piper: Text-to-speech synthesis in Go, using C bindings to rhasspy/piper
local-store: Vector storage backend

C++ Backends (`cpp/`)

llama-cpp: Llama.cpp integration
grpc: gRPC utilities and helpers

Hardware Acceleration Support

CUDA (NVIDIA)

Versions: CUDA 12.x, 13.x
Features: cuBLAS, cuDNN, TensorRT optimization
Targets: x86_64, ARM64 (Jetson)

ROCm (AMD)

Features: HIP, rocBLAS, MIOpen
Targets: AMD GPUs with ROCm support

Intel

Features: oneAPI, Intel Extension for PyTorch
Targets: Intel GPUs, XPUs, CPUs

Vulkan

Features: Cross-platform GPU acceleration
Targets: Windows, Linux, Android, macOS

Apple Silicon

Features: MLX framework, Metal Performance Shaders
Targets: M1/M2/M3 Macs

Backend Registry (`index.yaml`)

The index.yaml file serves as a central registry for all available backends, providing:

Metadata: Name, description, license, icons
Capabilities: Hardware targets and optimization profiles
Tags: Categorization for discovery
URLs: Source code and documentation links

Building Backends

Prerequisites

Docker with multi-architecture support
Appropriate hardware drivers (CUDA, ROCm, etc.)
Build tools (make, cmake, compilers)

Build Commands

Example build commands using Docker:

# Build Python backend
docker build -f backend/Dockerfile.python \
  --build-arg BACKEND=transformers \
  --build-arg BUILD_TYPE=cublas12 \
  --build-arg CUDA_MAJOR_VERSION=12 \
  --build-arg CUDA_MINOR_VERSION=0 \
  -t localai-backend-transformers .

# Build Go backend
docker build -f backend/Dockerfile.golang \
  --build-arg BACKEND=whisper \
  --build-arg BUILD_TYPE=cpu \
  -t localai-backend-whisper .

# Build C++ backend
docker build -f backend/Dockerfile.llama-cpp \
  --build-arg BACKEND=llama-cpp \
  --build-arg BUILD_TYPE=cublas12 \
  -t localai-backend-llama-cpp .

For ARM64/Mac builds, Docker can't be used; use the Makefile in the respective backend directory instead.

Build Types

cpu: CPU-only optimization
cublas12, cublas13: CUDA 12.x, 13.x with cuBLAS
hipblas: ROCm with rocBLAS
intel: Intel oneAPI optimization
vulkan: Vulkan-based acceleration
metal: Apple Metal optimization
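
The Docker examples above share one shape: a Dockerfile, a BACKEND, a BUILD_TYPE, and, for CUDA, optional version arguments. As a minimal sketch, assuming only the arguments shown in those examples, a small helper can assemble the command line. The function name `docker_build_command` is hypothetical and not part of LocalAI.

```python
import shlex
from typing import List, Optional

# Build types listed above; the helper rejects anything else.
BUILD_TYPES = {"cpu", "cublas12", "cublas13", "hipblas", "intel", "vulkan", "metal"}


def docker_build_command(
    dockerfile: str,
    backend: str,
    build_type: str,
    cuda_major: Optional[int] = None,
    cuda_minor: Optional[int] = None,
) -> List[str]:
    """Return the argv for a `docker build` call shaped like the examples above."""
    if build_type not in BUILD_TYPES:
        raise ValueError(f"unknown build type: {build_type!r}")
    cmd = [
        "docker", "build",
        "-f", f"backend/{dockerfile}",
        "--build-arg", f"BACKEND={backend}",
        "--build-arg", f"BUILD_TYPE={build_type}",
    ]
    # The CUDA version arguments only appear in the CUDA example above.
    if cuda_major is not None:
        cmd += ["--build-arg", f"CUDA_MAJOR_VERSION={cuda_major}"]
    if cuda_minor is not None:
        cmd += ["--build-arg", f"CUDA_MINOR_VERSION={cuda_minor}"]
    # Image tag follows the localai-backend-<name> pattern used above.
    cmd += ["-t", f"localai-backend-{backend}", "."]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command for the Go whisper backend on CPU.
    print(shlex.join(docker_build_command("Dockerfile.golang", "whisper", "cpu")))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.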

Backend Development

Creating a New Backend

1. Choose Language: Select Python, Go, or C++ based on requirements
2. Implement Interface: Implement the gRPC service defined in backend.proto
3. Add Dependencies: Create appropriate requirements files
4. Configure Build: Set up Dockerfile and build scripts
5. Register Backend: Add entry to index.yaml
6. Test Integration: Verify gRPC communication and functionality

Backend Structure

backend-name/
├── backend.py/go/cpp    # Main implementation
├── requirements.txt      # Dependencies
├── Dockerfile           # Build configuration
├── install.sh           # Installation script
├── run.sh              # Execution script
├── test.sh             # Test script
└── README.md           # Backend documentation

Required gRPC Methods

At minimum, backends must implement:

Health() - Service health check
LoadModel() - Model loading and initialization
Predict() - Main inference endpoint
Status() - Backend status and metrics
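
To make the four required methods concrete, here is a minimal sketch written as a plain Python class. It deliberately avoids gRPC: a real backend would implement the servicer generated from backend.proto and serve it over gRPC. The `Result`, `Reply`, and `StatusResponse` classes below are simplified stand-ins whose fields are illustrative assumptions, and `EchoBackend` is a hypothetical toy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


# Stand-ins for the protobuf messages; the field names are illustrative
# assumptions, not copied from backend.proto.
@dataclass
class Result:
    success: bool
    message: str = ""


@dataclass
class Reply:
    message: bytes


@dataclass
class StatusResponse:
    state: str
    memory: Dict[str, int] = field(default_factory=dict)


class EchoBackend:
    """Toy backend that 'loads' a model name and echoes prompts back."""

    def __init__(self) -> None:
        self.model: Optional[str] = None

    def Health(self) -> Reply:
        # Health check: answer as soon as the process is up.
        return Reply(message=b"OK")

    def LoadModel(self, model: str) -> Result:
        # Model loading: a real backend would read weights here.
        if not model:
            return Result(success=False, message="model name is empty")
        self.model = model
        return Result(success=True, message="model loaded")

    def Predict(self, prompt: str) -> Reply:
        # Inference: refuse to run before LoadModel has succeeded.
        if self.model is None:
            raise RuntimeError("Predict called before LoadModel")
        return Reply(message=f"[{self.model}] {prompt}".encode("utf-8"))

    def Status(self) -> StatusResponse:
        # Status: report readiness; memory figures are left empty here.
        state = "READY" if self.model is not None else "UNINITIALIZED"
        return StatusResponse(state=state)
```

The order the sketch enforces (Health at any time, LoadModel before Predict) reflects how the methods depend on each other, not a documented protocol rule.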

Integration with LocalAI Core

Backends communicate with LocalAI core through gRPC:

Service Discovery: Core discovers available backends
Model Loading: Core requests model loading via LoadModel
Inference: Core sends requests via Predict or specialized endpoints
Streaming: Core handles streaming responses for real-time generation
Monitoring: Core tracks backend health and performance

Performance Optimization

Memory Management

Model Caching: Efficient model loading and caching
Batch Processing: Optimize for multiple concurrent requests
Memory Pinning: GPU memory optimization for CUDA/ROCm

Hardware Utilization

Multi-GPU: Support for tensor parallelism
Mixed Precision: FP16/BF16 for memory efficiency
Kernel Fusion: Optimized CUDA/ROCm kernels
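
The memory benefit of mixed precision comes down to simple arithmetic: FP32 stores each parameter in 4 bytes, while FP16 and BF16 use 2. The sketch below estimates weight memory only (activations, KV cache, and framework overhead are ignored); the 7-billion-parameter figure is a hypothetical example.

```python
# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for precision in ("fp32", "fp16", "bf16"):
        print(f"{precision}: {weight_memory_gib(params, precision):.1f} GiB")
```

For the hypothetical 7B model this gives roughly 26 GiB in FP32 against roughly 13 GiB in FP16 or BF16, which is often the difference between fitting on a single GPU and not.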

Troubleshooting

Common Issues

gRPC Connection: Verify backend service is running and accessible
Model Loading: Check model paths and dependencies
Hardware Detection: Ensure appropriate drivers and libraries
Memory Issues: Monitor GPU memory usage and model sizes
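
For the connection check, a quick first step is to confirm that something is listening on the backend's address at all. The sketch below uses only the Python standard library and tests TCP reachability; it does not speak gRPC, so a successful result does not prove the backend is healthy. The host and port in the example are placeholders, not LocalAI defaults.

```python
import socket


def backend_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Host and port are placeholders; use the address your backend listens on.
    print(backend_reachable("127.0.0.1", 50051))
```

If this returns False, the problem lies in process startup or networking rather than in the gRPC layer itself.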

Contributing

When contributing to the backend system:

Follow Protocol: Implement the exact gRPC interface
Add Tests: Include comprehensive test coverage
Document: Provide clear usage examples
Optimize: Consider performance and resource usage
Validate: Test across different hardware targets
