LocalAI/docs/content/features/openai-realtime.md at eef81fd18921d0e65d5d04bd0c767aef3979c52c

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-11 02:07:27 -04:00


LocalAI [bot] b203b32e57 feat(realtime): make WebRTC ICE candidates configurable (#10231)

The /v1/realtime WebRTC handler created the peer connection with a bare
webrtc.Configuration and no SettingEngine, so pion gathered a host ICE
candidate for every local interface. Under Docker host networking that
includes bridge addresses (docker0/veth, 172.x) a remote browser cannot
route to; the call establishes on a good pair and then drops once ICE
consent freshness checks fail on the unreachable candidates.

Add two opt-in knobs, applied via a pion SettingEngine:
- LOCALAI_WEBRTC_NAT_1TO1_IPS: advertise these IPs as the host candidates
  (e.g. the host LAN IP)
- LOCALAI_WEBRTC_ICE_INTERFACES: restrict ICE gathering to these interfaces

Defaults are unchanged (empty => current all-interface behavior).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-09 22:28:03 +02:00

3.7 KiB


title: "Realtime API"
weight: 60

LocalAI supports the OpenAI Realtime API, which enables low-latency, multimodal conversations (voice and text) over WebSocket or WebRTC.

To use the Realtime API, you need to configure a pipeline model that defines the components for Voice Activity Detection (VAD), Transcription (STT), Language Model (LLM), and Text-to-Speech (TTS).

Configuration

Create a model configuration file (e.g., gpt-realtime.yaml) in your models directory. For a complete reference of configuration options, see [Model Configuration]({{%relref "advanced/model-configuration" %}}).

name: gpt-realtime
pipeline:
  vad: silero-vad-ggml
  transcription: whisper-large-turbo
  llm: qwen3-4b
  tts: tts-1

This configuration links the following components:

vad: The Voice Activity Detection model (e.g., silero-vad-ggml) to detect when the user is speaking.
transcription: The Speech-to-Text model (e.g., whisper-large-turbo) to transcribe user audio.
llm: The Large Language Model (e.g., qwen3-4b) to generate responses.
tts: The Text-to-Speech model (e.g., tts-1) to synthesize the audio response.

Make sure all referenced models (silero-vad-ggml, whisper-large-turbo, qwen3-4b, tts-1) are also installed or defined in your LocalAI instance.
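As a rough illustration of that check, the sketch below compares the four pipeline roles against a set of installed model names. The helper `missing_pipeline_models` and the `installed` set are hypothetical and not part of LocalAI; the configuration is written as a Python dict mirroring the YAML above so the example needs no YAML parser.

```python
# Hypothetical helper: neither this function nor the "installed" set comes from LocalAI.
PIPELINE_ROLES = ("vad", "transcription", "llm", "tts")


def missing_pipeline_models(config: dict, installed: set[str]) -> dict[str, str]:
    """Return {role: problem} for every pipeline role that is unset or not installed."""
    pipeline = config.get("pipeline") or {}
    problems = {}
    for role in PIPELINE_ROLES:
        name = pipeline.get(role)
        if not name:
            problems[role] = "not set"
        elif name not in installed:
            problems[role] = f"model {name!r} is not installed"
    return problems


# Same content as gpt-realtime.yaml above, expressed as a dict.
config = {
    "name": "gpt-realtime",
    "pipeline": {
        "vad": "silero-vad-ggml",
        "transcription": "whisper-large-turbo",
        "llm": "qwen3-4b",
        "tts": "tts-1",
    },
}

# Assumed list of installed models; tts-1 is left out on purpose.
installed = {"silero-vad-ggml", "whisper-large-turbo", "qwen3-4b"}

print(missing_pipeline_models(config, installed))
```

With the sample data, only the `tts` role is reported, because `tts-1` is absent from the assumed `installed` set.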

Transports

The Realtime API supports two transports: WebSocket and WebRTC.

WebSocket

Connect to the WebSocket endpoint:

ws://localhost:8080/v1/realtime?model=gpt-realtime

Audio is sent and received as raw PCM carried inside the WebSocket messages, following the OpenAI Realtime API protocol (in which audio chunks are base64-encoded within JSON events).
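A minimal sketch of what a client prepares before connecting: the endpoint URL and one audio message. The event shape (`input_audio_buffer.append` with a base64 `audio` field) is taken from the OpenAI Realtime API protocol; that LocalAI accepts exactly this shape, and the 16-bit sample format used for the placeholder payload, are assumptions. The helper names are hypothetical. Sending the message needs a WebSocket client library, which is outside the standard library, so the sketch stops at building the payload.

```python
import base64
import json
from urllib.parse import urlencode


def realtime_ws_url(host: str, model: str, secure: bool = False) -> str:
    """Build the realtime WebSocket URL for a given host and pipeline model."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}/v1/realtime?{urlencode({'model': model})}"


def audio_append_event(pcm: bytes) -> str:
    """Wrap raw PCM bytes in an OpenAI-style input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


url = realtime_ws_url("localhost:8080", "gpt-realtime")

# Placeholder payload: 160 samples of 16-bit silence (sample format is an assumption).
silence = b"\x00\x00" * 160
event = audio_append_event(silence)

print(url)
```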

WebRTC

The WebRTC transport enables browser-based voice conversations with lower latency. Connect by POSTing an SDP offer to the REST endpoint:

POST http://localhost:8080/v1/realtime?model=gpt-realtime
Content-Type: application/sdp

<SDP offer body>

The response contains the SDP answer to complete the WebRTC handshake.
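The exchange described above can be sketched with the Python standard library. In practice the offer comes from a WebRTC stack (for example a browser's `RTCPeerConnection`); the `"v=0\r\n"` string below is only a placeholder and not a valid offer. The helper names `build_sdp_offer_request` and `exchange_sdp` are hypothetical, and `exchange_sdp` is defined but not called here because it needs a running LocalAI instance.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sdp_offer_request(base_url: str, model: str, sdp_offer: str) -> Request:
    """Prepare the POST that carries the SDP offer to /v1/realtime."""
    url = f"{base_url}/v1/realtime?{urlencode({'model': model})}"
    return Request(
        url,
        data=sdp_offer.encode("utf-8"),
        headers={"Content-Type": "application/sdp"},
        method="POST",
    )


def exchange_sdp(request: Request, timeout: float = 10.0) -> str:
    """Send the offer and return the response body, i.e. the SDP answer."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder offer body; a real one is generated by the WebRTC peer connection.
request = build_sdp_offer_request("http://localhost:8080", "gpt-realtime", "v=0\r\n")
print(request.get_method(), request.full_url)
```

The returned answer would then be handed back to the WebRTC stack as the remote description to complete the handshake.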

Opus backend requirement

WebRTC uses the Opus audio codec for encoding and decoding audio on RTP tracks. The opus backend must be installed for WebRTC to work. Install it from the model gallery:

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{"id": "opus"}'

Or set the EXTERNAL_GRPC_BACKENDS environment variable if running a local build:

EXTERNAL_GRPC_BACKENDS=opus:/path/to/backend/go/opus/opus

The opus backend is loaded automatically when a WebRTC session starts. It does not require a model configuration file, only the backend binary.

WebRTC behind Docker host networking or NAT

By default, pion (the Go WebRTC library behind the /v1/realtime handler) gathers a host ICE candidate for every local interface. Under Docker host networking that includes bridge addresses (docker0/veth, 172.x) that a remote browser cannot route to: the call typically connects on a good candidate pair and then drops a few seconds later, once ICE consent freshness checks fail on the unreachable candidates. Two opt-in settings let you advertise only the reachable address; when both are left empty, the all-interface behavior is unchanged:

# Advertise these IPs as the host ICE candidates (e.g. the host's LAN IP)
LOCALAI_WEBRTC_NAT_1TO1_IPS=192.168.1.10

# ...or restrict ICE gathering to specific interfaces
LOCALAI_WEBRTC_ICE_INTERFACES=eth0

{{% notice tip %}} For a browser on another LAN machine talking to LocalAI in a host-networked container, set LOCALAI_WEBRTC_NAT_1TO1_IPS to the host's LAN IP. This is the most reliable fix for WebRTC connections that establish and then drop. {{% /notice %}}
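One way to check the effect of these settings is to look at which addresses appear in the `a=candidate:` lines of an SDP answer. The sketch below parses those lines using the standard ICE candidate attribute layout; the helper `ice_candidates` is hypothetical, the sample SDP fragment is made up for illustration, and it is an assumption that the answer returned by LocalAI carries its candidates inline.

```python
def ice_candidates(sdp: str) -> list[tuple[str, str]]:
    """Return (address, type) for each a=candidate line in an SDP blob."""
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=candidate:"):
            continue
        fields = line.split()
        # Layout: foundation component transport priority address port "typ" type ...
        if len(fields) >= 8 and fields[6] == "typ":
            found.append((fields[4], fields[7]))
    return found


# Made-up fragment: one LAN host candidate and one Docker bridge host candidate.
sample_answer = "\r\n".join([
    "v=0",
    "a=candidate:1 1 udp 2130706431 192.168.1.10 50000 typ host",
    "a=candidate:2 1 udp 2130706431 172.17.0.1 50001 typ host",
])

for address, kind in ice_candidates(sample_answer):
    print(address, kind)
```

If a 172.x bridge address like the second one shows up in a real answer, that is the kind of candidate the settings above are meant to exclude.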

Protocol

The API follows the OpenAI Realtime API protocol for handling sessions, audio buffers, and conversation items.
