From 3e838c0cff8d7ff5c2a0e51e2ed2a5d18398ac5f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Sat, 13 Jun 2026 20:10:07 +0000
Subject: [PATCH] docs: add realtime voice demo example and refresh README news

Add the localai-org/localai-realtime-demo Go client to the README Examples list and to the realtime docs (integrations + realtime feature page).

Refresh the Latest News section with June 2026 highlights pulled from history since v4.3.0: realtime pipeline streaming, the parakeet.cpp and CrispASR speech work, new backends (locate-anything.cpp, Ideogram4, llama.cpp video input), and distributed-mode hardening.

Signed-off-by: Ettore Di Giacinto
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---
 README.md | 6 +++++-
 docs/content/features/openai-realtime.md | 5 +++++
 docs/content/integrations.md | 5 +++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index d6cc8c23e..d8a4c33b5 100644
--- a/README.md
+++ b/README.md
@@ -165,6 +165,10 @@ For more details, see the [Getting Started guide](https://localai.io/basics/gett
 ## Latest News
+- **June 2026**: New [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (a tiny Go client for the Realtime API with a full talk-back voice loop and tool calling), plus [streaming of the realtime LLM / TTS / transcription pipeline stages](https://github.com/mudler/LocalAI/pull/10176) and [configurable WebRTC ICE candidates](https://github.com/mudler/LocalAI/pull/10231).
+- **June 2026**: Big speech push: the [parakeet.cpp](https://github.com/mudler/parakeet.cpp) ASR engine gains [NeMo-faithful segment timestamps](https://github.com/mudler/LocalAI/pull/10207), a [multilingual streaming Nemotron-3.5 model](https://github.com/mudler/LocalAI/pull/10199), [dynamic batching for concurrent transcription](https://github.com/mudler/LocalAI/pull/10112) and [CUDA graphs](https://github.com/mudler/LocalAI/pull/10273); the new [CrispASR backend](https://github.com/mudler/LocalAI/pull/10099) adds multi-architecture ASR + TTS, and [60 Piper TTS voices across 42 languages](https://github.com/mudler/LocalAI/pull/10296) land in the gallery (plus [per-request TTS instructions and params](https://github.com/mudler/LocalAI/pull/10172)).
+- **June 2026**: New backends and models: [locate-anything.cpp](https://github.com/mudler/LocalAI/pull/10264) for open-vocabulary object detection via ggml, [Ideogram4 image generation](https://github.com/mudler/LocalAI/pull/10201) in stablediffusion-ggml, [llama.cpp video input](https://github.com/mudler/LocalAI/pull/10216), and the [Gemma 4 QAT family with MTP speculative-decoding pairs](https://github.com/mudler/LocalAI/pull/10215). Plus an [interactive CLI chat mode](https://github.com/mudler/LocalAI/pull/10226) and [RAG source citations in agent responses](https://github.com/mudler/LocalAI/pull/10228).
+- **June 2026**: Distributed mode hardening: [prefix-cache-aware routing](https://github.com/mudler/LocalAI/pull/10071), a [production-ready request router with auto-sized embedding/rerank batches](https://github.com/mudler/LocalAI/pull/10104), [ds4 layer-split distributed inference](https://github.com/mudler/LocalAI/pull/10098), [NATS JWT auth + TLS/mTLS](https://github.com/mudler/LocalAI/pull/10159), and [resumable file uploads](https://github.com/mudler/LocalAI/pull/10109).
 - **May 2026**: **LocalAI 4.3.0** - `llama.cpp` [prompt cache on by default](https://github.com/mudler/LocalAI/pull/9925) (repeated system prompts collapse from minutes to seconds), [keyless cosign signing of backend OCI images](https://github.com/mudler/LocalAI/pull/9823), [per-API-key + per-user usage attribution](https://github.com/mudler/LocalAI/pull/9920), Distributed v3 with [per-request replica routing](https://github.com/mudler/LocalAI/pull/9968). [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.3.0)
 - **May 2026**: **LocalAI 4.2.0** - LocalAI sees and hears: [voice recognition](https://github.com/mudler/LocalAI/pull/9500), [face recognition + antispoofing liveness](https://github.com/mudler/LocalAI/pull/9480), speaker diarization. Plus [drop-in Ollama API](https://github.com/mudler/LocalAI/pull/9284), [video generation](https://github.com/mudler/LocalAI/pull/9420), redesigned UI with i18n + admin-configurable branding, vLLM at feature parity with llama.cpp, and 11 new backends. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.2.0)
 - **April 2026**: **LocalAI 4.1.0** - LocalAI becomes a control tower: distributed cluster mode with VRAM-aware smart routing + autoscaling, multi-user platform with OIDC and API keys, per-user quotas with predictive analytics, in-UI fine-tuning with TRL (auto-export to GGUF), on-the-fly quantization backend, visual pipeline editor. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v4.1.0)
@@ -217,7 +221,7 @@ See the full [Backend & Model Compatibility Table](https://localai.io/model-comp
 - [Integrations & community projects](https://localai.io/docs/integrations/)
 - [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
 - [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
-- [Examples](https://github.com/mudler/LocalAI-examples)
+- [Examples](https://github.com/mudler/LocalAI-examples), including the [realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo) (Go client for the Realtime API with tool calling)
 ## Team
diff --git a/docs/content/features/openai-realtime.md b/docs/content/features/openai-realtime.md
index 2bd340578..d8ae1212b 100644
--- a/docs/content/features/openai-realtime.md
+++ b/docs/content/features/openai-realtime.md
@@ -136,3 +136,8 @@ most reliable fix for WebRTC connections that establish and then drop.
 ## Protocol
 The API follows the OpenAI Realtime API protocol for handling sessions, audio buffers, and conversation items.
+
+## Examples
+
+- [Realtime voice assistant demo (Go)](https://github.com/localai-org/localai-realtime-demo): a minimal Go client for the Realtime (WebSocket) API with a full talk-back voice loop and an example tool call. Ships a `docker compose` setup that brings up a realtime-capable LocalAI for you.
+- [Realtime voice assistant example (Python)](https://github.com/mudler/LocalAI-examples/tree/main/realtime): thin-client architecture (Silero VAD on the client, heavy lifting on LocalAI), suited to running the client on a Raspberry Pi.
diff --git a/docs/content/integrations.md b/docs/content/integrations.md
index 5cf03ee5b..2946cd538 100644
--- a/docs/content/integrations.md
+++ b/docs/content/integrations.md
@@ -381,7 +381,7 @@ jobs:
 ### Realtime Voice Assistant
-LocalAI supports realtime voice interactions , enabling voice assistant applications with real-time speech-to-speech communication. A complete example implementation is available in the [LocalAI-examples repository](https://github.com/mudler/LocalAI-examples/tree/main/realtime).
+LocalAI supports realtime voice interactions, enabling voice assistant applications with real-time speech-to-speech communication. A complete example implementation is available in the [LocalAI-examples repository](https://github.com/mudler/LocalAI-examples/tree/main/realtime). For a minimal native client, see the [Go realtime voice assistant demo](https://github.com/localai-org/localai-realtime-demo): a tiny Go client for the Realtime (WebSocket) API with a full talk-back loop and an example tool call, plus a `docker compose` setup that brings up a realtime-capable LocalAI for you.
 #### Overview
@@ -457,7 +457,8 @@ The realtime voice assistant example demonstrates how to build a voice assistant
 #### Additional Resources
-- [Realtime Voice Assistant Example](https://github.com/mudler/LocalAI-examples/tree/main/realtime)
+- [Realtime Voice Assistant Example (Python)](https://github.com/mudler/LocalAI-examples/tree/main/realtime)
+- [Realtime Voice Assistant Demo (Go)](https://github.com/localai-org/localai-realtime-demo)
 - [LocalAI Realtime API documentation](/features/)
 - [Audio features documentation](/features/text-to-audio/)
 - [Transcription features documentation](/features/audio-to-text/)