LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-26 01:16:58 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	fdf475ec5f	feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446) * feat(realtime): add pipeline.compaction config + resolution Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): extract itemID helper, reuse in item.retrieve Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): drop duplicate Ginkgo bootstrap, fold specs into openai suite Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.delete Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement input_audio_buffer.clear Add a handler for the input_audio_buffer.clear client event that discards a partially-captured utterance (raw PCM + buffered Opus frames) via a unit-tested clearInputAudio helper, then acks with input_audio_buffer.cleared. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): implement conversation.item.truncate (text) Clears both .Text and .Transcript of the assistant content part at contentIndex so barge-in truncation also works for audio turns whose spoken words live in .Transcript.
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): add Conversation.Memory + pair-safe compactionCut Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): compactionCut returns 0 for keep<=0 (no-cap sentinel, avoids panic) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(realtime): gofmt compaction test helper closures Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): inject rolling memory into the prompt + summary builders Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): server-side summarize-then-drop compactor Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): unit-test prefixMatches eviction-safety predicate Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): resolve summarizer model + schedule compaction per turn Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document conversation compaction + new item events Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): resolve summary model inside compaction goroutine (lazy, off-path) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): reuse reasoning.ExtractReasoningComplete for summary stripping Replace the bespoke <think> regex in the compactor with the shared pkg/reasoning extractor (via spokenReasoningConfig), matching the rest of the realtime path and covering all reasoning tag families, not just <think>. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(config): register pipeline.compaction fields in meta registry TestAllFieldsHaveRegistryEntries requires every ModelConfig field to have a UI/meta registry entry; add the four pipeline.compaction.* leaves so they render with proper labels/descriptions instead of the reflection fallback. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-22 21:28:49 +02:00
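The compaction commit mentions a "pair-safe" compactionCut that returns 0 for keep<=0. The sketch below guesses at that contract; it is not the LocalAI code.

- The Item type and its field names are hypothetical.
- The rule that a function_call must stay with its function_call_output is inferred from the commit titles.
- Items before the returned index are what a summarize-then-drop pass would fold into the rolling memory.

```go
package main

import "fmt"

// Item is a hypothetical, simplified conversation item; real realtime items
// carry much more (content parts, status, audio).
type Item struct {
	Type   string // "message", "function_call" or "function_call_output"
	Role   string // "user" or "assistant" for messages
	CallID string // links a function_call_output to its function_call
}

// compactionCut returns how many leading items may be summarized and dropped
// so that at least `keep` recent items stay verbatim. keep <= 0 is a "no cap"
// sentinel and returns 0, so nothing is dropped and no slice is over-indexed.
// The cut moves earlier whenever it would leave a function_call_output in the
// kept tail while its function_call fell into the dropped prefix.
func compactionCut(items []Item, keep int) int {
	if keep <= 0 || len(items) <= keep {
		return 0
	}
	cut := len(items) - keep
	for changed := true; changed; {
		changed = false
		for _, it := range items[cut:] {
			if it.Type != "function_call_output" {
				continue
			}
			for j := 0; j < cut; j++ {
				if items[j].Type == "function_call" && items[j].CallID == it.CallID {
					cut = j // keep the call together with its output
					changed = true
					break
				}
			}
		}
	}
	return cut
}

func main() {
	history := []Item{
		{Type: "message", Role: "user"},
		{Type: "function_call", CallID: "c1"},
		{Type: "function_call_output", CallID: "c1"},
		{Type: "message", Role: "assistant"},
		{Type: "message", Role: "user"},
	}
	cut := compactionCut(history, 3)
	// history[:cut] would be summarized into memory, then dropped.
	fmt.Printf("summarize %d item(s), keep %d\n", cut, len(history)-cut)
}
```

The sketch matches outputs to calls by CallID rather than only checking the neighbouring item. That keeps parallel tool calls intact, at the cost of sometimes retaining more than `keep` items.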
LocalAI [bot]	32c47706ae	feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424) * feat(realtime): add voice_recognition enforce + identity config Add Enforce bool and Identity VoiceIdentityConfig to PipelineVoiceRecognition, plus EnforceGate/IdentityEnabled/AnnounceEnabled/PersonalizeEnabled helpers. Enforce nil defaults to gating (backward compatible); identity surfacing is independent of the gate. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): add Speaker type and conversation.item.speaker event Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): split voiceGate into Resolve + authorize Split the speaker authorization into a Resolve step (embed once, produce a types.Speaker identity) and a pure authorize policy step, with a 0..100 confidence score mirroring /v1/voice/identify. The legacy Authorize wrapper is kept so existing specs stay green. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): resolve speaker per turn and emit conversation.item.speaker Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): personalize LLM turns with recognized speaker Set the per-message name field on each recognized user turn and append a current-speaker note to the system message, both gated by the voice recognition identity config. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document speaker identity surfacing and personalization Document the new voice_recognition keys (enforce, identity.) and the LocalAI-extension conversation.item.speaker server event in the realtime feature docs.
Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(realtime): cover when:first+identity re-resolution and multi-speaker history Add two integration specs to harden the speaker-aware realtime path: - when:first with an Identity block re-resolves the speaker every turn even though re-authorization is skipped after the first match: a later resolve error now fails closed, while a clean later resolve still surfaces and names the speaker. - multi-speaker history attribution: each user turn carries its own per-message name and the injected system note reflects the latest speaker. Test-only change; no production behavior was modified. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): surface speaker labels in conversation.item.speaker Carry the registered speaker's labels (identify mode) on types.Speaker so they flow into the conversation.item.speaker event and the stored item. Verify mode has no labels, so the field is omitted there. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(e2e): cover conversation.item.speaker over a real websocket Add a realtime-pipeline-identity config (verify mode, enforce:false, identity announce+announce_unknown+personalize) and two e2e specs driving the real server over a real WebSocket with the mock VoiceEmbed backend: an authorized speaker yields a conversation.item.speaker event naming e2e-speaker (matched true) and reaches response.done; an unauthorized speaker yields an unknown (matched false, no name) event and still responds, proving enforce:false never drops a turn. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(config): register voice_recognition enforce + identity fields The meta registry coverage test (TestAllFieldsHaveRegistryEntries) requires every config field to have an entry in core/config/meta/registry.go.
The new voice_recognition.enforce and voice_recognition.identity.* fields were missing, failing tests-linux and tests-apple. Add registry entries (toggles) so the fields are surfaced in the model-config editor and the coverage test passes. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-06-21 21:07:10 +02:00
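The personalization step can be sketched as follows. It sets the per-message name on each recognized user turn and appends a current-speaker note to the system message.

Everything here is hypothetical: Speaker, Turn, ChatMessage, buildMessages and the wording of the note. The real code works on LocalAI's own types and is gated by the voice_recognition identity config, which this sketch omits.

```go
package main

import "fmt"

// Speaker is a hypothetical stand-in for a resolved speaker identity.
type Speaker struct {
	Name    string
	Matched bool // false when the voice matched no registered speaker
}

// Turn is a hypothetical conversation turn with an optional resolved speaker.
type Turn struct {
	Role    string
	Content string
	Speaker *Speaker // nil when no speaker was resolved for this turn
}

// ChatMessage mirrors the OpenAI chat message shape, including "name".
type ChatMessage struct {
	Role    string
	Name    string
	Content string
}

// buildMessages sets the per-message name on each recognized user turn and,
// when the most recently resolved speaker was recognized, appends a short
// current-speaker note to the system prompt.
func buildMessages(system string, turns []Turn) []ChatMessage {
	msgs := []ChatMessage{{Role: "system", Content: system}}
	var latest *Speaker
	for _, t := range turns {
		m := ChatMessage{Role: t.Role, Content: t.Content}
		if t.Role == "user" && t.Speaker != nil {
			latest = t.Speaker
			if t.Speaker.Matched {
				m.Name = t.Speaker.Name
			}
		}
		msgs = append(msgs, m)
	}
	if latest != nil && latest.Matched {
		msgs[0].Content += "\n\nThe current speaker is " + latest.Name + "."
	}
	return msgs
}

func main() {
	turns := []Turn{
		{Role: "user", Content: "Hi", Speaker: &Speaker{Name: "alice", Matched: true}},
		{Role: "assistant", Content: "Hello Alice."},
		{Role: "user", Content: "It's Bob now", Speaker: &Speaker{Name: "bob", Matched: true}},
	}
	for _, m := range buildMessages("You are a helpful assistant.", turns) {
		fmt.Printf("%s(%s): %q\n", m.Role, m.Name, m.Content)
	}
}
```

Because each user turn keeps its own name, earlier turns stay attributed to whoever spoke them. Only the system note tracks the latest speaker.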
LocalAI [bot]	4ec6e3221e	feat(realtime): gate realtime pipeline voice models behind voice recognition (#10319) * feat(realtime): add pipeline voice_recognition gate config schema Add the PipelineVoiceRecognition config block that gates a realtime pipeline behind speaker verification (identify against the voice registry, or verify against reference audios), with Normalize defaults and Validate enum/shape checks. Register the new fields in the config meta registry so the UI renders them with proper labels/components (required by the registry-coverage gate). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(realtime): range-check voice gate threshold and floor UI min Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): add cosineDistance helper for voice gate Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): add voiceGate identify-mode authorization Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * test(realtime): cover voice gate fail-closed error paths Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): add voiceGate verify-mode authorization Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): add voiceGate decide policy helper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): add newVoiceGate constructor Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * feat(realtime): gate pipeline responses behind voice recognition Run speaker verification concurrently with transcription and join on a hard barrier before generateResponse, so unauthorized utterances never reach the LLM, tools, or TTS.
Supports identify (registry) and verify (reference) modes with multiple authorized speakers, per-utterance or first-utterance checking, and drop-with-event or silent-drop on reject. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(realtime): harden voice gate goroutine lifecycle Only launch the verification goroutine on the transcription path and drain it before the temp WAV is removed on the transcription-error return, so an in-flight backend read never races the deferred cleanup. Drop the write-only voiceMatched field; log the matched speaker instead. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * docs(realtime): document the voice_recognition pipeline gate Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * fix(realtime): fail closed on an incomplete voice_recognition block A present voice_recognition block with no model previously disabled the gate silently, authorizing every speaker. Treat block presence as the intent signal and reject an empty model in Validate, so the session is refused instead of running unprotected. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] * test(realtime): integration-test the voice gate through commitUtterance Drive the real commitUtterance path (gate goroutine, hard join before the LLM, reject event, when:first session trust) with the existing transport/model doubles: authorized speakers reach a full response, unauthorized ones are dropped before the LLM with a speaker_not_authorized event, backend errors fail closed, drop_silent stays quiet, and when:first trusts the session after one match. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:opus-4.8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 23:38:08 +02:00
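Below is a minimal sketch of a cosine-distance voice gate, in the spirit of the cosineDistance helper and the fail-closed behavior described in this commit.

It assumes the gate compares an utterance embedding against reference embeddings and accepts the closest one if it is within a distance threshold. The function names, the `1 - cos` definition and the `<= threshold` acceptance rule are all assumptions, not the LocalAI implementation.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(a, b): 0 for identical directions, up to 2
// for opposite ones. Mismatched or zero-norm embeddings are errors.
func cosineDistance(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("embedding dimensions do not match")
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("zero-norm embedding")
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb)), nil
}

// authorize picks the closest authorized reference and accepts it only if its
// distance is within threshold. It fails closed: any error, or an empty
// reference set, rejects the utterance.
func authorize(utterance []float32, refs map[string][]float32, threshold float64) (string, bool) {
	best, bestDist := "", math.Inf(1)
	for name, ref := range refs {
		d, err := cosineDistance(utterance, ref)
		if err != nil {
			return "", false
		}
		if d < bestDist {
			best, bestDist = name, d
		}
	}
	if best == "" || bestDist > threshold {
		return "", false
	}
	return best, true
}

func main() {
	refs := map[string][]float32{"alice": {1, 0}, "bob": {0, 1}}
	name, ok := authorize([]float32{0.9, 0.1}, refs, 0.2)
	fmt.Println(name, ok)
}
```

In the pipeline this check would run alongside transcription, and its result would be awaited before any response is generated. A rejected or errored check would drop the turn.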
Ettore Di Giacinto	3e838c0cff	docs: add realtime voice demo example and refresh README news Add the localai-org/localai-realtime-demo Go client to the README Examples list and to the realtime docs (integrations + realtime feature page). Refresh the Latest News section with June 2026 highlights pulled from history since v4.3.0: realtime pipeline streaming, the parakeet.cpp and CrispASR speech work, new backends (locate-anything.cpp, Ideogram4, llama.cpp video input), and distributed-mode hardening. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-13 20:10:22 +00:00
LocalAI [bot]	892fc49949	feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176) * feat(realtime): pipeline streaming + disable_thinking config Add a nested pipeline.streaming.{llm,tts,transcription} block plus pipeline.disable_thinking, with StreamLLM/StreamTTS/StreamTranscription/ThinkingDisabled helpers. Pointer-bools so unset keeps the unary path; existing configs are unaffected. Wiring into the realtime handler follows. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): sentence segmenter for streamed LLM->TTS pipelining streamSegmenter accumulates streamed LLM tokens and emits complete sentence/clause segments (terminator+whitespace, or newline) so TTS can synthesize each segment as it completes instead of waiting for the whole reply. Pure helper; the streaming handler wiring consumes it next. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): streaming TTS/transcription methods on Model interface Add TTSStream and TranscribeStream to the realtime Model interface and implement them on wrappedModel (delegating to backend.ModelTTSStream / ModelTranscriptionStream) and transcriptOnlyModel. ttsStream adapts the backend's WAV-framed stream (44-byte header carrying the sample rate, then PCM) into raw PCM + sample rate for the realtime transports. Handler wiring that consumes these (flag-gated) follows. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): emitSpeech with flag-gated streaming TTS emitSpeech synthesizes a piece of text and forwards audio to the client, streaming one output_audio.delta per backend PCM chunk when the pipeline sets streaming.tts, or one delta for the whole utterance otherwise. WebRTC gets raw PCM (it resamples internally); WebSocket gets base64 PCM at the session rate.
It emits no transcript/audio-done events so a streamed reply can be split into multiple spoken segments sharing one response. Adds fakeModel/fakeTransport test doubles for the realtime Model/Transport interfaces, driving streaming assertions deterministically. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): route response audio through emitSpeech (streaming TTS) Replace the inline unary TTS block in the response handler with emitSpeech, which streams a response.output_audio.delta per backend PCM chunk when pipeline.streaming.tts is set and otherwise preserves the single-delta unary behaviour. emitSpeech returns the accumulated base64 audio, stored on the conversation item as before. Transcript and audio-done events stay in the handler so later per-segment streaming can reuse emitSpeech. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): streaming transcription text deltas Add emitTranscription and route commitUtterance through it. With pipeline.streaming.transcription set it streams each transcript fragment as a conversation.item.input_audio_transcription.delta via TranscribeStream then a completed event; otherwise it preserves the single completed-event unary behaviour. Returns the final transcript for response generation. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): pipeline disable_thinking maps to enable_thinking off applyPipelineThinking forces the LLM's ReasoningConfig.DisableReasoning when pipeline.disable_thinking is set, which gRPCPredictOpts turns into the enable_thinking=false backend metadata. Applied at newModel construction on the per-session LLM config copy, so it doesn't leak to other model users and needs no realtime-specific request plumbing. 
Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): speechStreamer for token-streamed LLM->TTS emitSpeech now returns raw PCM (caller base64-encodes) so streamed segments accumulate correctly. speechStreamer consumes streamed LLM tokens: it strips reasoning via the streaming ReasoningExtractor, emits a transcript delta per content fragment, and sentence-pipes content into emitSpeech so each sentence is synthesized as soon as it's ready. Handler wiring (plain-content turns) follows. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): wire streamLLMResponse for token-streamed replies triggerResponseAtTurn takes a streamed path when pipeline.streaming.llm is set, the turn has no tools, and audio is requested: streamLLMResponse announces the assistant item, drives the LLM token callback through a speechStreamer (reasoning-stripped transcript deltas + sentence-piped TTS), and emits the terminal events. Tool turns and non-streaming pipelines keep the existing buffered path unchanged, so this is strictly opt-in. Assisted-by: Claude:claude-opus-4-8 go vet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(realtime): document pipeline streaming + disable_thinking Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): register pipeline streaming/thinking config fields TestAllFieldsHaveRegistryEntries (core/config/meta) requires every config field to have a meta registry entry. The four new pipeline fields (disable_thinking, streaming.{llm,tts,transcription}) had none, failing tests-linux/tests-apple. Add toggle entries for them. Also handle the os.Remove return in realtime_speech_test.go to satisfy errcheck (golangci-lint). 
Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): always strip reasoning from spoken output disable_thinking maps to ReasoningConfig.DisableReasoning=true on the LLM config, which the backend reads as enable_thinking=false. But the realtime handler reads that SAME config to drive reasoning extraction, and there DisableReasoning=true means "skip stripping". PredictConfig() returns this LLM config, so both the streamed (speechStreamer) and buffered realtime paths stopped stripping <think>…</think> exactly when disable_thinking was on — leaking raw reasoning to the client whenever the model ignored the enable_thinking hint (e.g. lfm2.5). Add spokenReasoningConfig() which clears DisableReasoning for extraction (keeping custom tokens/tag pairs) and route both realtime paths through it. Spoken output now always strips reasoning, independent of the backend suppression hint. Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(realtime): clean TTS temp path before read (gosec G304) emitSpeech reads the WAV file the TTS backend wrote. The read moved here from realtime.go, so code-scanning flagged it as a new G304 alert even though the path is backend-controlled (a temp file), not user input. Wrap it in filepath.Clean — a real path normalization that also clears the alert, keeping with the repo's no-#nosec convention. Assisted-by: Claude:claude-opus-4-8 gosec, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(realtime): buffer whole message for TTS, drop sentence segmenter Per review (richiejp): the sentence segmenter pipelined unary TTS by splitting on ASCII .!?/newline, which does nothing for languages without those boundaries (CJK/Thai) — there it already degraded to buffering the whole message anyway. 
Replace it with a uniform model: stream the LLM transcript live, buffer the full message, then synthesize it once. emitSpeech already streams the audio chunks when the backend implements TTSStream and falls back to a single unary delta otherwise, so this is real streaming TTS where supported and a clean whole-message synthesis elsewhere — no per-sentence emulation, no language assumptions. speechStreamer becomes transcriptStreamer (transcript deltas only); the whole-message synthesis moves into streamLLMResponse. Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): stream tool-call turns via tokenizer-template autoparser Per review (richiejp): tool-call deltas exist, so streaming should work with tools too. It does — for models that use their tokenizer template. The C++ autoparser then clears reply.Message and delivers content + tool calls via ChatDeltas, so the streamed transcript carries only spoken content (no tool-call JSON leak) and the tool calls are parsed from the final response. - Drop the len(tools)==0 gate; stream when no tools OR use_tokenizer_template (grammar-based function calling still buffers, since its call is emitted as JSON in the token stream and would leak into the transcript). - streamLLMResponse takes tools/toolChoice/toolTurn, reads ChatDelta content in the token callback, parses tool calls from the final ChatDeltas, and creates the assistant content item lazily so a content-less tool turn emits only the tool calls. - Extract emitToolCallItems from the buffered path so both paths finalize tool calls, response.done, and server-side assistant-tool follow-ups identically. 
Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(realtime): script-aware clause chunking + streamed-reply fixes Opt-in pipeline.streaming.clause_chunking splits the streamed LLM reply into speakable clauses and synthesizes each as soon as it completes, lowering time-to-first-audio instead of buffering the whole message. The splitter is script-aware (rivo/uniseg, pure Go): UAX#29 sentence segmentation handles CJK 。！？ with no whitespace, CJK clause punctuation (，、；：) and Thai/Lao spaces give finer cuts, and a UAX#14 line-break cap bounds an over-long punctuation-less run. Unlike the old ASCII .!?/newline segmenter (dropped in `076dcdbe`) it does not degrade to whole-message buffering for CJK/Thai; scripts needing a dictionary (Khmer/Burmese) stay buffered until a space or end-of-message. Clauses are synthesized synchronously in the token callback (the LLM keeps generating into the gRPC stream meanwhile), so audio still starts mid-generation. Off by default — the whole-message path is unchanged. Also fix the streamed-reply path and the Talk page: - Don't swallow streamed autoparser content as reasoning: the tokenizer-template path already delivers reasoning-free content via ChatDeltas, so prefilling the thinking start token re-tagged it as an unclosed reasoning block, leaving no spoken reply. Disable the prefill on that path; closed tag pairs are still stripped (#9985). - Generate collision-free realtime IDs (16 random bytes) instead of a constant, so per-item bookkeeping (cancel, conversation.item.retrieve) works. - Key the Talk transcript by the server item_id and upsert entries. Realtime events arrive over a WebRTC data channel — outside React's event system — so React defers the setTranscript updaters while synchronous ref writes in handler bodies run first; the old index-tracking ref rendered a duplicate assistant bubble on completion. Upserts by item_id are idempotent and order-independent. 
- Drop the partial assistant bubble on a cancelled response (barge-in): the server discards the interrupted item and sends response.done with status "cancelled"; mirror that in the UI so the regenerated reply isn't rendered as a second assistant message. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Assisted-by: Claude:claude-fable-5 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Richard Palethorpe <io@richiejp.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Richard Palethorpe <io@richiejp.com>	2026-06-11 08:43:12 +01:00
LocalAI [bot]	b203b32e57	feat(realtime): make WebRTC ICE candidates configurable (#10231) The /v1/realtime WebRTC handler created the peer connection with a bare webrtc.Configuration and no SettingEngine, so pion gathered a host ICE candidate for every local interface. Under Docker host networking that includes bridge addresses (docker0/veth, 172.x) a remote browser cannot route to; the call establishes on a good pair and then drops once ICE consent freshness checks fail on the unreachable candidates. Add two opt-in knobs, applied via a pion SettingEngine: - LOCALAI_WEBRTC_NAT_1TO1_IPS: advertise these IPs as the host candidates (e.g. the host LAN IP) - LOCALAI_WEBRTC_ICE_INTERFACES: restrict ICE gathering to these interfaces Defaults are unchanged (empty => current all-interface behavior). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-09 22:28:03 +02:00
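The two ICE knobs are plain comma-separated environment variables. The sketch below only parses them and builds the interface predicate, treating empty values as "keep the defaults". The iceOptions type and the helper names are hypothetical.

In pion/webrtc, the results would be handed to SettingEngine setters such as SetNAT1To1IPs (with a host candidate type) and SetInterfaceFilter. Those calls are left out so the example stays dependency-free. Their exact signatures should be checked against the pion version in use.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// iceOptions holds the two opt-in knobs (hypothetical type).
type iceOptions struct {
	NAT1To1IPs []string // advertised as host candidates, e.g. the host LAN IP
	Interfaces []string // ICE gathering is restricted to these interface names
}

// splitList parses a comma-separated value into trimmed, non-empty entries.
// An empty or unset variable yields nil, meaning "keep the default behavior".
func splitList(v string) []string {
	var out []string
	for _, p := range strings.Split(v, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// iceOptionsFromEnv reads both variables through getenv (os.Getenv in practice).
func iceOptionsFromEnv(getenv func(string) string) iceOptions {
	return iceOptions{
		NAT1To1IPs: splitList(getenv("LOCALAI_WEBRTC_NAT_1TO1_IPS")),
		Interfaces: splitList(getenv("LOCALAI_WEBRTC_ICE_INTERFACES")),
	}
}

// interfaceFilter returns nil when no interfaces are configured (gather on
// every interface, as before); otherwise it returns a predicate that keeps
// only the listed interface names.
func (o iceOptions) interfaceFilter() func(string) bool {
	if len(o.Interfaces) == 0 {
		return nil
	}
	allowed := make(map[string]bool, len(o.Interfaces))
	for _, name := range o.Interfaces {
		allowed[name] = true
	}
	return func(name string) bool { return allowed[name] }
}

func main() {
	opts := iceOptionsFromEnv(os.Getenv)
	fmt.Printf("NAT 1:1 IPs: %v, interface filter set: %v\n",
		opts.NAT1To1IPs, opts.interfaceFilter() != nil)
}
```

Returning a nil filter for the unset case means an unconfigured deployment never installs a predicate at all. This keeps the previous all-interface behavior exactly.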
LocalAI [bot]	7e59a5c7c5	docs: architecture & feature diagrams (blueprint style) (#10137) * docs: add 'how LocalAI works' architecture diagram Add a blueprint-style architecture diagram: clients -> small core (API, router, WebUI, agents) -> gRPC -> backend processes pulled on demand as OCI images. Place it on the overview page and replace the stale external architecture image on the reference page. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: add blueprint diagrams across feature, distributed & getting-started docs Add 24 architecture/flow/comparison diagrams (PNG + HTML source) under docs/static/images/diagrams/, wired into their docs pages, from an impact-vs-effort audit of the docs. Broaden the API surface on the overview architecture diagram (OpenAI, Anthropic, ElevenLabs, Ollama, and LocalAI's own API) and move the gRPC boundary label clear of the arrows. Pages: distributed mode (architecture, scheduling, ds4 layer-split), distributed inferencing, MLX, realtime, quantization, MCP, agents, mitm & cloud proxy, middleware, reverse-proxy TLS, VRAM, voice & face recognition, reranker, function calling, fine-tuning (recipe + jobs), diarization, audio transform, quickstart, model resolution. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: add composable-core diagram to README hero Commit the composable-core card (small core + on-demand backend tiles) alongside the other diagrams and reference it from the README hero via a repo-relative path, so it renders on GitHub. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs: fix composable-core connectors/badge and federated-vs-worker layout - composable-core: thicken the plug-in connectors so they read clearly, and widen the SEPARATE IMAGE badge so its text no longer overflows the box.
- federated-vs-worker: shorten the WHOLE/SPLIT REQUEST pills to fit, and replace the tangled node-to-node activation arrows with a clean fan-out (request split across all sharded nodes), mirroring the federated panel. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-02 18:43:22 +02:00
Ettore Di Giacinto	b0d9ce4905	Remove header from OpenAI Realtime API documentation Removed the header from the Realtime API documentation. Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-09 09:00:28 +02:00
Richard Palethorpe	f9a850c02a	feat(realtime): WebRTC support (#8790) * feat(realtime): WebRTC support Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(tracing): Show full LLM opts and deltas Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-13 21:37:15 +01:00
Richard Palethorpe	dd8e74a486	feat(realtime): Add audio conversations (#6245) * feat(realtime): Add audio conversations Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(realtime): Vendor the updated API and modify for server side Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): Update to the GA realtime API Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore: Document realtime API and add docs to AGENTS.md Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat: Filter reasoning from spoken output Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(realtime): Send delta and done events for tool calls and audio transcripts Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing. 💘 Generated with Crush Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(realtime): Improve tool call handling and error reporting - Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles - Fix Parameters field handling: support both map[string]any and JSON string formats - Add PredictConfig() method to Model interface for accessing model configuration - Add comprehensive debug logging for tool call parsing and function config - Add missing return statement after prediction error (critical bug fix) - Add warning logs for NoAction function argument parsing failures - Improve error visibility throughout generateResponse function 💘 Generated with Crush Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land> Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-01-29 08:44:53 +01:00
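One of the tool-call fixes makes the Parameters field accept either an already-decoded map[string]any or a JSON string. Below is a sketch of that normalization. It assumes Parameters is held as an untyped `any`; normalizeParameters is a hypothetical name, not a LocalAI function.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normalizeParameters accepts a tool's "parameters" either as a decoded JSON
// object or as a JSON string, and returns the decoded object. Empty input
// yields nil; any other type, or a string that is not a JSON object, is an error.
func normalizeParameters(p any) (map[string]any, error) {
	switch v := p.(type) {
	case nil:
		return nil, nil
	case map[string]any:
		return v, nil
	case string:
		if strings.TrimSpace(v) == "" {
			return nil, nil
		}
		var m map[string]any
		if err := json.Unmarshal([]byte(v), &m); err != nil {
			return nil, fmt.Errorf("parameters string is not a JSON object: %w", err)
		}
		return m, nil
	default:
		return nil, fmt.Errorf("unsupported parameters type %T", p)
	}
}

func main() {
	fromString, _ := normalizeParameters(`{"type":"object","properties":{}}`)
	fromMap, _ := normalizeParameters(map[string]any{"type": "object"})
	fmt.Println(fromString["type"], fromMap["type"])
}
```

Decoding once at the boundary lets the rest of the tool-call code work with a single representation. It also avoids repeated marshal/unmarshal round trips.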

10 Commits