mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-01 20:07:18 -04:00
master
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
5d0c43ec6e |
feat(realtime): Semantic VAD EOU token (#10444)
* feat(realtime): EOU-driven semantic_vad turn detection
Add a `semantic_vad` turn-detection mode to the realtime API that feeds the transcription model live and decides "the user finished speaking" from the `<EOU>` end-of-utterance token rather than from silence alone. When EOU fires the turn commits immediately (~0.3s); otherwise it falls back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s).
Plumbing, bottom to top:
- proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof, mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus `TranscriptResult.eou` for the unary retranscribe gate.
- pkg/grpc: client/server/base/embed scaffolding for the bidi stream, modeled on AudioTransformStream; release stream conns on terminal Recv.
- parakeet-cpp: live transcription RPC with per-C-call engine locking (one live stream per turn, finalize+free at commit); bump parakeet.cpp to ABI v5: incremental StreamingMel (no more quadratic per-feed mel recompute that delayed EOU on long turns) and the <EOU>/<EOB> split; strip the literal <EOU>/<EOB> from offline text and set Eou.
- core/backend: LiveTranscriptionSession wrapper + pipeline `turn_detection:` config block (type/eagerness/retranscribe).
- realtime: semantic_vad integration: live input captions streamed as transcription deltas while the user speaks, EOU-immediate commit with eagerness fallback, optional retranscribe gate (batch re-decode must also end in <EOU> to confirm), clause synthesis off the LLM token callback, and per-turn live-transcription / model_load telemetry.
- UI: show the realtime pipeline components as a vertical list.
Docs and tests included; opt-in via the pipeline YAML or per-session `session.update`. Non-streaming STT backends degrade to silence-only.
Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash]
Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): explicit formally-verified state machines + parakeet streaming driver
The realtime API had several implicit state machines whose state was inferred from scattered booleans, channels, and five separate mutexes, leaving illegal/inconsistent states reachable. Make them explicit and keep the implementation in step with a formal design; rework the parakeet streaming backend along the same lines.
Realtime state machines (M1-M5). Each is a sealed sum-type State/Event/Effect with a total, pure Next(state,event)->(state,[]effect) behind a single-writer Coordinator:
- M1 conncoord, connection lifecycle: VAD toggle + once-only teardown (replaces vadServerStarted + a `done` channel closed from two sites).
- M2 turncoord, turn detection: collapses speechStarted and the live-stream "turn open" flag into one state, so discardTurn can no longer desync them and suppress the next onset.
- M3 respcoord, response coordination: serializes the dual-writer start/cancel so at most one response is live; one response.done per response.create.
- M4 compactcoord, conversation compaction: single-flight (replaces the `compacting atomic.Bool` CAS).
- M5 ttscoord, TTS pipeline: open->closing->closed, idempotent wait(), rejects enqueue-after-close (was a silent drop).
The Coordinator/Sink/Next plumbing (only the sealed types and Next differed per machine) is extracted once into core/http/endpoints/openai/coordinator as a generic Coordinator[S,E,F]; each machine keeps its public API via type aliases, so no sink, call-site, or test moved.
Hierarchy. session_lifecycle.fizz models M1 as the parent region with its children (M2/M3/M4) as one statechart and asserts ChildrenDieWithParent (conn torn => all children terminal, none start after teardown). respcoord and compactcoord gain an absorbing Terminated state + Shutdown event; conncoord's teardown drives the children terminal. This closes a compaction teardown gap: a fire-and-forget compaction could outlive a torn session; compactionSink now takes a session-scoped cancellable context + WaitGroup and joins the in-flight summarize+evict on shutdown.
Formal verification. formal-verification/ holds one authoritative FizzBee spec per machine plus the composition spec, each with an always-assertion and a documented one-line edit that makes the checker fail (verified non-vacuous). scripts/realtime-conformance.sh is fail-closed: all Go conformance suites under -race AND a model-check of every .fizz spec; a missing FizzBee is a hard error (only the loud REALTIME_CONFORMANCE_SKIP_FIZZBEE=1 bypasses it, never in CI). FizzBee is pinned by sha256 and installed via scripts/install-fizzbee.sh into .tools/ (gitignored). Wired as make test-realtime-conformance, a CI workflow, and a pre-commit path filter. Go conformance tests are Ginkgo/Gomega (per the repo's forbidigo lint): transition tables + fixed-seed property walks + concurrent/-race specs, no rapid dependency. Design map: docs/design/realtime-state-machines.md.
Parakeet streaming backend. The same treatment applied to the parakeet-cpp streaming paths:
- AudioTranscriptionStream returns codes.Unimplemented for non-streaming models instead of decoding offline and emitting it as one delta + final. A client that asked for streaming learns the model cannot stream rather than receiving a batch result shaped like a stream. New grpcerrors.StreamTranscriptionUnsupported carries that signal; the HTTP /v1/audio/transcriptions stream path surfaces it as an SSE error event. Mirrors AudioTranscriptionLive, which already did this.
- utteranceBoundary (boundary.go): a single definition of the end-of-utterance latch, replacing three open-coded finalEou toggles. Modelled as a two-valued type so illegal states are unrepresentable.
- Shared decode driver (driver.go): streamFeedResult (one per-feed event) + feedChunk (hides the ABI v4 JSON vs text-only split) + feedSlices + flushTail. The feed loop is written once.
- AudioTranscriptionLive becomes a bidi adapter: it streams the per-feed {delta,eou,eob,words} the realtime turn detector consumes and a terminal FinalResult carrying only Text. Segments/duration/eou are offline-only and no longer produced (nor read) on the live path; liveTraceState drops the terminal eou and keeps the per-feed eou_events count.
- AudioTranscriptionStream + streamJSON merge into one driver-based function; streamSegmenter is generalized to the unified event with a text-only fallback that preserves the legacy (no-words) library's per-utterance segmentation.
Verified: build/vet/gofumpt clean, golangci-lint 0 issues, all coordinator and parakeet packages under -race, the fail-closed conformance gate green, and make test-realtime (12 e2e WS+WebRTC).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com> |
||
|
|
a7cb587d96 |
feat(parakeet-cpp): real segment timestamps (NeMo-faithful) (#10207)
* feat(parakeet-cpp): real segment timestamps (NeMo-faithful)
Offline: replace the single synthetic whole-clip segment with multiple
segments grouped exactly like NeMo's get_segment_offsets - a new segment
after sentence-ending punctuation ('. ? !'), each carrying start/end and
its time-window token ids. The optional model option segment_gap_threshold
(NeMo's unit: encoder FRAMES, default 0=off) adds NeMo's silence-gap split,
converted to seconds via the JSON frame_sec the engine now reports.
Per-segment words are still gated behind timestamp_granularities=["word"];
a zero-word document falls back to a single text segment.
Streaming: when libparakeet.so exposes the ABI v4 JSON entry points
(probed), drive parakeet_capi_stream_feed_json / _finalize_json and
accumulate the streamed per-word timestamps into per-utterance segments
(EOU stays the boundary), so streaming FinalResult segments now carry
start/end. Falls back to the text-only feed against an older library.
Pure-Go specs cover splitWordsIntoSegments (punctuation + gap rules, NeMo
elif order, fallback), transcriptResultFromDoc (multi-segment, token
windows, word-granularity gate), and the streaming segmenter.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(audio): document parakeet-cpp segment timestamps + segment_gap_threshold
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(parakeet-cpp): update model-gated specs for multi-segment output
The offline AudioTranscription specs asserted the old single synthetic
segment (Segments HaveLen(1), Segments[0].Text == res.Text). With
NeMo-faithful segmentation a multi-sentence clip now yields multiple
punctuation-delimited segments, so assert the new contract instead:
one-or-more time-ordered segments, each with text and (under word
granularity) per-segment words whose span tracks the segment start/end.
Caught by running the model-gated suite on the dgx (GB10) against the
real tdt_ctc-110m + realtime_eou models.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
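The segmentation rule in the commit above (a new segment after sentence-ending punctuation, plus an optional silence-gap split given in encoder frames and converted to seconds via `frame_sec`) can be sketched in Go roughly as follows. This is a simplified illustration: the types `Word` and `Segment`, the function `splitWords`, and the `>=` comparison are assumptions, and it does not claim to reproduce NeMo's exact branch order or the zero-word fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// Word and Segment are hypothetical types for this sketch; times are seconds.
type Word struct {
	Text       string
	Start, End float64
}

type Segment struct {
	Text       string
	Start, End float64
}

// splitWords groups words into segments. A segment closes after a word that
// ends in '.', '?' or '!'. When gapFrames > 0, a segment also closes when the
// silence before the next word is at least gapFrames*frameSec seconds.
func splitWords(words []Word, gapFrames int, frameSec float64) []Segment {
	var segs []Segment
	var cur []Word
	flush := func() {
		if len(cur) == 0 {
			return
		}
		texts := make([]string, len(cur))
		for i, w := range cur {
			texts[i] = w.Text
		}
		segs = append(segs, Segment{
			Text:  strings.Join(texts, " "),
			Start: cur[0].Start,
			End:   cur[len(cur)-1].End,
		})
		cur = nil
	}
	gapSec := float64(gapFrames) * frameSec
	for i, w := range words {
		// Gap rule: only checked when the current segment already has a word,
		// so words[i-1] is always valid here.
		if gapFrames > 0 && len(cur) > 0 && w.Start-words[i-1].End >= gapSec {
			flush()
		}
		cur = append(cur, w)
		// Punctuation rule: the punctuated word ends its own segment.
		if strings.HasSuffix(w.Text, ".") || strings.HasSuffix(w.Text, "?") || strings.HasSuffix(w.Text, "!") {
			flush()
		}
	}
	flush() // trailing words without closing punctuation
	return segs
}

func main() {
	words := []Word{
		{"Hello", 0.0, 0.4}, {"world.", 0.5, 0.9},
		{"How", 1.0, 1.2}, {"are", 3.0, 3.2}, {"you?", 3.3, 3.6},
	}
	for _, s := range splitWords(words, 10, 0.08) {
		fmt.Printf("%.1f-%.1f %q\n", s.Start, s.End, s.Text)
	}
}
```

With the gap threshold left at 0 the same input yields two punctuation-delimited segments; with a threshold of 10 frames at 0.08 s per frame the 1.8 s pause after "How" adds a third.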
|
|
9d10418593 |
fix(parakeet-cpp): convert audio before the non-batched transcribe path (#10161)
The direct (non-batched) transcription path handed the original upload path straight to the C library via parakeet_capi_transcribe_path_json. That loader only understands 16 kHz mono WAV/PCM, so any other format (MP3, etc.) failed with "parakeet: failed to load audio: <file>". Only the batched path converted the input (via decodeWavMono16k -> utils.AudioToWav).
Every other audio backend (whisper, crispasr) converts unconditionally with utils.AudioToWav before handing the file to its engine; the parakeet-cpp fallback was the lone exception.
Extract a convertToWavMono16k helper (reused by decodeWavMono16k) that produces a 16 kHz mono WAV in a temp dir, and run the non-batched path through it before calling the C loader. WAV inputs already in the target format are passed through without ffmpeg.
Add specs covering the helper (decodable copy + cleanup, and an error on a missing input) that need neither the model, the C library, nor ffmpeg.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
860f9d63ad |
feat(parakeet-cpp): dynamic batching for concurrent transcription requests (#10112)
* feat(parakeet-cpp): dynamic-batching scheduler (queue + dispatcher)
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): dynamic batching for AudioTranscription via batched JSON C-API
Drop SingleThread; route unary transcription through the in-process batcher which coalesces concurrent requests into one batched engine call. Streaming stays mutually exclusive via engineMu. Adds batch_max_size / batch_max_wait_ms options (size=1 disables; recommended on CPU).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): tear down dispatcher in Free; log batch config; preallocate; clarify stream lock
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): Ginkgo batcher tests; optional batch C-API binding with per-request fallback
The batched JSON C-API symbol exists only in newer libparakeet.so (ABI >= 2); probe it with Dlsym and register optionally so the backend still loads against an older library, falling back to per-request transcription. Rewrites the batcher unit tests as Ginkgo/Gomega specs (forbidigo bans t.Fatal in tests).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): debug-log coalesced batch size in runBatch
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): default batch_max_size to 1 (batching opt-in)
Dynamic batching now defaults off (batch_max_size:1, one request at a time). Raise batch_max_size to opt in: it is a large throughput win on GPU under concurrent load, but on CPU and low-concurrency setups it only adds latency, so off is the safer default. The startup log now states whether batching is on or off, and the audio-to-text docs are updated to match.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* chore(parakeet-cpp): bump parakeet.cpp to 8a7c482 (batched decode + B=1 fast-path)
parakeet.cpp PR #1 merged the batched encoder/decode and the B=1 encoder fast-path to master. Point PARAKEET_VERSION at that commit so the backend builds the batched C-API (parakeet_capi_transcribe_pcm_batch_json) that the dynamic batcher calls; the prior pin (30a3075) predated it, so only the per-request fallback path was exercised. Verified the shared lib builds with the backend's CMake flags and exports the batch symbol.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
4912c9b73a |
feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084)
* feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text)
Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends). L0 scope:
- main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server.
- goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy.
- AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2.
- Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper.
- Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures.
Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern. Word/segment timestamps (L1) and cache-aware streaming (L2) follow.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json
AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult:
- Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader.
- Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs.
- Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens.
- Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default). secondsToNanos mirrors the whisper backend's nanosecond convention.
Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend. Cache-aware streaming (L2) follows.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation
Wire AudioTranscriptionStream to the streaming RNN-T C-API:
- Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes *eou_out on <EOU>/<EOB>.
- Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta.
- <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired).
- stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream. Honours context cancellation between chunks. Frees every malloc'd delta and the session.
Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity)
Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1.
- .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and -*-parakeet-cpp tag-suffixes.
- scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch.
- .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master).
- backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix.
- Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp-transcription target (mirrors test-extra-backend-whisper-transcription).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs
Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown).
- Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp". It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here).
- Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer.
- Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description.
Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(parakeet-cpp): document the parakeet-cpp transcription backend
Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra
The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job:
- Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry).
- Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* style(parakeet-cpp): drop em dashes from comments and docs
Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add parakeet-cpp f16 models to the model gallery
Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION
- goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged).
- Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit
The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): init the ggml submodule in the backend image clone
The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend. Verified locally: clone + submodule + cmake configure now succeeds.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): statically link ggml into libparakeet.so
The shared libparakeet.so linked ggml's shared libs (libggml*.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image. Verified locally: ldd shows no libggml dependency.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream
The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model. Verified locally: the streaming spec passes against the non-streaming 110m model via fallback.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
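The L2 streaming loop described in the commit above (feed 16 kHz mono PCM in 1 s chunks, emit deltas, close a segment on EOU, keep a whole-clip segment when no EOU fires) can be sketched in Go as follows. The engine is replaced by a scripted stub; `feedResult`, `streamSegments`, and `scriptedFeed` are hypothetical names, and the real C-API binding, context cancellation, and buffer freeing are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleRate is the 16 kHz rate named in the commit; one chunk is one second.
const sampleRate = 16000

// feedResult is a hypothetical per-chunk result: newly finalized text and
// whether an end-of-utterance event fired on this feed.
type feedResult struct {
	Delta string
	EOU   bool
}

// streamSegments feeds pcm to feed in 1 s chunks, accumulates the deltas into
// the full transcript, and closes a segment on every EOU. Text left over at
// the end becomes a final segment, which also covers the no-EOU case.
func streamSegments(pcm []float32, feed func(chunk []float32) feedResult) (string, []string) {
	var full, cur strings.Builder
	var segments []string
	closeSegment := func() {
		if s := strings.TrimSpace(cur.String()); s != "" {
			segments = append(segments, s)
		}
		cur.Reset()
	}
	for off := 0; off < len(pcm); off += sampleRate {
		end := off + sampleRate
		if end > len(pcm) {
			end = len(pcm)
		}
		r := feed(pcm[off:end])
		full.WriteString(r.Delta)
		cur.WriteString(r.Delta)
		if r.EOU {
			closeSegment()
		}
	}
	closeSegment()
	return strings.TrimSpace(full.String()), segments
}

// scriptedFeed returns a stub engine that replays a fixed list of results.
func scriptedFeed(script []feedResult) func([]float32) feedResult {
	i := 0
	return func([]float32) feedResult {
		if i >= len(script) {
			return feedResult{}
		}
		r := script[i]
		i++
		return r
	}
}

func main() {
	pcm := make([]float32, 2*sampleRate+sampleRate/2) // 2.5 s of silence
	feed := scriptedFeed([]feedResult{
		{Delta: "hello "},
		{Delta: "world ", EOU: true},
		{Delta: "again"},
	})
	full, segments := streamSegments(pcm, feed)
	fmt.Printf("%q %q\n", full, segments)
}
```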