mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-01 11:56:57 -04:00
* feat(realtime): EOU-driven semantic_vad turn detection Add a `semantic_vad` turn-detection mode to the realtime API that feeds the transcription model live and decides "the user finished speaking" from the `<EOU>` end-of-utterance token rather than from silence alone. When EOU fires the turn commits immediately (~0.3s); otherwise it falls back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s). Plumbing, bottom to top: - proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof, mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus `TranscriptResult.eou` for the unary retranscribe gate. - pkg/grpc: client/server/base/embed scaffolding for the bidi stream, modeled on AudioTransformStream; release stream conns on terminal Recv. - parakeet-cpp: live transcription RPC with per-C-call engine locking (one live stream per turn, finalize+free at commit); bump parakeet.cpp to ABI v5 — incremental StreamingMel (no more quadratic per-feed mel recompute that delayed EOU on long turns) and the <EOU>/<EOB> split; strip the literal <EOU>/<EOB> from offline text and set Eou. - core/backend: LiveTranscriptionSession wrapper + pipeline `turn_detection:` config block (type/eagerness/retranscribe). - realtime: semantic_vad integration — live input captions streamed as transcription deltas while the user speaks, EOU-immediate commit with eagerness fallback, optional retranscribe gate (batch re-decode must also end in <EOU> to confirm), clause synthesis off the LLM token callback, and per-turn live-transcription / model_load telemetry. - UI: show the realtime pipeline components as a vertical list. Docs and tests included; opt-in via the pipeline YAML or per-session `session.update`. Non-streaming STT backends degrade to silence-only. Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash] Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(realtime): explicit formally-verified state machines + parakeet streaming driver The realtime API had several implicit state machines whose state was inferred from scattered booleans, channels, and five separate mutexes, leaving illegal/inconsistent states reachable. Make them explicit and keep the implementation in step with a formal design; rework the parakeet streaming backend along the same lines. Realtime state machines (M1-M5). Each is a sealed sum-type State/Event/Effect with a total, pure Next(state,event)->(state,[]effect) behind a single-writer Coordinator: M1 conncoord connection lifecycle: VAD toggle + once-only teardown (replaces vadServerStarted + a `done` channel closed from two sites). M2 turncoord turn detection: collapses speechStarted and the live-stream "turn open" flag into one state, so discardTurn can no longer desync them and suppress the next onset. M3 respcoord response coordination: serializes the dual-writer start/cancel so at most one response is live; one response.done per response.create. M4 compactcoord conversation compaction: single-flight (replaces the `compacting atomic.Bool` CAS). M5 ttscoord TTS pipeline: open->closing->closed, idempotent wait(), rejects enqueue-after-close (was a silent drop). The Coordinator/Sink/Next plumbing — only the sealed types and Next differed per machine — is extracted once into core/http/endpoints/openai/coordinator as a generic Coordinator[S,E,F]; each machine keeps its public API via type aliases, so no sink, call-site, or test moved. Hierarchy. session_lifecycle.fizz models M1 as the parent region with its children (M2/M3/M4) as one statechart and asserts ChildrenDieWithParent (conn torn => all children terminal, none start after teardown). respcoord and compactcoord gain an absorbing Terminated state + Shutdown event; conncoord's teardown drives the children terminal. This closes a compaction teardown gap: a fire-and-forget compaction could outlive a torn session — compactionSink now takes a session-scoped cancellable context + WaitGroup and joins the in-flight summarize+evict on shutdown. Formal verification. formal-verification/ holds one authoritative FizzBee spec per machine plus the composition spec, each with an always-assertion and a documented one-line edit that makes the checker fail (verified non-vacuous). scripts/realtime-conformance.sh is fail-closed: all Go conformance suites under -race AND a model-check of every .fizz spec; a missing FizzBee is a hard error (only the loud REALTIME_CONFORMANCE_SKIP_FIZZBEE=1 bypasses it, never in CI). FizzBee is pinned by sha256 and installed via scripts/install-fizzbee.sh into .tools/ (gitignored). Wired as make test-realtime-conformance, a CI workflow, and a pre-commit path filter. Go conformance tests are Ginkgo/Gomega (per the repo's forbidigo lint): transition tables + fixed-seed property walks + concurrent/-race specs, no rapid dependency. Design map: docs/design/realtime-state-machines.md. Parakeet streaming backend. The same treatment applied to the parakeet-cpp streaming paths: - AudioTranscriptionStream returns codes.Unimplemented for non-streaming models instead of decoding offline and emitting it as one delta + final. A client that asked for streaming learns the model cannot stream rather than receiving a batch result shaped like a stream. New grpcerrors.StreamTranscriptionUnsupported carries that signal; the HTTP /v1/audio/transcriptions stream path surfaces it as an SSE error event. Mirrors AudioTranscriptionLive, which already did this. - utteranceBoundary (boundary.go): a single definition of the end-of-utterance latch, replacing three open-coded finalEou toggles. Modelled as a two-valued type so illegal states are unrepresentable. - Shared decode driver (driver.go): streamFeedResult (one per-feed event) + feedChunk (hides the ABI v4 JSON vs text-only split) + feedSlices + flushTail. The feed loop is written once. - AudioTranscriptionLive becomes a bidi adapter: it streams the per-feed {delta,eou,eob,words} the realtime turn detector consumes and a terminal FinalResult carrying only Text. Segments/duration/eou are offline-only and no longer produced (nor read) on the live path; liveTraceState drops the terminal eou and keeps the per-feed eou_events count. - AudioTranscriptionStream + streamJSON merge into one driver-based function; streamSegmenter is generalized to the unified event with a text-only fallback that preserves the legacy (no-words) library's per-utterance segmentation. Verified: build/vet/gofumpt clean, golangci-lint 0 issues, all coordinator and parakeet packages under -race, the fail-closed conformance gate green, and make test-realtime (12 e2e WS+WebRTC). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>
418 lines
13 KiB
Go
418 lines
13 KiB
Go
package main
|
|
|
|
import (
|
|
"sync"
|
|
"time"
|
|
"unsafe"
|
|
|
|
"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
|
|
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
|
|
. "github.com/onsi/ginkgo/v2"
|
|
. "github.com/onsi/gomega"
|
|
"google.golang.org/grpc/codes"
|
|
"google.golang.org/grpc/status"
|
|
)
|
|
|
|
// The live-RPC specs drive AudioTranscriptionLive entirely against stubbed
|
|
// Cpp* package vars (the same seam batcher_test.go uses), so they run
|
|
// without libparakeet.so.
|
|
|
|
// liveCstrPool hands out NUL-terminated C-style strings backed by Go memory
|
|
// and keeps them alive for the duration of a spec (goStringFromCPtr reads
|
|
// through the raw pointer; Go's GC must not collect the backing array while
|
|
// a stub's return value is in flight).
|
|
type liveCstrPool struct {
|
|
mu sync.Mutex
|
|
bufs [][]byte
|
|
}
|
|
|
|
func (p *liveCstrPool) cstr(s string) uintptr {
|
|
p.mu.Lock()
|
|
defer p.mu.Unlock()
|
|
b := append([]byte(s), 0)
|
|
p.bufs = append(p.bufs, b)
|
|
return uintptr(unsafe.Pointer(&b[0]))
|
|
}
|
|
|
|
// liveStubs swaps every C entry point the live path touches and returns a
|
|
// restore func for AfterEach.
|
|
func liveStubs() (restore func()) {
|
|
savedBegin, savedBeginLang := CppStreamBegin, CppStreamBeginLang
|
|
savedFeed, savedFeedJSON := CppStreamFeed, CppStreamFeedJSON
|
|
savedFinalize, savedFinalizeJSON := CppStreamFinalize, CppStreamFinalizeJSON
|
|
savedFree, savedLastError := CppStreamFree, CppLastError
|
|
savedFreeString := CppFreeString
|
|
return func() {
|
|
CppStreamBegin, CppStreamBeginLang = savedBegin, savedBeginLang
|
|
CppStreamFeed, CppStreamFeedJSON = savedFeed, savedFeedJSON
|
|
CppStreamFinalize, CppStreamFinalizeJSON = savedFinalize, savedFinalizeJSON
|
|
CppStreamFree, CppLastError = savedFree, savedLastError
|
|
CppFreeString = savedFreeString
|
|
}
|
|
}
|
|
|
|
// runLive starts the RPC on its own goroutine and returns the request
|
|
// channel plus a collector for everything the backend emitted.
|
|
func runLive(p *ParakeetCpp) (chan *pb.TranscriptLiveRequest, chan *pb.TranscriptLiveResponse, chan error) {
|
|
in := make(chan *pb.TranscriptLiveRequest)
|
|
out := make(chan *pb.TranscriptLiveResponse, 32)
|
|
errCh := make(chan error, 1)
|
|
go func() { errCh <- p.AudioTranscriptionLive(in, out) }()
|
|
return in, out, errCh
|
|
}
|
|
|
|
func liveConfig(lang string) *pb.TranscriptLiveRequest {
|
|
return &pb.TranscriptLiveRequest{
|
|
Payload: &pb.TranscriptLiveRequest_Config{Config: &pb.TranscriptLiveConfig{Language: lang}},
|
|
}
|
|
}
|
|
|
|
func liveAudio(pcm []float32) *pb.TranscriptLiveRequest {
|
|
return &pb.TranscriptLiveRequest{
|
|
Payload: &pb.TranscriptLiveRequest_Audio{Audio: &pb.TranscriptLiveAudio{Pcm: pcm}},
|
|
}
|
|
}
|
|
|
|
func collectLive(out chan *pb.TranscriptLiveResponse) []*pb.TranscriptLiveResponse {
|
|
var got []*pb.TranscriptLiveResponse
|
|
for r := range out {
|
|
got = append(got, r)
|
|
}
|
|
return got
|
|
}
|
|
|
|
var _ = Describe("AudioTranscriptionLive (stubbed C API)", func() {
|
|
var (
|
|
pool *liveCstrPool
|
|
restore func()
|
|
p *ParakeetCpp
|
|
)
|
|
|
|
BeforeEach(func() {
|
|
pool = &liveCstrPool{}
|
|
restore = liveStubs()
|
|
p = &ParakeetCpp{ctxPtr: 1}
|
|
|
|
CppStreamBeginLang = nil
|
|
CppStreamBegin = func(ctx uintptr) uintptr { return 7 }
|
|
CppStreamFree = func(s uintptr) {}
|
|
CppFreeString = func(s uintptr) {}
|
|
CppLastError = func(ctx uintptr) string { return "stub error" }
|
|
CppStreamFeed = nil
|
|
CppStreamFeedJSON = nil
|
|
CppStreamFinalize = nil
|
|
CppStreamFinalizeJSON = nil
|
|
})
|
|
|
|
AfterEach(func() { restore() })
|
|
|
|
It("rejects a stream whose first message is not a config", func() {
|
|
in, out, errCh := runLive(p)
|
|
in <- liveAudio([]float32{0.1})
|
|
close(in)
|
|
|
|
err := <-errCh
|
|
Expect(status.Code(err)).To(Equal(codes.InvalidArgument))
|
|
Expect(collectLive(out)).To(BeEmpty())
|
|
})
|
|
|
|
It("rejects a non-16k sample rate", func() {
|
|
in, _, errCh := runLive(p)
|
|
in <- &pb.TranscriptLiveRequest{
|
|
Payload: &pb.TranscriptLiveRequest_Config{Config: &pb.TranscriptLiveConfig{SampleRate: 8000}},
|
|
}
|
|
close(in)
|
|
Expect(status.Code(<-errCh)).To(Equal(codes.InvalidArgument))
|
|
})
|
|
|
|
It("returns the typed Unimplemented signal for non-streaming models, before any ack", func() {
|
|
CppStreamBegin = func(ctx uintptr) uintptr { return 0 }
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
close(in)
|
|
|
|
err := <-errCh
|
|
Expect(grpcerrors.IsLiveTranscriptionUnsupported(err)).To(BeTrue())
|
|
Expect(collectLive(out)).To(BeEmpty())
|
|
})
|
|
|
|
It("streams deltas, eou flags and words on the JSON path and finalizes on close", func() {
|
|
var freed []uintptr
|
|
CppStreamFree = func(s uintptr) { freed = append(freed, s) }
|
|
feeds := 0
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr {
|
|
feeds++
|
|
switch feeds {
|
|
case 1:
|
|
return pool.cstr(`{"text":"hello ","eou":0,"frame_sec":0.08,` +
|
|
`"words":[{"w":"hello","start":0.1,"end":0.4,"conf":0.9}]}`)
|
|
default:
|
|
return pool.cstr(`{"text":"world","eou":1,"frame_sec":0.08,` +
|
|
`"words":[{"w":"world","start":0.5,"end":0.8,"conf":0.9}]}`)
|
|
}
|
|
}
|
|
CppStreamFinalizeJSON = func(s uintptr) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("en")
|
|
in <- liveAudio(make([]float32, 100))
|
|
in <- liveAudio(make([]float32, 200))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
Expect(got).To(HaveLen(4)) // ready, two deltas, final
|
|
|
|
Expect(got[0].Ready).To(BeTrue())
|
|
|
|
Expect(got[1].Delta).To(Equal("hello "))
|
|
Expect(got[1].Eou).To(BeFalse())
|
|
Expect(got[1].Words).To(HaveLen(1))
|
|
Expect(got[1].Words[0].Text).To(Equal("hello"))
|
|
|
|
Expect(got[2].Delta).To(Equal("world"))
|
|
Expect(got[2].Eou).To(BeTrue())
|
|
|
|
final := got[3].FinalResult
|
|
Expect(final).NotTo(BeNil())
|
|
Expect(final.Text).To(Equal("hello world"))
|
|
// The live FinalResult carries only Text. Per-utterance segments,
|
|
// duration and the terminal eou flag are an offline-path concern (see
|
|
// boundary.go / AudioTranscriptionStream); the realtime core reads the
|
|
// streamed per-feed tokens above plus this Text.
|
|
Expect(final.Eou).To(BeFalse())
|
|
Expect(final.Segments).To(BeEmpty())
|
|
Expect(final.Duration).To(BeZero())
|
|
|
|
Expect(freed).To(Equal([]uintptr{7}))
|
|
})
|
|
|
|
It("falls back to the text feed (eou out-param) when the JSON entry points are absent", func() {
|
|
feeds := 0
|
|
CppStreamFeed = func(s uintptr, pcm []float32, n int32, eouOut unsafe.Pointer) uintptr {
|
|
feeds++
|
|
if feeds == 2 {
|
|
*(*int32)(eouOut) = 1
|
|
return pool.cstr("done")
|
|
}
|
|
return pool.cstr("first ")
|
|
}
|
|
CppStreamFinalize = func(s uintptr) uintptr { return pool.cstr("") }
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
in <- liveAudio(make([]float32, 10))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
Expect(got).To(HaveLen(4))
|
|
Expect(got[1].Delta).To(Equal("first "))
|
|
Expect(got[1].Eou).To(BeFalse())
|
|
Expect(got[2].Delta).To(Equal("done"))
|
|
Expect(got[2].Eou).To(BeTrue())
|
|
Expect(got[3].FinalResult.Text).To(Equal("first done"))
|
|
})
|
|
|
|
It("forwards <EOB> as eob — a backchannel, never an eou (ABI v5 JSON)", func() {
|
|
feeds := 0
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr {
|
|
feeds++
|
|
if feeds == 1 {
|
|
return pool.cstr(`{"text":"uh-huh","eou":0,"eob":1,"frame_sec":0.08,` +
|
|
`"words":[{"w":"uh-huh","start":0.1,"end":0.3,"conf":0.9}]}`)
|
|
}
|
|
return pool.cstr(`{"text":"the turn","eou":1,"eob":0,"frame_sec":0.08,` +
|
|
`"words":[{"w":"the","start":0.5,"end":0.6,"conf":0.9},{"w":"turn","start":0.6,"end":0.8,"conf":0.9}]}`)
|
|
}
|
|
CppStreamFinalizeJSON = func(s uintptr) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"eob":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
in <- liveAudio(make([]float32, 10))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
Expect(got).To(HaveLen(4))
|
|
Expect(got[1].Eob).To(BeTrue())
|
|
Expect(got[1].Eou).To(BeFalse(), "a backchannel must not masquerade as a turn boundary")
|
|
Expect(got[2].Eou).To(BeTrue())
|
|
})
|
|
|
|
It("maps the v5 eou_out bitmask on the text path (bit0 <EOU>, bit1 <EOB>)", func() {
|
|
feeds := 0
|
|
CppStreamFeed = func(s uintptr, pcm []float32, n int32, eouOut unsafe.Pointer) uintptr {
|
|
feeds++
|
|
if feeds == 1 {
|
|
*(*int32)(eouOut) = 2 // <EOB> only
|
|
return pool.cstr("uh-huh")
|
|
}
|
|
*(*int32)(eouOut) = 1 // <EOU>
|
|
return pool.cstr(" done")
|
|
}
|
|
CppStreamFinalize = func(s uintptr) uintptr { return pool.cstr("") }
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
in <- liveAudio(make([]float32, 10))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
Expect(got).To(HaveLen(4))
|
|
Expect(got[1].Eob).To(BeTrue())
|
|
Expect(got[1].Eou).To(BeFalse())
|
|
Expect(got[2].Eou).To(BeTrue())
|
|
Expect(got[2].Eob).To(BeFalse())
|
|
})
|
|
|
|
It("accumulates trailing text after an EOU into the final transcript", func() {
|
|
feeds := 0
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr {
|
|
feeds++
|
|
if feeds == 1 {
|
|
return pool.cstr(`{"text":"turn one","eou":1,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
return pool.cstr(`{"text":" and more","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
CppStreamFinalizeJSON = func(s uintptr) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
in <- liveAudio(make([]float32, 10))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
final := got[len(got)-1].FinalResult
|
|
Expect(final.Text).To(Equal("turn one and more"))
|
|
})
|
|
|
|
It("resets the decode session on a mid-stream config", func() {
|
|
var begun, freed int
|
|
CppStreamBegin = func(ctx uintptr) uintptr { begun++; return uintptr(10 + begun) }
|
|
CppStreamFree = func(s uintptr) { freed++ }
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr {
|
|
return pool.cstr(`{"text":"x","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
CppStreamFinalizeJSON = func(s uintptr) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
in <- liveConfig("") // reset
|
|
in <- liveAudio(make([]float32, 10))
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
|
|
got := collectLive(out)
|
|
final := got[len(got)-1].FinalResult
|
|
Expect(final.Text).To(Equal("x"), "pre-reset transcript dropped")
|
|
Expect(begun).To(Equal(2))
|
|
Expect(freed).To(Equal(2), "old session freed on reset, new one on unwind")
|
|
})
|
|
|
|
It("does not hold engineMu between feeds (unary work interleaves with a live session)", func() {
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
CppStreamFinalizeJSON = func(s uintptr) uintptr {
|
|
return pool.cstr(`{"text":"","eou":0,"frame_sec":0.08,"words":[]}`)
|
|
}
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
|
|
// The session is open and idle between feeds: the engine lock must be
|
|
// acquirable, which is what lets batched unary transcription proceed
|
|
// mid-session. Under stream-lifetime locking this probe would block
|
|
// until the stream ended and the Eventually would time out.
|
|
locked := make(chan struct{})
|
|
go func() {
|
|
p.engineMu.Lock()
|
|
p.engineMu.Unlock() //nolint:staticcheck // probe: acquire-release proves availability
|
|
close(locked)
|
|
}()
|
|
Eventually(locked, time.Second).Should(BeClosed())
|
|
|
|
close(in)
|
|
Expect(<-errCh).NotTo(HaveOccurred())
|
|
collectLive(out)
|
|
})
|
|
|
|
It("errors out and reads last_error under the lock when a feed fails", func() {
|
|
CppStreamFeedJSON = func(s uintptr, pcm []float32, n int32) uintptr { return 0 }
|
|
|
|
in, out, errCh := runLive(p)
|
|
in <- liveConfig("")
|
|
in <- liveAudio(make([]float32, 10))
|
|
|
|
err := <-errCh
|
|
Expect(err).To(MatchError(ContainSubstring("stub error")))
|
|
got := collectLive(out)
|
|
Expect(got).To(HaveLen(1)) // just the ready ack
|
|
close(in)
|
|
})
|
|
})
|
|
|
|
var _ = Describe("stripEouMarker", func() {
|
|
It("strips a trailing <EOU> and reports it", func() {
|
|
text, eou := stripEouMarker("it is certainly very like the old portrait<EOU>")
|
|
Expect(text).To(Equal("it is certainly very like the old portrait"))
|
|
Expect(eou).To(BeTrue())
|
|
})
|
|
|
|
It("strips a trailing <EOB> WITHOUT reporting an utterance end", func() {
|
|
// A decode ending on a backchannel must not confirm the
|
|
// retranscribe gate — the user was acknowledging, not yielding.
|
|
text, eou := stripEouMarker("uh-huh<EOB>")
|
|
Expect(text).To(Equal("uh-huh"))
|
|
Expect(eou).To(BeFalse())
|
|
})
|
|
|
|
It("leaves marker-free text alone", func() {
|
|
text, eou := stripEouMarker("plain transcript")
|
|
Expect(text).To(Equal("plain transcript"))
|
|
Expect(eou).To(BeFalse())
|
|
})
|
|
|
|
It("does not strip a marker in the middle of the text", func() {
|
|
text, eou := stripEouMarker("a<EOU>b")
|
|
Expect(text).To(Equal("a<EOU>b"))
|
|
Expect(eou).To(BeFalse())
|
|
})
|
|
})
|
|
|
|
var _ = Describe("transcriptResultFromDoc EOU handling", func() {
|
|
It("strips the offline marker from text and sets the result flag", func() {
|
|
doc := transcriptJSON{Text: "the old portrait<EOU>"}
|
|
res := transcriptResultFromDoc(doc, &pb.TranscriptRequest{}, 0)
|
|
Expect(res.Text).To(Equal("the old portrait"))
|
|
Expect(res.Eou).To(BeTrue())
|
|
Expect(res.Segments).To(HaveLen(1))
|
|
Expect(res.Segments[0].Text).To(Equal("the old portrait"))
|
|
})
|
|
|
|
It("reports eou=false for marker-free decodes", func() {
|
|
doc := transcriptJSON{Text: "no marker here"}
|
|
res := transcriptResultFromDoc(doc, &pb.TranscriptRequest{}, 0)
|
|
Expect(res.Text).To(Equal("no marker here"))
|
|
Expect(res.Eou).To(BeFalse())
|
|
})
|
|
})
|