Files
LocalAI/formal-verification/README.md
Richard Palethorpe 5d0c43ec6e feat(realtime): Semantic VAD EOU token (#10444)
* feat(realtime): EOU-driven semantic_vad turn detection

Add a `semantic_vad` turn-detection mode to the realtime API that feeds
the transcription model live and decides "the user finished speaking"
from the `<EOU>` end-of-utterance token rather than from silence alone.
When EOU fires the turn commits immediately (~0.3s); otherwise it falls
back to an eagerness-scaled silence threshold (low/med/high = 8/4/2s).

Plumbing, bottom to top:

- proto: `AudioTranscriptionLive` bidirectional RPC (config-first oneof,
  mono float PCM @16k, ready-ack / Unimplemented degrade signal) plus
  `TranscriptResult.eou` for the unary retranscribe gate.
- pkg/grpc: client/server/base/embed scaffolding for the bidi stream,
  modeled on AudioTransformStream; release stream conns on terminal Recv.
- parakeet-cpp: live transcription RPC with per-C-call engine locking
  (one live stream per turn, finalize+free at commit); bump parakeet.cpp
  to ABI v5 — incremental StreamingMel (no more quadratic per-feed mel
  recompute that delayed EOU on long turns) and the <EOU>/<EOB> split;
  strip the literal <EOU>/<EOB> from offline text and set Eou.
- core/backend: LiveTranscriptionSession wrapper + pipeline
  `turn_detection:` config block (type/eagerness/retranscribe).
- realtime: semantic_vad integration — live input captions streamed as
  transcription deltas while the user speaks, EOU-immediate commit with
  eagerness fallback, optional retranscribe gate (batch re-decode must
  also end in <EOU> to confirm), clause synthesis off the LLM token
  callback, and per-turn live-transcription / model_load telemetry.
- UI: show the realtime pipeline components as a vertical list.

Docs and tests included; opt-in via the pipeline YAML or per-session
`session.update`. Non-streaming STT backends degrade to silence-only.

Assisted-by: Claude Code:claude-opus-4-8 [Read] [Edit] [Write] [Bash]
Assisted-by: Claude Code:claude-fable-5 [Read] [Edit] [Bash]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(realtime): explicit formally-verified state machines + parakeet streaming driver

The realtime API had several implicit state machines whose state was inferred
from scattered booleans, channels, and five separate mutexes, leaving
illegal/inconsistent states reachable. Make them explicit and keep the
implementation in step with a formal design; rework the parakeet streaming
backend along the same lines.

Realtime state machines (M1-M5). Each is a sealed sum-type State/Event/Effect
with a total, pure Next(state,event)->(state,[]effect) behind a single-writer
Coordinator:

  M1 conncoord    connection lifecycle: VAD toggle + once-only teardown
                  (replaces vadServerStarted + a `done` channel closed from
                  two sites).
  M2 turncoord    turn detection: collapses speechStarted and the live-stream
                  "turn open" flag into one state, so discardTurn can no longer
                  desync them and suppress the next onset.
  M3 respcoord    response coordination: serializes the dual-writer
                  start/cancel so at most one response is live; one
                  response.done per response.create.
  M4 compactcoord conversation compaction: single-flight (replaces the
                  `compacting atomic.Bool` CAS).
  M5 ttscoord     TTS pipeline: open->closing->closed, idempotent wait(),
                  rejects enqueue-after-close (was a silent drop).

The Coordinator/Sink/Next plumbing — only the sealed types and Next differed
per machine — is extracted once into core/http/endpoints/openai/coordinator as
a generic Coordinator[S,E,F]; each machine keeps its public API via type
aliases, so no sink, call-site, or test moved.

Hierarchy. session_lifecycle.fizz models M1 as the parent region with its
children (M2/M3/M4) as one statechart and asserts ChildrenDieWithParent (conn
torn => all children terminal, none start after teardown). respcoord and
compactcoord gain an absorbing Terminated state + Shutdown event; conncoord's
teardown drives the children terminal. This closes a compaction teardown gap: a
fire-and-forget compaction could outlive a torn session — compactionSink now
takes a session-scoped cancellable context + WaitGroup and joins the in-flight
summarize+evict on shutdown.

Formal verification. formal-verification/ holds one authoritative FizzBee spec
per machine plus the composition spec, each with an always-assertion and a
documented one-line edit that makes the checker fail (verified non-vacuous).
scripts/realtime-conformance.sh is fail-closed: all Go conformance suites under
-race AND a model-check of every .fizz spec; a missing FizzBee is a hard error
(only the loud REALTIME_CONFORMANCE_SKIP_FIZZBEE=1 bypasses it, never in CI).
FizzBee is pinned by sha256 and installed via scripts/install-fizzbee.sh into
.tools/ (gitignored). Wired as make test-realtime-conformance, a CI workflow,
and a pre-commit path filter. Go conformance tests are Ginkgo/Gomega (per the
repo's forbidigo lint): transition tables + fixed-seed property walks +
concurrent/-race specs, no rapid dependency. Design map:
docs/design/realtime-state-machines.md.

Parakeet streaming backend. The same treatment applied to the parakeet-cpp
streaming paths:
- AudioTranscriptionStream returns codes.Unimplemented for non-streaming models
  instead of decoding offline and emitting it as one delta + final. A client
  that asked for streaming learns the model cannot stream rather than receiving
  a batch result shaped like a stream. New grpcerrors.StreamTranscriptionUnsupported
  carries that signal; the HTTP /v1/audio/transcriptions stream path surfaces it
  as an SSE error event. Mirrors AudioTranscriptionLive, which already did this.
- utteranceBoundary (boundary.go): a single definition of the end-of-utterance
  latch, replacing three open-coded finalEou toggles. Modelled as a two-valued
  type so illegal states are unrepresentable.
- Shared decode driver (driver.go): streamFeedResult (one per-feed event) +
  feedChunk (hides the ABI v4 JSON vs text-only split) + feedSlices + flushTail.
  The feed loop is written once.
- AudioTranscriptionLive becomes a bidi adapter: it streams the per-feed
  {delta,eou,eob,words} the realtime turn detector consumes and a terminal
  FinalResult carrying only Text. Segments/duration/eou are offline-only and no
  longer produced (nor read) on the live path; liveTraceState drops the terminal
  eou and keeps the per-feed eou_events count.
- AudioTranscriptionStream + streamJSON merge into one driver-based function;
  streamSegmenter is generalized to the unified event with a text-only fallback
  that preserves the legacy (no-words) library's per-utterance segmentation.

Verified: build/vet/gofumpt clean, golangci-lint 0 issues, all coordinator and
parakeet packages under -race, the fail-closed conformance gate green, and
make test-realtime (12 e2e WS+WebRTC).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-30 09:01:22 +02:00

143 lines
8.2 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Formal verification — realtime state machines
Formal designs (FizzBee specs) for the realtime API state machines and the harness
that keeps the Go implementation provably in step with them. Background and
rationale: [../docs/design/realtime-state-machines.md](../docs/design/realtime-state-machines.md) (Part 6).
The design is **authoritative**: behaviour changes go through the spec first, then
the implementation is checked against it. The `realtime-conformance` gate makes
that the path of least resistance — you cannot land a non-conforming change green.
## What's here
| File | Role |
|------|------|
| `response_lifecycle.fizz` | **Authoritative** FizzBee model of machine M3 (response coordination). Model-checked + drives the Go MBT conformance harness. |
| `turn_lifecycle.fizz` | **Authoritative** FizzBee model of machine M2 (turn detection): the speechStarted / turn-open coupling. |
| `conn_lifecycle.fizz` | **Authoritative** FizzBee model of machine M1 (connection lifecycle): VAD toggle + once-only teardown. |
| `compaction.fizz` | **Authoritative** FizzBee model of machine M4 (conversation compaction): single-flight. |
| `tts_pipeline.fizz` | **Authoritative** FizzBee model of machine M5 (TTS pipeline): open->closing->closed, idempotent close. |
| `session_lifecycle.fizz` | **Composition** spec: the M1M5 hierarchy — conn (M1) is the parent; when it is torn down, every child (vad/M2, resp/M3, compaction/M4) is terminal. Models the relationship the per-machine specs can't express. |
| `fizzbee.sha256` | Pinned checksum(s) of the FizzBee release the gate uses (created on first `install-fizzbee.sh` run). |
The implementations under test live in
[`core/http/endpoints/openai/respcoord`](../../../core/http/endpoints/openai/respcoord) (M3),
[`core/http/endpoints/openai/turncoord`](../../../core/http/endpoints/openai/turncoord) (M2),
[`core/http/endpoints/openai/conncoord`](../../../core/http/endpoints/openai/conncoord) (M1),
[`core/http/endpoints/openai/compactcoord`](../../../core/http/endpoints/openai/compactcoord) (M4),
and [`core/http/endpoints/openai/ttscoord`](../../../core/http/endpoints/openai/ttscoord) (M5).
## Running the gate
```sh
make test-realtime-conformance
# or directly:
./scripts/realtime-conformance.sh
```
Two layers, **both required — the gate is fail-closed**:
1. **Go-native conformance** — the `respcoord` + `turncoord` + `conncoord` + `compactcoord` + `ttscoord` transition-table
tests + Ginkgo/Gomega seeded property (random-walk) tests under `-race`
(checks the implementation), plus the shared `coordinator` runtime they all
build on. Also run as part of `make test` (they're ordinary Go packages with a
Ginkgo suite each). The five machines reduce to their sealed State/Event/Effect
types + a pure `Next`; the single-writer Coordinator/Sink plumbing lives once in
`core/http/endpoints/openai/coordinator` (a generic `Coordinator[S,E,F]`).
2. **FizzBee model check** — model-checks the authoritative `.fizz` specs (checks
the design). **A missing FizzBee is a hard failure, not a skip** — otherwise
the design verification silently disappears whenever the tool is inconvenient,
which is the whole thing we're trying to prevent.
FizzBee is pinned and checksum-verified (`fizzbee.sha256`), so "couldn't install"
is not a reason to skip — run `make install-fizzbee`. The **only** way to skip is
the explicit, loud `REALTIME_CONFORMANCE_SKIP_FIZZBEE=1` opt-out, intended for
local work on unrelated code. CI never sets it, and `pre-commit` runs the full
gate whenever `respcoord/**`, `turncoord/**`, `conncoord/**`, `compactcoord/**`, `ttscoord/**`, or `formal-verification/**` is
staged (so a pure `.fizz` edit still re-verifies).
## Installing FizzBee (pinned)
FizzBee is pre-1.0 and single-maintainer, so we pin a version + sha256 and use the
prebuilt release tarball (its primary build is Bazel — it is **not** go-gettable:
the `pkg/modelchecker` library imports the Bazel-internal `fizz/proto` with no
committed `.pb.go`, so a plain `go get` won't build it).
```sh
make install-fizzbee # = scripts/install-fizzbee.sh (default v0.5.2)
```
The four platform assets are pinned by sha256 in `fizzbee.sha256` (digests taken
from the GitHub release); the installer verifies before extracting. Heads-up: the
Linux bundles are large (~290350 MB, because `parser_bin` embeds a full runtime),
macOS ~36 MB. CI caches `.tools/fizzbee` keyed on the pin so it downloads once.
This unpacks a **self-contained** directory under `.tools/fizzbee/` (gitignored):
```
.tools/fizzbee/
fizz -> stable symlink the gate auto-detects
fizzbee-v0.5.2-linux_x86/
fizz # CLI wrapper (entrypoint)
parser/parser_bin # the .fizz frontend, BUNDLED (no system Python needed)
fizzbee # Go model-checker binary
fizz.env # resolves the above paths relative to `fizz`
mbt_gen.zip # MBT generator (this one DOES need system python)
```
Keep the directory intact — `fizz.env` resolves its siblings relative to the
`fizz` wrapper. The gate auto-detects `.tools/fizzbee/fizz`; override with
`FIZZBEE_BIN` only if you installed elsewhere (point it at the `fizz` wrapper,
not the raw `fizzbee` binary).
First `install-fizzbee.sh` run prints the computed sha256; record it in
`fizzbee.sha256` as `<sha256> <asset>` and commit so later runs verify the pin.
> CLI facts (validate against the pinned version — FizzBee is pre-1.0): the CLI
> is `fizz [flags] <spec.fizz>` (default = exhaustive BFS); there is **no `run`
> subcommand**. The checker can print `FAILED`/`DEADLOCK` while still exiting 0,
> so the gate scans output for those markers in addition to the exit code.
> Model-checking needs only the bundled `parser_bin` (no Python); only
> `mbt-scaffold` shells out to system `python`.
## Reproducing the bug the spec catches
Each spec models the **correct** design, so it passes; each documents how to
reproduce the legacy bug it guards against:
- `response_lifecycle.fizz` (M3): change `atomic func start()` to
`serial func start()` — the checker reports `AtMostOneLive` violated (the
dual-writer race). Pinned deterministically in Go by the respcoord
"legacy dual-writer characterization" spec.
- `turn_lifecycle.fizz` (M2): in `Abort`, delete `self.speech = 0` (clear only
the turn, as the legacy `discardTurn` did) — the checker reports `Coupled`
violated (the speechStarted/turn-open desync that suppressed the next onset).
- `conn_lifecycle.fizz` (M1): in `Close`, delete `self.torn = 1` — the checker
reports `TeardownOnce` violated (the legacy double-teardown / double-close
hazard when a session reaches teardown from more than one exit path).
- `compaction.fizz` (M4): in `Trigger`, delete the `if self.active == 0:` guard —
the checker reports `SingleFlight` violated (two goroutines compacting the same
overflow concurrently, the race the `compacting` CAS prevents).
- `tts_pipeline.fizz` (M5): in `Close`, delete the `if self.phase == 0` guard —
the checker reports `WakeOnce` violated (a non-idempotent wait() that wakes /
joins the worker more than once).
- `session_lifecycle.fizz` (hierarchy): in `Teardown`, delete `self.compaction = 2`
— the checker reports `ChildrenDieWithParent` violated. This is the real M4 gap:
a fire-and-forget compaction outliving the torn session. The fix is `conncoord`'s
teardown cancelling + joining each conversation's compaction (and respcoord/
compactcoord gained an absorbing `Terminated` state so no child can start after
teardown).
## Adding another machine
All five mapped machines (M1M5) have landed. To add a new sealed-state machine:
1. Add `<machine>.fizz` here (with an `always assertion`; verify non-vacuity by
breaking one guard and confirming the checker fails).
2. Implement it as a sealed-state package under `core/http/endpoints/openai/`.
3. Add transition-table + Ginkgo/Gomega seeded property conformance tests
(one `*_suite_test.go` bootstrap per package; LocalAI mandates Ginkgo/Gomega).
4. The gate picks up new `*.fizz` specs automatically; add the new Go package to
the `-race` test list in `scripts/realtime-conformance.sh` (and the path
filters in `.githooks/pre-commit` + `.github/workflows/realtime-conformance.yml`).