Files
LocalAI/backend/go/crispasr
番茄摔成番茄酱 cf7f9573a2 fix(crispasr): filter garbage words from parakeet word-level timestamps (#10421)
The parakeet-specific word accessors can return stale initialisation
data (model name, binary blobs) for segments with no real speech.
Add isValidWord() to filter out words that have:
- empty or whitespace-only text
- U+FFFD replacement characters (from binary data scrubbing)
- negative timestamps
- zero duration (end <= start)

Also skip empty segments entirely when they have no recognisable
content (empty text AND no valid words), preventing spurious subtitle
entries like '00:45:33,592 --> 00:45:33,592 parakeet@rH\u000b\ufffdI'.

Applies to both AudioTranscription and AudioTranscriptionStream.

Signed-off-by: fqscfqj <fqscfqj@outlook.com>
2026-06-21 17:03:33 +02:00
..