Files
LocalAI/backend/go/vibevoice-cpp/vibevoicecpp_test.go
LocalAI [bot] 2be07f61da feat(whisper): honor client cancellation via ggml abort_callback (#9710)
* refactor(transcription): propagate request ctx through ModelTranscription*

Replaces context.Background() with the HTTP request ctx so client
disconnects start cancelling the gRPC call. No backend-side abort wiring
yet — that comes in a later commit. Pure plumbing.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(cli): pass ctx to backend.ModelTranscription

Follow-up to e65d3e1f which threaded ctx through ModelTranscription
but missed the CLI caller. CLI commands have no request-scoped ctx,
so context.Background() is correct here.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(audio): propagate request ctx into TTS, sound-gen, audio-transform

Same ctx-plumbing pattern applied to the rest of the audio path. CLI
callers use context.Background() since there is no request scope; HTTP
callers use c.Request().Context().

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(backend): propagate request ctx into biometric, detection, rerank, diarization paths

Replaces remaining context.Background() sites in core/backend with the
caller's ctx. After this commit, every core/backend/*.go entry point
threads the request ctx end-to-end to the gRPC client.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(grpc): plumb ctx through AIModel.AudioTranscription{,Stream}

Adds context.Context as first parameter to the AIModel interface methods
that wrap whisper-style transcription. Server-side gRPC handler now
forwards the per-RPC ctx (server-streaming uses stream.Context()).
Whisper, Voxtral, vibevoice-cpp, and sherpa-onnx accept the parameter;
none uses it yet — the actual cancellation primitive lands in the next
commit so this is pure plumbing.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): add abort_callback hook in the C++ bridge

Installs a std::atomic<int> flag, wires it into
whisper_full_params.abort_callback, and exposes a set_abort(int) C
symbol so Go can flip the flag from a goroutine watching the request
context. transcribe() now distinguishes abort (return 2) from real
whisper_full failure (return 1).

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): register set_abort symbol in the purego loader

Adds the Go-side binding for the new C export so the next commit can
call CppSetAbort(1) from a watcher goroutine on ctx.Done().

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): honor ctx cancellation and return codes.Canceled

A watcher goroutine watches ctx.Done() during AudioTranscription and
calls CppSetAbort(1) on cancel. whisper_full sees abort_callback return
true at the next compute graph step, returns non-zero, and the bridge
returns 2 -> AudioTranscription maps that to codes.Canceled.

Adds an opt-in test (gated on WHISPER_MODEL_PATH / WHISPER_AUDIO_PATH)
that asserts cancellation latency under 5s and proves the abort flag
resets cleanly so the next transcription succeeds.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(whisper): join the cancel watcher goroutine before returning

Follow-up to 85edf9d2. The previous commit used `defer close(done)` and
called the watcher "joined synchronously" — but close() only signals,
it does not block until the goroutine exits. That left a window where
a late CppSetAbort(1) from a cancelled call could land on the next
call, after its C-side g_abort reset but before whisper_full() began
polling the abort callback, corrupting the second transcription.

Switch to a sync.WaitGroup join so wg.Wait() blocks until the watcher
has actually returned from its select.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(whisper): short-circuit pre-cancelled ctx in AudioTranscription

If ctx is already Done() at entry, return codes.Canceled immediately
instead of running the full transcription. The C-side g_abort reset
happens at the start of transcribe() and would otherwise overwrite a
watcher-set abort flag from an already-cancelled ctx, producing a
spurious successful transcription on a request the client has already
abandoned.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests/distributed): update testLLM mock for new AudioTranscription signature

Phase B (93c48e19) added context.Context to AIModel.AudioTranscription
but missed the testLLM mock in tests/e2e/distributed. CI golangci-lint
caught it: *testLLM did not implement grpc.AIModel because the method
signature lacked the ctx parameter, which broke the distributed test
suite compilation and cascaded through every backend-build job that
runs `go build ./...`.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(whisper): port cancellation test to Ginkgo/Gomega

Project policy (.agents/coding-style.md, enforced by golangci-lint
forbidigo) is that all Go tests must use Ginkgo v2 + Gomega — no
stdlib testing patterns (t.Skip, t.Fatalf, etc.). Convert the
cancellation test to a Describe/It block with Skip(...) for env
gating and Expect/HaveOccurred for assertions.

Same coverage: cancel mid-flight returns codes.Canceled within 5s and
a follow-up transcription succeeds, proving the C-side g_abort flag
resets cleanly.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-08 01:44:47 +02:00

383 lines
13 KiB
Go

package main
import (
"context"
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
const (
testAddr = "localhost:50098"
startupWait = 5 * time.Second
)
func TestVibevoiceCpp(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "VibeVoice-cpp Backend Suite")
}
// modelDirOrSkip returns the staged model bundle dir, or Skip()s the
// current spec when VIBEVOICE_MODEL_DIR is unset / lacks the gguf
// files we need. Tests that don't depend on a model (Locking, error
// paths) don't call this.
func modelDirOrSkip() string {
dir := os.Getenv("VIBEVOICE_MODEL_DIR")
if dir == "" {
Skip("VIBEVOICE_MODEL_DIR not set, skipping model-dependent specs")
}
if _, err := os.Stat(filepath.Join(dir, "tokenizer.gguf")); os.IsNotExist(err) {
Skip("tokenizer.gguf missing in " + dir)
}
tts, _ := filepath.Glob(filepath.Join(dir, "vibevoice-realtime-*.gguf"))
asr, _ := filepath.Glob(filepath.Join(dir, "vibevoice-asr-*.gguf"))
if len(tts) == 0 && len(asr) == 0 {
Skip("neither realtime TTS nor ASR gguf found in " + dir)
}
return dir
}
// startServer launches the prebuilt backend binary and returns a
// running *exec.Cmd. test.sh ensures `./vibevoice-cpp` is built; if
// it isn't, every gRPC spec is skipped with a clear reason.
func startServer() *exec.Cmd {
binary := os.Getenv("VIBEVOICE_BINARY")
if binary == "" {
binary = "./vibevoice-cpp"
}
if _, err := os.Stat(binary); os.IsNotExist(err) {
Skip("backend binary not found at " + binary)
}
cmd := exec.Command(binary, "--addr", testAddr)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
Expect(cmd.Start()).To(Succeed())
time.Sleep(startupWait)
return cmd
}
func stopServer(cmd *exec.Cmd) {
if cmd == nil || cmd.Process == nil {
return
}
_ = cmd.Process.Kill()
_, _ = cmd.Process.Wait()
}
func dialGRPC() *grpc.ClientConn {
conn, err := grpc.Dial(testAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(
grpc.MaxCallRecvMsgSize(50*1024*1024),
grpc.MaxCallSendMsgSize(50*1024*1024),
),
)
Expect(err).ToNot(HaveOccurred())
return conn
}
var _ = Describe("VibeVoice-cpp", func() {
Context("backend semantics (no purego load needed)", func() {
It("is locking - the engine has process-global state", func() {
Expect((&VibevoiceCpp{}).Locking()).To(BeTrue())
})
It("rejects Load with empty ModelFile", func() {
err := (&VibevoiceCpp{}).Load(&pb.ModelOptions{})
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("ModelFile"))
})
It("rejects TTS without a loaded TTS model", func() {
err := (&VibevoiceCpp{}).TTS(&pb.TTSRequest{
Text: "no model loaded",
Dst: "/tmp/should-not-be-written.wav",
})
Expect(err).To(HaveOccurred())
})
It("rejects AudioTranscription without a loaded ASR model", func() {
_, err := (&VibevoiceCpp{}).AudioTranscription(context.Background(), &pb.TranscriptRequest{
Dst: "/tmp/some.wav",
})
Expect(err).To(HaveOccurred())
})
It("closes the channel and errors on TTSStream without a loaded model", func() {
ch := make(chan []byte, 4)
err := (&VibevoiceCpp{}).TTSStream(&pb.TTSRequest{
Text: "no model loaded",
Dst: "/tmp/should-not-be-written.wav",
}, ch)
Expect(err).To(HaveOccurred())
// Server hangs forever if the channel stays open; this guard
// is what regresses the e2e DeadlineExceeded we're fixing.
_, ok := <-ch
Expect(ok).To(BeFalse(), "TTSStream must close results channel even on error")
})
// parseOptions + slot fill is the source of the closed-loop CI
// regression where ModelFile=tts.gguf + Options[asr_model=...]
// resulted in a load with empty tts slot. These specs assert
// the slot resolution before we ever call into purego.
Describe("ModelFile slot resolution", func() {
It("fills tts slot from ModelFile when only asr_model is in Options", func() {
v := &VibevoiceCpp{}
v.modelRoot = "/abs/root"
role := v.parseOptions([]string{"asr_model=/abs/root/asr.gguf", "tokenizer=/abs/root/tokenizer.gguf"}, v.modelRoot)
Expect(v.asrModel).To(Equal("/abs/root/asr.gguf"))
Expect(v.ttsModel).To(BeEmpty())
Expect(role).To(BeEmpty())
// Mirror the Load() default-fill block:
if v.ttsModel == "" {
v.ttsModel = "/abs/root/tts.gguf"
}
Expect(v.ttsModel).To(Equal("/abs/root/tts.gguf"))
Expect(v.asrModel).To(Equal("/abs/root/asr.gguf"))
})
It("fills asr slot from ModelFile when type=asr is set", func() {
v := &VibevoiceCpp{}
v.modelRoot = "/abs/root"
role := v.parseOptions([]string{"type=asr", "tokenizer=/abs/root/tokenizer.gguf"}, v.modelRoot)
Expect(role).To(Equal("asr"))
Expect(v.asrModel).To(BeEmpty())
Expect(v.ttsModel).To(BeEmpty())
})
It("respects explicit tts_model override over ModelFile", func() {
v := &VibevoiceCpp{}
v.modelRoot = "/abs/root"
_ = v.parseOptions([]string{"tts_model=/abs/root/alt.gguf"}, v.modelRoot)
Expect(v.ttsModel).To(Equal("/abs/root/alt.gguf"))
})
It("accepts colon-separated options too", func() {
v := &VibevoiceCpp{}
v.modelRoot = "/abs/root"
role := v.parseOptions([]string{"type:asr", "tokenizer:/abs/root/tokenizer.gguf"}, v.modelRoot)
Expect(role).To(Equal("asr"))
Expect(v.tokenizer).To(Equal("/abs/root/tokenizer.gguf"))
})
})
// The gallery flow puts everything under <models_dir>/<entry>/,
// and parameters/options carry paths *relative* to <models_dir>.
// LocalAI core fills opts.ModelPath = <models_dir>; the backend
// must resolve every relative path against that root, never CWD.
Describe("resolvePath (relative-to-modelRoot)", func() {
It("joins relative path onto relTo", func() {
Expect(resolvePath("vibevoice-cpp/tokenizer.gguf", "/data/models")).
To(Equal("/data/models/vibevoice-cpp/tokenizer.gguf"))
})
It("passes absolute paths through unchanged", func() {
Expect(resolvePath("/abs/somewhere/tokenizer.gguf", "/data/models")).
To(Equal("/abs/somewhere/tokenizer.gguf"))
})
It("returns input unchanged when relTo is empty", func() {
Expect(resolvePath("vibevoice-cpp/tokenizer.gguf", "")).
To(Equal("vibevoice-cpp/tokenizer.gguf"))
})
It("returns empty input unchanged", func() {
Expect(resolvePath("", "/data/models")).To(BeEmpty())
})
It("does not consult CWD - bare filenames stay relative to modelRoot", func() {
// Even if the test runs in a directory containing a
// file with this name, the lookup must not fall back
// to CWD. This is the trap the production gallery flow
// would otherwise hit when LocalAI is launched from a
// directory that happens to contain a same-named file.
prev, _ := os.Getwd()
DeferCleanup(func() { _ = os.Chdir(prev) })
tmpCWD, err := os.MkdirTemp("", "vv-cwd-*")
Expect(err).ToNot(HaveOccurred())
DeferCleanup(func() { _ = os.RemoveAll(tmpCWD) })
Expect(os.WriteFile(filepath.Join(tmpCWD, "tokenizer.gguf"),
[]byte("not the real one"), 0o644)).To(Succeed())
Expect(os.Chdir(tmpCWD)).To(Succeed())
got := resolvePath("tokenizer.gguf", "/data/models")
Expect(got).To(Equal("/data/models/tokenizer.gguf"))
})
})
// Round-trip the gallery layout: relative paths in Options +
// an absolute ModelFile (as LocalAI core delivers them) end
// up resolved correctly inside the backend struct.
It("Load resolves relative Options paths against opts.ModelPath", func() {
tmpDir, err := os.MkdirTemp("", "vv-relpath-*")
Expect(err).ToNot(HaveOccurred())
DeferCleanup(func() { _ = os.RemoveAll(tmpDir) })
// Lay out the bundle exactly as the gallery would after install:
// <modelpath>/vibevoice-cpp/{tts,tokenizer,voice}.gguf
subDir := filepath.Join(tmpDir, "vibevoice-cpp")
Expect(os.MkdirAll(subDir, 0o755)).To(Succeed())
tts := filepath.Join(subDir, "vibevoice-realtime-stub.gguf")
tok := filepath.Join(subDir, "tokenizer.gguf")
voice := filepath.Join(subDir, "voice.gguf")
for _, p := range []string{tts, tok, voice} {
Expect(os.WriteFile(p, []byte("stub"), 0o644)).To(Succeed())
}
// Mirror Load()'s pre-purego prefix: parse + slot fill.
v := &VibevoiceCpp{}
modelFile := tts // core delivers this as an abspath already
v.modelRoot = tmpDir
role := v.parseOptions([]string{
"tokenizer=vibevoice-cpp/tokenizer.gguf",
"voice=vibevoice-cpp/voice.gguf",
}, v.modelRoot)
Expect(role).To(BeEmpty())
if v.ttsModel == "" {
v.ttsModel = modelFile
}
Expect(v.ttsModel).To(Equal(tts))
Expect(v.tokenizer).To(Equal(tok))
Expect(v.voice).To(Equal(voice))
Expect(v.asrModel).To(BeEmpty())
})
It("closes the channel and errors on AudioTranscriptionStream without a loaded model", func() {
ch := make(chan *pb.TranscriptStreamResponse, 4)
err := (&VibevoiceCpp{}).AudioTranscriptionStream(context.Background(), &pb.TranscriptRequest{
Dst: "/tmp/some.wav",
}, ch)
Expect(err).To(HaveOccurred())
_, ok := <-ch
Expect(ok).To(BeFalse(), "AudioTranscriptionStream must close results channel even on error")
})
})
Context("gRPC server lifecycle", func() {
var cmd *exec.Cmd
AfterEach(func() {
stopServer(cmd)
cmd = nil
})
It("answers Health checks", func() {
cmd = startServer()
conn := dialGRPC()
defer func() { _ = conn.Close() }()
resp, err := pb.NewBackendClient(conn).Health(context.Background(), &pb.HealthMessage{})
Expect(err).ToNot(HaveOccurred())
Expect(string(resp.Message)).To(Equal("OK"))
})
It("loads the realtime TTS model", func() {
dir := modelDirOrSkip()
tts, _ := filepath.Glob(filepath.Join(dir, "vibevoice-realtime-*.gguf"))
if len(tts) == 0 {
Skip("realtime TTS gguf missing")
}
cmd = startServer()
conn := dialGRPC()
defer func() { _ = conn.Close() }()
// Mirror the gallery contract: ModelFile is whatever LocalAI
// core hands us; ModelPath is the models root; Options[]
// carry paths relative to ModelPath.
resp, err := pb.NewBackendClient(conn).LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: filepath.Base(tts[0]),
ModelPath: dir,
Threads: 4,
Options: []string{"tokenizer=tokenizer.gguf"},
})
Expect(err).ToNot(HaveOccurred())
Expect(resp.Success).To(BeTrue(), "LoadModel msg=%q", resp.Message)
})
It("runs a closed-loop TTS -> ASR with >=80% word recall", func() {
dir := modelDirOrSkip()
tts, _ := filepath.Glob(filepath.Join(dir, "vibevoice-realtime-*.gguf"))
asr, _ := filepath.Glob(filepath.Join(dir, "vibevoice-asr-*.gguf"))
if len(tts) == 0 || len(asr) == 0 {
Skip("closed-loop needs both realtime TTS and ASR ggufs")
}
tmpDir, err := os.MkdirTemp("", "vibevoice-cpp-closedloop-*")
Expect(err).ToNot(HaveOccurred())
DeferCleanup(func() { _ = os.RemoveAll(tmpDir) })
wav := filepath.Join(tmpDir, "say.wav")
cmd = startServer()
conn := dialGRPC()
defer func() { _ = conn.Close() }()
client := pb.NewBackendClient(conn)
// Gallery convention: ModelPath is the models root, every
// path inside Options[] is relative to it.
voiceMatches, _ := filepath.Glob(filepath.Join(dir, "voice-*.gguf"))
loadOpts := &pb.ModelOptions{
ModelFile: filepath.Base(tts[0]),
ModelPath: dir,
Threads: 4,
Options: []string{
"asr_model=" + filepath.Base(asr[0]),
"tokenizer=tokenizer.gguf",
},
}
if len(voiceMatches) > 0 {
loadOpts.Options = append(loadOpts.Options, "voice="+filepath.Base(voiceMatches[0]))
}
loadResp, err := client.LoadModel(context.Background(), loadOpts)
Expect(err).ToNot(HaveOccurred())
Expect(loadResp.Success).To(BeTrue(), "LoadModel msg=%q", loadResp.Message)
srcText := "Hello world this is a test of the synthesis system."
_, err = client.TTS(context.Background(), &pb.TTSRequest{
Text: srcText,
Dst: wav,
})
Expect(err).ToNot(HaveOccurred())
info, err := os.Stat(wav)
Expect(err).ToNot(HaveOccurred())
Expect(info.Size()).To(BeNumerically(">=", 1000),
"TTS produced suspiciously small wav (%d bytes)", info.Size())
resp, err := client.AudioTranscription(context.Background(), &pb.TranscriptRequest{
Dst: wav,
})
Expect(err).ToNot(HaveOccurred())
got := strings.ToLower(resp.Text)
GinkgoWriter.Printf("source : %s\n", srcText)
GinkgoWriter.Printf("transcribed: %s\n", got)
wordRE := regexp.MustCompile(`[a-z]+`)
srcWords := wordRE.FindAllString(strings.ToLower(srcText), -1)
Expect(srcWords).ToNot(BeEmpty())
hits := 0
for _, w := range srcWords {
if strings.Contains(got, w) {
hits++
}
}
recall := float64(hits) / float64(len(srcWords))
GinkgoWriter.Printf("recall: %d/%d = %.2f%%\n", hits, len(srcWords), recall*100)
Expect(recall).To(BeNumerically(">=", 0.80),
"closed-loop recall too low: %d/%d = %.2f%%",
hits, len(srcWords), recall*100)
})
})
})