mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-29 19:44:13 -04:00
* feat(vibevoice-cpp): add purego TTS+ASR backend
Wire up Microsoft VibeVoice via the vibevoice.cpp C ABI as a new
purego-based Go backend that serves both Backend.TTS and
Backend.AudioTranscription from a single gRPC binary. Mirrors the
qwen3-tts-cpp / sherpa-onnx pattern so the variant matrix
(cpu/cuda12/cuda13/metal/rocm/sycl-f16/f32/vulkan/l4t) and the
e2e-backends gRPC harness reuse existing infrastructure.
- backend/go/vibevoice-cpp/ - Makefile, CMakeLists, purego shim, gRPC
Backend with model-dir auto-detection, closed-loop TTS->ASR smoke test
- backend/index.yaml - &vibevoicecpp meta + 18 image entries
- Makefile - .NOTPARALLEL, BACKEND_VIBEVOICE_CPP, docker-build wiring,
test-extra-backend-vibevoice-cpp-{tts,transcription} e2e wrappers
- .github/workflows/backend.yml - matrix entries for all variants
- .github/workflows/test-extra.yml - per-backend smoke + 2 gRPC e2e jobs
* feat(vibevoice-cpp): drop hardcoded glob detection, add gallery entries
Refactor backend Load() to follow the standard Options[] convention
used by sherpa-onnx and the rest of the multi-role backends:
ModelFile is the primary gguf, supplementary paths come through
opts.Options[] as key=value (or key:value for Make-target compat),
resolved against opts.ModelPath. type=asr/tts decides the role of
ModelFile when neither tts_model nor asr_model is set explicitly.
Add gallery/index.yaml entries:
- vibevoice-cpp - realtime 0.5B Q8_0 TTS + tokenizer + Carter voice
- vibevoice-cpp-asr - long-form ASR Q8_0 + tokenizer
Both pull from huggingface://mudler/vibevoice.cpp-models with sha256
verification. parameters.model + Options[] paths are siblings under
{models_dir} per the qwen3-tts-cpp convention.
Update Makefile e2e wrappers to pass BACKEND_TEST_OPTIONS comma+colon
style, and tighten the per-backend Go closed-loop test to use the
explicit Options API.
* fix(vibevoice-cpp): force whole-archive link so vv_capi_* exports survive
libvibevoice is a STATIC archive linked into the MODULE library.
Without --whole-archive (or -force_load on Apple, /WHOLEARCHIVE on
MSVC), the linker garbage-collects symbols not referenced from this
translation unit - which means dlopen+RegisterLibFunc panics with
'undefined symbol: vv_capi_load' at backend startup, since purego
looks them up by name and our cpp/govibevoicecpp.cpp doesn't call
them directly.
* test(vibevoice-cpp): rewrite suite with Ginkgo v2
Match the convention used by backend/go/sherpa-onnx/backend_test.go.
The suite now covers backend semantics that don't need purego (Locking,
empty-ModelFile rejection, TTS/ASR-without-loaded-model errors) on top
of the gRPC lifecycle specs (Health, Load, closed-loop TTS->ASR).
Model-dependent specs Skip() when VIBEVOICE_MODEL_DIR is unset, so
`go test ./backend/go/vibevoice-cpp/` is green on a clean checkout
and runs the heavyweight closed-loop spec when test.sh has staged
the bundle.
* fix(vibevoice-cpp): implement TTSStream + AudioTranscriptionStream
The gRPC server's stream handlers (pkg/grpc/server.go) spawn a
goroutine that ranges over a chan; the only thing closing that chan
is the backend's own *Stream method. With the default Base stub
returning 'unimplemented' and never touching the chan, the server
goroutine hangs forever and the client hits DeadlineExceeded - which
is exactly what the e2e harness saw in the test-extra-backend-vibevoice-cpp-tts
matrix run.
TTSStream synthesizes via vv_capi_tts to a tempfile, then emits a
streaming WAV header (chunk sizes 0xFFFFFFFF so HTTP clients can
start playback before the full PCM lands) followed by the PCM body
in 64 KB slices. The header + >=2 PCM frames satisfy the harness's
'expected >=2 chunks' assertion and give a real progressive stream.
AudioTranscriptionStream runs the offline transcription, emits each
segment as a delta, and closes with a final_result whose Text equals
the concatenated deltas (the harness asserts those match).
Two new Ginkgo specs guard the close-channel-on-error path so the
deadline-exceeded regression can't come back silently.
* fix(vibevoice-cpp): silence errcheck on cleanup paths
Lint flagged six unchecked Close()/Remove()/RemoveAll() calls along
purely-cleanup deferred paths. Wrap each in '_ = ...' (or a closure
for defers that take args) - matches what the rest of the LocalAI
backend/go/* tree already does for these callsites.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(vibevoice-cpp): closed-loop slot fill + modelRoot-relative path resolution
Two bugs the test-extra-backend-vibevoice-cpp-* CI matrix surfaced:
1. Closed-loop Load with ModelFile=tts.gguf + Options[asr_model=...] left
v.ttsModel empty, because the default-fill block only ran when BOTH
slots were empty. vv_capi_load then got tts="" + a voice and the
C side rejected it with rc=-3 'TTS model required to load a voice'.
Fix: ModelFile fills the *primary* role-slot (decided by 'type=' in
Options, defaulting to tts) independently of the secondary, so
ModelFile + asr_model resolves to both.
2. resolvePath stat'd CWD before falling back to relTo. With LocalAI
launched from a directory that happens to contain a same-named
file, supplementary Options[] paths could leak away from the
models dir. Drop the CWD probe entirely - relative paths now
*always* join onto opts.ModelPath (the gallery convention).
New Ginkgo coverage:
* 'ModelFile slot resolution' (4 specs) - asr_model+ModelFile, type=asr,
explicit tts_model override, key:value variant.
* 'resolvePath (relative-to-modelRoot)' (5 specs) - join, abs passthrough,
empty input, empty relTo, and the CWD-trap regression test.
* 'Load resolves relative Options paths against opts.ModelPath' - end-
to-end gallery layout round-trip.
Verified locally: 19/19 specs pass (with model bundle, including the
closed-loop TTS->ASR; without bundle, 17 pass + 2 model-dependent skip).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(vibevoice-cpp): use gallery convention in closed-loop spec
The 'loads the realtime TTS model' / closed-loop specs were passing
already-prefixed paths into Options[]:
Options: ['tokenizer=' + filepath.Join(modelDir, 'tokenizer.gguf')]
Combined with no ModelPath set on the request, the backend's
modelRoot fell back to filepath.Dir(ModelFile) = modelDir, then
resolvePath joined the prefixed Options path on top of it -
producing 'vibevoice-models/vibevoice-models/tokenizer.gguf' when
the CI's VIBEVOICE_MODEL_DIR is the relative './vibevoice-models'.
The fix is to mirror the gallery contract LocalAI core actually
sends in production: ModelPath is the models root (absolute),
ModelFile is a name *under* it, every Options[] path is relative
to ModelPath. Uses filepath.Base() to get bare filenames.
Verified locally with both VIBEVOICE_MODEL_DIR=/tmp/vv-bundle (abs)
and VIBEVOICE_MODEL_DIR=vibevoice-models (the relative shape that
broke CI). Both: 19/19 specs pass, ~55-60s.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): switch ASR to Q4_K + bump transcription timeout
The Q8_0 ASR gguf is ~14 GB - too big to fit alongside the runner
image, the docker build cache, and the test artifacts on a free
ubuntu-latest GHA runner; 'test-extra-backend-vibevoice-cpp-transcription'
was getting SIGTERM'd at 90 min before the model could finish loading.
Switch to Q4_K (~10 GB on disk, slightly faster CPU decode) for:
* the e2e harness Make target
* the gallery 'vibevoice-cpp-asr' entry (parameters + files block)
* the per-backend test.sh auto-download list
Bump tests-vibevoice-cpp-grpc-transcription's timeout-minutes from
90 to 150 - even with Q4_K, the 30 s JFK clip on a CPU runner needs
runway above the previous 90 min cap.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): drop transcription gRPC e2e job - too heavy for free runners
The vibevoice ASR is a 7B-parameter model. Even on Q4_K (~10 GB on
disk) a single 30 s transcription saturates the per-test 30 min
timeout in the e2e-backends harness on a 4-core ubuntu-latest, and
the 10 GB download + Docker layer + working space leaves no headroom
on the runner's free disk. Two attempts in CI got SIGTERM'd at the
LoadModel boundary - the bottleneck isn't tunable from the workflow
side without a paid-tier runner.
The per-backend tests-vibevoice-cpp job already runs the same
AudioTranscription path via a closed-loop TTS->ASR Ginkgo spec - same
gRPC contract, same model, single process - so the standalone
tests-vibevoice-cpp-grpc-transcription job was redundant on top of
the disk/CPU pressure.
The Makefile target test-extra-backend-vibevoice-cpp-transcription
stays for local invocation on workstations that can afford it -
useful when developing the streaming codepaths.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): restore transcription gRPC e2e on bigger-runner
Switch tests-vibevoice-cpp-grpc-transcription from ubuntu-latest to
the self-hosted 'bigger-runner' label that GPU image builds in
backend.yml use, plus the documented Free-disk-space prep step (purge
dotnet / ghc / android / CodeQL caches) the disabled vllm/sglang
entries in this file describe. That gives the 7B-param Q4_K ASR
model the disk + CPU runway it needs.
Keep timeout-minutes: 150 - even on a beefier runner the 30 s JFK
decode plus 10 GB download has to fit comfortably.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): apt-get install make on bigger-runner before transcription e2e
bigger-runner is a self-hosted bare runner without the standard
ubuntu image's preinstalled build tools, so the previous job died at
the very first command with 'make: command not found' (exit 127).
Add the Dependencies step that the disabled vllm/sglang entries in
this file already document - apt-get installs make + build-essential
+ curl + unzip + ca-certificates + git + tar before the make target
runs. Mirrors how every other 'runs-on: bigger-runner' entry in
backend.yml prepares the runner.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
129 lines
4.2 KiB
Makefile
129 lines
4.2 KiB
Makefile
CMAKE_ARGS?=
|
|
BUILD_TYPE?=
|
|
NATIVE?=false
|
|
|
|
GOCMD?=go
|
|
GO_TAGS?=
|
|
JOBS?=$(shell nproc --ignore=1)
|
|
|
|
# vibevoice.cpp version
|
|
VIBEVOICE_REPO?=https://github.com/mudler/vibevoice.cpp
|
|
VIBEVOICE_CPP_VERSION?=master
|
|
SO_TARGET?=libgovibevoicecpp.so
|
|
|
|
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
|
|
CMAKE_ARGS+=-DVIBEVOICE_BUILD_TESTS=OFF
|
|
CMAKE_ARGS+=-DVIBEVOICE_BUILD_EXAMPLES=OFF
|
|
|
|
ifeq ($(NATIVE),false)
|
|
CMAKE_ARGS+=-DGGML_NATIVE=OFF
|
|
endif
|
|
|
|
ifeq ($(BUILD_TYPE),cublas)
|
|
CMAKE_ARGS+=-DGGML_CUDA=ON -DVIBEVOICE_GGML_CUDA=ON
|
|
else ifeq ($(BUILD_TYPE),openblas)
|
|
CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
|
|
else ifeq ($(BUILD_TYPE),clblas)
|
|
CMAKE_ARGS+=-DGGML_CLBLAST=ON -DCLBlast_DIR=/some/path
|
|
else ifeq ($(BUILD_TYPE),hipblas)
|
|
CMAKE_ARGS+=-DGGML_HIPBLAS=ON -DVIBEVOICE_GGML_HIPBLAS=ON
|
|
else ifeq ($(BUILD_TYPE),vulkan)
|
|
CMAKE_ARGS+=-DGGML_VULKAN=ON -DVIBEVOICE_GGML_VULKAN=ON
|
|
else ifeq ($(OS),Darwin)
|
|
ifneq ($(BUILD_TYPE),metal)
|
|
CMAKE_ARGS+=-DGGML_METAL=OFF
|
|
else
|
|
CMAKE_ARGS+=-DGGML_METAL=ON -DVIBEVOICE_GGML_METAL=ON
|
|
CMAKE_ARGS+=-DGGML_METAL_EMBED_LIBRARY=ON
|
|
endif
|
|
endif
|
|
|
|
ifeq ($(BUILD_TYPE),sycl_f16)
|
|
CMAKE_ARGS+=-DGGML_SYCL=ON \
|
|
-DCMAKE_C_COMPILER=icx \
|
|
-DCMAKE_CXX_COMPILER=icpx \
|
|
-DGGML_SYCL_F16=ON
|
|
endif
|
|
|
|
ifeq ($(BUILD_TYPE),sycl_f32)
|
|
CMAKE_ARGS+=-DGGML_SYCL=ON \
|
|
-DCMAKE_C_COMPILER=icx \
|
|
-DCMAKE_CXX_COMPILER=icpx
|
|
endif
|
|
|
|
sources/vibevoice.cpp:
|
|
mkdir -p sources/vibevoice.cpp
|
|
cd sources/vibevoice.cpp && \
|
|
git init && \
|
|
git remote add origin $(VIBEVOICE_REPO) && \
|
|
git fetch origin && \
|
|
git checkout $(VIBEVOICE_CPP_VERSION) && \
|
|
git submodule update --init --recursive --depth 1 --single-branch
|
|
|
|
# Detect OS
|
|
UNAME_S := $(shell uname -s)
|
|
|
|
# Only build CPU variants on Linux
|
|
ifeq ($(UNAME_S),Linux)
|
|
VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so
|
|
else
|
|
# On non-Linux (e.g., Darwin), build only fallback variant
|
|
VARIANT_TARGETS = libgovibevoicecpp-fallback.so
|
|
endif
|
|
|
|
vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS)
|
|
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o vibevoice-cpp ./
|
|
|
|
package: vibevoice-cpp
|
|
bash package.sh
|
|
|
|
build: package
|
|
|
|
clean: purge
|
|
rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp
|
|
|
|
purge:
|
|
rm -rf build*
|
|
|
|
# Variants must build sequentially
|
|
.NOTPARALLEL:
|
|
|
|
# Build all variants (Linux only)
|
|
ifeq ($(UNAME_S),Linux)
|
|
libgovibevoicecpp-avx.so: sources/vibevoice.cpp
|
|
$(info ${GREEN}I vibevoice-cpp build info:avx${RESET})
|
|
SO_TARGET=libgovibevoicecpp-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
|
|
rm -rf build-libgovibevoicecpp-avx.so
|
|
|
|
libgovibevoicecpp-avx2.so: sources/vibevoice.cpp
|
|
$(info ${GREEN}I vibevoice-cpp build info:avx2${RESET})
|
|
SO_TARGET=libgovibevoicecpp-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgovibevoicecpp-custom
|
|
rm -rf build-libgovibevoicecpp-avx2.so
|
|
|
|
libgovibevoicecpp-avx512.so: sources/vibevoice.cpp
|
|
$(info ${GREEN}I vibevoice-cpp build info:avx512${RESET})
|
|
SO_TARGET=libgovibevoicecpp-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgovibevoicecpp-custom
|
|
rm -rf build-libgovibevoicecpp-avx512.so
|
|
endif
|
|
|
|
# Build fallback variant (all platforms)
|
|
libgovibevoicecpp-fallback.so: sources/vibevoice.cpp
|
|
$(info ${GREEN}I vibevoice-cpp build info:fallback${RESET})
|
|
SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
|
|
rm -rf build-libgovibevoicecpp-fallback.so
|
|
|
|
libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h
|
|
mkdir -p build-$(SO_TARGET) && \
|
|
cd build-$(SO_TARGET) && \
|
|
cmake .. $(CMAKE_ARGS) && \
|
|
cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \
|
|
cd .. && \
|
|
mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET)
|
|
|
|
test: vibevoice-cpp
|
|
@echo "Running vibevoice-cpp tests..."
|
|
bash test.sh
|
|
@echo "vibevoice-cpp tests completed."
|
|
|
|
all: vibevoice-cpp package
|