mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-16 20:52:08 -04:00
feat(audio-transform): add LocalVQE backend, bidi gRPC RPC, Studio UI
Introduce a generic "audio transform" capability for any audio-in / audio-out
operation (echo cancellation, noise suppression, dereverberation, voice
conversion, etc.) and ship LocalVQE as the first backend implementation.
Backend protocol:
- Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and
bidirectional AudioTransformStream for low-latency frame-by-frame use.
This is the first bidi stream in the proto; per-frame unary at LocalVQE's
16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server,
embed,interface,base} with paired-channel ergonomics.
LocalVQE backend (backend/go/localvqe/):
- Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream
shared lib + its libggml-cpu-*.so runtime variants directly — no MODULE
wrapper needed because LocalVQE handles CPU feature selection internally
via GGML_BACKEND_DL.
- Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it
LocalVQE runs single-threaded at ~1× realtime instead of the documented
~9.6×.
- Reference-length policy: zero-pad short refs, truncate long ones (the
trailing portion can't have leaked into a mic that wasn't recording).
- Ginkgo test suite (9 always-on specs + 2 model-gated).
HTTP layer:
- POST /audio/transformations (alias /audio/transform): multipart batch
endpoint, accepts audio + optional reference + params[*]=v form fields.
Persists inputs alongside the output in GeneratedContentDir/audio so the
React UI history can replay past (audio, reference, output) triples.
- GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames
(interleaved stereo mic+ref in, mono out). JSON session.update envelope
for config; constants hoisted in core/schema/audio_transform.go.
- ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing
utils.AudioToWav (with passthrough fast-path), so the user can upload any
format / rate without seeing the model's strict 16 kHz constraint.
- BackendTraceAudioTransform integration so /api/backend-traces and the
Traces UI light up with audio_snippet base64 and timing.
- Routes registered under routes/localai.go (LocalAI extension; OpenAI has
no /audio/transformations endpoint), traced via TraceMiddleware.
Auth + capability + importer:
- FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on,
in APIFeatures), three RouteFeatureRegistry rows.
- localvqe added to knownPrefOnlyBackends with modality "audio-transform".
- Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on
huggingface.co/LocalAI-io/LocalVQE).
React UI:
- New /app/transform page surfaced via a dedicated "Enhance" sidebar
section (sibling of Tools / Biometrics) — the page is enhancement, not
generation, so it lives outside Studio. Two AudioInput components
(Upload + Record tabs, drag-drop, mic capture).
- Echo-test button: records mic while playing the loaded reference through
the speakers — the mic naturally picks up speaker bleed, giving a real
(mic, ref) pair for AEC testing without leaving the UI.
- Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls)
and useAudioPeaks hook (shared module-scoped AudioContext to avoid
hitting browser context limits with three players on one page); migrated
TTS, Sound, Traces audio blocks to use it.
- Past runs saved in localStorage via useMediaHistory('audio-transform') —
the history entry stores all three URLs so clicking re-renders the full
triple, not just the output.
Build + e2e:
- 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm,
SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those
two and let GPU-class hardware route through Vulkan in the gallery
capabilities map.
- tests-localvqe-grpc-transform job in test-extra.yml (gated on
detect-changes.outputs.localvqe).
- New audio_transform capability + 4 specs in tests/e2e-backends.
- Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js
(8 specs covering tabs, file upload, multipart shape, history, errors).
Docs:
- New docs/content/features/audio-transform.md covering the (audio,
reference) mental model, batch + WebSocket wire formats, LocalVQE param
keys, and a YAML config example. Cross-links from text-to-audio and
audio-to-text feature pages.
Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
211 lines
9.3 KiB
Docker
211 lines
9.3 KiB
Docker
ARG BASE_IMAGE=ubuntu:24.04
|
|
ARG APT_MIRROR=""
|
|
ARG APT_PORTS_MIRROR=""
|
|
|
|
FROM ${BASE_IMAGE} AS builder
|
|
ARG BACKEND=rerankers
|
|
ARG BUILD_TYPE
|
|
ENV BUILD_TYPE=${BUILD_TYPE}
|
|
ARG CUDA_MAJOR_VERSION
|
|
ARG CUDA_MINOR_VERSION
|
|
ARG SKIP_DRIVERS=false
|
|
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
|
|
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
|
|
ENV DEBIAN_FRONTEND=noninteractive
|
|
ARG TARGETARCH
|
|
ARG TARGETVARIANT
|
|
ARG GO_VERSION=1.25.4
|
|
ARG UBUNTU_VERSION=2404
|
|
ARG AMDGPU_TARGETS
|
|
ENV AMDGPU_TARGETS=${AMDGPU_TARGETS}
|
|
ARG APT_MIRROR
|
|
ARG APT_PORTS_MIRROR
|
|
|
|
RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
|
|
APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
build-essential \
|
|
gcc-14 g++-14 \
|
|
git ccache \
|
|
ca-certificates \
|
|
make cmake wget libopenblas-dev \
|
|
curl unzip \
|
|
libssl-dev && \
|
|
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100 \
|
|
--slave /usr/bin/g++ g++ /usr/bin/g++-14 \
|
|
--slave /usr/bin/gcov gcov /usr/bin/gcov-14 && \
|
|
apt-get clean && \
|
|
rm -rf /var/lib/apt/lists/*
|
|
|
|
|
|
# Cuda
|
|
ENV PATH=/usr/local/cuda/bin:${PATH}
|
|
|
|
# HipBLAS requirements
|
|
ENV PATH=/opt/rocm/bin:${PATH}
|
|
|
|
|
|
# Vulkan requirements
|
|
RUN <<EOT bash
|
|
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
software-properties-common pciutils wget gpg-agent && \
|
|
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
|
|
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
|
|
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
|
|
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
|
|
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
|
|
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
|
|
if [ "amd64" = "$TARGETARCH" ]; then
|
|
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
|
|
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
|
|
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
|
|
mkdir -p /opt/vulkan-sdk && \
|
|
mv 1.4.335.0 /opt/vulkan-sdk/ && \
|
|
cd /opt/vulkan-sdk/1.4.335.0 && \
|
|
./vulkansdk --no-deps --maxjobs \
|
|
vulkan-loader \
|
|
vulkan-validationlayers \
|
|
vulkan-extensionlayer \
|
|
vulkan-tools \
|
|
shaderc && \
|
|
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
|
|
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
|
|
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
|
|
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
|
|
rm -rf /opt/vulkan-sdk
|
|
fi
|
|
if [ "arm64" = "$TARGETARCH" ]; then
|
|
mkdir vulkan && cd vulkan && \
|
|
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
|
|
tar -xvf vulkan-sdk.tar.xz && \
|
|
rm vulkan-sdk.tar.xz && \
|
|
cd 1.4.335.0 && \
|
|
cp -rfv aarch64/bin/* /usr/bin/ && \
|
|
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
|
|
cp -rfv aarch64/include/* /usr/include/ && \
|
|
cp -rfv aarch64/share/* /usr/share/ && \
|
|
cd ../.. && \
|
|
rm -rf vulkan
|
|
fi
|
|
ldconfig && \
|
|
apt-get clean && \
|
|
rm -rf /var/lib/apt/lists/*
|
|
fi
|
|
EOT
|
|
|
|
# CuBLAS requirements
|
|
RUN <<EOT bash
|
|
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
software-properties-common pciutils
|
|
if [ "amd64" = "$TARGETARCH" ]; then
|
|
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
|
|
fi
|
|
if [ "arm64" = "$TARGETARCH" ]; then
|
|
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
|
|
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
|
|
else
|
|
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
|
|
fi
|
|
fi
|
|
dpkg -i cuda-keyring_1.1-1_all.deb && \
|
|
rm -f cuda-keyring_1.1-1_all.deb && \
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
|
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
|
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
|
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
|
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
|
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
|
|
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
|
|
apt-get install -y --no-install-recommends \
|
|
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
|
|
fi
|
|
apt-get clean && \
|
|
rm -rf /var/lib/apt/lists/*
|
|
fi
|
|
EOT
|
|
|
|
|
|
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
|
|
RUN <<EOT bash
|
|
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
|
|
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
|
|
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
|
|
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
|
|
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
|
|
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
|
|
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
|
|
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
|
|
apt-get update && apt-get install -y nvpl
|
|
fi
|
|
EOT
|
|
|
|
# If we are building with clblas support, we need the libraries for the builds
|
|
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
libclblast-dev && \
|
|
apt-get clean && \
|
|
rm -rf /var/lib/apt/lists/* \
|
|
; fi
|
|
|
|
RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
|
|
apt-get update && \
|
|
apt-get install -y --no-install-recommends \
|
|
hipblas-dev \
|
|
hipblaslt-dev \
|
|
rocblas-dev && \
|
|
apt-get clean && \
|
|
rm -rf /var/lib/apt/lists/* && \
|
|
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
|
|
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
|
|
ldconfig \
|
|
; fi
|
|
|
|
# Install Go
|
|
RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
|
|
ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin:/usr/local/bin
|
|
|
|
# Install grpc compilers
|
|
RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
|
|
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
|
|
RUN echo "TARGETARCH: $TARGETARCH"
|
|
|
|
# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below
|
|
# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
|
|
# here so that we can generate the grpc code for the stablediffusion build
|
|
RUN <<EOT bash
|
|
if [ "amd64" = "$TARGETARCH" ]; then
|
|
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
|
|
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
|
|
rm protoc.zip
|
|
fi
|
|
if [ "arm64" = "$TARGETARCH" ]; then
|
|
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
|
|
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
|
|
rm protoc.zip
|
|
fi
|
|
EOT
|
|
|
|
RUN if [ "${BACKEND}" = "opus" ]; then \
|
|
apt-get update && apt-get install -y --no-install-recommends libopus-dev pkg-config && \
|
|
apt-get clean && rm -rf /var/lib/apt/lists/*; \
|
|
fi
|
|
|
|
COPY . /LocalAI
|
|
|
|
RUN git config --global --add safe.directory /LocalAI
|
|
|
|
RUN cd /LocalAI && make protogen-go && make -C /LocalAI/backend/go/${BACKEND} build
|
|
|
|
FROM scratch
|
|
ARG BACKEND=rerankers
|
|
|
|
COPY --from=builder /LocalAI/backend/go/${BACKEND}/package/. ./
|