* feat(vibevoice-cpp): add purego TTS+ASR backend
Wire up Microsoft VibeVoice via the vibevoice.cpp C ABI as a new
purego-based Go backend that serves both Backend.TTS and
Backend.AudioTranscription from a single gRPC binary. Mirrors the
qwen3-tts-cpp / sherpa-onnx pattern so the variant matrix
(cpu/cuda12/cuda13/metal/rocm/sycl-f16/f32/vulkan/l4t) and the
e2e-backends gRPC harness reuse existing infrastructure.
- backend/go/vibevoice-cpp/ - Makefile, CMakeLists, purego shim, gRPC
Backend with model-dir auto-detection, closed-loop TTS->ASR smoke test
- backend/index.yaml - &vibevoicecpp meta + 18 image entries
- Makefile - .NOTPARALLEL, BACKEND_VIBEVOICE_CPP, docker-build wiring,
test-extra-backend-vibevoice-cpp-{tts,transcription} e2e wrappers
- .github/workflows/backend.yml - matrix entries for all variants
- .github/workflows/test-extra.yml - per-backend smoke + 2 gRPC e2e jobs
* feat(vibevoice-cpp): drop hardcoded glob detection, add gallery entries
Refactor backend Load() to follow the standard Options[] convention
used by sherpa-onnx and the rest of the multi-role backends:
ModelFile is the primary gguf, supplementary paths come through
opts.Options[] as key=value (or key:value for Make-target compat),
resolved against opts.ModelPath. type=asr/tts decides the role of
ModelFile when neither tts_model nor asr_model is set explicitly.
Add gallery/index.yaml entries:
- vibevoice-cpp - realtime 0.5B Q8_0 TTS + tokenizer + Carter voice
- vibevoice-cpp-asr - long-form ASR Q8_0 + tokenizer
Both pull from huggingface://mudler/vibevoice.cpp-models with sha256
verification. parameters.model + Options[] paths are siblings under
{models_dir} per the qwen3-tts-cpp convention.
Update Makefile e2e wrappers to pass BACKEND_TEST_OPTIONS comma+colon
style, and tighten the per-backend Go closed-loop test to use the
explicit Options API.
* fix(vibevoice-cpp): force whole-archive link so vv_capi_* exports survive
libvibevoice is a STATIC archive linked into the MODULE library.
Without --whole-archive (or -force_load on Apple, /WHOLEARCHIVE on
MSVC), the linker garbage-collects symbols not referenced from this
translation unit - which means dlopen+RegisterLibFunc panics with
'undefined symbol: vv_capi_load' at backend startup, since purego
looks them up by name and our cpp/govibevoicecpp.cpp doesn't call
them directly.
* test(vibevoice-cpp): rewrite suite with Ginkgo v2
Match the convention used by backend/go/sherpa-onnx/backend_test.go.
The suite now covers backend semantics that don't need purego (Locking,
empty-ModelFile rejection, TTS/ASR-without-loaded-model errors) on top
of the gRPC lifecycle specs (Health, Load, closed-loop TTS->ASR).
Model-dependent specs Skip() when VIBEVOICE_MODEL_DIR is unset, so
`go test ./backend/go/vibevoice-cpp/` is green on a clean checkout
and runs the heavyweight closed-loop spec when test.sh has staged
the bundle.
* fix(vibevoice-cpp): implement TTSStream + AudioTranscriptionStream
The gRPC server's stream handlers (pkg/grpc/server.go) spawn a
goroutine that ranges over a chan; the only thing closing that chan
is the backend's own *Stream method. With the default Base stub
returning 'unimplemented' and never touching the chan, the server
goroutine hangs forever and the client hits DeadlineExceeded - which
is exactly what the e2e harness saw in the test-extra-backend-vibevoice-cpp-tts
matrix run.
TTSStream synthesizes via vv_capi_tts to a tempfile, then emits a
streaming WAV header (chunk sizes 0xFFFFFFFF so HTTP clients can
start playback before the full PCM lands) followed by the PCM body
in 64 KB slices. The header + >=2 PCM frames satisfy the harness's
'expected >=2 chunks' assertion and give a real progressive stream.
AudioTranscriptionStream runs the offline transcription, emits each
segment as a delta, and closes with a final_result whose Text equals
the concatenated deltas (the harness asserts those match).
Two new Ginkgo specs guard the close-channel-on-error path so the
deadline-exceeded regression can't come back silently.
* fix(vibevoice-cpp): silence errcheck on cleanup paths
Lint flagged six unchecked Close()/Remove()/RemoveAll() calls along
purely-cleanup deferred paths. Wrap each in '_ = ...' (or a closure
for defers that take args) - matches what the rest of the LocalAI
backend/go/* tree already does for these callsites.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(vibevoice-cpp): closed-loop slot fill + modelRoot-relative path resolution
Two bugs the test-extra-backend-vibevoice-cpp-* CI matrix surfaced:
1. Closed-loop Load with ModelFile=tts.gguf + Options[asr_model=...] left
v.ttsModel empty, because the default-fill block only ran when BOTH
slots were empty. vv_capi_load then got tts="" + a voice and the
C side rejected it with rc=-3 'TTS model required to load a voice'.
Fix: ModelFile fills the *primary* role-slot (decided by 'type=' in
Options, defaulting to tts) independently of the secondary, so
ModelFile + asr_model resolves to both.
2. resolvePath stat'd CWD before falling back to relTo. With LocalAI
launched from a directory that happens to contain a same-named
file, supplementary Options[] paths could leak away from the
models dir. Drop the CWD probe entirely - relative paths now
*always* join onto opts.ModelPath (the gallery convention).
New Ginkgo coverage:
* 'ModelFile slot resolution' (4 specs) - asr_model+ModelFile, type=asr,
explicit tts_model override, key:value variant.
* 'resolvePath (relative-to-modelRoot)' (5 specs) - join, abs passthrough,
empty input, empty relTo, and the CWD-trap regression test.
* 'Load resolves relative Options paths against opts.ModelPath' - end-
to-end gallery layout round-trip.
Verified locally: 19/19 specs pass (with model bundle, including the
closed-loop TTS->ASR; without bundle, 17 pass + 2 model-dependent skip).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(vibevoice-cpp): use gallery convention in closed-loop spec
The 'loads the realtime TTS model' / closed-loop specs were passing
already-prefixed paths into Options[]:
Options: ['tokenizer=' + filepath.Join(modelDir, 'tokenizer.gguf')]
Combined with no ModelPath set on the request, the backend's
modelRoot fell back to filepath.Dir(ModelFile) = modelDir, then
resolvePath joined the prefixed Options path on top of it -
producing 'vibevoice-models/vibevoice-models/tokenizer.gguf' when
the CI's VIBEVOICE_MODEL_DIR is the relative './vibevoice-models'.
The fix is to mirror the gallery contract LocalAI core actually
sends in production: ModelPath is the models root (absolute),
ModelFile is a name *under* it, every Options[] path is relative
to ModelPath. Uses filepath.Base() to get bare filenames.
Verified locally with both VIBEVOICE_MODEL_DIR=/tmp/vv-bundle (abs)
and VIBEVOICE_MODEL_DIR=vibevoice-models (the relative shape that
broke CI). Both: 19/19 specs pass, ~55-60s.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): switch ASR to Q4_K + bump transcription timeout
The Q8_0 ASR gguf is ~14 GB - too big to fit alongside the runner
image, the docker build cache, and the test artifacts on a free
ubuntu-latest GHA runner; 'test-extra-backend-vibevoice-cpp-transcription'
was getting SIGTERM'd at 90 min before the model could finish loading.
Switch to Q4_K (~10 GB on disk, slightly faster CPU decode) for:
* the e2e harness Make target
* the gallery 'vibevoice-cpp-asr' entry (parameters + files block)
* the per-backend test.sh auto-download list
Bump tests-vibevoice-cpp-grpc-transcription's timeout-minutes from
90 to 150 - even with Q4_K, the 30 s JFK clip on a CPU runner needs
runway above the previous 90 min cap.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): drop transcription gRPC e2e job - too heavy for free runners
The vibevoice ASR is a 7B-parameter model. Even on Q4_K (~10 GB on
disk) a single 30 s transcription saturates the per-test 30 min
timeout in the e2e-backends harness on a 4-core ubuntu-latest, and
the 10 GB download + Docker layer + working space leaves no headroom
on the runner's free disk. Two attempts in CI got SIGTERM'd at the
LoadModel boundary - the bottleneck isn't tunable from the workflow
side without a paid-tier runner.
The per-backend tests-vibevoice-cpp job already runs the same
AudioTranscription path via a closed-loop TTS->ASR Ginkgo spec - same
gRPC contract, same model, single process - so the standalone
tests-vibevoice-cpp-grpc-transcription job was redundant on top of
the disk/CPU pressure.
The Makefile target test-extra-backend-vibevoice-cpp-transcription
stays for local invocation on workstations that can afford it -
useful when developing the streaming codepaths.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): restore transcription gRPC e2e on bigger-runner
Switch tests-vibevoice-cpp-grpc-transcription from ubuntu-latest to
the self-hosted 'bigger-runner' label that GPU image builds in
backend.yml use, plus the documented Free-disk-space prep step (purge
dotnet / ghc / android / CodeQL caches) the disabled vllm/sglang
entries in this file describe. That gives the 7B-param Q4_K ASR
model the disk + CPU runway it needs.
Keep timeout-minutes: 150 - even on a beefier runner the 30 s JFK
decode plus 10 GB download has to fit comfortably.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(vibevoice-cpp): apt-get install make on bigger-runner before transcription e2e
bigger-runner is a self-hosted bare runner without the standard
ubuntu image's preinstalled build tools, so the previous job died at
the very first command with 'make: command not found' (exit 127).
Add the Dependencies step that the disabled vllm/sglang entries in
this file already document - apt-get installs make + build-essential
+ curl + unzip + ca-certificates + git + tar before the make target
runs. Mirrors how every other 'runs-on: bigger-runner' entry in
backend.yml prepares the runner.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: add golangci-lint with new-from-merge-base baseline
Configure golangci-lint v2 with the standard linter set (errcheck, govet,
ineffassign, unused) plus forbidigo, which enforces the Ginkgo/Gomega-only
test convention from .agents/coding-style.md by rejecting stdlib testing
calls (t.Errorf, t.Fatalf, t.Run, ...). staticcheck is disabled — the
codebase has many pre-existing QF-style suggestions not worth gating on.
issues.new-from-merge-base = master makes the lint job a gate for new
issues only; the ~1300 pre-existing baseline stays visible via
'make lint-all' for incremental cleanup. CI runs 'make lint'.
Backends needing C/C++ headers we don't install in the lint runner are
excluded via a deny list in the Makefile (backend/go/{piper,silero-vad,
llm}, cmd/launcher). Discovery still flows through 'go list ./...', so
new packages are scanned automatically.
To make backend/go/{sam3-cpp,stablediffusion-ggml,whisper} typecheckable,
move their .cpp/.h sources into cpp/ subdirs (matching qwen3-tts-cpp /
acestep-cpp). Without this 'go list' rejects the package because Go does
not allow .cpp alongside .go without cgo.
Fix two real bugs found by lint in tests/integration/ (run only via
'make test-stores', not default CI): a stale zerolog reference left over
from the slog migration (c37785b7) and an unused 'os' import.
Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* ci(lint): generate proto sources and fetch full history
The lint job was failing for two reasons:
- pkg/grpc/proto/*.go is generated, not checked in. Several packages
import it, so without 'make protogen-go' typecheck fails project-wide
with "no required module provides package github.com/mudler/LocalAI/
pkg/grpc/proto".
- golangci-lint's new-from-merge-base needs to git-merge-base the PR
against master, but actions/checkout's default shallow clone doesn't
fetch master. fetch-depth: 0 brings full history; the config now
references origin/master (the remote-tracking branch that survives
the shallow checkout) instead of bare master (which doesn't exist
locally after checkout).
Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* ci(lint): stub react-ui/dist for go:embed glob
core/http/app.go has //go:embed react-ui/dist/*. The glob must match at
least one non-hidden entry or typecheck fails the whole core/http
package. We don't need the real React bundle to lint Go code, so just
touch an empty index.html to satisfy the embed.
Assisted-by: Claude Code:Opus 4.7 (1M) [Bash] [Read] [Edit] [Write]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
whisper.cpp can emit bytes that are not valid UTF-8 — typically a
multibyte codepoint split across token boundaries. protobuf string
fields reject those at marshal time, which would surface as a transcribe
failure. Run strings.ToValidUTF8 on the segment text before it leaves
the cgo boundary so the bad byte gets replaced with U+FFFD.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
feat(backend): Add Sherpa ONNX backend and Omnilingual ASR
Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same
approach as opus/stablediffusion-ggml/whisper — a thin C shim
(csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego
can't reach directly: nested struct config writes, result-struct field
reads, and the streaming TTS callback trampoline. The Go side uses
opaque uintptr handles and purego.NewCallback for the TTS callback.
Supports:
- VAD via sherpa-onnx's Silero VAD
- Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC
- Online/streaming ASR: zipformer transducer with endpoint detection
(AudioTranscriptionStream emits delta events during decode)
- Offline TTS: VITS (LJS, etc.)
- Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel,
prefixed by a streaming WAV header
Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline
ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR),
silero-vad-sherpa, vits-ljs-sherpa.
E2E coverage: tests/e2e-backends for offline + streaming ASR,
tests/e2e for the full realtime pipeline (VAD + STT + TTS).
Assisted-by: claude-opus-4-7-1M [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(react-ui): add Face & Voice Recognition pages
Expose the face and voice biometrics endpoints
(/v1/face/*, /v1/voice/*) through the React UI. Each page has four
tabs driving the six endpoints per modality: Analyze (demographics
with bounding boxes / waveform segments), Compare (verify with a
match gauge and live threshold slider), Enrollment (register /
identify / forget with a top-K matches view), Embedding (raw
vector inspector with sparkline + copy).
MediaInput supports file upload plus live capture: webcam
snap-to-canvas for face, MediaRecorder -> AudioContext ->
16-bit PCM mono WAV transcode for voice (libsndfile on the
backend only handles WAV/FLAC/OGG natively).
Sidebar gets a new Biometrics section feature-gated on
face_recognition / voice_recognition; routes are wrapped in
<RequireFeature>. No new dependencies -- Font Awesome icons
picked from the Free set.
Assisted-by: Claude:Opus 4.7
* fix(localai): accept data URI prefixes with codec/charset params
Browser MediaRecorder produces data URIs like
data:audio/webm;codecs=opus;base64,...
so the pre-';base64,' section can carry multiple parameter
segments. The `^data:([^;]+);base64,` regex in pkg/utils/base64.go
and core/http/endpoints/localai/audio.go only matched exactly one
segment, so recordings straight from the React UI's live-capture
tab failed the strip and then tripped the base64 decoder on the
leading 'data:' literal, surfacing as
"invalid audio base64: illegal base64 data at input byte 4"
Widened both regexes to `^data:[^,]+?;base64,` so any number of
';param=value' segments between the mime type and ';base64,' are
tolerated. Added a regression test covering the MediaRecorder
shape.
Assisted-by: Claude:Opus 4.7
* fix(insightface): scope pack ONNX loading to known manifests
LocalAI's gallery extracts buffalo_* zips flat into the models
directory, which inevitably mixes with ONNX files from other
backends (opencv face engine, MiniFASNet antispoof, WeSpeaker
voice embedding) and older buffalo pack installs. Feeding those
foreign files into insightface's model_zoo.get_model() blows up
inside the router -- it assumes a 4-D NCHW input and indexes
`input_shape[2]` on tensors that aren't shaped like a face model,
raising IndexError mid-load and leaving the backend unusable.
The router's dispatch isn't amenable to per-file try/except alone
(first-file-wins picks det_10g.onnx from buffalo_l even when the
user asked for buffalo_sc -- alphabetical order happens to favour
the wrong pack). Instead, ship an explicit manifest of the
upstream v0.7 pack contents and scope the glob to that when the
requested pack is known. The manifest is small and stable; future
packs can be added alongside or fall through to the tolerance
loop, which also swallows any remaining IndexError / ValueError
from foreign files with a clear `[insightface] skipped` stderr
line for diagnostics.
Assisted-by: Claude:Opus 4.7
* fix(speaker-recognition): extract FBank features for rank-3 ONNX encoders
Pre-exported speaker-encoder ONNX graphs come in two shapes:
rank-2 [batch, samples] -- some 3D-Speaker exports,
take raw waveform directly.
rank-3 [batch, frames, n_mels] -- WeSpeaker and most Kaldi-
lineage encoders, expect
pre-computed Kaldi FBank.
OnnxDirectEngine unconditionally fed `audio.reshape(1, -1)` --
correct for rank-2, IndexError-on-input_shape[3] on rank-3, which
surfaced to the UI as
"Invalid rank for input: feats Got: 2 Expected: 3"
Detect the input rank at session init and run Kaldi FBank
(80-dim, 25ms/10ms frames, dither=0.0, per-utterance CMN) before
the forward pass when rank>=3. All knobs are configurable via
backend options for encoders that deviate from defaults.
torchaudio.compliance.kaldi is already in the backend's
requirements (SpeechBrain pulls torchaudio in), so no new
dependency.
Assisted-by: Claude:Opus 4.7
* fix(biometrics): isolate face and voice vector stores
Face (ArcFace, 512-D) and voice (ECAPA-TDNN 192-D / WeSpeaker
256-D) biometric embeddings were colliding inside a single
in-memory local-store instance. Enrolling one after the other
failed with
"Try to add key with length N when existing length is M"
because local-store correctly refuses to mix dimensions in one
keyspace.
The registries were constructed with `storeName=""`, which in
StoreBackend() is just a WithModel() call. But ModelLoader's
cache is keyed on `modelID`, not `model` -- so both registries
collapsed to the same `modelID=""` slot and reused the same
backend process despite looking isolated on paper.
Three complementary fixes:
1. application.go -- give each registry a distinct default
namespace ("localai-face-biometrics" /
"localai-voice-biometrics"). The comment claimed
isolation, now it's actually enforced.
2. stores.go -- pass the storeName as both WithModelID and
WithModel so the ModelLoader cache key separates
namespaces and the loader spawns distinct processes.
3. local-store/store.go -- drop the Load() `opts.Model != ""`
guard. It was there to prevent generic model-loading loops
from picking up local-store by accident, but that auto-load
path is being retired; the guard now just blocks legitimate
namespace isolation. opts.Model is treated as a tag; the
per-tuple process isolation upstream handles discrimination.
Assisted-by: Claude:Opus 4.7
* fix(gallery): stale-file cleanup and upgrade-tmp directory safety
Two related robustness fixes for backend install/upgrade:
pkg/downloader/uri.go
OCI downloads passed through
if filepath.Ext(filePath) != "" ...
filePath = filepath.Dir(filePath)
which was intended to redirect file-shaped download targets
into their parent directory for OCI extraction. The heuristic
misfires on directory-shaped paths with a dot-suffix --
gallery.UpgradeBackend uses
tmpPath = "<backendsPath>/<name>.upgrade-tmp"
and Go's filepath.Ext treats ".upgrade-tmp" as an extension.
The rewrite landed the extraction at "<backendsPath>/", which
then **overwrote the real install** (backends/<name>/) with a
flat-layout file and left a stray run.sh at the top level. The
tmp dir itself stayed empty, so the validation step that
checked "<tmpPath>/run.sh" predictably failed with
"upgrade validation failed: run.sh not found in new backend"
Every manual upgrade silently corrupted the backends tree this
way. Guard the rewrite behind "target isn't already an existing
directory" -- InstallBackend / UpgradeBackend both pre-create
the target as a directory, so they get the correct behaviour;
existing file-path callers with a genuine dot-extension still
get the parent redirect.
core/gallery/backends.go
InstallBackend's MkdirAll returned ENOTDIR when something at
the target path was already a file (legacy dev builds dropped
golang backend binaries directly at `<backendsPath>/<name>`
instead of nesting them under their own subdir). That
permanently blocked reinstall and upgrade for anyone carrying
that state, since every retry hit the same error. Detect a
pre-existing non-directory, warn, and remove it before the
MkdirAll so the fresh install can write the correct nested
layout with metadata.json + run.sh.
Assisted-by: Claude:Opus 4.7
* fix(galleryop): refresh upgrade cache after backend ops
UpgradeChecker caches the last upgrade-check result and only
refreshes on the 6-hour tick or after an auto-upgrade cycle.
Manual upgrades (POST /api/backends/upgrade/:name) go through
the async galleryop worker, which completes the upgrade
correctly but never tells UpgradeChecker to re-check -- so
/api/backends/upgrades continued to list a just-upgraded backend
as upgradeable, indistinguishable from a failed upgrade, for up
to six hours.
Add an optional `OnBackendOpCompleted func()` hook on
GalleryService that fires after every successful install /
upgrade / delete on the backend channel (async, so a slow
callback doesn't stall the queue). startup.go wires it to
UpgradeChecker.TriggerCheck after both services exist. Result:
the upgrade banner clears within milliseconds of the worker
finishing.
Assisted-by: Claude:Opus 4.7
* build: prepend GOPATH/bin to PATH for protogen-go
install-go-tools runs `go install` for protoc-gen-go and
protoc-gen-go-grpc, which writes them into `go env GOPATH`/bin.
That directory isn't on every dev's PATH, and protoc resolves
its code-gen plugins via PATH, so the immediately-following
protoc invocation fails with
"protoc-gen-go: program not found"
which in turn blocks `make build` and any
`make backends/%` target that depends on build.
Prepend `go env GOPATH`/bin to PATH for the protoc invocation
so the freshly-installed plugins are found without requiring a
shell-profile change.
Assisted-by: Claude:Opus 4.7
* refactor(ui-api): non-blocking backend upgrade handler with opcache
POST /api/backends/upgrade/:name used to send the ManagementOp
directly onto the unbuffered BackendGalleryChannel, which blocked
the HTTP request whenever the galleryop worker was busy with a
prior operation. The op also didn't show up in /api/operations,
so the Backends UI couldn't reflect upgrade progress on the
affected row.
Register the op in opcache immediately, wrap it in a cancellable
context, store the cancellation function on the GalleryService,
and push onto the channel from a goroutine so the handler
returns right away. Response gains a `jobID` field and a
`message` string so clients have a consistent handle regardless
of whether the op is queued or running.
Pairs with the OnBackendOpCompleted hook added in the galleryop
commit — together the UI sees the upgrade start, watches
progress via /api/operations, and drops the "upgradeable" flag
the moment the worker finishes.
Assisted-by: Claude:Opus 4.7
gen_video's ffmpeg subprocess was relying on the filename extension to
choose the output container. Distributed LocalAI hands the backend a
staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to
.mp4 only after the backend returns, so ffmpeg saw a .tmp extension and
bailed with "Unable to choose an output format". Inference had already
completed and the frames were piped in, producing the cryptic
"video inference failed (code 1)" at the API layer.
Pass -f mp4 explicitly so the container is selected by flag instead of
by filename suffix.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: add distributed mode (experimental)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix data races, mutexes, transactions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix events and tool stream in agent chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use ginkgo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(cron): compute correctly time boundaries avoiding re-triggering
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not flood of healthy checks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not list obvious backends as text backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* tests fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactoring and consolidation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop redundant healthcheck
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* enhancements, refactorings
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(acestep-cpp): resolve relative model paths in options
The acestep-cpp backend was failing to load models because the model
paths in options (text_encoder_model, dit_model, vae_model) were being
passed to the C++ code without resolving their relative paths.
When a user configures acestep-cpp-turbo-4b, the model paths are specified
as relative paths like 'acestep-cpp/acestep-v15-turbo-Q8_0.gguf'. The
backend was passing these paths directly to the C++ code without joining
them with the model directory.
This fix:
1. Gets the base directory from the ModelFile path
2. Resolves all relative paths in options to be absolute paths
3. Adds debug logging to show resolved paths for troubleshooting
Fixes#8991
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* test: fix acestep tests to not join modeldir in options
According to code review feedback, the Options array in TestLoadModel
and TestSoundGeneration should contain just the model filenames without
filepath.Join with modelDir. The model paths are handled internally by
the backend.
* fix: change bpm parameter type to float32 to match C++ API signature
* test: fix TestLoadModel and TestSoundGeneration to use baseDir for model paths
- Modified TestLoadModel to compute baseDir from main model path and use it for relative model paths
- Modified TestSoundGeneration similarly to use baseDir for model paths
- Changed bpm parameter type from int32 to float32 to match C++ API
* Apply suggestions from code review
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(realtime): WebRTC support
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(tracing): Show full LLM opts and deltas
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* Remove HuggingFace backend support, restore other backends
- Removed backend/go/huggingface directory and all related files
- Removed pkg/langchain/huggingface.go
- Removed LCHuggingFaceBackend from pkg/model/initializers.go
- Removed huggingface backend entries from backend/index.yaml
- Updated backend/README.md to remove HuggingFace backend reference
- Restored kitten-tts, local-store, silero-vad, piper backends that were incorrectly removed
This change removes only HuggingFace backend support from LocalAI
as per the P0 priority request in issue #8963, while preserving
other backends (kitten-tts, local-store, silero-vad, piper).
Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
* Remove huggingface backend from test.yml build command
The tests-linux CI job was failing because it was trying to build the
huggingface backend which no longer exists after the backend removal.
This removes huggingface from the build command in test.yml.
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>