Commit Graph

3 Commits

Author SHA1 Message Date
LocalAI [bot]
1198d10b58 fix(traces): cap backend trace Data to keep admin UI responsive (#9960)
* fix(traces): cap backend trace Data field so the admin UI stays responsive

The previous fix (#9946) capped API trace bodies but missed backend traces,
which carry the same blast radius:

  - LLM backend traces store the full chat messages JSON, full response, and
    full streaming deltas. Every agent-pool reasoning step ships the full
    RAG-augmented history (50-500 KiB per trace, often 100+ traces queued).
  - TTS / audio_transform / transcript traces embed a 30s audio snippet as
    base64, around 1.3 MiB per trace.

Both blow the /api/backend-traces JSON past tens of MiB. The admin Traces
page then keeps re-downloading and re-parsing the buffer faster than the
5s auto-refresh and stays in the loading state forever, the same symptom
the API-side fix addressed.

Apply two complementary caps, both honoring LOCALAI_TRACING_MAX_BODY_BYTES:

Option A (safety net in core/trace): RecordBackendTrace walks the Data map
recursively and replaces any string value larger than the cap with
"<truncated: N bytes>". Catches anything a future producer forgets.

Option B (head-preserving at the producer):
  - core/backend/llm.go: TruncateToBytes on messages, response, and
    chat_deltas content/reasoning_content so the leading content stays
    readable in the UI.
  - core/trace/audio_snippet.go: omit audio_wav_base64 when the encoded
    blob would exceed the cap (truncated base64 is undecodable). The
    quality metrics still ship and the UI's WaveformPlayer simply skips
    when the field is absent.

TruncateToBytes is bounded to <= maxBytes so Option A leaves the producer's
head-preserving output alone instead of replacing it with the bare marker.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

* fix(react-ui): expose tracing_max_body_bytes in Settings and Traces panels

The setting was already plumbed through env (LOCALAI_TRACING_MAX_BODY_BYTES),
CLI flag, and the runtime_settings.json GET/PUT schema, but neither the main
Settings page nor the inline Traces panel offered an input for it. Admins
hitting the "Traces UI stuck loading" symptom had to know to set an env var
or PUT raw JSON to /api/settings to dial the cap.

Add a "Max Body Bytes" row next to "Max Items" in both places. Same input
type, same disabled-when-tracing-off semantics, placeholder shows the 65536
default so users see what they're inheriting.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

* test(react-ui): disambiguate Max Items locator after adding Max Body Bytes

The Tracing settings panel now has two number inputs. The previous spec
matched 'input[type="number"]' which became ambiguous and triggered a
Playwright strict-mode violation in CI. Switch to getByPlaceholder('100')
for Max Items and add a parallel spec for the new Max Body Bytes field
using getByPlaceholder('65536').

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-23 14:50:40 +02:00
LocalAI [bot]
2be07f61da feat(whisper): honor client cancellation via ggml abort_callback (#9710)
* refactor(transcription): propagate request ctx through ModelTranscription*

Replaces context.Background() with the HTTP request ctx so client
disconnects start cancelling the gRPC call. No backend-side abort wiring
yet — that comes in a later commit. Pure plumbing.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(cli): pass ctx to backend.ModelTranscription

Follow-up to e65d3e1f which threaded ctx through ModelTranscription
but missed the CLI caller. CLI commands have no request-scoped ctx,
so context.Background() is correct here.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(audio): propagate request ctx into TTS, sound-gen, audio-transform

Same ctx-plumbing pattern applied to the rest of the audio path. CLI
callers use context.Background() since there is no request scope; HTTP
callers use c.Request().Context().

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(backend): propagate request ctx into biometric, detection, rerank, diarization paths

Replaces remaining context.Background() sites in core/backend with the
caller's ctx. After this commit, every core/backend/*.go entry point
threads the request ctx end-to-end to the gRPC client.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(grpc): plumb ctx through AIModel.AudioTranscription{,Stream}

Adds context.Context as first parameter to the AIModel interface methods
that wrap whisper-style transcription. Server-side gRPC handler now
forwards the per-RPC ctx (server-streaming uses stream.Context()).
Whisper, Voxtral, vibevoice-cpp, and sherpa-onnx accept the parameter;
none uses it yet — the actual cancellation primitive lands in the next
commit so this is pure plumbing.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): add abort_callback hook in the C++ bridge

Installs a std::atomic<int> flag, wires it into
whisper_full_params.abort_callback, and exposes a set_abort(int) C
symbol so Go can flip the flag from a goroutine watching the request
context. transcribe() now distinguishes abort (return 2) from real
whisper_full failure (return 1).

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): register set_abort symbol in the purego loader

Adds the Go-side binding for the new C export so the next commit can
call CppSetAbort(1) from a watcher goroutine on ctx.Done().

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper): honor ctx cancellation and return codes.Canceled

A watcher goroutine watches ctx.Done() during AudioTranscription and
calls CppSetAbort(1) on cancel. whisper_full sees abort_callback return
true at the next compute graph step, returns non-zero, and the bridge
returns 2 -> AudioTranscription maps that to codes.Canceled.

Adds an opt-in test (gated on WHISPER_MODEL_PATH / WHISPER_AUDIO_PATH)
that asserts cancellation latency under 5s and proves the abort flag
resets cleanly so the next transcription succeeds.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(whisper): join the cancel watcher goroutine before returning

Follow-up to 85edf9d2. The previous commit used `defer close(done)` and
called the watcher "joined synchronously" — but close() only signals,
it does not block until the goroutine exits. That left a window where
a late CppSetAbort(1) from a cancelled call could land on the next
call, after its C-side g_abort reset but before whisper_full() began
polling the abort callback, corrupting the second transcription.

Switch to a sync.WaitGroup join so wg.Wait() blocks until the watcher
has actually returned from its select.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(whisper): short-circuit pre-cancelled ctx in AudioTranscription

If ctx is already Done() at entry, return codes.Canceled immediately
instead of running the full transcription. The C-side g_abort reset
happens at the start of transcribe() and would otherwise overwrite a
watcher-set abort flag from an already-cancelled ctx, producing a
spurious successful transcription on a request the client has already
abandoned.

Assisted-by: Claude:claude-haiku-4-5
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(tests/distributed): update testLLM mock for new AudioTranscription signature

Phase B (93c48e19) added context.Context to AIModel.AudioTranscription
but missed the testLLM mock in tests/e2e/distributed. CI golangci-lint
caught it: *testLLM did not implement grpc.AIModel because the method
signature lacked the ctx parameter, which broke the distributed test
suite compilation and cascaded through every backend-build job that
runs `go build ./...`.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(whisper): port cancellation test to Ginkgo/Gomega

Project policy (.agents/coding-style.md, enforced by golangci-lint
forbidigo) is that all Go tests must use Ginkgo v2 + Gomega — no
stdlib testing patterns (t.Skip, t.Fatalf, etc.). Convert the
cancellation test to a Describe/It block with Skip(...) for env
gating and Expect/HaveOccurred for assertions.

Same coverage: cancel mid-flight returns codes.Canceled within 5s and
a follow-up transcription succeeds, proving the C-side g_abort flag
resets cleanly.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-08 01:44:47 +02:00
Richard Palethorpe
bb033b16a9 feat: add LocalVQE backend and audio transformations UI (#9640)
feat(audio-transform): add LocalVQE backend, bidi gRPC RPC, Studio UI

Introduce a generic "audio transform" capability for any audio-in / audio-out
operation (echo cancellation, noise suppression, dereverberation, voice
conversion, etc.) and ship LocalVQE as the first backend implementation.

Backend protocol:
- Two new gRPC RPCs in backend.proto: unary AudioTransform for batch and
  bidirectional AudioTransformStream for low-latency frame-by-frame use.
  This is the first bidi stream in the proto; per-frame unary at LocalVQE's
  16 ms hop would be RTT-bound. Wire it through pkg/grpc/{client,server,
  embed,interface,base} with paired-channel ergonomics.

LocalVQE backend (backend/go/localvqe/):
- Go-Purego wrapper around upstream liblocalvqe.so. CMake builds the upstream
  shared lib + its libggml-cpu-*.so runtime variants directly — no MODULE
  wrapper needed because LocalVQE handles CPU feature selection internally
  via GGML_BACKEND_DL.
- Sets GGML_NTHREADS from opts.Threads (or runtime.NumCPU()-1) — without it
  LocalVQE runs single-threaded at ~1× realtime instead of the documented
  ~9.6×.
- Reference-length policy: zero-pad short refs, truncate long ones (the
  trailing portion can't have leaked into a mic that wasn't recording).
- Ginkgo test suite (9 always-on specs + 2 model-gated).

HTTP layer:
- POST /audio/transformations (alias /audio/transform): multipart batch
  endpoint, accepts audio + optional reference + params[*]=v form fields.
  Persists inputs alongside the output in GeneratedContentDir/audio so the
  React UI history can replay past (audio, reference, output) triples.
- GET /audio/transformations/stream: WebSocket bidi, 16 ms PCM frames
  (interleaved stereo mic+ref in, mono out). JSON session.update envelope
  for config; constants hoisted in core/schema/audio_transform.go.
- ffmpeg-based input normalisation to 16 kHz mono s16 WAV via the existing
  utils.AudioToWav (with passthrough fast-path), so the user can upload any
  format / rate without seeing the model's strict 16 kHz constraint.
- BackendTraceAudioTransform integration so /api/backend-traces and the
  Traces UI light up with audio_snippet base64 and timing.
- Routes registered under routes/localai.go (LocalAI extension; OpenAI has
  no /audio/transformations endpoint), traced via TraceMiddleware.

Auth + capability + importer:
- FLAG_AUDIO_TRANSFORM (model_config.go), FeatureAudioTransform (default-on,
  in APIFeatures), three RouteFeatureRegistry rows.
- localvqe added to knownPrefOnlyBackends with modality "audio-transform".
- Gallery entry localvqe-v1-1.3m (sha256-pinned, hosted on
  huggingface.co/LocalAI-io/LocalVQE).

React UI:
- New /app/transform page surfaced via a dedicated "Enhance" sidebar
  section (sibling of Tools / Biometrics) — the page is enhancement, not
  generation, so it lives outside Studio. Two AudioInput components
  (Upload + Record tabs, drag-drop, mic capture).
- Echo-test button: records mic while playing the loaded reference through
  the speakers — the mic naturally picks up speaker bleed, giving a real
  (mic, ref) pair for AEC testing without leaving the UI.
- Reusable WaveformPlayer (canvas peaks + click-to-seek + audio controls)
  and useAudioPeaks hook (shared module-scoped AudioContext to avoid
  hitting browser context limits with three players on one page); migrated
  TTS, Sound, Traces audio blocks to use it.
- Past runs saved in localStorage via useMediaHistory('audio-transform') —
  the history entry stores all three URLs so clicking re-renders the full
  triple, not just the output.

Build + e2e:
- 11 matrix entries removed from .github/workflows/backend.yml (CUDA, ROCm,
  SYCL, Metal, L4T): upstream supports only CPU + Vulkan, so we ship those
  two and let GPU-class hardware route through Vulkan in the gallery
  capabilities map.
- tests-localvqe-grpc-transform job in test-extra.yml (gated on
  detect-changes.outputs.localvqe).
- New audio_transform capability + 4 specs in tests/e2e-backends.
- Playwright spec suite in core/http/react-ui/e2e/audio-transform.spec.js
  (8 specs covering tabs, file upload, multipart shape, history, errors).

Docs:
- New docs/content/features/audio-transform.md covering the (audio,
  reference) mental model, batch + WebSocket wire formats, LocalVQE param
  keys, and a YAML config example. Cross-links from text-to-audio and
  audio-to-text feature pages.

Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit Write Agent TaskCreate]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-04 22:07:11 +02:00