The llama-cpp-localai-paged patches/ dir had accumulated docs, plots, a csv,
dev .cpp harnesses, and a dead FP4-MoE kernel scaffold after an earlier git-mv.
Restore the invariant that patches/ holds only the .patch series.
Moves:
- patches/paged/README.md -> README.md (canonical doc at the backend root)
- patches/paged/{PIN_SYNC_c299a92c,PAGED_BITEXACT_NOTE,LOCALAI_LLAMACPP_BACKEND_PLAN,UPSTREAM_LAYER2_SCOPE}.md,
final_benchmark.csv, qwen36_*.png, paged-burst-bench.cpp, paged-reclaim-unit.cpp -> docs/
- patches/README.md -> docs/PATCH_MAINTENANCE.md (unique patch-regen recipe not in the canonical README)
Deletes:
- patches/BENCHMARKS.md (superseded by README section 4 + the dev-notes section)
- patches/kernel/ (dead FP4-MoE scaffold, never in the 0001-0030 apply glob, zero refs repo-wide)
Repoint every reference to the moved files: the README's internal links (the docs/
links, plus the .github links, which drop from 5x ../ to 3x ../), .agents/llama-cpp-localai-paged-backend.md,
.github/scripts/paged-canary-apply.sh, .github/workflows/llama-cpp-paged-canary.yml,
the wrapper Makefile, backend/cpp/llama-cpp/grpc-server.cpp, backend/index.yaml,
docs/content/features/backends.md, gallery/index.yaml.
The build apply glob PAGED_PATCHES_DIR/0*.patch (PAGED_PATCHES_DIR := .../patches/paged)
is unchanged and still resolves to the 28 patches.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Move ALL paged-attention content out of the stock backend/cpp/llama-cpp
backend and into backend/cpp/llama-cpp-localai-paged, so the stock backend is
pure upstream llama.cpp and the paged backend owns and applies its own vendored
patch series.
- Delete the dead early-exploration scaffold backend/cpp/llama-cpp/paged/
(kernel/w4a16 Marlin scaffold, standalone paged_kv_manager, bench/loadgen,
its own 0001-0002 patches, dense-era design docs, tests). Zero references
repo-wide.
- Move backend/cpp/llama-cpp/patches/ (the 28-patch paged series + paged/README
+ 3 operational docs, plus the kernel/ scaffold patch and the top-level paged
README/BENCHMARKS) to backend/cpp/llama-cpp-localai-paged/patches/. The stock
backend keeps no patches/ dir; it had no non-paged base patches.
- Purify the stock backend: remove the LLAMA_PAGED make variable, the
patches/paged apply loop, and the LLAMA_PAGED passthrough to prepare.sh;
remove the paged-series handling from prepare.sh. The stock llama.cpp target
now only clones the pin and applies its own (currently empty) base patches/
series. The runtime paged option hooks in the shared grpc-server.cpp are
untouched (inert without the patches).
- The paged backend's Makefile now applies its OWN patches/paged/0*.patch onto
each freshly cloned tree via strict git apply (apply-paged-patches), after the
copied stock infra clones the pin and applies base patches.
- Repoint every reference to the old patches/paged path: the upstream canary
workflow + apply script, bump_deps.yaml, gallery/index.yaml, the docs,
backend/index.yaml, backend-matrix.yml, the top-level Makefile comments, and
the moved PIN_SYNC / README docs. Drop the now-removed LLAMA_PAGED=on
build-toggle from comments.
Verified: the full 28-patch series applies strict-clean (git apply, exit 0) to
a clean ggml-org/llama.cpp checkout at the pinned c299a92c, and the repointed
canary apply script resolves and applies the series end to end.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The paged-attention patch directory had accumulated ~55 scattered dev docs
(results, progress, scope, lever, and gap-analysis notes). Consolidate the
durable content of all of them into one canonical
backend/cpp/llama-cpp/patches/paged/README.md covering: what the patchset is,
the architecture (paged KV + block-table flash-attn, the gated-DeltaNet SSM
decode path, NVFP4 FP4-MMA, the decode-first scheduler), the full 0001-0030
patch series table with bit-exact status, the GB10 benchmarks
(patched-vs-stock-vs-vLLM + the Apple M4 architectural note), the dev notes
(bit-exact methodology, the per-path gate, the MoE-parity conclusion, the
rejected/flat levers, the opt-in bf16-SSM mode), arch+quant generality, the
pin + canary maintenance policy, and the published NVFP4 gallery models.
Delete the consolidated-away dev trail. Keep the three operational docs the
README links to: PIN_SYNC_c299a92c.md (canary reference), PAGED_BITEXACT_NOTE.md
(per-path gate reference) and LOCALAI_LLAMACPP_BACKEND_PLAN.md (the
ship-as-own-backend design-of-record), plus the benchmark plots + csv. The
.patch files and the unit/bench .cpp are untouched.
Repoint every external reference to a deleted doc at the new README:
grpc-server.cpp, docs/content/features/backends.md, gallery/index.yaml, the
canary apply script (PIN_BUMP_APPLY_CHECK.md -> README), and the base
patches/README.md (ADDITIVE_DESIGN.md -> README). The canary's PIN_SYNC
reference still resolves; its inert SSM_DECODE_FIX_RESULTS.md glob (a
patch-internal path matcher, not a repo-doc link) is left intact.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Advance the paged-attention backend's owned llama.cpp pin by 23 upstream
commits. The shipped source-only patch series (0001-0030, 28 patches) applies
strict-clean (git apply, exit 0) on a fresh c299a92c checkout with no re-export
needed, and the bit-exact gate is GREEN on every path on GB10 (CUDA sm_121):
- md5 greedy decode (-ngl 99 -fa on -n 48 --temp 0 --seed 1): dense
non-paged/paged 5951a5b4, MoE non-paged 07db32c2, MoE paged 8cb0ce23; all
match the established baselines.
- test-backend-ops CUDA0: SSM_CONV 45/45, SSM_CONV_UPDATE 16/16,
SSM_CONV_UPDATE_IDS 16/16, GATED_DELTA_NET 84/84, MUL_MAT 1146/1146,
MUL_MAT_ID 806/806; all OK.
The 23-commit upstream jump did not change our decode output. The .patch files
are kept byte-identical (they already apply strict-clean at the new pin); only
the pin, the PIN_SYNC evidence doc, and the canary/gallery doc references change.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The six LocalAI-paged NVFP4 entries advertised GB10 throughput figures with
no machine-readable hardware signal, and the four qwopus/MTP entries lacked
the nvfp4 tag entirely (not discoverable as NVFP4). Per the cross-arch audit
(ARCH_GENERALITY_AUDIT.md section gallery-targeting), NVFP4 GGUFs run
everywhere via dequant (never fail), so the gap is one of performance
expectations, not correctness; the only available lever is description + tags.
- Add the nvfp4 tag to the four qwopus/MTP entries that lacked it; the two
base qwen3.6 entries already had it.
- Add a blackwell tag to all six (precedent: the nvidia hardware tag is
already used on many gallery entries as a filter chip).
- Lead each of the six descriptions with a one-line Blackwell-recommended /
runs-slower-off-Blackwell caveat.
- Scope the qwen3.6-27b 90-117% of vLLM claim explicitly to GB10 / DGX Spark
(consumer Blackwell) so it is not read as a universal figure.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The dense + MoE base NVFP4 GGUFs are live (huggingface.co/mudler/Qwen3.6-27B-NVFP4-GGUF
and .../Qwen3.6-35B-A3B-NVFP4-GGUF), sha256 verified vs the Hub LFS hash, uris resolve.
Replaces the placeholder/not-yet-published TODO.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Rename the two base NVFP4 entries to a consistent -paged suffix
(qwen3.6-27b-nvfp4 -> qwen3.6-27b-nvfp4-paged, qwen3.6-35b-a3b-nvfp4 ->
qwen3.6-35b-a3b-nvfp4-paged) so all four base/MTP paged entries share the
naming convention. Update the two matching examples in the backend plan doc.
Add qwopus3.6-27b-v2-mtp-nvfp4-paged and qwopus3.6-27b-coder-mtp-nvfp4-paged:
verbatim copies of the stock qwopus NVFP4-MTP entries (same GGUF uri/sha256,
sampling, template, tags, function block) rewired onto the LocalAI
paged-attention stack (backend llama-cpp-localai-paged; f16, flash_attention,
131072 context, 99 gpu_layers, batch 512; paged_kv + max_batch_tokens:512 +
kv_unified:false + parallel:128). The stock entries are left untouched.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Add qwen3.6-27b-nvfp4-mtp-paged and qwen3.6-35b-a3b-nvfp4-mtp-paged: the
existing michaelw9999 NVFP4-MTP GGUFs (same uri/sha256/filename and the
recommended Qwen3.6 sampling defaults) wired to backend
llama-cpp-localai-paged with our optimized paged options (f16, flash
attention, 128k context, gpu_layers 99, batch 512, paged_kv, decode-first
max_batch_tokens, kv_unified:false, parallel:128).
These coexist with the stock llama-cpp *-nvfp4-mtp entries (distinct
-paged names) so the four LocalAI-paged NVFP4 entries sit together at the
top of the gallery.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Patch 0026 added the hybrid per-head bf16 SSM-state opt-in as the
ssm_hybrid_tau_thresh cparam + the --ssm-bf16-tau CLI flag (default 0 =
bit-exact f32). Expose it per-model via the LocalAI gallery/model YAML
`options:` list, mirroring the paged_kv / max_batch_tokens setenv hooks.
- grpc-server.cpp: new `ssm_bf16_tau` (alias `ssm_hybrid_tau`) option ->
setenv(LLAMA_SSM_BF16_TAU) when the value parses to a positive float. It
does NOT reference the paged-only common_params field, so the turboquant
fork (which lacks patch 0026) stays byte-clean.
- patch 0026 (common.cpp common_context_params_to_llama): getenv fallback
feeds cparams.ssm_hybrid_tau_thresh from LLAMA_SSM_BF16_TAU only when the
--ssm-bf16-tau CLI flag is unset (0). Absent/non-positive env => untouched,
so stock stays bit-exact; the CLI flag takes precedence when set.
- docs: backend/index.yaml note, docs backends.md, gallery header NOTE
(referencing A_HYBRID_SSM_RESULTS.md; the 2 NVFP4 entries stay bit-exact).
Byte-safe when unset: with no ssm_bf16_tau option the env is never touched
and the default f32 bit-exact recurrence is preserved. Verified the parse +
consume code paths with a standalone compile-and-run (option string ->
LLAMA_SSM_BF16_TAU -> tau, plus 0 / garbage / CLI-precedence / unset cases).
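The precedence rule described above can be modelled in a few lines. This is a
minimal Go sketch, not the shipped code (which is C++ in grpc-server.cpp and
patch 0026). The names envValueForOption and effectiveTau are hypothetical, and
strconv.ParseFloat stands in for whatever float parser the C++ side uses, so
edge cases such as "inf" may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// envValueForOption models the grpc-server side: the option value is exported
// as LLAMA_SSM_BF16_TAU only when it parses to a positive float.
// Hypothetical helper; the real hook calls setenv from C++.
func envValueForOption(value string) (string, bool) {
	v, err := strconv.ParseFloat(value, 32)
	if err != nil || !(v > 0) {
		return "", false // 0, negatives, NaN and garbage leave the env untouched
	}
	return value, true
}

// effectiveTau models the patch 0026 fallback: a non-zero CLI value wins,
// otherwise a positive env value is used, otherwise the f32 default (0).
func effectiveTau(cliTau float32, env string) float32 {
	if cliTau != 0 {
		return cliTau
	}
	v, err := strconv.ParseFloat(env, 32)
	if err != nil || !(v > 0) {
		return 0
	}
	return float32(v)
}

func main() {
	if env, ok := envValueForOption("0.25"); ok {
		fmt.Println("LLAMA_SSM_BF16_TAU =", env)
	}
	fmt.Println("env fallback:  ", effectiveTau(0, "0.25"))
	fmt.Println("CLI precedence:", effectiveTau(0.5, "0.25"))
	fmt.Println("garbage env:   ", effectiveTau(0, "junk"))
	fmt.Println("unset env:     ", effectiveTau(0, ""))
}
```

The four calls in main correspond to the env-fallback, CLI-precedence, garbage
and unset cases listed in the verification note.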
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Sync to master (12 commits) + the llama.cpp pin bump 8be759e6 -> 9d5d882d.
Conflicts resolved:
- Makefile .NOTPARALLEL: union (keep both backends/llama-cpp-localai-paged and
master's backends/privacy-filter-darwin).
- gallery/index.yaml: our 2 base NVFP4 entries (qwen3.6-27b-nvfp4, qwen3.6-35b-a3b-nvfp4)
for the paged backend prepended to master's full list; master keeps its own
*-nvfp4-mtp variants (distinct entries). Go build + YAML validated; the 8 duplicate
gallery names are pre-existing in master, not introduced here.
The patchset still needs re-verification against the new tip (pin-sync, next step).
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Add a new backend: the stock llama-cpp grpc-server plus the paged patchset (forces
LLAMA_PAGED=on), shipped as its own meta-backend. It mirrors turboquant but is
simpler: no fork pin and no grpc-server patching, since the paged runtime hooks
already exist in grpc-server.cpp. Stock llama-cpp is untouched (LLAMA_PAGED?=on
retained; the de-risk flip is deferred for sign-off). Gallery: qwen3.6-27b-nvfp4
(dense) and qwen3.6-35b-a3b-nvfp4 (MoE) with the benchmark run config (paged_kv,
max_batch_tokens, parallel, flash_attention, f16) and mudler/ GGUF uris (sha256
TODO until publish). Also adds the importer dropdown entry and tests.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier
Follow-up to the NER tier engine (#10360), already on master. This carries
only the incremental review fixes and tests that postdate that merge — the
feature itself is not re-introduced.
Review fixes:
- openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText
(the `elem < 0` guard above already returns).
- application.go: collapse ResolvePIIPolicy's inline re-implementation of
PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the
"explicit pii.enabled wins, else cloud-proxy default" rule) and return a literal
true after the !enabled guard, where the value is provably true.
- pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in
patternDetector.Detect into one local.
- grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat
bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire —
dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000]
is rejected here with an actionable error; >1000 still fails closed via
Parse. Specs pin the relationship so the guard can't silently revert.
- PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and
force the DOM value back when clamping (min={0} only constrained the spinner,
so a negative reached saved config and silently disabled the length filter).
Tests:
- piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead).
- model-config.spec.js: assert the min_len clamp, and that entity_actions
collapses a duplicate group to a single row (map semantics; regression guard
against emitting an array that drops a row on save).
- tests/e2e-backends: token_classify capability driving the TokenClassify gRPC
RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned
spans (entity.Text == text[start:end]) at threshold 0. Verified on CPU via
`make test-extra-backend-privacy-filter` (3/3 specs).
- Makefile: test-extra-backend-privacy-filter wrapper.
- tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact
(mask + block) through the full HTTP -> detector -> redactor path; gated on
PII_NER_MODEL_GGUF so the default suite is unaffected.
- .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job
running the container harness on CPU.
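The span assertion used by the token_classify e2e spec (entity.Text ==
text[start:end], with rune-aligned byte offsets) can be expressed as a small
predicate. This is a hedged sketch, not the test harness itself: spanMatches and
onRuneBoundary are hypothetical helpers, and the check assumes the input text is
valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// onRuneBoundary reports whether byte offset i falls between two runes of s
// (or at its end). Assumes s is valid UTF-8.
func onRuneBoundary(s string, i int) bool {
	return i == len(s) || (i >= 0 && i < len(s) && utf8.RuneStart(s[i]))
}

// spanMatches checks that [start,end) is a byte range aligned to rune
// boundaries and that slicing the text with it reproduces the entity text.
func spanMatches(text, entity string, start, end int) bool {
	if start < 0 || end < start || end > len(text) {
		return false
	}
	if !onRuneBoundary(text, start) || !onRuneBoundary(text, end) {
		return false
	}
	return text[start:end] == entity
}

func main() {
	text := "José lives in Zürich"
	// "é" and "ü" are two bytes each, so byte offsets differ from rune counts.
	fmt.Println(spanMatches(text, "José", 0, 5))
	fmt.Println(spanMatches(text, "Zürich", 15, 22))
	// Offset 4 lands inside "é" and must be rejected.
	fmt.Println(spanMatches(text, text[0:4], 0, 4))
}
```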
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(gallery): add privacy-filter-nemotron (f16 + q8)
GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English
PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from
openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing
privacy-filter-multilingual entry, trading language breadth for category depth.
- privacy-filter-nemotron: F16 reference artifact (~2.8 GB).
- privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge
use; description notes the size/speed tradeoff and to validate on your own
data (a single dropped span is a PII leak).
Both run on the privacy-filter backend with known_usecases [token_classify] and
a default mask policy (min_score 0.5); operators add per-category entity_actions
as needed. sha256s taken from the HF repo's LFS object ids.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ced): sketch sound-classification backend (CED audio tagger)
Wires ced.cpp (CED, 527-class AudioSet sound-event tagger; baby cry,
footsteps, glass, alarms, dog bark) into LocalAI as a Go/purego backend.
SKETCH (the backend skeleton is real; the core REST wiring and the CI/gallery
registration are a checklist in DESIGN.md):
- backend/backend.proto: new SoundDetection rpc + SoundClass messages
(run `make protogen-go` to regenerate pkg/grpc/proto).
- backend/go/ced: main.go (purego dlopen libced.so + ced_capi.h),
goced.go (Ced gRPC backend: Load + SoundDetection), Makefile
(clone-at-pin CED_VERSION, ggml static-PIC shared build), run.sh,
package.sh, .gitignore.
- DESIGN.md: REST /v1/audio/classification wiring (handler/route/capability
registration checklist), gallery/index + CI registration, and a scoping
note for the realtime/websocket live-recognition path (sliding-window
classify over the existing ws transport + voicegate; the ced C-API
per-PCM entry point is already window-friendly).
Backend code does not compile until protogen-go regenerates the pb types
and a libced.so is built (Makefile clones+builds it).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): REST /v1/audio/classification endpoint + capability registration
Wires the ced sound-event classification backend (AudioSet audio tagger)
end to end through the REST surface, mirroring the transcription path.
- Handler: core/http/endpoints/openai/sound_classification.go parses the
multipart audio upload, temp-files it, resolves the model config and
calls the SoundDetection RPC; returns {model, detections[]} JSON.
- Backend wrapper: core/backend/sound_classification.go (ModelSoundDetection)
loads the model and normalizes the proto response into schema types.
- Schema: core/schema/sound_classification.go (SoundClassificationResult).
- gRPC layer: SoundDetection wired through the LocalAI wrapper (interface,
Backend client, Client, embed, server, base default) so the loader-typed
client exposes the RPC; proto regenerated via make protogen-go.
- Route: POST /v1/audio/classification (+ /audio/classification alias) with
the audio/multipart default-model middleware in routes/openai.go.
- Capability surfaces: swagger @Tags/@Router on the handler;
FLAG_SOUND_CLASSIFICATION usecase flag + UsecaseSoundClassification + UsecaseInfoMap +
GuessUsecases + ModalityGroups + GetAllModelConfigUsecases; meta usecase
option; /api/instructions audio area updated; auth RouteFeatureRegistry +
FeatureAudioClassification (APIFeatures, default ON) + FeatureMetas; UI
usecaseFilters, capabilities.js CAP_SOUND_CLASSIFICATION, Models.jsx filter
+ i18n; docs page features/audio-classification.md + whats-new + crosslink.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): realtime sound-event detection over the websocket API
When a realtime pipeline configures a sound-classification model, each
VAD-committed utterance (the same window the transcription path produces)
is also run through the CED sound-event classifier and the scored AudioSet
tags are emitted as a new server event. No new backend rpc is needed: the
SoundDetection gRPC method already exists on this branch.
- config: add Pipeline.SoundDetection (yaml/json sound_detection,omitempty)
beside Transcription/VAD.
- realtime: add Model.SoundDetection(ctx, audio, topK, threshold) to the
ModelInterface; implement it on wrappedModel and transcriptOnlyModel by
calling backend.ModelSoundDetection with the session's sound-classification
model config (mirrors how Transcribe dispatches). Load the optional config
in newModel / newTranscriptionOnlyModel; nil config keeps it additive.
- types: add ConversationItemSoundDetectionEvent (item_id, content_index,
detections[]{label,score,index}) with type conversation.item.sound_detection,
its ServerEventType constant and MarshalJSON, mirroring the transcription
completed event.
- realtime: add emitSoundDetection (unary path: classify the committed window,
build the event, t.SendEvent) and wire it at the utterance-commit hook right
after emitTranscription; gated on session.SoundDetectionEnabled (resolved
from Pipeline.SoundDetection at session setup, defaults top_k=5, threshold=0).
Its error is logged via xlog but never aborts the turn.
- test: Ginkgo specs for emitSoundDetection (tags emitted, empty detections,
classifier error) plus a SoundDetection method on the fakeModel double.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ced): implement SoundDetection in nodes backend test doubles
The SoundDetection method added to the grpc backend interface left two
test doubles (fakeBackendClient, fakeGRPCBackend) incomplete, so
core/services/nodes failed to compile under `go vet`/`go test` (go build
missed it: the doubles live in _test.go). Add the method to both,
mirroring their existing Detect mock. Repairs CI for the nodes package.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): decouple realtime sound detection from VAD (sound-only sessions)
Sound-event detection must activate on sounds, not speech, so it no longer
runs through the voice VAD/transcription path. A sound-detection-only
pipeline (sound_detection set, no transcription/LLM) now:
- is accepted by prepareRealtimeConfig (sound_detection counts as a pipeline
stage),
- builds a lightweight model via newSoundDetectionOnlyModel (no VAD/STT/LLM/TTS
loaded), and
- defaults the session to turn_detection none (no VAD) with no transcription
stage, so the client drives windowing via input_audio_buffer.commit
(option A: client-side sliding window). The per-PCM C-API already supports
arbitrary windows.
commitUtterance gains a sound-only branch: it emits the
conversation.item.sound_detection event (scored AudioSet tags) and stops -
no transcription, no LLM response. generateResponse is now guarded on a
transcription stage being present, so a sound-only turn never invokes the LLM.
Existing transcription/VAD sessions are unchanged (additive). Added a
commitUtterance sound-only Ginkgo spec asserting it emits the sound event and
neither transcribes nor generates a response. go vet + golangci-lint
(new-from-merge-base) clean; openai suite green.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): register sound-classification backend in gallery + CI
Mechanical backend-image registration for the ced sound-event classifier,
mirroring the parakeet-cpp Go/purego backend everywhere it is wired up.
- .github/backend-matrix.yml: add the ced build matrix, field-for-field copies
of the parakeet-cpp entries (cpu amd64/arm64, cublas cuda 12/13 amd64,
l4t cuda-13 arm64, l4t-jetpack cuda-12 arm64, sycl f32/f16, vulkan
amd64/arm64, rocm hipblas, and the metal darwin entry), changing only
backend and tag-suffix. dockerfile stays ./backend/Dockerfile.golang.
- backend/index.yaml: add the &ced meta anchor (capabilities map per platform)
plus ced-development and the per-arch image entries, each uri/mirror
tag-suffix matching the matrix exactly. The model gallery (GGUF) entry is
intentionally deferred pending the HuggingFace publish (TODO note inline).
- scripts/changed-backends.js: add an explicit item.backend === "ced" branch in
inferBackendPath mapping to backend/go/ced/, same mechanism and ordering as
the parakeet-cpp branch (before the generic golang fallthrough).
- .github/workflows/bump_deps.yaml: register mudler/ced.cpp -> CED_VERSION in
backend/go/ced/Makefile so the daily bot bumps the pin.
- swagger/{docs.go,swagger.json,swagger.yaml}: regenerated via make swagger so
the existing /v1/audio/classification annotations land in the generated spec.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): server-side windowing for realtime sound detection (option B)
Adds an optional server-driven sliding-window classifier so a sound-only
realtime client only has to stream audio (no input_audio_buffer.commit):
- Pipeline.sound_detection_window_ms / sound_detection_hop_ms config knobs.
When both > 0 on a sound-only session, the server classifies the last
window of streamed audio every hop and emits a conversation.item.sound_detection
event; the input buffer is trimmed to one window so a long
stream stays bounded. When unset, the session stays client-driven
(option A). Runs independent of VAD (sound events are not speech).
- handleSoundWindow (ticker) + classifySoundWindow (one tick, extracted so
it is unit-testable) + writeWindowWAV, which declares the true
InputSampleRate (NewWAVHeaderWithRate) so the classifier resamples
correctly. Goroutine is started after toggleVAD and torn down with the
session (close + wg.Wait).
- Register pipeline.sound_detection (+window_ms/hop_ms) in the config meta
registry; the earlier realtime commit added pipeline.sound_detection
without a registry entry, failing TestAllFieldsHaveRegistryEntries. This
fixes that and covers the two new knobs.
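The window/hop behaviour described above (classify the last window on each hop,
then trim the buffer to one window) can be sketched as follows. This is an
illustration under assumptions, not the classifySoundWindow implementation: the
helper names, the 16-bit PCM buffer type and the 16 kHz rate are all
hypothetical choices made for the example.

```go
package main

import "fmt"

// windowSamples converts a window length in milliseconds to a sample count.
func windowSamples(sampleRate, windowMs int) int {
	return sampleRate * windowMs / 1000
}

// lastWindow returns the most recent n samples, or ok=false when the buffer
// does not yet hold a full window (the "too-little audio" no-op case).
func lastWindow(buf []int16, n int) (window []int16, ok bool) {
	if n <= 0 || len(buf) < n {
		return nil, false
	}
	return buf[len(buf)-n:], true
}

// trimToWindow keeps only the last n samples so a long stream stays bounded.
func trimToWindow(buf []int16, n int) []int16 {
	if n < 0 || len(buf) <= n {
		return buf
	}
	trimmed := make([]int16, n)
	copy(trimmed, buf[len(buf)-n:])
	return trimmed
}

func main() {
	const sampleRate, windowMs, hopMs = 16000, 1000, 500 // assumed values
	n := windowSamples(sampleRate, windowMs)
	hop := windowSamples(sampleRate, hopMs)

	var buf []int16
	for tick := 1; tick <= 4; tick++ {
		buf = append(buf, make([]int16, hop)...) // one hop of streamed audio
		if w, ok := lastWindow(buf, n); ok {
			fmt.Printf("tick %d: classify %d samples\n", tick, len(w))
		} else {
			fmt.Printf("tick %d: not enough audio yet\n", tick)
		}
		buf = trimToWindow(buf, n)
	}
}
```

Copying into a fresh slice in trimToWindow, rather than reslicing, lets the
older samples be garbage-collected; whether the real code does this is not
stated in the commit.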
Tests: classifySoundWindow emits an event + trims the buffer to one window,
no-ops on too-little audio; writeWindowWAV declares the given sample rate.
go build/vet + golangci-lint (new-from-merge-base) clean; config + openai
suites green.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): add ced-base GGUF model gallery entries (f16 + q8_0)
The ced-base weights are now published at mudler/ced-base-gguf (Apache-2.0,
converted from mispeech/ced-base). Adds gallery/ced.yaml (backend: ced +
known_usecases: sound_classification) and two gallery/index.yaml entries
(ced-base-f16 default, ced-base-q8 smallest) with sha256-pinned files, and
removes the now-resolved TODO from backend/index.yaml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ced): add tiny/mini/small GGUF model gallery entries
Publishes the rest of the CED family (same architecture, metadata-driven port
verified end-to-end on ced-tiny) to mudler/ced-{tiny,mini,small}-gguf and adds
their f16 + q8_0 gallery entries:
  ced-tiny  (5.5M, edge/Pi-class)  f16 11MB / q8_0  6MB
  ced-mini  (9.6M)                 f16 19MB / q8_0 11MB
  ced-small (22M)                  f16 42MB / q8_0 23MB
All sha256-pinned. ced-base remains the accuracy default.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ced): point gallery entries at the consolidated mudler/ced-gguf repo
All CED quantizations (tiny/mini/small/base, f16/q8_0) now live in a single
HuggingFace repo, mudler/ced-gguf, instead of per-model repos. Repoint the 8
gallery model entries' urls + file uris accordingly. sha256 and filenames are
unchanged.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ced): bump CED_VERSION to the short-clip fix
Pin the ced backend to ced.cpp 99c6ed3, which fixes a crash on any clip
shorter than target_length (~10.11s): time_pos_embed was added at its full
63-frame grid instead of being sliced to the clip's actual time grid, tripping
ggml_can_repeat in ggml_add. Surfaced by the live realtime e2e (sub-10s
windows) and gated with a short-clip parity test upstream.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(ced): list ced.cpp as a LocalAI-team engine + backend-guide directive
- README.md: add ced.cpp to the "native C/C++/GGML engines developed and
maintained by the LocalAI project" table.
- docs/content/features/backends.md: add a Sound Classification backend
category (sound-event classification / audio tagging) listing ced.cpp.
- .agents/adding-backends.md: add a "Documenting the backend" section and two
verification-checklist items requiring new backends to be documented in the
backends.md category list, and in-house native engines to be added to the
README maintained-engines table. This directive was missing.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ced): repin CED_VERSION to the v0.1.0 release commit
ced.cpp history was squashed into a single release commit (tagged v0.1.0), so
the previous pin (99c6ed3) no longer exists upstream. Pin to c04ac14, the
v0.1.0 release commit, so the backend builds against a commit that exists.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ced): silence gosec G304/G103 + govet unsafeptr on audited paths
- sound_classification.go: os.Create(dst) where dst = temp dir + path.Base of
the upload (no traversal). #nosec G304, matching the depth-anything-cpp handler.
- goced.go: reading a NUL-terminated C string from a libced-owned buffer.
#nosec G103 (gosec) + //nolint:govet (golangci-lint's unsafeptr check), since
the uintptr is a C-owned malloc'd buffer, not Go-GC memory.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add Depth Anything V2 models + bump native version
Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is
depth-only (no camera pose, no confidence) and ships both relative
(relative inverse depth) and metric (depth in metres) variants. The Go
backend is model-agnostic, so no backend code changes are required — only
a native version bump and new gallery entries.
- backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the
depth-anything.cpp commit that adds the DA2 engine + C-API routing
(e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag
da2-support so it survives a squash-merge).
- gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus
Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models
carry the metric-depth tag; none carry camera-pose.
Assisted-by: Claude:claude-opus-4-8
* chore(depth-anything-cpp): pin to merged DA2 master commit
PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint
the pin from the pre-merge commit to the canonical master commit.
Assisted-by: Claude:claude-opus-4-8
---------
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase). Net change:
- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning
whole conversations as one document), with a recreated bounded
restricted-regex secret-matching pattern detector tier alongside it (per-model
pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
filter; token_classify known_usecase; batch sized to context for NER models.
privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
meta + image entries with a capabilities map) matching its CI matrix jobs,
and an /import-model auto-detect importer (PrivacyFilterImporter, narrow
privacy-filter GGUF detection) replacing the prior pref-only registration.
Reconciled against master's independent evolution:
- Dropped master's PIIPatternOverrides feature (global-pattern runtime
overrides + /api/pii/patterns API + runtime_settings.json persistence). The
per-model NER + pattern-detector design supersedes it; it was built on the
global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
removed the patch and restored master's grpc-server.cpp Score RPC (direct
llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
model_config validation forbidding score + chat/completion/embeddings on
llama-cpp. token_classify is unaffected (it runs on the privacy-filter
backend, not llama-cpp).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
feat(ds4): wire SSD streaming + quality engine options, add 128GB DeepSeek gallery models
The ds4 backend zero-initialized ds4_engine_options and exposed none of the
engine's tunable knobs, so SSD streaming (run a model larger than RAM by
streaming routed MoE experts from the GGUF on SSD) and the quality/perf knobs
were unreachable from LocalAI model YAMLs.
Map ModelOptions.Options onto ds4_engine_options through a declarative table
(kEngineOptSpecs + apply_engine_option) instead of per-field branches: the
struct is fixed C with no reflection, so the field set is enumerated once and a
future knob is a one-line table row. Two fields use ds4's own typed parsers
(GiB budgets, cache-experts count-or-NGB). Bare flags (e.g. "ssd_streaming")
mean true; path-type options (mtp_path, expert_profile_path,
directional_steering_file) resolve relative to the model directory so a gallery
entry can reference a companion file by bare filename. mtp_draft/mtp_margin are
now validated rather than parsed with throwing std::stoi/std::stof.
Add gallery entries for the 128 GB class:
- deepseek-v4-flash-q2-q4 (~91 GB, mixed q2/q4, fits RAM, higher quality)
- deepseek-v4-flash-q4-ssd (~153 GB full 4-bit, runs on 128 GB via SSD streaming)
- deepseek-v4-flash-q2-mtp (~81 GB + MTP speculative draft weights)
- deepseek-v4-pro-q2-ssd (~433 GB Pro, experimental SSD streaming)
SSD streaming is Metal (Darwin) only; the options are inert on CUDA/CPU.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): add depth-anything-3-metric-large gallery entry
DA3METRIC-LARGE (ViT-L) single-file metric-scale depth + sky, served by the
existing depth-anything backend (same single-GGUF path as mono-large). GGUF
published at mudler/depth-anything.cpp-gguf.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): serve nested metric model (two-file load)
The DA3 nested model needs both branches (anyview GIANT + metric ViT-L) loaded
together. Wire it through the backend:
- Load reads a 'metric_model:<file>' entry from ModelOptions.Options and, when
present, calls da_capi_load_nested(anyview, metric) instead of da_capi_load
(registers the new abi-4 symbol; helper optionValue + unit test).
- gallery: depth-anything-3-nested (model=anyview, options=metric branch, both
GGUFs fetched) for metric-scale depth + pose.
- bump depth-anything.cpp pin to cce5edc (abi 4 / da_capi_load_nested).
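
A minimal sketch of the option lookup described above. The helper name optionValue comes from this commit, but its signature here is an assumption, and the GGUF filename in main is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// optionValue returns the value of the first "key:value" entry in opts whose
// key matches, and reports whether such an entry was found.
func optionValue(opts []string, key string) (string, bool) {
	for _, o := range opts {
		k, v, ok := strings.Cut(o, ":")
		if ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical Options slice, as it might arrive from a model YAML.
	opts := []string{"metric_model:metric-branch.gguf"}
	if metric, ok := optionValue(opts, "metric_model"); ok {
		fmt.Println("nested load, metric branch:", metric)
	} else {
		fmt.Println("single-file load")
	}
}
```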
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery
Mirrors the locate-anything-cpp backend to register a new depth-anything
backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via
purego (cgo-less, no Python at inference).
- backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage),
purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts
building depth-anything.cpp's DA_SHARED static .so per CPU variant.
- backend/index.yaml: depth-anything backend meta + all hardware-variant
capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t).
- gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32,
small, large, giant, mono-large).
- .github/backend-matrix.yml: one build entry per hardware variant.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): typed Depth RPC + REST endpoint exposing full DA3 data
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): pin depth-anything.cpp to e0b6814 (ABI 3 dense C-API)
The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3);
pin the native build to the commit that exports them.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): pin depth-anything.cpp to v0.1.0 release (b515c31)
Repoint the native version from the now-orphaned e0b6814 to the
b515c31 release commit, kept alive by the upstream v0.1.0 tag.
C-API is unchanged (da_capi_abi_version == 3).
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): wire depth-anything-cpp into build, CI bump, and importer
The backend dir, gallery index, and CI build-matrix were present but the
backend was never wired into the integration points that adding-backends.md
requires:
- root Makefile: add to .NOTPARALLEL, the test-extra chain, a BACKEND_*
definition, the docker-build target eval, and docker-build-backends
(mirrors parakeet-cpp; the backend's own Makefile already documented that
its `test` target is driven by test-extra).
- bump_deps.yaml: register the DEPTHANYTHING_VERSION pin so the daily
auto-bump bot tracks mudler/depth-anything.cpp master (it cannot see an
unregistered Makefile pin).
- import form: add a preference-only KnownBackend entry so depth-anything is
selectable at /import-model (mirrors sam3-cpp; no reliable GGUF auto-detect
signal, so pref-only per the doc's default).
changed-backends.js needs no entry: the generic golang suffix branch already
resolves backend/go/depth-anything-cpp/.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(depth): auto-detect importer for depth-anything GGUFs
Replace the preference-only entry with a real auto-detect importer
(mirrors parakeet-cpp / locate-anything):
- DepthAnythingImporter matches a .gguf whose name carries a
depth-anything token (depth-anything-<size>-<quant>.gguf), so
/import-model recognises mudler/depth-anything.cpp-gguf repos and direct
GGUF URLs without an explicit backend preference. preferences.backend=
"depth-anything" still forces it.
- Registered before LlamaCPPImporter so its GGUF bundles aren't claimed by
the generic .gguf importer; the narrow name match means it cannot claim
arbitrary llama GGUFs or the upstream safetensors PyTorch repos.
- Multi-quant repos pick the smallest quant by default (q4_k -> ... -> f32,
depth correlation stays >0.998 even at q4_k); the quantizations preference overrides.
- Drops the now-redundant knownPrefOnlyBackends entry (importer-backed
backends are not listed there, matching parakeet-cpp).
- Table-driven Ginkgo test covers detection, negative cases (llama GGUF,
upstream safetensors), default/override/fallback quant pick, and direct
URL import. 10/10 specs pass.
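
The narrow name match can be illustrated as below. This is a sketch under assumptions: looksLikeDepthAnythingGGUF is a hypothetical helper, not the importer's actual method, and the real matcher may be stricter about the <size>-<quant> parts.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// looksLikeDepthAnythingGGUF reports whether the last path element of a file
// name or URL is a .gguf file carrying a "depth-anything" token.
func looksLikeDepthAnythingGGUF(nameOrURL string) bool {
	base := strings.ToLower(path.Base(nameOrURL))
	return strings.HasSuffix(base, ".gguf") && strings.Contains(base, "depth-anything")
}

func main() {
	for _, n := range []string{
		"depth-anything-base-q4_k.gguf",
		"llama-3-8b-q4_k.gguf",
		"depth-anything/model.safetensors",
	} {
		fmt.Println(n, looksLikeDepthAnythingGGUF(n))
	}
}
```

Matching on the file's base name rather than the whole path is what keeps a safetensors repo with "depth-anything" in its directory name from being claimed.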
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(depth): check conn.Close error in grpc Depth client (errcheck)
The new Depth() client method used a bare `defer conn.Close()`. golangci-lint
runs with new-from-merge-base, so although the 39 sibling methods use the same
bare form (grandfathered), the newly added line trips errcheck. Drop the result
explicitly to satisfy the linter.
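
The explicit form looks roughly like this. fakeConn and useConn are stand-ins for illustration; the real client closes a gRPC connection.

```go
package main

import (
	"fmt"
	"io"
)

// fakeConn is a stand-in for the gRPC connection; it records that Close ran.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() error {
	c.closed = true
	return nil
}

func useConn(c io.Closer) string {
	// A bare `defer c.Close()` discards the returned error implicitly.
	// Assigning it to the blank identifier makes the drop explicit.
	defer func() { _ = c.Close() }()
	return "done"
}

func main() {
	c := &fakeConn{}
	fmt.Println(useConn(c), c.closed)
}
```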
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): bump depth-anything.cpp to v0.1.1 (embeddable CMake)
v0.1.0 (b515c31) used ${CMAKE_SOURCE_DIR} for its include dirs, which
points at the parent project when built via add_subdirectory() as this
backend does, so the container build failed with missing stb_image.h /
da_gguf_keys.h. v0.1.1 (2d42897) switches to project-relative paths.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): resolve gosec findings in the backend wrapper
The code-scanning gate flagged three new failure-level alerts in
godepthanythingcpp.go (gosec runs with -no-fail; GitHub gates on new alerts):
- G301: export dirs were created with 0o755. Tighten to 0o750 (no world
access needed for backend-written export output).
- G304: writeDepthPNG creates req.GetDst(). That path is chosen by the
LocalAI core as the intended output destination (same pattern every
image backend uses), not attacker input, so annotate with #nosec G304
and document why.
The remaining G103 "audit unsafe" notes on the unsafe.Slice C-buffer copies
are warning-level (the same purego interop whisper/parakeet use) and do not
gate the check, per the supertonic exclusion precedent in secscan.yaml.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
* fix(depth): bump depth-anything.cpp to v0.1.2 (CUDA cross-build arch)
v0.1.1 forced CMAKE_CUDA_ARCHITECTURES=native, which breaks the GPU-less
l4t/cublas CI builds (nvcc "Unsupported gpu architecture 'compute_'" on
CMake 3.22). v0.1.2 (442eea4) drops the override and lets ggml pick its
default cross-build arch list.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The Gemma 4 QAT MTP assistant-head gallery entries currently fail to load in the stock llama.cpp backend with "unknown architecture" errors. Hide them until the assistant GGUFs are verified against a supported backend path.
Assisted-by: Codex:GPT-5 [gh] [git]
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Expands sherpa-onnx Piper TTS coverage in the model gallery. Previously only
5 single-speaker Piper voices shipped (it_IT-paola, en_US-amy, es_ES-davefx,
fr_FR-siwis, de_DE-thorsten). This adds 19 entries:
Italian (it_IT): dii-high, miro-high, riccardo-x_low.
UK English (en_GB): alan (low+medium), alba-medium, aru-medium, cori
(high+medium), dii-high, jenny_dioco-medium, miro-high,
northern_english_male-medium, semaine-medium, southern_english_female
(low+medium), southern_english_male-medium, vctk-medium, sweetbbak-amy.
Each entry mirrors the existing Piper block (sherpa-onnx-tts.yaml base config).
sha256, ONNX path, sample rate and speaker count were read from the actual
release tarballs; licenses and source URLs were taken from each archive's
MODEL_CARD/README rather than assumed:
- dii/miro voices are OpenVoiceOS models under CC BY-NC-SA 4.0 (non-commercial),
labelled as such in both the license field and description.
- cori is LibriVox public-domain (cc0-1.0); OpenSLR-83 voices are CC BY-SA 4.0;
alba/vctk are CC BY 4.0.
- vctk (109), aru (12) and semaine (4) are multi-speaker; tagged accordingly
with a note to select the speaker via the numeric voice id.
The legacy underscore-named southern_english_female_medium duplicate is
intentionally skipped. No backend change is needed: sherpa-onnx auto-detects
single-speaker VITS vs Kokoro, and each tarball ships its own espeak-ng-data.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
A model whose ModelFile is a single file (e.g. sherpa-onnx VITS/piper: the
.onnx) failed to load on remote worker nodes because the sibling assets the
backend resolves from the model dir — tokens.txt, lexicon.txt, the
espeak-ng-data / dict directories, Kokoro's voices.bin — were never staged.
Only the declared ModelFile was shipped, so the worker hit "failed to create
sherpa-onnx TTS engine" and TTS produced no audio.
Lean on the existing option-path staging instead of hardcoding filenames:
- stageGenericOptions now also resolves an option value relative to the model's
own directory (not just the frontend models dir), so a shared config can
declare companions with bare names regardless of whether Model includes a
subdirectory; and it expands directory-valued options (e.g. espeak-ng-data)
file-by-file rather than handing a directory fd to the stager.
- gallery/sherpa-onnx-tts.yaml declares the companion assets as option paths
(tokens, lexicon, espeak-ng-data, voices.bin, dict, per-lang lexicons). The
backend ignores these keys and keeps resolving siblings from the model dir;
they exist only so distributed staging ships them. Absent files are skipped.
Adds router_optionstage_test.go covering file + directory companion staging via
the model-dir fallback.
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(omnivoice-cpp): add C wrapper + CMake/Makefile build over OmniVoice ov_* ABI
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): add option/language parsing + WAV framing helpers with tests
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): wire purego binding with TTS + streaming TTSStream
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* build(omnivoice-cpp): wire backend into root Makefile
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci(omnivoice-cpp): add build matrix entries + dep-bump registration
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): register backend meta + image entries
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): expose as preference-only importable backend
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(gallery): add omnivoice-cpp TTS models (Q8_0 default + BF16 HQ)
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(omnivoice-cpp): document the OmniVoice TTS backend
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(omnivoice-cpp): add env-gated e2e for TTS + streaming
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(omnivoice-cpp): honor tts.audio_path/tts.voice config as default cloning reference
The model config tts.audio_path (ModelOptions.AudioPath) and tts.voice now
provide a default voice-cloning reference used when a request omits Voice, so a
cloned voice can be pinned in the model YAML instead of passed per request. A
per-request voice still overrides. Paths resolve relative to the model dir.
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(omnivoice-cpp): add missing omnivoice-cpp-development backend meta
Mirrors the whisper/vibevoice convention: a -development meta aggregating the
master-tagged image variants (the production meta and per-variant prod+dev image
entries already existed; only the development meta aggregator was missing).
Assisted-by: claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Wire the Kokoro model family into the sherpa-onnx backend (which only
supported VITS/Piper before) and add gallery voices for Italian, English,
Spanish, French and German plus a multilingual Kokoro model.
- csrc/shim.{c,h}: kokoro_* config setters (model/voices/tokens/data_dir/
dict_dir/lexicon/lang/length_scale) mirroring the VITS path, with the
matching frees in tts_config_free.
- backend.go: loadTTS now detects a Kokoro model (a voices.bin beside the
ONNX) and routes to configureKokoroTTS, otherwise configureVitsTTS.
Kokoro picks up espeak-ng-data, the jieba dict and the per-language
lexicons (only one English variant, to avoid tens of thousands of
duplicate-word warnings at load); the language= option hints the lang.
- backend_test.go: functional test for isKokoroModel detection.
- gallery: 5 Piper VITS voices (it_IT-paola, en_US-amy, es_ES-davefx,
fr_FR-siwis, de_DE-thorsten) + kokoro-multi-lang-v1.0, served through
sherpa-onnx-tts.yaml with native streaming TTS.
Verified by building the backend and synthesizing with a real Piper and
Kokoro model (31/31 specs pass, including real-model synth smokes).
Assisted-by: Claude:claude-opus-4-8 gofmt golangci-lint go-test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>