LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-22 07:39:02 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	aed181e6c1	chore(model gallery): 🤖 add 1 new models via gallery agent (#10423) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-21 17:40:55 +02:00
LocalAI [bot]	e19c43cf04	feat(gallery): add Depth Anything V2 models + bump native version (#10413 ) * feat(gallery): add Depth Anything V2 models + bump native version Add Depth Anything V2 (DA2) support to the depth-anything backend. DA2 is depth-only (no camera pose, no confidence) and ships both relative (relative inverse depth) and metric (depth in metres) variants. The Go backend is model-agnostic, so no backend code changes are required — only a native version bump and new gallery entries. - backend/go/depth-anything-cpp/Makefile: pin DEPTHANYTHING_VERSION to the depth-anything.cpp commit that adds the DA2 engine + C-API routing (e3dec57f13a52366bbc4f279ef44804915960a6b, kept alive by the upstream tag da2-support so it survives a squash-merge). - gallery/index.yaml: add 12 DA2 entries (4 base quants, small, large, plus Hypersim indoor and VKITTI outdoor metric models in S/B/L). Metric models carry the metric-depth tag; none carry camera-pose. Assisted-by: Claude:claude-opus-4-8 * chore(depth-anything-cpp): pin to merged DA2 master commit PR #1 (mudler/depth-anything.cpp) merged to master as f4e17de (squash); repoint the pin from the pre-merge commit to the canonical master commit. Assisted-by: Claude:claude-opus-4-8 --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-20 14:56:16 +02:00
LocalAI [bot]	67692cb984	chore(model-gallery): ⬆️ update checksum (#10397) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-19 00:03:10 +02:00
Richard Palethorpe	3fa7b2955c	feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360 ) Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see backup/pii-ner-tier-engine-prerebase). Net change: - privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan). TokenClassify moves off the patched llama.cpp path onto this backend. - PII filter reworked to be NER-centric (encoder/NER detection tier scanning whole conversations as one document), with a recreated bounded restricted- regex secret-matching pattern detector tier alongside it (per-model pii_detection.builtins / .patterns + core/services/routing/piipattern). - Detection labelled by source (ner vs pattern); backend trace / confidence / debug observability; analyze/redact exposed as a synchronous API. - Instance-wide default detector policy + per-usecase default-on; request filtering extended to completions, embeddings, edits & Ollama. - React UI: NER-centric PII editor, detector-models table, pattern/builtins editor, middleware default-policy UI. - Gallery: privacy-filter-multilingual token-classify model + NER install filter; token_classify known_usecase; batch sized to context for NER models. privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13 meta + image entries with a capabilities map) matching its CI matrix jobs, and an /import-model auto-detect importer (PrivacyFilterImporter, narrow privacy-filter GGUF detection) replacing the prior pref-only registration. Reconciled against master's independent evolution: - Dropped master's PIIPatternOverrides feature (global-pattern runtime overrides + /api/pii/patterns API + runtime_settings.json persistence). The per-model NER + pattern-detector design supersedes it; it was built on the global redactor pattern set this branch replaced. 
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score): removed the patch and restored master's grpc-server.cpp Score RPC (direct llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's model_config validation forbidding score + chat/completion/embeddings on llama-cpp. token_classify is unaffected (it runs on the privacy-filter backend, not llama-cpp). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-18 11:45:22 +01:00
LocalAI [bot]	757822cd74	chore(model gallery): 🤖 add 1 new models via gallery agent (#10384) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-18 08:51:30 +02:00
LocalAI [bot]	e60c094a7d	feat(ds4): SSD streaming + quality engine options, 128GB DeepSeek gallery models (#10374 ) feat(ds4): wire SSD streaming + quality engine options, add 128GB DeepSeek gallery models The ds4 backend zero-initialized ds4_engine_options and exposed none of the engine's tunable knobs, so SSD streaming (run a model larger than RAM by streaming routed MoE experts from the GGUF on SSD) and the quality/perf knobs were unreachable from LocalAI model YAMLs. Map ModelOptions.Options onto ds4_engine_options through a declarative table (kEngineOptSpecs + apply_engine_option) instead of per-field branches: the struct is fixed C with no reflection, so the field set is enumerated once and a future knob is a one-line table row. Two fields use ds4's own typed parsers (GiB budgets, cache-experts count-or-NGB). Bare flags (e.g. "ssd_streaming") mean true; path-type options (mtp_path, expert_profile_path, directional_steering_file) resolve relative to the model directory so a gallery entry can reference a companion file by bare filename. mtp_draft/mtp_margin are now validated rather than parsed with throwing std::stoi/std::stof. Add gallery entries for the 128 GB class: - deepseek-v4-flash-q2-q4 (~91 GB, mixed q2/q4, fits RAM, higher quality) - deepseek-v4-flash-q4-ssd (~153 GB full 4-bit, runs on 128 GB via SSD streaming) - deepseek-v4-flash-q2-mtp (~81 GB + MTP speculative draft weights) - deepseek-v4-pro-q2-ssd (~433 GB Pro, experimental SSD streaming) SSD streaming is Metal (Darwin) only; the options are inert on CUDA/CPU. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-17 10:30:06 +02:00
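The declarative option table described in e60c094a7d can be illustrated with a minimal sketch. The real table (kEngineOptSpecs + apply_engine_option) lives in the C++ backend; this version is written in Go for illustration only. The struct fields, the `name:value` separator, and the error on unknown options are assumptions, and `mtp.gguf` is a hypothetical file name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// engineOptions is a hypothetical stand-in for the fixed C struct ds4_engine_options.
type engineOptions struct {
	SSDStreaming bool
	MTPDraft     int
	MTPPath      string
}

// optSpec is one table row: an option name and the setter that applies it.
type optSpec struct {
	name  string
	apply func(o *engineOptions, value, modelDir string) error
}

// engineOptSpecs enumerates the field set once; a new knob is one more row.
var engineOptSpecs = []optSpec{
	{"ssd_streaming", func(o *engineOptions, v, _ string) error {
		if v == "" { // bare flag means true
			o.SSDStreaming = true
			return nil
		}
		b, err := strconv.ParseBool(v)
		if err != nil {
			return fmt.Errorf("ssd_streaming: %w", err)
		}
		o.SSDStreaming = b
		return nil
	}},
	{"mtp_draft", func(o *engineOptions, v, _ string) error {
		n, err := strconv.Atoi(v) // validated: a bad value is an error, not a crash
		if err != nil {
			return fmt.Errorf("mtp_draft: %w", err)
		}
		o.MTPDraft = n
		return nil
	}},
	{"mtp_path", func(o *engineOptions, v, dir string) error {
		if !filepath.IsAbs(v) { // path-type options resolve against the model dir
			v = filepath.Join(dir, v)
		}
		o.MTPPath = v
		return nil
	}},
}

// applyEngineOption looks the option up in the table and applies it.
// Rejecting unknown names is an assumption of this sketch.
func applyEngineOption(o *engineOptions, raw, modelDir string) error {
	name, value, _ := strings.Cut(raw, ":")
	for _, s := range engineOptSpecs {
		if s.name == name {
			return s.apply(o, value, modelDir)
		}
	}
	return fmt.Errorf("unknown engine option %q", name)
}

func main() {
	var o engineOptions
	for _, raw := range []string{"ssd_streaming", "mtp_draft:2", "mtp_path:mtp.gguf"} {
		if err := applyEngineOption(&o, raw, "/models/ds4"); err != nil {
			fmt.Println("invalid option:", err)
			return
		}
	}
	fmt.Printf("%+v\n", o)
}
```

Keeping the setters in a table rather than in per-field branches means the lookup loop never changes when an option is added.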
LocalAI [bot]	de299ca101	chore(model-gallery): ⬆️ update checksum (#10371) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-17 09:28:47 +02:00
LocalAI [bot]	4c6750fe6b	feat(depth): metric-large + nested metric model gallery entries (#10363 ) * feat(depth): add depth-anything-3-metric-large gallery entry DA3METRIC-LARGE (ViT-L) single-file metric-scale depth + sky, served by the existing depth-anything backend (same single-GGUF path as mono-large). GGUF published at mudler/depth-anything.cpp-gguf. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): serve nested metric model (two-file load) The DA3 nested model needs both branches (anyview GIANT + metric ViT-L) loaded together. Wire it through the backend: - Load reads a 'metric_model:<file>' entry from ModelOptions.Options and, when present, calls da_capi_load_nested(anyview, metric) instead of da_capi_load (registers the new abi-4 symbol; helper optionValue + unit test). - gallery: depth-anything-3-nested (model=anyview, options=metric branch, both GGUFs fetched) for metric-scale depth + pose. - bump depth-anything.cpp pin to cce5edc (abi 4 / da_capi_load_nested). Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 22:03:58 +02:00
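The `metric_model:<file>` lookup mentioned in 4c6750fe6b could look roughly like the sketch below. The commit names a helper called optionValue, but its actual signature is not shown in the log, so the one here is an assumption, as are the `threads` entry and the GGUF file name.

```go
package main

import (
	"fmt"
	"strings"
)

// optionValue returns the value of the first "key:value" entry whose key
// matches. The signature is assumed; the log only names the helper.
func optionValue(opts []string, key string) (string, bool) {
	for _, o := range opts {
		k, v, ok := strings.Cut(o, ":")
		if ok && strings.TrimSpace(k) == key {
			return strings.TrimSpace(v), true
		}
	}
	return "", false
}

func main() {
	// Hypothetical option list, standing in for ModelOptions.Options.
	opts := []string{"threads:4", "metric_model:da3-metric-large.gguf"}
	if file, ok := optionValue(opts, "metric_model"); ok {
		// The backend would take the two-file (nested) load path here.
		fmt.Println("nested load with metric branch:", file)
	} else {
		fmt.Println("single-file load")
	}
}
```

Returning a boolean alongside the value lets the caller tell an absent option apart from one set to an empty string.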
LocalAI [bot]	294170d3ed	feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery (#10352 ) * feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery Mirrors the locate-anything-cpp backend to register a new depth-anything backend that wraps the Depth Anything 3 ggml port (depth-anything.cpp) via purego (cgo-less, no Python at inference). - backend/go/depth-anything-cpp/: gRPC backend (Load + Predict + GenerateImage), purego binding to the da_capi_* C ABI, CMake/Makefile/run/package/test scripts building depth-anything.cpp's DA_SHARED static .so per CPU variant. - backend/index.yaml: depth-anything backend meta + all hardware-variant capability entries (cpu/cuda12/cuda13/intel-sycl-f32+f16/vulkan/nvidia-l4t). - gallery/index.yaml: 8 Depth Anything 3 GGUF models (base q4_k/q8_0/f16/f32, small, large, giant, mono-large). - .github/backend-matrix.yml: one build entry per hardware variant. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): typed Depth RPC + REST endpoint exposing full DA3 data Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): pin depth-anything.cpp to e0b6814 (ABI 3 dense C-API) The Depth RPC handler calls da_capi_depth_dense / da_capi_points (C-API ABI 3); pin the native build to the commit that exports them. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): pin depth-anything.cpp to v0.1.0 release (b515c31) Repoint the native version from the now-orphaned e0b6814 to the b515c31 release commit, kept alive by the upstream v0.1.0 tag. C-API is unchanged (da_capi_abi_version == 3). 
Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): wire depth-anything-cpp into build, CI bump, and importer The backend dir, gallery index, and CI build-matrix were present but the backend was never wired into the integration points that adding-backends.md requires: - root Makefile: add to .NOTPARALLEL, the test-extra chain, a BACKEND_* definition, the docker-build target eval, and docker-build-backends (mirrors parakeet-cpp; the backend's own Makefile already documented that its `test` target is driven by test-extra). - bump_deps.yaml: register the DEPTHANYTHING_VERSION pin so the daily auto-bump bot tracks mudler/depth-anything.cpp master (it cannot see an unregistered Makefile pin). - import form: add a preference-only KnownBackend entry so depth-anything is selectable at /import-model (mirrors sam3-cpp; no reliable GGUF auto-detect signal, so pref-only per the doc's default). changed-backends.js needs no entry: the generic golang suffix branch already resolves backend/go/depth-anything-cpp/. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(depth): auto-detect importer for depth-anything GGUFs Replace the preference-only entry with a real auto-detect importer (mirrors parakeet-cpp / locate-anything): - DepthAnythingImporter matches a .gguf whose name carries a depth-anything token (depth-anything-<size>-<quant>.gguf), so /import-model recognises mudler/depth-anything.cpp-gguf repos and direct GGUF URLs without an explicit backend preference. preferences.backend= "depth-anything" still forces it. - Registered before LlamaCPPImporter so its GGUF bundles aren't claimed by the generic .gguf importer; the narrow name match means it cannot claim arbitrary llama GGUFs or the upstream safetensors PyTorch repos. - Multi-quant repos pick the smallest quant by default (q4_k -> ... -> f32, depth stays >0.998 corr even at q4_k); quantizations preference overrides. 
- Drops the now-redundant knownPrefOnlyBackends entry (importer-backed backends are not listed there, matching parakeet-cpp). - Table-driven Ginkgo test covers detection, negative cases (llama GGUF, upstream safetensors), default/override/fallback quant pick, and direct URL import. 10/10 specs pass. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(depth): check conn.Close error in grpc Depth client (errcheck) The new Depth() client method used a bare `defer conn.Close()`. golangci-lint runs with new-from-merge-base, so although the 39 sibling methods use the same bare form (grandfathered), the newly added line trips errcheck. Drop the result explicitly to satisfy the linter. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): bump depth-anything.cpp to v0.1.1 (embeddable CMake) v0.1.0 (b515c31) used ${CMAKE_SOURCE_DIR} for its include dirs, which points at the parent project when built via add_subdirectory() as this backend does, so the container build failed with missing stb_image.h / da_gguf_keys.h. v0.1.1 (2d42897) switches to project-relative paths. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): resolve gosec findings in the backend wrapper The code-scanning gate flagged three new failure-level alerts in godepthanythingcpp.go (gosec runs with -no-fail; GitHub gates on new alerts): - G301: export dirs were created with 0o755. Tighten to 0o750 (no world access needed for backend-written export output). - G304: writeDepthPNG creates req.GetDst(). That path is chosen by the LocalAI core as the intended output destination (same pattern every image backend uses), not attacker input, so annotate with #nosec G304 and document why. 
The remaining G103 "audit unsafe" notes on the unsafe.Slice C-buffer copies are warning-level (the same purego interop whisper/parakeet use) and do not gate the check, per the supertonic exclusion precedent in secscan.yaml. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 * fix(depth): bump depth-anything.cpp to v0.1.2 (CUDA cross-build arch) v0.1.1 forced CMAKE_CUDA_ARCHITECTURES=native, which breaks the GPU-less l4t/cublas CI builds (nvcc "Unsupported gpu architecture 'compute_'" on CMake 3.22). v0.1.2 (442eea4) drops the override and lets ggml pick its default cross-build arch list. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-16 16:28:28 +02:00
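The narrow name match and default quant pick described for DepthAnythingImporter in 294170d3ed can be sketched as follows. This is not the importer's code: the function names are hypothetical, and the preference list passed in main contains only the two endpoints the commit message names (q4_k and f32), since the intermediate order is elided in the log.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isDepthAnythingGGUF reports whether a file name or URL ends in .gguf and
// carries a depth-anything token. Hypothetical helper for illustration.
func isDepthAnythingGGUF(name string) bool {
	base := strings.ToLower(path.Base(name))
	return strings.HasSuffix(base, ".gguf") && strings.Contains(base, "depth-anything")
}

// pickQuant returns the first matching file for the earliest quant in order.
func pickQuant(files, order []string) (string, bool) {
	for _, q := range order {
		for _, f := range files {
			if isDepthAnythingGGUF(f) && strings.HasSuffix(strings.ToLower(f), "-"+q+".gguf") {
				return f, true
			}
		}
	}
	return "", false
}

func main() {
	files := []string{
		"depth-anything-base-f32.gguf",
		"depth-anything-base-q4_k.gguf",
		"llama-3-8b-q4_k.gguf", // must not be claimed by the narrow match
	}
	picked, ok := pickQuant(files, []string{"q4_k", "f32"})
	fmt.Println(picked, ok)
}
```

Because the match requires both the extension and the name token, a generic llama GGUF or a safetensors repository falls through to the other importers.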
LocalAI [bot]	edc61053aa	fix(gallery): hide broken Gemma 4 QAT MTP entries (#10348) The Gemma 4 QAT MTP assistant-head gallery entries currently fail to load in the stock llama.cpp backend with unknown architecture errors. Hide them until the assistant GGUFs are verified against the supported backend path. Assisted-by: Codex:GPT-5 [gh] [git] Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-15 22:57:19 +02:00
LocalAI [bot]	2df2876db2	feat(supertonic): add Supertonic ONNX TTS backend (CPU) (#10342 ) * feat(supertonic): vendor upstream Go TTS pipeline (helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add gRPC backend (Load/TTS/TTSStream, CPU) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): satisfy unused linter (use onnxProvider; exclude vendored helper.go) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(supertonic): unit tests for resolvers + gated end-to-end synthesis Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(supertonic): gofmt backend.go comment block Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add Makefile, run.sh, package.sh (CPU build) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(supertonic): wire backend into root Makefile Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): check ort.DestroyEnvironment return (errcheck) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(supertonic): resolve voice_styles as sibling of onnx dir; guard trim; test voice Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add CPU build matrix + gallery index entries Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): expose as pref-only importable backend Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(supertonic): add 
Supertonic/supertonic-3 TTS model to the gallery 16 files (4 onnx + tts.json + unicode_indexer.json + 10 voice styles) from HF Supertone/supertonic-3, served via the supertonic backend. Defaults to voice F1; onnx/ + sibling voice_styles/ layout matches the backend's resolveVoicesDir. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(meta): register pipeline.max_history_items config field Pre-existing on master: the field was added without a registry entry, failing TestAllFieldsHaveRegistryEntries (core/config/meta). Add the entry so it renders properly in the model-config UI. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(secscan): exclude vendored supertonic backend from gosec helper.go is vendored from supertone-inc/supertonic; its G304/G404/G104 findings are inherent to upstream and the math/rand use is correct for flow-matching noise (crypto/rand would be wrong). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-15 16:54:11 +02:00
LocalAI [bot]	1dedb5277c	feat(gallery): add all Italian + all UK English sherpa-onnx Piper voices (#10332 ) Expands sherpa-onnx Piper TTS coverage in the model gallery. Previously only 5 single-speaker Piper voices shipped (it_IT-paola, en_US-amy, es_ES-davefx, fr_FR-siwis, de_DE-thorsten). This adds 19 entries: Italian (it_IT): dii-high, miro-high, riccardo-x_low. UK English (en_GB): alan (low+medium), alba-medium, aru-medium, cori (high+medium), dii-high, jenny_dioco-medium, miro-high, northern_english_male-medium, semaine-medium, southern_english_female (low+medium), southern_english_male-medium, vctk-medium, sweetbbak-amy. Each entry mirrors the existing Piper block (sherpa-onnx-tts.yaml base config). sha256, ONNX path, sample rate and speaker count were read from the actual release tarballs; licenses and source URLs were taken from each archive's MODEL_CARD/README rather than assumed: - dii/miro voices are OpenVoiceOS models under CC BY-NC-SA 4.0 (non-commercial), labelled as such in both the license field and description. - cori is LibriVox public-domain (cc0-1.0); OpenSLR-83 voices are CC BY-SA 4.0; alba/vctk are CC BY 4.0. - vctk (109), aru (12) and semaine (4) are multi-speaker; tagged accordingly with a note to select the speaker via the numeric voice id. The legacy underscore-named southern_english_female_medium duplicate is intentionally skipped. No backend change is needed: sherpa-onnx auto-detects single-speaker VITS vs Kokoro, and each tarball ships its own espeak-ng-data. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-14 18:33:44 +02:00
LocalAI [bot]	e046a7749f	chore(model gallery): 🤖 add 1 new models via gallery agent (#10328) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-14 16:43:32 +02:00
LocalAI [bot]	e5c95e0449	fix(distributed): stage backend companion assets to remote nodes (#10330 ) A model whose ModelFile is a single file (e.g. sherpa-onnx VITS/piper: the .onnx) failed to load on remote worker nodes because the sibling assets the backend resolves from the model dir — tokens.txt, lexicon.txt, the espeak-ng-data / dict directories, Kokoro's voices.bin — were never staged. Only the declared ModelFile was shipped, so the worker hit "failed to create sherpa-onnx TTS engine" and TTS produced no audio. Lean on the existing option-path staging instead of hardcoding filenames: - stageGenericOptions now also resolves an option value relative to the model's own directory (not just the frontend models dir), so a shared config can declare companions with bare names regardless of whether Model includes a subdirectory; and it expands directory-valued options (e.g. espeak-ng-data) file-by-file rather than handing a directory fd to the stager. - gallery/sherpa-onnx-tts.yaml declares the companion assets as option paths (tokens, lexicon, espeak-ng-data, voices.bin, dict, per-lang lexicons). The backend ignores these keys and keeps resolving siblings from the model dir; they exist only so distributed staging ships them. Absent files are skipped. Adds router_optionstage_test.go covering file + directory companion staging via the model-dir fallback. Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 16:42:59 +02:00
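The companion-asset resolution described in e5c95e0449 (model-dir fallback, directory expansion, absent files skipped) might be sketched like this. It is not the code of stageGenericOptions: resolveCompanion and demo are hypothetical names, and the directory layout is invented for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// resolveCompanion (hypothetical) maps one option value to the files to stage.
// It tries the models dir first, then the model file's own directory.
func resolveCompanion(value, modelsDir, modelFile string) ([]string, error) {
	candidates := []string{
		filepath.Join(modelsDir, value),
		filepath.Join(modelsDir, filepath.Dir(modelFile), value),
	}
	for _, c := range candidates {
		info, err := os.Stat(c)
		if err != nil {
			continue // absent here; try the next base directory
		}
		if !info.IsDir() {
			return []string{c}, nil
		}
		// Directory-valued option: expand it file by file.
		var files []string
		err = filepath.WalkDir(c, func(p string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if !d.IsDir() {
				files = append(files, p)
			}
			return nil
		})
		return files, err
	}
	return nil, nil // absent everywhere: skipped
}

// demo builds a throwaway models dir and returns staged paths relative to it.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "models")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "voice", "espeak-ng-data"), 0o750); err != nil {
		return nil, err
	}
	for _, f := range []string{"tokens.txt", "espeak-ng-data/phontab"} {
		if err := os.WriteFile(filepath.Join(root, "voice", f), nil, 0o600); err != nil {
			return nil, err
		}
	}
	var staged []string
	for _, opt := range []string{"tokens.txt", "espeak-ng-data", "lexicon.txt"} {
		files, err := resolveCompanion(opt, root, "voice/model.onnx")
		if err != nil {
			return nil, err
		}
		for _, f := range files {
			rel, err := filepath.Rel(root, f)
			if err != nil {
				return nil, err
			}
			staged = append(staged, filepath.ToSlash(rel))
		}
	}
	return staged, nil
}

func main() {
	staged, err := demo()
	fmt.Println(staged, err)
}
```

In the demo, `lexicon.txt` does not exist and is silently skipped, which mirrors the "absent files are skipped" behaviour the commit describes.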
LocalAI [bot]	4bb592cf91	feat(qwen3-tts-cpp): migrate to ServeurpersoCom/qwentts.cpp (streaming, speakers, voice design) (#10316 ) * feat(qwen3-tts-cpp): repoint upstream to ServeurpersoCom/qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): flatten qt_* ABI into qt3_* purego shim Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): build shim against upstream qwen-core static lib Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add option/language/voice/sampling parsing Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add 24kHz WAV encode/decode/stream-header helpers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): purego backend with streaming, speakers, voice design Map TTSRequest onto qwentts.cpp: instructions->instruct, voice->named speaker or clone-reference path, params map->ref_text + sampling. Add TTSStream over the qt chunk callback. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): unit specs + build-gated TTS/TTSStream e2e Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(qwen3-tts-cpp): close defensive PCM-free gap on zero-sample result Register CppPCMFree before the n<=0 guard so a non-null buffer with zero samples cannot leak (the C contract returns NULL on failure, so this is defensive). Raised in code review. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): advertise TTSStream capability Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): update backend index metadata for qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): qwentts.cpp models - base/customvoice/voicedesign, Q8_0 & Q4_K_M Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(qwen3-tts-cpp): release note for qwentts.cpp migration Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): cover audio_path voice-cloning fallback Add resolveRequest unit specs (config audio_path used as the clone reference when Voice is empty; per-request audio Voice overrides it; a named-speaker Voice does not trigger cloning) plus a real-inference e2e that clones from audio_path (confirmed ref_spk_emb=yes in the pipeline). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): drop the release-note doc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 23:09:59 +02:00
LocalAI [bot]	0854932a25	feat(omnivoice-cpp): add OmniVoice TTS backend (file + streaming, voice cloning + voice design) (#10310 ) * feat(omnivoice-cpp): add C wrapper + CMake/Makefile build over OmniVoice ov_* ABI Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): add option/language parsing + WAV framing helpers with tests Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): wire purego binding with TTS + streaming TTSStream Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(omnivoice-cpp): wire backend into root Makefile Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(omnivoice-cpp): add build matrix entries + dep-bump registration Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): register backend meta + image entries Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): expose as preference-only importable backend Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add omnivoice-cpp TTS models (Q8_0 default + BF16 HQ) Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(omnivoice-cpp): document the OmniVoice TTS backend Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(omnivoice-cpp): add env-gated e2e for TTS + streaming Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(omnivoice-cpp): honor tts.audio_path/tts.voice config as default cloning reference The model config tts.audio_path 
(ModelOptions.AudioPath) and tts.voice now provide a default voice-cloning reference used when a request omits Voice, so a cloned voice can be pinned in the model YAML instead of passed per request. A per-request voice still overrides. Paths resolve relative to the model dir. Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(omnivoice-cpp): add missing omnivoice-cpp-development backend meta Mirrors the whisper/vibevoice convention: a -development meta aggregating the master-tagged image variants (the production meta and per-variant prod+dev image entries already existed; only the development meta aggregator was missing). Assisted-by: claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 21:28:46 +02:00
LocalAI [bot]	203410871b	feat(sherpa-onnx): add Kokoro TTS + multilingual Piper voices (#10309 ) Wire the Kokoro model family into the sherpa-onnx backend (which only supported VITS/Piper before) and add gallery voices for Italian, English, Spanish, French and German plus a multilingual Kokoro model. - csrc/shim.{c,h}: kokoro_* config setters (model/voices/tokens/data_dir/ dict_dir/lexicon/lang/length_scale) mirroring the VITS path, with the matching frees in tts_config_free. - backend.go: loadTTS now detects a Kokoro model (a voices.bin beside the ONNX) and routes to configureKokoroTTS, otherwise configureVitsTTS. Kokoro picks up espeak-ng-data, the jieba dict and the per-language lexicons (only one English variant, to avoid tens of thousands of duplicate-word warnings at load); the language= option hints the lang. - backend_test.go: functional test for isKokoroModel detection. - gallery: 5 Piper VITS voices (it_IT-paola, en_US-amy, es_ES-davefx, fr_FR-siwis, de_DE-thorsten) + kokoro-multi-lang-v1.0, served through sherpa-onnx-tts.yaml with native streaming TTS. Verified by building the backend and synthesizing with a real Piper and Kokoro model (31/31 specs pass, including real-model synth smokes). Assisted-by: Claude:claude-opus-4-8 gofmt golangci-lint go-test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 21:27:27 +02:00
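The Kokoro detection rule from 203410871b (a voices.bin beside the ONNX file) is simple enough to sketch. The commit names isKokoroModel, but its real signature is not shown, so the one below is assumed; ttsFamily and demo are hypothetical helpers for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isKokoroModel reports whether a regular file named voices.bin sits next to
// the ONNX model. The signature is an assumption.
func isKokoroModel(onnxPath string) bool {
	info, err := os.Stat(filepath.Join(filepath.Dir(onnxPath), "voices.bin"))
	return err == nil && !info.IsDir()
}

// ttsFamily (hypothetical) names the configuration path a loader would take.
func ttsFamily(onnxPath string) string {
	if isKokoroModel(onnxPath) {
		return "kokoro"
	}
	return "vits"
}

// demo creates one Kokoro-style and one Piper-style model directory and
// returns the family detected for each.
func demo() (string, string, error) {
	root, err := os.MkdirTemp("", "tts")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	kokoroDir := filepath.Join(root, "kokoro")
	piperDir := filepath.Join(root, "piper")
	for _, d := range []string{kokoroDir, piperDir} {
		if err := os.MkdirAll(d, 0o750); err != nil {
			return "", "", err
		}
	}
	if err := os.WriteFile(filepath.Join(kokoroDir, "voices.bin"), nil, 0o600); err != nil {
		return "", "", err
	}
	k := ttsFamily(filepath.Join(kokoroDir, "model.onnx"))
	p := ttsFamily(filepath.Join(piperDir, "model.onnx"))
	return k, p, nil
}

func main() {
	k, p, err := demo()
	fmt.Println(k, p, err)
}
```

Detecting the family from a sibling file keeps the model YAML unchanged: the same base config can serve both VITS/Piper and Kokoro tarballs.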
LocalAI [bot]	53cbb578a9	chore(model gallery): 🤖 add 1 new models via gallery agent (#10304) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-13 11:03:03 +02:00
LocalAI [bot]	3351b62c91	chore(model gallery): 🤖 add 1 new models via gallery agent (#10302) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-13 10:59:23 +02:00
LocalAI [bot]	81ab62e874	chore(model gallery): 🤖 add 1 new models via gallery agent (#10298) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-13 09:58:11 +02:00
LocalAI [bot]	8f0059123b	feat(gallery): add 60 piper TTS voices across 42 languages (Phase 2) (#10296 ) Extends the piper voice set with a couple of voices per language for 42 more languages (Arabic, Bulgarian, Catalan, Czech, Welsh, Danish, Greek, Spanish, Basque, Persian, Finnish, French, Hindi, Hungarian, Indonesian, Icelandic, Georgian, Kazakh, Luxembourgish, Latvian, Malayalam, Nepali, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Albanian, Swedish, Swahili, Telugu, Turkish, Ukrainian, Urdu, Vietnamese, Chinese, ...), run through the crispasr backend's backend:piper engine and hosted at LocalAI-Community/piper-voices-GGUF. All converted from rhasspy/piper-voices with CrispASR's convert-piper-to-gguf.py and screened end-to-end on the pinned engine. Only single-speaker low/medium voices are included; high-quality decoders and multi-speaker models segfault and are excluded (e.g. zh_CN-chaowen dropped, huayan kept). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-13 09:19:21 +02:00
LocalAI [bot]	50dea8c983	feat(crispasr): bundle espeak-ng and add piper TTS voices to the gallery (#10283 ) CrispASR's piper backend phonemizes non-English text via espeak-ng (dlopen, the MIT-clean path; English uses a built-in G2P). The FROM scratch crispasr image shipped none of it, so non-English piper voices loaded but failed synthesis with "phonemization failed". Bundle the espeak-ng runtime so they work: - Dockerfile.golang: install espeak-ng-data + libespeak-ng1 and its libpcaudio0 / libsonic0 deps in the crispasr builder (espeak's dlopen fails without the latter two). - package.sh: copy libespeak-ng.so.1, libpcaudio.so.0, libsonic.so.0 into package/lib/ and the espeak-ng-data dir into the package root. - run.sh: export CRISPASR_ESPEAK_DATA_PATH so the bundled data is found. Add 9 single-speaker piper voices (de/en/it, incl. Italian paola + riccardo) to the gallery, run through backend:piper, hosted at LocalAI-Community/piper-voices-GGUF (converted from rhasspy/piper-voices with CrispASR's convert-piper-to-gguf.py). Only single-speaker low/medium voices are included; the engine does not yet support multi-speaker or high-quality piper decoders. All 9 verified end-to-end: each synthesizes a WAV at the model's native sample rate using only the image-bundled espeak payload. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-12 23:10:30 +02:00
LocalAI [bot]	4ce0f6102a	chore(model gallery): 🤖 add 1 new models via gallery agent (#10270) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-12 16:21:35 +02:00
Richard Palethorpe	085fc53bbc	fix(router): production-ready request router + auto-size batch for embedding/rerank (#10104 ) * fix(router): score classifier production-readiness Conversation trimming runs through the classifier model's chat template and trims by exact token count, sized to the model's n_batch which is now scaled to context so long probes can't crash the backend. Missing chat_message templates are a hard error at router build time. Router- facing factories (Embedder/Scorer/Reranker/TokenCounter) re-resolve ModelConfig per call so a model installed post-startup doesn't bind a stub Backend="" config and silently fall into the loader's auto- iterate path. New 'vector_store' backend trace recorded inside localVectorStore on every Search/Insert — including the backend-load-failure path that previously vanished into an xlog.Warn — with outcome tagging (hit/miss/empty_store/backend_load_error/find_error/insert_error/ok). Companion cleanup drops misleading similarity:0 and input_tokens_count:0 from non-hit and text-mode traces. Gallery local-store-development aliases to 'local-store' so the master image satisfies pkg/model.LocalStoreBackend lookups from the embedding cache. Misc: llama-cpp TokenizeString reads the correct 'prompt' JSON key (the original bug); ModelTokenize nil-guard; non-fatal mitm proxy startup; PII 'route_local' renamed to 'allow' with docs/UI in sync; model-editor footer no longer eats the edit area on small screens; several config-editor template/dropdown/section fixes. Tests: e2e router specs (casual/code-hint + long-conversation trim), vector_store trace specs, lazy-factory specs, gallery dev-alias resolution, Playwright trace badge + scroll regression. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(backend): auto-size batch to context for embedding and rerank models Embedding and rerank models pool over the whole input in a single physical batch (n_ubatch). 
With batch left at the 512 default, the backend rejects longer inputs with "input is too large to process", silently capping a large-context embedder (e.g. 8k/32k) at 512 tokens. Size n_batch to the context for these single-pass usecases, mirroring the existing FLAG_SCORE behaviour; an explicit batch: still wins. Extracts EffectiveContextSize/EffectiveBatchSize from grpcModelOpts so the effective decode window has one home for other callers to reuse. Adds an e2e-aio regression test that embeds a >512-token input. The AIO embedding model is switched to nomic-embed-text-v1.5 (2048 context) because the previous granite model was capped at 512 tokens and could not exercise the larger batch. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(gallery): raise arch-router scoring output cap via parallel:64 Scoring decodes the whole prompt+candidate in a single llama_decode and reads one logit row per candidate token. The vendored llama.cpp server caps causal output rows at n_parallel, so the default of 1 aborts with GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) on multi-token route labels. Set options: [parallel:64] on both arch-router quant entries to lift the cap; kv_unified (the grpc-server default) keeps the full context per sequence, so this does not split the KV cache. Assisted-by: claude-code:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-12 16:21:15 +02:00
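The batch-sizing rule described in 085fc53bbc can be sketched as below. This is a minimal illustration in Python, not LocalAI's Go code: the function name `effective_batch_size`, the usecase labels, and the constant name are hypothetical. Only the 512 default, the "size n_batch to the context" behaviour for single-pass usecases, and the rule that an explicit `batch:` wins are taken from the commit message.

```python
DEFAULT_BATCH = 512  # default n_batch named in the commit message

# Hypothetical labels for usecases that pool over the whole input in one pass.
SINGLE_PASS_USECASES = {"embedding", "rerank", "score"}


def effective_batch_size(context_size, usecase, explicit_batch=None):
    """Return the n_batch to use for a model (illustrative only)."""
    if explicit_batch is not None:
        # An explicit batch: setting always wins.
        return explicit_batch
    if usecase in SINGLE_PASS_USECASES:
        # The whole input must fit in a single physical batch,
        # so size the batch to the context window.
        return context_size
    return DEFAULT_BATCH
```

Under these assumptions an 8k-context embedder gets an 8192-token batch instead of being capped at 512, while other models keep the default.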
LocalAI [bot]	56cc4f63fc	feat(backend): locate-anything-cpp (open-vocabulary object detection via ggml) (#10264 ) * feat(backend): add locate-anything-cpp backend (open-vocab detection via la_capi) A Go/purego backend wrapping locate-anything.cpp's la_capi C ABI, implementing the gRPC Detect RPC: image + open-vocabulary text prompt -> labeled boxes. Mirrors backend/go/rfdetr-cpp; static-links ggml into a per-CPU-variant .so. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend): register locate-anything-cpp in build matrix Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): locate-anything gallery entry + model importer Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(backend): locate-anything-cpp Load+Detect wire test Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add locate-anything-3b model to the gallery index Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(backend): register locate-anything.cpp in bump_deps auto-bump Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io> * ci(test): e2e smoke for locate-anything-cpp in test-extra (loads the 3B + image, runs Detect) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: mudler <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: mudler <mudler@localai.io> Co-authored-by: mudler <mudler@localai.io>	2026-06-12 14:59:07 +02:00
LocalAI [bot]	618e90cd13	feat(gallery): add Gemma 4 QAT family + MTP speculative-decoding pairs (#10215 ) Add the remaining official Google Gemma 4 QAT Q4_0 GGUFs (E2B, E4B, 26B-A4B, 31B) next to the existing 12B entry, each shipping its multimodal mmproj. Also add three MTP (Multi-Token Prediction) speculative-decoding bundles that pair each QAT target with a QAT-matched assistant/drafter head: - 12B <- Janvitos/gemma-4-12B-it-qat-assistant-MTP-Q8_0-GGUF - 26B-A4B <- boxwrench/gemma-4-qat-mtp-assistant-heads - 31B <- boxwrench/gemma-4-qat-mtp-assistant-heads The assistant heads use the gemma4_assistant architecture and are not standalone chat models, so each entry bundles the target + draft and sets draft_model together with the draft-mtp spec options (spec_type:draft-mtp / spec_n_max:6 / spec_p_min:0.75), matching MTPSpecOptions() in core/config/mtp.go. QAT-matched heads raise draft acceptance substantially over generic non-QAT heads. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-08 10:26:42 +02:00
LocalAI [bot]	6070402477	chore(model gallery): 🤖 add 1 new models via gallery agent (#10209) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-07 22:09:32 +02:00
LocalAI [bot]	f64b72dd7d	feat: support Ideogram4 in stablediffusion-ggml backend + gallery (#10201 ) * feat(stablediffusion-ggml): support Ideogram4 unconditional diffusion model Bump stable-diffusion.cpp from 1f9ee88 to b9254dd, the upstream commit that adds Ideogram4 support (leejet/stable-diffusion.cpp#1609). Ideogram4 derives its classifier-free guidance from a separate unconditional diffusion model, exposed upstream through the new sd_ctx_params_t.uncond_diffusion_model_path field. Wire that field into the gosd wrapper via a new uncond_diffusion_model_path option. The _path suffix is deliberate: the Go loader only resolves options whose name contains "path" to an absolute path under the model directory, so this keeps the option consistent with diffusion_model_path and high_noise_diffusion_model_path. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): add Ideogram4 stablediffusion-ggml models Single-file GGUF weights for Ideogram4 are now published (stduhpf/ideogram-4-gguf), so add the model to the gallery. Ideogram4 is a text-to-image model with strong, accurate in-image text rendering, driven by a Qwen3-VL-8B text encoder and real classifier-free guidance from a separate unconditional diffusion model (the uncond_diffusion_model_path support added in the preceding commit). Two index entries, both built on gallery/virtual.yaml with the full config inlined in overrides (same pattern as the other models, no dedicated template file): - ideogram-4-iq4nl-ggml (4-bit, ~11.6GB diffusion) - ideogram-4-q8_0-ggml (8-bit, ~20GB diffusion) Each bundles the diffusion + unconditional GGUF (stduhpf), the Qwen3-VL-8B-Instruct text encoder (unsloth), and the FLUX.2 VAE (Comfy-Org mirror, non-gated). cfg_scale is 7 to match the upstream Ideogram4 default, since it performs real CFG unlike the guidance-distilled Flux/Z-Image models. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-06 22:50:12 +02:00
LocalAI [bot]	03c84cff28	feat(parakeet-cpp): nemotron-3.5-asr multilingual streaming model + request language support (#10199 ) * feat(parakeet-cpp): honor request language (multilingual nemotron) on batched + streaming paths Reads opts.GetLanguage() and threads it through to the new parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_stream_begin_lang C-API entry points, both probed with Dlsym so the backend still loads against an older libparakeet.so (falling back to the non-lang paths, i.e. model default). parakeet.cpp's batched C-API takes a single target_lang for the whole batch, so the dispatcher only coalesces same-language requests: a request whose language differs from the batch leader is held as a single carry-over and becomes the leader of the next batch, never dropped and never left waiting (including on shutdown). A new batcher test asserts no dispatched batch is ever mixed-language and that every submitted request still receives a reply. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(gallery): add parakeet-cpp-nemotron-3.5-asr-streaming-0.6b; bump parakeet.cpp pin Adds the multilingual prompt-conditioned streaming model to the gallery (q8_0 default, OpenMDW-1.1) and bumps the parakeet-cpp backend pin to the parakeet.cpp commit that ships nemotron support plus batched causal subsampling and the batched target_lang C-API. Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-06 13:53:10 +02:00
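The same-language coalescing rule from 03c84cff28 can be sketched as below. This is a simplified, synchronous Python illustration, not the backend's Go dispatcher: `coalesce_batches`, the `(request_id, language)` tuple shape, and `max_batch` are hypothetical names. It shows only the invariant stated in the commit message: a request whose language differs from the batch leader is held as a single carry-over and leads the next batch, so no batch is mixed-language and no request is dropped.

```python
from collections import deque


def coalesce_batches(requests, max_batch):
    """Group (request_id, language) pairs into single-language batches."""
    queue = deque(requests)
    carry = None  # at most one held-back request
    batches = []
    while queue or carry is not None:
        # The carry-over, if any, becomes the leader of this batch.
        leader = carry if carry is not None else queue.popleft()
        carry = None
        batch = [leader]
        while queue and len(batch) < max_batch:
            candidate = queue.popleft()
            if candidate[1] != leader[1]:
                carry = candidate  # hold it for the next batch
                break
            batch.append(candidate)
        batches.append(batch)
    return batches
```

Because the loop also runs while a carry-over is pending, the held request is still dispatched once the queue drains, which corresponds to the "never left waiting (including on shutdown)" property.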
LocalAI [bot]	9bc69c9e5f	chore(model gallery): 🤖 add 1 new models via gallery agent (#10200) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-06 13:52:46 +02:00
LocalAI [bot]	9f11b09c6a	chore(model-gallery): ⬆️ update checksum (#10169) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-04 00:32:15 +02:00
LocalAI [bot]	fb36c262fe	chore(model gallery): 🤖 add 1 new models via gallery agent (#10163) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-03 19:44:51 +02:00
LocalAI [bot]	a17753f7d1	chore(model-gallery): ⬆️ update checksum (#10131) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-06-01 23:39:47 +02:00
Richard Palethorpe	718223f33b	feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI (#10113 ) * chore(localvqe): update backend to v1.3, add v1.2/v1.3 gallery models Bump the LocalVQE backend pin 72bfb4c6 -> b0f0378a, which adds the v1.2 (1.3 M) and v1.3 (4.8 M) GGUF SHA-256s to the upstream released-models allowlist (and the arch_version=3 loader) so both load without LOCALVQE_ALLOW_UNHASHED. Add gallery entries for localvqe-v1.2-1.3m and localvqe-v1.3-4.8m (SHA-256 verified against the downloaded weights) and update the audio-transform docs to make v1.3 the current default while noting the compact v1.1/v1.2 alternatives. Assisted-by: Claude:claude-opus-4-8 Claude-Code Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(flake): add ffmpeg-headless to the dev shell pkg/utils/ffmpeg_test.go shells out to the `ffmpeg` CLI, and the pre-commit gate runs those tests via `make test-coverage`. Without ffmpeg in the dev shell the gate fails with "executable file not found in $PATH". The headless build provides the CLI without GUI/X deps. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(localvqe): parse WAV by walking RIFF sub-chunks Walk the RIFF chunk list instead of assuming the canonical 44-byte header layout. Real inputs (browser-recorded clips, ffmpeg output with an 18/40-byte extensible `fmt ` chunk or trailing LIST/INFO metadata) would otherwise splice header/metadata bytes into the PCM stream as an audible impulse. Honour the `data` chunk size and validate that both `fmt ` and `data` chunks are present. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(security-headers): allow blob: in connect-src for waveform fetch The waveform renderer XHRs/fetches a freshly-created blob: object URL (e.g. an uploaded or enhanced clip before it has a server URL). 
XHR/fetch of blob: is governed by connect-src, not media-src, so it was blocked by the CSP. Add blob: to connect-src. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(react-ui): add input/output spectrogram view to AudioTransform The transform page only showed time-domain amplitude waveforms, so you could see how loud a clip was but not which frequencies the model touched. Add a time x frequency spectrogram heatmap and render the input and output spectrums side by side, so it's visible which bands the enhancement attenuates (bright input bands that go dark in the output). Computed client-side via a Hann-windowed STFT over both clips (a small dependency-free radix-2 FFT), defaulting to the LocalVQE 512/256 frame geometry. This shows the net input->output spectral change; the model's internal gain mask is not exposed by the backend. - src/utils/fft.js radix-2 FFT - src/hooks/useSpectrogram.js decode + STFT -> normalised dB magnitude grid - src/components/audio/Spectrogram.jsx canvas heatmap (magma colormap) - AudioTransform.jsx dual-spectrogram panel + CSS - e2e spec + UI coverage baseline bump (38.29 -> 39.0; measured ~39.4-40.2) Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> * test(react-ui): make UI coverage deterministic, tighten the gate UI e2e line coverage swung ~1pp run-to-run (39.1% <-> 40.2%), which forced a loose 0.8pp tolerance on the monotonic gate — a band wide enough to let a real ~300-line regression through silently. The swing was a bug, not inherent jitter: the 'Create Agent navigates' spec ended on the URL assertion, so AgentCreate.jsx's ~400 lines were collected only when its render happened to beat the coverage teardown. Wait for the page to actually render (assert its heading) so those lines are covered every run. 
With the race gone, repeated runs land within ~0.013pp of each other, so: - tighten UI_COVERAGE_TOLERANCE 0.8 -> 0.1 (noise floor, not a drift band) - set the baseline to the real, reliably-achieved value (39.0 -> 39.86) Localised by running the V8-coverage suite repeatedly and diffing per-file line coverage; AgentCreate.jsx was the sole ~1pp flipper. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-31 23:56:46 +02:00
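The RIFF walk described in the `fix(localvqe): parse WAV by walking RIFF sub-chunks` part of 718223f33b can be sketched as below. This is an illustrative Python version, not the backend's code: the function name `parse_wav` and its return shape are hypothetical. It follows the behaviour the commit message states: iterate the sub-chunk list rather than assume a 44-byte header, honour the `data` chunk size, and require both `fmt ` and `data`. The even-length padding rule is standard RIFF behaviour rather than something the commit message mentions.

```python
import struct


def parse_wav(data):
    """Return (fmt_chunk_bytes, pcm_bytes) by walking RIFF sub-chunks."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    pcm = None
    pos = 12  # first sub-chunk starts after "RIFF", size, "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            fmt = body  # may be 16, 18 or 40 bytes
        elif chunk_id == b"data":
            pcm = body  # only the declared size, no trailing metadata
        # Other chunks (LIST/INFO, ...) are skipped.
        pos += 8 + size + (size & 1)  # chunks are padded to even length
    if fmt is None or pcm is None:
        raise ValueError("missing 'fmt ' or 'data' chunk")
    return fmt, pcm
```

With this approach an extensible `fmt ` chunk or a trailing LIST chunk is stepped over instead of being spliced into the PCM stream.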
LocalAI [bot]	76fe0bb929	feat(crispasr): add CrispASR backend — multi-architecture ASR + TTS (#10099 ) * feat(crispasr): backend source files (Go gRPC server, C-ABI shim, build files) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * polish(crispasr): brand error strings + fix stale shim comment Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): register backend in root Makefile Mirror the whisper Go backend registration for the new crispasr backend: NOTPARALLEL entry, prepare-test-extra/test-extra hooks, BACKEND_CRISPASR definition, docker-build target generation, and the docker-build-backends aggregate target. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): add backend build matrix entries Mirror the 11 whisper golang Dockerfile matrix entries (CPU amd64/arm64, CUDA 12/13, L4T CUDA 13, Intel SYCL f32/f16, Vulkan amd64/arm64, L4T arm64, ROCm hipblas) with backend and tag-suffix substituted to crispasr. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add crispasr backend gallery entries Add the crispasr meta anchor and its full set of image gallery entries (cpu, metal, cuda12/13, rocm, intel-sycl f32/f16, vulkan, L4T arm64, L4T cuda13 arm64, plus -development variants), mirroring the whisper backend gallery block. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): bump CRISPASR_VERSION via bump_deps workflow Track CrispStrobe/CrispASR main branch and bump CRISPASR_VERSION in backend/go/crispasr/Makefile. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): don't wire fixture-gated test into test-extra Mirror the whisper Go backend: its AudioTranscription test is gated on model/audio fixtures and skips in CI, so building crispasr (the heaviest ggml compile in the tree) inside the unit-test lane adds a long compile for zero coverage. The backend image build in backend-matrix.yml remains the authoritative compile check. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): add darwin metal build entry (mirror whisper) The metal-crispasr gallery entries and capabilities.metal mapping reference -metal-darwin-arm64-crispasr, which is only produced by an includeDarwin entry. Mirror whisper's darwin metal entry so the tag actually gets built. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(crispasr): place hipblas matrix entry next to whisper twin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): register crispasr as pref-only ASR backend + test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): port whisper behavioral suite (cancellation + streaming) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): fix skip message env var names to CRISPASR_* Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): switch shim to crispasr_session_* multi-architecture API The shim used whisper_full(), which in CrispASR is the whisper-only path: libcrispasr only transcribes Whisper GGUFs through it. Multi-architecture transcription (Parakeet, Voxtral, Qwen3-ASR, Canary, Granite, FunASR, Paraformer, SenseVoice, ...) goes through the crispasr_session_* C-ABI, which auto-detects the architecture from the GGUF and dispatches to the matching backend. Rewrite the C shim around crispasr_session_open / _transcribe_lang / _result_* and add get_backend() so the selected backend is logged. load_model now takes a threads param (session_open binds n_threads at open). The session result is segment+word based with no token IDs and no per-decode callback, so drop n_tokens / get_token_id / get_segment_speaker_turn_next / set_new_segment_callback. set_abort is kept for API parity but is best-effort: the session transcribe is blocking with no abort hook. 
Update the purego bindings and gocrispasr.go to match: tokens are left empty, speaker-turn handling is removed, and AudioTranscriptionStream emits one delta per non-empty segment after the blocking decode returns (no progressive streaming via the session API), preserving the concat(deltas) == final.Text invariant. crispasr_session_set_translate is exported by libcrispasr but not declared in crispasr.h, so it is forward-declared in the shim alongside the open/transcribe/result functions. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * build(crispasr): link full CrispASR backend set for multi-arch support The shim's crispasr_session_* dispatch calls into the per-architecture backend libs (parakeet, voxtral, qwen3_asr, canary, funasr, paraformer, sensevoice, ...), which CrispASR builds as static archives. Linking only crispasr + ggml dead-stripped every backend object from the final module (nm backend-symbol count: 0), leaving a whisper-only .so. Link the same backend set as crispasr-cli so the static archives are pulled in. After this the module carries the backend symbols (nm count 407, .so grows from ~2.1MB to ~6.7MB) and the session API can dispatch to every compiled-in architecture. Also rewrite ${CMAKE_SOURCE_DIR}/examples/talk-llama to ${PROJECT_SOURCE_DIR}/... in the vendored src/CMakeLists.txt: CrispASR locates its vendored llama.cpp via ${CMAKE_SOURCE_DIR}, which is wrong when CrispASR is add_subdirectory'd (CMAKE_SOURCE_DIR points at this backend dir, not the CrispASR root). PROJECT_SOURCE_DIR is correct both standalone and as a subproject; the sed is idempotent. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): adapt suite to session API (blocking, no decode callback) Register the new symbol set (drop the removed token/speaker/callback funcs, add get_backend; load_model now takes 2 args). 
The session transcribe is blocking with no abort hook, so a mid-decode cancel can't interrupt it: change the cancellation spec to cancel the context before the call and assert codes.Canceled from the pre-call ctx.Err() check, dropping the <5s mid-decode timing assertion. The streaming spec still holds with per-segment post-decode emission (>=2 deltas, concat(deltas) == final.Text). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add CrispASR ASR model entries (-crispasr) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(gallery): keep only session-auto-detectable CrispASR ASR models The crispasr backend loads models via crispasr_session_open, which auto-detects the backend from the GGUF general.architecture using crispasr_detect_backend_from_gguf. Architectures not in that detect map cannot be opened, so those gallery entries fail to load. Removed entries whose architecture is not wired into CrispASR v0.6.11's session auto-detect router (they can be re-added when upstream maps them): - Not in the detect map: data2vec, firered-asr, funasr, fun-asr-mlt-nano, glm-asr, hubert, kyutai-stt, mega-asr, mimo-asr, moonshine{,-de,-streaming,-tiny-de}, omniasr{,-llm,-llm-1b}, paraformer, sensevoice. - Pending verification (filename-heuristic routed, not arch-detected): parakeet-ctc-0.6b, parakeet-ctc-1.1b. Their GGUFs are routed to the fastconformer-ctc backend by a filename heuristic in the model registry, which implies general.architecture is not a mapped string. Kept the parakeet rnnt/tdt_ctc variants: convert-parakeet-to-gguf.py writes general.architecture="parakeet" unconditionally and encodes the rnnt/ctc distinction in metadata fields, so they session-auto-detect. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): TTS synthesis via crispasr_session_synthesize (24kHz) Add tts_synthesize/tts_free/tts_set_voice to the C-ABI shim. 
They reuse the already-open g_session (crispasr_session_open auto-detects a TTS model) and dispatch to the upstream synthesis call, which returns malloc'd 24 kHz mono float PCM. Orpheus needs a SNAC codec path that we do not set, so it returns NULL here and surfaces as an error Go-side. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): implement TTS/TTSStream gRPC methods Bind the new shim functions via purego and implement TTS, TTSStream and a writeWAV24k helper. synthesize copies the C-owned PCM out before freeing it; TTS writes a 24 kHz mono 16-bit WAV to req.Dst via go-audio/wav. CrispASR has no progressive synth, so TTSStream synthesizes fully, encodes to WAV, and emits the bytes as a single chunk; it owns the results-channel close (the gRPC server wrapper ranges until close), mirroring vibevoice-cpp's TTSStream. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): log when a TTS voice override is not honored Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add CrispASR vibevoice-tts model entry Only vibevoice-tts works through the current shim: qwen3-tts, chatterbox, and orpheus require companion codec/s3gen/SNAC paths (set_codec_path / set_s3gen_path) that the shim doesn't wire yet, and kokoro/indextts/voxcpm2 aren't in the session auto-detect map. Those are follow-ups. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test(crispasr): gated TTS synthesis spec Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(crispasr): satisfy golangci-lint (errcheck defers + unsafeptr nolint) The crispasr Go file is entirely new, so new-from-merge-base lints every line (unlike the grandfathered whisper backend it was forked from): - handle os.RemoveAll / fh.Close return values in AudioTranscription - annotate the two intentional C-pointer unsafe.Slice sites with //nolint:govet Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): backend: and codec: model options (explicit arch + companion files) Add two model-config options to the CrispASR backend via opts.Options: - backend:<name> selects an explicit CrispASR backend (bypassing auto-detect) by routing load_model through crispasr_session_open_explicit, unlocking architectures the detector won't pick on its own (qwen3, cohere, granite, voxtral, moonshine, mimo-asr, orpheus, kokoro, chatterbox, etc.). - codec:<path> loads a companion file (qwen3-tts codec, orpheus SNAC, chatterbox s3gen, or mimo-asr tokenizer) via the universal crispasr_session_set_codec_path setter after the session opens. A relative path resolves against the model directory. rc==0 means success or not-applicable; only a negative rc is fatal. The C shim load_model gains a backend_name argument and a new set_codec_path entry point; the Go bridge parses the prefix:value options and registers the new symbol. The vad_only path is unchanged. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): expand CrispASR models via backend:/codec: options (explicit arch + companions) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(gallery): use virtual.yaml base for crispasr models The crispasr entries are just backend + model + a couple options, fully expressed inline via overrides:/files: in gallery/index.yaml. 
Point each url: at the shared gallery/virtual.yaml (the established 'virtual' model trick) and drop the 36 redundant per-model gallery/-crispasr.yaml files. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(gallery): drop voice-requiring TTS entries (keep vibevoice-tts) Real e2e showed qwen3-tts/orpheus/chatterbox don't synthesize through the current shim: the codec: companion loads fine, but these engines additionally need a voice pack / voice prompt / reference clip (qwen3-tts base errors 'no voice'; chatterbox is zero-shot cloning; orpheus uses named voices) that the backend doesn't wire. (qwen3-tts also can't auto-detect: its GGUF arch is 'qwen3tts', unmapped by the detector — would need backend:qwen3-tts.) Removed to avoid shipping non-working gallery entries; vibevoice-tts (built-in voice, e2e-verified) remains the working TTS. Voice-pack wiring is a follow-up. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(crispasr): speaker: and voice: TTS options (baked speakers + voice packs/prompts) speaker:<name> -> crispasr_session_set_speaker_name (baked speakers: qwen3-tts CustomVoice, orpheus). voice:<path>(+voice_text:<ref>) -> crispasr_session_set_voice (voice-pack GGUF, or WAV zero-shot clone with ref text). Applied at Load as the default voice; req.Voice still overrides the speaker per request. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): re-add e2e-verified TTS engines (chatterbox, qwen3-tts-customvoice, orpheus) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 12:11:03 +02:00
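The `prefix:value` option handling described for the crispasr backend in 76fe0bb929 can be sketched as below. This is a hypothetical Python illustration, not the Go bridge: `parse_options`, its return shape, and the example option values are made up. Only the `prefix:value` form and the rule that a relative `codec:` path resolves against the model directory come from the commit message.

```python
import os


def parse_options(options, model_dir):
    """Split 'prefix:value' strings into a dict (illustrative only)."""
    parsed = {}
    for opt in options:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = opt.partition(":")
        if not sep:
            continue  # not a prefix:value option
        if key == "codec" and not os.path.isabs(value):
            # A relative companion-file path resolves against the model dir.
            value = os.path.join(model_dir, value)
        parsed[key] = value
    return parsed
```

For example, `parse_options(["backend:orpheus", "codec:snac.gguf"], "/models")` keeps the backend name as given and joins the codec file name onto `/models`.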
LocalAI [bot]	4912c9b73a	feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084 ) * feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text) Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends). L0 scope: - main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server. - goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy. - AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2. - Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper. - Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures. Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern. Word/segment timestamps (L1) and cache-aware streaming (L2) follow. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult: - Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader. - Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs. 
- Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens. - Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default). secondsToNanos mirrors the whisper backend's nanosecond convention. Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend. Cache-aware streaming (L2) follows. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation Wire AudioTranscriptionStream to the streaming RNN-T C-API: - Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes eou_out on <EOU>/<EOB>. - Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta. - <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired). - stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream. Honours context cancellation between chunks. Frees every malloc'd delta and the session. Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity) Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1. - .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and --parakeet-cpp tag-suffixes. - scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch. - .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master). - backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix. - Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp- transcription target (mirrors test-extra-backend-whisper-transcription). Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown). - Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp". 
It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here). - Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer. - Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description. Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * docs(parakeet-cpp): document the parakeet-cpp transcription backend Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job: - Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry). - Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * style(parakeet-cpp): drop em dashes from comments and docs Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(gallery): add parakeet-cpp f16 models to the model gallery Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION - goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged). - Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): init the ggml submodule in the backend image clone The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend. Verified locally: clone + submodule + cmake configure now succeeds. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(parakeet-cpp): statically link ggml into libparakeet.so The shared libparakeet.so linked ggml's shared libs (libggml.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image. Verified locally: ldd shows no libggml dependency. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model. Verified locally: the streaming spec passes against the non-streaming 110m model via fallback. 
Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-30 14:46:10 +02:00
LocalAI [bot]	b389f0fe5f	chore(model-gallery): ⬆️ update checksum (#10081) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-30 00:11:57 +02:00
LocalAI [bot]	3c9b9529c0	chore(model gallery): 🤖 add 1 new models via gallery agent (#10061) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-29 16:39:14 +02:00
LocalAI [bot]	73cfedc023	fix: tool-call JSON leaks into content with stream+tools on tokenizer-template models (#10052 ) (#10057 ) * fix(grammars): honor properties_order entry at index 0 The JSON-schema-to-GBNF property sort used `aOrder != 0 && bOrder != 0` as its "is this key ordered?" guard. That treats index 0 — the first key listed in properties_order — as unset, so `properties_order: name,arguments` fell back to alphabetical ordering and still emitted "arguments" before "name". Use presence in the order map instead: listed keys sort by their index and ahead of unlisted keys, which keep a stable alphabetical order. This makes the documented `properties_order: name,arguments` actually produce name-first tool-call JSON. Relates to #10052. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(functions): defer tool grammar to the backend when the tokenizer template owns templating (#10052) When use_tokenizer_template delegates templating to the backend (llama.cpp), the backend also owns tool-call grammar generation and parsing. LocalAI was still generating its own GBNF grammar and sending it down. With a grammar present, llama.cpp does not hand the tools to its template, so its native peg/json tool parser never engages: it streams the grammar-constrained tool-call JSON back as plain content instead of emitting tool_calls. In streaming mode the JSON object leaked into the content field, and the Go-side incremental detector never gated content because the LocalAI-generated grammar emitted "arguments" before "name". The GGUF auto-import path already couples use_tokenizer_template with grammar.disable, but that block is skipped when a template is already configured, so gallery and hand-written configs (e.g. qwen3) that set the tokenizer template directly never got the paired grammar.disable. 
- SetDefaults now enforces the coupling for every config: when use_tokenizer_template is set, grammar generation is disabled and tools flow to the backend's native (name-first) pipeline. This also fixes already-installed models without editing each config. - Set function.grammar.disable in the shared gallery/qwen3.yaml, which is the base config referenced by every qwen3 gallery entry. Verified end to end against qwen3-4b with stream:true + tools: content no longer carries the tool-call JSON, reasoning is classified separately, and tool calls stream as proper name-first tool_calls deltas. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-29 10:12:53 +02:00
LocalAI [bot]	2671e0c6f7	chore(model-gallery): ⬆️ update checksum (#10038) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-28 00:22:19 +02:00
LocalAI [bot]	7a4ca8f60d	feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028 ) Adds a Go native gRPC backend that dlopens librfdetrcpp.so (built from mudler/rf-detr.cpp at the pinned RFDETR_VERSION) via purego and exposes the rfdetr.cpp inference pipeline through LocalAI's existing Detect RPC. Supports all 5 RF-DETR detection variants (Nano/Small/Base/Medium/Large) and 6 segmentation variants (SegNano/SegSmall/SegMedium/SegLarge/ SegXLarge/Seg2XLarge) with F32/F16/Q8_0/Q4_K quantizations. Pre-built GGUFs ship at mudler/rfdetr-cpp-* on HuggingFace. Detection returns Bbox + class_name + confidence; segmentation also returns PNG-encoded per-detection masks via the rfdetr_capi accessor functions (rfdetr_capi_get_detection_{class_id,box,score,class_name, mask_png}). End-to-end verified through POST /v1/detection: HTTP -> gRPC -> purego dlopen -> rfdetr.cpp -> ggml -> response (9 detections on the detection model, 21 detections + valid PNG masks on the seg-nano model against the kitchen fixture). 
Wiring: - backend/go/rfdetr-cpp/{main.go,gorfdetrcpp.go,CMakeLists.txt, Makefile,run.sh,package.sh,test.sh,.gitignore} - Top-level Makefile: BACKEND_RFDETR_CPP, docker-build target, .NOTPARALLEL, prepare-test-extra, test-extra - backend/go/rfdetr-cpp/Makefile: `test` target invoked by test-extra - .github/backend-matrix.yml: CPU + CUDA-12/13 + L4T CUDA-12/13 (arm64) + HIP + Vulkan (amd64 + arm64) + SYCL f32/f16 - backend/index.yaml: rfdetr-cpp meta anchor + latest/development image entries for every matrix tag-suffix - .github/workflows/bump_deps.yaml: RFDETR_VERSION pin tracking (mudler/rf-detr.cpp branch main) - gallery/index.yaml: 11 rfdetr-cpp-* entries (nano + 4 detection variants + 6 seg variants), all backed by mudler/rfdetr-cpp-* on HuggingFace with sha256 pinning on the F16 default - core/gallery/importers/rfdetr.go: GGUF auto-routing for HF imports (mudler/rfdetr-cpp-* repos route to rfdetr-cpp, Transformer-format repos stay on the Python rfdetr backend; explicit preferences.backend overrides both heuristics) - core/gallery/importers/rfdetr_test.go: table-driven coverage of the auto-routing + a live mudler/rfdetr-cpp-nano cross-check scripts/changed-backends.js needs no change: the existing Dockerfile.golang -> backend/go/${item.backend}/ branch already routes the 9 rfdetr-cpp matrix entries to the correct backend path. Assisted-by: Claude:claude-opus-4-7 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-27 18:43:57 +02:00
LocalAI [bot]	80893a298b	chore(model gallery): 🤖 add 1 new models via gallery agent (#10016) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 22:02:12 +02:00
LocalAI [bot]	437f0fa193	chore(model gallery): 🤖 add 1 new models via gallery agent (#10011) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-26 08:45:10 +02:00
LocalAI [bot]	1c6c3adad6	fix(reasoning): stop <think> leaking into content when autoparser is in pure-content mode (#9991 ) When LocalAI templates a thinking model outside of jinja (the default for the qwen3 gallery family), llama.cpp's chat parser falls back to a "pure content" PEG parser that dumps the entire raw response into ChatDelta.Content with an empty ReasoningContent. The Go side then trusted that content verbatim and overrode tokenCallback's correctly-split reasoning, so <think>...</think> blocks ended up in the OpenAI `content` field. Regression from v4.0.0 introduced when the autoparser ChatDeltas path was added (#9224). The override now runs Go-side reasoning extraction defensively when the autoparser delivered content but no reasoning. The streaming worker gains a sticky preferAutoparser flag that flips on the first chunk where the autoparser classified reasoning_content; until then we use the streaming Go-side extractor. Realtime mirrors the non-streaming fallback. When the autoparser already populated ReasoningContent we trust it untouched, so jinja-enabled installs are not regressed. gallery/qwen3.yaml now enables use_jinja, letting the autoparser classify <think> natively for all 20+ qwen3 family entries that share this template. Fixes #9985 Assisted-by: Claude:opus-4-7 [Read] [Edit] [Bash] [Write] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-25 22:39:50 +02:00
Ettore Di Giacinto	c2cd3b9ada	fix(gallery/ltx-2.3): add vae_decode_only:false for i2v / flf2v (#9987 ) LTX-2.3 i2v inference fails inside generate_video with: [ERROR] LTXAV image conditioning requires VAE encoder weights; create the context with vae_decode_only=false Without vae_decode_only:false in the options block, gosd.cpp creates the sd_ctx with VAE encoder weights freed, so latent encoding of the init_image is impossible. Adding the option mirrors what we already do for Wan i2v entries. Affects all six LTX-2.3 entries (dev/distilled × UD-Q4_K_M, Q4_K_M, Q8_0). T2V wasn't impacted by the missing option since it has no init image to encode, which is why the T2V smoke earlier passed. Assisted-by: Claude:claude-opus-4-7	2026-05-25 21:40:12 +02:00
Ettore Di Giacinto	9ff270eb65	fix(gallery/ltx-2.3): add diffusion_model flag to all variants (#9986 ) LTX-2.3 entries (dev / distilled, UD-Q4_K_M / Q4_K_M / Q8_0) were missing the `diffusion_model` option in their overrides. Without it, gosd.cpp routes the main GGUF through the regular `model_path` code path in sd.cpp, which doesn't apply the `model.diffusion_model.` tensor prefix. sd.cpp's LTX-2.3 architecture detection (`VERSION_LTXAV`) in get_sd_version checks for prefixed tensor names — without the prefix, detection fails and load_model returns "could not load model". This is the same bug we hit for Wan when the option was missing. Adding `- diffusion_model` to all six LTX-2.3 entries' option blocks makes load_model take the diffusion_model_path branch so detection succeeds. Assisted-by: Claude:claude-opus-4-7	2026-05-25 21:10:40 +02:00
LocalAI [bot]	b02e3ffe61	feat(stablediffusion-ggml): LTX-2 support + LTX-2.3 GGUF gallery entries (#9980 ) stable-diffusion.cpp gained LTX-2 video generation, which requires an audio VAE and an embeddings_connectors safetensors in addition to the usual diffusion model, VAE, and LLM text encoder. The pinned commit exposes audio_vae_path and embeddings_connectors_path on sd_ctx_params_t; wire both through the option parser so gallery entries can point at the LTX-specific assets. Ship six LTX-2.3 GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0 each) backed by a new ltx-ggml.yaml template that defaults to euler / cfg_scale 6.0 / vae_decode_only:false / diffusion_flash_attn / offload_params_to_cpu — matching the upstream LTX-2 CLI recipe. Each entry pulls the model GGUF plus the QAT gemma-3-12b-it text encoder, video VAE, audio VAE, and embeddings connectors needed for T2V / I2V / FLF2V. Assisted-by: Claude:claude-opus-4-7 [Claude-Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-25 13:00:28 +02:00
Richard Palethorpe	6a80e23733	feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802 ) Add a routing middleware stack and a cloud-proxy backend. * cloud-proxy: a Go gRPC backend that forwards OpenAI- and Anthropic-shaped chat requests to upstream providers, with an optional translate mode (OpenAI request -> Anthropic /v1/messages -> OpenAI response) and full tool-calling support. * routing: admission control, content-aware model routing (embedding cache + classifier + rerank + Arch-Router score), PII detection/redaction (regex + NER) with streaming filter and OpenAI/Anthropic adapters, and a per-user/per-key billing recorder backed by GORM or in-memory storage. * middleware: UsageMiddleware records usage via the billing recorder, plus admission, route-model, usage-stamp and trace middlewares. * observability: BackendTrace ring buffer stores full request bodies (capped), MITM proxy emits structured trace events, and router classifier decisions surface at /api/router/decide. * gallery: Arch-Router-1.5B (Q4_K_M and Q8_0). * UI: cloud-proxy model-editor fields, classifier system-prompt and score-normalization config, and a Traces page rendering request bodies. Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash] Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-25 09:28:27 +02:00
LocalAI [bot]	e3706c0512	chore(model-gallery): ⬆️ update checksum (#9910) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 23:38:45 +02:00
LocalAI [bot]	403d391316	chore(model-gallery): ⬆️ update checksum (#9901) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-05-20 01:01:20 +02:00


1420 Commits