LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-20 05:04:04 -04:00

Files

LocalAI [bot] 4912c9b73a feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084)

* feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text)

Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat
C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the
whisper / vibevoice-cpp backends).

L0 scope:
- main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register
  the C-API entry points, start the gRPC server.
- goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription
  (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), and
  Free, all serialized through base.SingleThread since the C engine is a
  thread-unsafe singleton. char* returns are bound as uintptr so the
  malloc'd buffer can be freed via parakeet_capi_free_string after copy.
- AudioTranscriptionStream returns a clear "not implemented in L0" error
  and closes the channel so the server doesn't hang; streaming is wired in L2.
- Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh),
  with a local-symlink dev shortcut; run.sh / package.sh mirror whisper.
- Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures.

Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single
unsafeptr vet note in goStringFromCPtr is documented and matches the
whisper backend's tolerated pattern.

Word/segment timestamps (L1) and cache-aware streaming (L2) follow.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json

AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes
the per-word / per-token timestamps into the TranscriptResult:

- Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like
  the other returns) and register it in main.go + the test loader.
- Parse the JSON document ({"text","words":[{w,start,end,conf}],
  "tokens":[{id,t,conf}]}) into typed structs.
- Synthesise a single whole-clip segment (parakeet emits no native segment
  boundaries) spanning the first word start to the last word end; token ids
  populate Segment.Tokens.
- Attach word-level timings only when timestamp_granularities=["word"],
  matching the OpenAI API (segment-level default). secondsToNanos mirrors
  the whisper backend's nanosecond convention.
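
The shaping steps above can be sketched in Go roughly as follows. This is a minimal illustration, not the backend's actual code: the struct names, the `segment` type, and `wholeClipSegment` are hypothetical, and it assumes that `start`, `end`, and `t` in the JSON are expressed in seconds and that rounding to the nearest nanosecond is acceptable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// word and token mirror the JSON keys quoted in the commit message.
// The Go type names are illustrative.
type word struct {
	W     string  `json:"w"`
	Start float64 `json:"start"` // assumed to be seconds
	End   float64 `json:"end"`   // assumed to be seconds
	Conf  float64 `json:"conf"`
}

type token struct {
	ID   int32   `json:"id"`
	T    float64 `json:"t"`
	Conf float64 `json:"conf"`
}

type transcript struct {
	Text   string  `json:"text"`
	Words  []word  `json:"words"`
	Tokens []token `json:"tokens"`
}

// segment is a hypothetical stand-in for the backend's segment message.
type segment struct {
	Start, End int64 // nanoseconds
	Text       string
	Tokens     []int32
	Words      []word // filled only when word granularity is requested
}

// secondsToNanos converts seconds to nanoseconds, rounding to nearest.
func secondsToNanos(s float64) int64 {
	return int64(math.Round(s * 1e9))
}

// wholeClipSegment parses the JSON document and synthesises one segment
// spanning the first word start to the last word end.
func wholeClipSegment(raw []byte, wantWords bool) (segment, error) {
	var t transcript
	if err := json.Unmarshal(raw, &t); err != nil {
		return segment{}, fmt.Errorf("parse transcript JSON: %w", err)
	}
	seg := segment{Text: t.Text}
	if n := len(t.Words); n > 0 {
		seg.Start = secondsToNanos(t.Words[0].Start)
		seg.End = secondsToNanos(t.Words[n-1].End)
	}
	for _, tok := range t.Tokens {
		seg.Tokens = append(seg.Tokens, tok.ID)
	}
	if wantWords {
		seg.Words = t.Words
	}
	return seg, nil
}

func main() {
	raw := []byte(`{"text":"hello world",
		"words":[{"w":"hello","start":0.25,"end":0.5,"conf":0.9},
		         {"w":"world","start":0.75,"end":1.5,"conf":0.8}],
		"tokens":[{"id":7,"t":0.25,"conf":0.9},{"id":9,"t":0.75,"conf":0.8}]}`)
	seg, err := wholeClipSegment(raw, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(seg.Start, seg.End, seg.Tokens, len(seg.Words))
}
```

Passing `wantWords=true` corresponds to a request with timestamp_granularities=["word"]; with the default, the segment carries only its span, text, and token ids.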

Verified end-to-end against tdt_ctc-110m (f16): both the default and
word-granularity specs pass; builds clean, gofmt clean, vet shows only the
one documented unsafeptr note shared with the whisper backend.

Cache-aware streaming (L2) follows.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation

Wire AudioTranscriptionStream to the streaming RNN-T C-API:

- Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz
  mono float PCM ([]float32 via purego) and writes *eou_out on <EOU>/<EOB>.
- Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as
  the whisper backend), feed it in 1 s chunks, and emit each newly-finalized
  text run as a TranscriptStreamResponse delta.
- <EOU>/<EOB> events close the current segment; a closing FinalResult carries
  the full transcript plus the per-utterance segments (with a whole-clip
  fallback segment when no EOU fired).
- stream_begin returns 0 for non-streaming models, surfaced as a clear
  error instead of an empty stream. Honours context cancellation between
  chunks. Frees every malloc'd delta and the session.

Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed
transcript matches the offline 110m reference word-for-word, deltas
reconstruct the final text, and the spec passes alongside the offline
specs. Builds clean, gofmt clean, vet shows only the shared documented
unsafeptr note.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity)

Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA
NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the
existing ggml whisper Go backend 1:1.

- .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring
  every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64,
  nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64,
  rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with
  backend: "parakeet-cpp" and -*-parakeet-cpp tag-suffixes.
- scripts/changed-backends.js: explicit inferBackendPath branch resolving
  parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch.
- .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in
  backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master).
- backend/index.yaml: add &parakeetcpp meta + latest/development image entries
  for every matrix tag-suffix.
- Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP
  definition, docker-build target eval, and test-extra-backend-parakeet-cpp-
  transcription target (mirrors test-extra-backend-whisper-transcription).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs

Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model
and route to the parakeet-cpp backend (it also surfaces in /backends/known,
which drives the import dropdown).

- Match is narrow: a .gguf whose name carries a parakeet architecture token
  (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf,
  realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or
  preferences.backend="parakeet-cpp". It deliberately does NOT claim arbitrary
  llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not
  runnable here).
- Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't
  swallowed by the generic .gguf importer.
- Import nests files under parakeet-cpp/models/<name>/, defaults to the
  smallest quant (q4_k, near-lossless on parakeet) with a size-ladder
  fallback, and honours preferences.quantizations / name / description.
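
A narrow file-name match and a quant fallback of the kind described above could be sketched as follows. Everything here is illustrative rather than the importer's real logic: the prefix list is derived only from the file names quoted in this message, and the ladder order (q4_k, then q8_0, then f16) is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// archPrefixes lists architecture tokens seen in the example file names;
// the real importer may recognise a different set.
var archPrefixes = []string{"tdt_ctc-", "tdt-", "ctc-", "rnnt-", "realtime_eou_"}

// isParakeetGGUF reports whether a file name or direct URL looks like a
// parakeet.cpp GGUF, without claiming arbitrary .gguf files.
func isParakeetGGUF(nameOrURL string) bool {
	base := strings.ToLower(path.Base(nameOrURL))
	if !strings.HasSuffix(base, ".gguf") {
		return false
	}
	for _, p := range archPrefixes {
		if strings.HasPrefix(base, p) {
			return true
		}
	}
	return false
}

// quantLadder is an assumed preference order, smallest first.
var quantLadder = []string{"q4_k", "q8_0", "f16"}

// pickQuant returns the first file matching a preferred quant, falling back
// down the ladder when no preference matches.
func pickQuant(files []string, preferred ...string) (string, bool) {
	order := append(append([]string{}, preferred...), quantLadder...)
	for _, q := range order {
		for _, f := range files {
			if strings.HasSuffix(strings.ToLower(f), "-"+q+".gguf") {
				return f, true
			}
		}
	}
	return "", false
}

func main() {
	for _, n := range []string{
		"tdt_ctc-110m-f16.gguf",
		"https://example.invalid/models/rnnt-0.6b-q4_k.gguf",
		"llama-3-8b-q4_k.gguf",
		"parakeet-tdt-0.6b-v2.nemo",
	} {
		fmt.Println(n, isParakeetGGUF(n))
	}
	files := []string{"rnnt-0.6b-f16.gguf", "rnnt-0.6b-q8_0.gguf"}
	fmt.Println(pickQuant(files))
}
```

Checking the architecture prefix before the generic `.gguf` suffix is what keeps the match narrow: a llama-style file shares the extension but not the prefix, so it falls through to the generic importer.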

Tested with synthetic HF details (no network): metadata, positive matches
(HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo
repo), and import (default quant, override, direct URL). All 9 specs pass;
build/vet/gofmt clean.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(parakeet-cpp): document the parakeet-cpp transcription backend

Add parakeet-cpp to the audio-to-text backend list and a dedicated usage
section: direct GGUF import (auto-detects to the backend), model YAML,
word-level timestamps via timestamp_granularities[]=word, and cache-aware
streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf
collection repo.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra

The L3 commit added the test-extra-backend-parakeet-cpp-transcription
Makefile target but never invoked it in CI. Mirror the whisper job:

- Add a parakeet-cpp output to detect-changes (emitted by
  changed-backends.js from the matrix entry).
- Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp
  path filter / run-all, building the backend image and running the
  transcription e2e against tdt_ctc-110m + the JFK clip.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(parakeet-cpp): drop em dashes from comments and docs

Replace em dashes with plain punctuation in the backend comments, the
importer, package.sh, and the audio-to-text docs section (and use "and"
instead of the multiplication sign). No behaviour change.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): add parakeet-cpp f16 models to the model gallery

Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed
default) as gallery entries that install on the parakeet-cpp backend from
mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b,
ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming
realtime_eou_120m-v1. Each pins the file sha256 and routes transcript
usecases to the backend.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION

- goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer
  conversion (golangci-lint reports new-only issues, so unlike the whisper
  backend's identical line this one is flagged).
- Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit
  (the previous pin's commit no longer exists after upstream history was
  squashed), so the backend image clone/build resolves again.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit

The previous SHA pin was orphaned when parakeet.cpp's single-commit master
was amended/force-pushed, so the backend image clone (git fetch <sha>) failed
across every build variant. Repoint to 845c29e, which upstream now keeps
permanently fetchable via the `localai-backend-pin` tag, so future upstream
amends no longer break the backend build.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): init the ggml submodule in the backend image clone

The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow
fetch + checkout but never initialised submodules, so third_party/ggml was
empty and the parakeet.cpp cmake build failed at
`add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build
variant. Add `git submodule update --init --recursive --depth 1
--single-branch` after checkout, mirroring the whisper backend. Verified
locally: clone + submodule + cmake configure now succeeds.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): statically link ggml into libparakeet.so

The shared libparakeet.so linked ggml's shared libs (libggml*.so), but the
package only ships libparakeet.so, so at runtime dlopen failed with
"libggml.so.0: cannot open shared object file" (the e2e transcription test
panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF,
CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends
only on system libs already present in the runtime image. Verified locally:
ldd shows no libggml dependency.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream

The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m
(not a cache-aware streaming model), so stream_begin returned 0 and the call
errored. Per LocalAI's streaming contract (and the whisper backend), a
non-streaming model should fall back to a single offline transcription
emitted as one delta plus a closing FinalResult. Do that instead of erroring,
so the streaming endpoint works for every parakeet model. Verified locally:
the streaming spec passes against the non-streaming 110m model via fallback.
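
The fallback can be pictured with the following Go sketch. The `engine` and `streamEvent` types are hypothetical stand-ins for the bound C-API calls and the streaming response message, and the cache-aware path is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// streamEvent is a hypothetical stand-in for the streaming response message.
type streamEvent struct {
	Delta string // incremental text
	Final bool   // true for the closing result
	Text  string // full transcript, set on the final event
}

// engine abstracts the two code paths; both fields are illustrative.
type engine struct {
	streamBegin func() uintptr                    // 0 means the model cannot stream
	transcribe  func(path string) (string, error) // offline transcription
}

// transcribeStream falls back to one offline transcription, emitted as a
// single delta plus a closing final event, when streamBegin returns 0.
// The channel is always closed so the consumer never blocks forever.
func transcribeStream(e engine, path string, out chan<- streamEvent) error {
	defer close(out)
	if h := e.streamBegin(); h != 0 {
		return errors.New("cache-aware streaming path omitted in this sketch")
	}
	text, err := e.transcribe(path)
	if err != nil {
		return fmt.Errorf("offline fallback: %w", err)
	}
	out <- streamEvent{Delta: text}
	out <- streamEvent{Final: true, Text: text}
	return nil
}

func main() {
	e := engine{
		streamBegin: func() uintptr { return 0 }, // non-streaming model
		transcribe:  func(string) (string, error) { return "hello world", nil },
	}
	out := make(chan streamEvent, 2) // buffered so no consumer goroutine is needed
	if err := transcribeStream(e, "clip.wav", out); err != nil {
		panic(err)
	}
	for ev := range out {
		fmt.Printf("%+v\n", ev)
	}
}
```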

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-05-30 14:46:10 +02:00

alpaca.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

arch-function.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

bge-m3-colbert.yaml

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

cerbero.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

chatml-hercules.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

chatml.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

codellama.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

command-r.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

deephermes.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

deepseek-r1.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

deepseek.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

dreamshaper.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

falcon3.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

flux-ggml.yaml

fix(flux): Set CFG=1 so that prompts are followed (#5378)

2025-05-16 17:53:54 +02:00

flux.yaml

fix(flux): Set CFG=1 so that prompts are followed (#5378)

2025-05-16 17:53:54 +02:00

gemma.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

granite3-2.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

granite4.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

granite.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

harmony.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

hermes-2-pro-mistral.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

hermes-vllm.yaml

chore(model-gallery): add more quants for popular models (#3365)

2024-08-24 00:29:24 +02:00

index.yaml

feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) (#10084)

2026-05-30 14:46:10 +02:00

jamba.yaml

chore(model gallery): add ai21labs_ai21-jamba-reasoning-3b (#6417)

2025-10-09 15:00:56 +02:00

kokoros.yaml

feat: Add Kokoros backend (#9212)

2026-04-08 19:23:16 +02:00

lfm.yaml

feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)

2026-05-13 21:57:27 +02:00

liquid-audio.yaml

feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)

2026-05-13 21:57:27 +02:00

llama3-instruct.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llama3.1-instruct-grammar.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llama3.1-instruct.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llama3.1-reflective.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llama3.2-fcall.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llama3.2-quantized.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

llava.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

ltx-ggml.yaml

feat(stablediffusion-ggml): LTX-2 support + LTX-2.3 GGUF gallery entries (#9980)

2026-05-25 13:00:28 +02:00

mathstral.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

mistral-0.3.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

moondream.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

mudler.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

nanbeige4.1.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

noromaid.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

openvino.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

parler-tts.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

phi-2-chat.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

phi-2-orange.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

phi-3-chat.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

phi-3-vision.yaml

fix(phi3-vision): add multimodal template (#3944)

2024-10-23 15:34:45 +02:00

phi-4-chat-fcall.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

phi-4-chat.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

piper.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

pocket-tts.yaml

feat(tts): add pocket-tts backend (#8018)

2026-01-13 23:35:19 +01:00

qwen3-deepresearch.yaml

chore(model gallery): add alibaba-nlp_tongyi-deepresearch-30b-a3b (#6295)

2025-09-17 09:22:19 +02:00

qwen3-openbuddy.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

qwen3.yaml

fix: tool-call JSON leaks into content with stream+tools on tokenizer-template models (#10052) (#10057)

2026-05-29 10:12:53 +02:00

qwen-fcall.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

qwen-image.yaml

Update qwen-image.yaml

2025-08-06 10:40:46 +02:00

rerankers.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

rwkv.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

sd-ggml.yaml

chore(model gallery): add sd-3.5-large-ggml (#4647)

2025-01-20 19:04:23 +01:00

sentencetransformers.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

sglang-gemma-4-e2b-mtp.yaml

feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

2026-05-07 17:27:29 +02:00

sglang-gemma-4-e4b-mtp.yaml

feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

2026-05-07 17:27:29 +02:00

sglang-mimo-7b-mtp.yaml

feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

2026-05-07 17:27:29 +02:00

sglang.yaml

feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

2026-05-07 17:27:29 +02:00

sherpa-onnx-asr.yaml

feat: Add Sherpa ONNX backend for ASR and TTS (#8523)

2026-04-24 14:40:06 +02:00

sherpa-onnx-tts.yaml

feat: Add Sherpa ONNX backend for ASR and TTS (#8523)

2026-04-24 14:40:06 +02:00

sherpa-onnx-vad.yaml

feat: Add Sherpa ONNX backend for ASR and TTS (#8523)

2026-04-24 14:40:06 +02:00

smolvlm.yaml

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

stablediffusion3.yaml

feat(sd-3): add stablediffusion 3 support (#2591)

2024-06-18 15:09:39 +02:00

tuluv2.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

vibevoice.yaml

feat(vibevoice): add new backend (#7494)

2025-12-10 21:14:21 +01:00

vicuna-chat.yaml

models(gallery): add apollo2-9b (#3860)

2024-10-17 10:16:52 +02:00

virtual.yaml

fix: yamlint warnings and errors (#2131)

2024-04-25 17:25:56 +00:00

vllm.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

wan-ggml.yaml

chore(gallery): fixup wan

2026-04-19 21:31:22 +00:00

whisper-base.yaml

models(gallery): add all whisper variants (#2462)

2024-06-01 20:04:03 +02:00

wizardlm2.yaml

feat: refactor build process, drop embedded backends (#5875)

2025-07-22 16:31:04 +02:00

z-image-ggml.yaml

Fix load of z-image-turbo (#9264)

2026-04-11 08:42:13 +02:00