mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-14 19:58:44 -04:00
* feat(parakeet-cpp): L0 backend scaffold, LoadModel + AudioTranscription (text)

Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends).

L0 scope:
- main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server.
- goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy.
- AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2.
- Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper.
- Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures.

Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern.

Word/segment timestamps (L1) and cache-aware streaming (L2) follow.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L1 word/segment timestamps via transcribe_path_json

AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult:
- Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader.
- Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs.
- Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens.
- Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default). secondsToNanos mirrors the whisper backend's nanosecond convention.

Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend.

Cache-aware streaming (L2) follows.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L2 cache-aware streaming with EOU segmentation

Wire AudioTranscriptionStream to the streaming RNN-T C-API:
- Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes *eou_out on <EOU>/<EOB>.
- Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta.
- <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired).
- stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream.

Honours context cancellation between chunks. Frees every malloc'd delta and the session.

Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L3 register backend in build/CI/gallery (whisper parity)

Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1.
- .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and -*-parakeet-cpp tag-suffixes.
- scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch.
- .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master).
- backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix.
- Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp-transcription target (mirrors test-extra-backend-whisper-transcription).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(parakeet-cpp): L4 gallery importer for parakeet GGUFs

Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown).
- Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp".
  It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here).
- Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer.
- Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description.

Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(parakeet-cpp): document the parakeet-cpp transcription backend

Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(parakeet-cpp): wire transcription gRPC e2e test into test-extra

The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job:
- Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry).
- Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* style(parakeet-cpp): drop em dashes from comments and docs

Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(gallery): add parakeet-cpp f16 models to the model gallery

Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): satisfy govet lint + bump PARAKEET_VERSION

- goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged).
- Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): pin PARAKEET_VERSION to a tag-stable commit

The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): init the ggml submodule in the backend image clone

The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend.

Verified locally: clone + submodule + cmake configure now succeeds.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): statically link ggml into libparakeet.so

The shared libparakeet.so linked ggml's shared libs (libggml*.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image.

Verified locally: ldd shows no libggml dependency.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(parakeet-cpp): non-streaming fallback in AudioTranscriptionStream

The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model.

Verified locally: the streaming spec passes against the non-streaming 110m model via fallback.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
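The L1 timestamp shaping described in the commit message (parse the transcribe_path_json document, synthesise one whole-clip segment from the first word start to the last word end, convert seconds to nanoseconds) can be sketched in plain Go. This is a minimal, self-contained sketch, not the backend's code: the struct names, the float/int field types, the assumption that `start`/`end` are in seconds, and the local `segment` type (a hypothetical stand-in for the backend's gRPC segment message) are all illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capiWord and capiToken follow the JSON shape named in the commit message.
// The Go names and the numeric types are assumptions for this sketch.
type capiWord struct {
	W     string  `json:"w"`
	Start float64 `json:"start"` // assumed to be seconds
	End   float64 `json:"end"`   // assumed to be seconds
	Conf  float64 `json:"conf"`
}

type capiToken struct {
	ID   int32   `json:"id"`
	T    float64 `json:"t"`
	Conf float64 `json:"conf"`
}

type capiResult struct {
	Text   string      `json:"text"`
	Words  []capiWord  `json:"words"`
	Tokens []capiToken `json:"tokens"`
}

// segment is a hypothetical stand-in for the gRPC transcript segment.
type segment struct {
	Start, End int64 // nanoseconds
	Text       string
	Tokens     []int32
}

// secondsToNanos converts fractional seconds to integer nanoseconds.
func secondsToNanos(s float64) int64 { return int64(s * 1e9) }

// wholeClipSegment builds the single synthetic segment: parakeet emits no
// native segment boundaries, so the span runs from the first word's start
// to the last word's end, and every token id is attached to it.
func wholeClipSegment(r capiResult) segment {
	seg := segment{Text: r.Text}
	if n := len(r.Words); n > 0 {
		seg.Start = secondsToNanos(r.Words[0].Start)
		seg.End = secondsToNanos(r.Words[n-1].End)
	}
	for _, t := range r.Tokens {
		seg.Tokens = append(seg.Tokens, t.ID)
	}
	return seg
}

func main() {
	doc := `{"text":"hello world",
	  "words":[{"w":"hello","start":0.5,"end":0.75,"conf":0.99},
	           {"w":"world","start":1.0,"end":2.25,"conf":0.97}],
	  "tokens":[{"id":7,"t":0.5,"conf":0.99},{"id":9,"t":1.0,"conf":0.97}]}`

	var r capiResult
	if err := json.Unmarshal([]byte(doc), &r); err != nil {
		panic(err)
	}
	seg := wholeClipSegment(r)
	fmt.Println(seg.Start, seg.End, seg.Tokens) // 500000000 2250000000 [7 9]
}
```

Per the commit message, word-level timings are attached on top of this only when the request asks for `timestamp_granularities=["word"]`; the sketch leaves that step out and keeps the segment-level default.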
181 lines
5.6 KiB
Go
package importers

import (
	"encoding/json"
	"path/filepath"
	"strings"

	"github.com/mudler/LocalAI/core/config"
	"github.com/mudler/LocalAI/core/gallery"
	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/pkg/downloader"
	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
	"go.yaml.in/yaml/v2"
)

var _ Importer = &ParakeetCppImporter{}

// ParakeetCppImporter recognises parakeet.cpp GGUF weights, the C++/ggml port
// of NVIDIA NeMo Parakeet. The signal is narrow on purpose: parakeet.cpp names
// its weights "<arch>-<size>-<quant>.gguf" (e.g. tdt_ctc-110m-f16.gguf,
// rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), so we only match a
// .gguf whose name carries a parakeet architecture token. That keeps us from
// claiming arbitrary llama-style GGUFs (the importer is registered before
// llama-cpp), and it deliberately does NOT match the upstream nvidia/parakeet-*
// NeMo repos (which ship .nemo checkpoints, not runnable GGUFs).
// preferences.backend="parakeet-cpp" forces the importer regardless.
type ParakeetCppImporter struct{}

func (i *ParakeetCppImporter) Name() string      { return "parakeet-cpp" }
func (i *ParakeetCppImporter) Modality() string  { return "asr" }
func (i *ParakeetCppImporter) AutoDetects() bool { return true }

func (i *ParakeetCppImporter) Match(details Details) bool {
	preferences, err := details.Preferences.MarshalJSON()
	if err != nil {
		return false
	}
	preferencesMap := make(map[string]any)
	if len(preferences) > 0 {
		if err := json.Unmarshal(preferences, &preferencesMap); err != nil {
			return false
		}
	}

	if b, ok := preferencesMap["backend"].(string); ok && b == "parakeet-cpp" {
		return true
	}

	// Direct URL or path to a parakeet GGUF.
	if isParakeetGGUF(filepath.Base(details.URI)) {
		return true
	}

	// HF repo shipping at least one parakeet GGUF.
	if details.HuggingFace != nil {
		for _, f := range details.HuggingFace.Files {
			if isParakeetGGUF(filepath.Base(f.Path)) {
				return true
			}
		}
	}

	return false
}

func (i *ParakeetCppImporter) Import(details Details) (gallery.ModelConfig, error) {
	preferences, err := details.Preferences.MarshalJSON()
	if err != nil {
		return gallery.ModelConfig{}, err
	}
	preferencesMap := make(map[string]any)
	if len(preferences) > 0 {
		if err := json.Unmarshal(preferences, &preferencesMap); err != nil {
			return gallery.ModelConfig{}, err
		}
	}

	name, ok := preferencesMap["name"].(string)
	if !ok {
		name = filepath.Base(details.URI)
	}

	description, ok := preferencesMap["description"].(string)
	if !ok {
		description = "Imported from " + details.URI
	}

	// parakeet quants are near-lossless even at Q4_K (WER 0.0 vs NeMo on 110m),
	// so default to the smallest, then fall back up the size ladder; the last
	// file wins if none match (mirrors whisper / llama-cpp).
	preferredQuants, _ := preferencesMap["quantizations"].(string)
	quants := []string{"q4_k", "q5_k", "q6_k", "q8_0", "f16"}
	if preferredQuants != "" {
		quants = strings.Split(preferredQuants, ",")
	}

	cfg := gallery.ModelConfig{
		Name:        name,
		Description: description,
	}

	modelConfig := config.ModelConfig{
		Name:                name,
		Description:         description,
		Backend:             "parakeet-cpp",
		KnownUsecaseStrings: []string{"transcript"},
	}

	uri := downloader.URI(details.URI)
	directGGUF := isParakeetGGUF(filepath.Base(details.URI))
	switch {
	case uri.LooksLikeURL() && directGGUF:
		// Direct file URL (e.g. .../resolve/main/tdt_ctc-110m-f16.gguf). The
		// exact file is known, no quant pick.
		fileName, err := uri.FilenameFromUrl()
		if err != nil {
			return gallery.ModelConfig{}, err
		}
		target := filepath.Join("parakeet-cpp", "models", name, fileName)
		cfg.Files = append(cfg.Files, gallery.File{
			URI:      details.URI,
			Filename: target,
		})
		modelConfig.PredictionOptions = schema.PredictionOptions{
			BasicModelRequest: schema.BasicModelRequest{Model: target},
		}
	case details.HuggingFace != nil:
		// HF repo: collect every parakeet GGUF, pick the preferred quant, and
		// nest under parakeet-cpp/models/<name>/ so a multi-quant repo doesn't
		// collide on disk.
		var ggufFiles []hfapi.ModelFile
		for _, f := range details.HuggingFace.Files {
			if isParakeetGGUF(filepath.Base(f.Path)) {
				ggufFiles = append(ggufFiles, f)
			}
		}
		if chosen, ok := pickPreferredGGMLFile(ggufFiles, quants); ok {
			target := filepath.Join("parakeet-cpp", "models", name, filepath.Base(chosen.Path))
			cfg.Files = append(cfg.Files, gallery.File{
				URI:      chosen.URL,
				Filename: target,
				SHA256:   chosen.SHA256,
			})
			modelConfig.PredictionOptions = schema.PredictionOptions{
				BasicModelRequest: schema.BasicModelRequest{Model: target},
			}
		}
	default:
		// Bare URI with no HF metadata (pref-only path): point at the basename
		// so users can tweak the YAML after import.
		modelConfig.PredictionOptions = schema.PredictionOptions{
			BasicModelRequest: schema.BasicModelRequest{Model: filepath.Base(details.URI)},
		}
	}

	data, err := yaml.Marshal(modelConfig)
	if err != nil {
		return gallery.ModelConfig{}, err
	}
	cfg.ConfigFile = string(data)

	return cfg, nil
}

// isParakeetGGUF reports whether name is a parakeet.cpp GGUF: a .gguf file
// whose name carries a parakeet architecture token. Matching is
// case-insensitive (both the .gguf suffix and the tokens); the tokens cover
// the published naming (<arch>-<size>-<quant>.gguf) plus a generic
// "parakeet" fallback.
func isParakeetGGUF(name string) bool {
	lower := strings.ToLower(name)
	if !strings.HasSuffix(lower, ".gguf") {
		return false
	}
	for _, tok := range []string{"tdt_ctc", "tdt-", "tdt_", "rnnt", "ctc-", "ctc_", "realtime_eou", "parakeet"} {
		if strings.Contains(lower, tok) {
			return true
		}
	}
	return false
}
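To see how the name matcher and the default quant ladder interact on an HF repo listing, here is a small standalone program. `isParakeetGGUF` uses the same logic as the helper above. `pickQuant` is a hypothetical stand-in for the package's `pickPreferredGGMLFile`, whose source is not shown here, so its semantics (try quants in order, return the first file whose base name contains that quant, otherwise return the last file) are an assumption based on the comment in `Import`. Unlike `Import`, this sketch also trims whitespace around each quant and skips empty entries.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isParakeetGGUF: same logic as the importer's helper above.
func isParakeetGGUF(name string) bool {
	lower := strings.ToLower(name)
	if !strings.HasSuffix(lower, ".gguf") {
		return false
	}
	for _, tok := range []string{"tdt_ctc", "tdt-", "tdt_", "rnnt", "ctc-", "ctc_", "realtime_eou", "parakeet"} {
		if strings.Contains(lower, tok) {
			return true
		}
	}
	return false
}

// pickQuant is HYPOTHETICAL: an assumed reading of pickPreferredGGMLFile.
// Quants are tried in order; the first file whose lowercased base name
// contains the quant wins. If no quant matches, the last file is returned.
// ok is false only when there are no files at all.
func pickQuant(files, quants []string) (string, bool) {
	if len(files) == 0 {
		return "", false
	}
	for _, q := range quants {
		q = strings.ToLower(strings.TrimSpace(q))
		if q == "" {
			continue // an empty quant would match every file
		}
		for _, f := range files {
			if strings.Contains(strings.ToLower(filepath.Base(f)), q) {
				return f, true
			}
		}
	}
	return files[len(files)-1], true
}

func main() {
	repo := []string{
		"README.md",
		"tdt_ctc-110m-f16.gguf",
		"tdt_ctc-110m-q8_0.gguf",
		"tdt_ctc-110m-q4_k.gguf",
		"llama-3-8b-q4_k_m.gguf", // no parakeet architecture token
	}

	// Keep only parakeet GGUFs, as Import does for an HF repo.
	var parakeet []string
	for _, f := range repo {
		if isParakeetGGUF(filepath.Base(f)) {
			parakeet = append(parakeet, f)
		}
	}

	defaults := []string{"q4_k", "q5_k", "q6_k", "q8_0", "f16"}
	chosen, _ := pickQuant(parakeet, defaults)
	fmt.Println(chosen) // tdt_ctc-110m-q4_k.gguf

	override, _ := pickQuant(parakeet, strings.Split("q8_0", ","))
	fmt.Println(override) // tdt_ctc-110m-q8_0.gguf
}
```

With the default ladder the q4_k file wins even though the f16 file is listed first, and the llama GGUF never reaches the picker because the name filter runs first. Passing `quantizations="q8_0"` switches to the q8_0 file; a quant absent from the repo falls through to the last parakeet GGUF.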