mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-14 11:49:33 -04:00
* feat(qwen3-tts-cpp): repoint upstream to ServeurpersoCom/qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): flatten qt_* ABI into qt3_* purego shim Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): build shim against upstream qwen-core static lib Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add option/language/voice/sampling parsing Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): add 24kHz WAV encode/decode/stream-header helpers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): purego backend with streaming, speakers, voice design Map TTSRequest onto qwentts.cpp: instructions->instruct, voice->named speaker or clone-reference path, params map->ref_text + sampling. Add TTSStream over the qt chunk callback. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): unit specs + build-gated TTS/TTSStream e2e Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(qwen3-tts-cpp): close defensive PCM-free gap on zero-sample result Register CppPCMFree before the n<=0 guard so a non-null buffer with zero samples cannot leak (the C contract returns NULL on failure, so this is defensive). Raised in code review. 
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(qwen3-tts-cpp): advertise TTSStream capability Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): update backend index metadata for qwentts.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * feat(gallery): qwentts.cpp models - base/customvoice/voicedesign, Q8_0 & Q4_K_M Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * docs(qwen3-tts-cpp): release note for qwentts.cpp migration Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * test(qwen3-tts-cpp): cover audio_path voice-cloning fallback Add resolveRequest unit specs (config audio_path used as the clone reference when Voice is empty; per-request audio Voice overrides it; a named-speaker Voice does not trigger cloning) plus a real-inference e2e that clones from audio_path (confirmed ref_spk_emb=yes in the pipeline). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * chore(qwen3-tts-cpp): drop the release-note doc Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
48 lines
2.2 KiB
C
#pragma once

extern "C" {

// Streaming PCM chunk callback. samples is mono float PCM at 24 kHz, valid
// only for the duration of the call. Return non-zero to continue, 0 to abort.
typedef int (*qt3_chunk_cb)(const float *samples, int n_samples,
                            void *user_data);

// Load the talker + codec/tokenizer GGUFs. use_fa / clamp_fp16 map to
// qt_init_params (the qt ABI exposes no thread count; ggml uses its own
// default). Returns 0 on success, non-zero on failure.
int qt3_load(const char *talker_path, const char *codec_path, int use_fa,
             int clamp_fp16);

// Synthesize to a malloc'd float PCM buffer (caller frees via qt3_pcm_free).
// The synthesis mode (base / custom_voice / voice_design) is auto-detected by
// qt from the talker GGUF; speaker is honoured only for custom_voice, instruct
// for voice_design / custom_voice, and ref_samples (+ optional ref_text) drive
// base-mode cloning. qt enforces the rules and we surface qt_last_error() on
// QT_STATUS_MODE_INVALID. Writes the sample count to *out_n. Returns NULL on
// failure (*out_n set to 0).
float *qt3_tts(const char *text, const char *lang, const char *instruct,
               const char *speaker, const float *ref_samples, int ref_n,
               const char *ref_text, long long seed, float temperature,
               int top_k, float top_p, float repetition_penalty,
               int max_new_tokens, int *out_n);

// Streaming synthesis: cb is invoked per PCM chunk as audio is produced. Same
// param semantics as qt3_tts. Returns 0 on success.
int qt3_tts_stream(const char *text, const char *lang, const char *instruct,
                   const char *speaker, const float *ref_samples, int ref_n,
                   const char *ref_text, long long seed, float temperature,
                   int top_k, float top_p, float repetition_penalty,
                   int max_new_tokens, qt3_chunk_cb cb, void *user_data);

// Free a buffer returned by qt3_tts.
void qt3_pcm_free(float *p);

// Release the qt context.
void qt3_unload(void);

// Named-speaker introspection (custom_voice models). Returns 0 / NULL when no
// model is loaded or the index is out of range.
int qt3_n_speakers(void);
const char *qt3_speaker_name(int i);
}
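The chunk-callback contract documented in the header (samples valid only for the duration of the call, non-zero to continue, 0 to abort) can be sketched on the caller side as below. This is an illustrative sketch, not code from the repository: `pcm_sink`, `collect_chunk`, `fake_stream`, and `demo_collect` are hypothetical names, and `fake_stream` is a stand-in producer that emits silence so the example compiles without the real library. The header only states that `qt3_tts_stream` returns 0 on success, so the non-zero return on abort here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same signature as qt3_chunk_cb in the header, repeated so this sketch
 * is self-contained. */
typedef int (*qt3_chunk_cb)(const float *samples, int n_samples,
                            void *user_data);

/* Hypothetical caller-side sink: a fixed-capacity PCM buffer. */
typedef struct {
    float *data; /* caller-owned storage */
    size_t len;  /* samples written so far */
    size_t cap;  /* total capacity in samples */
} pcm_sink;

/* Copies each chunk into the sink, because samples is only valid for the
 * duration of the call. Returns non-zero to continue, 0 to abort when the
 * chunk would not fit. */
static int collect_chunk(const float *samples, int n_samples,
                         void *user_data) {
    pcm_sink *sink = (pcm_sink *)user_data;
    if (n_samples < 0 || (size_t)n_samples > sink->cap - sink->len)
        return 0;
    memcpy(sink->data + sink->len, samples,
           (size_t)n_samples * sizeof(float));
    sink->len += (size_t)n_samples;
    return 1;
}

/* Stand-in producer, NOT the real qt3_tts_stream: emits n_chunks chunks of
 * silence, chunk_len samples each, and stops as soon as cb returns 0.
 * Returns 0 if every chunk was delivered, non-zero if the callback aborted
 * (assumed behaviour, see the lead-in). */
static int fake_stream(int n_chunks, int chunk_len, qt3_chunk_cb cb,
                       void *user_data) {
    float chunk[256] = {0};
    assert(chunk_len > 0 && chunk_len <= 256);
    for (int i = 0; i < n_chunks; i++) {
        if (cb(chunk, chunk_len, user_data) == 0)
            return 1;
    }
    return 0;
}

/* Runs the stand-in producer against a sink built over buf and returns the
 * number of samples collected; the producer's status goes to *status. */
static size_t demo_collect(float *buf, size_t cap, int n_chunks,
                           int chunk_len, int *status) {
    pcm_sink sink = { buf, 0, cap };
    *status = fake_stream(n_chunks, chunk_len, collect_chunk, &sink);
    return sink.len;
}
```

With the real library, `collect_chunk` and a pointer to the sink would be passed as the `cb` and `user_data` arguments of `qt3_tts_stream`. Since the header specifies mono PCM at 24 kHz, a sink capacity of 24000 samples holds one second of audio.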