mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-01 05:36:49 -04:00
+++
disableToc = false
title = "Sound Generation"
weight = 19
url = "/features/sound-generation/"
+++

LocalAI supports generating audio from text descriptions via the `/v1/sound-generation` endpoint. This endpoint is compatible with the ElevenLabs sound generation API and can produce music, sound effects, and other audio content.

## API

- **Method:** `POST`
- **Endpoint:** `/v1/sound-generation`

## Request

The request body is JSON. There are two usage modes: simple and advanced.

### Simple mode

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Audio description or prompt |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
| `vocal_language` | `string` | No | Language code for vocals (e.g. `bn`, `ja`) |

### Advanced mode

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Text prompt or description |
| `duration_seconds` | `float` | No | Target duration in seconds |
| `prompt_influence` | `float` | No | Temperature / prompt influence parameter |
| `do_sample` | `bool` | No | Enable sampling |
| `think` | `bool` | No | Enable extended thinking for generation |
| `caption` | `string` | No | Caption describing the audio |
| `lyrics` | `string` | No | Lyrics for the generated audio |
| `bpm` | `int` | No | Beats per minute |
| `keyscale` | `string` | No | Musical key/scale (e.g. `Ab major`) |
| `language` | `string` | No | Language code |
| `vocal_language` | `string` | No | Vocal language (fallback if `language` is empty) |
| `timesignature` | `string` | No | Time signature (e.g. `4`) |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |

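The relationship between `language` and `vocal_language` in the table can be read as a simple fallback. The sketch below illustrates that reading only; `effective_language` is a hypothetical helper name, and the server's actual resolution logic is not shown in this document and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the language that would apply to a request,
# assuming `vocal_language` is used only when `language` is empty.
effective_language() {
  local language=$1 vocal_language=$2
  if [ -n "$language" ]; then
    echo "$language"
  else
    echo "$vocal_language"
  fi
}
```

Under this reading, a request that sets both fields uses `language`, and one that sets only `vocal_language` uses that value.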
## Response

Returns a binary audio file with the appropriate `Content-Type` header (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).

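Because the audio format depends on the returned `Content-Type`, a client may want to choose the output file extension from that header rather than hard-coding it. A minimal sketch, assuming only the four media types named above can occur; `ext_for_content_type` is a hypothetical helper, not part of LocalAI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Content-Type header value to a file extension.
# Only the four media types listed above are handled; anything else falls
# back to "bin" so the caller can inspect the file manually.
ext_for_content_type() {
  local media_type
  # Drop parameters such as "; charset=...", then lowercase and strip whitespace.
  media_type=$(printf '%s' "${1%%;*}" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  case "$media_type" in
    audio/wav)  echo "wav" ;;
    audio/mpeg) echo "mp3" ;;
    audio/flac) echo "flac" ;;
    audio/ogg)  echo "ogg" ;;
    *)          echo "bin" ;;
  esac
}
```

One way to obtain the header value is curl's `-w '%{content_type}'` write-out variable, saving the body to a temporary file first and renaming it once the extension is known.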
## Usage

### Generate a sound effect

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "rain falling on a tin roof"
  }' \
  --output rain.wav
```

### Generate a song with vocals

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "a soft Bengali love song for a quiet evening",
    "instrumental": false,
    "vocal_language": "bn"
  }' \
  --output song.wav
```

### Generate music with advanced parameters

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "upbeat pop",
    "caption": "A funky Japanese disco track",
    "lyrics": "[Verse 1]\nDancing in the neon lights",
    "think": true,
    "bpm": 120,
    "duration_seconds": 225,
    "keyscale": "Ab major",
    "language": "ja",
    "timesignature": "4"
  }' \
  --output disco.wav
```

## Error Responses

| Status Code | Description |
|---|---|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during sound generation |
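
The curl examples on this page write whatever the server returns to the output file, so a failed request can leave an error body saved under a `.wav` name. The sketch below checks the status code before treating the file as audio. `describe_status` and `generate_sound` are hypothetical helper names, and the default base URL is taken from the examples on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a status code into the meanings from the table above.
describe_status() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "bad request: missing or invalid model or request parameters" ;;
    500) echo "backend error during sound generation" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Hypothetical wrapper: POST a JSON payload, save the body, and check the status.
# Arguments: <json-payload> <output-file> [base-url]
generate_sound() {
  local payload=$1 output=$2 base_url=${3:-http://localhost:8080}
  local status
  # -s silences progress output, -o saves the body, -w prints only the status code.
  status=$(curl -s -o "$output" -w '%{http_code}' \
    -H "Content-Type: application/json" \
    -d "$payload" \
    "$base_url/v1/sound-generation")
  if [ "${status:0:1}" != "2" ]; then
    echo "request failed: $(describe_status "$status")" >&2
    return 1
  fi
  echo "saved $output"
}
```

For example, `generate_sound '{"model_id": "sound-model", "text": "rain falling on a tin roof"}' rain.wav` would save the audio on success and return a non-zero exit code otherwise. If curl cannot connect at all, it reports status `000`, which falls into the "unexpected status" branch.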