mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-02 14:16:02 -04:00
* feat: add documentation for undocumented API endpoints

  Creates comprehensive documentation for 8 previously undocumented endpoints:
  - Voice Activity Detection (/v1/vad)
  - Video Generation (/video)
  - Sound Generation (/v1/sound-generation)
  - Backend Monitor (/backend/monitor, /backend/shutdown)
  - Token Metrics (/tokenMetrics)
  - P2P endpoints (/api/p2p/* - 5 sub-endpoints)
  - System Info (/system, /version)

  Each documentation file includes HTTP method, request/response schemas, curl examples, sample JSON responses, and error codes.

* docs: remove token-metrics endpoint documentation per review feedback

  The token-metrics endpoint is not wired into the HTTP router and should not be documented per reviewer request.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: move system-info documentation to reference section

  Per review feedback, system-info endpoint docs are better suited for the reference section rather than features.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
105 lines
4.1 KiB
Markdown
+++
disableToc = false
title = "Sound Generation"
weight = 19
url = "/features/sound-generation/"
+++

LocalAI supports generating audio from text descriptions via the `/v1/sound-generation` endpoint. This endpoint is compatible with the [ElevenLabs sound generation API](https://elevenlabs.io/docs/api-reference/sound-generation) and can produce music, sound effects, and other audio content.

## API

- **Method:** `POST`
- **Endpoint:** `/v1/sound-generation`

### Request

The request body is JSON. There are two usage modes: simple and advanced.

#### Simple mode

| Parameter        | Type     | Required | Description                                  |
|------------------|----------|----------|----------------------------------------------|
| `model_id`       | `string` | Yes      | Model identifier                             |
| `text`           | `string` | Yes      | Audio description or prompt                  |
| `instrumental`   | `bool`   | No       | Generate instrumental audio (no vocals)      |
| `vocal_language` | `string` | No       | Language code for vocals (e.g. `bn`, `ja`)   |

#### Advanced mode

| Parameter           | Type     | Required | Description                                      |
|---------------------|----------|----------|--------------------------------------------------|
| `model_id`          | `string` | Yes      | Model identifier                                 |
| `text`              | `string` | Yes      | Text prompt or description                       |
| `duration_seconds`  | `float`  | No       | Target duration in seconds                       |
| `prompt_influence`  | `float`  | No       | Temperature / prompt influence parameter         |
| `do_sample`         | `bool`   | No       | Enable sampling                                  |
| `think`             | `bool`   | No       | Enable extended thinking for generation          |
| `caption`           | `string` | No       | Caption describing the audio                     |
| `lyrics`            | `string` | No       | Lyrics for the generated audio                   |
| `bpm`               | `int`    | No       | Beats per minute                                 |
| `keyscale`          | `string` | No       | Musical key/scale (e.g. `Ab major`)              |
| `language`          | `string` | No       | Language code                                    |
| `vocal_language`    | `string` | No       | Vocal language (fallback if `language` is empty) |
| `timesignature`     | `string` | No       | Time signature (e.g. `4`)                        |
| `instrumental`      | `bool`   | No       | Generate instrumental audio (no vocals)          |
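
#### Example request body (Python sketch)

Both tables describe a single JSON object with two required fields and a set of optional ones. As a rough illustration, the sketch below assembles such a body in Python. `build_payload` is a hypothetical helper introduced for this example and is not part of LocalAI; it only relies on the tables above marking every field other than `model_id` and `text` as not required.

```python
from typing import Any


def build_payload(model_id: str, text: str, **options: Any) -> dict:
    """Assemble a JSON-serialisable body for /v1/sound-generation.

    `model_id` and `text` are the two required fields. Optional fields
    (for example `bpm` or `keyscale`) are passed as keyword arguments;
    any option left as None is omitted from the body.
    """
    if not model_id or not text:
        raise ValueError("model_id and text are required")
    payload = {"model_id": model_id, "text": text}
    # Keep explicit False/0 values; drop only options that were not set.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For instance, `build_payload("sound-model", "upbeat pop", bpm=120, keyscale="Ab major")` yields a body in the advanced style, while passing only the two positional arguments yields the simple one.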

### Response

Returns a binary audio file with the appropriate `Content-Type` header (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).
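
#### Choosing a file extension (Python sketch)

Since the audio format is reported through the `Content-Type` header, a client may prefer to derive the output file extension from that header instead of hard-coding one. The sketch below covers only the content types listed above; the helper name `extension_for` and the `.bin` fallback are assumptions made for this example.

```python
# Extensions for the content types named in this section.
_EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension.

    Parameters after ';' are ignored and matching is case-insensitive.
    Unknown types fall back to '.bin' (an arbitrary choice for this sketch).
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return _EXTENSIONS.get(media_type, ".bin")
```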

## Usage

### Generate a sound effect

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "rain falling on a tin roof"
  }' \
  --output rain.wav
```

### Generate a song with vocals

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "a soft Bengali love song for a quiet evening",
    "instrumental": false,
    "vocal_language": "bn"
  }' \
  --output song.wav
```

### Generate music with advanced parameters

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "upbeat pop",
    "caption": "A funky Japanese disco track",
    "lyrics": "[Verse 1]\nDancing in the neon lights",
    "think": true,
    "bpm": 120,
    "duration_seconds": 225,
    "keyscale": "Ab major",
    "language": "ja",
    "timesignature": "4"
  }' \
  --output disco.wav
```
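
### Call the endpoint from Python (sketch)

The same requests can be issued from code. The following is a minimal sketch using only the Python standard library, assuming the server listens on `http://localhost:8080` as in the curl examples. The function names `make_request` and `generate_sound` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples


def make_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/sound-generation."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_sound(payload: dict, out_path: str, base_url: str = BASE_URL) -> str:
    """Send the request and write the returned audio bytes to out_path.

    Returns the Content-Type reported by the server.
    """
    with urllib.request.urlopen(make_request(payload, base_url)) as resp:
        content_type = resp.headers.get("Content-Type", "")
        with open(out_path, "wb") as fh:
            fh.write(resp.read())
    return content_type
```

With a server running, `generate_sound({"model_id": "sound-model", "text": "rain falling on a tin roof"}, "rain.wav")` mirrors the first curl example.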

## Error Responses

| Status Code | Description                                      |
|-------------|--------------------------------------------------|
| 400         | Missing or invalid model or request parameters   |
| 500         | Backend error during sound generation            |
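
### Handling errors in a client (Python sketch)

A client will usually want to tell these two cases apart, since a 400 points at the request while a 500 points at the backend. The sketch below shows one way to do that with the standard library. `describe_status` and `post_and_describe` are hypothetical names, and the messages simply paraphrase the table above; the format of the error response body is not assumed.

```python
import urllib.error
import urllib.request


def describe_status(status: int) -> str:
    """Translate an HTTP status code into a short, human-readable outcome."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "bad request: check model_id and the request parameters"
    if status == 500:
        return "backend error: sound generation failed on the server"
    return f"unexpected status {status}"


def post_and_describe(request: urllib.request.Request) -> str:
    """Send a prepared request and describe the outcome.

    urllib raises HTTPError for 4xx/5xx responses, so the status code is
    read from the exception in that case.
    """
    try:
        with urllib.request.urlopen(request) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        return describe_status(err.code)
```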