Files
LocalAI/docs/content/features/sound-generation.md
LocalAI [bot] 9090bca920 feat: Add documentation for undocumented API endpoints (#8852)
* feat: add documentation for undocumented API endpoints

Creates comprehensive documentation for 8 previously undocumented endpoints:
- Voice Activity Detection (/v1/vad)
- Video Generation (/video)
- Sound Generation (/v1/sound-generation)
- Backend Monitor (/backend/monitor, /backend/shutdown)
- Token Metrics (/tokenMetrics)
- P2P endpoints (/api/p2p/* - 5 sub-endpoints)
- System Info (/system, /version)

Each documentation file includes HTTP method, request/response schemas,
curl examples, sample JSON responses, and error codes.

* docs: remove token-metrics endpoint documentation per review feedback

The token-metrics endpoint is not wired into the HTTP router and
should not be documented per reviewer request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: move system-info documentation to reference section

Per review feedback, system-info endpoint docs are better suited
for the reference section rather than features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:59:33 +01:00

105 lines
4.1 KiB
Markdown

+++
disableToc = false
title = "Sound Generation"
weight = 19
url = "/features/sound-generation/"
+++
LocalAI supports generating audio from text descriptions via the `/v1/sound-generation` endpoint. This endpoint is compatible with the [ElevenLabs sound generation API](https://elevenlabs.io/docs/api-reference/sound-generation) and can produce music, sound effects, and other audio content.
## API
- **Method:** `POST`
- **Endpoint:** `/v1/sound-generation`
### Request
The request body is JSON. There are two usage modes: simple and advanced.
#### Simple mode
| Parameter | Type | Required | Description |
|------------------|----------|----------|----------------------------------------------|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Audio description or prompt |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
| `vocal_language` | `string` | No | Language code for vocals (e.g. `bn`, `ja`) |
#### Advanced mode
| Parameter | Type | Required | Description |
|---------------------|----------|----------|-------------------------------------------------|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Text prompt or description |
| `duration_seconds` | `float` | No | Target duration in seconds |
| `prompt_influence` | `float` | No | Temperature / prompt influence parameter |
| `do_sample` | `bool` | No | Enable sampling |
| `think` | `bool` | No | Enable extended thinking for generation |
| `caption` | `string` | No | Caption describing the audio |
| `lyrics` | `string` | No | Lyrics for the generated audio |
| `bpm` | `int` | No | Beats per minute |
| `keyscale` | `string` | No | Musical key/scale (e.g. `Ab major`) |
| `language` | `string` | No | Language code |
| `vocal_language` | `string` | No | Vocal language (fallback if `language` is empty) |
| `timesignature` | `string` | No | Time signature (e.g. `4`) |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
### Response
Returns a binary audio file with the appropriate `Content-Type` header (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).
## Usage
### Generate a sound effect
```bash
curl http://localhost:8080/v1/sound-generation \
-H "Content-Type: application/json" \
-d '{
"model_id": "sound-model",
"text": "rain falling on a tin roof"
}' \
--output rain.wav
```
### Generate a song with vocals
```bash
curl http://localhost:8080/v1/sound-generation \
-H "Content-Type: application/json" \
-d '{
"model_id": "sound-model",
"text": "a soft Bengali love song for a quiet evening",
"instrumental": false,
"vocal_language": "bn"
}' \
--output song.wav
```
### Generate music with advanced parameters
```bash
curl http://localhost:8080/v1/sound-generation \
-H "Content-Type: application/json" \
-d '{
"model_id": "sound-model",
"text": "upbeat pop",
"caption": "A funky Japanese disco track",
"lyrics": "[Verse 1]\nDancing in the neon lights",
"think": true,
"bpm": 120,
"duration_seconds": 225,
"keyscale": "Ab major",
"language": "ja",
"timesignature": "4"
}' \
--output disco.wav
```
## Error Responses
| Status Code | Description |
|-------------|--------------------------------------------------|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during sound generation |