+++
disableToc = false
title = "Sound Generation"
weight = 19
url = "/features/sound-generation/"
+++

LocalAI supports generating audio from text descriptions via the `/v1/sound-generation` endpoint. This endpoint is compatible with the [ElevenLabs sound generation API](https://elevenlabs.io/docs/api-reference/sound-generation) and can produce music, sound effects, and other audio content.

## API

- **Method:** `POST`
- **Endpoint:** `/v1/sound-generation`

### Request

The request body is JSON. The endpoint has two usage modes, simple and advanced, which differ in the parameters they accept.

#### Simple mode

| Parameter        | Type     | Required | Description                                  |
|------------------|----------|----------|----------------------------------------------|
| `model_id`       | `string` | Yes      | Model identifier                             |
| `text`           | `string` | Yes      | Audio description or prompt                  |
| `instrumental`   | `bool`   | No       | Generate instrumental audio (no vocals)      |
| `vocal_language` | `string` | No       | Language code for vocals (e.g. `bn`, `ja`)   |

#### Advanced mode

| Parameter           | Type     | Required | Description                                     |
|---------------------|----------|----------|-------------------------------------------------|
| `model_id`          | `string` | Yes      | Model identifier                                |
| `text`              | `string` | Yes      | Text prompt or description                      |
| `duration_seconds`  | `float`  | No       | Target duration in seconds                      |
| `prompt_influence`  | `float`  | No       | Temperature / prompt influence parameter        |
| `do_sample`         | `bool`   | No       | Enable sampling                                 |
| `think`             | `bool`   | No       | Enable extended thinking for generation         |
| `caption`           | `string` | No       | Caption describing the audio                    |
| `lyrics`            | `string` | No       | Lyrics for the generated audio                  |
| `bpm`               | `int`    | No       | Beats per minute                                |
| `keyscale`          | `string` | No       | Musical key/scale (e.g. `Ab major`)             |
| `language`          | `string` | No       | Language code                                   |
| `vocal_language`    | `string` | No       | Vocal language (fallback if `language` is empty) |
| `timesignature`     | `string` | No       | Time signature (e.g. `4`)                       |
| `instrumental`      | `bool`   | No       | Generate instrumental audio (no vocals)         |

### Response

The response body is the generated audio as binary data. The `Content-Type` header identifies the format (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).
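
Because the audio format is only known from the `Content-Type` header, a client may want to choose the output file extension after the response arrives. The following is a minimal sketch, not part of LocalAI itself: the helper names `ext_for_content_type` and `generate_sound` and the `LOCALAI_URL` variable are hypothetical, and only the four media types listed above are mapped. It does not inspect the HTTP status, so an error response would also be written to disk.

```bash
# Map a Content-Type header value to a file extension.
# Covers only the media types listed above; anything else falls back to "bin".
ext_for_content_type() {
  # Keep only the media type: drop parameters, whitespace, and letter case.
  case "$(printf '%s' "${1%%;*}" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')" in
    audio/wav)  echo wav ;;
    audio/mpeg) echo mp3 ;;
    audio/flac) echo flac ;;
    audio/ogg)  echo ogg ;;
    *)          echo bin ;;
  esac
}

# Hypothetical helper: POST a JSON body, then name the output file after the
# Content-Type that came back. LOCALAI_URL is an assumed override for the
# base URL used in the examples on this page.
generate_sound() (
  body="$1"; stem="$2"
  tmp="$(mktemp)" || return 1
  # curl's --write-out variable %{content_type} prints the response Content-Type.
  ctype="$(curl -sS -o "$tmp" -w '%{content_type}' \
    -H "Content-Type: application/json" \
    -d "$body" \
    "${LOCALAI_URL:-http://localhost:8080}/v1/sound-generation")" || { rm -f "$tmp"; return 1; }
  out="${stem}.$(ext_for_content_type "$ctype")"
  mv "$tmp" "$out" && echo "$out"
)
```

For example, `generate_sound '{"model_id": "sound-model", "text": "rain falling on a tin roof"}' rain` would write `rain.wav`, `rain.mp3`, `rain.flac`, or `rain.ogg` depending on what the backend returns, and print the chosen file name.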

## Usage

### Generate a sound effect

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "rain falling on a tin roof"
  }' \
  --output rain.wav
```

### Generate a song with vocals

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "a soft Bengali love song for a quiet evening",
    "instrumental": false,
    "vocal_language": "bn"
  }' \
  --output song.wav
```

### Generate music with advanced parameters

```bash
curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "upbeat pop",
    "caption": "A funky Japanese disco track",
    "lyrics": "[Verse 1]\nDancing in the neon lights",
    "think": true,
    "bpm": 120,
    "duration_seconds": 225,
    "keyscale": "Ab major",
    "language": "ja",
    "timesignature": "4"
  }' \
  --output disco.wav
```

## Error Responses

| Status Code | Description                                      |
|-------------|--------------------------------------------------|
| 400         | Missing or invalid model or request parameters   |
| 500         | Backend error during sound generation            |
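
When scripting against the endpoint, the status code can be checked before the output file is trusted, since plain `curl --output` saves an error body just as it would save audio. The sketch below is one way to do that under stated assumptions: `describe_status` and `request_sound` are hypothetical helper names, `LOCALAI_URL` is an assumed base-URL override, and any `2xx` status is treated as success because this page does not state the success code.

```bash
# Translate a status code into the descriptions from the table above.
# Treating every 2xx code as success is an assumption.
describe_status() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "missing or invalid model or request parameters" ;;
    500) echo "backend error during sound generation" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: POST a JSON body and keep the output file only on success.
request_sound() (
  body="$1"; out="$2"
  # curl's --write-out variable %{http_code} prints the numeric response status.
  status="$(curl -sS -o "$out" -w '%{http_code}' \
    -H "Content-Type: application/json" \
    -d "$body" \
    "${LOCALAI_URL:-http://localhost:8080}/v1/sound-generation")" || return 1
  case "$status" in
    2??) return 0 ;;
  esac
  echo "sound generation failed ($status): $(describe_status "$status")" >&2
  rm -f "$out"   # discard the error body so it is not mistaken for audio
  return 1
)
```

A call such as `request_sound '{"model_id": "sound-model", "text": "rain falling on a tin roof"}' rain.wav` then returns a non-zero exit status and removes `rain.wav` if the server answers with `400` or `500`.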