LocalAI/docs/content/features/sound-generation.md at d200401e86e01ed82478fec6822f52b753fc802e

mirror of https://github.com/mudler/LocalAI.git synced 2026-04-01 05:36:49 -04:00

+++
disableToc = false
title = "Sound Generation"
weight = 19
url = "/features/sound-generation/"
+++

LocalAI can generate audio from text descriptions via the `/v1/sound-generation` endpoint. The endpoint is compatible with the ElevenLabs sound generation API and can produce music, sound effects, and other audio content.

API

- Method: `POST`
- Endpoint: `/v1/sound-generation`

Request

The request body is JSON. There are two usage modes: simple and advanced.

Simple mode

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Audio description or prompt |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
| `vocal_language` | `string` | No | Language code for vocals (e.g. `bn`, `ja`) |
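
As a minimal sketch of a simple-mode call from Python, the helper below builds the POST request using only the standard library. The function name `build_simple_request` and the default base URL are illustrative assumptions (the URL matches the curl examples in this page), not part of LocalAI itself. Optional fields are left out of the body unless they are set.

```python
import json
import urllib.request


def build_simple_request(model_id, text, instrumental=None, vocal_language=None,
                         base_url="http://localhost:8080"):
    """Build (but do not send) a simple-mode POST request.

    Hypothetical helper: the name and the default base_url are assumptions.
    """
    payload = {"model_id": model_id, "text": text}
    # Optional fields are only included when the caller sets them.
    if instrumental is not None:
        payload["instrumental"] = instrumental
    if vocal_language is not None:
        payload["vocal_language"] = vocal_language
    return urllib.request.Request(
        f"{base_url}/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and write the bytes from `response.read()` to a file.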

Advanced mode

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model_id` | `string` | Yes | Model identifier |
| `text` | `string` | Yes | Text prompt or description |
| `duration_seconds` | `float` | No | Target duration in seconds |
| `prompt_influence` | `float` | No | Temperature / prompt influence parameter |
| `do_sample` | `bool` | No | Enable sampling |
| `think` | `bool` | No | Enable extended thinking for generation |
| `caption` | `string` | No | Caption describing the audio |
| `lyrics` | `string` | No | Lyrics for the generated audio |
| `bpm` | `int` | No | Beats per minute |
| `keyscale` | `string` | No | Musical key/scale (e.g. `Ab major`) |
| `language` | `string` | No | Language code |
| `vocal_language` | `string` | No | Vocal language (fallback if `language` is empty) |
| `timesignature` | `string` | No | Time signature (e.g. `4`) |
| `instrumental` | `bool` | No | Generate instrumental audio (no vocals) |
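
One way to keep advanced-mode requests tidy on the client side is a small typed container that serializes only the fields that were set. The sketch below assumes nothing beyond the parameter names and types in the table above. The class name `SoundRequest` and its `to_json` method are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SoundRequest:
    """Client-side mirror of the advanced-mode parameters (hypothetical class)."""
    model_id: str
    text: str
    duration_seconds: Optional[float] = None
    prompt_influence: Optional[float] = None
    do_sample: Optional[bool] = None
    think: Optional[bool] = None
    caption: Optional[str] = None
    lyrics: Optional[str] = None
    bpm: Optional[int] = None
    keyscale: Optional[str] = None
    language: Optional[str] = None
    vocal_language: Optional[str] = None
    timesignature: Optional[str] = None
    instrumental: Optional[bool] = None

    def to_json(self) -> str:
        # Drop unset (None) fields; explicit False values are kept.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body)
```

For example, `SoundRequest(model_id="sound-model", text="upbeat pop", bpm=120, keyscale="Ab major").to_json()` yields a body containing just those four keys, which can be sent as the POST payload.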

Response

Returns a binary audio file with the appropriate `Content-Type` header (e.g. `audio/wav`, `audio/mpeg`, `audio/flac`, `audio/ogg`).
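
Because the audio format can vary, a client may want to choose the output file extension from the response's `Content-Type` rather than hard-coding `.wav`. The following is a small sketch of that idea. The function name `extension_for` and the `.bin` fallback are assumptions, and the mapping covers only the MIME types listed above.

```python
# Extensions for the MIME types mentioned in this section.
AUDIO_EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str) -> str:
    """Return a file extension for a Content-Type header value.

    Hypothetical helper: unknown types fall back to ".bin" (an assumption).
    """
    # Strip optional parameters such as "; charset=..." and normalize case.
    mime = content_type.split(";")[0].strip().lower()
    return AUDIO_EXTENSIONS.get(mime, ".bin")
```

With a `urllib` response object, the header value is available as `response.headers.get("Content-Type", "")`.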

Usage

Generate a sound effect

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "rain falling on a tin roof"
  }' \
  --output rain.wav

Generate a song with vocals

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "a soft Bengali love song for a quiet evening",
    "instrumental": false,
    "vocal_language": "bn"
  }' \
  --output song.wav

Generate music with advanced parameters

curl http://localhost:8080/v1/sound-generation \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sound-model",
    "text": "upbeat pop",
    "caption": "A funky Japanese disco track",
    "lyrics": "[Verse 1]\nDancing in the neon lights",
    "think": true,
    "bpm": 120,
    "duration_seconds": 225,
    "keyscale": "Ab major",
    "language": "ja",
    "timesignature": "4"
  }' \
  --output disco.wav

Error Responses

| Status Code | Description |
|-------------|-------------|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during sound generation |
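
A client can turn these status codes into readable errors. The sketch below posts a payload and raises an exception carrying the description from the table. `post_sound`, `SoundGenerationError`, and the injectable `opener` parameter are hypothetical names introduced for illustration. The format of the error response body is not specified here, so the sketch does not try to parse it.

```python
import json
import urllib.error
import urllib.request

# Descriptions taken from the status-code table above.
STATUS_HINTS = {
    400: "Missing or invalid model or request parameters",
    500: "Backend error during sound generation",
}


class SoundGenerationError(Exception):
    """Raised when the endpoint answers with an HTTP error status."""

    def __init__(self, status: int, hint: str):
        super().__init__(f"HTTP {status}: {hint}")
        self.status = status
        self.hint = hint


def post_sound(payload: dict, base_url: str = "http://localhost:8080",
               opener=urllib.request.urlopen) -> bytes:
    """POST the payload and return the raw audio bytes (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/v1/sound-generation",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        hint = STATUS_HINTS.get(err.code, "Unexpected status code")
        raise SoundGenerationError(err.code, hint) from err
```

The `opener` argument exists only so the function can be exercised without a running server. In normal use it can be left at its default.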
