LocalAI/docs/content/features/voice-activity-detection.md at 9090bca9203bde85041c76c92fcfd4eef1f5c2c9

mirror of https://github.com/mudler/LocalAI.git synced 2026-03-31 21:25:59 -04:00

Files

LocalAI [bot] 9090bca920 feat: Add documentation for undocumented API endpoints (#8852 )

* feat: add documentation for undocumented API endpoints

Creates comprehensive documentation for 8 previously undocumented endpoints:
- Voice Activity Detection (/v1/vad)
- Video Generation (/video)
- Sound Generation (/v1/sound-generation)
- Backend Monitor (/backend/monitor, /backend/shutdown)
- Token Metrics (/tokenMetrics)
- P2P endpoints (/api/p2p/* - 5 sub-endpoints)
- System Info (/system, /version)

Each documentation file includes HTTP method, request/response schemas,
curl examples, sample JSON responses, and error codes.

* docs: remove token-metrics endpoint documentation per review feedback

The token-metrics endpoint is not wired into the HTTP router and
should not be documented per reviewer request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: move system-info documentation to reference section

Per review feedback, system-info endpoint docs are better suited
for the reference section rather than features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 17:59:33 +01:00

2.1 KiB

Raw Blame History

+++ disableToc = false title = "Voice Activity Detection (VAD)" weight = 17 url = "/features/voice-activity-detection/" +++

Voice Activity Detection (VAD) identifies segments of speech in audio data. LocalAI provides a /v1/vad endpoint powered by the Silero VAD backend.

API

Method: POST
Endpoints: /v1/vad, /vad

Request

The request body is JSON with the following fields:

Parameter	Type	Required	Description
`model`	`string`	Yes	Model name (e.g. `silero-vad`)
`audio`	`float32[]`	Yes	Array of audio samples (16kHz PCM float)

Response

Returns a JSON object with detected speech segments:

Field	Type	Description
`segments`	`array`	List of detected speech segments
`segments[].start`	`float`	Start time in seconds
`segments[].end`	`float`	End time in seconds

Usage

Example request

curl http://localhost:8080/v1/vad \
  -H "Content-Type: application/json" \
  -d '{
    "model": "silero-vad",
    "audio": [0.0012, -0.0045, 0.0053, -0.0021, ...]
  }'

Example response

{
  "segments": [
    {
      "start": 0.5,
      "end": 2.3
    },
    {
      "start": 3.1,
      "end": 5.8
    }
  ]
}

Model Configuration

Create a YAML configuration file for the VAD model:

name: silero-vad
backend: silero-vad

Detection Parameters

The Silero VAD backend uses the following internal defaults:

Sample rate: 16kHz
Threshold: 0.5
Min silence duration: 100ms
Speech pad duration: 30ms

Error Responses

Status Code	Description
400	Missing or invalid `model` or `audio` field
500	Backend error during VAD processing

2.1 KiB Raw Blame History

API