Files
LocalAI/docs/content/features/voice-activity-detection.md
LocalAI [bot] 9090bca920 feat: Add documentation for undocumented API endpoints (#8852)
* feat: add documentation for undocumented API endpoints

Creates comprehensive documentation for 8 previously undocumented endpoints:
- Voice Activity Detection (/v1/vad)
- Video Generation (/video)
- Sound Generation (/v1/sound-generation)
- Backend Monitor (/backend/monitor, /backend/shutdown)
- Token Metrics (/tokenMetrics)
- P2P endpoints (/api/p2p/* - 5 sub-endpoints)
- System Info (/system, /version)

Each documentation file includes HTTP method, request/response schemas,
curl examples, sample JSON responses, and error codes.

* docs: remove token-metrics endpoint documentation per review feedback

The token-metrics endpoint is not wired into the HTTP router and
should not be documented per reviewer request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: move system-info documentation to reference section

Per review feedback, system-info endpoint docs are better suited
for the reference section rather than features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:59:33 +01:00

88 lines
2.1 KiB
Markdown

+++
disableToc = false
title = "Voice Activity Detection (VAD)"
weight = 17
url = "/features/voice-activity-detection/"
+++
Voice Activity Detection (VAD) identifies segments of speech in audio data. LocalAI provides a `/v1/vad` endpoint powered by the [Silero VAD](https://github.com/snakers4/silero-vad) backend.
## API
- **Method:** `POST`
- **Endpoints:** `/v1/vad`, `/vad`
### Request
The request body is JSON with the following fields:
| Parameter | Type | Required | Description |
|-----------|------------|----------|------------------------------------------|
| `model` | `string` | Yes | Model name (e.g. `silero-vad`) |
| `audio` | `float32[]`| Yes | Array of audio samples (16kHz PCM float) |
### Response
Returns a JSON object with detected speech segments:
| Field | Type | Description |
|--------------------|-----------|------------------------------------|
| `segments` | `array` | List of detected speech segments |
| `segments[].start` | `float` | Start time in seconds |
| `segments[].end` | `float` | End time in seconds |
## Usage
### Example request
```bash
curl http://localhost:8080/v1/vad \
-H "Content-Type: application/json" \
-d '{
"model": "silero-vad",
"audio": [0.0012, -0.0045, 0.0053, -0.0021, ...]
}'
```
### Example response
```json
{
"segments": [
{
"start": 0.5,
"end": 2.3
},
{
"start": 3.1,
"end": 5.8
}
]
}
```
## Model Configuration
Create a YAML configuration file for the VAD model:
```yaml
name: silero-vad
backend: silero-vad
```
## Detection Parameters
The Silero VAD backend uses the following internal defaults:
- **Sample rate:** 16kHz
- **Threshold:** 0.5
- **Min silence duration:** 100ms
- **Speech pad duration:** 30ms
## Error Responses
| Status Code | Description |
|-------------|---------------------------------------------------|
| 400 | Missing or invalid `model` or `audio` field |
| 500 | Backend error during VAD processing |