+++
disableToc = false
title = "Voice Activity Detection (VAD)"
weight = 17
url = "/features/voice-activity-detection/"
+++
Voice Activity Detection (VAD) identifies segments of speech in audio data. LocalAI provides a /v1/vad endpoint powered by the Silero VAD backend.
API
- Method: POST
- Endpoints: /v1/vad, /vad
Request
The request body is JSON with the following fields:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name (e.g. silero-vad) |
| audio | float32[] | Yes | Array of audio samples (16kHz PCM float) |
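Many audio sources deliver 16-bit integer PCM rather than floats, so the samples usually need converting before they go into the audio field. The sketch below shows one way to build the request body. The helper name build_vad_payload is hypothetical, and scaling to the range [-1.0, 1.0) is an assumption based on the common float PCM convention, not something this page specifies.

```python
import json


def build_vad_payload(pcm16, model="silero-vad"):
    """Build a JSON request body for /v1/vad from signed 16-bit PCM samples.

    Assumption: the backend expects floats scaled to [-1.0, 1.0).
    """
    # Divide by 32768 so the int16 range maps onto [-1.0, 1.0).
    audio = [sample / 32768.0 for sample in pcm16]
    return json.dumps({"model": model, "audio": audio})


payload = build_vad_payload([0, 16384, -32768])
```

The input is expected to be mono audio already resampled to 16kHz; this helper does no resampling.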
Response
Returns a JSON object with detected speech segments:
| Field | Type | Description |
|---|---|---|
| segments | array | List of detected speech segments |
| segments[].start | float | Start time in seconds |
| segments[].end | float | End time in seconds |
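As a rough illustration of consuming this response shape, the sketch below sums the length of all detected segments. The function name total_speech_seconds is hypothetical, and treating a missing or null segments field as "no speech" is an assumption about the response, not documented behavior.

```python
import json


def total_speech_seconds(response_text):
    """Return the summed duration, in seconds, of all segments in a VAD response."""
    data = json.loads(response_text)
    # Assumption: a missing or null "segments" field means no speech was found.
    segments = data.get("segments") or []
    return sum(seg["end"] - seg["start"] for seg in segments)
```

Because start and end are floating-point seconds, compare the result with a tolerance rather than with exact equality.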
Usage
Example request
curl http://localhost:8080/v1/vad \
  -H "Content-Type: application/json" \
  -d '{
    "model": "silero-vad",
    "audio": [0.0012, -0.0045, 0.0053, -0.0021, ...]
  }'
Example response
{
  "segments": [
    {
      "start": 0.5,
      "end": 2.3
    },
    {
      "start": 3.1,
      "end": 5.8
    }
  ]
}
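The same request can be sent from a script. The following is a minimal sketch using only the Python standard library. The helper names make_vad_request and detect_speech are hypothetical, and the base URL simply mirrors the localhost:8080 address from the curl example.

```python
import json
import urllib.request


def make_vad_request(audio, model="silero-vad", base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /v1/vad endpoint."""
    body = json.dumps({"model": model, "audio": audio}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/vad",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_speech(audio, **kwargs):
    """Send the request and return the list of segments.

    Requires a running LocalAI server with a VAD model configured.
    """
    with urllib.request.urlopen(make_vad_request(audio, **kwargs)) as resp:
        # Assumption: a missing or null "segments" field means no speech.
        return json.load(resp).get("segments") or []
```

Splitting request construction from sending keeps the payload easy to inspect or log without a server running.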
Model Configuration
Create a YAML configuration file for the VAD model:
name: silero-vad
backend: silero-vad
Detection Parameters
The Silero VAD backend uses the following internal defaults:
- Sample rate: 16kHz
- Threshold: 0.5
- Min silence duration: 100ms
- Speech pad duration: 30ms
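Since the backend works at 16kHz, the segment times in a response can be mapped back to sample indices in the submitted audio array. The sketch below shows that conversion. The function names are hypothetical, and it assumes segment times are measured from the first sample of the request's audio array.

```python
SAMPLE_RATE = 16000  # Hz, matching the backend default listed above


def segment_to_sample_range(segment, n_samples, sample_rate=SAMPLE_RATE):
    """Convert a segment's start/end seconds to a clamped (start, end) index pair."""
    start = max(0, int(round(segment["start"] * sample_rate)))
    end = min(n_samples, int(round(segment["end"] * sample_rate)))
    # Guard against an inverted range if the segment lies past the audio end.
    return start, max(start, end)


def extract_speech(audio, segments, sample_rate=SAMPLE_RATE):
    """Return one slice of `audio` per detected speech segment."""
    ranges = (segment_to_sample_range(s, len(audio), sample_rate) for s in segments)
    return [audio[a:b] for a, b in ranges]
```

For example, a segment from 0.5 to 2.3 seconds corresponds to samples 8000 through 36799 of the input.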
Error Responses
| Status Code | Description |
|---|---|
| 400 | Missing or invalid model or audio field |
| 500 | Backend error during VAD processing |
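A client can turn these status codes into readable messages. The sketch below is one possible approach using the standard library's urllib. The names describe_vad_error and run_vad_call are hypothetical, and the message wording is illustrative rather than text returned by LocalAI.

```python
import urllib.error


def describe_vad_error(status):
    """Map the documented status codes to a short human-readable message."""
    if status == 400:
        return "bad request: check the model and audio fields"
    if status == 500:
        return "backend error: VAD processing failed on the server"
    return f"unexpected status {status}"


def run_vad_call(send):
    """Call `send()` and return (result, None) or (None, error message)."""
    try:
        return send(), None
    except urllib.error.HTTPError as err:
        # HTTPError carries the response status code in `err.code`.
        return None, describe_vad_error(err.code)
```

Here send is any zero-argument callable that performs the HTTP request, such as a lambda wrapping urllib.request.urlopen.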