+++
disableToc = false
title = "Video Generation"
weight = 18
url = "/features/video-generation/"
+++

LocalAI can generate videos from text prompts and optional reference images via the `/video` endpoint. Supported backends include `diffusers`, `stablediffusion`, and `vllm-omni`.

## API

- **Method**: POST
- **Endpoint**: `/video`

### Request

The request body is JSON with the following fields:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model` | string | Yes | | Model name to use |
| `prompt` | string | Yes | | Text description of the video to generate |
| `negative_prompt` | string | No | | What to exclude from the generated video |
| `start_image` | string | No | | Starting image as a base64 string or URL |
| `end_image` | string | No | | Ending image for guided generation |
| `width` | int | No | 512 | Video width in pixels |
| `height` | int | No | 512 | Video height in pixels |
| `num_frames` | int | No | | Number of frames |
| `fps` | int | No | | Frames per second |
| `seconds` | string | No | | Duration in seconds |
| `size` | string | No | | Size specification (alternative to `width`/`height`) |
| `input_reference` | string | No | | Input reference for the generation |
| `seed` | int | No | | Random seed for reproducibility |
| `cfg_scale` | float | No | | Classifier-free guidance scale |
| `step` | int | No | | Number of inference steps |
| `response_format` | string | No | url | `url` to return a file URL, `b64_json` for base64 output |

### Response

Returns an OpenAI-compatible JSON response:

| Field | Type | Description |
|-------|------|-------------|
| `created` | int | Unix timestamp of generation |
| `id` | string | Unique identifier (UUID) |
| `data` | array | Array of generated video items |
| `data[].url` | string | URL path to the video file (if `response_format` is `url`) |
| `data[].b64_json` | string | Base64-encoded video (if `response_format` is `b64_json`) |

## Usage

### Generate a video from a text prompt

```bash
curl http://localhost:8080/video \
  -H "Content-Type: application/json" \
  -d '{
    "model": "video-model",
    "prompt": "A cat playing in a garden on a sunny day",
    "width": 512,
    "height": 512,
    "num_frames": 16,
    "fps": 8
  }'
```

Example response:

```json
{
  "created": 1709900000,
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "data": [
    {
      "url": "/generated-videos/abc123.mp4"
    }
  ]
}
```
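When `response_format` is `url` (the default), the returned path is relative to the LocalAI server. A minimal sketch of extracting it with `jq` and downloading the file, using a stand-in payload like the example response above (the filename and host are illustrative):

```shell
# Pull the video path out of a saved response with jq.
# RESPONSE here stands in for the JSON returned by /video.
RESPONSE='{"created":1709900000,"id":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","data":[{"url":"/generated-videos/abc123.mp4"}]}'
VIDEO_PATH=$(echo "$RESPONSE" | jq -r '.data[0].url')
echo "$VIDEO_PATH"

# Fetch it from the same LocalAI instance (uncomment against a live server):
# curl -o output.mp4 "http://localhost:8080${VIDEO_PATH}"
```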

### Generate with a starting image

```bash
curl http://localhost:8080/video \
  -H "Content-Type: application/json" \
  -d '{
    "model": "video-model",
    "prompt": "A timelapse of flowers blooming",
    "start_image": "https://example.com/flowers.jpg",
    "num_frames": 24,
    "fps": 12,
    "seed": 42,
    "cfg_scale": 7.5,
    "step": 30
  }'
```

### Get base64-encoded output

```bash
curl http://localhost:8080/video \
  -H "Content-Type: application/json" \
  -d '{
    "model": "video-model",
    "prompt": "Ocean waves on a beach",
    "response_format": "b64_json"
  }'
```
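The `b64_json` payload can be decoded back into a playable file with `jq` and `base64`. A sketch using a stand-in response (the base64 string here decodes to the text "test", not a real video, so only the pipeline is demonstrated):

```shell
# Decode the base64 video from a b64_json response into an .mp4 file.
# RESPONSE stands in for a real /video response.
RESPONSE='{"created":1709900000,"id":"example","data":[{"b64_json":"dGVzdA=="}]}'
echo "$RESPONSE" | jq -r '.data[0].b64_json' | base64 -d > output.mp4
```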

## Error Responses

| Status Code | Description |
|-------------|-------------|
| 400 | Missing or invalid model or request parameters |
| 500 | Backend error during video generation |
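In scripts, the status code can be captured separately from the body so 400 and 500 responses are handled explicitly. A sketch with stand-in values; the error body shape shown is an assumption, not taken from this page:

```shell
# Against a live server you would capture the status with curl, e.g.:
#   STATUS=$(curl -s -o /tmp/body.json -w "%{http_code}" \
#     http://localhost:8080/video \
#     -H "Content-Type: application/json" -d @request.json)
# The values below simulate a failed call (error shape is assumed).
STATUS=500
BODY='{"error":{"message":"backend error during video generation"}}'

if [ "$STATUS" -ge 400 ]; then
  echo "Request failed ($STATUS): $(echo "$BODY" | jq -r '.error.message // .')"
fi
```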