+++
title = "API Discovery & Instructions"
weight = 27
toc = true
description = "Programmatic API discovery for agents, tools, and automation"
tags = ["API", "Agents", "Instructions", "Configuration", "Advanced"]
categories = ["Features"]
+++
LocalAI exposes a set of discovery endpoints that let external agents, coding assistants, and automation tools programmatically learn what the instance can do and how to control it — without reading documentation ahead of time.
## Quick start
```bash
# 1. Discover what's available
curl http://localhost:8080/.well-known/localai.json
# 2. Browse instruction areas
curl http://localhost:8080/api/instructions
# 3. Get an API guide for a specific instruction
curl http://localhost:8080/api/instructions/config-management
```
## Well-Known Discovery Endpoint
`GET /.well-known/localai.json`
Returns the instance version, all available endpoint URLs (flat and categorized), and runtime capabilities.
**Example response (abbreviated):**
```json
{
  "version": "v2.28.0",
  "endpoints": {
    "chat_completions": "/v1/chat/completions",
    "models": "/v1/models",
    "config_metadata": "/api/models/config-metadata",
    "instructions": "/api/instructions",
    "swagger": "/swagger/index.html"
  },
  "endpoint_groups": {
    "openai_compatible": { "chat_completions": "/v1/chat/completions", "...": "..." },
    "config_management": { "config_metadata": "/api/models/config-metadata", "...": "..." },
    "model_management": { "...": "..." },
    "monitoring": { "...": "..." }
  },
  "capabilities": {
    "config_metadata": true,
    "config_patch": true,
    "vram_estimate": true,
    "mcp": true,
    "agents": false,
    "p2p": false
  }
}
```
The `capabilities` object reflects the current runtime configuration — for example, `mcp` is only `true` if MCP is enabled, and `agents` is `true` only if the agent pool is running.
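Because capabilities vary per instance, a client should check them before calling optional endpoints. A minimal sketch of that gating, using the abbreviated response above as sample data:

```python
# Sample data mirroring the abbreviated /.well-known/localai.json response above.
discovery = {
    "version": "v2.28.0",
    "endpoints": {"chat_completions": "/v1/chat/completions"},
    "capabilities": {"mcp": True, "config_patch": True, "agents": False, "p2p": False},
}

def supports(discovery: dict, capability: str) -> bool:
    """Return True only if the instance explicitly advertises the capability."""
    return bool(discovery.get("capabilities", {}).get(capability, False))

if supports(discovery, "mcp"):
    print("MCP is enabled; tool-augmented chat is available")
if not supports(discovery, "agents"):
    print("Agent pool is not running; skipping agent features")
```

Treating a missing key as `False` keeps the client safe against older instances whose discovery document predates a given capability.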
## Instructions API
Instructions are curated groups of related API endpoints. Each instruction maps to one or more Swagger tags and provides a focused, LLM-readable guide.
### List all instructions
`GET /api/instructions`
```bash
curl http://localhost:8080/api/instructions
```
Returns a compact list of instruction areas:
```json
{
  "instructions": [
    {
      "name": "chat-inference",
      "description": "OpenAI-compatible chat completions, text completions, and embeddings",
      "tags": ["inference", "embeddings"],
      "url": "/api/instructions/chat-inference"
    },
    {
      "name": "config-management",
      "description": "Discover, read, and modify model configuration fields with VRAM estimation",
      "tags": ["config"],
      "url": "/api/instructions/config-management"
    }
  ],
  "hint": "Fetch GET {url} for a markdown API guide. Add ?format=json for a raw OpenAPI fragment."
}
```
**Available instructions:**
| Instruction | Description |
|-------------|-------------|
| `chat-inference` | Chat completions, text completions, embeddings (OpenAI-compatible) |
| `audio` | Text-to-speech, transcription, voice activity detection, sound generation |
| `images` | Image generation and inpainting |
| `model-management` | Browse gallery, install, delete, manage models and backends |
| `config-management` | Discover, read, and modify model config fields with VRAM estimation |
| `monitoring` | System metrics, backend status, system information |
| `mcp` | Model Context Protocol — tool-augmented chat with MCP servers |
| `agents` | Agent task and job management |
| `video` | Video generation from text prompts |
### Get an instruction guide
`GET /api/instructions/:name`
By default, returns a **markdown guide** suitable for LLMs and humans:
```bash
curl http://localhost:8080/api/instructions/config-management
```
Add `?format=json` to get a raw **OpenAPI fragment** (filtered Swagger spec with only the relevant paths and definitions):
```bash
curl "http://localhost:8080/api/instructions/config-management?format=json"
```
## Configuration Management APIs
These endpoints let agents discover model configuration fields, read current settings, modify them, and estimate VRAM usage.
### Config metadata
`GET /api/models/config-metadata`
Returns structured metadata for all model configuration fields, organized by section. Each field includes its YAML path, Go type, UI type, label, description, default value, validation constraints, and available options.
```bash
# All fields
curl http://localhost:8080/api/models/config-metadata
# Filter by section
curl "http://localhost:8080/api/models/config-metadata?section=parameters"
```
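An agent consuming this metadata will typically index fields by section before presenting or editing them. The sketch below uses hypothetical field entries modeled on the attributes listed above (YAML path, section, type, default); the real response shape may differ:

```python
# Hypothetical metadata entries; field names and structure are illustrative,
# not the actual /api/models/config-metadata schema.
fields = [
    {"path": "context_size", "section": "parameters", "type": "int", "default": 4096},
    {"path": "gpu_layers", "section": "parameters", "type": "int", "default": 0},
    {"path": "backend", "section": "general", "type": "string", "default": ""},
]

def by_section(fields: list) -> dict:
    """Group field YAML paths by their section name."""
    index = {}
    for field in fields:
        index.setdefault(field["section"], []).append(field["path"])
    return index
```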
### Autocomplete values
`GET /api/models/config-metadata/autocomplete/:provider`
Returns runtime values for dynamic fields. Providers include `backends`, `models`, `models:chat`, `models:tts`, `models:transcript`, `models:vad`.
```bash
# List available backends
curl http://localhost:8080/api/models/config-metadata/autocomplete/backends
# List chat-capable models
curl http://localhost:8080/api/models/config-metadata/autocomplete/models:chat
```
### Read model config
`GET /api/models/config-json/:name`
Returns the full model configuration as JSON:
```bash
curl http://localhost:8080/api/models/config-json/my-model
```
### Update model config
`PATCH /api/models/config-json/:name`
Deep-merges a JSON patch into the existing model configuration. Only include the fields you want to change:
```bash
curl -X PATCH http://localhost:8080/api/models/config-json/my-model \
  -H "Content-Type: application/json" \
  -d '{"context_size": 16384, "gpu_layers": 40}'
```
The endpoint validates the merged config and writes it to disk as YAML.
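To illustrate the deep-merge semantics, here is a sketch of how a patch combines with an existing config (an illustration of the merge behavior, not LocalAI's actual implementation): nested objects are merged key by key, while scalar values in the patch replace the existing ones.

```python
def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively merge `patch` into a copy of `base`.
    Nested dicts merge key by key; any other patch value replaces
    the base value outright."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

config = {"context_size": 8192, "parameters": {"temperature": 0.7, "top_p": 0.9}}
patched = deep_merge(config, {"context_size": 16384, "parameters": {"temperature": 0.2}})
# "top_p" survives the patch because only "temperature" was overridden
```

This is why a PATCH body only needs the fields being changed: untouched keys, including siblings inside nested sections, are preserved.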
{{% notice context="warning" %}}
Config management endpoints require **admin authentication** when API keys are configured. The discovery and instructions endpoints are unauthenticated.
{{% /notice %}}
### VRAM estimation
`POST /api/models/vram-estimate`
Estimates VRAM usage for an installed model based on its weight files, context size, and GPU layer offloading:
```bash
curl -X POST http://localhost:8080/api/models/vram-estimate \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "context_size": 8192}'
```
```json
{
  "sizeBytes": 4368438272,
  "sizeDisplay": "4.4 GB",
  "vramBytes": 6123456789,
  "vramDisplay": "6.1 GB",
  "context_note": "Estimate used default context_size=8192. The model's trained maximum context is 131072; VRAM usage will be higher at larger context sizes.",
  "model_max_context": 131072
}
```
Optional parameters: `gpu_layers` (number of layers to offload, 0 = all), `kv_quant_bits` (KV cache quantization, 0 = fp16).
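The `sizeDisplay`/`vramDisplay` strings in the example correspond to the byte counts formatted with decimal (SI) units. A sketch of equivalent formatting (the rounding rule is an assumption inferred from the sample values, not LocalAI's code):

```python
def display_size(num_bytes: int) -> str:
    """Format a byte count as a short decimal-unit string, e.g. "4.4 GB".
    Assumes SI units (1 GB = 10^9 bytes), matching the sample response."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000.0
```

Agents can compare `vramBytes` directly against available GPU memory rather than parsing the display string.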
## Integration guide
A recommended workflow for agent/tool builders:
1. **Discover**: Fetch `/.well-known/localai.json` to learn available endpoints and capabilities
2. **Browse instructions**: Fetch `/api/instructions` for an overview of instruction areas
3. **Deep dive**: Fetch `/api/instructions/:name` for a markdown API guide on a specific area
4. **Explore config**: Use `/api/models/config-metadata` to understand configuration fields
5. **Interact**: Use the standard OpenAI-compatible endpoints for inference, and the config management endpoints for runtime tuning
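The first three steps above can be sketched as a small client. `fetch_json` is a hypothetical helper (and the base URL an assumption about where your instance runs); the transport is injectable so the mapping logic stays testable without a live server:

```python
import json
from urllib.request import urlopen

BASE = "http://localhost:8080"  # assumption: default LocalAI address

def fetch_json(path: str) -> dict:
    """Hypothetical helper: GET a path and decode the JSON body."""
    with urlopen(BASE + path) as resp:
        return json.load(resp)

def discover_instruction_urls(fetch=fetch_json) -> dict:
    """Map each instruction name to the URL of its markdown guide."""
    listing = fetch("/api/instructions")
    return {item["name"]: item["url"] for item in listing["instructions"]}
```

An agent would then fetch each guide URL on demand, only pulling detailed documentation for the areas it actually needs.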
## Swagger UI
The full interactive API documentation is available at `/swagger/index.html`. All annotated endpoints can be explored and tested directly from the browser.