## Motivation Adds basic api documentation ## Changes - Add docs/api.md - Modify README.md
4.1 KiB
EXO API – Technical Reference
This document describes the REST API exposed by the **EXO ** service, as implemented in:
src/exo/master/api.py
The API is used to manage model instances in the cluster, inspect cluster state, and perform inference using an OpenAI-compatible interface.
Base URL example:
http://localhost:52415
1. General / Meta Endpoints
Get Master Node ID
GET /node_id
Returns the identifier of the current master node.
Response (example):
{
"node_id": "node-1234"
}
Get Cluster State
GET /state
Returns the current state of the cluster, including nodes and active instances.
Response: JSON object describing topology, nodes, and instances.
Get Events
GET /events
Returns the list of internal events recorded by the master (mainly for debugging and observability).
Response: Array of event objects.
2. Model Instance Management
Create Instance
POST /instance
Creates a new model instance in the cluster.
Request body (example):
{
"instance": {
"model_id": "llama-3.2-1b",
"placement": { }
}
}
Response: JSON description of the created instance.
Delete Instance
DELETE /instance/{instance_id}
Deletes an existing instance by ID.
Path parameters:
instance_id: string, ID of the instance to delete
Response: Status / confirmation JSON.
Get Instance
GET /instance/{instance_id}
Returns details of a specific instance.
Path parameters:
instance_id: string
Response: JSON description of the instance.
Preview Placements
GET /instance/previews?model_id=...
Returns possible placement previews for a given model.
Query parameters:
model_id: string, required
Response: Array of placement preview objects.
Compute Placement
GET /instance/placement
Computes a placement for a potential instance without creating it.
Query parameters (typical):
model_id: stringsharding: string or configinstance_meta: JSON-encoded metadatamin_nodes: integer
Response: JSON object describing the proposed placement / instance configuration.
Place Instance (Dry Operation)
POST /place_instance
Performs a placement operation for an instance (planning step), without necessarily creating it.
Request body: JSON describing the instance to be placed.
Response: Placement result.
3. Models
List Models
GET /models
GET /v1/models (alias)
Returns the list of available models and their metadata.
Response: Array of model descriptors.
4. Inference / Chat Completions
OpenAI-Compatible Chat Completions
POST /v1/chat/completions
Executes a chat completion request using an OpenAI-compatible schema. Supports streaming and non-streaming modes.
Request body (example):
{
"model": "llama-3.2-1b",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Hello" }
],
"stream": false
}
Response: OpenAI-compatible chat completion response.
Benchmarked Chat Completions
POST /bench/chat/completions
Same as /v1/chat/completions, but also returns performance and generation statistics.
Request body:
Same schema as /v1/chat/completions.
Response: Chat completion plus benchmarking metrics.
5. Complete Endpoint Summary
GET /node_id
GET /state
GET /events
POST /instance
GET /instance/{instance_id}
DELETE /instance/{instance_id}
GET /instance/previews
GET /instance/placement
POST /place_instance
GET /models
GET /v1/models
POST /v1/chat/completions
POST /bench/chat/completions
6. Notes
- The
/v1/chat/completionsendpoint is compatible with the OpenAI API format, so existing OpenAI clients can be pointed to EXO by changing the base URL. - The instance placement endpoints allow you to plan and preview cluster allocations before actually creating instances.
- The
/eventsand/stateendpoints are primarily intended for operational visibility and debugging.