Files
exo/docs/api.md
PG b74a610537 Add a basic documentation to the api interface (#1122)
## Motivation

Adds basic api documentation

## Changes

- Add docs/api.md
- Modify README.md
2026-01-11 18:44:40 +00:00

4.1 KiB
Raw Blame History

EXO API Technical Reference

This document describes the REST API exposed by the **EXO ** service, as implemented in:

src/exo/master/api.py

The API is used to manage model instances in the cluster, inspect cluster state, and perform inference using an OpenAI-compatible interface.

Base URL example:

http://localhost:52415

1. General / Meta Endpoints

Get Master Node ID

GET /node_id

Returns the identifier of the current master node.

Response (example):

{
  "node_id": "node-1234"
}

Get Cluster State

GET /state

Returns the current state of the cluster, including nodes and active instances.

Response: JSON object describing topology, nodes, and instances.

Get Events

GET /events

Returns the list of internal events recorded by the master (mainly for debugging and observability).

Response: Array of event objects.

2. Model Instance Management

Create Instance

POST /instance

Creates a new model instance in the cluster.

Request body (example):

{
  "instance": {
    "model_id": "llama-3.2-1b",
    "placement": { }
  }
}

Response: JSON description of the created instance.

Delete Instance

DELETE /instance/{instance_id}

Deletes an existing instance by ID.

Path parameters:

  • instance_id: string, ID of the instance to delete

Response: Status / confirmation JSON.

Get Instance

GET /instance/{instance_id}

Returns details of a specific instance.

Path parameters:

  • instance_id: string

Response: JSON description of the instance.

Preview Placements

GET /instance/previews?model_id=...

Returns possible placement previews for a given model.

Query parameters:

  • model_id: string, required

Response: Array of placement preview objects.

Compute Placement

GET /instance/placement

Computes a placement for a potential instance without creating it.

Query parameters (typical):

  • model_id: string
  • sharding: string or config
  • instance_meta: JSON-encoded metadata
  • min_nodes: integer

Response: JSON object describing the proposed placement / instance configuration.

Place Instance (Dry Operation)

POST /place_instance

Performs a placement operation for an instance (planning step), without necessarily creating it.

Request body: JSON describing the instance to be placed.

Response: Placement result.

3. Models

List Models

GET /models GET /v1/models (alias)

Returns the list of available models and their metadata.

Response: Array of model descriptors.

4. Inference / Chat Completions

OpenAI-Compatible Chat Completions

POST /v1/chat/completions

Executes a chat completion request using an OpenAI-compatible schema. Supports streaming and non-streaming modes.

Request body (example):

{
  "model": "llama-3.2-1b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ],
  "stream": false
}

Response: OpenAI-compatible chat completion response.

Benchmarked Chat Completions

POST /bench/chat/completions

Same as /v1/chat/completions, but also returns performance and generation statistics.

Request body: Same schema as /v1/chat/completions.

Response: Chat completion plus benchmarking metrics.

5. Complete Endpoint Summary

GET     /node_id
GET     /state
GET     /events

POST    /instance
GET     /instance/{instance_id}
DELETE  /instance/{instance_id}

GET     /instance/previews
GET     /instance/placement
POST    /place_instance

GET     /models
GET     /v1/models

POST    /v1/chat/completions
POST    /bench/chat/completions

6. Notes

  • The /v1/chat/completions endpoint is compatible with the OpenAI API format, so existing OpenAI clients can be pointed to EXO by changing the base URL.
  • The instance placement endpoints allow you to plan and preview cluster allocations before actually creating instances.
  • The /events and /state endpoints are primarily intended for operational visibility and debugging.