mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-01-19 03:22:01 -05:00

PG b74a610537 Add a basic documentation to the api interface (#1122)

## Motivation

Adds basic api documentation

## Changes

- Add docs/api.md
- Modify README.md

2026-01-11 18:44:40 +00:00

EXO API – Technical Reference

This document describes the REST API exposed by the **EXO** service, as implemented in:

src/exo/master/api.py

The API is used to manage model instances in the cluster, inspect cluster state, and perform inference using an OpenAI-compatible interface.

Base URL example:

http://localhost:52415

1. General / Meta Endpoints

Get Master Node ID

GET /node_id

Returns the identifier of the current master node.

Response (example):

{
  "node_id": "node-1234"
}

Get Cluster State

GET /state

Returns the current state of the cluster, including nodes and active instances.

Response: JSON object describing topology, nodes, and instances.

Get Events

GET /events

Returns the list of internal events recorded by the master (mainly for debugging and observability).

Response: Array of event objects.
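The three read-only endpoints above can be queried with a plain HTTP GET. The following is a minimal sketch using only the Python standard library; the helper names `endpoint_url` and `get_json` are illustrative and not part of EXO, and the base URL is the example value given earlier in this document.

```python
import json
import urllib.request

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path such as "/node_id"."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send a GET request and decode the JSON response body.

    Hypothetical helper: it assumes the endpoint answers with JSON,
    as the responses described in this section suggest.
    """
    url = endpoint_url(path, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running master, `get_json("/node_id")` would be expected to return an object like the example response above, while `get_json("/state")` and `get_json("/events")` return the cluster state and the event list.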

2. Model Instance Management

Create Instance

POST /instance

Creates a new model instance in the cluster.

Request body (example):

{
  "instance": {
    "model_id": "llama-3.2-1b",
    "placement": { }
  }
}

Response: JSON description of the created instance.
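As a sketch of how the request above might be assembled, the snippet below builds (but does not send) a POST request whose body mirrors the example payload. The function name `build_create_instance_request` is hypothetical, and the empty `placement` object is copied from the example rather than from a known schema.

```python
import json
import urllib.request

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def build_create_instance_request(model_id, placement=None, base_url=BASE_URL):
    """Build a POST /instance request shaped like the example body.

    Assumption: the body is {"instance": {"model_id": ..., "placement": ...}},
    exactly as in the example; the real schema may require more fields.
    """
    payload = {"instance": {"model_id": model_id, "placement": placement or {}}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/instance",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Keeping construction separate from sending makes the payload easy to inspect or log first.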

Delete Instance

DELETE /instance/{instance_id}

Deletes an existing instance by ID.

Path parameters:

instance_id: string, ID of the instance to delete

Response: Status / confirmation JSON.

Get Instance

GET /instance/{instance_id}

Returns details of a specific instance.

Path parameters:

instance_id: string

Response: JSON description of the instance.
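The Get Instance and Delete Instance endpoints share the same path and differ only in the HTTP method. A minimal sketch of building either request follows; `instance_request` is an illustrative name, and percent-encoding the ID is a defensive choice here, not something the API is known to require.

```python
import urllib.parse
import urllib.request

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def instance_request(instance_id: str, method: str = "GET", base_url: str = BASE_URL):
    """Build a GET or DELETE request for /instance/{instance_id}."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be 'GET' or 'DELETE'")
    if not instance_id:
        raise ValueError("instance_id must be a non-empty string")
    # Percent-encode the ID so unusual characters cannot alter the path.
    quoted = urllib.parse.quote(instance_id, safe="")
    url = f"{base_url.rstrip('/')}/instance/{quoted}"
    return urllib.request.Request(url, method=method)
```

For example, `instance_request(some_id, "DELETE")` yields a request that can be sent with `urllib.request.urlopen`, where `some_id` is an ID previously returned by the API.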

Preview Placements

GET /instance/previews?model_id=...

Returns possible placement previews for a given model.

Query parameters:

model_id: string, required

Response: Array of placement preview objects.
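Because `model_id` travels in the query string, it should be URL-encoded. The sketch below shows one way to build the previews URL; `previews_url` is a hypothetical helper name.

```python
import urllib.parse

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def previews_url(model_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /instance/previews with the required model_id."""
    if not model_id:
        raise ValueError("model_id is required")
    # urlencode escapes characters such as "/" that may appear in model IDs.
    query = urllib.parse.urlencode({"model_id": model_id})
    return f"{base_url.rstrip('/')}/instance/previews?{query}"
```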

Compute Placement

GET /instance/placement

Computes a placement for a potential instance without creating it.

Query parameters (typical):

model_id: string
sharding: string or config
instance_meta: JSON-encoded metadata
min_nodes: integer

Response: JSON object describing the proposed placement / instance configuration.
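The query parameters listed above could be assembled as in the following sketch. It assumes that optional parameters may simply be omitted and that `instance_meta` is passed as a JSON string, as the parameter list suggests; the accepted values for `sharding` and `instance_meta` are not specified here, so any values used with this helper are placeholders. `placement_url` is an illustrative name.

```python
import json
import urllib.parse

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def placement_url(model_id, sharding=None, instance_meta=None,
                  min_nodes=None, base_url=BASE_URL):
    """Build the URL for GET /instance/placement.

    Assumption: parameters left as None are omitted from the query string.
    """
    params = {"model_id": model_id}
    if sharding is not None:
        params["sharding"] = sharding
    if instance_meta is not None:
        # The parameter is described as JSON-encoded metadata.
        params["instance_meta"] = json.dumps(instance_meta)
    if min_nodes is not None:
        params["min_nodes"] = str(int(min_nodes))
    return f"{base_url.rstrip('/')}/instance/placement?{urllib.parse.urlencode(params)}"
```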

Place Instance (Dry Operation)

POST /place_instance

Performs a placement operation for an instance (planning step), without necessarily creating it.

Request body: JSON describing the instance to be placed.

Response: Placement result.

3. Models

List Models

GET /models

GET /v1/models (alias)

Returns the list of available models and their metadata.

Response: Array of model descriptors.

4. Inference / Chat Completions

OpenAI-Compatible Chat Completions

POST /v1/chat/completions

Executes a chat completion request using an OpenAI-compatible schema. Supports streaming and non-streaming modes.
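A minimal sketch of both modes follows. It assumes the request body uses the usual OpenAI fields (`model`, `messages`, `stream`) and that streamed responses arrive as OpenAI-style server-sent events (`data: {...}` lines terminated by `data: [DONE]`); neither detail is spelled out in this document, so treat them as assumptions. The helper names are illustrative.

```python
import json
import urllib.request

# Example base URL from this document; adjust to your deployment.
BASE_URL = "http://localhost:52415"


def build_chat_request(model, messages, stream=False, base_url=BASE_URL):
    """Build a POST /v1/chat/completions request (OpenAI-style body assumed)."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With a running server, the streaming case could be consumed by passing the response object from `urllib.request.urlopen(build_chat_request(..., stream=True))` to `iter_stream_text`, since HTTP responses iterate line by line.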