## Summary
- Adds explicit `POST /v1/cancel/{command_id}` REST endpoint to cancel
in-flight text and image generation commands by ID
- Previously, cancellation only worked via HTTP disconnect (client
closes SSE connection, triggering `anyio.get_cancelled_exc_class()`).
Non-streaming clients and external consumers had no way to cleanly
cancel active generation
- The endpoint looks up the command in active generation queues, sends
`TaskCancelled` to notify workers, and closes the sender stream. Returns
404 (OpenAI error format) if the command is not found or already
completed
## Changes
| File | Change |
|------|--------|
| `src/exo/shared/types/api.py` | Add `CancelCommandResponse` model |
| `src/exo/master/api.py` | Import, route registration, `cancel_command` handler |
| `src/exo/master/tests/test_cancel_command.py` | 3 test cases (404, text cancel, image cancel) |
| `docs/api.md` | Document new endpoint + update summary table |
## Design Decisions
- Uses `self._send()` (not raw `command_sender.send()`) — respects API
pause state during elections
- Uses `raise HTTPException` — feeds into exo's centralized OpenAI-style
error handler
- Returns typed `CancelCommandResponse` — consistent with
`CreateInstanceResponse` / `DeleteInstanceResponse` patterns
- `sender.close()` is idempotent so concurrent cancel requests for the
same command are harmless
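The handler logic described above can be sketched as plain Python. This is an illustrative stand-in, not exo's actual code: `TaskCancelled`, the queue layout, and the `send` callback are simplified stand-ins for the real internals (the real handler goes through `self._send()` and raises `HTTPException`).

```python
from dataclasses import dataclass

# Hypothetical stand-in for exo's TaskCancelled message type.
@dataclass
class TaskCancelled:
    cancelled_command_id: str

class CommandNotFound(Exception):
    """Maps to an HTTP 404 in the real handler."""

def cancel_command(command_id, queues, send):
    """Look up an in-flight command, notify workers, and close its stream.

    `queues` is a list of {command_id: sender} maps (text queue, image
    queue, ...); `send` forwards a message to workers.
    """
    sender = None
    for queue in queues:
        if command_id in queue:
            sender = queue[command_id]
            break
    if sender is None:
        # Unknown or already-completed command -> 404, OpenAI error format.
        raise CommandNotFound(command_id)
    send(TaskCancelled(cancelled_command_id=command_id))
    sender.close()  # idempotent, so concurrent cancels are harmless
    return {"message": "Command cancelled.", "command_id": command_id}
```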
## Test Plan
- [x] `test_cancel_nonexistent_command_returns_404` — verifies 404 with
OpenAI error format
- [x] `test_cancel_active_text_generation` — verifies 200,
`sender.close()` called, `TaskCancelled` sent with correct
`cancelled_command_id`
- [x] `test_cancel_active_image_generation` — same verification for
image queue
- [x] `basedpyright` — 0 new errors
- [x] `ruff check` — all checks passed
- [x] `pytest` — 3/3 new tests pass, no regressions
---------
Co-authored-by: Evan <evanev7@gmail.com>
# EXO API – Technical Reference

This document describes the REST API exposed by the EXO service, as implemented in `src/exo/master/api.py`.

The API is used to manage model instances in the cluster, inspect cluster state, and perform inference using an OpenAI-compatible interface.

Base URL example: `http://localhost:52415`
## 1. General / Meta Endpoints

### Get Master Node ID

`GET /node_id`

Returns the identifier of the current master node.

Response (example):

```json
{
  "node_id": "node-1234"
}
```
### Get Cluster State

`GET /state`

Returns the current state of the cluster, including nodes and active instances.

Response: JSON object describing topology, nodes, and instances.

### Get Events

`GET /events`

Returns the list of internal events recorded by the master (mainly for debugging and observability).

Response: Array of event objects.
## 2. Model Instance Management

### Create Instance

`POST /instance`

Creates a new model instance in the cluster.

Request body (example):

```json
{
  "instance": {
    "model_id": "llama-3.2-1b",
    "placement": { }
  }
}
```

Response: JSON description of the created instance.

### Delete Instance

`DELETE /instance/{instance_id}`

Deletes an existing instance by ID.

Path parameters:

- `instance_id`: string, ID of the instance to delete

Response: Status / confirmation JSON.

### Get Instance

`GET /instance/{instance_id}`

Returns details of a specific instance.

Path parameters:

- `instance_id`: string

Response: JSON description of the instance.

### Preview Placements

`GET /instance/previews?model_id=...`

Returns possible placement previews for a given model.

Query parameters:

- `model_id`: string, required

Response: Array of placement preview objects.
### Compute Placement

`GET /instance/placement`

Computes a placement for a potential instance without creating it.

Query parameters (typical):

- `model_id`: string
- `sharding`: string or config
- `instance_meta`: JSON-encoded metadata
- `min_nodes`: integer

Response: JSON object describing the proposed placement / instance configuration.
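The query string for this endpoint can be assembled with the standard library. This is a client-side sketch following the parameter list above; the `"auto"` sharding value is a placeholder, since the exact accepted values depend on exo's configuration.

```python
import json
from urllib.parse import urlencode

def placement_url(base, model_id, sharding, instance_meta, min_nodes):
    """Build the GET /instance/placement URL for a prospective instance."""
    query = urlencode({
        "model_id": model_id,
        "sharding": sharding,
        # instance_meta is passed as JSON-encoded metadata.
        "instance_meta": json.dumps(instance_meta),
        "min_nodes": min_nodes,
    })
    return f"{base}/instance/placement?{query}"

url = placement_url("http://localhost:52415", "llama-3.2-1b", "auto", {}, 2)
```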
### Place Instance (Dry Run)

`POST /place_instance`

Performs a placement operation for an instance (a planning step) without necessarily creating it.

Request body: JSON describing the instance to be placed.

Response: Placement result.
## 3. Models

### List Models

`GET /models`
`GET /v1/models` (alias)

Returns the list of available models and their metadata.

Response: Array of model descriptors.
## 4. Inference / Chat Completions

### OpenAI-Compatible Chat Completions

`POST /v1/chat/completions`

Executes a chat completion request using an OpenAI-compatible schema. Supports streaming and non-streaming modes.

Request body (example):

```json
{
  "model": "llama-3.2-1b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello" }
  ],
  "stream": false
}
```

Response: OpenAI-compatible chat completion response.
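A minimal stdlib client sketch for the request above, assuming an EXO master running at the default base URL; no third-party client library is required.

```python
import json
from urllib import request

BASE = "http://localhost:52415"

def chat_payload(model, messages, stream=False):
    """Build an OpenAI-compatible chat completion request body."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(payload):
    """POST the payload to /v1/chat/completions (requires a running master)."""
    req = request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = chat_payload(
    "llama-3.2-1b",
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello"}],
)
```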
### Benchmarked Chat Completions

`POST /bench/chat/completions`

Same as `/v1/chat/completions`, but also returns performance and generation statistics.

Request body: Same schema as `/v1/chat/completions`.

Response: Chat completion plus benchmarking metrics.
### Cancel Command

`POST /v1/cancel/{command_id}`

Cancels an active generation command (text or image). Notifies workers and closes the stream.

Path parameters:

- `command_id`: string, ID of the command to cancel

Response (example):

```json
{
  "message": "Command cancelled.",
  "command_id": "cmd-abc-123"
}
```

Returns 404 if the command is not found or already completed.
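A client-side sketch of cancellation with the 404 case handled, using only the standard library; a running EXO master is assumed for the actual call.

```python
import json
from urllib import request, error

def cancel_url(base, command_id):
    """Build the cancel endpoint URL for a given command ID."""
    return f"{base}/v1/cancel/{command_id}"

def cancel(base, command_id):
    """Cancel an in-flight command; returns None if it was not found
    or had already completed (the server responds 404 in that case)."""
    req = request.Request(cancel_url(base, command_id), method="POST")
    try:
        with request.urlopen(req) as resp:
            return json.load(resp)  # {"message": ..., "command_id": ...}
    except error.HTTPError as e:
        if e.code == 404:
            return None
        raise
```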
## 5. Image Generation & Editing

### Image Generation

`POST /v1/images/generations`

Executes an image generation request using an OpenAI-compatible schema with additional `advanced_params`.

Request body (example):

```json
{
  "prompt": "a robot playing chess",
  "model": "flux-dev",
  "stream": false
}
```
Advanced parameters (`advanced_params`):

| Parameter | Type | Constraints | Description |
|---|---|---|---|
| `seed` | int | >= 0 | Random seed for reproducible generation |
| `num_inference_steps` | int | 1-100 | Number of denoising steps |
| `guidance` | float | 1.0-20.0 | Classifier-free guidance scale |
| `negative_prompt` | string | - | Text describing what to avoid in the image |
Response: OpenAI-compatible image generation response.
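The constraints in the table above can be checked client-side before sending a request. This is an illustrative helper, not part of exo; the server performs its own validation.

```python
def validate_advanced_params(params):
    """Raise ValueError if any advanced_params value violates the
    documented constraints; return the params unchanged otherwise."""
    seed = params.get("seed")
    if seed is not None and (not isinstance(seed, int) or seed < 0):
        raise ValueError("seed must be an int >= 0")
    steps = params.get("num_inference_steps")
    if steps is not None and not (1 <= steps <= 100):
        raise ValueError("num_inference_steps must be in 1-100")
    guidance = params.get("guidance")
    if guidance is not None and not (1.0 <= guidance <= 20.0):
        raise ValueError("guidance must be in 1.0-20.0")
    return params

body = {
    "prompt": "a robot playing chess",
    "model": "flux-dev",
    "advanced_params": validate_advanced_params(
        {"seed": 42, "num_inference_steps": 30, "guidance": 4.5,
         "negative_prompt": "blurry, low quality"}
    ),
}
```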
### Benchmarked Image Generation

`POST /bench/images/generations`

Same as `/v1/images/generations`, but also returns generation statistics.

Request body: Same schema as `/v1/images/generations`.

Response: Image generation plus benchmarking metrics.

### Image Editing

`POST /v1/images/edits`

Executes an image editing request using an OpenAI-compatible schema with additional `advanced_params` (same as `/v1/images/generations`).

Response: Same format as `/v1/images/generations`.

### Benchmarked Image Editing

`POST /bench/images/edits`

Same as `/v1/images/edits`, but also returns generation statistics.

Request: Same schema as `/v1/images/edits`.

Response: Same format as `/bench/images/generations`, including `generation_stats`.
## 6. Complete Endpoint Summary

- `GET /node_id`
- `GET /state`
- `GET /events`
- `POST /instance`
- `GET /instance/{instance_id}`
- `DELETE /instance/{instance_id}`
- `GET /instance/previews`
- `GET /instance/placement`
- `POST /place_instance`
- `GET /models`
- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /bench/chat/completions`
- `POST /v1/cancel/{command_id}`
- `POST /v1/images/generations`
- `POST /bench/images/generations`
- `POST /v1/images/edits`
- `POST /bench/images/edits`
## 7. Notes

- The `/v1/chat/completions` endpoint is compatible with the OpenAI Chat API format, so existing OpenAI clients can be pointed at EXO by changing the base URL.
- The `/v1/images/generations` and `/v1/images/edits` endpoints are compatible with the OpenAI Images API format.
- The instance placement endpoints allow you to plan and preview cluster allocations before actually creating instances.
- The `/events` and `/state` endpoints are primarily intended for operational visibility and debugging.