mirror of
https://github.com/exo-explore/exo.git
synced 2026-04-18 13:00:59 -04:00
## Motivation
Updated documentation for v1.0.68 release
## Changes
**docs/api.md:**
- Added documentation for new API endpoints: Claude Messages API
(`/v1/messages`), OpenAI Responses API (`/v1/responses`), and Ollama API
compatibility endpoints
- Documented custom model management endpoints (`POST /models/add`,
`DELETE /models/custom/{model_id}`)
- Added `enable_thinking` parameter documentation for thinking-capable
models (DeepSeek V3.1, Qwen3, GLM-4.7)
- Documented usage statistics in responses (prompt_tokens,
completion_tokens, total_tokens)
- Added streaming event format documentation for all API types
- Updated image generation section with FLUX.1-Kontext-dev support and
new dimensions (1024x1365, 1365x1024)
- Added request cancellation documentation
- Updated complete endpoint summary with all new endpoints
- Added security notes about trust_remote_code being opt-in
**README.md:**
- Updated Features section to highlight multiple API compatibility
options
- Added Environment Variables section documenting all configuration
options (EXO_MODELS_PATH, EXO_OFFLINE, EXO_ENABLE_IMAGE_MODELS,
EXO_LIBP2P_NAMESPACE, EXO_FAST_SYNCH, EXO_TRACING_ENABLED)
- Expanded "Using the API" section with examples for Claude Messages
API, OpenAI Responses API, and Ollama API
- Added custom model loading documentation with security notes
- Updated file locations to include log files and custom model cards
paths
**CONTRIBUTING.md:**
- Added documentation for TOML model cards format and the API adapter
pattern
**docs/architecture.md:**
- Documented the adapter architecture introduced in PR #1167
Closes #1653
---------
Co-authored-by: askmanu[bot] <192355599+askmanu[bot]@users.noreply.github.com>
Co-authored-by: Evan Quiney <evanev7@gmail.com>
85 lines
3.7 KiB
Markdown
85 lines
3.7 KiB
Markdown
# EXO Architecture overview
|
|
|
|
EXO uses an _Event Sourcing_ architecture, and Erlang-style _message passing_. To facilitate this, we've written a channel library extending anyio channels with inspiration from tokio::sync::mpsc.
|
|
|
|
Each logical module - designed to be functional independently of the others - communicates with the rest of the system by sending messages on topics.
|
|
|
|
## Systems
|
|
|
|
There are currently 5 major systems:
|
|
|
|
- Master
|
|
|
|
Executes placement and orders events through a single writer
|
|
|
|
- Worker
|
|
|
|
Schedules work on a node, gathers system information, etc.#
|
|
|
|
- Runner
|
|
|
|
Executes inference jobs (for now) in an isolated process from the worker for fault-tolerance.
|
|
|
|
- API
|
|
|
|
Runs a python webserver for exposing state and commands to client applications
|
|
|
|
- Election
|
|
|
|
Implements a distributed algorithm for master election in unstable networking conditions
|
|
|
|
## API Layer
|
|
|
|
The API system uses multiple adapters to support multiple API formats, converting them to a single request / response type.
|
|
|
|
### Adapter Pattern
|
|
|
|
Adapters convert between external API formats and EXO's internal types:
|
|
|
|
```
|
|
Chat Completions → [adapter] → TextGenerationTaskParams → Application
|
|
Claude Messages → [adapter] → TextGenerationTaskParams → Application
|
|
Responses API → [adapter] → TextGenerationTaskParams → Application
|
|
Ollama API → [adapter] → TextGenerationTaskParams → Application
|
|
```
|
|
|
|
Each adapter implements two key functions:
|
|
1. **Request conversion**: Converts API-specific requests to `TextGenerationTaskParams`
|
|
2. **Response generation**: Converts internal `TokenChunk` streams back to API-specific formats (streaming and non-streaming)
|
|
|
|
|
|
## Topics
|
|
|
|
There are currently 5 topics:
|
|
|
|
- Commands
|
|
|
|
The API and Worker instruct the master when the event log isn't sufficient. Namely placement and catchup requests go through Commands atm.
|
|
|
|
- Local Events
|
|
|
|
All nodes write events here, the master reads those events and orders them
|
|
|
|
- Global Events
|
|
|
|
The master writes events here, all nodes read from this topic and fold the produced events into their `State`
|
|
|
|
- Election Messages
|
|
|
|
Before establishing a cluster, nodes communicate here to negotiate a master node.
|
|
|
|
- Connection Messages
|
|
|
|
The networking system write mdns-discovered hardware connections here.
|
|
|
|
|
|
## Event Sourcing
|
|
|
|
Lots has been written about event sourcing, but it lets us centralize faulty connections and message ACKing with the following model.
|
|
|
|
Whenever a device produces side effects, it captures those side effects in an `Event`. `Event`s are then "applied" to their model of `State`, which is globally distributed across the cluster. Whenever a command is received, it is combined with state to produce side effects, captured in yet more events. The rule of thumb is "`Event`s are past tense, `Command`s are imperative". Telling a node to perform some action like "place this model" or "Give me a copy of the event log" is represented by a command (The worker's `Task`s are also commands), while "this node is using 300GB of ram" is an event. Notably, `Event`s SHOULD never cause side effects on their own. There are a few exceptions to this, we're working out the specifics of generalizing the distributed event sourcing model to make it better suit our needs
|
|
|
|
## Purity
|
|
|
|
A significant goal of the current design is to make data flow explicit. Classes should either represent simple data (`CamelCaseModel`s typically, and `TaggedModel`s for unions) or active `System`s (Erlang `Actor`s), with all transformations of that data being "referentially transparent" - destructure and construct new data, don't mutate in place. We have had varying degrees of success with this, and are still exploring where purity makes sense.
|