## Motivation
Updated documentation for v1.0.68 release
## Changes
**docs/api.md:**
- Added documentation for new API endpoints: Claude Messages API
(`/v1/messages`), OpenAI Responses API (`/v1/responses`), and Ollama API
compatibility endpoints
- Documented custom model management endpoints (`POST /models/add`,
`DELETE /models/custom/{model_id}`)
- Added `enable_thinking` parameter documentation for thinking-capable
models (DeepSeek V3.1, Qwen3, GLM-4.7)
- Documented usage statistics in responses (prompt_tokens,
completion_tokens, total_tokens)
- Added streaming event format documentation for all API types
- Updated image generation section with FLUX.1-Kontext-dev support and
new dimensions (1024x1365, 1365x1024)
- Added request cancellation documentation
- Updated complete endpoint summary with all new endpoints
- Added security notes about trust_remote_code being opt-in
**README.md:**
- Updated Features section to highlight multiple API compatibility
options
- Added Environment Variables section documenting all configuration
options (EXO_MODELS_PATH, EXO_OFFLINE, EXO_ENABLE_IMAGE_MODELS,
EXO_LIBP2P_NAMESPACE, EXO_FAST_SYNCH, EXO_TRACING_ENABLED)
- Expanded "Using the API" section with examples for Claude Messages
API, OpenAI Responses API, and Ollama API
- Added custom model loading documentation with security notes
- Updated file locations to include log files and custom model cards
paths
**CONTRIBUTING.md:**
- Added documentation for TOML model cards format and the API adapter
pattern
**docs/architecture.md:**
- Documented the adapter architecture introduced in PR #1167
Closes #1653
---------
Co-authored-by: askmanu[bot] <192355599+askmanu[bot]@users.noreply.github.com>
Co-authored-by: Evan Quiney <evanev7@gmail.com>
3.7 KiB
EXO Architecture overview
EXO uses an Event Sourcing architecture, and Erlang-style message passing. To facilitate this, we've written a channel library extending anyio channels with inspiration from tokio::sync::mpsc.
Each logical module - designed to be functional independently of the others - communicates with the rest of the system by sending messages on topics.
Systems
There are currently 5 major systems:
-
Master
Executes placement and orders events through a single writer
-
Worker
Schedules work on a node, gathers system information, etc.#
-
Runner
Executes inference jobs (for now) in an isolated process from the worker for fault-tolerance.
-
API
Runs a python webserver for exposing state and commands to client applications
-
Election
Implements a distributed algorithm for master election in unstable networking conditions
API Layer
The API system uses multiple adapters to support multiple API formats, converting them to a single request / response type.
Adapter Pattern
Adapters convert between external API formats and EXO's internal types:
Chat Completions → [adapter] → TextGenerationTaskParams → Application
Claude Messages → [adapter] → TextGenerationTaskParams → Application
Responses API → [adapter] → TextGenerationTaskParams → Application
Ollama API → [adapter] → TextGenerationTaskParams → Application
Each adapter implements two key functions:
- Request conversion: Converts API-specific requests to
TextGenerationTaskParams - Response generation: Converts internal
TokenChunkstreams back to API-specific formats (streaming and non-streaming)
Topics
There are currently 5 topics:
-
Commands
The API and Worker instruct the master when the event log isn't sufficient. Namely placement and catchup requests go through Commands atm.
-
Local Events
All nodes write events here, the master reads those events and orders them
-
Global Events
The master writes events here, all nodes read from this topic and fold the produced events into their
State -
Election Messages
Before establishing a cluster, nodes communicate here to negotiate a master node.
-
Connection Messages
The networking system write mdns-discovered hardware connections here.
Event Sourcing
Lots has been written about event sourcing, but it lets us centralize faulty connections and message ACKing with the following model.
Whenever a device produces side effects, it captures those side effects in an Event. Events are then "applied" to their model of State, which is globally distributed across the cluster. Whenever a command is received, it is combined with state to produce side effects, captured in yet more events. The rule of thumb is "Events are past tense, Commands are imperative". Telling a node to perform some action like "place this model" or "Give me a copy of the event log" is represented by a command (The worker's Tasks are also commands), while "this node is using 300GB of ram" is an event. Notably, Events SHOULD never cause side effects on their own. There are a few exceptions to this, we're working out the specifics of generalizing the distributed event sourcing model to make it better suit our needs
Purity
A significant goal of the current design is to make data flow explicit. Classes should either represent simple data (CamelCaseModels typically, and TaggedModels for unions) or active Systems (Erlang Actors), with all transformations of that data being "referentially transparent" - destructure and construct new data, don't mutate in place. We have had varying degrees of success with this, and are still exploring where purity makes sense.