mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-04-17 20:40:35 -04:00

Files

Andrei Onel 7a36d3968d docs: Update documentation for v1.0.68 release (#1667 )

## Motivation

Updated documentation for v1.0.68 release

## Changes

**docs/api.md:**
- Added documentation for new API endpoints: Claude Messages API
(`/v1/messages`), OpenAI Responses API (`/v1/responses`), and Ollama API
compatibility endpoints
- Documented custom model management endpoints (`POST /models/add`,
`DELETE /models/custom/{model_id}`)
- Added `enable_thinking` parameter documentation for thinking-capable
models (DeepSeek V3.1, Qwen3, GLM-4.7)
- Documented usage statistics in responses (prompt_tokens,
completion_tokens, total_tokens)
- Added streaming event format documentation for all API types
- Updated image generation section with FLUX.1-Kontext-dev support and
new dimensions (1024x1365, 1365x1024)
- Added request cancellation documentation
- Updated complete endpoint summary with all new endpoints
- Added security notes about trust_remote_code being opt-in

**README.md:**
- Updated Features section to highlight multiple API compatibility
options
- Added Environment Variables section documenting all configuration
options (EXO_MODELS_PATH, EXO_OFFLINE, EXO_ENABLE_IMAGE_MODELS,
EXO_LIBP2P_NAMESPACE, EXO_FAST_SYNCH, EXO_TRACING_ENABLED)
- Expanded "Using the API" section with examples for Claude Messages
API, OpenAI Responses API, and Ollama API
- Added custom model loading documentation with security notes
- Updated file locations to include log files and custom model cards
paths

**CONTRIBUTING.md:** 
- Added documentation for TOML model cards format and the API adapter
pattern

**docs/architecture.md:**
- Documented the adapter architecture introduced in PR #1167

Closes #1653

---------

Co-authored-by: askmanu[bot] <192355599+askmanu[bot]@users.noreply.github.com>
Co-authored-by: Evan Quiney <evanev7@gmail.com>

2026-03-06 11:32:46 +00:00

3.7 KiB

Raw Permalink Blame History

EXO Architecture overview

EXO uses an Event Sourcing architecture, and Erlang-style message passing. To facilitate this, we've written a channel library extending anyio channels with inspiration from tokio::sync::mpsc.

Each logical module - designed to be functional independently of the others - communicates with the rest of the system by sending messages on topics.

Systems

There are currently 5 major systems:

Master

Executes placement and orders events through a single writer
Worker

Schedules work on a node, gathers system information, etc.#
Runner

Executes inference jobs (for now) in an isolated process from the worker for fault-tolerance.
API

Runs a python webserver for exposing state and commands to client applications
Election

Implements a distributed algorithm for master election in unstable networking conditions

API Layer

The API system uses multiple adapters to support multiple API formats, converting them to a single request / response type.

Adapter Pattern

Adapters convert between external API formats and EXO's internal types:

Chat Completions → [adapter] → TextGenerationTaskParams → Application
Claude Messages  → [adapter] → TextGenerationTaskParams → Application
Responses API    → [adapter] → TextGenerationTaskParams → Application
Ollama API       → [adapter] → TextGenerationTaskParams → Application

Each adapter implements two key functions:

Request conversion: Converts API-specific requests to TextGenerationTaskParams
Response generation: Converts internal TokenChunk streams back to API-specific formats (streaming and non-streaming)

Topics

There are currently 5 topics:

Commands

The API and Worker instruct the master when the event log isn't sufficient. Namely placement and catchup requests go through Commands atm.
Local Events

All nodes write events here, the master reads those events and orders them
Global Events

The master writes events here, all nodes read from this topic and fold the produced events into their State
Election Messages

Before establishing a cluster, nodes communicate here to negotiate a master node.
Connection Messages

The networking system write mdns-discovered hardware connections here.

Event Sourcing

Lots has been written about event sourcing, but it lets us centralize faulty connections and message ACKing with the following model.

Whenever a device produces side effects, it captures those side effects in an Event. Events are then "applied" to their model of State, which is globally distributed across the cluster. Whenever a command is received, it is combined with state to produce side effects, captured in yet more events. The rule of thumb is "Events are past tense, Commands are imperative". Telling a node to perform some action like "place this model" or "Give me a copy of the event log" is represented by a command (The worker's Tasks are also commands), while "this node is using 300GB of ram" is an event. Notably, Events SHOULD never cause side effects on their own. There are a few exceptions to this, we're working out the specifics of generalizing the distributed event sourcing model to make it better suit our needs

Purity

A significant goal of the current design is to make data flow explicit. Classes should either represent simple data (CamelCaseModels typically, and TaggedModels for unions) or active Systems (Erlang Actors), with all transformations of that data being "referentially transparent" - destructure and construct new data, don't mutate in place. We have had varying degrees of success with this, and are still exploring where purity makes sense.

3.7 KiB Raw Permalink Blame History