Mirror of https://github.com/exo-explore/exo.git, synced 2026-02-05 11:43:17 -05:00
## Motivation
Add support for Claude Messages API and OpenAI Responses API to allow
users to interact with exo using these popular API formats. This enables
broader compatibility with existing tooling and SDKs that expect these
API formats.
## Architecture
Adapter logic lives exclusively in the API layer
(`src/exo/master/adapters/`). On the way in, each adapter converts its
API-specific request type (`ChatCompletionRequest`,
`ClaudeMessagesRequest`, `ResponsesRequest`) into
`TextGenerationTaskParams`. On the way out, each adapter converts the
`TokenChunk` stream back into its API-specific response format.
Everything inside the application — commands, worker, runner, event
sourcing — only sees `TextGenerationTaskParams` and `TokenChunk`. No
API-specific types cross the boundary.
```
API layer │ Application internals
│
Chat Completions → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ChatCompletionResponse
Claude Messages → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ClaudeMessagesResponse
Responses API → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ResponsesResponse
```
## Changes
### New Files
- `src/exo/shared/types/claude_api.py` - Pydantic types for Claude
Messages API
- `src/exo/shared/types/openai_responses.py` - Pydantic types for OpenAI
Responses API
- `src/exo/shared/types/text_generation.py` - Shared
`TextGenerationTaskParams` internal type
- `src/exo/master/adapters/chat_completions.py` - Chat Completions
adapter (streaming/non-streaming)
- `src/exo/master/adapters/claude.py` - Claude Messages adapter
(streaming/non-streaming)
- `src/exo/master/adapters/responses.py` - OpenAI Responses adapter
(streaming/non-streaming)
### Modified Files
- `src/exo/master/api.py` - Refactored to use adapters uniformly for all
endpoints; extracted `_resolve_and_validate_text_model` helper to
deduplicate model validation across all text endpoints; removed ad-hoc
`try/except ValueError` blocks from non-streaming paths
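A shared validation helper of this kind might look like the sketch below. The helper name mirrors the one above, but the body and the model registry are illustrative assumptions, not the actual implementation in `src/exo/master/api.py`.

```python
# Illustrative model registry -- the real source of valid models differs.
KNOWN_TEXT_MODELS = {"llama-3.2-1b", "llama-3.2-3b"}

def resolve_and_validate_text_model(model: str) -> str:
    """Resolve the requested model name and reject unknown models in one
    place, so every text endpoint shares identical error behaviour
    instead of duplicating ad-hoc try/except blocks."""
    resolved = model.strip().lower()
    if resolved not in KNOWN_TEXT_MODELS:
        raise ValueError(f"unknown text model: {model!r}")
    return resolved
```

Centralizing this means a new text endpoint gets model validation for free, and a change to the error format touches one function.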
### New Endpoints
- `POST /v1/messages` - Claude Messages API (streaming and
non-streaming)
- `POST /v1/responses` - OpenAI Responses API (streaming and
non-streaming)
## Why It Works
All APIs are implemented as pure conversion adapters at the edge of the
application:
1. Adapter functions in `src/exo/master/adapters/` convert incoming
requests to `TextGenerationTaskParams`
2. `api.py` wraps the params in a `TextGeneration` command and sends it
through the existing command/event flow
3. The worker, runner, and event sourcing layers only handle
`TextGenerationTaskParams` and `TokenChunk` — they have no awareness of
Chat Completions, Claude, or Responses API formats
4. On response, adapter functions convert the `TokenChunk` stream back
to the caller's expected format
5. Model validation is handled by a single shared helper
(`_resolve_and_validate_text_model`), mirroring the existing
`_validate_image_model` pattern for image endpoints
No changes to core inference logic were needed.
### Streaming Formats
- **Chat Completions**: Uses `data: {...}\n\n` with `[DONE]` terminator
- **Claude**: Uses event types `message_start`, `content_block_start`,
`content_block_delta`, `content_block_stop`, `message_delta`,
`message_stop`
- **OpenAI Responses**: Uses event types `response.created`,
`response.in_progress`, `response.output_item.added`,
`response.content_part.added`, `response.output_text.delta`,
`response.output_text.done`, `response.content_part.done`,
`response.output_item.done`, `response.completed`
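The Claude event sequence above can be sketched as an SSE generator. The event ordering matches the list above; the payload bodies are abbreviated assumptions, not the full shapes the real adapter emits.

```python
import json

def claude_sse_events(deltas):
    """Sketch of the Claude streaming envelope: each event is an
    `event: <type>` line plus a `data: <json>` line, in the order
    message_start .. content_block_delta* .. message_stop."""
    def ev(name: str, payload: dict) -> str:
        return f"event: {name}\ndata: {json.dumps(payload)}\n\n"
    yield ev("message_start", {"type": "message_start"})
    yield ev("content_block_start", {"type": "content_block_start", "index": 0})
    for d in deltas:
        yield ev("content_block_delta",
                 {"type": "content_block_delta", "index": 0,
                  "delta": {"type": "text_delta", "text": d}})
    yield ev("content_block_stop", {"type": "content_block_stop", "index": 0})
    yield ev("message_delta",
             {"type": "message_delta", "delta": {"stop_reason": "end_turn"}})
    yield ev("message_stop", {"type": "message_stop"})

stream = list(claude_sse_events(["Hel", "lo"]))
```

The Responses adapter follows the same pattern with its own event names, while Chat Completions uses plain `data:` frames terminated by `data: [DONE]`.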
## Test Plan
### Manual Testing
Hardware: MacBook Pro M3 Max
**Non-streaming tests:**
```bash
# Chat Completions API
curl -X POST http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 20}'

# Claude Messages API
curl -X POST http://localhost:52415/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "max_tokens": 50, "messages": [{"role": "user", "content": "Hello"}]}'

# OpenAI Responses API
curl -X POST http://localhost:52415/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "input": "Hello", "max_output_tokens": 20}'
```
**Streaming tests:**
```bash
# Chat Completions API (streaming)
curl -N -X POST http://localhost:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "messages": [{"role": "user", "content": "Hello"}], "stream": true, "max_tokens": 20}'

# Claude Messages API (streaming)
curl -N -X POST http://localhost:52415/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "max_tokens": 50, "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

# OpenAI Responses API (streaming)
curl -N -X POST http://localhost:52415/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-1b", "input": "Hello", "stream": true, "max_output_tokens": 20}'
```
All six requests returned well-formed responses, and the streaming
variants emitted the expected event sequences for their respective
formats.
### Automated Testing
- Tests in `src/exo/master/tests/` all pass (85 tests)
- Type checker (basedpyright) passes with 0 errors
- Linter (ruff) passes
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Evan <evanev7@gmail.com>