Commit Graph

30 Commits

Author SHA1 Message Date
rltakashige
59669c1168 Tighten EXO bench concurrency numbers and explain methodology (#1811)
## Motivation

The timings reported by the batch generator are slightly optimistic; a minor
change is needed to make them accurate.

## Changes

Include the time spent in the API in the generation TPS figure, and make sure
all requests are sent simultaneously.
2026-04-05 00:57:52 +01:00
vskiwi
fc1ae90111 fix: DeepSeek V3.2 warmup crash and tool calling + add catalog cards (#1769)
## Summary

DeepSeek V3.2 (`DeepseekV32ForCausalLM`) is already supported by exo's
inference engine (architecture whitelisted in `model_cards.py`, DSML
encoding added in #1548), but **doesn't work out of the box** due to two
bugs:

### Bug 1: `warmup_inference` passes empty model ID

`warmup_inference()` in `generate.py` accepts `model_id: ModelId` as a
parameter but creates `TextGenerationTaskParams(model=ModelId(""), ...)`
instead of using it. Since `_needs_dsml_encoding()` checks
`"deepseek-v3.2" in task_params.model.lower()`, the empty string never
matches → falls back to `tokenizer.apply_chat_template()` →
**ValueError** because V3.2 has no Jinja chat template.

**Fix:** `model=ModelId("")` → `model=model_id` (one line).
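
The substring check can be sketched in isolation; `needs_dsml_encoding` below is an illustrative stand-in for the real `_needs_dsml_encoding` in `generate.py`, not the actual code:

```python
# Minimal sketch of the substring check described above (stand-in, not
# the real generate.py implementation).
def needs_dsml_encoding(model: str) -> bool:
    return "deepseek-v3.2" in model.lower()

# Buggy warmup path: ModelId("") means the check can never match,
# so the Jinja fallback runs and raises ValueError for V3.2.
assert not needs_dsml_encoding("")
# Fixed warmup path: passing the real model_id through makes it match.
assert needs_dsml_encoding("mlx-community/DeepSeek-V3.2-8bit")
```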

### Bug 2: `_needs_dsml_encoding` limited to tool calling

`_needs_dsml_encoding()` returns `True` only when `task_params.tools` is
present or tool messages exist in `chat_template_messages`. For warmup
and regular chat requests without tools → `return False` → Jinja
fallback → **ValueError**.

Unlike V3.1 (which has a `.jinja` chat template file that transformers
picks up automatically), V3.2 **has no Jinja template at all** — it uses
Python-based DSML encoding for all message types.

**Fix:** For V3.2, always return `True` — DSML encoding handles all
message types.
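
The broadened check can be sketched as follows; this is an illustrative stand-in (the real function takes `task_params` and inspects `chat_template_messages`, while this sketch takes plain values):

```python
# Stand-in sketch of the broadened check, not the real generate.py code.
def needs_dsml_encoding(model: str, has_tools: bool) -> bool:
    if "deepseek-v3.2" in model.lower():
        return True   # V3.2: DSML encoding for all message types, tools or not
    return has_tools  # other models: DSML only for tool calling
```

With this change, plain chat requests on V3.2 route through DSML encoding instead of hitting the Jinja fallback.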

### Catalog cards

Added inference model cards for:
- `mlx-community/DeepSeek-V3.2-8bit`
- `mlx-community/DeepSeek-V3.2-4bit`

Parameters taken from model `config.json` on HuggingFace, storage sizes
from HF API. Capabilities include `thinking_toggle` (related: #1456).

## Notes

- The model ID string matching approach (`"deepseek-v3.2" in
model.lower()`) is acknowledged tech debt — see #1371 for the planned
architecture-based approach.

## Test plan

- [x] Start exo with DeepSeek V3.2 model → warmup should complete
without crash
- [x] Send a regular chat message (no tools) → should get a response
- [x] Send a chat message with tools → should work as before
- [x] V3.2 cards should appear in the dashboard model catalog

---------

Co-authored-by: user <user@m1.note>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
Co-authored-by: Evan <evanev7@gmail.com>
2026-03-25 16:20:35 +00:00
DeepZima
2da740c387 Feat/static peer discovery (#1690)
**Enabling peers to be discovered in environments where mDNS is
unavailable (SSH sessions, headless servers, Docker).**

## Motivation
Exo discovers peers exclusively via mDNS, which works great on a local
network but breaks once you move beyond a single L2 broadcast domain:

- SSH sessions on macOS — TCC blocks mDNS multicast from non-GUI
sessions (#1488)
- Headless servers/rack machines — #1682 ("DGX Spark does not find other
nodes")
- Docker Compose — mDNS is often unavailable across container networks;
e.g. #1462 (E2E test framework) needs an alternative

Related work:
#1488 (a working implementation by @AlexCheema, closed because SSH had a GUI
workaround),
#1023 (Headscale WAN, closed due to merge conflicts),
#1656 (discovery cleanup, still open).

This PR introduces an optional bootstrap mechanism for peer discovery
while leaving the existing mDNS behavior unchanged.

## Changes
Adds two new CLI flags:

- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`) — comma-separated
libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port` — fixed TCP port for libp2p to listen on (default:
OS-assigned). Required when using bootstrap peers, so other nodes know which
port to dial.

8 files: 
- `rust/networking/src/discovery.rs`: Store bootstrap addrs, dial in
existing retry loop
- `rust/networking/src/swarm.rs`: Thread `bootstrap_peers` parameter to
`Behaviour`
- `rust/networking/examples/chatroom.rs`: Updated call site for new
create_swarm signature
- `rust/networking/tests/bootstrap_peers.rs`: Integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: Accept optional
`bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi`: Update type stub
- `src/exo/routing/router.py`: Pass peers to `NetworkingHandle`
- `src/exo/main.py`: `--bootstrap-peers` CLI arg +
`EXO_BOOTSTRAP_PEERS` env var

## Why It Works

Bootstrap peers are dialed in the existing retry loop — the same path
taken by mDNS-discovered peers. The swarm handles connection, Noise
handshake, and gossipsub mesh joining from there.

A PeerId is intentionally not required in the multiaddr; the Noise
handshake discovers it.

Docker Compose example:

```yaml
services:
  exo-1:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-2/tcp/30000"
  exo-2:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-1/tcp/30000"
```

## Test Plan

### Manual Testing
<details>
<summary>Docker Compose config</summary>

```yaml
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g

  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g

networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```
</details> 

Two containers on a bridge network (`172.30.20.0/24`), fixed IPs,
`--libp2p-port 30000`, cross-referencing `--bootstrap-peers`.

Both nodes found each other, established a connection, and then ran the
election protocol.

### Automated Testing

4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs`
(`cargo test -p networking`):

| Test | What it verifies | Result |
|------|------------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility: no bootstrap peers still works | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS |

All 4 pass. The connection test takes ~6s.

---------

Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>
2026-03-25 10:55:12 +00:00
Evan Quiney
0096159728 up gossipsub limit (#1671)
bump gossipsub limit to 8MB + add warning
this is a bandaid solution for large messages while we figure out the
point to point protocol
2026-03-06 14:20:51 +00:00
rltakashige
37296c8249 Refactor runner for implementing batching (#1632)
## Motivation

Batching will require us to send tasks concurrently and queue them up.
Our current infrastructure cannot handle that at all. This PR gets us
closer by allowing multiple tasks to be sent in parallel and then queued.

## Changes

- Change the `Plan` logic
- Make the runner main loop into a class
- Add a `BatchGenerator` to which tasks can be submitted (tasks are still
handled sequentially) and results sent back through an `MpSender`
- Refactor the runner to accept tasks during generation
- Keep the generator threading
- Separate the runner into several files for better readability

## Test Plan

### Manual Testing
Tested manually; needs a lot more automated testing. Cancellation still
works on a single device. Needs checking on multiple devices.

### Automated Testing

---------

Co-authored-by: Evan Quiney <evanev7@gmail.com>
2026-03-03 14:38:55 +00:00
Jake Hillion
dc0bb5e13b fmt: add taplo TOML formatter to treefmt configuration 2026-02-27 17:19:48 +00:00
Evan Quiney
db73c4fd5d move messaging into rust (#1549)
the main body of the rust refactor. fixes the tokio panic on shutdown.
simplifies the networking module significantly. doesn't touch libp2p
behaviour
2026-02-26 13:58:22 +00:00
vskiwi
dab7ed4821 fix: handle gossipsub MessageTooLarge error to prevent silent crash (#1583)
## Summary

Large prompts (70K+ tokens / ~500KB+ JSON) cause exo to silently crash.
The root cause is an unhandled `PublishError::MessageTooLarge` from
gossipsub when serialized `TextGeneration` commands exceed the 1MB
`max_transmit_size` limit.

The error propagates as a generic Python exception through the PyO3
bindings. Since `_networking_publish` in `router.py` only catches
`NoPeersSubscribedToTopicError` and `AllQueuesFullError`, the unhandled
exception crashes the networking async task, causing exo to shut down
silently — no error message, no API response.

## Changes

- **Rust (PyO3 bindings):** Add `MessageTooLargeError` exception class
and handle `PublishError::MessageTooLarge` explicitly in the gossipsub
publish path, matching the existing pattern for
`NoPeersSubscribedToTopicError` and `AllQueuesFullError`
- **Python (router):** Catch `MessageTooLargeError` in
`_networking_publish` and log a warning with the message size,
preventing the networking task from crashing
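
The router-side handling can be sketched as follows; this is a minimal stand-in, assuming the exception class is importable from the bindings (here it is redefined locally so the sketch is self-contained):

```python
import logging

logger = logging.getLogger("exo.router")

class MessageTooLargeError(Exception):
    """Stand-in for the exception exported by the PyO3 bindings."""

def networking_publish(publish, payload: bytes) -> bool:
    # Sketch of the handling described above: log a warning with the
    # message size and drop the message instead of letting the
    # exception crash the networking task.
    try:
        publish(payload)
        return True
    except MessageTooLargeError:
        logger.warning(
            "dropping message of %d bytes: exceeds gossipsub max_transmit_size",
            len(payload),
        )
        return False
```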

## Reproduction

On a multi-node cluster with a large model (e.g., GLM-5 754B tensor
parallel over JACCL RDMA):
1. Send a chat completion request with ~70K+ tokens
2. exo silently shuts down — no error logged, curl gets no response
3. With shorter prompts (< ~50K tokens): works fine

## Test plan

- Verified `cargo check` passes for `networking` and `exo_pyo3_bindings`
crates
- Verified `ruff check` passes for modified Python files
- Manual testing on 4× Mac Studio M3 Ultra cluster: 50K-token requests pass;
70K+ tokens previously caused a silent shutdown and now log a warning and
drop the oversized message gracefully

Co-authored-by: vsm <vsm@nomail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: rltakashige <rl.takashige@gmail.com>
2026-02-23 16:21:21 +00:00
Evan Quiney
4c4c6ce99f simplify rust ident module
this is partly dead code, partly narrowing the rust-python boundary in
prep for future rewrites. no testing as this is all type-safe
refactoring.
2026-02-19 17:19:31 +00:00
Evan Quiney
cacb456cb2 remove nightly (#1538)
we have no good need for rust nightly (nor futures, for that matter)
2026-02-19 12:55:31 +00:00
Evan Quiney
8f01523ddb remove dead code (#1496) 2026-02-18 11:43:27 +00:00
Jake Hillion
199df64cfc util: remove VecExt trait, inline at call site (#1446)
VecExt added a .map() convenience method on Vec<T> that simply called
.into_iter().map(f).collect(). This thin wrapper provided no
optimisation benefit and obscured a standard iterator pattern behind a
nightly feature gate and an extra dependency.

Replaced the single call site in exo_pyo3_bindings with the equivalent
iterator chain and removed the ext module, the extend dependency, and
the trait_alias feature gate from the util crate.

Test plan:
- CI
2026-02-10 20:45:57 +00:00
Jake Hillion
c37eb24331 util: remove dead code (#1445)
The util crate contained several unused items: NonemptyArray,
BoxedSliceExt, a blanket Sealed trait, an empty alias module, six unused
nightly feature gates, and six unused Cargo dependencies (thiserror,
once_cell, internment, derive_more, bon, recursion).

Removed all items that had no references outside their own definitions,
keeping only WakerDeque, VecExt, and the trait_alias feature gate which
are actively used by the networking and exo_pyo3_bindings crates.

Test plan:
- CI
2026-02-10 20:10:54 +00:00
Jake Hillion
1f242e8eee gossipsub: stop silent message dropping and warn (#1434)
The 15-second publish_queue_duration caused messages in peer queues to
be silently dropped. When events are dropped, workers detect gaps in the
event index sequence and request missing events via the NACK path
(RequestEventLog), but this recovery is inefficient.

Removed the timeout configuration - gossipsub now uses its default
behavior without time-based eviction. If queue buildup is a concern,
queue size should be limited explicitly rather than dropping by timeout.

Split error handling to log AllQueuesFullError as a warning (indicates
peers are unresponsive) while keeping NoPeersSubscribedToTopicError
silent (expected during startup and network partitions).

Test plan:
- CI
2026-02-10 18:47:47 +00:00
Evan
4bc4d50685 rust: remove dead code
the system custodian has been made unnecessary with the swift app - we
can remove it

## testing
everything still builds
2026-01-15 16:51:46 +00:00
Jake Hillion
e6434ec446 nix: add Rust builds with crane and fenix
The Rust workspace lacked Nix build support, making it difficult to
build packages reproducibly or run checks in CI.

Added a flake-parts module at rust/parts.nix that uses crane for Rust
builds and fenix for the nightly toolchain. The source filter isolates
rust/ and root Cargo files to prevent Python/docs changes from
triggering Rust rebuilds. Exports packages (system_custodian,
exo_pyo3_bindings wheel, exo-rust-workspace) and checks (cargo-nextest,
cargo-doc) for all three target platforms.

The devShell now uses inputsFrom to inherit build dependencies from the
workspace package, removing the need for manual pkg-config/openssl setup.

Test plan:
- Ran `nix flake check` successfully
- Built `nix build ".#checks.x86_64-linux.cargo-nextest"` and tests pass
- Built `nix build ".#exo_pyo3_bindings"` and wheel is produced
2026-01-14 11:52:29 +00:00
Jake Hillion
5629983809 fmt: format all python/rust/nix files 2025-12-05 16:58:55 +00:00
Jake Hillion
5ef1df1e10 rust: move Cargo.toml to the root 2025-12-05 12:01:44 +00:00
Evan
7088988a65 bump pyo3 stub-gen 2025-11-25 12:13:53 +00:00
Alex Cheema
19e90572e6 set max_transmit_size on gossipsub to 1MB. Fixes large message error 2025-11-06 19:18:48 +00:00
rltakashige
16f724e24c Update staging 14
Co-authored-by: Evan <evanev7@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: David Munha Canas Correia <dmunha@MacBook-David.local>
Co-authored-by: github-actions bot <github-actions@users.noreply.github.com>
2025-11-05 01:44:24 +00:00
Evan Quiney
3b409647ba Squash merge merging_clusters into tensor_parallel94 2025-10-31 17:41:57 +00:00
Evan Quiney
962e5ef40d version bump for brew consistency 2025-10-07 15:18:54 +01:00
Evan Quiney
38ff949bf4 big refactor
Fix. Everything.

Co-authored-by: Andrei Cravtov <the.andrei.cravtov@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
2025-09-30 11:03:04 +01:00
Alex Cheema
92c9688bf0 Remove rust 2025-08-02 08:16:39 -07:00
Gelu Vrabie
0e32599e71 fix libp2p + other prs that were wrongly overwritten before (111,112,117,118,1119 + misc commits from Alex)
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>
Co-authored-by: Seth Howes <71157822+sethhowes@users.noreply.github.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-31 20:36:47 +01:00
Alex Cheema
57ca487fde Fixes for running this end to end
Co-authored-by: Gelu Vrabie <gelu.vrabie.univ@gmail.com>
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
2025-07-28 10:51:03 +01:00
Andrei Cravtov
b687dec6b2 Discovery integration master
Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2025-07-27 13:43:59 +01:00
Andrei Cravtov
3ab5609289 wrote race-condition-free persistent NodeID-getting function 2025-07-23 20:18:56 +01:00
Andrei Cravtov
8d2536d926 Implemented basic discovery library in Rust + python bindings
Co-authored-by: Gelu Vrabie <gelu@exolabs.net>
Co-authored-by: Seth Howes <sethshowes@gmail.com>
Co-authored-by: Matt Beton <matthew.beton@gmail.com>
2025-07-23 13:11:29 +01:00