## Motivation
The timings reported by the batch generator are slightly optimistic; a minor
change makes them more accurate.
## Changes
Include the time spent in the API in the generation TPS, and ensure all
requests are sent simultaneously.
## Summary
DeepSeek V3.2 (`DeepseekV32ForCausalLM`) is already supported by exo's
inference engine (architecture whitelisted in `model_cards.py`, DSML
encoding added in #1548), but **doesn't work out of the box** due to two
bugs:
### Bug 1: `warmup_inference` passes empty model ID
`warmup_inference()` in `generate.py` accepts `model_id: ModelId` as a
parameter but creates `TextGenerationTaskParams(model=ModelId(""), ...)`
instead of using it. Since `_needs_dsml_encoding()` checks
`"deepseek-v3.2" in task_params.model.lower()`, the empty string never
matches → the code falls back to `tokenizer.apply_chat_template()` →
**ValueError**, because V3.2 has no Jinja chat template.
**Fix:** `model=ModelId("")` → `model=model_id` (one line).
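A minimal sketch of why the empty model ID defeats the check (types are simplified here; the real `ModelId` and `TextGenerationTaskParams` live in exo's codebase):

```python
# Simplified stand-ins for exo's types; illustrative only.
from dataclasses import dataclass

@dataclass
class TextGenerationTaskParams:
    model: str  # stands in for ModelId

def needs_dsml(params: TextGenerationTaskParams) -> bool:
    # The substring check from _needs_dsml_encoding().
    return "deepseek-v3.2" in params.model.lower()

model_id = "mlx-community/DeepSeek-V3.2-4bit"

buggy = TextGenerationTaskParams(model="")        # before: ModelId("")
fixed = TextGenerationTaskParams(model=model_id)  # after: model=model_id
```

With the empty string, `needs_dsml(buggy)` is `False` and the Jinja fallback (and its ValueError) is hit; with the real ID it is `True`.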
### Bug 2: `_needs_dsml_encoding` limited to tool calling
`_needs_dsml_encoding()` returns `True` only when `task_params.tools` is
present or tool messages exist in `chat_template_messages`. For warmup
and regular chat requests without tools → `return False` → Jinja
fallback → **ValueError**.
Unlike V3.1 (which has a `.jinja` chat template file that transformers
picks up automatically), V3.2 **has no Jinja template at all** — it uses
Python-based DSML encoding for all message types.
**Fix:** For V3.2, always return `True` — DSML encoding handles all
message types.
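A hedged sketch of the fixed decision logic (the real `_needs_dsml_encoding()` also inspects `chat_template_messages`; this collapses that to a boolean for illustration):

```python
def needs_dsml_encoding(model: str, has_tools: bool) -> bool:
    """Return True when DSML encoding must replace the Jinja template."""
    # V3.2 ships no Jinja chat template at all, so DSML is required for
    # every message type, not just tool calls (this is the fix).
    if "deepseek-v3.2" in model.lower():
        return True
    # Other models keep the previous behavior: DSML only for tool calling.
    return has_tools
```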
### Catalog cards
Added inference model cards for:
- `mlx-community/DeepSeek-V3.2-8bit`
- `mlx-community/DeepSeek-V3.2-4bit`
Parameters taken from model `config.json` on HuggingFace, storage sizes
from HF API. Capabilities include `thinking_toggle` (related: #1456).
## Notes
- The model ID string matching approach (`"deepseek-v3.2" in
model.lower()`) is acknowledged tech debt — see #1371 for the planned
architecture-based approach.
## Test plan
- [x] Start exo with DeepSeek V3.2 model → warmup should complete
without crash
- [x] Send a regular chat message (no tools) → should get a response
- [x] Send a chat message with tools → should work as before
- [x] V3.2 cards should appear in the dashboard model catalog
---------
Co-authored-by: user <user@m1.note>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
Co-authored-by: Evan <evanev7@gmail.com>
**Enabling peers to be discovered in environments where mDNS is
unavailable (SSH sessions, headless servers, Docker).**
## Motivation
Exo discovers peers exclusively via mDNS, which works great on a local
network but breaks once you move beyond a single L2 broadcast domain:
- SSH sessions on macOS — TCC blocks mDNS multicast from non-GUI
sessions (#1488)
- Headless servers/rack machines — #1682 ("DGX Spark does not find other
nodes")
- Docker Compose — mDNS is often unavailable across container networks;
e.g. #1462 (E2E test framework) needs an alternative
Related work:
- #1488 (a working implementation by @AlexCheema, closed because SSH had a GUI workaround)
- #1023 (Headscale WAN, closed due to merge conflicts)
- #1656 (discovery cleanup, still open)
This PR introduces an optional bootstrap mechanism for peer discovery
while leaving the existing mDNS behavior unchanged.
## Changes
Adds two new CLI flags:
- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`) — comma-separated
libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port` — fixed TCP port for libp2p to listen on (default:
OS-assigned). Required when using bootstrap peers, so other nodes know
which port to dial.
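A minimal sketch of how the comma-separated multiaddr list could be parsed on the Python side; the helper name and validation are assumptions, not the actual `main.py` code:

```python
import os

def parse_bootstrap_peers(raw: "str | None") -> list:
    """Split a comma-separated multiaddr list, dropping empty entries."""
    if not raw:
        return []
    return [addr.strip() for addr in raw.split(",") if addr.strip()]

# The env var mirrors the --bootstrap-peers CLI flag.
peers = parse_bootstrap_peers(os.environ.get("EXO_BOOTSTRAP_PEERS"))
```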
8 files changed:
- `rust/networking/src/discovery.rs`: Store bootstrap addrs, dial in
existing retry loop
- `rust/networking/src/swarm.rs`: Thread `bootstrap_peers` parameter to
`Behaviour`
- `rust/networking/examples/chatroom.rs`: Updated call site for new
create_swarm signature
- `rust/networking/tests/bootstrap_peers.rs`: Integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: Accept optional
`bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi`: Update type stub
- `src/exo/routing/router.py`: Pass peers to `NetworkingHandle`
- `src/exo/main.py`: `--bootstrap-peers` CLI arg +
`EXO_BOOTSTRAP_PEERS` env var
## Why It Works
Bootstrap peers are dialed in the existing retry loop — the same path
taken by peers when mDNS-discovered. The swarm handles connection, Noise
handshake, and gossipsub mesh joining from there.
The PeerId is intentionally not required in the multiaddr; the Noise
handshake discovers it.
Docker Compose example:
```yaml
services:
  exo-1:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-2/tcp/30000"
  exo-2:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-1/tcp/30000"
```
## Test Plan
### Manual Testing
<details>
<summary>Docker Compose config</summary>
```yaml
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g
  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g
networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```
</details>
Two containers on a bridge network (`172.30.20.0/24`), fixed IPs,
`--libp2p-port 30000`, cross-referencing `--bootstrap-peers`.
Both nodes found each other, established a connection, and then ran the
election protocol.
### Automated Testing
4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs`
(`cargo test -p networking`):
| Test | What it verifies | Result |
|------|-----------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility — no bootstrap peers works | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS |

All 4 pass. The connection test takes ~6s.
---------
Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>
## Motivation
Batching will require us to send tasks concurrently and queue them up.
Our current infrastructure cannot handle that at all. This PR gets us
closer by allowing multiple tasks to be sent in parallel and then queued
up.
## Changes
- Change the Plan logic
- Make the runner main loop into a class
- Add a `BatchGenerator` to which tasks can be submitted (tasks are
currently handled sequentially) and results sent back through an
`MpSender`
- Refactor the runner to accept tasks during generation
- Keep the generator threading
- Separate the runner into several files for better readability
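A minimal sketch of the `BatchGenerator` idea: tasks can be submitted concurrently but are processed sequentially, with results sent back on a per-task channel. The queue-based shape and method names here are illustrative assumptions, not the real API:

```python
import queue

class BatchGenerator:
    """Accepts tasks from many submitters; processes them one at a time."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()

    def submit(self, task, reply: queue.Queue) -> None:
        """Queue a task; safe to call from several threads at once."""
        self._tasks.put((task, reply))

    def run_once(self) -> bool:
        """Handle one queued task sequentially; True if one was handled."""
        try:
            task, reply = self._tasks.get_nowait()
        except queue.Empty:
            return False
        reply.put(f"done:{task}")  # stand-in for streaming generated tokens
        return True
```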
## Test Plan
### Manual Testing
Tested manually; needs a lot more automated testing. Cancellation still
works on a single device; multi-device behavior still needs checking.
### Automated Testing
---------
Co-authored-by: Evan Quiney <evanev7@gmail.com>
## Summary
Large prompts (70K+ tokens / ~500KB+ JSON) cause exo to silently crash.
The root cause is an unhandled `PublishError::MessageTooLarge` from
gossipsub when serialized `TextGeneration` commands exceed the 1MB
`max_transmit_size` limit.
The error propagates as a generic Python exception through the PyO3
bindings. Since `_networking_publish` in `router.py` only catches
`NoPeersSubscribedToTopicError` and `AllQueuesFullError`, the unhandled
exception crashes the networking async task, causing exo to shut down
silently — no error message, no API response.
## Changes
- **Rust (PyO3 bindings):** Add `MessageTooLargeError` exception class
and handle `PublishError::MessageTooLarge` explicitly in the gossipsub
publish path, matching the existing pattern for
`NoPeersSubscribedToTopicError` and `AllQueuesFullError`
- **Python (router):** Catch `MessageTooLargeError` in
`_networking_publish` and log a warning with the message size,
preventing the networking task from crashing
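A hedged sketch of the new Python-side handling; `MessageTooLargeError` mirrors the PyO3 exception added on the Rust side, while the function shape and log message here are assumptions rather than the actual `router.py` code:

```python
import logging

logger = logging.getLogger("exo.router")

class MessageTooLargeError(Exception):
    """Stand-in for the PyO3 exception raised when a gossipsub message
    exceeds max_transmit_size (1MB)."""

def networking_publish(publish, payload: bytes) -> bool:
    """Publish a payload; drop oversized messages instead of crashing."""
    try:
        publish(payload)
        return True
    except MessageTooLargeError:
        # Log a warning with the message size and keep the networking
        # task alive, rather than letting the exception propagate.
        logger.warning("dropping oversized message: %d bytes", len(payload))
        return False
```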
## Reproduction
On a multi-node cluster with a large model (e.g., GLM-5 754B tensor
parallel over JACCL RDMA):
1. Send a chat completion request with ~70K+ tokens
2. exo silently shuts down — no error logged, curl gets no response
3. With shorter prompts (< ~50K tokens): works fine
## Test plan
- Verified `cargo check` passes for `networking` and `exo_pyo3_bindings`
crates
- Verified `ruff check` passes for modified Python files
- Manual testing on 4× Mac Studio M3 Ultra cluster: 50K token requests
pass, 70K+ previously caused silent shutdown, now logs a warning and
drops the oversized message gracefully
Co-authored-by: vsm <vsm@nomail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: rltakashige <rl.takashige@gmail.com>
`VecExt` added a `.map()` convenience method on `Vec<T>` that simply called
`.into_iter().map(f).collect()`. This thin wrapper provided no
optimisation benefit and obscured a standard iterator pattern behind a
nightly feature gate and an extra dependency.
Replaced the single call site in `exo_pyo3_bindings` with the equivalent
iterator chain and removed the ext module, the `extend` dependency, and
the `trait_alias` feature gate from the util crate.
Test plan:
- CI
The util crate contained several unused items: `NonemptyArray`,
`BoxedSliceExt`, a blanket `Sealed` trait, an empty alias module, six unused
nightly feature gates, and six unused Cargo dependencies (`thiserror`,
`once_cell`, `internment`, `derive_more`, `bon`, `recursion`).
Removed all items that had no references outside their own definitions,
keeping only `WakerDeque`, `VecExt`, and the `trait_alias` feature gate,
which are actively used by the networking and exo_pyo3_bindings crates.
Test plan:
- CI
The 15-second `publish_queue_duration` caused messages in peer queues to
be silently dropped. When events are dropped, workers detect gaps in the
event index sequence and request missing events via the NACK path
(`RequestEventLog`), but this recovery is inefficient.
Removed the timeout configuration - gossipsub now uses its default
behavior without time-based eviction. If queue buildup is a concern,
queue size should be limited explicitly rather than dropping by timeout.
Split error handling to log `AllQueuesFullError` as a warning (indicates
peers are unresponsive) while keeping `NoPeersSubscribedToTopicError`
silent (expected during startup and network partitions).
Test plan:
- CI
The Rust workspace lacked Nix build support, making it difficult to
build packages reproducibly or run checks in CI.
Added a flake-parts module at `rust/parts.nix` that uses crane for Rust
builds and fenix for the nightly toolchain. The source filter isolates
`rust/` and root Cargo files to prevent Python/docs changes from
triggering Rust rebuilds. Exports packages (`system_custodian`,
`exo_pyo3_bindings` wheel, `exo-rust-workspace`) and checks (`cargo-nextest`,
`cargo-doc`) for all three target platforms.
The devShell now uses `inputsFrom` to inherit build dependencies from the
workspace package, removing the need for manual pkg-config/openssl setup.
Test plan:
- Ran `nix flake check` successfully
- Built `nix build ".#checks.x86_64-linux.cargo-nextest"` and tests pass
- Built `nix build ".#exo_pyo3_bindings"` and wheel is produced