mirror of
https://github.com/exo-explore/exo.git
synced 2026-04-18 04:52:40 -04:00
**Enabling peers to be discovered in environments where mDNS is unavailable (SSH sessions, headless servers, Docker).**

## Motivation

Exo discovers peers exclusively via mDNS, which works well on a local network but breaks once you move beyond a single L2 broadcast domain:

- SSH sessions on macOS: TCC blocks mDNS multicast from non-GUI sessions (#1488)
- Headless servers/rack machines: #1682 ("DGX Spark does not find other nodes")
- Docker Compose: mDNS is often unavailable across container networks; e.g. #1462 (E2E test framework) needs an alternative

Related work: #1488 (working implementation by @AlexCheema, closed because SSH had a GUI workaround), #1023 (Headscale WAN, closed due to merge conflicts), #1656 (discovery cleanup, open).

This PR introduces an optional bootstrap mechanism for peer discovery while leaving the existing mDNS behavior unchanged.

## Changes

Adds two new CLI flags:

- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`): comma-separated libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port`: fixed TCP port for libp2p to listen on (default: OS-assigned). Required when using bootstrap peers, so other nodes know which port to dial.

8 files:

- `rust/networking/src/discovery.rs`: Store bootstrap addrs, dial in existing retry loop
- `rust/networking/src/swarm.rs`: Thread `bootstrap_peers` parameter to `Behaviour`
- `rust/networking/examples/chatroom.rs`: Updated call site for new `create_swarm` signature
- `rust/networking/tests/bootstrap_peers.rs`: Integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: Accept optional `bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi`: Update type stub
- `src/exo/routing/router.py`: Pass peers to `NetworkingHandle`
- `src/exo/main.py`: `--bootstrap-peers` CLI arg + `EXO_BOOTSTRAP_PEERS` env var

## Why It Works

Bootstrap peers are dialed in the existing retry loop, the same path taken by mDNS-discovered peers. The swarm handles connection, Noise handshake, and gossipsub mesh joining from there. PeerId is intentionally not required in the multiaddr, because the Noise handshake discovers it.

Docker Compose example:

```yaml
services:
  exo-1:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-2/tcp/30000"
  exo-2:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-1/tcp/30000"
```

## Test Plan

### Manual Testing

<details>
<summary>Docker Compose config</summary>

```
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g
  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g
networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```

</details>

Two containers on a bridge network (`172.30.20.0/24`), with fixed IPs, `--libp2p-port 30000`, and cross-referencing `--bootstrap-peers`. Both nodes found each other, established a connection, and then ran the election protocol.

### Automated Testing

4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs` (`cargo test -p networking`):

| Test | What it verifies | Result |
|------|------------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility: works with no bootstrap peers | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs are silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS |

All 4 pass. The connection test takes ~6s.

---------

Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>
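The "comma-separated multiaddrs, invalid ones silently filtered" behavior described above can be illustrated with a small standalone sketch. This is not the code from `discovery.rs` or `swarm.rs`: the real implementation presumably parses with libp2p's `Multiaddr` type, whereas this sketch uses only the standard library and accepts just the exact `/ip4/<addr>/tcp/<port>` form. The function names `parse_ip4_tcp` and `parse_bootstrap_peers` are hypothetical.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Hypothetical helper: parse one `/ip4/<addr>/tcp/<port>` string.
/// Returns `None` for anything that does not match that exact shape.
fn parse_ip4_tcp(addr: &str) -> Option<SocketAddrV4> {
    let mut parts = addr.trim().split('/');
    // A leading '/' produces an empty first segment.
    if !parts.next()?.is_empty() {
        return None;
    }
    if parts.next()? != "ip4" {
        return None;
    }
    let ip: Ipv4Addr = parts.next()?.parse().ok()?;
    if parts.next()? != "tcp" {
        return None;
    }
    let port: u16 = parts.next()?.parse().ok()?;
    // This simplified sketch rejects any trailing components.
    if parts.next().is_some() {
        return None;
    }
    Some(SocketAddrV4::new(ip, port))
}

/// Hypothetical helper: split a comma-separated list and keep only the
/// entries that parse, silently dropping the rest.
fn parse_bootstrap_peers(raw: &str) -> Vec<SocketAddrV4> {
    raw.split(',').filter_map(parse_ip4_tcp).collect()
}

fn main() {
    let raw = "/ip4/172.30.20.3/tcp/30000, not-a-multiaddr,/ip4/10.0.0.1/tcp/70000";
    let peers = parse_bootstrap_peers(raw);
    println!("{peers:?}");
}
```

Dropping bad entries instead of failing keeps startup robust when one address in the list is mistyped, at the cost of making typos harder to notice; logging the rejected entries would be a reasonable addition.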
87 lines
2.7 KiB
Rust
use futures_lite::StreamExt;
use libp2p::identity;
use networking::swarm;
use networking::swarm::{FromSwarm, ToSwarm};
use tokio::sync::{mpsc, oneshot};
use tokio::{io, io::AsyncBufReadExt as _};
use tracing_subscriber::EnvFilter;
use tracing_subscriber::filter::LevelFilter;

#[tokio::main]
async fn main() {
    let _ = tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env().add_directive(LevelFilter::INFO.into()))
        .try_init();

    let (to_swarm, from_client) = mpsc::channel(20);

    // Configure swarm
    let mut swarm = swarm::create_swarm(
        identity::Keypair::generate_ed25519(),
        from_client,
        vec![], // no bootstrap peers
        0,      // listen port; 0 lets the OS assign one
    )
    .expect("Swarm creation failed")
    .into_stream();

    // Create a Gossipsub topic & subscribe
    let (tx, rx) = oneshot::channel();
    _ = to_swarm
        .send(ToSwarm::Subscribe {
            topic: "test-net".to_string(),
            result_sender: tx,
        })
        .await
        .expect("should send");

    // Read full lines from stdin
    let mut stdin = io::BufReader::new(io::stdin()).lines();
    println!("Enter messages via STDIN and they will be sent to connected peers using Gossipsub");

    // Outgoing: publish each stdin line once the subscription is confirmed
    tokio::task::spawn(async move {
        rx.await
            .expect("tx not dropped")
            .expect("subscribe shouldn't fail");
        loop {
            if let Ok(Some(line)) = stdin.next_line().await {
                let (tx, rx) = oneshot::channel();
                if let Err(e) = to_swarm
                    .send(swarm::ToSwarm::Publish {
                        topic: "test-net".to_string(),
                        data: line.as_bytes().to_vec(),
                        result_sender: tx,
                    })
                    .await
                {
                    println!("Send error: {e:?}");
                    return;
                };
                match rx.await {
                    Ok(Err(e)) => println!("Publish error: {e:?}"),
                    Err(e) => println!("Publish error: {e:?}"),
                    Ok(_) => {}
                }
            }
        }
    });

    // Kick it off
    loop {
        // Incoming: handle events emitted by the swarm
        match swarm.next().await {
            Some(FromSwarm::Discovered { peer_id }) => {
                println!("\n\nconnected to {peer_id}\n\n")
            }
            Some(FromSwarm::Expired { peer_id }) => {
                println!("\n\ndisconnected from {peer_id}\n\n")
            }
            Some(FromSwarm::Message { from, topic, data }) => {
                println!("{topic}/{from}:\n{}", String::from_utf8_lossy(&data))
            }
            None => {}
        }
    }
}
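The example above calls `create_swarm` with an empty peer list and port `0`. As a rough sketch of how those two arguments could be filled in from the environment and the command line, the standalone snippet below reads `EXO_BOOTSTRAP_PEERS` and an optional port argument. The helpers `bootstrap_peers_from` and `listen_port_from` are hypothetical, and the assumption that the peer list is passed as strings is not confirmed by the source; the snippet deliberately stops short of calling `create_swarm`.

```rust
use std::env;

/// Hypothetical helper: split a comma-separated peer list into trimmed,
/// non-empty entries. Validation is left to whatever consumes the list.
fn bootstrap_peers_from(raw: Option<&str>) -> Vec<String> {
    raw.unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Hypothetical helper: parse a listen port, falling back to 0
/// (OS-assigned) when the value is missing or malformed.
fn listen_port_from(raw: Option<&str>) -> u16 {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(0)
}

fn main() {
    // The env var name comes from the commit message; taking the port
    // from the first CLI argument is an assumption of this sketch.
    let peers = bootstrap_peers_from(env::var("EXO_BOOTSTRAP_PEERS").ok().as_deref());
    let port = listen_port_from(env::args().nth(1).as_deref());
    println!("bootstrap peers: {peers:?}, listen port: {port}");
}
```

Falling back to port `0` mirrors the documented default, but a node that other peers bootstrap from needs a fixed port, so a real example might prefer to report a malformed port instead of ignoring it.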