Files
exo/rust/networking/examples/chatroom.rs
DeepZima 2da740c387 Feat/static peer discovery (#1690)
**Enabling peers to be discovered in environments where mDNS is
unavailable (SSH sessions, headless servers, Docker).**

## Motivation
Exo discovers peers exclusively via mDNS, which works great on a local
network but breaks once you move beyond a single L2 broadcast domain:

- SSH sessions on macOS — TCC blocks mDNS multicast from non-GUI
sessions (#1488)
- Headless servers/rack machines — #1682 ("DGX Spark does not find other
nodes")
- Docker Compose — mDNS is often unavailable across container networks;
e.g. #1462 (E2E test framework) needs an alternative

Related works: 
#1488 (working implementation made by @AlexCheema and closed because SSH
had a GUI workaround),
#1023 (Headscale WAN then closed due to merge conflicts), 
#1656 (discovery cleanup, open). 

This PR introduces an optional bootstrap mechanism for peer discovery
while leaving the existing mDNS behavior unchanged.

## Changes
Adds two new CLI flags:

- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`) — comma-separated
libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port` — fixed TCP port for libp2p to listen on (default:
OS-assigned). Required when bootstrap peers, so other nodes know which
port to dial.

8 files: 
- `rust/networking/src/discovery.rs`: Store bootstrap addrs, dial in
existing retry loop
- `rust/networking/src/swarm.rs`: Thread `bootstrap_peers` parameter to
`Behaviour`
- `rust/networking/examples/chatroom.rs`: Updated call site for new
create_swarm signature
- `rust/networking/tests/bootstrap_peers.rs`: Integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: Accept optional
`bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi` : Update type stub 
- `src/exo/routing/router.py`: Pass peers to `NetworkingHandle` 
- `src/exo/main.py` : `--bootstrap-peers` CLI arg +
`EXO_BOOTSTRAP_PEERS` env var

## Why It Works

Bootstrap peers are dialed in the existing retry loop — the same path
taken by peers when mDNS-discovered. The swarm handles connection, Noise
handshake, and gossipsub mesh joining from there.

PeerId is intentionally not required in the multiaddr, the Noise
handshake discovers it.

Docker Compose example:

```yaml
services:
  exo-1:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-2/tcp/30000"
  exo-2:
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/exo-1/tcp/30000"
```

## Test Plan

### Manual Testing
<details>
<summary>Docker Compose config</summary>

```
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g

  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g

networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```
</details> 

Two containers on a bridge network (`172.30.20.0/24`), fixed IPs,
`--libp2p-port 30000`, cross-referencing `--bootstrap-peers`.

Both nodes found each other and established a connection then ran the
election protocol.

### Automated Testing

4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs`
(`cargo test -p networking`):

| Test | What it verifies | Result |
|------|-----------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via
bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility —
no bootstrap peers works | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs
silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS
|

All 4 pass. The connection test takes ~6s

---------

Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>
2026-03-25 10:55:12 +00:00

87 lines
2.7 KiB
Rust

use futures_lite::StreamExt;
use libp2p::identity;
use networking::swarm;
use networking::swarm::{FromSwarm, ToSwarm};
use tokio::sync::{mpsc, oneshot};
use tokio::{io, io::AsyncBufReadExt as _};
use tracing_subscriber::EnvFilter;
use tracing_subscriber::filter::LevelFilter;
#[tokio::main]
async fn main() {
let _ = tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env().add_directive(LevelFilter::INFO.into()))
.try_init();
let (to_swarm, from_client) = mpsc::channel(20);
// Configure swarm
let mut swarm = swarm::create_swarm(
identity::Keypair::generate_ed25519(),
from_client,
vec![],
0,
)
.expect("Swarm creation failed")
.into_stream();
// Create a Gossipsub topic & subscribe
let (tx, rx) = oneshot::channel();
_ = to_swarm
.send(ToSwarm::Subscribe {
topic: "test-net".to_string(),
result_sender: tx,
})
.await
.expect("should send");
// Read full lines from stdin
let mut stdin = io::BufReader::new(io::stdin()).lines();
println!("Enter messages via STDIN and they will be sent to connected peers using Gossipsub");
tokio::task::spawn(async move {
rx.await
.expect("tx not dropped")
.expect("subscribe shouldn't fail");
loop {
if let Ok(Some(line)) = stdin.next_line().await {
let (tx, rx) = oneshot::channel();
if let Err(e) = to_swarm
.send(swarm::ToSwarm::Publish {
topic: "test-net".to_string(),
data: line.as_bytes().to_vec(),
result_sender: tx,
})
.await
{
println!("Send error: {e:?}");
return;
};
match rx.await {
Ok(Err(e)) => println!("Publish error: {e:?}"),
Err(e) => println!("Publish error: {e:?}"),
Ok(_) => {}
}
}
}
});
// Kick it off
loop {
// on gossipsub outgoing
match swarm.next().await {
// on gossipsub incoming
Some(FromSwarm::Discovered { peer_id }) => {
println!("\n\nconnected to {peer_id}\n\n")
}
Some(FromSwarm::Expired { peer_id }) => {
println!("\n\ndisconnected from {peer_id}\n\n")
}
Some(FromSwarm::Message { from, topic, data }) => {
println!("{topic}/{from}:\n{}", String::from_utf8_lossy(&data))
}
None => {}
}
}
}