**Enabling peers to be discovered in environments where mDNS is
unavailable (SSH sessions, headless servers, Docker).**
## Motivation
Exo discovers peers exclusively via mDNS. That works well on a local
network but fails wherever multicast is blocked or peers sit outside a
single L2 broadcast domain:
- SSH sessions on macOS — TCC blocks mDNS multicast from non-GUI
sessions (#1488)
- Headless servers/rack machines — #1682 ("DGX Spark does not find other
nodes")
- Docker Compose — mDNS is often unavailable across container networks;
e.g. #1462 (E2E test framework) needs an alternative
Related work:
#1488 (working implementation by @AlexCheema, closed because SSH
had a GUI workaround),
#1023 (Headscale WAN, closed due to merge conflicts),
#1656 (discovery cleanup, open).
This PR introduces an optional bootstrap mechanism for peer discovery
while leaving the existing mDNS behavior unchanged.
## Changes
Adds two new CLI flags:
- `--bootstrap-peers` (env: `EXO_BOOTSTRAP_PEERS`) — comma-separated
libp2p multiaddrs to dial on startup and retry periodically
- `--libp2p-port` — fixed TCP port for libp2p to listen on (default:
OS-assigned). Required when using bootstrap peers, so other nodes know which
port to dial.
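As a rough illustration of how the two flags could be parsed, here is a minimal Python sketch. It is not the code in `src/exo/main.py`: the helper names `parse_bootstrap_peers` and `build_parser` are hypothetical, and the only behavior assumed is what is described above (a comma-separated list, an env var fallback, and an OS-assigned port by default).

```python
import argparse
import os
from typing import List, Optional


def parse_bootstrap_peers(raw: Optional[str]) -> List[str]:
    """Split a comma-separated list of multiaddrs, dropping empty entries."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the two flags described above are modelled.
    parser = argparse.ArgumentParser(prog="exo")
    parser.add_argument(
        "--bootstrap-peers",
        # Assumption: the env var is the fallback when the flag is absent.
        default=os.environ.get("EXO_BOOTSTRAP_PEERS"),
        help="comma-separated libp2p multiaddrs to dial",
    )
    parser.add_argument(
        "--libp2p-port",
        type=int,
        default=None,  # None stands in for "let the OS assign a port"
        help="fixed TCP port for libp2p to listen on",
    )
    return parser
```

For example, `build_parser().parse_args(["--libp2p-port", "30000"])` yields `libp2p_port == 30000`, and the `bootstrap_peers` string can then be passed through `parse_bootstrap_peers` before it is handed to the networking layer.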
8 files:
- `rust/networking/src/discovery.rs`: Store bootstrap addrs, dial in
existing retry loop
- `rust/networking/src/swarm.rs`: Thread `bootstrap_peers` parameter to
`Behaviour`
- `rust/networking/examples/chatroom.rs`: Updated call site for new
`create_swarm` signature
- `rust/networking/tests/bootstrap_peers.rs`: Integration tests
- `rust/exo_pyo3_bindings/src/networking.rs`: Accept optional
`bootstrap_peers` in PyO3 constructor
- `rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi`: Update type stub
- `src/exo/routing/router.py`: Pass peers to `NetworkingHandle`
- `src/exo/main.py`: `--bootstrap-peers` CLI arg +
`EXO_BOOTSTRAP_PEERS` env var
## Why It Works
Bootstrap peers are dialed in the existing retry loop, the same path
that mDNS-discovered peers take. From there the swarm handles the
connection, the Noise handshake, and gossipsub mesh joining.
The PeerId is intentionally not required in the multiaddr; the Noise
handshake discovers it.
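The idea of reusing the retry loop can be modelled in a few lines of Python. This is only a sketch under assumptions: the real loop lives in `rust/networking/src/discovery.rs`, and the function name `retry_tick`, the `dial` callback, and the rule of skipping already-connected addresses are all hypothetical stand-ins rather than the actual implementation.

```python
from typing import Callable, Iterable, List, Set


def retry_tick(
    bootstrap_addrs: Iterable[str],
    connected_addrs: Set[str],
    dial: Callable[[str], None],
) -> List[str]:
    """Run one pass of the retry loop and return the addresses that were dialed."""
    attempted = []
    for addr in bootstrap_addrs:
        # Assumption: addresses that already have a live connection are skipped.
        if addr in connected_addrs:
            continue
        # Connection setup and the Noise handshake happen past this point,
        # which is why the address does not need to carry a PeerId.
        dial(addr)
        attempted.append(addr)
    return attempted
```

Calling `retry_tick` periodically means a peer that was unreachable at startup is simply dialed again on a later pass.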
Docker Compose example:
```yaml
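services:
  exo-1:
    command: ["--libp2p-port", "30000"]
    environment:
      # /ip4 takes a literal IPv4 address, not a hostname; assumes static container IPs
      EXO_BOOTSTRAP_PEERS: "/ip4/172.30.20.3/tcp/30000"
  exo-2:
    command: ["--libp2p-port", "30000"]
    environment:
      EXO_BOOTSTRAP_PEERS: "/ip4/172.30.20.2/tcp/30000"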
```
## Test Plan
### Manual Testing
<details>
<summary>Docker Compose config</summary>
```
services:
  exo-node1:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node1
    hostname: exo-node1
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.3/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52415:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.2
    deploy:
      resources:
        limits:
          memory: 4g
  exo-node2:
    build:
      context: .
      dockerfile: Dockerfile.bootstrap-test
    container_name: exo-bootstrap-node2
    hostname: exo-node2
    command: ["-q", "--libp2p-port", "30000", "--bootstrap-peers", "/ip4/172.30.20.2/tcp/30000"]
    environment:
      - EXO_LIBP2P_NAMESPACE=bootstrap-test
    ports:
      - "52416:52415"
    networks:
      bootstrap-net:
        ipv4_address: 172.30.20.3
    deploy:
      resources:
        limits:
          memory: 4g
networks:
  bootstrap-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.30.20.0/24
```
</details>
Two containers on a bridge network (`172.30.20.0/24`), fixed IPs,
`--libp2p-port 30000`, cross-referencing `--bootstrap-peers`.
Both nodes found each other, established a connection, and then ran the
election protocol.
### Automated Testing
4 Rust integration tests in `rust/networking/tests/bootstrap_peers.rs`
(`cargo test -p networking`):
| Test | What it verifies | Result |
|------|-----------------|--------|
| `two_nodes_connect_via_bootstrap_peers` | Node B discovers Node A via bootstrap addr (real TCP connection) | PASS |
| `create_swarm_with_empty_bootstrap_peers` | Backward compatibility: no bootstrap peers works | PASS |
| `create_swarm_ignores_invalid_bootstrap_addrs` | Invalid multiaddrs silently filtered | PASS |
| `create_swarm_with_fixed_port` | `listen_port` parameter works | PASS |
All 4 pass. The connection test takes ~6s.
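To make the "silently filtered" behavior concrete, the Python sketch below models it for the one address shape used in this PR, `/ip4/<address>/tcp/<port>` with an optional `/p2p/<peer id>` suffix. It is a deliberately narrow stand-in, not the Rust multiaddr parser: real multiaddrs support many more protocols, and the function names here are hypothetical.

```python
import ipaddress
from typing import List


def is_valid_ip4_tcp_multiaddr(addr: str) -> bool:
    """Accept /ip4/<IPv4 literal>/tcp/<port>, optionally followed by /p2p/<peer id>."""
    parts = addr.split("/")
    # A leading "/" yields an empty first element: ["", "ip4", host, "tcp", port, ...]
    if len(parts) not in (5, 7):
        return False
    if parts[0] != "" or parts[1] != "ip4" or parts[3] != "tcp":
        return False
    try:
        ipaddress.IPv4Address(parts[2])  # rejects hostnames and malformed addresses
    except ipaddress.AddressValueError:
        return False
    port = parts[4]
    if not (port.isascii() and port.isdigit()) or int(port) > 65535:
        return False
    if len(parts) == 7:
        # The peer id is optional; when present it must be non-empty.
        return parts[5] == "p2p" and parts[6] != ""
    return True


def filter_bootstrap_addrs(addrs: List[str]) -> List[str]:
    """Keep only well-formed addresses; invalid ones are dropped without raising."""
    return [addr for addr in addrs if is_valid_ip4_tcp_multiaddr(addr)]
```

Under this model, an entry such as `/ip4/some-host/tcp/30000` is dropped because `/ip4` expects a literal IPv4 address, so a node configured only with such entries would start normally but never dial anyone.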
---------
Signed-off-by: DeepZima <deepzima@outlook.com>
Co-authored-by: Evan <evanev7@gmail.com>