Files
spacedrive/docs/developers/technology/p2p.mdx
Oscar Beaumont fd10eca115 P2P Docs (#2344)
* Add title to rspc

* wip

* rspc docs title cleanup

* a

* todo

* `p2p/crates`

* remove normalised cache docs

* wip

* words words and more words

* a

* wip

* a

* cleanup settings

* wip p2p docs

* wip
2024-05-14 10:19:33 +00:00

209 lines
9.3 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: p2p
index: 14
---
# Peer-to-peer
Our peer-to-peer technology works at the heart of Spacedrive allowing all of your devices to seamlessly communicate and share data. This documentation outlines
## Implementing features with P2P
TODO:
- From frontend, to backend
- Including authentication
- Versioning/making breaking change
- Show using `sd_p2p_tunnel`
## Underlying technology
### Terminology
- **Node**: An application running Spacedrive's network stack.
- This could be the Spacedrive app or the P2P relay.
- If you have multiple Spacedrive installations open on your computer, each one is an independant node.
- **Library**: A logical collection of your data within Spacedrive.
- From a theorical perspective, a library is just the conflict resolved state of one or more **instances** although a lot of the time we don't stricly treat it that way.
- **Instance**: An instance of a library running on a particular node.
- An instance correlates directly to each SQLite file.
- You could *technically* have more than one instance for a library on a single node, although our core would fall apart as we identify traffic by library.
- [`Identity`](https://github.com/spacedriveapp/spacedrive/blob/518d5836f6585a5f597c3ae5a0d27d084adc0a63/crates/p2p/src/identity.rs#L29) - A public/private keypair which represents the library or node.
- [`RemoteIdentity`](https://github.com/spacedriveapp/spacedrive/blob/518d5836f6585a5f597c3ae5a0d27d084adc0a63/crates/p2p/src/identity.rs#L70) - A public key which represents the library or node.
- [`PeerId`](https://docs.rs/libp2p/latest/libp2p/struct.PeerId.html) - The identifier libp2p uses. Can be derived from a `RemoteIdentity`.
### `sd_p2p` crate
The P2P crate was designed from the group up to be modular.
The `P2P` struct is the core of the system, and suprisingly doesn't actually do any P2P functionality. It's a state manager and event bus along with providing a hook system for other components of the P2P system to register themselves.
This modular design helps with separting the concern which helps with comprehending the entire system and makes it easier for testing.
The `sd_p2p` crate provides a hook for:
- `Mdns` - Local network discovery
- `Quic` - Quic transport layer built on top of `libp2p`
#### What are hooks?
A hook is very similar to an actor. It's a component which can be registered with the P2P system.
A hook allows for processing events from the P2P system and also ensures when the P2P system shuts down, the hook is also shutdown.
Their are special hooks called listeners. These are implemented as a superset of a regular hook and are able to create and accept connections.
Subcrates:
- [`sd_p2p_block`](https://github.com/spacedriveapp/spacedrive/tree/main/crates/p2p/crates/block) - Block protocol based on [SyncThing Block Exchange Protocol v1](https://docs.syncthing.net/specs/bep-v1.html)
- [`sd_p2p_proto`](https://github.com/spacedriveapp/spacedrive/tree/main/crates/p2p/crates/proto) - Utilities for zero fluff encoding and decoding.
- [`sd_p2p_tunnel`](https://github.com/spacedriveapp/spacedrive/tree/main/crates/p2p/crates/tunnel) - Encrypt a stream of data between two nodes
#### `sd_p2p_proto`
This crate provides utilities for implementing asynchronous deserializers and matching synchronous serializers. The goal of these implementations is to really quickly send and receive Rust structs over the network.
This crate allows for creating implementations faster than other common options, at the cost of some developer experience.
Before building this I originally compared the performance of both [msgpack](https://docs.rs/rmp-serde) and [bincode](https://docs.rs/bincode) against manual implementations using `AsyncRead` and I found that over the network using asynchronous deserialization was faster.
This makes logically makes sense as if you want to use a synchronous serializer you will do the following:
- Send the total length of the message
- Allocate a buffer for the message
- Wait asynchronously for the buffer to be filled
- Synchronously copy from the buffer into each of the struct fields
When using an asynchronous serializer you can skip sending the messages length and allocating the intermediate buffer as we can rely on the known length of each field while decoding and this is a win for performance and memory usage.
This crate provides utilities to make the implementations less error prone, however long term it would be great to replace this with a derive macro similar to how crates like [serde](https://serde.rs) work.
From my research no crate exists that meets these requirements. It is also a difficult problem because your juggling lifetimes and async which is rough. I attempted an implementation called [binario](https://github.com/oscartbeaumont/binario), however it is still incomplete so we never adopted it. I suspect Rust's recent stablisation of [RPITIT](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html) would make this much easier.
### Local Network Discovery
Our local network discovery uses [DNS-Based Service Discovery](https://www.rfc-editor.org/rfc/rfc6763.html) which itself is built around [Multicast DNS (mDNS)](https://datatracker.ietf.org/doc/html/rfc6762). This is a really well established technology and is used in [Spotify Connect](https://support.spotify.com/au/article/spotify-connect/), [Apple Airplay](https://www.apple.com/au/airplay/) and many other services you use every day.
#### Service Structure
The following is an example of what would be broadcast from a single Spacedrive node:
```toml
# {remote_identity_of_self}._sd._udp.local.
name=Oscars Laptop # Shown to the user to select a device
operating_system=macos # Used for UI purposes
device_model=MacBook Pro # Used for UI purposes
version=0.0.1 # Spacedrive version
# For each library that's active on the Spacedrive node:
# {library_uuid}={remote_identity_of_self}
d66ed0c3-03ac-4f9b-a374-a927830dfd5b=0l9vTOWu+5aJs0cyWxdfJEGtloEepGRAXcEuDeTDRPk
```
Within `sd-core` this is defined in two parts. The [`PeerMetadata` struct](https://github.com/spacedriveapp/spacedrive/blob/44478207e72495b3777e294660d78939711b544f/core/src/p2p/metadata.rs#L9) takes care of the node metadata and libraries are inserted by the [`libraries_hook`](https://github.com/spacedriveapp/spacedrive/blob/44478207e72495b3777e294660d78939711b544f/core/src/p2p/libraries.rs#L13).
#### Modes
<Notice
type="note"
text="This section discusses 'Contacts Only' which is not yet fully implemented (refer to ENG-1197)."
/>
Within Spacedrive's settings the user is able to choose between three modes for local network discovery:
- **Contacts only**: Only devices that are in your contacts list will be able to see your device.
- **Enabled**: All devices on the local network will be able to see your device.
- **Disabled**: No devices on the local network will be able to see your device.
**Enabled** and **Disabled** are implemented by spawning and shutting down the [`sd_p2p::Mdns`](https://github.com/spacedriveapp/spacedrive/blob/44478207e72495b3777e294660d78939711b544f/crates/p2p/src/mdns.rs#L17) service as required within `sd-core`.
**Contacts only** the mDNS service will not contain the [`PeerMetadata`](https://github.com/spacedriveapp/spacedrive/blob/44478207e72495b3777e294660d78939711b544f/core/src/p2p/metadata.rs#L9) fields and instead will contain a hash of the users Spacedrive identifier. If a Spacedrive node detects another node in the local network with a hash in it's contacts, it can make a request to the node and if the remote node also has the current node in it's contacts, it will respond with the full metadata.
#### Implementation
We make use of the [mdns-sd](https://docs.rs/mdns-sd) crate.
### Manual connection
The user can manually provide a set of [`SocketAddr`](https://doc.rust-lang.org/std/net/enum.SocketAddr.html)'s and the P2P system to attempt to connect to.
This feature primarily exists for usage in combination with Docker but it could be useful for working around difficult network setups.
#### Implementation
TODO - TODO
#### Problems with Docker
TODO - MDNS daemon
TODO - Docker and why it's a pain mDNS. Explain the current stuff i've done with it.
### Transport layer
TODO - Quic
TODO - Explain authentication
### Relay
TODO
### Direction Connect via Relay
TODO
#### Authentication
TODO - How we gonna restrict this???
#### Billing
TODO - How we gonna bill for this???
### Design Decisions
TODO
### Things I would do differently?
TODO
### Crates
TODO
#### Security
##### Threat model
TODO - Risks of sharing IP's using discovery, risks of compromised relay, risks of compromised local network during pairing
##### Authentication
TODO
##### Authorization
TODO
##### Tracking
TODO - Link to Apple stuff
#### Version compatibility and breaking changes
TODO - Compatibility across versions of Spacedrive
#### libp2p
TODO - Why libp2p fork?, Why libp2p can be problematic for what we do
TODO - How we transpose our certificates to libp2p certificates
#### Major issues
TODO - mDNS issues on Linux
TODO - The double up of service discovery when using local and relay
TODO - Question? Why does remote_identity_of_self show up in metadata and the mDNS record itself.
{/* TODO */}
TODO - Request flow. Eg. incoming goes from Quic to mpsc to the users code, to the handlers.
TODO - Resumable uploads/transfers