{{template "views/partials/head" .}}
Enable peer-to-peer distribution to scale your AI workloads across multiple devices. Share instances, shard models, and pool computational resources across your network.
Load balance across multiple instances
Split large models across workers
Pool resources from multiple devices
Start LocalAI with P2P enabled
local-ai run --p2p
This will automatically generate a network token for you.
Or use an existing token
export TOKEN="your-token-here"
local-ai run --p2p
If you already have a token from another instance, you can reuse it.
Access the P2P dashboard
Once enabled, refresh this page to see your network token and start connecting nodes.
LocalAI leverages peer-to-peer technologies to distribute AI workloads intelligently across your network.
Share complete LocalAI instances across your network for load balancing and redundancy. Perfect for scaling across multiple devices.
Split large model weights across multiple workers. Currently supported with llama.cpp backends for efficient memory usage.
Pool computational resources from multiple devices, including your friends' machines, to handle larger workloads collaboratively.
Parallel processing
Add more nodes
Fault tolerant
Resource optimization
{{.P2PToken}}
The network token can be used to share this instance, to join a federation, or to join a worker network. Below you will find examples of how to start a new instance or a worker with this token.
Instance sharing
nodes
Model sharding
workers
Connection token
Instance load balancing and sharing
Start LocalAI in federated mode to share your instance, or launch a federated server to distribute requests intelligently across multiple nodes in your network.
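The two roles described above can be started from the command line. A minimal sketch, assuming a token is already set in the `TOKEN` environment variable (flag and subcommand names should be verified against your installed LocalAI version):

```shell
# Share this instance in the federation (uses the token from $TOKEN)
export TOKEN="your-token-here"
local-ai run --p2p --federated

# On another machine with the same token, start a federated server
# that load-balances incoming requests across the shared instances
local-ai federated
```

Any instance started with the same token becomes discoverable to the federated server, so nodes can be added or removed without reconfiguration.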
No nodes available
Start some federated instances to see them here
Distributed model computation (llama.cpp)
Deploy llama.cpp workers to split model weights across multiple devices. This enables processing larger models by distributing computational load and memory requirements.
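The worker setup described above can be sketched as follows, assuming the token is exported as `TOKEN` (subcommand names should be checked against your LocalAI version):

```shell
# On each worker machine: join the P2P network as a llama.cpp RPC worker
export TOKEN="your-token-here"
local-ai worker p2p-llama-cpp-rpc

# On the main machine: start LocalAI with P2P enabled; it discovers
# workers sharing the same token and shards model weights across them
local-ai run --p2p
```

Workers only need the token to join; the main instance handles discovery and distributes the model weights automatically.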
No workers available
Start some workers to see them here