New nodes in a p2p network often make their initial connection to the network through a set of nodes known as boot nodes. Information about these boot nodes, such as their addresses, is typically embedded in an application binary or provided as a configuration option.
The boot nodes serve as an entry point, providing a list of other nodes in the network to newcomers. After connecting to the boot nodes, the new node can connect to those other nodes in the network, thereby no longer relying on the boot nodes.
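The bootstrap flow described above can be sketched as a toy in-memory simulation. The `Node` class, its `known_peers` set, and the `bootstrap` function are hypothetical names chosen for illustration; they are not libp2p APIs, and no real networking takes place.

```python
class Node:
    """A toy peer that remembers the names of the peers it knows about."""

    def __init__(self, name):
        self.name = name
        self.known_peers = set()

    def list_peers(self):
        # A boot node answers newcomers with the peers it already knows.
        return set(self.known_peers)


def bootstrap(new_node, boot_nodes):
    """Fill new_node.known_peers by asking each configured boot node."""
    for boot in boot_nodes:
        new_node.known_peers.add(boot.name)
        new_node.known_peers |= boot.list_peers()
    # A node never needs to list itself as a peer.
    new_node.known_peers.discard(new_node.name)
    return new_node.known_peers


boot = Node("boot-1")
boot.known_peers = {"peer-a", "peer-b"}
newcomer = Node("newcomer")
peers = bootstrap(newcomer, [boot])
```

After bootstrapping, the newcomer knows `peer-a` and `peer-b` directly and could keep operating even if `boot-1` went offline.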
A means of establishing communication between peers who are unable to communicate directly, with the assistance of a third peer willing and able to act as an intermediary.
In many real-world peer-to-peer networks, direct communication between all peers may be impossible for a variety of reasons. For example, one or more peers may be behind a firewall or have NAT traversal issues. Or maybe the peers don’t share any common transports.
In such cases, it’s possible to “bridge the gap” between peers, so long as each of them is capable of establishing a connection to a willing relay peer. If I only speak TCP and you only speak websockets, we can still hang out with the help of a bilingual pal.
Circuit relay is implemented in libp2p according to the relay spec, which defines a wire protocol and addressing scheme for relayed connections.
Client / Server
A network architecture defined by the presence of central “server” programs which provide services and resources to a (usually much larger) set of “client” programs. Typically clients do not communicate directly with one another, instead routing all communications through the server, which is inherently the most privileged member of the network.
A distributed hash table whose contents are spread throughout a network of participating peers. Much like an in-process hash table, values are associated with a key and can be retrieved by key. Most DHTs assign a portion of the addressable key space to nodes in a deterministic manner, which allows for efficient routing to the node responsible for a given key.
libp2p uses the DHT as the foundation for one of its peer routing implementations, and systems built with libp2p often use the DHT to provide metadata about content, advertise service availability, and more.
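As a rough illustration of deterministic key-space assignment, the sketch below picks the node whose ID is closest to a key. It assumes a Kademlia-style XOR distance over SHA-256 hashes; the function names and node names are hypothetical, and real DHTs add routing tables, replication, and network lookups on top of this idea.

```python
import hashlib


def key_id(data: bytes) -> int:
    """Map arbitrary bytes into a 256-bit key space using SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def responsible_node(key: bytes, node_names):
    """Return the node whose hashed ID is closest to the key by XOR distance."""
    target = key_id(key)
    return min(node_names, key=lambda name: key_id(name.encode()) ^ target)


nodes = ["node-a", "node-b", "node-c"]
owner = responsible_node(b"some-content-key", nodes)
```

Because the result depends only on the key and the set of node IDs, every participant that knows the same nodes computes the same owner without any coordination.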
A libp2p connection is a communication channel that allows peers to read and write data.
Connections between peers are established via transports, which can be thought of as “connection factories”. For example, the TCP transport allows you to create connections that use TCP/IP as their underlying substrate.
The process of opening a libp2p connection to another peer is known as “dialing”, and accepting connections is known as “listening”. Together, an implementation of dialing and listening forms a transport.
The process of accepting incoming libp2p connections is known as “listening”, and it allows other peers to “dial” up and open network connections to your peer.
A multiaddress (often abbreviated multiaddr) is a convention for encoding multiple layers of addressing information into a single “future-proof” path structure.
For example, /ip4/192.0.2.0/udp/1234 encodes two protocols along with their essential addressing information. The /ip4/192.0.2.0 part informs us that we want the 192.0.2.0 address of the IPv4 protocol, and /udp/1234 tells us we want to send UDP packets to port 1234.
Multiaddresses can be composed to describe multiple “layers” of addresses.
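To make the layering concrete, here is a minimal sketch that splits a textual multiaddr into protocol/value pairs. The `parse_multiaddr` helper is hypothetical and assumes every protocol carries exactly one value, which holds for `ip4`, `udp`, and `tcp` but not for every protocol a real multiaddr library supports.

```python
def parse_multiaddr(addr: str):
    """Split a textual multiaddr into (protocol, value) pairs.

    Simplifying assumption: each protocol is followed by exactly one value.
    """
    parts = addr.strip("/").split("/")
    if not addr.startswith("/") or len(parts) % 2 != 0:
        raise ValueError(f"malformed multiaddr: {addr!r}")
    # Even positions hold protocol names, odd positions hold their values.
    return list(zip(parts[0::2], parts[1::2]))


layers = parse_multiaddr("/ip4/192.0.2.0/udp/1234")
```

Each pair corresponds to one layer of the address, read from the outermost protocol inward.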
Hashes are central to many systems (git, for example), yet many systems store only the hash output itself, since the choice of hash function is an implicit design parameter of the system. This has the unfortunate effect of making it quite difficult to ever change your mind about what kind of hash function your system uses!
A multihash encodes the type of hash function used to produce the output, as well as the length of the output in bytes. This is added as a two-byte header to the original hash output, and in return for those two bytes, the header allows current and future systems to easily identify and validate many hash functions by leveraging common libraries. As new functions are added, you can much more easily extend your application or protocol to support them, since the old and new hash outputs will be easily distinguishable from one another.
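The two-byte header can be sketched as follows for SHA-256, whose multihash function code is `0x12`. The helper names are hypothetical, and the sketch assumes both the code and the length fit in a single byte each; the multihash format encodes them as varints, so larger values would need more bytes.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for SHA-256


def encode_multihash(data: bytes) -> bytes:
    """Hash data with SHA-256 and prepend the <code><length> header."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


def decode_multihash(mh: bytes):
    """Split a multihash into (function code, digest length, digest)."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match header")
    return code, length, digest


mh = encode_multihash(b"hello")
code, length, digest = decode_multihash(mh)
```

A reader that sees code `0x12` and length 32 knows the remaining bytes are a SHA-256 digest, without relying on any out-of-band agreement.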
The most prominent use of multihashes in libp2p is in the PeerId, which contains a hash of a peer’s public key. However, systems built with libp2p, most notably IPFS, use multihashes for other purposes. In the IPFS case, multihashes are used both to identify content and other peers, since IPFS uses libp2p and shares the same
In IPFS, multihashes are a key component of the CID, or content identifier, and the “v0” version of CID is a “raw” multihash of a piece of content. A “modern” CID combines a multihash of some content with some compact contextualizing metadata, allowing content-addressed systems like IPFS to create more meaningful links between hash-addressed data. For more on the subject of hash-linked data structures in p2p systems, see IPLD.
Multihashes are often represented as base58-encoded strings, for example, QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N. The first two characters, Qm, are the multihash header for the SHA-256 hash algorithm with a length of 256 bits, and are common to all base58-encoded multihashes using SHA-256.
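The shared prefix can be observed with a small base58 encoder. This is a minimal sketch that assumes the Bitcoin base58 alphabet; `base58_encode` is a hypothetical helper written out here so the example stays self-contained.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, mapping leading zero bytes to '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


digest = hashlib.sha256(b"hello").digest()
multihash = bytes([0x12, len(digest)]) + digest  # SHA-256 code, 32-byte length
encoded = base58_encode(multihash)
```

Whatever the input data, the header bytes `0x12 0x20` dominate the most significant digits, which is why the encoded string begins with `Qm`.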
Multiplexing (or “muxing”), refers to the process of combining multiple streams of communication over a single logical “medium”. For example, we can maintain multiple independent data streams over a single TCP network connection, which is itself of course being multiplexed over a single physical connection (ethernet, wifi, etc).
libp2p supports several implementations of stream multiplexing. The mplex specification defines a simple protocol with implementations in several languages. Other supported multiplexing protocols include yamux and spdy.
See Stream Muxer Implementations for status of multiplexing across libp2p language implementations.
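The core idea of stream multiplexing can be sketched by tagging each chunk with a stream identifier before sending it over the shared medium. The `mux` and `demux` functions below are hypothetical and operate on in-memory lists; they do not reproduce the framing of mplex, yamux, or any other real protocol.

```python
from collections import defaultdict


def mux(streams):
    """Interleave chunks from several streams into one list of frames.

    Each frame is (stream_id, chunk), so the receiver can tell streams apart.
    """
    pending = {sid: list(chunks) for sid, chunks in streams.items()}
    frames = []
    while any(pending.values()):
        for sid, chunks in pending.items():
            if chunks:
                frames.append((sid, chunks.pop(0)))
    return frames


def demux(frames):
    """Reassemble each stream's bytes from the tagged frames."""
    out = defaultdict(bytes)
    for sid, chunk in frames:
        out[sid] += chunk
    return dict(out)


streams = {1: [b"he", b"llo"], 2: [b"wor", b"ld", b"!"]}
frames = mux(streams)
restored = demux(frames)
```

The frames travel in a single interleaved sequence, yet each stream's data arrives intact and in order.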
multistream is a lightweight convention for “tagging” streams of binary data with a short header that identifies the content of the stream.
Network address translation in general is the mapping of addresses from one address space to another, as often happens at the boundary of private networks with the global internet. It is especially essential in IPv4 networks (which are still the vast majority), as the address space of IPv4 is quite limited. Using NAT, a local, private network can have a vast range of addresses within the internal network, while only consuming one public IP address from the global pool.
An unfortunate effect of NAT in practice is that it’s much easier to make outgoing connections from the private network to the public one than it is to call from outside in. This is because machines listening for connections on the internal network need to explicitly tell the router in charge of NAT that it should forward traffic for a given port (the multiplexing abstraction for the OS networking layer) to the listening machine.
This is less of an issue in a client / server model, because outgoing connections to the server give the router enough information to route the response back to the client where it needs to go.
In the peer-to-peer model, accepting connections from other peers is often just as important as initiating them, which means that we often need our peers to be publicly reachable from the global internet. There are many viable approaches to NAT Traversal, several of which are implemented in libp2p.
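The asymmetry between outgoing and incoming connections can be illustrated with a toy model of a router's translation table. The `NatRouter` class, the addresses, and the port numbers are all hypothetical, and the model is deliberately simplified: real NAT devices also track the remote endpoint, the protocol, and mapping timeouts.

```python
class NatRouter:
    """A toy NAT that maps public ports to internal (ip, port) pairs."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.mappings = {}  # public port -> (internal ip, internal port)
        self.next_port = 40000

    def outgoing(self, internal_ip, internal_port):
        """An outbound connection creates a mapping as a side effect."""
        public_port = self.next_port
        self.next_port += 1
        self.mappings[public_port] = (internal_ip, internal_port)
        return (self.public_ip, public_port)

    def forward_port(self, public_port, internal_ip, internal_port):
        """Explicit port forwarding, configured ahead of time."""
        self.mappings[public_port] = (internal_ip, internal_port)

    def incoming(self, public_port):
        """Return the internal destination, or None if traffic is dropped."""
        return self.mappings.get(public_port)


router = NatRouter("203.0.113.7")
dropped = router.incoming(4001)            # nothing listens here yet
public = router.outgoing("10.0.1.5", 5000)  # client-style connection
reply_to = router.incoming(public[1])       # replies find their way back
router.forward_port(4001, "10.0.1.5", 4001)
reachable = router.incoming(4001)
```

Unsolicited traffic to port 4001 is dropped until the forwarding rule exists, whereas replies to the outbound connection are routed without any configuration.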
NAT traversal refers to the process of establishing connections with other machines across a NAT boundary. When crossing the boundary between IP networks (e.g. from a local network to the global internet), a Network Address Translation process occurs which maps addresses from one space to another.
For example, my home network has an internal range of IP addresses (10.0.1.x), which is part of a range of addresses that are reserved for private networks. If I start a program on my computer that listens for connections on its internal address, a user from the public internet has no way of reaching me, even if they know my public IP address. This is because I haven’t made my router aware of my program yet. When a connection comes in from the internet to my public IP address, the router needs to figure out which internal IP to route the request to, and to which port.
There are many ways to inform one’s router about services you want to expose. For consumer routers, there’s likely an admin interface that can set up mappings for any range of TCP or UDP ports. In many cases, routers will allow automatic registration of ports using a protocol called UPnP, which libp2p supports. If enabled, libp2p will try to register your service with the router for automatic NAT traversal.
In some cases, automatic NAT traversal is impossible, often because multiple layers of NAT are involved. In such cases, we still want to be able to communicate, and we especially want to be reachable and allow other peers to dial in and use our services. This is one of the motivations for Circuit Relay, which is a protocol involving a “relay” peer that is publicly reachable and can route traffic on behalf of others. Once a relay circuit is established, a peer behind an especially intractable NAT can advertise the relay circuit’s multiaddr, and the relay will accept incoming connections on that peer’s behalf and forward the traffic to it.
The word “node” is quite overloaded in general programming contexts, and this is especially the case in peer-to-peer networking circles.
One common usage is when “node” refers to a single instance of a peer-to-peer software system, running at some time and place in the universe. For example: “I'm running an orbit-db node in AWS. I think it's on version 3.2.0.” In this usage, “node” refers to the whole software program (the daemon in unix-speak) which participates in the network. In this documentation, we’ll often use “peer” for this purpose instead, and the two terms are often used interchangeably in various p2p software discussions.
Many members of our community are excited about graphs in many contexts, so the graph terminology of “nodes and edges” is often used when discussing various subjects. Some common contexts for graph-related discussions:
When discussing the topology or structure of a peer-to-peer network, “node” is often used in the context of a graph of connected peers. Efficient construction and traversal of this graph is key to effective peer routing.
When discussing data structures, “node” is often useful for referring to key elements of the structure. For example, a linked list consists of many “nodes” containing both a value and a link (or, in graph terms, an “edge”) connecting it to the next node. Since many useful and interesting data structures can be described as graphs, much of the terminology of graph theory applies when discussing their properties. In particular, IPFS is naturally well-suited to storing and manipulating data structures which form a Directed Acyclic Graph, or DAG.
An especially interesting data structure for many in our community is IPLD, or Interplanetary Linked Data. Similar to libp2p, IPLD grew out of the real-world needs of IPFS, but is broadly useful and interesting in many contexts outside of IPFS. IPLD discussions often involve “nodes” of all the types discussed here.
An “overlay network” or just “overlay” refers to the logical structure of a peer-to-peer network, which is “overlaid” on top of the underlying transport mechanisms used for lower-level network communication.
Peer-to-peer systems are generally composed of one or more overlay networks, which determine how peers are identified and located, how messages are propagated throughout the system, and other key properties.
A single participant in a peer-to-peer network. While a given peer may support many protocols, it has a single PeerId which it uses to identify itself to other peers. Often used synonymously with node.
A unique, verifiable identifier for a peer that another peer cannot forge or impersonate without being trivially detected. In libp2p, peers are identified by their PeerId, which is both globally unique and allows other peers to obtain the peer’s cryptographic public key.
The most common form of PeerId is a multihash of a peer’s public key, which can be used to fetch the entire public key from the DHT for encryption or signature verification. There is also experimental support for embedding or “inlining” small public keys directly into the PeerId; however, this is an area of ongoing discussion and should be treated with caution in production systems until finalized.
An important property of cryptographic peer identities is that they are decoupled from transport, allowing peers to verify the identity of other peers regardless of what underlying network they might use to communicate. This also gives them a much longer “shelf life” than location-based identifiers (for example, IP addresses), since identities remain stable across address changes.
Peer routing is the process of discovering the network “route” or address for a peer in the network, given the peer’s id.
It may also include “ambient” discovery of local peers, for example via multicast DNS.
The primary peer routing mechanism in libp2p uses a distributed hash table to locate peers, taking advantage of the Kademlia routing algorithm to efficiently locate peers.
A peer-to-peer (p2p) network is one in which the participants (referred to as peers or nodes) communicate with one another directly, on more or less “equal footing”. This does not necessarily mean that all peers are identical; some may have different roles in the overall network. However, one of the defining characteristics of a peer-to-peer network is that it does not require a privileged set of “servers” which behave completely differently from their “clients”, as is the case in the predominant client / server model.
In general, refers to “publish / subscribe”, a communication pattern in which participants “subscribe” to updates “published” by other participants, often on a named “topic”.
libp2p defines a pubsub spec, with links to several implementations in supported languages. Pubsub is an area of ongoing research and development, with multiple implementations optimized for different use cases and environments.
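The publish / subscribe pattern itself can be shown with a minimal in-process sketch. The `Broker` class and the topic name are hypothetical, and the single central object is used only to illustrate the pattern; it says nothing about how libp2p pubsub implementations propagate messages between peers.

```python
from collections import defaultdict


class Broker:
    """A toy in-process message hub keyed by topic name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callable to be invoked for every message on the topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to current subscribers; unknown topics simply have none.
        for callback in self.subscribers.get(topic, []):
            callback(message)


broker = Broker()
received = []
broker.subscribe("news", received.append)
broker.publish("news", "hello subscribers")
broker.publish("weather", "nobody is listening")
```

Publishers and subscribers never reference each other directly; the topic name is the only thing they share.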
In general, a set of rules and data structures used for network communication.
libp2p is composed of many protocols and makes use of many others provided by the operating system or runtime environment.
Most core libp2p functionality is defined in terms of protocols, and libp2p protocols are identified using multistream headers.
The process of reaching agreement on what protocol to use for a given stream of communication.
In libp2p, protocols are identified using a convention called multistream, which adds a small header to the beginning of a stream containing a unique name, including a version identifier.
When two peers first connect, they exchange a handshake to agree upon what protocols to use.
The implementation of the libp2p handshake is called multistream-select.
For details, see the protocol negotiation article.
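The outcome of a negotiation can be sketched as a simple selection over protocol identifiers. This is an assumption-laden simplification: the `negotiate` function and the protocol names are hypothetical, and the sketch models only which protocol is chosen, not the messages that multistream-select exchanges on the wire.

```python
def negotiate(proposals, supported):
    """Return the first proposed protocol ID the listener supports, else None.

    The dialer offers IDs in order of preference; the listener accepts
    the first one it recognizes and rejects the rest.
    """
    for protocol_id in proposals:
        if protocol_id in supported:
            return protocol_id
    return None


listener_supports = {"/chat/1.0.0", "/ping/1.0.0"}
chosen = negotiate(["/chat/2.0.0", "/chat/1.0.0"], listener_supports)
no_match = negotiate(["/files/1.0.0"], listener_supports)
```

Including the version in the identifier lets a dialer fall back to an older protocol version when the listener does not support the newest one.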
TODO: Distinguish between the various types of “stream”. Could refer to
- raw tcp connection
- one component of a multistream connection
- node.js streams / pull-streams
Can refer to a collection of interconnected peers.
In the libp2p codebase, “swarm” may refer to a module that allows a peer to interact with its peers, although this component was later renamed “switch”.
In addition to managing transports, the switch also coordinates the “connection upgrade” process, which promotes a “raw” connection from the transport layer into one that supports protocol negotiation, stream multiplexing, and secure communications.
Sometimes called “swarm” for historical reasons.
In a peer-to-peer context, usually refers to the shape or structure of the overlay network formed by peers as they communicate with each other.
transport refers to the technology that lets us move bits from one machine to another. This may be a TCP network provided by the operating system, a websocket connection in a browser, or anything else capable of implementing the transport interface.